Publications by Neale B | LitMetric

Publications by authors named "Neale B"

The Scalable Variant Call Representation: Enabling Genetic Analysis Beyond One Million Genomes.

Timothy Poterba Christopher Vittal Daniel King Daniel Goldstein Jacqueline I Goldstein

Bioinformatics

December 2024

Motivation: The Variant Call Format (VCF) is widely used in genome sequencing but scales poorly. For instance, we estimate a 150,000 genome VCF would occupy 900 TiB, making it costly and complicated to produce, analyze, and store. The issue stems from VCF's requirement to densely represent both reference-genotypes and allele-indexed arrays.
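The size estimate above implies an average per-genome footprint, which the short sketch below works out. It is only back-of-the-envelope arithmetic on the two figures quoted in the abstract; the function name `per_genome_gib` is hypothetical. Because the abstract says VCF scales poorly, this average should not be read as a constant cost per added genome.

```python
# Average storage per genome implied by the quoted estimate
# (900 TiB for a 150,000-genome VCF). Illustrative arithmetic only.

TIB_IN_GIB = 1024  # 1 TiB = 1024 GiB


def per_genome_gib(total_tib: float, n_genomes: int) -> float:
    """Return the average size per genome in GiB."""
    if n_genomes <= 0:
        raise ValueError("n_genomes must be positive")
    return total_tib * TIB_IN_GIB / n_genomes


if __name__ == "__main__":
    avg = per_genome_gib(900, 150_000)
    print(f"{avg:.3f} GiB per genome on average")
```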

Phenotype harmonization and analysis for The Populations Underrepresented in Mental illness Association Studies (the PUMAS Project).

Ana M Ramirez-Diaz Ana M Diaz-Zuluaga Rocky E Stroud Annabel Vreeker Mary Bitta

medRxiv

October 2024

Article Synopsis

- The PUMAS project aims to address the lack of representation of African and Latin American populations in psychiatric genetics studies by analyzing genetic data from individuals with serious mental illness (SMI), including disorders like schizophrenia and bipolar disorder, using data from 89,320 participants across four different cohorts.
- The research involves harmonizing data from various clinical assessments to create standardized measures of mental health symptoms, which allows for more accurate genetic analyses across different diagnoses and symptoms.
- The findings show that schizophrenia and severe bipolar disorder are the most common diagnoses among participants, and a set of 19 key symptoms has been identified, which may be useful for cross-diagnosis genetic studies.

FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations.

Thomas Cheng Li Hufeng Zhou Vineet Verma Xiangru Tang Yanjun Shao

Bioinform Adv

September 2024

Motivation: Functional Annotation of genomic Variants Online Resources (FAVOR) offers multi-faceted, whole genome variant functional annotations, which are essential for Whole Genome and Exome Sequencing (WGS/WES) analysis and the functional prioritization of disease-associated variants. A versatile chatbot designed to facilitate informative interpretation and interactive, user-centric summary of the whole genome variant functional annotation data in the FAVOR database is needed.

Results: We have developed FAVOR-GPT, a generative natural language interface powered by integrating large language models (LLMs) and FAVOR.

A blended genome and exome sequencing method captures genetic variation in an unbiased, high-quality, and cost-effective manner.

Toni A Boltz Benjamin B Chu Calwing Liao Julia M Sealock Robert Ye

bioRxiv

September 2024

We deployed the Blended Genome Exome (BGE), a DNA library blending approach that generates low-pass whole genome (1-4× mean depth) and deep whole exome (30-40× mean depth) data in a single sequencing run. This technology is cost-effective, empowers most genomic discoveries possible with deep whole genome sequencing, and provides an unbiased method to capture the diversity of common SNP variation across the globe. To evaluate this new technology at scale, we applied BGE to sequence >53,000 samples from the Populations Underrepresented in Mental illness Association Studies (PUMAS) Project, which included participants across African, African American, and Latin American populations.

Semi-supervised machine learning method for predicting homogeneous ancestry groups to assess Hardy-Weinberg equilibrium in diverse whole-genome sequencing studies.

Derek Shyr Rounak Dey Xihao Li Hufeng Zhou Eric Boerwinkle

Am J Hum Genet

October 2024

Large-scale, multi-ethnic whole-genome sequencing (WGS) studies, such as the National Human Genome Research Institute Genome Sequencing Program's Centers for Common Disease Genomics (CCDG), play an important role in increasing diversity for genetic research. Before performing association analyses, assessing Hardy-Weinberg equilibrium (HWE) is a crucial step in quality control procedures to remove low quality variants and ensure valid downstream analyses. Diverse WGS studies contain ancestrally heterogeneous samples; however, commonly used HWE methods assume that the samples are homogeneous.
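To illustrate why the homogeneity assumption matters, the sketch below implements a textbook one-degree-of-freedom chi-square HWE test for a biallelic variant. It is not the semi-supervised method described in the article; the function name and genotype counts are hypothetical. Two groups that are each in HWE, but with different allele frequencies, show a strong apparent deviation once pooled.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for HWE at a biallelic site.

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: each group is in HWE on its own
    # (allele A frequency 0.9 in group 1 and 0.1 in group 2).
    group1 = (810, 180, 10)
    group2 = (10, 180, 810)
    pooled = tuple(a + b for a, b in zip(group1, group2))
    for label, counts in (("group 1", group1), ("group 2", group2), ("pooled", pooled)):
        chi2, p_value = hwe_chi_square(*counts)
        print(f"{label}: chi2={chi2:.2f}, p={p_value:.3g}")
```

In this toy example the pooled sample has far fewer heterozygotes than HWE predicts, so a variant of good quality could be filtered out if ancestry structure were ignored.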

Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases.

Kai Yuan Ryan J Longchamps Antonio F Pardiñas Mingrui Yu Tzu-Ting Chen

Nat Genet

September 2024

Genome-wide association studies (GWAS) of human complex traits or diseases often implicate genetic loci that span hundreds or thousands of genetic variants, many of which have similar statistical significance. While statistical fine-mapping in individuals of European ancestry has made important discoveries, cross-population fine-mapping has the potential to improve power and resolution by capitalizing on the genomic diversity across ancestries. Here we present SuSiEx, an accurate and computationally efficient method for cross-population fine-mapping.

Principled distillation of UK Biobank phenotype data reveals underlying structure in human variation.

Caitlin E Carey Rebecca Shafee Robbee Wedow Amanda Elliott Duncan S Palmer

Nat Hum Behav

August 2024

Data within biobanks capture broad yet detailed indices of human variation, but biobank-wide insights can be difficult to extract due to complexity and scale. Here, using large-scale factor analysis, we distill hundreds of variables (diagnoses, assessments and survey items) into 35 latent constructs, using data from unrelated individuals with predominantly estimated European genetic ancestry in UK Biobank. These factors recapitulate known disease classifications, disentangle elements of socioeconomic status, highlight the relevance of psychiatric constructs to health and improve measurement of pro-health behaviours.

Exome-wide evidence of compound heterozygous effects across common phenotypes in the UK Biobank.

Frederik H Lassen Samvida S Venkatesh Nikolas Baya Barney Hill Wei Zhou

Cell Genom

July 2024

The phenotypic impact of compound heterozygous (CH) variation has not been investigated at the population scale. We phased rare variants (MAF ∼0.001%) in the UK Biobank (UKBB) exome-sequencing data to characterize recessive effects in 175,587 individuals across 311 common diseases.

A unified framework for estimating country-specific cumulative incidence for 18 diseases stratified by polygenic risk.

Bradley Jermy Kristi Läll Brooke N Wolford Ying Wang Kristina Zguro

Nat Commun

June 2024

Polygenic scores (PGSs) offer the ability to predict genetic risk for complex diseases across the life course, a key benefit over short-term prediction models. To produce risk estimates relevant to clinical and public health decision-making, it is important to account for varying effects due to age and sex. Here, we develop a novel framework to produce country-, age-, and sex-specific estimates of cumulative incidence stratified by PGS for 18 high-burden diseases.

Author Correction: Nuclear genetic control of mtDNA copy number and heteroplasmy in humans.

Rahul Gupta Masahiro Kanai Timothy J Durham Kristin Tsuo Jason G McCoy

Nature

June 2024

Efficient and accurate mixed model association tool for single-cell eQTL analysis.

Wei Zhou Anna S E Cuomo Angli Xue Masahiro Kanai Grant Chau

medRxiv

May 2024

Understanding the genetic basis of gene expression can help us understand the molecular underpinnings of human traits and disease. Expression quantitative trait locus (eQTL) mapping can help in studying this relationship, but eQTLs have been shown to be very cell-type specific, motivating the use of single-cell RNA sequencing and single-cell eQTLs to obtain a more granular view of genetic regulation. Current methods for single-cell eQTL mapping either rely on the "pseudobulk" approach and traditional pipelines for bulk transcriptomics or do not scale well to large datasets.

GUIDE deconstructs genetic architectures using association studies.

Daniel Lazarev Grant Chau Alex Bloemendal Claire Churchhouse Benjamin M Neale

bioRxiv

May 2024

Genome-wide association studies have revealed that the genetic architecture of most complex traits is characterized by a large number of distinct effects scattered across the genome. Functional enrichment analyses of these results suggest that the associations for any given complex trait are not purely random. Thus, we set out to leverage the genetic association results from many traits with a view to identifying the set of modules, or latent factors, that mediate these associations.

A harmonized public resource of deeply sequenced diverse human genomes.

Zan Koenig Mary T Yohannes Lethukuthula L Nkambule Xuefang Zhao Julia K Goodrich

Genome Res

June 2024

Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs.

Genome-wide association study identifies 30 obsessive-compulsive disorder associated loci.

Nora I Strom Zachary F Gerring Marco Galimberti Dongmei Yu Matthew W Halvorsen

medRxiv

March 2024

Article Synopsis

- Obsessive-compulsive disorder (OCD) affects about 1% of people and has a strong genetic component, but previous studies have not fully explained its genetic causes or biological mechanisms.
- A large genome-wide association study (GWAS) analyzed data from over 53,000 OCD cases and over 2 million control participants, identifying 30 significant genetic markers related to OCD and suggesting a 6.7% heritability from SNPs.
- The research also found 249 candidate risk genes linked to OCD, particularly in specific brain regions, and showed genetic correlations with various psychiatric disorders, laying the groundwork for further studies and potential treatments.

The landscape of regional missense mutational intolerance quantified from 125,748 exomes.

Katherine R Chao Lily Wang Ruchit Panchal Calwing Liao Haneen Abderrazzaq

bioRxiv

May 2024

Missense variants can have a range of functional impacts depending on factors such as the specific amino acid substitution and location within the gene. To interpret their deleteriousness, studies have sought to identify regions within genes that are specifically intolerant of missense variation. Here, we leverage the patterns of rare missense variation in 125,748 individuals in the Genome Aggregation Database (gnomAD) against a null mutational model to identify transcripts that display regional differences in missense constraint.

Blended Genome Exome (BGE) as a Cost Efficient Alternative to Deep Whole Genomes or Arrays.

Matthew DeFelice Jonna L Grimsby Daniel Howrigan Kai Yuan Sinéad B Chapman

bioRxiv

July 2024

Genomic scientists have long been promised cheaper DNA sequencing, but deep whole genomes are still costly, especially when considered for large cohorts in population-level studies. More affordable options include microarrays + imputation, whole exome sequencing (WES), or low-pass whole genome sequencing (WGS) + imputation. WES + array + imputation has recently been shown to yield 99% of association signals detected by WGS.

The Scalable Variant Call Representation: Enabling Genetic Analysis Beyond One Million Genomes.

Timothy Poterba Christopher Vittal Daniel King Daniel Goldstein Jacqueline I Goldstein

bioRxiv

January 2024

The Variant Call Format (VCF) is widely used in genome sequencing but scales poorly. For instance, we estimate a 150,000 genome VCF would occupy 900 TiB, making it both costly and complicated to produce and analyze. The issue stems from VCF's requirement to densely represent both reference-genotypes and allele-indexed arrays.

Author Correction: A genomic mutational constraint map using variation in 76,156 human genomes.

Siwei Chen Laurent C Francioli Julia K Goodrich Ryan L Collins Masahiro Kanai

Nature

February 2024

A genomic mutational constraint map using variation in 76,156 human genomes.

Siwei Chen Laurent C Francioli Julia K Goodrich Ryan L Collins Masahiro Kanai

Nature

January 2024

Article Synopsis

- The study focuses on understanding how purifying natural selection affects variations in non-coding regions of the human genome, alongside existing knowledge of protein-coding genes responsible for human disorders.
- Researchers created a comprehensive constraint map, named Gnocchi, using data from 76,156 human genomes to analyze genomic variations, with a refined model that factors in local sequences and features to identify areas with less variation.
- Findings indicate that while protein-coding regions show stronger constraint, certain non-coding regions related to regulatory elements are also important, suggesting that analyzing non-coding DNA can help uncover previously unidentified constrained genes linked to diseases.

Inferring compound heterozygosity from large-scale exome sequencing data.

Michael H Guo Laurent C Francioli Sarah L Stenton Julia K Goodrich Nicholas A Watts

Nat Genet

January 2024

Recessive diseases arise when both copies of a gene are impacted by a damaging genetic variant. When a patient carries two potentially causal variants in a gene, accurate diagnosis requires determining that these variants occur on different copies of the chromosome (that is, are in trans) rather than on the same copy (that is, in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings.

Improving fine-mapping by modeling infinitesimal effects.

Ran Cui Roy A Elzur Masahiro Kanai Jacob C Ulirsch Omer Weissbrod

Nat Genet

January 2024

Fine-mapping aims to identify causal genetic variants for phenotypes. Bayesian fine-mapping algorithms (for example, SuSiE, FINEMAP, ABF and COJO-ABF) are widely used, but assessing posterior probability calibration remains challenging in real data, where model misspecification probably exists, and true causal variants are unknown. We introduce replication failure rate (RFR), a metric to assess fine-mapping consistency by downsampling.

CHARR efficiently estimates contamination from DNA sequencing data.

Wenhan Lu Laura D Gauthier Timothy Poterba Edoardo Giacopuzzi Julia K Goodrich

Am J Hum Genet

December 2023

DNA sample contamination is a major issue in clinical and research applications of whole-genome and -exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely.

Decoding Genetics, Ancestry, and Geospatial Context for Precision Health.

Satoshi Koyama Ying Wang Kaavya Paruchuri Md Mesbah Uddin So Mi J Cho

medRxiv

October 2023

Mass General Brigham, an integrated healthcare system based in the Greater Boston area of Massachusetts, annually serves 1.5 million patients. We established the Mass General Brigham Biobank (MGBB), encompassing 142,238 participants, to unravel the intricate relationships among genomic profiles, environmental context, and disease manifestations within clinical practice.

Bayesian multivariate genetic analysis improves translational insights.

Sarah M Urbut Satoshi Koyama Whitney Hornsby Rohan Bhukar Sumeet Kheterpal

iScience

October 2023

While lipid traits are known to be essential mediators of cardiovascular disease, few approaches have taken advantage of their shared genetic effects. We apply a Bayesian multivariate size estimator, mash, to GWAS of four lipid traits in the Million Veteran Program (MVP) and provide posterior mean and local false sign rates for all effects. These estimates borrow information across traits to improve effect size accuracy.

An Ethical Framework for Research Using Genetic Ancestry.

Anna C F Lewis Santiago J Molina Paul S Appelbaum Bege Dauda Agustin Fuentes

Perspect Biol Med

January 2023

A wide range of research uses patterns of genetic variation to infer genetic similarity between individuals, typically referred to as genetic ancestry. This research includes inference of human demographic history, understanding the genetic architecture of traits, and predicting disease risk. Researchers are not just structuring an intellectual inquiry when using genetic ancestry; they are also creating analytical frameworks with broader societal ramifications.
