Haplotype phasing, the process of determining which genetic variants are physically located on the same chromosome, is crucial for many genetic analyses. In this study, we first benchmark SHAPEIT and Beagle, two state-of-the-art phasing methods, on two large datasets: more than 8 million diverse, research-consented 23andMe, Inc. customers and the UK Biobank (UKB). We find that both perform exceptionally well. After excluding single-SNP switches, Beagle's median switch error rate (SER) is 0.026% in white British trios from UKB, compared with 0.00% in European-ancestry 23andMe research participants; 55.6% of European-ancestry 23andMe research participants have zero non-single-SNP switches, compared with 42.4% of white British trios. South Asian-ancestry 23andMe research participants have the highest median SER among the 23andMe populations, but it is still remarkably low at 0.46%. We also investigate the relationship between identity-by-descent (IBD) and SER, finding that switch errors tend to occur in regions with little or no IBD segment coverage.

SHAPEIT and Beagle excel at 'intra-chromosomal' phasing but cannot phase across chromosomes. This motivated us to develop an inter-chromosomal phasing method, HAPTIC (HAPlotype TIling and Clustering), that discretely assigns variants to the paternal or maternal haplotype genome-wide. Our approach uses IBD segments to phase blocks of variants on different chromosomes: HAPTIC represents the segments a focal individual shares with their relatives as nodes in a signed graph and performs bipartite clustering on that graph using spectral clustering. We test HAPTIC on 1022 UKB trios, obtaining a median phase error rate (the rate of mismatching alleles between the inferred and true phase) of 0.08% in regions covered by IBD segments (33.5% of sites). We also ran HAPTIC on the 23andMe database and found a median phase error rate of 0.92% in Europeans (93.8% of sites) and 0.09% in admixed Africans (92.7% of sites).
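The two evaluation metrics quoted above, the switch error rate and the phase error rate, can be made concrete with a short Python sketch. It assumes each haplotype is given as the allele (0 or 1) carried on one copy at every heterozygous site, in chromosomal order, and it treats a "single-SNP switch" as an interior site whose orientation disagrees with both neighbours (two back-to-back switches around one SNP). The abstract does not spell out either definition at this level of detail, so the function names and the exclusion rule are illustrative assumptions, not the procedure used in the study.

```python
from __future__ import annotations

from typing import Optional, Sequence


def orientations(true_hap: Sequence[int], inferred_hap: Sequence[int]) -> list[int]:
    """Return 1 where the inferred haplotype carries the true allele, else 0.

    Both inputs give the allele (0 or 1) on one haplotype at each
    heterozygous site, in chromosomal order.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    return [int(t == i) for t, i in zip(true_hap, inferred_hap)]


def switch_error_rate(
    true_hap: Sequence[int],
    inferred_hap: Sequence[int],
    exclude_single_snp: bool = True,
) -> float:
    """Switches between consecutive sites divided by the number of site pairs."""
    o = orientations(true_hap, inferred_hap)
    if exclude_single_snp and len(o) >= 3:
        # Assumed rule: an interior site whose orientation disagrees with both
        # neighbours is a single-SNP switch (two back-to-back switches); drop it.
        flips = {k for k in range(1, len(o) - 1) if o[k - 1] == o[k + 1] != o[k]}
        o = [v for k, v in enumerate(o) if k not in flips]
    if len(o) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(o, o[1:]))
    return switches / (len(o) - 1)


def phase_error_rate(
    true_paternal: Sequence[int],
    inferred_paternal: Sequence[Optional[int]],
) -> tuple[float, float]:
    """Return (mismatch rate over phased sites, fraction of sites phased).

    inferred_paternal holds None at sites the method left unphased, for
    example sites not covered by any IBD segment.
    """
    scored = [(t, i) for t, i in zip(true_paternal, inferred_paternal) if i is not None]
    if not scored:
        return 0.0, 0.0
    errors = sum(t != i for t, i in scored)
    return errors / len(scored), len(scored) / len(true_paternal)


# Hypothetical toy data: one isolated flip at site 2.
truth = [0, 1, 1, 0, 1, 0, 0, 1]
one_flip = [0, 1, 0, 0, 1, 0, 0, 1]
print(switch_error_rate(truth, one_flip))         # 0.0
print(switch_error_rate(truth, one_flip, False))  # 2 switches / 7 pairs
```

Because switches are counted between neighbouring sites, a globally swapped haplotype scores zero switches. The phase error rate, by contrast, compares labels directly, so this sketch assumes the inferred haplotype is already named paternal; unphased sites (None) are excluded from the error rate and reported separately as coverage, mirroring the "% of sites" figures above.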
HAPTIC's precision depends heavily on data from relatives, so it will increase as datasets grow larger and more diverse. HAPTIC enables analyses that require the parent of origin of variants, such as association studies and ancestry inference of untyped parents.
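The signed-graph clustering step in the HAPTIC description can be illustrated with a minimal NumPy sketch. It assumes the signed graph is already built: each node is an IBD segment, a positive edge weight means two segments are believed to lie on the same parental haplotype, and a negative weight means they lie on opposite haplotypes. How HAPTIC derives these edges, and which spectral variant it uses, is not stated in the abstract; this sketch swaps in the signed Laplacian L = D_abs - A (D_abs is the diagonal matrix of absolute-value degrees), a standard choice for two-way spectral clustering of signed graphs, and splits nodes by the sign of the eigenvector for its smallest eigenvalue. The function names and edge encoding are hypothetical.

```python
import numpy as np


def signed_laplacian(n_nodes, edges):
    """Build L = D_abs - A from (i, j, w) edges with i != j.

    Hypothetical encoding: w > 0 means "same parental haplotype",
    w < 0 means "opposite parental haplotypes".
    """
    a = np.zeros((n_nodes, n_nodes))
    for i, j, w in edges:
        a[i, j] += w
        a[j, i] += w
    d_abs = np.diag(np.abs(a).sum(axis=1))
    return d_abs - a


def bipartition(n_nodes, edges):
    """Split nodes into two groups by the sign of the lowest eigenvector."""
    lap = signed_laplacian(n_nodes, edges)
    eigenvalues, eigenvectors = np.linalg.eigh(lap)  # eigenvalues in ascending order
    v = eigenvectors[:, 0]
    labels = (v < 0).astype(int)
    # The eigenvector's overall sign is arbitrary, so fix node 0 to group 0.
    if labels[0] == 1:
        labels = 1 - labels
    return labels, eigenvalues[0]


# Hypothetical toy graph: segments 0 and 1 on one haplotype, 2 and 3 on the other.
edges = [(0, 1, 1.0), (2, 3, 1.0), (0, 2, -1.0), (1, 3, -1.0)]
labels, smallest = bipartition(4, edges)
print(labels.tolist())  # [0, 0, 1, 1]
```

For a connected graph whose signs are fully consistent (balanced), the smallest eigenvalue is 0 and the sign pattern recovers the two groups exactly; conflicting edges push it above 0, and the sign split then gives an approximate partition. The sketch assumes a connected graph (separate components would need to be clustered separately), and the two groups it returns are unlabeled: deciding which one is paternal needs information this example does not model.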
| Link | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11100733 | PMC |
| http://dx.doi.org/10.1101/2024.05.06.592816 | DOI Listing |
Recent advancements in Parkinson's disease (PD) drug development have been significantly driven by genetic research. Importantly, drugs supported by genetic evidence are more likely to be approved. While genome-wide association studies (GWAS) are a powerful tool for nominating genomic regions associated with certain traits or diseases, pinpointing the causal, biologically relevant gene is often challenging.
Genet Med Open
July 2024
23andMe, Inc, Sunnyvale, CA.
Purpose: Carrier screening identifies reproductive risk for autosomal recessive and X-linked genetic conditions. Currently, some medical society guidelines continue to recommend ethnicity-based carrier screening for conditions associated with Ashkenazi Jewish (AJ) ancestry. We assessed the utility and limitations of these guidelines in a large, ethnically and genetically diverse cohort of genotyped individuals.
Genome Med
December 2024
Human Genomics and Evolution, St Vincent's Institute of Medical Research, Fitzroy, 3065, Australia.
Background: Multiplexed Assays of Variant Effects (MAVEs) can test all possible single variants in a gene of interest. The resulting saturation-style functional data may help resolve variant classification disparities between populations, especially for Variants of Uncertain Significance (VUS).
Methods: We analyzed clinical significance classifications in 213,663 individuals of European-like genetic ancestry versus 206,975 individuals of non-European-like genetic ancestry from All of Us and the Genome Aggregation Database.
Front Cardiovasc Med
November 2024
The Second Affiliated Hospital of Heilongjiang University of Chinese Medicine, Harbin, Heilongjiang, China.
Objective: This study aimed to investigate the causal relationship between insomnia and the risk of myocardial infarction (MI) and to explore potential mediators, such as smoking initiation, alcohol consumption, and body mass index (BMI), using Mendelian randomization (MR) analysis.
Methods: Data from 1,207,228 individuals of European ancestry were obtained from the UK Biobank and 23andMe for insomnia-related genetic associations. Genetic instruments for MI, smoking initiation, alcohol consumption, and BMI were derived from large-scale genome-wide association studies.
Nat Commun
November 2024
Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Berlin, Germany.