Haplotype phasing, the process of determining which genetic variants are physically located on the same chromosome, is crucial for various genetic analyses. In this study, we first benchmark SHAPEIT and Beagle, two state-of-the-art phasing methods, on two large datasets: >8 million diverse, research-consented 23andMe, Inc. customers and the UK Biobank (UKB). We find that both perform exceptionally well. Beagle's median switch error rate (SER) (after excluding single-SNP switches) in white British trios from UKB is 0.026%, compared to 0.00% for European ancestry 23andMe research participants; 55.6% of European ancestry 23andMe research participants have zero non-single-SNP switches, compared to 42.4% of white British trios. South Asian ancestry 23andMe research participants have the highest median SER amongst the 23andMe populations, but it is still remarkably low at 0.46%. We also investigate the relationship between identity-by-descent (IBD) and SER, finding that switch errors tend to occur in regions of little or no IBD segment coverage. SHAPEIT and Beagle excel at 'intra-chromosomal' phasing, but lack the ability to phase across chromosomes, motivating us to develop an inter-chromosomal phasing method, called HAPTIC (HAPlotype TIling and Clustering), that assigns paternal and maternal variants discretely genome-wide. Our approach uses IBD segments to phase blocks of variants on different chromosomes. HAPTIC represents the segments a focal individual shares with their relatives as nodes in a signed graph and performs bipartite clustering on the signed graph using spectral clustering. We test HAPTIC on 1022 UKB trios, yielding a median phase error of 0.08% in regions covered by IBD segments (33.5% of sites). We also ran HAPTIC in the 23andMe database and found a median phase error rate (the rate of mismatching alleles between the inferred and true phase) of 0.92% in Europeans (93.8% of sites) and 0.09% in admixed Africans (92.7% of sites). HAPTIC's precision depends heavily on data from relatives, so it will increase as datasets grow larger and more diverse. HAPTIC enables analyses that require the parent-of-origin of variants, such as association studies and ancestry inference of untyped parents.
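
The abstract does not say which spectral formulation HAPTIC uses for bipartite clustering of the signed graph. The following is only a minimal sketch of one common choice, the signed Laplacian, under these assumptions: positive edge weights mean two IBD segments appear to come from the same parent, negative weights mean opposite parents, and the graph is connected. The function name `signed_spectral_bipartition` and the toy weights are hypothetical and are not taken from the paper.

```python
import numpy as np


def signed_spectral_bipartition(adjacency):
    """Split the nodes of a connected signed graph into two groups.

    adjacency: symmetric (n, n) array. Positive entries are read as
    "same parent", negative entries as "opposite parents", zero as
    "no evidence" (an assumed encoding, not the paper's).
    Returns 0/1 labels; which label is paternal or maternal is arbitrary.
    """
    a = np.asarray(adjacency, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or not np.allclose(a, a.T):
        raise ValueError("adjacency must be a square symmetric matrix")

    # Signed Laplacian: degrees are sums of absolute edge weights.
    degree = np.abs(a).sum(axis=1)
    laplacian = np.diag(degree) - a

    # eigh returns eigenvalues in ascending order, so column 0 is the
    # eigenvector of the smallest eigenvalue; its signs give the split.
    _, eigenvectors = np.linalg.eigh(laplacian)
    labels = (eigenvectors[:, 0] >= 0).astype(int)

    # Eigenvector sign is arbitrary, so pin node 0 to label 0.
    if labels[0] == 1:
        labels = 1 - labels
    return labels


# Toy example with five hypothetical segments: 0, 1, 2 on one parental
# side and 3, 4 on the other.
toy = np.zeros((5, 5))
for i, j, w in [(0, 1, 1.0), (1, 2, 1.0), (3, 4, 1.0), (0, 3, -1.0), (2, 4, -1.0)]:
    toy[i, j] = toy[j, i] = w

print(signed_spectral_bipartition(toy).tolist())  # [0, 0, 0, 1, 1]
```

In real data the graph may be disconnected or contain conflicting edges, in which case each connected component would need its own split and the two groups cannot be linked across components without further evidence.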


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11100733
DOI: http://dx.doi.org/10.1101/2024.05.06.592816
