The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance via mean imputed r² at untyped sites using leave-one-out internal validation and standard imputation methods, rather than pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome for improved imputation accuracy and demonstrates population-specific biases from pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts with single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5-3.1% for an array of one million sites and 0.7-7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
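The evaluation metric described in the abstract, imputation accuracy averaged over untyped sites, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `squared_pearson` and `mean_imputed_r2`, the `max_maf` parameter, and the toy genotypes are all assumptions made for illustration. The sketch assumes accuracy at a site is the squared Pearson correlation between the true genotypes (0, 1, or 2 copies of an allele) and the imputed dosages, and that sites can optionally be restricted to a minor allele frequency bin such as rare variants.

```python
from statistics import mean


def squared_pearson(x, y):
    """Squared Pearson correlation of two equal-length sequences.

    Returns None when either sequence has zero variance, because the
    correlation is undefined there (e.g. a monomorphic site).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def mean_imputed_r2(true_genotypes, imputed_dosages, max_maf=None):
    """Average per-site r^2 over untyped sites.

    Both arguments are lists of sites; each site is a list with one value
    per sample. `max_maf` (hypothetical parameter) keeps only sites whose
    minor allele frequency is strictly below the threshold.
    """
    scores = []
    for truth, dosage in zip(true_genotypes, imputed_dosages):
        allele_freq = sum(truth) / (2 * len(truth))
        maf = min(allele_freq, 1 - allele_freq)
        if max_maf is not None and maf >= max_maf:
            continue
        r2 = squared_pearson(truth, dosage)
        if r2 is not None:
            scores.append(r2)
    return mean(scores) if scores else float("nan")


# Toy data (invented): two untyped sites, four samples each.
truth = [[0, 1, 2, 0], [0, 0, 1, 0]]
dosage = [[0.1, 0.9, 1.8, 0.0], [0.0, 0.0, 1.0, 0.0]]

overall = mean_imputed_r2(truth, dosage)
low_freq_only = mean_imputed_r2(truth, dosage, max_maf=0.2)
```

In a leave-one-out setting, the imputed dosages for each sample would come from imputing that sample against a reference panel that excludes it; the sketch only covers the scoring step that follows.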

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6169386
DOI: http://dx.doi.org/10.1534/g3.118.200502

Publication Analysis

Top Keywords

rare variant: 12
variant association: 12
tag snps: 12
tag snp: 8
imputation methods: 8
array design: 8
imputation accuracy: 8
association: 5
populations: 5
imputation-aware tag: 4

Similar Publications

Exome sequencing reveals a rare damaging variant in GRIN2C in familial late-onset Alzheimer's disease.

Alzheimers Res Ther

January 2025

Department of Neuroscience "Rita Levi Montalcini", University of Turin, Via Cherasco 15, Turin, 10126, Italy.

Background: Alzheimer's disease (AD) is a progressive neurodegenerative disorder with both genetic and environmental factors contributing to its pathogenesis. While early-onset AD has well-established genetic determinants, the genetic basis for late-onset AD remains less clear. This study investigates a large Italian family with late-onset autosomal dominant AD, identifying a novel rare missense variant in the GRIN2C gene associated with the disease, and evaluates the functional impact of this variant.

Oligogenic effect is associated with the clinical heterogeneity of autosomal dominant deafness-15.

Sci Rep

January 2025

Center for Medical Genetics, Hunan Key Laboratory of Medical Genetics, MOE Key Lab of Rare Pediatric Diseases, School of Life Sciences, Central South University, Changsha, 410000, Hunan, China.

Autosomal dominant deafness-15, which is caused by mutation in the POU4F3 gene, has been reported with a wide degree of clinical heterogeneity, even among members of the same family. However, the reason is still elusive. In this study, a four-generation Chinese family with 11 patients manifesting late-onset progressive non-syndromic hearing loss was recruited.

In monogenic diseases, double mosaic variants of the same gene have rarely been identified. Here, we report the case of triple mosaic variants in PURA, a gene responsible for a neurodevelopmental syndrome (OMIM# 616158). Whole-exome sequencing identified three somatic PURA variants in our case with a similar neurodevelopmental syndrome: NM_005859.

Investigating the genetic factors influencing human birth weight may lead to biological insights into fetal growth and long-term health. We report analyses of rare variants that impact birth weight when carried by either fetus or mother, using whole exome sequencing data in up to 234,675 participants. Rare protein-truncating and deleterious missense variants are collapsed to perform gene burden tests.

HiFi long-read genomes for difficult-to-detect, clinically relevant variants.

Am J Hum Genet

January 2025

Department of Human Genetics, Radboud University Medical Center, Nijmegen, the Netherlands; Radboudumc Research Institute for Medical Innovation, Radboud University Medical Center, Nijmegen, the Netherlands.

Clinical short-read exome and genome sequencing approaches have positively impacted diagnostic testing for rare diseases. Yet, technical limitations associated with short reads challenge their use for the detection of disease-associated variation in complex regions of the genome. Long-read sequencing (LRS) technologies may overcome these challenges, potentially qualifying as a first-tier test for all rare diseases.
