Large-scale human genetics studies are ascertaining increasing proportions of populations as they grow in both number and scale. As a result, the amount of cryptic relatedness within these study cohorts is growing rapidly and has significant implications for downstream analyses. We demonstrate this growth empirically among the first 92,455 exomes from the DiscovEHR cohort and, via a custom simulation framework we developed called SimProgeny, show that these measures are in line with expectations given the underlying population and ascertainment approach. For example, within DiscovEHR we identified ∼66,000 close (first- and second-degree) relationships, involving 55.6% of study participants. Our simulation results project that >70% of the cohort will be involved in these close relationships if DiscovEHR scales to 250,000 recruited individuals. We reconstructed 12,574 pedigrees by using these relationships (including 2,192 nuclear families) and leveraged them for multiple applications. The pedigrees substantially improved the phasing accuracy of 20,947 rare, deleterious compound heterozygous mutations. Reconstructed nuclear families were critical for identifying 3,415 de novo mutations in ∼1,783 genes. Finally, we demonstrate the segregation of known and suspected disease-causing mutations, including a tandem duplication that occurs in LDLR and causes familial hypercholesterolemia, through reconstructed pedigrees. In summary, this work highlights the prevalence of cryptic relatedness expected among large healthcare population-genomic studies and demonstrates several analyses that are uniquely enabled by large amounts of cryptic relatedness.
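
To make the two headline measures concrete (the share of participants involved in close relationships, and the grouping of those relationships into family networks), here is a minimal sketch in Python. It is not the authors' pipeline and does not reproduce SimProgeny or their pedigree reconstruction. The function name `summarize_relatedness`, the participant IDs, and the toy pair list are all hypothetical, and the sketch assumes the close (first- and second-degree) pairs have already been inferred by some upstream tool.

```python
from collections import defaultdict


def summarize_relatedness(participants, close_pairs):
    """Summarize close relatedness in a cohort (illustrative only).

    participants: iterable of participant IDs (hypothetical labels).
    close_pairs:  iterable of (id_a, id_b) tuples, assumed to be
                  first- or second-degree pairs inferred upstream.
    Returns (fraction of participants in at least one close pair,
             list of family networks as sets, largest first).
    """
    participants = list(participants)
    parent = {p: p for p in participants}  # union-find forest

    def find(x):
        # Find the root of x, halving the path as we walk up.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    involved = set()
    for a, b in close_pairs:
        involved.update((a, b))
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two networks

    # Group every involved participant under its network root.
    groups = defaultdict(set)
    for p in involved:
        groups[find(p)].add(p)

    networks = sorted(groups.values(), key=len, reverse=True)
    return len(involved) / len(participants), networks


# Toy cohort: six participants, three close pairs (made-up data).
participants = ["P1", "P2", "P3", "P4", "P5", "P6"]
close_pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
fraction, networks = summarize_relatedness(participants, close_pairs)
print(f"{fraction:.1%} of participants have a close relative")
print([sorted(n) for n in networks])
```

In this toy cohort five of six participants fall into two networks. A connected component is only a starting point: turning a network into an actual pedigree additionally requires resolving the direction and type of each relationship, which this sketch does not attempt.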

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5986700
DOI: http://dx.doi.org/10.1016/j.ajhg.2018.03.012

Similar Publications

is a One Health pathogen found in humans, animals, and the environment, with food representing a potential transmission route. One Health studies are often limited to a single country or selected reservoirs and ribotypes. This study provides a varied and accessible collection of isolates and sequencing data derived from human, animal, and food sources across 13 European countries.

While traits that contribute to premating sexual interactions are known to be wildly diverse, much less is known about the diversity of postmating (especially female) reproductive traits and the mechanisms shaping this diversity. To assess the rate, pattern, and potential drivers of postmating reproductive trait evolution, we analyzed male and female traits across up to 30 species within a phylogenetic comparative framework. In addition to postmating reproductive morphology (e.

Using eDNA to Supplement Population Genetic Analyses for Cryptic Marine Species: Identifying Population Boundaries for Alaska Harbour Porpoises.

Mol Ecol

October 2024

Marine Mammal Genetics Program, Southwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, La Jolla, California, USA.

Article Synopsis
  • Isolation by distance and geographic boundaries have shaped the population genetic structure of harbour porpoises along the Pacific coast, with a focus on Alaska where recent research used both tissue and environmental DNA (eDNA) samples to fill gaps in previous studies.
  • The study found limited genetic differentiation among harbour porpoise populations based on nuclear SNP data, but mtDNA analysis revealed significant structuring, especially between the Gulf of Alaska and the eastern Bering Sea, suggesting restricted gene flow and potential natal site fidelity.
  • The targeted eDNA sampling in Southeast Alaska significantly enhanced the genetic dataset, indicating a population boundary within the recognized Southeast Alaska Stock, which is vital for informing conservation efforts and mitigating fisheries conflicts.

MethylGenotyper: Accurate Estimation of SNP Genotypes and Genetic Relatedness from DNA Methylation Data.

Genomics Proteomics Bioinformatics

September 2024

Ministry of Education Key Laboratory of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China.

Article Synopsis
  • Epigenome-wide association studies (EWAS) can be affected by confounding factors like population structure and genetic relatedness, making kinship estimation difficult without genotyping data.
  • The authors introduced a new method called MethylGenotyper, which accurately determines genotypes at thousands of SNPs using DNA methylation microarray data.
  • In tests with data from Chinese and Australian samples, MethylGenotyper achieved high accuracy and allowed for the identification of genetic relationships, and is available as an R package for broader use in future studies.

Linear mixed models (LMMs) have been widely used in genome-wide association studies to control for population stratification and cryptic relatedness. However, estimating LMM parameters is computationally expensive, necessitating large-scale matrix operations to build the genetic relationship matrix (GRM). Over the past 25 years, Randomized Linear Algebra has provided alternative approaches to such matrix operations by leveraging , which often results in provably accurate fast and efficient approximations.
