An Amendment to this paper has been published and can be accessed via a link at the top of the paper.
Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits.
can be mutated in individuals diagnosed with unicentric and idiopathic multicentric Castleman disease. Defective lymphocyte apoptosis may be a pathological mechanism shared between Castleman disease and autoimmune lymphoproliferative syndrome.
The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues.
The contribution of genetic variants to sporadic amyotrophic lateral sclerosis (ALS) remains largely unknown. Either recessive or de novo variants could result in an apparently sporadic occurrence of ALS. In an attempt to find such variants we sequenced the exomes of 44 ALS-unaffected-parents trios.
Although genome-wide association studies (GWASs) for nonsyndromic orofacial clefts have identified multiple strongly associated regions, the causal variants are unknown. To address this, we selected 13 regions from GWASs and other studies, performed targeted sequencing in 1,409 Asian and European trios, and carried out a series of statistical and functional analyses. Within a cluster of strongly associated common variants near NOG, we found that one, rs227727, disrupts enhancer activity.
The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.
A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error.
Germline variation at immunoglobulin (IG) loci is critical for pathogen-mediated immunity, but establishing complete haplotype sequences in these regions has been problematic because of complex sequence architecture and diploid source DNA. We sequenced BAC clones from the effectively haploid human hydatidiform mole cell line, CHM1htert, across the light chain IG loci, kappa (IGK) and lambda (IGL), creating single haplotype representations of these regions. The IGL haplotype generated here is 1.
Recurrent deletions of chromosome 15q13.3 associate with intellectual disability, schizophrenia, autism and epilepsy. To gain insight into the instability of this region, we sequenced it in affected individuals, normal individuals and nonhuman primates.
Exome sequencing in families affected by rare genetic disorders has the potential to rapidly identify new disease genes (genes in which mutations cause disease), but the identification of a single causal mutation among thousands of variants remains a significant challenge. We developed a scoring algorithm to prioritize potential causal variants within a family according to segregation with the phenotype, population frequency, predicted effect, and gene expression in the tissue(s) of interest. To narrow the search space in families with multiple affected individuals, we also developed two complementary approaches to exome-based mapping of autosomal-dominant disorders.
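As a rough illustration of the idea of ranking variants on the four criteria named in this abstract, the sketch below combines segregation, population frequency, predicted effect, and tissue expression into a single score. It is not the published algorithm: the `Variant` fields, the `EFFECT_WEIGHT` table, the 1% frequency cutoff, and the equal weighting of the four terms are all hypothetical choices made only for this example.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    segregates: bool           # carried by affected relatives, absent from unaffected ones
    population_freq: float     # allele frequency in a reference population (0..1)
    predicted_effect: str      # e.g. "nonsense", "missense", "synonymous"
    expressed_in_tissue: bool  # gene is expressed in the tissue(s) of interest


# Hypothetical weights for predicted effect; unknown categories score 0.
EFFECT_WEIGHT = {"nonsense": 1.0, "missense": 0.5, "synonymous": 0.0}

# Hypothetical cutoff: variants at or above 1% frequency get no rarity credit.
FREQ_CUTOFF = 0.01


def score(v: Variant) -> float:
    """Sum four equally weighted terms, each in the range 0..1."""
    segregation = 1.0 if v.segregates else 0.0
    rarity = 1.0 - min(v.population_freq / FREQ_CUTOFF, 1.0)
    effect = EFFECT_WEIGHT.get(v.predicted_effect, 0.0)
    expression = 1.0 if v.expressed_in_tissue else 0.0
    return segregation + rarity + effect + expression


def prioritize(variants: list[Variant]) -> list[Variant]:
    """Return variants ordered from highest to lowest score."""
    return sorted(variants, key=score, reverse=True)


if __name__ == "__main__":
    candidates = [
        Variant("common_silent", False, 0.20, "synonymous", False),
        Variant("rare_missense", True, 0.005, "missense", True),
        Variant("novel_nonsense", True, 0.0, "nonsense", True),
    ]
    for v in prioritize(candidates):
        print(v.name, round(score(v), 2))
```

A real prioritization scheme would need weights calibrated against known disease variants; the additive form here is only the simplest way to show how the four lines of evidence can be combined into one ranking.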
Genomics is a relatively new scientific discipline, having DNA sequencing as its core technology. As technology has improved the cost and scale of genome characterization over sequencing's 40-year history, the scope of inquiry has commensurately broadened. Massively parallel sequencing has proven revolutionary, shifting the paradigm of genomics to address biological questions at a genome-wide scale.
The immunoglobulin heavy-chain locus (IGH) encodes variable (IGHV), diversity (IGHD), joining (IGHJ), and constant (IGHC) genes and is responsible for antibody heavy-chain biosynthesis, which is vital to the adaptive immune response. Programmed V-(D)-J somatic rearrangement and the complex duplicated nature of the locus have impeded attempts to reconcile its genomic organization based on traditional B-lymphocyte derived genetic material. As a result, sequence descriptions of germline variation within IGHV are lacking, haplotype inference using traditional linkage disequilibrium methods has been difficult, and the human genome reference assembly is missing several expressed IGHV genes.
Reduced FCGR3B copy number is associated with increased risk of systemic lupus erythematosus (SLE). The five FCGR2/FCGR3 genes are arranged across two highly paralogous genomic segments on chromosome 1q23. Previous studies have suggested mechanisms for structural rearrangements at the FCGR2/FCGR3 locus and have proposed mechanisms whereby altered FCGR3B copy number predisposes to autoimmunity, but the high degree of sequence similarity between paralogous segments has prevented precise definition of the molecular events and their functional consequences.
Background: Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% of ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing.
The 17q21.31 inversion polymorphism exists either as direct (H1) or inverted (H2) haplotypes with differential predispositions to disease and selection. We investigated its genetic diversity in 2,700 individuals, with an emphasis on African populations.
Recurrent deletions have been associated with numerous diseases and genomic disorders. Few, however, have been resolved at the molecular level because their breakpoints often occur in highly copy-number-polymorphic duplicated sequences. We present an approach that uses a combination of somatic cell hybrids, array comparative genomic hybridization, and the specificity of next-generation sequencing to determine breakpoints that occur within segmental duplications.
Background: The enormous throughput and low cost of second-generation sequencing platforms now allow research and clinical geneticists to routinely perform single experiments that identify tens of thousands to millions of variant sites. Existing methods to annotate variant sites using information from publicly available databases via web browsers are too slow to be useful for the large sequencing datasets being routinely generated by geneticists. Because sequence annotation of variant sites is required before functional characterization can proceed, the lack of a high-throughput pipeline to efficiently annotate variant sites can act as a significant bottleneck in genetics research.
Novel methods of targeted sequencing of unique regions from complex eukaryotic genomes have generated a great deal of excitement, but critical demonstrations of these methods' efficacy with respect to diploid genotype calling and experimental variation are lacking. To address this issue, we optimized microarray-based genomic selection (MGS) for use with the Illumina Genome Analyzer (IGA). A set of 202 fragments (304 kb total) contained within a 1.
We developed a general method, microarray-based genomic selection (MGS), capable of selecting and enriching targeted sequences from complex eukaryotic genomes without the repeat blocking steps necessary for bacterial artificial chromosome (BAC)-based genomic selection. We demonstrate that large human genomic regions, on the order of hundreds of kilobases, can be enriched and resequenced with resequencing arrays. MGS, when combined with a next-generation resequencing technology, can enable large-scale resequencing in single-investigator laboratories.
Humans play little role in the epidemiology of Escherichia coli O157:H7, a commensal bacterium of cattle. Why then does E. coli O157:H7 code for virulence determinants, like the Shiga toxins (Stxs), responsible for the morbidity and mortality of colonized humans? One possibility is that the virulence of these bacteria to humans is coincidental and these virulence factors evolved for and are maintained for other roles they play in the ecology of these bacteria.