Here, we characterize the DNA methylation phenotypes of bone marrow cells from mice with hematopoietic deficiency of or (or both enzymes) or expressing the dominant-negative mutation [R882H in humans; the most common mutation found in acute myeloid leukemia (AML)]. Using these cells as substrates, we defined DNA remethylation after overexpressing wild-type (WT) DNMT3A1, DNMT3B1, DNMT3B3 (an inactive splice isoform of DNMT3B), or DNMT3L (a catalytically inactive "chaperone" for DNMT3A and DNMT3B in early embryogenesis). Overexpression of for 2 weeks reverses the hypomethylation phenotype of Dnmt3a-deficient cells or cells expressing the R878H mutation.
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels.
Purpose: Persistent molecular disease (PMD) after induction chemotherapy predicts relapse in AML. In this study, we used whole-exome sequencing (WES) and targeted error-corrected sequencing to assess the frequency and mutational patterns of PMD in 30 patients with AML.
Materials and Methods: The study cohort included 30 adult patients with AML younger than 65 years who were uniformly treated with standard induction chemotherapy.
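Error-corrected sequencing generally tags each original DNA molecule with a unique molecular identifier (UMI) and collapses all reads that share a UMI into one consensus call, so that a random PCR or sequencing error in one read is outvoted by its siblings. The sketch below illustrates only that collapsing idea at a single genomic position; it assumes reads are already aligned and reduced to hypothetical `(umi, base)` pairs, and the thresholds `min_family_size` and `min_agreement` are illustrative, not the parameters used in this study.

```python
from collections import Counter, defaultdict


def consensus_calls(reads, min_family_size=3, min_agreement=0.7):
    """Collapse (umi, base) observations at one position into one consensus
    base per UMI family. Families that are too small, or whose majority base
    is not supported strongly enough, are dropped (hypothetical thresholds)."""
    families = defaultdict(list)
    for umi, base in reads:
        families[umi].append(base)
    calls = {}
    for umi, bases in families.items():
        if len(bases) < min_family_size:
            continue  # too few reads to outvote a random error
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            calls[umi] = base
    return calls


def variant_allele_fraction(calls, alt):
    """Fraction of consensus molecules (not raw reads) carrying `alt`."""
    if not calls:
        return 0.0
    return sum(1 for b in calls.values() if b == alt) / len(calls)


reads = ([("u1", "C")] * 3 + [("u1", "T")]   # one erroneous T is outvoted
         + [("u2", "T")] * 3                  # a true T-carrying molecule
         + [("u3", "C")] * 2                  # family too small, dropped
         + [("u4", "C")] * 3)
calls = consensus_calls(reads)
print(calls, variant_allele_fraction(calls, "T"))
```

Counting molecules rather than reads is what lets such assays report low variant allele fractions with fewer false positives, at the cost of discarding reads from small or discordant families.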
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina.
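Complete trios are useful because a child's genotype must be explainable by one allele from each parent. A minimal sketch of a Mendelian-consistency check, assuming biallelic autosomal sites with genotypes coded as alternate-allele counts (0, 1 or 2); the function names are hypothetical and this is not the 1kGP quality-control pipeline.

```python
def transmissible(gt):
    """Alternate-allele counts a parent with genotype gt (0, 1 or 2) can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[gt]


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can be built from one allele of each parent."""
    return child in {a + b for a in transmissible(father) for b in transmissible(mother)}


def mendelian_error_rate(trio_genotypes):
    """trio_genotypes: iterable of (father, mother, child) at sites called in all three."""
    sites = list(trio_genotypes)
    if not sites:
        return 0.0
    errors = sum(not is_mendelian_consistent(f, m, c) for f, m, c in sites)
    return errors / len(sites)


print(mendelian_error_rate([(0, 0, 0), (0, 0, 1), (1, 2, 2), (2, 0, 0)]))
```

An unusually high error rate for one trio usually points to genotyping problems or sample mix-ups rather than true de novo events.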
Few studies have explored the impact of rare variants (minor allele frequency < 1%) on highly heritable plasma metabolites identified in metabolomic screens. The Finnish population provides an ideal opportunity for such explorations, given the multiple bottlenecks and expansions that have shaped its history, and the enrichment for many otherwise rare alleles that has resulted. Here, we report genetic associations for 1391 plasma metabolites in 6136 men from the late-settlement region of Finland.
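The "rare" cutoff above (minor allele frequency < 1%) can be computed directly from genotype counts. A minimal sketch, assuming a biallelic autosomal site and diploid genotype counts; the example counts are hypothetical and only chosen to sum to 6136 samples.

```python
def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """MAF of a biallelic autosomal site from diploid genotype counts."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if n_alleles == 0:
        raise ValueError("no called genotypes")
    alt_freq = (n_het + 2 * n_hom_alt) / n_alleles
    return min(alt_freq, 1 - alt_freq)  # minor allele may be REF or ALT


def is_rare(n_hom_ref, n_het, n_hom_alt, threshold=0.01):
    return minor_allele_frequency(n_hom_ref, n_het, n_hom_alt) < threshold


# Hypothetical counts for two sites among 6136 genotyped samples.
print(is_rare(6100, 36, 0))   # 36 alt alleles out of 12272
print(is_rare(6000, 130, 6))  # 142 alt alleles out of 12272
```

In a bottlenecked population, an allele that is below this cutoff elsewhere can rise above it locally, which is the enrichment the abstract describes.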
Mutations in the gene encoding DNA methyltransferase 3A () are the most common cause of clonal hematopoiesis and are among the most common initiating events of acute myeloid leukemia (AML). Studies in germline and somatic knockout mice have identified focal, canonical hypomethylation phenotypes in hematopoietic cells; however, the kinetics of methylation loss following acquired inactivation in hematopoietic cells is essentially unknown. Therefore, we evaluated a somatic, inducible model of hematopoietic loss, and show that inactivation of in murine hematopoietic cells results in a relatively slow loss of methylation at canonical sites throughout the genome; in contrast, remethylation of Dnmt3a deficient genomes in hematopoietic cells occurs much more quickly.
Germline pathogenic variants in DNMT3A were recently described in patients with overgrowth, obesity, behavioral, and learning difficulties (DNMT3A Overgrowth Syndrome/DOS). Somatic mutations in the DNMT3A gene are also the most common cause of clonal hematopoiesis, and can initiate acute myeloid leukemia (AML). Using whole genome bisulfite sequencing, we studied DNA methylation in peripheral blood cells of 11 DOS patients and found a focal, canonical hypomethylation phenotype, which is most severe with the dominant negative DNMT3A mutation.
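In whole genome bisulfite sequencing, unmethylated cytosines are read as T while methylated cytosines remain C, so the methylation level of a CpG is the fraction of covering reads that still show C. The sketch below turns per-CpG read counts into a region-level comparison; the coverage floor and the 0.2 difference threshold are hypothetical illustration values, not the criteria used in this study.

```python
def methylation_fraction(methylated, unmethylated, min_coverage=5):
    """Per-CpG methylation level from bisulfite read counts; None if under-covered."""
    depth = methylated + unmethylated
    if depth < min_coverage:
        return None
    return methylated / depth


def region_mean(cpg_counts, min_coverage=5):
    """Mean methylation over a region's CpGs, ignoring under-covered sites."""
    levels = [methylation_fraction(m, u, min_coverage) for m, u in cpg_counts]
    levels = [x for x in levels if x is not None]
    return sum(levels) / len(levels) if levels else None


def is_hypomethylated(case_counts, control_counts, min_delta=0.2, min_coverage=5):
    """Flag a region whose case mean is at least min_delta below the control mean."""
    case = region_mean(case_counts, min_coverage)
    control = region_mean(control_counts, min_coverage)
    if case is None or control is None:
        return False
    return control - case >= min_delta


control = [(9, 1), (8, 2), (10, 0)]  # (methylated, unmethylated) reads per CpG
case = [(4, 6), (5, 5), (1, 1)]      # last CpG is under-covered and skipped
print(region_mean(control), region_mean(case), is_hypomethylated(case, control))
```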
Background: Mitochondrial genome copy number (MT-CN) varies among humans and across tissues and is highly heritable, but its causes and consequences are not well understood. When measured by bulk DNA sequencing in blood, MT-CN may reflect a combination of the number of mitochondria per cell and cell-type composition. Here, we studied MT-CN variation in blood-derived DNA from 19184 Finnish individuals using a combination of genome (N = 4163) and exome sequencing (N = 19034) data as well as imputed genotypes (N = 17718).
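A widely used way to estimate MT-CN from bulk sequencing compares read depth on the mitochondrial genome with read depth on the autosomes and multiplies by 2, because each nucleated cell carries two copies of the autosomes. The sketch below shows that generic estimator under a uniform-coverage assumption; it is not necessarily the exact procedure used in this study, and the function names are hypothetical.

```python
MT_GENOME_LENGTH = 16_569  # bp, revised Cambridge Reference Sequence (chrM)


def mean_depth(n_reads, read_length, region_length):
    """Approximate mean depth, assuming uniform coverage and no filtering."""
    return n_reads * read_length / region_length


def mt_copy_number(mt_depth, autosomal_depth):
    """mtDNA copies per cell: each cell has 2 autosomal copies, hence the factor 2."""
    if autosomal_depth <= 0:
        raise ValueError("autosomal depth must be positive")
    return 2.0 * mt_depth / autosomal_depth


mt = mean_depth(331_380, 150, MT_GENOME_LENGTH)  # hypothetical chrM read count
print(mt, mt_copy_number(mt, 30.0))              # 30X autosomal coverage
```

As the abstract notes, a blood-derived estimate is a per-cell average over a mixture of cell types, so shifts in cell-type composition can change MT-CN without any change in mitochondria per cell.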
The contribution of genome structural variation (SV) to quantitative traits associated with cardiometabolic diseases remains largely unknown. Here, we present the results of a study examining genetic association between SVs and cardiometabolic traits in the Finnish population. We used sensitive methods to identify and genotype 129,166 high-confidence SVs from deep whole-genome sequencing (WGS) data of 4,848 individuals.
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci.
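The contiguity statistic in parentheses is an N50-style measure: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches 50% of the target size. When the target is the genome size rather than the total assembly length, the statistic is usually called NG50. A minimal sketch of both, with a hypothetical function name:

```python
def contig_nx(contig_lengths, fraction=0.5, genome_size=None):
    """Smallest contig length L such that contigs of length >= L together cover
    `fraction` of the genome size (NGx) or, if genome_size is None, of the
    total assembly length (Nx)."""
    total = genome_size if genome_size is not None else sum(contig_lengths)
    target = fraction * total
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None  # the assembly never reaches the target (possible for NGx)


lengths = [10, 8, 6, 4, 2]
print(contig_nx(lengths))                  # N50 over a 30-unit assembly
print(contig_nx(lengths, genome_size=50))  # NG50 against a 50-unit genome
```

NG50 is the fairer comparison across assemblies of the same genome, because an assembly cannot raise it simply by dropping short contigs.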
A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline to map and characterize structural variants in 17,795 deeply sequenced human genomes.
Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits.
Summary: Large-scale human genetics studies are now employing whole genome sequencing with the goal of conducting comprehensive trait mapping analyses of all forms of genome variation. However, methods for structural variation (SV) analysis have lagged far behind those for smaller scale variants, and there is an urgent need to develop more efficient tools that scale to the size of human populations. Here, we present a fast and highly scalable software toolkit (svtools) and cloud-based pipeline for assembling high quality SV maps (including deletions, duplications, mobile element insertions, inversions and other rearrangements) in many thousands of human genomes.
Background: Allogeneic hematopoietic stem-cell transplantation is the only curative treatment for patients with myelodysplastic syndrome (MDS). The molecular predictors of disease progression after transplantation are unclear.
Methods: We sequenced bone marrow and skin samples from 90 adults with MDS who underwent allogeneic hematopoietic stem-cell transplantation after a myeloablative or reduced-intensity conditioning regimen.
In lung adenocarcinoma, canonical EML4-ALK inversion results in a fusion protein with a constitutively active ALK kinase domain. ALK rearrangements occur in a minority (2-7%) of lung adenocarcinomas, and only ~60% of these patients respond to targeted ALK inhibition by drugs such as crizotinib and ceritinib. Clinically, targeted anti-ALK therapy is often initiated based on evidence of an ALK genomic rearrangement detected by fluorescence in situ hybridization (FISH) of interphase cells in formalin-fixed, paraffin-embedded tissue sections.
Recurrent genomic mutations in uterine and non-uterine leiomyosarcomas have not been well established. Using a next-generation sequencing (NGS) panel of common cancer-associated genes, we examined 25 leiomyosarcomas arising from multiple sites to explore genetic alterations, including single nucleotide variants (SNV), small insertions/deletions (indels), and copy number alterations (CNA). Sequencing showed 86 non-synonymous, coding region somatic variants within 151 gene targets in 21 cases, with a mean of 4.
Summary: Here we present SVScore, a tool for in silico structural variation (SV) impact prediction. SVScore aggregates per-base single nucleotide polymorphism (SNP) pathogenicity scores across relevant genomic intervals for each SV in a manner that considers variant type, gene features and positional uncertainty. We show that the allele frequency spectrum of high-scoring SVs is strongly skewed toward lower frequencies, suggesting that they are under purifying selection, and that SVScore identifies deleterious variants more effectively than alternative methods.
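The general idea of aggregating per-base SNP scores over an SV can be sketched in a few lines. The example below is a simplified illustration, not SVScore's actual scoring rules: it assumes per-base scores in a plain dict, widens intervals by a single confidence-interval padding `ci` to model positional uncertainty, scores deletions and duplications over their whole span but other (hypothetically balanced) types only around their breakpoints, and takes the mean of the top `top_k` scores. It ignores gene features entirely, and every name and parameter here is hypothetical.

```python
def score_sv(per_base, sv_type, start, end, ci=0, top_k=10):
    """Hypothetical SV impact score: mean of the top_k per-base scores.

    DEL/DUP change the copy number of every interior base, so the whole
    CI-widened span is scored; for other types only the CI windows around
    each breakpoint are scored."""
    if sv_type in ("DEL", "DUP"):
        positions = set(range(start - ci, end + ci + 1))
    else:
        positions = (set(range(start - ci, start + ci + 1))
                     | set(range(end - ci, end + ci + 1)))
    scores = [per_base[p] for p in positions if p in per_base]
    if not scores:
        return None  # no annotated bases in the affected region
    top = sorted(scores, reverse=True)[:top_k]
    return sum(top) / len(top)


per_base = {100: 1.0, 101: 5.0, 150: 30.0, 199: 2.0, 200: 8.0}
print(score_sv(per_base, "DEL", 100, 200, top_k=2))        # interior base counts
print(score_sv(per_base, "INV", 100, 200, ci=1, top_k=2))  # breakpoints only
```

Averaging only the top scores keeps a large SV from being diluted by the many neutral bases it spans, while the CI padding lets an imprecisely placed breakpoint still pick up a damaging base nearby.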
Quality assurance for clinical next-generation sequencing (NGS)-based assays is difficult given the complex methods and the range of sequence variants such assays can detect. As the number and range of mutations detected by clinical NGS assays have increased, it has become difficult to apply standard analyte-specific proficiency testing (PT). Most current proficiency testing challenges for NGS are methods-based PT surveys that use DNA from reference samples engineered to harbor specific mutations; these surveys test both sequence generation and bioinformatics analysis.
Context: Most current proficiency testing challenges for next-generation sequencing assays are methods-based proficiency testing surveys that use DNA from characterized reference samples to test both the wet-bench and bioinformatics/dry-bench aspects of the tests. Methods-based proficiency testing surveys are limited by the number and types of mutations that either are naturally present or can be introduced into a single DNA sample.
Objective: To address these limitations by exploring a model of in silico proficiency testing in which sequence data from a single well-characterized specimen are manipulated electronically.
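One way to manipulate sequence data electronically is to edit a chosen fraction of the reads that cover a position so that they carry a new base, producing a synthetic variant at a known allele fraction. The sketch below is a hypothetical illustration of that idea, not the method evaluated in the study: it assumes gapless alignments represented as `(start, sequence)` tuples and ignores base qualities, mate pairs and file formats.

```python
import random


def spike_snv(reads, pos, alt, vaf, seed=0):
    """Return a copy of `reads` ((0-based start, sequence) tuples, gapless
    alignments assumed) in which round(vaf * n) of the n reads covering `pos`
    carry `alt` at that position. Hypothetical helper for illustration."""
    if not 0.0 <= vaf <= 1.0 or len(alt) != 1:
        raise ValueError("vaf must be in [0, 1] and alt a single base")
    rng = random.Random(seed)  # seeded so the challenge is reproducible
    covering = [i for i, (start, seq) in enumerate(reads)
                if start <= pos < start + len(seq)]
    chosen = set(rng.sample(covering, round(vaf * len(covering))))
    out = []
    for i, (start, seq) in enumerate(reads):
        if i in chosen:
            offset = pos - start
            seq = seq[:offset] + alt + seq[offset + 1:]
        out.append((start, seq))
    return out


reads = [(i, "A" * 10) for i in range(20)]  # 10 of these reads cover position 12
edited = spike_snv(reads, pos=12, alt="G", vaf=0.3)
print(sum(1 for s, q in edited if s <= 12 < s + len(q) and q[12 - s] == "G"))
```

Because the same well-characterized specimen can be edited repeatedly, such a scheme is not limited to the mutations physically present in one DNA sample, which is the limitation described above.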
Background: The Long Life Family Study (LLFS) is an international study to identify the genetic components of various healthy aging phenotypes. We hypothesized that pedigree-specific rare variants at longevity-associated genes could have a similar functional impact on healthy phenotypes.
Methods: We performed multiplexed, custom hybridization capture sequencing of 464 candidate genes for longevity or the major diseases of aging to identify functional variants in 615 pedigrees (4,953 individuals) from the LLFS.
Objectives: To evaluate the extent of human-to-human specimen contamination in clinical next-generation sequencing (NGS) data.
Methods: Using haplotype analysis to detect specimen admixture, with orthogonal validation by short tandem repeat analysis, we determined the rate of clinically significant (>5%) DNA contamination in clinical NGS data from 296 consecutive cases. Haplotype analysis was performed using read haplotypes at common, closely spaced single-nucleotide polymorphisms in low linkage disequilibrium in the population, which were present in regions targeted by the clinical assay.
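The logic behind read-haplotype analysis is that a single diploid person can show at most two haplotypes across a pair of closely spaced SNPs, so well-supported third or fourth haplotypes suggest DNA from another person. The sketch below computes a crude per-locus signal from that idea; it is a simplified illustration, not the authors' estimation method. It assumes each read has already been reduced to a hypothetical haplotype string such as `"AG"` (its bases at the two SNPs), and it treats haplotypes seen in fewer than `min_reads` reads as sequencing error.

```python
from collections import Counter


def extra_haplotype_fraction(read_haplotypes, min_reads=3):
    """Fraction of reads at one SNP-pair locus that support haplotypes beyond
    the two most common ones (a single diploid sample should have at most two).
    Haplotypes with fewer than min_reads reads are ignored as likely errors."""
    counts = Counter(read_haplotypes)
    supported = [n for _, n in counts.most_common() if n >= min_reads]
    total = sum(supported)
    if total == 0:
        return 0.0
    return sum(supported[2:]) / total


def loci_with_extra_haplotypes(loci, min_reads=3):
    """Names of loci showing more than two well-supported haplotypes."""
    return [name for name, haps in loci.items()
            if extra_haplotype_fraction(haps, min_reads) > 0]


locus_a = ["AG"] * 50 + ["CT"] * 45 + ["CG"] * 5 + ["AT"] * 1
locus_b = ["AG"] * 50 + ["CT"] * 50 + ["AT"] * 2
print(extra_haplotype_fraction(locus_a), loci_with_extra_haplotypes({"a": locus_a, "b": locus_b}))
```

This signal underestimates contamination whenever the contaminant shares a haplotype with the sample, which is why the loci are chosen at common SNPs in low linkage disequilibrium, where two unrelated people are likely to carry different haplotype combinations.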