Somatic mutations in individual cells lead to genomic mosaicism, contributing to the intricate regulatory landscape of genetic disorders and cancers. To evaluate and refine the detection of somatic mosaicism across different technologies with personalized donor-specific assembly (DSA), we obtained tissue from the dorsolateral prefrontal cortex (DLPFC) of a post-mortem neurotypical 31-year-old individual. We sequenced bulk DLPFC tissue using Oxford Nanopore Technologies (∼60X), NovaSeq (∼30X), and linked-read sequencing (∼28X).
Mammalian genomes contain thousands of genes for long noncoding RNA (lncRNAs), some of which have been shown to affect protein coding gene expression through diverse mechanisms. The lncRNA transcripts are longer than 200 nucleotides and are often capped, spliced, and polyadenylated, but not translated into protein. Nuclear lncRNAs can modify chromatin structure and transcription in trans or cis by interacting with the DNA, forming R-loops, and recruiting regulatory proteins.
The transfer of mitochondrial DNA into the nuclear genomes of eukaryotes (Numts) has been linked to lifespan in nonhuman species and recently demonstrated to occur in rare instances from one human generation to the next. Here, we investigated numtogenesis dynamics in humans in 2 ways. First, we quantified Numts in 1,187 postmortem brain and blood samples from different individuals.
When somatic cells acquire complex karyotypes, they often are removed by the immune system. Mutant somatic cells that evade immune surveillance can lead to cancer. Neurons with complex karyotypes arise during neurotypical brain development, but neurons are almost never the origin of brain cancers.
Somatic mosaicism is defined as an occurrence of two or more populations of cells having genomic sequences differing at given loci in an individual who is derived from a single zygote. It is a characteristic of multicellular organisms that plays a crucial role in normal development and disease. To study the nature and extent of somatic mosaicism in autism spectrum disorder, bipolar disorder, focal cortical dysplasia, schizophrenia, and Tourette syndrome, a multi-institutional consortium called the Brain Somatic Mosaicism Network (BSMN) was formed through the National Institute of Mental Health (NIMH).
Candida albicans is a frequent colonizer of human mucosal surfaces as well as an opportunistic pathogen. C. albicans is remarkably versatile in its ability to colonize diverse host sites with differences in oxygen and nutrient availability, pH, immune responses, and resident microbes, among other cues.
When somatic cells acquire complex karyotypes, they are removed by the immune system. Mutant somatic cells that evade immune surveillance can lead to cancer. Neurons with complex karyotypes arise during neurotypical brain development, but neurons are almost never the origin of brain cancers.
The transfer of mitochondrial DNA into the nuclear genomes of eukaryotes (Numts) has been linked to lifespan in non-human species and recently demonstrated to occur in rare instances from one human generation to the next. Here we investigated numtogenesis dynamics in humans in two ways. First, we quantified Numts in 1,187 post-mortem brain and blood samples from different individuals.
Mucoepidermoid Carcinomas (MEC) represent the most common malignancies of salivary glands. Approximately 50% of all MEC cases are known to harbor gene fusions, but the additional molecular drivers remain largely uncharacterized. Here, we sought to resolve controversy around the role of human papillomavirus (HPV) as a potential driver of mucoepidermoid carcinoma.
We present SquiggleNet, the first deep-learning model that can classify nanopore reads directly from their electrical signals. SquiggleNet operates faster than DNA passes through the pore, allowing real-time classification and read ejection. Using 1 s of sequencing data, the classifier achieves significantly higher accuracy than base calling followed by sequence alignment.
Purpose: In locally advanced p16+ oropharyngeal squamous cell carcinoma (OPSCC), (i) to investigate kinetics of human papillomavirus (HPV) circulating tumor DNA (ctDNA) and association with tumor progression after chemoradiation, and (ii) to compare the predictive value of ctDNA to imaging biomarkers of MRI and FDG-PET.
Experimental Design: Serial blood samples were collected from patients with AJCC8 stage III OPSCC (n = 34) enrolled on a randomized trial: pretreatment; during chemoradiation at weeks 2, 4, and 7; and posttreatment. All patients also had dynamic-contrast-enhanced and diffusion-weighted MRI, as well as FDG-PET scans pre-chemoradiation and week 2 during chemoradiation.
Background: Human papillomavirus (HPV) is a well-established driver of malignant transformation at a number of sites, including head and neck, cervical, vulvar, anorectal, and penile squamous cell carcinomas; however, the impact of HPV integration into the host human genome on this process remains largely unresolved. This is due to the technical challenge of identifying HPV integration sites, which includes limitations of existing informatics approaches to discovering viral-host breakpoints from low-read-coverage sequencing data.
Methods: To overcome this limitation, the authors developed SearcHPV, a new HPV detection pipeline based on targeted capture technology, and applied the algorithm to targeted capture data.
Mobile element insertions (MEIs) are repetitive genomic sequences that contribute to genetic variation and can lead to genetic disorders. Targeted and whole-genome approaches using short-read sequencing have been developed to identify reference and non-reference MEIs; however, the read length hampers detection of these elements in complex genomic regions. Here, we pair Cas9-targeted nanopore sequencing with computational methodologies to capture active MEIs in human genomes.
Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of srWGS in large-scale genomics initiatives, we sought to establish expectations for routine SV detection from this data type by comparison with lrWGS assembly, as well as to quantify the genomic properties and added value of SVs uniquely accessible to each technology. Analyses from the Human Genome Structural Variation Consortium (HGSVC) of three families captured ~11,000 SVs per genome from srWGS and ~25,000 SVs per genome from lrWGS assembly.
Background: Post-zygotic mutations incurred during DNA replication, DNA repair, and other cellular processes lead to somatic mosaicism. Somatic mosaicism is an established cause of various diseases, including cancers. However, detecting mosaic variants in DNA from non-cancerous somatic tissues poses significant challenges, particularly if the variants are present in only a small fraction of cells.
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci.
The transfer and integration of whole and partial mitochondrial genomes into the nuclear genomes of eukaryotes is an ongoing process that has facilitated the transfer of genes and contributed to the evolution of various cellular pathways. Many previous studies have explored the impact of these insertions, referred to as NumtS, but have focused primarily on older events that have become fixed and are therefore present in all individual genomes for a given species. We previously developed an approach to identify novel Numt polymorphisms from next-generation sequence data and applied it to thousands of human genomes.
Background: The main goal of this collaborative effort is to provide genome-wide data for the previously underrepresented population in Eastern Europe, and to provide cross-validation of the data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes by an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that has been resequenced by Illumina NovaSeq6000 S4 at high coverage.
Germline copy number variants (CNVs) and single-nucleotide polymorphisms (SNPs) form the basis of inter-individual genetic variation. Although the phenotypic effects of SNPs have been extensively investigated, the effects of CNVs are less well understood. To better characterize mechanisms by which CNVs affect cellular phenotype, we tested their association with variable CpG methylation in a genome-wide manner.
New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies.
Background: Multiple myeloma (MM) is a hematological cancer caused by abnormal accumulation of monoclonal plasma cells in bone marrow. With the increase in treatment options, risk-adapted therapy is becoming more and more important. Survival analysis is commonly applied to study progression or other events of interest and stratify the risk of patients.
Long Interspersed Element-1 (LINE-1) retrotransposition contributes to inter- and intra-individual genetic variation and occasionally can lead to human genetic disorders. Various strategies have been developed to identify human-specific LINE-1 (L1Hs) insertions from short-read whole genome sequencing (WGS) data; however, they have limitations in detecting insertions in complex repetitive genomic regions. Here, we developed a computational tool (PALMER) and used it to identify 203 non-reference L1Hs insertions in the NA12878 benchmark genome.
Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation.