Publications by authors named "Sedlazeck F"

Variant calling using long-read RNA sequencing (lrRNA-seq) can be applied to diverse tasks, such as capturing full-length isoforms and profiling gene expression. It is challenging, however, because lrRNA-seq data have higher error rates than DNA data and are complicated by transcript diversity and RNA editing events. In this paper, we propose Clair3-RNA, the first deep learning-based variant caller tailored for lrRNA-seq data.

The sex chromosomes contain complex, important genes affecting medical phenotypes, but they differ from the autosomes in ploidy and in their large repetitive regions. To enable technology developers, along with research and clinical laboratories, to evaluate variant detection on the male sex chromosomes X and Y, we create a small variant benchmark set with 111,725 variants for the Genome in a Bottle HG002 reference material. We develop an active evaluation approach to demonstrate that the benchmark set reliably identifies errors in challenging genomic regions and across short- and long-read callsets.

Rare diseases are collectively common, affecting approximately one in twenty individuals worldwide. In recent years, rapid progress has been made in rare disease diagnostics due to advances in DNA sequencing, development of new computational and experimental approaches to prioritize genes and genetic variants, and increased global exchange of clinical and genetic data. However, more than half of individuals suspected to have a rare disease lack a genetic diagnosis.

Structural variants (SVs) drive gene expression in the human brain and cause many neurological conditions. However, most existing genetic studies have been based on short-read sequencing methods, which capture fewer than half of the SVs present in any one individual. Long-read sequencing (LRS) enhances our ability to detect disease-associated and functionally relevant SVs; however, its application in large-scale genomic studies has been limited by challenges in sample preparation and by high costs.

Background: MECP2 Duplication Syndrome, also known as X-linked intellectual developmental disorder Lubs type (MRXSL; MIM: 300260), is a neurodevelopmental disorder caused by copy number gains spanning MECP2. Despite varying genomic rearrangement structures, including duplications and triplications, and a wide range of duplication sizes, no clear correlation exists between DNA rearrangement and clinical features. We previously demonstrated that up to 38% of MRXSL families are characterized by complex genomic rearrangements (CGRs) of intermediate complexity (2 ≤ copy number variant breakpoints < 5), yet the impact of these genomic structures on the regulation of gene expression and on phenotypic manifestations has not been investigated.

The advent of single-cell DNA sequencing revealed astonishing dynamics of genomic variability but failed to characterize small to mid-size variants, which have a profound impact at the germline level. In this work, we discover novel dynamics in three brains using single-cell long-read sequencing. This provides key insights into the dynamics of the genomes of individual cells and further highlights brain-specific activity of transposable elements.

Accurately genotyping structural variant (SV) alleles is crucial to genomics research. We present a novel method (kanpig) for genotyping SVs that leverages variant graphs and k-mer vectors to rapidly generate accurate SV genotypes. We benchmark kanpig against the latest SV benchmarks and show single-sample genotyping concordance of 82.
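
The abstract does not detail how kanpig builds or compares its k-mer vectors. As a generic illustration of the underlying idea only, the sketch below represents sequences as k-mer count vectors and picks the allele whose vector is most similar, by cosine similarity, to a read's vector. The function names, the default k = 4, and the use of cosine similarity are assumptions for illustration, not kanpig's implementation, and variant graphs are not modeled.

```python
from collections import Counter
from math import sqrt


def kmer_vector(seq: str, k: int = 4) -> Counter:
    """Count every overlapping k-mer in seq, giving a sparse k-mer vector (illustrative helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either vector is empty."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matching_allele(read_seq: str, alleles: dict[str, str], k: int = 4) -> str:
    """Return the name of the allele whose k-mer vector is closest to the read's (hypothetical helper)."""
    read_vec = kmer_vector(read_seq, k)
    return max(alleles, key=lambda name: cosine_similarity(read_vec, kmer_vector(alleles[name], k)))
```

For instance, a read carrying an insertion that is present only in the ALT allele shares the inserted k-mers with ALT, so `best_matching_allele` returns `"ALT"`. Comparing fixed k-mer summaries avoids base-by-base alignment, which is one reason k-mer representations are attractive for fast genotyping.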

Chromosomal inversions (INVs) are particularly challenging to detect due to their copy-number neutral state and association with repetitive regions. Inversions represent about 1/20 of all balanced structural chromosome aberrations and can lead to disease by disrupting genes or by altering regulatory regions of dosage-sensitive genes. Short-read genome sequencing (srGS) can only resolve ∼70% of cytogenetically visible inversions referred to clinical diagnostic laboratories, likely due to breakpoints in repetitive regions. Here, we study 12 inversions by long-read genome sequencing (lrGS) (n = 9) or srGS (n = 3) and resolve nine of them.

Research and medical genomics require comprehensive, scalable methods for the discovery of novel disease targets, evolutionary drivers and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size or location. Here we present DRAGEN, which uses multigenome mapping with pangenome references, hardware acceleration and machine learning-based variant detection to provide insights into individual genomes, with ~30 min of computation time from raw reads to variant detection.

The abundance of Lp(a) protein, which is directly affected by the copy number (CN) of KIV-2, a 5.5 kbp sub-region, has significant implications for the risk of cardiovascular disease (CVD). KIV-2 is highly polymorphic in the population, and its accurate analysis is challenging.

Article Synopsis
  • The authors introduce "stratifications," BED files that define different genomic contexts for GRCh37/38 and the new T2T-CHM13 reference, which includes regions that were previously challenging to sequence.
  • They also compare the performance of sequencing benchmarks across these references, showing how difficult regions in CHM13 affect overall performance, and provide a Snakemake pipeline for generating stratifications to aid in optimizing sequencing platforms.
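
Because stratifications are BED files, evaluating a callset within one amounts to intersecting benchmark variants with the BED intervals and computing a metric such as recall on that subset. The sketch below illustrates this under stated assumptions: the function names and the in-memory input of (chromosome, 0-based position, matched) tuples are hypothetical, and in practice the TP/FN labels would come from a benchmarking comparison tool rather than being supplied by hand.

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines (chrom, start, end; 0-based, half-open) into merged, sorted intervals per chromosome."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # overlapping or abutting: extend the previous interval
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        # Store parallel start/end lists so positions can be located with binary search.
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def in_stratum(strat, chrom, pos0):
    """True if the 0-based position pos0 on chrom lies inside any stratification interval."""
    if chrom not in strat:
        return False
    starts, ends = strat[chrom]
    i = bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


def stratified_recall(truth_calls, strat):
    """Recall = TP / (TP + FN) over truth variants inside the stratum, or None if there are none.

    truth_calls: iterable of (chrom, pos0, matched); matched is True for a TP, False for an FN.
    VCF POS is 1-based, so pos0 would be POS - 1.
    """
    tp = fn = 0
    for chrom, pos0, matched in truth_calls:
        if in_stratum(strat, chrom, pos0):
            if matched:
                tp += 1
            else:
                fn += 1
    return tp / (tp + fn) if tp + fn else None
```

Running the same truth set through several stratification BED files (for example, one for a difficult region type and one for its complement) shows where a pipeline's recall drops, which is the kind of comparison the synopsis describes across references.
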
Article Synopsis
  • The Long-Read Personalized OncoGenomics (POG) dataset features 189 patient tumors and 41 matched normal samples, sequenced with Oxford Nanopore Technologies, providing a comprehensive resource for cancer research.
  • It highlights the advantages of long-read sequencing in identifying complex structural variants, viral integrations, and specific DNA behaviors, such as prominent methylation patterns associated with various cancers.
  • The findings underscore the potential of this dataset in precision medicine, serving as a tool for advancing analytical techniques in cancer genomics.

The exponential increase in sequencing data calls for conceptual and computational advances to extract useful biological insights. One such advance, minimizers, allows for reducing the quantity of data handled while maintaining some of its key properties. We provide a basic introduction to minimizers, cover recent methodological developments, and review the diverse applications of minimizers to analyze genomic data, including de novo genome assembly, metagenomics, read alignment, read correction, and pangenomes.
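
A minimal sketch of the classic (w, k)-minimizer scheme: slide a window over w consecutive k-mers and keep the smallest k-mer in each window under some ordering, breaking ties by the leftmost position. The function name and the default lexicographic ordering are illustrative choices; practical tools typically order k-mers with a hash function instead, because lexicographic order tends to favor low-complexity k-mers such as poly-A runs and selects more positions than necessary.

```python
def minimizers(seq: str, k: int, w: int, order=lambda kmer: kmer):
    """Return sorted (position, k-mer) pairs selected as (w, k)-minimizers of seq.

    In each window of w consecutive k-mers, the k-mer that is smallest under
    `order` is selected (leftmost on ties). Neighboring windows often select the
    same k-mer, so selections are collected in a set. Returns [] if seq holds
    fewer than w k-mers.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (order(kmers[i]), i))
        selected.add((best, kmers[best]))
    return sorted(selected)
```

For example, `minimizers("ACGTTGCA", k=3, w=2)` keeps five of the six 3-mers. On longer sequences with larger w, far fewer are kept (roughly 2/(w + 1) of positions for a random ordering), yet every window of w consecutive k-mers still contains at least one selected k-mer, which is what lets minimizers stand in for the full k-mer set in assembly, alignment, and metagenomics.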

Article Synopsis
  • Current genomic variant calling pipelines are not one-size-fits-all, requiring developers and researchers to make subjective tradeoffs based on their specific applications.
  • StratoMod is introduced as a machine-learning tool that predicts germline variant calling errors in a data-driven way, improving the accuracy of variant detection, especially in complex genomic regions.
  • It offers insights into the impact of different reference methods on recall rates and helps identify clinically relevant variants that might be overlooked by existing pipelines, facilitating better decision-making in pipeline design.
Article Synopsis
  • The study investigates various methods for whole genome amplification in the analysis of somatic mutations, specifically copy number variants (CNVs), in human brain tissue.
  • Three techniques are compared: PicoPLEX, primary template-directed amplification (PTA), and droplet MDA, revealing distinct characteristics of each method in terms of amplification efficiency and chimeric profiles.
  • The research confirms that a significant portion of brain cells (20.6%) exhibit CNVs, emphasizing the need for careful selection of amplification methods and reference genomes when studying genomic variations in both healthy and diseased brains.
Article Synopsis
  • The 1000 Genomes Project and Oxford Nanopore Technologies are working together to produce long-read sequencing (LRS) data from at least 800 samples to improve the identification of genetic variation and better understand human genetic diversity.
  • Initial analysis of 100 samples shows high accuracy in detecting genetic variants, including structural variants that disrupt gene function, and provides valuable data for the clinical genetics community to advance research on pathogenic variation.
Article Synopsis
  • Respiratory syncytial virus (RSV) and human noroviruses (HuNoV) are major pathogens that cause respiratory and gastrointestinal infections, respectively, making it essential to generate full-length genome sequences to study their diversity and track variants.
  • The study developed oligonucleotide probe sets from numerous viral isolate sequences and used them in a capture enrichment sequencing workflow to analyze samples, significantly improving the quality of viral genome recovery.
  • The results showed that over 99% of RSV genomes and over 96% of HuNoV genomes were complete after capture, demonstrating the effectiveness of this method for comprehensive genome sequencing and for monitoring emerging variants.
Article Synopsis
  • The Genome in a Bottle Consortium (GIAB) is creating matched tumor-normal samples that are publicly consented for sharing genomic data and cell lines, focusing on pancreatic ductal adenocarcinoma (PDAC).
  • They provide a comprehensive genomic dataset from the first individual, combining high-depth whole genome sequencing data from tumor and normal DNA generated with advanced sequencing technologies.
  • This open-access resource aims to help develop benchmarks for detecting genetic variants in cancer, fostering innovation in genome measurement and analysis tools.

Modern sequencing technology enables the systematic detection of complex structural variation (SV) across genomes. However, extensive DNA rearrangements arising through a series of mutations, a phenomenon we refer to as serial SV (sSV), remain underexplored, posing a challenge for SV discovery. Here, we present NAHRwhals ( https://github.

Article Synopsis
  • Parkinson's disease (PD) is influenced by genetics, but much of its heritability remains unexplained because previous studies focused on single nucleotide variants, leaving more complex genetic variation less studied.
  • A specific CT-rich region in the SNCA gene may connect to PD risk, but its detailed role has not been explored until now.
  • The study utilized advanced sequencing techniques on a large participant group, confirming existing associations and identifying new ones, while suggesting that the CT-rich region's disease association may not be solely driven by SNCA gene expression.

Background: A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants using next-generation sequencing technologies. In this study, we employed long-read sequencing technology, extensively utilized in studying complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations.

Results: Our analysis revealed high levels of genetic variants in CMRGs, with 68.

The assignment of variants to haplotypes, known as phasing, is crucial for predicting the consequences, interactions, and inheritance of mutations and is a key step in improving our understanding of phenotype and disease. However, phasing is limited by read length and by stretches of homozygosity along the genome. To overcome this limitation, we designed MethPhaser, a method that uses methylation signals from Oxford Nanopore Technologies data to extend single-nucleotide variant (SNV)-based phasing.

The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a complex genomic rearrangement (CGR). Although it has been identified as an important pathogenic DNA mutation signature in genomic disorders and cancer genomes, its architecture remains unresolved. Here, we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the DNA of 24 patients identified by array comparative genomic hybridization (aCGH), in whom we found evidence for all 4 predicted structural variant (SV) haplotypes.

A biallelic (AAGGG) expansion in the poly(A) tail of an AluSx3 transposable element within the gene RFC1 is a frequent cause of cerebellar ataxia, neuropathy, vestibular areflexia syndrome (CANVAS), and more recently, has been reported as a rare cause of Parkinson's disease (PD) in the Finnish population. Here, we investigate the prevalence of RFC1 (AAGGG) expansions in PD patients of non-Finnish European ancestry in 1609 individuals from the Parkinson's Progression Markers Initiative study. We identified four PD patients carrying the biallelic RFC1 (AAGGG) expansion and did not identify any carriers in controls.

The coppery titi monkey (Plecturocebus cupreus) is an emerging nonhuman primate model system for behavioral and neurobiological research. However, the near-complete absence of genomic resources for the species has hampered insights into the genetic underpinnings of the phenotypic traits of interest. To facilitate future genotype-to-phenotype studies, we present a high-quality, fully annotated de novo genome assembly for the species with chromosome-length scaffolds spanning the autosomes and chromosome X (scaffold N50 = 130.