Publications by authors named "Aaron M Wenger"

A major challenge in epigenetics is uncovering the dynamic distribution of nucleosomes and other DNA-binding proteins, which plays a crucial role in regulating cellular functions. Established approaches such as ATAC-seq, ChIP-seq, and CUT&RUN provide valuable insights but are limited by the ensemble nature of their data, masking the cellular and molecular heterogeneity that is often functionally significant. Recently, long-read sequencing technologies, particularly Single Molecule, Real-Time (SMRT/PacBio) sequencing, have introduced transformative capabilities, such as N6-methyladenine (6mA) footprinting.

Article Synopsis
  • The Genome in a Bottle Consortium (GIAB) is creating matched tumor-normal samples, consented for public sharing of genomic data and cell lines, focusing on pancreatic ductal adenocarcinoma (PDAC).
  • They provide a comprehensive genomic dataset for the first individual, combining high-depth whole-genome sequencing of tumor and normal cells generated with multiple technologies.
  • This open-access resource aims to help develop benchmarks for detecting genetic variants in cancer, fostering innovation in genome measurement and analysis tools.

Motivation: In diploid organisms, phasing is the problem of assigning the alleles at heterozygous variants to one of two haplotypes. Reads from PacBio HiFi sequencing provide long, accurate observations that can be used as the basis for both calling and phasing variants. HiFi reads also excel at calling larger classes of variation, such as structural or tandem repeat variants.
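
Phasing can be illustrated with a toy case: two heterozygous sites, where each read that spans both reports which allele (reference 0 or alternate 1) it carries at each site. The sketch below is a simplified majority vote that assumes correct genotypes and alignments; it is not the phasing method used in this work, and the function names are hypothetical.

```python
from collections import Counter


def phase_pair(read_alleles):
    """Toy phasing of two heterozygous sites from reads spanning both.

    read_alleles: list of (allele_at_site1, allele_at_site2) tuples,
    where 0 = reference allele and 1 = alternate allele.
    Returns "cis" if the alternate alleles lie on the same haplotype
    (reads show 0-0 or 1-1), "trans" if they lie on opposite haplotypes
    (reads show 0-1 or 1-0), or None when the evidence is tied.
    """
    counts = Counter(read_alleles)
    cis = counts[(0, 0)] + counts[(1, 1)]
    trans = counts[(0, 1)] + counts[(1, 0)]
    if cis == trans:
        return None  # no majority: leave the pair unphased
    return "cis" if cis > trans else "trans"


def haplotypes(phase):
    """Expand a cis/trans call into the two haplotype allele strings."""
    if phase == "cis":
        return ("00", "11")
    if phase == "trans":
        return ("01", "10")
    return None


# Four reads agree on cis; one read carries a sequencing error.
reads = [(0, 0), (1, 1), (1, 1), (0, 0), (0, 1)]
print(phase_pair(reads), haplotypes(phase_pair(reads)))
```

Long, accurate reads help here because a single read can span many heterozygous sites at once, so fewer reads are needed to link them.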

Resolving the molecular basis of a Mendelian condition (MC) remains challenging owing to the diverse mechanisms by which genetic variants cause disease. To address this, we developed a synchronized long-read genome, methylome, epigenome, and transcriptome sequencing approach, which enables accurate single-nucleotide, insertion-deletion, and structural variant calling and diploid genome assembly, and permits the simultaneous elucidation of haplotype-resolved CpG methylation, chromatin accessibility, and full-length transcript information in a single long-read sequencing run. Application of this approach to an Undiagnosed Diseases Network (UDN) participant with a chromosome X;13 balanced translocation of uncertain significance revealed that this translocation disrupted the functioning of four separate genes (, , , and ) previously associated with single-gene MCs.

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region.

Long-read HiFi genome sequencing allows for accurate detection and direct phasing of single nucleotide variants, indels, and structural variants. Recent algorithmic development enables simultaneous detection of CpG methylation for analysis of regulatory element activity directly in HiFi reads. We present a comprehensive haplotype resolved 5-base HiFi genome sequencing dataset from a rare disease cohort of 276 samples in 152 families to identify rare (~0.

Background: Long-read sequencing (LRS) techniques have been very successful in identifying structural variants (SVs). However, the high error rate of LRS has made detecting small variants (substitutions and short indels < 20 bp) more challenging. The introduction of PacBio HiFi sequencing makes LRS suitable for detecting small variants as well.

Article Synopsis
  • Accurately detecting somatic structural variants (SVs) in cancer genomes is difficult because high-quality benchmarking datasets are lacking.
  • The study analyzes somatic SVs in melanoma and normal lymphoblastoid cell lines using four different sequencing technologies, resulting in a validated set of somatic SVs.
  • The findings emphasize the impact of tumor purity and sequence depth on SV detection, and the datasets are available for community research and benchmarking efforts.

Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as .

Over the past decade, advances in genetic testing, particularly the advent of next-generation sequencing, have led to a paradigm shift in the diagnosis of molecular diseases and disorders. Despite our present collective ability to interrogate more than 90% of the human genome, portions of the genome have eluded us, resulting in stagnation of diagnostic yield with existing methodologies. Here we show how application of a new technology, long-read sequencing, has the potential to improve molecular diagnostic rates.

Article Synopsis
  • Scientists developed DeepConsensus, a new method that corrects errors in PacBio circular consensus reads more accurately than the existing pbccs method.
  • DeepConsensus uses a transformer-based deep learning model to reduce read errors by 42% relative to pbccs, making sequencing more reliable.
  • Beyond raising read quality, the approach also improves downstream gene analysis and reduces errors in variant calling.
Article Synopsis
  • The study focused on improving diagnosis and understanding of genetic disorders in children through the Genomic Answers for Kids program by analyzing genetic information from 960 families.
  • Researchers utilized various sequencing methods, including short-read and long-read genome sequencing, alongside machine learning to prioritize genetic variants and stored the data in a structured database for future access.
  • The results showed varying diagnostic success rates, with new diagnostic information gained from structural variants and long-read sequencing, highlighting ongoing challenges in identifying variants of unknown significance in nondiagnostic cases.

The repetitive nature and complexity of some medically relevant genes pose a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly 400 medically relevant genes due to their repetitiveness or polymorphic complexity. Here, we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly.

Article Synopsis
  • Researchers studied NOTCH2NLC, a gene whose GGC repeat expansion is linked to a brain disease, and found that some people who carry these repeats show no symptoms.
  • These asymptomatic carriers showed DNA hypermethylation, which might protect them from developing the disease.
  • Using advanced DNA sequencing methods, the scientists found that fathers often had longer repeats than their affected children, suggesting a complex relationship between repeat length and symptoms.

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci.
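
The contiguity statistic quoted above (the contig length at which contigs, taken longest first, cover 50% of the sequence) is the N50. Whether it is normalized by assembly length (N50) or by an assumed genome size (NG50) is not stated here, so this minimal sketch supports both; the function name and example lengths are hypothetical.

```python
def nx50(contig_lengths, genome_size=None):
    """Return N50, or NG50 when genome_size is given.

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; the result is the length of the contig at which the
    running total first reaches half of the target (the assembly total
    for N50, genome_size for NG50). Returns None if the target is never
    reached, e.g. a fragmented assembly evaluated against a large genome.
    """
    total = genome_size if genome_size is not None else sum(contig_lengths)
    target = total / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


contigs = [100, 80, 50, 30, 20]  # made-up contig lengths
print(nx50(contigs), nx50(contigs, genome_size=400))
```

NG50 is often preferred when comparing assemblies of the same genome, because it does not reward an assembly for simply being shorter.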

Long-read sequencing (LRS) has the potential to comprehensively identify all medically relevant genome variation, including variation commonly missed by short-read sequencing (SRS) approaches. To assess this potential, we performed LRS at approximately 15× to 40× genome coverage on five trios using the Pacific Biosciences Sequel I System. The probands had intellectual disability (ID) whose etiology remained unresolved after SRS exome and genome sequencing.

Article Synopsis
  • An amendment to this paper has been published.
  • The amendment can be accessed through a link provided at the top of the paper.
  • Readers are encouraged to check the link for updated information.

A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models.

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies.
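
A benchmark set is used by comparing a method's calls against it: calls absent from the benchmark count as false positives, and benchmark variants that were not called count as false negatives. The sketch below shows only that bookkeeping, using exact matching on hypothetical (chrom, pos, type, length) keys. This is a simplification: real SV benchmarking tools match calls with position and size tolerances and restrict evaluation to the benchmark's confident regions, which this sketch ignores.

```python
def evaluate(calls, truth):
    """Compare a call set against a benchmark (truth) set.

    Both arguments are sets of hashable variant keys. Exact-key
    matching is an assumption made for illustration only.
    """
    tp = len(calls & truth)  # called and present in the benchmark
    fp = len(calls - truth)  # called but absent from the benchmark
    fn = len(truth - calls)  # in the benchmark but not called
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical variant keys: (chrom, pos, type, length).
a = ("chr1", 10_000, "DEL", 350)
b = ("chr2", 52_000, "INS", 1_200)
c = ("chr5", 7_500, "DEL", 80)
d = ("chr7", 3_300, "INS", 60)
print(evaluate(calls={a, b, d}, truth={a, b, c}))
```

Precision reflects false positives and recall reflects false negatives, which is why a benchmark that supports both kinds of error is useful.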

The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient's disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process.

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses.

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.
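
Read accuracy such as the 99.8% quoted above is commonly expressed on the Phred scale, Q = -10 · log10(error rate). The small sketch below applies that standard conversion in both directions; the function names are hypothetical. It shows that 99.8% accuracy corresponds to roughly Q27.

```python
import math


def phred_quality(accuracy):
    """Convert per-base accuracy (0 < accuracy < 1) to Phred quality.

    Q = -10 * log10(error rate), where error rate = 1 - accuracy.
    """
    error = 1.0 - accuracy
    if not 0.0 < error < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10.0 * math.log10(error)


def accuracy_from_phred(q):
    """Inverse conversion: Phred quality -> per-base accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


print(round(phred_quality(0.998), 1))  # roughly Q27
```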

Single-molecule long-read sequencing datasets were generated for a son-father-mother trio of Han Chinese descent that is part of the Genome in a Bottle (GIAB) consortium portfolio. The dataset was generated using the Pacific Biosciences Sequel System. The son and each parent were sequenced to an average coverage of 60× and 30×, respectively, with N50 subread lengths between 16 and 18 kb.
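
Average sequencing coverage, such as the son's and parents' depths quoted above, is the total number of sequenced bases divided by the genome size. A minimal sketch of that arithmetic follows. The 3.1 Gb haploid human genome size is an assumption chosen for illustration, not a value from the dataset description, and the function names are hypothetical.

```python
def mean_coverage(total_bases, genome_size):
    """Average depth = total sequenced bases / haploid genome size."""
    return total_bases / genome_size


def bases_needed(target_depth, genome_size):
    """Total bases required to reach a target average depth."""
    return target_depth * genome_size


GENOME_SIZE = 3.1e9  # assumption: approximate human haploid genome size (bp)

# Gigabases needed for the son's and each parent's average depth.
print(bases_needed(60, GENOME_SIZE) / 1e9, bases_needed(30, GENOME_SIZE) / 1e9)
```

Average depth says nothing about uniformity; some regions will be covered well above or below the mean.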

The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome.
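
The size convention above (indels < 50 bp, SVs ≥ 50 bp) can be written as a small classifier. This sketch assumes that a variant's size is the absolute length difference between its REF and ALT alleles, and it labels same-length changes as substitutions. The function name is hypothetical, and the threshold is a parameter because other studies use different cutoffs.

```python
def classify(ref, alt, sv_threshold=50):
    """Label a variant as substitution, indel, or SV by allele length.

    Size is taken as |len(alt) - len(ref)| (an assumption); variants at
    or above sv_threshold bp are called SVs, smaller length changes indels.
    """
    size = abs(len(alt) - len(ref))
    if size == 0:
        return "substitution"
    return "SV" if size >= sv_threshold else "indel"


print(classify("A", "G"), classify("A", "AT"), classify("A", "A" + "T" * 50))
```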