Publications by authors named "Karen H Miga"

The combination of ultra-long (UL) Oxford Nanopore Technologies (ONT) sequencing reads with long, accurate Pacific Bioscience (PacBio) High Fidelity (HiFi) reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach for complete, "telomere-to-telomere" genome assembly relies on multiple sequencing platforms, limiting its accessibility. ONT "Duplex" sequencing reads, where both strands of the DNA are read to improve quality, promise high per-base accuracy.

View Article and Find Full Text PDF
Article Synopsis
  • The Genome in a Bottle Consortium (GIAB) is creating matched tumor-normal samples that are publicly consented for sharing genomic data and cell lines, focusing on pancreatic ductal adenocarcinoma (PDAC).
  • They provide a comprehensive genomic dataset from the first individual, combining high-depth DNA from tumor and normal cells using advanced whole genome sequencing technologies.
  • This open-access resource aims to help develop benchmarks for detecting genetic variants in cancer, fostering innovation in genome measurement and analysis tools.
View Article and Find Full Text PDF

Somatic variant detection is an integral part of cancer genomics analysis. While most methods have focused on short-read sequencing, long-read technologies now offer potential advantages in terms of repeat mapping and variant phasing. We present DeepSomatic, a deep learning method for detecting somatic SNVs and insertions and deletions (indels) from both short-read and long-read data, with modes for whole-genome and exome sequencing, and able to run on tumor-normal, tumor-only, and with FFPE-prepared samples.

View Article and Find Full Text PDF
Article Synopsis
  • * Long-read sequencing (LRS) offers a promising solution by providing more comprehensive data, including better long-range mapping and methylation profiling, which can help identify variants not detectable by SRS.
  • * In a study involving 98 samples, LRS successfully identified additional rare variants in 11 cases, enhancing diagnostic accuracy for rare monogenic diseases and suggesting its future importance in clinical genomics.
View Article and Find Full Text PDF
Article Synopsis
  • The study presents detailed genomes of six ape species, achieving high accuracy and complete sequencing of all their chromosomes.
  • It addresses complex genomic regions, leading to enhanced understanding of evolutionary relationships among these species.
  • The findings will serve as a crucial resource for future research on human evolution and our closest ape relatives.
View Article and Find Full Text PDF

Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and enabled rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long-reads to improve the genotyping accuracy.

View Article and Find Full Text PDF

Highlighting the Distinguished Speakers Symposium on "The Future of Human Genetics and Genomics," this collection of articles is based on presentations at the ASHG 2023 Annual Meeting in Washington, DC, in celebration of all our field has accomplished in the past 75 years, since the founding of ASHG in 1948.

View Article and Find Full Text PDF
Article Synopsis
  • Apes have two sex chromosomes: the essential Y chromosome for male reproduction and the X chromosome necessary for both reproduction and cognition, with differences in mating patterns affecting their function.
  • Studying these chromosomes is challenging due to their repetitive structures, but researchers created gapless assemblies for five great apes and one lesser ape to explore their evolutionary complexities.
  • The Y chromosomes are highly variable and undergo significant changes compared to the more stable X chromosomes, and this research can provide insights into human evolution and aid in the conservation of endangered ape species.
View Article and Find Full Text PDF
Article Synopsis
  • The Human Genome Project laid the groundwork for genetic research but initially struggled with representing human genetic diversity.
  • Recent breakthroughs, namely complete gap-free genomes from the Telomere-to-Telomere Consortium and high-quality pangenomes from the Human Pangenome Reference Consortium, have addressed these issues.
  • These advancements, driven by improved DNA sequencing technology, not only provide clearer genome mapping but also enhance our understanding of genetic diversity, leading to better applications in precision medicine and human biology.
View Article and Find Full Text PDF
Article Synopsis
  • The study focuses on the genomic structure of crab-eating and rhesus macaques, addressing the need for better understanding of their genetic differences and similarities.
  • Researchers provide a complete genome assembly for the crab-eating macaque and 20 haplotype-resolved assemblies to explore significant genomic variations between the two species and their implications.
  • Findings include that macaques have lower segmental duplication and longer centromeres than humans, as well as differences in genetic variants and alternative splicing, which may relate to metabolic and evolutionary traits, enhancing their use in biomedical research.
View Article and Find Full Text PDF

Reference-free genome phasing is vital for understanding allele inheritance and the impact of single-molecule DNA variation on phenotypes. To achieve thorough phasing across homozygous or repetitive regions of the genome, long-read sequencing technologies are often used to perform phased de novo assembly. As a step toward reducing the cost and complexity of this type of analysis, we describe new methods for accurately phasing Oxford Nanopore Technologies (ONT) sequence data with the Shasta genome assembler and a modular tool for extending phasing to the chromosome scale called GFAse.

View Article and Find Full Text PDF

Most current studies rely on short-read sequencing to detect somatic structural variation (SV) in cancer genomes. Long-read sequencing offers the advantage of better mappability and long-range phasing, which results in substantial improvements in germline SV detection. However, current long-read SV detection methods do not generalize well to the analysis of somatic SVs in tumor genomes with complex rearrangements, heterogeneity, and aneuploidy.

View Article and Find Full Text PDF

The combination of ultra-long Oxford Nanopore (ONT) sequencing reads with long, accurate PacBio HiFi reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach for complete, "telomere-to-telomere" genome assembly relies on multiple sequencing platforms, limiting its accessibility. ONT "Duplex" sequencing reads, where both strands of the DNA are read to improve quality, promise high per-base accuracy.

View Article and Find Full Text PDF

Fluorescent in situ hybridization (FISH) is a powerful method for the targeted visualization of nucleic acids in their native contexts. Recent technological advances have leveraged computationally designed oligonucleotide (oligo) probes to interrogate > 100 distinct targets in the same sample, pushing the boundaries of FISH-based assays. However, even in the most highly multiplexed experiments, repetitive DNA regions are typically not included as targets, as the computational design of specific probes against such regions presents significant technical challenges.

View Article and Find Full Text PDF
Article Synopsis
  • Apes have two main sex chromosomes, X and Y, where Y is crucial for male reproduction and its deletions can lead to infertility, while X is important for both reproduction and brain function.
  • Recent advancements in genomic techniques helped researchers create complete structures of the X and Y chromosomes for multiple great ape species, allowing them to explore their evolutionary complexities.
  • Findings indicate that Y chromosomes are highly variable and undergo rapid changes due to unique genetic regions and transposable elements, while X chromosomes are more stable, highlighting differing evolutionary paths among great ape species.
View Article and Find Full Text PDF

The UCSC Genome Browser (https://genome.ucsc.edu) is a web-based genomic visualization and analysis tool that serves data to over 7,000 distinct users per day worldwide.

View Article and Find Full Text PDF

Advances in long-read sequencing and assembly now mean that individual labs can generate phased genomes that are more accurate and more contiguous than the original human reference genome. With declining costs and increasing democratization of technology, we suggest that complete genome assemblies, where both parental haplotypes are phased telomere to telomere, will become standard in human genetics. Soon, even in clinical settings where rigorous sample-handling standards must be met, affected individuals could have reference-grade genomes fully sequenced and assembled in just a few hours given advances in technology, computational processing, and annotation.

View Article and Find Full Text PDF
Article Synopsis
  • Long-read sequencing technology is enhancing the detection of genetic variants in complex regions of the genome and facilitating quicker genetic diagnoses in clinical settings.
  • Newer third-generation sequencing platforms, such as those from PacBio and Oxford Nanopore, are rapidly advancing, but traditional variant calling methods struggle with increased data complexity.
  • The developed local haplotype approximation method improves variant calling accuracy and allows DeepVariant to work effectively across various long-read sequencing platforms.
View Article and Find Full Text PDF

Long-read sequencing technologies substantially overcome the limitations of short-reads but have not been considered as a feasible replacement for population-scale projects, being a combination of too expensive, not scalable enough or too error-prone. Here we develop an efficient and scalable wet lab and computational protocol, Napu, for Oxford Nanopore Technologies long-read sequencing that seeks to address those limitations. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the National Institutes of Health Center for Alzheimer's and Related Dementias.

View Article and Find Full Text PDF

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region.

View Article and Find Full Text PDF

Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations.

View Article and Find Full Text PDF

Fluorescent hybridization (FISH) is a powerful method for the targeted visualization of nucleic acids in their native contexts. Recent technological advances have leveraged computationally designed oligonucleotide (oligo) probes to interrogate >100 distinct targets in the same sample, pushing the boundaries of FISH-based assays. However, even in the most highly multiplexed experiments, repetitive DNA regions are typically not included as targets, as the computational design of specific probes against such regions presents significant technical challenges.

View Article and Find Full Text PDF

Long-read sequencing technologies substantially overcome the limitations of short-reads but to date have not been considered as feasible replacement at scale due to a combination of being too expensive, not scalable enough, or too error-prone. Here, we develop an efficient and scalable wet lab and computational protocol for Oxford Nanopore Technologies (ONT) long-read sequencing that seeks to provide a genuine alternative to short-reads for large-scale genomics projects. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the NIH Center for Alzheimer's and Related Dementias (CARD).

View Article and Find Full Text PDF

Advances in long-read sequencing technologies have broadened our understanding of genetic variation in the human population, uncovered new complex structural variants and offered an opportunity to elucidate new variant associations with disease.

View Article and Find Full Text PDF