Publications by Mark Chaisson | LitMetric

Publications by authors named "Mark Chaisson"

An integrative TAD catalog in lymphoblastoid cell lines discloses the functional impact of deletions and insertions in human genomes.

Chong Li, Marc Jan Bonder, Sabriya Syed, Matthew Jensen, Mark J P Chaisson

Genome Res

December 2024

The human genome is packaged within a three-dimensional (3D) nucleus and organized into structural units known as compartments, topologically associating domains (TADs), and loops. TAD boundaries, which separate adjacent TADs, have been found to be well conserved across mammalian species and more evolutionarily constrained than TADs themselves. Recent studies show that structural variants (SVs) can modify 3D genomes by disrupting TADs, which play an essential role in insulating genes from aberrant regulation by outside regulatory elements.

Complex genetic variation in nearly complete human genomes.

Glennis A Logsdon, Peter Ebert, Peter A Audano, Mark Loftus, David Porubsky, Mark J P Chaisson

bioRxiv

September 2024

Article Synopsis

* It achieves a high level of completeness, closing 92% of previous assembly gaps and fully assembling complex regions, including 1,852 complex structural variants and 1,246 human centromeres.
* The findings lead to significant improvements in genotyping accuracy and enable the detection of over 26,000 structural variants per sample, enhancing the potential for future disease association research.

VISTA: an integrated framework for structural variant discovery.

Varuni Sarwal, Seungmo Lee, Jianzhi Yang, Sriram Sankararaman, Mark Chaisson

Brief Bioinform

July 2024

Structural variation (SV) refers to insertions, deletions, inversions, and duplications in human genomes. SVs are present in approximately 1.5% of the human genome.

Genotyping sequence-resolved copy-number variations using pangenomes reveals paralog-specific global diversity and expression divergence of duplicated genes.

Walfred Ma, Mark J P Chaisson

bioRxiv

October 2024

Copy-number variable (CNV) genes are important in evolution and disease, yet sequence variation in CNV genes is a blind spot for large-scale studies. We present a method, ctyper, that leverages pangenomes to produce copy-number maps with allele-specific sequences containing locally phased variants of CNV genes from NGS reads. We extensively characterized accuracy and efficiency on a database of 3,351 CNV genes as well as 212 non-CNV, medically relevant, challenging genes.

Analysis and benchmarking of small and large genomic variants across tandem repeats.

Adam C English, Egor Dolzhenko, Helyaneh Ziaei Jam, Sean K McKenzie, Nathan D Olson, Mark J P Chaisson

Nat Biotechnol

April 2024

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies.

A High-Quality Blue Whale Genome, Segmental Duplications, and Historical Demography.

Yury V Bukhman, Phillip A Morin, Susanne Meyer, Li-Fang Chu, Jeff K Jacobsen, Mark J P Chaisson

Mol Biol Evol

March 2024

Article Synopsis

* The blue whale is the largest animal ever known, making its genome a key subject for studying longevity and cancer resistance.
* Researchers created a detailed genome assembly of the blue whale using advanced sequencing methods and collaborated with databases like NCBI for annotation.
* Findings revealed significant gene amplifications linked to the blue whale's size and genetic variations between Pacific and Atlantic populations, highlighting the genome's potential for future biological and conservation studies.

Chromosome level genome assembly of the Etruscan shrew Suncus etruscus.

Yury V Bukhman, Susanne Meyer, Li-Fang Chu, Linelle Abueg, Jessica Antosiewicz-Bourget, Mark J P Chaisson

Sci Data

February 2024

Suncus etruscus is one of the world's smallest mammals, with an average body mass of about 2 grams. The Etruscan shrew's small body is accompanied by a very high energy demand and numerous metabolic adaptations. Here we report a chromosome-level genome assembly using PacBio long read sequencing, 10X Genomics linked short reads, optical mapping, and Hi-C linked reads.

Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy.

Delphine Larivière, Linelle Abueg, Nadolina Brajuka, Cristóbal Gallardo-Alba, Bjorn Grüning, Mark J P Chaisson

Nat Biotechnol

March 2024

Benchmarking of small and large variants across tandem repeats.

Adam English, Egor Dolzhenko, Helyaneh Ziaei Jam, Sean McKenzie, Nathan D Olson, Mark J P Chaisson

bioRxiv

November 2023

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples.

Advances in the discovery and analyses of human tandem repeats.

Mark J P Chaisson, Arvis Sulovari, Paul N Valdmanis, Danny E Miller, Evan E Eichler

Emerg Top Life Sci

December 2023

Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences.

HQAlign: aligning nanopore reads for SV detection using current-level modeling.

Dhaivat Joshi, Suhas Diggavi, Mark J P Chaisson, Sreeram Kannan

Bioinformatics

October 2023

Motivation: Detection of structural variants (SVs) from the alignment of sample DNA reads to the reference genome is an important problem in understanding human diseases. Long reads that can span repeat regions, along with an accurate alignment of these long reads, play an important role in identifying novel SVs. Long-read sequencing technologies, such as nanopore sequencing, can address this problem by providing very long reads but with high error rates, making accurate alignment challenging.

Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation.

Mikhail Kolmogorov, Kimberley J Billingsley, Mira Mastoras, Melissa Meredith, Jean Monlong, Mark Chaisson

Nat Methods

October 2023

Long-read sequencing technologies substantially overcome the limitations of short reads but have not been considered a feasible replacement for population-scale projects because they are some combination of too expensive, insufficiently scalable, or too error-prone. Here we develop an efficient and scalable wet lab and computational protocol, Napu, for Oxford Nanopore Technologies long-read sequencing that seeks to address those limitations. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the National Institutes of Health Center for Alzheimer's and Related Dementias.

vamos: variable-number tandem repeats annotation using efficient motif sets.

Jingwen Ren, Bida Gu, Mark J P Chaisson

Genome Biol

July 2023

Roughly 3% of the human genome is composed of variable-number tandem repeats (VNTRs): arrays of motifs of at least six bases. These loci are highly polymorphic, yet current approaches that define and merge variants based on alignment breakpoints do not capture their full diversity. Here we present a method, vamos (VNTR Annotation using efficient Motif Sets), that instead annotates VNTRs using repeat composition under different levels of motif diversity.

Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy.

Delphine Larivière, Linelle Abueg, Nadolina Brajuka, Cristóbal Gallardo-Alba, Bjorn Grüning, Mark Chaisson

bioRxiv

June 2023

Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years.

A draft human pangenome reference.

Wen-Wei Liao, Mobin Asri, Jana Ebler, Daniel Doerr, Marina Haukness, Mark J P Chaisson

Nature

May 2023

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels.

The motif composition of variable number tandem repeats impacts gene expression.

Tsung-Yu Lu, Paulina N Smaruj, Geoffrey Fudenberg, Nicholas Mancuso, Mark J P Chaisson

Genome Res

April 2023

Understanding the impact of DNA variation on human traits is a fundamental question in human genetics. Variable number tandem repeats (VNTRs) make up ∼3% of the human genome but are often excluded from association analysis owing to poor read mappability or divergent repeat content. Although methods exist to estimate VNTR length from short-read data, it is known that VNTRs vary in both length and repeat (motif) composition.

Structural variation across 138,134 samples in the TOPMed consortium.

Goo Jun, Adam C English, Ginger A Metcalf, Jianzhi Yang, Mark J P Chaisson

Res Sq

February 2023

Article Synopsis

* Researchers compiled a comprehensive catalog of 355,667 structural variants (SVs) from DNA data, with over half being novel, to better understand the relationship between SVs and diseases.
* The study involved rigorous methods to ensure high-quality variant identification, showing over 90% accuracy compared to previous genetic assemblies.
* This catalog reveals significant connections between SVs and various health traits, identifying 690 specific regions that may influence medically relevant genes, providing a crucial resource for disease research.

Structural variation across 138,134 samples in the TOPMed consortium.

Goo Jun, Adam C English, Ginger A Metcalf, Jianzhi Yang, Mark J P Chaisson

bioRxiv

January 2023

Ever larger Structural Variant (SV) catalogs highlighting the diversity within and between populations help researchers better understand the links between SVs and disease. The identification of SVs from DNA sequence data is non-trivial and requires a balance between comprehensiveness and precision. Here we present a catalog of 355,667 SVs (59.

HQAlign: Aligning nanopore reads for SV detection using current-level modeling.

Dhaivat Joshi, Suhas Diggavi, Mark J P Chaisson, Sreeram Kannan

ArXiv

January 2023

Motivation: Detection of structural variants (SVs) from the alignment of sample DNA reads to the reference genome is an important problem in understanding human diseases. Long reads that can span repeat regions, along with an accurate alignment of these long reads, play an important role in identifying novel SVs. Long-read sequencing technologies, such as nanopore sequencing, can address this problem by providing very long reads but with high error rates, making accurate alignment challenging.

HQAlign: Aligning nanopore reads for SV detection using current-level modeling.

Dhaivat Joshi, Suhas Diggavi, Mark J P Chaisson, Sreeram Kannan

bioRxiv

January 2023

Motivation: Detection of structural variants (SVs) from the alignment of sample DNA reads to the reference genome is an important problem in understanding human diseases. Long reads that can span repeat regions, along with an accurate alignment of these long reads, play an important role in identifying novel SVs. Long-read sequencing technologies, such as nanopore sequencing, can address this problem by providing very long reads but with high error rates, making accurate alignment challenging.

Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation.

Mikhail Kolmogorov, Kimberley J Billingsley, Mira Mastoras, Melissa Meredith, Jean Monlong, Mark Chaisson

bioRxiv

April 2023

Long-read sequencing technologies substantially overcome the limitations of short reads but to date have not been considered a feasible replacement at scale because they are some combination of too expensive, insufficiently scalable, or too error-prone. Here, we develop an efficient and scalable wet lab and computational protocol for Oxford Nanopore Technologies (ONT) long-read sequencing that seeks to provide a genuine alternative to short reads for large-scale genomics projects. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the NIH Center for Alzheimer's and Related Dementias (CARD).

A haplotype-resolved genome assembly of the Nile rat facilitates exploration of the genetic basis of diabetes.

Huishi Toh, Chentao Yang, Giulio Formenti, Kalpana Raja, Lily Yan, Mark J P Chaisson

BMC Biol

November 2022

Background: The Nile rat (Arvicanthis niloticus) is an important animal model because of its robust diurnal rhythm, a cone-rich retina, and a propensity to develop diet-induced diabetes without chemical or genetic modifications. A closer similarity to humans in these aspects, compared to the widely used Mus musculus and Rattus norvegicus models, holds the promise of better translation of research findings to the clinic.

Results: We report a 2.

Semi-automated assembly of high-quality diploid human reference genomes.

Erich D Jarvis, Giulio Formenti, Arang Rhie, Andrea Guarracino, Chentao Yang, Mark J P Chaisson

Nature

November 2022

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome.

TT-Mars: structural variants assessment based on haplotype-resolved assemblies.

Jianzhi Yang, Mark J P Chaisson

Genome Biol

May 2022

Variant benchmarking is often performed by comparing a test callset to a gold-standard set of variants. In repetitive regions of the genome, it may be difficult to establish the truth for a call, for example, when different alignment scoring metrics provide equally supported but different variant calls on the same data. Here, we provide an alternative approach, TT-Mars, that takes advantage of the recent production of high-quality haplotype-resolved genome assemblies by providing false discovery rates for variant calls based on how well each call reflects the content of the assembly, rather than comparing the calls themselves.

The Human Pangenome Project: a global resource to map genomic diversity.

Ting Wang, Lucinda Antonacci-Fulton, Kerstin Howe, Heather A Lawson, Julian K Lucas, Mark J P Chaisson

Nature

April 2022

The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation.
