Publications by Chen-Shan Chin

Complex genetic variation in nearly complete human genomes.

Glennis A Logsdon, Peter Ebert, Peter A Audano, Mark Loftus, David Porubsky, Chen-Shan Chin

bioRxiv

September 2024

Article Synopsis

* It achieves a high level of completeness, closing 92% of previous assembly gaps and fully assembling complex regions, including 1,852 complex structural variants and 1,246 human centromeres.
* The findings lead to significant improvements in genotyping accuracy and enable the detection of over 26,000 structural variants per sample, enhancing the potential for future disease association research.

Recurrent evolution and selection shape structural diversity at the amylase locus.

Davide Bolognini, Alma Halgren, Runyang Nicolas Lou, Alessandro Raveane, Joana L Rocha, Chen-Shan Chin

Nature

October 2024

The adoption of agriculture triggered a rapid shift towards starch-rich diets in human populations. Amylase genes facilitate starch digestion, and increased amylase copy number has been observed in some modern human populations with high-starch intake, although evidence of recent selection is lacking. Here, using 94 long-read haplotype-resolved assemblies and short-read data from approximately 5,600 contemporary and ancient humans, we resolve the diversity and evolutionary history of structural variation at the amylase locus.

The complete sequence and comparative analysis of ape sex chromosomes.

Kateryna D Makova, Brandon D Pickett, Robert S Harris, Gabrielle A Hartley, Monika Cechova, Chen-Shan Chin

Nature

June 2024

Article Synopsis

* Apes have two sex chromosomes: the essential Y chromosome for male reproduction and the X chromosome necessary for both reproduction and cognition, with differences in mating patterns affecting their function.
* Studying these chromosomes is challenging due to their repetitive structures, but researchers created gapless assemblies for five great apes and one lesser ape to explore their evolutionary complexities.
* The Y chromosomes are highly variable and undergo significant changes compared to the more stable X chromosomes, and this research can provide insights into human evolution and aid in the conservation of endangered ape species.

The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes.

Kateryna D Makova, Brandon D Pickett, Robert S Harris, Gabrielle A Hartley, Monika Cechova, Chen-Shan Chin

bioRxiv

December 2023

Article Synopsis

* Apes have two main sex chromosomes, X and Y, where Y is crucial for male reproduction and its deletions can lead to infertility, while X is important for both reproduction and brain function.
* Recent advancements in genomic techniques helped researchers create complete structures of the X and Y chromosomes for multiple great ape species, allowing them to explore their evolutionary complexities.
* Findings indicate that Y chromosomes are highly variable and undergo rapid changes due to unique genetic regions and transposable elements, while X chromosomes are more stable, highlighting differing evolutionary paths among great ape species.

Genomic variant benchmark: if you cannot measure it, you cannot improve it.

Sina Majidian, Daniel Paiva Agustinho, Chen-Shan Chin, Fritz J Sedlazeck, Medhat Mahmoud

Genome Biol

October 2023

Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performances of sequencing technologies and analytical methods and highlight future challenges. However, they depend on sequencing technology, reference genome, and available benchmarking methods.

Long-read whole-genome analysis of human single cells.

Joanna Hård, Jeff E Mold, Jesper Eisfeldt, Christian Tellgren-Roth, Susana Häggqvist, Chen-Shan Chin

Nat Commun

August 2023

Long-read sequencing has dramatically increased our understanding of human genome variation. Here, we demonstrate that long-read technology can give new insights into the genomic architecture of individual cells. Clonally expanded CD8+ T-cells from a human donor were subjected to droplet-based multiple displacement amplification (dMDA) to generate long molecules with reduced bias.

The complete sequence of a human Y chromosome.

Arang Rhie, Sergey Nurk, Monika Cechova, Savannah J Hoyt, Dylan J Taylor, Chen-Shan Chin

Nature

September 2023

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region.

Multiscale analysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes.

Chen-Shan Chin, Sairam Behera, Asif Khalak, Fritz J Sedlazeck, Peter H Sudmant

Nat Methods

August 2023

Advancements in sequencing technologies and assembly methods enable the regular production of high-quality genome assemblies characterizing complex regions. However, challenges remain in efficiently interpreting variation at various scales, from smaller tandem repeats to megabase rearrangements, across many human genomes. We present a PanGenome Research Tool Kit (PGR-TK) enabling analyses of complex pangenome structural and haplotype variation at multiple scales.

Benchmarking challenging small variants with linked and long reads.

Justin Wagner, Nathan D Olson, Lindsay Harris, Ziad Khan, Jesse Farek, Chen-Shan Chin

Cell Genom

May 2022

Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as .

Semi-automated assembly of high-quality diploid human reference genomes.

Erich D Jarvis, Giulio Formenti, Arang Rhie, Andrea Guarracino, Chentao Yang, Chen-Shan Chin

Nature

November 2022

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome.

A complete reference genome improves analysis of human genetic variation.

Sergey Aganezov, Stephanie M Yan, Daniela C Soto, Melanie Kirsche, Samantha Zarate, Chen-Shan Chin

Science

April 2022

Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery.

The complete sequence of a human genome.

Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V Bzikadze, Chen-Shan Chin

Science

April 2022

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding.

Ten simple rules for large-scale data processing.

Arkarachai Fungtammasan, Alexandra Lee, Jaclyn Taroni, Kurt Wheeler, Chen-Shan Chin

PLoS Comput Biol

February 2022

Curated variation benchmarks for challenging medically relevant autosomal genes.

Justin Wagner, Nathan D Olson, Lindsay Harris, Jennifer McDaniel, Haoyu Cheng, Chen-Shan Chin

Nat Biotechnol

May 2022

The repetitive nature and complexity of some medically relevant genes pose a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly 400 medically relevant genes due to their repetitiveness or polymorphic complexity. Here, we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly.

Dynamic prediction of renal survival among deeply phenotyped kidney transplant recipients using artificial intelligence: an observational, international, multicohort study.

Marc Raynaud, Olivier Aubert, Gillian Divard, Peter P Reese, Nassim Kamar, Chen-Shan Chin

Lancet Digit Health

December 2021

Background: Kidney allograft failure is a common cause of end-stage renal disease. We aimed to develop a dynamic artificial intelligence approach to enhance risk stratification for kidney transplant recipients by generating continuously refined predictions of survival using updates of clinical data.

Methods: In this observational study, we used data from adult recipients of kidney transplants from 18 academic transplant centres in Europe, the USA, and South America, and a cohort of patients from six randomised controlled trials.

An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates.

Ann M Mc Cartney, Medhat Mahmoud, Michael Jochum, Daniel Paiva Agustinho, Barry Zorman, Chen-Shan Chin

F1000Res

October 2021

Article Synopsis

* The event aimed to assess the current status of research, highlight ongoing challenges, and explore how to leverage various strengths to enhance scientific progress.
* Over four days, eight groups developed new open-source methods to improve species variation analysis and created a resource for the research community, with daily summaries and methods available on GitHub.

A draft reference assembly of the genome.

Kevin McKernan, Liam T Kane, Seth Crawford, Chen-Shan Chin, Aaron Trippe

F1000Res

August 2021

We describe the use of high-fidelity single molecule sequencing to assemble the genome of the psychoactive mushroom. The genome is 46.6Mb, 46% GC, and in 32 contigs with an N50 of 3.

Shotgun transcriptome, spatial omics, and isothermal profiling of SARS-CoV-2 infection reveals unique host responses, viral diversification, and drug interactions.

Daniel Butler, Christopher Mozsary, Cem Meydan, Jonathan Foox, Joel Rosiene, Chen-Shan Chin

Nat Commun

March 2021

In less than nine months, the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) killed over a million people, including >25,000 in New York City (NYC) alone. The COVID-19 pandemic caused by SARS-CoV-2 highlights clinical needs to detect infection, track strain evolution, and identify biomarkers of disease course. To address these challenges, we designed a fast (30-minute) colorimetric test (LAMP) for SARS-CoV-2 infection from naso/oropharyngeal swabs and a large-scale shotgun metatranscriptomics platform (total-RNA-seq) for host, viral, and microbial profiling.

Chromosome-scale, haplotype-resolved assembly of human genomes.

Shilpa Garg, Arkarachai Fungtammasan, Andrew Carroll, Mike Chou, Anthony Schmitt, Chen-Shan Chin

Nat Biotechnol

March 2021

Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method named diploid assembly (DipAsm) that uses long, accurate reads and long-range conformation data for single individuals to generate a chromosome-scale phased assembly within 1 day.

Amplification-free long-read sequencing reveals unforeseen CRISPR-Cas9 off-target activity.

Ida Höijer, Josefin Johansson, Sanna Gudmundsson, Chen-Shan Chin, Ignas Bunikis

Genome Biol

December 2020

Background: One ongoing concern about CRISPR-Cas9 genome editing is that unspecific guide RNA (gRNA) binding may induce off-target mutations. However, accurate prediction of CRISPR-Cas9 off-target activity is challenging. Here, we present SMRT-OTS and Nano-OTS, two novel, amplification-free, long-read sequencing protocols for detection of gRNA-driven digestion of genomic DNA by Cas9 in vitro.

A diploid assembly-based benchmark for variants in the major histocompatibility complex.

Chen-Shan Chin, Justin Wagner, Qiandong Zeng, Erik Garrison, Shilpa Garg

Nat Commun

September 2020

Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a medically important, highly variable, 5 million base-pair (bp) region where diploid assembly is particularly useful - the Major Histocompatibility Complex (MHC). Here, we develop a human genome benchmark derived from a diploid assembly for the openly-consented Genome in a Bottle sample HG002.

Trajectories of glomerular filtration rate and progression to end stage kidney disease after kidney transplantation.

Marc Raynaud, Olivier Aubert, Peter P Reese, Yassine Bouatou, Maarten Naesens, Chen-Shan Chin

Kidney Int

January 2021

Although the gold standard of monitoring kidney transplant function relies on glomerular filtration rate (GFR), little is known about GFR trajectories after transplantation, their determinants, and their association with outcomes. To evaluate these parameters we examined kidney transplant recipients receiving care at 15 academic centers. Patients underwent prospective monitoring of estimated GFR (eGFR) measurements, with assessment of clinical, functional, histological and immunological parameters.

Ribbon: intuitive visualization for complex genomic variation.

Maria Nattestad, Robert Aboukhalil, Chen-Shan Chin, Michael C Schatz

Bioinformatics

April 2021

Summary: Ribbon is an alignment visualization tool that shows how alignments are positioned within both the reference and read contexts, giving an intuitive view that enables a better understanding of structural variants and the read evidence supporting them. Ribbon was born out of a need to curate complex structural variant calls and determine whether each was well supported by long-read evidence, and it uses the same intuitive visualization method to shed light on contig alignments from genome-to-genome comparisons.

Availability and Implementation: Ribbon is freely available online at http://genomeribbon.

Effect of sequence depth and length in long-read assembly of the maize inbred NC358.

Shujun Ou, Jianing Liu, Kapeel M Chougule, Arkarachai Fungtammasan, Arun S Seetharam, Chen-Shan Chin

Nat Commun

May 2020

Article Synopsis

* Advances in long-read data and scaffolding technologies have led to improved reference-quality genome assemblies, particularly for complex genomes like maize.
* Critical assessments of sequence depth and read length are essential for effective resource allocation when generating these assemblies.
* The study highlights that higher depth and longer subread lengths significantly enhance assembly quality, with high-quality optical maps further improving the contiguity of fragmented assemblies.

A strategy for building and using a human reference pangenome.

Bastien Llamas, Giuseppe Narzisi, Valerie Schneider, Peter A Audano, Evan Biederstedt, Chen-Shan Chin

F1000Res

October 2019

In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes.
