The GIAB genomic stratifications resource for human reference genomes.

Nathan Dwarshuis, Divya Kalra, Jennifer McDaniel, Philippe Sanio, Pilar Alvarez Jerez, Bharati Jadhav, Wenyu Eddy Huang, Rajarshi Mondal, Ben Busby, Nathan D Olson, Fritz J Sedlazeck, Justin Wagner, Sina Majidian, Justin M Zook

Nat Commun

Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.

Published: October 2024

* The authors introduce "stratifications," BED files that delineate distinct genomic contexts, for GRCh37/38 and the newer T2T-CHM13 reference, which adds regions that were previously hard to sequence.
* They also compare benchmarking performance across these references, showing how the additional difficult regions in CHM13 lower overall performance, and provide a Snakemake pipeline for generating the stratifications to help evaluate sequencing platforms.

Despite the growing variety of sequencing and variant-calling tools, no workflow performs equally well across the entire human genome. Understanding context-dependent performance is critical for enabling researchers, clinicians, and developers to make informed tradeoffs when selecting sequencing hardware and software. Here we describe a set of "stratifications," which are BED files that define distinct contexts throughout the genome. We define these for GRCh37/38 as well as the new T2T-CHM13 reference, adding many new hard-to-sequence regions that are critical for understanding performance as the field progresses. Specifically, we highlight the increase in hard-to-map and GC-rich stratifications in CHM13 relative to the previous references. We then compare the benchmarking performance with each reference and show the performance penalty brought about by these additional difficult regions in CHM13. Additionally, we demonstrate how the stratifications can track context-specific improvements over different platform iterations, using Oxford Nanopore Technologies as an example. The means to generate these stratifications are available as a Snakemake pipeline at https://github.com/usnistgov/giab-stratifications. We anticipate that this resource will be useful in enabling precise risk-reward calculations when building sequencing pipelines for any of the commonly used reference genomes.
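To make the idea of a stratification concrete, the following minimal Python sketch shows how a BED file can partition benchmarking outcomes by genomic context. It is an illustration only, not the authors' pipeline or any benchmarking tool's implementation: the function names (`load_bed`, `merge`, `contains`, `tally`), the toy intervals, and the toy calls labelled TP/FP/FN are all hypothetical. The only format fact it relies on is that BED coordinates are 0-based and half-open, whereas variant positions are conventionally 1-based.

```python
from bisect import bisect_right
from collections import defaultdict
from io import StringIO


def merge(intervals):
    """Collapse sorted, possibly overlapping (start, end) pairs into disjoint ones."""
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def load_bed(handle):
    """Read BED lines into {chrom: sorted disjoint [(start, end), ...]}."""
    regions = defaultdict(list)
    for line in handle:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and optional header lines
        chrom, start, end = line.split("\t")[:3]
        regions[chrom].append((int(start), int(end)))
    return {chrom: merge(sorted(ivs)) for chrom, ivs in regions.items()}


def contains(regions, chrom, pos0):
    """True if the 0-based position lies inside any interval on chrom."""
    ivs = regions.get(chrom, [])
    # Index of the last interval whose start is <= pos0.
    i = bisect_right(ivs, (pos0, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= pos0 < ivs[i][1]


def tally(regions, calls):
    """Count outcomes inside and outside the stratification.

    calls: iterable of (chrom, 1-based position, outcome label).
    """
    counts = {"inside": defaultdict(int), "outside": defaultdict(int)}
    for chrom, pos1, outcome in calls:
        key = "inside" if contains(regions, chrom, pos1 - 1) else "outside"
        counts[key][outcome] += 1
    return counts


# Hypothetical stratification (e.g. a "hard-to-map" context) and toy calls.
toy_bed = StringIO("chr1\t100\t200\nchr1\t150\t300\nchr2\t0\t50\n")
toy_calls = [
    ("chr1", 101, "TP"),  # first base of the merged chr1 interval
    ("chr1", 300, "TP"),  # last base of the merged chr1 interval
    ("chr1", 301, "FN"),  # one base past the interval end
    ("chr2", 50, "FP"),
    ("chr2", 51, "TP"),
    ("chr3", 10, "TP"),   # chromosome absent from the BED file
]

regions = load_bed(toy_bed)
counts = tally(regions, toy_calls)
for key in ("inside", "outside"):
    print(key, dict(counts[key]))
```

Comparing the inside and outside tallies for each stratification is what allows performance to be reported per context rather than as a single genome-wide number. In practice an established benchmarking tool would do this step; the sketch only illustrates the coordinate logic.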

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11489684
DOI: http://dx.doi.org/10.1038/s41467-024-53260-y

