We present the results of the human genomic small variant calling benchmarking initiative of the German Research Foundation (DFG)-funded Next Generation Sequencing Competence Network (NGS-CN) and the German Human Genome-Phenome Archive (GHGA). In this effort, we developed NCBench, a continuous benchmarking platform for evaluating small genomic variant callsets in terms of recall, precision, and false positive/negative error patterns. NCBench is implemented as a continuously re-evaluated open-source repository. We show that it is possible to rely entirely on free public infrastructure (GitHub, GitHub Actions, Zenodo) in combination with established open-source tools. NCBench is agnostic to the dataset used and can evaluate an arbitrary number of callsets, reporting the results in a visual and interactive way. We used NCBench to evaluate over 40 callsets generated by the variant calling pipelines available in the participating groups, run on three exome datasets from different enrichment kits and at different coverages. While all pipelines achieve high overall quality, subtle systematic differences between callers and datasets exist and are made apparent by NCBench. These insights are useful for improving existing pipelines and developing new workflows. NCBench is open to the contribution of any callset. Most importantly, authors no longer need to re-implement paper-specific variant calling benchmarks each time they publish a new tool or pipeline, while readers can (continuously) observe the performance of tools and pipelines at the time of reading instead of at the time of writing.
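
To make the reported metrics concrete, the following is a minimal sketch of how precision and recall follow from true positive, false positive, and false negative counts. It is not NCBench's implementation: the function `evaluate_callset`, the `Variant` tuple layout, and the toy variants are hypothetical, and exact matching on (chromosome, position, ref, alt) is a simplifying assumption. Real benchmarks rely on dedicated comparison tools that handle differing variant representations and restrict the evaluation to confident regions.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    tp: int
    fp: int
    fn: int
    precision: float
    recall: float


def evaluate_callset(truth: Set[Variant], calls: Set[Variant]) -> Metrics:
    """Compare a callset against a truth set by exact key matching (a simplification)."""
    tp = len(truth & calls)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # present in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return Metrics(tp, fp, fn, precision, recall)


# Toy data, invented for illustration only.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "GA"),
    ("chr2", 75, "T", "C"),
}
calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "GA"),
    ("chr3", 10, "A", "T"),
}

m = evaluate_callset(truth, calls)
print(f"precision={m.precision:.2f} recall={m.recall:.2f}")
```

In this toy case three of the four calls match the truth set and one truth variant is missed, so the sketch reports one false positive and one false negative.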

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11428021
DOI: http://dx.doi.org/10.12688/f1000research.140344.1

Similar Publications

Genomic and phenotypic correlates of mosaic loss of chromosome Y in blood.

Am J Hum Genet

January 2025

Division of Biostatistics, Data Science Institute, Medical College of Wisconsin, Milwaukee, WI, USA; Cancer Center, Medical College of Wisconsin, Milwaukee, WI, USA.

Mosaic loss of Y (mLOY) is the most common somatic chromosomal alteration detected in human blood. The presence of mLOY is associated with altered blood cell counts and increased risk of Alzheimer disease, solid tumors, and other age-related diseases. We sought to gain a better understanding of genetic drivers and associated phenotypes of mLOY through analyses of whole-genome sequencing (WGS) of a large set of genetically diverse males from the Trans-Omics for Precision Medicine (TOPMed) program.

Primary ciliary dyskinesia (PCD, OMIM 244400) is a rare genetic disorder that affects motile cilia and is characterised by impaired mucociliary clearance of the airway epithelium, which results in chronic upper and lower airway infections. While short-read next-generation sequencing technology has been used for the genetic testing of PCD, its effectiveness is limited in identifying variants in the gene because of the nearly identical pseudogene As we confirmed that the gene was not expressed in airway cells, we obtained nasal mucosa biopsy specimens for total RNA sequencing (RNA-seq) with library enrichment using exome oligos. Among the 34 nasal samples from patients suspected of having PCD, three aberrant splicing patterns in were identified in two samples.

Clair3-RNA: A deep learning-based small variant caller for long-read RNA sequencing data.

bioRxiv

January 2025

Department of Computer Science, School of Computing and Data Science, University of Hong Kong, Hong Kong, China.

Variant calling using long-read RNA sequencing (lrRNA-seq) can be applied to diverse tasks, such as capturing full-length isoforms and gene expression profiling. It poses challenges, however, due to higher error rates than DNA data, the complexities of transcript diversity, RNA editing events, etc. In this paper, we propose Clair3-RNA, the first deep learning-based variant caller tailored for lrRNA-seq data.

Background And Aims: Familial hypercholesterolemia (FH) and other disorders with similar features are common genetic disorders that remain underdiagnosed and undertreated, due in part to the cost of screening. The aim of this study was to design and implement a whole gene targeted NGS panel for the molecular diagnosis of FH and statin intolerance with an emphasis on high quality variant calling, including copy number analysis.

Methods: A whole gene panel for hybridisation-based short read NGS was designed for the dominant FH-genes low density lipoprotein receptor (), apolipoprotein B (APOB), proprotein convertase subtilisin/kexin type 9 (PCSK9), apolipoprotein E (APOE) and the recessive FH-genes low density lipoprotein receptor adaptor protein 1 (), ATP binding cassette subfamily G member 5/8 (ABCG5/8) and lipase A, lysosomal acid type (), as well as solute carrier organic anion transporter family member 1B1 (), not an FH gene but linked to statin intolerance.

Genotypic and phenotypic diversity of Mycobacterium tuberculosis strains from eastern India.

Infect Genet Evol

January 2025

Immunogenomics & Systems Biology group, Institute of Life Sciences (ILS), Bhubaneswar, Odisha, India; School of Biotechnology, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, Odisha, India.

Whole genome sequencing has been used to investigate the genomic diversity of M. tuberculosis in the northern and southern states of India, but information about the eastern part of the country is still limited. Through a sequencing-based strategy, this study seeks to comprehend the diversity and drug resistance pattern in the eastern region.
