A benchmarking study of individual somatic variant callers and voting-based ensembles for whole-exome sequencing.

Brief Bioinform

Predictive Oncology Laboratory, Marseille Research Cancer Center, INSERM U1068, CNRS U7258, Institut Paoli-Calmettes, Aix-Marseille University, Equipe labellisée « Ligue Nationale Contre le Cancer », 13009 Marseille, France.

Published: November 2024

Article Abstract

By identifying somatic mutations, whole-exome sequencing (WES) has become a technology of choice for diagnosis and for guiding treatment decisions in many cancers. Despite advances in somatic variant detection and the emergence of sophisticated tools incorporating machine learning, accurately identifying somatic variants remains challenging. Each new somatic variant caller is often accompanied by claims of superior performance over its predecessors. Furthermore, most comparative studies focus on a limited set of tools and reference datasets, which leads to inconsistent results and makes it difficult for laboratories to select the optimal solution. Our study comprehensively evaluated 20 somatic variant callers across four reference WES datasets. We then assessed the performance of ensemble approaches by exploring all possible combinations of these callers, generating 8178 and 1013 combinations for single-nucleotide variants (SNVs) and indels, respectively, each evaluated at varying voting thresholds. Our analysis identified five high-performing individual somatic variant callers: Muse, Mutect2, Dragen, TNScope, and NeuSomatic. For somatic SNVs, an ensemble combining LoFreq, Muse, Mutect2, SomaticSniper, Strelka, and Lancet outperformed the top-performing individual caller (Dragen) by >3.6% (mean F1 score = 0.927). Similarly, for somatic indels, an ensemble of Mutect2, Strelka, Varscan2, and Pindel outperformed the best individual caller (NeuSomatic) by >3.5% (mean F1 score = 0.867). By also considering the computational cost of each combination, we identified an optimal solution involving four somatic variant callers: Muse, Mutect2, and Strelka for SNVs, and Mutect2, Strelka, and Varscan2 for indels. This solution enables accurate and cost-effective somatic variant detection in whole-exome data.
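The voting-based ensemble idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The caller names (`caller_a`, `caller_b`, `caller_c`), the variant keys, the truth set, and the helpers `vote`, `f1_score`, and `best_ensemble` are all hypothetical and introduced only for illustration. The sketch assumes each caller's output has already been reduced to a set of `(chrom, pos, ref, alt)` tuples, and that a variant is kept when at least `min_votes` callers in the combination report it.

```python
from collections import Counter
from itertools import combinations

# A variant is keyed by (chrom, pos, ref, alt). All data below are invented toy values.
Variant = tuple[str, int, str, str]


def vote(call_sets: dict[str, set[Variant]], min_votes: int) -> set[Variant]:
    """Keep variants reported by at least `min_votes` of the given callers."""
    counts = Counter(v for calls in call_sets.values() for v in calls)
    return {v for v, n in counts.items() if n >= min_votes}


def f1_score(predicted: set[Variant], truth: set[Variant]) -> float:
    """Harmonic mean of precision and recall against a truth set."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def best_ensemble(call_sets: dict[str, set[Variant]], truth: set[Variant], min_size: int = 2):
    """Exhaustively score every caller combination at every voting threshold."""
    best = (0.0, (), 0)  # (F1, caller names, voting threshold)
    names = sorted(call_sets)
    for size in range(min_size, len(names) + 1):
        for combo in combinations(names, size):
            subset = {name: call_sets[name] for name in combo}
            for k in range(1, size + 1):
                score = f1_score(vote(subset, k), truth)
                if score > best[0]:
                    best = (score, combo, k)
    return best


# Hypothetical truth set and caller outputs (not taken from the study).
A, B, C, D = ("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "T", "G"), ("chr3", 75, "C", "A")
X, Y, Z = ("chr4", 10, "A", "G"), ("chr5", 20, "C", "T"), ("chr6", 30, "G", "A")
truth = {A, B, C, D}
calls = {
    "caller_a": {A, B, X},
    "caller_b": {A, B, C, Y},
    "caller_c": {B, C, D, Z},
}

score, combo, k = best_ensemble(calls, truth)
print(f"best F1={score:.3f} with {combo} at >= {k} votes")
```

As a side observation, the reported combination counts are consistent with subsets of at least two callers drawn from 13 SNV callers and 10 indel callers (2^13 - 13 - 1 = 8178 and 2^10 - 10 - 1 = 1013), although the abstract does not state this explicitly. A cost-aware selection like the one the abstract mentions could extend the sketch by penalizing each combination by an assumed per-caller runtime.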

Source
http://dx.doi.org/10.1093/bib/bbae697
