Detailed simulation of cancer exome sequencing data reveals differences and common limitations of variant callers.

BMC Bioinformatics

Department of Biosystems Science and Engineering, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland.

Published: January 2017

Background: Next-generation sequencing of matched tumor and normal biopsy pairs has become a technology of paramount importance for precision cancer treatment. Sequencing costs have dropped tremendously, so the whole exome of a tumor can now be sequenced for just a fraction of the total treatment costs. However, clinicians and scientists cannot take full advantage of the generated data because the accuracy of analysis pipelines is limited. This particularly concerns the reliable identification of subclonal mutations that occur at very low frequencies in a cancer tissue sample and which may be clinically relevant.
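To illustrate why low-frequency subclonal mutations are hard to detect, assume that each read covering a site carries the mutant allele independently with probability equal to the variant allele frequency (VAF). The number of supporting reads is then binomially distributed over the coverage. The sketch below is an illustration under that assumption, not the simulation used in the study: it computes the chance of observing at least a hypothetical minimum of four alternative reads, and it ignores sequencing errors, tumor purity, and mapping artifacts.

```python
from math import comb


def detection_probability(coverage: int, vaf: float, min_alt_reads: int) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(coverage, vaf).

    Illustrative model only: reads are assumed independent and error-free.
    """
    # Sum the lower tail P(X < min_alt_reads) and return its complement.
    lower_tail = sum(
        comb(coverage, i) * vaf**i * (1 - vaf) ** (coverage - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - lower_tail


# Hypothetical calling threshold: at least 4 reads supporting the variant.
for coverage in (50, 100, 200):
    for vaf in (0.02, 0.05, 0.10):
        p = detection_probability(coverage, vaf, min_alt_reads=4)
        print(f"coverage={coverage:4d}  VAF={vaf:.2f}  P(>=4 alt reads)={p:.3f}")
```

Even in this idealized model, a variant at a VAF of a few percent is often missed at moderate coverage, which is why performance has to be assessed as a function of both VAF and coverage.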

Results: Using simulations based on kidney tumor data, we compared the performance of nine state-of-the-art variant callers, namely deepSNV, GATK HaplotypeCaller, GATK UnifiedGenotyper, JointSNVMix2, MuTect, SAMtools, SiNVICT, SomaticSniper, and VarScan2, as a function of variant allele frequency and coverage. Our analysis revealed that deepSNV and JointSNVMix2 perform very well, especially in the low-frequency range. We attributed the false positive and false negative calls of the nine tools to specific error sources and assigned them to processing steps of the pipeline. All of these errors can be expected to occur in real data sets. We found that modifying certain steps of the pipeline or parameters of the tools can lead to substantial improvements in performance. Furthermore, a novel integration strategy that combines the ranks of the variants yielded the best performance. More precisely, at a fixed precision of 90%, the rank-combination of deepSNV, JointSNVMix2, MuTect, SiNVICT and VarScan2 reached a sensitivity of 78% and outperformed every individual tool, the best of which reached a sensitivity of 71% at the same precision.
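The two ideas in the results, combining callers by the ranks of their variants and reporting sensitivity at a fixed precision, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the combination is the mean of per-tool ranks, that each tool's scores have already been oriented so that a higher score means a more confident call, and that a variant a tool did not report receives a rank worse than any reported one. The tool names, positions, and scores in the usage part are hypothetical.

```python
from typing import Dict, List, Set


def rank_combine(scores_by_tool: Dict[str, Dict[str, float]]) -> List[str]:
    """Order candidate variants by their mean rank across tools (rank 1 = best).

    Assumptions: higher score = more confident; a variant missing from a tool's
    output gets rank len(candidates), worse than any rank that tool assigned.
    """
    candidates = sorted(set().union(*scores_by_tool.values()))
    worst = len(candidates)
    rank_sums = {v: 0.0 for v in candidates}
    for scores in scores_by_tool.values():
        ordered = sorted(scores, key=lambda v: scores[v], reverse=True)
        ranks = {v: i + 1 for i, v in enumerate(ordered)}
        for v in candidates:
            rank_sums[v] += ranks.get(v, worst)
    n_tools = len(scores_by_tool)
    # Mean rank first; variant name breaks ties deterministically.
    return sorted(candidates, key=lambda v: (rank_sums[v] / n_tools, v))


def sensitivity_at_precision(ranked: List[str], truth: Set[str], min_precision: float) -> float:
    """Largest sensitivity over all top-k cut-offs whose precision is >= min_precision."""
    best = 0.0
    true_positives = 0
    for k, variant in enumerate(ranked, start=1):
        if variant in truth:
            true_positives += 1
        if true_positives / k >= min_precision:
            best = max(best, true_positives / len(truth))
    return best


# Hypothetical example: two callers, three true variants, one false call.
calls = {
    "toolA": {"chr1:100": 0.9, "chr1:200": 0.8, "chr2:50": 0.3},
    "toolB": {"chr1:100": 12.0, "chr2:50": 9.0, "chr3:10": 1.0},
}
truth = {"chr1:100", "chr1:200", "chr2:50"}
ranked = rank_combine(calls)
print(ranked)
print(sensitivity_at_precision(ranked, truth, min_precision=0.9))
```

Working with ranks rather than raw scores sidesteps the fact that different callers report scores on incompatible scales.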

Conclusions: The choice of well-performing tools for alignment and variant calling is crucial for the correct interpretation of exome sequencing data obtained from mixed samples, and common pipelines are suboptimal. We were able to relate the substantial differences we observed in performance to the underlying statistical models of the tools, and to pinpoint the error sources of false positive and false negative calls. These findings might inspire new software developments that improve exome sequencing pipelines and further the field of precision cancer treatment.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5209852
DOI: http://dx.doi.org/10.1186/s12859-016-1417-7

