Impact of post-alignment processing in variant discovery from whole exome data.

BMC Bioinformatics

Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st St SW, Rochester, MN, 55905, USA.

Published: October 2016

Background: GATK Best Practices workflows are widely used in large-scale sequencing projects and recommend post-alignment processing before variant calling. Two key post-processing steps are the computationally intensive local realignment around known INDELs and base quality score recalibration (BQSR). Both have been shown to reduce erroneous calls; however, these findings are mainly supported by an analytical pipeline that incorporates BWA and GATK UnifiedGenotyper. It is not known whether, or to what extent, post-processing benefits pipelines that implement other methods, especially given that both mappers and callers are frequently updated. Moreover, because sequencing platforms are upgraded regularly and the new platforms provide better estimates of read quality scores, it is also unclear whether post-processing is still needed. Finally, some regions of the human genome show high sequence divergence from the reference genome; it is unclear whether post-processing is beneficial in these regions.

Results: We used both simulated and NA12878 exome data to comprehensively assess the impact of post-processing for five or six popular mappers together with five callers. Focusing on chromosome 6p21.3, a region of high sequence divergence that harbors the human leukocyte antigen (HLA) system, we found that local realignment had little or no impact on SNP calling, but it increased sensitivity in INDEL calling for the Stampy + GATK UnifiedGenotyper pipeline. Local realignment had no effect or only a modest effect on the three haplotype-based callers, and there was no evidence of an effect on Novoalign. BQSR had a virtually negligible effect on INDEL calling and generally reduced sensitivity for SNP calling, to a degree that depended on caller, coverage and level of divergence. Specifically, for SAMtools and FreeBayes calling in regions of low divergence, BQSR reduced SNP calling sensitivity but improved precision when coverage was insufficient. However, in regions of high divergence (e.g., the HLA region), BQSR reduced the sensitivity of both callers with little gain in precision. For the other three callers, BQSR reduced sensitivity without increasing precision, regardless of coverage and divergence level.
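The sensitivity and precision figures discussed above are conventionally computed by comparing a call set against a truth set. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name, the variant keys, and the toy call sets are all hypothetical and chosen only to illustrate a case where filtering lowers sensitivity while raising precision.

```python
def sensitivity_precision(called, truth):
    """Return (sensitivity, precision) of a call set against a truth set.

    Variants are compared as hashable keys, here (chrom, pos, ref, alt).
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true positives: called and in truth
    fn = len(truth - called)   # false negatives: in truth but missed
    fp = len(called - truth)   # false positives: called but not in truth
    sensitivity = tp / (tp + fn) if truth else float("nan")
    precision = tp / (tp + fp) if called else float("nan")
    return sensitivity, precision


# Hypothetical toy data (not from the study).
truth = {("6", 100, "A", "G"), ("6", 200, "C", "T"),
         ("6", 300, "G", "A"), ("6", 400, "T", "C")}
raw_calls = {("6", 100, "A", "G"), ("6", 200, "C", "T"),
             ("6", 300, "G", "A"), ("6", 500, "A", "C")}
bqsr_calls = {("6", 100, "A", "G"), ("6", 200, "C", "T")}

print(sensitivity_precision(raw_calls, truth))   # (0.75, 0.75)
print(sensitivity_precision(bqsr_calls, truth))  # (0.5, 1.0)
```

In this made-up example the second call set loses two true variants (sensitivity drops from 0.75 to 0.5) but also drops the one false call (precision rises from 0.75 to 1.0), which is the kind of trade-off the abstract reports for some caller and coverage combinations.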

Conclusions: We demonstrated that the gain from post-processing is not universal; rather, it depends on the combination of mapper and caller, and the benefit is further influenced by sequencing depth and divergence level. Our analysis highlights the importance of considering these key factors when deciding whether to apply computationally intensive post-processing to Illumina exome data.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5048557
DOI: http://dx.doi.org/10.1186/s12859-016-1279-z
