RIG: Recalibration and interrelation of genomic sequence data with the GATK.

G3 (Bethesda)

Interdisciplinary Program in Genetics, Texas A&M University, College Station, Texas 77843; Biochemistry & Biophysics Department, Texas A&M University, College Station, Texas 77843

Published: February 2015

Recent advances in variant calling made available in the Genome Analysis Toolkit (GATK) enable the use of validated single-nucleotide polymorphisms and indels to improve variant calling. However, large collections of variants for this purpose often are unavailable to research communities. We introduce a workflow to generate reliable collections of single-nucleotide polymorphisms and indels by leveraging available genomic resources to inform variant calling using the GATK. The workflow is demonstrated for the crop plant Sorghum bicolor by (i) generating an initial set of variants using reduced representation sequence data from an experimental cross and association panels, (ii) using the initial variants to inform variant calling from whole-genome sequence data of resequenced individuals, and (iii) using variants identified from whole-genome sequence data for recalibration of the reduced representation sequence data. The reliability of variants called with the workflow is verified by comparison with genetically mappable variants from an independent sorghum experimental cross. Comparison with a recent sorghum resequencing study shows that the workflow identifies an additional 1.62 million high-confidence variants from the same sequence data. Finally, the workflow's performance is validated using Arabidopsis sequence data, yielding variant call sets with 95% sensitivity and 99% positive predictive value. The Recalibration and Interrelation of genomic sequence data with the GATK (RIG) workflow enables the GATK to accurately identify genetic variation in organisms lacking validated variant resources.
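The sensitivity and positive predictive value reported above can be illustrated with a minimal sketch. It assumes variants are compared as exact (chromosome, position, ref, alt) matches against a trusted truth set. The function name `sensitivity_and_ppv` and the toy variants are hypothetical and are not taken from the RIG paper, whose actual evaluation procedure is not described in this abstract.

```python
from typing import Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
# This exact-match key is an assumption made for illustration only.
Variant = Tuple[str, int, str, str]


def sensitivity_and_ppv(called: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) of a call set.

    sensitivity = TP / (TP + FN): fraction of true variants that were called.
    PPV         = TP / (TP + FP): fraction of called variants that are true.
    """
    true_pos = len(called & truth)    # called and present in the truth set
    false_neg = len(truth - called)   # true variants that were missed
    false_pos = len(called - truth)   # calls absent from the truth set

    sensitivity = true_pos / (true_pos + false_neg) if truth else float("nan")
    ppv = true_pos / (true_pos + false_pos) if called else float("nan")
    return sensitivity, ppv


# Toy data (hypothetical): four true variants, four calls, three shared.
truth = {
    ("Chr1", 100, "A", "G"),
    ("Chr1", 250, "C", "T"),
    ("Chr2", 75, "G", "GA"),
    ("Chr3", 10, "T", "C"),
}
called = {
    ("Chr1", 100, "A", "G"),
    ("Chr1", 250, "C", "T"),
    ("Chr2", 75, "G", "GA"),
    ("Chr2", 900, "A", "C"),
}

sens, ppv = sensitivity_and_ppv(called, truth)
print(f"sensitivity={sens:.2f} PPV={ppv:.2f}")
```

In practice, indels often need normalization (left-alignment and trimming) before exact-match comparison, so a real evaluation would likely add that step.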


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4390580
DOI: http://dx.doi.org/10.1534/g3.115.017012


