Comparison of genetic variants in matched samples using thesaurus annotation.

Bioinformatics

Ludwig Institute for Cancer Research, University of Oxford, Oxford, UK.

Published: March 2016

Motivation: Calling changes in DNA, e.g. as a result of somatic events in cancer, requires analysis of multiple matched sequenced samples. Events in low-mappability regions of the human genome are difficult to encode in variant call files and have been under-reported as a result. However, they can be described accurately through thesaurus annotation, a technique that links multiple genomic loci together to explicate a single variant.

Results: Here we describe software and benchmarks for using thesaurus annotation to detect point changes in DNA from matched samples. In benchmarks on matched normal/tumor samples, we show that the technique can recover between five and ten percent more true events than conventional approaches, while strictly limiting false discovery and remaining fully consistent with popular variant analysis workflows. We also demonstrate the utility of the approach for analysis of de novo mutations in parents/child families.

Availability and Implementation: Software performing thesaurus annotation is implemented in Java. It is available as source code on GitHub at GeneticThesaurus (https://github.com/tkonopka/GeneticThesaurus) and as an executable on SourceForge at geneticthesaurus (https://sourceforge.net/projects/geneticthesaurus). Mutation calling is implemented in an R package available on GitHub at RGeneticThesaurus (https://github.com/tkonopka/RGeneticThesaurus).

Supplementary Information: Supplementary data are available at Bioinformatics online.

Contact: tomasz.konopka@ludwig.ox.ac.uk.
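
The idea described under Motivation, linking several loci so that one variant can be assessed across all of them, can be illustrated with a toy sketch. This is not the GeneticThesaurus or RGeneticThesaurus implementation: the `Counts` type, the function names, the thresholds, and the pooling rule are all assumptions chosen for illustration. The sketch pools read evidence over a locus and its thesaurus-linked loci before comparing a matched normal/tumor pair.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical per-locus read summary (not a format used by the tools)."""
    alt: int    # reads supporting the alternative allele
    total: int  # all reads covering the locus


def pooled_fraction(locus, thesaurus, counts):
    """Alt-allele fraction pooled over a locus and its thesaurus-linked loci."""
    loci = [locus] + thesaurus.get(locus, [])
    alt = sum(counts[l].alt for l in loci if l in counts)
    total = sum(counts[l].total for l in loci if l in counts)
    return alt / total if total else 0.0


def is_candidate_change(locus, thesaurus, normal, tumor,
                        min_tumor=0.05, max_normal=0.01):
    """Flag a point change seen in the tumor but not in the matched normal.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    return (pooled_fraction(locus, thesaurus, tumor) >= min_tumor
            and pooled_fraction(locus, thesaurus, normal) <= max_normal)


# Toy data: a locus is a (chromosome, position) pair, and the thesaurus
# links chr1:1000 to a similar-sequence locus at chr5:2000.
thesaurus = {("chr1", 1000): [("chr5", 2000)]}
normal = {("chr1", 1000): Counts(0, 40), ("chr5", 2000): Counts(0, 40)}
tumor = {("chr1", 1000): Counts(6, 40), ("chr5", 2000): Counts(6, 40)}

flagged = is_candidate_change(("chr1", 1000), thesaurus, normal, tumor)
```

In this toy case reads from the two linked loci are counted together, so the change is judged on the combined evidence rather than on either ambiguous alignment alone.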

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4795618
DOI: http://dx.doi.org/10.1093/bioinformatics/btv654
