Motivation: Calling changes in DNA, e.g. as a result of somatic events in cancer, requires analysis of multiple matched sequenced samples. Events in low-mappability regions of the human genome are difficult to encode in variant call files and have been under-reported as a result. However, they can be described accurately through thesaurus annotation, a technique that links multiple genomic loci together to explicate a single variant.
Results: Here we describe software and benchmarks for using thesaurus annotation to detect point changes in DNA from matched samples. In benchmarks on matched normal/tumor samples, we show that the technique can recover between five and ten percent more true events than conventional approaches, while strictly limiting false discovery and remaining fully consistent with popular variant analysis workflows. We also demonstrate the utility of the approach for analysis of de novo mutations in parents/child families.
Availability and Implementation: Software performing thesaurus annotation is implemented in Java. It is available as source code on GitHub at GeneticThesaurus (https://github.com/tkonopka/GeneticThesaurus) and as an executable on SourceForge at geneticthesaurus (https://sourceforge.net/projects/geneticthesaurus). Mutation calling is implemented in an R package available on GitHub at RGeneticThesaurus (https://github.com/tkonopka/RGeneticThesaurus).
Supplementary Information: Supplementary data are available at Bioinformatics online.
Contact: tomasz.konopka@ludwig.ox.ac.uk.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4795618 | PMC |
http://dx.doi.org/10.1093/bioinformatics/btv654 | DOI Listing |
Brief Bioinform
November 2024
School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China.
Machine learning has emerged as a transformative tool for elucidating cellular heterogeneity in single-cell RNA sequencing. However, a significant challenge lies in the "black box" nature of deep learning models, which obscures the decision-making process and limits interpretability in cell status annotation. In this study, we introduced scGO, a Gene Ontology (GO)-inspired deep learning framework designed to provide interpretable cell status annotation for scRNA-seq data.
Sci Data
January 2025
The Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA.
The Homo sapiens Chromosomal Location Ontology (HSCLO) is designed to facilitate the integration of human genomic features into biomedical knowledge graphs from releases GRCh37 and GRCh38 at multiple resolutions. HSCLO comprises two distinct versions, HSCLO37 and HSCLO38, each tailored to its respective human genome release. This ontology supports the efficient integration and analysis of human genomic data across scales ranging from entire chromosomes to individual base pairs, thereby enhancing data retrieval and interoperability within large-scale biomedical datasets.
Int J Mol Sci
December 2024
Yantai Key Laboratory of Characteristic Agricultural Biological Resources Conservation and Germplasm Innovative Utilization, College of Life Sciences, Yantai University, Yantai 264005, China.
Powdery mildew, caused by Blumeria graminis f. sp. tritici (Bgt), is a disease that seriously harms wheat production and occurs in all wheat-producing areas around the world.
BMC Genomics
January 2025
College of Software, Nankai University, TianJin, China.
Background: Mining functional gene modules from genomic data is an important step to detect gene members of pathways or other relations such as protein-protein interactions. This work explores the plausibility of detecting functional gene modules by factorizing gene-phenotype association matrix from the phenotype ontology data rather than the conventionally used gene expression data. Recently, the hierarchical structure of phenotype ontologies has not been sufficiently utilized in gene clustering while functionally related genes are consistently associated with phenotypes on the same path in phenotype ontologies.
Int J Mol Sci
December 2024
Department of Plant Physiology, Institute for Biological Research "Siniša Stanković"-National Institute of Republic of Serbia, University of Belgrade, Bulevar Despota Stefana 142, 11108 Belgrade, Serbia.
Rafn. is a medicinal plant used as a model for studying plant developmental processes due to its developmental plasticity and ease of manipulation in vitro. Identifying the genes involved in its organogenesis and somatic embryogenesis (SE) is the first step toward unraveling the molecular mechanisms underlying its morphogenic plasticity.