categoryCompare, an analytical tool based on feature annotations.

Robert M Flight, Benjamin J Harrison, Fahim Mohammad, Mary B Bunge, Lawrence D F Moon, Jeffrey C Petruska, Eric C Rouchka

Front Genet

Bioinformatics and Biomedical Computing Laboratory, Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY, USA.

Published: May 2014

Assessment of high-throughput -omics data initially focuses on relative or raw levels of a particular feature, such as an expression value for a transcript, protein, or metabolite. At a second level, analyses of annotations, including known or predicted functions and associations of each individual feature, attempt to distill biological context. Most currently available comparative and meta-analysis methods depend on the availability of identical features across data sets, and concentrate on determining features that are differentially expressed across experiments, some of which may be considered "biomarkers." The heterogeneity of measurement platforms and the inherent variability of biological systems confound the search for robust biomarkers indicative of a particular condition. In many instances, however, multiple data sets show involvement of common biological processes or signaling pathways, even though individual features are not commonly measured or differentially expressed between them. We developed a methodology, categoryCompare, for cross-platform and cross-sample comparison of high-throughput data at the annotation level. We assessed the utility of the approach using hypothetical data, and by determining similarities and differences in the set of processes in two instances: (1) denervated skin vs. denervated muscle, and (2) colon from Crohn's disease vs. colon from ulcerative colitis (UC). The hypothetical data showed that in many cases comparing annotations gave superior results to comparing only at the gene level. Improved analytical results also depended on the number of genes included in the annotation term, the amount of noise in relation to the number of genes expressed in unenriched annotation categories, and the specific method by which samples are combined. In the skin vs. muscle denervation comparison, the tissues demonstrated markedly different responses. The Crohn's vs. UC comparison showed gross similarities in inflammatory response in the two diseases, with particular processes specific to each disease.
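
The idea of comparing data sets at the annotation level rather than the gene level can be illustrated with a minimal sketch. It is not the categoryCompare implementation or its API: the function names (`hypergeom_tail`, `enriched_terms`), the toy gene identifiers, the annotation terms, and the uncorrected 0.05 cutoff are all assumptions made for illustration. Each gene list is tested for over-represented annotation terms with a hypergeometric tail probability, and the resulting term sets are then compared.

```python
from math import comb

def hypergeom_tail(k, universe_size, term_size, list_size):
    """P(X >= k) for a hypergeometric draw of list_size genes from the universe."""
    upper = min(term_size, list_size)
    total = comb(universe_size, list_size)
    favorable = sum(
        comb(term_size, i) * comb(universe_size - term_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return favorable / total

def enriched_terms(gene_list, annotations, universe, alpha=0.05):
    """Return annotation terms over-represented in gene_list.

    No multiple-testing correction is applied here; a real analysis would need one.
    """
    genes = set(gene_list) & universe
    hits = set()
    for term, members in annotations.items():
        members = members & universe
        overlap = len(genes & members)
        if overlap == 0:
            continue
        p = hypergeom_tail(overlap, len(universe), len(members), len(genes))
        if p < alpha:
            hits.add(term)
    return hits

# Hypothetical toy data: 100 genes and two annotation terms of 10 genes each.
universe = {f"g{i}" for i in range(100)}
annotations = {
    "inflammation": {f"g{i}" for i in range(0, 10)},
    "metabolism": {f"g{i}" for i in range(10, 20)},
}

# The two lists share no genes, as can happen across platforms or tissues.
skin = [f"g{i}" for i in range(0, 5)]
muscle = [f"g{i}" for i in range(5, 15)]

shared_genes = set(skin) & set(muscle)
skin_terms = enriched_terms(skin, annotations, universe)
muscle_terms = enriched_terms(muscle, annotations, universe)

shared_terms = skin_terms & muscle_terms
muscle_only_terms = muscle_terms - skin_terms

print("shared genes:", sorted(shared_genes))
print("shared terms:", sorted(shared_terms))
print("muscle-only terms:", sorted(muscle_only_terms))
```

In this toy case the gene-level comparison finds nothing in common, while the annotation-level comparison reports one shared term and one term specific to the second list, which is the kind of result the abstract describes for the tissue and disease comparisons.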

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4010757	PMC
http://dx.doi.org/10.3389/fgene.2014.00098	DOI Listing

