Gene Prioritization by Compressive Data Fusion and Chaining.

Marinka Žitnik, Edward A Nam, Christopher Dinh, Adam Kuspa, Gad Shaulsky, Blaž Zupan

PLoS Comput Biol

Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America.

Published: October 2015

Data integration procedures combine heterogeneous data sets into predictive models, but they are limited to data explicitly related to the target object type, such as genes. Collage is a new data fusion approach to gene prioritization. It considers data sets of various association levels with the prediction task, utilizes collective matrix factorization to compress the data, and chaining to relate different object types contained in a data compendium. Collage prioritizes genes based on their similarity to several seed genes. We tested Collage by prioritizing bacterial response genes in Dictyostelium as a novel model system for prokaryote-eukaryote interactions. Using 4 seed genes and 14 data sets, only one of which was directly related to the bacterial response, Collage proposed 8 candidate genes that were readily validated as necessary for the response of Dictyostelium to Gram-negative bacteria. These findings establish Collage as a method for inferring biological knowledge from the integration of heterogeneous and coarsely related data sets.
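The two ideas named in the abstract, chaining relations between object types and ranking genes by similarity to seed genes, can be illustrated with a minimal sketch. This is not the Collage implementation: it skips the collective matrix factorization step entirely, the toy matrices are invented for illustration, and the function names `chain` and `prioritize` as well as the mean-cosine-similarity score are assumptions rather than the authors' scoring scheme.

```python
import numpy as np


def chain(*relations):
    """Chain relation matrices (e.g. genes x terms, terms x pathways)
    into a single genes x pathways profile matrix by matrix product."""
    out = relations[0]
    for rel in relations[1:]:
        out = out @ rel
    return out


def prioritize(profiles, seed_idx):
    """Rank non-seed rows by mean cosine similarity to the seed rows.

    Illustrative scoring only; an assumption, not the published method.
    """
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    unit = profiles / np.where(norms == 0, 1.0, norms)  # avoid divide-by-zero
    scores = (unit @ unit[seed_idx].T).mean(axis=1)     # one score per gene
    seeds = set(seed_idx)
    candidates = [i for i in range(len(profiles)) if i not in seeds]
    return sorted(candidates, key=lambda i: -scores[i])


# Hypothetical toy data: 5 genes, 4 annotation terms, 3 pathways.
genes_terms = np.array([
    [1, 1, 0, 0],   # gene 0 (seed)
    [1, 1, 0, 0],   # gene 1
    [0, 0, 1, 1],   # gene 2
    [1, 0, 0, 0],   # gene 3
    [0, 0, 0, 1],   # gene 4
], dtype=float)
terms_pathways = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
], dtype=float)

# Genes are related to pathways only indirectly, through the terms.
gene_profiles = chain(genes_terms, terms_pathways)
ranking = prioritize(gene_profiles, seed_idx=[0])
print(ranking)
```

In this toy setting, gene 1 shares the seed's profile exactly and ranks first, while gene 4 shares no pathway with the seed and ranks last. A real pipeline would compute the profiles in a compressed latent space learned jointly from all data sets rather than from raw relation matrices.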

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4605714	PMC
http://dx.doi.org/10.1371/journal.pcbi.1004552	DOI Listing

Publication Analysis

Top Keywords: data sets, data, gene prioritization, data fusion, seed genes, bacterial response, genes, collage, prioritization compressive, compressive data

Similar Publications

PathInHydro, a Set of Machine Learning Models to Identify Unbinding Pathways of Gas Molecules in [NiFe] Hydrogenases.

J Chem Inf Model

January 2025

Institute of Chemistry, Technische Universität Berlin, Straße des 17. Juni 135, Berlin 10623, Germany.

Farzin Sohraby, Jing-Yao Guo, Ariane Nunes-Alves

Machine learning (ML) is a powerful tool for the automated data analysis of molecular dynamics (MD) simulations. Recent studies showed that ML models can be used to identify protein-ligand unbinding pathways and understand the underlying mechanism. To expedite the examination of MD simulations, we constructed PathInHydro, a set of supervised ML models capable of automatically assigning unbinding pathways for the dissociation of gas molecules from [NiFe] hydrogenases, using the unbinding trajectories of CO and H₂ from [NiFe] hydrogenase as a training set.

Integrated View of Baseline Protein Expression in Human Tissues Using Public Data Independent Acquisition Data Sets.

J Proteome Res

January 2025

European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.

Ananth Prakash, Andrew Collins, Liora Vilmovsky, Silvie Fexova, Andrew R Jones

The PRIDE database is the largest public data repository of mass spectrometry-based proteomics data and currently stores more than 40,000 data sets covering a wide range of organisms, experimental techniques, and biological conditions. During the past few years, PRIDE has seen a significant increase in the amount of submitted data-independent acquisition (DIA) proteomics data sets. This provides an excellent opportunity for large-scale data reanalysis and reuse.

Coancestry superposed on admixed populations yields measures of relatedness at individual-level resolution.

bioRxiv

December 2024

Danfeng Chen, John D Storey

The admixture model is widely applied to estimate and interpret population structure among individuals. Here we consider a "standard admixture" model that assumes the admixed populations are unrelated and also a generalized model, where the admixed populations themselves are related via coancestry (or covariance) of allele frequencies. The generalized model yields a potentially more realistic and substantially more flexible model that we call "super admixture".

NEBULA101: an open dataset for the study of language aptitude in behaviour, brain structure and function.

Sci Data

January 2025

Brain and Language Lab, Department of Psychology, Faculty of Psychology and Education Science, University of Geneva, Geneva, Switzerland.

Alessandra Rampinini, Irene Balboni, Olga Kepinska, Raphael Berthele, Narly Golestani

This paper introduces the "NEBULA101 - Neuro-behavioural Understanding of Language Aptitude" dataset, which comprises behavioural and brain imaging data from 101 healthy adults to examine individual differences in language and cognition. Human language, a multifaceted behaviour, varies significantly among individuals, at different processing levels. Recent advances in cognitive science have embraced an integrated approach, combining behavioural and brain studies to explore these differences comprehensively.

Semisupervised Contrastive Learning for Bioactivity Prediction Using Cell Painting Image Data.

J Chem Inf Model

January 2025

Research Unit Structural Chemistry and Computational Biophysics, Leibniz-Forschungsinstitut für Molekulare Pharmakologie, Berlin 13125, Germany.

David Bushiri Pwesombo, Carsten Beese, Christopher Schmied, Han Sun

Morphological profiling has recently demonstrated remarkable potential for identifying the biological activities of small molecules. Alongside the fully supervised and self-supervised machine learning methods recently proposed for bioactivity prediction from Cell Painting image data, we introduce here a semisupervised contrastive (SemiSupCon) learning approach. This approach combines the strengths of using biological annotations in supervised contrastive learning and leveraging large unannotated image data sets with self-supervised contrastive learning.
