Predictive modelling using pathway scores: robustness and significance of pathway collections.

BMC Bioinformatics

Computational and Systems Medicine, Department of Surgery and Cancer, Sir Alexander Fleming building, Imperial College, London, SW1 2AZ, UK.

Published: November 2019

Background: Transcriptomic data is often used to build statistical models which are predictive of a given phenotype, such as disease status. Genes work together in pathways and it is widely thought that pathway representations will be more robust to noise in the gene expression levels. We aimed to test this hypothesis by constructing models based on either genes alone, or based on sample specific scores for each pathway, thus transforming the data to a 'pathway space'. We progressively degraded the raw data by addition of noise and examined the ability of the models to maintain predictivity.
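The transformation to a 'pathway space' and the noise degradation described above can be sketched as follows. This is a minimal illustration, not the authors' workflow: the abstract does not name the scoring method or the noise model, so the mean of per-gene z-scores and gene-scaled Gaussian noise used here are assumptions, as are the function names `pathway_scores` and `add_noise` and the toy data.

```python
import numpy as np


def pathway_scores(expr, pathways):
    """Map a samples x genes matrix to a samples x pathways matrix.

    expr: 2-D array, rows are samples, columns are genes.
    pathways: dict mapping pathway name -> list of gene column indices.
    Assumption: a pathway score is the mean z-score of its member genes.
    """
    mu = expr.mean(axis=0)
    sd = expr.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    z = (expr - mu) / sd
    names = sorted(pathways)
    scores = np.column_stack([z[:, pathways[n]].mean(axis=1) for n in names])
    return names, scores


def add_noise(expr, level, rng):
    """Degrade expression with Gaussian noise scaled to each gene's std.

    Assumption: `level` is the noise std as a fraction of the gene's own std.
    """
    sd = expr.std(axis=0)
    return expr + rng.normal(0.0, 1.0, size=expr.shape) * sd * level


# Toy data: 20 samples, 6 genes, two hypothetical pathways.
rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 6))
pathways = {"P1": [0, 1, 2], "P2": [3, 4, 5]}

names, clean = pathway_scores(expr, pathways)
_, noisy = pathway_scores(add_noise(expr, 0.5, rng), pathways)
```

A classifier would then be trained on `expr` (gene space) and on `clean` (pathway space), and its predictive performance tracked as `level` increases.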

Results: Models in the pathway space indeed had higher predictive robustness than models in the gene space. This result was independent of the workflow, parameters, classifier and data set used. Surprisingly, randomised pathway mappings produced models of similar accuracy and robustness to true mappings, suggesting that the success of pathway space models is not conferred by the specific definitions of the pathway. Instead, predictive models built on the true pathway mappings led to prediction rules with fewer influential pathways than those built on randomised pathways. The extent of this effect was used to differentiate pathway collections coming from a variety of widely used pathway databases.
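One way to build a randomised pathway mapping for such a comparison is sketched below. The abstract does not say how the randomisation was performed, so this is an assumption: gene labels are permuted, which keeps each pathway's size and the overlaps between pathways while breaking the biological meaning of the groupings. The name `randomise_mapping` and the example pathways are hypothetical.

```python
import numpy as np


def randomise_mapping(pathways, n_genes, rng):
    """Relabel genes with one random permutation of all gene indices.

    Each pathway keeps its size, and overlaps between pathways are
    preserved, but membership no longer reflects the curated definitions.
    """
    perm = rng.permutation(n_genes)
    return {name: [int(perm[g]) for g in genes]
            for name, genes in pathways.items()}


rng = np.random.default_rng(1)
true_map = {"P1": [0, 1, 2], "P2": [2, 3], "P3": [4, 5, 6, 7]}
random_map = randomise_mapping(true_map, n_genes=10, rng=rng)
```

Models built on `true_map` and on `random_map` can then be compared for accuracy, robustness, and the number of pathways that carry most of the weight in the prediction rule.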

Conclusions: Prediction models based on pathway scores are more robust to degradation of gene expression information than the equivalent models based on ungrouped genes. While models based on true pathway scores are not more robust or accurate than those based on randomised pathways, true pathways produced simpler prediction rules, emphasizing a smaller number of pathways.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6827178
DOI: http://dx.doi.org/10.1186/s12859-019-3163-0


