Identification of differentially expressed proteins in a proteomics workflow typically encompasses five key steps: raw data quantification, expression matrix construction, matrix normalization, missing value imputation (MVI), and differential expression analysis. The plethora of options in each step makes it challenging to identify optimal workflows that maximize the identification of differentially expressed proteins. To identify optimal workflows and their common properties, we conduct an extensive study involving 34,576 combinatoric experiments on 24 gold standard spike-in datasets. Applying frequent pattern mining techniques to top-ranked workflows, we uncover high-performing rules that demonstrate that optimality has conserved properties. Via machine learning, we confirm that optimal workflows are indeed predictable, with average cross-validation F1 scores and Matthews correlation coefficients surpassing 0.84. We introduce ensemble inference, which integrates results from individual top-performing workflows to expand differential proteome coverage and resolve inconsistencies. Ensemble inference provides gains in pAUC (up to 4.61%) and G-mean (up to 11.14%) and facilitates effective aggregation of information across varied quantification approaches such as topN, directLFQ, MaxLFQ intensities, and spectral counts. However, further development and evaluation are needed to establish acceptable frameworks for conducting ensemble inference on multiple proteomics workflows.
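
The abstract reports F1, Matthews correlation coefficient (MCC), and G-mean without defining them. The following is a minimal sketch of the standard textbook definitions computed from a binary confusion matrix. It assumes G-mean is the geometric mean of sensitivity and specificity, which is the common convention but is not confirmed by the abstract. The function names and the toy labels are hypothetical and are not taken from the study. pAUC is omitted because it needs ranked scores rather than hard calls.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = differentially expressed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn

def f1_score(tp, tn, fp, fn):
    """Harmonic mean of precision and recall; 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def g_mean(tp, tn, fp, fn):
    """Assumed definition: sqrt(sensitivity * specificity)."""
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sens * spec)

# Hypothetical toy example: 3 truly changed proteins, 5 unchanged.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, f1_score(*counts), mcc(*counts), g_mean(*counts))
```

On spike-in data the true labels are known by design, which is what makes metrics of this kind computable for each workflow.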

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11082229
DOI: http://dx.doi.org/10.1038/s41467-024-47899-w

Publication Analysis

Top Keywords

ensemble inference (16)
optimal workflows (12)
differential expression (8)
expression analysis (8)
high-performing rules (8)
identification differentially (8)
differentially expressed (8)
expressed proteins (8)
identify optimal (8)
workflows (6)

Similar Publications

Protein Secondary Structure Prediction (PSSP) is regarded as a challenging task in bioinformatics, and numerous approaches to achieve a more accurate prediction have been proposed. Accurate PSSP can be instrumental in inferring protein tertiary structure and their functions. Machine Learning and in particular Deep Learning approaches show promising results for the PSSP problem.

Acute diarrheal disease is one of the leading causes of death in children under age 5, disproportionately impacting children in low-resource settings. Many of these cases are caused by bacteria and therefore could respond to antibiotic treatment; however, the benefits of widely prescribing antibiotics must be weighed against the risks for the emergence of microbial resistance. These challenges present the opportunity for developing individualized treatment guidelines for diarrheal disease.

Spaceflight has several detrimental effects on human and rodent health. For example, liver dysfunction is a common phenotype observed in space-flown rodents, and this dysfunction is partially reflected in transcriptomic changes. Studies linking transcriptomics with liver dysfunction rely on tools which exploit correlation, but these tools make no attempt to disambiguate true correlations from spurious ones.

Robust RNA secondary structure prediction with a mixture of deep learning and physics-based experts.

Biol Methods Protoc

January 2025

Department of Physics, George Washington University, Washington, DC 20052, United States.

A mixture-of-experts (MoE) approach has been developed to mitigate the poor out-of-distribution (OOD) generalization of deep learning (DL) models for single-sequence-based prediction of RNA secondary structure. The main idea behind this approach is to use DL models for in-distribution (ID) test sequences to leverage their superior ID performances, while relying on physics-based models for OOD sequences to ensure robust predictions. One key ingredient of the pipeline, named MoEFold2D, is automated ID/OOD detection via consensus analysis of an ensemble of DL model predictions without requiring access to training data during inference.

Identifying cancer prognosis genes through causal learning.

Brief Bioinform

November 2024

School of Artificial Intelligence, Jilin University, 3003 Qianjin Street, 130012 Changchun, China.

Accurate identification of causal genes for cancer prognosis is critical for estimating disease progression and guiding treatment interventions. In this study, we propose CPCG (Cancer Prognosis's Causal Gene), a two-stage framework identifying gene sets causally associated with patient prognosis across diverse cancer types using transcriptomic data. Initially, an ensemble approach models gene expression's impact on survival with parametric and semiparametric hazard models.
