Identification of differentially expressed proteins in a proteomics workflow typically encompasses five key steps: raw data quantification, expression matrix construction, matrix normalization, missing value imputation (MVI), and differential expression analysis. The plethora of options in each step makes it challenging to identify optimal workflows that maximize the identification of differentially expressed proteins. To identify optimal workflows and their common properties, we conduct an extensive study involving 34,576 combinatoric experiments on 24 gold standard spike-in datasets. Applying frequent pattern mining techniques to top-ranked workflows, we uncover high-performing rules that demonstrate optimality has conserved properties. Via machine learning, we confirm that optimal workflows are indeed predictable, with average cross-validation F1 scores and Matthews correlation coefficients surpassing 0.84. We introduce an ensemble inference approach that integrates results from individual top-performing workflows to expand differential proteome coverage and resolve inconsistencies. Ensemble inference provides gains in pAUC (up to 4.61%) and G-mean (up to 11.14%) and facilitates effective aggregation of information across varied quantification approaches such as topN, directLFQ, MaxLFQ intensities, and spectral counts. However, further development and evaluation are needed to establish acceptable frameworks for conducting ensemble inference on multiple proteomics workflows.
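The evaluation metrics named in the abstract (F1 score, Matthews correlation coefficient, G-mean) can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' evaluation code; the function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute F1, MCC, and G-mean from confusion-matrix counts.

    tp/fp/tn/fn are counts of true/false positives/negatives, e.g. proteins
    correctly or incorrectly called differentially expressed (hypothetical use).
    """
    # Sensitivity (recall) and specificity, guarding against empty classes.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # F1 is the harmonic mean of precision and recall.
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    # G-mean is the geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)

    return {"f1": f1, "mcc": mcc, "g_mean": g_mean}


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the study.
    print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
```

G-mean and MCC both account for true negatives, which makes them more informative than F1 alone when differentially expressed proteins are a small minority of the proteome.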
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11082229 | PMC |
| http://dx.doi.org/10.1038/s41467-024-47899-w | DOI Listing |
Comput Struct Biotechnol J
January 2025
University of Cyprus, Department of Computer Science, Nicosia, Cyprus.
Protein Secondary Structure Prediction (PSSP) is regarded as a challenging task in bioinformatics, and numerous approaches to achieve a more accurate prediction have been proposed. Accurate PSSP can be instrumental in inferring protein tertiary structure and their functions. Machine Learning and in particular Deep Learning approaches show promising results for the PSSP problem.
medRxiv
January 2025
Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Georgia, USA.
Acute diarrheal disease is one of the leading causes of death in children under age 5, disproportionately impacting children in low-resource settings. Many of these cases are caused by bacteria and therefore could respond to antibiotic treatment; however, the benefits of widely prescribing antibiotics must be weighed against the risks for the emergence of microbial resistance. These challenges present the opportunity for developing individualized treatment guidelines for diarrheal disease.
Sci Rep
January 2025
NASA Ames Research Center, Moffett Field, Mountain View, USA.
Spaceflight has several detrimental effects on human and rodent health. For example, liver dysfunction is a common phenotype observed in space-flown rodents, and this dysfunction is partially reflected in transcriptomic changes. Studies linking transcriptomics with liver dysfunction rely on tools which exploit correlation, but these tools make no attempt to disambiguate true correlations from spurious ones.
Biol Methods Protoc
January 2025
Department of Physics, George Washington University, Washington, DC 20052, United States.
A mixture-of-experts (MoE) approach has been developed to mitigate the poor out-of-distribution (OOD) generalization of deep learning (DL) models for single-sequence-based prediction of RNA secondary structure. The main idea behind this approach is to use DL models for in-distribution (ID) test sequences to leverage their superior ID performances, while relying on physics-based models for OOD sequences to ensure robust predictions. One key ingredient of the pipeline, named MoEFold2D, is automated ID/OOD detection via consensus analysis of an ensemble of DL model predictions without requiring access to training data during inference.
Brief Bioinform
November 2024
School of Artificial Intelligence, Jilin University, 3003 Qianjin Street, 130012 Changchun, China.
Accurate identification of causal genes for cancer prognosis is critical for estimating disease progression and guiding treatment interventions. In this study, we propose CPCG (Cancer Prognosis's Causal Gene), a two-stage framework identifying gene sets causally associated with patient prognosis across diverse cancer types using transcriptomic data. Initially, an ensemble approach models gene expression's impact on survival with parametric and semiparametric hazard models.