Predicting missing proteomics values using machine learning: Filling the gap using transcriptomics and other biological features.

Comput Struct Biotechnol J

Department of Toxicogenomics, School of Oncology and Developmental Biology (GROW), Maastricht University, Maastricht, The Netherlands.

Published: April 2022

Proteins are often considered the main biological elements responsible for the different functions and structures of a cell. However, proteomics, the global study of all expressed proteins, which is often performed by mass spectrometry, is limited by its stochastic sampling and can quantify only a limited number of proteins per sample. Transcriptomics, which allows an exhaustive analysis of all expressed transcripts, is often used as a surrogate. However, transcript levels do not correlate strongly with the corresponding protein levels, notably because of several post-transcriptional regulatory mechanisms. In this publication, we hypothesize that missing protein values in proteomics could be predicted using machine learning regression methods trained on many features extracted from transcriptomics, including known translational regulatory elements such as microRNAs and circular RNAs. After comparing different machine learning algorithms under two different data-splitting strategies, we report that random forest can predict protein levels in new samples from transcriptomics data with good accuracy. The proposed pre-processing and model-building scripts can be accessed on GitHub: https://github.com/jochotecoa/ml_proteomics.
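The abstract describes the approach only at a high level. Below is a minimal, dependency-free Python sketch of the general idea. It is not the authors' code, which is in the GitHub repository above. The sketch trains a regression random forest on one feature vector per (sample, protein) pair and evaluates it on whole held-out samples. Everything named here is hypothetical: the features (`transcript`, `mirna_hits`, `circrna`), the synthetic data generator, `split_by_sample`, and `TinyRandomForest`. The sample-wise holdout is an assumption based on the phrase "new samples". The abstract does not say what its two splitting strategies are.

```python
import random
from statistics import mean

# Illustrative sketch only, NOT the authors' ml_proteomics code.
# Each row is one hypothetical (sample, protein) pair: a feature vector of
# placeholder transcriptomics-derived features and a protein-level target.


def make_synthetic_rows(n_samples, n_proteins, seed=0):
    """Synthetic toy data: protein level depends on transcript level and miRNA hits."""
    rng = random.Random(seed)
    rows = []
    for s in range(n_samples):
        for p in range(n_proteins):
            transcript = rng.uniform(0, 10)
            mirna_hits = rng.randint(0, 5)
            circrna = rng.uniform(0, 1)  # pure noise feature in this toy data
            level = 2 * transcript - 0.5 * mirna_hits + rng.gauss(0, 0.5)
            rows.append({"sample": f"S{s}", "protein": f"P{p}",
                         "x": [transcript, mirna_hits, circrna], "y": level})
    return rows


def split_by_sample(rows, test_fraction=0.25, seed=0):
    """Hold out whole samples so evaluation mimics prediction in new samples."""
    samples = sorted({r["sample"] for r in rows})
    random.Random(seed).shuffle(samples)
    test_ids = set(samples[:max(1, int(len(samples) * test_fraction))])
    train = [r for r in rows if r["sample"] not in test_ids]
    test = [r for r in rows if r["sample"] in test_ids]
    return train, test


def build_tree(X, y, depth, max_depth, min_leaf, n_try, rng):
    """Grow a regression tree; each split minimises the summed squared error."""
    n = len(y)
    if depth >= max_depth or n < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    total, total_sq = sum(y), sum(v * v for v in y)
    best = None  # (sse, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_try):  # random feature subset per split
        order = sorted(range(n), key=lambda i: X[i][f])
        s = sq = 0.0  # running sum and sum of squares of the left partition
        for k in range(1, n):
            v = y[order[k - 1]]
            s, sq = s + v, sq + v * v
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi or k < min_leaf or n - k < min_leaf:
                continue
            sse = (sq - s * s / k) + ((total_sq - sq) - (total - s) ** 2 / (n - k))
            if best is None or sse < best[0]:
                best = (sse, f, (lo + hi) / 2)
    if best is None:
        return mean(y)
    _, f, t = best
    left = [i for i in range(n) if X[i][f] <= t]
    right = [i for i in range(n) if X[i][f] > t]
    if not left or not right:
        return mean(y)

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf, n_try, rng)

    return (f, t, grow(left), grow(right))


def predict_tree(node, x):
    """Walk to a leaf; internal nodes are (feature, threshold, left, right) tuples."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class TinyRandomForest:
    """Hypothetical minimal random forest: bootstrap rows + random feature subsets."""

    def __init__(self, n_trees=50, max_depth=5, min_leaf=2, max_features=None, seed=0):
        self.n_trees, self.max_depth, self.min_leaf = n_trees, max_depth, min_leaf
        self.max_features, self.seed = max_features, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        p = len(X[0])
        n_try = min(p, self.max_features or max(1, p // 3))  # ~1/3 of features
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, self.min_leaf, n_try, rng))
        return self

    def predict(self, X):
        # Forest prediction is the average of the individual tree predictions.
        return [mean(predict_tree(t, x) for t in self.trees) for x in X]


def rmse(y_true, y_pred):
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred)) ** 0.5
```

For example, `split_by_sample(make_synthetic_rows(6, 20), test_fraction=0.34)` holds out two of the six synthetic samples. The forest is fitted on the remaining rows and scored with `rmse` on the held-out ones. The pure-Python version only avoids external dependencies. In practice a library implementation would be the more likely choice, for instance scikit-learn's `RandomForestRegressor` combined with `GroupShuffleSplit` keyed on sample IDs.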


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9077535
DOI: http://dx.doi.org/10.1016/j.csbj.2022.04.017

Similar Publications

Exploring the role of oxidative stress in carotid atherosclerosis: insights from transcriptomic data and single-cell sequencing combined with machine learning.

Biol Direct

January 2025

National Key Laboratory for Innovation and Transformation of Luobing Theory; The Key Laboratory of Cardiovascular Remodeling and Function Research, Chinese Ministry of Education, Chinese National Health Commission and Chinese Academy of Medical Sciences, Jinan, China.

Background: Carotid atherosclerotic plaque is the primary cause of cardiovascular and cerebrovascular diseases. It is closely related to oxidative stress and immune inflammation. This bioinformatic study was conducted to identify key oxidative stress-related genes and key immune cell infiltration involved in the formation, progression, and stabilization of plaques and investigate the relationship between them.

Unveiling new therapeutic horizons in rheumatoid arthritis: an In-depth exploration of circular RNAs derived from plasma exosomes.

J Orthop Surg Res

January 2025

Department of Rheumatology and Immunology, Affiliated Hospital of Yangzhou University, Yangzhou University, No. 368 Hanjiang Middle Road, Yangzhou, Jiangsu, 225000, China.

Rheumatoid arthritis (RA), a chronic inflammatory joint disease causing permanent disability, involves exosomes, nanosized mammalian extracellular particles. Circular RNA (circRNA) serves as a biomarker in RA blood samples. This research screened differentially expressed circRNAs in RA patient plasma exosomes for novel diagnostic biomarkers.

AiGPro: a multi-tasks model for profiling of GPCRs for agonist and antagonist.

J Cheminform

January 2025

School of Systems Biomedical Science, Soongsil University, 369 Sangdo-ro, Dongjak-gu, 06978, Seoul, Republic of Korea.

G protein-coupled receptors (GPCRs) play vital roles in various physiological processes, making them attractive drug discovery targets. Meanwhile, deep learning techniques have revolutionized drug discovery by providing efficient tools that expedite the identification and optimization of ligands. However, existing models for GPCRs often focus on a single target or a small subset of GPCRs, or employ binary classification, which constrains their applicability to high-throughput virtual screening.

Detection of early relapse in multiple myeloma patients.

Cell Div

January 2025

Babak Myeloma Group, Department of Pathophysiology, Faculty of Medicine, Masaryk University, Brno, Czech Republic.

Background: Multiple myeloma (MM) is the second most common hematological malignancy and is characterized by the infiltration of the bone marrow by plasma cells that produce monoclonal immunoglobulin. While the quality and length of life of MM patients have significantly increased, MM remains a hard-to-treat disease; almost all patients relapse. As MM is highly heterogeneous, patients relapse at different times.

Background: Amebiasis represents a significant global health concern. This is especially evident in developing countries, where infections are more common. The primary diagnostic method in laboratories involves the microscopy of stool samples.
