Evaluation of machine learning models on protein level inference from prioritized RNA features.

Brief Bioinform

Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China.

Published: May 2022

Parallel measurements of the transcriptome and proteome have revealed mismatched profiles. Because proteomic analysis is more expensive and more challenging than transcriptomic analysis, the question of how to use messenger RNA (mRNA) expression data to predict protein levels is important. Here, we comprehensively evaluated 13 machine learning models for inferring protein expression levels from RNA expression profiles. A total of 20 proteogenomic datasets from three mainstream proteomic platforms, covering >2500 samples of 13 human tissues, were collected for model evaluation. Our results showed that appropriate feature selection methods combined with classical machine learning models could achieve excellent predictive performance. The voting ensemble model outperformed the other candidate models across datasets. Adding the mRNA proxy model to the regression model further improved prediction performance. Dataset and gene characteristics could affect prediction performance. Finally, we applied the model to the transcriptome of cerebral cortex regions to infer their protein profiles and better understand the functional characteristics of these brain regions. This benchmarking work not only provides useful hints on the inherent correlation between transcriptome and proteome, but also has practical value for transcriptome-based prediction of protein expression levels.
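The combination described in the abstract, feature selection followed by a voting ensemble of regressors, can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not name the 13 models or the feature selection methods, so everything below is an assumption made for illustration. Features are ranked by absolute Pearson correlation with the protein level, each selected feature gets its own univariate least-squares fit, and the ensemble "votes" by averaging those predictions. The data are synthetic, and all function names (`select_top_k`, `fit_voting_ensemble`, `predict`, `make_data`) are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def column(X, j):
    """Extract feature j (one RNA feature) across all samples."""
    return [row[j] for row in X]


def select_top_k(X, y, k):
    """Assumed feature selection: keep the k features with the largest |r| to y."""
    scores = [abs(pearson(column(X, j), y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_line(x, y):
    """Univariate least-squares fit; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope


def fit_voting_ensemble(X, y, k):
    """One base regressor per selected feature: a list of (feature, intercept, slope)."""
    return [(j, *fit_line(column(X, j), y)) for j in select_top_k(X, y, k)]


def predict(models, row):
    """Voting for regression: average the base regressors' predictions."""
    return mean(a + b * row[j] for j, a, b in models)


def make_data(n, rng):
    """Synthetic stand-in data: 6 'RNA' features, of which only 0 and 1 drive 'protein'."""
    X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(n)]
    y = [2 * r[0] + r[1] + rng.gauss(0, 0.1) for r in X]
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(100, rng)

models = fit_voting_ensemble(X_train, y_train, k=2)
preds = [predict(models, row) for row in X_test]

print("selected features:", sorted(j for j, _, _ in models))
print("held-out Pearson r:", round(pearson(preds, y_test), 3))
```

Correlation between predicted and measured values on held-out samples is used here as the score only because it is simple to compute; the abstract does not state which metric the study used. A library implementation with heterogeneous base models, such as scikit-learn's `VotingRegressor`, would be the more usual choice in practice.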

Source
http://dx.doi.org/10.1093/bib/bbac091
