MSPJ: Discovering potential biomarkers in small gene expression datasets ensemble learning.

Comput Struct Biotechnol J

Department of Neurosurgery, Xinqiao Hospital, The Army Medical University, Chongqing 400037, China.

Published: July 2022

In transcriptomics, differentially expressed genes (DEGs) provide fine-grained phenotypic resolution for comparisons between groups and insights into molecular mechanisms underlying the pathogenesis of complex diseases or phenotypes. The robust detection of DEGs from large datasets is well-established. However, owing to various limitations (e.g., the low availability of samples for some diseases or limited research funding), small sample size is frequently used in experiments. Therefore, methods to screen reliable and stable features are urgently needed for analyses with limited sample size. In this study, MSPJ, a new machine learning approach for identifying DEGs was proposed to mitigate the reduced power and improve the stability of DEG identification in small gene expression datasets. This ensemble learning-based method consists of three algorithms: an improved multiple random sampling with -analysis, SVM-RFE (support vector machines-recursive feature elimination), and permutation test. MSPJ was compared with ten classical methods by 94 simulated datasets and large-scale benchmarking with 165 real datasets. The results showed that, among these methods MSPJ had the best performance in most small gene expression datasets, especially those with sample size below 30. In summary, the MSPJ method enables effective feature selection for robust DEG identification in small transcriptome datasets and is expected to expand research on the molecular mechanisms underlying complex diseases or phenotypes.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9304602PMC
http://dx.doi.org/10.1016/j.csbj.2022.07.022DOI Listing

Publication Analysis

Top Keywords

small gene
12
gene expression
12
expression datasets
12
sample size
12
datasets ensemble
8
molecular mechanisms
8
mechanisms underlying
8
complex diseases
8
diseases phenotypes
8
deg identification
8

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!