Biological data are accumulating at an ever-increasing rate, but interpreting them remains a challenge. Classifying biological data into distinct groups is the first step toward understanding them. Classifying data by response to a given treatment is particularly important for identifying differentially expressed genes when making present/absent calls. Many feature selection algorithms have been developed, including support vector machine recursive feature elimination (SVM-RFE) and its variants. SVM-RFE methods are greedy: they search for the feature combinations that best support binary classification, and those combinations may not be biologically significant. To overcome this limitation of SVM-RFE, we propose a novel feature selection algorithm, termed "sigFeature" (https://bioconductor.org/packages/sigFeature/). It combines SVM with statistical testing to discover differentially significant features while maintaining good classification performance. The "sigFeature" R package is built around a function, also called "sigFeature," that automatically selects features for binary classification. Using six publicly available microarray data sets with different biological attributes, downloaded from the Gene Expression Omnibus (GEO), we compared the performance of "sigFeature" with that of three other feature selection algorithms. The small set of features selected by "sigFeature" also yields higher classification accuracy. To evaluate the biological signature of the selected features (genes) downstream, we performed gene set enrichment analysis on the "sigFeature" output and compared it with the outputs of the other algorithms. "sigFeature" accurately predicted the signature of four of the six microarray data sets, whereas the other algorithms predicted fewer. Thus, "sigFeature" is considerably better than related algorithms at discovering differentially significant features in microarray data sets.
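The greedy SVM-RFE loop mentioned above can be sketched in a few lines. What follows is a minimal pure-Python illustration. It is not the sigFeature implementation, which is an R package. It assumes a bias-free linear SVM fitted by Pegasos-style subgradient descent and uses the classic squared-weight ranking criterion. It also includes Welch's t-statistic as one example of a per-feature statistic. All function names (`train_linear_svm`, `svm_rfe`, `welch_t`, `make_toy_data`) and the toy data are hypothetical. The abstract does not say how sigFeature combines SVM weights with a statistic, so no combination is shown.

```python
import random
from statistics import mean, variance
from typing import List


def train_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> List[float]:
    """Hypothetical helper: bias-free linear SVM trained by Pegasos-style
    stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularization step: shrink every weight toward zero
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient applies only to margin violators
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X: List[List[float]], y: List[int], n_keep: int = 1,
            **svm_kwargs) -> List[int]:
    """Hypothetical helper: rank feature indices by recursive elimination."""
    remaining = list(range(len(X[0])))
    eliminated: List[int] = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y, **svm_kwargs)
        # Ranking criterion c_k = w_k^2: drop the weakest feature, then retrain
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    # Survivors first, then features in reverse order of elimination
    return remaining + eliminated[::-1]


def welch_t(X: List[List[float]], y: List[int], j: int) -> float:
    """Welch's t-statistic for feature j between classes +1 and -1."""
    a = [row[j] for row, lab in zip(X, y) if lab == 1]
    b = [row[j] for row, lab in zip(X, y) if lab == -1]
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def make_toy_data(n: int = 40, seed: int = 42):
    """Hypothetical toy data: feature 0 separates the classes, 1 and 2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([2.0 * label + rng.gauss(0, 0.2),  # informative feature
                  rng.gauss(0, 1.0),                 # noise
                  rng.gauss(0, 1.0)])                # noise
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("SVM-RFE ranking (best first):", svm_rfe(X, y))
    print("Welch t per feature:", [round(welch_t(X, y, j), 2) for j in range(3)])
```

RFE retrains after every elimination, so a feature's rank depends on which other features are still present. The resulting order is driven by classification margin, not by statistical significance. This greedy behavior is the limitation the abstract attributes to SVM-RFE.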


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7169426
DOI: http://dx.doi.org/10.3389/fgene.2020.00247
