Machine learning has proven useful for detecting patterns in large, unstructured, and complex datasets. One promising application is precision medicine, where disease risk is predicted from patient genetic data. However, building an accurate prediction model from genotype data remains challenging because of the so-called "curse of dimensionality" (i.e., a far larger number of features than samples). The generalizability of machine learning models therefore benefits from feature selection, which aims to retain only the most "informative" features and to remove noisy, "non-informative," irrelevant, and redundant ones. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., single-nucleotide polymorphisms, SNPs) for disease risk prediction.
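
To make the idea of feature selection under "many more features than samples" concrete, the following is a minimal sketch of one simple filter-style approach: score each SNP on its own against the phenotype and keep the top-ranked ones. It is an illustration only, not a method taken from the article. The function names (`pearson_abs`, `select_top_k_snps`, `simulate`), the 0/1/2 genotype coding, the binary phenotype, and the toy simulation are all assumptions made for this example.

```python
import math
import random


def pearson_abs(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A constant column carries no information about the phenotype.
        return 0.0
    return abs(cov) / math.sqrt(vx * vy)


def select_top_k_snps(genotypes, phenotype, k):
    """Score every SNP column independently and return the k best column indices."""
    n_snps = len(genotypes[0])
    scores = [
        pearson_abs([row[j] for row in genotypes], phenotype)
        for j in range(n_snps)
    ]
    ranked = sorted(range(n_snps), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def simulate(n_samples=40, n_snps=200, causal=(3, 17), seed=0):
    """Hypothetical toy data: random 0/1/2 genotypes, phenotype driven by two SNPs."""
    rng = random.Random(seed)
    genotypes = [
        [rng.choice((0, 1, 2)) for _ in range(n_snps)]
        for _ in range(n_samples)
    ]
    # Assumed rule: a sample is a "case" if it carries >= 3 risk alleles in total.
    phenotype = [1 if sum(row[j] for j in causal) >= 3 else 0 for row in genotypes]
    return genotypes, phenotype


if __name__ == "__main__":
    G, y = simulate()
    print("Selected SNP indices:", select_top_k_snps(G, y, k=5))
```

Because each SNP is scored in isolation, a filter of this kind is cheap but cannot account for redundancy or interactions between SNPs. In a real analysis the selection step would also need to be performed inside each cross-validation fold, so that the held-out samples do not influence which features are kept.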

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580915
DOI: http://dx.doi.org/10.3389/fbinf.2022.927312
