Aging is a complex process with poorly understood genetic mechanisms. Recent studies have sought to classify genes as pro-longevity or anti-longevity using a variety of machine learning algorithms. However, it is not clear which types of features are best for optimizing classification performance and which algorithms are best suited to this task. Further, performance assessments based on held-out test data are lacking. We systematically compare five popular classification algorithms using gene ontology and gene expression datasets as features to predict the pro-longevity versus anti-longevity status of genes for two model organisms (C. elegans and S. cerevisiae) using the GenAge database as ground truth. We find that elastic net penalized logistic regression performs particularly well at this task. Using elastic net, we make novel predictions of pro- and anti-longevity genes that are not currently in the GenAge database.
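As a rough illustration of the approach summarized above, the following is a minimal sketch of elastic net penalized logistic regression evaluated on held-out test data. It is not the authors' pipeline. The data are synthetic stand-ins for gene features and GenAge labels, the solver is a plain proximal gradient loop chosen for self-containment, and all function names and hyperparameter values (lam, alpha, lr, n_iter) are assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def linear_score(beta0, beta, xi):
    return beta0 + sum(b * x for b, x in zip(beta, xi))


def fit_elastic_net_logistic(X, y, lam=0.05, alpha=0.5, lr=0.1, n_iter=500):
    # Minimizes mean log-loss + lam * (alpha * L1 + (1 - alpha) / 2 * L2^2)
    # by proximal gradient descent. Hyperparameters are illustrative only.
    n, p = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(linear_score(beta0, beta, xi)) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        beta0 -= lr * g0  # the intercept is not penalized
        for j in range(p):
            # Gradient step on the smooth part (log-loss plus ridge term),
            # then soft-thresholding for the L1 term.
            step = beta[j] - lr * (g[j] + lam * (1 - alpha) * beta[j])
            beta[j] = soft_threshold(step, lr * lam * alpha)
    return beta0, beta


def predict(beta0, beta, X):
    return [1 if sigmoid(linear_score(beta0, beta, xi)) >= 0.5 else 0 for xi in X]


def make_synthetic(n, p, rng):
    # Hypothetical data: only the first two features carry signal.
    # Label 1 stands in for "pro-longevity", 0 for "anti-longevity".
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0, 1) for _ in range(p)]
        prob = sigmoid(2.0 * xi[0] - 2.0 * xi[1])
        X.append(xi)
        y.append(1 if rng.random() < prob else 0)
    return X, y


rng = random.Random(0)
X, y = make_synthetic(300, 10, rng)

# Hold out the last 100 examples; they are never seen during fitting.
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

beta0, beta = fit_elastic_net_logistic(X_train, y_train)
y_pred = predict(beta0, beta, X_test)
test_accuracy = sum(int(a == b) for a, b in zip(y_pred, y_test)) / len(y_test)
print("held-out accuracy:", test_accuracy)
```

In practice the penalty strength and L1/L2 mixing weight would be tuned by cross-validation on the training split only, so that the held-out test set remains an unbiased check of performance.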


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7728194
DOI: http://dx.doi.org/10.1371/journal.pcbi.1008429

