Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning.

Sci Rep

Departamento de Biología del Neurodesarrollo, Instituto de Investigaciones Biológicas Clemente Estable, Av. Italia 3318, 11600, Montevideo, Uruguay.

Published: July 2022

The function of most genes is unknown. The best results in automated function prediction are obtained with machine learning-based methods that combine multiple data sources, typically sequence-derived features, protein structure and interaction data. Even though there is ample evidence showing that a gene's function is not independent of its location, the few available examples of gene function prediction based on gene location rely on sequence identity between genes of different organisms and are thus subject to the limitations of the relationship between sequence and function. Here we predict thousands of gene functions in five model eukaryotes (Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens) using machine learning models trained exclusively with features derived from the location of genes in the genomes to which they belong. Our aim was not to obtain the best-performing method for automated function prediction but to explore the extent to which a gene's location can predict its function in eukaryotes. We found that our models outperform BLAST when predicting terms from the Biological Process and Cellular Component ontologies, showing that, at least in some cases, gene location alone can be more useful than sequence to infer gene function.
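To make the idea of a location-derived feature concrete, here is a minimal sketch. It is an illustration only and not the authors' feature set or models, which the abstract does not describe. It assumes one plausible feature: for a target gene, the fraction of its flanking genes (the `k` closest by rank order on each side of the same chromosome) that carry each annotation term. The gene identifiers, coordinates, term labels and the function name `neighbor_term_fractions` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (gene_id, chromosome, start_position, known terms).
GENES = [
    ("g1", "chrI", 100, {"GO:A"}),
    ("g2", "chrI", 200, {"GO:A"}),
    ("g3", "chrI", 300, set()),          # unannotated gene of interest
    ("g4", "chrI", 400, {"GO:A", "GO:B"}),
    ("g5", "chrI", 9000, {"GO:B"}),
    ("g6", "chrII", 150, {"GO:B"}),
]


def neighbor_term_fractions(genes, target_id, k=2):
    """Return {term: fraction of flanking genes annotated with that term}.

    Flanking genes are the k genes before and the k genes after the target
    in positional order on the same chromosome (rank order, not distance).
    This is an assumed, illustrative feature definition.
    """
    # Group genes by chromosome so neighbors never cross chromosomes.
    by_chrom = defaultdict(list)
    for gene in genes:
        by_chrom[gene[1]].append(gene)

    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g[2])  # order by start position
        ids = [g[0] for g in chrom_genes]
        if target_id not in ids:
            continue
        i = ids.index(target_id)
        neighbors = chrom_genes[max(0, i - k):i] + chrom_genes[i + 1:i + 1 + k]
        if not neighbors:
            return {}
        # Count how many flanking genes carry each term.
        counts = defaultdict(int)
        for _, _, _, terms in neighbors:
            for term in terms:
                counts[term] += 1
        return {term: c / len(neighbors) for term, c in counts.items()}

    raise KeyError(target_id)


features = neighbor_term_fractions(GENES, "g3", k=2)
print(features)
```

In a setup like this, one such feature vector per gene would be passed to a classifier trained per annotation term; the sketch stops at feature extraction because the abstract does not specify the model.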

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9270439
DOI: http://dx.doi.org/10.1038/s41598-022-15329-w

Publication Analysis

Top Keywords

function prediction: 16
gene function: 12
model eukaryotes: 8
based gene: 8
machine learning: 8
function: 8
automated function: 8
gene location: 8
gene: 7
location: 6

Similar Publications

"The Brain is…": A Survey of the Brain's Many Definitions.

Neuroinformatics

January 2025

Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, 760 Westwood Plaza, Los Angeles, CA, 90024, USA.

A reader of the peer-reviewed neuroscience literature will often encounter expressions like the following: 'the brain is a dynamic system', 'the brain is a complex network', or 'the brain is a highly metabolic organ'. These expressions attempt to define the essential functions and properties of the mammalian or human brain in a simple phrase or sentence, sometimes using metaphors or analogies. We sought to survey the most common phrases of the form 'the brain is…' in the biomedical literature to provide insights into current conceptualizations of the brain.

Stroke is the second-leading global cause of death. The damage attributed to the immune storm triggered by ischemia-reperfusion injury (IRI) post-stroke is substantial. However, data on the transcriptomic dynamics of pyroptosis in IRI are limited.

Chronic obstructive pulmonary disease (COPD) is a prevalent chronic inflammatory airway disease with high incidence and significant disease burden. R-loops, functional chromatin structures formed during transcription, are closely associated with inflammation when they form aberrantly. However, the role of R-loop regulators (RLRs) in COPD remains unclear.

Old and New Biomarkers in Idiopathic Recurrent Acute Pericarditis (IRAP): Prognosis and Outcomes.

Curr Cardiol Rep

January 2025

Division of Internal Medicine, Fatebenefratelli Hospital, ASST Fatebenefratelli Sacco, University of Milan, Piazzale Principessa Clotilde, 3, Milan, 20121, Italy.

Purpose of Review: To outline the latest discoveries regarding the utility and reliability of serum biomarkers in idiopathic recurrent acute pericarditis (IRAP), considering recent findings on its pathogenesis. The study highlights the predictive role of these biomarkers in potential short-term (cardiac tamponade, recurrences) and long-term (constrictive pericarditis, death) complications.

Recent Findings: The pathogenesis of pericarditis has been better defined in recent years, focusing on the autoinflammatory pathway.

Unraveling the potential mechanism and prognostic value of pentose phosphate pathway in hepatocellular carcinoma: a comprehensive analysis integrating bulk transcriptomics and single-cell sequencing data.

Funct Integr Genomics

January 2025

Institute of Infectious Diseases, Guangdong Province, Guangzhou Eighth People's Hospital, Guangzhou Medical University, 8 Huaying Road, Baiyun District, Guangzhou, 510440, China.

Hepatocellular carcinoma (HCC) remains a malignant and life-threatening tumor with an extremely poor prognosis, posing a significant global health challenge. Despite the continuous emergence of novel therapeutic agents, patients exhibit substantial heterogeneity in their responses to anti-tumor drugs and overall prognosis. The pentose phosphate pathway (PPP) is highly activated in various tumor cells and plays a pivotal role in tumor metabolic reprogramming.
