PSAURON: a tool for assessing protein annotation across a broad range of species.

NAR Genom Bioinform

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA.

Published: March 2025

Evaluating the accuracy of protein-coding sequences in genome annotations is a challenging problem for which there is no broadly applicable solution. In this manuscript, we introduce PSAURON (Protein Sequence Assessment Using a Reference ORF Network), a novel software tool developed to help assess the quality of protein-coding gene annotations. Utilizing a machine learning model trained on a diverse dataset from over 1000 plant and animal genomes, PSAURON assigns a score to coding DNA or protein sequence that reflects the likelihood that the sequence is a genuine protein-coding region. PSAURON scores can be used for genome-wide protein annotation assessment as well as the rapid identification of potentially spurious annotated proteins. Validation against established benchmarks demonstrates PSAURON's effectiveness and correlation with recognized measures of protein quality, highlighting its potential use as a widely applicable method to evaluate precision in gene annotation. PSAURON is open source and freely available at https://github.com/salzberg-lab/PSAURON.
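The two uses named in the abstract, genome-wide assessment and flagging potentially spurious proteins, can be illustrated with a minimal sketch. It is not PSAURON's own code or interface. It assumes per-protein scores are already available as numbers between 0 and 1, and the function name `summarize_scores`, the 0.5 cutoff, and the example scores are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def summarize_scores(
    scores: Dict[str, float], threshold: float = 0.5
) -> Tuple[float, List[str]]:
    """Return (fraction of proteins scoring >= threshold, IDs scoring below it).

    The 0-1 score range and the 0.5 default cutoff are assumptions for this
    sketch, not values taken from the PSAURON paper.
    """
    if not scores:
        raise ValueError("no scores provided")
    # Proteins below the cutoff are candidates for manual review.
    flagged = sorted(pid for pid, s in scores.items() if s < threshold)
    # Genome-wide summary: share of annotated proteins that pass the cutoff.
    fraction_passing = (len(scores) - len(flagged)) / len(scores)
    return fraction_passing, flagged


# Hypothetical per-protein scores, made up for illustration only.
example_scores = {"prot_A": 0.98, "prot_B": 0.12, "prot_C": 0.87, "prot_D": 0.40}
fraction, suspects = summarize_scores(example_scores)
print(f"{fraction:.0%} of proteins pass; review: {suspects}")
```

A single fraction gives a quick whole-annotation summary, while the sorted list of low-scoring identifiers is what a curator would inspect first.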


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11704789
DOI: http://dx.doi.org/10.1093/nargab/lqae189

Publication Analysis

Top Keywords

protein annotation: 8
protein sequence: 8
psauron: 5
protein: 5
psauron tool: 4
tool assessing: 4
assessing protein: 4
annotation broad: 4
broad range: 4
range species: 4

Similar Publications

Objective: This study aimed to explore the active components and potential mechanism of Tanre Qing Injection (TRQI) in the treatment of Acute Respiratory Distress Syndrome (ARDS) using network pharmacology, molecular docking, and animal experiments.

Methods: The targets of active ingredients were identified using the TCMSP and Swiss Target Prediction databases. The targets associated with ARDS were obtained from the GeneCards database, the MalaCards database, and the Open Targets Platform.


Small proteins (≤100 amino acids) play important roles across all life forms, ranging from unicellular bacteria to higher organisms. In this study, we have developed SProtFP which is a machine learning-based method for functional annotation of prokaryotic small proteins into selected functional categories. SProtFP uses independent artificial neural networks (ANNs) trained using a combination of physicochemical descriptors for classifying small proteins into antitoxin type 2, bacteriocin, DNA-binding, metal-binding, ribosomal protein, RNA-binding, type 1 toxin and type 2 toxin proteins.


Network analysis of differentially expressed genes involved in oral submucous fibrosis and oral squamous cell carcinoma: a comparative approach.

Oral Surg Oral Med Oral Pathol Oral Radiol

December 2024

Department of Bioengineering and Biotechnology, Birla Institute of Technology, Mesra, Ranchi, Jharkhand, India.

Objective: Oral submucous fibrosis (OSMF) is categorized as an oral potentially malignant disorder (OPMD) with an increased risk of occurrence of oral squamous cell carcinoma (OSCC). In this study, we aimed to identify the hub genes associated with OSMF and OSCC.

Study Design: Using RStudio, sets of differentially expressed genes (DEGs) obtained from the Gene Expression Omnibus (GEO) were identified in three categories: (A) OSMF, (B) OSCC, and (C) a comparative OSMF-OSCC category.


Evolutionary Analysis of Hypoderma Pantholopsum in Tibetan Antelopes on the Qinghai-Tibetan Plateau.

Acta Parasitol

January 2025

Academy of Animal Sciences and Veterinary Medicine, Qinghai University, Xining, People's Republic of China.

Purpose: Hypoderma pantholopsum is a parasite of Tibetan antelopes (Pantholops hodgsonii). This study aimed to reveal the genetic diversity within H. pantholopsum and contribute to the protection of the Tibetan antelope.

