Evaluating the accuracy of protein-coding sequences in genome annotations is a challenging problem for which there is no broadly applicable solution. In this manuscript, we introduce PSAURON (Protein Sequence Assessment Using a Reference ORF Network), a novel software tool developed to help assess the quality of protein-coding gene annotations. Using a machine learning model trained on a diverse dataset from over 1000 plant and animal genomes, PSAURON assigns a score to a coding DNA or protein sequence that reflects the likelihood that the sequence is a genuine protein-coding region. PSAURON scores can be used for genome-wide assessment of protein annotations as well as for rapid identification of potentially spurious annotated proteins. Validation against established benchmarks demonstrates PSAURON's effectiveness and its correlation with recognized measures of protein quality, highlighting its potential as a widely applicable method for evaluating precision in gene annotation. PSAURON is open source and freely available at https://github.com/salzberg-lab/PSAURON.
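As a rough illustration of the two uses of per-sequence scores mentioned in the abstract (genome-wide assessment and flagging potentially spurious proteins), the sketch below summarizes a set of scores. It does not call PSAURON itself. The function name `summarize_scores`, the assumption that scores lie in [0, 1], the 0.5 cutoff, and the example values are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def summarize_scores(scores, threshold=0.5):
    """Summarize per-protein scores for one annotation.

    scores: dict mapping protein ID -> score, assumed here to lie in [0, 1].
    threshold: hypothetical cutoff below which a protein is flagged for review.
    """
    if not scores:
        raise ValueError("no scores provided")
    # Proteins scoring below the cutoff are candidates for manual inspection.
    flagged = sorted(pid for pid, s in scores.items() if s < threshold)
    return {
        "n_proteins": len(scores),
        "mean_score": mean(scores.values()),
        "fraction_passing": 1 - len(flagged) / len(scores),
        "flagged": flagged,
    }


# Made-up scores for illustration only; real values would come from the tool's output.
example_scores = {"geneA": 0.98, "geneB": 0.12, "geneC": 0.75, "geneD": 0.40}
summary = summarize_scores(example_scores)
print(summary["fraction_passing"], summary["flagged"])
```

The fraction of proteins passing the cutoff serves as a single annotation-wide figure, while the flagged list gives the low-scoring entries to inspect first; any real cutoff should be chosen from the tool's own documentation rather than from this sketch.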
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11704789 | PMC |
| http://dx.doi.org/10.1093/nargab/lqae189 | DOI Listing |
Comb Chem High Throughput Screen
January 2025
Department of Pharmacy, Taicang TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Jiangsu, China.
Objective: This study aimed to explore the active components and potential mechanism of Tanre Qing Injection (TRQI) in the treatment of Acute Respiratory Distress Syndrome (ARDS) using network pharmacology, molecular docking, and animal experiments.
Methods: The targets of the active ingredients were identified using the TCMSP and Swiss Target Prediction databases. The targets associated with ARDS were obtained from the GeneCards database, the MalaCards database, and the Open Targets Platform.
NAR Genom Bioinform
March 2025
National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi 110067, India.
Small proteins (≤100 amino acids) play important roles across all life forms, ranging from unicellular bacteria to higher organisms. In this study, we have developed SProtFP which is a machine learning-based method for functional annotation of prokaryotic small proteins into selected functional categories. SProtFP uses independent artificial neural networks (ANNs) trained using a combination of physicochemical descriptors for classifying small proteins into antitoxin type 2, bacteriocin, DNA-binding, metal-binding, ribosomal protein, RNA-binding, type 1 toxin and type 2 toxin proteins.
NAR Genom Bioinform
March 2025
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA.
Evaluating the accuracy of protein-coding sequences in genome annotations is a challenging problem for which there is no broadly applicable solution. In this manuscript, we introduce PSAURON (Protein Sequence Assessment Using a Reference ORF Network), a novel software tool developed to help assess the quality of protein-coding gene annotations. Utilizing a machine learning model trained on a diverse dataset from over 1000 plant and animal genomes, PSAURON assigns a score to coding DNA or protein sequence that reflects the likelihood that the sequence is a genuine protein-coding region.
Oral Surg Oral Med Oral Pathol Oral Radiol
December 2024
Department of Bioengineering and Biotechnology, Birla Institute of Technology, Mesra, Ranchi, Jharkhand, India. Electronic address:
Objective: Oral submucous fibrosis (OSMF) is categorized as an oral potentially malignant disorder (OPMD) with an increased risk of occurrence of oral squamous cell carcinoma (OSCC). In this study, we aimed to identify the hub genes associated with OSMF and OSCC.
Study Design: Using RStudio, a set of differentially expressed genes (DEGs) was identified in each of three categories, (A) OSMF, (B) OSCC, and (C) comparative OSMF-OSCC, from data obtained from Gene Expression Omnibus (GEO).
Acta Parasitol
January 2025
Academy of Animal Sciences and Veterinary Medicine, Qinghai University, Xining, People's Republic of China.
Purpose: Hypoderma pantholopsum is a parasite of Tibetan antelopes (Pantholops hodgsonii). This study aimed to reveal the genetic diversity within H. pantholopsum and contribute to the protection of the Tibetan antelope.