Motivation: CoDing Sequence (CDS) prediction tools have been built on historic genomic annotations from model organisms, and the resulting biases affect our understanding of novel genomes and metagenomes. Because predictions are biased towards existing knowledge, the discovery of new genomic information is hindered. To date, users have lacked a systematic and replicable approach for identifying the strengths and weaknesses of any CDS prediction tool and for choosing the right tool for their analysis.

Results: We present an evaluation framework (ORForise) based on a comprehensive set of 12 primary and 60 secondary metrics that facilitate the assessment of the performance of CDS prediction tools. This makes it possible to identify which tool performs better for specific use cases. We use the framework to assess 15 ab initio and model-based tools, representing those most widely used (historically and currently) to generate the knowledge in genomic databases. We find that the performance of any tool depends on the genome being analysed, and no individual tool ranked as the most accurate across all genomes or metrics analysed. Even the top-ranked tools produced conflicting gene collections, which could not be resolved by aggregation. The ORForise evaluation framework provides users with a replicable, data-led approach to making informed tool choices for novel genome annotations and for refining historical annotations.

Availability and Implementation: Code and datasets for reproduction and customisation are available at https://github.com/NickJD/ORForise.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8825762
DOI: http://dx.doi.org/10.1093/bioinformatics/btab827
