A machine learning-based service for estimating quality of genomes using PATRIC.

Bruce Parrello, Rory Butler, Philippe Chlenski, Robert Olson, Jamie Overbeek, Gordon D Pusch, Veronika Vonstein, Ross Overbeek

BMC Bioinformatics

Fellowship for Interpretation of Genomes, Burr Ridge, IL 60527, USA.

Published: October 2019

Background: Recent advances in high-volume sequencing technology and mining of genomes from metagenomic samples call for rapid and reliable genome quality evaluation. The current release of the PATRIC database contains over 220,000 genomes, and current metagenomic technology supports assemblies of many draft-quality genomes from a single sample, most of which will be novel.

Description: We have added two quality assessment tools to the PATRIC annotation pipeline. EvalCon uses supervised machine learning to calculate an annotation consistency score. EvalG implements a variant of the CheckM algorithm to estimate the contamination and completeness of an annotated genome. We report on the performance of these tools and the potential utility of the consistency score. Additionally, we provide contamination, completeness, and consistency measures for all genomes in PATRIC and for a recent set of metagenomic assemblies.
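
The abstract does not spell out how EvalG or EvalCon compute their scores, so the following is only a minimal sketch under stated assumptions. The first function applies the published CheckM-style formulas: each collocated marker set contributes equally, completeness is the fraction of a set's markers present, and contamination counts extra copies (N - 1) of each present marker. Here markers are assumed to be annotated role names with per-genome copy counts. The second function is a hypothetical consistency score, assuming it is the percentage of roles whose count, as predicted by a trained model, matches the annotated count; the supervised model itself is not shown. All function and role names are illustrative, not PATRIC identifiers.

```python
from collections import Counter
from typing import Dict, List, Mapping, Set


def checkm_style_scores(
    marker_sets: List[Set[str]],
    role_counts: Mapping[str, int],
) -> Dict[str, float]:
    """Completeness and contamination (percent) over collocated marker sets.

    Assumption: markers are role names and role_counts gives how many times
    each role is annotated in the genome.
    """
    if not marker_sets:
        raise ValueError("need at least one marker set")
    completeness = 0.0
    contamination = 0.0
    for markers in marker_sets:
        if not markers:
            raise ValueError("marker sets must be non-empty")
        # Markers seen at least once count toward completeness.
        present = sum(1 for m in markers if role_counts.get(m, 0) > 0)
        # Every copy beyond the first counts toward contamination.
        extra = sum(max(role_counts.get(m, 0) - 1, 0) for m in markers)
        completeness += present / len(markers)
        contamination += extra / len(markers)
    n = len(marker_sets)
    return {
        "completeness": 100.0 * completeness / n,
        "contamination": 100.0 * contamination / n,
    }


def consistency_score(
    observed: Mapping[str, int],
    predicted: Mapping[str, int],
) -> float:
    """Hypothetical: percent of roles whose predicted count equals the annotated count."""
    roles = set(predicted)
    if not roles:
        raise ValueError("no predictions supplied")
    agree = sum(1 for r in roles if observed.get(r, 0) == predicted[r])
    return 100.0 * agree / len(roles)


if __name__ == "__main__":
    # Toy genome with placeholder role names.
    annotations = Counter({"RoleA": 1, "RoleB": 2, "RoleC": 1})
    sets = [{"RoleA", "RoleB"}, {"RoleC", "RoleD"}]
    print(checkm_style_scores(sets, annotations))
    # prints {'completeness': 75.0, 'contamination': 25.0}
    preds = {"RoleA": 1, "RoleB": 1, "RoleC": 1}
    print(round(consistency_score(annotations, preds), 1))
    # prints 66.7
```

In the toy run, the duplicated RoleB raises contamination and the missing RoleD lowers completeness, which mirrors how a chimeric or fragmentary draft genome would be flagged.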

Conclusion: EvalG and EvalCon facilitate the rapid quality control and exploration of PATRIC-annotated draft genomes.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6775668
DOI: http://dx.doi.org/10.1186/s12859-019-3068-y
