Average Nucleotide Identity (ANI) is becoming a standard measure for bacterial species delimitation. However, its calculation can take orders of magnitude longer than similarity estimates based on sampling of short nucleotide sequences, compiled into so-called sketches. These estimates are widely used, but their variable correlation with ANI has suggested that they might not be as accurate. For a where-the-rubber-meets-the-road assessment, we compared two sketching programs, Mash and Dashing, against ANI in delimiting species among Enterobacterales genomes. Receiver Operating Characteristic (ROC) analysis found Area Under the Curve (AUC) values of 0.99 for all three measures, indicating almost perfect species discrimination. Subsampling to avoid over-represented species reduced these AUC values to 0.92, still highly accurate. Focused tests with ten genera, each represented by more than three species, also showed almost identical results for all methods. Shigella showed the lowest AUC values (0.68), followed by Citrobacter (0.80). All other genera (Dickeya, Enterobacter, Escherichia, Klebsiella, Pectobacterium, Proteus, Providencia and Yersinia) produced AUC values above 0.90. The species delimitation thresholds varied, with the species distance ranges of a few genera overlapping the genus distance ranges of others. Mash was able to separate the E. coli + Shigella complex into 25 apparent phylogroups, four of them corresponding, roughly, to the four Shigella species represented in the data. Our results suggest that fast estimates of genome similarity are as good as ANI for species delimitation. Therefore, these estimates might suffice for covering the role of genomic similarity in bacterial taxonomy, and the results should increase confidence in their use for efficient bacterial identification and clustering, in applications ranging from epidemiology to genome-based detection of potential contaminants in farming and industry settings.
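The ROC analysis described above can be illustrated with a small sketch. It assumes that each genome pair is scored by a distance (from ANI, Mash, or Dashing) and labeled as "same species" or "different species"; the AUC is then the probability that a randomly chosen same-species pair has a smaller distance than a randomly chosen different-species pair. The function name `pairwise_auc`, the genome identifiers, and the toy distances are all hypothetical and are not taken from the study, which may have computed its AUC values differently.

```python
from itertools import combinations


def pairwise_auc(distances, species):
    """AUC for separating same-species from different-species genome pairs.

    distances: dict mapping frozenset({genome_a, genome_b}) -> distance
    species:   dict mapping genome id -> species label
    """
    same, diff = [], []
    for a, b in combinations(sorted(species), 2):
        d = distances[frozenset((a, b))]
        (same if species[a] == species[b] else diff).append(d)
    if not same or not diff:
        raise ValueError("need both same-species and different-species pairs")

    # Rank-based (Mann-Whitney) AUC: count how often a same-species
    # distance is smaller than a different-species distance; ties count 0.5.
    wins = 0.0
    for s in same:
        for t in diff:
            if s < t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(same) * len(diff))


# Hypothetical toy data: two species with two genomes each.
toy_species = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
toy_dist = {
    frozenset(("g1", "g2")): 0.010,  # same species
    frozenset(("g3", "g4")): 0.020,  # same species
    frozenset(("g1", "g3")): 0.100,
    frozenset(("g1", "g4")): 0.120,
    frozenset(("g2", "g3")): 0.110,
    frozenset(("g2", "g4")): 0.015,  # unusually close different-species pair
}

print(pairwise_auc(toy_dist, toy_species))  # 0.875 for this toy data
```

In this toy example, one different-species pair is closer than one same-species pair, so the AUC drops below 1.0. That is the kind of overlap the abstract reports for Shigella and Citrobacter.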


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10501659 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0291492 (PLOS)


