To compare the performance of prokaryotic taxonomy classifiers using curated 16S full-length rRNA sequences.

Comput Biol Med

Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan; Bioinformatics and Biostatistics Core Lab, Center of Genomic and Precision Medicine, National Taiwan University, Taipei, Taiwan; College of Biomedical Engineering, China Medical University, Taichung, Taiwan. Electronic address:

Published: June 2022

AI Article Synopsis

  • Taxonomic assignment is crucial for analyzing bacterial 16S rRNA, but focusing only on specific regions limits the ability to distinguish species, prompting the need for full-length sequence analysis with new sequencing technologies.
  • This study compares the accuracy of seven different 16S sequence classifiers, using various training datasets to determine which ones perform best for taxonomic classification.
  • Results showed that SINTAX and SPINGO, when trained with RDP sequences, achieved the highest accuracy, indicating that the choice of training dataset significantly impacts classifier performance.

Article Abstract

Background: Taxonomic assignment is a vital step in the analytic pipeline of bacterial 16S ribosomal RNA (rRNA) sequencing. Over the past decade, most research in this field has used next-generation sequencing technology targeting the V3–V4 regions to analyze bacterial composition. However, focusing on only one or two hypervariable regions limits taxonomic resolution at the species level. In recent years, third-generation sequencing technology has given researchers easy access to full-length prokaryotic 16S sequences, presenting an opportunity to attain greater taxonomic depth. However, the accuracy of current taxonomic classifiers on full-length 16S sequences remains unclear.

Objective: The purpose of this study is to compare the accuracy of several widely used 16S sequence classifiers and to identify the most suitable 16S training dataset for each classifier.

Methods: Both curated full-length 16S sequences and cross-validation datasets were used to validate the performance of seven classifiers: QIIME2, mothur, SINTAX, SPINGO, Ribosomal Database Project (RDP), IDTAXA, and Kraken2. Several sequence training datasets, including SILVA, Greengenes, and RDP, were used to train the classification models.

Results: The accuracy of each classifier was reported at each taxonomic rank down to the species level. According to the experimental results, SINTAX and SPINGO trained on RDP sequences provided the highest accuracy and are therefore recommended for classifying prokaryotic full-length 16S rRNA sequences.
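As a rough illustration of what "accuracy at each taxonomic rank" can mean, the sketch below scores predicted lineages against curated reference lineages rank by rank. It is not the authors' evaluation code. The semicolon-delimited lineage format, the seven-rank list, the rule that an unassigned rank counts as incorrect, and the function names `parse_lineage` and `per_rank_accuracy` are all hypothetical choices made here for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical seven-rank lineage layout assumed for this sketch.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def parse_lineage(lineage: str) -> List[str]:
    """Split a semicolon-delimited lineage such as 'Bacteria;Firmicutes;...'.

    Missing ranks become '' so that an unclassified rank never counts as correct.
    """
    parts = [p.strip() for p in lineage.split(";")]
    # Pad truncated lineages (e.g. a classifier that stops at genus) to full length.
    parts += [""] * (len(RANKS) - len(parts))
    return parts[: len(RANKS)]


def per_rank_accuracy(truth: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Fraction of query sequences whose predicted name equals the curated name at each rank."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("need at least one sequence")
    correct = [0] * len(RANKS)
    for t, p in zip(truth, predicted):
        for i, (true_name, pred_name) in enumerate(zip(parse_lineage(t), parse_lineage(p))):
            # A rank is correct only if the classifier assigned a name and it matches.
            if pred_name and true_name == pred_name:
                correct[i] += 1
    n = len(truth)
    return {rank: correct[i] / n for i, rank in enumerate(RANKS)}


if __name__ == "__main__":
    # Toy lineages for illustration only; not data from the study.
    truth = [
        "Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Lactobacillus;Lactobacillus_gasseri",
        "Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia_coli",
    ]
    predicted = [
        "Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Lactobacillus;Lactobacillus_johnsonii",
        "Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia",
    ]
    for rank, acc in per_rank_accuracy(truth, predicted).items():
        print(f"{rank}: {acc:.2f}")
```

In this toy example both queries are correct from domain through genus, while species-level accuracy is 0.00: one query receives the wrong species and the other stops at genus. Whether an unassigned rank should count as an error or be excluded from the denominator is a design choice that changes the reported numbers.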

Conclusion: The performance of the classifiers was affected by the choice of training dataset. Therefore, each classifier should be paired with its most suitable 16S training data to improve accuracy and taxonomic resolution in taxonomic assignment.

Source
http://dx.doi.org/10.1016/j.compbiomed.2022.105416
