Background: Long reads have gained popularity in the analysis of metagenomics data. Therefore, we comprehensively assessed metagenomics classification tools at the species taxonomic level. We analysed kmer-based tools, mapping-based tools and two general-purpose long-read mappers. We evaluated more than 20 pipelines which use either nucleotide or protein databases and selected 13 for an extensive benchmark. We prepared seven synthetic datasets to test various scenarios, including the presence of a host, unknown species and related species. Moreover, we used available sequencing data from three well-defined mock communities, including a dataset with abundances varying from 0.0001% to 20%, and from six real gut microbiomes.
Results: General-purpose mappers Minimap2 and Ram achieved similar or better accuracy on most testing metrics than the best-performing classification tools. They were up to ten times slower than the fastest kmer-based tools requiring up to four times less RAM. All tested tools except CLARK-S were prone to report organisms not present in the datasets, and they underperformed when a high proportion of the host's genetic material was present. Tools which use a protein database performed worse than those based on a nucleotide database. Longer read lengths made classification easier, but because read length distributions differ among species, using only the longest reads reduced the accuracy. The comparison of real gut microbiome datasets shows similar abundance profiles for tools of the same type but discordance in the number of reported organisms and in abundances between types. Most assessments showed the influence of database completeness on the reports.
Conclusion: The findings indicate that kmer-based tools are well-suited for rapid analysis of long-read data. However, when heightened accuracy is essential, mappers demonstrate slightly superior performance, albeit at a considerably slower pace. Nevertheless, a combination of diverse categories of tools and databases will likely be necessary to analyse complex samples. Discrepancies observed among tools when applied to real gut datasets, as well as reduced performance in cases where unknown species or a significant proportion of the host genome is present in the sample, highlight the need for continuous improvement of existing tools. Additionally, regular updates and curation of databases are important to ensure their effectiveness.
Source (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10782538
Source (DOI): http://dx.doi.org/10.1186/s12859-024-05634-8
Microbiol Resour Announc
January 2025
Department of Livestock Infectiology and Environmental Hygiene, University of Hohenheim, Stuttgart, Germany.
is one of the closest relatives of the highly host-adapted and uncultivable hemotrophic mycoplasma. The complete genome of strain 117C was constructed from long reads derived from Pacific Biosciences single-molecule, real-time sequencing technology. The genome is organized into one circular, gapless chromosome with a length of 1,034 kb.
Sci Data
January 2025
Shaanxi Key Laboratory of Plant Nematology, Bio-Agriculture Institute of Shaanxi, Xi'an, China.
Ditylenchus destructor, commonly known as the potato rot nematode, is a significant plant-parasitic pathogen affecting over 120 plant species globally. Effective control measures for D. destructor are limited, underscoring the need for a high-quality reference genome to understand its pathogenic mechanisms.
Anal Chem
January 2025
School of Chemistry and Chemical Engineering, Jiangsu University, Zhenjiang 212013, PR China.
Wearable sensors have broad application potential in motion assessment, health monitoring, and medical diagnosis. However, relying on a specialized instrument for power supply and signal reading makes sensors unsuitable for on-site detection. To solve this problem, a reusable self-powered electrochromic sensor patch based on enzymatic biofuel cells was constructed to realize on-site visualized monitoring.
J Gerontol B Psychol Sci Soc Sci
January 2025
Department of Human Development and Family Studies, Pennsylvania State University, State College, Pennsylvania, USA.
Objective: Studies using ecological momentary assessment (EMA) of activity participation rely on items tapping domains informed by factor analyses based on single time points. Analyses from a single time point focus on differences between participants and provide little insight into how activities cluster together within a person across moments or days. The present study compared the factor structure in activity participation between- and within-persons using an expanded set of momentary activity items in middle and older adulthood.
Nucleic Acids Res
January 2025
Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå University, Biomedicinbyggnaden 6K och 6L, Umeå universitetssjukhus, 901 87, Umeå, Sweden.
Single-cell RNA-seq methods can be used to delineate cell types and states at unprecedented resolution but do little to explain why certain genes are expressed. Single-cell ATAC-seq and multiome (ATAC + RNA) have emerged to give a complementary view of the cell state. It is however unclear what additional information can be extracted from ATAC-seq data besides transcription factor binding sites.