The presence of contaminating sequences in public genome databases is a pervasive issue with potentially far-reaching consequences. This problem has attracted much attention in the recent literature, and many different tools are now available to detect contaminants. Although these methods are based on diverse algorithms that can sometimes produce widely different estimates of the contamination level, the majority of genomic studies rely on a single detection method, which carries a risk of systematic error. In this work, we used two orthogonal methods to assess the level of contamination among National Center for Biotechnology Information Reference Sequence Database (RefSeq) bacterial genomes. First, we applied the most popular solution, CheckM, which is based on gene markers. We then complemented this approach with a genome-wide method, termed Physeter, which now implements a k-folds algorithm to avoid inaccurate detection due to potential contamination of the reference database. We demonstrate that CheckM cannot currently be applied to all available genomes and bacterial groups. While it performed well on the majority of RefSeq genomes, it produced dubious results for 12,326 organisms. Among those, Physeter identified 239 contaminated genomes that had been missed by CheckM. In conclusion, we emphasize the importance of using multiple detection methods while providing an upgrade of our own detection tool, Physeter, which minimizes incorrect contamination estimates in the context of unavoidably contaminated reference databases.
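
The abstract does not describe how the folds-based mode of Physeter works internally, so the following Python sketch is only an illustration of the general idea under stated assumptions, not the tool's implementation. It assumes the reference database is split into k folds, that the contamination estimate is recomputed k times with one fold withheld each time, and that the per-fold estimates are summarized by their median. The function names, the `estimate` callback, and the toy estimator are all hypothetical.

```python
from statistics import median
from typing import Callable, List, Sequence, Tuple


def split_into_folds(items: Sequence[str], k: int) -> List[List[str]]:
    """Deal the items round-robin into k folds (hypothetical helper)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return [list(items[i::k]) for i in range(k)]


def kfold_contamination(
    reference_ids: Sequence[str],
    k: int,
    estimate: Callable[[List[str]], float],
) -> Tuple[List[float], float]:
    """Re-estimate contamination k times, withholding one fold each time.

    `estimate` is an assumed callback that returns a contamination
    percentage for the query genome given the references that are kept.
    Returns the per-fold estimates and their median (the median is an
    assumption of this sketch, not a documented choice of any tool).
    """
    folds = split_into_folds(reference_ids, k)
    per_fold = []
    for held_out in range(k):
        kept = [
            ref
            for index, fold in enumerate(folds)
            if index != held_out
            for ref in fold
        ]
        per_fold.append(estimate(kept))
    return per_fold, median(per_fold)


def toy_estimate(kept: List[str]) -> float:
    """Toy estimator: the result changes whenever 'ref3' is in the database."""
    return 40.0 if "ref3" in kept else 2.0


references = [f"ref{i}" for i in range(10)]
per_fold, summary = kfold_contamination(references, k=5, estimate=toy_estimate)
print(per_fold, summary)
```

In this toy run, the estimate differs sharply in the one fold that withholds `ref3`, which shows how the spread across folds can expose a result that depends on a single reference entry. How such a spread should be turned into a final estimate is a design choice that this sketch does not settle.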

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8570097
DOI: http://dx.doi.org/10.3389/fmicb.2021.755101

Similar Publications

Decontamination of DNA sequences from a Streptomyces genome for optimal genome mining.

Braz J Microbiol

January 2025

Department of Microbiology, Institute of Biomedical Sciences, University of São Paulo (USP), São Paulo, SP, 05508-900, Brazil.

Despite meticulous precautions, contamination of genomic DNA samples is not uncommon, which can significantly compromise the analysis of microorganisms' whole-genome sequencing data, thus affecting all subsequent analyses. Thanks to advancements in software and bioinformatics techniques, it is now possible to address this issue and prevent the loss of the entire dataset obtained in a contaminated whole-genome sequencing, where the DNA of another bacterium is present. In this study, it was observed that the sequencing reads from Streptomyces sp.

A new database to guide reference material selection for dietary supplement and nutrition science.

Anal Bioanal Chem

January 2025

ICF International Contractor in support of the Office of Dietary Supplements, National Institutes of Health, Bethesda, MD, USA.

Rigorous research on the health effects of dietary supplements and related nutritional interventions requires thorough chemical characterization of complex matrices for their composition of macro- and micronutrients, botanical phytochemicals, and potential contaminants. Reference materials (RMs) with metrologically traceable values for these specific properties are ideal analytical tools to ensure requisite chemical measurements are reliable. However, identifying and comparing appropriate RMs for studying dietary ingredients and their metabolites is challenging, creating a barrier to reproducible regulatory testing and research.

Disposal and resource utilization of oil-based drill cuttings in China - a review.

J Air Waste Manag Assoc

January 2025

Chongqing Yuanda Air Pollution Control Franchise Co Ltd. Technology Branch, Chongqing, China.

As a significant player in the global shale gas extraction industry, China has achieved a leading position in shale gas production on a worldwide scale. However, China is also facing the challenge of managing a considerable quantity of oil-based drill cuttings (OBDCs), which are classified as hazardous waste. Without appropriate treatment methods, these materials could cause significant environmental contamination.

Background: Unregulated contaminants in drinking water, such as per- and polyfluoroalkyl substances (PFAS), can contribute to cumulative health risks, particularly in overburdened and less-advantaged communities. To our knowledge, there has been no nationwide assessment of socioeconomic disparities in exposures to unregulated contaminants in drinking water.

Objective: The goals of this study were to identify determinants of unregulated contaminant detection among US public water systems (PWSs) and evaluate disparities related to race, ethnicity, and socioeconomic status.

Foreign Contaminants Target Brain Health.

CNS Neurol Disord Drug Targets

January 2025

Department of Pharmaceutical Chemistry, Delhi Pharmaceutical Sciences & Research University, Delhi, India-110017.

Neurodisease, caused by undesired substances, can lead to mental health conditions like depression, anxiety and neurocognitive problems like dementia. These substances can be referred to as contaminants that can cause damage, corruption, and infection or reduce brain functionality. Contaminants, whether conceptual or physical, have the ability to disrupt many processes.
