Publicly available genomes are crucial for phylogenetic and metagenomic studies, in which contaminating sequences can cause major problems. This issue is expected to be especially acute for Cyanobacteria, because axenic strains are notoriously difficult to obtain and maintain in culture. Yet, despite their great scientific interest, no data are currently available on the quality of publicly available cyanobacterial genomes. As reliably detecting contaminants is a complex task, we designed a pipeline combining six methods in a consensus strategy to assess the contamination level of 440 cyanobacterial genome assemblies. Two methods are based on published reference databases of ribosomal genes (16S SSU rRNA and ribosomal proteins), one is indirectly based on a reference database of marker genes (CheckM), and three are based on complete genome analysis. Among those genome-wide methods, Kraken and DIAMOND blastx share the same reference database, which we derived from Ensembl Bacteria, whereas CONCOCT does not require any reference database and relies instead on differences in DNA tetramer frequencies. Given that each of the six methods appears to have its own strengths and limitations, we used the consensus of their rankings to infer that >5% of cyanobacterial genome assemblies are highly contaminated by foreign DNA (i.e., contaminants were detected by 5 or 6 methods). Our results will help researchers check the quality of publicly available genomic data before using it in their own analyses. Moreover, we argue that journals should make it mandatory to submit raw read data along with genome assemblies, in order to facilitate the detection of contaminants in sequence databases.
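Two ideas from the abstract, the "5 or 6 methods" consensus threshold and the tetramer-frequency signal that CONCOCT relies on, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions and not the authors' pipeline. The method labels used as dictionary keys are hypothetical. `count_votes` and `highly_contaminated` encode only the stated threshold, not the ranking-consensus step itself. `tetramer_frequencies` is a simplified 4-mer composition profile rather than CONCOCT's actual feature vector.

```python
from collections import Counter
from itertools import product

# Hypothetical labels for the six methods named in the abstract.
METHODS = (
    "16S_rRNA",
    "ribosomal_proteins",
    "CheckM",
    "Kraken",
    "DIAMOND_blastx",
    "CONCOCT",
)
ALL_TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # 256 4-mers


def tetramer_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each DNA 4-mer in `seq`.

    Windows containing non-ACGT characters (e.g. N) are ignored.
    Simplified: reverse-complementary 4-mers are NOT merged, unlike
    the composition features typically used by binning tools.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in ALL_TETRAMERS)
    if total == 0:
        return {k: 0.0 for k in ALL_TETRAMERS}
    return {k: counts[k] / total for k in ALL_TETRAMERS}


def count_votes(flags: dict[str, bool]) -> int:
    """Number of the six methods that detected contaminants in one assembly."""
    return sum(1 for m in METHODS if flags.get(m, False))


def highly_contaminated(flags: dict[str, bool], threshold: int = 5) -> bool:
    """True if at least `threshold` of the six methods flagged the assembly."""
    return count_votes(flags) >= threshold


if __name__ == "__main__":
    # Made-up detection results for three hypothetical assemblies.
    results = {
        "assembly_A": dict.fromkeys(METHODS, True),                        # 6 votes
        "assembly_B": {**dict.fromkeys(METHODS, True), "CONCOCT": False},  # 5 votes
        "assembly_C": {"CheckM": True, "Kraken": True},                    # 2 votes
    }
    flagged = [name for name, f in results.items() if highly_contaminated(f)]
    print(flagged)                               # ['assembly_A', 'assembly_B']
    print(f"{len(flagged) / len(results):.0%}")  # 67%
    print(tetramer_frequencies("ACGTACGTNN")["ACGT"])  # 0.4
```

A simple vote count is used here because it maps directly onto the "detected by 5 or 6 methods" definition. In a real analysis, the per-method detection calls would come from the tools' own outputs rather than from hand-written dictionaries.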
Link | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6059444 | PMC |
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0200323 | PLOS |
Genome Biol
January 2025
The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, 2800, Denmark.
Background: Streptomyces is a highly diverse genus known for the production of secondary or specialized metabolites with a wide range of applications in the medical and agricultural industries. Several thousand complete or nearly complete Streptomyces genome sequences are now available, affording the opportunity to deeply investigate the biosynthetic potential within these organisms and to advance natural product discovery initiatives.
Results: We perform pangenome analysis on 2371 Streptomyces genomes, including approximately 1200 complete assemblies.
BMC Genomics
January 2025
College of Life Sciences, Shaanxi Normal University, Xi'an, 710062, China.
Background: Chemosensory perception plays a vital role in insect survival and adaptability, driving essential behaviours such as navigation, mate identification, and food location. This sensory process is governed by diverse gene families, including odorant-binding proteins (OBPs), olfactory receptors (ORs), ionotropic receptors (IRs), chemosensory proteins (CSPs), gustatory receptors (GRs), and sensory neuron membrane proteins (SNMPs). The oriental mole cricket (Gryllotalpa orientalis Burmeister), an invasive pest with an underground, phyllophagous lifestyle, causes substantial crop damage.
EMBO J
January 2025
Howard Hughes Medical Institute, Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA.
Chromosome segregation relies on kinetochores that assemble on specialized centromeric chromatin containing a histone H3 variant. In budding yeast, a single centromeric nucleosome containing Cse4 assembles at a sequence-defined 125 bp centromere. Yeast centromeric sequences are poor templates for nucleosome formation in vitro, suggesting the existence of mechanisms that specifically stabilize Cse4 nucleosomes in vivo.
Sci Data
January 2025
State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou, 350002, China.
Anisarchus medius (Reinhardt, 1837) is a widely distributed Arctic fish, serving as an indicator of climate change impacts on coastal Arctic ecosystems. This study presents a chromosome-level genome assembly for A. medius using PacBio sequencing and Hi-C technology.
PeerJ
January 2025
Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America.
Despite the recent surge of viral metagenomic studies, it remains a significant challenge to recover complete virus genomes from metagenomic data. The majority of viral contigs generated from de novo assembly programs are highly fragmented, presenting significant challenges to downstream analysis and inference. To address this issue, we have developed Virseqimprover, a computational pipeline that can extend assembled contigs to complete or nearly complete genomes while maintaining extension quality.