GTDB-Tk v2: memory friendly classification with the genome taxonomy database.

Bioinformatics

Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia.

Published: November 2022

Summary: The Genome Taxonomy Database (GTDB) and associated taxonomic classification toolkit (GTDB-Tk) have been widely adopted by the microbiology community. However, the growing size of the GTDB bacterial reference tree has resulted in GTDB-Tk requiring substantial amounts of memory (∼320 GB), which limits its adoption and ease of use. Here, we present an update to GTDB-Tk that uses a divide-and-conquer approach in which user genomes are first placed into a bacterial reference tree of family-level representatives and then placed into the appropriate class-level subtree comprising species representatives. This substantially reduces the memory requirements of GTDB-Tk while having minimal impact on classification.
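The two-stage idea described in the summary can be illustrated with a toy sketch. This is not GTDB-Tk code and does not perform phylogenetic placement: it swaps in a simple nearest-neighbour search over made-up numeric profiles, and every name in it (`Representative`, `classify`, `BACKBONE`, `SUBTREES`, the taxon labels) is hypothetical. It only shows why splitting the reference into a small backbone plus per-class subtrees means that a query is never compared against the full set of species representatives at once.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Representative:
    name: str                    # hypothetical label for a reference genome
    taxon_class: str             # class-level label used to select a subtree
    profile: Tuple[float, ...]   # toy stand-in for real marker-gene data


def distance(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    """Squared Euclidean distance between two toy profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query: Tuple[float, ...],
            candidates: List[Representative]) -> Representative:
    """Return the candidate whose profile is closest to the query."""
    return min(candidates, key=lambda rep: distance(query, rep.profile))


def classify(query: Tuple[float, ...],
             backbone: List[Representative],
             subtrees: Dict[str, List[Representative]]) -> Tuple[str, str]:
    # Stage 1: compare against the small backbone of family-level
    # representatives to decide which class the query belongs to.
    backbone_hit = nearest(query, backbone)
    # Stage 2: search only the species representatives of that class.
    # The other subtrees are never touched for this query.
    species_hit = nearest(query, subtrees[backbone_hit.taxon_class])
    return backbone_hit.taxon_class, species_hit.name


# Made-up reference data, for illustration only.
BACKBONE = [
    Representative("fam_A", "Class1", (0.0, 0.0)),
    Representative("fam_B", "Class2", (10.0, 10.0)),
]
SUBTREES = {
    "Class1": [
        Representative("sp_a1", "Class1", (0.0, 1.0)),
        Representative("sp_a2", "Class1", (1.0, 0.0)),
    ],
    "Class2": [
        Representative("sp_b1", "Class2", (9.0, 10.0)),
        Representative("sp_b2", "Class2", (11.0, 10.0)),
    ],
}

if __name__ == "__main__":
    print(classify((0.9, 0.1), BACKBONE, SUBTREES))
    print(classify((10.5, 10.0), BACKBONE, SUBTREES))
```

In this sketch each query touches the backbone plus one subtree, so the working set scales with the largest subtree rather than with the whole reference, which is the intuition behind the reduced memory footprint reported above.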

Availability And Implementation: GTDB-Tk is implemented in Python and licenced under the GNU General Public Licence v3.0. Source code and documentation are available at: https://github.com/ecogenomics/gtdbtk.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710552
DOI: http://dx.doi.org/10.1093/bioinformatics/btac672
