Dimension reduction (DR or embedding) algorithms such as t-SNE and UMAP have many applications in big data visualization but remain slow for large datasets. Here, we further improve UMAP-like algorithms by (i) combining several aspects of t-SNE and UMAP to create a new DR algorithm; (ii) replacing its rate-limiting step, the K-nearest neighbor graph (K-NNG), with a Hierarchical Navigable Small World (HNSW) graph; and (iii) extending the functionality to DNA/RNA sequence data by combining HNSW with locality sensitive hashing algorithms (e.g.
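The abstract's list of locality sensitive hashing (LSH) algorithms is cut off, so the following is not the authors' method. It is a minimal sketch of MinHash, one widely used LSH scheme for sequence data, showing how DNA/RNA sequences can be reduced to fixed-length sketches whose agreement estimates the Jaccard similarity of their k-mer sets. The function names, the k-mer length (k = 11), the 128 hash functions and the BLAKE2b-based seeded hashing are all illustrative assumptions.

```python
import hashlib
import random


def kmers(seq: str, k: int = 11) -> set[str]:
    """Return the set of overlapping k-mers in a DNA/RNA sequence (empty if len(seq) < k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str, seed: int) -> int:
    # Seeded 64-bit hash from BLAKE2b; each seed plays the role of an independent hash function.
    digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(seq: str, k: int = 11, num_hashes: int = 128) -> list[int]:
    """MinHash sketch: for each hash function, keep the minimum hash value over all k-mers."""
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, s) for km in ks) for s in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching sketch slots is an unbiased estimate of the k-mer Jaccard index."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(seq_a: str, seq_b: str, k: int = 11) -> float:
    """Exact Jaccard index of the two k-mer sets, for comparison with the estimate."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    rng = random.Random(0)
    a = "".join(rng.choice("ACGT") for _ in range(500))
    # b shares its first half with a; the second half is random.
    b = a[:250] + "".join(rng.choice("ACGT") for _ in range(250))
    sig_a, sig_b = minhash_signature(a), minhash_signature(b)
    print(f"estimated J = {estimate_jaccard(sig_a, sig_b):.2f}, exact J = {exact_jaccard(a, b):.2f}")
```

Because every sequence becomes a vector of the same length regardless of its size, sketches can be compared without all-pairs alignment; how the authors actually couple their LSH step with the HNSW graph is not described in this excerpt.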
Viruses shape microbial community structure and activity through the control of population diversity and cell abundances. Identifying and monitoring the dynamics of specific virus-host pairs in nature is hampered by the limitations of culture-independent approaches such as metagenomics, which do not always provide strain-level resolution, and of culture-based analyses, which eliminate the ecological background and in situ interactions. Here, we have explored the interaction of a specific "autochthonous" host strain and its viruses within a natural community.
Recent genomic analyses have revealed that microbial communities are predominantly composed of persistent, sequence-discrete species and intraspecies units (genomovars), but the mechanisms that create and maintain these units remain unclear. By analyzing closely related isolate genomes from the same or related samples and identifying recent recombination events using a novel bioinformatics methodology, we show that high ecological cohesiveness coupled to sufficiently frequent and unbiased (i.e.
Background: Arsenic (As) metabolism pathways and their coupling to nitrogen (N) and carbon (C) cycling contribute to elemental biogeochemical cycling. However, how whole microbial communities respond to As stress, and which taxa are the predominant As-transforming bacteria or archaea in situ, remain unclear. Hence, by constructing and applying ROCker profiles to precisely detect and quantify As oxidation (aioA, arxA) and reduction (arrA, arsC1, arsC2) genes in short-read metagenomic and metatranscriptomic datasets, we investigated the dominant microbial communities involved in arsenite (As(III)) oxidation and arsenate (As(V)) reduction and revealed their potential pathways for coupling As with N and C in situ in rice paddies.