Thousands of complete genome sequences are now available for strains of a species, enabling pangenome analytics to advance to a new level of sophistication. We collected 2,377 publicly available complete genomes of for detailed pangenome analysis. The core and accessory genomes consisted of 2,398 and 5,182 genes, respectively.
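As a rough illustration of the core/accessory split mentioned above, the following sketch partitions gene families by how many genomes carry them. The function name, the `core_fraction` threshold, and the toy strains are assumptions made for illustration; the abstract does not state which threshold or clustering method the study used.

```python
from collections import Counter


def split_core_accessory(genomes, core_fraction=1.0):
    """Split gene families into core and accessory sets.

    genomes: dict mapping a genome id to the set of gene family ids it carries.
    core_fraction: hypothetical cutoff; a gene is "core" if it occurs in at
    least this fraction of genomes (1.0 means present in every genome).
    """
    n_genomes = len(genomes)
    # Count in how many genomes each gene family appears.
    counts = Counter(gene for genes in genomes.values() for gene in genes)
    core = {g for g, c in counts.items() if c / n_genomes >= core_fraction}
    accessory = set(counts) - core
    return core, accessory


# Toy presence/absence data (invented, not from the study).
toy = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g2", "g3", "g5"},
}
core, accessory = split_core_accessory(toy)
print(sorted(core), sorted(accessory))
```

Lowering `core_fraction` slightly below 1.0 is a common way to tolerate assembly or annotation gaps, at the cost of admitting genes that are truly absent from a few strains.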
Surveillance programs for managing antimicrobial resistance (AMR) have yielded thousands of genomes suited for data-driven mechanism discovery. We present a workflow integrating pangenomics, gene annotation, and machine learning to identify AMR genes at scale. When applied to 12 species, 27,155 genomes, and 69 drugs, we 1) find AMR gene transfer mostly confined within related species, with 925 genes in multiple species but just eight in multiple phylogenetic classes, 2) demonstrate that discovery-oriented support vector machines outperform contemporary methods at recovering known AMR genes, recovering 263 genes compared to 145 by Pyseer, and 3) identify 142 AMR gene candidates.
Lactobacillaceae represent a large family of important microbes that are foundational to the food industry. Many genome sequences of Lactobacillaceae strains are now available, enabling us to conduct a comprehensive pangenome analysis of this family. We collected 3591 high-quality genomes from public sources and found that: 1) they contained enough genomes for 26 species to perform a pangenomic analysis, 2) the normalized Heap's coefficient λ (a measure of pangenome openness) was found to have an average value of 0.
Background: Cumulative sequencing efforts have yielded enough genomes to construct pangenomes for dozens of bacterial species and elucidate intraspecies gene conservation. Given the diversity of organisms for which this is achievable, similar analyses for ancestral species are feasible through the integration of pangenomics and phylogenetics, promising deeper insights into the nature of ancient life.
Results: We construct pangenomes for 183 bacterial species from 54,085 genomes and identify their core genomes using a novel statistical model to estimate genome-specific error rates and underlying gene frequencies.
Background: With the exponential growth of publicly available genome sequences, pangenome analyses have provided increasingly complete pictures of genetic diversity for many microbial species. However, relatively few studies have scaled beyond single pangenomes to compare global genetic diversity both within and across different species. We present here several methods for "comparative pangenomics" that can be used to contextualize multi-pangenome scale genetic diversity with gene function for multiple species at multiple resolutions: pangenome shape, genes, sequence variants, and positions within variants.
The evolution of antimicrobial resistance (AMR) poses a persistent threat to global public health. Sequencing efforts have already yielded genome sequences for thousands of resistant microbial isolates and require robust computational tools to systematically elucidate the genetic basis for AMR. Here, we present a generalizable machine learning workflow for identifying genetic features driving AMR based on constructing reference strain-agnostic pan-genomes and training random subspace ensembles (RSEs).
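To make the random subspace idea concrete, here is a minimal sketch in which each base learner is trained on a random subset of gene presence/absence features and the ensemble predicts by majority vote. It is not the workflow from the abstract: the base learner is swapped for a one-feature decision stump to keep the example dependency-free, and all function names, parameters, and data are hypothetical.

```python
import random
from collections import Counter


def train_stump(X, y, features):
    """Pick the feature (and polarity) among `features` that best matches y."""
    best = None
    for f in features:
        agree = sum(1 for row, label in zip(X, y) if row[f] == label)
        # polarity 1: presence predicts resistance; polarity 0: absence does.
        for polarity, correct in ((1, agree), (0, len(y) - agree)):
            if best is None or correct > best[0]:
                best = (correct, f, polarity)
    return best[1], best[2]


def predict_stump(stump, row):
    f, polarity = stump
    return row[f] if polarity == 1 else 1 - row[f]


def train_rse(X, y, n_learners=25, subspace_size=2, seed=0):
    """Train each stump on its own random subset of feature indices."""
    rng = random.Random(seed)
    n_features = len(X[0])
    return [
        train_stump(X, y, rng.sample(range(n_features), subspace_size))
        for _ in range(n_learners)
    ]


def predict_rse(ensemble, row):
    """Majority vote over all base learners."""
    votes = sum(predict_stump(s, row) for s in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0


# Toy data (invented): rows are isolates, columns are genes (1 = present).
X = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = resistant; gene 0 tracks resistance exactly

ensemble = train_rse(X, y)
# How often each gene was chosen by a base learner: a crude candidate ranking.
gene_usage = Counter(f for f, _ in ensemble)
print(gene_usage.most_common())
```

Because every learner sees only part of the feature space, no single dominant gene can monopolize the ensemble, which is one reason subspace methods are attractive when features far outnumber samples.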
Background: Vanillin is an industrially valuable molecule that can be produced from simple carbon sources in engineered microorganisms such as Saccharomyces cerevisiae and Escherichia coli. In E. coli, de novo production of vanillin was demonstrated previously as a proof of concept.
Three chloride-bridged lanthanide compounds, [Ln4Cl6(CH3OH)12(OH)2]·4Cl·2CH3OH [Ln = Gd (), Dy () and Er ()], have been unexpectedly isolated by the reactions of LnCl3·6H2O and N,N'-bis(salicylidene)-1,2-(phenylene-diamine) (H2L). X-ray crystallographic analysis reveals a triclinic cell with a unique defect-dicubane {Ln4} core and the structure across this series is nominally isomorphic. Measurements of direct current magnetic susceptibility and isothermal magnetization give insight into the relevant cluster Hamiltonians for , , and , and alternating current susceptibility shows slow relaxation in , but not in or down to 2 K and up to 1 kHz.