Philos Trans R Soc Lond B Biol Sci
October 2022
The definition of bacterial species is traditionally a taxonomic issue while bacterial populations are identified by population genetics. These assignments are species specific, and depend on the practitioner. Legacy multilocus sequence typing is commonly used to identify sequence types (STs) and clusters (ST Complexes).
Philos Trans R Soc Lond B Biol Sci
October 2022
Salmonella enterica serovar Typhimurium strain ATCC14028s is commercially available from multiple national type culture collections, and has been widely used since 1960 for quality control of growth media and experiments on fitness ("laboratory evolution"). ATCC14028s has been implicated in multiple cross-contaminations in the laboratory, and has also caused multiple laboratory infections and one known attempt at bioterrorism. According to hierarchical clustering of 3002 core gene sequences, ATCC14028s belongs to HierCC cluster HC20_373 in which most internal branch lengths are only one to three SNPs long.
The gastric bacterium Helicobacter pylori shares a coevolutionary history with humans that predates the out-of-Africa diaspora, and the geographical specificities of H. pylori populations reflect multiple well-known human migrations. We extensively sampled H. pylori from 16 ethnically diverse human populations across Siberia to help resolve whether ancient northern Eurasian populations persisted at high latitudes through the last glacial maximum and the relationships between present-day Siberians and Native Americans. A total of 556 strains were cultivated and genotyped by multilocus sequence typing, and 54 representative draft genomes were sequenced.
Motivation: Routine infectious disease surveillance is increasingly based on large-scale whole-genome sequencing databases. Real-time surveillance would benefit from immediate assignments of each genome assembly to hierarchical population structures. Here we present pHierCC, a pipeline that defines a scalable clustering scheme, HierCC, based on core genome multi-locus typing that allows incremental, static, multi-level cluster assignments of genomes.
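The idea of multi-level cluster assignments from allelic profiles can be illustrated with a minimal sketch. It assumes plain single-linkage grouping on pairwise allelic distance and recomputes every level from scratch, so it is neither incremental nor static and should not be read as the pHierCC implementation. The names `allelic_distance`, `single_linkage` and `multi_level`, the use of `0` for a missing locus, and the toy profiles are all hypothetical.

```python
from itertools import combinations


def allelic_distance(a, b):
    """Count loci where both profiles carry a called allele and the alleles differ.

    Assumption of this sketch: allele number 0 marks a missing locus.
    """
    return sum(1 for x, y in zip(a, b) if x and y and x != y)


def single_linkage(profiles, threshold):
    """Map each genome to a cluster label using single linkage at one threshold."""
    parent = {name: name for name in profiles}

    def find(n):
        # Follow parent links to the root, shortening the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(sorted(profiles), 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Attach the larger root to the smaller one so labels are deterministic.
                parent[max(ra, rb)] = min(ra, rb)
    return {name: find(name) for name in profiles}


def multi_level(profiles, thresholds):
    """Return one cluster assignment per distance threshold."""
    return {t: single_linkage(profiles, t) for t in thresholds}


# Toy allelic profiles over five loci (illustrative values only).
profiles = {
    "g1": [1, 1, 1, 1, 1],
    "g2": [1, 1, 1, 1, 2],
    "g3": [1, 1, 1, 2, 2],
    "g4": [5, 6, 7, 8, 9],
}
levels = multi_level(profiles, [0, 1, 2])
```

In this toy data, g1 and g3 differ at two loci but still share a cluster at threshold 1, because g2 links them. That chaining is a known property of single linkage.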
BlastFrost is a highly efficient method for querying 100,000s of genome assemblies, building on Bifrost, a dynamic data structure for compacted and colored de Bruijn graphs. BlastFrost queries a Bifrost data structure for sequences of interest and extracts local subgraphs, enabling the identification of the presence or absence of individual genes or single nucleotide sequence variants. We show two examples using Salmonella genomes: finding within minutes the presence of genes in the SPI-2 pathogenicity island in a collection of 926 genomes and identifying single nucleotide polymorphisms associated with fluoroquinolone resistance in three genes among 190,209 genomes.
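The presence/absence query described above can be sketched in a much simplified form. The sketch assumes a plain dictionary from k-mers to the set of genomes ("colors") that contain them, in place of a compacted de Bruijn graph. It does not use the Bifrost or BlastFrost APIs, extracts no subgraphs, and ignores reverse complements. The function names, the `min_fraction` parameter and the toy sequences are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_colored_index(genomes, k):
    """Map every k-mer to the set of genome names (colors) that contain it."""
    index = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def query_presence(index, genome_names, query, k, min_fraction=1.0):
    """Report, per genome, whether enough of the query's k-mers are present."""
    q = kmers(query, k)
    if not q:
        raise ValueError("query must be at least k bases long")
    hits = {}
    for name in genome_names:
        shared = sum(1 for km in q if name in index.get(km, ()))
        hits[name] = shared / len(q) >= min_fraction
    return hits


# Toy sequences that differ by a single base (illustrative only).
genomes = {"gA": "ACGTACGTGA", "gB": "ACGTTCGTGA"}
index = build_colored_index(genomes, k=4)
presence = query_presence(index, genomes, "TACGT", k=4)
relaxed = query_presence(index, genomes, "TACGT", k=4, min_fraction=0.5)
```

With the strict default, a single substitution removes a query k-mer from gB, so the query is reported as absent there. Lowering `min_fraction` tolerates such variants at the cost of specificity.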
Bacterial genomes can contain traces of a complex evolutionary history, including extensive homologous recombination, gene loss, gene duplications, and horizontal gene transfer. To reconstruct the phylogenetic and population history of a set of multiple bacteria, it is necessary to examine their pangenome, the composite of all the genes in the set. Here we introduce PEPPAN, a novel pipeline that can reliably construct pangenomes from thousands of genetically diverse bacterial genomes that represent the diversity of an entire genus.
Philos Trans R Soc Lond B Biol Sci
November 2020
We have recently developed bioinformatic tools to accurately assign metagenomic sequence reads to microbial taxa: SPARSE for probabilistic, taxonomic classification of sequence reads; EToKi for assembling and polishing genomes from short-read sequences; and GrapeTree, a graphic visualizer of genetic distances between large numbers of genomes. Together, these methods support comparative analyses of genomes from ancient skeletons and modern humans. Here, we illustrate these capabilities with 784 samples from historical dental calculus, modern saliva and modern dental plaque.
Clostridioides difficile is the primary infectious cause of antibiotic-associated diarrhea. Local transmissions and international outbreaks of this pathogen have been previously elucidated by bacterial whole-genome sequencing, but comparative genomic analyses at the global scale were hampered by the lack of specific bioinformatic tools. Here we introduce a publicly accessible database within EnteroBase (http://enterobase.
EnteroBase is an integrated software environment that supports the identification of global population structures within several bacterial genera that include pathogens. Here, we provide an overview of how EnteroBase works, what it can do, and its future prospects. EnteroBase has currently assembled more than 300,000 genomes from Illumina short reads from seven bacterial genera and genotyped those assemblies by core genome multilocus sequence typing (cgMLST).
This month: selected work from the 2018 RECOMB meeting, organized by Ecole Polytechnique and held last April in Paris.
Current methods struggle to reconstruct and visualize the genomic relationships of large numbers of bacterial genomes. GrapeTree facilitates the analyses of large numbers of allelic profiles by a static "GrapeTree Layout" algorithm that supports interactive visualizations of large trees within a web browser window. GrapeTree also implements a novel minimum spanning tree algorithm (MSTree V2) to reconstruct genetic relationships despite high levels of missing data.
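As a rough illustration of a minimum spanning tree over allelic profiles, the following sketch uses the classic Prim algorithm with a distance that skips loci missing in either profile. It is not MSTree V2, whose handling of missing data is more elaborate. The function names, the use of `None` for a missing allele, and the toy profiles are assumptions made for illustration.

```python
import heapq


def profile_distance(a, b):
    """Count differing loci, skipping any locus that is missing (None) in either profile."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def minimum_spanning_tree(profiles):
    """Return MST edges as (parent, child, distance) using Prim's algorithm."""
    names = sorted(profiles)
    if not names:
        return []
    start = names[0]
    visited = {start}
    # Candidate edges leaving the tree, ordered by distance and then by name.
    heap = [(profile_distance(profiles[start], profiles[n]), start, n) for n in names[1:]]
    heapq.heapify(heap)
    edges = []
    while heap and len(visited) < len(names):
        d, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # this edge would close a cycle
        visited.add(v)
        edges.append((u, v, d))
        for w in names:
            if w not in visited:
                heapq.heappush(heap, (profile_distance(profiles[v], profiles[w]), v, w))
    return edges


# Toy allelic profiles over four loci; D has one missing locus (illustrative only).
profiles = {
    "A": [1, 1, 1, 1],
    "B": [1, 1, 1, 2],
    "C": [1, 1, 2, 2],
    "D": [1, None, 2, 3],
}
mst_edges = minimum_spanning_tree(profiles)
total_weight = sum(d for _, _, d in mst_edges)
```

Skipping missing loci makes incomplete profiles look closer to everything else than they may really be, which is one reason a naive tree of this kind can mislead when much data is missing.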
Salmonella enterica serovar Paratyphi C causes enteric (paratyphoid) fever in humans. Its presentation can range from asymptomatic infections of the blood stream to gastrointestinal or urinary tract infection or even a fatal septicemia [1]. Paratyphi C is very rare in Europe and North America except for occasional travelers from South and East Asia or Africa, where the disease is more common [2, 3].
For many decades, Salmonella enterica has been subdivided by serological properties into serovars or further subdivided for epidemiological tracing by a variety of diagnostic tests with higher resolution. Recently, it has been proposed that so-called eBurst groups (eBGs) based on the alleles of seven housekeeping genes (legacy multilocus sequence typing [MLST]) corresponded to natural populations and could replace serotyping. However, this approach lacks the resolution needed for epidemiological tracing and the existence of natural populations had not been validated by independent criteria.
Only a few molecular studies have addressed the age of bacterial pathogens that infected humans before the beginnings of medical bacteriology, but these have provided dramatic insights. The global genetic diversity of Helicobacter pylori, which infects human stomachs, parallels that of its human host. The time to the most recent common ancestor (tMRCA) of these bacteria approximates that of anatomically modern humans, i.
In 2013 Zhou et al. concluded that Salmonella enterica serovar Agona represents a genetically monomorphic lineage of recent ancestry, whose most recent common ancestor existed in 1932, or earlier. The Abstract stated 'Agona consists of three lineages with minimal mutational diversity: only 846 single nucleotide polymorphisms (SNPs) have accumulated in the non-repetitive, core genome since Agona evolved in 1932 and subsequently underwent a major population expansion in the 1960s.
Tuberculosis (TB) was once a major killer in Europe, but it is unclear how the strains and patterns of infection at 'peak TB' relate to what we see today. Here we describe 14 genome sequences of M. tuberculosis, representing 12 distinct genotypes, obtained from human remains from eighteenth-century Hungary using metagenomics.
Epidemics and pandemics of cholera, a severe diarrheal disease, have occurred since the early 19th century and waves of epidemic disease continue today. Cholera epidemics are caused by individual, genetically monomorphic lineages of Vibrio cholerae: the ongoing seventh pandemic, which has spread globally since 1961, is associated with lineage L2 of biotype El Tor. Previous genomic studies of the epidemiology of the seventh pandemic identified three successive sub-lineages within L2, designated waves 1 to 3, which spread globally from the Bay of Bengal on multiple occasions.
Multiple epidemic diseases have been designated as emerging or reemerging because the numbers of clinical cases have increased. Emerging diseases are often suspected to be driven by increased virulence or fitness, possibly associated with the gain of novel genes or mutations. However, the time period over which humans have been afflicted by such diseases is only known for very few bacterial pathogens, and the evidence for recently increased virulence or fitness is scanty.
Plague, one of the most devastating infectious diseases in human history, is caused by the bacterial species Yersinia pestis. A live attenuated Y. pestis strain (EV76) has been widely used as a plague vaccine in various countries around the world.