The reconstruction of complete microbial metabolic pathways using 'omics data from environmental samples remains challenging. Computational pipelines for pathway reconstruction that utilize machine learning methods to predict the presence or absence of KEGG modules in incomplete genomes are lacking. Here, we present MetaPathPredict, a software tool that incorporates machine learning models to predict the presence of complete KEGG modules within bacterial genomic datasets.
Although often neglected in gut microbiota studies, recent evidence suggests that imbalanced, or dysbiotic, gut mycobiota (fungal microbiota) communities in infancy coassociate with states of bacterial dysbiosis linked to inflammatory diseases such as asthma. In the present study, we (i) characterized the infant gut mycobiota at 3 months and 1 year of age in 343 infants from the CHILD Cohort Study, (ii) defined associations among gut mycobiota community composition and environmental factors for the development of inhalant allergic sensitization (atopy) at age 5 years, and (iii) built a predictive model for inhalant atopy status at age 5 years using these data. We show that in Canadian infants, fungal communities shift dramatically in composition over the first year of life.
Advances in high-throughput sequencing are reshaping how we perceive microbial communities inhabiting the human body, with implications for therapeutic interventions. Several large-scale datasets derived from hundreds of human microbiome samples sourced from multiple studies are now publicly available. However, idiosyncratic data processing methods between studies introduce systematic differences that confound comparative analyses.
Motivation: A perennial problem in the analysis of environmental sequence information is the assignment of reads or assembled sequences, e.g. contigs or scaffolds, to discrete taxonomic bins.
A revolution is unfolding in microbial ecology where petabytes of 'multi-omics' data are produced using next generation sequencing and mass spectrometry platforms. This cornucopia of biological information has enormous potential to reveal the hidden metabolic powers of microbial communities in natural and engineered ecosystems. However, to realize this potential, new technologies and interpretative frameworks grounded in ecological design principles are needed to overcome computational and analytical bottlenecks.
Next-generation sequencing is producing vast amounts of sequence information from natural and engineered ecosystems. Although this data deluge has an enormous potential to transform our lives, knowledge creation and translation need software applications that scale with increasing data processing and analysis requirements. Here, we present improvements to MetaPathways, an annotation and analysis pipeline for environmental sequence information that expedites this transformation.
Despite recent advances in metagenomic and single-cell genomic sequencing to investigate uncultivated microbial diversity and metabolic potential, fundamental questions related to population structure, interactions, and biogeochemical roles of candidate divisions remain. Numerous molecular surveys suggest that stratified ecosystems manifesting anoxic, sulfidic, and/or methane-rich conditions are enriched in these enigmatic microbes. Here we describe diversity, abundance, and cooccurrence patterns of uncultivated microbial communities inhabiting the permanently stratified waters of meromictic Sakinaw Lake, British Columbia, Canada, using 454 sequencing of the small-subunit rRNA gene with three-domain resolution.
Background: A convergence of high-throughput sequencing and computational power is transforming biology into information science. Despite these technological advances, converting bits and bytes of sequence information into meaningful insights remains a challenging enterprise. Biological systems operate on multiple hierarchical levels from genomes to biomes.
Marine Group A (MGA) is a deeply branching and uncultivated phylum of bacteria. Although their functional roles remain elusive, MGA subgroups are particularly abundant and diverse in oxygen minimum zones and permanent or seasonally stratified anoxic basins, suggesting metabolic adaptation to oxygen deficiency. Here, we expand a previous survey of MGA diversity in O2-deficient waters of the Northeast subarctic Pacific Ocean (NESAP) to include Saanich Inlet (SI), an anoxic fjord with seasonal O2 gradients and periodic sulfide accumulation.
Oil in subsurface reservoirs is biodegraded by resident microbial communities. Water-mediated, anaerobic conversion of hydrocarbons to methane and CO2, catalyzed by syntrophic bacteria and methanogenic archaea, is thought to be one of the dominant processes. We compared 160 microbial community compositions in ten hydrocarbon resource environments (HREs) and sequenced twelve metagenomes to characterize their metabolic potential.
Background: A central challenge to understanding the ecological and biogeochemical roles of microorganisms in natural and human-engineered ecosystems is the reconstruction of metabolic interaction networks from environmental sequence information. The dominant paradigm in metabolic reconstruction is to assign functional annotations using BLAST. Functional annotations are then projected onto symbolic representations of metabolism in the form of KEGG pathways or SEED subsystems.
Background: Pairwise comparison of time series data for both local and time-lagged relationships is a computationally challenging problem relevant to many fields of inquiry. The Local Similarity Analysis (LSA) statistic identifies the existence of local and lagged relationships, but determining significance through a p-value has been algorithmically cumbersome due to an intensive permutation test, shuffling rows and columns and repeatedly calculating the statistic. Furthermore, this p-value is calculated with the assumption of normality, a statistical luxury dissociated from most real-world datasets.
Marine Group A (MGA) is a candidate phylum of Bacteria that is ubiquitous and abundant in the ocean. Despite being prevalent, the structural and functional properties of MGA populations remain poorly constrained. Here, we quantified MGA diversity and population structure in relation to nutrients and O2 concentrations in the oxygen minimum zone (OMZ) of the Northeast subarctic Pacific Ocean using a combination of catalyzed reporter deposition fluorescence in situ hybridization (CARD-FISH) and 16S small subunit ribosomal RNA (16S rRNA) gene sequencing (clone libraries and 454-pyrotags).
Dissolved oxygen concentration is a crucial organizing principle in marine ecosystems. As oxygen levels decline, energy is increasingly diverted away from higher trophic levels into microbial metabolism, leading to loss of fixed nitrogen and to production of greenhouse gases, including nitrous oxide and methane. In this Review, we describe current efforts to explore the fundamental factors that control the ecological and microbial biodiversity in oxygen-starved regions of the ocean, termed oxygen minimum zones.
We present a programmable droplet-based microfluidic device that combines the reconfigurable flow-routing capabilities of integrated microvalve technology with the sample compartmentalization and dispersion-free transport that is inherent to droplets. The device allows for the execution of user-defined multistep reaction protocols in 95 individually addressable nanoliter-volume storage chambers by consecutively merging programmable sequences of picoliter-volume droplets containing reagents or cells. This functionality is enabled by "flow-controlled wetting," a droplet docking and merging mechanism that exploits the physics of droplet flow through a channel to control the precise location of droplet wetting.
String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper, we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size, on a well-equipped workstation.