Small-molecule machine learning aims to predict chemical, biochemical, or biological properties from molecular structures, with applications such as toxicity prediction, ligand binding, and pharmacokinetics. A recent trend is the development of end-to-end models that avoid explicit domain knowledge. These models implicitly assume that the training and evaluation data have no coverage bias, that is, that the data are representative of the true distribution.
The discovery and identification of molecules in biological and environmental samples is crucial for advancing biomedical and chemical sciences. Tandem mass spectrometry (MS/MS) is the leading technique for high-throughput elucidation of molecular structures. However, decoding a molecular structure from its mass spectrum is exceptionally challenging, even when performed by human experts.
Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics usually relies on mass spectrometry, a technology capable of detecting thousands of compounds in a biological sample. Metabolite annotation is performed using tandem mass spectrometry.
The exchange of metabolites mediates algal and bacterial interactions that maintain ecosystem function. Yet, while thousands of metabolites are produced, only a few molecules have been identified in these associations. Using the ubiquitous microalga Pseudo-nitzschia sp.
Untargeted mass spectrometry is employed to detect small molecules in complex biospecimens, generating data that are difficult to interpret. We developed Qemistree, a data exploration strategy based on the hierarchical organization of molecular fingerprints predicted from fragmentation spectra. Qemistree allows mass spectrometry data to be represented in the context of sample metadata and chemical ontologies.
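The core idea, arranging spectral features in a tree according to the similarity of their predicted molecular fingerprints, can be illustrated with plain hierarchical clustering. This is a minimal sketch, not Qemistree's implementation: it assumes binary fingerprints represented as sets of "on" bit indices, Jaccard distance, and naive average-linkage (UPGMA) clustering, all of which are illustrative choices; the feature names and bit sets are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of 'on' fingerprint bits."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / len(union)


def average_linkage_tree(fingerprints):
    """Naive UPGMA: repeatedly merge the two clusters with the smallest
    average pairwise Jaccard distance and return the tree as a Newick string."""
    # Each cluster is (Newick label, list of member fingerprints)
    clusters = [(name, [bits]) for name, bits in fingerprints.items()]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            members_i, members_j = clusters[i][1], clusters[j][1]
            # Average linkage: mean distance over all cross-cluster member pairs
            d = sum(jaccard_distance(a, b) for a in members_i for b in members_j)
            d /= len(members_i) * len(members_j)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = (f"({clusters[i][0]},{clusters[j][0]})",
                  clusters[i][1] + clusters[j][1])
        clusters.pop(j)  # j > i, so popping j first keeps index i valid
        clusters.pop(i)
        clusters.append(merged)
    return clusters[0][0] + ";"


# Hypothetical fingerprints: bit indices switched on for each spectral feature
fingerprints = {
    "feat_A": {0, 1, 4},
    "feat_B": {0, 1, 4, 5},
    "feat_C": {2, 3},
    "feat_D": {2, 3, 5},
}
tree = average_linkage_tree(fingerprints)
print(tree)  # ((feat_A,feat_B),(feat_C,feat_D));
```

Features that share most substructure bits end up as sister leaves. A realistic pipeline would work with predicted fingerprint probabilities for many features and use an efficient clustering routine rather than this naive loop.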
Interpretation of fragmentation mass spectra depends on our knowledge of collision-induced dissociation mechanisms. Computational methods for the annotation of fragmentation mechanisms operate within the boundaries of recognized fragmentation pathways. The prevalence of charge migration fragmentation (CMF) in sodiated ion fragmentation spectra, which produces nonsodiated fragment ions, is unknown.
SIRIUS 4 is the best-in-class computational tool for metabolite identification from high-resolution tandem mass spectrometry data. It offers de novo molecular formula annotation with outstanding accuracy. When searching fragmentation spectra in a structure database, it reaches over 70% correct identifications.
Mass spectrometry is a predominant experimental technique in metabolomics and related fields, but metabolite structural elucidation remains highly challenging. We report SIRIUS 4 (https://bio.informatik.
Motivation: Metabolites, small molecules that are involved in cellular reactions, provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem mass spectrometry to identify the thousands of compounds in a biological sample. Recently, we presented CSI:FingerID for searching in molecular structure databases using tandem mass spectrometry data.
Synechococcus sp. strain PCC 7002 has been gaining significance as a model system for both photosynthesis research and industrial applications. Until recently, the genetic toolbox for this model cyanobacterium was rather limited and relied primarily on tools that only allowed constitutive gene expression.
Cyanobacterial regulation of gene expression must contend with a genome organization that lacks apparent functional context, as the majority of cellular processes and metabolic pathways are encoded by genes found at disparate locations across the genome and relatively few transcription factors exist. In this study, global transcript abundance data from the model cyanobacterium Synechococcus sp. PCC 7002 grown under 42 different conditions were analyzed using Context Likelihood of Relatedness (CLR).
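CLR scores each gene pair by how unusual its mutual information (MI) is relative to the background MI values of both genes. The sketch below is a minimal illustration, assuming the MI matrix has already been estimated from expression profiles (the matrix values are hypothetical): it z-scores each MI value against the other entries in that gene's row, clips negative z-scores to zero, and combines the two genes' scores as sqrt(z_i**2 + z_j**2).

```python
import math
from statistics import mean, pstdev


def clr_scores(mi):
    """Context Likelihood of Relatedness from a symmetric MI matrix.

    For each gene i, the MI values to all other genes form its background.
    MI[i][j] is z-scored against that background and clipped at zero; the
    two genes' z-scores are then combined as sqrt(z_i**2 + z_j**2)."""
    n = len(mi)
    z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        background = [mi[i][k] for k in range(n) if k != i]  # exclude self
        mu, sigma = mean(background), pstdev(background)
        for j in range(n):
            if j != i and sigma > 0:
                z[i][j] = max(0.0, (mi[i][j] - mu) / sigma)
    # Combine row-wise z-scores of both genes; the diagonal is left at zero
    return [[0.0 if i == j else math.hypot(z[i][j], z[j][i]) for j in range(n)]
            for i in range(n)]


# Hypothetical MI matrix for four genes
mi = [
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.3],
    [0.2, 0.1, 0.3, 0.0],
]
scores = clr_scores(mi)
best = max((scores[i][j], i, j) for i in range(4) for j in range(i + 1, 4))
print(best[1:])  # (0, 1)
```

Because each MI value is judged against the context of both genes, a pair with moderate raw MI can still score highly if both genes otherwise share little information with the rest of the network.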
Metal homeostasis is a crucial cellular function for nearly all organisms. Some heavy metals (e.g.
Synechococcus sp. PCC 7002 and many other cyanobacteria have two genes for each of several key enzymes involved in chlorophyll a, biliverdin, and heme biosynthesis: acsFI/acsFII, ho1/ho2, and hemF/hemN. Under atmospheric O2 levels, AcsFI synthesizes 3,8-divinyl protochlorophyllide from Mg-protoporphyrin IX monomethyl ester, Ho1 oxidatively cleaves heme to form biliverdin, and HemF oxidizes coproporphyrinogen III to protoporphyrinogen IX.
Filamentous anoxygenic phototrophs (FAPs) are abundant members of microbial mat communities inhabiting neutral and alkaline geothermal springs. Natural populations of FAPs related to Chloroflexus spp. and Roseiflexus spp.
Glycogen and compatible solutes are the major polymeric and soluble carbohydrates in cyanobacteria and function as energy reserves and osmoprotectants, respectively. Glycogen synthase null mutants (glgA-I glgA-II) were constructed in the cyanobacterium Synechococcus sp. strain PCC 7002.
Synechococcus sp. strain PCC 7002 is a unicellular, euryhaline cyanobacterium. It is a model organism for studies of cyanobacterial metabolism and has great potential for biotechnological applications.
The unicellular, euryhaline cyanobacterium Synechococcus sp. strain PCC 7002 is a model organism for laboratory-based studies of cyanobacterial metabolism and is a potential platform for biotechnological applications. Two of its most notable properties are its exceptional tolerance of high-light intensity and very rapid growth under optimal conditions.
An uncultured member of the phylum Chlorobi, provisionally named 'Candidatus Thermochlorobacter aerophilum', occurs in the microbial mats of alkaline siliceous hot springs in Yellowstone National Park. 'Ca. T.
The genome of the unicellular, euryhaline cyanobacterium Synechococcus sp. PCC 7002 encodes about 3200 proteins. Transcripts were detected for nearly all annotated open reading frames in a global transcriptomic analysis using next-generation (SOLiD™) sequencing of cDNA.
Northern analysis was employed to investigate mRNA produced by mutant strains of Azotobacter vinelandii with defined deletions in the nif structural genes and in the intergenic noncoding regions. The results indicate that intergenic RNA secondary structures effect the differential accumulation of transcripts, supporting the high Fe protein-to-MoFe protein ratio required for optimal diazotrophic growth.
Most biological nitrogen (N₂) fixation results from the activity of a molybdenum-dependent nitrogenase, a complex iron-sulfur enzyme found associated with a diversity of bacteria and some methanogenic archaea. Azotobacter vinelandii, an obligate aerobe, fixes nitrogen via the oxygen-sensitive Mo nitrogenase but is also able to fix nitrogen through the activities of genetically distinct alternative forms of nitrogenase designated the Vnf and Anf systems when Mo is limiting. The Vnf system appears to replace Mo with V, and the Anf system is thought to contain Fe as the only transition metal within the respective active site metallocofactors.
The phototrophic microbial mat community of Mushroom Spring, an alkaline siliceous hot spring in Yellowstone National Park, was studied by metatranscriptomic methods. RNA was extracted from mat specimens collected at four timepoints during light-to-dark and dark-to-light transitions in one diel cycle, and these RNA samples were analyzed by both pyrosequencing and SOLiD technologies. Pyrosequencing was used to assess the community composition, which showed that ~84% of the rRNA was derived from members of four phyla: Cyanobacteria, Chloroflexi, Chlorobi, and Acidobacteria.