Publications by authors named "Edward Catoiu"

Article Synopsis
  • iModulonDB is a centralized database launched in 2020, focusing on iModulons—sets of co-expressed genes identified through independent component analysis of transcriptomic data, enhancing our understanding of transcriptional regulatory networks in organisms.
  • The recent update significantly expands the database, adding 19 new ICA decompositions, over 8,900 expression profiles, and features for 12 additional organisms, making it a more comprehensive resource for researchers.
  • New tools, interactive graphs, and improved interfaces facilitate user engagement and analysis of genetic regulation, allowing scientists to quickly access information regarding experimental conditions and explore related resources.

A critical body of knowledge has developed through advances in protein microscopy, protein-fold modeling, structural biology software, the availability of sequenced bacterial genomes, large-scale mutation databases, and genome-scale models. Based on these recent advances, we develop a computational framework that: i) identifies the oligomeric structural proteome encoded by an organism's genome from available structural resources; ii) maps multi-strain alleleomic variation, resulting in the structural proteome for a species; and iii) calculates the 3D orientation of proteins across subcellular compartments with residue-level precision. Using the platform, we: iv) compute the quaternary E. coli K-12 MG1655 structural proteome; v) use a dataset of 12,000 mutations to build Random Forest classifiers that can predict the severity of mutations; and, in combination with a genome-scale model that computes proteome allocation, vi) obtain the spatial allocation of the proteome.


A critical body of knowledge has developed through advances in protein microscopy, protein-fold modeling, structural biology software, the availability of sequenced bacterial genomes, large-scale mutation databases, and genome-scale models. Based on these recent advances, we develop a computational platform that: i) computes the oligomeric structural proteome encoded by an organism's genome; ii) maps multi-strain alleleomic variation, resulting in the structural proteome for a species; and iii) calculates the 3D orientation of proteins across subcellular compartments with angstrom-level precision. Using the platform, we: iv) compute the full quaternary E. coli K-12 MG1655 structural proteome; v) deploy structure-guided analyses to identify consequential mutations; and, in combination with a genome-scale model that computes proteome allocation, vi) obtain a draft 3D visualization of the proteome in a functioning cell.


The genomic diversity across strains of a species forms the genetic basis for differences in their behavior. A large-scale assessment of sequence variation has been made possible by the growing availability of strain-specific whole-genome sequences (WGS) and by the advent of large-scale databases of laboratory-acquired mutations. We define the "alleleome" through a genome-scale assessment of amino acid (AA) sequence diversity in open reading frames across 2,661 WGS from wild-type strains.
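
As a rough illustration of what a per-position assessment of amino acid diversity can look like, the sketch below tallies the dominant residue and its variants at each position of a set of pre-aligned protein sequences. It is not the method used in the article, which is not described in this excerpt; the function name `position_profile` and the four toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def position_profile(aligned_seqs):
    """Summarize amino acid diversity at each position of equal-length sequences.

    Assumes the sequences are already aligned (hypothetical helper, not from the article).
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same aligned length")

    profile = []
    for column in zip(*aligned_seqs):  # one tuple of residues per position
        counts = Counter(column)
        dominant, n_dominant = counts.most_common(1)[0]
        profile.append(
            {
                "dominant": dominant,
                "dominant_freq": n_dominant / len(aligned_seqs),
                "variants": sorted(aa for aa in counts if aa != dominant),
            }
        )
    return profile


# Toy alleles of one open reading frame from four strains (made-up data).
toy_alleles = ["MKTL", "MKTL", "MRTL", "MKSL"]
toy_profile = position_profile(toy_alleles)
```

Here `toy_profile[1]` records `K` as the dominant residue at the second position, with `R` as its only variant.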


Background: The reconstruction of metabolic networks and the three-dimensional coverage of protein structures have reached the genome-scale in the widely studied Escherichia coli K-12 MG1655 strain. The combination of the two leads to the formation of a structural systems biology framework, which we have used to analyze differences between the reactive oxygen species (ROS) sensitivity of the proteomes of sequenced strains of E. coli.


Oxidative stress is concomitant with aerobic metabolism. Thus, bacterial genomes encode elaborate mechanisms to achieve redox homeostasis. Here we report that the peroxide-sensing transcription factor oxyR is a common mutational target in two bacterial species from different genera, Escherichia coli and Vibrio natriegens, under separate growth conditions implemented during laboratory evolution.


Pseudogenes represent open reading frames that have been damaged by mutations, rendering the gene product non-functional. Pseudogenes are found in many genomes and are not always eliminated, even if they are potentially 'wasteful'. This raises a fundamental question about their prevalence.


Background: Essentiality assays are important tools commonly utilized for the discovery of gene functions. Growth/no growth screens of single gene knockout strain collections are also often utilized to test the predictive power of genome-scale models. False positive predictions occur when computational analysis predicts a gene to be non-essential but experimental screens deem the gene to be essential.
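
The definition of a false positive above treats growth of the knockout strain as the "positive" outcome. A minimal sketch of that bookkeeping follows, assuming each knockout is recorded as a predicted and an observed growth call; the function names and the four example genes are hypothetical and are not taken from the article.

```python
from collections import Counter


def classify_knockout(predicted_growth: bool, observed_growth: bool) -> str:
    """Label one knockout, with growth (gene non-essential) as the positive class."""
    if predicted_growth and observed_growth:
        return "true_positive"
    if predicted_growth and not observed_growth:
        # Model says non-essential, screen says essential.
        return "false_positive"
    if not predicted_growth and observed_growth:
        # Model says essential, screen says non-essential.
        return "false_negative"
    return "true_negative"


def tally_knockouts(calls):
    """Count outcome labels over (gene, predicted_growth, observed_growth) records."""
    return Counter(classify_knockout(pred, obs) for _, pred, obs in calls)


# Made-up records for illustration only.
example_calls = [
    ("geneA", True, True),
    ("geneB", True, False),
    ("geneC", False, False),
    ("geneD", False, True),
]
example_counts = tally_knockouts(example_calls)
```

In this toy set, `geneB` is the single false positive: the model predicts growth of the knockout, but the screen finds the gene essential.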


Mycobacterium tuberculosis is a serious human pathogen exhibiting complex evolution of antimicrobial resistance (AMR). Accordingly, the many publicly available datasets describing its AMR characteristics demand disparate data-type analyses. Here, we develop a reference strain-agnostic computational platform that uses machine learning approaches, complemented by both genetic interaction analysis and 3D structural mutation-mapping, to identify signatures of AMR evolution to 13 antibiotics.

Article Synopsis
  • Researchers reconstructed metabolic models for 410 Salmonella strains across 64 serovars, revealing diverse metabolic networks related to carbon metabolism and cell wall biosynthesis.
  • The study found that a strain’s metabolic capabilities are closely linked to its serovar and the host it was isolated from.
  • Experimental growth predictions matched 83.1% of actual results, highlighting specific nutritional requirements in some strains and showing that extraintestinal serovars may have lost important pathways for survival in the gastrointestinal environment.

Genome-scale models of metabolism and macromolecular expression (ME-models) explicitly compute the optimal proteome composition of a growing cell. ME-models expand upon the well-established genome-scale models of metabolism (M-models), and they enable a new fundamental understanding of cellular growth. ME-models have increased predictive capabilities and accuracy due to their inclusion of the biosynthetic costs for the machinery of life, but they come with a significant increase in model size and complexity.


Summary: Working with protein structures at the genome-scale has been challenging in a variety of ways. Here, we present ssbio, a Python package that provides a framework to easily work with structural information in the context of genome-scale network reconstructions, which can contain thousands of individual proteins. The ssbio package provides an automated pipeline to construct high-quality genome-scale models with protein structures (GEM-PROs), wrappers to popular third-party programs to compute associated protein properties, and methods to visualize and annotate structures directly in Jupyter notebooks, thus lowering the barrier of linking 3D structural data with established systems workflows.
