Motivation: Growing evidence links the human microbiome to various diseases, and the microbiome has a profound impact on human health. Since changes in the composition of the microbiome over time are associated with disease and clinical outcomes, microbiome analysis should be performed longitudinally. However, because of limited sample sizes and differing numbers of timepoints across subjects, a significant amount of data cannot be utilized, which directly affects the quality of analysis results.
The cell cycle of Caulobacter crescentus involves polar morphogenesis and an asymmetric cell division driven by precise interactions and regulations of proteins, which makes Caulobacter an ideal model organism for investigating bacterial cell development and differentiation. The abundance of molecular data accumulated on Caulobacter motivates systems biologists to analyze the complex regulatory network of the cell cycle via quantitative modeling. In this paper, we propose a comprehensive model to accurately characterize the underlying mechanisms of cell cycle regulation based on the study of: a) chromosome replication and methylation; b) interactive pathways of five master regulatory proteins including DnaA, GcrA, CcrM, CtrA, and SciP, as well as novel consideration of their corresponding mRNAs; c) cell cycle-dependent proteolysis of CtrA through hierarchical protease complexes.
Accuracy of protein-ligand binding free energy calculations utilizing implicit solvent models is critically affected by parameters of the underlying dielectric boundary, specifically, the atomic and water probe radii. Here, a global multidimensional optimization pipeline is developed to find optimal atomic radii specifically for protein-ligand binding calculations in implicit solvent. The computational pipeline has these three key components: (1) a massively parallel implementation of a deterministic global optimization algorithm (VTDIRECT95), (2) an accurate yet reasonably fast generalized Born implicit solvent model (GBNSR6), and (3) a novel robustness metric that helps distinguish between nearly degenerate local minima via a postprocessing step of the optimization.
The growing size and complexity of molecular network models make them increasingly difficult to construct and understand. Modifying a model that consists of tens of reactions is no easy task. Attempting the same on a model containing hundreds of reactions can seem nearly impossible.
Biologists seek to create increasingly complex molecular regulatory network models. Writing such a model is a creative effort that requires flexible analysis tools and better modeling languages than offered by many of today's biochemical model editors. Our Multistate Model Builder (MSMB) supports multistate models created using different modeling styles that suit the modeler rather than the software.
IEEE/ACM Trans Comput Biol Bioinform
August 2019
Parameter estimation in discrete or continuous deterministic cell cycle models is challenging for several reasons, including the nature of what can be observed, and the accuracy and quantity of those observations. The challenge is even greater for stochastic models, where the number of simulations and amount of empirical data must be even larger to obtain statistically valid parameter estimates. The two main contributions of this work are (1) stochastic model parameter estimation based on directly matching multivariate probability distributions, and (2) a new quasi-Newton algorithm class QNSTOP for stochastic optimization problems.
Storing biologically equivalent indels as distinct entries in databases causes data redundancy and misleads downstream analysis. It is thus desirable to have a unified system for identifying and representing equivalent indels. Such a system is also desirable for comparing the indel calling results produced by different tools.
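To make the notion of "equivalent indels" concrete, here is a minimal sketch that treats two indels as equivalent when they turn the same reference into the same sequence. This is a brute-force illustration, not the unified system the abstract proposes; the function names, the 0-based coordinate convention, and the toy reference are assumptions made for the example.

```python
def apply_variant(reference: str, pos: int, ref_allele: str, alt_allele: str) -> str:
    """Replace ref_allele at 0-based position pos with alt_allele."""
    if reference[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("ref allele does not match the reference at pos")
    return reference[:pos] + alt_allele + reference[pos + len(ref_allele):]


def are_equivalent(reference: str, a: tuple, b: tuple) -> bool:
    """Two variants are equivalent if they yield the same altered sequence."""
    return apply_variant(reference, *a) == apply_variant(reference, *b)


# Hypothetical reference with a short "CA" repeat.
reference = "GGCACACAT"

# The same two-base deletion reported at two different positions in the repeat.
del_left = (2, "CAC", "C")
del_right = (4, "CAC", "C")

# A different deletion (removes the "C" at index 2 only).
other = (1, "GC", "G")

print(are_equivalent(reference, del_left, del_right))
print(are_equivalent(reference, del_left, other))
```

Comparing full altered sequences is only practical for short references; a production approach would normalize each indel locally instead.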
Background: Parameter estimation in systems biology is typically done by enforcing experimental observations through an objective function as the parameter space of a model is explored by numerical simulations. Past studies have shown that one usually finds a set of "feasible" parameter vectors that fit the available experimental data equally well, and that these alternative vectors can make different predictions under novel experimental conditions. In this study, we characterize the feasible region of a complex model of the budding yeast cell cycle under a large set of discrete experimental constraints in order to test whether the statistical features of relative protein abundance predictions are influenced by the topology of the cell cycle regulatory network.
Background: Most biomolecular reaction modeling tools allow users to build models with a single list of parameter values. However, a common scenario involves different parameterizations of the model to account for the results of related experiments, for example, to define the phenotypes for a variety of mutations (gene knockout, over expression, etc.) of a specific biochemical network.
BMC Bioinformatics
October 2015
Background: Numerous tools have been developed to predict the fitness effects (i.e., neutral, deleterious, or beneficial) of genetic variants on corresponding proteins.
Background: Many genetic variants have been identified in the human genome. The functional effects of a single variant have been intensively studied. However, the joint effects of multiple variants in the same genes have been largely ignored due to their complexity or lack of data.
In this study, we focus on a recent stochastic budding yeast cell cycle model. First, we estimate the model parameters using extensive data sets: phenotypes of 110 genetic strains and single-cell statistics of wild-type and cln3 strains. Optimization of stochastic model parameters is achieved by an automated algorithm we recently used for a deterministic cell cycle model.
Background: Building models of molecular regulatory networks is challenging not just because of the intrinsic difficulty of describing complex biological processes. Writing a model is a creative effort that calls for more flexibility and interactive support than offered by many of today's biochemical model editors. Our model editor MSMB - Multistate Model Builder - supports multistate models created using different modeling styles.
BMC Med Genomics
October 2014
Background: Insulin secreted by pancreatic islet β-cells is the principal regulating hormone of glucose metabolism and plays a key role in controlling glucose level in blood. Impairment of the pancreatic islet function may cause glucose to accumulate in blood, and result in diabetes mellitus. Recent studies have shown that mitochondrial dysfunction has a strong negative effect on insulin secretion.
BMC Bioinformatics
January 2014
Background: With the development of sequencing technologies, more and more sequence variants are available for investigation. Different classes of variants in the human genome have been identified, including single nucleotide substitutions, insertion and deletion, and large structural variations such as duplications and deletions. Insertion and deletion (indel) variants comprise a major proportion of human genetic variation.
Background: Parameter estimation from experimental data is critical for mathematical modeling of protein regulatory networks. For realistic networks with dozens of species and reactions, parameter estimation is an especially challenging task. In this study, we present an approach for parameter estimation that is effective in fitting a model of the budding yeast cell cycle (comprising 26 nonlinear ordinary differential equations containing 126 rate constants) to the experimentally observed phenotypes (viable or inviable) of 119 genetic strains carrying mutations of cell cycle genes.
Background: The Structural Classification of Proteins (SCOP) database uses a large number of hidden Markov models (HMMs) to represent families and superfamilies composed of proteins that presumably share the same evolutionary origin. However, how the HMMs are related to one another has not been examined before.
Results: In this work, taking into account the processes used to build the HMMs, we propose a working hypothesis to examine the relationships between HMMs and the families and superfamilies that they represent.
Typical multiscale biochemical models contain fast-scale and slow-scale reactions, where "fast" reactions fire much more frequently than "slow" ones. This feature often causes stiffness in discrete stochastic simulation methods such as Gillespie's algorithm and the Tau-Leaping method, leading to inefficient simulation. This paper proposes a new strategy to automatically detect stiffness and identify species that cause stiffness for the Tau-Leaping method, as well as two stiffness reduction methods.
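The cost of fast reactions can be seen in a small simulation. The sketch below runs Gillespie's direct method on a hypothetical three-species system (a fast reversible step A <-> B and a slow step B -> C) and counts how often each reaction fires. The system, the rate constants `K_FAST` and `K_SLOW`, and all function names are assumptions chosen for illustration; the sketch does not implement the stiffness detection or reduction methods the paper proposes.

```python
import random


def gillespie_direct(state, stoich, propensities, t_end, rng):
    """Gillespie direct method; returns the final state and per-reaction firing counts."""
    t = 0.0
    state = list(state)
    firings = [0] * len(stoich)
    while True:
        rates = [a(state) for a in propensities]
        total = sum(rates)
        if total <= 0.0:
            break  # no reaction can fire any more
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            break
        # Pick reaction j with probability rates[j] / total.
        j = rng.choices(range(len(rates)), weights=rates)[0]
        state = [s + d for s, d in zip(state, stoich[j])]
        firings[j] += 1
    return state, firings


# Hypothetical rate constants: the reversible step is 1000x faster than the slow step.
K_FAST, K_SLOW = 100.0, 0.1

# State vector is (A, B, C); each row is the change caused by one reaction.
stoich = [(-1, 1, 0),   # A -> B (fast)
          (1, -1, 0),   # B -> A (fast)
          (0, -1, 1)]   # B -> C (slow)
propensities = [
    lambda s: K_FAST * s[0],
    lambda s: K_FAST * s[1],
    lambda s: K_SLOW * s[1],
]

final_state, firings = gillespie_direct(
    (100, 0, 0), stoich, propensities, t_end=1.0, rng=random.Random(0)
)
print("final state (A, B, C):", final_state)
print("firings (A->B, B->A, B->C):", firings)
```

Almost all simulated events belong to the fast reversible pair, even though only the slow reaction changes the long-term outcome. That imbalance is what makes exact simulation of such systems inefficient.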
Biological processes such as circadian rhythms, cell division, metabolism, and development occur as ordered sequences of events. The synchronization of these coordinated events is essential for proper cell function, and hence the determination of critical time points in biological processes is an important component of all biological investigations. In particular, such critical time points establish logical ordering constraints on subprocesses, impose prerequisites on temporal regulation and spatial compartmentalization, and situate dynamic reorganization of functional elements in preparation for subsequent stages.
One important aspect of biological systems such as gene regulatory networks and protein-protein interaction networks is the stochastic nature of interactions between chemical species. Such stochastic behaviour can be accurately modelled by the Chemical Master Equation (CME). However, the CME usually imposes intensive computational requirements when used to characterise molecular biological systems.
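For reference, the CME is commonly written in the following standard form. The notation is an assumption introduced here, not taken from the abstract: x is the vector of molecule counts, M the number of reactions, a_j the propensity function of reaction j, and ν_j its stoichiometric change vector.

```latex
\frac{\partial P(\mathbf{x}, t)}{\partial t}
  = \sum_{j=1}^{M} \Big[
      a_j(\mathbf{x} - \boldsymbol{\nu}_j)\, P(\mathbf{x} - \boldsymbol{\nu}_j, t)
      - a_j(\mathbf{x})\, P(\mathbf{x}, t)
    \Big]
```

There is one such equation for every reachable state x, so the number of coupled equations grows rapidly with the number of species and their copy numbers, which is the source of the computational burden mentioned above.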
This paper extends previous work on the Darwinian evolutionary fitness effect of the fixation of deleterious mutations by incorporating compensatory mutations, which are mutations (deleterious by themselves) that ameliorate other deleterious mutations, thus reducing the genetic load of populations. Since having compensatory mutations essentially changes the distributional shapes of deleterious mutations, the effect of compensatory mutations is studied by comparing distributions of deleterious mutations without compensatory mutations to those with compensatory mutations. The effect of effective population size (N(e)), fitness distributional shape, and mutation rate on population fitness reduction is studied.
We present a new approach to segmenting multiple time series by analyzing the dynamics of cluster formation and rearrangement around putative segment boundaries. This approach finds application in distilling large numbers of gene expression profiles into temporal relationships underlying biological processes. By directly minimizing information-theoretic measures of segmentation quality derived from Kullback-Leibler (KL) divergences, our formulation reveals clusters of genes along with a segmentation such that clusters show concerted behavior within segments but exhibit significant regrouping across segmentation boundaries.
IEEE Trans Pattern Anal Mach Intell
February 2009
Hidden Markov model (HMM) classifier design is considered for the analysis of sequential data, incorporating both labeled and unlabeled data for training; the balance between the use of labeled and unlabeled data is controlled by an allocation parameter λ ∈ [0, 1), where λ = 0 corresponds to purely supervised HMM learning (based only on the labeled data) and λ = 1 corresponds to unsupervised HMM-based clustering (based only on the unlabeled data). The associated estimation problem can typically be reduced to solving a set of fixed-point equations in the form of a "natural-parameter homotopy." This paper applies a homotopy method to track a continuous path of solutions, starting from a local supervised solution (λ = 0) to a local unsupervised solution (λ = 1).
This work extends the work of Whitlock in examining the critical effective population sizes from the fixation of both deleterious and beneficial mutations under drift and selection to prevent mutation breakdown of the population. The validity of approximations for the probability of fixation depends on the nature of the assumed distribution for the fitness effect of both types of mutations. Using no approximation for the probability of fixation and assuming a heavy tailed fitness effect distribution, the current model indicates that the coefficients of variation for the fitness effect distributions of both types of mutations and the fitness effect distribution mean for the beneficial mutations are important predictors of the critical effective population size.