We explore how to quantify uncertainty when designing predictive models for healthcare to provide well-calibrated results. Uncertainty quantification and calibration are critical in medicine, as one must not only accommodate the variability of the underlying physiology, but adjust to the uncertain data collection and reporting process. This occurs not only in the context of electronic health records (i.
Objective: The study sought to build predictive models of next menstrual cycle start date based on mobile health self-tracked cycle data. Because app users may skip tracking, disentangling physiological patterns of menstruation from tracking behaviors is necessary for the development of predictive models.
Materials And Methods: We use data from a popular menstrual tracker (186 000 menstruators with over 2 million tracked cycles) to learn a predictive model, which (1) accounts explicitly for self-tracking adherence; (2) updates predictions as a given cycle evolves, allowing for interpretable insight into how these predictions change over time; and (3) enables modeling of an individual's cycle length history while incorporating population-level information.
Personalized cancer treatments based on the molecular profile of a patient's tumor are an emerging and exciting class of treatments in oncology. As genomic tumor profiling is becoming more common, targeted treatments for specific molecular alterations are gaining traction. To discover new potential therapeutics that may apply to broad classes of tumors matching some molecular pattern, experimentalists and pharmacologists rely on high-throughput, in vitro screens of many compounds against many different cell lines.
The menstrual cycle is a key indicator of overall health for women of reproductive age. Previously, menstruation was primarily studied through survey results; however, as menstrual tracking mobile apps become more widely adopted, they provide an increasingly large, content-rich source of menstrual health experiences and behaviors over time. By exploring a database of user-tracked observations from the Clue app by BioWink GmbH of over 378,000 users and 4.
Proc Natl Acad Sci U S A
January 2020
Predicting how interactions between transcription factors and regulatory DNA sequence dictate rates of transcription and, ultimately, drive developmental outcomes remains an open challenge in physical biology. Using stripe 2 of the gene in embryos as a case study, we dissect the regulatory forces underpinning a key step along the developmental decision-making cascade: the generation of cytoplasmic mRNA patterns via the control of transcription in individual cells. Using live imaging and computational approaches, we found that the transcriptional burst frequency is modulated across the stripe to control the mRNA production rate.
T cells engage in two modes of interaction with antigen-presenting surfaces: stable synapses and motile kinapses. Although it is surmised that durable interactions of T cells with antigen-presenting cells involve synapses, in situ 3D imaging cannot resolve the mode of interaction. We have established in vitro 2D platforms and quantitative metrics to determine cell-intrinsic modes of interaction when T cells are faced with spatially continuous or restricted stimulation.
Gene regulatory circuits must contend with intrinsic noise that arises due to finite numbers of proteins. While some circuits act to reduce this noise, others appear to exploit it. A striking example is the competence circuit in Bacillus subtilis, which exhibits much larger noise in the duration of its competence events than a synthetically constructed analog that performs the same function.
We present the Unsupervised Phenome Model (UPhenome), a probabilistic graphical model for large-scale discovery of computational models of disease, or phenotypes. We tackle this challenge through the joint modeling of a large set of diseases and a large set of clinical observations. The observations are drawn directly from heterogeneous patient record data (notes, laboratory tests, medications, and diagnosis codes), and the diseases are modeled in an unsupervised fashion.
Nanopore sequencing promises long read-lengths and single-molecule resolution, but the stochastic motion of the DNA molecule inside the pore is, as of this writing, a barrier to high accuracy reads. We develop a method of statistical inference that explicitly accounts for this error, and demonstrate that high accuracy (>99%) sequence inference is feasible even under highly diffusive motion by using a hidden Markov model to jointly analyze multiple stochastic reads. Using this model, we place bounds on achievable inference accuracy under a range of experimental parameters.
Background: Single-molecule techniques have emerged as incisive approaches for addressing a wide range of questions arising in contemporary biological research [Trends Biochem Sci 38:30-37, 2013; Nat Rev Genet 14:9-22, 2013; Curr Opin Struct Biol 2014, 28C:112-121; Annu Rev Biophys 43:19-39, 2014]. The analysis and interpretation of raw single-molecule data benefits greatly from the ongoing development of sophisticated statistical analysis tools that enable accurate inference at the low signal-to-noise ratios frequently associated with these measurements. While a number of groups have released analysis toolkits as open source software [J Phys Chem B 114:5386-5403, 2010; Biophys J 79:1915-1927, 2000; Biophys J 91:1941-1951, 2006; Biophys J 79:1928-1944, 2000; Biophys J 86:4015-4029, 2004; Biophys J 97:3196-3205, 2009; PLoS One 7:e30024, 2012; BMC Bioinformatics 288 11(8):S2, 2010; Biophys J 106:1327-1337, 2014; Proc Int Conf Mach Learn 28:361-369, 2013], it remains difficult to compare analysis for experiments performed in different labs due to a lack of standardization.
Background: The extraordinary success of imatinib in the treatment of BCR-ABL1 associated cancers underscores the need to identify novel functional gene fusions in cancer. RNA sequencing offers a genome-wide view of expressed transcripts, uncovering biologically functional gene fusions. Although several bioinformatics tools are already available for the detection of putative fusion transcripts, candidate event lists are plagued with non-functional read-through events, reverse transcriptase template switching events, incorrect mapping, and other systematic errors.
The bacterial transcription factor LacI loops DNA by binding to two separate locations on the DNA simultaneously. Despite being one of the best-studied model systems for transcriptional regulation, the number and conformations of loop structures accessible to LacI remain unclear, though the importance of multiple coexisting loops has been implicated in interactions between LacI and other cellular regulators of gene expression. To probe this issue, we have developed a new analysis method for tethered particle motion, a versatile and commonly used in vitro single-molecule technique.
Many single-molecule experiments aim to characterize biomolecular processes in terms of kinetic models that specify the rates of transition between conformational states of the biomolecule. Estimation of these rates often requires analysis of a population of molecules, in which the conformational trajectory of each molecule is represented by a noisy, time-dependent signal trajectory. Although hidden Markov models (HMMs) may be used to infer the conformational trajectories of individual molecules, estimating a consensus kinetic model from the population of inferred conformational trajectories remains a statistically difficult task, as inferred parameters vary widely within a population.
Recent single-cell experiments have revived interest in the unavoidable or intrinsic noise in biochemical and genetic networks arising from the small number of molecules of the participating species. That is, rather than modeling regulatory networks in terms of the deterministic dynamics of concentrations, we model the dynamics of the probability of a given copy number of the reactants in single cells. Most of the modeling activity of the last decade has centered on stochastic simulation, i.
We address the problem of analyzing sets of noisy time-varying signals that all report on the same process but confound straightforward analyses due to complex inter-signal heterogeneities and measurement artifacts. In particular, we consider single-molecule experiments that indirectly measure the distinct steps in a biomolecular process via observations of noisy time-dependent signals such as a fluorescence intensity or bead position. Straightforward hidden Markov model (HMM) analyses attempt to characterize such processes in terms of a set of conformational states, the transitions that can occur between these states, and the associated rates at which those transitions occur, but they require ad hoc post-processing steps to combine multiple signals.
Proc Natl Acad Sci U S A
January 2011
Over the past decade, a number of researchers in systems biology have sought to relate the function of biological systems to their network-level descriptions--lists of the most important players and the pairwise interactions between them. Both for large networks (in which statistical analysis is often framed in terms of the abundance of repeated small subgraphs) and for small networks which can be analyzed in greater detail (or even synthesized in vivo and subjected to experiment), revealing the relationship between the topology of small subgraphs and their biological function has been a central goal. We here seek to pose this revelation as a statistical task, illustrated using a particular setup which has been constructed experimentally and for which parameterized models of transcriptional regulation have been studied extensively.
Background: The recent explosion of experimental techniques in single molecule biophysics has generated a variety of novel time series data requiring equally novel computational tools for analysis and inference. This article describes in general terms how graphical modeling may be used to learn from biophysical time series data using the variational Bayesian expectation maximization algorithm (VBEM). The discussion is illustrated by the example of single-molecule fluorescence resonance energy transfer (smFRET) versus time data, where the smFRET time series is modeled as a hidden Markov model (HMM) with Gaussian observables.
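To make the phrase "an HMM with Gaussian observables" concrete, here is a minimal sketch of the standard forward algorithm, which computes the log-likelihood of a trace given fixed parameters. It is not the VBEM procedure the abstract refers to. The function names, the parameter layout, and the example values are all assumptions made for illustration.

```python
import numpy as np

def gaussian_log_pdf(x, mean, std):
    """Log density of a normal distribution, evaluated elementwise."""
    return -0.5 * np.log(2.0 * np.pi * std**2) - 0.5 * ((x - mean) / std) ** 2

def forward_log_likelihood(obs, pi, A, means, stds):
    """Log-likelihood of a 1D trace under a Gaussian-emission HMM.

    pi[k]    : initial probability of hidden state k
    A[i, j]  : probability of moving from state i to state j in one step
    means[k] : emission mean of state k (e.g. a FRET level; assumed)
    stds[k]  : emission standard deviation of state k
    """
    obs = np.asarray(obs, dtype=float)
    # Log emission probabilities, shape (T, K).
    log_b = gaussian_log_pdf(obs[:, None], means[None, :], stds[None, :])
    log_alpha = np.log(pi) + log_b[0]
    for t in range(1, len(obs)):
        # Rescale by the running maximum to avoid underflow, then propagate.
        m = log_alpha.max()
        log_alpha = m + np.log(np.exp(log_alpha - m) @ A) + log_b[t]
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())

# Hypothetical two-state example with low- and high-FRET levels.
pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
means = np.array([0.2, 0.8])
stds = np.array([0.1, 0.1])
trace = [0.25, 0.30, 0.75]
log_lik = forward_log_likelihood(trace, pi, A, means, stds)
```

A variational Bayesian treatment would place priors on `pi`, `A`, `means`, and `stds` and iterate updates of approximate posteriors; the forward pass above is only the likelihood building block that such schemes typically reuse.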
Intracellular transmission of information via chemical and transcriptional networks is thwarted by a physical limitation: The finite copy number of the constituent chemical species introduces unavoidable intrinsic noise. Here we solve for the complete probabilistic description of the intrinsically noisy response to an oscillatory driving signal. We derive and numerically verify a number of simple scaling laws.
A key problem in understanding transcriptional regulatory networks is deciphering what cis regulatory logic is encoded in gene promoter sequences and how this sequence information maps to expression. A typical computational approach to this problem involves clustering genes by their expression profiles and then searching for overrepresented motifs in the promoter sequences of genes in a cluster. However, genes with similar expression profiles may be controlled by distinct regulatory programs.
IEEE Trans Pattern Anal Mach Intell
June 2010
Min-cut clustering, based on minimizing one of two heuristic cost functions proposed by Shi and Malik nearly a decade ago, has spawned tremendous research, both analytic and algorithmic, in the graph partitioning and image segmentation communities over the last decade. It is, however, unclear if these heuristics can be derived from a more general principle, facilitating generalization to new problem settings. Motivated by an existing graph partitioning framework, we derive relationships between optimizing relevance information, as defined in the Information Bottleneck method, and the regularized cut in a K-partitioned graph.
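For readers unfamiliar with the cost functions mentioned above, the following sketch evaluates the normalized cut of Shi and Malik for a two-way partition: the weight crossing the cut divided by each side's total connection weight, summed over both sides. It only scores a given partition and does not reproduce the information-theoretic derivation the abstract describes. The function name and the toy graph are assumptions made for illustration.

```python
import numpy as np

def normalized_cut(W, in_a):
    """Normalized cut of a two-way partition of a weighted, undirected graph.

    W    : symmetric (n, n) matrix of nonnegative edge weights
    in_a : boolean mask of length n, True for nodes assigned to side A
    """
    W = np.asarray(W, dtype=float)
    in_a = np.asarray(in_a, dtype=bool)
    # Total weight of edges crossing from A to B.
    cut = W[np.ix_(in_a, ~in_a)].sum()
    # Total weight from each side to every node in the graph.
    assoc_a = W[in_a, :].sum()
    assoc_b = W[~in_a, :].sum()
    return cut / assoc_a + cut / assoc_b

# Hypothetical toy graph: two tightly linked pairs joined by weak edges.
W = np.array([
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
])
good = normalized_cut(W, [True, True, False, False])   # splits along weak edges
bad = normalized_cut(W, [True, False, True, False])    # splits the strong pairs
```

On this toy graph the partition that severs only the weak edges scores lower than the one that severs the strong edges, which is the behavior the heuristic is designed to reward.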
Time series data provided by single-molecule Förster resonance energy transfer (smFRET) experiments offer the opportunity to infer not only model parameters describing molecular complexes, e.g., rate constants, but also information about the model itself, e.
Phys Rev E Stat Nonlin Soft Matter Phys
October 2009
Determining the mechanism by which tRNAs rapidly and precisely transit through the ribosomal A, P, and E sites during translation remains a major goal in the study of protein synthesis. Here, we report the real-time dynamics of the L1 stalk, a structural element of the large ribosomal subunit that is implicated in directing tRNA movements during translation. Within pretranslocation ribosomal complexes, the L1 stalk exists in a dynamic equilibrium between open and closed conformations.
Proc Natl Acad Sci U S A
April 2009
The past decade has seen great advances in our understanding of the role of noise in gene regulation and the physical limits to signaling in biological networks. Here, we introduce the spectral method for computation of the joint probability distribution over all species in a biological network. The spectral method exploits the natural eigenfunctions of the master equation of birth-death processes to solve for the joint distribution of modules within the network, which then inform each other and facilitate calculation of the entire joint distribution.
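As background for the master equation of birth-death processes mentioned above, the sketch below solves the simplest case numerically: one species produced at a constant rate and degraded at a unit rate per molecule. Its steady state is the Poisson distribution. This is a brute-force linear solve on a truncated state space, not the spectral method itself. The rate `g`, the cutoff `n_max`, and the function names are assumptions made for illustration.

```python
import numpy as np
from math import exp, factorial

def birth_death_generator(g, n_max):
    """Generator matrix A of dp/dt = A p for copy numbers 0..n_max.

    Birth: n -> n + 1 at rate g (blocked at n_max by the truncation).
    Death: n -> n - 1 at rate n (unit degradation rate per molecule).
    """
    size = n_max + 1
    A = np.zeros((size, size))
    for n in range(size):
        if n < n_max:
            A[n + 1, n] += g   # probability flows into state n + 1
            A[n, n] -= g       # and out of state n
        if n > 0:
            A[n - 1, n] += n   # probability flows into state n - 1
            A[n, n] -= n       # and out of state n
    return A

def steady_state(A):
    """Solve A p = 0 subject to sum(p) = 1."""
    M = A.copy()
    # The rows of a generator are linearly dependent, so one equation
    # can be swapped for the normalization condition.
    M[-1, :] = 1.0
    b = np.zeros(A.shape[0])
    b[-1] = 1.0
    return np.linalg.solve(M, b)

g = 5.0
n_max = 40
p = steady_state(birth_death_generator(g, n_max))
poisson = np.array([exp(-g) * g**n / factorial(n) for n in range(n_max + 1)])
```

Comparing `p` with `poisson` checks the truncated solve against the known analytic answer. For networks of several interacting species the joint state space grows quickly, which is the kind of cost that motivates expansions in eigenfunctions rather than direct solves.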