Life-course epidemiology relies on specifying complex (causal) models that describe how variables interplay over time. Traditionally, such models have been constructed by perusing existing theory and previous studies. By comparing data-driven and theory-driven models, we investigated whether data-driven causal discovery algorithms can help in this process.
We adapt graphical causal structure learning methods to apply to nonstationary time series data, specifically to processes that exhibit stochastic trends. We modify the likelihood component of the BIC score used by score-based search algorithms, such that it remains a consistent selection criterion for integrated or cointegrated processes. We use this modified score in conjunction with the SVAR-GFCI algorithm [15], which allows us to recover qualitative structural information about the underlying data-generating process even in the presence of latent (unmeasured) factors.
Int J Data Sci Anal
August 2018
Many real datasets contain values missing not at random (MNAR). In this scenario, investigators often perform list-wise deletion, or delete samples with missing values, before applying causal discovery algorithms. List-wise deletion is a sound and general strategy when paired with algorithms such as FCI and RFCI, but the deletion procedure also eliminates otherwise good samples that contain only a few missing values.
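As a minimal illustration of the list-wise deletion step described above, the following sketch drops every sample that contains at least one missing value. The function name `listwise_delete` and the toy data are hypothetical and are not taken from the paper; missing values are assumed to be encoded as `None` or NaN.

```python
import math


def listwise_delete(rows):
    """Keep only rows in which no value is missing (None or NaN)."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [row for row in rows if not any(is_missing(v) for v in row)]


# Hypothetical toy data: four samples of three variables.
samples = [
    (1.0, 2.0, 3.0),
    (4.0, None, 6.0),          # one missing value
    (7.0, 8.0, float("nan")),  # one missing value
    (10.0, 11.0, 12.0),
]

complete = listwise_delete(samples)
print(len(samples), "samples before,", len(complete), "after deletion")
```

In this toy case half of the samples are discarded even though each lacks only a single value, which is the loss of otherwise usable data that the abstract points to.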
A fundamental task in various disciplines of science, including biology, is to find underlying causal relations and make use of them. Causal relations can be seen if interventions are properly applied; however, in many cases they are difficult or even impossible to conduct. It is then necessary to discover causal relations by analyzing statistical properties of purely observational data, which is known as causal discovery or causal structure search.
The heart of the scientific enterprise is a rational effort to understand the causes behind the phenomena we observe. In large-scale complex dynamical systems such as the Earth system, real experiments are rarely feasible. However, a rapidly increasing amount of observational and simulated data opens up the use of novel data-driven causal methods beyond the commonly adopted correlation techniques.
Motivation: Integration of data from different modalities is a necessary step for multi-scale data analysis in many fields, including biomedical research and systems biology. Directed graphical models offer an attractive tool for this problem because they can represent both the complex, multivariate probability distributions and the causal pathways influencing the system. Graphical models learned from biomedical data can be used for classification, biomarker selection and functional analysis, while revealing the underlying network structure and thus allowing for arbitrary likelihood queries over the data.
Mach Learn Knowl Discov Databases
September 2017
Discovering causal structure from observational data in the presence of latent variables remains an active research area. Constraint-based causal discovery algorithms are relatively efficient at discovering such causal models from data using independence tests. Typically, however, they derive and output only one such model.
Several studies have indicated that bi-factor models fit a broad range of psychometric data better than alternative multidimensional models such as second-order models, e.g., Rodriguez, Reise and Haviland (2016), Gignac (2016), and Carnivez (2016). Murray and Johnson (2013) and Gignac (2016) argue that this phenomenon is partially due to un-modeled complexities (e.
Int J Approx Reason
September 2017
We present an algorithm for estimating bounds on causal effects from observational data which combines graphical model search with simple linear regression. We assume that the underlying system can be represented by a linear structural equation model with no feedback, and we allow for the possibility of latent confounders. Under assumptions standard in the causal search literature, we use conditional independence constraints to search for an equivalence class of ancestral graphs.
Existing score-based causal model search algorithms such as (and a speeded up version, ) are asymptotically correct, fast, and reliable, but make the unrealistic assumption that the true causal graph does not contain any unmeasured confounders. There are several constraint-based causal search algorithms (e.g , or +) that are asymptotically correct without assuming that there are no unmeasured confounders, but often perform poorly on small samples.
We present an algorithm for estimating bounds on causal effects from observational data which combines graphical model search with simple linear regression. We assume that the underlying system can be represented by a linear structural equation model with no feedback, and we allow for the possibility of latent variables. Under assumptions standard in the causal search literature, we use conditional independence constraints to search for an equivalence class of ancestral graphs.
Appl Inform (Berl)
February 2016
This paper aims to give a broad coverage of central concepts and principles involved in automated causal inference and emerging approaches to causal discovery from i.i.d. data and from time series.
Community-acquired pneumonia (CAP) is an important clinical condition with regard to patient mortality, patient morbidity, and healthcare resource utilization. The assessment of the likely clinical course of a CAP patient can significantly influence decision making about whether to treat the patient as an inpatient or as an outpatient. That decision can in turn influence resource utilization, as well as patient well being.
We present evidence of a potentially serious source of error intrinsic to all spotted cDNA microarrays that use IMAGE clones of expressed sequence tags (ESTs). We found that a high proportion of these EST sequences contain 5'-end poly(dT) sequences that are remnants from the oligo(dT)-primed reverse transcription of polyadenylated mRNA templates used to generate EST cDNA for sequence clone libraries. Analysis of expression data from two single-dye cDNA microarray experiments showed that ESTs whose sequences contain repeats of consecutive 5'-end dT residues appeared to be strongly coexpressed, while expression data of all other sequences exhibited no such pattern.
Motivation: One approach to inferring genetic regulatory structure from microarray measurements of mRNA transcript hybridization is to estimate the associations of gene expression levels measured in repeated samples. The associations may be estimated by correlation coefficients or by conditional frequencies (for discretized measurements) or by some other statistic. Although these procedures have been successfully applied to other areas, their validity when applied to microarray measurements has yet to be tested.