In many scientific fields, such as economics and neuroscience, we are often faced with nonstationary time series, and concerned with both finding causal relations and forecasting the values of variables of interest, both of which are particularly challenging in such nonstationary environments. In this paper, we study causal discovery and forecasting for nonstationary time series. By exploiting a particular type of state-space model to represent the processes, we show that nonstationarity helps to identify causal structure and that forecasting naturally benefits from learned causal knowledge.
A fundamental task in various disciplines of science, including biology, is to find underlying causal relations and make use of them. Causal relations can be seen if interventions are properly applied; however, in many cases they are difficult or even impossible to conduct. It is then necessary to discover causal relations by analyzing statistical properties of purely observational data, which is known as causal discovery or causal structure search.
The heart of the scientific enterprise is a rational effort to understand the causes behind the phenomena we observe. In large-scale complex dynamical systems such as the Earth system, real experiments are rarely feasible. However, a rapidly increasing amount of observational and simulated data opens up the use of novel data-driven causal methods beyond the commonly adopted correlation techniques.
We test the adequacies of several proposed and two new statistical methods for recovering the causal structure of systems with feedback from synthetic BOLD time series. We compare an adaptation of the first correct method for recovering cyclic linear systems; Granger causal regression; a multivariate autoregressive model with a permutation test; the Group Iterative Multiple Model Estimation (GIMME) algorithm; the Ramsey et al. non-Gaussian methods; two non-Gaussian methods by Hyvärinen and Smith; a method due to Patel et al.
Motivation: Integration of data from different modalities is a necessary step for multi-scale data analysis in many fields, including biomedical research and systems biology. Directed graphical models offer an attractive tool for this problem because they can represent both the complex, multivariate probability distributions and the causal pathways influencing the system. Graphical models learned from biomedical data can be used for classification, biomarker selection and functional analysis, while revealing the underlying network structure and thus allowing for arbitrary likelihood queries over the data.
Discovery of causal relationships from observational data is a fundamental problem. Roughly speaking, there are two types of methods for causal discovery, constraint-based ones and score-based ones. Score-based methods avoid the multiple testing problem and enjoy certain advantages compared to constraint-based ones.
Modern technologies allow large, complex biomedical datasets to be collected from patient cohorts. These datasets comprise both continuous and categorical data ("Mixed Data"), and essential variables may be unobserved in these data due to the complex nature of biomedical phenomena. Causal inference algorithms can identify important relationships from biomedical data; however, handling the challenges of causal inference over mixed data with unmeasured confounders in a scalable way is still an open problem.
Discovering causal structure of a dynamical system from observed time series is a traditional and important problem. In many practical applications, observed data are obtained by applying subsampling or temporal aggregation to the original causal processes, making it difficult to discover the underlying causal relations. Subsampling refers to the procedure in which, for every k consecutive observations, one is kept and the rest are skipped; recently some advances have been made in causal discovery from such data.
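To make the two procedures concrete, here is a minimal Python sketch. The function names `subsample` and `aggregate` and the block length `k` are illustrative and do not come from the paper. Keeping the first observation of each block, and treating temporal aggregation as the mean over non-overlapping blocks, are assumptions made for this sketch.

```python
def subsample(series, k):
    """Keep one observation out of every k consecutive ones.

    Assumption: the first observation of each block is the one kept.
    """
    return series[::k]


def aggregate(series, k):
    """Average each non-overlapping block of k consecutive observations.

    Assumption: temporal aggregation is taken to be a block mean;
    a trailing incomplete block is dropped.
    """
    n_blocks = len(series) // k
    return [sum(series[i * k:(i + 1) * k]) / k for i in range(n_blocks)]


# Toy series standing in for the original (finely sampled) process.
original = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

subsampled = subsample(original, 3)
aggregated = aggregate(original, 3)
```

Either way the observed series is shorter than the original and the steps between observations are hidden, which is why causal relations in the original process become harder to recover.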
Proc IEEE Int Conf Data Min
November 2017
We address two important issues in causal discovery from nonstationary or heterogeneous data, where parameters associated with a causal structure may change over time or across data sets. First, we investigate how to efficiently estimate the "driving force" of the nonstationarity of a causal mechanism. That is, given a causal mechanism that varies over time or across data sets and whose qualitative structure is known, we aim to extract from data a low-dimensional and interpretable representation of the main components of the changes.
It is commonplace to encounter nonstationary or heterogeneous data, whose underlying generating process changes over time or across data sets (the data sets may have different experimental conditions or data collection conditions). Such a distribution shift feature presents both challenges and opportunities for causal discovery. In this paper we develop a principled framework for causal discovery from such data, called Constraint-based causal Discovery from Nonstationary/heterogeneous Data (CD-NOD), which addresses two important questions.
We describe two modifications that parallelize and reorganize caching in the well-known Greedy Equivalence Search (GES) algorithm for discovering directed acyclic graphs on random variables from sample values. We apply one of these modifications, the Fast Greedy Search (FGS) assuming faithfulness, to an i.i.
Domain adaptation arises in supervised learning when the training (source domain) and test (target domain) data have different distributions. Let X and Y denote the features and target, respectively. Previous work on domain adaptation mainly considers the covariate shift situation, where the distribution of the features P(X) changes across domains while the conditional distribution P(Y∣X) stays the same. To reduce domain discrepancy, recent methods try to find invariant components [Formula: see text] that have similar [Formula: see text] on different domains by explicitly minimizing a distribution discrepancy measure.
Using Gebharter's (2014) representation, we consider aspects of the problem of discovering the structure of unmeasured sub-mechanisms when the variables in those sub-mechanisms have not been measured. Exploiting an early insight of Sober's (1998), we provide a correct algorithm for identifying latent, endogenous structure (sub-mechanisms) for a restricted class of structures. The algorithm can be merged with other methods for discovering causal relations among unmeasured variables, and feedback relations between measured variables and unobserved causes can sometimes be learned.
J Am Med Inform Assoc
November 2015
The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools.
We consider several alternative ways of exploiting non-Gaussian distributional features, including some that can in principle identify direct, positive feedback relations (graphically, 2-cycles) and combinations of methods that can identify high dimensional graphs. All of the procedures are implemented in the TETRAD freeware (Ramsey et al., 2013).
Failing to engage in joint attention is a strong marker of impaired social cognition associated with autism spectrum disorder (ASD). The goal of this study was to localize the source of impaired joint attention in individuals with ASD by examining both behavioral and fMRI data collected during various tasks involving eye gaze, directional cuing, and face processing. The tasks were designed to engage three brain networks associated with social cognition [face processing, theory of mind (TOM), and action understanding].
Lindquist and Sobel claim that the graphical causal models they call "agnostic" do not imply any counterfactual conditionals. They doubt that "causal effects" can be discovered using graphical causal models typical of SEMs, DCMs, Bayes nets, Granger causal models, etc. Each of these claims is false or exaggerated.
Smith et al. report a large study of the accuracy of 38 search procedures for recovering effective connections in simulations of DCM models under 28 different conditions. Their results are disappointing: no method reliably finds and directs connections without large false negatives, large false positives, or both.
Neumann et al. (2010) aim to find directed graphical representations of the independence and dependence relations among activities in brain regions by applying a search procedure to merged fMRI activity records from a large number of contrasts obtained under a variety of conditions. To that end, Neumann et al.
We agree with Cramer et al.'s goal of the discovery of causal relationships, but we argue that the authors' characterization of latent variable models (as deployed for such purposes) overlooks a wealth of extant possibilities. We provide a preliminary analysis of their data, using existing algorithms for causal inference and for the specification of latent variable models.
Neuroimaging (e.g. fMRI) data are increasingly used to attempt to identify not only brain regions of interest (ROIs) that are especially active during perception, cognition, and action, but also the qualitative causal relations among activity in these regions (known as effective connectivity; Friston, 1994).
The conditional intervention principle is a formal principle that relates patterns of interventions and outcomes to causal structure. It is a central assumption of experimental design and the causal Bayes net formalism. Two studies suggest that preschoolers can use the conditional intervention principle to distinguish causal chains, common cause and interactive causal structures even in the absence of differential spatiotemporal cues and specific mechanism knowledge.
We discuss our concerns regarding the reliability of data generated by spotted cDNA microarrays. Two types of error we highlight are cross-hybridization artifact due to sequence homologies and sequence errors in the cDNA used for spotting on microarrays. We feel that statisticians who analyze microarray data should be aware of these sources of unreliability intrinsic to cDNA microarray design and use.