Before we attempt to (approximately) learn a function between two sets of observables of a physical process, we must first decide what the inputs and outputs of the desired function are going to be. Here we demonstrate two distinct, data-driven ways of first deciding "the right quantities" to relate through such a function, and then proceeding to learn it. This is accomplished by first processing simultaneous heterogeneous data streams (ensembles of time series) from observations of a physical system: records of multiple of the system.
Confinement can substantially alter the physicochemical properties of materials by breaking translational isotropy and rendering all physical properties position-dependent. Molecular dynamics (MD) simulations have proven instrumental in characterizing such spatial heterogeneities and probing the impact of confinement on materials' properties. For static properties, this is a straightforward task and can be achieved via simple spatial binning.
Single-cell RNA sequencing has been widely used to investigate cell state transitions and gene dynamics of biological processes. Current strategies to infer the sequential dynamics of genes in a process typically rely on constructing cell pseudotime through cell trajectory inference. However, the presence of concurrent gene processes in the same group of cells and technical noise can obscure the true progression of the processes studied.
Experimental work across species has demonstrated that spontaneously generated behaviors are robustly coupled to variations in neural activity within the cerebral cortex. Functional magnetic resonance imaging data suggest that temporal correlations in cortical networks vary across distinct behavioral states, providing for the dynamic reorganization of patterned activity. However, these data generally lack the temporal resolution to establish links between cortical signals and the continuously varying fluctuations in spontaneous behavior observed in awake animals.
Confinement breaks translational and rotational symmetry in materials and makes all physical properties functions of position. Such spatial variations are key to modulating material properties at the nanoscale, and characterizing them accurately is therefore an intense area of research in the molecular simulations community. This is relatively easy to accomplish for basic mechanical observables.
Materials under confinement can possess properties that deviate considerably from their bulk counterparts. Indeed, confinement makes all physical properties position-dependent and possibly anisotropic, and characterizing such spatial variations and directionality has been an intense area of focus in experimental and computational studies of confined matter. While this task is fairly straightforward for simple mechanical observables, it is far more daunting for transport properties such as diffusivity that can only be estimated from autocorrelations of mechanical observables.
SIAM J Math Data Sci
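As a rough illustration of the "simple spatial binning" mentioned for static properties, the sketch below averages an observable in equal-width slabs along one confinement axis. The function name `binned_profile`, its arguments, and the choice of a single axis with equal-width bins are assumptions made for this example; they are not taken from the paper.

```python
def binned_profile(positions, values, z_min, z_max, n_bins):
    """Average `values` in equal-width bins along one axis.

    positions : z-coordinates of the samples (hypothetical input layout)
    values    : the static observable sampled at each position
    Returns a list of (bin_center, mean_value, count); mean_value is None
    for bins that received no samples.
    """
    if n_bins <= 0 or z_max <= z_min:
        raise ValueError("need n_bins > 0 and z_max > z_min")
    width = (z_max - z_min) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for z, v in zip(positions, values):
        if not (z_min <= z <= z_max):
            continue  # ignore samples outside the slab
        # A sample exactly at z_max is assigned to the last bin.
        idx = min(int((z - z_min) / width), n_bins - 1)
        sums[idx] += v
        counts[idx] += 1
    profile = []
    for i in range(n_bins):
        center = z_min + (i + 0.5) * width
        mean = sums[i] / counts[i] if counts[i] else None
        profile.append((center, mean, counts[i]))
    return profile
```

Calling `binned_profile(z_coords, observable, 0.0, box_height, 50)` on samples pooled over frames would give a position-dependent profile. Transport properties, as the abstract notes, do not reduce to this kind of direct averaging.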
March 2021
A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to form an affinity matrix by the Gaussian kernel with pairwise distances, and to follow with a certain normalization (e.g.
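The construction described here can be sketched in a few lines. The snippet builds a Gaussian-kernel affinity matrix from pairwise squared Euclidean distances and then applies a row-stochastic normalization. The abstract is cut off before naming a normalization, so the row-stochastic choice, the bandwidth parameter `epsilon`, and both function names are assumptions for illustration only.

```python
import math

def gaussian_affinity(points, epsilon):
    """W[i][j] = exp(-||x_i - x_j||^2 / epsilon) for points in Euclidean space."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / epsilon)  # kernel is symmetric
    return W

def row_normalize(W):
    """Divide each row by its sum, giving a row-stochastic matrix.

    Assumed normalization: one common choice, not necessarily the one
    studied in the paper.
    """
    P = []
    for row in W:
        s = sum(row)  # s >= 1 because the diagonal entry is exp(0) = 1
        P.append([w / s for w in row])
    return P
```

Other normalizations (for example symmetric or doubly stochastic scalings) would replace `row_normalize` while leaving `gaussian_affinity` unchanged.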
We propose a new fast method of measuring distances between large numbers of related high dimensional datasets called the Diffusion Earth Mover's Distance (EMD). We model the datasets as distributions supported on a common data graph that is derived from the affinity matrix computed on the combined data. In such cases where the graph is a discretization of an underlying Riemannian closed manifold, we prove that Diffusion EMD is topologically equivalent to the standard EMD with a geodesic ground distance.
Proc Natl Acad Sci U S A
December 2020
We propose a local conformal autoencoder (LOCA) for standardized data coordinates. LOCA is a deep learning-based method for obtaining standardized data coordinates from scientific measurements. Data observations are modeled as samples from an unknown, nonlinear deformation of an underlying Riemannian manifold, which is parametrized by a few normalized, latent variables.
The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely many multivariate samples. When the distributions are locally low-dimensional, the proposed test can be made more powerful to distinguish certain alternatives by incorporating local covariance matrices and constructing an anisotropic kernel. The kernel matrix is asymmetric; it computes the affinity between [Formula: see text] data points and a set of [Formula: see text] reference points, where [Formula: see text] can be drastically smaller than [Formula: see text].
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
The high-dimensional data created by high-throughput technologies require visualization tools that reveal data structure and patterns in an intuitive form. We present PHATE, a visualization method that captures both local and global nonlinear structure using an information-geometric distance between data points. We compare PHATE to other tools on a variety of artificial and biological datasets, and find that it consistently preserves a range of patterns in data, including continual progressions, branches and clusters, better than other tools.
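For orientation, the following is a minimal sketch of the standard biased estimate of squared MMD with an isotropic Gaussian kernel. It is not the anisotropic, reference-point-based statistic the paper proposes: the kernel, the bandwidth `sigma`, and the function names are all assumptions chosen to show only the baseline quantity that such a statistic modifies.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Isotropic Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def mmd2_biased(X, Y, sigma):
    """Biased (V-statistic) estimate of squared MMD between samples X and Y.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y),
    with each mean taken over all pairs, including identical ones.
    """
    def mean_k(A, B):
        total = sum(gaussian_kernel(a, b, sigma) for a in A for b in B)
        return total / (len(A) * len(B))
    return mean_k(X, X) + mean_k(Y, Y) - 2.0 * mean_k(X, Y)
```

This baseline evaluates the kernel on all sample pairs, so its cost is quadratic in the sample size. The abstract's point about a small set of reference points concerns reducing that cost, which this sketch does not attempt.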
It is known that if $(p_n)_{n \in \mathbb{N}}$ is a sequence of orthogonal polynomials in $L^2([-1,1], w(x)\,dx)$, then the roots are distributed according to an arcsine distribution $\pi^{-1}(1 - x^2)^{-1/2}\,dx$ for a wide variety of weights $w(x)$. We connect this to a result of the Hilbert transform due to Tricomi: if $f(x)(1 - x^2)^{1/4} \in L^2(-1,1)$ and its Hilbert transform $Hf$ vanishes on $(-1,1)$, then the function $f$ is a multiple of the arcsine distribution. We also prove a localized Parseval-type identity that seems to be new: if $f(x)(1 - x^2)^{1/4} \in L^2(-1,1)$ and [Formula: see text] has mean value 0 on $(-1,1)$, then [Formula: see text].
We consider the analysis of high dimensional data given in the form of a matrix with columns consisting of observations and rows consisting of features. Often the data is such that the observations do not reside on a regular grid, and the given order of the features is arbitrary and does not convey a notion of locality. Therefore, traditional transforms and metrics cannot be used for data organization and analysis.
Proc Natl Acad Sci U S A
September 2017
The discovery of physical laws consistent with empirical observations is at the heart of (applied) science and engineering. These laws typically take the form of nonlinear differential equations depending on parameters; dynamical systems theory provides, through the appropriate normal forms, an "intrinsic" prototypical characterization of the types of dynamical regimes accessible to a given model. Using an implementation of data-informed geometry learning, we directly reconstruct the relevant "normal forms": a quantitative mapping from empirical observations to prototypical realizations of the underlying dynamics.
Public reporting of measures of hospital performance is an important component of quality improvement efforts in many countries. However, it can be challenging to provide an overall characterization of hospital performance because there are many measures of quality. In the United States, the Centers for Medicare and Medicaid Services reports over 100 measures that describe various domains of hospital quality, such as outcomes, the patient experience and whether established processes of care are followed.
We describe and implement a computer-assisted approach for accelerating the exploration of uncharted effective free-energy surfaces (FESs). More generally, the aim is the extraction of coarse-grained, macroscopic information from stochastic or atomistic simulations, such as molecular dynamics (MD). The approach functionally links the MD simulator with nonlinear manifold learning techniques.
Randomized trials of hypertension have seldom examined heterogeneity in response to treatments over time and the implications for cardiovascular outcomes. Understanding this heterogeneity, however, is a necessary step toward personalizing antihypertensive therapy. We applied trajectory-based modeling to data on 39 763 study participants of the ALLHAT (Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial) to identify distinct patterns of systolic blood pressure (SBP) response to randomized medications during the first 6 months of the trial.
This paper presents a robust unsupervised harmonic co-clustering method for profiling arbor morphologies for ensembles of reconstructed brain cells (e.g., neurons, microglia) based on quantitative measurements of the cellular arbors.
The purpose of this study is to introduce diffusion methods as a tool to label CT scan images according to their position in the human body. A comparative study of different methods based on a k-NN search is carried out and we propose a new, simple and efficient way of applying diffusion techniques that is able to give better location forecasts than methods that can be considered the current state-of-the-art.
Finding informative low-dimensional descriptions of high-dimensional simulation data (like the ones arising in molecular dynamics or kinetic Monte Carlo simulations of physical and chemical processes) is crucial to understanding physical phenomena, and can also dramatically assist in accelerating the simulations themselves. In this paper, we discuss and illustrate the use of nonlinear intrinsic variables (NIV) in the mining of high-dimensional multiscale simulation data. In particular, we focus on the way NIV allows us to functionally merge different simulation ensembles, and different partial observations of these ensembles, as well as to infer variables not explicitly measured.
The goal of this study is to identify preseizure changes in intracranial EEG (icEEG). A novel approach based on the recently developed diffusion map framework, which is considered to be one of the leading manifold learning methods, is proposed. Diffusion mapping provides dimensionality reduction of the data as well as pattern recognition that can be used to distinguish different states of the patient, for example, interictal and preseizure.
Proc Natl Acad Sci U S A
July 2013
In this paper, we present a method for time series analysis based on empirical intrinsic geometry (EIG). EIG enables one to reveal the low-dimensional parametric manifold as well as to infer the underlying dynamics of high-dimensional time series. By incorporating concepts of information geometry, this method extends existing geometric analysis tools to support stochastic settings and parametrizes the geometry of empirical distributions.
Objective: We tested if a relationship between distant parts of the default mode network (DMN), a resting state network defined by fMRI studies, can be observed with intracranial EEG recorded from patients with localization-related epilepsy.
Methods: Magnitude squared coherence, mutual information, cross-approximate entropy, and the coherence of the gamma power time-series were estimated, for one hour intracranial EEG recordings of background activity from 9 patients, to evaluate the relationship between two test areas which were within the DMN (anterior cingulate and orbital frontal, denoted as T1 and posterior cingulate and mesial parietal, denoted as T2), and one control area (denoted as C), which was outside the DMN. We tested if the relationship between T1 and T2 was stronger than the relationship between each of these areas and C.