Before we attempt to (approximately) learn a function between two sets of observables of a physical process, we must first decide what the input and the output of the desired function are going to be. Here we demonstrate two distinct, data-driven ways of first deciding "the right quantities" to relate through such a function, and then proceeding to learn it. This is accomplished by first processing simultaneous heterogeneous data streams (ensembles of time series) from observations of a physical system: records of multiple observables of the system.
Experimental work across species has demonstrated that spontaneously generated behaviors are robustly coupled to variations in neural activity within the cerebral cortex. Functional magnetic resonance imaging data suggest that temporal correlations in cortical networks vary across distinct behavioral states, providing for the dynamic reorganization of patterned activity. However, these data generally lack the temporal resolution to establish links between cortical signals and the continuously varying fluctuations in spontaneous behavior observed in awake animals.
Confinement breaks translational and rotational symmetry in materials and makes all physical properties functions of position. Such spatial variations are key to modulating material properties at the nanoscale, and characterizing them accurately is therefore an intense area of research in the molecular simulations community. This is relatively easy to accomplish for basic mechanical observables.
Materials under confinement can possess properties that deviate considerably from their bulk counterparts. Indeed, confinement makes all physical properties position-dependent and possibly anisotropic, and characterizing such spatial variations and directionality has been an intense area of focus in experimental and computational studies of confined matter. While this task is fairly straightforward for simple mechanical observables, it is far more daunting for transport properties such as diffusivity that can only be estimated from autocorrelations of mechanical observables.
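To illustrate the idea of estimating a transport property from the autocorrelation of a mechanical observable, here is a minimal sketch (not the paper's method) that computes a self-diffusion coefficient from a velocity autocorrelation function via the Green–Kubo relation; a synthetic Ornstein–Uhlenbeck velocity trace stands in for MD output, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1D velocity trace: an Ornstein-Uhlenbeck process, whose velocity
# autocorrelation decays exponentially (a crude stand-in for MD output).
dt, n_steps, tau, kT_over_m = 0.01, 200_000, 0.5, 1.0
v = np.empty(n_steps)
v[0] = 0.0
for i in range(1, n_steps):
    v[i] = v[i - 1] * (1.0 - dt / tau) \
        + np.sqrt(2.0 * kT_over_m / tau * dt) * rng.standard_normal()

# Velocity autocorrelation function (VACF) up to a cutoff lag.
max_lag = 500
vacf = np.array([np.mean(v[:n_steps - k] * v[k:]) for k in range(max_lag)])

# Green-Kubo (1D): D = integral of <v(0) v(t)> dt, here a rectangle-rule sum.
D = vacf.sum() * dt
print(round(D, 2))  # theoretical value for this process: (kT/m) * tau = 0.5
```

In a real confined system, the same estimate would be carried out per spatial bin, which is exactly where the position dependence discussed above makes the task daunting.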
SIAM J Math Data Sci
March 2021
A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to form an affinity matrix by applying the Gaussian kernel to pairwise distances, and to follow with a certain normalization (e.g.
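A minimal sketch of the construction just described — pairwise distances, a Gaussian kernel, and a subsequent normalization (here, the row-stochastic Markov normalization common in diffusion-style methods; the function names and bandwidth are our own illustrative choices):

```python
import numpy as np

def gaussian_affinity(X, eps):
    """Affinity matrix W_ij = exp(-||x_i - x_j||^2 / eps) for the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / eps)

def row_normalize(W):
    """One possible normalization: the Markov (row-stochastic) matrix D^-1 W."""
    return W / W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))          # 50 points in R^3
W = gaussian_affinity(X, eps=1.0)
P = row_normalize(W)
print(P.shape, np.allclose(P.sum(axis=1), 1.0))
```

Other normalizations (e.g. the symmetric graph-Laplacian form) plug into the same pipeline after `gaussian_affinity`.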
Proc Natl Acad Sci U S A
December 2020
We propose a local conformal autoencoder (LOCA), a deep learning-based method for obtaining standardized data coordinates from scientific measurements. Data observations are modeled as samples from an unknown, nonlinear deformation of an underlying Riemannian manifold, which is parametrized by a few normalized, latent variables.
The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely many multivariate samples. When the distributions are locally low-dimensional, the proposed test can be made more powerful to distinguish certain alternatives by incorporating local covariance matrices and constructing an anisotropic kernel. The kernel matrix is asymmetric; it computes the affinity between the n data points and a set of n_ref reference points, where n_ref can be drastically smaller than n.
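For orientation, here is a sketch of the standard symmetric, isotropic kernel MMD estimate that the anisotropic, reference-point statistic above generalizes — a baseline illustration, not the paper's construction, with an arbitrary fixed bandwidth:

```python
import numpy as np

def gaussian_gram(A, B, sigma):
    """Gram matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    """Biased squared-MMD estimate between samples X and Y (always >= 0)."""
    Kxx = gaussian_gram(X, X, sigma)
    Kyy = gaussian_gram(Y, Y, sigma)
    Kxy = gaussian_gram(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

rng = np.random.default_rng(2)
same = mmd2(rng.standard_normal((200, 2)), rng.standard_normal((200, 2)))
diff = mmd2(rng.standard_normal((200, 2)), rng.standard_normal((200, 2)) + 2.0)
print(same < diff)  # shifted distributions give a much larger statistic
```

The paper's variant replaces the symmetric Gram matrices with an asymmetric affinity to a small set of reference points and an anisotropic, covariance-aware kernel.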
The high-dimensional data created by high-throughput technologies require visualization tools that reveal data structure and patterns in an intuitive form. We present PHATE, a visualization method that captures both local and global nonlinear structure using an information-geometric distance between data points. We compare PHATE to other tools on a variety of artificial and biological datasets, and find that it consistently preserves a range of patterns in data, including continual progressions, branches and clusters, better than other tools.
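As a rough sketch of the kind of pipeline PHATE describes — diffusion probabilities, an information-style log transform, then classical multidimensional scaling — the following toy simplification uses a fixed rather than adaptive bandwidth and is not the published algorithm; all names and parameters are illustrative.

```python
import numpy as np

def phate_like_embedding(X, eps=0.05, t=8, n_components=2):
    """Toy sketch: Markov diffusion -> log-potential -> classical MDS."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / eps)
    P = W / W.sum(axis=1, keepdims=True)     # row-stochastic Markov matrix
    Pt = np.linalg.matrix_power(P, t)        # t-step diffusion probabilities
    U = -np.log(Pt + 1e-12)                  # "potential" representation
    # Classical MDS on Euclidean distances between the potential rows.
    D2 = np.maximum(np.sum(U**2, 1)[:, None] + np.sum(U**2, 1)[None, :] - 2.0 * U @ U.T, 0.0)
    n = len(X)
    J = np.eye(n) - 1.0 / n                  # centering matrix
    B = -0.5 * J @ D2 @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

rng = np.random.default_rng(3)
theta = np.linspace(0, np.pi, 100)           # noisy half-circle in the plane
X = np.c_[np.cos(theta), np.sin(theta)] + 0.01 * rng.standard_normal((100, 2))
Y = phate_like_embedding(X)
print(Y.shape)
```

The published method additionally uses adaptive kernels and principled choices of the diffusion time t; this sketch only conveys the overall structure.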
It is known that if (p_n) is a sequence of orthogonal polynomials in L²([-1, 1], w(x) dx), then the roots are distributed according to an arcsine distribution (1 - x²)^(-1/2) for a wide variety of weights w(x). We connect this to a result on the Hilbert transform due to Tricomi: if f(x)(1 - x²)^(1/4) ∈ L²(-1, 1) and its Hilbert transform vanishes on (-1, 1), then the function is a multiple of the arcsine distribution. We also prove a localized Parseval-type identity that seems to be new: if f(x)(1 - x²)^(1/4) ∈ L²(-1, 1) and f(x)(1 - x²)^(1/2) has mean value 0 on (-1, 1), then ∫₋₁¹ (Hf)(x)² (1 - x²)^(1/2) dx = ∫₋₁¹ f(x)² (1 - x²)^(1/2) dx.
We consider the analysis of high dimensional data given in the form of a matrix with columns consisting of observations and rows consisting of features. Often the data is such that the observations do not reside on a regular grid, and the given order of the features is arbitrary and does not convey a notion of locality. Therefore, traditional transforms and metrics cannot be used for data organization and analysis.
Proc Natl Acad Sci U S A
September 2017
The discovery of physical laws consistent with empirical observations is at the heart of (applied) science and engineering. These laws typically take the form of nonlinear differential equations depending on parameters; dynamical systems theory provides, through the appropriate normal forms, an "intrinsic" prototypical characterization of the types of dynamical regimes accessible to a given model. Using an implementation of data-informed geometry learning, we directly reconstruct the relevant "normal forms": a quantitative mapping from empirical observations to prototypical realizations of the underlying dynamics.
Public reporting of measures of hospital performance is an important component of quality improvement efforts in many countries. However, it can be challenging to provide an overall characterization of hospital performance because there are many measures of quality. In the United States, the Centers for Medicare and Medicaid Services reports over 100 measures that describe various domains of hospital quality, such as outcomes, the patient experience and whether established processes of care are followed.
We describe and implement a computer-assisted approach for accelerating the exploration of uncharted effective free-energy surfaces (FESs). More generally, the aim is the extraction of coarse-grained, macroscopic information from stochastic or atomistic simulations, such as molecular dynamics (MD). The approach functionally links the MD simulator with nonlinear manifold learning techniques.
Randomized trials of hypertension have seldom examined heterogeneity in response to treatments over time and the implications for cardiovascular outcomes. Understanding this heterogeneity, however, is a necessary step toward personalizing antihypertensive therapy. We applied trajectory-based modeling to data on 39 763 study participants of the ALLHAT (Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial) to identify distinct patterns of systolic blood pressure (SBP) response to randomized medications during the first 6 months of the trial.
The purpose of this study is to introduce diffusion methods as a tool to label CT scan images according to their position in the human body. A comparative study of different methods based on a k-NN search is carried out, and we propose a new, simple, and efficient way of applying diffusion techniques that gives better location forecasts than current state-of-the-art methods.
Finding informative low-dimensional descriptions of high-dimensional simulation data (like the ones arising in molecular dynamics or kinetic Monte Carlo simulations of physical and chemical processes) is crucial to understanding physical phenomena, and can also dramatically assist in accelerating the simulations themselves. In this paper, we discuss and illustrate the use of nonlinear intrinsic variables (NIV) in the mining of high-dimensional multiscale simulation data. In particular, we focus on the way NIV allows us to functionally merge different simulation ensembles, and different partial observations of these ensembles, as well as to infer variables not explicitly measured.
The goal of this study is to identify preseizure changes in intracranial EEG (icEEG). A novel approach based on the recently developed diffusion map framework, which is considered to be one of the leading manifold learning methods, is proposed. Diffusion mapping provides dimensionality reduction of the data as well as pattern recognition that can be used to distinguish different states of the patient, for example, interictal and preseizure.
Proc Natl Acad Sci U S A
July 2013
In this paper, we present a method for time series analysis based on empirical intrinsic geometry (EIG). EIG enables one to reveal the low-dimensional parametric manifold as well as to infer the underlying dynamics of high-dimensional time series. By incorporating concepts of information geometry, this method extends existing geometric analysis tools to support stochastic settings and parametrizes the geometry of empirical distributions.
Objective: We tested whether a relationship between distant parts of the default mode network (DMN), a resting state network defined by fMRI studies, can be observed with intracranial EEG recorded from patients with localization-related epilepsy.
Methods: Magnitude squared coherence, mutual information, cross-approximate entropy, and the coherence of the gamma power time-series were estimated, for one hour intracranial EEG recordings of background activity from 9 patients, to evaluate the relationship between two test areas which were within the DMN (anterior cingulate and orbital frontal, denoted as T1 and posterior cingulate and mesial parietal, denoted as T2), and one control area (denoted as C), which was outside the DMN. We tested if the relationship between T1 and T2 was stronger than the relationship between each of these areas and C.
Objective: We propose an automated nutritional assessment algorithm for malnutrition risk prediction with high accuracy and reliability.
Methods: The database used for this study was a file of 432 patients, where each patient was described by 4 laboratory parameters and 11 clinical parameters. A malnutrition risk assessment of low (1), moderate (2), or high (3) was assigned by a dietitian for each patient.
Recovering the three-dimensional structure of molecules is important for understanding their functionality. We describe a spectral graph algorithm for reconstructing the three-dimensional structure of molecules from their cryo-electron microscopy images taken at random unknown orientations. We first identify a one-to-one correspondence between radial lines in three-dimensional Fourier space of the molecule and points on the unit sphere.
The single-particle reconstruction problem of electron cryo-microscopy (cryo-EM) is to find the three-dimensional structure of a macromolecule given its two-dimensional noisy projection images at unknown random directions. Ab initio estimates of the 3D structure are often obtained by the "Angular Reconstitution" method, in which a coordinate system is established from three projections, and the orientation of the particle giving rise to each image is deduced from common lines among the images. However, a reliable detection of common lines is difficult due to the low signal-to-noise ratio of the images.
Proc Natl Acad Sci U S A
September 2009
Nonlinear independent component analysis is combined with diffusion-map data analysis techniques to detect good observables in high-dimensional dynamic data. These detections are achieved by integrating local principal component analysis of simulation bursts, using eigenvectors of a Markov matrix describing anisotropic diffusion. The widely applicable procedure, a crucial step in model reduction approaches, is illustrated on stochastic chemical reaction network simulations.
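A hedged sketch of the diffusion-map ingredient described above: eigenvectors of a Markov matrix built from a Gaussian kernel (with a density normalization) recover a good one-dimensional observable for data lying on a curve. This is an illustration of the generic diffusion-map step, not the paper's full procedure; the bandwidth and normalization choices are our own.

```python
import numpy as np

def diffusion_map(X, eps, n_evecs=2):
    """Leading nontrivial eigenvectors of the diffusion Markov matrix."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / eps)
    q = W.sum(axis=1)
    K = W / np.outer(q, q)                   # density normalization (alpha = 1)
    P = K / K.sum(axis=1, keepdims=True)     # Markov normalization
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    # Skip the trivial constant eigenvector (eigenvalue 1).
    return vals.real[order[1:n_evecs + 1]], vecs.real[:, order[1:n_evecs + 1]]

# Noisy one-dimensional curve (a helix segment) embedded in 3D: the top
# nontrivial eigenvector should recover the ordering along the curve.
rng = np.random.default_rng(4)
s = np.sort(rng.uniform(0, 1, 150))
X = np.c_[np.cos(4 * s), np.sin(4 * s), 4 * s] + 0.01 * rng.standard_normal((150, 3))
vals, vecs = diffusion_map(X, eps=0.5)
phi1 = vecs[:, 0]
corr = abs(np.corrcoef(phi1, s)[0, 1])       # sign of eigenvector is arbitrary
print(corr > 0.7)
```

Here the eigenvector phi1 serves as the detected "good observable": a single coordinate that parametrizes the intrinsically one-dimensional data.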