Misleading or unnecessary data can have out-sized impacts on the health or accuracy of Machine Learning (ML) models. We present a Bayesian sequential selection method, akin to Bayesian experimental design, that identifies critically important information within a dataset while ignoring data that are either misleading or bring unnecessary complexity to the surrogate model of choice. Our method improves sample-wise error convergence and eliminates instances where more data lead to worse performance and instabilities of the surrogate model, often termed sample-wise "double descent".
Extreme events in society and nature, such as pandemic spikes, rogue waves or structural failures, can have catastrophic consequences. Characterizing extremes is difficult, as they occur rarely, arise from seemingly benign conditions, and belong to complex and often unknown infinite-dimensional systems. Such challenges render attempts at characterizing them moot.
Philos Trans A Math Phys Eng Sci
August 2022
We derive criteria for the selection of datapoints used for data-driven reduced-order modelling and other areas of supervised learning based on Gaussian process regression (GPR). While this is a well-studied area in the fields of active learning and optimal experimental design, most criteria in the literature are empirical. Here we introduce an optimality condition for the selection of a new input defined as the minimizer of the distance between the approximated output probability density function (pdf) of the reduced-order model and the exact one.
Proc Math Phys Eng Sci
February 2020
For many important problems the quantity of interest is an unknown function of the parameters, which is a random vector with known statistics. Since the dependence of the output on this random vector is unknown, the challenge is to identify its statistics, using the minimum number of function evaluations. This problem can be seen in the context of active learning or optimal experimental design.
For a large class of dynamical systems, the optimally time-dependent (OTD) modes, a set of deformable orthonormal tangent vectors that track directions of instabilities along any trajectory, are known to depend "pointwise" on the state of the system on the attractor but not on the history of the trajectory. We leverage the power of neural networks to learn this "pointwise" mapping from the phase space to OTD space directly from data. The result of the learning process is a cartography of directions associated with strongest instabilities in the phase space.
Extreme events that arise spontaneously in chaotic dynamical systems often have an adverse impact on the system or the surrounding environment. As such, their mitigation is highly desirable. Here, we introduce a control strategy for mitigating extreme events in a turbulent shear flow.
Rogue waves are strong localizations of the wave field that can develop in different branches of physics and engineering, such as water or electromagnetic waves. Here, we experimentally quantify the prediction potentials of a comprehensive rogue-wave reduced-order precursor tool that has been recently developed to predict extreme events due to spatially localized modulation instability. The laboratory tests have been conducted in two different water wave facilities and they involve unidirectional water waves; in both cases we show that the deterministic and spontaneous emergence of extreme events is well predicted through the reported scheme.
Proc Natl Acad Sci U S A
October 2018
We develop a method for the evaluation of extreme event statistics associated with nonlinear dynamical systems from a small number of samples. From an initial dataset of design points, we formulate a sequential strategy that provides the "next-best" data point (set of parameters) that when evaluated results in improved estimates of the probability density function (pdf) for a scalar quantity of interest. The approach uses Gaussian process regression to perform Bayesian inference on the parameter-to-observation map describing the quantity of interest.
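The sequential strategy described above can be illustrated with a minimal sketch. It assumes a one-dimensional parameter, a zero-mean Gaussian process with a fixed squared-exponential kernel, and a simple maximum-posterior-variance rule for choosing the next point. That selection rule is a stand-in chosen for brevity; it is not the pdf-based criterion of the paper. The names `toy_map`, `gp_posterior` and `sequential_design` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.1):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    # Prior variance is 1 for this kernel; subtract the part explained by data.
    var = 1.0 - np.sum(k_qt * np.linalg.solve(k_tt, k_qt.T).T, axis=1)
    return mean, np.clip(var, 0.0, None)

def sequential_design(f, x_init, candidates, n_steps):
    """Greedily add the candidate with the largest posterior variance.

    Assumption: uncertainty sampling replaces the paper's pdf-based criterion.
    """
    x_train = np.asarray(x_init, dtype=float)
    y_train = f(x_train)
    for _ in range(n_steps):
        _, var = gp_posterior(x_train, y_train, candidates)
        x_next = candidates[np.argmax(var)]
        x_train = np.append(x_train, x_next)
        y_train = np.append(y_train, f(x_next))
    return x_train, y_train

def toy_map(x):
    """Hypothetical parameter-to-observation map used only for illustration."""
    return np.sin(6.0 * x) * np.exp(-x)

candidates = np.linspace(0.0, 1.0, 101)
x_sel, y_sel = sequential_design(toy_map, [0.0, 1.0], candidates, n_steps=6)
```

A criterion aimed at the tails of the output pdf would replace the `np.argmax(var)` line with an acquisition function that weighs each candidate by its expected effect on the estimated pdf; the surrounding loop would stay the same.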
Philos Trans A Math Phys Eng Sci
August 2018
We discuss extreme events as random occurrences of strongly transient dynamics that lead to nonlinear energy transfers within a chaotic attractor. These transient events are the result of finite-time instabilities and therefore are inherently connected with both statistical and dynamical properties of the system. We consider two classes of problems related to extreme events and nonlinear energy transfers, namely (i) the derivation of precursors for the short-term prediction of extreme events, and (ii) the efficient sampling of random realizations for the fastest convergence of the probability density function in the tail region.
We introduce a data-driven forecasting method for high-dimensional chaotic systems using long short-term memory (LSTM) recurrent neural networks. The proposed LSTM neural networks perform inference of high-dimensional dynamical systems in their reduced order space and are shown to be an effective set of nonlinear approximators of their attractor. We demonstrate the forecasting performance of the LSTM and compare it with Gaussian processes (GPs) in time series obtained from the Lorenz 96 system, the Kuramoto-Sivashinsky equation and a prototype climate model.
Extreme events are ubiquitous in a wide range of dynamical systems, including turbulent fluid flows, nonlinear waves, large-scale networks, and biological systems. We propose a variational framework for probing conditions that trigger intermittent extreme events in high-dimensional nonlinear dynamical systems. We seek the triggers as the probabilistically feasible solutions of an appropriately constrained optimization problem, where the function to be maximized is a system observable exhibiting intermittent extreme bursts.
High-dimensional chaotic dynamical systems can exhibit strongly transient features. These are often associated with instabilities that have a finite-time duration. Because of the finite-time character of these transient events, their detection through infinite-time methods, e.
Drawing upon the bursting mechanism in slow-fast systems, we propose indicators for the prediction of such rare extreme events which do not require a priori known slow and fast coordinates. The indicators are associated with functionals defined in terms of optimally time-dependent (OTD) modes. One such functional has the form of the largest eigenvalue of the symmetric part of the linearized dynamics reduced to these modes.
Neuronal plasticity helps animals learn from their environment. However, it is challenging to link specific changes in defined neurons to altered behavior. Here, we focus on circadian rhythms in the structure of the principal s-LNv clock neurons in Drosophila.
Phys Rev E Stat Nonlin Soft Matter Phys
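As a rough illustration of the functional mentioned above, the sketch below computes the largest eigenvalue of the symmetric part of a linearized operator projected onto an orthonormal set of modes. The Jacobian and the modes are hand-picked toy values, and computing actual OTD modes along a trajectory is not shown. The name `instability_indicator` is hypothetical.

```python
import numpy as np

def instability_indicator(jacobian, modes):
    """Largest eigenvalue of the symmetric part of the reduced operator.

    jacobian: (n, n) linearized dynamics at the current state.
    modes:    (n, r) matrix with orthonormal columns (assumed given).
    """
    reduced = modes.T @ jacobian @ modes   # (r, r) reduced linear operator
    sym = 0.5 * (reduced + reduced.T)      # symmetric part
    return float(np.linalg.eigvalsh(sym)[-1])  # eigvalsh sorts ascending

# Toy non-normal operator: both eigenvalues are negative, yet transient
# growth is possible because the symmetric part has a positive eigenvalue.
jacobian = np.array([[-1.0, 10.0],
                     [0.0, -2.0]])

full_space = np.eye(2)                  # both directions retained
single_mode = np.array([[1.0], [0.0]])  # only the first direction retained

indicator_full = instability_indicator(jacobian, full_space)
indicator_single = instability_indicator(jacobian, single_mode)
```

In this toy case the indicator is positive on the full space and negative on the single retained direction, which shows how the value depends on the subspace the modes span.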
June 2015
We study the evolution of localized wave groups in unidirectional water wave envelope equations [the nonlinear Schrödinger (NLSE) and the modified NLSE (MNLSE)]. These localizations of energy can lead to disastrous extreme responses (rogue waves). We analytically quantify the role of such spatial localization, introducing a technique to reduce the underlying partial differential equation dynamics to a simple ordinary differential equation for the wave packet amplitude.
Proc Natl Acad Sci U S A
May 2014
A major challenge in contemporary data science is the development of statistically accurate particle filters to capture non-Gaussian features in large-dimensional chaotic dynamical systems. Blended particle filters that capture non-Gaussian features in an adaptively evolving low-dimensional subspace through particles interacting with evolving Gaussian statistics on the remaining portion of phase space are introduced here. These blended particle filters are constructed in this paper through a mathematical formalism involving conditional Gaussian mixtures combined with statistically nonlinear forecast models compatible with this structure developed recently with high skill for uncertainty quantification.
Proc Natl Acad Sci U S A
August 2013
A framework for low-order predictive statistical modeling and uncertainty quantification in turbulent dynamical systems is developed here. These reduced-order, modified quasilinear Gaussian (ROMQG) algorithms apply to turbulent dynamical systems in which there is significant linear instability or linear nonnormal dynamics in the unperturbed system and energy-conserving nonlinear interactions that transfer energy from the unstable modes to the stable modes where dissipation occurs, resulting in a statistical steady state; such turbulent dynamical systems are ubiquitous in geophysical and engineering turbulence. The ROMQG method involves constructing a low-order, nonlinear, dynamical system for the mean and covariance statistics in the reduced subspace that has the unperturbed statistics as a stable fixed point and optimally incorporates the indirect effect of non-Gaussian third-order statistics for the unperturbed system in a systematic calibration stage.