Recurrent neural networks (RNNs) are an important class of models for learning sequential behavior. However, training RNNs to learn long-term dependencies is a tremendously difficult task, and this difficulty is widely attributed to the vanishing and exploding gradient (VEG) problem. Since it was first characterized 30 years ago, the belief that if VEG occurs during optimization then RNNs learn long-term dependencies poorly has become a central tenet in the RNN literature and has been steadily cited as motivation for a wide variety of research advancements.
IEEE Trans Neural Netw Learn Syst
September 2023
The canonical solution methodology for finite constrained Markov decision processes (CMDPs), where the objective is to maximize the expected infinite-horizon discounted rewards subject to constraints on the expected infinite-horizon discounted costs, is based on convex linear programming (LP). In this brief, we first prove that the optimization objective in the dual linear program of a finite CMDP is a piecewise linear convex (PWLC) function with respect to the Lagrange penalty multipliers. Next, we propose a novel, provably optimal, two-level gradient-aware search (GAS) algorithm that exploits the PWLC structure to find the optimal state-value function and Lagrange penalty multipliers of a finite CMDP.
Using the data from loop detector sensors for near-real-time detection of traffic incidents on highways is crucial to averting major traffic congestion. While recent supervised machine learning methods offer solutions to incident detection by leveraging human-labeled incident data, the false alarm rate is often too high to be used in practice. Specifically, the inconsistency in the human labeling of the incidents significantly affects the performance of supervised learning models.
To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets.
Atom-probe tomography (APT) facilitates nano- and atomic-scale characterization and analysis of microstructural features. Specifically, APT is well suited to study the interfacial properties of granular or heterophase systems. Traditionally, the identification of the interface between, for example, precipitate and matrix phases in APT data has been obtained either by extracting iso-concentration surfaces based on a user-supplied concentration value or by manually perturbing the concentration value until the iso-concentration surface qualitatively matches the interface.
Background: Current multi-petaflop supercomputers are powerful systems, but present challenges when faced with problems requiring large machine learning workflows. Complex algorithms running at system scale, often with different patterns that require disparate software packages and complex data flows, cause difficulties in assembling and managing large experiments on these machines.
Results: This paper presents a workflow system that makes progress on scaling machine learning ensembles; specifically, in this first release, ensembles of deep neural networks that address problems in cancer research across the atomistic, molecular, and population scales.