Learning long-term dependencies with gradient descent is difficult.

IEEE Trans Neural Netw

Dept. d'Inf. et de Recherche Oper., Montreal Univ., Que.

Published: October 2012

AI Article Synopsis

  • Recurrent neural networks (RNNs) are powerful tools for mapping input sequences to output sequences in tasks like recognition, production, and prediction.
  • Training RNNs becomes difficult when the input/output sequences contain long-term dependencies: gradient-based learning faces an increasingly hard problem as the duration of the dependencies to be captured grows.
  • The paper exposes a trade-off between efficient learning by gradient descent and latching onto information for long periods, and on that basis considers alternatives to standard gradient descent.

Article Abstract

Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. We show why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a trade-off between efficient learning by gradient descent and latching on information for long periods. Based on an understanding of this problem, alternatives to standard gradient descent are considered.
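The trade-off described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the paper's experimental setup: it assumes a single scalar recurrent unit `h_t = tanh(w * h_{t-1})` with no input, and the function name `latch_gradient` and the values `w=2.0`, `h0=0.5` are illustrative choices. With `w > 1` the unit has two stable fixed points, so it can latch one bit, but the slope of the map at a stable fixed point is below 1, so the derivative of the final state with respect to the initial state shrinks roughly geometrically with the number of steps.

```python
import math


def latch_gradient(w, h0, steps):
    """Iterate h_t = tanh(w * h_{t-1}) and return (h_T, dh_T/dh_0).

    Toy scalar recurrence chosen for illustration (an assumption,
    not a model taken from the paper).
    """
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return h, grad


if __name__ == "__main__":
    # w = 2.0 is an assumed value that makes the unit bistable (it can latch).
    for steps in (1, 5, 10, 20, 50):
        h, g = latch_gradient(w=2.0, h0=0.5, steps=steps)
        print(f"T={steps:3d}  h_T={h:.4f}  dh_T/dh_0={g:.3e}")
```

Running the script prints the latched state and the gradient for several horizons; the state settles near a fixed point while the gradient with respect to the initial state decays toward zero as the horizon grows.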

Source
http://dx.doi.org/10.1109/72.279181

Publication Analysis

Top Keywords

gradient descent: 12
recurrent neural: 8
neural networks: 8
learning long-term: 4
long-term dependencies: 4
gradient: 4
dependencies gradient: 4
descent difficult: 4
difficult recurrent: 4
networks map: 4

Similar Publications

Tensor neural networks for high-dimensional Fokker-Planck equations.

Neural Netw

January 2025

Division of Applied Mathematics, Brown University, Providence, RI 02912, USA; Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, Richland, WA, United States.

We solve high-dimensional steady-state Fokker-Planck equations on the whole space by applying tensor neural networks. The tensor networks are a linear combination of tensor products of one-dimensional feedforward networks or a linear combination of several selected radial basis functions. The use of tensor feedforward networks allows us to efficiently exploit auto-differentiation (in physical variables) in major Python packages while using radial basis functions can fully avoid auto-differentiation, which is rather expensive in high dimensions.

Animals survive in dynamic environments changing at arbitrary timescales, but such data distribution shifts are a challenge to neural networks. To adapt to change, neural systems may change a large number of parameters, which is a slow process involving forgetting past information. In contrast, animals leverage distribution changes to segment their stream of experience into tasks and associate them with internal task abstracts.

We theoretically demonstrate that ponderomotive interactions near the electron cross-over can be used for aberration correction in ultrafast electron microscopes. Highly magnified electron shadow images from SiN thin films are utilized to visualize the distortions induced by spherical aberrations. Our simulations of electron-light interactions indicate that spherical aberrations can be compensated resulting in an aberration-free angle of 8.

Digital micromirror devices (DMDs), owing to their rapid refresh rates and ability to shape particular optical patterns, are key tools for holographic 3D near-eye displays. However, relying on a single-sideband (SSB) filter to eliminate crosstalk from zero-order and conjugate noise leads to an enormous decrease in the utilization of spatial bandwidth product (SBP). In this paper, we develop a new strategy for the binary hologram optimization framework to enable the full utilization of SBP of DMD holographic display by minimizing conjugate noise.

We conducted a genome-wide association study on income among individuals of European descent (N = 668,288) to investigate the relationship between socio-economic status and health disparities. We identified 162 genomic loci associated with a common genetic factor underlying various income measures, all with small effect sizes (the Income Factor). Our polygenic index captures 1-5% of income variance, with only one fourth due to direct genetic effects.
