Learning long-term dependencies with gradient descent is difficult.

IEEE Trans Neural Netw

Dept. d'Inf. et de Recherche Oper., Montreal Univ., Que.

Published: October 2012

Recurrent neural networks (RNNs) are powerful tools for mapping input sequences to output sequences in tasks like recognition, production, and prediction.
Training RNNs becomes harder when the dependencies in the input/output sequences span long intervals: gradient-based learning faces an increasingly difficult problem as the duration of the dependencies to be captured grows.
The analysis exposes a trade-off between efficient learning by gradient descent and latching onto information for long periods, which motivates considering alternatives to standard gradient descent.

Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. We show why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a trade-off between efficient learning by gradient descent and latching on information for long periods. Based on an understanding of this problem, alternatives to standard gradient descent are considered.
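The trade-off described in the abstract can be illustrated with a minimal sketch. It assumes a single scalar tanh unit, h_t = tanh(w * h_{t-1}), with a hypothetical weight w = 2.0 and start state h0 = 0.5; the function name `jacobian_product` and these values are illustrative choices, not the paper's experimental setup. With w > 1 the unit settles into a stable nonzero state (it "latches"), but each step's derivative near that state is below 1, so the derivative of the final state with respect to the initial state shrinks rapidly as the number of steps grows.

```python
import math


def jacobian_product(w, h0, steps):
    """Return (h_T, dh_T/dh_0) for the scalar recurrence h_t = tanh(w * h_{t-1}).

    Illustrative helper (an assumption for this sketch, not from the paper).
    """
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh(w * h_prev)^2)
        grad *= w * (1.0 - h * h)
    return h, grad


# Longer horizons keep the latched state but lose the gradient signal.
for steps in (1, 5, 10, 20, 50):
    h, grad = jacobian_product(w=2.0, h0=0.5, steps=steps)
    print(f"T={steps:3d}  h_T={h:.4f}  dh_T/dh_0={grad:.3e}")
```

The printed state stays near the same latched value while the derivative falls by orders of magnitude, which is the sense in which robust latching and efficient gradient-based learning pull against each other in this toy setting.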

DOI: http://dx.doi.org/10.1109/72.279181

Publication Analysis

Top Keywords

gradient descent, recurrent neural, neural networks, learning long-term, long-term dependencies, gradient, dependencies gradient, descent difficult, difficult recurrent, networks map

Similar Publications

Tensor neural networks for high-dimensional Fokker-Planck equations.

Neural Netw

January 2025

Division of Applied Mathematics, Brown University, Providence, RI 02912, USA; Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, Richland, WA, United States.

Taorui Wang Zheyuan Hu Kenji Kawaguchi Zhongqiang Zhang George Em Karniadakis

We solve high-dimensional steady-state Fokker-Planck equations on the whole space by applying tensor neural networks. The tensor networks are a linear combination of tensor products of one-dimensional feedforward networks or a linear combination of several selected radial basis functions. The use of tensor feedforward networks allows us to efficiently exploit auto-differentiation (in physical variables) in major Python packages while using radial basis functions can fully avoid auto-differentiation, which is rather expensive in high dimensions.

Flexible task abstractions emerge in linear networks with fast and bounded units.

ArXiv

January 2025

Kai Sandbrink Jan P Bauer Alexandra M Proca Andrew M Saxe Christopher Summerfield

Animals survive in dynamic environments changing at arbitrary timescales, but such data distribution shifts are a challenge to neural networks. To adapt to change, neural systems may change a large number of parameters, which is a slow process involving forgetting past information. In contrast, animals leverage distribution changes to segment their stream of experience into tasks and associate them with internal task abstracts.

Design for light-based spherical aberration correction of ultrafast electron microscopes.

Opt Express

January 2025

Marius Constantin Chirita Mihaila Martin Kozák

We theoretically demonstrate that ponderomotive interactions near the electron cross-over can be used for aberration correction in ultrafast electron microscopes. Highly magnified electron shadow images from SiN thin films are utilized to visualize the distortions induced by spherical aberrations. Our simulations of electron-light interactions indicate that spherical aberrations can be compensated resulting in an aberration-free angle of 8.

High space-bandwidth product DMD holographic display using gradient descent optimization.

Opt Express

December 2024

Jingyi Pei Chenliang Chang Xian Ding Bo Dai Qi Wang

Digital micromirror devices (DMDs), owing to their rapid refresh rates and ability to shape particular optical patterns, are key tools for holographic 3D near-eye displays. However, relying on a single-sideband (SSB) filter to eliminate crosstalk from zero-order and conjugate noise leads to an enormous decrease in the utilization of spatial bandwidth product (SBP). In this paper, we develop a new strategy for the binary hologram optimization framework to enable the full utilization of SBP of DMD holographic display by minimizing conjugate noise.

Associations between common genetic variants and income provide insights about the socio-economic health gradient.

Nat Hum Behav

January 2025

Department of Economics, School of Business and Economics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands.

Hyeokmoon Kweon Casper A P Burik Yuchen Ning Rafael Ahlskog Charley Xia

We conducted a genome-wide association study on income among individuals of European descent (N = 668,288) to investigate the relationship between socio-economic status and health disparities. We identified 162 genomic loci associated with a common genetic factor underlying various income measures, all with small effect sizes (the Income Factor). Our polygenic index captures 1-5% of income variance, with only one fourth due to direct genetic effects.
