Predictive coding, currently a highly influential theory in neuroscience, has not yet been widely adopted in machine learning. In this work, we transform the seminal model of Rao and Ballard (1999) into a modern deep learning framework while remaining maximally faithful to the original schema. The resulting network, PreCNet, is tested on a widely used next-frame video prediction benchmark consisting of images from an urban environment recorded by a car-mounted camera, and achieves state-of-the-art performance. Performance on all measures (MSE, PSNR, and SSIM) was further improved when a larger training set (2M images from BDD100k) was used, pointing to the limitations of the KITTI training set. This work demonstrates that an architecture carefully based on a neuroscience model, without being explicitly tailored to the task at hand, can exhibit exceptional performance.
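The core idea the abstract builds on can be illustrated with a minimal sketch of a single Rao and Ballard (1999) predictive coding layer: a latent representation is iteratively updated so that its top-down reconstruction matches the input, driven by the bottom-up prediction error. All names here (`U`, `r`, `lr`, `n_steps`) are illustrative assumptions for exposition, not taken from PreCNet itself.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)          # observed input (e.g., a small image patch)
U = rng.normal(size=(16, 8))     # top-down generative weights (assumed fixed here)
r = np.zeros(8)                  # latent representation, initialized at zero

lr = 0.01                        # step size for iterative inference
n_steps = 200
mse0 = float(np.mean(x ** 2))    # reconstruction error before inference (r = 0)

for _ in range(n_steps):
    e = x - U @ r                # bottom-up prediction error
    r += lr * (U.T @ e)          # gradient step on 0.5 * ||x - U r||^2

mse = float(np.mean((x - U @ r) ** 2))
print(mse < mse0)                # prints: True — inference reduced the error
```

PreCNet stacks such error-driven layers into a deep hierarchy and learns the weights end-to-end; this sketch only shows the inference loop that makes the scheme "predictive."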

Source
http://dx.doi.org/10.1109/TNNLS.2023.3240857


Similar Publications

Adaptation optimizes sensory encoding for future stimuli.

PLoS Comput Biol

January 2025

Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Sensory neurons continually adapt their response characteristics according to recent stimulus history. However, it is unclear how such a reactive process can benefit the organism. Here, we test the hypothesis that adaptation actually acts proactively in the sense that it optimally adjusts sensory encoding for future stimuli.


A spatiotemporal style transfer algorithm for dynamic visual stimulus generation.

Nat Comput Sci

December 2024

Department of Neural Dynamics and Magnetoencephalography, Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany.

Understanding how visual information is encoded in biological and artificial systems often requires the generation of appropriate stimuli to test specific hypotheses, but available methods for video generation are scarce. Here we introduce the spatiotemporal style transfer (STST) algorithm, a dynamic visual stimulus generation framework that allows the manipulation and synthesis of video stimuli for vision research. We show how stimuli can be generated that match the low-level spatiotemporal features of their natural counterparts, but lack their high-level semantic features, providing a useful tool to study object recognition.


Current semi-supervised video object segmentation (VOS) methods often employ the entire features of one frame to predict object masks and update memory. This introduces significant redundant computations. To reduce redundancy, we introduce a Region Aware Video Object Segmentation (RAVOS) approach, which predicts regions of interest (ROIs) for efficient object segmentation and memory storage.


Causal Factor Disentanglement for Few-Shot Domain Adaptation in Video Prediction.

Entropy (Basel)

November 2023

Language Intelligence and Information Retrieval (LIIR) Lab, Department of Computer Science KU Leuven, 3001 Leuven, Belgium.

An important challenge in machine learning is performing accurately when few training samples are available from the target distribution. If a large number of training samples from a related distribution are available, transfer learning can be used to improve performance. This paper investigates how to do transfer learning more effectively when the source and target distributions are related through a Sparse Mechanism Shift, for the application of next-frame prediction.


Diffusion Probabilistic Modeling for Video Generation.

Entropy (Basel)

October 2023

Department of Computer Science, University of California, Irvine, CA 92697, USA.

Denoising diffusion probabilistic models are a promising new class of generative models that mark a milestone in high-quality image generation. This paper showcases their ability to sequentially generate video, surpassing prior methods in perceptual and probabilistic forecasting metrics. We propose an autoregressive, end-to-end optimized video diffusion model inspired by recent advances in neural video compression.

