We propose a new statistical generative model for spatiotemporal video segmentation. The objective is to partition a video sequence into homogeneous segments that can serve as "building blocks" for semantic video segmentation. The baseline framework is a Gaussian mixture model (GMM)-based video modeling approach operating in a six-dimensional spatiotemporal feature space. Specifically, we introduce the concept of frame saliency to quantify the relevance of a video frame to GMM-based spatiotemporal video modeling. This allows a small set of salient frames to be used for model training, reducing data redundancy and irrelevance. A modified expectation-maximization (EM) algorithm is developed for simultaneous GMM training and frame saliency estimation, and the frames with the highest saliency values are extracted to refine the GMM estimation for video segmentation. Interestingly, frame saliency can also reveal certain object behaviors, which makes the proposed method applicable to other frame-related video analysis tasks, such as key-frame extraction and video skimming. Experiments on real videos demonstrate the effectiveness and efficiency of the proposed method.
DOI: http://dx.doi.org/10.1109/tip.2007.908283
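To make the modeling concrete, here is a minimal sketch of the baseline six-dimensional spatiotemporal GMM, assuming scikit-learn's GaussianMixture as a stand-in for the paper's modified EM; the per-frame saliency proxy (mean maximum responsibility) is an illustrative assumption, not the paper's estimator.

```python
# Minimal sketch: 6-D spatiotemporal GMM segmentation with a
# hypothetical frame-saliency proxy (NOT the paper's modified EM).
import numpy as np
from sklearn.mixture import GaussianMixture

def video_to_features(video):
    """Turn a (T, H, W, 3) video into N x 6 rows of (x, y, t, R, G, B)."""
    T, H, W, _ = video.shape
    t, y, x = np.meshgrid(np.arange(T), np.arange(H), np.arange(W),
                          indexing="ij")
    coords = np.stack([x, y, t], axis=-1).reshape(-1, 3)
    return np.hstack([coords, video.reshape(-1, 3)]).astype(np.float64)

def segment_video(video, n_segments=5, seed=0):
    feats = video_to_features(video)
    gmm = GaussianMixture(n_components=n_segments, covariance_type="full",
                          random_state=seed).fit(feats)
    labels = gmm.predict(feats)  # per-pixel segment labels
    # Assumed saliency proxy: how confidently the mixture explains
    # each frame's pixels, averaged over the frame.
    resp = gmm.predict_proba(feats).max(axis=1)
    saliency = resp.reshape(video.shape[0], -1).mean(axis=1)
    return labels.reshape(video.shape[:3]), saliency
```

Under this sketch, the paper's refinement step would correspond to keeping only the highest-saliency frames and refitting the mixture on their pixels.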
J Opt Soc Am A Opt Image Sci Vis
August 2024
Compressed ultrafast photography (CUP) is a high-speed imaging technique with a frame rate of up to ten trillion frames per second (fps) and a sequence depth of hundreds of frames, making it a powerful tool for investigating ultrafast processes. However, since the reconstruction process is an ill-posed problem, image reconstruction becomes more difficult as the number of reconstructed frames and the number of pixels per frame increase.
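For intuition about why the inverse problem is ill-posed, the following is a minimal sketch of the standard CUP forward model (spatial encoding, temporal shearing, integration); the mask statistics and the one-row-per-frame shear step are illustrative assumptions, not this paper's setup.

```python
# Minimal sketch of the CUP forward model: the dynamic scene is masked
# by a static pseudorandom code, sheared one row per frame (streak
# camera), and summed into a single 2-D snapshot.
import numpy as np

def cup_forward(scene, code):
    """scene: (T, H, W) dynamic scene; code: (H, W) binary mask.

    Returns a single (H + T - 1, W) streak-camera measurement.
    """
    T, H, W = scene.shape
    y = np.zeros((H + T - 1, W))
    for t in range(T):
        y[t:t + H, :] += code * scene[t]  # encode, shear by t rows, integrate
    return y

rng = np.random.default_rng(0)
scene = rng.random((16, 32, 32))                 # toy 16-frame scene
code = (rng.random((32, 32)) > 0.5).astype(float)
snapshot = cup_forward(scene, code)              # one 2-D image encodes 16 frames
```

Recovering all T frames from this single 2-D snapshot is heavily underdetermined, which is why reconstruction difficulty grows with both the frame count and the per-frame pixel count.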
Biomed Phys Eng Express
January 2025
Shandong Normal University, Jinan, Shandong 250014, China.
In the medical field, endoscopic video analysis is crucial for disease diagnosis and minimally invasive surgery. Endoscopic Foundation Models (Endo-FM) use large-scale self-supervised pre-training on endoscopic video data and leverage video transformer models to capture long-range spatiotemporal dependencies. However, detecting complex lesions such as gastrointestinal metaplasia (GIM) in endoscopic videos remains challenging due to unclear boundaries and indistinct features, and Endo-FM has not demonstrated strong performance on this task.
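As a rough illustration of how a video transformer captures such long-range spatiotemporal dependencies, here is a minimal PyTorch sketch; it is not Endo-FM's actual architecture, and the tubelet size, depth, and binary GIM head are illustrative assumptions.

```python
# Minimal sketch of a video transformer: 3-D tubelet embedding followed
# by joint space-time self-attention (illustrative, not Endo-FM itself).
import torch
import torch.nn as nn

class TinyVideoTransformer(nn.Module):
    def __init__(self, dim=192, depth=4, heads=3, patch=16):
        super().__init__()
        # 3-D "tubelet" embedding: 2 frames x 16 x 16 pixels per token.
        self.embed = nn.Conv3d(3, dim, kernel_size=(2, patch, patch),
                               stride=(2, patch, patch))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, 2)  # hypothetical GIM vs. non-GIM head

    def forward(self, video):          # video: (B, 3, T, H, W)
        tokens = self.embed(video)     # (B, dim, T', H', W')
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens)  # attention spans all space-time tokens
        return self.head(tokens.mean(dim=1))

x = torch.randn(1, 3, 8, 224, 224)     # one 8-frame clip
logits = TinyVideoTransformer()(x)     # (1, 2)
```

Because every token attends to every other token across both space and time, dependencies between distant frames are modeled directly rather than through stacked local operations.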
Sensors (Basel)
January 2025
Faculty of Land and Resources Engineering, Kunming University of Science and Technology, Kunming 650093, China.
To address missed detections caused by insufficient shape and texture features and blurred boundaries in existing detection methods, this paper introduces a novel moving-vehicle detection approach for satellite videos. The proposed method leverages frame differencing and convolution to effectively integrate spatiotemporal information. First, a frame difference module (FDM) is designed that combines frame differencing with convolution.
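A minimal sketch of how a frame-difference module might combine frame differencing with convolution is shown below; the exact FDM design is not reproduced here, and the layer sizes are illustrative assumptions.

```python
# Minimal sketch of a frame-difference module: absolute frame
# differences supply the temporal motion cue, and a small convolutional
# stack refines it spatially into a per-pixel motion map.
import torch
import torch.nn as nn

class FrameDifferenceModule(nn.Module):
    def __init__(self, channels=3, hidden=16):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 3, padding=1),
            nn.Sigmoid(),               # per-pixel motion saliency in [0, 1]
        )

    def forward(self, prev_frame, curr_frame):
        diff = torch.abs(curr_frame - prev_frame)  # temporal cue
        return self.refine(diff)                   # spatial refinement

prev = torch.rand(1, 3, 128, 128)
curr = torch.rand(1, 3, 128, 128)
motion_mask = FrameDifferenceModule()(prev, curr)  # (1, 1, 128, 128)
```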
Entropy (Basel)
January 2025
Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China.
Six-Degrees-of-Freedom (6DoF) video with arbitrary translation represents a transitional stage toward fully immersive video, allowing users to freely switch viewpoints for a 3D scene experience. However, the increased freedom of movement introduces new distortions that significantly impact human visual perception quality. It is therefore crucial to explore quality assessment (QA) to validate its application feasibility.
Commun Biol
January 2025
Western Institute for Neuroscience, Western University, London, ON, Canada.
Our brain seamlessly integrates distinct sensory information to form a coherent percept. However, when real-world audiovisual events are perceived, the specific brain regions and timing involved in processing different levels of information remain underinvestigated. To address this, we curated naturalistic videos and recorded functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) data while participants viewed videos with accompanying sounds.