We propose a new statistical generative model for spatiotemporal video segmentation. The objective is to partition a video sequence into homogeneous segments that can serve as "building blocks" for semantic video segmentation. The baseline framework is a Gaussian mixture model (GMM)-based video modeling approach operating in a six-dimensional spatiotemporal feature space. Specifically, we introduce the concept of frame saliency to quantify the relevance of a video frame to GMM-based spatiotemporal video modeling. This allows a small set of salient frames to be used for model training, reducing data redundancy and irrelevance. A modified expectation-maximization (EM) algorithm is developed for simultaneous GMM training and frame saliency estimation, and the frames with the highest saliency values are extracted to refine the GMM estimation for video segmentation. Interestingly, frame saliency can also reveal certain object behaviors, which makes the proposed method applicable to other frame-related video analysis tasks, such as key-frame extraction and video skimming. Experiments on real videos demonstrate the effectiveness and efficiency of the proposed method.
DOI: http://dx.doi.org/10.1109/tip.2007.908283
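To make the modeling concrete, here is a minimal sketch of the baseline six-dimensional spatiotemporal GMM, assuming scikit-learn's GaussianMixture as a stand-in for the paper's modified EM; the per-frame saliency proxy (mean maximum responsibility) is an illustrative assumption, not the paper's estimator.

```python
# Minimal sketch: 6-D spatiotemporal GMM segmentation with a
# hypothetical frame-saliency proxy (NOT the paper's modified EM).
import numpy as np
from sklearn.mixture import GaussianMixture

def video_to_features(video):
    """Turn a (T, H, W, 3) video into N x 6 rows of (x, y, t, R, G, B)."""
    T, H, W, _ = video.shape
    t, y, x = np.meshgrid(np.arange(T), np.arange(H), np.arange(W),
                          indexing="ij")
    coords = np.stack([x, y, t], axis=-1).reshape(-1, 3)
    return np.hstack([coords, video.reshape(-1, 3)]).astype(np.float64)

def segment_video(video, n_segments=5, seed=0):
    feats = video_to_features(video)
    gmm = GaussianMixture(n_components=n_segments, covariance_type="full",
                          random_state=seed).fit(feats)
    labels = gmm.predict(feats)  # per-pixel segment labels
    # Assumed saliency proxy: how confidently the mixture explains
    # each frame's pixels, averaged over the frame.
    resp = gmm.predict_proba(feats).max(axis=1)
    saliency = resp.reshape(video.shape[0], -1).mean(axis=1)
    return labels.reshape(video.shape[:3]), saliency
```

Under this sketch, the paper's refinement step would correspond to keeping only the highest-saliency frames and refitting the mixture on their pixels.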
J Opt Soc Am A Opt Image Sci Vis
August 2024
Compressed ultrafast photography (CUP) is a high-speed imaging technique with a frame rate of up to ten trillion frames per second (fps) and a sequence depth of hundreds of frames, making it a powerful tool for investigating ultrafast processes. However, since the reconstruction process is an ill-posed problem, image reconstruction becomes more difficult as the number of reconstructed frames and the number of pixels per frame increase.
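For intuition about why the inverse problem is ill-posed, the following is a minimal sketch of the standard CUP forward model (spatial encoding, temporal shearing, integration); the mask statistics and the one-row-per-frame shear step are illustrative assumptions, not this paper's setup.

```python
# Minimal sketch of the CUP forward model: the dynamic scene is masked
# by a static pseudorandom code, sheared one row per frame (streak
# camera), and summed into a single 2-D snapshot.
import numpy as np

def cup_forward(scene, code):
    """scene: (T, H, W) dynamic scene; code: (H, W) binary mask.

    Returns a single (H + T - 1, W) streak-camera measurement.
    """
    T, H, W = scene.shape
    y = np.zeros((H + T - 1, W))
    for t in range(T):
        y[t:t + H, :] += code * scene[t]  # encode, shear by t rows, integrate
    return y

rng = np.random.default_rng(0)
scene = rng.random((16, 32, 32))                 # toy 16-frame scene
code = (rng.random((32, 32)) > 0.5).astype(float)
snapshot = cup_forward(scene, code)              # one 2-D image encodes 16 frames
```

Recovering all T frames from this single 2-D snapshot is heavily underdetermined, which is why reconstruction difficulty grows with both the frame count and the per-frame pixel count.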
Biomed Phys Eng Express
January 2025
Shandong Normal University, Jinan, Shandong 250014, China.
In the medical field, endoscopic video analysis is crucial for disease diagnosis and minimally invasive surgery. Endoscopic Foundation Models (Endo-FM) use large-scale self-supervised pre-training on endoscopic video data and leverage video transformer models to capture long-range spatiotemporal dependencies. However, detecting complex lesions such as gastrointestinal metaplasia (GIM) in endoscopic videos remains challenging due to unclear boundaries and indistinct features, and Endo-FM has not demonstrated strong performance on this task.
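As a rough illustration of how a video transformer captures such long-range spatiotemporal dependencies, here is a minimal PyTorch sketch; it is not Endo-FM's actual architecture, and the tubelet size, depth, and binary GIM head are illustrative assumptions.

```python
# Minimal sketch of a video transformer: 3-D tubelet embedding followed
# by joint space-time self-attention (illustrative, not Endo-FM itself).
import torch
import torch.nn as nn

class TinyVideoTransformer(nn.Module):
    def __init__(self, dim=192, depth=4, heads=3, patch=16):
        super().__init__()
        # 3-D "tubelet" embedding: 2 frames x 16 x 16 pixels per token.
        self.embed = nn.Conv3d(3, dim, kernel_size=(2, patch, patch),
                               stride=(2, patch, patch))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, 2)  # hypothetical GIM vs. non-GIM head

    def forward(self, video):          # video: (B, 3, T, H, W)
        tokens = self.embed(video)     # (B, dim, T', H', W')
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens)  # attention spans all space-time tokens
        return self.head(tokens.mean(dim=1))

x = torch.randn(1, 3, 8, 224, 224)     # one 8-frame clip
logits = TinyVideoTransformer()(x)     # (1, 2)
```

Because every token attends to every other token across both space and time, dependencies between distant frames are modeled directly rather than through stacked local operations.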
Sensors (Basel)
January 2025
Faculty of Land and Resources Engineering, Kunming University of Science and Technology, Kunming 650093, China.
To address missed detections caused by insufficient shape and texture features and blurred boundaries in existing detection methods, this paper introduces a novel moving-vehicle detection approach for satellite videos. The proposed method leverages frame differencing and convolution to effectively integrate spatiotemporal information. First, a frame difference module (FDM) is designed that combines frame differencing with convolution.
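A minimal sketch of how a frame-difference module might combine frame differencing with convolution is shown below; the exact FDM design is not reproduced here, and the layer sizes are illustrative assumptions.

```python
# Minimal sketch of a frame-difference module: absolute frame
# differences supply the temporal motion cue, and a small convolutional
# stack refines it spatially into a per-pixel motion map.
import torch
import torch.nn as nn

class FrameDifferenceModule(nn.Module):
    def __init__(self, channels=3, hidden=16):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 3, padding=1),
            nn.Sigmoid(),               # per-pixel motion saliency in [0, 1]
        )

    def forward(self, prev_frame, curr_frame):
        diff = torch.abs(curr_frame - prev_frame)  # temporal cue
        return self.refine(diff)                   # spatial refinement

prev = torch.rand(1, 3, 128, 128)
curr = torch.rand(1, 3, 128, 128)
motion_mask = FrameDifferenceModule()(prev, curr)  # (1, 1, 128, 128)
```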
Entropy (Basel)
January 2025
Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China.
Six-Degrees-of-Freedom (6DoF) video with arbitrary translation represents a transitional stage toward fully immersive video, allowing users to freely switch viewpoints for a 3D scene experience. However, the increased freedom of movement introduces new distortions that significantly impact human visual perception quality. It is therefore crucial to explore quality assessment (QA) to validate its application feasibility.
Commun Biol
January 2025
Western Institute for Neuroscience, Western University, London, ON, Canada.
Our brain seamlessly integrates distinct sensory information to form a coherent percept. However, when real-world audiovisual events are perceived, the specific brain regions and timing involved in processing different levels of information remain underinvestigated. To address this, we curated naturalistic videos and recorded functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) data while participants viewed videos with accompanying sounds.