Given a query image containing the object of interest (OOI), we propose a novel learning framework for retrieving relevant frames from an input video sequence. While techniques based on object matching have been applied to this task, their performance is typically limited by their inability to handle variations in the visual appearance of the OOI across video frames. Our proposed framework can be viewed as a weakly supervised approach, which requires only a small number of (randomly selected) relevant and irrelevant frames from the input video to achieve satisfactory retrieval performance. By utilizing the frame-level label information of these video frames together with the query image, we propose a novel query-adaptive multiple instance learning algorithm, which exploits the visual appearance information of the OOI from both the query and the aforementioned video frames. As a result, the derived learning model exhibits additional discriminating ability when retrieving relevant instances. Experiments on two real-world video data sets confirm the effectiveness and robustness of our proposed approach.

Source
http://dx.doi.org/10.1109/TIP.2015.2403236

Similar Publications

To address the challenges of missed detections caused by insufficient shape and texture features and blurred boundaries in existing detection methods, this paper introduces a novel moving vehicle detection approach for satellite videos. The proposed method leverages frame difference and convolution to effectively integrate spatiotemporal information. First, a frame difference module (FDM) is designed, combining frame difference and convolution.

SSIM over MSE: A new perspective for video anomaly detection.

Neural Netw

January 2025

Department of Computing, Macquarie University, Sydney, 4627345, New South Wales, Australia.

Video anomaly detection plays a crucial role in ensuring public safety. Its goal is to detect abnormal patterns contained in video frames. Most existing models distinguish the anomalies based on the Mean Squared Error (MSE), which is hard to align with human perception, resulting in discrepancies between model-detected anomalies and those recognized by humans.

Instance segmentation of surgical instruments is a long-standing research problem, crucial for the development of many applications for computer-assisted surgery. This problem is commonly tackled via fully-supervised training of deep learning models, requiring expensive pixel-level annotations to train. In this work, we develop a framework for instance segmentation not relying on spatial annotations for training.

When undergoing or about to undergo a needle-related procedure, most people are not aware of the adverse emotional and physical reactions (so-called vasovagal reactions; VVR) that might occur. Thus, rather than relying on self-report measurements, we investigate whether we can predict VVR levels from a video sequence containing facial information recorded during blood donation. We filmed 287 blood donors throughout the blood donation procedure, obtaining 1945 videos for data analysis.

Blink detection is considered a useful indicator both for clinical conditions and drowsiness state. In this work, we propose and compare deep learning architectures for the task of detecting blinks in video frame sequences. The first step is the training and application of an eye detector that extracts the eye regions from each video frame.
