Audiovisual Moments in Time: A large-scale annotated dataset of audiovisual actions.

PLoS One

Computational Neuroscience and Cognitive Robotics Centre, University of Birmingham, Birmingham, United Kingdom.

Published: April 2024

We present Audiovisual Moments in Time (AVMIT), a large-scale dataset of audiovisual action events. In an extensive annotation task, 11 participants labelled a subset of 3-second audiovisual videos from the Moments in Time (MIT) dataset. For each trial, participants assessed whether the labelled audiovisual action event was present and whether it was the most prominent feature of the video. The dataset includes annotations for 57,177 audiovisual videos, each independently evaluated by 3 of 11 trained participants. From this initial collection, we created a curated test set of 16 distinct action classes, with 60 videos each (960 videos). We also offer 2 sets of pre-computed audiovisual feature embeddings, using VGGish/YamNet for audio data and VGG16/EfficientNetB0 for visual data, thereby lowering the barrier to entry for audiovisual deep neural network (DNN) research. We explored the advantages of AVMIT annotations and feature embeddings for improving performance on audiovisual event recognition. A series of 6 Recurrent Neural Networks (RNNs) was trained on either AVMIT-filtered audiovisual events or modality-agnostic events from MIT, and then tested on our audiovisual test set. In all RNNs, top-1 accuracy increased by 2.71-5.94% when training exclusively on audiovisual events, a benefit that outweighed even a three-fold increase in training data. Additionally, we introduce the Supervised Audiovisual Correspondence (SAVC) task, in which a classifier must discern whether audio and visual streams correspond to the same action label. We trained 6 RNNs on the SAVC task, with or without AVMIT filtering, to explore whether AVMIT is helpful for cross-modal learning. In all RNNs, accuracy improved by 2.09-19.16% with AVMIT-filtered data. We anticipate that the newly annotated AVMIT dataset will serve as a valuable resource for research and comparative experiments involving computational models and human participants, specifically when addressing research questions where audiovisual correspondence is of critical importance.
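The abstract does not spell out how SAVC training pairs are constructed or how the pre-computed embeddings are consumed; the minimal sketch below illustrates one plausible setup, in which matching and mismatched audio/visual embedding sequences are paired and classified with a small GRU. The random stand-in embeddings, the array shapes, and the names make_savc_pairs and CorrespondenceGRU are illustrative assumptions, not the authors' released code or the actual AVMIT embedding dimensions.

```python
# Hypothetical sketch of a Supervised Audiovisual Correspondence (SAVC) setup:
# pair pre-computed audio (VGGish-like) and visual (VGG16-like) embedding
# sequences, label each pair as corresponding (same action class) or not,
# and train a small GRU classifier. Shapes and names are assumptions.
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)

# Stand-in embeddings: 100 clips, 3 audio frames of 128-d and 9 visual
# frames of 512-d per 3-second clip, with 16 action classes.
audio = rng.normal(size=(100, 3, 128)).astype(np.float32)
visual = rng.normal(size=(100, 9, 512)).astype(np.float32)
labels = rng.integers(0, 16, size=100)

def make_savc_pairs(audio, visual, labels, rng):
    """Positive pair: audio and visual from the same clip.
    Negative pair: visual swapped with a clip of a different action label."""
    X_a, X_v, y = [], [], []
    for i in range(len(labels)):
        X_a.append(audio[i]); X_v.append(visual[i]); y.append(1)   # positive
        j = rng.integers(0, len(labels))
        while labels[j] == labels[i]:                               # negative
            j = rng.integers(0, len(labels))
        X_a.append(audio[i]); X_v.append(visual[j]); y.append(0)
    return (torch.tensor(np.stack(X_a)), torch.tensor(np.stack(X_v)),
            torch.tensor(y, dtype=torch.float32))

class CorrespondenceGRU(nn.Module):
    """Encode each modality with its own GRU, then classify the fused summary."""
    def __init__(self, a_dim=128, v_dim=512, hidden=64):
        super().__init__()
        self.a_gru = nn.GRU(a_dim, hidden, batch_first=True)
        self.v_gru = nn.GRU(v_dim, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, a, v):
        _, ha = self.a_gru(a)                 # final hidden state per modality
        _, hv = self.v_gru(v)
        fused = torch.cat([ha[-1], hv[-1]], dim=-1)
        return self.head(fused).squeeze(-1)   # logit: do the streams correspond?

Xa, Xv, y = make_savc_pairs(audio, visual, labels, rng)
model = CorrespondenceGRU()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for epoch in range(5):
    opt.zero_grad()
    loss = loss_fn(model(Xa, Xv), y)
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: loss={loss.item():.3f}")
```

In the paper's setting, the stand-in arrays would be replaced by the released AVMIT embeddings (with or without AVMIT filtering of the training pool), allowing the same correspondence comparison reported in the abstract to be reproduced.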


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10984512 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0301098 (PLOS)

Publication Analysis

Top Keywords

audiovisual: 15
moments time: 12
audiovisual moments: 8
dataset audiovisual: 8
audiovisual action: 8
audiovisual videos: 8
test set: 8
feature embeddings: 8
audiovisual events: 8
audiovisual correspondence: 8

Similar Publications

Introduction: Traumatic injuries are a significant public health concern globally, resulting in substantial mortality, hospitalisation and healthcare burden. Despite the establishment of specialised trauma centres, there remains considerable variability in trauma-care practices and outcomes, particularly in the initial phase of trauma resuscitation in the trauma bay. This stage is prone to preventable errors leading to adverse events (AEs) that can impact patient outcomes.


Travel restrictions during the novel coronavirus, SARS-CoV-2 (COVID-19) public health emergency affected the U.S. Food and Drug Administration's (FDA) ability to conduct on-site bioavailability/bioequivalence (BA/BE) and Good Laboratory Practice (GLP) nonclinical inspections.


A comprehensive analysis of everyday sound perception can be achieved using Electroencephalography (EEG) with the concurrent acquisition of information about the environment. While extensive research has been dedicated to speech perception, the complexities of auditory perception within everyday environments, specifically the types of information and the key features to extract, remain less explored. Our study aims to systematically investigate the relevance of different feature categories: discrete sound-identity markers, general cognitive state information, and acoustic representations, including discrete sound onset, the envelope, and mel-spectrogram.


Audio-visual concert performances synchronize audience's heart rates.

Ann N Y Acad Sci

January 2025

Department of Neuropsychology and Psychopharmacology, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands.

People enjoy engaging with music. Live music concerts provide an excellent option to investigate real-world music experiences and, at the same time, to use neurophysiological synchrony to assess dynamic engagement. In the current study, we assessed engagement in a live concert setting using synchrony of cardiorespiratory measures, comparing inter-subject and stimulus-response correlation and phase coherence.


Background: Understanding ICU nurses' experiences in caring for patients with intellectual developmental disabilities is crucial. Insights can inform supportive measures and training programs to enhance nurse well-being and patient population-specific outcomes.

Objective: The primary objective of this study was to explore and understand the lived experiences of nurses caring for patients with intellectual developmental disabilities.

