AI Article Synopsis

  • Sound event localization and detection (SELD) is a core machine-listening task: identifying sound events and locating their sources in multichannel audio recordings.
  • Previous research has mostly examined time-frequency and spatio-temporal correlations separately, which limits the performance of those methods in real-world situations.
  • This paper introduces the Spatio-Temporal-Frequency Fusion Network (STFF-Net), which jointly learns features across the spatial, temporal, and frequency domains using 3D convolutions with a parameter-free attention mechanism, and reports better performance than existing methods on three benchmark datasets.
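
The fusion step in STFF-Net is described as 3D convolution followed by a parameter-free attention mechanism. The source does not name that mechanism, so the sketch below assumes a SimAM-style energy weighting, which has no learnable parameters. The function name `parameter_free_attention`, the `eps` value, and the 5-D layout (batch, channels, spatial, time, frequency) are illustrative assumptions, and the 3D convolution itself is omitted.

```python
import numpy as np

def parameter_free_attention(x, eps=1e-4):
    """SimAM-style weighting (assumed; the source does not name the mechanism).

    x: array shaped (batch, channels, spatial, time, freq) -- assumed layout.
    Each position is scaled by a sigmoid of how far it deviates from the
    mean of its own feature map, so no trainable weights are involved.
    """
    axes = (2, 3, 4)
    n = x.shape[2] * x.shape[3] * x.shape[4] - 1
    # Squared deviation from the per-map mean, and the per-map variance.
    d = (x - x.mean(axis=axes, keepdims=True)) ** 2
    var = d.sum(axis=axes, keepdims=True) / n
    # Inverse energy: larger for positions that stand out from their map.
    energy_inv = d / (4.0 * (var + eps)) + 0.5
    weights = 1.0 / (1.0 + np.exp(-energy_inv))
    return x * weights

rng = np.random.default_rng(0)
features = rng.standard_normal((2, 8, 4, 10, 16))
refined = parameter_free_attention(features)
```

Because the weights lie strictly between 0 and 1, the output keeps the input's shape and sign and only rescales magnitudes; in a residual block this refinement would typically sit between the convolutions and the skip connection.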

Article Abstract

Sound event localization and detection (SELD) is a crucial component of machine listening that aims to simultaneously identify and localize sound events in multichannel audio recordings. This task demands an integrated analysis of spatial, temporal, and frequency domains to accurately characterize sound events. The spatial domain pertains to the varying acoustic signals captured by multichannel microphones, which are essential for determining the location of sound sources. However, the majority of recent studies have focused on time-frequency correlations and spatio-temporal correlations separately, leading to inadequate performance in real-life scenarios. In this paper, we propose a novel SELD method that utilizes the newly developed Spatio-Temporal-Frequency Fusion Network (STFF-Net) to jointly learn comprehensive features across spatial, temporal, and frequency domains of sound events. The backbone of our STFF-Net is the Enhanced-3D (E3D) residual block, which combines 3D convolutions with a parameter-free attention mechanism to capture and refine the intricate correlations among these domains. Furthermore, our method incorporates the multi-ACCDOA format to effectively handle homogeneous overlaps between sound events. During the evaluation, we conduct extensive experiments on three de facto benchmark datasets, and our results demonstrate that the proposed SELD method significantly outperforms current state-of-the-art approaches.
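
To make the multi-ACCDOA idea concrete, here is a minimal decoding sketch. In the ACCDOA family of formats, each output is a Cartesian vector whose length encodes event activity and whose direction encodes the direction of arrival; the multi-track variant gives every class several such vectors, so two simultaneous events of the same class (a homogeneous overlap) can occupy different tracks. The function name `decode_multi_accdoa`, the 0.5 threshold, and the (tracks, classes, 3) layout are assumptions for illustration, not the paper's implementation, and the merging of near-duplicate tracks that a full decoder would usually perform is omitted.

```python
import numpy as np

def decode_multi_accdoa(pred, threshold=0.5):
    """Decode one frame of multi-ACCDOA-style output (illustrative sketch).

    pred: array shaped (tracks, classes, 3) of Cartesian vectors (assumed layout).
    Returns (track, class, azimuth_deg, elevation_deg) for each active entry.
    """
    norms = np.linalg.norm(pred, axis=-1)  # vector length = activity
    events = []
    for track, cls in zip(*np.nonzero(norms > threshold)):
        x, y, z = pred[track, cls] / norms[track, cls]  # unit direction
        azimuth = np.degrees(np.arctan2(y, x))
        elevation = np.degrees(np.arcsin(np.clip(z, -1.0, 1.0)))
        events.append((int(track), int(cls), float(azimuth), float(elevation)))
    return events

# One frame with 3 tracks and 2 classes: class 1 is active on two tracks.
frame = np.zeros((3, 2, 3))
frame[0, 1] = [0.9, 0.0, 0.0]  # class 1, track 0, straight ahead
frame[1, 1] = [0.0, 0.8, 0.0]  # class 1, track 1, azimuth 90 degrees
frame[2, 0] = [0.1, 0.1, 0.0]  # class 0, below threshold, inactive
events = decode_multi_accdoa(frame)
```

Here both detected events belong to class 1 but come from different directions, which a single-vector-per-class format could not represent.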

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11436190
DOI: http://dx.doi.org/10.3390/s24186090

Publication Analysis

Top Keywords

sound events: 16
sound event: 8
event localization: 8
localization detection: 8
spatial temporal: 8
temporal frequency: 8
frequency domains: 8
seld method: 8
sound: 7
joint spatio-temporal-frequency: 4

Similar Publications

Background: Fragile X syndrome (FXS) is a leading known genetic cause of intellectual disability and autism spectrum disorders (ASD)-associated behaviors. A consistent and debilitating phenotype of FXS is auditory hypersensitivity that may lead to delayed language and high anxiety. Consistent with findings in FXS human studies, the mouse model of FXS, the Fmr1 knock out (KO) mouse, shows auditory hypersensitivity and temporal processing deficits.

High-energy nuclear collisions create a quark-gluon plasma, whose initial condition and subsequent expansion vary from event to event, impacting the distribution of the eventwise average transverse momentum [$P([p_{\mathrm{T}}])$]. Disentangling the contributions from fluctuations in the nuclear overlap size (geometrical component) and other sources at a fixed size (intrinsic component) remains a challenge. This problem is addressed by measuring the mean, variance, and skewness of $P([p_{\mathrm{T}}])$ in $^{208}$Pb+$^{208}$Pb and $^{129}$Xe+$^{129}$Xe collisions at $\sqrt{s_{NN}}$ = 5.

Evaluation of Peer Review of Percutaneous Coronary Intervention Operator Performance.

Circ Cardiovasc Qual Outcomes

January 2025

Division of Cardiology, Department of Medicine, University of Washington, Seattle (J.A.D., E.J.S., D.H.A.).

Background: Case-based peer review of percutaneous coronary intervention (PCI) is used by many hospitals for quality improvement and to make decisions regarding physician competency. However, there are no studies testing the reliability or validity of peer review for PCI performance evaluation.

Methods: We recruited interventional cardiologists from 12 Veterans Affairs Health System facilities throughout the United States to provide PCI cases for review.

Objective digital measurement of gamblers visiting gambling venues is conducted using cashless cards and facial recognition systems, but these methods are confined within a single gambling venue. Hence, we propose an objective digital measurement method using a transformer, a state-of-the-art machine learning approach, to detect total gambling venue visitations for gamblers who visit multiple gambling venues using sounds in gamblers' environments. We sampled gambling and nongambling event datasets from websites to create a gambling play classifier.

In the search for the neural correlates of auditory consciousness, a candidate has been found using electroencephalography: the auditory awareness negativity (AAN). Earlier studies have investigated the AAN in response to lateralized sound. With headphones, there is a clear lateralization of AAN when two auditory lateralization cues are combined: the interaural level difference (ILD) and interaural time difference (ITD).
