Sound event localization and detection (SELD) is a crucial component of machine listening that aims to simultaneously identify and localize sound events in multichannel audio recordings. This task demands an integrated analysis of spatial, temporal, and frequency domains to accurately characterize sound events. The spatial domain pertains to the varying acoustic signals captured by multichannel microphones, which are essential for determining the location of sound sources. However, the majority of recent studies have focused on time-frequency correlations and spatio-temporal correlations separately, leading to inadequate performance in real-life scenarios. In this paper, we propose a novel SELD method that utilizes the newly developed Spatio-Temporal-Frequency Fusion Network (STFF-Net) to jointly learn comprehensive features across spatial, temporal, and frequency domains of sound events. The backbone of our STFF-Net is the Enhanced-3D (E3D) residual block, which combines 3D convolutions with a parameter-free attention mechanism to capture and refine the intricate correlations among these domains. Furthermore, our method incorporates the multi-ACCDOA format to effectively handle homogeneous overlaps between sound events. During the evaluation, we conduct extensive experiments on three de facto benchmark datasets, and our results demonstrate that the proposed SELD method significantly outperforms current state-of-the-art approaches.
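As a rough illustration of the output format mentioned above, the sketch below decodes ACCDOA-style vectors, in which the length of a Cartesian vector encodes event activity and its direction encodes the direction of arrival. In the multi-track variant, each class carries several such vectors per frame so that two overlapping events of the same class can be reported separately. The function names, the 0.5 activity threshold, and the azimuth/elevation convention are assumptions made for this example and are not taken from the paper; the step that merges near-duplicate tracks is omitted.

```python
import math


def decode_accdoa(vec, threshold=0.5):
    """Decode one (x, y, z) vector into (active, azimuth_deg, elevation_deg).

    The threshold value is an assumption for this sketch, not a value from the paper.
    """
    x, y, z = vec
    norm = math.sqrt(x * x + y * y + z * z)
    if norm <= threshold:
        # Vector too short: treat the event as inactive on this track.
        return (False, None, None)
    azimuth = math.degrees(math.atan2(y, x))
    # Clamp the ratio to guard against floating-point drift outside [-1, 1].
    ratio = max(-1.0, min(1.0, z / norm))
    elevation = math.degrees(math.asin(ratio))
    return (True, azimuth, elevation)


def decode_multi_accdoa(tracks, threshold=0.5):
    """Decode all tracks of one class at one frame.

    `tracks` is a list of (x, y, z) vectors, one per track (hypothetical layout).
    Returns a list of (azimuth_deg, elevation_deg) for the active tracks only.
    """
    decoded = (decode_accdoa(v, threshold) for v in tracks)
    return [(az, el) for active, az, el in decoded if active]


if __name__ == "__main__":
    # Two same-class events active at once, plus one silent track.
    frame = [(0.0, 0.9, 0.0), (0.1, 0.0, 0.0), (0.0, 0.0, -0.8)]
    print(decode_multi_accdoa(frame))
```

Keeping activity and direction in a single vector means one regression output serves both detection and localization, which is why a per-track threshold on the vector length is enough to decide whether an event is reported.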
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11436190 | PMC |
http://dx.doi.org/10.3390/s24186090 | DOI Listing |