The complexity of polyphonic sounds imposes numerous challenges on their classification. In real-life recordings especially, polyphonic sound events are discontinuous and exhibit unstable time-frequency variations. A single traditional acoustic feature cannot characterize the key feature information of a polyphonic sound event, and this deficiency results in poor classification performance. In this paper, we propose a convolutional recurrent neural network model based on a temporal-frequency (TF) attention mechanism and a feature space (FS) attention mechanism (TFFS-CRNN). The TFFS-CRNN model aggregates Log-Mel spectrograms and MFCC features as inputs and comprises the TF-attention module, the convolutional recurrent neural network (CRNN) module, the FS-attention module, and the bidirectional gated recurrent unit (BGRU) module. In polyphonic sound event detection (SED), the TF-attention module captures the critical temporal-frequency features more effectively, while the FS-attention module assigns dynamically learnable weights to different feature dimensions. Together, the two attention modules let the model focus on semantically relevant time frames, key frequency bands, and important feature spaces, improving the characterization of key feature information in polyphonic SED. Finally, the BGRU module learns contextual information. Experiments were conducted on the DCASE 2016 Task 3 and DCASE 2017 Task 3 datasets. The results show that the F1-score of the TFFS-CRNN model improved by 12.4% and 25.2% over the winning systems of the respective DCASE challenges, while the error rate (ER) was reduced by 0.41 and 0.37. The proposed TFFS-CRNN model thus achieves better classification performance and a lower ER in polyphonic SED.
Full text: PMC (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9503981) | DOI: http://dx.doi.org/10.3390/s22186818
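The abstract outlines the TFFS-CRNN architecture but not its implementation. The following is a minimal PyTorch sketch of how the two attention stages described above could be wired around a CRNN; the module designs, dimensions, and attention formulations are assumptions for illustration, not the authors' code. The single-channel input stands in for the aggregated Log-Mel/MFCC features.

```python
import torch
import torch.nn as nn

class TFAttention(nn.Module):
    """Learns a time-frequency attention map over the input feature map
    (hypothetical formulation; the paper's exact design may differ)."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                        # x: (batch, ch, time, freq)
        weights = torch.sigmoid(self.conv(x))    # (batch, 1, time, freq)
        return x * weights                       # reweight each TF bin

class FSAttention(nn.Module):
    """Assigns learnable weights to feature dimensions (hypothetical)."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (batch, time, dim)
        weights = torch.softmax(self.fc(x), dim=-1)
        return x * weights

class TFFSCRNNSketch(nn.Module):
    def __init__(self, n_mels=40, n_classes=6):
        super().__init__()
        self.tf_att = TFAttention(1)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((1, 4)),                # pool frequency, keep time
        )
        feat = 64 * (n_mels // 4)
        self.fs_att = FSAttention(feat)
        self.bgru = nn.GRU(feat, 64, bidirectional=True, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):                        # x: (batch, 1, time, n_mels)
        x = self.tf_att(x)
        x = self.cnn(x)                          # (batch, 64, time, n_mels//4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x = self.fs_att(x)
        x, _ = self.bgru(x)                      # contextual modelling (BGRU)
        return torch.sigmoid(self.head(x))       # frame-wise event activity
```

Sigmoid (rather than softmax) outputs reflect the polyphonic setting, where several events may be active in the same frame.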
PLoS One, July 2024
AUDIAS Research Group, Escuela Politécnica Superior, Universidad Autónoma de Madrid, Madrid, Spain.
In recent years, the relation between Sound Event Detection (SED) and Source Separation (SSep) has received growing interest, in particular with the aim of enhancing SED performance by leveraging the synergies between the two tasks. In this paper, we present a detailed description of JSS (Joint Source Separation and Sound Event Detection), our joint-training scheme for SSep and SED, and we measure its performance on the DCASE Challenge task for SED in domestic environments. Our experiments demonstrate that JSS can improve SED performance, in terms of the Polyphonic Sound Detection Score (PSDS), even without additional training data.
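The abstract describes JSS only at a high level. Below is a minimal sketch of one way a joint objective for separation and detection could be combined in training; both sub-models, the pooling rule, and the weighting are placeholders, not the authors' configuration.

```python
import torch
import torch.nn.functional as F

def joint_loss(sep_model, sed_model, mixture, sources, labels, alpha=0.5):
    """Hypothetical joint objective: separation loss plus SED loss computed
    on the separated stems. `alpha` balances the two tasks (assumed)."""
    est_sources = sep_model(mixture)          # (batch, n_src, samples)
    sep_loss = F.mse_loss(est_sources, sources)
    # Run SED on each estimated stem and pool predictions across stems:
    # an event is active if any stem says so.
    logits = sed_model(est_sources)           # (batch, n_src, time, classes)
    pooled = logits.max(dim=1).values
    sed_loss = F.binary_cross_entropy_with_logits(pooled, labels)
    return alpha * sep_loss + (1 - alpha) * sed_loss
```

Back-propagating through both terms is what makes the scheme "joint": the separator receives gradient from the detection objective as well as from the reconstruction error.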
Phys Rev E, April 2024
Department of Physics and Center for Soft Matter Research, New York University, New York, New York 10003, USA.
Acoustic trapping uses forces exerted by sound waves to transport small objects along specified trajectories in three dimensions. The structure of the time-averaged acoustic force landscape acting on an object is determined by the amplitude and phase profiles of the sound's pressure wave. These profiles are typically sculpted by deliberately selecting the amplitude and relative phase of the sound projected by each transducer in large arrays of transducers, all operating at the same carrier frequency.
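As an illustration of the superposition described above, the sketch below evaluates the complex pressure field of an array of monochromatic point-like transducers with chosen amplitudes and phases. The geometry, constants, and spherical-wave source model are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def pressure_field(points, positions, amplitudes, phases, k):
    """Complex pressure at `points` from point-source transducers at
    `positions`, all driven at one carrier frequency with wavenumber k.
    points: (M, 3), positions: (N, 3), amplitudes/phases: (N,)."""
    r = np.linalg.norm(points[:, None, :] - positions[None, :, :], axis=-1)
    # Spherical-wave superposition: each element contributes A e^{i(kr+phi)}/r.
    contrib = amplitudes * np.exp(1j * (k * r + phases)) / r
    return contrib.sum(axis=1)

# Example: a 16-element line array at 40 kHz in air (c ~ 343 m/s),
# phased to focus the sound at (0, 0, 0.05 m).
f, c = 40e3, 343.0
k = 2 * np.pi * f / c
xs = np.linspace(-0.04, 0.04, 16)
positions = np.stack([xs, np.zeros(16), np.zeros(16)], axis=1)
phases = -k * np.hypot(xs, 0.05)   # cancel each element's path delay to focus
p = pressure_field(np.array([[0.0, 0.0, 0.05]]), positions,
                   np.ones(16), phases, k)
print(abs(p))                      # field magnitude at the focal point
```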
PLoS One, April 2024
MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, Australia.
Music ensemble performance provides an ecologically valid context for investigating leadership dynamics in small group interactions. Musical texture, specifically the relative salience of simultaneously sounding ensemble parts, is a feature that can potentially alter leadership dynamics by introducing hierarchical relationships between individual parts. The present study extended previous work on quantifying interpersonal coupling in musical ensembles by examining the relationship between musical texture and leader-follower relations, operationalised as the directionality of influence between co-performers' body motion in concert video recordings.
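The abstract does not state how the directionality of influence was computed. One common, simple proxy (assumed here, and not necessarily the study's method) is the lag at which the cross-correlation between two performers' motion time series peaks: a consistently positive peak lag suggests the first performer leads.

```python
import numpy as np

def lead_lag(motion_a, motion_b, max_lag):
    """Lag (in samples) at which the correlation between two equal-length,
    zero-meaned motion signals peaks. Positive => `motion_a` leads.
    (Illustrative proxy for influence; not the paper's actual analysis.)"""
    a = motion_a - motion_a.mean()
    b = motion_b - motion_b.mean()
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.corrcoef(a[max(0, -l):len(a) - max(0, l)],
                        b[max(0, l):len(b) - max(0, -l)])[0, 1]
            for l in lags]
    return lags[int(np.argmax(corr))]
```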
Math Biosci Eng, January 2024
Institute of Intelligent Manufacturing, Guangdong Academy of Science, Guangdong Key Laboratory of Modern Control Technology, Guangzhou 510030, China.
Sound event localization and detection (SELD) has been applied in various fields. Due to polyphony and noise interference, it is challenging to accurately predict sound events and their occurrence locations. To address this problem, we propose a Multiple Attention Fusion ResNet, which uses ResNet34 as its base network.
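The abstract names ResNet34 as the backbone but gives no further detail. The sketch below shows one plausible way multiple attention branches could be fused on top of torchvision's ResNet-34 features; the attention branches, fusion rule, and output heads are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

class MultiAttentionFusionSketch(nn.Module):
    """Hypothetical fusion of channel and spatial attention over ResNet-34
    features for joint detection (SED) and localization (DOA) heads."""
    def __init__(self, n_classes=14):
        super().__init__()
        base = resnet34(weights=None)
        # Accept single-channel spectrogram input instead of RGB images.
        base.conv1 = nn.Conv2d(1, 64, 7, stride=2, padding=3, bias=False)
        self.backbone = nn.Sequential(*list(base.children())[:-2])  # (B,512,H,W)
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(512, 512, 1), nn.Sigmoid())
        self.spatial_att = nn.Sequential(nn.Conv2d(512, 1, 1), nn.Sigmoid())
        self.sed_head = nn.Linear(512, n_classes)      # event activity
        self.doa_head = nn.Linear(512, 3 * n_classes)  # (x,y,z) per class

    def forward(self, spec):                 # spec: (B, 1, time, freq)
        feat = self.backbone(spec)
        # Fuse the two attended feature maps by summation (assumed rule).
        fused = feat * self.channel_att(feat) + feat * self.spatial_att(feat)
        pooled = fused.mean(dim=(2, 3))      # global average pooling
        return torch.sigmoid(self.sed_head(pooled)), self.doa_head(pooled)
```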
J Acoust Soc Am, October 2023
School of Physics, Engineering and Technology (Retired), University of York, York, YO10 5DD, United Kingdom.
Multiple fundamental frequency estimation has been extensively used in applications such as melody extraction, music transcription, instrument identification, and source separation. This paper presents an approach based on the iterative detection and extraction of note events, which are considered to be harmonic sounds characterised by a continuous pitch trajectory. Note events are assumed to be associated with musical notes being played by a single instrument, and their pitch trajectories are iteratively estimated.
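A minimal sketch of the iterative detect-and-subtract loop the abstract describes is given below. It operates greedily on a single magnitude-spectrum frame, whereas the paper tracks continuous pitch trajectories over time; the salience function, harmonic model, and stopping rule are simplified placeholders, not the paper's method.

```python
import numpy as np

def iterative_note_extraction(spectrum, freqs, n_harmonics=5, max_notes=6,
                              energy_floor=0.05):
    """Greedy sketch: repeatedly pick the strongest candidate fundamental,
    record it, and subtract its harmonic comb from the residual spectrum."""
    residual = spectrum.astype(float).copy()
    total = residual.sum()
    notes = []
    for _ in range(max_notes):
        if residual.sum() < energy_floor * total:
            break                          # little energy left to explain
        # Score each bin as a fundamental by summing its harmonic magnitudes.
        scores = np.zeros_like(residual)
        for h in range(1, n_harmonics + 1):
            idx = np.searchsorted(freqs, freqs * h).clip(0, len(freqs) - 1)
            scores += residual[idx] / h    # weight higher harmonics less
        f0_bin = int(np.argmax(scores))
        notes.append(freqs[f0_bin])
        # Remove the detected note's harmonic comb from the residual.
        for h in range(1, n_harmonics + 1):
            j = int(np.searchsorted(freqs, freqs[f0_bin] * h))
            if j < len(residual):
                residual[j] = 0.0
    return notes
```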