Hierarchical-Concatenate Fusion TDNN for sound event classification.

PLoS One

School of Information Science and Engineering, Shenyang University of Technology, Shenyang City, Liaoning Province, China.

Published: October 2024

Combining and parsing semantic features is a key problem in sound event classification for acoustic scene analysis, environmental sound monitoring, and urban soundscape analysis. In acoustic scene classification, the input audio signal is composed of multiple acoustic events, which usually leads to a low recognition rate in complex environments. To address this issue, this paper proposes the Hierarchical-Concatenate Fusion (HCF)-TDNN model, which adds an HCF module to the ECAPA-TDNN model for sound event classification. In the HCF module, the audio signal is first converted into two-dimensional time-frequency features, which are then segmented. Next, the segments are convolved one by one, so that the small receptive field captures fine details. Finally, after each convolution, the two adjacent parts are combined before the next convolution, which enlarges the receptive field for capturing large targets. The improved model thus further enhances scalability by emphasizing channel attention and the efficient propagation and aggregation of feature information. The proposed model is trained and validated on the UrbanSound8K dataset. The experimental results show that the proposed model achieves a best classification accuracy of 95.83%, a relative improvement of approximately 5% over the ECAPA-TDNN model.
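The segment, convolve, and combine steps described in the abstract can be illustrated with a minimal NumPy sketch. This is one possible reading of the abstract, not the authors' implementation: the function names `conv1d_same` and `hcf_sketch`, the split along the channel axis, the number of segments, and the fixed smoothing kernel (standing in for learned convolution weights) are all assumptions made for illustration.

```python
import numpy as np


def conv1d_same(x, kernel):
    """Convolve every channel of x along time with 'same' padding.

    x: array of shape (channels, frames).
    kernel: 1-D array shared by all channels (an assumption; a real
    model would learn separate weights per channel).
    """
    return np.stack([np.convolve(row, kernel, mode="same") for row in x])


def hcf_sketch(features, num_segments=4, kernel=(0.25, 0.5, 0.25)):
    """Hypothetical hierarchical-concatenate fusion over channel segments.

    1. Split the time-frequency features into segments (assumed: along
       the channel axis).
    2. Convolve the first segment on its own (small receptive field).
    3. Concatenate each convolved result with the next segment and
       convolve again, so earlier segments see a wider time context.
    """
    kernel = np.asarray(kernel, dtype=float)
    segments = np.array_split(features, num_segments, axis=0)

    carried = None
    for segment in segments:
        if carried is None:
            merged = segment
        else:
            # Combine the previous output with the adjacent segment.
            merged = np.concatenate([carried, segment], axis=0)
        carried = conv1d_same(merged, kernel)
    return carried


# Usage: an impulse in the first and in the last channel shows how the
# receptive field differs between early and late segments.
features = np.zeros((8, 21))
features[0, 10] = 1.0  # lands in the first segment
features[7, 10] = 1.0  # lands in the last segment
fused = hcf_sketch(features, num_segments=4)
print(np.count_nonzero(fused[0]), np.count_nonzero(fused[7]))
```

In this sketch, channels in the first segment pass through every convolution and therefore respond to a wider span of frames, while channels in the last segment pass through only one, which mirrors the small-to-large receptive field progression the abstract describes.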

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11527289
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0312998

Publication Analysis

Top Keywords

sound event: 12
event classification: 12
acoustic scene: 8
audio signal: 8
hcf module: 8
ecapa-tdnn model: 8
receptive field: 8
proposed model: 8
model: 6
classification: 5

