Because of their simple design, end-to-end deep learning (E2E-DL) models have attracted considerable attention for speech enhancement. A number of DL models have achieved excellent results in suppressing background noise and improving both the quality and the intelligibility of noisy speech. However, designing resource-efficient and compact models for real-time processing remains a key challenge. To improve the performance of E2E models, both the sequential and the local characteristics of the speech signal should be modeled efficiently. In this paper, we present resource-efficient and compact neural models for end-to-end, noise-robust, waveform-based speech enhancement. We design several speech enhancement systems by combining a Convolutional Encoder-Decoder (CED) with Recurrent Neural Networks (RNNs) in the Convolutional Recurrent Network (CRN) framework. Different noise types and speakers are used to train and test the proposed models. Experiments on LibriSpeech and the DEMAND dataset show that the proposed models improve quality and intelligibility while using fewer trainable parameters and offering notably lower model complexity and shorter inference time than existing recurrent and convolutional models. Quality and intelligibility are improved by 31.61% and 17.18%, respectively, over the noisy speech. We further performed a cross-corpus analysis to demonstrate that the proposed E2E SE models generalize across different speech datasets.
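As a rough illustration of how a waveform-based CRN is structured, the sketch below traces the time-axis length of a signal through a strided convolutional encoder, a recurrent bottleneck, and a mirrored transposed-convolution decoder. It is not the authors' architecture: the layer settings in `HYPOTHETICAL_LAYERS`, the 16000-sample input, and the function names are all assumptions chosen for illustration. The length formulas are the standard ones for 1-D convolution and transposed convolution without dilation or output padding.

```python
def conv1d_out_len(length, kernel, stride, padding=0):
    """Output length of a 1-D convolution (no dilation)."""
    return (length + 2 * padding - kernel) // stride + 1


def conv_transpose1d_out_len(length, kernel, stride, padding=0):
    """Output length of a 1-D transposed convolution (no dilation, no output padding)."""
    return (length - 1) * stride - 2 * padding + kernel


def trace_crn_lengths(num_samples, layers):
    """Trace sequence lengths through encoder, recurrent bottleneck, and decoder.

    layers: list of (kernel, stride, padding) tuples for the encoder;
    the decoder mirrors them in reverse order.
    """
    # Encoder: each strided convolution captures local structure and shortens the sequence.
    enc = [num_samples]
    for kernel, stride, padding in layers:
        enc.append(conv1d_out_len(enc[-1], kernel, stride, padding))

    # Recurrent bottleneck: the RNN models sequential structure over the
    # encoded frames and leaves the number of time steps unchanged.
    rnn_steps = enc[-1]

    # Decoder: mirrored transposed convolutions restore the waveform length.
    dec = [rnn_steps]
    for kernel, stride, padding in reversed(layers):
        dec.append(conv_transpose1d_out_len(dec[-1], kernel, stride, padding))
    return enc, rnn_steps, dec


# Hypothetical configuration (an assumption, not taken from the paper):
# four encoder layers with kernel 8, stride 2, padding 3.
HYPOTHETICAL_LAYERS = [(8, 2, 3)] * 4

enc, rnn_steps, dec = trace_crn_lengths(16000, HYPOTHETICAL_LAYERS)
print("encoder lengths:", enc)
print("RNN time steps:", rnn_steps)
print("decoder lengths:", dec)
```

With these assumed settings, each encoder layer halves the sequence, so the recurrent layer processes far fewer time steps than the raw waveform contains. This is one reason such a combination can keep recurrent computation small.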

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9611713
DOI: http://dx.doi.org/10.3390/s22207782

Publication Analysis

Top Keywords

speech enhancement: 16
models: 10
end-to-end deep: 8
convolutional recurrent: 8
speech: 8
noisy speech: 8
resource-efficient compact: 8
e2e models: 8
proposed models: 8
quality intelligibility: 8

Similar Publications

Systematic Review of EEG-Based Imagined Speech Classification Methods.

Sensors (Basel)

December 2024

Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia.

This systematic review examines EEG-based imagined speech classification, emphasizing directional words essential for development in the brain-computer interface (BCI). This study employed a structured methodology to analyze approaches using public datasets, ensuring systematic evaluation and validation of results. This review highlights the feature extraction techniques that are pivotal to classification performance.

Dialogue systems must understand children's utterance intentions by considering their unique linguistic characteristics, such as syntactic incompleteness, pronunciation inaccuracies, and creative expressions, to enable natural conversational engagement in child-robot interactions. Even state-of-the-art large language models (LLMs) for language understanding and contextual awareness cannot comprehend children's intent as accurately as humans because of their distinctive features. An LLM-based dialogue system should acquire the manner by which humans understand children's speech to enhance its intention reasoning performance in verbal interactions with children.

A specific deletion on the short arm of chromosome 5 (5p) is the hallmark of the rare genetic syndrome called Cri du Chat Syndrome (CdCS). It causes severe difficulties with swallowing, speech, and motor skills, as well as cognitive deficiencies. These arise from characteristic laryngeal abnormalities and oral-motor dysfunctions.

This systematic review of neuropsychological rehabilitation strategies for primary progressive aphasia will consider recent developments in cognitive neuroscience, especially neuroimaging techniques such as EEG and fMRI, to outline how these tools might be integrated into clinical practice to maximize treatment outcomes. A systematic search of peer-reviewed literature from the last decade was performed following the PRISMA guidelines across multiple databases. A total of 63 studies were included, guided by predefined inclusion and exclusion criteria, with a focus on cognitive and language rehabilitation in PPA, interventions guided by neuroimaging, and mechanisms of neuroplasticity.

Intelligibility Sound Therapy Enhances the Ability of Speech-in-Noise Perception and Pre-Perceptual Neurophysiological Response.

Biology (Basel)

December 2024

Department of Otorhinolaryngology, Head and Neck Surgery, Graduate School of Biomedical Sciences, Hiroshima University, Kasumi 1-2-3, Minami-ku, Hiroshima 734-8551, Japan.

Aural rehabilitation with hearing aids can decrease the attentional demands on cognitive resources by amplifying sound at deteriorated frequencies in patients with hearing loss and improving auditory discrimination abilities such as speech-in-noise perception. Because aural rehabilitation with intelligible-hearing sound may also be promising, this study aimed to evaluate its effectiveness for patients with hearing loss. Adult native Japanese speakers (17 males and 23 females, 68.
