Nonlinear Regularization Decoding Method for Speech Recognition.

Sensors (Basel)

College of Computer Science and Technology, Xinjiang University, Urumqi 830017, China.

Published: June 2024

Existing end-to-end speech recognition methods typically employ hybrid decoders based on CTC and Transformer. However, the issue of error accumulation in these hybrid decoders hinders further improvements in accuracy. Additionally, most existing models are built upon Transformer architecture, which tends to be complex and unfriendly to small datasets. Hence, we propose a Nonlinear Regularization Decoding Method for Speech Recognition. Firstly, we introduce the nonlinear Transformer decoder, breaking away from traditional left-to-right or right-to-left decoding orders and enabling associations between any characters, mitigating the limitations of Transformer architectures on small datasets. Secondly, we propose a novel regularization attention module to optimize the attention score matrix, reducing the impact of early errors on later outputs. Finally, we introduce the tiny model to address the challenge of overly large model parameters. The experimental results indicate that our model demonstrates good performance. Compared to the baseline, our model achieves recognition improvements of 0.12%, 0.54%, 0.51%, and 1.2% on the Aishell1, Primewords, Free ST Chinese Corpus, and Common Voice 16.1 datasets of Uyghur, respectively.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11207489PMC
http://dx.doi.org/10.3390/s24123846DOI Listing

Publication Analysis

Top Keywords

speech recognition
12
nonlinear regularization
8
regularization decoding
8
decoding method
8
method speech
8
hybrid decoders
8
small datasets
8
recognition
4
recognition existing
4
existing end-to-end
4

Similar Publications

Some Challenging Questions About Outcomes in Children With Cochlear Implants.

Perspect ASHA Spec Interest Groups

December 2024

DeVault Otologic Research Laboratory, Department of Otolaryngology-Head and Neck Surgery, Indiana University School of Medicine, Indianapolis.

Purpose: Cochlear implants (CIs) have improved the quality of life for many children with severe-to-profound sensorineural hearing loss. Despite the reported CI benefits of improved speech recognition, speech intelligibility, and spoken language processing, large individual differences in speech and language outcomes are still consistently reported in the literature. The enormous variability in CI outcomes has made it challenging to predict which children may be at high risk for limited benefits and how potential risk factors can be improved with interventions.

View Article and Find Full Text PDF

Introduction: It is still under debate whether and how semantic content will modulate the emotional prosody perception in children with autism spectrum disorder (ASD). The current study aimed to investigate the issue using two experiments by systematically manipulating semantic information in Chinese disyllabic words.

Method: The present study explored the potential modulation of semantic content complexity on emotional prosody perception in Mandarin-speaking children with ASD.

View Article and Find Full Text PDF

Artificial intelligence (AI) scribe applications in the healthcare community are in the early adoption phase and offer unprecedented efficiency for medical documentation. They typically use an application programming interface with a large language model (LLM), for example, generative pretrained transformer 4. They use automatic speech recognition on the physician-patient interaction, generating a full medical note for the encounter, together with a draft follow-up e-mail for the patient and, often, recommendations, all within seconds or minutes.

View Article and Find Full Text PDF

Polariton lattices as binarized neuromorphic networks.

Light Sci Appl

January 2025

Spin-Optics laboratory, St. Petersburg State University, St. Petersburg, 198504, Russia.

We introduce a novel neuromorphic network architecture based on a lattice of exciton-polariton condensates, intricately interconnected and energized through nonresonant optical pumping. The network employs a binary framework, where each neuron, facilitated by the spatial coherence of pairwise coupled condensates, performs binary operations. This coherence, emerging from the ballistic propagation of polaritons, ensures efficient, network-wide communication.

View Article and Find Full Text PDF

When listening to speech under adverse conditions, listeners compensate using neurocognitive resources. A clinically relevant form of adverse listening is listening through a cochlear implant (CI), which provides a spectrally degraded signal. CI listening is often simulated through noise-vocoding.

View Article and Find Full Text PDF

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!