Existing end-to-end speech recognition methods typically employ hybrid decoders based on CTC and Transformer. However, the issue of error accumulation in these hybrid decoders hinders further improvements in accuracy. Additionally, most existing models are built upon Transformer architecture, which tends to be complex and unfriendly to small datasets. Hence, we propose a Nonlinear Regularization Decoding Method for Speech Recognition. Firstly, we introduce the nonlinear Transformer decoder, breaking away from traditional left-to-right or right-to-left decoding orders and enabling associations between any characters, mitigating the limitations of Transformer architectures on small datasets. Secondly, we propose a novel regularization attention module to optimize the attention score matrix, reducing the impact of early errors on later outputs. Finally, we introduce the tiny model to address the challenge of overly large model parameters. The experimental results indicate that our model demonstrates good performance. Compared to the baseline, our model achieves recognition improvements of 0.12%, 0.54%, 0.51%, and 1.2% on the Aishell1, Primewords, Free ST Chinese Corpus, and Common Voice 16.1 datasets of Uyghur, respectively.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11207489 | PMC |
http://dx.doi.org/10.3390/s24123846 | DOI Listing |
Perspect ASHA Spec Interest Groups
December 2024
DeVault Otologic Research Laboratory, Department of Otolaryngology-Head and Neck Surgery, Indiana University School of Medicine, Indianapolis.
Purpose: Cochlear implants (CIs) have improved the quality of life for many children with severe-to-profound sensorineural hearing loss. Despite the reported CI benefits of improved speech recognition, speech intelligibility, and spoken language processing, large individual differences in speech and language outcomes are still consistently reported in the literature. The enormous variability in CI outcomes has made it challenging to predict which children may be at high risk for limited benefits and how potential risk factors can be improved with interventions.
View Article and Find Full Text PDFJ Commun Disord
January 2025
School of Foreign Studies, China University of Petroleum (East China), Qingdao, China. Electronic address:
Introduction: It is still under debate whether and how semantic content will modulate the emotional prosody perception in children with autism spectrum disorder (ASD). The current study aimed to investigate the issue using two experiments by systematically manipulating semantic information in Chinese disyllabic words.
Method: The present study explored the potential modulation of semantic content complexity on emotional prosody perception in Mandarin-speaking children with ASD.
Plast Reconstr Surg Glob Open
January 2025
Department of Computer Science, Johns Hopkins University, Baltimore, MD.
Artificial intelligence (AI) scribe applications in the healthcare community are in the early adoption phase and offer unprecedented efficiency for medical documentation. They typically use an application programming interface with a large language model (LLM), for example, generative pretrained transformer 4. They use automatic speech recognition on the physician-patient interaction, generating a full medical note for the encounter, together with a draft follow-up e-mail for the patient and, often, recommendations, all within seconds or minutes.
View Article and Find Full Text PDFLight Sci Appl
January 2025
Spin-Optics laboratory, St. Petersburg State University, St. Petersburg, 198504, Russia.
We introduce a novel neuromorphic network architecture based on a lattice of exciton-polariton condensates, intricately interconnected and energized through nonresonant optical pumping. The network employs a binary framework, where each neuron, facilitated by the spatial coherence of pairwise coupled condensates, performs binary operations. This coherence, emerging from the ballistic propagation of polaritons, ensures efficient, network-wide communication.
View Article and Find Full Text PDFTrends Hear
January 2025
Department of Otolaryngology - Head & Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA.
When listening to speech under adverse conditions, listeners compensate using neurocognitive resources. A clinically relevant form of adverse listening is listening through a cochlear implant (CI), which provides a spectrally degraded signal. CI listening is often simulated through noise-vocoding.
View Article and Find Full Text PDFEnter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!