Nonlinear Regularization Decoding Method for Speech Recognition.

Jiang Zhang, Liejun Wang, Yinfeng Yu, Miaomiao Xu

Sensors (Basel)

College of Computer Science and Technology, Xinjiang University, Urumqi 830017, China.

Published: June 2024

Existing end-to-end speech recognition methods typically employ hybrid decoders that combine CTC and a Transformer. However, error accumulation in these hybrid decoders hinders further gains in accuracy. In addition, most existing models are built on the Transformer architecture, which tends to be complex and poorly suited to small datasets. We therefore propose a Nonlinear Regularization Decoding Method for Speech Recognition. First, we introduce a nonlinear Transformer decoder that breaks away from the traditional left-to-right or right-to-left decoding order and enables associations between any pair of characters, mitigating the limitations of Transformer architectures on small datasets. Second, we propose a novel regularization attention module that optimizes the attention score matrix, reducing the impact of early errors on later outputs. Finally, we introduce a tiny model to address the problem of excessively large parameter counts. Experimental results indicate that our model performs well: compared with the baseline, it achieves recognition improvements of 0.12%, 0.54%, 0.51%, and 1.2% on the Aishell1, Primewords, and Free ST Chinese Corpus datasets and the Uyghur portion of Common Voice 16.1, respectively.
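The abstract does not say how the nonlinear decoder is built. As a generic illustration only of what departing from "left-to-right or right-to-left decoding orders" can mean at the attention level, the sketch below contrasts a causal left-to-right mask, a right-to-left mask, and an unrestricted mask in which every output position may attend to every other one. The function names `build_mask` and `masked_softmax` and the toy uniform scores are hypothetical; this is not the authors' decoder and does not model their regularization attention module.

```python
import math
from typing import List

Matrix = List[List[float]]


def build_mask(n: int, order: str) -> List[List[bool]]:
    """Return an n x n mask where mask[i][j] is True if position i may attend to j.

    Hypothetical helper for illustration:
    order="l2r": causal left-to-right decoding (j <= i)
    order="r2l": right-to-left decoding (j >= i)
    order="any": no ordering constraint; every position sees every other one
    """
    if order == "l2r":
        return [[j <= i for j in range(n)] for i in range(n)]
    if order == "r2l":
        return [[j >= i for j in range(n)] for i in range(n)]
    if order == "any":
        return [[True] * n for _ in range(n)]
    raise ValueError(f"unknown order: {order}")


def masked_softmax(scores: Matrix, mask: List[List[bool]]) -> Matrix:
    """Row-wise softmax over the allowed positions; masked entries get weight 0."""
    out = []
    for row, allowed in zip(scores, mask):
        kept = [s for s, ok in zip(row, allowed) if ok]
        m = max(kept)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


if __name__ == "__main__":
    # Hypothetical uniform attention scores for a 3-character output sequence.
    scores = [[0.0] * 3 for _ in range(3)]
    for order in ("l2r", "r2l", "any"):
        weights = masked_softmax(scores, build_mask(3, order))
        print(order, [[round(w, 2) for w in row] for row in weights])
```

With the causal mask, the first character can only attend to itself, so an early error propagates into every later position's context; the unrestricted mask lets each position draw on the whole sequence. Every mask here keeps the diagonal, so no row is ever fully masked.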

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11207489
DOI: http://dx.doi.org/10.3390/s24123846

Similar Publications

Some Challenging Questions About Outcomes in Children With Cochlear Implants.

Perspect ASHA Spec Interest Groups

December 2024

DeVault Otologic Research Laboratory, Department of Otolaryngology-Head and Neck Surgery, Indiana University School of Medicine, Indianapolis.

Susan T Sehgal, Irina Castellanos, William G Kronenberger, David B Pisoni

Purpose: Cochlear implants (CIs) have improved the quality of life for many children with severe-to-profound sensorineural hearing loss. Despite the reported CI benefits of improved speech recognition, speech intelligibility, and spoken language processing, large individual differences in speech and language outcomes are still consistently reported in the literature. The enormous variability in CI outcomes has made it challenging to predict which children may be at high risk for limited benefits and how potential risk factors can be improved with interventions.

Is simpler better? Semantic content modulates the emotional prosody perception in Mandarin-speaking children with autism spectrum disorder.

J Commun Disord

January 2025

School of Foreign Studies, China University of Petroleum (East China), Qingdao, China. Electronic address:

Ting Wang, Li Xia, Lulu Cheng

Introduction: It is still debated whether and how semantic content modulates emotional prosody perception in children with autism spectrum disorder (ASD). The current study investigated this issue in two experiments that systematically manipulated semantic information in Chinese disyllabic words.

Method: The present study explored the potential modulating effect of semantic content complexity on emotional prosody perception in Mandarin-speaking children with ASD.

Artificial Intelligence Scribe and Large Language Model Technology in Healthcare Documentation: Advantages, Limitations, and Recommendations.

Plast Reconstr Surg Glob Open

January 2025

Department of Computer Science, Johns Hopkins University, Baltimore, MD.

Sarah A Mess, Alison J Mackey, David E Yarowsky

Artificial intelligence (AI) scribe applications in the healthcare community are in the early adoption phase and offer unprecedented efficiency for medical documentation. They typically call a large language model (LLM), for example Generative Pre-trained Transformer 4 (GPT-4), through an application programming interface. They apply automatic speech recognition to the physician-patient interaction and generate a full medical note for the encounter, together with a draft follow-up e-mail for the patient and often recommendations, all within seconds or minutes.

Polariton lattices as binarized neuromorphic networks.

Light Sci Appl

January 2025

Spin-Optics laboratory, St. Petersburg State University, St. Petersburg, 198504, Russia.

Evgeny Sedov, Alexey Kavokin

We introduce a novel neuromorphic network architecture based on a lattice of exciton-polariton condensates, intricately interconnected and energized through nonresonant optical pumping. The network employs a binary framework, where each neuron, facilitated by the spatial coherence of pairwise coupled condensates, performs binary operations. This coherence, emerging from the ballistic propagation of polaritons, ensures efficient, network-wide communication.

Individual Differences in the Recognition of Spectrally Degraded Speech: Associations With Neurocognitive Functions in Adult Cochlear Implant Users and With Noise-Vocoded Simulations.

Trends Hear

January 2025

Department of Otolaryngology - Head & Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA.

Aaron C Moberly, Liping Du, Terrin N Tamati

When listening to speech under adverse conditions, listeners compensate using neurocognitive resources. A clinically relevant form of adverse listening is listening through a cochlear implant (CI), which provides a spectrally degraded signal. CI listening is often simulated through noise-vocoding.
