The coal mining industry in Northern Shaanxi is robust, and the local dialect, known as "Shapu" and characterized by a distinct Northern Shaanxi accent, is widely used within it. This study addresses the practical need for speech recognition in this dialect. We propose an end-to-end speech recognition model for the Northern Shaanxi dialect based on the Conformer architecture. To tailor the model to the coal mining context, we developed a specialized corpus reflecting the phonetic characteristics of the dialect and its usage in the industry. We investigated feature extraction techniques suitable for the Northern Shaanxi dialect, focusing on the distinctive pronunciation of initial consonants and vowels. A preprocessing module was designed to accommodate the dialect's rapid speech tempo and polyphonic nature, enhancing recognition performance. To enhance the decoder's text generation capability, we replaced the Conformer decoder with a Transformer architecture. Additionally, to mitigate the computational demands of the model, we incorporated Connectionist Temporal Classification (CTC) joint training for optimization. Experimental results on our self-built speech dataset for the Northern Shaanxi coal mining industry show that the proposed Conformer-Transformer-CTC model achieves 9.2% and 10.3% reductions in word error rate compared with the standalone Conformer and Transformer models, respectively, supporting the effectiveness of the proposed method. Future work will investigate how integrating external language models and extracting pronunciation features of different dialects can further improve dialect speech recognition.
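The reported gains are stated in terms of word error rate (WER). As a minimal sketch of how that metric is conventionally computed, the snippet below takes the edit distance between a reference transcript and a recognizer output and divides it by the reference length. The function names `edit_distance` and `word_error_rate` and the whitespace tokenization are illustrative assumptions, not details taken from the paper; for Chinese text the tokens would more likely be characters or segmented words.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (rolling single row)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()  # assumption: tokens are whitespace-separated
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# One deletion ("the") and one substitution ("fan" -> "van") over 4 reference tokens.
print(word_error_rate("open the main fan", "open main van"))  # 0.5
```

With a function like this, the WER of two systems on the same test set can be compared directly. Note that the abstract does not say whether the 9.2% and 10.3% figures are absolute or relative reductions.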
DOI: http://dx.doi.org/10.3390/s25020341
Sensors (Basel)
January 2025
SHCCIG Yubei Coal Industry Co., Ltd., Xi'an 710900, China.
J Clin Med
January 2025
Assistance Publique-Hôpitaux de Paris, Hôpital Bicêtre, Service d'Oto-Rhino-Laryngologie, 78 Rue du Général Leclerc, 94270 Le Kremlin-Bicêtre, France.
Hearing aids (HAs) have been used for standard high-frequency hearing loss and tinnitus, but their effects on speech intelligibility in noise (SIN) in people with normal hearing, including hidden hearing loss (HHL), have been little explored. This prospective cohort study included patients who experience poor SIN and have a normal pure-tone average in quiet conditions or slight hearing loss. We used open-fit HAs.
Georgian Med News
November 2024
Department of Nursing, Hangzhou Geriatric Hospital, Gongshu District, Zhejiang, China.
Objective: The integration of physical therapy (PT), occupational therapy (OT), and speech therapy (ST) into a triple therapy approach has gained recognition in the rehabilitation of patients. The integration of PT-OT-ST triple therapy with accelerated recovery strategies in pulmonary rehabilitation for elderly mechanically ventilated patients is anticipated to overcome the limitations of traditional rehabilitation approaches.
Methods: By applying stringent inclusion and exclusion criteria, a total of 60 elderly patients over 60 years old requiring mechanical ventilation were selected.
Disabil Rehabil Assist Technol
January 2025
School of Rehabilitation Therapy, Queen's University, Kingston, Ontario, Canada.
This article explores the existing research evidence on the potential effectiveness of lipreading as a communication strategy to enhance speech recognition in individuals with hearing impairment. A scoping review was conducted, involving a search of six electronic databases (MEDLINE, Embase, Web of Science, Engineering Village, CINAHL, and PsycINFO) for research papers published between January 2013 and June 2023. This study included original research papers with full texts available in English, covering all study designs: qualitative, quantitative, and mixed methods.
J Imaging
January 2025
Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA.
The integration of artificial intelligence into daily life significantly enhances the autonomy and quality of life of visually impaired individuals. This paper introduces the Visual Impairment Spatial Awareness (VISA) system, designed to holistically assist visually impaired users in indoor activities through a structured, multi-level approach. At the foundational level, the system employs augmented reality (AR) markers for indoor positioning, neural networks for advanced object detection and tracking, and depth information for precise object localization.