AI Article Synopsis

  • The study develops an automatic speech recognition (ASR) model specifically to diagnose pronunciation problems in children with speech sound disorders (SSDs), aiming to replace manual transcription methods.
  • The researchers fine-tuned the wav2vec2.0 XLS-R model to better recognize the way children with SSDs pronounce words, achieving a Phoneme Error Rate (PER) of only 10%.
  • In comparison, a leading ASR model called Whisper struggled with this task, showing a much higher PER of about 50%, highlighting the need for more specialized ASR approaches in clinical settings.

Article Abstract

This study presents an automatic speech recognition (ASR) model designed to diagnose pronunciation issues in children with speech sound disorders (SSDs), with the aim of replacing manual transcription in clinical procedures. Because general-purpose ASR models mainly map input speech to standard spellings, well-known high-performance ASR models are not suitable for evaluating pronunciation in children with SSDs. We fine-tuned the wav2vec2.0 XLS-R model to recognise words as children actually pronounce them, rather than converting the speech into standard spellings. The model was fine-tuned on a speech dataset of 137 children with SSDs pronouncing 73 Korean words selected for actual clinical diagnosis. The model's Phoneme Error Rate (PER) was only 10% when its predictions of children's pronunciations were compared with human annotations of the pronunciations as heard. In contrast, despite its robust performance on general tasks, the state-of-the-art ASR model Whisper showed limitations in recognising the speech of children with SSDs, with a PER of approximately 50%. Although the model still needs to improve its recognition of unclear pronunciation, this study demonstrates that ASR models can streamline complex diagnostic procedures for pronunciation errors in clinical settings.
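
The abstract reports PER without spelling out how it is computed. The sketch below assumes the conventional definition: the number of phoneme substitutions, deletions, and insertions needed to turn the model output into the human annotation, divided by the number of phonemes in the annotation. The function names and the romanised phoneme sequences are hypothetical and are not taken from the study or its dataset.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (phoneme missing in hypothesis)
                curr[j - 1] + 1,     # insertion (extra phoneme in hypothesis)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: Sequence[Sequence[str]],
                       hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level PER: total edit errors over total reference phonemes."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference contains no phonemes")
    return total_errors / total_ref


# Illustrative sequences only (assumed, not from the clinical dataset).
refs = [["s", "a", "g", "w", "a"], ["t", "o", "k", "k", "i"]]  # human annotations
hyps = [["t", "a", "g", "w", "a"], ["t", "o", "k", "i"]]       # model predictions
print(phoneme_error_rate(refs, hyps))  # 2 errors over 10 reference phonemes
```

Under this definition, the reference is the annotation of the pronunciation as heard, not the standard spelling of the target word, so a model that "corrects" a child's mispronunciation is penalised.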

Source
http://dx.doi.org/10.1080/02699206.2024.2387609

Publication Analysis

Top Keywords

asr models: 12
children ssds: 12
automatic speech: 8
speech recognition: 8
recognition asr: 8
speech sound: 8
sound disorders: 8
speech standard: 8
standard spelling: 8
speech: 7
