Natural language sampling (NLS) offers rich insights into real-world speech and language use across diverse groups, yet human transcription is time-consuming and costly. Automatic speech recognition (ASR) technology has the potential to revolutionize NLS research, but its performance in clinical-research settings with young children and those with developmental delays remains unknown. This study evaluates the OpenAI Whisper ASR model on n=34 NLS sessions of toddlers with and without language delays. Manual comparison of ASR output to human transcriptions of children with Down syndrome (DS; n=19; 2-5 years old) and typically developing children (TD; n=15; 2-3 years old) revealed that ASR accurately captured 50% of words spoken by TD children but only 14% for those with DS. About 20% of words were missed in both groups, and 21% (TD) and 6% (DS) of words were replaced. ASR also struggled with developmentally informative sounds, such as non-speech vocalizations, missing almost 50% in the DS data and misinterpreting most of the rest. While ASR shows potential for reducing transcription time, its limitations underscore the need for human-in-the-loop clinical machine learning systems, especially for underrepresented groups.
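The percentages of captured, replaced, and missed words in the abstract correspond to the hit, substitution, and deletion counts of a standard word-level transcript alignment. A minimal sketch of such an alignment (not the authors' code; a generic Levenshtein dynamic-programming approach is assumed):

```python
# Sketch: align a human reference transcript with an ASR hypothesis and
# classify each reference word as captured, replaced, or missed, using
# word-level Levenshtein (edit-distance) dynamic programming.

def align_words(reference, hypothesis):
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = minimal edit cost aligning ref[:i] with hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Backtrace to count per-category outcomes for reference words.
    i, j = len(ref), len(hyp)
    counts = {"captured": 0, "replaced": 0, "missed": 0, "inserted": 0}
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            counts["captured" if ref[i - 1] == hyp[j - 1] else "replaced"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            counts["missed"] += 1      # reference word absent from ASR output
            i -= 1
        else:
            counts["inserted"] += 1    # ASR word with no reference counterpart
            j -= 1
    return counts
```

For example, `align_words("the cat sat", "the dog sat")` counts two captured words and one replaced word; dividing each count by the number of reference words yields percentages of the kind reported above.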


Source: http://dx.doi.org/10.1109/EMBC53108.2024.10782773


Similar Publications

Listeners can adapt to noise-vocoded speech under divided attention using a dual task design [Wang, Chen, Yan, McGettigan, Rosen, and Adank, Trends Hear. 27, 23312165231192297 (2023)]. Adaptation to noise-vocoded speech, an artificial degradation, was largely unaffected for domain-general (visuomotor) and domain-specific (semantic or phonological) dual tasks.


In online teaching environments, the lack of direct emotional interaction between teachers and students poses challenges for teachers to consciously and effectively manage their emotional expressions. The design and implementation of an early warning system for teaching provide a novel approach to intelligent evaluation and improvement of online education. This study focuses on segmenting different emotional segments and recognizing emotions in instructional videos.


Background: Language interventions are complex behavioural interventions, making it difficult to distinguish the specific factors contributing to efficacy. The efficacy of oral language comprehension interventions varies greatly, but the reasons for this have received little attention.

Aims: The aim of this meta-analysis was to examine which intervention factors are associated with efficacy (as expressed with effect sizes) regarding interventions aiming to improve oral language comprehension on its own, or together with expressive language, in children under the age of 18 with or at risk for (developmental) language disorder-(D)LD.


Preconceived assumptions about the speaker have been shown to strongly and automatically influence speech interpretation. This study contributes to previous research by investigating the impact of non-nativeness on perceived metaphor sensibility. To eliminate the effects of speech disfluency, we used exclusively written sentences but introduced their "authors" as having a strong native or non-native accent through a written vignette.


Objectives: Accurate segmentation of the vocal tract from MRI data is essential for various voice, speech, and singing applications. Manual segmentation is time-intensive and susceptible to errors. This study aimed to evaluate the efficacy of deep learning algorithms for automatic vocal tract segmentation from 3D MRI.

