Autism Spectrum Disorder (ASD) is characterized by difficulties in social communication and social interaction, and by repetitive behaviors. Some of these difficulties are apparent in the speech characteristics of ASD children who are verbal. Developing algorithms that can extract and quantify speech features that are unique to ASD children is, therefore, extremely valuable for assessing the initial state of each child and their development over time. An important component of such algorithms is speaker diarization in the noisy clinical environments where ASD children are diagnosed. Here we present a Gaussian Mixture Model (GMM) approach for speaker diarization that was applied to 34 recordings from clinical assessments using the Autism Diagnostic Observation Schedule (ADOS). We used mel-frequency cepstral coefficients (MFCC) and pitch-based features to classify segments containing speech of the child, therapist, or parent, movement noises (chair, toys, etc.), and simultaneous speech. We achieved an accuracy of 89% in identifying segments with children's speech and an accuracy of 74.5% in identifying children's and therapists' speech segments. These accuracy rates are similar to those reported by previous comparable studies, thereby demonstrating a promising route for the automated assessment of speech in children with ASD.
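As a rough illustration of the general idea of GMM-based segment classification (not the authors' implementation), the sketch below fits one diagonal-covariance GMM per speaker class and assigns a segment to the class whose model gives the highest average frame log-likelihood. Everything in it is an assumption made for illustration: the `DiagGMM` class, the `classify_segment` and `toy_frames` helpers, the number of mixture components, and the two-dimensional synthetic frames that stand in for real MFCC and pitch-based features.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagGMM:
    """Hypothetical minimal GMM with diagonal covariances, trained by EM."""

    def __init__(self, n_components=2, n_iter=30, var_floor=1e-3, seed=0):
        self.n_components = n_components
        self.n_iter = n_iter
        self.var_floor = var_floor
        self.seed = seed

    def fit(self, data):
        k, n, dim = self.n_components, len(data), len(data[0])
        rng = random.Random(self.seed)
        # Initialise means from random frames, variances from the global variance.
        self.means = [list(p) for p in rng.sample(data, k)]
        gmean = [sum(p[d] for p in data) / n for d in range(dim)]
        gvar = [max(sum((p[d] - gmean[d]) ** 2 for p in data) / n, self.var_floor)
                for d in range(dim)]
        self.vars = [list(gvar) for _ in range(k)]
        self.weights = [1.0 / k] * k
        for _ in range(self.n_iter):
            # E-step: responsibility of each component for each frame.
            resp = []
            for x in data:
                logs = [math.log(w) + log_gauss_diag(x, m, v)
                        for w, m, v in zip(self.weights, self.means, self.vars)]
                norm = logsumexp(logs)
                resp.append([math.exp(l - norm) for l in logs])
            # M-step: re-estimate weights, means, and floored variances.
            for j in range(k):
                nj = sum(r[j] for r in resp) + 1e-10
                self.weights[j] = nj / n
                self.means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                                 for d in range(dim)]
                self.vars[j] = [
                    max(sum(r[j] * (x[d] - self.means[j][d]) ** 2
                            for r, x in zip(resp, data)) / nj, self.var_floor)
                    for d in range(dim)]
        return self

    def score(self, x):
        """Log-likelihood of a single frame under the mixture."""
        return logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                          for w, m, v in zip(self.weights, self.means, self.vars)])


def classify_segment(frames, models):
    """Return the label whose GMM gives the highest mean frame log-likelihood."""
    scores = {label: sum(model.score(f) for f in frames) / len(frames)
              for label, model in models.items()}
    return max(scores, key=scores.get)


def toy_frames(center, n, rng, spread=0.5):
    """Synthetic frames around a centre; a stand-in for MFCC/pitch features."""
    return [[rng.gauss(c, spread) for c in center] for _ in range(n)]


rng = random.Random(1)
train = {
    "child": toy_frames([3.0, 2.0], 200, rng) + toy_frames([4.0, 3.0], 200, rng),
    "therapist": toy_frames([-2.0, -1.0], 200, rng) + toy_frames([-3.0, 0.0], 200, rng),
}
models = {label: DiagGMM().fit(frames) for label, frames in train.items()}

test_segment = toy_frames([3.0, 2.0], 20, rng)
predicted = classify_segment(test_segment, models)
print(predicted)
```

In a real pipeline the toy frames would be replaced by features extracted from audio, and further classes (parent, movement noise, simultaneous speech) would each get their own model; how the original study configured its models is not stated in the abstract.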
DOI: http://dx.doi.org/10.1109/EMBC.2019.8857247
PLoS One
November 2024
Department of Computer Science and Engineering, Yeungnam University, Gyeongsan, Republic of Korea.
Segmentation is widely used in speech recognition, word counting, speaker indexing, and speaker diarization. This paper describes a speaker segmentation system that detects speaker change points in multi-speaker audio recordings using feature extraction and proposed distance metric algorithms. In this new approach, pre-processing of the audio stream includes noise reduction, speech compression using the discrete wavelet transform (Daubechies wavelet 'db40' at level 2), and framing.
Adv Simul (Lond)
October 2024
D-MAVT, ETH Zurich, Leonhardstrasse, Zurich, 8092, Zurich, Switzerland.
Background: Debriefings are central to effective learning in simulation-based medical education. However, educators often face challenges when conducting debriefings, which are further compounded by the lack of empirically derived knowledge on optimal debriefing processes. The goal of this study was to explore the technical feasibility of audio-based speaker diarization for automatically, objectively, and reliably measuring debriefing interaction patterns among debriefers and participants.
J Clin Med
August 2024
Department of Psychiatry and Psychology, Institute of Neuroscience, Hospital Clinic of Barcelona, 08036 Barcelona, Catalonia, Spain.
Bipolar disorder (BD) involves significant mood and energy shifts that are reflected in speech patterns. Detecting these patterns is crucial for diagnosis and monitoring, which currently rely on subjective assessment. Advances in natural language processing offer opportunities to analyze them objectively.
JMIR Aging
August 2024
Department of Anatomy and Neurobiology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, United States.
Background: With the aging global population and the rising burden of Alzheimer disease and related dementias (ADRDs), there is a growing focus on identifying mild cognitive impairment (MCI) to enable timely interventions that could potentially slow down the onset of clinical dementia. The production of speech by an individual is a cognitively complex task that engages various cognitive domains. The ease of audio data collection highlights the potential cost-effectiveness and noninvasive nature of using human speech as a tool for cognitive assessment.
J Speech Lang Hear Res
August 2024
Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin.
Purpose: This study examines the accuracy of Interaction Detection in Early Childhood Settings (IDEAS), a program that automatically transcribes audio files and estimates linguistic units relevant to speech-language therapy, including part-of-speech units that represent features of language complexity, such as adjectives and coordinating conjunctions.
Method: Forty-five video-recorded speech-language therapy sessions involving 27 speech-language pathologists (SLPs) and 56 children were used. The measure determines the accuracy of IDEAS diarization (i.