Speech disorders affect an individual's ability to produce sounds or use the voice appropriately. Neurological, developmental, physical, and traumatic factors can cause speech disorders, which impair communication, social interaction, education, and quality of life. Successful intervention requires early and precise diagnosis so that treatment can begin promptly. However, clinical examinations by speech-language pathologists are time-consuming and subjective, motivating an automated speech disorder detection (SDD) model. Mel-spectrogram images present a visual representation of multiple speech disorders, and classifying them can identify the underlying condition. In this study, the authors proposed an image classification-based automated SDD model that classifies Mel-spectrograms to identify multiple speech disorders. Initially, a Wavelet Transform (WT) hybridization technique was employed to generate Mel-spectrograms from the voice samples. A feature extraction approach was then developed using an enhanced LeViT transformer. Finally, the extracted features were classified using an ensemble learning (EL) approach with CatBoost and XGBoost as base learners and an Extremely Randomized Trees model as the meta learner. To reduce computational resources, the authors used quantization-aware training (QAT), and they employed Shapley Additive Explanations (SHAP) values to provide model interpretability. The model's generalization was evaluated on the Voice ICar fEDerico II (VOICED) and LANNA datasets. An exceptional accuracy of 99.1% with only 8.2 million parameters demonstrates the significance of the proposed approach. The model enhances speech disorder classification and offers new prospects for building accessible, accurate, and efficient diagnostic tools. Future work may integrate multimodal data to extend the model across languages and dialects and refine it for real-time clinical and telehealth deployment.
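The stacked-ensemble stage described above (gradient-boosted base learners feeding an Extremely Randomized Trees meta learner) can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under assumptions, not the authors' implementation: CatBoost and XGBoost are separate packages, so two `GradientBoostingClassifier` instances stand in for them here, and synthetic data stands in for the LeViT features.

```python
# Sketch of a stacking ensemble: base learners' out-of-fold predictions
# become the meta learner's input features. GradientBoostingClassifier
# is a stand-in for CatBoost/XGBoost; data is synthetic, not VOICED/LANNA.
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for transformer-extracted features.
X, y = make_classification(n_samples=400, n_features=32,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("boost_a", GradientBoostingClassifier(random_state=0)),
        ("boost_b", GradientBoostingClassifier(learning_rate=0.05,
                                               random_state=0)),
    ],
    # Extremely Randomized Trees as the meta learner, as in the abstract.
    final_estimator=ExtraTreesClassifier(n_estimators=100, random_state=0),
    cv=5,  # out-of-fold predictions prevent the meta learner overfitting
)
stack.fit(X_tr, y_tr)
print(f"held-out accuracy: {stack.score(X_te, y_te):.2f}")
```

The `cv=5` setting is the key design point of stacking: each base learner's training-set predictions are produced out of fold, so the meta learner never sees predictions on data the base learner was fit on.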
DOI: http://dx.doi.org/10.1016/j.slast.2025.100261
Front Psychiatry
February 2025
Seaver Autism Center for Research and Treatment, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
Background: FOXP1 syndrome is a genetic neurodevelopmental disorder associated with complex clinical presentations including global developmental delay, mild to profound intellectual disability, speech and language impairment, autism traits, attention-deficit/hyperactivity disorder (ADHD), and a range of behavioral challenges. To date, much of the literature focuses on childhood symptoms and little is known about the FOXP1 syndrome phenotype in adolescence or adulthood.
Methods: A series of caregiver interviews and standardized questionnaires assessed psychiatric and behavioral features of 20 adolescents and adults with FOXP1 syndrome.
Sci Rep
March 2025
Department of Communication Sciences and Disorders, Saint Mary's College, Notre Dame, IN, USA.
Speech emotion recognition (SER) is an important application in Affective Computing and Artificial Intelligence. Recently, there has been significant interest in Deep Neural Networks that operate on speech spectrograms. Because the two-dimensional spectrogram representation captures rich speech characteristics, convolutional neural networks (CNNs) and advanced image recognition models are leveraged to learn deep patterns in spectrograms and perform SER effectively.
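The spectrogram representations these CNN-based pipelines consume can be sketched in NumPy/SciPy. This is a minimal, illustrative Mel-spectrogram computation using the standard HTK-style mel scale and triangular filterbank, not the implementation used by any paper above; the signal is a synthetic 440 Hz tone, and `n_fft`/`n_mels` are arbitrary example values.

```python
# Minimal Mel-spectrogram sketch: STFT power spectrogram (SciPy)
# projected through a hand-built triangular mel filterbank.
import numpy as np
from scipy.signal import spectrogram

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(x, sr, n_fft=512, n_mels=40):
    # Power spectrogram: freqs (n_fft//2 + 1,), S (freqs, frames).
    freqs, _, S = spectrogram(x, fs=sr, nperseg=n_fft)
    # n_mels triangular filters with centers evenly spaced in mel.
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0),
                                    hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, center, hi = mel_pts[i], mel_pts[i + 1], mel_pts[i + 2]
        rising = (freqs - lo) / (center - lo)
        falling = (hi - freqs) / (hi - center)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb @ S  # shape: (n_mels, frames)

sr = 16000
t = np.arange(sr) / sr                 # one second of audio
x = np.sin(2 * np.pi * 440.0 * t)      # synthetic 440 Hz tone
M = mel_spectrogram(x, sr)
print(M.shape)                          # (40, frames)
```

In practice the result is usually converted to decibels and treated as a single-channel image, which is what makes CNN-style image models applicable.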
Brain Res
March 2025
Department of Speech-Language Pathology, Federal University of Paraiba, João Pessoa, PB 58051-900, Brazil.
Unlabelled: Functional near-infrared spectroscopy (fNIRS) estimates the cortical hemodynamic response induced by sound stimuli. It can be used to understand the symptomatology of tinnitus and thereby provide effective ways of evaluating and treating the symptom.
Objective: Compare the changes in the oxy-hemoglobin and deoxy-hemoglobin concentration of individuals with and without tinnitus using auditory stimulation by fNIRS.
Neuroimage
March 2025
Inkendaal Rehabilitation Hospital, Vlezenbeek, Belgium; Université libre de Bruxelles (ULB), Faculty of Psychology, Educational Sciences and Speech and Language therapy, Brussels, Belgium.
Maturation of the auditory system in early childhood significantly influences the development of language-related perceptual and cognitive abilities. This study aims to provide insights into the neurophysiological changes underlying auditory processing and speech-sound discrimination in the first two years of life. We conducted a study using high-density electroencephalography (EEG) to longitudinally record cortical auditory event-related potentials (CAEP) in response to synthesized syllable sounds with pitch/duration change in a cohort of 79 extremely and very preterm-born infants without developmental disorders.
Am J Speech Lang Pathol
March 2025
Faculty of Health Sciences, School of Communication Sciences and Disorders, Western University, London, Ontario, Canada.
Purpose: This exploratory study evaluated the test-retest stability of three participation-based patient-reported outcome measures (PROMs) rated by individuals with Parkinson's disease (IWPD), primary communication partners (PCPs) serving as proxy raters, and control participants over three study visits spanning approximately 1 month.
Method: Twenty-three IWPD and hypophonia, 23 PCPs, and 30 control participants attended three non-intervention experimental visits. During each visit, all participants completed three participation-based PROMs: Communicative Participation Item Bank (CPIB), Voice Activity and Participation Profile (VAPP), and Levels of Speech Usage Scale (LSUS).