IMAGE CLASSIFICATION-DRIVEN SPEECH DISORDER DETECTION USING DEEP LEARNING TECHNIQUE.

SLAS Technol

Department of Documents and Archive, Center of Documents and Administrative Communication, King Faisal University, Al Hofuf, 31982, Al-Ahsa, Saudi Arabia.

Published: March 2025

Speech disorders affect an individual's ability to generate sounds or use the voice appropriately. They may arise from neurological, developmental, or physical conditions, or from trauma, and they influence communication, social interaction, education, and quality of life. Successful intervention requires early and precise diagnosis to allow prompt treatment of these conditions. However, clinical examinations by speech-language pathologists are time-consuming and subjective, which motivates an automated speech disorder detection (SDD) model. Mel-spectrogram images provide a visual representation of the voice signal, and classifying them makes it possible to identify multiple speech disorders. In this study, the authors proposed an image classification-based automated SDD model that classifies Mel-spectrograms to identify multiple speech disorders. Initially, a Wavelet Transform (WT) hybridization technique was employed to generate Mel-spectrograms from the voice samples. A feature extraction approach was then developed using an enhanced LeViT transformer. Finally, the extracted features were classified with an ensemble learning (EL) approach that uses CatBoost and XGBoost as base learners and an Extremely Randomized Trees model as the meta-learner. To reduce computational resource requirements, the authors used quantization-aware training (QAT), and Shapley Additive Explanations (SHAP) values were employed to provide model interpretability. The proposed model was generalized using the VOice ICar fEDerico II (VOICED) and LANNA datasets. An accuracy of 99.1% with only 8.2 million parameters demonstrates the significance of the proposed approach. The proposed model enhances speech disorder classification and offers novel prospects for building accessible, accurate, and efficient diagnostic tools. Future work may integrate multimodal data to extend the model's use across languages and dialects and refine it for real-time clinical and telehealth deployment.
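As a rough illustration of the pipeline described in the abstract, the Python sketch below converts voice samples to log-Mel spectrograms and stacks CatBoost and XGBoost under an Extremely Randomized Trees meta-learner, with SHAP used for interpretability. It is a minimal sketch, not the authors' implementation: the WT hybridization step and QAT are omitted, librosa is assumed for spectrogram generation, and extract_levit_features is a hypothetical placeholder for the enhanced LeViT backbone; all hyperparameters are illustrative.

# Minimal sketch of the described pipeline (assumed libraries: librosa, catboost,
# xgboost, scikit-learn, shap). Helper names and parameters are illustrative only.
import numpy as np
import librosa
import shap
from catboost import CatBoostClassifier
from xgboost import XGBClassifier
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier

def voice_to_mel_spectrogram(path, sr=16000, n_mels=128):
    """Load a voice sample and convert it to a log-Mel spectrogram image."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

def extract_levit_features(mel_batch):
    """Hypothetical stand-in for the enhanced LeViT transformer backbone;
    here the spectrograms are simply flattened into feature vectors."""
    return np.array([m.flatten() for m in mel_batch])

# Stacked ensemble: CatBoost and XGBoost as base learners,
# Extremely Randomized Trees as the meta-learner.
ensemble = StackingClassifier(
    estimators=[
        ("catboost", CatBoostClassifier(iterations=200, verbose=0)),
        ("xgboost", XGBClassifier(n_estimators=200, eval_metric="logloss")),
    ],
    final_estimator=ExtraTreesClassifier(n_estimators=200),
)

# With feature vectors X_train/X_test and disorder labels y_train (hypothetical data):
# ensemble.fit(X_train, y_train)
# explainer = shap.TreeExplainer(ensemble.named_estimators_["xgboost"])
# shap_values = explainer.shap_values(X_test)  # per-feature contributions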


Source: http://dx.doi.org/10.1016/j.slast.2025.100261

Publication Analysis

Top Keywords

speech disorders (20); speech disorder (12); proposed model (12); speech (9); disorder detection (8); sdd model (8); multiple speech (8); model (6); disorders (5); proposed (5)

Similar Publications

Background: FOXP1 syndrome is a genetic neurodevelopmental disorder associated with complex clinical presentations including global developmental delay, mild to profound intellectual disability, speech and language impairment, autism traits, attention-deficit/hyperactivity disorder (ADHD), and a range of behavioral challenges. To date, much of the literature focuses on childhood symptoms and little is known about the FOXP1 syndrome phenotype in adolescence or adulthood.

Methods: A series of caregiver interviews and standardized questionnaires assessed psychiatric and behavioral features of 20 adolescents and adults with FOXP1 syndrome.


Speech emotion recognition (SER) is an important application in Affective Computing and Artificial Intelligence. Recently, there has been significant interest in Deep Neural Networks using speech spectrograms. Because the two-dimensional representation of the spectrogram captures more speech characteristics, convolutional neural networks (CNNs) and advanced image recognition models have been leveraged to learn deep patterns in a spectrogram and effectively perform SER.


Unlabelled: Functional near-infrared spectroscopy (fNIRS) estimates the cortical hemodynamic response induced by sound stimuli. fNIRS can be used to understand the symptomatology of tinnitus and consequently provide effective ways of evaluating and treating the symptom.

Objective: To compare the changes in oxy-hemoglobin and deoxy-hemoglobin concentrations of individuals with and without tinnitus during auditory stimulation by fNIRS.


Maturation of the auditory system in early childhood significantly influences the development of language-related perceptual and cognitive abilities. This study aims to provide insights into the neurophysiological changes underlying auditory processing and speech-sound discrimination in the first two years of life. We conducted a study using high-density electroencephalography (EEG) to longitudinally record cortical auditory event-related potentials (CAEP) in response to synthesized syllable sounds with pitch/duration change in a cohort of 79 extremely and very preterm-born infants without developmental disorders.


Purpose: This exploratory study evaluated the test-retest stability of three participation-based patient-reported outcome measures (PROMs) rated by individuals with Parkinson's disease (IWPD), primary communication partners (PCPs) serving as proxy raters, and control participants over three study visits spanning approximately 1 month.

Method: Twenty-three IWPD and hypophonia, 23 PCPs, and 30 control participants attended three non-intervention experimental visits. During each visit, all participants completed three participation-based PROMs: Communicative Participation Item Bank (CPIB), Voice Activity and Participation Profile (VAPP), and Levels of Speech Usage Scale (LSUS).

