IMAGE CLASSIFICATION-DRIVEN SPEECH DISORDER DETECTION USING DEEP LEARNING TECHNIQUE.

SLAS Technol

Department of Documents and Archive, Center of Documents and Administrative Communication, King Faisal University, Al Hofuf, 31982, Al-Ahsa, Saudi Arabia.

Published: March 2025

Speech disorders affect an individual's ability to generate sounds or use the voice appropriately. They may arise from neurological, developmental, or physical conditions, or from trauma, and they influence communication, social interaction, education, and quality of life. Successful intervention requires early and precise diagnosis to allow prompt treatment of these conditions. However, clinical examinations by speech-language pathologists are time-consuming and subjective, which motivates an automated speech disorder detection (SDD) model. Mel-spectrogram images provide a visual representation of the voice signal, and classifying them makes it possible to identify multiple speech disorders. In this study, the authors proposed an image classification-based automated SDD model that classifies Mel-spectrograms to identify multiple speech disorders. Initially, a Wavelet Transform (WT) hybridization technique was employed to generate Mel-spectrograms from the voice samples. A feature extraction approach was then developed using an enhanced LeViT transformer. Finally, the extracted features were classified with an ensemble learning (EL) approach that uses CatBoost and XGBoost as base learners and an Extremely Randomized Trees model as the meta-learner. To reduce computational resource requirements, the authors used quantization-aware training (QAT), and Shapley Additive Explanations (SHAP) values were employed to provide model interpretability. The proposed model was generalized using the VOice ICar fEDerico II (VOICED) and LANNA datasets. An accuracy of 99.1% with only 8.2 million parameters demonstrates the significance of the proposed approach. The proposed model enhances speech disorder classification and offers novel prospects for building accessible, accurate, and efficient diagnostic tools. Future work may integrate multimodal data to extend the model's use across languages and dialects and refine it for real-time clinical and telehealth deployment.
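As a rough illustration of the pipeline described in the abstract, the Python sketch below converts voice samples to log-Mel spectrograms and stacks CatBoost and XGBoost under an Extremely Randomized Trees meta-learner, with SHAP used for interpretability. It is a minimal sketch, not the authors' implementation: the WT hybridization step and QAT are omitted, librosa is assumed for spectrogram generation, and extract_levit_features is a hypothetical placeholder for the enhanced LeViT backbone; all hyperparameters are illustrative.

# Minimal sketch of the described pipeline (assumed libraries: librosa, catboost,
# xgboost, scikit-learn, shap). Helper names and parameters are illustrative only.
import numpy as np
import librosa
import shap
from catboost import CatBoostClassifier
from xgboost import XGBClassifier
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier

def voice_to_mel_spectrogram(path, sr=16000, n_mels=128):
    """Load a voice sample and convert it to a log-Mel spectrogram image."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

def extract_levit_features(mel_batch):
    """Hypothetical stand-in for the enhanced LeViT transformer backbone;
    here the spectrograms are simply flattened into feature vectors."""
    return np.array([m.flatten() for m in mel_batch])

# Stacked ensemble: CatBoost and XGBoost as base learners,
# Extremely Randomized Trees as the meta-learner.
ensemble = StackingClassifier(
    estimators=[
        ("catboost", CatBoostClassifier(iterations=200, verbose=0)),
        ("xgboost", XGBClassifier(n_estimators=200, eval_metric="logloss")),
    ],
    final_estimator=ExtraTreesClassifier(n_estimators=200),
)

# With feature vectors X_train/X_test and disorder labels y_train (hypothetical data):
# ensemble.fit(X_train, y_train)
# explainer = shap.TreeExplainer(ensemble.named_estimators_["xgboost"])
# shap_values = explainer.shap_values(X_test)  # per-feature contributions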


Source: http://dx.doi.org/10.1016/j.slast.2025.100261

Publication Analysis

Top Keywords

speech disorders (20); speech disorder (12); proposed model (12); speech (9); disorder detection (8); sdd model (8); multiple speech (8); model (6); disorders (5); proposed (5)

Similar Publications

Background: FOXP1 syndrome is a genetic neurodevelopmental disorder associated with complex clinical presentations including global developmental delay, mild to profound intellectual disability, speech and language impairment, autism traits, attention-deficit/hyperactivity disorder (ADHD), and a range of behavioral challenges. To date, much of the literature focuses on childhood symptoms and little is known about the FOXP1 syndrome phenotype in adolescence or adulthood.

Methods: A series of caregiver interviews and standardized questionnaires assessed psychiatric and behavioral features of 20 adolescents and adults with FOXP1 syndrome.


Speech emotion recognition (SER) is an important application in Affective Computing and Artificial Intelligence. Recently, there has been significant interest in Deep Neural Networks using speech spectrograms. Because the two-dimensional representation of the spectrogram captures more speech characteristics, convolutional neural networks (CNNs) and advanced image recognition models have been leveraged to learn deep patterns in a spectrogram and effectively perform SER.


Unlabelled: Functional near-infrared spectroscopy (fNIRS) estimates the cortical hemodynamic response induced by sound stimuli. fNIRS can be used to understand the symptomatology of tinnitus and consequently provide effective ways of evaluating and treating the symptom.

Objective: To compare the changes in oxy-hemoglobin and deoxy-hemoglobin concentrations of individuals with and without tinnitus during auditory stimulation by fNIRS.


Maturation of the auditory system in early childhood significantly influences the development of language-related perceptual and cognitive abilities. This study aims to provide insights into the neurophysiological changes underlying auditory processing and speech-sound discrimination in the first two years of life. We conducted a study using high-density electroencephalography (EEG) to longitudinally record cortical auditory event-related potentials (CAEP) in response to synthesized syllable sounds with pitch/duration change in a cohort of 79 extremely and very preterm-born infants without developmental disorders.


Purpose: This exploratory study evaluated the test-retest stability of three participation-based patient-reported outcome measures (PROMs) rated by individuals with Parkinson's disease (IWPD), primary communication partners (PCPs) serving as proxy raters, and control participants over three study visits spanning approximately 1 month.

Method: Twenty-three IWPD and hypophonia, 23 PCPs, and 30 control participants attended three non-intervention experimental visits. During each visit, all participants completed three participation-based PROMs: Communicative Participation Item Bank (CPIB), Voice Activity and Participation Profile (VAPP), and Levels of Speech Usage Scale (LSUS).

