Many acoustic features and machine learning models have been studied to build automatic detection systems to distinguish dysarthric speech from healthy speech. These systems can help to improve the reliability of diagnosis. However, speech recorded for diagnosis in real-life clinical conditions can differ from the training data of the detection system in terms of, for example, recording conditions, speaker identity, and language. These mismatches may lead to a reduction in detection performance in practical applications. In this study, we investigate the use of the wav2vec2 model as a feature extractor together with a support vector machine (SVM) classifier to build automatic detection systems for dysarthric speech. The performance of the wav2vec2 features is evaluated in two cross-database scenarios, language-dependent and language-independent, to study their generalizability to unseen speakers, recording conditions, and languages before and after fine-tuning the wav2vec2 model. The results revealed that the fine-tuned wav2vec2 features showed better generalization in both scenarios and gave an absolute accuracy improvement of 1.46%-8.65% compared to the non-fine-tuned wav2vec2 features.
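The detection pipeline the abstract describes (utterance-level wav2vec2 embeddings fed to an SVM classifier) can be sketched as follows. Since the abstract does not give the model variant, data, or hyperparameters, the wav2vec2 step is simulated here with synthetic fixed-length feature vectors; the class shift, kernel, and all values below are illustrative assumptions, not the study's settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for the feature extractor: in the study each
# utterance would be mapped to a wav2vec2 embedding (e.g. a pooled
# hidden-state vector); here class-shifted Gaussian vectors simulate that.
rng = np.random.default_rng(0)
n_per_class, dim = 100, 768  # 768 = hidden size of wav2vec2-base (assumed)
healthy = rng.normal(0.0, 1.0, (n_per_class, dim))
dysarthric = rng.normal(0.5, 1.0, (n_per_class, dim))
X = np.vstack([healthy, dysarthric])
y = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = healthy, 1 = dysarthric

# Speaker-independent splits would be used in practice; a random
# stratified split stands in for that here.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
clf = SVC(kernel="rbf", C=1.0)  # RBF-kernel SVM over the embeddings
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"held-out accuracy: {acc:.2f}")
```

In a cross-database evaluation like the one described, the train and test sets would instead come from different corpora (different recording conditions, speakers, or languages), which is exactly where the fine-tuned embeddings are reported to generalize better.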
DOI: http://dx.doi.org/10.1109/JBHI.2024.3392829
Int J Comput Assist Radiol Surg
December 2024
Department of Neurosurgery, Tokyo Women's Medical University, Tokyo, Japan.
J Acoust Soc Am
November 2024
Department of Information Engineering and Computer Science, Feng Chia University, Taichung 407, Taiwan.
PLoS One
September 2024
College of Information Technology, Zhejiang Shuren University, Hangzhou, Zhejiang, China.
Alzheimer's disease is a neurodegenerative disorder, and language impairment is one of its common and prominent early symptoms. Early diagnosis of Alzheimer's disease through speech and text information is therefore of significant importance. However, multimodal data are often complex and inconsistent, which leads to inadequate feature extraction.
Clin Linguist Phon
August 2024
Department of Rehabilitation Medicine, Incheon St.Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.
IEEE J Biomed Health Inform
August 2024