Different Performances of Machine Learning Models to Classify Dysphonic and Non-Dysphonic Voices.

J Voice

Department of Statistics, Graduate Program in Health Decision Models, Universidade Federal da Paraíba - UFPB, João Pessoa, Paraíba, Brasil; Department of Speech-Language and Hearing Sciences, Graduate Program in Linguistics, Universidade Federal da Paraíba - UFPB, João Pessoa, Paraíba, Brasil. Electronic address:

Published: December 2022

Objective: To analyze the performance of 10 different machine learning (ML) classifiers in discriminating between dysphonic and non-dysphonic voices, using a variance threshold to select and reduce the acoustic measurements used by the classifiers.

Method: We analyzed 435 samples of individuals (337 female and 98 male), with a mean age of 41.07 ± 13.73 years, of which 384 were dysphonic and 51 were non-dysphonic. From the sustained /ε/ vowel sample, 34 acoustic measurements were extracted, including traditional perturbation and noise measurements, cepstral/spectral measurements, and measurements based on nonlinear models. The variance-threshold method was used to select the best set of acoustic measurements. We tested the performance of the best selected set with 10 ML classifiers using precision, sensitivity, specificity, accuracy, and F1-score. The kappa coefficient was used to verify the reproducibility between the two datasets (training and testing).
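As a minimal sketch of the variance-threshold selection step, the scikit-learn idiom looks like the following. The feature matrix here is random stand-in data (not the study's 34 acoustic measures), and the threshold value is illustrative, since the paper does not report the exact cutoff used:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical acoustic feature matrix: 435 samples x 34 measures
rng = np.random.default_rng(0)
X = rng.normal(size=(435, 34))
X[:, :5] *= 0.01  # make a few measures near-constant

# Drop measures whose variance falls below a chosen threshold
selector = VarianceThreshold(threshold=1e-3)
X_reduced = selector.fit_transform(X)
print(X_reduced.shape[1], "measures retained")
```

Note that variance is scale-dependent, so in practice the measures are usually inspected (or standardized deliberately) before choosing a threshold.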

Results: The naive Bayes (NB) and stochastic gradient descent classifier (SGDC) models performed best in terms of accuracy, AUC, sensitivity, and specificity for a reduced dataset of 15 acoustic measures compared to the full dataset of 34 acoustic measures. SGDC and NB obtained the best performance results, with an accuracy of 0.91 and 0.76, respectively. These two classifiers presented moderate agreement, with a kappa of 0.57 (SGDC) and 0.45 (NB).
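As a rough illustration of evaluating these two classifier families (not the study's data or pipeline), one can train scikit-learn's GaussianNB and SGDClassifier on a synthetic 15-feature dataset with a similar class imbalance (~12% of one class) and report accuracy alongside Cohen's kappa:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Synthetic stand-in for the 15 selected acoustic measures
X, y = make_classification(n_samples=435, n_features=15,
                           weights=[0.12, 0.88], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for model in (GaussianNB(), SGDClassifier(random_state=0)):
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(type(model).__name__,
          f"accuracy={accuracy_score(y_te, pred):.2f}",
          f"kappa={cohen_kappa_score(y_te, pred):.2f}")
```

Kappa is a useful companion to accuracy here because, with 88% of samples in one class, a majority-vote classifier already reaches 0.88 accuracy but has a kappa of 0.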

Conclusion: Among the tested models, the NB and SGDC models performed better in discriminating between dysphonic and non-dysphonic voices from a set of 15 acoustic measures.

Source
http://dx.doi.org/10.1016/j.jvoice.2022.11.001

Publication Analysis

Top Keywords

dysphonic non-dysphonic (16); non-dysphonic voices (12); acoustic measurements (12); acoustic measures (12); machine learning (8); set acoustic (8); sensitivity specificity (8); sgdc models (8); models performed (8); dataset acoustic (8)

Similar Publications

Comprehensive Review of Multilingual Patient-Reported Outcome Measures for Dysphonia.

J Voice

January 2025

Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA; Boston University Chobanian and Avedisian School of Medicine, Boston, MA. Electronic address:

Introduction: Patient-reported outcome measures (PROMs) represent an important part of a comprehensive voice assessment for clinical care and research. Access to multilingual PROMs enables inclusion of information from diverse patient populations. This review compares available translated and validated PROMs for adult dysphonia.


Objective: To develop a multiparametric index based on machine learning (ML) to predict and classify the overall degree of vocal deviation (GG).

Method: The sample consisted of 300 dysphonic and non-dysphonic participants of both sexes. The two speech tasks were a sustained vowel [a] and connected speech (counting numbers from 1 to 10).


Background: Acoustic vocal analysis provides objective and measurable values for various voice parameters, such as fundamental frequency (F0), shimmer, jitter, and the noise-to-harmonics ratio (NHR). In severely dysphonic patients, who present increased variability in glottic cycles and abnormalities in vocal intensity, conventional acoustic analysis is an unreliable predictor of dysphonia. The logarithmic transformation of the vocal spectrum (the cepstrum) allows the signal to be characterized without relying on recording technique, frequency, or vocal intensity.
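The cepstrum mentioned above is the inverse Fourier transform of the log magnitude spectrum. A minimal NumPy sketch on a synthetic harmonic signal (the signal and parameters are illustrative, not from any of the studies) shows how the dominant cepstral peak lands at the pitch period:

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)  # guard against log(0)
    return np.fft.ifft(log_mag).real

# 1-second synthetic "voiced" signal: harmonics of F0 = 200 Hz at fs = 8 kHz
fs, f0 = 8000, 200
t = np.arange(fs) / fs
x = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, 20))

ceps = real_cepstrum(x)
# Search quefrencies covering a 100-400 Hz pitch range
# (fs/400 .. fs/100 samples); the peak sits at the pitch period.
lo, hi = fs // 400, fs // 100
peak = lo + int(np.argmax(ceps[lo:hi]))
print("pitch period estimate:", peak, "samples ->", fs / peak, "Hz")
```

Cepstral peak prominence measures used clinically (e.g., CPP/CPPS) build on this idea by quantifying how far that peak rises above a regression line fit to the surrounding cepstrum.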


Introduction: The present study aimed to validate the Voice-Related Quality of Life (V-RQOL) vocal self-assessment questionnaire for Spanish.

Methods: The validation and psychometric properties were developed according to the criteria of the Scientific Advisory Committee of Medical Outcomes Trust (SAC). The Spanish translation for linguistic and cultural adaptation of the V-RQOL was used.


Objectives: Posterior glottic diastasis (PGD) is an underappreciated etiology of dysphonia in patients with prior airway reconstruction or prolonged intubation. In endoscopic posterior cricoid reduction (ePCR), cricoid cartilage is removed to minimize the posterior glottic gap. Dynamic voice computed tomography (DVCT) permits visualization of the posterior glottis, estimating the amount of cricoid to be removed.

