Objective: To develop and validate a deep-learning classifier trained on voice data extracted from videolaryngostroboscopy recordings, differentiating between three different vocal fold (VF) states: healthy (HVF), unilateral paralysis (UVFP), and VF lesions, including benign and malignant pathologies.
Methods: Patients with UVFP (n = 105), VF lesions (n = 63), and HVF (n = 41) were retrospectively identified. Voice samples were extracted from stroboscopic videos (Pentax Laryngeal Strobe Model 9400), including sustained /i/ phonation, pitch glide, and /i/ sniff task. Extracted audio files were converted into Mel-spectrograms. Voice samples were independently divided into training (80%), validation (10%), and test (10%) sets by patient. Pretrained ResNet18 models were trained to classify (1) HVF and pathological VF (lesions and UVFP), and (2) HVF, UVFP, and VF lesions. Both classifiers were further validated on an external dataset consisting of 12 UVFP, 13 VF lesion, and 15 HVF patients. Model performance was evaluated by accuracy and F1-score.
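The patient-level 80/10/10 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_by_patient`, the `(patient_id, path)` sample format, the fixed seed, and the absence of class stratification are all assumptions, since the abstract does not specify them.

```python
import random
from collections import defaultdict


def split_by_patient(samples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split (patient_id, path) pairs so each patient lands in exactly one set.

    Hypothetical helper: the abstract only states that samples were divided
    80/10/10 by patient; everything else here is an assumption.
    """
    # Group every recording under the patient it came from.
    by_patient = defaultdict(list)
    for patient_id, path in samples:
        by_patient[patient_id].append(path)

    # Shuffle patients (not recordings) reproducibly.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    # Cut the shuffled patient list into train / validation / test.
    n = len(patient_ids)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    groups = {
        "train": patient_ids[:n_train],
        "val": patient_ids[n_train:n_train + n_val],
        "test": patient_ids[n_train + n_val:],
    }

    # Expand each patient group back into its recordings.
    return {
        name: [(pid, path) for pid in ids for path in by_patient[pid]]
        for name, ids in groups.items()
    }


# Toy data: 20 made-up patients with three recordings each.
samples = [
    (f"P{i:02d}", f"P{i:02d}_{task}.wav")
    for i in range(20)
    for task in ("sustained", "glide", "sniff")
]
splits = split_by_patient(samples)
for name, items in splits.items():
    n_patients = len({pid for pid, _ in items})
    print(name, n_patients, "patients,", len(items), "samples")
```

Splitting on patients rather than on individual recordings keeps all tasks from one speaker in the same set, so the test set contains only voices the model has never seen.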
Results: When evaluated on a hold-out test set, the binary classifier demonstrated stronger performance compared to the multi-class classifier (accuracy 83% vs. 40%; F1-score 0.90 vs. 0.36). When evaluated on an external dataset, the binary classifier achieved an accuracy of 63% and F1-score of 0.48, compared to 35% and 0.25 for the multi-class classifier.
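For reference, accuracy and F1-score can be computed as in the sketch below. The labels and predictions are invented toy values, not study data, and macro averaging across classes is an assumption: the abstract does not say how the multi-class F1-score was averaged.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Invented toy labels for the three classes named in the abstract.
labels = ["HVF", "UVFP", "lesion"]
y_true = ["HVF", "HVF", "UVFP", "UVFP", "lesion", "lesion"]
y_pred = ["HVF", "UVFP", "UVFP", "UVFP", "lesion", "HVF"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels)
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

Macro averaging weights each class equally regardless of its size, which matters when class sizes differ as they do in this cohort (105, 63, and 41 patients).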
Conclusions: Deep-learning classifiers differentiating HVF, UVFP, and VF lesions were developed using voice data from stroboscopic videos. Although healthy and pathological voices were differentiated with moderate accuracy, multi-class classification lowered model performance. Both models performed poorly on an external dataset. Voice captured in stroboscopic videos may have limited diagnostic value, though further studies are needed.
Level of Evidence: 4. Laryngoscope, 2025.
DOI: http://dx.doi.org/10.1002/lary.32036
J Clin Med
February 2025
Faculty of Medicine, "Carol Davila" University of Medicine and Pharmacy, 20021 Bucharest, Romania.
Dysphonia, a common symptom after thyroid surgery, is most often caused by damage to the recurrent laryngeal nerve. Laryngeal electromyography (LEMG) is used as a qualitative diagnostic tool to distinguish neurological etiology from other causes of dysphonia. The purpose of this study is to establish the value of LEMG as a predictive factor in the recovery of unilateral recurrent paralysis post-thyroidectomy.
Laryngoscope
February 2025
Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, U.S.A.
Ear Nose Throat J
November 2024
Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam-si, Republic of Korea.
Hyaluronic acid (HA) is a commonly used injectable material for temporary vocal fold injection (VFI) in patients with unilateral vocal fold paralysis (UVFP). Hyaluronic acid is generally known to last three to six months following VFI. Owing to recent advances in cross-linking technologies, the longevity of HA-based materials, including deep-volumizing cross-linked HA used in VFI, has been improved.
J Voice
January 2024
Communication Science and Disorders, University of Pittsburgh, Pittsburgh, Pennsylvania; University of Pittsburgh Voice Center, Department of Otolaryngology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania.
Objective: The potential for negative sequelae in psychosocial well-being makes psychosocial impact clinically important to the assessment of voice disorders. Despite the impairment voice disorders cause in the psychosocial domain, the clinical assessment of these disorders relies heavily on visual-perceptual judgments of the larynx, as well as audio-perceptual, acoustic, and aerodynamic measures. While these measures aid in accurate diagnosis and are necessary for standard of care, they offer little insight into the patient experience of having a voice disorder.
Acta Otorhinolaryngol Ital
June 2021
IRCCS Ospedale Policlinico San Martino, Genoa, Italy.
Objective: The development of transnasal fiberoptic laryngoscopy, the integration of an operative channel (OC), and the advent of high-definition television imaging, together with improvements in laser technology, cleared the way for office-based laryngology. Three treatment categories can be identified: bioendoscopy-guided biopsy, laryngeal injection, and laser-assisted surgery.
Methods: Twenty-six patients underwent office-based procedures (OBPs) at the Otolaryngology Clinic of IRCCS Policlinico San Martino, Genoa, Italy.