Automatic vocal tract landmark localization from midsagittal MRI data.

Sci Rep

Clinic for Phoniatrics, Pedaudiology & Communication Disorders, University Hospital and Medical Faculty, RWTH Aachen University, Aachen, Germany.

Published: January 2020

The various speech sounds of a language are obtained by varying the shape and position of the articulators surrounding the vocal tract. Analyzing their variations is crucial for understanding speech production, diagnosing speech disorders and planning therapy. Identifying key anatomical landmarks of these structures on medical images is a prerequisite for any quantitative analysis, and the growing volume of data generated in the field calls for an automatic solution. The challenge lies in the high inter- and intra-speaker variability, the mutual interaction between the articulators and the moderate quality of the images. This study addresses this issue for the first time and tackles it by means of Deep Learning. It proposes a dedicated network architecture named Flat-net, whose performance is evaluated and compared with that of eleven state-of-the-art methods from the literature. The dataset contains midsagittal anatomical Magnetic Resonance Images for 9 speakers sustaining 62 articulations, with 21 annotated anatomical landmarks per image. Results show that the Flat-net approach outperforms these methods, leading to an overall Root Mean Square Error of 3.6 pixels/0.36 cm obtained in a leave-one-out procedure over the speakers. The implementation code is also shared publicly on GitHub.
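The evaluation protocol summarized in the abstract (a Root Mean Square Error over landmark positions, computed in a leave-one-out procedure over the speakers) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the array layout, the RMSE definition (root of the mean squared Euclidean distance between predicted and annotated landmarks) and the pixel size of 0.1 cm (inferred from the reported 3.6 pixels/0.36 cm) are all assumptions, and the data are synthetic.

```python
import numpy as np

# Assumption: 0.1 cm per pixel, inferred from "3.6 pixels/0.36 cm" in the abstract.
PIXEL_SIZE_CM = 0.1


def landmark_rmse(pred, truth):
    """RMSE of Euclidean landmark errors, in pixels.

    Both arrays have shape (n_images, n_landmarks, 2). This definition is an
    assumption; the abstract does not spell out the exact formula.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    sq_dist = np.sum((pred - truth) ** 2, axis=-1)  # squared distance per landmark
    return float(np.sqrt(sq_dist.mean()))


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices) for each speaker."""
    speaker_ids = np.asarray(speaker_ids)
    for speaker in np.unique(speaker_ids):
        test_idx = np.flatnonzero(speaker_ids == speaker)
        train_idx = np.flatnonzero(speaker_ids != speaker)
        yield speaker, train_idx, test_idx


# Synthetic stand-in for the dataset: 9 speakers, 62 images each (assumed to be
# one image per articulation and speaker), 21 landmarks per image.
n_speakers, n_articulations, n_landmarks = 9, 62, 21
speaker_ids = np.repeat(np.arange(n_speakers), n_articulations)
rng = np.random.default_rng(0)
truth = rng.uniform(0, 256, size=(speaker_ids.size, n_landmarks, 2))

# Hypothetical "predictions": every landmark is off by (3, 4) pixels, so each
# Euclidean error is exactly 5 pixels. A real model would be trained on
# train_idx inside the loop below.
pred = truth + np.array([3.0, 4.0])

fold_rmse = []
for speaker, train_idx, test_idx in leave_one_speaker_out(speaker_ids):
    fold_rmse.append(landmark_rmse(pred[test_idx], truth[test_idx]))

overall_px = landmark_rmse(pred, truth)
overall_cm = overall_px * PIXEL_SIZE_CM
print(f"overall RMSE: {overall_px:.1f} px / {overall_cm:.2f} cm")
```

Holding out all images of one speaker at a time keeps the test speaker's anatomy unseen during training, which is what makes the reported error a measure of generalization across speakers rather than across articulations.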

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6992757
DOI: http://dx.doi.org/10.1038/s41598-020-58103-6

Publication Analysis

Top Keywords

vocal tract: 8
anatomical landmarks: 8
automatic vocal: 4
tract landmark: 4
landmark localization: 4
localization midsagittal: 4
midsagittal mri: 4
mri data: 4
data speech: 4
speech sounds: 4

Similar Publications

Aerodynamic and Acoustic Power in Infant Cry.

J Voice

January 2025

Utah Center for Vocology, University of Utah, Salt Lake City, UT; National Center for Voice and Speech, Salt Lake City, UT.

Objectives: Acoustic and aerodynamic powers in infant cry are not scaled downward with body size or vocal tract size. The objective here was to show that high lung pressures and impedance matching are used to produce power levels comparable to those in adults.

Study Design And Methodology: A computational model was used to obtain power distributions along the infant airway.

Purpose: The aim was to determine and compare the short-term effects of two intensive semi-occluded vocal tract (SOVT) programs, "straw phonation" (SP) and "resonant voice therapy" (RVT), on the phonation of children with vocal fold nodules.

Method: A pretest-posttest randomized controlled study design was used. Thirty children aged 6-12 years were randomly assigned to the SP group (n = 11), RVT group (n = 11), or control group receiving indirect treatment (n = 8) for their voice problems.

Salient Voice Symptoms in Primary Muscle Tension Dysphonia.

J Voice

January 2025

School of Behavioral and Brain Sciences, Department of Speech, Language, and Hearing, Callier Center for Communication Disorders, University of Texas at Dallas, Richardson, TX; Department of Otolaryngology - Head and Neck Surgery, University of Texas Southwestern Medical Center, Dallas, TX.

Introduction: Patients with primary muscle tension dysphonia (pMTD) commonly report symptoms of vocal effort, fatigue, discomfort, odynophonia, and aberrant vocal quality (e.g., vocal strain, hoarseness). However, the voice symptoms most salient to pMTD have not been identified. Furthermore, it is unclear how standard vocal fatigue and vocal tract discomfort indices that capture persistent symptoms, such as the Vocal Fatigue Index (VFI) and Vocal Tract Discomfort Scale (VTDS), relate to acute symptoms experienced at the time of the voice evaluation.

Exploring awareness of hearing loss and ear health in Jordanian adults.

PLoS One

December 2024

Faculty of Allied Medical Sciences, Department of Audiology and Speech Pathology, Al-Ahliyya Amman University, Amman, Jordan.

Objective: To assess awareness of hearing loss and ear health among adults in Jordan.

Methods: A cross-sectional study was conducted in which a questionnaire was administered from November to December 2023 to assess the level of awareness of hearing loss and ear health. The participants were Jordanian adults (age ≥ 18 years) residing in the North, Middle and South of Jordan.

Magnetic Resonance Imaging (MRI) allows speech production to be analyzed by capturing high-resolution images of the dynamic processes in the vocal tract. In clinical applications, combining MRI with synchronized speech recordings leads to improved patient outcomes, especially if a phonological-based approach is used for assessment. However, when audio signals are unavailable, sound recognition accuracy decreases because only MRI data can be used.
