The various speech sounds of a language are obtained by varying the shape and position of the articulators surrounding the vocal tract. Analyzing these variations is crucial for understanding speech production, diagnosing speech disorders, and planning therapy. Identifying key anatomical landmarks of these structures on medical images is a prerequisite for any quantitative analysis, and the rising amount of data generated in the field calls for an automatic solution. The challenge lies in the high inter- and intra-speaker variability, the mutual interaction between the articulators, and the moderate quality of the images. This study addresses this issue for the first time, tackling it by means of Deep Learning. It proposes a dedicated network architecture named Flat-net, whose performance is evaluated and compared with eleven state-of-the-art methods from the literature. The dataset contains midsagittal anatomical Magnetic Resonance Images of 9 speakers sustaining 62 articulations, with 21 annotated anatomical landmarks per image. Results show that the Flat-net approach outperforms the other methods, achieving an overall Root Mean Square Error of 3.6 pixels (0.36 cm) in a leave-one-out procedure over the speakers. The implementation code is shared publicly on GitHub.
Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6992757
DOI: http://dx.doi.org/10.1038/s41598-020-58103-6
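For illustration only, the evaluation protocol described in the abstract above (leave-one-speaker-out training with an overall RMSE over the 21 landmarks) could be sketched roughly as follows. This is not the authors' published implementation; the helpers load_annotations and train_and_predict are hypothetical placeholders, and the RMSE definition follows one common convention (root of the mean squared Euclidean distance between predicted and annotated landmark positions, in pixels).

```python
# Minimal sketch of a leave-one-speaker-out RMSE evaluation for landmark
# detection. Hypothetical helpers: load_annotations / train_and_predict.
import numpy as np

def rmse(pred: np.ndarray, truth: np.ndarray) -> float:
    """Root of the mean squared Euclidean distance, in pixels.

    pred, truth: arrays of shape (n_images, n_landmarks, 2).
    """
    sq_dist = np.sum((pred - truth) ** 2, axis=-1)   # squared distance per landmark
    return float(np.sqrt(np.mean(sq_dist)))

def leave_one_speaker_out(speakers, load_annotations, train_and_predict):
    """Train on all speakers but one, test on the held-out speaker."""
    errors = {}
    for held_out in speakers:
        train_set = [s for s in speakers if s != held_out]
        truth = load_annotations(held_out)             # (n_images, 21, 2)
        pred = train_and_predict(train_set, held_out)  # same shape as truth
        errors[held_out] = rmse(pred, truth)
    return errors
```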
J Voice
January 2025
Utah Center for Vocology, University of Utah, Salt Lake City, UT; National Center for Voice and Speech, Salt Lake City, UT. Electronic address:
Objectives: Acoustic and aerodynamic powers in infant cry are not scaled downward with body size or vocal tract size. The objective here was to show that high lung pressures and impedance matching are used to produce power levels comparable to those in adults.
Study Design And Methodology: A computational model was used to obtain power distributions along the infant airway.
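As a rough illustration of the quantities involved (not the computational model used in this study): aerodynamic power can be estimated as the product of lung (driving) pressure and mean airflow, and the fraction radiated as acoustic power depends on how efficiently the glottal source is impedance-matched to the vocal tract. The numerical values below are placeholders chosen only to show the arithmetic.

```python
# Illustrative back-of-the-envelope calculation, not the study's model;
# all numerical values are placeholders.

def aerodynamic_power(lung_pressure_pa: float, airflow_m3_s: float) -> float:
    """Aerodynamic power (W) = driving pressure (Pa) x volume velocity (m^3/s)."""
    return lung_pressure_pa * airflow_m3_s

def acoustic_power(aero_power_w: float, efficiency: float) -> float:
    """Radiated acoustic power as a fraction of aerodynamic power.

    The efficiency term stands in for glottal-to-vocal-tract impedance
    matching; typical vocal efficiencies are well under 1%.
    """
    return aero_power_w * efficiency

# Example: a high lung pressure (~3 kPa) with a small airflow (~0.1 L/s)
p_aero = aerodynamic_power(3000.0, 0.1e-3)   # 0.3 W
p_acoustic = acoustic_power(p_aero, 1e-3)    # ~0.3 mW radiated
print(f"aerodynamic: {p_aero:.3f} W, acoustic: {p_acoustic * 1e3:.2f} mW")
```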
J Speech Lang Hear Res
January 2025
Center for Speech and Language Sciences, Department of Rehabilitation Sciences, Ghent University, Belgium.
Purpose: The aim was to determine and compare the short-term effects of two intensive semi-occluded vocal tract (SOVT) programs, "straw phonation" (SP) and "resonant voice therapy" (RVT), on the phonation of children with vocal fold nodules.
Method: A pretest-posttest randomized controlled study design was used. Thirty children aged 6-12 years were randomly assigned to the SP group (n = 11), RVT group (n = 11), or control group receiving indirect treatment (n = 8) for their voice problems.
J Voice
January 2025
School of Behavioral and Brain Sciences, Department of Speech, Language, and Hearing, Callier Center for Communication Disorders, University of Texas at Dallas, Richardson, TX; Department of Otolaryngology - Head and Neck Surgery, University of Texas Southwestern Medical Center, Dallas, TX. Electronic address:
Introduction: Patients with primary muscle tension dysphonia (pMTD) commonly report symptoms of vocal effort, fatigue, discomfort, odynophonia, and aberrant vocal quality (e.g., vocal strain, hoarseness). However, the voice symptoms most salient to pMTD have not been identified. Furthermore, it is unclear how standard vocal fatigue and vocal tract discomfort indices that capture persistent symptoms, such as the Vocal Fatigue Index (VFI) and Vocal Tract Discomfort Scale (VTDS), relate to acute symptoms experienced at the time of the voice evaluation.
PLoS One
December 2024
Faculty of Allied Medical Sciences, Department of Audiology and Speech Pathology, Al-Ahliyya Amman University, Amman, Jordan.
Objective: To assess awareness of hearing loss and ear health among adults in Jordan.
Methods: A cross-sectional study was conducted in which a questionnaire was administered from November to December 2023 to assess the level of awareness of hearing loss and ear health. Participants were Jordanian adults (age ≥ 18 years) residing in the north, middle, and south of Jordan.
Interspeech
September 2024
Pattern Recognition Lab. Friedrich-Alexander University, Erlangen, Germany.
Magnetic Resonance Imaging (MRI) allows speech production to be analyzed by capturing high-resolution images of the dynamic processes in the vocal tract. In clinical applications, combining MRI with synchronized speech recordings leads to improved patient outcomes, especially when a phonology-based approach is used for assessment. However, when audio signals are unavailable and only the MRI data are used, sound recognition accuracy decreases.
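As a purely hypothetical sketch of MRI-only sound recognition (not the system evaluated in this work), individual midsagittal MRI frames could be classified into phoneme classes with a small convolutional network. The architecture, input size (1 x 128 x 128), and class count (40) below are assumptions.

```python
# Hypothetical illustration: classify single midsagittal MRI frames into
# phoneme classes with a small CNN (PyTorch). Shapes and class count assumed.
import torch
import torch.nn as nn

class FrameClassifier(nn.Module):
    def __init__(self, n_classes: int = 40):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, H, W) grayscale MRI frames
        h = self.features(x).flatten(1)
        return self.classifier(h)

model = FrameClassifier()
logits = model(torch.randn(8, 1, 128, 128))  # dummy batch of 8 frames
print(logits.shape)                          # torch.Size([8, 40])
```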