The relationship between formant frequencies and vocal tract length (VTL) has been studied intensively over the years. By contrast, the connection between VTL and mel-frequency cepstral coefficients (MFCCs), which concisely encode the overall shape of a speaker's spectral envelope in just a few coefficients, has been analyzed only modestly and merits further investigation. Using different statistical models, this article therefore explores the advantages and disadvantages of the relatively novel MFCC-based approach in contrast to the more traditional formant-based one. In addition, VTL is usually assumed to be a static, inherent characteristic of a speaker; that is, a single length parameter is typically estimated per speaker. In this paper, we instead consider VTL estimation from a dynamic perspective, using real-time magnetic resonance imaging (rtMRI) to measure VTL in parallel with the audio signal. The experiments used data from the USC-TIMIT magnetic resonance videos, which allow 2D real-time analysis of the articulators in motion. We observed that MFCCs perform better in speaker-dependent modeling; in cross-speaker modeling, however, where different speakers' data are used for training and evaluation, their performance is not significantly different from that obtained with formants. We also note that MFCC-based estimation is robust and has an acceptable computational time complexity, consistent with the traditional approach.
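For orientation, the traditional formant-based view of VTL can be illustrated with the textbook uniform-tube approximation, in which a tube closed at the glottis and open at the lips resonates at F_n = (2n - 1)c / (4L). The sketch below inverts that relation and averages the per-formant estimates. It is not the statistical modeling used in the article: the function name, the assumed speed of sound (35,000 cm/s), and the example formant values are illustrative assumptions only.

```python
# Minimal sketch: VTL from formants under a uniform-tube (quarter-wavelength) model.
# Assumption: speed of sound of 35,000 cm/s for warm, humid air in the vocal tract.
SPEED_OF_SOUND_CM_PER_S = 35000.0


def vtl_from_formants(formants_hz, c=SPEED_OF_SOUND_CM_PER_S):
    """Return a VTL estimate in cm by averaging per-formant estimates.

    formants_hz: formant frequencies F1, F2, ... in Hz, in ascending order.
    Each formant F_n gives L = (2n - 1) * c / (4 * F_n).
    """
    if not formants_hz:
        raise ValueError("at least one formant frequency is required")
    if any(f <= 0 for f in formants_hz):
        raise ValueError("formant frequencies must be positive")
    estimates = [
        (2 * n - 1) * c / (4.0 * f)
        for n, f in enumerate(formants_hz, start=1)
    ]
    return sum(estimates) / len(estimates)


# Hypothetical neutral-vowel formants that match an ideal uniform tube.
neutral_vowel = [500.0, 1500.0, 2500.0, 3500.0]
print(round(vtl_from_formants(neutral_vowel), 2))  # 17.5
```

Real vowels deviate from the uniform-tube assumption, which is one reason statistical models over formants or MFCCs are used instead of this closed-form inversion.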
DOI: http://dx.doi.org/10.1016/j.jvoice.2023.05.012
J Voice
January 2025
Department of Surgery, Division of Otolaryngology, University of Wisconsin-Madison, Madison, WI.
Introduction: Straw phonation therapy, a form of semi-occluded vocal tract (SOVT) exercise, is commonly used to help treat various voice disorders. Although straw phonation therapy has been studied extensively for decades, the impact of straw depth on vocal function remains unexplored. This study aims to quantify the effects of various straw vocal tract insertion depths (VTID) on common aerodynamic parameters such as phonation threshold pressure (PTP), phonation threshold flow (PTF), and phonation threshold power (PTW) in an ex vivo canine model.
Q J Exp Psychol (Hove)
January 2025
Department of Otorhinolaryngology / Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.
This study aims to provide a comprehensive picture of auditory emotion perception in cochlear implant (CI) users by (1) investigating emotion categorization in both vocal (pseudo-speech) and musical domains, and (2) examining how individual differences in residual acoustic hearing, sensitivity to voice cues (voice pitch, vocal tract length), and quality of life (QoL) might be associated with vocal emotion perception and, going a step further, with musical emotion perception. In 28 adult CI users, with or without self-reported acoustic hearing, we showed that sensitivity (d') scores for emotion categorization varied widely across participants, in line with previous research. However, within participants, the d' scores for vocal and musical emotion categorization were significantly correlated, indicating similar processing of auditory emotional cues across the pseudo-speech and music domains and robustness of the tests.
Ann Anat
January 2025
Department of Morpho-Functional Sciences I, Faculty of Medicine, "Grigore T. Popa" University of Medicine and Pharmacy, Iasi, Romania.
and Aims: We conducted this research motivated by the incomplete knowledge of the changes in resonance and harmonic filtering processes produced by articulatory gestures at the supralaryngeal level of the vocal tract. Aim of research: The goal of the study is to evaluate the adaptive changes taking place at the oropharyngeal isthmus during sustained phonation. Methods: We focused on exploring the dynamics of the oropharyngeal pavilion in voice professionals using Cone-Beam Computed Tomography (CBCT).
J Voice
January 2025
Utah Center for Vocology, University of Utah, Salt Lake City, UT; National Center for Voice and Speech, Salt Lake City, UT.
Objectives: Acoustic and aerodynamic powers in infant cry are not scaled downward with body size or vocal tract size. The objective here was to show that high lung pressures and impedance matching are used to produce power levels comparable to those in adults.
Study Design And Methodology: A computational model was used to obtain power distributions along the infant airway.
J Speech Lang Hear Res
January 2025
Center for Speech and Language Sciences, Department of Rehabilitation Sciences, Ghent University, Belgium.
Purpose: The aim was to determine and compare the short-term effects of two intensive semi-occluded vocal tract (SOVT) programs, "straw phonation" (SP) and "resonant voice therapy" (RVT), on the phonation of children with vocal fold nodules.
Method: A pretest-posttest randomized controlled study design was used. Thirty children aged 6-12 years were randomly assigned to the SP group (n = 11), RVT group (n = 11), or control group receiving indirect treatment (n = 8) for their voice problems.