MFCC Parameters of the Speech Signal: An Alternative to Formant-Based Instantaneous Vocal Tract Length Estimation.

J Voice

Escuela de Ing. Eléctrica, Electrónica y de Telecomunicaciones (E3T), Universidad Industrial de Santander, Bucaramanga, Colombia.

Published: June 2023

The relationship between formant frequencies and vocal tract length (VTL) has been studied intensively over the years. By contrast, the connection between VTL and mel-frequency cepstral coefficients (MFCCs), which concisely encode the overall shape of a speaker's spectral envelope in just a few cepstral coefficients, has been analyzed only modestly and is worthy of further investigation. Based on different statistical models, this article therefore explores the advantages and disadvantages of the relatively novel MFCC-based approach in contrast to the more traditional formant-based one. Additionally, VTL is usually assumed to be a static and inherent characteristic of speakers; that is, a single length parameter is frequently estimated per speaker. In this paper, we instead consider VTL estimation from a dynamic perspective, using modern real-time magnetic resonance imaging (rtMRI) to measure VTL in parallel with audio signals. To support the experiments, we used data obtained from USC-TIMIT magnetic resonance videos, which allow 2D real-time analysis of articulators in motion. We observed that MFCCs perform better in speaker-dependent modeling; in cross-speaker modeling, however, which uses different speakers' data for training and evaluation, their performance is not significantly different from that obtained with formants. In addition, we note that MFCC-based estimation is robust, has an acceptable computational time complexity, and is consistent with the traditional approach.
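For readers unfamiliar with the formant-based baseline mentioned in the abstract, the following is a minimal sketch of the classical uniform-tube approximation, in which a tube closed at the glottis and open at the lips has resonances F_k = (2k - 1)c / (4L). It is not the statistical modeling used in the article. The function names, the speed-of-sound constant, and the example formant values are illustrative assumptions.

```python
# Illustrative sketch only: the textbook quarter-wavelength (uniform tube) model,
# NOT the statistical models evaluated in the article.

# Assumed speed of sound in warm, humid air, in cm/s (a commonly used round value).
SPEED_OF_SOUND_CM_PER_S = 35000.0


def vtl_from_formant(formant_hz, k, c=SPEED_OF_SOUND_CM_PER_S):
    """Invert F_k = (2k - 1) * c / (4 * L) to get the tube length L in cm."""
    if formant_hz <= 0 or k < 1:
        raise ValueError("formant frequency must be positive and k must be >= 1")
    return (2 * k - 1) * c / (4.0 * formant_hz)


def estimate_vtl(formants_hz, c=SPEED_OF_SOUND_CM_PER_S):
    """Average the per-formant length estimates for one analysis frame."""
    if not formants_hz:
        raise ValueError("at least one formant is required")
    estimates = [
        vtl_from_formant(f, k, c) for k, f in enumerate(formants_hz, start=1)
    ]
    return sum(estimates) / len(estimates)


def track_vtl(frames_of_formants, c=SPEED_OF_SOUND_CM_PER_S):
    """Frame-by-frame (instantaneous) VTL estimates from a formant track."""
    return [estimate_vtl(frame, c) for frame in frames_of_formants]


if __name__ == "__main__":
    # Hypothetical formant values (Hz) for two frames; not data from the article.
    frames = [[500.0, 1500.0, 2500.0], [550.0, 1650.0, 2750.0]]
    print(track_vtl(frames))
```

Applying the estimate frame by frame, as `track_vtl` does, mirrors the dynamic view of VTL described in the abstract, although real vocal tracts deviate from a uniform tube, which is one reason statistical models are used in practice.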

Source
http://dx.doi.org/10.1016/j.jvoice.2023.05.012

Similar Publications

Introduction: Straw phonation therapy, a form of semi-occluded vocal tract (SOVT) exercise, is commonly used to help treat various voice disorders. Although straw phonation therapy has been studied extensively for decades, the impact of straw depth on vocal function remains unexplored. This study aims to quantify the effects of various straw vocal tract insertion depths (VTID) on common aerodynamic parameters such as phonation threshold pressure (PTP), phonation threshold flow (PTF), and phonation threshold power (PTW) in an ex vivo canine model.

EXPRESS: Vocal and musical emotion perception, voice cue discrimination, and quality of life in cochlear implant users with and without acoustic hearing.

Q J Exp Psychol (Hove)

January 2025

Department of Otorhinolaryngology / Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.

This study aims to provide a comprehensive picture of auditory emotion perception in cochlear implant (CI) users by (1) investigating emotion categorization in both vocal (pseudo-speech) and musical domains, and (2) examining how individual differences in residual acoustic hearing, sensitivity to voice cues (voice pitch, vocal tract length), and quality of life (QoL) might be associated with vocal emotion perception and, going a step further, with musical emotion perception. In 28 adult CI users, with or without self-reported acoustic hearing, we showed that sensitivity (d') scores for emotion categorization varied widely across participants, in line with previous research. However, within participants, the d' scores for vocal and musical emotion categorization were significantly correlated, indicating similar processing of auditory emotional cues across the pseudo-speech and music domains and robustness of the tests.

and Aims: We conducted this research motivated by the incomplete knowledge of the changes that articulatory gestures produce, through resonance and harmonic filtering processes, at the supralaryngeal level of the vocal tract. Aim of research: The goal of the study is to evaluate the adaptive changes taking place at the oropharyngeal isthmus during sustained phonation. Methods: We focused on exploring the dynamics of the oropharyngeal pavilion in voice professionals using Cone-Beam Computed Tomography (CBCT).

Aerodynamic and Acoustic Power in Infant Cry.

J Voice

January 2025

Utah Center for Vocology, University of Utah, Salt Lake City, UT; National Center for Voice and Speech, Salt Lake City, UT.

Objectives: Acoustic and aerodynamic powers in infant cry are not scaled downward with body size or vocal tract size. The objective here was to show that high lung pressures and impedance matching are used to produce power levels comparable to those in adults.

Study Design And Methodology: A computational model was used to obtain power distributions along the infant airway.

Purpose: The aim was to determine and compare the short-term effects of two intensive semi-occluded vocal tract (SOVT) programs, "straw phonation" (SP) and "resonant voice therapy" (RVT), on the phonation of children with vocal fold nodules.

Method: A pretest-posttest randomized controlled study design was used. Thirty children aged 6-12 years were randomly assigned to the SP group (n = 11), the RVT group (n = 11), or a control group receiving indirect treatment (n = 8) for their voice problems.
