Variations in vocal effort can create challenges for speaker recognition systems that are optimized for use with neutral speech. The Lombard effect and whisper are two commonly occurring forms of vocal effort variation that result in non-neutral speech, the first caused by noise exposure and the second by intentional adjustment on the part of the speaker. In this article, a comparative evaluation of speaker recognition performance in non-neutral conditions is presented using multiple Lombard effect and whisper corpora. The detrimental impact of these vocal effort variations on discrimination and calibration performance is explored at the global, per-corpus, and per-speaker levels using conventional error metrics, along with visual representations of the model and score spaces. A non-neutral speech detector is subsequently introduced and used to inform score calibration in several ways. Two calibration approaches are proposed and shown to reduce error to the same level as an optimal calibration approach that relies on ground-truth vocal effort information. This article contributes a generalizable methodology for detecting vocal effort variation and using this knowledge to inform and advance speaker recognition system behavior.
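To make the idea of condition-informed score calibration concrete, here is a minimal sketch. It is not the method of the article: it assumes a simple affine calibration (scale and offset) fitted by logistic regression, synthetic Gaussian scores in which non-neutral trials are shifted downward, and a hypothetical flag `is_non_neutral` standing in for the decision of a non-neutral speech detector. All names and numbers are illustrative assumptions.

```python
import numpy as np

def fit_affine_calibration(scores, labels, steps=5000, lr=0.1):
    """Fit llr = a * score + b by gradient descent on the mean logistic loss."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))  # predicted P(target)
        err = p - labels                              # gradient w.r.t. the logit
        a -= lr * np.mean(err * scores)
        b -= lr * np.mean(err)
    return a, b

def logistic_loss(llr, labels):
    """Mean logistic loss; np.logaddexp(0, x) computes log(1 + exp(x)) stably."""
    sign = 2.0 * labels - 1.0
    return float(np.mean(np.logaddexp(0.0, -sign * llr)))

rng = np.random.default_rng(0)
n = 1000
# Synthetic trials (assumption): non-neutral scores are shifted downward.
neutral_tar = rng.normal(2.0, 1.0, n)
neutral_non = rng.normal(-2.0, 1.0, n)
whisper_tar = rng.normal(-1.0, 1.0, n)
whisper_non = rng.normal(-4.0, 1.0, n)

scores = np.concatenate([neutral_tar, neutral_non, whisper_tar, whisper_non])
labels = np.concatenate([np.ones(n), np.zeros(n), np.ones(n), np.zeros(n)])
# Hypothetical detector output; here it simply equals the true condition.
is_non_neutral = np.concatenate([np.zeros(2 * n, bool), np.ones(2 * n, bool)])

# Baseline: one calibration shared by all trials.
a, b = fit_affine_calibration(scores, labels)
loss_pooled = logistic_loss(a * scores + b, labels)

# Condition-dependent: a separate (a, b) pair per detected condition.
llr = np.empty_like(scores)
for flag in (False, True):
    mask = is_non_neutral == flag
    a_c, b_c = fit_affine_calibration(scores[mask], labels[mask])
    llr[mask] = a_c * scores[mask] + b_c
loss_cond = logistic_loss(llr, labels)

print(f"pooled: {loss_pooled:.3f}  condition-dependent: {loss_cond:.3f}")
```

In this toy setup the flag is taken from ground truth, which corresponds to the oracle case; in practice it would come from a detector, whose errors would reduce the benefit.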
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9245507
DOI: http://dx.doi.org/10.1109/taslp.2021.3053388
Alzheimers Dement
December 2024
Department of Neurology, Division of Cognitive and Motor Aging, Albert Einstein College of Medicine, New York, NY, USA
Background: Vocal biomarkers are emerging as potentially meaningful health indicators in multiple domains, including cognition. Because voice‐enabled devices are widespread, automated vocal analysis could become a useful modality for early detection and monitoring of cognitive impairment. To assess the efficacy of vocal biomarkers in identifying cognitive impairment, we evaluated prosodic speech features on vocal tasks in a research cohort from Kerala, India, and a referral cohort from the Montefiore‐Einstein Center for the Aging Brain in the Bronx, NY.
J Voice
January 2025
School of Behavioral and Brain Sciences, Department of Speech, Language, and Hearing, Callier Center for Communication Disorders, University of Texas at Dallas, Richardson, TX; Department of Otolaryngology - Head and Neck Surgery, University of Texas Southwestern Medical Center, Dallas, TX.
Introduction: Patients with primary muscle tension dysphonia (pMTD) commonly report symptoms of vocal effort, fatigue, discomfort, odynophonia, and aberrant vocal quality (eg, vocal strain, hoarseness). However, voice symptoms most salient to pMTD have not been identified. Furthermore, it is unclear how standard vocal fatigue and vocal tract discomfort indices that capture persistent symptoms, such as the Vocal Fatigue Index (VFI) and Vocal Tract Discomfort Scale (VTDS), relate to acute symptoms experienced at the time of the voice evaluation.
J Voice
December 2024
SLT Department, Uskudar University, Istanbul, Turkey.
Objective: The purpose of this study is to examine the effects of a short-term (30 minutes) vocal loading task (VLT) on the objective and subjective parameters of voice and determine the restorative strategies of three different vocal exercises performed after the VLT.
Methods: The sample of the study included 30 normophonic women. The protocols that were applied in the study were carried out on three consecutive days.
J Speech Lang Hear Res
December 2024
Neurorehabilitation and Brain Research Group, Institute for Human-Centered Technology Research, Universitat Politècnica de València, Spain.
Purpose: This study investigated the ecological validity of conventional voice assessments by comparing the self-perceived voice quality and acoustic characteristics of voice production during these assessments to those in a simulated environment with varying distracting conditions and noise levels.
Method: Forty-two university professors (26 women) participated in the study, where they were asked to produce loud connected speech by reading a 100-word text under four different conditions: a conventional assessment and three virtual classroom simulations created with 360° videos, each with different noise levels, played through a virtual reality headset and headphones. The first video depicted students paying attention in class (40 dB classroom noise); the second showed some students talking, generating moderate conversational noise (60 dB); and the third depicted students talking loudly and not paying attention (70 dB).
Logoped Phoniatr Vocol
December 2024
Speech Prosody Studies Group, Dep. of Linguistics, State Univ. of Campinas, Campinas, Brazil.
Purpose: The analysis of acoustic parameters contributes to the characterisation of human communication development throughout the lifetime. The present paper intends to analyse suprasegmental features of European Portuguese in longitudinal conversational speech samples of three male public figures in uncontrolled environments across different ages, approximately 30 years apart.
Participants and Methods: Twenty prosodic features concerning intonation, intensity, rhythm, and pause measures were extracted semi-automatically from 360 speech intervals (3-4 interviews from each speaker × 30 speech intervals × 3 speakers) lasting between 3 and 6 s.