Analysis and Calibration of Lombard Effect and Whisper for Speaker Recognition.

IEEE/ACM Trans Audio Speech Lang Process

Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX 75083-0688 USA.

Published: January 2021

Variations in vocal effort can create challenges for speaker recognition systems that are optimized for use with neutral speech. The Lombard effect and whisper are two commonly occurring forms of vocal effort variation that result in non-neutral speech, the first due to noise exposure and the second due to intentional adjustment on the part of the speaker. In this article, a comparative evaluation of speaker recognition performance in non-neutral conditions is presented using multiple Lombard effect and whisper corpora. The detrimental impact of these vocal effort variations on discrimination and calibration performance at the global, per-corpus, and per-speaker levels is explored using conventional error metrics, along with visual representations of the model and score spaces. A non-neutral speech detector is subsequently introduced and used to inform score calibration in several ways. Two calibration approaches are proposed and shown to reduce error to the same level as an optimal calibration approach that relies on ground-truth vocal effort information. This article contributes a generalizable methodology for detecting vocal effort variation and using this knowledge to inform and advance speaker recognition system behavior.
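
The abstract does not specify how the detector output feeds into calibration, so the following is only an illustrative sketch of one plausible reading, not the article's method. It assumes each vocal effort condition has its own affine score mapping and that a detector supplies a probability that the test utterance is non-neutral. The names `Affine`, `calibrate`, `NEUTRAL`, `NON_NEUTRAL`, and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    """Affine score mapping: calibrated = scale * raw + offset."""
    scale: float
    offset: float

    def apply(self, score: float) -> float:
        return self.scale * score + self.offset


def calibrate(raw_score: float, p_non_neutral: float,
              neutral: Affine, non_neutral: Affine,
              soft: bool = True) -> float:
    """Calibrate a raw score using a (hypothetical) detector probability.

    soft=True  blends the two mappings by the detector probability.
    soft=False picks one mapping with a hard 0.5 threshold.
    """
    if not 0.0 <= p_non_neutral <= 1.0:
        raise ValueError("p_non_neutral must lie in [0, 1]")
    if soft:
        weight = p_non_neutral
    else:
        weight = 1.0 if p_non_neutral >= 0.5 else 0.0
    return ((1.0 - weight) * neutral.apply(raw_score)
            + weight * non_neutral.apply(raw_score))


# Made-up parameters for illustration only; real values would be
# learned from held-out trials in each condition.
NEUTRAL = Affine(scale=1.0, offset=0.0)
NON_NEUTRAL = Affine(scale=0.5, offset=2.0)

if __name__ == "__main__":
    print(calibrate(2.0, 0.25, NEUTRAL, NON_NEUTRAL))              # soft blend
    print(calibrate(2.0, 0.25, NEUTRAL, NON_NEUTRAL, soft=False))  # hard switch
```

The soft variant degrades gracefully when the detector is uncertain, while the hard switch matches an oracle-label scheme whenever the detector decision is correct.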

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9245507
DOI: http://dx.doi.org/10.1109/taslp.2021.3053388

Similar Publications

Efficacy of vocal biomarkers in detecting cognitive impairment: interim findings from US and Indian populations.

Alzheimers Dement

December 2024

Department of Neurology, Division of Cognitive and Motor Aging, Albert Einstein College of Medicine, New York, NY, USA

Erik Larsen Olivia Murton Xinyu Song Erica Weiss Ayers Emmeline

Background: Vocal biomarkers are emerging as potentially meaningful health indicators in multiple domains, including cognition. Because voice-enabled devices are widespread, automated vocal analysis could become a useful modality for early detection and monitoring of cognitive impairment. To assess the efficacy of vocal biomarkers in identifying cognitive impairment, we evaluated prosodic speech features on vocal tasks in a research cohort from Kerala, India, and a referral cohort from the Montefiore-Einstein Center for the Aging Brain in the Bronx, NY.

Salient Voice Symptoms in Primary Muscle Tension Dysphonia.

J Voice

January 2025

School of Behavioral and Brain Sciences, Department of Speech, Language, and Hearing, Callier Center for Communication Disorders, University of Texas at Dallas, Richardson, TX; Department of Otolaryngology - Head and Neck Surgery, University of Texas Southwestern Medical Center, Dallas, TX. Electronic address:

Avery Moore Adrianna C Shembel

Introduction: Patients with primary muscle tension dysphonia (pMTD) commonly report symptoms of vocal effort, fatigue, discomfort, odynophonia, and aberrant vocal quality (e.g., vocal strain, hoarseness). However, the voice symptoms most salient to pMTD have not been identified. Furthermore, it is unclear how standard vocal fatigue and vocal tract discomfort indices that capture persistent symptoms, such as the Vocal Fatigue Index (VFI) and Vocal Tract Discomfort Scale (VTDS), relate to acute symptoms experienced at the time of the voice evaluation.

Effects of a Short-Term Vocal Loading Task and Different Restoration Strategies on Voice.

J Voice

December 2024

SLT Department, Uskudar University, Istanbul, Turkey. Electronic address:

Emine Ülvan Serkan Bengisu Göksu Yılmaz Ayşe Buse Saraç Damla Akı

Objective: The purpose of this study is to examine the effects of a short-term (30-minute) vocal loading task (VLT) on the objective and subjective parameters of voice and to evaluate three different vocal exercises performed after the VLT as restorative strategies.

Methods: The study sample included 30 normophonic women. The study protocols were carried out on three consecutive days.

Ecological Validity of Self-Perceived Voice Quality and Acoustic Measures During Voice Assessments: An Observational Study on Faculty Teachers.

J Speech Lang Hear Res

December 2024

Neurorehabilitation and Brain Research Group, Institute for Human-Centered Technology Research, Universitat Politècnica de València, Spain.

Daniel Rodríguez Marco Guzman Pedro Brito Roberto Llorens

Purpose: This study investigated the ecological validity of conventional voice assessments by comparing the self-perceived voice quality and acoustic characteristics of voice production during these assessments to those in a simulated environment with varying distracting conditions and noise levels.

Method: Forty-two university professors (26 women) participated in the study, where they were asked to produce loud connected speech by reading a 100-word text under four different conditions: a conventional assessment and three virtual classroom simulations created with 360° videos, each with different noise levels, played through a virtual reality headset and headphones. The first video depicted students paying attention in class (40 dB classroom noise); the second showed some students talking, generating moderate conversational noise (60 dB); and the third depicted students talking loudly and not paying attention (70 dB).

Prosodic changes with age: a longitudinal study with three public figures in European Portuguese.

Logoped Phoniatr Vocol

December 2024

Speech Prosody Studies Group, Dep. of Linguistics, State Univ. of Campinas, Campinas, Brazil.

Ana Rita S Valente Catarina Oliveira Luciana Albuquerque António Teixeira Plínio A Barbosa

Purpose: The analysis of acoustic parameters contributes to the characterisation of human communication development throughout the lifetime. The present paper intends to analyse suprasegmental features of European Portuguese in longitudinal conversational speech samples of three male public figures in uncontrolled environments across different ages, approximately 30 years apart.

Participants and Methods: Twenty prosodic features concerning intonation, intensity, rhythm, and pause measures were extracted semi-automatically from 360 speech intervals (3-4 interviews from each speaker × 30 speech intervals × 3 speakers), each lasting between 3 and 6 s.
