Analysis and Calibration of Lombard Effect and Whisper for Speaker Recognition.

IEEE/ACM Trans Audio Speech Lang Process

Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX 75083-0688 USA.

Published: January 2021

Variations in vocal effort can create challenges for speaker recognition systems that are optimized for use with neutral speech. The Lombard effect and whisper are two commonly occurring forms of vocal effort variation that result in non-neutral speech, the first due to noise exposure and the second due to intentional adjustment on the part of the speaker. In this article, a comparative evaluation of speaker recognition performance in non-neutral conditions is presented using multiple Lombard effect and whisper corpora. The detrimental impact of these vocal effort variations on discrimination and calibration performance on global, per-corpus, and per-speaker levels is explored using conventional error metrics, along with visual representations of the model and score spaces. A non-neutral speech detector is subsequently introduced and used to inform score calibration in several ways. Two calibration approaches are proposed and shown to reduce error to the same level as an optimal calibration approach that relies on ground-truth vocal effort information. This article contributes a generalizable methodology towards detecting vocal effort variation and using this knowledge to inform and advance speaker recognition system behavior.
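The idea of letting a non-neutral speech detector inform score calibration can be illustrated with a small sketch. This is not the authors' method: it assumes a simple linear score-to-LLR mapping fitted by class-balanced logistic regression, synthetic Gaussian scores, and a hypothetical detector simulated by flipping the true condition with a fixed error rate. All function names, score distributions, and parameters are illustrative assumptions. It compares one global calibration against separate calibrations per detected condition, using the log-likelihood-ratio cost (Cllr) as the error measure.

```python
import numpy as np

def cllr(tar_llr, non_llr):
    """Log-likelihood-ratio cost in bits; lower is better."""
    c_tar = np.mean(np.logaddexp(0.0, -tar_llr))
    c_non = np.mean(np.logaddexp(0.0, non_llr))
    return 0.5 * (c_tar + c_non) / np.log(2.0)

def fit_linear_calibration(scores, labels, steps=5000, lr=0.1):
    """Fit llr = a * score + b by class-balanced logistic regression
    (plain gradient descent; an assumed, simplified calibrator)."""
    a, b = 1.0, 0.0
    n_tar = np.sum(labels == 1)
    n_non = np.sum(labels == 0)
    # Each class contributes half of the total weight (prior of 0.5).
    w = np.where(labels == 1, 0.5 / n_tar, 0.5 / n_non)
    for _ in range(steps):
        llr = a * scores + b
        p = 0.5 * (1.0 + np.tanh(0.5 * llr))  # numerically stable sigmoid
        grad = w * (p - labels)
        a -= lr * np.sum(grad * scores)
        b -= lr * np.sum(grad)
    return a, b

def apply_calibration(scores, flags, params):
    """Apply the (a, b) pair chosen by each trial's detected condition."""
    out = np.empty_like(scores)
    for cond, (a, b) in params.items():
        mask = flags == cond
        out[mask] = a * scores[mask] + b
    return out

def demo(seed=0, n=2000, detector_error=0.05):
    """Return (Cllr with global calibration, Cllr with detector-informed calibration)."""
    rng = np.random.default_rng(seed)
    cond = rng.integers(0, 2, size=n)    # 0 = neutral, 1 = non-neutral (synthetic)
    labels = rng.integers(0, 2, size=n)  # 1 = target trial, 0 = non-target trial
    # Assumed score model: non-neutral trials are shifted toward lower scores.
    mean = np.where(cond == 0,
                    np.where(labels == 1, 2.0, -2.0),
                    np.where(labels == 1, -1.0, -4.0))
    scores = mean + rng.standard_normal(n)
    # Hypothetical detector output: true condition with a few errors.
    flip = rng.random(n) < detector_error
    detected = np.where(flip, 1 - cond, cond)

    a, b = fit_linear_calibration(scores, labels)
    llr_global = a * scores + b

    params = {c: fit_linear_calibration(scores[detected == c], labels[detected == c])
              for c in (0, 1)}
    llr_aware = apply_calibration(scores, detected, params)

    tar = labels == 1
    return (cllr(llr_global[tar], llr_global[~tar]),
            cllr(llr_aware[tar], llr_aware[~tar]))

if __name__ == "__main__":
    global_cost, aware_cost = demo()
    print(f"Cllr global: {global_cost:.3f}  Cllr detector-informed: {aware_cost:.3f}")
```

In this toy setup the detector-informed calibration should yield the lower Cllr, because a single score-to-LLR mapping cannot account for the condition-dependent score shift. For brevity the sketch fits and evaluates on the same trials; a real evaluation would use held-out data.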


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9245507
DOI: http://dx.doi.org/10.1109/taslp.2021.3053388

Publication Analysis

Top Keywords

vocal effort: 20
speaker recognition: 16
lombard whisper: 12
effort variation: 8
non-neutral speech: 8
speaker: 5
vocal: 5
effort: 5
analysis calibration: 4
calibration lombard: 4

Similar Publications

Background: Vocal biomarkers are emerging as potentially meaningful health indicators in multiple domains, including cognition. Because voice-enabled devices are widespread, automated vocal analysis could become a useful modality for early detection and monitoring of cognitive impairment. To assess the efficacy of vocal biomarkers in identifying cognitive impairment, we evaluated prosodic speech features on vocal tasks in a research cohort from Kerala, India, and a referral cohort from the Montefiore-Einstein Center for the Aging Brain in the Bronx, NY.


Salient Voice Symptoms in Primary Muscle Tension Dysphonia.

J Voice

January 2025

School of Behavioral and Brain Sciences, Department of Speech, Language, and Hearing, Callier Center for Communication Disorders, University of Texas at Dallas, Richardson, TX; Department of Otolaryngology - Head and Neck Surgery, University of Texas Southwestern Medical Center, Dallas, TX. Electronic address:

Introduction: Patients with primary muscle tension dysphonia (pMTD) commonly report symptoms of vocal effort, fatigue, discomfort, odynophonia, and aberrant vocal quality (eg, vocal strain, hoarseness). However, voice symptoms most salient to pMTD have not been identified. Furthermore, it is unclear how standard vocal fatigue and vocal tract discomfort indices that capture persistent symptoms, such as the Vocal Fatigue Index (VFI) and Vocal Tract Discomfort Scale (VTDS), relate to acute symptoms experienced at the time of the voice evaluation.


Objective: The purpose of this study is to examine the effects of a short-term (30 minutes) vocal loading task (VLT) on the objective and subjective parameters of voice and determine the restorative strategies of three different vocal exercises performed after the VLT.

Methods: The sample of the study included 30 normophonic women. The protocols that were applied in the study were carried out on three consecutive days.


Purpose: This study investigated the ecological validity of conventional voice assessments by comparing the self-perceived voice quality and acoustic characteristics of voice production during these assessments to those in a simulated environment with varying distracting conditions and noise levels.

Method: Forty-two university professors (26 women) participated in the study, where they were asked to produce loud connected speech by reading a 100-word text under four different conditions: a conventional assessment and three virtual classroom simulations created with 360° videos, each with different noise levels, played through a virtual reality headset and headphones. The first video depicted students paying attention in class (40 dB classroom noise); the second showed some students talking, generating moderate conversational noise (60 dB); and the third depicted students talking loudly and not paying attention (70 dB).


Purpose: The analysis of acoustic parameters contributes to the characterisation of human communication development throughout the lifetime. The present paper intends to analyse suprasegmental features of European Portuguese in longitudinal conversational speech samples of three male public figures in uncontrolled environments across different ages, approximately 30 years apart.

Participants And Methods: Twenty prosodic features concerning intonation, intensity, rhythm, and pause measures were extracted semi-automatically from 360 speech intervals (3-4 interviews from each speaker × 30 speech intervals × 3 speakers) lasting between 3 and 6 s.

