No agreed-upon method currently exists for objective measurement of perceived voice quality. This paper describes the validation of a psychoacoustic model designed to fill this gap. The model includes parameters characterizing the harmonic and inharmonic voice sources, the vocal tract transfer function, fundamental frequency, and amplitude of the voice, which together serve to quantify completely the integral sound of a target voice sample.
Objectives/hypotheses: Charismatic leaders use vocal behavior to persuade their audience, achieve goals, arouse emotional states, and convey personality traits and leadership status. This study investigates voice fundamental frequency (f0) and sound pressure level (SPL) in female and male French, Italian, Brazilian, and American politicians to determine which acoustic parameters are related to cross-gender and cross-cultural common vocal abilities, and which derive from culture-, gender-, and language-specific vocal strategies used to adapt vocal behavior to listeners' culture-related expectations.
Study Design: Speech corpora were collected for two formal communicative contexts (leaders address followers or other leaders) and one informal communicative context (dyadic interaction), based on the persuasive goals inherent in each context and on the relative status of the listeners and speakers.
J Speech Lang Hear Res
October 2016
Purpose: The question of what type of utterance (a sustained vowel or continuous speech) is best for voice quality analysis has been studied extensively, but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation.
Method: Speakers with voice disorders sustained vowels and read sentences.
A psychoacoustic model of the voice source spectrum is proposed. The model is characterized by four spectral slope parameters: the difference in amplitude between the first two harmonics (H1-H2), the second and fourth harmonics (H2-H4), the fourth harmonic and the harmonic nearest 2 kHz in frequency (H4-2 kHz), and the harmonic nearest 2 kHz and that nearest 5 kHz (2 kHz-5 kHz). As a step toward model validation, experiments were conducted to establish the acoustic and perceptual independence of these parameters.
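Because all four parameters are differences between harmonic amplitudes, they can be computed directly once harmonic levels are known. The sketch below is a minimal illustration, assuming the harmonic amplitudes (in dB) have already been estimated, one value per harmonic starting at H1, and that `f0_hz` is the fundamental frequency. The function names are hypothetical, and the abstract does not say how amplitudes are measured or whether any formant correction is applied.

```python
def nearest_harmonic_index(f0_hz: float, target_hz: float) -> int:
    """Zero-based index of the harmonic n*f0 closest to target_hz (hypothetical helper)."""
    # Python's round() breaks exact ties toward the even harmonic number.
    n = max(1, round(target_hz / f0_hz))
    return n - 1


def spectral_slope_params(f0_hz: float, harmonic_db: list[float]) -> dict[str, float]:
    """Return H1-H2, H2-H4, H4-2kHz and 2kHz-5kHz in dB.

    harmonic_db[0] is the amplitude of H1, harmonic_db[1] of H2, and so on.
    """
    i2k = nearest_harmonic_index(f0_hz, 2000.0)
    i5k = nearest_harmonic_index(f0_hz, 5000.0)
    if len(harmonic_db) <= max(3, i5k):
        raise ValueError("harmonic amplitudes must extend to about 5 kHz")
    a = harmonic_db
    return {
        "H1-H2": a[0] - a[1],
        "H2-H4": a[1] - a[3],
        "H4-2kHz": a[3] - a[i2k],      # H4 versus the harmonic nearest 2 kHz
        "2kHz-5kHz": a[i2k] - a[i5k],  # harmonic nearest 2 kHz versus nearest 5 kHz
    }
```

For a 100 Hz voice, for example, the harmonics nearest 2 kHz and 5 kHz are H20 and H50, so at least 50 harmonic amplitudes are needed.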
Models of the voice source differ in their fits to natural voices, but it is unclear which differences in fit are perceptually salient. This study examined the relationship between the fit of five voice source models to 40 natural voices, and the degree of perceptual match among stimuli synthesized with each of the modeled sources. Listeners completed a visual sort-and-rate task to compare versions of each voice created with the different source models, and the results were analyzed using multidimensional scaling.
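Multidimensional scaling turns a matrix of perceptual dissimilarities (for example, derived from sort-and-rate judgments) into coordinates in a low-dimensional space. The sketch below shows classical (Torgerson) metric MDS with NumPy; it is an illustration only, and the abstract does not state which MDS variant (metric, nonmetric, or individual-differences) the authors used or how ratings were converted into dissimilarities.

```python
import numpy as np


def classical_mds(dissim: np.ndarray, n_dims: int = 2) -> np.ndarray:
    """Classical metric MDS: embed n items so distances approximate dissim."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    b = -0.5 * j @ (d ** 2) @ j               # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_dims]  # keep the largest n_dims
    scale = np.sqrt(np.clip(evals[order], 0.0, None))  # drop tiny negative values
    return evecs[:, order] * scale
```

When the dissimilarities are exactly Euclidean, the recovered configuration reproduces them up to rotation and reflection; with real listener data, the size of the discarded eigenvalues indicates how much structure lies outside the retained dimensions.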
At present, two important questions about voice remain unanswered: When voice quality changes, what physiological alteration caused this change, and if a change to the voice production system occurs, what change in perceived quality can be expected? We argue that these questions can only be answered by an integrated model of voice linking production and perception, and we describe steps towards the development of such a model. Preliminary evidence in support of this approach is also presented. We conclude that development of such a model should be a priority for scientists interested in voice, to explain what physical condition(s) might underlie a given voice quality, or what voice quality might result from a specific physical configuration.
Because voice signals result from vocal fold vibration, perceptually meaningful vibratory measures should quantify those aspects of vibration that correspond to differences in voice quality. In this study, glottal area waveforms were extracted from high-speed videoendoscopy of the vocal folds. Principal component analysis was applied to these waveforms to investigate the factors that vary with voice quality.
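Principal component analysis of waveforms can be sketched as a singular value decomposition of the mean-centered data matrix. The example below assumes each row is one glottal cycle already time-normalized to a common number of samples; that preprocessing, and the function name `waveform_pca`, are assumptions for illustration, since the abstract does not describe the normalization steps used.

```python
import numpy as np


def waveform_pca(waveforms, n_components: int = 3):
    """PCA via SVD. Rows are cycles, columns are time samples."""
    x = np.asarray(waveforms, dtype=float)
    mean = x.mean(axis=0)                      # average cycle shape
    xc = x - mean
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    var = s ** 2 / (x.shape[0] - 1)            # variance along each component
    total = var.sum()
    if total == 0:
        raise ValueError("all waveforms are identical")
    explained = var / total                    # proportion of variance explained
    scores = u[:, :n_components] * s[:n_components]
    return mean, vt[:n_components], scores, explained[:n_components]
```

Each row of the returned components is a shape pattern over the cycle, and the scores give how strongly each cycle expresses that pattern; `mean + scores @ components` reconstructs the cycles from the retained components.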
At present, it is not well understood how changes in vocal fold biomechanics correspond to changes in voice quality. Understanding such cross-domain links from physiology to acoustics to perception in the "speech chain" is of both theoretical and clinical importance. This study investigates links between changes in body layer stiffness, which is regulated primarily by the thyroarytenoid muscle, and the consequent changes in acoustics and voice quality under left-right symmetric and asymmetric stiffness conditions.
Increases in open quotient are widely assumed to cause changes in the amplitude of the first harmonic relative to the second (H1*-H2*), which in turn correspond to increases in perceived vocal breathiness. Empirical support for these assumptions is rather limited, and reported relationships among these three descriptive levels have been variable. This study examined the empirical relationship among H1*-H2*, the glottal open quotient (OQ), and glottal area waveform skewness, measured synchronously from audio recordings and high-speed video images of the larynges of six phonetically knowledgeable, vocally healthy speakers who varied fundamental frequency and voice qualities quasi-orthogonally.
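The open quotient is the fraction of a glottal cycle during which the glottis is open. One simple way to estimate it from a single cycle of a glottal area waveform is to count the samples where the area exceeds a small threshold. The sketch below assumes a threshold of 5% of the cycle's area range; that value and the name `open_quotient` are hypothetical, the abstract does not state how OQ was measured, and skewness is not computed here.

```python
import numpy as np


def open_quotient(area_cycle, threshold_frac: float = 0.05) -> float:
    """Fraction of one cycle's samples where glottal area exceeds a threshold."""
    a = np.asarray(area_cycle, dtype=float)
    lo, hi = a.min(), a.max()
    if hi <= lo:
        raise ValueError("flat waveform: no open phase")
    # Assumed open threshold: a small fraction of the cycle's area range above its minimum.
    threshold = lo + threshold_frac * (hi - lo)
    return float(np.mean(a > threshold))
```

The result depends on the threshold and on how cycle boundaries are chosen, which is one reason OQ values reported across studies are not always directly comparable.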
Although the amount of inharmonic energy (noise) present in a human voice is an important determinant of vocal quality, little is known about the perceptual interaction between harmonic and inharmonic aspects of the voice source. This paper reports three experiments investigating this issue. Results indicate that perceptions of the harmonic slope and of noise levels are both influenced by complex interactions between the spectral shape and relative levels of harmonic and noise energy in the voice source.
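Experiments of this kind manipulate the relative level of harmonic and noise energy. As a minimal sketch, assuming that level is expressed as a broadband harmonics-to-noise power ratio in dB, a noise signal can be scaled before mixing so that the target ratio holds exactly. The function `mix_at_hnr` is hypothetical; the synthesizer used in the study may specify and spectrally shape the noise differently.

```python
import numpy as np


def mix_at_hnr(harmonic, noise, hnr_db: float):
    """Scale noise so 10*log10(P_harmonic / P_noise) == hnr_db, then mix."""
    h = np.asarray(harmonic, dtype=float)
    n = np.asarray(noise, dtype=float)
    p_h = np.mean(h ** 2)  # mean power of the harmonic part
    p_n = np.mean(n ** 2)  # mean power of the unscaled noise
    if p_n == 0:
        raise ValueError("noise signal has zero power")
    gain = np.sqrt(p_h / (p_n * 10 ** (hnr_db / 10)))
    scaled_noise = gain * n
    return h + scaled_noise, scaled_noise
```

Lowering `hnr_db` by 10 dB raises the noise power tenfold relative to the harmonic part, without changing the noise's spectral shape.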
Little is known about how listeners judge phonemic versus allophonic (or freely varying) versus post-lexical variations in voice quality, or about which acoustic attributes serve as perceptual cues in specific contexts. To address this issue, native speakers of Gujarati, Thai, and English discriminated among pairs of voices that differed only in the relative amplitudes of the first versus second harmonics (H1-H2). Results indicate that speakers of Gujarati (which contrasts H1-H2 phonemically) were more sensitive to changes than were speakers of Thai or English.
Voice quality is an important perceptual cue in many disciplines, but knowledge of its nature is limited by a poor understanding of the relevant psychoacoustics. This article (aimed at researchers studying voice, speech, and vocal behavior) describes the UCLA voice synthesizer, software for voice analysis and synthesis designed to test hypotheses about the relationship between acoustic parameters and voice quality perception. The synthesizer provides experimenters with a useful tool for creating and modeling voice signals.
J Speech Lang Hear Res
June 2011
Purpose: Interrater disagreements in ratings of quality plague the study of voice. This study compared 2 methods for handling this variability.
Method: Listeners provided multiple breathiness ratings for 2 sets of pathological voices, one including 20 male and 20 female voices unselected for quality and one including 20 breathy female voices.
J Acoust Soc Am
October 2010
Little is known about the perceptual importance of changes in the shape of the source spectrum, although many measures have been proposed and correlations with different vocal qualities (breathiness, roughness, nasality, strain...
Ann Otol Rhinol Laryngol
January 2010
Objectives: Tracheoesophageal puncture (TEP) for postlaryngectomy speech is increasingly being performed as an office-based procedure. We review our experience with office-based TEP and compare outcomes with those of operating room-based TEP. Our hypothesis was that office-based TEP results in improved prosthesis sizing, reducing the number of visits dedicated to prosthesis resizing.
Purpose: This article presents the development of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) following a consensus conference on perceptual voice quality measurement sponsored by the American Speech-Language-Hearing Association's Special Interest Division 3, Voice and Voice Disorders. The CAPE-V protocol and recording form were designed to promote a standardized approach to evaluating and documenting auditory-perceptual judgments of vocal quality.
Method: A summary of the consensus conference proceedings and the factors considered by the authors in developing this instrument are included.
Modeling sources of listener variability in voice quality assessment is the first step in developing reliable, valid protocols for measuring quality, and provides insight into the reasons that listeners disagree in their quality assessments. This study examined the adequacy of one such model by quantifying the contributions of four factors to interrater variability: instability of listeners' internal standards for different qualities, difficulties isolating individual attributes in voice patterns, scale resolution, and the magnitude of the attribute being measured. One hundred twenty listeners in six experiments assessed vocal quality in tasks that differed in scale resolution, in the presence/absence of comparison stimuli, and in the extent to which the comparison stimuli (if present) matched the target voices.
Purpose: Many researchers have studied the acoustics, physiology, and perceptual characteristics of the voice source, but despite significant attention, it remains unclear which aspects of the source should be quantified and how measurements should be made. In this study, the authors examined the relationships among a number of existing measures of the glottal source spectrum, along with the association of these measures to overall spectral shapes and to glottal pulse shapes, to determine which measures of the source best capture information about the shapes of glottal pulses and glottal source spectra.
Method: Seventy-eight different measures of source spectral shapes were made on the voices of 70 speakers.
Although jitter, shimmer, and noise acoustically characterize all voice signals, their perceptual importance in naturally produced pathological voices has not been established psychoacoustically. To determine the role of these attributes in the perception of vocal quality, listeners were asked to adjust levels of jitter, shimmer, and the noise-to-signal ratio in a speech synthesizer, so that synthetic voices matched naturally produced tokens. Results showed that, although listeners agreed well in their judgments of the noise-to-signal ratio, they did not agree with one another in their chosen settings for jitter and shimmer.
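Jitter and shimmer quantify cycle-to-cycle variation in period and amplitude, respectively. Many variants exist; the sketch below uses one common "local" definition (mean absolute difference between consecutive values divided by the mean value) and applies it to both measures. The function name and the choice of definition are assumptions for illustration; the synthesizer in this study may parameterize jitter and shimmer differently.

```python
def local_perturbation(values: list[float]) -> float:
    """Mean absolute consecutive difference divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Jitter from successive period durations, shimmer from successive peak amplitudes.
periods_ms = [10.0, 12.0, 10.0, 12.0]
amplitudes = [1.0, 0.9, 1.0, 0.9]
jitter = local_perturbation(periods_ms)
shimmer = local_perturbation(amplitudes)
```

Here consecutive periods differ by 2 ms around a mean of 11 ms, so the local jitter is 2/11, or about 18%, far larger than typical values for natural voices; the numbers are chosen only to make the arithmetic easy to follow.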
Vocal tremors characterize many pathological voices, but acoustic-perceptual aspects of tremor are poorly understood. To investigate this relationship, 2 tremor models were implemented in a custom voice synthesizer. The first modulated fundamental frequency (F0) with a sine wave.
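A sinusoidal F0 tremor of this kind can be illustrated by modulating a mean F0 with a low-frequency sine and integrating the resulting contour into phase. The sketch below assumes a tremor rate in Hz and an extent expressed as a fractional deviation (0.05 means plus or minus 5%); these parameter names and the plain sine-tone carrier are hypothetical simplifications, not the synthesizer's actual implementation, which drives a glottal source rather than a single sinusoid.

```python
import numpy as np


def tremor_f0_contour(duration_s: float, fs: float, f0_mean_hz: float,
                      tremor_rate_hz: float, tremor_extent: float):
    """F0 contour with sinusoidal modulation: f0(t) = f0_mean * (1 + extent * sin(2*pi*rate*t))."""
    t = np.arange(int(round(duration_s * fs))) / fs
    f0 = f0_mean_hz * (1.0 + tremor_extent * np.sin(2 * np.pi * tremor_rate_hz * t))
    return t, f0


def synthesize_tone(f0_contour: np.ndarray, fs: float) -> np.ndarray:
    """Sine tone that follows the F0 contour (phase = running integral of frequency)."""
    phase = 2 * np.pi * np.cumsum(f0_contour) / fs
    return np.sin(phase)
```

Integrating frequency into phase, rather than computing `sin(2*pi*f0(t)*t)` directly, keeps the waveform continuous as F0 changes.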
The source-filter theory of speech production describes a glottal energy source (volume velocity waveform) that is filtered by the vocal tract and radiates from the mouth as phonation. The characteristics of the volume velocity waveform, the source that drives phonation, have been estimated, but never directly measured at the glottis. To accomplish this measurement, constant temperature anemometer probes were used in an in vivo canine constant pressure model of phonation.