Vision perceptually restores auditory spectral dynamics in speech.

John Plass, David Brang, Satoru Suzuki, Marcia Grabowecky

Proc Natl Acad Sci U S A

Department of Psychology, Northwestern University, Evanston, IL 60208.

Published: July 2020

Visual speech facilitates auditory speech perception, but the visual cues responsible for these benefits and the information they provide remain unclear. Low-level models emphasize basic temporal cues provided by mouth movements, but these impoverished signals may not fully account for the richness of auditory information provided by visual speech. High-level models posit interactions among abstract categorical (i.e., phonemes/visemes) or amodal (e.g., articulatory) speech representations, but require lossy remapping of speech signals onto abstracted representations. Because visible articulators shape the spectral content of speech, we hypothesized that the perceptual system might exploit natural correlations between midlevel visual (oral deformations) and auditory speech features (frequency modulations) to extract detailed spectrotemporal information from visual speech without employing high-level abstractions. Consistent with this hypothesis, we found that the time-frequency dynamics of oral resonances (formants) could be predicted with unexpectedly high precision from the changing shape of the mouth during speech. When isolated from other speech cues, speech-based shape deformations improved perceptual sensitivity for corresponding frequency modulations, suggesting that listeners could exploit this cross-modal correspondence to facilitate perception. To test whether this type of correspondence could improve speech comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by cross-modal recovery of auditory speech spectra. The perceptual system may therefore use audiovisual correlations rooted in oral acoustics to extract detailed spectrotemporal information from visual speech.
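As a rough illustration of what "predicting formant dynamics from mouth shape" could look like computationally, the sketch below fits a one-variable least-squares line from a mouth-width time series to a formant-frequency track and reports how well the prediction correlates with the target. Everything here is an assumption made for illustration: the data are synthetic, the variable names (`mouth_width`, `formant_hz`) are hypothetical, and simple linear regression is a stand-in technique, not necessarily the analysis the authors used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic toy data (NOT from the paper): 50 video frames of a
# hypothetical mouth-width measurement, in arbitrary units.
frames = range(50)
mouth_width = [40.0 + 10.0 * math.sin(2 * math.pi * t / 25) for t in frames]

# Toy formant track: assumed linear in mouth width, plus a small
# deterministic perturbation standing in for unexplained variance.
formant_hz = [
    1500.0 + 30.0 * (w - 40.0) + 20.0 * math.cos(t)
    for t, w in zip(frames, mouth_width)
]

# Fit the line, predict the formant track from mouth width alone,
# and score the prediction against the toy target.
slope, intercept = fit_line(mouth_width, formant_hz)
predicted = [slope * w + intercept for w in mouth_width]
r = pearson_r(predicted, formant_hz)

print(f"slope = {slope:.2f} Hz per unit width, r = {r:.3f}")
```

With real recordings, the predictor would be a set of tracked mouth-shape features and the target would be formant estimates from the audio; the toy version only shows the shape of the computation.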

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7382243
DOI: http://dx.doi.org/10.1073/pnas.2002887117

Top Keywords: visual speech, speech, auditory speech, visual, perceptual system, frequency modulations, extract detailed, detailed spectrotemporal, spectrotemporal visual, auditory

Similar Publications

Acoustic Measures According to Speaker Gender Identity: Differences and Correlation With Vocal Satisfaction.

J Voice

January 2025

Universidade Estadual de Campinas - UNICAMP, Campinas, São Paulo, Brazil.

Diego Henrique da Cruz Martinho, Ana Carolina Constantini

Objective: To analyze acoustic measures of speech and vowel samples from individuals of different genders and to correlate these acoustic measures with vocal satisfaction. This study aims to provide additional data on acoustic measures, serving as references for clinicians while emphasizing the importance of moving beyond cisgender norms. Additionally, it addresses a gap in the Brazilian context by exploring correlations between acoustic measures and self-perceived vocal satisfaction across diverse gender groups.

The natural history of CDKL5 deficiency disorder into adulthood.

medRxiv

January 2025

Angel Aledo-Serrano, David Lewis-Smith, Helen Leonard, Allan Bayat, Mohamed Junaid

Knowledge of the natural history of CDKL5 deficiency disorder (CDD) is limited to the results of cross-sectional analysis of largely pediatric cohorts. Assessment of outcomes in adulthood is critical for clinical decision-making and future precision medicine approaches but is challenging because of the diagnostic gap and duration of follow-up that would be required for prospective studies. We aimed to delineate the natural history retrospectively from adulthood.

Influence of Loudness, Pitch, Vowel, and Voice Condition on Supraglottic Tissue Displacement in Female Participants.

J Voice

January 2025

Clínica Santa María, Santiago, Chile.

Marco Guzman, Juan Del Lago, Camilo Quezada, Josefina Jiménez, Florencia Perlwitz

Purpose: The present study aims to explore the effect of pitch, loudness, vowel, and voice condition on supraglottic activity among female participants with voice disorders and among female participants with normal voices.

Methods: Forty-four volunteers were recruited. Inclusion criteria for the dysphonic group were: 1) age between 20 and 50 years, 2) a self-reported history of voice problems lasting at least 1 year, 3) moderate or severe dysphonia.

Ultra high density imaging arrays in diffuse optical tomography for human brain mapping improve image quality and decoding performance.

Sci Rep

January 2025

Mallinckrodt Institute of Radiology, Washington University School of Medicine, 4515 McKinley Ave., St. Louis, MO, 63110, USA.

Zachary E Markow, Jason W Trobaugh, Edward J Richter, Kalyan Tripathy, Sean M Rafferty

Functional magnetic resonance imaging (fMRI) has dramatically advanced non-invasive human brain mapping and decoding. Functional near-infrared spectroscopy (fNIRS) and high-density diffuse optical tomography (HD-DOT) non-invasively measure blood oxygen fluctuations related to brain activity, like fMRI, at the brain surface, using more-lightweight equipment that circumvents ergonomic and logistical limitations of fMRI. HD-DOT grids have smaller inter-optode spacing (~ 13 mm) than sparse fNIRS (~ 30 mm) and therefore provide higher image quality, with spatial resolution ~ 1/2 that of fMRI, when using the several source-detector distances (13-40 mm) afforded by the HD-DOT grid.

Validation of the Singing Voice Handicap Index in Greek Singers: Normal and Voice-Disordered Participants.

J Voice

January 2025

Department of Speech and Language Therapy, School of Health Rehabilitation Sciences, University of Patras, Patras, Greece; A' ENT University Clinic, Medical School, National and Kapodistrian University of Athens, Athens, Greece.

Joanna Giannopoulou, Elina Papadopoulou, Athanasios Bibas, Ilias Papathanasiou

Objectives: The Singing Voice Handicap Index (SVHI) was culturally adapted and validated in Greek to examine the impacts of voice problems on a singer's everyday life.

Methods: The translated version was administered to 120 singers in total, along with the translated version of the Voice Handicap Index (VHI), a short voice history questionnaire, two Self-Rating Dysphonia Severity Scales (SRDSSs), and two visual analog scales. A week after the original completion of the Greek version of SVHI, a second copy of the SVHI was administered to 50% of the participants.
