Visual speech facilitates auditory speech perception, but the visual cues responsible for these benefits and the information they provide remain unclear. Low-level models emphasize basic temporal cues provided by mouth movements, but these impoverished signals may not fully account for the richness of auditory information provided by visual speech. High-level models posit interactions among abstract categorical (i.e., phonemes/visemes) or amodal (e.g., articulatory) speech representations, but require lossy remapping of speech signals onto abstracted representations. Because visible articulators shape the spectral content of speech, we hypothesized that the perceptual system might exploit natural correlations between midlevel visual (oral deformations) and auditory speech features (frequency modulations) to extract detailed spectrotemporal information from visual speech without employing high-level abstractions. Consistent with this hypothesis, we found that the time-frequency dynamics of oral resonances (formants) could be predicted with unexpectedly high precision from the changing shape of the mouth during speech. When isolated from other speech cues, speech-based shape deformations improved perceptual sensitivity for corresponding frequency modulations, suggesting that listeners could exploit this cross-modal correspondence to facilitate perception. To test whether this type of correspondence could improve speech comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by cross-modal recovery of auditory speech spectra. The perceptual system may therefore use audiovisual correlations rooted in oral acoustics to extract detailed spectrotemporal information from visual speech.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7382243 | PMC |
http://dx.doi.org/10.1073/pnas.2002887117 | DOI Listing |
J Voice
January 2025
Universidade Estadual de Campinas - UNICAMP, Campinas, São Paulo, Brazil.
Objective: To analyze acoustic measures of speech and vowel samples from individuals of different genders and to correlate these acoustic measures with vocal satisfaction. This study aims to provide additional data on acoustic measures, serving as references for clinicians while emphasizing the importance of moving beyond cisgender norms. Additionally, it addresses a gap in the Brazilian context by exploring correlations between acoustic measures and self-perceived vocal satisfaction across diverse gender groups.
Knowledge of the natural history of deficiency disorder (CDD) is limited to the results of cross-sectional analysis of largely pediatric cohorts. Assessment of outcomes in adulthood is critical for clinical decision-making and future precision medicine approaches but is challenging because of the diagnostic gap and duration of follow-up that would be required for prospective studies. We aimed to delineate the natural history retrospectively from adulthood.
J Voice
January 2025
Clínica Santa María, Santiago, Chile.
Purpose: The present study aims to explore the effect of pitch, loudness, vowel, and voice condition on supraglottic activity among female participants with voice disorders and among female participants with normal voices.
Methods: Forty-four volunteers were recruited. Inclusion criteria for the dysphonic group were: 1) age between 20 and 50 years, 2) a history of voice problems of at least 1 year, 3) moderate or severe dysphonia.
Sci Rep
January 2025
Mallinckrodt Institute of Radiology, Washington University School of Medicine, 4515 McKinley Ave., St. Louis, MO, 63110, USA.
Functional magnetic resonance imaging (fMRI) has dramatically advanced non-invasive human brain mapping and decoding. Functional near-infrared spectroscopy (fNIRS) and high-density diffuse optical tomography (HD-DOT) non-invasively measure blood oxygen fluctuations related to brain activity, like fMRI, at the brain surface, using more-lightweight equipment that circumvents ergonomic and logistical limitations of fMRI. HD-DOT grids have smaller inter-optode spacing (~ 13 mm) than sparse fNIRS (~ 30 mm) and therefore provide higher image quality, with spatial resolution ~ 1/2 that of fMRI, when using the several source-detector distances (13-40 mm) afforded by the HD-DOT grid.
J Voice
January 2025
Department of Speech and Language Therapy, School of Health Rehabilitation Sciences, University of Patras, Patras, Greece; A' ENT University Clinic, Medical School, National Kapodistreian University of Athens, Athens, Greece.
Objectives: The Singing Voice Handicap Index (SVHI) was culturally adapted and validated in Greek to examine the impacts of voice problems on a singer's everyday life.
Methods: The translated version was administered to 120 singers in total, along with the translated version of the Voice Handicap Index (VHI), a short voice history questionnaire, two Self-Rating Dysphonia Severity Scales (SRDSSs), and two visual analog scales. A week after the original completion of the Greek version of SVHI, a second copy of the SVHI was administered to 50% of the participants.