Speech comprehension is crucial for human social interaction, relying on the integration of auditory and visual cues across various levels of representation. While research has extensively studied multisensory integration (MSI) using idealised, well-controlled stimuli, there is a need to understand this process in response to complex, naturalistic stimuli encountered in everyday life. This study investigated behavioural and neural MSI in neurotypical adults experiencing audio-visual speech within a naturalistic, social context. Our novel paradigm incorporated a broader social situational context, complete words, and speech-supporting iconic gestures, allowing for context-based pragmatics and semantic priors. We investigated MSI in the presence of unimodal (auditory or visual) or complementary, bimodal speech signals. During audio-visual speech trials, compared to unimodal trials, participants more accurately recognised spoken words and showed a more pronounced suppression of alpha power, an indicator of heightened integration load. Importantly, on the neural level, these effects surpassed mere summation of unimodal responses, suggesting non-linear MSI mechanisms. Overall, our findings demonstrate that typically developing adults integrate audio-visual speech and gesture information to facilitate speech comprehension in noisy environments, highlighting the importance of studying MSI in ecologically valid contexts.
Download full-text PDF | Source
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11263810 | PMC
http://dx.doi.org/10.1002/hbm.26797 | DOI Listing
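As an illustration of the additive-model logic behind the non-linearity claim above, the sketch below computes alpha-band power per condition and compares the bimodal response with the sum of the two unimodal responses. This is a minimal sketch, not the authors' analysis pipeline: the sampling rate, alpha-band limits, epoch dimensions, and the random data are all assumptions made for the example.

```python
# Illustrative sketch only (not the authors' pipeline): comparing alpha-band
# power in the audio-visual condition against the sum of the two unimodal
# conditions, the usual additive-model test for non-linear MSI.
# Sampling rate, band limits, epoch counts, and the random data are assumptions.
import numpy as np
from scipy.signal import welch

FS = 250             # assumed EEG sampling rate (Hz)
ALPHA = (8.0, 12.0)  # assumed alpha-band limits (Hz)

def alpha_power(epochs: np.ndarray) -> np.ndarray:
    """Mean alpha-band power per epoch; `epochs` has shape (n_epochs, n_samples)."""
    freqs, psd = welch(epochs, fs=FS, nperseg=FS, axis=-1)
    band = (freqs >= ALPHA[0]) & (freqs <= ALPHA[1])
    return psd[:, band].mean(axis=-1)

# Hypothetical single-channel epochs (40 trials of 2 s each) per condition.
rng = np.random.default_rng(0)
audio_only, visual_only, audio_visual = (
    rng.standard_normal((40, 2 * FS)) for _ in range(3)
)

a = alpha_power(audio_only)
v = alpha_power(visual_only)
av = alpha_power(audio_visual)

# If AV alpha power reliably deviates from A + V, integration is non-additive.
print(f"AV - (A + V) alpha power: {av.mean() - (a.mean() + v.mean()):.3f}")
```

In practice such a contrast would be evaluated statistically across participants rather than on raw means, but the comparison of AV against A + V captures the criterion the abstract refers to.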
eNeuro
January 2025
Neurophysiology of Everyday Life Group, Department of Psychology, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
A comprehensive analysis of everyday sound perception can be achieved using electroencephalography (EEG) with the concurrent acquisition of information about the environment. While extensive research has been dedicated to speech perception, the complexities of auditory perception within everyday environments, specifically the types of information and the key features to extract, remain less explored. Our study aims to systematically investigate the relevance of different feature categories: discrete sound-identity markers, general cognitive state information, and acoustic representations, including discrete sound onsets, the envelope, and the mel-spectrogram.
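As a rough illustration of the acoustic representations named above, the sketch below extracts onset times, a broadband amplitude envelope, and a mel-spectrogram from an audio file. It is a minimal sketch under assumed parameters (the file name, sampling rate, and mel-band count are hypothetical), not the study's actual feature-extraction pipeline.

```python
# Minimal sketch of the three acoustic representations named above: discrete
# sound onsets, the broadband amplitude envelope, and a mel-spectrogram.
# File name, sampling rate, and mel-band count are assumptions.
import numpy as np
import librosa
from scipy.signal import hilbert

audio, sr = librosa.load("everyday_recording.wav", sr=16000)  # hypothetical file

# Discrete sound-onset markers, returned in seconds.
onset_times = librosa.onset.onset_detect(y=audio, sr=sr, units="time")

# Broadband amplitude envelope via the analytic signal (Hilbert transform).
envelope = np.abs(hilbert(audio))

# Mel-spectrogram, converted to decibels for a more interpretable scale.
mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=64)
mel_db = librosa.power_to_db(mel, ref=np.max)

print(onset_times[:5], envelope.shape, mel_db.shape)
```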
Digit Health
December 2024
Ostbayerische Technische Hochschule (OTH) Regensburg, Faculty of Health and Social Sciences; Nursing Science, Germany.
J Psycholinguist Res
November 2024
Department of Psychology, University of Milan-Bicocca, Piazza Dell'Ateneo Nuovo, 1, 20126, Milan, Italy.
To avoid misunderstandings, ironic speakers may accompany their ironic remarks with a particular intonation and specific facial expressions that signal that the message should not be taken at face value. The acoustic realization of the ironic tone of voice differs from language to language, whereas the ironic face manifests the speaker's negative stance and might thus have a universal basis. We conducted a study on 574 participants speaking six different languages (French, German, Dutch, English, Mandarin, and Italian, the control group) to verify whether they could recognize ironic remarks uttered in Italian in three different modalities: watching muted videos, listening to audio tracks, and viewing videos with both cues present.
Trends Hear
October 2024
Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland.
Comprehending speech in noise (SiN) poses a challenge for older hearing-impaired listeners, requiring auditory and working memory resources. Visual speech cues provide additional sensory information that supports speech understanding, although the extent of this visual benefit varies considerably across individuals, which might be accounted for by differences in working memory capacity (WMC). In the current study, we investigated behavioral and neurofunctional (i.
Speech-driven facial animation technology is generally categorized into two main types: 3D and 2D talking faces. Both have garnered considerable research attention in recent years. However, to our knowledge, research on 3D talking faces has not progressed as far as that on 2D talking faces, particularly in terms of lip-sync and perceptual mouth movements.