Speech comprehension is crucial for human social interaction, relying on the integration of auditory and visual cues across various levels of representation. While research has extensively studied multisensory integration (MSI) using idealised, well-controlled stimuli, there is a need to understand this process in response to the complex, naturalistic stimuli encountered in everyday life. This study investigated behavioural and neural MSI in neurotypical adults experiencing audio-visual speech within a naturalistic, social context. Our novel paradigm incorporated a broader social situational context, complete words, and speech-supporting iconic gestures, allowing for context-based pragmatics and semantic priors. We investigated MSI in the presence of unimodal (auditory or visual) or complementary bimodal speech signals. During audio-visual speech trials, compared to unimodal trials, participants more accurately recognised spoken words and showed a more pronounced suppression of alpha power, an indicator of heightened integration load. Importantly, on the neural level, these effects surpassed mere summation of unimodal responses, suggesting non-linear MSI mechanisms. Overall, our findings demonstrate that typically developing adults integrate audio-visual speech and gesture information to facilitate speech comprehension in noisy environments, highlighting the importance of studying MSI in ecologically valid contexts.
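The reported non-linearity rests on the additive-model criterion: the audio-visual response is compared against the sum of the two unimodal responses and found to exceed it. The Python snippet below is a minimal sketch of how such a test on alpha-band suppression might look; the sampling rate, epoch layout, variable names, and simulated placeholder data are all assumptions for illustration, not the authors' pipeline.

    import numpy as np
    from scipy.signal import welch
    from scipy.stats import ttest_rel

    FS = 250  # assumed EEG sampling rate (Hz)

    def alpha_power(segment):
        """Mean 8-12 Hz power per trial; segment has shape (n_trials, n_samples)."""
        freqs, psd = welch(segment, fs=FS, nperseg=segment.shape[-1], axis=-1)
        band = (freqs >= 8) & (freqs <= 12)
        return psd[:, band].mean(axis=-1)

    def alpha_suppression(epochs):
        """Relative alpha change, post-stimulus vs. pre-stimulus baseline.
        Assumes the first second of each 2 s epoch is the baseline."""
        base, post = epochs[:, :FS], epochs[:, FS:]
        return (alpha_power(post) - alpha_power(base)) / alpha_power(base)

    # Placeholder random data standing in for per-condition EEG epochs
    rng = np.random.default_rng(0)
    av, a, v = (rng.normal(size=(40, 2 * FS)) for _ in range(3))

    # Additive-model test: does AV suppression differ from the summed
    # unimodal (A + V) suppressions?
    t, p = ttest_rel(alpha_suppression(av),
                     alpha_suppression(a) + alpha_suppression(v))
    print(f"AV vs. A+V additive model: t = {t:.2f}, p = {p:.3f}")

On real data, stronger (more negative) AV suppression than the A + V sum would correspond to the super-additive pattern described in the abstract.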

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11263810 (PMC)
http://dx.doi.org/10.1002/hbm.26797 (DOI)

Publication Analysis

Top Keywords

audio-visual speech (12)
multisensory integration (8)
speech comprehension (8)
auditory visual (8)
speech (7)
MSI (5)
integration speech (4)
speech gestures (4)
gestures naturalistic (4)
naturalistic paradigm (4)

Similar Publications

A comprehensive analysis of everyday sound perception can be achieved using electroencephalography (EEG) with the concurrent acquisition of information about the environment. While extensive research has been dedicated to speech perception, the complexities of auditory perception within everyday environments, specifically the types of information and the key features to extract, remain less explored. Our study aims to systematically investigate the relevance of three feature categories: discrete sound-identity markers, general cognitive state information, and acoustic representations, including discrete sound onset, the envelope, and the mel-spectrogram.
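As a rough illustration of two of the acoustic representations named here, the Python sketch below computes a broadband amplitude envelope (via the Hilbert transform) and a mel-spectrogram. The file name and all parameters are hypothetical; the study's actual extraction pipeline is not described in this snippet.

    import numpy as np
    import librosa
    from scipy.signal import hilbert

    # Hypothetical recording of an everyday acoustic scene
    audio, sr = librosa.load("everyday_scene.wav", sr=16000)

    # Amplitude envelope: magnitude of the analytic signal
    envelope = np.abs(hilbert(audio))

    # Mel-spectrogram in decibels, a standard time-frequency representation
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=64)
    mel_db = librosa.power_to_db(mel, ref=np.max)

    print(envelope.shape, mel_db.shape)  # (n_samples,), (64, n_frames)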

Article Synopsis
  • The adoption of digital synchronous video communication for telecare and teletherapy has surged recently, fueled by COVID-19 and a broader trend toward digital healthcare in the past two decades.
  • A study involving 20 qualitative interviews with health professionals and patients from Germany, Austria, and Switzerland identified six main categories and 20 sub-categories that can influence the effectiveness of telesettings, highlighting the importance of motivation and digital skills.
  • The findings suggest a need for structured guidelines and training to support telesetting, emphasizing the adaptation of methodologies to incorporate audio-visual technology effectively.

Cross-Linguistic Recognition of Irony Through Visual and Acoustic Cues.

J Psycholinguist Res

November 2024

Department of Psychology, University of Milan-Bicocca, Piazza Dell'Ateneo Nuovo, 1, 20126, Milan, Italy.

To avoid misunderstandings, ironic speakers may accompany their ironic remarks with a particular intonation and specific facial expressions that signal that the message should not be taken at face value. The acoustic realization of the ironic tone of voice differs from language to language, whereas the ironic face manifests the speaker's negative stance and might thus have a universal basis. We conducted a study on 574 participants speaking 6 different languages (French, German, Dutch, English, Mandarin, and Italian, the control group) to verify whether they could recognize ironic remarks uttered in Italian in three different modalities: watching muted videos, listening to audio tracks, and viewing videos with both cues present.

Comprehending speech in noise (SiN) poses a challenge for older hearing-impaired listeners, requiring both auditory and working memory resources. Visual speech cues provide additional sensory information that supports speech understanding, yet the extent of this visual benefit varies considerably across individuals, which might be accounted for by individual differences in working memory capacity (WMC). In the current study, we investigated behavioral and neurofunctional (i.e., …)

Speech-driven facial animation technology is generally categorized into two main types: 3D and 2D talking faces. Both have garnered considerable research attention in recent years. However, to our knowledge, research into 3D talking faces has not progressed as deeply as that of 2D talking faces, particularly in terms of lip-sync and perceptual mouth movements.
