Cross-modal effects in speech perception.

Annu Rev Linguist

Interdisciplinary Speech Research Lab, Department of Linguistics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada.

Published: January 2019

Speech research in recent years has moved progressively away from its traditional focus on audition toward a more multisensory approach. Beyond audition and vision, somatosenses including proprioception, pressure, vibration, and aerotactile sensation are all highly relevant modalities for experiencing and/or conveying speech. In this article, we review both long-standing cross-modal effects stemming from decades of audiovisual speech research and newer findings related to somatosensory effects. To date, cross-modal effects in speech perception have been found to be constrained by temporal congruence and signal relevance, but they appear to be unconstrained by spatial congruence. The literature reveals that speech occupies not a one-, two-, or even three-dimensional space, but a highly multidimensional sensory space. We argue that future research on cross-modal effects should expand to consider each of these modalities both separately and in combination with the other modalities involved in speech.
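The temporal congruence constraint can be made concrete with a simple check over stimulus onsets. The sketch below is purely illustrative; the ±200 ms binding window is a hypothetical placeholder, since empirically measured audiovisual binding windows vary by task and stimulus:

```python
# Illustrative sketch: classifying audiovisual trials by temporal congruence.
# The 200 ms window is a placeholder, not a value from the review.

def is_temporally_congruent(audio_onset_ms, visual_onset_ms, window_ms=200):
    """Return True if the audio and visual onsets fall within the
    assumed temporal binding window of each other."""
    return abs(audio_onset_ms - visual_onset_ms) <= window_ms

# Example: audio lagging vision by 120 ms counts as congruent under
# this placeholder window; a 350 ms lag does not.
print(is_temporally_congruent(120, 0))   # True
print(is_temporally_congruent(350, 0))   # False
```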


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8297790
DOI: http://dx.doi.org/10.1146/annurev-linguistics-011718-012353

Similar Publications

Self-critical strategy adjustment based artificial intelligence method in generating diagnostic reports of respiratory diseases.

Physiol Meas

January 2025

Academy of Military Science of the People's Liberation Army, Beijing 100073, China.

Objective: Humanity faces many health challenges, and respiratory diseases are among the leading causes of death. Existing AI-driven pre-diagnosis approaches can improve diagnostic efficiency but still face challenges. For example, single-modal data suffer from information redundancy or loss, and they make it difficult to learn relationships between features or to reveal the obscure characteristics of complex diseases.
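The title's self-critical strategy suggests a policy-gradient scheme in the spirit of self-critical sequence training, in which the model's own greedy decode serves as the reward baseline. The sketch below shows that standard formulation, not necessarily the paper's exact adjustment strategy; all tensor names are illustrative:

```python
import torch

def self_critical_loss(log_probs_sampled, reward_sampled, reward_greedy):
    """Sketch of a self-critical policy-gradient loss (Rennie et al. 2017).

    log_probs_sampled: (batch,) summed log-probabilities of the sampled report
    reward_sampled:    (batch,) metric score (e.g., CIDEr) of the sampled report
    reward_greedy:     (batch,) metric score of the greedy-decoded report,
                       used as a baseline (the "self-critical" comparison)
    """
    advantage = reward_sampled - reward_greedy   # positive if sampling beat greedy
    return -(advantage.detach() * log_probs_sampled).mean()

# Toy usage with made-up values; real log-probs come from the decoder.
lp = torch.tensor([-12.3, -9.8], requires_grad=True)
loss = self_critical_loss(lp, torch.tensor([0.62, 0.41]), torch.tensor([0.55, 0.47]))
loss.backward()
```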


Task-irrelevant sounds that are semantically congruent with the target can facilitate performance in visual search tasks, resulting in faster search times. In three experiments, we probed the processes underlying this effect. Participants were presented with auditory primes that were semantically congruent, neutral, or incongruent with the visual search target, and, importantly, we varied the set size of the search displays.
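Varying set size is what lets such a design separate effects on search efficiency (the slope of response time over set size) from pre- or post-search stages (the intercept). A minimal sketch of that slope decomposition, using made-up reaction times:

```python
import numpy as np

# Hypothetical mean reaction times (ms) per display set size for one
# prime condition; real values would come from the experiment.
set_sizes = np.array([4, 8, 16])
mean_rt = np.array([620.0, 700.0, 860.0])

# Least-squares fit: RT = intercept + slope * set_size.
# A prime that changes the slope affects search efficiency; one that
# shifts only the intercept affects pre- or post-search processing.
slope, intercept = np.polyfit(set_sizes, mean_rt, 1)
print(f"search slope: {slope:.1f} ms/item, intercept: {intercept:.0f} ms")
```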


DICCR: Double-gated intervention and confounder causal reasoning for vision-language navigation.

Neural Netw

December 2024

School of Computer and Electronic Information, Guangxi University, University Road, Nanning, 530004, Guangxi, China.

Vision-language navigation (VLN) is a challenging task that requires an agent to capture the correlations between modalities from redundant information according to instructions, and then make sequential decisions over visual scenes and text instructions in the action space. Recent research has focused on extracting visual features and enhancing textual knowledge, ignoring potential bias in multi-modal data and the problem of spurious correlations between vision and text. This paper therefore studies the relational structure of multi-modal data from a causal perspective and weakens spurious correlations between modalities through cross-modal causal reasoning.
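The paper's double-gated intervention is not detailed here; as a generic illustration of the underlying causal idea, backdoor adjustment replaces conditioning on a confounder with averaging over it. A sketch with a discretized confounder (all names hypothetical, not the DICCR architecture):

```python
import numpy as np

def backdoor_adjusted_score(predict, x, confounders, priors):
    """Generic backdoor adjustment: P(y | do(x)) = sum_z P(y | x, z) P(z).

    predict(x, z) -> probability vector over actions; confounders is a
    list of discretized confounder values z, priors their P(z).
    Illustrative only, not the paper's intervention module.
    """
    return sum(p_z * predict(x, z) for z, p_z in zip(confounders, priors))

# Toy usage: two confounder strata with equal prior mass.
toy_predict = lambda x, z: np.array([0.7, 0.3]) if z == 0 else np.array([0.4, 0.6])
print(backdoor_adjusted_score(toy_predict, x=None, confounders=[0, 1], priors=[0.5, 0.5]))
```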


Cross-modal contrastive learning for unified placenta analysis using photographs.

Patterns (N Y)

December 2024

Data Sciences and Artificial Intelligence Section, College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA, USA.

The placenta is vital to maternal and child health but often overlooked in pregnancy studies. Addressing the need for a more accessible and cost-effective method of placental assessment, our study introduces a computational tool designed for the analysis of placental photographs. Leveraging images and pathology reports collected from sites in the United States and Uganda over a 12-year period, we developed a cross-modal contrastive learning algorithm consisting of pre-alignment, distillation, and retrieval modules.
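The pre-alignment, distillation, and retrieval modules are specific to this paper, but cross-modal contrastive alignment of this kind typically rests on a symmetric InfoNCE (CLIP-style) objective over matched image-report pairs. A minimal sketch of that objective, assuming precomputed embeddings:

```python
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image/report pairs.

    img_emb, txt_emb: (batch, dim) embeddings; row i of each is a matched
    pair. A generic sketch, not the paper's exact pre-alignment module.
    """
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature    # (batch, batch) similarity matrix
    targets = torch.arange(len(logits))     # matched pairs lie on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random embeddings:
loss = clip_style_loss(torch.randn(8, 128), torch.randn(8, 128))
```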


Optical coherence tomography (OCT) and confocal microscopy are pivotal in retinal imaging, each with distinct advantages and limitations. OCT offers rapid, noninvasive imaging but can suffer from clarity issues and motion artifacts, while confocal microscopy provides high-resolution color images with cellular-level detail but is invasive and raises ethical concerns. To combine the benefits of both modalities, we propose a novel framework based on an unsupervised 3D CycleGAN for translating unpaired OCT images to confocal microscopy images.
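The heart of unpaired translation with a CycleGAN is the cycle-consistency term: an OCT volume translated to the confocal domain and back should recover the original. A minimal sketch of that term, assuming two 3D generator networks (the full objective also includes adversarial losses from two discriminators):

```python
import torch

def cycle_consistency_loss(G_oct2conf, G_conf2oct, oct_vol, conf_vol, lam=10.0):
    """L1 cycle losses for unpaired OCT <-> confocal translation.

    G_oct2conf, G_conf2oct: generator networks (e.g., 3D conv nets);
    oct_vol, conf_vol: unpaired batches from each domain.
    A sketch of the standard CycleGAN term, not the paper's full objective.
    """
    # OCT -> confocal -> OCT should reproduce the input volume...
    forward_cycle = torch.mean(torch.abs(G_conf2oct(G_oct2conf(oct_vol)) - oct_vol))
    # ...and confocal -> OCT -> confocal likewise.
    backward_cycle = torch.mean(torch.abs(G_oct2conf(G_conf2oct(conf_vol)) - conf_vol))
    return lam * (forward_cycle + backward_cycle)
```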

