Speech research during recent years has moved progressively away from its traditional focus on audition toward a more multisensory approach. In addition to audition and vision, many somatosenses including proprioception, pressure, vibration and aerotactile sensation are all highly relevant modalities for experiencing and/or conveying speech. In this article, we review both long-standing cross-modal effects stemming from decades of audiovisual speech research as well as new findings related to somatosensory effects. Cross-modal effects in speech perception to date are found to be constrained by temporal congruence and signal relevance, but appear to be unconstrained by spatial congruence. Far from taking place in a one-, two- or even three-dimensional space, the literature reveals that speech occupies a highly multidimensional sensory space. We argue that future research in cross-modal effects should expand to consider each of these modalities both separately and in combination with other modalities in speech.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8297790 | PMC |
| http://dx.doi.org/10.1146/annurev-linguistics-011718-012353 | DOI Listing |
Physiol Meas
January 2025
Academy of Military Science of the People's Liberation Army, Beijing, 100073, China.
Objective: Humanity faces many health challenges, among which respiratory diseases are a leading cause of death. Existing AI-driven pre-diagnosis approaches can improve diagnostic efficiency but still face challenges. For example, single-modal data suffer from information redundancy or loss, and make it difficult to learn relationships between features and to reveal the obscure characteristics of complex diseases.
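The excerpt stops before the method itself; purely to illustrate why multimodal inputs can help, below is a minimal, hypothetical late-fusion sketch in PyTorch in which two modality-specific encoders (e.g., cough-sound features and tabular clinical variables; both modality choices and all dimensions are assumptions, not taken from the paper) feed a shared diagnostic head.

```python
# Hypothetical late-fusion classifier: two modality-specific encoders whose
# embeddings are concatenated before a shared diagnostic head.
# All feature sizes and class counts are illustrative assumptions.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, audio_dim=128, tabular_dim=32, hidden=64, n_classes=4):
        super().__init__()
        # Modality-specific encoders (hypothetical feature sizes).
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.tab_enc = nn.Sequential(nn.Linear(tabular_dim, hidden), nn.ReLU())
        # Shared head over the concatenated modality embeddings.
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, audio_feats, tabular_feats):
        fused = torch.cat([self.audio_enc(audio_feats),
                           self.tab_enc(tabular_feats)], dim=-1)
        return self.head(fused)  # logits over hypothetical diagnostic classes

model = LateFusionClassifier()
logits = model(torch.randn(8, 128), torch.randn(8, 32))  # toy batch of 8
```

Concatenating modality embeddings in this way lets the shared head exploit complementary cues that either modality alone would miss, which is the usual motivation for multimodal fusion in diagnostic settings.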
J Exp Psychol Hum Percept Perform
January 2025
Department of Psychology, Saarland University.
Task-irrelevant sounds that are semantically congruent with the target can facilitate performance in visual search tasks, resulting in faster search times. In three experiments, we tested the processes underlying this effect. Participants were presented with auditory primes that were semantically congruent, neutral, or incongruent with the visual search target, and importantly, we varied the set size of the search displays.
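As a concrete illustration of this design (not the authors' materials), the short Python sketch below generates a shuffled trial list crossing prime congruency with display set size; the specific set sizes and repetition count are assumptions.

```python
# Hypothetical trial-list generator crossing prime congruency with display
# set size, as one might set up for this kind of visual search experiment.
import itertools
import random

congruency_levels = ["congruent", "neutral", "incongruent"]
set_sizes = [4, 8, 16]      # assumed display sizes (illustrative only)
reps_per_cell = 20          # assumed repetitions per condition cell

trials = [
    {"prime_congruency": c, "set_size": s}
    for c, s in itertools.product(congruency_levels, set_sizes)
    for _ in range(reps_per_cell)
]
random.shuffle(trials)      # randomise presentation order
print(len(trials), trials[0])
```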
Neural Netw
December 2024
School of Computer and Electronic Information, Guangxi University, University Road, Nanning, 530004, Guangxi, China.
Vision-language navigation (VLN) is a challenging task that requires an agent to capture the correlations between modalities from redundant information according to the instructions, and then to make sequential decisions in the action space based on the visual scene and the text instructions. Recent research has focused on extracting visual features and enhancing textual knowledge, ignoring potential bias in multi-modal data and the problem of spurious correlations between vision and text. This paper therefore studies the relational structure of multi-modal data from the perspective of causality and weakens potentially spurious correlations between modalities through cross-modal causality reasoning.
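The causal reasoning module is the paper's own contribution and is not reproduced here; as background only, the following minimal PyTorch sketch shows the generic cross-modal attention step in which instruction tokens attend to visual features of the current scene. All dimensions and shapes are illustrative assumptions.

```python
# Minimal sketch of cross-modal attention: instruction tokens attend to
# visual features. Generic fusion only; the paper's causality-based
# reweighting is not reproduced here.
import torch
import torch.nn as nn

TEXT_DIM, VIS_DIM, D = 256, 512, 128   # illustrative feature sizes

class CrossModalAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.q = nn.Linear(TEXT_DIM, D)   # queries from instruction tokens
        self.k = nn.Linear(VIS_DIM, D)    # keys from visual features
        self.v = nn.Linear(VIS_DIM, D)    # values from visual features

    def forward(self, text_tokens, vis_feats):
        q, k, v = self.q(text_tokens), self.k(vis_feats), self.v(vis_feats)
        attn = torch.softmax(q @ k.transpose(-2, -1) / D ** 0.5, dim=-1)
        return attn @ v   # instruction tokens grounded in the visual scene

fusion = CrossModalAttention()
grounded = fusion(torch.randn(2, 20, TEXT_DIM), torch.randn(2, 36, VIS_DIM))
print(grounded.shape)   # torch.Size([2, 20, 128])
```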
Patterns (N Y)
December 2024
Data Sciences and Artificial Intelligence Section, College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA, USA.
The placenta is vital to maternal and child health but often overlooked in pregnancy studies. Addressing the need for a more accessible and cost-effective method of placental assessment, our study introduces a computational tool designed for the analysis of placental photographs. Leveraging images and pathology reports collected from sites in the United States and Uganda over a 12-year period, we developed a cross-modal contrastive learning algorithm consisting of pre-alignment, distillation, and retrieval modules.
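The pre-alignment, distillation, and retrieval modules are specific to the paper and are not reproduced here; the sketch below only illustrates the generic cross-modal contrastive (CLIP-style) alignment at the heart of such methods, with encoder output dimensions and temperature chosen arbitrarily.

```python
# Minimal sketch of cross-modal (image <-> report) contrastive alignment.
# Hypothetical dimensions and projections; not the paper's actual pipeline.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveAligner(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, embed_dim=256, temperature=0.07):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)   # projects image features
        self.txt_proj = nn.Linear(txt_dim, embed_dim)   # projects report features
        self.temperature = temperature

    def forward(self, img_feats, txt_feats):
        # L2-normalise both modalities so dot products are cosine similarities.
        img = F.normalize(self.img_proj(img_feats), dim=-1)
        txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        logits = img @ txt.t() / self.temperature        # (batch, batch) similarities
        targets = torch.arange(img.size(0), device=img.device)
        # Symmetric InfoNCE: matched image-report pairs lie on the diagonal.
        loss_i2t = F.cross_entropy(logits, targets)
        loss_t2i = F.cross_entropy(logits.t(), targets)
        return (loss_i2t + loss_t2i) / 2

# Toy usage with random features standing in for encoder outputs.
aligner = ContrastiveAligner()
loss = aligner(torch.randn(8, 2048), torch.randn(8, 768))
loss.backward()
```

Training on such a symmetric objective pulls matched image and report embeddings together, which is what later enables cross-modal retrieval of reports from photographs.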
Biol Imaging
December 2024
Visual Information Laboratory, University of Bristol, Bristol, UK.
Optical coherence tomography (OCT) and confocal microscopy are pivotal in retinal imaging, offering distinct advantages and limitations. OCT offers rapid, noninvasive imaging but can suffer from clarity issues and motion artifacts, while confocal microscopy, providing high-resolution, cellular-detailed color images, is invasive and raises ethical concerns. To bridge the benefits of both modalities, we propose a novel framework based on unsupervised 3D CycleGAN for translating unpaired OCT to confocal microscopy images.
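As an illustration of the unpaired-translation idea only (not the authors' architecture), the sketch below implements the cycle-consistency term of a CycleGAN with tiny placeholder 3D generators; a full model would add adversarial losses from domain discriminators and much deeper networks.

```python
# Illustrative cycle-consistency loss for unpaired volume-to-volume translation.
# Placeholder 3D generators and shapes; the real model would be far larger.
import torch
import torch.nn as nn

def tiny_generator():
    # Maps a 1-channel volume to a 1-channel volume of the same shape.
    return nn.Sequential(
        nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv3d(16, 1, kernel_size=3, padding=1),
    )

G_oct2conf = tiny_generator()   # OCT -> confocal-style volume
G_conf2oct = tiny_generator()   # confocal -> OCT-style volume
l1 = nn.L1Loss()

oct_vol = torch.randn(2, 1, 16, 32, 32)    # unpaired OCT volumes (toy sizes)
conf_vol = torch.randn(2, 1, 16, 32, 32)   # unpaired confocal volumes (toy sizes)

# Cycle consistency: translating forward and back should recover the input,
# which substitutes for the paired supervision that does not exist.
cycle_loss = (l1(G_conf2oct(G_oct2conf(oct_vol)), oct_vol)
              + l1(G_oct2conf(G_conf2oct(conf_vol)), conf_vol))
cycle_loss.backward()
```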