Vision-and-language navigation (VLN) tasks require agents to navigate three-dimensional environments guided by natural language instructions, offering substantial potential for diverse applications. However, the scarcity of training data impedes progress in this field. This paper introduces PanoGen++, a novel framework that addresses this limitation by generating varied and pertinent panoramic environments for VLN tasks.
In recent years, electroencephalogram (EEG)-based emotion recognition technology has made remarkable advances. However, a subtle but crucial problem introduced by the sliding-window method has long been overlooked: the severe quantity mismatch between stimuli and short-term EEG frames. This mismatch may be an important factor limiting the performance of emotion recognition systems.
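The stimulus/frame mismatch described above can be illustrated with a minimal sliding-window sketch: one continuous trial recorded under a single stimulus is cut into many short EEG frames, all inheriting the same label. Window length, step, and channel counts here are illustrative assumptions, not the paper's actual settings.

```python
# Hypothetical sketch of sliding-window segmentation: a single stimulus
# yields many short EEG frames, creating the quantity mismatch.
import numpy as np

def sliding_windows(eeg, win_len, step):
    """Split a (channels, samples) EEG trial into overlapping frames."""
    _, n_samples = eeg.shape
    starts = range(0, n_samples - win_len + 1, step)
    return np.stack([eeg[:, s:s + win_len] for s in starts])

# One 60 s trial at 200 Hz with 32 channels, elicited by one stimulus.
trial = np.random.randn(32, 60 * 200)
frames = sliding_windows(trial, win_len=2 * 200, step=200)  # 2 s windows, 1 s step
labels = ["happy"] * len(frames)  # one stimulus label replicated per frame
print(frames.shape)  # (59, 32, 400): 59 frames from a single stimulus
```

A single one-minute stimulus thus produces 59 labeled frames, so the frame count dwarfs the stimulus count.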
IEEE Trans Vis Comput Graph
December 2024
Eye tracking technology, essential for enhancing user experience in virtual reality (VR) and augmented reality (AR) devices, has been widely incorporated into advanced head-mounted devices like the Apple Vision Pro and PICO 4 Pro, becoming a standard feature. However, dedicated eye tracking datasets for such devices are severely lacking, with existing datasets commonly facing issues like camera skew and low resolution, particularly failing to adequately consider the diversity in wearing postures. To address this gap, we have developed the Posture-Variant Eye Tracking Dataset (PVEye), which includes 11,044,800 high-resolution near-eye images from 104 participants, showcasing a rich variety of wearing postures.
Individuals' affective experience during social interaction can be intricate, influenced by various factors including monetary rewards and social factors. However, evidence remains divergent about which of these factors contribute to social anxiety. To gain a better understanding of the specific factors associated with anxiety during social interaction, we combined a social interaction task with neurophysiological recordings obtained through an anxiety-elicitation task conducted in a Virtual Reality (VR) environment.
The decline in the performance of electromyography (EMG)-based silent speech recognition is widely attributed to disparities in speech patterns, articulation habits, and individual physiology among speakers. Feature alignment, by learning a discriminative network that resolves domain offsets across speakers, is an effective way to address this problem.
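The cross-speaker alignment idea can be illustrated with a simple CORAL-style transform, a stand-in for the paper's discriminative network: source-speaker features are re-colored so their covariance matches the target speaker's, shrinking the domain offset before classification. All names and dimensions are hypothetical.

```python
# CORAL-style feature alignment sketch: match source-speaker feature
# covariance to the target speaker's (illustrative stand-in only).
import numpy as np

def coral(source, target, eps=1e-5):
    """Align source features (n, d) to the target feature covariance."""
    d = source.shape[1]
    cs = np.cov(source, rowvar=False) + eps * np.eye(d)
    ct = np.cov(target, rowvar=False) + eps * np.eye(d)
    def mat_pow(m, p):
        vals, vecs = np.linalg.eigh(m)
        return vecs @ np.diag(vals ** p) @ vecs.T
    # Whiten with the source covariance, re-color with the target's.
    return (source - source.mean(0)) @ mat_pow(cs, -0.5) @ mat_pow(ct, 0.5) + target.mean(0)

rng = np.random.default_rng(1)
src = rng.normal(size=(200, 4)) * np.array([1.0, 5.0, 0.5, 2.0])  # mismatched scales
tgt = rng.normal(size=(200, 4))
aligned = coral(src, tgt)  # aligned covariance now matches tgt's
```

After the transform, the second-order statistics of the source features match the target speaker's, which is the essence of this family of alignment methods.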
Previous resting-state functional magnetic resonance imaging (rs-fMRI) studies have widely explored the temporal connection changes in the human brain following long-term sleep deprivation (SD). However, the frequency-specific topological properties of sleep-deprived functional networks remain largely unclear. In this study, thirty-seven healthy male subjects underwent resting-state fMRI during rested wakefulness (RW) and after 36 hours of SD, and we examined frequency-specific spectral connection changes (0.
The cognitive and behavioral functions of the human brain are supported by its frequency multiplexing mechanism. However, there is limited understanding of the dynamics of the functional network topology. This study aims to investigate the frequency-specific topology of the functional human brain using 7T rs-fMRI data.
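A frequency-specific network analysis of the kind described above can be sketched in a few steps: band-pass filter the regional time series, build a correlation-based connectivity matrix per band, and compute a simple topological metric such as binarized node degree. The band limits, threshold, and region count below are illustrative assumptions, not the study's actual pipeline.

```python
# Sketch of frequency-specific functional network topology (illustrative).
import numpy as np
from scipy.signal import butter, filtfilt

def band_network(ts, fs, band, threshold=0.3):
    """ts: (regions, timepoints). Returns per-node degree within one band."""
    b, a = butter(2, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, ts, axis=1)            # band-limit each region
    conn = np.corrcoef(filtered)                     # functional connectivity
    adj = (np.abs(conn) > threshold) & ~np.eye(len(conn), dtype=bool)
    return adj.sum(axis=1)                           # binarized node degree

ts = np.random.randn(90, 600)   # e.g. 90 regions, 600 volumes (assumed)
fs = 1.0                        # 1 Hz sampling, i.e. TR = 1 s (assumed)
degrees = band_network(ts, fs, band=(0.01, 0.08))
```

Repeating `band_network` over several frequency bands and comparing the resulting degree (or other graph metrics) across bands is the basic form of a frequency-specific topology analysis.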
J Integr Neurosci
February 2024
Background: Emotions are thought to be related to distinct patterns of neural oscillations, but the interactions among multi-frequency neural oscillations during different emotional states lack full exploration. Phase-amplitude coupling is a promising tool for understanding the complexity of the neurophysiological system, thereby playing a crucial role in revealing the physiological mechanisms underlying emotional electroencephalogram (EEG). However, the non-sinusoidal characteristics of EEG lead to the non-uniform distribution of phase angles, which could potentially affect the analysis of phase-amplitude coupling.
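Phase-amplitude coupling as discussed above can be sketched with the mean-vector-length modulation index: the phase of a slow band and the amplitude envelope of a fast band are extracted via the Hilbert transform, and their coupling is quantified as the length of the mean complex vector. Frequencies and parameters here are illustrative, not the study's configuration.

```python
# Minimal phase-amplitude coupling (PAC) sketch: mean-vector-length
# modulation index between theta phase and gamma amplitude (illustrative).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def modulation_index(x, fs, phase_band=(4, 8), amp_band=(30, 45)):
    phase = np.angle(hilbert(bandpass(x, *phase_band, fs)))
    amp = np.abs(hilbert(bandpass(x, *amp_band, fs)))
    return np.abs(np.mean(amp * np.exp(1j * phase)))  # mean vector length

fs = 250
t = np.arange(0, 10, 1 / fs)
theta = np.sin(2 * np.pi * 6 * t)
coupled = (1 + theta) * np.sin(2 * np.pi * 40 * t) + theta    # gamma locked to theta
uncoupled = np.sin(2 * np.pi * 40 * t) + theta                # constant gamma envelope
mi_coupled = modulation_index(coupled, fs)
mi_uncoupled = modulation_index(uncoupled, fs)
```

The non-uniform phase distributions mentioned in the abstract bias exactly this kind of mean-vector estimate, since the average over phase angles no longer cancels for uncoupled signals.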
Background: Affective computing has gained increasing attention in the area of the human-computer interface, where electroencephalography (EEG)-based emotion recognition occupies an important position. Nevertheless, the diversity of emotions and the complexity of EEG signals leave the relationships between emotion and the frequency, spatial, and temporal information of multichannel EEG signals largely unexplored.
Methods: Audio-video stimulus materials were used that elicited four types of emotions (sad, fearful, happy, neutral) in 32 male and female subjects (age 21-42 years) while collecting EEG signals.
In brain-computer interface (BCI) systems, the recognition of motor imagery (MI) brain signals presents challenges. Established recognition approaches have achieved favorable performance on paradigms such as SSVEP, AEP, and P300, whereas classification methods for MI still need improvement. Hence, seeking a classification method with high accuracy and robustness for use in MI-BCI systems is essential.
Annu Int Conf IEEE Eng Med Biol Soc
July 2023
Light and sound are persistently out of sync in subjective temporal perception; the asynchrony at which they are perceived as simultaneous is called the point of subjective simultaneity (PSS). It is stable within individuals but variable across individuals. Previous studies found that spontaneous alpha power, which functions in attention-related brain states, predicts individual PSS in the temporal order judgment (TOJ) task.
IEEE Trans Biomed Eng
April 2024
Estimating vigilance with higher accuracy has become an active research direction. Although the growing number of available modalities opens the door to new possibilities for achieving good performance, uncertain cross-modal interaction still poses a real challenge to multimodal fusion. In this paper, a cross-modality alignment method based on contrastive learning is proposed to extract information that is shared, but not identical, across modalities.
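The contrastive alignment idea can be sketched with an InfoNCE-style objective: paired embeddings from two modalities (say EEG and eye movements) are pulled together while mismatched pairs in the batch are pushed apart. The batch size, dimensions, and modality names are illustrative assumptions, not the paper's model.

```python
# InfoNCE-style contrastive alignment between two modalities (illustrative).
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Contrastive loss over a batch of paired modality embeddings."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                 # (batch, batch) similarities
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                # matched pairs on the diagonal

rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 16))                      # pretend EEG embeddings
aligned_loss = info_nce(shared + 0.01 * rng.normal(size=(8, 16)), shared)
random_loss = info_nce(rng.normal(size=(8, 16)), shared)
```

Well-aligned pairs yield a much lower loss than random pairings, which is the signal the alignment method optimizes.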
Front Hum Neurosci
October 2023
Introduction: Emotion recognition plays a crucial role in affective computing. Recent studies have demonstrated that the fuzzy boundaries among negative emotions make recognition difficult. However, to the best of our knowledge, no formal study has been conducted thus far to explore the effects of increased negative emotion categories on emotion recognition.
Bioengineering (Basel)
October 2023
(1) Background: Emotion recognition based on EEG signals is a rapidly growing and promising research field in affective computing. However, traditional methods have focused on single-channel features that reflect time-domain or frequency-domain information of the EEG, as well as bi-channel features that reveal channel-wise relationships across brain regions. Despite these efforts, the mechanism of mutual interactions between EEG rhythms under different emotional expressions remains largely unexplored.
Multisensory integration occurs within a limited time interval between multimodal stimuli. Multisensory temporal perception varies widely among individuals and involves perceptual synchrony and temporal sensitivity processes. Previous studies explored the neural mechanisms of individual differences for beep-flash stimuli, but not for speech stimuli.
Multisensory integration is more likely to occur if the multimodal inputs fall within a narrow temporal window called the temporal binding window (TBW). Prestimulus local neural oscillations and interregional synchrony within sensory areas can modulate cross-modal integration. Previous work has examined the role of ongoing neural oscillations in audiovisual temporal integration, but no unified conclusion has been reached.
The study of wearable systems based on surface electromyography (sEMG) signals has attracted widespread attention and plays an important role in human-computer interaction, physiological state monitoring, and other fields. Traditional sEMG signal acquisition systems primarily target body parts that do not align with daily wearing habits, such as the arms, legs, and face. In addition, some systems rely on wired connections, which limits their flexibility and user-friendliness.
Objective: Perceptual integration and segregation are modulated by the phase of ongoing neural oscillations whose period is longer than the temporal binding window (TBW). Studies have shown that abstract beep-flash stimuli, with a TBW of about 100 ms, are modulated by the alpha-band phase. Therefore, we hypothesize that the temporal perception of speech, with a TBW of several hundred milliseconds, might be affected by the delta-theta phase.
Introduction: A high perceptual load can effectively prevent attention from being drawn to irrelevant stimuli; however, the neural pattern underlying this process remains unclear.
Methods: This study adopted a perceptual load paradigm to examine the temporal processes of attentional modulation by incorporating conditions of perceptual load, distractor-target compatibility, and eccentricity.
Results: The behavioral results showed that a high perceptual load significantly reduced attentional distraction caused by peripheral distractors.
Silent speech recognition breaks the limitations of automatic speech recognition when acoustic signals cannot be produced or captured clearly, but still has a long way to go before being ready for any real-life applications. To address this issue, we propose a novel silent speech recognition framework based on surface electromyography (sEMG) signals. In our approach, a new deep learning architecture Parallel Inception Convolutional Neural Network (PICNN) is proposed and implemented in our silent speech recognition system, with six inception modules processing six channels of sEMG data, separately and simultaneously.
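The parallel-inception idea above can be sketched in a few lines: each of the six sEMG channels passes through its own inception-style module (parallel filters of different kernel sizes whose outputs are concatenated), and the per-channel features are then fused. The filters below are untrained placeholders, not the actual PICNN.

```python
# Rough sketch of the PICNN layout: six parallel inception-style modules,
# one per sEMG channel, with concatenated multi-scale outputs (illustrative).
import numpy as np

def inception_1d(signal, kernel_sizes=(3, 5, 7)):
    """One inception-style module: parallel branches, concatenated outputs."""
    branches = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k                       # stand-in for a learned filter
        branches.append(np.convolve(signal, kernel, mode="valid"))
    return np.concatenate(branches)

def picnn_features(semg):
    """semg: (6, samples). Six parallel modules fused by concatenation."""
    return np.concatenate([inception_1d(ch) for ch in semg])

x = np.random.randn(6, 100)
feats = picnn_features(x)
# per channel: (100-3+1) + (100-5+1) + (100-7+1) = 288 features; 6 channels -> 1728
```

In the real system these features would feed a classifier head; the sketch only shows the parallel, per-channel multi-scale structure.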
IEEE Trans Neural Netw Learn Syst
January 2024
Vision-language navigation (VLN) is a challenging task in which an agent is guided to navigate a realistic environment by natural language instructions. Sequence-to-sequence modeling is one of the most promising architectures for the task, achieving the agent's navigation goal through a sequence of moving actions. This line of work has led to state-of-the-art performance.
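The sequence-to-sequence framing can be sketched as a decoding loop: at each step the agent conditions on the instruction and its state, emits one discrete moving action, and halts on a stop action. The policy here is a random stand-in and the action set is hypothetical; a real model would use a learned encoder-decoder.

```python
# Sketch of seq2seq-style VLN rollout: actions emitted one step at a time
# until "stop" (policy and action set are illustrative stand-ins).
import numpy as np

ACTIONS = ["forward", "turn_left", "turn_right", "stop"]

def navigate(instruction, policy, max_steps=10):
    """Roll out a VLN episode as a sequence of discrete moving actions."""
    trajectory = []
    state = hash(instruction) % 1000          # stand-in for an encoder state
    for _ in range(max_steps):
        action = policy(state, trajectory)
        trajectory.append(action)
        if action == "stop":
            break
    return trajectory

rng = np.random.default_rng(0)
random_policy = lambda state, traj: ACTIONS[rng.integers(len(ACTIONS))]
path = navigate("walk past the sofa and stop at the door", random_policy)
```

Training replaces the random policy with a decoder that maximizes the likelihood of expert action sequences.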
J Neurosci Methods
March 2022
Background: Gaze-independent BCI systems are used to restore communication in patients with eye movement disorders. One available control mechanism is the utilization of spatial attention. However, spatial information is mostly used simply to answer the "True/False" target recognition question and is seldom used to improve the efficiency of target detection.
Annu Int Conf IEEE Eng Med Biol Soc
November 2021
With the purpose of providing an external human-machine interaction platform for elderly people in need, a novel silent speech recognition system based on facial surface electromyography (sEMG) was developed. In this study, we propose a deep learning architecture named Parallel-Inception Convolutional Neural Network (PICNN) and employ the up-to-date log Mel-frequency spectral coefficient (MFSC) feature extraction method. To better meet the requirements of our target users, a 100-class dataset containing daily-life-related demands was designed and generated for the comparative experiments.
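MFSC extraction as mentioned above can be sketched compactly: frame the signal, take short-time power spectra, pass them through a triangular mel filterbank, and log-compress. The sampling rate, frame sizes, and filter count below are illustrative defaults, not the paper's exact configuration.

```python
# Compact log Mel-frequency spectral coefficient (MFSC) sketch (illustrative).
import numpy as np

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular mel filterbank over the rfft bins."""
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv_mel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = inv_mel(np.linspace(mel(0), mel(fs / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def mfsc(signal, fs=1000, frame_len=256, hop=128, n_filters=26):
    frames = [signal[s:s + frame_len] * np.hamming(frame_len)
              for s in range(0, len(signal) - frame_len + 1, hop)]
    power = np.abs(np.fft.rfft(frames, n=frame_len)) ** 2        # per-frame spectra
    return np.log(power @ mel_filterbank(n_filters, frame_len, fs).T + 1e-10)

features = mfsc(np.random.randn(2000))   # (frames, mel filters) feature map
```

The resulting (frames, filters) map is the kind of 2-D input a convolutional architecture like PICNN would consume.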