Publications by authors named "Gemma Roig"

Background: Medical imagesegmentation is an essential step in both clinical and research applications, and automated segmentation models-such as TotalSegmentator-have become ubiquitous. However, robust methods for validating the accuracy of these models remain limited, and manual inspection is often necessary before the segmentation masks produced by these models can be used.

Methods: To address this gap, we have developed a novel validation framework for segmentation models, leveraging data augmentation to assess model consistency.

View Article and Find Full Text PDF

Studying the neural basis of human dynamic visual perception requires extensive experimental data to evaluate the large swathes of functionally diverse brain neural networks driven by perceiving visual events. Here, we introduce the BOLD Moments Dataset (BMD), a repository of whole-brain fMRI responses to over 1000 short (3 s) naturalistic video clips of visual events across ten human subjects. We use the videos' extensive metadata to show how the brain represents word- and sentence-level descriptions of visual events and identify correlates of video memorability scores extending into the parietal cortex.

View Article and Find Full Text PDF

EMOKINE is a software package and dataset creation suite for emotional full-body movement research in experimental psychology, affective neuroscience, and computer vision. A computational framework, comprehensive instructions, a pilot dataset, observer ratings, and kinematic feature extraction code are provided to facilitate future dataset creations at scale. In addition, the EMOKINE framework outlines how complex sequences of movements may advance emotion research.

View Article and Find Full Text PDF

To navigate through their immediate environment humans process scene information rapidly. How does the cascade of neural processing elicited by scene viewing to facilitate navigational planning unfold over time? To investigate, we recorded human brain responses to visual scenes with electroencephalography and related those to computational models that operationalize three aspects of scene processing (2D, 3D, and semantic information), as well as to a behavioral model capturing navigational affordances. We found a temporal processing hierarchy: navigational affordance is processed later than the other scene features (2D, 3D, and semantic) investigated.

View Article and Find Full Text PDF

With the rise in traffic congestion in urban centers, predicting accidents has become paramount for city planning and public safety. This work comprehensively studied the efficacy of modern deep learning (DL) methods in forecasting traffic accidents and enhancing Level-4 and Level-5 (L-4 and L-5) driving assistants with actionable visual and language cues. Using a rich dataset detailing accident occurrences, we juxtaposed the Transformer model against traditional time series models like ARIMA and the more recent Prophet model.

View Article and Find Full Text PDF

This paper presents an exploration into the capabilities of an adaptive PID controller within the realm of truck platooning operations, situating the inquiry within the context of Cognitive Radio and AI-enhanced 5G and Beyond 5G (B5G) networks. We developed a Deep Learning (DL) model that emulates an adaptive PID controller, taking into account the implications of factors such as communication latency, packet loss, and communication range, alongside considerations of reliability, robustness, and security. Furthermore, we harnessed a Large Language Model (LLM), GPT-3.

View Article and Find Full Text PDF

The human brain achieves visual object recognition through multiple stages of linear and nonlinear transformations operating at a millisecond scale. To predict and explain these rapid transformations, computational neuroscientists employ machine learning modeling techniques. However, state-of-the-art models require massive amounts of data to properly train, and to the present day there is a lack of vast brain datasets which extensively sample the temporal dynamics of visual object recognition.

View Article and Find Full Text PDF

In this paper, we introduce an approach for future frames prediction based on a single input image. Our method is able to generate an entire video sequence based on the information contained in the input frame. We adopt an autoregressive approach in our generation process, i.

View Article and Find Full Text PDF

To interact with objects in complex environments, we must know what they are and where they are in spite of challenging viewing conditions. Here, we investigated where, how and when representations of object location and category emerge in the human brain when objects appear on cluttered natural scene images using a combination of functional magnetic resonance imaging, electroencephalography and computational models. We found location representations to emerge along the ventral visual stream towards lateral occipital complex, mirrored by gradual emergence in deep neural networks.

View Article and Find Full Text PDF

In this paper, we tackle the problem of predicting the affective responses of movie viewers, based on the content of the movies. Current studies on this topic focus on video representation learning and fusion techniques to combine the extracted features for predicting affect. Yet, these typically, while ignoring the correlation between multiple modality inputs, ignore the correlation between temporal inputs (i.

View Article and Find Full Text PDF

The human visual cortex enables visual perception through a cascade of hierarchical computations in cortical regions with distinct functionalities. Here, we introduce an AI-driven approach to discover the functional mapping of the visual cortex. We related human brain responses to scene images measured with functional MRI (fMRI) systematically to a diverse set of deep neural networks (DNNs) optimized to perform different scene perception tasks.

View Article and Find Full Text PDF

Visual scene perception is mediated by a set of cortical regions that respond preferentially to images of scenes, including the occipital place area (OPA) and parahippocampal place area (PPA). However, the differential contribution of OPA and PPA to scene perception remains an open research question. In this study, we take a deep neural network (DNN)-based computational approach to investigate the differences in OPA and PPA function.

View Article and Find Full Text PDF

Though the range of invariance in recognition of novel objects is a basic aspect of human vision, its characterization has remained surprisingly elusive. Here we report tolerance to scale and position changes in one-shot learning by measuring recognition accuracy of Korean letters presented in a flash to non-Korean subjects who had no previous experience with Korean letters. We found that humans have significant scale-invariance after only a single exposure to a novel object.

View Article and Find Full Text PDF

Objective: The objective of this paper is to investigate whether a small number of sequentially composed multivariable linear controllers can be used to recover a defining relation between the joint torques, angles, and velocities hidden in the walking data of multiple human subjects.

Methods: We solve a mixed integer programming problem that defines the optimal multivariable and multiphase relation between the torques, angles, and velocities for the hip, knee, and ankle joints.

Results: Using the data of seven healthy subjects, we show that the aforementioned relation can be remarkably well represented by four sequentially composed and independently activated multivariable linear controllers; the controllers account for [Formula: see text] (mean ± sem) of the variance in the joint torques across subjects, and [Formula: see text] of the variance for a new subject.

View Article and Find Full Text PDF

Most state-of-the-art visual attention models estimate the probability distribution of fixating the eyes in a location of the image, the so-called saliency maps. Yet, these models do not predict the temporal sequence of eye fixations, which may be valuable for better predicting the human eye fixations, as well as for understanding the role of the different cues during visual exploration. In this paper, we present a method for predicting the sequence of human eye fixations, which is learned from the recorded human eye-tracking data.

View Article and Find Full Text PDF