Integrating information from the vision and language modalities has sparked interesting applications in computer vision and natural language processing. Existing methods, though promising in tasks such as image captioning and visual question answering, struggle to understand real-life problems and to offer step-by-step solutions. In particular, they typically restrict their scope to solutions with a sequential structure, ignoring complex inter-step dependencies. To bridge this gap, we propose a graph-based approach to vision-language problem solving. It leverages a novel integrated attention mechanism that jointly considers the importance of features within each step and across multiple steps. Together with a graph neural network, this attention mechanism can be progressively learned to predict sequential or non-sequential solution graphs, depending on how the problem-solving process is characterized. To tightly couple attention with the problem-solving procedure, we further design new learning objectives with attention metrics that quantify this integrated attention; these objectives better align visual and language information within steps and more accurately capture the information flow between steps. Experimental results on VisualHow, a comprehensive dataset with varying solution structures, show significant improvements in predicting steps and dependencies, demonstrating the effectiveness of our approach in tackling various vision-language problems.
Source: http://dx.doi.org/10.1109/TPAMI.2024.3357631 (DOI listing)
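The abstract does not give equations for the integrated attention, so the following is only a minimal sketch under stated assumptions, not the authors' formulation. It assumes within-step attention is plain scaled dot-product attention of a step's caption embedding over image-region embeddings, and that cross-step attention lets each step attend over the earlier steps, with a fixed threshold turning attention weights into dependency edges. The names `step_feature`, `dependency_graph`, and `threshold` are hypothetical; no graph neural network, learning objective, or VisualHow data is modeled.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> List[float]:
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = 1.0 / math.sqrt(len(query))
    return softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])


def step_feature(caption: Vector, regions: List[Vector]) -> Vector:
    """Within-step attention (assumed form): the caption embedding attends over
    image-region embeddings, and the attention-pooled visual vector is averaged
    with the caption to give one fused feature for the step."""
    weights = attend(caption, regions)
    pooled = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(len(caption))]
    return [(c + v) / 2 for c, v in zip(caption, pooled)]


def dependency_graph(steps: List[Vector], threshold: float = 0.5) -> List[List[int]]:
    """Cross-step attention (assumed form): step j attends over steps 0..j-1, and
    edge i -> j is kept when its attention weight is at least `threshold`.
    Returns an adjacency matrix with adj[i][j] == 1 for a dependency i -> j."""
    n = len(steps)
    adj = [[0] * n for _ in range(n)]
    for j in range(1, n):
        for i, w in enumerate(attend(steps[j], steps[:j])):
            if w >= threshold:
                adj[i][j] = 1
    return adj


if __name__ == "__main__":
    # Hypothetical 2-D step features, e.g. outputs of step_feature().
    print(dependency_graph([[2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]))  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
    print(dependency_graph([[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]))  # [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
```

The first call yields a sequential chain (0 -> 1 -> 2), and the second a branching graph in which steps 1 and 2 both depend on step 0. Because the weights are normalized over each step's predecessors, step 1 always links to step 0 in this sketch; a learned model would need some way to express "no dependency", which is not attempted here.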
Viruses
November 2024
Department of Sciences and Technologies for Sustainable Development and One Health, Universita Campus Bio-Medico di Roma, 00128 Rome, Italy.
Wolbachia-based mosquito control strategies have gained significant attention as a sustainable approach to reduce the transmission of vector-borne diseases such as dengue, Zika, and chikungunya. These endosymbiotic bacteria can limit the ability of mosquitoes to transmit pathogens, offering a promising alternative to traditional chemical-based interventions. With the growing impact of climate change on mosquito population dynamics and disease transmission, Wolbachia interventions represent an adaptable and resilient strategy for mitigating the public health burden of vector-borne diseases.
Sensors (Basel)
December 2024
School of Mechanical and Power Engineering, Zhengzhou University, Zhengzhou 450000, China.
To address the limitations of existing deep learning-based algorithms in detecting surface defects on brake pipe ends, a novel lightweight detection algorithm, FP-YOLOv8, is proposed. The algorithm builds on the YOLOv8n framework, with the aims of improving accuracy and keeping the model lightweight. First, a C2f_GhostV2 module is designed to replace the original C2f module.
Sensors (Basel)
December 2024
NUS-ISS, National University of Singapore, Singapore 119615, Singapore.
The attention mechanism is essential to convolutional neural network (CNN) vision backbones used in sensing and imaging systems. Conventional attention modules are designed heuristically and rely heavily on empirical tuning. To tackle the challenge of designing attention mechanisms, this paper proposes a novel probabilistic attention mechanism.
Sensors (Basel)
December 2024
Shanghai Research Institute of Microelectronics, Peking University, Shanghai 201203, China.
Despite the accuracy and robustness achieved in object tracking, algorithms based on Siamese neural networks often over-rely on information from the initial frame and neglect necessary template updates. Furthermore, in prolonged tracking, such methods struggle to handle complete occlusion or cases where the target leaves the frame. To tackle these issues, this study enhances the SiamRPN algorithm by integrating the convolutional block attention module (CBAM), which strengthens channel and spatial attention. It additionally integrates kernelized correlation filters (KCFs) to enhance the feature template representation.
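The snippet names CBAM without describing it. As a rough reference, the sketch below follows the published CBAM design: channel attention from average- and max-pooled descriptors passed through a shared MLP, followed by spatial attention from a convolution over channel-wise average and max maps. It is written in pure Python on nested lists for readability. The weights (`w_down`, `w_up`, `kernel`) are hypothetical values supplied by the caller; biases, batching, training, and the KCF integration described in the study are not modeled, and this is not the authors' SiamRPN implementation.

```python
import math
from typing import List

Grid = List[List[float]]   # one H x W map
FeatureMap = List[Grid]    # C channels, each an H x W map


def sigmoid(x: float) -> float:
    """Overflow-safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def shared_mlp(v: List[float], w_down: Grid, w_up: Grid) -> List[float]:
    """Two-layer MLP without biases: C -> C/r (ReLU) -> C."""
    hidden = [max(0.0, sum(a * b for a, b in zip(row, v))) for row in w_down]
    return [sum(a * b for a, b in zip(row, hidden)) for row in w_up]


def channel_attention(f: FeatureMap, w_down: Grid, w_up: Grid) -> FeatureMap:
    """Scale each channel by sigmoid(MLP(avg-pool) + MLP(max-pool))."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in f]
    mx = [max(map(max, ch)) for ch in f]
    scores = [a + m for a, m in zip(shared_mlp(avg, w_down, w_up), shared_mlp(mx, w_down, w_up))]
    weights = [sigmoid(s) for s in scores]
    return [[[w * x for x in row] for row in ch] for w, ch in zip(weights, f)]


def spatial_attention(f: FeatureMap, kernel: List[Grid]) -> FeatureMap:
    """Scale each pixel by sigmoid(conv over [channel-avg map, channel-max map])."""
    c, h, w = len(f), len(f[0]), len(f[0][0])
    avg = [[sum(f[k][i][j] for k in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(f[k][i][j] for k in range(c)) for j in range(w)] for i in range(h)]
    size = len(kernel[0])  # square kernel, 7 x 7 in the CBAM paper
    pad = size // 2        # zero padding keeps H x W unchanged for odd sizes

    def conv_at(i: int, j: int) -> float:
        total = 0.0
        for src, ker in zip((avg, mx), kernel):  # kernel[0] -> avg map, kernel[1] -> max map
            for di in range(size):
                for dj in range(size):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        total += ker[di][dj] * src[y][x]
        return total

    mask = [[sigmoid(conv_at(i, j)) for j in range(w)] for i in range(h)]
    return [[[mask[i][j] * ch[i][j] for j in range(w)] for i in range(h)] for ch in f]


def cbam(f: FeatureMap, w_down: Grid, w_up: Grid, kernel: List[Grid]) -> FeatureMap:
    """CBAM ordering: channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(f, w_down, w_up), kernel)


if __name__ == "__main__":
    fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]  # C=2, H=W=2
    w_down = [[0.5, 0.5]]          # one hidden unit (reduction ratio r = 2)
    w_up = [[1.0], [-1.0]]
    kernel = [[[1.0]], [[1.0]]]    # 1 x 1 kernel keeps the example small
    print(cbam(fmap, w_down, w_up, kernel))
```

Both masks lie strictly between 0 and 1, so CBAM only reweights the input features and never changes the tensor shape, which is why it can be inserted into an existing backbone such as the one SiamRPN uses.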
Sensors (Basel)
December 2024
Department of Information and Electronic Engineering, International Hellenic University, 57001 Thessaloniki, Greece.
Recent advances in emotion recognition through artificial intelligence (AI) have shown potential applications in various fields (e.g., healthcare, advertising, and driving technology), with electroencephalogram (EEG)-based approaches achieving higher accuracy than facial or vocal methods owing to their resistance to intentional manipulation.