Video question answering (Video-QA) is an intensely studied topic in artificial intelligence and is one of the tasks used to evaluate AI abilities. In this paper, we propose a Modality Attention Fusion framework with Hybrid Multi-head Self-attention (MAF-HMS). MAF-HMS addresses the task of answering multiple-choice questions over a video-subtitle-QA representation by fusing attention and self-attention across modalities. We use BERT to extract text features and Faster R-CNN to extract visual features, providing a useful input representation for our model. In addition, we construct a Modality Attention Fusion (MAF) framework that builds the attention fusion matrix from the different modalities (video, subtitles, QA), and use Hybrid Multi-head Self-attention (HMS) to further determine the correct answer. Experiments on three separate scene datasets show that our overall model outperforms the baseline methods by a large margin. Finally, we conduct extensive ablation studies to verify the components of the network, and results broken down by question type and required modality demonstrate the effectiveness and advantages of our method over existing methods.
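The general idea of attending from the QA features to each other modality and then applying multi-head self-attention to the fused result can be sketched as follows. This is a minimal illustration, not the MAF-HMS implementation: the function names (`attend`, `fuse`, `multi_head_self_attention`), the toy feature values, and the choice to concatenate the attended contexts are all assumptions, and the learned projection matrices that a real model would use are omitted.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def attend(queries, keys_values):
    # Scaled dot-product attention: each query row becomes a weighted
    # average of the rows in keys_values (no learned projections here).
    d = len(queries[0])
    scores = matmul(queries, transpose(keys_values))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, keys_values)

def fuse(qa, video, subtitles):
    # Hypothetical fusion step: concatenate each QA token with its
    # QA-aware video context and QA-aware subtitle context.
    v_ctx = attend(qa, video)
    s_ctx = attend(qa, subtitles)
    return [q + v + s for q, v, s in zip(qa, v_ctx, s_ctx)]

def multi_head_self_attention(x, num_heads):
    # Split the feature dimension into equal chunks, run self-attention
    # on each chunk independently, then concatenate the heads again.
    d = len(x[0])
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    size = d // num_heads
    heads = []
    for h in range(num_heads):
        chunk = [row[h * size:(h + 1) * size] for row in x]
        heads.append(attend(chunk, chunk))
    return [sum((head[i] for head in heads), []) for i in range(len(x))]

# Toy features (made up): 2 QA tokens, 3 video regions, 2 subtitle tokens,
# each with 4 dimensions.
qa = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
video = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
subtitles = [[0.2, 0.8, 0.1, 0.9], [0.9, 0.1, 0.8, 0.2]]

fused = fuse(qa, video, subtitles)               # 2 rows, 12 features each
out = multi_head_self_attention(fused, num_heads=3)
```

In a trained model the output rows would typically be pooled and passed to a classifier that scores each candidate answer; that step is left out of this sketch.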
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9536548 | PMC |
| http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0275156 | PLOS |
Comput Biol Med
January 2025
Department of Pathology, Peking University Health Science Center, 38 College Road, Haidian, Beijing, 100191, China; Department of Pathology, School of Basic Medical Sciences, Third Hospital, Peking University Health Science Center, Beijing, 100191, China.
Background: Ovarian cancer is among the most lethal gynecologic malignancies that threaten women's lives. Pathological diagnosis is a key tool for early detection and diagnosis of ovarian cancer, guiding treatment strategies. The evaluation of various ovarian cancer-related cells, based on morphological and immunohistochemical pathology images, is deemed an important step.
Mol Divers
January 2025
Key Laboratory for Macromolecular Science of Shaanxi Province, School of Chemistry and Chemical Engineering, Shaanxi Normal University, Xi'an, 710119, People's Republic of China.
Molecular Property Prediction (MPP) is a fundamental task in important research fields such as chemistry, materials, biology, and medicine, where traditional computational chemistry methods based on quantum mechanics often consume substantial time and computing power. In recent years, machine learning has been increasingly used in computational chemistry, in which graph neural networks have shown good performance in molecular property prediction tasks, but they have some limitations in terms of generalizability, interpretability, and certainty. In order to address the above challenges, a Multiscale Molecular Structural Neural Network (MMSNet) is proposed in this paper, which obtains rich multiscale molecular representations through the information fusion between bonded and non-bonded "message passing" structures at the atomic scale and spatial feature information "encoder-decoder" structures at the molecular scale; a multi-level attention mechanism is introduced on the basis of theoretical analysis of molecular mechanics in order to enhance the model's interpretability; the prediction results of MMSNet are used as label values and clustered in the molecular library by the K-NN (K-Nearest Neighbors) algorithm to reverse match the spatial structure of the molecules, and the certainty of the model is quantified by comparing virtual screening results across different K-values.
Sensors (Basel)
January 2025
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China.
With the rapid development of AI algorithms and computational power, object recognition based on deep learning frameworks has become a major research direction in computer vision. UAVs equipped with object detection systems are increasingly used in fields like smart transportation, disaster warning, and emergency rescue. However, due to factors such as the environment, lighting, altitude, and angle, UAV images face challenges like small object sizes, high object density, and significant background interference, making object detection tasks difficult.
Sensors (Basel)
January 2025
Engineering Training Center, Nantong University, Nantong 226019, China.
The issue of obstacle avoidance and safety for visually impaired individuals has been a major topic of research. However, complex street environments still pose significant challenges for blind obstacle detection systems. Existing solutions often fail to provide real-time, accurate obstacle avoidance decisions.
Sensors (Basel)
January 2025
School of Naval Architecture, Ocean and Energy Power Engineering, Wuhan University of Technology, Wuhan 430070, China.
Remaining useful life (RUL) prediction is a cornerstone of Prognostic and Health Management (PHM) for power machinery, playing a crucial role in ensuring the reliability and safety of these critical systems. In recent years, deep learning techniques have shown great promise in RUL prediction, providing more reliable and accurate outcomes. However, existing models often struggle with comprehensive feature extraction, especially in capturing the complex behavior of power machinery, where non-linear degradation patterns arise under varying operational conditions.