Visual-semantic embedding (VSE) networks learn joint image-text representations that map images and texts into a shared embedding space, enabling information retrieval tasks such as image-text retrieval, image captioning, and visual question answering. The most recent state-of-the-art VSE-based networks are VSE++, SCAN, VSRN, and UNITER. This study evaluates the performance of these VSE networks on image-to-text retrieval and identifies and analyses their strengths and limitations to guide future research on the topic. Experimental results on Flickr30K show that the pre-trained network, UNITER, achieved an average Recall@5 of 61.5% for the task of retrieving all relevant descriptions. The traditional networks VSRN, SCAN, and VSE++ achieved average Recall@5 of 50.3%, 47.1%, and 29.4%, respectively, on the same task. An additional analysis was performed on image-text pairs from the 25 worst-performing classes, using a subset of the Flickr30K-based dataset, to identify the limitations of the best-performing models, VSRN and UNITER. These limitations are discussed from the perspective of image scenes, image objects, image semantics, and basic functions of neural networks. The findings are intended to guide further research on using VSE networks for cross-modal information retrieval tasks.
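As a rough illustration of the metric quoted in the abstract, the sketch below computes Recall@K for image-to-text retrieval as the fraction of an image's relevant descriptions that appear among its top K ranked texts, averaged over all images. This is one assumed reading of "retrieving all relevant descriptions", not the authors' evaluation code; the function names, similarity scores, and relevance sets are hypothetical toy values.

```python
from typing import Dict, List, Set


def rank_texts(scores: List[float]) -> List[int]:
    """Return text indices sorted by descending similarity score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(ranked_ids: List[int], relevant_ids: Set[int], k: int) -> float:
    """Fraction of the relevant descriptions found among the top-k ranked texts."""
    if not relevant_ids:
        raise ValueError("each image needs at least one relevant description")
    hits = sum(1 for text_id in ranked_ids[:k] if text_id in relevant_ids)
    return hits / len(relevant_ids)


def average_recall_at_k(
    similarity: Dict[int, List[float]],
    relevant: Dict[int, Set[int]],
    k: int,
) -> float:
    """Mean Recall@K over all query images."""
    per_image = [
        recall_at_k(rank_texts(similarity[image_id]), relevant[image_id], k)
        for image_id in sorted(similarity)
    ]
    return sum(per_image) / len(per_image)


# Hypothetical toy data: 2 query images, 6 candidate texts, 3 relevant texts each.
SIMILARITY = {
    0: [0.9, 0.8, 0.1, 0.7, 0.6, 0.5],  # text 2 is relevant but ranked last
    1: [0.2, 0.1, 0.3, 0.9, 0.8, 0.7],  # all relevant texts ranked first
}
RELEVANT = {0: {0, 1, 2}, 1: {3, 4, 5}}

if __name__ == "__main__":
    print(f"average Recall@3 = {average_recall_at_k(SIMILARITY, RELEVANT, 3):.3f}")
```

In this toy case the first image recovers two of its three relevant texts in the top 3 and the second recovers all three, so the average is (2/3 + 1) / 2. A model's similarity scores would replace the hard-coded lists in practice.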
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8404943 | PMC |
| http://dx.doi.org/10.3390/jimaging7080125 | DOI Listing |
Psychol Res
October 2024
School of Information, Kochi University of Technology, Kami-city, Kochi, 782-8502, Japan.
Cognitive control has long been investigated in attentional conflict tasks. One representative phenomenon of adaptive cognitive control in these tasks is the congruency sequence effect (CSE): a previous conflict leads to a reduced congruency effect on the current trial, reflecting increased control of attention toward the task at hand. One debated question is whether the CSE can generalize between different conditions.
Int J Mol Sci
March 2024
Fundación Instituto de Investigación Sanitaria de Santiago de Compostela (FIDIS), Hospital Clínico, 15706 Santiago de Compostela, Spain.
This study aimed to investigate the venom sac extracts (VSEs) of the European hornet (EH) (Vespa crabro Linnaeus, 1758) (Hymenoptera: Vespidae), focusing on the differences between stinging females, gynes (G), and workers (W), at the protein level. Using a quantitative "Sequential Window Acquisition of all Theoretical Fragment Ion Mass Spectra" (SWATH-MS) analysis, we identified and quantified a total of 240 proteins. Notably, within the group, 45.
Sensors (Basel)
July 2023
Automotive Engineering Research Institute, Jiangsu University, Zhenjiang 212013, China.
This paper proposes a novel vehicle state estimation (VSE) method that combines a physics-informed neural network (PINN) and an unscented Kalman filter on manifolds (UKF-M). This VSE aimed to achieve inertial measurement unit (IMU) calibration and provide comprehensive information on the vehicle's dynamic state. The proposed method leverages a PINN to eliminate IMU drift by constraining the loss function with ordinary differential equations (ODEs).
Small
November 2022
New Energy Technology Engineering Lab of Jiangsu Province, College of Science, Nanjing University of Posts & Telecommunications (NUPT), Nanjing, 210023, China.
Defect engineering of transition metal dichalcogenides (TMDCs) is important for improving electrocatalytic hydrogen evolution reaction (HER) performance. Herein, a facile and scalable atomic-level di-defect strategy over thermodynamically stable VSe₂ nanoflakes is reported, yielding attractive improvements in electrocatalytic HER performance over a wide electrolyte pH range. The di-defect configuration, with a controllable spatial relation between single-atom (SA) V defects and single Se vacancy defects, effectively triggers the electrocatalytic HER activity of the inert VSe₂ basal plane.
Scientometrics
April 2022
Centre of Information and Library Services, VSE Praha, Prague, Czech Republic.
Multiple studies have investigated bibliometric factors predictive of the citation count a research article will receive. In this article, we go beyond bibliometric data by using a range of machine learning techniques to find patterns predictive of citation count using both article content and available metadata. As the input collection, we use the CORD-19 corpus containing research articles (mostly from biology and medicine) applicable to the COVID-19 crisis.