Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. Specifically, the large number of detected objects and optical character recognition (OCR) tokens in an image gives rise to a dense set of visual relationships, and existing works take all of them into account for answer prediction. However, we make three observations: (1) a single subject in an image can easily be detected as multiple objects with distinct bounding boxes (repetitive objects), and the associations between these repetitive objects are superfluous for answer reasoning; (2) two spatially distant OCR tokens detected in an image frequently have only weak semantic dependencies for answer reasoning; and (3) the co-existence of nearby objects and tokens may indicate important visual cues for predicting answers. Rather than using all relationships for answer prediction, we aim to identify the most important connections and eliminate redundant ones. We propose a sparse spatial graph network (SSGN) that introduces a spatially aware relation pruning technique to this task. As spatial factors for relation measurement, we employ spatial distance, geometric dimension, overlap area, and distance-IoU (DIoU). We consider three types of visual relationships for graph learning: object-object, OCR token-OCR token, and object-OCR token relationships. SSGN is a progressive graph learning architecture that first verifies the pivotal relations in the correlated object-token sparse graph, and then in the respective object-based and token-based sparse graphs. Experimental results on the TextVQA and ST-VQA datasets demonstrate that SSGN achieves promising performance, and visualization results further demonstrate the interpretability of our method.
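To make the idea of spatially aware pruning concrete, the following is a minimal sketch, not the authors' implementation. It computes DIoU (IoU minus the squared center distance divided by the squared diagonal of the smallest enclosing box) for pairs of bounding boxes and keeps only pairs that are neither near-duplicates nor far apart. The function names `diou` and `prune_edges`, the `(x1, y1, x2, y2)` box format, and the thresholds `low` and `high` are assumptions chosen for illustration; the abstract does not specify how SSGN combines its spatial factors.

```python
from itertools import combinations


def diou(a, b):
    """Distance-IoU of two boxes given as (x1, y1, x2, y2)."""
    # Overlap area of the two boxes (zero if they do not intersect).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the box centers.
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
        + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c2 = cw ** 2 + ch ** 2
    return iou - d2 / c2 if c2 > 0 else iou


def prune_edges(boxes, low=-0.5, high=0.8):
    """Keep index pairs (i, j) whose DIoU lies strictly between low and high.

    Hypothetical rule: DIoU >= high is treated as a repetitive detection,
    DIoU <= low as a spatially distant pair; both are dropped.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(boxes)), 2)
        if low < diou(boxes[i], boxes[j]) < high
    ]


if __name__ == "__main__":
    boxes = [
        (0, 0, 10, 10),        # detection of a subject
        (0, 0, 10, 10),        # repeated detection of the same subject
        (8, 0, 18, 10),        # nearby region, slightly overlapping
        (100, 100, 110, 110),  # distant region
    ]
    print(prune_edges(boxes))
```

In this toy setting the duplicate pair and the pairs involving the distant box are removed, leaving only edges between nearby, non-identical regions. A learned model would likely use such scores as one input among several rather than as a hard filter.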
DOI: http://dx.doi.org/10.1109/TIP.2023.3310332