Bilinear pooling (BLP) refers to a family of operations recently developed for fusing features from different modalities, predominantly for visual question answering (VQA) models. Successive BLP techniques have yielded higher performance with lower computational expense, yet at the same time they have drifted further from the original motivational justification of bilinear models, instead becoming empirically motivated by task performance. Furthermore, despite significant success in text-image fusion in VQA, BLP has not yet gained comparable prominence in video question answering (video-QA). Though BLP methods have continued to perform well on video tasks when fusing vision and non-textual features, BLP has recently been overshadowed by other vision and textual feature fusion techniques in video-QA. We aim to add a new perspective to the empirical and motivational drift in BLP. We take a step back and discuss the motivational origins of BLP, highlighting the often-overlooked parallels to neurological theories (Dual Coding Theory and the Two-Stream Model of Vision). We seek to carefully and experimentally ascertain the empirical strengths and limitations of BLP as a multimodal text-vision fusion technique in video-QA using two models (the TVQA baseline and the heterogeneous-memory-enhanced 'HME' model) and four datasets (TVQA, TGif-QA, MSVD-QA, and EgoVQA). We examine the impact of both simply replacing feature concatenation in the existing models with BLP, and a modified version of the TVQA baseline to accommodate BLP that we name the 'dual-stream' model. We find that our relatively simple integration of BLP does not increase, and mostly harms, performance on these video-QA benchmarks. Using our insights from recent work on BLP for video-QA and from recently proposed theoretical multimodal fusion taxonomies, we offer insight into why BLP-driven performance gain for video-QA benchmarks may be more difficult to achieve than in earlier VQA models.
We share our perspective on, and suggest solutions for, the key issues we identify with BLP techniques for multimodal fusion in video-QA. We look beyond the empirical justification of BLP techniques and propose both alternatives and improvements to multimodal fusion by drawing neurological inspiration from Dual Coding Theory and the Two-Stream Model of Vision. We qualitatively highlight the potential for neurological inspirations in video-QA by identifying the relative abundance of psycholinguistically 'concrete' words in the vocabularies for each of the text components (questions and answers) of the four video-QA datasets we experiment with.
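To make the contrast between feature concatenation and bilinear pooling concrete, the following is a minimal NumPy sketch, not the authors' implementation. The feature sizes, the random projection matrices `U`, `V`, `P`, and all function names are assumptions chosen for illustration. The low-rank variant only follows the general idea of Hadamard-product-based BLP approximations and is not tied to any specific published method or to the models evaluated here.

```python
import numpy as np

def concat_fusion(v, q):
    """Baseline fusion: stack the two feature vectors end to end."""
    return np.concatenate([v, q])

def full_bilinear_pooling(v, q):
    """Full bilinear pooling: every pairwise product v[i] * q[j], flattened."""
    return np.outer(v, q).ravel()

def low_rank_bilinear_pooling(v, q, U, V, P):
    """Low-rank approximation (illustrative): project both inputs to a shared
    rank-d space, multiply element-wise, then project to the output size."""
    joint = np.tanh(U.T @ v) * np.tanh(V.T @ q)  # shape (d,)
    return P.T @ joint                            # shape (out_dim,)

# Hypothetical sizes: 8-dim visual feature, 6-dim text feature.
rng = np.random.default_rng(0)
v = rng.standard_normal(8)
q = rng.standard_normal(6)

# Hypothetical projections: rank d = 4, output size 5.
U = rng.standard_normal((8, 4))
V = rng.standard_normal((6, 4))
P = rng.standard_normal((4, 5))

fused_concat = concat_fusion(v, q)                     # 8 + 6 values
fused_full = full_bilinear_pooling(v, q)               # 8 * 6 values
fused_low_rank = low_rank_bilinear_pooling(v, q, U, V, P)

print(fused_concat.shape, fused_full.shape, fused_low_rank.shape)
```

In this sketch, concatenation grows additively with the input sizes, the full outer product grows multiplicatively, and the low-rank form keeps the multiplicative interaction while fixing the output size through the projections.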
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9202627 | PMC |
| http://dx.doi.org/10.7717/peerj-cs.974 | DOI Listing |