Visual grounding is the task of localizing an object described by a sentence in an image. Conventional visual grounding methods extract visual and linguistic features in isolation and then perform cross-modal interaction in a post-fusion manner. We argue that this post-fusion mechanism does not fully exploit the information in the two modalities; instead, it is more desirable to perform cross-modal interaction while the visual and linguistic features are being extracted. In this paper, we propose a language-customized visual feature learning mechanism in which linguistic information guides the extraction of visual features from the very beginning. We instantiate the mechanism as a one-stage framework named Progressive Language-customized Visual feature learning (PLV). PLV consists of a Progressive Language-customized Visual Encoder (PLVE) and a grounding module. At each stage of the PLVE, the visual features are customized with linguistic guidance by Channel-wise Language-guided Interaction Modules (CLIM). PLV outperforms conventional state-of-the-art methods by large margins on five visual grounding datasets without pre-training on object detection datasets, while running at real-time speed. The source code is available in the supplementary material.
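To illustrate the channel-wise language-guided interaction idea described in the abstract, below is a minimal sketch, not the authors' released code: it gates the channels of a visual feature map with weights predicted from a pooled sentence embedding. All class, variable, and dimension names here (ChannelLanguageGate, visual_channels=256, text_dim=768) are assumptions made for this example.

```python
import torch
import torch.nn as nn

class ChannelLanguageGate(nn.Module):
    """Re-weights visual feature channels using a pooled sentence embedding."""

    def __init__(self, visual_channels: int, text_dim: int):
        super().__init__()
        # Project the sentence embedding to one gate value per visual channel.
        self.gate = nn.Sequential(
            nn.Linear(text_dim, visual_channels),
            nn.Sigmoid(),
        )

    def forward(self, visual_feat: torch.Tensor, text_embed: torch.Tensor) -> torch.Tensor:
        # visual_feat: (B, C, H, W); text_embed: (B, D)
        weights = self.gate(text_embed)                 # (B, C)
        weights = weights.unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        return visual_feat * weights                    # channel-wise modulation


if __name__ == "__main__":
    clim = ChannelLanguageGate(visual_channels=256, text_dim=768)
    v = torch.randn(2, 256, 32, 32)   # visual features from one encoder stage
    t = torch.randn(2, 768)           # pooled sentence embedding
    print(clim(v, t).shape)           # torch.Size([2, 256, 32, 32])
```

In the full PLV framework such a module would be applied at every stage of the visual encoder; the sketch omits the backbone and the grounding head.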
DOI: http://dx.doi.org/10.1109/TIP.2022.3181516
J Neurosurg
January 2025
Department of Neurosurgery.
Objective: Craniopharyngiomas are rare, benign brain tumors that are primarily treated with surgery. Although the extended endoscopic endonasal approach (EEEA) has evolved as a more reliable surgical alternative and yields better visual outcomes than traditional craniotomy, postoperative visual deterioration remains one of the most common complications, and the relevant risk factors are still poorly defined. Hence, identifying risk factors and developing a predictive model for postoperative visual deterioration are necessary.
Bioinformatics
January 2025
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom.
Summary: In recent years, there has been a surge in prokaryotic genome assemblies from both isolated organisms and environmental samples. These assemblies often include novel species that are poorly represented in reference databases, creating a need for a tool that can annotate both well-described and novel taxa and can run at scale. Here, we present mettannotator, a comprehensive, scalable Nextflow pipeline for prokaryotic genome annotation that identifies coding and non-coding regions, predicts protein functions, including antimicrobial resistance, and delineates gene clusters.
Disabil Rehabil Assist Technol
January 2025
School of Rehabilitation Therapy, Queen's University, Kingston, Ontario, Canada.
This article explores the existing research evidence on the potential effectiveness of lipreading as a communication strategy to enhance speech recognition in individuals with hearing impairment. A scoping review was conducted, involving a search of six electronic databases (MEDLINE, Embase, Web of Science, Engineering Village, CINAHL, and PsycINFO) for research papers published between January 2013 and June 2023. This study included original research papers with full texts available in English, covering all study designs: qualitative, quantitative, and mixed methods.
Transl Vis Sci Technol
January 2025
Glaucoma Service, Wills Eye Hospital, Philadelphia, PA, USA.
Purpose: The integration of artificial intelligence (AI), particularly deep learning (DL), with optical coherence tomography (OCT) offers significant opportunities in the diagnosis and management of glaucoma. This article explores the application of various DL models in enhancing OCT capabilities and addresses the challenges associated with their clinical implementation.
Methods: A review of articles utilizing DL models was conducted, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), autoencoders, and large language models (LLMs).
J Microsc
January 2025
Laboratory of Apicomplexan Biology, Institut Pasteur Montevideo, Montevideo, Uruguay.
Apicomplexans, a large phylum of protozoan intracellular parasites well known for their ability to invade and proliferate within host cells, cause diseases with major health and economic impacts worldwide. These parasites are responsible for conditions such as malaria, cryptosporidiosis, and toxoplasmosis, which affect humans and other animals. Apicomplexans exhibit complex life cycles, marked by diverse modes of cell division, which are closely associated with their pathogenesis.