Vision-language models have emerged as a powerful tool for previously challenging multi-modal classification problems in the medical domain. This development has led to the exploration of automated image description generation for multi-modal clinical scans, particularly for radiology report generation. Existing research has focused on clinical descriptions for specific modalities or body regions, leaving a gap for a model that provides whole-body multi-modal descriptions. In this paper, we address this gap by automating the generation of standardized body station(s) and the list of organ(s) across the whole body in multi-modal MR and CT radiological images. Leveraging the versatility of Contrastive Language-Image Pre-training (CLIP), we refine and augment the existing approach through multiple experiments, including baseline model fine-tuning, adding station(s) as a superset for better correlation between organs, and applying image and language augmentations. Our proposed approach demonstrates a 47.6% performance improvement over the baseline PubMedCLIP.
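
The idea of pairing each body station with its organs and varying the wording of the text labels can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the station-to-organ mapping, the prompt templates, and the function names are hypothetical and are not taken from the paper, which does not list its label set or templates here.

```python
from itertools import product

# Hypothetical station -> organ mapping (illustrative only; not the paper's label set).
STATION_ORGANS = {
    "chest": ["heart", "lungs"],
    "abdomen": ["liver", "kidneys", "spleen"],
}

# Hypothetical prompt templates standing in for "language augmentation":
# the same station/organ label is phrased in more than one way.
TEMPLATES = [
    "a {modality} scan of the {station} showing {organs}",
    "{modality} image, body station: {station}, organs: {organs}",
]


def build_prompts(modality, station):
    """Return one text prompt per template for a given modality and station."""
    organs = ", ".join(STATION_ORGANS[station])
    return [
        t.format(modality=modality, station=station, organs=organs)
        for t in TEMPLATES
    ]


def all_prompts(modalities=("MR", "CT")):
    """Build prompts for every (modality, station) pair."""
    return {
        (m, s): build_prompts(m, s)
        for m, s in product(modalities, STATION_ORGANS)
    }


if __name__ == "__main__":
    for key, prompts in all_prompts().items():
        print(key, prompts)
```

In a CLIP-style setup, prompts like these would be encoded by the text encoder and compared against image embeddings; that step is omitted here because it depends on the specific model and weights.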

Source: http://dx.doi.org/10.1109/EMBC53108.2024.10781689
