Purpose: In surgical computer vision, data privacy constraints and the difficulty of obtaining expert annotations impede the acquisition of labeled training data. Unpaired image-to-image translation has been explored as a way to generate annotated datasets automatically by translating synthetic images into a realistic domain. Preserving structure and semantic consistency (i.e., the per-class distribution) during translation remains a significant challenge, particularly when the semantic distributions of the two domains do not match.
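To make the notion of a per-class distribution and a semantic distributional mismatch concrete, the following is a minimal sketch and not part of the published method. It assumes label maps are 2D lists of integer class ids and uses total variation distance as one possible mismatch measure; the function names, class ids, and toy label maps are hypothetical.

```python
from collections import Counter


def class_distribution(label_map, num_classes):
    """Return the fraction of pixels assigned to each class id in a 2D label map."""
    counts = Counter(pixel for row in label_map for pixel in row)
    total = sum(counts.values())
    return [counts.get(c, 0) / total for c in range(num_classes)]


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    0 means identical distributions, 1 means no overlap at all.
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical toy label maps (assumed ids: 0 = background, 1 = organ, 2 = instrument).
synthetic_labels = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 2],
    [0, 0, 2, 2],
]
real_labels = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 2, 2],
]

p_syn = class_distribution(synthetic_labels, 3)
p_real = class_distribution(real_labels, 3)
mismatch = total_variation(p_syn, p_real)

print("synthetic:", p_syn)
print("real:     ", p_real)
print("mismatch: ", mismatch)
```

A large mismatch between the two domains is the setting in which an unpaired translation model may be tempted to alter image content, for example by turning organ pixels into instrument pixels, in order to match the target domain's statistics.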

Method: This study empirically investigates various translation methods for generating data in surgical applications, explicitly focusing on semantic consistency. Through our analysis, we introduce a novel and simple combination of effective approaches, which we call ConStructS. The defined losses within this approach operate on multiple image patches and spatial resolutions during translation.

Results: Several state-of-the-art models were extensively evaluated on two challenging surgical datasets. Using two different evaluation schemes, we assessed both the semantic consistency of the translated images and their usefulness for downstream semantic segmentation. The results demonstrate the effectiveness of ConStructS in minimizing semantic distortion, and images generated by this model show superior utility for downstream training.
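The abstract does not name the metrics behind these evaluation schemes. As an illustration only, one common way to quantify semantic consistency is per-class intersection-over-union (IoU) between the label map of the synthetic source image and a segmentation predicted on its translated counterpart. The sketch below assumes that setup; the function name and the toy label maps are hypothetical.

```python
def per_class_iou(reference, predicted, num_classes):
    """Per-class intersection-over-union between two equally sized 2D label maps.

    Classes that appear in neither map yield None instead of a score.
    """
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for ref_row, pred_row in zip(reference, predicted):
            for r, p in zip(ref_row, pred_row):
                in_ref = r == c
                in_pred = p == c
                inter += in_ref and in_pred
                union += in_ref or in_pred
        ious.append(inter / union if union else None)
    return ious


# Hypothetical example: labels of a synthetic image and the segmentation
# predicted on its translated version (assumed to come from a separate model).
source_labels = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 1],
]
predicted_on_translated = [
    [0, 0, 1],
    [0, 1, 2],
    [2, 2, 2],
]

ious = per_class_iou(source_labels, predicted_on_translated, 3)
valid = [v for v in ious if v is not None]
mean_iou = sum(valid) / len(valid)

print("per-class IoU:", ious)
print("mean IoU:", mean_iou)
```

In this toy case the background is preserved, while two pixels that were labeled as class 1 in the source are predicted as class 2 after translation, which lowers the IoU of both classes.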

Conclusion: In this study, we tackle semantic inconsistency in unpaired image translation for surgical applications with minimal labeled data. The simple model (ConStructS) enhances consistency during translation and serves as a practical way of generating fully labeled and semantically consistent datasets at minimal cost. Our code is available at https://gitlab.com/nct_tso_public/constructs.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11599420
DOI: http://dx.doi.org/10.1007/s11548-024-03079-1
