In the medical field, endoscopic video analysis is crucial for disease diagnosis and minimally invasive surgery. The Endoscopic Foundation Models (Endo-FM) use large-scale self-supervised pre-training on endoscopic video data and leverage video transformer models to capture long-range spatiotemporal dependencies. However, detecting complex lesions such as gastrointestinal metaplasia (GIM) in endoscopic videos remains challenging because of unclear boundaries and indistinct features, and Endo-FM has not demonstrated good performance on them. To this end, we propose a full fine-tuning strategy with an Extended Learnable Offset Parameter (ELOP), which improves model performance by introducing learnable offset parameters in the input space. Specifically, we propose a novel loss function that combines cross-entropy loss and focal loss through a weighted sum, enabling the model to better focus on hard-to-classify samples during training. We validated ELOP on a private GIM dataset from a local grade-A tertiary hospital and on a public polyp detection dataset. Experimental results show that ELOP improves detection accuracy by 6.25% and 3.75%, respectively, compared to the original Endo-FM. In summary, ELOP provides an effective solution for detecting complex lesions in endoscopic videos, supporting more precise diagnoses.
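The weighted combination of cross-entropy and focal loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the convex form `weight * CE + (1 - weight) * FL`, the default `weight=0.5`, the focusing parameter `gamma=2.0`, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Standard cross-entropy for a single sample: -log(p_t)."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for a single sample: -(1 - p_t)^gamma * log(p_t).

    The (1 - p_t)^gamma factor shrinks the loss of well-classified
    samples, so hard samples dominate the training signal.
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def combined_loss(logits, target, weight=0.5, gamma=2.0):
    """Weighted sum of cross-entropy and focal loss.

    ASSUMPTION: the convex combination below and its default weight are
    hypothetical; the abstract only states that a weighted sum is used.
    """
    probs = softmax(logits)
    ce = cross_entropy(probs, target)
    fl = focal_loss(probs, target, gamma)
    return weight * ce + (1.0 - weight) * fl


if __name__ == "__main__":
    easy = combined_loss([4.0, 0.0], target=0)  # confidently correct sample
    hard = combined_loss([0.0, 4.0], target=0)  # confidently wrong sample
    print(f"easy sample loss: {easy:.4f}")
    print(f"hard sample loss: {hard:.4f}")
```

Under these assumptions, setting `weight=1.0` recovers plain cross-entropy, and lowering it shifts more of the training signal toward hard-to-classify samples.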

Source
http://dx.doi.org/10.1088/2057-1976/adaec3

Similar Publications

Optimizing Transformer-Based Network via Advanced Decoder Design for Medical Image Segmentation.

Biomed Phys Eng Express

January 2025

Shandong University, No. 72, Binhai Road, Jimo, Qingdao City, Shandong Province, Qingdao, 266200, CHINA.

U-Net is widely used in medical image segmentation due to its simple and flexible architecture design. To address the challenges of scale and complexity in medical tasks, several variants of U-Net have been proposed. In particular, methods based on Vision Transformer (ViT), represented by Swin UNETR, have gained widespread attention in recent years.

GMmorph: dynamic spatial matching registration model for 3D medical image based on gated Mamba.

Phys Med Biol

January 2025

School of Software, Xi'an Jiaotong University, Xi'an City, Shaanxi Province 710049, People's Republic of China.

Deformable registration aims to achieve nonlinear alignment of image space by estimating a dense displacement field. It is commonly used as a preprocessing step in clinical and image analysis applications, such as surgical planning, diagnostic assistance, and surgical navigation. We aim to overcome these challenges: Deep learning-based registration methods often struggle with complex displacements and lack effective interaction between global and local feature information.

Transformer based 3D tooth segmentation via point cloud region partition.

Sci Rep

November 2024

State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China.

Automatic and accurate tooth segmentation on 3D dental point clouds plays a pivotal role in computer-aided dentistry. Existing Transformer-based methods focus on aggregating local features, but fail to directly model global contexts due to memory limitations and high computational cost. In this paper, we propose a novel Transformer-based 3D tooth segmentation network, called PointRegion, which can process the entire point cloud at a low cost.

The decline in the performance of electromyography (EMG)-based silent speech recognition is widely attributed to disparities in speech patterns, articulation habits, and individual physiology among speakers. Feature alignment by learning a discriminative network that resolves domain offsets across speakers is an effective method to address this problem.
