Full fine-tuning strategy for endoscopic foundation models with expanded learnable offset parameters.

Minghan Dong, Xiangwei Zheng, Xia Zhang, Xingyu Zhang, Mingzhe Zhang

Biomed Phys Eng Express

Shandong Normal University, Jinan, Shandong, 250014, China.

Published: January 2025

In the medical field, endoscopic video analysis is crucial for disease diagnosis and minimally invasive surgery. The Endoscopic Foundation Model (Endo-FM) uses large-scale self-supervised pre-training on endoscopic video data and leverages a video transformer to capture long-range spatiotemporal dependencies. However, detecting complex lesions such as gastrointestinal metaplasia (GIM) in endoscopic videos remains challenging because of unclear boundaries and indistinct features, and Endo-FM has not performed well on this task. To address this, we propose a full fine-tuning strategy with an Extended Learnable Offset Parameter (ELOP), which improves model performance by introducing learnable offset parameters in the input space. We also propose a novel loss function that combines cross-entropy loss and focal loss through a weighted sum, enabling the model to focus on hard-to-classify samples during training. We validated ELOP on a private GIM dataset from a local grade-A tertiary hospital and on a public polyp detection dataset. Experimental results show that ELOP significantly improves detection accuracy, with gains of 6.25% and 3.75%, respectively, over the original Endo-FM. In summary, ELOP offers an effective approach to detecting complex lesions in endoscopic videos and supports more precise diagnosis.
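The weighted combination of cross-entropy and focal loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption for illustration: the form `weight * CE + (1 - weight) * FL`, the default `weight=0.5`, the focusing parameter `gamma=2.0`, and all function names are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one sample: -log p_t."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log p_t.

    The factor (1 - p_t)^gamma shrinks the loss of well-classified
    samples, so hard samples dominate the total.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def combined_loss(logits, target, weight=0.5, gamma=2.0):
    """Weighted sum of cross-entropy and focal loss.

    ASSUMPTION: the convex form weight * CE + (1 - weight) * FL and the
    default values are illustrative; the paper's weighting may differ.
    """
    ce = cross_entropy(logits, target)
    fl = focal_loss(logits, target, gamma)
    return weight * ce + (1.0 - weight) * fl


if __name__ == "__main__":
    easy = combined_loss([4.0, 0.0], target=0)  # confident, correct
    hard = combined_loss([4.0, 0.0], target=1)  # confident, wrong
    print(f"easy sample: {easy:.4f}, hard sample: {hard:.4f}")
```

With `weight=1.0` the function reduces to plain cross-entropy, and with `gamma=0.0` the focal term equals cross-entropy, so the two hyperparameters control how strongly hard samples are emphasized.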

DOI: http://dx.doi.org/10.1088/2057-1976/adaec3

Similar Publications

Optimizing Transformer-Based Network via Advanced Decoder Design for Medical Image Segmentation.

Biomed Phys Eng Express

January 2025

Shandong University, No. 72, Binhai Road, Jimo, Qingdao City, Shandong Province, Qingdao, 266200, CHINA.

Weibin Yang, Zhiqi Dong, Mingyuan Xu, Longwei Xu, Dehua Geng

U-Net is widely used in medical image segmentation due to its simple and flexible architecture design. To address the challenges of scale and complexity in medical tasks, several variants of U-Net have been proposed. In particular, methods based on Vision Transformer (ViT), represented by Swin UNETR, have gained widespread attention in recent years.

GMmorph: dynamic spatial matching registration model for 3D medical image based on gated Mamba.

Phys Med Biol

January 2025

School of Software, Xi'an Jiaotong University, Xi'an City, Shaanxi Province 710049, People's Republic of China.

Hao Lin, Yonghong Song, Qi Zhang

Deformable registration aims to achieve nonlinear alignment of image space by estimating a dense displacement field. It is commonly used as a preprocessing step in clinical and image analysis applications, such as surgical planning, diagnostic assistance, and surgical navigation. We aim to overcome these challenges: Deep learning-based registration methods often struggle with complex displacements and lack effective interaction between global and local feature information.

Transformer based 3D tooth segmentation via point cloud region partition.

Sci Rep

November 2024

State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China.

You Wu, Hongping Yan, Kun Ding

Automatic and accurate tooth segmentation on 3D dental point clouds plays a pivotal role in computer-aided dentistry. Existing Transformer-based methods focus on aggregating local features, but fail to directly model global contexts due to memory limitations and high computational cost. In this paper, we propose a novel Transformer-based 3D tooth segmentation network, called PointRegion, which can process the entire point cloud at a low cost.

A simplified adversarial architecture for cross-subject silent speech recognition using electromyography.

J Neural Eng

September 2024

Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing 100071, People's Republic of China.

Qiang Cui, Xingyu Zhang, Yakun Zhang, Changyan Zheng, Liang Xie

The decline in the performance of electromyography (EMG)-based silent speech recognition is widely attributed to disparities in speech patterns, articulation habits, and individual physiology among speakers. Feature alignment by learning a discriminative network that resolves domain offsets across speakers is an effective method to address this problem.
