In the medical field, endoscopic video analysis is crucial for disease diagnosis and minimally invasive surgery. Endoscopic Foundation Models (Endo-FM) utilize large-scale self-supervised pre-training on endoscopic video data and leverage video transformer models to capture long-range spatiotemporal dependencies. However, detecting complex lesions such as gastrointestinal metaplasia (GIM) in endoscopic videos remains challenging because of their unclear boundaries and indistinct features, and Endo-FM has not demonstrated good performance on such lesions. To this end, we propose a full fine-tuning strategy with an Extended Learnable Offset Parameter (ELOP), which improves model performance by introducing learnable offset parameters in the input space. In addition, we propose a novel loss function that combines cross-entropy loss and focal loss through a weighted sum, enabling the model to focus on hard-to-classify samples during training. We validated ELOP on a private GIM dataset from a local grade-A tertiary hospital and on a public polyp detection dataset. Experimental results show that ELOP improves detection accuracy by 6.25% and 3.75%, respectively, compared to the original Endo-FM. In summary, ELOP provides an effective solution for detecting complex lesions in endoscopic videos, enabling more precise diagnoses.
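The weighted combination of cross-entropy and focal loss described in the abstract can be illustrated with a minimal sketch. The abstract gives neither the weighting scheme nor any hyperparameter values, so the convex form `alpha * CE + (1 - alpha) * FL`, the names `alpha` and `gamma`, and their defaults are assumptions made here for illustration only, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_loss(logits, target, alpha=0.5, gamma=2.0):
    """Weighted sum of cross-entropy and focal loss for a single sample.

    alpha (mixing weight) and gamma (focusing exponent) are hypothetical
    values chosen for this sketch; the abstract does not report them.
    """
    p_t = softmax(logits)[target]        # predicted probability of the true class
    ce = -math.log(p_t)                  # standard cross-entropy term
    focal = (1.0 - p_t) ** gamma * ce    # focal term: down-weights easy samples
    return alpha * ce + (1.0 - alpha) * focal


if __name__ == "__main__":
    easy = combined_loss([4.0, -4.0], target=0)   # confident and correct
    hard = combined_loss([-1.0, 1.0], target=0)   # confident and wrong
    print(f"easy sample loss: {easy:.6f}")
    print(f"hard sample loss: {hard:.6f}")
```

Because the focal term scales cross-entropy by `(1 - p_t) ** gamma`, well-classified samples contribute little to it, so the combined loss shifts relative weight toward hard samples while the plain cross-entropy term keeps a gradient signal on every sample.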
DOI: http://dx.doi.org/10.1088/2057-1976/adaec3
Biomed Phys Eng Express
January 2025
Shandong Normal University, Jinan, Jinan, Shandong, 250014, CHINA.
Biomed Phys Eng Express
January 2025
Shandong University, No. 72, Binhai Road, Jimo, Qingdao City, Shandong Province, Qingdao, 266200, CHINA.
U-Net is widely used in medical image segmentation due to its simple and flexible architecture design. To address the challenges of scale and complexity in medical tasks, several variants of U-Net have been proposed. In particular, methods based on Vision Transformer (ViT), represented by Swin UNETR, have gained widespread attention in recent years.
Phys Med Biol
January 2025
School of Software, Xi'an Jiaotong University, Xi'an City, Shaanxi Province 710049, People's Republic of China.
Deformable registration aims to achieve nonlinear alignment of image space by estimating a dense displacement field. It is commonly used as a preprocessing step in clinical and image analysis applications, such as surgical planning, diagnostic assistance, and surgical navigation. We aim to overcome these challenges: Deep learning-based registration methods often struggle with complex displacements and lack effective interaction between global and local feature information.
Sci Rep
November 2024
State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China.
Automatic and accurate tooth segmentation on 3D dental point clouds plays a pivotal role in computer-aided dentistry. Existing Transformer-based methods focus on aggregating local features, but fail to directly model global contexts due to memory limitations and high computational cost. In this paper, we propose a novel Transformer-based 3D tooth segmentation network, called PointRegion, which can process the entire point cloud at a low cost.
J Neural Eng
September 2024
Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing 100071, People's Republic of China.
The decline in the performance of electromyography (EMG)-based silent speech recognition is widely attributed to disparities in speech patterns, articulation habits, and individual physiology among speakers. Feature alignment by learning a discriminative network that resolves domain offsets across speakers is an effective method to address this problem.