Multi-scale dual-channel feature embedding decoder for biomedical image segmentation.

Comput Methods Programs Biomed

Department of Computer Science and Engineering, National Institute of Technology, Durgapur 713209, West Bengal, India.

Published: December 2024

AI Article Synopsis

  • Effective segmentation of biomedical images requires balancing global context with local detail, but existing transformer models often struggle with accuracy and computational cost.
  • The paper proposes a segmentation model with two parallel convolutional encoders and a multi-scale dual-channel decoder built from hierarchical Attention-gated Swin Transformers, which strengthens feature extraction while keeping computational demands in check.
  • Evaluation on public and private datasets shows that the model surpasses existing models in liver tumor and spleen segmentation accuracy at a manageable computational cost.

Article Abstract

Background And Objective: Capturing global context along with local dependencies is of paramount importance for highly accurate segmentation of objects from image frames, and it remains challenging when developing deep learning-based biomedical image segmentation models. Several transformer-based models have been proposed to address this issue. Despite this, segmentation accuracy remains an ongoing challenge, as these models often fall short of the target range due to their limited capacity to capture critical local and global contexts. In addition, their quadratic computational complexity is a major limitation, and a large dataset is required to train them.
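For context (these are the standard complexity figures for Swin-style attention, not numbers reported in this paper): global multi-head self-attention over an h x w feature map with C channels scales quadratically with the number of tokens, whereas restricting attention to non-overlapping M x M windows makes the cost linear in hw:

    \Omega(\mathrm{MSA}) = 4\,hwC^{2} + 2\,(hw)^{2}C,
    \qquad
    \Omega(\mathrm{W\text{-}MSA}) = 4\,hwC^{2} + 2\,M^{2}hwC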

Methods: In this paper, we propose a novel multi-scale dual-channel decoder to mitigate these issues. The complete segmentation model uses two parallel encoders and a dual-channel decoder. The encoders are convolutional networks that capture features of the input images at multiple levels and scales. The decoder comprises a hierarchy of Attention-gated Swin Transformers followed by a fine-tuning stage. The hierarchical Attention-gated Swin Transformers implement a multi-scale, multi-level feature embedding strategy that captures short- and long-range dependencies and leverages the necessary features without increasing the computational load. At the final stage of the decoder, a fine-tuning strategy refines the features to retain rich features and reduce the possibility of over-segmentation.
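As a rough illustration of the kind of block the Methods paragraph describes (a hedged PyTorch sketch, not the authors' released code; the module names, channel counts, window size, and head count are assumptions), the snippet below gates the skip features coming from two parallel encoders with the decoder signal and then applies window-restricted self-attention:

    # Hedged sketch of a dual-channel, attention-gated decoder block (illustrative only).
    import torch
    import torch.nn as nn

    class AttentionGate(nn.Module):
        """Attention-U-Net-style gate: the decoder signal g weights the encoder skip x."""
        def __init__(self, g_ch, x_ch, inter_ch):
            super().__init__()
            self.wg = nn.Conv2d(g_ch, inter_ch, kernel_size=1)
            self.wx = nn.Conv2d(x_ch, inter_ch, kernel_size=1)
            self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1), nn.Sigmoid())

        def forward(self, g, x):
            attn = self.psi(torch.relu(self.wg(g) + self.wx(x)))  # (B, 1, H, W) in [0, 1]
            return x * attn                                       # suppress irrelevant skip features

    class DualChannelDecoderBlock(nn.Module):
        """Fuses skips from two encoders, gates them, then applies windowed self-attention."""
        def __init__(self, ch, window=7, heads=4):
            super().__init__()
            self.gate = AttentionGate(g_ch=ch, x_ch=2 * ch, inter_ch=ch)
            self.proj = nn.Conv2d(2 * ch, ch, kernel_size=1)
            self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
            self.window = window

        def forward(self, dec, skip_a, skip_b):
            skips = torch.cat([skip_a, skip_b], dim=1)        # channel-wise fusion of both encoders
            fused = self.proj(self.gate(dec, skips)) + dec    # gated skips added to decoder stream
            # Window-restricted attention: attend only inside non-overlapping M x M windows,
            # so the cost stays linear in the number of tokens (the Swin idea).
            # H and W are assumed divisible by the window size.
            b, c, h, w = fused.shape
            m = self.window
            win = fused.unfold(2, m, m).unfold(3, m, m)                # (B, C, H/m, W/m, m, m)
            win = win.permute(0, 2, 3, 4, 5, 1).reshape(-1, m * m, c)  # (B*windows, m*m, C)
            out, _ = self.attn(win, win, win)
            out = (out.reshape(b, h // m, w // m, m, m, c)
                      .permute(0, 5, 1, 3, 2, 4)
                      .reshape(b, c, h, w))
            return out

    if __name__ == "__main__":
        # Hypothetical feature level: 64 channels at 28 x 28 from the decoder and both encoders.
        block = DualChannelDecoderBlock(ch=64, window=7, heads=4)
        dec = torch.randn(1, 64, 28, 28)
        out = block(dec, torch.randn(1, 64, 28, 28), torch.randn(1, 64, 28, 28))
        print(out.shape)  # torch.Size([1, 64, 28, 28])

The gate here follows the usual additive attention-gate formulation, and the window partitioning stands in for a full Swin block (which would also use relative position bias and shifted windows); both choices are assumptions made to keep the sketch short.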

Results: The proposed model is evaluated on the publicly available LiTS and 3DIRCADb datasets and on the spleen dataset from the Medical Segmentation Decathlon. The model is also evaluated on a private dataset from Medical College Kolkata, India. We observe that the proposed model outperforms state-of-the-art models in liver tumor and spleen segmentation in terms of the evaluation metrics, at a comparable computational cost.

Conclusion: The novel dual-channel decoder embeds multi-scale features and efficiently represents both short- and long-range contexts. It also refines the features at the final stage so that only the necessary features are retained. As a result, the model achieves better segmentation performance than the state-of-the-art models.


Source
http://dx.doi.org/10.1016/j.cmpb.2024.108464

Publication Analysis

Top Keywords

biomedical image (12)
image segmentation (12)
dual-channel decoder (12)
segmentation (9)
multi-scale dual-channel (8)
feature embedding (8)
attention-gated swin (8)
swin transformers (8)
fine-tuning strategy (8)
short long-range (8)
