Background And Objective: Medical image segmentation has improved significantly in recent years with the progress of Convolutional Neural Networks (CNNs). Due to the inherent locality of convolutional operations, however, CNNs struggle to learn global context and the correlations among long-range features. Existing workarounds rely on deep encoders and down-sampling operations, but such designs tend to produce redundant network structures and lose local detail. Medical image segmentation therefore requires solutions that improve global context modeling while maintaining a strong grasp of low-level details.
Methods: We propose a novel multiscale parallel branch architecture (MP-FocalUNet). On the encoder side of MP-FocalUNet, dual-scale sub-networks extract information at different scales, and a cross-scale "Feature Fusion" (FF) module is proposed to exploit the potential of the dual-branch network and fully utilize feature representations at different scales. On the decoder side, focal self-attention operates in parallel with a traditional CNN to perform long-distance modeling, effectively capturing global dependencies and underlying spatial details at a shallower depth.
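To make the dual-scale encoder idea concrete, below is a minimal PyTorch sketch of two sub-networks processing the input at different resolutions, merged by a cross-scale fusion step. This is an illustration only, not the authors' implementation: the class names (`DualScaleEncoder`, `FeatureFusion`), the channel widths, and the choice of bilinear upsampling followed by a 1x1 convolution for fusion are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureFusion(nn.Module):
    """Hypothetical cross-scale fusion: upsample the coarse-branch map,
    concatenate it with the fine-branch map, and mix with a 1x1 conv."""

    def __init__(self, fine_ch, coarse_ch, out_ch):
        super().__init__()
        self.proj = nn.Conv2d(fine_ch + coarse_ch, out_ch, kernel_size=1)

    def forward(self, fine, coarse):
        # Bring the coarse-scale features up to the fine branch's resolution.
        coarse_up = F.interpolate(coarse, size=fine.shape[-2:],
                                  mode="bilinear", align_corners=False)
        return self.proj(torch.cat([fine, coarse_up], dim=1))


class DualScaleEncoder(nn.Module):
    """Two sub-networks consume the input at full and half resolution;
    their outputs are merged by the fusion block above."""

    def __init__(self, in_ch=1, width=32):
        super().__init__()
        self.fine = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True))
        self.coarse = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True))
        self.fuse = FeatureFusion(width, width, width)

    def forward(self, x):
        f = self.fine(x)                      # full-resolution branch
        c = self.coarse(F.avg_pool2d(x, 2))   # half-resolution branch
        return self.fuse(f, c)


if __name__ == "__main__":
    enc = DualScaleEncoder()
    out = enc(torch.randn(1, 1, 224, 224))
    print(out.shape)  # torch.Size([1, 32, 224, 224])
```

The sketch mirrors the abstract's description only at a schematic level; the paper's actual FF module and the focal self-attention decoder branch are more elaborate.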
Results: Our proposed method is evaluated on abdominal organ segmentation and Automatic Cardiac Diagnosis Challenge (ACDC) datasets. It consistently outperforms several state-of-the-art segmentation methods, achieving average Dice scores of 82.45% (2.68% higher than HC-Net) on the abdominal organ datasets and 91.44% (0.35% higher than HC-Net) on the ACDC datasets.
Conclusions: Our MP-FocalUNet is a novel encoder-decoder, multiscale parallel branch Transformer network that addresses the insufficient long-distance modeling of CNNs and fuses image information across scales. Extensive experiments on abdominal and cardiac medical image segmentation tasks show that MP-FocalUNet outperforms other state-of-the-art methods. Future work will focus on designing more lightweight Transformer-based models and on better learning the pixel-level intrinsic structural features produced by patch division in visual Transformers.
DOI: http://dx.doi.org/10.1016/j.cmpb.2024.108562