Background: Precise glioma segmentation from multi-parametric magnetic resonance (MR) images is essential for brain glioma diagnosis. However, the indistinct boundaries between tumor sub-regions and the heterogeneous appearance of gliomas in volumetric MR scans make the design of a reliable, automated glioma segmentation method challenging. Although existing 3D Transformer-based or convolution-based segmentation networks have obtained promising results via multi-modal feature fusion strategies or contextual learning methods, they generally lack hierarchical interactions between modalities and cannot effectively learn comprehensive feature representations covering all glioma sub-regions.
Purpose: To overcome these problems, we propose a 3D hierarchical cross-modality interaction network (HCMINet) using Transformers and convolutions for accurate multi-modal glioma segmentation. The network leverages an effective hierarchical cross-modality interaction strategy to fully learn modality-specific and modality-shared knowledge relevant to glioma sub-region segmentation from multi-parametric MR images.
Methods: In the HCMINet, we first design a hierarchical cross-modality interaction Transformer (HCMITrans) encoder that hierarchically encodes and fuses heterogeneous multi-modal features through Transformer-based intra-modal embeddings and inter-modal interactions across multiple encoding stages, effectively capturing complex cross-modality correlations while modeling global contexts. Then, we pair the HCMITrans encoder with a modality-shared convolutional encoder to form a dual-encoder architecture that learns rich contextual information from both global and local perspectives. Finally, in the decoding stage, we present a progressive hybrid context fusion (PHCF) decoder that progressively fuses the local and global features extracted by the dual-encoder architecture, using a local-global context fusion (LGCF) module to efficiently alleviate the contextual discrepancy among the decoding features.
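To make the architectural ideas concrete, the following is a minimal, illustrative PyTorch sketch of two of the concepts described above: per-modality self-attention followed by inter-modal cross-attention, and a simple gated fusion of local (convolutional) and global (Transformer) features. All module names, shapes, and hyper-parameters here are hypothetical and do not reproduce the authors' published implementation.

```python
# Illustrative sketch only; not the authors' HCMINet implementation.
import torch
import torch.nn as nn


class CrossModalityInteraction(nn.Module):
    """One hypothetical interaction stage: intra-modal self-attention per
    modality, then inter-modal cross-attention in which each modality
    attends to the concatenated tokens of the remaining modalities."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, tokens_per_modality):
        # tokens_per_modality: list of (B, N, C) token tensors, one per MR modality.
        refined = []
        for i, x in enumerate(tokens_per_modality):
            # Intra-modal embedding: self-attention within one modality.
            q = self.norm1(x)
            x = x + self.self_attn(q, q, q)[0]
            # Inter-modal interaction: attend to tokens of the other modalities.
            others = torch.cat(
                [t for j, t in enumerate(tokens_per_modality) if j != i], dim=1
            )
            x = x + self.cross_attn(self.norm2(x), others, others)[0]
            refined.append(x)
        return refined


class LocalGlobalFusion(nn.Module):
    """Hypothetical fusion of local (convolutional) and global (Transformer)
    features at one decoding resolution."""

    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv3d(2 * channels, channels, kernel_size=1)
        self.gate = nn.Sequential(nn.Conv3d(channels, channels, 1), nn.Sigmoid())

    def forward(self, local_feat, global_feat):
        fused = self.proj(torch.cat([local_feat, global_feat], dim=1))
        # Gated residual keeps local detail while injecting global context.
        return local_feat + self.gate(fused) * fused


if __name__ == "__main__":
    # Four MR modalities (e.g., T1, T1ce, T2, FLAIR), 64 tokens of width 96 each.
    tokens = [torch.randn(2, 64, 96) for _ in range(4)]
    out = CrossModalityInteraction(dim=96)(tokens)
    print([o.shape for o in out])  # 4 x (2, 64, 96)

    lgcf = LocalGlobalFusion(channels=32)
    local = torch.randn(2, 32, 16, 16, 16)
    glob = torch.randn(2, 32, 16, 16, 16)
    print(lgcf(local, glob).shape)  # (2, 32, 16, 16, 16)
```

In this sketch the cross-attention step is what allows hierarchical, stage-wise interaction between modalities, while the gated fusion step stands in for reconciling the contextual discrepancy between convolutional and Transformer features in the decoder.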
Results: Extensive experiments are conducted on two public and competitive glioma benchmark datasets: the BraTS2020 dataset with 494 patients and the BraTS2021 dataset with 1251 patients. Results show that the proposed method outperforms existing Transformer-based and CNN-based methods that use other multi-modal fusion strategies. Specifically, the proposed HCMINet achieves state-of-the-art mean DSC values of 85.33% and 91.09% on the BraTS2020 online validation dataset and the BraTS2021 local testing dataset, respectively.
Conclusions: Our proposed method can accurately and automatically segment glioma regions from multi-parametric MR images, which benefits the quantitative analysis of brain gliomas and helps reduce the annotation burden on neuroradiologists.
DOI: 10.1002/mp.17354