Salient object detection (SOD), which aims to identify the most distinctive object in a given scene, plays an important role in computer vision tasks. Most existing RGB-D SOD methods employ a CNN-based network as the backbone to extract features from RGB and depth images; however, the inherent locality of CNNs limits their performance. To tackle this issue, we propose a novel Swin Transformer-based edge guidance network (SwinEGNet) for RGB-D SOD, in which the Swin Transformer serves as a powerful feature extractor to capture the global context and an edge-guided cross-modal interaction module effectively enhances and fuses features. Specifically, we employ the Swin Transformer as the backbone to extract features from RGB images and depth maps. We then introduce an edge extraction module (EEM) to extract edge features and a depth enhancement module (DEM) to enhance depth features. Additionally, a cross-modal interaction module (CIM) integrates cross-modal features from global and local contexts. Finally, a cascaded decoder refines the prediction map in a coarse-to-fine manner. Extensive experiments demonstrate that, compared to 14 state-of-the-art methods, our SwinEGNet achieves the best performance on the LFSD, NLPR, DES, and NJU2K datasets and comparable performance on the STEREO dataset. Our model also outperforms SwinNet while using only 88.4% of its parameters and 77.2% of its FLOPs. Our code will be publicly available.
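The edge-guided fusion pipeline described in the abstract (EEM, DEM, CIM) can be sketched in miniature. The NumPy sketch below is a hypothetical illustration, not the paper's implementation: Sobel gradients stand in for the edge extraction module, a sigmoid edge gate for the depth enhancement module, and elementwise addition for the cross-modal interaction module.

```python
import numpy as np

def sobel_edges(feat):
    """Approximate edge map of a 2D feature map via Sobel gradients
    (a simple stand-in for the paper's edge extraction module, EEM)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = feat.shape
    pad = np.pad(feat, 1, mode="edge")
    gx = np.zeros_like(feat)
    gy = np.zeros_like(feat)
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return np.hypot(gx, gy)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(rgb_feat, depth_feat):
    """Edge-guided cross-modal fusion sketch: gate the depth features with
    an RGB-derived edge attention map (DEM-like), then combine both
    modalities by elementwise addition (CIM-like). All choices here are
    illustrative assumptions, not the published architecture."""
    edge = sigmoid(sobel_edges(rgb_feat))   # edge attention in (0, 1)
    depth_enh = depth_feat * (1.0 + edge)   # depth enhancement
    return rgb_feat + depth_enh             # cross-modal combination

rgb = np.random.rand(8, 8)
depth = np.random.rand(8, 8)
out = fuse(rgb, depth)
print(out.shape)  # (8, 8)
```

In the actual network these operations act on multi-scale Swin Transformer features and use learned convolutions rather than fixed kernels; the sketch only conveys how an edge map can gate depth features before fusion.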
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10650861
DOI: http://dx.doi.org/10.3390/s23218802
Biomed Phys Eng Express
January 2025
Shandong University, No. 72, Binhai Road, Jimo, Qingdao 266200, Shandong Province, China.
U-Net is widely used in medical image segmentation due to its simple and flexible architecture design. To address the challenges of scale and complexity in medical tasks, several variants of U-Net have been proposed. In particular, methods based on Vision Transformer (ViT), represented by Swin UNETR, have gained widespread attention in recent years.
Quant Imaging Med Surg
January 2025
Department of Medical Ultrasound, West China Hospital of Sichuan University, Chengdu, China.
Background: Ultrasound imaging is pivotal for point-of-care, non-invasive diagnosis of musculoskeletal (MSK) injuries. Notably, MSK ultrasound demands a higher level of operator expertise than general ultrasound procedures, necessitating thorough checks on image quality and precise categorization of each image. This need for skilled assessment highlights the importance of developing supportive tools for quality control and categorization in clinical settings.
Front Oncol
January 2025
Department of Radiation Oncology, Yonsei Cancer Center, Heavy Ion Therapy Research Institute, Yonsei University College of Medicine, Seoul, Republic of Korea.
Purpose: Recent deep-learning-based synthetic computed tomography (sCT) generation using magnetic resonance (MR) images has shown promising results. However, generating sCT for the abdominal region poses challenges due to patient motion, including respiration and peristalsis. To address these challenges, this study investigated an unsupervised learning approach using a transformer-based cycle-GAN with a structure-preserving loss for abdominal cancer patients.
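A structure-preserving loss of the kind described can be sketched as a gradient-difference penalty added to the usual cycle-consistency term. The gradient-based structure term, the omitted adversarial term, and the weights `lam_cyc` and `lam_struct` below are illustrative assumptions, not the study's actual objective.

```python
import numpy as np

def grad_maps(img):
    """Finite-difference gradients, used here as a simple structure descriptor."""
    gx = np.diff(img, axis=1)
    gy = np.diff(img, axis=0)
    return gx, gy

def cycle_loss(x, x_rec):
    """Standard L1 cycle-consistency term of a cycle-GAN:
    x_rec = G_B(G_A(x)) should reconstruct the input x."""
    return np.abs(x - x_rec).mean()

def structure_loss(x, y):
    """Hypothetical structure-preserving term: L1 distance between the
    gradient maps of the input MR image and the generated sCT, penalizing
    anatomical-boundary drift caused by motion."""
    gx1, gy1 = grad_maps(x)
    gx2, gy2 = grad_maps(y)
    return np.abs(gx1 - gx2).mean() + np.abs(gy1 - gy2).mean()

def total_loss(x, x_rec, y_fake, lam_cyc=10.0, lam_struct=1.0):
    """Generator objective sketch; the adversarial term is omitted for brevity."""
    return lam_cyc * cycle_loss(x, x_rec) + lam_struct * structure_loss(x, y_fake)

x = np.random.rand(16, 16)
loss = total_loss(x, x, x)  # perfect reconstruction and structure -> zero loss
print(loss)  # 0.0
```

The structure term is what distinguishes this objective from a plain cycle-GAN: even when intensities differ between MR and sCT domains, matching gradient maps keeps organ boundaries aligned.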
J Neural Eng
January 2025
Electrical and Computer Engineering Department, New York University, 370 Jay Street, Brooklyn, NY 11201, United States of America.
This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior works can only work with electrodes arranged on a 2D grid.
Sensors (Basel)
December 2024
School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China.
Megavoltage computed tomography (MVCT) plays a crucial role in patient positioning and dose reconstruction during tomotherapy. However, due to the limited scan field of view (sFOV), the entire cross-section of certain patients may not be fully covered, resulting in projection data truncation. Truncation artifacts in MVCT can compromise registration accuracy with the planned kilovoltage computed tomography (KVCT) and hinder subsequent MVCT-based adaptive planning.