Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection.

Sensors (Basel)

Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China.

Published: October 2023

Salient object detection (SOD), which identifies the most distinctive object in a given scene, plays an important role in computer vision tasks. Most existing RGB-D SOD methods employ a CNN-based network as the backbone to extract features from RGB and depth images; however, the inherent locality of CNNs limits their performance. To tackle this issue, we propose a novel Swin Transformer-based edge guidance network (SwinEGNet) for RGB-D SOD, in which the Swin Transformer serves as a powerful feature extractor to capture the global context, and an edge-guided cross-modal interaction module effectively enhances and fuses features. Specifically, we employ the Swin Transformer as the backbone to extract features from RGB images and depth maps. We then introduce an edge extraction module (EEM) to extract edge features and a depth enhancement module (DEM) to enhance depth features, and a cross-modal interaction module (CIM) integrates cross-modal features from global and local contexts. Finally, a cascaded decoder refines the prediction map in a coarse-to-fine manner. Extensive experiments demonstrate that SwinEGNet achieves the best performance on the LFSD, NLPR, DES, and NJU2K datasets and comparable performance on the STEREO dataset against 14 state-of-the-art methods. Compared to SwinNet, our model achieves better performance with only 88.4% of the parameters and 77.2% of the FLOPs. Our code will be publicly available.
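
The abstract describes the overall data flow but not the module internals. Below is a minimal PyTorch sketch of that flow (two Swin backbones, an edge branch, per-stage depth enhancement and cross-modal fusion, and a cascaded coarse-to-fine decoder). The EEM/DEM/CIM stand-ins, the timm model name, and the use of features_only are illustrative assumptions, not the authors' implementation; it assumes a recent timm release where Swin backbones support features_only and return NHWC feature maps.

# Minimal sketch of the two-stream pipeline described in the abstract (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm  # assumption: recent timm with features_only support for Swin

class SwinEGNetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Two Swin Transformer backbones: one for RGB, one for the depth map.
        self.rgb_backbone = timm.create_model(
            "swin_base_patch4_window7_224", pretrained=False, features_only=True)
        self.depth_backbone = timm.create_model(
            "swin_base_patch4_window7_224", pretrained=False, features_only=True, in_chans=1)
        chs = self.rgb_backbone.feature_info.channels()  # per-stage channel widths
        # Placeholder stand-ins for EEM / DEM / CIM; 1x1 convs keep the sketch runnable.
        self.eem = nn.Conv2d(chs[0], 1, 1)                               # edge map from shallow RGB features
        self.dem = nn.ModuleList(nn.Conv2d(c, c, 1) for c in chs)        # depth enhancement per stage
        self.cim = nn.ModuleList(nn.Conv2d(2 * c, c, 1) for c in chs)    # cross-modal fusion per stage
        self.heads = nn.ModuleList(nn.Conv2d(c, 1, 1) for c in chs)      # coarse-to-fine predictions

    def forward(self, rgb, depth):
        rgb_feats = self.rgb_backbone(rgb)       # multi-scale RGB features
        dep_feats = self.depth_backbone(depth)   # multi-scale depth features
        # timm Swin feature maps are NHWC; convert to NCHW for the conv layers.
        rgb_feats = [f.permute(0, 3, 1, 2) for f in rgb_feats]
        dep_feats = [f.permute(0, 3, 1, 2) for f in dep_feats]
        edge = torch.sigmoid(self.eem(rgb_feats[0]))          # edge guidance from the shallowest stage
        preds, prev = [], None
        for i in reversed(range(len(rgb_feats))):              # cascaded decoding, coarse to fine
            d = self.dem[i](dep_feats[i])
            fused = self.cim[i](torch.cat([rgb_feats[i], d], dim=1))
            if i == 0:
                fused = fused * (1 + edge)                      # edge-guided enhancement at the finest stage
            if prev is not None:
                prev = F.interpolate(prev, size=fused.shape[-2:], mode="bilinear", align_corners=False)
                fused = fused + prev
            prev = fused
            preds.append(self.heads[i](fused))
        return preds[-1], edge  # finest saliency prediction plus the edge map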

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10650861
DOI: http://dx.doi.org/10.3390/s23218802

Publication Analysis

Top Keywords

swin transformer-based (8); transformer-based edge (8); edge guidance (8); guidance network (8); salient object (8); object detection (8); rgb-d sod (8); cnn-based network (8); backbone extract (8); extract features (8)

Similar Publications

Optimizing Transformer-Based Network via Advanced Decoder Design for Medical Image Segmentation.

Biomed Phys Eng Express

January 2025

Shandong University, No. 72 Binhai Road, Jimo, Qingdao 266200, Shandong Province, China.

U-Net is widely used in medical image segmentation due to its simple and flexible architecture design. To address the challenges of scale and complexity in medical tasks, several variants of U-Net have been proposed. In particular, methods based on Vision Transformer (ViT), represented by Swin UNETR, have gained widespread attention in recent years.

Background: Ultrasound imaging is pivotal for point-of-care, non-invasive diagnosis of musculoskeletal (MSK) injuries. Notably, MSK ultrasound demands a higher level of operator expertise than general ultrasound procedures, necessitating thorough checks on image quality and precise categorization of each image. This need for skilled assessment highlights the importance of developing supportive tools for quality control and categorization in clinical settings.

Abdominal synthetic CT generation for MR-only radiotherapy using structure-conserving loss and transformer-based cycle-GAN.

Front Oncol

January 2025

Department of Radiation Oncology, Yonsei Cancer Center, Heavy Ion Therapy Research Institute, Yonsei University College of Medicine, Seoul, Republic of Korea.

Purpose: Recent deep learning-based synthetic computed tomography (sCT) generation from magnetic resonance (MR) images has shown promising results. However, generating sCT for the abdominal region poses challenges due to patient motion, including respiration and peristalsis. To address these challenges, this study investigated an unsupervised learning approach using a transformer-based cycle-GAN with structure-preserving loss for abdominal cancer patients.
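
The snippet names the ingredients (a cycle-GAN plus a structure-conserving loss) but not the loss form. One common, illustrative choice is a gradient-consistency term between the input MR and the generated sCT; the sketch below shows such a term added to a standard cycle-GAN generator objective. The function names, weights, and loss form are assumptions for illustration, not the paper's method.

# Hypothetical structure-preserving term for a cycle-GAN generator objective.
import torch
import torch.nn.functional as F

def gradient_maps(x):
    # Finite-difference image gradients along width and height (NCHW tensors).
    dx = x[..., :, 1:] - x[..., :, :-1]
    dy = x[..., 1:, :] - x[..., :-1, :]
    return dx, dy

def structure_loss(mr, sct):
    # Penalize differences between the gradient structure of the MR input and
    # the synthetic CT, encouraging anatomical edges to be preserved.
    mr_dx, mr_dy = gradient_maps(mr)
    sct_dx, sct_dy = gradient_maps(sct)
    return F.l1_loss(sct_dx, mr_dx) + F.l1_loss(sct_dy, mr_dy)

def generator_loss(mr, sct, mr_recon, disc_score, lambda_cyc=10.0, lambda_struct=1.0):
    adv = F.mse_loss(disc_score, torch.ones_like(disc_score))  # LSGAN-style adversarial term
    cyc = F.l1_loss(mr_recon, mr)                              # cycle-consistency term
    struct = structure_loss(mr, sct)                           # structure-preserving term
    return adv + lambda_cyc * cyc + lambda_struct * struct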

Transformer-based neural speech decoding from surface and depth electrode signals.

J Neural Eng

January 2025

Electrical and Computer Engineering Department, New York University, 370 Jay Street, Brooklyn, NY 11201, United States of America.

This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior works can only work with electrodes on a 2D grid (i.e.

Megavoltage computed tomography (MVCT) plays a crucial role in patient positioning and dose reconstruction during tomotherapy. However, due to the limited scan field of view (sFOV), the entire cross-section of certain patients may not be fully covered, resulting in projection data truncation. Truncation artifacts in MVCT can compromise registration accuracy with the planned kilovoltage computed tomography (KVCT) and hinder subsequent MVCT-based adaptive planning.
