RODFormer: High-Precision Design for Rotating Object Detection with Transformers.

Sensors (Basel)

Hubei Provincial Engineering Technology Research Center of Green Chemical Equipment, School of Mechanical and Electrical Engineering, Wuhan Institute of Technology, Wuhan 430205, China.

Published: March 2022

Transformers lack a local spatial receptive field, and rotating object detection suffers from a discontinuous boundary loss. To address both problems, we propose RODFormer, a Transformer-based high-precision rotating object detection model. First, RODFormer uses a structured Transformer architecture to collect feature information at different resolutions, which widens the range over which features are gathered. Second, it introduces a new feed-forward network, spatial-FFN, which fuses the local spatial features of 3 × 3 depthwise separable convolutions with the global channel features of a multilayer perceptron (MLP) to compensate for the weakness of the standard FFN in local spatial modeling. Finally, a detection head built on the spatial-FFN architecture uses the CIOU-smooth L1 loss function and regresses to the horizontal frame only when the rotating frame is close to horizontal, which alleviates the loss discontinuity of the rotating frame. Ablation experiments on the DOTA dataset show that the structured Transformer module, the spatial-FFN module, and the CIOU-smooth L1 loss function each improve the detection accuracy of RODFormer. Compared with 12 rotating object detection models on the DOTA dataset, RODFormer achieves the highest average detection accuracy (up to 75.60%), which makes it competitive in rotating object detection accuracy.
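
The abstract names a CIOU-smooth L1 loss but does not give its formula. As a rough illustration only, the two standard ingredients can be sketched in plain Python: the Complete IoU (CIoU) score for axis-aligned boxes and the smooth L1 penalty. The function names, the `(cx, cy, w, h, theta)` box layout, the `angle_weight` parameter, and the way the two terms are summed are all assumptions made for this sketch, not the paper's formulation. In particular, the sketch does not reproduce the paper's handling of frames that are close to horizontal.

```python
import math


def smooth_l1(x, beta=1.0):
    """Standard smooth L1 penalty for a scalar residual x."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def ciou(box_a, box_b, eps=1e-9):
    """Standard Complete IoU for two axis-aligned boxes (cx, cy, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Convert centre/size to corner coordinates.
    a_x1, a_y1, a_x2, a_y2 = ax - aw / 2, ay - ah / 2, ax + aw / 2, ay + ah / 2
    b_x1, b_y1, b_x2, b_y2 = bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2
    # Plain IoU.
    inter_w = max(0.0, min(a_x2, b_x2) - max(a_x1, b_x1))
    inter_h = max(0.0, min(a_y2, b_y2) - max(a_y1, b_y1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    iou = inter / (union + eps)
    # Squared centre distance over squared diagonal of the enclosing box.
    c_w = max(a_x2, b_x2) - min(a_x1, b_x1)
    c_h = max(a_y2, b_y2) - min(a_y1, b_y1)
    c2 = c_w ** 2 + c_h ** 2 + eps
    rho2 = (ax - bx) ** 2 + (ay - by) ** 2
    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (
        math.atan(bw / (bh + eps)) - math.atan(aw / (ah + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v


def rotated_box_loss(pred, target, angle_weight=1.0):
    """Hypothetical combination: CIoU on the box, smooth L1 on the angle.

    pred and target are (cx, cy, w, h, theta), theta in radians.
    The weighting and summation are assumptions for illustration.
    """
    box_term = 1.0 - ciou(pred[:4], target[:4])
    angle_term = smooth_l1(pred[4] - target[4])
    return box_term + angle_weight * angle_term
```

Note that the naive angle residual `pred[4] - target[4]` jumps when the angle wraps around its period, which is the kind of boundary discontinuity the abstract sets out to alleviate.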

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9003240
DOI: http://dx.doi.org/10.3390/s22072633

