AI Article Synopsis

  • Taking feature pyramids into account is essential for improving object detection performance, but existing methods struggle to effectively integrate semantic information across various scales.
  • The authors propose a novel architecture that incorporates two main processes: global attention to enhance overall feature information and local reconfiguration to better capture scale correlations, both of which are designed to improve the model's expressiveness.
  • Additionally, the study identifies the standard training loss as a cause of inaccurate object localization and proposes a modified loss that emphasizes precise predictions, yielding improved performance across different detection frameworks.
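The synopsis describes global attention only at a high level. As a rough illustration, such a step might resemble squeeze-and-excitation-style channel gating: pool each feature channel globally, then re-weight the channels. This is a minimal NumPy sketch under that assumption; the function name and the sigmoid gating form are hypothetical, not the paper's exact design.

```python
import numpy as np

def global_attention(feat):
    """Hypothetical channel-wise global attention.

    Squeezes the spatial dimensions of a (C, H, W) feature map into a
    per-channel descriptor, passes it through a sigmoid gate, and
    re-weights the channels. A sketch only, not the paper's formulation.
    """
    # Squeeze: global average pool over the spatial dimensions (H, W).
    desc = feat.mean(axis=(1, 2))            # shape (C,)
    # Excite: sigmoid gate per channel.
    gate = 1.0 / (1.0 + np.exp(-desc))       # shape (C,)
    # Re-weight each channel of the feature map.
    return feat * gate[:, None, None]        # shape (C, H, W)
```

In a real detector this gating would be trainable (e.g. a small MLP before the sigmoid); the fixed pooling above only illustrates the data flow.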

Article Abstract

Taking feature pyramids into account has become a crucial way to boost object detection performance. While various pyramid representations have been developed, previous works remain inefficient at integrating semantic information across different scales. Moreover, recent object detectors struggle with accurate object localization, mainly due to the coarse definition of "positive" examples during the training and prediction phases. In this paper, we begin by analyzing current pyramid solutions, and then propose a novel architecture that reconfigures the feature hierarchy in a flexible yet effective way. In particular, our architecture consists of two lightweight and trainable processes: global attention and local reconfiguration. The global attention emphasizes the global information of each feature scale, while the local reconfiguration captures the local correlations across different scales. Both processes are non-linear and thus more expressive. We then show that the loss function used to train object detectors is the central cause of the inaccurate localization problem, and we address this issue by reshaping the standard cross entropy loss so that it focuses more on accurate predictions. Both the feature reconfiguration and the consistent loss can be applied to popular one-stage (SSD, RetinaNet) and two-stage (Faster R-CNN) detection frameworks. Extensive experimental evaluations on the PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO datasets demonstrate that our models achieve consistent and significant improvements over other state-of-the-art methods.
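The abstract states that the cross entropy loss is reshaped to focus on accurate predictions, but gives no closed form. One hedged reading is a cross entropy scaled by the model's confidence in the true class, so that well-localized, confident predictions carry relatively more weight than under the standard loss. The NumPy sketch below follows that assumption; `reshaped_ce` and the focusing parameter `gamma` are hypothetical names, not the paper's notation.

```python
import numpy as np

def reshaped_ce(p, y, gamma=2.0):
    """Hypothetical reshaped binary cross entropy.

    p: predicted probabilities for the positive class, shape (N,)
    y: binary ground-truth labels, shape (N,)
    gamma: focusing parameter (assumed); gamma=0 recovers standard CE.

    Scaling by p_t**gamma shifts the relative weight toward confident,
    correct predictions: the ratio to standard CE is p_t**gamma, which
    grows with the confidence p_t in the true class.
    """
    # Probability assigned to the true class of each example.
    p_t = np.where(y == 1, p, 1.0 - p)
    # Standard CE scaled by a confidence-dependent modulating factor.
    return -(p_t ** gamma) * np.log(np.clip(p_t, 1e-7, None))
```

With `gamma=0` this reduces exactly to the standard cross entropy, which makes the modulating factor easy to ablate.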

Download full-text PDF

Source
http://dx.doi.org/10.1109/TIP.2019.2917781 (DOI Listing)

Publication Analysis

Top Keywords

global attention (12)
local reconfiguration (12)
reconfiguration consistent (8)
consistent loss (8)
object detection (8)
object detectors (8)
attention local (8)
pascal voc (8)
feature (5)
reconfiguration (5)
