P2T: Pyramid Pooling Transformer for Scene Understanding.

IEEE Trans Pattern Anal Mach Intell

Published: November 2023

Recently, the vision transformer has achieved great success by advancing the state of the art in various vision tasks. One of the most challenging problems in the vision transformer is that the large sequence length of image tokens leads to high computational cost (quadratic complexity). A popular solution to this problem is to use a single pooling operation to reduce the sequence length. This paper considers how to improve existing vision transformers, where the pooled feature extracted by a single pooling operation seems less powerful. To this end, we note that pyramid pooling has been demonstrated to be effective in various vision tasks owing to its powerful ability in context abstraction. However, pyramid pooling has not been explored in backbone network design. To bridge this gap, we propose to adapt pyramid pooling to Multi-Head Self-Attention (MHSA) in the vision transformer, simultaneously reducing the sequence length and capturing powerful contextual features. Equipped with our pooling-based MHSA, we build a universal vision transformer backbone, dubbed Pyramid Pooling Transformer (P2T). Extensive experiments demonstrate that, when P2T is applied as the backbone network, it shows substantial superiority over previous CNN- and transformer-based networks in various vision tasks such as image classification, semantic segmentation, object detection, and instance segmentation. The code will be released at https://github.com/yuhuan-wu/P2T.
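
To make the sequence-length argument concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It average-pools a token grid at several ratios and concatenates the results into a shorter key/value sequence. The function names, the scalar (single-channel) tokens, and the pooling ratios (2, 4, 8) are all assumptions chosen for illustration; the actual P2T design is in the paper and its repository.

```python
def avg_pool2d(grid, ratio):
    """Average-pool an H x W grid of floats with window and stride `ratio`.

    Edge windows are truncated when `ratio` does not divide H or W.
    """
    height, width = len(grid), len(grid[0])
    pooled = []
    for top in range(0, height, ratio):
        row = []
        for left in range(0, width, ratio):
            window = [
                grid[y][x]
                for y in range(top, min(top + ratio, height))
                for x in range(left, min(left + ratio, width))
            ]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


def pyramid_pool_tokens(grid, ratios):
    """Pool the grid at each ratio, flatten, and concatenate the tokens."""
    tokens = []
    for ratio in ratios:
        for row in avg_pool2d(grid, ratio):
            tokens.extend(row)
    return tokens


def attention_cost(num_queries, num_keys):
    """Number of query-key pairs scored by one attention head."""
    return num_queries * num_keys


# Hypothetical 8 x 8 token grid with one scalar feature per token.
grid = [[float(y * 8 + x) for x in range(8)] for y in range(8)]
tokens = pyramid_pool_tokens(grid, ratios=(2, 4, 8))  # assumed ratios

full_cost = attention_cost(64, 64)             # queries attend to all tokens
pooled_cost = attention_cost(64, len(tokens))  # queries attend to pooled tokens
print(len(tokens), full_cost, pooled_cost)
```

With these assumed settings, the 64 original tokens are summarized by 16 + 4 + 1 = 21 pooled tokens, so each head scores 64 × 21 = 1344 query-key pairs instead of 64 × 64 = 4096, while the pooled tokens still cover several spatial scales.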

Source
http://dx.doi.org/10.1109/TPAMI.2022.3202765

Publication Analysis

Top Keywords

pyramid pooling: 20
vision transformer: 16
vision tasks: 12
sequence length: 12
pooling transformer: 8
vision: 8
single pooling: 8
pooling operation: 8
backbone network: 8
pooling: 7

Similar Publications

Objective: To assist in the rapid clinical identification of brain tumor types while achieving segmentation detection, this study investigates the feasibility of applying the deep learning YOLOv5s algorithm model to the segmentation of brain tumor magnetic resonance images and optimizes and upgrades it on this basis.

Methods: The study used two public Kaggle datasets of meningioma and glioma magnetic resonance images. Dataset 1 contains 3,223 images, and Dataset 2 contains 216 images.

Seg-SkiNet: adaptive deformable fusion convolutional network for skin lesion segmentation.

Quant Imaging Med Surg

January 2025

School of Computer and Control Engineering, Yantai University, Yantai, China.

Background: Skin lesion segmentation plays a significant role in skin cancer diagnosis. However, due to the complex shapes, varying sizes, and different color depths, precise segmentation of skin lesions is a challenging task. Therefore, the aim of this study was to design a customized deep learning (DL) model for the precise segmentation of skin lesions, particularly for complex shapes and small target lesions.

Railway track extraction from unmanned aerial vehicle (UAV) images suffers from low extraction accuracy and high time consumption. In response to these problems, this paper presents a lightweight algorithm, DA-DeepLabv3+, based on dense connections and attention mechanisms. First, the lightweight MobileNetV2 network is employed to replace the Xception feature extraction network, thereby reducing the number of model parameters.

Accurate 3D point cloud object detection is crucially important for autonomous driving vehicles. The sparsity of point clouds in 3D scenes, especially for smaller targets like pedestrians and bicycles that contain fewer points, makes detection particularly challenging. To solve this problem, we propose a single-stage voxel-based 3D object detection method, namely PFENet.

In the underwater domain, small object detection plays a crucial role in the protection, management, and monitoring of the environment and marine life. Advancements in deep learning have led to the development of many efficient detection techniques. However, the complexity of the underwater environment, limited information available from small objects, and constrained computational resources make small object detection challenging.
