Recently, the vision transformer has achieved great success by pushing the state of the art in various vision tasks. One of the most challenging problems in the vision transformer is that the large sequence length of image tokens leads to high computational cost (self-attention has quadratic complexity in the sequence length). A popular solution to this problem is to use a single pooling operation to reduce the sequence length. This paper considers how to improve existing vision transformers, where the pooled feature extracted by a single pooling operation seems less powerful. To this end, we note that pyramid pooling has been demonstrated to be effective in various vision tasks owing to its powerful ability in context abstraction. However, pyramid pooling has not been explored in backbone network design. To bridge this gap, we propose to adapt pyramid pooling to Multi-Head Self-Attention (MHSA) in the vision transformer, simultaneously reducing the sequence length and capturing powerful contextual features. Equipped with our pooling-based MHSA, we build a universal vision transformer backbone, dubbed Pyramid Pooling Transformer (P2T). Extensive experiments demonstrate that, when used as the backbone network, P2T shows substantial superiority in various vision tasks such as image classification, semantic segmentation, object detection, and instance segmentation, compared to previous CNN- and transformer-based networks. The code will be released at https://github.com/yuhuan-wu/P2T.
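To make the cost argument concrete, the following is a minimal sketch of the token-count arithmetic only, not the authors' implementation. The feature-map size, the pooling ratios, and the helper names `pooled_token_count` and `attention_score_entries` are assumptions chosen for illustration; the abstract does not specify them.

```python
import math


def pooled_token_count(height, width, pool_ratios):
    """Total key/value tokens after pooling an H x W token map at each ratio.

    Each ratio r is assumed to shrink the map to ceil(H/r) x ceil(W/r);
    the pooled maps from all ratios are concatenated into one sequence.
    """
    return sum(math.ceil(height / r) * math.ceil(width / r) for r in pool_ratios)


def attention_score_entries(num_queries, num_keys):
    """Entries in the query-key score matrix, the term that dominates cost."""
    return num_queries * num_keys


# Hypothetical setting: a 56 x 56 token map and four illustrative ratios.
height, width = 56, 56
pool_ratios = (12, 16, 20, 24)  # assumed values, not taken from the abstract

num_queries = height * width
num_pooled = pooled_token_count(height, width, pool_ratios)

# Standard self-attention: every token attends to every token.
full_cost = attention_score_entries(num_queries, num_queries)
# Pooling-based attention: every token attends only to the pooled tokens.
pooled_cost = attention_score_entries(num_queries, num_pooled)

print(f"queries={num_queries}, pooled keys={num_pooled}")
print(f"score entries: full={full_cost}, pooled={pooled_cost}")
```

Under these assumed values, the 3,136 query tokens attend to 59 pooled key/value tokens rather than to all 3,136, so the score matrix grows linearly rather than quadratically with the number of image tokens, while the several pooling scales still contribute context at different granularities.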
DOI: http://dx.doi.org/10.1109/TPAMI.2022.3202765
Objective: To assist in the rapid clinical identification of brain tumor types while also performing segmentation, this study investigates the feasibility of applying the YOLOv5s deep learning model to the segmentation of brain tumor magnetic resonance images, and then optimizes the model on this basis.
Methods: The study used two public Kaggle datasets of meningioma and glioma magnetic resonance images. Dataset 1 contains a total of 3,223 images, and Dataset 2 contains 216 images.
Quant Imaging Med Surg
January 2025
School of Computer and Control Engineering, Yantai University, Yantai, China.
Background: Skin lesion segmentation plays a significant role in skin cancer diagnosis. However, due to the complex shapes, varying sizes, and different color depths, precise segmentation of skin lesions is a challenging task. Therefore, the aim of this study was to design a customized deep learning (DL) model for the precise segmentation of skin lesions, particularly for complex shapes and small target lesions.
Sci Rep
January 2025
School of Computer Science, Hunan University of Technology, Tianyuan District, Zhuzhou, 412007, China.
Railway track extraction from unmanned aerial vehicle (UAV) aerial images suffers from issues such as low extraction accuracy and high time consumption. In response to these problems, this paper presents a lightweight algorithm, DA-DeepLabv3+, based on dense connections and attention mechanisms. Firstly, the lightweight MobileNetV2 network is employed to replace the Xception feature extraction network, thereby reducing the number of model parameters.
Neural Netw
January 2025
School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
Accurate 3D point cloud object detection is crucial for autonomous driving vehicles. The sparsity of point clouds in 3D scenes, especially for smaller targets like pedestrians and bicycles that contain fewer points, makes detection particularly challenging. To solve this problem, we propose a single-stage voxel-based 3D object detection method, namely PFENet.
Sci Rep
January 2025
China Institute of Water Resources and Hydropower Research, Beijing, 100048, China.
In the underwater domain, small object detection plays a crucial role in the protection, management, and monitoring of the environment and marine life. Advancements in deep learning have led to the development of many efficient detection techniques. However, the complexity of the underwater environment, limited information available from small objects, and constrained computational resources make small object detection challenging.