PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text.

Wenhai Wang Enze Xie Xiang Li Xuebo Liu Ding Liang Zhibo Yang Tong Lu Chunhua Shen

IEEE Trans Pattern Anal Mach Intell

Published: September 2022

Scene text detection and recognition have been well explored in the past few years. Despite the progress, efficient and accurate end-to-end spotting of arbitrarily-shaped text remains challenging. In this work, we propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes. PAN++ is based on the kernel representation, which reformulates a text line as a text kernel (central region) surrounded by peripheral pixels. By systematically comparing it with existing scene text representations, we show that our kernel representation not only describes arbitrarily-shaped text but also distinguishes adjacent text well. Moreover, as a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which makes it well suited to real-time applications. Taking advantage of the kernel representation, we design a series of components as follows: 1) a computationally efficient feature enhancement network composed of stacked Feature Pyramid Enhancement Modules (FPEMs); 2) a lightweight detection head cooperating with Pixel Aggregation (PA); and 3) an efficient attention-based recognition head with Masked RoI. Benefiting from the kernel representation and the tailored components, our method achieves high inference speed while maintaining competitive accuracy. Extensive experiments show the superiority of our method. For example, the proposed PAN++ achieves an end-to-end text spotting F-measure of 64.9 at 29.2 FPS on the Total-Text dataset, which significantly outperforms the previous best method. Code will be available at: git.io/PAN.
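
As a rough illustration of the kernel representation described above, the following Python sketch labels each text kernel as a connected component and then grows the labels outward through the surrounding text pixels. It is a simplified, hypothetical stand-in, not the authors' Pixel Aggregation: the function names `label_kernels` and `grow_kernels` and the toy masks are assumptions made for illustration, and all learned components of PAN++ are omitted.

```python
from collections import deque

# 4-connected neighbourhood offsets (up, down, left, right).
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def label_kernels(kernel_mask):
    """Label 4-connected components of a binary kernel mask with ids 1, 2, ..."""
    h, w = len(kernel_mask), len(kernel_mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if kernel_mask[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and kernel_mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def grow_kernels(text_mask, kernel_mask):
    """Assign each peripheral text pixel to a kernel via multi-source BFS.

    Assumption: plain breadth-first growth inside the text mask stands in
    for the paper's aggregation step, which is not reproduced here.
    """
    labels, count = label_kernels(kernel_mask)
    h, w = len(text_mask), len(text_mask[0])
    # Seed the queue with every kernel pixel so all kernels grow in lockstep.
    queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
    while queue:
        cy, cx = queue.popleft()
        for dy, dx in NEIGHBOURS:
            ny, nx = cy + dy, cx + dx
            if (0 <= ny < h and 0 <= nx < w
                    and text_mask[ny][nx] and labels[ny][nx] == 0):
                labels[ny][nx] = labels[cy][cx]
                queue.append((ny, nx))
    return labels, count


# Toy example: two text lines that touch in the text mask but have separate kernels.
text_mask = [[1, 1, 1, 1, 1, 1, 1, 1]]
kernel_mask = [[0, 1, 1, 0, 0, 1, 1, 0]]
labels, count = grow_kernels(text_mask, kernel_mask)
print(labels, count)  # [[1, 1, 1, 1, 2, 2, 2, 2]] 2
```

In the toy example the two text lines touch in the text mask, so connected-component labeling of the text mask alone would merge them into one instance; seeding from the kernels keeps them apart.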

DOI: http://dx.doi.org/10.1109/TPAMI.2021.3077555

Publication Analysis

Top Keywords

kernel representation

arbitrarily-shaped text

text

efficient accurate

accurate end-to-end

end-to-end spotting

spotting arbitrarily-shaped

scene text

end-to-end text

text spotting

Similar Publications

Kernel representation-based End-to-End network-enabled decoding strategy for precise and medical diagnosis.

J Hazard Mater

January 2025

School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430070, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430070, China.

Qinyu Wang Xuewen Peng Niu Feng Yiping Chen Chunhua Deng

Artificial intelligence-assisted imaging biosensors have attracted increasing attention due to their flexibility, allowing for the digital image analysis and quantification of biomarkers. While deep learning methods have led to advancements in biomarker identification, the diversity in the density and adherence of targets still poses a serious challenge. In this regard, we propose CellNet, a neural network model specifically designed for detecting dense targets.

Advancing semantic segmentation: Enhanced UNet algorithm with attention mechanism and deformable convolution.

PLoS One

January 2025

Department of Electrical and Computer Engineering, University of Birjand, Birjand, Iran.

Effat Sahragard Hassan Farsi Sajad Mohamadzadeh

This paper presents a novel method for improving semantic segmentation performance in computer vision tasks. Our approach utilizes an enhanced UNet architecture that leverages an improved ResNet50 backbone. We replace the last layer of ResNet50 with deformable convolution to enhance feature representation.

Principled neuromorphic reservoir computing.

Nat Commun

January 2025

Neuromorphic Computing Lab, Intel, Santa Clara, CA, USA.

Denis Kleyko Christopher J Kymn Anthony Thomas Bruno A Olshausen Friedrich T Sommer

Reservoir computing advances the intriguing idea that a nonlinear recurrent neural circuit-the reservoir-can encode spatio-temporal input signals to enable efficient ways to perform tasks like classification or regression. However, recently the idea of a monolithic reservoir network that simultaneously buffers input signals and expands them into nonlinear features has been challenged. A representation scheme in which memory buffer and expansion into higher-order polynomial features can be configured separately has been shown to significantly outperform traditional reservoir computing in prediction of multivariate time-series.

A discrete convolutional network for entity relation extraction.

Neural Netw

January 2025

State Key Laboratory of Public Big Data, Guizhou University, 550025, China; Engineering Research Center of Text Computing & Cognitive Intelligence, Ministry of Education, Guizhou University, 550025, China; College of Computer Science and Technology, Guizhou University, 550025, China.

Weizhe Yang Yongbin Qin Kai Wang Ying Hu Ruizhang Huang

Relation extraction independently verifies all entity pairs in a sentence to identify predefined relationships between named entities. Because these entity pairs share the same contextual features of a sentence, they lead to a complicated semantic structure. To distinguish semantic expressions between relation instances, manually designed rules or elaborate deep architectures are usually applied to learn task-relevant representations.

Multiscale Residual Weighted Classification Network for Human Activity Recognition in Microwave Radar.

Sensors (Basel)

January 2025

School of Information and Communication Engineering, Beijing Information Science and Technology University, Beijing 100101, China.

Yukun Gao Lin Cao Zongmin Zhao Dongfeng Wang Chong Fu

Human activity recognition by radar sensors plays an important role in healthcare and smart homes. However, labeling a large number of radar datasets is difficult and time-consuming, and it is difficult for models trained on insufficient labeled data to obtain exact classification results. In this paper, we propose a multiscale residual weighted classification network with large-scale, medium-scale, and small-scale residual networks.
