Scene text detection and recognition have been well explored in the past few years. Despite this progress, efficient and accurate end-to-end spotting of arbitrarily-shaped text remains challenging. In this work, we propose an end-to-end text spotting framework, termed PAN++, which efficiently detects and recognizes text of arbitrary shapes in natural scenes. PAN++ is based on the kernel representation, which reformulates a text line as a text kernel (central region) surrounded by peripheral pixels. Through a systematic comparison with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also distinguish adjacent text well. Moreover, as a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which makes it well suited to real-time applications. Taking advantage of the kernel representation, we design the following components: 1) a computationally efficient feature enhancement network composed of stacked Feature Pyramid Enhancement Modules (FPEMs); 2) a lightweight detection head that works together with Pixel Aggregation (PA); and 3) an efficient attention-based recognition head with Masked RoI. Benefiting from the kernel representation and the tailored components, our method achieves high inference speed while maintaining competitive accuracy. Extensive experiments show the superiority of our method. For example, the proposed PAN++ achieves an end-to-end text spotting F-measure of 64.9 at 29.2 FPS on the Total-Text dataset, which significantly outperforms the previous best method. Code will be available at: git.io/PAN.
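The idea that a text kernel plus its peripheral pixels can separate adjacent text may be easier to see on a toy grid. The sketch below is not the PAN++ implementation and not the paper's Pixel Aggregation: it labels connected kernels, then grows each label breadth-first into the surrounding text-region pixels. The function names `label_kernels` and `grow_into_text`, the 4-connectivity, and the toy masks are all assumptions made for illustration.

```python
from collections import deque

# 4-connected neighbourhood (an assumption; 8-connectivity would also work).
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def label_kernels(kernel_mask):
    """Give each 4-connected kernel region its own label 1, 2, ..."""
    h, w = len(kernel_mask), len(kernel_mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if kernel_mask[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and kernel_mask[ny][nx]
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def grow_into_text(labels, text_mask):
    """Spread kernel labels breadth-first over unlabeled text pixels."""
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    queue = deque((r, c) for r in range(h) for c in range(w) if out[r][c])
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w
                    and text_mask[ny][nx] and out[ny][nx] == 0):
                out[ny][nx] = out[y][x]
                queue.append((ny, nx))
    return out


# Toy example: two text lines whose full regions touch, so the text mask
# alone forms one connected blob; the two kernels stay separate.
text_mask = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
kernel_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

kernel_labels, num_instances = label_kernels(kernel_mask)
instances = grow_into_text(kernel_labels, text_mask)
for row in instances:
    print(row)
```

On this toy input the single merged text blob is split into two instances, one per kernel, which is the property the abstract attributes to the kernel representation. A real system would predict both masks with a network rather than hand-writing them.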

Source
http://dx.doi.org/10.1109/TPAMI.2021.3077555

Publication Analysis

Top Keywords

kernel representation: 20
arbitrarily-shaped text: 12
text: 11
efficient accurate: 8
accurate end-to-end: 8
end-to-end spotting: 8
spotting arbitrarily-shaped: 8
scene text: 8
end-to-end text: 8
text spotting: 8

Similar Publications

Kernel representation-based End-to-End network-enabled decoding strategy for precise and medical diagnosis.

J Hazard Mater

January 2025

School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430070, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430070, China. Electronic address:

Artificial intelligence-assisted imaging biosensors have attracted increasing attention due to their flexibility, allowing for the digital image analysis and quantification of biomarkers. While deep learning methods have led to advancements in biomarker identification, the diversity in the density and adherence of targets still poses a serious challenge. In this regard, we propose CellNet, a neural network model specifically designed for detecting dense targets.

This paper presents a novel method for improving semantic segmentation performance in computer vision tasks. Our approach utilizes an enhanced UNet architecture that leverages an improved ResNet50 backbone. We replace the last layer of ResNet50 with deformable convolution to enhance feature representation.

Principled neuromorphic reservoir computing.

Nat Commun

January 2025

Neuromorphic Computing Lab, Intel, Santa Clara, CA, USA.

Reservoir computing advances the intriguing idea that a nonlinear recurrent neural circuit (the reservoir) can encode spatio-temporal input signals to enable efficient ways to perform tasks like classification or regression. However, recently the idea of a monolithic reservoir network that simultaneously buffers input signals and expands them into nonlinear features has been challenged. A representation scheme in which memory buffer and expansion into higher-order polynomial features can be configured separately has been shown to significantly outperform traditional reservoir computing in prediction of multivariate time-series.

A discrete convolutional network for entity relation extraction.

Neural Netw

January 2025

State Key Laboratory of Public Big Data, Guizhou University, 550025, China; Engineering Research Center of Text Computing & Cognitive Intelligence, Ministry of Education, Guizhou University, 550025, China; College of Computer Science and Technology, Guizhou University, 550025, China. Electronic address:

Relation extraction independently verifies all entity pairs in a sentence to identify predefined relationships between named entities. Because these entity pairs share the same contextual features of a sentence, they lead to a complicated semantic structure. To distinguish semantic expressions between relation instances, manually designed rules or elaborate deep architectures are usually applied to learn task-relevant representations.

Human activity recognition by radar sensors plays an important role in healthcare and smart homes. However, labeling a large number of radar datasets is difficult and time-consuming, and it is difficult for models trained on insufficient labeled data to obtain exact classification results. In this paper, we propose a multiscale residual weighted classification network with large-scale, medium-scale, and small-scale residual networks.
