Scene text detection and recognition have been well explored in the past few years. Despite this progress, efficient and accurate end-to-end spotting of arbitrarily shaped text remains challenging. In this work, we propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes. PAN++ is based on the kernel representation, which reformulates a text line as a text kernel (central region) surrounded by peripheral pixels. By systematically comparing it with existing scene text representations, we show that our kernel representation can not only describe arbitrarily shaped text but also clearly distinguish adjacent text. Moreover, as a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which makes it well suited to real-time applications. Taking advantage of the kernel representation, we design a series of components as follows: 1) a computationally efficient feature enhancement network composed of stacked Feature Pyramid Enhancement Modules (FPEMs); 2) a lightweight detection head cooperating with Pixel Aggregation (PA); and 3) an efficient attention-based recognition head with Masked RoI. Benefiting from the kernel representation and the tailored components, our method achieves high inference speed while maintaining competitive accuracy. Extensive experiments show the superiority of our method. For example, the proposed PAN++ achieves an end-to-end text spotting F-measure of 64.9 at 29.2 FPS on the Total-Text dataset, which significantly outperforms the previous best method. Code will be available at: git.io/PAN.
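The kernel representation described in the abstract can be illustrated with a toy sketch. It is a simplified stand-in, not the Pixel Aggregation head itself: it assumes binary text and kernel masks are already available, and it assigns peripheral pixels to kernels with a plain breadth-first expansion chosen only for illustration. The function names `label_kernels` and `grow_kernels` are hypothetical and do not come from the PAN++ code.

```python
from collections import deque

# 4-connected neighbourhood offsets (up, down, left, right).
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def label_kernels(kernel_mask):
    """Give each 4-connected kernel region its own label, starting at 1."""
    h, w = len(kernel_mask), len(kernel_mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if kernel_mask[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and kernel_mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def grow_kernels(labels, text_mask):
    """Assign each peripheral text pixel to the kernel that reaches it first.

    Assumption: breadth-first expansion is used here only as an illustration
    of attaching peripheral pixels to kernels.
    """
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    queue = deque((y, x) for y in range(h) for x in range(w) if out[y][x])
    while queue:
        cy, cx = queue.popleft()
        for dy, dx in NEIGHBOURS:
            ny, nx = cy + dy, cx + dx
            if (0 <= ny < h and 0 <= nx < w
                    and text_mask[ny][nx] and out[ny][nx] == 0):
                out[ny][nx] = out[cy][cx]
                queue.append((ny, nx))
    return out


# Two adjacent words whose text regions touch, but whose kernels do not.
text_mask = [[1] * 8 for _ in range(3)]
kernel_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

kernel_labels, num_kernels = label_kernels(kernel_mask)
instances = grow_kernels(kernel_labels, text_mask)
for row in instances:
    print(row)  # each row prints [1, 1, 1, 1, 2, 2, 2, 2]
```

Although the text mask is a single connected blob, the two kernels are disjoint, so the toy example still separates the region into two instances. This is the property the abstract refers to when it says the representation can distinguish adjacent text.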
DOI: http://dx.doi.org/10.1109/TPAMI.2021.3077555
J Hazard Mater
January 2025
School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430070, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430070, China.
Artificial intelligence-assisted imaging biosensors have attracted increasing attention due to their flexibility, allowing for the digital image analysis and quantification of biomarkers. While deep learning methods have led to advancements in biomarker identification, the diversity in the density and adherence of targets still poses a serious challenge. In this regard, we propose CellNet, a neural network model specifically designed for detecting dense targets.
PLoS One
January 2025
Department of Electrical and Computer Engineering, University of Birjand, Birjand, Iran.
This paper presents a novel method for improving semantic segmentation performance in computer vision tasks. Our approach utilizes an enhanced UNet architecture that leverages an improved ResNet50 backbone. We replace the last layer of ResNet50 with deformable convolution to enhance feature representation.
Nat Commun
January 2025
Neuromorphic Computing Lab, Intel, Santa Clara, CA, USA.
Reservoir computing advances the intriguing idea that a nonlinear recurrent neural circuit, the reservoir, can encode spatio-temporal input signals to enable efficient ways to perform tasks like classification or regression. Recently, however, the idea of a monolithic reservoir network that simultaneously buffers input signals and expands them into nonlinear features has been challenged. A representation scheme in which the memory buffer and the expansion into higher-order polynomial features can be configured separately has been shown to significantly outperform traditional reservoir computing in the prediction of multivariate time series.
Neural Netw
January 2025
State Key Laboratory of Public Big Data, Guizhou University, 550025, China; Engineering Research Center of Text Computing & Cognitive Intelligence, Ministry of Education, Guizhou University, 550025, China; College of Computer Science and Technology, Guizhou University, 550025, China.
Relation extraction independently verifies all entity pairs in a sentence to identify predefined relationships between named entities. Because these entity pairs share the same contextual features of a sentence, they lead to a complicated semantic structure. To distinguish semantic expressions between relation instances, manually designed rules or elaborate deep architectures are usually applied to learn task-relevant representations.
Sensors (Basel)
January 2025
School of Information and Communication Engineering, Beijing Information Science and Technology University, Beijing 100101, China.
Human activity recognition by radar sensors plays an important role in healthcare and smart homes. However, labeling a large number of radar datasets is difficult and time-consuming, and it is difficult for models trained on insufficient labeled data to obtain exact classification results. In this paper, we propose a multiscale residual weighted classification network with large-scale, medium-scale, and small-scale residual networks.