In offline handwritten text recognition, numerous normalization algorithms have been developed over the years as preprocessing steps applied before automatic recognition models process scanned images of handwritten text. These algorithms have proven effective at improving the overall performance of recognition architectures. However, many of them rely heavily on heuristic strategies that are not integrated with the recognition architecture itself. This paper introduces a trainable Pix2Pix model, a type of conditional generative adversarial network, as the method for normalizing handwritten text images. The model can also be integrated as the initial stage of any deep learning architecture designed for handwritten text recognition. This makes it possible to train the normalization and recognition components as a unified whole while retaining some interpretability of each module. The proposed normalization approach learns from a blend of heuristic transformations applied to text images, aiming to mitigate the impact of intra-personal handwriting variability among different writers. As a result, it achieves slope and slant normalization alongside other conventional preprocessing objectives, such as normalizing the size of text ascenders and descenders. We demonstrate that the proposed architecture replicates, and in certain cases surpasses, the results of a widely used heuristic algorithm, both across two metrics and when integrated as the first step of a deep recognition architecture.
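
To make the notion of a heuristic slant normalization concrete, the following is a minimal sketch of one classical approach: shear the image at several candidate angles and keep the angle that gives the sharpest vertical projection profile. This is not the heuristic algorithm the paper compares against, nor the Pix2Pix model itself; the function names (`deslant`, `estimate_slant`), the candidate angles, and the scoring rule are all assumptions chosen for illustration. It only shows the kind of transformation a trainable normalizer could learn to imitate.

```python
import math
import numpy as np


def _shift_row(row: np.ndarray, shift: int) -> np.ndarray:
    """Shift a 1-D row by `shift` pixels (positive = right), filling with zeros."""
    out = np.zeros_like(row)
    if shift > 0:
        out[shift:] = row[:-shift]
    elif shift < 0:
        out[:shift] = row[-shift:]
    else:
        out[:] = row
    return out


def deslant(image: np.ndarray, slant_deg: float) -> np.ndarray:
    """Undo a right-leaning slant of `slant_deg` degrees with a horizontal shear.

    Assumes a 2-D array with ink > 0 and background == 0, and takes the
    bottom row as the baseline (an assumption made for simplicity).
    """
    height = image.shape[0]
    shear = math.tan(math.radians(slant_deg))
    out = np.zeros_like(image)
    for y in range(height):
        # Rows further above the baseline are moved further to the left.
        offset = int(round(shear * (height - 1 - y)))
        out[y] = _shift_row(image[y], -offset)
    return out


def estimate_slant(image: np.ndarray, candidates_deg=range(-45, 46, 15)) -> float:
    """Pick the candidate angle whose correction yields the sharpest column profile.

    Upright strokes concentrate ink in few columns, so the sum of squared
    column totals is used as a simple (hypothetical) sharpness score.
    """
    def score(angle: float) -> float:
        columns = deslant(image, angle).sum(axis=0).astype(float)
        return float((columns ** 2).sum())

    return float(max(candidates_deg, key=score))


# Usage: a synthetic stroke leaning 45 degrees to the right.
stroke = np.zeros((5, 9), dtype=np.uint8)
for y in range(5):
    stroke[y, 2 + (4 - y)] = 1

angle = estimate_slant(stroke)
upright = deslant(stroke, angle)
```

In a learned setting, pairs of raw and heuristically corrected images such as `stroke` and `upright` would serve as input and target for an image-to-image model, which is the role the abstract assigns to Pix2Pix.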

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11207351
DOI: http://dx.doi.org/10.3390/s24123892
