Before the 19th century, all communication and official records relied on handwritten documents, which different ethnic groups cherish as valuable artefacts. Significant efforts have been made to automate the transcription of major languages such as English, French, Arabic, and Chinese. Regional and minor languages have received far less research attention, despite their geographical and historical importance. This research focuses on detecting and recognizing Pashto handwritten characters and ligatures, which is essential for preserving this cursive language, a regional language in Pakistan and the national language of Afghanistan. Deep learning techniques were employed to detect and recognize Pashto characters and ligatures using a newly developed Pashto-specific dataset. The dataset was further enhanced through data augmentation, namely scaling and rotation of the handwritten characters and ligatures, which produced many variations of each trajectory. Morphological operations were applied to minimize gaps in the trajectories, and a median filter was used to remove various types of noise. This dataset will be combined with the existing PHWD-V2 dataset. Several deep learning models were evaluated, including VGG19, MobileNetV2, MobileNetV3, and a customized CNN. The customized CNN achieved the highest accuracy and the lowest loss, with a training accuracy of 93.98%, a validation accuracy of 92.08%, and a testing accuracy of 92.99%.
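The three preprocessing steps named above can be sketched in plain NumPy: median filtering, morphological gap closing, and scaling/rotation augmentation. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give window sizes, structuring elements, angle or scale ranges, or the order of operations. The 3x3 windows, the nearest-neighbour resampling, and the `angles`/`scales` defaults below are therefore hypothetical choices. The function names are also invented for illustration.

```python
import numpy as np


def median_filter3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter; replicates edge pixels so the output keeps the input shape."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    # Stack the nine shifted copies that make up each pixel's 3x3 neighbourhood.
    windows = np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])
    return np.median(windows, axis=0).astype(img.dtype)


def _neighbourhoods(mask: np.ndarray, pad_value: bool) -> np.ndarray:
    """Return a (9, H, W) stack of shifted copies covering each 3x3 neighbourhood."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=pad_value)
    return np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])


def close3(mask: np.ndarray) -> np.ndarray:
    """Binary closing (3x3 dilation, then 3x3 erosion) to bridge 1-pixel gaps in strokes."""
    # Pad with background first so strokes touching the border are not clipped; crop at the end.
    m = np.pad(mask.astype(bool), 1, mode="constant", constant_values=False)
    dilated = _neighbourhoods(m, False).any(axis=0)
    eroded = _neighbourhoods(dilated, True).all(axis=0)
    return eroded[1:-1, 1:-1]


def rotate_scale(img: np.ndarray, angle_deg: float, scale: float) -> np.ndarray:
    """Rotate by angle_deg and scale about the image centre (nearest-neighbour sampling)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    t = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, locate the source pixel it comes from.
    x0, y0 = xs - cx, ys - cy
    src_x = np.rint((np.cos(t) * x0 + np.sin(t) * y0) / scale + cx).astype(int)
    src_y = np.rint((-np.sin(t) * x0 + np.cos(t) * y0) / scale + cy).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)  # pixels mapped from outside the image become background (0)
    out[inside] = img[src_y[inside], src_x[inside]]
    return out


def augment(img: np.ndarray, angles=(-10, -5, 5, 10), scales=(0.9, 1.1)) -> list:
    """One variant per (angle, scale) pair; the default ranges are illustrative guesses."""
    return [rotate_scale(img, a, s) for a in angles for s in scales]
```

A 3x3 median keeps a foreground pixel only when at least five of its nine neighbours are foreground. On very thin trajectories it can therefore erase strokes as well as noise, so the window size is one of the parameters worth tuning.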
| Source | Type |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10346912 | PMC |
| http://dx.doi.org/10.3390/s23136060 | DOI Listing |
PeerJ Comput Sci
March 2024
AI and Software, Gachon University, Seongnam-si, Republic of South Korea.
This article introduces a recognition system for handwritten text in the Pashto language, representing the first attempt to establish a baseline system using the Pashto Handwritten Text Imagebase (PHTI) dataset. The PHTI dataset was first pre-processed to eliminate unwanted characters and was then divided into training (70%), validation (15%), and test (15%) sets. The proposed recognition system is based on multi-dimensional long short-term memory (MD-LSTM) networks.
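The 70/15/15 partition described above reduces to simple arithmetic. In this sketch the samples are shuffled with a fixed seed before splitting, and the test set absorbs any rounding remainder. Both choices are assumptions: the abstract does not say how samples were assigned to sets, for example whether writers were kept disjoint. `split_70_15_15` is a hypothetical helper name.

```python
import random


def split_70_15_15(items, seed=0):
    """Shuffle deterministically, then split into 70% train, 15% validation, 15% test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    # Integer arithmetic avoids floating-point rounding in the split sizes.
    n_train = n * 70 // 100
    n_val = n * 15 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # receives any remainder left by integer division
    return train, val, test
```

For 1,000 samples this yields 700, 150, and 150 items. Fixing the seed makes the split reproducible across runs.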
Sensors (Basel)
June 2023
School of Information Technology, Deakin University, Burwood, VIC 3128, Australia.
Sensors (Basel)
October 2020
Department of Robot System Engineering, Tongmyong University, Busan 48520, Korea.
Handwritten character recognition is increasingly important in a variety of automation fields, for example, bank signature authentication, ZIP code identification on postal addresses, and forensic evidence analysis. Despite improvements in object recognition technologies, Pashto handwritten character recognition (PHCR) remains largely unsolved. Three factors contribute: many handwritten characters are visually ambiguous, Pashto script is highly cursive, and the problem has received little research attention. We propose a convolutional neural network (CNN) model for recognizing Pashto handwritten characters in an unrestricted environment, the first such attempt.