A deep learning approach for Named Entity Recognition in Urdu language.

Rimsha Anam, Muhammad Waqas Anwar, Muhammad Hasan Jamal, Usama Ijaz Bajwa, Isabel de la Torre Diez, Eduardo Silva Alvarado, Emmanuel Soriano Flores, Imran Ashraf

PLoS One

Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, Korea.

Published: April 2024

Named Entity Recognition (NER) is a natural language processing task that has been widely explored for many languages over the past decade but remains under-researched for Urdu because of the language's rich morphology and linguistic complexity. Existing state-of-the-art studies on Urdu NER apply various deep learning approaches that rely on automatic feature selection through word embeddings. This paper presents a deep learning approach for Urdu NER that uses FastText and Floret word embeddings, which capture contextual information by considering the words surrounding each word, to improve feature extraction. Publicly available pre-trained FastText and Floret embeddings for Urdu are used to generate feature vectors for four benchmark Urdu datasets. These features are then used as input to train deep learning models built from various combinations of Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Conditional Random Field (CRF) components. In a performance comparison, the proposed approach significantly outperforms existing state-of-the-art studies on Urdu NER, achieving an F-score of up to 0.98 when using BiLSTM+GRU with Floret embeddings. Error analysis shows a low classification error rate, ranging from 1.24% to 3.63% across the datasets, which indicates the robustness of the proposed approach.
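To illustrate why subword-aware embeddings such as FastText and Floret suit a morphologically rich language, the sketch below builds a toy hashed character n-gram embedding and turns a tokenized Urdu sentence into per-token feature vectors. It is not the authors' code: the table holds random values rather than the pre-trained vectors, and the names `char_ngrams`, `bucket_ids`, `make_table`, and `embed`, as well as the sizes `DIM`, `BUCKETS`, and `NUM_HASHES`, are assumptions chosen for illustration. Hashing each subword to several rows loosely mimics Floret's compact table; hashing to a single row would be closer to FastText.

```python
import hashlib
import random

DIM = 8          # embedding width (hypothetical; pre-trained models are much wider)
BUCKETS = 1000   # rows in the hashed subword table (hypothetical)
NUM_HASHES = 2   # each subword maps to this many rows (hypothetical)


def char_ngrams(word, n_min=3, n_max=5):
    """Return the boundary-marked word plus its character n-grams."""
    marked = f"<{word}>"
    grams = [marked]
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def bucket_ids(gram, num_hashes=NUM_HASHES, buckets=BUCKETS):
    """Map one subword to `num_hashes` table rows using seeded hashes."""
    ids = []
    for seed in range(num_hashes):
        digest = hashlib.md5(f"{seed}:{gram}".encode("utf-8")).digest()
        ids.append(int.from_bytes(digest[:8], "big") % buckets)
    return ids


def make_table(buckets=BUCKETS, dim=DIM, seed=0):
    """Random stand-in for a pre-trained embedding table."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(buckets)]


def embed(word, table):
    """Average the table rows selected by every subword of `word`."""
    rows = [table[b] for g in char_ngrams(word) for b in bucket_ids(g)]
    return [sum(col) / len(rows) for col in zip(*rows)]


table = make_table()
tokens = ["لاہور", "پاکستان", "میں", "ہے"]  # "Lahore is in Pakistan"
features = [embed(tok, table) for tok in tokens]  # one vector per token

# Inflected forms share subwords, so their vectors share table rows.
shared = set(char_ngrams("پاکستان")) & set(char_ngrams("پاکستانی"))
```

In a pipeline like the one described above, a sequence such as `features` would be the input to the recurrent tagger (for example BiLSTM followed by GRU), which predicts one entity label per token.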

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10977791	PMC
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0300725	PLOS
