Multi-label emotion classification of Urdu tweets.

PeerJ Comput Sci

CIC, Instituto Politécnico Nacional, Mexico City, Mexico.

Published: April 2022

Urdu is a widely used language in South Asia and worldwide. While similar datasets are available in English, we created the first multi-label emotion dataset consisting of 6,043 tweets and six basic emotions in the Urdu Nastalíq script. A multi-label (ML) classification approach was adopted to detect emotions from Urdu text. The morphological and syntactic structure of Urdu makes multi-label emotion detection a challenging problem. In this paper, we build a set of baseline classifiers: machine learning algorithms (Random forest (RF), Decision tree (J48), Sequential minimal optimization (SMO), AdaBoostM1, and Bagging), deep-learning algorithms (Convolutional Neural Networks (1D-CNN), Long short-term memory (LSTM), and LSTM with CNN features), and a transformer-based baseline (BERT). We used a combination of text representations: stylometric-based features, pre-trained word embeddings, word-based n-grams, and character-based n-grams. The paper highlights the annotation guidelines, the dataset characteristics, and insights into the different methodologies used for Urdu-based emotion classification. We present our best results using micro-averaged F1, macro-averaged F1, accuracy, Hamming loss (HL), and exact match (EM) for all tested methods.
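
The evaluation measures named in the abstract can be illustrated on toy data. The following is a minimal pure-Python sketch, assuming each tweet's labels are encoded as a row of 0/1 indicators (one column per emotion). The function names and the example data are hypothetical and are not taken from the paper. Accuracy is omitted because the abstract does not say which multi-label definition was used.

```python
from typing import List, Tuple

# Assumed encoding: one row per tweet, one 0/1 entry per emotion label.
Labels = List[List[int]]


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for t_row, p_row in zip(y_true, y_pred)
                for t, p in zip(t_row, p_row))
    return wrong / total


def exact_match(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of tweets whose full label set is predicted exactly."""
    hits = sum(t_row == p_row for t_row, p_row in zip(y_true, y_pred))
    return hits / len(y_true)


def _counts(y_true: Labels, y_pred: Labels, j: int) -> Tuple[int, int, int]:
    """True positives, false positives, false negatives for label column j."""
    tp = sum(t[j] == 1 and p[j] == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t[j] == 0 and p[j] == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t[j] == 1 and p[j] == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn


def _f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Pool the counts over all labels, then compute a single F1."""
    counts = [_counts(y_true, y_pred, j) for j in range(len(y_true[0]))]
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return _f1(tp, fp, fn)


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Compute F1 per label, then take the unweighted mean."""
    n_labels = len(y_true[0])
    scores = [_f1(*_counts(y_true, y_pred, j)) for j in range(n_labels)]
    return sum(scores) / n_labels


# Toy example: 4 tweets, 3 hypothetical emotion labels.
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

print(round(hamming_loss(y_true, y_pred), 4))  # 2 of 12 slots are wrong
print(exact_match(y_true, y_pred))             # 2 of 4 rows match exactly
print(round(micro_f1(y_true, y_pred), 4))
print(round(macro_f1(y_true, y_pred), 4))
```

Micro-averaging weights every label decision equally, so frequent emotions dominate the score, while macro-averaging weights every emotion equally regardless of how often it occurs. Reporting both, as the abstract does, shows whether rare emotions are handled as well as common ones.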


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9044368
DOI: http://dx.doi.org/10.7717/peerj-cs.896

Publication Analysis

Top Keywords

multi-label emotion: 12
emotion classification: 8
emotions urdu: 8
urdu: 6
multi-label: 4
classification urdu: 4
urdu tweets: 4
tweets urdu: 4
urdu language: 4
language south: 4

Similar Publications

In the era of social media, the use of emojis and code-mixed language has become essential in online communication. However, selecting the appropriate emoji that matches a particular sentiment or emotion in the code-mixed text can be difficult. This paper presents a novel task of predicting multiple emojis in English-Hindi code-mixed sentences and proposes a new dataset called SENTIMOJI, which extends the SemEval 2020 Task 9 SentiMix dataset.


In the dynamic domain of logistics, effective communication is essential for streamlined operations. Our innovative solution, the Multi-Labeling Ensemble (MLEn), tackles the intricate task of extracting multi-labeled data, employing advanced techniques for accurate preprocessing of textual data through the NLTK toolkit. This approach is carefully tailored to the prevailing language used in logistics communication.


Background: Geriatric depression and anxiety have been identified as mood disorders commonly associated with the onset of dementia. Currently, the diagnosis of geriatric depression and anxiety relies on self-reported assessments for primary screening purposes, which is uncomfortable for older adults and can be prone to misreporting. When a more precise diagnosis is needed, additional methods such as in-depth interviews or functional magnetic resonance imaging are used.


Multilabel multiclass sentiment and emotion dataset from Indonesian mobile application review.

Data Brief

October 2023

Computer Science Department, School of Computer Science, Bina Nusantara University Bandung Campus, Jakarta, Indonesia 11480.

Reviews are a person's way of expressing feedback on something in the form of criticism and ideas. Reviews of mobile apps are a type of user feedback that focuses on the performance and look of a mobile application and is typically featured on the download page of a mobile application, such as in the Apps Store. Because it comprises a person's feelings and emotions, whether they are joyful, sad, hostile, or indifferent toward a mobile application, the review data is textual and may be gathered and utilized as material for creating a textual dataset.


Psychotic disorder diseases (PDD), or mental illnesses, are a group of illnesses that affect the mind, impair cognitive ability, retard emotional ability, and obstruct communication and relationships with others. They are characterized by delusions, hallucinations, and disoriented or disordered patterns of thinking. Prognosis of PDD is not sufficient because of the nature of the diseases, and as such an adequate form of diagnosis is required to detect, manage, and treat the illness. This paper applied the single-label classification (SLC) machine learning approach to mining the electronic health records of people with PDD in Nigeria, using eleven independent (demographic) variables and five PDD as target variables.

