The prevalence of cyberbullying has reached an alarming rate, affecting approximately 54% of teenagers who experience various forms of cyberbullying, including offensive hate speech, threats, and racism. This research introduces a comprehensive dataset and system for cyberbullying detection in Urdu tweets, leveraging a spectrum of machine learning approaches including traditional models and advanced deep learning techniques. The objectives of this study are threefold. Firstly, a dataset consisting of 12,500 annotated tweets in Urdu is created, and it is made publicly available to the research community. Secondly, annotation guidelines for Urdu text with appropriate labels for cyberbullying detection are developed. Finally, a series of experiments is conducted to assess the performance of machine learning and deep learning techniques in detecting cyberbullying. The results indicate that fastText deep learning models outperform other models in cyberbullying detection. This study demonstrates its efficacy in effectively detecting and classifying cyberbullying incidents in Urdu tweets, contributing to the broader effort of creating a safer digital environment.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11065408PMC
http://dx.doi.org/10.7717/peerj-cs.1963DOI Listing

Publication Analysis

Top Keywords

urdu tweets
12
cyberbullying detection
12
deep learning
12
comprehensive dataset
8
machine learning
8
learning techniques
8
cyberbullying
7
urdu
5
learning
5
addressing cyberbullying
4

Similar Publications

Sarcasm detection has emerged due to its applicability in natural language processing (NLP) but lacks substantial exploration in low-resource languages like Urdu, Arabic, Pashto, and Roman-Urdu. While fewer studies identifying sarcasm have focused on low-resource languages, most of the work is in English. This research addresses the gap by exploring the efficacy of diverse machine learning (ML) algorithms in identifying sarcasm in Urdu.

View Article and Find Full Text PDF

The prevalence of cyberbullying has reached an alarming rate, affecting approximately 54% of teenagers who experience various forms of cyberbullying, including offensive hate speech, threats, and racism. This research introduces a comprehensive dataset and system for cyberbullying detection in Urdu tweets, leveraging a spectrum of machine learning approaches including traditional models and advanced deep learning techniques. The objectives of this study are threefold.

View Article and Find Full Text PDF

Low-resource languages are gaining much-needed attention with the advent of deep learning models and pre-trained word embedding. Though spoken by more than 230 million people worldwide, Urdu is one such low-resource language that has recently gained popularity online and is attracting a lot of attention and support from the research community. One challenge faced by such resource-constrained languages is the scarcity of publicly available large-scale datasets for conducting any meaningful study.

View Article and Find Full Text PDF

Multi-label emotion classification of Urdu tweets.

PeerJ Comput Sci

April 2022

CIC, Instituto Politécnico Nacional, Mexico City, Mexico.

Urdu is a widely used language in South Asia and worldwide. While there are similar datasets available in English, we created the first multi-label emotion dataset consisting of 6,043 tweets and six basic emotions in the Urdu Nastalíq script. A multi-label (ML) classification approach was adopted to detect emotions from Urdu.

View Article and Find Full Text PDF

Working from a dataset of 118 billion messages running from the start of 2009 to the end of 2019, we identify and explore the relative daily use of over 150 languages on Twitter. We find that eight languages comprise 80% of all tweets, with English, Japanese, Spanish, Arabic, and Portuguese being the most dominant. To quantify social spreading in each language over time, we compute the 'contagion ratio': The balance of retweets to organic messages.

View Article and Find Full Text PDF

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!