Arabic paraphrased parallel synthetic dataset.

Data Brief

Information Technology Department, King Saud University, Riyadh, Saudi Arabia.

Published: December 2024

AI Article Synopsis

  • The Arabic paraphrased parallel dataset supports natural language processing (NLP) applications such as machine translation, text summarization, and sentiment analysis. It draws on sentences from diverse sources and expands them through data augmentation.
  • It can also help improve educational resources, optimize search engines, and support content creation and semantic analysis of context and meaning in Arabic.
  • The dataset is built by collecting, preprocessing, and evaluating sentences, with paraphrases generated through back-translation, to fill the research gap in Arabic paraphrase generation and foster innovation in Arabic language processing.

Article Abstract

The Arabic paraphrased parallel dataset supports NLP and other language-related applications by drawing on data from diverse sources and expanding it through data augmentation. It can benefit machine translation, text summarization, and sentiment analysis, and it helps in understanding and manipulating Arabic text. It also serves as a tool for improving educational materials, optimizing search engines, and supporting content creation across various fields. Its role in semantic analysis aids in understanding context and meaning, which makes it useful for domain-specific applications. The main aim of building this dataset is to generate paraphrased sentences through synthetic augmentation using back-translation, addressing the shortage of research and datasets focused on paraphrase generation in Arabic. The process involves collecting sentences from various sources, then preprocessing and evaluating them to ensure reliability and usefulness. This systematic approach aims to produce a robust Arabic paraphrased dataset that can be used in various NLP tasks and foster further innovation in Arabic language processing.
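
The collect, preprocess, and back-translate steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the abstract does not name the pivot language, the translation system, or the preprocessing rules, so the English pivot, the toy dictionary translators (`toy_to_pivot`, `toy_from_pivot`), and the diacritic-stripping `normalize` step are all assumptions made for the example.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Arabic diacritics (U+064B..U+0652) and tatweel (U+0640).
# Stripping them is an assumed preprocessing choice, not one stated in the source.
_MARKS = re.compile(r"[\u064B-\u0652\u0640]")

Translator = Callable[[str], str]


def normalize(sentence: str) -> str:
    """Remove diacritics and tatweel, then collapse runs of whitespace."""
    return " ".join(_MARKS.sub("", sentence).split())


def back_translate(
    sentences: Iterable[str],
    to_pivot: Translator,
    from_pivot: Translator,
) -> List[Tuple[str, str]]:
    """Return (source, paraphrase) pairs produced by a round trip through a pivot language."""
    pairs: List[Tuple[str, str]] = []
    seen = set()
    for raw in sentences:
        source = normalize(raw)
        # Skip empty lines and sentences already processed.
        if not source or source in seen:
            continue
        seen.add(source)
        paraphrase = normalize(from_pivot(to_pivot(source)))
        # Keep only round trips that actually changed the wording.
        if paraphrase and paraphrase != source:
            pairs.append((source, paraphrase))
    return pairs


# Hypothetical stand-ins for a real machine translation system (pivot: English).
_AR_TO_EN = {
    "الطقس جميل اليوم": "The weather is nice today",
    "أحب القراءة": "I love reading",
}
_EN_TO_AR = {
    "The weather is nice today": "الجو لطيف اليوم",
    "I love reading": "أحب القراءة",
}


def toy_to_pivot(text: str) -> str:
    return _AR_TO_EN.get(text, text)


def toy_from_pivot(text: str) -> str:
    return _EN_TO_AR.get(text, text)


if __name__ == "__main__":
    corpus = ["الطقس جميل اليوم", "أحب القراءة"]
    for source, paraphrase in back_translate(corpus, toy_to_pivot, toy_from_pivot):
        print(source, "->", paraphrase)
```

In a real setting the two toy functions would be replaced by calls to an actual translation model, and the exact-match filter would likely be complemented by a semantic similarity check during evaluation; both choices are left open here because the abstract does not specify them.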

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11533034
DOI: http://dx.doi.org/10.1016/j.dib.2024.111004
