This data paper introduces a dataset for word sense disambiguation (WSD) covering one hundred polysemous words that are frequently used in Modern Standard Arabic. Each word has between 3 and 8 senses, for a total of 367 unique senses. Each sense is accompanied by ten example sentences that use the polysemous word in context, giving 3670 samples overall. The dataset is in Arabic, a language known for its rich morphology, complex syntax, and extensive polysemy. The data were collected between 1 April 2023 and 1 May 2023 from a variety of web sources spanning news, medicine, finance, and other domains. This breadth makes the dataset applicable across diverse fields and a useful resource for Arabic Natural Language Processing (NLP) applications. The dataset covers all senses of each frequently used polysemous word, including rare senses that seldom appear in real-world text, which helps mitigate bias in models trained on it. Collection proceeded in three steps: initial web scraping; manual sorting of the scraped sentences by word sense, supplemented by thorough searches by a human expert to fill in missing contextual sentences; and, where online data for a rare sense was still lacking or insufficient, generation of synthetic sentences with GPT-3.5-turbo. Beyond its primary use in word sense disambiguation, the dataset may also be valuable to researchers in other areas, such as sentiment analysis.
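The counts stated in the abstract can be cross-checked with a few lines of Python. This is a minimal sketch only: the `Sample` record and its field names (`word`, `sense_id`, `sentence`) are hypothetical and do not reflect the dataset's actual file layout, which the abstract does not describe. Only the numbers (367 senses, ten sentences per sense, 3 to 8 senses per word) come from the text.

```python
from collections import Counter
from dataclasses import dataclass

# Figures taken from the abstract.
SENTENCES_PER_SENSE = 10
TOTAL_SENSES = 367
MIN_SENSES, MAX_SENSES = 3, 8


@dataclass(frozen=True)
class Sample:
    """One contextual sentence. Field names are assumptions, not the real schema."""
    word: str
    sense_id: str
    sentence: str


def expected_sample_count(total_senses=TOTAL_SENSES, per_sense=SENTENCES_PER_SENSE):
    """Total samples implied by a fixed number of sentences per sense."""
    return total_senses * per_sense


def check_layout(samples):
    """Return a list of human-readable problems; an empty list means consistent."""
    per_sense = Counter((s.word, s.sense_id) for s in samples)
    senses_per_word = Counter(word for word, _ in per_sense)
    problems = []
    for (word, sense_id), n in per_sense.items():
        if n != SENTENCES_PER_SENSE:
            problems.append(f"{word}/{sense_id}: {n} sentences")
    for word, n in senses_per_word.items():
        if not MIN_SENSES <= n <= MAX_SENSES:
            problems.append(f"{word}: {n} senses")
    return problems


if __name__ == "__main__":
    print(expected_sample_count())
    # Placeholder data: one made-up word with three senses of ten sentences each.
    toy = [
        Sample("word_1", f"sense_{k}", f"sentence {i}")
        for k in range(3)
        for i in range(SENTENCES_PER_SENSE)
    ]
    print(check_layout(toy))
```

Run against the real files (after mapping them to whatever schema they actually use), a check of this kind would flag any sense with more or fewer than ten sentences and any word outside the 3 to 8 sense range.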
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11222923 | PMC |
http://dx.doi.org/10.1016/j.dib.2024.110591 | DOI Listing |