Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity.

Data Brief

IXA NLP Group, Faculty of Informatics, UPV/EHU, Manuel Lardizabal 1, 20018, Donostia, Basque Country, Spain.

Published: October 2019

AI Article Synopsis

  • The article presents a reproducibility dataset aimed at replicating experiments and results from the authors' earlier work on ontology-based semantic similarity and Word Embeddings.
  • The dataset compiles the raw word-similarity values returned by each method; an R-language script then processes these files to compute the evaluation metrics and generate the data tables.
  • Additionally, it offers tools to conduct new word similarity benchmarks, enabling further exploration of the topic using different methods or datasets.

Article Abstract

This data article introduces a reproducibility dataset that allows the exact replication of all experiments, results and data tables introduced in our companion paper (Lastra-Díaz et al., 2019), which presents the largest experimental survey on ontology-based semantic similarity methods and Word Embeddings (WE) for word similarity reported in the literature. The implementation of all our experiments, as well as the gathering of all raw data derived from them, was based on the software implementation and evaluation of all methods in the HESML library (Lastra-Díaz et al., 2017), and their subsequent recording with ReproZip (Chirigati et al., 2016). The raw data consists of a collection of data files gathering the raw word-similarity values returned by each method for each word pair evaluated in any benchmark. Raw data files were processed by running an R-language script that computes all evaluation metrics reported in (Lastra-Díaz et al., 2019), such as Pearson and Spearman correlation, harmonic score and statistical significance p-values, and automatically generates all data tables shown in our companion paper. Our dataset provides all input data files, resources and complementary software tools needed to reproduce from scratch all our experimental data, statistical analysis and reported data. Finally, our reproducibility dataset provides a self-contained experimentation platform for running new word similarity benchmarks by setting up new experiments that include methods or word similarity benchmarks not considered here.
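As an illustration of the metrics named in the abstract, the following is a minimal Python sketch, not the authors' R script, of how Pearson correlation, Spearman correlation and a harmonic score could be computed from the raw similarity values of one method on one benchmark. The function names and the sample values are hypothetical, and the harmonic score is assumed here to be the harmonic mean of the Pearson and Spearman coefficients, h = 2rρ/(r + ρ); statistical significance p-values are not covered.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def harmonic_score(r, rho):
    """Assumed definition: harmonic mean of Pearson (r) and Spearman (rho)."""
    return 2 * r * rho / (r + rho)


# Hypothetical values: human judgments and one method's raw similarities
# for the same five word pairs of a benchmark.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
method = [0.1, 0.3, 0.2, 0.8, 0.9]

r = pearson(human, method)
rho = spearman(human, method)
print(r, rho, harmonic_score(r, rho))
```

In this sketch each method and benchmark pair would yield one (r, ρ, h) triple, which is the kind of value the data tables in the companion paper report.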

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6736772
DOI: http://dx.doi.org/10.1016/j.dib.2019.104432

Publication Analysis

Top Keywords

word similarity: 16
reproducibility dataset: 12
methods word: 12
lastra-díaz et al: 12
raw data: 12
data files: 12
data: 10
experimental survey: 8
word embeddings: 8
data tables: 8

Similar Publications

The Role of Aspect During Deverbal Word Processing in Greek.

J Psycholinguist Res

January 2025

Department of Comparative and General Linguistics, Faculty of Arts, University of Ljubljana, Ljubljana, Slovenia.

Deverbal formations in Greek, e.g. mi'razo 'to distribute' < 'mirazma 'distributing' are considered morphologically complex lexical items.

In literate adults, an area along the left posterior fusiform gyrus that is often referred to as the "visual word form area" (VWFA) responds particularly strongly to written characters compared to other visually similar stimuli. Theoretical accounts differ in whether they attribute the strong left-lateralization of the VWFA to a left-hemisphere bias towards visual features used in script, to competition of visual word form processing with that of other visual stimuli processed in the same general cortical territory (especially faces), or to the well-established left-lateralization of the language system. Here we used functional magnetic resonance imaging to test the last hypothesis by investigating lateralization of the VWFA in participants (male and female) who have right-hemisphere language due to a large left-hemisphere perinatal stroke.

Continuous theta-burst stimulation demonstrates language-network-specific causal effects on syntactic processing.

Neuroimage

January 2025

Max Planck Partner Group, School of International Chinese Language Education, Beijing Normal University, Beijing, China; Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.

Hierarchical syntactic structure processing is proposed to be at the core of the human language faculty. Syntactic processing is supported by the left fronto-temporal language network, including a core area in the inferior frontal gyrus as well as its interaction with the posterior temporal lobe (i.e.

Deciphering compromised speech-in-noise intelligibility in older listeners: the role of cochlear synaptopathy.

eNeuro

January 2025

Hearing Technology @ WAVES, Department of Information Technology, Ghent University, Technologiepark 216, 9052 Zwijnaarde, Belgium.

Speech intelligibility declines with age and sensorineural hearing damage (SNHL). However, it remains unclear whether cochlear synaptopathy (CS), a recently discovered form of SNHL, significantly contributes to this issue. CS refers to damaged auditory-nerve synapses that innervate the inner hair cells and there is currently no go-to diagnostic test available.

Objective: This study evaluates the utility of word embeddings, generated by large language models (LLMs), for medical diagnosis by comparing the semantic proximity of symptoms to their eponymic disease embedding ("eponymic condition") and the mean of all symptom embeddings associated with a disease ("ensemble mean").

Materials and Methods: Symptom data for 5 diagnostically challenging pediatric diseases (CHARGE syndrome, Cowden disease, POEMS syndrome, rheumatic fever, and tuberous sclerosis) were collected from PubMed. Using the Ada-002 embedding model, disease names and symptoms were translated into vector representations in a high-dimensional space.
