This data article introduces a reproducibility dataset that allows the exact replication of all experiments, results and data tables reported in our companion paper (Lastra-Díaz et al., 2019), which presents the largest experimental survey of ontology-based semantic similarity methods and Word Embeddings (WE) for word similarity reported in the literature. All experiments, and the raw data derived from them, were produced by implementing and evaluating all methods in the HESML library (Lastra-Díaz et al., 2017) and then recording their execution with ReproZip (Chirigati et al., 2016). The raw data consist of a collection of files containing the raw word-similarity values returned by each method for each word pair evaluated in each benchmark. These files were processed with an R-language script that computes all evaluation metrics reported in Lastra-Díaz et al. (2019), such as Pearson and Spearman correlation, harmonic score and statistical significance p-values, and that automatically generates all data tables shown in our companion paper. Our dataset provides all input data files, resources and complementary software tools needed to reproduce from scratch all our experimental data, statistical analysis and reported results. Finally, our reproducibility dataset provides a self-contained experimentation platform for running new word similarity experiments that include methods or benchmarks not considered in our survey.
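As a rough illustration of the evaluation metrics named above, the following Python sketch computes Pearson and Spearman correlations between a method's similarity scores and human ratings, and then combines them. It is not the R script shipped with the dataset. The function names and the sample values are hypothetical, and the harmonic score is assumed here to be the harmonic mean of the two correlations; significance p-values are omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def harmonic_score(r, rho):
    """Assumed definition: harmonic mean of Pearson r and Spearman rho."""
    return 2 * r * rho / (r + rho)


# Hypothetical values, not taken from the dataset.
human_ratings = [3.9, 3.1, 0.4, 1.2, 2.5]
method_scores = [0.95, 0.70, 0.10, 0.35, 0.40]

r = pearson(human_ratings, method_scores)
rho = spearman(human_ratings, method_scores)
h = harmonic_score(r, rho)
```

In this sketch each benchmark would contribute one pair of vectors (human ratings and one method's scores), so a results table would hold one `r`, `rho` and `h` per method and benchmark.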
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6736772 | PMC |
http://dx.doi.org/10.1016/j.dib.2019.104432 | DOI Listing |