This paper analyzes the changes in the fields of speech and natural language processing over the past 5 years (2016-2020). It continues a series of two papers that we published in 2019 on the analysis of the NLP4NLP corpus, which contained articles published in 34 major conferences and journals in the field of speech and natural language processing over a period of 50 years (1965-2015), and which was analyzed with methods developed in the field of NLP, hence its name. The extended NLP4NLP+5 corpus now covers 55 years and comprises close to 90,000 documents [+30% compared with NLP4NLP: as many articles were published in the single year 2020 as over the first 25 years (1965-1989)], 67,000 authors (+40%), 590,000 references (+80%), and approximately 380 million words (+40%).
Background: Outcomes are variables monitored during a clinical trial to assess the impact of an intervention on human health. Automatic assessment of the semantic similarity of trial outcomes is required for a number of tasks, such as detection of outcome switching (unjustified changes to the pre-defined outcomes of a trial) and implementation of Core Outcome Sets (minimal sets of outcomes that should be reported in a particular medical domain).
Objective: We aimed to build an algorithm for assessing the semantic similarity of pairs of primary and reported outcomes.