The 2019 n2c2/OHNLP Track on Clinical Semantic Textual Similarity: Overview.

JMIR Med Inform

Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States.

Published: November 2020

Background: Semantic textual similarity is a common task in the general English domain to assess the degree to which the underlying semantics of 2 text segments are equivalent to each other. Clinical Semantic Textual Similarity (ClinicalSTS) is the semantic textual similarity task in the clinical domain that attempts to measure the degree of semantic equivalence between 2 snippets of clinical text. Due to the frequent use of templates in the Electronic Health Record system, a large amount of redundant text exists in clinical notes, making ClinicalSTS crucial for the secondary use of clinical text in downstream clinical natural language processing applications, such as clinical text summarization, clinical semantics extraction, and clinical information retrieval.
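The abstract does not specify any particular scoring method, but the ClinicalSTS task assigns each sentence pair a similarity score on a 0-5 scale. As a minimal, purely illustrative sketch (the helper names `cosine_similarity` and `sts_score` are assumptions, not from the paper), a naive bag-of-words baseline might look like this:

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two sentences, in [0.0, 1.0]."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def sts_score(a: str, b: str) -> float:
    """Map cosine similarity onto the 0-5 scale used for STS annotation."""
    return 5.0 * cosine_similarity(a, b)

# Hypothetical clinical sentence pair with heavy template overlap.
pair = ("patient denies chest pain", "the patient denies any chest pain")
print(round(sts_score(*pair), 2))
```

Real ClinicalSTS systems (as the Results section notes) rely on neural language models rather than lexical overlap; this baseline only illustrates the input/output shape of the task.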

Objective: Our objective was to release ClinicalSTS data sets and to motivate natural language processing and biomedical informatics communities to tackle semantic text similarity tasks in the clinical domain.

Methods: We organized the first BioCreative/OHNLP ClinicalSTS shared task in 2018 by making available a real-world ClinicalSTS data set. We continued the shared task in 2019 in collaboration with National NLP Clinical Challenges (n2c2) and the Open Health Natural Language Processing (OHNLP) consortium and organized the 2019 n2c2/OHNLP ClinicalSTS track. We released a larger ClinicalSTS data set comprising 2054 clinical sentence pairs, including 1068 pairs from the 2018 shared task and 1006 new pairs from 2 electronic health record systems, GE and Epic. We released 80% (1642/2054) of the data to participating teams to develop and fine-tune their semantic textual similarity systems and used the remaining 20% (412/2054) as a blind test set to evaluate their systems. The workshop was held in conjunction with the American Medical Informatics Association 2019 Annual Symposium.

Results: Of the 78 international teams that signed on to the n2c2/OHNLP ClinicalSTS shared task, 33 produced a total of 87 valid system submissions. The top 3 systems were generated by IBM Research, the National Center for Biotechnology Information, and the University of Florida, with Pearson correlations of r=.9010, r=.8967, and r=.8864, respectively. Most top-performing systems used state-of-the-art neural language models, such as BERT and XLNet, and state-of-the-art training schemas in deep learning, such as pretraining and fine-tuning schema, and multitask learning. Overall, the participating systems performed better on the Epic sentence pairs than on the GE sentence pairs, despite a much larger portion of the training data being GE sentence pairs.
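Systems in the track were ranked by the Pearson correlation between their predicted scores and the gold-standard annotations. The paper does not include code for this; the following self-contained sketch (the function name `pearson_r` and the sample scores are hypothetical) shows how that evaluation metric is computed:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between system scores xs and gold scores ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

gold = [4.5, 2.0, 0.5, 3.5, 5.0]      # hypothetical annotator scores (0-5)
system = [4.2, 2.4, 1.0, 3.1, 4.8]    # hypothetical system predictions
print(round(pearson_r(system, gold), 4))
```

A correlation of r=.9010, as achieved by the top system, means the predicted scores track the human annotations very closely across the test pairs.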

Conclusions: The 2019 n2c2/OHNLP ClinicalSTS shared task focused on computing semantic similarity for clinical text sentences generated from clinical notes in the real world. It attracted a large number of international teams. The ClinicalSTS shared task could continue to serve as a venue for researchers in natural language processing and medical informatics communities to develop and improve semantic textual similarity techniques for clinical text.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7732706
DOI: http://dx.doi.org/10.2196/23375


Similar Publications

Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval.

Neural Netw

December 2024

School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China; Beijing Key Laboratory of Network System and Network Culture, Beijing, China.

The goal of Text-to-Image Person Retrieval (TIPR) is to retrieve specific person images according to given textual descriptions. A primary challenge in this task is bridging the substantial representational gap between the visual and textual modalities. The prevailing methods map texts and images into a unified embedding space for matching, but the intricate semantic correspondences between texts and images are still not effectively captured.


Dual-tower model with semantic perception and timespan-coupled hypergraph for next-basket recommendation.

Neural Netw

December 2024

Intelligent Financial Software Engineering New Technology Joint Laboratory, Xidian University, Xi'an, 710071, China; Shanghai Fairyland Software Corp., Ltd., Shanghai, 200233, China. Electronic address:

Next basket recommendation (NBR) is an essential task in recommendation systems, dedicated to anticipating a user's preferences at the next moment based on analysis of the user's historical sequence of engaged baskets. Current NBR models utilise unique identity (ID) information to represent distinct users and items and focus on capturing the dynamic preferences of users through sequential encoding techniques such as recurrent neural networks and hierarchical time-decay modelling, which have dominated the NBR field for more than a decade. However, these models exhibit two significant limitations, resulting in suboptimal representations for both users and items.


Introduction: The escalating complexity of medical literature necessitates tools to enhance readability for patients. This study aimed to evaluate the efficacy of ChatGPT-4 in simplifying neurology and neurosurgical abstracts and patient education materials (PEMs) while assessing content preservation using Latent Semantic Analysis (LSA).

Methods: A total of 100 abstracts (25 each from Neurosurgery, Journal of Neurosurgery, Lancet Neurology, and JAMA Neurology) and 340 PEMs (66 from the American Association of Neurological Surgeons, 274 from the American Academy of Neurology) were transformed by GPT-4.


VIIDA and InViDe: computational approaches for generating and evaluating inclusive image paragraphs for the visually impaired.

Disabil Rehabil Assist Technol

December 2024

Department of Informatics, Universidade Federal de Viçosa - UFV, Viçosa, Brazil.

Background: Existing image description methods, when used as assistive technologies, often fall short of meeting the needs of blind or low-vision (BLV) individuals. They tend to either compress all visual elements into brief captions, create disjointed sentences for each image region, or provide overly extensive descriptions.

Purpose: To address these limitations, we introduce VIIDA, a procedure aimed at the Visually Impaired which implements an Image Description Approach, focusing on webinar scenes.


In the context of high-quality economic development, technological innovation has emerged as a fundamental driver of socio-economic progress. The consequent proliferation of science and technology news, which serves as a vital medium for disseminating technological advances and policy changes, has attracted considerable attention from technology management agencies and innovation organizations. Nevertheless, online science and technology news has historically been limited in scale, disorderly, and multi-dimensional, making deep application extremely inconvenient for users.

