Background: Semantic textual similarity (STS) captures the degree of semantic similarity between texts. It plays an important role in many natural language processing applications such as text summarization, question answering, machine translation, information retrieval, dialog systems, plagiarism detection, and query ranking. STS has been widely studied in the general English domain. However, few resources exist for STS tasks in the clinical domain and in languages other than English, such as Japanese.

Objective: The objective of this study is to capture semantic similarity between Japanese clinical texts (Japanese clinical STS) by creating a Japanese dataset that is publicly available.

Materials: We created two datasets for Japanese clinical STS: (1) Japanese case reports (CR dataset) and (2) Japanese electronic medical records (EMR dataset). The CR dataset was created from publicly available case reports extracted from the CiNii database. The EMR dataset was created from Japanese electronic medical records.

Methods: We used an approach based on bidirectional encoder representations from transformers (BERT) to capture the semantic similarity between the clinical domain texts. BERT is a popular approach for transfer learning and has proven effective in achieving high accuracy on small datasets. We implemented two Japanese pretrained BERT models: a general Japanese BERT and a clinical Japanese BERT. The general Japanese BERT is pretrained on Japanese Wikipedia texts, whereas the clinical Japanese BERT is pretrained on Japanese clinical texts.

Results: The BERT models performed well in capturing semantic similarity in our datasets. The general Japanese BERT outperformed the clinical Japanese BERT and achieved a high correlation with human scores (0.904 in the CR dataset and 0.875 in the EMR dataset). It was unexpected that the general Japanese BERT outperformed the clinical Japanese BERT on the clinical domain datasets. This may be because the general Japanese BERT is pretrained on a wider range of texts than the clinical Japanese BERT.
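
To make the evaluation metric concrete, the following is a minimal sketch of how a correlation between human similarity scores and model predictions can be computed. It assumes the Pearson correlation coefficient, which is a common choice for STS but is not named in the abstract. The function name and the score values are hypothetical and serve only as an illustration; they are not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for six sentence pairs (illustration only, not study data).
human_scores = [0.0, 1.5, 2.0, 3.5, 4.0, 5.0]
model_scores = [0.4, 1.1, 2.6, 3.0, 4.3, 4.7]

r = pearson(human_scores, model_scores)
print(f"Pearson r = {r:.3f}")
```

A value near 1 indicates that the model ranks and spaces sentence pairs much as the human annotators do, which is how figures such as 0.904 and 0.875 are typically read.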

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294940 (PMC)
http://dx.doi.org/10.1055/s-0041-1731390 (DOI)
