Relation extraction using large language models: a case study on acupuncture point locations.

J Am Med Inform Assoc

Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, United States.

Published: November 2024

AI Article Synopsis

  • The study investigates the effectiveness of large language models (LLMs), particularly a fine-tuned GPT-3.5, in extracting acupoint location relations from the WHO Standard Acupuncture Point Locations corpus.
  • The researchers compared four models across five relation types and found that fine-tuned GPT-3.5 achieved the highest micro-average F1 score, 0.92.
  • The findings highlight the importance of domain-specific fine-tuning in improving model performance for acupuncture, paving the way for better clinical decision support and educational tools in this field.

Article Abstract

Objective: In acupuncture therapy, the accurate location of acupoints is essential for its effectiveness. The advanced language understanding capabilities of large language models (LLMs) like Generative Pre-trained Transformers (GPTs) and Llama present a significant opportunity for extracting relations related to acupoint locations from textual knowledge sources. This study aims to explore the performance of LLMs in extracting acupoint-related location relations and assess the impact of fine-tuning on GPT's performance.

Materials And Methods: We utilized the World Health Organization Standard Acupuncture Point Locations in the Western Pacific Region (WHO Standard) as our corpus, which consists of descriptions of 361 acupoints. Five types of relations ("direction_of", "distance_of", "part_of", "near_acupoint", and "located_near") (n = 3174) between acupoints were annotated. Four models were compared: pre-trained GPT-3.5, fine-tuned GPT-3.5, pre-trained GPT-4, and pre-trained Llama 3. Performance metrics included micro-average exact-match precision, recall, and F1 scores.
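
The abstract does not specify how the fine-tuning data were formatted, but a minimal sketch of one plausible setup follows, assuming the chat-style JSONL format used for OpenAI fine-tuning. The five relation labels come from the study; the prompt wording, the example sentence, the triple schema, and the file name are illustrative assumptions, not the authors' pipeline:

```python
import json

# The five relation types annotated in the study.
RELATION_TYPES = [
    "direction_of", "distance_of", "part_of", "near_acupoint", "located_near",
]

def make_example(description: str, relations: list[dict]) -> dict:
    """Pack one annotated acupoint description into a chat-format training record."""
    return {
        "messages": [
            {"role": "system",
             "content": ("Extract acupoint location relations from the text. "
                         f"Allowed relation types: {', '.join(RELATION_TYPES)}. "
                         "Return a JSON list of (head, relation, tail) triples.")},
            {"role": "user", "content": description},
            {"role": "assistant", "content": json.dumps(relations)},
        ]
    }

# Illustrative sentence in the style of the WHO Standard; the annotation
# below is a hypothetical example, not taken from the actual corpus.
example = make_example(
    "LI11 is located at the lateral end of the cubital crease, "
    "midway between LU5 and the lateral epicondyle of the humerus.",
    [{"head": "LI11", "relation": "located_near", "tail": "cubital crease"},
     {"head": "LI11", "relation": "near_acupoint", "tail": "LU5"}],
)

with open("acupoint_relations_train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```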

Results: Our results demonstrate that fine-tuned GPT-3.5 consistently outperformed other models in F1 scores across all relation types. Overall, it achieved the highest micro-average F1 score of 0.92.
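
For reference, micro-averaged exact-match scores pool true positives, false positives, and false negatives across all relation types before computing precision, recall, and F1. A minimal sketch, assuming predicted and gold relations are represented as sets of (head, relation, tail) tuples per acupoint description; the data layout is an assumption for illustration:

```python
def micro_prf(gold: list[set], pred: list[set]) -> tuple[float, float, float]:
    """Micro-averaged exact-match precision/recall/F1 over all descriptions.

    Each element of `gold`/`pred` is the set of (head, relation, tail)
    tuples for one description; a prediction counts only if it matches
    a gold triple exactly.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # exact matches
        fp += len(p - g)   # spurious predictions
        fn += len(g - p)   # missed gold relations
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```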

Discussion: The superior performance of the fine-tuned GPT-3.5 model, as shown by its F1 scores, underscores the importance of domain-specific fine-tuning in enhancing relation extraction capabilities for acupuncture-related tasks. These findings offer valuable insights into leveraging LLMs to develop clinical decision support tools and educational modules in acupuncture.

Conclusion: This study underscores the effectiveness of LLMs like GPT and Llama in extracting relations related to acupoint locations, with implications for accurately modeling acupuncture knowledge and promoting standard implementation in acupuncture training and practice. The findings also contribute to advancing informatics applications in traditional and complementary medicine, showcasing the potential of LLMs in natural language processing.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11491641 (PMC)
http://dx.doi.org/10.1093/jamia/ocae233 (DOI)

Publication Analysis

Top Keywords

fine-tuned GPT-3.5 (12)
relation extraction (8)
large language (8)
language models (8)
acupuncture point (8)
point locations (8)
extracting relations (8)
relations acupoint (8)
acupoint locations (8)
acupuncture (5)
