Knowledge graph construction for heart failure using large language models with prompt engineering.

Tianhan Xu Yixun Gu Mantian Xue Renjie Gu Bin Li Xiang Gu

Front Comput Neurosci

Department of Cardiovascular, Northern Jiangsu Province People Hospital of Yangzhou University, Yangzhou, Jiangsu, China.

Published: July 2024

Constructing a knowledge graph for diseases, specifically heart failure, is important for enhancing clinical diagnosis, treatment, and health management, but current methods often struggle with limited training data and out-of-distribution entities.
This study introduces a pipeline that uses large language models, prompt engineering, and expert refinement to improve the schema design, information extraction, and knowledge completion phases of knowledge graph construction.
Results show that the proposed TwoStepChat method outperforms both vanilla prompting and fine-tuned BERT-based baselines, saves 65% of annotation time, and handles information not present in the training data.

Introduction: Constructing an accurate and comprehensive knowledge graph of specific diseases is critical for practical clinical diagnosis and treatment, reasoning and decision support, rehabilitation, and health management. For knowledge graph construction tasks (such as named entity recognition and relation extraction), classical BERT-based methods require a large amount of training data to ensure model performance. However, real-world medical annotation data, especially disease-specific annotated samples, are very limited. In addition, existing models perform poorly at recognizing out-of-distribution entities and relations, that is, those not seen during training.

Method: In this study, we present a novel and practical pipeline for constructing a heart failure knowledge graph using large language models and medical expert refinement. We apply prompt engineering to the three phases of knowledge graph construction: schema design, information extraction, and knowledge completion. The best performance is achieved by designing task-specific prompt templates combined with the TwoStepChat approach.
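The abstract does not describe how TwoStepChat divides the extraction work. As a hedged illustration only, the sketch below shows a generic two-stage prompting pattern that the name suggests: one prompt asks for typed entities, and a second prompt feeds those entities back and asks only for relations among them, with the output filtered against a schema. The entity types, relation types, prompt wording, and every function name here (`entity_prompt`, `relation_prompt`, `two_step_extract`, `fake_llm`) are hypothetical, and `fake_llm` returns canned replies in place of a real model call. This is not the authors' implementation.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical schema for illustration; not the paper's heart-failure schema.
ENTITY_TYPES = ["Disease", "Drug", "Symptom"]
RELATION_TYPES = ["treats", "has_symptom"]

# Any function that maps a prompt string to the model's reply string.
LLM = Callable[[str], str]


def entity_prompt(text: str) -> str:
    """Step 1 prompt: ask only for typed entities, as a JSON list."""
    return (
        f"Extract entities of types {ENTITY_TYPES} from the text below. "
        'Reply with a JSON list of {"name": ..., "type": ...} objects.\n'
        f"Text: {text}"
    )


def relation_prompt(text: str, names: List[str]) -> str:
    """Step 2 prompt: feed back the step-1 entities and ask only for relations."""
    return (
        f"Given the entities {names}, list relations of types {RELATION_TYPES} "
        "stated in the text. Reply with a JSON list of [head, relation, tail].\n"
        f"Text: {text}"
    )


def two_step_extract(text: str, llm: LLM) -> List[Tuple[str, str, str]]:
    """Run the two prompts in sequence and keep only schema-consistent triples."""
    entities: List[Dict[str, str]] = [
        e for e in json.loads(llm(entity_prompt(text)))
        if e.get("type") in ENTITY_TYPES
    ]
    names = [e["name"] for e in entities]
    triples = json.loads(llm(relation_prompt(text, names)))
    # Drop triples whose endpoints were not extracted in step 1
    # or whose relation type is outside the schema.
    return [
        (h, r, t) for h, r, t in triples
        if h in names and t in names and r in RELATION_TYPES
    ]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: returns canned JSON replies."""
    if prompt.startswith("Extract entities"):
        return json.dumps([
            {"name": "heart failure", "type": "Disease"},
            {"name": "furosemide", "type": "Drug"},
            {"name": "dyspnea", "type": "Symptom"},
        ])
    return json.dumps([
        ["furosemide", "treats", "heart failure"],
        ["heart failure", "has_symptom", "dyspnea"],
        ["aspirin", "treats", "heart failure"],  # not an extracted entity
    ])


if __name__ == "__main__":
    sentence = "Furosemide treats heart failure, which often causes dyspnea."
    print(two_step_extract(sentence, fake_llm))
```

Run as a script, this prints `[('furosemide', 'treats', 'heart failure'), ('heart failure', 'has_symptom', 'dyspnea')]`; the `aspirin` triple is discarded because step 1 never produced that entity. Splitting the task this way lets each prompt stay narrow, and the schema filter acts as a cheap guard before expert refinement.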

Results: Experiments on two datasets show that the TwoStepChat method outperforms both the Vanilla prompt and the fine-tuned BERT-based baselines. Moreover, our method saves 65% of the time compared with manual annotation and is better suited to extracting out-of-distribution information in the real world.

Source (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11250484
DOI: http://dx.doi.org/10.3389/fncom.2024.1389475

Similar Publications

BioGSF: a graph-driven semantic feature integration framework for biomedical relation extraction.

Brief Bioinform

November 2024

Suzhou Key Lab of Multi-modal Data Fusion and Intelligent Healthcare, No. 1188 Wuzhong Avenue, Wuzhong District Suzhou, Suzhou 215004, China.

Yang Yang Zixuan Zheng Yuyang Xu Huifang Wei Wenying Yan

Diverse biomedical relations, automatically and accurately extracted from the literature, constitute the core elements of medical knowledge graphs, which are indispensable for healthcare artificial intelligence. Currently, fine-tuning by stacking various neural networks on pre-trained language models (PLMs) is a common framework for end-to-end biomedical relation extraction (RE). Nevertheless, sequence-based PLMs, to a certain extent, fail to fully exploit the connections between semantics and the topological features formed by these connections.

A Novel Hyper-Heuristic Algorithm with Soft and Hard Constraints for Causal Discovery Using a Linear Structural Equation Model.

Entropy (Basel)

January 2025

School of Electronic and Information, Northwestern Polytechnical University, Xi'an 710129, China.

Yinglong Dang Xiaoguang Gao Zidong Wang

Artificial intelligence plays an indispensable role in improving productivity and promoting social development, and causal discovery is one of the most important research directions in this field. Directed acyclic graphs (DAGs) are the most commonly used tool in causal modeling because of their excellent interpretability and structural properties. However, when data are insufficient, the accuracy and efficiency of DAG learning are greatly reduced, resulting in a false perception of causality.

Few-Shot Graph Anomaly Detection via Dual-Level Knowledge Distillation.

Entropy (Basel)

January 2025

National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China.

Xuan Li Dejie Cheng Luheng Zhang Chengfang Zhang Ziliang Feng

Graph anomaly detection is crucial in many high-impact applications across diverse fields. In anomaly detection tasks, collecting plenty of annotated data tends to be costly and laborious. As a result, few-shot learning has been explored to address the issue by requiring only a few labeled samples to achieve good performance.

A bibliometric review of functional ingredients and their efficacy in developing functional biscuits.

F1000Res

January 2025

Department of Data Science, Prasanna School of Public Health, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India.

Kshama Vishwakarma Varalakshmi Chandra Sekaran Vidya Patwardhan Asha Kamath

Introduction: Numerous studies have concluded that functional ingredients benefit human health. Similarly, recent years have seen exponential growth of functional foods in bakery product segments such as breads and biscuits. However, there is a lack of information on functional ingredients and their usefulness in developing functional bakery products.

Consensus representation of multiple cell-cell graphs from gene signaling pathways for cell type annotation.

BMC Biol

January 2025

Research Office, City University of Hong Kong (Dongguan), Dongguan, 523000, China.

Yu-An Huang Yue-Chao Li Zhu-Hong You Lun Hu Peng-Wei Hu

Background: Recent advancements in single-cell RNA sequencing have greatly expanded our knowledge of the heterogeneous nature of tissues. However, robust and accurate cell type annotation continues to be a major challenge, hindered by issues such as marker specificity, batch effects, and a lack of comprehensive spatial and interaction data. Traditional annotation methods often fail to adequately address the complexity of cellular interactions and gene regulatory networks.
