Objectives: This study evaluated the efficacy of integrating a retrieval-augmented generation (RAG) model and a large language model (LLM) to improve the accuracy of drug name mapping across international vocabularies.

Methods: Drug ingredient names were translated into English using the Japanese Accepted Names for Pharmaceuticals. Drug concepts were extracted from the OHDSI standard vocabulary, and the accuracy of mappings between the translated terms and RxNorm was assessed by vector similarity, with BioBERT-generated embedding vectors serving as the baseline. We then developed LLMs with RAG that identified the final candidates among those produced by the baseline, and assessed their efficacy in candidate selection by comparing them with conventional vector similarity methods.
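The vector similarity baseline described in the Methods can be illustrated with a minimal Python sketch. It ranks candidate concepts by cosine similarity to a query vector. The three-dimensional vectors, the drug names, and the function names `cosine_similarity` and `rank_candidates` are hypothetical stand-ins chosen for illustration; the study used BioBERT embeddings, which are not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query_vec, concept_vecs, top_k=3):
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in concept_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (assumption, not study data).
toy_concepts = {
    "acetaminophen": [0.9, 0.1, 0.0],
    "aspirin": [0.1, 0.9, 0.0],
    "ibuprofen": [0.2, 0.7, 0.3],
}
query = [0.8, 0.2, 0.0]  # hypothetical embedding of a translated drug name

top = rank_candidates(query, toy_concepts, top_k=2)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

In a setup like the one described, a ranked list of this kind would be the candidate pool from which the LLM with RAG picks its final answer.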

Results: The evaluation metrics demonstrated the superior performance of the combined LLM + RAG approach over traditional vector similarity methods. Notably, the hit rates of the Mixtral 8x7B and GPT-3.5 models exceeded 90%, significantly outperforming the baseline rate of 64% across stratified groups of oral drugs, injections, and all interventions. Furthermore, R-precision, which measures the alignment between model judgment and human evaluation, improved notably with the LLMs, ranging from 41% to 50% compared with 23% for the baseline.
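The two metrics reported in the Results can be sketched as follows, assuming the standard information-retrieval definitions: hit rate is the share of queries with at least one correct concept among the top k candidates, and R-precision is the precision over the top R candidates, where R is the number of concepts judged correct by human reviewers. The abstract does not spell out the exact formulas, so these definitions, the function names, and the toy data are assumptions.

```python
def hit_rate(ranked_lists, relevant_sets, k):
    """Fraction of queries whose top-k candidates include at least one relevant item."""
    hits = sum(
        1
        for ranked, relevant in zip(ranked_lists, relevant_sets)
        if any(candidate in relevant for candidate in ranked[:k])
    )
    return hits / len(ranked_lists)


def r_precision(ranked, relevant):
    """Precision over the top R candidates, where R = number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for candidate in ranked[:r] if candidate in relevant) / r


# Toy rankings and human judgments (hypothetical, not study data).
ranked_lists = [["a", "b", "c"], ["x", "y", "z"]]
relevant_sets = [{"a", "c"}, {"z"}]

print(hit_rate(ranked_lists, relevant_sets, k=2))
print(r_precision(ranked_lists[0], relevant_sets[0]))
print(r_precision(ranked_lists[1], relevant_sets[1]))
```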

Conclusions: Integrating RAG with an LLM outperformed conventional string comparison and embedding vector similarity techniques, offering a more refined approach to global drug information mapping.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11570653
DOI: http://dx.doi.org/10.4258/hir.2024.30.4.355
