Inferring cancer disease response from radiology reports using large language models with data augmentation and prompting.

J Am Med Inform Assoc

Department of Computer Science, National University of Singapore, Singapore.

Published: September 2023

- The study aimed to evaluate large language models' ability to interpret cancer treatment outcomes from free-text radiology reports, using a dataset of over 10,000 CT reports categorized by disease response.
- Several models, including transformer models and conventional machine learning methods, were tested. The GatorTron transformer achieved the highest accuracy (approximately 89%), and data augmentation raised it slightly further.
- The findings suggest that these models could help researchers analyze large datasets for cancer progression and give clinicians automated insights into patient disease response.

Objective: To assess large language models on their ability to accurately infer cancer disease response from free-text radiology reports.

Materials and Methods: We assembled 10 602 computed tomography reports from cancer patients seen at a single institution. Each report was classified into one of four categories: no evidence of disease, partial response, stable disease, or progressive disease. We applied transformer models, a bidirectional long short-term memory model, a convolutional neural network model, and conventional machine learning methods to this task. Data augmentation using sentence permutation with consistency loss, as well as prompt-based fine-tuning, was applied to the best-performing models. Models were validated on a hold-out test set and on an external validation set based on Response Evaluation Criteria in Solid Tumors (RECIST) classifications.
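The sentence-permutation augmentation and consistency loss mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the naive sentence splitter, the function names, and the choice of a symmetric KL divergence as the consistency term are all assumptions made for illustration, since the abstract does not specify the exact form of the loss.

```python
import math
import random
import re


def split_sentences(report):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.
    Decimals such as '2.1 cm' are left intact because no space follows the dot."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def permute_sentences(report, rng):
    """Return an augmented report with the same sentences in shuffled order."""
    sentences = split_sentences(report)
    rng.shuffle(sentences)
    return " ".join(sentences)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the response classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(p_original, p_augmented):
    """Assumed form: symmetric KL between the model's class probabilities
    for the original report and for its permuted copy."""
    return 0.5 * (
        kl_divergence(p_original, p_augmented)
        + kl_divergence(p_augmented, p_original)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    report = "Liver lesion measures 2.1 cm. No new lesions are seen. Findings are stable."
    print(permute_sentences(report, rng))
    # Hypothetical class probabilities for the two versions of the report.
    print(consistency_loss([0.1, 0.1, 0.7, 0.1], [0.1, 0.2, 0.6, 0.1]))
```

In training, a term like this would typically be added to the usual classification loss so that the model is penalized when its prediction changes merely because the sentences were reordered.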

Results: The best-performing model was the GatorTron transformer, which achieved an accuracy of 0.8916 on the test set and 0.8919 on the RECIST validation set. Data augmentation further improved the accuracy to 0.8976. Prompt-based fine-tuning did not further improve accuracy, but it reduced the number of training reports needed to 500 while still achieving good performance.
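To make the idea of prompt-based fine-tuning concrete, the following sketch shows one common way to recast classification as a fill-in-the-blank task. The template wording, the label words, and the function names are hypothetical; the abstract does not describe the prompt the authors actually used, and the scores passed in below stand in for a masked language model's outputs.

```python
# Hypothetical cloze template: the model is asked to fill in the [MASK] token.
TEMPLATE = "{report} Overall, the disease response is [MASK]."

# Hypothetical verbalizer: one label word per response category.
VERBALIZER = {
    "absent": "no evidence of disease",
    "improved": "partial response",
    "stable": "stable disease",
    "worse": "progressive disease",
}


def build_prompt(report):
    """Wrap a radiology report in the cloze template."""
    return TEMPLATE.format(report=report.strip())


def classify_from_mask_scores(mask_scores):
    """Pick the category whose label word scores highest at the [MASK] position.

    `mask_scores` maps candidate words to scores from a masked language model;
    label words missing from the mapping are treated as minus infinity.
    """
    best_word = max(VERBALIZER, key=lambda w: mask_scores.get(w, float("-inf")))
    return VERBALIZER[best_word]


if __name__ == "__main__":
    print(build_prompt("No new lesions are seen. Findings are unchanged."))
    # Placeholder scores, not real model output.
    print(classify_from_mask_scores({"stable": 2.3, "worse": 0.4, "improved": -1.0}))
```

Because the task is phrased in a form close to the model's pretraining objective, approaches of this kind are often reported to need fewer labeled examples, which is consistent with the reduced training set size described above.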

Discussion: These models could be used by researchers to derive progression-free survival in large datasets. They may also serve as decision support tools by providing clinicians with an automated second opinion on disease response.

Conclusions: Large clinical language models demonstrate potential to infer cancer disease response from radiology reports at scale. Data augmentation techniques are useful to further improve performance. Prompt-based fine-tuning can significantly reduce the size of the training dataset.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10531105
DOI: http://dx.doi.org/10.1093/jamia/ocad133

Top Keywords: data augmentation; cancer disease; disease response; language models; prompt-based fine-tuning; response radiology; radiology reports; large language; infer cancer; test set
