Objective: To assess the ability of large language models to accurately infer cancer disease response from free-text radiology reports.
Materials and Methods: We assembled 10 602 computed tomography reports from cancer patients seen at a single institution. Each report was classified into one of four categories: no evidence of disease, partial response, stable disease, or progressive disease.