Objective: To develop a domain-specific large language model (LLM) for LI-RADS v2018 categorization of hepatic observations based on free-text descriptions extracted from MRI reports.
Material And Methods: This retrospective study included 291 small liver observations, divided into training (n = 141), validation (n = 30), and test (n = 120) datasets. Of these, 120 were fictitious, and 171 were extracted from 175 MRI reports from a single institution. The algorithm's performance was compared to two independent radiologists and one hepatologist in a human replacement scenario, and considering two combined strategies (double reading with arbitration and triage). Agreement on LI-RADS category and dichotomic malignancy (LR-4, LR-5, and LR-M) were estimated using linear-weighted κ statistics and Cohen's κ, respectively. Sensitivity and specificity for LR-5 were calculated. The consensus agreement of three other radiologists served as the ground truth.
Results: The model showed moderate agreement against the ground truth for both LI-RADS categorization (κ = 0.54 [95% CI: 0.42-0.65]) and the dichotomized approach (κ = 0.58 [95% CI: 0.42-0.73]). Sensitivity and specificity for LR-5 were 0.76 (95% CI: 0.69-0.86) and 0.96 (95% CI: 0.91-1.00), respectively. When the chatbot was used as a triage tool, performance improved for LI-RADS categorization (κ = 0.86/0.87 for the two independent radiologists and κ = 0.76 for the hepatologist), dichotomized malignancy (κ = 0.94/0.91 and κ = 0.87) and LR-5 identification (1.00/0.98 and 0.85 sensitivity, 0.96/0.92 and 0.92 specificity), with no statistical significance compared to the human readers' individual performance. Through this strategy, the workload decreased by 45%.
Conclusion: LI-RADS v2018 categorization from unlabelled MRI reports is feasible using our LLM, and it enhances the efficiency of data curation.
Critical Relevance Statement: Our proof-of-concept study provides novel insights into the potential applications of LLMs, offering a real-world example of how these tools could be integrated into a local workflow to optimize data curation for research purposes.
Key Points: Automatic LI-RADS categorization from free-text reports would be beneficial to workflow and data mining. LiverAI, a GPT-4-based model, supported various strategies improving data curation efficiency by up to 60%. LLMs can integrate into workflows, significantly reducing radiologists' workload.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11584817 | PMC |
http://dx.doi.org/10.1186/s13244-024-01850-1 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!