Objective: To construct a novel tumor-node-morphology (TNMor) staging system, derived from natural language processing (NLP) of pathology reports, for predicting outcomes in patients with pancreatic ductal adenocarcinoma.
Methods: This retrospective study included 1657 participants drawn from a large referral center and The Cancer Genome Atlas (TCGA) Program dataset. In the training cohort, NLP was used to extract and screen prognostic predictors from pathology reports to develop the TNMor system. The TNMor system was then evaluated against the tumor-node-metastasis (TNM) staging system in the internal and external validation cohorts.