Purpose: To facilitate identification of clinical trial participation candidates, we developed a machine learning tool that automates the determination of a patient's metastatic status, on the basis of unstructured electronic health record (EHR) data.
Methods: This tool scans EHR documents, extracting text snippet features surrounding key words (such as metastatic, progression, and local). A regularized logistic regression model was trained and used to classify patients across five metastatic categories: highly likely and likely positive, highly likely and likely negative, and unknown.