A flexible symbolic regression method for constructing interpretable clinical prediction models.

William G La Cava, Paul C Lee, Imran Ajmal, Xiruo Ding, Priyanka Solanki, Jordana B Cohen, Jason H Moore, Daniel S Herman

NPJ Digit Med

Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Published: June 2023

Machine learning (ML) models trained for triggering clinical decision support (CDS) are typically either accurate or interpretable but not both. Scaling CDS to the panoply of clinical use cases while mitigating risks to patients will require that many ML models be intuitively interpretable for clinicians. To this end, we adapted a symbolic regression method, coined the feature engineering automation tool (FEAT), to train concise and accurate models from high-dimensional electronic health record (EHR) data. We first present an in-depth application of FEAT to classify hypertension, hypertension with unexplained hypokalemia, and apparent treatment-resistant hypertension (aTRH) using EHR data for 1200 subjects receiving longitudinal care in a large healthcare system. FEAT models trained to predict phenotypes adjudicated by chart review had equivalent or higher discriminative performance (p < 0.001) and were at least three times smaller (p < 1 × 10) than other potentially interpretable models. For aTRH, FEAT generated a six-feature, highly discriminative (positive predictive value = 0.70, sensitivity = 0.62), and clinically intuitive model. To assess the generalizability of the approach, we tested FEAT on 25 benchmark clinical phenotyping tasks using the MIMIC-III critical care database. Under comparable dimensionality constraints, FEAT's models exhibited higher area under the receiver operating characteristic curve scores than penalized linear models across tasks (p < 6 × 10). In summary, FEAT can train EHR prediction models that are both intuitively interpretable and accurate, which should facilitate safe and effective scaling of ML-triggered CDS across the range of potential clinical use cases and healthcare practices.
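The abstract reports positive predictive value (PPV) and sensitivity for the aTRH model. As a rough illustration of how those two metrics are defined, the sketch below computes them from binary labels and predictions. The function name `ppv_and_sensitivity` and the toy label vectors are hypothetical and are not taken from the study; the study's own data and code are not reproduced here.

```python
from typing import Sequence, Tuple


def ppv_and_sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (PPV, sensitivity) for binary labels coded as 0/1.

    PPV         = TP / (TP + FP): fraction of predicted positives that are true cases.
    Sensitivity = TP / (TP + FN): fraction of true cases that the model flags.
    Returns NaN for a metric whose denominator is zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Toy, made-up labels purely for illustration (not study data).
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 1, 0, 1, 0]
toy_ppv, toy_sens = ppv_and_sensitivity(toy_true, toy_pred)
print(f"PPV={toy_ppv:.2f}, sensitivity={toy_sens:.2f}")
```

Read this way, a PPV of 0.70 means roughly 7 in 10 flagged patients meet the chart-review definition, and a sensitivity of 0.62 means the model flags roughly 6 in 10 of the true cases.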

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10241925	PMC
http://dx.doi.org/10.1038/s41746-023-00833-8	DOI Listing

Similar Publications

Role of data-driven regional growth model in shaping brain folding patterns.

Soft Matter

January 2025

School of Environmental, Civil, Agricultural and Mechanical Engineering, College of Engineering, University of Georgia, Athens, GA 30602, USA.

Jixin Hou, Zhengwang Wu, Xianyan Chen, Li Wang, Dajiang Zhu

The surface morphology of the developing mammalian brain is crucial for understanding brain function and dysfunction. Computational modeling offers valuable insights into the underlying mechanisms for early brain folding. Recent findings indicate significant regional variations in brain tissue growth, while the role of these variations in cortical development remains unclear.

Mortality prediction after major surgery in a mixed population through machine learning: a multi-objective symbolic regression approach.

Anaesthesia

January 2025

Department of Medical Physics and Biomedical Engineering, University College London, London, UK.

Pietro Arina, Davide Ferrari, Nicholas Tetlow, Amy Dewar, Robert Stephens

Introduction: Understanding 1-year mortality following major surgery offers valuable insights into patient outcomes and the quality of peri-operative care. Few models exist that predict 1-year mortality accurately. This study aimed to develop a predictive model for 1-year mortality in patients undergoing complex non-cardiac surgery using a novel machine-learning technique called multi-objective symbolic regression.

Selective Detection of Formaldehyde and Nitrogen Dioxide Using Innovative Modeling of SnO2 Surface Response to Pulsed Temperature Profile.

Sensors (Basel)

December 2024

Laboratoire d'Analyse et d'Architecture des Systèmes (LAAS), Université de Toulouse, CNRS, UPS, 7 Avenue du Colonel Roche, 31031 Toulouse, France.

Emilie Bialic, Jimmy Leblet, Aymen Sendi, Paul Gersberg, Axel Maupoux

The need for odor measurement and pollution source identification in various sectors (aeronautic, automobile, healthcare…) has increased in the last decade. Multisensor modules, such as electronic noses, seem to be a promising and inexpensive alternative to traditional sensors that were only sensitive to one gas at a time. However, the selectivity, the non-repetitiveness of their manufacture, and their drift remain major obstacles to the use of electronic noses.

TExCNN: Leveraging Pre-Trained Models to Predict Gene Expression from Genomic Sequences.

Genes (Basel)

December 2024

Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China.

Guohao Dong, Yuqian Wu, Lan Huang, Fei Li, Fengfeng Zhou

Background/objectives: Understanding the relationship between DNA sequences and gene expression levels is of significant biological importance. Recent advancements have demonstrated the ability of deep learning to predict gene expression levels directly from genomic data. However, traditional methods are limited by basic word encoding techniques, which fail to capture the inherent features and patterns of DNA sequences.

Opening the AI Black Box: Distilling Machine-Learned Algorithms into Code.

Entropy (Basel)

December 2024

Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Eric J Michaud, Isaac Liao, Vedang Lad, Ziming Liu, Anish Mudide

Can we turn AI black boxes into code? Although this mission sounds extremely challenging, we show that it is not entirely impossible by presenting a proof-of-concept method, MIPS, that can synthesize programs based on the automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code. We test MIPS on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4: MIPS solves 32 of them, including 13 that are not solved by GPT-4 (which also solves 30). MIPS uses an integer autoencoder to convert the RNN into a finite state machine, then applies Boolean or integer symbolic regression to capture the learned algorithm.
