Beyond XGBoost and SHAP: Unveiling true feature importance.

J Hazard Mater

Faculty of Data Science, Musashino University, 3-3-3 Ariake Koto-ku, Tokyo 135-8181, Japan.

Published: January 2025

This paper outlines key machine learning principles, focusing on the use of XGBoost and SHAP values to assist researchers in avoiding analytical pitfalls. XGBoost builds models by incrementally adding decision trees, each addressing the errors of the previous one, which can result in inflated feature importance scores due to the method's emphasis on misclassified examples. While SHAP values provide a theoretically robust way to interpret predictions, their dependence on model structure and feature interactions can introduce biases. The lack of ground truth values complicates model evaluation, as biased feature importance can obscure real relationships with target variables. Ground truth values, representing the actual labels used in model training and validation, are crucial for improving predictive accuracy, serving as benchmarks for comparing model outcomes to true results. However, they do not ensure real associations between features and targets. Instead, they help gauge the model's effectiveness in achieving high accuracy. This paper underscores the necessity for researchers to recognize biases in feature importance and model evaluation, advocating for the use of rigorous statistical methods to enhance the reliability of analyses in machine learning research.
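The abstract calls for rigorous statistical methods but does not name one here. As a minimal sketch, assuming Spearman rank correlation as one illustrative model-independent check, the snippet below ranks features by their monotonic association with the target without fitting any model. The function names and the toy data are hypothetical and do not come from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 by convention if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data, for illustration only.
target = [1, 2, 3, 4, 5, 6]
features = {
    "monotonic": [1, 4, 9, 16, 25, 36],  # rises with the target, nonlinearly
    "reversed": [6, 5, 4, 3, 2, 1],      # falls as the target rises
    "constant": [7, 7, 7, 7, 7, 7],      # carries no information
}

# Rank features by the absolute strength of their association with the target.
scores = {name: spearman(column, target) for name, column in features.items()}
for name, rho in sorted(scores.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: rho = {rho:.3f}")
```

A ranking produced this way can be set beside the ranking implied by XGBoost importance scores or mean absolute SHAP values for the same features. Where the two disagree, the disagreement is a prompt to ask whether the importance reflects the data or the model's structure; it does not by itself show which ranking is right.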

DOI: http://dx.doi.org/10.1016/j.jhazmat.2025.137382

Author: Yoshiyasu Takefuji
