This paper outlines key machine learning principles, focusing on the use of XGBoost and SHAP values, to help researchers avoid analytical pitfalls. XGBoost builds a model by incrementally adding decision trees, each fitted to correct the errors of the trees before it. Because the method emphasizes poorly predicted (misclassified) examples, it can produce inflated feature importance scores. While SHAP values provide a theoretically grounded way to interpret predictions, their dependence on model structure and feature interactions can introduce biases. The lack of ground truth values complicates model evaluation, because biased feature importance can obscure real relationships with the target variable. Ground truth values, the actual labels used in model training and validation, serve as benchmarks for comparing model predictions with true outcomes and are therefore crucial for improving predictive accuracy. However, they do not establish that the associations between features and targets are real; they only gauge how accurately the model predicts. This paper underscores the need for researchers to recognize biases in feature importance and model evaluation, and advocates rigorous statistical methods to make machine learning analyses more reliable.
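Two of these points can be illustrated with a small exact Shapley computation: attributions depend on how a model is structured, and agreement with ground truth labels measures accuracy, not whether an association is real. This is a minimal sketch, not the SHAP library or the paper's method. It enumerates every coalition of features and fills absent features from a single baseline vector, which is one common simplification of the value function (SHAP's tree explainers use other formulations). The data, the baseline, and the three toy models (`model_a`, `model_b`, `model_c`) are hypothetical. Feature 1 is an exact copy of feature 0, and the label depends on feature 0 only.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x.

    Assumption: features outside a coalition are replaced by their value
    in a single baseline vector (a simplified value function).
    """
    n = len(x)

    def value(coalition):
        # Build the input: coalition members from x, the rest from baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def mse(f, rows, labels):
    """Mean squared error of f against ground truth labels."""
    return sum((f(r) - y) ** 2 for r, y in zip(rows, labels)) / len(rows)


# Hypothetical data: feature 1 duplicates feature 0; the label uses feature 0 only.
rows = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
labels = [2.0 * r[0] for r in rows]


def model_a(z):
    return 2.0 * z[0]  # uses feature 0 only


def model_b(z):
    return z[0] + z[1]  # spreads the same signal over both features


def model_c(z):
    return z[0] * z[1]  # pure interaction, for comparison


x, baseline = [3.0, 3.0], [0.0, 0.0]
print(mse(model_a, rows, labels), mse(model_b, rows, labels))  # 0.0 0.0
print(shapley_values(model_a, x, baseline))  # [6.0, 0.0]
print(shapley_values(model_b, x, baseline))  # [3.0, 3.0]
print(shapley_values(model_c, x, baseline))  # [4.5, 4.5]
```

Models A and B both reach zero error against the labels, so label-based evaluation cannot tell them apart. Their attributions still differ: credit moves between the duplicated features depending only on how each model happens to combine them. Model C shows that this value function splits an interaction term's contribution evenly between the interacting features. In every case the values sum to `f(x) - f(baseline)`.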
DOI: http://dx.doi.org/10.1016/j.jhazmat.2025.137382
Int J Cardiol
January 2025
Department of Computer Center, Zigong Fourth People's Hospital, Zigong, Sichuan 643000, China.
Background: Acute Stanford Type A aortic dissection (AAD-type A) and acute myocardial infarction (AMI) present with similar symptoms but require distinct treatments. Efficient differentiation is critical because access to radiological equipment is limited in many primary healthcare settings. This study develops a multimodal deep learning model integrating electrocardiogram (ECG) signals and laboratory indicators to enhance diagnostic accuracy for AAD-type A and AMI.
J Hazard Mater
January 2025
Faculty of Data Science, Musashino University, 3-3-3 Ariake Koto-ku, Tokyo 135-8181, Japan. Electronic address:
Background: Coronary heart disease (CHD) and depression frequently co-occur, significantly impacting patient outcomes. However, comprehensive health status assessment tools for this complex population are lacking. This study aimed to develop and validate an explainable machine learning model to evaluate overall health status in patients with comorbid CHD and depression.
Front Oncol
January 2025
Departments of Ultrasound, Jiading District Central Hospital Affiliated Shanghai University of Medicine & Health Sciences, Shanghai, China.
Background: Skip lymph node metastasis (SLNM) in papillary thyroid cancer (PTC) occurs when cancer cells bypass the central nodes and metastasize directly to the lateral nodes, and it often goes undetected by standard preoperative ultrasonography. Although multiple models exist to identify SLNM, they are inadequate for clinically node-negative (cN0) patients, resulting in underestimated metastatic risk and compromised treatment effectiveness. Our study aims to develop and validate a machine learning (ML) model that combines elastography radiomics with clinicopathological data to predict pre-surgical SLNM risk in cN0 PTC patients at increased risk of lymph node metastasis (LNM), in order to improve their treatment strategies.
Front Neurol
January 2025
Department of Neurosurgery, Changshu Hospital Affiliated to Soochow University, Changshu, China.
Background: Spontaneous intracerebral hemorrhage (SICH) is the second most common form of cerebrovascular disease after ischemic stroke, with high mortality and disability rates, and it imposes a significant economic burden on families and society. This retrospective study aimed to develop and evaluate an interpretable machine learning model to predict functional outcomes 3 months after SICH.
Methods: A retrospective analysis was conducted on clinical data from 380 patients with SICH who were hospitalized at three different centers between June 2020 and June 2023.