Background: For the recruitment and monitoring of subjects in therapy studies, it is important to predict whether subjects with mild cognitive impairment (MCI) will go on to develop Alzheimer's disease (AD). Machine learning (ML) is well suited to improving early AD prediction. The etiology of AD is heterogeneous, which leads to high variability in disease patterns. Further variability originates from multicentric study designs, varying acquisition protocols, and errors in the preprocessing of magnetic resonance imaging (MRI) scans. This high variability makes it difficult to distinguish signal from noise and may lead to overfitting. This article examines whether an automatic and fair data valuation method based on Shapley values can identify the most informative subjects and thereby improve ML classification.
Methods: An ML workflow was developed and trained on a subset of the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Validation was performed on an independent ADNI test set and on the Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing (AIBL) cohort. The workflow included volumetric MRI feature extraction, feature selection, sample selection using Data Shapley, random forest (RF) and eXtreme Gradient Boosting (XGBoost) for model training, and Kernel SHapley Additive exPlanations (SHAP) values for model interpretation.
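The sample-selection step named in the Methods can be illustrated with a minimal Monte Carlo sketch of Data Shapley. This is not the authors' implementation: the toy one-dimensional nearest-centroid classifier stands in for the RF model, the data are invented for illustration, and all function names (`nearest_centroid_accuracy`, `monte_carlo_data_shapley`, `drop_lowest`) are hypothetical.

```python
import random


def nearest_centroid_accuracy(train, valid):
    """Fit a toy 1-D nearest-centroid classifier on `train`, score it on `valid`.

    Each sample is a (feature, label) pair with label 0 or 1.
    Assumption: this toy model replaces the random forest used in the study.
    """
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for x, y in train:
        sums[y] += x
        counts[y] += 1
    if counts[0] == 0 or counts[1] == 0:
        # A model cannot be fitted on a single class: use chance level.
        return 0.5
    c0 = sums[0] / counts[0]
    c1 = sums[1] / counts[1]
    correct = 0
    for x, y in valid:
        predicted = 0 if abs(x - c0) <= abs(x - c1) else 1
        correct += predicted == y
    return correct / len(valid)


def monte_carlo_data_shapley(train, valid, n_permutations=200, seed=0):
    """Estimate each training sample's Data Shapley value.

    For every random ordering of the training set, a sample is credited with
    the change in validation accuracy observed when it is added to the
    samples that precede it; credits are averaged over orderings.
    """
    rng = random.Random(seed)
    values = [0.0] * len(train)
    for _ in range(n_permutations):
        order = list(range(len(train)))
        rng.shuffle(order)
        subset = []
        previous = nearest_centroid_accuracy(subset, valid)
        for idx in order:
            subset.append(train[idx])
            score = nearest_centroid_accuracy(subset, valid)
            values[idx] += score - previous
            previous = score
    return [v / n_permutations for v in values]


def drop_lowest(train, values, k):
    """Return the training set without the k lowest-valued samples."""
    ranked = sorted(range(len(train)), key=lambda i: values[i])
    dropped = set(ranked[:k])
    return [s for i, s in enumerate(train) if i not in dropped]


# Invented toy data; the last training sample carries a deliberately wrong label.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.9, 1), (1.0, 1), (1.1, 1), (1.05, 0)]
valid = [(0.15, 0), (0.25, 0), (0.95, 1), (1.05, 1)]

values = monte_carlo_data_shapley(train, valid)
kept = drop_lowest(train, values, 2)
```

By construction, the estimated values sum to the difference between the accuracy of the model trained on all samples and the chance-level score of the empty set. In the study the number of subjects to drop was derived from cutoff values computed on an independent validation set; the fixed `k` here is only a placeholder.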
Results: On the independent ADNI test set, the base models reached a mean accuracy of 62.64%. The RF models that excluded 134 of the 467 training subjects based on their RF Data Shapley values outperformed these base models by 5.76% (3.61 percentage points). On the AIBL data set, the XGBoost base models reached a mean accuracy of 60.00%. Excluding the 133 subjects with the smallest RF Data Shapley values improved the classification accuracy by 2.98% (1.79 percentage points). The cutoff values were calculated using an independent validation set.
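The Results quote each gain twice, once as a relative improvement and once in percentage points. The following short check, using only the figures stated above and a hypothetical helper name, shows how the two are related:

```python
def relative_to_points(base_accuracy_pct, relative_gain_pct):
    """Convert a relative gain (in %) over a base accuracy (in %) to percentage points."""
    return base_accuracy_pct * relative_gain_pct / 100.0


# ADNI test set: 5.76% relative gain over a 62.64% base accuracy.
adni_points = relative_to_points(62.64, 5.76)

# AIBL data set: 2.98% relative gain over a 60.00% base accuracy.
aibl_points = relative_to_points(60.00, 2.98)

print(f"ADNI: {adni_points:.2f} percentage points")
print(f"AIBL: {aibl_points:.2f} percentage points")
```

Both results agree with the bracketed percentage-point figures after rounding to two decimals.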
Conclusion: The Data Shapley method was able to improve the mean accuracies for the test sets. The most informative subjects were associated with the number of ApolipoproteinE ε4 (ApoE ε4) alleles, cognitive test results, and volumetric MRI measurements.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8444618 | PMC |
http://dx.doi.org/10.1186/s13195-021-00879-4 | DOI Listing |
Front Neurol
January 2025
Department of Radiology, Affiliated Hospital 6 of Nantong University, Yancheng Third People's Hospital, Yancheng, Jiangsu, China.
Introduction: Early prognosis prediction of acute ischemic stroke (AIS) can support clinicians in choosing personalized treatment plans. The aim of this study is to develop a machine learning (ML) model that uses multiple post-labeling delay times (multi-PLD) arterial spin labeling (ASL) radiomics features to achieve early and precise prediction of AIS prognosis.
Methods: This study enrolled 102 AIS patients admitted between December 2020 and September 2024.
Sci Rep
January 2025
Gastroenterology Department, The First Affiliated Hospital of Guangxi Medical University, Nanning, China.
To retrospectively develop and validate an interpretable deep learning model and nomogram utilizing endoscopic ultrasound (EUS) images to predict pancreatic neuroendocrine tumors (PNETs). Following confirmation via pathological examination, a retrospective analysis was performed on a cohort of 266 patients, comprising 115 individuals diagnosed with PNETs and 151 with pancreatic cancer. These patients were randomly assigned to the training or test group in a 7:3 ratio.
Toxicology
January 2025
Department of Clinical Pharmacy, Jieyang People's Hospital, 522000, China.
Drug-induced autoimmunity (DIA) is a non-IgE immune-related adverse drug reaction that poses substantial challenges in predictive toxicology due to its idiosyncratic nature, complex pathogenesis, and diverse clinical manifestations. To address these challenges, we developed InterDIA, an interpretable machine learning framework for predicting DIA toxicity based on molecular physicochemical properties. Multi-strategy feature selection and advanced ensemble resampling approaches were integrated to enhance prediction accuracy and overcome data imbalance.
Single-omics approaches often provide a limited view of complex biological systems, whereas multiomics integration offers a more comprehensive understanding by combining diverse data views. However, integrating heterogeneous data types and interpreting the intricate relationships between biological features, both within and across different data views, remains a bottleneck. To address these challenges, we introduce COSIME (Cooperative Multi-view Integration and Scalable Interpretable Model Explainer).
Water Res X
May 2025
Institute for Artificial Intelligence R&D of Serbia, Fruškogorska 1, Novi Sad 21000, Serbia.
This study evaluates three Machine Learning (ML) models, namely Temporal Kolmogorov-Arnold Networks (TKAN), Long Short-Term Memory (LSTM), and Temporal Convolutional Networks (TCN), focusing on their capabilities to improve prediction accuracy and efficiency in streamflow forecasting. We adopt a data-centric approach, utilizing large, validated datasets to train the models, and apply SHapley Additive exPlanations (SHAP) to enhance the interpretability and reliability of the ML models. The results show that TKAN outperforms LSTM but slightly lags behind TCN in streamflow forecasting.