Patients with non-small cell lung cancer (NSCLC) present with a variety of clinical symptoms, such as dyspnea and chest pain, which complicates accurate diagnosis. NSCLC includes subtypes distinguished by histological characteristics, specifically lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). This study aims to identify and compare abnormal gene expression patterns in LUAD and LUSC samples relative to adjacent healthy tissues using an explainable artificial intelligence (XAI) framework. The LASSO algorithm was employed to select the top gene features in the LUAD and LUSC datasets, yielding 35 and 33 genes, respectively. An ensemble-based extreme gradient boosting (XGBoost) machine learning (ML) algorithm was trained on these features and interpreted using SHapley Additive exPlanations (SHAP), which addresses the opaque nature of ML models; the top features then underwent biological annotation through survival and functional enrichment analyses. Performance was evaluated with metrics such as average accuracy and the Matthews correlation coefficient. The XGBoost model achieved an average accuracy of 99.1% for LUAD and 98.6% for LUSC. The SFTPC gene emerged as the most significant feature in both NSCLC subtypes. For LUAD, genes such as STX11, CLEC3B, EMP2, and LYVE1 were significant contributors in the XAI-SHAP framework, whereas GKN2, OGN, SLC39A8, and MMRN1 were identified for LUSC. Survival analysis and functional validation of these genes highlighted the physiological functions observed to be dysregulated in the NSCLC subtypes. These identified genes have the potential to enhance current medical diagnostics and therapeutics.
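The two performance metrics named in the abstract can be illustrated with a minimal sketch. It assumes a binary labeling (1 = tumor, 0 = adjacent healthy tissue) and uses made-up labels purely for illustration; the function names are hypothetical and this is not the study's code or data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = tumor, 0 = normal)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Illustrative labels only (not from the study): 6 tumor, 4 normal samples.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

acc = accuracy(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it more informative when tumor samples far outnumber adjacent-normal samples, as is common in expression datasets.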
DOI: http://dx.doi.org/10.1016/j.compbiolchem.2024.108333