Comparative investigation of lung adenocarcinoma and squamous cell carcinoma transcriptome to reveal potential candidate biomarkers: An explainable AI approach.

Comput Biol Chem

Laboratory of Integrative Genomics, Department of Integrative Biology, School of BioSciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu 632014, India.

Published: December 2024

Patients with non-small cell lung cancer (NSCLC) present with a variety of clinical symptoms, such as dyspnea and chest pain, which complicates accurate diagnosis. NSCLC includes subtypes distinguished by histological characteristics, specifically lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). This study compares LUAD and LUSC samples with adjacent healthy tissues to identify abnormal gene expression patterns, using an explainable artificial intelligence (XAI) framework. The LASSO algorithm was employed to select the top gene features in the LUAD and LUSC datasets, yielding 35 and 33 genes, respectively. An ensemble-based extreme gradient boosting (XGBoost) machine learning (ML) model was then trained and interpreted using SHapley Additive exPlanations (SHAP), which addresses the opaque nature of ML models; the top features underwent biological annotation through survival and functional enrichment analyses. Performance was evaluated with metrics such as average accuracy and the Matthews correlation coefficient. The XGBoost model achieved an average accuracy of 99.1% for LUAD and 98.6% for LUSC. SFTPC emerged as the most significant feature in both NSCLC subtypes. For LUAD, genes such as STX11, CLEC3B, EMP2, and LYVE1 strongly influenced the model's predictions under SHAP; for LUSC, the corresponding genes were GKN2, OGN, SLC39A8, and MMRN1. Survival analysis and functional validation of these genes highlighted physiological functions that are dysregulated in the NSCLC subtypes. The identified genes have the potential to enhance current medical diagnostics and therapeutics.
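To make the SHAP step more concrete, the sketch below computes exact Shapley values for a single sample by enumerating every feature subset, which is the quantity that SHAP approximates efficiently for tree models. It is an illustration only and not the authors' pipeline: `shapley_values`, `toy_score`, the coefficients, and the expression values are all hypothetical, and a hand-written scoring function stands in for the trained XGBoost model. Only the gene names are taken from the abstract.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating all feature subsets.

    Features outside a subset are replaced by their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(subset):
        # Model output when only the features in `subset` take the sample's values.
        mixed = {g: (x[g] if g in subset else baseline[g]) for g in names}
        return predict(mixed)

    phi = {}
    for g in names:
        others = [h for h in names if h != g]
        total = 0.0
        for k in range(n):
            # Shapley weight for subsets of size k that exclude g.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {g}) - value(s))
        phi[g] = total
    return phi


def toy_score(expr):
    # Hypothetical scorer with invented coefficients (NOT the paper's model).
    score = -0.8 * expr["SFTPC"] - 0.3 * expr["CLEC3B"] + 0.1 * expr["EMP2"]
    # Invented interaction term: fires only when both genes are below zero.
    score += 0.2 * (expr["SFTPC"] < 0) * (expr["CLEC3B"] < 0)
    return score


# Made-up standardized expression values for one sample and a reference point.
sample = {"SFTPC": -2.0, "CLEC3B": -1.0, "EMP2": 0.5}
baseline = {"SFTPC": 0.0, "CLEC3B": 0.0, "EMP2": 0.0}

phi = shapley_values(toy_score, sample, baseline)
for gene, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{gene}: {contribution:+.3f}")
```

The per-gene contributions sum to the difference between the score for the sample and the score for the baseline, which is the additivity property that lets SHAP rank genes by their influence on a prediction. Exact enumeration grows exponentially with the number of features, so for the 35 and 33 selected genes a tree-specific approximation would be needed in practice.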


Source
http://dx.doi.org/10.1016/j.compbiolchem.2024.108333
