An ensemble machine learning-based performance evaluation identifies top In-Silico pathogenicity prediction methods that best classify driver mutations in cancer.

Subrata Das, Vatsal Patel, Shouvik Chakravarty, Arnab Ghosh, Anirban Mukhopadhyay, Nidhan K Biswas

BioData Min

Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India.

Published: January 2025

Background and Objective: Accurate identification and prioritization of driver mutations in cancer is critical for effective patient management. Numerous bioinformatic algorithms estimate mutation pathogenicity, but their assessments vary significantly, even for well-established cancer driver mutations. This study aims to develop an ensemble machine learning approach that ranks pathogenic and conservation scoring algorithms (PCSAs) by their ability to distinguish pathogenic driver mutations from benign passenger (non-driver) mutations in head and neck squamous cell carcinoma (HNSC).

Methods: The study used a dataset from 502 HNSC patients, classifying mutations based on 299 known high-confidence cancer driver genes. Missense somatic mutations in driver genes were treated as driver mutations, while non-driver mutations were randomly selected from other genes. Each mutation was annotated with 41 PCSAs. Three machine learning algorithms (logistic regression, random forest, and support vector machine), along with recursive feature elimination, were used to rank these PCSAs. The final ranking of the PCSAs was determined using rank-average-sort and rank-sum-sort methods.
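The rank aggregation step described in the Methods can be sketched as follows. This is a minimal illustration rather than the authors' code. It assumes each classifier has already produced a rank per PCSA (1 = best), for example from recursive feature elimination, and it reads "rank-sum-sort" as ordering by total rank across classifiers and "rank-average-sort" as ordering by mean rank. The function name `aggregate_ranks`, the feature names, and the toy ranks are all hypothetical.

```python
from statistics import mean

def aggregate_ranks(ranks_by_model):
    """Combine per-model feature ranks (1 = best) into rank-sum and rank-average orderings."""
    models = list(ranks_by_model)
    features = list(ranks_by_model[models[0]])
    # Total and mean rank of each feature across all models.
    rank_sum = {f: sum(ranks_by_model[m][f] for m in models) for f in features}
    rank_avg = {f: mean(ranks_by_model[m][f] for m in models) for f in features}
    # Lower aggregate rank is better; ties are broken alphabetically (an assumption).
    by_sum = sorted(features, key=lambda f: (rank_sum[f], f))
    by_avg = sorted(features, key=lambda f: (rank_avg[f], f))
    return rank_sum, rank_avg, by_sum, by_avg

# Invented ranks for four placeholder PCSAs; not results from the study.
toy_ranks = {
    "logistic_regression": {"pcsa_A": 1, "pcsa_B": 3, "pcsa_C": 2, "pcsa_D": 4},
    "random_forest": {"pcsa_A": 2, "pcsa_B": 1, "pcsa_C": 3, "pcsa_D": 4},
    "svm": {"pcsa_A": 1, "pcsa_B": 2, "pcsa_C": 4, "pcsa_D": 3},
}

rank_sum, rank_avg, by_sum, by_avg = aggregate_ranks(toy_ranks)
print(by_sum)
```

When every classifier ranks the same set of features, the sum and the mean differ only by a constant factor, so the two orderings coincide; they can diverge if some classifier omits a feature.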

Results: The random forest algorithm emerged as the top performer among the three tested ML algorithms, with an AUC-ROC of 0.89, compared to 0.83 for the other two, in distinguishing pathogenic driver mutations from benign passenger mutations using all 41 PCSAs. The top 11 PCSAs were selected based on the first quintile cut-off from the final rank-sum distribution. Classifiers built using these top 11 PCSAs (DEOGEN2, Integrated_fitCons, MVP, etc.) demonstrated significantly higher performance (p-value < 2.22e-16) than those using the remaining 30 PCSAs across all three ML algorithms in separating pathogenic driver from benign passenger mutations. The top PCSAs also performed strongly on a validation cohort that included independent HNSC and other cancer types (breast, lung, and colorectal), reflecting their consistency, robustness, and generalizability.
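A first-quintile cut-off on a rank-sum distribution can be sketched as below. The abstract does not say how the quintile was computed, so this example assumes the 20th percentile from Python's `statistics.quantiles` with the inclusive method, and keeps every feature whose rank sum is at or below that value. The function name `first_quintile_features` and the toy rank sums are hypothetical.

```python
from statistics import quantiles

def first_quintile_features(rank_sum):
    """Return (cutoff, features whose rank sum falls at or below the first quintile)."""
    # quantiles(..., n=5) returns the four cut points between quintiles;
    # the first is the 20th percentile. The "inclusive" method is an assumption.
    cutoff = quantiles(rank_sum.values(), n=5, method="inclusive")[0]
    selected = sorted(
        (f for f, s in rank_sum.items() if s <= cutoff),
        key=lambda f: (rank_sum[f], f),
    )
    return cutoff, selected

# Invented rank sums for ten placeholder features; not results from the study.
toy_rank_sum = {
    "feat_01": 3, "feat_02": 5, "feat_03": 6, "feat_04": 8, "feat_05": 9,
    "feat_06": 11, "feat_07": 12, "feat_08": 14, "feat_09": 15, "feat_10": 17,
}

cutoff, selected = first_quintile_features(toy_rank_sum)
print(cutoff, selected)
```

Because the threshold is taken from the distribution rather than from a fixed count, tied rank sums at the boundary are all retained, so the selection can exceed one fifth of the features.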

Conclusions: The ensemble machine learning approach effectively evaluates the performance of PCSAs based on their ability to differentiate pathogenic drivers from benign passenger mutations in HNSC and other cancer types. Notably, some well-known PCSAs performed poorly, underscoring the importance of data-driven selection over relying solely on popularity.

DOI: http://dx.doi.org/10.1186/s13040-024-00420-x

Similar Publications

Genomic and transcriptomic signatures of sequential carcinogenesis from papillary neoplasm to biliary tract cancer.

J Hepatol

January 2025

Department of Pathology, Yonsei University College of Medicine, Seoul, Republic of Korea; Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Republic of Korea.

Taek Chung, Seungho Oh, Jeongsoo Won, Jiho Park, Jeong Eun Yoo

Background & Aims: Papillary neoplasms of the biliary tree, including intraductal papillary neoplasms (IPN) and intracholecystic papillary neoplasms (ICPN), are recognized as precancerous lesions. However, the genetic characteristics underlying sequential carcinogenesis remain unclear.

Methods: Whole-exome sequencing was performed on 166 neoplasms (33 intrahepatic IPNs, 44 extrahepatic IPNs, and 89 ICPNs), and 41 associated carcinomas.

Impact of calreticulin mutations on treatment and survival outcomes in myelofibrosis during ruxolitinib therapy.

Ann Hematol

January 2025

Department of Engineering for Innovation Medicine, Section of Innovation Biomedicine, Hematology Area, University of Verona, Verona, Italy.

Francesca Palandri, Filippo Branzanti, Erika Morsia, Alessandra Dedola, Giulia Benevolo

Calreticulin (CALR) mutations are detected in around 20% of patients with primary and post-essential thrombocythemia myelofibrosis (MF). Regardless of driver mutations, patients with splenomegaly and symptoms are generally treated with JAK2-inhibitors, most commonly ruxolitinib. Recently, new therapies specifically targeting the CALR mutant clone have entered clinical investigation.

Successful long-term outcome of neoadjuvant sequential targeted therapy and chemotherapy for stage III non-small cell lung carcinoma: 10 case series.

Transl Lung Cancer Res

December 2024

Department of General Thoracic Surgery, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan.

Masaya Aoki, Ryo Miyata, Go Kamimura, Shoichiro Morizono, Takuya Tokunaga

Background: Perioperative treatment of locally advanced non-small cell lung cancer (NSCLC) is attracting attention. The effect of neoadjuvant tyrosine kinase inhibitor (TKI) therapy on postoperative long-term outcomes in patients with driver gene mutations remains unclear. The aim of this study was to clarify the long-term survival outcomes of patients with stage III NSCLC harboring driver gene mutations who received preoperative TKI therapy.

A novel bioassay reflecting response to immune checkpoint inhibitor therapy in non-small cell lung cancer with malignant pleural effusion.

Transl Lung Cancer Res

December 2024

Division of Pulmonary Medicine, Department of Medicine, Jichi Medical University Hospital, Shimotsuke, Tochigi, Japan.

Ayako Takigami, Naoko Mato, Koichi Hagiwara, Makoto Maemondo

Background: Immune checkpoint inhibitor (ICI) therapy has prolonged the survival of a proportion of patients with advanced non-small cell lung cancer (NSCLC). Histological quantification of programmed cell death-ligand 1 (PD-L1) in tumors is a widely adopted marker for predicting the efficacy of ICI treatment. However, its use in patients with malignant pleural effusion (MPE) is occasionally challenging because of the difficulty of tissue sampling.
