Background And Objective: Accurate identification and prioritization of driver mutations in cancer are critical for effective patient management. Despite the availability of numerous bioinformatic algorithms for estimating mutation pathogenicity, their assessments vary considerably, even for well-established cancer driver mutations. This study aims to develop an ensemble machine learning approach to rank pathogenic and conservation scoring algorithms (PCSAs) by their ability to distinguish pathogenic driver mutations from benign passenger (non-driver) mutations in head and neck squamous cell carcinoma (HNSC).

Methods: The study used a dataset from 502 HNSC patients, classifying mutations based on 299 known high-confidence cancer driver genes. Missense somatic mutations in driver genes were treated as driver mutations, while non-driver mutations were randomly selected from other genes. Each mutation was annotated with 41 PCSAs. Three machine learning algorithms (logistic regression, random forest, and support vector machine), along with recursive feature elimination, were used to rank these PCSAs. The final ranking of the PCSAs was determined using rank-average-sort and rank-sum-sort methods.
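The abstract does not define the two aggregation methods, so the following is only a minimal sketch under one plausible reading: each PCSA receives a rank from each of the three models (1 = best), and the final order comes from sorting by the sum or by the mean of those ranks. The names `PCSA_A` to `PCSA_E`, the function `aggregate_ranks`, and all rank values are hypothetical and are not taken from the study.

```python
from statistics import mean

# Hypothetical per-model ranks (1 = best) for five placeholder scores.
# Illustrative values only; they are NOT results from the study.
rfe_ranks = {
    "logistic_regression": {"PCSA_A": 1, "PCSA_B": 3, "PCSA_C": 2, "PCSA_D": 5, "PCSA_E": 4},
    "random_forest":       {"PCSA_A": 2, "PCSA_B": 1, "PCSA_C": 3, "PCSA_D": 4, "PCSA_E": 5},
    "svm":                 {"PCSA_A": 1, "PCSA_B": 4, "PCSA_C": 2, "PCSA_D": 3, "PCSA_E": 5},
}


def aggregate_ranks(ranks_by_model):
    """Combine per-model ranks into rank-sum and rank-average orderings.

    Assumed reading of the abstract: a lower combined rank means a
    better-performing score. Ties are broken alphabetically by name.
    """
    features = sorted(next(iter(ranks_by_model.values())))
    per_feature = {f: [ranks[f] for ranks in ranks_by_model.values()] for f in features}
    rank_sum = {f: sum(r) for f, r in per_feature.items()}
    rank_avg = {f: mean(r) for f, r in per_feature.items()}
    by_sum = sorted(features, key=lambda f: (rank_sum[f], f))
    by_avg = sorted(features, key=lambda f: (rank_avg[f], f))
    return rank_sum, by_sum, by_avg


rank_sum, by_sum, by_avg = aggregate_ranks(rfe_ranks)
print(rank_sum)
print(by_sum)
print(by_avg)
```

When every score is ranked by the same number of models, the sum and the mean differ only by a constant factor, so the two orderings coincide; they can diverge only if some scores are missing a rank from one of the models.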

Results: The random forest algorithm was the top performer among the three tested ML algorithms, with an AUC-ROC of 0.89, compared to 0.83 for the other two, in distinguishing pathogenic driver mutations from benign passenger mutations using all 41 PCSAs. The top 11 PCSAs were selected based on the first quintile cut-off from the final rank-sum distribution. Classifiers built using these top 11 PCSAs (DEOGEN2, Integrated_fitCons, MVP, etc.) demonstrated significantly higher performance (p-value < 2.22e-16) than those using the remaining 30 PCSAs across all three ML algorithms in separating pathogenic driver mutations from benign passenger mutations. The top PCSAs also performed strongly on a validation cohort comprising independent HNSC samples and other cancer types (breast, lung, and colorectal), reflecting their consistency, robustness, and generalizability.
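The abstract does not state how the first-quintile cut-off was computed. As a hedged illustration, one way to apply such a cut-off is to take the 20th percentile of the rank-sum values and keep every score at or below it. The rank sums, the names `PCSA_01` to `PCSA_10`, and the choice of the "inclusive" quantile method below are all assumptions made for this sketch, not details from the study.

```python
from statistics import quantiles

# Hypothetical final rank sums for ten placeholder scores (lower = better).
# Illustrative values only; they are NOT results from the study.
rank_sums = {
    "PCSA_01": 4, "PCSA_02": 7, "PCSA_03": 8, "PCSA_04": 12, "PCSA_05": 14,
    "PCSA_06": 15, "PCSA_07": 19, "PCSA_08": 21, "PCSA_09": 24, "PCSA_10": 26,
}

# quantiles(..., n=5) returns the four cut points between quintiles;
# the first one is the 20th-percentile boundary.
cutoff = quantiles(rank_sums.values(), n=5, method="inclusive")[0]

# Keep every score whose rank sum falls at or below the first-quintile boundary.
selected = sorted(name for name, total in rank_sums.items() if total <= cutoff)

print(cutoff)
print(selected)
```

Because the cut-off is a value rather than a count, it does not have to keep exactly one fifth of the items: tied or unevenly spaced rank sums, or a different quantile definition, can shift how many scores pass.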

Conclusions: The ensemble machine learning approach effectively evaluates the performance of PCSAs based on their ability to differentiate pathogenic driver mutations from benign passenger mutations in HNSC and other cancer types. Notably, some well-known PCSAs performed poorly, underscoring the importance of data-driven selection rather than reliance on popularity alone.

Source
http://dx.doi.org/10.1186/s13040-024-00420-x

Similar Publications

Genomic and transcriptomic signatures of sequential carcinogenesis from papillary neoplasm to biliary tract cancer.

J Hepatol

January 2025

Department of Pathology, Yonsei University College of Medicine, Seoul, Republic of Korea; Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Republic of Korea.

Background & Aims: Papillary neoplasms of the biliary tree, including intraductal papillary neoplasms (IPN) and intracholecystic papillary neoplasms (ICPN), are recognized as precancerous lesions. However, the genetic characteristics underlying sequential carcinogenesis remain unclear.

Methods: Whole-exome sequencing was performed on 166 neoplasms (33 intrahepatic IPNs, 44 extrahepatic IPNs, and 89 ICPNs), and 41 associated carcinomas.

Impact of calreticulin mutations on treatment and survival outcomes in myelofibrosis during ruxolitinib therapy.

Ann Hematol

January 2025

Department of Engineering for Innovation Medicine, Section of Innovation Biomedicine, Hematology Area, University of Verona, Verona, Italy.

Calreticulin (CALR) mutations are detected in around 20% of patients with primary and post-essential thrombocythemia myelofibrosis (MF). Regardless of driver mutations, patients with splenomegaly and symptoms are generally treated with JAK2-inhibitors, most commonly ruxolitinib. Recently, new therapies specifically targeting the CALR mutant clone have entered clinical investigation.

Background: Perioperative treatment of locally advanced non-small cell lung cancer (NSCLC) is attracting attention. The effect of neoadjuvant tyrosine kinase inhibitor (TKI) therapy on postoperative long-term outcomes in patients with driver gene mutations remains unclear. The aim of this study was to clarify the long-term survival outcomes of patients with stage III NSCLC harboring driver gene mutations who received preoperative TKI therapy.

Background: Immune checkpoint inhibitor (ICI) therapy has prolonged the survival of a proportion of patients with advanced non-small cell lung cancer (NSCLC). Histological quantification of programmed cell death-ligand 1 (PD-L1) in tumors is a widely adopted marker for predicting the efficacy of ICI treatment. However, its use in patients with malignant pleural effusion (MPE) is occasionally challenging because of the difficulty of tissue sampling.
