Convolutional neural network models for cancer type prediction based on gene expression.

BMC Med Genomics

Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA.

Published: April 2020

AI Article Synopsis

  • The study highlights the development of innovative Convolutional Neural Network (CNN) models for accurately predicting cancer types from gene expression data while accounting for potential biases related to tissue origin.
  • The models, trained and tested on more than 10,000 TCGA samples, achieved prediction accuracies of 93.9-95.0% across 34 classes (33 cancer types and normal), and interpretation of the 1D-CNN model identified a total of 2,090 cancer markers.
  • The 1D-CNN model was also extended to classify breast cancer subtypes, achieving an average accuracy of 88.42% among 5 subtypes.

Article Abstract

Background: Precise prediction of cancer types is vital for cancer diagnosis and therapy. Through a predictive model, important cancer marker genes can be inferred. Several studies have attempted to build machine learning models for this task; however, none has taken into consideration the effects of tissue of origin, which can potentially bias the identification of cancer markers.

Results: In this paper, we introduce several Convolutional Neural Network (CNN) models that take unstructured gene expression inputs to classify tumor and non-tumor samples into their designated cancer types or as normal. Based on different designs of gene embeddings and convolution schemes, we implemented three CNN models: 1D-CNN, 2D-Vanilla-CNN, and 2D-Hybrid-CNN. The models were trained and tested on gene expression profiles from a combined set of 10,340 samples of 33 cancer types and 713 matched normal tissues from The Cancer Genome Atlas (TCGA). Our models achieved excellent prediction accuracies (93.9-95.0%) among 34 classes (33 cancers and normal). Furthermore, we interpreted one of the models, the 1D-CNN model, with a guided saliency technique and identified a total of 2090 cancer markers (108 per class on average). The concordance of differential expression of these markers between the cancer type they represent and the others was confirmed. In breast cancer, for instance, our model identified well-known markers such as GATA3 and ESR1. Finally, we extended the 1D-CNN model to predict breast cancer subtypes and achieved an average accuracy of 88.42% among 5 subtypes. The code is available at https://github.com/chenlabgccri/CancerTypePrediction.
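To make the 1D-CNN idea concrete, the sketch below runs one forward pass of a toy one-dimensional convolutional classifier over a flat gene expression vector and returns a probability for each of 34 classes. It is an illustration only and not the authors' implementation: the gene count, kernel width, stride, filter count, pooling size, and random weights are all assumptions chosen to keep the example small, and it uses only the Python standard library.

```python
import math
import random


def conv1d_relu(x, kernels, stride):
    """Valid 1D convolution (cross-correlation) over the gene axis, then ReLU."""
    k = len(kernels[0])
    maps = []
    for w in kernels:
        fmap = [max(0.0, sum(w[j] * x[s + j] for j in range(k)))
                for s in range(0, len(x) - k + 1, stride)]
        maps.append(fmap)
    return maps


def max_pool(fmap, size):
    """Non-overlapping max pooling along one feature map."""
    return [max(fmap[i:i + size]) for i in range(0, len(fmap) - size + 1, size)]


def extract_features(x, kernels, stride, pool):
    """Convolve, pool, and flatten all feature maps into one vector."""
    feats = []
    for fmap in conv1d_relu(x, kernels, stride):
        feats.extend(max_pool(fmap, pool))
    return feats


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def classify(feats, dense_w, dense_b):
    """Dense layer followed by softmax over the classes."""
    logits = [sum(w * f for w, f in zip(row, feats)) + b
              for row, b in zip(dense_w, dense_b)]
    return softmax(logits)


# Toy dimensions (assumptions, not the values used in the paper).
N_GENES, N_CLASSES = 64, 34          # 34 = 33 cancer types + normal
KERNEL, STRIDE, N_FILTERS, POOL = 8, 8, 4, 2

rng = random.Random(0)
expression = [rng.uniform(0.0, 10.0) for _ in range(N_GENES)]  # stand-in profile
kernels = [[rng.gauss(0.0, 0.1) for _ in range(KERNEL)] for _ in range(N_FILTERS)]

features = extract_features(expression, kernels, STRIDE, POOL)
dense_w = [[rng.gauss(0.0, 0.1) for _ in features] for _ in range(N_CLASSES)]
dense_b = [0.0] * N_CLASSES

probs = classify(features, dense_w, dense_b)
predicted = max(range(N_CLASSES), key=lambda c: probs[c])
print("predicted class index:", predicted, "probability:", round(probs[predicted], 4))
```

With untrained random weights the predicted class is meaningless; the point is only the data flow from an unstructured expression vector through convolution, pooling, and a softmax over cancer types plus normal.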

Conclusions: Here we present novel CNN designs for accurate and simultaneous prediction of cancer/normal status and cancer type based on gene expression profiles, and a unique model interpretation scheme to elucidate the biological relevance of cancer marker genes after eliminating the effects of tissue of origin. The proposed model has few hyperparameters to train and thus can be easily adapted to facilitate cancer diagnosis in the future.
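The interpretation step can be illustrated with a plain input-gradient saliency map, which is a simpler relative of the guided saliency technique named in the abstract and is swapped in here for brevity. The sketch assumes a hypothetical two-layer ReLU network with random weights and placeholder gene names (GENE_0, GENE_1, ...); it computes the gradient of one class score with respect to each input gene and ranks genes by absolute gradient.

```python
import random


def hidden_pre(x, W):
    """Pre-activations of the hidden layer: one weighted sum per hidden unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def class_logit(x, W, V, c):
    """Score of class c: linear readout of ReLU hidden activations."""
    return sum(V[c][h] * max(0.0, p) for h, p in enumerate(hidden_pre(x, W)))


def input_gradient(x, W, V, c):
    """Analytic gradient of the class-c score with respect to each input gene."""
    grad = [0.0] * len(x)
    for h, p in enumerate(hidden_pre(x, W)):
        if p > 0.0:  # ReLU passes gradient only where the unit is active
            for g in range(len(x)):
                grad[g] += V[c][h] * W[h][g]
    return grad


def top_markers(grad, names, k):
    """Rank genes by absolute gradient (saliency) and keep the top k."""
    ranked = sorted(zip(names, grad), key=lambda t: abs(t[1]), reverse=True)
    return ranked[:k]


# Toy dimensions and weights (assumptions for illustration only).
N_GENES, N_HIDDEN, N_CLASSES, TARGET_CLASS = 20, 8, 34, 3

rng = random.Random(1)
x = [rng.uniform(0.0, 10.0) for _ in range(N_GENES)]
W = [[rng.gauss(0.0, 0.5) for _ in range(N_GENES)] for _ in range(N_HIDDEN)]
V = [[rng.gauss(0.0, 0.5) for _ in range(N_HIDDEN)] for _ in range(N_CLASSES)]
gene_names = [f"GENE_{i}" for i in range(N_GENES)]  # placeholder names

grad = input_gradient(x, W, V, TARGET_CLASS)
markers = top_markers(grad, gene_names, 5)
for name, g in markers:
    print(name, round(g, 4))
```

In a trained model, saliency of this kind would be aggregated over many samples of a class before calling a gene a candidate marker; a single-sample ranking on random weights, as here, carries no biological meaning.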

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7119277
DOI: http://dx.doi.org/10.1186/s12920-020-0677-2

Publication Analysis

Top Keywords

gene expression (16)
cancer types (16)
cancer (15)
convolutional neural (8)
neural network (8)
cancer type (8)
prediction based (8)
based gene (8)
cancer diagnosis (8)
cancer marker (8)

Similar Publications

The Ataxia-telangiectasia mutated (ATM) is the most important gene for repairing the DNA in Myelodysplastic Neoplasm.

DNA Repair (Amst)

January 2025

Cancer Cytogenomic Laboratory, Center for Research and Drug Development (NPDM), Federal University of Ceara, Fortaleza, Ceara, Brazil; Post-Graduate Program in Medical Science, Federal University of Ceara, Fortaleza, Ceara, Brazil; Post-Graduate Program of Pathology, Federal University of Ceara, Fortaleza, Ceara, Brazil; Post-Graduate Program of Translational Medicine, Federal University of Ceara, Fortaleza, Ceara, Brazil.

Myelodysplastic Neoplasm (MDS) is a cancer associated with aging, often leading to acute myeloid leukemia (AML). One of its hallmarks is hypermethylation, particularly in genes responsible for DNA repair. This study aimed to evaluate the methylation and mutation status of DNA repair genes (single-strand - XPA, XPC, XPG, CSA, CSB and double-strand - ATM, BRCA1, BRCA2, LIG4, RAD51) in MDS across three patient cohorts (Cohort A-56, Cohort B-100, Cohort C-76), using methods like pyrosequencing, real-time PCR, immunohistochemistry, and mutation screening.

The St. Lawrence Estuary (SLE) beluga population in Canada is Endangered, and endocrine disrupting contaminants, such as polychlorinated biphenyls (PCBs), polybrominated diphenyl ethers (PBDEs), and other halogenated flame retardants, have been identified as a threat to the recovery of this population. Here, potential impacts of these contaminants on SLE beluga were evaluated by comparing skin transcriptome profiles and biological pathways between this population and a population less exposed to contaminants (Eastern Beaufort Sea) used as a reference.

The global prevalence of heart failure (HF) is still growing, imposing a heavy economic burden. The role of microRNA-146b (miR-146b) in HF remains largely unknown. This study aims to explore the role and mechanism of miR-146b in HF.

Proteomic Characterization of NEDD4 Unveils Its Potential Novel Downstream Effectors in Gastric Cancer.

J Proteome Res

January 2025

Graduate School of Analytical Science and Technology (GRAST), Chungnam National University, Daejeon 34134, Republic of Korea.

The E3 ubiquitin ligase neural precursor cell-expressed developmentally down-regulated 4 (NEDD4) is involved in various cancer signaling pathways, including PTEN/AKT. However, its role in promoting gastric cancer (GC) progression is unclear. This study was conducted to elucidate the role of NEDD4 in GC progression.

Objective: To test the effect of Cordia myxa extract on a colon cancer cell line and on caspase-3 gene and COX-2 protein expression.

Materials and Methods: This study used Cordia myxa ethanolic extract at various dosages on SW480 cells. Cell proliferation was measured using the MTT assay, and the effect of Cordia myxa extract on caspase-3 gene expression was examined using quantitative real-time polymerase chain reaction.
