Application of Feature Selection and Deep Learning for Cancer Prediction Using DNA Methylation Markers.

Genes (Basel)

Department of Public Health, North Dakota State University, 640S Aldevron Tower, 1455 14th Ave N, Fargo, ND 58102, USA.

Published: August 2022

DNA methylation is a process that can affect gene accessibility and therefore gene expression. In this study, a machine learning pipeline is proposed for the prediction of breast cancer and the identification of significant genes that contribute to the prediction. The study used breast cancer methylation data from The Cancer Genome Atlas (TCGA), specifically the TCGA-BRCA dataset. Feature engineering techniques were used to reduce data volume and make deep learning scalable. A comparative analysis of the proposed approach on Illumina 27K and 450K methylation data reveals that deep learning methodologies for cancer prediction can be coupled with feature selection models to enhance prediction accuracy. Prediction using 450K methylation markers can be accomplished in less than 13 s with an accuracy of 98.75%. Of the 685 genes in the feature-selected 27K dataset, 578 were mapped to Ensembl Gene IDs. This reduced set was significantly (FDR < 0.05) enriched in five biological processes and one molecular function. Of the 1572 genes in the feature-selected 450K dataset, 1290 were mapped to Ensembl Gene IDs. This reduced set was significantly (FDR < 0.05) enriched in 95 biological processes and 17 molecular functions. Seven oncogenes/tumor suppressor genes were common to the 27K and 450K feature-selected gene sets: RTN4IP1, MYO18B, ANP32A, BRF1, SETBP1, NTRK1, and IGF2R. Our bioinformatics deep learning workflow, incorporating imputation and data balancing methods, is able to identify important methylation markers related to functionally important genes in breast cancer with higher accuracy than deep learning or statistical models alone.
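The abstract names three preprocessing ideas (imputation, feature selection, and data balancing) without stating which specific methods were used. The following is a minimal Python sketch of what such steps can look like on a toy matrix of methylation beta values. Column-mean imputation, variance-based feature ranking, and random oversampling are assumptions chosen for illustration, not the authors' confirmed methods, and all function names and data are hypothetical. The deep learning classifier itself is omitted.

```python
import random
from statistics import mean, pvariance


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def top_variance_features(rows, k):
    """Return the indices of the k highest-variance columns, in ascending order."""
    n_cols = len(rows[0])
    variances = [pvariance([r[j] for r in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for r in members + extra:
            out_rows.append(r)
            out_labels.append(y)
    return out_rows, out_labels


# Hypothetical toy data: 4 samples x 4 CpG probes, None marks a missing value.
beta = [
    [0.10, 0.80, None, 0.50],
    [0.12, 0.20, 0.90, 0.50],
    [0.11, 0.75, 0.10, None],
    [0.09, 0.25, 0.85, 0.50],
]
labels = [1, 1, 1, 0]  # 1 = tumor, 0 = normal (assumed encoding)

imputed = impute_column_means(beta)
keep = top_variance_features(imputed, k=2)
reduced = [[r[j] for j in keep] for r in imputed]
balanced_rows, balanced_labels = oversample_minority(reduced, labels)
```

The balanced, reduced matrix would then be passed to a classifier. In practice the feature ranking and oversampling should be fitted on the training split only, so that no information from held-out samples leaks into the model.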


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9498757
DOI: http://dx.doi.org/10.3390/genes13091557

Publication Analysis

Top Keywords

deep learning: 20
methylation markers: 12
breast cancer: 12
feature selected: 12
feature selection: 8
cancer prediction: 8
dna methylation: 8
methylation data: 8
27k 450k: 8
450k methylation: 8

Similar Publications

Deep learning-based metabolomics data study of prostate cancer.

BMC Bioinformatics

December 2024

College of Computer Science and Technology, Inner Mongolia Minzu University, Tongliao, 028000, China.

As a heterogeneous disease, prostate cancer (PCa) exhibits diverse clinical and biological features, which pose significant challenges for early diagnosis and treatment. Metabolomics offers promising new approaches for early diagnosis, treatment, and prognosis of PCa. However, metabolomics data are characterized by high dimensionality, noise, variability, and small sample sizes, presenting substantial challenges for classification.


Methods: We retrospectively collected CT scan data from 276 patients with pathologically confirmed primary bone tumors from 4 medical centers in Guangdong Province between January 2010 and August 2021. A convolutional neural network (CNN) was employed as the deep learning architecture. The optimal baseline deep learning model (R-Net) was determined through transfer learning, and an optimized model (S-Net) was obtained through algorithmic improvements.


Predicting craniofacial fibrous dysplasia growth status: an exploratory study of a hybrid radiomics and deep learning model based on computed tomography images.

Oral Surg Oral Med Oral Pathol Oral Radiol

November 2024

Department of Oral and Cranio-Maxillofacial Surgery, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; College of Stomatology, Shanghai Jiao Tong University, Shanghai, China; National Center for Stomatology, Shanghai, China; National Clinical Research Center for Oral Diseases, Shanghai, China; Shanghai Key Laboratory of Stomatology, Shanghai, China.

Objective: This study aimed to develop 3 models based on computed tomography (CT) images of patients with craniofacial fibrous dysplasia (CFD): a radiomics model (Model Rad), a deep learning (DL) model (Model DL), and a hybrid radiomics and DL model (Model Rad+DL), and evaluate the ability of these models to distinguish between adolescents with active lesion progression and adults with stable lesion progression.

Methods: We retrospectively analyzed preoperative CT scans from 148 CFD patients treated at Shanghai Ninth People's Hospital. The images were processed using 3D-Slicer software to segment and extract regions of interest for radiomics and DL analysis.


Purpose: To improve the accuracy of one-stage object detection by modifying YOLOv7 with a Convolutional Block Attention Module (CBAM). The resulting model, YOLOv7-CBAM, automatically identifies torn or intact rotator cuff tendons to assist physicians in diagnosing rotator cuff lesions through ultrasound.

Methods: Between 2020 and 2021, patients who had experienced shoulder pain for over 3 months and had undergone both ultrasound and MRI examinations were categorized into torn and intact groups. To ensure balanced training, we included the same number of patients in both groups.


Microinfarcts and microhemorrhages are characteristic lesions of cerebrovascular disease. Although multiple studies have been published, there is no single universal standard for the neuropathological assessment of cerebrovascular disease. In this study, we propose a novel application of machine learning in the automated screening of microinfarcts and microhemorrhages.

