Graph coloring for extracting discriminative genes in cancer data.

Ann Hum Genet

Departamento de Lenguajes y Sistemas Informáticos, Higher Technical School of Computer Engineering, University of Seville, Seville, Spain.

Published: May 2019

Background and Objective: In microarray-based automated cancer diagnosis, the main difficulty in analyzing the input gene expression data is its high dimensionality: the number of genes is very large and includes many irrelevant genes (noise), while the number of samples is very small. This study addresses the dimensionality-reduction challenge in this area.

Methods: This study introduces a dimension-reduction technique, termed the graph coloring approach (GCA), for microarray-based cancer classification. GCA analyzes the absolute correlation between gene-gene pairs and partitions the genes into several hubs using graph coloring. It starts with a gene-selection step in which the top relevant genes are selected using a biserial correlation. At each step, a gene from the ordered list of top relevant genes is selected as the hub gene (representative), and the genes redundant with it are added to its group; the process is repeated recursively on the remaining genes. A gene is considered redundant if its absolute correlation with the hub gene is greater than a controlling threshold. A suitable range for the threshold is estimated by computing a percentage graph of the absolute correlation between gene-gene pairs. Each threshold value in the estimated range can efficiently produce a new feature subset.
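The hub-and-redundancy grouping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`pearson`, `group_genes`, `fraction_of_pairs_above`), the toy data, and the `top_k` parameter are hypothetical; the relevance score is taken to be the point-biserial correlation (Pearson correlation against 0/1 class labels), which may differ from the biserial variant used in the paper; and the "percentage graph" is read here as the share of gene-gene pairs whose absolute correlation exceeds a given threshold. The graph coloring formulation itself is not reproduced, only the greedy grouping the abstract describes.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def group_genes(expr, labels, top_k, threshold):
    """Greedy hub grouping (assumed reading of the abstract).

    expr:   dict mapping gene name -> expression values, one per sample
    labels: 0/1 class label per sample
    Returns a dict mapping each hub gene -> list of genes redundant with it.
    """
    # Relevance step: with 0/1 labels, Pearson equals the point-biserial correlation.
    relevance = {g: abs(pearson(v, labels)) for g, v in expr.items()}
    ranked = sorted(expr, key=relevance.get, reverse=True)[:top_k]

    groups = {}
    remaining = ranked
    while remaining:
        # The most relevant remaining gene becomes the next hub.
        hub, rest = remaining[0], remaining[1:]
        redundant = [g for g in rest
                     if abs(pearson(expr[hub], expr[g])) > threshold]
        groups[hub] = redundant
        # Repeat on the genes that were not absorbed by this hub.
        remaining = [g for g in rest if g not in redundant]
    return groups


def fraction_of_pairs_above(expr, threshold):
    """Share of gene-gene pairs with absolute correlation above the threshold."""
    pairs = list(combinations(expr, 2))
    hits = sum(abs(pearson(expr[a], expr[b])) > threshold for a, b in pairs)
    return hits / len(pairs)


# Toy data (hypothetical): four genes measured on six samples.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1, 1, 1, 9, 9, 9],
    "g2": [1, 2, 1, 8, 9, 8],
    "g3": [5, 1, 4, 2, 6, 3],
    "g4": [9, 8, 9, 3, 1, 2],
}

groups = group_genes(expr, labels, top_k=4, threshold=0.9)
print(groups)
print(fraction_of_pairs_above(expr, 0.9))
```

On this toy data, g1 is the most relevant gene and becomes the first hub; g2 and g4 are strongly correlated with it (positively and negatively) and join its group, while g3 remains as a hub of its own. The hub genes, that is, the keys of the returned dictionary, form the reduced feature subset, and rerunning with a different threshold yields a different subset.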

Results: GCA significantly improved on several existing techniques, achieving higher accuracy with a smaller number of features. In addition, the genes selected by this method are relevant according to the information stored in scientific repositories.

Conclusions: The proposed dimension-reduction technique can help biologists accurately predict cancer in several areas of the body.

Source
http://dx.doi.org/10.1111/ahg.12297

