New variable selection strategy for analysis of high-dimensional DNA methylation data.

J Bioinform Comput Biol

Department of Statistics, Pusan National University, Busan 46241, Korea.

Published: August 2018

AI Article Synopsis

  • Regularization methods are important for analyzing complex genomic data, especially with DNA methylation data from the Infinium HumanMethylation450 BeadChip, which contains multiple CpG sites for each gene.
  • The paper highlights two main regularization techniques: Sparse Group Lasso (SGL) for scenarios where most CpG sites in a gene are related to an outcome, and network-based regularization when only a few are relevant.
  • A new variable selection strategy is proposed that tracks selection frequency of variables from both methods, showing better performance in simulations and in identifying significant CpG sites linked to ovarian cancer.
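For orientation, the two penalties named above are commonly written in the literature as follows. This is a sketch of the standard textbook forms, not necessarily the exact formulation used in the paper: the squared-error loss, the weights $\sqrt{p_g}$, the mixing parameter $\alpha$, and the Laplacian matrix $L$ are assumptions, and the loss in particular is a placeholder that would change with the outcome type.

```latex
% Sparse group lasso (SGL): group-wise L2 penalty plus an element-wise L1 penalty.
% \beta^{(g)} collects the coefficients of the p_g CpG sites in gene g.
\hat{\beta}_{\mathrm{SGL}} = \arg\min_{\beta}\;
  \frac{1}{2n}\lVert y - X\beta \rVert_2^2
  + (1-\alpha)\lambda \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta^{(g)} \rVert_2
  + \alpha\lambda \lVert \beta \rVert_1

% Network-based regularization: L1 penalty plus a quadratic Laplacian penalty.
% L is a graph Laplacian encoding links between CpG sites (e.g. sites in the same gene).
\hat{\beta}_{\mathrm{Net}} = \arg\min_{\beta}\;
  \frac{1}{2n}\lVert y - X\beta \rVert_2^2
  + \lambda_1 \lVert \beta \rVert_1
  + \lambda_2\, \beta^{\top} L \beta
```

The group term in SGL tends to keep or drop a whole gene at once, which fits the case where most CpG sites in a gene are associated with the outcome. The Laplacian term only smooths coefficients of linked sites, so individual sites can still be selected on their own.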

Article Abstract

In genetic association studies, regularization methods are often used for the analysis of high-dimensional genomic data because of their computational efficiency. DNA methylation data generated from the Infinium HumanMethylation450 BeadChip Kit have a group structure in which an individual gene consists of multiple Cytosine-phosphate-Guanine (CpG) sites. Consequently, group-based regularization can precisely detect outcome-related CpG sites. Representative examples are sparse group lasso (SGL) and network-based regularization. The former is powerful when most of the CpG sites within the same gene are associated with a phenotype outcome. In contrast, the latter is preferred when only a few of the CpG sites within the same gene are related to the outcome. In this paper, we propose a new variable selection strategy based on a selection probability that measures the selection frequency of individual variables selected by both SGL and network-based regularization. In an extensive simulation study, we demonstrate that the proposed strategy achieves strong selection performance in every setting considered, compared with both SGL and network-based regularization. We also apply the proposed strategy to identify differentially methylated CpG sites and their corresponding genes from ovarian cancer data.
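The idea of a selection probability built from selection frequencies can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the frequency is computed, so this sketch uses random half-subsamples and pools the selections of two methods. The two selectors are simple correlation-based stand-ins, not SGL or network-based regularization, and all function names are hypothetical.

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def top_k_selector(k):
    """Hypothetical stand-in for one method: keep the k most correlated variables."""
    def select(X, y):
        p = len(X[0])
        scores = [abs_corr([row[j] for row in X], y) for j in range(p)]
        return sorted(range(p), key=lambda j: -scores[j])[:k]
    return select


def threshold_selector(t):
    """Hypothetical stand-in for a second method: keep variables with |corr| > t."""
    def select(X, y):
        p = len(X[0])
        return [j for j in range(p) if abs_corr([row[j] for row in X], y) > t]
    return select


def selection_frequency(X, y, selectors, n_resamples=50, seed=0):
    """Fraction of (subsample, method) runs in which each variable is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    runs = 0
    for _ in range(n_resamples):
        idx = rng.sample(range(n), n // 2)  # random half of the samples
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        for select in selectors:
            for j in select(Xs, ys):
                counts[j] += 1
            runs += 1
    return [c / runs for c in counts]


# Toy data (assumed): 10 variables, only the first two affect the outcome.
rng = random.Random(1)
n, p = 100, 10
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] + 3 * row[1] + rng.gauss(0, 1) for row in X]

freq = selection_frequency(X, y, [top_k_selector(2), threshold_selector(0.4)])
ranked = sorted(range(p), key=lambda j: -freq[j])  # variables by pooled frequency
print(ranked[:2], [round(f, 2) for f in freq])
```

Ranking variables by a pooled frequency, rather than by the output of a single fit, is what lets a strategy of this kind borrow strength from both methods: a variable that either method picks up consistently across subsamples rises to the top. In a real analysis the stand-in selectors would be replaced by fitted SGL and network-based models.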


Source
http://dx.doi.org/10.1142/S0219720018500105

Publication Analysis

Top Keywords

cpg sites: 20
sgl network-based: 12
network-based regularization: 12
variable selection: 8
selection strategy: 8
analysis high-dimensional: 8
dna methylation: 8
methylation data: 8
sites gene: 8
proposed strategy: 8
