Genetic association studies for gene expressions: permutation-based mutual information in a comparison with standard ANOVA and as a novel approach for feature selection.

Silke Szymczak, Angelo Nuzzo, Christian Fuchsberger, Daniel F Schwarz, Andreas Ziegler, Riccardo Bellazzi, Bernd-Wolfgang Igl

BMC Proc

Institute of Medical Biometry and Statistics, University Hospital Schleswig-Holstein, Campus Lübeck, University at Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany.

Published: December 2009

Mutual information (MI) is a robust nonparametric statistical approach for identifying associations between genotypes and gene expression levels. Using the data of Problem 1 provided for Genetic Analysis Workshop 15, we first compared a quantitative MI (Tsalenko et al. 2006 J Bioinform Comput Biol 4:259-4) with the standard analysis of variance (ANOVA) and the nonparametric Kruskal-Wallis (KW) test. We then proposed a novel feature selection approach that uses MI in a classification scenario to address the small-n, large-p problem, and compared it with a feature selection that relies on an asymptotic χ² distribution. In both applications, we used a permutation-based approach to evaluate the significance of MI. We observed substantial discrepancies in significance between MI, ANOVA, and KW, which can be explained by differences in the empirical distributions of the data. In contrast to ANOVA and KW, MI detects shifts in location when the data are non-normally distributed, skewed, or contaminated with outliers. Conversely, ANOVA, but not MI, is often significant when a low-frequency genotype shows a marked difference in average gene expression relative to the other two genotypes; because MI depends on genotype frequencies, it cannot detect such differences. In the classification scenario, we show that our novel feature selection approach identifies a smaller list of markers with higher accuracy than the standard method. In conclusion, permutation-based MI approaches provide reliable and flexible statistical frameworks that seem well suited to data that are non-normal, skewed, or have an otherwise peculiar distribution. They merit further methodological investigation.
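The permutation-based MI test described above can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not confirm. Expression values are discretized into equal-frequency (rank-based) bins, and MI is computed with the plain plug-in estimator in nats. This is not the quantitative MI of Tsalenko et al. Significance is the permutation p-value (hits + 1)/(B + 1). Feature selection keeps markers whose p-value falls below a hypothetical threshold `alpha`. For comparison, the sketch also returns the likelihood-ratio statistic G = 2N·MI, which is asymptotically χ² with (r−1)(c−1) degrees of freedom under independence. This is one common asymptotic reference point; the abstract does not say that the paper's χ²-based method used exactly this statistic. All function names are illustrative only.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Plug-in MI estimate (nats) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (px[a] * py[b]))
    return mi


def quantile_bins(values, k=3):
    """Assumed discretization: k roughly equal-frequency bins by rank (ties broken arbitrarily)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def permutation_pvalue(genotypes, labels, n_perm=999, seed=0):
    """Return (observed MI, permutation p-value) by shuffling labels against genotypes."""
    rng = random.Random(seed)
    observed = mutual_information(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # small tolerance so floating-point ties with the observed value count as hits
        if mutual_information(genotypes, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def g_statistic(x, y):
    """Likelihood-ratio G = 2N*MI (nats) and its asymptotic chi-square degrees of freedom."""
    df = (len(set(x)) - 1) * (len(set(y)) - 1)
    return 2 * len(x) * mutual_information(x, y), df


def select_features(snps, labels, alpha=0.05, n_perm=999, seed=0):
    """Hypothetical selection rule: keep SNPs whose MI permutation p-value is below alpha."""
    selected = []
    for name, genotypes in snps.items():
        _, p = permutation_pvalue(genotypes, labels, n_perm, seed)
        if p < alpha:
            selected.append((name, p))
    return sorted(selected, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy data: genotype 0/1/2 shifts expression by 10 units.
    genotypes = [0] * 10 + [1] * 10 + [2] * 10
    expression = [10.0 * g + i % 5 for i, g in enumerate(genotypes)]
    bins = quantile_bins(expression, k=3)
    mi, p = permutation_pvalue(genotypes, bins)
    print(f"MI = {mi:.4f} nats, permutation p = {p:.4f}")
    print("G, df =", g_statistic(genotypes, bins))
```

The `+ 1` in the p-value counts the observed data as one of the permutations, so the estimate is never exactly zero. With B = 999 permutations the smallest attainable p-value is 0.001.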

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2359872
DOI: http://dx.doi.org/10.1186/1753-6561-1-s1-s9