Effective dimensionality for principal component analysis of time series expression data.

Michael Hörnquist, John Hertz, Mattias Wahde

Biosystems

Department of Science and Technology, Linköping University, SE-601 74, Norrköping, Sweden.

Published: October 2003

Large-scale expression data are now measured for thousands of genes simultaneously. This development has been followed by an exploration of theoretical tools for extracting as much information from these data as possible. Several groups have used principal component analysis (PCA) for this task. However, since this approach is data-driven, care must be taken not to analyze the noise instead of the data. As a strong warning against uncritical use of the output from a PCA, we employ a newly developed procedure to judge the effective dimensionality of a specific data set. Although this data set was obtained during the development of the rat central nervous system, our finding is a general property of noisy time series data. Based on knowledge of the noise level of the data, we find that the effective number of dimensions that are meaningful to use in a PCA is much lower than could be expected from the number of measurements. We attribute this both to the effects of noise and to the lack of independence of the expression levels. Finally, we explore the possibility of increasing the dimensionality by performing more measurements within one time series, and conclude that this is not a fruitful approach.
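The abstract does not spell out the procedure used to judge effective dimensionality, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a known noise standard deviation and counts the principal components whose singular values exceed what pure noise of the same shape would produce (a comparison in the spirit of parallel analysis). The function name `effective_dimensionality`, the synthetic data, and every numeric value (100 genes, 9 time points, noise level 0.5) are hypothetical choices made for illustration.

```python
import numpy as np


def effective_dimensionality(data, noise_sigma, n_trials=200, quantile=0.95, seed=0):
    """Count principal components that stand out above a known noise level.

    data: array of shape (n_genes, n_timepoints).
    noise_sigma: assumed standard deviation of the measurement noise.
    Returns (count, singular values of the data, noise threshold).
    """
    rng = np.random.default_rng(seed)

    # Center each time point across genes, as is usual before a PCA.
    centered = data - data.mean(axis=0, keepdims=True)
    sing_vals = np.linalg.svd(centered, compute_uv=False)

    # Largest singular value of pure-noise matrices of the same shape.
    top_noise = np.empty(n_trials)
    for i in range(n_trials):
        noise = rng.normal(0.0, noise_sigma, size=data.shape)
        noise -= noise.mean(axis=0, keepdims=True)
        top_noise[i] = np.linalg.svd(noise, compute_uv=False)[0]

    # A component is counted only if it exceeds the chosen noise quantile.
    threshold = np.quantile(top_noise, quantile)
    return int(np.sum(sing_vals > threshold)), sing_vals, threshold


# Synthetic illustration (all values hypothetical): two underlying temporal
# profiles shared by 100 genes, measured at 9 time points with added noise.
rng = np.random.default_rng(1)
n_genes, n_timepoints, sigma = 100, 9, 0.5
t = np.linspace(0.0, 1.0, n_timepoints)
profiles = np.stack([t, np.sin(2.0 * np.pi * t)])          # shape (2, 9)
loadings = rng.normal(0.0, 2.0, size=(n_genes, 2))          # shape (100, 2)
data = loadings @ profiles + rng.normal(0.0, sigma, size=(n_genes, n_timepoints))

dim, sing_vals, threshold = effective_dimensionality(data, noise_sigma=sigma)
print("number of time points:", n_timepoints)
print("components above the noise threshold:", dim)
```

In this toy setting the data matrix has nine columns, yet only the components driven by the two shared profiles are expected to rise clearly above the noise threshold, which mirrors the abstract's point that the number of measurements overstates the number of meaningful dimensions.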

DOI: http://dx.doi.org/10.1016/s0303-2647(03)00128-x

