Proc Natl Acad Sci U S A
November 2007
We describe the use of a higher-order singular value decomposition (HOSVD) in transforming a data tensor of genes × "x-settings" (that is, different settings of the experimental variable x) × "y-settings," which tabulates DNA microarray data from different studies, to a "core tensor" of "eigenarrays" × "x-eigengenes" × "y-eigengenes." Reformulating this multilinear HOSVD such that it decomposes the data tensor into a linear superposition of all outer products of an eigenarray, an x- and a y-eigengene, that is, rank-1 "subtensors," we define the significance of each subtensor in terms of the fraction of the overall information in the data tensor that it captures. We illustrate this HOSVD with an integration of genome-scale mRNA expression data from three yeast cell cycle time courses, two of which are under exposure to either hydrogen peroxide or menadione.
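As a rough illustration of the decomposition summarized above, the following NumPy sketch computes an HOSVD of a third-order tensor by taking the SVD of each mode unfolding. It is a minimal sketch and not the authors' code: the function names (`unfold`, `hosvd`, `reconstruct`, `significance`) are hypothetical, the random tensor stands in for real microarray data, and measuring "fraction of the overall information" by squared core entries is an assumption made here for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize `tensor`: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_multiply(tensor, matrix, mode):
    """Multiply `tensor` along `mode` by `matrix` (contracting matrix columns)."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def hosvd(tensor):
    """Return (core, factors); factors[m] holds left singular vectors of mode m."""
    factors = []
    for mode in range(tensor.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_multiply(core, u.T, mode)  # project each mode onto its basis
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor as the superposition of all rank-1 subtensors."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_multiply(out, u, mode)
    return out

def significance(core):
    """Assumed measure: squared core entry over the sum of all squared entries."""
    return core**2 / np.sum(core**2)

# Hypothetical data: 50 genes x 4 x-settings x 3 y-settings.
rng = np.random.default_rng(0)
data_tensor = rng.standard_normal((50, 4, 3))
core, factors = hosvd(data_tensor)
fractions = significance(core)
```

In this sketch the columns of `factors[0]` play the role of eigenarrays, and `factors[1]` and `factors[2]` carry the x- and y-eigengene patterns; each entry of `fractions` corresponds to one rank-1 subtensor.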
We describe the singular value decomposition (SVD) of yeast genome-scale mRNA lengths distribution data measured by DNA microarrays. SVD uncovers in the mRNA abundance levels data matrix of genes × arrays, i.e.
Proc Natl Acad Sci U S A
December 2005
We describe the use of the matrix eigenvalue decomposition (EVD) and pseudoinverse projection and a tensor higher-order EVD (HOEVD) in reconstructing the pathways that compose a cellular system from genome-scale nondirectional networks of correlations among the genes of the system. The EVD formulates a genes × genes network as a linear superposition of genes × genes decorrelated and decoupled rank-1 subnetworks, which can be associated with functionally independent pathways. The integrative pseudoinverse projection of a network computed from a "data" signal onto a designated "basis" signal approximates the network as a linear superposition of only the subnetworks that are common to both signals and simulates observation of only the pathways that are manifest in both experiments.
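The EVD step described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: it assumes the genes × genes network is a symmetric matrix (here built, hypothetically, as the product of a random expression matrix with its transpose), and the function name `rank1_subnetworks` is invented for this example.

```python
import numpy as np

def rank1_subnetworks(network):
    """Split a symmetric genes x genes network into rank-1 subnetworks.

    Returns the eigenvalues in descending order and the matching list of
    matrices lambda_m * outer(u_m, u_m), which sum back to `network`.
    """
    eigvals, eigvecs = np.linalg.eigh(network)   # eigh assumes symmetry
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    subnets = [lam * np.outer(vec, vec) for lam, vec in zip(eigvals, eigvecs.T)]
    return eigvals, subnets

# Hypothetical input: 8 genes measured on 5 arrays.
rng = np.random.default_rng(2)
expression = rng.standard_normal((8, 5))
network = expression @ expression.T              # symmetric genes x genes matrix
eigvals, subnets = rank1_subnetworks(network)
```

Because the eigenvectors are orthonormal, the subnetworks in this sketch are mutually orthogonal, which is one way to read "decorrelated and decoupled."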
We describe an integrative data-driven mathematical framework that formulates any number of genome-scale molecular biological data sets in terms of one chosen set of data samples, or of profiles extracted mathematically from data samples, designated the "basis" set. By using pseudoinverse projection, the molecular biological profiles of the data samples are least-squares-approximated as superpositions of the basis profiles. Reconstruction of the data in the basis simulates experimental observation of only the cellular states manifest in the data that correspond to those of the basis.
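The pseudoinverse projection described above amounts to a least-squares fit of each data profile by the basis profiles. A minimal NumPy sketch follows, assuming both data sets are stored as genes × samples matrices with matching gene rows; the function name `project_onto_basis` and the random matrices are hypothetical.

```python
import numpy as np

def project_onto_basis(data, basis):
    """Least-squares-approximate each column of `data` by columns of `basis`.

    Returns (coeffs, approx) where coeffs = pinv(basis) @ data and
    approx = basis @ coeffs is the reconstruction of the data in the basis.
    """
    coeffs = np.linalg.pinv(basis) @ data
    approx = basis @ coeffs
    return coeffs, approx

# Hypothetical inputs: 30 genes, 4 basis profiles, 6 data samples.
rng = np.random.default_rng(1)
basis = rng.standard_normal((30, 4))
data = rng.standard_normal((30, 6))
coeffs, approx = project_onto_basis(data, basis)
residual = data - approx   # the part of the data not expressible in the basis
```

The residual is orthogonal to every basis profile, so `approx` retains only the component of the data that the basis can represent.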
Motivation: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed since many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares formulation are proposed to estimate missing values in the gene expression data, which exploit local similarity structures in the data as well as a least squares optimization process.
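One way to picture an imputation scheme of this kind is sketched below. It is not the paper's algorithm: it assumes missing values are marked as NaN, picks the k most similar complete genes by Euclidean distance over the observed columns, and fits the target gene as a least-squares combination of those neighbors. The function name `lls_impute_row` and the parameter `k` are hypothetical.

```python
import numpy as np

def lls_impute_row(data, row, k=5):
    """Fill NaNs in data[row] from its k most similar complete rows."""
    target = data[row]
    missing = np.isnan(target)
    observed = ~missing
    # Candidate neighbors: every other row with no missing values.
    complete = [i for i in range(data.shape[0])
                if i != row and not np.isnan(data[i]).any()]
    candidates = data[complete]
    # Local similarity: distance measured on the observed columns only.
    dists = np.linalg.norm(candidates[:, observed] - target[observed], axis=1)
    neighbors = candidates[np.argsort(dists)[:k]]
    # Least squares: weights w such that neighbors.T @ w fits the observed part.
    w, *_ = np.linalg.lstsq(neighbors[:, observed].T, target[observed], rcond=None)
    filled = target.copy()
    filled[missing] = neighbors[:, missing].T @ w
    return filled

# Hypothetical data: 4 complete genes plus one gene with a missing value.
rng = np.random.default_rng(3)
complete_genes = rng.standard_normal((4, 8))
gene = 2.0 * complete_genes[0] - complete_genes[1]
gene[5] = np.nan
matrix = np.vstack([complete_genes, gene])
filled_gene = lls_impute_row(matrix, row=4, k=4)
```

In this toy case the target gene is an exact linear combination of its neighbors, so the missing entry is recovered exactly; with real data the fit is only approximate and the choice of `k` matters.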
Registration using the least-squares cost function is sensitive to the intensity fluctuations caused by the blood oxygen level dependent (BOLD) signal in functional MRI (fMRI) experiments, resulting in stimulus-correlated motion errors. These errors are severe enough to cause false-positive clusters in the activation maps of datasets acquired from 3T scanners. This paper presents a new approach to resolving the coupling between registration and activation.