Publications by authors named "Iain Johnstone"

Background: One-carbon metabolism, which includes the folate and methionine cycles, involves the transfer of methyl groups which are then utilised as a part of multiple physiological processes including redox defence. During the methionine cycle, the vitamin B12-dependent enzyme methionine synthetase converts homocysteine to methionine. The enzyme S-adenosylmethionine (SAM) synthetase then uses methionine in the production of the reactive methyl carrier SAM.

We study the sample covariance matrix for real-valued data with general population covariance, as well as MANOVA-type covariance estimators in variance components models under null hypotheses of global sphericity. In the limit as matrix dimensions increase proportionally, the asymptotic spectra of such estimators may have multiple disjoint intervals of support, possibly intersecting the negative half line. We show that the distribution of the extremal eigenvalue at each regular edge of the support has a GOE Tracy-Widom limit.

Background: Plant-derived cysteine proteinases of the papain family (CPs) attack nematodes by digesting the cuticle, leading to rupture and death of the worm. The nematode cuticle is composed of collagens and cuticlins, but the specific molecular target(s) for the proteinases have yet to be identified.

Methods: This study followed the course of nematode cuticle disruption using immunohistochemistry, scanning electron microscopy and proteomics, using a free-living nematode, Caenorhabditis elegans and the murine GI nematode Heligmosomoides bakeri (H.

Sample correlation matrices are widely used, but for high-dimensional data little is known about their spectral properties beyond "null models", which assume the data have independent coordinates. In the class of spiked models, we apply random matrix theory to derive asymptotic first-order and distributional results for both leading eigenvalues and eigenvectors of sample correlation matrices, assuming a high-dimensional regime in which the ratio of the number of variables to the sample size converges to a positive constant. While the first-order spectral properties of sample correlation matrices match those of sample covariance matrices, their asymptotic distributions can differ significantly.

We study the spectra of MANOVA estimators for variance component covariance matrices in multivariate random effects models. When the dimensionality of the observations is large and comparable to the number of realizations of each random effect, we show that the empirical spectra of such estimators are well-approximated by deterministic laws. The Stieltjes transforms of these laws are characterized by systems of fixed-point equations, which are numerically solvable by a simple iterative procedure.
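
As an illustration of the kind of iterative procedure meant here, the sketch below solves a fixed-point equation for a Stieltjes transform in the simplest setting. It is not the system of equations from the paper: it assumes the classical Marchenko-Pastur case (white sample covariance, aspect ratio `gamma`), where the transform m(z) satisfies m = 1 / (1 - gamma - z - gamma z m). The function names and the starting value are illustrative choices.

```python
import cmath


def mp_stieltjes_fixed_point(z, gamma, tol=1e-12, max_iter=10_000):
    """Iterate m <- 1 / (1 - gamma - z - gamma*z*m) until successive values agree.

    Assumption: Marchenko-Pastur setting, used as a stand-in for the
    paper's more general fixed-point systems.
    """
    m = -1.0 / z  # a Stieltjes transform behaves like -1/z for large |z|
    for _ in range(max_iter):
        m_next = 1.0 / (1.0 - gamma - z - gamma * z * m)
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    raise RuntimeError("fixed-point iteration did not converge")


def mp_stieltjes_closed_form(z, gamma):
    """Root of gamma*z*m^2 - (1 - gamma - z)*m + 1 = 0 with the larger imaginary part."""
    a = 1.0 - gamma - z
    disc = cmath.sqrt(a * a - 4.0 * gamma * z)
    roots = [(a + disc) / (2.0 * gamma * z), (a - disc) / (2.0 * gamma * z)]
    return max(roots, key=lambda r: r.imag)


if __name__ == "__main__":
    z = 1 + 1j
    print(mp_stieltjes_fixed_point(z, 0.5))
    print(mp_stieltjes_closed_form(z, 0.5))
```

For z in the upper half plane the iterate settles on the root with positive imaginary part, which is the one that corresponds to a Stieltjes transform; the closed form is included only as a check that is available in this special case.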

We study improved approximations to the distribution of the largest eigenvalue of the sample covariance matrix of n zero-mean Gaussian observations in dimension p + 1. We assume that one population principal component has variance ℓ > 1 and the remaining 'noise' components have common variance 1. In the high-dimensional limit p/n → γ > 0, we study Edgeworth corrections to the limiting Gaussian distribution of the largest eigenvalue in the supercritical case ℓ > 1 + √γ.

PCA in High Dimensions: An orientation.

Proc IEEE Inst Electr Electron Eng

August 2018

When the data are high dimensional, widely used multivariate statistical methods such as principal component analysis can behave in unexpected ways. In settings where the dimension of the observations is comparable to the sample size, upward bias in sample eigenvalues and inconsistency of sample eigenvectors are among the most notable phenomena that appear. These phenomena, and the limiting behavior of the rescaled extreme sample eigenvalues, have recently been investigated in detail under the spiked covariance model.
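
The upward bias and eigenvector inconsistency mentioned above can be made concrete with the standard first-order limits for a single spike. The sketch below assumes unit noise variance, one population eigenvalue `ell`, and aspect ratio `gamma` (dimension over sample size); these symbols and the function names are illustrative and are not taken from the article.

```python
import math


def spiked_eigenvalue_limit(ell, gamma):
    """Almost-sure limit of the top sample eigenvalue for a single spike `ell`.

    Above the threshold 1 + sqrt(gamma) the limit is ell * (1 + gamma / (ell - 1)),
    which exceeds ell; at or below it, the limit is the bulk edge (1 + sqrt(gamma))^2.
    """
    threshold = 1.0 + math.sqrt(gamma)
    if ell > threshold:
        return ell * (1.0 + gamma / (ell - 1.0))
    return threshold ** 2


def spiked_cosine_squared_limit(ell, gamma):
    """Limit of the squared cosine between sample and population eigenvectors."""
    if ell > 1.0 + math.sqrt(gamma):
        return (1.0 - gamma / (ell - 1.0) ** 2) / (1.0 + gamma / (ell - 1.0))
    return 0.0  # below the threshold the sample eigenvector carries no information


if __name__ == "__main__":
    print(spiked_eigenvalue_limit(3.0, 0.5))
    print(spiked_cosine_squared_limit(3.0, 0.5))
```

With `ell = 3` and `gamma = 0.5`, the sample eigenvalue limit is larger than the population value, and the squared cosine is strictly below one, so the sample eigenvector does not converge to the population eigenvector.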

We show that in a common high-dimensional covariance model, the choice of loss function has a profound effect on optimal estimation. In an asymptotic framework based on the Spiked Covariance model and use of orthogonally invariant estimators, we show that optimal estimation of the population covariance matrix boils down to design of an optimal shrinker that acts elementwise on the sample eigenvalues. Indeed, to each loss function there corresponds a unique admissible eigenvalue shrinker η* dominating all other shrinkers.

Consider the classical Gaussian unitary ensemble of size N and the real white Wishart ensemble with N variables and n degrees of freedom. In the limits of large N and n, with positive ratio γ in the Wishart case, the expected number of eigenvalues that exit the upper bulk edge is less than one, approaching 0.031 and 0.

We consider estimating the predictive density under Kullback-Leibler loss in an ℓ0-sparse Gaussian sequence model. Explicit expressions of the first order minimax risk along with its exact constant, asymptotically least favorable priors and optimal predictive density estimates are derived. Compared to the sparse recovery results involving point estimation of the normal mean, new decision theoretic phenomena are seen.

The classical methods of multivariate analysis are based on the eigenvalues of one or two sample covariance matrices. In many applications of these methods, for example to high dimensional data, it is natural to consider alternative hypotheses which are a low rank departure from the null hypothesis. For rank one alternatives, this note provides a representation for the joint eigenvalue density in terms of a single contour integral.

We study the problem of estimating the leading eigenvectors of a high-dimensional population covariance matrix based on independent Gaussian observations. We establish a lower bound on the minimax risk of estimators under the ℓ2 loss, in the joint limit as dimension and sample size increase to infinity, under various models of sparsity for the population eigenvectors. The lower bound on the risk points to the existence of different regimes of sparsity of the eigenvectors.

"Bulk" measurements of antiviral innate immune responses from pooled cells yield averaged signals and do not reveal underlying signaling heterogeneity in infected and bystander single cells. We examined such heterogeneity in the small intestine during rotavirus (RV) infection. Murine RV EW robustly activated type I IFNs and several antiviral genes (IFN-stimulated genes) in the intestine by bulk analysis, the source of induced IFNs primarily being hematopoietic cells.

We study the rate of convergence for the largest eigenvalue distributions in the Gaussian unitary and orthogonal ensembles to their Tracy-Widom limits. We show that one can achieve an O(N^{-2/3}) rate with particular choices of the centering and scaling constants. The arguments here also shed light on more complicated cases of Laguerre and Jacobi ensembles, in both unitary and orthogonal versions.

We discuss the identification of genes that are associated with an outcome in RNA sequencing and other sequence-based comparative genomic experiments. RNA-sequencing data take the form of counts, so models based on the Gaussian distribution are unsuitable. Moreover, normalization is challenging because different sequencing experiments may generate quite different total numbers of reads.

In Gaussian sequence models with Gaussian priors, we develop some simple examples to illustrate three perspectives on matching of posterior and frequentist probabilities when the dimension p increases with sample size n: (i) convergence of joint posterior distributions, (ii) behavior of a non-linear functional: squared error loss, and (iii) estimation of linear functionals. The three settings are progressively less demanding in terms of conditions needed for validity of the Bernstein-von Mises theorem.
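
For orientation, the basic building block in this setting is the conjugate update for one coordinate. The sketch below assumes an observation y ~ N(theta, noise_var) with prior theta ~ N(0, prior_var); the parameter names, and the choice of a zero prior mean, are assumptions made for illustration rather than details from the article.

```python
def gaussian_posterior(y, prior_var, noise_var):
    """Posterior mean and variance of theta given y, under a N(0, prior_var) prior.

    The posterior mean shrinks y towards zero by the factor
    prior_var / (prior_var + noise_var); the posterior variance is that
    same factor times noise_var.
    """
    shrink = prior_var / (prior_var + noise_var)
    return shrink * y, shrink * noise_var


def sequence_posterior(ys, prior_vars, noise_var):
    """Apply the coordinate-wise update across a whole sequence of observations."""
    return [gaussian_posterior(y, pv, noise_var) for y, pv in zip(ys, prior_vars)]


if __name__ == "__main__":
    print(gaussian_posterior(2.0, 1.0, 1.0))
    print(sequence_posterior([2.0, -1.0], [1.0, 3.0], 1.0))
```

Because the coordinates are independent under both prior and likelihood, the joint posterior is a product of these one-dimensional Gaussians, which is what makes the matching of posterior and frequentist probabilities tractable to study as the dimension grows.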

Principal components analysis (PCA) is a classic method for the reduction of dimensionality of data in the form of n observations (or cases) of a vector with p variables. Contemporary datasets often have p comparable with or even much larger than n. Our main assertions, in such settings, are (a) that some initial reduction in dimensionality is desirable before applying any PCA-type search for principal modes, and (b) the initial reduction in dimensionality is best achieved by working in a basis in which the signals have a sparse representation.

Coordination between cell fate specification and cell cycle control in multicellular organisms is essential to regulate cell numbers in tissues and organs during development, and its failure may lead to oncogenesis. In mammalian cells, as part of a general cell cycle checkpoint mechanism, the F-box protein beta-transducin repeat-containing protein (beta-TrCP) and the Skp1/Cul1/F-box complex control the periodic cell cycle fluctuations in abundance of the CDC25A and B phosphatases. Here, we find that the Caenorhabditis elegans beta-TrCP orthologue LIN-23 regulates a progressive decline of CDC-25.

Modern applications of statistical theory and methods can involve extremely large datasets, often with huge numbers of measurements on each of a comparatively small number of experimental units. New methodology and accompanying theory have emerged in response: the goal of this Theme Issue is to illustrate a number of these recent developments. This overview article introduces the difficulties that arise with high-dimensional data in the context of the very familiar linear statistical model: we give a taste of what can nevertheless be achieved when the parameter vector of interest is sparse, that is, contains many zero elements.

The nematode cuticle is a protective collagenous extracellular matrix that is modified, cross-linked, and processed by a number of key enzymes. This Ecdysozoan-specific structure is synthesized repeatedly and allows growth and development in a linked degradative and biosynthetic process known as molting. A targeted RNA interference screen using a cuticle collagen marker has been employed to identify components of the cuticle biosynthetic pathway.

The nonsense-mediated mRNA decay (NMD) pathway is a surveillance mechanism that targets the degradation of mRNAs harboring premature termination codons (PTCs). Two key aspects of NMD are the definition of a PTC codon and the identification of the molecular machinery dedicated to this mechanism. This chapter describes the development of transgenic reporters as well as the use of genome-wide RNAi and genetic screens to identify novel components of the NMD pathway in the nematode Caenorhabditis elegans.

The greatest root distribution occurs everywhere in classical multivariate analysis, but even under the null hypothesis the exact distribution has required extensive tables or special purpose software. We describe a simple approximation, based on the Tracy-Widom distribution, that in many cases can be used instead of tables or software, at least for initial screening. The quality of approximation is studied, and its use illustrated in a variety of settings.

Let A and B be independent, central Wishart matrices in p variables with common covariance and having m and n degrees of freedom, respectively. The distribution of the largest eigenvalue of (A + B)^{-1}B has numerous applications in multivariate statistics, but is difficult to calculate exactly. Suppose that m and n grow in proportion to p.

The nematode cuticle is an extremely flexible and resilient exoskeleton that permits locomotion via attachment to muscle, confers environmental protection and allows growth by molting. It is synthesised five times, once in the embryo and subsequently at the end of each larval stage prior to molting. It is a highly structured extra-cellular matrix (ECM), composed predominantly of cross-linked collagens, additional insoluble proteins termed cuticlins, associated glycoproteins and lipids.

The nonsense-mediated mRNA decay (NMD) pathway selectively degrades mRNAs harboring premature termination codons (PTCs). Seven genes (smg-1-7, for suppressor with morphological effect on genitalia) that are essential for NMD were originally identified in the nematode Caenorhabditis elegans, and orthologs of these genes have been found in several species. Whereas in humans NMD is linked to splicing, PTC definition occurs independently of exon boundaries in Drosophila.
