Principal component analysis (PCA) is an important tool for analyzing large collections of variables. It functions both as a pre-processing tool that summarizes many variables into components and as a method to reveal structure in data. Different coefficients play a central role in these two uses: when the goal is summarization, one focuses on the weights, whereas when the goal is to reveal structure, one inspects the loadings. It is well known that the solutions to both approaches can be found by singular value decomposition; in ordinary PCA, the weights, the loadings, and the right singular vectors are mathematically equivalent. What is often overlooked is that they are no longer equivalent in sparse PCA methods, which induce zeros in either the weights or the loadings. The lack of awareness of this difference has led to questionable research practices in sparse PCA. First, simulation studies mostly generate data from structures with sparse singular vectors or sparse loadings, neglecting structures with sparse weights. Second, reported results represent local optima, as the iterative routines are often initialized with the right singular vectors. In this paper we critically re-assess sparse PCA methods by also including data-generating schemes characterized by sparse weights and different initialization strategies. The results show that relying on commonly used data-generating models can lead to over-optimistic conclusions. They also highlight the impact of the choice between sparse-weights and sparse-loadings methods and of the initialization strategy. The practical consequences of this choice are illustrated with empirical datasets.
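The equivalence of weights, loadings, and right singular vectors in ordinary PCA, and its breakdown once zeros are forced into the weights, can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' code: it assumes column-centered data, takes the weights W to be the matrix that maps data to component scores (T = XW), and takes the loadings P to be the least-squares coefficients in X ≈ TPᵀ. The helper `loadings_from_weights`, the random data, and the sizes are hypothetical choices for illustration, and the sparsification step is crude thresholding, not one of the sparse PCA methods the paper evaluates.

```python
import numpy as np


def loadings_from_weights(X, W):
    """Hypothetical helper: least-squares loadings P in X ~ T P^T with T = X W."""
    T = X @ W  # component scores
    # Normal equations (T^T T) P^T = T^T X, solved for P^T
    Pt = np.linalg.solve(T.T @ T, T.T @ X)
    return Pt.T


rng = np.random.default_rng(0)
n, p, q = 200, 8, 2
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)  # column-center the data

# Ordinary PCA via SVD: the weights are the first q right singular vectors
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:q].T
P = loadings_from_weights(X, W)
print(np.allclose(W, P))  # True: weights and loadings coincide

# Crude sparsification (illustration only, not a sparse PCA method):
# keep the 3 largest absolute weights per component, zero the rest
W_sparse = W.copy()
for k in range(q):
    smallest = np.argsort(np.abs(W_sparse[:, k]))[:-3]
    W_sparse[smallest, k] = 0.0
P_sparse = loadings_from_weights(X, W_sparse)

# Loadings implied by sparse weights are generally neither sparse nor equal to them
print(np.allclose(W_sparse, P_sparse))
print(np.count_nonzero(W_sparse), np.count_nonzero(P_sparse))
```

In ordinary PCA, TᵀT is diagonal and the regression returns exactly the right singular vectors, so the first check holds. After zeros are imposed on W, the least-squares loadings generally stay dense and differ from W, which is why results for sparse-weights and sparse-loadings methods should not be read interchangeably.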


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10991020
DOI: http://dx.doi.org/10.3758/s13428-023-02099-0

Similar Publications

Rice (Oryza sativa L.) is a crucial crop for employment and agricultural output and is heavily reliant on family labor. This study evaluated the effects of nitrogen levels (80, 120, and 160 kg·ha⁻¹) on weed incidence and key agronomic variables, including vegetative growth, yield, and related traits, in Ecuador's primary rice-growing regions, Guayas and Los Ríos.

View Article and Find Full Text PDF

Microbial contamination and the prevalence of foodborne pathogens in mutton meat and during its slaughtering process were investigated through microbial source tracking and automated pathogen identification techniques. Samples from mutton meat, cutting boards, hand swabs, knives, weighing balances, and water sources were collected from four different retail sites in Coimbatore. Total plate count (TPC), yeast and mold count (YMC), coliforms, , , , and were examined across 91 samples.

View Article and Find Full Text PDF

Spatial transcriptomics (ST) provides critical insights into the complex spatial organization of gene expression in tissues, enabling researchers to unravel the intricate relationship between cellular environments and biological function. Identifying spatial domains within tissues is essential for understanding tissue architecture and the mechanisms underlying various biological processes, including development and disease progression. Here, we present Randomized Spatial PCA (RASP), a novel spatially aware dimensionality reduction method for spatial transcriptomics (ST) data.

View Article and Find Full Text PDF

4D pathology: translating dynamic epithelial tubulogenesis to prostate cancer pathology.

Histopathology

October 2024

Department of Pathology, Erasmus MC Cancer Institute, University Medical Centre, Rotterdam, the Netherlands.

The Gleason score is the gold standard for grading of prostate cancer (PCa) and is assessed by assigning specific grades to different microscopical growth patterns. Aside from the Gleason grades, individual growth patterns such as cribriform architecture were recently shown to have independent prognostic value for disease outcome. PCa grading is performed on static tissue samples collected at one point in time, whereas in vivo epithelial tumour structures are dynamically invading, branching and expanding into the surrounding stroma.

View Article and Find Full Text PDF

Raman spectroscopy and multivariate analysis for the waste and edible vegetable oil classification.

Nat Prod Res

October 2024

Laboratory of Materials Science and Nanotechnology (LMNT), Department of Chemical, Physics, Mathematics and Natural Science, University of Sassari, Sassari, Italy.

Twelve samples of waste cooking oil (WCO) were prepared by four different deep-frying procedures. The edible and the waste oil samples were characterised by Raman spectroscopy, revealing few and almost negligible differences between them. Therefore, the possibility of classifying the different groups of samples by extracting valuable data from the Raman spectra through statistical multivariate analysis was explored.

View Article and Find Full Text PDF

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!