Principal component analysis (PCA) is indispensable for processing high-throughput omics datasets, as it can extract meaningful biological variability while minimizing the influence of noise. However, the suitability of PCA is contingent on appropriate normalization and transformation of count data, and accurate selection of the number of principal components; improper choices can result in the loss of biological information or corruption of the signal due to excessive noise. Typical approaches to these challenges rely on heuristics that lack theoretical foundations.
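To make these choices concrete, here is a minimal sketch of the kind of heuristic pipeline the abstract refers to. It assumes a cells-by-genes count matrix and applies median library-size scaling, a `log1p` transform, PCA via SVD, and a cumulative explained-variance cutoff for the number of components. The function name `heuristic_pca` and the 90% threshold are hypothetical illustrations, not the method this work proposes.

```python
import numpy as np

def heuristic_pca(counts, var_threshold=0.9):
    """Hypothetical heuristic pipeline: normalize, log-transform, PCA,
    then keep enough components to reach an ad hoc variance threshold."""
    counts = np.asarray(counts, dtype=float)
    # Library-size normalization: rescale each cell (row) to the median total count.
    totals = counts.sum(axis=1, keepdims=True)
    scaled = counts / totals * np.median(totals)
    # Common variance-stabilizing heuristic: log(1 + x).
    logged = np.log1p(scaled)
    # Center each gene (column) before PCA.
    centered = logged - logged.mean(axis=0)
    # PCA via thin SVD; columns of u scaled by s are the PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Heuristic choice of k: smallest k whose cumulative explained variance
    # reaches the threshold (clamped in case of floating-point round-off).
    k = min(int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1, len(s))
    return u[:, :k] * s[:k], explained, k

# Usage on simulated Poisson counts (100 cells x 20 genes).
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(100, 20))
scores, explained, k = heuristic_pca(counts)
print(k, scores.shape)
```

Changing the scaling, the pseudocount, or `var_threshold` changes `k` and the retained signal, which illustrates the sensitivity to ad hoc choices that the abstract points to.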
HIV infection exerts profound and long-lasting neurodegenerative effects on the central nervous system (CNS) that can persist despite antiretroviral therapy (ART). Here, we used single-nucleus multiome sequencing to map the transcriptomic and epigenetic landscapes of postmortem human brains from 13 healthy individuals and 20 individuals with HIV who had a history of ART. Our study spanned three distinct regions (the prefrontal cortex, insular cortex, and ventral striatum), enabling a comprehensive exploration of region-specific and cross-regional perturbations.
Identifying accurate cell markers in single-cell RNA-seq data is crucial for understanding cellular diversity and function. Localized Marker Detector (LMD) is a novel tool for identifying "localized genes" (genes expressed exclusively in groups of highly similar cells), thereby characterizing cellular diversity in a multi-resolution, fine-grained manner. LMD constructs a cell-cell affinity graph, diffuses each gene's expression values across the graph, and assigns a score to each gene based on its diffusion dynamics.
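The three LMD steps described above (affinity graph, diffusion, diffusion-based scoring) can be sketched in NumPy as follows. This is an illustrative approximation under stated assumptions, not the LMD implementation: the adaptive-bandwidth kNN Gaussian kernel, the random-walk operator, and the scoring rule (summing the KL divergence between each gene's diffused distribution and the walk's stationary distribution at dyadic times 1, 2, 4, ...) are choices made here, and all function names are hypothetical.

```python
import numpy as np

def affinity_graph(X, k=5):
    """Hypothetical symmetric kNN Gaussian affinity with adaptive bandwidths."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    order = np.argsort(d2, axis=1)
    n = X.shape[0]
    # Bandwidth per cell: distance to its k-th nearest neighbor (index 0 is itself).
    sigma = np.sqrt(d2[np.arange(n), order[:, k]]) + 1e-12
    W = np.exp(-d2 / (sigma[:, None] * sigma[None, :]))
    # Keep only k-nearest-neighbor edges, symmetrized, with no self-loops.
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], order[:, 1:k + 1]] = True
    return np.where(mask | mask.T, W, 0.0)

def kl(p, q):
    """KL divergence KL(p || q), using the convention 0 * log 0 = 0."""
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))

def localization_scores(X, expr, k=5, max_power=6):
    """Assumed score: how far each gene's diffused distribution stays from equilibrium."""
    W = affinity_graph(X, k)
    deg = W.sum(axis=1)
    P = W / deg[:, None]           # row-stochastic random-walk operator
    pi = deg / deg.sum()           # stationary distribution of the walk
    scores = []
    for g in range(expr.shape[1]):
        p = expr[:, g] / expr[:, g].sum()   # gene as a distribution over cells
        score, t = 0.0, 0
        for j in range(max_power + 1):      # evaluate at dyadic times 1, 2, 4, ...
            while t < 2 ** j:
                p = p @ P                   # one random-walk step
                t += 1
            score += kl(p, pi)
        scores.append(score)
    return np.array(scores)

# Usage: 50 cells in one cluster, 10 in a distant small cluster.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(8, 0.5, (10, 2))])
localized = np.r_[np.zeros(50), rng.uniform(1, 2, 10)]   # expressed only in small cluster
broad = rng.uniform(1, 2, 60)                             # expressed everywhere
s = localization_scores(X, np.column_stack([localized, broad]))
print(s)
```

A gene whose expression stays concentrated in a tight group of cells remains far from equilibrium for longer, so it accumulates a larger score than a broadly expressed gene.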
Current methods for comparing single-cell RNA sequencing datasets collected in multiple conditions focus on discrete regions of the transcriptional state space, such as clusters of cells. Here we quantify the effects of perturbations at the single-cell level using a continuous measure of the effect of a perturbation across the transcriptomic space. We describe this space as a manifold and develop a relative likelihood estimate of observing each cell in each of the experimental conditions using graph signal processing.
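The idea of a per-cell relative likelihood can be illustrated with a small NumPy sketch. It assumes a dense Gaussian affinity graph, treats each condition's size-normalized indicator vector as a graph signal, smooths it with a heat-kernel low-pass filter exp(-tau L), and normalizes the smoothed densities across conditions at each cell. The kernel, the filter, the parameters `tau` and `bandwidth`, and the function names are illustrative assumptions; the published method's graph construction and filter may differ.

```python
import numpy as np

def gaussian_graph(X, bandwidth=1.0):
    """Dense Gaussian affinity matrix without self-loops (assumed construction)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * bandwidth ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def relative_likelihood(X, labels, tau=0.1, bandwidth=1.0):
    """Per-cell relative likelihood of each condition via graph low-pass filtering."""
    W = gaussian_graph(X, bandwidth)
    L = np.diag(W.sum(axis=1)) - W                  # combinatorial graph Laplacian
    evals, evecs = np.linalg.eigh(L)
    heat = evecs @ np.diag(np.exp(-tau * evals)) @ evecs.T   # heat kernel exp(-tau L)
    conditions = np.unique(labels)
    dens = []
    for c in conditions:
        s = (labels == c).astype(float)
        s /= s.sum()                 # unit mass per condition corrects for unequal sizes
        dens.append(heat @ s)        # smoothed, density-like signal over cells
    dens = np.clip(np.column_stack(dens), 1e-12, None)   # guard against round-off negatives
    return conditions, dens / dens.sum(axis=1, keepdims=True)

# Usage: 80 cells on a line; the left half is "ctrl", the right half "stim".
X = np.linspace(0, 10, 80)[:, None]
labels = np.where(X[:, 0] < 5, "ctrl", "stim")
conds, probs = relative_likelihood(X, labels)
print(conds, probs[0], probs[-1])
```

A small `tau` keeps the estimate local, while a large `tau` pushes every cell toward equal likelihood across conditions, because each indicator was normalized to unit mass.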
IEEE Signal Process Mag
November 2020
Adv Intell Data Anal
April 2020
While neural networks are powerful approximators used to classify or embed data into lower-dimensional spaces, they are often regarded as black boxes with uninterpretable features. Here we propose a method for making hidden layers more interpretable without significantly impacting performance on the primary task. Taking inspiration from the spatial organization and localization of neuron activations in biological networks, we use a graph Laplacian penalty to structure the activations within a layer.
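One way such a penalty can be realized, sketched here under assumptions: arrange the units of a hidden layer on a 2D grid graph with adjacency W and combinatorial Laplacian L = D - W, then penalize h^T L h = (1/2) sum_ij W_ij (h_i - h_j)^2 for each activation vector h. This quantity is small when neighboring units have similar activations. The 4-neighbor grid and the helper names are hypothetical; in training, the penalty would be added to the task loss with some weight inside an autodiff framework, which this NumPy sketch does not include.

```python
import numpy as np

def grid_laplacian(rows, cols):
    """Adjacency W and Laplacian L = D - W of a rows x cols 4-neighbor grid of units."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c                      # unit index in row-major order
            if c + 1 < cols:                      # right neighbor
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < rows:                      # neighbor below
                W[i, i + cols] = W[i + cols, i] = 1.0
    return np.diag(W.sum(axis=1)) - W, W

def laplacian_penalty(H, L):
    """Mean over the batch of h^T L h, where each row h of H is one sample's activations."""
    return float(np.mean(np.einsum("bi,ij,bj->b", H, L, H)))

L, W = grid_laplacian(4, 4)
rng = np.random.default_rng(0)
H_rough = rng.normal(size=(8, 16))                     # unstructured activations
H_smooth = np.tile(np.linspace(0.0, 1.0, 4), (8, 4))   # value grows by 1/3 along each grid row
print(laplacian_penalty(H_rough, L), laplacian_penalty(H_smooth, L))
# In training (hypothetical): loss = task_loss + alpha * laplacian_penalty(H, L)
```

For the smooth pattern the penalty is 4/3 (twelve horizontal edges, each contributing (1/3)^2, and no vertical differences), while random activations score far higher, so minimizing the penalty pushes neighboring units toward similar responses.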
We propose a novel framework for combining datasets via alignment of their intrinsic geometry. This alignment can be used to fuse data originating from disparate modalities, or to correct batch effects while preserving intrinsic data structure. Importantly, we do not assume any pointwise correspondence between datasets, but instead rely on correspondence between a (possibly unknown) subset of data features.