IEEE Trans Pattern Anal Mach Intell
September 2023
Random forests are considered one of the best out-of-the-box classification and regression algorithms due to their high level of predictive performance with relatively little tuning. Pairwise proximities can be computed from a trained random forest and measure the similarity between data points relative to the supervised task. Random forest proximities have been used in many applications including the identification of variable importance, data imputation, outlier detection, and data visualization.
A fundamental task in data exploration is to extract low dimensional representations that capture intrinsic geometry in data, especially for faithfully visualizing data in two or three dimensions. Common approaches use kernel methods for manifold learning. However, these methods typically only provide an embedding of the input data and cannot extend naturally to new data points.
Neuropil is a fundamental form of tissue organization within the brain, in which densely packed neurons synaptically interconnect into precise circuit architecture. However, the structural and developmental principles that govern this nanoscale precision remain largely unknown. Here we use an iterative data coarse-graining algorithm termed 'diffusion condensation' to identify nested circuit structures within the Caenorhabditis elegans neuropil, which is known as the nerve ring.
Biomed Opt Express
November 2020
We developed a hyperspectral imaging tool based on surface-enhanced Raman spectroscopy (SERS) probes to determine the expression level and visualize the distribution of PD-L1 in individual cells. Electron-microscopic analysis of PD-L1 antibody-gold nanorod conjugates demonstrated binding to the cell surface and internalization into endosomal vesicles. Stimulation of cells with IFN-γ or metformin was used to confirm the ability of SERS probes to report treatment-induced changes.
Diesel exhaust particles (DEPs) are major constituents of air pollution and associated with numerous oxidative stress-induced human diseases. In vitro toxicity studies are useful for developing a better understanding of species-specific in vivo conditions. Conventional in vitro assessments based on oxidative biomarkers are destructive and inefficient.
Proc IEEE Int Conf Big Data
December 2019
Big data often has emergent structure that exists at multiple levels of abstraction, which are useful for characterizing complex interactions and dynamics of the observations. Here, we consider multiple levels of abstraction via a multiresolution geometry of data points at different granularities. To construct this geometry we define a time-inhomogeneous diffusion process that effectively condenses data points together to uncover nested groupings at larger and larger granularities.
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
The high-dimensional data created by high-throughput technologies require visualization tools that reveal data structure and patterns in an intuitive form. We present PHATE, a visualization method that captures both local and global nonlinear structure using an information-geometric distance between data points. We compare PHATE to other tools on a variety of artificial and biological datasets, and find that it consistently preserves a range of patterns in data, including continual progressions, branches and clusters, better than other tools.
It is currently challenging to analyze single-cell data consisting of many cells and samples, and to address variations arising from batch effects and different sample preparations. For this purpose, we present SAUCIE, a deep neural network that combines parallelization and scalability offered by neural networks, with the deep representation of data that can be learned by them to perform many single-cell data analysis tasks. Our regularizations (penalties) render features learned in hidden layers of the neural network interpretable.
Recent work has focused on the problem of nonparametric estimation of information divergence functionals between two continuous random variables. Many existing approaches require either restrictive assumptions about the density support set or difficult calculations at the support set boundary which must be known a priori. The mean squared error (MSE) convergence rate of a leave-one-out kernel density plug-in divergence functional estimator for general bounded density support sets is derived where knowledge of the support boundary, and therefore, the boundary correction is not required.
Single-cell RNA sequencing technologies suffer from many sources of technical noise, including under-sampling of mRNA molecules, often termed "dropout," which can severely obscure important gene-gene relationships. To address this, we developed MAGIC (Markov affinity-based graph imputation of cells), a method that shares information across similar cells, via data diffusion, to denoise the cell count matrix and fill in missing transcripts. We validate MAGIC on several biological systems and find it effective at recovering gene-gene relationships and additional structures.
Proc IEEE Int Conf Acoust Speech Signal Process
March 2016
High frequency oscillations (HFOs) are a promising biomarker of epileptic brain tissue and activity. HFOs additionally serve as a prototypical example of challenges in the analysis of discrete events in high-temporal resolution, intracranial EEG data. Two primary challenges are 1) dimensionality reduction, and 2) assessing feasibility of classification.