Motivation: Many real-world problems can be modeled as annotated graphs. Scalable graph algorithms that extract actionable information from such data are in demand since these graphs are large, varying in topology, and have diverse node/edge annotations. When these graphs change over time they create dynamic graphs, and open the possibility to find patterns across different time points.
View Article and Find Full Text PDFGiven a cardiac-arrest patient being monitored in the ICU (intensive care unit) for brain activity, how can we predict their health outcomes as early as possible? Early decision-making is critical in many applications, e.g. monitoring patients may assist in early intervention and improved care.
View Article and Find Full Text PDFLaw enforcement and domain experts can detect human trafficking (HT) in online escort websites by analyzing suspicious clusters of connected ads. How can we explain clustering results intuitively and interactively, visualizing potential evidence for experts to analyze? We present TRAFFICVIS, the first interface for cluster-level HT detection and labeling. Developed through months of participatory design with domain experts, TRAFFICVIS provides coordinated views in conjunction with carefully chosen backend algorithms to effectively show spatio-temporal and text patterns to a wide variety of anti-HT stakeholders.
View Article and Find Full Text PDFHow can we detect fraudulent lockstep behavior in large-scale multi-aspect data (i.e., tensors)? Can we detect it when data are too large to fit in memory or even on a disk? Past studies have shown that dense subtensors in real-world tensors (e.
View Article and Find Full Text PDFCan we use graph mining algorithms to find patterns in tumor molecular mechanisms? Can we model disease progression with multiple time-specific graph comparison algorithms? In this paper, we will focus on this area. Our main contributions are 1) we proposed the Temporal-Omics (Temp-O) workflow to model tumor progression in non-small cell lung cancer (NSCLC) using graph comparisons between multiple stage-specific graphs, and 2) we showed that temporal structures are meaningful in the tumor progression of NSCLC. Other identified temporal structures that were not highlighted in this paper may also be used to gain insights to possible novel mechanisms.
View Article and Find Full Text PDFJ Integr Neurosci
September 2016
We propose a nonlinear dynamic model for an invasive electroencephalogram analysis that learns the optimal parameters of the neural population model via the Levenberg-Marquardt algorithm. We introduce the crucial windows where the estimated parameters present patterns before seizure onset. The optimal parameters minimizes the error between the observed signal and the generated signal by the model.
View Article and Find Full Text PDFHow can we correlate the neural activity in the human brain as it responds to typed words, with properties of these terms (like 'edible', 'fits in hand')? In short, we want to find latent variables, that jointly explain both the brain activity, as well as the behavioral responses. This is one of many settings of the (CMTF) problem. Can we enhance CMTF solver, so that it can operate on potentially very large datasets that may not fit in main memory? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of CMTF algorithm, produces sparse and interpretable solutions, and parallelizes CMTF algorithm, producing sparse and interpretable solutions (up to ).
View Article and Find Full Text PDFMultiaspect data are ubiquitous in modern Big Data applications. For instance, different aspects of a social network are the different types of communication between people, the time stamp of each interaction, and the location associated to each individual. How can we jointly model all those aspects and leverage the additional information that they introduce to our analysis? Tensors, which are multidimensional extensions of matrices, are a principled and mathematically sound way of modeling such multiaspect data.
View Article and Find Full Text PDFComplex networks have been shown to exhibit universal properties, with one of the most consistent patterns being the scale-free degree distribution, but are there regularities obeyed by the r-hop neighborhood in real networks? We answer this question by identifying another power-law pattern that describes the relationship between the fractions of node pairs C(r) within r hops and the hop count r. This scale-free distribution is pervasive and describes a large variety of networks, ranging from social and urban to technological and biological networks. In particular, inspired by the definition of the fractal correlation dimension D2 on a point-set, we consider the hop-count r to be the underlying distance metric between two vertices of the network, and we examine the scaling of C(r) with r.
View Article and Find Full Text PDFGiven a simple noun such as apple, and a question such as "Is it edible?," what processes take place in the human brain? More specifically, given the stimulus, what are the interactions between (groups of) neurons (also known as functional connectivity) and how can we automatically infer those interactions, given measurements of the brain activity? Furthermore, how does this connectivity differ across different human subjects? In this work, we show that this problem, even though originating from the field of neuroscience, can benefit from big data techniques; we present a simple, novel good-enough brain model, or GeBM in short, and a novel algorithm Sparse-SysId, which are able to effectively model the dynamics of the neuron interactions and infer the functional connectivity. Moreover, GeBM is able to simulate basic psychological phenomena such as habituation and priming (whose definition we provide in the main text). We evaluate GeBM by using real brain data.
View Article and Find Full Text PDFNetwork robustness is an important principle in biology and engineering. Previous studies of global networks have identified both redundancy and sparseness as topological properties used by robust networks. By focusing on molecular subnetworks, or modules, we show that module topology is tightly linked to the level of environmental variability (noise) the module expects to encounter.
View Article and Find Full Text PDFProc SIAM Int Conf Data Min
January 2014
How can we correlate the neural activity in the human brain as it responds to typed words, with properties of these terms (like 'edible', 'fits in hand')? In short, we want to find latent variables, that jointly explain both the brain activity, as well as the behavioral responses. This is one of many settings of the (CMTF) problem. Can we accelerate CMTF solver, so that it runs within a few minutes instead of tens of hours to a day, while maintaining good accuracy? We introduce TURBO-SMT, a meta-method capable of doing exactly that: it boosts the performance of CMTF algorithm, by up to ×, along with an up to increase in sparsity, with comparable accuracy to the baseline.
View Article and Find Full Text PDFLet us consider that someone is starting a research on a topic that is unfamiliar to them. Which seminal papers have influenced the topic the most? What is the genealogy of the seminal papers in this topic? These are the questions that they can raise, which we try to answer in this paper. First, we propose an algorithm that finds a set of seminal papers on a given topic.
View Article and Find Full Text PDFComput Math Methods Med
December 2013
Recently, data with complex characteristics such as epilepsy electroencephalography (EEG) time series has emerged. Epilepsy EEG data has special characteristics including nonlinearity, nonnormality, and nonperiodicity. Therefore, it is important to find a suitable forecasting method that covers these special characteristics.
View Article and Find Full Text PDFThe traditional approach to functional image analysis models images as matrices of raw voxel intensity values. Although such a representation is widely utilized and heavily entrenched both within neuroimaging and in the wider data mining community, the strong interactions among space, time, and categorical modes such as subject and experimental task inherent in functional imaging yield a dataset with "high-order" structure, which matrix models are incapable of exploiting. Reasoning across all of these modes of data concurrently requires a high-order model capable of representing relationships between all modes of the data in tandem.
View Article and Find Full Text PDFMotivation: Microarray profiling of mRNA abundance is often ill suited for temporal-spatial analysis of gene expressions in multicellular organisms such as Drosophila. Recent progress in image-based genome-scale profiling of whole-body mRNA patterns via in situ hybridization (ISH) calls for development of accurate and automatic image analysis systems to facilitate efficient mining of complex temporal-spatial mRNA patterns, which will be essential for functional genomics and network inference in higher organisms.
Results: We present SPEX(2), an automatic system for embryonic ISH image processing, which can extract, transform, compare, classify and cluster spatial gene expression patterns in Drosophila embryos.
Motivation: Protein complexes integrate multiple gene products to coordinate many biological functions. Given a graph representing pairwise protein interaction data one can search for subgraphs representing protein complexes. Previous methods for performing such search relied on the assumption that complexes form a clique in that graph.
View Article and Find Full Text PDF