Publications by authors named "George C Runger"

Motivation: Matched case-control analysis is widely used in biomedical studies to identify exposure variables associated with health conditions. Matching is used to improve efficiency. Existing variable selection methods for matched case-control studies struggle in high-dimensional settings where interactions among variables are also important.


Background: Copy number alterations (CNAs) are defined as somatic gains or losses of DNA regions. The profiles of CNAs may provide a fingerprint specific to a tumor type or tumor grade. Low-coverage sequencing for reporting CNAs has recently gained interest since it was successfully translated into clinical applications.


In this paper, we propose a new end-to-end deep neural network model for time-series classification (TSC) with emphasis on both accuracy and interpretability. The proposed model contains a convolutional network component to extract high-level features and a recurrent network component to enhance the modeling of the temporal characteristics of time-series data. In addition, a feedforward fully connected network with sparse group lasso (SGL) regularization is used to generate the final classification.
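The sparse group lasso penalty mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the `lam` and `alpha` parameters, the grouping of weights, and the square-root group-size scaling are all assumptions chosen to match one common textbook form of the penalty.

```python
import math


def sparse_group_lasso_penalty(weights, groups, lam=1.0, alpha=0.5):
    """Hypothetical helper: one common form of the SGL penalty.

    penalty = lam * (alpha * ||w||_1
                     + (1 - alpha) * sum_g sqrt(|g|) * ||w_g||_2)

    `groups` is a list of index lists partitioning `weights` (an assumption
    for this sketch; the paper's grouping is not given in the abstract).
    """
    # Lasso term: encourages individual weights to be exactly zero.
    l1_term = sum(abs(w) for w in weights)

    # Group term: encourages whole groups of weights to be zero together.
    group_term = 0.0
    for idx in groups:
        group_norm = math.sqrt(sum(weights[i] ** 2 for i in idx))
        group_term += math.sqrt(len(idx)) * group_norm

    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)


# Toy example: the second group is entirely zero, so it adds no penalty.
weights = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]
penalty = sparse_group_lasso_penalty(weights, groups, lam=1.0, alpha=0.5)
```

In a training loop, a term like this would typically be added to the classification loss, so that uninformative groups of inputs to the fully connected layer are driven to zero, which is what makes the resulting model easier to interpret.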


Background: Next-generation sequencing (NGS) tests are usually performed on relatively small core biopsy or fine needle aspiration (FNA) samples. Data are limited on the tumor volume or the minimum number of FNA passes needed to yield sufficient material for NGS. We sought to identify the amount of tumor needed to run the PCDx NGS platform.


Kernel principal component analysis (KPCA) is a method widely used for denoising multivariate data. Using geometric arguments, we investigate why a projection operation inherent to all existing KPCA denoising algorithms can sometimes cause very poor denoising. Based on this, we propose a modification to the projection operation that remedies this problem and can be incorporated into any of the existing KPCA algorithms.


Phenotypic characterization of individual cells provides crucial insights into intercellular heterogeneity and enables access to information that is unavailable from ensemble averaged, bulk cell analyses. Single-cell studies have attracted significant interest in recent years and spurred the development of a variety of commercially available and research-grade technologies. To quantify cell-to-cell variability of cell populations, we have developed an experimental platform for real-time measurements of oxygen consumption (OC) kinetics at the single-cell level.


Despite significant improvements in recent years, currently available proteomic datasets still suffer from a large number of missing values. Integrative analyses based upon incomplete proteomic and transcriptomic datasets could seriously bias the biological interpretation. In this study, we applied a non-linear data-driven stochastic gradient boosted trees (GBT) model to impute missing proteomic values using a temporal transcriptomic and proteomic dataset of Shewanella oneidensis.


Motivation: Gene expression profiling technologies can generally produce mRNA abundance data for all genes in a genome. A dearth of proteomic data persists because the identification range and sensitivity of proteomic measurements lag behind those of transcriptomic measurements. With only partial proteomic data, integrative transcriptomic and proteomic analysis is likely to introduce significant bias.


This paper proposes a new feature selection methodology. The methodology is based on the stepwise variable selection procedure, but, instead of using the traditional discriminant metrics such as Wilks' Lambda, it uses an estimation of the misclassification error as the figure of merit to evaluate the introduction of new features. The expected misclassification error rate (MER) is obtained by using the densities of a constructed function of random variables, which is the stochastic representation of the conditional distribution of the quadratic discriminant function estimate.
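The general idea of stepwise selection driven by a misclassification-error estimate can be sketched as follows. This is only an illustration under stated assumptions: it swaps in a leave-one-out error estimate and a nearest-centroid classifier in place of the paper's analytic expected MER for the quadratic discriminant function, and the names `loo_error` and `forward_select` are hypothetical.

```python
def loo_error(X, y, features):
    """Leave-one-out misclassification rate of a nearest-centroid classifier
    restricted to the given feature indices (a stand-in for the paper's MER)."""
    n = len(X)
    errors = 0
    for i in range(n):
        # Accumulate per-class sums and counts from every sample except i.
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            s = sums.setdefault(y[j], [0.0] * len(features))
            for k, f in enumerate(features):
                s[k] += X[j][f]
            counts[y[j]] = counts.get(y[j], 0) + 1

        def sq_dist(label):
            # Squared Euclidean distance from sample i to a class centroid.
            return sum(
                (X[i][f] - sums[label][k] / counts[label]) ** 2
                for k, f in enumerate(features)
            )

        predicted = min(sums, key=sq_dist)
        errors += predicted != y[i]
    return errors / n


def forward_select(X, y):
    """Greedy forward selection: add the feature that most lowers the
    estimated error, and stop when no candidate improves it."""
    n_features = len(X[0])
    selected, best_err = [], float("inf")
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        err, best_f = min((loo_error(X, y, selected + [f]), f) for f in candidates)
        if err >= best_err:
            break  # no candidate reduces the estimated error
        selected.append(best_f)
        best_err = err
    return selected, best_err


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.2, 1.0], [0.1, 3.0], [1.0, 4.0], [1.2, 2.0], [1.1, 0.0]]
y = [0, 0, 0, 1, 1, 1]
selected, best_err = forward_select(X, y)
```

The design point this shares with the abstract is the stopping criterion: a feature enters only if it lowers the estimated error rate, rather than improving a separation statistic such as Wilks' Lambda.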
