Publications by authors named "John H Phan"

To use next-generation sequencing technology such as RNA-seq for medical and health applications, choosing proper analysis methods for biomarker identification remains a critical challenge for most users. The US Food and Drug Administration (FDA) has led the Sequencing Quality Control (SEQC) project to conduct a comprehensive investigation of 278 representative RNA-seq data analysis pipelines consisting of 13 sequence mapping, three quantification, and seven normalization methods. In this article, we focused on the impact of the joint effects of RNA-seq pipelines on gene expression estimation as well as the downstream prediction of disease outcomes.

Traumatic brain injury (TBI) can occur across wide segments of the population, presenting in a heterogeneous manner that makes diagnosis inconsistent and management challenging. Biomarkers offer the potential to objectively identify injury status, severity, and phenotype by measuring the relative concentrations of endogenous molecules in readily accessible biofluids. Through a data-driven, discovery approach, novel biomarker candidates for TBI were identified in the serum lipidome of adult male Sprague-Dawley rats in the first week following moderate controlled cortical impact (CCI).

Cancer survival prediction is an active area of research that can help prevent unnecessary therapies and improve patients' quality of life. Gene expression profiling is widely used in cancer studies to discover informative biomarkers that aid the prediction of different clinical endpoints. We use multiple modalities of data derived from RNA deep-sequencing (RNA-seq) to predict survival of cancer patients.

While numerous RNA-seq data analysis pipelines are available, research has shown that the choice of pipeline influences the results of differentially expressed gene detection and gene expression estimation. Gene expression estimation is a key step in RNA-seq data analysis, since the accuracy of gene expression estimates profoundly affects the subsequent analysis. Generally, gene expression estimation involves sequence alignment and quantification, and accurate gene expression estimation requires accurate alignment.

The Big Data era in biomedical research has resulted in large-cohort data repositories such as The Cancer Genome Atlas (TCGA). These repositories routinely contain hundreds of matched patient samples for genomic, proteomic, imaging, and clinical data modalities, enabling holistic and multi-modal integrative analysis of human disease. Using TCGA renal and ovarian cancer data, we conducted a novel investigation of multi-modal data integration by combining histopathological image and RNA-seq data.

We compare methods for filtering RNA-seq low-expression genes and investigate the effect of filtering on detection of differentially expressed genes (DEGs). Although RNA-seq technology has improved the dynamic range of gene expression quantification, low-expression genes may be indistinguishable from sampling noise. The presence of noisy, low-expression genes can decrease the sensitivity of detecting DEGs.

Histopathological whole-slide images (WSIs) have emerged as an objective and quantitative means for image-based disease diagnosis. However, WSIs may contain acquisition artifacts that affect downstream image feature extraction and quantitative disease diagnosis. We develop a method for detecting blur artifacts in WSIs using distributions of local blur metrics.

Prediction of survival for cancer patients is an open area of research. However, many studies in this area focus on datasets with a large number of patients. We present a novel method that is specifically designed to address the challenge of data scarcity, which is common in cancer datasets.

Background: Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model.

Results: We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays.

RNA-seq enables quantification of the human transcriptome. Estimation of gene expression is a fundamental issue in the analysis of RNA-seq data. However, there is an inherent ambiguity in distinguishing between genes with very low expression and experimental or transcriptional noise.

RNA-seq data analysis pipelines are generally composed of sequence alignment, expression quantification, expression normalization, and differentially expressed gene (DEG) detection. Each step has numerous specific tools or algorithms, so we cannot explore all combinatorial pipelines and provide a comprehensive comparison of pipeline performance. To understand the mechanism of RNA-seq data analysis pipelines and provide some useful information for pipeline selection, we believe it is necessary to analyze the interactions among pipeline components.

Robust prediction models are important for numerous science, engineering, and biomedical applications. However, best-practice procedures for optimizing prediction models can be computationally complex, especially when choosing models from among hundreds or thousands of parameter choices. Computational complexity has further increased with the growth of data in these fields, concurrent with the era of "Big Data".

Researchers have developed computer-aided decision support systems for translational medicine that aim to objectively and efficiently diagnose cancer using histopathological images. However, the performance of such systems is confounded by nonbiological experimental variations or "batch effects" that can commonly occur in histopathological data, especially when images are acquired using different imaging devices and patient samples. This is even more problematic in large-scale studies in which cross-laboratory sharing of large volumes of data is necessary.

Background: Genome annotation is a crucial component of RNA-seq data analysis. Much effort has been devoted to producing an accurate and rational annotation of the human genome. An annotated genome provides a comprehensive catalogue of genomic functional elements.

One way to gain a more comprehensive picture of the complex function of a cell is to study the transcriptome. A promising technology for studying the transcriptome is RNA sequencing, an application of which is to quantify elements in the transcriptome and to link quantitative observations to biology. Although numerous quantification algorithms are publicly available, no method of systematically assessing these algorithms has been developed.

RNA-Seq, a deep sequencing technique, is a potential successor to microarrays for studying the transcriptome. One of many aspects of transcriptomics that are of interest to researchers is gene expression estimation. With rapid development in RNA-Seq, numerous tools are now available to estimate gene expression, each producing different results.

Background: Analysis of tissue biopsy whole-slide images (WSIs) depends on effective detection and elimination of image artifacts. We present a novel method to detect tissue-fold artifacts in histopathological WSIs. We also study the effect of tissue folds on image features and prediction models.

RNA-sequencing (RNA-seq) technology has emerged as the preferred method for quantification of gene and isoform expression. Numerous RNA-seq quantification tools have been proposed and developed, bringing us closer to developing expression-based diagnostic tests based on this technology. However, because of the rapidly evolving technologies and algorithms, it is essential to establish a systematic method for evaluating the quality of RNA-seq quantification.

Objectives: With the objective of bringing clinical decision support systems to reality, this article reviews histopathological whole-slide imaging informatics methods, associated challenges, and future research opportunities.

Target Audience: This review targets pathologists and informaticians who have a limited understanding of the key aspects of whole-slide image (WSI) analysis and/or a limited knowledge of state-of-the-art technologies and analysis methods.

Scope: First, we discuss the importance of imaging informatics in pathology and highlight the challenges posed by histopathological WSI.

Nanoparticle-mediated hyperthermia for cancer therapy is a growing area of cancer nanomedicine because of the potential for localized and targeted destruction of cancer cells. Localized hyperthermal effects are dependent on many factors, including nanoparticle size and shape, excitation wavelength and power, and tissue properties. Computational modeling is an important tool for investigating and optimizing these parameters.

Kinases have become one of the important groups of drug targets. To identify more kinases with potential for cancer therapy, we developed an integrative approach for the large-scale screening of functional genes capable of regulating the main traits of cancer metastasis. We first employed a self-assembled cell microarray to screen functional genes that regulate cancer cell migration using a human genome kinase siRNA library.

Background: Automatic cancer diagnostic systems based on histological image classification are important for improving therapeutic decisions. Previous studies propose textural and morphological features for such systems. These features capture patterns in histological images that are useful for both cancer grading and subtyping.

Histopathological images acquired from different experimental set-ups often suffer from batch effects due to color and scale variations. In this paper, we develop a novel scale normalization model for histopathological images based on nuclear area distributions. Results indicate that the normalization model closely fits empirical values for two renal tumor datasets.

Combining multiple microarray datasets increases sample size and leads to improved reproducibility in identification of informative genes and subsequent clinical prediction. Although microarrays have increased the rate of genomic data collection, sample size is still a major issue when identifying informative genetic biomarkers. Because of this, feature selection methods often suffer from false discoveries, resulting in poorly performing predictive models.
