Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data.

BMC Bioinformatics

Department of Information and Computer Sciences, University of Hawaii, 1680 East-West Road, Honolulu, Hawaii 96822, USA.

Published: February 2007

Background: Designing appropriate machine learning methods for identifying genes with significant discriminating power for disease outcomes has become increasingly important for understanding diseases at the genomic level. Although many machine learning methods have been developed and applied to microarray gene expression data analysis, most of them are based on linear models, which are not necessarily appropriate for the underlying relationship between the target disease and its associated explanatory genes. Linear model based methods also tend to admit false positive significant features more easily. Furthermore, linear model based algorithms often involve inverting a matrix that may be singular when the number of potentially important genes is relatively large, which leads to numerical instability. To overcome these limitations, a few non-linear methods have recently been introduced to the area. However, many existing non-linear methods leave two critical problems, model selection and model parameter tuning, unsolved or even unaddressed. In general, a unified framework that allows the parameters of both linear and non-linear models to be tuned easily is preferable in real-world applications. Kernel-induced learning methods form a class of approaches that show promise for achieving this goal.
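To make the kernel idea concrete: a kernel-induced method accesses the data only through a kernel function k(x, y), so swapping the kernel switches between a linear and a non-linear model without changing the rest of the algorithm. The minimal sketch below, assuming each sample is a vector of expression values, computes Gram matrices for a linear, a polynomial, and a Gaussian kernel. These three kernel families, their `degree`, `offset`, and `width` parameters, and the function names are illustrative assumptions, not necessarily the kernel types KIGP considers.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    # k(x, y) = <x, y>: equivalent to an ordinary linear model in the gene space
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(degree: int, offset: float = 1.0) -> Kernel:
    # k(x, y) = (<x, y> + offset)^degree: a non-linear model with gene interactions
    def k(x: Vector, y: Vector) -> float:
        return (linear_kernel(x, y) + offset) ** degree
    return k


def gaussian_kernel(width: float) -> Kernel:
    # k(x, y) = exp(-||x - y||^2 / (2 width^2)): a smooth, highly non-linear model
    def k(x: Vector, y: Vector) -> float:
        sq = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-sq / (2.0 * width ** 2))
    return k


def gram_matrix(samples: List[Vector], kernel: Kernel) -> List[List[float]]:
    # Pairwise kernel values; this matrix is all a kernel method needs from the data
    return [[kernel(a, b) for b in samples] for a in samples]


if __name__ == "__main__":
    # Hypothetical expression profiles: 3 samples x 2 genes
    samples = [[0.2, 1.1], [0.9, -0.4], [1.5, 0.3]]
    for name, kern in [("linear", linear_kernel),
                       ("poly-2", polynomial_kernel(degree=2)),
                       ("gaussian", gaussian_kernel(width=1.0))]:
        print(name, gram_matrix(samples, kern))
```

In this framing, choosing a kernel type (and its parameters) is the model selection step, while everything downstream of the Gram matrix stays the same.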

Results: A hierarchical statistical model named the kernel-imbedded Gaussian process (KIGP) is developed under a unified Bayesian framework for binary disease classification problems using microarray gene expression data. In particular, based on a probit regression setting, an adaptive algorithm with a cascading structure is designed to find the appropriate kernel, to discover the potentially significant genes, and to make the optimal class prediction accordingly. A Gibbs sampler is built as the core of the algorithm to make Bayesian inferences. Simulation studies showed that, even without any knowledge of the underlying generative model, the KIGP performed very close to the theoretical Bayesian bound, both when the Bayesian classifier was linear and when it was highly non-linear. This suggests its broader usability for microarray data analysis problems, especially those for which linear methods perform poorly. The KIGP was also applied to four published microarray datasets, and the results showed that in all of these cases the KIGP performed as well as or better than each of the state-of-the-art methods it was compared with.
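The combination of a probit setting with a Gibbs sampler can be illustrated in its simplest form. The sketch below is a Gaussian-process probit classifier using the standard latent-variable (data augmentation) treatment of probit regression: each label y_i is 1 exactly when a latent z_i = f(x_i) + e_i is positive, with f drawn from a Gaussian process and e_i ~ N(0, 1). Integrating f out gives z ~ N(0, K + I), and the sampler updates each z_i from its truncated normal conditional. This is an assumption-laden simplification, not the KIGP algorithm: the kernel is a fixed Gaussian kernel with a hand-set `width`, there is no gene selection and no kernel-type selection, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def rbf_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel; `width` is an illustrative, fixed hyperparameter."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * width ** 2))


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sample_truncated(mean, sd, positive, rng):
    """Draw z ~ N(mean, sd^2) conditioned on z > 0 (positive) or z < 0, by inverse CDF."""
    m = mean if positive else -mean          # z < 0 is the mirror image of -z > 0
    upper = STD_NORMAL.cdf(m / sd)           # P(z > 0) for z ~ N(m, sd^2)
    v = upper * (1.0 - rng.random())         # uniform on (0, upper]
    v = min(max(v, 1e-300), 1.0 - 1e-16)     # keep inv_cdf's argument strictly inside (0, 1)
    z = m - sd * STD_NORMAL.inv_cdf(v)       # inv_cdf(v) <= m / sd, hence z >= 0
    return z if positive else -z


def gp_probit_gibbs(X, y, x_new, width=1.0, n_iter=500, burn_in=100, seed=0):
    """Collapsed Gibbs sampler for GP probit classification; returns P(y_new = 1 | data)."""
    if n_iter <= burn_in:
        raise ValueError("n_iter must exceed burn_in")
    rng = random.Random(seed)
    n = len(X)
    K = [[rbf_kernel(a, b, width) for b in X] for a in X]
    # Q = (K + I)^-1 is the precision matrix of z after integrating out f
    Q = invert([[K[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])
    k_star = [rbf_kernel(x_new, b, width) for b in X]
    Qk = [sum(Q[i][j] * k_star[j] for j in range(n)) for i in range(n)]
    var_star = rbf_kernel(x_new, x_new, width) - sum(k * q for k, q in zip(k_star, Qk))
    z = [1.0 if label == 1 else -1.0 for label in y]  # start consistent with the labels
    probs = []
    for it in range(n_iter):
        for i in range(n):
            # z_i | z_-i ~ N(z_i - (Qz)_i / Q_ii, 1 / Q_ii), truncated by the sign of y_i
            Qz_i = sum(Q[i][j] * z[j] for j in range(n))
            cond_mean = z[i] - Qz_i / Q[i][i]
            z[i] = sample_truncated(cond_mean, math.sqrt(1.0 / Q[i][i]), y[i] == 1, rng)
        if it >= burn_in:
            # f(x_new) | z ~ N(k*^T (K+I)^-1 z, var_star); add unit probit noise
            mean_star = sum(q * zj for q, zj in zip(Qk, z))
            probs.append(STD_NORMAL.cdf(mean_star / math.sqrt(1.0 + var_star)))
    return sum(probs) / len(probs)  # Monte Carlo estimate of the predictive probability
```

A usage example on hypothetical one-dimensional data: with `X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]` and `y = [0, 0, 0, 1, 1, 1]`, `gp_probit_gibbs(X, y, [1.8])` returns a probability above 0.5. Inverting K + I rather than K keeps the linear algebra well-conditioned, because every eigenvalue of K + I is at least 1; the pure-Python linear algebra only keeps the sketch dependency-free and would be replaced by a library routine at realistic microarray sizes.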

Conclusion: Built mathematically on the kernel-induced feature space concept under a Bayesian framework, the KIGP method presented in this paper provides a unified machine learning approach for exploring both linear and possibly non-linear underlying relationships between the target features of a given binary disease classification problem and the related explanatory gene expression data. More importantly, it incorporates model parameter tuning into the framework. The model selection problem is addressed in the form of selecting a proper kernel type. The KIGP method also gives Bayesian probabilistic predictions for disease classification. These properties are beneficial in most real-world applications. The algorithm is also naturally robust in numerical computation. The simulation studies and the published data studies demonstrated that the proposed KIGP performs satisfactorily and consistently.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1821044
DOI: http://dx.doi.org/10.1186/1471-2105-8-67
