Kernel learning methods, whether Bayesian or frequentist, typically involve multiple levels of inference, with the coefficients of the kernel expansion being determined at the first level and the kernel and regularisation parameters carefully tuned at the second level, a process known as model selection. Model selection for kernel machines is commonly performed via optimisation of a suitable model selection criterion, often based on cross-validation or theoretical performance bounds. However, if there are a large number of kernel parameters, as for instance in the case of automatic relevance determination (ARD), there is a substantial risk of over-fitting the model selection criterion, resulting in poor generalisation performance.
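To make the two levels concrete, the sketch below (a minimal numpy/scipy illustration under assumed settings, not the paper's experimental protocol) fits kernel ridge regression with an ARD RBF kernel and tunes one length-scale per input feature at the second level by minimising a leave-one-out cross-validation criterion; with ten such kernel parameters and only sixty training points, the selected length-scales can start to fit noise in the criterion itself.

```python
# Minimal sketch (assumed settings, not the paper's protocol): kernel ridge
# regression with an ARD RBF kernel; the per-feature log length-scales are
# the "second level" parameters, tuned by minimising a leave-one-out
# cross-validation criterion.  With many such parameters the criterion
# itself can be over-fitted.
import numpy as np
from scipy.optimize import minimize

def ard_rbf(X1, X2, log_ls):
    # ARD RBF kernel: one length-scale per input dimension
    ls = np.exp(log_ls)
    sq = (((X1[:, None, :] - X2[None, :, :]) / ls) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq)

def loo_mse(log_ls, X, y, lam=0.1):
    # Exact leave-one-out error of kernel ridge regression via the hat matrix
    K = ard_rbf(X, X, log_ls)
    H = K @ np.linalg.inv(K + lam * np.eye(len(y)))
    loo_resid = (y - H @ y) / (1.0 - np.diag(H))
    return np.mean(loo_resid ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))            # only the first feature is relevant
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)

# Second level of inference: optimise 10 kernel parameters over the criterion
res = minimize(loo_mse, x0=np.ones(10), args=(X, y), method="L-BFGS-B")
print("fitted log length-scales:", np.round(res.x, 2))
print("LOO criterion at optimum:", round(res.fun, 4))
```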
Mika, Rätsch, Weston, Schölkopf and Müller [Mika, S., Rätsch, G., Weston, J.
Artificial neural networks have proved an attractive approach to non-linear regression problems arising in environmental modelling, such as statistical downscaling, short-term forecasting of atmospheric pollutant concentrations and rainfall run-off modelling. However, environmental datasets are frequently very noisy and characterized by a noise process that may be heteroscedastic (having input-dependent variance) and/or non-Gaussian. The aim of this paper is to review existing methodologies for estimating predictive uncertainty in such situations and, more importantly, to illustrate how a model of the predictive distribution may be exploited in assessing the possible impacts of climate change and in improving current decision-making processes.
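One common way to capture such heteroscedastic noise (a minimal sketch under assumed choices, a Gaussian likelihood and a tiny one-hidden-layer network, rather than any specific model reviewed in the paper) is to have the network predict both a mean and an input-dependent log-variance, and to fit it by minimising the negative log-likelihood instead of the usual sum-of-squares error:

```python
# Minimal sketch (assumptions: a tiny one-hidden-layer network and a Gaussian
# likelihood): heteroscedastic regression in which the network predicts both
# a mean and an input-dependent log-variance, fitted by minimising the
# negative log-likelihood.
import numpy as np
from scipy.optimize import minimize

H = 8  # hidden units

def unpack(theta, d):
    # Split the flat parameter vector into layer weights and biases
    i = 0
    W1 = theta[i:i + d * H].reshape(d, H); i += d * H
    b1 = theta[i:i + H]; i += H
    W2 = theta[i:i + 2 * H].reshape(H, 2); i += 2 * H
    b2 = theta[i:i + 2]
    return W1, b1, W2, b2

def forward(theta, X):
    W1, b1, W2, b2 = unpack(theta, X.shape[1])
    h = np.tanh(X @ W1 + b1)
    out = h @ W2 + b2
    return out[:, 0], out[:, 1]          # predictive mean, log-variance

def nll(theta, X, y):
    mu, log_var = forward(theta, X)
    return 0.5 * np.sum(log_var + (y - mu) ** 2 / np.exp(log_var))

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(2 * X[:, 0]) + (0.05 + 0.2 * np.abs(X[:, 0])) * rng.normal(size=200)

theta0 = 0.1 * rng.normal(size=1 * H + H + 2 * H + 2)
res = minimize(nll, theta0, args=(X, y), method="L-BFGS-B")
mu, log_var = forward(res.x, X)
print("mean predictive std:", np.exp(0.5 * log_var).mean())
```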
J.-H. Chen and C.
Motivation: Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance.
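The correspondence between a Laplace prior and an L1 penalty can be illustrated with standard tools (a minimal scikit-learn sketch on synthetic data, not the SLogReg algorithm itself): the regularisation parameter C is chosen by cross-validation and controls how many features end up with non-zero coefficients.

```python
# Minimal sketch (not Shevade and Keerthi's SLogReg solver): L1-penalised
# logistic regression, equivalent to a Laplace prior on the weights, with
# the regularisation strength chosen by cross-validation so that only a
# small number of "genes" (features) receive non-zero coefficients.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic stand-in for a microarray problem: many features, few informative
X, y = make_classification(n_samples=100, n_features=500, n_informative=10,
                           random_state=0)

clf = LogisticRegressionCV(Cs=20, cv=5, penalty="l1", solver="liblinear",
                           scoring="neg_log_loss", random_state=0)
clf.fit(X, y)

selected = np.flatnonzero(clf.coef_.ravel())
print(f"C = {clf.C_[0]:.3g}, selected {selected.size} of {X.shape[1]} features")
```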
Survival analysis is a branch of statistics concerned with the time elapsing before "failure," with diverse applications in medical statistics and the analysis of the reliability of electrical or mechanical components. We introduce a parametric accelerated life survival analysis model based on kernel learning methods that, at least in principle, is able to learn arbitrary dependencies between a vector of explanatory variables and the scale of the distribution of survival times. The proposed kernel survival analysis method is then used to model the growth domain of Clostridium botulinum, the food processing and storage conditions permitting the growth of this foodborne microbial pathogen, leading to the production of the neurotoxin responsible for botulism.
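The general idea can be sketched as follows (under assumed choices, a Weibull accelerated-life model with an RBF kernel, rather than the paper's exact formulation): the log of the scale parameter of the survival-time distribution is a kernel expansion over the training inputs, fitted by penalised maximum likelihood with right-censored observations.

```python
# Minimal sketch (assumptions: Weibull accelerated-life model, RBF kernel;
# not the paper's exact formulation): the log-scale of the survival-time
# distribution is a kernel expansion, fitted by penalised maximum likelihood
# with right censoring.
import numpy as np
from scipy.optimize import minimize

def rbf(X1, X2, gamma=0.5):
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def neg_log_lik(params, K, t, event, lam=1e-2):
    alpha, b, log_k = params[:-2], params[-2], params[-1]
    k = np.exp(log_k)                    # Weibull shape parameter
    log_scale = K @ alpha + b            # kernel expansion for log lambda(x)
    z = (t / np.exp(log_scale)) ** k
    # event = 1: observed failure, event = 0: right-censored
    ll = event * (log_k - k * log_scale + (k - 1) * np.log(t)) - z
    return -ll.sum() + lam * alpha @ K @ alpha   # ridge penalty in the RKHS

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
true_scale = np.exp(0.5 * X[:, 0])
t = true_scale * rng.weibull(1.5, size=80)
event = (t < np.quantile(t, 0.8)).astype(float)  # crudely censor longest 20%

K = rbf(X, X)
res = minimize(neg_log_lik, np.zeros(80 + 2), args=(K, t, event),
               method="L-BFGS-B")
print("fitted Weibull shape:", np.exp(res.x[-1]).round(2))
```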
We present here a simple technique that simplifies the construction of Bayesian treatments of a variety of sparse kernel learning algorithms. An incomplete Cholesky factorisation is employed to modify the dual parameter space, such that the Gaussian prior over the dual model parameters is whitened. The regularisation term then corresponds to the usual weight-decay regulariser, allowing the Bayesian analysis to proceed via the evidence framework of MacKay.
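The whitening step can be sketched as follows (assumed setting: kernel ridge regression with an RBF kernel, rather than the full range of sparse kernel machines treated in the paper): a pivoted incomplete Cholesky factorisation K ~ R^T R turns the RKHS regulariser alpha^T K alpha into a plain weight-decay penalty on beta = R alpha, so the whitened problem is ordinary (Bayesian) linear regression on the features Phi = R^T.

```python
# Minimal sketch (assumptions: kernel ridge regression, RBF kernel): the
# incomplete (pivoted) Cholesky factorisation K ~ R.T @ R re-parameterises
# the dual problem; with features Phi = R.T the regulariser alpha.T K alpha
# becomes an ordinary weight-decay penalty on beta = R @ alpha.
import numpy as np

def rbf(X1, X2, gamma=0.5):
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def incomplete_cholesky(K, rank):
    # Pivoted Cholesky: after `rank` steps, K is approximated by R.T @ R
    n = K.shape[0]
    d = np.diag(K).astype(float).copy()
    R = np.zeros((rank, n))
    for j in range(rank):
        i = int(np.argmax(d))
        R[j, i] = np.sqrt(d[i])
        R[j] = (K[i] - R[:j, i] @ R[:j]) / R[j, i]
        d = np.maximum(d - R[j] ** 2, 0.0)
    return R

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

K = rbf(X, X)
R = incomplete_cholesky(K, rank=20)
Phi = R.T                                # whitened design matrix (100 x 20)

lam = 1e-2                               # ordinary weight-decay regulariser
beta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(20), Phi.T @ y)
print("max approximation error of K:", np.abs(K - R.T @ R).max().round(4))
print("training RMSE:", np.sqrt(np.mean((Phi @ beta - y) ** 2)).round(4))
```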
Leave-one-out cross-validation has been shown to give an almost unbiased estimator of the generalisation properties of statistical models, and therefore provides a sensible criterion for model selection and comparison. In this paper we show that exact leave-one-out cross-validation of sparse Least-Squares Support Vector Machines (LS-SVMs) can be implemented with a computational complexity of only O(ln²) floating point operations, rather than the O(l²n²) operations of a naïve implementation, where l is the number of training patterns and n is the number of basis vectors. As a result, leave-one-out cross-validation becomes a practical proposition for model selection in large scale applications.
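The underlying closed-form identity can be checked numerically (a minimal sketch of the full, non-sparse LS-SVM with a bias term; the paper's contribution is the O(ln²) extension to sparse LS-SVMs with n basis vectors): the leave-one-out residual for training pattern i equals alpha_i divided by the i-th diagonal element of the inverse of the LS-SVM system matrix, so all l residuals follow from a single fit.

```python
# Minimal sketch (full, non-sparse LS-SVM with bias; the paper extends this
# to sparse LS-SVMs at O(l n^2)): the leave-one-out residual is alpha_i
# divided by the i-th diagonal element of the inverse of the system matrix,
# verified here against explicit retraining without the first pattern.
import numpy as np

def rbf(X1, X2, gamma=0.5):
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def fit_lssvm(X, y, gamma=0.5, reg=1.0):
    # Bordered LS-SVM system: [[K + I/reg, 1], [1^T, 0]] [alpha; b] = [y; 0]
    l = len(y)
    C = np.zeros((l + 1, l + 1))
    C[:l, :l] = rbf(X, X, gamma) + np.eye(l) / reg
    C[:l, l] = 1.0
    C[l, :l] = 1.0
    sol = np.linalg.solve(C, np.concatenate([y, [0.0]]))
    return sol[:l], sol[l], C            # alpha, bias, system matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)

alpha, b, C = fit_lssvm(X, y)
loo_resid = alpha / np.diag(np.linalg.inv(C))[:len(y)]  # closed-form LOO residuals

# Brute-force check for the first training pattern
X_, y_ = X[1:], y[1:]
a_, b_, _ = fit_lssvm(X_, y_)
pred0 = rbf(X[:1], X_) @ a_ + b_
print(np.allclose(loo_resid[0], y[0] - pred0[0]))        # should print True
```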