Motivation: The increasing availability of high-throughput omics data makes it possible to envision a new kind of medicine centered on individual patients. Precision medicine relies on exploiting these high-throughput data with machine-learning models, especially deep-learning approaches, to improve diagnosis. Because omics data are high-dimensional with small sample sizes, current deep-learning models end up with many parameters that must be fitted on a limited training set.
Motivation: Transcriptomics data are becoming more accessible thanks to high-throughput, lower-cost sequencing methods. However, data scarcity prevents deep learning models from reaching their full predictive power for phenotype prediction. Artificially enlarging the training set, known as data augmentation, has been suggested as a regularization strategy.
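As a rough illustration of what such augmentation can look like for expression profiles, here is a minimal sketch using Gaussian noise injection; the noise-based strategy, function name, and parameters are assumptions for illustration, not the method developed in the article:

```python
import numpy as np

def augment_with_noise(X, y, n_copies=3, sigma=0.1, seed=0):
    """Enlarge a training set by adding noise-perturbed copies of each
    expression profile (illustrative augmentation strategy only)."""
    rng = np.random.default_rng(seed)
    X_aug, y_aug = [X], [y]
    for _ in range(n_copies):
        X_aug.append(X + rng.normal(0.0, sigma, size=X.shape))
        y_aug.append(y)
    return np.vstack(X_aug), np.concatenate(y_aug)

# Toy example: 50 samples x 20,000 genes, binary phenotype
X = np.random.rand(50, 20000)
y = np.random.randint(0, 2, size=50)
X_big, y_big = augment_with_noise(X, y)
print(X_big.shape)  # (200, 20000): original set plus 3 perturbed copies
```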
In the sea of data generated daily, unlabeled samples greatly outnumber labeled ones because, in many application areas, labels are scarce or hard to obtain. In addition, unlabeled samples may belong to new classes that are absent from the label set associated with the data.
Background: Machine learning is now a standard tool for cancer prediction based on gene expression data. However, deep learning is still new to this task, and there is no clear consensus about its performance and utility. Few experimental works have evaluated deep neural networks and compared them with state-of-the-art machine learning methods.
Motivation: Medical care is becoming increasingly tailored to patients' needs due to the growing availability of omics data. Applying sophisticated machine learning models, in particular deep learning (DL), to these data can advance precision medicine. However, their use in the clinic is limited because their predictions are not accompanied by an explanation.
Background: With the rapid advancement of genomic sequencing techniques, massive production of gene expression data is becoming possible, which drives the development of precision medicine. Deep learning is a promising approach for phenotype prediction (clinical diagnosis, prognosis, and drug response) based on gene expression profiles. Existing deep learning models, however, are usually considered black boxes that provide accurate predictions but are not interpretable.
Background: The use of predictive gene signatures to assist clinical decisions is becoming increasingly important. Deep learning has great potential for predicting phenotype from gene expression profiles. However, neural networks are viewed as black boxes that provide accurate predictions without any explanation.
Background: Microbiome biomarker discovery for patient diagnosis, prognosis, and risk evaluation is attracting broad interest. Selected groups of microbial features provide signatures that characterize host disease states such as cancer or cardio-metabolic diseases. Yet current predictive models stemming from machine learning still behave as black boxes and seldom generalize well.
IEEE/ACM Trans Comput Biol Bioinform
January 2013
One of the major aims of many microarray experiments is to build discriminatory diagnosis and prognosis models. A large number of supervised methods have been proposed in the literature for microarray-based classification. Model evaluation and comparison are critical issues and are, most of the time, based on classification cost.
Int J Bioinform Res Appl
June 2011
Microarray experiments can measure the expression of thousands of genes simultaneously under various conditions. Data from these experiments are used to identify the genes involved in a particular biological phenomenon. Most current methods for such analysis assume that genes are independent.
Motivation: Receiver operating characteristic (ROC) curves are commonly used in biomedical applications to judge the performance of a discriminant across varying decision thresholds. The estimated ROC curve depends on the true positive rate (TPR) and false positive rate (FPR), the key summary metric being the area under the curve (AUC). With small samples these rates must be estimated from the training data, so a natural question arises: how well do the estimates of the AUC, TPR, and FPR compare with the true metrics?
Results: Through a simulation study using data models and analysis of real microarray data, we show that (i) for small samples, the root-mean-square differences between the estimated and true metrics are considerable; (ii) even for large samples, there is only weak correlation between the true and estimated metrics; and (iii) generally, there is weak regression of the true metric on the estimated metric.
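For reference, a small example of how the ROC curve, TPR/FPR pairs, and AUC are typically estimated from classifier scores, here with scikit-learn; the simulated scores are placeholders, not the article's data models:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(42)
# Simulated scores: positives shifted upward relative to negatives
y_true = np.concatenate([np.ones(30), np.zeros(30)])
scores = np.concatenate([rng.normal(1.0, 1.0, 30), rng.normal(0.0, 1.0, 30)])

# One (FPR, TPR) pair per decision threshold; AUC summarizes the curve
fpr, tpr, thresholds = roc_curve(y_true, scores)
print("AUC estimate:", roc_auc_score(y_true, scores))
# With only 60 samples, rerunning with a new seed shows how much the
# estimated AUC fluctuates around the true separability of the two classes.
```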
Background: Adipose tissue gene expression analysis in humans now provides a tremendous means to discover the physiopathologic gene targets critical for our understanding and treatment of obesity. Clinical studies are emerging in which adipose gene expression has been examined in hundreds of subjects, and it will be fundamentally important that these studies be comparable so that a common consensus can be reached and new therapeutic targets for obesity proposed.
Objective: We studied the effect of the biopsy sampling methods (needle-aspirated and surgical) used in clinical investigation programs on the functional interpretation of adipose tissue gene expression profiles.
Motivation: The classification methods typically used in bioinformatics classify all examples, even if the classification is ambiguous, for instance, when the example is close to the separating hyperplane in linear classification. For medical applications, it may be better to classify an example only when there is a sufficiently high degree of accuracy, rather than classify all examples with decent accuracy. Moreover, when all examples are classified, the classification rule has no control over the accuracy of the classifier; the algorithm just aims to produce a classifier with the smallest error rate possible.
EURASIP J Bioinform Syst Biol
June 2010
The aim of many microarray experiments is to build discriminatory diagnosis and prognosis models. Given the huge number of features and the small number of examples, model validity, which refers to the precision of error estimation, is a critical issue. Previous studies have addressed this issue via the deviation distribution (estimated error minus true error), in particular the deterioration of cross-validation precision in high-dimensional settings where feature selection is used to mitigate the peaking phenomenon (overfitting).
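A minimal simulation of that deviation (estimated minus true error) might look like the following, where the true error is approximated on a large independent test set; the synthetic Gaussian setup is an assumption for illustration, not the article's simulation design:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)

def sample(n, p=1000, shift=0.25):
    """Two-class Gaussian data with only 10 informative features."""
    X = rng.normal(0, 1, (n, p))
    y = rng.integers(0, 2, n)
    X[y == 1, :10] += shift
    return X, y

deviations = []
for _ in range(20):
    X_tr, y_tr = sample(40)        # small, high-dimensional training set
    X_te, y_te = sample(5000)      # large surrogate for the true error
    clf = LinearSVC()
    est_err = 1 - cross_val_score(clf, X_tr, y_tr, cv=5).mean()
    true_err = 1 - clf.fit(X_tr, y_tr).score(X_te, y_te)
    deviations.append(est_err - true_err)

# The spread of this distribution is the (im)precision the abstract refers to
print("mean deviation:", np.mean(deviations), "std:", np.std(deviations))
```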
Motivation: Microarray experiments, which allow simultaneous expression profiling of thousands of genes across conditions (tissues, cells, or time), generate data whose analysis raises difficult problems. In particular, there is a vast disproportion between the number of attributes (tens of thousands) and the number of examples (several tens). Dimension reduction is therefore a key step before applying classification approaches.
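As an illustration of that step, here is a short sketch combining univariate feature selection with a classifier inside a pipeline; the choice of selector (ANOVA F-test) and classifier (k-NN) is a generic recipe, not the specific reduction method studied in the article:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 20000))   # tens of examples, tens of thousands of genes
y = rng.integers(0, 2, 60)
X[y == 1, :50] += 0.5              # a few informative genes

# Reduce to 50 genes before classification; putting the selector in the
# pipeline means selection is redone inside each CV fold, which avoids
# selection bias in the error estimate.
pipe = make_pipeline(SelectKBest(f_classif, k=50), KNeighborsClassifier())
print(cross_val_score(pipe, X, y, cv=5).mean())
```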
The stress hormone epinephrine produces major physiological effects on skeletal muscle. Here we determined skeletal muscle mRNA expression profiles before and during a 6-h epinephrine infusion performed in nine young men. Stringent statistical analysis of data obtained with 43,000-element cDNA microarrays showed that 1206 genes were up-regulated and 474 were down-regulated.
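The article's actual analysis pipeline is not detailed in this snippet; as a generic illustration of a "stringent" microarray analysis, a per-gene paired test with Benjamini-Hochberg false discovery rate control might look like this (the data below are synthetic placeholders):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_genes = 43000
before = rng.normal(0, 1, (9, n_genes))   # 9 subjects, pre-infusion
during = rng.normal(0, 1, (9, n_genes))   # same subjects, during infusion

# Paired t-test per gene, then Benjamini-Hochberg FDR adjustment
t, p = stats.ttest_rel(during, before, axis=0)
order = np.argsort(p)
q = p[order] * n_genes / (np.arange(n_genes) + 1)  # BH adjusted p-values
q = np.minimum.accumulate(q[::-1])[::-1]           # enforce monotonicity
significant = order[q < 0.05]
print("genes called significant:", significant.size)  # ~0 on pure noise
```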