The classification of a population by a specific trait is a major task in medicine, for example in diagnostic settings, where groups of patients with specific diseases are identified, and in predictive medicine, where patients are assigned to disease severity classes that might profit from different treatments. When those subgroups become small, for example in rare diseases, imbalances between the classes are more the rule than the exception. Such imbalance makes statistical classification problematic: many observations are classified as belonging to the majority class, so the error rate of the minority class is high while the error rate of the majority class is low.
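The effect of class imbalance on per-class error rates can be illustrated with a minimal sketch. The toy data (95 majority and 5 minority observations), the always-predict-majority classifier, and the helper name `per_class_error_rates` are assumptions made purely for illustration and do not come from the article.

```python
from collections import Counter

def per_class_error_rates(y_true, y_pred):
    """Return {class: fraction of that class's observations misclassified}."""
    totals = Counter(y_true)
    errors = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return {c: errors[c] / totals[c] for c in totals}

# Hypothetical toy data: 95 majority-class and 5 minority-class observations.
y_true = ["majority"] * 95 + ["minority"] * 5

# A trivial classifier that always predicts the majority class.
y_pred = ["majority"] * 100

rates = per_class_error_rates(y_true, y_pred)
overall_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

print(rates)          # per-class error rates
print(overall_error)  # overall error rate
```

In this toy setting the overall error rate looks small even though every minority-class observation is misclassified, which is why the error rate of each class should be reported separately when classes are imbalanced.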
Classification studies are widely applied, for example in biomedical research, to classify objects or patients into predefined groups.
Although tuberculosis (TB) causes more deaths than any other pathogen, most infected individuals harbor the pathogen without signs of disease. We explored the metabolome of >400 small molecules in serum of uninfected individuals, latently infected healthy individuals and patients with active TB. We identified changes in amino acid, lipid and nucleotide metabolism pathways, providing evidence for anti-inflammatory metabolomic changes in TB.
Detection of discriminating patterns in gene expression data can be accomplished by using various methods of statistical learning. It has been proposed that sample pooling in this context would have negative effects; however, pooling cannot always be avoided. We propose a simulation framework to explicitly investigate the parameters of patterns, experimental design, noise, and choice of method in order to find out which effects on classification performance are to be expected.
Background: For heterogeneous tissues, such as blood, measurements of gene expression are confounded by relative proportions of cell types involved. Conclusions have to rely on estimation of gene expression signals for homogeneous cell populations, e.g.