In this work, we present a novel methodology for the supervised classification of time-ordered noisy data, which we call Entropic Sparse Probabilistic Approximation with Markov regularization (eSPA-Markov). It extends entropic learning methodologies, allowing the simultaneous learning of segmentation patterns, entropy-optimal feature space discretizations, and Bayesian classification rules. We prove conditions for the existence and uniqueness of the solution to the learning problem and propose a one-shot numerical learning algorithm that, to leading order, scales linearly in the dimension.
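To make the structure of such a scheme concrete, below is a minimal, hypothetical sketch in the eSPA spirit: alternating updates of discretization box centroids, entropic feature weights, a Bayesian classification rule, and a time-coupled box assignment. Every name here is illustrative, and the greedy forward pass is a simplified stand-in for a proper Markov-regularized assignment step; this is not the published eSPA-Markov algorithm.

```python
import numpy as np

def espa_markov_sketch(X, Pi, K=4, eps_e=1.0, eps_cl=1.0, eps_m=0.1, n_iter=50, seed=0):
    """Conceptual sketch only: alternating eSPA-style updates with a
    Markov-type penalty on box switching. Hypothetical simplification."""
    rng = np.random.default_rng(seed)
    T, D = X.shape
    S = X[rng.choice(T, size=K, replace=False)].copy()  # box centroids (K x D)
    W = np.full(D, 1.0 / D)                             # feature weights on the simplex
    gamma = rng.integers(0, K, size=T)                  # hard box assignment per time step
    for _ in range(n_iter):
        # Lambda-step: class frequencies inside each box give the Bayesian rule.
        Lam = np.full((Pi.shape[1], K), 1.0 / Pi.shape[1])
        for k in range(K):
            m = gamma == k
            if m.any():
                v = Pi[m].mean(axis=0) + 1e-12
                Lam[:, k] = v / v.sum()
        # Gamma-step: per-point cost = weighted distance minus classification
        # log-likelihood, plus a penalty for switching boxes between time steps
        # (greedy forward pass, standing in for the Markov regularization).
        cost = ((X[:, None, :] - S[None, :, :]) ** 2 * W).sum(-1) \
               - eps_cl * (Pi @ np.log(Lam))
        for t in range(T):
            c = cost[t].copy()
            if t > 0:
                c += eps_m * (np.arange(K) != gamma[t - 1])
            gamma[t] = c.argmin()
        # S-step: each centroid is the mean of the points assigned to it.
        for k in range(K):
            m = gamma == k
            if m.any():
                S[k] = X[m].mean(axis=0)
        # W-step: entropic softmin form that suppresses uninformative features.
        err = ((X - S[gamma]) ** 2).sum(axis=0)
        W = np.exp(-(err - err.min()) / eps_e)
        W /= W.sum()
    return S, W, gamma, Lam
```

The one-hot label matrix `Pi` (T x M) and the regularization weights `eps_e`, `eps_cl`, `eps_m` are assumed inputs; in the actual method, their roles are fixed by the variational problem rather than chosen ad hoc as here.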
Small data learning problems are characterized by a significant discrepancy between the limited number of response variable observations and the large dimension of the feature space. In this setting, common learning tools struggle to separate the features that are important for the classification task from those that carry no relevant information, and cannot derive a learning rule that discriminates among the different classes. As a potential solution to this problem, here we exploit the idea of reducing and rotating the feature space into a lower-dimensional gauge and propose the gauge-optimal approximate learning (GOAL) algorithm, which provides an analytically tractable joint solution to the dimension reduction, feature segmentation, and classification problems for small data learning problems.
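As a rough illustration of the reduce-and-rotate idea only (not the GOAL algorithm itself, which optimizes the gauge jointly with segmentation and classification), the sketch below uses PCA as a stand-in for the learned rotation and a nearest-centroid rule in the reduced space. All function names are hypothetical.

```python
import numpy as np

def reduce_and_rotate_sketch(X, y, d=2):
    """Toy stand-in for the reduce-and-rotate idea: PCA plays the role of
    the learned gauge; GOAL instead learns this rotation jointly with the
    classifier. Illustrative only."""
    mu = X.mean(axis=0)
    # Rotation: top-d right singular vectors define a low-dimensional gauge.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    G = Vt[:d].T                      # (D x d) orthonormal gauge matrix
    Z = (X - mu) @ G                  # data expressed in the reduced gauge
    # Classification: nearest class centroid in the reduced space.
    classes = np.unique(y)
    centroids = np.array([Z[y == c].mean(axis=0) for c in classes])
    def predict(X_new):
        Z_new = (X_new - mu) @ G
        dists = ((Z_new[:, None, :] - centroids[None]) ** 2).sum(-1)
        return classes[dists.argmin(axis=1)]
    return predict
```

The point of the joint formulation is that a fixed, unsupervised rotation like the PCA used above can discard exactly the directions that matter for the labels; GOAL avoids this by coupling the gauge to the classification objective.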
Classification problems in the small data regime (with a small statistic size T and a relatively large feature space dimension D) pose challenges for common machine learning (ML) and deep learning (DL) tools. The standard learning methods from these areas tend to lack robustness when applied to data sets with significantly fewer data points than dimensions and quickly reach the overfitting bound, leading to poor performance beyond the training set. To tackle this issue, we propose eSPA+, a significant extension of the recently formulated entropy-optimal scalable probabilistic approximation algorithm (eSPA).
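The overfitting failure mode in the T << D regime is easy to reproduce. The toy experiment below (plain NumPy, purely illustrative and unrelated to eSPA+ itself) fits a minimum-norm linear model to labels that are pure noise: with D >> T it interpolates the training labels perfectly while performing at chance level on held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 40, 500                            # far fewer samples than features
X_train = rng.standard_normal((T, D))
y_train = rng.integers(0, 2, T) * 2 - 1   # labels carry no signal at all
X_test = rng.standard_normal((T, D))
y_test = rng.integers(0, 2, T) * 2 - 1

# Minimum-norm least-squares fit: with D >> T it can interpolate any labels.
w, *_ = np.linalg.lstsq(X_train, y_train.astype(float), rcond=None)
train_acc = np.mean(np.sign(X_train @ w) == y_train)
test_acc = np.mean(np.sign(X_test @ w) == y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
# Typically prints train accuracy 1.00 and test accuracy near 0.50:
# the model memorizes noise, which is the failure mode eSPA+ targets.
```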
Mislabeling of cases as well as controls in case-control studies is a frequent source of strong bias in prognostic and diagnostic tests and algorithms. Common data processing methods available to researchers in the biomedical community do not allow for consistent and robust treatment of labeled data in situations where both the case and the control groups contain a non-negligible proportion of mislabeled data instances. This is an especially prominent issue in studies of late-onset conditions, where individuals who may later convert to cases can populate the control group, and in screening studies, which often have high false-positive and false-negative rates.