Publications by Jonathan Bac

Domain Adaptation Principal Component Analysis: Base Linear Method for Learning with Out-of-Distribution Data.

Evgeny M Mirkes, Jonathan Bac, Aziz Fouché, Sergey V Stasenko, Andrei Zinovyev

Entropy (Basel)

December 2022

Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets into a common space in which the source dataset is informative for training while the divergence between source and target is minimized. The most popular domain adaptation solutions are based on training neural networks that combine classification and adversarial learning modules, frequently making them both data-hungry and difficult to train.

Hubness reduction improves clustering and trajectory inference in single-cell transcriptomic data.

Elise Amblard, Jonathan Bac, Alexander Chervov, Vassili Soumelis, Andrei Zinovyev

Bioinformatics

January 2022

Motivation: Single-cell RNA-seq (scRNAseq) datasets are characterized by large ambient dimensionality, and their analyses can be affected by various manifestations of the dimensionality curse. One of these manifestations is the hubness phenomenon, i.e.

Scikit-Dimension: A Python Package for Intrinsic Dimension Estimation.

Jonathan Bac, Evgeny M Mirkes, Alexander N Gorban, Ivan Tyukin, Andrei Zinovyev

Entropy (Basel)

October 2021

Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation.

Minimum Spanning vs. Principal Trees for Structured Approximations of Multi-Dimensional Datasets.

Alexander Chervov, Jonathan Bac, Andrei Zinovyev

Entropy (Basel)

November 2020

Construction of graph-based approximations for multi-dimensional data point clouds is widely used in a variety of areas. Notable examples of applications of such approximators are cellular trajectory inference in single-cell data analysis, analysis of clinical trajectories from synchronic datasets, and skeletonization of images. Several methods have been proposed to construct such approximating graphs, with some based on computation of minimum spanning trees and some based on principal graphs generalizing principal curves.

Robust and Scalable Learning of Complex Intrinsic Dataset Geometry via ElPiGraph.

Luca Albergante, Evgeny Mirkes, Jonathan Bac, Huidong Chen, Alexis Martin

Entropy (Basel)

March 2020

Multidimensional datapoint clouds representing large datasets are frequently characterized by non-trivial low-dimensional geometry and topology which can be recovered by unsupervised machine learning approaches, in particular, by principal graphs. Principal graphs approximate the multivariate data by a graph injected into the data space with some constraints imposed on the node mapping. Here we present ElPiGraph, a scalable and robust method for constructing principal graphs.

Trajectories, bifurcations, and pseudo-time in large clinical datasets: applications to myocardial infarction and diabetes data.

Sergey E Golovenkin, Jonathan Bac, Alexander Chervov, Evgeny M Mirkes, Yuliya V Orlova

Gigascience

November 2020

Background: Large observational clinical datasets are becoming increasingly available for mining associations between various disease traits and administered therapy. These datasets can be considered as representations of the landscape of all possible disease conditions, in which a concrete disease state develops through stereotypical routes, characterized by "points of no return" and "final states" (such as lethal or recovery states). Extracting this information directly from the data remains challenging, especially in the case of synchronic (with a short-term follow-up) observations.

Lizard Brain: Tackling Locally Low-Dimensional Yet Globally Complex Organization of Multi-Dimensional Datasets.

Jonathan Bac, Andrei Zinovyev

Front Neurorobot

January 2020

Machine learning deals with datasets characterized by high dimensionality. However, in many cases, the intrinsic dimensionality of the datasets is surprisingly low. For example, the dimensionality of a robot's perception space can be large and multi-modal but its variables can have more or less complex non-linear interdependencies.
