Publications by authors named "Andrei Zinovyev"

Immunotherapy is improving the survival of patients with metastatic non-small cell lung cancer (NSCLC), yet reliable biomarkers are needed to identify responders prospectively and optimize patient care. In this study, we explore the benefits of multimodal approaches to predict immunotherapy outcome using multiple machine learning algorithms and integration strategies. We analyze baseline multimodal data from a cohort of 317 metastatic NSCLC patients treated with first-line immunotherapy, including positron emission tomography images, digitized pathological slides, bulk transcriptomic profiles, and clinical information.


Digital twins represent a key technology for precision health. Medical digital twins consist of computational models that represent the health state of individual patients over time, enabling optimal therapeutics and forecasting patient prognosis. Many health conditions involve the immune system, so it is crucial to include its key features when designing medical digital twins.


Background: Chronic obstructive pulmonary disease (COPD) exhibits considerable progression heterogeneity. We hypothesized that elastic principal graph analysis (EPGA) would identify distinct clinical phenotypes and their longitudinal relationships.

Methods: Cross-sectional data from 8,972 tobacco-exposed COPDGene participants, with and without COPD, were used to train a model with EPGA, using thirty clinical, physiologic and CT features.


Motivation: Deciphering molecular signals from omics data helps in understanding cellular processes and disease progression. Effective algorithms for extracting these signals are essential, with a strong emphasis on robustness and reproducibility.

Results: The R/Bioconductor package implements consensus independent component analysis (ICA), a data-driven deconvolution method that decomposes heterogeneous omics data and extracts features suitable for patient stratification and multimodal data integration.


Boolean networks are widely used to model the qualitative dynamics of cell fate processes by describing how the binary activation states of genes and transcription factors change over time. Being able to bridge such qualitative states with quantitative measurements of gene expression in cells, such as scRNA-seq, is a cornerstone of data-driven model construction and validation. On the one hand, scRNA-seq binarisation is a key step for inferring and validating Boolean models.
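
To make the idea of binarisation concrete, here is a minimal sketch that maps each gene's expression values to 0/1 states. The per-gene midpoint threshold, the function name `binarise`, and the toy values are illustrative assumptions; this is not the method described in the article.

```python
def binarise(expression):
    """Map each gene's expression values to 0/1 with a per-gene midpoint threshold.

    `expression` maps a gene name to its expression values across cells.
    The midpoint rule is an illustrative assumption, not the published method.
    """
    binary = {}
    for gene, values in expression.items():
        lo, hi = min(values), max(values)
        threshold = (lo + hi) / 2.0
        # A gene with constant expression carries no on/off information: call it 0.
        binary[gene] = [1 if hi > lo and v > threshold else 0 for v in values]
    return binary


# Toy values for four cells (made up for illustration).
cells = {"GATA1": [0.0, 0.2, 5.0, 4.8], "SPI1": [3.1, 2.9, 0.1, 0.0]}
states = binarise(cells)
```

Each column of `states` can then be read as one observed Boolean state of the network, which is the kind of object a Boolean model is compared against.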


Esophageal squamous cell carcinoma (ESCC) is the predominant subtype of esophageal cancer in Central Asia, often diagnosed at advanced stages. Understanding population-specific patterns of ESCC is crucial for tailored treatments. This study aimed to unravel ESCC's genetic basis in Kazakhstani patients and identify potential biomarkers for early diagnosis and targeted therapies.


Numerous studies in systems biology have demonstrated the value of analyzing high-throughput data, where molecular data such as transcriptomics and proteomics offer great opportunities for understanding the complexity of biological processes. One important aspect of data analysis in systems biology is the shift from a reductionist approach that focuses on individual components to a more integrative perspective that considers the system as a whole. As part of this shift, the emphasis has moved from differential expression of individual genes to determining the activity of gene sets. Here, we present the rROMA software package for fast and accurate computation of the activity of gene sets with coordinated expression.
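
One common way to summarise the activity of a coordinately expressed gene set is to project samples onto the first principal component of the set's expression submatrix. The sketch below illustrates that idea with a plain power iteration; the function name `gene_set_activity` and the toy data are assumptions, and this is not the rROMA API or its exact algorithm.

```python
import math


def gene_set_activity(expression, gene_set, n_iter=100):
    """Score samples along the first principal component of a gene set.

    `expression` maps a gene name to its values across samples.
    Returns (gene weights, per-sample activity scores).
    """
    genes = [g for g in gene_set if g in expression]
    if not genes:
        raise ValueError("no gene of the set is present in the data")
    # Centre each gene across samples.
    centred = []
    for g in genes:
        values = expression[g]
        mean = sum(values) / len(values)
        centred.append([v - mean for v in values])
    n_genes, n_samples = len(centred), len(centred[0])

    def project(weights):
        # Per-sample score: weighted sum of the centred gene values.
        return [sum(weights[i] * centred[i][j] for i in range(n_genes))
                for j in range(n_samples)]

    # Power iteration on the gene-gene covariance. The uniform start is an
    # assumption: it fails if it happens to be orthogonal to the first component.
    weights = [1.0 / math.sqrt(n_genes)] * n_genes
    for _ in range(n_iter):
        scores = project(weights)
        updated = [sum(centred[i][j] * scores[j] for j in range(n_samples))
                   for i in range(n_genes)]
        norm = math.sqrt(sum(u * u for u in updated))
        if norm == 0.0:
            break
        weights = [u / norm for u in updated]
    return dict(zip(genes, weights)), project(weights)


# Toy data: genes A and B are perfectly co-expressed across four samples.
toy = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [5, 5, 5, 5]}
weights, scores = gene_set_activity(toy, ["A", "B"])
```

A full implementation would also assess whether the first component explains more variance than expected for random gene sets of the same size; that step is omitted here.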


Large quantities of biological data can now be acquired to characterize cell types and states, from various sources and using a wide diversity of methods, providing scientists with more and more information to answer challenging biological questions. Unfortunately, working with this amount of data comes at the price of ever-increasing data complexity. This is caused by the multiplication of data types and batch effects, which hinders the joint use of all available data within common analyses.


Data integration of single-cell RNA-seq (scRNA-seq) data is the task of embedding datasets gathered from different sources or experiments into a common representation, so that cells with similar types or states are embedded close to one another regardless of their dataset of origin. Data integration is a crucial step in most scRNA-seq data analysis pipelines involving multiple batches. It improves data visualization, batch effect reduction, clustering, label transfer, and cell type inference.


Background: Molecular understanding of muscle-invasive (MIBC) and non-muscle-invasive (NMIBC) bladder cancer is currently based primarily on transcriptomic and genomic analyses.

Objective: To conduct proteogenomic analyses to gain insights into bladder cancer (BC) heterogeneity and identify underlying processes specific to tumor subgroups and therapeutic outcomes.

Design, Setting, and Participants: Proteomic data were obtained for 40 MIBC and 23 NMIBC cases for which transcriptomic and genomic data were already available.

Article Synopsis
  • Researchers created a model that helps us understand how cancer cells invade other areas by looking at how they interact with each other and their surroundings.
  • The model uses a mix of techniques to simulate cell movement and can predict ways to stop these cells from spreading.
  • It shows both detailed 2D and 3D pictures of the invasion process and is based on real experiments, helping scientists find new targets for treatment.

Background: Exploring the function or the developmental history of cells in various organisms provides insights into a given cell type's core molecular characteristics and putative evolutionary mechanisms. Numerous computational methods now exist for analyzing single-cell data and identifying cell states. These methods mostly rely on the expression of genes considered as markers for a given cell state.


Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets into a common space in which the source dataset is informative for training while the divergence between source and target is minimized. The most popular domain adaptation solutions are based on training neural networks that combine classification and adversarial learning modules, frequently making them both data-hungry and difficult to train.


In recent cancer genomics programs, large-scale profiling of microRNAs has been routinely used to better understand the role of microRNAs in gene regulation and disease. To support the analysis of such amounts of data, the scalability of bioinformatics pipelines is increasingly important. Here, we describe a scalable implementation of the clustered miRNA Master Regulator Analysis (clustMMRA) pipeline, developed to search for genomic clusters of microRNAs potentially driving cancer molecular subtyping.


Summary: We developed BIODICA, an integrated computational environment for applying independent component analysis (ICA) to bulk and single-cell molecular profiles, interpreting the results in terms of biological functions, and correlating them with metadata. The computational core is the novel Python package stabilized-ica, which provides an interface to several ICA algorithms, a stabilization procedure, meta-analysis, and component interpretation tools. BIODICA is equipped with a user-friendly graphical user interface, allowing inexperienced users to perform ICA-based omics data analysis.


The cell cycle is a biological process underlying the existence and propagation of life in time and space. It has long been a subject of mathematical modeling, with several alternative mechanistic modeling principles suggested, describing the known molecular mechanisms in more or less detail. Recently, the cell cycle has been investigated at the single-cell level in snapshots of unsynchronized cell populations, exploiting new methods for transcriptomic and proteomic molecular profiling.


WebMaBoSS is an easy-to-use web interface for conversion, storage, simulation and analysis of Boolean models that allows users to gain insight from these models without any specific knowledge of modeling or coding. It relies on existing software, MaBoSS, which simulates Boolean models using a stochastic approach: it applies continuous-time Markov processes over the Boolean network. It was initially built to fill the gap between Boolean and continuous formalisms, i.
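
To illustrate what a continuous-time Markov process over a Boolean network looks like, here is a minimal Gillespie-style sketch. The function `simulate`, the rate functions, and the two-node toy model are hypothetical; this is not MaBoSS code or its model format.

```python
import random


def simulate(initial_state, flip_rates, t_max, rng):
    """Simulate one stochastic trajectory of a Boolean network.

    `flip_rates` maps each node to a function state -> rate at which that
    node flips its value. Returns a list of (time, state) pairs.
    """
    state = dict(initial_state)
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        rates = {node: rate_fn(state) for node, rate_fn in flip_rates.items()}
        total = sum(rates.values())
        if total <= 0.0:
            break  # fixed point: no transition is possible
        # Waiting time to the next transition is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Pick the node to flip with probability proportional to its rate.
        pick = rng.uniform(0.0, total)
        cumulative = 0.0
        for node, rate in rates.items():
            cumulative += rate
            if rate > 0.0 and pick <= cumulative:
                state[node] = not state[node]
                break
        trajectory.append((t, dict(state)))
    return trajectory


# Toy model (assumed): A switches on spontaneously, then A activates B.
flip_rates = {
    "A": lambda s: 0.0 if s["A"] else 1.0,
    "B": lambda s: 2.0 if (s["A"] and not s["B"]) else 0.0,
}
run = simulate({"A": False, "B": False}, flip_rates, t_max=1e6, rng=random.Random(0))
```

Averaging many such trajectories gives time-dependent probabilities of network states, which is the kind of output a stochastic Boolean simulator reports.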


Motivation: Single-cell RNA-seq (scRNAseq) datasets are characterized by large ambient dimensionality, and their analyses can be affected by various manifestations of the dimensionality curse. One of these manifestations is the hubness phenomenon, i.e.


Independent component analysis (ICA) is a matrix factorization method for data dimension reduction. ICA has been widely applied to transcriptomic data for blind separation of biological, environmental, and technical factors affecting gene expression. This study aimed to analyze publicly available esophageal cancer data using ICA to identify and comprehensively analyze reproducible signaling pathways and molecular signatures involved in this cancer type.


Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation.
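
As an illustration of what an intrinsic dimension estimator computes, the sketch below implements a maximum-likelihood variant of the two-nearest-neighbour (TwoNN) idea using only the standard library. It deliberately does not use the scikit-dimension API; the function name `two_nn_dimension` and the synthetic data are assumptions.

```python
import math
import random


def two_nn_dimension(points):
    """Estimate intrinsic dimension from ratios of 2nd to 1st neighbour distances.

    For each point, mu = r2 / r1; under the TwoNN model, log(mu) is
    exponentially distributed with rate d, so d is estimated as N / sum(log mu).
    """
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        if r1 > 0.0:  # skip exact duplicates
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


rng = random.Random(0)
# A line and a plane, both embedded in three dimensions.
line = [(t, 2 * t, -t) for t in (rng.random() for _ in range(300))]
plane = [(u, v, u + v) for u, v in ((rng.random(), rng.random()) for _ in range(300))]
dim_line = two_nn_dimension(line)
dim_plane = two_nn_dimension(plane)
```

Both point clouds live in a three-dimensional ambient space, yet the estimates should come out close to 1 and 2 respectively, which is the distinction between ambient and intrinsic dimension.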


Multilayer networks help interpret the molecular basis of diseases, which is particularly challenging in rare diseases where the number of cases is small compared with the size of the associated multi-omics datasets. In this work, we develop a dimensionality reduction methodology to identify the minimal set of genes that characterize disease subgroups based on their persistent association in multilayer network communities. We apply this approach to the study of medulloblastoma, a childhood brain tumor, using proteogenomic data.


The rising interest in precise characterization of the tumour immune contexture has recently brought forward the high potential of RNA sequencing (RNA-seq) for identifying molecular mechanisms engaged in the response to immunotherapy. In this review, we provide an overview of the major principles of single-cell and conventional (bulk) RNA-seq applied to onco-immunology. We describe standard preprocessing and statistical analyses of data obtained from such techniques and highlight some computational challenges related to the sequencing of individual cells.


Ewing sarcoma (EwS) is a highly aggressive pediatric bone cancer that is defined by a somatic fusion between the EWSR1 gene and an ETS family member, most frequently the FLI1 gene, leading to expression of the chimeric transcription factor EWSR1-FLI1. Otherwise, EwS is one of the most genetically stable cancers. Having a well-known major cancer driver offers a unique opportunity to apply a systems biology approach, both to understand EwS mechanisms and to uncover general mechanistic principles of carcinogenesis.


Construction of graph-based approximations for multi-dimensional data point clouds is widely used in a variety of areas. Notable examples of applications of such approximators are cellular trajectory inference in single-cell data analysis, analysis of clinical trajectories from synchronic datasets, and skeletonization of images. Several methods have been proposed to construct such approximating graphs, with some based on computation of minimum spanning trees and some based on principal graphs generalizing principal curves.
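
The minimum spanning tree mentioned above is the simplest of these graph approximators. Here is a minimal sketch using Prim's algorithm on Euclidean distances; the function name `minimum_spanning_tree` and the toy points are assumptions, and principal graph methods add smoothing and regularisation that this sketch does not attempt.

```python
import math


def minimum_spanning_tree(points):
    """Build a Euclidean minimum spanning tree with Prim's algorithm.

    Returns a list of edges (i, j, length) over point indices.
    """
    n = len(points)
    if n == 0:
        return []
    # best[j] = (distance, i): cheapest known edge linking point j to the tree,
    # which initially contains only point 0.
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        # Attach the outside point that is closest to the current tree.
        j = min(best, key=lambda k: best[k][0])
        length, i = best.pop(j)
        edges.append((i, j, length))
        # The newly added point may offer shorter connections to the rest.
        for k in best:
            d = math.dist(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


# Toy point cloud: four points on a line.
cloud = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
tree = minimum_spanning_tree(cloud)
```

This version costs O(n^2) distance evaluations, which is fine for a sketch but would need a neighbour graph or spatial index for large point clouds.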
