Publications by authors named "Leonid Chindelevitch"

Sequence variation observed in populations of pathogens can be used for important public health and evolutionary genomic analyses, especially outbreak analysis and transmission reconstruction. Identifying this variation is typically achieved by aligning sequence reads to a reference genome, but this approach is susceptible to reference biases and requires careful filtering of called genotypes. There is a need for tools that can process this growing volume of bacterial genome data, providing rapid results, but that remain simple so they can be used without highly trained bioinformaticians, expensive data analysis, and long-term storage and processing of large files.

As public health laboratories expand their genomic sequencing and bioinformatics capacity for the surveillance of different pathogens, labs must carry out robust validation, training, and optimization of wet- and dry-lab procedures. Achieving these goals for algorithms, pipelines and instruments often requires that lower quality datasets be made available for analysis and comparison alongside those of higher quality. This range of data quality in reference sets can complicate the sharing of sub-optimal datasets that are vital for the community and for the reproducibility of assays.

Article Synopsis
  • Minimum inhibitory concentrations (MICs) are currently the standard measure of antibiotic resistance, but the traditional lab methods used to determine them are often cumbersome and inconsistent.
  • The study explored using genome sequencing and machine learning for predicting MICs, focusing on interpretable models like Elastic Net and Random Forests to enhance clinical relevance.
  • Results suggest that how MICs are treated in predictive models—either as continuous or categorical variables—impacts prediction accuracy, recommending different approaches based on the quantity of available antibiotic concentration levels.

Summary: Fastlin is a bioinformatics tool designed for rapid Mycobacterium tuberculosis complex (MTBC) lineage typing. It utilizes an ultra-fast alignment-free approach to detect previously identified barcode single nucleotide polymorphisms associated with specific MTBC lineages. In a comprehensive benchmarking against existing tools, fastlin demonstrated high accuracy and significantly faster running times.
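
The general idea of alignment-free barcode detection can be illustrated with a small k-mer lookup. This is a minimal sketch and not fastlin's actual implementation: the barcode tuple layout, the function names, and the toy sequences are all hypothetical, and reverse-complement handling and count thresholds are left out.

```python
from collections import Counter


def build_barcode_index(barcodes, k):
    """Map each k-mer centred on a lineage-defining allele to its lineage.

    `barcodes` holds (lineage, left_flank, allele, right_flank) tuples.
    This layout is an assumption made for the example; k must be odd
    and at least 3.
    """
    half = k // 2
    index = {}
    for lineage, left, allele, right in barcodes:
        kmer = left[-half:] + allele + right[:half]
        index[kmer] = lineage
    return index


def count_lineage_hits(reads, index, k):
    """Slide a window of length k over every read and count barcode hits."""
    hits = Counter()
    for read in reads:
        for start in range(len(read) - k + 1):
            lineage = index.get(read[start:start + k])
            if lineage is not None:
                hits[lineage] += 1
    return hits


# Toy data: two made-up barcode SNPs and three short reads.
barcodes = [("L2", "ACGT", "A", "TTGC"), ("L4", "GGCA", "C", "ATAG")]
index = build_barcode_index(barcodes, k=5)
reads = ["CCGTATTGG", "AAGTATTCC", "TTTTTTTT"]
hits = count_lineage_hits(reads, index, k=5)
```

Because each read is only scanned against a dictionary, no reference alignment step is needed, which is where the speed of this kind of approach comes from.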

The increasing availability of high-throughput sequencing (frequently termed next-generation sequencing (NGS)) data has created opportunities to gain deeper insights into the mechanisms of a number of diseases and is already impacting many areas of medicine and public health. The area of infectious diseases stands somewhat apart from other human diseases insofar as the relevant genomic data comes from the microbes rather than their human hosts. A particular concern about the threat of antimicrobial resistance (AMR) has driven the collection and reporting of large-scale datasets containing information from microbial genomes together with antimicrobial susceptibility test (AST) results.

The problem of computing the Elementary Flux Modes (EFMs) and Minimal Cut Sets (MCSs) of a metabolic network is a fundamental one in metabolic network analysis. A key insight is that they can be understood as a dual pair of monotone Boolean functions (MBFs). Using this insight, the computation reduces to the question of generating a dual pair of MBFs from an oracle.

An important problem in genome comparison is the genome sorting problem, that is, the problem of finding a sequence of basic operations that transforms one genome into another and whose length (possibly weighted) equals the distance between them. These sequences are called optimal sorting scenarios. However, there is usually a large number of such scenarios, and a naïve algorithm is very likely to be biased towards a specific type of scenario, impairing its usefulness in real-world applications.

Motivation: The rank distance model represents genome rearrangements in multi-chromosomal genomes as matrix operations, which allows the reconstruction of parsimonious histories of evolution by rearrangements. We seek to generalize this model by allowing for genomes with different gene content, to accommodate a broader range of biological contexts. We approach this generalization by using a matrix representation of genomes.

Antibiotic-resistant pathogens are a major public health threat. A deeper understanding of how an antibiotic's mechanism of action influences the emergence of resistance would aid in the design of new drugs and help to preserve the effectiveness of existing ones. To this end, we developed a model that links bacterial population dynamics with antibiotic-target binding kinetics.

The field of genomic epidemiology is rapidly growing as many jurisdictions begin to deploy whole-genome sequencing (WGS) in their national or regional pathogen surveillance programmes. WGS data offer a rich view of the shared ancestry of a set of taxa, typically visualized with phylogenetic trees illustrating the clusters or subtypes present in a group of taxa, their relatedness and the extent of diversification within and between them. When methicillin-resistant Staphylococcus aureus (MRSA) arose and disseminated widely, phylogenetic trees of MRSA-containing types of S. aureus had a distinctive 'comet' shape, with a 'comet head' of recently adapted drug-resistant isolates in the context of a 'comet tail' that was predominantly drug-sensitive.

Article Synopsis
  • The study focuses on developing a WHO-endorsed catalogue of mutations to help predict drug resistance in the Mycobacterium tuberculosis complex (MTBC) as part of enhancing molecular diagnostics for quicker drug susceptibility testing.
  • Using data from 38,215 MTBC isolates across 45 countries, researchers identified and classified 15,667 mutation associations, with 1,149 mutations linked to resistance and 107 to susceptibility for 13 anti-tuberculosis drugs.
  • The findings reveal high sensitivity (>80%) and specificity (>95%) for most tested drugs, showcasing the potential of the catalogue for informing treatment decisions, although fewer resistance mutations were found for certain drugs like bedaquiline and linezolid.

Background: There has been a simultaneous increase in the demand for and accessibility of genomics, transcriptomics, proteomics and metabolomics data, collectively known as omics data. This has encouraged widespread application of omics data in the life sciences, from personalized medicine to the discovery of the underlying pathophysiology of diseases. Causal analysis of omics data may provide important insight into the underlying biological mechanisms.

The shape of phylogenetic trees can be used to gain evolutionary insights. A tree's shape specifies the connectivity of a tree, while its branch lengths reflect either the time or genetic distance between branching events; well-known measures of tree shape include the Colless and Sackin imbalance, which describe the asymmetry of a tree. In other contexts, network science has become an important paradigm for describing structural features of networks and using them to understand complex systems, ranging from protein interactions to social systems.
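
The two imbalance measures named above can be made concrete with a short sketch. It assumes rooted binary trees encoded as nested 2-tuples with arbitrary leaf labels (an encoding chosen for this example only) and uses the standard unnormalized definitions: Colless sums, over internal nodes, the absolute difference in leaf counts between the two subtrees, and Sackin sums the depths of all leaves.

```python
def leaf_count(tree):
    """Number of leaves in the subtree; any non-tuple value is a leaf."""
    if not isinstance(tree, tuple):
        return 1
    left, right = tree
    return leaf_count(left) + leaf_count(right)


def colless(tree):
    """Sum over internal nodes of |leaves on the left - leaves on the right|."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    here = abs(leaf_count(left) - leaf_count(right))
    return here + colless(left) + colless(right)


def sackin(tree, depth=0):
    """Sum over leaves of the number of edges between the root and the leaf."""
    if not isinstance(tree, tuple):
        return depth
    left, right = tree
    return sackin(left, depth + 1) + sackin(right, depth + 1)


# A fully balanced tree and a maximally unbalanced (caterpillar) tree on 4 leaves.
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")

print(colless(balanced), sackin(balanced))        # 0 8
print(colless(caterpillar), sackin(caterpillar))  # 3 9
```

Both statistics depend only on the tree's connectivity, not on its branch lengths, and both are larger for the caterpillar tree than for the balanced one.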

European governments use non-pharmaceutical interventions (NPIs) to control resurging waves of COVID-19. However, they only have outdated estimates for how effective individual NPIs were in the first wave. We estimate the effectiveness of 17 NPIs in Europe's second wave from subnational case and death data by introducing a flexible hierarchical Bayesian transmission model and collecting the largest dataset of NPI implementation dates across Europe.

Motivation: Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria.

The occurrence of multiple strains of a bacterial pathogen within a single human host, referred to as a mixed infection, has important implications for both healthcare and public health. However, methods for detecting it, and especially for determining the proportion and identities of the underlying strains, from WGS (whole-genome sequencing) data, have been limited. In this paper we introduce SplitStrains, a novel method for addressing these challenges.

Article Synopsis
  • The study analyzed data from various European and non-European countries from January to May 2020 to evaluate the impact of different NPIs on virus transmission.
  • Key findings showed that closing educational institutions, limiting gatherings to 10 people or fewer, and shutting down face-to-face businesses significantly reduced transmission, while stay-at-home orders had a smaller effect.

With the exponential growth of genome databases, the importance of phylogenetics has increased dramatically over the past years. Studying phylogenetic trees enables us not only to understand how genes, genomes, and species evolve, but also to predict how they might change in the future. One of the crucial aspects of phylogenetics is the comparison of two or more phylogenetic trees.

Background: Bacterial pathogens exhibit an impressive amount of genomic diversity. This diversity can be informative of evolutionary adaptations, host-pathogen interactions, and disease transmission patterns. However, capturing this diversity directly from biological samples is challenging.

Background: The area of genome rearrangements has given rise to a number of interesting biological, mathematical and algorithmic problems. Among these, one of the most intractable ones has been that of finding the median of three genomes, a special case of the ancestral reconstruction problem. In this work we re-examine our recently proposed way of measuring genome rearrangement distance, namely, the rank distance between the matrix representations of the corresponding genomes, and show that the median of three genomes can be computed exactly in polynomial time with respect to this distance, when the median is allowed to be an arbitrary orthogonal matrix.
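
The rank distance between matrix representations can be illustrated with a small sketch. It assumes the usual encoding in which a genome over n gene extremities becomes a symmetric 0/1 matrix with a 1 at (i, j) and (j, i) for each adjacency and a 1 on the diagonal for each telomere; the distance is then the rank of the difference of two such matrices. The function names and the toy genomes are hypothetical, and the median computation itself is not attempted here.

```python
from fractions import Fraction


def genome_matrix(n, adjacencies):
    """Symmetric permutation matrix over extremities 0..n-1."""
    m = [[0] * n for _ in range(n)]
    paired = set()
    for i, j in adjacencies:
        m[i][j] = m[j][i] = 1
        paired.update((i, j))
    for i in range(n):
        if i not in paired:  # unpaired extremities are telomeres
            m[i][i] = 1
    return m


def rank(matrix):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def rank_distance(a, b):
    """Rank of the entrywise difference of two genome matrices."""
    diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    return rank(diff)


# One gene (extremities 0 and 1): linear versus circular chromosome.
linear = genome_matrix(2, [])
circular = genome_matrix(2, [(0, 1)])

# Two genes (extremities 0..3): second gene in forward versus reversed orientation.
forward = genome_matrix(4, [(1, 2)])
reversed_second = genome_matrix(4, [(1, 3)])
```

In this toy example, circularizing the single chromosome gives a distance of 1, while replacing one adjacency with another gives a distance of 2.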

Phylogenetic trees are frequently used in biology to study the relationships between a number of species or organisms. The shape of a phylogenetic tree contains useful information about patterns of speciation and extinction, so powerful tools are needed to investigate the shape of a phylogenetic tree. Tree shape statistics are a common approach to quantifying the shape of a phylogenetic tree by encoding it with a single number.

Motivation: Constraint-based modeling of metabolic networks helps researchers gain insight into the metabolic processes of many organisms, both prokaryotic and eukaryotic. Minimal cut sets (MCSs) are minimal sets of reactions whose inhibition blocks a target reaction in a metabolic network. Most approaches for finding the MCSs in constraint-based models require, either as an intermediate step or as a byproduct of the calculation, the computation of the set of elementary flux modes (EFMs), a convex basis for the valid flux vectors in the network.
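
The dependence of MCSs on EFMs can be seen in a brute-force sketch that treats each EFM as the set of reactions it uses and enumerates the minimal hitting sets of the EFMs involving the target. This is an illustration of that classical EFM-based route on toy data, not the method of the paper; the reaction names are made up, the trivial cut set consisting of the target itself is excluded by assumption, and the enumeration is exponential in the number of reactions.

```python
from itertools import combinations


def minimal_cut_sets(efms, target):
    """Minimal reaction sets that hit every EFM using `target`."""
    # Only EFMs that carry flux through the target need to be disabled.
    target_efms = [set(e) - {target} for e in efms if target in e]
    candidates = sorted(set().union(*target_efms))
    found = []
    # Increasing size guarantees smaller cut sets are found first.
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            cut = set(combo)
            # Supersets of an earlier cut set are not minimal.
            if any(prev <= cut for prev in found):
                continue
            # A cut set must remove at least one reaction from every EFM.
            if all(cut & efm for efm in target_efms):
                found.append(cut)
    return found


# Toy network: two EFMs pass through target "T", one does not.
efms = [{"R1", "R2", "T"}, {"R1", "R3", "T"}, {"R4", "R5"}]
cuts = minimal_cut_sets(efms, "T")
```

Here `cuts` contains `{"R1"}` and `{"R2", "R3"}`: knocking out R1 alone disables both target-carrying EFMs, as does knocking out R2 and R3 together.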

Motivation: Despite the remarkable advances in sequencing and computational techniques, noise in the data and the complexity of the underlying biological mechanisms make it difficult to deconvolve the phylogenetic relationships between cancer mutations. Moreover, the majority of existing datasets consist of bulk sequencing data from a single tumor sample per individual. Accurate inference of the phylogenetic order of mutations is particularly challenging in these cases, and the existing methods face several theoretical limitations.

Mathematical models are often regarded as recent innovations in the description and analysis of infectious disease outbreaks and epidemics, but simple mathematical expressions have been in use for projection of epidemic trajectories for more than a century. We recently introduced a single equation model (the incidence decay with exponential adjustment, or IDEA model) that can be used for short-term epidemiological forecasting. In the mid-19th century, Dr.
