Publications by authors named "Vaibhav Rajan"

Explainable Artificial Intelligence (XAI) techniques generate explanations for predictions from AI models. These explanations can be evaluated for (i) faithfulness to the prediction, i.e.

A Synthetic Lethal (SL) interaction is a functional relationship between two genes or functional entities where the loss of either entity is viable but the loss of both is lethal. Such pairs can be used to develop targeted anticancer therapies with fewer side effects and reduced overtreatment. However, finding clinically relevant SL interactions remains challenging.

Clustering is a fundamental tool for exploratory data analysis, and is ubiquitous across scientific disciplines. Gaussian Mixture Model (GMM) is a popular probabilistic and interpretable model for clustering. In many practical settings, the true data distribution, which is unknown, may be non-Gaussian and may be contaminated by noise or outliers.
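As a point of reference, the standard GMM is typically fitted with expectation-maximization (EM). Below is a minimal 1-D NumPy sketch of vanilla EM, shown for illustration only; the contaminated, non-Gaussian setting described above is exactly where this baseline breaks down.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 1-D data from two well-separated Gaussian clusters.
x = np.concatenate([rng.normal(0.0, 0.5, 200), rng.normal(5.0, 0.5, 200)])

# Standard EM for a 2-component GMM (illustrative; real data may be
# non-Gaussian or contaminated, which is the setting addressed above).
mu = np.array([x.min(), x.max()])        # spread-out initialisation
sigma = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    dens = weights * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
           / (sigma * np.sqrt(2 * np.pi))
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities.
    nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    weights = nk / len(x)
```

On clean Gaussian data like this, EM recovers the component means; a few outliers or heavy-tailed clusters can pull the estimates far off, which motivates robust variants.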

Risk models play a crucial role in disease prevention, particularly in intensive care units (ICUs). Diseases often have complex manifestations with heterogeneous subpopulations, or subtypes, that exhibit distinct clinical characteristics. Risk models that explicitly model subtypes have high predictive accuracy and facilitate subtype-specific personalization.

Single-cell data integration methods aim to integrate cells across data batches and modalities. Integration tasks can be categorized as horizontal, vertical, diagonal, or mosaic integration, of which mosaic integration is the most general and challenging case, with few methods developed for it. We propose scMoMaT, a method that can integrate single-cell multi-omics data under the mosaic integration scenario using matrix tri-factorization. During integration, scMoMaT can also uncover cluster-specific bio-markers across modalities.
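Matrix tri-factorization decomposes a data matrix into three factors, X ≈ U S Vᵀ. A minimal sketch, using a truncated SVD as a stand-in for the factors (scMoMaT instead learns such factors jointly across multiple matrices, with its own objective):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "cells x features" matrix constructed to have rank 2.
X = rng.random((30, 2)) @ rng.random((2, 20))

# A rank-2 tri-factorization X ~ U @ S @ Vt. Here a truncated SVD supplies
# the three factors; this is only an illustration of the decomposition's
# shape, not the method's actual training procedure.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = 2
U, S, Vt = U[:, :r], np.diag(s[:r]), Vt[:r, :]

rel_err = np.linalg.norm(U @ S @ Vt - X) / np.linalg.norm(X)
# rel_err is ~0 here because X was built to have rank exactly 2.
```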

Motivation: In many biomedical studies, there arises the need to integrate data from multiple directly or indirectly related sources. Collective matrix factorization (CMF) and its variants are models designed to collectively learn from arbitrary collections of matrices. The latent factors learnt are rich integrative representations that can be used in downstream tasks, such as clustering or relation prediction with standard machine-learning models.
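To make the collective-factorization idea concrete, here is a small sketch in which two matrices sharing a row entity are factorized with a common row factor, fitted by alternating least squares. This is a simple stand-in for the optimization schemes used by CMF variants; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p, r = 40, 25, 15, 3

# Two toy relation matrices sharing the same row entity (e.g. genes):
# X1 could be gene x sample, X2 gene x annotation. Both are exactly rank r.
U0 = rng.random((n, r)); V0 = rng.random((m, r)); W0 = rng.random((p, r))
X1, X2 = U0 @ V0.T, U0 @ W0.T

# Collective factorization: X1 ~ U V^T and X2 ~ U W^T with a SHARED U,
# fitted by alternating least squares (an illustrative scheme).
U = rng.random((n, r))
for _ in range(50):
    V = np.linalg.lstsq(U, X1, rcond=None)[0].T   # solve U V^T ~ X1
    W = np.linalg.lstsq(U, X2, rcond=None)[0].T   # solve U W^T ~ X2
    C = np.vstack([V, W])                          # stacked column factors
    U = np.linalg.lstsq(C, np.hstack([X1, X2]).T, rcond=None)[0].T

err = np.linalg.norm(U @ V.T - X1) + np.linalg.norm(U @ W.T - X2)
```

The shared factor U is the integrative representation: each row embeds one entity using evidence from both matrices, and can feed downstream clustering or relation-prediction models.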

Background: Patient representation learning aims to learn features, also called representations, from input sources automatically, often in an unsupervised manner, for use in predictive models. This obviates the need for cumbersome, time- and resource-intensive manual feature engineering, especially from unstructured data such as text, images, or graphs. Most previous techniques have used neural network-based autoencoders to learn patient representations, primarily from clinical notes in electronic medical records (EMRs).

Study of pairwise genetic interactions, such as mutually exclusive mutations, has led to understanding of underlying mechanisms in cancer. Investigation of various combinatorial motifs within networks of such interactions can lead to deeper insights into the mutational landscape of cancer and inform therapy development. One such motif, called the Between-Pathway Model (BPM), represents redundant or compensatory pathways that can be therapeutically exploited.

Background: Adverse drug events (ADEs) are unintended side effects of drugs that cause substantial clinical and economic burdens globally. Not all ADEs are discovered during clinical trials; therefore, postmarketing surveillance, called pharmacovigilance, is routinely conducted to find unknown ADEs. A wealth of information, which facilitates ADE discovery, lies in the growing body of biomedical literature.

Motivation: The study of the evolutionary history of biological networks enables deep functional understanding of various bio-molecular processes. Network growth models, such as the Duplication-Mutation with Complementarity (DMC) model, provide a principled approach to characterizing the evolution of protein-protein interactions (PPIs) based on duplication and divergence. Current methods for model-based ancestral network reconstruction primarily use greedy heuristics and yield sub-optimal solutions.
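A minimal sketch of one DMC growth step, under the usual formulation: duplicate a random node, then for each inherited edge delete it from either the parent or the copy with probability q_mod, and connect the pair with probability q_con. Parameter names and the seed network are illustrative.

```python
import random

def dmc_step(adj, q_mod=0.4, q_con=0.1, rng=random):
    """One Duplication-Mutation with Complementarity (DMC) growth step
    on an undirected graph stored as a dict of neighbour sets."""
    u = rng.choice(sorted(adj))          # node to duplicate
    v = max(adj) + 1                     # fresh node id for the copy
    adj[v] = set(adj[u])                 # copy inherits all neighbours
    for w in adj[v]:
        adj[w].add(v)
    for w in list(adj[u]):
        if rng.random() < q_mod:         # mutation: drop edge on one side
            loser = rng.choice([u, v])
            adj[loser].discard(w)
            adj[w].discard(loser)
    if rng.random() < q_con:             # complementarity: link the pair
        adj[u].add(v); adj[v].add(u)
    return adj

rng = random.Random(0)
adj = {0: {1}, 1: {0}}                   # seed network: a single edge
for _ in range(20):
    dmc_step(adj, rng=rng)
n_nodes = len(adj)                       # 2 seed nodes + 20 duplications
```

Running such forward simulations is the easy direction; the reconstruction problem discussed above works backwards, inferring ancestral networks from the present-day PPI network.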

Motivation: Metagenomics studies have provided key insights into the composition and structure of microbial communities found in different environments. Among the techniques used to analyse metagenomic data, binning is considered a crucial step to characterize the different species of micro-organisms present. The use of short-read data in most binning tools poses several limitations, such as insufficient species-specific signal, and the emergence of long-read sequencing technologies offers us opportunities to surmount them.

Motivation: A synthetic lethal (SL) interaction is a relationship between two functional entities where the loss of either one of the entities is viable but the loss of both entities is lethal to the cell. Such pairs can be used as drug targets in targeted anticancer therapies, and so, many methods have been developed to identify potential candidate SL pairs. However, these methods use only a subset of available data from multiple platforms, at genomic, epigenomic and transcriptomic levels; and hence are limited in their ability to learn from complex associations in heterogeneous data sources.

Motivation: The identification of sub-populations of patients with similar characteristics, called patient subtyping, is important for realizing the goals of precision medicine. Accurate subtyping is crucial for tailoring therapeutic strategies that can potentially lead to reduced mortality and morbidity. Model-based clustering, such as Gaussian mixture models, provides a principled and interpretable methodology that is widely used to identify subtypes.

Background: Fitness devices have spurred the development of apps that aim to motivate users, through interventions, to increase their physical activity (PA). Personalization in the interventions is essential as the target users are diverse with respect to their activity levels, requirements, preferences, and behavior.

Objective: This review aimed to (1) identify different kinds of personalization in interventions for promoting PA among any type of user group, (2) identify user models used for providing personalization, and (3) identify gaps in the current literature and suggest future research directions.

An Acute Hypotensive Episode (AHE) is the sudden onset of a sustained period of low blood pressure and is among the most critical conditions in Intensive Care Units (ICUs). Without timely medical care, it can lead to irreversible organ damage and death. By identifying patients at risk of AHE early, adequate medical intervention can save lives and improve patient outcomes.

Clinical time series, comprising repeated clinical measurements, provide valuable information about the trajectory of a patient's condition. Linear dynamical systems (LDS) are used extensively in science and engineering for modeling time series data. The observation and state variables in an LDS are assumed to be uniformly sampled in time at a fixed sampling rate.
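A minimal simulation of such an LDS with a fixed sampling rate (the assumption that irregularly sampled clinical series violate); the matrices here are toy values, not fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
# A toy linear dynamical system: the latent state x evolves linearly,
# observations y are a noisy linear readout, one sample per time step.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])              # state transition (stable)
C = np.array([[1.0, 0.0]])              # observation matrix

T = 100
x = np.zeros(2)
ys = []
for _ in range(T):
    x = A @ x + rng.normal(0, 0.1, size=2)   # state noise
    y = C @ x + rng.normal(0, 0.05, size=1)  # observation noise
    ys.append(y[0])
ys = np.array(ys)                        # uniformly sampled trajectory
```

In this standard formulation every step advances time by the same amount; handling measurements taken at arbitrary, irregular times requires modifying the model rather than the simulation loop.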

Stroke is a major cause of mortality and long-term disability in the world. Predictive outcome models in stroke are valuable for personalized treatment, rehabilitation planning and in controlled clinical trials. We design a new multi-class classification model to predict outcome in the short-term, the putative therapeutic window for several treatments.

Background: Evolution of cancer cells is characterized by large-scale and rapid changes in the chromosomal landscape. The fluorescence in situ hybridization (FISH) technique provides a way to measure the copy numbers of preselected genes in a group of cells and has been found to be a reliable source of data to model the evolution of tumor cells. Chowdhury et al.

Postoperative Acute Respiratory Failure (ARF) is a serious complication in critical care, affecting patient morbidity and mortality. In this paper we investigate a novel approach to predicting ARF in critically ill patients. We study the use of two disparate sources of information: semi-structured text contained in nursing notes and investigative reports, which are regularly recorded, and the respiration rate, a physiological signal that is continuously monitored during a patient's ICU stay.

Inaccurate inference of positional homologies in multiple sequence alignments and systematic errors introduced by alignment heuristics obfuscate phylogenetic inference. Alignment masking, the elimination of phylogenetically uninformative or misleading sites from an alignment before phylogenetic analysis, is a common practice in phylogenetic analysis. Although masking is often done manually, automated methods are necessary to handle the much larger data sets being prepared today.
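One simple automated masking rule, shown only as an illustration (not the method evaluated in the article): drop alignment columns whose gap fraction exceeds a threshold.

```python
# Illustrative alignment masking: remove columns of a multiple sequence
# alignment in which more than max_gap_frac of the sequences have a gap.
def mask_alignment(seqs, max_gap_frac=0.5):
    n_cols = len(seqs[0])
    keep = [i for i in range(n_cols)
            if sum(s[i] == '-' for s in seqs) / len(seqs) <= max_gap_frac]
    return [''.join(s[i] for i in keep) for s in seqs]

aln = ["AC-GT",
       "AC-GA",
       "ACTG-"]
masked = mask_alignment(aln)   # column 2 (gaps in 2 of 3 rows) is removed
```

Real masking tools use richer criteria (conservation scores, site-wise likelihoods) than a bare gap threshold, but the input/output shape is the same: an alignment in, a column-filtered alignment out.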

TIBA is a tool to reconstruct phylogenetic trees from rearrangement data that consist of ordered lists of synteny blocks (or genes), where each synteny block is shared with all of its homologues in the input genomes. The evolution of these synteny blocks, through rearrangement operations, is modelled by the uniform Double-Cut-and-Join model. Using a true distance estimate under this model and simple distance-based methods, TIBA reconstructs a phylogeny of the input genomes.

Background: Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date.

Comparing two or more phylogenetic trees is a fundamental task in computational biology. The simplest outcome of such a comparison is a pairwise measure of similarity, dissimilarity, or distance. A large number of such measures have been proposed, but so far all suffer from problems ranging from computational cost to lack of robustness; many can be shown to behave unexpectedly under certain plausible inputs.
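One widely used pairwise measure is the Robinson-Foulds distance: the number of bipartitions (splits induced by internal edges) present in one tree but not the other. A small sketch on trees encoded as nested tuples of leaf names (an illustrative encoding, not a standard library API):

```python
# Robinson-Foulds distance between two trees over the same leaf set.
def leaf_set(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(c) for c in tree))

def splits(tree):
    """Non-trivial bipartitions of the tree, each stored canonically
    as the lexicographically smaller side of the split."""
    all_leaves = leaf_set(tree)
    found = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(c) for c in node))
        if 1 < len(clade) < len(all_leaves) - 1:
            other = all_leaves - clade
            found.add(min(clade, other, key=sorted))
        return clade
    walk(tree)
    return found

def rf_distance(t1, t2):
    return len(splits(t1) ^ splits(t2))   # symmetric difference size

t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
d = rf_distance(t1, t2)   # the trees disagree on one split each, so d = 2
```

RF illustrates the robustness complaint above: moving a single leaf can change many bipartitions at once, so small topological perturbations can produce large distance jumps.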

The study of genomic inversions (or reversals) has been a mainstay of computational genomics for nearly 20 years. After the initial breakthrough of Hannenhalli and Pevzner, who gave the first polynomial-time algorithm for sorting signed permutations by inversions, improved algorithms have been designed, culminating in an optimal linear-time algorithm for computing the inversion distance and a subquadratic algorithm for producing a shortest sequence of inversions (also known as sorting by inversions). Remaining open was the question of whether sorting by inversions could be done in O(n log n) time.
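To ground the terminology: an inversion on a signed permutation reverses a segment and flips the signs within it, and sorting by inversions asks for a minimum-length sequence of such operations reaching the identity. A tiny helper, illustrative only; it applies one inversion and does not compute the optimal sequence that the algorithms above produce.

```python
# Apply the inversion of the segment perm[i..j] to a signed permutation:
# the segment is reversed and every element in it changes sign.
def invert(perm, i, j):
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]

p = [-2, -1, 3]
p = invert(p, 0, 1)       # a single inversion sorts this permutation
# p is now the identity permutation [1, 2, 3]
```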

Background: The rapidly increasing availability of whole-genome sequences has enabled the study of whole-genome evolution. Evolutionary mechanisms based on genome rearrangements have attracted much attention and given rise to many models; somewhat independently, the mechanisms of gene duplication and loss have seen much work. However, the two are not independent and thus require a unified treatment, which remains missing to date.
