Past research in computational systems biology has focused more on the development and applications of advanced statistical and numerical optimization techniques and much less on understanding the geometry of the biological space. By representing biological entities as points in a low dimensional Euclidean space, state-of-the-art methods for drug-target interaction (DTI) prediction implicitly assume the flat geometry of the biological space. In contrast, recent theoretical studies suggest that biological systems exhibit tree-like topology with a high degree of clustering.
IEEE/ACM Trans Comput Biol Bioinform
August 2022
Modeling complex biological systems is necessary to understand biochemical interactions behind pharmacological effects of drugs. Successful in silico drug repurposing relies on exploration of diverse biochemical concepts and their relationships, including drug's adverse reactions, drug targets, disease symptoms, as well as disease associated genes and their pathways, to name a few. We present a computational method for inferring drug-disease associations from complex but incomplete and biased biological networks.
Advances in next-generation sequencing and high-throughput techniques have enabled the generation of vast amounts of diverse omics data. These big data provide an unprecedented opportunity in biology, but impose great challenges in data integration, data mining, and knowledge discovery due to the complexity, heterogeneity, dynamics, uncertainty, and high dimensionality inherent in the omics data. Networks have been widely used to represent relations between entities in biological systems, such as protein-protein interaction, gene regulation, and brain connectivity (i.
Due to the aging world population and increasing trend in clinical practice to treat patients with multiple drugs, adverse events (AEs) are becoming a major challenge in drug discovery and public health. In particular, identifying AEs caused by drug combinations remains a challenging task. Clinical trials typically focus on individual drugs rather than drug combinations and animal models are unreliable.
Proceedings (IEEE Int Conf Bioinformatics Biomed)
November 2017
Adverse drug reactions (ADRs) represent one of the main health and economic problems in the world. With increasing data on ADRs, there is an increased need for software tools capable of organizing and storing the information on drug-ADR associations in a form that is easy to use and understand. Here we present a step-by-step computational procedure capable of extracting drug-ADR frequency data from the large collection of patient safety reports stored in the Food and Drug Administration database.
AMIA Jt Summits Transl Sci Proc
May 2018
Side effects are the second and the fourth leading causes of drug attrition and death in the US, respectively. Thus, accurate prediction of side effects and understanding their mechanism of action will significantly impact drug discovery and clinical practice. Here, we show that REMAP, a neighborhood-regularized weighted and imputed one-class collaborative filtering algorithm, is effective in predicting drug-side effect associations from a drug-side effect association network, and significantly outperforms the state-of-the-art multi-target learning algorithm for predicting rare side effects.
Motivation: Adverse drug reactions (ADRs) are one of the main causes of death and a major financial burden on the world's economy. Due to the limitations of the animal model, computational prediction of serious and rare ADRs is invaluable. However, current state-of-the-art computational methods do not yield significantly better predictions of rare ADRs than random guessing.
The conventional one-drug-one-gene approach has been of limited success in modern drug discovery. Polypharmacology, which focuses on searching for multi-targeted drugs to perturb disease-causing networks instead of designing selective ligands to target individual proteins, has emerged as a new drug discovery paradigm. Although many methods for single-target virtual screening have been developed to improve the efficiency of drug discovery, few of these algorithms are designed for polypharmacology.
Target-based screening is one of the major approaches in drug discovery. Besides the intended target, unexpected drug off-target interactions often occur, and many of them have not been recognized and characterized. The off-target interactions can be responsible for either therapeutic or side effects.
Algorithms Mol Biol
October 2015
Background: Progress in the field of protein three-dimensional structure prediction depends on the development of new and improved algorithms for measuring the quality of protein models. Perhaps the best descriptor of the quality of a protein model is the GDT function that maps each distance cutoff θ to the number of atoms in the protein model that can be fit under the distance θ from the corresponding atoms in the experimentally determined structure. It has long been known that the area under the graph of this function (GDT_A) can serve as a reliable, single numerical measure of the model quality.
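As a rough illustration of the GDT function described above, the following Python sketch counts, for each cutoff θ, the model atoms lying within θ of their counterparts and then integrates the resulting curve. It assumes the model has already been superimposed on the experimental structure (the actual GDT measure searches over superpositions), and the names `gdt_curve` and `area_under_curve`, as well as the trapezoidal rule, are illustrative choices rather than details taken from the paper.

```python
import math

def gdt_curve(model, reference, cutoffs):
    """For each cutoff, count model atoms within that distance of the
    corresponding reference atom. Assumes a fixed superposition."""
    dists = [math.dist(p, q) for p, q in zip(model, reference)]
    return [sum(d <= t for d in dists) for t in cutoffs]

def area_under_curve(cutoffs, counts):
    """Trapezoidal area under the (cutoff, count) curve; an assumed
    discretization of the GDT_A idea, not the published definition."""
    return sum(
        (cutoffs[i + 1] - cutoffs[i]) * (counts[i] + counts[i + 1]) / 2
        for i in range(len(cutoffs) - 1)
    )

# Toy coordinates (hypothetical): atom-wise distances are 0.5, 1.5 and 3.0.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.5), (1.0, 0.0, 1.5), (2.0, 0.0, 3.0)]
cutoffs = [1.0, 2.0, 4.0]
counts = gdt_curve(model, reference, cutoffs)
area = area_under_curve(cutoffs, counts)
```

With these toy inputs the curve rises from one atom at θ = 1 to all three atoms at θ = 4, which is the kind of monotone step function the area measure summarizes.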
IEEE/ACM Trans Comput Biol Bioinform
January 2014
The Largest Common Point-set (LCP) and the Pattern Matching (PM) problems have received much attention in the fields of pattern matching, computer vision and computational biology. Perhaps the most important application of these problems is protein structural alignment, which seeks to find a superposition of a pair of input proteins that maximizes a given protein structure similarity metric. Although it has been shown that LCP and PM are both tractable problems, the running times of existing algorithms are high-degree polynomials.
The importance of pairwise protein structural comparison in biomedical research is fueling the search for algorithms capable of finding a more accurate structural match of two input proteins in a timely manner. In recent years, we have witnessed rapid advances in the development of methods for approximate and optimal solutions to the protein structure matching problem. Albeit slow, these methods can be extremely useful in assessing the accuracy of more efficient, heuristic algorithms.
IEEE/ACM Trans Comput Biol Bioinform
May 2014
We study the well-known LCP (Largest Common Point-Set) under Bottleneck Distance Problem. Given two proteins a and b (as sequences of points in 3D space) and a distance cutoff σ, the goal is to find a spatial superposition and an alignment that maximizes the number of pairs of points from a and b that can be fit under the distance σ from each other. The best algorithms to date for approximate and exact solutions to this problem run in time O(n^8) and O(n^32), respectively, where n represents the protein length.
IEEE/ACM Trans Comput Biol Bioinform
April 2012
Protein structure alignment is an important tool in many biological applications, such as protein evolution studies, protein structure modeling, and structure-based, computer-aided drug design. Protein structure alignment is also one of the most challenging problems in computational molecular biology, due to an infinite number of possible spatial orientations of any two protein structures. We study one of the most commonly used measures of pairwise protein structure similarity, defined as the number of pairs of atoms in two proteins that can be superimposed under a predefined distance cutoff.
J Bioinform Comput Biol
June 2011
The problem of finding an optimal structural alignment for a pair of superimposed proteins is often amenable to the Smith-Waterman dynamic programming algorithm, which runs in time proportional to the product of lengths of the sequences being aligned. While the quadratic running time is acceptable for computing a single alignment of two fixed protein structures, the time complexity becomes a bottleneck when running the Smith-Waterman routine multiple times in order to find a globally optimal superposition and alignment of the input proteins. We present a subquadratic running time algorithm capable of computing an alignment that optimizes one of the most widely used measures of protein structure similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff.
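To make the quadratic baseline concrete, here is a minimal Python sketch of a dynamic program that, for two already superimposed proteins, maximizes the number of order-preserving residue pairs lying within a distance cutoff. It is a simplified LCS-style recurrence in the same quadratic family as Smith-Waterman, not the Smith-Waterman routine itself and not the subquadratic algorithm the abstract announces; the function name `max_pairs_under_cutoff` and the toy coordinates are hypothetical.

```python
import math

def max_pairs_under_cutoff(a, b, cutoff):
    """Maximum number of sequentially aligned point pairs (one from a,
    one from b) whose distance is at most `cutoff`, for a fixed
    superposition. Runs in O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    # dp[i][j]: best count using the first i points of a and first j of b.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = 1 if math.dist(a[i - 1], b[j - 1]) <= cutoff else 0
            dp[i][j] = max(
                dp[i - 1][j],              # leave a[i-1] unaligned
                dp[i][j - 1],              # leave b[j-1] unaligned
                dp[i - 1][j - 1] + close,  # pair a[i-1] with b[j-1]
            )
    return dp[n][m]

# Toy example (hypothetical coordinates): only the first and last
# residues of the two chains are close enough to count.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.1), (5.0, 5.0, 5.0), (2.0, 0.0, 0.2)]
best = max_pairs_under_cutoff(a, b, 0.5)
```

Every cell of the table is filled once, which is where the product-of-lengths running time comes from; repeating this for many candidate superpositions is the bottleneck the abstract refers to.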
Motivation: Structural alignment is an important tool for understanding the evolutionary relationships between proteins. However, finding the best pairwise structural alignment is difficult, due to the infinite number of possible superpositions of two structures. Unlike the sequence alignment problem, which has a polynomial time solution, the structural alignment problem has not even been classified as solvable.
BMC Bioinformatics
April 2009
Background: In the last decade, a significant improvement in detecting remote similarity between protein sequences has been made by utilizing alignment profiles in place of amino-acid strings. Unfortunately, no analytical theory is available for estimating the significance of a gapped alignment of two profiles. Many experiments suggest that the distribution of local profile-profile alignment scores is of the Gumbel form.
J Bioinform Comput Biol
April 2008
Measuring the accuracy of protein three-dimensional structures is one of the most important problems in protein structure prediction. For structure-based drug design, the accuracy of the binding site is far more important than the accuracy of any other region of the protein. We have developed an automated method for assessing the quality of a protein model by focusing on the set of residues in the small molecule binding site.
Motivation: Profile-based protein homology detection algorithms are valuable tools in genome annotation and protein classification. By utilizing information present in the sequences of homologous proteins, profile-based methods are often able to detect extremely weak relationships between protein sequences, as evidenced by the large-scale benchmarking experiments such as CASP and LiveBench.
Results: We study the relationship between the sensitivity of a profile-profile method and the size of the sequence profile, which is defined as the average number of different residue types observed at the profile's positions.
We present a novel, knowledge-based method for the side-chain addition step in protein structure modeling. The foundation of the method is a conditional probability equation, which specifies the probability that a side-chain will occupy a specific rotamer state, given a set of evidence about the rotamer states adopted by the side-chains at aligned positions in structurally homologous crystal structures. We demonstrate that our method increases the accuracy of homology model side-chain addition when compared with the widely employed practice of preserving the side-chain conformation from the homology template to the target at conserved residue positions.
STRUCTFAST is a novel profile-profile alignment algorithm capable of detecting weak similarities between protein sequences. The increased sensitivity and accuracy of the STRUCTFAST method are achieved through several unique features. First, the algorithm utilizes a novel dynamic programming engine capable of incorporating important information from a structural family directly into the alignment process.
Motivation: Background distribution statistics for profile-based sequence alignment algorithms cannot be calculated analytically, and hence such algorithms must resort to measuring the significance of an alignment score by assessing its location among a distribution of background alignment scores. The Gumbel parameters that describe this background distribution are usually pre-computed for a limited number of scoring systems, gap schemes, and sequence lengths and compositions. The use of such look-ups is known to introduce errors, which compromise the significance assessment of a remote homology relationship.
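For readers unfamiliar with Gumbel-based significance, the following Python sketch shows how a score could be converted to a p-value once the two Gumbel parameters are known. It uses the standard Gumbel (extreme value type I) tail formula P(S ≥ s) = 1 − exp(−exp(−λ(s − μ))); the function name `gumbel_pvalue` and the parameter values are assumptions for illustration, and nothing here reflects how the paper itself estimates the parameters.

```python
import math

def gumbel_pvalue(score, mu, lam):
    """Probability that a background alignment score is at least `score`,
    assuming scores follow a Gumbel distribution with location `mu`
    and scale parameter `lam`."""
    # 1 - exp(-x) is computed as -expm1(-x) to stay accurate for tiny x.
    return -math.expm1(-math.exp(-lam * (score - mu)))

# Hypothetical parameters: a score equal to mu is unremarkable,
# while a score well above mu has a small p-value.
p_at_mu = gumbel_pvalue(10.0, mu=10.0, lam=0.3)
p_high = gumbel_pvalue(40.0, mu=10.0, lam=0.3)
```

Because the p-value depends exponentially on λ and μ, small errors in looked-up parameters can shift the reported significance substantially, which is the concern the abstract raises.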
Proteins comprising the core of the eukaryotic cellular machinery are often highly conserved, presumably due to selective constraints maintaining important structural features. We have developed statistical procedures to decompose these constraints into distinct categories and to pinpoint critical structural features within each category. When applied to P-loop GTPases, this revealed within Rab, Rho, Ras, and Ran a canonical network of molecular interactions centered on bound nucleotide.