Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline.
View Article and Find Full Text PDFProteome-wide identification of protein-protein interactions is a formidable task which has yet to be sufficiently addressed by experimental methodologies. Many computational methods have been developed to predict proteome-wide interaction networks, but few leverage both the sensitivity of structural information and the wide availability of sequence data. We present PEPPI, a pipeline which integrates structural similarity, sequence similarity, functional association data, and machine learning-based classification through a naïve Bayesian classifier model to accurately predict protein-protein interactions at a proteomic scale.
View Article and Find Full Text PDFThis article reports and analyzes the results of protein contact and distance prediction by our methods in the 14th Critical Assessment of techniques for protein Structure Prediction (CASP14). A new deep learning-based contact/distance predictor was employed based on the ensemble of two complementary coevolution features coupling with deep residual networks. We also improved our multiple sequence alignment (MSA) generation protocol with wholesale meta-genome sequence databases.
View Article and Find Full Text PDFStructure prediction for proteins lacking homologous templates in the Protein Data Bank (PDB) remains a significant unsolved problem. We developed a protocol, C-I-TASSER, to integrate interresidue contact maps from deep neural-network learning with the cutting-edge I-TASSER fragment assembly simulations. Large-scale benchmark tests showed that C-I-TASSER can fold more than twice the number of non-homologous proteins than the I-TASSER, which does not use contacts.
View Article and Find Full Text PDFIn this article, we report 3D structure prediction results by two of our best server groups ("Zhang-Server" and "QUARK") in CASP14. These two servers were built based on the D-I-TASSER and D-QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines, I-TASSER and QUARK, respectively. The new components include: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact-based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network-based method, DeepPotential, to predict multiple spatial restraints by co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations.
View Article and Find Full Text PDFThe topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training.
View Article and Find Full Text PDFProtein-ligand docking is an important approach for virtual screening and protein function annotation. Although many docking methods have been developed, most require a high-resolution crystal structure of the receptor and a user-specified binding site to start. This information is, however, not available for the majority of unknown proteins, including many pharmaceutically important targets.
View Article and Find Full Text PDFAs the infection of 2019-nCoV coronavirus is quickly developing into a global pneumonia epidemic, careful analysis of its transmission and cellular mechanisms is sorely needed. In this report, we re-analyzed the computational approaches and findings presented in two recent manuscripts by Ji . (https://doi.
View Article and Find Full Text PDFAs the infection of 2019-nCoV coronavirus is quickly developing into a global pneumonia epidemic, the careful analysis of its transmission and cellular mechanisms is sorely needed. In this Communication, we first analyzed two recent studies that concluded that snakes are the intermediate hosts of 2019-nCoV and that the 2019-nCoV spike protein insertions share a unique similarity to HIV-1. However, the reimplementation of the analyses, built on larger scale data sets using state-of-the-art bioinformatics methods and databases, presents clear evidence that rebuts these conclusions.
View Article and Find Full Text PDFβ-Glucosidase (BGL) is a rate-limiting enzyme of lignocellulose hydrolysis for second-generation bioethanol production, but its inhibition by lignocellulose pretreatment products, ethanol, and salt is apparent. Here, the recombinant Penicillium oxalicum 16 BGL 1 (rPO16BGL1) from Pichia pastoris GS115 kept complete activity at 0.2-1.
View Article and Find Full Text PDFFuture Gener Comput Syst
October 2019
There is an increasing gap between the number of known protein sequences and the number of proteins with experimentally characterized structure and function. To alleviate this issue, we have developed the I-TASSER gateway, an online server for automated and reliable protein structure and function prediction. For a given sequence, I-TASSER starts with template recognition from a known structure library, followed by full-length atomic model construction by iterative assembly simulations of the continuous structural fragments excised from the template alignments.
View Article and Find Full Text PDFWe report the results of residue-residue contact prediction of a new pipeline built purely on the learning of coevolutionary features in the CASP13 experiment. For a query sequence, the pipeline starts with the collection of multiple sequence alignments (MSAs) from multiple genome and metagenome sequence databases using two complementary Hidden Markov Model (HMM)-based searching tools. Three profile matrices, built on covariance, precision, and pseudolikelihood maximization respectively, are then created from the MSAs, which are used as the input features of a deep residual convolutional neural network architecture for contact-map training and prediction.
View Article and Find Full Text PDFComparison of ligand poses generated by protein-ligand docking programs has often been carried out with the assumption of direct atomic correspondence between ligand structures. However, this correspondence is not necessarily chemically relevant for symmetric molecules and can lead to an artificial inflation of ligand pose distance metrics, particularly those that depend on receptor superposition (rather than ligand superposition), such as docking root mean square deviation (RMSD). Several of the commonly-used RMSD calculation algorithms that correct for molecular symmetry do not take into account the bonding structure of molecules and can therefore result in non-physical atomic mapping.
View Article and Find Full Text PDFBioorg Med Chem Lett
December 2018
SurA is a gram-negative, periplasmic chaperone protein involved in the proper folding of outer membrane porins (OMPs), which protect bacteria against toxins in the extracellular environment by selectively regulating the passage of nutrients into the cell. Previous studies demonstrated that deletion of SurA renders bacteria more sensitive to toxins that compromise the integrity of the outer membrane. Inhibitors of SurA will perturb the folding of OMPs, leading to disruption of the outer membrane barrier and making the cell more vulnerable to toxic insults.
View Article and Find Full Text PDF