Motivation: Gene transcripts are distinguished by the composition of their exons, and these differences in exon composition may contribute to proteome complexity. Despite the availability of alternative splicing information in various databases, readily associating exonic variations with the protein sequence remains a mammoth task.
Results: To systematically associate exonic variation(s) with the protein, we designed the Exon Nomenclature and Classification of Transcripts (ENACT) framework, which uniquely annotates exons by tracking their loci in the context of gene architecture while encapsulating variations in splice site(s) and amino acid coding status.
Vibrio cholerae cytolysin (VCC) is a β-barrel pore-forming toxin (β-PFT). Upon encountering the target cells, VCC forms heptameric β-barrel pores and permeabilizes the cell membranes. Structure-function mechanisms of VCC have been extensively studied in the past.
Enzyme promiscuity is the ability of (some) enzymes to perform alternate reactions or catalyze non-cognate substrate(s). The latter is referred to as substrate promiscuity, widely studied for its biotechnological applications and understanding enzyme evolution. Insights into the structural basis of substrate promiscuity would greatly benefit the design and engineering of enzymes.
The excellent mechanical strength and toughness of spider silk are well characterized experimentally and understood atomistically using computational simulations. However, little attention has been focused on understanding whether the amino acid sequence of β-sheet nanocrystals, which is the key to rendering strength to silk fiber, is optimally chosen to mitigate molecular-scale failure mechanisms. To investigate this, we modeled β-sheet nanocrystals of various representative small/polar/hydrophobic amino acid repeats for determining the sequence motif having superior nanomechanical tensile strength and toughness.
Undergraduate laboratory courses, owing to their larger sizes and shorter time slots, are often conducted in highly structured modes. However, this approach is known to interfere with students' engagement in the experiments. To enhance students' engagement, we propose an alternative mode of running laboratory courses by creating some "disorder" in a previously adopted structure.
Intra-chain domain interactions are known to play a significant role in the function and stability of multidomain proteins. These interactions are mediated through physical contact at domain-domain interfaces (DDIs). To understand the evolution of interfaces, we have investigated similarities among DDIs.
BMC Bioinformatics
December 2017
Background: Knowledge of catalytic residues can play an essential role in elucidating mechanistic details of an enzyme. However, experimental identification of catalytic residues is a tedious and time-consuming task, which can be expedited by computational predictions. Despite significant development in active-site prediction methods, one of the remaining issues is the ranked position of putative catalytic residues among all ranked residues.
Pore-forming toxins (PFTs) are typically produced as water-soluble monomers, which upon interacting with target cells assemble into transmembrane oligomeric pores. Vibrio parahaemolyticus thermostable direct hemolysin (TDH) is an atypical PFT that exists as a tetramer in solution, prior to membrane binding. The TDH structure highlights a core β-sandwich domain similar to those found in the eukaryotic actinoporin family of PFTs.
As metabolic engineering and synthetic biology progress toward the goal of a more sustainable use of biological resources, the need to increase the number of value-added chemicals that can be produced in industrial organisms becomes more imperative. However, exploring the vast space of pathways amenable to engineering through heterologous gene expression in a chassis organism is complex and cannot be done manually. Here, we present XTMS, a web-based pathway analysis platform available at http://xtms.
Despite recent advances, it is not yet clear how intrinsically disordered regions in proteins recognize their targets without any defined structures. Short linear motifs have been proposed to mediate molecular recognition by disordered regions; however, the underlying structural prerequisite remains elusive. Moreover, the role of short linear motifs in DNA recognition has not been studied.
Background: We consider the possibility of engineering metabolic pathways in a chassis organism in order to synthesize novel target compounds that are heterologous to the chassis. For this purpose, we model metabolic networks through hypergraphs where reactions are represented by hyperarcs. Each hyperarc represents an enzyme-catalyzed reaction that transforms a set of substrate compounds into product compounds.
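The hypergraph representation described above can be illustrated with a minimal sketch. The `Hyperarc` class, the `reachable_compounds` helper, and the toy compound names are hypothetical and chosen only for illustration; they are not taken from the authors' implementation, and the forward-closure search shown here is only one simple way to query such a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperarc:
    """One enzyme-catalyzed reaction: a set of substrates -> a set of products."""
    name: str
    substrates: frozenset
    products: frozenset


def reachable_compounds(available, hyperarcs):
    """Return every compound producible from `available` (forward closure).

    A hyperarc can fire only when ALL of its substrates are already reachable,
    which is what distinguishes a hypergraph from an ordinary graph.
    """
    reached = set(available)
    changed = True
    while changed:
        changed = False
        for arc in hyperarcs:
            if arc.substrates <= reached and not arc.products <= reached:
                reached |= arc.products
                changed = True
    return reached


# Toy network (hypothetical compounds A..F).
network = [
    Hyperarc("r1", frozenset({"A", "B"}), frozenset({"C"})),
    Hyperarc("r2", frozenset({"C"}), frozenset({"D"})),
    Hyperarc("r3", frozenset({"E"}), frozenset({"F"})),
]

producible = reachable_compounds({"A", "B"}, network)
```

In this toy network, `r3` never fires because its substrate `E` is not available, whereas `r1` fires only because both `A` and `B` are present.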
In a variety of threading methods, poorly ranked (low z-score) templates often have good alignments. Here, a new method, TASSER_low-zsc, which identifies these low z-score-ranked templates to improve protein structure prediction accuracy, is described. The approach consists of clustering of threading templates by affinity propagation on the basis of structural similarity (thread_cluster) followed by TASSER modeling, with final models selected by using a TASSER_QA variant.
In the post-genomic era, the annotation of protein function facilitates the understanding of various biological processes. To extend the range of function annotation methods to the twilight zone of sequence identity, we have developed approaches that exploit protein tertiary structure and/or protein sequence evolutionary relationships. To serve the scientific community, we have integrated the structure prediction tools TASSER, TASSER-Lite and METATASSER with the functional inference tools FINDSITE (a structure-based algorithm for binding site prediction, Gene Ontology molecular function inference and ligand screening), EFICAz(2) (a sequence-based approach to enzyme function inference) and DBD-hunter (an algorithm for predicting DNA-binding proteins and associated DNA-binding residues) into a unified web resource, the Protein Structure and Function prediction Resource (PSiFR).
The performance of the protein structure prediction server pro-sp3-TASSER in CASP8 is described. Compared to CASP7, the major improvement in prediction is in the quality of input models to TASSER. These improvements are due to the PRO-SP(3) threading method, the improved quality of contact predictions provided by TASSER_2.
Background: Protein tertiary structure comparisons are employed in various fields of contemporary structural biology. Most structure comparison methods involve generation of an initial seed alignment, which is extended and/or refined to provide the best structural superposition between a pair of protein structures as assessed by a structure comparison metric. One such metric, the TM-score, was recently introduced to provide a combined structure quality measure of the coordinate root mean square deviation between a pair of structures and coverage.
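For readers unfamiliar with the metric, the following sketch evaluates the commonly published form of the TM-score for one fixed superposition: the sum over aligned residue pairs of 1 / (1 + (d_i / d0)^2), divided by the target length, with d0 = 1.24 (L - 15)^(1/3) - 1.8. The function names are hypothetical, the 0.5 Å floor on d0 for short chains is an assumption made here to keep the formula well defined, and the full TM-score additionally takes the maximum over superpositions, which this sketch does not attempt.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms).

    Assumption: d0 is floored at 0.5 for short chains, where the
    closed-form expression would become very small or undefined.
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_for_superposition(distances, target_length):
    """TM-score term for ONE given superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the target structure; unaligned
    residues contribute zero, so low coverage lowers the score.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than target residues")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect match over the full length scores 1; half coverage scores 0.5.
full = tm_score_for_superposition([0.0] * 100, 100)
half = tm_score_for_superposition([0.0] * 50, 100)
```

The half-coverage case shows how the score combines deviation and coverage in one number: perfectly superposed residues still cannot compensate for the residues left unaligned.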
An improved TASSER (Threading/ASSEmbly/Refinement) methodology is applied to predict the tertiary structure for all CASP7 targets. TASSER employs template identification by threading, followed by tertiary structure assembly by rearranging continuous template fragments, where conformational space is searched via Parallel Hyperbolic Monte Carlo sampling with an optimized force-field that includes knowledge-based statistical potentials and restraints derived from threading templates. The final models are selected by clustering structures from the low temperature replicas.
This study involves the development of a rapid comparative modeling tool for homologous sequences by extension of the TASSER methodology, developed for tertiary structure prediction. This comparative modeling procedure was validated on a representative benchmark set of proteins in the Protein Data Bank composed of 901 single domain proteins (41-200 residues) having sequence identities between 35% and 90% with respect to the template. Using a Monte Carlo search scheme with the length of runs optimized for weakly/nonhomologous proteins, TASSER often provides appreciable improvement in structure quality over the initial template.
During the course of our large-scale genome analysis, a conserved domain, currently detectable only in the genomes of Drosophila melanogaster, Caenorhabditis elegans and Anopheles gambiae, has been identified. The function of this domain is currently unknown and no function annotation is provided for this domain in the publicly available genomic, protein family and sequence databases. The search for homologues of this domain in the non-redundant sequence database using PSI-BLAST resulted in the identification of a distant relationship between this family and the alkaline phosphatase-like superfamily, which includes families of aryl sulfatase, N-acetylgalactosamine-4-sulfatase, alkaline phosphatase and 2,3-bisphosphoglycerate-independent phosphoglycerate mutase (iPGM).
In Silico Biol
June 2005
A family of hypothetical proteins, identified predominantly from archaeal genomes, has been analyzed in order to understand its functional characteristics. Using extensive sequence similarity searches, it is inferred that this family is remotely related (best sequence identity is 19%) to ClpP proteinases, which belong to the serine proteinase class. This family of hypothetical proteins is referred to as the SDH proteinase family based on the conserved sequential order of Ser, Asp and His residues and predicted serine proteinase activity.
In order to bridge the gap between proteins with three-dimensional (3-D) structural information and those without 3-D structures, extensive experimental and computational efforts for structure recognition are being invested. One of the rapid and simple computational approaches for structure recognition makes use of sequence profiles with sensitive profile matching procedures to identify remotely related homologous families. While adopting this approach we used profiles that are generated from structure-based sequence alignment of homologous protein domains of known structures integrated with sequence homologues.
The sequencing of the Mycobacterium tuberculosis (MTB) H37Rv genome has facilitated deeper insights into the biology of MTB, yet the functions of many MTB proteins are unknown. We have used sensitive profile-based search procedures to assign functional and structural domains to infer functions of gene products encoded in MTB. These domain assignments have been made using a compendium of sequence and structural domain families.
Background: The SUPFAM database is a compilation of superfamily relationships between protein domain families of either known or unknown 3-D structure. In SUPFAM, sequence families from Pfam and structural families from SCOP are associated, using profile matching, to yield sequence superfamilies of known structure. Subsequently, all-against-all family profile matches are made to deduce a list of new potential superfamilies of as yet unknown structure.
The members of the family of G-proteins are characterized by their ability to bind and hydrolyze guanosine triphosphate (GTP) to guanosine diphosphate (GDP). Despite a common biochemical function of GTP hydrolysis shared among the members of the family of G-proteins, they are associated with diverse biological roles. The current work describes the identification and detailed analysis of the putative G-proteins encoded in the completely sequenced prokaryotic genomes.
The database of Phylogeny and ALIgnment of homologous protein structures (PALI) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of protein domains in various families. The latest updated version (Release 2.1) comprises 844 families of homologous proteins involving 3863 protein domain structures, with each of these families having at least two members.
Mycobacterium tuberculosis is a globally successful pathogen, infecting more than one third of the world's population. These bacteria have the remarkable ability to persist in the host for long periods of time unrecognized by the immune system and then to re-emerge later in life, causing the disease. The physiology of such persistent or dormant bacilli is not very well characterized.