Background: Bacteria and archaea produce an enormous diversity of modified peptides that are involved in various forms of inter-microbial conflicts or communication. A vast class of such peptides are Ribosomally synthesized, Post-translationally modified Peptides (RiPPs), and a major group of RiPPs are graspetides, so named after ATP-grasp ligases that catalyze the formation of lactam and lactone linkages in these peptides. The diversity of graspetides, the multiple proteins encoded in the respective Biosynthetic Gene Clusters (BGCs) and their evolution have not been studied in full detail.
Among the ribosomally synthesized and post-translationally modified peptide (RiPP) natural products, "graspetides" (formerly known as microviridins) contain macrocyclic esters and amides that are formed by ATP-grasp ligase tailoring enzymes using the side chains of Asp/Glu as acceptors and Thr/Ser/Lys as donors. Graspetides exhibit diverse patterns of macrocyclization and connectivities exemplified by microviridins, which have a caged tricyclic core, and thuringin and plesiocin, which feature a "hairpin topology" with cross-strand ω-ester bonds. Here, we characterize chryseoviridin, a new type of multicore RiPP encoded by DS19109 (Phylum Bacteroidetes) and solve a 2.
Asgard is a recently discovered superphylum of archaea that appears to include the closest archaeal relatives of eukaryotes. Debate continues as to whether the archaeal ancestor of eukaryotes belongs within the Asgard superphylum or whether this ancestor is a sister group to all other archaea (that is, a two-domain versus a three-domain tree of life). Here we present a comparative analysis of 162 complete or nearly complete genomes of Asgard archaea, including 75 metagenome-assembled genomes that, to our knowledge, have not previously been reported.
Severity of seasonal influenza A epidemics is related to the antigenic novelty of the predominant viral strains circulating each year. Support for a strong correlation between epidemic severity and antigenic drift comes from infectious challenge experiments on vaccinated animals and human volunteers, field studies of vaccine efficacy, prospective studies of subjects with laboratory-confirmed prior infections, and analysis of the connection between drift and severity from surveillance data. We show that, given data on the antigenic and sequence novelty of the hemagglutinin protein of clinical isolates of H3N2 virus from a season along with the corresponding data from prior seasons, we can accurately predict the influenza severity for that season.
The hemagglutinin protein of influenza virus bears several sites of N-linked asparagine glycosylation. The number and location of these sites vary with strain and substrain. The human H3 hemagglutinin has gained several glycosylation sites on the antigenically important globular head since its introduction to humans, presumably due to selection.
Genome sequencing projects have resulted in a rapid accumulation of predicted protein sequences. With experimentally verified information on protein function lagging far behind, computational methods are used for functional annotation of proteins. Here we describe a number of protocols for protein sequence and structure analysis that can be used to infer function of uncharacterized proteins.
The availability of complete genome sequences of diverse bacteria and archaea makes comparative sequence analysis a powerful tool for analyzing signal transduction systems encoded in these genomes. However, most signal transduction proteins consist of two or more individual protein domains, which significantly complicates their functional annotation and makes automated annotation of these proteins in the course of large-scale genome sequencing projects particularly unreliable. This chapter describes certain common-sense protocols for sequence analysis of two-component histidine kinases and response regulators, as well as other components of the prokaryotic signal transduction machinery: Ser/Thr/Tyr protein kinases and protein phosphatases, adenylate and diguanylate cyclases, and c-di-GMP phosphodiesterases.
InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files.
The PIRSF protein classification system (http://pir.georgetown.edu/pirsf/) reflects evolutionary relationships of full-length proteins and domains.
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information.
Increasingly, scientists have begun to tackle gene functions and other complex regulatory processes by studying organisms at global scales across various levels of biological organization, ranging from genomes to metabolomes and physiomes. Meanwhile, new bioinformatics methods have been developed for inferring protein function using associative analysis of functional properties to complement the traditional sequence homology-based methods. Fully exploiting the value of high-throughput systems biology data and facilitating protein functional studies requires bioinformatics infrastructures that support both data integration and associative analysis.
Background: Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution and in making functional predictions for currently uncharacterized conserved genes.
Results: We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs (eukaryotic orthologous groups or KOGs) from seven eukaryotic genomes: Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Encephalitozoon cuniculi. Conservation of KOGs through the phyletic range of eukaryotes strongly correlates with their functions and with the effect of gene knockout on the organism's viability.
The Protein Information Resource (PIR) is an integrated public resource of protein informatics. To facilitate the sensible propagation and standardization of protein annotation and the systematic detection of annotation errors, PIR has extended its superfamily concept and developed the SuperFamily (PIRSF) classification system. Based on the evolutionary relationships of whole proteins, this classification system allows annotation of both specific biological and generic biochemical functions.
BMC Bioinformatics
September 2003
Background: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies.
Results: We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups.
Three-dimensional structures are now known within most protein families and it is likely, when searching a sequence database, that one will identify a homolog of known structure. The goal of Entrez's 3D-structure database is to make structure information and the functional annotation it can provide easily accessible to molecular biologists. To this end, Entrez's search engine provides several powerful features: (i) links between databases, for example between a protein's sequence and structure; (ii) pre-computed sequence and structure neighbors; and (iii) structure and sequence/structure alignment visualization.
The Conserved Domain Database (CDD) is now indexed as a separate database within the Entrez system and linked to other Entrez databases such as MEDLINE(R). This allows users to search for domain types by name, for example, or to view the domain architecture of any protein in Entrez's sequence database. CDD can be accessed on the World Wide Web at http://www.
Transmembrane receptors in microorganisms, such as sensory histidine kinases and methyl-accepting chemotaxis proteins, are molecular devices for monitoring environmental changes. We report here that sensory domain sharing is widespread among different classes of transmembrane receptors. We have identified two novel conserved extracellular sensory domains, named CHASE2 and CHASE3, that are found in at least four classes of transmembrane receptors: histidine kinases, adenylate cyclases, predicted diguanylate cyclases, and either serine/threonine protein kinases (CHASE2) or methyl-accepting chemotaxis proteins (CHASE3).
Sequence analysis of bacterial genomes revealed a novel DNA-binding domain. This domain is found in several response regulators of the two-component signal transduction system, such as Pseudomonas aeruginosa AlgR, involved in the regulation of alginate biosynthesis and in the pathogenesis of cystic fibrosis; Clostridium perfringens VirR, a regulator of virulence factors; and several regulators of bacteriocin biosynthesis, previously unified in the AgrA/ComE family. Most of the transcriptional regulators that contain this DNA-binding domain are involved in biosynthesis of extracellular polysaccharides, fimbriation, expression of exoproteins, including toxins, and quorum sensing.