The Clusters of Orthologous Genes (COG) database, originally created in 1997, has been updated to reflect the constantly growing collection of completely sequenced prokaryotic genomes. This update increased the genome coverage from 1309 to 2296 species, including 2103 bacteria and 193 archaea, in most cases, with a single representative genome per genus. This set covers all genera of bacteria and archaea that included organisms with 'complete genomes' as per NCBI databases in November 2023.
In silico identification of viral anti-CRISPR proteins (Acrs) has relied largely on the guilt-by-association method using known Acrs or anti-CRISPR associated proteins (Acas) as the bait. However, the low number and limited spread of the characterized archaeal Acrs and Aca hinder our ability to identify Acrs using guilt-by-association. Here, based on the observation that the few characterized archaeal Acrs and Aca are transcribed immediately post viral infection, we hypothesize that these genes, and many other unidentified anti-defense genes (ADGs), are under the control of conserved regulatory sequences including a strong promoter, which can be used to predict anti-defense genes in archaeal viruses.
The identification of microbial genes essential for survival as those with lethal knockout phenotype (LKP) is a common strategy for functional interrogation of genomes. However, interpretation of the LKP is complicated because a substantial fraction of the genes with this phenotype remains poorly functionally characterized. Furthermore, many genes can exhibit LKP not because their products perform essential cellular functions but because their knockout activates the toxicity of other genes (conditionally essential genes).
Genomes of bacteria and archaea contain a much larger fraction of unidirectional (serial) gene pairs than convergent or divergent gene pairs. Many of the unidirectional gene pairs have short overlaps of -4 nt and -1 nt. As shown previously, translation of the genes in overlapping unidirectional gene pairs is tightly coupled.
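The overlap notation in the abstract above can be illustrated with a minimal sketch. It assumes 1-based inclusive coordinates for two genes on the same strand; the function name `signed_gap` and the coordinate convention are assumptions made for this illustration and are not taken from the study. Under that convention, a negative value is the overlap length.

```python
def signed_gap(upstream_end: int, downstream_start: int) -> int:
    """Signed distance in nt between two adjacent same-strand genes.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    Positive: length of the intergenic spacer.
    Zero: the genes abut.
    Negative: the genes overlap by that many nucleotides.
    """
    return downstream_start - upstream_end - 1


# Motif ATGA: the downstream start codon ATG occupies positions 1-3 and the
# upstream stop codon TGA occupies positions 2-4, so the genes share 4 nt.
overlap_atga = signed_gap(upstream_end=4, downstream_start=1)

# Motif TAATG: the upstream stop codon TAA occupies positions 1-3 and the
# downstream start codon ATG occupies positions 3-5, so the genes share 1 nt.
overlap_taatg = signed_gap(upstream_end=3, downstream_start=3)
```

Here `overlap_atga` evaluates to -4 and `overlap_taatg` to -1, matching the two overlap classes named in the abstract.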
The evolution of genomes in all life forms involves two distinct, dynamic types of genomic changes: gene duplication (and loss) that shape families of paralogous genes and extension (and contraction) of low-complexity regions (LCR), which occurs through dynamics of short repeats in protein-coding genes. Although the roles of each of these types of events in genome evolution have been studied, their co-evolutionary dynamics is not thoroughly understood. Here, by analyzing a wide range of genomes from diverse bacteria and archaea, we show that LCR and paralogy represent two distinct routes of evolution that are inversely correlated.
Background: Evolutionary rate is a key characteristic of gene families that is linked to the functional importance of the respective genes as well as specific biological functions of the proteins they encode. Accurate estimation of evolutionary rates is a challenging task that requires precise phylogenetic analysis. Here we present an easy-to-estimate, protein family-level measure of sequence variability based on alignment column homogeneity in multiple alignments of protein sequences from Clade-Specific Clusters of Orthologous Genes (csCOGs).
Background: Bacteria and archaea produce an enormous diversity of modified peptides that are involved in various forms of inter-microbial conflicts or communication. A vast class of such peptides are Ribosomally synthesized, Post-translationally modified Peptides (RiPPs), and a major group of RiPPs are graspetides, so named after ATP-grasp ligases that catalyze the formation of lactam and lactone linkages in these peptides. The diversity of graspetides, the multiple proteins encoded in the respective Biosynthetic Gene Clusters (BGCs) and their evolution have not been studied in full detail.
Molecular mechanisms involved in biological conflicts and self vs nonself recognition in archaea remain poorly characterized. We apply phylogenomic analysis to identify a hypervariable gene module that is widespread among . These loci consist of an upstream gene coding for a large protein containing several immunoglobulin (Ig) domains and unique combinations of downstream genes, some of which also contain Ig domains.
Numerous, diverse, highly variable defense and offense genetic systems are encoded in most bacterial genomes and are involved in various forms of conflict among competing microbes or their eukaryotic hosts. Here we focus on the offense and self-versus-nonself discrimination systems encoded by archaeal genomes that so far have remained largely uncharacterized and unannotated. Specifically, we analyze archaeal genomic loci encoding polymorphic and related toxin systems and ribosomally synthesized antimicrobial peptides.
Screening of genomic and metagenomic databases for new variants of CRISPR-Cas systems increasingly results in the discovery of derived variants that do not seem to possess the interference capacity and are implicated in functions distinct from adaptive immunity. We describe an extremely derived putative class 1 CRISPR-Cas system that is present in many Halobacteria and consists of distant homologs of the Cas5 and Cas7 proteins along with an uncharacterized conserved protein and various nucleases. We hypothesize that, although this system lacks typical CRISPR effectors or a CRISPR array, it functions as an RNA-dependent defense mechanism that, unlike other derived CRISPR-Cas, utilizes alternative nucleases to cleave invader genomes.
Toxoplasma gondii is among the most prevalent parasites worldwide, infecting many wild and domestic animals and causing zoonotic infections in humans. T. gondii differs substantially in its broad distribution from closely related parasites that typically have narrow, specialized host ranges.
Medicago truncatula, a close relative of alfalfa (Medicago sativa), is a model legume used for studying symbiotic nitrogen fixation, mycorrhizal interactions and legume genomics. J. Craig Venter Institute (JCVI; formerly TIGR) has been involved in M.
The Arabidopsis Information Portal (https://www.araport.org) is a new online resource for plant biology research.
Background: While the pneumococcal protein conjugate vaccines reduce the incidence of invasive pneumococcal disease (IPD), serotype replacement remains a major concern. Thus, serotype-independent protection with vaccines targeting virulence genes, such as PspA, has been pursued. PspA comprises diverse clades that arose through recombination.
Background: A low genetic diversity in Francisella tularensis has been documented. Current DNA-based genotyping methods for typing F. tularensis offer a limited and varying degree of subspecies-, clade- and strain-level discrimination power.
Microarray expression analysis is providing unprecedented data on gene expression in humans and mammalian model systems. Although such studies provide a tremendous resource for understanding human disease states, one of the significant challenges is cross-referencing the data derived from different species, across diverse expression analysis platforms, in order to properly derive inferences regarding gene expression and disease state. To address this problem, we have developed RESOURCERER, a microarray-resource annotation and cross-reference database built using the analysis of expressed sequence tags (ESTs) and gene sequences provided by the TIGR Gene Index (TGI) and TIGR Orthologous Gene Alignment (TOGA) databases [now called Eukaryotic Gene Orthologs (EGO)].
Mol Biochem Parasitol
November 2005
Despite the significance of Plasmodium vivax as the most widespread human malaria parasite and a major public health problem, gene expression in this parasite is poorly understood. To accelerate gene discovery and facilitate the annotation phase of the P. vivax genome project, we have undertaken a transcriptome approach to study gene expression in the mixed blood stages of a P.
MeSHer uses a simple statistical approach to identify biological concepts in the form of Medical Subject Headings (MeSH terms) obtained from the PubMed database that are significantly overrepresented within the identified gene set relative to those associated with the overall collection of genes on the underlying DNA microarray platform. As a demonstration, we apply this approach to gene lists acquired from a published study of the effects of angiotensin II (Ang II) treatment on cardiac gene expression and demonstrate that this approach can aid in the interpretation of the resulting 'significant' gene set.
Availability: The software is available at http://www.
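The kind of overrepresentation analysis described in the MeSHer abstract can be sketched as follows. The abstract does not name the statistical test, so the one-sided hypergeometric test used here is an assumption, chosen because it is a common choice for term enrichment. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           drawn: int, hits: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits).

    population: genes on the whole microarray platform
    annotated:  platform genes associated with the MeSH term
    drawn:      genes in the selected ('significant') gene set
    hits:       selected genes associated with the MeSH term

    NOTE: the choice of test is an assumption of this sketch; the source
    abstract only says 'a simple statistical approach'.
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # math.comb returns 0 when k > n, so impossible terms drop out.
    tail = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
               for i in range(hits, upper + 1))
    return tail / total


# Hypothetical counts: 10 genes on the platform, 5 carry the term,
# 5 genes selected, all 5 of them carry the term.
p_value = hypergeom_enrichment_p(population=10, annotated=5, drawn=5, hits=5)
```

In practice one p-value would be computed per MeSH term and then corrected for multiple testing; that step is omitted here.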
Although the list of completed genome sequencing projects has expanded rapidly, sequencing and analysis of expressed sequence tags (ESTs) remain a primary tool for discovery of novel genes in many eukaryotes and a key element in genome annotation. The TIGR Gene Indices (http://www.tigr.
Cytogenet Genome Res
June 2004
Expressed sequence tag (EST) projects have produced extremely valuable resources for identifying genes affecting phenotypes of interest. A large-scale EST sequencing project for rainbow trout was initiated to identify and functionally annotate as many unique transcripts as possible. Over 45,000 5' ESTs were obtained by sequencing clones from a single normalized library constructed using mRNA from six tissues.
Approximately 80% of the maize genome comprises highly repetitive sequences interspersed with single-copy, gene-rich sequences, and standard genome sequencing strategies are not readily adaptable to this type of genome. Methodologies that enrich for genic sequences might more rapidly generate useful results from complex genomes. Equivalent numbers of clones from maize selected by techniques called methylation filtering and High C0t selection were sequenced to generate approximately 200,000 reads (approximately 132 megabases), which were assembled into contigs.
TGICL is a pipeline for analysis of large Expressed Sequence Tags (EST) and mRNA databases in which the sequences are first clustered based on pairwise sequence similarity, and then assembled by individual clusters (optionally with quality values) to produce longer, more complete consensus sequences. The system can run on multi-CPU architectures including SMP and PVM.
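The first stage described above, clustering by pairwise similarity, can be sketched as single-linkage clustering with a union-find structure. This is a toy illustration only and is not TGICL's implementation: the shared k-mer criterion, the function names, and the example sequences are all assumptions, and the per-cluster assembly stage is not shown.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_sequences(seqs: dict[str, str], k: int = 8,
                      min_shared: int = 1) -> list[set[str]]:
    """Single-linkage clustering: two sequences are linked when they share
    at least min_shared k-mers (a stand-in for a real similarity search)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {name: kmers(s, k) for name, s in seqs.items()}
    for a, b in combinations(seqs, 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Hypothetical ESTs: est1 and est2 share the 8-mer CGTTTGAC; est3 is unrelated.
example = {
    "est1": "ACGTACGTTTGACCA",
    "est2": "CGTTTGACCAGGATC",
    "est3": "TTTTTTTTTTTTGGGG",
}
clusters = cluster_sequences(example)
```

Each resulting cluster would then be passed to an assembler independently, which is what makes the approach straightforward to spread across multiple CPUs.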
The cultivated potato (Solanum tuberosum) shares similar biology with other members of the Solanaceae, yet has features unique within the family, such as modified stems (stolons) that develop into edible tubers. To better understand potato biology, we have undertaken a survey of the potato transcriptome using expressed sequence tags (ESTs) from diverse tissues. A total of 61,940 ESTs were generated from aerial tissues, below-ground tissues, and tissues challenged with the late-blight pathogen (Phytophthora infestans).
Comparative genomics promises to rapidly accelerate the identification and functional classification of biologically important human genes. We developed the TIGR Orthologous Gene Alignment (TOGA;
An essential component of functional genomics studies is the sequence of DNA expressed in tissues of interest. To provide a resource of bovine-specific expressed sequence data and facilitate this powerful approach in cattle research, four normalized cDNA libraries were produced and arrayed for high-throughput sequencing. The libraries were made with RNA pooled from multiple tissues to increase efficiency of normalization and maximize the number of independent genes for which sequence data were obtained.