Publications by authors named "Eugene Koonin"

Despite the practically unlimited number of possible protein sequences, the number of basic shapes in which proteins fold seems not only to be finite, but also to be relatively small, with probably no more than 10,000 folds in existence. Moreover, the distribution of proteins among these folds is highly non-homogeneous -- some folds and superfamilies are extremely abundant, but most are rare. Protein folds and families encoded in diverse genomes show similar size distributions with notable mathematical properties, which also extend to the number of connections between domains in multidomain proteins.

Background: In general, the length of a protein sequence is determined by its function and the wide variance in the lengths of an organism's proteins reflects the diversity of specific functional roles for these proteins. However, additional evolutionary forces that affect the length of a protein may be revealed by studying the length distributions of proteins evolving under weaker functional constraints.

Results: We performed sequence comparisons to distinguish highly conserved and poorly conserved proteins from the bacterium Escherichia coli, the archaeon Archaeoglobus fulgidus, and the eukaryotes Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens.

Background: Power-law distributions appear in numerous biological, physical and other contexts, which appear to be fundamentally different. In biology, power laws have been claimed to describe the distributions of the connections of enzymes and metabolites in metabolic networks, the number of interaction partners of a given protein, the number of members in paralogous families, and other quantities. In network analysis, power laws imply evolution of the network with preferential attachment, i.

Prokaryotic genomes are considered to be 'wall-to-wall' genomes, which consist largely of genes for proteins and structural RNAs, with only a small fraction of the genomic DNA allotted to intergenic regions, which are thought to typically contain regulatory signals. The majority of bacterial and archaeal genomes contain 6-14% non-coding DNA. Significant positive correlations were detected between the fraction of non-coding DNA and inter- and intra-operonic distances, suggesting that different classes of non-coding DNA evolve congruently.

In this article we report the initial biochemical, genetic, and electron microscopic analysis of a previously uncharacterized, 8.9-kDa, predicted thiol-redox protein. The name A2.

The Escherichia coli protein YjeQ represents a protein family whose members are broadly conserved in bacteria and have been shown to be indispensable to the growth of E. coli and Bacillus subtilis [Arigoni, F., et al.

The familial Alzheimer's disease gene products, presenilin-1 and presenilin-2 (PS1 and PS2), are involved in amyloid beta-protein precursor (AbetaPP) processing, Notch receptor signaling, and programmed cell death. However, the molecular mechanisms by which presenilins regulate these processes remain unknown. Clues about the function of a protein can be obtained by seeing whether it interacts with another protein of known function.

COP9 signalosome (CSN) cleaves the ubiquitin-like protein Nedd8 from the Cul1 subunit of SCF ubiquitin ligases. The Jab1/MPN domain metalloenzyme (JAMM) motif in the Jab1/Csn5 subunit was found to underlie CSN's Nedd8 isopeptidase activity. JAMM is found in proteins from archaea, bacteria, and eukaryotes, including the Rpn11 subunit of the 26S proteasome.

The 26S proteasome mediates degradation of ubiquitin-conjugated proteins. Although ubiquitin is recycled from proteasome substrates, the molecular basis of deubiquitination at the proteasome and its relation to substrate degradation remain unknown. The Rpn11 subunit of the proteasome lid subcomplex contains a highly conserved Jab1/MPN domain-associated metalloisopeptidase (JAMM) motif, EX(n)HXHX(10)D.
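
A motif written as EX(n)HXHX(10)D can be read as a regular expression over a protein sequence: a glutamate, a variable-length spacer, the H-X-H pair, exactly ten residues, and an aspartate. The following is a minimal sketch of that reading, not the procedure used in the study. The abstract does not give the range of n, so the function takes it as explicit arguments; the function names and the synthetic test sequence are hypothetical.

```python
import re

def jamm_pattern(n_min: int, n_max: int) -> re.Pattern:
    """Compile E-X(n)-H-X-H-X(10)-D as a regex.

    n_min and n_max bound the variable spacer X(n). The source does not
    state this range, so the caller must supply it (an assumption).
    """
    # [A-Z] stands for "any residue" in a one-letter-code sequence.
    return re.compile(rf"E[A-Z]{{{n_min},{n_max}}}H[A-Z]H[A-Z]{{10}}D")

def find_jamm(seq: str, n_min: int, n_max: int):
    """Return the (start, end) span of the first match, or None."""
    match = jamm_pattern(n_min, n_max).search(seq.upper())
    return None if match is None else match.span()

# Synthetic sequence built to contain the pattern with n = 50 (not real data).
toy = "MK" + "E" + "A" * 50 + "HAH" + "G" * 10 + "D" + "KK"
hit = find_jamm(toy, n_min=40, n_max=60)
```

A pattern match of this kind only flags candidates; it says nothing about whether the residues are positioned to coordinate a metal ion.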

Genome comparisons indicate that horizontal gene transfer and differential gene loss are major evolutionary phenomena that, at least in prokaryotes, involve a large fraction, if not the majority, of genes. The extent of these events casts doubt on the feasibility of constructing a 'Tree of Life', because the trees for different genes often tell different stories. However, alternative approaches to tree construction that attempt to determine tree topology on the basis of comparisons of complete gene sets seem to reveal a phylogenetic signal that supports the three-domain evolutionary scenario and suggests the possibility of delineation of previously undetected major clades of prokaryotes.

The availability of multiple complete genome sequences from the same species can facilitate attempts to systematically address basic questions in genome evolution. We refer to such efforts as "microevolutionary genomics". We report the results of comparative analyses of complete intraspecific genome (and proteome) sequences from four bacterial species--Chlamydophila pneumoniae, Escherichia coli, Helicobacter pylori and Neisseria meningitidis.

A previously undetected domain with a CxCx(n)CxH pattern of predicted zinc-chelating residues was identified in a variety of prokaryotic and eukaryotic proteins. These include bacterial ATPases of the SWI2/SNF2 family, plant MuDR transposases and transposase-derived Far1 nuclear proteins, and vertebrate MEK kinase-1. This domain was designated SWIM after SWI2/SNF2 and MuDR, and is predicted to have DNA-binding and protein-protein interaction functions in different contexts.

Transcription is a slow and expensive process: in eukaryotes, approximately 20 nucleotides can be transcribed per second at the expense of at least two ATP molecules per nucleotide. Thus, at least for highly expressed genes, transcription of long introns, which are particularly common in mammals, is costly. Using data on the expression of genes that encode proteins in Caenorhabditis elegans and Homo sapiens, we show that introns in highly expressed genes are substantially shorter than those in genes that are expressed at low levels.
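
The cost argument can be made concrete with the two figures quoted above (about 20 nucleotides per second, at least two ATP molecules per nucleotide). The sketch below simply scales those figures by intron length; the 10,000-nt intron is an illustrative value, not a number from the study, and the ATP figure is a lower bound.

```python
NT_PER_SECOND = 20   # approximate eukaryotic transcription rate quoted in the text
ATP_PER_NT = 2       # lower bound quoted in the text

def transcription_cost(length_nt: int) -> tuple[float, int]:
    """Return (seconds, minimum ATP molecules) to transcribe length_nt nucleotides once."""
    seconds = length_nt / NT_PER_SECOND
    min_atp = length_nt * ATP_PER_NT
    return seconds, min_atp

# Hypothetical 10,000-nt intron: time and minimum ATP per transcript.
seconds, min_atp = transcription_cost(10_000)
```

Under these figures, each transcript of such an intron takes 500 seconds and at least 20,000 ATP molecules, and that cost is paid again for every transcript of a highly expressed gene.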

Complementary developments in comparative genomics, protein structure determination and in-depth comparison of protein sequences and structures have provided a better understanding of the prevailing trends in the emergence and diversification of protein domains. The investigation of deep relationships among different classes of proteins involved in key cellular functions, such as nucleic acid polymerases and other nucleotide-dependent enzymes, indicates that a substantial set of diverse protein domains evolved within the primordial, ribozyme-dominated RNA world.

A computational procedure was developed for systematic detection of lineage-specific expansions (LSEs) of protein families in sequenced genomes and applied to obtain a census of LSEs in five eukaryotic species, the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and the green plant Arabidopsis thaliana. A significant fraction of the proteins encoded in each of these genomes, up to 80% in A. thaliana, belong to LSEs.

Background: Gene fusions can be used as tools for functional prediction and also as evolutionary markers. Fused genes often show a scattered phyletic distribution, which suggests a role for processes other than vertical inheritance in their evolution.

Results: The evolutionary history of gene fusions was studied by phylogenetic analysis of the domains in the fused proteins and the orthologous domains that form stand-alone proteins.

In overlapping genes, the same DNA sequence codes for two proteins using different reading frames. Analysis of overlapping genes can help in understanding the mode of evolution of a coding region from noncoding DNA. We identified 71 pairs of convergent genes, with overlapping 3' ends longer than 15 nucleotides, that are conserved in at least two prokaryotic genomes.

The "knockout-rate" prediction holds that essential genes should be more evolutionarily conserved than are nonessential genes. This is because negative (purifying) selection acting on essential genes is expected to be more stringent than that for nonessential genes, which are more functionally dispensable and/or redundant. However, a recent survey of evolutionary distances between Saccharomyces cerevisiae and Caenorhabditis elegans proteins did not reveal any difference between the rates of evolution for essential and nonessential genes.

Protein sequence and structure comparisons show that the catalytic domains of Class I aminoacyl-tRNA synthetases, a related family of nucleotidyltransferases involved primarily in coenzyme biosynthesis, nucleotide-binding domains related to the UspA protein (USPA domains), photolyases, electron transport flavoproteins, and PP-loop-containing ATPases together comprise a distinct class of alpha/beta domains designated the HUP domain after HIGH-signature proteins, UspA, and PP-ATPase. Several lines of evidence are presented to support the monophyly of the HUP domains, to the exclusion of other three-layered alpha/beta folds with the generic "Rossmann-like" topology. Cladistic analysis, with patterns of structural and sequence similarity used as discrete characters, identified three major evolutionary lineages within the HUP domain class: the PP-ATPases; the HIGH superfamily, which includes class I aaRS and related nucleotidyltransferases containing the HIGH signature in their nucleotide-binding loop; and a previously unrecognized USPA-like group, which includes USPA domains, electron transport flavoproteins, and photolyases.

The lipoyl-binding domain is often present, in one or several copies, in the E2 subunit and, less often, in the E1 and E3 subunits of 2-oxo acid dehydrogenase complexes. Phylogenetic analysis shows evidence of multiple, independent intragenomic recombination events between different versions of the lipoyl-binding domain in various bacteria and eukaryotic mitochondria, leading to homogenization of the sequences of the lipoyl-binding domain within the same enzymatic complex in several bacterial lineages. This appears to be the first case of sequence homogenization at the level of an individual domain in prokaryotes.

A computational method was developed for delineating connected gene neighborhoods in bacterial and archaeal genomes. These gene neighborhoods are not typically present, in their entirety, in any single genome, but are held together by overlapping, partially conserved gene arrays. The procedure was applied to comparing the orders of orthologous genes, which were extracted from the database of Clusters of Orthologous Groups of proteins (COGs), in 31 prokaryotic genomes and resulted in the identification of 188 clusters of gene arrays, which included 1001 of 2890 COGs.
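
One way to picture how overlapping, partially conserved arrays can hold together a neighborhood that no single genome contains in full is to treat each array as a set of gene labels and join arrays that share enough members. The sketch below does this with a small union-find; it is an illustration of the idea only, not the published procedure. The `min_shared` threshold, the function name, and the single-letter gene labels are all assumptions.

```python
from itertools import combinations

def merge_gene_arrays(arrays, min_shared=2):
    """Group gene arrays that overlap by at least min_shared genes (assumed
    threshold) and return the union of genes in each connected group."""
    parent = list(range(len(arrays)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of arrays that shares enough genes.
    for i, j in combinations(range(len(arrays)), 2):
        if len(set(arrays[i]) & set(arrays[j])) >= min_shared:
            parent[find(i)] = find(j)

    # Collect the genes of each connected group into one neighborhood.
    groups = {}
    for i, array in enumerate(arrays):
        groups.setdefault(find(i), set()).update(array)
    return sorted(sorted(genes) for genes in groups.values())

# Placeholder labels: three arrays overlap pairwise, the fourth stands alone.
toy_arrays = [["A", "B", "C"], ["B", "C", "D"], ["C", "D", "E"], ["X", "Y"]]
neighborhoods = merge_gene_arrays(toy_arrays)
```

In this toy input the first and third arrays share only one gene, yet they end up in the same neighborhood because the second array overlaps both.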

We show that three cytoplasmic thiol oxidoreductases encoded by vaccinia virus comprise a complete pathway for formation of disulfide bonds in intracellular virion membrane proteins. The pathway was defined by analyzing conditional lethal mutants and effects of cysteine to serine substitutions and by trapping disulfide-bonded heterodimer intermediates for each consecutive step. The upstream component, E10R, belongs to the ERV1/ALR family of FAD-containing sulfhydryl oxidases that use oxygen as the electron acceptor.

We have determined the complete 1,694,969-nt sequence of the GC-rich genome of Methanopyrus kandleri by using a whole direct genome sequencing approach. This approach is based on unlinking of genomic DNA with the ThermoFidelase version of M. kandleri topoisomerase V and cycle sequencing directed by 2'-modified oligonucleotides (Fimers).
