Publications by authors named "Boris E Shakhnovich"

Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales--from sequences to organisms and populations.

View Article and Find Full Text PDF

In this work we develop a microscopic physical model of early evolution where phenotype--organism life expectancy--is directly related to genotype--the stability of its proteins in their native conformations-which can be determined exactly in the model. Simulating the model on a computer, we consistently observe the "Big Bang" scenario whereby exponential population growth ensues as soon as favorable sequence-structure combinations (precursors of stable proteins) are discovered. Upon that, random diversity of the structural space abruptly collapses into a small set of preferred proteins.

View Article and Find Full Text PDF

Although rigorous measures of similarity for sequence and structure are now well established, the problem of defining functional relationships has been particularly daunting. Here, we present several manifold embedding techniques to compute distances between Gene Ontology (GO) functional annotations and consequently estimate functional distances between protein domains. To evaluate accuracy, we correlate the functional distance to the well established measures of sequence, structural, and phylogenetic similarities.

View Article and Find Full Text PDF

Computational prediction of nucleotide binding specificity for transcription factors remains a fundamental and largely unsolved problem. Determination of binding positions is a prerequisite for research in gene regulation, a major mechanism controlling phenotypic diversity. Furthermore, an accurate determination of binding specificities from high-throughput data sources is necessary to realize the full potential of systems biology.

View Article and Find Full Text PDF

Understanding transcription factor (TF) mediated control of gene expression remains a major challenge at the interface of computational and experimental biology. Computational techniques predicting TF-binding site specificity are frequently unreliable. On the other hand, comprehensive experimental validation is difficult and time consuming.

View Article and Find Full Text PDF

Recent investigations of high-throughput genomic and phenomic data have uncovered a variety of significant but relatively weak correlations between a gene's functional and evolutionary characteristics. In particular, essential genes and genes with paralogs have a slight propensity to evolve more slowly than nonessential genes and singletons, respectively. However, given the weakness and multiplicity of these associations, their biological relevance remains uncertain.

View Article and Find Full Text PDF

Analysis of increasingly saturated sequence databases have shown that gene family sizes are highly skewed with many families being small and few containing many, far-diverged homologs. Additionally, recently published results have identified a structural determinant of mutational plasticity: designability that correlates strongly with gene family size. In this paper, we explore the possible links between the two observations, exploring the possible effect of designability on duplication and divergence.

View Article and Find Full Text PDF

Genome scale identification of transcription factor binding sites (TFBS) is fundamental to understanding the complexities of mRNA expression at both the cell and organismal levels. While high-throughput experimental methods provide associations between transcription factors and the genes they regulate under a specified experimental condition, computational methods are still required to pinpoint the exact location of binding. Moreover, since the binding site is an intrinsic property of the promoter region, computational methods are in principle more general than condition dependent experimental methods.

View Article and Find Full Text PDF

Understanding the relationship between protein structure and function is one of the foremost challenges in post-genomic biology. Higher conservation of structure could, in principle, allow researchers to extend current limitations of annotation. However, despite significant research in the area, a precise and quantitative relationship between biochemical function and protein structure has been elusive.

View Article and Find Full Text PDF

Understanding the observed variability in the number of homologs of a gene is a very important unsolved problem that has broad implications for research into coevolution of structure and function, gene duplication, pseudogene formation, and possibly for emerging diseases. Here, we attempt to define and elucidate some possible causes behind the observed irregularity in sequence space. We present evidence that sequence variability and functional diversity of a gene or fold family is influenced by quantifiable characteristics of the protein structure.

View Article and Find Full Text PDF

A question of fundamental importance is the definition and identification of modules from microarray experiments. A wide variety of techniques have been used to gain insight into the elucidation of such modules. One problem, however, is the inability to directly compare results between the different data sets produced due to the inherent parameterizations of their approaches.

View Article and Find Full Text PDF

ELISA (http://romi.bu.edu/elisa/) is a database that was designed for flexibility in defining interesting queries about protein domain evolution.

View Article and Find Full Text PDF

Since the advent of investigations into structural genomics, research has focused on correctly identifying domain boundaries, as well as domain similarities and differences in the context of their evolutionary relationships. As the science of structural genomics ramps up adding more and more information into the databanks, questions about the accuracy and completeness of our classification and annotation systems appear on the forefront of this research. A central question of paramount importance is how structural similarity relates to functional similarity.

View Article and Find Full Text PDF

We attempt to understand the evolutionary origin of protein folds by simulating their divergent evolution with a three-dimensional lattice model. Starting from an initial seed lattice structure, evolution of model proteins progresses by sequence duplication and subsequent point mutations. A new gene's ability to fold into a stable and unique structure is tested each time through direct kinetic folding simulations.

View Article and Find Full Text PDF

Unlabelled: The problem of functional annotation based on homology modeling is primary to current bioinformatics research. Researchers have noted regularities in sequence, structure and even chromosome organization that allow valid functional cross-annotation. However, these methods provide a lot of false negatives due to limited specificity inherent in the system.

View Article and Find Full Text PDF

An open question of great interest in biophysics is whether variations in structure cause protein folds to differ in the number of amino acid sequences that can fold to them stably, i.e., in their designability.

View Article and Find Full Text PDF

Using structural similarity clustering of protein domains: protein domain universe graph (PDUG), and a hierarchical functional annotation: gene ontology (GO) as two evolutionary lenses, we find that each structural cluster (domain fold) exhibits a distribution of functions that is unique to it. These functional distributions are functional fingerprints that are specific to characteristic structural clusters and vary from cluster to cluster. Furthermore, as structural similarity threshold for domain clustering in the PDUG is relaxed we observe an influx of earlier-diverged domains into clusters.

View Article and Find Full Text PDF