The rapid loss of reef-building corals owing to ocean warming is driving the development of interventions such as coral propagation and restoration, selective breeding and assisted gene flow. Many of these interventions target naturally heat-tolerant individuals to boost climate resilience, but the challenges of quickly and reliably quantifying heat tolerance and identifying thermotolerant individuals have hampered implementation. Here, we used coral bleaching automated stress systems to perform rapid, standardized heat tolerance assays on 229 colonies of across six coral nurseries spanning Florida's Coral Reef, USA.
Proc Natl Acad Sci U S A
June 2013
Biological process enrichment is a widely used metric for evaluating the quality of multiprotein modules. In this study, we examine possible optimization criteria for detecting homologous multiprotein modules and quantify their effects on biological process enrichment. We find that modularity, linear density, and module size are the most important criteria considered, complementary to each other, and that graph theoretical attributes account for 36% of the variance in biological process enrichment.
Recently, we reported the spectroscopic and kinetic characterizations of cytochrome P450 compound I in CYP119A1, effectively closing the catalytic cycle of cytochrome P450-mediated hydroxylations. In this minireview, we focus on the developments that made this breakthrough possible. We examine the importance of enzyme purification in the quest for reactive intermediates and report the preparation of compound I in a second P450 (P450ST).
Since the first emergence of protein-protein interaction networks more than a decade ago, they have been viewed as static scaffolds of the signaling-regulatory events taking place in cells, and their analysis has been mainly confined to topological aspects. Recently, functional models of these networks have been suggested, ranging from Boolean to constraint-based methods. However, learning such models from large-scale data remains a formidable task, and most modeling approaches rely on extensive human curation.
Pedigree graphs, or family trees, are typically constructed by an expensive process of examining genealogical records to determine which pairs of individuals are parent and child. New methods to automate this process take as input genetic data from a set of extant individuals and reconstruct ancestral individuals. There is a great need to evaluate the quality of these methods by comparing the estimated pedigree to the true pedigree.
Can we find the family trees, or pedigrees, that relate the haplotypes of a group of individuals? Collecting the genealogical information for how individuals are related is a very time-consuming and expensive process. Methods for automating the construction of pedigrees could streamline this process. While constructing single-generation families is relatively easy given whole genome data, reconstructing multi-generational, possibly inbred, pedigrees is much more challenging.
IEEE/ACM Trans Comput Biol Bioinform
October 2012
Detecting essential multiprotein modules that change infrequently during evolution is a challenging algorithmic task that is important for understanding the structure, function, and evolution of the biological cell. In this paper, we define a measure of modularity for interactomes and present a linear-time algorithm, Produles, for detecting multiprotein modularity conserved during evolution that improves on the running time of previous algorithms for related problems and offers desirable theoretical guarantees. We present a biologically motivated graph theoretic set of evaluation measures complementary to previous evaluation measures, demonstrate that Produles exhibits good performance by all measures, and describe certain recurrent anomalies in the performance of previous algorithms that are not detected by previous measures.
Background: Molecular studies of the human disease transcriptome typically involve a search for genes whose expression is significantly dysregulated in sick individuals compared to healthy controls. Recent studies have found that only a small number of the genes in human disease-related pathways show consistent dysregulation in sick individuals. However, those studies found that some pathway genes are affected in most sick individuals, but the affected genes can differ among individuals.
Despite the desirable information contained in complex pedigree data sets, analysis methods struggle to efficiently process these data. The attractiveness of pedigree data is their power for detecting rare variants, particularly in comparison with studies of unrelated individuals. In addition, rather than assuming individuals in a study are unrelated, knowledge of their relationships can avoid spurious results due to confounding population structure effects.
In the network querying problem, one is given a protein complex or pathway of species A and a protein-protein interaction network of species B; the goal is to identify subnetworks of B that are similar to the query in terms of sequence, topology, or both. Existing approaches mostly depend on knowledge of the interaction topology of the query in the network of species A; however, in practice, this topology is often not known. To address this problem, we develop a topology-free querying algorithm, which we call Torque.
This work demonstrates how gene association studies can be analyzed to map a global landscape of genetic interactions among protein complexes and pathways. Despite the immense potential of gene association studies, they have been challenging to analyze because most traits are complex, involving the combined effect of mutations at many different genes. Due to lack of statistical power, only the strongest single markers are typically identified.
TORQUE is a tool for cross-species querying of protein-protein interaction networks. It aims to answer the following question: given a set of proteins constituting a known complex or a pathway in one species, can a similar complex or pathway be found in the protein network of another species? To this end, TORQUE seeks a matching set of proteins that are sequence similar to the query proteins and span a connected region of the target network, while allowing for both insertions and deletions. Unlike existing approaches, TORQUE does not require knowledge of the interconnections among the query proteins.
The central questions asked in whole-genome association studies are how to locate associated regions in the genome and how to estimate the significance of these findings. Researchers usually do this by testing each SNP separately for association and then applying a suitable correction for multiple-hypothesis testing. However, SNPs are correlated by the unobserved genealogy of the population, and a more powerful statistical methodology would attempt to take this genealogy into account.
Analysis of expression quantitative trait loci (eQTLs) is an emerging technique in which individuals are genotyped across a panel of genetic markers and, simultaneously, phenotyped using DNA microarrays. Because of the spacing of markers and linkage disequilibrium, each marker may be near many genes, making it difficult to finely map which of these genes are the causal factors responsible for the observed changes in the downstream expression. To address this challenge, we present an efficient method for prioritizing candidate genes at a locus.
Population stratification can be a serious obstacle in the analysis of genomewide association studies. We propose a method for evaluating the significance of association scores in whole-genome cohorts with stratification. Our approach is a randomization test akin to a standard permutation test.
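For readers unfamiliar with the baseline this abstract refers to, the following is a minimal sketch of a standard permutation test for a single marker. It is not the stratification-aware randomization the authors propose; the score function, the function names, and the toy data are all assumptions made for illustration.

```python
import random


def association_score(genotypes, labels):
    """Absolute difference in mean allele count between cases (1) and controls (0).

    A deliberately simple, hypothetical score; real studies use other statistics.
    """
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(genotypes, labels, n_perm=2000, seed=0):
    """Estimate a p-value by shuffling case/control labels uniformly at random."""
    rng = random.Random(seed)
    observed = association_score(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if association_score(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (invented): allele counts 0/1/2 at one SNP for 6 cases and 6 controls.
genotypes = [2, 2, 2, 2, 1, 2, 0, 0, 0, 1, 0, 0]
labels = [1] * 6 + [0] * 6
print(permutation_p_value(genotypes, labels))
```

Uniform shuffling treats every individual as exchangeable, which is the assumption that population stratification violates; a stratification-aware test would have to replace the `rng.shuffle` step with a different resampling scheme.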
Motivation: The search for genetic variants that are linked to complex diseases such as cancer, Parkinson's, or Alzheimer's disease may lead to better treatments. Since haplotypes can serve as proxies for hidden variants, one method of finding the linked variants is to look for case-control associations between the haplotypes and disease. Finding these associations requires a high-quality estimation of the haplotype frequencies in the population.
J Comput Biol
September 2007
We present a method that compares the protein interaction networks of two species to detect functionally similar (conserved) protein modules between them. The method is based on an algorithm we developed to identify matching subgraphs between two graphs. Unlike previous network comparison methods, our algorithm has provable guarantees on correctness and efficiency.
Proc IEEE Comput Soc Bioinform Conf
August 2006
In the early 1990s, after more than three decades of studying algorithms within the framework of theoretical computer science, I shifted my focus to algorithmic problems arising in genomics. There is a fundamental difference between the views of algorithms in the two fields: in theoretical computer science the input-output behavior of an algorithm is rigorously specified in advance, whereas in computational biology an algorithm is merely a vehicle for discovering Nature's ground truth. In order to be effective in computational genomics I have had to radically change my approach to research.
The interpretation of large-scale protein network data depends on our ability to identify significant substructures in the data, a computationally intensive task. Here we adapt and extend efficient techniques for finding paths and trees in graphs to the problem of identifying pathways in protein interaction networks. We present linear-time algorithms for finding paths and trees in networks under several biologically motivated constraints.
Proc IEEE Comput Soc Bioinform Conf
August 2006
The complexity of the global organization and internal structures of motifs in higher eukaryotic organisms raises significant challenges for motif detection techniques. To achieve successful de novo motif detection it is necessary to model the complex dependencies within and among motifs and incorporate biological prior knowledge. In this paper, we present LOGOS, an integrated LOcal and GlObal motif Sequence model for biopolymer sequences, which provides a principled framework for developing, modularizing, extending and computing expressive motif models for complex biopolymer sequence analysis.
Mounting evidence shows that many protein complexes are conserved in evolution. Here we use conservation to find complexes that are common to the yeast S. cerevisiae and the bacteria H.
Proc Natl Acad Sci U S A
February 2005
To elucidate cellular machinery on a global scale, we performed a multiple comparison of the recently available protein-protein interaction networks of Caenorhabditis elegans, Drosophila melanogaster, and Saccharomyces cerevisiae. This comparison integrated protein interaction and sequence information to reveal 71 network regions that were conserved across all three species and many exclusive to the metazoans. We used this conservation, and found statistically significant support for 4,645 previously undescribed protein functions and 2,609 previously undescribed protein interactions.
J Bioinform Comput Biol
April 2003
Each person's genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person's genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome separately is called a haplotype.
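The ambiguity described here can be made concrete with a small sketch that enumerates every pair of haplotypes consistent with an unphased genotype. The function name and the toy genotype are assumptions for illustration only; this is not a method from the paper.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    `genotype` holds one unordered pair of bases per site,
    e.g. [("A", "A"), ("C", "T")].
    """
    options = []
    for a, b in genotype:
        if a == b:
            # Homozygous site: only one possible assignment.
            options.append([(a, b)])
        else:
            # Heterozygous site: either base may lie on either chromosome.
            options.append([(a, b), (b, a)])
    pairs = set()
    for choice in product(*options):
        h1 = "".join(site[0] for site in choice)
        h2 = "".join(site[1] for site in choice)
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


# Toy genotype (invented): one homozygous site and two heterozygous sites.
for h1, h2 in compatible_haplotype_pairs([("A", "A"), ("C", "T"), ("G", "T")]):
    print(h1, h2)
# prints:
# ACG ATT
# ACT ATG
```

A genotype with k heterozygous sites (k at least 1) is consistent with 2^(k-1) unordered haplotype pairs, which is why resolving haplotypes from genotype data alone is a combinatorial inference problem.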
We study a design and optimization problem that occurs, for example, when single nucleotide polymorphisms (SNPs) are to be genotyped using a universal DNA tag array. The problem of optimizing the universal array to avoid disruptive cross-hybridization between universal components of the system was addressed in previous work. Cross-hybridization can, however, also occur assay specifically, due to unwanted complementarity involving assay-specific components.
The complexity of the global organization and internal structure of motifs in higher eukaryotic organisms raises significant challenges for motif detection techniques. To achieve successful de novo motif detection, it is necessary to model the complex dependencies within and among motifs and to incorporate biological prior knowledge. In this paper, we present LOGOS, an integrated LOcal and GlObal motif Sequence model for biopolymer sequences, which provides a principled framework for developing, modularizing, extending and computing expressive motif models for complex biopolymer sequence analysis.