Publications by authors named "Carol A Rohl"

We developed PolyA-seq, a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts, and used it to globally map polyadenylation (polyA) sites in 24 matched tissues in human, rhesus, dog, mouse, and rat. We show that PolyA-seq is as accurate as existing RNA sequencing (RNA-seq) approaches for digital gene expression (DGE), enabling simultaneous mapping of polyA sites and quantitative measurement of their usage. In human, we confirmed 158,533 known sites and discovered 280,857 novel sites (FDR < 2.
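
One way to picture "mapping polyA sites and measuring their usage" from 3'-end reads is the minimal Python sketch below. It is not the PolyA-seq pipeline. It assumes reads have already been aligned and reduced to strand-aware 3'-end coordinates, and it uses a simple clustering rule chosen for illustration: 3'-end positions on the same chromosome and strand that lie within `window` nt of each other are merged, the most-used position is reported as the site, and the read total is reported as its usage. The function names and the 20-nt default are hypothetical.

```python
from collections import Counter, defaultdict


def _summarize(chrom, strand, cluster, pos_counts):
    # Site = most frequently used 3'-end position; usage = total reads in the cluster.
    total = sum(pos_counts[p] for p in cluster)
    peak = max(cluster, key=lambda p: pos_counts[p])
    return (chrom, strand, peak, total)


def call_polya_sites(read_ends, window=20):
    """Cluster read 3' ends into candidate polyA sites (hypothetical helper).

    read_ends: iterable of (chrom, strand, pos) tuples, one per aligned read,
    where pos is the read's 3'-end coordinate on the given strand.
    Returns a list of (chrom, strand, site_position, read_count).
    """
    counts = defaultdict(Counter)
    for chrom, strand, pos in read_ends:
        counts[(chrom, strand)][pos] += 1  # strands are kept separate

    sites = []
    for (chrom, strand), pos_counts in counts.items():
        cluster = []
        for pos in sorted(pos_counts):
            # Start a new cluster when the gap to the previous end exceeds the window.
            if cluster and pos - cluster[-1] > window:
                sites.append(_summarize(chrom, strand, cluster, pos_counts))
                cluster = []
            cluster.append(pos)
        if cluster:
            sites.append(_summarize(chrom, strand, cluster, pos_counts))
    return sites


reads = ([("chr1", "+", 100)] * 3 + [("chr1", "+", 105)] * 5
         + [("chr1", "+", 300)] * 2 + [("chr1", "-", 100)] * 4)
print(call_polya_sites(reads))
# [('chr1', '+', 105, 8), ('chr1', '+', 300, 2), ('chr1', '-', 100, 4)]
```

Because the counts are keyed by strand, the reads ending at position 100 on the minus strand form their own site instead of merging with the plus-strand cluster at the same coordinate.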

Background: Identifying associations between genotypes and gene expression levels using microarrays has enabled systematic interrogation of regulatory variation underlying complex phenotypes. This approach has vast potential for functional characterization of disease states, but its widespread application has been limited by its prohibitive cost, given that hundreds to thousands of individual samples from a population must be genotyped and expression profiled.

Results: Here we demonstrate that genomic regions with allele-specific expression (ASE) detected by sequencing cDNA are highly enriched for cis-acting expression quantitative trait loci (cis-eQTL) identified by profiling 500 animals in parallel, with up to 90% agreement on the allele that is preferentially expressed.

Non-coding RNAs (ncRNAs) are an essential class of molecular species that have been difficult to monitor on high-throughput platforms because they frequently lack polyadenylation. Using a polyadenylation-neutral amplification protocol and next-generation sequencing, we explore ncRNA expression in eleven human tissues. The ncRNAs 7SL, U2, 7SK, and HBII-52 are expressed at levels far exceeding those of mRNAs.

Background: DNA copy number variations occur within populations, and copy number aberrations can cause disease. We sought to develop an improved platform for profiling DNA copy number that is lab-automatable, cost-efficient, and accurate.

Results: We developed a sequencing-based assay of nuclear, mitochondrial, and telomeric DNA copy number that draws on the unbiased nature of next-generation sequencing and incorporates techniques developed for RNA expression profiling.
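
A common way to turn sequencing read counts into copy-number estimates is a read-depth ratio against a reference sample. The abstract does not spell out the assay's computation, so the Python sketch below is only a generic illustration under stated assumptions: per-region read counts are already available for a test and a reference sample, the reference carries `reference_ploidy` copies of every region, and most regions in the test sample share that copy number, so the median test/reference ratio is used for normalization. The function name, region labels, and counts are hypothetical.

```python
from statistics import median


def copy_number_by_depth(test_counts, reference_counts, reference_ploidy=2):
    """Estimate per-region copy number from read counts (hypothetical helper).

    Assumes the reference has `reference_ploidy` copies of every region and
    that most regions in the test sample share that copy number, so the median
    test/reference ratio serves as the normalization factor.
    """
    # Skip regions with no reference reads: their ratio is undefined.
    ratios = {
        region: test_counts.get(region, 0) / ref
        for region, ref in reference_counts.items()
        if ref > 0
    }
    scale = median(ratios.values())
    if scale == 0:
        raise ValueError("median depth ratio is zero; cannot normalize")
    return {region: reference_ploidy * r / scale for region, r in ratios.items()}


test = {"locus_a": 100, "locus_b": 200, "locus_c": 100}
reference = {"locus_a": 100, "locus_b": 100, "locus_c": 100}
print(copy_number_by_depth(test, reference))
# {'locus_a': 2.0, 'locus_b': 4.0, 'locus_c': 2.0}
```

Median normalization keeps a single amplified region (here `locus_b`) from skewing the estimates for the other regions. Normalizing by total read count instead would report 1.5 copies for `locus_a` and `locus_c`.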

We developed a procedure for the preparation of whole transcriptome cDNA libraries depleted of ribosomal RNA from only 1 µg of total RNA. The method relies on a collection of short, computationally selected oligonucleotides, called 'not-so-random' (NSR) primers, to obtain full-length, strand-specific representation of nonribosomal RNA transcripts. In this study we validated the technique by profiling human whole brain and universal human reference RNA using ultra-high-throughput sequencing.
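
The idea of computationally selecting primers that avoid ribosomal RNA can be sketched in a few lines of Python. The abstract does not give the selection criteria, so this sketch rests on assumptions: primers are hexamers, and a primer is excluded only if its reverse complement (the RNA segment it would anneal to) occurs as an exact match in an rRNA sequence. The function names are hypothetical, and the input is a toy fragment, not a real rRNA sequence.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmer_set(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def select_rrna_avoiding_primers(rrna_seqs, k=6):
    """Return all k-mer primers with no perfect binding site in the given rRNAs.

    Hypothetical selection rule: a primer is dropped if its reverse complement
    (the RNA segment it would anneal to) occurs exactly in any rRNA sequence.
    """
    rrna_sites = set()
    for seq in rrna_seqs:
        dna = seq.upper().replace("U", "T")  # write RNA in the DNA alphabet
        rrna_sites |= kmer_set(dna, k)
    return [
        primer
        for primer in ("".join(p) for p in product("ACGT", repeat=k))
        if reverse_complement(primer) not in rrna_sites
    ]


toy_rrna = "GGCUAGCUAGGCUUAACG"  # toy fragment, not a real rRNA sequence
primers = select_rrna_avoiding_primers([toy_rrna])
print(len(primers))
# 4083
```

Each hexamer maps to exactly one reverse complement, so the number of excluded primers equals the number of distinct hexamers in the rRNA input (13 for this toy fragment). Exact matching is the simplest possible criterion. This sketch ignores near-matches, which a practical design would likely also need to consider.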

Background: Housekeeping genes (HKG) are constitutively expressed in all tissues while tissue-enriched genes (TEG) are expressed at a much higher level in a single tissue type than in others. HKGs serve as valuable experimental controls in gene and protein expression experiments, while TEGs tend to represent distinct physiological processes and are frequently candidates for biomarkers or drug targets. The genomic features of these two groups of genes expressed in opposing patterns may shed light on the mechanisms by which cells maintain basic and tissue-specific functions.

Motivation: Our focus has been on detecting topological properties that are rare in real proteins, but occur more frequently in models generated by protein structure prediction methods such as Rosetta. We previously created the Knotfind algorithm, successfully decreasing the frequency of knotted Rosetta models during CASP6. We observed an additional class of knot-like loops that appeared to be equally un-protein-like and yet do not contain a mathematical knot.

Genetic variants that are associated with common human diseases do not lead directly to disease, but instead act on intermediate, molecular phenotypes that in turn induce changes in higher-order disease traits. Therefore, identifying the molecular phenotypes that vary in response to changes in DNA and that also associate with changes in disease traits has the potential to provide the functional information required to not only identify and validate the susceptibility genes that are directly affected by changes in DNA, but also to understand the molecular networks in which such genes operate and how changes in these networks lead to changes in disease traits. Toward that end, we profiled more than 39,000 transcripts and we genotyped 782,476 unique single nucleotide polymorphisms (SNPs) in more than 400 human liver samples to characterize the genetic architecture of gene expression in the human liver, a metabolically active tissue that is important in a number of common human diseases, including obesity, diabetes, and atherosclerosis.

Motivation: Knots in polypeptide chains have been found in very few proteins, and consequently should be generally avoided in protein structure prediction methods. Most effective structure prediction methods do not model the protein folding process itself, but rather seek only to correctly obtain the final native state. Consequently, the mechanisms that prevent knots from occurring in native proteins are not relevant to the modeling process, and as a result, knots can occur with significantly higher frequency in protein models.

We have developed a method that combines the ROSETTA de novo protein folding and refinement protocol with distance constraints derived from homologous structures to build homology models that are frequently more accurate than their templates. We test this method by building complete-chain models for a benchmark set of 22 proteins, each with 1 or 2 candidate templates, for a total of 39 test cases. We use structure-based and sequence-based alignments for each of the test cases.

The Robetta server and revised automatic protocols were used to predict structures for CASP6 targets. Robetta is a publicly available protein structure prediction server (http://robetta.bakerlab.

RosettaNMR combines the Rosetta de novo structure prediction method with limited NMR experimental data for rapid estimation of protein structure. The de novo Rosetta algorithm predicts protein three-dimensional structures using only sequence information by combining short fragments selected from known protein structures on the basis of local sequence similarity. These fragments are assembled using a Monte Carlo strategy to generate models that reproduce empirical statistics describing nonlocal protein structure such as overall compactness, hydrophobic burial, and beta-strand pairing.
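
The assembly loop described above (insert a short fragment, rescore, then accept or reject the move by Monte Carlo) can be illustrated with a toy Python sketch. This is not Rosetta code and omits everything specific to it. A single torsion per residue stands in for the backbone torsions, and a made-up squared-deviation score stands in for the empirical terms (compactness, hydrophobic burial, beta-strand pairing). No NMR restraints are included. `fragment_assembly`, `toy_score`, the target, and the fragment library are all hypothetical.

```python
import math
import random


def fragment_assembly(n_res, fragments, score, n_steps=2000, temperature=1.0, seed=0):
    """Toy Monte Carlo fragment insertion with a Metropolis acceptance test.

    Hypothetical simplification: each residue carries a single torsion angle,
    `fragments` maps a window start to candidate torsion tuples (each window
    must fit inside the chain), and `score` is any function to minimize.
    """
    rng = random.Random(seed)
    conf = [0.0] * n_res
    current = score(conf)
    best, best_score = list(conf), current
    starts = list(fragments)
    for _ in range(n_steps):
        start = rng.choice(starts)
        frag = rng.choice(fragments[start])
        # Replace the torsions in the window with those of the chosen fragment.
        trial = conf[:start] + list(frag) + conf[start + len(frag):]
        trial_score = score(trial)
        delta = trial_score - current
        # Metropolis criterion: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            conf, current = trial, trial_score
            if current < best_score:
                best, best_score = list(conf), current
    return best, best_score


target = [-60.0] * 9  # hypothetical "native" torsions
library = [(-60.0, -45.0, -60.0), (-120.0, 130.0, -120.0), (60.0, 40.0, 60.0)]
fragments = {start: library for start in range(0, 7)}  # 3-residue windows


def toy_score(conf):
    return sum((a - t) ** 2 for a, t in zip(conf, target))


model, model_score = fragment_assembly(9, fragments, toy_score)
print(model_score)
```

Because Metropolis sampling also accepts uphill moves, the sketch keeps the lowest-scoring conformation it has seen rather than returning only the final one.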

A major limitation of current comparative modeling methods is the accuracy with which regions that are structurally divergent from homologues of known structure can be modeled. Because structural differences between homologous proteins are responsible for variations in protein function and specificity, the ability to model these differences has important functional consequences. Although existing methods can provide reasonably accurate models of short loop regions, modeling longer structurally divergent regions is an unsolved problem.

Robetta is a fully automated protein structure prediction server that uses the Rosetta fragment-insertion method. It combines template-based and de novo structure prediction methods in an attempt to produce high quality models that cover every residue of a submitted sequence. The first step in the procedure is the automatic detection of the locations of domains and selection of the appropriate modeling protocol for each domain.

We describe predictions of the structures of CASP5 targets using Rosetta. The Rosetta fragment insertion protocol was used to generate models for entire target domains without detectable sequence similarity to a protein of known structure and to build long loop insertions (and N- and C-terminal extensions) in cases where a structural template was available. Encouraging results were obtained both for the de novo predictions and for the long loop insertions; we describe here the successes as well as the failures in the context of current efforts to improve the Rosetta method.

We have improved the original Rosetta centroid/backbone decoy set by increasing the number of proteins and the frequency of near-native models, by building side chains onto the models, and by minimizing clashes. The new set consists of 1,400 model structures for 78 different and diverse protein targets and provides a challenging set for the testing and evaluation of scoring functions. We evaluated the extent to which a variety of all-atom energy functions could identify the native and close-to-native structures in the new decoy sets.

Protein-protein docking algorithms provide a means to elucidate structural details for presently unknown complexes. Here, we present and evaluate a new method to predict protein-protein complexes from the coordinates of the unbound monomer components. The method employs a low-resolution, rigid-body, Monte Carlo search followed by simultaneous optimization of backbone displacement and side-chain conformations using Monte Carlo minimization.

Very short alanine peptide helices can be studied in a fixed-nucleus, helix-forming system [Siedlicka, M., Goch, G., Ejchart, A.

We use the Rosetta de novo structure prediction method to produce three-dimensional structure models for all Pfam-A sequence families with average length under 150 residues and no link to any protein of known structure. To estimate the reliability of the predictions, the method was calibrated on 131 proteins of known structure. For approximately 60% of the proteins one of the top five models was correctly predicted for 50 or more residues, and for approximately 35%, the correct SCOP superfamily was identified in a structure-based search of the Protein Data Bank using one of the models.

As genome-sequencing projects rapidly increase the database of protein sequences, the gap between known sequences and known structures continues to grow exponentially, increasing the demand to accelerate structure determination methods. Residual dipolar couplings (RDCs) are an attractive source of experimental restraints for NMR structure determination, particularly for rapid, high-throughput methods, because they yield both local and long-range orientational information and can be easily measured and assigned once the backbone resonances of a protein have been assigned. While very extensive RDC data sets have been used to determine the structure of ubiquitin, it is unclear to what extent such methods will generalize to larger proteins with less complete data sets.

New methods for determining chemical structures from residual dipolar couplings are presented. The fundamental dipolar coupling equation is converted to an elliptical equation in the principal alignment frame. This elliptical equation is then combined with other angular or dipolar coupling constraints to form simple polynomial equations that define discrete solutions for the unit vector(s).
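
The conversion to an elliptical equation can be worked through using the commonly used form of the dipolar coupling in the principal alignment frame. This form is an assumption, and the notation may differ from the original paper's. Here D_a is the alignment magnitude, R is the rhombicity, and (x, y, z) is the internuclear unit vector.

```latex
\begin{aligned}
D &= D_a\left[(3\cos^2\theta - 1) + \tfrac{3}{2}R\,\sin^2\theta\cos 2\phi\right] \\
  &= D_a\left[3z^2 - 1 + \tfrac{3}{2}R\,(x^2 - y^2)\right],
     \qquad (x, y, z) = (\sin\theta\cos\phi,\ \sin\theta\sin\phi,\ \cos\theta) \\
\left(3 - \tfrac{3}{2}R\right)x^2 + \left(3 + \tfrac{3}{2}R\right)y^2 &= 2 - \frac{D}{D_a}
     \qquad \text{(substituting } z^2 = 1 - x^2 - y^2\text{)}
\end{aligned}
```

For 0 ≤ R ≤ 2/3, both coefficients on the left are positive. Each measured coupling therefore confines (x, y) to an ellipse, and z = ±sqrt(1 − x² − y²) remains ambiguous in sign. Combining this ellipse with a second angular or dipolar constraint yields the polynomial equations whose discrete roots give the candidate unit vectors.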