Background: Currently, most genome annotation is curated by centralized groups with limited resources. Efforts to share annotations transparently among multiple groups have not yet been satisfactory.
Results: Here we introduce a concept called the Distributed Annotation System (DAS).
Bioinformatics
September 2001
Motivation: When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication), because duplication enables functional diversification. The utility of phylogenetic information in high-throughput genome annotation ('phylogenomics') is widely recognized, but existing approaches are either manual or indirect (e.g.
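The ortholog/paralog distinction above can be made concrete on a gene tree. This is a minimal Python sketch. It assumes the tree's internal nodes have already been labeled as speciation ("S") or duplication ("D") events, for example by reconciling the gene tree with a species tree, which is not shown. The `Node` class, the `relationship` function, and the gene names are hypothetical. Two genes are classified by the event at their last common ancestor.

```python
class Node:
    """A gene-tree node (hypothetical minimal structure)."""

    def __init__(self, name=None, event=None, children=()):
        self.name = name            # gene name; set on leaves only
        self.event = event          # "S" (speciation) or "D" (duplication); internal nodes only
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def lineage(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def relationship(a, b):
    """Classify two distinct leaves by the event at their last common ancestor."""
    if a is b:
        raise ValueError("need two distinct genes")
    ancestors_of_a = {id(n) for n in lineage(a)}
    lca = next(n for n in lineage(b) if id(n) in ancestors_of_a)
    return "orthologs" if lca.event == "S" else "paralogs"


# Hypothetical tree: one duplication (copies A and B), then a human/mouse speciation in each copy.
human_a, mouse_a = Node("human_A"), Node("mouse_A")
human_b, mouse_b = Node("human_B"), Node("mouse_B")
root = Node(event="D", children=[
    Node(event="S", children=[human_a, mouse_a]),
    Node(event="S", children=[human_b, mouse_b]),
])

print(relationship(human_a, mouse_a))  # orthologs
print(relationship(human_a, mouse_b))  # paralogs
```

Here human_A and mouse_A split at a speciation node, so they are orthologs. human_A and mouse_B come from different species, but they first meet at the duplication node, so they are paralogs.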
Some genes produce noncoding transcripts that function directly as structural, regulatory, or even catalytic RNAs [1, 2]. Unlike protein-coding genes, which can be detected as open reading frames with distinctive statistical biases, noncoding RNA (ncRNA) gene sequences have no obvious inherent statistical biases [3]. Thus, genome sequence analyses reveal novel protein-coding genes, but any novel ncRNA genes remain invisible.
Gene expression in a developmentally arrested, long-lived dauer population of Caenorhabditis elegans was compared with a nondauer (mixed-stage) population by using serial analysis of gene expression (SAGE). Dauer (152,314) and nondauer (148,324) SAGE tags identified 11,130 of the predicted 19,100 C. elegans genes.
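The two libraries differ in size, so raw tag counts cannot be compared between them directly. This minimal sketch shows one common normalization, which scales counts to tags per 100,000. Only the two library totals come from the abstract. The function names, the pseudocount, and the example tag counts are hypothetical.

```python
DAUER_TOTAL = 152_314     # total dauer SAGE tags (from the abstract)
NONDAUER_TOTAL = 148_324  # total nondauer SAGE tags (from the abstract)


def per_100k(count: int, library_total: int) -> float:
    """Scale a raw tag count to tags per 100,000 sequenced tags."""
    return count * 100_000 / library_total


def dauer_fold_change(dauer_count: int, nondauer_count: int,
                      pseudocount: int = 1) -> float:
    """Ratio of normalized dauer to nondauer abundance for one tag.

    The pseudocount (an assumption) keeps the ratio finite when a tag
    is absent from one library.
    """
    d = per_100k(dauer_count + pseudocount, DAUER_TOTAL)
    n = per_100k(nondauer_count + pseudocount, NONDAUER_TOTAL)
    return d / n


# Hypothetical tag seen 30 times in the dauer library and 10 times in the nondauer library.
print(round(per_100k(30, DAUER_TOTAL), 2))
print(round(dauer_fold_change(30, 10), 2))
```

Because the dauer library is slightly larger, a tag with equal raw counts in both libraries ends up with a normalized ratio slightly below 1.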
High-density microarrays are useful tools to study gene expression for the purpose of characterizing functional tissue changes in response to the action of drugs and chemicals. To test whether high-density expression data can identify mechanisms of toxicity and to identify an unknown sample through its RNA expression pattern, groups of male Wistar rats were administered 6 hepatotoxicants. The compounds chosen for this study were microcystin-LR (MLR), phenobarbital (PB), lipopolysaccharide (LPS), carbon tetrachloride (CT), thioacetamide (THA), and cyproterone acetate (CPA).
A Tree Viewer (ATV) is a Java tool for the display and manipulation of annotated phylogenetic trees. It can be utilized both as a standalone application and as an applet in a web browser.
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
Motivation: Several results in the literature suggest that biologically interesting RNAs have secondary structures that are more stable than expected by chance. Based on these observations, we developed a scanning algorithm for detecting noncoding RNA genes in genome sequences, using a fully probabilistic version of the Zuker minimum-energy folding algorithm.
Results: Preliminary results were encouraging, but certain anomalies led us to do a carefully controlled investigation of this class of methods.
Motivation: In a previous paper, we presented a polynomial time dynamic programming algorithm for predicting optimal RNA secondary structure including pseudoknots. However, a formal grammatical representation for RNA secondary structure with pseudoknots was still lacking.
Results: Here we show a one-to-one correspondence between that algorithm and a formal transformational grammar.
In eukaryotes, dozens of posttranscriptional modifications are directed to specific nucleotides in ribosomal RNAs (rRNAs) by small nucleolar RNAs (snoRNAs). We identified homologs of snoRNA genes in both branches of the Archaea. Eighteen small sno-like RNAs (sRNAs) were cloned from the archaeon Sulfolobus acidocaldarius by coimmunoprecipitation with archaeal fibrillarin and NOP56, the homologs of eukaryotic snoRNA-associated proteins.
Some genes produce RNAs that are functional instead of encoding proteins. Noncoding RNA genes are surprisingly numerous. Recently, active research areas include small nucleolar RNAs, antisense riboregulator RNAs, and RNAs involved in X-dosage compensation.
Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the WWW in the UK at http://www.sanger.
Small nucleolar RNAs (snoRNAs) are required for ribose 2'-O-methylation of eukaryotic ribosomal RNA. Many of the genes for this snoRNA family have remained unidentified in Saccharomyces cerevisiae, despite the availability of a complete genome sequence. Probabilistic modeling methods akin to those used in speech recognition and computational linguistics were used to computationally screen the yeast genome and identify 22 methylation guide snoRNAs, snR50 to snR71.
We report on a male Egyptian patient who developed myasthenia gravis with typical symptoms, beneficial response to pyridostigmine, and the presence of anti-acetylcholine receptor antibodies and anti-striated muscle antibodies during the course of a chronic hepatitis C infection complicated by liver cirrhosis. As also reported for herpes simplex virus and HIV, hepatitis C may lead to myasthenia gravis via a mechanism of cross-reactivity between viral epitopes and the acetylcholine receptor.
J Mol Biol
February 1999
We describe a dynamic programming algorithm for predicting optimal RNA secondary structure, including pseudoknots. The algorithm has a worst case complexity of O(N^6) in time and O(N^4) in storage. The description of the algorithm is complex, which led us to adopt a useful graphical representation (Feynman diagrams) borrowed from quantum field theory.
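A full implementation of the pseudoknot algorithm described above is too long for a short example. For comparison only, the sketch below implements the classic Nussinov base-pair maximization recursion. It is a much simpler dynamic program and, unlike the algorithm above, cannot represent pseudoknots. The pairing rules (Watson-Crick plus G-U) and the minimum hairpin loop of 3 are assumptions of this sketch, and the function name is hypothetical.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested (pseudoknot-free) base pairs in `seq`.

    best[i][j] holds the optimum for the subsequence seq[i..j] (inclusive).
    A pair (i, k) must enclose at least `min_loop` unpaired bases.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    # Fill by increasing span so every subproblem is ready before it is used.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: base i is unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    inside = best[i + 1][k - 1]
                    outside = best[k + 1][j] if k < j else 0
                    score = max(score, inside + outside + 1)
            best[i][j] = score
    return best[0][n - 1]


print(nussinov_max_pairs("GGGAAAUCC"))  # 3
```

The Nussinov recursion runs in O(N^3) time and O(N^2) memory. For a 100-nt sequence that is on the order of 10^6 steps, compared with about 10^12 for an O(N^6) algorithm. The gap shows what it costs to allow pseudoknots.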
The recent literature on profile hidden Markov model (profile HMM) methods and software is reviewed. Profile HMMs turn a multiple sequence alignment into a position-specific scoring system suitable for searching databases for remotely homologous sequences. Profile HMM analyses complement standard pairwise comparison methods for large-scale sequence analysis.
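The idea of turning an alignment into a position-specific scoring system can be illustrated with a much simpler model than a profile HMM. This minimal sketch keeps only the position-specific part, which is roughly analogous to match-state emission scores. It has no insert or delete states, so it cannot score gapped matches. The uniform background, the add-one pseudocounts, the toy alignment, and the function names are all assumptions. Real profile HMM tools use richer priors and background models.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Turn a multiple alignment into per-column log-odds scores (bits).

    Gap characters ('-') are ignored when counting; the background is uniform.
    """
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(s[col] for s in alignment if s[col] != "-")
        total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_sequence(profile, seq):
    """Sum the position-specific scores for a sequence of the profile's length."""
    if len(seq) != len(profile):
        raise ValueError("sequence length must match profile length")
    return sum(column[aa] for column, aa in zip(profile, seq))


alignment = ["HEAGAW", "HEAGAW", "HDAGAW", "HEVGAW"]  # toy alignment
profile = build_profile(alignment)
print(round(score_sequence(profile, "HEAGAW"), 2), round(score_sequence(profile, "PPPPPP"), 2))
```

A sequence resembling the alignment gets a positive total log-odds score. An unrelated one scores below zero.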
Pfam is a collection of multiple alignments and profile hidden Markov models of protein domain families. Release 3.1 is a major update of the Pfam database and contains 1313 families which are available on the World Wide Web in Europe at http://www.
Nucleic Acids Res
January 1998
Pfam contains multiple alignments and hidden Markov model based profiles (HMM-profiles) of complete protein domains. The definition of domain boundaries, family members and alignment is done semi-automatically based on expert knowledge, sequence similarity, other protein family databases and the ability of HMM-profiles to correctly identify and align the members. Release 2.
Databases of multiple sequence alignments are a valuable aid to protein sequence classification and analysis. One of the main challenges when constructing such a database is to simultaneously satisfy the conflicting demands of completeness on the one hand and quality of alignment and domain definitions on the other. The latter properties are best dealt with by manual approaches, whereas completeness in practice is only amenable to automatic methods.
Nucleic Acids Res
March 1997
We describe a program, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases. Two previously described tRNA detection programs are used as fast, first-pass prefilters to identify candidate tRNAs, which are then analyzed by a highly selective tRNA covariance model. This work represents a practical application of RNA covariance models, which are general, probabilistic secondary structure profiles based on stochastic context-free grammars.
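The prefilter-then-model design described above is a cascade. Cheap tests discard most sequence, and only the survivors reach the expensive, highly selective model. This minimal Python sketch shows the pattern. The toy prefilter (a fixed substring check) and the toy scorer (a G+C count) are hypothetical stand-ins, not the actual prefilters or covariance model. Treating a candidate as surviving when any prefilter accepts it is also an assumption here.

```python
from typing import Callable, Iterable, List


def cascade_search(candidates: Iterable[str],
                   prefilters: List[Callable[[str], bool]],
                   expensive_score: Callable[[str], float],
                   threshold: float) -> List[str]:
    """Run cheap prefilters first; score only the survivors with the expensive model.

    A candidate survives if ANY prefilter accepts it (a union of prefilter hits).
    """
    hits = []
    for seq in candidates:
        if not any(accepts(seq) for accepts in prefilters):
            continue  # rejected cheaply; the expensive model never sees it
        if expensive_score(seq) >= threshold:
            hits.append(seq)
    return hits


# Hypothetical toy components standing in for the real prefilters and covariance model.
scored = []  # records which sequences reached the expensive stage


def toy_prefilter(seq: str) -> bool:
    return "TTCGA" in seq


def toy_model_score(seq: str) -> float:
    scored.append(seq)
    return float(seq.count("G") + seq.count("C"))


hits = cascade_search(["AATTCGAGGCC", "AAAAAAA", "TTCGAAT"],
                      [toy_prefilter], toy_model_score, threshold=4.0)
print(hits, len(scored))  # ['AATTCGAGGCC'] 2
```

The expensive model's cost scales with the number of prefilter survivors rather than with the total amount of sequence searched. The trade-off is that overall sensitivity can be no better than that of the prefilters combined.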
We report a prediction that the highly immunogenic outer capsid (Hoc) protein of the prokaryotic phage T4 contains three tandem immunoglobulin-like domains. Immunoglobulin-like folds have previously been identified in prokaryotic proteins but these share no recognizable sequence similarity with eukaryotic immunoglobulin superfamily (IgSF) folds, and may represent products of convergent evolution. In contrast, the Hoc immunoglobulin-like folds are proposed, based on immunoglobulin-like sequence consensus matches detected by hidden Markov modeling.
This study was conducted to provide data on the pharmacokinetics of [14C]metosulam (N-[2,6-dichloro-3-methylphenyl]-5,7-dimethoxy-1,2,4-triazolo[1,5-a]pyrimidine-2-sulfonamide). Groups of male Sprague-Dawley rats, CD-1 mice and Beagle dogs were given a single oral gavage dose of 100 mg [14C]metosulam/kg body weight, and blood, urine, feces and selected tissue specimens were collected up to 168 h post-dosing for rats and mice and up to 216 h post-dosing for dogs. Two of these dogs received a second oral dose of 100 mg/kg and were humanely euthanized 12 h post-dosing, and selected tissues were collected.
We report a prediction that two prokaryotic proteins contain immunoglobulin superfamily domains. Immunoglobulin-like folds have been identified previously in prokaryotic proteins, but these share no recognizable sequence similarity with eukaryotic immunoglobulin superfamily (IgSF) folds, and may be the result of the physics and chemistry of proteins favoring certain common folds. In contrast, the prokaryotic proteins identified have sequences whose match to the immunoglobulin superfamily can be detected by hidden Markov modeling, BLASTP matches, key residue analysis, and secondary structure predictions.
Pacing Clin Electrophysiol
September 1996
RF catheter ablation of accessory bypass tracts associated with the Wolff-Parkinson-White syndrome has become an accepted and widespread therapy. When bypass tracts are located in the free wall of the left ventricle, a single catheter technique may be utilized. A single catheter is placed via the femoral artery, across the aortic valve into the left ventricle.
'Profiles' of protein structures and sequence alignments can detect subtle homologies. Profile analysis has been put on firmer mathematical ground by the introduction of hidden Markov model (HMM) methods. During the past year, applications of these powerful new HMM-based profiles have begun to appear in the fields of protein-structure prediction and large-scale genome-sequence analysis.
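Part of the mathematical grounding that HMMs bring is that a sequence's probability under the model is a sum over all possible hidden state paths. The forward algorithm computes this sum by dynamic programming. Below is a minimal Python sketch using a hypothetical two-state model (GC-rich versus AT-rich), not the match/insert/delete architecture of a profile HMM. The model parameters and the function name are illustrative assumptions.

```python
import math


def forward_log_prob(obs, states, start_p, trans_p, emit_p):
    """Natural-log probability of `obs`, summed over all hidden state paths.

    Uses the forward algorithm with per-position rescaling so that long
    sequences do not underflow; the log of the product of scale factors
    equals log P(obs).
    """
    log_prob = 0.0
    alpha = {}
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = {s: start_p[s] * emit_p[s][symbol] for s in states}
        else:
            alpha = {
                s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
                for s in states
            }
        norm = sum(alpha.values())
        log_prob += math.log(norm)
        alpha = {s: a / norm for s, a in alpha.items()}
    return log_prob


# Hypothetical two-state model: GC-rich ("H") and AT-rich ("L") regions.
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
EMIT = {
    "H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

print(math.exp(forward_log_prob("GA", STATES, START, TRANS, EMIT)))
```

For "GA" the four state paths contribute 0.0445 in total, which is what the forward pass returns (up to floating-point rounding). Rescaling at each position is one standard way to keep the computation stable on long sequences. Working in log space is another.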