Background: Thoroughly annotated data resources are a key requirement in phenotype-dependent analysis and diagnosis of diseases in the area of precision medicine. Recent work has shown that curation and systematic annotation of human phenome data can significantly improve the quality and selectivity for the interpretation of inherited diseases. We have therefore developed PhenoDis, a comprehensive, manually annotated database providing symptomatic, genetic and imprinting information about rare cardiac diseases.
Summary: Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain.
The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM).
CORUM is a database that provides a manually curated repository of experimentally characterized protein complexes from mammalian organisms, mainly human (64%), mouse (16%) and rat (12%). Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The new CORUM 2.
Cross-mapping of gene and protein identifiers between different databases is a tedious and time-consuming task. To overcome this, we developed CRONOS, a cross-reference server that contains entries from five mammalian organisms presented by major gene and protein information resources. Sequence similarity analysis of the mapped entries shows that the cross-references are highly accurate.
The generation of expressed sequence tag (EST) libraries offers an affordable approach to investigate organisms, if no genome sequence is available. OREST (http://mips.gsf.
Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The CORUM (http://mips.gsf.
Data from large-scale genome projects, transcriptomics and proteomics experiments have provided scientists with a wealth of information establishing the basis for the investigation of cellular processes. To understand biological function beyond the single gene by the discovery and characterization of functional protein networks, bioinformatics analysis requires information about two additional attributes associated with the gene products: (i) high-level protein function prediction of experimentally uncharacterized proteins and (ii) systematic classification of protein function. This article describes the basic properties of protein classification systems and discusses examples of their implementation.
MfunGD (http://mips.gsf.de/genre/proj/mfungd/) provides a resource for annotated mouse proteins and their occurrence in protein networks.
Similarity Matrix of Proteins (SIMAP) (http://mips.gsf.de/simap) provides a database based on a pre-computed similarity matrix covering the similarity space formed by >4 million amino acid sequences from public databases and completely sequenced genomes.
In this paper, we present the Functional Catalogue (FunCat), a hierarchically structured, organism-independent, flexible and scalable controlled classification system enabling the functional description of proteins from any organism. FunCat has been applied for the manual annotation of prokaryotes, fungi, plants and animals. We describe how FunCat is implemented as a highly efficient and robust tool for the manual and automatic annotation of genomic sequences.
Interacting proteins from Saccharomyces cerevisiae are evolutionarily conserved and their likelihood of having an ortholog in other ascomycota species correlates with the number of interaction partners. Moreover, interacting proteins show a clear preference to be conserved as a pair, indicating that nature maintains selection pressure on the interaction links between proteins. The conservation of interacting protein pairs between different organisms does not exhibit any bias with respect to protein functional roles.
The German Neurospora Genome Project has assembled sequences from ordered cosmid and BAC clones of linkage groups II and V of the genome of Neurospora crassa in 13 and 12 contigs, respectively. Including additional sequences located on other linkage groups, a total of 12 Mb were subjected to a manual gene extraction and annotation process. The genome comprises a small number of repetitive elements, a low degree of segmental duplications and very few paralogous genes.
After 50 years of analysing Neurospora crassa genes one by one, large-scale sequence analysis has increased the number of accessible genes tremendously in the last few years. Being the only filamentous fungus for which a comprehensive genomic sequence database is publicly accessible, N. crassa serves as the model for this important group of microorganisms.