Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers.
We describe a novel algorithm for deriving the minimal set of nonredundant transcripts compatible with the splicing structure of a set of ESTs mapped on a genome. Sets of ESTs with compatible splicing are represented by a special type of graph. We describe the algorithms for building the graphs and for deriving the minimal set of transcripts from the graphs that are compatible with the evidence.
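As a rough illustration of what "nonredundant" can mean for spliced EST alignments, the sketch below drops any EST whose splicing structure is already fully contained in another EST. It is not the graph-based algorithm the abstract refers to, which is not detailed here; the containment rule, the coordinate convention, and the names `introns`, `is_contained`, and `nonredundant` are all assumptions made for this example.

```python
from typing import List, Tuple

Exon = Tuple[int, int]      # (start, end), inclusive genomic coordinates (assumed convention)
Alignment = List[Exon]      # exons of one mapped EST, sorted by start


def introns(est: Alignment) -> List[Tuple[int, int]]:
    """Return the introns implied by consecutive exons of one EST."""
    return [(a[1] + 1, b[0] - 1) for a, b in zip(est, est[1:])]


def is_contained(a: Alignment, b: Alignment) -> bool:
    """True if EST `a` adds no splicing information beyond EST `b` (assumed rule)."""
    ia, ib = introns(a), introns(b)
    if not ia:
        # Unspliced EST: it must lie entirely inside a single exon of b.
        return any(s <= a[0][0] and a[0][1] <= e for s, e in b)
    for offset in range(len(ib) - len(ia) + 1):
        if ib[offset:offset + len(ia)] == ia:
            # Intron chain matches; a's outer ends must not extend past b's exons.
            first_ok = a[0][0] >= b[offset][0]
            last_ok = a[-1][1] <= b[offset + len(ia)][1]
            if first_ok and last_ok:
                return True
    return False


def nonredundant(ests: List[Alignment]) -> List[Alignment]:
    """Keep only ESTs that are not contained in any other EST."""
    kept = []
    for i, a in enumerate(ests):
        dominated = False
        for j, b in enumerate(ests):
            if i == j or not is_contained(a, b):
                continue
            # Identical alignments contain each other: keep the earlier one.
            if is_contained(b, a) and i < j:
                continue
            dominated = True
            break
        if not dominated:
            kept.append(a)
    return kept


ests = [
    [(100, 200), (300, 400), (500, 600)],  # three-exon EST
    [(150, 200), (300, 350)],              # shares the first intron, shorter ends
    [(100, 200), (500, 600)],              # skips the middle exon: distinct splicing
    [(320, 380)],                          # unspliced, inside the middle exon
]
print(nonredundant(ests))
```

With this toy input, only the three-exon EST and the exon-skipping EST survive. The sketch only filters contained alignments; it does not merge partially overlapping, splice-compatible ESTs into longer transcripts, which is the part a graph representation would be needed for.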
As more genomes are sequenced, there is an increasing need for automated first-pass annotation which allows timely access to important genomic information. The Ensembl gene-building system enables fast automated annotation of eukaryotic genomes. It annotates genes based on evidence derived from known protein, cDNA, and EST sequences.
The Ensembl pipeline is an extension to the Ensembl system which allows automated annotation of genomic sequence. The software comprises two parts. First, there is a set of Perl modules ("Runnables" and "RunnableDBs") which are 'wrappers' for a variety of commonly used analysis tools.
Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes.
The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences.
Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs.
Linkage analysis in multiplex families has provisionally identified several genomic regions where genes influencing susceptibility to multiple sclerosis are likely to be located. It is anticipated that association mapping will provide a higher degree of resolution, but this more powerful approach is limited by the substantial genotyping effort required. Here, we describe the first use of DNA pooling to screen the whole genome for association in multiple sclerosis based on a 0.