The finished sequence of human chromosome 10 comprises a total of 131,666,441 base pairs. It represents 99.4% of the euchromatic DNA and includes one megabase of heterochromatic sequence within the pericentromeric region of the short and long arm of the chromosome.
Chromosome 9 is highly structurally polymorphic. It contains the largest autosomal block of heterochromatin, which is heteromorphic in 6-8% of humans, whereas pericentric inversions occur in more than 1% of the population. The finished euchromatic sequence of chromosome 9 comprises 109,044,351 base pairs and represents >99.
Background: It is well known that different species have different protein domain repertoires, and indeed that some protein domains are kingdom specific. This information has not yet been incorporated into statistical methods for finding domains in sequences of amino acids.
Results: We show that by incorporating our understanding of the taxonomic distribution of specific protein domains, we can enhance domain recognition in protein sequences.
We present two algorithms in this paper: GeneWise, which predicts gene structure using similar protein sequences, and Genomewise, which provides a gene structure final parse across cDNA- and EST-defined spliced structure. Both algorithms are heavily used by the Ensembl annotation system. The GeneWise algorithm was developed from a principled combination of hidden Markov models (HMMs).
Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes.
Chromosome 13 is the largest acrocentric human chromosome. It carries genes involved in cancer including the breast cancer type 2 (BRCA2) and retinoblastoma (RB1) genes, is frequently rearranged in B-cell chronic lymphocytic leukaemia, and contains the DAOA locus associated with bipolar disorder and schizophrenia. We describe completion and analysis of 95.
The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome.
One of the primary tasks in deciphering the functional contents of a newly sequenced genome is the identification of its protein coding genes. Existing computational methods for gene prediction include ab initio methods which use the DNA sequence itself as the only source of information, comparative methods using multiple genomic sequences, and similarity based methods which employ the cDNA or protein sequences of related genes to aid the gene prediction. We present here an algorithm implemented in a computer program called Projector which combines comparative and similarity approaches.
The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes.
WormBase (http://www.wormbase.org/) is the central data repository for information about Caenorhabditis elegans and related nematodes.
Pfam is a large collection of protein families and domains. Over the past 2 years the number of families in Pfam has doubled and now stands at 6190 (version 10.0).
The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C.
Ideally, an oncolytic virus will replicate preferentially in malignant cells, have the ability to treat disseminated metastases, and ultimately be cleared by the patient. Here we present evidence that the attenuated vesicular stomatitis strains, AV1 and AV2, embody all of these traits. We uncover the mechanism by which these mutants are selectively attenuated in interferon-responsive cells while remaining highly lytic in 80% of human tumor cell lines tested.
Chromosome 6 is a metacentric chromosome that constitutes about 6% of the human genome. The finished sequence comprises 166,880,988 base pairs, representing the largest chromosome sequenced so far. The entire sequence has been subjected to high-quality manual annotation, resulting in the evidence-supported identification of 1,557 genes and 633 pseudogenes.
Searching a database for a local alignment to a query under a typical scoring scheme, such as PAM120 or BLOSUM62 with affine gap costs, is a computation that has resisted algorithmic improvement due to its basis in dynamic programming and the weak nature of the signals being searched for. In a query preprocessing step, a set of tables can be built that permit one to (a) eliminate a large fraction of the dynamic programming matrix from consideration and (b) compute several steps of the remainder with a single table lookup. While this result is not an asymptotic improvement over the original Smith-Waterman algorithm, its complexity is characterized in terms of some sparse features of the matrix and it yields the fastest software implementation to date for such searches.
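For orientation, the dynamic-programming baseline that this abstract measures itself against can be sketched as follows. This is a minimal sketch of Smith-Waterman local alignment with affine gap costs, not the table-lookup method the abstract describes. The function name and the match/mismatch/gap values are illustrative assumptions; a real search would score residue pairs with a substitution matrix such as PAM120 or BLOSUM62.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score of strings a and b with affine gap costs.

    Assumed convention (illustrative): a gap of length k scores
    gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H[i][j]: best score of a local alignment ending at a[i-1], b[j-1]
    # E[i][j]: best score ending with a gap that consumes b[j-1] only
    # F[i][j]: best score ending with a gap that consumes a[i-1] only
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets a local alignment start fresh at any cell.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_affine("ACGTACGT", "ACGTTACGT"))
```

Every cell of the three matrices is filled, so the cost is proportional to the product of the two sequence lengths; the abstract's contribution is to skip or batch much of that work, which this sketch does not attempt.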
Proc Natl Acad Sci U S A
April 2003
Most modern speech recognition uses probabilistic models to interpret a sequence of sounds. Hidden Markov models, in particular, are used to recognize words. The same techniques have been adapted to find domains in protein sequences of amino acids.
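As a rough illustration of how a hidden Markov model labels a sequence, the sketch below runs Viterbi decoding over a toy two-state model. The state names, the two-letter alphabet (H for hydrophobic, P for polar) and all probabilities are hypothetical choices made for this example; they are not taken from the paper, and real domain models are far larger profile HMMs.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path for obs under a discrete HMM.

    Works in log space to avoid underflow. Assumes every probability
    used is strictly positive (log(0) is not handled in this sketch).
    """
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for landing in s at this position.
            prev = max(states, key=lambda r: scores[r] + math.log(trans_p[r][s]))
            new_scores[s] = (scores[prev] + math.log(trans_p[prev][s])
                             + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[last]


# Hypothetical toy model: a "dom" state that favours H and a "bg" state that favours P.
STATES = ("bg", "dom")
START = {"bg": 0.5, "dom": 0.5}
TRANS = {"bg": {"bg": 0.9, "dom": 0.1}, "dom": {"bg": 0.1, "dom": 0.9}}
EMIT = {"bg": {"H": 0.1, "P": 0.9}, "dom": {"H": 0.9, "P": 0.1}}

if __name__ == "__main__":
    path, log_prob = viterbi("PPPHHHHPPP", STATES, START, TRANS, EMIT)
    print(path, log_prob)
```

The same recurrence applies whether the observations are acoustic frames or amino acids; only the states and the emission tables change.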
Background: In 1975, the then-Center for Disease Control (CDC) established the Vessel Sanitation Program (VSP) to minimize the risk for diarrheal disease among passengers and crew aboard ships by assisting the cruise ship industry in developing and implementing comprehensive environmental health programs.
Objectives: To evaluate the relationship between cruise ship sanitation scores and diarrheal disease incidence and outbreaks among cruise ship passengers.
Methods: Retrospective cohort study of ship inspection and diarrheal disease data from 1990 through 2000 from the National Center for Environmental Health, CDC database, for cruise ships entering the United States.
A principal challenge currently facing biologists is how to connect the complete DNA sequence of an organism to its development and behaviour. Large-scale targeted-deletions have been successful in defining gene functions in the single-celled yeast Saccharomyces cerevisiae, but comparable analyses have yet to be performed in an animal. Here we describe the use of RNA interference to inhibit the function of approximately 86% of the 19,427 predicted genes of C.
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually.
WormBase (http://www.wormbase.org/) is a web-accessible central data repository for information about Caenorhabditis elegans and related nematodes.
The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes.
The V protein of the Paramyxovirus simian virus 5 (SV5) is a multifunctional protein containing an N-terminal 164 residue domain that is shared with the P protein and a distinct C-terminal domain that is cysteine-rich and which is highly conserved among Paramyxoviruses. We report the recovery from Vero cells [interferon (IFN) nonproducing cells] of a recombinant SV5 (rSV5) that lacks the V protein C-terminal specific domain (rSV5VDeltaC). In Vero cells rSV5VDeltaC forms large plaques and grows at a rate and titer similar to those of rSV5.
We have written a fast implementation of the popular Neighbor-Joining tree building algorithm. QuickTree allows the reconstruction of phylogenies for very large protein families (including the largest Pfam alignment containing 27000 HIV GP120 glycoprotein sequences) that would be infeasible using other popular methods.
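For readers unfamiliar with the underlying method, the textbook Neighbor-Joining procedure can be sketched as below. This is the plain cubic-time formulation and says nothing about how QuickTree achieves its speed; the function name, the `nodeN` labels for internal nodes and the edge-list output format are assumptions made for this illustration.

```python
def neighbor_joining(names, dist):
    """Textbook Neighbor-Joining on a symmetric distance matrix.

    names: list of leaf labels; dist: square list of lists with zeros on
    the diagonal. Returns a list of (node, neighbour, branch_length) edges
    describing an unrooted tree. Internal nodes are labelled node0, node1, ...
    """
    nodes = list(names)
    D = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the nodes that are still active.
        r = {a: sum(D[a][b] for b in nodes) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * D[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{next_id}"
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = D[a][b] - len_a
        edges.append((a, u, len_a))
        edges.append((b, u, len_b))
        # Distances from the new node to every remaining node.
        D[u] = {u: 0.0}
        for c in nodes:
            if c != a and c != b:
                D[u][c] = D[c][u] = 0.5 * (D[a][c] + D[b][c] - D[a][b])
        nodes = [c for c in nodes if c != a and c != b] + [u]
    a, b = nodes
    edges.append((a, b, D[a][b]))
    return edges


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    for edge in neighbor_joining(taxa, matrix):
        print(edge)
```

Each round scans all remaining pairs, which is why the naive version becomes impractical for alignments with tens of thousands of sequences.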
We present a novel comparative method for the ab initio prediction of protein coding genes in eukaryotic genomes. The method simultaneously predicts the gene structures of two un-annotated input DNA sequences which are homologous to each other and retrieves the subsequences which are conserved between the two DNA sequences. It is capable of predicting partial, complete and multiple genes and can align pairs of genes which differ by events of exon-fusion or exon-splitting.
The exponential increase in the submission of nucleotide sequences to the nucleotide sequence database by genome sequencing centres has resulted in a need for rapid, automatic methods for classification of the resulting protein sequences. There are several signature and sequence cluster-based methods for protein classification, each resource having distinct areas of optimum application owing to the differences in the underlying analysis methods. In recognition of this, InterPro was developed as an integrated documentation resource for protein families, domains and functional sites, to rationalise the complementary efforts of the individual protein signature database projects.