Source Code Biol Med
August 2011
Background: Current sequencing technology makes it practical to sequence many samples of a given organism, raising new challenges for the processing and interpretation of large genomics data sets with associated metadata. Traditional computational phylogenetic methods are ideal for studying the evolution of gene/protein families and using those to infer the evolution of an organism, but are less than ideal for the study of the whole organism mainly due to the presence of insertions/deletions/rearrangements. These methods provide the researcher with the ability to group a set of samples into distinct genotypic groups based on sequence similarity, which can then be associated with metadata, such as host information, pathogenicity, and time or location of occurrence.
Knowledge about pathogenesis is increasing dramatically, and most of this information is stored in the scientific literature or in sequence databases. This information can be made more accessible by the use of ontologies or controlled vocabularies. Recently, several ontologies, controlled vocabularies and databases have been developed or adapted for virulence factors and their roles in pathogenesis.
We have developed a challenge task for the second BioCreAtIvE (Critical Assessment of Information Extraction in Biology) that requires participating systems to provide lists of the EntrezGene (formerly LocusLink) identifiers for all human genes and proteins mentioned in a MEDLINE abstract. We are distributing 281 annotated abstracts and another 5,000 noisily annotated abstracts along with a gene name lexicon to participants. We have performed a series of baseline experiments to better characterize this dataset and form a foundation for participant exploration.
Source Code Biol Med
October 2007
Background: Phylogenetic trees are widely used to visualize evolutionary relationships between different organisms or samples of the same organism. A variety of free and commercial tree visualization software is available, but limitations in these programs often require researchers to use multiple programs for analysis, annotation, and the production of publication-ready images.
Results: We present TreeViewJ, a Java tool for visualizing, editing and analyzing phylogenetic trees.
Neuronal identities are specified by the combinatorial functions of activators and repressors of gene expression. Members of the well-conserved Olf/EBF (O/E) transcription factor family have been shown to play important roles in neuronal and non-neuronal development and differentiation. O/E proteins are highly expressed in the olfactory epithelium, and O/E binding sites have been identified upstream of olfactory genes.
Background: The biological research literature is a major repository of knowledge. As the amount of literature increases, it will get harder to find the information of interest on a particular topic. There has been an increasing amount of work on text mining this literature, but comparing this work is hard because of a lack of standards for making comparisons.
Background: We prepared and evaluated training and test materials for an assessment of text mining methods in molecular biology. The goal of the assessment was to evaluate the ability of automated systems to generate a list of unique gene identifiers from PubMed abstracts for the three model organisms Fly, Mouse, and Yeast. This paper describes the preparation and evaluation of answer keys for training and testing.
Background: Our goal in BioCreAtIvE has been to assess the state of the art in text mining, with emphasis on applications that reflect real biological applications, e.g., the curation process for model organism databases.
Most C. elegans sensory neuron types consist of a single bilateral pair of neurons, and respond to a unique set of sensory stimuli. Although genes required for the development and function of individual sensory neuron types have been identified in forward genetic screens, these approaches are unlikely to identify genes that when mutated result in subtle or pleiotropic phenotypes.
Biology has now become an information science, and researchers are increasingly dependent on expert-curated biological databases to organize the findings from the published literature. We report here on a series of experiments related to the application of natural language processing to aid in the curation process for FlyBase. We focused on listing the normalized form of genes and gene products discussed in an article.
Nuclear receptors regulate numerous critical biological processes. The C. elegans genome is predicted to encode approximately 270 nuclear receptors of which >250 are unique to nematodes.