Background: Cell lines have been widely used in biomedical research. The community-based Cell Line Ontology (CLO) is a member of the OBO Foundry library that covers the domain of cell lines. Since its publication two years ago, significant updates have been made, including new groups joining the CLO consortium, new cell line cells, upper-level alignment with the Cell Ontology (CL) and the Ontology for Biomedical Investigations, and logical extensions.
Summary: GLay provides Cytoscape users an assorted collection of versatile community structure algorithms and graph layout functions for network clustering and structured visualization. High performance is achieved by dynamically linking highly optimized C functions to the Cytoscape Java program, which makes GLay especially suitable for decomposition, display and exploratory analysis of large biological networks.
Availability: http://brainarray.
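Many community structure algorithms, such as the fast greedy method of Clauset, Newman and Moore, search for a partition that maximizes Newman's modularity, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The minimal Python sketch below only scores a given partition of an undirected, unweighted graph; it is not GLay's code, and the function name and input format are assumptions for illustration.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label (hypothetical format).
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: summed degree of the nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (observed - expected) fraction of edges
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Two triangles joined by one bridge edge: splitting at the bridge gives Q = 5/14.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}))
```

Putting every node in a single community always yields Q = 0, which is why a positive Q indicates more within-community structure than expected by chance.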
Background: Reactive oxygen species (ROS) are known mediators of cellular damage in multiple diseases, including diabetic complications. Despite their importance, no comprehensive database is currently available for the genes associated with ROS.
Methods: We present ROS- and diabetes-related targets (genes/proteins) collected from the biomedical literature through a text mining technology.
Gene regulation in eukaryotes involves a complex interplay between the proximal promoter and distal genomic elements (such as enhancers) that work in concert to drive precise spatiotemporal gene expression. The experimental localization and characterization of gene regulatory elements is a complex and resource-intensive process. The computational identification of regulatory regions that confer spatiotemporally specific, tissue-restricted expression of a gene is therefore an important challenge for computational biology.
For insight into the transcriptional mechanisms mediating physiological responses to growth hormone (GH), data mining was performed on a profile of GH-regulated genes induced or inhibited at different times in highly responsive 3T3-F442A adipocytes. Gene set enrichment analysis indicated that GH-regulated genes are enriched in pathways including phosphoinositide and insulin signaling, and suggested that suppressor of cytokine signaling 2 (SOCS2) and the phosphoinositide 3-kinase regulatory subunit p85alpha (Pik3r1) are important targets. Model-based Chinese restaurant clustering identified a group of genes highly regulated by GH at times consistent with its key physiological actions.
Unlabelled: SciMiner is a web-based literature mining and functional analysis tool that identifies genes and proteins using a context specific analysis of MEDLINE abstracts and full texts. SciMiner accepts a free text query (PubMed Entrez search) or a list of PubMed identifiers as input. SciMiner uses both regular expression patterns and dictionaries of gene symbols and names compiled from multiple sources.
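Combining a name dictionary with regular expressions can be illustrated with a small Python sketch: all dictionary terms are compiled into one case-insensitive pattern, longer names are tried first so that a full name wins over an embedded symbol, and word boundaries stop matches inside longer tokens. The dictionary entries, function names and matching rules are hypothetical; this is not SciMiner's implementation.

```python
import re

# Toy synonym dictionary (hypothetical entries): surface form -> gene symbol.
GENE_DICT = {
    "TP53": "TP53",
    "p53": "TP53",
    "tumor protein p53": "TP53",
    "PIK3R1": "PIK3R1",
    "p85alpha": "PIK3R1",
    "SOCS2": "SOCS2",
    "suppressor of cytokine signaling 2": "SOCS2",
}

def build_matcher(gene_dict):
    # Try longer names first so "tumor protein p53" beats its substring "p53".
    terms = sorted(gene_dict, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    # (?<!\w) and (?!\w) keep "p53" from matching inside a token like "TP53".
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)

def find_genes(text, gene_dict=GENE_DICT):
    """Return (matched text, gene symbol) pairs in order of appearance."""
    matcher = build_matcher(gene_dict)
    lookup = {k.lower(): v for k, v in gene_dict.items()}
    return [(m.group(0), lookup[m.group(0).lower()])
            for m in matcher.finditer(text)]

print(find_genes("Loss of tumor protein p53 and SOCS2 induction; p85alpha was unchanged."))
```

A real tool would also need context rules to reject ambiguous short symbols that coincide with ordinary words, which a pure dictionary lookup cannot do.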
To assess the potential of tumor-associated, alternatively spliced gene products as a source of biomarkers in biological fluids, we have analyzed a large data set of mass spectra derived from the plasma proteome of a mouse model of human pancreatic ductal adenocarcinoma. MS/MS spectra were interrogated for novel splice isoforms using a nonredundant database containing an exhaustive three-frame translation of Ensembl transcripts and gene models from ECgene. This integrated analysis identified 420 distinct splice isoforms, of which 92 did not match any previously annotated mouse protein sequence.
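A three-frame translation reads a transcript from offsets 0, 1 and 2 with the standard genetic code; splitting each frame at stop codons gives candidate protein segments for a search database. The Python sketch below assumes DNA or RNA input on the transcript strand and uses a hypothetical minimum segment length; it illustrates the idea only and is not the database construction used in the study.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq):
    """Translate whole codons; a trailing partial codon is dropped, unknown codons give 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def three_frame_translation(transcript):
    """Protein strings for reading frames starting at offsets 0, 1 and 2."""
    seq = transcript.upper().replace("U", "T")
    return [translate(seq[offset:]) for offset in range(3)]

def stop_free_segments(transcript, min_length=6):
    """Split each frame at stop codons ('*') and keep segments >= min_length.

    min_length is a hypothetical cutoff, not a value from the study.
    """
    return [seg for frame in three_frame_translation(transcript)
            for seg in frame.split("*") if len(seg) >= min_length]

print(three_frame_translation("ATGGCCTAAGG"))  # frames: MA*, WPK, GLR
```

Only three frames are needed because annotated transcripts already have a known strand; translating unannotated genomic sequence would also require the three frames of the reverse complement.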
Motivation: Cell lines are used extensively in biomedical research, but the nomenclature describing cell lines has not been standardized. The problems are both linguistic and experimental. Many ambiguous cell line names appear in the published literature.
Unlabelled: The MiMI molecular interaction repository integrates data from multiple sources, resolves interactions to standard gene names and symbols, links to annotation data from GO, MeSH and PubMed and normalizes the descriptions of interaction type. Here, we describe a Cytoscape plugin that retrieves interaction and annotation data from MiMI and links out to multiple data sources and tools. Community annotation of the interactome is supported.
AMIA Annu Symp Proc
October 2007
The University of Michigan Clinical Data Repository (CDR) integrates over 25 data sources, and as a result has a schema that is too complex to be directly queried by clinical researchers. Schema summarization uses abstract elements and links to summarize a complex schema and allows users with limited knowledge of the underlying database structure to effectively issue queries to the CDR for clinical and translational research.
AMIA Annu Symp Proc
October 2007
Cell lines are used extensively throughout biomedical research, but the nomenclature describing cell lines has not been standardized; many ambiguous names appear in the published literature. The Cell Line Ontology is a well-structured collection of names for cell lines cultured in vitro. This ontology collates names from ATCC, HyperCLDB and MeSH in an informative format and specifies relationships between cell lines, such as derivation and homology.
J Bioinform Comput Biol
June 2008
The systematic inference of biologically relevant influence networks remains a challenging problem in computational biology. Although the availability of high-throughput data has enabled the use of probabilistic models to infer the plausible structure of such networks, whether these models truly reflect the biology of the process is questionable. In this work, we propose a network inference methodology, based on the directed information (DTI) criterion, that incorporates the biology of transcription within the framework so as to enable experimentally verifiable inference.
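For reference, the standard (Massey) definition of the directed information from a time series X to a time series Y over N time points is shown below, next to the chain-rule form of ordinary mutual information. The only difference is that the directed form conditions each Y_n on the past and present of X (X^n) rather than on the whole series (X^N), which makes it asymmetric and therefore able to suggest a direction of influence. The abstract does not state which estimator the DTI method uses, so none is implied here.

```latex
% X^n = (X_1, \dots, X_n) denotes the first n samples of series X.
I(X^N \to Y^N) = \sum_{n=1}^{N} I\left(X^n ;\, Y_n \mid Y^{n-1}\right)
\qquad\text{vs.}\qquad
I(X^N ; Y^N) = \sum_{n=1}^{N} I\left(X^N ;\, Y_n \mid Y^{n-1}\right)
% In general I(X^N \to Y^N) \neq I(Y^N \to X^N).
```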
We present an in-depth analysis of mouse plasma leading to the development of a publicly available repository composed of 568 liquid chromatography-tandem mass spectrometry runs. A total of 13,779 distinct peptides have been identified with high confidence. The corresponding approximately 3,000 proteins are estimated to span about seven orders of magnitude of abundance in plasma.
Summary: Cytoscape enhanced search plugin (ESP) enables searching complex biological networks on multiple attribute fields using logical operators and wildcards. Queries use an intuitive syntax and simple search line interface. ESP is implemented as a Cytoscape plugin and complements existing search functions in the Cytoscape network visualization and analysis software, allowing users to easily identify nodes, edges and subgraphs of interest, even for very large networks.
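Searching attribute fields with logical operators and wildcards can be sketched in Python as a recursive evaluator over a query tree. To stay short, the sketch takes an already-parsed query as nested tuples instead of parsing ESP's query syntax; the tuple format, node attributes and function names are all hypothetical, and matching is case-insensitive shell-style globbing via the standard `fnmatch` module.

```python
from fnmatch import fnmatchcase

def matches(attrs, query):
    """Evaluate a nested-tuple query against one node's attribute dict.

    Hypothetical query forms:
      ("FIELD", name, pattern)  glob pattern on one attribute (* and ? wildcards)
      ("AND", q1, q2, ...), ("OR", q1, q2, ...), ("NOT", q)
    """
    op = query[0]
    if op == "FIELD":
        _, field, pattern = query
        value = attrs.get(field)
        return value is not None and fnmatchcase(str(value).lower(), pattern.lower())
    if op == "AND":
        return all(matches(attrs, q) for q in query[1:])
    if op == "OR":
        return any(matches(attrs, q) for q in query[1:])
    if op == "NOT":
        return not matches(attrs, query[1])
    raise ValueError(f"unknown operator: {op!r}")

def search(nodes, query):
    """IDs of nodes whose attributes satisfy the query, in input order."""
    return [node_id for node_id, attrs in nodes.items() if matches(attrs, query)]

# Toy network attributes (hypothetical).
nodes = {
    "n1": {"name": "PIK3R1", "type": "kinase subunit"},
    "n2": {"name": "SOCS2", "type": "adaptor"},
    "n3": {"name": "PIK3CA", "type": "kinase"},
}
query = ("AND", ("FIELD", "name", "pik3*"), ("NOT", ("FIELD", "type", "*subunit")))
print(search(nodes, query))
```

Here the wildcard `pik3*` selects both PIK3 nodes and the negated clause removes the one whose type ends in "subunit", leaving only `n3`.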
The development of liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has made it possible to characterize phosphopeptides in an increasingly large-scale and high-throughput fashion. However, extracting confident phosphopeptide identifications from the resulting large data sets in a similar high-throughput fashion remains difficult, as does rigorously estimating the false discovery rate (FDR) of a set of phosphopeptide identifications. This article describes a data analysis pipeline designed to address these issues.
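One widely used way to estimate the FDR of a set of peptide identifications is the target-decoy approach: spectra are also searched against reversed or shuffled decoy sequences, and at a given score threshold the FDR is estimated as accepted decoy hits divided by accepted target hits. The abstract does not say whether this pipeline uses that estimator, so the Python sketch below is an assumption; the input format and function names are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimated FDR (decoys / targets) among matches scoring >= threshold.

    psms: list of (score, is_decoy) pairs; higher scores are better.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

def threshold_for_fdr(psms, max_fdr=0.01):
    """Lowest observed score threshold whose estimated FDR is <= max_fdr.

    Returns None if no threshold qualifies. Scanning from high to low and
    keeping the last passing threshold accepts the largest such set.
    """
    best = None
    for t in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, t) <= max_fdr:
            best = t
    return best

# Toy peptide-spectrum matches: (score, is_decoy).
psms = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True)]
print(threshold_for_fdr(psms, 0.3))   # 6: 1 decoy / 4 targets = 0.25
```

Some pipelines searching a concatenated target-plus-decoy database instead use 2 × decoys / (targets + decoys); phosphopeptides add a separate problem, phosphosite localization, that this sketch does not address.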
EURASIP J Bioinform Syst Biol
June 2010
Motif discovery for the identification of functional regulatory elements underlying gene expression is a challenging problem. Sequence inspection often leads to the discovery of novel motifs (including transcription factor sites) with previously uncharacterized function in gene expression. Given the complexity of tissue-specific gene expression, several motifs may be putatively responsible for expression in a certain cell type.
Unlabelled: MiSearch is an adaptive biomedical literature search tool that ranks citations based on a statistical model for the likelihood that a user will choose to view them. Citation selections are automatically acquired during browsing and used to dynamically update a likelihood model that includes authorship, journal and PubMed indexing information. The user can optionally elect to include or exclude specific features and vary the importance of timeliness in the ranking.
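The ranking idea (learn from the citations a user has chosen to view, score new citations by their authors, journal and MeSH terms, optionally penalize older papers) can be sketched in Python with simple smoothed feature counts. This is a deliberately simplified stand-in, not MiSearch's actual likelihood model; the scoring rule, the `alpha` smoothing constant, the `timeliness` weight and the citation dict layout are all assumptions.

```python
import math
from collections import Counter

ALL_FEATURES = ("author", "journal", "mesh")

def features(citation, use=ALL_FEATURES):
    """Feature strings for one citation, restricted to the chosen feature types."""
    feats = []
    if "author" in use:
        feats += ["author:" + a for a in citation["authors"]]
    if "journal" in use:
        feats.append("journal:" + citation["journal"])
    if "mesh" in use:
        feats += ["mesh:" + m for m in citation["mesh"]]
    return feats

def score(citation, counts, year_now, timeliness=0.0, alpha=1.0, use=ALL_FEATURES):
    # Each feature adds log((count + alpha) / alpha): 0 if never seen among
    # the user's selections, larger the more often it was selected.
    s = sum(math.log((counts[f] + alpha) / alpha) for f in features(citation, use))
    # Timeliness penalty: each year of age costs `timeliness` points.
    return s - timeliness * (year_now - citation["year"])

def rank(candidates, selected, year_now, timeliness=0.0, use=ALL_FEATURES):
    """Order candidate citations using feature counts from selected citations."""
    counts = Counter()
    for c in selected:
        counts.update(features(c, use))
    return sorted(candidates,
                  key=lambda c: score(c, counts, year_now, timeliness, use=use),
                  reverse=True)

# Toy citations (hypothetical).
selected = [{"authors": ["Smith"], "journal": "Bioinformatics",
             "mesh": ["Proteomics"], "year": 2008}]
a = {"title": "A", "authors": ["Smith"], "journal": "Bioinformatics",
     "mesh": ["Genomics"], "year": 2005}
b = {"title": "B", "authors": ["Jones"], "journal": "Nature",
     "mesh": ["Genomics"], "year": 2009}
print([c["title"] for c in rank([a, b], selected, 2010)])
print([c["title"] for c in rank([a, b], selected, 2010, timeliness=1.0)])
```

With no timeliness weight, citation A ranks first because it shares an author and journal with the user's selection; a strong timeliness weight lets the newer citation B overtake it.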
EURASIP J Bioinform Syst Biol
June 2010
Most current methods for gene regulatory network identification lead to the inference of steady-state networks, that is, networks prevalent over all times, an assumption that has been challenged. There is therefore a need to infer and represent networks in a dynamic, that is, time-varying, fashion in order to account for the different cellular states that affect the interactions among genes. In this work, we present an approach, regime-SSM, to understand gene regulatory networks within such a dynamic setting.
Comput Syst Bioinformatics Conf
December 2007
The systematic inference of biologically relevant influence networks remains a challenging problem in computational biology. Although the availability of high-throughput data has enabled the use of probabilistic models to infer the plausible structure of such networks, whether these models truly reflect the biology of the process is questionable. In this work, we propose a network inference methodology, based on the directed information (DTI) criterion, that incorporates the biology of transcription within the framework so as to enable experimentally verifiable inference.
Gene expression responses are complex and frequently involve the actions of many genes to effect coordinated patterns. We hypothesized that these coordinated responses are evolutionarily conserved and compared human and mouse gene expression profiles to identify the most prominent conserved features across a set of normal mammalian tissues. Based on data from multiple studies across multiple tissues in human and mouse, 13 gene expression modes across multiple tissues were identified in each of these species using principal component analysis.
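Principal component analysis finds the directions of greatest variance in an expression matrix; each component's loadings across tissues can be read as one expression mode. The dependency-free Python sketch below centers a genes × tissues matrix, forms the tissue covariance, and extracts leading components by power iteration with deflation. The orientation (genes as observations, tissues as variables), function names and toy data are assumptions; the study's exact normalization is not described in the abstract.

```python
import math

def center_columns(X):
    """Subtract each column (tissue) mean from a genes x tissues matrix."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def covariance(Xc):
    """Tissue x tissue sample covariance of a column-centered matrix (needs >= 2 genes)."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
            for i in range(p)]

def matvec(C, v):
    return [sum(cij * vj for cij, vj in zip(row, v)) for row in C]

def leading_component(C, iters=1000, tol=1e-12):
    """Power iteration: largest eigenvalue and unit eigenvector of symmetric PSD C."""
    p = len(C)
    v = [float(i + 1) for i in range(p)]        # deterministic, non-uniform start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = matvec(C, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        w = [x / norm for x in w]
        done = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if done:
            break
    lam = sum(a * b for a, b in zip(v, matvec(C, v)))
    # Fix the arbitrary sign: make the largest-magnitude loading positive.
    if v[max(range(p), key=lambda i: abs(v[i]))] < 0:
        v = [-x for x in v]
    return lam, v

def top_modes(X, k):
    """First k principal components as (variance, tissue loadings) pairs."""
    C = covariance(center_columns(X))
    modes = []
    for _ in range(k):
        lam, v = leading_component(C)
        modes.append((lam, v))
        # Deflate: remove the found component before finding the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(len(v))]
             for i in range(len(v))]
    return modes

# Toy data: 4 genes x 2 tissues with opposite behaviour in the two tissues.
print(top_modes([[1, -1], [-1, 1], [2, -2], [-2, 2]], 1))
```

In the toy example the single mode has loadings of equal size and opposite sign, capturing the "up in one tissue, down in the other" pattern; a library SVD routine would be the practical choice for genome-scale matrices.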
The MYC genes encode nuclear, sequence-specific DNA-binding proteins that are pleiotropic regulators of cellular function, and the c-MYC proto-oncogene is deregulated and/or mutated in most human cancers. Experimental studies of MYC binding to the genome are not fully consistent. While many c-MYC recognition sites can be identified in c-MYC-responsive genes, other motif matches, even experimentally confirmed sites, are associated with genes showing no c-MYC response.
Motivation: With the rapid increase in the availability of biological graph datasets, there is a growing need for effective and efficient graph querying methods. Because these datasets are noisy and incomplete, exact graph matching methods have limited use and approximate graph matching methods are required. Unfortunately, existing graph matching methods are too restrictive, as they only allow exact or near-exact graph matching.
Background: Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of the entire human genome was used as the query database to search for novel blood proteins in the data from the Human Proteome Organization Plasma Proteome Project.
The Human Proteome Organization (HUPO) recently completed the first large-scale collaborative study to characterize the human serum and plasma proteomes. The study was carried out in different locations and used diverse methods and instruments to compare and integrate tandem mass spectrometry (MS/MS) data on aliquots of pooled serum and plasma from healthy subjects. Liquid chromatography (LC)-MS/MS data sets from 18 laboratories were matched to the International Protein Index database, and an initial integration exercise resulted in 9,504 proteins identified with one or more peptides, and 3,020 proteins identified with two or more peptides.
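The "one or more" versus "two or more" peptide counts are a common stringency filter: a protein only counts at a given level if at least that many distinct peptide sequences were matched to it. The Python sketch below applies that rule to (peptide, protein) identification pairs; the input format, toy identifiers and function name are hypothetical, and it ignores the harder problem of peptides shared between several database entries.

```python
from collections import defaultdict

def proteins_by_peptide_count(identifications, min_peptides):
    """Sorted protein IDs supported by at least min_peptides distinct peptides.

    identifications: iterable of (peptide_sequence, protein_id) pairs;
    repeated identifications of the same peptide count only once.
    """
    peptides = defaultdict(set)
    for peptide, protein in identifications:
        peptides[protein].add(peptide)
    return sorted(p for p, peps in peptides.items() if len(peps) >= min_peptides)

# Toy identifications (hypothetical peptides and protein IDs).
ids = [("PEPTIDEA", "P1"), ("PEPTIDEB", "P1"), ("PEPTIDEA", "P1"), ("PEPTIDEC", "P2")]
print(proteins_by_peptide_count(ids, 1))  # both proteins
print(proteins_by_peptide_count(ids, 2))  # only P1 has two distinct peptides
```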