The prediction of regulatory elements is a problem for which computational methods hold great promise. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to guide users on the accuracy of currently available tools in various settings, and to provide a benchmark collection of data sets for assessing future tools.
A basic problem of microarray data analysis is to identify genes whose expression is affected by the distinction between malignancies with different properties. These genes are said to be differentially expressed. Differential expression can be detected by selecting the genes with P-values (derived using an appropriate hypothesis test) below a certain rejection level.
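A minimal sketch of P-value-based gene selection. The abstract does not name the hypothesis test, so this assumes an exact permutation test on the difference in class means; the dictionary layout of the expression data and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in class means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Enumerate every way of relabelling len(group_a) samples as class A.
    for chosen in combinations(range(len(pooled)), len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # The small tolerance keeps floating-point ties counted as "at least as extreme".
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def differentially_expressed(expression, n_a, alpha=0.05):
    """Return {gene: p-value} for genes whose p-value is below the rejection level.

    expression maps a gene name to its values; the first n_a values are assumed
    to belong to class A and the remaining values to class B (hypothetical layout).
    """
    selected = {}
    for gene, values in expression.items():
        p = permutation_p_value(values[:n_a], values[n_a:])
        if p < alpha:
            selected[gene] = p
    return selected
```

With thousands of genes tested at once, the rejection level would normally be adjusted for multiple testing (for example by controlling the false discovery rate); that step is left out of the sketch.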
The microarray gene expression markup language (MAGE-ML) is a widely used XML (eXtensible Markup Language) standard for describing and exchanging information about microarray experiments. It can describe microarray designs, microarray experiment designs, gene expression data and data analysis results. We describe RMAGEML, a new Bioconductor package that provides a link between cDNA microarray data stored in MAGE-ML format and the Bioconductor framework for preprocessing, visualization and analysis of microarray experiments.
Motivation: Microarrays are capable of determining the expression levels of thousands of genes simultaneously. In combination with classification methods, this technology can be useful to support clinical management decisions for individual patients, e.g.
We implemented a framework called TXTGate that combines literature indices of selected public biological resources in a flexible text-mining system designed for the analysis of groups of genes. By means of tailored vocabularies, it offers term-centric as well as gene-centric views on selected textual fields and on the MEDLINE abstracts used in LocusLink and the Saccharomyces Genome Database. Subclustering and links to external resources allow in-depth analysis of the resulting term profiles.
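To illustrate what a term profile for a group of genes can look like, here is a minimal sketch that assumes TF-IDF weighting restricted to a tailored vocabulary; the abstract does not state TXTGate's actual weighting scheme, and all names and inputs are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(gene_texts, vocabulary):
    """Compute a TF-IDF vector per gene, counting only terms in the vocabulary."""
    n_genes = len(gene_texts)
    counts = {gene: Counter(w for w in text.lower().split() if w in vocabulary)
              for gene, text in gene_texts.items()}
    # Document frequency: number of genes whose text mentions each term.
    df = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for gene, c in counts.items():
        total = sum(c.values()) or 1
        vectors[gene] = {t: (n / total) * math.log(n_genes / df[t]) for t, n in c.items()}
    return vectors


def group_term_profile(vectors, genes, top=3):
    """Average the TF-IDF vectors of a gene group and return its top-weighted terms."""
    summed = Counter()
    for gene in genes:
        summed.update(vectors[gene])
    profile = {t: w / len(genes) for t, w in summed.items()}
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(profile, key=lambda t: (-profile[t], t))[:top]
```

Terms mentioned for every gene get an inverse document frequency of zero, so the profile favours terms that distinguish the group from the rest of the collection.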
Background: The transcription start site of a metazoan gene remains poorly understood, mostly because there is no clear signal present in all genes. Now that several sequenced metazoan genomes have been annotated, we have been able to compare the base composition around the transcription start site for all annotated genes across multiple genomes.
Results: The most prominent feature in the base compositions is a significant local variation in G+C content over a large region around the transcription start site.
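A minimal sketch of how a G+C profile around the transcription start site could be computed, assuming the sequences have already been extracted and aligned so that the TSS sits at the same index in each (a hypothetical input layout; window size and flank length are illustrative parameters).

```python
def gc_profile(sequences, tss_index, flank, window):
    """Mean G+C fraction in a sliding window, for each offset relative to the TSS.

    Returns a list of (offset, mean GC fraction) pairs, averaged over sequences.
    """
    half = window // 2
    profile = []
    for offset in range(-flank, flank + 1):
        centre = tss_index + offset
        fractions = []
        for seq in sequences:
            if not 0 <= centre < len(seq):
                continue  # this sequence does not extend to the current offset
            chunk = seq[max(0, centre - half):centre + half + 1].upper()
            fractions.append(sum(base in "GC" for base in chunk) / len(chunk))
        if fractions:
            profile.append((offset, sum(fractions) / len(fractions)))
    return profile
```

Windows are truncated at sequence ends, so values near the edges rest on fewer bases.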
Thanks to its increasing availability, electronic literature has become a potential source of information for developing complex Bayesian networks (BNs) when human expertise is missing or when data are scarce or noisy. This opportunity raises the question of how to integrate information from free-text resources with statistical data when learning Bayesian networks. First, we report on the collection of prior information resources in the ovarian cancer domain, which includes "kernel" annotations of the domain variables.
Summary: The implementation of a genetic algorithm is described that provides a fast method of searching for the optimal combination of transcription factor binding sites in a set of regulatory sequences.
Availability: The algorithm can be used transparently as a web service from within the Toucan software. Toucan can be accessed at http://www.
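The abstract does not give Toucan's encoding or scoring function, so the following is only a minimal sketch under assumptions: each regulatory sequence is summarised as a hypothetical set of binding-site labels found in it, and a combination of k sites is scored by how many sequences contain all of them. The GA itself (tournament selection, union-based crossover, swap mutation, elitism) is a generic textbook variant, not the published implementation.

```python
import random


def fitness(combo, site_sets):
    """Hypothetical score: number of sequences that contain every site in combo."""
    return sum(1 for sites in site_sets if set(combo) <= sites)


def genetic_search(site_sets, motifs, k, pop_size=30, generations=100,
                   mutation_rate=0.2, seed=0):
    """Search k-site combinations with a simple genetic algorithm."""
    rng = random.Random(seed)
    motifs = sorted(motifs)

    def score(combo):
        return fitness(combo, site_sets)

    def crossover(a, b):
        # The child draws k distinct sites from the union of both parents.
        return tuple(sorted(rng.sample(sorted(set(a) | set(b)), k)))

    def mutate(combo):
        outside = [m for m in motifs if m not in combo]
        if not outside or rng.random() >= mutation_rate:
            return combo
        genes = list(combo)
        genes[rng.randrange(k)] = rng.choice(outside)  # swap in an unused site
        return tuple(sorted(genes))

    def tournament(population):
        return max(rng.sample(population, 2), key=score)

    population = [tuple(sorted(rng.sample(motifs, k))) for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(population, key=score)]  # elitism: keep the current best
        while len(children) < pop_size:
            parent_a, parent_b = tournament(population), tournament(population)
            children.append(mutate(crossover(parent_a, parent_b)))
        population = children
    best = max(population, key=score)
    return best, score(best)
```

Because the search is stochastic, its answer depends on the seed; for a small library of sites, exhaustive enumeration of all k-combinations is a practical way to check it.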
Background: The PmrAB (BasSR) two-component regulatory system is required for Salmonella typhimurium virulence. PmrAB-controlled modifications of the lipopolysaccharide (LPS) layer confer resistance to cationic antibiotic polypeptides, which may allow bacteria to survive within macrophages. The PmrAB system also confers resistance to Fe3+-mediated killing.
To identify key genes in the antiproliferative action of 1,25(OH)2D3, MC3T3-E1 mouse osteoblasts were subjected to cDNA microarray analyses. Eleven E2F-driven DNA replication genes were downregulated by 1,25(OH)2D3. These results were confirmed by quantitative RT-PCR in different cell types, showing the general nature of this action of 1,25(OH)2D3.
Nephrol Dial Transplant
February 2004
Background: Chronic haemodialysis patients are at increased risk for developing tuberculosis (TB). Appropriate screening methods to detect latent Mycobacterium tuberculosis infection are required. The aim of this prospective multi-centre study was to evaluate the tuberculin skin test (TST) as a screening method for detection of M.
PLAG1 is a proto-oncogene whose ectopic expression can trigger the development of pleomorphic adenomas of the salivary glands and of lipoblastomas. As PLAG1 is a transcription factor that activates transcription by binding to the consensus sequence GRGGC(N)(6-8)GGG, its ectopic expression presumably results in the deregulation of target genes, leading to uncontrolled cell proliferation. The identification of PLAG1 target genes is therefore a crucial step in understanding the molecular mechanisms involved in PLAG1-induced tumorigenesis.
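The consensus GRGGC(N)(6-8)GGG translates directly into a regular expression using IUPAC codes (R = A or G, N = any base). A minimal sketch for scanning both strands of a sequence for exact consensus matches; it does not model binding affinity, and the function name is hypothetical.

```python
import re

# G, R=[AG], G, G, C, then 6 to 8 arbitrary bases, then GGG.
# The lookahead lets overlapping sites be reported; at each start position
# the greedy spacer yields the longest spacer that still fits.
PLAG1_CONSENSUS = re.compile(r"(?=(G[AG]GGC[ACGT]{6,8}GGG))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_plag1_sites(sequence):
    """Return (forward-strand start, strand, site) for every consensus match."""
    seq = sequence.upper()
    hits = [(m.start(), "+", m.group(1)) for m in PLAG1_CONSENSUS.finditer(seq)]
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    for m in PLAG1_CONSENSUS.finditer(reverse_complement):
        site = m.group(1)
        # Convert the reverse-strand start back to forward-strand coordinates.
        hits.append((len(seq) - m.start() - len(site), "-", site))
    return hits
```

Palindromic sites would be reported once per strand.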
The upcoming availability of public microarray repositories and of large compendia of gene expression information opens up a new realm of possibilities for microarray data analysis. An essential challenge is the efficient integration of microarray data generated by different research groups on different array platforms. This review focuses on the problems associated with this integration, which are: (1) the efficient access to and exchange of microarray data; (2) the validation and comparison of data from different platforms (cDNA and short and long oligonucleotides); and (3) the integrated statistical analysis of multiple data sets.
Motivation: Gibbs sampling has become a method of choice for the discovery of noisy patterns, known as motifs, in DNA and protein sequences. Because handling noise in microarray data presents similar challenges, we have adapted this strategy to the biclustering of discretized microarray data.
Results: In contrast with standard clustering that reveals genes that behave similarly over all the conditions, biclustering groups genes over only a subset of conditions for which those genes have a sharp probability distribution.
Motivation: The transcriptional regulation of a metazoan gene depends on the cooperative action of multiple transcription factors that bind to cis-regulatory modules (CRMs) located in the neighborhood of the gene. By integrating multiple signals, CRMs confer an organism-specific spatial and temporal rate of transcription.
Results: Based on the hypothesis that genes that are needed in exactly the same conditions might share similar regulatory switches, we have developed a novel methodology to find CRMs in a set of coexpressed or coregulated genes.
Incorporating prior knowledge into black-box classifiers is still largely an open problem. We propose a hybrid Bayesian methodology that consists of encoding prior knowledge in the form of a (Bayesian) belief network and then using this knowledge to estimate an informative prior for a black-box model (e.g.
INCLUSive is a suite of algorithms and tools for the analysis of gene expression data and the discovery of cis-regulatory sequence elements. The tools allow normalization, filtering and clustering of microarray data, functional scoring of gene clusters, sequence retrieval, and detection of known and unknown regulatory elements using probabilistic sequence models and Gibbs sampling. All tools are available via different web pages and as web services.
We performed mRNA expression profiling of mouse primary hippocampal neurones undergoing differentiation in vitro. We show that 2314 genes significantly changed expression during neuronal differentiation. The temporal resolution of our experiment (six time points) allows us to distinguish between gene expression patterns characteristic of the axonal and of the dendritic stages of neurite outgrowth.
Background: As genomics becomes increasingly relevant to medicine, medical informatics and bioinformatics are gradually converging into a larger field that we call computational biomedicine.
Objectives: Developing a computational framework that is common to the different disciplines that compose computational biomedicine will be a major enabler of the further development and integration of this research domain.
Methods: Probabilistic graphical models such as Hidden Markov Models, belief networks, and missing-data models together with computational methods such as dynamic programming, Expectation-Maximization, data-augmentation Gibbs sampling, and the Metropolis-Hastings algorithm provide the tools for an integrated probabilistic approach to computational biomedicine.
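As one concrete instance of pairing a Hidden Markov Model with dynamic programming, a minimal Viterbi decoder in log space. The two-state "GC-rich / AT-rich" model used to exercise it is hypothetical and its probabilities are chosen only for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path of an HMM, by dynamic programming in log space."""
    # score[s]: best log-probability of any path ending in state s at the current step.
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}
    backpointers = []
    for obs in observations[1:]:
        step, new_score = {}, {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            step[s] = prev
            new_score[s] = (score[prev] + math.log(trans_p[prev][s])
                            + math.log(emit_p[s][obs]))
        backpointers.append(step)
        score = new_score
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    return path[::-1], score[last]
```

Working in log space avoids the numerical underflow that multiplying many small probabilities would cause on long sequences.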
Summary: MARAN is a web-based application for normalizing microarray data. MARAN comprises a generic ANOVA model, an option for Loess fitting prior to ANOVA analysis, and a module for selecting genes with significantly changing expression.
Availability: http://www.
TOUCAN is a Java application for the rapid discovery of significant cis-regulatory elements from sets of coexpressed or coregulated genes. Biologists can automatically (i) retrieve genes and intergenic regions, (ii) identify putative regulatory regions, (iii) score sequences for known transcription factor binding sites, (iv) identify candidate motifs for unknown binding sites, and (v) detect those statistically over-represented sites that are characteristic of a gene set. Genes or intergenic regions are retrieved from Ensembl or EMBL, together with orthologs and supporting information.
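Step (v) asks whether a site occurs in the gene set more often than background alone would explain. A minimal sketch of one common way to frame that question, a binomial upper-tail test; the abstract does not say which statistic TOUCAN uses, and the input summaries (hit counts, number of scanned positions, per-site background rates) are hypothetical. Treating each position as an independent trial is a simplification.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k hits from background alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def over_represented(hits, n_positions, background_rate, alpha=0.01):
    """Return {site: p-value} for sites whose hit count is unlikely under background.

    hits: observed occurrences of each site in the gene set.
    n_positions: number of positions scanned in the gene set.
    background_rate: per-position hit probability of each site in background sequence.
    """
    result = {}
    for site, k in hits.items():
        p_value = binomial_tail(k, n_positions, background_rate[site])
        if p_value < alpha:
            result[site] = p_value
    return result
```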
Pac Symp Biocomput
August 2003
Thanks to its increasing availability, electronic literature can now be a major source of information when developing complex statistical models where data are scarce or noisy. This raises the question of how to integrate information from domain literature deeply with experimental data. Which statistical text representations best integrate literature knowledge into clustering remains an insufficiently explored question.
Motif detection based on Gibbs sampling is a common procedure used to retrieve regulatory motifs in silico. Using a species-specific background model was previously shown to increase the robustness of the algorithm. Here, we demonstrate that selecting a non-species-adapted background model can have an adverse effect on the results of motif detection.
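The background model enters motif detection through the log-odds score of a candidate site: motif likelihood divided by background likelihood. A minimal sketch of that scoring step only (not a full Gibbs sampler), using an order-0 background for brevity; higher-order Markov backgrounds condition each base on its predecessors. The matrix and background frequencies are hypothetical.

```python
import math


def log_odds(word, pwm, background):
    """Log-likelihood ratio of a word under a motif matrix versus a background model.

    pwm: one dict of base probabilities per motif position.
    background: order-0 base probabilities of the assumed species background.
    """
    return sum(math.log(column[base] / background[base]) for column, base in zip(pwm, word))


def best_site(sequence, pwm, background):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    scored = [(i, log_odds(sequence[i:i + width], pwm, background))
              for i in range(len(sequence) - width + 1)]
    return max(scored, key=lambda item: item[1])
```

Under an AT-rich background, an AT-rich site scores lower than under a uniform background, which is one way a mismatched background can shift which sites a sampler favours.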
Objective: To determine whether power Doppler ultrasound examination of the endometrium can contribute to a correct diagnosis of endometrial malignancy in women with postmenopausal bleeding and endometrium ≥ 5 mm.
Methods: Eighty-three women with postmenopausal bleeding and endometrium ≥ 5 mm underwent gray-scale and power Doppler ultrasound examination using predetermined, standardized settings. Suspicion of endometrial malignancy at gray-scale ultrasound examination (endometrial morphology) was noted, and the color content of the endometrium at power Doppler examination was estimated subjectively (endometrial color score).
Motivation: Microarray experiments generate a considerable amount of data which, when properly analyzed, can yield a wealth of biologically relevant information about global cellular behaviour. Clustering (grouping genes with similar expression profiles) is one of the first steps in the analysis of high-throughput expression measurements. A number of clustering algorithms have proved useful for making sense of such data.
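The abstract does not name the algorithm it proposes, so plain k-means is shown here only as a familiar baseline for grouping genes with similar expression profiles. Euclidean distance and the "first k profiles" initialisation are simplifying assumptions; correlation-based distances are also common for expression data.

```python
import math


def kmeans(profiles, k, iterations=100):
    """Plain k-means on expression profiles (lists of equal length); returns cluster labels."""
    centroids = [list(p) for p in profiles[:k]]  # simple deterministic initialisation
    labels = None
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(values) / len(members) for values in zip(*members)]
    return labels
```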