We present deep link prediction (DLP), a method for the interpretation of loss-of-function screens. Our approach uses representation-based link prediction to reprioritize phenotypic readouts by integrating screening experiments with gene-gene interaction networks. We validate the method on two loss-of-function technologies, RNAi and CRISPR, using datasets obtained from DepMap.
Objective: Enormous amounts of healthcare data are becoming increasingly accessible through the large-scale adoption of electronic health records. In this work, structured and unstructured (textual) data are combined to assign clinical diagnostic and procedural codes (specifically ICD-9-CM) to patient stays. We investigate whether integrating these heterogeneous data types improves prediction strength compared to using the data types in isolation.
Interpretation of the multitude of variants obtained from next generation sequencing (NGS) is labor intensive and complex. Web-based interfaces such as Galaxy streamline the generation of variant lists but lack flexibility in the downstream annotation and filtering that are necessary to identify causative variants in medical genomics. To this end, we built VariantDB, a web-based interactive annotation and filtering platform that automatically annotates variants with allele frequencies, functional impact, pathogenicity predictions and pathway information.
BMC Bioinformatics
February 2011
Background: With the availability of large-scale expression compendia, it is now possible to view one's own findings in the light of what is already available and to retrieve genes with an expression profile similar to a set of genes of interest (i.e., a query or seed set) for a subset of conditions.
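As a rough illustration of what such a query can look like, the sketch below ranks genes by the Pearson correlation between their profile and the mean profile of a seed set, restricted to a chosen subset of conditions. This is not the method of the article; the function names (`pearson`, `rank_by_similarity`), the correlation-based score, and the toy expression values are all assumptions made for the example.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def rank_by_similarity(expr, seed_genes, conditions):
    """Rank non-seed genes by correlation with the mean seed profile.

    expr maps gene name -> list of expression values (one per condition);
    conditions is the list of column indices to compare on.
    """
    # Mean expression of the seed set in each selected condition.
    seed_profile = [
        sum(expr[g][c] for g in seed_genes) / len(seed_genes)
        for c in conditions
    ]
    scores = {}
    for gene, profile in expr.items():
        if gene in seed_genes:
            continue
        scores[gene] = pearson([profile[c] for c in conditions], seed_profile)
    # Highest correlation first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy compendium: 4 genes measured in 4 conditions.
expr = {
    "geneA": [1.0, 2.0, 3.0, 9.0],
    "geneB": [2.0, 4.0, 6.0, 0.0],
    "geneC": [3.0, 2.0, 1.0, 5.0],
    "geneD": [1.1, 2.1, 2.9, 7.0],
}
ranked = rank_by_similarity(expr, seed_genes=["geneA", "geneB"], conditions=[0, 1, 2])
print(ranked)
```

Restricting the comparison to `conditions` is what makes the query condition-specific: a gene that tracks the seed set in the selected conditions ranks highly even if it diverges elsewhere.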
Newborn screening programs for severe metabolic disorders using tandem mass spectrometry are widely used. Medium-Chain Acyl-CoA dehydrogenase deficiency (MCADD) is the most prevalent mitochondrial fatty acid oxidation defect (1:15,000 newborns) and it has been proven that early detection of this metabolic disease decreases mortality and improves the outcome. In previous studies, data mining methods on derivatized tandem MS datasets have shown high classification accuracies.
Motivation: We developed ViTraM, a tool that allows visualizing overlapping transcriptional modules in an intuitive way. By visualizing not only the genes and the experiments in which the genes are co-expressed, but also additional properties of the modules such as the regulators and regulatory motifs that are responsible for the observed co-expression, ViTraM can assist in the biological analysis and interpretation of the output of module detection tools.
Availability: The ViTraM software is platform-independent.
The development of structure-learning algorithms for gene regulatory networks depends heavily on the availability of synthetic data sets that contain both the original network and associated expression data. This article reports the application of SynTReN, an existing network generator that samples topologies from existing biological networks and uses Michaelis-Menten and Hill enzyme kinetics to simulate gene interactions. We illustrate the effects of different aspects of the expression data on the quality of the inferred network.
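For readers unfamiliar with the kinetics named above, the following minimal sketch shows the textbook Michaelis-Menten and Hill rate laws. It is not SynTReN's code or API; the function names and parameter values are assumptions chosen for illustration only.

```python
def michaelis_menten(x, vmax, km):
    """Michaelis-Menten rate: saturating response to concentration x."""
    return vmax * x / (km + x)

def hill_activation(x, k, n):
    """Hill activation: fraction of maximal rate driven by an activator at level x."""
    return x ** n / (k ** n + x ** n)

def hill_repression(x, k, n):
    """Hill repression: fraction of maximal rate remaining under a repressor at level x."""
    return k ** n / (k ** n + x ** n)

# Hypothetical parameter values: half-saturation constant k and Hill coefficient n.
for level in (0.1, 1.0, 10.0):
    print(level, hill_activation(level, k=1.0, n=2), hill_repression(level, k=1.0, n=2))
```

A larger Hill coefficient `n` makes the response more switch-like around `k`, which is one way a simulator can produce nonlinear gene-gene interactions.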
Background: In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Despite the demonstrated success of such algorithms in uncovering biologically relevant regulatory relations, further developments in the area are hampered by a lack of tools to compare the performance of alternative module network learning strategies. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms.
Background: The development of algorithms to infer the structure of gene regulatory networks based on expression data is an important subject in bioinformatics research. Validation of these algorithms requires benchmark data sets for which the underlying network is known. Since experimental data sets of the appropriate size and design are usually not available, there is a clear need to generate well-characterized synthetic data sets that allow thorough testing of learning algorithms in a fast and reproducible manner.