Publications by Mark de Haan

Publications by authors named "Mark de Haan"

MOLGENIS research: advanced bioinformatics data software for non-bioinformaticians.

K Joeri van der Velde, Floris Imhann, Bart Charbon, Chao Pang, David van Enckevort, Mark de Haan

Bioinformatics

March 2019

Motivation: The volume and complexity of biological data increases rapidly. Many clinical professionals and biomedical researchers without a bioinformatics background are generating big '-omics' data, but do not always have the tools to manage, process or publicly share these data.

Results: Here we present MOLGENIS Research, an open-source web-application to collect, manage, analyze, visualize and share large and complex biomedical datasets, without the need for advanced bioinformatics skills.

BiobankUniverse: automatic matchmaking between datasets for biobank data discovery and integration.

Chao Pang, Fleur Kelpin, David van Enckevort, Niina Eklund, Kaisa Silander, Mark de Haan

Bioinformatics

November 2017

Motivation: Biobanks are indispensable for large-scale genetic/epidemiological studies, yet it remains difficult for researchers to determine which biobanks contain data matching their research questions.

Results: To overcome this, we developed a new matching algorithm that identifies pairs of related data elements between biobanks and research variables with high precision and recall. It integrates lexical comparison, Unified Medical Language System ontology tagging and semantic query expansion.
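As a rough illustration of how lexical comparison and query expansion can work together, the following is a minimal sketch and not the BiobankUniverse implementation. The `SYNONYMS` table is a hypothetical stand-in for ontology-derived synonyms (the paper uses the Unified Medical Language System), and the bigram Dice coefficient is an assumed choice of lexical similarity; all function names are invented for this example.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical synonym table; a real system would derive these from an ontology.
SYNONYMS: Dict[str, Set[str]] = {
    "hypertension": {"high blood pressure"},
    "bmi": {"body mass index"},
}


def bigrams(text: str) -> Set[str]:
    """Return the set of character bigrams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient over character bigrams (assumed lexical similarity measure)."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def expand(query: str) -> List[str]:
    """Return the query plus variants with known terms replaced by their synonyms."""
    q = " ".join(query.lower().split())
    variants = [q]
    for term, syns in SYNONYMS.items():
        pattern = rf"\b{re.escape(term)}\b"
        if re.search(pattern, q):
            variants.extend(re.sub(pattern, syn, q) for syn in sorted(syns))
    return variants


def best_match(query: str, candidates: List[str]) -> Tuple[str, float]:
    """Score each candidate by its best similarity to any expanded query variant."""
    scored = [(c, max(dice(v, c) for v in expand(query))) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    labels = ["High blood pressure diagnosis", "Height in cm", "Smoking status"]
    print(best_match("hypertension diagnosis", labels))
```

Without the expansion step, "hypertension diagnosis" and "High blood pressure diagnosis" share few bigrams; with it, one expanded variant matches the candidate label exactly.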

MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks.

Chao Pang, David van Enckevort, Mark de Haan, Fleur Kelpin, Jonathan Jetten

Bioinformatics

July 2016

Motivation: While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration.

Results: To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology.

SORTA: a system for ontology-based re-coding and technical annotation of biomedical phenotype data.

Chao Pang, Annet Sollie, Anna Sijtsma, Dennis Hendriksen, Bart Charbon, Mark de Haan

Database (Oxford)

May 2016

There is an urgent need to standardize the semantics of biomedical data values, such as phenotypes, to enable comparative and integrative analyses. However, it is unlikely that all studies will use the same data collection protocols. As a result, retrospective standardization is often required, which involves matching of original (unstructured or locally coded) data to widely used coding or ontology systems such as SNOMED CT (clinical terms), ICD-10 (International Classification of Disease) and HPO (Human Phenotype Ontology).

Calling genotypes from public RNA-sequencing data enables identification of genetic variants that affect gene-expression levels.

Patrick Deelen, Daria V Zhernakova, Mark de Haan, Marijke van der Sijde, Marc Jan Bonder

Genome Med

May 2015

Background: RNA-sequencing (RNA-seq) is a powerful technique for the identification of genetic variants that affect gene-expression levels, either through expression quantitative trait locus (eQTL) mapping or through allele-specific expression (ASE) analysis. Given increasing numbers of RNA-seq samples in the public domain, we here studied to what extent eQTLs and ASE effects can be identified when using public RNA-seq data while deriving the genotypes from the RNA-sequencing reads themselves.

Methods: We downloaded the raw reads for all available human RNA-seq datasets.

Evaluation of CADD Scores in Curated Mismatch Repair Gene Variants Yields a Model for Clinical Validation and Prioritization.

K Joeri van der Velde, Joël Kuiper, Bryony A Thompson, John-Paul Plazzer, Gert van Valkenhoef, Mark de Haan

Hum Mutat

July 2015

Next-generation sequencing in clinical diagnostics is providing valuable genomic variant data, which can be used to support healthcare decisions. In silico tools to predict pathogenicity are crucial to assess such variants and we have evaluated a new tool, Combined Annotation Dependent Depletion (CADD), and its classification of gene variants in Lynch syndrome by using a set of 2,210 DNA mismatch repair gene variants. These had already been classified by experts from InSiGHT's Variant Interpretation Committee.

WormQTLHD--a web database for linking human disease to natural variation data in C. elegans.

K Joeri van der Velde, Mark de Haan, Konrad Zych, Danny Arends, L Basten Snoek

Nucleic Acids Res

January 2014

Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism, Caenorhabditis elegans, has been used to produce much molecular quantitative genetics and systems biology data over the past decade.
