Feasibility of using Clinical Element Models (CEM) to standardize phenotype variables in the database of genotypes and phenotypes (dbGaP).

PLoS One

Division of Biomedical Informatics, Department of Medicine, School of Medicine, University of California San Diego, La Jolla, California, United States of America.

Published: September 2014

The database of Genotypes and Phenotypes (dbGaP) contains various types of data generated from genome-wide association studies (GWAS). These data can be used to facilitate novel scientific discoveries and to reduce cost and time for exploratory research. However, idiosyncrasies and inconsistencies in phenotype variable names are a major barrier to reusing these data. We addressed these challenges in standardizing phenotype variables by formalizing their descriptions using Clinical Element Models (CEM). Designed to represent clinical data, CEMs were highly expressive and thus were able to represent a majority (77.5%) of the 215 phenotype variable descriptions. However, their high expressivity also made it difficult to directly apply them to research data such as phenotype variables in dbGaP. Our study suggested that simplification of the template models makes it more straightforward to formally represent the key semantics of phenotype variables.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3776754 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0076384 (PLOS)
