Rapid storage and retrieval of genomic intervals from a relational database system using nested containment lists.

Database (Oxford)

Department of Biomedical Informatics, Center for Human Genetics Research, Vanderbilt University, 2215 Garland Ave, Nashville, TN 37232, USA.

Published: October 2013

Efficient storage and retrieval of genomic annotations based on range intervals is necessary, given the amount of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks. This has led to the development of stand-alone applications that are dependent on flat-file libraries. In this work, we introduce MyNCList, an implementation of the NCList data structure within a MySQL database. MyNCList enables the storage, update and rapid retrieval of genomic annotations from the convenience of a relational database system. Range-based annotations of 1 million variants are retrieved in under a minute, making this approach feasible for whole-genome annotation tasks. Database URL: https://github.com/bushlab/mynclist.
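The abstract builds on the nested containment list (NCList) data structure. The Python below is a minimal in-memory sketch of that idea. It is not the MyNCList MySQL implementation. The names `Node`, `NCList` and `overlapping`, and the example labels, are hypothetical, and intervals are assumed to be closed integer ranges `[start, end]`.

The approach works as follows:

- Sorting by start ascending and end descending guarantees that a container is processed before its contents.
- A single pass with a stack nests each interval under the most recent stacked interval that contains it.
- Within any one list no interval contains another, so both starts and ends are sorted. A binary search on ends therefore finds the first candidate overlap, and the scan stops at the first interval that starts after the query.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    """One interval plus the sublist of intervals it contains (hypothetical)."""
    start: int
    end: int
    label: str
    sublist: list[Node] = field(default_factory=list)
    sub_ends: list[int] = field(default_factory=list)  # ends of sublist, sorted


class NCList:
    """Sketch of an in-memory nested containment list over closed intervals."""

    def __init__(self, intervals: list[tuple[int, int, str]]) -> None:
        self.top: list[Node] = []
        self.top_ends: list[int] = []
        stack: list[Node] = []  # chain of intervals that may contain later ones
        # Start ascending, end descending: a container precedes its contents.
        for start, end, label in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
            node = Node(start, end, label)
            # Pop stacked intervals that do not contain the current one.
            while stack and not (stack[-1].start <= start and end <= stack[-1].end):
                stack.pop()
            if stack:
                stack[-1].sublist.append(node)
                stack[-1].sub_ends.append(end)
            else:
                self.top.append(node)
                self.top_ends.append(end)
            stack.append(node)

    def overlapping(self, qs: int, qe: int) -> list[tuple[int, int, str]]:
        """Return every interval with start <= qe and end >= qs, sorted."""
        hits: list[tuple[int, int, str]] = []
        pending = [(self.top, self.top_ends)]
        while pending:
            nodes, ends = pending.pop()
            # Ends are sorted within a list: find the first end >= qs.
            i = bisect_left(ends, qs)
            # Starts are sorted too: stop once an interval starts after qe.
            while i < len(nodes) and nodes[i].start <= qe:
                node = nodes[i]
                hits.append((node.start, node.end, node.label))
                if node.sublist:
                    # Contents can only overlap if their container overlaps.
                    pending.append((node.sublist, node.sub_ends))
                i += 1
        return sorted(hits)


if __name__ == "__main__":
    nc = NCList([(100, 500, "geneA"), (150, 200, "exon1"), (300, 450, "exon2"),
                 (310, 320, "snpSite"), (600, 700, "geneB")])
    print(nc.overlapping(305, 315))
    # [(100, 500, 'geneA'), (300, 450, 'exon2'), (310, 320, 'snpSite')]
```

The query only descends into sublists of intervals that already overlap the query. This is what keeps lookups fast compared with scanning every interval. In this sketch the whole structure lives in Python lists, whereas MyNCList keeps the structure inside a MySQL database.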

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3724366
DOI: http://dx.doi.org/10.1093/database/bat056