Identifying micro-inversions using high-throughput sequencing reads.

BMC Genomics

State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, and Center for Quantitative Biology, Peking University, Beijing, 100871, China.

Published: January 2016

Background: The identification of inversions of DNA segments shorter than read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging for next-generation sequencing reads. It is acknowledged that MIs are important genomic variation and may play roles in causing genetic disease. However, current alignment methods are generally insensitive to detect MIs. Here we develop a novel tool, MID (Micro-Inversion Detector), to identify MIs in human genomes using next-generation sequencing reads.

Results: The algorithm of MID is designed based on a dynamic programming path-finding approach. What makes MID different from other variant detection tools is that MID can handle small MIs and multiple breakpoints within an unmapped read. Moreover, MID improves reliability in low coverage data by integrating multiple samples. Our evaluation demonstrated that MID outperforms Gustaf, which can currently detect inversions from 30 bp to 500 bp.

Conclusions: To our knowledge, MID is the first method that can efficiently and reliably identify MIs from unmapped short next-generation sequencing reads. MID is reliable on low coverage data, which is suitable for large-scale projects such as the 1000 Genomes Project (1KGP). MID identified previously unknown MIs from the 1KGP that overlap with genes and regulatory elements in the human genome. We also identified MIs in cancer cell lines from Cancer Cell Line Encyclopedia (CCLE). Therefore our tool is expected to be useful to improve the study of MIs as a type of genetic variant in the human genome. The source code can be downloaded from: http://cqb.pku.edu.cn/ZhuLab/MID .

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895285PMC
http://dx.doi.org/10.1186/s12864-015-2305-7DOI Listing

Publication Analysis

Top Keywords

sequencing reads
12
next-generation sequencing
12
mis
9
mid
9
identify mis
8
low coverage
8
coverage data
8
human genome
8
cancer cell
8
identifying micro-inversions
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!