Detecting tandem repeat variants in coding regions using code-adVNTR.

iScience

Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA 92093, USA.

Published: August 2022

The human genome contains more than one million tandem repeats (TRs), DNA sequences containing multiple approximate copies of a motif repeated contiguously. TRs account for significant genetic variation, with more than 50 diseases attributed to changes in motif number. A few diseases have been shown to be caused by small indels in variable number tandem repeats (VNTRs), including medullary cystic kidney disease type 1 (MCKD1) and monogenic type 1 diabetes. However, small indels in VNTRs remain largely unexplored, mainly because of the long and complex structure of VNTRs with multiple motifs. We developed a method, code-adVNTR, that uses multi-motif hidden Markov models to detect both motif count variation and small indels within VNTRs. In simulated data, code-adVNTR outperformed GATK-HaplotypeCaller in calling small indels within large VNTRs. We used code-adVNTR to characterize coding VNTRs in the 1000 Genomes data, identifying many population-specific variants, and to reliably call mutations for MCKD1.
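To make the distinction between motif count variation and small indels concrete, here is a minimal Python sketch. It is a toy heuristic, not the code-adVNTR method: the function names `count_tandem_copies` and `classify_length_change` are hypothetical, the copy counter only matches exact copies of a single motif, and the classifier assumes that a length change equal to a whole number of motifs reflects a change in copy number.

```python
import re


def count_tandem_copies(seq: str, motif: str) -> int:
    """Return the length, in copies, of the longest run of exact,
    contiguous copies of `motif` in `seq` (toy assumption: exact matches only)."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)


def classify_length_change(ref_len: int, obs_len: int, motif_len: int) -> str:
    """Label a change in repeat-region length (hypothetical heuristic).

    A change that is a whole multiple of the motif length is treated as
    motif count variation; any other change is treated as a small indel.
    """
    delta = obs_len - ref_len
    if delta == 0:
        return "no length change"
    if delta % motif_len == 0:
        return "motif count variation"
    return "small indel"


if __name__ == "__main__":
    print(count_tandem_copies("TTGCAGCAGCAGAA", "GCA"))
    print(classify_length_change(ref_len=9, obs_len=12, motif_len=3))
    print(classify_length_change(ref_len=9, obs_len=10, motif_len=3))
```

Real VNTRs contain approximate copies and often several distinct motifs, so exact matching and length arithmetic like this break down quickly. For example, an indel whose length happens to equal the motif length would be mislabeled here. The abstract indicates that code-adVNTR addresses this with multi-motif hidden Markov models.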


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9379575
DOI: http://dx.doi.org/10.1016/j.isci.2022.104785

