Variable Number Tandem Repeats (VNTRs) are tandem repeat (TR) loci that vary in copy number across a population. Using our program, VNTRseek, we analyzed human whole genome sequencing datasets from 2770 individuals in order to detect minisatellite VNTRs, i.e., those with pattern sizes ≥7 bp. We detected 35 638 VNTR loci and classified 5676 as commonly polymorphic (i.e., with non-reference alleles occurring in >5% of the population). Commonly polymorphic VNTR loci were found to be enriched in genomic regions with regulatory function, i.e., transcription start sites and enhancers. Investigation of the commonly polymorphic VNTRs in the context of population ancestry revealed that 1096 loci contained population-specific alleles and that those alleles could be used to classify individuals into super-populations with near-perfect accuracy. A search for expression quantitative trait loci (eQTLs) among the VNTRs proximal to genes indicated that expression differences in 187 genes correlated with VNTR genotype. We validated our predictions in several ways, including experimentally, through the identification of predicted alleles in long reads, and by comparisons showing consistency between sequencing platforms. This study is the most comprehensive analysis of minisatellite VNTRs in the human population to date.
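The "commonly polymorphic" criterion above (non-reference alleles occurring in >5% of the population) can be illustrated with a minimal sketch. This is not VNTRseek code: the function name, the genotype representation (copy-number offsets from the reference, with 0 meaning the reference allele), and the reading of "occurring in >5% of the population" as "more than 5% of individuals carry at least one non-reference allele" are all assumptions made for illustration.

```python
from typing import Dict, List


def is_commonly_polymorphic(
    genotypes: Dict[str, List[int]], threshold: float = 0.05
) -> bool:
    """Return True if more than `threshold` of individuals carry a non-reference allele.

    `genotypes` maps a (hypothetical) sample ID to that sample's alleles at one
    locus, encoded as copy-number offsets from the reference (0 = reference).
    This encoding is an assumption for illustration only.
    """
    if not genotypes:
        return False
    # Count individuals with at least one allele that differs from the reference.
    carriers = sum(
        1 for alleles in genotypes.values() if any(a != 0 for a in alleles)
    )
    # Strict inequality, matching the ">5%" wording of the abstract.
    return carriers / len(genotypes) > threshold


# Toy cohort of 100 individuals in which 6 carry a +1 copy allele.
cohort = {f"sample{i}": ([0, 1] if i < 6 else [0, 0]) for i in range(100)}
print(is_commonly_polymorphic(cohort))
```

With the strict inequality, a locus whose non-reference alleles appear in exactly 5% of individuals would not be counted; whether the study treats that boundary case the same way is not stated in the abstract.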
Link | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8096271 | PMC |
http://dx.doi.org/10.1093/nar/gkab224 | DOI Listing |
Cell Genom
December 2024
Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
Curr Protoc
November 2024
Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia.
Short tandem repeats (STRs) and variable-number tandem repeats (VNTRs) are repetitive sequences found throughout the genome. Expansions of these repeats are currently known to cause ∼60 diseases, and expansions at new loci linked to rare diseases continue to be discovered. Genome sequencing is an important tool for detecting disease-causing variants, and several computational tools have been developed to analyze tandem repeats from genomic data, enabling genotyping and the identification of expanded alleles.
J Clin Microbiol
September 2024
Servicio de Microbiología Clínica y Enfermedades Infecciosas, Hospital General Universitario Gregorio Marañón, Madrid, Spain.
Am J Hum Genet
August 2024
Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
Genome Biol
June 2024
Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria.
Background: Variable number tandem repeats (VNTRs) are highly polymorphic DNA regions harboring many potentially disease-causing variants. However, VNTRs often appear unresolved ("dark") in variation databases due to their repetitive nature. One particularly complex and medically relevant example is the KIV-2 VNTR in the cardiovascular disease gene LPA; this VNTR encompasses up to 70% of the gene's coding sequence.