WebSTR: A Population-wide Database of Short Tandem Repeat Variation in Humans.

J Mol Biol

Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA; Department of Medicine, University of California San Diego, La Jolla, CA, USA.

Published: October 2023

AI Article Synopsis

  • Short tandem repeats (STRs) are variable DNA sequences made up of repeated nucleotide patterns, which can affect various human traits, including gene expression and disease risk.
  • Recent advances in genome analysis have allowed for the creation of large datasets that analyze STR variation across diverse populations, overcoming previous bioinformatics challenges.
  • WebSTR is a comprehensive database that catalogs genetic variation of STRs, using data from significant projects, and is accessible through a web portal and API for researchers.

Article Abstract

Short tandem repeats (STRs) are consecutive repetitions of one to six nucleotide motifs. They are hypervariable due to the high prevalence of repeat unit insertions or deletions, primarily caused by polymerase slippage during replication. Genetic variation at STRs has been shown to influence a range of traits in humans, including gene expression, cancer risk, and autism. Until recently, STRs have been poorly studied because they pose significant challenges to bioinformatics analyses. Moreover, genome-wide analysis of STR variation in population-scale cohorts requires large amounts of data and computational resources. However, the recent advent of genome-wide analysis tools has resulted in multiple large genome-wide datasets of STR variation spanning nearly two million genomic loci in thousands of individuals from diverse populations. Here we present WebSTR, a database of genetic variation and other characteristics of genome-wide STRs across human populations. WebSTR is based on reference panels of more than 1.7 million human STRs created with state-of-the-art repeat annotation methods and can easily be extended to include additional cohorts or species. It currently contains data based on STR genotypes for individuals from the 1000 Genomes Project, H3Africa, and the Genotype-Tissue Expression (GTEx) Project, and for colorectal cancer patients from the TCGA dataset. WebSTR is implemented as a relational database, with programmatic access available through an API and a web portal for browsing data. The web portal is publicly available at https://webstr.ucsd.edu.
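
To make the definition above concrete (consecutive repetitions of a one- to six-nucleotide motif), the following is a minimal Python sketch that scans a DNA string for perfect tandem repeats with a regular expression. It is only an illustration and is not the repeat annotation method behind the WebSTR reference panels, which the abstract does not describe. The function name `find_perfect_strs`, the `TandemRepeat` record, the thresholds `min_copies` and `min_length`, and the example sequence are all assumptions introduced here.

```python
import re
from typing import List, NamedTuple


class TandemRepeat(NamedTuple):
    start: int   # 0-based offset of the first repeat copy
    motif: str   # repeat unit, 1-6 nucleotides long
    copies: int  # number of consecutive copies of the motif


def find_perfect_strs(seq: str, min_copies: int = 3,
                      min_length: int = 10) -> List[TandemRepeat]:
    """Return perfect (interruption-free) tandem repeats found in seq.

    min_copies and min_length are illustrative thresholds chosen for
    this sketch; they are not taken from the WebSTR paper.
    """
    # Lazy {1,6}? makes the regex try the shortest motif first, so a
    # run such as AAAAAA is reported with motif "A" rather than "AA".
    # \1{n,} then requires at least n further back-to-back copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        length = match.end() - match.start()
        if length >= min_length:
            hits.append(TandemRepeat(match.start(), motif,
                                     length // len(motif)))
    return hits


# Hypothetical sequence: a (CAG)5 run and an (AT)6 run with short flanks.
example = "GGTT" + "CAG" * 5 + "TCGG" + "AT" * 6 + "GC"
for repeat in find_perfect_strs(example):
    print(repeat.start, repeat.motif, repeat.copies)
```

Insertions or deletions of whole repeat units, the main source of STR variation described above, would appear here as a different `copies` value at the same locus in different individuals. Real STR callers also have to handle imperfect repeats and sequencing errors, which this sketch ignores.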

Source
http://dx.doi.org/10.1016/j.jmb.2023.168260

Similar Publications

Evaluation of RMplex system for differentiating father-son pairs using Y-STRs in a Korean population.

Forensic Sci Int Genet

January 2025

Forensic DNA Division, National Forensic Service, Wonju, South Korea.

Y-chromosomal short tandem repeats (Y-STRs) at rapidly mutating (RM) loci have been suggested as tools for differentiating paternally related males. RMplex is a recently developed system that incorporates 26 RM loci and four fast-mutating (FM) loci, targeting 44 male-specific loci. Here, we evaluated the RMplex by estimating Y-STR mutation rates and the overall differentiation rates for 542 Korean father-son pairs, as well as the genetic population values for 409 unrelated males.

Background: Molecular diagnosis has become highly significant for patient management in oncology.

Methods: Here, 30 well-characterized clinical germline samples were studied with adaptive sampling to enrich the full sequence of 152 cancer predisposition genes. Sequencing was performed on Oxford Nanopore (ONT) R10.

Background: Frameshift variants in the variable number tandem repeat region of () cause autosomal dominant tubulointerstitial kidney disease (ADTKD-) but are challenging to detect. We investigated the prevalence in patients with kidney failure of undetermined aetiology and compared Danish families with ADTKD-.

Methods: We recruited patients with suspected kidney failure of undetermined aetiology at ≤50 years and excluded those with a clear-cut clinical or histopathological kidney diagnosis or established genetic kidney diseases identified through medical record review.

Deciphering the Coupling State-Dependent Transcription Termination in the Escherichia coli Galactose Operon.

Mol Microbiol

January 2025

Department of Biological Sciences, College of Biological Sciences and Biotechnology, Chungnam National University, Daejeon, Republic of Korea.

The distance between the ribosome and the RNA polymerase active centers, known as the mRNA loop length, is crucial for transcription-translation coupling. Despite the existence of multiple expressomes with varying mRNA loop lengths, their in vivo roles remain largely unexplored. This study examines the mechanisms governing transcription termination in the Escherichia coli galactose operon, revealing a crucial role in the transcription and translation coupling state.

The sex chromosomes contain complex, important genes impacting medical phenotypes, but differ from the autosomes in their ploidy and large repetitive regions. To enable technology developers along with research and clinical laboratories to evaluate variant detection on male sex chromosomes X and Y, we create a small variant benchmark set with 111,725 variants for the Genome in a Bottle HG002 reference material. We develop an active evaluation approach to demonstrate the benchmark set reliably identifies errors in challenging genomic regions and across short and long read callsets.
