The Structure of Simple Satellite Variation in the Human Genome and Its Correlation With Centromere Ancestry.

Genome Biol Evol

Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA.

Published: August 2024

AI Article Synopsis

  • The study applies k-Seek software to 2,504 human genomes from the 1,000 Genomes Project to quantify simple satellites (repeat units <20 bp); the ancestral monomer of Human Satellite 3 is the largest component, with a mean of about 8 Mb.
  • Researchers identified ~50,000 rare tandem repeats that are not detected in the T2T-CHM13v2.0 assembly, including previously undescribed variants of telomeric and pericentromeric repeats.
  • The most abundant repeats are broadly similar across populations, except that AG-rich repeats are more abundant in African individuals; pericentromeric simple satellite abundances track centromeric lineages, with Human Satellite 2 and Human Satellite 3 monomer abundances correlating with centromeric ancestry clusters on chromosomes 16 and 9.

Article Abstract

Although repetitive DNA forms much of the human genome, its study is challenging due to limitations in assembly and alignment of repetitive short reads. We have deployed k-Seek, software that detects tandem repeats embedded in single reads, on 2,504 human genomes from the 1,000 Genomes Project to quantify the variation and abundance of simple satellites (repeat units <20 bp). We find that the ancestral monomer of Human Satellite 3 makes up the largest portion of simple satellite content in humans (mean of ∼8 Mb). We discovered ∼50,000 rare tandem repeats that are not detected in the T2T-CHM13v2.0 assembly, including undescribed variants of telomeric and pericentromeric repeats. We find broad homogeneity of the most abundant repeats across populations, except for AG-rich repeats, which are more abundant in African individuals. We also find cliques of highly similar AG- and AT-rich satellites that are interspersed and form higher-order structures that covary in copy number across individuals, likely through concerted amplification via unequal exchange. Finally, we use pericentromeric polymorphisms to estimate centromeric genetic relatedness between individuals and find a strong predictive relationship between centromeric lineages and pericentromeric simple satellite abundances. In particular, abundances of the ancestral monomers of Human Satellite 2 and Human Satellite 3 correlate with clusters of centromeric ancestry on chromosome 16 and chromosome 9, with some clusters structured by population. These results provide new descriptions of the population dynamics that underlie the evolution of simple satellites in humans.
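
As a rough illustration of what detecting tandem repeats embedded in single reads can involve, the following Python sketch scans a read for exact tandem runs of short units (under 20 bp) and tallies copies per canonical unit. It is a toy example and not k-Seek's actual algorithm, which the abstract does not describe. The function names, the `min_copies` threshold, the exact-match requirement, and the choice of canonical form (smallest rotation of the unit or its reverse complement) are all assumptions made for illustration.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_unit(unit: str) -> str:
    """Smallest rotation of the unit or of its reverse complement.

    Collapses phase- and strand-equivalent units (e.g. GGAAT and CATTC)
    onto one key. Hypothetical convention, chosen for this sketch.
    """
    rc = unit.translate(COMPLEMENT)[::-1]
    return min(s[i:] + s[:i] for s in (unit, rc) for i in range(len(s)))


def is_primitive(unit: str) -> bool:
    """True if the unit is not itself a repeat of a shorter unit."""
    n = len(unit)
    return not any(n % d == 0 and unit[:d] * (n // d) == unit
                   for d in range(1, n))


def find_tandem_repeats(read: str, max_unit: int = 19, min_copies: int = 3):
    """Greedy left-to-right scan for exact tandem runs.

    Returns a list of (start, unit, copies). At each position the
    primitive unit covering the longest span is kept.
    """
    hits, i = [], 0
    while i < len(read):
        best = None  # (unit, copies)
        for k in range(1, max_unit + 1):
            unit = read[i:i + k]
            if len(unit) < k or not is_primitive(unit):
                continue
            copies = 1
            while read[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies and (
                best is None or copies * k > best[1] * len(best[0])
            ):
                best = (unit, copies)
        if best:
            hits.append((i, best[0], best[1]))
            i += len(best[0]) * best[1]  # jump past the run
        else:
            i += 1
    return hits


def count_units(reads, **kwargs) -> Counter:
    """Total copies per canonical unit across a collection of reads."""
    counts = Counter()
    for read in reads:
        for _, unit, copies in find_tandem_repeats(read, **kwargs):
            counts[canonical_unit(unit)] += copies
    return counts
```

For example, `count_units(["TC" + "GGAAT" * 4 + "CA", "CATTC" * 3])` assigns all seven copies to a single key, because (GGAAT)n and (CATTC)n are the two strands of the same repeat. A real tool would also need to tolerate sequencing errors and normalize counts by sequencing depth, which this sketch omits.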

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11305138
DOI: http://dx.doi.org/10.1093/gbe/evae153

Similar Publications

Cancer survivors have an increased risk of developing Type 2 diabetes compared to the general population. Patients treated with cisplatin, a common chemotherapeutic agent, are more likely to develop metabolic syndrome and Type 2 diabetes than age- and sex-matched controls. Surprisingly, the impact of cisplatin on pancreatic islets has not been reported.

Hemorrhagic stroke is a known complication of glioma, yet the underlying mechanisms remain poorly understood. This study aims to investigate key biomarkers of glioma-related hemorrhage to provide insights into glioma molecular therapies. Data were obtained from the Gene Expression Omnibus (GEO) and the Cancer Genome Atlas (TCGA) databases to analyze differentially expressed genes (DEGs) in glioma by contrasting glioblastoma (GBM) with low-grade gliomas (LGGs).

Background: Familial hyperlipidemia (familial hypercholesterolemia, FH) is an autosomal genetic disorder. It includes a heterozygous type, heterozygous familial hypercholesterolemia (HeFH). HeFH is mainly caused by mutations in the LDLR, APOB, and PCSK9 genes and is characterized by elevated plasma low-density lipoprotein cholesterol levels.

VirDetect-AI: a residual and convolutional neural network-based metagenomic tool for eukaryotic viral protein identification.

Brief Bioinform

November 2024

Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos 62210, México.

This study addresses the challenging task of identifying viruses within metagenomic data, which encompasses a broad array of biological samples, including animal reservoirs, environmental sources, and the human body. Traditional methods for virus identification often face limitations due to the diversity and rapid evolution of viral genomes. In response, recent efforts have focused on leveraging artificial intelligence (AI) techniques to enhance accuracy and efficiency in virus detection.

Identifying cancer prognosis genes through causal learning.

Brief Bioinform

November 2024

School of Artificial Intelligence, Jilin University, 3003 Qianjin Street, 130012 Changchun, China.

Accurate identification of causal genes for cancer prognosis is critical for estimating disease progression and guiding treatment interventions. In this study, we propose CPCG (Cancer Prognosis's Causal Gene), a two-stage framework identifying gene sets causally associated with patient prognosis across diverse cancer types using transcriptomic data. Initially, an ensemble approach models gene expression's impact on survival with parametric and semiparametric hazard models.
