Though the advent of long-read sequencing technologies has led to a leap in the contiguity of de novo genome assemblies, current reference genomes of higher organisms still do not provide unbroken sequences of complete chromosomes. Despite reads in excess of 30 000 base pairs, there are still repetitive structures that cannot be resolved by current state-of-the-art assemblers. The most challenging of these structures are tandemly arrayed repeats, which occur in the genomes of all eukaryotes. Untangling tandem repeat clusters is exceptionally difficult, since the rare differences between repeat copies are obscured by the high error rate of long reads. Solving this problem would constitute a major step towards computing fully assembled genomes. Here, using the Drosophila Histone Complex as an example, we demonstrate that machine learning algorithms can exploit the distinguishing patterns of single nucleotide variants between repeat copies, even in very noisy data, to resolve a large and highly conserved repeat cluster. The ideas explored in this paper are a first step towards the automated assembly of complex repeat structures and promise to be applicable to a wide range of eukaryotic genomes.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380962 | PMC |
http://dx.doi.org/10.1093/nar/gky1194 | DOI Listing |
J Biol Chem
January 2025
Department of Biochemistry and Molecular Biology and Hollings Cancer Center, Medical University of South Carolina, Charleston, SC 29425, USA.
Transposable element (TE) silencing in the germline is crucial for preserving genome integrity; its absence results in sterility and diminished developmental robustness. The Piwi-interacting RNA (piRNA) pathway is the primary small non-coding RNA mechanism by which TEs are silenced in the germline. Three piRNA-binding proteins promote piRNA pathway function in the germline: P-element-induced wimpy testis (Piwi), Aubergine (Aub), and Argonaute 3 (Ago3).
Genetica
January 2025
Dipartimento di Scienze, Università degli Studi "Roma Tre", Rome, Italy.
In most Eukaryota, telomeres are protected by the CST complex, composed of CTC1, STN1 and TEN1. In Drosophila, instead, another complex is present, composed of Modigliani, Tea and Verrocchio. We performed a search for STN1 orthologs in Arthropoda, in order to verify if Verrocchio can be considered as such.
Int J Mol Sci
December 2024
Koltzov Institute of Developmental Biology of Russian Academy of Sciences, 26 Vavilov Street, 119334 Moscow, Russia.
has two paralogs, and , related to the evolutionarily conserved family genes. In mammals, the family consists of , encoding transcription co-factors involved in the regulation of development and cell fate determination. The function of and in remains unclear.
bioRxiv
December 2024
Integrative Program for Biological and Genome Sciences, University of North Carolina, Chapel Hill, NC, 27599 USA.
Coordinated expression of replication-dependent (RD) histone genes occurs within the Histone Locus Body (HLB) during S phase, but the molecular steps in transcription that are cell cycle regulated are unknown. We report that RNA Pol II promotes HLB formation and is enriched in the HLB outside of S phase, including in G1-arrested cells that do not transcribe RD histone genes. In contrast, the transcription elongation factor Spt6 is enriched in HLBs only during S phase.
Nat Commun
January 2025
Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.