Repetitive DNA sequences cause genomic instability and are important genetic markers. Identifying repeats is a critical step in genome annotation and analysis. At the same time, repeats pose a technical challenge for genome assembly and alignment programs that use NGS data. RFGR is a comprehensive tool that finds exact repetitive sequences in complete genomes, assembled genomes, and NGS reads of prokaryotes. For complete genomes, RFGR uses suffix trees to find seed repeats of exact repetitive sequences of fixed length. For assembled genomes, RFGR uses a modified Bowtie aligner to find seed repeats of exact repetitive sequences in the contigs/scaffolds, which are then extended to maximal repeats. The repeats are classified, and for repeats near a gene, RFGR also reports the gene. For the control datasets of E. coli UTI89 and E. coli K12, RFGR reports 35,141 and 49,352 repeats, respectively. For NGS reads, RFGR uses the frequency of repetitive k-mers to identify FASTQ reads that contain repetitive sequences and removes them from the dataset. An E. coli K12 NGS dataset pre-processed with RFGR gives an improved assembly compared with the original dataset: the N50 value improves by 22.86%, and the size of the assembly graph decreases by nearly 50%. Thus, RFGR achieves a better assembly with reduced computation. RFGR could be improved with respect to the minimum repeat length it can find, by extending it to approximate repeats, and by making it applicable to eukaryotes.
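The read-filtering idea described above can be illustrated with a minimal Python sketch. This is not RFGR's implementation: the function names, the thresholds `min_count` and `max_fraction`, and the use of plain sequence strings instead of parsed FASTQ records are all assumptions made for illustration, and reverse complements are ignored.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def filter_repetitive_reads(reads, k=5, min_count=3, max_fraction=0.5):
    """Drop reads dominated by high-frequency k-mers.

    Hypothetical thresholds: a k-mer counts as repetitive when it occurs
    at least min_count times in the dataset, and a read is removed when
    more than max_fraction of its k-mers are repetitive.
    """
    counts = count_kmers(reads, k)
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            # Reads shorter than k cannot be judged, so keep them.
            kept.append(read)
            continue
        repetitive = sum(1 for km in read_kmers if counts[km] >= min_count)
        if repetitive / len(read_kmers) <= max_fraction:
            kept.append(read)
    return kept


reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "TTGCAGGCTA"]
filtered = filter_repetitive_reads(reads)
```

In this toy input, the three identical reads share every k-mer at least three times, so they are removed, while the unique read is kept. A real dataset would need much larger k, thresholds tuned to sequencing depth, and a memory-efficient k-mer counter.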
DOI: http://dx.doi.org/10.1007/s10528-023-10628-x
Mol Ecol Resour
January 2025
Manchester Institute of Biotechnology, School of Natural Sciences, University of Manchester, Manchester, UK.
Collagen is the most ubiquitous protein in the animal kingdom and one of the most abundant proteins on Earth. Despite the relatively repetitive amino acid sequence motif that enables its triple-helical structure, type 1 collagen, which dominates skin and bone, shows enough variation to be increasingly used for the biomolecular species identification of animal tissues processed or degraded beyond the amenability of DNA-based analyses. In recent years, this has most commonly been achieved through collagen peptide mass fingerprinting (PMF), a technique known as ZooMS (Zooarchaeology by Mass Spectrometry), which has been applied to tens of thousands of samples across more than one hundred studies in the past decade alone.
Nat Commun
January 2025
Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA.
Studies of the genetics of Alzheimer's disease (AD) have largely focused on single nucleotide variants and short insertions/deletions. However, most of the disease heritability has yet to be uncovered, suggesting that there is substantial genetic risk conferred by other forms of genetic variation. There are over one million short tandem repeats (STRs) in the genome, and their link to AD risk has not been assessed.
Open Biol
January 2025
Department of Epigenetics, Medical Research Institute (MRI), Tokyo Medical and Dental University (TMDU), Tokyo 113-8510, Japan.
Retrotransposon Gag-like (RTL) 8A, 8B and 8C are eutherian-specific genes derived from a certain retrovirus. They cluster as a triplet of genes on the X chromosome, but their function remains unknown. Here, we demonstrate that and play important roles in the brain: their double knockout (DKO) mice not only exhibit reduced social responses and increased apathy-like behaviour, but also become obese from young adulthood, similar to patients with late Prader-Willi syndrome (PWS), a neurodevelopmental genomic imprinting disorder.
BMC Microbiol
January 2025
School of Biological Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
Antimicrobial resistance (AMR) in soil is an ancient phenomenon with widespread spatial presence in terrestrial ecosystems. However, the natural processes shaping the temporal dissemination of AMR in soils are not well understood. We aimed to determine whether, how, and why AMR varies with soil age in recently deglaciated pioneer and developing Arctic soils using a space-for-time approach.
Sci Rep
January 2025
Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), 8A Biomedical Grove, #04-06 Immunos, Singapore, 138648, Singapore.
The tumor suppressor LKB1/STK11 plays important roles in regulating cellular metabolism and stress responses and its mutations are associated with various cancers. We recently identified a novel exon 1b within intron 1 of human LKB1/STK11, which generates an alternatively spliced, mitochondria-targeting LKB1 isoform important for regulating mitochondrial oxidative stress. Here we examined the formation of this novel exon 1b and uncovered its relatively late emergence during evolution.