RFGR: Repeat Finder for Complete and Assembled Whole Genomes and NGS Reads.

Biochem Genet

Department of Computational Biology and Bioinformatics, University of Kerala, Karyavattom, Trivandrum, Kerala, India.

Published: October 2024

Repetitive DNA sequences cause genomic instability and are important genetic markers. Identification of repeats is a critical step in genome annotation and analysis. At the same time, repeats pose a technical challenge for genome assembly and alignment programs that use NGS data. RFGR is a comprehensive tool that can find exact repetitive sequences in complete genomes and assembled genomes, as well as in NGS reads of prokaryotes. For complete genomes, RFGR uses a suffix tree to find seed repeats of repetitive sequences of fixed length with indels. For assembled genomes, RFGR uses a modified Bowtie aligner to find seed repeats of exact repetitive sequences in the contigs/scaffolds, which are then extended to maximal repeats. The repeats are classified, and for repeats near a gene, RFGR reports the gene as well. For the control datasets of E. coli UTI89 and E. coli K12, RFGR reports 35,141 and 49,352 repeats, respectively. For NGS reads, RFGR uses the frequency of repetitive k-mers to identify FASTQ reads containing repetitive sequences and removes them from the dataset. An E. coli K12 NGS dataset pre-processed with RFGR gives an improved assembly compared with the original dataset: the N50 value improves by 22.86%, and the size of the assembly graph decreases by nearly 50%. Thus, with RFGR, we achieve a better assembly with reduced computation. RFGR can be improved in terms of the minimum repeat length it finds, by extending it to find approximate repeats, and by making it applicable to eukaryotes as well.
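The read-filtering step and the N50 metric mentioned in the abstract can be illustrated with a minimal Python sketch. This is not RFGR's implementation: the function names and the parameters `k`, `min_count`, and `max_fraction` are hypothetical, chosen only to show the general idea of dropping reads dominated by high-frequency k-mers and then comparing assemblies by N50.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def filter_repetitive_reads(reads, k=4, min_count=3, max_fraction=0.5):
    """Keep reads in which at most max_fraction of k-mers are repetitive.

    A k-mer is treated as repetitive when it occurs at least min_count
    times in the whole dataset. All thresholds are illustrative
    assumptions, not values taken from RFGR.
    """
    counts = count_kmers(reads, k)
    kept = []
    for read in reads:
        total = len(read) - k + 1
        if total <= 0:
            kept.append(read)  # read shorter than k: nothing to judge
            continue
        repetitive = sum(1 for km in kmers(read, k) if counts[km] >= min_count)
        if repetitive / total <= max_fraction:
            kept.append(read)
    return kept


def n50(lengths):
    """Return the N50: the contig length at which the running total of
    lengths, taken from longest to shortest, reaches half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    reads = ["ACACACACAC", "ACACACACAC", "GATTACAGGC"]
    print(filter_repetitive_reads(reads))
    print(n50([2, 3, 4, 5, 6]))
```

In this toy input, the two `AC`-rich reads consist entirely of k-mers that recur across the dataset and are dropped, while the third read is kept. A real pipeline would parse FASTQ records, use a much larger `k`, and calibrate the thresholds against sequencing depth.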

Source
http://dx.doi.org/10.1007/s10528-023-10628-x

Similar Publications

Quantifying Bone Collagen Fingerprint Variation Between Species.

Mol Ecol Resour

January 2025

Manchester Institute of Biotechnology, School of Natural Sciences, University of Manchester, Manchester, UK.

Collagen is the most ubiquitous protein in the animal kingdom and one of the most abundant proteins on Earth. Despite having a relatively repetitive amino acid sequence motif that enables its triple helical structure, type 1 collagen, which dominates skin and bone, shows enough variation to be increasingly used for the biomolecular species identification of animal tissues processed or degraded beyond the amenability of DNA-based analyses. In recent years, this has been most commonly achieved through the technique of collagen peptide mass fingerprinting (PMF) known as ZooMS (Zooarchaeology by Mass Spectrometry), applied to the analysis of tens of thousands of samples across over one hundred studies in the past decade alone.

Studies of the genetics of Alzheimer's disease (AD) have largely focused on single nucleotide variants and short insertions/deletions. However, most of the disease heritability has yet to be uncovered, suggesting that there is substantial genetic risk conferred by other forms of genetic variation. There are over one million short tandem repeats (STRs) in the genome, and their link to AD risk has not been assessed.

Retrotransposon Gag-like (RTL) 8A, 8B and 8C are eutherian-specific genes derived from a certain retrovirus. They cluster as a triplet of genes on the X chromosome, but their function remains unknown. Here, we demonstrate that and play important roles in the brain: their double knockout (DKO) mice not only exhibit reduced social responses and increased apathy-like behaviour, but also become obese from young adulthood, similar to patients with late Prader-Willi syndrome (PWS), a neurodevelopmental genomic imprinting disorder.

Antimicrobial resistance (AMR) in soil is an ancient phenomenon with widespread spatial presence in terrestrial ecosystems. However, the natural processes shaping the temporal dissemination of AMR in soils are not well understood. We aimed to determine whether, how, and why AMR varies with soil age in recently deglaciated pioneer and developing Arctic soils using a space-for-time approach.

Alu-Sc-mediated exonization generated a mitochondrial LKB1 gene variant found only in higher order primates.

Sci Rep

January 2025

Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), 8A Biomedical Grove, #04-06 Immunos, Singapore, 138648, Singapore.

The tumor suppressor LKB1/STK11 plays important roles in regulating cellular metabolism and stress responses and its mutations are associated with various cancers. We recently identified a novel exon 1b within intron 1 of human LKB1/STK11, which generates an alternatively spliced, mitochondria-targeting LKB1 isoform important for regulating mitochondrial oxidative stress. Here we examined the formation of this novel exon 1b and uncovered its relatively late emergence during evolution.
