AI Article Synopsis

  • Retrotransposons, especially the L1 element, have shaped mammalian genomes and contribute to human disease; L1 makes up 17% of the human genome, although most copies are truncated or defective.
  • The study generated whole-cell, cytoplasmic and nuclear RNA-Seq data from 22Rv1 prostate cancer cells to compare the quality of, and effort required for, identifying expressed full-length L1s. Whole-cell, strand-specific RNA-Seq lost little data relative to cytoplasmic, strand-specific RNA-Seq, but needed more manual curation.
  • The research concludes that, with rigorous manual curation, both cytoplasmic and whole-cell stranded RNA-Seq datasets can identify expressed L1 loci, whereas non-strand-specific datasets lose about half of the data.

Article Abstract

Background: Retrotransposons are one of the oldest evolutionary forces shaping mammalian genomes, with the ability to mobilize from one genomic location to another. This mobilization is also a significant factor in human disease. The only autonomous human retroelement, L1, has propagated to make up 17% of the human genome, accumulating over 500,000 copies. The majority of these loci are truncated or defective with only a few reported to remain capable of retrotransposition. We have previously published a strand-specific RNA-Seq bioinformatics approach to stringently identify at the locus-specific level the few expressed full-length L1s using cytoplasmic RNA. With growing repositories of RNA-Seq data, there is potential to mine these datasets to identify and study expressed L1s at single-locus resolution, although many datasets are not strand-specific or not generated from cytoplasmic RNA.

Results: We generated whole-cell, cytoplasmic and nuclear RNA-Seq datasets from 22Rv1 prostate cancer cells to test the influence of different preparations on the quality and effort needed to measure L1 expression. We found that there was minimal data loss in the identification of full-length expressed L1s using whole-cell, strand-specific RNA-Seq data compared to cytoplasmic, strand-specific RNA-Seq data. However, this was only possible with an increased amount of manual curation of the bioinformatics output to eliminate the higher background. About half of the data was lost when the sequenced datasets were non-strand-specific.
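To illustrate why strand information matters here, the following is a minimal Python sketch, not the authors' published pipeline. It counts reads overlapping an L1 locus with and without a strand check. The `Interval` record, the `count_reads` function, the locus name `L1_a` and all coordinates are hypothetical and invented for illustration; reads are assumed to be already oriented to the strand of the transcript they came from.

```python
from collections import namedtuple

# Hypothetical minimal record; a real workflow would parse BAM and annotation files.
Interval = namedtuple("Interval", "chrom start end strand")


def overlaps(a, b):
    """Return True if two half-open intervals on the same chromosome overlap."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_reads(loci, reads, stranded=True):
    """Count reads overlapping each locus.

    With stranded=True, only reads on the same strand as the locus are counted.
    With stranded=False, every overlapping read is counted, so antisense
    transcription through the locus cannot be told apart from L1 expression.
    """
    counts = {}
    for name, locus in loci.items():
        counts[name] = sum(
            1
            for read in reads
            if overlaps(locus, read)
            and (not stranded or read.strand == locus.strand)
        )
    return counts


# Toy data (assumed values): one plus-strand L1 locus and four reads.
loci = {"L1_a": Interval("chr1", 1000, 7000, "+")}
reads = [
    Interval("chr1", 1200, 1300, "+"),  # sense read inside the locus
    Interval("chr1", 2500, 2600, "-"),  # antisense read inside the locus
    Interval("chr1", 6950, 7050, "-"),  # antisense read crossing the 3' end
    Interval("chr1", 8000, 8100, "+"),  # read outside the locus
]

stranded_counts = count_reads(loci, reads, stranded=True)
unstranded_counts = count_reads(loci, reads, stranded=False)
print(stranded_counts, unstranded_counts)
```

In this toy case the unstranded count is inflated by the two antisense reads, which is the kind of background that a strand check removes and that otherwise has to be resolved by other means, such as manual inspection.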

Conclusions: The results of these studies demonstrate that, with rigorous manual curation, stranded RNA-Seq datasets allow identification of expressed L1 loci from either cytoplasmic or whole-cell RNA.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6945437
DOI: http://dx.doi.org/10.1186/s13100-019-0194-z

Publication Analysis

Top Keywords

rna-seq data: 12
rna-seq datasets: 12
rna-seq: 8
strand-specific rna-seq: 8
manual curation: 8
datasets: 6
cytoplasmic: 5
data: 5
comparative analysis: 4
analysis expression: 4
