When endogenous retroviruses (ERVs) or other transposable elements (TEs) insert into an intron, the consequences for gene transcription can range from negligible to complete ablation of normal transcripts. With advances in sequencing technology, more and more insertionally polymorphic or private TE insertions are being identified in humans and mice, some of which could have a significant impact on host gene expression. Nevertheless, an efficient, low-cost approach to prioritizing their potential effects on gene transcription has been lacking. By building a computational model based on artificial neural networks (ANNs), we demonstrate the feasibility of using machine-learning approaches to predict the likelihood that intronic ERV insertions will have major effects on gene transcription. We focus on two ERV families, Intracisternal A-type Particle (IAP) and Early Transposon (ETn)/MusD elements, which are responsible for the majority of ERV-induced mutations in mice. We trained the ANN model using properties associated with ERVs of these families known to cause germ-line mutations (positive cases) and properties associated with likely neutral ERVs of the same families (negative cases), and derived a set of prediction plots that visualize the likelihood that an ERV insertion affects gene transcription. Our results show that the model has highly reliable predictive power and offer a potential approach to computationally screening for other types of TE insertions that may affect gene transcription or even cause disease.
Download full-text PDF | Source
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3735543 | PMC
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071971 | PLOS