iProEP: A Computational Predictor for Predicting Promoter.

Mol Ther Nucleic Acids

Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.

Published: September 2019

AI Article Synopsis

  • Promoters are key DNA elements near the transcription start site that regulate gene transcription and are critical for understanding gene structure and regulation.
  • This study combined pseudo k-tuple nucleotide composition with a position-correlation scoring function to identify promoter sequences in several organisms, including humans and bacteria.
  • The method showed high accuracy in distinguishing promoters from non-promoters (up to 95.7%) and outperformed existing techniques, with a user-friendly online server created for broader access to the tool.
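As a rough illustration of the sequence encoding mentioned in the synopsis, the sketch below computes plain k-tuple (k-mer) nucleotide frequencies for a DNA sequence. It is a simplification, not the authors' implementation: the "pseudo" component of PseKNC, which adds correlation terms based on physicochemical properties of the nucleotides, is omitted, and the function name `ktuple_composition` and the default `k=2` are assumptions made for this example.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Return normalized k-tuple frequencies of a DNA sequence.

    Hypothetical helper for illustration only. The output is ordered
    lexicographically over the alphabet ACGT (AA, AC, AG, AT, CA, ...).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k along the sequence.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


features = ktuple_composition("TATAAT", k=2)
print(len(features))  # 16 dinucleotide frequencies
```

Each sequence becomes a fixed-length vector (4^k entries) that can be passed to a feature selection step and a classifier, regardless of the original sequence length.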

Article Abstract

A promoter is a fundamental DNA element located around the transcription start site (TSS) that regulates gene transcription. Promoter recognition is of great significance in determining transcription units, studying gene structure, analyzing gene regulation mechanisms, and annotating gene functional information. Many models have already been proposed to predict promoters, but their performance still needs to be improved. In this work, we combined pseudo k-tuple nucleotide composition (PseKNC) with a position-correlation scoring function (PCSF) to formulate promoter sequences of Homo sapiens (H. sapiens), Drosophila melanogaster (D. melanogaster), Caenorhabditis elegans (C. elegans), Bacillus subtilis (B. subtilis), and Escherichia coli (E. coli). The minimum redundancy maximum relevance (mRMR) algorithm and an incremental feature selection strategy were then adopted to find the optimal feature subsets. A support vector machine (SVM) was used to distinguish promoters from non-promoters. In the 10-fold cross-validation test, accuracies of 93.3%, 93.9%, 95.7%, 95.2%, and 93.1% were obtained for H. sapiens, D. melanogaster, C. elegans, B. subtilis, and E. coli, with areas under the receiver operating characteristic curve (AUCs) of 0.974, 0.975, 0.981, 0.988, and 0.976, respectively. Comparative results demonstrated that our method outperforms existing methods for identifying promoters. An online web server was established and can be freely accessed (http://lin-group.cn/server/iProEP/).
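The two metrics reported above, accuracy and AUC, can be computed from a classifier's decision scores as in the following minimal sketch. The function names, the toy labels and scores, and the threshold are assumptions chosen for illustration; they are not taken from the paper, and the AUC here uses the standard pairwise (Mann-Whitney) formulation rather than any specific routine the authors may have used.

```python
def accuracy(labels, scores, threshold=0.0):
    """Fraction of samples whose thresholded score matches the label (1 = promoter)."""
    correct = sum((s > threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: hypothetical scores for two promoters and two non-promoters.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(accuracy(labels, scores, threshold=0.5))  # 0.5
print(roc_auc(labels, scores))                  # 0.75
```

In a 10-fold cross-validation setting, these functions would be applied to the held-out fold in each round. Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why abstracts such as this one typically report both.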

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6616480
DOI: http://dx.doi.org/10.1016/j.omtn.2019.05.028
