iPromoter-ET: Identifying promoters and their strength by extremely randomized trees-based feature selection.

Yunyun Liang, Shengli Zhang, Huijuan Qiao, Yingying Yao

Anal Biochem

School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China.

Published: October 2021

A promoter is a region of DNA that determines the transcription of a particular gene. RNA polymerase contains several σ factors, whose function is to identify the promoter and facilitate the binding of the RNA polymerase to it. Given the importance of promoters in genome research and the avalanche of DNA sequences discovered in the post-genomic age, there is an urgent need for computational tools that effectively identify promoters and their strength. In this paper, we develop a model named iPromoter-ET that uses k-mer nucleotide composition, binary encoding and dinucleotide property matrix-based distance transformation for feature extraction, and extremely randomized trees (extra trees) for feature selection. Its 1st layer identifies whether a DNA sequence is a promoter or not, while its 2nd layer classifies promoter samples as strong or weak promoters. A support vector machine is used to perform identification, and five-fold cross-validation is used to assess performance. The results indicate that our model remarkably outperforms the existing models in both the 1st and 2nd layers in terms of accuracy and stability. We anticipate that our proposed model will become a very effective intelligent tool, or at the least, a complementary tool to the existing methods of identifying promoters and their strength. Moreover, the datasets and codes for iPromoter-ET are freely available at https://github.com/shengli0201/iPromoter-ET.
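Two of the feature extraction steps named in the abstract, k-mer nucleotide composition and binary encoding, can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so the code below assumes the common ones: normalized frequencies of overlapping k-mers, and a 4-bit one-hot vector per nucleotide. The function names, the choice of k = 1, 2, 3 and the handling of ambiguous bases are hypothetical and are not taken from the iPromoter-ET repository. The dinucleotide property matrix-based distance transformation, the extra trees feature selection and the support vector machine are not shown.

```python
from itertools import product

ALPHABET = "ACGT"

# Assumed one-hot scheme: one 4-bit vector per nucleotide.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def kmer_composition(seq, k=2):
    """Return normalized frequencies of all 4**k overlapping k-mers,
    ordered lexicographically over the alphabet A, C, G, T."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping windows
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # windows with ambiguous bases (e.g. N) are skipped
            counts[window] += 1
    return [counts[m] / total for m in kmers]


def binary_encoding(seq):
    """Concatenate one-hot vectors for each position in the sequence.
    Unknown bases are encoded as all zeros (an assumption of this sketch)."""
    vec = []
    for base in seq.upper():
        vec.extend(ONE_HOT.get(base, [0, 0, 0, 0]))
    return vec


def extract_features(seq, ks=(1, 2, 3)):
    """Concatenate k-mer compositions for several k with the binary encoding."""
    features = []
    for k in ks:
        features.extend(kmer_composition(seq, k))
    features.extend(binary_encoding(seq))
    return features


if __name__ == "__main__":
    example = "ACGTACGT"  # toy sequence, not from the paper's dataset
    vector = extract_features(example)
    print(len(vector))
```

For a sequence of length L, this sketch yields 4 + 16 + 64 composition values plus 4L binary values. In a pipeline like the one the abstract describes, such vectors would then go through feature selection before classification.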

DOI: http://dx.doi.org/10.1016/j.ab.2021.114335
