Promoters are the basic functional cis-elements to which RNA polymerase binds to initiate gene transcription. A comprehensive understanding of gene expression and regulation depends on the precise identification of promoters, as they are the most important component of gene expression. This study aimed to develop a machine learning-based model to predict promoters in (). In the prediction model, the promoter sequences in genome were encoded by pseudo k-tuple nucleotide composition (PseKNC) and a position-correlation scoring function (PCSF). The resulting numerical features were then optimized using minimum redundancy maximum relevance (mRMR) feature selection combined with a support vector machine (SVM) and 5-fold cross-validation (CV). Subsequently, the optimized features were fed into an SVM-based classifier to discriminate promoter sequences from non-promoter sequences in . Results of 10-fold CV showed that the model achieved an overall accuracy of 96.0% and an area under the ROC curve (AUC) of 0.990. We hope that this model will aid the study of promoters and gene regulation in .
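
As a rough illustration of the sequence-encoding step, the sketch below computes normalized k-tuple (k-mer) frequencies for a DNA sequence. This covers only the composition part that PseKNC builds on; the physicochemical correlation terms of PseKNC, the PCSF features, mRMR selection, and the SVM classifier are not shown. The function name `ktuple_composition`, the choice of k = 2, and the example sequence are assumptions made for illustration and are not taken from the study.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Return normalized frequencies of all 4**k k-tuples in seq.

    Features are ordered lexicographically over the alphabet A, C, G, T.
    Hypothetical helper for illustration; not the study's implementation.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k along the sequence.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Example: dinucleotide composition of a short made-up sequence.
features = ktuple_composition("ATGCATGC", k=2)
```

A vector like `features` (16 values for k = 2) could then be one input row for a classifier, with one row per promoter or non-promoter sequence.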

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10215528
DOI: http://dx.doi.org/10.3389/fmicb.2023.1200678