i4mC-EL: Identifying DNA N4-Methylcytosine Sites in the Mouse Genome Using Ensemble Learning.

Biomed Res Int

College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China.

Published: October 2021

As one of the important epigenetic modifications, DNA N4-methylcytosine (4mC) plays a crucial role in controlling gene replication, expression, cell cycle, DNA replication, and differentiation. Accurate identification of 4mC sites is necessary to understand their biological functions. In this paper, we use ensemble learning to develop a model named i4mC-EL to identify 4mC sites in the mouse genome. First, a multifeature encoding scheme consisting of Kmer and EIIP was adopted to describe the DNA sequences. Second, on the basis of this multifeature encoding scheme, we developed a stacked ensemble model in which four machine learning algorithms, namely, BayesNet, NaiveBayes, LibSVM, and Voted Perceptron, serve as base classifiers whose intermediate results are used as input to the metaclassifier, Logistic. Experimental results on the independent test dataset show that the overall prediction accuracy of i4mC-EL is 82.19%, outperforming existing methods. The user-friendly website implementing i4mC-EL can be accessed freely at the following.
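
The multifeature encoding step can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: the EIIP values (A 0.1260, C 0.1340, G 0.0806, T 0.1335) are the ones commonly used in the literature, while the choice of k values (1 to 3), the normalization of Kmer counts by the number of sliding windows, and the simple concatenation of Kmer and per-position EIIP features are assumptions, as are the hypothetical helper names kmer_features, eiip_features, and encode.

```python
from itertools import product

# Commonly used EIIP (electron-ion interaction pseudopotential) value per nucleotide.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def kmer_features(seq: str, k: int) -> list[float]:
    """Normalized frequency of each of the 4**k possible k-mers, in lexicographic order.

    Assumption: counts are divided by the number of length-k windows in the sequence.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    for i in range(max(total, 0)):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other ambiguity codes
            counts[window] += 1
    return [counts[m] / total if total > 0 else 0.0 for m in kmers]


def eiip_features(seq: str) -> list[float]:
    """Replace each nucleotide with its EIIP value (0.0 for unknown bases)."""
    return [EIIP.get(base, 0.0) for base in seq.upper()]


def encode(seq: str, ks=(1, 2, 3)) -> list[float]:
    """Hypothetical combined encoding: Kmer features for each k, then per-position EIIP."""
    vec: list[float] = []
    for k in ks:
        vec.extend(kmer_features(seq, k))
    vec.extend(eiip_features(seq))
    return vec


if __name__ == "__main__":
    demo = "ACGTC"
    print(kmer_features(demo, 1))  # [0.2, 0.4, 0.2, 0.2]
    print(len(encode(demo)))       # 4 + 16 + 64 + 5 = 89
```

Under these assumptions, a hypothetical 41-bp window would yield 4 + 16 + 64 + 41 = 125 features; vectors of this kind are what the base classifiers of a stacked ensemble would receive as input.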

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8187051
DOI: http://dx.doi.org/10.1155/2021/5515342
