N4-methylcytosine (4mC) is an important and common DNA methylation that is widespread in prokaryotes. It plays a crucial role in correcting DNA replication errors and in protecting host DNA from degradation by restriction enzymes. Accurate identification of 4mC sites is therefore of great significance for understanding biological functions and treating genetic diseases. In this paper, we design a novel model for identifying 4mC sites. First, we extract features from the original sequences using multi-source feature representation methods: mono-nucleotide binary and k-mer frequency, dinucleotide binary and position-specific frequency, ring-function-hydrogen chemical properties, and dinucleotide-based and trinucleotide-based DNA properties. Next, a gradient boosting decision tree is applied to select the optimal feature set and remove redundant information. Finally, a support vector machine is employed to classify each site as 4mC or non-4mC. The accuracies on six datasets reach 0.851, 0.859, 0.801, 0.87, 0.859 and 0.901, respectively, which are higher than those of previous prediction methods. These results show that our predictor is a feasible and effective tool for identifying 4mC sites. Furthermore, an online web server is available at http://dnan4c.zhanglab.site.
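The feature-extraction step can be illustrated with a minimal Python sketch of three of the listed encodings: mono-nucleotide binary (one-hot), k-mer frequency, and ring-function-hydrogen chemical properties. The function names, the default k values, and the bit assignment in `NCP` (the commonly used convention A=(1,1,1), C=(0,1,0), G=(1,0,0), T=(0,0,1) for ring structure, functional group and hydrogen bonding) are assumptions for illustration, not the authors' implementation. The dinucleotide, position-specific and physicochemical-property encodings are omitted.

```python
from itertools import product

# Assumed alphabet; sequences with other symbols (e.g. N) are rejected.
NUCLEOTIDES = "ACGT"

# Hypothetical table: (ring structure, functional group, hydrogen bonding).
# This is a commonly cited convention; the paper may use different bits.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def check(seq: str) -> str:
    """Upper-case the sequence and reject symbols outside ACGT."""
    seq = seq.upper()
    bad = set(seq) - set(NUCLEOTIDES)
    if bad:
        raise ValueError(f"unexpected symbols: {sorted(bad)}")
    return seq


def mono_binary(seq: str) -> list[int]:
    """Mono-nucleotide binary (one-hot) encoding: 4 bits per position."""
    seq = check(seq)
    return [int(base == n) for base in seq for n in NUCLEOTIDES]


def kmer_frequency(seq: str, k: int) -> list[float]:
    """Relative frequency of each of the 4**k k-mers, in lexicographic order."""
    seq = check(seq)
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping k-mer windows
    for i in range(total):
        counts[seq[i:i + k]] += 1
    if total <= 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def chemical_properties(seq: str) -> list[int]:
    """Ring-function-hydrogen encoding: 3 bits per position."""
    seq = check(seq)
    return [bit for base in seq for bit in NCP[base]]


def encode(seq: str, ks: tuple[int, ...] = (1, 2, 3)) -> list[float]:
    """Concatenate the sketched encodings into one feature vector."""
    features: list[float] = []
    features += mono_binary(seq)
    features += chemical_properties(seq)
    for k in ks:
        features += kmer_frequency(seq, k)
    return features
```

For a 41-nt input window, `encode` yields 41*4 + 41*3 + 4 + 16 + 64 = 371 features. In the pipeline described above, a matrix of such vectors would then be reduced by gradient boosting and passed to an SVM; in scikit-learn, for example, one could rank columns by `GradientBoostingClassifier.feature_importances_` and train `sklearn.svm.SVC` on the top-ranked subset, though the selection cutoff and hyperparameters used by the authors are not given here.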

Source: http://dx.doi.org/10.1016/j.ab.2022.114746