Motivation: Accurate genome-wide identification of N4-methylcytosine (4mC) modifications can provide insights into their biological functions and mechanisms. Machine learning methods have recently become effective approaches for the computational identification of 4mC sites in genomes. Unfortunately, existing methods do not achieve satisfactory performance, owing to the lack of effective DNA feature representations capable of capturing the characteristics of 4mC modifications.

Results: In this work, we developed a new predictor, 4mcPred-IFL, to identify 4mC sites. To capture discriminative features, we proposed an iterative feature representation algorithm that learns informative features from several sequential models in a supervised, iterative manner. Our analysis showed that the feature representations learnt by our algorithm capture the discriminative distribution characteristics of 4mC and non-4mC sites, enlarging the decision margin between positives and negatives in the feature space. Additionally, evaluation against state-of-the-art predictors on benchmark datasets demonstrated that our predictor identifies 4mC sites more accurately.
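The abstract describes iterative feature learning only at a high level. The sketch below illustrates one plausible reading of the general idea and is not the authors' implementation: each "sequential model" is assumed here to be a k-mer composition view (k = 1, 2, 3), a plain logistic regression stands in for whatever classifier 4mcPred-IFL uses, and in every round the predicted probabilities from all views become the learnt feature vector that is appended to each view before retraining. All function names are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"


def kmer_composition(seq, k):
    """Normalized frequency of each of the 4**k k-mers in seq (assumed 'sequential model' view)."""
    index = {"".join(p): i for i, p in enumerate(product(BASES, repeat=k))}
    counts = [0.0] * len(index)
    n_windows = max(len(seq) - k + 1, 1)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other symbols
            counts[j] += 1.0
    return [c / n_windows for c in counts]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(X, y, epochs=200, lr=0.5):
    """Batch gradient-descent logistic regression (stand-in classifier); returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xij in enumerate(xi):
                gw[j] += err * xij
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def iterative_feature_learning(seqs, labels, ks=(1, 2, 3), rounds=3):
    """Return one learnt probability feature per view for every sequence.

    Round 1 trains one classifier per k-mer view; each later round appends the
    previous round's probabilities to every view and retrains. Predictions are
    in-sample for brevity; a real pipeline should use out-of-fold predictions.
    """
    views = [[kmer_composition(s, k) for s in seqs] for k in ks]
    learnt = [[] for _ in seqs]  # no learnt features before round 1
    for _ in range(rounds):
        columns = []
        for view in views:
            X = [x + p for x, p in zip(view, learnt)]
            w, b = train_logistic(X, labels)
            columns.append([predict_proba(w, b, x) for x in X])
        learnt = [list(row) for row in zip(*columns)]  # transpose to per-sequence rows
    return learnt
```

Calling `iterative_feature_learning(seqs, labels)` on labelled 4mC and non-4mC windows yields a compact vector of `len(ks)` probabilities per sequence; the intent, as in the abstract, is that positives and negatives become easier to separate in this learnt space than in the raw k-mer space.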

Availability and Implementation: A user-friendly web server implementing 4mcPred-IFL is freely accessible at http://server.malab.cn/4mcPred-IFL.

Supplementary Information: Supplementary data are available at Bioinformatics online.


Source: http://dx.doi.org/10.1093/bioinformatics/btz408


Similar Publications

iDNA-ITLM: An interpretable and transferable learning model for identifying DNA methylation.

PLoS One

October 2024

School of Information and Communication Engineering, Hainan University, Haikou, Hainan, China.

In this study, approaching the problem from an image-processing perspective, we propose the iDNA-ITLM model for identifying DNA methylation sites. It uses a novel data enhancement strategy that repeatedly self-replicates a short DNA sequence into a longer one and then embeds it into a high-dimensional matrix to enlarge the receptive field. Our model consistently outperforms current state-of-the-art sequence-based DNA methylation site recognition methods on 17 benchmark datasets covering multiple species and three DNA methylation modifications (4mC, 5hmC, and 6mA). The experimental results demonstrate the robustness and superior performance of our model across these datasets.
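The self-replication step can be illustrated with a small sketch. It assumes that "self-replicating" means tiling the sequence end to end up to a target length, and it uses a fixed one-hot encoding folded into a square grid as a stand-in for the paper's high-dimensional embedding; the grid layout, the function names, and the one-hot choice are all hypothetical, not taken from iDNA-ITLM.

```python
BASES = "ACGT"


def self_replicate(seq, target_len):
    """Tile seq end to end until it reaches target_len, then truncate (assumed replication rule)."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    copies = -(-target_len // len(seq))  # ceiling division
    return (seq * copies)[:target_len]


def one_hot(seq):
    """len(seq) x 4 matrix; columns follow the order A, C, G, T."""
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def to_image(seq, side):
    """Replicate seq to side*side bases and fold it into a side x side grid of one-hot pixels."""
    pixels = one_hot(self_replicate(seq, side * side))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]
```

For instance, `to_image(seq, 32)` turns a 41-bp window into a 32 x 32 x 4 array of 1,024 replicated bases, which a 2D convolutional network could then treat like an image with four channels.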


An Integrated Multi-Model Framework Utilizing Convolutional Neural Networks Coupled with Feature Extraction for Identification of 4mC Sites in DNA Sequences.

Comput Biol Med

December 2024

Department of Management Information Systems (MIS), School of Business, King Faisal University (KFU), Al-Ahsa, 31982, Saudi Arabia.

N4-methylcytosine (4mC) is a chemical modification of one of the four nucleotide bases in DNA and plays a vital role in DNA expression, repair, and replication. It also actively participates in the regulation of cell differentiation and gene expression. Understanding the role of 4mC in epigenetic regulation is therefore important for revealing the complexities of gene expression and the cellular processes that govern it.


Using a hybrid neural network architecture for DNA sequence representation: A study on N4-methylcytosine sites.

Comput Biol Med

August 2024

Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, 110, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, 110, Taiwan; AIBioMed Research Group, Taipei Medical University, Taipei, 110, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, 110, Taiwan.

N4-methylcytosine (4mC) is a modified form of cytosine found in DNA that contributes to epigenetic regulation. It occurs in various genomes, including those of the Rosaceae family, which includes economically important plants such as apples, cherries, and roses. Previous investigations have examined the distribution and functional implications of 4mC sites within the Rosaceae genome, focusing on their potential roles in gene expression regulation, environmental adaptation, and evolution.


iDNA-OpenPrompt: OpenPrompt learning model for identifying DNA methylation.

Front Genet

April 2024

School of Information and Communication Engineering, Hainan University, Haikou, Hainan, China.

DNA methylation is a critical epigenetic modification involving the addition of a methyl group to the DNA molecule, playing a key role in regulating gene expression without changing the DNA sequence. The main difficulty in identifying DNA methylation sites lies in the subtle and complex nature of methylation patterns, which may vary across different tissues, developmental stages, and environmental conditions. Traditional methods for methylation site identification, such as bisulfite sequencing, are typically labor-intensive, costly, and require large amounts of DNA, hindering high-throughput analysis.


DNA 4mC plays a crucial role in the gene expression processes of organisms. However, existing deep learning algorithms have a limited ability to represent DNA sequence features. In this paper, we propose DNABert-4mC, a 4mC site identification algorithm that fuses a pruned, pre-trained DNABert-Pruning model with handcrafted feature encoding to identify 4mC sites.
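The fusion idea can be sketched as two feature streams joined by concatenation. This is an illustration under stated assumptions, not the DNABert-4mC code: the learned stream is represented only by its input (overlapping k-mer tokens, the tokenization style DNABERT uses) and by an arbitrary `learned_embedding` vector standing in for a pooled model output, while the handcrafted stream uses nucleotide chemical property plus nucleotide density (NCP-ND), a common 4mC encoding chosen here as an assumption. Function names are hypothetical.

```python
# Nucleotide chemical property bits: (purine, amino group, weak hydrogen bonding).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def kmer_tokens(seq, k=6):
    """Overlapping k-mer tokens, as fed to a DNABERT-style model."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def ncp_nd_encoding(seq):
    """Per position: 3 chemical-property bits + cumulative density of that base, flattened."""
    feats, counts = [], {b: 0 for b in NCP}
    for i, base in enumerate(seq, start=1):
        counts[base] += 1
        feats.extend(NCP[base])
        feats.append(counts[base] / i)  # fraction of positions 1..i that equal this base
    return feats


def fuse(learned_embedding, handcrafted):
    """Early fusion by simple concatenation of the two feature vectors."""
    return list(learned_embedding) + list(handcrafted)
```

A 41-bp window gives 36 overlapping 6-mer tokens and a 164-dimensional NCP-ND vector; concatenation is the simplest fusion choice, and a real model might instead fuse the streams through learned layers.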

