DNA 5-methylcytosine (5mC) is widely present in multicellular eukaryotes. It plays important roles in various developmental and physiological processes and in a wide range of human diseases, so accurate detection of 5mC sites is essential. Although current sequencing technologies can map 5mC sites genome-wide, these experimental methods are costly and time-consuming. To achieve fast and accurate prediction of 5mC sites, we propose a new computational approach, BERT-5mC. First, we pre-trained a domain-specific BERT (Bidirectional Encoder Representations from Transformers) model, treating human promoter sequences as a language. BERT is a deep bidirectional language representation model based on the Transformer architecture. Second, we fine-tuned this domain-specific BERT model on the 5mC training dataset to build the predictor. In cross-validation, our model achieved an AUROC of 0.966, higher than other state-of-the-art methods such as iPromoter-5mC, 5mC_Pred, and BiLSTM-5mC. On the independent test set, our model also achieved an AUROC of 0.966, again higher than the other state-of-the-art methods. Moreover, we analyzed the attention weights generated by BERT and identified a number of nucleotide distributions that are closely associated with 5mC modifications. To facilitate use of our model, we built a webserver that is freely accessible at: http://5mc-pred.zhulab.org.cn.
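The AUROC values above summarize how well a predictor ranks true 5mC sites above non-5mC sites. As a hedged illustration of the metric only, the sketch below computes AUROC through its rank-statistic (Mann-Whitney U) interpretation: the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative site, with ties counted as one half. The function name `auroc` and the scores are hypothetical toy values, not BERT-5mC outputs.

```python
def auroc(scores, labels):
    """AUROC as P(score of a random positive > score of a random negative).

    labels: 1 for a 5mC site, 0 for a non-5mC site (hypothetical encoding).
    Ties between a positive and a negative score contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:          # compare every positive with every negative
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy predictions for six candidate sites.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auroc(scores, labels))  # 8 of the 9 positive/negative pairs are ranked correctly
```

This pairwise loop is quadratic and meant only to make the definition concrete; for real evaluation sets, a sort-based implementation such as scikit-learn's `roc_auc_score` is the usual choice.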
| Link | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10712318 | PMC |
| http://dx.doi.org/10.7717/peerj.16600 | DOI Listing |
Epigenet Rep
November 2024
Department of Environmental Health Sciences, University of Michigan School of Public Health, Ann Arbor, Michigan, USA.
DNA methylation, an epigenetic mark, has become a common outcome in epidemiological studies thanks to affordable and reliable technologies. Yet the most widespread technique for assessing methylation, bisulfite conversion, cannot distinguish regular DNA methylation (5-mC) from other cytosine modifications, such as hydroxymethylation (5-hmC). Because 5-mC and 5-hmC have distinct biological roles, sometimes with opposing effects, it is crucial to tell these marks apart.
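Why bisulfite data cannot separate the two marks can be shown with a toy readout model. During bisulfite treatment, unmodified cytosine is deaminated to uracil and read as T, whereas both 5-mC and 5-hmC resist conversion and are read as C. The sketch below assumes complete conversion and no PCR or sequencing errors; the mapping name `READOUT` and the function `bisulfite_readout` are hypothetical.

```python
# Toy model of bisulfite sequencing readout.
# Assumptions: conversion is complete, and there are no PCR or sequencing errors.
READOUT = {
    "C": "T",     # unmodified C -> U after deamination, sequenced as T
    "5mC": "C",   # 5-methylcytosine is protected, sequenced as C
    "5hmC": "C",  # 5-hydroxymethylcytosine is also protected, sequenced as C
}

def bisulfite_readout(cytosine_states):
    """Return the base a sequencer would report for each cytosine state."""
    return [READOUT[state] for state in cytosine_states]

states = ["C", "5mC", "5hmC"]
print(dict(zip(states, bisulfite_readout(states))))
# {'C': 'T', '5mC': 'C', '5hmC': 'C'}
```

Because 5-mC and 5-hmC map to the same observed base, a standard bisulfite methylation level at a site is the combined 5-mC plus 5-hmC signal.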
bioRxiv
November 2024
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Although nanopore sequencing is increasingly used to map DNA modifications, it is important to recognize false-positive calls, because they can mislead biological interpretation. To assist biologists and method developers, we describe a framework for rigorous evaluation that emphasizes the false discovery rate, used together with rationally designed negative controls that capture both general background and confounding modifications. Our critical assessment across multiple forms of DNA modification shows that nanopore sequencing performs reliably for high-abundance modifications, including 5-methylcytosine (5mC) at CpG sites in mammalian cells and 5-hydroxymethylcytosine (5hmC) in mammalian brain cells, but makes a substantial proportion of false-positive detections for low-abundance modifications, such as 5mC at CpH sites, and 5hmC and N6-methyldeoxyadenine (6mA) in most mammalian cell types.
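One simple way to turn a negative control into a false discovery rate estimate is to treat the per-site call rate in the control (for example, DNA known to lack the modification) as the background false-positive rate and project it onto the sample. The sketch below assumes that estimator, FDR ≈ (control call rate × sites tested) / sample calls. It is an illustration of the general idea, not necessarily the framework described in this preprint; the function name `empirical_fdr` and all counts are hypothetical.

```python
def empirical_fdr(sample_calls, sample_sites, control_calls, control_sites):
    """Estimate the FDR of modification calls at one fixed calling threshold.

    Assumption: at unmodified sites, the sample has the same per-site
    false-positive rate as the negative control.
    """
    if sample_calls == 0:
        return 0.0
    background_rate = control_calls / control_sites  # false positives per site
    # Conservative: treats every tested site in the sample as potentially unmodified.
    expected_false = background_rate * sample_sites
    return min(1.0, expected_false / sample_calls)

# Hypothetical counts: 100,000 sites tested in both the sample and the control.
print(empirical_fdr(sample_calls=2_000, sample_sites=100_000,
                    control_calls=500, control_sites=100_000))  # 0.25
```

The same background matters more when the modification is rare: with the same control but only 600 sample calls, the estimated FDR rises to about 0.83, which mirrors the abstract's point that low-abundance modifications are the most vulnerable to false positives.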
medRxiv
October 2024
Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA.
The study performed a comprehensive genome-wide analysis of differential 5mC and 5hmC modifications at both CpG and non-CpG sites in postmortem orbitofrontal neurons from 25 PTSD cases and 13 healthy controls. PTSD patients showed more differential 5hmC sites than differential 5mC sites. Specifically, individuals with PTSD tended to show hyper-5mC/5hmC at CpG sites, particularly within CpG islands and promoter regions, and hypo-5mC/5hmC at non-CpG sites, especially within intragenic regions.
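The CpG versus non-CpG distinction here, like the CpH label in the preceding abstract, depends on the base immediately 3' of the cytosine: CpG when that base is G, and CpH (H = A, C, or T) otherwise. A minimal sketch of this classification on a single strand follows; the function name `cytosine_contexts` is hypothetical, and the sketch ignores cytosines on the reverse strand and ambiguous bases.

```python
def cytosine_contexts(seq):
    """Classify each cytosine on the given strand as CpG or CpH (non-CpG).

    Only the given strand is scanned; reverse-strand cytosines (G positions
    on this strand) are not reported in this sketch.
    """
    seq = seq.upper()
    contexts = []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        nxt = seq[i + 1] if i + 1 < len(seq) else None
        if nxt == "G":
            contexts.append((i, "CpG"))
        elif nxt in ("A", "C", "T"):
            contexts.append((i, "CpH"))
        # A C at the end of the sequence or before an ambiguous base (e.g. N)
        # has an unknown context and is skipped.
    return contexts

print(cytosine_contexts("ACGTCATCCG"))
# [(1, 'CpG'), (4, 'CpH'), (7, 'CpH'), (8, 'CpG')]
```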
Int J Biol Macromol
December 2024
Molecular Simulation Laboratory, Department of Physics, Bharathiar University, Coimbatore 641046, Tamilnadu, India.
Detecting epigenetically modified (EM) bases is crucial for disease detection, biosensing, and DNA sequencing. Two-dimensional P-doped SiBN and BN sheets are used as sensing substrates in density functional theory (DFT) calculations. Both sheets are doped with a phosphorus atom at various atomic sites to examine their potential for detecting 5-hydroxymethylcytosine (5hmC), 5-methylcytosine (5mC), 7-methylguanine (7mG), and 8-oxoguanine (8-oxoG) bases.
bioRxiv
September 2024
Single Molecule Analysis Group, Department of Chemistry, University of Michigan, Ann Arbor, MI 48109, USA.
DNA methylation is a fundamental element of epigenetic regulation that is governed by the MBD protein superfamily, a group of "readers" that share a highly conserved methyl-CpG-binding domain (MBD) and mediate chromatin remodeler recruitment, transcription regulation, and coordination of DNA and histone modification. Previous work has characterized the binding affinity and sequence selectivity of MBD-containing proteins toward palindromes of 5-methylcytosine (5mC) containing 5mCpG dinucleotides, often referred to as single symmetrically methylated CpG sites. However, little is known about how MBD binding is influenced by the prototypical local clustering of methylated CpG sites and the presence of DNA structural motifs encountered, e.