DNA N6-methyladenine (6mA) is a common epigenetic modification that plays significant roles in the growth and development of plants. Identifying 6mA sites is crucial for elucidating the functions of 6mA. In this article, a novel model named i6mA-vote is developed to predict 6mA sites in plants. First, DNA sequences were encoded into six feature vectors using strategies based on the density, physicochemical properties, and position of nucleotides. To find the best encoding strategy, the feature vectors were compared across several machine learning classifiers. The results suggested that the position of nucleotides has a significant positive effect on 6mA site identification. Thus, the dinucleotide one-hot strategy, which describes the positional characteristics of nucleotides well, was employed to extract DNA features in our method. Second, DNA sequences of Rosaceae were randomly divided into a training dataset and a test dataset. Finally, i6mA-vote was constructed by combining five different base classifiers under a majority voting strategy and trained on the Rosaceae training dataset. The i6mA-vote model was evaluated on the task of predicting 6mA sites in the genomes of Rosaceae, Rice, and Arabidopsis separately. In Rosaceae, i6mA-vote achieved 0.955 accuracy (ACC), 0.909 Matthews correlation coefficient (MCC), 0.955 sensitivity (SN), and 0.954 specificity (SP). The same indicators, in the order ACC, MCC, SN, SP, were 0.882, 0.774, 0.961, and 0.803 on Rice, and 0.798, 0.617, 0.666, and 0.929 on Arabidopsis. According to these indicators, our method is effective and outperforms the other methods compared. The results also show that i6mA-vote performs well in 6mA site prediction not only within a species but also across plant species. Moreover, specificity is distinctly lower than sensitivity in Rice, whereas the opposite holds in Arabidopsis. This may result from sequence similarity among Rosaceae, Rice, and Arabidopsis.
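The three ingredients named in the abstract (dinucleotide one-hot encoding, majority voting, and the ACC/MCC/SN/SP indicators) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the overlapping-dinucleotide layout, the all-zero handling of ambiguous bases, and the simple hard vote over binary labels are all assumptions made for illustration.

```python
from itertools import product
from math import sqrt

# All 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
INDEX = {d: i for i, d in enumerate(DINUCLEOTIDES)}


def dinucleotide_one_hot(seq):
    """Encode each overlapping dinucleotide as a 16-dimensional one-hot vector.

    A sequence of length n yields (n - 1) * 16 features. Dinucleotides that
    contain a non-ACGT symbol (e.g. N) are encoded as all zeros; this
    handling is an assumption, not taken from the article.
    """
    seq = seq.upper()
    features = []
    for i in range(len(seq) - 1):
        vec = [0] * 16
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:
            vec[idx] = 1
        features.extend(vec)
    return features


def majority_vote(votes):
    """Hard majority vote over binary labels (1 = 6mA site, 0 = non-site)."""
    return 1 if sum(votes) * 2 > len(votes) else 0


def binary_metrics(tp, tn, fp, fn):
    """Return ACC, MCC, SN and SP computed from confusion-matrix counts."""
    sn = tp / (tp + fn)  # sensitivity: fraction of true sites recovered
    sp = tn / (tn + fp)  # specificity: fraction of non-sites rejected
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "MCC": mcc, "SN": sn, "SP": sp}
```

For example, `dinucleotide_one_hot("ACGT")` encodes the three dinucleotides AC, CG, and GT into 48 features, and `majority_vote([1, 0, 1, 1, 0])` returns 1 because three of the five hypothetical base classifiers vote for a 6mA site.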
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8882731 | PMC |
| http://dx.doi.org/10.3389/fpls.2022.845835 | DOI Listing |
Nucleic Acids Res
December 2024
MOE Key Laboratory of Evolution & Marine Biodiversity and Institute of Evolution & Marine Biodiversity, Ocean University of China, Qingdao 266003, China.
The ciliate Tetrahymena thermophila is a well-established unicellular model eukaryote, contributing significantly to foundational biological discoveries. Despite its acknowledged importance, current studies on Tetrahymena biology face challenges due to gene annotation inaccuracy, particularly the notable absence of untranslated regions (UTRs). To comprehensively annotate the Tetrahymena macronuclear genome, we collected extensive transcriptomic data spanning various cell stages.
bioRxiv
November 2024
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
While nanopore sequencing is increasingly used for mapping DNA modifications, it is important to recognize false positive calls as they can mislead biological interpretations. To assist biologists and methods developers, we describe a framework for rigorous evaluation that highlights the use of false discovery rate with rationally designed negative controls capturing both general background and confounding modifications. Our critical assessment across multiple forms of DNA modifications highlights that while nanopore sequencing performs reliably for high-abundance modifications, including 5-methylcytosine (5mC) at CpG sites in mammalian cells and 5-hydroxymethylcytosine (5hmC) in mammalian brain cells, it makes a significant proportion of false positive detections for low-abundance modifications, such as 5mC at CpH sites, 5hmC and N6-methyldeoxyadenine (6mA) in most mammal cell types.
Front Genet
November 2024
School of Cyber Science and Engineering, University of International Relations, Beijing, China.
DNA methylation is of crucial importance for gene expression in processes such as cell differentiation and tumour formation. Identifying DNA 6mA sites with traditional biological experimental methods requires cumbersome steps and a large amount of time. The advent of neural network technology has made the identification of 6mA sites in cross-species DNA more effective.
PLoS One
October 2024
School of Information and Communication Engineering, Hainan University, Haikou, Hainan, China.
In this study, from the perspective of image processing, we propose the iDNA-ITLM model for identifying DNA methylation sites. It uses a novel data enhancement strategy that repeatedly self-replicates a short DNA sequence into a longer one and then embeds it into a high-dimensional matrix to enlarge the receptive field. Our model consistently outperforms the current state-of-the-art sequence-based DNA methylation site recognition methods when evaluated on 17 benchmark datasets that cover multiple species and include three DNA methylation modifications (4mC, 5hmC, and 6mA). The experimental results demonstrate the robustness and superior performance of our model across these datasets.
Methods
October 2024
Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China.
DNA N6-methyladenine (6mA) plays an important role in many biological processes, and accurately identifying its sites helps one to understand its biological effects more comprehensively. Traditional experimental methods are very labor-intensive, and traditional machine learning methods also appear insufficient as databases of 6mA methylation grow progressively larger, so we propose a deep learning-based method, a multi-scale convolutional model based on global response normalization (CG6mA), to predict 6mA sites. The method is compared with other methods on three different kinds of benchmark datasets, and the results show that our model achieves better prediction results.