MuLan-Methyl-multiple transformer-based language models for accurate DNA methylation prediction.

Gigascience

Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany.

Published: December 2022

Transformer-based language models are successfully used to address many large-scale text-related tasks. DNA methylation is an important epigenetic mechanism, and its analysis provides valuable insights into gene regulation and biomarker identification. Several deep learning-based methods have been proposed to identify DNA methylation, and each seeks to strike a balance between computational effort and accuracy. Here, we introduce MuLan-Methyl, a deep learning framework for predicting DNA methylation sites, which is based on 5 popular transformer-based language models. The framework identifies methylation sites for 3 different types of DNA methylation: N6-adenine, N4-cytosine, and 5-hydroxymethylcytosine. Each of the employed language models is adapted to the task using the "pretrain and fine-tune" paradigm. Pretraining is performed on a custom corpus of DNA fragments and taxonomy lineages using self-supervised learning. Fine-tuning trains each model to predict the methylation status for each methylation type. The 5 fine-tuned models then jointly predict the DNA methylation status. We report excellent performance of MuLan-Methyl on a benchmark dataset. Moreover, we argue that the model captures characteristic differences between species that are relevant for methylation. This work demonstrates that language models can be successfully adapted to applications in biological sequence analysis and that joint utilization of different language models improves model performance. MuLan-Methyl is open source, and we provide a web server that implements the approach.
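The abstract says the 5 fine-tuned models predict methylation status collectively but does not specify here how their outputs are combined. The sketch below assumes simple soft voting: each model outputs the probability that a candidate site is methylated, the probabilities are averaged with equal weight, and the site is called methylated if the average reaches 0.5. The function name `ensemble_predict`, the model labels, and the threshold are hypothetical and are not taken from MuLan-Methyl.

```python
from __future__ import annotations

from collections.abc import Mapping
from statistics import fmean


def ensemble_predict(
    model_probs: Mapping[str, float], threshold: float = 0.5
) -> tuple[float, bool]:
    """Combine per-model methylation probabilities by soft voting.

    Assumption (not from the paper): each fine-tuned model outputs the
    probability that a candidate site is methylated, and the ensemble
    averages these probabilities with equal weight.
    """
    if not model_probs:
        raise ValueError("at least one model probability is required")
    for name, p in model_probs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability from {name!r} is outside [0, 1]: {p}")
    # Equal-weight average of the model outputs.
    mean_prob = fmean(model_probs.values())
    # Call the site methylated if the averaged probability reaches the threshold.
    return mean_prob, mean_prob >= threshold


if __name__ == "__main__":
    # Hypothetical outputs from five fine-tuned models for one candidate site.
    probs = {"model_1": 0.9, "model_2": 0.8, "model_3": 0.6, "model_4": 0.4, "model_5": 0.7}
    mean_prob, is_methylated = ensemble_predict(probs)
    print(f"mean probability = {mean_prob:.2f}, methylated = {is_methylated}")
```

Here the averaged probability is 0.68, so the site is called methylated even though one model alone would have rejected it. Averaging is only one way to combine models; majority voting over hard labels or learned per-model weights are common alternatives, and the combination rule actually used by MuLan-Methyl should be checked in the full text.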

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10367125 (PMC)
http://dx.doi.org/10.1093/gigascience/giad054 (DOI listing)

Publication Analysis

Top Keywords

dna methylation (28)
language models (24)
transformer-based language (12)
methylation (9)
dna (8)
predicting dna (8)
methylation sites (8)
models adapted (8)
methylation status (8)
performance mulan-methyl (8)

Similar Publications

Introduction: The interferon regulatory factor 7 (IRF7), a member of the IRF family of transcription factors, plays a major role in regulating numerous aspects of the immune response and has increasingly been studied to determine the aetiology and pathogenesis of systemic sclerosis (SSc).

Objective: This study aimed to investigate the transcriptional levels of IRF7 mRNA in peripheral blood mononuclear cells (PBMCs) and the impact of promoter methylation on IRF7 mRNA expression in SSc patients compared to healthy controls.

Methods: PBMCs were obtained from 40 confirmed naïve SSc cases and 20 healthy controls for IRF7 expression and methylation analysis.

Blood-based DNA methylation markers for lung cancer prediction.

BMJ Oncol

May 2024

Genomic Epidemiology Branch, International Agency for Research on Cancer, Lyon, France.

Objective: Screening high-risk individuals with low-dose CT reduces mortality from lung cancer, but many lung cancers occur in individuals who are not eligible for screening. Risk biomarkers may be useful to refine risk models and improve screening eligibility criteria. We evaluated whether blood-based DNA methylation markers can improve a traditional lung cancer prediction model.

Background: Normal brain aging is associated with dopamine decline, which has been linked to age-related cognitive decline. However, the factors underlying individual differences in dopamine integrity at older ages remain unclear. Here we aimed to investigate: (i) whether inflammation is associated with levels and 5-year changes of in vivo dopamine D2-receptor (DRD2) availability, (ii) whether DRD2-inflammation associations differ between men and women, and (iii) whether inflammation and cerebral small-vessel disease (white-matter lesions) serve as two independent predictors of DRD2 availability.

Assessment of relationships between epigenetic age acceleration and multiple sclerosis: a bidirectional mendelian randomization study.

Epigenetics Chromatin

January 2025

Department of Neurology, Tongji Shanxi Hospital, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Third Hospital of Shanxi Medical University, Taiyuan, China.

Background: DNA methylation-based epigenetic clocks are increasingly recognized for their precision in predicting aging and its health implications. Although prior research has identified connections between accelerated epigenetic aging and multiple sclerosis, the temporal and causal aspects of these relationships have yet to be elucidated. Our research seeks to clarify these potential causal links through a bidirectional Mendelian randomization study.

Postpartum depression (PPD) affects ~10-15% of childbearing individuals, with deleterious consequences for two generations. Recent research has explored the biological mechanisms of PPD, particularly neuroactive steroids (NAS). We sought here to investigate associations between NAS levels and ratios during pregnancy and the subsequent development of depressive symptoms with postpartum onset.
