AI Article Synopsis

  • A variety of deep learning models have been developed to predict chromatin accessibility from DNA sequence, but evaluations usually report genome-wide performance and overlook cis-regulatory elements (CREs), which are crucial for gene regulation; cell type specific CREs in particular account for a large proportion of complex disease heritability.
  • The study evaluates the accuracy of these genomic models and finds that general purpose models such as Enformer and Sei perform worse in regions that are accessible only in specific cell types.
  • Tailoring models to specific tissues and increasing their capacity to learn cell type specific regulation can improve performance in those regions, but better predictions on reference sequences do not consistently translate into better variant effect predictions, so new strategies are needed for variants.

Article Abstract

Background: A number of deep learning models have been developed to predict epigenetic features such as chromatin accessibility from DNA sequence. Model evaluations commonly report performance genome-wide; however, cis-regulatory elements (CREs), which play critical roles in gene regulation, make up only a small fraction of the genome. Furthermore, cell type specific CREs contain a large proportion of complex disease heritability.

Results: We evaluate genomic deep learning models in chromatin accessibility regions with varying degrees of cell type specificity. We assess two modeling directions in the field: general purpose models trained across thousands of outputs (cell types and epigenetic marks), and models tailored to specific tissues and tasks. We find that the accuracy of genomic deep learning models, including two state-of-the-art general purpose models - Enformer and Sei - varies across the genome and is reduced in cell type specific accessible regions. Using accessibility models trained on cell types from specific tissues, we find that increasing model capacity to learn cell type specific regulatory syntax - through single-task learning or high capacity multi-task models - can improve performance in cell type specific accessible regions. We also observe that improving reference sequence predictions does not consistently improve variant effect predictions, indicating that novel strategies are needed to improve performance on variants.
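The stratified evaluation described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the binning rule (`specificity_bin`), the choice of Pearson correlation as the metric, and the toy numbers are all assumptions made for illustration only. The idea is simply to score predictions separately for regions grouped by how many cell types they are accessible in, rather than reporting one genome-wide number.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (needs >= 2 points)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / sqrt(var_x * var_y)


def specificity_bin(n_accessible, n_cell_types):
    """Hypothetical binning rule based on the number of cell types
    in which a region is accessible."""
    if n_accessible == 1:
        return "cell type specific"
    if n_accessible == n_cell_types:
        return "ubiquitous"
    return "shared"


def stratified_correlation(regions, n_cell_types):
    """Report Pearson r between observed and predicted accessibility per bin.

    Each region is a tuple (n_accessible, observed, predicted).
    """
    groups = defaultdict(lambda: ([], []))
    for n_accessible, observed, predicted in regions:
        obs, pred = groups[specificity_bin(n_accessible, n_cell_types)]
        obs.append(observed)
        pred.append(predicted)
    return {name: pearson(obs, pred) for name, (obs, pred) in groups.items()}


# Toy values invented for illustration; they are not results from the study.
toy_regions = [
    (3, 1.0, 1.1), (3, 2.0, 2.1), (3, 3.0, 3.1),  # accessible in all 3 cell types
    (1, 1.0, 2.0), (1, 2.0, 1.0), (1, 3.0, 2.5),  # accessible in 1 cell type
]
result = stratified_correlation(toy_regions, n_cell_types=3)
for name, r in result.items():
    print(f"{name}: r = {r:.3f}")
```

Reporting one score per bin makes a drop in cell type specific regions visible even when the pooled, genome-wide score looks strong.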

Conclusions: Our results provide a new perspective on the performance of genomic deep learning models, showing that performance varies across the genome and is particularly reduced in cell type specific accessible regions. We also identify strategies to maximize performance in cell type specific accessible regions.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11257480
DOI: http://dx.doi.org/10.1101/2024.07.05.602265

Publication Analysis

Top Keywords

cell type: 32
type specific: 28
deep learning: 20
learning models: 20
specific accessible: 20
accessible regions: 20
genomic deep: 16
performance cell: 12
models: 10
cell: 10
