Transcription initiation is a process that is essential to ensuring the proper function of any gene, yet we still lack a unified understanding of sequence patterns and rules that explain most transcription start sites in the human genome. By predicting transcription initiation at base-pair resolution from sequences with a deep learning-inspired explainable model called Puffin, we show that a small set of simple rules can explain transcription initiation at most human promoters. We identify key sequence patterns that contribute to human promoter activity, each activating transcription with distinct position-specific effects.
View Article and Find Full Text PDFTranscription initiation is an essential process for ensuring proper function of any gene, however, a unified understanding of sequence patterns and rules that determine transcription initiation sites in human genome remains elusive. By explaining transcription initiation at basepair resolution from sequence with a deep learning-inspired explainable modeling approach, here we show that simple rules can explain the vast majority of human promoters. We identified key sequence patterns that contribute to human promoter function, each activating transcription with a distinct position-specific effect curve that likely reflects its mechanism of promoting transcription initiation.
View Article and Find Full Text PDFDesigning biological sequences is an important challenge that requires satisfying complex constraints and thus is a natural problem to address with deep generative modeling. Diffusion generative models have achieved considerable success in many applications. Score-based generative stochastic differential equations (SDE) model is a continuous-time diffusion model framework that enjoys many benefits, but the originally proposed SDEs are not naturally designed for modeling discrete data.
View Article and Find Full Text PDF