Chromatin accessibility and post-transcriptional histone modifications play important roles in gene expression regulation. However, little is known about the joint effect of multiple chromatin modifications on the gene expression level in plants, despite that the regulatory roles of individual histone marks such as H3K4me3 in gene expression have been well-documented. By using machine-learning methods, we systematically performed gene expression level prediction based on multiple chromatin modifications data in Arabidopsis and rice. We found that as few as four histone modifications were sufficient to yield good prediction performance, and H3K4me3 and H3K36me3 being the top two predictors with known functions related to transcriptional initiation and elongation, respectively. We demonstrated that the predictive powers differed between protein-coding and non-coding genes as well as between CpG-enriched and CpG-depleted genes. We also showed that the predictive model trained in one tissue or species could be applied to another tissue or species, suggesting shared underlying mechanisms. More interestingly, the gene expression levels of conserved orthologs are easier to predict than the species-specific genes. In addition, chromatin state of distal enhancers was moderately correlated to gene expression but was dispensable if given the chromatin features of the proximal regions of genes. We further extended the analysis to transcription factor (TF) binding data. Strikingly, the combinatorial effects of only a few TFs were roughly fit to gene expression levels in Arabidopsis. Overall, by using quantitative modeling, we provide a comprehensive and unbiased perspective on the epigenetic and TF-mediated regulation of gene expression in plants.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1093/pcp/pcz051 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!