The physical interactions between enhancers and promoters are often involved in gene transcriptional regulation. High tissue-specific enhancer-promoter interactions (EPIs) are responsible for the differential expression of genes. Experimental methods are time-consuming and labor-intensive in measuring EPIs. An alternative approach, machine learning, has been widely used to predict EPIs. However, most existing machine learning methods require a large number of functional genomic and epigenomic features as input, which limits the application to different cell lines. In this paper, we developed a random forest model, HARD (H3K27ac, ATAC-seq, RAD21, and Distance), to predict EPI using only four types of features. Independent tests on a benchmark dataset showed that HARD outperforms other models with the fewest features. Our results revealed that chromatin accessibility and the binding of cohesin are important for cell-line-specific EPIs. Furthermore, we trained the HARD model in the GM12878 cell line and performed testing in the HeLa cell line. The cross-cell-lines prediction also performs well, suggesting it has the potential to be applied to other cell lines.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10151517 | PMC |
http://dx.doi.org/10.3389/fgene.2023.1133775 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!