Prediction accuracy of regulatory elements from sequence varies by functional sequencing technique.

Front Cell Infect Microbiol

Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States.

Published: August 2023

Introduction: Various sequencing-based approaches are used to identify and characterize the activities of cis-regulatory elements in a genome-wide fashion. Some of these techniques rely on indirect markers such as histone modifications (ChIP-seq with histone antibodies) or chromatin accessibility (ATAC-seq, DNase-seq, FAIRE-seq), while other techniques use direct measures such as episomal assays measuring the enhancer properties of DNA sequences (STARR-seq) and direct measurement of the binding of transcription factors (ChIP-seq with transcription factor-specific antibodies). The activities of cis-regulatory elements such as enhancers, promoters, and repressors are determined by their sequence and by secondary processes such as chromatin accessibility, DNA methylation, and bound histone markers.

Methods: Here, machine learning models are used to evaluate how accurately cis-regulatory elements identified by various commonly used sequencing techniques can be predicted from their underlying sequence alone. This makes it possible to distinguish cis-regulatory activity that reflects sequence content from activity that reflects secondary processes.
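As an illustration of what predicting regulatory elements from sequence alone can look like, the following is a minimal sketch and not the authors' method: the abstract does not name a model type, feature set, or evaluation metric. It assumes a k-mer frequency representation, a logistic regression fitted by stochastic gradient descent, and AUROC as the accuracy measure. All function names, hyperparameters, and the GC-rich toy sequences are hypothetical and entirely synthetic.

```python
import itertools
import math
import random


def kmer_frequencies(seq, k=3):
    """Return the normalized k-mer frequency vector of a DNA sequence."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # k-mers with non-ACGT symbols are skipped
        if j is not None:
            counts[j] += 1.0
            total += 1
    return [c / total for c in counts] if total else counts


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, epochs=30, lr=0.5):
    """Fit logistic regression by plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def demo(seed=0):
    """Train and evaluate on SYNTHETIC data (not data from the study)."""
    rng = random.Random(seed)

    def make(weights):
        # weights are per-base sampling weights in the order A, C, G, T
        return "".join(rng.choices("ACGT", weights=weights, k=200))

    # Hypothetical toy classes: GC-rich "peaks" versus uniform background.
    data = [(make([0.15, 0.35, 0.35, 0.15]), 1) for _ in range(100)]
    data += [(make([0.25, 0.25, 0.25, 0.25]), 0) for _ in range(100)]
    rng.shuffle(data)
    X = [kmer_frequencies(s) for s, _ in data]
    y = [label for _, label in data]
    w, b = train_logistic(X[:160], y[:160])  # first 160 sequences for training
    scores = [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X[160:]]
    return auroc(scores, y[160:])  # evaluated on the 40 held-out sequences


if __name__ == "__main__":
    print(f"held-out AUROC on toy data: {demo():.3f}")
```

In a comparison of the kind the abstract describes, the same pipeline would presumably be run once per sequencing technique, with that technique's identified regions as positives, so that the held-out scores can be compared across techniques.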

Results and Discussion: Models trained and evaluated on sequences identified through DNase-seq and STARR-seq are significantly more accurate than models trained on sequences identified by H3K4me1, H3K4me3, and H3K27ac ChIP-seq, FAIRE-seq, and ATAC-seq. These results suggest that the activity detected by DNase-seq and STARR-seq can be largely explained by the underlying DNA sequence, independent of secondary processes. Experimentally, a subset of DNase-seq and H3K4me1 ChIP-seq sequences was tested for enhancer activity using luciferase assays and compared with previous tests performed on STARR-seq sequences. The experimental data indicated that STARR-seq sequences are substantially enriched for enhancer-specific activity, while the DNase-seq and H3K4me1 ChIP-seq sequences are not. Taken together, these results indicate three things: the DNase-seq approach identifies a broad class of regulatory elements, of which enhancers are a subset, and the associated data are appropriate for training models that detect regulatory activity from sequence alone; STARR-seq data are best for training enhancer-specific sequence models; and H3K4me1 ChIP-seq data are not well suited for training and evaluating sequence-based models for cis-regulatory element prediction.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10433755
DOI: http://dx.doi.org/10.3389/fcimb.2023.1182567

