Background: Promoters, non-coding DNA sequences located upstream of the transcription start sites of genes or gene clusters, are essential regulatory elements for the initiation and regulation of transcription. Identifying promoters in DNA sequences and genomes also contributes significantly to determining the complete structure of genes of interest. Exploring promoter regions is therefore one of the most important topics in molecular genetics and biology. Alongside experimental techniques, computational methods have been developed to predict promoters. In this study, we propose iPromoter-Seqvec, an efficient computational model that predicts TATA and non-TATA promoters in the human and mouse genomes using bidirectional long short-term memory neural networks combined with sequence-embedded features extracted from the input sequences. The promoter and non-promoter sequences were retrieved from the Eukaryotic Promoter Database and then refined to create four benchmark datasets.
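
As an illustration of how a DNA sequence can be prepared for an embedding layer feeding a recurrent network, the following is a minimal sketch that maps a sequence to integer k-mer tokens. The abstract does not describe the tokenization used by iPromoter-Seqvec, so the overlapping 3-mer scheme, the function names `build_kmer_vocab` and `tokenize`, and the example sequence are all assumptions made for illustration only.

```python
from itertools import product


def build_kmer_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer to an integer id starting at 1.

    Id 0 is reserved for k-mers containing unknown bases (for example N).
    The choice k=3 is an assumption, not a value taken from the paper.
    """
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}


def tokenize(seq, vocab, k=3):
    """Slide a window of width k over the sequence and look up each k-mer."""
    seq = seq.upper()
    return [vocab.get(seq[i:i + k], 0) for i in range(len(seq) - k + 1)]


vocab = build_kmer_vocab(k=3)
# Hypothetical 10-base fragment; the trailing N yields an unknown token (0).
tokens = tokenize("TATAAAGGCN", vocab, k=3)
print(len(vocab), tokens)
```

The resulting integer list is the kind of input an embedding layer typically consumes; a learned embedding would then turn each id into a dense vector before the bidirectional LSTM reads the sequence in both directions.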

Results: The area under the receiver operating characteristic curve (AUCROC) and the area under the precision-recall curve (AUCPR) were used as the two key metrics for evaluating model performance. On independent test sets, iPromoter-Seqvec outperformed other state-of-the-art methods, with AUCROC values ranging from 0.85 to 0.99 and AUCPR values ranging from 0.86 to 0.99. In both species, models predicting TATA promoters had slightly higher predictive power than those predicting non-TATA promoters. By constructing artificial non-promoter sequences from promoter sequences, a novel idea in this work, our models were able to learn highly specific characteristics that discriminate promoters from non-promoters, which improved predictive performance.
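
To make the two metrics concrete, here is a minimal pure-Python sketch of how AUCROC and AUCPR can be computed from labels and predicted scores. It is not the authors' evaluation code: AUCROC is computed as the fraction of correctly ordered positive/negative pairs, AUCPR is estimated with average precision (one common estimator, which may differ from the one used in the paper), and the labels and scores are hypothetical toy values.

```python
def auc_roc(labels, scores):
    """AUCROC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of AUCPR: mean precision at each positive's rank.

    Assumes scores are not tied; ties would need grouped handling.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = promoter, 0 = non-promoter.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
roc = auc_roc(labels, scores)
pr = average_precision(labels, scores)
print(round(roc, 3), round(pr, 3))
```

AUCROC is insensitive to class balance, while AUCPR focuses on the positive class, which is one reason the two are often reported together for promoter prediction.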

Conclusions: iPromoter-Seqvec is a stable and robust model for predicting both TATA and non-TATA promoters in the human and mouse genomes. The proposed method is also deployed as an online web server with a user-friendly interface to support the research community. Links to our source code and web server are available at https://github.com/mldlproject/2022-iPromoter-Seqvec.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9531353 (PMC)
http://dx.doi.org/10.1186/s12864-022-08829-6 (DOI)

Similar Publications

A promoter is a region of the DNA sequence that defines where transcription of a gene by RNA polymerase initiates, and it is typically located proximal to the transcription start site (TSS). Correctly identifying the gene TSS and the core promoter is essential for understanding the transcriptional regulation of genes. As a complement to conventional experimental methods, computational techniques with easy-to-use platforms are essential bioinformatics tools that can be applied effectively to annotate the functions and physiological roles of promoters.

Article Synopsis
  • The study addresses the challenge of accurately identifying promoters, the key DNA regions that initiate transcription, by using Convolutional Neural Networks (CNN) to analyze sequence features across different organisms, including humans, mice, plants, and bacteria.
  • CNN models achieved high accuracy in classifying promoters, with significant success rates for TATA and non-TATA promoters, particularly in human and Arabidopsis sequences, indicating the effectiveness of the deep learning approach in capturing complex promoter characteristics.
  • A new program, CNNProm, has been developed to utilize these CNN models for promoter prediction, which can be broadly applied to various genomes, and includes a random substitution method to identify conserved functional elements without needing specific promoter features.

Rate of promoter class turn-over in yeast evolution.

BMC Evol Biol

February 2006

Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey 08544, USA.

Background: Phylogenetic conservation at the DNA level is routinely used as evidence of molecular function, under the assumption that the locations and sequences of functional DNA segments remain invariant in evolution. In particular, short DNA segments participating in the initiation and regulation of transcription are often conserved between related species. However, transcription of a gene can evolve, and this evolution may involve changes to even such conserved DNA segments.


Alternative core promoters regulate tissue-specific transcription from the autoimmune diabetes-related ICA1 (ICA69) gene locus.

J Biol Chem

January 2003

Division of Immunogenetics, Department of Pediatrics, Diabetes Institute, Rangos Research Center, Children's Hospital of Pittsburgh, University of Pittsburgh School of Medicine, Pennsylvania 15213, USA.

Islet cell autoantigen 69-kDa (ICA69), protein product of the human ICA1 gene, is one target of the immune processes defining the pathogenesis of Type 1 diabetes. We have characterized the genomic structure and functional promoters within the 5'-regulatory region of ICA1. 5'-RNA ligase-mediated rapid amplification of cDNA ends evaluation of ICA1 transcripts expressed in human islets, testis, heart, and cultured neuroblastoma cells reveals that three 5'-untranslated region exons are variably expressed from the ICA1 gene in a tissue-specific manner.
