We present a heuristic method for predicting the expression specificity of transcription, which is known to be regulated in large part by promoter sequences. The method looks for conserved sequence patterns within a group of known promoters, such as those of housekeeping or tissue-specific genes. Statistically conserved patterns were extracted automatically from a set of unaligned sequences spanning up to 200 bp upstream of the transcription initiation site, using a standard procedure based on Markov chain and binomial distribution models. To obtain signal sequences of optimal length, we devised a method that combines multiple alignment with analysis of the information content (or relative entropy). Groups of related promoters were compiled from the Eukaryotic Promoter Database (EPD) and the EMBL nucleic acid sequence database. To test the validity of the extracted patterns, each promoter was examined for its specificity by linear discriminant analysis. Our method correctly discriminated 77.6% of the housekeeping gene promoters and 62.9% of the liver promoters.
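As an illustration of the information content (relative entropy) mentioned above, the following minimal sketch computes the per-column information, in bits, of a set of already aligned sites of equal length. It is not the authors' procedure: the function name `column_information`, the uniform background frequencies, and the toy sites are all assumptions made for this example.

```python
import math
from collections import Counter


def column_information(aligned_sites, background=None):
    """Return the relative entropy (in bits) of each alignment column.

    For a column with observed base frequencies f_b and background
    frequencies p_b, the value is sum_b f_b * log2(f_b / p_b).
    """
    if background is None:
        # Assumption: uniform background; a real analysis would likely
        # estimate background frequencies from the promoter set itself.
        background = {base: 0.25 for base in "ACGT"}

    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")

    profile = []
    for i in range(length):
        counts = Counter(site[i] for site in aligned_sites)
        total = sum(counts.values())
        info = 0.0
        for base, n in counts.items():
            f = n / total
            info += f * math.log2(f / background[base])
        profile.append(info)
    return profile


# Hypothetical toy sites, not taken from EPD or EMBL.
sites = ["TATAAA", "TATATA", "TATAAG", "TATAAA"]
profile = column_information(sites)
```

Fully conserved columns reach the maximum of 2 bits under a uniform background, while variable columns score lower. One plausible use, again an assumption rather than the published method, is to trim an aligned pattern to the run of columns whose information stays above a chosen threshold.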
DOI: http://dx.doi.org/10.1093/dnares/4.2.81