In this study, we used the mathematical method of multiple alignment of highly divergent sequences (MAHDS) to create a database of potential promoter sequences (PPSs) in the genome. To search for PPSs, we computed 20 statistically significant classes of sequences located in the region from -499 to +100 nucleotides relative to the annotated genes. For each class, we built a position-weight matrix (PWM) and used it to identify PPSs across the genome. In total, 825,136 PPSs were detected, with a false positive rate of 0.13%. The PPSs obtained with MAHDS were tested with TSSFinder, a tool that predicts transcription start sites. For each PPS, the database provides its chromosomal coordinates, its alignment with the PWM, and its level of statistical significance, expressed as an argument of the normal distribution. The database can be used in genetic engineering and biotechnology.
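The abstract describes two generic steps: building a PWM for a class of aligned sequences, then scanning a genome with it and reporting each hit's significance as a normal-distribution argument (a Z-value). The sketch below illustrates only that generic PWM idea under stated assumptions. It is not the MAHDS procedure, which derives the classes and matrices through multiple alignment of highly divergent sequences. The log-odds scoring, the uniform background, the shuffled-sequence null model, and every name and parameter here (`build_pwm`, `scan`, `z_scores`, `pseudocount`, `n_shuffles`) are hypothetical choices made for illustration.

```python
import math
import random
from statistics import mean, stdev

ALPHABET = "ACGT"


def build_pwm(aligned, pseudocount=0.5, background=None):
    """Hypothetical log-odds PWM from equal-length aligned sequences (A/C/G/T only)."""
    background = background or {b: 0.25 for b in ALPHABET}  # assumed uniform background
    pwm = []
    for i in range(len(aligned[0])):
        counts = {b: pseudocount for b in ALPHABET}  # pseudocounts avoid log2(0)
        for seq in aligned:
            counts[seq[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b]) for b in ALPHABET})
    return pwm


def scan(pwm, sequence):
    """Score every window of length len(pwm); returns (start, score) pairs."""
    w = len(pwm)
    return [
        (i, sum(col[base] for col, base in zip(pwm, sequence[i:i + w])))
        for i in range(len(sequence) - w + 1)
    ]


def z_scores(pwm, sequence, n_shuffles=100, seed=0):
    """Z = (score - mean) / sd, where mean and sd come from windows of shuffled copies."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_shuffles):
        letters = list(sequence)
        rng.shuffle(letters)  # assumed null model: same base composition, order destroyed
        null.extend(score for _, score in scan(pwm, "".join(letters)))
    mu, sd = mean(null), stdev(null)
    return [(i, (score - mu) / sd) for i, score in scan(pwm, sequence)]


# Usage on toy data: a TATA-like class scanned against a short GC-rich sequence.
aligned = ["TATAAA", "TATAAT", "TATTAA", "TATAAA"]
pwm = build_pwm(aligned)
hits = z_scores(pwm, "GCGCGCTATAAAGCGCGC", n_shuffles=200)
best = max(hits, key=lambda h: h[1])
print(best)  # best window starts at index 6 (the TATAAA site)
```

Because every window is standardized with the same mean and standard deviation, ranking by Z gives the same order as ranking by raw score; the Z-value only adds a significance scale, so a cutoff on Z can be tuned to a target false positive rate.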


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9332048
DOI: http://dx.doi.org/10.3390/biology11081117
