The RNA polymerase II (Pol II) core promoter is the strategic site of convergence of the signals that lead to the initiation of DNA transcription, but the downstream core promoter in humans has been difficult to understand. Here we analyse the human Pol II core promoter and use machine learning to generate predictive models for the downstream core promoter region (DPR) and the TATA box. We developed a method termed HARPE (high-throughput analysis of randomized promoter elements) to create hundreds of thousands of DPR (or TATA box) variants, each with known transcriptional strength.
View Article and Find Full Text PDFDNA sequence signals in the core promoter, such as the initiator (Inr), direct transcription initiation by RNA polymerase II. Here we show that the human Inr has the consensus of BBCABW at focused promoters in which transcription initiates at a single site or a narrow cluster of sites. The analysis of 7678 focused transcription start sites revealed 40% with a perfect match to the Inr and 16% with a single mismatch outside of the CA core.
View Article and Find Full Text PDFNew technologies in genomics and proteomics have influenced the emergence of proteogenomics, a field at the confluence of genomics, transcriptomics, and proteomics. First generation proteogenomic toolkits employ peptide mass spectrometry to identify novel protein coding regions. We extend first generation proteogenomic tools to achieve greater accuracy and enable the analysis of large, complex genomes.
View Article and Find Full Text PDF