Most current computational models for splice junction prediction are based on the identification of canonical splice junctions. However, junctions lacking the consensus dimers GT and AG are also observed to undergo splicing. Identifying such junctions, called non-canonical splice junctions, is essential for a comprehensive understanding of splicing. This work focuses on identifying non-canonical splice junctions with a bidirectional long short-term memory (BLSTM) network. Furthermore, we apply two visualization techniques, one back-propagation-based (integrated gradient) and one perturbation-based (occlusion), to extract the non-canonical splicing features learned by the model. The extracted features are validated against existing knowledge from the literature. Integrated gradient extracts features that comprise contiguous nucleotides, whereas occlusion extracts features that are individual nucleotides distributed across the sequence.
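The two attribution techniques named above can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_score` is a hypothetical stand-in for the trained BLSTM (a sigmoid over a weighted sum of one-hot inputs), the weights are invented for illustration, and the all-zero baseline for integrated gradient is an assumption.

```python
import math

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]

def toy_score(x, weights):
    """Hypothetical stand-in for the model: sigmoid of a weighted sum."""
    z = sum(w * v for wr, xr in zip(weights, x) for w, v in zip(wr, xr))
    return 1.0 / (1.0 + math.exp(-z))

def occlusion(x, weights, window=1):
    """Perturbation-based attribution: the score drop observed when each
    window of positions is replaced by zeros."""
    base = toy_score(x, weights)
    drops = []
    for start in range(len(x) - window + 1):
        occluded = [row[:] for row in x]
        for i in range(start, start + window):
            occluded[i] = [0.0] * len(NUCLEOTIDES)
        drops.append(base - toy_score(occluded, weights))
    return drops

def integrated_gradients(x, weights, steps=200):
    """Gradient-based attribution: midpoint-rule approximation of the path
    integral of gradients from an all-zero baseline (assumed) to x,
    summed over the four channels of each position."""
    attributions = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        scaled = [[alpha * v for v in row] for row in x]
        s = toy_score(scaled, weights)
        sigmoid_grad = s * (1.0 - s)  # derivative of the sigmoid at this point
        for i, (wr, xr) in enumerate(zip(weights, x)):
            # d(score)/d(x_ij) = sigmoid_grad * w_ij; multiply by (x_ij - 0)
            attributions[i] += sigmoid_grad * sum(w * v for w, v in zip(wr, xr)) / steps
    return attributions

# Illustrative input: invented weights that reward G at index 2 and T at index 3.
seq = "ACGTAC"
x = one_hot(seq)
weights = [[0.0] * 4 for _ in seq]
weights[2][NUCLEOTIDES.index("G")] = 2.0
weights[3][NUCLEOTIDES.index("T")] = 1.5

drops = occlusion(x, weights)
ig = integrated_gradients(x, weights)
for pos, (base, d, a) in enumerate(zip(seq, drops, ig)):
    print(pos, base, round(d, 4), round(a, 4))
```

For this toy scorer the integrated-gradient attributions sum to approximately the difference between the score on the input and the score on the baseline, which is a convenient sanity check. With a real sequence model, the analytic gradient used here would be replaced by gradients obtained through automatic differentiation.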
DOI: http://dx.doi.org/10.1142/S0219720021500141