EnsembleSplice: ensemble deep learning model for splice site prediction.

BMC Bioinformatics

Department of Computer Science, University of Colorado, Colorado Springs, CO, 80918, USA.

Published: October 2022

Background: Identifying splice site regions is an important step in the genomic DNA sequencing pipelines used in biomedical and pharmaceutical research. Efficient and accurate splice site detection is therefore highly desirable, and a variety of computational models have been developed toward this end. Neural network architectures have recently been shown to outperform classical machine learning approaches for splice site prediction. Despite these advances, there is still considerable room for improvement, especially in prediction accuracy and error rate.

Results: To address these shortcomings, we propose EnsembleSplice, an ensemble learning architecture that combines four distinct convolutional neural network (CNN) architectures and outperforms existing splice site detection methods on the evaluation metrics considered, including accuracy and error rate. Using five-fold cross-validation, we trained and tested a variety of ensembles built from CNNs and deep neural networks (DNNs) to identify the combination that performed best across the evaluation and diversity metrics. The result is a diverse and highly effective splice site (SS) detection model, which we evaluated on two Homo sapiens genomic datasets and one Arabidopsis thaliana dataset. On one of the Homo sapiens datasets, EnsembleSplice achieved an accuracy of 94.16% for acceptor splice sites and 95.97% for donor splice sites, corresponding to error rates of 5.84% and 4.03%, respectively.
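The abstract does not state how the outputs of the four CNNs are merged into a single prediction. A minimal sketch, assuming simple soft voting (averaging each model's predicted probability that a sequence is a true splice site and thresholding at 0.5), is shown below. The function names, threshold, and toy probabilities are hypothetical and are not taken from EnsembleSplice. The last lines check the reported figures: because error rate is the complement of accuracy, 100 - 94.16 = 5.84 and 100 - 95.97 = 4.03.

```python
import numpy as np


def soft_vote(prob_matrix, threshold=0.5):
    """Average per-model probabilities (rows = models, columns = sequences), then threshold.

    Hypothetical combination rule: the abstract does not say how EnsembleSplice
    combines its four CNNs.
    """
    probs = np.asarray(prob_matrix, dtype=float)
    mean_prob = probs.mean(axis=0)  # average over the models for each sequence
    return (mean_prob >= threshold).astype(int)


def accuracy_and_error(y_true, y_pred):
    """Return (accuracy, error rate); the error rate is simply 1 - accuracy."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = float(np.mean(y_true == y_pred))
    return acc, 1.0 - acc


# Toy probabilities from four hypothetical CNNs on five sequences (1 = true splice site).
cnn_probs = [
    [0.90, 0.2, 0.6, 0.4, 0.8],
    [0.80, 0.1, 0.4, 0.3, 0.7],
    [0.70, 0.3, 0.7, 0.6, 0.9],
    [0.95, 0.2, 0.5, 0.2, 0.6],
]
labels = [1, 0, 1, 1, 1]  # the fourth sequence is a true site the ensemble misses

preds = soft_vote(cnn_probs)
acc, err = accuracy_and_error(labels, preds)
print(f"predictions: {preds}, accuracy: {acc:.2f}, error rate: {err:.2f}")

# Reported Homo sapiens accuracies (percent); error rate = 100 - accuracy.
reported = {"acceptor": 94.16, "donor": 95.97}
reported_error = {site: round(100 - acc_pct, 2) for site, acc_pct in reported.items()}
print(reported_error)
```

Soft voting is only one option; majority (hard) voting or a learned meta-classifier (stacking) are common alternatives, and the authors' GitHub repository is the authoritative source for the rule EnsembleSplice actually uses.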

Conclusions: Five-fold cross-validation showed that the prediction accuracy of our models is consistent. For reproducibility, all datasets used, models generated, and results of this work are publicly available in our GitHub repository: https://github.com/OluwadareLab/EnsembleSplice.
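Five-fold cross-validation shuffles the data once, splits it into five disjoint folds, and rotates through them, training on four folds and evaluating on the held-out fold; the mean and standard deviation of per-fold accuracy are one way to quantify the consistency claimed in the Conclusions. The sketch below assumes plain (unstratified) folds and uses a hypothetical one-feature threshold classifier on synthetic data purely as a stand-in for the CNN ensemble; none of the names or data come from the paper.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices once and split them into k disjoint folds."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return np.array_split(order, k)


def fit_threshold(x, y):
    """Hypothetical placeholder model (NOT a CNN): threshold halfway between class means."""
    return (x[y == 1].mean() + x[y == 0].mean()) / 2.0


def cross_validate(x, y, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and collect per-fold accuracy."""
    folds = k_fold_indices(len(x), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        thr = fit_threshold(x[train_idx], y[train_idx])
        preds = (x[test_idx] > thr).astype(int)
        accuracies.append(float(np.mean(preds == y[test_idx])))
    return np.array(accuracies)


# Synthetic 1-D features standing in for encoded sequences (hypothetical data).
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=200)
x = rng.normal(loc=2.0 * y, scale=1.0)  # class 1 centred at 2, class 0 centred at 0

fold_acc = cross_validate(x, y)
print(f"per-fold accuracy: {np.round(fold_acc, 3)}")
print(f"mean = {fold_acc.mean():.3f}, std = {fold_acc.std():.3f}")
```

A small standard deviation across folds indicates that performance does not hinge on one particular train/test split, which is the sense in which cross-validation supports a consistency claim.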

Source:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9535948
DOI: http://dx.doi.org/10.1186/s12859-022-04971-w