Deregulated splicing machinery components have shown to be associated with the development of several types of cancer and, therefore, the determination of such alterations can help the development of tumor-specific molecular targets for early prognosis and therapy. Determining such splicing components, however, is not a straightforward task mainly due to the heterogeneity of tumors, the variability across samples, and the fat-short characteristic of genomic datasets. In this work, a supervised machine learning-based methodology is proposed, allowing the determination of subsets of relevant splicing components that best discriminate samples. The methodology comprises three main phases: first, a ranking of features is determined by means of applying feature weighting algorithms that compute the importance of each splicing component; second, the best subset of features that allows the induction of an accurate classifier is determined by means of conducting an effective heuristic search; then the confidence over the induced classifier is assessed by means of explaining the individual predictions and its global behavior. At the end, an extensive experimental study was conducted on a large collection of transcript-based datasets, illustrating the utility and benefit of the proposed methodology for analyzing dysregulation in splicing machinery.

Download full-text PDF

Source
http://dx.doi.org/10.1016/j.artmed.2020.101950DOI Listing

Publication Analysis

Top Keywords

splicing machinery
12
supervised machine
8
machine learning-based
8
learning-based methodology
8
methodology analyzing
8
analyzing dysregulation
8
dysregulation splicing
8
splicing components
8
splicing
6
methodology
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!