Comparative analyses between retained introns and constitutively spliced introns in Arabidopsis thaliana using random forest and support vector machine.

PLoS One

Department of Biology, Miami University, Oxford, Ohio, United States of America; Department of Computer Sciences and Software Engineering, Miami University, Oxford, Ohio, United States of America.

Published: April 2015

Alternative splicing is an important mode of pre-mRNA post-transcriptional modification. It allows many distinct mature mRNA transcripts to be created from a single gene by utilizing different splice sites. In plants such as Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many past studies have focused on the positional distribution of retained introns (RIs) among different genic regions and on the regulation of their expression, while little systematic classification of RIs from constitutively spliced introns (CSIs) has been conducted using machine learning approaches. We used random forest and a support vector machine (SVM) with a radial basis function (RBF) kernel to differentiate these two types of introns in Arabidopsis. We obtained our high-quality experimental data by comparing the intron coordinates of all annotated mRNAs from TAIR10. To distinguish RIs from CSIs, we investigated the unique characteristics of RIs in comparison with CSIs and extracted 37 quantitative features: local and global nucleotide sequence features of introns, frequent motifs, the signal strength of splice sites, and the similarity between sequences of introns and their flanking regions. We demonstrated that our proposed feature extraction approach classified RIs and CSIs more accurately than four other approaches. The optimal penalty parameter C and the RBF kernel parameter γ in the SVM were set using a particle swarm optimization algorithm (PSOSVM). Our classifiers achieved an F-measure of 80.8% (random forest) and 77.4% (PSOSVM). Our feature extraction approach yielded the basic sequence features and positional distribution characteristics of RIs and also predicted putative regulatory motifs in intron splicing. Our study will facilitate a better understanding of the mechanisms underlying intron retention.
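Two quantities named in the abstract have standard textbook definitions that can be sketched in a few lines: the RBF kernel, whose width is controlled by the parameter tuned alongside C, and the F-measure used to report performance. The sketch below is illustrative only and is not the authors' code. It assumes the usual kernel form exp(-gamma * ||x - y||^2) and the balanced F-measure (F1); the function names and the toy inputs are hypothetical and are not data from the study.

```python
import math


def rbf_kernel(x, y, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def f_measure(tp, fp, fn):
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    tp, fp, fn are counts of true positives, false positives and
    false negatives for the positive class (e.g. retained introns).
    """
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy inputs, chosen only to exercise the functions.
print(rbf_kernel([0.0, 1.0], [0.5, 1.0], gamma=0.5))
print(f_measure(tp=80, fp=20, fn=20))
```

A larger gamma makes the kernel value fall off faster with distance, which is why it is typically tuned together with the penalty parameter C.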


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4128822 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0104049 (PLOS)

Publication Analysis

Top Keywords

random forest: 12
alternative splicing: 12
introns: 8
retained introns: 8
constitutively spliced: 8
spliced introns: 8
introns arabidopsis: 8
arabidopsis thaliana: 8
forest support: 8
support vector: 8

Similar Publications

Background: Major depressive disorder (MDD) comes along with an increased risk of recurrence and poor course of illness. Machine learning has recently shown promise in the prediction of mental illness, yet models aiming to predict MDD course are still rare and do not quantify the predictive value of established MDD recurrence risk factors.

Methods: We analyzed N = 571 MDD patients from the Marburg-Münster Affective Disorder Cohort Study (MACS).


Integrating machine learning, suspect and nontarget screening reveal the interpretable fates of micropollutants and their transformation products in sludge.

J Hazard Mater

January 2025

School of Environmental Studies, China University of Geosciences, Wuhan, Hubei 430074, China; National Engineering Research Center of Industrial Wastewater Detoxication and Resource Recovery, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China.

Activated sludge enriches vast amounts of micropollutants (MPs) when wastewater is treated, posing potential environmental risks. While standard methods typically focus on target analysis of known compounds, the identity, structure, and concentration of transformation products (TPs) of MPs remain less understood. Here, we employed a novel approach that integrates machine learning for the quantification of nontarget TPs with advanced target, suspect, and nontarget screening strategies.


Plastic waste management is one of the key issues in global environmental protection. Integrating spectroscopy acquisition devices with deep learning algorithms has emerged as an effective method for rapid plastic classification. However, the challenges in collecting plastic samples and spectroscopy data have resulted in a limited number of data samples and an incomplete comparison of relevant classification algorithms.


Introduction: Asthma attacks are set off by triggers such as pollutants from the environment, respiratory viruses, physical activity and allergens. The aim of this research is to create a machine learning model using data from mobile health technology to predict and appropriately warn a patient to avoid such triggers.

Methods: Lightweight machine learning models (XGBoost, Random Forest, and LightGBM) were trained and tested on cleaned asthma data with a 70-30 train-test split.


Purpose: The incidence of cancer, which is a serious public health concern, is increasing. A predictive analysis driven by machine learning was integrated with haematology parameters to create a method for the simultaneous diagnosis of several malignancies at different stages.

Patients And Methods: We analysed a newly collected dataset from various hospitals in Jordan comprising 19,537 laboratory reports (6,280 cancer and 13,257 noncancer cases).

