MiRPara: a SVM-based software tool for prediction of most probable microRNA coding regions in genome scale sequences.

BMC Bioinformatics

Bioinformatics Group, State Key Laboratory of Virology, Wuhan Institute of Virology, Chinese Academy of Sciences, PR China.

Published: April 2011

Background: MicroRNAs (miRNAs) are a family of ~22 nt small RNAs that regulate gene expression at the post-transcriptional level. Identifying these molecules and their targets can aid understanding of regulatory processes. High-throughput sequencing (HTS) has recently become a common identification method, but the technique has two major limitations. First, it has low efficiency: typically fewer than 1 in 10,000 sequences represent miRNA reads. Second, it preferentially detects highly expressed miRNAs. When sequences are available, computational methods can provide a screening step to assess the value of an HTS study and aid interpretation of its results. However, current methods can only predict miRNAs in short fragments and have usually been trained on small datasets that do not always reflect the diversity of these molecules.

Results: We have developed a software tool, miRPara, that predicts the most probable mature miRNA coding regions from genome-scale sequences in a species-specific manner. We classified sequences from miRBase into animal, plant and overall categories and used a support vector machine (SVM) to train three models based on an initial set of 77 parameters related to the physical properties of the pre-miRNA and its miRNAs. By applying parameter filtering, we found that a subset of ~25 parameters produced higher prediction ability than the full set. Our software achieves an accuracy of up to 80% against experimentally verified mature miRNAs, making it one of the most accurate methods available.
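The abstract does not say which parameters were used or how the filtering was done, so the following is only a minimal sketch of the general idea, not miRPara's implementation. It assumes a simple Fisher-style separation score as the filter, and every function name, parameter name and data value in it is hypothetical. It shows how ~22 nt candidate windows could be cut from a longer sequence, and how a smaller subset of parameters could be chosen before training a classifier such as an SVM.

```python
from statistics import mean, pvariance


def gc_content(seq):
    """Fraction of G or C bases in a sequence (one illustrative parameter)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def candidate_windows(sequence, size=22, step=1):
    """Yield (start, subsequence) for each window of `size` nt."""
    for start in range(0, len(sequence) - size + 1, step):
        yield start, sequence[start:start + size]


def fisher_score(pos, neg):
    """Squared gap between class means divided by the summed variances.

    This score is an assumption standing in for the unspecified
    filtering criterion; larger values mean better class separation.
    """
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def filter_parameters(positives, negatives, keep):
    """Return the names of the `keep` best-separating parameters."""
    names = list(positives[0])
    scores = {
        name: fisher_score(
            [row[name] for row in positives],
            [row[name] for row in negatives],
        )
        for name in names
    }
    return sorted(scores, key=scores.get, reverse=True)[:keep]


if __name__ == "__main__":
    # Toy, invented parameter values; real inputs would be computed from
    # verified miRNAs (positives) and background sequences (negatives).
    positives = [
        {"gc": 0.50, "length": 22, "noise": 1.0},
        {"gc": 0.55, "length": 22, "noise": 3.0},
    ]
    negatives = [
        {"gc": 0.20, "length": 22, "noise": 2.5},
        {"gc": 0.25, "length": 22, "noise": 1.5},
    ]
    print(filter_parameters(positives, negatives, keep=1))
    print(list(candidate_windows("ACGUACGU", size=4, step=2)))
```

In this sketch, the retained parameter names would then define the feature vector passed to the classifier. A filter of this kind scores each parameter independently, so it cannot detect parameters that are only informative in combination.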

Conclusions: miRPara is an effective tool for locating miRNA coding regions in genome sequences and can be used as a screening step prior to HTS experiments. It is available at http://www.whiov.ac.cn/bioinformatics/mirpara.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3110143
DOI: http://dx.doi.org/10.1186/1471-2105-12-107

Publication Analysis

Top Keywords

coding regions: 12
regions genome: 12
software tool: 8
genome scale: 8
scale sequences: 8
screening step: 8
sequences: 6
mirnas: 5
mirpara svm-based: 4
svm-based software: 4

Similar Publications

Early addiction disorders screening is recommended in primary care. The goal of health system reform is to include allied health professionals in this screening. The appropriation of their new role has not yet been explored.

mettannotator: a comprehensive and scalable Nextflow annotation pipeline for prokaryotic assemblies.

Bioinformatics

January 2025

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom.

Summary: In recent years there has been a surge in prokaryotic genome assemblies, coming from both isolated organisms and environmental samples. These assemblies often include novel species that are poorly represented in reference databases, creating a need for a tool that can annotate both well-described and novel taxa and can run at scale. Here, we present mettannotator, a comprehensive, scalable Nextflow pipeline for prokaryotic genome annotation that identifies coding and non-coding regions, predicts protein functions, including antimicrobial resistance, and delineates gene clusters.

Introduction: Allergic rhinitis is the specific inflammation against allergen by immune defense cells on the nasal mucosa, which can lead to chronic nasal symptoms such as sneezing, itching, runny nose, and nasal congestion. It is associated with high morbidity including sinusitis, asthma, otitis media, hypertrophied inferior turbinate, and nasal polyps. Despite its complications, it remains poorly recognized and tracked.

A new model-based approach for estimating rural hospital markets.

J Rural Health

January 2025

North Carolina Rural Health Research Program, Cecil G. Sheps Center for Health Services Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.

Purpose: To provide a new approach for defining rural hospital markets.

Methods: First, we estimated models of hospital choice. We defined hospitals in the choice set using nationwide hospital data from the Healthcare Cost Report Information System (HCRIS).

Cases for a disease can be defined broadly using diagnostic codes, or narrowly using gold-standard confirmation that often is not available in large administrative datasets. These different definitions can have significant impacts on the results and conclusions of studies. We conducted this study to assess how using melanoma phecodes versus histologic confirmation for invasive or in situ melanoma impacts the results of a genome-wide association study (GWAS) using the Million Veteran Program.
