We present a highly accurate gene-prediction system for eukaryotic genomes, called mGene. It combines in an unprecedented manner the flexibility of generalized hidden Markov models (gHMMs) with the predictive power of modern machine learning methods, such as Support Vector Machines (SVMs). Its excellent performance was demonstrated in an objective competition based on the genome of the nematode Caenorhabditis elegans. Considering the average of sensitivity and specificity, the developmental version of mGene exhibited the best prediction performance at the nucleotide, exon, and transcript levels for ab initio and multiple-genome gene-prediction tasks. The fully developed version shows superior performance in 10 out of 12 evaluation criteria compared with the other participating gene finders, including Fgenesh++ and Augustus. An in-depth analysis of mGene's genome-wide predictions revealed that approximately 2200 predicted genes were not contained in the current genome annotation. Testing a subset of 57 of these genes by RT-PCR and sequencing, we confirmed expression for 24 (42%) of them. mGene missed 300 annotated genes, of which 205 were unconfirmed. RT-PCR testing of 24 of these genes resulted in a success rate of merely 8%. These findings suggest that even the gene catalog of a well-studied organism such as C. elegans can be substantially improved by mGene's predictions. We also provide gene predictions for the four nematodes C. briggsae, C. brenneri, C. japonica, and C. remanei. Comparing the resulting proteomes among these organisms and with the known protein universe, we identified many species-specific gene inventions. In a quality assessment of several available annotations for these genomes, we find that mGene's predictions are the most accurate.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2775605 (PMC)
http://dx.doi.org/10.1101/gr.090597.108 (DOI)
