A PHP Error was encountered

Severity: Warning

Message: file_get_contents(https://...@pubfacts.com&api_key=b8daa3ad693db53b1410957c26c9a51b4908&a=1): Failed to open stream: HTTP request failed! HTTP/1.1 429 Too Many Requests

Filename: helpers/my_audit_helper.php

Line Number: 176

Backtrace:

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 176
Function: file_get_contents

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 250
Function: simplexml_load_file_from_url

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 3122
Function: getPubMedXML

File: /var/www/html/application/controllers/Detail.php
Line: 575
Function: pubMedSearch_Global

File: /var/www/html/application/controllers/Detail.php
Line: 489
Function: pubMedGetRelatedKeyword

File: /var/www/html/index.php
Line: 316
Function: require_once

Vector space classification of DNA sequences. | LitMetric

Vector space classification of DNA sequences.

J Theor Biol

Division of Biology and W. K. Kellogg Radiation Laboratory, California Institute of Technology, 1201 East California Boulevard, Pasadena, CA 91125, USA.

Published: July 2003

Revisiting the problem of intron-exon identification, we use a principal component analysis (PCA) to classify DNA sequences and present first results that validate our approach. Sequences are translated into document vectors that represent their word content; a principal component analysis then defines Gaussian-distributed sequence classes. The classification uses word content and variation of word usage to distinguish sequences. We test our approach with several data sets of genomic DNA and are able to classify introns and exons with an accuracy of up to 96%. We compare the method with the best traditional coding measure, the non-overlapping hexamer frequency count, and find that the PCA method produces better results. We also investigate the degree of cross-validation between different data sets of introns and exons and find evidence that the quality of a data set can be detected.

Download full-text PDF

Source
http://dx.doi.org/10.1016/s0022-5193(03)00082-1DOI Listing

Publication Analysis

Top Keywords

dna sequences
8
principal component
8
component analysis
8
word content
8
data sets
8
introns exons
8
vector space
4
space classification
4
classification dna
4
sequences
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!