Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most genomic sequence patterns. We extended the suffix-array-based Biological Language Modeling Toolkit to compute n-gram frequencies, as well as n-gram language-model-based perplexity, in windows over the whole genome sequence in order to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence.
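The abstract does not describe the toolkit's internals, so the following is only a minimal Python sketch, under our own assumptions, of the two ideas it names: counting n-grams by binary search over a suffix array, and scoring fixed-size windows by n-gram perplexity. Everything here is illustrative and hypothetical, not the Biological Language Modeling Toolkit's API: the names `NgramCounter`, `window_perplexity` and `sliding_perplexity`, the naive suffix-array construction, the add-one smoothing, the trigram default and the ACGT-only alphabet are all assumptions.

```python
import math
from bisect import bisect_left, bisect_right

ALPHABET = "ACGT"  # assumption: uppercase nucleotides only (no N, no soft-masking)


def build_suffix_array(seq):
    """Naive suffix array: suffix start positions sorted lexicographically.

    Illustrative only; genome-scale tools use far faster constructions.
    """
    return sorted(range(len(seq)), key=lambda i: seq[i:])


class NgramCounter:
    """Hypothetical helper: counts any n-gram by binary search over a suffix array."""

    def __init__(self, seq):
        self.seq = seq
        self.sa = build_suffix_array(seq)

    def count(self, gram):
        # Suffixes starting with `gram` are contiguous in the suffix array, and the
        # length-k prefixes of sorted suffixes are themselves sorted, so two binary
        # searches bracket exactly the occurrences of `gram`.
        k = len(gram)
        prefixes = [self.seq[i:i + k] for i in self.sa]
        return bisect_right(prefixes, gram) - bisect_left(prefixes, gram)


def window_perplexity(counter, window, n=3):
    """Perplexity of `window` under an add-one-smoothed n-gram model.

    Counts come from the whole sequence indexed by `counter`; only positions
    with a full (n-1)-symbol history inside the window are scored.
    """
    log_prob = 0.0
    scored = 0
    for i in range(n - 1, len(window)):
        history = window[i - n + 1:i]
        # Normalize over the alphabet so the conditional probabilities sum to 1.
        follow = {c: counter.count(history + c) for c in ALPHABET}
        total = sum(follow.values()) + len(ALPHABET)
        p = (follow.get(window[i], 0) + 1) / total
        log_prob += math.log(p)
        scored += 1
    if scored == 0:
        raise ValueError("window must be longer than n - 1 symbols")
    return math.exp(-log_prob / scored)


def sliding_perplexity(seq, window_size, step, n=3):
    """Yield (start, perplexity) for fixed-size windows across `seq`."""
    counter = NgramCounter(seq)
    for start in range(0, len(seq) - window_size + 1, step):
        window = seq[start:start + window_size]
        yield start, window_perplexity(counter, window, n)


if __name__ == "__main__":
    # Toy sequence: a tandem repeat, an irregular stretch, then the repeat again.
    genome = "ACGTACGTACGTACGT" + "GATTACAGGCTTAACG" + "ACGTACGTACGTACGT"
    print(NgramCounter(genome).count("ACGT"))
    for start, pp in sliding_perplexity(genome, window_size=16, step=8):
        print(start, round(pp, 3))
```

In this toy run the repetitive windows should score lower perplexity than the irregular middle window, because the model (trained on the same sequence) predicts the repeat well. That contrast is the kind of signal a windowed scan can use to flag repeat-rich regions.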
DOI: http://dx.doi.org/10.1142/S0219720012500163