Strings of nucleotides carrying biological information are typically described as sequence motifs represented by weight matrices or consensus sequences. However, many signals in DNA or RNA are recognized by multiple factors in temporal sequence, consist of distinct alternative motifs, or are best described by base composition. Here we apply the latent Dirichlet allocation (LDA) mixture model to nucleotide sequences. Using positions in an alignment of human or Drosophila splice sites as samples, we show that LDA readily identifies motifs, including such elusive cases as the intron branch site. Using whole sequences with positional k-mers as features, LDA can identify sequence subtypes enriched in long vs. short introns. LDA with bulk k-mers can reliably distinguish reading frame and species of origin in coding sequences from humans and Drosophila. We find that LDA is a useful model for describing heterogeneous signals, for assigning individual sequences to subtypes, and for identifying and characterizing sequences that do not fit recognized subtypes. Because LDA topic models are interpretable, they also aid the discovery of new motifs, even those present in a small fraction of samples. In summary, LDA can identify and characterize signals in nucleotide sequences, including candidate regulatory factors involved in biological processes.
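The distinction the abstract draws between bulk k-mers and positional k-mers can be illustrated with a minimal sketch. It only builds the count features that a topic model would consume; it does not fit LDA, and it is not the authors' code. The function names (`bulk_kmers`, `positional_kmers`, `count_matrix`) and the `position:kmer` feature encoding are assumptions made for illustration.

```python
from collections import Counter


def bulk_kmers(seq, k):
    """Count k-mers regardless of where they occur (a bag of k-mers)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def positional_kmers(seq, k):
    """Count k-mers tagged with their start position, e.g. '2:GT'.

    Assumes the sequences are aligned, so that position i means the
    same thing in every sequence (as in a splice-site alignment).
    """
    seq = seq.upper()
    return Counter(f"{i}:{seq[i:i + k]}" for i in range(len(seq) - k + 1))


def count_matrix(docs):
    """Turn a list of feature Counters into (vocabulary, rows of counts)."""
    vocab = sorted(set().union(*docs))
    rows = [[doc.get(feature, 0) for feature in vocab] for doc in docs]
    return vocab, rows


if __name__ == "__main__":
    # Toy sequences, invented for this example only.
    seqs = ["ACGTAC", "ACGTTT"]
    vocab, rows = count_matrix([positional_kmers(s, 2) for s in seqs])
    for feature, column in zip(vocab, zip(*rows)):
        print(feature, column)
```

Each row of the resulting matrix is one sequence and each column one k-mer feature, which is the document-by-word count layout that LDA implementations generally expect. With positional features, the same k-mer at two different offsets counts as two different words; with bulk features, offsets are ignored.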
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11310860 | PMC |
http://dx.doi.org/10.1093/nargab/lqae099 | DOI Listing |