Large-scale k-mer-based analysis of the informational properties of genomes, comparative genomics and taxonomy.

AI Article Synopsis

  • Information theoretic methods are widely used in bioinformatics, especially in comparative genomics, for analyzing DNA sequences through k-mers (short DNA words).
  • The study assessed k-mer lengths by their sequence space coverage across 5805 genomes from the KEGG GENOME database, then compared four lengths (11, 21, 31, 41); hierarchical clustering of 1634 genus-level representative genomes using pairwise 21- and 31-mer Jaccard similarities best recapitulated the phylogenetic/taxonomic tree of life.
  • An analysis of about 14.2 million prokaryotic genome comparisons, grouped by lowest-common-ancestor taxon level, revealed many potential misclassification errors in a curated database, underscoring the need for quantitative taxonomic classification based on whole-genome similarity.
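
To make the k-mer Jaccard similarity mentioned in the synopsis concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, sequences are plain strings, and k-mers are taken from one strand only (the synopsis does not say whether reverse complements were merged).

```python
def kmer_set(seq, k):
    """Return the set of distinct length-k substrings (k-mers) of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def kmer_jaccard(seq1, seq2, k):
    """Jaccard similarity between the k-mer sets of two sequences.

    Assumption: single-strand k-mers, no canonicalization.
    """
    return jaccard(kmer_set(seq1, k), kmer_set(seq2, k))
```

For example, `kmer_jaccard("ACGT", "ACGA", 2)` compares {AC, CG, GT} with {AC, CG, GA}: two shared 2-mers out of four distinct ones, giving 0.5. A pairwise matrix of `1 - similarity` values is the kind of input a hierarchical clustering routine would take.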

Article Abstract

Information theoretic approaches are ubiquitous and effective in a wide variety of bioinformatics applications. In comparative genomics, alignment-free methods, based on short DNA words, or k-mers, are particularly powerful. We evaluated the utility of varying k-mer lengths for genome comparisons by analyzing their sequence space coverage of 5805 genomes in the KEGG GENOME database. In subsequent analyses on four k-mer lengths spanning the relevant range (11, 21, 31, 41), hierarchical clustering of 1634 genus-level representative genomes using pairwise 21- and 31-mer Jaccard similarities best recapitulated a phylogenetic/taxonomic tree of life with clear boundaries for superkingdom domains and high subtree similarity for named taxons at lower levels (family through phylum). By analyzing ~14.2M prokaryotic genome comparisons by their lowest-common-ancestor taxon levels, we detected many potential misclassification errors in a curated database, further demonstrating the need for wide-scale adoption of quantitative taxonomic classifications based on whole-genome similarity.
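
The abstract does not define "sequence space coverage". One plausible reading, sketched below in Python, is the fraction of the 4^k possible DNA k-mers that actually occur in a sequence. The function name and the choice to ignore k-mers containing ambiguous bases such as N are assumptions made for illustration.

```python
def sequence_space_coverage(seq, k):
    """Fraction of the 4**k possible DNA k-mers observed in seq.

    Assumption: k-mers containing characters other than A, C, G, T
    are skipped rather than counted.
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    observed = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= alphabet:
            observed.add(kmer)
    return len(observed) / 4 ** k
```

Under this definition the space grows quickly with k: there are 4^11 = 4,194,304 possible 11-mers, so a genome of a few million bases can cover much of that space, while it can cover only a tiny fraction of the 21-mer space. That contrast is one reason the choice of k affects how well k-mer sets discriminate between genomes.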

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8516232
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0258693

Publication Analysis

Top Keywords

comparative genomics: 8
k-mer lengths: 8
genome comparisons: 8
large-scale k-mer-based: 4
k-mer-based analysis: 4
analysis informational: 4
informational properties: 4
properties genomes: 4
genomes comparative: 4
genomics taxonomy: 4