The wide adoption of bacterial genome sequencing, together with the encoding of both core and accessory genome variation as k-mers, has allowed bacterial genome-wide association studies (GWAS) to identify genetic variants associated with relevant phenotypes, such as those linked to infection. Significant limitations remain, however: k-mers can be duplicated across gene clusters, and association results are difficult to interpret, both of which hinder the wider adoption of GWAS methods on microbial data sets. We have developed a simple computational method (panfeed) that explicitly links each k-mer to its gene cluster at base-resolution level, which allows us to avoid biases introduced by a global de Bruijn graph and to map and annotate associated variants more easily. We tested panfeed on two independent data sets and correctly identified previously characterized causal variants, which demonstrates the precision of the method as well as its scalable performance. panfeed is a command line tool written in the Python programming language and is available at https://github.com/microbial-pangenomes-lab/panfeed.
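
The idea of tying each k-mer to a gene cluster and a base position can be illustrated with a minimal sketch. This is not panfeed's implementation or interface; the function name `gene_cluster_kmers`, the input layout (a dictionary of cluster name to per-strain gene sequences), and the toy sequences are all assumptions made for illustration only.

```python
from collections import defaultdict


def gene_cluster_kmers(clusters, k):
    """Index k-mers separately for each gene cluster.

    `clusters` maps a cluster name to {strain: gene sequence}
    (hypothetical layout, not panfeed's input format).
    Returns {cluster: {kmer: [(strain, start_position), ...]}}.
    """
    index = {}
    for cluster, genes in clusters.items():
        kmers = defaultdict(list)
        for strain, seq in genes.items():
            # Slide a window of length k and record where each k-mer starts.
            for pos in range(len(seq) - k + 1):
                kmers[seq[pos:pos + k]].append((strain, pos))
        index[cluster] = dict(kmers)
    return index


# Toy data: the k-mer "ATGC" occurs in both clusters.
clusters = {
    "clusterA": {"strain1": "ATGCGT", "strain2": "ATGCGA"},
    "clusterB": {"strain1": "ATGCCC"},
}
index = gene_cluster_kmers(clusters, k=4)
```

Because the index is keyed by cluster first, the shared k-mer `ATGC` is kept as two separate entries, one per cluster, each with its own strain and position records. A single global k-mer table would instead merge them, which is the kind of ambiguity the abstract describes.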

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10711318
DOI: http://dx.doi.org/10.1099/mgen.0.001129