PanKmer: k-mer-based and reference-free pangenome analysis.

Bioinformatics

The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States.

Published: October 2023

Summary: Pangenomes are replacing single reference genomes as the definitive representation of DNA sequence within a species or clade. Pangenome analysis predominantly leverages graph-based methods that require computationally intensive multiple genome alignments, do not scale to highly complex eukaryotic genomes, limit their scope to identifying structural variants (SVs), or incur bias by relying on a reference genome. Here, we present PanKmer, a toolkit designed for reference-free analysis of pangenome datasets consisting of dozens to thousands of individual genomes. PanKmer decomposes a set of input genomes into a table of observed k-mers and their presence-absence values in each genome. These are stored in an efficient k-mer index data format that encodes SNPs, INDELs, and SVs. PanKmer also includes functions for downstream analysis of the k-mer index, such as calculating sequence similarity statistics between individuals at whole-genome or local scales. For example, k-mers can be "anchored" in any individual genome to quantify sequence variability or conservation at a specific locus. This facilitates workflows with various biological applications, e.g. identifying cases of hybridization between plant species. PanKmer provides researchers with a valuable and convenient means to explore the full scope of genetic variation in a population, without reference bias.
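The decomposition, comparison, and anchoring steps described in the summary can be illustrated with a small, self-contained Python sketch. It is not PanKmer's implementation or API: the function names (`canonical`, `kmers`, `presence_absence`, `jaccard`, `anchored_conservation`) are hypothetical, and the in-memory dictionary of integer bitmasks only stands in for PanKmer's k-mer index format, which is not reproduced here. Two further choices are assumptions made for illustration: k-mers are counted in canonical form (a k-mer and its reverse complement treated as one), and Jaccard similarity is used as the genome-to-genome statistic. The anchoring function slides a window along a region of one chosen genome and reports, for each k-mer, the fraction of genomes that contain it, as a rough proxy for local conservation.

```python
"""Toy k-mer presence-absence table (illustrative sketch, not PanKmer's API)."""

_ACGT = frozenset("ACGT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Assumption: use the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq: str, k: int) -> set[str]:
    """Canonical k-mers of seq; windows containing non-ACGT characters are skipped."""
    seq = seq.upper()
    out: set[str] = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= _ACGT:
            out.add(canonical(window))
    return out


def presence_absence(genomes: dict[str, str], k: int) -> dict[str, int]:
    """Map each observed k-mer to a bitmask: bit j is set if genome j contains it.

    Genome indices follow the insertion order of the `genomes` dict.
    """
    table: dict[str, int] = {}
    for j, seq in enumerate(genomes.values()):
        for km in kmers(seq, k):
            table[km] = table.get(km, 0) | (1 << j)
    return table


def jaccard(table: dict[str, int], i: int, j: int) -> float:
    """Jaccard similarity of genomes i and j: shared k-mers / k-mers in either."""
    both = either = 0
    for mask in table.values():
        a, b = (mask >> i) & 1, (mask >> j) & 1
        both += a & b
        either += a | b
    return both / either if either else 0.0


def anchored_conservation(table: dict[str, int], region: str, k: int,
                          n_genomes: int) -> list[float]:
    """For each k-mer window in a region of an anchor genome, the fraction of
    genomes in the table that contain that k-mer (hypothetical helper)."""
    region = region.upper()
    scores = []
    for i in range(len(region) - k + 1):
        mask = table.get(canonical(region[i:i + k]), 0)
        scores.append(bin(mask).count("1") / n_genomes)
    return scores


if __name__ == "__main__":
    genomes = {"g1": "ACGTT", "g2": "ACGTA", "g3": "TTTTT"}
    k = 3
    table = presence_absence(genomes, k)
    print(round(jaccard(table, 0, 1), 3))  # 0.333
    print(jaccard(table, 0, 2))            # 0.0
    # Anchor on g1 and profile sharing along its whole sequence.
    profile = anchored_conservation(table, genomes["g1"], k, len(genomes))
    print([round(s, 2) for s in profile])  # [0.67, 0.67, 0.33]
```

On this toy input, g1 and g2 share one of three distinct canonical k-mers, so their Jaccard similarity is 1/3, and the anchored profile over g1 drops at the last window, whose k-mer (`GTT`, canonical form `AAC`) occurs only in g1. One bitmask per k-mer makes presence-absence queries simple to express, but this in-memory representation is only meant to show the idea and would not scale to large genome collections.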

Availability and Implementation: PanKmer is implemented as a Python package with components written in Rust, released under a BSD license. The source code is available from the Python Package Index (PyPI) at https://pypi.org/project/pankmer/ as well as GitLab at https://gitlab.com/salk-tm/pankmer. Full documentation is available at https://salk-tm.gitlab.io/pankmer/.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603592
DOI: http://dx.doi.org/10.1093/bioinformatics/btad621