PanKmer: k-mer-based and reference-free pangenome analysis.

Anthony J Aylward, Semar Petrus, Allen Mamerto, Nolan T Hartwick, Todd P Michael

Bioinformatics

The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States.

Published: October 2023

Summary: Pangenomes are replacing single reference genomes as the definitive representation of DNA sequence within a species or clade. Pangenome analysis predominantly leverages graph-based methods that require computationally intensive multiple genome alignments, do not scale to highly complex eukaryotic genomes, limit their scope to identifying structural variants (SVs), or incur bias by relying on a reference genome. Here, we present PanKmer, a toolkit designed for reference-free analysis of pangenome datasets consisting of dozens to thousands of individual genomes. PanKmer decomposes a set of input genomes into a table of observed k-mers and their presence-absence values in each genome. These values are stored in an efficient k-mer index data format that encodes SNPs, INDELs, and SVs. PanKmer also includes functions for downstream analysis of the k-mer index, such as calculating sequence similarity statistics between individuals at whole-genome or local scales. For example, k-mers can be "anchored" in any individual genome to quantify sequence variability or conservation at a specific locus. This facilitates workflows with various biological applications, e.g. identifying cases of hybridization between plant species. PanKmer provides researchers with a valuable and convenient means to explore the full scope of genetic variation in a population, without reference bias.
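
The decomposition described in the summary can be illustrated with a minimal pure-Python sketch. This is not PanKmer's implementation or API: the function names (`kmer_set`, `presence_absence`, `jaccard`, `anchor`), the use of canonical k-mers, and the choice of Jaccard similarity as the whole-genome statistic are all assumptions made for illustration, and the toy sequences are invented.

```python
# Illustrative sketch only; names and design choices are hypothetical,
# not taken from the PanKmer package.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(sequence, k):
    """Collect the canonical k-mers of a sequence, skipping any with non-ACGT bases."""
    seq = sequence.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            kmers.add(canonical(kmer))
    return kmers


def presence_absence(genomes, k):
    """Map each observed k-mer to a tuple of 0/1 values, one per genome.

    `genomes` is a dict of name -> sequence; columns follow its insertion order.
    """
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    all_kmers = set().union(*sets.values())
    return {
        kmer: tuple(int(kmer in sets[name]) for name in genomes)
        for kmer in sorted(all_kmers)
    }


def jaccard(table, i, j):
    """Whole-genome similarity of genomes i and j: shared k-mers / k-mers in either."""
    shared = sum(1 for row in table.values() if row[i] and row[j])
    either = sum(1 for row in table.values() if row[i] or row[j])
    return shared / either if either else 0.0


def anchor(table, sequence, k):
    """For each position of an anchor sequence, count the genomes containing its k-mer."""
    seq = sequence.upper()
    return [
        sum(table.get(canonical(seq[i:i + k]), ()))
        for i in range(len(seq) - k + 1)
    ]


# Toy example with two invented sequences and k = 3.
toy_genomes = {"g1": "ACGTAC", "g2": "ACGTTT"}
toy_table = presence_absence(toy_genomes, 3)
toy_similarity = jaccard(toy_table, 0, 1)      # 0.25
toy_profile = anchor(toy_table, "ACGTAC", 3)   # [2, 2, 1, 1]
```

In the toy example, the per-position counts from `anchor` drop from 2 to 1 where the two sequences diverge, which is the kind of local conservation signal the summary describes. A real index would need a far more compact representation than a Python dict to handle thousands of genomes.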

Availability and Implementation: PanKmer is implemented as a Python package with components written in Rust, released under a BSD license. The source code is available from the Python Package Index (PyPI) at https://pypi.org/project/pankmer/ as well as GitLab at https://gitlab.com/salk-tm/pankmer. Full documentation is available at https://salk-tm.gitlab.io/pankmer/.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603592
DOI: http://dx.doi.org/10.1093/bioinformatics/btad621

Similar Publications

Whole genome sequencing analysis of enteropathogenic Escherichia coli from human and companion animals in Korea.

J Vet Sci

December 2024

College of Veterinary Medicine, Chungbuk National University, Cheongju 28644, Korea.

Jae Young Oh, Kyung-Hyo Do, Jae Hong Jeong, SuMin Kwak, Sujin Choe

Importance: This study is essential for comprehending the zoonotic transmission, antimicrobial resistance, and genetic diversity of enteropathogenic Escherichia coli (EPEC).

Objective: To improve our understanding of EPEC, this study focused on analyzing and comparing the genomic characteristics of EPEC isolates from humans and companion animals in Korea.

Methods: The whole genome of 26 EPEC isolates from patients with diarrhea and 20 EPEC isolates from companion animals in Korea were sequenced using the Illumina HiSeq X (Illumina, USA) and Oxford Nanopore MinION (Oxford Nanopore Technologies, UK) platforms.

First report on comprehensive genomic analysis of a multidrug-resistant Enterobacter asburiae isolated from diabetic foot infection from Bangladesh.

Sci Rep

January 2025

Department of Microbiology, University of Dhaka, Dhaka, 1000, Bangladesh.

Md Rafiul Islam, Spencer Mark Mondol, Md Azad Hossen, Mst Poli Khatun, Shahjada Selim

Enterobacter asburiae (E. asburiae) is a gram-negative rod-shaped bacterium which has emerging significance as an opportunistic pathogen having high virulence pattern and drug resistant properties. In this study, we present the detailed analysis of the whole genome sequence of a multidrug-resistant (MDR) E.

Decomposition of the pangenome matrix reveals a structure in gene distribution in the species.

mSphere

December 2024

Department of Bioengineering, University of California, San Diego, La Jolla, California, USA.

Siddharth M Chauhan, Omid Ardalani, Jason C Hyun, Jonathan M Monk, Patrick V Phaneuf

Unlabelled: Thousands of complete genome sequences for strains of a species that are now available enable the advancement of pangenome analytics to a new level of sophistication. We collected 2,377 publicly available complete genomes of for detailed pangenome analysis. The core genome and accessory genomes consisted of 2,398 and 5,182 genes, respectively.

Transposon-sequencing across multiple isolates reveals significant functional genomic diversity among strains.

mBio

December 2024

Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

Chidiebere Akusobi, Sanjeevani Choudhery, Bouchra S Benghomari, Ian D Wolf, Shreya Singhvi

Unlabelled: ) is a clinically significant pathogen and a highly genetically diverse species due to its large accessory genome. The functional consequence of this diversity remains unknown mainly because, to date, functional genomic studies in have been primarily performed on reference strains. Given the growing public health threat of infections, understanding the functional genomic differences among clinical isolates can provide more insight into how its genetic diversity influences gene essentiality, clinically relevant phenotypes, and importantly, potential drug targets.

Pangenome and genome variation analyses of pigs unveil genomic facets for their adaptation and agronomic characteristics.

Imeta

December 2024

Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, Laboratory of Animal Fat Deposition & Muscle Development, College of Animal Science and Technology Northwest A&F University Yangling Shaanxi China.

Dong Li, Yulong Wang, Tiantian Yuan, Minghao Cao, Yulin He

The development of a comprehensive pig graph pangenome assembly encompassing 27 genomes represents the most extensive collection of pig genomic data to date. Analysis of this pangenome reveals the critical role of structural variations in driving adaptation and defining breed-specific traits. Notably, the study identifies as a key candidate gene governing intramuscular fat deposition and meat quality in pigs.
