Motivation: Multi-strain infection is a common yet under-investigated phenomenon of many pathogens. Currently, biologists analyzing SNP information sometimes have to discard mixed infection samples as many downstream analyses require monogenomic inputs. Such a protocol impedes our understanding of the underlying genetic diversity, co-infection patterns, and genomic relatedness of pathogens. A scalable tool to learn and resolve the SNP-haplotypes from polygenomic data is an urgent need in molecular epidemiology.
Results: We develop a slice sampling Markov Chain Monte Carlo algorithm, named SNP-Slice, to learn not only the SNP-haplotypes of all strains in the populations but also which strains infect which hosts. Our method reconstructs SNP-haplotypes and individual heterozygosities accurately without reference panels and outperforms the state-of-the-art methods at estimating the multiplicity of infections and allele frequencies. Thus, SNP-Slice introduces a novel approach to address polygenomic data and opens a new avenue for resolving complex infection patterns in molecular surveillance. We illustrate the performance of SNP-Slice on empirical malaria and HIV datasets and provide recommendations for using our method on empirical datasets.
Availability And Implementation: The implementation of the SNP-Slice algorithm, as well as scripts to analyze SNP-Slice outputs, are available at https://github.com/nianqiaoju/snp-slice.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11187496 | PMC |
http://dx.doi.org/10.1093/bioinformatics/btae344 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!