Motivation: This article considers the problem of estimating population genetic subdivision from multilocus genotype data. The model considered makes use of the genotypes and, optionally, the spatial coordinates of the sampled individuals. Particular attention is paid to the case of low genetic differentiation, using a previously described Bayesian clustering model in which allele frequencies are assumed to be correlated a priori. Under this model, various inference problems are considered, in particular the common and difficult, but still unaddressed, situation where the number of populations is unknown.
Results: A Markov chain Monte Carlo algorithm and a new post-processing scheme are proposed. It is shown that they significantly improve on the accuracy of existing algorithms in terms of the estimated number of populations and estimated population membership. This is illustrated numerically with data simulated from the prior-likelihood model used in inference and also with data simulated from a Wright-Fisher model. Improvements are also illustrated on a real dataset of 88 wolverines (Gulo gulo) genotyped at 10 microsatellite loci. The solutions presented here are not specific to any particular clustering model and are hence relevant to many settings in population genetics where weakly differentiated populations are assumed or sought.
Availability: The improvements will be made available in version 3.0.0 of the R package Geneland. Information on how to obtain and use the software is available from http://folk.uio.no/gillesg/Geneland.html.
Supplementary Information: http://folk.uio.no/gillesg/CFM/SuppMat.pdf.
DOI: http://dx.doi.org/10.1093/bioinformatics/btn419