Estimating haplotype frequencies by combining data from large DNA pools with database information.

Dario Gasbarra, Sangita Kulathinal, Matti Pirinen, Mikko J. Sillanpää

IEEE/ACM Trans Comput Biol Bioinform

Department of Mathematics and Statistics, University of Helsinki, FIN 00014 Helsinki, Finland.

Published: February 2011

We assume that allele frequency data have been extracted from several large DNA pools, each containing the genetic material of up to hundreds of sampled individuals. Our goal is to estimate the haplotype frequencies among the sampled individuals by combining the pooled allele frequency data with prior knowledge about the set of possible haplotypes. Such prior information can be obtained, for example, from a database such as HapMap. We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of considered loci exceed the limits of several earlier methods. In the example analyses on real data from the HapMap database, the proposed model clearly outperforms a deterministic greedy algorithm. With a small number of loci, the proposed method performs similarly to an EM algorithm that uses a multivariate normal approximation for the pooled allele frequencies but does not use prior information about the haplotypes. The method has been implemented in Matlab, and the code is available from the authors upon request.
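The abstract refers to a multivariate normal (continuous) approximation for pooled allele frequencies. The following is a minimal sketch of that kind of approximation, not the authors' Matlab implementation, whose details are not given here. It assumes that each pool contains N haplotypes drawn multinomially from a set of candidate haplotypes (for example, taken from a reference database) with frequencies q, and that the candidates are encoded as a 0/1 matrix H (rows: haplotypes, columns: SNP loci). Under these assumptions, the pooled allele frequencies are approximately normal with mean H^T q and covariance H^T (diag(q) - q q^T) H / N. The function names `pooled_allele_moments` and `gaussian_loglik` are hypothetical, and pooling measurement error is ignored.

```python
import numpy as np


def pooled_allele_moments(H, q, n_haplotypes):
    """Mean and covariance of pooled allele frequencies under a normal
    approximation to multinomial sampling of haplotypes (illustrative sketch).

    H            : (K, L) 0/1 matrix; row k = candidate haplotype k, column l = locus l.
    q            : (K,) haplotype frequencies summing to 1.
    n_haplotypes : haplotypes in the pool (2 x individuals for diploids).
    """
    H = np.asarray(H, dtype=float)
    q = np.asarray(q, dtype=float)
    # Haplotype counts ~ Multinomial(N, q): covariance N (diag(q) - q q^T).
    # Dividing the counts by N gives the covariance of haplotype frequencies.
    cov_q = (np.diag(q) - np.outer(q, q)) / n_haplotypes
    mean = H.T @ q            # expected allele frequency at each locus
    cov = H.T @ cov_q @ H     # linear map of the haplotype-frequency covariance
    return mean, cov


def gaussian_loglik(y, H, q, n_haplotypes, jitter=1e-9):
    """Approximate log-likelihood of observed pooled allele frequencies y."""
    mean, cov = pooled_allele_moments(H, q, n_haplotypes)
    # Small jitter guards against singular covariances (e.g. monomorphic loci).
    cov = cov + jitter * np.eye(len(mean))
    diff = np.asarray(y, dtype=float) - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (quad + logdet + len(mean) * np.log(2 * np.pi))


# Hypothetical example: 4 candidate haplotypes over 3 SNPs and one pool of
# 200 diploid individuals (400 haplotypes).
H = np.array([[0, 0, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]])
q = np.array([0.4, 0.3, 0.2, 0.1])
mean, cov = pooled_allele_moments(H, q, n_haplotypes=400)
print(mean)                # approximately [0.4, 0.3, 0.5]
print(np.sqrt(np.diag(cov)))  # standard deviations of pooled frequencies
```

Because each column of H is 0/1, the diagonal of the covariance reduces to the familiar binomial variance p(1 - p)/N at each locus. The off-diagonal terms carry the linkage information that makes haplotype frequencies identifiable from pools. An estimator could, for instance, maximize the sum of such log-likelihoods over pools, or, in a Bayesian setting, combine them with a prior on q. This sketch shows only the likelihood ingredient.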

DOI: http://dx.doi.org/10.1109/TCBB.2009.71
