Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWASs). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks.
We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction.
This article applies the recently proposed "stability selection" procedure of Meinshausen and Bühlmann to the problem of variable selection in genome-wide association. In particular, it explores whether stability selection can identify new regions of interest originally missed or can call into legitimate question regions originally flagged. Our analysis of the seven data sets of the Wellcome Trust Case-Control Consortium suggests that stability selection effectively controls the family-wise error rate but suffers a loss of power.
BMC Bioinformatics
June 2011
Background: The estimation of individual ancestry from genetic data has become essential to applied population genetics and genetic epidemiology. Software programs for calculating ancestry estimates have become essential tools in the geneticist's analytic arsenal.
Results: Here we describe four enhancements to ADMIXTURE, a high-performance tool for estimating individual ancestries and population allele frequencies from SNP (single nucleotide polymorphism) data.
Population stratification has long been recognized as a confounding factor in genetic association studies. Estimated ancestries, derived from multi-locus genotype data, can be used to perform a statistical correction for population stratification. One popular technique for estimation of ancestry is the model-based approach embodied by the widely applied program structure.