Inferring population structure in biobank-scale genomic data.

Am J Hum Genet

Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA. Electronic address:

Published: April 2022

Inferring the structure of human populations from genetic variation data is a key task in population and medical genomic studies. Although a number of methods for population structure inference have been proposed, current methods are impractical to run on biobank-scale genomic datasets containing millions of individuals and genetic variants. We introduce SCOPE, a method for population structure inference that is orders of magnitude faster than existing methods while achieving comparable accuracy. SCOPE infers population structure in about a day on a dataset containing one million individuals and variants as well as on the UK Biobank dataset containing 488,363 individuals and 569,346 variants. Furthermore, SCOPE can leverage allele frequencies from previous studies to improve the interpretability of population structure estimates.
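The abstract does not describe SCOPE's algorithm. To make "population structure inference" concrete, here is a minimal, hedged sketch of the standard admixture model that methods in this area fit. It treats the model as an assumption rather than as SCOPE's method. Each individual's expected genotype (0/1/2 copies of an allele) at a variant is twice a mixture of population-specific allele frequencies, G ≈ 2PQ. P holds frequencies in [0, 1], and each individual's row of Q holds admixture proportions that sum to 1. The sketch fits this model by plain alternating least squares (ALS), a common approach for this model. It is not SCOPE's implementation and is nowhere near biobank-scale. The function names, the simplex projection, the clipping of P, the small ridge term and the initialization are all illustrative choices. Passing known frequencies with `fix_p=True` loosely mimics the idea of reusing allele frequencies from previous studies, but it is not SCOPE's actual interface.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]
RIDGE = 1e-8  # assumed small ridge term that keeps the normal equations well-posed


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve the small k x k system a @ x = b by Gaussian elimination with partial pivoting."""
    k = len(a)
    aug = [a[r][:] + [b[r]] for r in range(k)]  # augmented matrix [a | b]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            f = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        s = aug[r][k] - sum(aug[r][c] * x[c] for c in range(r + 1, k))
        x[r] = s / aug[r][r]
    return x


def project_simplex(v: List[float]) -> List[float]:
    """Euclidean projection onto {q : q >= 0, sum(q) = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def gram(rows: Matrix) -> Matrix:
    """Return rows^T rows + RIDGE * I for a list of length-k rows."""
    k = len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) + (RIDGE if a == b else 0.0)
             for b in range(k)] for a in range(k)]


def fit_admixture_als(g: Matrix, k: int, n_iter: int = 50,
                      p_init: Optional[Matrix] = None,
                      fix_p: bool = False) -> Tuple[Matrix, Matrix]:
    """Fit g / 2 ~= P Q^T by alternating least squares (illustrative sketch).

    g: m x n genotype matrix (variants x individuals, entries 0/1/2).
    Returns P (m x k allele frequencies in [0, 1]) and
    Q (n x k admixture proportions, each row on the probability simplex).
    """
    m, n = len(g), len(g[0])
    x = [[v / 2.0 for v in row] for row in g]  # scale genotypes to [0, 1]
    if p_init is not None:
        p = [row[:] for row in p_init]
    else:
        # Hypothetical initialization: column c of P starts at the scaled
        # genotypes of individual c (mod n).
        p = [[x[i][c % n] for c in range(k)] for i in range(m)]
    q = [[1.0 / k] * k for _ in range(n)]
    for _ in range(n_iter):
        # Q-step: per-individual least squares, then project onto the simplex.
        ptp = gram(p)
        for j in range(n):
            rhs = [sum(p[i][a] * x[i][j] for i in range(m)) for a in range(k)]
            q[j] = project_simplex(solve_linear(ptp, rhs))
        if fix_p:
            break  # P is fixed, so a single Q-step is the whole fit
        # P-step: per-variant least squares, then clip to valid frequencies.
        qtq = gram(q)
        for i in range(m):
            rhs = [sum(q[j][a] * x[i][j] for j in range(n)) for a in range(k)]
            p[i] = [min(max(f, 0.0), 1.0) for f in solve_linear(qtq, rhs)]
    return p, q


# Toy usage: two "pure" individuals and one 50/50 mix, with known frequencies.
known_p = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
g = [[2, 0, 1], [0, 2, 1], [2, 0, 1], [0, 2, 1]]
p, q = fit_admixture_als(g, k=2, p_init=known_p, fix_p=True)
print([[round(v, 3) for v in row] for row in q])  # [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
```

The per-individual and per-variant solves are independent, which is why ALS-style updates parallelize well. This toy version recomputes dense sums in pure Python and would need very different engineering to approach the scale the abstract reports.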

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9069078
DOI: http://dx.doi.org/10.1016/j.ajhg.2022.02.015