Background: Admixed populations arise when two or more previously isolated populations interbreed. A powerful approach to addressing the genetic complexity of admixed populations is to infer ancestry. Ancestry inference covers both the proportion of an individual's genome derived from each ancestral population and the ancestral origin of each segment along the chromosomes, and it requires ancestry informative markers (AIMs) from reference ancestral populations. AIMs exhibit substantial differences in allele frequency between ancestral populations. Given the large amount of human genetic variation data now available from diverse populations, a computationally feasible and cost-effective way to extract or filter the AIMs with maximum information content is increasingly important for ancestry inference, admixture mapping, forensic applications, and the detection of genomic regions that have been under recent selection.
Results: To address this need, we present MI-MAAP, an easy-to-use web-based bioinformatics tool designed to prioritize informative markers for multi-ancestry admixed populations by combining feature selection methods with multiple genomics resources, including the 1000 Genomes Project and the Human Genome Diversity Project. Specifically, the tool implements a novel allele frequency-based feature selection algorithm, the Lancaster Estimator of Independence (LEI), alongside genotype-based methods such as Principal Component Analysis (PCA), Support Vector Machine (SVM), and Random Forest (RF). We demonstrated that MI-MAAP is useful for prioritizing informative markers and accurately classifying ancestral populations. LEI is an efficient feature selection strategy that retrieves ancestry informative variants whose allele frequency or selection pressure differs among (or between) ancestries, without requiring computationally expensive individual-level genotype data.
Conclusions: MI-MAAP has a user-friendly interface that gives researchers an easy and fast way to filter and identify AIMs. MI-MAAP can be accessed at https://research.cchmc.org/mershalab/MI-MAAP/login/.
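The abstract notes that AIMs show substantial allele-frequency differences between ancestral populations and that markers can be prioritized from allele frequencies alone. The sketch below illustrates that general idea with a much simpler, well-known score: the largest pairwise absolute allele-frequency difference (often called delta). It is not the LEI algorithm, whose details are not given here, and the function name, marker IDs, and frequencies are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def rank_markers_by_delta(
    freqs: Dict[str, Sequence[float]], top_k: int
) -> List[Tuple[str, float]]:
    """Rank markers by their largest pairwise allele-frequency difference.

    `freqs` maps a marker ID to the frequency of one chosen allele in each
    reference population (same population order for every marker).
    For a marker, max(p) - min(p) equals the largest pairwise |p_i - p_j|.
    """
    scored = [(marker, max(p) - min(p)) for marker, p in freqs.items()]
    # Highest score first; ties broken by marker ID for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical allele frequencies in three reference populations.
example_freqs = {
    "marker_a": (0.90, 0.10, 0.50),
    "marker_b": (0.55, 0.50, 0.52),
    "marker_c": (0.20, 0.75, 0.30),
}

top_markers = rank_markers_by_delta(example_freqs, top_k=2)
for marker, delta in top_markers:
    print(f"{marker}\tdelta={delta:.2f}")
```

A score like this needs only population-level summary frequencies, which is the practical appeal of frequency-based selection over methods that require individual-level genotypes.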
Link | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7119171 | PMC |
http://dx.doi.org/10.1186/s12859-020-3462-5 | DOI |
The Biorepository and Integrative Genomics (BIG) Initiative in Tennessee has developed a pioneering resource to address gaps in genomic research by linking genomic, phenotypic, and environmental data from a diverse Mid-South population, including underrepresented groups. We analyzed 13,152 genomes from BIG and found significant genetic diversity, with 50% of participants inferred to have non-European or several types of admixed ancestry. Ancestry within the BIG cohort is stratified, with distinct geographic and demographic patterns, as African ancestry is more common in urban areas, while European ancestry is more common in suburban regions.
Mol Ecol Resour
January 2025
United States Department of Agriculture, Wildlife Services, National Wildlife Research Center, Fort Collins, Colorado, USA.
While a best practice for evaluating the behaviour of genetic clustering algorithms on empirical data is to conduct parallel analyses on simulated data, these types of simulation techniques often involve sampling genetic data with replacement. In this paper we demonstrate that sampling with replacement, especially with large marker sets, inflates the perceived statistical power to correctly assign individuals (or the alleles that they carry) back to source populations-a phenomenon we refer to as resampling-induced, spurious power inflation (RISPI). To address this issue, we present gscramble, a simulation approach in R for creating biologically informed individual genotypes from empirical data that: (1) samples alleles from populations without replacement and (2) segregates alleles based on species-specific recombination rates.
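The difference between the two sampling schemes can be seen in a minimal sketch. This is not the gscramble implementation (which is an R package and also models recombination); it only shows, with hypothetical names and toy data, that sampling with replacement can reuse the same allele copy while sampling without replacement cannot.

```python
import random
from typing import List, Tuple

# Each allele copy is labelled (individual, haplotype) so reuse is visible.
AlleleCopy = Tuple[str, int]


def sample_with_replacement(
    pool: List[AlleleCopy], n: int, rng: random.Random
) -> List[AlleleCopy]:
    """Draw n allele copies; the same copy may be drawn more than once."""
    return [rng.choice(pool) for _ in range(n)]


def sample_without_replacement(
    pool: List[AlleleCopy], n: int, rng: random.Random
) -> List[AlleleCopy]:
    """Draw n distinct allele copies; raises ValueError if n > len(pool)."""
    return rng.sample(pool, n)


# Hypothetical source population: 3 diploid individuals, 6 allele copies.
pool = [(f"ind{i}", hap) for i in range(1, 4) for hap in (0, 1)]
rng = random.Random(42)

# Asking for more copies than exist forces reuse when replacement is allowed.
with_repl = sample_with_replacement(pool, len(pool) + 1, rng)
without_repl = sample_without_replacement(pool, len(pool), rng)

print("distinct copies with replacement:   ", len(set(with_repl)), "of", len(with_repl))
print("distinct copies without replacement:", len(set(without_repl)), "of", len(without_repl))
```

Because reused copies appear in several simulated individuals, those individuals look more like their source population than independent samples would.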
JAMA Netw Open
January 2025
Mental Illness Research, Education and Clinical Center, Crescenz Veterans Affairs Medical Center, Philadelphia, Pennsylvania.
Importance: Recently, the US Food and Drug Administration gave premarketing approval to an algorithm based on its purported ability to identify individuals at genetic risk for opioid use disorder (OUD). However, the clinical utility of the candidate genetic variants included in the algorithm has not been independently demonstrated.
Objective: To assess the utility of 15 genetic variants from an algorithm intended to predict OUD risk.
Ann Hum Genet
January 2025
Institute of Legal Medicine, Medical University of Innsbruck, Innsbruck, Austria.
Cancers (Basel)
December 2024
Statistical Genetics Research Group, Institute of Medical Biometry, Heidelberg University, Im Neuenheimer Feld 130.3, 69120 Heidelberg, Germany.
Latin Americans have a rich genetic make-up that translates into heterogeneous fractions of the autosomal genome in runs of homozygosity (F) and heterogeneous types and proportions of indigenous American ancestry. While autozygosity has been linked to several human diseases, very little is known about the relationship between inbreeding, genetic ancestry, and cancer risk in Latin Americans. Chile has one of the highest incidences of gallbladder cancer (GBC) in the world, and we investigated the association between inbreeding, GBC, gallstone disease (GSD), and body mass index (BMI) in 4029 genetically admixed Chileans.