MI-MAAP: marker informativeness for multi-ancestry admixed populations.

BMC Bioinformatics

Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, 3333 Burnet Avenue, MLC 7037, Cincinnati, OH, 45229-3026, USA.

Published: April 2020

Background: Admixed populations arise when two or more previously isolated populations interbreed. A powerful approach to addressing the genetic complexity of admixed populations is to infer ancestry. Ancestry inference, which includes estimating the proportion of an individual's genome derived from each ancestral population and the ancestral origin of each chromosomal segment, requires ancestry informative markers (AIMs) from reference ancestral populations. AIMs exhibit substantial differences in allele frequency between ancestral populations. Given the vast amount of human genetic variation data now available from diverse populations, a computationally feasible and cost-effective way to extract or filter the AIMs with maximum information content is increasingly important for ancestry inference, admixture mapping, forensic applications, and the detection of genomic regions that have been under recent selection.
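To make the notion of "substantial differences in allele frequency" concrete, the sketch below ranks markers by the largest absolute allele frequency difference between any two reference populations. This is a generic illustration, not the method used by MI-MAAP; the marker names, population labels, and frequencies are hypothetical.

```python
from itertools import combinations


def max_pairwise_delta(freqs):
    """Largest absolute allele frequency difference between any two populations.

    freqs maps a population label to the frequency of one allele (0..1).
    """
    return max(abs(a - b) for a, b in combinations(freqs.values(), 2))


# Hypothetical reference allele frequencies (illustrative values only).
markers = {
    "rs_demo_1": {"AFR": 0.90, "EUR": 0.10, "EAS": 0.15},
    "rs_demo_2": {"AFR": 0.50, "EUR": 0.48, "EAS": 0.52},
    "rs_demo_3": {"AFR": 0.20, "EUR": 0.25, "EAS": 0.85},
}

# Markers with the largest frequency gap come first.
ranked = sorted(markers, key=lambda m: max_pairwise_delta(markers[m]), reverse=True)

for name in ranked:
    print(name, round(max_pairwise_delta(markers[name]), 2))
```

A marker such as the hypothetical rs_demo_2, whose frequency is nearly identical in every population, carries almost no ancestry information and falls to the bottom of the ranking.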

Results: To address this gap, we present MI-MAAP, an easy-to-use web-based bioinformatics tool designed to prioritize informative markers for multi-ancestry admixed populations by using feature selection methods and multiple genomics resources, including the 1000 Genomes Project and the Human Genome Diversity Project. Specifically, the tool implements a novel allele frequency-based feature selection algorithm, the Lancaster Estimator of Independence (LEI), as well as genotype-based methods such as Principal Component Analysis (PCA), Support Vector Machine (SVM), and Random Forest (RF). We demonstrated that MI-MAAP is useful for prioritizing informative markers and accurately classifying ancestral populations. LEI is an efficient feature selection strategy for retrieving ancestry informative variants whose allele frequencies or selection pressures differ among (or between) ancestries, without requiring computationally expensive individual-level genotype data.
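The abstract does not describe how LEI is computed, so the sketch below does not attempt to reproduce it. As a stand-in, it shows a different, well-known measure that likewise needs only population-level allele frequencies: the informativeness for assignment (I_n) of Rosenberg and colleagues, here restricted to biallelic markers. The function names and example frequencies are assumptions made for illustration.

```python
import math


def xlogx(p):
    """p * ln(p), with the convention 0 * ln(0) = 0."""
    return p * math.log(p) if p > 0 else 0.0


def informativeness(freqs):
    """Informativeness for assignment (I_n) of one biallelic marker.

    freqs holds the frequency of one allele in each of K reference
    populations; the second allele's frequency is 1 - f. The value is 0
    when all populations share the same frequency and ln(K) when the
    marker separates the populations perfectly.
    """
    k = len(freqs)
    total = 0.0
    for allele in (freqs, [1.0 - f for f in freqs]):
        mean = sum(allele) / k
        total += -xlogx(mean) + sum(xlogx(p) for p in allele) / k
    return total


# Hypothetical allele frequencies in three reference populations.
panel = {
    "rs_demo_a": [0.95, 0.05, 0.10],
    "rs_demo_b": [0.40, 0.45, 0.50],
}

# Higher scores indicate markers that are more useful for ancestry assignment.
scores = {name: informativeness(f) for name, f in panel.items()}
best = max(scores, key=scores.get)
```

Because the score is computed from summary frequencies alone, a panel of this kind can be ranked without access to individual-level genotypes, which is the practical advantage the abstract attributes to allele frequency-based selection.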

Conclusions: MI-MAAP has a user-friendly interface that gives researchers a fast, easy way to filter and identify AIMs. MI-MAAP can be accessed at https://research.cchmc.org/mershalab/MI-MAAP/login/.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7119171
DOI: http://dx.doi.org/10.1186/s12859-020-3462-5

Publication Analysis

Top Keywords (keyword: count)

admixed populations: 16
informative markers: 12
ancestral populations: 12
feature selection: 12
populations: 9
multi-ancestry admixed: 8
ancestry inference: 8
ancestry informative: 8
mi-maap: 5
admixed: 5

Similar Publications

The Biorepository and Integrative Genomics (BIG) Initiative in Tennessee has developed a pioneering resource to address gaps in genomic research by linking genomic, phenotypic, and environmental data from a diverse Mid-South population, including underrepresented groups. We analyzed 13,152 genomes from BIG and found significant genetic diversity, with 50% of participants inferred to have non-European or several types of admixed ancestry. Ancestry within the BIG cohort is stratified, with distinct geographic and demographic patterns, as African ancestry is more common in urban areas, while European ancestry is more common in suburban regions.


gscramble: Simulation of Admixed Individuals Without Reuse of Genetic Material.

Mol Ecol Resour

January 2025

United States Department of Agriculture, Wildlife Services, National Wildlife Research Center, Fort Collins, Colorado, USA.

While a best practice for evaluating the behaviour of genetic clustering algorithms on empirical data is to conduct parallel analyses on simulated data, these simulation techniques often involve sampling genetic data with replacement. In this paper we demonstrate that sampling with replacement, especially with large marker sets, inflates the perceived statistical power to correctly assign individuals (or the alleles that they carry) back to source populations, a phenomenon we refer to as resampling-induced, spurious power inflation (RISPI). To address this issue, we present gscramble, a simulation approach in R for creating biologically informed individual genotypes from empirical data that: (1) samples alleles from populations without replacement and (2) segregates alleles based on species-specific recombination rates.


Importance: Recently, the US Food and Drug Administration gave premarketing approval to an algorithm based on its purported ability to identify individuals at genetic risk for opioid use disorder (OUD). However, the clinical utility of the candidate genetic variants included in the algorithm has not been independently demonstrated.

Objective: To assess the utility of 15 genetic variants from an algorithm intended to predict OUD risk.

Article Synopsis
  • Southern Africa has a long history of human habitation, with diverse immigration affecting the original KhoeSan populations over thousands of years, leading to their decline or admixture, primarily involving KhoeSan women.
  • The study analyzed mitochondrial DNA from 247 South African individuals focused on groups with historical ties to KhoeSan populations to evaluate genetic diversity and connectivity among these groups.
  • Results showed 142 distinct haplotypes, predominantly haplogroup L0, especially within admixed populations, indicating significant population structure and limitations in using mtDNA analysis for forensic purposes due to observed regional variations and matrilocal patterns.

Latin Americans have a rich genetic make-up that translates into heterogeneous fractions of the autosomal genome in runs of homozygosity (F_ROH) and heterogeneous types and proportions of indigenous American ancestry. While autozygosity has been linked to several human diseases, very little is known about the relationship between inbreeding, genetic ancestry, and cancer risk in Latin Americans. Chile has one of the highest incidences of gallbladder cancer (GBC) in the world, and we investigated the association between inbreeding, GBC, gallstone disease (GSD), and body mass index (BMI) in 4029 genetically admixed Chileans.

