MI-MAAP: marker informativeness for multi-ancestry admixed populations.

Siqi Chen, Sudhir Ghandikota, Yadu Gautam, Tesfaye B Mersha

BMC Bioinformatics

Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, 3333 Burnet Avenue, MLC 7037, Cincinnati, OH, 45229-3026, USA.

Published: April 2020

Background: Admixed populations arise when two or more previously isolated populations interbreed. A powerful approach to addressing the genetic complexity of admixed populations is to infer ancestry. Ancestry inference, which covers both the proportion of an individual's genome derived from each ancestral population and the ancestral origin of each segment along the chromosomes of an admixed individual, requires ancestry informative markers (AIMs) drawn from reference ancestral populations. AIMs are variants whose allele frequencies differ substantially between ancestral populations. Given the large amount of human genetic variation data now available from diverse populations, computationally feasible and cost-effective approaches are increasingly important for extracting or filtering the AIMs with the most information content for ancestry inference, admixture mapping, forensic applications, and detecting genomic regions that have been under recent selection.
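Because AIMs are defined by allele-frequency differences between ancestral populations, the simplest screen is the absolute allele-frequency difference between two reference populations, often written δ = |p_A − p_B|. The sketch below ranks biallelic markers by δ. It is an illustration of the general idea only, not MI-MAAP's method; the marker IDs, frequencies, and the 0.3 cutoff are hypothetical values chosen for the example.

```python
# Minimal sketch: rank biallelic markers by the absolute allele-frequency
# difference (delta) between two reference ancestral populations.
# Marker names, frequencies, and the 0.3 cutoff are hypothetical.


def delta(p_a: float, p_b: float) -> float:
    """Absolute difference in alternate-allele frequency between two populations."""
    return abs(p_a - p_b)


def rank_by_delta(
    freqs: dict[str, tuple[float, float]], min_delta: float = 0.3
) -> list[tuple[str, float]]:
    """Return (marker, delta) pairs with delta >= min_delta, most informative first."""
    scored = [(marker, delta(p_a, p_b)) for marker, (p_a, p_b) in freqs.items()]
    kept = [(marker, d) for marker, d in scored if d >= min_delta]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical alternate-allele frequencies in population A and population B.
    example = {
        "snp1": (0.90, 0.10),
        "snp2": (0.55, 0.45),
        "snp3": (0.20, 0.70),
    }
    for marker, d in rank_by_delta(example):
        print(f"{marker}\t{d:.2f}")
```

With these inputs, snp1 and snp3 pass the cutoff and snp2 is filtered out. A δ screen only handles two populations at a time, which is one reason multi-ancestry tools use statistics that consider all reference populations jointly.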

Results: To address this gap, we present MI-MAAP, an easy-to-use web-based bioinformatics tool designed to prioritize informative markers for multi-ancestry admixed populations using feature selection methods and multiple genomic resources, including the 1000 Genomes Project and the Human Genome Diversity Project. Specifically, the tool implements a novel allele frequency-based feature selection algorithm, the Lancaster Estimator of Independence (LEI), as well as genotype-based methods such as Principal Component Analysis (PCA), Support Vector Machine (SVM), and Random Forest (RF). We demonstrated that MI-MAAP is useful for prioritizing informative markers and accurately classifying ancestral populations. LEI is an efficient feature selection strategy for retrieving ancestry-informative variants that differ in allele frequency or selection pressure among (or between) ancestries, without requiring computationally expensive individual-level genotype data.
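The abstract does not give the LEI formula, so the sketch below does not reproduce it. It instead illustrates the same general idea, scoring markers from population-level allele frequencies alone without individual genotypes, using a different and well-known statistic: Rosenberg et al.'s (2003) informativeness for assignment, I_n = Σ_j [−p̄_j ln p̄_j + (1/K) Σ_i p_ij ln p_ij], where p_ij is the frequency of allele j in reference population i and p̄_j is its mean over the K populations. The marker names and panel frequencies in the example are hypothetical.

```python
import math
from typing import Sequence


def _xlogx(x: float) -> float:
    """x * ln(x), using the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def informativeness_for_assignment(freqs: Sequence[Sequence[float]]) -> float:
    """Informativeness for assignment (I_n) of Rosenberg et al. (2003) for one locus.

    freqs[i][j] is the frequency of allele j in reference population i;
    every row is assumed to sum to 1. The result lies in [0, ln K].
    """
    k = len(freqs)
    score = 0.0
    for j in range(len(freqs[0])):
        column = [row[j] for row in freqs]
        p_bar = sum(column) / k  # mean frequency of allele j across populations
        score += -_xlogx(p_bar) + sum(_xlogx(p) for p in column) / k
    return score


def rank_markers(marker_freqs: dict[str, list[list[float]]]) -> list[tuple[str, float]]:
    """Score every marker and return (marker, I_n) pairs, most informative first."""
    scored = [(m, informativeness_for_assignment(f)) for m, f in marker_freqs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical biallelic frequencies [p_alt, p_ref] in three reference panels.
    panels = {
        "snpA": [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
        "snpB": [[0.40, 0.60], [0.45, 0.55], [0.50, 0.50]],
        "snpC": [[0.90, 0.10], [0.10, 0.90], [0.10, 0.90]],
    }
    for marker, score in rank_markers(panels):
        print(f"{marker}\t{score:.3f}")
```

I_n is 0 when all panels share the same frequencies and reaches ln K when each population is fixed for a different allele, so the example ranks snpA first and the nearly uniform snpB last. Because it needs only per-population frequencies, a score like this can be computed directly from published allele-frequency tables.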

Conclusions: MI-MAAP has a user-friendly interface which provides researchers an easy and fast way to filter and identify AIMs. MI-MAAP can be accessed at https://research.cchmc.org/mershalab/MI-MAAP/login/.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7119171
DOI: http://dx.doi.org/10.1186/s12859-020-3462-5

Similar Publications

The Biorepository and Integrative Genomics resource for inclusive genomics: insights from a diverse pediatric and admixed cohort.

medRxiv

January 2025

Silvia Buonaiuto, Franco Marsico, Akram Mohammed, Lokesh K Chinthala, Ernestine K Amos-Abanyie

The Biorepository and Integrative Genomics (BIG) Initiative in Tennessee has developed a pioneering resource to address gaps in genomic research by linking genomic, phenotypic, and environmental data from a diverse Mid-South population, including underrepresented groups. We analyzed 13,152 genomes from BIG and found significant genetic diversity, with 50% of participants inferred to have non-European or several types of admixed ancestry. Ancestry within the BIG cohort is stratified, with distinct geographic and demographic patterns, as African ancestry is more common in urban areas, while European ancestry is more common in suburban regions.

gscramble: Simulation of Admixed Individuals Without Reuse of Genetic Material.

Mol Ecol Resour

January 2025

United States Department of Agriculture, Wildlife Services, National Wildlife Research Center, Fort Collins, Colorado, USA.

Eric C Anderson, Rachael M Giglio, Matthew G DeSaix, Timothy J Smyser

While a best practice for evaluating the behaviour of genetic clustering algorithms on empirical data is to conduct parallel analyses on simulated data, these types of simulation techniques often involve sampling genetic data with replacement. In this paper we demonstrate that sampling with replacement, especially with large marker sets, inflates the perceived statistical power to correctly assign individuals (or the alleles that they carry) back to source populations, a phenomenon we refer to as resampling-induced, spurious power inflation (RISPI). To address this issue, we present gscramble, a simulation approach in R for creating biologically informed individual genotypes from empirical data that: (1) samples alleles from populations without replacement and (2) segregates alleles based on species-specific recombination rates.

Utility of Candidate Genes From an Algorithm Designed to Predict Genetic Risk for Opioid Use Disorder.

JAMA Netw Open

January 2025

Mental Illness Research, Education and Clinical Center, Crescenz Veterans Affairs Medical Center, Philadelphia, Pennsylvania.

Christal N Davis, Zeal Jinwala, Alexander S Hatoum, Sylvanus Toikumo, Arpana Agrawal

Importance: Recently, the US Food and Drug Administration gave premarketing approval to an algorithm based on its purported ability to identify individuals at genetic risk for opioid use disorder (OUD). However, the clinical utility of the candidate genetic variants included in the algorithm has not been independently demonstrated.

Objective: To assess the utility of 15 genetic variants from an algorithm intended to predict OUD risk.

Persistence of Ancestral KhoeSan Mitochondrial Patterns in Contemporary South African Populations.

Ann Hum Genet

January 2025

Institute of Legal Medicine, Medical University of Innsbruck, Innsbruck, Austria.

Maria Eugenia D'Amato, Peter Ristow, Michelle Livesey, Kirsty Heynes, Nicole Huber

Article Synopsis

Southern Africa has a long history of human habitation, with diverse immigration affecting the original KhoeSan populations over thousands of years, leading to their decline or admixture, primarily involving KhoeSan women.
The study analyzed mitochondrial DNA from 247 South African individuals, focusing on groups with historical ties to KhoeSan populations, to evaluate genetic diversity and connectivity among these groups.
Results showed 142 distinct haplotypes, predominantly haplogroup L0, especially within admixed populations, indicating significant population structure and limitations in using mtDNA analysis for forensic purposes due to observed regional variations and matrilocal patterns.

Inbreeding and Gallbladder Cancer Risk: Homozygosity Associations Adjusted for Indigenous American Ancestry, BMI, and Genetic Risk of Gallstone Disease.

Cancers (Basel)

December 2024

Statistical Genetics Research Group, Institute of Medical Biometry, Heidelberg University, Im Neuenheimer Feld 130.3, 69120 Heidelberg, Germany.

Francisco Ceballos, Felix Boekstegers, Dominique Scherer, Carol Barahona Ponce, Katherine Marcelain

Latin Americans have a rich genetic make-up that translates into heterogeneous fractions of the autosomal genome in runs of homozygosity (F_ROH) and heterogeneous types and proportions of indigenous American ancestry. While autozygosity has been linked to several human diseases, very little is known about the relationship between inbreeding, genetic ancestry, and cancer risk in Latin Americans. Chile has one of the highest incidences of gallbladder cancer (GBC) in the world, and we investigated the association between inbreeding, GBC, gallstone disease (GSD), and body mass index (BMI) in 4029 genetically admixed Chileans.
