Optimization scheme of machine learning model for genetic division between northern Han, southern Han, Korean and Japanese.

Yi Chuan

Key Laboratory of Tianjin for Epigenetics, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China.

Published: November 2022

AI Article Synopsis

  • The main populations in East Asia include Han Chinese, Korean, and Japanese, with Han Chinese showing genetic variation from north to south.
  • The study aimed to classify southern and northern Han Chinese alongside Korean and Japanese individuals using 1185 ancestry informative SNPs and two machine learning algorithms (softmax and randomForest).
  • The softmax model achieved a high accuracy of 92% in classifying these populations, indicating the effectiveness of these methods for fine-scale genetic differentiation, which could benefit forensic DNA analysis.
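The synopsis does not say how the softmax model was built. As an illustration only, here is a minimal sketch of a softmax (multinomial logistic regression) classifier over four population labels. It assumes each SNP genotype is coded as 0, 1 or 2 copies of one allele. The allele frequencies and individuals are simulated and hypothetical, not the study's AISNP data. Helper names such as `sample_genotypes` and `train_softmax` are illustrative and are not taken from the paper.

```python
import math
import random

# Hypothetical labels for the four target groups; all data below are simulated.
POPULATIONS = ["S-Han", "N-Han", "Korean", "Japanese"]


def make_frequencies(n_snps, rng):
    """Random allele frequency per SNP for each population (hypothetical)."""
    return [[rng.uniform(0.05, 0.95) for _ in range(n_snps)] for _ in POPULATIONS]


def sample_genotypes(freqs, n_per_pop, rng):
    """Draw 0/1/2 allele counts assuming Hardy-Weinberg proportions."""
    X, y = [], []
    for label, pop_freqs in enumerate(freqs):
        for _ in range(n_per_pop):
            X.append([int(rng.random() < p) + int(rng.random() < p) for p in pop_freqs])
            y.append(label)
    return X, y


def center(X, means):
    """Subtract the per-SNP training mean from every genotype."""
    return [[v - m for v, m in zip(row, means)] for row in X]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, b, x):
    """Class probabilities for one individual."""
    return softmax([bk + sum(w * v for w, v in zip(wk, x)) for wk, bk in zip(W, b)])


def predict(W, b, x):
    """Index of the most probable population."""
    probs = predict_proba(W, b, x)
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(W, b, X, y):
    """Fraction of individuals assigned to their true population."""
    return sum(predict(W, b, x) == t for x, t in zip(X, y)) / len(y)


def train_softmax(X, y, n_classes, lr=0.1, epochs=100):
    """Full-batch gradient descent on the mean cross-entropy loss."""
    n, n_feat = len(X), len(X[0])
    W = [[0.0] * n_feat for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * n_feat for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, t in zip(X, y):
            probs = predict_proba(W, b, x)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == t else 0.0)  # d(loss)/d(logit_k)
                gb[k] += err
                gW[k] = [g + err * v for g, v in zip(gW[k], x)]
        for k in range(n_classes):
            b[k] -= lr * gb[k] / n
            W[k] = [w - lr * g / n for w, g in zip(W[k], gW[k])]
    return W, b


rng = random.Random(42)
freqs = make_frequencies(30, rng)
X_train, y_train = sample_genotypes(freqs, 40, rng)
X_test, y_test = sample_genotypes(freqs, 25, rng)
means = [sum(col) / len(X_train) for col in zip(*X_train)]
X_train_c, X_test_c = center(X_train, means), center(X_test, means)
W, b = train_softmax(X_train_c, y_train, len(POPULATIONS))
print(f"held-out accuracy: {accuracy(W, b, X_test_c, y_test):.2f}")
```

A linear softmax model suits this coding. Under Hardy-Weinberg proportions, the log-likelihood of genotype g at a SNP with allele frequency p is g·log p + (2 - g)·log(1 - p), plus a term that does not depend on p. The log-likelihood ratio between two populations is therefore linear in g. Centering each SNP column on its training mean keeps plain gradient descent stable at a modest learning rate.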

Article Abstract

Han Chinese, Korean and Japanese are the main populations of East Asia, and Han Chinese show a gradient of admixture from north to south. These East Asian populations differ in genetic structure. To achieve fine-scale genetic classification of southern (S-) and northern (N-) Han Chinese, Korean and Japanese individuals, we collected and analyzed 1185 ancestry informative SNPs (AISNPs) drawn from previous literature reports and our own laboratory findings. First, two machine learning algorithms, softmax and randomForest, were used to build genetic classification models. Then, phylogenetic trees, STRUCTURE and principal component analysis were used to evaluate the classification performance of different AISNP panels. The 234-AISNP panel achieved fine-scale differentiation among the target populations under four classification schemes. The softmax model reached 92% accuracy in classifying S-Han, N-Han, Korean and Japanese individuals. The two machine learning models tested in this study provide an important reference for high-resolution discrimination of closely related populations and will be useful tools for optimizing marker panels when developing forensic DNA ancestry inference systems.
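The abstract uses principal component analysis to check how well an AISNP panel separates the target groups. The following is a minimal sketch of that idea. It assumes simulated 0/1/2 genotypes for four hypothetical populations and uses a plain power-iteration eigensolver in place of a numerical library. The software used in the study is not stated here, and function names such as `top_components` are illustrative.

```python
import math
import random

# Hypothetical labels; genotypes below are simulated, not study data.
POPULATIONS = ["S-Han", "N-Han", "Korean", "Japanese"]


def simulate(n_per_pop=30, n_snps=30, seed=7):
    """0/1/2 genotypes from random per-population allele frequencies."""
    rng = random.Random(seed)
    X, labels = [], []
    for pop in range(len(POPULATIONS)):
        freqs = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
        for _ in range(n_per_pop):
            X.append([int(rng.random() < p) + int(rng.random() < p) for p in freqs])
            labels.append(pop)
    return X, labels


def center_and_covariance(X):
    """Center each SNP column; return centered rows and the SNP-by-SNP covariance."""
    n, m = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[v - mu for v, mu in zip(row, means)] for row in X]
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(m)] for i in range(m)]
    return Xc, C


def top_components(C, k=2, iters=500, seed=0):
    """Leading eigenvectors of C by power iteration, orthogonalized against earlier ones."""
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(len(C))]
        for _ in range(iters):
            w = [sum(c * x for c, x in zip(row, v)) for row in C]
            for u in comps:  # Gram-Schmidt: remove directions already found
                d = sum(a * b for a, b in zip(w, u))
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        comps.append(v)
    return comps


def project(Xc, comps):
    """PC scores: each centered individual dotted with each component."""
    return [[sum(a * b for a, b in zip(row, v)) for v in comps] for row in Xc]


X, labels = simulate()
Xc, C = center_and_covariance(X)
pcs = top_components(C, k=2)
scores = project(Xc, pcs)
for pop, name in enumerate(POPULATIONS):
    pts = [s for s, lab in zip(scores, labels) if lab == pop]
    centroid = [sum(p[i] for p in pts) / len(pts) for i in range(2)]
    print(name, [round(c, 2) for c in centroid])
```

With a real panel, the observed genotype matrix would replace `simulate()`, and the PC1/PC2 scores would typically be plotted. Population centroids that lie far apart relative to the spread within each group suggest that the panel carries usable ancestry signal.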

Source
DOI: http://dx.doi.org/10.16288/j.yczz.22-073
