Misassembly of long reads undermines de novo-assembled ethnicity-specific genomes: validation in a Chinese Han population.

Hum Genet

Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, Jinan University, Guangzhou, 510632, China.

Published: July 2019

An ethnicity is characterized by genomic fragments, single nucleotide polymorphisms (SNPs), and structural variations specific to it. However, the widely used 'standard human reference genome' GRCh37/38 is based on Caucasians. Therefore, de novo-assembled reference genomes for specific ethnicities would have advantages for genetics and precision medicine applications, especially given the long-read sequencing techniques that facilitate genome assembly. In this study, we assessed the de novo-assembled Chinese Han reference genome HX1 against the standard GRCh38 in terms of assembly quality and suitability for ethnicity-specific applications. Surprisingly, all genomic sequencing datasets mapped better to GRCh38 than to HX1, including the datasets from the Chinese Han population. This gap was mainly due to massive structural misassembly of the HX1 reference genome rather than to SNPs between the ethnicities, and this misassembly could not be corrected by short-read whole-genome sequencing (WGS). For example, HX1 and the other de novo-assembled personal genomes failed to assemble the mitochondrial genome as a contig. We mapped 97.1% of dbSNP, 98.8% of ClinVar, and 97.2% of COSMIC variants to HX1. The non-synonymous ClinVar SNPs absent from HX1 involved 140 genes with many important functions in various diseases; most of these absences were due to the failed assembly of essential exons. In contrast, the HX1-specific regions were scarcely expressed, as shown in cell lines and clinical samples of Chinese patients. Our results demonstrate that a de novo-assembled individual genome such as HX1 has no advantage over the standard GRCh38 genome because of insufficient assembly quality, and it is therefore not recommended for common use.
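The comparison of how well a dataset maps to each reference can be illustrated with a minimal sketch. It assumes that total and mapped read counts are already available for each reference (for instance, as reported by a tool such as samtools flagstat). The class name `MappingStats`, the helper `better_reference`, and all read counts below are hypothetical placeholders for illustration, not values or methods from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingStats:
    """Read counts for one dataset aligned to one reference (hypothetical container)."""
    reference: str
    total_reads: int
    mapped_reads: int

    @property
    def mapping_rate(self) -> float:
        # Fraction of reads that aligned to the reference.
        if self.total_reads <= 0:
            raise ValueError("total_reads must be positive")
        return self.mapped_reads / self.total_reads


def better_reference(a: MappingStats, b: MappingStats) -> MappingStats:
    """Return whichever reference has the higher mapping rate (ties go to `a`)."""
    return a if a.mapping_rate >= b.mapping_rate else b


# Placeholder counts chosen only to make the example run; they are NOT from the paper.
grch38 = MappingStats("GRCh38", total_reads=1_000_000, mapped_reads=990_000)
hx1 = MappingStats("HX1", total_reads=1_000_000, mapped_reads=970_000)

winner = better_reference(grch38, hx1)
gap = grch38.mapping_rate - hx1.mapping_rate
print(f"{winner.reference} maps better; gap = {gap:.2%}")
```

A mapping-rate gap alone does not say whether misassembly or population-specific SNPs are responsible; the abstract attributes the gap to structural misassembly based on further analysis that this sketch does not attempt to reproduce.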

Source
http://dx.doi.org/10.1007/s00439-019-02032-6
