Background: Many medicinal plants are known for their complex genomes with high ploidy, heterozygosity, and repetitive content which pose severe challenges for genome sequencing of those species. Long reads from Oxford nanopore sequencing technology (ONT) or Pacific Biosciences Single Molecule, Real-Time (SMRT) sequencing offer great advantages in de novo genome assembly, especially for complex genomes with high heterozygosity and repetitive content. Currently, multiple allotetraploid species have sequenced their genomes by long-read sequencing. However, we found that a considerable proportion of these genomes (7.9% on average, maximum 23.7%) could not be covered by NGS (Next Generation Sequencing) reads (uncovered region by NGS reads, UCR) suggesting the questionable and low-quality of those area or genomic areas that can't be sequenced by NGS due to sequencing bias. The underlying causes of those UCR in the genome assembly and solutions to this problem have never been studied.

Methods: In the study, we sequenced the tetraploid genome of Veratrum dahuricum (Turcz.) O. Loes (VDL), a Chinese medicinal plant, with ONT platform and assembled the genome with three strategies in parallel. We compared the qualities, coverage, and heterozygosity of the three ONT assemblies with another released assembly of the same individual using reads from PacBio circular consensus sequencing (CCS) technology, to explore the cause of the UCR.

Results: By mapping the NGS reads against the three ONT assemblies and the CCS assembly, we found that the coverage of those ONT assemblies by NGS reads ranged from 49.15 to 76.31%, much smaller than that of the CCS assembly (99.53%). And alignment between ONT assemblies and CCS assembly showed that most UCR can be aligned with CCS assembly. So, we conclude that the UCRs in ONT assembly are low-quality sequences with a high error rate that can't be aligned with short reads, rather than genomic regions that can't be sequenced by NGS. Further comparison among the intermediate versions of ONT assemblies showed that the most probable origin of those errors is a combination of artificial errors introduced by "self-correction" and initial sequencing error in long reads. We also found that polishing the ONT assembly with CCS reads can correct those errors efficiently.

Conclusions: Through analyzing genome features and reads alignment, we have found the causes for the high proportion of UCR in ONT assembly of VDL are sequencing errors and additional errors introduced by self-correction. The high error rates of ONT-raw reads make them not suitable for self-correction prior to allotetraploid genome assembly, as the self-correction will introduce artificial errors to > 5% of the UCR sequences. We suggest high-precision CCS reads be used to polish the assembly to correct those errors effectively for polyploid genomes.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9364492PMC
http://dx.doi.org/10.1186/s13020-022-00644-1DOI Listing

Publication Analysis

Top Keywords

ont assemblies
20
ccs assembly
16
reads
13
high error
12
assembly
12
genome assembly
12
ngs reads
12
ont assembly
12
ont
11
sequencing
10

Similar Publications

Unlabelled: As sequencing costs decrease, short-read and long-read technologies are indispensable tools for uncovering the genetic drivers behind bacterial pathogen resistance. This study explores the differences between the use of short-read (Illumina) and long-read (Oxford Nanopore Technologies, ONT) sequencing in detecting antimicrobial resistance (AMR) genes in ESKAPE pathogens ( and ). Utilizing a dataset of 1,385 whole genome sequences and applying commonly used bioinformatic methods in bacterial genomics, we assessed the differences in genomic completeness, pangenome structure, and AMR gene and point mutation identification.

View Article and Find Full Text PDF

Objective: Multiple-Locus Variable Number of Tandem Repeats (VNTR) Analysis (MLVA) is widely used to subtype pathogens causing foodborne and waterborne disease outbreaks. The MLVAType shiny application was previously designed to extract MLVA profiles of Vibrio cholerae isolates from whole-genome sequencing (WGS) data, and provide backward compatibility with traditional MLVA typing methods. The previous development and validation work was conducted using short (pair-end 300 and 150 nt long) reads from Illumina MiSeq and Hiseq sequencing.

View Article and Find Full Text PDF

Near complete genome assembly of Yadong trout (Salmo trutta).

Sci Data

January 2025

State Key Laboratory of Mariculture Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, Shandong, 266071, China.

The Yadong trout (Salmo trutta), a species endemic to the Yatung River in Tibet, China, was classified as a second-class protected species in the 20th century. Now, it is considered one of the most important fishery resources in China. In this study, we assembled a near-complete genome of the S.

View Article and Find Full Text PDF

The sex chromosomes contain complex, important genes impacting medical phenotypes, but differ from the autosomes in their ploidy and large repetitive regions. To enable technology developers along with research and clinical laboratories to evaluate variant detection on male sex chromosomes X and Y, we create a small variant benchmark set with 111,725 variants for the Genome in a Bottle HG002 reference material. We develop an active evaluation approach to demonstrate the benchmark set reliably identifies errors in challenging genomic regions and across short and long read callsets.

View Article and Find Full Text PDF

Telomere-to-telomere genome and resequencing of 254 individuals reveal evolution, genomic footprints in Asian icefish, Protosalanx chinensis.

Gigascience

January 2025

Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture and Rural Affairs, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China.

The Asian icefish, Protosalanx chinensis, has undergone extensive colonization in various waters across China for decades due to its ecological and physiological significance as well as its economic importance in the fishery resource. Here, we decoded a telomere-to-telomere (T2T) genome for P. chinensis combining PacBio HiFi long reads and ultra-long ONT (nanopore) reads and Hi-C data.

View Article and Find Full Text PDF

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!