Optical map guided genome assembly.

BMC Bioinformatics

Department of Computer Science, Helsinki Institute for Information Technology, University of Helsinki, Pietari Kalmin katu 5, Helsinki, Finland.

Published: July 2020

Background: The long reads produced by third-generation sequencing technologies have significantly improved genome assembly results, yet genome-wide assemblies still cannot be produced from read data alone. Optical mapping data, for example, has therefore been used to further improve genome assemblies, but it has mostly been applied in a post-processing stage after contig assembly.

Results: We propose OPTICALKERMIT, which directly integrates genome-wide optical maps into contig assembly. We show how genome-wide optical maps can be used to localize reads on the genome, and we then adapt the Kermit method, which originally incorporated genetic linkage maps into the miniasm assembler, to use this information in contig assembly. Our experimental results show that incorporating genome-wide optical maps into the contig assembly of miniasm increases NGA50 while the number of misassemblies decreases or stays the same. Furthermore, compared to the Canu assembler, OPTICALKERMIT produces an assembly with almost three times higher NGA50 and fewer misassemblies on real A. thaliana reads.
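
The abstract reports contiguity as NGA50 without defining it. As a rough illustration, the sketch below follows the commonly used definition: the largest length L such that aligned blocks of length at least L together cover at least half of the reference genome. The function name `nga50` and the toy block lengths are assumptions made for this example; they are not taken from the paper or from any particular evaluation tool.

```python
from typing import Iterable, Optional


def nga50(aligned_block_lengths: Iterable[int], reference_length: int) -> Optional[int]:
    """Return the NGA50 of a set of aligned blocks, or None if the blocks
    cover less than half of the reference.

    Assumption: the block lengths come from contigs that were already
    aligned to the reference and split at misassembly breakpoints.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")

    covered = 0
    # Walk the blocks from longest to shortest, accumulating covered bases.
    for length in sorted(aligned_block_lengths, reverse=True):
        covered += length
        # Stop at the first block that pushes coverage to >= 50% of the reference.
        if 2 * covered >= reference_length:
            return length
    return None


# Toy data (hypothetical): four aligned blocks against a 1200 bp reference.
blocks = [400, 300, 200, 100]
print(nga50(blocks, 1200))
```

With these toy values, the two longest blocks (400 and 300) already cover 700 of the 1200 reference bases, so the sketch returns 300. Because the threshold is tied to the reference length rather than the assembly length, values stay comparable between assemblers run on the same genome.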

Conclusions: OPTICALKERMIT successfully incorporates optical mapping data directly into the contig assembly of eukaryotic genomes. Our results show that this is a promising approach for improving the contiguity of genome assemblies.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7336458
DOI: http://dx.doi.org/10.1186/s12859-020-03623-1

Publication Analysis

Top Keywords

contig assembly: 16
genome wide: 12
wide optical: 12
optical maps: 12
genome: 8
genome assembly: 8
optical mapping: 8
mapping data: 8
genome assemblies: 8
maps contig: 8

Similar Publications

A new capulavirus infecting sugar beet (Beta vulgaris L.) in France.

Arch Virol

January 2025

Univ. Bordeaux, INRAE, UMR 1332 Biologie du Fruit et Pathologie, CS20032, 33882, Villenave d'Ornon Cedex, France.

A novel capulavirus was identified by high-throughput sequencing in four sugar beet (Beta vulgaris L.) plants collected in April 2023 in Normandy (France). The complete genome of 2744 nucleotides (nt) was sequenced and found to have an organization similar to that of known capulaviruses, with which it showed close phylogenetic relationships.

Whole-genome automated assembly pipeline for strains from reference, and clinical samples using the integrated CtGAP pipeline.

NAR Genom Bioinform

March 2025

Departments of Medicine and Pediatrics, Division of Infectious Diseases and Global Health, University of California San Francisco School of Medicine, 550 16th Street, 4th Floor Mission Hall, San Francisco, CA, 94158, USA.

Whole genome sequencing (WGS) is pivotal for the molecular characterization of Chlamydia trachomatis, the leading bacterial cause of sexually transmitted infections and infectious blindness worldwide. WGS can inform epidemiologic, public health and outbreak investigations of this human-restricted pathogen. However, challenges persist in generating high-quality genomes for downstream analyses given its obligate intracellular nature and difficulty with propagation.

Given the presence of highly repetitive genomic regions such as subtelomeric regions, understanding human genomic evolution remains challenging. Recently, long-read sequencing technology has facilitated the identification of complex genetic variants, including structural variants (SVs), at the single-nucleotide level. Here, we resolved SVs and their underlying DNA damage-repair mechanisms in subtelomeric regions, which are among the most uncharted genomic regions.

As one of the most threatened mammalian taxa, lemurs of Madagascar are facing unprecedented anthropogenic pressures. To address conservation imperatives such as this, researchers have increasingly relied on conservation genomics to identify populations of particular concern. However, many of these genomic approaches necessitate high-quality genomes.

Telomere-to-telomere genome and resequencing of 254 individuals reveal evolution, genomic footprints in Asian icefish, Protosalanx chinensis.

Gigascience

January 2025

Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture and Rural Affairs, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China.

The Asian icefish, Protosalanx chinensis, has undergone extensive colonization in various waters across China for decades due to its ecological and physiological significance as well as its economic importance in the fishery resource. Here, we decoded a telomere-to-telomere (T2T) genome for P. chinensis combining PacBio HiFi long reads and ultra-long ONT (nanopore) reads and Hi-C data.
