Background: The long reads produced by third-generation sequencing technologies have substantially improved genome assembly, but genome-wide assemblies still cannot be produced from read data alone. Optical mapping data, for example, has therefore been used to further improve genome assemblies, but mostly in a post-processing stage after contig assembly.
Results: We propose OPTICALKERMIT, which directly integrates genome-wide optical maps into contig assembly. We show how genome-wide optical maps can be used to localize reads on the genome, and we then adapt the Kermit method, which originally incorporated genetic linkage maps into the miniasm assembler, to use this information in contig assembly. Our experimental results show that incorporating genome-wide optical maps into the contig assembly of miniasm increases NGA50, while the number of misassemblies decreases or stays the same. Furthermore, on real A. thaliana reads, OPTICALKERMIT produces an assembly with almost three times the NGA50 of the Canu assembler and fewer misassemblies.
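The abstract names two steps without detailing them: placing reads on the genome with a genome-wide optical map, and using those placements during contig assembly. The sketch below is our own simplified illustration under stated assumptions, not the OPTICALKERMIT implementation. It assumes the optical map is given as a sorted list of site coordinates, that a read can be placed by digesting it in silico with the same (example) motif and matching its internal fragment lengths against the map within a relative tolerance, and that an overlap between two placed reads is implausible when their estimated start positions are further apart than a chosen `max_dist`. The names `find_sites`, `localize`, `filter_overlaps`, `make_toy_genome` and all parameter values are hypothetical.

```python
# Simplified, hypothetical sketch of optical-map-guided read placement and
# overlap filtering. Not the OPTICALKERMIT implementation.
from typing import Dict, List, Optional, Tuple


def find_sites(seq: str, motif: str) -> List[int]:
    """In silico digestion: start positions of every occurrence of `motif`.

    Only the forward strand is scanned, so reverse-complemented reads are
    not handled in this sketch.
    """
    sites = []
    pos = seq.find(motif)
    while pos != -1:
        sites.append(pos)
        pos = seq.find(motif, pos + 1)
    return sites


def localize(read: str, motif: str, map_sites: List[int],
             rel_tol: float = 0.1) -> Optional[int]:
    """Estimate the genomic start coordinate of `read`.

    Only fragments bounded by sites on both sides are used, because the read
    ends are not restriction sites. Returns None when the read has fewer than
    two such fragments or its pattern matches zero or several map positions.
    """
    sites = find_sites(read, motif)
    frags = [b - a for a, b in zip(sites, sites[1:])]
    if len(frags) < 2:
        return None
    map_frags = [b - a for a, b in zip(map_sites, map_sites[1:])]
    hits = []
    for i in range(len(map_frags) - len(frags) + 1):
        window = map_frags[i:i + len(frags)]
        # Accept the window only if every fragment length agrees within rel_tol.
        if all(abs(m - r) <= rel_tol * m for r, m in zip(frags, window)):
            hits.append(i)
    if len(hits) != 1:
        return None  # unplaced or ambiguous
    # The read's first site corresponds to map site hits[0].
    return map_sites[hits[0]] - sites[0]


def filter_overlaps(overlaps: List[Tuple[str, str]],
                    positions: Dict[str, Optional[int]],
                    max_dist: int) -> List[Tuple[str, str]]:
    """Drop overlaps between placed reads whose estimated starts are too far apart.

    Overlaps involving an unplaced read are kept, since there is no
    positional evidence against them.
    """
    kept = []
    for a, b in overlaps:
        pa, pb = positions.get(a), positions.get(b)
        if pa is None or pb is None or abs(pa - pb) <= max_dist:
            kept.append((a, b))
    return kept


def make_toy_genome(length: int, sites: List[int], motif: str) -> str:
    """Toy genome: poly-A with `motif` planted at each site (test data only)."""
    g = ["A"] * length
    for s in sites:
        g[s:s + len(motif)] = motif
    return "".join(g)


if __name__ == "__main__":
    motif = "GCTCTTC"  # example motif (assumption)
    sites = [100, 250, 500, 600, 900, 1300]
    genome = make_toy_genome(1500, sites, motif)
    print(localize(genome[230:950], motif, sites))  # 230
    print(filter_overlaps([("r1", "r2"), ("r1", "r3")],
                          {"r1": 230, "r2": 400, "r3": 5000}, max_dist=1000))  # [('r1', 'r2')]
```

In practice `max_dist` would have to exceed the longest read length plus the expected localization error, because two genuinely overlapping reads can start up to one read length apart.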
Conclusions: OPTICALKERMIT successfully incorporates optical mapping data directly into the contig assembly of eukaryotic genomes. Our results show that this is a promising approach for improving the contiguity of genome assemblies.
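NGA50, the contiguity measure reported in the Results, is computed on aligned blocks rather than on raw contigs: in QUAST's definition, contigs are aligned to the reference and broken at misassembly breakpoints, and NGA50 is the length for which blocks of that length or longer cover at least half of the reference genome. This is why a long but misassembled contig does not inflate NGA50. The sketch below assumes the block lengths have already been obtained from such an alignment; the function name `nga50` is hypothetical, and this is not the evaluation pipeline used in the study.

```python
from typing import List, Optional


def nga50(block_lengths: List[int], genome_length: int) -> Optional[int]:
    """Return NGA50 from aligned-block lengths and the reference genome length.

    Blocks are sorted longest first; the result is the length of the block at
    which the cumulative aligned length first reaches at least half of
    `genome_length`. Returns None if the blocks never cover half the genome,
    in which case NGA50 is undefined.
    """
    covered = 0
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_length:
            return length
    return None


if __name__ == "__main__":
    print(nga50([50, 30, 20, 10], genome_length=200))  # 20
    print(nga50([50, 30, 20, 10], genome_length=300))  # None
```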
Source | Link
---|---
PMC | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7336458
DOI | http://dx.doi.org/10.1186/s12859-020-03623-1