Despite the long history of genome assembly research, there remains a large gap between theoretical and practical work. On one hand, there is practical software with little theoretical underpinning of its accuracy; on the other, there are theoretical algorithms that have not been adopted in practice. In this paper we attempt to bridge the gap between theory and practice by showing how the theoretical safe-and-complete framework can be integrated into existing assemblers in order to improve contiguity. The optimal algorithm in this framework, called the omnitig algorithm, has not been used in practice due to its complexity and its lack of robustness to real data. Instead, we pursue a simplified notion of omnitigs, giving an efficient algorithm to compute them and demonstrating their safety under certain conditions. We modify two assemblers (wtdbg2 and Flye) by replacing their unitig algorithm with the simple omnitig algorithm. We test our modifications using real HiFi data from the Drosophila melanogaster and Caenorhabditis elegans genomes. Our modified algorithms lead to a substantial improvement in alignment-based contiguity, with negligible computational cost and either no increase or only a small increase in the number of misassemblies.
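For readers unfamiliar with the baseline being replaced, the following is a minimal sketch of unitig extraction, that is, finding the maximal non-branching paths of an assembly graph. It is not the simple omnitig algorithm from the paper, which the abstract does not specify, and it is not the code used by wtdbg2 or Flye. The function name `unitigs` and the representation of the graph as a plain list of directed edges are assumptions made for illustration only.

```python
from collections import defaultdict


def unitigs(edges):
    """Return the maximal non-branching paths of a directed graph.

    `edges` is a list of (source, target) pairs. This toy representation
    is an assumption for illustration; real assemblers use richer graphs.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    def is_inner(n):
        # Only a node with exactly one incoming and one outgoing edge
        # can lie strictly inside a non-branching path.
        return len(pred[n]) == 1 and len(succ[n]) == 1

    paths = []
    visited = set()

    # Start a path at every edge leaving a branching or terminal node,
    # and extend it for as long as the last node is non-branching.
    for start in sorted(nodes):
        if is_inner(start):
            continue
        for nxt in succ[start]:
            path = [start, nxt]
            while is_inner(path[-1]):
                visited.add(path[-1])
                path.append(succ[path[-1]][0])
            paths.append(path)

    # Remaining unvisited inner nodes form isolated cycles.
    for start in sorted(nodes):
        if is_inner(start) and start not in visited:
            visited.add(start)
            cycle = [start]
            cur = succ[start][0]
            while cur != start:
                visited.add(cur)
                cycle.append(cur)
                cur = succ[cur][0]
            cycle.append(start)
            paths.append(cycle)

    return paths


if __name__ == "__main__":
    # A -> B -> C, then C branches to D and E.
    print(unitigs([("A", "B"), ("B", "C"), ("C", "D"), ("C", "E")]))
```

In this toy graph the path stops at `C` because `C` has two outgoing edges. Safe-and-complete approaches such as omnitigs aim to extend contigs past some of these branch points when every genome-consistent walk through the graph must agree on the extension.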

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9915519
DOI: http://dx.doi.org/10.1101/2023.01.30.526175
