Genome assembly databases are growing rapidly. The sequence content of each new assembly can be largely redundant with that of previous ones, but this redundancy is neither conceptually nor algorithmically easy to measure. We propose new methods and a new tool, DandD, that address the question of how much new sequence is gained when a sequence collection grows. DandD can describe how much human structural variation is being discovered in each new human genome assembly and when discoveries will level off in the future. DandD uses a measure called δ ("delta"), developed initially for data compression. Computing δ directly requires counting k-mers, but DandD can rapidly estimate it using genomic sketches. We also propose δ as an alternative to k-mer-specific cardinalities when computing the Jaccard coefficient, avoiding the pitfalls of a poor choice of k. We demonstrate the utility of DandD's functions for estimating δ, characterizing the rate of pangenome growth, and computing all-pairs similarities using k-independent Jaccard. DandD is open source software available at: https://github.com/jessicabonnie/dandd.
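The quantities named above can be illustrated with a small brute-force sketch. It assumes the standard definition from the compression literature, δ = max over k of d_k / k, where d_k is the number of distinct k-mers in the collection. Unlike DandD, which estimates d_k from sketches, this version counts k-mers exactly and is only practical for toy inputs. The helper names (`distinct_kmers`, `delta`, `growth_curve`, `delta_jaccard`) are hypothetical and are not DandD's API. K-mers are used as-is without reverse-complement canonicalization, and the δ-based Jaccard plugs δ into the usual inclusion-exclusion formula; that exact form is an assumption made for illustration.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def distinct_kmers(seqs: Iterable[str], k: int) -> Set[str]:
    """Distinct k-mers across a collection; k-mers never span two sequences.

    Assumption: k-mers are taken as-is (no reverse-complement canonicalization).
    """
    kmers: Set[str] = set()
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmers.add(s[i:i + k])
    return kmers


def delta(seqs: Sequence[str]) -> Tuple[float, int]:
    """Exact delta = max over k of d_k / k, with d_k = number of distinct k-mers.

    Returns (delta, k attaining the maximum). Brute force over every k up to
    the longest sequence; DandD instead estimates d_k from sketches.
    """
    longest = max((len(s) for s in seqs), default=0)
    best, best_k = 0.0, 0
    for k in range(1, longest + 1):
        ratio = len(distinct_kmers(seqs, k)) / k
        if ratio > best:
            best, best_k = ratio, k
    return best, best_k


def growth_curve(collection: Sequence[Sequence[str]]) -> List[float]:
    """delta of the union of the first i assemblies, for i = 1..n.

    Each assembly is a list of sequences (e.g. contigs). Increments between
    consecutive values indicate how much new sequence each assembly adds.
    """
    union: List[str] = []
    curve: List[float] = []
    for assembly in collection:
        union.extend(assembly)
        curve.append(delta(union)[0])
    return curve


def delta_jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical k-independent Jaccard: inclusion-exclusion with delta in
    place of k-mer set cardinalities (an assumed form for illustration)."""
    da, db = delta(a)[0], delta(b)[0]
    dab = delta(list(a) + list(b))[0]  # union of collections = concatenated lists
    if dab == 0:
        return 0.0  # both inputs empty; 0.0 is an arbitrary convention here
    return (da + db - dab) / dab


if __name__ == "__main__":
    a = ["ACGTACGTTTGA"]
    b = ["ACGTACGTCCGA"]
    print(delta(a), delta(b), delta_jaccard(a, b))
    print(growth_curve([a, b, a]))
```

Because δ takes a maximum over k, neither the growth curve nor the Jaccard value requires choosing k in advance. The price is that this exact version recounts k-mers for every k, which is why a sketch-based estimate matters at genome scale. Under this definition δ(A ∪ B) lies between max(δ(A), δ(B)) and δ(A) + δ(B), so the δ-based Jaccard above always falls in [0, 1].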

Source:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9915590
DOI: http://dx.doi.org/10.1101/2023.02.02.526837
