As an increasing number of plant genome sequences become available, it has become clear that gene content varies between individuals, raising the challenge of predicting the gene content of a species. However, genome comparison is often confounded by variation in assembly and annotation. Differentiating between true gene absence and variation in assembly or annotation is essential for the accurate identification of conserved and variable genes in a species. Here, we present the de novo assembly of the Brassica napus cultivar Tapidor and its comparison with an improved assembly of the B. napus cultivar Darmor-bzh. Both cultivars were annotated using the same method to allow comparison of gene content. We identified genes unique to each cultivar and differentiated these from artefacts due to variation in assembly and annotation. We demonstrate that a common annotation pipeline can still produce different gene predictions, even for closely related cultivars, and that repeat regions which collapse during assembly affect whole-genome comparison. After accounting for differences in assembly and annotation, we demonstrate that the genome of Darmor-bzh contains a greater number of genes than the genome of Tapidor. Our results are a first step towards comparing the true differences between B. napus genomes and highlight potential sources of error in the future production of a B. napus pangenome.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5698052
DOI: http://dx.doi.org/10.1111/pbi.12742

Publication Analysis

Top Keywords

assembly annotation: 16
gene content: 12
variation assembly: 12
assembly: 8
genome comparison: 8
annotation demonstrate: 8
genome: 5
gene: 5
comparison: 5
annotation: 5

Similar Publications

Here, we report the resequencing, assembly, and annotation of two actinomycete genomes containing abyssomicin gene clusters. DSM 45791 with a circular chromosome of 11,681,598 bp and 4 circular plasmids (14,175-207,548 bp) and sp. NL15-2K with a 12,368,159 bp linear genome and circular plasmid (11,584 bp).

Draft genome dataset of strain R-35 isolated from tidal pool sediments.

Data Brief

February 2025

Applied Microbial and Health Biotechnology Institute, Cape Peninsula University of Technology, PO Box 1906, Bellville, Cape Town, 7530, South Africa.

The marine isolate, strain R-35, was isolated from marine sediments collected from the Glencairn Tidal Pool, Table Mountain National Park, Cape Town, South Africa. The genomic DNA was sequenced using the Ion Torrent GeneStudio™ S5 platform, and the assembly was performed using the SPAdes assembler on the Centre for High Performance Computing (CHPC) Lengau Cluster located at the CSIR, Rosebank, South Africa. The draft genome assembly consisted of 722 contigs totaling 7,625,174 base pairs and a G+C% content of 72.

Chromosome-level reference genome and annotation of the Arctic fish Anisarchus medius.

Sci Data

January 2025

State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou, 350002, China.

Anisarchus medius (Reinhardt, 1837) is a widely distributed Arctic fish, serving as an indicator of climate change impacts on coastal Arctic ecosystems. This study presents a chromosome-level genome assembly for A. medius using PacBio sequencing and Hi-C technology.

Despite the recent surge of viral metagenomic studies, it remains a significant challenge to recover complete virus genomes from metagenomic data. The majority of viral contigs generated from de novo assembly programs are highly fragmented, presenting significant challenges to downstream analysis and inference. To address this issue, we have developed Virseqimprover, a computational pipeline that can extend assembled contigs to complete or nearly complete genomes while maintaining extension quality.

With the increasing availability of high-quality genome assemblies, pangenome graphs have emerged as a new paradigm in the genomics field for identifying, encoding, and presenting genomic variation at both population and species levels. However, it remains challenging to dissect and interpret pangenome graphs via biologically informative visualization. To facilitate better exploration and understanding of pangenome graphs towards novel biological insights, here we present a web-based interactive Visualization and interpretation framework for linear-Reference-projected Pangenome Graphs (VRPG).
