BUSCO: Assessing Genomic Data Quality and Beyond.

Curr Protoc

Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland.

Published: December 2021

Evaluation of the quality of genomic "data products" such as genome assemblies or gene sets is of critical importance in order to recognize possible issues and correct them during the generation of new data. It is equally essential to guide subsequent or comparative analyses with existing data, as the correct interpretation of the results necessarily requires knowledge about the quality level and reliability of the inputs. Using datasets of near universal single-copy orthologs derived from OrthoDB, BUSCO can estimate the completeness and redundancy of genomic data by providing biologically meaningful metrics based on expected gene content. These can complement technical metrics such as contiguity measures (e.g., number of contigs/scaffolds, and N50 values). Here, we describe the use of the BUSCO tool suite to assess different data types that can range from genome assemblies of single isolates and assembled transcriptomes and annotated gene sets to metagenome-assembled genomes where the taxonomic origin of the species is unknown. BUSCO is the only tool capable of assessing all these types of sequences from both eukaryotic and prokaryotic species. The protocols detail the various BUSCO running modes and the novel workflows introduced in versions 4 and 5, including the batch analysis on multiple inputs, the auto-lineage workflow to run assessments without specifying a dataset, and a workflow for the evaluation of (large) eukaryotic genomes. The protocols further cover the BUSCO setup, guidelines to interpret the results, and BUSCO "plugin" workflows for performing common operations in genomics using BUSCO results, such as building phylogenomic trees and visualizing syntenies. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. 
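As a point of comparison for the contiguity measures mentioned above, the following is a minimal sketch of how an N50 value can be computed from a list of contig or scaffold lengths. The function name `n50` and the example lengths are illustrative assumptions; this code is not part of BUSCO.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig/scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")

    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half
        # of the assembly is the N50.
        if running >= half_total:
            return length

    # Unreachable for non-empty input; kept for explicitness.
    raise RuntimeError("N50 could not be determined")


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A contiguity value of this kind describes only how fragmented an assembly is. It says nothing about whether the expected genes are present, which is the gap the gene-content metrics are meant to fill.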
Basic Protocol 1: Assessing an input sequence with a BUSCO dataset specified manually
Basic Protocol 2: Assessing an input sequence with a dataset automatically selected by BUSCO
Basic Protocol 3: Assessing multiple inputs
Alternate Protocol: Decreasing analysis runtime when assessing a large number of small genomes with BUSCO auto-lineage workflow and Snakemake
Support Protocol 1: BUSCO setup
Support Protocol 2: Visualizing BUSCO results
Support Protocol 3: Building phylogenomic trees
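When interpreting or visualizing results, it can help to turn the completeness and redundancy metrics into numbers. The sketch below assumes the compact one-line score notation that BUSCO summaries commonly use (C: complete, S: single-copy, D: duplicated, F: fragmented, M: missing, n: number of BUSCO groups searched). The helper `parse_busco_score` and the example values are hypothetical and do not come from a real run.

```python
import re
from typing import NamedTuple


class BuscoScore(NamedTuple):
    complete: float    # C: complete BUSCOs (%), single-copy + duplicated
    single: float      # S: complete and single-copy (%)
    duplicated: float  # D: complete and duplicated (%), a redundancy signal
    fragmented: float  # F: fragmented (%)
    missing: float     # M: missing (%)
    n: int             # n: number of BUSCO groups searched


# Assumed notation: C:..%[S:..%,D:..%],F:..%,M:..%,n:..
_SCORE_RE = re.compile(
    r"C:(?P<c>[\d.]+)%\[S:(?P<s>[\d.]+)%,D:(?P<d>[\d.]+)%\],"
    r"F:(?P<f>[\d.]+)%,M:(?P<m>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_score(line: str) -> BuscoScore:
    """Extract the score fields from a one-line BUSCO-style summary."""
    match = _SCORE_RE.search(line)
    if match is None:
        raise ValueError(f"no BUSCO-style score found in: {line!r}")
    return BuscoScore(
        complete=float(match["c"]),
        single=float(match["s"]),
        duplicated=float(match["d"]),
        fragmented=float(match["f"]),
        missing=float(match["m"]),
        n=int(match["n"]),
    )


if __name__ == "__main__":
    # Made-up score line, for illustration only.
    score = parse_busco_score("C:98.5%[S:97.0%,D:1.5%],F:0.5%,M:1.0%,n:255")
    print(score)
```

A named tuple keeps the six fields explicit, so a downstream comparison across several assessed inputs can refer to `score.duplicated` or `score.missing` rather than to positions in a list.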

Source: http://dx.doi.org/10.1002/cpz1.323
