COInr and mkCOInr: Building and customizing a nonredundant barcoding reference database from BOLD and NCBI using a semi-automated pipeline.

Mol Ecol Resour

Aix Marseille Univ, Avignon Univ, CNRS, IRD, IMBE, Marseille, France.

Published: May 2023

Reference databases with wide taxonomic coverage are greatly needed in many fields of biology, particularly for the taxonomic assignment of metabarcoding sequences. It is therefore essential to be able to access and pool data from different primary databases. The COInr database is a freely available, easy-to-access database of COI reference sequences extracted from the BOLD and NCBI nucleotide databases. It is comprehensive, in that it is not limited to a taxon, a gene region or a taxonomic rank, which makes it a good starting point for creating custom databases. Sequences are dereplicated between databases and within taxa. Each taxon has a unique taxonomic identifier (taxID), which is essential for avoiding ambiguous associations of homonyms and synonyms in the source databases. The taxIDs form a coherent hierarchical system that is fully compatible with NCBI taxIDs, so full or ranked lineages can be created for each taxon.

The mkCOInr tool is a series of Perl scripts designed to download sequences from BOLD and NCBI, build the COInr database and customize it according to users' needs. It can select or eliminate sequences for a list of taxa, select a specific gene region, select for a minimum taxonomic resolution, add new custom sequences, and format the database for BLAST, VTAM, QIIME and the RDP Classifier. The pipeline is semi-automated and runs from the command line in a Linux environment. The COInr database can be downloaded from https://doi.org/10.5281/zenodo.6555985, and mkCOInr and its full documentation are available at https://github.com/meglecz/mkCOInr.
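The abstract states that sequences are dereplicated between databases and within taxa, but it does not give the exact rule mkCOInr applies. The Python sketch below illustrates the idea under one assumed rule: two records are duplicates if they share a taxID and have an identical sequence (case-insensitive). The `Record` type, its fields and the taxIDs are hypothetical and are not mkCOInr's data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One reference sequence (hypothetical fields, not mkCOInr's format)."""
    seq_id: str
    tax_id: int
    sequence: str
    source: str  # "BOLD" or "NCBI"


def dereplicate(records):
    """Keep the first record for each (tax_id, sequence) pair.

    Identical sequences assigned to *different* taxa are all kept,
    because collapsing them would discard taxonomic information.
    """
    seen = set()
    kept = []
    for rec in records:
        key = (rec.tax_id, rec.sequence.upper())
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


# Invented taxIDs 101 and 102, pooled from both sources.
pooled = [
    Record("BOLD_1", 101, "ACGTTGCA", "BOLD"),
    Record("NCBI_1", 101, "acgttgca", "NCBI"),  # duplicate of BOLD_1 (same taxon)
    Record("NCBI_2", 102, "ACGTTGCA", "NCBI"),  # same sequence, different taxon
]
for rec in dereplicate(pooled):
    print(rec.seq_id, rec.tax_id)
# BOLD_1 101
# NCBI_2 102
```

Keying on the taxID as well as the sequence is what makes this dereplication "within taxa". A stricter variant could also drop sequences contained in a longer sequence of the same taxon. That variant is an assumption here, not a documented mkCOInr behaviour.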
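Because every taxID points to a parent taxID, a lineage can be rebuilt by walking parent links up to the root, as in NCBI's taxonomy dump, where the root node is its own parent. The Python sketch below shows how full and ranked lineages can be derived from such a table. The table layout `tax_id -> (parent, rank, name)`, the invented taxIDs and the chosen ranks are assumptions for illustration. The lineage is also truncated and omits many intermediate NCBI nodes.

```python
# Toy taxonomy: tax_id -> (parent_tax_id, rank, name). taxIDs are invented.
TAXONOMY = {
    1: (1, "no rank", "root"),  # root is its own parent
    10: (1, "phylum", "Arthropoda"),
    20: (10, "class", "Insecta"),
    30: (20, "order", "Diptera"),
    35: (30, "suborder", "Brachycera"),
    40: (35, "family", "Drosophilidae"),
    50: (40, "genus", "Drosophila"),
    60: (50, "species", "Drosophila melanogaster"),
}

RANKS = ("phylum", "class", "order", "family", "genus", "species")


def full_lineage(tax_id, taxonomy):
    """Names of every node from the root down to tax_id."""
    names = []
    while True:
        parent, _rank, name = taxonomy[tax_id]
        names.append(name)
        if parent == tax_id:  # reached the root
            break
        tax_id = parent
    return names[::-1]


def ranked_lineage(tax_id, taxonomy, ranks=RANKS):
    """One name per requested rank; "" where the lineage lacks that rank."""
    by_rank = {}
    while True:
        parent, rank, name = taxonomy[tax_id]
        by_rank.setdefault(rank, name)  # lowest node of each rank wins
        if parent == tax_id:
            break
        tax_id = parent
    return [by_rank.get(r, "") for r in ranks]


print(";".join(full_lineage(60, TAXONOMY)))
# root;Arthropoda;Insecta;Diptera;Brachycera;Drosophilidae;Drosophila;Drosophila melanogaster
print(";".join(ranked_lineage(60, TAXONOMY)))
# Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;Drosophila melanogaster
```

A ranked lineage has a fixed number of columns, and ranks absent from the lineage are left empty. This fixed-width shape is what classifiers that expect one field per rank need.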
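Selecting or eliminating sequences for a list of taxa, and selecting for a minimum taxonomic resolution, can both be expressed as checks on a record's lineage. The sketch below is hypothetical Python, not mkCOInr's Perl scripts or their options. It assumes records are `(seq_id, tax_id)` pairs and uses a toy taxonomy with invented taxIDs. It also assumes that a taxon at a minor rank (e.g. subfamily) counts as resolved to the nearest major rank above it.

```python
# Toy taxonomy: tax_id -> (parent_tax_id, rank). taxIDs are invented.
TAXONOMY = {
    1: (1, "no rank"),      # root (its own parent)
    10: (1, "phylum"),      # Arthropoda
    20: (10, "class"),      # Insecta
    30: (20, "order"),      # Diptera
    31: (20, "order"),      # Lepidoptera
    40: (30, "family"),     # Drosophilidae
    45: (40, "subfamily"),  # Drosophilinae
    50: (45, "genus"),      # Drosophila
    60: (50, "species"),    # Drosophila melanogaster
    70: (31, "family"),     # Nymphalidae
}
RANK_ORDER = ["phylum", "class", "order", "family", "genus", "species"]


def ancestors(tax_id, taxonomy):
    """Yield tax_id, then each ancestor up to and including the root."""
    while True:
        yield tax_id
        parent = taxonomy[tax_id][0]
        if parent == tax_id:
            return
        tax_id = parent


def resolution(tax_id, taxonomy):
    """Index in RANK_ORDER of the lowest major rank at or above tax_id, or -1."""
    for tid in ancestors(tax_id, taxonomy):
        rank = taxonomy[tid][1]
        if rank in RANK_ORDER:
            return RANK_ORDER.index(rank)
    return -1


def select(records, taxonomy, include=None, exclude=(), min_rank="genus"):
    """Filter (seq_id, tax_id) pairs by taxon lists and minimum resolution."""
    include_set = None if include is None else set(include)
    exclude_set = set(exclude)
    min_level = RANK_ORDER.index(min_rank)
    kept = []
    for seq_id, tax_id in records:
        lineage = set(ancestors(tax_id, taxonomy))
        if include_set is not None and not lineage & include_set:
            continue  # not under any of the taxa to select
        if lineage & exclude_set:
            continue  # under one of the taxa to eliminate
        if resolution(tax_id, taxonomy) < min_level:
            continue  # not identified at least to min_rank
        kept.append(seq_id)
    return kept


records = [("s1", 60), ("s2", 40), ("s3", 70), ("s4", 50), ("s5", 45)]
print(select(records, TAXONOMY, include=[20], exclude=[31], min_rank="genus"))
# ['s1', 's4']
print(select(records, TAXONOMY, include=[30], min_rank="family"))
# ['s1', 's2', 's4', 's5']
```

Testing membership against the whole lineage means that selecting a high-level taxon, such as an order, automatically keeps every genus and species below it.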


Source: http://dx.doi.org/10.1111/1755-0998.13756


Similar Publications

Freshwater habitats and their quality have always been of utmost importance for human subsistence. Water quality assessment is an important tool, covering biological, chemical and hydromorphological aspects. Bioindicators such as bivalves can be used as evidence of good water quality, but widespread groups such as species of the family Sphaeriidae Deshayes, 1855 (1822) and the genera Pisidium/Euglesa/Odhneripidisium, also known as 'pea clams', are poorly known and lack taxonomic expertise.

Biodiversity Patterns and DNA Barcode Gap Analysis of COI in Coastal Lagoons of Albania.

Biology (Basel)

November 2024

Department of Biological and Environmental Sciences and Technologies, DiSTeBA, University of Salento, Via Monteroni 165, 73100 Lecce, Italy.

Aquatic biodiversity includes a variety of unique species, their habitats, and their interactions with each other. Albania has a large hydrographic network including rivers, lakes, wetlands and coastal marine areas, contributing to a high level of aquatic biodiversity. Currently, evaluating aquatic biodiversity relies on morphological species identification methods, but DNA-based taxonomic identification could improve the monitoring and assessment of aquatic ecosystems.

Land managers, researchers, and regulators increasingly utilize environmental DNA (eDNA) techniques to monitor species richness, presence, and absence. In order to properly develop a biological assay for eDNA metabarcoding or quantitative PCR, scientists must be able to find not only reference sequences (previously identified sequences in a genomics database) that match their target taxa but also reference sequences that match non-target taxa. Determining which taxa have publicly available sequences in a time-efficient and accurate manner currently requires computational skills to search, manipulate, and parse multiple unconnected DNA sequence databases.

A molecular approach to identify parrotfish (Sparisoma) species during early ontogeny.

J Fish Biol

October 2024

Laboratorio de Genética de la Conservación, Departamento de Biología de la Conservación, Centro de Investigación y de Educación Superior de Ensenada (CICESE), Ensenada, Mexico.

Sparisoma species (parrotfish) comprise an important functional group contributing to coral-reef resilience. The morphological diagnostic characteristics for species identification are clearly described for adult forms but not for the early stages. Consequently, many taxonomical listings of Sparisoma larvae are restricted to the genus level.
