CRISPR sequences are sometimes erroneously translated and can contaminate public databases with spurious proteins containing spaced repeats.

Database (Oxford)

Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, Ctra. Utrera, Km.1, 41013, Sevilla, Spain.

Published: January 2020

The genomics era has generated a plethora of biological sequences, which are usually stored in public databases. Many computational tools facilitate the annotation of these sequences, but they sometimes make mistakes that enter the databases and can propagate when the erroneous data are used for secondary analyses, such as gene prediction or homology searching. While developing a computational gene finder based on protein-coding sequences, we discovered that the reference UniProtKB protein database is contaminated with some spurious sequences translated from DNA containing clustered regularly interspaced short palindromic repeats (CRISPR). We therefore encourage developers of prokaryotic computational gene finders and protein database curators to consider this source of error.
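
As a rough illustration of why a translated CRISPR array stands out, the following sketch translates a synthetic repeat-spacer array in one forward frame and flags peptide k-mers that recur at near-constant spacing. It is not the authors' gene finder or their screening procedure: the repeat and spacer sequences are hypothetical, and the function names and thresholds (`k`, `min_copies`, `tolerance`) are illustrative assumptions. Because each repeat-plus-spacer unit here is 63 nt (a multiple of 3), the repeat is always read in the same frame and yields the same peptide every 21 residues.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical 30-nt repeat; translates to VPRASGDKPF in frame 0.
REPEAT = "GTTCCCCGCGCCAGCGGGGATAAACCGTTT"
# Hypothetical 33-nt spacers (11 codons each), all different from one another.
SPACERS = [
    "ATGGCTAAACTGGAAGCGCTGTATCAGGAAACC",
    "TGGCATGAACGTATTAACGATCTGCGTTCTGGC",
    "AAAGTTACCGAATTTCAGCATATGTGGTACGAA",
]


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward reading frame; stops become '*', unknown codons 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def find_spaced_repeats(protein: str, k: int = 5, min_copies: int = 3,
                        tolerance: int = 2) -> dict:
    """Return {k-mer: mean spacing} for k-mers that recur at near-constant intervals."""
    positions = defaultdict(list)
    for i in range(len(protein) - k + 1):
        positions[protein[i:i + k]].append(i)
    hits = {}
    for kmer, pos in positions.items():
        if len(pos) < min_copies or "*" in kmer:
            continue
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        # Require copies separated by at least k residues (skips low-complexity runs)
        # and a small spread of gaps (a regular repeat-spacer period).
        if min(gaps) >= k and max(gaps) - min(gaps) <= tolerance:
            hits[kmer] = round(sum(gaps) / len(gaps))
    return hits


if __name__ == "__main__":
    array = REPEAT + "".join(spacer + REPEAT for spacer in SPACERS)
    protein = translate(array)
    print(protein)
    print(find_spaced_repeats(protein))
```

Real arrays usually have spacers of somewhat varying length, so successive repeats may fall in different frames and the period may drift; a practical screen would likely examine all six frames and tolerate more spacing variation than this sketch does.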

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7673337
DOI: http://dx.doi.org/10.1093/database/baaa088
