Improving the consistency of domain annotation within the Conserved Domain Database.

Database (Oxford)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38 A, Room 5S508, 8600 Rockville Pike, Bethesda, MD 20894, USA

Published: September 2015

When annotating protein sequences with the footprints of evolutionarily conserved domains, conservative score or E-value thresholds need to be applied for RPS-BLAST hits, to avoid many false positives. We notice that manual inspection and classification of hits gathered at a higher threshold can add a significant amount of valuable domain annotation. We report an automated algorithm that 'rescues' valuable borderline-scoring domain hits that are well-supported by domain architecture (DA, the sequential order of conserved domains in a protein query), including tandem repeats of domain hits reported at a more conservative threshold. This algorithm is now available as a selectable option on the public conserved domain search (CD-Search) pages. We also report on the possibility to 'suppress' domain hits close to the threshold based on a lack of well-supported DA and to implement this conservatively as an option in live conserved domain searches and for pre-computed results. Improving domain annotation consistency will in turn reduce the fraction of NR sequences with incomplete DAs.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4356950PMC
http://dx.doi.org/10.1093/database/bav012DOI Listing

Publication Analysis

Top Keywords

domain annotation
12
conserved domain
12
domain hits
12
domain
10
conserved domains
8
conserved
5
hits
5
improving consistency
4
consistency domain
4
annotation conserved
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!