Most partial domains in proteins are alignment and annotation artifacts.

Genome Biol

Department of Biochemistry and Molecular Genetics, University of Virginia, Box 800733, Charlottesville, VA, 22908, USA.

Published: May 2015

Background: Protein domains are commonly used to assess the functional roles and evolutionary relationships of proteins and protein families. Here, we use the Pfam protein family database to examine a set of candidate partial domains. Pfam protein domains are often thought of as evolutionarily indivisible, structurally compact units from which larger functional proteins are assembled. However, almost 4% of Pfam27 PfamA domains are shorter than 50% of their family model length, which suggests that more than half of the domain is missing at those locations. To better understand the structural nature of partial domains in proteins, we examined 30,961 partial domain regions from 136 domain families contained in a representative subset of PfamA domains (RefProtDom2, or RPD2).
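The "shorter than 50% of the family model length" criterion can be illustrated with a small sketch. It assumes that a domain's length is measured as the span of family-model positions its alignment covers, using 1-based inclusive coordinates like the model start/end positions reported by HMMER. The `DomainHit` type, the helper functions, and the family names are hypothetical illustrations, not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    """One domain match (hypothetical type; model coordinates assumed 1-based, inclusive)."""
    family: str
    hmm_from: int      # first family-model position covered by the alignment
    hmm_to: int        # last family-model position covered by the alignment
    model_length: int  # total length of the family model


def model_coverage(hit: DomainHit) -> float:
    """Return the fraction of the family model spanned by the alignment."""
    if hit.model_length <= 0:
        raise ValueError("model_length must be positive")
    if not 1 <= hit.hmm_from <= hit.hmm_to <= hit.model_length:
        raise ValueError("model coordinates out of range")
    return (hit.hmm_to - hit.hmm_from + 1) / hit.model_length


def is_candidate_partial(hit: DomainHit, threshold: float = 0.5) -> bool:
    """Flag a hit as a candidate partial if it covers strictly less than `threshold` of the model."""
    return model_coverage(hit) < threshold


hits = [
    DomainHit("FamilyA", 1, 120, 120),  # full-length match
    DomainHit("FamilyB", 10, 49, 100),  # 40 of 100 positions
    DomainHit("FamilyC", 1, 50, 100),   # exactly half: not "shorter than 50%"
]
for h in hits:
    print(h.family, round(model_coverage(h), 2), is_candidate_partial(h))
# FamilyA 1.0 False
# FamilyB 0.4 True
# FamilyC 0.5 False
```

A strict `<` comparison keeps a hit that covers exactly half of the model out of the candidate set, matching the wording "shorter than 50%". Such a flag only marks a region as a candidate; the abstract's point is that many of these candidates turn out to be alignment or gene-model artifacts rather than true partial domains.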

Results: We characterized three types of apparent partial domains: split domains, bounded partials, and unbounded partials. We find that bounded partial domains are over-represented in eukaryotes and in lower quality protein predictions, suggesting that they often result from inaccurate genome assemblies or gene models. We also find that a large percentage of unbounded partial domains produce long alignments, which suggests that their annotation as a partial is an alignment artifact; yet some can be found as partials in other sequence contexts.

Conclusions: Partial domains are largely the result of alignment and annotation artifacts and should be viewed with caution. The presence of partial domain annotations in proteins should raise the concern that the prediction of the protein's gene may be incomplete. In general, protein domains can be considered the structural building blocks of proteins.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4443539
DOI: http://dx.doi.org/10.1186/s13059-015-0656-7

