Fibrous proteins such as collagen, silk, and elastin play critical biological roles, yet few projects have used computational techniques to predict either their class or their structure. In this article, we present FiberID, a simple yet effective method for identifying and distinguishing three fibrous protein subclasses from their primary sequences. Using a combination of amino acid composition and fast Fourier measurements, FiberID classifies fibrous proteins belonging to these subclasses with high accuracy using two standard machine learning techniques (decision trees and naïve Bayesian classifiers). After presenting our results, we highlight several fibrous sequences that FiberID regularly misclassifies, as these may be of interest for further study. Finally, we analyze the decision trees developed by FiberID for potential insights regarding the structure of these proteins.
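To make the two feature types concrete, the following is a minimal sketch of how amino acid composition and a Fourier-based periodicity measure could be computed from a primary sequence. It is not FiberID's implementation: the abstract does not specify the exact features, so the 0/1 residue indicator signal, the function names, and the use of a naive discrete Fourier transform (rather than an FFT routine) are all assumptions chosen for illustration. The classifier step (decision tree or naïve Bayes) is omitted.

```python
import cmath
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the fraction of each standard amino acid in seq.

    Non-standard characters count toward the sequence length but
    get no entry of their own.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def power_spectrum(seq, residue):
    """Power spectrum of the 0/1 indicator signal for one residue.

    Assumption: the sequence is encoded as 1.0 where `residue` occurs
    and 0.0 elsewhere. A naive O(n^2) DFT is used for self-containment;
    an FFT would give the same values faster.
    """
    signal = [1.0 if c == residue else 0.0 for c in seq]
    n = len(signal)
    spectrum = []
    for k in range(1, n // 2 + 1):  # skip k = 0 (the mean)
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def dominant_period(seq, residue):
    """Period (in residues) of the strongest frequency for `residue`."""
    spec = power_spectrum(seq, residue)
    k = max(range(len(spec)), key=spec.__getitem__) + 1
    return len(seq) / k


# A collagen-like Gly-X-Y repeat: glycine recurs every third residue.
collagen_like = "GPP" * 10
features = composition(collagen_like) + [dominant_period(collagen_like, "G")]
```

For the collagen-like repeat above, the glycine fraction is one third and the dominant glycine period is 3.0 residues. A feature vector of this kind could then be passed to any standard decision tree or naïve Bayes implementation.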
DOI: http://dx.doi.org/10.1002/prot.21128