Background: Mass spectrometry-based proteomics can identify and quantify thousands of proteins from individual microbial species, but a significant percentage of these proteins are unannotated and hence classified as proteins of unknown function (PUFs). Due to the difficulty in extracting meaningful metabolic information, PUFs are often overlooked or discarded during data analysis, even though they might be critically important in functional activities, in particular for metabolic engineering research.
Results: We optimized and employed a pipeline integrating various "guilt-by-association" (GBA) metrics, including differential expression and co-expression analyses of high-throughput mass spectrometry proteome data and phylogenetic coevolution analysis, and sequence homology-based approaches to determine putative functions for PUFs in Clostridium thermocellum. Our various analyses provided putative functional information for over 95% of the PUFs detected by mass spectrometry in a wild-type and/or an engineered strain of C. thermocellum. In particular, we validated a predicted acyltransferase PUF (WP_003519433.1) with functional activity towards 2-phenylethyl alcohol, consistent with our GBA and sequence homology-based predictions.
Conclusions: This work demonstrates the value of leveraging sequence homology-based annotations with empirical evidence based on the concept of GBA to broadly predict putative functions for PUFs, opening avenues to further interrogation via targeted experiments.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8112048 | PMC |
http://dx.doi.org/10.1186/s13068-021-01964-4 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!