Principal component regression (PCR) was applied to a spectral library of proteins in H2O solution acquired by single-pass attenuated total reflectance (ATR) Fourier transform infrared (FT-IR) spectroscopy. PCR was used to predict the secondary structure content, principally the alpha-helix and beta-sheet content, of the proteins in the library. Quantitation of protein secondary structure content was performed as a proof of principle that single-pass ATR-FT-IR is an appropriate method for protein secondary structure analysis. The ATR-FT-IR method permits acquisition of the entire spectral range from 700 to 3900 cm⁻¹ without significant interference from water bands. An "inside model space" bootstrap and a genetic algorithm (GA) were used to improve prediction results. Specifically, the bootstrap was used to increase the number of replicates available for training and validating the PCR model, and the GA was used to optimize PCR parameters, particularly wavenumber selection. The bootstrap allowed adequate representation of the variability in the amide A, amide B, and C-H stretching regions caused by differing levels of sample hydration. Implementation of the bootstrap improved the robustness of the PCR models significantly; the GA, however, only slightly improved prediction results. Two spectral libraries are presented: one was better suited for beta-sheet content prediction and the other for alpha-helix content prediction. The GA-optimized PCR method for alpha-helix content prediction used 18 factors and 120 wavenumbers within the amide I, II, A, B, and IV and the C-H stretching regions. For beta-sheet content prediction, 18 factors and 580 wavenumbers within the amide I, II, A, and B and the C-H stretching regions were used. Validation of these two methods yielded an average absolute error of 1.7% for alpha-helix content prediction and 2.3% for beta-sheet content prediction.
After the PCR models were developed and validated, they were used to predict the alpha-helix and beta-sheet content of two unknowns, casein and immunoglobulin G.
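
As a rough illustration of the PCR step described above, the following minimal sketch fits a principal component regression with NumPy and reports an average absolute error on held-out samples. It is not the authors' implementation: the function names (`fit_pcr`, `predict_pcr`, `average_absolute_error`) are hypothetical, the data are synthetic stand-ins for spectra and structure percentages, and neither the bootstrap nor the GA wavenumber selection is included.

```python
import numpy as np


def fit_pcr(X, y, n_factors):
    """Fit PCR: PCA on mean-centered spectra, then least squares on the scores."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # SVD of the centered data; rows of Vt are the principal component loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_factors].T              # shape (n_wavenumbers, n_factors)
    scores = Xc @ loadings                   # shape (n_samples, n_factors)
    coef, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    return {"x_mean": x_mean, "y_mean": y_mean, "loadings": loadings, "coef": coef}


def predict_pcr(model, X):
    """Project new spectra onto the stored loadings and apply the regression."""
    scores = (X - model["x_mean"]) @ model["loadings"]
    return scores @ model["coef"] + model["y_mean"]


def average_absolute_error(y_true, y_pred):
    """Mean of |true - predicted|, in the same units as y (here, percent)."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Synthetic example (assumption): 60 "spectra" of 200 points driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
basis = rng.normal(size=(3, 200))
X = latent @ basis + 0.01 * rng.normal(size=(60, 200))
y = 30.0 + latent @ np.array([10.0, -5.0, 2.0])   # stand-in for % alpha-helix

# Train on the first 40 samples, validate on the remaining 20.
model = fit_pcr(X[:40], y[:40], n_factors=3)
y_pred = predict_pcr(model, X[40:])
print("average absolute error:", average_absolute_error(y[40:], y_pred))
```

In this sketch `n_factors` plays the role of the 18 factors reported in the abstract, and restricting the columns of `X` before fitting would correspond to wavenumber selection.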
DOI: http://dx.doi.org/10.1021/ac020104n