Unveiling protein corona composition: predicting with resampling embedding and machine learning.

Rong Liao, Yan Zhuang, Xiangfeng Li, Ke Chen, Xingming Wang, Cong Feng, Guangfu Yin, Xiangdong Zhu, Jiangli Lin, Xingdong Zhang

Regen Biomater

College of Biomedical Engineering, National Engineering Research Centre for Biomaterials, Sichuan University, Chengdu, 610065, China.

Published: December 2023

Biomaterials with surface nanostructures boost protein secretion and tissue regeneration, but predicting the protein corona (PC) formed when nanoparticles enter the body is challenging and crucial for assessing osteoinductivity.
Traditional machine learning models such as Random Forest struggle with imbalanced data in PC prediction; this study introduces resampling techniques to improve accuracy, achieving a correlation coefficient of 0.68 and an RMSE of 0.90.
The research successfully validated predictions for four nanoparticles and identified that incubation plasma concentration, particle size distribution index (PDI), and surface modification are key factors affecting PC composition.
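
The two headline metrics can be computed from a set of predictions in a few lines. The sketch below is a minimal illustration, assuming the standard definitions: RMSE as the square root of the mean squared residual, and R² as the coefficient of determination, 1 - SS_res/SS_tot. The paper reports a "correlation coefficient", so its exact definition may differ. The function names and toy values are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error: sqrt(mean((y - y_hat)^2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only (not data from the paper)
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(round(rmse(y_true, y_pred), 4))
print(round(r2_score(y_true, y_pred), 4))
```

On the toy values this prints 0.3536 and 0.9; a perfect predictor would give an RMSE of 0 and an R² of 1.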

Biomaterials with surface nanostructures effectively enhance protein secretion and stimulate tissue regeneration. When nanoparticles (NPs) enter a living system, they quickly interact with proteins in the body fluid, forming the protein corona (PC). Accurately predicting the PC composition is critical for analyzing the osteoinductivity of biomaterials and for guiding the reverse design of NPs. However, achieving accurate predictions remains a significant challenge. Although several machine learning (ML) models, such as Random Forest (RF), have been used for PC prediction, they often fail to account for the extreme values in the abundance region of PC adsorption and struggle to improve accuracy because of the imbalanced data distribution. In this study, resampling embedding was introduced to resolve the imbalanced distribution of PC data. Various ML models were evaluated, and the RF model was ultimately selected for prediction, yielding good correlation coefficient (R²) and root-mean-square error (RMSE) values. Our ablation experiments demonstrated that the proposed method achieved an R² of 0.68, an improvement of approximately 10%, and an RMSE of 0.90, a reduction of approximately 10%. Furthermore, we verified the predictions by label-free quantification of four NPs, namely hydroxyapatite (HA), titanium dioxide (TiO₂), silicon dioxide (SiO₂) and silver (Ag), and achieved a prediction performance with an R² value >0.70 using Random Oversampling. Additionally, feature analysis revealed that the PC composition is most significantly influenced by the incubation plasma concentration, the particle size distribution index (PDI) and surface modification.
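
The abstract does not spell out how the resampling embedding works. As a rough illustration of the underlying idea, the sketch below applies plain random oversampling to a continuous target such as protein abundance: samples are grouped into equal-width bins of the target, and rows from under-populated bins (typically the extreme-abundance tail) are duplicated at random until every non-empty bin matches the largest one. The binning scheme, the bin count and the helper name `random_oversample_regression` are assumptions for illustration, not the authors' procedure. Resampling of this kind should be applied to the training split only, so that duplicated rows do not leak into the evaluation set.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

Row = Sequence[float]


def random_oversample_regression(
    X: Sequence[Row],
    y: Sequence[float],
    n_bins: int = 5,
    seed: int = 0,
) -> Tuple[List[Row], List[float]]:
    """Hypothetical helper: balance a continuous target by binning and duplicating.

    The target range is split into n_bins equal-width bins; rows in smaller
    bins are sampled with replacement until each non-empty bin has as many
    rows as the largest bin. All original rows are kept.
    """
    if len(X) != len(y) or not y:
        raise ValueError("X and y must be non-empty and of equal length")
    lo, hi = min(y), max(y)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant y
    bins = defaultdict(list)
    for i, target in enumerate(y):
        # Clamp so the maximum value lands in the last bin, not bin n_bins
        b = min(int((target - lo) / width), n_bins - 1)
        bins[b].append(i)

    rng = random.Random(seed)
    target_size = max(len(idx) for idx in bins.values())
    chosen: List[int] = []
    for idx in bins.values():
        chosen.extend(idx)  # keep every original sample
        chosen.extend(rng.choices(idx, k=target_size - len(idx)))  # duplicates
    return [X[i] for i in chosen], [y[i] for i in chosen]


# Toy data (not from the paper): most abundances are low, two are extreme
X = [[float(i)] for i in range(10)]
y = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.3, 5.0, 6.0]
X_res, y_res = random_oversample_regression(X, y, n_bins=3)
print(len(y), "->", len(y_res))
```

With the toy data, the two extreme-abundance rows are duplicated until their bin matches the eight low-abundance rows, so the set grows from 10 to 16 samples. A random forest or any other regressor would then be fitted on `X_res` and `y_res`.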

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10781662
DOI: http://dx.doi.org/10.1093/rb/rbad082
