Vast shotgun metagenomic data remain an underutilized resource for discovering novel enzymes. Artificial intelligence (AI) has increasingly been applied to protein mining, but conventional performance evaluation is interpolative in nature, and trained models often struggle to extrapolate when challenged with unknown data. In this study, we present a framework, DeepMineLys (deep mining of phage lysins from human microbiome), based on a convolutional neural network (CNN), to identify phage lysins from three human microbiome datasets. When validated with an independent dataset, our method achieved an F1-score of 84.00%, surpassing existing methods by 20.84%. We expressed 16 lysin candidates from the top 100 sequences in E. coli and confirmed 11 as active. The best one displayed 6.2-fold the activity of hen egg white lysozyme, establishing it as the most potent lysin from the human microbiome. Our study also underscores several important issues in applying AI to biological questions. This framework should be applicable to mining other proteins.
DOI: http://dx.doi.org/10.1016/j.celrep.2024.114583