A strategy to obtain the greatest number of best-performing variants with the least experimental effort across the vast combinatorial mutational landscape would have enormous utility in boosting resource producibility for protein engineering. Toward this goal, we present a simple and effective machine learning-based strategy that outperforms other state-of-the-art methods. Our strategy integrates zero-shot prediction and multi-round sampling to direct active learning by experimentally testing only a few predicted top variants.
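The loop described above (zero-shot scores seed the first picks, then each round retrains on the measured variants and samples the next top predictions) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the ridge regressor, the one-hot encoding, the function names (`one_hot`, `fit_ridge`, `active_learning`), the batch size, the number of rounds, and the synthetic fitness landscape are all assumptions made for the example.

```python
import itertools
import numpy as np


def one_hot(variants, alphabet):
    """Encode each variant (a tuple of residues) as a flat one-hot vector."""
    index = {aa: i for i, aa in enumerate(alphabet)}
    X = np.zeros((len(variants), len(variants[0]) * len(alphabet)))
    for row, variant in enumerate(variants):
        for pos, aa in enumerate(variant):
            X[row, pos * len(alphabet) + index[aa]] = 1.0
    return X


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression (a stand-in model, assumed here)."""
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def active_learning(variants, zero_shot, measure, alphabet, rounds=3, batch=8):
    """Multi-round sampling: round 1 ranks by zero-shot score alone,
    later rounds rank by a model refit on all measurements so far."""
    # Features: one-hot residues, the zero-shot score, and a bias column.
    X = np.hstack([one_hot(variants, alphabet),
                   zero_shot[:, None],
                   np.ones((len(variants), 1))])
    tested = {}                      # variant index -> measured fitness
    scores = zero_shot.copy()        # no labels yet, so use zero-shot ranking
    for _ in range(rounds):
        order = np.argsort(-scores)  # best predicted first
        picks = [int(i) for i in order if int(i) not in tested][:batch]
        for i in picks:
            tested[i] = measure(variants[i])  # the "experiment"
        idx = np.array(sorted(tested))
        y = np.array([tested[i] for i in idx])
        scores = X @ fit_ridge(X[idx], y)     # re-rank every variant
    return tested


# Synthetic demo data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
alphabet = "ACDE"
variants = list(itertools.product(alphabet, repeat=3))
true_fitness = rng.normal(size=len(variants))
zero_shot = true_fitness + rng.normal(scale=1.0, size=len(variants))
lookup = dict(zip(variants, true_fitness))

tested = active_learning(variants, zero_shot, lookup.__getitem__, alphabet)
best = max(tested, key=tested.get)
print(len(tested), "variants tested; best found:", "".join(variants[best]))
```

In this toy setup only 24 of the 64 possible variants are ever "measured"; in practice the regressor, the encoding, and the zero-shot predictor would be replaced by whatever models the actual study uses.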
The genome-editing Cas9 protein uses multiple amino-acid residues to bind the target DNA. Even when only the residues in proximity to the target DNA are considered as potential sites for optimising Cas9's activity, the number of combinatorial variants is too large to screen in a wet-lab experiment. Here we generate and cross-validate ten in silico and experimental datasets of multi-domain combinatorial mutagenesis libraries for Cas9 engineering, and demonstrate that a machine learning-coupled engineering approach reduces the experimental screening burden by as much as 95% while enriching top-performing variants by ∼7.
The Cas9 nuclease from Staphylococcus aureus (SaCas9) holds great potential for use in gene therapy, and variants with increased fidelity have been engineered. However, we find that existing variants have not reached the greatest accuracy in discriminating base mismatches and exhibit much reduced activity when their mutations are grafted onto the KKH mutant of SaCas9 for editing an expanded set of DNA targets. We performed structure-guided combinatorial mutagenesis to re-engineer KKH-SaCas9 with enhanced accuracy.