The recent development of CRISPR-Cas technology holds promise for correcting gene-level defects in genetic diseases. The key element of the CRISPR-Cas system is the Cas protein, a nuclease that, assisted by guide RNA, can edit the gene of interest. However, these Cas proteins suffer from inherent limitations such as large size, low cleavage efficiency, and off-target effects, hindering their widespread application as a gene editing tool. Therefore, there is a need to identify novel Cas proteins with improved editing properties, for which it is necessary to understand the underlying features governing the Cas families. In this study, we aim to elucidate the unique protein features associated with the Cas9 and Cas12 families and identify the features distinguishing each family from non-Cas proteins. We built Random Forest (RF) binary classifiers to distinguish Cas12 and Cas9 proteins, respectively, from non-Cas proteins, using the complete protein feature spectrum (13,494 features) encoding various physicochemical, topological, constitutional, and coevolutionary information on Cas proteins. Furthermore, we built multiclass RF classifiers differentiating Cas9, Cas12, and non-Cas proteins. All the models were evaluated rigorously on the test and independent data sets. The Cas12 and Cas9 binary models achieved a high overall accuracy of 92% and 95% on their respective independent data sets, while the multiclass classifier achieved an F1 score of close to 0.98. We observed that Quasi-Sequence-Order (QSO) descriptors like Schneider.lag and Composition descriptors like charge, volume, and polarizability are predominant in the Cas12 family. Conversely, Amino Acid Composition descriptors, especially Tripeptide Composition (TPC), predominate in the Cas9 family.
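As a rough illustration of how the accuracy and F1 figures quoted above are typically computed for a three-class problem, the sketch below scores a small set of made-up predictions. The abstract does not state which F1 averaging was used, so macro averaging is an assumption here, and the labels, predictions, and function names are hypothetical toy values rather than data or code from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return a dict mapping each class label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy labels, not results from the paper.
toy_true = ["Cas9", "Cas9", "Cas12", "Cas12", "non-Cas", "non-Cas"]
toy_pred = ["Cas9", "Cas9", "Cas12", "non-Cas", "non-Cas", "non-Cas"]

print(per_class_f1(toy_true, toy_pred))
print(macro_f1(toy_true, toy_pred), accuracy(toy_true, toy_pred))
```

Macro averaging weights each class equally, which matters when non-Cas proteins greatly outnumber Cas9 or Cas12 examples; a weighted or micro average would give different numbers on imbalanced data.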
Four of the top 10 descriptors identified in Cas9 classification are the tripeptides PWN, PYY, HHA, and DHI, which are seen to be conserved across all Cas9 proteins and located within different catalytically important domains of the Cas9 (SpCas9) structure. Among these, DHI and HHA are well known to be involved in the DNA cleavage activity of the SpCas9 protein. Mutation studies have highlighted the significance of the PWN tripeptide in PAM recognition and DNA cleavage activity of SpCas9, while Y450 from the PYY tripeptide plays a crucial role in reducing off-target effects and improving the specificity of SpCas9. Leveraging our machine learning (ML) pipeline, we identified numerous Cas9 and Cas12 family-specific features. These features offer valuable insights for future experimental and computational studies aimed at designing Cas systems with enhanced gene-editing properties, and they suggest plausible structural modifications that can guide the development of Cas proteins with improved editing capabilities.
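To make the Tripeptide Composition (TPC) descriptor concrete, here is a minimal sketch that computes it as the normalized frequency of each of the 8,000 possible tripeptides over overlapping windows of a sequence. This is the common textbook definition and is assumed here; the abstract does not say which tool or normalization the authors used. The function name and the toy sequence are hypothetical, and the sequence is not a real Cas9 fragment.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tripeptide_composition(sequence):
    """Return a dict of 8,000 tripeptide frequencies for one sequence.

    Overlapping windows of length 3 are counted; windows containing a
    non-standard residue (e.g. X) are skipped. Frequencies are divided
    by the number of counted windows, so they sum to 1 when any window
    is valid.
    """
    seq = sequence.upper()
    counts = {"".join(t): 0 for t in product(AMINO_ACIDS, repeat=3)}
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    if total == 0:
        return {tri: 0.0 for tri in counts}
    return {tri: n / total for tri, n in counts.items()}


# Hypothetical toy sequence containing the four tripeptides named above.
toy_sequence = "MDHIPWNPYYHHA"
tpc = tripeptide_composition(toy_sequence)

for tri in ("PWN", "PYY", "HHA", "DHI"):
    print(tri, tpc[tri])
```

Each value in the returned dictionary would be one column in a feature matrix, which is how a classifier such as a Random Forest could end up ranking individual tripeptides like PWN or DHI among its most important descriptors.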

Source
http://dx.doi.org/10.1021/acs.jcim.4c00625

Publication Analysis

Top Keywords

cas9 cas12: 16
cas proteins: 16
non-cas proteins: 12
cas9: 10
proteins: 10
features: 8
family-specific features: 8
machine learning: 8
complete protein: 8
protein feature: 8

Similar Publications

RNA-guided endonucleases are involved in processes ranging from adaptive immunity to site-specific transposition and have revolutionized genome editing. CRISPR-Cas9, -Cas12 and related proteins use guide RNAs to recognize ∼20-nucleotide target sites within genomic DNA by mechanisms that are not yet fully understood. We used structural and biochemical methods to assess early steps in DNA recognition by Cas12a protein-guide RNA complexes.

State of the art CRISPR-based strategies for cancer diagnostics and treatment.

Biomark Res

December 2024

Department of Medicine and Sciences of Aging, "G. d'Annunzio University" of Chieti- Pescara, Via dei Vestini, Chieti, 66100, Italy.

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) technology is a groundbreaking and dynamic molecular tool for DNA and RNA "surgery". CRISPR/Cas9 is the most widely applied system in oncology research. It is a major advancement in genome manipulation due to its precision, efficiency, scalability and versatility compared to previous gene editing methods.

The CRISPR-Cas system, a natural acquired immune system in prokaryotes that defends against exogenous DNA invasion, has been widely used in many research fields, such as synthetic biology, crop genetics and breeding, and precision medicine, because of its simple structure and easy operation. The miniature CRISPR-Cas12 system has emerged as a genome editing tool in recent years. Compared to the commonly used CRISPR-Cas9 and CRISPR-Cas12a, the miniature CRISPR-Cas12 system has unique advantages, such as rich PAM sites, higher specificity, smaller volume, and lower cytotoxicity.

Clustered regularly interspaced short palindromic repeats (CRISPR)-Cas systems have gained attention for their revolutionary potential in tuberculosis (TB) management, providing a novel approach to both diagnostics and treatment. This technology, renowned for its ability to accurately target and modify genetic material, offers a promising solution to the limitations of current TB diagnostic methods, which often rely on time-consuming culture techniques or polymerase chain reaction (PCR)-based assays. One of the key advantages of CRISPR-Cas systems is their high specificity and sensitivity, making them well-suited for detecting Mycobacterium tuberculosis, even in low-bacterial-load samples.

The objective of this systematic review (SR) was to select studies on the use of gene editing by CRISPR technology related to plant resistance to biotic stresses. We sought to evaluate articles deposited in six electronic databases, using pre-defined inclusion and exclusion criteria. This SR demonstrates that countries such as China and the United States of America stand out in studies with CRISPR/Cas.
