CRISPR/Cas9 cleavage efficiency regression through boosting algorithms and Markov sequence profiling.

Bioinformatics

Faculty of Engineering and Information Technology, Advanced Analytics Institute, University of Technology Sydney, Broadway, NSW, Australia.

Published: September 2018

Motivation: The CRISPR/Cas9 system is a widely used genome editing tool. A prediction problem of great interest for this system is how to select optimal single-guide RNAs (sgRNAs) such that their cleavage efficiency is high while their off-target effect is low.

Results: This work proposed a two-step averaging method (TSAM) for regressing the cleavage efficiencies of a set of sgRNAs by averaging the efficiency scores predicted by a boosting algorithm with those predicted by a support vector machine (SVM). We also proposed profiled Markov properties as novel features that capture the global characteristics of sgRNAs. These new features are combined with the top-ranked features from the boosting algorithm to train the SVM regressor. TSAM improved the mean Spearman correlation coefficients compared with the state-of-the-art performance on benchmark datasets containing thousands of human, mouse and zebrafish sgRNAs. Our method can also be converted to make binary distinctions between efficient and inefficient sgRNAs, with performance superior to that of existing methods. The analysis reveals that highly efficient sgRNAs have a lower melting temperature in the middle of the spacer, cut at parts of the genome closer to the 5'-end and contain more 'A' but fewer 'G' than inefficient ones. Further comprehensive analysis also demonstrates that our tool predicts an sgRNA's cutting efficiency with consistently good performance regardless of whether it is expressed from a U6 promoter in cells or from a T7 promoter in vitro.
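The three ingredients named above (Markov sequence features, score averaging and Spearman-based evaluation) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not define the profiled Markov properties, so a first-order transition-probability profile is assumed here, and the function names `markov_profile`, `average_scores` and `spearman` are hypothetical. The boosting and SVM scores are taken as given inputs rather than produced by trained models.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def markov_profile(seq, pseudocount=1.0):
    """Return 16 first-order transition probabilities P(next | current).

    Assumption: a simple stand-in for the paper's 'profiled Markov
    properties', whose exact definition is not given in the abstract.
    """
    counts = {a + b: pseudocount for a, b in product(NUCLEOTIDES, repeat=2)}
    for a, b in zip(seq, seq[1:]):
        if a + b in counts:  # skip dinucleotides with ambiguous bases such as N
            counts[a + b] += 1
    features = []
    for a in NUCLEOTIDES:
        row_total = sum(counts[a + b] for b in NUCLEOTIDES)
        features.extend(counts[a + b] / row_total for b in NUCLEOTIDES)
    return features


def average_scores(boosting_scores, svm_scores):
    """Average two per-sgRNA score lists element by element."""
    return [(b + s) / 2 for b, s in zip(boosting_scores, svm_scores)]


def _ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

In use, `markov_profile` would be applied to each spacer sequence to build part of a feature vector, and `spearman(average_scores(boost, svm), measured)` would score the averaged predictions against measured efficiencies. The pseudocount keeps every transition probability non-zero for short sequences, which is a design choice of this sketch rather than something stated in the source.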

Availability and Implementation: An online tool is available at http://www.aai-bioinfo.com/CRISPR/. Python and Matlab source code is freely available at https://github.com/penn-hui/TSAM.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source: http://dx.doi.org/10.1093/bioinformatics/bty298

Publication Analysis

Top Keywords

cleavage efficiency: 8
boosting algorithm: 8
sgrnas: 6
crispr/cas9 cleavage: 4
efficiency: 4
efficiency regression: 4
regression boosting: 4
boosting algorithms: 4
algorithms markov: 4
markov sequence: 4

Similar Publications

We show that a small biotin-binding RNA aptamer that folds into a pseudoknot structure acts as a substrate for bacterial RNase P RNA (RPR) with and without the RNase P C5 protein. Cleavage in the single-stranded region in loop 1 was shown to depend on the presence of a RCCA-motif at the 3' end of the substrate. The nucleobase and the 2'hydroxyl at the position immediately 5' of the cleavage site contribute to both cleavage efficiency and site selection, where C at this position induces significant cleavage at an alternative site, one base upstream of the main cleavage site.

'Splice-at-will' Cas12a crRNA engineering enabled direct quantification of ultrashort RNAs.

Nucleic Acids Res

January 2025

Key Laboratory of Applied Surface and Colloid Chemistry, Ministry of Education, Key Laboratory of Analytical Chemistry for Life Science of Shaanxi Province, School of Chemistry & Chemical Engineering, Shaanxi Normal University, 620 West Chang'an Avenue, Chang'an District, Xi'an, Shaanxi 710119, P.R. China.

We present a robust 'splice-at-will' CRISPR RNA (crRNA) engineering mechanism that overcomes the limitations of clustered regularly interspaced short palindromic repeats (CRISPR)/Cas system in directly detecting ultrashort RNAs. In this strategy, an intact Cas12a crRNA can be split from almost any site of the spacer region to obtain a truncated crRNA (tcrRNA) that cannot activate Cas12a even after binding an auxiliary DNA activator. While splicing tcrRNAs with a moiety of ultrashort RNA, the formed combination can work together to activate Cas12a efficiently, enabling 'splice-at-will' crRNA engineering.

H5Nx viruses continue to wreak havoc in avian and mammalian species worldwide. The virus distinguishes itself by the ability to replicate to high titers and transmit efficiently in a wide variety of hosts in diverse climatic environments. Fortunately, transmission to and between humans is scarce.

CRISPR-Cas enzymes must recognize a protospacer-adjacent motif (PAM) to edit a genomic site, significantly limiting the range of targetable sequences in a genome. Machine learning-based protein engineering provides a powerful solution to efficiently generate Cas protein variants tailored to recognize specific PAMs. Here, we present Protein2PAM, an evolution-informed deep learning model trained on a dataset of over 45,000 CRISPR-Cas PAMs.

Precise identification and analysis of multiple protein biomarkers on the surface of breast cancer cell-derived extracellular vesicles (BC-EVs) are of great significance for noninvasive diagnosis of the breast cancer subtypes, but it remains a major challenge owing to their high heterogeneity and low abundance. Herein, we established a CRISPR-based homogeneous electrochemical strategy for near-zero background and ultrasensitive detection of BC-EVs. To realize the high-performance capture and isolation of BC-EVs, fluidity-enhanced magnetic nanoprobes were facilely prepared.
