Publications by authors named "Guohui Chuai"

Article Synopsis
  • The emergence of CRISPR-Cas systems has led to efficient gene editing tools, but traditional methods for discovering these systems rely on sequence similarity and therefore often overlook important variants.
  • CHOOSER, a new AI framework built on protein large language models, discovers CRISPR-Cas systems without requiring extensive training data, significantly enhancing the discovery process.
  • Using CHOOSER, the researchers identified 11 new Casλ homologs, doubling the known catalog, and experimentally validated one homolog, EphcCasλ, for its ability to self-process pre-crRNA and its potential in CRISPR-based pathogen detection.
Article Synopsis
  • Understanding how cells respond to genetic perturbations is crucial in many biomedical fields, but predicting the outcomes of single or combinatorial perturbations across different cell types remains challenging.
  • The research introduces STAMP, an AI method that decomposes perturbation-outcome prediction into three manageable subtasks: identifying the genes that change after a perturbation, determining the direction in which they change, and estimating the magnitude of those changes.
  • STAMP substantially outperforms previous methods; it can identify key regulatory genes and pathways even with small sample sizes and can uncover fine-grained interactions among genes.
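The three-subtask decomposition can be illustrated by showing how per-gene outputs might be recombined into a single signed prediction. This is a minimal sketch that assumes each subtask has already produced per-gene scores; the function name `compose_perturbation_effect`, the 0.5 thresholds, and the hard decision rule are hypothetical and are not STAMP's actual model.

```python
from typing import List, Sequence


def compose_perturbation_effect(
    p_changed: Sequence[float],
    p_up: Sequence[float],
    magnitude: Sequence[float],
    threshold: float = 0.5,
) -> List[float]:
    """Hypothetical recombination of three per-gene subtask outputs.

    p_changed: probability that each gene changes after the perturbation (subtask 1)
    p_up:      probability that a changed gene is up- rather than down-regulated (subtask 2)
    magnitude: predicted size of the change, e.g. |log fold change| (subtask 3)
    """
    effects = []
    for pc, pu, m in zip(p_changed, p_up, magnitude):
        if pc < threshold:
            effects.append(0.0)  # subtask 1: gene treated as unaffected
        else:
            sign = 1.0 if pu >= 0.5 else -1.0  # subtask 2: up (+1) or down (-1)
            effects.append(sign * abs(m))  # subtask 3: how large the change is
    return effects


print(compose_perturbation_effect([0.9, 0.2, 0.7], [0.8, 0.9, 0.1], [1.5, 2.0, 0.4]))
# [1.5, 0.0, -0.4]
```

Splitting the prediction this way lets each subtask be trained and evaluated on its own, for example judging the "which genes change" step as a classification problem before any magnitudes are compared.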

Determining correlations between molecules at various levels is an important topic in molecular biology. In natural language processing and image generation, large language models have shown a remarkable ability to capture correlations from large amounts of data, and the correlations they learn can be applied to a wide range of specific tasks; for this reason, large language models are also referred to as foundation models. The massive amount of data available in molecular biology provides an excellent basis for developing foundation models, and the recent emergence of foundation models in this field has substantially advanced it.

Background: The precise characterization of individual tumors and immune microenvironments using transcriptome sequencing has provided a great opportunity for successful personalized cancer treatment. However, the cancer treatment response is often characterized by in vitro assays or bulk transcriptomes that neglect the heterogeneity of malignant tumors in vivo and the immune microenvironment, motivating the need to use single-cell transcriptomes for personalized cancer treatment.

Methods: Here, we present comboSC, a computational proof-of-concept study to explore the feasibility of personalized cancer combination therapy optimization using single-cell transcriptomes.

The powerful CRISPR genome editing system is hindered by its off-target effects, and existing computational tools have achieved limited performance in genome-wide off-target prediction owing to an insufficient understanding of the CRISPR molecular mechanism. In this study, we propose incorporating molecular dynamics (MD) simulations into the computational analysis of the CRISPR system, and present CRISOT, an integrated tool suite containing four related modules, i.e.

Base editing technology is increasingly applied in genome engineering, but the current strategy for designing guide RNAs (gRNAs) relies substantially on empirical experience rather than on dependable and efficient in silico design. Furthermore, the pleiotropic effect of base editing on disease treatment remains unexplored, which hinders its further clinical use. Here, we present BExplorer, an integrated and comprehensive computational pipeline that optimizes the design of gRNAs for 26 existing types of base editors in silico.

Article Synopsis
  • The single-cell Multi-View Profiler (scMVP) is a deep generative model designed to analyze sequencing data that captures both gene expression and chromatin accessibility in individual cells.
  • It generates unified latent representations for tasks like dimensionality reduction, cell clustering, and tracing developmental pathways while also providing separate imputations for differential analysis and cis-regulatory element identification.
  • scMVP addresses data sparsity challenges and improves the identification of cell groups in diverse joint profiling methods, showcasing its effectiveness on various realistic datasets.

Various computational methods have been developed for quantitative modeling of organic chemical reactions; however, their lack of universality and their requirement for large amounts of experimental data limit their broad application. Here, we present DeepReac+, an efficient and universal computational framework based on deep active learning for predicting chemical reaction outcomes and identifying optimal reaction conditions. Within this framework, DeepReac is a graph-neural-network-based model that takes 2D molecular structures directly as inputs and automatically adapts to different prediction tasks.

Motivation: Quantitative structure-activity relationship (QSAR) analysis is commonly used in drug discovery. Collaboration among pharmaceutical institutions can improve QSAR prediction performance; however, intellectual property and related financial interests still substantially hinder inter-institutional collaboration in QSAR modeling for drug discovery.

Results: For the first time, we verified the feasibility of applying horizontal federated learning (HFL), a recently developed collaborative and privacy-preserving learning framework, to QSAR analysis.
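Horizontal federated learning applies when institutions share the same feature space (here, molecular descriptors) but hold different compounds. Federated averaging (FedAvg) is one common way to realize it: each site trains on its private data and only model weights are pooled. The sketch below assumes a plain linear QSAR model trained by SGD with FedAvg aggregation; the model, function names, toy data, and hyperparameters are illustrative assumptions, not the setup used in the study.

```python
from typing import List, Tuple

# One institution's private data: (descriptor vector, measured activity) pairs.
Dataset = List[Tuple[List[float], float]]


def local_update(weights: List[float], data: Dataset, lr: float = 0.05, epochs: int = 5) -> List[float]:
    """Train a linear model locally with SGD on squared error, starting from the global weights."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fed_avg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Average client weights, weighting each client by its number of samples."""
    total = sum(client_sizes)
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(len(client_weights[0]))
    ]


def federated_training(clients: List[Dataset], dim: int, rounds: int = 50) -> List[float]:
    """Only weights leave each institution; raw compounds and activities stay local."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        local_ws = [local_update(global_w, data) for data in clients]
        global_w = fed_avg(local_ws, [len(data) for data in clients])
    return global_w


# Two hypothetical institutions whose toy activities follow y = 2*x0 - x1.
site_a = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
site_b = [([2.0, 1.0], 3.0), ([1.0, 2.0], 0.0)]
weights = federated_training([site_a, site_b], dim=2)
print([round(v, 3) for v in weights])  # approximately [2.0, -1.0]
```

Because the toy activities are noise-free and each site's descriptors span both dimensions, the pooled model recovers the shared relationship even though neither site ever sees the other's compounds.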

Systematic evaluation of genome-wide Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) off-target profiles is a fundamental step for the successful application of the CRISPR system to clinical therapies. Many experimental techniques and in silico tools have been proposed for detecting and predicting genome-wide CRISPR off-target profiles. These techniques and tools, however, have not been systematically benchmarked.

Efficient single-cell assignment without prior marker gene annotations is essential for single-cell sequencing data analysis. Current methods, however, have limited effectiveness for distinct single-cell assignment, and they fail to generalize well across tasks because of the inherent heterogeneity of different single-cell sequencing datasets and cell types.

Background: Cancer neoantigens are expressed only in cancer cells and are presented on the tumor cell surface in complex with major histocompatibility complex (MHC) class I proteins for recognition by cytotoxic T cells. Accurate and rapid identification of neoantigens plays a pivotal role in cancer immunotherapy. Although several in silico tools for neoantigen prediction have been presented, these tools still have limitations.

For genome-wide CRISPR off-target cleavage site (OTS) prediction, an important issue is data imbalance: the number of true OTS recognized by whole-genome off-target detection techniques is much smaller than the number of all possible nucleotide-mismatch loci, which makes training machine learning models very challenging. Computational models for OTS prediction and scoring should therefore be carefully designed and properly evaluated to avoid bias. In our study, we use two tools as examples to highlight the data imbalance issue in CRISPR off-target prediction, with the aim of achieving better sensitivity and specificity for optimized CRISPR gene editing.
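To make the imbalance concrete: when candidate mismatch loci vastly outnumber true OTS, plain accuracy rewards a model that predicts "no cleavage" everywhere. The minimal sketch below compares accuracy with precision and recall on made-up counts (10,000 hypothetical candidate loci, 10 true OTS); the numbers and the function name are illustrative only and do not come from the study.

```python
from typing import List, Tuple


def accuracy_precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) for binary labels, where 1 = true off-target site."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # no positive calls: report 0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical screen: 10,000 candidate loci, only the first 10 are true OTS.
y_true = [1] * 10 + [0] * 9990

# A degenerate model that never predicts an off-target site.
all_negative = [0] * 10_000

# A model that flags 20 loci, 8 of which are true OTS.
flagged = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 9978

print(accuracy_precision_recall(y_true, all_negative))  # (0.999, 0.0, 0.0)
print(accuracy_precision_recall(y_true, flagged))       # (0.9986, 0.4, 0.8)
```

Both models score above 99.8% accuracy, yet only the second recovers any true sites, which is why imbalance-aware metrics such as precision and recall are needed when evaluating OTS predictors.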

A major challenge for effective application of CRISPR systems is to accurately predict the single guide RNA (sgRNA) on-target knockout efficacy and off-target profile, which would facilitate the optimized design of sgRNAs with high sensitivity and specificity. Here we present DeepCRISPR, a comprehensive computational platform to unify sgRNA on-target and off-target site prediction into one framework with deep learning, surpassing available state-of-the-art in silico tools. In addition, DeepCRISPR fully automates the identification of sequence and epigenetic features that may affect sgRNA knockout efficacy in a data-driven manner.

CRISPR-based genome editing has been widely implemented in various cell types. In silico single guide RNA (sgRNA) design is a key step for successful gene editing using the CRISPR system. Continuing efforts are being made to refine in silico sgRNA design for high on-target efficacy and reduced off-target effects.

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-based gene editing has been widely implemented in various cell types and organisms. A major challenge in the effective application of the CRISPR system is the need to design highly efficient single-guide RNAs (sgRNAs) with minimal off-target cleavage. Several tools are available for sgRNA design, but few of them have been compared.

Background: Deciphering taxonomic structures from high-dimensional sequencing data is still challenging in metagenomic studies. Moreover, the common workflow in this field fails to identify microbial communities and their effect on a specific disease status, and even the relationships and interactions between different bacteria within a microbial community remain unknown.

CRISPR-based genome editing has been widely implemented in various cell types. In silico single guide RNA (sgRNA) design is a key step for successful gene editing using the CRISPR system, and continuing efforts are aimed at refining in silico sgRNA design with high on-target efficacy and reduced off-target effects. Many sgRNA design tools are available, but careful assessments of their application scenarios and performance benchmarks across different types of genome-editing data are needed.