Determining correlations between molecules at various levels is an important topic in molecular biology. Large language models have demonstrated a remarkable ability to capture correlations from large amounts of data in the field of natural language processing as well as image generation, and correlations captured from data using large language models can also be applicable to solving a wide range of specific tasks, hence large language models are also referred to as foundation models. The massive amount of data that exists in the field of molecular biology provides an excellent basis for the development of foundation models, and the recent emergence of foundation models in the field of molecular biology has really pushed the entire field forward.
View Article and Find Full Text PDFBackground: The precise characterization of individual tumors and immune microenvironments using transcriptome sequencing has provided a great opportunity for successful personalized cancer treatment. However, the cancer treatment response is often characterized by in vitro assays or bulk transcriptomes that neglect the heterogeneity of malignant tumors in vivo and the immune microenvironment, motivating the need to use single-cell transcriptomes for personalized cancer treatment.
Methods: Here, we present comboSC, a computational proof-of-concept study to explore the feasibility of personalized cancer combination therapy optimization using single-cell transcriptomes.
The powerful CRISPR genome editing system is hindered by its off-target effects, and existing computational tools achieved limited performance in genome-wide off-target prediction due to the lack of deep understanding of the CRISPR molecular mechanism. In this study, we propose to incorporate molecular dynamics (MD) simulations in the computational analysis of CRISPR system, and present CRISOT, an integrated tool suite containing four related modules, i.e.
View Article and Find Full Text PDFGenomics Proteomics Bioinformatics
December 2023
Base editing technology is being increasingly applied in genome engineering, but the current strategy for designing guide RNAs (gRNAs) relies substantially on empirical experience rather than a dependable and efficient in silico design. Furthermore, the pleiotropic effect of base editing on disease treatment remains unexplored, which prevents its further clinical usage. Here, we presented BExplorer, an integrated and comprehensive computational pipeline to optimize the design of gRNAs for 26 existing types of base editors in silico.
View Article and Find Full Text PDFVarious computational methods have been developed for quantitative modeling of organic chemical reactions; however, the lack of universality as well as the requirement of large amounts of experimental data limit their broad applications. Here, we present DeepReac+, an efficient and universal computational framework for prediction of chemical reaction outcomes and identification of optimal reaction conditions based on deep active learning. Under this framework, DeepReac is designed as a graph-neural-network-based model, which directly takes 2D molecular structures as inputs and automatically adapts to different prediction tasks.
View Article and Find Full Text PDFMotivation: Quantitative structure-activity relationship (QSAR) analysis is commonly used in drug discovery. Collaborations among pharmaceutical institutions can lead to a better performance in QSAR prediction, however, intellectual property and related financial interests remain substantially hindering inter-institutional collaborations in QSAR modeling for drug discovery.
Results: For the first time, we verified the feasibility of applying the horizontal federated learning (HFL), which is a recently developed collaborative and privacy-preserving learning framework to perform QSAR analysis.
Nucleic Acids Res
November 2020
Systematic evaluation of genome-wide Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) off-target profiles is a fundamental step for the successful application of the CRISPR system to clinical therapies. Many experimental techniques and in silico tools have been proposed for detecting and predicting genome-wide CRISPR off-target profiles. These techniques and tools, however, have not been systematically benchmarked.
View Article and Find Full Text PDFEfficient single-cell assignment without prior marker gene annotations is essential for single-cell sequencing data analysis. Current methods, however, have limited effectiveness for distinct single-cell assignment. They failed to achieve a well-generalized performance in different tasks because of the inherent heterogeneity of different single-cell sequencing datasets and different single-cell types.
View Article and Find Full Text PDFBackground: Cancer neoantigens are expressed only in cancer cells and presented on the tumor cell surface in complex with major histocompatibility complex (MHC) class I proteins for recognition by cytotoxic T cells. Accurate and rapid identification of neoantigens play a pivotal role in cancer immunotherapy. Although several in silico tools for neoantigen prediction have been presented, limitations of these tools exist.
View Article and Find Full Text PDFFor genome-wide CRISPR off-target cleavage sites (OTS) prediction, an important issue is data imbalance-the number of true OTS recognized by whole-genome off-target detection techniques is much smaller than that of all possible nucleotide mismatch loci, making the training of machine learning model very challenging. Therefore, computational models proposed for OTS prediction and scoring should be carefully designed and properly evaluated in order to avoid bias. In our study, two tools are taken as examples to further emphasize the data imbalance issue in CRISPR off-target prediction to achieve better sensitivity and specificity for optimized CRISPR gene editing.
View Article and Find Full Text PDFA major challenge for effective application of CRISPR systems is to accurately predict the single guide RNA (sgRNA) on-target knockout efficacy and off-target profile, which would facilitate the optimized design of sgRNAs with high sensitivity and specificity. Here we present DeepCRISPR, a comprehensive computational platform to unify sgRNA on-target and off-target site prediction into one framework with deep learning, surpassing available state-of-the-art in silico tools. In addition, DeepCRISPR fully automates the identification of sequence and epigenetic features that may affect sgRNA knockout efficacy in a data-driven manner.
View Article and Find Full Text PDFCRISPR-based genome editing has been widely implemented in various cell types. In-silico single guide RNA (sgRNA) design is a key step for successful gene editing using CRISPR system. Continuing efforts are made to refine in-silico sgRNA design with high on-target efficacy and reduced off-target effects.
View Article and Find Full Text PDFCRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-based gene editing has been widely implemented in various cell types and organisms. A major challenge in the effective application of the CRISPR system is the need to design highly efficient single-guide RNA (sgRNA) with minimal off-target cleavage. Several tools are available for sgRNA design, while limited tools were compared.
View Article and Find Full Text PDFBackground: Deciphering taxonomical structures based on high dimensional sequencing data is still challenging in metagenomics study. Moreover, the common workflow processed in this field fails to identify microbial communities and their effect on a specific disease status. Even the relationships and interactions between different bacteria in a microbial community keep unknown.
View Article and Find Full Text PDFTrends Biotechnol
January 2017
CRISPR-based genome editing has been widely implemented in various cell types. In silico single guide RNA (sgRNA) design is a key step for successful gene editing using the CRISPR system, and continuing efforts are aimed at refining in silico sgRNA design with high on-target efficacy and reduced off-target effects. Many sgRNA design tools are available, but careful assessments of their application scenarios and performance benchmarks across different types of genome-editing data are needed.
View Article and Find Full Text PDF