N-7methylguanosine (m7G) modification plays a crucial role in various biological processes and is closely associated with the development and progression of many cancers. Accurate identification of m7G modification sites is essential for understanding their regulatory mechanisms and advancing cancer therapy. Previous studies often suffered from insufficient research data, underutilization of motif information, and lack of interpretability.
View Article and Find Full Text PDFMotivation: Diabetes is a chronic metabolic disorder that has been a major cause of blindness, kidney failure, heart attacks, stroke, and lower limb amputation across the world. To alleviate the impact of diabetes, researchers have developed the next generation of anti-diabetic drugs, known as dipeptidyl peptidase IV inhibitory peptides (DPP-IV-IPs). However, the discovery of these promising drugs has been restricted due to the lack of effective peptide-mining tools.
View Article and Find Full Text PDFMolecular representation learning (MRL) is a fundamental task for drug discovery. However, previous deep-learning (DL) methods focus excessively on learning robust inner-molecular representations by mask-dominated pretraining frameworks, neglecting abundant chemical reactivity molecular relationships that have been demonstrated as the determining factor for various molecular property prediction tasks. Here, we present MolCAP to promote MRL, a graph-pretraining Transformer based on chemical reactivity (IMR) knowledge with prompted finetuning.
View Article and Find Full Text PDFNcRNA-encoded small peptides (ncPEPs) have recently emerged as promising targets and biomarkers for cancer immunotherapy. Therefore, identifying cancer-associated ncPEPs is crucial for cancer research. In this work, we propose CoraL, a novel supervised contrastive meta-learning framework for predicting cancer-associated ncPEPs.
View Article and Find Full Text PDFThe promoter region, positioned proximal to the transcription start sites, exerts control over the initiation of gene transcription by modulating the interaction with RNA polymerase. Consequently, the accurate recognition of promoter regions represents a critical focus within the bioinformatics domain. Although some methods leveraging pre-trained language models (PLMs) for promoter prediction have been proposed, the full potential of such PLMs remains largely untapped.
View Article and Find Full Text PDFRecent research has highlighted the pivotal role of RNA post-transcriptional modifications in the regulation of RNA expression and function. Accurate identification of RNA modification sites is important for understanding RNA function. In this study, we propose a novel RNA modification prediction method, namely Rm-LR, which leverages a long-range-based deep learning approach to accurately predict multiple types of RNA modifications using RNA sequences only.
View Article and Find Full Text PDFAnticancer peptides (ACPs) recently have been receiving increasing attention in cancer therapy due to their low consumption, few adverse side effects, and easy accessibility. However, it remains a great challenge to identify anticancer peptides via experimental approaches, requiring expensive and time-consuming experimental studies. In addition, traditional machine-learning-based methods are proposed for ACP prediction mainly depending on hand-crafted feature engineering, which normally achieves low prediction performance.
View Article and Find Full Text PDFMotivation: Plant Small Secreted Peptides (SSPs) play an important role in plant growth, development, and plant-microbe interactions. Therefore, the identification of SSPs is essential for revealing the functional mechanisms. Over the last few decades, machine learning-based methods have been developed, accelerating the discovery of SSPs to some extent.
View Article and Find Full Text PDFHere, we present DeepBIO, the first-of-its-kind automated and interpretable deep-learning platform for high-throughput biological sequence functional analysis. DeepBIO is a one-stop-shop web service that enables researchers to develop new deep-learning architectures to answer any biological question. Specifically, given any biological sequence data, DeepBIO supports a total of 42 state-of-the-art deep-learning algorithms for model training, comparison, optimization and evaluation in a fully automated pipeline.
View Article and Find Full Text PDFAccurately predicting peptide secondary structures remains a challenging task due to the lack of discriminative information in short peptides. In this study, PHAT is proposed, a deep hypergraph learning framework for the prediction of peptide secondary structures and the exploration of downstream tasks. The framework includes a novel interpretable deep hypergraph multi-head attention network that uses residue-based reasoning for structure prediction.
View Article and Find Full Text PDFBackground: Cell-penetrating peptides (CPPs) have received considerable attention as a means of transporting pharmacologically active molecules into living cells without damaging the cell membrane, and thus hold great promise as future therapeutics. Recently, several machine learning-based algorithms have been proposed for predicting CPPs. However, most existing predictive methods do not consider the agreement (disagreement) between similar (dissimilar) CPPs and depend heavily on expert knowledge-based handcrafted features.
View Article and Find Full Text PDFSummary: Identifying the protein-peptide binding residues is fundamentally important to understand the mechanisms of protein functions and explore drug discovery. Although several computational methods have been developed, most of them highly rely on third-party tools or complex data preprocessing for feature design, easily resulting in low computational efficacy and suffering from low predictive performance. To address the limitations, we propose PepBCL, a novel BERT (Bidirectional Encoder Representation from Transformers) -based contrastive learning framework to predict the protein-peptide binding residues based on protein sequences only.
View Article and Find Full Text PDFDNA N4-methylcytosine (4mC) is an important DNA modification and plays a crucial role in a variety of biological processes. Accurate 4mC site identification is fundamental to improving the understanding of 4mC biological functions and mechanisms. However, lots of identification approaches are limited to traditional machine learning, which leads to weak learning ability and a complex feature extraction process.
View Article and Find Full Text PDFRecently, machine learning methods have been developed to identify various peptide bio-activities. However, due to the lack of experimentally validated peptides, machine learning methods cannot provide a sufficiently trained model, easily resulting in poor generalizability. Furthermore, there is no generic computational framework to predict the bioactivities of different peptides.
View Article and Find Full Text PDFMotivation: DNA methylation plays an important role in epigenetic modification, the occurrence, and the development of diseases. Therefore, identification of DNA methylation sites is critical for better understanding and revealing their functional mechanisms. To date, several machine learning and deep learning methods have been developed for the prediction of different DNA methylation types.
View Article and Find Full Text PDF