The associations between cancer and bacteria/fungi have been extensively studied, but the implications of cancer-associated viruses have not been thoroughly examined. In this study, we comprehensively characterized the cancer virome of tissue samples across 31 cancer types, as well as blood samples from 23 cancer types. Our findings demonstrated the presence of viral DNA at low abundances in both tissue and blood across major human cancers, with significant differences in viral community composition observed among various cancer types.
DNase I hypersensitive sites (DHSs) are specific genomic regions that are critical for detecting and understanding cis-regulatory elements. Although many methods have been developed to detect DHSs, a large gap remains in practice. We present a deep learning-based language model for predicting DHSs, named LangMoDHS.
With the development of high-throughput sequencing technology, the scale of single-cell RNA sequencing (scRNA-seq) data has surged. These data are typically high-dimensional, with high dropout noise and high sparsity. Therefore, gene imputation and cell clustering analysis of scRNA-seq data are increasingly important.
Transcription factors (TFs) are typical regulators of gene expression and play versatile roles in cellular processes. Because detecting TFs with physical methods is time-consuming, costly, and labor-intensive, a computational method for detecting them is desirable. Here, we present a capsule network-based method for identifying TFs.
Breast cancer patients often have recurrence and metastasis after surgery. Predicting the risk of recurrence and metastasis for a breast cancer patient is essential for the development of precision treatment. In this study, we proposed a novel multi-modal deep learning prediction model by integrating hematoxylin & eosin (H&E)-stained histopathological images, clinical information and gene expression data.
Deep learning technology is changing the landscape of cybersecurity research, especially the study of large amounts of data. With the rapid growth in the volume of malware, developing an efficient and reliable method for classifying malware has become a research priority. In this paper, a new method, BIR-CNN, is proposed to classify Android malware.
Enhancers are short DNA segments that play a key role in biological processes, such as accelerating transcription of target genes. Because an enhancer can reside anywhere in a genome sequence, it is difficult to identify enhancers precisely. We present a bi-directional long short-term memory (Bi-LSTM) and attention-based deep learning method (Enhancer-LSTMAtt) for enhancer recognition.
Pharmaceuticals (Basel)
June 2022
Bioactive peptides are typically small functional peptides with 2-20 amino acid residues and play versatile roles in metabolic and biological processes. Because bioactive peptides are multi-functional, it is very challenging to detect all their functions accurately and simultaneously. We proposed a convolutional neural network (CNN) and bi-directional long short-term memory (Bi-LSTM)-based deep learning method (called MPMABP) for recognizing multi-activities of bioactive peptides.
Comput Biol Chem
June 2022
The embryonic stem cell (ESC) has the capacity to self-renew and maintain pluripotency, while continuously offering a source of various differentiated cell types. The fate decision between remaining in the ground state and transitioning to a differentiated state can be read out by the regulatory network of key transcription factors (TFs). However, its underlying mechanism remains to be fully elucidated.
Studies have found that long non-coding RNAs (lncRNAs) play important roles in many human biological processes, and it is critical to explore potential lncRNA-disease associations, especially cancer-associated lncRNAs. However, traditional biological experiments are costly and time-consuming, so it is of great significance to develop effective computational models. We developed a random walk algorithm with restart on multiplex and heterogeneous networks of lncRNAs and diseases to predict lncRNA-disease associations (MHRWRLDA).
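The core idea of a random walk with restart can be sketched on a single small network. This is a minimal illustration only, not the MHRWRLDA method: it ignores the multiplex and heterogeneous layers, and the function name, the restart probability of 0.7, and the four-node toy network are all assumptions made for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-12, max_iter=10000):
    """Return steady-state visiting probabilities for a walker that restarts at `seeds`.

    `adjacency` is a square list-of-lists of non-negative edge weights (hypothetical input).
    """
    n = len(adjacency)
    # Column-normalise: column j holds the transition probabilities out of node j.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Restart distribution: uniform over the seed nodes.
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = list(p0)
    for _ in range(max_iter):
        # One step: follow an edge with probability (1 - restart), else jump back to a seed.
        nxt = [
            (1.0 - restart) * sum(transition[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(nxt, p)) < tol
        p = nxt
        if converged:
            break
    return p


# Toy path network 0 - 1 - 2 - 3 (assumed data), with node 0 as the only seed.
toy_network = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(toy_network, seeds={0})
print(scores)
```

Nodes closer to the seed receive higher scores, which is the property used to rank candidate associations in approaches of this kind.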
Comb Chem High Throughput Screen
March 2022
Aim and Objective: Comparing the similarity of biological sequences is an important task in bioinformatics. Methods for comparing sequence similarity fall into two classes: sequence alignment methods and alignment-free methods. The graphical representation of biological sequences is a kind of alignment-free method that provides a tool for analyzing and visualizing biological sequences.
Spectrochim Acta A Mol Biomol Spectrosc
November 2020
Two fluorescent probes were designed by connecting indomethacin to coumarin through different linkers. The introduction of indomethacin quenched the fluorescence of the coumarin-based probes with apparent red-shifts in the absorption and emission maxima, probably due to photoinduced electron transfer (PET) from indomethacin to the fluorophore and the formation of a folded conformation. The addition of human serum albumin (HSA) triggered about 40-fold fluorescence enhancements of ADC-IMC-2 and ADC-IMC-6 with 85 nm blue-shifts.
Residue-residue contact prediction has become an increasingly important tool for modeling the three-dimensional structure of a protein when no homologous structure is available. The ultradeep residual neural network (ResNet) has become the most popular method for making contact predictions because it captures the contextual information between residues. In this paper, we propose a novel deep neural network framework for contact prediction which combines ResNet and DenseNet.
Comput Math Methods Med
March 2021
The type III secretion system (T3SS) is a special protein delivery system in Gram-negative bacteria which delivers T3SS-secreted effectors (T3SEs) to host cells causing pathological changes. Numerous experiments have verified that T3SEs play important roles in many biological activities and in host-pathogen interactions. Accurate identification of T3SEs is therefore essential to help understand the pathogenic mechanism of bacteria; however, many existing biological experimental methods are time-consuming and expensive.
Background: Predicting the subcellular localization of proteins is an important component of bioinformatics, with great importance for drug design and other applications. A multitude of computational tools for protein subcellular localization have been developed in recent decades; however, existing methods differ in the protein sequence representation techniques and classification algorithms adopted.
Results: In this paper, we first introduce two kinds of protein sequence encoding schemes: dipeptide information with space and gapped k-mer information.
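An encoding of this general kind can be sketched as a gapped dipeptide composition, in which residue pairs separated by a fixed number of positions are counted and normalized. The excerpt does not give the paper's exact definitions, so the function below is an assumed, commonly used formulation; the name `gapped_dipeptide_composition` and the `gap` parameter are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in alphabetical order
PAIRS = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]  # fixed order of the 400 features


def gapped_dipeptide_composition(sequence, gap=0):
    """Return a 400-dimensional vector of frequencies of residue pairs
    separated by `gap` intervening positions (gap=0 gives ordinary dipeptides)."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(sequence) - gap - 1):
        pair = sequence[i] + sequence[i + gap + 1]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
            total += 1
    return [counts[pair] / total if total else 0.0 for pair in PAIRS]


# Toy sequence (assumed data): with gap=1 the only pairs are A.A and C.C.
vector = gapped_dipeptide_composition("ACAC", gap=1)
print(vector[PAIRS.index("AA")], vector[PAIRS.index("CC")])
```

Concatenating the vectors for several gap values yields a fixed-length representation regardless of sequence length, which is what a downstream classifier needs.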
Background: Some studies have shown that Human Papillomavirus (HPV) is strongly associated with cervical cancer. Cervical cancer remains the fourth most common cancer affecting women worldwide. Thus, it is both challenging and essential to detect risk types of human papillomaviruses.
As a common malignant tumor disease, thyroid cancer lacks effective preventive and therapeutic drugs. Thus, it is crucial to provide an effective drug selection method for thyroid cancer patients. The connectivity map (CMAP) project provides an experimentally validated strategy to repurpose and optimize cancer drugs, the rationale behind which is to select drugs to reverse the gene expression variations induced by cancer.
In recent years, it has become increasingly clear that long noncoding RNAs (lncRNAs) play critical roles in many biological processes associated with human diseases. Inferring potential lncRNA-disease associations is essential to reveal the secrets behind diseases, develop novel drugs, and optimize personalized treatments. However, biological experiments to validate lncRNA-disease associations are very time-consuming and costly.
A near-infrared fluorescent probe YSP for sulfite was synthesized, in which a julolidine fused with a pyran-2-one was employed as the fluorophore and the vinyl activated by an indole salt as the receptor. The introduction of julolidine and indole salt strengthens the electron push-pull effect of the probe and allows it to absorb (597 nm) and emit (681 nm) in the red wavelength region. The addition of sulfite to the C=C bond led to prominent blue-shifts in both absorption (171 nm) and emission (165 nm) spectra, which made it possible for colorimetric and ratiometric fluorescent detection of sulfite.
Evol Bioinform Online
June 2018
In this article, we propose a 3-dimensional graphical representation of protein sequences based on 10 physicochemical properties of 20 amino acids and the BLOSUM62 matrix. It contains evolutionary information and provides intuitive visualization. To further analyze the similarity of proteins, we extract a specific vector from the graphical representation curve.
Biomed Res Int
September 2018
High-accuracy alignment of sequences with disease information contributes to disease treatment and prevention. The results of multiple sequence alignment depend on the parameters of the objective function, including gap open penalties (GOP), gap extension penalties (GEP), and substitution matrix (SM). First, theoretical parameter formulas relating to GOP, GEP, and SM are derived by combining the length, number, and identity of the unaligned sequences.
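How these three parameters enter an objective function can be illustrated by scoring one pairwise alignment with affine gap penalties. This is a sketch under stated assumptions, not the formulas derived in the paper: a gap of length L is assumed to cost GOP + (L - 1) × GEP, and a toy match/mismatch function stands in for a real substitution matrix. All names and numbers are hypothetical.

```python
def affine_gap_alignment_score(row_a, row_b, substitution, gap_open, gap_extend):
    """Score two aligned rows (equal length, '-' marks a gap).

    Assumed convention: the first position of a gap costs `gap_open`,
    every further position of the same gap costs `gap_extend`.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0.0
    open_gap_side = None  # which row the currently open gap belongs to, if any
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue  # a column of two gaps carries no pairwise information
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            score -= gap_extend if open_gap_side == side else gap_open
            open_gap_side = side
        else:
            score += substitution(a, b)
            open_gap_side = None
    return score


def toy_substitution(a, b):
    """Stand-in for a substitution matrix: +1 for a match, -1 for a mismatch."""
    return 1.0 if a == b else -1.0


# Four matches (+4) and one gap of length two (-5 - 1) give a total of -2.
print(affine_gap_alignment_score("AC--GT", "ACTTGT", toy_substitution, gap_open=5, gap_extend=1))
```

A multiple-alignment objective such as a sum-of-pairs score applies the same calculation to every pair of rows, which is why the choice of GOP, GEP, and SM shapes the final alignment.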
Comb Chem High Throughput Screen
April 2019
Aim and Objective: The rapid increase in the amount of available protein sequence data creates an urgent need for novel computational algorithms to analyze and compare these sequences. This study was undertaken to develop an efficient computational approach for the timely encoding of protein sequences and the extraction of their hidden information.
Methods: Based on two physicochemical properties of amino acids, a protein primary sequence was converted into a three-letter sequence, and then a graph without loops and multiple edges and its geometric line adjacency matrix were obtained.
RNAs may act as competing endogenous RNAs (ceRNAs), a critical mechanism in determining gene expression regulation in many cancers. However, the roles of ceRNAs in thyroid carcinoma remain elusive. In this study, we have developed a novel pipeline called Molecular Network-based Identification of ceRNA (MNIceRNA) to identify ceRNAs in thyroid carcinoma.
Objectives: In this paper, a high-quality sequence encoding scheme is proposed for predicting the subcellular location of apoptosis proteins.
Methods: In the proposed methodology, novel evolutionary conservation information is introduced to represent protein sequences. Meanwhile, based on the golden section proportion, the position-specific scoring matrix (PSSM) is divided into several blocks.
Motivation: Low-rank matrix completion has been demonstrated to be powerful in predicting antigenic distances among influenza viruses and vaccines from a partially revealed hemagglutination inhibition table. Meanwhile, influenza hemagglutinin (HA) protein sequences are also effective in inferring antigenic distances. Thus, it is natural to integrate HA protein sequence information into the low-rank matrix completion model to help infer influenza antigenicity, which is critical to influenza vaccine development.
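The idea of completing a partially revealed table under a low-rank assumption can be sketched with the simplest possible case, a rank-1 fit by alternating least squares. This is an illustration only and not the authors' model: it uses no HA sequence information, the rank is fixed at one, and the function name and the 3 × 3 toy table are assumptions.

```python
def complete_rank_one(table, iterations=100):
    """Fill the None entries of `table` with a rank-1 fit u[i] * v[j],
    estimated by alternating least squares on the observed entries only."""
    n_rows, n_cols = len(table), len(table[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iterations):
        # Update each row factor with the column factors held fixed.
        for i in range(n_rows):
            observed = [j for j in range(n_cols) if table[i][j] is not None]
            denom = sum(v[j] ** 2 for j in observed)
            if denom:
                u[i] = sum(table[i][j] * v[j] for j in observed) / denom
        # Update each column factor with the row factors held fixed.
        for j in range(n_cols):
            observed = [i for i in range(n_rows) if table[i][j] is not None]
            denom = sum(u[i] ** 2 for i in observed)
            if denom:
                v[j] = sum(table[i][j] * u[i] for i in observed) / denom
    # Keep observed values as they are; predict only the missing ones.
    return [
        [table[i][j] if table[i][j] is not None else u[i] * v[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy table (assumed data) generated from the outer product of [1, 2, 3] with itself,
# with two entries hidden.
partial = [
    [1.0, 2.0, None],
    [2.0, 4.0, 6.0],
    [None, 6.0, 9.0],
]
for row in complete_rank_one(partial):
    print(row)
```

Practical matrix completion methods allow a higher rank and add regularization; side information such as sequence similarity would enter as extra terms in the objective.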