Background: Protein interactions are determined by protein sequences and affect cell-cycle regulation, signal transduction, and metabolism, which makes them highly significant to modern proteomics research. Despite advances in experimental technology, determining protein-protein interactions (PPIs) remains expensive, laborious, and time-consuming, and there is strong demand for effective bioinformatics approaches to identify potential PPIs. Given the large amount of available PPI data, high-performance processors can be used to strengthen deep learning methods and predict PPIs directly from protein sequences.
Results: We propose the Sequence-Statistics-Content (SSC) protein sequence encoding format, which extracts information from the original sequence to further improve the performance of convolutional neural networks. The original protein sequences are encoded in a three-channel format by adding statistical information (the second channel) and bigram encoding information (the third channel), which increases the number of distinctive sequence features available to the deep learning model. On PPI prediction tasks, a 2D convolutional neural network (2D CNN) with SSC encoding achieves better results than a 1D CNN with one-hot encoding. Independent validation on new interactions from the HIPPIE database (version 2.1, published on July 18, 2017) and validation of directly predicted results with a molecular docking tool indicate the effectiveness of the proposed encoding in the CNN model.
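The three-channel layout described above can be illustrated with a minimal sketch. The abstract does not define the exact contents of the statistics and bigram channels, so everything here is an assumption made for illustration and not the authors' implementation: the function name `encode_three_channel`, the 20-letter alphabet, the use of per-sequence residue frequency for the second channel, and the use of the following residue's column for the third channel are all hypothetical. The published repository should be consulted for the actual SSC definition.

```python
# Hypothetical three-channel encoding of a protein sequence.
# NOT the authors' SSC implementation; channel definitions are assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_three_channel(seq, max_len):
    """Return a nested list of shape [3][max_len][20].

    Channel 0: one-hot encoding of the residue at each position.
    Channel 1 (assumed): relative frequency of that residue within the
               sequence, stored in the residue's column.
    Channel 2 (assumed): one-hot encoding of the next residue, so each
               row together with channel 0 describes one bigram.
    Positions beyond the sequence length stay zero (padding);
    non-standard residues are skipped.
    """
    seq = seq.upper()[:max_len]
    known = [aa for aa in seq if aa in AA_INDEX]
    counts = {aa: known.count(aa) for aa in set(known)}
    total = len(known)
    width = len(AMINO_ACIDS)

    channels = [[[0.0] * width for _ in range(max_len)] for _ in range(3)]
    for pos, aa in enumerate(seq):
        if aa not in AA_INDEX:
            continue
        col = AA_INDEX[aa]
        channels[0][pos][col] = 1.0
        channels[1][pos][col] = counts[aa] / total
        if pos + 1 < len(seq) and seq[pos + 1] in AA_INDEX:
            channels[2][pos][AA_INDEX[seq[pos + 1]]] = 1.0
    return channels


if __name__ == "__main__":
    encoded = encode_three_channel("ACCA", max_len=6)
    print(len(encoded), len(encoded[0]), len(encoded[0][0]))
```

Because the result is a stack of equally sized grids, it can be fed to a 2D CNN in the same way as a multi-channel image, whereas plain one-hot encoding yields only the first channel.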
Conclusion: The proposed protein sequence encoding method improves the capability of the CNN model on protein sequence-related tasks and may also enhance other machine learning or deep learning methods. Prediction accuracy and molecular docking validation showed considerable improvement over the existing one-hot encoding method, indicating that SSC encoding may be useful for protein sequence-related tasks. The source code of the proposed methods is freely available for academic research at https://github.com/wangy496/SSC-format/ .
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8042949 | PMC |
| http://dx.doi.org/10.1186/s12859-021-04111-w | DOI Listing |