Significance of Sequence Features in Classification of Protein-Protein Interactions Using Machine Learning.

Protein J

Machine Intelligence Research Lab, Department of Computer Science, University of Kerala, Thiruvananthapuram, Kerala, India.

Published: February 2024

Protein-protein interactions are crucial for the entry of viruses into the cell. Understanding the mechanism of these interactions is essential for studying human-virus association, viral infections, and antiviral responses, and for developing new biologics and drug candidates. Experimental methods to analyze human-virus protein-protein interactions based on protein sequence data are time-consuming and labor-intensive, so machine learning models are being developed to predict interactions and determine large-scale interactomes between species. The present work highlights the importance of sequence features in classifying interacting and non-interacting proteins from protein sequence data. High-dimensional amino acid sequence features, such as Amino Acid Composition (AAC), Dipeptide Composition (DPC), Grouped Amino Acid Composition (GAAC), and Pseudo-Amino Acid Composition (PAAC), among others, were extracted. Following feature extraction, three datasets were created: Dataset 1 contains all of the extracted features, while Datasets 2 and 3 contain the most relevant features obtained through dimensionality reduction. To analyze the importance of high-dimensional features and their participation in protein-protein interactions, a random forest classifier was trained on the three datasets. With dimensionality reduction, the model exhibited exceptional accuracy, indicating that dimensionality reduction fails to capture the complexity of interactions and the underlying relationships between human and viral proteins. Retaining the high-dimensional features makes it possible to capture all the characteristics of protein-protein interactions that resemble host-pathogen associations, leading to the development of biologically meaningful models. Our proposed approach is a more realistic and comprehensive classification model, leading to deeper insights and better applications in virology and drug development.
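The composition features named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `pair_features` helper, and the five-class residue grouping used for GAAC are assumptions made for illustration, and PAAC is omitted because it needs physicochemical property tables that the abstract does not specify.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed five-class grouping for GAAC (a common convention, not taken from the paper).
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}

# Index of each of the 400 possible dipeptides.
DIPEPTIDE_INDEX = {
    a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))
}


def aac(seq):
    """Amino Acid Composition: fraction of each standard residue (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide Composition: fraction of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    counts = [0] * len(DIPEPTIDE_INDEX)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in DIPEPTIDE_INDEX:  # skip pairs with non-standard residues
            counts[DIPEPTIDE_INDEX[pair]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def gaac(seq):
    """Grouped Amino Acid Composition: fraction of residues per group (5 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [
        sum(r in members for r in residues) / n if n else 0.0
        for members in GROUPS.values()
    ]


def pair_features(human_seq, virus_seq):
    """Hypothetical helper: concatenate the features of both proteins in a pair."""
    features = []
    for seq in (human_seq, virus_seq):
        features.extend(aac(seq) + dpc(seq) + gaac(seq))
    return features
```

Under these assumptions each protein yields 425 values (20 + 400 + 5), so a human-virus pair is described by an 850-dimensional vector. A vector like this could then be passed to a classifier such as a random forest, either in full or after dimensionality reduction.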

Source
http://dx.doi.org/10.1007/s10930-023-10168-8

Publication Analysis

Top Keywords

protein-protein interactions: 20
sequence features: 12
amino acid: 12
acid composition: 12
dimensionality reduction: 12
interactions: 8
machine learning: 8
protein sequence: 8
sequence data: 8
three datasets: 8

Similar Publications

Aberrant promoter methylation of CTHRC1 gene and its clinicopathological characteristics in head and neck cancer.

Int J Oral Maxillofac Surg

January 2025

Molecular Biology Laboratory, Centre for Cellular and Molecular Research, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University, Chennai, India.

Head and neck squamous cell carcinoma (HNSCC) is genetically complex and difficult to treat. Detection in the early stage is challenging, leading to diagnosis at advanced stages with limited treatment options. This study examined the collagen triple helix repeat containing 1 gene (CTHRC1) as a potential biomarker and therapeutic target in HNSCC.

Exploring the mechanism and drug candidates of alveolar echinococcosis affecting liver fibrosis through analysis of existing microarray data.

Acta Trop

January 2025

Department of Anesthesiology, the First Affiliated Hospital of Xinjiang Medical University, No. 137, South Liyushan Road, Xinshi District, Urumqi, Xinjiang, 830054, China; Xinjiang Perioperative Organ Protection Laboratory, No. 137, South Liyushan Road, Xinshi District, Urumqi, Xinjiang, 830054, China.

Echinococcosis, a zoonotic disease, significantly impacts the liver, with alveolar echinococcosis (AE) often leading to liver fibrosis and, in severe cases, cirrhosis. However, the molecular mechanisms by which AE infection promotes liver fibrosis remain incompletely understood. This study utilized bioinformatic analysis of existing microarray data to explore the shared mechanisms between AE and liver fibrosis and to identify potential therapeutic drug candidates.

Recombinant Antibodies Inhibit Enzymatic Activity of the E3 Ubiquitin Ligase CHIP via Multiple Mechanisms.

J Biol Chem

January 2025

Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94158, USA; Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA.

Carboxyl-terminus of Hsp70-Interacting Protein (CHIP) is an E3 ubiquitin ligase that marks misfolded substrates for degradation. Hyper-activation of CHIP has been implicated in multiple diseases, including cystic fibrosis and cancer, suggesting that it may be a potential drug target. However, there are few tools available for exploring this possibility.

Specific Rosetta-based protein-peptide prediction protocol allows the design of novel cholinesterase inhibitor peptides.

Bioorg Chem

January 2025

Laboratorio de Peptidos Bioactivos, Department of Organic Chemistry, Faculty of Biochemistry and Biological Sciences, National University of the Littoral, Ciudad Universitaria UNL, 3000 Santa Fe, Argentina; National Scientific and Technical Research Council (CONICET), Ministry of Science, Technology and Innovation, Godoy Cruz 2290, Ciudad de Buenos Aires, Argentina.

The search for novel cholinesterase inhibitors is essential for advancing treatments for neurodegenerative disorders such as Alzheimer's disease (AD). In this study, we employed the Rosetta pepspec module, originally developed for designing peptides targeting protein-protein interactions, to design de novo peptides targeting the peripheral aromatic site (PAS) of acetylcholinesterase (AChE) and butyrylcholinesterase (BChE). A total of nine peptides were designed for human AChE (hAChE), T.

Clinical evidence increasingly suggests that traditional treatments for dysfunctional uterine bleeding (DUB) have limited success. In this study, blood samples from 10 DUB patients and 10 healthy controls were collected for transcriptome sequencing. Then, the differentially expressed genes (DEGs) were screened and crossed with the DUB-related module genes to obtain the target genes.
