Peptide hormones are genome-encoded signal transduction molecules that play essential roles in multicellular organisms, and their dysregulation can lead to various health problems. In this study, we propose a method for predicting hormonal peptides with high accuracy. The dataset used to train, test, and evaluate our models comprised 1174 hormonal and 1174 non-hormonal peptide sequences. We first developed similarity-based methods using BLAST and MERCI. Although these methods predicted correctly with high probability, their coverage was limited: some queries returned no hits, so predictions were available for only a subset of sequences. To overcome these limitations, we developed machine learning and deep learning models. Our logistic regression model achieved a maximum AUROC of 0.93 and an accuracy of 86% on an independent validation dataset. To combine the strengths of the similarity-based and machine learning approaches, we developed an ensemble method that achieved an AUROC of 0.96, an accuracy of 89.79%, and a Matthews correlation coefficient (MCC) of 0.8 on the validation set. To help researchers predict and design hormone peptides, we developed a web server called HOPPred, which also identifies hormone-associated motifs within hormone peptides. The server can be accessed at https://webs.iiitd.edu.in/raghava/hoppred/.
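
The abstract does not state how the ensemble merges similarity evidence with the machine learning score, so the following is only a minimal sketch under an assumed rule: the model's probability is shifted up or down by a fixed weight when a BLAST or motif hit is present. The function names, the labels "hormone" and "non-hormone", and the weight of 0.5 are all hypothetical. The MCC and accuracy helpers use the standard confusion-matrix definitions.

```python
import math
from typing import Optional


def ensemble_score(ml_prob: float,
                   blast_hit: Optional[str] = None,
                   motif_hit: bool = False,
                   weight: float = 0.5) -> float:
    """Adjust an ML probability with similarity evidence.

    Assumed rule (not taken from the paper): a BLAST hit against a known
    hormone adds `weight`, a hit against a non-hormone subtracts it, and a
    hormone-associated motif hit adds `weight`. With no hits, the ML
    probability is returned unchanged, so every sequence still gets a score.
    """
    score = ml_prob
    if blast_hit == "hormone":
        score += weight
    elif blast_hit == "non-hormone":
        score -= weight
    if motif_hit:
        score += weight
    return score


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the study.
    score = ensemble_score(0.4, blast_hit="hormone")
    print("hormonal" if score >= 0.5 else "non-hormonal")
```

The fallback to the unmodified probability is what addresses the coverage gap of similarity-only methods: a sequence with no BLAST or motif hit is still classified by the model alone.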

Source: http://dx.doi.org/10.1002/pmic.202400004

Similar Publications

Major advances in protein function assignment by remote homolog detection with protein language models - A review.

Curr Opin Struct Biol

January 2025

Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA; Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA.

There is an ever-increasing need for accurate and efficient methods to identify protein homologs. Traditionally, sequence similarity-based methods have dominated protein homolog identification for function identification, but these struggle when the sequence identity between the pairs is low. Recently, transformer architecture-based deep learning methods have achieved breakthrough performances in many fields.

MPEMDA: A Multi-Similarity Integration Approach with Pre-completion and Error Correction for Predicting Microbe-Drug Associations.

Methods

January 2025

School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China.

Exploring the associations between microbes and drugs offers valuable insights into their underlying mechanisms. Traditional wet lab experiments, while reliable, are often time-consuming and labor-intensive, making computational approaches an attractive alternative. Existing similarity-based machine learning models for predicting microbe-drug associations typically rely on integrated similarities as input, neglecting the unique contributions of individual similarities, which can compromise predictive accuracy.

Purpose: The potential of Large Language Models (LLMs) in enhancing a variety of natural language tasks in clinical fields includes medical imaging reporting. This pilot study examines the efficacy of a retrieval-augmented generation (RAG) LLM system considering zero-shot learning capability of LLMs, integrated with a comprehensive database of PET reading reports, in improving reference to prior reports and decision making.

Methods: We developed a custom LLM framework with retrieval capabilities, leveraging a database of over 10 years of PET imaging reports from a single center.

Comprehensive Evaluation of Advanced Imputation Methods for Proteomic Data Acquired via the Label-Free Approach.

Int J Mol Sci

December 2024

Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Zwirki i Wigury 101, 02-089 Warsaw, Poland.

Mass-spectrometry-based proteomics frequently utilizes label-free quantification strategies due to their cost-effectiveness, methodological simplicity, and capability to identify large numbers of proteins within a single analytical run. Despite these advantages, the prevalence of missing values (MV), which can impact up to 50% of the data matrix, poses a significant challenge by reducing the accuracy, reproducibility, and interpretability of the results. Consequently, effective handling of missing values is crucial for reliable quantitative analysis in proteomic studies.

Article Synopsis
  • Limited-angle dual-energy cone-beam CT (LA-DECBCT) is a promising method for achieving fast, low-dose imaging, but its clinical use is challenged by difficulties in image reconstruction.
  • A new image reconstruction technique using inter-spectral structural similarity was developed to reduce artifacts, improving the quality of DECBCT images without needing extra data for training.
  • This method shows significant potential for practical clinical applications in LA-DECBCT, enabling accurate imaging without relying on X-ray spectra or paired datasets.