Prediction of peptide hormones using an ensemble of machine learning and similarity-based methods.

Dashleen Kaur, Akanksha Arora, Palani Vigneshwar, Gajendra P S Raghava

Proteomics

Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.

Published: October 2024

Peptide hormones are genome-encoded signal transduction molecules that play essential roles in multicellular organisms, and their dysregulation can lead to various health problems. In this study, we propose a method for predicting hormonal peptides with high accuracy. The dataset used for training, testing, and evaluating our models consisted of 1174 hormonal and 1174 non-hormonal peptide sequences. We first developed similarity-based methods using BLAST and MERCI. Although these methods gave a high probability of correct prediction, their coverage was limited: some queries returned no hits, so predictions were possible for only a limited set of sequences. To overcome these limitations, we developed machine learning and deep learning models. Our logistic regression-based model achieved a maximum AUROC of 0.93 with an accuracy of 86% on an independent validation dataset. To combine the strengths of the similarity-based and machine learning-based models, we developed an ensemble method that achieved an AUROC of 0.96, an accuracy of 89.79%, and a Matthews correlation coefficient (MCC) of 0.8 on the validation set. To help researchers predict and design hormone peptides, we developed a web server called HOPPred. The server offers a unique feature: identification of hormone-associated motifs within hormone peptides. It can be accessed at https://webs.iiitd.edu.in/raghava/hoppred/.
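The abstract does not state how the similarity-based and machine learning scores are combined, so the following Python sketch is only an illustration of one plausible scheme, not the authors' method. It assumes a hypothetical additive rule: the model probability is shifted up or down when a BLAST hit or a MERCI motif is available, and left unchanged when the similarity search returns nothing. The function names, weights, and threshold are all assumptions made for this example. A small MCC helper is included because the abstract reports that metric.

```python
from math import sqrt
from typing import Optional, Sequence


def ensemble_score(ml_prob: float,
                   blast_label: Optional[int],
                   motif_count: int,
                   blast_weight: float = 0.5,   # assumed weight, not from the paper
                   motif_weight: float = 0.5) -> float:  # assumed weight
    """Combine an ML probability with similarity evidence (hypothetical rule)."""
    score = ml_prob
    if blast_label == 1:        # top BLAST hit is a known hormone peptide
        score += blast_weight
    elif blast_label == 0:      # top BLAST hit is a known non-hormone peptide
        score -= blast_weight
    # blast_label is None when BLAST returns no hit: rely on the ML score alone
    if motif_count > 0:         # at least one hormone-associated motif was found
        score += motif_weight
    return score


def predict(score: float, threshold: float = 0.5) -> int:
    """Return 1 (hormone) if the combined score reaches the threshold, else 0."""
    return 1 if score >= threshold else 0


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # A peptide with a weak ML score but a hormone BLAST hit is rescued
    print(predict(ensemble_score(0.4, blast_label=1, motif_count=0)))
    # With no similarity evidence, the ML score decides on its own
    print(predict(ensemble_score(0.4, blast_label=None, motif_count=0)))
```

The point of the fallback branch is the limitation named in the abstract: similarity searches can return no hits, and in that case the ensemble still produces a prediction from the learned model.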

DOI: http://dx.doi.org/10.1002/pmic.202400004

Similar Publications

Major advances in protein function assignment by remote homolog detection with protein language models - A review.

Curr Opin Struct Biol

January 2025

Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA; Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA.

Mesih Kilinc, Kejue Jia, Robert L Jernigan

There is an ever-increasing need for accurate and efficient methods to identify protein homologs. Traditionally, sequence similarity-based methods have dominated protein homolog identification for function identification, but these struggle when the sequence identity between the pairs is low. Recently, transformer architecture-based deep learning methods have achieved breakthrough performances in many fields.

MPEMDA: A Multi-Similarity Integration Approach with Pre-completion and Error Correction for Predicting Microbe-Drug Associations.

Methods

January 2025

School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China.

Yuxiang Li, Haochen Zhao, Jianxin Wang

Exploring the associations between microbes and drugs offers valuable insights into their underlying mechanisms. Traditional wet lab experiments, while reliable, are often time-consuming and labor-intensive, making computational approaches an attractive alternative. Existing similarity-based machine learning models for predicting microbe-drug associations typically rely on integrated similarities as input, neglecting the unique contributions of individual similarities, which can compromise predictive accuracy.

Empowering PET imaging reporting with retrieval-augmented large language models and reading reports database: a pilot single center study.

Eur J Nucl Med Mol Imaging

January 2025

Department of Nuclear Medicine, Seoul National University Hospital, 101 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea.

Hongyoon Choi, Dongjoo Lee, Yeon-Koo Kang, Minseok Suh

Purpose: The potential of Large Language Models (LLMs) in enhancing a variety of natural language tasks in clinical fields includes medical imaging reporting. This pilot study examines the efficacy of a retrieval-augmented generation (RAG) LLM system considering zero-shot learning capability of LLMs, integrated with a comprehensive database of PET reading reports, in improving reference to prior reports and decision making.

Methods: We developed a custom LLM framework with retrieval capabilities, leveraging a database of over 10 years of PET imaging reports from a single center.

Comprehensive Evaluation of Advanced Imputation Methods for Proteomic Data Acquired via the Label-Free Approach.

Int J Mol Sci

December 2024

Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Zwirki i Wigury 101, 02-089 Warsaw, Poland.

Grzegorz Wryk, Andrzej Gawor, Ewa Bulska

Mass-spectrometry-based proteomics frequently utilizes label-free quantification strategies due to their cost-effectiveness, methodological simplicity, and capability to identify large numbers of proteins within a single analytical run. Despite these advantages, the prevalence of missing values (MV), which can impact up to 50% of the data matrix, poses a significant challenge by reducing the accuracy, reproducibility, and interpretability of the results. Consequently, effective handling of missing values is crucial for reliable quantitative analysis in proteomic studies.

Optimization-Based Image Reconstruction Regularized with Inter-Spectral Structural Similarity for Limited-Angle Dual-Energy Cone-Beam CT.

ArXiv

December 2024

Junbo Peng, Tonghe Wang, Huiqiao Xie, Richard L J Qiu, Chih-Wei Chang

Article Synopsis

Limited-angle dual-energy cone-beam CT (LA-DECBCT) is a promising method for achieving fast, low-dose imaging, but its clinical use is challenged by difficulties in image reconstruction.
A new image reconstruction technique using inter-spectral structural similarity was developed to reduce artifacts, improving the quality of DECBCT images without needing extra data for training.
This method shows significant potential for practical clinical applications in LA-DECBCT, enabling accurate imaging without relying on X-ray spectra or paired datasets.
