Machine learning models to predict sweetness of molecules.

Comput Biol Med

Infosys Center for Artificial Intelligence, Department of Computational Biology, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), Okhla Phase III, New Delhi, 110020, India.

Published: January 2023

Sweetness is a vital taste to which humans are innately attracted. Given the increasing prevalence of type-2 diabetes, it is highly relevant to build computational models to predict the sweetness of small molecules. Such models are valuable for identifying sweeteners with low calorific value. We present regression-based machine learning and deep learning algorithms for predicting sweetness. Toward this goal, we manually curated the most extensive dataset of 671 sweet molecules with known experimental sweetness values ranging from 0.2 to 22,500,000. Gradient Boost and Random Forest Regressors emerged as the best models for predicting the sweetness of molecules, with correlation coefficients of 0.94 and 0.92, respectively. Our models show state-of-the-art performance when compared with previously published studies. Besides making our dataset (SweetpredDB) available, we also present a user-friendly web server, Sweetpred (https://cosylab.iiitd.edu.in/sweetpred), that returns the predicted sweetness of small molecules.
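As a rough illustration of the regression setup described above, the following sketch fits scikit-learn's gradient boosting and random forest regressors and reports a Pearson correlation on a held-out split. It is not the authors' pipeline. The features and targets are synthetic stand-ins, the helper names `make_synthetic_dataset` and `evaluate_models` are hypothetical, and modeling the log10 of the sweetness value is an assumption made here because the reported values span roughly eight orders of magnitude (0.2 to 22,500,000).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split


def make_synthetic_dataset(n_molecules=671, n_features=20, seed=0):
    """Build a synthetic stand-in for a molecular descriptor table.

    Assumption: real work would use computed molecular descriptors and
    measured sweetness values; here both are randomly generated.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_molecules, n_features))
    # Hypothetical target: log10(sweetness) as a noisy function of a few features.
    log_sweetness = (
        2.0
        + 1.5 * X[:, 0]
        - 1.0 * X[:, 1]
        + 0.5 * X[:, 2] * X[:, 3]
        + rng.normal(scale=0.3, size=n_molecules)
    )
    return X, log_sweetness


def evaluate_models(X, y, seed=0):
    """Fit both regressors and return the test-set Pearson r for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        # Pearson correlation between held-out targets and predictions.
        scores[name] = float(np.corrcoef(y_test, predictions)[0, 1])
    return scores


if __name__ == "__main__":
    X, y = make_synthetic_dataset()
    for name, r in evaluate_models(X, y).items():
        print(f"{name}: Pearson r = {r:.2f}")
```

The correlations printed by this sketch reflect only the synthetic data and say nothing about the published results. With real data, the choice of descriptors, the target transform, and the validation scheme would all need to follow the paper.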

DOI: http://dx.doi.org/10.1016/j.compbiomed.2022.106441

