Exploiting Multiple Descriptor Sets in QSAR Studies.

J Chem Inf Model

Department of Statistics, The University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada.

Published: March 2016

A quantitative structure-activity relationship (QSAR) is a model relating a specific biological response to the chemical structures of compounds. There are many descriptor sets available to characterize chemical structure, raising the question of how to choose among them or how to use all of them for training a QSAR model. Making efficient use of all sets of descriptors is particularly problematic when active compounds are rare among the assay response data. We consider various strategies to make use of the richness of multiple descriptor sets when assay data are poor in active compounds. Comparisons are made using data from four bioassays, each with five sets of molecular descriptors. The recommended method takes all available descriptors from all sets and uses an algorithm to partition them into groups called phalanxes. Distinct statistical models are trained, each based on only the descriptors in one phalanx, and the models are then averaged in an ensemble of models. By giving the descriptors a chance to contribute in different models, the recommended method uses more of the descriptors in model averaging. This results in better ranking of active compounds to identify a shortlist of drug candidates for development.
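The partition, train, and average workflow described above can be illustrated with a minimal sketch. The abstract does not specify how phalanxes are formed or which statistical model is fitted to each one, so everything here is an assumption made for illustration: `partition_descriptors` uses a random split as a stand-in for the phalanx-forming algorithm, and `fit_centroid_model` is a toy class-centroid scorer standing in for the per-phalanx model. All function names and the toy data are hypothetical.

```python
import math
import random


def partition_descriptors(n_descriptors, n_groups, seed=0):
    """Split descriptor indices into disjoint groups.

    Assumption: a random split is only a placeholder for the
    phalanx-forming algorithm, which the abstract does not describe.
    """
    idx = list(range(n_descriptors))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[g::n_groups]) for g in range(n_groups)]


def fit_centroid_model(X, y, cols):
    """Fit a toy per-group model: class centroids over the chosen columns.

    Assumption: stands in for whatever model is trained on each phalanx.
    """
    def centroid(label):
        rows = [[x[c] for c in cols] for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    return cols, centroid(1), centroid(0)


def predict_active_prob(model, x):
    """Score one compound: closer to the active centroid gives a higher score."""
    cols, c_active, c_inactive = model
    v = [x[c] for c in cols]
    d_active = math.dist(v, c_active)
    d_inactive = math.dist(v, c_inactive)
    # Logistic transform of the distance difference, giving a value in (0, 1).
    return 1.0 / (1.0 + math.exp(d_active - d_inactive))


def ensemble_scores(X_train, y_train, X_new, n_groups=3, seed=0):
    """Train one model per descriptor group and average their scores."""
    groups = partition_descriptors(len(X_train[0]), n_groups, seed)
    models = [fit_centroid_model(X_train, y_train, g) for g in groups]
    return [
        sum(predict_active_prob(m, x) for m in models) / len(models)
        for x in X_new
    ]


# Toy data: 6 descriptors pooled from all sets; actives (label 1) are rare.
X_train = [[1.0] * 6, [0.8] * 6, [0.0] * 6, [0.2] * 6, [0.1] * 6]
y_train = [1, 1, 0, 0, 0]
X_new = [[0.1] * 6, [0.9] * 6]

scores = ensemble_scores(X_train, y_train, X_new)
# Rank candidate compounds from most to least likely active.
ranking = sorted(range(len(X_new)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

Because each descriptor belongs to exactly one group, every descriptor gets a chance to influence some model in the ensemble, which is the property the abstract attributes to the recommended method.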

Source
http://dx.doi.org/10.1021/acs.jcim.5b00663

Similar Publications

The comprehensive identification of peaks in untargeted lipidomics using LC-MS/MS remains a significant challenge. Confidence in lipid annotation can be greatly improved by integrating a highly accurate machine learning-based retention time prediction model. Such an approach enables the identification of lipids for understanding pathogenic mechanisms, biomarker discovery, and drug screening.

Artificially synthesized DNA holds significant promise in addressing fundamental biochemical questions and driving advancements in biotechnology, genetics, and DNA digital data storage. Rapid and precise electric identification of these artificial DNA strands is crucial for their effective application. Herein, we present a comprehensive investigation into the electric recognition of eight artificially synthesized DNA (DNA and DNA) nucleobases using quantum tunneling transport and machine learning (ML) techniques.

Predicting purification process fit of monoclonal antibodies using machine learning.

MAbs

December 2025

Department of Purification, Microbiology and Virology, Genentech Inc, South San Francisco, CA, USA.

In early-stage development of therapeutic monoclonal antibodies, assessment of the viability and ease of their purification typically requires extensive experimentation. However, the work required for upstream protein expression and downstream purification development often conflicts with timeline pressures and material constraints, limiting the number of molecules and process conditions that can reasonably be assessed. Recently, high-throughput batch-binding screen data along with improved molecular descriptors have enabled development of robust quantitative structure-property relationship (QSPR) models that predict monoclonal antibody chromatographic binding behavior from the amino acid sequence.

Small proteins (≤100 amino acids) play important roles across all life forms, ranging from unicellular bacteria to higher organisms. In this study, we have developed SProtFP, a machine learning-based method for functional annotation of prokaryotic small proteins into selected functional categories. SProtFP uses independent artificial neural networks (ANNs), trained using a combination of physicochemical descriptors, to classify small proteins into antitoxin type 2, bacteriocin, DNA-binding, metal-binding, ribosomal protein, RNA-binding, type 1 toxin, and type 2 toxin proteins.

CovCysPredictor: Predicting Selective Covalently Modifiable Cysteines Using Protein Structure and Interpretable Machine Learning.

J Chem Inf Model

January 2025

Computer-Aided Drug Discovery, Global Discovery Chemistry, Novartis Biomedical Research, 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States.

Targeted covalent inhibition is a powerful therapeutic modality in the drug discoverer's toolbox. Recent advances in covalent drug discovery, in particular, targeting cysteines, have led to significant breakthroughs for traditionally challenging targets such as mutant KRAS, which is implicated in diverse human cancers. However, identifying cysteines for targeted covalent inhibition is a difficult task, as experimental and in silico tools have shown limited accuracy.
