Feature selection methods in QSAR studies.

J AOAC Int

Vrije Universiteit Brussel, Department of Analytical Chemistry and Pharmaceutical Technology, Center for Pharmaceutical Research, Brussels, Belgium.

Published: August 2012

A quantitative structure-activity relationship (QSAR) relates quantitative attributes of chemical structure (molecular descriptors) to a biological activity. QSAR studies have become attractive in drug discovery and development because they can save substantial time and human resources. Several factors affect the predictive ability of a QSAR model. On the one hand, different statistical methods may be applied to check the linear or nonlinear behavior of a data set. On the other hand, feature selection techniques are applied to decrease model complexity, to reduce the risk of overfitting or overtraining, and to select the most important descriptors from the often more than 1000 calculated. The selected descriptors are then linked to the biological activity of the corresponding compound by means of a mathematical model. Different modeling techniques can be applied, some of which explicitly require feature selection. A QSAR model can be useful in the design of new compounds with improved potency in the class under study, since only molecules with a predicted interesting activity need to be synthesized. In the feature selection problem, a learning algorithm must select a relevant subset of features on which to focus attention while ignoring the rest. Many feature selection techniques, such as genetic algorithms, forward selection, backward elimination, stepwise regression, and simulated annealing, have been used extensively. Swarm intelligence optimizations, such as ant colony optimization and particle swarm optimization, are feature selection techniques modeled on the collective behavior of animals and insects, for example finding the shortest path between a food source and the nest; they have recently also been applied in QSAR studies. This review provides an overview of the different feature selection techniques applied in QSAR modeling.
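As an illustration of one of the techniques named in the abstract, forward selection can be sketched in a few lines. This is a minimal sketch and not the review's own procedure: it assumes an ordinary least-squares model, a single train/validation split, and synthetic data standing in for a descriptor matrix; the function names (`fit_ols`, `validation_mse`, `forward_selection`) are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Fit ordinary least squares with an intercept; return the coefficients."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def validation_mse(cols, X_tr, y_tr, X_va, y_va):
    """Train on the chosen descriptor columns, score on the validation set."""
    coef = fit_ols(X_tr[:, cols], y_tr)
    A_va = np.column_stack([np.ones(X_va.shape[0]), X_va[:, cols]])
    return float(np.mean((y_va - A_va @ coef) ** 2))

def forward_selection(X_tr, y_tr, X_va, y_va, max_features=5):
    """Greedily add the descriptor that most lowers validation error."""
    selected = []
    remaining = list(range(X_tr.shape[1]))
    # Baseline: predict the training mean (intercept-only model).
    best_mse = float(np.mean((y_va - y_tr.mean()) ** 2))
    while remaining and len(selected) < max_features:
        scores = [
            (validation_mse(selected + [j], X_tr, y_tr, X_va, y_va), j)
            for j in remaining
        ]
        mse, j = min(scores)
        if mse >= best_mse:  # stop when no candidate improves the model
            break
        selected.append(j)
        remaining.remove(j)
        best_mse = mse
    return selected, best_mse

# Synthetic stand-in for a descriptor matrix (assumption, not real QSAR data):
# 30 descriptors, of which only columns 2 and 7 drive the "activity".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=200)

selected, mse = forward_selection(X[:150], y[:150], X[150:], y[150:])
print("selected descriptors:", selected)
print("validation MSE:", mse)
```

Scoring candidates on held-out data rather than on the training fit is one simple way to address the overfitting risk the abstract mentions; in practice cross-validation would usually replace the single split.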

Source: http://dx.doi.org/10.5740/jaoacint.sge_goodarzi

