Selecting an optimal set of molecular descriptors from a much larger pool of candidate regression variables is a crucial step in the development of QSAR and QSPR models. The aim of this work is to further improve this selection process. To this end, three alternatives are proposed for the initial steps of our recently developed enhanced replacement method (ERM) and of the replacement method (RM). Both methods have previously been shown to yield near-optimal results with far fewer linear regressions than a full search. The algorithms were tested on four experimental data sets, consisting of 116, 200, 78, and 100 experimental records for different compounds and 1268, 1338, 1187, and 1306 molecular descriptors, respectively. The comparisons showed that one of the new alternatives further improves the ERM, which has been shown to be superior to genetic algorithms for this descriptor selection task. The same alternative also improves the RM, the simpler and less computationally demanding of the two algorithms.
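The abstract does not spell out the steps of RM or ERM, so the following is only a minimal sketch of a generic one-at-a-time replacement search, assuming ordinary least squares and the residual standard deviation as the selection criterion. It is not the authors' published algorithm: the cyclic order in which positions are visited, the function names `residual_std` and `replacement_search`, and the synthetic data are all assumptions made for illustration. It is meant to show why such a search needs far fewer regressions than fitting every possible descriptor subset.

```python
import numpy as np
from math import comb


def residual_std(X, y, cols):
    """Residual standard deviation of a least-squares fit (with intercept) on the chosen columns."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(y) - A.shape[1]
    return float(np.sqrt(resid @ resid / dof))


def replacement_search(X, y, d, seed=0, max_sweeps=20):
    """Hypothetical replacement-style search: start from a random set of d
    descriptors and, for one position at a time, try every descriptor from the
    pool in that position, keeping a swap only when it lowers the residual
    standard deviation. Stops when a full sweep brings no improvement."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    current = list(rng.choice(n_desc, size=d, replace=False))
    best = residual_std(X, y, current)
    fits = 1  # number of linear regressions performed so far
    for _ in range(max_sweeps):
        improved = False
        for pos in range(d):
            for cand in range(n_desc):
                if cand in current:
                    continue
                trial = current.copy()
                trial[pos] = cand
                s = residual_std(X, y, trial)
                fits += 1
                if s < best:
                    best, current, improved = s, trial, True
        if not improved:
            break
    return sorted(int(c) for c in current), best, fits


# Synthetic illustration (not data from the paper): 60 records, 30 descriptors,
# with the response depending on descriptors 3 and 7 only.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 30))
y = 2.0 * X[:, 3] - 1.0 * X[:, 7]

selected, best, fits = replacement_search(X, y, d=2)
# Compare the regressions used with the size of a full search over all pairs.
print(selected, best, fits, comb(X.shape[1], 2))
```

In this toy setting the full search is still small, but the number of subsets grows combinatorially with the pool size, whereas each sweep of the sketch costs roughly d times the pool size in regressions. With pools of over a thousand descriptors, as in the data sets above, that difference is what makes a full search impractical.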
DOI: http://dx.doi.org/10.1021/ci200079b