Exploring Dimensionality Reduction Techniques for Deep Learning Driven QSAR Models of Mutagenicity.

Toxics

Department of Analytical, Environmental and Forensic Sciences, King's College London, Franklin-Wilkins Building, 150 Stamford St., London SE1 9NH, UK.

Published: June 2023

AI Article Synopsis

  • Dimensionality reduction techniques help deep learning driven quantitative structure-activity relationship (QSAR) models cope with high-dimensional toxicological data, yet the choice of technique is often arbitrary.
  • Six methods, both linear (such as PCA) and non-linear (such as kernel PCA and autoencoders), were compared on a mutagenicity dataset; the simpler linear techniques were sufficient for optimal model performance, with kernel PCA and autoencoders performing at closely comparable levels.
  • Analysis of the chemical space (XLogP and molecular weight) showed that the vast majority of testing data fell within the defined applicability domain, although certain regions were measurably more problematic; some dimensionality reduction techniques nonetheless enabled uniquely beneficial navigation of the chemical space.
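
The last point refers to what the abstract calls the applicability domain, described there in terms of XLogP and molecular weight. The source does not say how that domain was defined, so the following is only a minimal sketch under one common assumption: the domain is the min-max bounding box of the training descriptors. The function name `in_domain` and all descriptor values are hypothetical and purely illustrative.

```python
import numpy as np

def in_domain(train_desc, test_desc):
    """Flag test compounds whose descriptors all lie within the training min-max range.

    ASSUMPTION: a bounding-box applicability domain; the study may have used
    a different definition.
    """
    lo = train_desc.min(axis=0)
    hi = train_desc.max(axis=0)
    return np.all((test_desc >= lo) & (test_desc <= hi), axis=1)

# Hypothetical (XLogP, molecular weight) pairs, not taken from the study.
train = np.array([[0.5, 120.0], [2.1, 250.0], [3.8, 410.0], [-0.7, 95.0]])
test = np.array([[1.0, 200.0], [5.2, 300.0], [2.0, 600.0]])

mask = in_domain(train, test)   # one boolean per test compound
coverage = mask.mean()          # fraction of test compounds inside the domain
```

Here only the first hypothetical test compound falls inside the box: the second exceeds the training XLogP range and the third exceeds the training molecular weight range.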

Article Abstract

Dimensionality reduction techniques are crucial for enabling deep learning driven quantitative structure-activity relationship (QSAR) models to navigate higher dimensional toxicological spaces; however, the use of specific techniques is often arbitrary and poorly explored. Six dimensionality reduction techniques (both linear and non-linear) were therefore applied to a higher dimensionality mutagenicity dataset and compared in their ability to power a simple deep learning driven QSAR model, following grid searches for optimal hyperparameter values. It was found that comparatively simpler linear techniques, such as principal component analysis (PCA), were sufficient for enabling optimal QSAR model performance, which indicated that the original dataset was at least approximately linearly separable (in accordance with Cover's theorem). However, certain non-linear techniques, such as kernel PCA and autoencoders, performed at closely comparable levels while (especially in the case of autoencoders) being more widely applicable to potentially non-linearly separable datasets. Analysis of the chemical space, in terms of XLogP and molecular weight, revealed that the vast majority of testing data fell within the defined applicability domain, and that certain regions were measurably more problematic and degraded performance. It was nonetheless indicated that certain dimensionality reduction techniques were able to facilitate uniquely beneficial navigations of the chemical space.
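
The comparison described in the abstract (a reducer feeding a simple neural network, tuned by grid search) can be sketched with scikit-learn. This is an illustration only, not the authors' pipeline: the data are synthetic stand-ins generated with `make_classification`, only two of the six techniques are shown, `MLPClassifier` is an assumed substitute for the study's deep learning model, and the component counts and layer size are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, KernelPCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a high-dimensional descriptor matrix (NOT the study's data).
X, y = make_classification(n_samples=300, n_features=50, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Scale -> reduce -> classify; the "reduce" step is swapped by the grid search.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA()),
    ("clf", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
])

# One linear and one non-linear reducer, each tried at two output dimensionalities.
param_grid = [
    {"reduce": [PCA()], "reduce__n_components": [5, 10]},
    {"reduce": [KernelPCA(kernel="rbf")], "reduce__n_components": [5, 10]},
]

search = GridSearchCV(pipe, param_grid, cv=3, scoring="balanced_accuracy")
search.fit(X_train, y_train)

best_reducer = type(search.best_params_["reduce"]).__name__
test_score = search.score(X_test, y_test)  # balanced accuracy on held-out data
print(best_reducer, search.best_params_["reduce__n_components"], round(test_score, 3))
```

Putting the reducer inside the pipeline means it is refit on each cross-validation fold, so the held-out fold never influences the projection. An autoencoder would need a separate deep learning framework and is omitted here.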

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10384850
DOI: http://dx.doi.org/10.3390/toxics11070572

Publication Analysis

Top Keywords

dimensionality reduction: 12
reduction techniques: 12
deep learning: 12
learning driven: 12
driven qsar: 8
qsar models: 8
qsar model: 8
performances indicated: 8
chemical space: 8
techniques: 7
