Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set.

J Cheminform

Division of Medicinal Chemistry, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands.

Published: August 2017

The increase in publicly available bioactivity data in recent years has fueled research in chemogenomics, data mining, and modeling. As a direct result, a multitude of methods have been reported and evaluated over the past few years, such as target fishing, nearest-neighbor similarity-based methods, and Quantitative Structure-Activity Relationship (QSAR)-based protocols. However, such studies are typically conducted on different datasets, with different validation strategies and different metrics. In this study, methods were compared on a single standardized dataset obtained from ChEMBL, which is made publicly available, using standardized metrics (BEDROC and the Matthews Correlation Coefficient). Specifically, the performance of Naïve Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods. All methods were validated with both a random split and a temporal split, the latter being a more realistic benchmark of expected prospective performance.

Deep Neural Networks were the top-performing classifiers, highlighting their added value over more conventional methods. Moreover, the best method ('DNN_PCM') performed significantly better, scoring almost one standard deviation above the mean performance. Furthermore, multi-task and PCM implementations improved performance over single-task Deep Neural Networks. Conversely, target prediction performed almost two standard deviations below the mean performance, while Random Forests, Support Vector Machines, and Logistic Regression performed around the mean. Finally, an ensemble of DNNs, combined with additional tuning, improved relative performance by a further 27% (compared with the unoptimized 'DNN_PCM').
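
The two evaluation metrics named in the abstract can be sketched in Python from their standard definitions. MCC is computed from the confusion-matrix counts of binary predictions; BEDROC scores a ranking sorted by predicted score, rewarding actives placed near the top, with α controlling how strongly early ranks are weighted. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, α = 20 is an illustrative default rather than a value stated in this abstract, and the BEDROC normalization follows the form used by RDKit's `CalcBEDROC`.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Matthews Correlation Coefficient for binary labels (True = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0.0 when any row/column of the confusion matrix is empty
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def bedroc(ranked_is_active: Sequence[bool], alpha: float = 20.0) -> float:
    """BEDROC for a list already sorted by predicted score, best first.

    ranked_is_active[i] is True if the compound at rank i + 1 is active.
    alpha = 20.0 is an illustrative default (an assumption, not from the source).
    """
    n_total = len(ranked_is_active)
    if n_total == 0:
        raise ValueError("empty ranking")
    active_ranks = [i + 1 for i, a in enumerate(ranked_is_active) if a]
    n_act = len(active_ranks)
    if n_act == 0:
        return 0.0
    if n_act == n_total:
        return 1.0  # RIE_max == RIE_min here; define BEDROC as 1
    ratio = n_act / n_total
    # Robust Initial Enhancement (RIE): exponentially weighted sum over active
    # ranks, divided by its expected value under a random ordering
    weighted = sum(math.exp(-alpha * r / n_total) for r in active_ranks)
    random_expect = ratio * (1 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1)
    rie = weighted / random_expect
    # Rescale RIE to [0, 1] using its extremes (actives all last / all first)
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    return (rie - rie_min) / (rie_max - rie_min)
```

For example, `bedroc([True] * 3 + [False] * 97)` evaluates to 1 (up to floating-point rounding), because every active is ranked ahead of every inactive; reversing that list gives 0.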
This study offers a standardized set for testing and evaluating different machine learning algorithms in the context of multi-task learning, providing both the data and the protocols.
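
The random and temporal validation schemes described in the abstract differ only in how records are assigned to the test set. A minimal sketch, assuming each bioactivity record carries the year it was published; the `Record` layout, the 30% test fraction, and the 2013 cutoff are hypothetical choices for illustration, not the protocol distributed with the benchmark.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical layout of one compound-target bioactivity measurement."""
    compound_id: str
    target_id: str
    year: int  # publication year of the measurement
    active: bool


def random_split(
    records: Sequence[Record], test_fraction: float = 0.3, seed: int = 42
) -> Tuple[List[Record], List[Record]]:
    """Shuffle all records and hold out a fixed fraction as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(
    records: Sequence[Record], cutoff_year: int
) -> Tuple[List[Record], List[Record]]:
    """Train on records published before cutoff_year, test on the rest."""
    train = [r for r in records if r.year < cutoff_year]
    test = [r for r in records if r.year >= cutoff_year]
    return train, test


# Toy data: ten records published 2008-2017 (illustrative only)
records = [Record(f"C{i}", "T1", 2008 + i, i % 2 == 0) for i in range(10)]
train, test = temporal_split(records, cutoff_year=2013)
print(len(train), len(test))  # 5 5
```

Because a temporal split tests only on measurements published after all training data, it can mimic prospective use more closely than a shuffle, which may place closely related compounds from the same publication on both sides of the split.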

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5555960
DOI: http://dx.doi.org/10.1186/s13321-017-0232-0

Similar Publications

Machine learning outperforms humans in microplastic characterization and reveals human labelling errors in FTIR data.

J Hazard Mater

December 2024

Discipline of Chemistry, The University of Newcastle, University Drive, Newcastle, New South Wales 2308, Australia; School of Chemistry, Monash University, Wellington Road, Melbourne, Victoria 3800, Australia. Electronic address:

Microplastics are ubiquitous and appear to be harmful; however, the extent to which they inflict harm has not been fully elucidated. Analysing environmental sample data is challenging, as the complexity of real data makes automated analysis unreliable and manual analysis time-consuming. To address these challenges, we explored a dense feed-forward neural network (DNN) for classifying Fourier transform infrared (FTIR) spectroscopic data.

Automated ultrasonography of hepatocellular carcinoma using discrete wavelet transform based deep-learning neural network.

Med Image Anal

January 2025

Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon, 440-746, South Korea. Electronic address:

This study introduces HCC-Net, a novel wavelet-based approach for the accurate diagnosis of hepatocellular carcinoma (HCC) from abdominal ultrasound (US) images using artificial neural networks. HCC-Net integrates the discrete wavelet transform (DWT), which decomposes US images into four sub-band images; a lesion detector for hierarchical lesion localization; and a pattern-augmented classifier that generates pattern-enhanced lesion images for subsequent classification. The lesion detector uses a hierarchical coarse-to-fine approach to minimize missed lesions.

Artificial neural networks (ANNs) can help camera-based remote photoplethysmography (rPPG) measure cardiac activity and physiological signals, such as pulse wave, heart rate, and respiration rate, from facial videos with better accuracy. However, most existing ANN-based methods require substantial computing resources, which makes effective deployment on mobile devices challenging. Spiking neural networks (SNNs), on the other hand, hold great potential for energy-efficient deep learning owing to their binary, event-driven architecture.

A universal strategy for smoothing deceleration in deep graph neural networks.

Neural Netw

January 2025

College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, 150001, China. Electronic address:

Graph neural networks (GNNs) have shown great promise in modeling graph-structured data, but the over-smoothing problem limits their effectiveness in deep layers. Existing research on deep GNN models has two key weaknesses: (1) it ignores the beneficial aspects of intra-class smoothing while focusing solely on reducing inter-class smoothing, and (2) it computes residual weights inefficiently, neglecting the influence of neighboring nodes' distributions. To address these weaknesses, we propose a novel Smoothing Deceleration (SD) strategy that reduces the speed at which nodes are smoothed as information propagates between layers, thereby mitigating over-smoothing.

Reducing reading time and assessing disease in capsule endoscopy videos: A deep learning approach.

Int J Med Inform

January 2025

University of Coimbra, Faculty of Medicine, Coimbra, Portugal; Department of Gastroenterology, Centro Hospitalar e Universitário de Coimbra, Coimbra, Portugal. Electronic address:

Background: The wireless capsule endoscope (CE) is a valuable diagnostic tool in gastroenterology, offering a safe and minimally invasive visualization of the gastrointestinal tract. One of the few drawbacks identified by the gastroenterology community is the time-consuming task of analyzing CE videos.

Objectives: This article investigates the feasibility of a computer-aided diagnostic method to speed up CE video analysis.
