Redundancy in two major compound databases.

Drug Discov Today

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, D-53113 Bonn, Germany.

Published: June 2018

Public repositories of compounds and activity data are of prime importance for pharmaceutical research in academic and industrial settings. Major databases have evolved over the years. Their growth is accompanied by an increasing tendency toward data sharing. This is a positive development but not without potential problems. Using ChEMBL and PubChem as examples, we show that crosstalk between databases also leads to substantial data redundancy that might not be obvious. Redundancy is an important issue because it biases data analysis and knowledge extraction and leads to inflated views of available compounds, assays, and activity data. Going forward, it will be important to further refine data exchange and deposition criteria and make redundancy as transparent as possible.

Source
http://dx.doi.org/10.1016/j.drudis.2018.03.005
