Machine learning models for chemistry require large datasets, often compiled by combining data from multiple assays. However, combining data without careful curation can introduce significant noise. While absolute values from different assays are rarely comparable, trends or differences between compounds are often assumed to be consistent. This study evaluates that assumption by analyzing potency differences between matched compound pairs across assays and by assessing how much assay metadata curation reduces error. We find that potency differences between matched pairs vary less than individual compound measurements, suggesting that systematic assay differences may partially cancel out in paired data. Metadata curation further improves inter-assay agreement, albeit at the cost of dataset size. For minimally curated compound pairs, agreement within 0.3 pChEMBL units was 44-46% for Ki and IC50 values respectively, which improved to 66-79% after curation. Similarly, the percentage of pairs with differences exceeding 1 pChEMBL unit dropped from 12-15% to 6-8% with extensive curation. These results establish a benchmark for the expected noise in matched molecular pair data from the ChEMBL database, offering practical metrics for data quality assessment.
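The agreement metrics quoted in the abstract can be illustrated with a minimal Python sketch. It assumes, as one plausible reading rather than the authors' confirmed procedure, that agreement is judged on the absolute difference between a pair's potency difference in one assay and the same pair's potency difference in another. The function name `pair_agreement`, the compound identifiers, the toy pChEMBL values, and the inclusive treatment of the 0.3 threshold are all hypothetical.

```python
from typing import Dict, List, Tuple


def pair_agreement(
    assay_a: Dict[str, float],
    assay_b: Dict[str, float],
    pairs: List[Tuple[str, str]],
    tight: float = 0.3,
    loose: float = 1.0,
) -> Dict[str, float]:
    """Compare potency differences of matched pairs measured in two assays.

    assay_a / assay_b map compound IDs to pChEMBL values (hypothetical data).
    Thresholds default to the 0.3 and 1 pChEMBL unit cut-offs in the abstract.
    """
    discrepancies = []
    for first, second in pairs:
        # Only pairs with both compounds measured in both assays are usable.
        if not all(c in assay_a and c in assay_b for c in (first, second)):
            continue
        delta_a = assay_a[first] - assay_a[second]  # potency difference, assay A
        delta_b = assay_b[first] - assay_b[second]  # potency difference, assay B
        discrepancies.append(abs(delta_a - delta_b))

    n = len(discrepancies)
    if n == 0:
        raise ValueError("no matched pair was measured in both assays")

    return {
        "n_pairs": n,
        # Assumption: the tight threshold is treated as inclusive.
        "frac_within_tight": sum(d <= tight for d in discrepancies) / n,
        "frac_beyond_loose": sum(d > loose for d in discrepancies) / n,
    }


# Toy data: assay B reads about 0.4-0.5 units higher for most compounds,
# and far higher for cpd3.
assay_a = {"cpd1": 6.0, "cpd2": 7.0, "cpd3": 5.0, "cpd4": 8.5}
assay_b = {"cpd1": 6.5, "cpd2": 7.4, "cpd3": 6.9, "cpd4": 9.0}
pairs = [("cpd1", "cpd2"), ("cpd1", "cpd3"), ("cpd2", "cpd4"), ("cpd1", "cpd4")]

result = pair_agreement(assay_a, assay_b, pairs)
print(result)
```

In the toy data every individual compound differs between assays by at least 0.4 units, yet three of the four pair differences agree within 0.3 units because the shared offset cancels. Only the pair involving cpd3, whose offset is not shared, disagrees by more than 1 unit.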
DOI: http://dx.doi.org/10.1186/s13321-025-00956-y
Int J Urol
January 2025
Department of Urology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China.
Background: Whether to use open surgery or minimally invasive surgery (MIS) for adrenocortical carcinoma (ACC) remains controversial. This retrospective study aimed to compare the prognostic impact of MIS and open surgery in patients with clinical stage I-II ACC.
Methods: Patients with stage I-II ACC from December 2000 to October 2022 were retrospectively studied.
Int J Stroke
January 2025
Department of Health Security System, Center for Health Security, Graduate School of Medicine, Kyoto University, Kyoto, Japan.
Background: Intravenous thrombolysis (IVT) for acute ischemic stroke (AIS) related to underlying intracranial artery dissection (IAD) poses potential risks, including the exacerbation of intramural hematoma and the rupture of the dissected arterial wall. However, the safety of IVT in this specific population remains uncertain.
Aims: This study aimed to assess whether IAD is associated with an increased risk of intracranial hemorrhage (ICH) following IVT and to evaluate its impact on functional outcomes.
J Cheminform
January 2025
Drug Discovery Data Sciences, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium.
Machine learning models for chemistry require large datasets, often compiled by combining data from multiple assays. However, combining data without careful curation can introduce significant noise. While absolute values from different assays are rarely comparable, trends or differences between compounds are often assumed to be consistent.
Clin Microbiol Infect
January 2025
Unidad de Enfermedades Infecciosas y Microbiología, Hospital Universitario Virgen Macarena and Departamento de Medicina, Universidad de Sevilla/Instituto de Biomedicina de Sevilla/CSIC, Seville, Spain; CIBER de Enfermedades Infecciosas (CIBERINFEC), Instituto de Salud Carlos III, Madrid, Spain.
Objectives: The FOSFO-MIC study assessed the clinical and microbiological effectiveness and the safety of intravenous fosfomycin in treating complicated urinary tract infections (cUTIs) caused by Escherichia coli, in comparison with other intravenous antimicrobials.
Methods: A prospective, multinational matched-cohorts study involving adults with community-acquired cUTIs and receiving targeted therapy with intravenous fosfomycin or other first-line drugs (beta-lactams or fluoroquinolones) was conducted from November 2019 to May 2023 in 10 centres from Spain, Italy, and Türkiye. Matching criteria included healthcare-relation, Charlson and Pitt scores.
Metabolomics
January 2025
Owlstone Medical Ltd., Cambridge, UK.
Introduction: Breath volatile organic compounds (VOCs) are promising biomarkers for clinical purposes due to their unique properties. Translation of VOC biomarkers into the clinic depends on identification and validation: a challenge requiring collaboration, well-established protocols, and cross-comparison of data. Previously, we developed a breath collection and analysis method, resulting in 148 breath-borne VOCs identified.