Background: Predicting the behavior of new chemical compounds in advance can support the design of new products by directing research toward the most promising candidates and ruling out others. Such predictive models can be data-driven, using Machine Learning, or based on researchers' experience, and both depend on the collection of past results. In either case, models (or researchers) can only make reliable assumptions about compounds that are similar to what they have seen before. Therefore, consistent use of these predictive models shapes the dataset and causes a continuous specialization that shrinks the applicability domain of all models trained on this dataset in the future, increasingly harming model-based exploration of the space.
Proposed Solution: In this paper, we propose CANCELS (CounterActiNg Compound spEciaLization biaS), a technique that helps to break the dataset specialization spiral. Aiming for a smooth distribution of the compounds in the dataset, we identify areas of the compound space that are underrepresented and suggest additional experiments that help bridge the gap. Thereby, we generally improve the dataset quality in an entirely unsupervised manner and create awareness of potential flaws in the data. CANCELS does not aim to cover the entire compound space and hence retains a desirable degree of specialization to a specified research domain.
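The idea of finding underrepresented areas and suggesting experiments to fill them can be illustrated with a toy sketch. This is not the CANCELS algorithm itself. It assumes, purely for illustration, that the "smooth" target is a single Gaussian fitted to the dataset, that the observed density is estimated with a fixed-bandwidth kernel density estimate, and that compounds are already encoded as numeric feature vectors. The function names, the bandwidth, and the toy data are all hypothetical.

```python
import numpy as np


def gaussian_pdf(points, mean, cov):
    """Density of a multivariate Gaussian at each row of `points`."""
    d = mean.shape[0]
    diff = points - mean
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm


def kde_pdf(points, data, bandwidth):
    """Isotropic Gaussian kernel density estimate of `data` at `points`."""
    d = data.shape[1]
    sq_dist = ((points[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-0.5 * sq_dist / bandwidth**2)
    norm = (2 * np.pi * bandwidth**2) ** (d / 2)
    return kernel.mean(axis=1) / norm


def suggest_candidates(data, pool, k, bandwidth=0.5):
    """Return indices of the k pool points where the smooth target
    density exceeds the observed density the most (assumed criterion)."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False) + 1e-6 * np.eye(data.shape[1])
    gap = gaussian_pdf(pool, mean, cov) - kde_pdf(pool, data, bandwidth)
    return np.argsort(gap)[::-1][:k]


# Toy dataset: two dense clusters with an unexplored gap between them.
rng = np.random.default_rng(0)
left = rng.normal([-2.0, 0.0], 0.3, size=(100, 2))
right = rng.normal([2.0, 0.0], 0.3, size=(100, 2))
data = np.vstack([left, right])

# Hypothetical candidates: one in the gap, one inside a cluster,
# and one far outside the research domain.
pool = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
picked = suggest_candidates(data, pool, k=1)
print(picked)
```

In this toy setup the candidate in the gap should rank first, while the distant candidate scores close to zero because the fitted target assigns it almost no density. That mirrors the stated goal of retaining specialization rather than covering the entire space.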
Results: An extensive set of experiments on the use case of biodegradation pathway prediction reveals not only that the bias spiral can indeed be observed but also that CANCELS produces meaningful results. Additionally, we demonstrate that mitigating the observed bias is crucial, as it not only interrupts the continuous specialization process but also significantly improves a predictor's performance while reducing the number of required experiments. Overall, we believe that CANCELS can support researchers in their experimentation process, helping them not only to better understand their data and its potential flaws but also to grow the dataset in a sustainable way. All code is available at github.com/KatDost/Cancels.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10197453 | PMC |
| http://dx.doi.org/10.1186/s13321-023-00716-w | DOI Listing |