We have applied the k-nearest neighbor (kNN) modeling technique to the prediction of melting points. A data set of 4119 diverse organic molecules (data set 1) and an additional set of 277 drugs (data set 2) were used to compare performance in different regions of chemical space, and we investigated the influence of the number of nearest neighbors using different types of molecular descriptors. To compute the prediction on the basis of the melting temperatures of the nearest neighbors, we used four different methods (arithmetic and geometric average, inverse distance weighting, and exponential weighting), of which the exponential weighting scheme yielded the best results. We assessed our model via a 25-fold Monte Carlo cross-validation (with approximately 30% of the total data as a test set) and optimized it using a genetic algorithm. Predictions for drugs based on drugs (separate training and test sets each taken from data set 2) were found to be considerably better [root-mean-squared error (RMSE) = 46.3 °C, r² = 0.30] than those based on nondrugs (prediction of data set 2 based on the training set from data set 1; RMSE = 50.3 °C, r² = 0.20). The optimized model yields an average RMSE as low as 46.2 °C (r² = 0.49) for data set 1, and an average RMSE of 42.2 °C (r² = 0.42) for data set 2. We show that the kNN method inherently introduces a systematic error in melting point prediction. Much of the remaining error can be attributed to the lack of information about interactions in the liquid state, which are not well captured by molecular descriptors.
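The four ways of combining neighbor melting temperatures and the repeated random-split validation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the function names `knn_predict` and `monte_carlo_cv`, the Euclidean distance, the weight forms `1/d` and `exp(-d)`, and the use of kelvin so that the geometric mean is defined. The abstract does not state the exact weighting formulas or descriptors, and the genetic-algorithm optimization is not shown.

```python
import numpy as np


def knn_predict(train_X, train_y, query, k=5, scheme="exponential"):
    """Predict one melting point from the k nearest training molecules.

    train_X: (n, d) descriptor matrix; train_y: (n,) melting points in kelvin.
    The distance metric and the weight formulas are illustrative assumptions.
    """
    dists = np.linalg.norm(train_X - query, axis=1)  # Euclidean distance (assumed)
    nearest = np.argsort(dists)[:k]
    d, t = dists[nearest], train_y[nearest]

    if scheme == "arithmetic":
        return float(t.mean())
    if scheme == "geometric":
        # Geometric mean; needs strictly positive values, hence kelvin.
        return float(np.exp(np.log(t).mean()))
    if scheme == "inverse_distance":
        w = 1.0 / (d + 1e-9)  # small offset avoids division by zero
    elif scheme == "exponential":
        w = np.exp(-d)  # assumed form: weights decay exponentially with distance
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return float(np.sum(w * t) / np.sum(w))


def monte_carlo_cv(X, y, k=5, scheme="exponential",
                   n_repeats=25, test_fraction=0.3, seed=0):
    """Average test RMSE over repeated random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_fraction * n))
    rmses = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        preds = np.array([
            knn_predict(X[train], y[train], X[i], k=k, scheme=scheme)
            for i in test
        ])
        rmses.append(np.sqrt(np.mean((preds - y[test]) ** 2)))
    return float(np.mean(rmses))
```

Calling `monte_carlo_cv(X, y, k=k, scheme=s)` for several values of `k` and each scheme would give a rough comparison of the kind the abstract reports. Note that every scheme returns a value between the smallest and largest neighbor temperature, so predictions are pulled toward the middle of the training range, which is consistent with the systematic error the abstract attributes to kNN.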
DOI: http://dx.doi.org/10.1021/ci060149f