Training statistical machine learning models of quantum mechanical molecular properties requires large data sets that exemplify the map from chemical structure to molecular property. Intelligent a priori selection of training examples is often difficult or impossible, because prior knowledge may be unavailable. Ordinarily, a representative selection of training molecules from such data sets is obtained through random sampling. We use genetic algorithms to optimize the composition of training sets consisting of tens of thousands of small organic molecules. The resulting machine learning models are considerably more accurate: in the limit of small training sets, mean absolute errors for out-of-sample predictions are reduced by up to ∼75%. We present and discuss optimized training sets consisting of 10 molecular classes for all molecular properties studied. We show that these classes can be used to design improved training sets for the generation of machine learning models of the same properties in similar but unrelated molecular sets.
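The idea of evolving a training set, rather than sampling it at random, can be illustrated with a small sketch. The abstract does not specify the genetic operators, the molecular representation, or the regression model, so everything below is an assumption made for illustration: a Gaussian-kernel ridge regression stands in for the machine learning model, a toy one-dimensional function stands in for a molecular property, and the hypothetical helpers `krr_mae` and `ga_select` implement a simple elitist genetic algorithm whose fitness is the mean absolute error on a held-out set.

```python
import numpy as np

def krr_mae(X_train, y_train, X_val, y_val, sigma=1.0, lam=1e-3):
    """Validation MAE of Gaussian-kernel ridge regression (assumed stand-in model)."""
    def kernel(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    K = kernel(X_train, X_train) + lam * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    pred = kernel(X_val, X_train) @ alpha
    return float(np.mean(np.abs(pred - y_val)))

def ga_select(X, y, X_val, y_val, k=10, pop_size=20, generations=15,
              mut_rate=0.1, seed=0):
    """Evolve index sets of size k that minimize validation MAE (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = len(X)

    def fitness(idx):
        return krr_mae(X[idx], y[idx], X_val, y_val)

    # Initial population: randomly sampled training sets (the usual baseline).
    pop = [rng.choice(n, size=k, replace=False) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]  # elitism: the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            # Crossover: draw k distinct indices from the union of both parents.
            pool = np.union1d(parents[i], parents[j])
            child = rng.choice(pool, size=k, replace=False)
            # Mutation: swap an index for one not currently in the child.
            for pos in range(k):
                if rng.random() < mut_rate:
                    candidates = np.setdiff1d(np.arange(n), child)
                    child[pos] = rng.choice(candidates)
            children.append(child)
        pop = parents + children
    pop.sort(key=fitness)
    history.append(fitness(pop[0]))
    return pop[0], history

# Toy data (assumption): a 1D "property" in place of real molecular data.
data_rng = np.random.default_rng(42)
X_pool = data_rng.uniform(-3.0, 3.0, size=(200, 1))
y_pool = np.sin(2.0 * X_pool[:, 0])
X_val = data_rng.uniform(-3.0, 3.0, size=(100, 1))
y_val = np.sin(2.0 * X_val[:, 0])

best, history = ga_select(X_pool, y_pool, X_val, y_val)
print("best random-sample MAE:", history[0])
print("optimized MAE:", history[-1])
```

Here `history[0]` is the error of the best randomly sampled set in the initial population and `history[-1]` is the error after optimization. Because the fitness is measured on `X_val`, a fair out-of-sample comparison would need a further test set that the genetic algorithm never sees.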

Source
http://dx.doi.org/10.1021/acs.jpclett.7b00038