Extreme Gradient Boosting as a Method for Quantitative Structure-Activity Relationships.

J Chem Inf Model

Bioinformatics Department, MSD International GmbH (Singapore Branch), 1 Fusionopolis Place, #06-10/07-18, Galaxis, Singapore 138522.

Published: December 2016

In the pharmaceutical industry it is common to generate many QSAR models from training sets containing a large number of molecules and a large number of descriptors. The best QSAR methods are those that generate the most accurate predictions without being overly expensive computationally. In this paper we compare eXtreme Gradient Boosting (XGBoost) to random forest and single-task deep neural nets on 30 in-house data sets. While XGBoost has many adjustable parameters, we can define a set of standard parameters at which XGBoost makes predictions that are, on average, better than those of random forest and almost as good as those of deep neural nets. The biggest strength of XGBoost is its speed. Whereas efficient use of random forest requires generating each tree in parallel on a cluster, and deep neural nets are usually run on GPUs, XGBoost can be run on a single CPU in less than a third of the wall-clock time of either of the other methods.
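For readers unfamiliar with the underlying technique, the following is a minimal sketch of plain gradient boosting for regression, written in pure Python with depth-1 trees (stumps) and squared-error loss. It is not XGBoost and not the authors' setup: XGBoost adds regularization, second-order gradient information, deeper trees, and many engineering optimizations. All function names, the parameter values (`n_rounds`, `learning_rate`), and the toy "descriptor" data are assumptions made up for illustration.

```python
# Plain gradient boosting with regression stumps (illustrative sketch only;
# this is NOT XGBoost and does not reproduce the paper's parameters or data).

def fit_stump(xs, residuals):
    """Return the (feature, threshold, left_value, right_value) split that
    minimizes squared error on the residuals, or None if no split exists."""
    best = None  # (error, feature, threshold, left_value, right_value)
    for f in range(len(xs[0])):
        values = sorted(set(row[f] for row in xs))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(xs, residuals) if row[f] <= t]
            right = [r for row, r in zip(xs, residuals) if row[f] > t]
            lv = sum(left) / len(left)
            rv = sum(right) / len(right)
            err = (sum((r - lv) ** 2 for r in left)
                   + sum((r - rv) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, f, t, lv, rv)
    return None if best is None else best[1:]


def predict_stump(stump, row):
    f, t, lv, rv = stump
    return lv if row[f] <= t else rv


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals
    (the negative gradient of squared error) and add a shrunken copy."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        if stump is None:
            break
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, r) - y) ** 2 for r, y in zip(xs, ys)) / len(ys)


def make_toy_data():
    """Hypothetical data: two made-up 'descriptors' and a synthetic activity."""
    xs = [[i / 10.0, ((i * 7) % 10) / 10.0] for i in range(20)]
    ys = [2.0 * x0 + (1.0 if x1 > 0.5 else 0.0) for x0, x1 in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_toy_data()
    for rounds in (0, 10, 100):
        model = fit_boosting(xs, ys, n_rounds=rounds)
        print(rounds, "rounds -> training MSE", round(mse(model, xs, ys), 4))
```

The sketch reports training error only. A real QSAR comparison of the kind described above would evaluate on held-out molecules rather than on the training set.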


Source
http://dx.doi.org/10.1021/acs.jcim.6b00591

Publication Analysis

Top Keywords

random forest: 12
deep neural: 12
neural nets: 12
extreme gradient: 8
gradient boosting: 8
large number: 8
xgboost: 5
boosting method: 4
method quantitative: 4
quantitative structure-activity: 4

Similar Publications

Human-wildlife conflict is an important research topic in biodiversity and conservation. Understanding the status of wildlife resources and their conflict with humans could promote the sustainable protection and management of wildlife. Wild boar is one of the most widely distributed ungulates in the world, with an increasing population and recently rising levels of conflict with humans.


Non-grain utilization of cultivated land threatens the ecological environment and soil health of farmland, which restricts grain production. To identify the key obstacle factors of cultivated soil under non-grain utilization, explore the changes in soil quality and function, and evaluate the effects of non-grain utilization on the health of farmland soil, we evaluated the soil health of farmland under different non-grain utilization types (vegetables, bamboo-abandoned, nursery-grown plant-abandoned, nursery-grown plant-rice) using the soil quality index and soil multifunctionality index methods combined with sensitivity and resistance approaches. The results showed that soil organic carbon and total nitrogen (TN) in the bamboo-abandoned soil were 95.


Background: Anxiety and depression represent prevalent yet frequently undetected mental health concerns within the older population. The challenge of identifying these conditions presents an opportunity for artificial intelligence (AI)-driven, remotely available, tools capable of screening and monitoring mental health. A critical criterion for such tools is their cultural adaptability to ensure effectiveness across diverse populations.


Predicting intra-abdominal candidiasis in elderly septic patients using machine learning based on lymphocyte subtyping: a prospective cohort study.

Front Pharmacol

December 2024

Department of Critical Care Medicine, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, China.

Objective: Intra-abdominal candidiasis (IAC) is difficult to predict in elderly septic patients with intra-abdominal infection (IAI). This study aimed to develop and validate a nomogram based on lymphocyte subtyping and clinical factors for the early and rapid prediction of IAC in elderly septic patients.

Methods: A prospective cohort study of 284 consecutive elderly patients diagnosed with sepsis and IAI was performed.


Introduction: Modifiable Areal Unit Problems are a major source of spatial uncertainty, but their impact on infectious diseases and epidemic detection is unknown.

Methods: CMS claims (2016-2019) that included infectious disease codes learned through Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) were extracted and analysed at two different units of geography: states and 'home to work commute extent' mega regions. Analysis was per member per month.

