Random forest: a classification and regression tool for compound classification and QSAR modeling.

J Chem Inf Comput Sci

Biometrics Research, Merck Research Laboratories, PO Box 2000, Rahway, New Jersey 07065, USA.

Published: October 2004

A new classification and regression tool, Random Forest, is introduced and investigated for predicting a compound's quantitative or categorical biological activity based on a quantitative description of the compound's molecular structure. Random Forest is an ensemble of unpruned classification or regression trees created by using bootstrap samples of the training data and random feature selection in tree induction. Prediction is made by aggregating (majority vote or averaging) the predictions of the ensemble. We built predictive models for six cheminformatics data sets. Our analysis demonstrates that Random Forest is a powerful tool capable of delivering performance that places it among the most accurate methods to date. We also present three additional features of Random Forest: built-in performance assessment, a measure of relative importance of descriptors, and a measure of compound similarity that is weighted by the relative importance of descriptors. It is the combination of relatively high prediction accuracy and its collection of desirable features that makes Random Forest uniquely suited for modeling in cheminformatics.
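
The procedure summarized above (bootstrap samples, unpruned trees, a random subset of descriptors searched at each split, and aggregation by majority vote) can be illustrated with the minimal pure-Python sketch below. It is not the authors' implementation. The function names (`fit_forest`, `predict`, `oob_error`, `proximity`, `descend`), the Gini split criterion, and the default of roughly the square root of the number of descriptors per split are assumptions chosen for illustration. The built-in performance assessment is taken here to be the out-of-bag (OOB) error, and the similarity measure is the usual Random Forest proximity (share of trees in which two compounds land in the same leaf). Only classification is covered (a regression forest would store leaf means and average them), and descriptor importance is not sketched.

```python
import random
from collections import Counter

# Minimal illustrative Random Forest classifier (pure Python, hypothetical names).
# Trees are tuples: ("leaf", label) or ("split", feature, threshold, left, right).


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, rows, mtry, rng):
    """Grow an unpruned tree on `rows`, trying only `mtry` random descriptors per split."""
    labels = [y[i] for i in rows]
    if len(set(labels)) == 1:
        return ("leaf", labels[0])  # pure node: stop growing (no pruning step)
    best = None
    for f in rng.sample(range(len(X[0])), mtry):
        # Thresholds: each distinct value except the largest, so both children are non-empty.
        for t in sorted({X[i][f] for i in rows})[:-1]:
            left = [i for i in rows if X[i][f] <= t]
            right = [i for i in rows if X[i][f] > t]
            # Size-weighted Gini impurity of the two children (lower is better).
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every sampled descriptor is constant on these rows
        return ("leaf", majority(labels))
    _, f, t, left, right = best
    return ("split", f, t,
            build_tree(X, y, left, mtry, rng), build_tree(X, y, right, mtry, rng))


def descend(tree, x):
    """Route descriptor vector x to a leaf; return (path, label)."""
    path = []
    while tree[0] == "split":
        _, f, t, left, right = tree
        path.append(x[f] <= t)
        tree = left if x[f] <= t else right
    return tuple(path), tree[1]


def fit_forest(X, y, n_trees=100, mtry=None, seed=0):
    """Grow each tree on a bootstrap sample; keep its in-bag row set for OOB scoring."""
    rng = random.Random(seed)
    n = len(X)
    if mtry is None:
        mtry = max(1, int(len(X[0]) ** 0.5))  # assumed default: about sqrt(#descriptors)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        forest.append((build_tree(X, y, boot, mtry, rng), set(boot)))
    return forest


def predict(forest, x):
    """Aggregate the ensemble by majority vote."""
    return majority([descend(tree, x)[1] for tree, _ in forest])


def oob_error(forest, X, y):
    """Out-of-bag error: each compound is voted on only by trees that did not train on it."""
    wrong = scored = 0
    for i, (x, label) in enumerate(zip(X, y)):
        votes = [descend(tree, x)[1] for tree, inbag in forest if i not in inbag]
        if votes:
            scored += 1
            wrong += majority(votes) != label
    return wrong / scored if scored else float("nan")


def proximity(forest, a, b):
    """Fraction of trees in which compounds a and b end up in the same leaf."""
    return sum(descend(t, a)[0] == descend(t, b)[0] for t, _ in forest) / len(forest)


if __name__ == "__main__":
    # Toy data: descriptor 0 separates the classes; descriptor 1 is constant (uninformative).
    X = [[float(i), 0.0] for i in range(10)]
    y = [0] * 5 + [1] * 5
    forest = fit_forest(X, y, n_trees=50, mtry=2, seed=1)
    print(predict(forest, [-5.0, 0.0]), predict(forest, [20.0, 0.0]))
    print("OOB error:", oob_error(forest, X, y))
    print("proximity(x0, x1):", proximity(forest, X[0], X[1]))
```

In the toy run, `mtry` equals the number of descriptors, which reduces the method to plain bagging of unpruned trees; choosing a smaller `mtry` adds the random descriptor selection that distinguishes Random Forest. Because each tree records its in-bag rows, the OOB estimate needs no separate test set.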

DOI: http://dx.doi.org/10.1021/ci034160g

Similar Publications

Background: Clear cell renal cell carcinoma (ccRCC) is the most common subtype of renal cell carcinoma (RCC). Due to the lack of symptoms until advanced stages, early diagnosis of ccRCC is challenging. Therefore, the identification of novel secreted biomarkers for the early detection of ccRCC is urgently needed.

HIV OctaScanner: A Machine Learning Approach to Unveil Proteolytic Cleavage Dynamics in HIV-1 Protease Substrates.

J Chem Inf Model

January 2025

State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China.

The rise of resistance to antiretroviral drugs due to mutations in human immunodeficiency virus-1 (HIV-1) protease is a major obstacle to effective treatment. These mutations alter the drug-binding pocket of the protease and reduce the drug efficacy by disrupting interactions with inhibitors. Traditional methods, such as biochemical assays and structural biology, are crucial for studying enzyme function but are time-consuming and labor-intensive.

Exploring Mortality and Prognostic Factors of Heart Failure with In-Hospital and Emergency Patients by Electronic Medical Records: A Machine Learning Approach.

Risk Manag Healthc Policy

January 2025

Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, New Taipei City, 235603, Taiwan.

Purpose: As heart failure (HF) progresses to advanced HF, patients experience poor quality of life, distressing symptoms, intensive care use, social distress, and eventual in-hospital death. We aimed to investigate the relationship between mortality and potential prognostic factors among inpatients and emergency patients with HF.

Patients and Methods: A case series study. Data were collected from in-hospital and emergency care patients from 2014 to 2021, including their International Classification of Diseases codes at admission and laboratory data such as blood count, liver and renal function, lipid profile, and other biochemistry results from the hospital's electronic medical records.

Supervised machine learning statistical models for visual outcome prediction in macular hole surgery: a single-surgeon, standardized surgery study.

Int J Retina Vitreous

January 2025

Department of Retina and Vitreous, Narayana Nethralaya, #121/C, 1st R Block, Chord Road, Rajaji Nagar, Bengaluru, 560010, India.

Purpose: To evaluate the predictive accuracy of various machine learning (ML) statistical models in forecasting postoperative visual acuity (VA) outcomes following macular hole (MH) surgery using preoperative optical coherence tomography (OCT) parameters.

Methods: This retrospective study included 158 eyes (151 patients) with full-thickness MHs treated between 2017 and 2023 by the same surgeon and using the same intraoperative surgical technique. Data from electronic medical records and OCT scans were extracted, with OCT-derived qualitative and quantitative MH characteristics recorded.

Ulcerative colitis (UC) is a chronic inflammatory bowel disease characterized by intestinal inflammation and autoimmune responses. This study aimed to identify diagnostic biomarkers for UC through bioinformatics analysis and machine learning, and to validate these findings through immunofluorescence staining of clinical samples. Differential expression analysis was conducted on expression profile datasets from 4 UC samples.