Species distribution modeling often involves high-dimensional environmental data. Large data volumes and multicollinearity among covariates make variable selection difficult for statistical models, which in turn undermines reliable inference about the effects of environmental factors on the spatial distribution of species. Few studies have evaluated and compared the performance of multiple machine learning (ML) models in handling multicollinearity. Here, we assessed the effectiveness of two approaches for coping with multicollinearity in ML models of habitat suitability: removal of correlated covariates and regularization. Three ML algorithms, maximum entropy (MaxEnt), random forests (RFs), and support vector machines (SVMs), were applied to the original data (OD) of 27 landscape variables, reduced data (RD) with 14 highly correlated covariates removed, and 15 principal components (PC) of the OD accounting for 90% of the original variability. The performance of the three ML models was measured with the area under the curve (AUC) and the continuous Boyce index. We collected 663 nonduplicated presence locations of Eastern wild turkeys (Meleagris gallopavo silvestris) across the state of Mississippi, United States. Of these, 453 locations separated by a distance of ≥2 km were used to train the three ML algorithms on the OD, RD, and PC data, respectively. The remaining 210 locations were used to validate the trained models and measure their performance. All three ML models had excellent performance on the RD and PC data. MaxEnt and SVMs had good performance on the OD data, indicating that the regularization in their default settings was adequate for handling multicollinearity. Weak learning of RFs through bagging appeared to alleviate multicollinearity and resulted in excellent performance on the OD data. Regularization of ML algorithms may help exploratory studies of the effects of environmental factors on the spatial distribution and habitat suitability of wildlife.
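
The covariate-removal step described above can be illustrated with a minimal sketch. It assumes a greedy filter on pairwise Pearson correlation; the abstract does not state the threshold or the procedure the authors actually used, so the 0.7 cutoff, the function names, and the toy covariates below are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant covariate has no defined correlation; treat as 0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(covariates, threshold=0.7):
    """Greedily keep covariates whose |r| with every kept covariate is <= threshold.

    `covariates` maps a name to its list of values. Iteration order decides
    which member of a correlated pair survives (the earlier one).
    The 0.7 default is an assumed cutoff, not a value taken from the study.
    """
    kept = []
    for name, values in covariates.items():
        if all(abs(pearson(values, covariates[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: temperature is almost a linear function of elevation.
covariates = {
    "elevation": [1.0, 2.0, 3.0, 4.0, 5.0],
    "temperature": [9.8, 8.1, 6.2, 3.9, 2.0],
    "forest_cover": [0.3, 0.9, 0.1, 0.8, 0.4],
}
reduced = drop_correlated(covariates)
print(reduced)
```

With these toy values, `temperature` is dropped because it is strongly (negatively) correlated with `elevation`, while `forest_cover` is retained. A greedy filter of this kind depends on covariate order, so in practice the ordering would reflect which variables are considered most ecologically meaningful.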

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6540709
DOI: http://dx.doi.org/10.1002/ece3.5177
