The ecological and environmental science communities have embraced machine learning (ML) for empirical modelling and prediction. However, going beyond prediction to draw insights into the underlying functional relationships between response variables and environmental 'drivers' is less straightforward. Deriving ecological insights from fitted ML models requires techniques to extract the 'learning' hidden in the ML models.

We revisit the theoretical background and effectiveness of four approaches for ranking independent variable importance (Gini importance, GI; permutation importance, PI; split importance, SI; and conditional permutation importance, CPI) and two approaches for inferring bivariate functional relationships (partial dependence plots, PDP; and accumulated local effect plots, ALE). We also explore the use of a surrogate model for visualization and interpretation of complex multivariate relationships between response variables and environmental drivers. We examine the challenges and opportunities for extracting ecological insights with these interpretation approaches. Specifically, we aim to improve interpretation of ML models by investigating how effectiveness relates to (a) interpretation algorithm, (b) sample size and (c) the presence of spurious explanatory variables.

We base the analysis on simulations with known underlying functional relationships between response and predictor variables, with added white noise and the presence of correlated but non-influential variables. The results indicate that deriving ecological insight is strongly affected by interpretation algorithm and spurious variables, and moderately affected by sample size. Removing spurious variables improves interpretation of ML models. Increasing sample size has limited value in the presence of spurious variables, but it does improve performance once spurious variables are omitted.

Among the four ranking methods, SI is slightly more effective than the other methods in the presence of spurious variables, while GI and SI yield higher accuracy when spurious variables are removed. PDP is more effective than ALE in retrieving underlying functional relationships, but its reliability declines sharply in the presence of spurious variables. Visualization and interpretation of the interactive effects of predictors on the response variable can be enhanced using surrogate models, including three-dimensional visualizations and the use of loess planes to represent independent variable effects and interactions.

Machine learning analysts should be aware that including correlated independent variables with no clear causal relationship to response variables in ML models can interfere with ecological inference. When ecological inference is important, ML models should be constructed with independent variables that have clear causal effects on response variables. While interpreting ML models for ecological inference remains challenging, we show that careful choice of interpretation methods, exclusion of spurious variables and adequate sample size can provide more and better opportunities to 'learn from machine learning'.
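The simulation design described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the response function (`x1**2 + 2 * x2`), the noise levels, the variable names, the sample size, and the choice of a scikit-learn random forest as the ML model. Only GI and PI are shown, since scikit-learn provides both; SI, CPI and ALE are not covered. The partial dependence curve is computed by hand so that the averaging step is explicit.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000  # assumed sample size, for illustration only

# Two true drivers with a known (assumed) functional relationship to the response.
x1 = rng.uniform(-2, 2, n)
x2 = rng.uniform(-2, 2, n)
# Spurious variable: correlated with x1 but absent from the response function.
x_spur = x1 + rng.normal(0, 0.5, n)
# Response = known function of the drivers + white noise.
y = x1**2 + 2 * x2 + rng.normal(0, 0.3, n)

names = ["x1", "x2", "x_spur"]
X = np.column_stack([x1, x2, x_spur])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Gini (impurity-based) importance, derived from the fitted trees.
gini = dict(zip(names, model.feature_importances_))

# Permutation importance: drop in held-out score when one column is shuffled.
pi = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
perm = dict(zip(names, pi.importances_mean))


def partial_dependence_curve(fitted_model, data, col, grid):
    """Average prediction over the data with column `col` fixed at each grid value."""
    curve = []
    for value in grid:
        modified = data.copy()
        modified[:, col] = value
        curve.append(fitted_model.predict(modified).mean())
    return np.array(curve)


grid = np.linspace(-2, 2, 21)
pdp_x1 = partial_dependence_curve(model, X_test, 0, grid)

for name in names:
    print(f"{name}: GI={gini[name]:.3f}  PI={perm[name]:.3f}")
print("PDP for x1:", np.round(pdp_x1, 2))
```

Dropping the `x_spur` column and refitting gives a simple way to compare rankings and the recovered `x1` curve with and without the spurious variable, which mirrors the comparison the abstract describes.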

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9292299
DOI: http://dx.doi.org/10.1111/2041-210X.13686

