The ecological and environmental science communities have embraced machine learning (ML) for empirical modelling and prediction. However, going beyond prediction to draw insights into the underlying functional relationships between response variables and environmental 'drivers' is less straightforward. Deriving ecological insights from fitted ML models requires techniques to extract the 'learning' hidden in the ML models.

We revisit the theoretical background and effectiveness of four approaches for ranking independent variable importance (Gini importance, GI; permutation importance, PI; split importance, SI; and conditional permutation importance, CPI) and two approaches for inferring bivariate functional relationships (partial dependence plots, PDP; and accumulated local effect plots, ALE). We also explore the use of a surrogate model for visualizing and interpreting complex multivariate relationships between response variables and environmental drivers. We examine the challenges and opportunities for extracting ecological insights with these interpretation approaches. Specifically, we aim to improve the interpretation of ML models by investigating how their effectiveness relates to (a) the interpretation algorithm, (b) sample size and (c) the presence of spurious explanatory variables.

We base the analysis on simulations with known underlying functional relationships between response and predictor variables, with added white noise and correlated but non-influential variables. The results indicate that deriving ecological insight is strongly affected by the interpretation algorithm and by spurious variables, and moderately affected by sample size. Removing spurious variables improves the interpretation of ML models. Increasing sample size has limited value in the presence of spurious variables, but it does improve performance once spurious variables are omitted.
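The simulation design and permutation importance can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code. The response function `2*x1 + x2**2`, the noise levels, the sample sizes and the k-nearest-neighbour regressor are all hypothetical choices. The regressor stands in for the random forests that GI, SI and CPI require. Only PI is shown, because it is model-agnostic: each predictor is shuffled in turn, and the resulting increase in prediction error is recorded as that predictor's importance.

```python
import numpy as np


def simulate(n, rng, noise_sd=0.1):
    """Hypothetical simulation: known response, white noise, one spurious driver.

    x1 and x2 are true drivers; x3 is correlated with x1 but has no effect on y.
    """
    x1 = rng.uniform(0.0, 1.0, n)
    x2 = rng.uniform(0.0, 1.0, n)
    x3 = x1 + rng.normal(0.0, 0.1, n)  # spurious: correlated with x1, not causal
    # Assumed (hypothetical) functional form plus white noise
    y = 2.0 * x1 + x2 ** 2 + rng.normal(0.0, noise_sd, n)
    return np.column_stack([x1, x2, x3]), y


def knn_predict(X_train, y_train, X_new, k=10):
    """k-nearest-neighbour regression: mean response of the k closest training rows."""
    # Squared Euclidean distances, shape (n_new, n_train)
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k smallest distances in each row (unordered)
    nearest = np.argpartition(d2, k, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def permutation_importance_mse(predict, X, y, rng, n_repeats=5):
    """Mean increase in MSE when each column of X is shuffled (model-agnostic PI)."""
    base_mse = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link to y but keeps its marginal distribution
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            importances[j] += np.mean((predict(X_perm) - y) ** 2) - base_mse
    return importances / n_repeats


rng = np.random.default_rng(42)
X_train, y_train = simulate(1000, rng)
X_test, y_test = simulate(500, rng)


def predict(X):
    return knn_predict(X_train, y_train, X)


importances = permutation_importance_mse(predict, X_test, y_test, rng)
for name, value in zip(["x1", "x2", "x3 (spurious)"], importances):
    print(f"{name}: {value:.4f}")
```

Because `x3` is built from `x1`, the fitted model can use it even though it has no causal effect on `y`. Any importance assigned to `x3` is therefore the kind of spurious signal the study warns about. Dropping the `x3` column before fitting is the simulated counterpart of removing spurious variables.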
Among the four ranking methods, SI is slightly more effective than the other methods in the presence of spurious variables, while GI and SI yield higher accuracy when spurious variables are removed. PDP is more effective than ALE in retrieving underlying functional relationships, but its reliability declines sharply in the presence of spurious variables. Visualization and interpretation of the interactive effects of predictors on the response variable can be enhanced using surrogate models, including three-dimensional visualizations and loess planes that represent independent variable effects and interactions.

Machine learning analysts should be aware that including correlated independent variables with no clear causal relationship to the response variable in ML models can interfere with ecological inference. When ecological inference is important, ML models should be constructed with independent variables that have clear causal effects on response variables. While interpreting ML models for ecological inference remains challenging, we show that careful choice of interpretation methods, exclusion of spurious variables and adequate sample size can provide more and better opportunities to 'learn from machine learning'.
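The contrast between PDP and ALE comes down to how each one varies a single predictor. The sketch below implements both from their standard definitions for one column `j`. It is a minimal illustration, not the authors' implementation. The following are all assumptions:

* the quantile binning
* the bin count
* the centring convention
* the hypothetical stand-in `model`

The stand-in `model` is an assumed true function of the kind used in the simulations. It replaces a fitted ML model so that the recovered shape can be checked.

```python
import numpy as np


def partial_dependence(predict, X, j, grid):
    """PDP: mean prediction when column j is set to each grid value for every row."""
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v  # other columns keep their observed values
        values.append(predict(X_mod).mean())
    return np.array(values)


def accumulated_local_effects(predict, X, j, n_bins=10):
    """First-order ALE for column j with quantile bins; returns bin edges and centred ALE."""
    edges = np.quantile(X[:, j], np.linspace(0.0, 1.0, n_bins + 1))
    # Bin index 0..n_bins-1; right=True places the minimum in bin 0
    bins = np.digitize(X[:, j], edges[1:-1], right=True)
    effects = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    for b in range(n_bins):
        rows = X[bins == b]
        if len(rows) == 0:
            continue
        lower, upper = rows.copy(), rows.copy()
        lower[:, j], upper[:, j] = edges[b], edges[b + 1]
        # Local effect: prediction change across the bin, only for rows observed in it
        effects[b] = np.mean(predict(upper) - predict(lower))
        counts[b] = len(rows)
    ale = np.concatenate([[0.0], np.cumsum(effects)])  # ALE value at each bin edge
    # Centre so the count-weighted mean of bin-midpoint ALE values is zero
    midpoints = (ale[:-1] + ale[1:]) / 2.0
    ale -= np.sum(midpoints * counts) / counts.sum()
    return edges, ale


# Hypothetical stand-in for a fitted ML model: an assumed true response function
def model(X):
    return 2.0 * X[:, 0] + X[:, 1] ** 2


rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 1.0, 500)
# Columns: x1, x2, and a spurious x3 correlated with x1
X = np.column_stack([x1, rng.uniform(0.0, 1.0, 500), x1 + rng.normal(0.0, 0.1, 500)])

grid = np.linspace(0.0, 1.0, 5)
pdp = partial_dependence(model, X, 0, grid)
edges, ale = accumulated_local_effects(model, X, 0)
print(np.round(np.diff(pdp) / np.diff(grid), 3))   # slopes of the PDP curve
print(np.round(np.diff(ale) / np.diff(edges), 3))  # slopes of the ALE curve
```

For this additive stand-in, both curves are straight lines with slope 2, matching the assumed effect of `x1`.

The two methods diverge when a model uses correlated predictors jointly. The PDP forces `x1` to every grid value while `x3` keeps its observed value. This creates combinations that never occur in the data, such as `x1 = 0.9` with `x3` near 0.1. ALE, by contrast, only evaluates rows whose observed `x1` lies in each bin.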
Link | Source
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9292299 | PMC
http://dx.doi.org/10.1111/2041-210X.13686 | DOI