Publications by authors named "David Warton"

New technologies for monitoring biodiversity such as environmental (e)DNA, passive acoustic monitoring, and optical sensors promise to generate automated spatiotemporal community observations at unprecedented scales and resolutions. Here, we introduce 'novel community data' as an umbrella term for these data. We review the emerging field around novel community data, focusing on new ecological questions that could be addressed; the analytical tools available or needed to make best use of these data; and the potential implications of these developments for policy and conservation.

Article Synopsis
  • Using MRI and advanced AI techniques, researchers measured the muscle volumes of 208 typically developing children aged 0 to 15 years.
  • Results showed that lower leg muscles grow asynchronously, with significant changes in muscle volume ratios occurring especially between birth and age five, which could help identify atypical growth patterns in children with neuromotor conditions.

Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses.

Article Synopsis
  • In regression modeling, measurement error models are essential for addressing uncertainty in predictor variables, but general tools for maximum likelihood estimation in such models are limited and often require advanced statistical knowledge.
  • A new algorithm is developed that allows researchers to include measurement error in various regression models using the Monte Carlo Expectation-Maximization (MCEM) technique, enabling them to adapt existing models easily.
  • The method is validated through simulations across different model types and comes with a software package in R (refitME) that simplifies the process of adjusting fitted models for measurement errors.

The life span of leaves increases with their mass per unit area (LMA). It is unclear why. Here, we show that this empirical generalization (the foundation of the worldwide leaf economics spectrum) is a consequence of natural selection, maximizing average net carbon gain over the leaf life cycle.


The accurate extraction of species-abundance information from DNA-based data (metabarcoding, metagenomics) could contribute usefully to diet analysis and food-web reconstruction, the inference of species interactions, the modelling of population dynamics and species distributions, the biomonitoring of environmental state and change, and the inference of false positives and negatives. However, multiple sources of bias and noise in sampling and processing combine to inject error into DNA-based data sets. To understand how to extract abundance information, it is useful to distinguish two concepts.

Article Synopsis
  • Multiple imputation and maximum likelihood estimation are two common methods for handling missing data, but improper multiple imputation can act similarly to a stochastic expectation-maximization approach.
  • The article suggests that traditional model selection tools like Akaike's Information Criterion (AIC) and Bayesian Information Criterion (BIC) can help effectively choose the best imputation model, which is crucial to avoid bias in analysis.
  • Simulations show that not only can incorrect imputation lead to biased parameter estimates, but also overfitting the imputation model can have negative effects, highlighting the need for careful model selection in imputation strategies.

Urbanised estuaries, ports and harbours are often utilised for recreational purposes, notably recreational angling. Yet there has been little quantitative assessment of the footprint and intensity of these activities at scales suitable for spatial management. Urban and industrialised estuaries have previously been considered as having low conservation value, perhaps due to issues with contamination and disturbance.


Generalized linear latent variable models (GLLVM) are popular tools for modeling multivariate, correlated responses. Such data are often encountered, for instance, in ecological studies, where presence-absences, counts, or biomass of interacting species are collected from a set of sites. Until very recently, the main challenge in fitting GLLVMs has been the lack of computationally efficient estimation methods.


Generalized linear latent variable models (GLLVMs) offer a general framework for flexibly analyzing data involving multiple responses. When fitting such models, two of the major challenges are selecting the order, that is, the number of factors, and an appropriate structure for the loading matrix, typically a sparse structure. Motivated by the application of GLLVMs to study marine species assemblages in the Southern Ocean, we propose the Ordered Factor LASSO or OFAL penalty for order selection and achieving sparsity in GLLVMs.

Article Synopsis
  • Bootstrapping of residuals in regression can face challenges when residuals are not identically distributed, especially in logistic or Poisson regression settings.
  • A new method called the PIT-trap has been proposed, using probability integral transform (PIT) residuals, which assume a known parametric form for the data's marginal distribution.
  • This method maintains correlation in multivariate data without requiring a model and shows improved performance in simulations compared to traditional resampling techniques.

While data transformation is a common strategy to satisfy linear modeling assumptions, a theoretical result is used to show that transformation cannot reasonably be expected to stabilize variances for small counts. Under broad assumptions, as counts get smaller, it is shown that the variance becomes proportional to the mean under monotonic transformations g(·) that satisfy g(0)=0, excepting a few pathological cases. A suggested rule-of-thumb is that if many predicted counts are less than one then data transformation cannot reasonably be expected to stabilize variances, even for a well-chosen transformation.
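The small-count behaviour described above can be checked numerically. The following is a minimal sketch, not the paper's derivation: it assumes Poisson counts and the square-root transformation (which satisfies g(0)=0), and the function names are hypothetical. It computes the exact variance of g(Y) by summing over the Poisson probability mass function.

```python
import math

def poisson_pmf(k, mu):
    # Computed on the log scale to avoid overflow for large k.
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def var_of_transformed(mu, g=math.sqrt, tail_sd=12):
    """Variance of g(Y) for Y ~ Poisson(mu), summing the pmf over a wide range."""
    upper = int(mu + tail_sd * math.sqrt(mu) + 30)  # truncation point (assumption)
    ks = range(upper + 1)
    probs = [poisson_pmf(k, mu) for k in ks]
    mean_g = sum(p * g(k) for k, p in zip(ks, probs))
    mean_g2 = sum(p * g(k) ** 2 for k, p in zip(ks, probs))
    return mean_g2 - mean_g ** 2

def variance_table(mus=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Rows of (mean, variance of sqrt(Y), variance divided by mean)."""
    rows = []
    for mu in mus:
        v = var_of_transformed(mu)
        rows.append((mu, v, v / mu))
    return rows

if __name__ == "__main__":
    for mu, v, ratio in variance_table():
        print(f"mu={mu:7.2f}  var(sqrt(Y))={v:.4f}  var/mu={ratio:.4f}")
```

For large means the variance of the square-root counts settles near 1/4, which is the usual variance-stabilizing behaviour. For means well below one, the variance divided by the mean approaches a constant instead, so the variance tracks the mean and is not stabilized.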

Article Synopsis
  • Recent technological advances have led to the development of multivariate models that allow ecologists to analyze species abundances together and assess interactions between different taxa and environmental variables.
  • These joint models can help estimate correlations between species, perform multivariate environmental impact assessments, and handle missing data, enhancing predictive accuracy across species.
  • The text presents examples of these methods in action, discusses new computational tools available to researchers, and outlines potential future developments in this field.

We propose a new variable selection criterion designed for use with forward selection algorithms: the score information criterion (SIC). The proposed criterion is based on score statistics which incorporate correlated response data. The main advantage of the SIC is that it is much faster to compute than existing model selection criteria when the number of predictor variables added to a model is large, because the SIC can be computed for all candidate models without actually fitting them.


Species distribution models (SDMs) are an important tool for studying the patterns of species across environmental and geographic space. For community data, a common approach involves fitting an SDM to each species separately, although the large number of models makes interpretation difficult and fails to exploit any similarities between individual species responses. A recently proposed alternative that can potentially overcome these difficulties is species archetype models (SAMs), a model-based approach that clusters species based on their environmental response.


Presence-only data, where information is available concerning species presence but not species absence, are subject to bias due to observers being more likely to visit and record sightings at some locations than others (hereafter "observer bias"). In this paper, we describe and evaluate a model-based approach to accounting for observer bias directly: we model presence locations as a function of known observer bias variables (such as accessibility variables) in addition to environmental variables, then condition on a common level of bias to make predictions of species occurrence free of such observer bias. We implement this idea using point process models with a LASSO penalty, a new presence-only method related to maximum entropy modelling, which implicitly addresses the "pseudo-absence problem" of where to locate pseudo-absences (and how many).


We provide the first global test of the idea that introduced species have greater seed dispersal distances than do native species, using data for 51 introduced and 360 native species from the global literature. Counter to our expectations, there was no significant difference in mean or maximum dispersal distance between introduced and native species. Next, we asked whether differences in dispersal distance might have been obscured by differences in seed mass, plant height and dispersal syndrome, all traits that affect dispersal distance and which can differ between native and introduced species.


1. The problems of analysing used-available data and presence-only data are equivalent, and this paper uses this equivalence as a platform for exploring opportunities for advancing analysis methodology. 2.


Modeling the spatial distribution of a species is a fundamental problem in ecology. A number of modeling methods have been developed, an extremely popular one being MAXENT, a maximum entropy modeling approach. In this article, we show that MAXENT is equivalent to a Poisson regression model and hence is related to a Poisson point process model, differing only in the intercept term, which is scale-dependent in MAXENT.
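The equivalence stated above can be illustrated on a toy grid. This is a simplified sketch under loud assumptions: a single linear feature, no regularization (unlike the MAXENT software), made-up counts, and hypothetical function names. It fits a Poisson regression to presence counts per grid cell and, separately, a maximum-likelihood Gibbs distribution over the same cells, then compares the slopes.

```python
import math

# Toy data (invented for illustration): one covariate value and one
# presence count per grid cell.
X = [0.0, 0.25, 0.5, 0.75, 1.0]
Y = [1, 0, 2, 3, 6]

def fit_poisson(x, y, iters=50):
    """Newton's method for the model log E[y_i] = a + b * x_i."""
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        m = [math.exp(a + b * xi) for xi in x]
        # Score vector and information matrix of the Poisson log-likelihood.
        s_a = sum(yi - mi for yi, mi in zip(y, m))
        s_b = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, m))
        i_aa = sum(m)
        i_ab = sum(mi * xi for xi, mi in zip(x, m))
        i_bb = sum(mi * xi * xi for xi, mi in zip(x, m))
        det = i_aa * i_bb - i_ab * i_ab
        a += (i_bb * s_a - i_ab * s_b) / det
        b += (i_aa * s_b - i_ab * s_a) / det
    return a, b

def gibbs_probs(x, b):
    """Cell probabilities p_i proportional to exp(b * x_i), summing to one."""
    w = [math.exp(b * xi) for xi in x]
    z = sum(w)
    return [wi / z for wi in w]

def fit_maxent(x, y, iters=50):
    """Newton's method maximizing sum_i y_i * log p_i over the slope b."""
    n = sum(y)
    b = 0.0
    for _ in range(iters):
        p = gibbs_probs(x, b)
        mean_x = sum(pi * xi for pi, xi in zip(p, x))
        var_x = sum(pi * (xi - mean_x) ** 2 for pi, xi in zip(p, x))
        grad = sum(yi * xi for xi, yi in zip(x, y)) - n * mean_x
        b += grad / (n * var_x)
    return b

if __name__ == "__main__":
    a_pois, b_pois = fit_poisson(X, Y)
    b_maxent = fit_maxent(X, Y)
    print("Poisson intercept and slope:", a_pois, b_pois)
    print("Maxent slope:", b_maxent)
```

In this sketch the two slopes agree to numerical precision, and the Poisson fitted means divided by the total number of presences reproduce the Gibbs probabilities. The only difference is the intercept, which the normalizing constant absorbs in the maximum entropy fit.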


In allometry, bivariate techniques related to principal component analysis are often used in place of linear regression, and primary interest is in making inferences about the slope. We demonstrate that the current inferential methods are not robust to bivariate contamination, and consider four robust alternatives to the current methods: a novel sandwich estimator approach, using robust covariance matrices derived via an influence function approach, Huber's M-estimator and the fast-and-robust bootstrap. Simulations demonstrate that Huber's M-estimators are highly efficient and robust against bivariate contamination, and when combined with the fast-and-robust bootstrap, we can make accurate inferences even from small samples.


The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data.


It has long been believed that plant species from the tropics have higher levels of traits associated with resistance to herbivores than do species from higher latitudes. A meta-analysis recently showed that the published literature does not support this theory. However, the idea has never been tested using data gathered with consistent methods from a wide range of latitudes.


Leaf mechanical properties strongly influence leaf lifespan, plant-herbivore interactions, litter decomposition and nutrient cycling, but global patterns in their interspecific variation and underlying mechanisms remain poorly understood. We synthesize data across the three major measurement methods, permitting the first global analyses of leaf mechanics and associated traits, for 2819 species from 90 sites worldwide. Key measures of leaf mechanical resistance varied c.


A modification of generalized estimating equations (GEEs) methodology is proposed for hypothesis testing of high-dimensional data, with particular interest in multivariate abundance data in ecology, an important application of interest in thousands of environmental science studies. Such data are typically counts characterized by high dimensionality (in the sense that cluster size exceeds number of clusters, n>K) and over-dispersion relative to the Poisson distribution. Usual GEE methods cannot be applied in this setting primarily because sandwich estimators become numerically unstable as n increases.
