New technologies for monitoring biodiversity such as environmental (e)DNA, passive acoustic monitoring, and optical sensors promise to generate automated spatiotemporal community observations at unprecedented scales and resolutions. Here, we introduce 'novel community data' as an umbrella term for these data. We review the emerging field around novel community data, focusing on new ecological questions that could be addressed; the analytical tools available or needed to make best use of these data; and the potential implications of these developments for policy and conservation.
Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses.
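To make the GLLVM idea concrete, the sketch below simulates counts from a Poisson GLLVM with a log link. Site-level latent variables u_i are shared by all species, and each species j has its own loadings theta_j, so any correlation between species arises only through the latent variables. With an identity link and Gaussian errors, the same structure reduces to a factor analysis model. The helper `simulate_poisson_gllvm` and all parameter values are hypothetical choices for illustration, not taken from the source.

```python
import numpy as np


def simulate_poisson_gllvm(n_sites, n_species, n_latent, seed=0):
    """Simulate an (n_sites x n_species) count matrix from a Poisson GLLVM.

    Hypothetical illustrative model (parameter values are arbitrary):
        u_i ~ N(0, I),  log E[y_ij | u_i] = beta0_j + u_i . theta_j
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_sites, n_latent))           # latent site scores
    theta = rng.normal(0.0, 0.7, (n_species, n_latent))    # species loadings
    beta0 = rng.normal(1.0, 0.5, n_species)                # species intercepts
    eta = beta0 + u @ theta.T                              # linear predictor, (n_sites, n_species)
    y = rng.poisson(np.exp(eta))                           # counts, independent given u
    return y, u, theta


y, u, theta = simulate_poisson_gllvm(n_sites=500, n_species=8, n_latent=2)
# Between-species correlation is induced only by the shared latent variables u.
species_corr = np.corrcoef(y, rowvar=False)
print(species_corr.shape)  # (8, 8)
```

Fitting such a model reverses this process: the loadings and latent scores are estimated from y, which is the computationally demanding step the GLLVM literature addresses.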
The life span of leaves increases with their mass per unit area (LMA). It is unclear why. Here, we show that this empirical generalization (the foundation of the worldwide leaf economics spectrum) is a consequence of natural selection, maximizing average net carbon gain over the leaf life cycle.
The accurate extraction of species-abundance information from DNA-based data (metabarcoding, metagenomics) could contribute usefully to diet analysis and food-web reconstruction, the inference of species interactions, the modelling of population dynamics and species distributions, the biomonitoring of environmental state and change, and the inference of false positives and negatives. However, multiple sources of bias and noise in sampling and processing combine to inject error into DNA-based data sets. To understand how to extract abundance information, it is useful to distinguish two concepts.
Urbanised estuaries, ports and harbours are often utilised for recreational purposes, notably recreational angling. Yet there has been little quantitative assessment of the footprint and intensity of these activities at scales suitable for spatial management. Urban and industrialised estuaries have previously been considered as having low conservation value, perhaps due to issues with contamination and disturbance.
Generalized linear latent variable models (GLLVM) are popular tools for modeling multivariate, correlated responses. Such data are often encountered, for instance, in ecological studies, where presence-absences, counts, or biomass of interacting species are collected from a set of sites. Until very recently, the main challenge in fitting GLLVMs has been the lack of computationally efficient estimation methods.
Generalized linear latent variable models (GLLVMs) offer a general framework for flexibly analyzing data involving multiple responses. When fitting such models, two of the major challenges are selecting the order, that is, the number of factors, and choosing an appropriate structure for the loading matrix, typically a sparse structure. Motivated by the application of GLLVMs to study marine species assemblages in the Southern Ocean, we propose the Ordered Factor LASSO (OFAL) penalty for order selection and for achieving sparsity in GLLVMs.
Data transformation is a common strategy for satisfying linear modeling assumptions, but a theoretical result shows that transformation cannot reasonably be expected to stabilize variances for small counts. Under broad assumptions, as counts get smaller, the variance becomes proportional to the mean under any monotonic transformation g(·) satisfying g(0)=0, excepting a few pathological cases. A suggested rule of thumb is that if many predicted counts are less than one, then data transformation cannot reasonably be expected to stabilize variances, even for a well-chosen transformation.
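A quick numerical check of this result, assuming Poisson counts (an assumption made here for illustration; the result itself is stated under broader conditions): when the mean mu is small, Y is almost always 0 or 1, so g(Y) is approximately g(1) times a Bernoulli(mu) variable and Var[g(Y)] is approximately g(1)^2 * mu. The sketch computes Var[g(Y)] exactly from the Poisson probability mass function (truncated at a large count) for log(y+1) and sqrt(y), both of which satisfy g(0)=0. The function name `transformed_var` is hypothetical.

```python
import math


def transformed_var(mu, g, kmax=60):
    """Exact Var[g(Y)] for Y ~ Poisson(mu), truncating the pmf at kmax."""
    pmf = [math.exp(-mu) * mu**k / math.factorial(k) for k in range(kmax + 1)]
    m1 = sum(p * g(k) for k, p in enumerate(pmf))       # E[g(Y)]
    m2 = sum(p * g(k) ** 2 for k, p in enumerate(pmf))  # E[g(Y)^2]
    return m2 - m1**2


transforms = {"log(y+1)": math.log1p, "sqrt(y)": math.sqrt}
for name, g in transforms.items():
    for mu in [0.01, 0.1, 1.0, 10.0]:
        ratio = transformed_var(mu, g) / mu
        print(f"{name:9s} mean={mu:5.2f}  Var/mean={ratio:.3f}")
```

For small means the Var/mean ratio settles near a constant (about (log 2)^2 ≈ 0.48 for log(y+1) and about 1 for sqrt(y)), so neither transformation removes the mean-variance relationship in that regime.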
We propose a new variable selection criterion designed for use with forward selection algorithms: the score information criterion (SIC). The proposed criterion is based on score statistics that incorporate correlated response data. The main advantage of the SIC is that it is much faster to compute than existing model selection criteria when the number of predictor variables added to a model is large, because the SIC can be computed for all candidate models without actually fitting them.
Species distribution models (SDMs) are an important tool for studying the patterns of species across environmental and geographic space. For community data, a common approach involves fitting an SDM to each species separately, although the large number of models makes interpretation difficult and fails to exploit any similarities between individual species responses. A recently proposed alternative that can potentially overcome these difficulties is species archetype models (SAMs), a model-based approach that clusters species based on their environmental response.
Presence-only data, where information is available concerning species presence but not species absence, are subject to bias due to observers being more likely to visit and record sightings at some locations than others (hereafter "observer bias"). In this paper, we describe and evaluate a model-based approach to accounting for observer bias directly: presence locations are modelled as a function of known observer bias variables (such as accessibility variables) in addition to environmental variables, and predictions are then made conditional on a common level of bias, giving predictions of species occurrence free of such observer bias. We implement this idea using point process models with a LASSO penalty, a new presence-only method related to maximum entropy modelling, which implicitly addresses the "pseudo-absence problem" of where to locate pseudo-absences (and how many).
We provide the first global test of the idea that introduced species have greater seed dispersal distances than do native species, using data for 51 introduced and 360 native species from the global literature. Counter to our expectations, there was no significant difference in mean or maximum dispersal distance between introduced and native species. Next, we asked whether differences in dispersal distance might have been obscured by differences in seed mass, plant height and dispersal syndrome, all traits that affect dispersal distance and which can differ between native and introduced species.
1. The problems of analysing used-available data and presence-only data are equivalent, and this paper uses this equivalence as a platform for exploring opportunities for advancing analysis methodology.
Modeling the spatial distribution of a species is a fundamental problem in ecology. A number of modeling methods have been developed, an extremely popular one being MAXENT, a maximum entropy modeling approach. In this article, we show that MAXENT is equivalent to a Poisson regression model and hence is related to a Poisson point process model, differing only in the intercept term, which is scale-dependent in MAXENT.
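The point that only the intercept is scale-dependent can be seen in a plain Poisson regression on gridded presence counts. The sketch below is a Poisson regression fitted by iteratively reweighted least squares (IRLS), not an implementation of MAXENT itself. It fits the same counts twice: once with an offset of log(cell area), so the intercept is on a per-unit-area intensity scale, and once with no offset, so the intercept is on a per-cell scale. The grid, cell area, intensity exp(1 + 0.8x), and the helper `poisson_irls` are all hypothetical.

```python
import numpy as np


def poisson_irls(X, y, offset=0.0, max_iter=100, tol=1e-10):
    """Fit log E[y] = offset + X @ beta by iteratively reweighted least squares."""
    mu = y + 0.1                 # common starting values for a Poisson GLM
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta - offset + (y - mu) / mu          # working response, offset removed
        XtW = X.T * mu                            # log link: working weights equal mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = offset + X @ beta_new
        mu = np.exp(eta)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Hypothetical grid: 400 cells of area 0.5 along one environmental gradient x.
rng = np.random.default_rng(2)
x = np.linspace(-2.0, 2.0, 400)
area = 0.5
y = rng.poisson(area * np.exp(1.0 + 0.8 * x))
X = np.column_stack([np.ones_like(x), x])

beta_intensity = poisson_irls(X, y, offset=np.log(area))  # per-unit-area intercept
beta_counts = poisson_irls(X, y)                          # per-cell intercept
print(beta_intensity, beta_counts)
```

The two fits share the same slope, and their intercepts differ by exactly log(area): changing the spatial scale at which intensity is expressed moves only the intercept.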
In allometry, bivariate techniques related to principal component analysis are often used in place of linear regression, and primary interest is in making inferences about the slope. We demonstrate that the current inferential methods are not robust to bivariate contamination, and consider four robust alternatives to the current methods: a novel sandwich estimator approach, robust covariance matrices derived via an influence function approach, Huber's M-estimator, and the fast-and-robust bootstrap. Simulations demonstrate that Huber's M-estimators are highly efficient and robust against bivariate contamination and, when combined with the fast-and-robust bootstrap, allow accurate inferences even from small samples.
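For context, one widely used technique of this kind is the standardised major axis (SMA), whose slope is sign(r) * sd(y) / sd(x). The sketch below computes this standard, non-robust SMA slope, the sort of estimator the robust alternatives above are meant to replace, and shows how a single bivariate outlier can be added to the data. It does not implement any of the four robust methods. The helper `sma_slope`, the simulated data, and the outlier location are hypothetical.

```python
import numpy as np


def sma_slope(x, y):
    """Standardised major axis slope: sign(r) * sd(y) / sd(x) (not robust)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    return np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)


rng = np.random.default_rng(3)
x = rng.normal(size=30)
y = 0.75 * x + rng.normal(scale=0.3, size=30)

# One hypothetical bivariate outlier placed well off the main axis.
x_contam = np.append(x, 6.0)
y_contam = np.append(y, -6.0)
print(sma_slope(x, y), sma_slope(x_contam, y_contam))
```

Because the estimate is built from sample standard deviations and a sample correlation, a single gross outlier feeds directly into all three quantities, which is why non-robust inference about the slope can break down under contamination.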
The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data.
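The interpretability point can be illustrated with a minimal grouped-binomial logistic regression fitted by IRLS (Newton's method). The slope is a change in log-odds, so exp(slope) is an odds ratio per unit of x. A linear model fitted to arcsin(sqrt(p)) has no comparably direct interpretation. The helpers `logistic_irls` and `arcsine_sqrt`, the gradient x, the 20 trials per group, and the true coefficients are hypothetical choices for this sketch.

```python
import numpy as np


def logistic_irls(x, successes, trials, n_iter=100, tol=1e-10):
    """Grouped-binomial logistic regression, logit(p) = b0 + b1*x, fitted by IRLS."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = trials * p * (1.0 - p)                   # binomial working weights
        score = X.T @ (successes - trials * p)       # gradient of the log-likelihood
        step = np.linalg.solve((X.T * W) @ X, score) # Newton step (= IRLS update)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def arcsine_sqrt(p):
    """The arcsine square root transform of a proportion."""
    return np.arcsin(np.sqrt(p))


rng = np.random.default_rng(5)
x = np.linspace(-2.0, 2.0, 9)
trials = np.full(9, 20)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x)))
successes = rng.binomial(trials, p_true)

beta = logistic_irls(x, successes, trials)
print("odds ratio per unit x:", np.exp(beta[1]))
print("arcsine-transformed proportions:", arcsine_sqrt(successes / trials))
```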
• It has long been believed that plant species from the tropics have higher levels of traits associated with resistance to herbivores than do species from higher latitudes. A meta-analysis recently showed that the published literature does not support this theory. However, the idea has never been tested using data gathered with consistent methods from a wide range of latitudes.
Leaf mechanical properties strongly influence leaf lifespan, plant-herbivore interactions, litter decomposition and nutrient cycling, but global patterns in their interspecific variation and underlying mechanisms remain poorly understood. We synthesize data across the three major measurement methods, permitting the first global analyses of leaf mechanics and associated traits, for 2819 species from 90 sites worldwide. Key measures of leaf mechanical resistance varied c.
A modification of generalized estimating equations (GEEs) methodology is proposed for hypothesis testing of high-dimensional data, with particular interest in multivariate abundance data in ecology, an important application of interest in thousands of environmental science studies. Such data are typically counts characterized by high dimensionality (in the sense that cluster size exceeds number of clusters, n>K) and over-dispersion relative to the Poisson distribution. Usual GEE methods cannot be applied in this setting primarily because sandwich estimators become numerically unstable as n increases.
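One way to see why high dimensionality (n>K) causes trouble: any unstructured n x n covariance matrix estimated from only K cluster-level residual vectors has rank at most K - 1 once centred, so it is singular and cannot be inverted. The sketch below shows this rank deficiency with random residuals. It is a simplified illustration under the assumption of an unstructured covariance estimate across the n responses, not the GEE sandwich estimator itself or the modification proposed in the paper. The values of K and n are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
K, n = 20, 100                     # K clusters (e.g. sites), n responses per cluster (e.g. species)
residuals = rng.standard_normal((K, n))

# Centre across clusters and form the unstructured n x n covariance estimate.
resid_c = residuals - residuals.mean(axis=0)
S = resid_c.T @ resid_c / (K - 1)

rank = np.linalg.matrix_rank(S)
print(rank, "out of", n)           # rank is at most K - 1, far below n
```

Since S has at most K - 1 non-zero eigenvalues, any procedure that needs its inverse is ill-posed when n>K, and increasing n only makes the gap larger.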