When building regression models for multivariate abundance data in ecology, it is important to account for correlation among species. Moreover, there is often evidence that species exhibit some degree of homogeneity in their responses to each environmental predictor, and that most species respond to only a subset of the predictors. We propose a generalized estimating equation (GEE) approach for simultaneous homogeneity pursuit (i.e., grouping species with similar coefficient values while allowing the groups to differ across covariates) and variable selection in regression models for multivariate abundance data.
Human remains are often found with textile materials, making textiles a ubiquitous source of physical evidence. Human remains are also frequently discovered in outdoor environments, where they are more exposed to scavenging and soft-tissue decomposition. In such cases, investigators can find it difficult to estimate the postmortem interval (PMI) using traditional reconstructive methods.
Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, classical tools such as factor analysis and principal component analysis have a well-established theory and fast algorithms. Generalized linear latent variable models (GLLVMs) extend such factor models to non-Gaussian responses.
Microarray studies aimed at identifying genes associated with an outcome of interest usually produce noisy measurements of a large number of gene expression features from a small number of subjects. One common approach to analyzing such high-dimensional data is to use linear errors-in-variables (EIV) models; however, current methods for fitting such models are computationally expensive. In this paper, we present two efficient screening procedures, corrected penalized marginal screening (PMSc) and corrected sure independence screening (SISc), that reduce the number of variables before the final model is built.
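The corrected procedures (PMSc and SISc) adjust for measurement error and are not reproduced here. For orientation only, the following is a minimal sketch of classical, uncorrected sure independence screening: covariates are standardized, ranked by the absolute value of their marginal correlation with the response, and the top `d` are kept. It assumes a continuous response and a design matrix with no constant columns; the function name `sure_independence_screening` and the cutoff `d` are illustrative choices, not the authors' implementation.

```python
import numpy as np

def sure_independence_screening(X, y, d):
    """Return indices of the d covariates with the largest absolute
    marginal correlation with y (classical SIS, no EIV correction)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize columns and response so marginal scores are correlations.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    # Componentwise (marginal) regression coefficients on standardized data.
    omega = Xs.T @ ys / len(ys)
    # Keep the d covariates with the largest |omega|.
    return np.argsort(-np.abs(omega))[:d]

# Usage: 200 subjects, 1000 features, only features 0, 1, 2 matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))
y = 3 * X[:, 0] - 3 * X[:, 1] + 3 * X[:, 2] + 0.5 * rng.standard_normal(200)
kept = sure_independence_screening(X, y, d=10)
```

With noisy covariates, as in the EIV setting described above, these naive marginal scores are attenuated, which is the motivation for the corrected variants.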
Identifying generalisable processes that underpin population dynamics is crucial for understanding successional patterns. While longitudinal or chronosequence data are powerful tools for doing so, the traditional focus on community-level shifts in taxonomic and functional composition, rather than on species-level trait-demography relationships, has made generalisation difficult. Using joint species distribution models, we demonstrate how three traits (photosynthetic rate, adult stature, and seed mass) moderate recruitment and sapling mortality rates of 46 woody species during secondary succession.
Multivariate spatial data, where multiple responses are simultaneously recorded across spatially indexed observational units, are routinely collected in a wide variety of disciplines. For example, the Southern Ocean Continuous Plankton Recorder survey collects records of zooplankton communities in the Indian sector of the Southern Ocean, with the aim of identifying and quantifying spatial patterns in biodiversity in response to environmental change. One increasingly popular method for modeling such data is spatial generalized linear latent variable models (GLLVMs), where the correlation across sites is captured by a spatial covariance function in the latent variables.
Social information obtained from heterospecifics can enhance individual fitness by reducing environmental uncertainty, making it an important driver of mixed-species grouping behavior. Heterospecific groups are well documented among fishes, yet are notably more prevalent among juveniles than among more advanced life stages, implying that the adaptive value of joining other species is greater during this developmental period. We propose that this phenomenon can be explained by the heightened ecological relevance of heterospecifically produced cues pertaining to predation risk and/or resources, as the body-size uniformity inherent in early ontogeny yields greater overlap in predator and prey guild membership across juveniles of disparate taxa.
Spatiotemporal patterns in biological communities are typically driven by environmental factors and species interactions. Spatial data from communities are naturally described by stacking models for all species in the community. Two important considerations in such multispecies or joint species distribution models (JSDMs) are measurement errors and correlations between species.
Generalized linear latent variable models (GLLVMs) are popular tools for modeling multivariate, correlated responses. Such data are often encountered, for instance, in ecological studies, where presence-absences, counts, or biomass of interacting species are collected from a set of sites. Until very recently, the main challenge in fitting GLLVMs has been the lack of computationally efficient estimation methods.
Delineating naturally occurring and self-sustaining subpopulations (stocks) of a species is an important task, especially for species harvested from the wild. Despite its central importance to natural resource management, analytical methods used to delineate stocks are often, and increasingly, borrowed from superficially similar analytical tasks in human genetics, even though models specifically for stock identification have been previously developed. Unfortunately, the analytical tasks in resource management and human genetics are not identical: questions about humans are typically aimed at inferring ancestry (often referred to as "admixture") rather than breeding stocks.
In addition to the processes structuring free-living communities, host-associated microbiota are directly or indirectly shaped by the host. Therefore, microbiota data have a hierarchical structure where samples are nested under one or several variables representing host-specific factors, often spanning multiple levels of biological organization. Current statistical methods do not accommodate this hierarchical data structure and therefore cannot explicitly account for the effect of the host in structuring the microbiota.
Generalized linear latent variable models (GLLVMs) offer a general framework for flexibly analyzing data involving multiple responses. When fitting such models, two of the major challenges are selecting the order, that is, the number of factors, and selecting an appropriate structure for the loading matrix, typically a sparse one. Motivated by the application of GLLVMs to study marine species assemblages in the Southern Ocean, we propose the Ordered Factor LASSO (OFAL) penalty for order selection and for achieving sparsity in GLLVMs.
Arctic plant communities are being altered by climate change. The magnitude of these alterations depends on whether species distributions are determined by macroclimatic conditions, by factors related to local topography, or by biotic interactions. Our current understanding of the relative importance of these conditions is limited by the scarcity of studies, especially in the High Arctic.
Dormancy and germination requirements determine the timing and magnitude of seedling emergence, with important consequences for seedling survival and growth. Physiological dormancy is the most widespread form of dormancy in flowering plants, yet the seed ecology of species with this dormancy type is poorly understood in fire-prone vegetation. The role of seasonal temperatures as germination cues in these habitats is often overlooked due to a focus on direct fire cues such as heat shock and smoke, and little is known about the combined effects of multiple fire-related cues and environmental cues as these are seldom assessed in combination.
Technological advances have enabled a new class of multivariate models for ecology, making it possible to specify a statistical model for abundances jointly across many taxa and to simultaneously explore interactions across taxa and the response of abundance to environmental variables. Joint models can be used for several purposes of interest to ecologists, including estimating patterns of residual correlation across taxa, ordination, multivariate inference about environmental effects and environment-by-trait interactions, accounting for missing predictors, and improving predictions in situations where knowledge of some species can be leveraged to predict others. We demonstrate this by example and discuss recent computational tools and future directions.
Species distribution models (SDMs) are an important tool for studying the patterns of species across environmental and geographic space. For community data, a common approach involves fitting an SDM to each species separately, although the large number of models makes interpretation difficult and fails to exploit any similarities between individual species responses. A recently proposed alternative that can potentially overcome these difficulties is species archetype models (SAMs), a model-based approach that clusters species based on their environmental response.
Most plant species have a range of traits that deter herbivores. However, understanding of how different defences are related to one another is surprisingly weak. Many authors argue that defence traits trade off against one another, while others argue that they form coordinated defence syndromes.
The arcsine square root transformation has long been a standard procedure for analyzing proportion data in ecology, with applications to data sets containing both binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data.
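To make the contrast concrete, here is a minimal sketch, not the authors' analysis, of the arcsine square root transform next to a binomial logistic regression fitted by iteratively reweighted least squares (IRLS). It assumes a design matrix with an intercept column and success/trial counts per observation; the function names `arcsine_sqrt` and `logistic_irls` are hypothetical, and in practice a standard GLM routine would normally be used instead of hand-rolled IRLS.

```python
import numpy as np

def arcsine_sqrt(p):
    """Classical angular (arcsine square root) transform of a proportion."""
    return np.arcsin(np.sqrt(p))

def logistic_irls(X, successes, trials, n_iter=50, tol=1e-10):
    """Binomial logistic regression (logit link) fitted by IRLS.
    X must already contain an intercept column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(successes, dtype=float)
    m = np.asarray(trials, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))   # fitted success probabilities
        w = m * mu * (1.0 - mu)           # binomial working weights
        z = eta + (y - m * mu) / w        # working response on the logit scale
        XtW = X.T * w                     # X^T W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Usage: two groups, 30/100 successes when x = 0 and 60/100 when x = 1.
X = np.array([[1.0, 0.0], [1.0, 1.0]])
beta = logistic_irls(X, successes=[30, 60], trials=[100, 100])
odds_ratio = np.exp(beta[1])
```

The interpretability point is visible here: `beta[1]` is a log odds ratio, so `exp(beta[1])` directly reports how the odds of success change between groups, whereas a difference on the arcsine square root scale has no comparably simple reading.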