Increasingly, large, nationally representative health and behavioral surveys conducted under a multistage stratified sampling scheme collect high-dimensional data with correlation structured along some domain (eg, wearable sensor data measured continuously and correlated over time, imaging data with spatiotemporal correlation) with the goal of associating these data with health outcomes. Analysis of this sort requires novel methodologic work at the intersection of survey statistics and functional data analysis. Here, we address this gap in the literature by proposing an estimation and inferential framework for generalizable scalar-on-function regression models for data collected under a complex survey design. We propose to: (1) estimate functional regression coefficients using weighted score equations; and (2) perform inference using novel functional balanced repeated replication and survey-weighted bootstrap for multistage survey designs. This is the first frequentist study to discuss the estimation of scalar-on-function regression models in the context of complex survey studies and to assess the validity of various inferential techniques based on resampling methods via a comprehensive simulation study. We apply our methods to predict mortality from diurnal activity profiles measured via wearable accelerometers in the National Health and Nutrition Examination Survey 2003-2006 data. The proposed computationally efficient methods are implemented in the R software package surveySoFR.
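
The two steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the surveySoFR interface, and the abstract does not state the paper's model, basis, or smoothing choices. The sketch assumes a linear model with an identity link, an unpenalized Fourier basis for the coefficient function, curves observed on a common equally spaced grid, and a design with exactly two primary sampling units (PSUs) per stratum. All function names (`fourier_basis`, `weighted_sofr`, `sylvester_hadamard`, `brr_se`) are hypothetical.

```python
import numpy as np


def fourier_basis(s, n_basis):
    """Evaluate a constant plus sine/cosine pairs on a grid s in [0, 1)."""
    cols = [np.ones_like(s)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(2 * np.pi * k * s))
        if len(cols) < n_basis:
            cols.append(np.cos(2 * np.pi * k * s))
        k += 1
    return np.column_stack(cols)


def weighted_sofr(X, y, w, Phi):
    """Solve the weighted score equations sum_i w_i z_i (y_i - z_i' theta) = 0.

    X is n x S (one curve per row), Phi is S x K (basis on the same grid).
    The integral of X_i(s) * phi_k(s) is approximated by a Riemann sum.
    Returns the intercept and beta(s) evaluated on the grid.
    """
    Z = np.column_stack([np.ones(len(y)), X @ Phi / X.shape[1]])
    ZtW = Z.T * w  # apply the survey weights to each observation
    theta = np.linalg.solve(ZtW @ Z, ZtW @ y)
    return theta[0], Phi @ theta[1:]


def sylvester_hadamard(order):
    """Hadamard matrix of a power-of-two order via Sylvester's construction."""
    H = np.array([[1]])
    while H.shape[0] < order:
        H = np.block([[H, H], [H, -H]])
    return H


def brr_se(X, y, w, stratum, psu, Phi):
    """Pointwise BRR standard errors for beta(s).

    Assumes strata are coded 0..H-1 and PSUs are coded 0/1 within each stratum.
    Each replicate keeps one PSU per stratum and doubles its weights.
    """
    n_strata = int(stratum.max()) + 1
    order = 1
    while order <= n_strata:  # need more replicates than strata
        order *= 2
    # Drop the all-ones first column so each PSU is kept in half the replicates.
    H = sylvester_hadamard(order)[:, 1:n_strata + 1]
    _, beta_full = weighted_sofr(X, y, w, Phi)
    sq_dev = np.zeros_like(beta_full)
    for row in H:
        kept_psu = np.where(row[stratum] == 1, 0, 1)
        w_rep = np.where(psu == kept_psu, 2.0 * w, 0.0)
        _, beta_rep = weighted_sofr(X, y, w_rep, Phi)
        sq_dev += (beta_rep - beta_full) ** 2
    return np.sqrt(sq_dev / H.shape[0])
```

Calling `weighted_sofr(X, y, w, Phi)` returns the weighted point estimate, and `brr_se(X, y, w, stratum, psu, Phi)` returns a standard error at each grid point. A binary or time-to-event outcome such as mortality would need a different score function than the least-squares one used here.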

Source
http://dx.doi.org/10.1002/sim.10194

Similar Publications

Article Synopsis
  • This manuscript introduces a novel method for scalar-on-distribution regression, where subject-specific distributions serve as covariates to predict a single outcome, bypassing the need for prior estimation of these distributions.
  • The proposed approach uses observed repeated measures directly as covariates and applies a Gaussian process prior, achieving efficient Bayesian inference without needing intermediate density estimates.
  • The method shows superior performance in simulation studies compared to traditional regression that requires estimating densities first, especially when there are limited repeated measures per subject, and it also accommodates various forms of data dependencies.

Regression and alignment for functional data and network topology.

Biostatistics

August 2024

The Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, 423 Guardian Drive, University of Pennsylvania, Philadelphia, PA, 19104, United States.

In the brain, functional connections form a network whose topological organization can be described by graph-theoretic network diagnostics. These include characterizations of the community structure, such as modularity and participation coefficient, which have been shown to change over the course of childhood and adolescence. To investigate if such changes in the functional network are associated with changes in cognitive performance during development, network studies often rely on an arbitrary choice of preprocessing parameters, in particular the proportional threshold of network edges.

Predicting plant disease epidemics using boosted regression trees.

Infect Dis Model

December 2024

School of Mathematics and Statistics, Huaiyin Normal University, Huaian, 223300, PR China.

Article Synopsis
  • Plant diseases can be affected by weather, but it's hard to find the right weather factors to predict them.
  • In a study, researchers used a special method to predict a wheat disease called Fusarium head blight by looking at weather data.
  • They found that their new method worked really well, so another team tried a different way using boosted regression trees and got good results too.

Wearable devices such as the ActiGraph are now commonly used in research to monitor or track physical activity. This trend corresponds with the growing need to accurately assess the relationships between physical activity and health outcomes such as obesity. Device-based physical activity measures are best treated as functions when assessing their associations with scalar-valued outcomes such as body mass index.
