Background: Local policymakers require information about public health, housing and well-being for small geographical areas. A municipality can, for example, use this information to organize targeted activities aimed at improving the well-being of its residents. Surveys are often used to gather these data, but many neighborhoods have only a few respondents, or none at all. In that case, estimates of the status of the local population derived directly from survey responses are likely to be unreliable.

Methods: Small Area Estimation (SAE) is a technique for producing estimates at small geographical levels with only a few or even zero respondents. In classical individual-level SAE, a complex statistical regression model is fitted to the survey responses, using auxiliary administrative data for the population as predictors; the missing responses are then predicted and aggregated to the desired geographical level. In this paper we compare gradient boosted trees (XGBoost), a well-known machine learning technique, to a structured additive regression (STAR) model designed for the specific problem of estimating public health and well-being in the whole population of the Netherlands.
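The fit, predict and aggregate steps of individual-level SAE described above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: a one-variable least-squares line stands in for the STAR and XGBoost models, the field names `area`, `x` and `y` are hypothetical, and the choice to keep observed responses for respondents and use predictions only for non-respondents is an assumption.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for STAR or XGBoost)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def small_area_estimates(population):
    """Return the mean outcome per area.

    population: list of dicts with hypothetical keys 'area', 'x' (auxiliary
    predictor known for everyone) and 'y' (survey response, None if the
    person did not take part in the survey).
    """
    respondents = [p for p in population if p["y"] is not None]
    intercept, slope = fit_line(
        [p["x"] for p in respondents], [p["y"] for p in respondents]
    )
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p in population:
        # Assumption: observed value for respondents, prediction otherwise.
        value = p["y"] if p["y"] is not None else intercept + slope * p["x"]
        totals[p["area"]] += value
        counts[p["area"]] += 1
    return {area: totals[area] / counts[area] for area in totals}


# Toy population: area "C" has no respondents at all.
population = [
    {"area": "A", "x": 1.0, "y": 2.0},
    {"area": "A", "x": 2.0, "y": 4.0},
    {"area": "B", "x": 3.0, "y": 6.0},
    {"area": "B", "x": 4.0, "y": None},
    {"area": "C", "x": 5.0, "y": None},
]
estimates = small_area_estimates(population)
```

Because the model is fitted on all respondents nationwide, an area such as "C" still receives an estimate even though nobody there answered the survey.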

Results: We compare the accuracy and performance of these models using out-of-sample predictions with five-fold cross-validation (5CV). We do this for three data sets of different sample sizes and outcome types. Compared to the STAR model, gradient boosted trees improve the accuracy of the predictions and reduce the total time taken to obtain them. Even though the models appear quite similar in overall accuracy, the small area predictions at neighborhood level sometimes differ significantly. It may therefore make sense to pursue slightly more accurate models to obtain better predictions for small areas. However, one of the biggest benefits is that XGBoost does not require prior knowledge or model specification. Data preparation and modelling are much easier, since the method automatically handles missing data, non-linear responses and interactions, and accounts for spatial correlation structures.
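Out-of-sample evaluation with five-fold cross-validation can be illustrated as follows. This is a sketch under stated assumptions: the abstract does not name the accuracy metric, so root mean squared error is used here purely as an example, and the mean-only model (`fit_mean`, `predict_mean`) is a hypothetical placeholder for the STAR and XGBoost models.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_rmse(xs, ys, fit, predict, k=5, seed=0):
    """Each fold is held out once; the model only sees the other k-1 folds."""
    squared_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            squared_errors.append((predict(model, xs[i]) - ys[i]) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


def fit_mean(xs, ys):
    """Placeholder model: predicts the training mean for everyone."""
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


folds = k_fold_indices(10)
rmse_constant = cross_validated_rmse(
    list(range(10)), [3.0] * 10, fit_mean, predict_mean
)
```

Any pair of `fit` and `predict` functions with the same signatures can be passed in, so two models can be scored on identical folds by reusing the same `seed`.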

Conclusions: In this paper we provide new nationwide estimates of health, housing and well-being indicators at neighborhood level in the Netherlands (see 'Online materials'). We demonstrate that machine learning provides a good alternative to complex statistical regression modelling for small area estimation in terms of accuracy, robustness, speed and data preparation. These results can be used to make appropriate policy decisions at a local level and to recommend which estimation methods are beneficial in terms of accuracy, time and budget constraints.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9169293
DOI: http://dx.doi.org/10.1186/s12942-022-00304-5


