Missing values are a common and well-known problem in time-series datasets. Various statistical and machine learning methods have been developed to fill in the missing values, but their performance varies widely and depends strongly on the type of data and the correlations within it. In our study, we applied several well-known imputation methods (expectation maximization, k-nearest neighbor, iterative imputer, random forest, and simple imputer) to impute missing data obtained from smart, wearable health trackers. In this manuscript, we propose the use of data binning for imputation. We show that using data binned around the missing time interval provides better imputation than using the whole dataset. Imputation was performed for 15 min and 1 h of continuous missing data. We used bin sizes of 15 min, 30 min, 45 min, and 1 h, and we evaluated the results using root mean square error (RMSE) values. The expectation maximization algorithm worked best with binned data, followed by the simple imputer, iterative imputer, and k-nearest neighbor, whereas data binning had no effect on imputation with the random forest method. Moreover, the smallest bin sizes of 15 min and 1 h provided the lowest RMSE values for the majority of the time frames during the imputation of 15 min and 1 h of missing data, respectively. Although we applied it to digital health data, we think that this method will also find applicability in other domains.
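The comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that a "bin" means the observed samples within a fixed window on each side of the gap, it uses a plain mean fill as a stand-in for the simple imputer, and the signal is synthetic. The names `mean_impute`, `rmse`, and `bin_size` are hypothetical and chosen only for this illustration.

```python
import math
import random


def mean_impute(series, gap_start, gap_len, bin_size=None):
    """Fill a contiguous gap with the mean of observed values.

    bin_size=None uses every observed point in the series (whole dataset).
    Otherwise only the bin_size samples on each side of the gap are used
    (an assumed reading of "data binned around the missing interval").
    """
    gap_end = gap_start + gap_len
    if bin_size is None:
        lo, hi = 0, len(series)
    else:
        lo = max(0, gap_start - bin_size)
        hi = min(len(series), gap_end + bin_size)
    context = [
        series[i]
        for i in range(lo, hi)
        if not (gap_start <= i < gap_end) and series[i] is not None
    ]
    if not context:
        raise ValueError("no observed values available around the gap")
    fill = sum(context) / len(context)
    return [fill] * gap_len


def rmse(truth, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth)
    )


# Synthetic 1-minute signal: a slow daily cycle plus noise (made-up data).
random.seed(0)
n = 24 * 60
signal = [
    70 + 15 * math.sin(2 * math.pi * t / n) + random.gauss(0, 1)
    for t in range(n)
]

# Hide 15 consecutive minutes and keep the true values for scoring.
gap_start, gap_len = 300, 15
truth = signal[gap_start:gap_start + gap_len]
masked = list(signal)
for i in range(gap_start, gap_start + gap_len):
    masked[i] = None

rmse_whole = rmse(truth, mean_impute(masked, gap_start, gap_len))
rmse_binned = rmse(truth, mean_impute(masked, gap_start, gap_len, bin_size=15))
print(f"whole-series mean RMSE: {rmse_whole:.2f}")
print(f"15-min bin mean RMSE:   {rmse_binned:.2f}")
```

On this synthetic signal the binned fill tracks the local level near the gap, while the whole-series mean does not. This only demonstrates the mechanism; it does not reproduce the reported results or the other imputation methods.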

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9919790
DOI: http://dx.doi.org/10.3390/s23031454
