Real-world data analysis and processing with data mining techniques often face observations that contain missing values, and handling them is a main challenge in mining datasets. Missing values should be imputed to improve the accuracy and performance of data mining methods. Some existing techniques use the k-nearest neighbors algorithm (kNN) to impute missing values, but determining an appropriate k value can be challenging. Other existing imputation techniques are based on hard clustering algorithms. When records are not well separated, as in the case of missing data, hard clustering often provides a poor description tool. In general, imputation based on similar records is more accurate than imputation based on all of a dataset's records, so improving the similarity among records can improve imputation performance. This paper proposes two imputation methods for numerical missing data. The first, called KI, is a hybrid method that combines k-nearest neighbors and iterative imputation algorithms. The best set of nearest neighbors for each incomplete record is found through record similarity using kNN. To improve the similarity, a suitable k value is estimated automatically for kNN. The iterative imputation method then imputes the missing values of the incomplete records using the global correlation structure among the selected records. The second, called FCKI, is an enhanced hybrid method that extends KI. It integrates fuzzy c-means, k-nearest neighbors, and iterative imputation algorithms to impute the missing data in a dataset. The fuzzy c-means algorithm is selected because it allows a record to belong to multiple clusters at the same time, which can further improve similarity.
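The neighbor-then-impute idea behind KI can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `knn_regression_impute` is hypothetical, `k` is fixed by hand rather than estimated automatically, and a single least-squares fit on the selected neighbors stands in for the iterative imputation step.

```python
import numpy as np

def knn_regression_impute(X, k=5):
    """Fill NaNs record by record using the k nearest complete records.

    Simplified illustration only: k is a fixed, hand-picked value, and one
    least-squares fit replaces the iterative imputation step.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    # Donor pool: records with no missing values.
    complete = X[~np.isnan(X).any(axis=1)]
    if len(complete) == 0:
        raise ValueError("need at least one complete record")
    k = min(k, len(complete))

    for i in np.flatnonzero(np.isnan(X).any(axis=1)):
        miss = np.isnan(X[i])
        obs = ~miss
        if not obs.any():
            # Nothing observed for this record: fall back to column means.
            filled[i, miss] = complete[:, miss].mean(axis=0)
            continue
        # Euclidean distance computed over the observed attributes only.
        dist = np.linalg.norm(complete[:, obs] - X[i, obs], axis=1)
        neighbors = complete[np.argsort(dist)[:k]]
        # Linear model with intercept, fit on the neighbors only:
        # observed attributes -> missing attributes.
        A = np.column_stack([np.ones(len(neighbors)), neighbors[:, obs]])
        coef, *_ = np.linalg.lstsq(A, neighbors[:, miss], rcond=None)
        filled[i, miss] = np.concatenate([[1.0], X[i, obs]]) @ coef
    return filled
```

On a toy dataset whose third column equals twice the first plus the second, a record `[2, 2, NaN]` with enough non-collinear neighbors is filled with a value close to 6, because the fit recovers the relationship from the neighbors alone.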
FCKI searches a cluster, instead of the whole dataset, to find the best k-nearest neighbors, applying two levels of similarity to achieve higher imputation accuracy. The performance of the proposed imputation techniques is assessed on fifteen datasets of different sizes, with varying missing ratios, for three types of missing data generated in this work: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). The proposed techniques are compared with other missing data imputation methods by means of three measures: the root mean square error (RMSE), the normalized root mean square error (NRMSE), and the mean absolute error (MAE). The results show that the proposed methods achieve better imputation accuracy and require significantly less time than other missing data imputation methods.
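The three error measures named above can be computed as in the following sketch, which compares the true values of artificially removed cells with their imputed values. The function names are illustrative. The abstract does not state how NRMSE is normalized, so dividing by the range of the true values is an assumption here; other conventions (for example, dividing by the standard deviation) are also in use.

```python
import math

def rmse(actual, imputed):
    """Root mean square error between true and imputed values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, imputed)) / len(actual))

def mae(actual, imputed):
    """Mean absolute error between true and imputed values."""
    return sum(abs(a - p) for a, p in zip(actual, imputed)) / len(actual)

def nrmse(actual, imputed):
    """RMSE divided by the range of the true values (assumed normalization)."""
    spread = max(actual) - min(actual)
    if spread == 0:
        raise ValueError("true values have zero range; NRMSE is undefined")
    return rmse(actual, imputed) / spread

# True values of the masked cells and the values an imputer produced for them.
actual = [1.0, 2.0, 3.0, 5.0]
imputed = [1.0, 2.0, 4.0, 3.0]
print(rmse(actual, imputed), nrmse(actual, imputed), mae(actual, imputed))
```

Lower values indicate better imputation for all three measures. NRMSE is the one to use when comparing across attributes or datasets measured on different scales.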
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8323724 | PMC |
http://dx.doi.org/10.7717/peerj-cs.619 | DOI Listing |