Data in the medical field often contain missing values, which can bias research results. The objective of this work is therefore to propose a new imputation method, a novel weighted distance threshold method, for estimating missing values. Several experiments show that the proposed method has the following benefits: (1) by computing purity, it can reassign instances to the nearest class in the dataset, and the purity computation also filters out outliers; (2) it redefines the degree of missingness and can identify the attributes and instances associated with missing values in different datasets; and (3) it does not require the number of nearest neighbors, k, to be set in advance, because this study determines k from the distance threshold that yields the best purity, which improves the imputation results. In addition, the distance threshold adjusts the nearest neighborhood used to estimate each missing value toward the optimal one. This study compares the proposed method with other imputation methods across different types of missingness, degrees of missingness, and types of datasets, and the results indicate that the proposed method outperforms the compared methods. Moreover, this study uses the stroke dataset from the International Stroke Trial (IST) to verify that the proposed method can be applied effectively in practice; the proposed method achieves 90% accuracy on this dataset.
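The abstract does not specify the weighting function, the distance used on incomplete records, or the exact rule that links purity to the threshold. As a rough illustration only, the Python sketch below shows the general idea of replacing a fixed k with a distance threshold: each incomplete record is filled from an inverse-distance-weighted average of the records within the threshold, and the threshold is picked from a list of candidates by maximizing average neighborhood purity (the share of neighbors in the majority class). All function names, the partial Euclidean distance, the inverse-distance weights, and this purity-based selection rule are hypothetical assumptions, not the authors' published algorithm.

```python
import math
from collections import Counter


def observed_distance(a, b):
    """Euclidean distance over attributes observed (not None) in both rows,
    scaled up by (total attributes / shared attributes) so that rows with
    fewer shared attributes are not artificially close. (Assumed metric.)"""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing to compare: treat as infinitely far
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def neighbors_within(rows, i, threshold):
    """(index, distance) pairs for every other row within `threshold` of row i.
    The neighbor count varies per row, so no fixed k is needed."""
    found = []
    for m, other in enumerate(rows):
        if m == i:
            continue
        d = observed_distance(rows[i], other)
        if d <= threshold:
            found.append((m, d))
    return found


def purity(neighbor_labels):
    """Share of neighbors that belong to the most common class."""
    if not neighbor_labels:
        return 0.0
    return Counter(neighbor_labels).most_common(1)[0][1] / len(neighbor_labels)


def mean_purity(rows, labels, threshold):
    """Average neighborhood purity over rows that have at least one neighbor."""
    scores = []
    for i in range(len(rows)):
        near = [labels[m] for m, _ in neighbors_within(rows, i, threshold)]
        if near:
            scores.append(purity(near))
    return sum(scores) / len(scores) if scores else 0.0


def best_threshold(rows, labels, candidates):
    """Hypothetical selection rule: the candidate with the highest mean purity."""
    return max(candidates, key=lambda t: mean_purity(rows, labels, t))


def threshold_impute(rows, threshold):
    """Fill None entries with an inverse-distance-weighted mean of the
    neighbors within `threshold` that observe the same attribute."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        near = neighbors_within(rows, i, threshold)
        for j in missing:
            donors = [(m, d) for m, d in near if rows[m][j] is not None]
            if not donors:
                continue  # no donor observes attribute j: leave it missing
            weights = [1.0 / (d + 1e-9) for _, d in donors]  # closer rows count more
            weighted = sum(w * rows[m][j] for w, (m, _) in zip(weights, donors))
            imputed[i][j] = weighted / sum(weights)
    return imputed


# Toy usage with made-up data (not from the study).
rows = [[1.0, 2.0], [1.2, None], [1.1, 2.2], [5.0, 6.0]]
labels = ["a", "a", "a", "b"]
t = best_threshold(rows, labels, [1.0, 10.0])
print(t, threshold_impute(rows, t))
```

Because the neighborhood is defined by `threshold` rather than by a count, a record in a dense region draws on many donors while an isolated record may get none; in this sketch such a value is simply left missing.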
Source (DOI): http://dx.doi.org/10.1016/j.compbiomed.2020.103824