An Improved Chinese String Comparator for Bloom Filter Based Privacy-Preserving Record Linkage.

Entropy (Basel)

Department of Mathematics and Statistics, College of Science, Huazhong Agricultural University, Wuhan 430070, China.

Published: August 2021

With the development of information technology, sharing data from multiple sources without disclosing private information has become a popular research topic. Privacy-preserving record linkage (PPRL) links records that truly match without disclosing personal information. Existing PPRL techniques have mostly been studied for alphabetic languages, which differ considerably from the Chinese language environment. In this paper, the Chinese characters in the identification fields of record pairs are encoded into strings of letters and numbers using the SoundShape code, which reflects both their shapes and their pronunciations. The SoundShape codes are then encrypted with a Bloom filter, and the similarity of the encrypted fields is calculated with the Dice similarity. The method takes into account the false positive rate of the Bloom filter and different proportions of sound code and shape code. Finally, we applied the method to synthetic datasets and compared the precision, recall, F1-score and computational time for different values of the false positive rate and the proportion. The results showed that our method for PPRL in the Chinese language environment improved the quality of the classification results and outperformed other methods at a relatively low additional computational cost.
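The pipeline described in the abstract (encode a field, map it into a Bloom filter, compare filters with the Dice similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The bigram tokenisation, the double-hashing scheme built from SHA-1 and SHA-256, the filter length `m`, the number of hash functions `k`, and the example code strings are all assumptions made for illustration; in particular, the strings are placeholders and not real SoundShape codes. The false positive rate uses the standard Bloom filter approximation (1 - e^(-kn/m))^k.

```python
import hashlib
import math


def bigrams(code):
    """Split a padded code string into its set of 2-character substrings."""
    padded = "_" + code + "_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(code, m=256, k=4):
    """Return the set of bit positions set in an m-bit Bloom filter.

    Double hashing (an assumption, not taken from the paper):
    position_i = (h1 + i * h2) mod m for i = 0..k-1.
    """
    positions = set()
    for gram in bigrams(code):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hashlib.sha1(data).digest(), "big")
        h2 = int.from_bytes(hashlib.sha256(data).digest(), "big")
        for i in range(k):
            positions.add((h1 + i * h2) % m)
    return positions


def dice_similarity(a, b):
    """Dice coefficient of two sets of bit positions: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def false_positive_rate(m, k, n):
    """Standard approximation for a Bloom filter holding n items."""
    return (1 - math.exp(-k * n / m)) ** k


# Placeholder strings standing in for encoded fields (hypothetical values).
code_a = "A1B2C3D4"
code_b = "A1B2C3D5"

filter_a = bloom_encode(code_a)
filter_b = bloom_encode(code_b)

print("Dice similarity:", dice_similarity(filter_a, filter_b))
print("Approximate false positive rate:",
      false_positive_rate(m=256, k=4, n=len(bigrams(code_a))))
```

Under this approximation, increasing `m` for a fixed `k` and number of tokens lowers the false positive rate at the cost of longer filters, which is the kind of trade-off the abstract refers to when it varies the false positive rate.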


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8394278
DOI: http://dx.doi.org/10.3390/e23081091

Publication Analysis

Top Keywords

bloom filter (12)
privacy-preserving record (8)
record linkage (8)
chinese language (8)
language environment (8)
false positive (8)
positive rate (8)
improved chinese (4)
chinese string (4)
string comparator (4)
