Medical record linkage is becoming increasingly important as clinical data is distributed across independent sources. To improve linkage accuracy we studied different name comparison methods that establish agreement or disagreement between corresponding names. In addition to exact raw name matching and exact phonetic name matching, we tested three approximate string comparators. The approximate comparators included the modified Jaro-Winkler method, the longest common substring, and the Levenshtein edit distance. We also calculated the combined root-mean square of all three. We tested each name comparison method using a deterministic record linkage algorithm. Results were consistent across both hospitals. At a threshold comparator score of 0.8, the Jaro-Winkler comparator achieved the highest linkage sensitivities of 97.4% and 97.7%. The combined root-mean square method achieved sensitivities higher than the Levenshtein edit distance or long-est common substring while sustaining high linkage specificity. Approximate string comparators increase deterministic linkage sensitivity by up to 10% compared to exact match comparisons and represent an accurate method of linking to vital statistics data.

Download full-text PDF

Source

Publication Analysis

Top Keywords

approximate string
12
string comparators
12
record linkage
8
common substring
8
levenshtein edit
8
edit distance
8
combined root-mean
8
root-mean square
8
linkage
6
real performance
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!