Data linkage connects records that refer to the same entity across different data sources, and it is essential for research in fields such as public health.
When unique identifiers are lacking, probabilistic methods are used to assess the similarity between records, a process that requires careful selection of the attributes to compare and the similarity metrics to apply.
The paper introduces AtyImo, a hybrid probabilistic linkage tool that achieves high accuracy (93% to 97% true matches) and processes large datasets efficiently, linking 114 million individuals in Brazil in under nine days.
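
To make the idea of similarity-based linkage concrete, the following is a minimal sketch of how two records without a shared identifier might be compared. It is not AtyImo's algorithm: the attribute names, the weights, the 0.85 threshold, and the use of Python's `difflib.SequenceMatcher` as the string metric are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical attribute weights (they sum to 1); in practice these would be
# chosen or estimated according to how discriminating each attribute is.
WEIGHTS = {"name": 0.5, "birth_date": 0.3, "city": 0.2}


def similarity(a: str, b: str) -> float:
    """Return a string similarity in [0, 1], ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of the per-attribute similarities."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def is_match(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Classify a pair as a link when its score reaches the assumed threshold."""
    return match_score(rec_a, rec_b) >= threshold


# Illustrative records: the second contains a typo in the name.
record_a = {"name": "Maria Silva", "birth_date": "1980-05-12", "city": "Salvador"}
record_b = {"name": "Maria Sliva", "birth_date": "1980-05-12", "city": "Salvador"}

if __name__ == "__main__":
    print(round(match_score(record_a, record_b), 3), is_match(record_a, record_b))
```

The choice of attributes and metric drives the result: a date field might be better served by exact comparison, and a name field by a metric tuned for transposition errors, which is why these selections need care.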