The paper examines adversarial attacks and defenses in multi-label classification, highlighting how domain knowledge can help identify incoherent predictions caused by these attacks.
By integrating first-order logic constraints into a semi-supervised learning framework, the authors demonstrate that classifiers can reject inputs whose predicted labels violate the established domain knowledge, as sketched below.
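As a minimal sketch of this idea, assuming a product t-norm relaxation of the logic (a common choice for making first-order rules differentiable; the specific rule, label names, and function names here are illustrative, not taken from the paper), an implication such as "dog(x) ⇒ animal(x)" can be turned into a penalty that is added to the training loss and, crucially, can also be computed on unlabeled data:

```python
import torch

# Hypothetical FOL rule "dog(x) => animal(x)" relaxed with the product t-norm:
# the implication A => B becomes 1 - p_A * (1 - p_B), so the penalty
# p_A * (1 - p_B) is zero when the rule holds and grows with incoherence.
def implication_penalty(p_premise: torch.Tensor, p_conclusion: torch.Tensor) -> torch.Tensor:
    return (p_premise * (1.0 - p_conclusion)).mean()

# Toy multi-label logits for [dog, animal] on a batch of 3 samples.
logits = torch.tensor([[2.0, 2.5],     # coherent: dog and animal both predicted
                       [2.0, -3.0],    # incoherent: dog without animal
                       [-2.0, -2.0]])  # coherent: neither label predicted
probs = torch.sigmoid(logits)

# Unlabeled samples contribute only through this constraint term, which is
# how the semi-supervised setup exploits domain knowledge without labels.
penalty = implication_penalty(probs[:, 0], probs[:, 1])
print(f"constraint penalty: {penalty.item():.4f}")
```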
Their findings reveal that even without prior knowledge of specific attacks, domain constraints can effectively detect adversarial examples, suggesting a path toward more resilient multi-label classifiers.
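The same constraint machinery suggests a simple attack-agnostic detector: since many adversarial perturbations push the classifier toward label configurations that break the domain rules, per-sample constraint violation can serve as a rejection score. The sketch below assumes the illustrative rule from above and a threshold calibrated on clean validation data; the threshold value and names are assumptions, not the paper's exact procedure:

```python
import torch

# Per-sample violation of the hypothetical rule "dog(x) => animal(x)"
# under the product t-norm: high when dog is predicted without animal.
def constraint_violation(probs: torch.Tensor) -> torch.Tensor:
    return probs[:, 0] * (1.0 - probs[:, 1])

probs = torch.tensor([[0.9, 0.95],   # clean, coherent prediction
                      [0.9, 0.05],   # attack-like, incoherent prediction
                      [0.1, 0.10]])  # clean, neither label predicted

# Assumed threshold; in practice it would be tuned on clean data for a
# target false-rejection rate, with no knowledge of any specific attack.
threshold = 0.5
reject = constraint_violation(probs) > threshold
print(reject)  # tensor([False,  True, False]): the incoherent sample is flagged
```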