Adversarial training is effective in improving the robustness of deep neural networks. However, existing studies still exhibit significant drawbacks in terms of the robustness, generalization, and fairness of models. In this study, we validate the importance of different perturbation directions (i.e., adversarial and anti-adversarial) and bounds from both theoretical and practical perspectives. The influence of adversarial training on deep learning models in terms of fairness, robustness, and generalization is theoretically investigated under a more general perturbation scope that different samples can have different perturbation directions and varied perturbation bounds. Our theoretical explorations suggest that combining adversaries and anti-adversaries with varied bounds in training can be more effective in achieving better fairness among classes and a better tradeoff among robustness, accuracy, and fairness in some typical learning scenarios compared with standard adversarial training. Inspired by our theoretical findings, a more general learning objective that combines adversaries and anti-adversaries with varied bounds on each training sample is presented. To solve this objective, two adversarial training frameworks based on meta-learning and reinforcement learning are proposed, in which the perturbation direction and bound for each sample are determined by its training characteristics. Furthermore, the role of the combination strategy with varied bounds is explained from a regularization perspective. Extensive experiments under different learning scenarios verify our theoretical findings and the effectiveness of the proposed methodology.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1109/TPAMI.2024.3432973 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!