In black-box scenarios, adversarial attacks against text classification models struggle to produce highly usable adversarial examples; in particular, long texts incur a large number of wasted queries. Existing methods select the words to perturb by comparing the confidence vectors obtained before and after deleting each word, so the number of queries grows linearly with text length, making them difficult to apply in attack scenarios with a limited query budget. Moreover, generating adversarial examples from a synonym thesaurus can introduce semantic inconsistencies and even grammatical errors, which makes the adversarial examples easy for the target model to recognize and results in a low attack success rate. A minimal sketch of this deletion-based word-selection step follows.
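The sketch below illustrates, under stated assumptions, the deletion-based word importance scoring the abstract describes: each word is removed in turn, the black-box model is queried, and the confidence drop is used to rank words. The function `query_model` is a hypothetical black-box API (not from the paper) that returns a confidence vector; the loop makes one query per word, which is why the query cost scales linearly with text length.

```python
# Minimal sketch, assuming a hypothetical black-box classifier `query_model`
# that maps a text string to a list of class confidences.
from typing import Callable, List, Tuple


def rank_words_by_deletion(
    words: List[str],
    true_label: int,
    query_model: Callable[[str], List[float]],
) -> List[Tuple[int, float]]:
    """Score each word by the confidence drop caused by deleting it."""
    base_conf = query_model(" ".join(words))[true_label]      # 1 query
    scores: List[Tuple[int, float]] = []
    for i in range(len(words)):                               # + len(words) queries
        reduced = words[:i] + words[i + 1:]
        conf = query_model(" ".join(reduced))[true_label]
        scores.append((i, base_conf - conf))                  # larger drop => more important
    # Words whose deletion hurts confidence most become substitution targets.
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Because every word costs one extra query, a document of n words requires roughly n + 1 queries just to rank candidate words, before any substitutions are attempted.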