A Textual Backdoor Defense Method Based on Deep Feature Classification.

Entropy (Basel)

College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China.

Published: January 2023

Natural language processing (NLP) models based on deep neural networks (DNNs) are vulnerable to backdoor attacks. Existing backdoor defense methods have limited effectiveness and cover only a limited range of scenarios. We propose a textual backdoor defense method based on deep feature classification. The method consists of deep feature extraction and classifier construction, and it exploits the distinguishability of the deep features of poisoned data and benign data. Backdoor defense is implemented in both offline and online scenarios. We conducted defense experiments on two datasets and two models against a variety of backdoor attacks. The experimental results demonstrate the effectiveness of this defense approach, which outperforms the baseline defense method.
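
The two stages named in the abstract, deep feature extraction followed by classifier construction, can be illustrated with a minimal sketch. The abstract does not say which layer the features come from or which classifier is used, so everything here is an assumption made for illustration: the "deep features" are synthetic Gaussian vectors standing in for a model's hidden representations, the classifier is a plain logistic regression, and the names `make_features`, `train_logreg`, and `is_poisoned` are hypothetical.

```python
import math
import random

def make_features(n, shift, dim, rng):
    """Stand-in for deep features: Gaussian vectors centred at `shift`.
    A real pipeline would take these from a hidden layer of the NLP model."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, rng, epochs=50, lr=0.1):
    """Fit a logistic regression with per-sample gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    data = list(zip(X, y))
    for _ in range(epochs):
        rng.shuffle(data)  # avoid presenting one class after the other
        for x, t in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - t  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def is_poisoned(x, w, b, threshold=0.5):
    """Flag a feature vector as poisoned if the classifier score is high."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold

rng = random.Random(0)
DIM = 8  # assumed feature size, chosen only to keep the example small

# Label 0 = benign, label 1 = poisoned (synthetic, assumed separable).
train_X = make_features(200, 0.0, DIM, rng) + make_features(200, 2.0, DIM, rng)
train_y = [0] * 200 + [1] * 200
w, b = train_logreg(train_X, train_y, rng)

# Held-out check of how well the two feature populations are separated.
test_benign = make_features(100, 0.0, DIM, rng)
test_poison = make_features(100, 2.0, DIM, rng)
correct = sum(not is_poisoned(x, w, b) for x in test_benign)
correct += sum(is_poisoned(x, w, b) for x in test_poison)
accuracy = correct / 200
print(f"held-out accuracy: {accuracy:.2f}")
```

Under these assumptions, the same `is_poisoned` check could serve both scenarios mentioned in the abstract: applied to a stored training set to filter out suspicious samples (offline), or applied to each incoming input before the model's prediction is trusted (online).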

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955932
DOI: http://dx.doi.org/10.3390/e25020220

