Textual Backdoor Defense via Poisoned Sample Recognition
Deep learning models are vulnerable to backdoor attacks. In existing research, the success rate of textual backdoor attacks based on data poisoning can reach 100%. To strengthen natural language processing models against backdoor attacks, we propose a textual backdoor defense method based on poisoned sample recognition. Our method consists of two steps. In the first step, we add a controlled noise layer after the model's embedding layer and train a preliminary model in which the backdoor is only weakly embedded, or not embedded at all, which reduces the effectiveness of poisoned samples. This model is then used to make an initial identification of poisoned samples in the training set, narrowing the search range. In the second step, we train an infection model, with the backdoor embedded, on all of the training data and use it to reclassify the samples selected in the first step, finally identifying the poisoned samples. Detailed experiments show that our defense method is effective against a variety of backdoor attacks (character-level, word-level, and sentence-level) and outperforms the baseline method. For a BERT model trained on the IMDB dataset, it can even reduce the success rate of word-level backdoor attacks to 0%.
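The abstract above is the only technical content in this record; the rest is catalogue metadata. Purely for orientation, the following is a minimal sketch of the two-step idea the abstract describes, not the authors' implementation: a toy GRU classifier stands in for BERT, the controlled noise layer is modelled as additive Gaussian noise after the embedding, and the candidate-selection rules, names (NoisyEmbeddingClassifier, recognize_poisoned), and hyperparameters (noise_std=0.5, three epochs) are all hypothetical choices made for this example.

```python
# Minimal illustrative sketch -- NOT the authors' released implementation.
# Assumptions (hypothetical, introduced only for this example): a toy GRU text
# classifier stands in for BERT, the "controlled noise layer" is additive Gaussian
# noise after the embedding, and the candidate-selection rules are simple
# agreement/disagreement checks against the (possibly poisoned) training labels.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class NoisyEmbeddingClassifier(nn.Module):
    """Toy text classifier whose embedding output can be perturbed with controlled noise."""

    def __init__(self, vocab_size=5000, embed_dim=64, num_classes=2, noise_std=0.0):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.noise_std = noise_std  # strength of the noise layer placed after the embedding
        self.encoder = nn.GRU(embed_dim, 64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)
        if self.training and self.noise_std > 0:
            # Controlled noise after the embedding layer, intended to keep the
            # backdoor from being (fully) implanted during preliminary training.
            x = x + self.noise_std * torch.randn_like(x)
        _, hidden = self.encoder(x)
        return self.head(hidden[-1])


def train(model, loader, epochs=3, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for ids, labels in loader:
            opt.zero_grad()
            loss_fn(model(ids), labels).backward()
            opt.step()


@torch.no_grad()
def predict(model, ids):
    model.eval()
    return model(ids).argmax(dim=-1)


def recognize_poisoned(train_ids, train_labels, loader):
    # Step 1: preliminary model trained WITH embedding noise; samples it
    # misclassifies with respect to their given labels become poisoning
    # candidates, narrowing the search range.
    preliminary = NoisyEmbeddingClassifier(noise_std=0.5)
    train(preliminary, loader)
    candidates = (predict(preliminary, train_ids) != train_labels).nonzero(as_tuple=True)[0]

    # Step 2: "infection" model trained on all data without noise, so any backdoor
    # is fully implanted; candidates whose given label it reproduces anyway are
    # flagged as poisoned (hypothetical rule -- the paper defines its own criterion).
    infected = NoisyEmbeddingClassifier(noise_std=0.0)
    train(infected, loader)
    agree = predict(infected, train_ids[candidates]) == train_labels[candidates]
    return candidates[agree]  # indices of suspected poisoned training samples


if __name__ == "__main__":
    # Tiny synthetic run so the sketch executes end to end.
    ids = torch.randint(0, 5000, (256, 20))
    labels = torch.randint(0, 2, (256,))
    loader = DataLoader(TensorDataset(ids, labels), batch_size=32, shuffle=True)
    print("suspected poisoned indices:", recognize_poisoned(ids, labels, loader))
```

The synthetic `__main__` run only demonstrates that the pipeline executes; on real data, the loaders, models, and selection criteria would come from the poisoned training set and the paper's own thresholds.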
Saved in:

| Main Authors: | Kun Shao, Yu Zhang, Junan Yang, Hui Liu |
|---|---|
| Format: | article |
| Language: | EN |
| Published: | MDPI AG, 2021 |
| Subjects: | deep neural networks; natural language processing; adversarial machine learning; backdoor attacks; backdoor defenses |
| Online Access: | https://doaj.org/article/539075c4b9b94a4daaa69c6db1972118 |
| id | oai:doaj.org-article:539075c4b9b94a4daaa69c6db1972118 |
|---|---|
| record_format | dspace |
| spelling | oai:doaj.org-article:539075c4b9b94a4daaa69c6db1972118; 2021-11-11T15:02:04Z; Textual Backdoor Defense via Poisoned Sample Recognition; DOI 10.3390/app11219938; ISSN 2076-3417; https://doaj.org/article/539075c4b9b94a4daaa69c6db1972118; 2021-10-01T00:00:00Z; https://www.mdpi.com/2076-3417/11/21/9938; https://doaj.org/toc/2076-3417; abstract (see above); Kun Shao; Yu Zhang; Junan Yang; Hui Liu; MDPI AG; article; deep neural networks; natural language processing; adversarial machine learning; backdoor attacks; backdoor defenses; Technology (T); Engineering (General). Civil engineering (General) (TA1-2040); Biology (General) (QH301-705.5); Physics (QC1-999); Chemistry (QD1-999); EN; Applied Sciences, Vol 11, Iss 9938, p 9938 (2021) |
| institution | DOAJ |
| collection | DOAJ |
| language | EN |
| topic | deep neural networks; natural language processing; adversarial machine learning; backdoor attacks; backdoor defenses; Technology (T); Engineering (General). Civil engineering (General) (TA1-2040); Biology (General) (QH301-705.5); Physics (QC1-999); Chemistry (QD1-999) |
| description | See the abstract above. |
| format | article |
| author | Kun Shao; Yu Zhang; Junan Yang; Hui Liu |
| author_sort | Kun Shao |
| title | Textual Backdoor Defense via Poisoned Sample Recognition |
| publisher | MDPI AG |
| publishDate | 2021 |
| url | https://doaj.org/article/539075c4b9b94a4daaa69c6db1972118 |