Textual Backdoor Defense via Poisoned Sample Recognition

Deep learning models are vulnerable to backdoor attacks. In existing research, the success rate of textual backdoor attacks based on data poisoning is as high as 100%. To strengthen natural language processing models against backdoor attacks, we propose a textual backdoor defense method via poisoned sample recognition. Our method consists of two steps. In the first step, we add a controlled noise layer after the model's embedding layer and train a preliminary model in which the backdoor is only weakly embedded, or not embedded at all, which reduces the effectiveness of the poisoned samples; we then use this model to make an initial identification of poisoned samples in the training set, narrowing the search range. In the second step, we train an infected model, with the backdoor embedded, on all of the training data and use it to reclassify the samples selected in the first step, finally identifying the poisoned samples. Detailed experiments show that our defense method is effective against a variety of backdoor attacks (character-level, word-level and sentence-level) and that it outperforms the baseline method. For a BERT model trained on the IMDB dataset, our method can even reduce the success rate of word-level backdoor attacks to 0%.
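Since the abstract describes the defense only at a high level, the sketch below illustrates one way the two steps could look in PyTorch: a controlled-noise wrapper around the embedding layer for the preliminary model, followed by a two-pass filter over the training set. All names, the noise scale, and the decision rules are assumptions made for illustration; this is not the authors' released code.

```python
# Minimal, hypothetical sketch of the two-step idea from the abstract,
# assuming a PyTorch text classifier whose forward pass returns logits.
import torch
import torch.nn as nn


class NoisyEmbedding(nn.Module):
    """Wraps an embedding layer and adds controlled Gaussian noise during training.

    Training the preliminary model through this layer is meant to keep the
    backdoor from being fully embedded, weakening the poisoned samples.
    """

    def __init__(self, embedding: nn.Embedding, sigma: float = 0.1):
        super().__init__()
        self.embedding = embedding
        self.sigma = sigma  # hypothetical noise scale, not a value from the paper

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        emb = self.embedding(token_ids)
        if self.training and self.sigma > 0:
            emb = emb + self.sigma * torch.randn_like(emb)
        return emb


def flag_suspicious(preliminary_model: nn.Module, loader, device: str = "cpu"):
    """Step 1: use the noise-trained preliminary model to narrow the search range.

    Assumes a non-shuffled DataLoader; flags training samples whose given label
    disagrees with the preliminary model's prediction.
    """
    preliminary_model.eval()
    suspects = []
    with torch.no_grad():
        for batch_idx, (token_ids, labels) in enumerate(loader):
            preds = preliminary_model(token_ids.to(device)).argmax(dim=-1).cpu()
            for i, (p, y) in enumerate(zip(preds, labels)):
                if p.item() != y.item():
                    suspects.append(batch_idx * loader.batch_size + i)
    return suspects


def confirm_poisoned(infected_model: nn.Module, dataset, suspects, device: str = "cpu"):
    """Step 2: reclassify the flagged samples with a model trained on all data.

    One plausible rule (the abstract does not spell it out): a flagged sample
    whose given label is reproduced by the backdoored model, but was rejected
    by the preliminary model, is treated as poisoned.
    """
    infected_model.eval()
    poisoned = []
    with torch.no_grad():
        for i in suspects:
            token_ids, label = dataset[i]
            pred = infected_model(token_ids.unsqueeze(0).to(device)).argmax(dim=-1).item()
            if pred == label:
                poisoned.append(i)
    return poisoned
```

Under these assumptions, the noise wrapper can be dropped into an existing classifier by replacing its embedding module with NoisyEmbedding(embedding) before training the preliminary model, while the infected model is the same architecture trained on all data without the noise layer.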


Bibliographic Details
Main Authors: Kun Shao, Yu Zhang, Junan Yang, Hui Liu
Format: article
Language: EN
Published: MDPI AG, 2021
Published in: Applied Sciences, Vol 11, Iss 21, p 9938 (2021); eISSN 2076-3417; DOI 10.3390/app11219938
Subjects: deep neural networks; natural language processing; adversarial machine learning; backdoor attacks; backdoor defenses; Technology (T)
Online Access: https://doaj.org/article/539075c4b9b94a4daaa69c6db1972118
https://www.mdpi.com/2076-3417/11/21/9938