A Black-Box Adversarial Attack via Deep Reinforcement Learning on the Feature Space

Lyue Li, Amir Rezapour, Wen Guey Tzeng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we propose a novel black-box adversarial attack that uses reinforcement learning to learn the characteristics of the target classifier C. Our method does not need a substitute classifier that resembles C in structure and parameters. Instead, it learns an optimal attack policy that guides the attacker in building an adversarial image from the original image. We work in the feature space of images rather than directly on their pixels. Our method achieves better results on many measures. It achieves a 94.5% attack success rate on a well-trained digit classifier. Our adversarial images are less perceptible even though their norm distances to the original images are larger than those of other methods. Because our method works on the characteristics of a classifier, it has better transferability. Its transfer rate reaches up to 52.1% for a targeted class and 65.9% for a non-targeted class, improving on the single-digit transfer rates of previous results. We also show that our attack is harder to defend against with defense mechanisms such as MagNet, which uses a denoising technique: our method achieves a 65% attack success rate even when the target classifier employs MagNet as a defense.

Original language: English
Title of host publication: 2021 IEEE Conference on Dependable and Secure Computing, DSC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728175348
State: Published - 30 Jan 2021
Event: 2021 IEEE Conference on Dependable and Secure Computing, DSC 2021 - Aizuwakamatsu, Fukushima, Japan
Duration: 30 Jan 2021 → 2 Feb 2021

Publication series

Name: 2021 IEEE Conference on Dependable and Secure Computing, DSC 2021

Conference

Conference: 2021 IEEE Conference on Dependable and Secure Computing, DSC 2021
Country/Territory: Japan
City: Aizuwakamatsu, Fukushima
Period: 30/01/21 → 2/02/21

Keywords

  • adversarial attack
  • adversarial defense
  • adversarial example
  • autoencoder
  • black-box attack
  • deep reinforcement learning
  • image processing
  • machine learning
