Generating Adversarial Examples with Adversarial Networks [ICLR 18 Under Review]

2017, Dec 08    

One Line Summary

  • Targeted adversarial perturbations are generated by a feed-forward generative model rather than by per-example optimization.

Motivation

  • Adversarial perturbations have been shown to fool state-of-the-art networks. Most existing approaches generate these perturbations with gradient-based optimization on each input; this work instead trains a neural network (a GAN) to produce adversarial perturbations directly.

Detailed Summary

  • AdvGAN can attack black-box models by training a distilled model. The authors propose to dynamically train the distilled model with query information, achieving a high black-box attack success rate and enabling targeted black-box attacks, which are difficult for transferability-based black-box attacks.
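The dynamic distillation idea can be sketched in a toy numpy example (all names and shapes here are illustrative, not from the paper's code): a distilled model is trained to match a query-only black-box model's output probabilities via cross-entropy, so the generator can later be trained against the distilled surrogate.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 8, 3, 64                    # toy dimensions (assumed)
W_bb = rng.normal(size=(d, k))        # unknown black-box weights
W = np.zeros((d, k))                  # distilled model, fit from queries
x = rng.normal(size=(n, d))

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def black_box(x):
    # Stand-in for the query-only target model: returns class probabilities.
    return softmax(x @ W_bb)

def distilled(x, W):
    return softmax(x @ W)

def distillation_loss(x, W):
    # Cross-entropy between queried black-box outputs and distilled outputs.
    p = black_box(x)                  # "soft labels" from queries
    q = distilled(x, W)
    return -np.mean(np.sum(p * np.log(q + 1e-12), axis=1))

# Plain gradient descent on the distillation loss (numerical gradient for
# brevity; the paper alternates this step with generator updates).
eps, lr = 1e-5, 0.5
for _ in range(200):
    grad = np.zeros_like(W)
    for i in range(d):
        for j in range(k):
            Wp = W.copy(); Wp[i, j] += eps
            Wm = W.copy(); Wm[i, j] -= eps
            grad[i, j] = (distillation_loss(x, Wp) - distillation_loss(x, Wm)) / (2 * eps)
    W -= lr * grad

# Fraction of inputs where the distilled model agrees with the black box.
agree = np.mean(distilled(x, W).argmax(1) == black_box(x).argmax(1))
```

After a few hundred steps the distilled model's predictions agree with the black box on most inputs, which is what makes targeted black-box attacks through the surrogate feasible.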

  • AdvGAN, evaluated on Madry et al.'s MNIST challenge (2017), achieved an 88.93% attack success rate on the published robust model in the semi-whitebox setting and 92.76% in the black-box setting, placing first in the challenge.

Novelty and Contributions

  • Unlike previous optimization-based methods, a conditional adversarial network is trained to directly produce adversarial examples that are both perceptually realistic and achieve state-of-the-art attack success rates against different target models.

  • The authors evaluate AdvGAN against state-of-the-art defense methods and show that it achieves a much higher attack success rate under these defenses than existing attacks.

Network Details

Network

  • The generator takes an image as input and outputs the corresponding perturbation.
  • The image plus the perturbation fools the target classifier.

Results

Authors

Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, Dawn Song

Sources

Paper

Code