Generating Adversarial Examples with Adversarial Networks [ICLR 18 Under Review]

2017, Dec 08    

One Line Summary

  • Targeted adversarial perturbations are generated by a feed-forward generative model rather than by per-example optimization.

Motivation

  • Adversarial perturbations have been shown to fool state-of-the-art networks. Most existing approaches generate these perturbations with gradient-based optimization on each input; this work instead trains a neural network (a GAN) to produce adversarial perturbations directly.

Detailed Summary

  • AdvGAN can attack black-box models by training a distilled model. The authors propose to dynamically train the distilled model with query information, achieving a high black-box attack success rate and enabling targeted black-box attacks, which are difficult for transferability-based black-box attacks.
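The dynamic distillation idea can be sketched in a toy numpy example (all names and shapes here are illustrative, not from the paper's code): a distilled model is trained to match a query-only black-box model's output probabilities via cross-entropy, so the generator can later be trained against the distilled surrogate.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 8, 3, 64                    # toy dimensions (assumed)
W_bb = rng.normal(size=(d, k))        # unknown black-box weights
W = np.zeros((d, k))                  # distilled model, fit from queries
x = rng.normal(size=(n, d))

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def black_box(x):
    # Stand-in for the query-only target model: returns class probabilities.
    return softmax(x @ W_bb)

def distilled(x, W):
    return softmax(x @ W)

def distillation_loss(x, W):
    # Cross-entropy between queried black-box outputs and distilled outputs.
    p = black_box(x)                  # "soft labels" from queries
    q = distilled(x, W)
    return -np.mean(np.sum(p * np.log(q + 1e-12), axis=1))

# Plain gradient descent on the distillation loss (numerical gradient for
# brevity; the paper alternates this step with generator updates).
eps, lr = 1e-5, 0.5
for _ in range(200):
    grad = np.zeros_like(W)
    for i in range(d):
        for j in range(k):
            Wp = W.copy(); Wp[i, j] += eps
            Wm = W.copy(); Wm[i, j] -= eps
            grad[i, j] = (distillation_loss(x, Wp) - distillation_loss(x, Wm)) / (2 * eps)
    W -= lr * grad

# Fraction of inputs where the distilled model agrees with the black box.
agree = np.mean(distilled(x, W).argmax(1) == black_box(x).argmax(1))
```

After a few hundred steps the distilled model's predictions agree with the black box on most inputs, which is what makes targeted black-box attacks through the surrogate feasible.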

  • AdvGAN, evaluated on Madry et al.'s MNIST challenge (2017), achieved an 88.93% attack success rate on the published robust model in the semi-whitebox setting and 92.76% in the black-box setting, placing first in the challenge.

Novelty and Contributions

  • Unlike previous optimization-based methods, a conditional adversarial network is trained to directly produce adversarial examples that are both perceptually realistic and achieve state-of-the-art attack success rates against different target models.

  • The authors evaluate AdvGAN against state-of-the-art defense methods and show that it achieves a much higher attack success rate under these defenses than existing attacks.

Network Details

Network

  • The generator takes an image as input and outputs the corresponding perturbation.
  • The image plus the perturbation fools the target classifier.

Results

Authors

Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, Dawn Song

Sources

Paper

Code