It is double pleasure to deceive the deceiver: disturbing classifiers against adversarial attacks

Posted on Wed, 08/26/2020 - 23:58
Title: It is double pleasure to deceive the deceiver: disturbing classifiers against adversarial attacks
Publication Type: Conference Proceedings
Year of Conference: In Press
Authors: Zago JG, Antonelo EA, Baldissera FL, Saad RT
Conference Name: IEEE International Conference on Systems, Man and Cybernetics (SMC 2020)
Abstract

Convolutional neural networks (CNNs) for image classification can be fragile to small perturbations in the images they ought to classify. This fragility exposes CNNs to malicious attacks, resulting in safety concerns in many application domains. In this paper, we propose a simple yet efficient strategy for decreasing the effectiveness of black-box attacks that need to sequentially query the classifier network in order to build an attack. The general idea consists of applying controlled random disturbances (noise) at the softmax output layer of neural network classifiers, changing the confidence scores according to a set of design requirements. To evaluate this defense strategy, we employ a CNN, trained on the MNIST data set, and attack it with a black-box attack method from the literature called ZOO. The results show that our defense strategy: a) decreases the attack success rate of the adversarial examples; and b) forces the attack algorithm to insert larger perturbations in the input images.
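The defense described in the abstract can be illustrated with a minimal sketch: inject controlled random noise into the softmax confidence scores before returning them to a querying client, so that a query-based black-box attacker (such as ZOO, which estimates gradients from score differences) receives disturbed signals. The noise scale, the renormalization step, and the choice to preserve the top-1 prediction are illustrative assumptions, not the paper's exact design requirements.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def disturbed_confidences(logits, noise_scale=0.05, rng=None):
    """Return softmax scores perturbed by controlled random noise.

    Illustrative sketch: uniform noise is added to the confidence
    vector and the result is renormalized. The predicted class
    (argmax) is kept unchanged here so that clean accuracy is
    preserved while an attacker querying the scores sees noisy
    values -- one plausible "design requirement" for such a defense.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = softmax(logits)
    top = int(np.argmax(p))
    noise = rng.uniform(-noise_scale, noise_scale, size=p.shape)
    q = np.clip(p + noise, 1e-12, None)  # keep scores positive
    q /= q.sum()                         # renormalize to a distribution
    if int(np.argmax(q)) != top:         # restore the original top-1 label
        q[top] = q.max() + 1e-6
        q /= q.sum()
    return q
```

Because a ZOO-style attacker approximates gradients via finite differences of these scores, repeated queries now return inconsistent values, which degrades the gradient estimate and forces larger input perturbations.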