
It is double pleasure to deceive the deceiver: disturbing classifiers against adversarial attacks

Posted on Wed, 08/26/2020 - 23:58
Title: It is double pleasure to deceive the deceiver: disturbing classifiers against adversarial attacks
Publication Type: Conference Proceedings
Year of Conference: 2020
Authors: Zago JG, Antonelo EA, Baldissera FL, Saad RT
Conference Name: 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Pagination: 160-165
Abstract

Convolutional neural networks (CNNs) for image classification can be fragile to small perturbations in the images they ought to classify. This fragility exposes CNNs to malicious attacks, raising safety concerns in many application domains. In this paper, we propose a simple yet efficient strategy for decreasing the effectiveness of black-box attacks that must sequentially query the classifier network to build an attack. The general idea consists of applying controlled random disturbances (noise) at the softmax output layer of neural network classifiers, changing the confidence scores according to a set of design requirements. To evaluate this defense strategy, we employ a CNN trained on the MNIST dataset and attack it with a black-box attack method from the literature called ZOO. The results show that our defense strategy: a) decreases the attack success rate of the adversarial examples; and b) forces the attack algorithm to insert larger perturbations in the input images.
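The core idea in the abstract can be sketched in a few lines: add bounded random noise to the softmax confidences before returning them, so a query-based attacker such as ZOO observes distorted score differences while the defended model's own prediction is unaffected. The paper's exact design requirements for the noise are not reproduced here; this is a hypothetical minimal sketch in NumPy that assumes uniform noise and one possible requirement, namely that the clean top-1 class stays on top. The function name `disturbed_scores` and the `noise_scale` parameter are illustrative, not from the paper.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax for a single logit vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def disturbed_scores(logits, noise_scale=0.1, rng=None):
    """Return noise-disturbed confidence scores for one input.

    Sketch of the defense idea: add bounded uniform noise to the
    softmax output, renormalize, and keep the clean top-1 class on
    top, so legitimate accuracy is untouched while a sequential
    query-based attacker receives unreliable confidence scores.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = softmax(np.asarray(logits, dtype=float))
    # Bounded uniform disturbance, clipped so scores stay positive.
    q = np.clip(p + rng.uniform(-noise_scale, noise_scale, size=p.shape),
                1e-9, None)
    q /= q.sum()                       # renormalize to a distribution
    i, j = int(p.argmax()), int(q.argmax())
    if i != j:                         # preserve the clean prediction
        q[i], q[j] = q[j], q[i]
    return q
```

A ZOO-style attacker estimates gradients from finite differences of these returned scores, so even small random disturbances corrupt its gradient estimates across successive queries, which is consistent with the reported outcome of lower attack success and larger required perturbations.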

DOI: 10.1109/SMC42975.2020.9282889