Michael Picheny, Zoltan Tuske, et al.
INTERSPEECH 2019
Fooling deep neural networks with adversarial input has exposed a significant vulnerability in current state-of-the-art systems across multiple domains. Both black-box and white-box approaches have been used either to replicate the model itself or to craft examples that cause the model to fail. In this work, we propose a framework that uses multi-objective evolutionary optimization to perform both targeted and untargeted black-box attacks on Automatic Speech Recognition (ASR) systems. We apply this framework to two ASR systems, DeepSpeech and Kaldi, and increase their Word Error Rates (WER) by up to 980%, indicating the potency of our approach. In both untargeted and targeted attacks, the adversarial samples maintain high acoustic similarity with the original audio (0.98 and 0.97, respectively).
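The core idea in the abstract, evolving audio perturbations against two competing objectives (degrading the ASR output while preserving acoustic similarity), can be sketched as a toy multi-objective evolutionary loop. This is a minimal illustration, not the paper's implementation: `toy_asr_loss` is a hypothetical placeholder for the black-box ASR objective (e.g., WER against the target transcript), and the audio is represented as a plain list of floats.

```python
# Minimal sketch of a multi-objective evolutionary black-box attack loop.
# All function names here are illustrative stand-ins, not the paper's code:
# toy_asr_loss substitutes for querying a black-box ASR system such as
# DeepSpeech or Kaldi and measuring the WER of its transcription.
import random

def acoustic_similarity(orig, adv):
    # Cosine similarity between original and adversarial signals (toy proxy
    # for the acoustic-similarity objective reported in the abstract).
    dot = sum(a * b for a, b in zip(orig, adv))
    n1 = sum(a * a for a in orig) ** 0.5
    n2 = sum(b * b for b in adv) ** 0.5
    return dot / (n1 * n2) if n1 and n2 else 0.0

def toy_asr_loss(adv):
    # Hypothetical stand-in for the ASR-degradation objective; a real attack
    # would query the target system here. Uses mean absolute amplitude only
    # so the sketch stays self-contained and runnable.
    return sum(abs(x) for x in adv) / len(adv)

def dominates(a, b):
    # Pareto dominance over (asr_loss, similarity); both maximized.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def evolve(orig, pop_size=20, gens=30, eps=0.05, seed=0):
    rng = random.Random(seed)

    def mutate(adv):
        # Small additive perturbation, bounded by eps per sample.
        return [x + rng.uniform(-eps, eps) for x in adv]

    pop = [mutate(list(orig)) for _ in range(pop_size)]
    for _ in range(gens):
        children = [mutate(rng.choice(pop)) for _ in range(pop_size)]
        combined = pop + children
        scored = [(toy_asr_loss(c), acoustic_similarity(orig, c)) for c in combined]
        # Keep the Pareto-non-dominated candidates, then fill the remaining
        # slots by the ASR objective alone.
        front = [c for c, s in zip(combined, scored)
                 if not any(dominates(t, s) for t in scored if t is not s)]
        rest = sorted((c for c in combined if c not in front),
                      key=toy_asr_loss, reverse=True)
        pop = (front + rest)[:pop_size]
    # Return the candidate that degrades the (toy) ASR objective the most.
    return max(pop, key=toy_asr_loss)
```

A real attack would replace `toy_asr_loss` with black-box queries to the target recognizer and would likely use a standard multi-objective algorithm (e.g., NSGA-II-style crowding) rather than this simplified Pareto filter.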
Vibha Singhal Sinha, Senthil Mani, et al.
MSR 2013
Gakuto Kurata, Kartik Audhkhasi
INTERSPEECH 2019
Richard Goodwin, Pietro Mazzoleni, et al.
SRII 2012