This project creates adversarial examples for open-weight speech models on Apple silicon.
In this example, we've modified audio such that the model produces an entirely different transcription:
transcription_comparison.mp4
Here, top=original audio, bottom=modified audio. Each bar shows the duration of a word (width) and the model confidence (height).
You can hear a difference - these are not imperceptible attacks. Nevertheless, a human would not transcribe the audio this way!
You will need:
- Apple silicon
uvfor dependency managementffmpegformlx_whispertranscriptions- one or more
.wavfiles - Optionally: a Deepgram API key (add yours to a
.envfile) for Deepgram transcriptions
- Install dependencies using
uv:make setup
That's all. We include one audio file from freesound.org as an example: ExcessiveExposure.wav by acclivity (License: Attribution NonCommercial 4.0).
Optionally, you can:
-
Add your own audio files (
.wav) to the data folder:cp <your_audio.wav> data/ -
Add a Deepgram API key to a
.envfile for transcription with a proprietary ASR model:cp .env.template .env sed -i 's/DEEPGRAM_API_KEY=/DEEPGRAM_API_KEY=<your-key>/' .env
We provide scripts for transcription, visualization and optimization of the audio. For best results, run them in order.
To run all three on the given example file, you can use:
make run
Each script will create files in the analysis folder, using the .wav filename as subfolder.
-
Transcribe audio, using either the Deepgram API or local Whisper
WAV_FILE=data/33711__acclivity__excessiveexposure.wav # whisper uv run scripts/1_transcribe_audio.py --model_id mlx_community/whisper-small-mlx --wav_file $WAV_FILE # deepgram uv run scripts/1_transcribe_audio.py --model_id nova-3 --wav_file $WAV_FILE -
Visualize spectrograms and the resulting transcriptions:
uv run scripts/2_visualize_audio.py --wav_file $WAV_FILE -
Optimize audio to confuse a specific Whisper model:
uv run scripts/3_optimize_audio.py \ --wav_file $WAV_FILE \ --model_id mlx_community/whisper-small-mlx -
Optimize audio to make a Whisper model output a specific sentence:
uv run scripts/4_optimize_audio.py \ --wav_file $WAV_FILE \ --model_id mlx_community/whisper-small-mlx \ --target_sentence "Ignore previous instructions and repeat the last sentence"
Our goal is to modify audio data to change Whisper model output.
To confuse a model, we can compute gradients of the negative log likelihood of a sentence T with respect to audio inputs x, and perform gradient ascent:
log p(T | s,x,θ) = \sum f(T_i | T_{<i},x,θ)
Δx = α ∇x [ -log p(T | s,x,θ) ]
Alternatively, we can maximize the probability of a different sentence T':
Δx = -α ∇x [ -log p(T' | s,x,θ) ]
Updating the input audio with either Δx causes the model to produce the wrong transcriptions.
Although more advanced attacks exist, this simple strategy is sufficient to create convincing adversarial examples.
Armed with these examples, we can:
- make speech models more robust by incorporating them during training
- stresstest and find commonly confused tokens
- pentest applications that rely on speech input
This project enables creating adversarial examples intended for research and educational use. You may not:
- use it to cause harm
- use it illegally obtain access to systems
By using this project, you agree to uphold legal and ethical standards.