DEF CON CTF 2021

A week before the game, Tea Deliverers split up into four sub-teams and held an inner competition using some of the past challenges with some minor tweaks. I was the only one doing the KoH challenge in our team, and I was doing fairly okay, so I decided to be a KoH player for the finals. I was sure the challenges would keep me busy whilst not stress me out during the game, as there would be one and only one completely new challenge each day.

Turned out I was right. I enjoyed each and every King of the Hill challenge, and I got enough rest during the time off when it’s daytime in China, very healthy lifestyle indeed.

zero-is-you

zero-is-you-score

This was a copy of the game Baba is You. What’s nice about the original game is that it’s already Turing complete, and just to add shellcode execution onto it, woah, you just opened the gate to another dimension. (insert mind-blown meme pic here)

For those without prior experience in Baba is You, the game is a Sokoban-kind puzzle game, where the player can push different blocks into different places. There are some famous modern development of this classic genre, like A Good Snowman Is Hard To Build or Stephen’s Sausage Roll. The twist with Baba is You is that the blocks themselves can be a part of an instruction that tells how the game world functions. I will explain this more in the following.

I really recommend you trying out Baba is You, but for a smol brain like me, I can’t finish the game without referencing a walkthrough, sad.

Extract Game Code

Within the provided package game_client.tgz, there were a few files, README, start, sync, and a build folder. README told us how the game worked. start was a shell script that simply run zero-is-you binary within the build folder.

For sync, since we needed to run it by python3, it must be a Python program. It was not a text file, so it must be a compiled Python byte-code like .pyc. Using an existing decompiler like rocky/python-decompile3, we were able to decompile sync into a Python source code. There’s nothing interested about sync, it just uploads and downloads to and from the server. As for what data it actually transmitted, we’ll get back to it in a sec.

After @riatre gave us the hint that the entire game is written in Python, as a Game Data-Mine Professional™, without playing the game first, I dove directly into extracting the game data. A quick strings of the zero-is-you elf binary showed things like _MEIPASS2, and a quick Google search told us that this was a program packed by PyInstaller. Using an existing extractor extremecoders-re/pyinstxtractor, we were able to extract every .pyc from the binary. And using the aforementioned decompiler, we could turn these bytecodes back into their .py form. Although there were some decompile errors, we were getting most part of the game in source code form.

Once we had the extracted game, my teammates could run the game on Windows and MacOS without problems. We also implemented custom functionalities to the game, such as undoing a move, which was just to replay the moves. The decompiled source code also helped us figure out the format of level files and rules of the game, which I will talk about later.

Ready Player One

OOO’s internet is down so we were not able to use sync to communicate with the server, but they kindly sent us the level1 file so that we can look into this level first.

With some prior experience with Baba is You, I immediately knew what’s going on. On the top we had two “rules,” zero is you and ice is stop. The former meant that we were currently in control of zero, and the latter meant that everything should stop in front of ice. So zero referred to the little hacker boy icon in the middle of the screen, this could be easily checked by moving zero using arrow keys; ice referred to ice blocks surrounding the middle part of the screen, and a line of ice blocks at the top.

Any such “rule” consisted of three blocks that looked like [xxx] [is] [yyy] reading from left to right or top to bottom would be enforced in the game world. You could make a new rule by pushing the blocks around and putting them together, or break existing rules by splitting the blocks up. The special rule is that [noun] is you will make the player able to control that [noun] thing on the board, just like Baba is You.

However, the processor-looking block and a line of hex values at the top of the screen was something not in Baba is You. With a further look into the source code and a few trial-and-errors with the blocks, we figured out how the entire game works:

Shellcode Mechanism

The entire screen we saw, which was a $25 \times 15$ space of blocks, was actually the memory of the machine, meaning the address of the machine ranged from 0x0 to 0x176. The memory read from left to right, so the layout was

\[\begin{matrix} \texttt{0x00} & \texttt{0x01} &\cdots &\texttt{0x18} \\ \texttt{0x19} &\texttt{0x20}&\cdots \\ \vdots & \vdots &\ddots & \\ \texttt{0x15e} &\texttt{0x25f}& \cdots & \texttt{0x176} \end{matrix}\]

The hex bytes displayed on those blocks were then the values on that memory address. If the value is zero, then that block would be displayed as empty. However, you could switch the display style by pressing ` (backquote), which let you hide the game block and display the empty zero values.

The block that looked like a processor, which is on the top-left corner of the game, would act as a program counter (or instruction pointer) that would execute the memory, if and only if an architecture was specified (cpu is [arch] was on the board) and cpu is run was on the board at the same time. If cpu is run was on the board but no architecture was specified, a segfault would be thrown.

When cpu is run was on the board, every time the player made a move, either using arrow keys or the Space key to stay where you were, the machine would execute the instruction at the program counter (location of the processor block).

The level would be cleared when a syscall to SYS_EXECV was invoked with the argument being either /bin/sh or /bin/bash after sanitized by os.path.abspath in Python. For example, for x64 arch, that means if we could put an address to a string /bin/sh in rdi and 0x3b in al, and then invoke a syscall, we could pass the level.

Level File Decryption

The level file decryption was inside the file utils.py after unpacking and decompiling. It looked something like this:

def load_level_data(filename):
    key = b'zero'
    l = len(key)
    with open(filename, 'rb') as (f):
        data = f.read()[3:]
        compressed = bytearray((data[i] ^ key[(i % l)] for i in range(0, len(data))))
        data = zlib.decompress(compressed)
    return data.decode('utf8')

Figured this out, we could view the level file in its clear text form. Level 1’s file after decryption looked like this:

Ready Player One
W wall
w _WALL
I ice
i _ICE
Z zero
s _is
t _STOP
z _ZERO
m _YOU
1 cpu
c _CPU
x _x64
b _BUG
r _RUN
B bug

1........................  90 48 83 c4 50 50 48 bb 2f 62 69 6e 2f 2f 73 68 53 54 5f b0 3b 0f 05
IIIIIIIIIIIIIIIIIIIIIIIII  
zsm...................ist
.........................
.......IIIIIIIIIII.......
.......I.........I.......
.......I....s....I.......
.......I...cxr...I.......
.......I....s....I.......
.......I.........I.......
.......I....Z....I.......
.......IIIIIIIIIII.......
.........................
.........................
.........................
.........................

The first line says the title name of the level. The following lines before an empty line says what each character meant in the game board representation down below. And then we have a $25 \times 15$ board made out of ASCII characters, each line followed by a maximum of $25$ hex values, representing the memory values on the board. If the line is trailing with 00s, then those values are omitted.

We could build custom levels as well by reversing what the function does. We could also alter the existing levels to make testing out solutions much easier.

Playing the Game

Now going back to the first level, there was only one cpu block and one arch x64 we could use, meaning we had to somehow first set cpu is x64 and then set cpu is run without breaking the first one. Recall that the rules were parsed from left to right AND from up to down. Therefore, we could put cpu at the upper-left corner, so it could be used by two rules at the same time, something like this:

\[\begin{matrix} \texttt{cpu} & \texttt{is} & \texttt{x64}\\ \texttt{is}\\ \texttt{run} \end{matrix}\]

After I did that and moved around to execute the instructions, we cleared the level. But before we go to the next level, let’s first take a look at the shellcode for the first level:

0:  90                      nop
1:  48 83 c4 50             add    rsp,0x50
5:  50                      push   rax
6:  48 bb 2f 62 69 6e 2f    movabs rbx,0x68732f2f6e69622f
d:  2f 73 68
10: 53                      push   rbx
11: 54                      push   rsp
12: 5f                      pop    rdi
13: b0 3b                   mov    al,0x3b
15: 0f 05                   syscall

As we can see, this is a pretty normal shellcode that does a syscall with al being 0x3b and rdi points to a string in the memory that is /bin//sh. The extra slash does not matter because it would be sanitized out by os.path.abspath in Python.

Flatline

This time we had a x86 machine, and a quick disassembling told us

0:  83 c4 20                add    esp,0x20
3:  50                      push   eax
4:  68 2f 2f 73 68          push   0x68732f2f
9:  68 2f 62 69 6e          push   0x6e69622f
e:  89 e3                   mov    ebx,esp
10: 89 c1                   mov    ecx,eax
12: 89 c2                   mov    edx,eax
14: b0 0b                   mov    al,0xb
16: cd 80                   int    0x80

That we had a completely valid shellcode here already, so the only thing we needed to do is to run cpu. The only cpu is run was at the left side of the game board, so we had to get there. However, we were surrounded by a wall of virus blocks, and virus had two rules enforced on them at the start of the game: virus is stop and virus is kill. We could only disable virus is stop by pushing away any of the blocks in that rule, but after that, when we tried to step onto a virus block, we would be killed immediately with a “game over” screen.

It seemed impossible to disable the rule without leaving the “virus jail,” but we had to leave the jail first to move the block away, so it’s a paradox! However, this was just another famous trick in Baba is You, where you realize that the “kill” command only applies to the player but not other things in the game world, so other blocks can move freely through the virus block, assuming virus is stop is not in effect.

The solution then is something like this:

zero-lvl02-sol

Where you first line up the blocks at the bottom horizontally, then push the blocks from the right so the blocks pushes each other and extends out, until they reach virus is stop and split that rule. After that, our Zero can walk pass through the virus blocks and push together cpu is run to finish the game. This strategy came in handy in the later part of the game as well.

ICE Crash

For this level, we had cpu is x64 and cpu is run already in effect. However, if we just wander around and let the program executes, the processor would move to the next line and hits the ice, resulting in a segfault, because ice is stop was also in effect.

Let’s then look at this level’s shellcode

0:  48 83 c4 18             add    rsp,0x18
...
c:  50                      push   rax
...
19: 48 bb 2f 62 69 6e 2f    movabs rbx,0x68732f2f6e69622f
20: 2f 73 68
23: 53                      push   rbx
24: 54                      push   rsp
25: 5f                      pop    rdi
...
32: b0 3b                   mov    al,0x3b
34: 0f 05                   syscall

The zeros were hidden since 00 00 was very similar to nop in our case. This was a normal shellcode, and if we could execute the entirety of this shellcode, we could pass this level. Therefore, we must had somehow made the processor execute the program without being stoped.

It looked like there were two ice is stop on the board right now, so we could just break them up and that should solve the problem, right? A few tries told us that we just didn’t have enough time to break the blocks up before the processor reached an ice block.

Notice that for all the levels before, we had most of the rules formed in a way that looks like [noun] is [verb], but it’s possible to turn something into something else by setting up a rule as [noun] is [noun]. Looking at the board, we had zero is you exposed to the player. That meant it was possible to form ice is zero or ice is you in the game.

In fact, both solutions work: ice is zero would turn the ice into a bunch of Zero so the processor would not be stopped since the ice blocks are gone. And ice is you would make the player be able to control the ice so we could move the ice blocks out of the way.

High-speed Pizza Delivery

Here we had cpu is x64 and ice is stop at the bottom of the screen where we were unable to reach. And we had a bunch of new blocks that looked like different registers. Ignoring that first and putting cpu is run together, we could see the processor reached the end of the shellcode and then a segfault was thrown. Let’s take a look at the shellcode then:

0:  90                      nop
1:  90                      nop
2:  48 83 c4 50             add    rsp,0x50
6:  48 bb 2f 62 69 6e 2f    movabs rbx,0x68732f2f6e69622f
d:  2f 73 68
10: 50                      push   rax
11: 53                      push   rbx
12: 54                      push   rsp
13: 58                      pop    rax
14: b0 3b                   mov    al,0x3b
16: 0f 05                   syscall

Huh, everything looked normal, though with a closer look, we could see that the address of the string was stored at rax instead of the supposed rdi register. Then, combined with the newly appeared blocks that had register names on them, it came to us that we needed to make rax is rdi so that when the machine was executing pop rax, it was actually executing pop rdi.

However, if we first formed rax is rdi and let the program run, we still would get a segfault. Then we tried a lot of things, like using rdi is rax instead. One of my teammates finally realized that we might only want the rule to be in effect when we were executing that single instruction pop rax, so he pushed cpu is run vertically, let it run to address 0x13, stopped the run by splitting the blocks, went to the right part and formed rdi is rax, came back and continued the run for 1 tick, and then disabled that rule, came back to run the last syscall, and finally we got it working.

Ten Thousands Steps

Well, the title said ten thousands steps, but if you actually used more than 1000 steps, you would get a screen saying you have run out of time.

We were again locked up in a jail, made out of ice blocks this time. However, there’s a lock block and lock is push inside the jail, so we could simply push from left to right to enable that rule and walk out of the jail. However, if we just went straight to cpu is run, the processor would inevitably stumble onto an ice block and stops running (throwing a segfault).

With a closer look at the exiting shellcode on the board, I realized we actually had every instruction we needed on the board, but they were just scattered around the memory. Looking at what we had on the board, I came up with a solution:

Enable lock is push to escape the jail.
Push cpu is run all into the jail so that it can form cpu is push horizontally and cpu is run vertically at the same time. Notice that since push is at the right most edge of the board, it is impossible to push it out.
Execute the following instructions one by one, by pushing the processor block to the start of that instruction, go back into the jail to enable cpu is run, and immediately disable it so the machine will run exactly one instruction. Go back to push the processor block to the head of the next instruction we want to execute, and repeat the process.
1. 48 83 c4 2a (add rsp,0x2a) to set up the stack.
2. 48 bf 2f 62 69 6e 2f 2f 73 68 (movabs rdi,0x68732f2f6e69622f) to store the string into rdi.
3. 57 (push rdi) to push the string onto the stack.
4. 54 (push rsp) to store the string address onto the stack.
5. 5f (pop rdi) to pop the string address back into rdi.
6. 6a 3b (push 0x3b) to push the syscall number onto the stack.
7. 58 (pop rax) to pop the syscall number back into rax.
8. ……

Wait a minute, where’s the syscall instruction? Then my teammate hinted me, it’s actually hidden under an ice block. So a look at the game level file, it really was hidden under the bottom line of ice cubes. This meant that we would have to change our strategy up a bit.

So my solution was, once we escaped the jail, we could push lock text block all the way down to the left-bottom side, and use the is block out there to form ice is lock so all the ice blocks will turn into lock blocks. Then, pushing the lock text block all the way back to form lock is push, so we could push away the jail wall and reveal the syscall instruction bytes. Then we could continue our plan from step 2.

In the end, we just needed to push the processor block to the revealed 0f 05 (syscall) and let the machine run one last time. In the end it took about 750 steps for me to beat this level by hand.

The Elegant Mantis

This level really juiced my brain. The two blocks with the “recycle” symbol drawn on them would flip the memory in respect to the game board when the player stepped on them. The left one would flip the memory horizontally (the center row wouldn’t change); the right one would flip vertically (the center column wouldn’t change). Therefore, we had four potential shellcodes: one original, one flipped horizontally, one flipped vertically, and one flipped diagonally.

I didn’t solve this level, so I will skip what I tried during the game and talk about how my teammate solved it.

Let’s first analyze the original shellcode:

0:  be 16 00 00 00          mov    esi,0x16
5:  bb 00 00 00 00          mov    ebx,0x0
a:  eb 05                   jmp    0x11
c:  01 db                   add    ebx,ebx
e:  83 c6 01                add    esi,0x1
11: 83 fe 30                cmp    esi,0x30
14: 76 f6                   jbe    0xc
16: 83 ec 90                sub    esp,0xffffff90
19: 08 53 76                or     BYTE PTR [ebx+0x76],dl
1c: f0 83 c4 12             lock add esp,0x12
20: 89 c7                   mov    edi,eax
22: 01 d8                   add    eax,ebx
24: e8 22 5a 55 90          call   0x90555a4b
29: 00 31                   add    BYTE PTR [ecx],dh
2b: c0 81 ea 18 21 4a 33    rol    BYTE PTR [ecx+0x4a2118ea],0x33
32: eb 0d                   jmp    0x41
34: 85 c0                   test   eax,eax
36: 54                      push   esp
37: 83 c4 21                add    esp,0x21
3a: 39 cb                   cmp    ebx,ecx
3c: 75 e8                   jne    0x26
3e: 5d                      pop    ebp
3f: 58                      pop    eax
40: 76 fe                   jbe    0x40
42: 01 db                   add    ebx,ebx
44: 83 ec 01                sub    esp,0x1
47: 01 fc                   add    esp,edi
49: 41                      inc    ecx
4a: 41                      inc    ecx
4b: b8 10 00 00 00          mov    eax,0x10
50: bb 2f 62 69 6e          mov    ebx,0x6e69622f
55: 89 18                   mov    DWORD PTR [eax],ebx
57: bb 2f 73 68 00          mov    ebx,0x68732f
5c: 89 58 04                mov    DWORD PTR [eax+0x4],ebx
5f: 89 c3                   mov    ebx,eax
61: 31 c9                   xor    ecx,ecx
63: 29 d2                   sub    edx,edx
65: b0 0b                   mov    al,0xb
67: cd 80                   int    0x80

If we just let this shellcode run, the machine would eventually throw a segfault at address 0x24, because the calling address was out of bounds. Taking a closer look at the end of the program, from address 0x4b to 0x67, we saw it actually set up the register and memory appropriately. That meant we could clear this level if we could let the program counter somehow reach there.

Therefore, starting from the very beginning, there were four actions we could do at every tick of the game: either to run the CPU, or flip the memory horizontally, or flip the memory vertically, or stop the CPU from running. We had a search algorithm to help us explore all different paths to take, via the help of unicorn, to decide whether to continue searching down the path or stop because of a segfault.

Finally we were able to find a path of shellcode that led to the desired instruction address mentioned above. Converting that to the list of movements in the game, and we were done with the level.

(Consensual) Hallucination

The shellcode in this level was just an infinite loop and contained no actual string of “/bin/bash” or any sort, so we could ignore it. Looking at the board, we had bug is bash, which meant that we could pass this level if we were able to touch the bug icon at the upper-right corner of the game.

This was yet another core mechanism in Baba is You, and since this time the shellcode was not important anymore, it was truly a game level. Not much to talk about then, we played it manually and found a solution:

Utilizing lock bind disk, we can push the floppy disk to the right side of the virus wall, while keeping the lock on the left side.
Use the floppy disk as a hook to get cpu is run back to the left side of the virus wall.
Use the bind mechanism to put the floppy disk on the bug icon and make it stay there by breaking lock bind disk.
Line up disk is at the bottom in front of the virus wall, and put all the remaining blocks to the left of them.
Push from left to right so the blocks pass through the virus wall and connect the zero text block on the right hand side.

This would form a new rule disk is zero so the floppy disk block turned into Zero. Since the floppy disk was on the bug icon, the Zero was then on the bug so we successfully passed this level.

UpWind

A lot of new mechanisms in this level.

There was a new portal-like block, which corresponded to the nrg text block (although I have no idea what nrg means) if nrg is edit was in effect and the player stood on one of the portal-like blocks, a pop-up would appear and the game would allow you to input a hex value. The value would then be written into that memory.

And there was a new “fan” mechanism, where if the fan is wind was in effect, all pushable blocks on the same row as the fan would all move left for one block, if possible.

We tried a lot of things and then had some basic understandings:

We had to go to the bottom of the level and trigger cpu is run, however
It was impossible to bring the processor into the wind tunnel because it would stick at the end and not allow us to pass through it, and
It was also impossible to use the processor at the upper part of the level, because there weren’t enough portals for us to input a valid shellcode, and we couldn’t even use jmp cause that requireed two bytes, but
there was nrg text block at the bottom of the level, so we could turn nrg into a CPU.

So our idea was to push all the portal blocks into the wind tunnel, input a shellcode using the edit function, and leave the leftmost one untouched so that we can turn it into a CPU later.

However, we ignored one of the largest problems: there’s a wind blowing in the tunnel and it would blow the CPU to the left one block each tick. Quickly, we came up with two potential solutions for this:

We write shellcode such that even if the processor was blown back by one block, it would still run without issues. That meant the ending byte of the previous instruction would be the start of the next instruction.
We write shellcode in reverse from right to left, so the wind would help us blow the CPU to the correct position. Since we could control when the CPU runs, we could execute one instruction when we saw the CPU reached a certain position, and then to stop the execution, and wait until it reaches the next instruction on the left.

The first one required some real hardcore shellcode technique, and the second one required precision timing, both were not easy to do. Therefore, we splitted into two teams, one trying the first solution and another trying the second. Eventually the first team had their answers out, and I had no idea how they did it.

Code Choreography

This level was very similar to level 5 Ten Thousands Steps.

A closer look at the existing shellcode on the board, we realized that we had everything except for cd 80 (int 0x80). There was cd 81 already on the board, so all we needed to do is to turn that 81 into 80, the question was, how?

We disassembled the entire memory at each offset and tried to find a combination of these that may work:

We first took a look at address 0xd1, which was dec DWORD PTR [ecx+0xc1], so we thought that we have to somehow put 0x32 into ecx such that they sum up to be 0xf3, which was the memory address of that byte 81 we wanted to decrease. There were inc eax and mov ecx, eax on the board, so my teammate wrote a script that could automatically repeat the process of controlling the character to push the processor to one location, come back and execute that instruction. However, this quickly exceeded the 1000 steps limit, so we had to look for something else.

We then turned our eyes to the other dec on the board at address 0x9e. Immediately we saw mov bl, BYTE PTR [eax] and mov BYTE PTR [eax], bl around it. After more digging, we found another really interesting instruction or eax, DWORD PTR [eax+0x40] at address 0xec. With some brute-forcing, we found that there’s an instruction mov al, 0x61, and at exactly address 0xa1 we had a value 0x90. ORing 0x90 with 0x61 we get 0xf1. Then all we need to do is to add eax twice, and viola, we find a way to decrease the value at 0xf3.

Together the chain to decrease the value looked like this:

e6:   b0 61                   mov    al, 0x61
ec:   0b 40 40                or     eax, DWORD PTR [eax+0x40]
ed:   40                      inc    eax
ee:   40                      inc    eax
9c:   8a 18                   mov    bl, BYTE PTR [eax]
9e:   4b                      dec    ebx
9f:   88 18                   mov    BYTE PTR [eax], bl

Then all that’s left to do was to build the remaining instructions.

NetMaze

Really the simplest non-gaming level here. All I did was writing a DFS to walk randomly, and before long I found a path to walk from the top-left to the bottom-right corner of the maze.

The only thing was that, I realized that the console output is not consistent. Sometimes even if the game displays the segfault screen, the console may not output a line saying segfault. So I patched the source script a bit so the output is consistent with the actual state.

Bub and Bob

Another pure game level. Our solution was something like this:

Fall from the left and push ball is stop all the way to the right so that bub is stop extends out.
Ride the bubble back to the top.
Fall from the right and bring bub down.
Repeat 2 and 3 and bring is down.
Form bub is you horizontally so you can control the little green dragon (called bub). At the same time break zero is you.
Move bub to the middle of the screen and push up the middle is so cpu is run is in effect.
Push ball is back out from right to left, and push them from bottom to the top of the screen.
Prepare ball is at the right of cpu is x64, but don’t connect them yet.
At the right timing, connect ball is cpu and move straight to the start of the shellcode (starting at 83). Once the bubble is produced it will turn into the CPU and start running.

Let it run to the end and we’re done. Unfortunately at the exact moment we solved this level, the server is down, and that marked the end of day 1.

Here’s the solutions to all 15 levels: Zero is You | 15 solutions - YouTube.

Optimization

While a bunch of people were trying to solve the puzzle, another group of our KoH players was doing their best to try to optimize the existing solutions. Other than trying to optimize by hand, they wrote a fuzzing tool to randomly (maybe with a heuristic) delete a part out of the solution or replace a part with a smaller part.

With their effort, we successfully shrunk many of our past solutions. However, with the game scoring method, every 5 moves we saved only would give us 1 point. That being said, this fuzzing method made its work on the final day……

Overall, we didn’t do well in the first day. Everyone was so good at playing games and we just weren’t fast enough. We were able to get to the fifth place when the game ends, but the way the KoH score works means that all other teams are so ahead of us. We got to keep up in the following days!

www

www-score

This King of the Hill game combined Attack/Defense, Penetration Testing, and maybe even Web altogether. A really messy experience, but in a fun way.

The Rule

The rule was simple: at each round, each team could use a flag string to exchange for a graffiti string. The graffiti string then could be put onto other team’s walls. Each team could get 1 point if they put a graffiti string onto some other team’s wall.

The twist was, if a team could figure out which team put a graffiti on their wall, and their team’s real ip , they could accuse the team for vandalizing. If the accusation was successful, not only that 1 point that team got for putting up the graffiti would be gone, they would also be deducted 4 points for vandalizing (and being spotted).

There were two APIs on each team’s server:

/graffiti_store, using which we could exchange a flag string for a graffiti string.
/accuse, which could be used to report a graffiti and ip.

And each team would have a game box that needs to be ssh’d in. Within the game server, the port 1337 of each team’s box was the wall. Connecting to it, a team could spray their graffiti onto others’ walls.

There were two more rules for the fairness and effectiveness of the game:

Teams could accuse retrospectively up to 6 rounds back in the past.
A team only had one chance to accuse a certain graffiti.
A team had to pass services availability check each round, otherwise 15 points would be deducted that round.

The Observation

Few thoughts and observations we had after we read the rules, logged onto our game box, and tried to manually spray graffiti onto other teams’ walls and accuse other team’s graffiti.

We have the IP address 13.37.228.64, and other teams are scattered around in 13.37.xx.64. This xx did not correspond to the team ID, therefore a scan of the subnet is needed to find out other teams’ machines. However, when accusing, you only need to report the IP of each team’s box inside the network, not the IP that each team has that contains their team ID (i.e. not 10.13.37.xx).
There’s a folder in the game box that includes a flag file and a bunch of pcaps. The flag content will change every round.
The IP address that we used to spray graffiti (send TCP connection with) is recorded along with the graffiti string itself. This means that if we were to simply use our real IP address to spray graffiti, and other teams had a automatic accusing system, then for sure we are going to get our points deducted.
From 2, it means it definitely was a dominated strategy to spray graffiti with a team’s real IP. Therefore, there must be someway to hide a team’s real IP. Combining this fact with the challenge’s description, and a later hint, there’s definitely some machines out there in the game network, over which we could take control.
Furthermore, there must be other flags out there in these pwnable servers that we could use to buy graffiti with. This is also hinted in the rules of the game.
Continuing 4, there must be a way to spot the real IP of the vandalizing team, even if they used a jump/zombie server to spray graffiti. In other words, correspond jump/zombie servers’ IPs back to the real IP the team had, assuming each jump server is used only by one team.

Even though we knew that it is not a good strategy to spray graffiti with our real IP, before anyone had time to write an auto accusing script, we had some chance to get points, before we successfully pwned a service. We could always turn this off if we started to see our points were deducted.

Quickly, I started writing the script to automatically acquire the flag string, exchange it for a graffiti and spray it onto every other teams’ walls. A few annoying things we realized once we started:

The graffiti exchange endpoint is outside the game server, so I had to port forward it using SSH.
The SSH server has a really short ClientAliveInterval, so we had to manually adjust the ServerAliveInterval from our SSH Client.
There’s a maximum connection limit for the SSH port. So one of us had to do a proxy and let everyone else connect to it via that.

While I was writing that, one of my teammates was writing an accusing script. Other members of KoH were then scanning the subnet and trying to find other machines that we could take over.

The Turnabout

While my spraying script was running, some of my teammates realized a interesting thing: If you used a flag string from a past round to exchange for a graffiti this round, you would get a completely new graffiti without error messages. Since in the rule it clearly says that both the flag and graffiti lasts for only one round, I thought it may be just a bug, and you couldn’t really get a point using that graffiti string exchanged from an expired flag.

Nonetheless, there were no downsides to collect all the past flags, so I quickly added more code to my script, so it would collect the current flag and save it along with all the past ones. At every new round, the script would exchange all the saved flags into graffiti strings and spray them onto every other teams’ walls.

This completely changed the game. It was actually counted as a successful spray, in fact, to spray graffiti exchanged with a flag from past rounds. This meant for anyone that knows this bug, their scores could increase linearly each round: Assuming a team has $c$ “flag sources” (one plus how many machines the team took over), and that they had been collecting flags for $n$ rounds, then they could get $c\cdot n$ points in that round, and in total their points would be on the magnitude of $\Theta(c\cdot n^2)$.

Even without a second machine, we could see our points grow faster than ever. In the case where there were no further progress on pwning other machines, this is the only hope we have.

The Smoke

During this time, we kinda worried (and almost certain) that some other team have an auto-accusing script running, and we would get more points deducted if we were to spray more graffiti. Therefore, we wrote a “smoking” machine, basically just to randomly generate hex-value strings that looks like graffiti strings, but in fact were just counterfeits, to spray onto other teams’ walls.

Doing this, the real graffiti string would be hidden within a pool of fake ones. We thought that teams with a not good enough script would have a hard time accusing, because there were to many accusations they had to make. This should also increase the difficulty of analyzing our behaviors.

Or at least that’s the hope. As for weather it had any effect or not, we didn’t know.

The Fake IP

Spoiler alert: till the end of the game, we weren’t able to find or pwn any other services. However, that doesn’t mean we didn’t get our own fake IP to spray graffiti with.

@cbmixx realized that, since we had root privileges on our own machine, it was possible to give ourselves another IP within a subnet that we had control over with.

I don’t quite remember what exact prefix we had control with, might be /26 or something.

After some trials, we were able to assign our machine a new IP address 13.37.228.65 (and later 13.37.228.66). Adding a rule to iptables, all our outgoing traffic to port 1337 of other teams’ machines were all going through the new IP address. For teams with a automated accusing script, this should prevent them from accusing us for a long time.

However, one big downside of this was that, if any team were to analyze the graffiti logs manually, they would soon discover what’s going on and correspond our fake IP back to the real IP, because they share the same prefix. Nonetheless, this was the best we could do.

The Accusation

Our connection to the game VPN is shut off by the admin, said we had too many traffic going through the network and we were DOSing the server. After spending sometime figuring out where the traffic was coming from, we realized that we had our accusing script acting way to aggressive, so we had to take that down for a moment.

@mcfx moved the accusing script locally and only periodically pulled data from the remote server. He then analyzed data all by himself, and write a script to automatically correspond jump server IPs back to the real IP. He was able to figure out all the big player’s real IP and their jump servers’. I believe he was the sole reason why PPP and Katzebin lost so many points.

There were many ways to find the real IP behind the jump server, although I’m not sure what exact method @mcfx used. Nonetheless, I’ll talk about what my idea was:

Assuming we had an IP that we didn’t know the real IP behind it, we could set up a list of all possible real IPs behind it. For the first time we met this IP, the list is just all other teams’ IP addresses. Then, for all the graffiti that were sent out using that IP, we could try to accuse that graffiti with one of the IP address that’s still on the list. Since the /accuse endpoint told us we didn’t successfully accuse, we cross out that IP from the list, until we met with one IP that we accused successfully.

During the game, I had a real evil thought: How about we tell other teams what is the real IP address behind each jump server, so that they could help with accusing the big teams and taking them down. Maybe I could publicize it in the Discord chat or spray it onto other teams’ walls. However, that would probably be violating some rules, or at least not so moral, and didn’t sound like a correct thing to do, so we didn’t do it.

The Avoidance

After some period of time, we could see our point no longer grew that rapidly, so it must be some teams starting to find out our fake IP address and accusing us with the real ones. Utilizing the rules that you can only accuse graffiti that were sprayed on your team’s wall, we had two ideas:

Spray randomly to only a portion to all the teams and record who did we spray. We can then check our scores to find out who should we stop spraying.
Only stop spraying the big players we found, since they have the largest chance of finding out our real IP as well.

The first one seemed to be a good idea, but really hard to implement. First of all, the scores are lagging behind, so in reality we would have to wait for a lot of rounds to get the data we need. Furthermore, the data is really noisy, and we have to do this for many rounds to be sure. Therefore, I didn’t go with it.

Instead, for the second idea, since we already identified some of the big players out there, we could just stop spraying them and quickly check if we indeed get more points. We picked out two largest teams out there (who owned the most jump servers), who had real IP 13.37.109.64 and 13.37.238.64, and stopped spraying them.

At first it didn’t seem to work, but after the scoreboard kept up with the round where we implemented this idea, we saw our score increased drastically, and the score we gain each round then surpassed PPP and Katzebin.

The Countermeasure

After a while, we realized that the big teams stopped attacking us as well. Probably due to the same thought process as above. However, there is actually a way to counter that, let’s first review what we knew:

In order to successful accuse some team, we needed to present an (IP, graffiti) pair, with restriction being that

We knew the real IP address of the WWW game box, which the team that exchanged this graffiti owned.
That graffiti was sprayed onto our wall.

If people stop spraying us, then clearly we couldn’t do anything about it, right?

Notice one thing, that we could see other teams’ wall as well. Meaning even if at some round some team doesn’t spray us, they would spray someone else and their graffiti would be left there for anyone to see.

Then, one of our team members had a crazy thought: we could take the graffiti that the team sprayed on other teams’ walls and spray them onto ours wall. This idea had two implications:

This must be counted a point for that team, because the system had no way to tell weather we sprayed it onto our own wall or the other team took control over our machine and sprayed a graffiti onto ours.
Since 1 must work, then we definitely can successfully accuse that team for spraying, because it satisfies the above 2 restrictions.

Quickly we tried out if this really works, and it did. Hence, for teams that don’t want to get accused, the best thing for them to do is to probably stop spraying. At the final hours of the game, both PPP and Katzebin stopped spraying, and we were the only team who still has a growing score.

The Reset

At one point of the game, we suddenly lost all connections to the game box. At first we thought it was just a connection issue, so we waited a bit, but it didn’t come back on its own. I started to panic, because all of the saved flags are stored on the server, and we had to at least get the file back before resetting the machine. So we sent out a ticket, hoping that the admin could help us sort out the issue.

However, waiting for almost 10 minutes after we lost connection to the server, there were still no progress on the matter, and we can’t afford to lose more points. We have accumulated about 40 flags at this time of the game, and a reset means we would have to start from zero.

Just when we were stressed about whether to wait for admin’s support or just reset the machine, one of the team members found the content of the flags file in his terminal history, a cat /flags command that might just save us. After checking the content, we confirmed that it’s a relative recent history, which contains enough flags for us to get back to where we were. We backed up that content and immediately reset our machine.

But that’s not the end.

The Second Reset

When we first lost connections to the server, we were kind of suspecting that another team hacked our box, but we didn’t know. We knew that there were SSH weak password issues, but we weren’t able to log into anyone’s machine using that weak password, and we had changed our SSH password from the beginning.

However, not long after we reset our machine, suddenly we were kicked out from the server again. This time, it’s not that we cannot connect to the SSH port, but SSH server kept telling us the password is wrong. We were sure that we were getting hacked and that was really scary.

Quickly we reset our machine again, and on the moment we log in, not only we changed all the users’ passwords, including the main one, we also disabled the apache server. Although this might make us fail the service availability check, but it’s only 15 points per round, and the amount of scores we get is much more than that.

After that, our connections are fine, so that’s the end of the this brief interlude.

The End

An hour before day two ended, almost everyone’s score stopped increasing, except for ours. At the end, we even climbed to the second place, only 2000 scores left with PPP. I would say that’s impressive considering we didn’t pwn any server at all.

There’s also one point in the game where nobody scored for some reason. I think that’s because the /graffiti_store endpoint is down, but I don’t think it was down for that much time. It remains to be a mystery then.

shooow-your-shell

shooow-your-shell-score

After day two ended, a KoH homework was released. The name was shooow-your-shell and obviously from the name it was yet another shellcode challenge. Nine hours before day three started, we began to work on this challenge.

Looking at the Python code and disassembled runner, we understood how the game worked:

Each time a team can submit a hex-encoded shellcode. The shellcode can be written in x86_64, arm64 or riscv64 architecture.
The game will try the shellcode on three architectures each time, and if any of them reads the file content from /secret and print the content to stdout, it was deemed as a success.
The shellcode is executed using qemu-user-static, but chroot into a temporary folder and executed using a non-root user privilege.
However, there are some restrictions about the shellcode a team can submit:
- At any time of the game, there would be a set of blocked bytes $B\subseteq \{\texttt{0x00},\texttt{0x01},\cdots,\texttt{0xFF}\}$. The submitted shellcode cannot contain any of the blocked bytes.
- Comparing to the last accepted submission $S_l$, the new shellcode $S_n$ must be either:
  - $\{S_l\dots\} \setminus \{S_n\dots\} \neq \varnothing$ (the new shellcode did not use at least one byte the last accepted shellcode used), or
  - $\vert S_l\vert > \vert S_n\vert$ (the new shellcode is shorter than the last accepted shellcode in length).
  Where $S_l$ and $S_n$ are both strings of shellcodes, and $\{S\dots\}$ is a set containing bytes used in a shellcode $S$.
If the shellcode was then accepted, the newly blocked bytes will be $B_n = B \cup \left(\{S_l\dots\} \setminus \{S_n\dots\}\right)$. That is, all bytes that’s not used by the new shellcode but used by the old one will be accepted.
The team with the latest accepted shellcode would be viewed as “the top of the hill.” The team cannot submit more shellcode if they were the top of the hill.
The newly accepted shellcode would then be appended into history, where the leaderboard is calculated as a reverse of the history. That is, the team with the latest accepted shellcode would be #1, the team that took the top of the hill before them would be #2, so on and so forth.
Each time the Python script would read past information from history, and write to it if the shellcode is accepted.
If a team stayed on the top of the hill for more than 900 seconds (or 15 minutes), then that team would be regarded as a “winner,” and the game would reset. However, one random byte of the winner’s shellcode would be part of the new game’s initial banned bytes $B$. This applied to all past winners.

The rules are pretty clear, but we also had some doubts:

The history file was read at the start of the script, and would be overwritten when the new history was saved, so there were clearly a race condition: open two connections $A$ and $B$ at the same time, submit an acceptable shellcode to $A$ such that the history is overwritten. However, because the script reads history at the start of the script, so for $B$ the history is still what it used to be (when the connection opened), and accepts a shellcode that may not be acceptable for the current history. If now we submit a shellcode to $B$ and it is accepted, $B$’s connection would overwrite history again as if $A$ never happened.
There must be a way to sync up the files between each team’s boxes, or everyone would be seeing there own version of history and there’s no point of the game anymore. There’s also one possibility that everyone is playing on the same server, but that’s highly unlikely according to @riatre.

But there’s no way to know before the game started, so we went on with our preparation.

Preparation

We first started by writing a few different shellcodes. There were shellcodes that used syscalls directly, shellcodes that called functions statically linked into the binary, and shellcodes that pushed a ROP chain into the stack and returns. Nothing quite out of ordinary here.

Three-Byte Shellcode

Until @meowmere sent a shellcode that blew all of us away. In his shellcode, there were only 3 bytes used, 0x05, 0x50, and 0xc3. Quickly we disassembled the shellcode and understood how it worked:

The shellcode was simply pushing a ROP chain onto the stack and returning, but written in a way such that all the values were added up in the register. Therefore there are only three basic instructions used

add eax, imm32, which is 0x05 followed by four bytes of the immediate value in four bytes.
push rax, which is 0x50, and
ret which is 0xc3.

And for the add instruction, the immediate values were consisted of only these 3 bytes, so the shellcode only used these three bytes. Of course the values in the ROP chain contains bytes that doesn’t fall into these 3 bytes, but amazingly, using a combination of number that made up from these 3 bytes and modular arithmetic (which every computer comes with), we can actually add them up to the number we want.

For example, if we want a number 0xd093cffa to be pushed onto the stack, then we can have:

add eax, 0xc350c305
add eax, 0xc350c305
add eax, 0xc350c350
add eax, 0xc350c350
add eax, 0xc350c350
push eax

And the assembled shellcode for this would consist only of the aforementioned 3 bytes. This works because

\[(\text{c350c305})_{\text{hex}} \cdot 2 + (\text{c350c350})_{\text{hex}} \cdot 3 \equiv (\text{d093cffa})_\text{hex} \mod 2^{32}\]

I’m not good at number theory, so I can’t tell if the combination of these 3 bytes into a 32-bit number can add up to any number within the 32-bit range if mod $2^{32}$. That being said, it is proven that it can add up to some number, so it’s time for some math.

Find Values that Adds Up to the Target One

We have an alphabet $A$ that consists of numbers from $[0,\vert A \vert)$, a set of usable symbols $B \subseteq A$, and a target string $T$ of length $s$. We represent $T$ as a base-$\vert A\vert$ number, that is, $T = \sum_{i=0}^{s-1} {\vert A\vert }^{i} T_i$.

We want to find a list of numbers $[n_1, n_2, \cdots, n_i]$ such that $T \equiv \sum_i n_i \mod \vert A\vert ^s$ and $n_i=\sum_{j=0}^{s-1} \vert A\vert ^{j}\cdot n_{ij}, n_{ij} \in B$, i.e. $n_i$ consists only symbols from $B$ in their base-$\vert A\vert$ representation as a string.

Although it is possible to generate all possible values of $n_i$—such a set has a size of $\vert B\vert ^s$—it is very hard to find a combination of them that adds up to $T$. If using breadth-first search, on the search depth $d$, there will have $O\left(\vert B\vert ^{sd}\right)$ such many values, and there could be as many as $\vert A\vert ^s$ of different combinations and only few of them is what we need. Therefore, we need to find a way to search for these values quickly.

We have

\[\begin{align} T &\equiv \sum_i n_i &\mod |A|^s\\ &\equiv \sum_i\sum_{j=0}^{s-1} |A|^j\cdot n_{ij} &\mod |A|^s\\ &\equiv \sum_{j=0}^{s-1} \left(|A|^j \cdot \sum_{k=1}^{|B|} b_k c_{jk} \right)&\mod |A|^s&& \text{where }c_{jk} = \sum_i[n_{ij} = b_k]\\ \end{align}\]

where $n_{ij}$ is the $j$ the symbol of string $n_i$. A constraint we must add is

\[\exists i \in \mathbb{Z}^+, \forall j \in [0, s), \sum_{k=1}^{|B|} c_{jk} = i\]

That is, we must have the same number of symbols on each position on the string, otherwise we must use $0$ to pad the string which might not be in $B$. To find the optimal answer (one that produces shortest $n_i$ list), we just need to find such a minimum $i$. Also, it is easy to see that $i$ has an upper bound of $\vert A\vert \cdot \vert B\vert$, since each symbol on each position need only appear at most $\vert A\vert$ times.

Notice that we represented $T$ as a number, that is:

\[T =\sum_{j=0}^{s-1}|A|^j \cdot T_j \equiv\sum_{j=0}^{s-1} \left(|A|^j \cdot \sum_{k=1}^{|B|} b_k c_{jk} \right)\mod |A|^s\]

where $T_j \in A$. That means we have

\[\begin{align} T_0 &\equiv\sum_{k=1}^{|B|} b_k c_{0k} &\mod |A|\\ T_1 &\equiv \sum_{k=1}^{|B|} b_k c_{1k} + \left\lfloor\frac{\sum_{k=1}^{|B|} b_k c_{0k}}{|A|}\right\rfloor&\mod |A|\\ T_2 &\equiv \sum_{k=1}^{|B|} b_k c_{2k} + \left\lfloor\frac{\sum_{k=1}^{|B|} b_k c_{0k}}{|A|^2} \right\rfloor + \left\lfloor\frac{\sum_{k=1}^{|B|} b_k c_{1k}}{|A|}\right\rfloor&\mod |A|\\ \dots\\ T_d &\equiv \sum_{p=0}^d \left\lfloor \frac{\sum_{k=1}^{|B|} b_k c_{pk}}{|A|^p} \right\rfloor &\mod |A| \end{align}\]

Now we have turned the task from brute-forcing a list of $n_i$s into brute-forcing $c_{jk}$s. For each byte of $T$, we have at most $\vert A\vert ^{\vert B\vert }$ different combination of $\sum_{k=1}^{\vert B\vert } b_k c_{jk} \mod \vert A\vert$, and assuming they distributes evenly over the range, there are $\approx \vert A\vert ^{\vert B\vert -1}$ ways that they sums up to be any specific symbol, which makes the search space much more acceptable when $\vert B\vert$ is small. Furthermore, we can cache all values that different $\sum_{k=1}^{\vert B\vert } b_k c_{jk} \mod \vert A\vert$ has, such that we can quickly lookup the value in reverse. The overall search complexity is $O\left(s\cdot\vert A\vert ^{\vert B\vert -1}\right)$ (a very loose bound).

The pseudocode for the above procedure, without optimization:

input Target, UsableSymbols, Alphabet
a := size(Alphabet)
b := size(UsableSymbols)

def Search(T, i, depth, path)
  if depth == length(Target)
    return path
  end
  for each (c_1, c_2, ..., c_b) such that
      sum(c_1, ...) == i and
      ({c_1, ...} <dot product> UsableSymbols) mod b == T mod b
    answer := Search((T - {c_1, ...} dot UsableSymbols) mod b / a, i, depth + 1, append(path, (c_1, ...)))
    return answer if not nil
  end
  return nil
end

for i from 1 to a * b
  answer = Search(Target, i, 0, [])
  return answer if not nil
end
output "no answer"

In this specific case, our alphabet are possible hex values within a byte, i.e. 0x00, 0x01, …, 0xff, so 256 of them. And our usable symbols depend on what operations we need to do. In this case of add eax, push rax and ret, our usable bytes are 0x05, 0x50, and 0xc3 respectively. Therefore, using this algorithm, given a number $T$ within the range of $2^{32} = 256^4$, we are able to find a way to build a minimal list of numbers such that they sum up to be the target number and they only contain 0x05, 0x50, and 0xc3 in their 4-byte hex representation. Let $f(T)$ be such the optimal (minimal) size of the list for a given $T$.

Here’s my script for adding values consist of limited bytes to a target number using modular arithmetic (github.com).

Construct an ADD/PUSH ROP Chain

For a normal hand-written ROP chain, the values on stack are fixed. However, in a binary executable, there actually could have multiple occurrence of the same instruction like pop rax; ret or mov dword ptr [rdi], edx; ret. It is the same to use any of the address that contains our wanted instruction in an ROP chain, but since our target now is to minimize the length of converted ROP chain, we can look for an optimal combination of values such that they produce the shortest ROP chain after converting them to the add/push format.

Notice that the current rax value is dependent on the last value, for example, our shellcode looks like this:

# rax = 0
add rax, 0x50505050
add rax, 0x50505050
...
push rax # rax = v1
...      # somehow add v2 - v1 to rax
push rax # rax = v2

The add operations between the first push and second push depends on both values of $v1$ and $v2$. Let’s make this more formal:

Given a ROP chain of $n$ values that need to be pushed onto the stack, we have a list of sets $N = \left[V_1, V_2, \cdots, V_n\right]$ where $V_i$ is a set of values that, no matter which being pushed onto the stack, the ROP chain has the same effect. We want to find a solution $(v_1, v_2, \cdots, v_n) \in V_1 \times V_2 \times \cdots \times V_n$ such that

\[\sum_{i=1}^n f\left(v_i - v_{i-1} \mod |A|^s\right)\]

is minimal (where $v_0$ is the initial rax value). A brute-force method such as going through all such possible $n$s is not ideal, which requires $\prod_{i=1}^n \vert S_i\vert$ such many tries. A greedy approach trying to minify $f\left(v_i - v_{i-1} \mod \vert A\vert ^s\right)$ going through each $i$ is clearly not optimal. It’s time for dynamic programming then.

Let $F_i(v)$ be the minimal number of add instructions when the $i$th value on the stack is $v$, We have

\[\begin{align} F_{i+1}(v) &= \min_{(v_1, \cdots, v_i) \in V_1 \times \cdots \times V_i} \left( f\left(v -v_{i} \mod |A|^s\right) + \sum_{j=1}^{i} f\left(v_{j} - v_{j-1} \mod |A|^s\right)\right)\\ &= \min_{v_i \in V_i} \left(f\left(v -v_{i} \mod |A|^s\right) + \min_{(v_1, \cdots, v_{i-1}) \in V_1 \times \cdots \times V_{i-1}} \left( f\left(v_i-v_{i-1} \mod |A|^s\right) + \sum_{j=1}^{i-1} f\left(v_{j} - v_{j-1} \mod |A|^s\right)\right)\right)\\ &= \min_{v_i \in V_i} \left( f\left(v -v_{i} \mod |A|^s\right) + F_i(v_i)\right) \end{align}\]

And our final answer is to look for $\min_{v_n \in V_n} F_n(v_n)$. The time complexity of this DP is

\[\Theta\left(\sum_{i=1}^n \left(\sum_{(v_i, v_{i-1}) \in V_i \times V_{i-1}} f^t\left(v_i - v_{i-1} \mod |A|^s\right)\right)\right)\]

where $f^t(v)$ is the time needed for the $f$ algorithm to run on input $v$.

Therefore, using this strategy, we were able to generate a three-byte shellcode that has the minimal length. That’s not the end. We are well known the rules that a winner’s shellcode would be banned, so there are some alternatives to the method:

If 0x05 is banned, we could use 0x2d (sub eax) or 0x15 (adc eax). Although the analysis above would be a bit different.
If 0x50 is banned, we could use 0x81 0xc3 (add ebx) and 0x53 (push ebx).
If 0xc3 is banned, we could use 0xc2, since ret 0 is the same as ret.
- If 0xc2 is banned, we could use 0xff 0xe0 (jmp eax).

And for all the above, we could replace the register rax with any other register and the method should still work. Also, this analysis applies for a four-byte set as well.

Phishing Strategy

It all sounds fun and games, until you remembered that the rule said any shellcode that doesn’t use one of the bytes the current top-of-the-hill shellcode had could be accepted. So everything we’ve been doing is in vain!?

Well, if we could ban every single byte except for these three bytes, then obviously we could take the top of the hill and win without a doubt. However, we couldn’t submit again if we were at top of the hill, so it’s nearly impossible to ban the bytes we want to ban. Or was it?

The first idea came to mind was to collude with some other team. If another team could submit a shellcode that uses all the bytes, then our three-byte shellcode would wipe every other bytes out. However, letting alone the potential rules that would break, it’s clear that no team would collaborate with us on this.

Then a thought came through my mind: it is totally possible for us to ban all the other bytes without anyone’s cooperation—the other teams would be helping us no matter they want to or not—it’s time for phishing!

The idea was dead simple:

We prepare a few shellcode that use as few number of different bytes as possible. The shellcodes CANNOT contain the three bytes we need to use. The used bytes in these shellcodes should overlap each other as little as possible.
At each round, we check the current banned bytes, and pick the shellcode that does not contain the banned bytes out. Then, append all other bytes to the end of the shellcode, but don’t include the banned ones. For the three-bytes we need, there are two cases:
- If the current top-of-the-hill shellcode include that byte, then also append that byte to the end,
- Otherwise, don’t include that byte in our submitted shellcode.
Once we submitted that shellcode, we’ll just have to wait for another team to submit whatever shellcode they have, and then submit our three-byte shellcode. Then, all the other bytes should be banned except for those three bytes we used.

Of course, if one of our three bytes was already banned, then this couldn’t work. But overall this seemed like a really good strategy. It’s almost like phishing because it would result in other teams submitting shellcode that is actually used against them, while they had no idea what happened.

As for how it really went in the real game, I’ll leave to a later part.

Copying Homework

Recall the race condition we talked about earlier, so another group of our KoH players were thinking about how to exploit it. The game also features an in-game leaderboard, where everyone can see each other’s accepted shellcode. We then had a great idea of how to utilize this race condition and the leaderboard:

Start two threads that connects to the game service simultaneously, so they would have the same view of the history.
Thread 1 keeps the connection alive, while thread 2 constantly reconnect to check if there’s a change in leaderboard.
If thread 2 detected a new top-of-the-hill shellcode, it send that shellcode to thread 1.
Thread 1 then wait until seconds before the timeout of the connection (30 seconds), and submit that shellcode.

A diagram looks like this:

copy-diagram

This would work because Thread 1 had an old view of history, so if team B’s shellcode could be accepted for that version of history, then definitely we could use the same shellcode as well. This would also wipe Team A’s submission out of existence. A really powerful strategy indeed.

Countermeasure

The way to counter this (of course there is one), is that for the submitting team, they had to do exactly the same thing.

Although we didn’t quite think of this during the preparation, for the sake of consistency, I’ll put it here.

When submitting a new shellcode, the team had to open two connections, one submitting shellcode immediately, and another should wait until the timeout to submit. This would work because:

Suppose another team A who’s trying to steal the shellcode opens a connection at $t_0$. So for that team, the shellcode it can steal is submitted in the time range $(t_0, t_0+t_{\text{close}}]$ where $t_{\text{close}}$ is the connection timeout. Suppose we submit our shellcode at $t_1$, where $t_1 \in (t_0, t_0+t_\text{close}]$, then the best they can do is to resubmit the shellcode at time $t_0+t_\text{close}$. However, we then resubmit our shellcode at $t_1+t_\text{close}$, so our record will still be on the leaderboard. Notice that no matter how close $t_0$ and $t_1$ can be, since $t_0 < t_1$ because the stealing team must read history before our submission, it must be $t_0+t_\text{close}< t_1+t_\text{close}$.

Therefore as long as we submitted our shellcode twice using the strategy, we could make sure that our shellcode wouldn’t be stolen.

Some Other Methods

A Repository of Shellcodes

We also put all of the possible shellcodes into a git repository, and we wrote a script to pick the shellcodes from the repository to submit with some rules and heuristics:

First we cross out of the shellcodes that couldn’t possibly be accepted either because of banned bytes or they weren’t better than the current top-of-the-hill.
Then for each of the remaining shellcode, we pick the one that if accepted, would ban most number of bytes.
In case of a draw, we submit the shellcode that’s shortest in length.

Any shellcode we encounter during the game would also be added into this repository.

Fuzzing

Remembered the fuzzing we talked about in zero-is-you? Now it’s back. I didn’t work on this so I know not much about the details, but essentially it would take the current top-of-the-hill and try to fuzz out a shellcode that is shorter than the current one.

Using Existing Tools

There were clearly CTFs before that required inputting shellcode using only printable characters, so definitely there were existing tools to generate/transform shellcodes into limited character set. We found some of the tools and they really worked pretty well.

Game Start

Game started, the phishing strategy worked right away and we blocked all the bytes, just to see that the history was rewritten by some other teams. At that time we didn’t figure out the countermeasure yet, so that’s that. Nonetheless, copying homework did work and we were on the top-of-the-hill for a moment.

We didn’t plan to use the 3-byte strategy until we get a reasonable large banned bytes set. However, one thing we missed is that we put our 3-byte shellcode into the repository as well, so the script automatically submitted that. Oh well, that’s bad. Not only our trump card was leaked and everyone could figure out how this worked, but we submitted when there were still a lot of acceptable bytes, so anyone could take us down with an easy shellcode.

Furthermore, it’s obvious that we were the only ones figured out how to use only a 3-byte set (not exactly, more on that later) to construct a shellcode, but everyone noticed the race condition and started to copy each other’s homework. It’s really sad to see we submitted a shellcode, just a few seconds later we disappeared from the leaderboard and being replaced by some other team with the same shellcode.

That being said, the game was fair to everyone, we indeed successfully copied others’ homework as well, so no real complains here.

Unexpected Reset

We were the winner for the first round, everything was going smoothly, until it wasn’t.

On the start of the second round, it all seemed normal as teams submitting different shellcode again, but then, out of nowhere, a reset happened. It was not a reset as if someone became a winner again, it was a reset as if the entire history was wiped out. We can tell because we were no longer under the Renowned ancestors section. We thought maybe it was a sync issue, that there were some problems with the sync script that messed everything up. But the weirdest thing is yet to come.

StarBugs’s true 3-byte shellcode blew our mind. 69 89 73 was immediately the meme in the Discord server. There was no way, we thought, that this short of a shellcode would do the things the game asked us to do. We were certain that, either the syncing script messed something up, or it cut off a part of the shellcode away.

However, OOO said that “the sync bot [was] working just fine,” so it made me think it might be another possibility: they must had some Linux kernel chroot escape zero-day, or else how did you explain the situation?

Anyway, 698973 didn’t stay on the top for a long time, because there were still plenty of acceptable bytes to use, so other team’s were able to use the remaining bytes to write something and kicked StarBugs out. It might be a fluke then, we thought, so we quickly went back to the battlefield.

Read From Stderr

It was all going pretty normal for a few rounds, but something weird happened again. This time, PPP threw out 000000ca00080091210000d4, and we were all like, what? Soon enough, 698973 came back, and 699973 appeared. We were all dumbstruck. A quick run of the code on our local environment showed clearly that these shellcode doesn’t pass the test, so what trickery were they playing with?

Not before long, my teammate noticed something odd. If you run these shellcodes, the execution would actually take longer for some reason. It would stuck at one architecture and then timed out after 5 seconds. That’s a bit weird. Then, we noticed that if you were to press Enter, the shellcode would exit immediately. This was a sign of accepting input!

The questions were then, what was the input format and why can you input something. The second one isn’t hard to figure out, as we finally saw that in subprocess execution, the stderr was redirected as 1, but you could actually read data from that. This is really wild but not something we haven’t seen before.

While some were investigating how the shellcode itself worked, we took an educated guess that it just accepted raw shellcode. So quickly we wrote a shellcode using that architecture, input it when the shellcode waited for input, and it really did work.

After that and some digging into RISC V and ARM64 architecture specifications, we figured out how the shellcode could read from input while only had 3 bytes.

I wasn’t focused on this so I didn’t know how it actually worked. The reader can refer to some other fantastic write-ups.

Phishing Worked

An hour or so before the game end, finally the phishing worked and the accepted bytes was limited to only 3 bytes. We were so happy, just to found out that someone copied our homework.

That’s a really bad news because we were too naïve and submitted our optimal solutions already. Ends up that we have to look for another set of ROP chain that would result in a shorter shellcode after converting it into an ADD/PUSH ROP chain.

After a few minutes, we found one, and we were really excited, so someone just submitted the shellcode by hand. And you know what happened? Our homework was being copied once again. Oh big F.

Finally we found another one, and this time we took our lesson. We finally figured out the countermeasure (mentioned above), and submitted it the right way. We then start our 15 minutes countdown, while preparing for the next three-bytes payload.

However, an unexpected reset happened once again before we reached 15 minutes, and we didn’t even have enough time to start running our phishing script. Welp, that’s the end for phishing I guess.

To the End

It was then basically a battle for the shellcode that read from stderr. Since we figured out how it worked, we used this technique as well. At the end, PPP throw out the final bomb, shellcode 73 00, marking probability the shortest shellcode that could be accepted.

Still we didn’t figure out why the unexpected resets happened. Was it some other teams’ doing things to alter history, or just a bad sync script made by OOO. Probably we could figure it out by digging into the pcaps, but I’ll leave that to the reader.

Epilogue

Writing this took way longer than I planned, but I’m glad I had most of the details laid out, as a memorial to this precious experience. I genuinely had a lot of fun playing with Tea Deliverers, hope they will still bring me in next year.

Finally, here’s an overall ranking for KoH:

koh-score

Thank you for reading!

PREVIOUSGoogle CTF 2021

NEXTIntroduction to Go