Stripe CTF 3.0

Sadly, level 3 would not run for me, even with Stripe’s patch, so I could not continue with the competition. It was fun while it lasted though – C

Wednesday saw the beginning of another Stripe CTF! This time I was in London when it started so I went to the launch meeting with some old uni friends.

The theme this time was distributed computation, so, with a tiny netbook, I dove into the levels.

Level 0

Level 0 was essentially an exercise in optimisation. A given text input was checked against a dictionary of words: if an input word was in the preset dictionary, it had to be tagged. The preset dictionary was read into an ordered list, and as such was O(n) to search. By applying the following change:

index 1558f2d..d07273f 100644
--- orig.rb
+++ mod.rb
@@ -1,4 +1,5 @@
#!/usr/bin/env ruby
+require 'set'

# Our test cases will always use the same dictionary file (with SHA1
# 6b898d7c48630be05b72b3ae07c5be6617f90d8e). Running `test/harness`
@@ -7,6 +8,7 @@

path = ARGV.length > 0 ? ARGV[0] : '/usr/share/dict/words'
entries = File.read(path).split("\n")
+entries = Set.new(entries)

contents = $stdin.read
output = contents.gsub(/[^ \n]+/) do |word|

The list is turned into a set with O(1) lookup time, significantly speeding up the operation.
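The same effect is easy to demonstrate in any language; here is a tiny, hedged illustration in Python (not the level's Ruby) of the difference between the two lookups:

import timeit

# Membership tests on a list scan elements one by one; a set hashes the key.
words = [str(i) for i in range(100000)]
word_set = set(words)

print(timeit.timeit(lambda: '99999' in words, number=100))     # O(n) per lookup
print(timeit.timeit(lambda: '99999' in word_set, number=100))  # O(1) per lookup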

Level 1

This level was about cryptocurrencies, and to pass it you had to mine a … ‘Gitcoin’. Essentially, you were given a repo with an initial catalog of transactions, and you had to successfully submit a transaction which gave your given user a Gitcoin.

Proof of work for a Gitcoin was determined by ensuring that the git commit had a SHA1 hash that was lexicographically smaller than the difficulty. So add a nonce to your commit message and keep cycling through random numbers until the commit has a valid hash.

Stripe provided a very slow bash reference implementation, which I still used to pass the level. Instead of incrementing the nonce in bash, though, I wrote a Python script to find a correct hash for me faster.

import sys
from hashlib import sha1
import random
import string
import Queue as queue
import threading

def work(diff, tree, parent, timestamp, q):
    # Require len(difficulty) leading zeros. This is slightly stricter than
    # "hash < difficulty", but it is much simpler to test against.
    diffl = len(diff)
    diff = ''.join('0' for x in range(diffl))
    body = "tree %s\nparent %s\nauthor CTF user <me@example.com> %s +0000\ncommitter CTF user <me@example.com> %s +0000\n\nGive me a Gitcoin\n\nnonce: " % (tree, parent, timestamp, timestamp)
    while True:
        # Append a random 8-character nonce to the commit body.
        body_b = '%s%s' % (body, ''.join(random.choice(string.hexdigits) for x in range(8)))
        # git hashes "commit <length>\0" + body, not the body alone.
        s = sha1('commit ' + str(len(body_b)) + '\0' + body_b)
        hex = s.hexdigest()[:diffl]
        if hex.startswith(diff):
            body = body_b
            break
    q.put(body)

def main():
    diff, tree, parent, timestamp = sys.argv[1:]
    q = queue.Queue()
    threads = [threading.Thread(target=work, args=(diff, tree, parent, timestamp, q)) for i in range(1)]
    for th in threads:
        th.daemon = True
        th.start()

    body = bytes(q.get())
    with open('/home/carl/level1/test.txt', 'w') as f:
        f.write(body)

    for th in threads:
        th.join(0)

if __name__ == '__main__':
    main()

There were some hurdles I came across while solving this, which show in the code. The git hashing command git hash-object -t commit doesn’t just take the SHA1 hash of its input; it first prepends commit len(data)\0 before hashing. This was easy enough to find with a bit of searching, but a major issue I was having was that I couldn’t replicate the SHA1 hash unless I first wrote the commit to a file, rather than streaming it via stdout. So I just wrote to a file and modified the miner bash script to change:

@@ -56,12 +56,12 @@ $counter"

        # See http://git-scm.com/book/en/Git-Internals-Git-Objects for
        # details on Git objects.
-       sha1=$(git hash-object -t commit --stdin <<< "$body")
+       sha1=$(git hash-object -t commit /home/carl/level1/test.txt)

        if [ "$sha1" "<" "$difficulty" ]; then
            echo
            echo "Mined a Gitcoin with commit: $sha1"
-           git hash-object -t commit --stdin -w <<< "$body"  > /dev/null
+           git hash-object -t commit -w /home/carl/level1/test.txt > /dev/null
            git reset --hard "$sha1" > /dev/null
            break
        fi

This let me get the correct hashes and mine the coin.
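For reference, the header-prepending behaviour is simple to reproduce by hand. A minimal sketch (Python 2, to match the miner above) that should agree with git hash-object -t commit on the same bytes:

from hashlib import sha1

def git_commit_sha1(body):
    # git's object id for a commit is sha1("commit <byte length>\0" + body),
    # which is why the miner hashes the prefixed body rather than the body alone.
    return sha1('commit %d\0%s' % (len(body), body)).hexdigest()

# e.g. git_commit_sha1(open('/home/carl/level1/test.txt').read()) should match
# the output of: git hash-object -t commit /home/carl/level1/test.txt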

Level 2

Level 2 was all about DDoS attacks. The idea was that there were a number of backend servers, a reverse proxy (which you modified), and a number of clients, some malicious and some not. You had to modify the reverse proxy (called shield) so that it did not let malicious traffic through, while also trying to minimise backend idleness. Scores were determined by a local test harness and also on the git push hook.

Stripe provided the attack code for reference, which made the level really easy. Malicious clients (‘elephants’) basically spawned more connections, more often, than normal clients (‘mice’), and the numbers they spawned were defined in the file as:

simulationParameters = {
  'clientLifetime': 2,             // In rounds
  'roundLength': 500,              // In ms
  'roundCount': 40,
  'clientsPerRound': 5,
  'pElephant': 0.4,
  'mouseRequestsPerRound': 2,
  'elephantRequestsPerRound': 50,
  'backendCount': 2,
  'backendInFlight': 2,
  'backendProcessingTime': 75
};

From this you can see that malicious clients send 50 requests per round, while normal clients send only 2. So my first solution was just to limit the number of requests from each client IP with a simple counter. My implementation looks like:

diff --git a/shield b/shield
index c67bd68..8ba87f2 100755
--- a/shield
+++ b/shield
@@ -7,6 +7,7 @@ var httpProxy = require('./network_simulation/lib/proxy');
var checkServer = require('./network_simulation/lib/check_server');
var nopt = require('nopt');
var url = require('url');
+var rcount = {};

var RequestData = function (request, response, buffer) {
  this.request = request;
@@ -14,6 +15,16 @@ var RequestData = function (request, response, buffer) {
  this.buffer = buffer;
};

+
+function checkRequest(ip) {
+  if (rcount[ip] === undefined) {
+    rcount[ip] = 1;
+  } else {
+    rcount[ip]++;
+  }
+  return rcount[ip] <= 4;
+}
+
function ipFromRequest(reqData) {
  return reqData.request.headers['x-forwarded-for'];
}
@@ -29,10 +40,10 @@ var Queue = function (proxies, parameters) {
};
Queue.prototype.takeRequest = function (reqData) {
  // Reject traffic as necessary:
-  // if (currently_blacklisted(ipFromRequest(reqData))) {
-  //   rejectRequest(reqData);
-  //   return;
-  // }
+  if (!checkRequest(ipFromRequest(reqData))) {
+    rejectRequest(reqData);
+    return;
+  }
  // Otherwise proxy it through:
  this.proxies[0].proxyRequest(reqData.request, reqData.response, reqData.buffer);
};

I committed and pushed, and surprisingly, this gave me a passing score!

Level 3

This is where the story gets sad. I checked out the code, but I could not get ./test/harness to work correctly. The task was a file indexing service that had to be optimised. It was written in Scala, which I had never used, so I could not work out how to debug it. Stripe released a fix, but it still did not resolve my issues. At that point I had to move on to other things and could not complete the CTF.

Sad times.

Fashion Hackathon – London Startup Weekend

The weekend of the 14th December I attended the London Startup Weekend Fashion Hackathon. This was a much larger event than the previous hackathon I attended and was more geared towards creating a viable business as well as the tech to support it.

The format was fun: on the first day a number of people would pitch ideas, we would all vote for them, and then form teams to begin work on the Saturday morning. I attended in order to build something new and fun, so I just stood back and listened for some interesting pitches.

There were two super interesting pitches: a smart bag which worked out what was in it and alerted you if things were missing, and an automatic garment detector which would allow you to take a picture and then buy the clothes from the picture.

I ended up picking the image recognition project as it sounded the most fun, and I didn’t think we would be able to source an RFID reader (or similar) over the weekend. (It turned out that the smart-bag team didn’t pitch, so maybe they pivoted or disbanded?)

The mini-startup we made was called LookSnap, and it was fun and quite gratifying to see that my business instincts were reinforced by the actions of the rest of the group. Over the day and a half we worked on it, I think the business model ended up fairly solid.

My main job for the weekend was getting the image recognition working. In terms of the technology, and with the very short time-scale in mind, I decided to limit the acceptable inputs as much as possible. As such, I designed an algorithm that could extract the clothing (top, bottoms, shoes) from a picture of someone who was facing forward with their arms down.

The algorithm works as follows:

  1. Use OpenCV to detect a face
  2. With the face position, composite a “clothing mask” (see images) onto the original photo using graphicsmagick
  3. This then gives you a fairly decent cut-out of just that person’s clothes. Apply different masks for the top, bottom, and shoes. (A rough sketch of this pipeline follows the list.)
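A rough sketch of the idea, assuming a modern opencv-python install; the region proportions are illustrative guesses, and the real step 2 composited hand-drawn masks with graphicsmagick rather than using plain rectangles:

import cv2

def clothing_regions(image_path):
    # Step 1: find a face with OpenCV's bundled Haar cascade.
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]  # assume one forward-facing person with arms down
    # Steps 2-3 (approximated): position clothing regions relative to the face
    # box, with the torso below the chin, legs below that, shoes at the bottom.
    return {
        'top':    (x - w, y + h,     3 * w, 3 * h),
        'bottom': (x - w, y + 4 * h, 3 * w, 3 * h),
        'shoes':  (x - w, y + 7 * h, 3 * w, h),
    }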

Once I had these images, the idea was to use reverse image search on the lyst.com domain to always return something relevant.

However, there was a slight hitch with this plan. Google reverse image search, which worked well manually, had no API through which to pass an image…

So the stopgap method was to extract the average colour of each garment by averaging all the pixel colours inside the appropriate garment mask, and then mapping that colour to a broader hue. This turned out to be incredibly hard, and would have been impossible if not for reverse engineering a very good hue detector at http://www.color-blindness.com/color-name-hue/
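A hedged sketch of that fallback, assuming the pixels have already been filtered by a garment mask; the hue buckets below are rough illustrations rather than the reverse-engineered table:

import colorsys

def average_colour(pixels):
    # pixels: iterable of (r, g, b) tuples already filtered by the garment mask.
    r, g, b, n = 0.0, 0.0, 0.0, 0
    for pr, pg, pb in pixels:
        r, g, b, n = r + pr, g + pg, b + pb, n + 1
    return (r / n, g / n, b / n)

def broad_hue(rgb):
    # Convert to HSV, handle blacks/whites/greys, then snap the hue angle to a
    # coarse colour name.
    r, g, b = [c / 255.0 for c in rgb]
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if v < 0.15:
        return 'black'
    if s < 0.15:
        return 'white' if v > 0.8 else 'grey'
    degrees = h * 360
    for name, upper in [('red', 20), ('orange', 45), ('yellow', 70),
                        ('green', 160), ('blue', 260), ('purple', 330), ('red', 360)]:
        if degrees <= upper:
            return name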

Once this was working I packaged it all up in a Flask API: an image file was posted to the endpoint, the above magic happened, and a JSON response was returned giving the X,Y of each garment in the photo, along with the product name, description, image, and a buy link.
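A minimal sketch of what such an endpoint could look like; detect_garments and find_product are hypothetical placeholders for the mask-and-colour pipeline and the product lookup, and the route name is made up:

from flask import Flask, request, jsonify

app = Flask(__name__)

def detect_garments(image):
    # Hypothetical stand-in for the face-mask + colour pipeline described above.
    raise NotImplementedError

def find_product(garment):
    # Hypothetical stand-in for the colour/category lookup against lyst.com.
    raise NotImplementedError

@app.route('/lookup', methods=['POST'])
def lookup():
    image = request.files['image']
    results = []
    for garment in detect_garments(image):
        product = find_product(garment)
        results.append({
            'x': garment['x'],
            'y': garment['y'],
            'name': product['name'],
            'description': product['description'],
            'image': product['image'],
            'buy_link': product['url'],
        })
    return jsonify(results=results)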

Unfortunately there was not enough time to integrate the service into our POC app, which would have made persuading the judges that we had actually done basic image detection much easier!

Overall, the team did an excellent job, and even though we didn’t win I feel the weekend was very well spent.

Data Science London Hackathon

On the weekend of October 5th, I participated in the Data Science London Hackathon for Smart Cities. This involved having access to a number of datasets of city-based data from London. These datasets included things such as:

  • Car Parking Counts
  • Oyster Journeys
  • Incidents of Antisocial Behaviour

A couple of guys from work and I made a team (TeamLYST) and decided to have a closer look at the antisocial behaviour dataset to see if we could make something interesting.

The data gave events that happened on a given day, for a given street, over about a month. The event types were lovingly given as:

  • Dog Fouling
  • Graffiti
  • AntiSocials (public urination, vomit, etc)

So from this we decided to make a predictive application that would generate a set of likely events for a given Monday, Tuesday, etc.

The application was split into 3 parts:

  1. Pre-processing the data into a useful format, adding in default values, etc.
  2. Creating a generative predictive model from this data
  3. Visualising the data

There were three of us on the team, so I picked the visualisation. I did this using Python and PyGame to draw on a PNG of London generated from OpenStreetMap. Event locations were translated to map locations, and the map could be panned and zoomed with the events staying where they were supposed to be. The visualiser allowed you to flip through different days and to view newly generated events.
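To make the coordinate mapping concrete, here is a hedged sketch of the kind of projection involved, assuming a Web-Mercator map export with known bounds (the bounds and image size are placeholders):

import math

def latlon_to_pixel(lat, lon, bounds, image_size):
    # bounds: (min_lat, min_lon, max_lat, max_lon) of the exported map image.
    min_lat, min_lon, max_lat, max_lon = bounds
    width, height = image_size

    def merc_y(latitude):
        return math.log(math.tan(math.pi / 4 + math.radians(latitude) / 2))

    x = (lon - min_lon) / (max_lon - min_lon) * width
    y = (merc_y(max_lat) - merc_y(lat)) / (merc_y(max_lat) - merc_y(min_lat)) * height
    return x, y

def to_screen(px, py, offset, zoom):
    # Apply the current pan (offset) and zoom so markers track the map.
    ox, oy = offset
    return (px * zoom + ox, py * zoom + oy)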

The generative model was trained by looking at each Monday, Tuesday, etc. to work out a count of each event type per street, which was then normalised against the total events of that day. This gave a likelihood for each event on each street for each day of the week. Assuming that all events are equally likely to occur (a big assumption), we can sample a normal distribution and apply this to our likelihood map to generate an event. We do this as many times as the average number of events for that day, and we get a pseudo-typical event set.
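A hedged sketch of that model, under my own assumptions about the data layout (a list of (weekday, street, event_type) tuples) and with a plain proportional draw standing in for the sampling step:

import random
from collections import defaultdict

def build_likelihoods(events):
    # events: list of (weekday, street, event_type) tuples.
    counts = defaultdict(float)
    totals = defaultdict(float)
    for weekday, street, etype in events:
        counts[(weekday, street, etype)] += 1
        totals[weekday] += 1
    # Normalise each (street, event type) count against that weekday's total.
    return {key: count / totals[key[0]] for key, count in counts.items()}

def generate_day(likelihoods, weekday, n_events):
    # Draw n_events (street, event_type) pairs in proportion to their likelihoods.
    keys = [k for k in likelihoods if k[0] == weekday]
    weights = [likelihoods[k] for k in keys]
    total = sum(weights)
    generated = []
    for _ in range(n_events):
        r = random.random() * total
        acc = 0.0
        for key, weight in zip(keys, weights):
            acc += weight
            if r <= acc:
                generated.append((key[1], key[2]))
                break
    return generated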

The final product worked as intended, and with more accurate data it could be extended into a nice predictive application to help local law enforcement plan responses and distribute resources.

We didn’t win the hackathon, but it was a fun experience. We put up a video of our work too.
