Recurse Center projects, part 2

2024-02-02

The end of my Recurse batch is nearing and I would like to follow up my previous post with another look at what I have been working on since then.

Update: also see part 3.

Blog static site generator

This blog uses a static site generator I wrote myself, mostly in Ruby using ERB. The posts themselves are written in Markdown and I compile them with Gruber's original Markdown.pl Perl script. I host the site on Amazon S3 behind Amazon CloudFront.

I created this blog years ago but I never got into the habit of posting much. Recurse encouraged me to start writing again. As I revived the blog, I did the following work on the site generator:

Moved it from notes.jacobvosmaer.nl to blog.jacobvosmaer.nl
Updated styling (yes, this plain-looking site does have styling)
Added a tag system
Added an Atom feed
Added support for non-Markdown content such as image files
Added post dates (they were just numbered before)
Added multi-threading to speed up the build
Added a "hidden" post status that allows me to proofread and ask for feedback before the post gets published on the blog index

Thanks to Ruby's compactness the total amount of code I have to maintain is not that large.


% wc -l build deploy new *.erb
    144 build
      4 deploy
     27 new
     17 atom.xml.erb
     27 index.html.erb
    219 total

Publishing my MIDI parser library

Ever since I got into embedded programming I have been using my own C MIDI parser. Not because the world needed another MIDI parser but because I enjoyed building one. While at Recurse I have finally published this code as a standalone library. For more information, see the blog post I wrote about it.

Cryptopals

It seems to be popular among Recurse participants to form study groups that work through some kind of online curriculum or book together. I can understand why, but in my first six weeks at Recurse I did not see a group I wanted to join because none of the topics clicked with me. I also could not think of any book or course to start my own group around.

In the second half of my batch, however, a fellow participant started a group to work on the Cryptopals challenges and I thought I would give it a shot. The exercises looked familiar, and I later realized that I had already done the first 8 challenges 10 years ago, when it was still called the "Matasano Crypto Challenge". This time around I made it to Challenge 20.

The Cryptopals challenges are a set of exercises designed to give the "student" a better understanding of how attacks on cryptography work. You can sometimes read articles about how another weakness in AES-123-XYZ has been found, but actually exploiting the weakness gives you a much better understanding. It feels similar to the difference between nodding along while reading a math book and actually doing the exercises.

It seems to be customary to do Cryptopals in a programming language you are not fluent in, so that you get better at using that language as a side effect. I have my doubts about this idea because some of the challenges are difficult enough without worrying about how to use your language. Still, I did the challenges in C, in which I would rate myself as well above "beginner" but not "fluent".

Some of the exercises are proper code breaking, while others are more like stepping stones to prepare you for the code breaking. For example, you often start with a blob of binary data, handed to you in Base64 encoding. One of the first stepping stones is making sure you know how to decode Base64 in the programming environment you are working in. Modern languages either have a Base64 decoder in their standard library or make it easy to install one. I am still confused about how to integrate C libraries, so I wrote my own Base64 decoder, and I had way too much fun doing it. Later challenges rely on OpenSSL: there I did use the library, after figuring out the necessary LDFLAGS, LDLIBS and CFLAGS incantations.
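To give an idea of how small such a decoder can be, here is a minimal sketch in C. It is not the decoder I wrote for the challenges (I am keeping my solutions offline); the names b64value and b64decode are made up for this illustration, and it assumes an ASCII character set, a NUL-terminated input string, and an output buffer with room for 3 bytes per 4 input characters.

```c
#include <stddef.h>

/* Map one Base64 character to its 6-bit value, or -1 if it is not
   part of the standard alphabet. Assumes ASCII. */
int b64value(int c) {
  if (c >= 'A' && c <= 'Z')
    return c - 'A';
  if (c >= 'a' && c <= 'z')
    return c - 'a' + 26;
  if (c >= '0' && c <= '9')
    return c - '0' + 52;
  if (c == '+')
    return 62;
  if (c == '/')
    return 63;
  return -1;
}

/* Decode the NUL-terminated Base64 string in into out and return the
   number of bytes written. Decoding stops at the first '=' padding
   character. Characters outside the alphabet, such as newlines, are
   skipped rather than treated as errors. */
size_t b64decode(const char *in, unsigned char *out) {
  unsigned long acc = 0; /* bit accumulator, newest bits at the bottom */
  int bits = 0;          /* number of bits in acc not yet written out */
  size_t n = 0;

  for (; *in && *in != '='; in++) {
    int v = b64value((unsigned char)*in);
    if (v < 0)
      continue;
    acc = (acc << 6) | (unsigned long)v;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out[n++] = (unsigned char)((acc >> bits) & 0xff);
    }
  }
  return n;
}
```

Every input character contributes 6 bits, and a byte is emitted as soon as 8 bits are available, so no special handling is needed for the one or two padding characters at the end. A stricter decoder would probably reject invalid characters instead of skipping them.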

The best part for me was the code breaking, especially challenges 12-14, 16 and 17. As mentioned above I have not progressed beyond Challenge 20 but I assume/hope there are more cool challenges after that. Challenge 14 became a kind of friendly competition among our group because it can be interpreted at different levels of difficulty, and there is a large spread in how computationally efficiently you can solve it. After solving the "easy" variant (which was hard enough for me) I went back to it twice, once to solve the hard variant and once more to make it 40 times faster. I would not have gone back without learning from my peers that my previous solution could be improved on.

Apart from the code breaking I enjoyed the opportunity to do more practical work with C as a general purpose programming language (i.e. outside the domain of embedded programming). In particular I got quicker at adding unit tests to my code.

Here is an example, collapsed for brevity.


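/* Parse "key=value&key=value" input into a freshly allocated session. */
struct session *parsekv(u8 *in, int size) {
  u8 *end = in + size;
  struct session *s = sessionalloc();
  /* Try each known key in turn; the empty-prefix catch-all at the end
     discards any other field. */
  while (in < end)
    parsefield(&in, end, "email=", parsestring, &s->email) ||
        parsefield(&in, end, "uid=", parsestring, &s->uid) ||
        parsefield(&in, end, "role=", parsestring, &s->role) ||
        parsefield(&in, end, "", discard, 0);
  return s;
}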

void expectequalstr(char *expect, char *actual, char *context) {
  if (strcmp(expect, actual))
    error("%s: want %s, got %s", context, expect, actual);
}

void testparsekv(void) {
  struct {
    char *in;
    int size;
    struct session out;
  } * t,
      tests[] = {
          {0, 0, {"", "", ""}},
          {"email=foo@bar.com", -1, {"foo@bar.com", "", ""}},
          {"role=admin", -1, {"", "", "admin"}},
          {"role=ad\x00min", 11, {"", "", "admin"}},
          {"role=admin&email=foo1@bar.com", -1, {"foo1@bar.com", "", "admin"}},
          {"uid=1234", -1, {"", "1234", ""}},
          {"role=admin&uid=123&email=foo1@bar.com",
           -1,
           {"foo1@bar.com", "123", "admin"}},
          {"garbage=field&email=hello@example.com",
           -1,
           {"hello@example.com", "", ""}},
          {"email=hello1@example.com&garbage=field",
           -1,
           {"hello1@example.com", "", ""}},
      };
  for (t = tests; t < endof(tests); t++) {
    struct session *s;
    if (t->size < 0)
      t->size = strlen(t->in);
    if (!(s = parsekv((u8 *)t->in, t->size)))
      error("expected pointer to session, got null");
    expectequalstr(t->out.email, s->email, "email");
    expectequalstr(t->out.uid, s->uid, "uid");
    expectequalstr(t->out.role, s->role, "role");
    sessionfree(s);
  }
}

I just call these tests from the main function because they are so fast. That way I do not even need a "test framework" of some sort.
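The example above leans on a few helpers that are not shown. As a rough sketch of how little is needed, here is one way they could look. These are assumptions made for illustration, not my actual definitions: endof as a macro, an error function that prints and counts failures instead of exiting, a toy function firstfield to have something to test, and a runtests function meant to be called at the top of main.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed helper: pointer one past the last element of a true array. */
#define endof(a) ((a) + sizeof(a) / sizeof(*(a)))

int failures; /* number of failed expectations so far */

/* Assumed helper: report a failed expectation and keep going. */
void error(const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  vfprintf(stderr, fmt, ap);
  va_end(ap);
  fputc('\n', stderr);
  failures++;
}

void expectequalstr(const char *expect, const char *actual,
                    const char *context) {
  if (strcmp(expect, actual))
    error("%s: want %s, got %s", context, expect, actual);
}

/* Toy function under test (hypothetical): copy the text before the
   first '&' into out, truncating to fit. outsize must be at least 1. */
void firstfield(const char *in, char *out, size_t outsize) {
  size_t i = 0;
  while (in[i] && in[i] != '&' && i + 1 < outsize) {
    out[i] = in[i];
    i++;
  }
  out[i] = 0;
}

/* Table-driven test in the same style as testparsekv. */
void testfirstfield(void) {
  struct {
    const char *in, *want;
  } *t, tests[] = {
      {"", ""},
      {"role=admin", "role=admin"},
      {"uid=1&role=admin", "uid=1"},
      {"&uid=1", ""},
  };
  for (t = tests; t < endof(tests); t++) {
    char got[32];
    firstfield(t->in, got, sizeof(got));
    expectequalstr(t->want, got, t->in);
  }
}

/* Run all tests and return the number of failures. */
int runtests(void) {
  failures = 0;
  testfirstfield();
  return failures;
}
```

With something like this in place, main can start with a check such as "if (runtests()) return 1;" before doing its real work. Adding a test then means writing one more function and adding one line to runtests.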

The biggest downside of Cryptopals has been the need to not spoil it for others. I did not put my solutions online because of that. This makes it harder to share things I learned such as the testing above. Also, because I got so excited I got too far ahead of the group; the weekly meetings were discussing the challenges at a pace that was too slow for me. I managed to have some nice conversations with others at Recurse who got further ahead but the overall experience was still a bit lonely.

I am glad I joined this group. The code breaking challenges are exciting, they seem "real" and they are often very ingenious. Doing the challenges at the same time as my peers created a shared experience and sense of camaraderie.

I wanted to talk about a few more things I worked on but this post feels long enough for now. Update: you can continue reading in part 3.

Tags: recurse
