Jacob Vosmaer's blog

Recurse Center project overview

2024-02-08

In this post I give will an overview of the programming and writing I have worked on as a Recurse participant. I also added some more notes about Forth, Daisy and SIMD.

I have written a separate post to reflect on why I attended the Recurse retreat, what it was like and what I got from it.

What I worked on

In reverse chronological order.

Yet another Forth (in C)

GitHub

I have read about Forth plenty of times and while I was always intrigued by the language I never knew how get started with it. Forth is a minimal yet flexible programming language that can be bootstrapped from assembler relatively easily. Therefore it seems to be a popular challenge to do just that. But I am not comfortable enough with assembly language so I always felt lost when reading about Forth.

At Recurse it is encouraged to write daily status updates ("checkins") on Zulip (an app similar to Slack) to share what you are working on or looking into. This is very valuable because it can lead to serendipitous advice and collaborations. Through other people's checkins I came across a long blog post title "Forth: The programming language that writes itself: The Web Page". It talks about the origins of Forth and, suprise surprise, showcases the author's adventures with writing their own Forth in assembler.

Even though this was yet another reinforcement of my "Forth requires assembler" idea, the blog post also taught me that Forth is more like a set of ideas or principles than a specific language. I cannot exhaustively list the Forth ideas but some of them are:

I find this idea that Forth is a set of principles freeing because it tells me that I don't have to do assembly. I can learn about Forth, and from Forth, by exploring the "Forth ideas".

The second helping hand I received was from a Recurse participant who shared a great Hacker News comment where someone outlines some simple steps to get started writing your own Forth, without assembler.

Primed by these two experiences I felt inspired to start writing my own Forth in C which I unimaginatively called "yaforth" which stands for "yet another Forth".

I got far enough to implement "IF THEN" branching and support for recursive function definitions, but I cannot do loops yet. I don't know if or when I will get back to it. But what matters here is that I got to finally explore Forth, freed from the "obligation" of doing assembly. Yay!

Improving DaisyAudioMangler

GitHub

In the past month I have gone back and improved the "Mangler" audio effect. I developed it on the Daisy Pod prototyping board which is great for prototyping, but too fragile to sit in my studio. I just don't have a good feeling about all the exposed circuits.

To address this, I built a bare Daisy Seed board into a Hammond enclosure.

I also rewrote the Mangler firmware, which was using Daisy Pod buttons and knobs to read parameter changes, so that it could be controlled via MIDI instead. I could quickly tell that MIDI control was opening up new possibilities for "playing" the effect like an instrument.

Once I finally had the Mangler installed in my studio and I tried to use it I got the impression that there were clicks in the audio. This is really to be expected when you abruptly move a read pointer in an audio buffer, I don't know why it wasn't noticeable before. To remedy the clicks I refactored the firmware a second time so that it runs all audio engines at once and is able to cross-fade between them.

All this was a good reminder how much work it takes to "finish" a project once you think the hard part is done. Even though the Mangler is such a simple, almost trivial effect, I look forward to being able to use it in my music.

SIMD in 1brc

GitHub

I previously wrote about my work on the "One Billion Row challenge". This is a challenge to write a program that processes a 15GB file as fast as possible. In that post I mentioned SIMD programming, i.e. vectorizing, as a possible furter performance improvement.

I am happy to report that I managed to write a SIMD parser for the simple number format of the challenge using ARM Neon intrinsics. It took me nearly a day to figure out what that means and how to do it. It is a fun exercise to take a character-at-a-time algorithm and to vectorize it which means to write it so that it does a vector's worth of work at once. In this particular case, it means taking a string of 4 characters like 12.3 and turning it into the number 123 by operating on all 4 characters at once. While I was at it I also made sure there is no branching.

A parser for numbers of the formats 12.3\n, -12.3\n, 1.2\n and -1.2\n.


int64_t parsenumneon(char **pp) {
  uint16x8_t normalized, scaled,
      minimums = {'0', '0', '.', '0', '0', '.', '0', '\n'},
      scaletens = {100, 10, 0, 1, 10, 0, 1, 0},
      maximums = {9, 9, 0, 9, 9, 0, 9, 0}, invec, rangecheck;
  uint16_t input[4];
  int16_t sign;
  uint16_t isddpd, isdpdn;
  uint8_t *p = (uint8_t *)*pp;
  int64_t out = 0;

  sign = 1 - 2 * (*p == '-');
  p += (*p == '-');

  input[0] = p[0];
  input[1] = p[1];
  input[2] = p[2];
  input[3] = p[3];
  invec = vcombine_u16(vld1_u16(input), vld1_u16(input));

  /* Note that x >= '0' && x <= '9' is equivalent to x - '0' >= 0 && x - '0'
   * <= 9. With unsigned integers, y >= 0 is always true so we can simplify this
   * to x - '0' <= 9. Similarly, x == '.' is equivalent to x - '.' <= 0. So we
   * can simultaneously do the range checks for the digits and look for the '.'
   * by doing vector subtraction followed by less-than-or-equal. */

  normalized = vsubq_u16(invec, minimums);
  rangecheck = vcleq_u16(normalized, maximums);

  /* ddpd is the "digit digit period digit" pattern; dpdn is "digit period digit
   * newline" */
  isddpd = !!vminv_u16(vget_low_u16(rangecheck));
  isdpdn = !!vminv_u16(vget_high_u16(rangecheck));

  scaled = vmulq_u16(normalized, scaletens);
  out = isddpd * sign * vaddv_u16(vget_low_u16(scaled)) +
        isdpdn * sign * vaddv_u16(vget_high_u16(scaled));
  p += isddpd * 4 + isdpdn * 3;
  *pp = (char *)p;
  return out;
}

The bad news is that my program barely got any faster! This is what you get when you "optimize" things without good profiling data. So really the next step would be for me to move development over to Linux, or figure out how to get CPU profiles on macOS.

Tags: recurse forth

IndexContact