Forth progress report
2025-06-14
I'm slowly building out my own Forth implementation. In this post I will reflect on what I have been doing since my last post.
Recap
Forth is a programming language that is simple enough that people like writing their own version of it. Simple does not mean easy, however.
My ideal would be to write something like Jonesforth, but Jonesforth is written in 386 assembler, which is currently beyond my skills. I got started writing my own Forth, "yaforth", in C instead, based on a nice Hacker News comment by user "tzs". I'll call that design "tzs-forth".
My project is to add features to yaforth and transform its design from tzs-forth to something closer to Jonesforth. If you want to read along with what changed then you can see my tzs-forth here. The current more Jones-ish version is here.
Why I want to move from tzs-forth closer to Jonesforth
One of the things I like about Jonesforth is how the code is structured. It has two "halves". The first half (jonesforth.S) is assembler, and the second half (jonesforth.f) is Forth. The assembler part compiles to an executable. The idea is you run the executable with jonesforth.f as a kind of startup file. Once the executable has processed the startup file it is ready for interactive use.
As a Forth learner, what I find so interesting here is how much of the system is defined in Forth itself. Even inside the assembler file jonesforth.S you find many functions ("words") defined in Forth. They don't use Forth syntax because it's an assembler file but the author is nevertheless defining "colon words", meaning words defined in Forth instead of native code.
For example, the word : is defined as an array of previously defined words, not as assembly.
I want yaforth to be like that too.
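To make "an array of previously defined words" concrete, here is a minimal sketch in C. It is not how Jonesforth or yaforth actually lay things out: the names word_fn, run and square_of are made up for this illustration, and the toy run loop cannot call one colon word from another because it has no return stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini-VM for illustration only; not yaforth's real layout. */
typedef void (*word_fn)(void);

static int stack[16];
static int sp; /* number of items currently on the stack */

static void push(int x) { stack[sp++] = x; }
static int pop(void) { return stack[--sp]; }

/* Two native ("code") words. */
static void dup_(void) { int x = pop(); push(x); push(x); }
static void mul(void) { int b = pop(); int a = pop(); push(a * b); }

/* A colon word is just an array of previously defined words.
   This one corresponds to the Forth definition ": square dup * ;".
   The NULL terminator is an assumption of this sketch. */
static const word_fn square[] = { dup_, mul, NULL };

/* Run a colon word by calling each entry in turn. */
static void run(const word_fn *def) {
    for (; *def != NULL; def++)
        (*def)();
}

/* Convenience wrapper: push x, run square, return the result. */
int square_of(int x) {
    sp = 0;
    push(x);
    run(square);
    return pop();
}
```

Calling square_of(7) pushes 7, runs the two entries of the array in order, and pops the product.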
Changing the memory layout
The great thing about tzs-forth is that it uses "natural" or "normal" C concepts to help you get started building a Forth interpreter. This helps if you don't already understand Forth internals. But the downside of this approach, at least in my case, was that I ended up with data structures and a memory layout that make sense for a generic C program, but that are hard to read and write from Forth.
Forth expects to work with a single contiguous block of memory. My tzs-forth implementation used C malloc for strings and Forth colon word instruction arrays, meaning many scattered bits of memory.
The current version of yaforth uses one 32KB block of memory called mem for all its allocations. Malloc is no longer used anywhere. We still have separate arrays for the parameter and return stacks. I'm not sure how much I would gain from moving those into mem; I will find out.
Strings are stored in mem with 32-bit length prefixes.
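The idea can be sketched roughly like this. Only the name mem and the 32KB size come from the description above; the helpers allot, store_string, string_length and string_bytes, the variable here, and the use of offsets instead of pointers are assumptions made for this example, not yaforth's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MEMSIZE = 32 * 1024 };

static unsigned char mem[MEMSIZE]; /* the one block all allocations come from */
static int32_t here;               /* offset of the first free byte (assumed name) */

/* Reserve n bytes and return their offset into mem, or -1 if mem is full. */
static int32_t allot(int32_t n) {
    if (n < 0 || n > MEMSIZE - here)
        return -1;
    int32_t addr = here;
    here += n;
    return addr;
}

/* Store s as a 32-bit length followed by its bytes; return the offset. */
int32_t store_string(const char *s) {
    int32_t len = (int32_t)strlen(s);
    int32_t addr = allot((int32_t)sizeof len + len);
    if (addr < 0)
        return -1;
    memcpy(mem + addr, &len, sizeof len);            /* length prefix */
    memcpy(mem + addr + sizeof len, s, (size_t)len); /* string bytes, no NUL */
    return addr;
}

/* Read back the length prefix of a stored string. */
int32_t string_length(int32_t addr) {
    int32_t len;
    memcpy(&len, mem + addr, sizeof len);
    return len;
}

/* Pointer to the first byte after the length prefix. */
const unsigned char *string_bytes(int32_t addr) {
    return mem + addr + sizeof(int32_t);
}
```

Because every allocation is just an offset into one array, Forth code can in principle read and write the same bytes with ordinary fetch and store words.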
Dictionary entries are still C structs, which means that the C compiler is in charge of their memory layout. For the sake of maximizing code written in Forth it would be better to define their layout in Forth. I may do that later.
Functions that modify interpreter state
Another problem is that Jonesforth uses a pair of global instruction pointers. This allows it to intermingle code and data. For example, a relative jump in Jonesforth looks like a pointer to the code word BRANCH followed by the jump offset. That works because BRANCH can read the instruction pointer to find the jump offset, and then add that offset to the instruction pointer: that is what a jump is after all.
In my tzs-forth there was no global instruction pointer so I built support for local jumps into the function that runs colon words. This is the old code that performs a jump if the first value on the stack is zero. I have omitted the rest of the function.
} else if (def[i] == DEFJUMPNZ) {
    int x;
    assert(++i < deflen);
    if (stackpop(&x) && !x) {
        i += def[i];
        assert(i >= 0 && i < deflen);
    }
}
In the current code, the function that runs code words is oblivious to how jumps are implemented. Jumps are split into unconditional jump (branch) and jump-if-zero (branch0). The variable vm.next is one of the (global) instruction pointers.
void branch(void) {
    vm.next += *intaddr(vm.next);
    next();
}
void branch0(void) {
    int x;
    if (stackpop(&x) && !x) {
        branch();
    } else {
        intpp(&vm.next); /* discard jump offset */
        next();
    }
}
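Since the snippet above depends on the rest of yaforth, here is a stripped-down, self-contained sketch of the same idea: jump offsets live in the instruction stream, and the branch words read and adjust a global instruction pointer. Everything here is an assumption for illustration (integer opcodes, the OP_ names, the ip variable standing in for vm.next, the choose helper); it is not yaforth's real interpreter loop.

```c
#include <assert.h>

/* Hypothetical opcodes; real yaforth stores pointers to words instead. */
enum { OP_LIT, OP_BRANCH, OP_BRANCH0, OP_HALT };

static int stack[16];
static int sp;
static const int *ip; /* global instruction pointer, plays the role of vm.next */

static void push(int x) { stack[sp++] = x; }
static int pop(void) { return stack[--sp]; }

/* On entry ip points at the offset cell; the offset is relative to that cell. */
static void branch(void) { ip += *ip; }

static void branch0(void) {
    if (pop() == 0)
        branch();
    else
        ip++; /* discard jump offset */
}

/* Run until OP_HALT and return the top of the stack (-1 on a bad opcode). */
int run(const int *program) {
    sp = 0;
    ip = program;
    for (;;) {
        switch (*ip++) {
        case OP_LIT:     push(*ip++); break; /* literal value follows inline */
        case OP_BRANCH:  branch();    break;
        case OP_BRANCH0: branch0();   break;
        case OP_HALT:    return pop();
        default:         return -1;
        }
    }
}

/* Hand-assembled equivalent of "flag if 10 else 20 then". */
int choose(int flag) {
    int program[] = {
        OP_LIT, flag,
        OP_BRANCH0, 5, /* offset cell at index 3, target index 8 */
        OP_LIT, 10,
        OP_BRANCH, 3,  /* offset cell at index 7, target index 10 */
        OP_LIT, 20,
        OP_HALT
    };
    return run(program);
}
```

The interpreter loop itself knows nothing about offsets; only branch and branch0 touch them, which is the property described above.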
Writing part of the language in Forth itself
In the old yaforth I could not wrap my head around how to start writing part of the language in Forth itself. In the current version I have solved this as follows.
There is a separate file builtin.f in the root of the repository. The Makefile converts this into an array of bytes in a header file using the ad-hoc file embedding utility builtin.c. In forth.c, I have replaced all calls to getchar() with a wrapper function which first returns all bytes from the builtin "input array" before it starts calling getchar() to read from stdin.
int Getchar(void) {
    return builtinp < nelem(builtinforth) ? builtinforth[builtinp++] : getchar();
}
I don't know why I couldn't come up with that the first time. :)
I know this is inscrutable to the untrained eye, but it is satisfying to define if, then and else in the language rather than in the interpreter:
: if immediate ' branch0 , here @ 0 , ;
: else immediate ' branch , here @ 0 , swap dup here @ swap - swap ! ;
: then immediate dup here @ swap - swap ! ;
Understanding what this means is one of my main motivations for writing my own Forth.
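One way to read the three definitions is as back-patching: if compiles branch0 with a placeholder offset and leaves the placeholder's address on the stack, and then (or else) later fills in the real offset once the jump target is known. The C simulation below is only my reading of that, under assumptions: addresses are counted in cells rather than bytes, BRANCH0 and BRANCH are stand-in constants for the addresses of the code words, and the compile_ function names are invented.

```c
#include <assert.h>

/* Stand-ins for the addresses of the code words branch0 and branch. */
enum { BRANCH0 = -1, BRANCH = -2 };

static int code[32];  /* the colon definition being compiled */
static int here;      /* index of the next free cell */
static int cstack[8]; /* the data stack as used at compile time */
static int csp;

/* Like the Forth word "," : append one cell to the definition. */
static void comma(int x) { code[here++] = x; }

/* ' branch0 , here @ 0 , */
static void compile_if(void) {
    comma(BRANCH0);
    cstack[csp++] = here; /* remember where the offset lives */
    comma(0);             /* placeholder offset */
}

/* dup here @ swap - swap ! */
static void compile_then(void) {
    int addr = cstack[--csp];
    code[addr] = here - addr; /* offset relative to the offset cell */
}

/* ' branch , here @ 0 , swap dup here @ swap - swap ! */
static void compile_else(void) {
    comma(BRANCH);
    int addr2 = here;           /* placeholder for the jump over the else part */
    comma(0);
    int addr1 = cstack[--csp];  /* placeholder left behind by if */
    code[addr1] = here - addr1; /* if now jumps to the start of the else part */
    cstack[csp++] = addr2;      /* left for then to patch */
}

/* Compile "if 100 else 200 then", with 100 and 200 standing in for the bodies. */
const int *compile_example(void) {
    here = 0;
    csp = 0;
    compile_if();
    comma(100);
    compile_else();
    comma(200);
    compile_then();
    return code;
}
```

Under these assumptions the compiled cells come out as BRANCH0, 4, 100, BRANCH, 2, 200: the first offset skips to the else body and the second skips past it.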
What's next
I'm going to keep nudging yaforth closer to Jonesforth. I can pick anything I like from the Jonesforth source code and see if I feel like implementing it. I'm not trying to make a useful language per se; my main goal is to slowly get to understand Forth better.
Thank you for reading along!
Tags: forth