I have implemented another Forth. My fourth Forth in fact ;-)
A decade ago I built Enth, an ANS Forth, and Flux, a colorForth variant. Later I added Raven to the list. These projects were amateurish because I was inexperienced, but they ran fine and were a lot of fun. I learnt a lot about software and hardware that proved useful in my later work even though I was no longer working with Forth.
To make money I have had to learn and use various other mainstream languages. Mostly that is fine and they usually get the job done. Most importantly every language teaches me something new.
For a while now I have wanted to try building a Forth again. It should be recognisably Forth, but one that fixes aspects of the language that I now consider problematic.
Some general Forth gripes:
Some high-level gripes when using Forth inside other popular environments:
It is small, familiar, still a Forth. Not an ANS-Forth out of the box, but:
Version 1 is built with C to get going quickly and so it can run on my x86(-64) machines and my ARM devices. It uses GNU C's Labels as Values (like gforth does) to build a proper threaded Forth without relying on the C call stack.
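As a rough illustration of the Labels as Values technique (not Reforth's actual source), here is a minimal token-threaded inner loop. The opcode names, the `run` function, and the token-threaded layout are all assumptions made for the sketch. It needs GCC or Clang because `&&label` and `goto *` are GNU C extensions.

```c
#include <stdint.h>

/* Hypothetical opcodes for the sketch; Reforth's real set is not shown here. */
enum { OP_LIT, OP_ADD, OP_MUL, OP_HALT };

/* Run a token-threaded program and return the top of the data stack. */
intptr_t run(const intptr_t *ip)
{
    /* GNU C extension: &&label yields the address of a label. */
    static void *dispatch[] = { &&do_lit, &&do_add, &&do_mul, &&do_halt };
    intptr_t stack[32];
    intptr_t *sp = stack;               /* points at the next free cell */

    /* Fetch the next token and jump straight to its handler. */
#define NEXT goto *dispatch[*ip++]

    NEXT;

do_lit:  *sp++ = *ip++;          NEXT;  /* push the inline literal */
do_add:  sp--; sp[-1] += sp[0];  NEXT;  /* ( a b -- a+b ) */
do_mul:  sp--; sp[-1] *= sp[0];  NEXT;  /* ( a b -- a*b ) */
do_halt: return sp[-1];
}
```

Each handler ends by jumping directly to the next one, so no handler returns to a central switch and nothing accumulates on the C call stack while Forth code runs.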
It is on GitHub.
Reforth tries to solve my gripes without fundamentally changing Forth's elegance and simplicity.
Now to the gripes! Many of these have been solved in other Forths and certainly none are new ideas. But, for whatever reason, I haven't found them all together in one place. So, here goes...
In Reforth, the word end replaces traditional Forth's then, loop, again, and repeat. This:
Any number of while calls can be made in a single conditional loop:
begin condition1 while condition2 while operation end
The above allows conditional chaining like the && operator in C or bash.
The word until does not terminate a loop, but is just syntactic sugar for 0= while. Plus, the two can be mixed:
begin condition1 while condition2 until operation end
Counted loops use for end instead of do loop, and take a single argument rather than an index and limit:
10 for i . cr end
Having simpler counted loops allows the conditional loop words to also be used there:
10 for condition1 while operation end
And conversely i can also be used in a conditional loop to get the index:
begin i . cr end
The words leave and next work like C's break and continue for fine-grained control, and may apply to either counted or conditional loops:
begin
    condition if operation next end
    condition1 if operation1 leave end
end

10 for
    condition if operation leave end
end
Finally, ANS Forth's case, of, endof, and endcase have been dropped, because it is easy to build a switch from the loop words:
begin
    condition1 if operation1 leave end
    condition2 if operation2 leave end
    operation leave
end
Words if, begin, and for are state-smart. When called in interpret mode they automatically enter compile mode. Later, end detects the fact, executes the compiled code fragment, and reverts to interpret mode. This is quite useful when using Forth as a shell or scripting language.
> 5 for "hello world\n" type end
hello world
hello world
hello world
hello world
hello world
ok (0)
Word definitions may be nested.
: hiphip ( -- )
    : cheer "hip hip, hooray!" type ;
    cheer cheer cheer ;
Ok, so one could do the above with a loop. But there is more to it:
Implementing sub-words is easy. Many Forths could probably do it:
: dump ( address length -- )

    : hex ( a -- )
        at! 16 for c@+ FFh and "%02x " format type end ;

    : ascii ( a -- )
        at! 16 for c@+ dup alpha? over digit? or 0= if drop 46 end emit end ;

    16 / 1+ for
        dup "\n%08x " format type
        dup hex space dup ascii 16 +
    end drop ;
> 'dump sys:xt-body @ 100 dump
0062ef0a 72 00 18 00 14 00 6f 00 10 00 00 00 00 00 00 00 r.....o.........
0062ef1a 73 00 0f 00 19 00 6f 00 ff 00 00 00 00 00 00 00 s.....o.........
0062ef2a 28 00 70 00 04 00 25 30 32 78 20 00 98 00 74 00 ..p....02x....t.
0062ef3a 02 00 72 00 1b 00 14 00 6f 00 10 00 00 00 00 00 ..r.....o.......
0062ef4a 00 00 73 00 12 00 19 00 08 00 96 00 0a 00 95 00 ..s.............
0062ef5a 29 00 2f 00 71 00 07 00 09 00 6f 00 2e 00 00 00 ....q.....o.....
0062ef6a 00 00 00 00 40 00 74 00 02 00 6f 00 10 00 00 00 ......t...o.....
ok
Supporting an arbitrary number of local variables in Forth makes for complex handling of the return stack at run-time, or a complex compiler, or both.
Reforth supports precisely two local variables: at and my.
: accept ( buf lim -- len )
    over at! for
        key my! my while
        my 10 = until
        my c!+
    end
    0 c!+ at 1- swap - ;
Since the number of locals is fixed, the implementation can be very efficient. Entering and exiting a high-level word simply adjusts the return stack pointer by three cells instead of one.
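A minimal sketch of that idea in C, assuming a frame of three cells laid out as return address, then at, then my. The layout, the upward-growing stack, and the names `rstack`, `enter_word`, `exit_word`, `local_at`, and `local_my` are hypothetical; only the "three cells per call" figure comes from the text above.

```c
#include <stdint.h>

typedef intptr_t cell;

enum { FRAME_CELLS = 3 };       /* return address + `at` + `my` (assumed layout) */

typedef struct {
    cell  cells[64];
    cell *rp;                   /* grows upward; points at the next free cell */
} rstack;

void rs_init(rstack *rs) { rs->rp = rs->cells; }

/* Entering a word: save the return address and reserve both locals at once. */
void enter_word(rstack *rs, cell return_ip)
{
    rs->rp[0] = return_ip;
    rs->rp[1] = 0;              /* at */
    rs->rp[2] = 0;              /* my */
    rs->rp += FRAME_CELLS;
}

/* The locals always sit at fixed offsets from the return stack pointer. */
cell *local_at(rstack *rs) { return &rs->rp[-2]; }
cell *local_my(rstack *rs) { return &rs->rp[-1]; }

/* Exiting a word: drop the whole frame and hand back the return address. */
cell exit_word(rstack *rs)
{
    rs->rp -= FRAME_CELLS;
    return rs->rp[0];
}
```

Because the frame size is a compile-time constant, neither entry nor exit needs to know how many locals a word actually uses, and a caller's at and my are untouched while a callee runs.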
Being limited to two locals per word combines elegantly with sub-words. Want more locals? Use more words!
It has been common for decades for Forth programmers to implement records. Something like:
: struct 0 ;
: field create over , + does @ + ;

struct
    cell field alpha
    cell field beta
    100 field gamma
constant stuff
This gives basic relative addressing by name, which improves code readability. The create/does overhead can be a bottleneck, but fancier and faster implementations are in the wild. Forth200x has something similar.
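The offset arithmetic behind such a field word can be sketched in C. The `record` type and the `field` and `field_addr` functions are hypothetical names for illustration, and the sketch makes no claim about how any particular Forth stores its fields.

```c
#include <stddef.h>

/* A record under construction is just a running size in bytes. */
typedef struct { size_t size; } record;

/* Reserve `bytes` in the record and return the offset of the new field. */
size_t field(record *r, size_t bytes)
{
    size_t offset = r->size;
    r->size += bytes;
    return offset;
}

/* Relative addressing: a field adds its offset to a base address. */
void *field_addr(void *base, size_t offset)
{
    return (char *)base + offset;
}
```

Defining alpha, beta, and gamma in order with sizes of one cell, one cell, and 100 bytes yields offsets 0, one cell, and two cells, and leaves the final size as the value a word like stuff would hold.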
Reforth implements records and fields in code for efficiency, and uses a nice clean syntax:
record stuff
    cell field alpha
    cell field beta
    100 field gamma
end
Furthermore, record is an immediate word, so one can define private records inside words. This is useful for neatly managing memory allotted by create does defining words:
: fruit ( apples oranges -- )
    record fields
        cell field apples
        cell field oranges
    end
    create here tuck fields allot oranges ! apples !
    does ( etc... ) ;
Sub-words lend themselves to implementing APIs and libraries just like traditional wordlists. In Reforth, the parser is modified to recognise a new syntax:
The above tells the parser to:
This solves the second half of the problem with vocabularies. Saying hiphip:cheer is unambiguous, does not rely on a search order being set first, and does double duty by grouping related words with a common prefix without requiring that prefix to be used inside the library itself.
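One way such a lookup could work is sketched below in C. The `word` structure, the `find_in` and `resolve` functions, and the assumption that sub-words live only in their parent's list are all hypothetical; the sketch handles a single outer:inner level and is not taken from Reforth's source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical dictionary entry: each word owns a list of sub-words. */
typedef struct word {
    const char  *name;
    struct word *next;      /* next word in the same list */
    struct word *subs;      /* head of this word's sub-word list */
} word;

/* Find a name of the given length in one list; NULL if absent. */
word *find_in(word *list, const char *name, size_t len)
{
    for (word *w = list; w != NULL; w = w->next)
        if (strlen(w->name) == len && memcmp(w->name, name, len) == 0)
            return w;
    return NULL;
}

/* Resolve "outer:inner", or a plain "name", against the global list. */
word *resolve(word *globals, const char *token)
{
    const char *colon = strchr(token, ':');

    /* No separator, or a token that starts or ends with ':', is a plain word.
       This keeps the Forth word ":" itself findable. */
    if (colon == NULL || colon == token || colon[1] == '\0')
        return find_in(globals, token, strlen(token));

    word *outer = find_in(globals, token, (size_t)(colon - token));
    if (outer == NULL)
        return NULL;
    return find_in(outer->subs, colon + 1, strlen(colon + 1));
}
```

The lookup never consults a search order: the prefix alone selects which list is searched, which is what makes a name like hiphip:cheer unambiguous.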
Being able to call sub-words using the outer:inner notation looks a lot like calling a static method in a class in other languages. How about proper objects and methods? We already have the tools: records, sub-words, and Forth's create/does.
: fruit-basket ( -- )

    record fields
        cell field apples
        cell field oranges
        cell field bananas
    end

    : count-fruit ( this -- n )
        at! at apples @ at oranges @ + at bananas @ + ;

    create fields allot does ;

fruit-basket hamper
5 hamper.apples !
3 hamper.oranges !
2 hamper.bananas !
hamper.count-fruit
Above, the outer:inner static notation has been tweaked to be object.method. It tells the parser to:
Step #2 works because Reforth's word does patches the defining word's sub-word list into created words' headers, i.e., hamper inherits fruit-basket's sub-word list.
A more useful example:
: array ( -- )

    record fields
        cell field size
        cell field data
    end

    : index ( i a -- )
        dup at! size @ 1- min 0 max cells at data @ + ;

    : get ( i a -- n )
        index @ ;

    : set ( n i a -- )
        index ! ;

    : dump ( a -- )
        dup at! size @ at "array %x\n" print
        for i at get i " %d => %d\n" print end ;

    : construct ( n -- a )
        here fields allot over cells allocate over data ! tuck size ! ;

    : destruct ( a -- )
        dup data @ free free ;

    create construct drop does ;
> 3 array test
ok
> test.dump
array 62f452
 0 => 0
 1 => 0
 2 => 0
ok
> 17 1 test.set
ok
> test.dump
array 62f452
 0 => 0
 1 => 17
 2 => 0
ok
> 1 test.get .
17 ok
Reforth parses string literals directly, without words like s" or z". It follows the C-like escape sequences for \t, \r, \n, etc. A circular set of string buffers allows up to three string literals to be in use simultaneously in interpret mode.
"hello world\n" type
Reforth has the word format which wraps C's sprintf() functionality. All the usual % sequences are supported. Like string literals, formatted strings use a circular set of buffers so that formatting operations may be chained in interpret mode.
42 "the number is: %d\n" format type
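The circular-buffer idea can be sketched in C as follows. The slot count of three is borrowed from the string-literal description above and is an assumption for format; the slot size and the `ring_format` name are made up for the example. This sketch takes its arguments as C varargs, whereas a Forth word would take them from the data stack.

```c
#include <stdarg.h>
#include <stdio.h>

enum { RING_SLOTS = 3, SLOT_SIZE = 256 };   /* both sizes are assumptions */

static char ring[RING_SLOTS][SLOT_SIZE];
static int  ring_next = 0;

/* Format into the next slot of the ring and return a pointer to it.
   The result stays valid until RING_SLOTS further calls reuse the slot. */
const char *ring_format(const char *fmt, ...)
{
    char *slot = ring[ring_next];
    ring_next = (ring_next + 1) % RING_SLOTS;

    va_list args;
    va_start(args, fmt);
    vsnprintf(slot, SLOT_SIZE, fmt, args);  /* truncates rather than overflows */
    va_end(args);
    return slot;
}
```

Rotating through a few static slots means the result of one formatting call can be passed into the next without either call allocating memory, at the cost of results being overwritten once the ring wraps around.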
Reforth allows matching and splitting strings by regex:
: taste-test ( subject -- )
    "(apple|orange|banana)" match if "got fruit!\n" type end ;

: process-lines ( text -- )
    : line ( text -- text' flag )
        dup "\r?\n" split rot type ;
    begin line while cr end drop ;
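For comparison, the same two operations can be sketched in C with POSIX `regcomp` and `regexec`. Which regex engine Reforth uses is not stated here, and the `re_match` and `re_split` names and their return conventions are assumptions for the sketch, not a description of Reforth's match and split.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if the POSIX extended regex `pattern` matches anywhere in `subject`. */
int re_match(const char *subject, const char *pattern)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                           /* treat a bad pattern as no match */
    int hit = (regexec(&re, subject, 0, NULL, 0) == 0);
    regfree(&re);
    return hit;
}

/* Split `text` at the first non-empty match of `pattern`.
   On success, store the length of the text before the delimiter in *head_len
   and return a pointer just past the delimiter. Return NULL if no delimiter
   is found. */
const char *re_split(const char *text, const char *pattern, size_t *head_len)
{
    regex_t re;
    regmatch_t m;
    const char *rest = NULL;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;
    if (regexec(&re, text, 1, &m, 0) == 0 && m.rm_eo > m.rm_so) {
        *head_len = (size_t)m.rm_so;
        rest = text + m.rm_eo;
    }
    regfree(&re);
    return rest;
}
```

Calling `re_split` repeatedly on the returned pointer walks a text line by line, much as process-lines does above, and the non-empty-match check stops a pattern that can match nothing from looping forever.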