Reforth

I have implemented another Forth. My fourth Forth in fact ;-)

A decade ago I built Enth, an ANS Forth, and Flux, a colorForth variant. Later I added Raven to the list. These projects were amateurish because I was inexperienced, but they ran fine and were a lot of fun. I learnt a lot about software and hardware that proved useful in my later work even though I was no longer working with Forth.

To make money I have had to learn and use various other mainstream languages. Mostly that is fine and they usually get the job done. Most importantly every language teaches me something new.

For a while now I have wanted to try building a Forth again. It should be recognisably Forth, but one that fixes aspects of the language that I now consider problematic.

Some general Forth gripes:

Some high-level gripes when using Forth inside other popular envrionments:

Rethinking Forth

Enter Reforth!

It is small, familiar, still a Forth. Not an ANS-Forth out of the box, but:

Version 1 is built with C to get going quickly and so it can run on my x86(-64) machines and my ARM devices. It uses GNU C's Labels as Values (like gforth does) to build a proper threaded Forth without relying on the C call stack.

It is on github.

Reforth tries to solve my gripes without fundamentally changing Forth's elegance and simplicity.

Now to the gripes! Many of these have been solved in other Forths and certainly none are new ideas. But, for whatever reason, I havn't found them all together in one place. So, here goes...

Simplified Control Structures

In Reforth, the word end replaces traditional Forth's then, loop, again, and repeat. This:

Any number of while calls can be made in a single conditional loop:

begin
    condition1
while
    condition2
while
    operation
end

The above allows conditional chaining like the && operator in C or bash.

The word until does not terminate a loop, but is just syntactic sugar for 0= while. Plus, the two can be mixed:

begin
    condition1
while
    condition2
until
    operation
end

Counted loops use for end instead of do loop, and take a single argument rather than an index and limit:

10 for i . cr end

Having simpler counted loops allows the conditional loop words to also be used there:

10 for
    condition1
while
    operation
end

And conversely i can also be used in a conditional loop to get the index:

begin
    i . cr
end

The words leave and next work like C's break and continue for fine-grained control, and may apply to either counted or conditional loops:

begin
    condition
    if  operation
        next
    end
    condition1
    if  operation1
        leave
    end
end

10 for
    condition
    if  operation
        leave
    end
end

Finally, ANS-Forth's case of endof endcase have been dumped, because it is easy to build a switch:

begin
    condition1
    if  operation1
        leave
    end
    condition2
    if  operation1
        leave
    end
    operation
    leave
end

Smarter Control Structures

Words if, begin, and for are state-smart. When called in interpret mode they automatically enter compile mode. Later, end detects the fact, executes the compiled code fragment, and reverts to interpret mode. This is quite useful when using Forth as a shell or scripting language.

> 5 for "hello world\n" type end
hello world
hello world
hello world
hello world
hello world
 ok (0)

Sub-Words

Word definitions may be nested.

: hiphip ( -- )
    : cheer "hip hip, hooray!" type ;
    cheer cheer cheer
;

Ok, so one could do the above with a loop. But there is more to it:

Implementing sub-words is easy. Many Forths could probably do it:

  1. Make : (colon) an immediate word.
  2. Turn Forth's compile/interpret flag into a counter. Have : (colon) increment it and ; (semi-colon) decrement.
  3. Implement one of the following:
  4. Make ; (semicolon) patch the current wordlist appropriately to hide sub-words.

Example:

: dump ( address length -- )

    : hex ( a -- )
        at! 16
        for c@+
            FFh and "%02x " format type
        end ;

    : ascii ( a -- )
        at! 16
        for c@+
            dup alpha? over digit? or 0=
            if drop 46 end emit
        end ;

    16 / 1+
    for
        dup "\n%08x  " format type
        dup hex space dup ascii
        16 +
    end
    drop ;

Result:

> 'dump sys:xt-body @ 100 dump

0062ef0a  72 00 18 00 14 00 6f 00 10 00 00 00 00 00 00 00  r.....o.........
0062ef1a  73 00 0f 00 19 00 6f 00 ff 00 00 00 00 00 00 00  s.....o.........
0062ef2a  28 00 70 00 04 00 25 30 32 78 20 00 98 00 74 00  ..p....02x....t.
0062ef3a  02 00 72 00 1b 00 14 00 6f 00 10 00 00 00 00 00  ..r.....o.......
0062ef4a  00 00 73 00 12 00 19 00 08 00 96 00 0a 00 95 00  ..s.............
0062ef5a  29 00 2f 00 71 00 07 00 09 00 6f 00 2e 00 00 00  ....q.....o.....
0062ef6a  00 00 00 00 40 00 74 00 02 00 6f 00 10 00 00 00  ......t...o..... ok

Two Local Variables

Supporting random numbers of local variables in Forth makes for complex handling of the return stack at run-time, or a complex compiler, or both.

Reforth supports precisely two local variables: at and my

Example:

: accept ( buf lim -- len )
    over at!
    for
        key my!
        my while
        my 10 = until
        my c!+
    end
    0 c!+ at 1- swap - ;

Since the number of locals is fixed the implementation is able to be very efficient. Entering and exiting a high-level word simply adjusts the return stack pointer by three cells instead of one.

Being limited to two locals per word combines elegantly with sub-words. Want more locals? Use more words!

Immediate Structures

It has been common for decades for Forth programmers to implement records. Something like:

: struct 0 ;
: field create over , + does @ + ;

struct
    cell field alpha
    cell field beta
     100 field gamma
constant stuff

Basic relative addressing using names which improves code readbility. The create/does overhead can be a bottleneck but fancier and faster implementations are in the wild. Forth200x has something similar.

Reforth implements records and fields in code for efficiency, and uses a nice clean syntax:

record stuff
    cell field alpha
    cell field beta
     100 field gamma
end

Furthermore, record is an immediate word, so one can define private records inside words. This is useful for neatly managing memory allotted by create does defining words:

: fruit ( apples oranges -- )

    record fields
        cell field apples
        cell field oranges
    end

    create
        here tuck
        fields allot
        oranges !
        apples  !
    does
        ( etc... ) ;

Sub-words as Libraries

Sub-words lend themselves to implementing APIs and libraries just like traditional wordlists. In Reforth, the parser is modified to recognise a new syntax:

hiphip:cheer

The above tells the parser to:

  1. Look for the word hiphip in the dictionary
  2. Look for the sub-word cheer in hiphip's list
  3. Execute or compile cheer

This solves the second half of the problem with vocabularies. Saying hiphip:cheer is unambiguous, does not rely on a search order being set first, and does double duty by grouping related words with a common prefix without requiring that prefix to be used inside the library itself.

Sub-words as Objects

Being able to call sub-words using the outer:inner notation looks a lot like calling a static method in a class in other languages. How about proper objects and methods? We already have the tools: records, sub-words, and Forth's create/does.

: fruit-basket ( -- )

    record fields
        cell field apples
        cell field oranges
        cell field bananas
    end

    : count-fruit ( this -- n )
        at!
        at apples  @
        at oranges @ +
        at bananas @ + ;

    create fields allot does ;

fruit-basket hamper

5 hamper.apples  !
3 hamper.oranges !
2 hamper.bananas !

hamper.count-fruit

Above, the outer:inner static notation has been tweaked to be object.method. It tells the parser to:

  1. Look for the word hamper in the dicitonary
  2. Look for the sub-word count-fruit in hamper's list
  3. Execute or compile hamper
  4. Execute or compile cheer

Step #2 works because Reforth's word does patches the defining word's sub-word list into created words' headers, ie, hamper inherits fruit-basket's sub-word list.

A more useful example:

: array ( -- )

    record fields
        cell field size
        cell field data
    end

    : index ( i a -- )
        dup at! size @ 1- min 0 max cells at data @ + ;

    : get ( i a -- n )
        index @ ;

    : set ( n i a -- )
        index ! ;

    : dump ( a -- )
        dup at! size @
        at "array %x\n" print
        for
            i at get
            i " %d => %d\n" print
        end ;

    : construct ( n -- a )
        here fields allot
        over cells allocate
        over data !
        tuck size ! ;

    : destruct ( a -- )
        dup data @ free free ;

    create construct drop does
;

Usage:

> 3 array test
 ok
> test.dump
array 62f452
 0 => 0
 1 => 0
 2 => 0
 ok
> 17 1 test.set
 ok
> test.dump
array 62f452
 0 => 0
 1 => 17
 2 => 0
 ok
> 1 test.get .
17  ok

String Literals

Reforth understands how to parse string literals directly, without words like s" or z". It follows the C-like escape sequences for \t, \r, \n, etc. A circular set of string buffers are used to allow up to three string literals in use simultaneously in interpret mode.

"hello world\n" type

String Formatting

Reforth has the word format which wraps C's sprintf() functionality. All the usual % sequences are supported. Like string literals, formatted strings use a circular set of buffers so that formatting operations may be chained in interpret mode.

42 "the number is: %d\n" format type

POSIX Regular Expressions

Reforth allows matching and splitting strings by regex:

: taste-test ( subject -- )

    "(apple|orange|banana)" match

    if "got fruit!\n" type end ;

: process-lines ( text -- )

    : line ( text -- text' flag )
        dup "\r?\n" split rot type ;

    begin line while cr end drop ;