TBF_FORTH_VERSION equ 1 COMMENT @/* PUBLIC DOMAIN ---------------------------------------------------------------------- I, the copyright holder of this work, hereby release it into the public domain. This applies worldwide. In case this is not legally possible, I grant any entity the right to use this work for any purpose, without any conditions, unless such conditions are required by law. INFO ------------------------------------------------------------------------------- File: forath.asm VENDOR TARGETS: OS: Windows 11 amd64 ASSEMBLER: Micorsoft Macro Assembler Version 14.43 Inspiration to learn FORTH: Metaprogramming VAMP in KYRA, a Next-gen Forth-like language --- Onat Türkçüoğlu -- 2025-04-26 https://www.youtube.com/watch?v=J9U_5tjdegY Onat's post on his KYRA language: https://onatto.github.io/lang.html Timothy Lottes forth-like language: "A"; inspired him with KYRA. All of which are related to FORTH and COLOR FORTH An introduction to forth based on jonesforth.S See: https://github.com/nornagon/jonesforth/blob/master/jonesforth.S I will be pasting much of the commentary from the original source into this file, with some edits. SETTING UP ---------------------------------------------------------------------- Let's get a few housekeeping things out of the way. Firstly because I need to draw lots of ASCII-art diagrams to explain concepts, the best way to look at this is using a window which uses a fixed width font and is at least this wide: <------------------------------------------------------------------------------------------------------------------------> Secondly make sure TABS are set to 8 characters. The following should be a vertical line. If not, sort out your tabs. | | | Thirdly I assume that your screen is at least 50 characters high. THE DICTIONARY ---------------------------------------------------------------------- In FORTH as you will know, functions are called "words", and just as in other languages they have a name and a definition. Here are two FORTH words: : DOUBLE DUP + ; \ name is "DOUBLE", definition is "DUP +" : QUADRUPLE DOUBLE DOUBLE ; \ name is "QUADRUPLE", definition is "DOUBLE DOUBLE" Words, both built-in ones and ones which the programmer defines later, are stored in a dictionary which is just a linked list of dictionary entries. <--- DICTIONARY ENTRY (HEADER) -----------------------> +------------------------+--------+---------- - - - - +----------- - - - - | LINK POINTER | LENGTH/| NAME | DEFINITION | | FLAGS | | +--- (4 bytes) ----------+- byte -+- n bytes - - - - +----------- - - - - I'll come to the definition of the word later. For now just look at the header. The first 4 bytes are the link pointer. This points back to the previous word in the dictionary, or, for the first word in the dictionary it is just a NULL pointer. Then comes a length/flags byte. The length of the word can be up to 31 characters (5 bits used) and the top three bits are used for various flags which I'll come to later. This is followed by the name itself, and in this implementation the name is rounded up to a multiple of 4 bytes by padding it with zero bytes. That's just to ensure that the definition starts on a 32 bit boundary. A FORTH variable called LATEST contains a pointer to the most recently defined word, in other words, the head of this linked list. DOUBLE and QUADRUPLE might look like this: pointer to previous word ^ | +--|------+---+---+---+---+---+---+---+---+------------- - - - - | LINK | 6 | D | O | U | B | L | E | 0 | (definition ...) +---------+---+---+---+---+---+---+---+---+------------- - - - - ^ len padding | +--|------+---+---+---+---+---+---+---+---+---+---+---+---+------------- - - - - | LINK | 9 | Q | U | A | D | R | U | P | L | E | 0 | 0 | (definition ...) +---------+---+---+---+---+---+---+---+---+---+---+---+---+------------- - - - - ^ len padding | | LATEST You should be able to see from this how you might implement functions to find a word in the dictionary (just walk along the dictionary entries starting at LATEST and matching the names until you either find a match or hit the NULL pointer at the end of the dictionary); and add a word to the dictionary (create a new definition, set its LINK to LATEST, and set LATEST to point to the new word). We'll see precisely these functions implemented in assembly code later on. One interesting consequence of using a linked list is that you can redefine words, and a newer definition of a word overrides an older one. This is an important concept in FORTH because it means that any word (even "built-in" or "standard" words) can be overridden with a new definition, either to enhance it, to make it faster or even to disable it. However because of the way that FORTH words get compiled, which you'll understand below, words defined using the old definition of a word continue to use the old definition. Only words defined after the new definition use the new definition. */@ ; NEXT MACRO: ; This loads a qword from [rsi] into rax and increments rsi by 8, then ; jumps to the address stored at [rax]. NEXT MACRO lodsq jmp qword ptr [rax] ENDM COMMENT @/* The macro is called NEXT. That's a FORTH-ism. It expands to those two instructions. Every FORTH primitive that we write has to be ended by NEXT. Think of it kind of like a return. DIRECT THREADED CODE ---------------------------------------------------------------------- Let's talk about what "threaded code" means. Imagine a peculiar version of C where you are only allowed to call functions without arguments. (Don't worry for now that such a language would be completely useless!) So in our peculiar C, code would look like this: f () { a (); b (); c (); } and so on. How would a function, say 'f' above, be compiled by a standard C compiler? Probably into assembly code like this. On the right hand side I've written the actual x86 machine code. f: CALL a 0E8h, 008h, 000h, 000h, 000h CALL b 0E8h, 01Ch, 000h, 000h, 000h CALL c 0E8h, 02Ch, 000h, 000h, 000h ; ignore the return from the function for now "E8h" is the x86 machine code to "CALL" a function. In the first 20 years of computing memory was hideously expensive and we might have worried about the wasted space being used by the repeated "E8h" bytes. We can save 20% in code size (and therefore, in expensive memory) by compressing this into just: 008h, 000h, 000h, 000h Just the function addresses, without 01Ch, 000h, 000h, 000h the CALL prefix. 02Ch, 000h, 000h, 000h On a 16-bit machine like the ones which originally ran FORTH the savings are even greater - 33%. [Historical note: If the execution model that FORTH uses looks strange from the following paragraphs, then it was motivated entirely by the need to save memory on early computers. This code compression isn't so important now when our machines have more memory in their L1 caches than those early computers had in total, but the execution model still has some useful properties]. Of course this code won't run directly on the CPU any more. Instead we need to write an interpreter which takes each set of bytes and calls it. On an x86 machine it turns out that we can write this interpreter rather easily, in just two assembly instructions which turn into just 3 bytes of machine code. Let's store the pointer to the next word to execute in the ESI register: 008h, 000h, 000h, 000h <- We're executing this one now. ESI is the _next_ one to execute. ESI -> 01Ch, 000h, 000h, 000h 02Ch, 000h, 000h, 000h The all-important x86 instruction is called LODSD (or in Intel manuals, LODSD). It does two things. Firstly it reads the memory at ESI into the accumulator (EAX). Secondly it increments ESI by 4 bytes. So after LODSD, the situation now looks like this: 008h, 000h, 000h, 000h <- We're still executing this one 01Ch, 000h, 000h, 000h <- EAX now contains this address (0000001Ch) ESI -> 02Ch, 000h, 000h, 000h Now we just need to jump to the address in EAX. This is again just a single x86 instruction written JMP DWORD PTR [EAX]. And after doing the jump, the situation looks like: 008h, 000h, 000h, 000h 01Ch, 000h, 000h, 000h <- Now we're executing this subroutine. ESI -> 02Ch, 000h, 000h, 000h To make this work, each subroutine is followed by the two instructions 'LODSD; JMP DWORD PTR [EAX]' which literally make the jump to the next subroutine. ------------------------------------------------------------------------------------------- To sum up: We compress our function calls down to a list of addresses and use a somewhat magical macro to act as a "jump to next function in the list". We also use one register (ESI) to act as a kind of instruction pointer, pointing to the next function in the list. */@ ; Macros that deal with the return stack PUSH_RSP MACRO reg lea rbp, [rbp - 8] ; push reg on to return stack mov [rbp], reg ENDM POP_RSP MACRO reg mov reg, [rbp] ; pop top of return stack to reg lea rbp, [rbp + 8] ENDM .code set_up_data_segment PROC ; For now, just return - Windows handles memory differently than Linux ret set_up_data_segment ENDP ; DOCOL - the interpreter! .code ALIGN 8 DOCOL: PUSH_RSP rsi ; push rsi on to the return stack add rax, 8 ; rax points to codeword, so make mov rsi, rax ; rsi point to first data word NEXT .code PUBLIC main main: cld mov var_S0, rsp ; Save the initial data stack pointer in FORTH variable S0. mov rbp, OFFSET return_stack_top ; Initialise the return stack. call set_up_data_segment mov rsi, OFFSET cold_start ; Initialise interpreter. NEXT ; Run interpreter! .const cold_start: ; High-level code without a codeword. dq QUIT ; Flags - these are discussed later. F_IMMED equ 80h F_HIDDEN equ 20h F_LENMASK equ 1fh ; length mask ; Store the chain of links. link = 0 def_word MACRO name, namelen, flags:=<0>, label .const ALIGN 8 PUBLIC name_&label name_&label: dq link ; link link = name_&label db flags + namelen ; flags + length byte db "&name" ; the name ALIGN 8 ; padding to next 8 byte boundary PUBLIC label label: dq DOCOL ; codeword - the interpreter ; list of word pointers follow ENDM def_code MACRO name, namelen, flags:=<0>, label .const ALIGN 8 PUBLIC name_&label&_WORD name_&label&_WORD: dq link ; 64-bit link pointer link = name_&label&_WORD ; Update link to current word db flags + namelen ; Flags + length byte db name ; The name (string literal) ALIGN 8 ; Padding to next 8-byte boundary PUBLIC &label&_WORD &label&_WORD: dq code_&label&_WORD ; 64-bit codeword pointer .code ALIGN 8 PUBLIC code_&label&_WORD code_&label&_WORD: ; Assembler code follows ENDM ; Now some easy FORTH primitives. These are written in assembly for speed. ; drop top of stack def_code "DROP", 4, , DROP pop rax NEXT ; Swap two elements on stack def_code "SWAP", 4, , SWAP pop rax pop rbx push rax push rbx NEXT ; duplicate top of stack def_code "DUP", 3, , DUP mov rax, [rsp] push rax NEXT ; get the second element of the stack and push it on top def_code "OVER", 4, , OVER mov rax, [rsp + 8] ; get the second element of stack push rax ; and push it on top NEXT def_code "ROT", 3, , ROT pop rax pop rbx pop rcx push rbx push rax push rcx NEXT def_code "-ROT", 4, , NROT pop rax pop rbx pop rcx push rax push rcx push rbx NEXT ; drop top two elements of stack def_code "2DROP", 5, , TWODROP pop rax pop rax NEXT ; duplicate top two elements of stack def_code "2DUP", 4, , TWODUP mov rax, [rsp] mov rbx, [rsp + 8] push rbx push rax NEXT ; swap top two pairs of elements of stack def_code "2SWAP", 5, , TWOSWAP pop rax pop rbx pop rcx pop rdx push rbx push rax push rdx push rcx NEXT ; duplicate top of stack if non-zero def_code "?DUP", 4, , QDUP mov rax, [rsp] test rax, rax jz @F push rax @@: NEXT ; increment top of stack def_code "1+", 2, , INCR inc qword ptr [rsp] NEXT ; decrement top of stack def_code "1-", 2, , DECR dec qword ptr [rsp] NEXT ; add 4 to top of stack def_code "4+", 2, , INCR4 add qword ptr [rsp], 4 NEXT ; subtract 4 from top of stack def_code "4-", 2, , DECR4 sub qword ptr [rsp], 4 NEXT ; get top of stack ; and add it to next word on stack def_code "+", 1, , ADD pop rax add [rsp], rax NEXT ; get top of stack ; and subtract it from next word on stack def_code "-", 1, , SUB pop rax sub [rsp], rax NEXT ; ignore overflow def_code "*", 1, , MUL pop rax pop rbx imul rax, rbx push rax NEXT COMMENT @/* In this FORTH, only /MOD is primitive. Later we will define the / and MOD words in terms of the primitive /MOD. The design of the i386 assembly instruction idiv which leaves both quotient and remainder makes this the obvious choice. */@ def_code "/MOD", 4, , DIVMOD xor rdx, rdx pop rbx pop rax idiv rbx push rdx ; push remainder push rax ; push quotient NEXT COMMENT @/* Lots of comparison operations like =, <, >, etc.. ANS FORTH says that the comparison words should return all (binary) 1's for TRUE and all 0's for FALSE. However this is a bit of a strange convention so this FORTH breaks it and returns the more normal (for C programmers ...) 1 meaning TRUE and 0 meaning FALSE. */@ ; top two words are equal? def_code "=", 1, , EQU pop rax pop rbx cmp rbx, rax sete al movzx rax, al push rax NEXT ; top two words are not equal? def_code "<>", 2, , NEQU pop rax pop rbx cmp rbx, rax setne al movzx rax, al push rax NEXT def_code "<", 1, , LT pop rax pop rbx cmp rbx, rax setl al movzx rax, al push rax NEXT def_code ">", 1, , GT pop rax pop rbx cmp rbx, rax setg al movzx rax, al push rax NEXT def_code "<=", 2, , LE pop rax pop rbx cmp rbx, rax setle al movzx rax, al push rax NEXT def_code ">=", 2, , GE pop rax pop rbx cmp rbx, rax setge al movzx rax, al push rax NEXT ; top of stack equals 0? def_code "0=", 2, , ZEQU pop rax test rax, rax setz al movzx rax, al push rax NEXT ; top of stack not 0? def_code "0<>", 3, , ZNEQU pop rax test rax, rax setnz al movzx rax, al push rax NEXT ; comparisons with 0 def_code "0<", 2, , ZLT pop rax test rax, rax setl al movzx rax, al push rax NEXT def_code "0>", 2, , ZGT pop rax test rax, rax setg al movzx rax, al push rax NEXT def_code "0<=", 3, , ZLE pop rax test rax, rax setle al movzx rax, al push rax NEXT def_code "0>=", 3, , ZGE pop rax test rax, rax setge al movzx rax, al push rax NEXT ; bitwise AND def_code "AND", 3, , AND pop rax and [rsp], rax NEXT ; bitwise OR def_code "OR", 2, , OR pop rax or [rsp], rax NEXT ; bitwise XOR def_code "XOR", 3, , XOR pop rax xor [rsp], rax NEXT ; this is the FORTH bitwise "NOT" function def_code "INVERT", 6, , INVERT not qword ptr [rsp] NEXT COMMENT @/* RETURNING FROM FORTH WORDS ---------------------------------------------------------------------- Time to talk about what happens when we EXIT a function. In this diagram QUADRUPLE has called DOUBLE, and DOUBLE is about to exit (look at where %esi is pointing): QUADRUPLE +------------------+ | codeword | +------------------+ DOUBLE | addr of DOUBLE ---------------> +------------------+ +------------------+ | codeword | | addr of DOUBLE | +------------------+ +------------------+ | addr of DUP | | addr of EXIT | +------------------+ +------------------+ | addr of + | +------------------+ %esi -> | addr of EXIT | +------------------+ What happens when the + function does NEXT? Well, the following code is executed. */@ ; pop return stack into rsi def_code "EXIT", 4, , EXIT POP_RSP rsi NEXT COMMENT @/* EXIT gets the old %esi which we saved from before on the return stack, and puts it in %esi. So after this (but just before NEXT) we get: QUADRUPLE +------------------+ | codeword | +------------------+ DOUBLE | addr of DOUBLE ---------------> +------------------+ +------------------+ | codeword | %esi -> | addr of DOUBLE | +------------------+ +------------------+ | addr of DUP | | addr of EXIT | +------------------+ +------------------+ | addr of + | +------------------+ | addr of EXIT | +------------------+ And NEXT just completes the job by, well, in this case just by calling DOUBLE again :-) LITERALS ---------------------------------------------------------------------- The final point I "glossed over" before was how to deal with functions that do anything apart from calling other functions. For example, suppose that DOUBLE was defined like this: : DOUBLE 2 * ; It does the same thing, but how do we compile it since it contains the literal 2? One way would be to have a function called "2" (which you'd have to write in assembler), but you'd need a function for every single literal that you wanted to use. FORTH solves this by compiling the function using a special word called LIT: +---------------------------+-------+-------+-------+-------+-------+ | (usual header of DOUBLE) | DOCOL | LIT | 2 | * | EXIT | +---------------------------+-------+-------+-------+-------+-------+ LIT is executed in the normal way, but what it does next is definitely not normal. It looks at %esi (which now points to the number 2), grabs it, pushes it on the stack, then manipulates %esi in order to skip the number as if it had never been there. What's neat is that the whole grab/manipulate can be done using a single byte single i386 instruction, our old friend LODSL. Rather than me drawing more ASCII-art diagrams, see if you can find out how LIT works: */@ def_code "LIT", 3, , LIT ; rsi points to the next command, but in this case it points to the next ; literal 64 bit integer. Get that literal into rax and increment rsi. lodsq push rax ; push the literal number on to stack NEXT COMMENT @/* MEMORY ---------------------------------------------------------------------- An important point about FORTH is that it gives you direct access to the lowest levels of the machine. Manipulating memory directly is done frequently in FORTH, and these are the primitive words for doing it. */@ def_code "!", 1, , STORE pop rbx ; address to store at pop rax ; data to store there mov [rbx], rax ; store it NEXT def_code "@", 1, , FETCH pop rbx ; address to fetch mov rax, [rbx] ; fetch it push rax ; push value onto stack NEXT def_code "+!", 2, , ADDSTORE pop rbx ; address pop rax ; the amount to add add [rbx], rax ; add it NEXT def_code "-!", 2, , SUBSTORE pop rbx ; address pop rax ; the amount to subtract sub [rbx], rax ; subtract it NEXT COMMENT $/* ! and @ (STORE and FETCH) store 32-bit words. It's also useful to be able to read and write bytes so we also define standard words C@ and C!. Byte-oriented operations only work on architectures which permit them (i386 is one of those). */$ def_code "C!", 2, , STOREBYTE pop rbx ; address to store at pop rax ; data to store there mov [rbx], al ; store it NEXT def_code "C@", 2, , FETCHBYTE pop rbx ; address to fetch xor rax, rax mov al, [rbx] ; fetch it push rax ; push value onto stack NEXT ; C@C! is a useful byte copy primitive. def_code "C@C!", 4, , CCOPY mov rbx, [rsp + 8] ; source address mov al, [rbx] ; get source character pop rdi ; destination address stosb ; copy to destination push rdi ; increment destination address inc qword ptr [rsp + 8] ; increment source address NEXT ; and CMOVE is a block copy operation. def_code "CMOVE", 5, , CMOVE mov rdx, rsi ; preserve rsi pop rcx ; length pop rdi ; destination address pop rsi ; source address rep movsb ; copy source to destination mov rsi, rdx ; restore rsi NEXT COMMENT $/* BUILT-IN VARIABLES ---------------------------------------------------------------------- These are some built-in variables and related standard FORTH words. Of these, the only one that we have discussed so far was LATEST, which points to the last (most recently defined) word in the FORTH dictionary. LATEST is also a FORTH word which pushes the address of LATEST (the variable) on to the stack, so you can read or write it using @ and ! operators. For example, to print the current value of LATEST (and this can apply to any FORTH variable) you would do: LATEST @ . CR To make defining variables shorter, I'm using a macro called defvar, similar to def_word and def_code above. (In fact the defvar macro uses def_code to do the dictionary header). */$ def_var MACRO name, namelen, flags:=<0>, label, initial:=<0> def_code name, namelen, flags, label push OFFSET var_&label NEXT ; Don't switch sections here - define the variable data separately ENDM COMMENT @/* The built-in variables are: STATE Is the interpreter executing code (0) or compiling a word (non-zero)? LATEST Points to the latest (most recently defined) word in the dictionary. HERE Points to the next free byte of memory. When compiling, compiled words go here. S0 Stores the address of the top of the parameter stack. BASE The current base for printing and reading numbers. */@ def_var "STATE", 5, , STATE def_var "HERE", 4, , HERE def_var "LATEST", 6, , LATEST, 0 ; SYSCALL0 must be last in built-in dictionary def_var "S0", 2, , SZ def_var "BASE", 4, , BASE, 10 COMMENT @/* BUILT-IN CONSTANTS ---------------------------------------------------------------------- It's also useful to expose a few constants to FORTH. When the word is executed it pushes a constant value on the stack. The built-in constants are: VERSION Is the current version of this FORTH. R0 The address of the top of the return stack. DOCOL Pointer to DOCOL. F_IMMED The IMMEDIATE flag's actual value. F_HIDDEN The HIDDEN flag's actual value. F_LENMASK The length mask in the flags/len byte. SYS_* and the numeric codes of various Linux syscalls (from ) */@ ; Define def_const macro ; Macros (assuming NEXT and defcode are already defined) def_const MACRO name, namelen, flags:=<0>, label, value def_code name, namelen, flags, label push value ; Just push the raw value NEXT ENDM ; FORTH system constants ; FORTH system constants def_const "VERSION", 7,, VERSION, TBF_FORTH_VERSION def_const "R0",2,,RZ,OFFSET return_stack_top def_const "DOCOL",5,,__DOCOL,OFFSET DOCOL def_const "F_IMMED", 7, , __F_IMMED, F_IMMED def_const "F_HIDDEN", 8, , __F_HIDDEN, F_HIDDEN def_const "F_LENMASK", 9, , __F_LENMASK, F_LENMASK ; Windows-specific constants def_const "EXIT_SUCCESS", 11, , EXIT_SUCCESS, 0 def_const "EXIT_FAILURE", 11, , EXIT_FAILURE, 1 ; Windows file access modes def_const "GENERIC_READ", 11, , GENERIC_READ, 80000000h def_const "GENERIC_WRITE", 12, , GENERIC_WRITE, 40000000h def_const "GENERIC_READ_WRITE", 16, , GENERIC_READ_WRITE, 0C0000000h ; Windows file creation flags def_const "CREATE_NEW", 10, , CREATE_NEW, 1 def_const "CREATE_ALWAYS", 12, , CREATE_ALWAYS, 2 def_const "OPEN_EXISTING", 12, , OPEN_EXISTING, 3 def_const "OPEN_ALWAYS", 10, , OPEN_ALWAYS, 4 def_const "TRUNCATE_EXISTING", 15, , TRUNCATE_EXISTING, 5 mainCRTStartup proc mainCRTStartup endp end