forth_bootslop/references/ForthNotes/libforth.md

# libforth

**Source:** https://github.com/howerj/libforth

GitHub - howerj/libforth: libforth: A small Forth interpreter that can be used as a library written in c99


[Skip to content](#start-of-content)


## Navigation Menu

Toggle navigation

[Sign in](/login?return_to=https%3A%2F%2Fgithub.com%2Fhowerj%2Flibforth)

Appearance settings

* Platform

  + AI CODE CREATION
    - [GitHub CopilotWrite better code with AI](https://github.com/features/copilot)
    - [GitHub SparkBuild and deploy intelligent apps](https://github.com/features/spark)
    - [GitHub ModelsManage and compare prompts](https://github.com/features/models)
    - [MCP RegistryNewIntegrate external tools](https://github.com/mcp)
  + DEVELOPER WORKFLOWS
    - [ActionsAutomate any workflow](https://github.com/features/actions)
    - [CodespacesInstant dev environments](https://github.com/features/codespaces)
    - [IssuesPlan and track work](https://github.com/features/issues)
    - [Code ReviewManage code changes](https://github.com/features/code-review)
  + APPLICATION SECURITY
    - [GitHub Advanced SecurityFind and fix vulnerabilities](https://github.com/security/advanced-security)
    - [Code securitySecure your code as you build](https://github.com/security/advanced-security/code-security)
    - [Secret protectionStop leaks before they start](https://github.com/security/advanced-security/secret-protection)
  + EXPLORE
    - [Why GitHub](https://github.com/why-github)
    - [Documentation](https://docs.github.com)
    - [Blog](https://github.blog)
    - [Changelog](https://github.blog/changelog)
    - [Marketplace](https://github.com/marketplace)

  [View all features](https://github.com/features)
* Solutions

  + BY COMPANY SIZE
    - [Enterprises](https://github.com/enterprise)
    - [Small and medium teams](https://github.com/team)
    - [Startups](https://github.com/enterprise/startups)
    - [Nonprofits](https://github.com/solutions/industry/nonprofits)
  + BY USE CASE
    - [App Modernization](https://github.com/solutions/use-case/app-modernization)
    - [DevSecOps](https://github.com/solutions/use-case/devsecops)
    - [DevOps](https://github.com/solutions/use-case/devops)
    - [CI/CD](https://github.com/solutions/use-case/ci-cd)
    - [View all use cases](https://github.com/solutions/use-case)
  + BY INDUSTRY
    - [Healthcare](https://github.com/solutions/industry/healthcare)
    - [Financial services](https://github.com/solutions/industry/financial-services)
    - [Manufacturing](https://github.com/solutions/industry/manufacturing)
    - [Government](https://github.com/solutions/industry/government)
    - [View all industries](https://github.com/solutions/industry)

  [View all solutions](https://github.com/solutions)
* Resources

  + EXPLORE BY TOPIC
    - [AI](https://github.com/resources/articles?topic=ai)
    - [Software Development](https://github.com/resources/articles?topic=software-development)
    - [DevOps](https://github.com/resources/articles?topic=devops)
    - [Security](https://github.com/resources/articles?topic=security)
    - [View all topics](https://github.com/resources/articles)
  + EXPLORE BY TYPE
    - [Customer stories](https://github.com/customer-stories)
    - [Events & webinars](https://github.com/resources/events)
    - [Ebooks & reports](https://github.com/resources/whitepapers)
    - [Business insights](https://github.com/solutions/executive-insights)
    - [GitHub Skills](https://skills.github.com)
  + SUPPORT & SERVICES
    - [Documentation](https://docs.github.com)
    - [Customer support](https://support.github.com)
    - [Community forum](https://github.com/orgs/community/discussions)
    - [Trust center](https://github.com/trust-center)
    - [Partners](https://github.com/partners)
* Open Source

  + COMMUNITY
    - [GitHub SponsorsFund open source developers](https://github.com/sponsors)
  + PROGRAMS
    - [Security Lab](https://securitylab.github.com)
    - [Maintainer Community](https://maintainers.github.com)
    - [Accelerator](https://github.com/accelerator)
    - [Archive Program](https://archiveprogram.github.com)
  + REPOSITORIES
    - [Topics](https://github.com/topics)
    - [Trending](https://github.com/trending)
    - [Collections](https://github.com/collections)
* Enterprise

  + ENTERPRISE SOLUTIONS
    - [Enterprise platformAI-powered developer platform](https://github.com/enterprise)
  + AVAILABLE ADD-ONS
    - [GitHub Advanced SecurityEnterprise-grade security features](https://github.com/security/advanced-security)
    - [Copilot for BusinessEnterprise-grade AI features](https://github.com/features/copilot/copilot-business)
    - [Premium SupportEnterprise-grade 24/7 support](https://github.com/premium-support)
* [Pricing](https://github.com/pricing)

Search or jump to...


# Search code, repositories, users, issues, pull requests...

Search

Clear

[Search syntax tips](https://docs.github.com/search-github/github-code-search/understanding-github-code-search-syntax)

# Provide feedback

We read every piece of feedback, and take your input very seriously.


Include my email address so I can be contacted

Cancel
 Submit feedback


# Saved searches

## Use saved searches to filter your results more quickly

Name

Query

To see all available qualifiers, see our [documentation](https://docs.github.com/search-github/github-code-search/understanding-github-code-search-syntax).

Cancel
 Create saved search

[Sign in](/login?return_to=https%3A%2F%2Fgithub.com%2Fhowerj%2Flibforth)

[Sign up](/signup?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2F%3Cuser-name%3E%2F%3Crepo-name%3E&source=header-repo&source_repo=howerj%2Flibforth)

Appearance settings

Resetting focus

You signed in with another tab or window. Reload to refresh your session.
You signed out in another tab or window. Reload to refresh your session.
You switched accounts on another tab or window. Reload to refresh your session.


Dismiss alert

{{ message }}

[howerj](/howerj)
/
**[libforth](/howerj/libforth)**
Public

* [Notifications](/login?return_to=%2Fhowerj%2Flibforth) You must be signed in to change notification settings
* [Fork
  31](/login?return_to=%2Fhowerj%2Flibforth)
* [Star
   200](/login?return_to=%2Fhowerj%2Flibforth)

libforth: A small Forth interpreter that can be used as a library written in c99

### License

[MIT license](/howerj/libforth/blob/master/LICENSE)

[200
stars](/howerj/libforth/stargazers) [31
forks](/howerj/libforth/forks) [Branches](/howerj/libforth/branches) [Tags](/howerj/libforth/tags) [Activity](/howerj/libforth/activity)

[Star](/login?return_to=%2Fhowerj%2Flibforth)

[Notifications](/login?return_to=%2Fhowerj%2Flibforth) You must be signed in to change notification settings

* [Code](/howerj/libforth)
* [Issues
  0](/howerj/libforth/issues)
* [Pull requests
  0](/howerj/libforth/pulls)
* [Actions](/howerj/libforth/actions)
* [Projects
  0](/howerj/libforth/projects)
* [Wiki](/howerj/libforth/wiki)
* [Security
  0](/howerj/libforth/security)
* [Insights](/howerj/libforth/pulse)

Additional navigation options


* [Code](/howerj/libforth)
* [Issues](/howerj/libforth/issues)
* [Pull requests](/howerj/libforth/pulls)
* [Actions](/howerj/libforth/actions)
* [Projects](/howerj/libforth/projects)
* [Wiki](/howerj/libforth/wiki)
* [Security](/howerj/libforth/security)
* [Insights](/howerj/libforth/pulse)

# howerj/libforth

master

[Branches](/howerj/libforth/branches)[Tags](/howerj/libforth/tags)

Go to file

Code

Open more actions menu

## Folders and files

| Name | | Name | Last commit message | Last commit date |
| --- | --- | --- | --- | --- |
| Latest commit   History[350 Commits](/howerj/libforth/commits/master/)   350 Commits | | |
| [fth](/howerj/libforth/tree/master/fth "fth") | | [fth](/howerj/libforth/tree/master/fth "fth") |  |  |
| [libline @ 465e8a5](/howerj/libline/tree/465e8a59a0bfbef625d44ba250cccc60e8fea3be "libline") | | [libline @ 465e8a5](/howerj/libline/tree/465e8a59a0bfbef625d44ba250cccc60e8fea3be "libline") |  |  |
| [.gitignore](/howerj/libforth/blob/master/.gitignore ".gitignore") | | [.gitignore](/howerj/libforth/blob/master/.gitignore ".gitignore") |  |  |
| [.gitmodules](/howerj/libforth/blob/master/.gitmodules ".gitmodules") | | [.gitmodules](/howerj/libforth/blob/master/.gitmodules ".gitmodules") |  |  |
| [.travis.yml](/howerj/libforth/blob/master/.travis.yml ".travis.yml") | | [.travis.yml](/howerj/libforth/blob/master/.travis.yml ".travis.yml") |  |  |
| [LICENSE](/howerj/libforth/blob/master/LICENSE "LICENSE") | | [LICENSE](/howerj/libforth/blob/master/LICENSE "LICENSE") |  |  |
| [convert](/howerj/libforth/blob/master/convert "convert") | | [convert](/howerj/libforth/blob/master/convert "convert") |  |  |
| [dpans94.txt](/howerj/libforth/blob/master/dpans94.txt "dpans94.txt") | | [dpans94.txt](/howerj/libforth/blob/master/dpans94.txt "dpans94.txt") |  |  |
| [editor.fth](/howerj/libforth/blob/master/editor.fth "editor.fth") | | [editor.fth](/howerj/libforth/blob/master/editor.fth "editor.fth") |  |  |
| [forth.fth](/howerj/libforth/blob/master/forth.fth "forth.fth") | | [forth.fth](/howerj/libforth/blob/master/forth.fth "forth.fth") |  |  |
| [length](/howerj/libforth/blob/master/length "length") | | [length](/howerj/libforth/blob/master/length "length") |  |  |
| [libforth.c](/howerj/libforth/blob/master/libforth.c "libforth.c") | | [libforth.c](/howerj/libforth/blob/master/libforth.c "libforth.c") |  |  |
| [libforth.h](/howerj/libforth/blob/master/libforth.h "libforth.h") | | [libforth.h](/howerj/libforth/blob/master/libforth.h "libforth.h") |  |  |
| [main.c](/howerj/libforth/blob/master/main.c "main.c") | | [main.c](/howerj/libforth/blob/master/main.c "main.c") |  |  |
| [makefile](/howerj/libforth/blob/master/makefile "makefile") | | [makefile](/howerj/libforth/blob/master/makefile "makefile") |  |  |
| [readme.md](/howerj/libforth/blob/master/readme.md "readme.md") | | [readme.md](/howerj/libforth/blob/master/readme.md "readme.md") |  |  |
| [unit.c](/howerj/libforth/blob/master/unit.c "unit.c") | | [unit.c](/howerj/libforth/blob/master/unit.c "unit.c") |  |  |
| [unit.fth](/howerj/libforth/blob/master/unit.fth "unit.fth") | | [unit.fth](/howerj/libforth/blob/master/unit.fth "unit.fth") |  |  |
| [unit.h](/howerj/libforth/blob/master/unit.h "unit.h") | | [unit.h](/howerj/libforth/blob/master/unit.h "unit.h") |  |  |
| View all files | | |

## Repository files navigation

* [README](#)
* [MIT license](#)

% FORTH(1)
% Richard Howe
% November 2016

# NAME

forth - a forth interpreter

# SYNOPSIS

**forth** [**-s** file] [**-e** string] [**-l** file] [**-m** size] [**-VthvLSnx**] [**-**] [**files**]

# DESCRIPTION

A Forth interpreter built around a library, libforth, that implements a
complete Forth interpreter.

For a much better, albeit 16-bit only, Forth that is also very well documented
and (slightly more) standards compliant, please visit the more up to date
[Embed](https://github.com/howerj/embed) virtual machine. Also of interest is
the [Forth CPU](https://github.com/howerj/forth-cpu) I wrote, in VHDL, for an
FPGA.

This interpreter is available at [here](https://github.com/howerj/libforth).

[![Libforth build status](https://camo.githubusercontent.com/2fc05cd9c4e36dbe5aafa595c13716d1ef247c3965e6ecfc83e4811e0a604359/68747470733a2f2f7472617669732d63692e6f72672f686f7765726a2f6c6962666f7274682e7376673f6272616e63683d6d6173746572 "Build Status")](https://camo.githubusercontent.com/2fc05cd9c4e36dbe5aafa595c13716d1ef247c3965e6ecfc83e4811e0a604359/68747470733a2f2f7472617669732d63692e6f72672f686f7765726a2f6c6962666f7274682e7376673f6272616e63683d6d6173746572)

# OPTIONS

Command line switches must be given before any files, unless that switch takes
a file as an argument.

* -s file

This saves the working memory of the Forth interpreter to a file,
which can later be loaded with the "-l" option. If a core file has been
invalidated this will not be saved, invalidation occurs when an unrecoverable
error has been detected that would prevent any recovery or meaningful
execution with the current image.

* -e string

Evaluate a Forth string passed in as an argument.

* -t

After all the files have been read from and any core files have been loaded
this will make the Forth interpreter read from [stdin](https://en.wikipedia.org/wiki/Standard_streams), the core file will be
saved after [stdin](https://en.wikipedia.org/wiki/Standard_streams) has been read from and there is no more work to do, if
the "-d" or "-s" flags have been specified.

* -h

Print out a short help message and exit unsuccessfully.

* -v

Turn verbose mode on, more information will be printed out, to [stderr](https://en.wikipedia.org/wiki/Standard_streams), about
what is happening in the interpreter. Usually the interpreter is as silent as
possible.

* -m size

Specify the virtual machines memory size in kilobytes, overriding the default
memory size. This is mutually exclusive with "-l".

* -l file

This option loads a forth core file generated from the "-d" option of a
previous run. This core file is not portable and must be generated on the same
platform as it was generated. It can only be specified once per run of the
interpreter.

* -L

The same as "-l", however the default core file name is used, "forth.core", so
an argument does not have to be provided.

* -S

The same as "-s", however the default core file name is used, "forth.core", so
an argument does not have to be provided.

* '-'

Stop processing any more command line options and treat all arguments after
this as files to be executed, if there are any.

* -V

Print version and interpreter information and exit successfully.

* -f file

Process a file immediately. This allows options and file arguments to be
intermingled.

* -n

If the line editing library is compiled into the executable, which is a compile
time option, then when reading from [stdin](https://en.wikipedia.org/wiki/Standard_streams) this will use a [line editor](https://github.com/howerj/libline)
to read in a line at a time. This option implies *-t*.

* -x

Enable signal handling, signal handling is experimental and not quite worked
out yet. It is off by default as I find it annoying when programs catch signals
when you really want to program to *die*. This cannot be enabled from within
the Forth interpreter. This option should disappear once signal handling has
been sorted out.

* file...

If a file, or list of files, is given, read from them one after another
and execute them. The dictionary and any stored Forth blocks will persist, as
will values on the stack.

If no files are given to execute [stdin](https://en.wikipedia.org/wiki/Standard_streams) will be read from.

## EXAMPLES

```
./forth
```

Execute any commands given from [stdin](https://en.wikipedia.org/wiki/Standard_streams)

```
./forth -t file1.4th file2.4th
```

Execute file "file1.4th", then "file2.4th", then read from [stdin](https://en.wikipedia.org/wiki/Standard_streams)

```
./forth file1.4th
```

Execute file "file1.4th".

```
./forth -s file1.4th
```

Execute file "file1.4th", the produce a "forth.core" save file.

```
./forth -s -l forth.core
```

Load a "forth.core" file, read from [stdin](https://en.wikipedia.org/wiki/Standard_streams) and execute any commands given,
then dump the new core file to "forth.core".

The interpreter returns zero on success and non zero on failure.

# LICENSE

The Forth interpreter and the library that implements it are released under the
[MIT](https://opensource.org/licenses/MIT) license. Copyright (c) Richard Howe, 2016.

# EXIT STATUS

This program will return a non-zero value on failure, and zero on success.

# SEE ALSO

libforth(3)

# BUGS

If you find a bug, or would like to request a new feature, please Email me at:

```
    howe.r.j.89 [ at ] gmail . com
```

The interpreter has not been battle hardened yet so there is likely behavior
that is non-standard (for no reason) or just outright incorrect.

# MANUAL

This small [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) interpreter is based on a de-obfuscated entrant
into the [IOCCC](http://ioccc.org/winners.html) by *buzzard*. The entry described a [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29)
like language which this derives from. You can use this library to
evaluate [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) strings or as an embeddable interpreter. Work
would need to be done to get useful information after doing those
evaluations, but the library works quite well.

*main.c* is simply a wrapper around one the functions that implements
a simple [REPL](https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop).

This project implements a [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) interpreter library which can be embedded
in other projects, it is incredibly minimalistic, but usable. To build the
project a [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) compiler is needed, and a copy of [Make](https://en.wikipedia.org/wiki/Make_%28software%29), type:

```
make help
```

For a list of build options. By running:

```
make run
```

Will build the interpreter and run it, it will then read from [stdin](https://en.wikipedia.org/wiki/Standard_streams).

To build the documentation other programs may be needed, such as [pandoc](http://pandoc.org/) and
the [markdown script](https://daringfireball.net/projects/markdown/), but these steps are optional.

[Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) is an odd language that has a loyal following groups, but it
is admittedly not the most practical of language as it lacks nearly everything
the modern programmer wants in a language; safety, garbage collection,
modularity and clarity. It is however possible to implement a fully working
interpreter in a one to two kilobytes of assembly, those kilobytes can make a
functional and interactive programming environment, giving a high ratio of utility
memory used.

From the [Wikipedia](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) article we can neatly summarize the language:

```
    "Forth is an imperative stack-based computer programming language
    and programming environment.

    Language features include structured programming, reflection (the
    ability to modify the program structure during program execution),
    concatenative programming (functions are composed with juxtaposition)
    and extensibility (the programmer can create new commands).

    ...

    A procedural programming language without type checking, Forth features
    both interactive execution of commands (making it suitable as a shell
    for systems that lack a more formal operating system) and the ability
    to compile sequences of commands for later execution."
```

Given the nature of the [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) language it does not make for a terribly good
embeddable scripting language, but it is simple to implement and can be fun
to use. This interpreter is based off a previous [IOCCC](http://ioccc.org/winners.html) in a file called
[buzzard.2.c](http://www.ioccc.org/1992/buzzard.2.c), it is a descendant of that file.

Before using and understanding this library/interpreter it is useful to checkout
more literature on [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) such as [Thinking Forth](http://thinking-forth.sourceforge.net/) by Leo Brodie for a
philosophy of the language, [Starting Forth](http://www.forth.com/starting-forth/) (same Author), [Jonesforth](https://rwmj.wordpress.com/2010/08/07/jonesforth-git-repository/)
which is a specific implementation of the language in x86 assembly and
[Gforth](https://www.gnu.org/software/gforth/), a more modern and portable implementation of the language.

It is important to realize that [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) is really more a philosophy and
collection of ideas than a specific reference implementation or standard.
It has been said that an intermediate [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) user is one who has implemented
a [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) interpreter, something which cannot be said about other languages
nor is possible given their complexity.

The saying "if you have seen one Forth implementation, you have seen one Forth
implementation" applies, nearly every single [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) implementation has its
own idea of how to go about things despite standardization efforts - in keeping
with this, this library has its own idiosyncrasies.

This implementation, written in [C](https://en.wikipedia.org/wiki/C_%28programming_language%29), can be thought of as a hybrid between a
fairly dumb stack based virtual machine with instructions such as "pop two
values off the stack, add them, and push the result" and a small
interpreter/compiler for the virtual machine. This simple kernel is then used
to build a more compliant and usable [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) implementation by defining
words that build upon those provided by the base system.

## Other documentation

Apart from this file there are other sources of information about the
project:

As can the code, which is small enough to be comprehensible:

* [libforth.c](/howerj/libforth/blob/master/libforth.c) (contains the core interpreter)
* [libforth.h](/howerj/libforth/blob/master/libforth.h) (contains the API documentation)

And the forth startup code:

* [forth.fth](/howerj/libforth/blob/master/forth.fth)

The startup code is well commented and shows how the core interpreter is
extended to a more function [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) environment.

The source file [libforth.c](/howerj/libforth/blob/master/libforth.c) can be converted to a more readable webpage by
first converting the source to [markdown](https://daringfireball.net/projects/markdown/) with [convert](/howerj/libforth/blob/master/convert) script, the converting
that to HTML in the usual fashion

## Using the interpreter

*main.c* simple calls the function *main\_forth()* in *libforth.c*, this function
initializes a [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) environment and puts the user in a [REPL](https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop) where you
can issue commands and define words. See the manual pages for list of command
line options and library calls. All commands are given using
[Reverse Polish Notation](https://en.wikipedia.org/wiki/Reverse_Polish_notation) (or RPN),

So:

```
    2+(2*4)
```

Becomes:

```
    4 2 * 2 +
```

And brackets are no longer needed. Numbers of pushed on to the variable
stack automatically and commands (such as '\*' and '+') take their operands
off the stack and push the result. Juggling variables on the stack becomes
easier over time. To pop a value from the stack and print it there is the
'.' word.

So:

```
    2 2 + .
```

Prints:

```
    4
```

The simplicity of the language allows for a small interpreter, the
loop looks something like this:

```
    1) Read in a space delimited Forth WORD.
    2) Is this WORD in the dictionary?
       FOUND)      Are we in IMMEDIATE mode?
                   IMMEDIATE-MODE) Execute WORD.
                                   goto 1;
                   COMPILE-MODE)   Compile WORD into the dictionary.
                                   goto 1;
       NOT-FOUND)  Is this actually a number?
                   YES) Are we in IMMEDIATE mode?
                        IMMEDIATE-MODE) Push Number onto the stack.
                                        goto 1;
                        COMPILE-MODE)   Compile a literal number.
                                        goto 1;
                   NO)  Error! Handle error
                        goto 1;
```

Given that we are reading in *space delimited words* if follows that the
above expression:

```
    2 2 + .
```

Would not work if we did:

```
    2 2+ .
```

Or:

```
    2 2 +.
```

As "2+" and "+." would be parsed as words, which may or may not be defined
and if they are do not have the behavior that we want. This is more apparent
when we do any kind of string handling.

## A Forth Word

The Forth execution model uses [Threaded Code](https://en.wikipedia.org/wiki/Threaded_code), the layout of a word
header follows from this.

A [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) word is defined in the dictionary and has a particular format that
varies between implementations. A dictionary is simply a linked list of
[Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) words, the dictionary is usually contiguous and can only grow. The
format for our [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) words is as follows:

Briefly:

* Word Header:
* field <0 = Word Name (the name is stored before the main header)
* field 0 = Previous Word
* field 1 = Code Word (bits 0 - 7) | Hidden Flag (bit 8) | Word Name Offset (bit 9 - 14) | Compiling bit (bit 15)
* field 2+ = Data field (if it exists).

And in more detail:

```
    .----------------------------------------.
    |       Word Header          | Word Body |
    .---------------.-----.------.-----------.
    | NAME ...      | PWD | MISC | DATA ...  |
    .---------------.-----.------.-----------.

    ____
    NAME        = The name, or the textual representation, of a Forth
                  word, it is a variable length field that is ASCII NUL
                  terminated, the MISC field has an offset that points
                  to the begining of this field if taken off the PWD
                  position (not value). The offset is in machine words,
                  not characters.
    ___
    PWD         = A pointer to the previously declared word.
    ____
    MISC        = A complex field that can contains a CODE WORD, a
                  "hide" bit and the offset from the PWD field to the
                  beginning of NAME, as well as the compiling bit.
    ____
    DATA        = This could be anything, but it is most likely to be
                  a list of pointers to CODE WORDs of previously defined
                  words if this optional DATA field is present.
```

All fields are aligned on the [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) virtual machines word boundaries.

The MISC field is laid out as so:

```
    .-------------------------------------------------------------------------------.
    | <- Least Significant Bit                              Most Significant Bit -> |
    .-------------------------------------------------------------------------------.
    |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 |
    .-------------------------------------------------------------------------------.
    |            CODE WORD             | HD |          NAME OFFSET             | CB |
    .-------------------------------------------------------------------------------.
    _________
    CODE WORD    = Bits 0-6 are a code word, this code word is always run
                   reguardless of whether we are in compiling or command
                   mode
    __
    HD           = Bit 7 is the Hide Bit, if this is true then when
                   compiling or executing words the word will be hidden from the
                   search.
    ___________
    NAME OFFSET  = Bits 8 to 15 are the offset to the words name. To find the
                   beginning of the words name we take this value away from
                   position of this words PWD header. This value is in
                   machine words, and so the beginning of the NAME must be aligned
                   to the virtual machine words boundaries and not character, or byte,
                   aligned. The length of this field, and the size of the input buffer,
                   limit the maximum size of a word.
__
CB           = Compiling bit, if set this is a compiling word, if
               cleared it is an immediate word.
```

Depending on the virtual machine word size, or cell size, there may be more
bits above bit '15', the most significant bit, in the MISC field. These bits
are not used and should be set to zero.

And the dictionary looks like this:

```
   [ Special 'fake' word ]
              .
             /|\
              |
   .-------.-----.----------------------.
   | NAME  | PWD | Rest of the word ... |
   .-------.-----.----------------------.
              .
             /|\
              |
            ~~~~~
     The rest of the dictionary
            ~~~~~
              |
   .-------.-----.----------------------.
   | NAME  | PWD | Rest of the word ... |
   .-------.-----.----------------------.
              .
             /|\
              |
   .-------.-----.----------------------.
   | NAME  | PWD | Rest of the word ... |
   .-------.-----.----------------------.
              .
             /|\
              |
   [ Previous Word Register ]
```

Searching of the dictionary starts from the *Previous Word Register* and ends
at a special 'fake' word.

Defining words adds them to the dictionary, we can defined words with the ':'
words like this:

```
    : two-times 2 * ;
```

Which defined the word "two-times", a word that takes a value from the stack,
multiplies it by two and pushes the results back onto the stack.

The word ':' performs multiple actions; it is an immediate word that reads in the
next space delimited word from the input stream and creates a header for that
word. It also switches the interpreter into compile mode, compiling words will
be compiled into that word definition instead of being executed, immediate words
are executed as normal. ';' is also an immediate word, it compiles a special
word exit into the dictionary which returns from a word call and switches the
interpreter back into command mode. This type of behavior is typical of
[Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) implementations.

## Memory Map and Special Registers

The way this interpreter works is that is emulates an idealized machine, one
built for executing [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) directly. As such it has to make compromises and
treats certain sectors of memory as being special, as shown below (numbers are
given in *hexadecimal* and are multiples of the virtual machines word-size
which is either 16, 32 or 64 bit depending on compile time options.

Where the dictionary ends and the variable and return stacks begin depends on
how much memory was allocated to the interpreter (with a minimum of 2048
words), the default is 32768 words, and the following diagram assumes this:

```
    .-----------------------------------------------.
    | 0-3F      | 40-7BFF       |7C00-7DFF|7E00-7FFF|
    .-----------------------------------------------.
    | Registers | Dictionary... | V stack | R stack |
    .-----------------------------------------------.

    V stack = The Variable Stack
    R stack = The Return Stack
```

Each may be further divided into special sections:

### Registers

At the beginning of the Forth virtual machine there is a section used for
registers, modifying them arbitrary can cause undefined behavior to occur which
will most likely cause the virtual machine to be terminated.

```
NAME          LOCATION        DESCRIPTION
              DECIMAL  HEX
               0-1      0-1    Unused
               2-5      2-5    Push integer word
DIC            6        6      Dictionary pointer
RSTK           7        7      Return stack pointer
STATE          8        8      Interpreter state; compile/command mode
BASE           9        9      Base conversion variable
PWD            10       A      Pointer to last defined word
SOURCE_ID      11       B      Input source selector (-1 = string input,
                               0 = file input)
SIN            12       C      String input pointer
SIDX           13       D      String input index  (index into SIN)
SLEN           14       E      String input length (length of SIN)
START_ADDR     15       F      Pointer to start of VM
FIN            16       10     File input pointer
FOUT           17       11     File output pointer
STDIN          18       12     File pointer to stdin, if available
STDOUT         19       13     File pointer to stdout, if available
STDERR         20       14     File pointer to stderr, if available
ARGC           21       15     Count of arguments passed to program,
                               if available
ARGV           22       16     An array of pointers to NUL terminated
                               ASCII strings, if available, of ARGC
                               length
DEBUG          23       17     Turn debugging on/off if enabled
INVALID        24       18     If non zero, this interpreter is invalid
TOP            25       19     Stored version of top of stack
INSTRUCTION    26       1A     Stored version of instruction pointer
STACK_SIZE     27       1B     Size of the variable stack
ERROR_HANDLER  28       1C     Action to take on error
THROW          29       1D     Used for throw/catch
SIGNAL_HANDLER 30       1E     Used for signal handling
SCRATCH_X      31       1F     Scratch variable for the user
```

Some registers will need more explaining.

* SIGNAL\_HANDLER

This register is used when a signal is caught, it is up to the C environment to
call *forth\_signal* from a signal handler in the C environment to let the
Forth interpreter know a signal has been caught.

* SCRATCH\_X

Scratch X is a variable that can be used by the user, be warned that other
words defined in the base library *might* use this register as well.

### Dictionary

Apart from the constraints that the dictionary begins after where the
registers are and before where V stack is there are no set demarcations
for each region, although currently the defined word region ends before
0x200 leaving room between that and 0x7BFF for user defined words.

```
    .----------------------------------------------------------------.
    | 40-???            | ???-???          | ???-7BFF                |
    .----------------------------------------------------------------.
    | Special read word | Interpreter word | Defined word ...        |
    .----------------------------------------------------------------.

    Special read word = A word called on entrance to the interpreter,
                        it calls itself recursively (as a tail call). This
                        word cannot be 'found', it does not have a name.
    Interpreter word  = Any named (not 'invisible' ones) interpreter word
                        gets put here.
    Defined word      = A list of words that have been defined with ':'
```

## Glossary of Forth words

Each word is also given with its effect on the variable stack, any other effects
are documented (including the effects on other stacks). Each entry looks like
this:

* word ( y -- z )

Where 'word' is the word being described, the contents between the parenthesis
describe the stack effects, this word expects one number to be one the stack,
'y', and returns a number to the stack 'z'.

Some of the descriptions of the variables in the stack effect comment
have a meaning:

```
.---------.-------------------------------------------------.
|  Name   |                   Meaning                       |
.---------.-------------------------------------------------.
| addr    | An address in cells                             |
| c-addr  | Character address                               |
| r-addr  | A raw address                                   |
| file-id | A file id, used as a handle for file operations |
| ior     | A error status returned by file operations      |
| char    | Single byte / character                         |
| u       | An unsigned value                               |
| x       | A signed value                                  |
| c" xxx" | The word parses a word (not a stack effect)     |
.---------.-------------------------------------------------.
```

The addressing modes complicate certain operations of Forth words. A normal
Forth address, as passed to words like '!', '@', 'c@' and 'c!' are relative to
the beginning of the start of the Forth cores memory, or at register zero. '!'
and '@' take their addresses in cells (*addr*), whereas 'c@' and 'c!' take
their addresses in character address (*c-addr*).

Raw addresses are however normal addresses, as understood by [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) programs
and the computer. Operations on raw addresses can possibly result in undefined
behavior, and all objects they may operate on that exists outside of the Forth
core cannot be guaranteed to exists after a core file has been reloaded (that
is pointers to objects outside of Forth core should not be saved for further
use). Raw-addresses are also in character, not cell, units.

In future revisions of Forth this memory addressing might be simplified, if a
simple way of doing this can be achieved.

### Internal words

There are three types of words.

#### 'Invisible' words

These invisible words have no name but are used to implement the Forth. They
are all *immediate* words.

* push ( -- x )

Push the next value in the instruction stream onto the variable stack, advancing
the instruction stream.

* const ( -- x )

Push the next location after this instruction, not the instruction stream, onto
the variable stack.

* run ( -- )

Save the current instruction stream pointer onto the return stack and set
the pointer instruction stream pointer to point to value after *run*.

#### Immediate words

These words are named and are *immediate* words.

* ':' ( c" xxx" -- )

Read in a new word from the input stream and compile it into the dictionary.

* 'immediate' ( -- )

Make the previously declared word immediate. Unlike in most Forth
implementations this is used after the words name is given not after the
final ';' has been reached.

So:

```
    : word immediate ... ;
```

Instead of:

```
    : word ... ; immediate
```

* '\' ( c" \n" -- )

A comment, ignore everything until the end of the line.

#### Compiling words

* 'read' ( c" xxx" -- )

*read* is a complex word that implements most of the input interpreter,
it reads in a [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) *word* (up to 31 characters), if this *word* is in
the *dictionary* it will either execute the word if we are in *command mode*
or compile a pointer to the executable section of the word if in *compile
mode*. If this *word* is not in the *dictionary* it is checked if it is a
number, if it is then in *command mode* we push this value onto the *variable
stack*, if in *compile mode* then we compile a *literal* into the *dictionary*.
If it is none of these we print an error message and attempt to read in a
new word.

* '@' ( address -- x )

Pop an address and push the value at that address onto the stack.

* '!' ( x address -- )

Given an address and a value, store that value at that address.

* 'c@' ( char-address -- char )

Pop a character address and push the character value at that address onto the
stack. Note that this access is not checked for being within range of the
virtual machines memory, but it is still relative to the start address of
virtual machine memory.

* 'c!' ( char char-address -- )

Given a character address, store a character value at that address, like 'c@'
the address is relative to the virtual machines starting address.

* '-' ( x y -- z )

Pop two values, subtract 'y' from 'x' and push the result onto the stack.

* '+' ( x y -- z )

Pop two values, add 'y' to 'x' and push the result onto the stack.

* 'and' ( x y -- z )

Pop two values, compute the bitwise 'AND' of them and push the result on to
the stack.

* 'or' ( x y -- z )

Pop two values, compute the bitwise 'OR' of them and push the result on to
the stack.

* 'xor' ( x y -- z )

Pop two values, compute the bitwise 'XOR' of them and push the result on to
the stack.

* 'invert' ( x y -- z )

Perform a bitwise negation on the top of the stack.

* 'lshift' ( x y -- z )

Pop two values, compute 'y' shifted by 'x' places to the left and push
the result on to the stack.

* 'rshift' ( x y -- z )

Pop two values, compute 'y' shifted by 'x' places to the right and push
the result on to the stack.

* '\*' ( x y -- z )

Pop two values, multiply them and push the result onto the stack.

* '/' ( x y -- z )

Pop two values, divide 'x' by 'y' and push the result onto the stack. If 'y'
is zero and error message is printed and 'x' and 'y' will remain on the
stack, but execution will continue on as normal.

* 'u<' ( x y -- z )

Pop two unsigned values, compare them (y < x) and push the result onto the
stack, the comparison will be unsigned.

* 'u>' ( x y -- z )

Pop two values, compare them (y > x) and push the result onto the stack. The
comparison will be unsigned.

* '\_exit' ( -- )

Pop the return stack and set the instruction stream pointer to that
value.

* 'exit' ( -- )

This does the same as **\_exit**, the reason there are two exits instead
of one is so that the word 'see', defined in **forth.fth** can differentiate
between an exit that occurs in the definition of a word, and one that occurs
at the end of a words definition. ('see' is a decompiler for Forth).

For example:

```
: test 0 begin dup 10 > if exit then dup . cr 1+  again ;
```

Gets Compiled to:

```
Address        Contents
         ._____._____._____._____. <- Start of Word
  X      | 't' | 'e' | 's' | 't' |
         ._____._____._____._____.
  X+1    |  0  |  0  |  0  |  0  | NUL terminates 'test' string
         ._____._____._____._____.
  X+2    | Previous Word Pointer | AKA 'PWD' field
         ._______________________.
  X+3    |       MISC Field      | <- Execution Starts here
         ._______________________.
  X+4    | Literal               | Literals a compiled as a pointer to
         ._______________________. a 'literal' word and the literal in
  X+5    | 0                     | the next field.
         ._______________________.
  X+6    | Pointer to 'dup'      |
         ._______________________.
  X+7    | literal               |
         ._______________________.
  X+8    | 10                    |
         ._______________________.
  X+9    | Pointer to '>'        |
         ._______________________.
  X+10   | Pointer to 'branch?'  | 'if' gets compiled to 'branch?'
         ._______________________. and '2' so it jumps over 'exit'
  X+11   | 2                     | if the previous test fails. This
         ._______________________. is encoded as the jump destination
  X+12   | Pointer to 'exit'     | less one as an increment happens
         ._______________________. after the word is executed.
  X+13   | Pointer to 'dup'      |
         ._______________________.
  X+14   | Pointer to '.'        |
         ._______________________.
  X+15   | Pointer to 'cr'       |
         ._______________________.
  X+16   | Pointer to '1+'       |
         ._______________________.
  X+17   | Pointer to 'branch'   |
         ._______________________.
  X+18   | -12                   |
         ._______________________.
  X+19   | Pointer to '_exit'    |
         ._______________________. <- End of Word
```

The decompiler knows that the end of a word is demarcated by a pointer to
**\_exit**, and that pointers to **exit** can occur within the body of the
definition.

* 'key' ( -- char )

Get a value from the input and put it onto the stack.

* '\_emit' ( char -- status )

Put a character to the output stream returning a success value.

* 'r>' ( -- x )

Pop a value from the return stack and push it to the variable stack.

* '>r' ( x -- )

Pop a value from the variable stack and push it to the return stack.

* 'branch' ( -- )

Jump unconditionally to the destination next in the instruction stream.

* '?branch' ( bool -- )

Pop a value from the variable stack, if it is zero the jump to the
destination next in the instruction stream, otherwise skip over it.

* '(.)' ( x -- status )

Pop a value from the variable stack and print it to the output either
as a ASCII decimal or hexadecimal value depending on the BASE register. A
return status is pushed onto the stack, greater or equal to zero is a success,
negative is a failure. Failure can occur because of an invalid base in the BASE
register, or because the output could not be written to.

* ''' ( -- )

Push the next value in the instruction stream onto the variable stack
and advance the instruction stream pointer over it.

* ',' ( x -- )

Write a value into the dictionary, advancing the dictionary pointer.

* '=' ( x y -- z )

Pop two values, perform a test for equality and push the result.

* 'swap' ( x y -- y z )

Swap two values on the stack.

* 'dup' ( x -- x x )

Duplicate a value on the stack.

* 'drop' ( x -- )

Drop a value.

* 'over' ( x y -- x y x )

Duplicate the value that is next on the stack.

* 'find' ( -- execution-token )

Find a word in the dictionary pushing a pointer to that word onto the
variable stack.

* 'depth' ( -- depth )

Push the current stack depth onto the stack, the value is the depth of the
stack before the depth value was pushed onto the variable stack.

* 'sp@' ( -- addr )

Push the address of the stack pointer onto the stack, before **sp@** was
executed:

```
1 2 sp@ . . .
```

Prints:

```
2 2 1
```

* 'sp!' ( addr -- )

Set the address of the stack pointer.

* 'clock' ( -- x )

Push the difference between the startup time and now, in milliseconds. This
can be used for timing and implementing sleep functionality, the counter
will not increase the interpreter is blocking and waiting for input, although
this is implementation dependent.

* 'evaluator' ( c-addr u 0 | file-id 0 1 -- x )

This word is a primitive used to implement 'evaluate' and 'include-file', it
takes a boolean to decide whether it will read from a file (1) or a string (0),
and then takes either a forth string, or a **file-id**.

* 'system' ( c-addr u -- status )

Execute a command with the systems command interpreter.

* 'raise' ( u -- ??? )

Raises a signal, what happens after the signal is raised is undefined.

* 'date' ( -- date )

Push the date onto the stack, the order of the arguments are:

```
Is day light savings time?
Days since start of year
Week day
Year
Month
Day of month
Hour
Minutes
Seconds (note, this can include lead seconds!)
```

The time is in UTC time.

* 'memory-copy' ( r-addr1 r-addr2 u -- )

Operates on two raw addresses. Copy 'u' characters from r-addr2 to r-addr1.

* 'memory-locate' ( r-addr char u -- r-addr | 0 )

Locate a character in a block of memory 'u' characters wide, returning a pointer to
that character or zero if that address cannot be found.

* 'memory-set' ( r-addr char u -- )

Set 'u' character of memory starting at 'r-addr' to 'char'.

* 'memory-compare' ( r-addr1 r-addr2 u -- x )

Compare two blocks of memory 'u' units wide.

* 'allocate' ( u -- r-addr status )

Allocate a block of memory.

* 'free' ( r-addr -- status )

Free a block of memory.

* 'getenv' ( c-addr u -- r-addr u )

Get an [environment variable](https://en.wikipedia.org/wiki/Environment_variable) given a string, it returns '0 0' if the
variable was not found.

##### File Access Words

The following compiling words are part of the File Access Word set, a few of
the fields need explaining in the stack comments. "file-id" refers to a
previously opened file as returned by "open-file", "ior" refers to a return
status provided by the file operations. "fam" is a file access method,

* 'close-file' ( file-id -- ior )

Close an already opened file.

* 'open-file' ( c-addr u fam -- file-id ior )

Open a file, given a Forth string (the 'c-addr' and the 'u' arguments), and a
file access method, which is defined within "forth.fth". Possible file access
methods are "w/o", "r/w" and "r/o" for read only, read-write and write only
respectively.

* 'delete-file' ( c-addr u -- ior )

Delete a file on the file system given a Forth string.

* 'read-file' ( c-addr u file-id -- ior )

Read in 'u' characters into 'c-addr' given a file identifier.

* 'write-file' ( c-addr u file-id -- ior )

Write 'u' characters from 'c-addr' to a given file identifier.

* 'file-position' ( file-id -- ud ior )

Get the file position offset from the beginning of the file given a file
identifier.

* 'reposition-file' ( ud file-id -- ior )

Reposition a files offset relative to the beginning of the file given a file
identifier.

* 'flush-file' ( file-id -- ior )

Attempt to flush any buffered information written to a file.

* 'rename-file' ( c-addr1 u1 c-addr2 u2 -- ior )

Rename a file on the file system named by the first string ('c-addr1' and 'u1')
to the second string ('c-addr2' and 'u2').

* 'temporary-file' ( -- file-id ior )

Open up a new temporary file for writing and reading.

### Defined words

Defined words are ones which have been created with the ':' word, some words
get defined before the user has a chance to define their own to make their
life easier.

* 'state' ( -- addr )

Push the address of the register that controls the interpreter state onto
the stack, this value can be written to put the interpreter into compile
or command modes.

* ';' ( -- )

Write 'exit' into the dictionary and switch back into command mode.

* 'base' ( -- addr )

This pushes the address of a variable used for input and output conversion of
numbers, this address can be written to and read, valid numbers to write are 0
and 2 to 36 (*not* 1).

* 'pwd' ( -- pointer )

Pushes a pointer to the previously define word onto the stack.

* 'h' ( -- pointer )

Push a pointer to the dictionary pointer register.

* 'r' ( -- pointer )

Push a pointer to the register pointer register.

* 'here' ( -- dictionary-pointer )

Push the current dictionary pointer (equivalent to "h @").

* '[' ( -- )

Immediately switch into command mode.

* ']' ( -- )

Switch into compile mode

* '>mark' ( -- location )

Write zero into the head of the dictionary and advance the dictionary pointer,
push a address to the zero written into the dictionary. This is usually used
after in a word definition that changes the control flow, after a branch for
example.

* ':noname' ( -- execution-token )

This creates a word header for a word without a name and switches to compile
mode, the usual ';' finishes the definition. It pushes a execution token onto
the stack that can be written into the dictionary and run, or executed directly.

* 'if' ( bool -- )

Begin an if-else-then statement. If the top of stack is true then we
execute all between the if and a corresponding 'else' or 'then', otherwise
we skip over it.

Abstract Examples:

```
    : word ... bool if do-stuff ... else do-other-stuff ... then ... ;

    : word ... bool if do-stuff ... then ... ;
```

and a concrete examples:

```
    : test-word if 2 2 + . cr else 3 3 * . cr ;
    0 test-word
    4             # prints 4
    1 test-word
    9             # prints 9
```

Is a simple and contrived example.

* 'else' ( -- )

See 'if'.

* 'then' ( -- )

See 'if'.

* 'begin' ( -- )

This marks the beginning of a loop.

* 'until' ( bool -- )

Loop back to the corresponding 'begin' if the top of the stack is zero, continue
on otherwise.

* "')'" ( -- char )

Push the number representing the ')' character onto the stack.

* 'tab' ( -- )

Print a tab.

* 'cr' ( -- )

Prints a newline.

* '(' ( -- )

This will read the input stream until encountering a ')' character, it
is used for comments.

* 'allot' ( amount -- )

Allocate a number of cells in the dictionary.

* 'tuck' ( x y -- y x y )

The stack comment documents this word entirely.

* 'nip' ( x y -- y )

The stack comment documents this word entirely.

* 'rot' ( x y z -- z x y )

The stack comment documents this word entirely. This word rotates three items
on the variable stack.

* '-rot' ( x y z -- y z x )

The stack comment documents this word entirely. This word rotates three items
on the variable stack, in the opposite direction of "rot".

* 'emit' ( x -- )

Write a single character out to the output stream.

* '?' ( addr -- )

Print the contents of addr to the screen.

## Library of Forth words

The file [forth.fth](/howerj/libforth/blob/master/forth.fth) contains many defined words, however those words are
documented within that file and so as to avoid duplication will not be
mentioned here. This file is *not* loaded automatically, and so should be run
like this:

Unix:

```
    ./forth -t forth.fth
```

Windows

```
forth.exe -t forth.fth
```

## Glossary of Forth terminology

* Word vs Machine-Word

Usually in computing a 'word' refers to the natural length of integer in a
machine, the term 'machine word' is used to invoke this specific meaning,
a word in [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) is more analogous to a function, but there
are different types of Forth words; *immediate* and *compiling* words,
*internal* and *defined* words and finally *visible* and *invisible* words.

The distinction between a machine word and a Forth word
can lead to some confusion.

* *The* dictionary

There is only one dictionary in a normal [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) implementation, it is a
data structure that can only grow in size (or at least it can in this
implementation) and holds all of the defined words.

* *The* stack

When we referring to a stack, or the stack, we refer to the variable stack
unless otherwise stated (such as the return stack). The variable, or the
stack, holds the result of recent operations such as addition or subtraction.

* The return stack

Forth implementations are two (or more) stack machines. The second stack
is the return stack which holds the usual function call return values as
well as temporary variables.

* Defined Words

A defined word is one that is not implement directly by the interpreter but
has been create with the ':' word. It can be an *immediate* word, but does
not have to be.

* Compile mode

In this mode we *compile* words unless those words are *immediate* words,
if the are then we immediately execute them.

* Command mode

In this mode, regardless of whether we are in *command* or *compile* mode
we execute words or push them on to the stack.

* A block.

A [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) block is primitive way of managing persistent storage and this
version of block interface is more primitive than most. A block is a
contiguous range of bytes, usually 1024 of them as in this instance, and
they can be written or read from disk. Flushing of dirty blocks is not
performed in this implementation and must be done 'manually'.

## Porting this interpreter

The interpreter code is written in [C99](https://en.wikipedia.org/wiki/C99), and is written to be portable, however
porting it to embedded platforms that lack a C standard library (which is most
of them) would mean replacing the most of the C standard library functions used,
and implementing a new I/O mechanism for reading, printing and block storage.

The interpreter has been tested on the following platforms:

* Linux x86-64 with,
  + Debian Jessie (8.x)
  + GCC version 4.9.2
* Windows 7 x86-64 (not recently)
* Linux ARM 32-bit Little Endian (not recently)
* OSX Sierra 10.12.1 (tested by Rikard Lang).

And the different virtual machine word size options (32 and 64 bit machine
words) have been tested. There is no reason it should not also work on 16-bit
platforms.

libforth is also available as a [Linux Kernel Module](http://tldp.org/LDP/lkmpg/2.6/html/), on a branch of libforth,
see <https://github.com/howerj/libforth/tree/linux-kernel-module>. This is
module is very experimental, and it is quite possible that it will make your
system unstable.

## Standards compliance

This Forth interpreter is in no way compliant with any of the standards
relating to Forth, such as [ANS Forth](http://lars.nocrew.org/dpans/dpans.htm), or previous Forth standardization
efforts. However attempts to support words and behavior typical of these
standards are made (although there is still a lot to do).

Some important deviations are:

* ok

'ok' is not printed after a successful command execution , this is for two
reasons, firstly because of limitations in the implementation, and secondly
there is no reason for cluttering up the output window with this. The
implementation should be silent by default.

* Control structures

Various control structures in Forth behave slightly differently than in
normal Forths. This should be fixed, given time.

## forth.core magic

The [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29) interpreters core files can be detected by the [file](https://linux.die.net/man/1/file) utility
by creating a [magic](https://linux.die.net/man/5/magic) file.

Save the following file on [Unix](/howerj/libforth/blob/master/www.opengroup.org/unix) systems to */etc/magic*:

```
# libforth core files start with 0xFF '4' 'T' 'H'
0  string \xFF\x34\x54\x48 libforth core file
>4 byte   2                16-bit
>4 byte   4                32-bit
>4 byte   8                64-bit
## File version, version 4 is current as of 2017/03/08
>5 byte   x                version=[%d]
>5 byte   <4               ancient
>5 byte   4                current
>5 byte   >4               futuristic
## Endianess test
>6 byte   0                big-endian
>6 byte   1                little-endian
>6 byte   >1               INVALID-ENDIANESS
## Size is stored as the base-2 logarithm of the size
>7 byte   x                size=[2^%d]
## Extra tests could be added, such as whether the core file is still valid
```

## Coding Standards

The coding standards used for both the C and Forth code deviate from what is
normal, the Forth code mostly so. Nonetheless they should be abided by
regardless of what one may think of them. The C code is also written in a
literate style (although the documentation is generated from it, rather than
the code being generated from the documentation), this imposes extra
requirements on comments.

### C Coding Standards

The libforth Forth interpreter is primarily a library first and should be
thought of as such, it is possible that multiple instances of the Forth
interpreter are active at the same time, as such absolutely no global state can
be used within the Forth library. Be very careful in what functions you export,
by default all functions should be declared as static.

Global state can be used in the [main.c][] file which contains a wrapper
around libforth. This wrapper is used to make the **forth** executable, and
there is only ever one instance of the interpreter in use at a time.

The library is also written in [C99](https://en.wikipedia.org/wiki/C99) and made to be portable. Any
non-portable code should be placed in [main.c][] and should be optionally
compiled only if the correct macro for that functionality is defined. The
macros **\_WIN32** and **\_\_unix\_\_** are currently used to test for the
operating system in use for example.

The program [doxygen](https://en.wikipedia.org/wiki/Doxygen) is used to extract and built the API documentation for
libforth. At minimum all functions and macros that are exported by the library
should be documented using [doxygen](https://en.wikipedia.org/wiki/Doxygen) style markup.

All functions exported by [libforth.c](/howerj/libforth/blob/master/libforth.c) should be tested, and should be
prefixed with "forth\_". Unit tests against [libforth.c](/howerj/libforth/blob/master/libforth.c) belong in
[unit.c](/howerj/libforth/blob/master/unit.c). The same portability guidelines that apply to [libforth.c](/howerj/libforth/blob/master/libforth.c)
also apply to [unit.c](/howerj/libforth/blob/master/unit.c).

Try to use assertions and checks on code liberally, even if the check is
expensive to do. Expensive checks should be disabled with by defining the
**NDEBUG** macro, like **assert** is.

The Linux kernel coding standards are to be roughly followed. Line lengths
should be limited to 80 characters in length (this is much more important given
documentation is generated from the source).

The main exception to the rule on code indentation is the **forth\_run**
function, which mostly consists of a large switch statement that implements the
Forth virtual machine.

Bellow is a cut down version of the switch statement:

```
switch(w) { /* switches on instruction */
/* ...More instructions... */
case DEFINE:
	m[STATE] = 1; /* compile mode */
	if(forth_get_word(o, o->s) < 0)
		goto end;
	compile(o, RUN, (char*)o->s, true);
	break;
/* ...More instructions... */
case ADD:     f = *S-- + f;                   break;
case AND:     f = *S-- & f;                   break;
case OR:      f = *S-- | f;                   break;
case XOR:     f = *S-- ^ f;                   break;
case INV:     f = ~f;                         break;
/* ...More instructions... */
}
```

Of note is that simple instructions like **ADD** and **OR** take up a single
line, having worked on the virtual machine for a while this aids in readability
for me. The **DEFINE** instruction is a lot longer and so is split up into
multiple lines.

A [AWK](https://en.wikipedia.org/wiki/AWK) script, specifically [GAWK](https://www.gnu.org/software/gawk/), is used to turn the [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) code into a
single [PDF](https://en.wikipedia.org/wiki/Portable_Document_Format) document, by first generating [markdown](https://daringfireball.net/projects/markdown/) from it. The script,
called [convert](/howerj/libforth/blob/master/convert), is simple. The script by default indents any [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) code
which in [markdown](https://daringfireball.net/projects/markdown/) means it is treated as a code block, to turn the
indentation on or off the commands "/\*\*" turns if off and the command
"\*\*/" turns it back on, these obviously double as both [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) comment blocks,
but also make [doxygen](https://en.wikipedia.org/wiki/Doxygen) look for tags within the comments. The script only
looks for comment commands at the beginning of a line.

The following [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) program:

```
/**
## A title using markdown syntax
This comment will not be indented in the resulting mark down
file and can be used to describe any of the following code.
**/
int main(void)
{
	printf("Hello, World\n"); /* a normal C comment */
	return 0;
}
```

Will be turned into:

```
## A title using markdown syntax
This comment will not be indented in the resulting mark down
file and can be used to describe any of the following code.
<NEW-LINE>
	int main(void)
	{
		printf("Hello, World\n"); /* a normal C comment */
		return 0;
	}
```

The comment blocks are used to explain the code in detail.

### Forth Coding Standards

The way code is formatted is idiosyncratic for Forth, the main difference is
that tabs are used to indent code. Spaces and new lines are used to breakup
longer word definitions, for short definitions only the entire definition
should take up only two lines, as so:

```
: foo ( n -- n : add bar to 'n')
	bar + ;
```

All functions exposed to the user should be unit tested (which currently is not
the case, but that is not an excuse not to test them). Unit tests go in the
file [unit.fth](/howerj/libforth/blob/master/unit.fth).

Sections of related code should be enclosed with decoration that makes it
obvious that the code belongs together, like so:

```
( ============ The Foo Library ============ )
( The Foo library implements the standard
words 'foo', 'bar' and 'foobar' )
( ... )
: foo ( n -- n )
	bar + ;
( ... )
( ============ The Foo Library ============ )
```

The standard way of making a word immediate in Forth is to place the word
**immediate** after a word definition. This is still possible to do in
libforth, however the preferred method is to place it just after the word being
defined:

```
: example immediate ; ( preferred way )
: example ; immediate ( traditional way )
```

I prefer the first way of making a word immediate as it implies that it is part
of the word definition and not something that acts on the word.

## Bugs

As mentioned in the standards compliance section, this Forth does things in a
non-standard way. Apart from that:

* Passing invalid pointers to instructions like **open-file** or **system** can
  cause undefined behavior (your program will most likely crash). There is no
  simple way to handle this (apart from not doing it).
* The core interpreter does not currently make use of the throw and catch
  mechanism when handling certain errors (like division by zero), in effect there
  are two error handlers. These mechanisms need unifying.

## To-Do

* A few environment variables could be used to specify start up files for the
  interpreter and user specific startup files.
* Signal handling should be improved.
* Error handling could be improved - the latest word definition should be
  erased if an error occurs before the terminating ';'. And trap handling
  should be done in pure forth, instead of as a hybrid which is currently is.
* A series simple Unix utilities, like in <https://github.com/howerj/multicall>
  could be made in Forth, this would demonstrate that this Forth is usable
  and practical. Tools like "cat", "tr", "grep", and various other text
  formatting tools could be done in pure C, others like "kill", "mount" or "ls"
  would require operating system dependent code to be added, this would need
  to be contained somehow, so the core remains pure C.
* Allow the main memory to be resized. A Forth primitive MORE-CORE could be
  added.
* Move the stack space so it is before the dictionary, this would mean we
  could write to the end of memory without worrying about hitting the stack.
* Make a debian package for the Forth interpreter.
* For a Forth only related "To-Do" list see the end of the file [forth.fth](/howerj/libforth/blob/master/forth.fth).
* A compiler for the virtual machine itself should be made, as a separate
  program. This could be used to make a more advanced read-evaluate loop.
* Core files are currently not portable across machines of different words
  sizes or endianess, which needs addressing, or ignoring conclusively.
* Use either liballocs or libffi for adding foreign function interfaces
  to programs, see:
  + <https://github.com/stephenrkell/liballocs>
  + <https://sourceware.org/libffi/>
* The Forth could be made to be more standards compliant (especially where it
  comes to I/O and addressing (but also things like CASE statements).
* A document (part of this document) for the coding standards used in this
  document needs to be created and added here. Things that will need
  documenting are:

* How comments should look like (literate and doxygen comments)
* Tabs vs Spaces (use tabs)
* Levels of indentation
* Coding standards for Forth (stack comments, indentation, etcetera).
* And much more.

* Improve error handling by adding 'throw' and 'catch' to the virtual
  machine as both C functions callable within the confines of 'libforth.c' and
  as virtual machine instructions.
* To fix the problem with a mismatching between C addresses and Forth
  addresses which currently exists all modes of addressing should be made to
  be relative, or fixed up by the **forth\_init**. Currently access to memory
  outside of the Forth address space can only be done by copying memory to and
  from the C and the Forth addresses spaces with words that accept real addresses
  (like **memory-copy**).

* Branches use relative addressing at the moment.
* Register values are either fixed up at initialization (as in the case
  of **stdin**, **stdout** and a few other pointers) or need fixing (like the
  variable stack pointer, return stack pointer the HERE pointer and more).
* How the constants are defined will need to be changed (this will be the
  most difficult thing to fix).
* See: <https://en.wikipedia.org/wiki/Position-independent_code>
* And: <https://en.wikipedia.org/wiki/Relocation_%28computing%29>

### Virtual Machine To-Do points

The virtual machine needs changing, in someways drastically, to accommodate the
following behaviors:

1. Error handling should be purely Forth based.

Currently error handling and recovery is a weird mishmash between [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) and
[Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29), more of the error handling should be moved to the [Forth](https://en.wikipedia.org/wiki/Forth_%28programming_language%29)
environment using the normal THROW/CATCH mechanisms (which should be turned
into virtual machine instructions and C functions, which can then be called to
make a throw occur.

2. The virtual machine should use character based addressing. Currently it uses
   cell based addressing, which causes all kinds of confusion.
3. Words currently not defined need to be hidden, until the terminating ';'.

### Submodules

The [main.c][] file depends, optionally, on various submodules. These
submodules need improving.

#### libline

* libline needs porting to Windows
* The API needs improving so there is more control on whether or not raw mode
  is turned on or off, whether a terminal is being read from or not, ...

### Experimental single binary

The build system can be used to build a single binary which contains the
contents of "forth.fth", with the command "make libforth". This is a
multistage build process that is experimental, so should not be relied upon. It
involves a bootstrapping process using the first forth executable to build a
second.

The process works like this:

1. The executable "forth" is made.

   make forth

or:

```
gcc -std=c99 main.c unit.c libforth.c -o forth
```

2. The executable is used to generate a code file containing the contents of
   "forth.fth" is made.

   ./forth -s forth.core
3. A forth word, *core2c* is used to generate a C file called "core.gen.c":

   ./forth -l forth.core -e 'c" forth.core" c" core.gen.c" core2c'
4. The forth program is recompiled with an extra define, which means that
   initialization of a minimal forth environment is replaced with the core file we
   just made:

   gcc -DUSE\_BUILT\_IN\_CORE -std=c99 main.c unit.c libforth.c core.gen.c -o libforth

The new executable, *libforth*, behaves the same as *forth* but with a built in
core file.

## Notes

* The compilation should result in a small executable, and when statically
  linked against [musl](https://www.musl-libc.org/) under Linux (x86-84), the stripped executable is around
  50kb in size.
* It is quite possible to make Forth programs that corrupt memory that they
  should, this is not a design flaw in this interpreter but more part of the
  Forth philosophy. If you want memory safety (and most of the time you should)
  you should use a different language, or implementation.

<style type="text/css">body{margin:40px auto;max-width:850px;line-height:1.6;font-size:16px;color:#444;padding:0 10px}h1,h2,h3{line-height:1.2}</style>

## About

libforth: A small Forth interpreter that can be used as a library written in c99

### Topics

[c](/topics/c "Topic: c")
[interpreter](/topics/interpreter "Topic: interpreter")
[stack](/topics/stack "Topic: stack")
[virtual-machine](/topics/virtual-machine "Topic: virtual-machine")
[forth](/topics/forth "Topic: forth")

### Resources

[Readme](#readme-ov-file)

### License

[MIT license](#MIT-1-ov-file)

### Uh oh!

There was an error while loading. Please reload this page.

[Activity](/howerj/libforth/activity)

### Stars

[**200**
stars](/howerj/libforth/stargazers)

### Watchers

[**17**
watching](/howerj/libforth/watchers)

### Forks

[**31**
forks](/howerj/libforth/forks)

[Report repository](/contact/report-content?content_url=https%3A%2F%2Fgithub.com%2Fhowerj%2Flibforth&report=howerj+%28user%29)

## [Releases](/howerj/libforth/releases)

[6
tags](/howerj/libforth/tags)

## [Packages 0](/users/howerj/packages?repo_name=libforth)

No packages published

## Languages

* [C
  49.2%](/howerj/libforth/search?l=c)
* [Forth
  48.3%](/howerj/libforth/search?l=forth)
* [Makefile
  1.6%](/howerj/libforth/search?l=makefile)
* [Awk
  0.9%](/howerj/libforth/search?l=awk)

## Footer

© 2026 GitHub, Inc.

### Footer navigation

* [Terms](https://docs.github.com/site-policy/github-terms/github-terms-of-service)
* [Privacy](https://docs.github.com/site-policy/privacy-policies/github-privacy-statement)
* [Security](https://github.com/security)
* [Status](https://www.githubstatus.com/)
* [Community](https://github.community/)
* [Docs](https://docs.github.com/)
* [Contact](https://support.github.com?tags=dotcom-footer)
* Manage cookies
* Do not share my personal information


You can’t perform that action at this time.