


What is flat assembler g?

It is an assembly engine designed as a successor of the one used in
flat assembler 1, one of the recognized assemblers for x86 processors.
This is a bare engine that by itself has no ability to recognize and
encode instructions of any processor, however it has the ability to
become an assembler for any CPU architecture. It has a macroinstruction
language that is substantially improved compared to the one provided by
flat assembler 1, and it makes it easy to implement instruction encoders
in the form of customizable macroinstructions.
  The source code of this tool can be compiled with flat assembler 1,
but it is also possible to use flat assembler g itself to compile it.
The source contains clauses that include different header files depending
on the assembler used. When flat assembler g compiles itself, it uses
the provided set of headers that implement x86 instructions and formats
with a syntax mostly compatible with flat assembler 1.
  The example programs for the x86 architecture that come in this package are
selected samples that originally came with flat assembler 1, and they
use sets of headers that implement the instruction encoders and output formatters
required to assemble them just like the original flat assembler did.
  To demonstrate how the instruction sets of different architectures
may be implemented, there are some example programs for two microcontroller
families, 8051 and AVR. They have been kept simple and therefore they do not provide
a complete framework for programming such CPUs, though they may provide
a solid base for the creation of such environments.
  There is also an example of assembling the JVM bytecode, which is
a conversion of the sample originally created for flat assembler 1. For this
reason it is somewhat crude and does not fully utilize the capabilities
offered by the new engine. However, it is good at visualising the structure
of a class file.


How does this work?

The essential function of flat assembler g is to generate output defined
by the instructions in the source code. Given the single line of text
shown below, the assembler generates a single byte with the stated
value:

        db 90h

  Macroinstructions can be defined to generate specific
sequences of data depending on the provided parameters. They may correspond
to the instructions of a chosen machine language, as in the following example,
but they could as well be defined to generate other kinds of data, for
various purposes.

        macro int number
                if number = 3
                        db 0CCh
                else
                        db 0CDh, number
                end if
        end macro

        int 20h         ; generates two bytes
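To check the bytes by hand, the decision taken by the "int" macroinstruction can be mirrored in a few lines of Python. This is only a model for illustration (the function name is hypothetical and nothing here is part of flat assembler g):

```python
def encode_int(number: int) -> bytes:
    """Model of the "int" macroinstruction: the bytes it would emit."""
    if number == 3:
        return bytes([0xCC])          # special one-byte form for vector 3
    return bytes([0xCD, number])      # opcode followed by the vector number


print(encode_int(0x20).hex(" "))      # the two bytes of "int 20h"
print(encode_int(3).hex(" "))         # the single byte of "int 3"
```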

  The assembly as seen this way may be considered a kind of interpreted
language, and the assembler certainly has many characteristics of an
interpreter. However, it also shares certain aspects with a compiler.
It is possible for an instruction to use a value that is defined
later in the source and that may depend on the instructions that come
before that definition, as demonstrated by the following sample.

        macro jmpi target
                if target-($+2) < 80h & target-($+2) >= -80h
                        db 0EBh
                        db target-($+1)
                else
                        db 0E9h
                        dw target-($+2)
                end if
        end macro

                jmpi start
                db 'some data'
        start:

  The "jmpi" defined above produces the code of a jump instruction as
in the 8086 architecture. Such code contains the relative offset of the
target of a jump, stored in either a single byte or a 16-bit word.
The relative offset is computed as the difference between the address
of the target and the address of the next instruction. The special
symbol "$" provides the current address: at the start of the macro it
is the address of the jump instruction itself, and after the opcode byte
has been generated it has advanced by one, which is why the offsets are
then computed from "$+1" or "$+2". It is used to calculate the relative
offset and to determine whether it fits in a single byte.
  Therefore the code generated by "jmpi start" in the above sample
depends on the value of the address labeled as "start", and this
in turn depends on the length of the output of all the instructions
that precede it, including the jump itself. This creates a loop of
dependencies, and the assembler needs to find a solution that
fulfills all the constraints created by the source text. This would
not be possible if the assembler were just an imperative interpreter.
Its language is thus in some aspects declarative.
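The idea of repeating passes until all values agree can be imitated in Python. This is a simplified model under stated assumptions: only one unknown label, a first-pass guess of 0 for it, and each later pass reusing the value found in the previous one. The prediction actually performed by flat assembler g is more elaborate, and the function name and parameters are hypothetical.

```python
def assemble_jmpi(data_len: int, max_passes: int = 10):
    """Model "jmpi start / db <data_len bytes> / start:" placed at address 0.

    Returns (jump bytes, address of "start", number of passes used).
    """
    start_guess = 0                    # assumed value of "start" in the first pass
    for pass_no in range(1, max_passes + 1):
        here = 0                       # "$" at the jump instruction
        offset = start_guess - (here + 2)
        if -0x80 <= offset < 0x80:     # short form: EB rel8
            code = bytes([0xEB]) + offset.to_bytes(1, "little", signed=True)
        else:                          # near form: E9 rel16
            offset = start_guess - (here + 3)
            code = bytes([0xE9]) + offset.to_bytes(2, "little", signed=True)
        start = here + len(code) + data_len   # where "start:" ends up
        if start == start_guess:       # all values consistent: done
            return code, start, pass_no
        start_guess = start            # otherwise retry with the new value
    raise RuntimeError("no stable solution found")


print(assemble_jmpi(9))     # 'some data' is 9 bytes: settles on the short form
print(assemble_jmpi(200))   # a longer gap: needs the 16-bit form
```

In the second call the first pass wrongly assumes a short jump, the second pass switches to the longer form and moves "start" by one byte, and the third pass confirms that everything is consistent.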
  Finding a solution for such circular dependencies may resemble
solving an equation, and it is even possible to construct an example
where flat assembler g is indeed capable of solving one:

        x = (x-1)*(x+2)/2-2*(x+1)
        db x

  The circular reference has been reduced here to a single definition
that references itself to construct the value. Here flat assembler g
is able to find a solution, though in many other cases it may
fail. The method used by this assembler is to perform multiple passes
over the source text and then try to predict all the values with the
knowledge gathered this way. This approach is in most cases good enough
for the assembly of machine code, but it rarely suffices to solve
complex equations; the above sample is one of the exceptions.
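Algebraically, the definition reduces to x*x - 5*x - 6 = 0, whose roots are 6 and -1. The following Python sketch shows how simply re-evaluating the definition with the previous value can settle on one of them. It assumes, for illustration only, that every pass reuses the previous value starting from 0; it does not describe the actual prediction algorithm of the assembler.

```python
def rhs(x: int) -> int:
    """Right-hand side of: x = (x-1)*(x+2)/2-2*(x+1)."""
    # (x-1)*(x+2) is always even, so the integer division here is exact
    return (x - 1) * (x + 2) // 2 - 2 * (x + 1)


def solve_by_passes(guess: int = 0, max_passes: int = 20) -> int:
    for _ in range(max_passes):
        value = rhs(guess)             # evaluate the definition with the guess
        if value == guess:             # consistent: the guess is a solution
            return value
        guess = value
    raise RuntimeError("values did not stabilize")


print(solve_by_passes())   # guesses go 0, -3, 6, and 6 is stable
```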


What are the means of parsing the arguments of an instruction?

Not all instructions have a simple syntax like the ones in the
previous examples. To aid in the processing of arguments that may
contain special constructions, flat assembler g provides a few
capable tools, demonstrated below with examples that implement
a few selected instructions of the Z80 processor. The rules governing
the use of the presented features are found in the manual.
  When an instruction has a very small set of allowed arguments,
each one of them can be treated separately with the "match"
construction:

        macro EX? first,second
                match (=SP?), first
                        match =HL?, second
                                db 0E3h
                        else match =IX?, second
                                db 0DDh,0E3h
                        else match =IY?, second
                                db 0FDh,0E3h
                        else
                                err "incorrect second argument"
                        end match
                else match =AF?, first
                        match =AF'?, second
                                db 08h
                        else
                                err "incorrect second argument"
                        end match
                else match =DE?, first
                        match =HL?, second
                                db 0EBh
                        else
                                err "incorrect second argument"
                        end match
                else
                        err "incorrect first argument"
                end match
        end macro

        EX (SP),HL
        EX (SP),IX
        EX AF,AF'
        EX DE,HL

  The "?" character appears in many places to mark the names as
case-insensitive; if that is not needed, all these occurrences could be
removed to further simplify the example.
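Because every accepted combination is listed explicitly, the chain of "match" clauses behaves like a lookup table. The Python sketch below models it that way (the table and function names are hypothetical, and "upper()" stands in for the "?" case-insensitivity markers):

```python
EX_ENCODINGS = {
    ("(SP)", "HL"): bytes([0xE3]),
    ("(SP)", "IX"): bytes([0xDD, 0xE3]),
    ("(SP)", "IY"): bytes([0xFD, 0xE3]),
    ("AF", "AF'"): bytes([0x08]),
    ("DE", "HL"): bytes([0xEB]),
}


def encode_ex(first: str, second: str) -> bytes:
    # upper() plays the role of the "?" markers in the macroinstruction
    key = (first.strip().upper(), second.strip().upper())
    if key not in EX_ENCODINGS:
        raise ValueError(f"incorrect arguments: {first},{second}")
    return EX_ENCODINGS[key]


for args in (("(SP)", "HL"), ("(SP)", "IX"), ("AF", "AF'"), ("DE", "HL")):
    print("EX", ",".join(args), "->", encode_ex(*args).hex(" "))
```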
  When the set of possible values of an argument is larger but
has some regularities, the textual substitutions can be defined
to replace some of the symbols with carefully chosen constructions
that can then be recognized and parsed:

        A? equ [:111b:]
        B? equ [:000b:]
        C? equ [:001b:]
        D? equ [:010b:]
        E? equ [:011b:]
        H? equ [:100b:]
        L? equ [:101b:]

        macro INC? argument
                match [:r:], argument
                        db 100b + r shl 3
                else match (=HL?), argument
                        db 34h
                else match (=IX?+d), argument
                        db 0DDh,34h,d
                else match (=IY?+d), argument
                        db 0FDh,34h,d
                else
                        err "incorrect argument"
                end match
        end macro
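The register codes chosen above make the opcode of "INC r" a simple computation. The Python sketch below reproduces it (the names are hypothetical). The macroinstruction relies on "shl" being applied before "+", so that the expression means 100b + (r shl 3), which yields the Z80 opcodes 04h to 3Ch; in Python "<<" binds more loosely than "+", so explicit parentheses are needed.

```python
# Register codes from the "equ" definitions above.
REG = {"A": 0b111, "B": 0b000, "C": 0b001, "D": 0b010,
       "E": 0b011, "H": 0b100, "L": 0b101}


def encode_inc_register(name: str) -> int:
    r = REG[name.upper()]
    return 0b100 + (r << 3)          # "100b + r shl 3"


for name in "ABCDEHL":
    print(f"INC {name} -> {encode_inc_register(name):02X}h")
```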

        INC A
        INC B
        INC (HL)
        INC (IX+2)

This approach has a trait that may not always be desirable:
it allows an expression like "[:0:]" to be used directly in an argument.
Such use of the syntax can be prevented by adding a prefix
in the "match" construction:

        REG.A? equ [:111b:]
        REG.B? equ [:000b:]
        REG.C? equ [:001b:]
        REG.D? equ [:010b:]
        REG.E? equ [:011b:]
        REG.H? equ [:100b:]
        REG.L? equ [:101b:]

        macro INC? argument
                match [:r:], REG.argument
                        db 100b + r shl 3
                else match (=HL?), argument
                        db 34h
                else match (=IX?+d), argument
                        db 0DDh,34h,d
                else match (=IY?+d), argument
                        db 0FDh,34h,d
                else
                        err "incorrect argument"
                end match
        end macro

  In the case of an argument structured like "(IX+d)", it could sometimes
be desirable to allow other algebraically equivalent forms of the
expression, like "(d+IX)" or "(c+IX+d)". Instead of parsing every
possible variant individually, it is possible to let the assembler
evaluate the expression while treating selected symbols in a distinct
way. When a symbol is declared as an "element", it has no value, and
when it is used in an expression, it is treated algebraically like
a variable term in a polynomial.

        element HL?
        element IX?
        element IY?

        macro INC? argument
                match [:r:], REG.argument
                        db 100b + r shl 3
                else match (a), argument
                        if a eq HL
                                db 34h
                        else if a relativeto IX
                                db 0DDh,34h,a-IX
                        else if a relativeto IY
                                db 0FDh,34h,a-IY
                        else
                                err "incorrect argument"
                        end if
                else
                        err "incorrect argument"
                end match
        end macro

        INC (3*8+IX+1)

        virtual at IX
                x db ?
                y db ?
        end virtual

        INC (y)

  There is a small problem with the above macroinstruction. A parameter
may contain any text and when such value is placed into an expression,
it may induce erratic behavior. For example, if "INC (1|0)" were processed,
it would turn the "a eq HL" expression into "1|0 eq HL", and this logical
expression is correct and true even though the argument was malformed.
Such an unfortunate side effect is a consequence of macroinstructions
operating on a simple principle of text substitution (and the best way
to avoid such problems is to use CALM instead). Here, to prevent it
from happening, a local variable may be used as a proxy holding the value
of an argument:

        macro INC? argument
                match [:r:], REG.argument
                        db 100b + r shl 3
                else match (a), argument
                        local value
                        value = a
                        if value eq HL
                                db 34h
                        else if value relativeto IX
                                db 0DDh,34h,value-IX
                        else if value relativeto IY
                                db 0FDh,34h,value-IY
                        else
                                err "incorrect argument"
                        end if
                else
                        err "incorrect argument"
                end match
        end macro

  A proxy variable has an additional advantage, thanks to
the fact that its value is computed before the macroinstruction begins
to generate any output. When an expression contains a symbol like "$",
it may give different values depending on where it is calculated, and
the use of a proxy variable ensures that the value taken is the one
obtained by evaluating the argument before generating the code of
an instruction.
  When the set of symbols allowed in expressions is larger, it is
better to have a single construction to process an entire family
of them. An "element" declaration may associate an additional value
with a symbol and this information can then be retrieved with
the "metadata" operator applied to a linear polynomial that contains
the given symbol as a variable. The following example is another
variant of the previous macroinstruction that demonstrates the use
of this feature:

        element register
        element A? : register + 111b
        element B? : register + 000b
        element C? : register + 001b
        element D? : register + 010b
        element E? : register + 011b
        element H? : register + 100b
        element L? : register + 101b

        element HL?
        element IX?
        element IY?

        macro INC? argument
                local value
                match (a), argument
                        value = a
                        if value eq HL
                                db 34h
                        else if value relativeto IX
                                db 0DDh,34h,value-IX
                        else if value relativeto IY
                                db 0FDh,34h,value-IY
                        else
                                err "incorrect argument"
                        end if
                else match any more, argument
                        err "incorrect argument"
                else
                        value = argument
                        if value eq value element 1 & value metadata 1 relativeto register
                                db 100b + (value metadata 1 - register) shl 3
                        else
                                err "incorrect argument"
                        end if
                end match
        end macro

  The "any more" pattern is there to catch any argument that
contains a complex expression consisting of more than one token.
This prevents the use of syntax like "INC A+0" or "INC A+B-A".
For some instruction sets, however, the inclusion of such
a constraint may be a matter of personal preference.
  The "value eq value element 1" condition ensures that the value does not
contain any terms other than the name of a register. Even when an argument
is forced to contain no more than a single token, it is still possible
that it has a complex value, for instance if there were definitions like
"X = A + B" or "Y = 2 * A". Both "INC X" and "INC Y" would then cause
the operator "element 1" to return the value "A", which differs from the
value checked in either case.
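The two checks performed on a single-token argument can be modeled in Python. In this sketch (hypothetical names), a value is given as a dictionary of element coefficients plus a constant, and the metadata "register + n" of each register element is reduced to its code n. The syntactic "any more" check is not modeled.

```python
REGISTER_CODE = {"A": 0b111, "B": 0b000, "C": 0b001, "D": 0b010,
                 "E": 0b011, "H": 0b100, "L": 0b101}


def encode_inc(terms: dict, const: int = 0) -> int:
    """terms maps element names to coefficients, e.g. {"A": 1} for A."""
    terms = {k: v for k, v in terms.items() if v != 0}
    # "value eq value element 1": one element with scale 1 and no constant
    if const != 0 or len(terms) != 1:
        raise ValueError("incorrect argument")
    (name, scale), = terms.items()
    # the metadata must be relative to "register" (HL, IX, IY have none here)
    if scale != 1 or name not in REGISTER_CODE:
        raise ValueError("incorrect argument")
    return 0b100 + (REGISTER_CODE[name] << 3)


print(hex(encode_inc({"A": 1})))          # INC A
for bad in ({"A": 1, "B": 1}, {"A": 2}):  # X = A + B and Y = 2 * A
    try:
        encode_inc(bad)
    except ValueError as error:
        print(bad, "->", error)
```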
  If an instruction takes a variable number of arguments, a simple
way to recognize its various forms is to declare the argument with the "&"
modifier, which passes the complete contents of the arguments to "match":

        element CC

        NZ? := CC + 000b
        Z?  := CC + 001b
        NC? := CC + 010b
        C?  := CC + 011b
        PO  := CC + 100b
        PE  := CC + 101b
        P   := CC + 110b
        M   := CC + 111b

        macro CALL? arguments&
                local cc,nn
                match condition =, target, arguments
                        cc = condition - CC
                        nn = target
                        db 0C4h + cc shl 3
                else
                        nn = arguments
                        db 0CDh
                end match
                dw nn
        end macro

        CALL 0
        CALL NC,2135h

This approach also makes it possible to handle other, more difficult cases, like
when the arguments may contain commas or are delimited in different ways.
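The encoding chosen by this "CALL" can be checked with a short Python model. It splits the complete argument text at the first comma, roughly like the "match condition =, target" pattern, and understands only decimal numbers and the "h"-suffixed hexadecimal notation used in this document; all names are hypothetical.

```python
CONDITIONS = {"NZ": 0b000, "Z": 0b001, "NC": 0b010, "C": 0b011,
              "PO": 0b100, "PE": 0b101, "P": 0b110, "M": 0b111}


def parse_number(text: str) -> int:
    """Accept decimal or "h"-suffixed hexadecimal numbers."""
    text = text.strip()
    if text.lower().endswith("h"):
        return int(text[:-1], 16)
    return int(text)


def encode_call(arguments: str) -> bytes:
    condition, comma, target = arguments.partition(",")
    if comma:                          # conditional form: CALL cc,nn
        cc = CONDITIONS[condition.strip()]
        opcode = 0xC4 + (cc << 3)      # "0C4h + cc shl 3"
        nn = parse_number(target)
    else:                              # unconditional form: CALL nn
        opcode = 0xCD
        nn = parse_number(arguments)
    return bytes([opcode]) + nn.to_bytes(2, "little")   # "dw nn"


print(encode_call("0").hex(" "))
print(encode_call("NC,2135h").hex(" "))
```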


How are the labels processed?

A standard way of defining a label is by following its name with ":" (this
also acts like a line break and any other command, including another label,
may follow in the same line). Such label simply defines a symbol with
the value equal to the current address, which initially is zero and increases
when any bytes are added into the output.
  In some variants of assembly language it may be desirable to allow a label
to precede an instruction without an additional ":" in between. It is then
necessary to create a labeled macroinstruction that defines the label and then
passes processing to the original macroinstruction with the same name:

        struc INC? argument
                .:
                INC argument
        end struc

        start   INC A
                INC B

This has to be done for every instruction that needs to allow this kind
of syntax. A simple loop like the following one would suffice:

        iterate instruction, EX,INC,CALL
                struc instruction? arguments&
                        .: instruction arguments
                end struc
        end iterate

Every built-in instruction that defines data already has the labeled variant.
  By defining a labeled instruction that has "?" in place of a name it is
possible to intercept every line that starts with an identifier that is not
a known instruction and is therefore assumed to be a label. The following one
would allow a label without ":" to begin any line in the source text (it also
handles the special cases so that labels followed with ":" or with "=" and
a value would still work):

        struc ? tail&
                match :, tail
                        .:
                else match : instruction, tail
                        .: instruction
                else match == value, tail
                        . = value
                else
                        .: tail
                end match
        end struc

Obviously, it is no longer necessary to define any specific labeled
macroinstructions when a global effect of this kind is applied. A variant
should be chosen depending on the type of syntax that needs to be allowed.
  Intercepting even the labels defined with ":" may become useful when the
value of current address requires some additional processing before being
assigned to a label - for example when a processor uses addresses with a
unit larger than a byte. The intercepting macroinstruction might then look
like this:

        struc ? tail&
                match :, tail
                        label . at $ shr 1
                else match : instruction, tail
                        label . at $ shr 1
                        instruction
                else
                        . tail
                end match
        end struc

  The value of the current address that is used to define labels may be altered
with "org". If the labels need to be differentiated from absolute values,
a symbol defined with "element" may be used to form an address:

        element CODEBASE
        org CODEBASE + 0

        macro CALL? argument
                local value
                value = argument
                if value relativeto CODEBASE
                        db 0CDh
                        dw value - CODEBASE
                else
                        err "incorrect argument"
                end if
        end macro

  To define labels in an address space that is not going to be reflected in
the output, a "virtual" block should be declared. The following sample
prepares macroinstructions "DATA" and "CODE" to switch between generating
program instructions and data labels. Only the instruction codes would go to
the output:

        element DATA
        DATA_OFFSET = 2000h
        element CODE
        CODE_OFFSET = 1000h

        macro DATA?
                _END
                virtual at DATA + DATA_OFFSET
        end macro

        macro CODE?
                _END
                org CODE + CODE_OFFSET
        end macro

        macro _END?
                if $ relativeto DATA
                        DATA_OFFSET = $ - DATA
                        end virtual
                else if $ relativeto CODE
                        CODE_OFFSET = $ - CODE
                end if
        end macro

        postpone
                _END
        end postpone

        CODE

The "postpone" block is used here to ensure that the "virtual" block
always gets closed correctly, even if the source text ends with data
definitions.
  Within the environment prepared by the above sample any instruction
would be able to distinguish data labels from the ones defined within
program. For example, a branching instruction could be made to accept
an argument that is either a label within the program or an absolute value,
but to disallow any data label:

        macro CALL? argument
                local value
                value = argument
                if value relativeto CODE
                        db 0CDh
                        dw value - CODE
                else if value relativeto 0
                        db 0CDh
                        dw value
                else
                        err "incorrect argument"
                end if
        end macro

        DATA

        variable db ?

        CODE

        routine:

In this context either "CALL routine" or "CALL 1000h" would be allowed,
while "CALL variable" would not be.
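The decision made by this "CALL" depends only on which element, if any, a value is relative to. The Python sketch below models a value as a pair of that element's name (None for an absolute number) and the remaining offset; the representation and names are hypothetical.

```python
def encode_call(value) -> bytes:
    """value is (base, offset); base is "CODE", "DATA" or None (absolute)."""
    base, offset = value
    if base not in ("CODE", None):     # neither "relativeto CODE" nor "relativeto 0"
        raise ValueError("incorrect argument")
    # "dw value - CODE" and "dw value" both store just the offset part
    return bytes([0xCD]) + offset.to_bytes(2, "little")


routine = ("CODE", 0x1000)    # "routine:" at CODE + CODE_OFFSET (1000h)
variable = ("DATA", 0x2000)   # "variable db ?" at DATA + DATA_OFFSET (2000h)

print(encode_call(routine).hex(" "))       # CALL routine
print(encode_call((None, 0x1000)).hex(" "))  # CALL 1000h
```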
  When the labels have values that are not absolute numbers, it is
possible to generate relocations for instructions that use them.
A special "virtual" block may be used to store the offsets of values
inside the program that need to be relocated when its base changes:

        virtual at 0
                Relocations::
                rw RELOCATION_COUNT
        end virtual

        RELOCATION_INDEX = 0

        postpone
                RELOCATION_COUNT := RELOCATION_INDEX
        end postpone

        macro WORD? value
                if value relativeto CODE
                        store $ - CODE : 2 at Relocations : RELOCATION_INDEX shl 1
                        RELOCATION_INDEX = RELOCATION_INDEX + 1
                        dw value - CODE
                else
                        dw value
                end if
        end macro

        macro CALL? argument
                local value
                value = argument
                if value relativeto CODE | value relativeto 0
                        db 0CDh
                        word value
                else
                        err "incorrect argument"
                end if
        end macro

The table of relocations that is created this way can then be accessed
with "load". The following two lines could be used to put the table
in its entirety somewhere in the output:

        load RELOCATIONS : RELOCATION_COUNT shl 1 from Relocations : 0
        dw RELOCATIONS

The "load" reads the whole table into a single string, then "dw" writes it
into the output (padded to a multiple of a word, but in this case the string
never requires such padding, because every entry is two bytes long).
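The bookkeeping done by "WORD", "store" and the final "load" can be followed in a Python model. It assumes the code starts at CODE + 1000h, as in the earlier DATA/CODE sample, and records for every relocatable word its offset relative to CODE. The class and method names are hypothetical; in Python the final count is simply known at the end, while the assembler needs "postpone" and a forward reference to learn it.

```python
CODE_OFFSET = 0x1000   # where the code starts relative to CODE, as in the sample


class RelocatingAssembler:
    """Toy model of the WORD/CALL macroinstructions and the Relocations area."""

    def __init__(self):
        self.code = bytearray()
        self.relocations = []          # contents of the "Relocations" area

    def here(self) -> int:
        return CODE_OFFSET + len(self.code)      # "$ - CODE"

    def word(self, offset: int, relocatable: bool) -> None:
        if relocatable:                # "value relativeto CODE"
            self.relocations.append(self.here())  # "store" and index update
        self.code += offset.to_bytes(2, "little")

    def call(self, offset: int, relocatable: bool) -> None:
        self.code.append(0xCD)
        self.word(offset, relocatable)

    def relocation_table(self) -> bytes:
        # what "load RELOCATIONS ..." followed by "dw RELOCATIONS" emits
        return b"".join(r.to_bytes(2, "little") for r in self.relocations)


asm = RelocatingAssembler()
asm.call(0x1000, relocatable=True)    # CALL routine (routine = CODE + 1000h)
asm.call(0x0038, relocatable=False)   # CALL 38h, an absolute address
asm.call(0x1000, relocatable=True)
print(asm.code.hex(" "), "|", asm.relocation_table().hex(" "))
```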
  For more complex types of relocations, additional modifiers may need to be
employed. For example, if the upper and lower portions of an address needed to be
stored in separate places (likely across two instructions) and relocated
separately, the necessary modifiers could be implemented as follows:

        element MOD.HIGH
        element MOD.LOW

        HIGH? equ MOD.HIGH +
        LOW? equ MOD.LOW +

        macro BYTE? value
                if value relativeto MOD.HIGH + CODE
                        ; register HIGH relocation
                        db (value - MOD.HIGH - CODE) shr 8
                else if value relativeto MOD.LOW + CODE
                        ; register LOW relocation
                        db (value - MOD.LOW - CODE) and 0FFh
                else if value relativeto MOD.HIGH
                        db (value - MOD.HIGH) shr 8
                else if value relativeto MOD.LOW
                        db (value - MOD.LOW) and 0FFh
                else
                        db value
                end if
        end macro

The commands that would register a relocation have been omitted for clarity;
in this case not only the offset within code but also some additional information
would need to be registered in appropriate structures. With such preparation,
relocatable units in code might be generated like this:

        BYTE HIGH address
        BYTE LOW address

Such an approach makes it easy to enable syntax with modifiers in any instruction
that internally uses the "byte" macroinstruction when generating code.
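The byte selected by each branch of the "BYTE" macroinstruction can be modeled in Python. In this sketch (hypothetical names) the offset is the part of the value left after removing MOD.HIGH or MOD.LOW and CODE, and the returned relocation kind marks where the omitted registration commands would act. A CODE-relative value without a modifier is simply rejected, since this model does not cover it.

```python
def encode_byte(offset: int, modifier=None, relative_to_code: bool = False):
    """Return (byte value, relocation kind or None) for the BYTE model."""
    if modifier == "HIGH":
        value = offset >> 8            # "(value - MOD.HIGH ...) shr 8"
    elif modifier == "LOW":
        value = offset & 0xFF          # "(value - MOD.LOW ...) and 0FFh"
    elif relative_to_code:
        raise ValueError("not handled by this model")
    else:
        value = offset                 # plain "db value"
    relocation = modifier if relative_to_code else None
    return value, relocation


address = 0x1234                       # an address relative to CODE
print(encode_byte(address, "HIGH", relative_to_code=True))   # BYTE HIGH address
print(encode_byte(address, "LOW", relative_to_code=True))    # BYTE LOW address
```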


How can multiple sections of a file be generated in parallel?

This assembly engine has a single main output that has to be generated
sequentially. This may seem problematic when the file needs to contain
distinct sections for code and data, collected from interleaved pieces that
may be spread across multiple source files. There are, however, a couple of
methods to handle it, all based in one way or another on the forward-referencing
capabilities of the assembler.
  A natural approach is to define the contents of an auxiliary section in a "virtual"
block and copy them to the appropriate position in the output with a single
operation. When a "virtual" block is labeled, it can be re-opened multiple
times to append more data to it.

                include '8086.inc'
                org     100h
                jmp     CodeSection

        DataSection:

                virtual
                        Data::
                end virtual

                postpone
                        virtual Data
                                load Data.OctetString : $ - $$ from $$
                        end virtual
                end postpone

                db Data.OctetString

        CodeSection:

                virtual Data
                        Hello db "Hello!",24h
                end virtual

                mov     ah,9
                mov     dx,Hello
                int     21h

                virtual Data
                        ExitCode db 37h
                end virtual

                mov     ah,4Ch
                mov     al,[ExitCode]
                int     21h

This leads to a relatively simple syntax even without the help of additional
macros.
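Because the "virtual" block in the sample is opened right where its contents are later emitted, every label defined inside it already has its final address. The Python sketch below models only that address bookkeeping; the class name is hypothetical, and the 2-byte size of the initial "jmp" is an assumption, since the actual size depends on the encoding chosen by the included headers. The multi-pass resolution that lets "db Data.OctetString" use contents collected later is not modeled.

```python
class AppendableSection:
    """Toy model of a labeled "virtual" block that is re-opened to append."""

    def __init__(self, base: int):
        self.base = base               # address the contents will be copied to
        self.contents = bytearray()
        self.labels = {}

    def define(self, name: str, data: bytes) -> int:
        # like "virtual Data" / "name db ..." / "end virtual"
        address = self.base + len(self.contents)
        self.labels[name] = address
        self.contents += data
        return address


# Assumed layout: "org 100h" followed by a 2-byte short "jmp".
data = AppendableSection(base=0x100 + 2)
hello = data.define("Hello", b"Hello!\x24")
exit_code = data.define("ExitCode", bytes([0x37]))
# At the end ("postpone"), the whole area is what "db Data.OctetString" emits,
# and the code section begins right after it.
image = bytes(data.contents)
code_section = data.base + len(image)
print(hex(hello), hex(exit_code), hex(code_section))
```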
  Another method could be to put the pieces of the section into macros and
execute them all at the required position in the source. A disadvantage of such
an approach is that tracing errors in definitions might become a bit cumbersome.
  The techniques that make it easy to append to a section generated in
parallel can also be very useful for generating data structures like relocation
tables. Instead of the "store" commands used earlier when demonstrating
the concept, regular data directives could be used inside a re-opened
"virtual" block to create relocation records.


What options are there to parse other kinds of syntax?

In some cases a command that the assembler needs to parse may begin with
something other than the name of an instruction or a label. It may be
that a name is preceded by a special character, like "." or "!",
or that it is an entirely different kind of construction. It is then
necessary to use "macro ?" to intercept whole lines of source text
and process any special syntax of such kind.
  For example, if it were required to allow a command written as ".CODE",
it would not be possible to implement it directly as a macroinstruction,
because the initial dot causes the symbol to be interpreted as a local one,
and a globally defined instruction could never be executed this way.
The intercepting macroinstruction provides a solution:

        macro ? line&
                match .=CODE?, line
                        CODE
                else match .=DATA?, line
                        DATA
                else
                        line
                end match
        end macro

The lines that contain either ".CODE" or ".DATA" text are processed here
in such a way that they invoke the global macroinstruction with
corresponding name, while all other intercepted lines are executed without
changes. This method allows to filter out any special syntax and let
the assembler process the regular instructions as usual.
  Sometimes unconventional syntax is expected only in a specific area
of source text, like inside a block with defined boundaries. The
parsing macroinstruction should then be applied only in this place,
and removed with "purge" when the block ends:

        macro concise
                macro ? line&
                        match =end =concise, line
                                purge ?
                        else match dest+==src, line
                                ADD dest,src
                        else match dest-==src, line
                                SUB dest,src
                        else match dest==src, line
                                LD dest,src
                        else match dest++, line
                                INC dest
                        else match dest--, line
                                DEC dest
                        else match any, line
                                err "syntax error"
                        end match
                end macro
        end macro

        concise
                C=0
                B++
                A+=2
        end concise

A macroinstruction defined this way does not intercept lines that contain
directives controlling the flow of the assembly, like "if" or "repeat", and
they can still be used freely inside such a block. This would change if
the declaration were in the form "macro ?! line&". Such a variant would
intercept every line with no exception.
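The translation performed inside the "concise" block can be approximated in Python, with regular expressions standing in for the "match" patterns. This is only an approximation (the assembler matches tokens, not characters), the handling of "end concise" is left out, and the names are hypothetical. As in the macroinstruction, "+=" and "-=" have to be tried before the plain "=".

```python
import re

RULES = [
    (re.compile(r"^(.+?)\+=(.+)$"), "ADD {0},{1}"),   # dest+==src
    (re.compile(r"^(.+?)-=(.+)$"), "SUB {0},{1}"),    # dest-==src
    (re.compile(r"^(.+?)=(.+)$"), "LD {0},{1}"),      # dest==src
    (re.compile(r"^(.+?)\+\+$"), "INC {0}"),          # dest++
    (re.compile(r"^(.+?)--$"), "DEC {0}"),            # dest--
]


def translate(line: str) -> str:
    text = line.strip()
    if not text:                       # an empty line matches nothing: no error
        return ""
    for pattern, template in RULES:
        m = pattern.match(text)
        if m:
            return template.format(*(g.strip() for g in m.groups()))
    raise ValueError("syntax error")


for line in ("C=0", "B++", "A+=2"):
    print(line, "->", translate(line))
```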
  Another option to catch special commands might be to use "struc ?"
to intercept only lines that do not start with a known instruction
(the initial symbol is then treated as a label). Since this one only tests
unknown commands, it should add less overhead to the assembly:

        struc (head) ? tail&
                match .=CODE?, head
                        CODE tail
                else
                        head tail
                end match
        end struc

  All these approaches hide a subtle trap. A label defined with ":" may be
followed by another instruction in the same line. If that next instruction
(which here becomes hidden in the "tail" parameter) is a control directive
like "if", putting it inside the "else" clause is going to cause broken nesting
of control blocks. A possible solution is to invoke the contents of "tail"
outside of the "match" block. One way could be to call a special macro:

        struc (head) ? tail&
                local invoker
                match .=CODE?, head
                        macro invoker
                                CODE tail
                        end macro
                else
                        macro invoker
                                head tail
                        end macro
                end match
                invoker
        end struc

A simpler option is to execute the original line directly and, when an override
is needed, cause it to be ignored with the help of another line interceptor
(which disposes of itself immediately afterwards):

        struc (head) ? tail&
                match .=CODE?, head
                        CODE tail
                        macro ? line&
                                purge ?
                        end macro
                end match
                head tail
        end struc

However, a much better way of avoiding these kinds of pitfalls is to use
CALM instructions instead of standard macros. There it is possible to
process arguments and assemble the original or modified line without
use of any control directives. CALM instructions also offer a much better
performance, which might be especially important in case of interceptors
that get called for nearly every line in source text.


How to define an instruction sharing a name with one of the core directives?

It may happen that a language can in general be easily implemented with
macros, but it needs to include a command with the same name as one of
the directives of the assembler. While it is possible to override any
instruction with a macro, macros themselves may require access to
the original directive. To allow the same name to call a different instruction
depending on the context, the implemented language may be interpreted
within a namespace that contains the overriding macro, while all the macros
requiring access to the original directive would have to temporarily switch
to another namespace where it has not been overridden. This would
require every such macro to pack its contents in a "namespace" block.
  But there is another trick, related to how the texts of macro parameters
or symbolic variables preserve the context under which the symbols within
them should be interpreted (this includes the base namespace and
the parent label for symbols starting with a dot).
  Unlike the two mentioned occurrences, the text of a macro normally does
not carry such extra information, but if a macro is constructed in such a way
that it contains text that was once carried within a parameter to another
macro or within a symbolic variable, then this text retains the information
about context even when it becomes a part of a newly defined macro.
For example:

        macro definitions end?
                namespace embedded
                struc LABEL? size
                        match , size
                                .:
                        else
                                label . : size
                        end match
                end struc
                macro E#ND? name
                        end namespace
                        match any, name
                                ENTRYPOINT := name
                        end match
                        macro ?! line&
                        end macro
                end macro
        end macro

        definitions end

        start LABEL
        END start

The parameter given to the "definitions" macro may appear to do nothing, as it
replaces every instance of "end" with exactly the same word - but the text
that comes from the parameter is equipped with additional information about
context, and this attribute is then preserved when the text becomes a part
of a new macro. Thanks to that, macro "LABEL" can be used in a namespace
where the "end" instruction has taken a different meaning, but the instances
of "end" within its body still refer to the symbol in the outer namespace.
  In this example the parameter has been made case-insensitive, and thus
it would replace even the "END" in "macro" statement that is supposed to
define a symbol in "embedded" namespace. For this reason the identifier
has been split with a concatenation operator to prevent it from being
recognized as a parameter. This would not be necessary if the parameter
were case-sensitive (as is more usual).
  The same effect can be achieved through use of symbolic variables instead
of macro parameters, with help of "match" to extract the text of a symbolic
variable:

        define link end
        match end, link
                namespace embedded
                struc LABEL? size
                        match , size
                                .:
                        else
                                label . : size
                        end match
                end struc
                macro END? name
                        end namespace
                        match any, name
                                ENTRYPOINT := name
                        end match
                        macro ?! line&
                        end macro
                end macro
        end match

        start LABEL
        END start

This would not work without passing the text through a symbolic variable,
because parameters defined by control directives like "match" do not
add context information to the text unless it was already there.
  CALM instructions allow for another approach to this kind of problem.
If a customized instruction set is defined entirely in the form of CALM,
it may not even need access to the original control directives.
However, if a CALM instruction needs to assemble a directive that might not
be accessible, the symbolic variable passed to "assemble" should be
defined with an appropriate context for the instruction symbol.


How to convert a macroinstruction to CALM?

A classic macroinstruction consists of lines of text that are preprocessed
(by replacing names of parameters with their corresponding values) every time
the instruction is called and these preprocessed lines are passed to assembly.
For example, this macroinstruction generates just a single line to be assembled,
and it does so by replacing "value" with the text given by the only argument
to the instruction:

        macro octet value*
                db value
        end macro

A CALM instruction can be viewed as a customized preprocessor, which needs to
be written in a special language. It is able to use various commands to
process the arguments and generate lines to be assembled. At the basic
level, it is also able to simulate what the standard preprocessor does, with
the help of the "arrange" command. After preprocessing the line, it also needs to
explicitly pass it to the assembly with an "assemble" command:

        calminstruction octet value*
                arrange value, =db value
                assemble value
        end calminstruction

This gives the same result as the original macroinstruction, as it performs
the same kind of preprocessing. However, unlike the text of a macroinstruction,
a pattern given to "arrange" needs to explicitly state which name tokens are
to be replaced with their values and which ones (prepended with "=") should
be left untouched. The tokens that are copied from the pattern are stripped of
any context information, just like the text of a macroinstruction normally does not
carry any (while the values that came from arguments retain the recognition
context in which the instruction was started).
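  When a macroinstruction generates several lines, each of them can be
converted in the same way, with its own pair of "arrange" and "assemble"
commands. As a sketch, here is a hypothetical two-line macroinstruction
followed by a CALM instruction intended to assemble the same lines:

        ; hypothetical example, classic form
        macro word_pair first*, second*
                dw first
                dw second
        end macro

        ; the same instruction converted to CALM
        calminstruction word_pair first*, second*
                local line
                arrange line, =dw first         ; "dw" stays literal
                assemble line
                arrange line, =dw second
                assemble line
        end calminstruction

Only one of the two definitions would be used in practice. They are shown
together for comparison. A single local variable is enough to hold each line
in turn, because every line is passed to assembly before being overwritten.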
  This is the most straightforward method of conversion, and a simple sequence
of "arrange" and "assemble" commands could be made to generate the same lines as
the original macroinstruction. But there is one exception: when a "local"
command is executed by a macroinstruction, it creates a preprocessed parameter
with a special value that points to a symbol in the namespace unique to the given
instance of the instruction.

        macro pointer
                local next
                dd next
            next:
        end macro

In the case of CALM there is no such namespace available; the local namespace of
a CALM instruction is shared among all its instances. Therefore, if a new unique
symbol is needed every time the instruction is called, it has to be constructed
manually. An obvious method might be to append a unique number to the name:

        global_uid = 0

        calminstruction pointer
                compute global_uid, global_uid + 1
                local command
                arrange command, =dd =next#global_uid
                assemble command
                arrange command, =next#global_uid:
                assemble command
        end calminstruction

Here "arrange" is given a variable that has a numeric value, and it has to
replace it with text. This works only when the value is a plain non-negative
number; in such a case "arrange" converts it to a text token that contains the decimal
representation of that number. The lines passed to assembly are therefore
going to contain identifiers like "next#1".
  While incrementing the global counter could be done by preparing
a standard assembly command like "global_uid = global_uid + 1" with "arrange"
and passing it to assembly, the "compute" command allows this to be done directly
in the CALM processor. Moreover, it is then not affected by anything that alters
the context of assembly. If the instruction was defined as unconditional and
used inside a skipped IF block, the "compute" would still perform its task,
because execution of CALM commands is, just like standard preprocessing, done
independently from the main flow of the assembly. Also, references to
"global_uid" always point to the same symbol: the one that was in scope
when the CALM instruction was defined and compiled. Therefore incrementing
the value with "compute" is more reliable and predictable.
  In a similar manner, the assembly of the line defining the label can be replaced
with a "publish" command. Here the value of the label (which should be equal
to the address after the line containing "dd" is assembled) needs to be computed
first, because "publish" only performs the assignment of a value to the symbol:

        global_uid = 0

        calminstruction pointer
                compute global_uid, global_uid + 1
                local symbol, command
                arrange symbol, =next#global_uid
                arrange command, =dd symbol
                assemble command
                local address
                compute address, $
                publish symbol:, address
        end calminstruction

Because the CALM instruction itself is conditional, the "publish" inside is
effectively conditional, too. Therefore it works correctly as a replacement
for the assembly of a line with a label.
  While a global counter has several advantages, it can be interfered with,
so sometimes the use of a local counter might be preferable. However, the local
namespace of a CALM instruction is normally not accessible from outside, so
it is a bit harder to give an initial value to such a counter. One way could be
to check whether the counter has already been initialized with some value using
the "take" command:

        calminstruction pointer
                local id
                take id, id
                jyes increment
                compute id, 0
            increment:
                compute id, id + 1
                local symbol, command
                arrange symbol, =next#id
                arrange command, =dd symbol
                assemble command
                local address
                compute address, $
                publish symbol:, address
        end calminstruction

But this adds commands that are executed every time the instruction is called.
A better solution makes use of the ability to define custom instructions
processed during the definition of a CALM instruction:

        calminstruction calminstruction?.init? var*, val:0
                compute val, val
                publish var, val
        end calminstruction

        calminstruction pointer
                local id
                init id, 0
                compute id, id + 1
                local symbol, command
                arrange symbol, =next#id
                arrange command, =dd symbol
                assemble command
                local address
                compute address, $
                publish symbol:, address
        end calminstruction

The custom statement "init" is called at the time when the CALM instruction is
defined (it does not generate any commands to be executed by the defined
instruction; it would itself have to use "assemble" commands to generate
statements to be compiled). It is given the name of a variable from the local
scope of the CALM instruction, and it uses "publish" to assign an initial
numeric value to that variable.
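  Such a local counter can hold any per-instruction state. As a further
sketch, the hypothetical "tick" instruction below uses the "init" statement
defined above to start its own counter at zero, and emits a byte containing
the number of the current call (relying on "arrange" converting a plain
non-negative number to its decimal text):

        calminstruction tick            ; hypothetical name
                local counter, line
                init counter, 0         ; runs when "tick" is defined, not when called
                compute counter, counter + 1
                arrange line, =db counter
                assemble line
        end calminstruction

        tick                            ; assembles "db 1"
        tick                            ; assembles "db 2"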
  To initialize a local variable with a symbolic value, an even simpler custom
instruction would suffice:

        calminstruction calminstruction?.initsym? var*, val&
                publish var, val
        end calminstruction

The text of the "val" argument carries the recognition context of the definition
of the CALM instruction that contains the "initsym" statement, which makes it
possible to prepare a text for "assemble" containing references to local symbols:

        calminstruction be32? value
                local command
                initsym command, dd value
                compute value, value bswap 4
                assemble command
        end calminstruction

Again, after this instruction is compiled, it contains just two actual commands,
"compute" and "assemble", and the value of the local symbol "command" is a text
that is interpreted in the same local context and refers to the same symbol
"value" as the "compute" does.
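  The same pattern carries over to other sizes. Below is a minimal sketch of
a 16-bit counterpart under the hypothetical name "be16", relying on "initsym"
as defined above, together with a few example calls:

        calminstruction be16? value     ; hypothetical name
                local command
                initsym command, dw value
                compute value, value bswap 2
                assemble command
        end calminstruction

        be16 1234h              ; stores bytes 12h, 34h
        be32 12345678h          ; stores bytes 12h, 34h, 56h, 78h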
  This example also demonstrates another advantage of CALM over standard
macroinstructions: its strict semantics prevent various kinds of unwanted
behavior that is allowed by a simple substitution of text. The text of "value"
is going to be evaluated by "compute" as a numeric sub-expression, signalling
an error on any unexpected syntax. Therefore it is preferable to process
arguments entirely through CALM commands and only use "assemble" for final
simple statements.
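  As a sketch of that approach, the hypothetical "checked_octet" below
validates its argument with CALM commands before assembling the final "db"
line. It assumes that the "check" command sets the condition flag according
to a logical expression in the same syntax as "if", and that "err" signals
an error with the given message, like the directive of the same name:

        calminstruction checked_octet? value*   ; hypothetical name
                local line
                compute value, value
                check value >= 0 & value < 100h
                jyes valid
                err 'value out of range'
                compute value, 0        ; continue with a harmless value
            valid:
                arrange line, =db value
                assemble line
        end calminstruction

        checked_octet 200       ; assembles "db 200"
        checked_octet 300       ; signals an error

Because "value" has already been computed, "arrange" receives a plain
non-negative number and converts it to decimal text.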