asm_dip/toolchain/fasmg.kl0e/docs/fasmg.txt
2024-11-24 23:13:28 -05:00

What is flat assembler g?
It is an assembly engine designed as a successor of the one used in
flat assembler 1, one of the recognized assemblers for x86 processors.
This is a bare engine that by itself has no ability to recognize and
encode instructions of any processor, however it has the ability to
become an assembler for any CPU architecture. It has a macroinstruction
language that is substantially improved compared to the one provided by
flat assembler 1, and it makes it easy to implement instruction encoders
in the form of customizable macroinstructions.
The source code of this tool can be compiled with flat assembler 1,
but it is also possible to use flat assembler g itself to compile it.
The source contains clauses that include different header files depending
on the assembler used. When flat assembler g compiles itself, it uses
the provided set of headers that implement x86 instructions and formats
with a syntax mostly compatible with flat assembler 1.
The example programs for x86 architecture that come in this package are
the selected samples that originally came with flat assembler 1 and they
use sets of headers that implement instruction encoders and output formatters
required to assemble them just like the original flat assembler did.
To demonstrate how the instruction sets of different architectures
may be implemented, there are some example programs for the microcontrollers,
8051 and AVR. They have been kept simple and therefore they do not provide
a complete framework for programming such CPUs, though they may provide
a solid base for the creation of such environments.
There is also an example of assembling the JVM bytecode, which is
a conversion of the sample originally created for flat assembler 1. For this
reason it is somewhat crude and does not fully utilize the capabilities
offered by the new engine. However it is good at visualising the structure
of a class file.
How does this work?
The essential function of flat assembler g is to generate output defined
by the instructions in the source code. Given the one line of text as
shown below, the assembler would generate a single byte with the stated
value:
db 90h
The macroinstructions can be defined to generate some specific
sequences of data depending on the provided parameters. They may correspond
to the instructions of chosen machine language, as in the following example,
but they could as well be defined to generate other kinds of data, for
various purposes.
macro int number
if number = 3
db 0CCh
else
db 0CDh, number
end if
end macro
int 20h ; generates two bytes
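As an illustration of the "other kinds of data" case, the following is a minimal sketch of a macroinstruction that emits a length-prefixed string. The name "pstring" and the layout of its output are hypothetical, chosen only for this example.

```fasmg
; "pstring" is an illustrative name, not part of any standard header
macro pstring text&
	local first, last
	db last - first		; length byte, computed from the labels below
first:
	db text
last:
end macro

pstring 'hello'			; should generate 5 followed by the five characters
```

The length byte refers to labels that are defined only later in the macroinstruction, which already hints at the forward-referencing abilities discussed next.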
The assembly as seen this way may be considered a kind of interpreted
language, and the assembler certainly has many characteristics of the
interpreter. However it also shares certain aspects with a compiler.
It is possible for an instruction to use the value which is defined
later in the source and may depend on the instructions that come before
that definition, as demonstrated by the following sample.
macro jmpi target
if target-($+2) < 80h & target-($+2) >= -80h
db 0EBh
db target-($+1)
else
db 0E9h
dw target-($+2)
end if
end macro
jmpi start
db 'some data'
start:
The "jmpi" defined above produces the code of jump instruction as
in 8086 architecture. Such code contains the relative offset of the
target of a jump, stored in either single byte or 16-bit word.
The relative offset is computed as a difference between the address
of the target and the address of the next instruction. The special
symbol "$" provides the address of current instruction and it is
used to calculate the relative offset and determine whether it may
fit in a single byte.
Therefore the code generated by "jmpi start" in the above sample
depends on the value of an address labeled as "start", and this
in turn depends on the length of the output of all the instructions
that precede it, including the said jump. This creates a loop of
dependencies and the assembler needs to find a solution that
fulfills all the constraints created by the source text. This would
not be possible if the assembler were just an imperative interpreter.
Its language is thus in some aspects declarative.
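To make the dependency concrete, the sample can be followed through with actual numbers. The sketch below repeats the "jmpi" definition and adds "assert" lines that state the expected result, assuming the default base address of zero:

```fasmg
macro jmpi target
	if target-($+2) < 80h & target-($+2) >= -80h
		db 0EBh
		db target-($+1)
	else
		db 0E9h
		dw target-($+2)
	end if
end macro

	jmpi start		; short form: bytes 0EBh, 9
	db 'some data'		; 9 bytes at addresses 2..10
start:
	assert start = 11	; 2 bytes of jump + 9 bytes of data
	assert start - 2 = 9	; relative offset stored in the jump
```

Only the short form is consistent here: it places "start" at 11, the offset 9 fits in a byte, and so the short form is indeed the one that should be chosen.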
Finding a solution for such circular dependencies may resemble
solving an equation, and it is even possible to construct an example
where flat assembler g is indeed capable of solving one:
x = (x-1)*(x+2)/2-2*(x+1)
db x
The circular reference has been reduced here to a single definition
that references itself to construct the value. The flat assembler g
is able to find a solution in this case, though in many others it may
fail. The method used by this assembler is to perform multiple passes
over the source text and then try to predict all the values with the
knowledge gathered this way. This approach is in most cases good enough
for the assembly of machine codes, but rarely suffices to solve the
complex equations and the above sample is one of the exceptions.
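The sample equation can also be checked by hand. Multiplying both sides by 2 gives 2*x = (x-1)*(x+2) - 4*(x+1), which simplifies to x*x - 5*x - 6 = 0, so the candidates are x = 6 and x = -1. The product (x-1)*(x+2) is always even, so the integer division loses nothing. A hedged sketch that verifies whichever solution the assembler settles on:

```fasmg
; x*x - 5*x - 6 = 0  =>  x = 6 or x = -1
x = (x-1)*(x+2)/2-2*(x+1)
assert x = 6 | x = -1	; accepts either root of the equation
db x
```

As a simplified picture only (the actual prediction logic may differ): if each pass just reused the value from the previous one, starting from 0, the sequence would be 0, -3, 6, 6, settling on 6.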
What are the means of parsing the arguments of an instruction?
Not all instructions have a simple syntax like the ones in the
previous examples. To aid in the processing of arguments that may
contain special constructions, flat assembler g provides a few
capable tools, demonstrated below on the examples that implement
selected few instructions of the Z80 processor. The rules governing
the use of presented features are found in the manual.
When an instruction has a very small set of allowed arguments,
each one of them can be treated separately with the "match"
construction:
macro EX? first,second
match (=SP?), first
match =HL?, second
db 0E3h
else match =IX?, second
db 0DDh,0E3h
else match =IY?, second
db 0FDh,0E3h
else
err "incorrect second argument"
end match
else match =AF?, first
match =AF'?, second
db 08h
else
err "incorrect second argument"
end match
else match =DE?, first
match =HL?, second
db 0EBh
else
err "incorrect second argument"
end match
else
err "incorrect first argument"
end match
end macro
EX (SP),HL
EX (SP),IX
EX AF,AF'
EX DE,HL
The "?" character appears in many places to mark the names as
case-insensitive and all these occurrences could be removed to
further simplify the example.
When the set of possible values of an argument is larger but
has some regularities, the textual substitutions can be defined
to replace some of the symbols with carefully chosen constructions
that can then be recognized and parsed:
A? equ [:111b:]
B? equ [:000b:]
C? equ [:001b:]
D? equ [:010b:]
E? equ [:011b:]
H? equ [:100b:]
L? equ [:101b:]
macro INC? argument
match [:r:], argument
db 100b + r shl 3
else match (=HL?), argument
db 34h
else match (=IX?+d), argument
db 0DDh,34h,d
else match (=IY?+d), argument
db 0FDh,34h,d
else
err "incorrect argument"
end match
end macro
INC A
INC B
INC (HL)
INC (IX+2)
This approach has a trait that may not always be desirable:
it allows an expression like "[:0:]" to be used directly in an argument.
But it is possible to prevent exploiting the syntax in such a way
by using a prefix in the "match" construction:
REG.A? equ [:111b:]
REG.B? equ [:000b:]
REG.C? equ [:001b:]
REG.D? equ [:010b:]
REG.E? equ [:011b:]
REG.H? equ [:100b:]
REG.L? equ [:101b:]
macro INC? argument
match [:r:], REG.argument
db 100b + r shl 3
else match (=HL?), argument
db 34h
else match (=IX?+d), argument
db 0DDh,34h,d
else match (=IY?+d), argument
db 0FDh,34h,d
else
err "incorrect argument"
end match
end macro
In case of an argument structured like "(IX+d)" it could sometimes
be desired to allow other algebraically equivalent forms of the
expression, like "(d+IX)" or "(c+IX+d)". Instead of parsing every
possible variant individually, it is possible to let the assembler
evaluate the expression while treating the selected symbol in a distinct
way. When a symbol is declared as an "element", it has no value and
when it is used in an expression, it is treated algebraically like
a variable term in a polynomial.
element HL?
element IX?
element IY?
macro INC? argument
match [:r:], REG.argument
db 100b + r shl 3
else match (a), argument
if a eq HL
db 34h
else if a relativeto IX
db 0DDh,34h,a-IX
else if a relativeto IY
db 0FDh,34h,a-IY
else
err "incorrect argument"
end if
else
err "incorrect argument"
end match
end macro
INC (3*8+IX+1)
virtual at IX
x db ?
y db ?
end virtual
INC (y)
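What the assembler sees in such an argument can be examined in isolation. The following sketch (the variable name "value" is arbitrary) evaluates the same expression as "INC (3*8+IX+1)" and inspects the resulting linear polynomial with the "element" and "scale" operators described in the manual:

```fasmg
element IX?

value = 3*8 + IX + 1		; linear polynomial: 1*IX + 25

assert value relativeto IX	; differs from IX only by a number
assert value element 1 eq IX	; the variable of the first term
assert value scale 1 = 1	; its coefficient
assert value scale 0 = 25	; the constant term
db value - IX			; a plain number, 25
```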
There is a small problem with the above macroinstruction. A parameter
may contain any text and when such value is placed into an expression,
it may induce erratic behavior. For example if "INC (1|0)" was processed,
it would turn the "a eq HL" expression into "1|0 eq HL" and this logical
expression is correct and true even though the argument was malformed.
Such an unfortunate side effect is a consequence of macroinstructions
operating on a simple principle of text substitution (and the best way
to avoid such problems is to use CALM instead). Here, to prevent it
from happening, a local variable may be used as a proxy holding the value
of an argument:
macro INC? argument
match [:r:], REG.argument
db 100b + r shl 3
else match (a), argument
local value
value = a
if value eq HL
db 34h
else if value relativeto IX
db 0DDh,34h,a-IX
else if value relativeto IY
db 0FDh,34h,a-IY
else
err "incorrect argument"
end if
else
err "incorrect argument"
end match
end macro
There is an additional advantage of such a proxy variable, thanks to
the fact that its value is computed before the macroinstruction begins
to generate any output. When an expression contains a symbol like "$",
it may give different values depending on where it is calculated, and
the use of a proxy variable ensures that the value taken is the one
obtained by evaluating the argument before generating the code of
an instruction.
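The effect on "$" can be shown with two small macroinstructions. Both names are hypothetical and the 0FFh byte merely stands in for an opcode:

```fasmg
macro pair_direct value
	db 0FFh
	db value		; "$" is evaluated here, one byte further
end macro

macro pair_proxy value
	local proxy
	proxy = value		; evaluated before any output is generated
	db 0FFh
	db proxy
end macro

org 0
pair_direct $		; bytes 0FFh, 1
pair_proxy $		; starts at address 2: bytes 0FFh, 2
```

In the second case the stored value is the address at which the instruction begins, regardless of how many bytes precede the operand in the generated code.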
When the set of symbols allowed in expressions is larger, it is
better to have a single construction to process an entire family
of them. An "element" declaration may associate an additional value
with a symbol and this information can then be retrieved with
the "metadata" operator applied to a linear polynomial that contains
given symbol as a variable. The following example is another
variant of the previous macroinstruction that demonstrates the use
of this feature:
element register
element A? : register + 111b
element B? : register + 000b
element C? : register + 001b
element D? : register + 010b
element E? : register + 011b
element H? : register + 100b
element L? : register + 101b
element HL?
element IX?
element IY?
macro INC? argument
local value
match (a), argument
value = a
if value eq HL
db 34h
else if value relativeto IX
db 0DDh,34h,a-IX
else if value relativeto IY
db 0FDh,34h,a-IY
else
err "incorrect argument"
end if
else match any more, argument
err "incorrect argument"
else
value = argument
if value eq value element 1 & value metadata 1 relativeto register
db 100b + (value metadata 1 - register) shl 3
else
err "incorrect argument"
end if
end match
end macro
The "any more" pattern is there to catch any argument that
contains a complex expression consisting of more than one token.
This prevents the use of syntax like "INC A+0" or "INC A+B-A".
But in case of some instruction sets, the inclusion of such
a constraint may depend on personal preference.
The "value eq value element 1" condition ensures that the value does not
contain any terms other than the name of a register. Even when an argument
is forced to contain no more than a single token, it is still possible
that it has a complex value, for instance if there were definitions like
"X = A + B" or "Y = 2 * A". Both "INC X" and "INC Y" would then cause
the operator "element 1" to return the value "A", which differs from the
value checked in either case.
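The check can be tried out on its own. This sketch declares just two of the registers and uses "assert" to state what the condition is expected to yield for a bare register and for the two composite values mentioned above:

```fasmg
element register
element A? : register + 111b
element B? : register + 000b

X = A + B
Y = 2 * A

assert A eq A element 1			; a bare register passes the check
assert ~ X eq X element 1		; "A + B" has more than one term
assert ~ Y eq Y element 1		; "2 * A" has a coefficient other than 1
assert A metadata 1 - register = 111b	; the number stored with the element
```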
If an instruction takes a variable number of arguments, a simple
way to recognize its various forms is to declare an argument with "&"
modifier to pass the complete contents of the arguments to "match":
element CC
NZ? := CC + 000b
Z? := CC + 001b
NC? := CC + 010b
C? := CC + 011b
PO := CC + 100b
PE := CC + 101b
P := CC + 110b
M := CC + 111b
macro CALL? arguments&
local cc,nn
match condition =, target, arguments
cc = condition - CC
nn = target
db 0C4h + cc shl 3
else
nn = arguments
db 0CDh
end match
dw nn
end macro
CALL 0
CALL NC,2135h
This approach also makes it possible to handle other, more difficult cases, like when
the arguments may contain commas or are delimited in different ways.
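As a sketch of arguments delimited in a different way, consider a made-up "COPY" instruction whose operands are separated by the word "TO" instead of a comma. Both the instruction and its encoding (a byte 1 followed by the two operands) are invented for this example:

```fasmg
macro COPY? arguments&
	match source =TO? destination, arguments
		db 1, source, destination
	else
		err "expected: source TO destination"
	end match
end macro

COPY 10 TO 20		; bytes 1, 10, 20
copy 30 to 40		; the names are case-insensitive: bytes 1, 30, 40
```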
How are the labels processed?
A standard way of defining a label is by following its name with ":" (this
also acts like a line break and any other command, including another label,
may follow in the same line). Such label simply defines a symbol with
the value equal to the current address, which initially is zero and increases
when any bytes are added into the output.
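A short sketch of these rules, assuming nothing has been generated before it:

```fasmg
first: db 1, 2, 3		; "first" = 0, a command follows the label
second: third: db 4		; both labels = 3
	db second - first	; generates 3
	assert third = second
```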
In some variants of assembly language it may be desirable to allow a label
to precede an instruction without an additional ":" in between. It is then
necessary to create a labeled macroinstruction that after defining a label
passes processing to the original macroinstruction with the same name:
struc INC? argument
.:
INC argument
end struc
start INC A
INC B
This has to be done for every instruction that needs to allow this kind
of syntax. A simple loop like the following one would suffice:
iterate instruction, EX,INC,CALL
struc instruction? argument
.: instruction argument
end struc
end iterate
Every built-in instruction that defines data already has the labeled variant.
By defining a labeled instruction that has "?" in place of name it is
possible to intercept every line that starts with an identifier that is not
a known instruction and is therefore assumed to be a label. The following one
would allow a label without ":" to begin any line in the source text (it also
handles the special cases so that labels followed with ":" or with "=" and
a value would still work):
struc ? tail&
match :, tail
.:
else match : instruction, tail
.: instruction
else match == value, tail
. = value
else
.: tail
end match
end struc
Obviously, it is no longer necessary to define any specific labeled
macroinstructions when a global effect of this kind is applied. A variant
should be chosen depending on the type of syntax that needs to be allowed.
Intercepting even the labels defined with ":" may become useful when the
value of current address requires some additional processing before being
assigned to a label - for example when a processor uses addresses with a
unit larger than a byte. The intercepting macroinstruction might then look
like this:
struc ? tail&
match :, tail
label . at $ shr 1
else match : instruction, tail
label . at $ shr 1
instruction
else
. tail
end match
end struc
The value of current address that is used to define labels may be altered
with "org". If the labels need to be differentiated from absolute values,
a symbol defined with "element" may be used to form an address:
element CODEBASE
org CODEBASE + 0
macro CALL? argument
local value
value = argument
if value relativeto CODEBASE
db 0CDh
dw value - CODEBASE
else
err "incorrect argument"
end if
end macro
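The properties of a label defined under such "org" can be verified directly. A minimal sketch, using the same "CODEBASE" element and an arbitrary offset of 100h:

```fasmg
element CODEBASE
org CODEBASE + 100h

entry:
	assert entry relativeto CODEBASE	; the label carries the element
	assert ~ entry relativeto 0		; so it is not an absolute number
	dw entry - CODEBASE			; plain offset, 100h
```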
To define labels in an address space that is not going to be reflected in
the output, a "virtual" block should be declared. The following sample
prepares macroinstructions "DATA" and "CODE" to switch between generating
program instructions and data labels. Only the instruction codes would go to
the output:
element DATA
DATA_OFFSET = 2000h
element CODE
CODE_OFFSET = 1000h
macro DATA?
_END
virtual at DATA + DATA_OFFSET
end macro
macro CODE?
_END
org CODE + CODE_OFFSET
end macro
macro _END?
if $ relativeto DATA
DATA_OFFSET = $ - DATA
end virtual
else if $ relativeto CODE
CODE_OFFSET = $ - CODE
end if
end macro
postpone
_END
end postpone
CODE
The "postpone" block is used here to ensure that the "virtual" block
always gets closed correctly, even if source text ends with data
definitions.
Within the environment prepared by the above sample any instruction
would be able to distinguish data labels from the ones defined within
program. For example a branching instruction could be made to accept
an argument being either a label within a program or an absolute value,
but to disallow any label of data:
macro CALL? argument
local value
value = argument
if value relativeto CODE
db 0CDh
dw value - CODE
else if value relativeto 0
db 0CDh
dw value
else
err "incorrect argument"
end if
end macro
DATA
variable db ?
CODE
routine:
In this context either "CALL routine" or "CALL 1000h" would be allowed,
while "CALL variable" would not be.
When the labels have values that are not absolute numbers, it is
possible to generate relocations for instructions that use them.
A special "virtual" block may be used to store the offsets of values
inside the program that need to be relocated when its base changes:
virtual at 0
Relocations::
rw RELOCATION_COUNT
end virtual
RELOCATION_INDEX = 0
postpone
RELOCATION_COUNT := RELOCATION_INDEX
end postpone
macro WORD? value
if value relativeto CODE
store $ - CODE : 2 at Relocations : RELOCATION_INDEX shl 1
RELOCATION_INDEX = RELOCATION_INDEX + 1
dw value - CODE
else
dw value
end if
end macro
macro CALL? argument
local value
value = argument
if value relativeto CODE | value relativeto 0
db 0CDh
word value
else
err "incorrect argument"
end if
end macro
The table of relocations that is created this way can then be accessed
with "load". The following two lines could be used to put the table
in its entirety somewhere in the output:
load RELOCATIONS : RELOCATION_COUNT shl 1 from Relocations : 0
dw RELOCATIONS
The "load" reads the whole table into a single string, then "dw" writes it
into output (padded to multiple of a word, but in this case the string never
requires such padding).
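The mechanics of "load" from a labeled area can be seen in isolation in the following sketch, where the names "Table" and "contents" are arbitrary:

```fasmg
virtual at 0
	Table::
	dw 1234h, 5678h
end virtual

load contents : 4 from Table : 0	; the four bytes as a single string
db contents				; 34h, 12h, 78h, 56h
```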
For more complex types of relocations additional modifier may need to be
employed. For example, if upper and lower portions of an address needed to be
stored in separate places (likely across two instructions) and relocated
separately, necessary modifiers could be implemented as follows:
element MOD.HIGH
element MOD.LOW
HIGH? equ MOD.HIGH +
LOW? equ MOD.LOW +
macro BYTE? value
if value relativeto MOD.HIGH + CODE
; register HIGH relocation
db (value - MOD.HIGH - CODE) shr 8
else if value relativeto MOD.LOW + CODE
; register LOW relocation
db (value - MOD.LOW - CODE) and 0FFh
else if value relativeto MOD.HIGH
db (value - MOD.HIGH) shr 8
else if value relativeto MOD.LOW
db (value - MOD.LOW) and 0FFh
else
db value
end if
end macro
The commands that would register relocations have been omitted for clarity.
In this case not only the offset within code but also some additional information
would need to be registered in appropriate structures. With such preparation,
relocatable units in code might be generated like:
BYTE HIGH address
BYTE LOW address
Such an approach makes it easy to enable syntax with modifiers in any instruction
that internally uses the "byte" macroinstruction when generating code.
How can multiple sections of file be generated in parallel?
This assembly engine has a single main output that has to be generated
sequentially. This may seem problematic when the file needs to contain
distinct sections for code and data, collected from interleaved pieces that
may be spread across multiple source files. There are, however, a couple of
methods to handle it, all based in one way or another on forward-referencing
capabilities of the assembler.
A natural approach is to define contents of auxiliary section in "virtual"
block and copy it to appropriate position in the output with a single
operation. When a "virtual" block is labeled, it can be re-opened multiple
times to append more data to it.
include '8086.inc'
org 100h
jmp CodeSection
DataSection:
virtual
Data::
end virtual
postpone
virtual Data
load Data.OctetString : $ - $$ from $$
end virtual
end postpone
db Data.OctetString
CodeSection:
virtual Data
Hello db "Hello!",24h
end virtual
mov ah,9
mov dx,Hello
int 21h
virtual Data
ExitCode db 37h
end virtual
mov ah,4Ch
mov al,[ExitCode]
int 21h
This leads to a relatively simple syntax even without help of additional
macros.
Another method could be to put the pieces of the section into macros and
execute them all at the required position in source. A disadvantage of such
approach is that tracing errors in definitions might become a bit cumbersome.
The techniques that make it easy to append to a section generated in
parallel can also be very useful to generate data structures like relocation
tables. Instead of "store" commands used earlier when demonstrating
the concept, regular data directives could be used inside a re-opened
"virtual" block to create relocation records.
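A minimal sketch of that idea follows. The names "Relocs", "Relocs.Table" and "reloc_word" are hypothetical, and each record is assumed to be nothing more than a 16-bit offset into the main output:

```fasmg
virtual at 0
	Relocs::
end virtual

macro reloc_word value
	local offset
	offset = $		; position of the word in the main output
	virtual Relocs
		dw offset	; append one record to the table
	end virtual
	dw value
end macro

reloc_word 1234h
reloc_word 5678h

postpone
	virtual Relocs
		load Relocs.Table : $ - $$ from $$
	end virtual
end postpone

db Relocs.Table			; records 0 and 2, as 16-bit words
```

Unlike the variant based on "store", this one needs neither a counter nor a predicted size of the table, because the re-opened block simply grows with every record.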
What options are there to parse other kinds of syntax?
In some cases a command that assembler needs to parse may begin with
something different than a name of instruction or a label. It may be
that a name is preceded by a special character, like "." or "!",
or that it is an entirely different kind of construction. It is then
necessary to use "macro ?" to intercept whole lines of source text
and process any special syntax of such kind.
For example, if it was required to allow a command written as ".CODE",
it would not be possible to implement it directly as a macroinstruction,
because initial dot causes the symbol to be interpreted as a local one
and globally defined instruction could never be executed this way.
The intercepting macroinstruction provides a solution:
macro ? line&
match .=CODE?, line
CODE
else match .=DATA?, line
DATA
else
line
end match
end macro
The lines that contain either ".CODE" or ".DATA" text are processed here
in such a way that they invoke the global macroinstruction with
corresponding name, while all other intercepted lines are executed without
changes. This method makes it possible to filter out any special syntax and let
the assembler process the regular instructions as usual.
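The same pattern extends to dotted commands that take arguments. As a sketch, a hypothetical ".ORG" command could be forwarded to the built-in "org" directive like this:

```fasmg
macro ? line&
	match .=ORG? address, line
		org address
	else
		line
	end match
end macro

.ORG 100h
start:
	assert start = 100h
```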
Sometimes unconventional syntax is expected only in a specific area
of source text, like inside a block with defined boundaries. The
parsing macroinstruction should then be applied only in this place,
and removed with "purge" when the block ends:
macro concise
macro ? line&
match =end =concise, line
purge ?
else match dest+==src, line
ADD dest,src
else match dest-==src, line
SUB dest,src
else match dest==src, line
LD dest,src
else match dest++, line
INC dest
else match dest--, line
DEC dest
else match any, line
err "syntax error"
end match
end macro
end macro
concise
C=0
B++
A+=2
end concise
A macroinstruction defined this way does not intercept lines that contain
directives controlling the flow of the assembly, like "if" or "repeat", and
they can still be used freely inside such a block. This would change if
the declaration was in the form "macro ?! line&". Such a variant would
intercept every line with no exception.
Another option to catch special commands might be to use "struc ?"
to intercept only lines that do not start with a known instruction
(the initial symbol is then treated as label). Since this one only tests
unknown commands, it should cause less overhead on the assembly:
struc (head) ? tail&
match .=CODE?, head
CODE tail
else
head tail
end match
end struc
All these approaches hide a subtle trap. A label defined with ":" may be
followed by another instruction in the same line. If that next instruction
(which here becomes hidden in the "tail" parameter) is a control directive
like "if", putting it inside the "else" clause is going to cause broken nesting
of control blocks. A possible solution is to somehow invoke "tail" contents
outside of "match" block. One way could be to call a special macro:
struc (head) ? tail&
local invoker
match .=CODE?, head
macro invoker
CODE tail
end macro
else
macro invoker
head tail
end macro
end match
invoker
end struc
A simpler option is to call the original line directly and when override
is needed, cause it to be ignored with help of another line interceptor
(disposing of itself immediately after):
struc (head) ? tail&
match .=CODE?, head
CODE tail
macro ? line&
purge ?
end macro
end match
head tail
end struc
However, a much better way of avoiding this kind of pitfall is to use
CALM instructions instead of standard macros. There it is possible to
process arguments and assemble the original or modified line without
use of any control directives. CALM instructions also offer a much better
performance, which might be especially important in case of interceptors
that get called for nearly every line in source text.
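A rough sketch of such an interceptor in CALM is shown below, under the assumption that a "CODE" macroinstruction exists (here it is only an empty placeholder) and that only a bare ".CODE" line needs translating. The commands used ("match", "jno", "arrange", "assemble") are described in the manual:

```fasmg
macro CODE?
	; placeholder for the real section-switching macroinstruction
end macro

calminstruction ? line&
	match .=CODE?, line	; is the whole line ".CODE"?
	jno unchanged
	arrange line, =CODE	; replace it with the global command
    unchanged:
	assemble line		; assemble the original or the modified line
end calminstruction

.CODE
db 1
```

Because the branching is done with CALM jumps rather than "match" blocks, the assembled line is never nested inside a control block, so a directive like "if" at the end of a line cannot break the nesting.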
How to define an instruction sharing a name with one of the core directives?
It may happen that a language can be in general easily implemented with
macros, but it needs to include a command with the same name as one of
the directives of assembler. While it is possible to override any
instruction with a macro, macros themselves may require access to
the original directive. To allow the same name to call a different instruction
depending on the context, the implemented language may be interpreted
within a namespace that contains the overriding macro, while all the macros
requiring access to the original directive would have to temporarily switch
to another namespace where it has not been overridden. This would
require every such macro to pack its contents in a "namespace" block.
But there is another trick, related to how texts of macro parameters
or symbolic variables preserve the context under which the symbols within
them should be interpreted (this includes the base namespace and
the parent label for symbols starting with dot).
Unlike the two mentioned occurrences, the text of a macro normally does
not carry such extra information, but if a macro is constructed in such a way
that it contains text that was once carried within a parameter to another
macro or within a symbolic variable, then this text retains the information
about context even when it becomes a part of a newly defined macro.
For example:
macro definitions end?
namespace embedded
struc LABEL? size
match , size
.:
else
label . : size
end match
end struc
macro E#ND? name
end namespace
match any, name
ENTRYPOINT := name
end match
macro ?! line&
end macro
end macro
end macro
definitions end
start LABEL
END start
The parameter given to "definitions" macro may appear to do nothing, as it
replaces every instance of "end" with exactly the same word - but the text
that comes from the parameter is equipped with additional information about
context, and this attribute is then preserved when the text becomes a part
of a new macro. Thanks to that, macro "LABEL" can be used in a namespace
where "end" instruction has taken a different meaning, but the instances
of "end" within its body still refer to the symbol in the outer namespace.
In this example the parameter has been made case-insensitive, and thus
it would replace even the "END" in "macro" statement that is supposed to
define a symbol in "embedded" namespace. For this reason the identifier
has been split with a concatenation operator to prevent it from being
recognized as parameter. This would not be necessary if the parameter
was case-sensitive (as more usual).
The same effect can be achieved through use of symbolic variables instead
of macro parameters, with help of "match" to extract the text of a symbolic
variable:
define link end
match end, link
namespace embedded
struc LABEL? size
match , size
.:
else
label . : size
end match
end struc
macro END? name
end namespace
match any, name
ENTRYPOINT := name
end match
macro ?! line&
end macro
end macro
end match
start LABEL
END start
This would not work without passing the text through symbolic variable,
because parameters defined by control directives like "match" do not
add context information to the text unless it was already there.
CALM instructions allow for another approach to this kind of problem.
If a customized instruction set is defined entirely in form of CALM,
the instructions may not even need access to the original control directives.
However, if CALM instruction needs to assemble a directive that might not
be accessible, the symbolic variable passed to "assemble" should be
defined with appropriate context for the instruction symbol.
How to convert a macroinstruction to CALM?
A classic macroinstruction consists of lines of text that are preprocessed
(by replacing names of parameters with their corresponding values) every time
the instruction is called and these preprocessed lines are passed to assembly.
For example this macroinstruction generates just a single line to be assembled,
and it does it by replacing "value" with the text given by the only argument
to the instruction:
macro octet value*
db value
end macro
A CALM instruction can be viewed as customized preprocessor, which needs to
be written in a special language. It is able to use various commands to
process the arguments and generate lines to be assembled. On the basic
level, it is also able to simulate what standard preprocessor does - with
help of "arrange" command. After preprocessing the line, it also needs to
explicitly pass it to the assembly with an "assemble" command:
calminstruction octet value*
arrange value, =db value
assemble value
end calminstruction
This gives the same result as the original macroinstruction, as it performs
the same kind of preprocessing. However, unlike the text of macroinstruction
a pattern given to "arrange" needs to explicitly state which name tokens are
to be replaced with their values and which ones (prepended with "=") should
be left untouched. The tokens that are copied from the pattern are stripped of
any context information, just like the text of macroinstruction is normally not
carrying any (while the values that came from arguments retain the recognition
context in which the instruction was started).
This is the most straightforward method of conversion and a simple sequence
of "arrange" and "assemble" commands could be made to generate the same lines as
by the original macroinstruction. But there is one exception - when a "local"
command is executed by macroinstruction, it creates a preprocessed parameter
with a special value that points to a symbol in the namespace unique to given
instance of the instruction.
macro pointer
local next
dd next
next:
end macro
In case of CALM there is no such namespace available, the local namespace of
a CALM instruction is shared among all its instances. Therefore, if a new unique
symbol is needed every time the instruction is called, it has to be constructed
manually. An obvious method might be to append a unique number to the name:
global_uid = 0
calminstruction pointer
compute global_uid, global_uid + 1
local command
arrange command, =dd =next#global_uid
assemble command
arrange command, =next#global_uid:
assemble command
end calminstruction
Here "arrange" is given a variable that has a numeric value and it has to
replace it with a text. This works only when the value is a plain non-negative
number; in such a case "arrange" converts it to a text token that contains the decimal
representation of that number. The lines passed to assembly are therefore
going to contain identifiers like "next#1".
While incrementation of the global counter could be done by preparing
a standard assembly command like "global_uid = global_uid + 1" with "arrange"
and passing it to assembly, the "compute" command allows it to be done directly in the
CALM processor. Moreover, it is then not affected by anything that alters
the context of assembly. If the instruction was defined as unconditional and
used inside a skipped IF block, the "compute" would still perform its task,
because execution of CALM commands is - just like standard preprocessing - done
independently from the main flow of the assembly. Also, references to
the "global_uid" always point to the same symbol - the one that was in scope
when the CALM instruction was defined and compiled. Therefore incrementing
the value with "compute" is more reliable and predictable.
In a similar manner, the assembly of line defining the label can be replaced
with a "publish" command. Here the value of the label (which should be equal
to the address after the line containing "dd" is assembled) needs to be computed
first, because "publish" only performs the assignment of a value to the symbol:
global_uid = 0
calminstruction pointer
compute global_uid, global_uid + 1
local symbol, command
arrange symbol, =next#global_uid
arrange command, =dd symbol
assemble command
local address
compute address, $
publish symbol:, address
end calminstruction
Because the CALM instruction itself is conditional, the "publish" inside is
effectively conditional, too. Therefore it works correctly as a replacement
for the assembly of line with a label.
While a global counter has several advantages, it can be interfered with,
so sometimes use of a local counter might be preferable. However, the local
namespace of a CALM instruction is normally not accessible from outside, so
it is a bit harder to give an initial value to such counter. One way could be
to check whether the counter has already been initialized with some value using
"take" command:
calminstruction pointer
local id
take id, id
jyes increment
compute id, 0
increment:
compute id, id + 1
local symbol, command
arrange symbol, =next#id
arrange command, =dd symbol
assemble command
local address
compute address, $
publish symbol:, address
end calminstruction
But this adds commands that are executed every time the instruction is called.
A better solution makes use of the ability to define custom instructions
processed during the definition of CALM instruction:
calminstruction calminstruction?.init? var*, val:0
compute val, val
publish var, val
end calminstruction
calminstruction pointer
local id
init id, 0
compute id, id + 1
local symbol, command
arrange symbol, =next#id
arrange command, =dd symbol
assemble command
local address
compute address, $
publish symbol:, address
end calminstruction
The custom statement "init" is called at the time when the CALM instruction is
defined (it does not generate any commands to be executed by the defined
instruction - it would itself have to use "assemble" commands to generate
statements to be compiled). It is given the name of variable from the local
scope of the CALM instruction, and it uses "publish" to assign an initial
numeric value to that variable.
To initialize local variable with a symbolic value, even simpler custom
instruction would suffice:
calminstruction calminstruction?.initsym? var*, val&
publish var, val
end calminstruction
The text of "val" argument carries the recognition context of the definition
of CALM instruction that contains the "initsym" statement, therefore it makes it
possible to prepare a text for "assemble" containing references to local symbols:
calminstruction be32? value
local command
initsym command, dd value
compute value, value bswap 4
assemble command
end calminstruction
Again, after this instruction is compiled, it contains just two actual commands,
"compute" and "assemble", and the value of local symbol "command" is a text
that is interpreted in the same local context and refers to the same symbol
"value" as the "compute" does.
This example also demonstrates another advantage of CALM over standard
macroinstructions: its strict semantics prevent various kinds of unwanted
behavior that is allowed by a simple substitution of text. The text of "value"
is going to be evaluated by "compute" as a numeric sub-expression, signalling
an error on any unexpected syntax. Therefore it should be favorable to process
arguments entirely through CALM commands and only use "assemble" for final
simple statements.