2475 lines
113 KiB
Plaintext
2475 lines
113 KiB
Plaintext
|
|
||
|
|
||
|
flat assembler g
|
||
|
User Manual
|
||
|
|
||
|
|
||
|
This document describes the syntax of flat assembler g language, with basic
|
||
|
examples. It was written with an assumption that it would be read sequentially
|
||
|
and at any moment it uses only the concepts and constructions that have been
|
||
|
introduced earlier. However it should be possible to jump right to the
|
||
|
section that interests the reader, and then go back to earlier parts only when
|
||
|
it is needed in order to better understand the later ones.
|
||
|
|
||
|
|
||
|
Table of contents
|
||
|
|
||
|
0. Executing the assembler
|
||
|
1. Fundamental syntax rules
|
||
|
2. Symbol identifiers
|
||
|
3. Basic symbol definitions
|
||
|
4. Expression values
|
||
|
5. Symbol classes
|
||
|
6. Generating data
|
||
|
7. Conditional assembly
|
||
|
8. Macroinstructions
|
||
|
9. Labeled macroinstructions
|
||
|
10. Symbolic variables and recognition context
|
||
|
11. Repeating blocks of instructions
|
||
|
12. Matching parameters
|
||
|
13. Output areas
|
||
|
14. Source and output control
|
||
|
15. CALM instructions
|
||
|
16. Assembly commands in CALM instructions
|
||
|
|
||
|
|
||
|
0. Executing the assembler
|
||
|
|
||
|
To start assembly from the command line it is necessary to provide at least one
|
||
|
parameter, the name of a source file, and optionally a second one -
|
||
|
name of the destination file. If the assembly is successful, the generated
|
||
|
output is written into the destination and a short summary is displayed,
|
||
|
otherwise an information about errors is shown. The maximum number of presented
|
||
|
errors can be controlled with an additional "-e" switch (by default no more than
|
||
|
one error is presented). The "-p" switch controls the maximum number of passes
|
||
|
the assembler is going to attempt. This limit is by default set to 100.
|
||
|
The "-r" switch allows to set up the limit of the recursion stack, that is the
|
||
|
maximum allowed depth of entering macroinstructions and including additional
|
||
|
source files. The "-v" switch can enable showing all the lines from this stack
|
||
|
when reporting an error (by default the assembler tries to select only the
|
||
|
lines that are likely the most informative, but this simple heuristic may not
|
||
|
always be correct). If "-v" switch is used with value 2, it in addition makes
|
||
|
all the messages displayed by commands from the source text to be shown in real
|
||
|
time (in every consecutive pass). The "-i" switch allows to insert any command at
|
||
|
the beginning of processed source.
|
||
|
|
||
|
|
||
|
1. Fundamental syntax rules
|
||
|
|
||
|
Every command in the assembly language occupies a single line of text.
|
||
|
If a line contains the semicolon character, everything from that character up
|
||
|
to the end of the line is treated as a comment and ignored by the assembler.
|
||
|
The main part of a line (i.e. excluding the comment) may end with the backslash
|
||
|
character and in such case the next line from the source text is going to be
|
||
|
appended to this one. This allows to split any command across multiple lines,
|
||
|
when needed. From now on we will refer to a source line as an entity obtained
|
||
|
by stripping comments and joining the lines of text connected with backslash
|
||
|
characters.
|
||
|
The text of source line is divided into syntactical units called tokens.
|
||
|
There is a number of special characters that become separate tokens all by
|
||
|
themselves. Any of the characters listed below is such a syntactical unit:
|
||
|
|
||
|
+-/*=<>()[]{}:?!,.|&~#`\
|
||
|
|
||
|
Any contiguous (i.e. not broken by whitespace) sequence of characters other than
|
||
|
the above ones becomes a single token, which can be a name or a number.
|
||
|
The exception to this rule is when a sequence starts with the single or the double
|
||
|
quote character. This defines a quoted string and it may contain any of the
|
||
|
special characters, whitespace and even semicolons, as it ends only when the
|
||
|
same character that was used to start it is encountered. The quotes that are
|
||
|
used to enclose the string do not become a part of the string themselves.
|
||
|
If it is needed to define a string containing the same character that is used to
|
||
|
enclose it, the character needs to be doubled inside the string - only one copy
|
||
|
of the character will become a part of the string, and the sequence will
|
||
|
continue.
|
||
|
Numbers are distinguished from names by the fact that they either
|
||
|
begin with a decimal digit, or with the "$" character followed by any hexadecimal
|
||
|
digit. This means that a token can be considered numeric even when it is not a
|
||
|
valid number. To be a correct one it must be one of the following: a decimal
|
||
|
number (optionally with the letter "d" attached at the end), a binary number
|
||
|
followed by the letter "b", an octal number followed by the letter "o" or "q", or a
|
||
|
hexadecimal number either prepended with "$" or "0x", or followed by the character
|
||
|
"h". Because the first digit of a hexadecimal number can be a letter, it may be
|
||
|
needed to prepend it with the digit zero in order to make it recognizable as a number.
|
||
|
For example, "0Ah" is a valid number, while "Ah" is just a name.
|
||
|
|
||
|
|
||
|
2. Symbol identifiers
|
||
|
|
||
|
Any name can become a defined symbol by having some meaning (a value) assigned to it.
|
||
|
One of the simplest methods of creating a symbol with a given value is to use
|
||
|
the "=" command:
|
||
|
|
||
|
a = 1
|
||
|
|
||
|
The ":" command defines a label, that is a symbol with a value equal to the
|
||
|
current address in the generated output. At the beginning of the source text this
|
||
|
address is always zero, so when the following two commands are the first ones
|
||
|
in the source file, they define symbols that have identical values:
|
||
|
|
||
|
first:
|
||
|
second = 0
|
||
|
|
||
|
Labels defined with ":" command are special constructs in assembly language,
|
||
|
since they allow any other command (including another label definition) to
|
||
|
follow in the same line. This is the only kind of command that allows this.
|
||
|
What comes before the ":" or "=" character in such definition is a symbol
|
||
|
identifier. It can be a simple name, like in the above samples, but it may
|
||
|
also contain some additional modifiers, described below.
|
||
|
When a name in a symbol definition has the "?" character appended to it (with
|
||
|
no whitespace between them), the symbol is case-insensitive (otherwise it would
|
||
|
be defined as case-sensitive). This means that the value of such
|
||
|
symbol may be referred to (as in an expression to the right of the "=" character)
|
||
|
by the name being any variant of the original name that differs only in the case
|
||
|
of letters. Only the cases of the 26 letters of the English alphabet are
|
||
|
allowed to differ, though.
|
||
|
It is possible to define a case-sensitive symbol that clashes with a
|
||
|
case-insensitive one. Then the case-sensitive symbol takes precedence and the more
|
||
|
general one is used only when corresponding case-sensitive symbol is not defined.
|
||
|
This can be remedied by using the "?" modifier, since it always means that the name
|
||
|
followed by it refers to the case-insensitive symbol.
|
||
|
|
||
|
tester? = 0
|
||
|
tester = 1
|
||
|
TESTER = 2
|
||
|
x = tester ; x = 1
|
||
|
y = Tester ; y = 0
|
||
|
z = TESTER ; z = 2
|
||
|
t = tester? ; t = 0
|
||
|
|
||
|
Every symbol has its own namespace of descendants, called child namespace. When two
|
||
|
names are connected with a dot (with no whitespace in between), such identifier refers to
|
||
|
an entity named by the second one in the namespace of descendants to the symbol specified
|
||
|
by the first one. This operation can be repeated many times within a single identifier,
|
||
|
allowing to refer to descendants of descendants in a chain of any length.
|
||
|
|
||
|
space:
|
||
|
space.x = 1
|
||
|
space.y = 2
|
||
|
space.color:
|
||
|
space.color.r = 0
|
||
|
space.color.g = 0
|
||
|
space.color.b = 0
|
||
|
|
||
|
Any of the names in such chain may optionally be followed by the "?" character
|
||
|
to mark that it refers to a case-insensitive symbol. If "?" is inserted in
|
||
|
the middle of the name (effectively splitting it into separate tokens) such
|
||
|
identifier is considered a syntactical error.
|
||
|
When an identifier starts with a dot (in other words: when the name of the parent
|
||
|
symbol is empty), it refers to the symbol in the namespace of the most recent
|
||
|
regular label defined before current line. This allows to rewrite the above sample
|
||
|
like this:
|
||
|
|
||
|
space:
|
||
|
.x = 1
|
||
|
.y = 2
|
||
|
.color:
|
||
|
.color.r = 0
|
||
|
.color.g = 0
|
||
|
.color.b = 0
|
||
|
|
||
|
After the "space" label is defined, it becomes the most recently defined normal
|
||
|
label, so the following ".x" refers to the "space.x" symbol and then the ".color"
|
||
|
refers to the "space.color".
|
||
|
The "namespace" command followed by a symbol identifier changes the base
|
||
|
namespace for a section of source text. It must be paired with the
|
||
|
"end namespace" command later in the source to mark the end of such block.
|
||
|
This can be used to again rewrite the above sample in a different way:
|
||
|
|
||
|
space:
|
||
|
namespace space
|
||
|
x = 1
|
||
|
y = 2
|
||
|
color:
|
||
|
.r = 0
|
||
|
.g = 0
|
||
|
.b = 0
|
||
|
end namespace
|
||
|
|
||
|
When a name is not preceded by a dot, and as such it does not have explicitly
|
||
|
specified in what namespace the symbol resides, the assembler looks for defined
|
||
|
symbol in the current namespace, and if none is found, in the consecutive namespaces
|
||
|
of parent symbols, starting from the namespace containing the parent symbol of
|
||
|
current namespace. If no defined symbol with such name is found, it is assumed that
|
||
|
the name refers to the symbol in the current namespace (and unless there is "?"
|
||
|
character after such name, it is assumed that the symbol is case-sensitive).
|
||
|
A definition that does not specify the namespace where the new symbol should be
|
||
|
created, always makes a new symbol in the current base namespace.
|
||
|
|
||
|
global = 0
|
||
|
regional = 1
|
||
|
namespace regional
|
||
|
regional = 2 ; regional.regional = 2
|
||
|
x = global ; regional.x = 0
|
||
|
regional.x = regional ; regional.regional.x = 2
|
||
|
global.x = global ; global.x = 0
|
||
|
end namespace
|
||
|
|
||
|
The comments in the above sample show equivalent definitions with respect
|
||
|
to the original base namespace. Note that when a name is used to specify the
|
||
|
namespace, the assembler looks for a defined symbol with such name to lookup in
|
||
|
its namespace, but when it is a name of a symbol to be defined, it is always
|
||
|
created within the current base namespace.
|
||
|
When the final dot of an identifier is not followed by any name, it refers
|
||
|
to the parent symbol of the namespace that would be searched for a symbol if
|
||
|
there was a name after this dot. Adding such dot at the end of an identifier may
|
||
|
appear redundant, but it can be used to alter the way the definition of a symbol
|
||
|
works, because it forces the assembler to look for an already existing symbol that
|
||
|
it can alter instead of squarely creating a new one in the current namespace.
|
||
|
For instance, if in the fourth line of the previous example "regional." was put
|
||
|
in place of "regional", it would rewrite a value of the original "regional"
|
||
|
symbol instead of making a new symbol in the child namespace. Similarly,
|
||
|
a definition formed this way may assign a new value to a symbol regardless of
|
||
|
whether it was previously defined as case-insensitive or not.
|
||
|
If an identifier is just a single dot, by the above rules it refers to the most
|
||
|
recent label that did not start with a dot. This can be applied to rewrite
|
||
|
the earlier example in yet another way:
|
||
|
|
||
|
space:
|
||
|
namespace .
|
||
|
x = 1
|
||
|
y = 2
|
||
|
color:
|
||
|
namespace .
|
||
|
r = 0
|
||
|
g = 0
|
||
|
b = 0
|
||
|
end namespace
|
||
|
end namespace
|
||
|
|
||
|
It also demonstrates how namespace sections can be nested one within another.
|
||
|
The "#" may be inserted anywhere inside an identifier without changing its
|
||
|
meaning. When "#" is the only character separating two name tokens, it causes
|
||
|
them to be interpreted as a single name formed by concatenating the tokens.
|
||
|
|
||
|
variable = 1
|
||
|
varia#ble = var#iable + 2 ; variable = 3
|
||
|
|
||
|
This can also be applied to numbers.
|
||
|
Inside a block defined with "namespace" there is initially no label that would
|
||
|
be considered base for identifiers starting with dot (even if there was a label
|
||
|
that served this purpose outside of the block, it loses this status and is brought
|
||
|
back to use only after the block is closed with "end namespace"). A similar thing
|
||
|
also happens in the beginning of the source text, before any label has been defined.
|
||
|
This is connected to additional rules concerning dots in identifiers.
|
||
|
When an identifier starts with a dot, but there is no label that would be
|
||
|
a parent for it, the identifier refers to the descendant of a special symbol
|
||
|
that resides in the current namespace but has no name. If an identifier starts
|
||
|
with a sequence of two or more dots, the identifier refers to the descedant of
|
||
|
a similar unnamed symbol, but it is a distinct one for any given number of dots.
|
||
|
While the namespace accessed with a single starting dot changes every time a new
|
||
|
regular label is defined, the special namespace accessed with two or more dots
|
||
|
in the beginning of an identifier remains the same:
|
||
|
|
||
|
first:
|
||
|
.child = 1
|
||
|
..other = 0
|
||
|
second:
|
||
|
.child = 2
|
||
|
..another = ..other
|
||
|
|
||
|
In this example the meaning of the ".child" identifier changes from place to
|
||
|
place, but the "..other" identifier means the same everywhere.
|
||
|
When two names inside an identifier are connected with a sequence of two or
|
||
|
more dots, the identifier refers to the descendant of such special unnamed
|
||
|
symbol in the namespace specified by the partial identifier before that sequence
|
||
|
of dots. The unnamed child namespace is chosen depending on a number of dots and
|
||
|
in this case the number of required dots is increased by one. The following
|
||
|
example demonstrates the two methods of identifying such symbol:
|
||
|
|
||
|
namespace base
|
||
|
..other = 1
|
||
|
end namespace
|
||
|
|
||
|
result = base.#..other
|
||
|
|
||
|
The "#" character has been inserted into the last identifier for a better
|
||
|
readability, but the plain sequence of three dots would do the same.
|
||
|
The unnamed symbol that hosts a special namespace can itself be accessed
|
||
|
when an identifier ends with a sequence of two or more dots - thanks to the
|
||
|
rule that an identifier which ends in a dot refers to the parent symbol of
|
||
|
the namespace that would be accessed if there was a name after this dot. So
|
||
|
in the context of the previous example the "base..." (or "base.#..") would
|
||
|
refer to the unnamed parent of the namespace where the "other" symbol resides,
|
||
|
and it would be the same symbol as identified by simple ".." inside the
|
||
|
namespace of the "base" symbol.
|
||
|
Any identifier can be prepended with a "?" character and such modifier has
|
||
|
an effect when it is used in a context where identifier could mean something
|
||
|
different than a label or variable to be defined. This modifier then
|
||
|
suppresses any other interpretation. For example, identifier starting with "?"
|
||
|
is not going to be treated as an instruction, even if it is the first symbol
|
||
|
on the line. This can be used to define a variable that shares a name with
|
||
|
an existing command:
|
||
|
|
||
|
?namespace = 0
|
||
|
|
||
|
If such modified identifier is used in a place where it is evaluated and not
|
||
|
defined, it still refers to the same symbol it would refer to in a definition.
|
||
|
Therefore, unless identifier also uses a dot, it always refers to a symbol
|
||
|
in the current namespace.
|
||
|
A number can be used in a role of a name inside an identifier, but not when
|
||
|
it is placed at the beginning, because then it is considered a literal value.
|
||
|
This restriction also may be bypassed by prepending an identifier with "?".
|
||
|
|
||
|
|
||
|
3. Basic symbol definitions
|
||
|
|
||
|
When a symbol is defined as a label, it must be the only definition of
|
||
|
this symbol in the entire source. A value that is assigned to the symbol this way
|
||
|
can be accesed from every place in the source, even before the label is actually
|
||
|
defined. When a symbol is used before it is defined (this is often called
|
||
|
forward-referencing) the assembler tries to correctly predict the value of
|
||
|
the symbol by doing multiple passes over the source text. Only when all
|
||
|
predictions prove to be correct, the assembler generates the final output.
|
||
|
This kind of symbol, which can only be defined once and thus have a universal
|
||
|
value that can always be forward-referenced, is called a constant. All labels
|
||
|
are constants.
|
||
|
When a symbol is defined with a "=" command, it may have multiple definitions
|
||
|
of this kind. Such symbol is called variable and when it is used, the value from
|
||
|
its latest definition is accessed. A symbol defined with such command may also be
|
||
|
forward-referenced, but only when it is defined exactly once in the entire
|
||
|
source and as such has a single unambiguous value.
|
||
|
|
||
|
a = 1 ; a = 1
|
||
|
a = a + 1 ; a = 2
|
||
|
a = b + 1 ; a = 3
|
||
|
b = 2
|
||
|
|
||
|
A special case of forward-referencing is self-referencing, when the value
|
||
|
of a symbol is used in its own definition. The assembly of such construct is
|
||
|
successful only when the assembler is able to find a value that is stable under
|
||
|
such evaluation, effectively solving an equation. But due to the simplicity
|
||
|
of the resolving algorithm based on predictions a solution may not be found even
|
||
|
when it exists.
|
||
|
|
||
|
x = (x-1)*(x+2)/2-2*(x+1) ; x = 6 or x = -1
|
||
|
|
||
|
The ":=" defines a constant value. It may be used instead of "=" to
|
||
|
ensure that the given symbol is defined exactly once and that it can be
|
||
|
forward-referenced.
|
||
|
The "=:" defines a variable symbol like "=", but it differs in how
|
||
|
it treats the previous value (when such exists). While "=" discards the
|
||
|
previous value, "=:" preserves it so it can later be brought back with the
|
||
|
"restore" command:
|
||
|
|
||
|
a = 1
|
||
|
a =: 2 ; preserves a = 1
|
||
|
a = 3 ; discards a = 2 and replaces it with a = 3
|
||
|
restore a ; brings back a = 1
|
||
|
|
||
|
A "restore" may be followed by multiple symbol identifiers separated with
|
||
|
commas, and it discards the latest definition of every one of them. It is not
|
||
|
considered an error to use "restore" with a symbol that has no active
|
||
|
definition (either because it was never defined or because all of its
|
||
|
definitions were already discarded earlier). If a symbol is treated with the
|
||
|
"restore" command, it becomes a variable and can never be forward-referenced.
|
||
|
For this reason "restore" cannot be applied to constants.
|
||
|
The "label" keyword followed by a symbol identifier is an alternative way
|
||
|
of defining a label. In this basic form it is equivalent to a definition made
|
||
|
with ":", but it occupies an entire line. However with this command it is
|
||
|
possible to provide more settings for the defined label. The identifier may
|
||
|
be optionally followed by the ":" token and then an additional value to be
|
||
|
associated with this label (usually denoting the size of the labeled entity).
|
||
|
The assembler has a number of built-in constants defining various sizes for
|
||
|
this purpose, but this value can also be provided as a plain number.
|
||
|
|
||
|
label character:byte
|
||
|
label char:1
|
||
|
|
||
|
The ":" character may be omitted in favor of a plain whitespace, but it is
|
||
|
recommended for clarity. After an identifier and an optional size, the "at"
|
||
|
keyword may follow and then a value that should be assigned to the label instead
|
||
|
of the current address.
|
||
|
|
||
|
label wchar:word at char
|
||
|
|
||
|
The built-in size constants are equivalent to the following set of
|
||
|
definitions:
|
||
|
|
||
|
byte? = 1 ; 8 bits
|
||
|
word? = 2 ; 16 bits
|
||
|
dword? = 4 ; 32 bits
|
||
|
fword? = 6 ; 48 bits
|
||
|
pword? = 6 ; 48 bits
|
||
|
qword? = 8 ; 64 bits
|
||
|
tbyte? = 10 ; 80 bits
|
||
|
tword? = 10 ; 80 bits
|
||
|
dqword? = 16 ; 128 bits
|
||
|
xword? = 16 ; 128 bits
|
||
|
qqword? = 32 ; 256 bits
|
||
|
yword? = 32 ; 256 bits
|
||
|
dqqword? = 64 ; 512 bits
|
||
|
zword? = 64 ; 512 bits
|
||
|
|
||
|
The "element" keyword followed by a symbol identifier defines a special
|
||
|
constant that has no fixed value and can be used as a variable in the linear
|
||
|
polynomials. The identifier may be optionally followed by the ":" token and
|
||
|
then a value to be associated with this symbol, called metadata of the
|
||
|
element.
|
||
|
|
||
|
element A
|
||
|
element B:1
|
||
|
|
||
|
The metadata assigned to a symbol can be extracted with a special operator,
|
||
|
defined in the next section.
|
||
|
|
||
|
|
||
|
4. Expression values
|
||
|
|
||
|
In every construction described so far where a value of some kind was
|
||
|
provided, like after the "=" command or after the "at" keyword, it could be
|
||
|
a literal value (a number or a quoted string) or a symbol identifier.
|
||
|
A value can also be specified through an expression containing built-in
|
||
|
operators.
|
||
|
The "+", "-" and "*" perform standard arithmetic operations on integers
|
||
|
("+" and "-" can also be used in a unary form - with only one argument).
|
||
|
"/" and "mod" perform division with remainder, giving a quotient or a remainder
|
||
|
respectively. Of these arithmetic operators "mod" has the highest precedence
|
||
|
(it is calculated first), "*" and "/" come next, while "+" and "-" are evaluated
|
||
|
last (even in their unary variants). Operators with the same precedence are
|
||
|
evaluated from left to right. Parentheses can be used to enclose sub-expressions
|
||
|
when a different order of operations is required.
|
||
|
The "xor", "and" and "or" perform bitwise operations on numbers. "xor" is
|
||
|
addition of bits (exclusive or), "and" is multiplication of bits, and "or" is
|
||
|
inclusive or (logical disjunction). These operators have higher precedence
|
||
|
than any arithmetic operators.
|
||
|
The "shl" and "shr" perform bit-shifting of the first argument by the amount
|
||
|
of bits specified by the second one. "shl" shifts bits left (towards the higher
|
||
|
powers of two), while "shr" shifts bits right (towards zero), dropping bits that
|
||
|
fall into the fractional range. These operators have higher precedence than other
|
||
|
binary bitwise operations.
|
||
|
The "not", "bsf" and "bsr" are unary operators with even higher precedence.
|
||
|
"not" inverts all the bits of a number, while "bsf" and "bsr" search for the
|
||
|
lowest or highest set bit respectively, and give the index of that bit as a
|
||
|
result.
|
||
|
All the operations on numbers are performed as if they were done on the
|
||
|
infinite 2-adic representations of those numbers. For example the "bsr" with a
|
||
|
negative number as an argument gives no valid result, since such number has an
|
||
|
infinite chain of set bits extending towards infinity and as such contains no
|
||
|
highest set bit (this is signaled as an error).
|
||
|
The "bswap" operator allows to create a string of bytes containing the
|
||
|
representation of a number in a reverse byte order (big endian). The second
|
||
|
argument to this operator should be the length in bytes of the required string.
|
||
|
This operator has the same precedence as the "shl" and "shr" operators.
|
||
|
When a string value is used as an argument to any of the operations on
|
||
|
numbers, it is treated as a sequence of bits and automatically converted into
|
||
|
a positive number (extended with zero bits towards the infinity). The
|
||
|
consecutive characters of a string correspond to the higher and higher bits of a
|
||
|
number.
|
||
|
To convert a number back to a string, the "string" unary operator may be
|
||
|
used. This operator has the lowest possible precedence, so when it precedes
|
||
|
an expression, all of it is evaluated prior to the conversion. When conversion
|
||
|
in the opposite direction is needed, simple unary "+" is enough to make a string
|
||
|
become a number.
|
||
|
The length of a string may be obtained with the "lengthof" unary operator,
|
||
|
one of the operators with the highest precedence.
|
||
|
The "bappend" operator appends a sequence of bytes of a string given by the
|
||
|
second argument to the sequence of bytes given by the first one. If either of
|
||
|
the arguments is a number, it becomes implicitly converted into a string. This
|
||
|
operator has the same precedence as binary bitwise operations.
|
||
|
When a symbol defined with the "element" command is used in an expression the
|
||
|
result may be a linear polynomial in a variable represented by the symbol.
|
||
|
Only simple arithmetic operations are allowed on the terms of a polynomial,
|
||
|
and it must stay linear - so, for example, it is only allowed to multiply a
|
||
|
polynomial by a number, but not by another polynomial.
|
||
|
There are a few operators with high precedence that allow to extract the information
|
||
|
about the terms of linear polynomial. The polynomial should come as the first argument,
|
||
|
and the index of the term as the second one. The "element" operator extracts
|
||
|
the variable of a polynomial term (with the coefficient of one), the "scale" operator
|
||
|
extracts the coefficient (a number by which the variable is multiplied) and "metadata"
|
||
|
operator gives back the metadata associated with the variable.
|
||
|
When the second argument is an index higher than the index of the last term
|
||
|
of the polynomial, all three operators return zero. When the second argument
|
||
|
is zero, "element" and "scale" give information about the constant term -
|
||
|
"element" returns numeric 1 and "scale" returns the value of the constant term.
|
||
|
|
||
|
element A
|
||
|
linpoly = A + A + 3
|
||
|
vterm = linpoly scale 1 * linpoly element 1 ; vterm = 2 * A
|
||
|
cterm = linpoly scale 0 * linpoly element 0 ; cterm = 3 * 1
|
||
|
|
||
|
The "metadata" operator with an index of zero returns the size that is associated
|
||
|
with the first argument. This value is definite only when the first argument is
|
||
|
a symbol that has a size associated with it (or an arithmetic expression
|
||
|
that contains such symbol), otherwise it is zero. There exists an additional
|
||
|
unary operator "sizeof", which gives the same value as "metadata 0".
|
||
|
|
||
|
label table : 256
|
||
|
length = sizeof table ; length = 256
|
||
|
|
||
|
The "elementof", "scaleof" and "metadataof" are variants of "element", "scale"
|
||
|
and "metadata" operators with the opposite order of arguments. Therefore when "sizeof"
|
||
|
is used in an expression it is equivalent to writing "0 metadataof" in its place.
|
||
|
These operators have even higher precendence than their counterparts and are
|
||
|
right-associative.
|
||
|
The order of the terms of the linear polynomial depends on the way in which the value
|
||
|
was constructed. Every arithmetic operation preserves the order of the terms in
|
||
|
the first argument, and the terms that were not present in the first argument are
|
||
|
attached at the end in the same order in which they occurred in the second argument.
|
||
|
This order only matters when extracting terms with appropriate operators.
|
||
|
The "elementsof" is another unary operator of the highest precedence, it
|
||
|
counts the number of variable terms of a linear polynomial.
|
||
|
An expression may also contain a literal value that defines a floating-point
|
||
|
number. Such number must be in decimal notation, it may contain "." character
|
||
|
as a decimal mark and may be followed by the "e" character and then a decimal
|
||
|
value of the exponent (optionally preceded by "+" or "-" to mark the sign of
|
||
|
exponent). When "." or "e" is present, it must be followed by at least
|
||
|
one digit. The "f" character can be appended at the end of such literal value.
|
||
|
If a number contains neither "." nor "e", the final "f" is the only way to
|
||
|
ensure that it is treated as floating-point and not as a simple decimal
|
||
|
integer.
|
||
|
The floating-point numbers are handled by the assembler in the binary form.
|
||
|
Their range and precision are at least as high as they are in the longest
|
||
|
floating-point format that the assembler is able to produce in the output.
|
||
|
Basic arithmetic operations are allowed to have a floating-point
|
||
|
number as any of the arguments, but none of the arguments may contain
|
||
|
a non-scalar (linear polynomial) terms then. The result of such operation is
|
||
|
always a floating-point number.
|
||
|
The unary "float" operator may be used to convert an integer value to
|
||
|
floating-point. This operator has the highest precedence.
|
||
|
The "trunc" is another unary operator with the highest precedence and it can be
|
||
|
applied to floating-point numbers. It extracts the integer part of a number
|
||
|
(it is a truncation toward zero) and the result is always a plain integer, not
|
||
|
a floating-point number. If the argument was already a plain integer, this
|
||
|
operation leaves it unchanged.
|
||
|
The "bsr" operator can be applied to floating-point numbers and it returns
|
||
|
the exponent of such number, which is the exponent of the largest power of
|
||
|
two that is not larger than the given number. The sign of the floating-point value
|
||
|
does not affect the result of this operation.
|
||
|
It is also allowed to use a floating-point number as the first argument
|
||
|
to the "shl" and "shr" operators. The number is then multiplied or divided by the
|
||
|
power of two specified by the second argument.
|
||
|
|
||
|
|
||
|
5. Symbol classes
|
||
|
|
||
|
There are three distinct classes of symbols, determining the position in
|
||
|
source line at which the symbol may be recognized. A symbol belonging to the
|
||
|
instruction class is recognized only when it is the first identifier of the
|
||
|
command, while a symbol from the expression class is recognized only when used
|
||
|
to provide a value of arguments to some command.
|
||
|
All the types of definitions that were described in the earlier sections
|
||
|
create the expression class symbols. The "label" and "restore" are examples
|
||
|
of built-in symbols belonging to the instruction class.
|
||
|
In any namespace it is allowed for symbols of different classes to share the
|
||
|
same name, for example it is possible to define the instruction named "shl",
|
||
|
while there is also an operator with the same name - but an operator belongs
|
||
|
to the expression class.
|
||
|
It is even possible for a single line to contain the same identifier
|
||
|
meaning different things depending on its position:
|
||
|
|
||
|
?restore = 1
|
||
|
restore restore ; remove the value of the expression-class symbol
|
||
|
|
||
|
The third class of symbols are the labeled instructions. A symbol belonging
|
||
|
to this class may be recognized only when the first identifier of the command
|
||
|
is not an instruction - in such case the first identifier becomes a label to
|
||
|
the instruction defined by the second one. If we treat "=" as a special kind
|
||
|
of identifer, it may serve as an example of labeled instruction.
|
||
|
The assembler contains built-in symbols of all classes. Their names are
|
||
|
always case-insensitive and they may be redefined, but it is not possible to
|
||
|
remove them. When all the values of such symbol are removed with a command
|
||
|
like "restore", the built-in value persists.
|
||
|
The rules concerning namespace apply equally to the symbols of all classes,
|
||
|
for example symbol of instruction class belonging to the child namespace of
|
||
|
latest label can be executed by preceding its name with dot. It should be
|
||
|
noted, however, that when a namespace is specified through its parent symbol,
|
||
|
it is always a symbol belonging to the expression class. It is not possible to
|
||
|
refer to a child namespace of an instruction, only to the namespace belonging
|
||
|
to the expression class symbol with the same name.
|
||
|
|
||
|
xor?.mask? := 10101010b
|
||
|
a = XOR.MASK ; symbol in the namespace of built-in case-insensitive "XOR"
|
||
|
|
||
|
label?.test? := 0
|
||
|
a = LABEL.TEST ; undefined unless "label?" is defined
|
||
|
|
||
|
Here the namespace containing "test" belongs to an expression-class symbol,
|
||
|
not to the existing instruction "label". When there is no expression-class symbol
|
||
|
that would fit the "LABEL" specifier, the namespace chosen is the one that would
|
||
|
belong to the case-sensitive symbol of such name. The "test" is therefore not found,
|
||
|
because it has been defined in another namespace - the one of case-insensitive "label".
|
||
|
|
||
|
|
||
|
6. Generating data
|
||
|
|
||
|
The "db" instruction allows to generate bytes of data and put them into the
|
||
|
output. It should be followed by one or more values, separated with commas.
|
||
|
When the value is numeric, it defines a single byte. When the value is a
|
||
|
string, it puts the string of bytes into output.
|
||
|
|
||
|
db 'Hello',13,10 ; generate 7 bytes
|
||
|
|
||
|
The "dup" keyword may be used to generate the same value multiple times. The
|
||
|
"dup" should be preceded by numeric expression defining the number of
|
||
|
repetitions, and the value to be repeated should follow. A sequence of values
|
||
|
may also be duplicated this way, in such case "dup" should be followed by the
|
||
|
entire sequence enclosed in parentheses (with values separated with commas).
|
||
|
|
||
|
db 4 dup 90h ; generate 4 bytes
|
||
|
db 2 dup ('abc',10) ; generate 8 bytes
|
||
|
|
||
|
When a special identifier consisting of a lone "?" character is used as a
|
||
|
value in the arguments to "db", it reserves a single byte. This advances the
|
||
|
address in the output where the next data are going to be put, but the reserved
|
||
|
bytes are not generated themselves unless they are followed by some other data.
|
||
|
Therefore if the bytes are reserved at the end of output, they do not increase
|
||
|
the size of generated file. This kind of data is called uninitialized, while
|
||
|
all the regular data are said to be initialized.
|
||
|
The "rb" instruction reserves a number of bytes specified by its argument.
|
||
|
|
||
|
db ? ; reserve 1 byte
|
||
|
rb 7 ; reserve 7 bytes
|
||
|
|
||
|
Every built-in instruction that generates data (traditionally called a data
|
||
|
directive) is paired with a labeled instruction of the same name. Such command
|
||
|
in addition to generating data defines a label at address of generated data,
|
||
|
with associated size equal to the size of data unit used by this instruction.
|
||
|
In case of "db" and "rb" this size is 1.
|
||
|
|
||
|
some db sizeof some ; generate a byte with value 1
|
||
|
|
||
|
The "dw", "dd", "dp", "dq", "dt", "ddq", "dqq" and "ddqq" are instructions
|
||
|
analogous to "db" with a different sizes of data unit. The order of bytes
|
||
|
within a single generated unit is always little-endian. When a string of bytes
|
||
|
is provided as the value to any of these instructions, the generated data
|
||
|
is extended with zero bytes to the length which is the multiple of data unit.
|
||
|
The "rw", "rd", "rp", "rq", "rt", "rdq", "rqq" and "rdqq" are the instructions
|
||
|
that reserve a specified number of data units. The unit sizes associated with
|
||
|
all these instructions are listed in table 1.
|
||
|
The "dw", "dd", "dq", "dt" and "ddq" instructions allow floating-point
|
||
|
numbers as data units. Any such number is then converted into floating-point
|
||
|
format appropriate for a given size.
|
||
|
The "emit" (with a synonym "dbx") is a data directive that uses the size
|
||
|
of unit specified by its first argument to generate data defined by
|
||
|
the remaining ones. The size may be separated from the next argument with
|
||
|
a colon instead of a comma, for better readability. When the unit size
|
||
|
is such that it has a dedicated data directive, the definition made with "emit"
|
||
|
has the same effect as if these values were passed to the instruction tailored
|
||
|
for this size.
|
||
|
|
||
|
emit 2: 0,1000,2000 ; generate three 16-bit values
|
||
|
|
||
|
The "file" instruction reads the data from an external file and writes it
|
||
|
into output. The argument must be a string containing the path to the file, it
|
||
|
may optionally be followed by ":" and the numeric value specifying an offset
|
||
|
within the file, next it may be followed by comma and the numeric value
|
||
|
specifying how many bytes to copy.
|
||
|
|
||
|
file 'data.bin' ; insert entire file
|
||
|
excerpt file 'data.bin':10h,4 ; insert selected four bytes
|
||
|
|
||
|
|
||
|
Table 1 Data directives
|
||
|
/------------------------------\
|
||
|
| Unit | Generate | Reserve |
|
||
|
| (bytes) | data | data |
|
||
|
|=========|==========|=========|
|
||
|
| 1 | db | rb |
|
||
|
| | file | |
|
||
|
|---------|----------|---------|
|
||
|
| 2 | dw | rw |
|
||
|
|---------|----------|---------|
|
||
|
| 4 | dd | rd |
|
||
|
|---------|----------|---------|
|
||
|
| 6 | dp | rp |
|
||
|
|---------|----------|---------|
|
||
|
| 8 | dq | rq |
|
||
|
|---------|----------|---------|
|
||
|
| 10 | dt | rt |
|
||
|
|---------|----------|---------|
|
||
|
| 16 | ddq | rdq |
|
||
|
|---------|----------|---------|
|
||
|
| 32 | dqq | rqq |
|
||
|
|---------|----------|---------|
|
||
|
| 64 | ddqq | rdqq |
|
||
|
|---------|----------|---------|
|
||
|
| * | emit | |
|
||
|
\------------------------------/
|
||
|
|
||
|
|
||
|
7. Conditional assembly
|
||
|
|
||
|
The "if" instruction causes a block of source text to be assembled only
|
||
|
under certain condition, specified by a logical expression that is an argument
|
||
|
to this instruction. The "else if" command in the following lines
|
||
|
ends the previous conditionally assembled block and opens a new one, assembled
|
||
|
only when the previous conditions were not met and the new condition (an
|
||
|
argument to "else if") is true. The "else" command ends the previous
|
||
|
conditionally assembled block and begins a block that is assembled only when
|
||
|
none of the previous conditions was true. The "end if" command should be used
|
||
|
to end the entire construction. There may be many or none "else if" commands
|
||
|
inside and no more than one "else".
|
||
|
A logical expression is a distinct syntactical entity from the basic
|
||
|
expressions that were described earlier. A logical expression consists of
|
||
|
logical values connected with logical operators. The logical operators are:
|
||
|
unary "~" for negation, "&" for conjunction and "|" for alternative.
|
||
|
The negation is evaluated first, while "&" and "|" are simply evaluated
|
||
|
from left to right, with no precedence over each other.
|
||
|
A logical value in its simplest form may be a basic expression, it then
|
||
|
corresponds to true condition if and only if its value is not constant zero.
|
||
|
Another way to create a logical value is to compare the values of two basic
|
||
|
expressions with one of the following operators: "=" (equal), "<" (less than),
|
||
|
">" (greater than), "<=" (less or equal), ">=" (greater or equal),
|
||
|
"<>" (not equal).
|
||
|
|
||
|
count = 2
|
||
|
if count > 1
|
||
|
db '0'
|
||
|
db count-1 dup ',0'
|
||
|
else if count = 1
|
||
|
db '0'
|
||
|
end if
|
||
|
|
||
|
When linear polynomials are compared this way, the logical value is
|
||
|
valid only when they are comparable, which is whey they differ in constant
|
||
|
term only. Otherwise the condition like equality is neither universally true
|
||
|
nor universally false, since it depends on the values substituted for variables,
|
||
|
and assembler signals this as an error.
|
||
|
The "relativeto" operator creates a logical value that is true only when
|
||
|
the difference of compared values does not contain any variable terms. Therefore
|
||
|
it can be used to check whether two linear polynomials are comparable - the
|
||
|
"relativeto" condition is true only when both compared polynomials have the same
|
||
|
variable terms.
|
||
|
Because logical expressions are lazily evaluated, it is possible to create
|
||
|
a single condition that will not cause an error when the polynomials are not
|
||
|
comparable, but will compare them if they are:
|
||
|
|
||
|
if a relativeto b & a > b
|
||
|
db a - b
|
||
|
end if
|
||
|
|
||
|
The "eqtype" operator can also be used to compare two basic expressions,
|
||
|
it makes a logical value which is true when the values of the expressions are
|
||
|
of the same type - either both are algebraic, both are strings or both are
|
||
|
floating-point numbers. An algebraic type covers the linear polynomials and
|
||
|
it includes the integer values.
|
||
|
The "eq" operator compares two basic expressions and creates a logical value
|
||
|
which is true only when their values are of the same type and equal. This operator
|
||
|
can be used to check whether a value is a certain string, a certain floating-point
|
||
|
number or a certain linear polynomial. It can compare values that are not
|
||
|
comparable with "=" operator.
|
||
|
The "defined" operator creates a logical value combined with a basic expression
|
||
|
that follows it. This condition is true when the expression does not contain
|
||
|
symbols that have no accessible definition. The expression is only tested for the
|
||
|
availability of its components, it does not need to have a computable value.
|
||
|
This can be used to check whether a symbol of expression class has been defined,
|
||
|
but since the symbol can be accessible through forward-referencing, this condition
|
||
|
may be true even when the symbol is defined later in source. If this is undesirable,
|
||
|
the "definite" operator should be used instead, as it checks whether all symbols
|
||
|
within a basic expression that follows have been defined earlier.
|
||
|
The basic expression that follows "defined" is also allowed to be empty and
|
||
|
the condition is then trivially satisfied. This does not apply to "definite".
|
||
|
The "used" operator forms a logical value if it is followed by a single
|
||
|
identifier. This condition is true when the value of specified symbol has
|
||
|
been used anywhere in the source.
|
||
|
The "assert" is an instruction that signalizes an error when a condition
|
||
|
specified by its argument is not met.
|
||
|
|
||
|
assert a < 65536
|
||
|
|
||
|
|
||
|
8. Macroinstructions
|
||
|
|
||
|
The "macro" command allows to define a new instruction, in form of a
|
||
|
macroinstruction. The block of source text between the "macro" and
|
||
|
"end macro" command becomes the text of macroinstruction and this sequence
|
||
|
of lines is assembled in place of the original command that starts with
|
||
|
identifier of instruction defined this way.
|
||
|
|
||
|
macro null
|
||
|
db 0
|
||
|
end macro
|
||
|
|
||
|
null ; "db 0" is assembled here
|
||
|
|
||
|
The macroinstruction is allowed to have arguments only when the
|
||
|
definition contains them. After the "macro" and the identifier of defined
|
||
|
symbol optionally may come a list of simple names separated with commas,
|
||
|
these names define the parameters of macroinstruction. When this instruction
|
||
|
is then used, it may be followed by at most the same number of arguments
|
||
|
separated with commas, and their values are assigned to the consecutive
|
||
|
parameters. Before any line of text inside the macroinstruction is interpreted,
|
||
|
the name tokens that correspond to any of the parameters are replaced with their
|
||
|
assigned values.
|
||
|
|
||
|
macro lower name,value
|
||
|
name = value and 0FFh
|
||
|
end macro
|
||
|
|
||
|
lower a,123h ; a = 23h
|
||
|
|
||
|
The value of a parameter can be any text, not necessarily a correct expression.
|
||
|
If a line calling the macroinstruction contains fewer arguments than the
|
||
|
number of defined parameters, the excess parameters receive the empty values.
|
||
|
When a name of a parameter is defined, it may be followed by "?" character
|
||
|
to denote that it is case-insensitive, analogously to a name in a symbol
|
||
|
identifier. There must be no whitespace between the name and "?".
|
||
|
A definition of a parameter may also be followed by "*" to denote that it
|
||
|
requires a value that is not empty, or alternatively by ":" character
|
||
|
followed by a default value, which is assigned to the parameter instead of
|
||
|
an empty one when no other value is provided.
|
||
|
|
||
|
macro prepare name*,value:0
|
||
|
name = value
|
||
|
end macro
|
||
|
|
||
|
prepare x ; x = 0
|
||
|
prepare y,1 ; y = 1
|
||
|
|
||
|
If an argument to macroinstruction needs to contain a comma character, the
|
||
|
entire argument must be enclosed between the "<" and ">" characters (they do
|
||
|
not become a part of the value). If another "<" character is encountered inside
|
||
|
such value, it must be balanced with corresponding ">" character inside the
|
||
|
same value.
|
||
|
|
||
|
macro data name,value
|
||
|
name:
|
||
|
.data db value
|
||
|
.end:
|
||
|
end macro
|
||
|
|
||
|
data example, <'abc',10>
|
||
|
|
||
|
The last defined parameter may be followed by "&" character to denote that
|
||
|
this parameter should be assigned a value containing the entire remaining
|
||
|
part of line, even if it normally would define multiple arguments. Therefore
|
||
|
when macroinstruction has just one parameter followed by "&", the value of
|
||
|
this parameter is the entire text of arguments following the instruction.
|
||
|
|
||
|
macro id first,rest&
|
||
|
dw first
|
||
|
db rest
|
||
|
end macro
|
||
|
|
||
|
id 2, 7,1,8
|
||
|
|
||
|
When a name of a parameter is to be replaced with its value and it is
|
||
|
preceded by "`" character (without any whitespace inbetween), the text of
|
||
|
the value is embedded into a quoted string and this string replaces
|
||
|
both the "`" character and the name of parameter.
|
||
|
|
||
|
macro text line&
|
||
|
db `line
|
||
|
end macro
|
||
|
|
||
|
text x+1 ; db 'x+1'
|
||
|
|
||
|
The "local" is a command that may only be used inside a macroinstruction.
|
||
|
It should be followed by one or more names separated with commas, and it
|
||
|
declares that the names from this list should in the context of current
|
||
|
macroinstruction be interpreted as belonging to a special namespace
|
||
|
associated with this macroinstruction instead of current base namespace. This
|
||
|
allows to create unique symbols every time the macroinstruction is called.
|
||
|
Such declaration defines additional parameters with the specified names and
|
||
|
therefore only affects the uses of those names that follow within the same
|
||
|
macroinstruction. Declaring the same name as local multiple times within
|
||
|
the same macroinstruction gives no additional effect.
|
||
|
|
||
|
macro measured name,string
|
||
|
local top
|
||
|
name db string
|
||
|
top: name.length = top - name
|
||
|
end macro
|
||
|
|
||
|
measured hello, 'Hello!' ; hello.length = 6
|
||
|
|
||
|
A parameter created with "local" becomes replaced with a text that contains
|
||
|
the same name as the name of parameter, but has added context information
|
||
|
that causes it to be identified as belonging to the unique local namespace
|
||
|
associated with the instance of macroinstruction. This kind of context
|
||
|
information is going to be discussed further in the section about
|
||
|
symbolic variables.
|
||
|
A symbol that is local to a macroinstruction is never considered the most
|
||
|
recent label that is base for symbols starting with dot. Moreover, its
|
||
|
descendant namespace is disconnected from the main tree of symbols, so if
|
||
|
"namespace" command was used with a local symbol as the argument, symbols
|
||
|
from the main tree would no longer be visible (including all the named
|
||
|
instructions of the assembler, even commands like "end namespace").
|
||
|
Just like an expression symbol may be redefined and refer to its previous
|
||
|
value in the definition of the new one, the macroinstructions can also be
|
||
|
redefined, and use the previous value of this instruction symbol in its
|
||
|
text:
|
||
|
|
||
|
macro zero
|
||
|
db 0
|
||
|
end macro
|
||
|
|
||
|
macro zero name
|
||
|
label name:byte
|
||
|
zero
|
||
|
end macro
|
||
|
|
||
|
zero x
|
||
|
|
||
|
And just like other symbols, a macroinstruction may be forward-referenced when
|
||
|
it is defined exactly once in the entire source.
|
||
|
The "purge" command discards the definition of a symbol just like "restore",
|
||
|
but it does so for the symbol of instruction class. It behaves in the same
|
||
|
way as "restore" in all the other aspects. A macroinstruction can remove its
|
||
|
own definition with "purge".
|
||
|
It is possible for a macroinstruction to use its own value in a recursive way,
|
||
|
but to avoid inadvertent infinite recursion this feature is only available when
|
||
|
the macroinstruction is marked as such by following its identifier with ":"
|
||
|
character.
|
||
|
|
||
|
macro factorial: n
|
||
|
if n
|
||
|
factorial n-1
|
||
|
result = result * (n)
|
||
|
else
|
||
|
result = 1
|
||
|
end if
|
||
|
end macro
|
||
|
|
||
|
In addition to allowing recursion, such macroinstruction behaves like a constant.
|
||
|
It cannot be redefined and "purge" cannot be applied to it.
|
||
|
A macroinstruction may in turn define another macroinstruction or a number
|
||
|
of them. The blocks designated by "macro" and "end macro" must be properly
|
||
|
nested one within the other for such definition to be accepted by the
|
||
|
assembler.
|
||
|
|
||
|
macro enum enclosing
|
||
|
counter = 0
|
||
|
macro item name
|
||
|
name := counter
|
||
|
counter = counter + 1
|
||
|
end macro
|
||
|
macro enclosing
|
||
|
purge item,enclosing
|
||
|
end macro
|
||
|
end macro
|
||
|
|
||
|
enum x
|
||
|
item a
|
||
|
item b
|
||
|
item c
|
||
|
x
|
||
|
|
||
|
When it is required that macroinstruction generates unpaired "macro" or
|
||
|
"end macro" command, it can be done with special "esc" instruction. Its
|
||
|
argument becomes a part of macroinstruction, but is not being taken into
|
||
|
account when counting the nested "macro" and "end macro" pairs.
|
||
|
|
||
|
macro xmacro name
|
||
|
esc macro name x&
|
||
|
end macro
|
||
|
|
||
|
xmacro text
|
||
|
db `x
|
||
|
end macro
|
||
|
|
||
|
If "esc" is placed inside a nested definition, it is not processed out until
|
||
|
the innermost macroinstruction becomes defined. This allows a definition
|
||
|
containing "esc" to be placed inside another macroinstruction without having
|
||
|
to repeat "esc" for every nesting level.
|
||
|
When an identifer of macroinstruction in its definition is followed by "!"
|
||
|
character, it defines an unconditional macroinstruction. This is a special
|
||
|
kind of instruction class symbol, which is evaluated even in places where the
|
||
|
assembly is suspended - like inside a conditional block whose condition is
|
||
|
false, or inside a definition of another macroinstruction. This allows to
|
||
|
define instructions that can be used where otherwise a directly stated
|
||
|
"end if" or "end macro" would be required, as in the following example:
|
||
|
|
||
|
macro proc name
|
||
|
name:
|
||
|
if used name
|
||
|
end macro
|
||
|
|
||
|
macro endp!
|
||
|
end if
|
||
|
.end:
|
||
|
end macro
|
||
|
|
||
|
proc tester
|
||
|
db ?
|
||
|
endp
|
||
|
|
||
|
If the macroinstruction "endp" in the above sample was not defined as an
|
||
|
unconditional one and the block started with "if" was being skipped, the
|
||
|
macroinstruction would not get evaluated, and this would lead to an error
|
||
|
because "end if" would be missing.
|
||
|
It should be noted that "end" command executes an instruction identified
|
||
|
by its argument in the child namespace of case-insensitive "end" symbol.
|
||
|
Therefore command like "end if" could be alternatively invoked with
|
||
|
an "end.if" identifier, and it is possible to override any such instruction
|
||
|
by redefining a symbol in the "end?" namespace. Moreover, any instruction
|
||
|
defined within the "end?" namespace can then be called with the "end" command.
|
||
|
This slighly modified variant of the above sample puts these facts to use:
|
||
|
|
||
|
macro proc name
|
||
|
name:
|
||
|
if used name
|
||
|
end macro
|
||
|
|
||
|
macro end?.proc!
|
||
|
end if
|
||
|
.end:
|
||
|
end macro
|
||
|
|
||
|
proc tester
|
||
|
db ?
|
||
|
end proc
|
||
|
|
||
|
A similar rule applies to the "else" command and the instructions in the
|
||
|
"else?" namespace.
|
||
|
When an identifier consisting of a lone "?" character is used as an
|
||
|
instruction symbol in the definition of macroinstruction, it defines a special
|
||
|
instruction that is then called every time a line to be assembled does not
|
||
|
contain an unconditional instruction, and the complete text of line becomes
|
||
|
the arguments to this macroinstruction. This special symbol can also be defined
|
||
|
as an unconditional instruction, and then it is called for every following line
|
||
|
with no exception. This allows to completely override the assembly process on
|
||
|
portions of the text. The following sample defines a macroinstruction which
|
||
|
allows to define a block of comments by skiping all the lines of text until it
|
||
|
encounters a line with content equal to the argument given to "comment".
|
||
|
|
||
|
macro comment? ender
|
||
|
macro ?! line&
|
||
|
if `line = `ender
|
||
|
purge ?
|
||
|
end if
|
||
|
end macro
|
||
|
end macro
|
||
|
|
||
|
comment ~
|
||
|
Any text may follow here.
|
||
|
~
|
||
|
|
||
|
An identifier consisting of two question marks can be used to define a special
|
||
|
instruction that is called only as last resort, on lines that contain no
|
||
|
recognizable instruction. This allows to intercept lines that would otherwise
|
||
|
be rejected with "illegal instruction" message due to unknown syntax.
|
||
|
The "mvmacro" is an instruction that takes two arguments, both identifying
|
||
|
an instruction-class symbols. The definition of a macroinstruction specified
|
||
|
by the second argument is moved to the symbol identified by the first one.
|
||
|
For the second symbol the effect of this command is the same as of "purge".
|
||
|
This allows to effectively rename a macroinstruction, or temporarily disable it
|
||
|
only to bring it back later. The symbols affected by this operation become
|
||
|
variables and cannot be forward-referenced.
|
||
|
|
||
|
|
||
|
9. Labeled macroinstructions
|
||
|
|
||
|
The "struc" command allows to define a labeled instruction, in form of a
|
||
|
macroinstruction. Except for the fact that such definition must be closed
|
||
|
with "end struc" instead of "end macro", these macroinstructions are defined
|
||
|
in the same way as with "macro" command. A labeled instruction is evaluated
|
||
|
when the first identifier of a command is not an instruction and the second
|
||
|
identifier is of the labeled instruction class:
|
||
|
|
||
|
struc some
|
||
|
db 1
|
||
|
end struc
|
||
|
|
||
|
get some ; "db 1" is assembled here
|
||
|
|
||
|
Inside a labeled macroinstruction identifiers starting with dot no longer
|
||
|
refer to the namespace of a previously defined regular label. Instead they
|
||
|
refer to the namespace of label with which the instruction was labeled.
|
||
|
|
||
|
struc POINT
|
||
|
label . : qword
|
||
|
.x dd ?
|
||
|
.y dd ?
|
||
|
end struc
|
||
|
|
||
|
my POINT ; defines my.x and my.y
|
||
|
|
||
|
Note that the parent symbol, which can be refered by "." identifier, is not
|
||
|
defined unless an appropriate definition is generated by the macroinstruction.
|
||
|
Furthermore, this symbol is not considered the most recent label in
|
||
|
the surrounding namespace unless it gets defined as an actual label in
|
||
|
the macroinstruction it labeled.
|
||
|
For an easier use of this feature, other syntaxes may be defined with
|
||
|
macroinstructions, like in this sample:
|
||
|
|
||
|
macro struct? definition&
|
||
|
esc struc definition
|
||
|
label . : .%top - .
|
||
|
namespace .
|
||
|
end macro
|
||
|
|
||
|
macro ends?!
|
||
|
%top:
|
||
|
end namespace
|
||
|
esc end struc
|
||
|
end macro
|
||
|
|
||
|
struct POINT vx:?,vy:?
|
||
|
x dd vx
|
||
|
y dd vy
|
||
|
ends
|
||
|
|
||
|
my POINT 10,20
|
||
|
|
||
|
The "restruc" command is analogous to "purge", but it operates on symbols
|
||
|
from the class of labeled instructions. Similarly, the "mvstruc" command is
|
||
|
the same as "mvmacro" but for labeled instructions.
|
||
|
As with "macro", it is possible to use an identifier consisting of a lone "?"
|
||
|
character with "struc". It defines a special labeled macroinstruction that is
|
||
|
called every time the first symbol of a line is not recognized as an instruction.
|
||
|
Everything that follows that first identifier becomes the arguments to labeled
|
||
|
macroinstruction. The following sample uses this feature to catch any orphaned
|
||
|
labels (the ones that are not followed by any character) and treat them as regular
|
||
|
ones instead of causing an error. It achieves it by making ":" the default value
|
||
|
for "def" parameter:
|
||
|
|
||
|
struc ? def::&
|
||
|
. def
|
||
|
end struc
|
||
|
|
||
|
orphan
|
||
|
regular:
|
||
|
assert orphan = regular
|
||
|
|
||
|
Similarly to "macro" this special variant does not override unconditional labeled
|
||
|
instructions unless it is unconditional itself.
|
||
|
While "." provides an efficient method of accessing the label symbol,
|
||
|
sometimes it may be needed to process the actual text of the label.
|
||
|
A special parameter can be defined for this purpose and its name should be
|
||
|
inserted enclosed in parentheses before the name of labeled macroinstruction:
|
||
|
|
||
|
struc (name) SYMBOL
|
||
|
. db `name,0
|
||
|
end struc
|
||
|
|
||
|
test SYMBOL
|
||
|
|
||
|
|
||
|
10. Symbolic variables and recognition context
|
||
|
|
||
|
The "equ" is a built-in labeled instruction that defines symbol of expression
|
||
|
class with a symbolic value. Such value contains a snippet of source text
|
||
|
consisting of any number of tokens (even zero, allowing for an empty value)
|
||
|
and when it is used in an expression it is equivalent to inserting the text
|
||
|
of its value in place of its identifier, with an effect similar to
|
||
|
evaluation of a parameter of macroinstruction (except that a parameter is
|
||
|
always identified by a single name, while a symbolic value may be hidden
|
||
|
behind a complex identifier).
|
||
|
This can lead to an unexpected outcome compared to the use of standard
|
||
|
variables defined with "=", as the following example demonstrates:
|
||
|
|
||
|
numeric = 2 + 2
|
||
|
symbolic equ 2 + 2
|
||
|
x = numeric*3 ; x = 4*3
|
||
|
y = symbolic*3 ; y = 2 + 2*3
|
||
|
|
||
|
While "x" is assigned the value of 12, the value of "y" is 8. This shows that
|
||
|
the use of such symbols can lead to unintended interactions and therefore
|
||
|
definitions of this type should be avoided unless really necessary.
|
||
|
The "equ" allows redefinitions, and it preserves the previous value of
|
||
|
symbol analogously to the "=:" command, so the earlier value can be brought
|
||
|
back with "restore" instruction. To replace the symbolic value (analogously
|
||
|
to how "=" overwrites the regular value) the "reequ" command should be used
|
||
|
instead of "equ".
|
||
|
A symbolic value, in addition to retaining the exact text it was defined
|
||
|
with, preserves the context in which the symbols contained in this text are
|
||
|
to be interpreted. Therefore it can effectively become a reliable link to
|
||
|
value of some other symbol, lasting even when it is used in a different
|
||
|
context (this includes change of the base namespace or a symbol referred by
|
||
|
a starting dot):
|
||
|
|
||
|
first:
|
||
|
.x = 1
|
||
|
link equ .x
|
||
|
.x = 2
|
||
|
second:
|
||
|
.x = 3
|
||
|
db link ; db 2
|
||
|
|
||
|
It should be noted that the same process is applied to the arguments of any
|
||
|
macroinstruction when they become preprocessed parameters. If during
|
||
|
the execution of a macroinstruction the context changes, the identifiers
|
||
|
within the text of parameters still refer to the same symbols as in the line
|
||
|
that called the instruction:
|
||
|
|
||
|
x = 1
|
||
|
namespace x
|
||
|
x = 2
|
||
|
end namespace
|
||
|
macro prodx value
|
||
|
namespace x
|
||
|
db value*x
|
||
|
end namespace
|
||
|
end macro
|
||
|
prodx x ; db 1*2
|
||
|
|
||
|
Furthermore, parameters defined with "local" command use the same mechanism
|
||
|
to alter the context in which given name is interpreted, without altering
|
||
|
the text of the name. However, such modified context is not relevant
|
||
|
if the value of parameter is inserted in a middle or at the end of
|
||
|
a complex identifier, because it is the structure of an identifier that
|
||
|
dictates how its later parts are interpreted and only the context for an
|
||
|
initial part matters. For example, prepending a name of a parameter with
|
||
|
"#" character is going to cause the identifier to use current context instead
|
||
|
of context carried by the text of that parameter, because initial context
|
||
|
for the identifier is then the context associated with text "#".
|
||
|
If the text following "equ" contains identifiers of known symbolic variables,
|
||
|
each of them is replaced with its contents and it is such processed text that
|
||
|
gets assigned to the newly defined symbol.
|
||
|
The "define" is a regular instruction that also creates a symbolic value,
|
||
|
but as opposed to "equ" it does not evaluate symbolic variables in the
|
||
|
assigned text. It should be followed by an identifier of symbol to be defined
|
||
|
and then by the text of the value.
|
||
|
The difference between "equ" and "define" is often not noticeable, because
|
||
|
when used in final expression the symbolic variables are nestedly evaluated
|
||
|
until only the usable constituents of expressions are left. A possible use of
|
||
|
"define" is to create a link to another symbolic variable, like the following
|
||
|
example demonstrates:
|
||
|
|
||
|
a equ 0*
|
||
|
x equ -a
|
||
|
define y -a
|
||
|
a equ 1*
|
||
|
db x 2 ; db -0*2
|
||
|
db y 2 ; db -1*2
|
||
|
|
||
|
The other uses of "define" will arise in the later sections, with the
|
||
|
introduction of other instructions that operate on symbolic values.
|
||
|
The "define", like "equ", preserves the previous value of symbol. The
|
||
|
"redefine" is a variant of this instruction that discards the earlier value,
|
||
|
analogously to "reequ".
|
||
|
Note that while symbolic variables belong to the expression class of symbols,
|
||
|
their state cannot be determined with operators like "defined", "definite",
|
||
|
or "used", because a logical expression is evaluated as if every symbolic
|
||
|
variable was replaced with the text of corresponding value. Therefore operator
|
||
|
followed by an identifer of symbolic variable is going to be applied to
|
||
|
the content of this variable, whatever it is. For example if a symbolic variable
|
||
|
is made which is a link to a regular symbol, then any operator like "defined"
|
||
|
followed by the identifier of said symbolic variable is going to determine
|
||
|
the status of a linked symbol, not a linking variable.
|
||
|
Unlike the value of a symbolic variable, the body of a macroinstruction
|
||
|
by itself carries no context (although it may contain snippets of text that
|
||
|
came from replaced parameters and because of that have some context associated
|
||
|
with them). Also, if a macroinstruction becomes unrolled at the time when
|
||
|
another one is being defined (this can only happen when called macroinstruction
|
||
|
is unconditional), no context information is added to the arguments, to aid in
|
||
|
preservation of this context-lessness.
|
||
|
It also also possible to force a macro argument to add no context information
|
||
|
to its text. The name of such argument should be preceded by "&" character.
|
||
|
This allows to have an argument whose text is reinterpreted in the new context
|
||
|
during the evaluation of a macro.
|
||
|
|
||
|
char = 'A'
|
||
|
other.char = 'W'
|
||
|
|
||
|
macro both a, &b
|
||
|
namespace other
|
||
|
db a, b
|
||
|
end namespace
|
||
|
end macro
|
||
|
|
||
|
both char+1, char+1 ; db 'B', 'X'
|
||
|
|
||
|
|
||
|
11. Repeating blocks of instructions
|
||
|
|
||
|
The "repeat" instruction allows to assemble a block of instructions multiple
|
||
|
times, with the number of repetitions specified by the value of its argument.
|
||
|
The block of instructions should be ended with "end repeat" command. A synonym
|
||
|
"rept" can be used instead of "repeat".
|
||
|
|
||
|
a = 2
|
||
|
repeat a + 3
|
||
|
a = a + 1
|
||
|
end repeat
|
||
|
assert a = 7
|
||
|
|
||
|
The "while" instruction causes the block of instructions to be assembled
|
||
|
repeatedly as long as the condition specified by its argument is true. Its
|
||
|
argument should be a logical expression, like an argument for "if" or
|
||
|
"assert". The block should be closed with "end while" command.
|
||
|
|
||
|
a = 7
|
||
|
while a > 4
|
||
|
a = a - 2
|
||
|
end while
|
||
|
assert a = 3
|
||
|
|
||
|
The "%" is a special parameter which is preprocessed inside the repeated
|
||
|
block of instructions and is replaced with a decimal number being the number
|
||
|
of current repetition (starting with 1). It works in a similar way to a
|
||
|
parameter of macroinstruction, so it is replaced with its value before the
|
||
|
actual command is processed and so it can be used to create symbol
|
||
|
identifiers containing the number as a part of name:
|
||
|
|
||
|
repeat 16
|
||
|
f#% = 1 shl %
|
||
|
end repeat
|
||
|
|
||
|
The above example defines symbols "f1" to "f16" with values being the
|
||
|
consecutive powers of two.
|
||
|
The "repeat" instruction can have additional arguments, separated with
|
||
|
commas, each containing a name of supplementary parameters specific to this
|
||
|
block. Each of the names can be followed by ":" character and the expression
|
||
|
specifying the base value from which the parameter is going to start counting
|
||
|
the repetitions. This allows to easily change the previous sample to define
|
||
|
the range of symbols from "f0" to "f15":
|
||
|
|
||
|
repeat 16, i:0
|
||
|
f#i = 1 shl i
|
||
|
end repeat
|
||
|
|
||
|
The "%%" is another special parameter that has a value equal to the total
|
||
|
number of repetitions planned. This parameter is undefined inside the "while"
|
||
|
block. The following example uses it to create the sequence of bytes with
|
||
|
values descending from 255 to 0:
|
||
|
|
||
|
repeat 256
|
||
|
db %%-%
|
||
|
end repeat
|
||
|
|
||
|
The "break" instruction allows to stop the repeating prematurely. When it
|
||
|
is encountered, it causes the rest of repeated block to be skipped and no
|
||
|
further repetitions to be executed. It can be used to stop the repeating if
|
||
|
a certain condition is met:
|
||
|
|
||
|
s = x/2
|
||
|
repeat 100
|
||
|
if x/s = s
|
||
|
break
|
||
|
end if
|
||
|
s = (s+x/s)/2
|
||
|
end repeat
|
||
|
|
||
|
The above sample tries to find the square root of the value of symbol "x",
|
||
|
which is assumed defined elsewhere. It can easily be rewritten to perform the
|
||
|
same task with "while" instead of "repeat":
|
||
|
|
||
|
s = x/2
|
||
|
while x/s <> s
|
||
|
s = (s+x/s)/2
|
||
|
if % = 100
|
||
|
break
|
||
|
end if
|
||
|
end while
|
||
|
|
||
|
The "iterate" instruction (with a synonym "irp") repeats the block of
|
||
|
instructions while iterating through the list of values separated with commas.
|
||
|
The first argument to "iterate" should be the a name of parameter, folowed by
|
||
|
the comma and then a list of values. During each iteration the parameter
|
||
|
receives one of the values from the list.
|
||
|
|
||
|
iterate value, 1,2,3
|
||
|
db value
|
||
|
end iterate
|
||
|
|
||
|
Like it is in the case of an argument to macroinstruction, the value of parameter
|
||
|
that contains commas needs to be enclosed with "<" and ">" characters. It is
|
||
|
also possible to enclose the first argument to "iterate" with "<" and ">", in
|
||
|
order to define multiple parameters. The list of values is then divided
|
||
|
into section containing as many values as there are parameters, and each
|
||
|
iteration operates on one such section, assigning to each parameter a
|
||
|
corresponding value:
|
||
|
|
||
|
iterate <name,value>, a,1, b,2, c,3
|
||
|
name = value
|
||
|
end iterate
|
||
|
|
||
|
The name of a parameter can also, like in the case of macroinstructions, be
|
||
|
followed by "*" to require that the parameter has a value that is not empty,
|
||
|
or ":" and a default value. If an "iterate" statement ends with a comma not
|
||
|
followed by anything else, it is not interpreted as an additional empty value,
|
||
|
to put a blank value at the end of list an empty enclosing "<>" needs to be used.
|
||
|
The "break" instruction plus both the "%" and "%%" parameters can be used
|
||
|
inside the "iterate" block with the same effects as in case of "repeat".
|
||
|
The "indx" is an instruction that can be only be used inside an iterated
|
||
|
block and it changes the values of all the iterated parameters to the ones
|
||
|
corresponding to iteration with number specified by the argument to "indx" (but
|
||
|
when the next iteration is started, the values of parameters are again assigned
|
||
|
the normal way). This allows to process the iterated values in a different
|
||
|
order. In the following example the values are processed from the last to the
|
||
|
first:
|
||
|
|
||
|
iterate value, 1,2,3
|
||
|
indx 1+%%-%
|
||
|
db value
|
||
|
end iterate
|
||
|
|
||
|
With "indx" it is even possible to move the view of iterated values many times
|
||
|
during the single repetition. In the following example the entire processing
|
||
|
is done during the first repetition of iterated block and then the "break"
|
||
|
instruction is used to prevent further iterations:
|
||
|
|
||
|
iterate str, 'alpha','beta','gamma'
|
||
|
repeat %%
|
||
|
dw offset#%
|
||
|
end repeat
|
||
|
repeat %%
|
||
|
indx %
|
||
|
offset#% db str
|
||
|
end repeat
|
||
|
break
|
||
|
end iterate
|
||
|
|
||
|
The parameters defined by "iterate" do not attach the context to iterated
|
||
|
values, but neither do they remove the original context if such is already
|
||
|
attached to the text of arguments. So if the values given to "iterate" were
|
||
|
themselves created from another parameter that preserved the original context
|
||
|
for the symbol identifiers (like the parameter of macroinstruction), then this
|
||
|
context is preserved, but otherwise "iterate" defines just a plain text
|
||
|
substitution.
|
||
|
The parameters defined by instructions like "iterate" or "repeat" are
|
||
|
processed everywhere in the text of associated block, but with some limitations
|
||
|
if the block is defined partly by the text of macroinstruction and partly in
|
||
|
other places. In that case the parameters are only accessible in the parts of
|
||
|
the block that are defined in the same place as the initial command.
|
||
|
Every time a parameter is defined, its name can have the "?" character
|
||
|
attached to it to indicate that this parameter is case-insensitive. However
|
||
|
when parameters are recognized inside the preprocessed line, it does not matter
|
||
|
whether they are followed by "?" there. The only modifier that is recognized
|
||
|
by preprocessor when it replaces the parameter with its value is the "`"
|
||
|
character.
|
||
|
The repeating instructions together with "if" belong to a group called
|
||
|
control directives. They are the instructions that control the flow of
|
||
|
assembly. Each of them defines its own block of subordinate instructions,
|
||
|
closed with corresponding "end" command, and if these blocks are nested within
|
||
|
each other, it always must be a proper nesting - the inner block must always
|
||
|
be closed before the outer one. All control directives are therefore the
|
||
|
unconditional instructions - they are recognized even when they are inside
|
||
|
an otherwise skipped block.
|
||
|
The "postpone" is another control directive, which causes a block of
|
||
|
instructions to be assembled later, when all of the following source text
|
||
|
has already been processed.
|
||
|
|
||
|
dw final_count
|
||
|
postpone
|
||
|
final_count = counter
|
||
|
end postpone
|
||
|
counter = 0
|
||
|
|
||
|
The above sample postpones the definition of "final_count" symbol until the
|
||
|
entire source has been processed, so that it can access the final value of
|
||
|
"counter" variable.
|
||
|
The assembly of the source text that follows "postpone" includes the assembly
|
||
|
of any additional blocks declared with "postpone", therefore if there are
|
||
|
multiple such blocks, they are assembled in the reverse order. The one that
|
||
|
was declared last is assembled first when the end of the source text is reached.
|
||
|
When the "postpone" directive is provided with an argument consisting of
|
||
|
a single "?" character, it tells the assembler that the block contains
|
||
|
operations which should not affect any of the values defined in the main
|
||
|
source and thus the assembler may refrain from evaluating them until all
|
||
|
other values have been successfully resolved. Such blocks are processed
|
||
|
even later than the ones declared by "postpone" with no arguments. They
|
||
|
may be used to perform some finalizing tasks, like the computation of a
|
||
|
checksum of the assembled code.
|
||
|
The "irpv" is another repeating instruction and an iterator. It has just two
|
||
|
arguments, first being a name of parameter and second an identifier of
|
||
|
a variable. It iterates through all the stacked values of symbolic
|
||
|
variable, starting from the oldest one (this applies only to the values
|
||
|
defined earlier in the source).
|
||
|
|
||
|
var equ 1
|
||
|
var equ 2
|
||
|
var equ 3
|
||
|
var reequ 4
|
||
|
irpv param, var
|
||
|
db param
|
||
|
end irpv
|
||
|
|
||
|
In the above example there are three iterations, with values 1, 2, and 4.
|
||
|
"irpv" can effectively convert a value of symbolic variable into a parameter,
|
||
|
and this can be useful all by itself, because the symbolic variable is only
|
||
|
evaluated in the expressions inside the arguments of instructions (labeled or
|
||
|
not), while the parameters are preprocessed in the entire line before any
|
||
|
processing of command is started. This allows, for example, to redefine a
|
||
|
regular value that is linked by symbolic variable:
|
||
|
|
||
|
x = 1
|
||
|
var equ x
|
||
|
irpv symbol, var
|
||
|
indx %%
|
||
|
symbol = 2
|
||
|
break
|
||
|
end irpv
|
||
|
assert x = 2
|
||
|
|
||
|
The combination of "indx" and "break" was added to the above sample to limit
|
||
|
the iteration to the latest value of symbolic variable. In the next section
|
||
|
a better solution to the same problem will be presented.
|
||
|
When a variable passed to "irpv" has a value that is not symbolic, the
|
||
|
parameter is given a text that produces the same value upon computation. When
|
||
|
the value is a positive number, the parameter is replaced with its decimal
|
||
|
representation (similarly how the "%" parameter is processed), otherwise
|
||
|
the parameter is replaced with an identifier of a proxy symbol holding the
|
||
|
value from stack.
|
||
|
The "outscope" directive is available while any macroinstruction is processed,
|
||
|
and it modifies the command that follows in the same line. If the command causes
|
||
|
any parameters to be defined, they are created not in the context of currently
|
||
|
processed macroinstruction but in the context of the source text that called it.
|
||
|
|
||
|
macro irpv?! statement&
|
||
|
display 'IRPV wrapper'
|
||
|
esc outscope irpv statement
|
||
|
end macro
|
||
|
|
||
|
This allows not only to safely wrap some control directives in macroinstructions,
|
||
|
but also to create additional customized language constructions that define
|
||
|
parameters for a block of text. Because "outscope" needs to be present in the
|
||
|
text of a specific macroinstruction that requires it, it is recommended to use
|
||
|
it in conjunction with "esc" as in the example above, this ensures that it is
|
||
|
handled the same way even when the entire definition is put inside another
|
||
|
macroinstruction.
|
||
|
|
||
|
|
||
|
12. Matching parameters
|
||
|
|
||
|
The "match" is a control directive which causes its block of instructions to
|
||
|
be assembled only when the text specified by its second argument matches the
|
||
|
pattern given by the first one. A text is separated from a pattern with a comma
|
||
|
character, and it includes everything that follows this separator up to the end
|
||
|
of line.
|
||
|
Every special character (except for the "," and "=", which have a specific
|
||
|
meaning in the pattern) is matched literally - it must be paired with identical
|
||
|
token in the text. In the following example the content of the first block
|
||
|
is assembled, while the content of the second one is not.
|
||
|
|
||
|
match +,+
|
||
|
assert 1 ; positive match
|
||
|
end match
|
||
|
|
||
|
match +,-
|
||
|
assert 0 ; negative match
|
||
|
end match
|
||
|
|
||
|
The quoted strings are also matched literally, but name tokens in the pattern
|
||
|
are treated differently. Every name acts as a wildcard and can match any
|
||
|
sequence of tokens which is not empty. If the match is successful, the
|
||
|
parameters with such names are created, and each is assigned a value equal
|
||
|
to the text the wildcard was matched with.
|
||
|
|
||
|
match a[b], 100h[3]
|
||
|
dw a+b ; dw 100h+3
|
||
|
end match
|
||
|
|
||
|
A parameter name in pattern can have an extra "?" character attached to it
|
||
|
to indicate that it is a case-insensitive name.
|
||
|
The "=" character causes the token that follows it to be matched literally.
|
||
|
It allows to perform matching of name tokens, and also of special characters
|
||
|
that would otherwise have a different meaning, like "," or "=", or "?" following
|
||
|
a name.
|
||
|
|
||
|
match =a==a, a=8
|
||
|
db a ; db 8
|
||
|
end match
|
||
|
|
||
|
If "=" is followed by name token with "?" character attached to it, this
|
||
|
element is matched literally but in a case-insensitive way:
|
||
|
|
||
|
match =a?==a, A=8
|
||
|
db a ; db 8
|
||
|
end match
|
||
|
|
||
|
When there are many wildcards in the pattern, each consecutive one is matched
|
||
|
with as few tokens as possible and the last one takes what is left. If the
|
||
|
wildcards follow each other without any literally matched elements between
|
||
|
them, the first one is matched with just a single token, and the second one with
|
||
|
the remaining text:
|
||
|
|
||
|
match car cdr, 1+2+3
|
||
|
db car ; db 1
|
||
|
db cdr ; db +2+3
|
||
|
end match
|
||
|
|
||
|
In the above sample the matched text must contain at least two tokens, because
|
||
|
each wildcard needs at least one token to be not empty. In the next example
|
||
|
there are additional constraints, but the same general rules applies and the
|
||
|
first wildcard consumes as little as possible:
|
||
|
|
||
|
match first:rest, 1+2:3+4:5+6
|
||
|
db `first ; db '1+2'
|
||
|
db 13,10
|
||
|
db `rest ; db '3+4:5+6'
|
||
|
end match
|
||
|
|
||
|
While any whitespace next to a wildcard is ignored, the presence or
|
||
|
absence of whitespace between literally matched elements is meaningful.
|
||
|
If such elements have no whitespace between them, their counterparts must
|
||
|
contain no whitespace between them either. But if there is a whitespace
|
||
|
between elements in pattern, it places no constraints on the use of
|
||
|
whitespace in the corresponding text - it can be present of not.
|
||
|
|
||
|
match ++,++
|
||
|
assert 1 ; positive match
|
||
|
end match
|
||
|
|
||
|
match ++,+ +
|
||
|
assert 0 ; negative match
|
||
|
end match
|
||
|
|
||
|
match + +,++
|
||
|
assert 1 ; positive match
|
||
|
end match
|
||
|
|
||
|
match + +,+ +
|
||
|
assert 1 ; positive match
|
||
|
end match
|
||
|
|
||
|
The presence of whitespace in the text becomes required when the pattern
|
||
|
contains the "=" character followed by a whitespace:
|
||
|
|
||
|
match += +, ++
|
||
|
assert 0 ; negative match
|
||
|
end match
|
||
|
|
||
|
match += +, + +
|
||
|
assert 1 ; positive match
|
||
|
end match
|
||
|
|
||
|
The "match" command is analogous to "if" in that it allows to use the
|
||
|
"else" or "else match" to create a selection of blocks from which only one is
|
||
|
executed:
|
||
|
|
||
|
macro let param
|
||
|
match dest+==src, param
|
||
|
dest = dest + src
|
||
|
else match dest-==src, param
|
||
|
dest = dest - src
|
||
|
else match dest++, param
|
||
|
dest = dest + 1
|
||
|
else match dest--, param
|
||
|
dest = dest - 1
|
||
|
else match dest==src, param
|
||
|
dest = src
|
||
|
else
|
||
|
assert 0
|
||
|
end match
|
||
|
end macro
|
||
|
|
||
|
let x=3 ; x = 3
|
||
|
let x+=7 ; x = x + 7
|
||
|
let x++ ; x = x + 1
|
||
|
|
||
|
It is even possible to mix "if" and "match" conditions in a sequence of
|
||
|
"else" blocks. The entire construction must be closed with "end" command
|
||
|
corresponding to whichever of the two was used last:
|
||
|
|
||
|
macro record text
|
||
|
match any, text
|
||
|
recorded equ `text
|
||
|
else if RECORD_EMPTY
|
||
|
recorded equ ''
|
||
|
end if
|
||
|
end macro
|
||
|
|
||
|
The "match" is able to recognize symbolic variables and before the matching
|
||
|
is started, their identifiers in the text of the second argument are replaced
|
||
|
with corresponding values (just like they are replaced in the text that follows
|
||
|
the "equ" command):
|
||
|
|
||
|
var equ 2+3
|
||
|
|
||
|
match a+b, var
|
||
|
db a xor b
|
||
|
end match
|
||
|
|
||
|
This means that the "match" can be used instead of "irpv" to convert the
|
||
|
latest value of a symbolic variable to parameter. The sample from the previous
|
||
|
section, where "irpv" was used with "break" to perform just one iteration on
|
||
|
the last value, can be rewritten to use "match" instead:
|
||
|
|
||
|
x = 1
|
||
|
var equ x
|
||
|
match symbol, var
|
||
|
symbol = 2
|
||
|
end match
|
||
|
assert x = 2
|
||
|
|
||
|
The difference between them is that "irpv" would execute its block even for
|
||
|
an empty value, while in the case of "match" the "else" block would need to be
|
||
|
added to handle an empty text.
|
||
|
When the evaluation of symbolic variables in the matched text is undesirable,
|
||
|
a symbol created with "define" can be used as a proxy to preserve the text,
|
||
|
because the replacement is not recursive:
|
||
|
|
||
|
macro drop value
|
||
|
local temporary
|
||
|
define temporary value
|
||
|
match =A, temporary
|
||
|
db A
|
||
|
restore A
|
||
|
else
|
||
|
db value
|
||
|
end match
|
||
|
end macro
|
||
|
|
||
|
A equ 1
|
||
|
A equ 2
|
||
|
|
||
|
drop A
|
||
|
drop A
|
||
|
|
||
|
A concern could arise that "define" may modify the meaning of text by
|
||
|
equipping it with a local context. But when the value for "define" comes from
|
||
|
a parameter of macroinstruction (as in the above sample), it already carries
|
||
|
its original context and "define" does not alter it.
|
||
|
The "rawmatch" directive (with a synonym "rmatch") is very similar to "match",
|
||
|
but it operates on the raw text of the second argument. Not only it does not
|
||
|
evaluate the symbolic variables, but it also strips the text of any additional
|
||
|
context it could have carried.
|
||
|
|
||
|
struc has instruction
|
||
|
rawmatch text, instruction
|
||
|
namespace .
|
||
|
text
|
||
|
end namespace
|
||
|
end rawmatch
|
||
|
end struc
|
||
|
|
||
|
define x
|
||
|
x has a = 3
|
||
|
assert x.a = 3
|
||
|
|
||
|
In the above sample the identifier of "a" would be interpreted in the context
|
||
|
effective for the line calling the "has" macroinstruction if it was not
|
||
|
converted back into the raw text by "rmatch".
|
||
|
|
||
|
|
||
|
13. Output areas
|
||
|
|
||
|
The "org" instruction starts a new area of output. The content of such
|
||
|
area is written into the destination file next to the previous data, but the
|
||
|
addresses in the new area are based on the value specified in the argument to
|
||
|
"org". The area is closed automatically when the next one is started or when
|
||
|
the source ends.
|
||
|
|
||
|
org 100h
|
||
|
start: ; start = 100h
|
||
|
|
||
|
The "$" is a built-in symbol of expression class which is always equal to
|
||
|
the value of current address. Therefore definition of a constant with the value
|
||
|
specified by "$" symbol is equivalent to defining a label at the same point:
|
||
|
|
||
|
org 100h
|
||
|
start = $ ; start = 100h
|
||
|
|
||
|
The "$$" symbol is always equal to the base of current addressing space, so
|
||
|
in the area started with "org" it has the same value as the base address from
|
||
|
the argument of "org". The difference between "$" and "$$" is thus the current
|
||
|
position relative to the start of the area:
|
||
|
|
||
|
org 2000h
|
||
|
db 'Hello!'
|
||
|
size = $ - $$ ; size = 6
|
||
|
|
||
|
The "$@" symbol evaluates to the base address of current block of uninitialized
|
||
|
data. When there was no such data defined just before the current position,
|
||
|
this value is equal to "$", otherwise it is equal to "$" minus the length of
|
||
|
said data inside the current addressing space. Note that reserved data
|
||
|
no longer counts as such when it is followed by an initialized one.
|
||
|
The "section" instruction is similar to "org", but it additionally trims
|
||
|
all the reserved data that precedes it analogously to how the uninitialized
|
||
|
data is not written into output when it is at the end of file. The "section"
|
||
|
can therefore be followed by initialized data definitions without causing
|
||
|
the previously reserved data to be initialized with zeros and written into
|
||
|
output. In this sample only the first of the three reserved buffers is
|
||
|
actually converted into zeroed data and written into output, because it is
|
||
|
followed by some initialized data. The second one is trimmed because of the
|
||
|
"section", and the third one is cut off since it lies at the end of file:
|
||
|
|
||
|
data1 dw 1
|
||
|
buffer1 rb 10h ; zeroed and present in the output
|
||
|
|
||
|
org 400h
|
||
|
data dw 2
|
||
|
buffer2 rb 20h ; not in the output
|
||
|
|
||
|
section 1000h
|
||
|
data3 dw 3
|
||
|
buffer3 rb 30h ; not in the output
|
||
|
|
||
|
The "$%" is a built-in symbol equal to the offset within the output file at
|
||
|
which the initialized data would be generated if it was defined at this point.
|
||
|
The "$%%" symbol is the current offset within the output file. These two
|
||
|
values differ only when they are used after some data has been reserved -
|
||
|
the "$%" is then larger than "$%%" by the length of unitialized data which
|
||
|
would be generated into output if it was to be followed by some initialized
|
||
|
one.
|
||
|
|
||
|
db 'Hello!'
|
||
|
rb 4
|
||
|
position = $%% ; position = 6
|
||
|
next = $% ; next = 10
|
||
|
|
||
|
The values in the comments of the above sample assume that the source contains
|
||
|
no other instructions generating output.
|
||
|
The "virtual" creates a special output area which is not written into the main
|
||
|
output file. This kind of area must reside between the "virtual" and "end virtual"
|
||
|
commands, and after it is closed, the output generator comes back to the area it
|
||
|
was previously operating on, with position and address the same as there were just
|
||
|
before opening the "virtual" block. This allows also to nest the "virtual" blocks
|
||
|
within each other.
|
||
|
When "virtual" has no argument, the base address of this area is the same
|
||
|
as current address in the outer area. An argument to "virtual" can have a form
|
||
|
of "at" keyword followed by an expression defining the base address for the
|
||
|
enclosed area:
|
||
|
|
||
|
int dw 1234h
|
||
|
virtual at int
|
||
|
low db ?
|
||
|
high db ?
|
||
|
end virtual
|
||
|
|
||
|
Instead of or in addition to such argument, "virtual" can also be followed by
|
||
|
an "as" keyword and a string defining an extension of additional file where
|
||
|
the initialized content of the area is going to be stored at the end of
|
||
|
a successful assembly.
|
||
|
The "load" instruction defines the value of a variable by loading the string
|
||
|
of bytes from the data generated in an output area. It should be followed by
|
||
|
an identifier of symbol to define, then optionally the ":" character and a
|
||
|
number of bytes to load, then the "from" keyword and an address of the data
|
||
|
to load. This address can be specified in two modes. If it is simply a numeric
|
||
|
expression, it is an address within the current area. In that case the loaded
|
||
|
bytes must have already been generated, so it is only possible to load from the
|
||
|
space between "$$" and "$" addresses.
|
||
|
|
||
|
virtual at 100h
|
||
|
db 'abc'
|
||
|
load b:byte from 101h ; b = 'b'
|
||
|
end virtual
|
||
|
|
||
|
When the number of bytes is not specified, the length of loaded string is
|
||
|
determined by the size associated with address.
|
||
|
Another variant of "load" needs a special kind of label, which is created
|
||
|
with "::" instead of ":". Such label has a value that cannot be used directly,
|
||
|
but it can be used with "load" instruction to access the data of the area in
|
||
|
which this label has been defined. The address for "load" has then to be
|
||
|
specified as the area label followed by ":" and then the address within that
|
||
|
area:
|
||
|
|
||
|
virtual at 0
|
||
|
hex_digits::
|
||
|
db '0123456789ABCDEF'
|
||
|
end virtual
|
||
|
load a:byte from hex_digits:10 ; a = 'A'
|
||
|
|
||
|
This variant of "load" can access the data which is generated later, even
|
||
|
within the current area:
|
||
|
|
||
|
area::
|
||
|
db 'abc'
|
||
|
load sub:3 from area:$-2 ; sub = 'bcd'
|
||
|
db 'def'
|
||
|
|
||
|
The "store" instruction can modify already generated data in the output
|
||
|
area. It should be followed by a value (automatically converted to string
|
||
|
of bytes), then optionally the ":" character followed by a number of bytes
|
||
|
to write (when this setting is not present, the length of string is determined
|
||
|
by the size associated with address), then the "at" keyword and the address of
|
||
|
data to replace, in one of the same two modes as allowed by "load". However the
|
||
|
"store" is not allowed to modify the data that has not been generated yet, and
|
||
|
any area that has been touched by "store" becomes a variable area, forbidding
|
||
|
also the "load" to read a data from such area in advance.
|
||
|
The following example uses the combination of "load" and "store" to encrypt
|
||
|
the entire contents of the current area with a simple "xor" operation:
|
||
|
|
||
|
db "Text"
|
||
|
key = 7Bh
|
||
|
repeat $-$$
|
||
|
load a : byte from $$+%-1
|
||
|
store a xor key : byte at $$+%-1
|
||
|
end repeat
|
||
|
|
||
|
If the final data of an area that has been modified by "store" needs to be
|
||
|
read earlier in the source, it can be achieved by copying this data into
|
||
|
a different area that would not be constrained in such way. This is analogous
|
||
|
to defining a constant with a final value of some variable:
|
||
|
|
||
|
load char : byte from const:0
|
||
|
|
||
|
virtual
|
||
|
var::
|
||
|
db 'abc'
|
||
|
.length = $
|
||
|
end virtual
|
||
|
|
||
|
store 'A' : byte at var:0
|
||
|
|
||
|
virtual
|
||
|
const::
|
||
|
repeat var.length
|
||
|
load a : byte from var:%-1
|
||
|
db a
|
||
|
end repeat
|
||
|
end virtual
|
||
|
|
||
|
The area label can be forward-referenced by "load", but it can never be
|
||
|
forward-referenced by "store", even if it refers to the current output area.
|
||
|
The "virtual" instruction can have an existing area label as the only
|
||
|
argument. This variant allows to extend a previously defined and closed
|
||
|
block with additional data. The area label must refer to a block that was
|
||
|
created earlier in the source with "virtual". Any definition of data within
|
||
|
an extending block is going to have the same effect as if that definition was
|
||
|
present in the original "virtual" block.
|
||
|
|
||
|
virtual at 0 as 'log'
|
||
|
Log::
|
||
|
end virtual
|
||
|
|
||
|
virtual Log
|
||
|
db 'Hello!',13,10
|
||
|
end virtual
|
||
|
|
||
|
If an area label is used in an expression, it forms a variable term of a
|
||
|
linear polynomial. The metadata of such term is the base address of the area.
|
||
|
The metadata of an area label itself, accessible with "sizeof" operator,
|
||
|
is equal to the current length of data within the area.
|
||
|
There is an additional variant of "load" and "store" directives that allows
|
||
|
to read and modify already generated data in the output file given simply
|
||
|
an offset within that output. This variant is recognized when the "at" or
|
||
|
"from" keyword is followed by ":" character and then the value of an offset.
|
||
|
|
||
|
checksum = 0
|
||
|
repeat $%
|
||
|
load a : byte from : %-1
|
||
|
checksum = checksum + a
|
||
|
end repeat
|
||
|
|
||
|
The "restartout" instruction abandons all the output generated up to this
|
||
|
point and starts anew with an empty one. An optional argument may specify
|
||
|
the base address of newly started output area. When "restartout" has no
|
||
|
argument, the current address is preserved by using it as the base for the
|
||
|
new area.
|
||
|
The "org", "section" and "restartout" instructions cannot be used inside
|
||
|
a "virtual" block, they can only separate areas that go into the output file.
|
||
|
|
||
|
|
||
|
14. Source and output control
|
||
|
|
||
|
The "include" instruction reads the source text from another file and
|
||
|
processes it before proceeding further in the current source. Its argument
|
||
|
should be a string defining the path to a file (the format of the path may
|
||
|
depend on the operating system). If there is a "!" between the instruction
|
||
|
and the argument, the other file is read and processed unconditionally,
|
||
|
even when it is inside a skipped block (the unconditional instructions from
|
||
|
the other file may then get recognized).
|
||
|
|
||
|
include 'macro.inc'
|
||
|
|
||
|
An additional argument may be optionally added (separated from the path
|
||
|
by comma), and it is interpreted as a command to be executed after the file
|
||
|
has been read and inserted into the source stream, just before processing
|
||
|
the first line.
|
||
|
The "eval" instruction takes a sequence of bytes defined by its arguments,
|
||
|
treats it as a source text and assembles it. The arguments are either strings
|
||
|
or the numeric values of single bytes, separated with commas. In the next
|
||
|
example "eval" is used to generate definitions of symbols named as a
|
||
|
consecutive letters of the alphabet:
|
||
|
|
||
|
repeat 26
|
||
|
eval 'A'+%-1,'=',`%
|
||
|
end repeat
|
||
|
|
||
|
assert B = 2
|
||
|
|
||
|
The "display" instruction causes a sequence of bytes to be written into
|
||
|
standard output, next to the messages generated by the assembler. It should
|
||
|
be followed by strings or numeric values of single bytes, separated
|
||
|
with commas. The following example uses "repeat 1" to define a parameter
|
||
|
with a decimal representation of computed number, and then displays it as
|
||
|
a string:
|
||
|
|
||
|
macro show description,value
|
||
|
repeat 1, d:value
|
||
|
display description,`d,13,10
|
||
|
end repeat
|
||
|
end macro
|
||
|
|
||
|
show '2^64=',1 shl 64
|
||
|
|
||
|
The "err" instruction signalizes an error in the assembly process, with
|
||
|
a custom message specified by its argument. It allows the same kind of
|
||
|
arguments as the "display" directive.
|
||
|
|
||
|
if $>10000h
|
||
|
err 'segment too large'
|
||
|
end if
|
||
|
|
||
|
The "format" directive allows to set up additional options concerning
|
||
|
the main output. Currently the only available choice is "format binary" followed
|
||
|
by the "as" keyword and a string defining an extension for the output file.
|
||
|
Unless a name of the output file is specified from the command line, it is
|
||
|
constructed from the path to the main source file by dropping the extension and
|
||
|
attaching a new extension if such is defined.
|
||
|
|
||
|
format binary as 'com'
|
||
|
|
||
|
The "format" directive, analogously to "end", uses an identifier that follows
|
||
|
it to find an instruction in the child namespace of case-insensitive symbol
|
||
|
named "format". The only built-in instruction that resides in that namespace
|
||
|
is the "binary", but additional ones may be defined in form of macroinstructions.
|
||
|
The built-in symbol "__time__" (with legacy synonym "%t") has the constant value
|
||
|
of the timestamp marking the point in time when the assembly was started.
|
||
|
The "__file__" is a built-in symbol whose value is a string containing
|
||
|
the name of currently processed source file. The accompanying "__line__" symbol
|
||
|
provides the number of currently processed line in that file. When these symbols
|
||
|
are accessed within a macroinstruction, they keep the same value they had for the
|
||
|
calling line. If there are several levels of macroinstructions calling each
|
||
|
other, these symbols have the same value everywhere, corresponding to the line
|
||
|
that called the outermost macroinstruction.
|
||
|
The "__source__" is another built-in symbol, with value being a string containing
|
||
|
the name of the main source file.
|
||
|
The "retaincomments" directive switches the assembler to treat a semicolon as
|
||
|
a regular token and therefore not strip comments from lines before processing.
|
||
|
This allows to use semicolons in places like MATCH pattern.
|
||
|
|
||
|
retaincomments
|
||
|
macro ? line&
|
||
|
match instruction ; comment , line
|
||
|
virtual
|
||
|
comment
|
||
|
end virtual
|
||
|
instruction
|
||
|
else
|
||
|
line
|
||
|
end match
|
||
|
end macro
|
||
|
|
||
|
var dd ? ; bvar db ?
|
||
|
|
||
|
The "isolatelines" directive prevents the assembler from subsequently combining
|
||
|
lines read from the source text when the line break is preceded by a backslash.
|
||
|
The "removecomments" directive brings back the default behavior of semicolons
|
||
|
and the "combinelines" directive allows lines from the source text to be combined
|
||
|
as usual.
|
||
|
|
||
|
|
||
|
15. CALM instructions
|
||
|
|
||
|
The "calminstruction" directive allows to define new instructions in form of
|
||
|
compiled sequences of specialized commands. As opposed to regular macroinstructions,
|
||
|
which operate on a straightforward principle of textual substitution, CALM
|
||
|
(Compiled Assembly-Like Macro) instructions are able to perform many operations
|
||
|
without passing any text through the standard preprocessing and assembly cycle.
|
||
|
This allows for a finer control, better error handling and faster execution.
|
||
|
All references to symbols in the text defining a CALM instruction are fixed
|
||
|
at the time of definition. As a consequence, any symbols local to the CALM instruction
|
||
|
are shared among all its executed instances (for example consecutive instances may see
|
||
|
the values of local symbols left by the previous ones). To aid in reusing these
|
||
|
references, commands in CALM are generally operating on variables, routinely rewriting
|
||
|
the symbols with new values.
|
||
|
A "calminstruction" statement follows the same rules as "macro" declaration,
|
||
|
including options like "!" modifier to define unconditional instruction, "*" to mark
|
||
|
a required argument, ":" to give it a default value and "&" to indicate that
|
||
|
the final argument consumes all the remaining text in line.
|
||
|
However, because CALM instruction operates outside of the standard preprocessing
|
||
|
and assembly cycle, its arguments do not become preprocessed parameters. Instead
|
||
|
they are local symbolic variables, given new values every time the instruction is called.
|
||
|
If the name of defined instruction is preceded by another name enclosed in round
|
||
|
brackets, the statement defines a labeled instruction and enclosed name is the
|
||
|
argument that is going to receive the text of the label.
|
||
|
In the definition of CALM instruction, only the statements of its specialized
|
||
|
language are identified. The initial symbol of every line must be a simple name without
|
||
|
modifiers and it is only recognized as valid instruction if a case-insensitive symbol with
|
||
|
such name is found in the namespace of CALM commands (which, for the purpose
|
||
|
of customization, is accessible as the namespace anchored at the case-insensitive
|
||
|
"calminstruction" symbol). When no such named instruction is found, the initial name may
|
||
|
become a label if it is followed by ":", it is then treated as a case-sensitive symbol
|
||
|
belonging to a specialized class. Symbols of this class are only recognized when used
|
||
|
as arguments to CALM jump commands (described further down).
|
||
|
An "end calminstruction" statement needs to be used to close the definition and
|
||
|
bring back normal mode of assembly. It is not a regular "end" command,
|
||
|
but an identically named instruction in the CALM namespace, which only accepts
|
||
|
"calminstruction" as its argument.
|
||
|
The "assemble" is a command that takes a single argument, which should be
|
||
|
an identifier of a symbolic variable. The text of this variable is passed directly
|
||
|
to assembly, without any preprocessing (if the text came from an argument to
|
||
|
the instruction, it already went through preprocessing when that line was prepared).
|
||
|
|
||
|
calminstruction please? cmd&
|
||
|
assemble cmd
|
||
|
end calminstruction
|
||
|
|
||
|
please display 'Hi!'
|
||
|
|
||
|
The "match" command is in many ways similar to the standard directive with the same
|
||
|
name. Its first argument should be a pattern following the same rules as those for
|
||
|
"match" directive. The second argument must be an identifier of a symbolic variable,
|
||
|
whose text is going to be matched against the pattern. The name tokens in pattern
|
||
|
(except for the ones made literal with "=" symbol) are treated as names of variables
|
||
|
where the matched portions of text should be put if the match is successful. The same
|
||
|
variable that is a source of text can also be used in pattern as a variable
|
||
|
to write to. When there is no match, all variables remain unaffected.
|
||
|
|
||
|
calminstruction please? cmd&
|
||
|
match (cmd), cmd
|
||
|
assemble cmd
|
||
|
end calminstruction
|
||
|
|
||
|
please(display 'Hi!')
|
||
|
|
||
|
Whether the match was successful can also be tested with a conditional jump "jyes"
|
||
|
or "jno" following the "match" command. A "jyes" jump is taken only when the match
|
||
|
succeeded.
|
||
|
|
||
|
calminstruction please? cmd&
|
||
|
match =do? =not? cmd, cmd
|
||
|
jyes done
|
||
|
assemble cmd
|
||
|
done:
|
||
|
end calminstruction
|
||
|
|
||
|
please do not display 'Bye!'
|
||
|
|
||
|
To further control the flow of processing, the "jump" command allows to jump
|
||
|
unconditionally, and with "exit" it is possible to terminate processing of
|
||
|
CALM instruction at any moment (this command takes no arguments).
|
||
|
While the symbols used for the arguments of the instruction are implicitly local,
|
||
|
other identifiers may become fixed references to global symbols if they are seen
|
||
|
as accessible at the time of definition (because in CALM instruction all such references
|
||
|
are treated as uses, not as definitions). A command like "match" may then write to
|
||
|
a global variable.
|
||
|
|
||
|
define comment
|
||
|
|
||
|
calminstruction please? cmd&
|
||
|
match cmd //comment, cmd
|
||
|
assemble cmd
|
||
|
end calminstruction
|
||
|
|
||
|
please display 'Hi!' // 3
|
||
|
db comment ; db 3
|
||
|
|
||
|
To enforce treatment of a symbol as local, a "local" command should be used, followed
|
||
|
by one or more names separated with commas.
|
||
|
|
||
|
calminstruction please? cmd&
|
||
|
local comment
|
||
|
match cmd //comment, cmd
|
||
|
assemble cmd
|
||
|
end calminstruction
|
||
|
|
||
|
A symbol made local is initally assigned a defined but unusable value.
|
||
|
If a pattern in CALM instruction has a "?" character immediately following the name
|
||
|
of a wildcard, it does not affect how the symbol is identified (whether the used symbol
|
||
|
is case-insensitive depends on what is present in the local scope at the time
|
||
|
the instruction is defined). Instead, modifying the name of a wildcard with "?" allows it
|
||
|
to be matched with an empty text.
|
||
|
Since the source text for "match" is in this variant given by just a single identifier,
|
||
|
this syntax allows to have more optional arguments. A third argument to "match" may
|
||
|
contain a pair of bracket characters. Any wildcard element must then be matched with
|
||
|
a text that has this kind of brackets properly balanced.
|
||
|
|
||
|
calminstruction please? cmd&
|
||
|
local first, second
|
||
|
match first + second, cmd, ()
|
||
|
jyes split
|
||
|
assemble cmd
|
||
|
exit
|
||
|
split:
|
||
|
assemble first
|
||
|
assemble second
|
||
|
end calminstruction
|
||
|
|
||
|
please display 'H',('g'+2) + display '!'
|
||
|
|
||
|
The "arrange" command is like an inverse of "match", it can build up a text
|
||
|
containing the values of one or more symbolic variables. The first argument defines
|
||
|
a variable where the constructed text is going to be stored, while the second argument
|
||
|
is a pattern formed in the same way as for "match" (except that it does not need
|
||
|
to precede a comma with "=" to have it included in the argument).
|
||
|
All non-name tokens other than "=" and tokens preceded with "=" are copied literally
|
||
|
into the constructed text and they do not carry any recognition context with them.
|
||
|
The name tokens that are not made literal with "=" are treates as names of variables
|
||
|
whose symbolic values are put in their place into the constructed text.
|
||
|
|
||
|
calminstruction addr? arg
|
||
|
local base, index
|
||
|
match base[index], arg
|
||
|
local cmd
|
||
|
arrange cmd, =dd base + index
|
||
|
assemble cmd
|
||
|
end calminstruction
|
||
|
|
||
|
addr 8[5] ; dd 8 + 5
|
||
|
|
||
|
With suitably selected patterns, "arrange" can be used to copy symbolic value
|
||
|
from one variable to another or to assign it a fixed value (even an empty one).
|
||
|
If a variable used in pattern turns out to have a numeric value instead of symbolic,
|
||
|
as long as it is a non-negative number with no additional terms, it is converted
|
||
|
into a decimal token stored into the constructed symbolic value (an operation
|
||
|
that outside of CALM instructions would require use of a "repeat 1" trick):
|
||
|
|
||
|
digit = 4 - 1
|
||
|
|
||
|
calminstruction demo
|
||
|
local cmd
|
||
|
arrange cmd, =display digit#0h
|
||
|
assemble cmd
|
||
|
end calminstruction
|
||
|
|
||
|
demo ; display 3#0h
|
||
|
|
||
|
This is the only case when a non-symbolic value is converted to symbols that may be
|
||
|
put into text composed by "arrange", other types are not supported.
|
||
|
The "compute" command allows to evaluate expressions and assign numeric results to
|
||
|
variables. The first argument to "compute" defines a target where the result should
|
||
|
be stored, while the second argument can be any numeric expression, which is
|
||
|
becomes pre-compiled at the time of definition. When the expression is evaluated
|
||
|
and any of the symbols it refers to turns out to have symbolic value, this text
|
||
|
is parsed as a new sub-expression, and its calculated value is then used in the
|
||
|
computation of the main expression.
|
||
|
A "compute" therefore can be used not only to evaluate a pre-defined expression,
|
||
|
but also to parse and compute an expression from a text of a symbolic variable
|
||
|
(like one coming from an argument to the instruction), or a combination of both:
|
||
|
|
||
|
a = 0
|
||
|
|
||
|
calminstruction low expr*
|
||
|
compute a, expr and 0FFh
|
||
|
end calminstruction
|
||
|
|
||
|
low 200 + 73 ; a = 11h
|
||
|
|
||
|
Because symbolic variable is evaluated as a sub-expression, its use here has no
|
||
|
side-effects that would be caused by a straightforward text substitution.
|
||
|
The "check" command is analogous to "if". It evaluates a condition defined by
|
||
|
the logical expression that follows it and accordingly sets up the result flag which
|
||
|
may be tested with "jyes" or "jno" command. The values of symbolic variables
|
||
|
are treated as numeric sub-expressions (they may not contain any operators specific
|
||
|
to logical expression).
|
||
|
|
||
|
calminstruction u8range? value
|
||
|
check value >= 0 & value < 256
|
||
|
jyes ok
|
||
|
local cmd
|
||
|
arrange cmd, =err 'value out of range'
|
||
|
assemble cmd
|
||
|
ok:
|
||
|
end calminstruction
|
||
|
|
||
|
u8range -1
|
||
|
|
||
|
All commands that are not explicitly said to set the flag that is checked by "jyes"
|
||
|
and "jno", keep the value of this flag unchanged.
|
||
|
The "publish" command allows to assign a value to a symbol identified by the text
|
||
|
held in a variable. This allows to define a symbol with a name constructed with
|
||
|
a command like "arrange", or a name that was passed in an argument to an instruction.
|
||
|
The first argument needs to be the symbolic variable containing the identifier
|
||
|
of the symbol to define, the second argument should be the variable holding
|
||
|
the value to assign (either symbolic or numeric). The first argument may be
|
||
|
followed by ":" character to indicate that the symbol should be made constant,
|
||
|
or it can be preceded by ":" to make the value stacked on top of the previous one
|
||
|
(so that the previous one can be brought back with "restore" directive).
|
||
|
|
||
|
calminstruction constdefine? var
|
||
|
local val
|
||
|
arrange val,
|
||
|
match var= val, var
|
||
|
publish var:, val
|
||
|
end calminstruction
|
||
|
|
||
|
constdefine plus? +
|
||
|
|
||
|
The above instruction allows to define a symbolic constant, something that is not
|
||
|
possible with standard directives of the assembler.
|
||
|
The purpose of "transform" command is to replace identifiers of symbolic variables
|
||
|
(or constants) with their values in a given text, which is the same operation as done
|
||
|
by "equ" directive when it prepares the value to assign. The argument to "transform"
|
||
|
should be a symbolic variable whose value is going to be processed this way and then
|
||
|
replaced by the transformed text.
|
||
|
|
||
|
calminstruction (var) constequ? val
|
||
|
transform val
|
||
|
publish var:, val
|
||
|
end calminstruction
|
||
|
|
||
|
A "transform" command updates the result flag to indicate whether any replacement
|
||
|
has been done.
|
||
|
|
||
|
calminstruction prepasm? cmd&
|
||
|
loop:
|
||
|
transform cmd
|
||
|
jyes loop ; warning: may hang on cyclic references
|
||
|
assemble cmd
|
||
|
end calminstruction
|
||
|
|
||
|
The result flag is modified only by some of the commands, like "check", "match"
|
||
|
or "transform". Other commands keep it unchanged.
|
||
|
Optionally, "transform" can have two arguments, with second one specifying
|
||
|
a namespace. Identifiers in the text given by the first argument are then interpreted
|
||
|
as symbols in this namespace regardless of their original context.
|
||
|
The "stringify" is a command that converts text of a variable into a string
|
||
|
and writes it into the same variable, specified by the only argument. This operation
|
||
|
is similar to the one performed by "`" operator in preprocessing, but it produces
|
||
|
a value of string type, not a quoted string.
|
||
|
|
||
|
calminstruction (var) strcalc? val
|
||
|
compute val, val ; compute expression
|
||
|
arrange val, val ; convert result to a decimal token
|
||
|
stringify val ; convert decimal token to string
|
||
|
publish var, val
|
||
|
end calminstruction
|
||
|
|
||
|
p strcalc 1 shl 1000
|
||
|
display p
|
||
|
|
||
|
While most commands available to CALM instructions replace the values of variables
|
||
|
when writing to them, the "take" is a command that allows to work with stacks of values.
|
||
|
It removes the topmost value of the source symbol (specified by the second argument)
|
||
|
and gives it to the destination symbol (the first argument), placing it on top of any
|
||
|
existing values. The destination argument may be empty, in such case the value is
|
||
|
removed completely and the operation is analogous to "restore" directive. This command
|
||
|
updates the result flag to indicate whether there was any value to remove.
|
||
|
If the destination symbol is the same as source, the result flag can be used to check
|
||
|
whether there is an available value without affecting it.
|
||
|
|
||
|
calminstruction reverse? cmd&
|
||
|
local tmp, stack
|
||
|
collect:
|
||
|
match tmp=,cmd, cmd
|
||
|
take stack, tmp
|
||
|
jyes collect
|
||
|
execute:
|
||
|
assemble cmd
|
||
|
take cmd, stack
|
||
|
jyes execute
|
||
|
end calminstruction
|
||
|
|
||
|
reverse display '!', display 'i', display 'H'
|
||
|
|
||
|
A symbol accessed as either destination or source by a "take" command can never be
|
||
|
forward-referenced even if it could otherwise.
|
||
|
Defining macroinstructions in the namespace of case-insensitive "calminstruction"
|
||
|
allows to add customized commands to the language of CALM instructions. However,
|
||
|
they must be defined as case-insensitive to be recognized as such.
|
||
|
|
||
|
macro calminstruction?.asmarranged? variable*, pattern&
|
||
|
arrange variable, pattern
|
||
|
assemble variable
|
||
|
end macro
|
||
|
|
||
|
calminstruction writeln? text&
|
||
|
asmarranged text, =display text,10
|
||
|
end calminstruction
|
||
|
|
||
|
writeln 'Next!'
|
||
|
|
||
|
Such additional commands may even be defined as CALM instructions themselves:
|
||
|
|
||
|
calminstruction calminstruction?.initsym? variable*,value&
|
||
|
publish variable, value
|
||
|
end calminstruction
|
||
|
|
||
|
calminstruction show? text&
|
||
|
local command
|
||
|
initsym command, display text
|
||
|
stringify text
|
||
|
assemble command
|
||
|
end calminstruction
|
||
|
|
||
|
show :)
|
||
|
|
||
|
The command "initsym" in this example is used to assign text to the local
|
||
|
symbolic variable at the time when "show" instruction is defined.
|
||
|
Similarly to "local" (and unlike "stringify" and "assemble") it does not produce
|
||
|
any actual code that would be executed when the "show" instruction is called.
|
||
|
The arguments to "initsym" retain their original context, therefore symbols
|
||
|
in the text assigned to the "command" variable are interpreted as in the local
|
||
|
namespace of the "show" instruction. This allows the "display" command to access
|
||
|
the "text" even though it is local to the CALM instruction and therefore normally
|
||
|
visible only in the scope of the definition of "show". This is similar to the use
|
||
|
of "define" to form symbolic links.
|
||
|
The "call" command allows to directly execute another CALM instruction. Its
|
||
|
first argument must provide an identifier of an instruction-class symbol, and
|
||
|
at the execution time this symbol must be defined as CALM (it is not possible
|
||
|
to call a macroinstruction or a built-in instruction this way). The execution
|
||
|
then proceeds directly to the entry point of that instruction, and only returns
|
||
|
after the called instruction finishes.
|
||
|
|
||
|
define Msg display 'Hi'
|
||
|
|
||
|
calminstruction showMsg
|
||
|
assemble Msg
|
||
|
end calminstruction
|
||
|
|
||
|
calminstruction demo
|
||
|
call showMsg
|
||
|
arrange Msg, =display '!'
|
||
|
call showMsg
|
||
|
end calminstruction
|
||
|
|
||
|
demo
|
||
|
|
||
|
When looking up the instruction symbol, the assembler skips the local namespace
|
||
|
of the CALM instruction, as it is not expected to contain instruction definitions.
|
||
|
Additional arguments to "call" should be identifiers of variables (or constants)
|
||
|
whose values are going to be passed as arguments to the called instruction.
|
||
|
The values of these symbols are assigned directly to the argument variables,
|
||
|
without any additional validation - this allows to pass to the CALM instruction
|
||
|
some values that otherwise would be impossible to pass directly, like numeric ones
|
||
|
(because when the instructions are called normally, the arguments are treated as
|
||
|
text and assigned as symbolic values). An argument may be omitted when the definition
|
||
|
of called instruction allows it, in such case the default value for that argument
|
||
|
is used.
|
||
|
|
||
|
calminstruction hex_nibble digit*, command: display
|
||
|
compute digit, 0FFh and '0123456789ABCDEF' shr (digit*8)
|
||
|
arrange command, command digit
|
||
|
assemble command
|
||
|
end calminstruction
|
||
|
|
||
|
calminstruction display_hex_byte value: DATA
|
||
|
compute value, value
|
||
|
local digit
|
||
|
compute digit, (value shr 4) and 0Fh
|
||
|
call hex_nibble, digit
|
||
|
compute digit, value and 0Fh
|
||
|
call hex_nibble, digit
|
||
|
end calminstruction
|
||
|
|
||
|
DATA = 0xedfe
|
||
|
|
||
|
calminstruction demo
|
||
|
call display_hex_byte
|
||
|
compute DATA, DATA shr 8
|
||
|
call display_hex_byte
|
||
|
end calminstruction
|
||
|
|
||
|
demo
|
||
|
|
||
|
|
||
|
16. Assembly commands in CALM instructions
|
||
|
|
||
|
An additional sets of commands for CALM instructions makes it possible
|
||
|
to use them for more than just pre-processing, but also to directly generate
|
||
|
and process output. They perform elementary operations, mostly on a single
|
||
|
unit of data, but at the same time they can perform many calculations in-place,
|
||
|
because their arguments, with few exceptions, are pre-compiled expressions,
|
||
|
similar to the second argument to "compute".
|
||
|
The "display" command presents a string of bytes as a message in
|
||
|
the standard output, just like the regular directive with the same name.
|
||
|
It takes a single argument, an expression giving either a string or a numeric
|
||
|
value of a single byte.
|
||
|
The "err" command signalizes an error, analogously to its namesake in
|
||
|
base language. It takes a single argument, specifying a custom message to
|
||
|
present. The argument is expected to evaluate to string value.
|
||
|
The "emit" command generates data of length specified by the first argument
|
||
|
and the value specified by the second. Both arguments are treated as
|
||
|
pre-compiled expressions. The second argument is optional, if it is omitted,
|
||
|
the data of specified length is generated as uninitialized. When the second
|
||
|
argument is a string, it must fit within the specified size (a "lengthof"
|
||
|
operator may be useful in this case).
|
||
|
The "load" and "store" commands allow to inspect or modify values in the
|
||
|
already generated output or in the virtual blocks. While they are similar
|
||
|
to their counterparts in base language, they have a different syntax, both
|
||
|
always taking three comma-separated arguments. Unlike their cousins, they
|
||
|
do not operate on addresses associated with output areas, but on raw offsets.
|
||
|
To point to the first byte of an area, stated offset must be zero.
|
||
|
The arguments to "load" are, in order: target variable, offset to load from,
|
||
|
number of bytes to load. The first argument must be an identifier of a symbol,
|
||
|
the latter two are pre-compiled expressions. The second argument may contain
|
||
|
a label of the area, followed by ":" and then offset, or just a plain numeric
|
||
|
expression, in which case it is an offset within entire output generated
|
||
|
up to this point. The loaded value is always a string of the specified length.
|
||
|
The arguments to "store" are, in order: offset to store at, number of bytes
|
||
|
to store, the value to store (numeric or string). The last two arguments
|
||
|
are analogous to the arguments to "emit", at the same time the first two
|
||
|
arguments are like the last two arguments to "load". The offset may be
|
||
|
prepended with the label of an area with ":" as the separator.
|
||
|
To convert between the addresses used by classic "load" and "store" and
|
||
|
the raw offsets expected by the CALM commands, it suffices to add or subtract
|
||
|
the base address of the area. If the base address is not known, it can be
|
||
|
obtained with help of "1 metadataof" operator applied to an area label.
|