430 lines
23 KiB
Plaintext
430 lines
23 KiB
Plaintext
|
|
||
|
flat assembler
|
||
|
Symbolic information file format
|
||
|
|
||
|
|
||
|
Table 1 Header
|
||
|
/-------------------------------------------------------------------------\
|
||
|
| Offset | Size | Description |
|
||
|
|========|=========|======================================================|
|
||
|
| +0 | dword | Signature 1A736166h (little-endian). |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +4 | byte | Major version of flat assembler. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +5 | byte | Minor version of flat assembler. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +6 | word | Length of header. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +8 | dword | Offset of input file name in the strings table. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +12 | dword | Offset of output file name in the strings table. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +16 | dword | Offset of strings table. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +20 | dword | Length of strings table. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +24 | dword | Offset of symbols table. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +28 | dword | Length of symbols table. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +32 | dword | Offset of preprocessed source. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +36 | dword | Length of preprocessed source. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +40 | dword | Offset of assembly dump. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +44 | dword | Length of assembly dump. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +48 | dword | Offset of section names table. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +52 | dword | Length of section names table. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +56 | dword | Offset of symbol references dump. |
|
||
|
|--------|---------|------------------------------------------------------|
|
||
|
| +60 | dword | Length of symbol references dump. |
|
||
|
\-------------------------------------------------------------------------/
|
||
|
|
||
|
Notes:
|
||
|
|
||
|
If header is shorter than 64 bytes, it comes from a version that does not
|
||
|
support dumping some of the structures. It should then be interpreted
|
||
|
that the data for missing structures could not be provided, not that the
|
||
|
size of that data is zero.
|
||
|
|
||
|
Offsets given in header generally mean positions in the file, however
|
||
|
input and output file names are specified by offsets in the strings table,
|
||
|
so you have to add their offset to the offset of strings table to obtain
|
||
|
the positions of those strings in the file.
|
||
|
|
||
|
The strings table contains just a sequence of ASCIIZ strings, which may
|
||
|
be referred to by other parts of the file. It contains the names of
|
||
|
main input file, the output file, and the names of the sections and
|
||
|
external symbols if there were any.
|
||
|
|
||
|
The symbols table is an array of 32-byte structures, each one in format
|
||
|
specified by table 2.
|
||
|
|
||
|
The preprocessed source is a sequence of preprocessed lines, each one
|
||
|
in format as defined in table 3.
|
||
|
|
||
|
The assembly dump contains an array of 28-byte structures, each one in
|
||
|
format specified by table 4, and at the end of this array an additional
|
||
|
double word containing the offset in output file at which the assembly
|
||
|
was ended.
|
||
|
|
||
|
It is possible that file does not contain assembly dump at all - this
|
||
|
happens when some error occured and only the preprocessed source was
|
||
|
dumped. If error occured during the preprocessing, only the source up to
|
||
|
the point of error is provided. In such case (and only then) the field
|
||
|
at offset 44 contains zero.
|
||
|
|
||
|
The section names table exists only when the output format was an object
|
||
|
file (ELF or COFF), and it is an array of 4-byte entries, each being an
|
||
|
offset of the name of the section in the strings table.
|
||
|
The index of section in this table is the same, as the index of section
|
||
|
in the generated object file.
|
||
|
|
||
|
The symbol references dump contains an array of 8-byte structures, each
|
||
|
one describes an event of some symbol being used. The first double word
|
||
|
of such structure contains an offset of symbol in the symbols table,
|
||
|
and the second double word is an offset of structure in assembly dump,
|
||
|
which specifies at what moment the symbol was referenced.
|
||
|
|
||
|
|
||
|
Table 2 Symbol structure
|
||
|
/-------------------------------------------------------------------------\
|
||
|
| Offset | Size | Description |
|
||
|
|========|=======|========================================================|
|
||
|
| +0 | qword | Value of symbol. |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +8 | word | Flags (table 2.1). |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +10 | byte | Size of data labelled by this symbol (zero means plain |
|
||
|
| | | label without size attached). |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +11 | byte | Type of value (table 2.2). Any value other than zero |
|
||
|
| | | means some kind of relocatable symbol. |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +12 | dword | Extended SIB, the first two bytes are register codes |
|
||
|
| | | and the second two bytes are corresponding scales. |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +16 | word | Number of pass in which symbol was defined last time. |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +18 | word | Number of pass in which symbol was used last time. |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +20 | dword | If the symbol is relocatable, this field contains |
|
||
|
| | | information about section or external symbol, to which |
|
||
|
| | | it is relative - otherwise this field has no meaning. |
|
||
|
| | | When the highest bit is cleared, the symbol is |
|
||
|
| | | relative to a section, and the bits 0-30 contain |
|
||
|
| | | the index (starting from 1) in the table of sections. |
|
||
|
| | | When the highest bit is set, the symbol is relative to |
|
||
|
| | | an external symbol, and the bits 0-30 contain the |
|
||
|
| | | the offset of the name of this symbol in the strings |
|
||
|
| | | table. |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +24 | dword | If the highest bit is cleared, the bits 0-30 contain |
|
||
|
| | | the offset of symbol name in the preprocessed source. |
|
||
|
| | | This name is a pascal-style string (byte length |
|
||
|
| | | followed by string data). |
|
||
|
| | | Zero in this field means an anonymous symbol. |
|
||
|
| | | If the highest bit is set, the bits 0-30 contain the |
|
||
|
| | | offset of the symbol name in the strings table, and |
|
||
|
| | | this name is a zero-ended string in this case (as are |
|
||
|
| | | all the strings there). |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +28 | dword | Offset in the preprocessed source of line that defined |
|
||
|
| | | this symbol (see table 3). |
|
||
|
\-------------------------------------------------------------------------/
|
||
|
|
||
|
|
||
|
Table 2.1 Symbol flags
|
||
|
/-----------------------------------------------------------------\
|
||
|
| Bit | Value | Description |
|
||
|
|=====|=======|===================================================|
|
||
|
| 0 | 1 | Symbol was defined. |
|
||
|
|-----|-------|---------------------------------------------------|
|
||
|
| 1 | 2 | Symbol is an assembly-time variable. |
|
||
|
|-----|-------|---------------------------------------------------|
|
||
|
| 2 | 4 | Symbol cannot be forward-referenced. |
|
||
|
|-----|-------|---------------------------------------------------|
|
||
|
| 3 | 8 | Symbol was used. |
|
||
|
|-----|-------|---------------------------------------------------|
|
||
|
| 4 | 10h | The prediction was needed when checking |
|
||
|
| | | whether the symbol was used. |
|
||
|
|-----|-------|---------------------------------------------------|
|
||
|
| 5 | 20h | Result of last predicted check for being used. |
|
||
|
|-----|-------|---------------------------------------------------|
|
||
|
| 6 | 40h | The prediction was needed when checking |
|
||
|
| | | whether the symbol was defined. |
|
||
|
|-----|-------|---------------------------------------------------|
|
||
|
| 7 | 80h | Result of last predicted check for being defined. |
|
||
|
|-----|-------|---------------------------------------------------|
|
||
|
| 8 | 100h | The optimization adjustment is applied to |
|
||
|
| | | the value of this symbol. |
|
||
|
|-----|-------|---------------------------------------------------|
|
||
|
| 9 | 200h | The value of symbol is negative number encoded |
|
||
|
| | | as two's complement. |
|
||
|
|-----|-------|---------------------------------------------------|
|
||
|
| 10 | 400h | Symbol is a special marker and has no value. |
|
||
|
\-----------------------------------------------------------------/
|
||
|
|
||
|
Notes:
|
||
|
|
||
|
Some of those flags are listed here just for completness, as they
|
||
|
have little use outside of the flat assembler. However the bit 0
|
||
|
is important, because the symbols table contains all the labels
|
||
|
that occured in source, even if some of them were in the
|
||
|
conditional blocks that did not get assembled.
|
||
|
|
||
|
|
||
|
Table 2.2 Symbol value types
|
||
|
/-------------------------------------------------------------------\
|
||
|
| Value | Description |
|
||
|
|=======|===========================================================|
|
||
|
| 0 | Absolute value. |
|
||
|
|-------|-----------------------------------------------------------|
|
||
|
| 1 | Relocatable segment address (only with MZ output). |
|
||
|
|-------|-----------------------------------------------------------|
|
||
|
| 2 | Relocatable 32-bit address. |
|
||
|
|-------|-----------------------------------------------------------|
|
||
|
| 3 | Relocatable relative 32-bit address (value valid only for |
|
||
|
| | symbol used in the same place where it was calculated, |
|
||
|
| | it should not occur in the symbol structure). |
|
||
|
|-------|-----------------------------------------------------------|
|
||
|
| 4 | Relocatable 64-bit address. |
|
||
|
|-------|-----------------------------------------------------------|
|
||
|
| 5 | [ELF only] GOT-relative 32-bit address. |
|
||
|
|-------|-----------------------------------------------------------|
|
||
|
| 6 | [ELF only] 32-bit address of PLT entry. |
|
||
|
|-------|-----------------------------------------------------------|
|
||
|
| 7 | [ELF only] Relative 32-bit address of PLT entry (value |
|
||
|
| | valid only for symbol used in the same place where it |
|
||
|
| | was calculated, it should not occur in the symbol |
|
||
|
| | structure). |
|
||
|
\-------------------------------------------------------------------/
|
||
|
|
||
|
Notes:
|
||
|
|
||
|
The types 3 and 7 should never be encountered in the symbols dump,
|
||
|
they are only used internally by the flat assembler.
|
||
|
|
||
|
If type value is a negative number, it is an opposite of a value
|
||
|
from this table and it means that the symbol of a given type has
|
||
|
been negated.
|
||
|
|
||
|
|
||
|
Table 2.3 Register codes for extended SIB
|
||
|
/------------------\
|
||
|
| Value | Register |
|
||
|
|=======|==========|
|
||
|
| 23h | BX |
|
||
|
|-------|----------|
|
||
|
| 25h | BP |
|
||
|
|-------|----------|
|
||
|
| 26h | SI |
|
||
|
|-------|----------|
|
||
|
| 27h | DI |
|
||
|
|-------|----------|
|
||
|
| 40h | EAX |
|
||
|
|-------|----------|
|
||
|
| 41h | ECX |
|
||
|
|-------|----------|
|
||
|
| 42h | EDX |
|
||
|
|-------|----------|
|
||
|
| 43h | EBX |
|
||
|
|-------|----------|
|
||
|
| 44h | ESP |
|
||
|
|-------|----------|
|
||
|
| 45h | EBP |
|
||
|
|-------|----------|
|
||
|
| 46h | ESI |
|
||
|
|-------|----------|
|
||
|
| 47h | EDI |
|
||
|
|-------|----------|
|
||
|
| 48h | R8D |
|
||
|
|-------|----------|
|
||
|
| 49h | R9D |
|
||
|
|-------|----------|
|
||
|
| 4Ah | R10D |
|
||
|
|-------|----------|
|
||
|
| 4Bh | R11D |
|
||
|
|-------|----------|
|
||
|
| 4Ch | R12D |
|
||
|
|-------|----------|
|
||
|
| 4Dh | R13D |
|
||
|
|-------|----------|
|
||
|
| 4Eh | R14D |
|
||
|
|-------|----------|
|
||
|
| 4Fh | R15D |
|
||
|
|-------|----------|
|
||
|
| 80h | RAX |
|
||
|
|-------|----------|
|
||
|
| 81h | RCX |
|
||
|
|-------|----------|
|
||
|
| 82h | RDX |
|
||
|
|-------|----------|
|
||
|
| 83h | RBX |
|
||
|
|-------|----------|
|
||
|
| 84h | RSP |
|
||
|
|-------|----------|
|
||
|
| 85h | RBP |
|
||
|
|-------|----------|
|
||
|
| 86h | RSI |
|
||
|
|-------|----------|
|
||
|
| 87h | RDI |
|
||
|
|-------|----------|
|
||
|
| 88h | R8 |
|
||
|
|-------|----------|
|
||
|
| 89h | R9 |
|
||
|
|-------|----------|
|
||
|
| 8Ah | R10 |
|
||
|
|-------|----------|
|
||
|
| 8Bh | R11 |
|
||
|
|-------|----------|
|
||
|
| 8Ch | R12 |
|
||
|
|-------|----------|
|
||
|
| 8Dh | R13 |
|
||
|
|-------|----------|
|
||
|
| 8Eh | R14 |
|
||
|
|-------|----------|
|
||
|
| 8Fh | R15 |
|
||
|
|-------|----------|
|
||
|
| 94h | EIP |
|
||
|
|-------|----------|
|
||
|
| 98h | RIP |
|
||
|
\------------------/
|
||
|
|
||
|
|
||
|
Table 3 Preprocessed line
|
||
|
/--------------------------------------------------------------------------\
|
||
|
| Offset | Size | Value |
|
||
|
|========|=================================================================|
|
||
|
| +0 | dword | When the line was loaded from source, this field |
|
||
|
| | | contains either zero (if it is the line from the main |
|
||
|
| | | input file), or an offset inside the preprocessed |
|
||
|
| | | source to the name of file, from which this line was |
|
||
|
| | | loaded (the name of file is zero-ended string). |
|
||
|
| | | When the line was generated by macroinstruction, this |
|
||
|
| | | field contains offset inside the preprocessed source to |
|
||
|
| | | the pascal-style string specifying the name of |
|
||
|
| | | macroinstruction, which generated this line. |
|
||
|
|--------|-------|---------------------------------------------------------|
|
||
|
| +4 | dword | Bits 0-30 contain the number of this line. |
|
||
|
| | | If the highest bit is zeroed, this line was loaded from |
|
||
|
| | | source. |
|
||
|
| | | If the highest bit is set, this line was generated by |
|
||
|
| | | macroinstruction. |
|
||
|
|--------|-------|---------------------------------------------------------|
|
||
|
| +8 | dword | If the line was loaded from source, this field contains |
|
||
|
| | | the position of the line inside the source file, from |
|
||
|
| | | which it was loaded. |
|
||
|
| | | If line was generated by macroinstruction, this field |
|
||
|
| | | contains the offset of preprocessed line, which invoked |
|
||
|
| | | the macroinstruction. |
|
||
|
| | | If line was generated by instantaneous macro, this |
|
||
|
| | | field is equal to the next one. |
|
||
|
|--------|-------|---------------------------------------------------------|
|
||
|
| +12 | dword | If the line was generated by macroinstruction, this |
|
||
|
| | | field contains offset of the preprocessed line inside |
|
||
|
| | | the definition of macro, from which this one was |
|
||
|
| | | generated. |
|
||
|
|--------|-------|---------------------------------------------------------|
|
||
|
| +16 | ? | The tokenized contents of line. |
|
||
|
\--------------------------------------------------------------------------/
|
||
|
|
||
|
Notes:
|
||
|
|
||
|
To determine, whether this is the line loaded from source, or generated by
|
||
|
macroinstruction, you need to check the highest bit of the second double
|
||
|
word.
|
||
|
|
||
|
The contents of line is no longer a text, which it was in source file,
|
||
|
but a sequence of tokens, ended with a zero byte.
|
||
|
Any chain of characters that aren't special ones, separated from other
|
||
|
similar chains with spaces or some other special characters, is converted
|
||
|
into symbol token. The first byte of this element has the value of 1Ah,
|
||
|
the second byte is the count of characters, followed by this amount of
|
||
|
bytes, which build the symbol.
|
||
|
Some characters have a special meaning, and cannot occur inside the
|
||
|
symbol, they split the symbols and are converted into separate tokens.
|
||
|
For example, if source contains this line of text:
|
||
|
|
||
|
mov ax,4
|
||
|
|
||
|
preprocessor converts it into the chain of bytes, shown here with their
|
||
|
hexadecimal values (characters corresponding to some of those values are
|
||
|
placed below the hexadecimal codes):
|
||
|
|
||
|
1A 03 6D 6F 76 1A 02 61 78 2C 1A 01 34 00
|
||
|
m o v a x , 4
|
||
|
|
||
|
The third type of token that can be found in preprocessed line is the
|
||
|
quoted text. This element is created from chain of any bytes other than
|
||
|
line breaks that are placed between the single or double quotes in the
|
||
|
original text. First byte of such element is always 22h, it is followed
|
||
|
by double word which specifies the number of bytes that follow, and the
|
||
|
value of quoted text comes next. For example, this line from source:
|
||
|
|
||
|
mov eax,'ABCD'
|
||
|
|
||
|
is converted into (the notation used is the same as in previous sample):
|
||
|
|
||
|
1A 03 6D 6F 76 1A 03 65 61 78 2C 22 04 00 00 00 41 42 43 44 00
|
||
|
m o v e a x , A B C D
|
||
|
|
||
|
This data defines two symbols followed by symbol character, quoted text
|
||
|
and zero byte that marks end of line.
|
||
|
There is also a special case of symbol token with first byte having the
|
||
|
value 3Bh instead of 1Ah, such symbol means that all the line elements
|
||
|
that follow, including this one, have already been interpreted by
|
||
|
preprocessor and are ignored by assembler.
|
||
|
|
||
|
|
||
|
Table 4 Row of the assembly dump
|
||
|
/-------------------------------------------------------------------------\
|
||
|
| Offset | Size | Description |
|
||
|
|========|=======|========================================================|
|
||
|
| +0 | dword | Offset in output file. |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +4 | dword | Offset of line in preprocessed source. |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +8 | qword | Value of $ address. |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +16 | dword | Extended SIB for the $ address, the first two bytes |
|
||
|
| | | are register codes and the second two bytes are |
|
||
|
| | | corresponding scales. |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +20 | dword | If the $ address is relocatable, this field contains |
|
||
|
| | | information about section or external symbol, to which |
|
||
|
| | | it is relative - otherwise this field is zero. |
|
||
|
| | | When the highest bit is cleared, the address is |
|
||
|
| | | relative to a section, and the bits 0-30 contain |
|
||
|
| | | the index (starting from 1) in the table of sections. |
|
||
|
| | | When the highest bit is set, the address is relative |
|
||
|
| | | to an external symbol, and the bits 0-30 contain the |
|
||
|
| | | the offset of the name of this symbol in the strings |
|
||
|
| | | table. |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +24 | byte | Type of $ address value (as in table 2.2). |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +25 | byte | Type of code - possible values are 16, 32, and 64. |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +26 | byte | If the bit 0 is set, then at this point the assembly |
|
||
|
| | | was taking place inside the virtual block, and the |
|
||
|
| | | offset in output file has no meaning here. |
|
||
|
| | | If the bit 1 is set, the line was assembled at the |
|
||
|
| | | point, which was not included in the output file for |
|
||
|
| | | some other reasons (like inside the reserved data at |
|
||
|
| | | the end of section). |
|
||
|
|--------|-------|--------------------------------------------------------|
|
||
|
| +27 | byte | The higher bits of value of $ address. |
|
||
|
\-------------------------------------------------------------------------/
|
||
|
|
||
|
|
||
|
Notes:
|
||
|
|
||
|
Each row of the assembly dump informs, that the given line of preprocessed
|
||
|
source was assembled at the specified address (defined by its type, value
|
||
|
and the extended SIB) and at the specified position in output file.
|