refrences

2026-02-19 16:16:24 -05:00
parent 3ce2977f01
commit 2d43f1711c
90 changed files with 30482 additions and 1 deletions

@@ -0,0 +1,90 @@
# 20070910 - 2 4th | !2 4th
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20070910.html
![](20070910-A.jpg)
After just finishing up the last show of the season for my Photography Business (lighting shot above from Monument Valley),
I will again soon have time to really push to finish Atom.
Besides working on optimization of the engine, I've fleshed out the scripting language I'm going to use as a rapid development tool to produce the game.
Since I'm going solo on this endeavor, I'm fully taking advantage of the fact that I can deviate from the common C/C++ standard which would be a practical requirement of working in a large team.
One notable, successful, and practical use of non-C like languages for game development is the [Game Object Assembly Lisp (GOAL)](http://www.gamasutra.com/features/20020710/white_02.htm)
used to build Jak and Daxter for the PS2. With inline assembly and runtime code generation,
looks like they found a perfect mix of both a very high and very low level programming language.
**Forth**
Similar in concept to GOAL, I've decided upon a Color Forth like rapid development programming tool for Atom which has the following advantages.
1. Run-time x86/x86-64 binary code generation. Can code the game from inside the game itself.
2. Ability to natively mix low level assembly, mid-level systems programming, and high level scripting all in the same source code.
Full Forth-like factoring of code at any level.
3. Compiler which has no need of lexical analysis, preprocessing, parsing, or semantic analysis.
Code is in a binary form with near direct factored translation to machine code.
So full re-compile of the entire source base in less than a second, from within the game itself.
4. Built-in in-game console, code/data editor, scripting, and debug tool. Editor works somewhat like a mix between a console, text editor, spreadsheet, and a debugger.
5. Data mixed in code in native binary form. No need for serialization code, simply save the program's code and the program's data is "serialized" as it is saved to disk.
6. Insane ease of use, code is color syntax highlighted, tab completion, multiple views, auto lookup and view of code/data definitions/values as cursor moves from word to word, and more.
7. And most importantly, it is amazingly simple. Current prototype of the built-in x86/SSE assembler is less than 4K of code/data.
I'll be posting more as I work on this production tool.
For those who know about Color Forth here is a quick preview. I've deviated quite far.
For instance my dictionary is embedded in the code itself, and as you edit (moving the memory locations of words around),
dictionary pointers are automatically re-linked. So the dictionary is direct, there is no need for hashing or searching to lookup a word.
My cell size is 16 bytes (supports SSE vectors) for the stacks and the dictionary, but source code is still in 32-bit tokens.
I'm supporting floating point single and double precision. Source code is similar to Color Forth, with many exceptions.
I've split the tag bits into 2 blocks. So code tokens still have a "color" tag, but data is embedded directly in code tokens without any tag bits.
In order to know if the source token is code or just raw data, the shadow blocks now contain extra tag fields which note code vs data and if data,
the type of the data (text, bytes, words, float, double, vector etc). So shadow blocks no longer contain comments,
instead they contain tag bits and are used to encode the dictionary directly (holding the strings to name words).
More later ... but for now, here is a shot as I was prototyping the language, prior to deciding to embed the dictionary in the source code.
**Of course, I've already changed all this but it will give you an idea.**
At this point in the development the colors were as follows,
1. DEFINE-TOKEN: Copy address of this token into dictionary word indexed by tag index field.
2. DEFINE-COMPILED: Copy memory stack top pointer into dictionary word indexed by tag index field.
3. DEFINE-VARIABLE: Copy data from token (or vector of 2 or 4 tokens based on tag type field) into dictionary word indexed by tag index field.
4. COMMENT: If tag is not set to define then tag index field is a comment.
5. JUMP: Dictionary word indexed by token is a pointer to a code token, jump interpreter to that point.
6. INTERPRET: Dictionary word indexed by token is a pointer to a code token, push current interpreter position on interpreter return stack,
then jump interpreter to that point.
7. EXECUTE: Dictionary word indexed by token is a function pointer, call that function.
8. COMPILE: Dictionary word indexed by token is a function pointer, compile a call to that function at the current memory stack top.
Advance the memory stack top to point to after the compiled CALL opcode.
9. PUSH-TOKEN: Push data from token (or vector of 2 or 4 tokens based on tag type field) into data stack.
10. PUSH-WORD: Push cell value from dictionary word indexed by token onto data stack.
11. POP-WORD: Pop cell top of data stack into dictionary word indexed by token.
12. IGNORE: Ignore token. Used for extra comments, and defining data.
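To make the control-flow colors concrete, here is a minimal sketch of how JUMP, INTERPRET, EXECUTE, PUSH-TOKEN, PUSH-WORD, POP-WORD and IGNORE could interact. It is not the Atom code: tags are plain strings instead of tag bits, the dictionary is keyed by name instead of by tag index field, the DEFINE and COMPILE colors are left out, and the native words `dup`, `add`, `ret` and `halt` are hypothetical helpers invented for the example.

```python
class TokenInterpreter:
    """Toy model of the token colors; tags are plain strings here, not tag bits."""

    def __init__(self, tokens, words):
        self.tokens = tokens      # list of (tag, argument) pairs
        self.words = words        # dictionary: name -> token index, function, or cell value
        self.data = []            # data stack
        self.returns = []         # interpreter return stack
        self.pos = 0
        self.running = True

    def step(self):
        tag, arg = self.tokens[self.pos]
        self.pos += 1
        if tag == "JUMP":
            self.pos = self.words[arg]
        elif tag == "INTERPRET":
            self.returns.append(self.pos)   # assumption: resume at the following token
            self.pos = self.words[arg]
        elif tag == "EXECUTE":
            self.words[arg](self)           # dictionary word is a function pointer
        elif tag == "PUSH-TOKEN":
            self.data.append(arg)
        elif tag == "PUSH-WORD":
            self.data.append(self.words[arg])
        elif tag == "POP-WORD":
            self.words[arg] = self.data.pop()
        elif tag != "IGNORE":
            raise ValueError(f"unknown tag {tag!r}")

    def run(self):
        while self.running and self.pos < len(self.tokens):
            self.step()


# Hypothetical native words, reached through EXECUTE.
def dup(vm):
    vm.data.append(vm.data[-1])

def add(vm):
    vm.data.append(vm.data.pop() + vm.data.pop())

def ret(vm):
    vm.pos = vm.returns.pop()

def halt(vm):
    vm.running = False


words = {"dup": dup, "add": add, "ret": ret, "halt": halt, "double": 4, "x": 0}
tokens = [
    ("PUSH-TOKEN", 21),
    ("INTERPRET", "double"),
    ("POP-WORD", "x"),
    ("EXECUTE", "halt"),
    ("EXECUTE", "dup"),       # token 4: start of "double"
    ("EXECUTE", "add"),
    ("EXECUTE", "ret"),
]
vm = TokenInterpreter(tokens, words)
vm.run()
print(words["x"])  # prints 42
```

The only difference between JUMP and INTERPRET in this sketch is whether the current position is saved first, which matches the wording of the list.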
The shot shows part of the assembler code block. If you read enough to decode it, you will notice that I was assembling with interpreted instead of compiled code.
Click on the image to see the full size version which is actually readable...
[![](20070910-B.jpg)](20070910-B.jpg)
**Interested in "Moore" Forth?**
Check out the evolution of Forth and the work of Chuck Moore (Forth's Inventor), specifically
[Color Forth](http://www.colorforth.com/cf.html) and
[Jeff Fox's](http://www.ultratechnology.com/dindex.htm) writings.
Moore's ideas are simply revolutionary: an asynchronous, massively multi-processing, ultra-low power (24 billion operations / second at 450mW, yes that's mW!)
[18-bit forth based processor](http://www.intellasys.net/products/index.php?target=seaforth/SEAforth24A.txt)
designed for embedded DSP and more.

@@ -0,0 +1,31 @@
# 20070912 - The Making of a Font
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20070912.html
Yes the font looks tiny on the web, but at its 640x480 target resolution it is 100% readable.
*(Lost Image)*
For the Atom Forth Editor I needed a raster font which met a very specific set of constraints.
* The font must work perfectly at 640x480.
* It must support 64 characters in 640 pixels wide.
* It must also support 9 characters in 40 pixels wide.
* It must be readable when drawn over the same color background.

The editor is displaying 16 32-bit tokens per line, with each line containing 2 sub-lines. The top sub-line shows 4 characters, and the bottom shows data, which can be formatted with up to 9 characters. So the font was built exactly for these guidelines. I also went with lowercase as just smaller uppercase letters, in order to remove the need for any descending pixels, and thus pack in more lines per screen.
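The constraints fit together arithmetically. The sketch below only uses the numbers quoted above; the variable names are mine, and the assumption is that each token occupies an equal-width column across the 640 pixel line.

```python
# Screen layout arithmetic for the editor, using only numbers from the text.
SCREEN_WIDTH = 640       # target horizontal resolution in pixels
TOKENS_PER_LINE = 16     # 32-bit tokens shown per editor line
NAME_CHARS = 4           # characters on the top sub-line of a token
DATA_CHARS = 9           # characters on the bottom sub-line of a token

token_width = SCREEN_WIDTH // TOKENS_PER_LINE     # pixels per token column
name_char_width = token_width // NAME_CHARS       # pixels per wide character
chars_per_line = SCREEN_WIDTH // name_char_width  # wide characters per line
data_char_pitch = token_width / DATA_CHARS        # pixels per narrow character

print(token_width, name_char_width, chars_per_line, round(data_char_pitch, 2))
# prints: 40 10 64 4.44
```

So the "64 characters in 640 pixels" and "9 characters in 40 pixels" requirements are the two sub-lines of one 40 pixel token column.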
**Building the Font**
For the font texture I'm using the Red channel as a font shadow mask, and the Green channel as the font, hence the red background and green lettering at the top. The extra spacing is so that texture interpolation works correctly. In order to have anti-aliasing and font features on exact pixel boundaries for great readability, I started with drawing an aliased version of the font at its native size. Then nearest-up-sized to 4x its original size, and cleaned the bitmap. Then finally bilinear downsized to the native resolution to have perfect anti-aliasing and readability.
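The upsize, clean, downsize pipeline can be sketched in a few lines. This is an illustration, not the tool used for the font: the glyph is a made-up 3x3 bitmap, the "cleaning" step is a single hand edit, and the downsize uses a plain box average over each 4x4 block as a stand-in for the bilinear filter mentioned above.

```python
def upsize_nearest(bitmap, factor):
    """Nearest-neighbour upsizing: every pixel becomes a factor x factor block."""
    out = []
    for row in bitmap:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def downsize_box(bitmap, factor):
    """Average each factor x factor block down to one coverage value in 0..1."""
    height = len(bitmap) // factor
    width = len(bitmap[0]) // factor
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            total = sum(
                bitmap[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            )
            row.append(total / (factor * factor))
        out.append(row)
    return out


# Hypothetical aliased glyph at native size (1 = ink, 0 = background).
glyph = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]

big = upsize_nearest(glyph, 4)   # 12x12, still hard edged
big[4][4] = 0                    # stand-in for manual cleanup: shave one sub-pixel
small = downsize_box(big, 4)     # back to 3x3, now with fractional coverage

print(small[1][1])  # prints 0.9375
```

Untouched pixels come back as exactly 0.0 or 1.0, so features stay on pixel boundaries and only the hand-edited edges pick up intermediate values.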
**Why 640x480?**
Ever watch an HDTV broadcast of a home Celtics basketball game? The MPEG artifacting during any motion is horrendous, especially on the wood grain of the court. It is so bad and distracting, I'd rather watch it on regular TV or even TV resolution DirecTV. The point? Technology is pushing huge resolution, but image quality is suffering. As with HDTV, what is the point of HD games with such bad aliasing problems? Personally I prefer a perfectly anti-aliased 640x480. So for Atom I am targeting 640x480 as the midrange resolution. Sure 1024 and larger if you have a fast GPU, and also 512 and smaller if you have a really slow GPU. But regardless of the resolution, Atom is always going to have perfect anti-aliasing.

@@ -0,0 +1,32 @@
# 20070915 - Building the Chicken Without an Egg
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20070915.html
The classic Chicken and Egg problem: I really did want to write the editor for Atom Forth in the Atom Forth language, but alas, it is hard to edit a binary source format without the editor. My solution was to get the editor up quickly using C, and add some "syscall" hooks in Atom Forth for controlling the editor. Later when I have the time, I will port the editor over to Atom Forth.
**Random Progress**
Just finished up the display code, screen shot below shows formatting of completely random data. Got the initial coloring somewhat where I want it. Already moved on to the editing code. Soon this will actually be usable. With the help of objdump -d and gas, I wrote the initial interpreter, using the disassembly to grab the byte stream for various interpreted opcodes like,
```
0000000000400745 SAdd:
400745: 48 03 47 00 :: add 0x0(%rdi),%rax
400749: 48 83 c7 f0 :: add $0xfffffffffffffff0,%rdi
40074d: c3 :: retq
```
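For reading byte streams like the one above by hand, a small decoder helps. The sketch below only handles the 4-byte shape seen here (REX prefix, one opcode byte, ModRM, one trailing displacement or immediate byte) and ignores SIB bytes entirely; the function and field names are mine.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def sign_extend8(value):
    """Interpret one byte as a signed 8-bit displacement or immediate."""
    return value - 0x100 if value & 0x80 else value


def decode_rex_modrm(code):
    """Split a 4-byte [REX, opcode, ModRM, disp8/imm8] instruction into fields."""
    rex, opcode, modrm, trailing = code
    if rex & 0xF0 != 0x40:
        raise ValueError("expected a REX prefix")
    reg = ((modrm >> 3) & 7) | ((rex & 4) << 1)   # REX.R extends the reg field
    rm = (modrm & 7) | ((rex & 1) << 3)           # REX.B extends the r/m field
    return {
        "wide": bool(rex & 8),     # REX.W: 64-bit operand size
        "opcode": opcode,
        "mod": modrm >> 6,         # 1 = [reg + disp8], 3 = register direct
        "reg": reg,                # register number, or /digit for opcode 0x83
        "rm": REG64[rm],
        "trailing": sign_extend8(trailing),
    }


load_add = decode_rex_modrm(bytes.fromhex("48034700"))
stack_adjust = decode_rex_modrm(bytes.fromhex("4883c7f0"))
print(load_add)
print(stack_adjust)
```

The first instruction decodes as opcode 0x03 with mod 1, reg 0 (rax) and r/m rdi, which is `add 0x0(%rdi),%rax`. The second is opcode 0x83 with reg field 0 (the ADD form) and an immediate of -16, so the stack pointer in rdi moves back by one 16-byte cell.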
Also this is my first time really getting into the binary opcode details of the 64bit extension to the x86 instruction set for my assembler and the differences in the C ABI between Linux and Windows for my C "syscall" interface. I've noticed Microsoft made a subtle but important optimization error in their C calling convention concerning how SSE values are saved and restored during function calls. The problem is that they have SSE regs xmm6-xmm15 as callee saved. Of course the callee doesn't know the type of data in the SSE regs, it could be scalar float, scalar double, vector float, vector double, or vector integer, all of which have separate load and store opcodes. On the Core 2 Duo and newer Intel processors, there is a performance penalty for using instructions which don't match the type of data currently in the SSE regs. This is caused by the processor having to do an extra internal conversion (probably because the register file is not stored in exact binary format, i.e. different radix between int and float for low latency ALU operations).

@@ -0,0 +1,10 @@
# 20070919 - Editor Working
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20070919.html
![](20070919.jpg)
Got the editor fully working, 1800 lines of highly factored dense C code (I put multiple statements on one line). Now moving into incorporating the Atom Forth interpreter (which is only one page of assembly), followed by writing the assembler in Atom Forth, and lastly porting the editor and interpreter over to Atom Forth. Will then be able to edit the source code of the editor from within the editor itself, which is great for extending/modifying the editor when I need new keyboard shortcuts or other features!

@@ -0,0 +1,26 @@
# 20070921 - Assembler in Atom4th
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20070921.html
![](20070921.jpg)
Nearly finished with writing an x86-32/64 bit assembler in the Atom4th language. The shot above shows about 50% of the code, excluding the opcode tables. In the process, I've changed a few things about the editor and fine-tuned the programming style. The final code for the assembler (which I still have to test) is nearly the same as what I had prototyped in Excel, see the screen shot from [20070910](20070910.html), except I've moved the assembler over to being fully compiled instead of interpreted.
**Again, Why Atom4th?**
The way I like to think about this is like the difference between your average bland front wheel drive car which might be able to do a one wheel burnout if you mash the gas pedal, and a raw stripped down vintage road racing prototype race car.
99.99999% of people enjoy the comfort of knowing absolutely nothing about their car (after all, what the hell is a one wheel burnout anyway?), it gets good gas mileage and goes from A to B. This front drive car is like C and other common languages. It is the workhorse of nearly the entire world.
For the other 0.00001% there is the need for something different, something that requires intimate knowledge of the machine, purpose built to extract extreme performance, and yet simple enough to work on. This is the essence of vintage. To go back to simpler times, and have the freedom and access to do awesome things. Atom4th is that exotic vintage race car designed for those who get an adrenaline rush from the chest pounding 100+ decibel exhaust note and taste of ultra high octane leaded race gas.
**So What's Next**
Building this beast is taking longer than I expected, but with the assembler soon done, I'll be back onto the game engine, porting it to and optimizing it in Atom4th. I'm guessing the 4000 lines of C game engine code will factor out and simplify into about 32 Atom4th pages (at 32 line pairs per page). Porting the Atom4th editor into Atom4th is about 1800 lines of C code to port, so I'm going to skip that until way later, so I can get back to work on the actual game.

@@ -0,0 +1,62 @@
# 20140816 - Vintage Programming
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20140816.html
A photo (not a screenshot) of one of my home vintage development environments running on modern fast PCs. Shot shows colored syntax highlighted source to the compiler of the language I use most often (specifically the part which generates the ELF header for Linux). More on this below.
![](20140816-A.png)
This is running 640x480 on a small mid 90's VGA CRT which supports around 1000 lines. So no garbage double scan and no horrible squares for pixels. Instead a high quality analog display running at 85 Hz. The font is my 6x11 fixed size programming font.
![](20140816-B.png)
This specific compiler binary on x86-64 Linux is under 1700 bytes.
**A Language**
The language is ultra primitive, it does not include a linker, or anything to do code generation, there is no debugger (and it frankly is not needed as debuggers are slower than instant run-time recompile/reload style development). Instead the ELF (or platform) header for the binary, and the assembler or secondary language which actually describes the program, is written in the language itself.
Over the years I've been playing with both languages which are in classic text form, and languages which require custom editors and are in a binary form. This A language is the classic text source form. All the variations of languages I've been interested in are heavily influenced by [Color Forth](http://www.colorforth.com/).
This A compiler works in 2 passes, the first both parses and translates the source into x86-64 machine code. Think of this as factoring out the interpreter into the parser. The second pass simply calls the entry point of the source code to interpret the source (by running the existing generated machine code). After that whatever is written in the output buffer gets saved to a file.
Below is the syntax for the A language. A symbol is an untyped 64-bit value in memory. Like Forth there is a separate data and return stack.
\comment\
012345- \compile: push -0x12345 on the data stack\
,c3 \write a literal byte into the compile stream\
symbol \compile: call to symbol, symbol value is a pointer to function\
'symbol \compile: pop top of data stack, if value is true, call symbol\
`symbol \copy the symbol data into the compile stream, symbol is {32-bit pointer, 32-bit size}\
:symbol \compile: pop data stack into symbol value\
.symbol \compile: push symbol value onto data stack\
%symbol \compile: push address of symbol value onto data stack\
"string" \compile: push address of string, then push size of string on the data stack\
{ symbol ... } \define a function, symbol value set to head of compile stream\
And that is the A language. The closing "}" writes out the 32-bit size to the packed {32-bit pointer, 32-bit size} symbol value, and also adds an extra RET opcode to avoid needing to add one at the end of every define. There is one other convention missing in the above description, there is a hidden register used for the pointer to the output buffer.
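Because every construct is identified by its first character, classifying the source needs no grammar. The sketch below only classifies tokens, it does not generate code, and it is not the real compiler: the regular expression, the kind names, and the rule that a token starting with a decimal digit is a hex number (inferred from examples such as `012345-`) are my assumptions.

```python
import re

# One token is a \comment\, a "string", or a run of non-space characters.
TOKEN_RE = re.compile(r'\\[^\\]*\\|"[^"]*"|\S+')

PREFIX_KINDS = {
    ",": "raw-byte",
    "'": "conditional-call",
    "`": "copy-opcodes",
    ":": "pop-into-symbol",
    ".": "push-symbol-value",
    "%": "push-symbol-address",
}


def classify(token):
    """Return (kind, payload) for one source token, judged by its first character."""
    if token.startswith("\\"):
        return ("comment", token[1:-1])
    if token.startswith('"'):
        return ("string", token[1:-1])
    if token == "{":
        return ("define-begin", None)
    if token == "}":
        return ("define-end", None)
    if token[0] in PREFIX_KINDS and len(token) > 1:
        kind = PREFIX_KINDS[token[0]]
        payload = int(token[1:], 16) if kind == "raw-byte" else token[1:]
        return (kind, payload)
    if token[0] in "0123456789":
        value = int(token.rstrip("-"), 16)          # numbers are hex
        return ("number", -value if token.endswith("-") else value)
    return ("call", token)


def tokenize(source):
    """Classify every token; the first plain word after '{' names the definition."""
    result = []
    naming = False
    for text in TOKEN_RE.findall(source):
        kind, payload = classify(text)
        if kind == "define-begin":
            naming = True
        elif naming and kind == "call":
            kind, naming = "define-name", False
        result.append((kind, payload))
    return result


for item in tokenize(r'{ xor ,48 } 012345- `xor \note here\ "hi there" .top'):
    print(item)
```

Each kind would map onto one small code-emitting routine, which is roughly what "factoring out the interpreter into the parser" amounts to.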
**Writing Parts of the Language in the Language**
The first part of any source file is a collection of opcodes, like the *{ xor ,48 ... }* at the top of the image which is the raw x86-64 machine code to do the following in traditional assembly language (rax = top of data stack, rbx points to second data stack entry),
XOR rax, [rbx]
SUB rbx, 8
This collection of opcodes generates symbols which form the stack based language the interpreter uses. They would get used like *`xor* in the code (the copy symbol to compile stream syntax). For instance *`long* pops the top of the data stack and writes out 8-bytes to the output buffer, and *`asm* pushes the output buffer pointer onto the data stack.
I use this stack based language to then define an assembler (in the source code), and then I write code in the assembler using the stack based language as effectively the ultimate macro language. For instance if I was to describe the *`xor* command in the assembly it would look as follows,
{ xor .top .stk$ 0 X@^ .stk$ 8 #- }
Which is really hard to read without syntax coloring (sorry my HTML is lazy). For naming, the "X" = 64-bit extended, the "@" = load, and the "#" = immediate. So the "X@^" means assemble "XOR reg,[mem+imm]". The symbols "top" and "stk$" contain the numbers of the registers for the top of the stack and the pointer to the second item on the stack respectively.
**Compiler Parser**
The compiler parsing pass is quite easy, just a character jump table based on prefix character to a function which parses the {symbol, number, comment, white space, etc}. These functions don't return, they simply jump to the next thing to parse. As symbol strings are read they are hashed into a register and bit packed into two extra 64-bit registers (lower 4-bits/character in one register, upper 3-bits/character in another register). This packing makes string compare easy later when probing. Max symbol string is 16 characters. Hash table is a simple linear probing style, but with an array of 2 entries per hash value filling one cacheline. Each hash table entry has the following 8-byte values {lower bits of string, upper bits of string, pointer to symbol storage, unused}. The symbol storage is allocated from another stack (which only grows). Upon lookup, if a symbol isn't in the hash table it is added with new storage. Symbols never get deleted.
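The packing and probing scheme can be modelled at a high level. This is a sketch under assumptions, not the compiler's code: the first character is placed in the lowest bits, Python's built-in `hash` stands in for the real hash, and storage is a growing list rather than a stack of 64-bit values.

```python
MAX_CHARS = 16


def pack_symbol(name):
    """Pack up to 16 seven-bit characters into (low, high) integers.

    low  holds the lower 4 bits of each character (16 * 4 = 64 bits),
    high holds the upper 3 bits of each character (16 * 3 = 48 bits).
    """
    data = name.encode("ascii")
    if len(data) > MAX_CHARS:
        raise ValueError("symbol longer than 16 characters")
    low = high = 0
    for i, ch in enumerate(data):
        low |= (ch & 0xF) << (4 * i)
        high |= (ch >> 4) << (3 * i)
    return low, high


class SymbolTable:
    """Linear probing over buckets of 2 entries (2 entries * 32 bytes = one cacheline)."""

    def __init__(self, buckets=64):
        self.buckets = [[None, None] for _ in range(buckets)]
        self.storage = []   # grow-only symbol storage; symbols are never deleted

    def lookup(self, name):
        """Return the storage index for name, adding the symbol if it is new."""
        key = pack_symbol(name)
        start = hash(key) % len(self.buckets)
        for step in range(len(self.buckets)):
            bucket = self.buckets[(start + step) % len(self.buckets)]
            for slot, entry in enumerate(bucket):
                if entry is None:
                    self.storage.append(0)
                    bucket[slot] = (key, len(self.storage) - 1)
                    return len(self.storage) - 1
                if entry[0] == key:      # compare packed form, no string compare
                    return entry[1]
        raise MemoryError("symbol table full")


table = SymbolTable()
xor_slot = table.lookup("xor")
add_slot = table.lookup("add")
print(pack_symbol("AB"), xor_slot, add_slot, table.lookup("xor"))
```

Comparing two symbols is then two integer compares, which is the point of packing the string while it is being read.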

@@ -0,0 +1,249 @@
# 20141231 - Continued Notes on Custom Language
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20141231.html
*Continuing notes on portable development for Linux and Windows using a custom language...*
Using [WINE](https://www.winehq.org/) on Linux for Windows development is working great: ability to use OpenGL 4.x in native 64-bit WIN32 binaries on Linux. Using the same compiler to build both Linux and Windows binaries which can both be run from the same Linux box.
**Forth-like Language as a Pre-Processor for Code-Generation**
My 1706 byte compiler takes a stream of prefixed text: mostly {optional prefix character, then word, then space} repeated...
\comment\
word \compile call to address at word\
:word \compile pop data stack into word\
.word \compile push 64-bit value at word onto data stack\
'word \compile conditional call to address at word if pop top of stack is non-zero\
`word \copy opcodes at word into compile stream\
%word \compile push address of word, still haven't used this\
{ word \define word, stores compile address in word, then at closing stores size (used for opcodes)\ }
34eb- \compile push hex number negated\
,c3 \write raw byte into compile stream, x86 ret opcode in this example\
"string" \compile push address of string, then push size of string\
Compiler strips whitespace and comments while converting the input words into machine code (using the conventions above), then executes the machine code.
There is no "interpreter" as would normally be used in a Forth based language.
The compiler does not even have the standard Forth opcodes,
these are instead just specified in the input source file. For example ADD,
{ add ,48 ,03 ,03 ,83 ,eb ,08 }
Which is the raw opcode stream for "ADD rax,[rbx]; SUB ebx,8" (the 32-bit SUB zero-extends into rbx), where "rax" is the top of the data stack, and "rbx" points to the 2nd item on the data stack. Since Forth opcodes are zero operand, it is trivial to just write them in source code directly (language is easily extendable). I use under 30 Forth-style opcodes. After an opcode is defined, it can be used. For example,
10 20 `add
Which pushes 16 then 32 on the data stack (numbers are all in hex), then adds them. To do anything useful, words are added which pop a value from the data stack and write them to a buffer. For example write a byte,
{ byte ,40 ,88 ,07 ,83 ,c7 ,01 ,48 ,8b ,03 ,83 ,eb ,08 }
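To see what the `10 20 `add` example does with the top of stack cached in a register, here is a small model. It follows my reading of the bytes: `48 03 03` adds the second item into the top, and `83 eb 08` then moves the stack pointer down one cell, so a push must move it up. The class and method names are hypothetical, and memory is modelled as a list of cells rather than bytes.

```python
MASK64 = (1 << 64) - 1


class DataStack:
    """Data stack with the top item held in a register, as rax is in the text."""

    def __init__(self, cells=16):
        self.mem = [0] * cells   # stack memory, one list entry per 8-byte cell
        self.top = 0             # plays the role of rax
        self.second = 0          # plays the role of rbx, as a cell index

    def push(self, value):
        # Assumed push order: spill the old top to memory, then load the new top.
        self.second += 1
        self.mem[self.second] = self.top
        self.top = value & MASK64

    def add(self):
        # ADD rax,[rbx] then step the pointer down one cell.
        self.top = (self.top + self.mem[self.second]) & MASK64
        self.second -= 1

    def pop(self):
        # Mirrors the tail of `byte`: MOV rax,[rbx] then step the pointer down.
        value = self.top
        self.top = self.mem[self.second]
        self.second -= 1
        return value


stack = DataStack()
stack.push(0x10)
stack.push(0x20)
stack.add()
print(hex(stack.top))  # prints 0x30
```

Keeping the top in a register means a binary operation like add touches memory once instead of three times.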
Once the compiler is finished executing the machine code generated by the source, which in turn is used to write a binary into a buffer, the compiler stores that buffer to disk and exits. In order to do anything useful the next step is to use the language and source opcodes which extend the language to build an assembler. Some bits of my assembler (enough for the ADD opcodes),
\setup words for integer registers\
0 :rax 1 :rcx 02 :rdx 03 :rbx 04 :rsp 05 :rbp 06 :rsi 07 :rdi
8 :r8 9 :r9 0a :r10 0b :r11 0c :r12 0d :r13 0e :r14 0f :r15
\words used to generate the opcode\
{ REXw 40 `add `byte }
{ REX .asmR 1 `shr 4 `and .asmRM 3 `shr `add `dup 'REXw `drp }
{ REXx .asmR 1 `shr 4 `and .asmRM 3 `shr `add 48 `add `byte }
{ MOD .asmR 7 `and 8 `mul .asmRM 7 `and `add .asmMOD `add `byte }
{ OP .asmOP 8 `shr 'OPh .asmOP `byte }
{ OP2 :asmOP :asmRM :asmR 0c0 :asmMOD REX OP MOD }
{ OP2x :asmOP :asmRM :asmR 0c0 :asmMOD REXx OP MOD }
\implementation of 32-bit and 64-bit ADD\
{ + 03 OP2 } { X+ 03 OP2x }
Afterwards it is possible to write assembly like,
.rax .rbx + \32-bit ADD eax,ebx\
.rax .rbx X+ \64-bit ADD rax,rbx\
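The REX, OP and MOD words translate almost line for line into ordinary code, which makes it easy to check the bytes they produce. The following is a sketch of that translation, not the assembler itself: the Python function names are mine, and since the `OPh` word is not shown I assume it emits the high byte of a two-byte opcode.

```python
REGS = {name: number for number, name in enumerate(
    ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
     "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"])}


def rex(r, rm, wide):
    """REX prefix: R from bit 3 of the reg field, B from bit 3 of r/m, W if 64-bit."""
    bits = ((r >> 1) & 4) + (rm >> 3)
    if wide:
        return [0x48 + bits]               # like REXx: always emitted
    return [0x40 + bits] if bits else []   # like REX: only when an extended reg is used


def opcode(op):
    """Opcode byte, preceded by its high byte if there is one (assumed OPh behaviour)."""
    high = op >> 8
    return ([high] if high else []) + [op & 0xFF]


def modrm(r, rm, mod=0xC0):
    """ModRM byte; mod 0xC0 selects register-to-register form."""
    return [(r & 7) * 8 + (rm & 7) + mod]


def op2(dst, src, op, wide=False):
    """Two-operand register form, like OP2 (32-bit) and OP2x (64-bit)."""
    r, rm = REGS[dst], REGS[src]
    return bytes(rex(r, rm, wide) + opcode(op) + modrm(r, rm))


def add32(dst, src):
    return op2(dst, src, 0x03)


def add64(dst, src):
    return op2(dst, src, 0x03, wide=True)


print(add32("rax", "rbx").hex())  # prints 03c3
print(add64("rax", "rbx").hex())  # prints 4803c3
print(add64("r8", "r15").hex())   # prints 4d03c7
```

The 32-bit form only pays for a REX byte when r8 to r15 are involved, which is what the conditional call to `REXw` achieves in the original words.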
Due to the complexity of the x86-64 ISA, I used roughly 300 lines to get a full assembler (sans vector opcodes), with a majority of those opcodes not even getting used in practice. The [ref.x86asm.net/coder64.html](http://ref.x86asm.net/coder64.html) site is super useful as an opcode reference.
**Binary Header**
Next step is writing source to generate either a PE (Windows) or ELF (Linux) binary header. ELF with "dlsym" symbol used roughly 70 lines (mostly comments to describe the mess of structure required for an ELF). The PE header I generate for WIN32 binaries looks similar to [this example from Peter Ferrie](http://pferrie.host22.com/misc/pehdr.htm)
which is a rather minimal header with non-standard overlapping structures.
I added an import table for base Kernel32 functions like "LoadLibraryA",
because of fear that manual run-time "linking" via PEB would trigger virus warnings on real Windows boxes.
I'm not really attempting to hit a minimum size (like a 4KB demo), but rather just attempting to limit complexity.
WINE handles my non-standard PE with ease.
If I was to write an OS, I wouldn't have binary headers (PE/ELF complexity just goes away). Instead I would just load the binary at zero, with execution starting at 4 with no stack setup, with binary pages loaded with read+write+execute, and then some defined fixed address to grab any system related stuff (same page always mapped to all processes as read-only). This has an interesting side effect that JMP/CALL to zero would just restart the program (if nop filled) or do exception (if invalid opcode filled). Program start would map zero-fill and setup stack. I'd also implement thread local storage as page mapping specific to a thread (keeping it simple).
**ABI: Dealing With the Outside World**
Having your own language is awesome ... dealing with the C based Linux/Windows OS is a pain. I use an 8-byte stack alignment convention. The ABI uses a 16-byte stack alignment convention. The ABIs for the Linux kernel, Linux C export libraries, and 64-bit Windows are all different. Here is a rough breakdown of register usage,

| # | reg | | LXK | LXU | WIN | note |
|---|-----|---|-----|-----|-----|------|
| 0 | rax | . | | | | |
| 1 | rcx | . | k0 | a3 | a0 | |
| 2 | rdx | . | a2 | a2 | a1 | |
| 3 | rbx | . | s0 | s0 | s0 | |
| 4 | rsp | X | | | | |
| 5 | rbp | X | t0 | t0 | t0 | |
| 6 | rsi | . | a1 | a1 | a4 | stored before call on WIN |
| 7 | rdi | . | a0 | a0 | a5 | stored before call on WIN |
| 8 | r8 | . | a4 | a4 | a2 | |
| 9 | r9 | . | a5 | a5 | a3 | |
| a | r10 | . | a3 | k0 | k0 | |
| b | r11 | . | k1 | k1 | k1 | |
| c | r12 | X | t1 | t1 | t1 | |
| d | r13 | X | t2 | t2 | t2 | |
| e | r14 | . | s1 | s1 | s1 | |
| f | r15 | . | s2 | s2 | s2 | |

* rax = return value (or syscall index in Linux)
* rsp = hardware stack
* a# = register argument (where Windows 64-bit a4 and a5 are actually on the stack)
* t# = temp register saved by callee, but register requires SIB byte for immediate indexed addressing
* s# = register saved by callee, no SIB required for immediate indexed addressing
* k# = register saved by caller if required (callee can kill the register)

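The argument columns of that breakdown can be captured as a lookup table, which is roughly what a set of platform-dependent words has to encode. A minimal sketch, with the dictionary and function names being my own; the register orders are the standard Linux syscall, System V and Windows x64 integer argument orders.

```python
# Integer argument registers in order a0, a1, ... for each calling convention.
ARG_REGISTERS = {
    "LXK": ["rdi", "rsi", "rdx", "r10", "r8", "r9"],   # Linux kernel system calls
    "LXU": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],   # Linux user-space C ABI
    "WIN": ["rcx", "rdx", "r8", "r9"],                 # 64-bit Windows C ABI
}


def place_arguments(platform, count):
    """Map a0..a(count-1) to a register name, or "stack" once registers run out."""
    regs = ARG_REGISTERS[platform]
    return {
        f"a{i}": regs[i] if i < len(regs) else "stack"
        for i in range(count)
    }


for platform in ARG_REGISTERS:
    print(platform, place_arguments(platform, 6))
```

The only disagreement between the two Linux conventions is a3 (r10 for system calls, rcx for library calls), while Windows runs out of registers after four arguments.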
I use a bunch of techniques to manage portability. Use a set of words {abi0, abi1, abi2 ... } and {os0, os1, os2 ... } (for things which map to Linux system calls) which map to different registers based on platform. Use word "ABI(" to store the stack into R13, then align the stack to 16-bytes, then setup a stack frame for the ABI safe for any amount of arguments. Words "ABI", "ABI5", "ABI6+" to do ABI based calls based on the number of integer arguments needed for the call. This is needed because Linux supports 6 arguments in registers, and Windows only supports 4. Then later ")ABI" to get my 8-byte aligned stack back,
{ ABI( .abiT2 .rsp X= .rsp 0fffffffffffffff0 X#& .rsp 50- X#+ }
{ ABI \imm\ #@CAL } \call with up to 4 arguments\
{ ABI5 \imm\ #@CAL } \call with 5 arguments\
{ ABI6+ \imm\ #@CAL } \call with 6 or more arguments\
{ )ABI .rsp .abiT2 X= }
With the following words overriding some of the above words on Windows (slightly more overhead on Windows),
{ ABI(.W .abiT2 .rsp X= .rsp 0fffffffffffffff0 X#& .rsp 80- X#+ }
{ ABI5.W .abi4 .abiW4 PSH! \imm\ #@CAL }
{ ABI6+.W .abi4 .abiW4 PSH! .abi5 .abiW5 PSH! \imm\ #@CAL }
So a C call to something like glMemoryBarrier would end up being something like,
ABI( .abi0 \GL\_BUFFER\_UPDATE\_BARRIER\_BIT\ 200 # .GlMemoryBarrier ABI )ABI
And in practice the "ABI(" and ")ABI" would be factored out to a much larger group of work. The "#@CAL" translates to "CALL [RIP+disp32]". Since all ABI calls are manual dynamic linking, the ".GlMemoryBarrier" is the address which holds the address of the external function (in practice I rename long functions into something smaller). Since both Windows and Linux lack any way to force libraries into the lower 32-bits of address space, and x86-64 has no "CALL [RIP+disp64]", I decided against run-time code modification patching due to complexity (would be possible via "MOV rax,imm64; CALL rax"). Both Windows and Linux require slightly different stack setup. Convention used for Linux (arg6 is the 7th argument),
-58 pushed return
-50 arg6 00 <---- aligned to 16-bytes, ABI base
-48 arg7 08
-40 arg8 10
-38 arg9 18
-30 argA 20
-28 argB 28
-20 argC 30
-18 argD 38
-10 argE 40
-08 argF 48
+00 aligned base
Convention used for Windows,
-88 pushed return
-80 .... 00 <---- aligned to 16-bytes, ABI base
-78 .... 08
-70 .... 10
-68 .... 18
-60 arg4 20
-58 arg5 28
-50 arg6 30
-48 arg7 38
-40 arg8 40
-38 arg9 48
-30 argA 50
-28 argB 58
-20 argC 60
-18 argD 68
-10 argE 70
-08 argF 78
+00 aligned base
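The two layouts follow from a little arithmetic: round the stack pointer down to 16 bytes, then reserve 0x50 bytes on Linux or 0x80 bytes on Windows. The sketch below reproduces the offsets in the tables; the function names and the example stack pointer value are made up for illustration.

```python
MASK64 = (1 << 64) - 1

# Bytes reserved below the aligned stack pointer, taken from the "ABI(" words.
RESERVE = {"linux": 0x50, "windows": 0x80}

# (index of the first stack argument, its offset from the ABI base).
# Windows keeps 0x20 bytes free at the base ahead of arg4.
FIRST_STACK_ARG = {"linux": (6, 0x00), "windows": (4, 0x20)}


def abi_enter(rsp, platform):
    """Return (saved rsp, ABI base): align down to 16 bytes, then reserve the frame."""
    saved = rsp
    base = ((rsp & 0xFFFFFFFFFFFFFFF0) - RESERVE[platform]) & MASK64
    return saved, base


def stack_arg_offset(platform, index):
    """Byte offset from the ABI base where stack argument `index` is stored."""
    first, offset = FIRST_STACK_ARG[platform]
    if index < first:
        raise ValueError("argument is passed in a register")
    return offset + (index - first) * 8


saved, base = abi_enter(0x7FFD1238, "linux")   # hypothetical 8-byte aligned rsp
print(hex(saved), hex(base), base % 16)
print(hex(stack_arg_offset("linux", 15)), hex(stack_arg_offset("windows", 15)))
# prints: 0x7ffd1238 0x7ffd11e0 0
# prints: 0x48 0x78
```

Restoring the saved value afterwards is all that ")ABI" needs to do, since the 8-byte aligned pointer was kept in a callee-saved register for the duration of the call.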
The C based ABI and associated
"{{save state, call, ... nesting ..., return, load state} small amount of code} repeat ...}" pattern
forces inefficient code in the form of code bloat and constant shuffling around data between functional interfaces.
Some percentage of callee-save registers are often used to shadow arguments,
often data is loaded to register arguments
only to be saved in memory again for the next call,
caller saves happen even when the callee does not modify, etc.
I'd much rather be using a "have command structures in memory, fire-and-forget array of pointers to commands" model.
The fire-and-forget model is more parallel friendly (no return),
and provides ability for reuse of command data (patching).
The majority of system calls or ABI library calls could just be baked command data which exists in the binary.
Why do I need to generate complex code to do constant run-time generation and movement of mostly static data?
I conceptually treat registers as a tiny fast compile-time immediate-indexed RAM (L0$).
Register allocation is a per-task process, not a per-function process.
There is no extra shuffling of data. No push/pop onto stack frames, etc.
For example register allocation is fixed purpose during the two passes
of the compiler,
\\_\_\_\_REGISTERS\_DURING\_COMPILE\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\
.rax :chr \input character\ .rax :dic \dictionary entry address\ .rax :str#
.rcx :hsh \hash value\ .rcx :jmp \jump address\ .rcx :num .rcx :str$
.rdx :pck1 \string packing 1\ .rdx :num- .rdx :siz
.rbx :pck2 \string packing 2\ .rbx :chr$1
.rbp :dic$ \dictionary top pointer, not addressed from\
.rsi :chr$ \input pointer\
.rdi :mem$ \memory stack, compile stack\
.r8 :def$ \define stack\
.r15 :zero \set to zero\
\\_\_\_\_REGISTERS\_DURING\_EXECUTE\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\
.rax :top \rax used for .dic only at launch\
.rcx :cnt \counter on text copy\
.rdx :fsk$ \float stack pointer, points to 2nd entry\
.rbx :stk$ \data stack pointer, points to 2nd entry\
.rsi :txt$ \used for source on text copy\
.rdi :out$ \output pointer, points to next empty byte\
**Debugging**
My dev environment is basically two side by side terminal windows per virtual desktop with an easy switch to different virtual desktops.
I'm using nano as a text editor and have some rather simple color syntax highlighting for my language.
No IDE no debugger. Proper late stone-age development.
To port to WIN32 I did have to fix some code generation bugs with regard to having a non-zero ORG.
On Linux I load the binary at zero, so file offset and loaded offsets are the same.
This is not possible in Windows.
IMO there is more utility in keeping it simple than having zero page fault.
When tracking down bugs in code generation or binary headers I just use "hexdump -C binary".
My language supports injection of any kind of data during code generation,
so it is trivial to just wrap an instruction or bit of code or data with something like "--------"
which is easy to find via "hexdump -C binary | less".
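The same marker trick works from a script when the binary is too large to page through. A minimal sketch, assuming the marker is the 8 ASCII dashes mentioned above; the function name and the sample bytes are made up.

```python
def find_marker(data, marker=b"--------"):
    """Return the byte offsets of every non-overlapping occurrence of marker."""
    offsets = []
    start = data.find(marker)
    while start != -1:
        offsets.append(start)
        start = data.find(marker, start + len(marker))
    return offsets


# Hypothetical output buffer: some code bytes wrapped by two injected markers.
blob = bytes.fromhex("480303") + b"--------" + bytes.fromhex("c3") + b"--------"
print(find_marker(blob))  # prints [3, 12]
```

For a real file the bytes would come from `open("binary", "rb").read()`, with the generated code of interest sitting between the two reported offsets.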
The Forth inspired language I use has only one possible error, attempting to call a word which has not been defined (which calls zero).
My compiler does not print any error messages, it simply crashes on that error.
Since in practice this effectively almost never happens, I've never bothered to have it write out an error message.
The last time I misspelled a word, it was a super quick manual binary search (commenting out code) to find the error.
When compiling and testing is perceptually instant, lots of traditional complexity is just not needed.
As for regular development, I started programming when a bug would take down the entire machine (requiring reboot).
Being brought up in that era quickly instills a different kind of development process
(one that does not generate bugs often).
Most bugs I have are opcode mistakes (misspellings), like load "@" instead of store "!", or using a 32-bit form of an opcode instead of the 64-bit form.
The only pointers which are 64-bit in my binaries are pointers to library functions (Linux loads libraries outside the 32-bit address space),
the stack (I'm still using the OS default instead of moving to the lower 32-bit address space), or pointers returned by a library.
When dealing with bugs with an instant iteration cycle, "printf" style debug is the fast path.
I've built some basic constructs for debugging,
BUG( "message" )BUG \writes message to console\
.bugN .rax = 10 BUG# \example: prints the hex value in RAX to 16 digits to console\
Adding something to my "watch window" is just adding the code to output the value to the console.
This ends up being way better than a traditional debug tool because console output provides
both the history and timeline of everything being "watched".
**Misc**
In the past I had a practice of building a custom editor and run-time for each language.
The idea being that it was more efficient to compile from a binary representation,
which has a dictionary embedded in the source code (no parsing, no hashing).
Ultimately I moved away from that approach due to the complexity involved in building the editor that I wanted for the language.
Mostly due to the complexity of interfacing with the OS.
It is really easy to fall in the trap of building a tool which is many times more complex than the problem the tool is engineered to solve.
Decoupling from the dependency of a typical compiler and toolchain on modern systems has been a great learning experience.
If FPGA perf/$ was competitive with GPUs I'd probably move on to building at the hardware level instead...

@@ -0,0 +1,16 @@
# 20150414 - From Scratch Bug
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20150414.html
Inspired by Jaymin's [JayStation2](http://jaystation2.maisonikkoku.com/JayStation2/JayStation2.html) effort and remembering a past life building custom OSs for early x86 machines, haven't been able to avoid the custom OS bug any longer. It starts easy with a harmless QEMU install, followed by a 512-byte bootloader switching to 80x50 text mode and installing a custom 48 character Forth font, then bring up of a Forth assembler/editor, then on to the pain of modern PCI and USB driver bring-up... with the eventual goal of a tiny bootable USB thumb system.
Amazingly refreshing to not have the OS telling you NO. Or the API telling you NO. Modern systems are all about the NO. Systems I grew up on were all about the YES.
Reworking my language from scratch, trying something new, replacing the Forth data stack with a new concept, but maintaining zero operand opcodes. Not sure if the idea will pan out. Dropping everything but 32-bit word support from the language, no need to interop with other software. No more 8/16/64-bit loads or stores (can still just inline machine code if required). Still running in x86-64 64-bit mode, so return stack PUSH/POP/CALL/RET is still a 64-bit stack operation, just don't need that 64-bit address space or 64-bit pointers anywhere else. Trying padding out all x86 opcodes to 32-bit alignment. This makes the 32-bit immediate 32-bit aligned. Wastes space, gives up some perf? Why would I care when most of the CPU side of the system fits in the L1 cache. Dropping paging, dropping interrupts, dropping everything, none of that stuff is needed.
Reworking an editor and binary source encoding. Switching to 32-bit tokens with 5 character max strings. 48 character character-set. Doing something horrible with font design: 1=I, 0=O, 2=Z, 5=S, etc. All caps font with no non-vertical or non-horizontal lines. Actually looks awesome. When you don't have to interop with the NO machine, long symbol names are not required. Color Forth like editors have almost no state. It is magical how they function simultaneously as an editor/assembler/console/debugger/UI/etc. Take the idea of "editor-time-words", words embedded in the source code which are evaluated when the block of source is drawn to the screen. Becomes possible to build out UI tools in the source. Can have an editor-time word read system data and draw in real-time updates in the source code itself. Editor-time words are just like any other word in the system, just color tagged to only be evaluated at draw time.
Minimal systems are a blessing, more so when you have only minimal free time to work on them.

@@ -0,0 +1,40 @@
# 20150420 - From Scratch Bug 2 : Source-Less Programming
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20150420.html
*This is a disruptive idea which comes back periodically: source-less programming.
Is it possible to efficiently program at a level even lower than an assembler?*
The general idea is that the editor is now something similar to an enhanced hex editor which edits a binary image directly.
Lowest memory is split into three parts {{running image, annotations for edit image}, edit image}.
The {running image, annotations for edit image} is the boot image.
The {edit image} is working space which gets snapshot replacement for {running image} on a safe transition zone.
The "annotation" section is what enables human understanding of the binary image.
**Words**
One way to keep the system very simple is to always work in 32-bit words.
A 32-bit word in memory is one of four things {data, opcode, absolute address, rip relative address}.
Data is easily handled by the hex editor part.
The annotation could provide a name for the data or a comment.
Opcodes in x86 can be complex but it is easy to simplify.
Start with something similar to forth zero-operand and one-operand operations (calls, etc).
Make all operations padded to 32-bit alignment (x86-64 can use the 2e ignored prefix to pad).
A call for instance becomes {32-bit opcode for call, 32-bit rip relative branch address}.
Or a global load becomes {32-bit opcode for load, 32-bit rip relative data address}.
Annotation memory can provide a name for the opcode.
Annotation can provide a tag for each word in memory which marks
if the memory is a relative or absolute address (word gets a different color based on tag similar to color forth).
Addresses can be auto annotated by displaying the annotation associated with the destination.
Editor works on the {edit image},
with insert and delete of words automatically adjusting all the words tagged as address
(effectively relinking at edit time).
The {edit image} can also keep a mapping of original {running image} address
so that it is possible to view the live values of any data.
Editor provides something to easily be able to write an annotation
and have the associated opcode or address automatically get updated.
For example type the opcode name and the 32-bit value is automatically updated.
Very simple and yet powerful minimal tool.
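The edit-time relinking described above can be sketched as follows. This is Python, not the author's editor, and it makes several assumptions: the image is a list of (tag, value) pairs, addresses are byte addresses with 4 bytes per word, and a relative address is measured from the end of its own 32-bit word (x86 style). The names `insert_word` and `target_of` are hypothetical.

```python
# Sketch only: words are (tag, value) pairs; 4 bytes per word.
DAT, ABS, REL = "DAT", "ABS", "REL"


def target_of(index, tag, value):
    """Byte address a tagged word refers to (None for plain data)."""
    if tag == ABS:
        return value
    if tag == REL:
        # Assumption: relative to the end of the 32-bit offset word.
        return (index + 1) * 4 + value
    return None


def insert_word(image, at, new_word):
    """Insert new_word at word index `at`, then fix every ABS/REL word."""
    def moved(addr):
        # Addresses at or past the insertion point shift by one word.
        return addr + 4 if addr >= at * 4 else addr

    out = []
    for i, (tag, value) in enumerate(image):
        new_i = i + 1 if i >= at else i
        tgt = target_of(i, tag, value)
        if tag == ABS:
            value = moved(tgt)
        elif tag == REL:
            value = moved(tgt) - (new_i + 1) * 4
        out.append((tag, value))
    out.insert(at, new_word)
    return out


image = [(DAT, 0x11), (REL, 8), (DAT, 0x22), (DAT, 0x33), (ABS, 12)]
relinked = insert_word(image, 2, (DAT, 0x90909090))
```

In this example the REL word still reaches the word it pointed at before (now one slot further away), and the ABS word follows its target from byte 12 to byte 16. Delete would be the same walk with the shift negated.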

@@ -0,0 +1,72 @@
# 20150422 - Source-Less Programming : 2
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20150422.html
*Continuing with what will either be an insanely great or amazingly stupid project...*
Making slow progress with bits of free-time after work, far enough thinking through the full editor design to continue building. Decided to ditch 64-bit long mode for 32-bit protected mode. Not planning on using the CPU for much other than driving more parallel friendly hardware... so this is mostly a question of limiting complexity.
Don't need 16 registers and the REX prefix is too ugly for me to waste time on any more.
The 32-bit mode uses much more friendly mov reg,[imm32] absolute addressing,
also with ability to use [EBP+imm32] without an SIB byte (another thing I mostly avoid).
Unfortunately still need relative addresses for branching.
32-bit protected mode thankfully doesn't require page tables unlike 64-bit long mode.
Can still pad out instructions to 32-bits via redundant segment selectors.
**Source-Less Analog to Compile-Time Math?**
Compile-time math is mostly for the purpose of self-documenting code:
"const uint32\_t willForgetHowICameUpWithThisNumber = startHere + 16 \* sizeof(lemons);".
The source-less analog is to write out the instructions to compute the value,
execute that code at edit time, then have annotations
for 32-bit data words which automatically pull from the result when building 32-bit words for opcode immediates for the new binary image.
**Reduced Register Usage Via Self Modifying Code**
Sure, kills the trace cache in two ways, what do I care.
Sometimes the easiest way to do something complex is to
just modify the opcode immediates before calling into the function...
**What Will Annotations Look Like?**
The plan so far is for the editor to display a grid of 8x8 32-bit words.
Each word is colored according to a tag annotation
{data, absolute address, relative address, pull value}.
Each word has two extra associated annotations {LABEL, NOTE}.
Both are 5 6-bit character strings.
Words in grid get drawn showing {LABEL, HEX VALUE, NOTE} as follows,
LABEL
00000000
NOTE
The LABEL provides a name for an address in memory (data or branch address).
Words tagged with absolute or relative addresses or pull value show in the NOTE field
the LABEL of the memory address they reference.
Words tagged with data use NOTE to describe the opcode, or the immediate value.
Editor when inserting a NOTE can grab the data value from other words
with the same NOTE (so only need to manually assemble an opcode once).
Edit-time insert new words, delete words, and move blocks of words,
all just relink the entire edit copy of the binary image.
ESC key updates a version number in the edit copy,
which the executing copy sees, triggering it to replace itself
with the edit copy.
**Boot Strapping**
I'm bootstrapping the editor in NASM in a way that I'll be able
to see and edit later at run-time.
This is a time consuming process to get started
because instead of using NASM to assemble code,
I need to manually write the machine code to get the 32-bit padded opcodes.
Once enough of the editor is ready,
I need a very tiny IDE/PATA driver to be able to store to the disk image.
Then I can finish the rest of the editor in the editor.
Then I'll also be self hosted outside the emulator
and running directly on an old PC with a non-USB keyboard,
but with a proper PCIe slot...

@@ -0,0 +1,34 @@
# 20150423 - Source-Less Programming : 3
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20150423.html
**Annotation Encoding**
Refined from last post, two 32-bit annotation words per binary image word,
FEDCBA9876543210FEDCBA9876543210
================================
00EEEEEEDDDDDDCCCCCCBBBBBBAAAAAA - LABEL : 5 6-bit chr string ABCDE
FEDCBA9876543210FEDCBA9876543210
================================
..............................00 - DAT : hex data
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA01 - GET : get word from address A\*4
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA02 - ABS : absolute address to A\*4
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA03 - REL : relative address to A\*4
Going to switch to just 2 lines per word displayed in the editor,
Only DAT annotations show hex value, other types show LABEL of referenced address
in the place of the hex value. So no need for an extra note.
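The two annotation words above can be packed and unpacked as in the sketch below (Python, for illustration only). The bit layout follows the diagrams: character A of the LABEL in the lowest 6 bits, tag in the low 2 bits of the second word, word address in the upper 30. The character set order in `CHARSET` is an assumption, since the post does not list it, and all function names are hypothetical.

```python
# Sketch only. CHARSET order is hypothetical; index 0 is used as padding.
CHARSET = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#$@!+*-^&|=?<>_"
DAT, GET, ABS, REL = 0, 1, 2, 3


def pack_label(text):
    """Pack up to 5 characters, first character in the lowest 6 bits."""
    if len(text) > 5:
        raise ValueError("LABEL holds at most 5 characters")
    word = 0
    for i, ch in enumerate(text.ljust(5)):
        word |= CHARSET.index(ch) << (6 * i)
    return word  # top 2 bits stay 00


def unpack_label(word):
    """Inverse of pack_label (only valid for words produced by pack_label)."""
    chars = [CHARSET[(word >> (6 * i)) & 0x3F] for i in range(5)]
    return "".join(chars).rstrip()


def pack_tag(tag, byte_address=0):
    """Tag in the low 2 bits, word address (byte address / 4) above it."""
    if byte_address % 4:
        raise ValueError("addresses are 32-bit word aligned")
    return ((byte_address // 4) << 2) | tag


def unpack_tag(word):
    """Return (tag, byte_address)."""
    return word & 3, (word >> 2) * 4
```

Because addresses are word aligned, the packed GET/ABS/REL word is just the byte address with the tag in its two always-zero low bits, so `pack_tag(ABS, 0x7C00)` is `0x7C02`.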
In practice will be using some amount of binary image memory to build up
a dictionary of DAT words representing all the common somewhat forth like opcodes,
then GET words in the editor to build up source.
Need to redo the bootloader from floppy to harddrive usage,
and switch even the bootloader's 16-bit x86 code to 32-bit aligned
LABEL'ed stuff so the final editor can edit the bootloader.
Prior was avoiding manually assembling the 16-bit x86 code in the boot loader,
but might as well ditch NASM and use something else to bootstrap everything.

@@ -0,0 +1,61 @@
# 20150424 - Source-Less Programming : 4
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20150424.html
*Still attempting to fully vet the design
before the bootstrap reboot...*
DAT words in the edit image need to maintain
their source address in the live image.
This way on reload the live data can be copied over,
and persistent data gets saved to disk.
DAT annotations no longer have 30 bits of free space,
instead they have a live address.
When live address is zero, then DAT words
won't maintain live data.
This way read-only data can be self-repairing
(as long as the annotations don't get modified).
Going to use a different color for read-only DAT words.
New persistent data DAT words
will reference their edit-image hex value
before reload (then get updated to the live address).
REL words always get changed on reload (self repairing).
No need to keep the live address.
REL is only used for relative branching x86 opcodes.
Don't expect to have any run-time (non-edit-time)
self-modifying of relative branch addresses.
Given that branching to a relative branch opcode immediate
is not useful, the LABEL of a REL word is only useful
as a comment.
GET words also get changed on reload (self repairing).
GET is only designed for opcodes and labeled constants.
GET words will often be LABELed as a named branch/call target.
Been thinking about removing GET, and instead making a new
self-annotating word (display searches for a LABELed DAT
word with the same image value, then displays the
LABEL instead of HEX).
This opens up the implicit possibility of mis-annotations.
Would be rare for opcodes given they are large 32-bit values.
But for annotating things like data structure immediate offsets,
this will be a problem
(4 is the second word offset in any structure).
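The mis-annotation risk of a self-annotating word can be shown with a small reverse lookup. This is a sketch in Python with made-up labels and values. Falling back to hex when a value is ambiguous is one possible policy, not something the post specifies.

```python
from collections import defaultdict


def build_reverse_index(labeled_dat_words):
    """Map image value -> every LABEL attached to a DAT word with that value."""
    index = defaultdict(list)
    for label, value in labeled_dat_words:
        index[value].append(label)
    return index


def annotate(value, index):
    """LABEL to display, or hex when there is no LABEL or more than one."""
    labels = index.get(value, [])
    if len(labels) == 1:
        return labels[0]
    return f"{value:08X}"


# Hypothetical dictionary: one large opcode word, two small structure offsets.
index = build_reverse_index([("XOR", 0xC033403E), ("OFS1", 4), ("FOUR", 4)])
print(annotate(0xC033403E, index))  # unique, large value: shows the label
print(annotate(4, index))           # two candidates: shows hex instead
```

A large opcode word resolves to a single label, while a small immediate such as 4 matches more than one labeled word, which is the case the post flags as a problem.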
ABS words always get changed on reload (self repairing).
ABS words are targets for self-modifying code/data,
so they also need LABELs.
Reset on reload presents a problem in that
ABS cannot be used to setup persistent data
unless that persistent data is constant
or only built/changed in the editor.
But this limitation makes sense in the context
that ABS addresses in live data structures
can get invalidated by moving stuff around in memory.
The purpose of ABS is edit-time relinking.

@@ -0,0 +1,84 @@
# 20150426 - Source-Less Programming : 5
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20150426.html
**Boot Loader Bring-up**
Managed to get the boot loader done, which includes the following steps,
(1.) Move the stack seg:pointer (since next step overwrites it).
(2.) Use BIOS to read the other 62 512-byte sectors for the first track.
(3.) Use BIOS to switch to 80x50 text mode and load custom character glyphs.
(4.) Use BIOS to set EGA text palette to 0-15 with 0 for overscan.
(5.) Program VGA palette registers for those 16 colors.
(6.) Use BIOS to enable A20.
(7.) Turn off interrupts, and relocate the image's 63 sectors to zero.
(8.) Load zero entry IDT, minimal 3 entry GDT.
(9.) Enable protected mode and jump to the 3rd sector.
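For step (8.), a minimal 3 entry GDT is commonly a null descriptor plus one flat code and one flat data segment. The sketch below packs such descriptors in Python. The choice of those three entries is an assumption (the post does not list them), and `gdt_entry` is a hypothetical helper. The bit positions follow the standard x86 segment descriptor layout.

```python
def gdt_entry(base, limit, access, flags):
    """Pack one 8-byte x86 segment descriptor as a 64-bit integer."""
    return (
        (limit & 0xFFFF)                   # limit bits 0..15
        | ((base & 0xFFFFFF) << 16)        # base bits 0..23
        | ((access & 0xFF) << 40)          # access byte
        | (((limit >> 16) & 0xF) << 48)    # limit bits 16..19
        | ((flags & 0xF) << 52)            # granularity / size flags
        | (((base >> 24) & 0xFF) << 56)    # base bits 24..31
    )


# Assumed layout: null, flat 32-bit code, flat 32-bit data.
GDT = [
    0,
    gdt_entry(0, 0xFFFFF, 0x9A, 0xC),  # present, ring 0, code, readable
    gdt_entry(0, 0xFFFFF, 0x92, 0xC),  # present, ring 0, data, writable
]
gdt_bytes = b"".join(entry.to_bytes(8, "little") for entry in GDT)
```

The flags value 0xC selects 4KB granularity and 32-bit default size, so a limit of 0xFFFFF covers the full 4GB address space.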
The 2nd 512-byte sector contains the 8x8 character bitmaps for the first 64 characters.
The majority of the time was spent making a nice font,
getting colors the way I wanted,
and prototyping editor look and feel (without building it).
Didn't feel like fully hand assembling 16-bit x86 machine code for the boot loader,
so I used NASM and hexdump to accelerate the process
(to provide machine code I could pad out to 32-bit alignment).
Also wrote a quick C based tool to bootstrap the process of building the loader.
Something which would enable me to easily build out an annotated image,
and show a print out in the console of what I'd be seeing in the editor.
Here is a shot of a bit of the scratch C code I used to make the font,
![](20150426-A.png)
Here is a shot in QEMU of the loader displaying the font,
![](20150426-B.png)
And another shot from QEMU showing the palette,
![](20150426-C.png)
**What the Current Annotated Image Looks Like**
Below is a shot captured from the terminal window output of the C tool.
I'm using 3 cache lines for the loader code.
![](20150426-D.png)
Grey lines separate the 512-byte sectors.
Memory address on the left in grey.
Each pair of lines shows half an x86 cacheline.
The blue to white shows the 5 character/word annotation strings
(now using the extra 2 bits of the label for color).
The red hex shows the image data.
Not using {GET,ABS,REL} tagged words in this part,
so everything in the bootloader is just hand assembled 16-bit machine code,
and this is not representative of what the rest of the system will look like.
The rest of the system will have {GET opcode} followed by {HEX} or {ABS}
for opcode immediates (easy to write).
The 16-bit code is {HEX} mixed opcode and immediates, quite a bit different (hard to write).
Some hints on the annotations,
Everything is in base 16.
AX is TOP so I don't bother with "A=9000" (which wouldn't fit anyway),
instead I just write "9000" (the A= is implied).
The "!" means store so "SSSP!" is storing TOP (or AX) into both SS and SP.
The "B=200" means BX=200h.
In this 16-bit x86 case I use 3E to pad out opcodes to 32-bit.
The "X" = SI, "Y" = DI, "F" = BP.
**Next Step**
Ground work is done,
next step is to bring up the opcode dictionary for {GET} words,
then write a little IDE driver to get access to load the rest of the image,
and to be able to save in the editor.
After that, write the drawing code for the editor,
then a mini PS/2 driver for the input,
then write editor input handling.
Then I have a full OS ready to start on a real machine.

@@ -0,0 +1,67 @@
# 20150710 - Inspiration Reboot
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20150710.html
*Quite inspired by the insane one still or video per day at [beeple.tumblr.com](http://beeple.tumblr.com/).
Attempting to get back in the groove of consistently taking a small amount of non-work time every day to reboot fun projects.
I'm on week 2 now of probably a three month process of healing from a torn lower back,
sitting in front of a computer is now low enough pain to have fun again...*
**1536**
Setting a new 1536-byte (3x 512-byte sector) constraint for a bootloader + source interpreter
which brings up a PC in 64-bit long-mode with a nice 8x8 pix VGA font
and with 30720-bytes (60 sectors, to fill out one track) of source text for editor and USB driver.
USB providing thumb drive access to load in more stuff.
Have 1st sector bringing up VGA and long mode,
2nd sector with 64-character font,
and last 512-byte sector currently in progress as the interpreter.
Went full circle back to something slow, but dead simple:
interpreter works on bytes as input.
The following selection of characters appends simultaneously to a 60-bit 10 6-bit/char word string,
and a 64-bit number,
```
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#$@!+*-^&|=?<>_
```
Then giving a "color forth tag" like meaning to another fixed set of characters,
~ - Negate the 64-bit number.
. - Lookup word in dictionary, and push 64-bit value onto data stack.
, - Push 64-bit number on data stack.
: - Lookup word in dictionary, pop 64-bit value from data stack to word.
; - Write 32-bit number at compile position.
" - Lookup word in dictionary, interpret the string at address stored in word.
[ - Lookup word in dictionary, store compile position in word, append string from [ to ] compile position.
] - When un-matched with [, this ends interpretation via RET.
\ - Ignore text until the next \.
` - Lookup word in dictionary, call to address stored in word.
That set of characters replaces the "space" character in forth, and works like a post-fix tag working on either the string or number just parsed from input.
The set of tags is minimal but flexible enough to build up a forth style macro language assembler,
with everything defined in the source itself.
More on this next time.
One nice side effect of post-fix tags is that syntax highlight is trivial by reading characters backwards starting at the end of the block.
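The scanning step of such an interpreter can be sketched as below. This is Python for illustration, not the 512-byte x86 interpreter, and it only splits input into (tag, word, number) triples without executing them. Several details are assumptions: the 6-bit code is the character's index plus one, only hex digits feed the number, `~` negates in place without ending the token, and whitespace is skipped.

```python
WORD_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#$@!+*-^&|=?<>_"
TAG_CHARS = '.,:;"[]`'
MASK64 = (1 << 64) - 1
MASK60 = (1 << 60) - 1


def tokenize(source):
    """Yield (tag, packed_word, number) for each post-fix tag in source."""
    word = number = 0
    in_comment = False
    for ch in source:
        if ch == "\\":
            in_comment = not in_comment  # ignore text until the next backslash
        elif in_comment:
            continue
        elif ch in WORD_CHARS:
            # Both accumulators advance on the same character.
            word = ((word << 6) | (WORD_CHARS.index(ch) + 1)) & MASK60
            if ch in "0123456789ABCDEF":
                number = ((number << 4) | int(ch, 16)) & MASK64
        elif ch == "~":
            number = (-number) & MASK64
        elif ch in TAG_CHARS:
            yield ch, word, number
            word = number = 0
        # anything else (spaces, newlines) is skipped in this sketch


def unpack_word(word):
    """Turn a packed 6-bit/char word back into text."""
    chars = []
    while word:
        chars.append(WORD_CHARS[(word & 0x3F) - 1])
        word >>= 6
    return "".join(reversed(chars))


tokens = list(tokenize("800000,BOOT:"))
```

Since the tag arrives after the token, the scanner never needs to know in advance whether it is reading a name or a number. It keeps both and lets the tag pick one.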
**Sony Wega CRT HDTVs**
The old Wega CRT HDTVs work quite well.
They apparently are nearly fixed frequency 1080 interlaced with around 540 lines (or fields) per ~60 Hz frame,
and unlike prior NTSC CRT TVs, they seem to not do any real progressive scanning.
Taking a working 1080i modeline and converting it to 540p and driving the CRT
results in the Wega initiating a "mode-reset" when it doesn't see the interlaced fields for the 2nd frame.
However 480p modes do work (perhaps with an internal conversion to 1080i).
Given that 1080i modes are totally useless as the 60Hz interlace flicker is horrible,
and 540p won't work, these HDTVs should be complete garbage.
However 720p works awesome as the TV's processing to re-sample to 1080i does not flicker any worse than 60Hz already does.
In theory the even and odd fields (in alternating frames) share around 75% of an input line (540/720),
and likely more if the re-sampling has some low-pass filtering.
Drop in a PS4 game which has aliasing problems, and the CRT HDTV works like magic.
These late model "hi-scan" Wega CRTs only had roughly 853 pixel width aperture grille:
853x540 from what was a 1920x1080 render is a good amount of super-sampling...

@@ -0,0 +1,50 @@
# 20150714 - 1536-1 : The Programmer Addiction = Feedback
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20150714.html
Continuing on the 1536-byte loader based system.
Interpreter finished, under the 1536-byte goal.
Second major goal is to get the instant feedback productivity addiction loop going: modify, get feedback, repeat.
Have simple ASCII to custom 64-character set encoding converter,
and way to include the converter source text starting at sector 3 in the boot loader.
First major test, getting an infinite loop or infinite reboot working.
Source without any syntax coloring,
```
\1536-1 --- BOOTUP INTO SPIN OR REBOOT LOOP\
800000, \PUSH BOOT-UP COMPILE POSITION\
BOOT: \STORE IN BOOT WHICH GETS CALLED TO RUN SYSTEM\
FEEBFEEB,/ \WRITE OPCODE TO JUMP TO SELF\
EAEAEAEA,/ \OR WRITE OPCODE TO CRASH ON INVALID INSTRUCTION\
] \END COMPILE - LOADER WILL THEN JMP TO BOOT WORD\
```
Loader sets up memory with dictionary at 1MB (2MB chunk with 1MB overflow), copies source to 4MB (4MB chunk maximum), then starts the compile position at 8MB (so 8MB and on is the rest of the memory on the system). Had one major bug getting the interpreter up, forgot that \ in NASM results in a line continuation even when in a comment, removing a line of a lookup table resulting in a crash.
Tracking down bugs is very easy, add "JMP $" or "db 0xEA" in NASM to hang or reboot respectively.
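To see why FEEBFEEB spins, it helps to look at the bytes the 32-bit number becomes in memory. A small Python check, assuming the number is written little-endian as on any x86 (`emitted_bytes` is a hypothetical helper):

```python
import struct


def emitted_bytes(value):
    """Bytes a 32-bit number occupies in memory on little-endian x86."""
    return struct.pack("<I", value)


spin = emitted_bytes(0xFEEBFEEB)
print(spin.hex(" "))  # eb fe eb fe

# EB is the short JMP opcode; its operand is a signed 8-bit displacement
# measured from the end of the 2-byte instruction.
disp = struct.unpack("<b", spin[1:2])[0]
target_offset = 2 + disp  # relative to the start of the instruction
```

The displacement is -2, so the jump lands back on its own first byte, which is the same thing as "JMP $" in NASM. EA is not a valid opcode in 64-bit mode, which is why EAEAEAEA faults instead.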
**Adjustments**
Adjusted the character syntax.
- - Negate the 64-bit number, add dash to the string.
. - Lookup word in dictionary, and push 64-bit value from word entry onto data stack.
, - Push 64-bit number on data stack.
: - Lookup word in dictionary, pop 64-bit value from data stack to word entry.
; - Lookup word in dictionary, interpret string starting at address stored in word entry.
[ - Lookup word in dictionary, store pointer to source after the [ in the word entry, skip past next ].
] - When un-matched with [, this ends interpretation via RET.

@@ -0,0 +1,126 @@
# 20150715 - 1536-2 : Assembling From the Nothing
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20150715.html
*Started bringing up a limited subset x86-64 assembler.
The full x86-64 opcode encoding space is an unfortunate beast of complexity which I'd like to avoid.
So I did...*
**Compromises**
This prototype sticks to only exactly 4-byte or 8-byte instructions
(8-byte only if the instruction contains a 32-bit immediate/displacement).
The native x86-64 opcodes are prefix padded to fill the full 4-byte word.
Given that x86-64 CPUs work in chunks of 16-bytes of instruction fetch,
this makes it easy to maintain branch alignment visually in the code.
Since x86-64 float opcodes are natively 4-bytes without the REX prefix,
I'm self limiting to only 8 registers for this assembler,
which is good enough for the intended usage.
I'm not doing doubles and certainly not wasting time on vector instructions (have an attached GPU for that!).
Supported opcode forms in classic Intel syntax,
op;
op reg;
op reg,reg;
op reg,imm32;
op reg,[reg];
op reg,[reg+imm8];
op reg,[reg+imm32];
op reg,[imm32];
op reg,[reg+reg]; <- For LEA only.
This is a bloody ugly list
which needed translation into some kind of naming
in which "op" changes based on the form.
I borrowed some forthisms:
@ for load, ! for store.
Then added ' for imm8,
" for imm32, and # for RIP relative [imm32].
A 32-bit ADD and LEA ends up with this mess of options
(note . pushes word value on the stack, so A. pushes 0 for EAX in this context,
and , pushes a hex number, and / executes the opcode word which assembles the instruction to the current assembly write position),
A.B.+/ .......... add eax,ebx;
A.1234,"+/ ...... add eax,0x1234;
A.B.@+/ ......... add eax,[rbx];
A.B.12,'@+/ ..... add eax,[rbx+0x12];
A.B.1234,"@+/ ... add eax,[rbx+0x1234];
A.LABEL.#@+/ .... add eax,[LABEL]; <- RIP relative
A.B.12,'+=/ ..... lea eax,[rbx+0x12];
A.B.C.+=/ ....... lea eax,[rbx+rcx\*1];
Then using L to expand from 32-bit operand to 64-bit operand,
A.B.L+/ .......... add rax,rbx;
A.1234,L"+/ ...... add rax,0x1234;
A.B.L@+/ ......... add rax,[rbx];
A.B.12,L'@+/ ..... add rax,[rbx+0x12];
A.B.1234,L"@+/ ... add rax,[rbx+0x1234];
A.LABEL.L#@+/ .... add rax,[LABEL]; <- RIP relative
A.B.12,L'+=/ ..... lea rax,[rbx+0x12];
A.B.C.L+=/ ....... lea rax,[rbx+rcx\*1];
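As a concrete instance of the 4-byte rule, here is a sketch that encodes the `op reg,[reg+imm8]` form of ADD in Python. The opcode (03, ADD r32,r/m32), the ModRM layout and the REX.W prefix (48) are standard x86-64. Using 3E as the pad prefix for the 32-bit form is an assumption, and the register table is limited to registers that need no SIB byte or special casing.

```python
REGS = {"A": 0, "C": 1, "D": 2, "B": 3}  # eax, ecx, edx, ebx register numbers
PAD = 0x3E       # assumption: ignored segment prefix used as padding
REX_W = 0x48     # REX prefix selecting a 64-bit operand
ADD_LOAD = 0x03  # ADD r32, r/m32


def modrm(mod, reg, rm):
    """Build a ModRM byte from its three fields."""
    return (mod << 6) | (reg << 3) | rm


def add_reg_mem_disp8(dst, base, disp8, wide=False):
    """Encode add dst,[base+disp8] as exactly 4 bytes."""
    body = bytes([ADD_LOAD, modrm(0b01, REGS[dst], REGS[base]), disp8 & 0xFF])
    prefix = bytes([REX_W]) if wide else bytes([PAD])
    return prefix + body


print(add_reg_mem_disp8("A", "B", 0x12).hex(" "))             # add eax,[rbx+0x12]
print(add_reg_mem_disp8("A", "B", 0x12, wide=True).hex(" "))  # add rax,[rbx+0x12]
```

The 64-bit form is naturally 4 bytes because the REX prefix takes the slot that the pad prefix fills in the 32-bit form.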
**Source Example With Google Docs Mockup Syntax Highlighting**
Font and colors are not what I'm going for, just enough to get to the next step.
This is an expanded example which starts building up enough of an assembler
to boot and clear the VGA text screen.
Some of this got copied from older projects in which I used "X" instead of "L" to mark the 64-bit operand
(just noticed I need to fix the shifts...).
I just currently copy from this to a text file which gets included into the boot loader on build.
*(Lost Image Here When Minus Went Down)*
**From Nothing to Something**
This starts by semi-self-documenting hand assembled x86 instructions via macros.
So "YB8-L'![F87B8948,/]" reads like this,
(1.) Y.B.8-,L'! packed to a word name YB8-L'! with tag characters removed.
(2.) [ which starts the macro.
(3.) F87B8948 which is {48 (REX 64-bit operand), 89 (store version of MOV), 7B (modrm byte: edi,[rbx+imm8]), F8 (-8)}.
(4.) , which pushes the number on the data stack.
(5.) / which after , executes the empty word, which pops the data stack and writes 32-bit to the asm position.
(6.) ] which ends the macro.
Later YB8-L'! with ; appended can be used to assemble that instruction by interpreting the macro.
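The hand-assembled word in step (3.) can be checked by splitting it back into fields. A Python sketch, assuming the word is stored little-endian and has the fixed shape {REX, opcode, ModRM, disp8} (`decode_macro_word` is a hypothetical helper):

```python
import struct

REG_NAMES = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def decode_macro_word(value):
    """Split a 4-byte {REX, opcode, ModRM, disp8} instruction word into fields."""
    rex, opcode, modrm, disp = struct.pack("<I", value)
    return {
        "rex_w": bool(rex & 0x08),           # 64-bit operand size
        "opcode": opcode,
        "mod": modrm >> 6,                   # 1 means [reg + disp8]
        "reg": REG_NAMES[(modrm >> 3) & 7],  # register operand
        "rm": REG_NAMES[modrm & 7],          # base register of the memory operand
        "disp8": disp - 256 if disp >= 128 else disp,
    }


fields = decode_macro_word(0xF87B8948)
```

For F87B8948 the bytes in memory are 48 89 7B F8, which decodes to a 64-bit store of DI to [BX-8], matching the name YB8-L'! (Y is DI, B is BX, 8- is the negated offset, L is 64-bit, ' is imm8, ! is store).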
The first assembled words are $ which pushes the current assembly position on the stack,
and $DRP (which is actually a bug which needs to be removed).
The $! pops an address from the data stack, and stores the current assembly position to given address.
This is later used for instruction build macros which do things like PSH` where the ` results in the dictionary address for the PSH word to be placed on the data stack.
The end game is getting to the point where given one of the opcode forms,
it is possible to write the following to produce a function which compiles an opcode,
C033403E,^`\_;
Which pushes the 4-byte opcode base 0xC033403E, then the opcode name ^ for XOR, then runs the \_ macro which assembles this into:
MOV eax,0xC033403E;
JMP X86-RM;
Immediately afterwards it is possible to execute the ^ word (call it) and assemble an XOR instruction.
The X86-RM expects to get the REG and RM operands from the data stack with base instruction opcode data in EAX.
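A rough Python analog of this opcode-defining step, for the register,register form only. How the author's X86-RM actually merges operands is not shown in the post. This sketch assumes it ORs the REG and RM numbers into the ModRM byte, which is the top byte of the padded base word, and the names `x86_rm` and `define_opcode` are hypothetical.

```python
import struct


def x86_rm(base, reg, rm):
    """Merge reg/rm numbers into the ModRM byte (top byte of the base word)."""
    return base | (reg << 27) | (rm << 24)


def define_opcode(base):
    """Bind a base word and return a word that assembles one instruction."""
    def word(reg, rm, out):
        out.extend(struct.pack("<I", x86_rm(base, reg, rm)))
    return word


A, C, D, B = 0, 1, 2, 3          # x86 register numbers
xor = define_opcode(0xC033403E)  # bytes 3E 40 33 C0: pad, empty REX, XOR, ModRM

code = bytearray()
xor(A, A, code)  # xor eax,eax
xor(A, B, code)  # xor eax,ebx
print(code.hex(" "))
```

The base word already encodes xor eax,eax, so operands of zero leave it unchanged. Other registers only touch the low six bits of the last byte.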
**Making a Mess to Clean Up**
This about concludes the worst part of getting going from nothing,
except for the PTSD dreams where people only speak in mixed hex and x86 machine code: FUCOM! REX DA TEST JO.
When placed into final context there will be a few KB of source to build an assembler which covers all functionality I need for the rest of the system.
At this point I can easily add instructions and a few more of the opcode forms as they are needed.
And it becomes very easy to write assembly like this,
A.A.^/ Y.B8000,"/ C.1F40,"/ L!REP/
Which is this in Intel syntax,
xor eax,eax; <- set eax to zero
mov edi,0xB8000; <- VGA text memory start address
mov ecx,0x1F40; <- 80x50 times two bytes per character
cld;
rep stosq; <- using old CISC style slow form to "do:mov [edi],rax;add rdi,8;dec rcx;jnz do;"

@@ -0,0 +1,28 @@
# 20150722 - 1536-3 : Simplify, Repeat
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20150722.html
Night 3 on the 1536 project, decided to make some changes before going full ahead with writing an editor.
(1.) Switched the boot loader to not relocate to zero.
Now leaving the BIOS areas alone, I doubt I'll ever go back to real mode after switching to long mode,
this ends up making the code easier, and provides room for the next change.
(2.) Switched the boot loader to fetch the first 15 tracks instead of just 1 track.
Now have a little over 472KB of source to work with on boot,
which is effectively infinite for this project.
The motivation for this change was the realization that x86-64 assembly source would get big.
Don't want to change this later. 472KB is close to the maximum without checking for things like the EBDA, or handling non-full-track reads.
(3.) Switched to an easier source model. Lines are now always 64 characters long.
Comments are switched to a single \ which functions like the C // style comment, ignoring the rest of the line.
Since lines are always 64 bytes (cacheline sized and aligned),
the interpreter can quickly skip over comments.
This is a trade in increased source size, for simplification of the editor:
fixed size lines makes everything trivial.
(4.) Making a convention that syntax highlighting with color only has a line of context.
Which translates into don't let things wrap. Easy.
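The payoff of change (3.) is that skipping a comment is a single bit operation on the source offset. A minimal Python sketch, assuming source is a flat string of 64-character lines with no newline characters (`next_line` and `strip_comments` are hypothetical names):

```python
LINE = 64  # fixed line length from the post


def next_line(pos):
    """Offset of the first character of the line after the one containing pos."""
    return (pos | (LINE - 1)) + 1


def strip_comments(source):
    """Yield (offset, char) for the non-comment characters of the source."""
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch == "\\":
            pos = next_line(pos)  # jump over the rest of this line
            continue
        yield pos, ch
        pos += 1


src = "A.B.+/ \\ADD".ljust(LINE) + "C.".ljust(LINE)
code = "".join(ch for _, ch in strip_comments(src))
```

Line N always starts at offset N*64, so the editor gets the same benefit: cursor movement and redraw need no scan for line endings.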

@@ -0,0 +1,41 @@
# 20150809 - 1536-4 : Coloring
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20150809.html
Night 4 on 1536.
Brought up most of the "x56-40" (x86-64 in hex) assembler now.
Also have majority of the forth-like words needed to assemble self-documenting constants {add,mul,neg,not,and,or,xor,...}.
*(Lost Image When Minus Went Down)*
Started on the editor.
Just enough of a quick prototype to render the text view in the editor (sans cursor for now).
All screens on this post are captured from the editor running in an x86-64 emulator.
Keeping the fixed 64 character lines makes everything very simple.
Syntax highlighting was carefully designed to only need one line of context.
Just a simple backward sweep to color, then a forward sweep to correct the color for comments (the \ marks rest of line as comment).
Adjusted the font, {\_,-,=} all now extend out full font cell width so they can double as lines.
Adjusted the colors closer to what I like for syntax highlighting.
Still experimenting with how to comment and arrange source.
![](20150809.png)
Have 16 characters to the right of the source window to use for real-time debug data.
Like viewing values of registers, memory, etc.
Thinking through details in the background.
Next step is to bring up the non-USB throw-away keyboard driver,
then get the editor functional.
**Bugs**
Still finding the no-errors, no-tools, know-everything path easy to work with.
This time lost some time to an opcode assembly bug.
A full class of opcodes was broken, something never validated from last time: I just forgot to make the RIP-relative offset actually RIP-relative for non-branch instructions.
Everything else working out of the box with no human errors.
When the mind can reason about the entire system,
and the edit/execute loop is near instant,
bugs are normally an instant fix.
Quite satisfying to work this way.


@@ -0,0 +1,15 @@
# 20150810 - 1536-5 : Keys
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20150810.html
Evening 5 on 1536. Wrote a mini PS/2 keyboard driver (source below) based on prior work. Ran out of time for testing, got distracted by SIGGRAPH slides.
Only supporting 64 keys (bit array in register), good enough to run arcade controllers which alias as keyboards.
The driver only handles key release for {shift, control, alt}; for other keys the application clears the bits itself to register a release.
Had an interesting bug today: forgot to implement the "MOV REG,REG" opcode, surprised I got this far in 1536 without a register-to-register move.
Manually keeping 16-byte groupings for instructions has some interesting side effects on coding style...
![](20150810.png)
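The 64-key bit array described above could look roughly like this. This is a sketch of the idea only, not the driver in the screenshot: the key indices are hypothetical (the post does not give the real mapping), and a Python integer stands in for the 64-bit register.

```python
MASK64 = (1 << 64) - 1

# Hypothetical key indices (0..63); the real mapping is not given in the post.
KEY_SHIFT, KEY_CONTROL, KEY_ALT, KEY_A = 0, 1, 2, 10
MODIFIERS = (1 << KEY_SHIFT) | (1 << KEY_CONTROL) | (1 << KEY_ALT)


def on_key_event(keys: int, key: int, released: bool) -> int:
    """Driver side: set the bit on press; clear on release only for modifiers."""
    bit = 1 << key
    if not released:
        return (keys | bit) & MASK64
    if bit & MODIFIERS:
        return keys & ~bit & MASK64
    return keys  # release ignored; the application clears this bit itself


def consume_key(keys: int, key: int):
    """Application side: test a key and clear its bit in one step."""
    bit = 1 << key
    return bool(keys & bit), keys & ~bit & MASK64
```

With this split the driver never needs per-key release logic outside the three modifiers, and a press that happens between two application polls is not lost.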


@@ -0,0 +1,98 @@
# 20151113 - Rethinking the Symbolic Dictionary
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20151113.html
*Another permutation of dictionary implementation for forth like languages...*
**Source**
Exported source is composed of two parts,
(1.) Token array, where tokens can reference a local symbol by index into local hash table.
(2.) Local symbol hash table, has string for each entry.
Strings are 64-bits maximum and are stored in a reversible nearly pre-hashed form.
So hashing of a string is just an AND operation.
Tokens are 32-bits.
Local symbol hash is after the token array, so it can be trivially discarded after import.
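One way to picture a "reversible nearly pre-hashed" 64-bit string, as a sketch under assumptions. The post does not say which mixing function is used; the multiply-then-fold step below is a stand-in chosen only because it is exactly invertible, and every name here (`pack`, `prehash`, `unhash`, `slot`) is hypothetical.

```python
MASK64 = (1 << 64) - 1
MIX = 0x9E3779B97F4A7C15            # any odd constant is invertible mod 2**64
MIX_INV = pow(MIX, -1, 1 << 64)     # modular inverse (Python 3.8+)


def pack(name: str) -> int:
    """Pack 1 to 8 ASCII characters into one 64-bit integer."""
    raw = name.encode("ascii")
    if not 0 < len(raw) <= 8:
        raise ValueError("symbols are 1 to 8 characters")
    return int.from_bytes(raw, "little")


def unpack(value: int) -> str:
    """Recover the characters from a packed 64-bit integer."""
    return value.to_bytes(8, "little").rstrip(b"\0").decode("ascii")


def prehash(value: int) -> int:
    """Reversible mix: multiply by an odd constant, then fold high bits down."""
    value = (value * MIX) & MASK64
    return value ^ (value >> 32)


def unhash(stored: int) -> int:
    """Exact inverse of prehash."""
    stored ^= stored >> 32
    return (stored * MIX_INV) & MASK64


def slot(stored: int, table_size: int) -> int:
    """Hashing a stored string is a single AND (table_size is a power of 2)."""
    return stored & (table_size - 1)
```

The string is stored already mixed, so a table lookup only needs `slot`, while the editor can still call `unhash` and `unpack` when it has to display the name.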
**Global Dictionary**
Global dictionary maps 32-bit index to 32-bit value.
Each 32-bit index has an associated 64-bit string stored in the same reversible nearly pre-hashed form.
Dictionary entries are allocated by just taking the next entry in a line.
There is no deletion. Just two arrays (32-bit value array, and 64-bit string array), and an index for the top.
**Source Import**
Starts with loaded source in memory and allocated space for one extra array,
(1.) Source token array, gets translated to loaded-in-memory form.
(2.) Source local symbol hash table, with each entry being a 64-bit string.
(3.) Remap space, extra zeroed array with a 32-bit value per entry in local hash table.
Import streams through the global dictionary,
checking for a match in the source's local symbol hash table.
Upon finding a match, it writes the global index for the symbol into the associated remap space entry.
Import next streams through the source token array,
replacing the local symbol index
with the global index from the remap space entry.
When the remap space entry is zero,
a new symbol is allocated in the global dictionary
(this involves adding a symbol to the end of the dictionary,
and copying over the string from the local symbol hash table to the global dictionary string array).
After import the local symbol hash table and remap space are discarded.
This solves many of the core problems from a more conventional design where the global dictionary is a giant hash table.
That conventional design suffers from bad cache locality (because of the huge hash table).
This new design maintains a cache packed global dictionary (no gaps).
That conventional design can have worst case first load behavior:
each initial lookup of a new word in the dictionary on load would miss through to DRAM,
adding 100 ns per lookup.
This new design is composed almost entirely of linear streaming operations for big data
(global dictionary, source token array, etc), all of which get hardware auto-prefetch.
The only thing with random access is the source local symbol hash table, which is expected to be not too big and to stay in cache easily.
Note with this new design, interpreting source at run-time no longer has any hash lookup,
just a direct lookup.
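The two import passes can be sketched as follows. This is an illustration under assumptions, not the actual implementation: tokens are simplified to bare local symbol indices, collisions in the local table are resolved with linear probing (the post does not say how), the local table is assumed to have at least one empty slot, and global index 0 is treated as reserved so that a zero remap entry can mean "not found yet".

```python
def import_source(tokens, local_strings, g_strings, g_values):
    """Remap tokens from local symbol indices to global dictionary indices.

    tokens        : list of local hash table indices (simplified tokens)
    local_strings : power-of-2 sized hash table of 64-bit strings, 0 = empty
    g_strings     : global string array; index 0 is reserved as "unmapped"
    g_values      : global value array, parallel to g_strings
    """
    mask = len(local_strings) - 1
    remap = [0] * len(local_strings)        # zeroed remap space

    # Pass 1: stream the global dictionary, probing the small local table.
    for g_index in range(1, len(g_strings)):
        s = g_strings[g_index]
        i = s & mask                         # hashing is just an AND
        while local_strings[i] != 0:         # linear probing (an assumption)
            if local_strings[i] == s:
                remap[i] = g_index
                break
            i = (i + 1) & mask

    # Pass 2: stream the tokens, allocating new global entries on demand.
    for t, local_index in enumerate(tokens):
        if remap[local_index] == 0:
            g_strings.append(local_strings[local_index])
            g_values.append(0)
            remap[local_index] = len(g_strings) - 1
        tokens[t] = remap[local_index]
    return tokens
```

Both loops walk their big arrays front to back; the only random access is into `local_strings` and `remap`, which are sized by the source chunk rather than by the whole dictionary.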
**First Source Import**
First source import (after machine reboot) has effectively an empty dictionary,
so import can be optimized.
**Editing**
Edit time operations, such as find the index for an existing symbol,
check if a symbol already exists,
or tab complete a symbol,
are done via a full stream through the global dictionary string table.
This is a linear operation with full auto-prefetch, so expected to be quite fast in practice.
Edit time operations are limited by human factors, so not a problem.
**Source Export**
Source export requires first checking how many unique symbols are in the chunk of source.
Use a bit array with one bit per global dictionary entry.
Zero the bit array.
Stream through the chunk of source tokens and check for a clear bit in the bit array.
For each clear bit, set the bit, and advance the count of unique words.
Setup space for the local symbol hash.
Scale up the unique symbol count to make sure the hashing is efficient.
Pad up to the next power of 2 in size.
Stream through the source tokens,
using the token index to get a global dictionary string,
hash the string into the local symbol hash, writing the associated string if new entry,
and remapping the source token index to the local hash.
Export is the most complex part of the design, but still quite simple.
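The export steps above, sketched the same way. Again this rests on assumptions: tokens are bare global indices, the scale-up factor is taken to be 2x (the post only says to scale up), and collisions use linear probing.

```python
def export_source(tokens, g_strings):
    """Build a local symbol hash table and remap tokens to local indices.

    tokens    : list of global dictionary indices (simplified tokens)
    g_strings : global string array of nonzero 64-bit pre-hashed strings
    Returns (local_tokens, local_strings).
    """
    # Count unique symbols with one bit per global dictionary entry.
    seen = bytearray((len(g_strings) + 7) // 8)    # zeroed bit array
    unique = 0
    for g in tokens:
        if not seen[g >> 3] & (1 << (g & 7)):
            seen[g >> 3] |= 1 << (g & 7)
            unique += 1

    # Scale up (2x here, an assumption) and pad to the next power of 2.
    size = 1
    while size < 2 * unique:
        size *= 2
    mask = size - 1
    local_strings = [0] * size

    # Hash each token's string into the local table and remap the token.
    local_tokens = []
    for g in tokens:
        s = g_strings[g]
        i = s & mask
        while local_strings[i] not in (0, s):      # linear probing (assumed)
            i = (i + 1) & mask
        local_strings[i] = s                       # only changes a new entry
        local_tokens.append(i)
    return local_tokens, local_strings
```

Keeping the table at most half full guarantees the probe loop always finds either the string or an empty slot.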


@@ -0,0 +1,189 @@
# 20151222 - Random Holiday 2015
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20151222.html
**Google Photos = Actually Working (edit) Fail Great**
Minus has basically imploded, losing a lot of the image content on this blog,
unfortunately Google Photos is not a great alternative,
while it seems to support original quality, it does not, see the 2nd image (got low-passed)!
EDIT: turns out Blogger is the source of this problem, the preview sends back re-compressed images,
but after the initial posting, the original quality file is actually presented.
So Google Photos seems to be good, and it's a quality bug in Blogger Preview which was the source of the problem.
![](20151222-A.png)
**Vintage Interactive Realtime Operating System**
Taking part of my Christmas vacation time
to reboot my personal console x86-64 OS project.
I've rebooted this project more times than I'm willing to disclose on this blog.
Hopefully this reboot will stick long enough to build something real with it.
Otherwise it's been great practice, and the evolution of ideas has certainly been enlightening.
Here is a shot of the binary data for the boot sector in the editor,
![](20151222-B.png)
My goal is to rewind to the principles of the Commodore 64,
just applied to modern x86-64 hardware and evolved down a different path.
In my case to build a free OS with open source code which boots into a programming interface,
which fully exposes the machine, and provides some minimal interface to access hardware
including modern AMD GPUs.
With a feature list which is the inverse of a modern operating system:
single-user, single-tasking, no paging, no memory protection ...
the application gets total control over the machine,
free to use parallel CPU cores and GPU(s) for whatever it wants.
Ultimately manifesting as a thumb drive which can be plugged into a x86-64 machine
with compatible hardware and boot the OS to run 100% from RAM, to run software.
**A Tale of Two Languages**
C64 provided Basic at boot, I'm doing something vastly different.
Booting into an editor
which is a marriage
between
a spreadsheet, hex editor,
raw memory view, debugging tool,
interactive live programming environment,
annotated sourceless binary editor with automatic relink on edit,
and a forth-like language.
Effectively I've embedded a forth-like language in a sourceless binary framework.
The editor runs in a virtualized console
which can easily be embedded inside an application.
The editor shows 8 32-bit "cells" of memory per line (half cacheline),
with a pair of rows per cell.
Top row has a 10-character annotation (with automatic green highlight if the cell is referenced in the binary),
the bottom row shows the data formatted based on a 4-bit tag stored in the annotation.
Note the screen shot showing the boot sector
was hand assembled 8086 (so is embedded data),
built from bits of NASM code chunks then disassembled
(it's not showing any of the embedded language source).
Tags are as follows,

* unsigned 32-bit word
* signed 32-bit word
* four unsigned 8-bit bytes
* 32-bit floating point value

-----

* unsigned 32-bit word with live update
* signed 32-bit word with live update
* four unsigned 8-bit bytes with live update
* 32-bit floating point value with live update

-----

* 32-bit absolute memory address
* 32-bit relative memory address [RIP+imm32]
* toe language (subset of x86-64 with 32-bit padded opcodes)
* ear language (forth-like language, encoded in binary form)

-----

* 5-character word (6-bits per character)
* last 3 saved for GPU binary disassembly
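To make the tag idea concrete, here is a sketch of formatting one 32-bit cell for the first four tags. The numeric tag values are hypothetical (the post lists the tags but not their encodings), and the display formats are guesses at what the bottom row might show.

```python
import struct

# Hypothetical tag numbering; the post lists the tags but not their values.
TAG_U32, TAG_S32, TAG_BYTES, TAG_F32 = 0, 1, 2, 3


def format_cell(cell: int, tag: int) -> str:
    """Format one 32-bit cell the way the bottom row might display it."""
    raw = struct.pack("<I", cell & 0xFFFFFFFF)
    if tag == TAG_U32:
        return str(struct.unpack("<I", raw)[0])
    if tag == TAG_S32:
        return str(struct.unpack("<i", raw)[0])
    if tag == TAG_BYTES:
        return " ".join(f"{b:02x}" for b in raw)
    if tag == TAG_F32:
        return repr(struct.unpack("<f", raw)[0])
    raise ValueError("tag not covered by this sketch")
```

The same 32 bits read very differently per tag, which is why the tag has to live in the annotation next to the cell.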
Editor designed to edit an annotated copy of the live binary,
with a framework designed to allow realtime update of the live binary as a snapshot of the edited copy.
The "with live update" tags
mean that the editor copy saves a 32-bit address to the live data
in its copy of the binary (instead of the data itself).
This allows for direct edit and visualization of the live data,
with ability to still move bits of the binary around in memory.
The "toe" and "ear" tagged cells show editable disassembled x86-64 code
in the form of these languages.
The "ear" language is a zero-operand forth-like language.
Current design,
```
regs
----
rax = top
rcx = temp for shift
rbx = 2nd item on data stack, grows up
rbp = 4
rdi = points to last written 32-bit item on compile stack

bin word name x86-64 meaning
-------- ---- ----------------------------
0389dd03 ,    add ebx,ebp; mov [rbx],eax;
dd2b3e3e \    sub ebx,ebp;
c3f2c3f2 ;    ret;
dd2b0303 add\ add eax,[rbx]; sub ebx,ebp;
dd2b0323 and\ and eax,[rbx]; sub ebx,ebp;
07c7fd0c dat# add edi,ebp; mov [rdi],imm;
d0ff3e3e cal  call rax;
15ff3e3e cal# call [rip+imm];
058b3e3e get# mov eax,[rip+imm];
890fc085 jns# test eax,eax; jns imm;
850fc085 jnz# test eax,eax; jnz imm;
880fc085 js#  test eax,eax; js imm;
840fc085 jz#  test eax,eax; jz imm;
c0c73e3e lit# mov eax,imm;
03af0f3e mul  imul eax,[rbx];
d8f73e3e neg  neg eax;
00401f0f nop  nop;
d0f73e3e not  not eax;
dd2b030b or\  or eax,[rbx]; sub ebx,ebp;
05893e3e put# mov [rip+imm],eax;
f8d20b8b sar  mov ecx,[rbx]; sar eax,cl;
e0d20b8b shl  mov ecx,[rbx]; shl eax,cl;
e8d20b8b shr  mov ecx,[rbx]; shr eax,cl;
dd2b032b sub\ sub eax,[rbx]; sub ebx,ebp;
dd2b0333 xor\ xor eax,[rbx]; sub ebx,ebp;
```
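Reading the "bin word" column as a little-endian 32-bit value gives the instruction bytes in memory order, with 0x3E (the DS segment override prefix) acting as padding so every word fills exactly 32 bits. A small sketch of that reading; the helper names are made up for illustration.

```python
DS_PREFIX = 0x3E  # DS segment override prefix, used here as harmless padding


def opcode_bytes(word_hex: str) -> bytes:
    """Bytes in memory order for one 32-bit word from the "bin word" column."""
    return int(word_hex, 16).to_bytes(4, "little")


def strip_padding(code: bytes) -> bytes:
    """Drop leading 0x3E prefixes to expose the underlying instruction."""
    return code.lstrip(bytes([DS_PREFIX]))
```

For example `opcode_bytes("d0ff3e3e")` yields `3e 3e ff d0`, and stripping the padding leaves `ff d0`, the encoding of `call rax`.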
For symbols the immediate operands are all 32-bit
and use "absolute" or "relative" tagged cells following the "ear" tagged cell.
Likewise for "dat#" which pushes an immediate on the compile stack,
and "lit#" which pushes immediate data onto the data stack,
the following cell would have a data tag.
The dictionary is directly embedded in the binary,
using the edit-time relinking support.
No interpretation is done at run-time,
only edit-time,
as the language is kept in an executable form.
After building so many prototypes which "compile"
source to a binary form,


@@ -0,0 +1,60 @@
# 20161113 - Vintage Programming 2
**Source:** https://refined-github-html-preview.kidonng.workers.dev/gomson/TimothyLottes.github.io/raw/refs/heads/master/20161113.html
This is a follow-up to the [20140816 - Vintage Programming](20140816.html) post talking about this,
![](20140816-A.png)
**Retrospective**
Was thinking about resurrecting the Atom project now that Vulkan supports what I would need to complete it,
and had to decide upon what language to write the reboot in.
Looking back at all my language and compiler experiments on PC over the years,
the A Language as described in that prior post from 2014 had the most utility
and was the easiest to be productive on in current PC style operating systems.
**Adjustments**
First implementation of compiler took ANSI text source file and output a binary.
The source had to build the ELF or PE header for the binary (similar to an assembler running without a linker).
This round, the compiler is the platform executable,
and programs will run directly from source (JITing themselves).
Compiler will be built around interactive run-time edit/recompile/retry loop which maintains existing program state in memory during the process.
* Start: Read the source file into the top of the memory stack.
* Interpret the source into binary code.
* Loop: Call the binary code entry point, if source file timestamp changed exit loop.
* Jump to Start.
Source can thus be edited in the background in any text editor while the program runs.
Save the source file, and the program reloads itself while it runs.
Likewise the source could be edited inside the program itself.
Lots of flexibility.
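The Start/Loop cycle above, sketched in Python for illustration. This is not the A compiler: `compile_source` is a hypothetical stand-in for the real interpreter, a dict stands in for program state kept in memory, and returning `False` from the entry point as a quit signal is an added convention.

```python
import os


def run(path, compile_source, max_reloads=None):
    """Run the Start/Loop cycle; `compile_source` is a hypothetical stand-in
    for the real interpreter and must return a callable entry point."""
    state = {}                          # program state kept across reloads
    reloads = 0
    while max_reloads is None or reloads <= max_reloads:
        # Start: read the source and interpret it into "binary code".
        stamp = os.stat(path).st_mtime_ns
        with open(path, "rb") as f:
            entry = compile_source(f.read())
        # Loop: call the entry point until the source timestamp changes
        # or the program asks to quit by returning False.
        while os.stat(path).st_mtime_ns == stamp:
            if entry(state) is False:
                return state
        reloads += 1                    # Jump to Start.
    return state
```

Since `state` lives outside the recompiled code, saving the source file swaps in new code without throwing away what the program has built up so far.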
**Portability and Distribution**
One source file for any platform, source adapts to the platform as it JITs itself into binary code at run-time.
Distribution is just pairing the platform specific compiler binary (which can be renamed to the program name),
and the source file (which will have a set name like "rom.a", so a double click of the renamed compiler runs the application without any arguments).
Platform specific compilers will be a few K executable depending on how minimal the ELF or PE is.
For a secondary option at some point I'll make something which can build a single executable with the compiler embedded with source.
**Open Source**
I'm going fully open-source for this project.
Compiler is being written in NASM assembly with embedded PE/ELF so there is no need for any kind of linker to build the compiler.
Thus far got a 1st draft on the PE done with all the Kernel32 symbols I believe I need,
with an executable which loads and then spins forever.
Will continue to bring up on Windows first to verify PE works on real Windows boxes,
before migrating back to Linux and Wine. More later as this project continues ...
Work in progress here: [github.com/TimothyLottes/A](https://github.com/TimothyLottes/A)