docs(ideation→track): Move report into intent_dsl_survey_20260612 folder
Per user instruction: the report is too closely related to the track
to live in the general docs/ideation/ folder. It's the track's main
deliverable, not a general ideation doc. The existing convention for
track reports is the track folder (e.g., nagent_review_20260608/report.md).
This commit is the phase 2+3 work:
- Adds the integrated report (417 lines, 8 ## headings, 40 ###)
to conductor/tracks/intent_dsl_survey_20260612/report.md
- Adds 5 Tier 2 sub-reports (1319 lines combined) to
conductor/tracks/intent_dsl_survey_20260612/research/
- Removes the old docs/ideation/ location (moved, not duplicated)
- Updates spec.md, plan.md, metadata.json, tracks.md to point at
the new location
Report structure:
Section 1: 4 anchor claims (O'Donnell, Onat/Lottes, CoSy, Jofito)
Section 2: 8 prior-art clusters (with sub-report references)
Section 3: 14-primitive grammar + ambiguity flags
Section 4: 4-tier vocab (12+12+10+8 = 42 verbs)
Section 5: 4 hardware-mapping anchor claims
Section 6: 10 AI-agent properties
Section 7: 8 open questions for follow-up B
Appendix: bibliography (external, project, sub-reports)
The sub-reports contain the deep analysis with citations; the main
report is the executive summary. Tier 2 sub-agents handled the heavy
research (5 cluster sub-reports in research/); Tier 1 focused on
integration and writing the simpler sections inline.
Time-sensitive: report must complete before nagent v2.2.
@@ -495,7 +495,7 @@ Lightweight chronology; full spec/plan/state per track is in the linked folder.
#### Track: Intent-Based Scripting Languages Survey (NEW 2026-06-12) `[track-created: b389f1be]`
*Link: [./tracks/intent_dsl_survey_20260612/](./tracks/intent_dsl_survey_20260612/), Spec: [./tracks/intent_dsl_survey_20260612/spec.md](./tracks/intent_dsl_survey_20260612/spec.md), Plan: [./tracks/intent_dsl_survey_20260612/plan.md](./tracks/intent_dsl_survey_20260612/plan.md) (to be authored by writing-plans skill)*

*Goal: Survey intent-based scripting languages as a design philosophy and propose a Meta-Tooling-facing intent DSL vocabulary. **Research-only** (non-impl): produces 1 markdown file at `docs/ideation/2026-06-12-intent-based-scripting-languages.md`. No new `src/` code, no new tests, no `pyproject.toml` changes. The report is the *foundation document* for the user's nagent v2.2 (its "Future-Track Candidate #4: Intent-based DSL" section), the placeholder `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER` (per `mcp_architecture_refactor_20260606/spec.md` §12.1 and `nagent_review_20260608/metadata.json:28`), and a future interpreter prototype (follow-up B track, separate). 7 sections: (1) the "intent-based" design philosophy (O'Donnell immediate-mode as the anchor); (2) prior art across 8 clusters (0: John O'Donnell IMGUI/MVC at johno.se/book/*; 1: Forth family — Forth, ColorForth, KYRA/Onat, x68/Lottes, Joy, CoSy/Bob Armstrong; 2: Array — APL, K, BQN, Uiua; 3: Intent-mapping — Jofito/Jody, jq, nagent tag protocol [rejected as model], Wasm; 4: Meta-Tooling DSLs — `mcp_dsl_20260606` placeholder, nagent's Bridge DSL, OpenAI/Anthropic tool-use; 5: SSDL shape primitives per `computational_shapes_ssdl_digest_20260608.md`; 6: Project's own Command Palette 33 commands; 7: `Result[T]` + `ErrorInfo` convention per `data_oriented_error_handling_20260606`); (3) the 14-primitive grammar formalized from the user's math pseudocode (`determinate`/`minor`/`matrix-transpose` snippets), with explicit ambiguity flags; (4) the 4-tier vocab (~40 verbs: T1 math ~10, T2 data pipeline ~12, T3 shell ~10, T4 AI-fuzzing tolerance ~8 — T4 is the novel contribution); (5) hardware mapping with 4 anchor claims (Onat/Lottes 2-register stack + magenta pipe + basic blocks + lambdas + preemptive scatter; O'Donnell "widgets are method invocations"; Forth/CoSy concatenative syntax; APL/K array data); (6) AI-agent properties (10 claims tying to existing project architecture: Meta-Tooling domain per 
`guide_meta_boundary.md`, runtime path through `cli_tool_bridge.py`, 3-layer security per `guide_tools.md`, 4 memory dimensions per nagent v2.1 §2.1, stable-to-volatile cache ordering, `Result[T]` envelope, Command Palette 33 commands, Hook API state fields, O'Donnell IEventTarget = `sandbox` verb, O'Donnell "reads are free" = cheap Tier 2 verbs); (7) ≥6 open questions for follow-up B (interpreter prototype) + connection block to `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER`. 4 phases: source gathering + outline (checkpoint commit), write sections 1-3, write sections 4-7, self-review + user review + commit + register in tracks.md. **Time-sensitive**: report must complete before nagent v2.2 ships.*
*Goal: Survey intent-based scripting languages as a design philosophy and propose a Meta-Tooling-facing intent DSL vocabulary. **Research-only** (non-impl): produces 1 markdown file at `conductor/tracks/intent_dsl_survey_20260612/report.md`. No new `src/` code, no new tests, no `pyproject.toml` changes. The report is the *foundation document* for the user's nagent v2.2 (its "Future-Track Candidate #4: Intent-based DSL" section), the placeholder `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER` (per `mcp_architecture_refactor_20260606/spec.md` §12.1 and `nagent_review_20260608/metadata.json:28`), and a future interpreter prototype (follow-up B track, separate). 7 sections: (1) the "intent-based" design philosophy (O'Donnell immediate-mode as the anchor); (2) prior art across 8 clusters (0: John O'Donnell IMGUI/MVC at johno.se/book/*; 1: Forth family — Forth, ColorForth, KYRA/Onat, x68/Lottes, Joy, CoSy/Bob Armstrong; 2: Array — APL, K, BQN, Uiua; 3: Intent-mapping — Jofito/Jody, jq, nagent tag protocol [rejected as model], Wasm; 4: Meta-Tooling DSLs — `mcp_dsl_20260606` placeholder, nagent's Bridge DSL, OpenAI/Anthropic tool-use; 5: SSDL shape primitives per `computational_shapes_ssdl_digest_20260608.md`; 6: Project's own Command Palette 33 commands; 7: `Result[T]` + `ErrorInfo` convention per `data_oriented_error_handling_20260606`); (3) the 14-primitive grammar formalized from the user's math pseudocode (`determinate`/`minor`/`matrix-transpose` snippets), with explicit ambiguity flags; (4) the 4-tier vocab (~40 verbs: T1 math ~10, T2 data pipeline ~12, T3 shell ~10, T4 AI-fuzzing tolerance ~8 — T4 is the novel contribution); (5) hardware mapping with 4 anchor claims (Onat/Lottes 2-register stack + magenta pipe + basic blocks + lambdas + preemptive scatter; O'Donnell "widgets are method invocations"; Forth/CoSy concatenative syntax; APL/K array data); (6) AI-agent properties (10 claims tying to existing project architecture: Meta-Tooling domain per 
`guide_meta_boundary.md`, runtime path through `cli_tool_bridge.py`, 3-layer security per `guide_tools.md`, 4 memory dimensions per nagent v2.1 §2.1, stable-to-volatile cache ordering, `Result[T]` envelope, Command Palette 33 commands, Hook API state fields, O'Donnell IEventTarget = `sandbox` verb, O'Donnell "reads are free" = cheap Tier 2 verbs); (7) ≥6 open questions for follow-up B (interpreter prototype) + connection block to `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER`. 4 phases: source gathering + outline (checkpoint commit), write sections 1-3, write sections 4-7, self-review + user review + commit + register in tracks.md. **Time-sensitive**: report must complete before nagent v2.2 ships.*

*Spec approved 2026-06-12 (commit `b389f1be`). 789 lines; modeled on `data_oriented_error_handling_20260606/spec.md`.*
@@ -7,7 +7,7 @@
"type": "research-only",
"domain": "Meta-Tooling",
"blocked_by": [],
"deliverable": "docs/ideation/2026-06-12-intent-based-scripting-languages.md",
"deliverable": "conductor/tracks/intent_dsl_survey_20260612/report.md",
"consumed_by": [
"nagent v2.2 (Future-Track Candidate #4: Intent-based DSL)",
"intent_dsl_for_meta_tooling_20260608_PLACEHOLDER (per mcp_architecture_refactor_20260606/spec.md §12.1)",

@@ -2,7 +2,7 @@

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Produce the report at `docs/ideation/2026-06-12-intent-based-scripting-languages.md` (7 sections, ~3500-5000 lines) surveying intent-based scripting languages and proposing a 4-tier vocab for a Meta-Tooling-facing intent DSL. Research-only — no `src/` code, no tests, no `pyproject.toml` changes.
**Goal:** Produce the report at `conductor/tracks/intent_dsl_survey_20260612/report.md` (7 sections, ~3500-5000 lines) surveying intent-based scripting languages and proposing a 4-tier vocab for a Meta-Tooling-facing intent DSL. Research-only — no `src/` code, no tests, no `pyproject.toml` changes.

**Architecture:** The track produces 1 markdown file in `docs/ideation/`. The 7 sections are written sequentially across 3 self-directed phases (1-3) plus 1 user-approval phase (4). Each phase ends with a checkpoint commit. The user sees only the final report and either approves (commit + register in `tracks.md`) or iterates.

@@ -20,7 +20,7 @@ This track creates/modifies the following files:

| File | Action | Purpose |
|---|---|---|
| `docs/ideation/2026-06-12-intent-based-scripting-languages.md` | **Create** | The report. ~3500-5000 lines, 7 sections, 1 appendix. |
| `conductor/tracks/intent_dsl_survey_20260612/report.md` | **Create** | The report. ~3500-5000 lines, 7 sections, 1 appendix. |
| `conductor/tracks/intent_dsl_survey_20260612/state.toml` | **Create** | State file per `conductor/workflow.md` template. |
| `conductor/tracks/intent_dsl_survey_20260612/metadata.json` | **Create** | Track metadata per spec §1. |
| `conductor/tracks.md` | **Modify** | Register track as completed after phase 4. |
@@ -160,7 +160,7 @@ tracks_md_registered = false
"type": "research-only",
"domain": "Meta-Tooling",
"blocked_by": [],
"deliverable": "docs/ideation/2026-06-12-intent-based-scripting-languages.md",
"deliverable": "conductor/tracks/intent_dsl_survey_20260612/report.md",
"consumed_by": [
"nagent v2.2 (Future-Track Candidate #4: Intent-based DSL)",
"intent_dsl_for_meta_tooling_20260608_PLACEHOLDER (per mcp_architecture_refactor_20260606/spec.md §12.1)",
@@ -185,7 +185,7 @@ git commit -m "conductor(track): Add intent_dsl_survey_20260612 state + metadata
### Task 3: Write the 7-section outline as a stub

**Files:**
- Create: `docs/ideation/2026-06-12-intent-based-scripting-languages.md`
- Create: `conductor/tracks/intent_dsl_survey_20260612/report.md`

- [ ] **Step 1: Create the file with the header + 7-section outline (1 paragraph per section)**

@@ -276,16 +276,16 @@ Use the following structure (one paragraph per section; the actual content gets

Run:
```bash
wc -l docs/ideation/2026-06-12-intent-based-scripting-languages.md
grep -c "^## " docs/ideation/2026-06-12-intent-based-scripting-languages.md
grep -c "^### " docs/ideation/2026-06-12-intent-based-scripting-languages.md
wc -l conductor/tracks/intent_dsl_survey_20260612/report.md
grep -c "^## " conductor/tracks/intent_dsl_survey_20260612/report.md
grep -c "^### " conductor/tracks/intent_dsl_survey_20260612/report.md
```
Expected: ~70 lines, 7 `## ` headings (sections 1-7 + Appendix), ≥10 `### ` headings (clusters + tiers + claims).

- [ ] **Step 3: Phase 1 checkpoint commit**

```bash
git add docs/ideation/2026-06-12-intent-based-scripting-languages.md
git add conductor/tracks/intent_dsl_survey_20260612/report.md
git commit -m "docs(ideation): Add intent_dsl_survey_20260612 outline stub"
git notes add -m "intent_dsl_survey_20260612 Phase 1 checkpoint (outline)
@@ -319,7 +319,7 @@ git commit -m "conductor(track): Mark intent_dsl_survey_20260612 phase 1 complet
### Task 4: Write section 1 (the philosophy)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 1
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 1

- [ ] **Step 1: Replace the section 1 stub with the 4 anchor claims**

@@ -345,7 +345,7 @@ Continue to next task. Sections 1-3 commit together at end of phase 2.
### Task 5: Write section 2 cluster 0 (O'Donnell)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 2 cluster 0
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 2 cluster 0

- [ ] **Step 1: Replace the cluster 0 stub**

@@ -356,7 +356,7 @@ Per spec §3.2 cluster 0, write 2-3 sentences on the design idea, 2-3 sentences
### Task 6: Write section 2 cluster 1 (Concatenative)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 2 cluster 1
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 2 cluster 1

- [ ] **Step 1: Replace the cluster 1 stub with 6 entries**

@@ -370,7 +370,7 @@ For each:
### Task 7: Write section 2 cluster 2 (Array)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 2 cluster 2
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 2 cluster 2

- [ ] **Step 1: Replace the cluster 2 stub with 4 entries**

@@ -384,7 +384,7 @@ For each:
### Task 8: Write section 2 cluster 3 (Intent-Mapping)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 2 cluster 3
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 2 cluster 3

- [ ] **Step 1: Replace the cluster 3 stub with 4 entries**

@@ -398,7 +398,7 @@ For each:
### Task 9: Write section 2 cluster 4 (Meta-Tooling DSLs)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 2 cluster 4
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 2 cluster 4

- [ ] **Step 1: Replace the cluster 4 stub with 4 entries**

@@ -412,7 +412,7 @@ For each:
### Task 10: Write section 2 cluster 5 (SSDL)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 2 cluster 5
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 2 cluster 5

- [ ] **Step 1: Replace the cluster 5 stub with the 6 SSDL primitives + 7 modifiers**

@@ -423,7 +423,7 @@ Include the table from the SSDL digest (6 primitives + 7 modifiers).
### Task 11: Write section 2 cluster 6 (Command Palette)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 2 cluster 6
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 2 cluster 6

- [ ] **Step 1: Replace the cluster 6 stub with the 33 existing commands + the "Everything" mode future work**

@@ -434,7 +434,7 @@ List 5-10 representative commands by category (View, Edit, Project, Layout, Them
### Task 12: Write section 2 cluster 7 (Result)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 2 cluster 7
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 2 cluster 7

- [ ] **Step 1: Replace the cluster 7 stub with the `Result[T]` + `ErrorInfo` pattern**

@@ -445,7 +445,7 @@ Include the 12 ErrorKind values and the `Result[T]` dataclass signature.
### Task 13: Write section 3 (the grammar)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 3
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 3

- [ ] **Step 1: Replace the section 3 stub with the 14-primitive grammar table + ambiguity flags + precedence rules**

@@ -466,15 +466,15 @@ Per spec §3.3:

Run:
```bash
wc -l docs/ideation/2026-06-12-intent-based-scripting-languages.md
grep -c "^## " docs/ideation/2026-06-12-intent-based-scripting-languages.md
wc -l conductor/tracks/intent_dsl_survey_20260612/report.md
grep -c "^## " conductor/tracks/intent_dsl_survey_20260612/report.md
```
Expected: ~1500-2500 lines, 8 `## ` headings (sections 1-7 + Appendix).

- [ ] **Step 2: Phase 2 checkpoint commit**

```bash
git add docs/ideation/2026-06-12-intent-based-scripting-languages.md
git add conductor/tracks/intent_dsl_survey_20260612/report.md
git commit -m "docs(ideation): Write intent_dsl_survey_20260612 sections 1-3

Section 1: the intent-based design philosophy (4 anchor claims)
@@ -509,7 +509,7 @@ Commit the state update.
### Task 15: Write section 4 Tier 1 (math)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 4 Tier 1
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 4 Tier 1

- [ ] **Step 1: Replace the Tier 1 stub with the 10 math verbs**

@@ -529,7 +529,7 @@ Tier 1 SSDL shape tags: most are `[I]` (single instruction) since they're scalar
### Task 16: Write section 4 Tier 2 (data pipeline)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 4 Tier 2
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 4 Tier 2

- [ ] **Step 1: Replace the Tier 2 stub with the 12 data-pipeline verbs**

@@ -550,7 +550,7 @@ Tier 2 SSDL shape tags: `scan` is `[I]`, `filter`/`map` are `===>` (codepath) or
### Task 17: Write section 4 Tier 3 (shell)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 4 Tier 3
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 4 Tier 3

- [ ] **Step 1: Replace the Tier 3 stub with the 10 shell verbs**

@@ -563,7 +563,7 @@ Tier 3 SSDL shape tags: most are `[I]` (single instruction); `wait`/`poll` are `
### Task 18: Write section 4 Tier 4 (AI-fuzzing tolerance)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 4 Tier 4
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 4 Tier 4

- [ ] **Step 1: Replace the Tier 4 stub with the 8 AI-fuzzing-tolerance verbs**

@@ -584,7 +584,7 @@ Tier 4 SSDL shape tags: most are `[I]` (single instruction); `try { ... } recove
### Task 19: Write section 5 (hardware mapping)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 5
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 5

- [ ] **Step 1: Replace the section 5 stub with the 4 anchor claims**

@@ -597,7 +597,7 @@ Per spec §3.5, the 4 anchor claims:
### Task 20: Write section 6 (AI-agent properties)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 6
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 6

- [ ] **Step 1: Replace the section 6 stub with the 10 claims**

@@ -616,7 +616,7 @@ Per spec §3.6, the 10 claims. Each claim is 1-2 paragraphs. Cite the specific p
### Task 21: Write section 7 (open questions)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` section 7
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` section 7

- [ ] **Step 1: Replace the section 7 stub with 8 open questions + the placeholder connection block**

@@ -637,7 +637,7 @@ The 8 questions from spec §3.7:
### Task 22: Write the Appendix (bibliography)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` Appendix
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` Appendix

- [ ] **Step 1: Replace the Appendix stub with the full bibliography**
@@ -673,9 +673,9 @@ For external references (per spec §13.3):

Run:
```bash
wc -l docs/ideation/2026-06-12-intent-based-scripting-languages.md
grep -c "^## " docs/ideation/2026-06-12-intent-based-scripting-languages.md
grep -c "^### " docs/ideation/2026-06-12-intent-based-scripting-languages.md
wc -l conductor/tracks/intent_dsl_survey_20260612/report.md
grep -c "^## " conductor/tracks/intent_dsl_survey_20260612/report.md
grep -c "^### " conductor/tracks/intent_dsl_survey_20260612/report.md
```
Expected: ~3500-5000 lines, 8 `## ` headings (sections 1-7 + Appendix), ~30-50 `### ` headings (clusters + tiers + claims + sub-sections).

@@ -695,7 +695,7 @@ For each criterion, confirm:
- [ ] **Step 3: Phase 3 checkpoint commit**

```bash
git add docs/ideation/2026-06-12-intent-based-scripting-languages.md
git add conductor/tracks/intent_dsl_survey_20260612/report.md
git commit -m "docs(ideation): Write intent_dsl_survey_20260612 sections 4-7 + Appendix

Section 4: the 4-tier vocab (~40 verbs across T1 math, T2 data
@@ -748,13 +748,13 @@ Commit the state update.
### Task 24: Self-review per the brainstorming skill

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` (fix any issues found)
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md` (fix any issues found)

- [ ] **Step 1: Placeholder scan**

Run:
```bash
grep -nE "TBD|TODO|FIXME|XXX|\?\?\?" docs/ideation/2026-06-12-intent-based-scripting-languages.md
grep -nE "TBD|TODO|FIXME|XXX|\?\?\?" conductor/tracks/intent_dsl_survey_20260612/report.md
```
Expected: no matches. If any match, fix them inline.
@@ -796,7 +796,7 @@ Fix any issues inline.
- [ ] **Step 5: Commit any fixes (if any were made)**

```bash
git add docs/ideation/2026-06-12-intent-based-scripting-languages.md
git add conductor/tracks/intent_dsl_survey_20260612/report.md
git commit -m "docs(ideation): Self-review fixes for intent_dsl_survey_20260612

[Describe the fixes per the self-review pass]"
@@ -816,7 +816,7 @@ Update `state.toml`:
- [ ] **Step 1: Show the report to the user**

Tell the user:
> "Report is ready for your review at `docs/ideation/2026-06-12-intent-based-scripting-languages.md`. ~[N] lines, 7 sections, 4-tier vocab with ~40 verbs, 8-cluster prior art survey, 4 hardware anchor claims, 10 AI-agent properties, 8 open questions for the follow-up interpreter prototype. Please review and let me know if you want any changes."
> "Report is ready for your review at `conductor/tracks/intent_dsl_survey_20260612/report.md`. ~[N] lines, 7 sections, 4-tier vocab with ~40 verbs, 8-cluster prior art survey, 4 hardware anchor claims, 10 AI-agent properties, 8 open questions for the follow-up interpreter prototype. Please review and let me know if you want any changes."

- [ ] **Step 2: Wait for user response**

@@ -827,7 +827,7 @@ The user will either:
### Task 26: Apply user feedback (if any)

**Files:**
- Modify: `docs/ideation/2026-06-12-intent-based-scripting-languages.md`
- Modify: `conductor/tracks/intent_dsl_survey_20260612/report.md`

- [ ] **Step 1: If user requested changes, apply them**

@@ -836,7 +836,7 @@ Make the requested edits to the report.
- [ ] **Step 2: Commit the changes**

```bash
git add docs/ideation/2026-06-12-intent-based-scripting-languages.md
git add conductor/tracks/intent_dsl_survey_20260612/report.md
git commit -m "docs(ideation): Apply user review feedback to intent_dsl_survey_20260612

[Describe the changes per the user's feedback]"
@@ -844,7 +844,7 @@ git commit -m "docs(ideation): Apply user review feedback to intent_dsl_survey_2

- [ ] **Step 3: Show the updated report to the user for re-review**

> "Applied your feedback in [section/commit reference]. Updated report is at `docs/ideation/2026-06-12-intent-based-scripting-languages.md`. Please confirm the changes are good."
> "Applied your feedback in [section/commit reference]. Updated report is at `conductor/tracks/intent_dsl_survey_20260612/report.md`. Please confirm the changes are good."

Loop back to Task 25 Step 1 if user wants more changes.

@@ -864,7 +864,7 @@ For each criterion, confirm ✓. Update `state.toml` `[verification]` table.
If the report was already committed in Task 26, skip this commit. Otherwise:

```bash
git add docs/ideation/2026-06-12-intent-based-scripting-languages.md
git add conductor/tracks/intent_dsl_survey_20260612/report.md
git commit -m "docs(ideation): Finalize intent_dsl_survey_20260612 report"
```

@@ -931,7 +931,7 @@ Add the new track at the top of the recently-completed section:
```
#### Track: Intent-Based Scripting Languages Survey `[COMPLETE 2026-06-12]`
*Link: [./tracks/intent_dsl_survey_20260612/](./tracks/intent_dsl_survey_20260612/), Spec: [./tracks/intent_dsl_survey_20260612/spec.md](./tracks/intent_dsl_survey_20260612/spec.md), Plan: [./tracks/intent_dsl_survey_20260612/plan.md](./tracks/intent_dsl_survey_20260612/plan.md)*
*Status: COMPLETE 2026-06-12. Report at docs/ideation/2026-06-12-intent-based-scripting-languages.md (~[N] lines, 7 sections, 4-tier vocab with ~40 verbs). Time-sensitive goal met: complete before nagent v2.2. Will be consumed by nagent v2.2 (Candidate #4) and future interpreter prototype (follow-up B).*
*Status: COMPLETE 2026-06-12. Report at conductor/tracks/intent_dsl_survey_20260612/report.md (~[N] lines, 7 sections, 4-tier vocab with ~40 verbs). Time-sensitive goal met: complete before nagent v2.2. Will be consumed by nagent v2.2 (Candidate #4) and future interpreter prototype (follow-up B).*
```

- [ ] **Step 3: Commit the tracks.md update**
@@ -988,7 +988,7 @@ All 21 spec requirements are covered.

**3. Type consistency:** The plan uses consistent terminology throughout:
- Track name: `intent_dsl_survey_20260612` (consistent across all tasks)
- Report path: `docs/ideation/2026-06-12-intent-based-scripting-languages.md` (consistent)
- Report path: `conductor/tracks/intent_dsl_survey_20260612/report.md` (consistent)
- Spec path: `conductor/tracks/intent_dsl_survey_20260612/spec.md` (consistent)
- State path: `conductor/tracks/intent_dsl_survey_20260612/state.toml` (consistent)
- Verbs are consistent with the spec's 4-tier list
@@ -0,0 +1,604 @@
# Intent-Based Scripting Languages

**Track:** `intent_dsl_survey_20260612` (initialized 2026-06-12)
**Date:** 2026-06-12
**Location:** `conductor/tracks/intent_dsl_survey_20260612/report.md` (this file; moved from `docs/ideation/` per user instruction, because the report is too closely tied to the track to live in the general ideation folder)
**Author:** Tier 1 Orchestrator (sections 1, 3, 4, 5, 6, 7, Appendix); Tier 2 sub-agents (section 2 clusters 0-4, with research sub-reports at `research/cluster_*.md`)
**Status:** Draft for self-review (phase 3 of 4)

> **What this is.** A survey of intent-based scripting languages as a design philosophy, plus a proposed vocabulary (~40 verbs across 4 tiers) for a Meta-Tooling-facing intent DSL. The report is the foundation document for the user's nagent v2.2 (its "Future-Track Candidate #4" section) and for the future interpreter prototype (follow-up B track).
>
> **What this is NOT.** Not an interpreter, not a bridge script, not Application-side function-calling, and not an XML/JSON record format. The DSL is Meta-Tooling-side per `docs/guide_meta_boundary.md`: it is the format that external agents (Gemini CLI, OpenCode) emit when invoking `mcp_client.py` tools. The Application's provider-native function-calling stays unchanged.

---

## 1. The "Intent-Based" Design Philosophy

The DSL is grounded in four anchor claims. Each claim has a philosophical home and a specific design consequence for the vocabulary and grammar.
### 1.1 Claim 1 — Intent-based means the user's words are declarative intent, not imperative commands

Jofito (per its 2026 README update) calls itself an **"intent mapping engine"**: the user writes declarative intent (e.g., "find all pictures, filter out JPEGs, print the list"), and Jofito decomposes that intent into platform-optimal operations. From the Jofito README: *"jofito is a 'write the optimization once, reap the benefits everywhere' system that takes what the user wants to accomplish (intent) as input and decomposes it into operations that make the most sense for the current system."* (`https://codeberg.org/jbruchon/jofito`)

The canonical Jofito example is `list = scandir("/path/here/", {filter !extension=jpg,jpeg}) : print(list)`, a single declarative expression that replaces `find . -type f | grep -v jpg | grep -v jpeg`. The DSL inherits this framing: the verbs in §4 are **intent verbs** (e.g., `scan` for "I want to read a source", `filter` for "I want to keep only what matches", `audit` for "I want to record what happened"), not imperative primitives.

This is the *philosophical* anchor for the DSL: the user says *what they want*, the verbs are the way to say it, and the bridge script and the MCP tools handle *how to do it*. The user's own math pseudocode (the `determinate`/`minor`/`matrix-transpose` snippets shared during spec review) operates at this declarative level: "here is the math, the verbs are the words."
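The sketch below illustrates that what/how split in Python, the surrounding project's language. It assumes an intent is plain data (an ordered tuple of verb stages) and that a separate executor table owns the *how*. The `Stage` type, the `EXECUTORS` table, and `run_intent` are hypothetical illustrations, not part of Jofito, the bridge script, or any MCP tool. `scan` reads an in-memory listing instead of the filesystem so the result is deterministic. Extension matching is assumed to be case-insensitive.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Stage:
    # Hypothetical: one stage of an intent is a verb name plus its arguments.
    verb: str
    args: dict


# Stand-in for a directory listing; a real executor would read the filesystem.
FAKE_DIR = ["a.jpg", "b.png", "c.JPEG", "notes.txt", "d.gif"]


def _scan(_items, source: str) -> list[str]:
    # "I want to read a source": ignores upstream data and returns the listing.
    return list(FAKE_DIR) if source == "/path/here/" else []


def _filter_not_ext(items: Iterable[str], exclude: tuple[str, ...]) -> list[str]:
    # "I want to keep only what matches": drops names whose extension is excluded.
    # Assumption: extension comparison is case-insensitive.
    excluded = {e.lower() for e in exclude}
    return [n for n in items if n.rsplit(".", 1)[-1].lower() not in excluded]


def _print(items: Iterable[str], sink: list) -> list[str]:
    # Hypothetical output verb: appends to a sink so the result is testable.
    sink.extend(items)
    return list(items)


# The *how*: verb names map to platform-specific implementations.
EXECUTORS: dict[str, Callable] = {"scan": _scan, "filter": _filter_not_ext, "print": _print}


def run_intent(stages: tuple[Stage, ...]):
    # Threads each stage's output into the next stage, like the `->` chain.
    data = None
    for stage in stages:
        data = EXECUTORS[stage.verb](data, **stage.args)
    return data


out: list[str] = []
# The *what*: roughly scandir("/path/here/", {filter !extension=jpg,jpeg}) : print(list)
intent = (
    Stage("scan", {"source": "/path/here/"}),
    Stage("filter", {"exclude": ("jpg", "jpeg")}),
    Stage("print", {"sink": out}),
)
result = run_intent(intent)
```

Swapping `EXECUTORS` for another platform's implementations changes *how* the work is done without touching the `intent` tuple. That is the "write the optimization once" idea in miniature.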
### 1.2 Claim 2 — The hardware is the truth

The verbs must map to actual hardware/software stages, not abstract commands. The Onat/Lottes 2-register model (per `C:\projects\forth\bootslop\references\kyra_in-depth.md` and `X.com - Onat & Lottes Interaction 1.png.ocr.md`) supplies the concrete hardware model the DSL is mapped onto:

- **2-register stack (RAX/RDX)**: the DSL's `->` chain *maps* to data passed in RAX. Each verb in the chain is a "word" in Onat's sense: no args, no returns. (The X.com thread at `X.com - Onat & Lottes Interaction 1.png.ocr.md:80-86` quotes Lottes: "I laugh when people say C is like assembly, they were missing what we did in assembly back then, which was all registers and globals and gotos, no stacks".)
- **Magenta pipe `|` (KYRA) → our `->`**: same definition-boundary semantics, retargeted to data flow.
- **Basic blocks `[ ]` (KYRA) → our `[ ]`**: compilation units; the parser produces a `[ ]` block per `->`-delimited stage.
- **Lambdas `{ }` (KYRA) → our `arena { }`**: arena-scoped blocks; the contents are pre-scattered into tape-drive regions (per the X.com thread at lines 55-61, where Onat describes Lottes's "common arguments pushed onto the tape using store duplication when they are known... so it's preemptive scatter, so later at call time there is no argument gather").

The verbs are not arbitrary. Each Tier 2 verb (data pipeline) and Tier 3 verb (shell) has a direct hardware mapping; this is what makes the verbs *fast* on the targeted hardware.
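A toy Python model of the 2-register calling discipline, assuming RAX holds the top of a two-item stack and RDX the item beneath it. The names `Regs`, `lit`, `add`, `xchg`, and `run` are hypothetical; the sketch shows words that take no arguments and return nothing, communicating only through the register file, and is not KYRA's code generator.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Regs:
    """Toy register file: RAX is the top of a two-item stack, RDX the item below."""
    rax: int = 0
    rdx: int = 0


Word = Callable[[Regs], None]  # a word takes no arguments and returns nothing


def lit(n: int) -> Word:
    def word(r: Regs) -> None:
        r.rdx, r.rax = r.rax, n  # old top moves to RDX; the literal becomes the new top
    return word


def add(r: Regs) -> None:
    r.rax = r.rax + r.rdx  # result stays in RAX, ready for the next word


def xchg(r: Regs) -> None:
    r.rax, r.rdx = r.rdx, r.rax  # stands in for `xchg rax, rdx` at a pipe boundary


def run(*words: Word) -> Regs:
    r = Regs()
    for w in words:  # `w1 -> w2 -> w3`: every word reads and writes the same registers
        w(r)
    return r
```

With these definitions, `run(lit(2), lit(3), add).rax` is `5`: each word leaves its result in RAX for the next one, and nothing is passed as an explicit argument.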
### 1.3 Claim 3 — The pipeline is immediate-mode

Per John O'Donnell's IMGUI essay (`https://johno.se/book/imgui.html`): *"Widgets, logically, change from being objects to being method invocations."* The pipeline `scan -> filter -> print` is not a Pipeline object with state; it is a sequence of method calls. Once execution ends, the pipeline's state is gone, and the next invocation starts from scratch.

This is the *paradigm* anchor for the DSL. It means:

- The parser doesn't need to track pipeline state across executions; each invocation is independent.
- The `->` chain has no "pipeline object" you can query, name, or pass around. The only way to "name" a chain is to wrap it in a function (`determinate(m, row) -> Scalar { ... }`).
- Verbs exist *only* when called. There is no implicit verb inventory. (This is why DSL verbs can be surfaced in the Command Palette's "Everything" mode through a search across *text*, not across a *registry of pipeline objects*.)

O'Donnell's MVC essay (`https://johno.se/book/mvc.html`) extends this: *"Writes to Model are formalized through the addition of IEventTarget. This is a pure virtual interface that defines all possible state changes / events on a system wide level."* The DSL's `sandbox` verb is the IEventTarget boundary; the `audit` verb is the IEventTarget itself (see §6 Claim 9 and Claim 10).
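A minimal Python sketch of the two ideas in this section, with hypothetical names throughout (`IEventTarget.on_event`, `AuditLog`, `scan`, `filter_out`, `emit`, `non_jpeg_report`). Verbs are plain function calls with no pipeline object, a chain is "named" only by wrapping it in a function, and the single write path goes through one event-target interface in the spirit of O'Donnell's IEventTarget (this is not his C++ interface).

```python
from typing import Iterable, Protocol


class IEventTarget(Protocol):
    """Single interface through which every state change must pass."""
    def on_event(self, kind: str, payload: dict) -> None: ...


class AuditLog:
    """Hypothetical IEventTarget: records each write as data."""
    def __init__(self) -> None:
        self.events: list[tuple[str, dict]] = []

    def on_event(self, kind: str, payload: dict) -> None:
        self.events.append((kind, payload))


# Verbs are plain functions: reads are free, and no Pipeline object holds state.
def scan(items: Iterable[str]) -> list[str]:
    return list(items)


def filter_out(items: list[str], suffix: str) -> list[str]:
    return [s for s in items if not s.endswith(suffix)]


def emit(items: list[str], target: IEventTarget) -> None:
    target.on_event("print", {"items": items})  # the only write goes through the target


# "Naming" a chain means wrapping it in a function; each call starts from scratch.
def non_jpeg_report(files: Iterable[str], target: IEventTarget) -> None:
    emit(filter_out(scan(files), ".jpg"), target)
```

Calling `non_jpeg_report(["a.png", "b.jpg"], log)` with an `AuditLog` leaves exactly one recorded event, `("print", {"items": ["a.png"]})`; nothing else in the program changes.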
### 1.4 Claim 4 — The vocabulary IS the user surface

CoSy (per `https://cosy.com/CoSy/Simplicity.html`): *"CoSy is a TimeStamped notebook/log created as an open vocabulary in Forth."* And: *"an extensive vocabulary evolved from APL via K, mainly slicing and dicing, searching & replacing, and applying verbs to each item in lists."*

For the DSL, the **vocabulary** is the user surface: not the syntax, not the parser, not the runtime. For AI agents that emit the DSL, the vocab is the API. A model that knows the ~40 verbs in §4 and the 14 grammar primitives in §3 can express any intent that the DSL supports. There is no separate "API documentation"; the verbs ARE the API.

This is why the report devotes so much space to the vocab (§4) and so little to the syntax (§3). The syntax is trivial (RPN with a few delimiters); the vocabulary is the substance.
### 1.5 The four claims together

The four claims are not independent; they compose:

- Claim 1 (intent-mapping) → the user expresses what they want; the verbs are the vocabulary.
- Claim 2 (hardware is the truth) → the verbs map to real data-oriented pipeline stages.
- Claim 3 (immediate-mode) → the verbs are method calls, not stateful objects; pipelines have no persistent state.
- Claim 4 (vocabulary is the user surface) → the ~40-verb vocab is the API; the syntax is trivial.

The composition: a user expresses intent (Claim 1) using a verb (Claim 4) that maps to a hardware stage (Claim 2) in a single per-frame composition (Claim 3). The rest of the report works out this composition in detail.

---
## 2. Prior Art Survey (8 Clusters)

This section surveys the design lineage across 8 clusters (numbered 0-7). Each cluster opens with a "cluster claim" (what the DSL inherits from the cluster as a whole), then gives a short note per entry, then specific "take" notes that §3, §4, §5, and §6 reference.

The detailed analysis for Clusters 0-4 lives in the research sub-reports at `research/cluster_*.md` (relative to this file). This section is the executive summary; the sub-reports are the evidence.
### Cluster 0 — Immediate-Mode Paradigm (philosophical anchor)

**Cluster claim.** The DSL's *paradigm* (verbs as method calls, no persistent state, reads free, writes formalized) is the direct application of John O'Donnell's IMGUI/MVC framework to a Meta-Tooling context. (Per the full sub-report at `research/cluster_0_odonnell.md`.)

**Entry: John O'Donnell: IMGUI / The Pitch / MVC / IM-MVC roadmap.** `https://johno.se/book/imgui.html`, `https://johno.se/book/pitch.html`, `https://johno.se/book/immvc.html`, `https://johno.se/book/mvc.html`. Four interconnected pages laying out a unified paradigm: visualization is not inherently stateful; widgets are method invocations, not objects; the "reads are free, writes are formalized" invariant is enforced via a single IEventTarget interface; the View must not expose scene-graph abstractions.

**Take bullets (referenced by §5, §6):**

- *Anchor Claim 3 (IEventTarget as single event interface for all state changes):* *"Experience dictates that there only be a single IEventTarget interface that is responsible for all 'system events'."* (`mvc.html`, "Why only a single event interface" section)
- *Anchor Claim 4 (View must not expose scene-graph abstractions):* *"The corresponding interface should be of the form: `view::drawMesh(mesh, transform, anyOtherRenderState);`"* (`mvc.html`, "View" section)
- *"Writes to Model are formalized through the addition of IEventTarget. This is a pure virtual interface that defines all possible state changes / events on a system wide level."* (`mvc.html`, "Writing to Model state" section)
- *"What is a non-stateful view? Basically it is a procedural interface (as opposed to a collection of objects with methods), in essence very much to what DirectX 9 is."* (`pitch.html`, "MVC revisited" section)
- *"However, due to the rapide advances of GPU based rendering over the past 10+ years, this premise no longer holds."* (`pitch.html`, "However!" section)
- The 800,000-vertex single-draw-call empirical result at Jungle Peak on GeForce 6 hardware (`pitch.html`, batch rendering section).
### Cluster 1 — Concatenative (Forth family)

**Cluster claim.** The DSL's *syntax* (postfix RPN, stack-passed arguments, no AST object) is the Forth tradition as refined by Onat Türkçüoğlu's KYRA (2-register stack, magenta pipe as definition boundary, basic blocks and lambdas, preemptive scatter) and Timothy Lottes's x68/5th (32-bit instruction granularity, annotation overlay, "register file as aliased global namespace"). Bob Armstrong's CoSy is the user's-vocabulary-as-the-surface model. (Per the full sub-report at `research/cluster_1_concatenative.md`.)

**Entries:**

- **Forth** (Chuck Moore, 1970). The canonical RPN stack-passing language; the colon-word/semicolon definition pattern; threaded code compilation; self-hosting via meta-compilation. `https://en.wikipedia.org/wiki/Forth_(programming_language)`. **Take:** the pure concatenative property, *"concatenation of two programs denotes the composition of the two functions they denote"* (Joy's formalization), is the foundational claim. The DSL inherits the postfix syntax and the rejection of named lambda parameters (parameters are unnamed; they live on the stack).
- **ColorForth** (Chuck Moore, ~1990s). Color encodes semantics (define/compile/execute/variable). `https://en.wikipedia.org/wiki/ColorForth`. **Take:** the idea that visual/structural encoding can replace keywords, and the direct-mapped editor.
- **KYRA / VAMP** (Onat Türkçüoğlu, SVFIG 2025). 2-register stack (RAX/RDX); magenta pipe `|` as definition boundary emitting `RET + xchg rax, rdx`; basic blocks `[ ]` and lambdas `{ }` as compilation units; preemptive scatter. `C:\projects\forth\bootslop\references\kyra_in-depth.md`, `forth_day_2020_in-depth.md`. **Take:** the bracket operators (`[ ]`, `{ }`) and the arena-scoped blocks (`arena { }`).
- **x68 / 5th / "Ear" + "Toe"** (Timothy Lottes, 2007-2026). 32-bit instruction granularity; annotation overlay; folded interpreter; "register file as aliased global namespace" (X.com thread, lines 95-103). `C:\projects\forth\bootslop\references\neokineogfx_in-depth.md`, `blog_in-depth.md`. **Take:** the 32-bit token encoding, the annotation overlay pattern, the folded-interpreter optimization.
- **Joy** (Manfred von Thun, 2001-2003). Purely functional concatenative; quotations as first-class values; combinator library (`map`, `filter`, `fold`, `binrec`, `primrec`, `linrec`). `https://en.wikipedia.org/wiki/Joy_(programming_language)`. **Take:** the quotation-as-first-class-value concept and the combinator library as the model for Tier 2 verbs.
- **CoSy** (Bob Armstrong, ongoing). TimeStamped notebook/log in Forth; all nouns are lists/trees with 3-cell headers `(Type Count refCount)`; modulo indexing; "extensive vocabulary evolved from APL via K." `https://cosy.com/CoSy/Simplicity.html`, `https://cosy.com/4thCoSy/`. **Take:** the open-vocabulary culture; the modulo indexing (forgiving of off-by-one AI errors); the 3-cell header as a universal data structure.

**Section 5 grounding (per the cluster 1 synthesis).** The DSL's `->` pipeline, `[ ]`/`{ }` blocks, `arena { }` memory model, `scatter`/`gather` verbs, `map`/`filter`/`fold` combinators, modulo indexing, and the "no AST object" parsing strategy all have direct concatenative lineage. See `conductor/tracks/intent_dsl_survey_20260612/research/cluster_1_concatenative.md` §"Synthesis for Section 5" for the verb-by-verb mapping table.
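The concatenative property quoted above can be checked with a minimal Python sketch, assuming a program is a list of stack-to-stack functions. The names `push`, `dup`, `add`, `mul`, and `run` are illustrative, not Joy's implementation; the point is that running the concatenation `p + q` gives the same result as running `q` on the output of `p`.

```python
from functools import reduce
from typing import Callable

Stack = tuple
Word = Callable[[Stack], Stack]


def push(x) -> Word:
    return lambda s: s + (x,)


def dup(s: Stack) -> Stack:
    return s + (s[-1],)


def add(s: Stack) -> Stack:
    *rest, a, b = s
    return (*rest, a + b)


def mul(s: Stack) -> Stack:
    *rest, a, b = s
    return (*rest, a * b)


def run(program: list[Word], stack: Stack = ()) -> Stack:
    # Apply words left to right: the program denotes the composition of its words.
    return reduce(lambda s, w: w(s), program, stack)


square = [dup, mul]          # a "definition" is just a named list of words
p = [push(3), push(4), add]  # 3 4 +
q = square                   # dup *
```

Here `run(p + q)` and `run(q, run(p))` both yield `(49,)`: `3 4 +` leaves `7`, and `dup *` squares it. Joining program text is the same operation as composing the functions, which is why no AST object is needed.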
### Cluster 2 — Array Languages (APL lineage)

**Cluster claim.** The DSL's *data model* (array as universal type, every verb vectorizes, multi-dimensional indexing) is the APL tradition as refined by K (ASCII-only with overloading), BQN (clean modern semantics with function trains), and Uiua (stack-based execution). The DSL inherits the *philosophy* (succinct expression of algorithms) but uses an ASCII-compatible representation rather than APL's custom character set. (Per the full sub-report at `research/cluster_2_array.md`.)

**Entries:**

- **APL** (Kenneth Iverson, 1962; Turing Award 1979). The foundational array language; array as universal type; every glyph is a function; right-to-left evaluation with no precedence. `https://en.wikipedia.org/wiki/APL_(programming_language)`, `https://www.dyalog.com/`. **Take:** the array-as-universal-type principle and the right-to-left evaluation model.
- **K / q** (Arthur Whitney, KX Systems, 1993). ASCII-only with heavy context-sensitive overloading; first-class functions borrowed from Scheme; foundation of the kdb+ in-memory columnar database. `https://en.wikipedia.org/wiki/K_(programming_language)`, `https://kx.com/`. **Take:** the context-sensitive operator philosophy and first-class functions.
- **BQN** (Marshall Lochbaum, 2020). Modernized APL with clean semantics; context-free grammar; function trains. `https://mlochbaum.github.io/BQN/`. **Take:** the train composition pattern as the most expressive tacit mechanism in the family.
- **Uiua** (Kai Schmidt, 2023). Stack-based execution; modern open-source development; online Pad for onboarding. `https://www.uiua.org/`, `https://github.com/uiua-lang/uiua`. **Take:** the stack-based execution model as a viable alternative to named parameters, and the modern onboarding-UX model.

**Section 5 grounding (per the cluster 2 synthesis).** The DSL's `for x .. n` (mapping to APL's `⍳N` + reduce, BQN's `↕N`, K's `!N`) and `result[row, col]` (mapping to APL's multi-dim indexing, BQN's `⊏`, K's `@`) inherit directly from this cluster. See `conductor/tracks/intent_dsl_survey_20260612/research/cluster_2_array.md` §"Synthesis for the DSL" for the verb-by-verb mapping table.
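A minimal Python sketch of the two mappings above, with illustrative names (`iota`, `each`, `at`). It assumes index origin 0, matching the DSL's `for x .. n` over `[0, n)`; APL's default `⎕IO` is 1, so APL's `⍳N` counts from 1 unless `⎕IO←0`.

```python
from functools import reduce
from operator import add


def iota(n: int) -> list[int]:
    """Index generator with origin 0 (APL's ⍳N under ⎕IO←0, BQN's ↕N, K's !N)."""
    return list(range(n))


def each(f, xs):
    """Apply a verb to every item, as array-language verbs do implicitly."""
    return [f(x) for x in xs]


def at(table, row: int, col: int):
    """Comma-form indexing `table[row, col]`, emulated on nested Python lists."""
    return table[row][col]


# `for x .. n` followed by a sum is an index vector plus a reduction:
squares_total = reduce(add, each(lambda x: x * x, iota(5)), 0)
grid = [[1, 2, 3], [4, 5, 6]]
```

`squares_total` is `30` (0 + 1 + 4 + 9 + 16), and `at(grid, 1, 2)` is `6`.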
### Cluster 3 — Intent-Mapping

**Cluster claim.** The DSL's *use case* (a compact, intent-expressive scripting language that maps user intent to platform-optimal operations) is the Jofito tradition as the user has been exploring it. The pipe-coalescing optimization (find/grep/sort/unique collapse into one in-memory script) is the runtime efficiency claim. The nagent tag protocol is *mentioned and explicitly rejected* (no XML angle brackets), but the *structured-protocol idea* is retained. (Per the full sub-report at `research/cluster_3_intent_mapping.md`.)

**Entries:**

- **Jofito** (Jody Bruchon, 2023-2026). "Intent mapping engine" (per 2026 README update); arena allocation; leader/chaser thread model; pipe-coalescing. `https://codeberg.org/jbruchon/jofito`, `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt`. **Take:** the "intent mapping engine" framing is the DSL's *use case*; the leader/chaser pattern is the *implementation hint*; the arena allocation is the *memory model*. (Specifically: the DSL's `scan -> filter -> print` chain is directly inspired by Jofito's `scandir(...) : filter : print` predicate chain.)
- **jq** (Stephen Dolan, 2012-). JSON-path filter language; the `|` pipe operator (replaced by `->` in the DSL). `https://en.wikipedia.org/wiki/Jq_(programming_language)`, `https://jqlang.org/`. **Take:** the filter-as-expression style; `select(condition)`, `map`, `reduce`, `unique` as Tier 2 verb precedents.
- **nagent's tag protocol** (per `conductor/tracks/nagent_review_20260608/agent_review_v2_1_20260612.md:50`, `decisions.md:50`). XML-ish self-closing tags (`<nagent-read path="..."/>`). **TAKEN:** the structured-protocol idea (named operation with typed attributes; LLM-emit-able; self-delimiting). **REJECTED:** the XML angle-bracket notation, per the user's explicit instruction: *"ignore its record formats as they problably will be less xml/json based as I don't like them"* (`decisions.md:50`). The DSL must use a different notation that preserves the structured-protocol properties.
- **WebAssembly** (W3C, 2017-). Linear memory; sectioned binary format; structured control flow. `https://en.wikipedia.org/wiki/WebAssembly`. **Take:** the linear memory model is the modern reference for the "tape drive" argument-passing semantics that grounds the DSL's Tier 2 verbs. The streaming-parse design suggests a parsing strategy where verb names and signatures are validated early (cheap) and arguments are parsed on demand (deferred).

**Section 4 grounding (per the cluster 3 synthesis).** Each Tier 2 verb cites Jofito (for `scan`, `filter`, `arena`, `scatter`, `gather`, `pipe`) or jq (for `select`, `map`, `fold`, `sort`, `dedupe`, `group`); each Tier 3 verb cites either nagent's structured-protocol idea (for `read`, `edit`, `test`, `discover`) or Jofito's tool-replacement model (for `glob`, `exec`, `run`, `mcp`). See `conductor/tracks/intent_dsl_survey_20260612/research/cluster_3_intent_mapping.md` §"Synthesis for the DSL" for the verb-by-verb mapping table.
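A minimal Python sketch of jq-style Tier 2 verbs and of pipe coalescing, assuming each stage is a function from an iterable to an iterable. The names `select`, `map_` (trailing underscore to avoid shadowing the builtin), `dedupe`, and `pipe` are hypothetical stand-ins, not jq's or Jofito's implementations. Because the stages are generators, items flow through the whole chain in one in-memory pass; only a stage that must see everything, such as `sorted`, materializes its input.

```python
from typing import Callable, Iterable, Iterator


def select(pred: Callable) -> Callable[[Iterable], Iterator]:
    """jq's select(cond): keep only items matching the predicate."""
    return lambda xs: (x for x in xs if pred(x))


def map_(f: Callable) -> Callable[[Iterable], Iterator]:
    """jq's map(f): transform every item."""
    return lambda xs: (f(x) for x in xs)


def dedupe(xs: Iterable) -> Iterator:
    """Like jq's unique, but order-preserving and streaming."""
    seen = set()
    for x in xs:
        if x not in seen:
            seen.add(x)
            yield x


def pipe(source: Iterable, *stages: Callable[[Iterable], Iterable]) -> list:
    # Chaining generators fuses the stages: no intermediate lists or processes.
    data = source
    for stage in stages:
        data = stage(data)
    return list(data)
```

For example, `pipe(["b.png", "a.jpg", "b.png", "c.txt"], select(lambda p: not p.endswith(".jpg")), map_(str.upper), dedupe, sorted)` returns `["B.PNG", "C.TXT"]`.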
### Cluster 4 — Meta-Tooling DSLs and Agent-Facing Languages

**Cluster claim.** The DSL is *not the first* agent-facing language. The existing `mcp_dsl_20260606` placeholder, nagent's "Bridge DSL" idea, OpenAI's function-calling schema, and Anthropic's tool-use schema are the prior art. The DSL learns from all four and takes a different notation (per the user's XML/JSON rejection) but keeps the same structural properties (compact, structured, LLM-emit-able). (Per the full sub-report at `research/cluster_4_meta_tooling_dsls.md`.)

**Entries:**

- **`mcp_dsl_20260606`** (Manual Slop placeholder; per `conductor/tracks/mcp_architecture_refactor_20260606/spec.md` §12.1 and `nagent_review_20260608/metadata.json:28`). APL/K/CoSy-inspired per-MCP compact dialect. The closest project-internal reference. **Take:** the per-MCP grammar organization; the 8x token-reduction target (80 → 10 tokens); the JSON path stays (backward compat); the DSL is opt-in per MCP.
- **nagent's Bridge DSL idea** (per `nagent_takeaways_20260608.md` lines 216-230). The bridge between external agents and actual `mcp_client.py` tool calls. **Take:** the Application's function-calling stays; the bridge DSL is the format external agents emit.
- **OpenAI function-calling** (per `https://platform.openai.com/docs/guides/function-calling`). JSON Schema with `strict`, `required`, `additionalProperties: false`, `enum` constraints. The 5-step conversational loop. **Take:** schema rigor baseline; token cost grows with schema verbosity; the 8x reduction target; namespace grouping; fewer-capable-tools principle.
- **Anthropic tool-use** (per `https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/define-tools`). Flat structure with `name`, `description`, `input_schema`, `input_examples`; `strict` as guarantee; `tool_choice` control. **Take:** `input_examples` as a model for teaching the DSL; `tool_choice` maps to Tier 4 verb design (auto/any/forced); the flat structure is the right model for terseness.

**Section 4 grounding (per the cluster 4 synthesis).** The Tier 4 verbs map to the entries as follows: `fuzzy` ← nagent Bridge + MCP DSL; `try`/`recover` ← nagent Bridge + OpenAI; `sandbox` ← OpenAI + Anthropic; `audit` ← MCP DSL + nagent Bridge; `didyoumean` ← nagent Bridge + Anthropic; `span` ← MCP DSL + OpenAI; `offset` ← MCP DSL + OpenAI; `assumewide` ← OpenAI + Anthropic. See `conductor/tracks/intent_dsl_survey_20260612/research/cluster_4_meta_tooling_dsls.md` §"Synthesis for the DSL" for the full mapping.
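A hypothetical Python sketch of the bridge direction described for nagent's Bridge DSL: an agent emits a compact verb line, and the bridge expands it into the kind of structured call a JSON tool path consumes. The verb table, the tool names (`read_file`, `glob_files`), and the shell-style quoting are assumptions for illustration; they are not the project's `mcp_client.py` tools. The `READ_FILE_TOOL` dictionary shows the flat `name`/`description`/`input_schema` shape of an Anthropic tool definition.

```python
import shlex

# Hypothetical verb table: DSL verb -> (tool name, ordered parameter names).
# The tool names are placeholders, not the project's actual mcp_client tools.
VERBS = {
    "read": ("read_file", ["path"]),
    "glob": ("glob_files", ["pattern"]),
}

# A flat, Anthropic-style tool definition for the same tool.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read a text file from the project.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}


def bridge(line: str) -> dict:
    """Expand one compact verb line into a structured tool call."""
    verb, *args = shlex.split(line)
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    tool, params = VERBS[verb]
    if len(args) != len(params):
        raise ValueError(f"{verb} expects {len(params)} argument(s), got {len(args)}")
    return {"name": tool, "input": dict(zip(params, args))}
```

The compact line `read "src/foo.py"` carries the same information as the expanded dictionary, which is the kind of saving the 8x token-reduction target is about; the actual ratio depends on the tokenizer and on the schema.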
### Cluster 5 — SSDL Shape Primitives

**Cluster claim.** The DSL's verbs are annotated with **SSDL shape tags** (per `docs/reports/computational_shapes_ssdl_digest_20260608.md` §1) so the reader can see at a glance whether a verb is a single instruction, a codepath, a wide codepath, a codecycle, a wide codecycle, or a codecycle graph. This is the meta-vocabulary that lets the report describe a verb's *shape* in one token.

**The 6 SSDL primitives:**

| # | Shape | One-line definition | SSDL symbol |
|---|---|---|---|
| 1 | **Instruction** | A single unit of computation. Reads data, writes data, or both. | `[I]` |
| 2 | **Codepath** | A sequential list of instructions that *terminates*. No loops. | `===>` |
| 3 | **Wide codepath** | A codepath whose execution *causes* several other codepaths to occur simultaneously. | `===>W===>` |
| 4 | **Codecycle** | A circular structure: a codepath that *repeats* at its first instruction after its last. | `o==>` |
| 5 | **Wide codecycle** | Multiple codecycles performing the same task simultaneously. | `oo==>oo` |
| 6 | **Codecycle graph** | Multiple codecycles + the data they read and write. | `boxes + arrows` |

**The 7 modifiers:**

| SSDL symbol | Name | Meaning |
|---|---|---|
| `[T]` | terminator | The instruction that *ends* a codepath (return, exit, etc.) |
| `[B]` | branch | A point where control flow forks based on a condition |
| `[M]` | merge | A point where control flow re-converges |
| `[S]` | stateful | Marks an instruction that *mutates* persistent state |
| `[Q]` | query | Marks an instruction that reads persistent state |
| `[N]` | nil sentinel | A special value that satisfies "is this OK to use?" in all cases |
| `───` | data | A line representing data being read or written (not a codepath) |

**How the DSL uses SSDL tags.** Each verb in §4 has a "Shape" column with an SSDL tag. For example, `sum` is `[I]` (single instruction); `for x .. n` is `o==>` (codecycle); `arena { }` is a sub-codepath scope; `pipe` is `===>W===>` (wide codepath, since the chain can fan out); the entire DSL pipeline is a codecycle graph (multiple codecycles + the data they read and write). This lets the reader see the *shape* of a pipeline at a glance.
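A small Python sketch of how SSDL tags could be attached to verbs as data, using the symbols from the primitives table above. The `Shape` enum, the `VERB_SHAPES` entries (taken from the examples in the preceding paragraph), and `annotate` are illustrative, not part of the SSDL digest.

```python
from enum import Enum


class Shape(Enum):
    """The six SSDL shape primitives, valued by their SSDL symbol."""
    INSTRUCTION = "[I]"
    CODEPATH = "===>"
    WIDE_CODEPATH = "===>W===>"
    CODECYCLE = "o==>"
    WIDE_CODECYCLE = "oo==>oo"
    CODECYCLE_GRAPH = "boxes + arrows"


# Illustrative shape tags for verbs named in the examples above.
VERB_SHAPES = {
    "sum": Shape.INSTRUCTION,
    "for": Shape.CODECYCLE,
    "pipe": Shape.WIDE_CODEPATH,
}


def annotate(chain: list[str]) -> str:
    """Render a `->` chain with each verb's SSDL tag, e.g. for a report table."""
    return " -> ".join(f"{verb} {VERB_SHAPES[verb].value}" for verb in chain)
```

For example, `annotate(["for", "sum"])` renders `for o==> -> sum [I]`, a loop feeding a single instruction.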
### Cluster 6 — Project's Own Command DSL Precedents

**Cluster claim.** The DSL is a *richer* superset of the project's existing 33 Command Palette commands (per `docs/guide_command_palette.md` and `src/commands.py`). The "Everything" mode in the Command Palette (per `guide_command_palette.md` line 383: *"search across commands, files, symbols, history, settings"*) is a near-term use case where the DSL's verbs can be the underlying format. The Command Palette is the user's existing vocabulary instinct; the DSL formalizes and extends it.

**Representative commands by category** (the full 33 are in `docs/guide_command_palette.md`):

| Category | Command | Title | Action |
|---|---|---|---|
| AI | `reset_session` | Reset Session | `ai_client.reset_session()` + clears logs + `_handle_reset_session()` |
| AI | `clear_discussion` | Clear Discussion | Empties `app.discussion_history` |
| AI | `add_all_files_to_context` | Add All Files To Context | `app._add_all_files_to_context()` |
| View | `toggle_text_viewer` | Toggle Text Viewer | `_toggle_window(app, "Text Viewer")` |
| Tools | `trigger_hot_reload` | Hot Reload | `HotReloader.reload("src.gui_2", app)` |
| Layout | `save_workspace_profile` | Save Workspace Profile | Opens the save-profile modal |
| Theme | `cycle_theme` | Cycle Theme | Cycles through `["10x Dark", "ImGui Light", "NERV"]` |
| Help | `show_command_palette_help` | Show Command Palette Help | Loads `docs/Readme.md` into the Text Viewer |

**Take.** The DSL's verbs are a *richer* superset of these. Where the Command Palette has 33 imperative commands (each a function with side effects), the DSL's Tier 2 verbs are declarative ("I want to scan, filter, print"), and the Tier 4 verbs formalize the AI-fuzzing-tolerance aspects (audit, didyoumean) that the Command Palette does not address. The "Everything" mode in the Command Palette is the natural place where DSL verbs could appear as searchable entries.
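A hypothetical Python sketch of DSL verbs appearing next to commands in an "Everything"-style search. The `Entry` type, the sample entries, and the in-order subsequence matching rule are assumptions for illustration, not the Command Palette's actual search code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    kind: str   # "command", "verb", "file", ...: Everything mode mixes kinds
    key: str
    title: str


def fuzzy_match(query: str, text: str) -> bool:
    """True if every query character appears in `text` in order (case-insensitive)."""
    it = iter(text.lower())
    return all(ch in it for ch in query.lower())  # `in` consumes the iterator


def everything(query: str, entries: list[Entry]) -> list[Entry]:
    return [e for e in entries if fuzzy_match(query, e.key) or fuzzy_match(query, e.title)]


ENTRIES = [
    Entry("command", "reset_session", "Reset Session"),
    Entry("command", "cycle_theme", "Cycle Theme"),
    Entry("verb", "scan", "scan: read a source"),
    Entry("verb", "filter", "filter: keep only what matches"),
]
```

With these entries, the query `scn` finds only the `scan` verb and `theme` finds only the `cycle_theme` command; commands and verbs share one search surface because both are just text.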
### Cluster 7 — Data-Oriented Error Handling Convention

**Cluster claim.** The DSL's `try { ... } recover { ... }` envelope returns a `Result[T]` (with side-channel errors as `list[ErrorInfo]`), per the convention established by `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §3.3. The `ErrorKind` values below are the canonical error vocabulary. The `Result[T]` dataclass is the data-oriented alternative to exception-based control flow.

**The `ErrorKind` values** (12 per `data_oriented_error_handling_20260606/spec.md` §3.3, plus one added on 2026-06-08):

| Kind | Meaning |
|---|---|
| `NETWORK` | Network or connection error |
| `AUTH` | Authentication / API key error |
| `QUOTA` | Quota exhausted |
| `RATE_LIMIT` | Rate limited |
| `BALANCE` | Balance / billing error |
| `PERMISSION` | Permission denied (file system, etc.) |
| `NOT_FOUND` | Resource not found |
| `INVALID_INPUT` | Invalid input (parse failure, schema mismatch) |
| `NOT_READY` | System not ready (e.g., RAG not initialized) |
| `UNKNOWN` | Unknown error |
| `CONFIG` | Configuration error |
| `INTERNAL` | Internal error (e.g., SDK exception) |
| `PROVIDER_HISTORY_DIVERGED_FROM_UI` | Provider-side conversation history diverged from the UI's history (added 2026-06-08; per nagent_review Pitfall #4) |

**The `Result[T]` dataclass signature** (per `data_oriented_error_handling_20260606/spec.md` §3.3):

```python
from __future__ import annotations  # keeps `ErrorInfo` (defined in the spec) a string annotation

from dataclasses import dataclass, field, replace
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Result(Generic[T]):
    data: T
    errors: list[ErrorInfo] = field(default_factory=list)  # side-channel errors, never raised
    @property
    def ok(self) -> bool: return not self.errors
    # Method bodies below are an illustrative sketch (immutable copies), not quoted from the spec.
    def with_error(self, err: ErrorInfo) -> Result[T]: return replace(self, errors=[*self.errors, err])
    def with_errors(self, new_errors: list[ErrorInfo]) -> Result[T]: return replace(self, errors=[*self.errors, *new_errors])
    def with_data(self, new_data: T) -> Result[T]: return replace(self, data=new_data)
```

**How the DSL uses the Result envelope.** The `try { ... } recover { ... }` block returns a `Result[T]` where `T` is the verb's return type. The `recover` block receives the `Result[T]` from the `try` and can inspect `.errors` to decide what to do. The `didyoumean` verb returns `Result[T, list[Suggestion]]`: the success case is the parse result, and the failure case includes a list of suggested corrections. (As written above, `Result` takes a single type parameter, so this form needs either a second type parameter or a dedicated suggestions field.)

---
## 3. The Grammar

The grammar formalizes 14 primitives drawn from the user's math pseudocode (the `determinate`/`minor`/`matrix-transpose` snippets shared during spec review), plus 3 known ambiguity flags, plus precedence rules and AI-fuzzing tolerance rules.
### 3.1 The 14 primitives

| # | Symbol | Name | Signature / Syntax | Meaning | Source example (user pseudocode) |
|---|---|---|---|---|---|
| 1 | `name := value` | Local bind | `name := expr` | Stack-scoped local declaration | `result := Matrix(m.rows -1, m.columns -1)` |
| 2 | `stack { ... }` | Stack scope | `stack { decl1; decl2; ... }` | Block of stack-allocated locals | `stack { result := ...; row_offset, col_offset := Scalar; }` |
| 3 | `name: Type` | Annotation | `name: Type` | Type hint on a binding | `m : Matrix` |
| 4 | `func(args) -> Type { ... }` | Function def | `func(args) -> Type { body }` | Named function with return type | `determinate(m, row) -> Scalar { ... }` |
| 5 | `name(...) proc { ... }` | Procedure def | `name(args) proc { body }` | Void-returning function | `minor(m, row_omit, column_omit) -> Scalar proc { ... }` |
| 6 | `for x .. n` | Range iteration | `for x .. n { body }` | Iterate `x` over `[0, n)` | `for col .. m.columns` |
| 7 | `name[a, b]` | Bracket indexing | `name[i, j, k, ...]` | Multi-dim array access | `result[row - row_offset, col - col_offset]` |
| 8 | `if cond { ... }` | Conditional | `if cond { then-body }` | If-then (else inferred) | `if col = col_omit { ++ col_offset; continue; }` |
| 9 | `return value` | Return | `return expr` | Function exit with value | `return result` |
| 10 | `->` (between verbs) | Pipeline flow | `verb1 -> verb2 -> verb3` | Output of left → input of right | `filter -> (col != column_omit <- for col .. m.columns)` |
| 11 | `<-` (after verb) | Input binding | `result <- producer` | The thing on the right is the producer | `for col .. m.columns` produces; `col != column_omit` consumes |
| 12 | `=` (in `assert`) | Equality | `assert -> lhs = rhs` | Assert two expressions are equal | `assert -> product(...) = product(...)` |
| 13 | `{ }` | Body block | `{ body }` | Function/scope body | `{ ... }` |
| 14 | `[ ]` | Basic block | `[ my_stage ]` | Onat's compilation unit (no branching semantics) | (not in user pseudocode; from KYRA's basic blocks) |
### 3.2 Ambiguity flags

Per the user's note during spec review (*"Hopefully the above don't have too many logic errors that the use can't be clarified."*), the report normalizes three known ambiguities in the user's pseudocode:

- **`proc` modifier placement:** `minor(m, row_omit, column_omit) -> Scalar proc { ... }` is likely a *type qualifier* (the return type is "Scalar", and "proc"-ness means side-effecting). The report adopts the convention that `proc` is a postfix modifier indicating void-returning; the syntax is `name(args) proc { body }` (return type omitted) or `name(args) -> Type proc { body }` (return type explicit but ignored).
- **`++col_offset`:** likely `col_offset += 1`. The report formalizes this as `name += 1` (Python-style augmented assignment) and does not adopt the `++` operator. This avoids confusion between pre-increment and post-increment.
- **`m[row][column]` vs `m[row, col]`:** both appear in the user's snippets (line 24 `m[row][column]` is likely a typo for `m[row][col]`). The report adopts the comma form (`name[a, b]`, multi-dim) throughout, since the C-style chained-bracket form doesn't compose with the user's existing matrix pseudocode.
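The three normalizations can be checked against a runnable transliteration of the user's `minor`/`determinate` pseudocode. This Python sketch assumes `minor` is meant to return the submatrix (as `result := Matrix(m.rows -1, m.columns -1)` suggests, despite the `-> Scalar` annotation), writes the offsets as `+= 1`, shows the comma-form indexing in comments (Python lists need `[r][c]`), and spells the function `determinant`. The Laplace expansion along one row is a standard technique, not necessarily the user's exact algorithm.

```python
Matrix = list[list[float]]


def minor(m: Matrix, row_omit: int, col_omit: int) -> Matrix:
    """Submatrix of `m` without one row and one column, using offset counters."""
    rows, cols = len(m), len(m[0])
    result = [[0.0] * (cols - 1) for _ in range(rows - 1)]  # Matrix(m.rows - 1, m.columns - 1)
    row_offset = 0
    for row in range(rows):                # for row .. m.rows
        if row == row_omit:
            row_offset += 1                # normalized from `++ row_offset`
            continue
        col_offset = 0
        for col in range(cols):            # for col .. m.columns
            if col == col_omit:
                col_offset += 1            # normalized from `++ col_offset`
                continue
            # result[row - row_offset, col - col_offset] in the DSL's comma form
            result[row - row_offset][col - col_offset] = m[row][col]
    return result


def determinant(m: Matrix, row: int = 0) -> float:
    """Laplace (cofactor) expansion along `row`; the pseudocode's `determinate(m, row)`."""
    n = len(m)
    if n == 1:
        return m[0][0]
    return sum((-1) ** (row + col) * m[row][col] * determinant(minor(m, row, col))
               for col in range(n))
```

For example, `determinant([[1, 2], [3, 4]])` is `-2`, and `minor([[2, 0, 1], [1, 3, 2], [1, 1, 2]], 0, 1)` is `[[1, 2], [1, 2]]`.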
### 3.3 Precedence rules

- **Left-to-right for `->` chains:** `a -> b -> c` parses as `(a -> b) -> c` (b's output becomes c's input). This is the reverse of mathematical function composition, where `c ∘ b ∘ a` is read right to left, but it matches the user's pseudocode and the pipeline model.
- **`(` `)` for grouping:** explicit parentheses override the left-to-right default. `a -> (b -> c)` parses as `a -> X` where `X = (b -> c)`.
- **Stack-binding precedence:** `:=` binds more loosely than `<-`, so `result := expr <- producer` parses as `result := (expr <- producer)`.
- **No operator precedence for arithmetic:** `+`, `-`, `*`, `/`, `^` all have equal precedence and are left-associative, so `2 + 3 * 4` parses as `(2 + 3) * 4 = 20`. The equal-precedence rule follows APL/K, but the direction does not: APL and K evaluate right to left, so K's `2+3*4` (and APL's `2+3×4`) is `2 + (3 * 4) = 14`. If the user wants math precedence, the report can adopt explicit `(` `)`.
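A minimal Python sketch contrasting the two equal-precedence evaluation orders on a flat token list (alternating operands and operator symbols, an assumed representation): the left-to-right rule adopted here, and the right-to-left rule of APL and K.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "^": operator.pow}


def eval_left_to_right(tokens: list):
    """Equal precedence, left-associative (this section's rule): ((a op b) op c) ..."""
    acc = tokens[0]
    for op, rhs in zip(tokens[1::2], tokens[2::2]):
        acc = OPS[op](acc, rhs)
    return acc


def eval_right_to_left(tokens: list):
    """Equal precedence, evaluated right to left (APL/K): a op (b op (c ...))."""
    acc = tokens[-1]
    for op, lhs in zip(tokens[-2::-2], tokens[-3::-2]):
        acc = OPS[op](lhs, acc)
    return acc
```

`eval_left_to_right([2, "+", 3, "*", 4])` gives `20`, while `eval_right_to_left` on the same tokens gives `14`, the APL/K reading. The difference also shows up with a single operator: `10 - 4 - 3` is `3` left to right but `9` right to left.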
### 3.4 AI-fuzzing tolerance rules

These rules make the DSL workable for AI agents that may fuzz verb names, indent inconsistently, or offset line references.

- **CoSy-style modulo indexing:** array indices wrap. `result[-1]` is equivalent to `result[result.len - 1]`, and `result[result.len]` wraps to `result[0]`. This forgives AI off-by-one errors in line references. (Per the CoSy Simplicity page: *"Indexing is modulo - like counting on your thumb & fingers : 0 1 2 3 4 0."*)
- **Structured recovery anchors via `{ }`:** the `{ }` block is a recovery unit. If the parser cannot parse the body, the entire block is replaced with `NIL` and the error is reported at the block level, not at the line level.
- **Line/offset independence:** the parser uses *token positions*, not raw line numbers. A position is written `file:token-index`, so `src/foo.py:42` means "the 42nd token in src/foo.py", not "line 42". The mapping from token position to line number is a presentation concern, not a parse concern. This matches the project's existing FuzzyAnchor pattern (per `docs/guide_context_curation.md`).
- **Verb-name fuzzing tolerance:** the `didyoumean` verb (see §4 Tier 4) proposes corrections for misspelled or ambiguous verb names. The parser's "best guess" recovery path is configurable: strict (reject on typo), lenient (auto-correct if Levenshtein distance ≤ 2), or fuzzy (parse the rest, log the typo).
- **Indentation tolerance:** indentation is *not* significant (per the user's explicit "ignore its record formats" instruction and the rejection of Python's indent-sensitive syntax). The parser uses a stack-based approach; the `{ }` and `[ ]` delimiters are the only structure-aware tokens.
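A minimal Python sketch of two of these rules: CoSy-style modulo indexing and the three verb-resolution modes. The function names, the `Mode` enum, and the choice to keep the unresolved name in fuzzy mode are assumptions for illustration; the edit distance is the standard Levenshtein dynamic program.

```python
from enum import Enum
from typing import Optional


def mod_index(seq, i: int):
    """Modulo indexing: -1 is the last item and len(seq) wraps back to item 0."""
    return seq[i % len(seq)]


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class Mode(Enum):
    STRICT = "strict"    # reject on typo
    LENIENT = "lenient"  # auto-correct if distance <= 2
    FUZZY = "fuzzy"      # keep parsing, log the typo


def resolve_verb(name: str, vocab: list[str], mode: Mode, log: list[str]) -> Optional[str]:
    if name in vocab:
        return name
    best = min(vocab, key=lambda v: levenshtein(name, v))
    if mode is Mode.LENIENT and levenshtein(name, best) <= 2:
        return best
    if mode is Mode.FUZZY:
        log.append(f"unknown verb {name!r}; did you mean {best!r}?")
        return name  # parsing continues with the unresolved name
    return None      # STRICT, or LENIENT with no close match: reject
```

With the vocabulary `["scan", "filter", "print", "audit"]`, the typo `fitler` (distance 2 from `filter`) is corrected in lenient mode, rejected in strict mode, and passed through with a logged suggestion in fuzzy mode.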
### 3.5 Error envelope: `try { ... } recover { ... }`

```
try {
    scan "src/foo.py" -> filter !exists -> print
} recover err {
    audit "scan failed: " + err
    return NIL
}
```

- The `try` block evaluates the pipeline. If the pipeline returns a `Result[T]` with `errors` non-empty, the `recover` block runs.
- The `recover` block receives the `Result[T]` as a parameter (named by the user; `err` is the default convention from the user's pseudocode).
- The `recover` block must return a `Result[T]` (or `NIL` to short-circuit).
- If the `recover` block itself returns a `Result[T]` with errors, those errors are appended to the outer `Result[T]`'s error list. (Per Fleury's "errors are data" pattern; per `data_oriented_error_handling_20260606/spec.md` §3.4.)
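A minimal Python sketch of these semantics, using a stripped-down stand-in for `Result[T]` whose errors are plain strings (the real envelope carries `ErrorInfo` values). The function name `try_recover` and the choice to return a cleanly recovered result as-is are assumptions; the rules above only specify what happens when the recover block itself reports errors.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Result(Generic[T]):
    """Minimal stand-in for the spec's Result[T]; errors are plain strings here."""
    data: Optional[T]
    errors: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def try_recover(body: Callable[[], Result[T]],
                recover: Callable[[Result[T]], Optional[Result[T]]]) -> Optional[Result[T]]:
    """Sketch of `try { body } recover err { ... }`."""
    result = body()
    if result.ok:
        return result                # no errors: recover never runs
    recovered = recover(result)      # recover receives the failing Result
    if recovered is None:
        return None                  # NIL short-circuits
    if recovered.ok:
        return recovered             # assumption: a clean recovery replaces the failure
    # Errors reported while recovering are appended to the outer error list.
    return replace(result, errors=[*result.errors, *recovered.errors])
```

For example, if the body fails with `"NOT_FOUND: src/foo.py"` and the recover block itself fails with `"audit failed"`, the outer result ends up carrying both errors, in that order.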
|
||||
|
||||
### 3.6 Block composition: `[ ]` (KYRA basic blocks) vs `{ }` (body blocks) vs `arena { }` (tape regions)

- **`[ ]`** is Onat's basic block (per `C:\projects\forth\bootslop\references\kyra_in-depth.md:56-57`), which provides *"implicit begin, link (else), and end jump targets for the JIT to resolve relative offsets within a limited scope."* In the DSL, `[ ]` is a *sequential operation block*: a chunk of code that the parser can compile and dispatch as a unit. It is *not* a scope (it introduces no new bindings); it is a *compilation unit*.
- **`{ }`** is a body block: a function body, an if/then body, or a recover body. It introduces a new lexical scope (bindings made inside are local to the block).
- **`arena { }`** is a tape-drive region: a `{ }` body whose contents have been *pre-scattered* into a contiguous memory region. Because the contents are pre-placed, the JIT can treat the entire block as a single `xchg rax, rdx` boundary (per KYRA's magenta pipe semantics).
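A delimiter stack is enough to tell the three block kinds apart and nest them, which is the stack-based approach §3.4 describes. The Python sketch below is minimal and makes several simplifying assumptions: string literals never contain delimiters, `;` only separates statements, and scoping (which `{ }` introduces and `[ ]` does not) is left to a later pass. `parse_blocks` and the `("kind", children)` tuple shape are hypothetical.

```python
OPENERS = {"[": ("basic", "]"), "{": ("body", "}")}


def parse_blocks(source):
    """Nest tokens into ('basic' | 'body' | 'arena', children) tuples using a stack."""
    # Pad delimiters with spaces so whitespace splitting isolates them;
    # newlines and indentation carry no meaning.
    for d in "[]{};":
        source = source.replace(d, f" {d} ")
    root = []
    stack = [(root, None)]          # (children of the open block, expected closer)
    pending_arena = False
    for tok in source.split():
        if pending_arena and tok != "{":
            raise SyntaxError("'arena' must be followed by '{'")
        if tok == "arena":
            pending_arena = True
        elif tok in OPENERS:
            kind, closer = OPENERS[tok]
            if pending_arena:       # 'arena {' opens a tape region, not a plain body
                kind, pending_arena = "arena", False
            node = (kind, [])
            stack[-1][0].append(node)
            stack.append((node[1], closer))
        elif tok in ("]", "}"):
            if stack[-1][1] != tok:
                raise SyntaxError(f"unbalanced {tok!r}")
            stack.pop()
        elif tok != ";":            # ';' only separates statements
            stack[-1][0].append(tok)
    if len(stack) != 1 or pending_arena:
        raise SyntaxError("unclosed block")
    return root


print(parse_blocks("arena { foo := x; [ bar ]; baz }"))
# [('arena', ['foo', ':=', 'x', ('basic', ['bar']), 'baz'])]
```

A mismatched closer such as `[ scan }` fails immediately, because the top of the stack records which closer it expects.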
The three kinds nest: `arena { foo := x; [ bar ]; baz }` is a tape region containing two sequential statements (the local bind and the basic block) followed by a trailing call.

---

## 4. The 4-Tier Vocab (~40 Verbs)

Each verb entry gives its symbol, name, signature, one-line semantics, one example, a "borrowed from" note, and an SSDL shape tag. The Tier 2 and Tier 3 tables add a "maps to mcp_client tool" column. The Tier 4 table adds a "novel piece" column.
### 4.1 Tier 1 — Math (~10 verbs)

The Tier 1 verbs are drawn directly from the user's math pseudocode.

| Symbol | Name | Signature | Semantics | Example | Borrowed from | Shape |
|---|---|---|---|---|---|---|
| `:=` | Local bind | `name := expr` | Stack-scoped local declaration | `result := Matrix(m.rows -1, m.columns -1)` | Forth (dictionary entries); Joy (quotations) | `[I]` |
| `stack { ... }` | Stack scope | `stack { decl1; decl2; ... }` | Block of stack-allocated locals | `stack { result := ...; row_offset, col_offset := Scalar; }` | Forth (colon definitions); KYRA (basic blocks) | `[I]` |
| `for x .. n` | Range iteration | `for x .. n { body }` | Iterate `x` over `[0, n)` | `for col .. m.columns` | APL `⍳N`; K `!R`; BQN `↕N`; Uiua (stack iteration) | `o==>` |
| `+` | Add | `a + b` | Element-wise sum | `2 + 3` (yields 5) | All languages | `[I]` |
| `-` | Subtract | `a - b` | Element-wise difference | `5 - 2` (yields 3) | All languages | `[I]` |
| `*` | Multiply | `a * b` | Element-wise product | `2 * 3` (yields 6) | All languages | `[I]` |
| `/` | Divide | `a / b` | Element-wise division | `6 / 2` (yields 3) | All languages | `[I]` |
| `^` | Power | `a ^ b` | Element-wise power | `2 ^ 10` (yields 1024) | All languages | `[I]` |
| `sum` | Sum | `sum expr` | Sum all elements (here `a..b` includes both endpoints) | `sum 1..10` (yields 55) | APL `+/`; K `+/`; BQN `+´` | `[I]` |
| `product` | Product | `product expr` | Multiply all elements (here `a..b` includes both endpoints) | `product 1..5` (yields 120) | APL `×/`; K `*/`; BQN `×´` | `[I]` |
| `a[i, j]` | Bracket indexing | `name[i, j, ...]` | Multi-dimensional array access | `result[row - row_offset, col - col_offset]` | APL `result[2;3]`; BQN `⊏`; K `@` | `[Q]` (query) |
| `if/then` | Conditional | `if cond { then-body }` | If-then (else inferred) | `if col = col_omit { ++ col_offset; continue; }` | Forth (IF/THEN); CoSy (control flow) | `[B]` (branch) |

**Total Tier 1: 12 verbs.** (Slightly over the estimate of 10; the verbs are tight enough that splitting them would hurt readability.)
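The range and element-wise rules in the Tier 1 table can be pinned down with a short Python sketch. This is an illustration of the semantics, not the interpreter. The two range readings follow the table's own examples: 55 and 120 require `1..10` and `1..5` to include both endpoints, while `for x .. n` is documented as `[0, n)`. The scalar extension mirrors APL's `4 5 6 7 + 4`, cited in §5.4. `elementwise`, `inclusive_range`, and `for_range` are hypothetical names.

```python
import math
import operator


def elementwise(op, a, b):
    """Apply a binary op element by element, extending a bare scalar across a list."""
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            raise ValueError("length mismatch")
        return [op(x, y) for x, y in zip(a, b)]
    if isinstance(a, list):
        return [op(x, b) for x in a]
    if isinstance(b, list):
        return [op(a, y) for y in b]
    return op(a, b)


def inclusive_range(lo, hi):
    """`lo..hi` in sum/product position: both endpoints included."""
    return list(range(lo, hi + 1))


def for_range(n):
    """`for x .. n`: x takes the values 0, 1, ..., n - 1."""
    return range(n)


print(elementwise(operator.add, [4, 5, 6, 7], 4))  # [8, 9, 10, 11]
print(sum(inclusive_range(1, 10)))                 # 55
print(math.prod(inclusive_range(1, 5)))            # 120
print(list(for_range(3)))                          # [0, 1, 2]
```

Under these readings, `a..b` and `for x .. n` disagree about endpoints. The interpreter will need one rule or the other (see question 6 in §7).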
### 4.2 Tier 2 — Data-Oriented Pipeline (~12 verbs)

The Tier 2 verbs wrap the existing 45+ MCP tools (per `docs/guide_tools.md` §"Native Tool Inventory") with declarative intent expressions. They express the Jofito-style predicate chain declaratively; Tier 3 supplies the imperative veneer over them.

| Symbol | Name | Signature | Semantics | Example | Maps to mcp_client tool | Borrowed from | Shape |
|---|---|---|---|---|---|---|---|
| `scan` | Scan | `scan path` | Read source (directory, file, URL); first verb in every pipeline | `scan "src/" -> filter !dir -> map ext` | `list_directory` + `search_files` + `read_file` | Jofito `scandir()` | `[I]` |
| `select` | Select | `select condition` | Keep records matching condition (jq-style filter) | `scan "src/" -> select .extension == ".py"` | (jq-style filter) | jq `select(condition)`; Joy `filter` | `===>` |
| `filter` | Filter | `filter predicate` | Keep records where predicate is true | `scan "src/" -> filter .size > 0` | (predicate on FileItem) | Jofito `{filter ...}` predicate | `===>` |
| `map` | Map | `map block` | Apply block to each record | `scan "src/" -> map ext` | (no direct equivalent) | jq `.[] \| .field`; Joy `map`; CoSy `' verb 'm` | `o==>` |
| `fold` | Fold | `fold init block` | Reduce to single value | `scan "src/" -> fold 0 { acc + .size }` | (no direct equivalent) | jq `reduce`; Joy `fold` | `o==>` |
| `sort` | Sort | `sort key` | Order records by key | `scan "src/" -> sort .name` | (no direct equivalent) | Joy `qsort`; jq `sort` | `[I]` |
| `group` | Group | `group key` | Bucket records by key | `scan "src/" -> group .extension` | (no direct equivalent) | jq `group_by`; CoSy APL-derived | `o==>` |
| `dedupe` | Dedupe | `dedupe` | Remove duplicates | `scan "src/" -> dedupe` | (no direct equivalent) | jq `unique`; CoSy | `[I]` |
| `arena { }` | Arena scope | `arena { body }` | Tape-drive region; pre-scatter contents | `arena { [ scan ]; [ filter ]; [ print ] }` | (compiler directive) | KYRA magenta pipe; Onat preemptive scatter | `o==>` |
| `scatter` | Scatter | `scatter workers` | Fork pipeline across `workers` cores | `scan "src/" -> scatter 4 -> filter` | (runtime hint) | Onat preemptive scatter; Lottes X.com thread lines 55-61 | `===>W===>` |
| `gather` | Gather | `gather` | Collect scattered sub-streams | `scan "src/" -> scatter 4 -> filter -> gather` | (runtime hint) | Onat inverse of scatter | `[I]` |
| `pipe` | Pipe root | `pipe` | Explicit chain root (synonym for `->`) | `pipe [ scan, filter, print ]` | (no direct equivalent) | Jofito pipe coalescing (transcript:376-410) | `===>W===>` |

**Total Tier 2: 12 verbs.**
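The chaining behaviour of the Tier 2 verbs can be modelled as plain function composition in Python. This is a minimal sketch with several assumptions. Records are dicts standing in for FileItem, each verb returns a new list instead of mutating its input, and `scatter N -> stage -> gather` is treated as one combinator that splits the records into N contiguous chunks and rejoins them in order. That combinator is a runtime reading of the `scatter` hint, not Onat's preemptive scatter. CPython threads also do not give true multi-core parallelism for pure-Python stages. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def pipe(source, *stages):
    """Left-to-right composition: pipe(x, f, g) == g(f(x)), mirroring `x -> f -> g`."""
    return reduce(lambda acc, stage: stage(acc), stages, source)


# Trailing underscores avoid shadowing Python's built-in filter/map.
def filter_(pred):
    return lambda records: [r for r in records if pred(r)]


def map_(fn):
    return lambda records: [fn(r) for r in records]


def sort_(key):
    return lambda records: sorted(records, key=key)


def fold(init, fn):
    return lambda records: reduce(fn, records, init)


def group(key):
    def run(records):
        buckets = {}
        for r in records:
            buckets.setdefault(key(r), []).append(r)
        return buckets
    return run


def scatter_gather(workers, stage):
    """`scatter N -> stage -> gather`: split into N contiguous chunks, run stage on each, rejoin in order."""
    def run(records):
        records = list(records)
        size = max(1, -(-len(records) // workers))  # ceiling division
        chunks = [records[i:i + size] for i in range(0, len(records), size)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return [r for part in pool.map(stage, chunks) for r in part]
    return run


files = [  # hypothetical FileItem-like records standing in for `scan "src/"`
    {"name": "b.py", "extension": ".py", "size": 10},
    {"name": "a.md", "extension": ".md", "size": 0},
    {"name": "c.py", "extension": ".py", "size": 5},
]
print(pipe(files, filter_(lambda r: r["size"] > 0), sort_(lambda r: r["name"]), map_(lambda r: r["name"])))
# ['b.py', 'c.py']
```

Because every stage here is read-only, the scattered and serial versions of a chain produce the same records. That equivalence is what makes `scatter` safe to treat as a hint.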
### 4.3 Tier 3 — Shell (~10 verbs)

The Tier 3 verbs wrap existing MCP tools (per `docs/guide_tools.md` §"Native Tool Inventory") and provide the shell-scripting surface. They are the "imperative veneer" over the declarative Tier 2 pipeline.

| Symbol | Name | Signature | Semantics | Example | Maps to mcp_client tool | Borrowed from | Shape |
|---|---|---|---|---|---|---|---|
| `exec` | Execute | `exec cmd` | Run shell command | `exec "find . -name '*.py'"` | `run_powershell` (shell_runner.py) | nagent tag protocol (structured protocol idea) | `[I]` |
| `open` | Open | `open path` | Open file/URL | `open "src/foo.py"` | `read_file` | nagent tag protocol | `[I]` |
| `read` | Read | `read path` | Read file content | `read "src/foo.py"` | `read_file` | nagent tag protocol | `[I]` |
| `write` | Write | `write path content` | Write file content | `write "src/foo.py" "new content"` | `set_file_slice` / `edit_file` | nagent tag protocol | `[I]` |
| `close` | Close | `close handle` | Close handle | `close file_handle` | (no direct equivalent; close is implicit in Python) | Forth `CLOSE-FILE`; bash `exec` | `[I]` |
| `path` | Path | `path` | Get current path (or `cd`) | `path` | (no direct equivalent; use `cwd`) | shell `pwd`; CoSy `path` | `[I]` |
| `env` | Env | `env var` | Get env var | `env HOME` | (no direct equivalent) | shell `echo $HOME` | `[I]` |
| `wait` | Wait | `wait ms` | Block for `ms` milliseconds | `wait 1000` | (no direct equivalent) | shell `sleep` | `o==>` |
| `poll` | Poll | `poll handle ms` | Poll handle with timeout | `poll file_handle 5000` | (no direct equivalent) | shell `read -t` | `o==>` |
| `cwd` | CWD | `cwd` | Get current working directory | `cwd` | (no direct equivalent) | shell `pwd` | `[I]` |

**Total Tier 3: 10 verbs.**
### 4.4 Tier 4 — AI-Fuzzing Tolerance (~8 verbs, the novel contribution)

The Tier 4 verbs are what make the DSL workable for AI agents that may fuzz verb names, indent inconsistently, or offset line references. Each verb maps directly to one or more of the 4 anchor claims (especially Claim 3: IEventTarget, per Cluster 0).

| Symbol | Name | Signature | Semantics | Example | Novel piece | Borrowed from | Shape |
|---|---|---|---|---|---|---|---|
| `fuzzy` | Fuzzy | `fuzzy expr` | Declare a parse-tolerance region; parser accepts near-matches | `fuzzy { scan "src/" -> filter .ext }` | Tolerance for AI verb-name fuzzing | nagent "discovery" intent (per `decisions.md:119,128`); SSDL "assume as much as possible" | `===>` |
| `try { ... } recover { ... }` | Try / Recover | `try { body } recover err { fallback }` | Returns `Result[T]`; on error, the `recover` block runs | `try { read "src/foo.py" } recover { read "src/Foo.py" }` | Error envelope as data (Fleury pattern) | `data_oriented_error_handling_20260606`; Wasm `try`/`catch` block/loop/if/end | `===>B===>` |
| `sandbox { ... }` | Sandbox | `sandbox { body }` | IEventTarget boundary; all writes in the block go through the formal event channel | `sandbox { write "tmp/x" "data" }` | O'Donnell's "reads free, writes formalized" invariant applied to the DSL | O'Donnell `mvc.html` "Writing to Model state" | `o==>` |
| `audit` | Audit | `audit msg` | Log the state change to a structured record; the IEventTarget itself | `audit "wrote tmp/x"` | Per-write audit log; full replay capability | O'Donnell `mvc.html` "Event callbacks"; nagent's self-describing tools | `[I]` |
| `didyoumean` | Did you mean | `didyoumean ambiguous` | Propose the closest matching verb(s) for an ambiguous input | `didyoumean "skan"` | Recovery primitive for AI typos | nagent Bridge DSL intent model; Anthropic `input_examples` | `[I]` |
| `span` | Span | `span intent` | Decompose a compound intent into a span of sub-MCP grammar tokens | `span "read foo.py:MyClass"` | Spans the `read_file` and `py_get_definition` tools | MCP DSL per-MCP grammar (`spec.md:456-465`); OpenAI namespace grouping | `[I]` |
| `offset` | Offset | `offset symbol` | Resolve a symbol to a file:line without requiring the model to specify the line | `offset "foo.py:MyClass.method"` | Implicit offset resolution | MCP DSL line-range notation; OpenAI "don't make the model fill known args" | `[Q]` |
| `assumewide` | Assume wide | `assumewide intent` | If the intent is broad or ambiguous, select the most capable matching tool (the "fewer, more capable" heuristic) | `assumewide "refactor"` | Prefer broad-capability tools over narrow specialists | OpenAI "fewer than 20 functions"; Anthropic `tool_choice: tool` force-call | `===>W===>` |

**Total Tier 4: 8 verbs.**

**Total vocab: 12 + 12 + 10 + 8 = 42 verbs.** (The ~40 estimate assumed 10 + 12 + 10 + 8. The total runs two over because Tier 1 has 12 verbs instead of 10; Tiers 2, 3, and 4 match their estimates.)

---

## 5. Hardware Mapping (4 Anchor Claims)

The 4 anchor claims tie the vocab and grammar to concrete hardware and software stages.
### 5.1 Claim 1 — Onat/Lottes, hardware

The DSL's `->` pipeline, `[ ]`/`{ }` blocks, `arena { }` memory model, and `scatter`/`gather` verbs are direct descendants of KYRA/VAMP and x68.

- **`->` pipeline:** inherits from Forth's postfix word chain, refined by KYRA's 2-register stack (RAX/RDX) as the minimal call convention. Per `C:\projects\forth\bootslop\references\kyra_in-depth.md:14` (*"The 2-Item Hardware Stack: To achieve hardware locality and GPU compatibility, KYRA strictly restricts the data stack to exactly two CPU registers: `RAX` (Top of Stack) and `RDX` (Next on Stack)"*).
- **`[ ]` sequential block:** inherits from KYRA's basic blocks `[ ]` with implicit begin/link/end jump targets. Per `kyra_in-depth.md:56-57` (*"Basic Blocks `[ ]`: These visually constrain the assembly output. They provide implicit begin, link (else), and end jump targets for the JIT to resolve relative offsets within a limited scope"*).
- **`{ }` lambda block:** inherits from KYRA's lambdas `{ }`, which compile code elsewhere and leave an address in `RAX`. Per `kyra_in-depth.md:58-59` (*"Lambdas `{ }`: A lambda (colored Yellow `{`) does not execute inline. The JIT compiles the block of code elsewhere in the arena and leaves its executable memory address in `RAX`."*).
- **`arena { }`:** inherits from KYRA's magenta pipe `|` definition boundary (`RET` + `xchg rax, rdx`) as the entry/exit protocol for a memory region. Per `kyra_in-depth.md:24-27` (*"The Magenta Pipe Trick: Because the stack is just `RAX` and `RDX`, ensuring `RAX` is the active 'Top of Stack' before executing a word is vital. The `xchg rax, rdx` instruction compiles to a tiny 2-byte opcode: `48 92`. Definitions: There are no `begin` or `end` words. A magenta pipe token (`|`) implicitly signals the start of a new definition. The JIT reacts to this by: 1. Emitting a `RET` (`C3`) to close the *previous* definition. 2. Emitting `48 92` (`xchg rax, rdx`) to ensure proper stack alignment for the *new* definition."*).
- **`scatter`:** inherits from Onat's preemptive scatter, per `X.com - Onat & Lottes Interaction 1.png.ocr.md:59-61`: *"The key concept here is that 'common' arguments like the device are pushed onto the tape using store duplication when they are known (after device creation). So it's preemptive scatter, so later at call time there is no argument gather."*
- **`gather`:** the inverse of preemptive scatter; it collects pre-scattered values from fixed memory slots.

Lottes's specific framing at `X.com - Onat & Lottes Interaction 1.png.ocr.md:80-86`: *"I laugh when people say C is like assembly, they are missing what we did in assembly back then, which was all registers and globals and gotos, no stacks. It's radically different than good assembly."* The DSL's 2-register model, arena regions, and magenta `->` apply this insight directly: don't pretend you have a memory stack when the hardware has registers.
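The magenta-pipe boundary that `arena { }` borrows can be made concrete as a byte layout. The Python sketch below emits the two sequences the KYRA notes give, `C3` (`RET`) to close the previous definition and `48 92` (`xchg rax, rdx`) to open the next, around opaque definition bodies. The notes do not cover the first and last definitions, so the sketch *assumes* that no `RET` precedes the first definition and that one trailing `RET` closes the last. `emit_definitions` is a hypothetical name, and the `90` (NOP) bodies are placeholders.

```python
RET = bytes([0xC3])                 # closes the previous definition
XCHG_RAX_RDX = bytes([0x48, 0x92])  # xchg rax, rdx: make RAX the active top of stack


def emit_definitions(bodies: list[bytes]) -> bytes:
    """Lay out one definition per body, following the magenta pipe `|` description.

    Assumptions (not stated in the KYRA notes): the first definition has no
    predecessor to close, so it gets no leading RET, and one trailing RET
    closes the last definition.
    """
    code = bytearray()
    for index, body in enumerate(bodies):
        if index > 0:
            code += RET              # 1. close the previous definition
        code += XCHG_RAX_RDX         # 2. realign the 2-register stack for the new one
        code += body
    if bodies:
        code += RET
    return bytes(code)


NOP = bytes([0x90])  # placeholder body byte
print(emit_definitions([NOP, NOP * 2]).hex(" "))  # 48 92 90 c3 48 92 90 90 c3
```

Each boundary costs three bytes. That small fixed cost is what lets a tape region be entered and exited without any argument-gathering prologue.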
### 5.2 Claim 2 — O'Donnell, paradigm

The DSL's pipeline is *immediate-mode in pipeline composition*. Each `->`-delimited stage is a method invocation, not part of a Pipeline object. The pipeline exists *only* while the DSL program is executing; once execution ends, the pipeline's state is gone.

Per O'Donnell at `https://johno.se/book/imgui.html`: *"Widgets, logically, change from being objects to being method invocations. As we shall see, this fundamentally changes how a client application approaches the implementation of user interfaces."*

The DSL inherits this: `scan -> filter -> print` is not a pipeline object you can query, name, or pass around. The only way to "name" a chain is to wrap it in a function (`determinate(m, row) -> Scalar { ... }`). The function body IS the chain; the function name IS the chain's identity. There is no separate Pipeline class.

This also means the parser doesn't need to track pipeline state across executions. Each invocation of `determinate(m, row)` is independent; there is no implicit "current pipeline" state, and each call starts fresh.
### 5.3 Claim 3 — Forth/CoSy, syntax

Concatenative syntax is immediate-mode in *tokenization* (whitespace-delimited, no precedence), in *evaluation* (each verb pops its arguments and pushes its results), and in *parsing* (no AST object is retained after the parse; the parser emits JIT'd code directly, per Onat's xchg model).

- **Tokenization:** whitespace-delimited, with no precedence table. Per `https://en.wikipedia.org/wiki/Forth_(programming_language)`: *"Forth's grammar has no official specification. Instead, it is defined by a simple algorithm. The interpreter reads a line of input from the user input device, which is then parsed for a word using spaces as a delimiter."*
- **Evaluation:** each verb pops its arguments and pushes its results. Per CoSy Simplicity: *"Words pass information to each other by pushing it on, or taking it off a `stack`."*
- **Parsing:** no AST object is retained after the parse; the parser emits code directly. Per `data_oriented_error_handling_20260606/spec.md` §3.1 and the project's overall "data-oriented design" philosophy, parsing is data flow, not object construction.

The DSL inherits all three. The parser reads whitespace-delimited tokens, evaluates each verb as a stack effect, and emits the result without retaining an AST.
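The three properties fit in a few lines of Python. This minimal sketch *interprets* each token immediately instead of JIT-compiling it, which is a simplification of the Onat model described above. It also assumes every token that is not a known word is an integer literal. `evaluate`, `binary`, and the `WORDS` table are hypothetical.

```python
def evaluate(source: str, words: dict) -> list:
    """Whitespace-tokenize, then run each word as a stack effect; no AST is built."""
    stack = []
    for token in source.split():          # no precedence table, no grammar beyond spaces
        if token in words:
            words[token](stack)            # a word pops its arguments and pushes results
        else:
            stack.append(int(token))       # assumption: every non-word token is an integer literal
    return stack


def binary(fn):
    """Wrap a two-argument function as a stack word: pop b, pop a, push fn(a, b)."""
    def word(stack):
        b = stack.pop()
        a = stack.pop()
        stack.append(fn(a, b))
    return word


WORDS = {
    "+": binary(lambda a, b: a + b),
    "-": binary(lambda a, b: a - b),
    "*": binary(lambda a, b: a * b),
    "dup": lambda stack: stack.append(stack[-1]),
}

print(evaluate("2 3 + 4 *", WORDS))  # [20]
```

Each token is consumed exactly once, and the only state that survives the loop is the data stack itself. That is the "no AST retained" property in miniature.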
### 5.4 Claim 4 — APL/K, data

Array languages are immediate-mode in *data representation*. There is no array-object header; values are passed by stack reference, not by handle.

- **APL** (per `https://en.wikipedia.org/wiki/APL_(programming_language)`): *"APL has an array as the universal data type"*. The scalar `5` is a 0-dimensional array, and `4 5 6 7 + 4` propagates the addition across the vector.
- **K** (per `https://en.wikipedia.org/wiki/K_(programming_language)`): "kdb+ (built on K) processes billions of records at microsecond latency"; the array paradigm scales to production workloads.
- **BQN** (per `https://mlochbaum.github.io/BQN/`): the CBQN bytecode compiler shows that the paradigm can be compiled efficiently.

The DSL's `for x .. n` range and `result[row, col]` indexing inherit the "no array object" property. The array is *the* universal type; every function operates on it, and every function vectorizes.

---

## 6. AI-Agent Properties (10 Claims)

The 10 claims tie the DSL to the existing project's architecture so that future tracks can build on it without re-deriving the design.
### 6.1 Claim 1 — Domain = Meta-Tooling

The DSL is **Meta-Tooling-side** per `docs/guide_meta_boundary.md` §"Domain 2: The Meta-Tooling". The Application's provider-native function-calling stays unchanged. The DSL is the format external agents (Gemini CLI, OpenCode) emit when invoking `mcp_client.py` tools.
### 6.2 Claim 2 — Runtime path = external agent → DSL → bridge → MCP → optional Hook API approval

Per `docs/guide_meta_boundary.md` §"The Inter-Domain Bridges", external agents (Gemini CLI) call the DSL via a bridge script (a `scripts/cli_tool_bridge.py` analogue). The bridge script translates the DSL into `mcp_client.dispatch()` calls. The Hook API (`docs/guide_tools.md` §"The Hook API") surfaces HITL approval modals when the bridge detects a `sandbox { ... }` block.
### 6.3 Claim 3 — 3-layer security

The DSL's parser respects the existing 3-layer security model in `mcp_client.py` (per `docs/guide_tools.md` §"The MCP Bridge"): allowlist construction, path validation, and the resolution gate. Every DSL statement that targets a tool outside the allowlist is rejected at parse time, and the DSL bypasses none of the three layers.
### 6.4 Claim 4 — 4 memory dimensions

The DSL does *not* replace any of the 4 memory dimensions (per `conductor/tracks/nagent_review_20260608/nagent_review_v2_1_20260612.md` §2.1):
- **Curation memory** (FileItem + ContextPreset + FuzzyAnchor)
- **Discussion memory** (disc_entries + branching + UISnapshot A1-A7)
- **RAG memory** (ChromaDB, opt-in)
- **Knowledge memory** (Candidate 11, the harvested durable learnings)

The DSL is a *query format* for all 4, not a replacement. A `scan "src/foo.py"` is a curation-memory query; a `select .role == "User"` is a discussion-memory query; a `search "execution clutch"` is a RAG-memory query; a `read "knowledge/digest.md"` is a knowledge-memory query.
### 6.5 Claim 5 — Stable-to-volatile cache ordering

The DSL's `arena { }` blocks are cache-friendly under the stable-to-volatile ordering in nagent v2.1 §2.2. The DSL's audit logs (the Tier 4 `audit` verb) form a *stable* layer that can be cached across turns. The DSL's pipeline output (e.g., the output of `scan -> filter`) is a *volatile* layer appended each turn.
### 6.6 Claim 6 — `Result[T]` envelope

The DSL's `try { ... } recover { ... }` verb returns a `Result[T]`, following the convention established by `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §3.3. The 12 `ErrorKind` values are the canonical error vocabulary. The `Result[T]` dataclass is the data-oriented alternative to exception-based control flow.
### 6.7 Claim 7 — Command Palette 33 commands

The DSL's verbs are a *richer* superset of the 33 Command Palette commands (per `docs/guide_command_palette.md` and `src/commands.py`). The Command Palette's "Everything" mode (per `guide_command_palette.md` line 383: *"search across commands, files, symbols, history, settings"*) is a near-term use case where the DSL's verbs can serve as the underlying format. The user types `find "execution clutch"` instead of clicking a result; the DSL parses the intent and dispatches to the right MCP tool.
### 6.8 Claim 8 — Hook API state fields

The DSL's state-mutating verbs route through `_predefined_callbacks` (per `docs/guide_state_lifecycle.md` §"Hook API Surface"). The state-reading verbs use `_gettable_fields`. The DSL never bypasses the Hook API; it is a *user* of the existing infrastructure.
### 6.9 Claim 9 — O'Donnell's IEventTarget pattern as the `sandbox` verb

The `sandbox { ... }` block in Tier 4 is the DSL's IEventTarget boundary. Per O'Donnell at `https://johno.se/book/mvc.html` "Writing to Model state": *"Writes to Model are formalized through the addition of IEventTarget. This is a pure virtual interface that defines all possible state changes / events on a system wide level."* In the DSL, `sandbox { ... }` declares that every state change in the block goes through a single auditable interface (the bridge script's HITL approval modal per `docs/guide_meta_boundary.md`). The `audit` verb is the IEventTarget itself: a write verb that logs the state change to a structured record (timestamp, source, kind, payload; the same shape as the `Comms Log` entries in `guide_architecture.md` §"Telemetry & Auditing").

Per the Cluster 0 sub-report (`cluster_0_odonnell.md` §"Connections", Connection 1): *"The `sandbox` verb isolates execution and enforces that all state observations by the sandboxed code are *reads* — they can occur freely against the const Model view. State mutations by sandboxed code, however, must be routed through the formal event channel."*
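The Python sketch below shows the "reads free, writes formalized" split. It hands sandboxed code a read-only view of state (`types.MappingProxyType`, standing in for a C++ const reference) plus a single write channel that records an audit entry before applying each change. The record fields follow the shape named above: timestamp, source, kind, payload. The HITL approval step is not modelled. `AuditRecord`, `EventTarget`, and `sandbox` are hypothetical names, not existing project classes.

```python
import time
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class AuditRecord:
    timestamp: float
    source: str
    kind: str
    payload: dict


class EventTarget:
    """Single write channel: every state change is recorded before it is applied."""

    def __init__(self, state: dict):
        self._state = state
        self.log: list[AuditRecord] = []

    def write(self, source: str, key: str, value) -> None:
        self.log.append(AuditRecord(time.time(), source, "write", {"key": key, "value": value}))
        self._state[key] = value


def sandbox(state: dict, block):
    """Run block with a read-only view of state and the event target as its only write path."""
    events = EventTarget(state)
    block(MappingProxyType(state), events)  # the proxy raises TypeError on item assignment
    return events.log


state = {"tmp/x": ""}


def block(view, events):
    current = view["tmp/x"]                             # read: free, no audit entry
    events.write("sandbox", "tmp/x", current + "data")  # write: goes through the channel


log = sandbox(state, block)
print(state["tmp/x"], len(log), log[0].kind)  # data 1 write
```

Reads leave no trace in the log, while every write produces exactly one record. That asymmetry is the rationale for cheap Tier 2 verbs in Claim 10.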
### 6.10 Claim 10 — O'Donnell's "reads are free" claim as the rationale for cheap verbs

Per O'Donnell at `https://johno.se/book/mvc.html` "Reading Model state": *"First of all, View and Controller may only access Model in a const fashion. This has numerous repercussions. Firstly, exposing central Model state as public is ok, as it can only be read. Also, only const methods may be called, so state changes cannot be made internally as a result of a bad function call."*

The Tier 2 verbs (`scan`, `filter`, `map`, `fold`, `sort`, `group`, `dedupe`) are *read-only*. They can be re-evaluated freely, multiple times per execution, in parallel stages, without audit. The HITL modal is triggered only when the chain's output is consumed by a write verb (`exec`, `write`, `assign`). This is why the bridge script can re-execute a read-only chain without human approval.

Per the Cluster 0 sub-report (`cluster_0_odonnell.md` §"Connections", Connection 2): *"O'Donnell's 'reads are free' claim is the rationale for cheap Tier 2 verbs — they can be re-evaluated freely because they never mutate state, so they can be re-evaluated freely, multiple times per execution, in parallel stages, without audit."*

---
## 7. Open Questions for Follow-up B (≥6)

The follow-up B track (interpreter prototype) must answer these open questions. Each is a design decision the interpreter has to make.

1. **How does `arena { }` map to Onat's preemptive scatter?** Is the block itself a tape-drive region, or is `arena` a wrapper that allocates a tape for the block's contents? The interpreter must decide whether `arena { ... }` is a parser hint (the parser pre-scatters) or a runtime directive (the runtime allocates a tape). The trade-off is parser-time optimization versus runtime flexibility.

2. **Where does "intent resolution" live?** Is it a per-verb option, a per-block modifier, or a global parser mode? The `fuzzy` verb declares a parse-tolerance region; is tolerance a property of the verb, of the block, or of the whole program? The interpreter must decide how `fuzzy` composes with non-`fuzzy` verbs in the same chain.

3. **How does `audit` interact with `comms.log`?** Per `docs/guide_architecture.md` §"Telemetry & Auditing", the existing 5 log streams are `comms.log` (JSON-L for API traffic), `toolcalls.log` (markdown for tool invocations), `apihooks.log` (HTTP hook invocations), `clicalls.log` (subprocess details), and `scripts/generated/<ts>_<seq>.ps1` (preserved scripts). Is the DSL's audit log a 6th stream, or does it fold into one of the existing 5? Recommendation: a 6th stream (`audit.log`), because the DSL's audit is verb-level (every verb), while the existing 5 streams are tool-level (specific call types).

4. **Does `sandbox` produce `Result[T, ErrorInfo]` (the Fleury pattern) or a different envelope?** Per `data_oriented_error_handling_20260606/spec.md` §3.3, the canonical `Result[T]` is a dataclass with `data: T` and `errors: list[ErrorInfo]`. The `sandbox { ... }` block can either use this envelope or a different one (e.g., a `SandboxResult` with `stdout: str`, `stderr: str`, `exit_code: int`, `errors: list[ErrorInfo]`). The interpreter must decide.

5. **`didyoumean` recovery: parser feature or user-facing verb?** As a parser feature, the parser auto-corrects on parse failure and the user never sees the typo. As a user-facing verb, the parser logs the typo, the user writes `didyoumean "<typo>"`, and gets a suggestion. The interpreter must decide whether `didyoumean` belongs to the parse path or the runtime path.

6. **How does `for x .. n` interact with Tier 2's `filter`/`map`?** Is `for x .. n { body }` sugar for `[0, 1, ..., n - 1] -> map { body }` (matching the `[0, n)` range in §4.1)? Or are the two distinct, since the for-loop binds a named variable while the pipeline refers to elements by anonymous position? The interpreter must decide whether the user's pseudocode `for col .. m.columns { body }` is syntactic sugar for the array-language `iota m.columns { ... }`.

7. **How does `sandbox` map to Manual Slop's `pre_tool_callback` flow?** Should the `sandbox` block's audit log be a separate JSON-L file, or fold into the existing `comms.log` and `toolcalls.log`? (This is question 3 again, but specifically about the runtime path: what happens when the bridge script actually executes a `sandbox { write "tmp/x" "data" }`?)

8. **Connection to `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER`:** what is the minimum subset of the report's vocab that would let the placeholder track (a) write a bridge script and (b) demonstrate one end-to-end round-trip? The placeholder's per-MCP grammar design (per `mcp_architecture_refactor_20260606/spec.md` §12.1) needs at least 1 Tier 1 verb, 1 Tier 2 verb per sub-MCP, and 1 Tier 4 verb (probably `sandbox` or `audit`). For a round-trip through a single sub-MCP, the minimum subset is therefore 3 verbs, plus the grammar.

---

## Appendix: Bibliography
### A.1 External prior art

**Cluster 0 — Immediate-Mode Paradigm:**
- John O'Donnell, "IMGUI" — `https://johno.se/book/imgui.html` (widgets as method invocations, frame shearing, deferred display)
- John O'Donnell, "The Pitch" — `https://johno.se/book/pitch.html` (paradigm shift, GPU advances, Controller as procedural composer)
- John O'Donnell, "Immediate Mode MVC" — `https://johno.se/book/immvc.html` (book roadmap, IEventTarget centrality)
- John O'Donnell, "MVC" — `https://johno.se/book/mvc.html` (reads free/writes formalized, IEventTarget pattern, scene-graph prohibition)

**Cluster 1 — Concatenative (Forth family):**
- Forth — `https://en.wikipedia.org/wiki/Forth_(programming_language)` (RPN, dictionary, colon-word, threaded code, self-hosting)
- ColorForth — `https://en.wikipedia.org/wiki/ColorForth` (color-encoded semantics)
- KYRA/VAMP (Onat Türkçüoğlu) — `C:\projects\forth\bootslop\references\kyra_in-depth.md` (2-register stack, magenta pipe, basic blocks, lambdas, FFI), `forth_day_2020_in-depth.md` (ColorForth + SPIR-V)
- x68/5th (Timothy Lottes) — `C:\projects\forth\bootslop\references\neokineogfx_in-depth.md` (folded interpreter, 32-bit granularity, annotation overlay), `blog_in-depth.md` (source-less evolution, "Ear"+"Toe"), `Architectural_Consolidation.md` (synthesis)
- Onat/Lottes X.com thread — `C:\projects\forth\bootslop\references\X.com - Onat & Lottes Interaction 1.png.ocr.md` (direct quotes on register file as aliased namespace, preemptive scatter, "no stacks")
- Joy — `https://en.wikipedia.org/wiki/Joy_(programming_language)`, `http://joylang.org/` (purely functional concatenative, quotations as first-class values, combinator library)
- CoSy (Bob Armstrong) — `https://cosy.com/CoSy/Simplicity.html` (TimeStamped notebook/log, 3-cell headers, modulo indexing, APL-via-K vocabulary), `https://cosy.com/4thCoSy/` (4thCoSy repo)

**Cluster 2 — Array:**
- APL (Kenneth Iverson) — `https://en.wikipedia.org/wiki/APL_(programming_language)`, `https://www.dyalog.com/`
- K / q (Arthur Whitney) — `https://en.wikipedia.org/wiki/K_(programming_language)`, `https://kx.com/`
- BQN (Marshall Lochbaum) — `https://mlochbaum.github.io/BQN/`
- Uiua (Kai Schmidt) — `https://www.uiua.org/`, `https://github.com/uiua-lang/uiua`

**Cluster 3 — Intent-Mapping:**
- Jofito (Jody Bruchon) — `https://codeberg.org/jbruchon/jofito` (README 2026 UPDATE NOTE: "intent mapping engine"), `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt` (full video transcript, 428 lines)
- jq (Stephen Dolan) — `https://en.wikipedia.org/wiki/Jq_(programming_language)`, `https://jqlang.org/`
- nagent's tag protocol — `conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md` (lines 210-230 for the Bridge DSL), `decisions.md` (line 50: user rejects XML/JSON; lines 117-134: Candidate 4: Intent-based DSL for Meta-Tooling)
- WebAssembly — `https://en.wikipedia.org/wiki/WebAssembly`

**Cluster 4 — Meta-Tooling DSLs:**
- `mcp_dsl_20260606` placeholder — `conductor/tracks/mcp_architecture_refactor_20260606/spec.md` §12.1 and §13.1 (per-MCP grammar, 8x token reduction, backward compat)
- nagent's Bridge DSL — `conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md` lines 216-230
- OpenAI function-calling — `https://platform.openai.com/docs/guides/function-calling`
- Anthropic tool-use — `https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/define-tools`

**Cluster 5 — SSDL:**
- `docs/reports/computational_shapes_ssdl_digest_20260608.md` §1 (6 primitives + 7 modifiers)

**Cluster 7 — Result convention:**
- `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §3.3 (Result[T], ErrorInfo, 12 ErrorKind values)
### A.2 Project's own references

**Existing tracks and reports:**

- `conductor/tracks.md` — active tracks registry
- `conductor/workflow.md` — the workflow rules (4-phase pattern, TDD, git notes)
- `conductor/product.md` — the product vision
- `conductor/tech-stack.md` — the tech stack constraints
- `conductor/code_styleguides/` — the styleguides (Python style, error handling, workspace paths, etc.)
- `docs/Readme.md` — the doc index
- `docs/ideation/ed_chunk_data_structures_20260523.md` — the existing ideation doc; same style/format as this report

**Per-source-file guides:**

- `docs/guide_architecture.md` — threading model, event system, HITL, telemetry
- `docs/guide_meta_boundary.md` — Application vs Meta-Tooling split
- `docs/guide_tools.md` — MCP Bridge security, 45 tools, Hook API, ApiHookClient
- `docs/guide_mma.md` — 4-tier Multi-Model Architecture
- `docs/guide_context_aggregation.md` — the 518-line `aggregate.py` pipeline (3 strategies, 7 view modes)
- `docs/guide_command_palette.md` — 33 commands, fuzzy search, "Everything" mode
- `docs/guide_rag.md` — opt-in RAG (ChromaDB)
- `docs/guide_state_lifecycle.md` — undo/redo, HistoryManager, state delegation
- `docs/guide_testing.md` — 251 test files, 7 conftest fixtures
- `docs/guide_personas.md` — persona management
- `docs/guide_workspace_profiles.md` — docking layout profiles

**Track-internal references (recent):**

- `conductor/tracks/data_oriented_error_handling_20260606/spec.md` — the Result[T] convention
- `conductor/tracks/nagent_review_20260608/nagent_review_v2_1_20260612.md` — 4 memory dimensions, RAG integration discipline, stable-to-volatile cache ordering
- `conductor/tracks/mcp_architecture_refactor_20260606/spec.md` — the SubMCP architecture (the target the DSL maps to)
- `conductor/tracks/code_path_audit_20260607/spec.md` — the data-oriented pattern for static analysis

**Reports:**

- `docs/reports/computational_shapes_ssdl_digest_20260608.md` — SSDL 6 primitives + 7 modifiers
- `docs/reports/ascii_sketch_ux_workflow_20260608.md` — the user's ideation workflow convention

### A.3 Sub-reports (the research basis for §2)

- `research/cluster_0_odonnell.md` (338 lines) — Cluster 0 synthesis
- `research/cluster_1_concatenative.md` (209 lines) — Cluster 1 synthesis
- `research/cluster_2_array.md` (218 lines) — Cluster 2 synthesis
- `research/cluster_3_intent_mapping.md` (241 lines) — Cluster 3 synthesis
- `research/cluster_4_meta_tooling_dsls.md` (313 lines) — Cluster 4 synthesis

# Cluster 0 — Immediate-Mode Paradigm (Philosophical Anchor)

**Sub-report for Section 2 of the main report: "Intent-Based Scripting Languages"**

**Track: `intent_dsl_survey_20260612`**

**Author: Tier 2 sub-agent (research dispatch)**

**Sources: John O'Donnell — `https://johno.se/book/` (IMGUI / The Pitch / MVC / IM-MVC roadmap)**

---

## Introduction

This sub-report covers the single entry for Cluster 0: John O'Donnell's *Immediate Mode Model/View/Controller* (2007–2008), a working manuscript published across four interconnected pages at `johno.se/book/`. Cluster 0 is the philosophical anchor for the entire report — the four anchor claims in Section 1 (widgets are method invocations, reads are free/writes are formalized, IEventTarget, no scene-graph abstractions) all derive from O'Donnell's work and must be understood before the other clusters can be properly situated.

O'Donnell's book was written in the context of game development (specifically Massive Entertainment's Ground Control series), but its core arguments are framework-agnostic. The central thesis — that visualization is not inherently stateful, and that retained-mode UI toolkits impose a synchronization burden that is unnecessary given modern GPU capabilities — applies directly to the DSL's Meta-Tooling tier. The DSL's verbs (sandbox, audit, intent_mapping, sandbox_execute) are not merely "secure" or "auditable" — they are architecturally faithful to O'Donnell's invariants.

---

## Entry: John O'Donnell — IMGUI / The Pitch / MVC

### What the Work Is

John O'Donnell's in-progress book (*Immediate Mode Model/View/Controller*, 2007–2008) lays out a unified paradigm for game UI and application architecture. The core claim across all four pages is that **visualization is not inherently stateful** — the dominant assumption in OOP toolkits (MFC widgets, Ogre scene graphs, HTML DOM) is a historical artifact, not a technical necessity. O'Donnell calls this the "broken paradigm" and argues it is the root cause of synchronization complexity between application state and UI state.

The four pages serve distinct roles in the overall argument:

- **`imgui.html`** — The canonical IMGUI essay: defines widgets-as-method-invocations, presents a complete C++ `Gui` class with buttons/radios/edit boxes/tree controls/combo boxes/sliders/drag-and-drop, and distinguishes deferred vs. direct display. This is the most concrete page — it has actual code for every widget type.
- **`pitch.html`** — "The Pitch": frames IMGUI as a paradigm shift, attacks the retained-mode premise in detail, introduces the Controller as the per-frame "programmer" of View, and argues that GPU advances have eliminated the performance justification for retained mode. It traces the history from DirectX 3's Retained/Immediate Mode split through to modern GPU batch rendering (Jungle Peak's 800,000-vertex single draw call).
- **`immvc.html`** — The book roadmap: maps the six-chapter structure (IMGUI → MVC/E → Persistence), explicitly names `IEventTarget` as central to multiplayer and async design, traces the author's design journey from Ground Control via Josephine/GC2 to MVC/E, and outlines the experience progression that led to the architecture. This page also contains the design rationale for why a single event interface is superior to separate read/write interfaces.
- **`mvc.html`** — The MVC chapter proper: defines `Model` (const-only access), `View` (procedural, stateless), `Controller` (per-frame orchestrator), formalizes the **"reads are free, writes are formalized"** invariant via a single `IEventTarget` interface, shows how the pattern extends transparently across a network, and details the Director pattern for managing local/listen/dedicated server modes.

### What We Take From It

The DSL's Meta-Tooling tier builds on O'Donnell's immediate-mode philosophy in four specific ways:

1. **Widget identity is an illusion.** A widget is a method call, not an object. This maps directly to the DSL's treatment of verbs (sandbox, audit, intent_mapping) as stateless procedure calls, not stateful resources. The execution context is created fresh at call time and torn down at return time.
2. **Reads are free, writes are formalized.** Every write to Model state must pass through `IEventTarget`. The DSL inherits this invariant: every Tier 4 verb that mutates state must be a formal event, not a direct write. The const Model reference is the only handle the execution context holds.
3. **The IEventTarget pattern is a universal event bus.** O'Donnell shows that a single interface covering all state-change events (including visualization callbacks) works better than separating read and write interfaces. The DSL's verb dispatch inherits this pattern: one interface, multiple implementations (local Model, audit logger, remote proxy).
4. **View must not expose scene-graph abstractions.** The MVC chapter explicitly forbids exposing mesh/transform pair abstractions in View's public interface; instead it must be `view::drawMesh(mesh, transform, ...)`. The DSL's sandbox/execute verbs enforce this: the sandboxed execution context is a flat procedure, not a hierarchical object graph.

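
Point 3 can be made concrete with a minimal C++ sketch of one event interface and several implementations. It illustrates the pattern under stated assumptions and is not O'Donnell's code or the DSL's actual dispatch layer. The event names `createEntity`/`destroyEntity` and the classes `AuditLogTarget` and `FanOutTarget` are hypothetical. So is the `spawnIfEmpty` verb. A remote proxy would be a further implementation that serializes each call; it is omitted here.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical event interface in the spirit of IEventTarget: every state
// change is one of these calls; nothing else is allowed to mutate Model.
struct IEventTarget {
    virtual ~IEventTarget() = default;
    virtual void createEntity(int id) = 0;
    virtual void destroyEntity(int id) = 0;
};

// Plain public data; verbs only ever receive it as const Model&.
struct Model {
    std::vector<int> entities;
};

// Implementation 1: applies events to the local Model.
class LocalModelTarget : public IEventTarget {
public:
    explicit LocalModelTarget(Model& m) : model_(m) {}
    void createEntity(int id) override { model_.entities.push_back(id); }
    void destroyEntity(int id) override {
        auto& e = model_.entities;
        e.erase(std::remove(e.begin(), e.end(), id), e.end());
    }
private:
    Model& model_;
};

// Implementation 2: records events instead of applying them (audit trail).
class AuditLogTarget : public IEventTarget {
public:
    void createEntity(int id) override { log.push_back("create " + std::to_string(id)); }
    void destroyEntity(int id) override { log.push_back("destroy " + std::to_string(id)); }
    std::vector<std::string> log;
};

// Forwards each event to two targets; the caller cannot tell the difference.
class FanOutTarget : public IEventTarget {
public:
    FanOutTarget(IEventTarget& a, IEventTarget& b) : a_(a), b_(b) {}
    void createEntity(int id) override { a_.createEntity(id); b_.createEntity(id); }
    void destroyEntity(int id) override { a_.destroyEntity(id); b_.destroyEntity(id); }
private:
    IEventTarget& a_;
    IEventTarget& b_;
};

// Hypothetical verb: reads Model through const&, writes only via events.
void spawnIfEmpty(const Model& model, IEventTarget& events, int id) {
    if (model.entities.empty()) events.createEntity(id);
}
```

Because `spawnIfEmpty` only sees `const Model&` and an `IEventTarget&`, the audit trail comes from the interface itself. No verb needs its own logging code.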
---

## Background: The Intellectual Lineage

### The MVC Origins

O'Donnell traces MVC to Trygve Reenskaug's original 1979 work at Xerox PARC, where the pattern was conceived for the Smalltalk environment. O'Donnell notes the key separation:

> "multiple views example: Model, PieChart, SpreadSheet, BarChart — Model is state; Views (potentially many) visualize state; Controller reacts to user input in order to manipulate Model." — `pitch.html`, "Origins" section

The classic MVC pattern, as implemented in Smalltalk's MVC and later in MFC's Document/View, assumed that Views are stateful — implemented as objects with encapsulated state and behavior. O'Donnell accepts the premise of MVC (the separation of Model, View, and Controller as distinct roles) but rejects the stateful View assumption as the root cause of synchronization complexity.

### MFC's Document/View as the Cautionary Example

O'Donnell uses MFC's Document/View as the contrasting case. He acknowledges its practical usefulness, but points out that its View also takes on the Controller's role:

> "Compare to MFC's Document/View, where MFC's View acts as both Controller (handles input) and View (output/visualisation)... Document/View is quite useful, because very often the context in which user input is applied depends on visualisation (i.e. a scrolling view of a document)." — `pitch.html`, "Origins" section

MFC's approach collapsed the Controller into the View, eliminating the per-frame compositional role that O'Donnell's Controller plays. The result was a widget toolkit where every window was simultaneously a View (visualizing state) and a Controller (handling input), with no clean separation between the two roles.

### The DirectX 3 Historical Irony

O'Donnell notes a striking historical irony in the evolution of graphics APIs:

> "Observe, somewhat ironically, that DirectX 3, ca. 1996 had 2 modes of operation for graphics, namely Retained Mode and Immediate Mode. At least before DirectX 6 in 1998, Retained Mode was dropped from the API, because game devs simply did not use it. They wanted more control." — `pitch.html`, "Origins" section

The industry had already rejected retained mode at the API level by 1998, but then re-created it as an application-level pattern (scene graphs, instance abstractions) on top of the immediate-mode GPU interface. O'Donnell argues that game developers should have gone all the way: not just to a low-level immediate-mode API, but to an application architecture that is also immediate-mode at the UI level.

### The Ground Control Experience Progression

O'Donnell traces his own design journey in three stages, beginning with his work at Massive Entertainment:

**Ground Control (GC):** Introduced the client/server model with separate local and remote representations of game entities. The initial architecture used message-based communication between IGame (server) and IPlayer (client) implementations.

**Josephine and GC2:** The persistence system (Juice) evolved into a data definition language, persistence scheme, and runtime memory format. O'Donnell came to see great value in being able to inspect data, **derive** other data from it, and visualize it in several different ways. The experience with GC2's unit relations (bi-directional pointers, entity state caches) showed how duplicated state across IPlayer implementations became a maintenance burden.

**MVC/E:** The final architecture that emerged: Model (singleton with const-only access), View (procedural, stateless), Controller (per-frame composer), and IEventTarget (single formal interface for all state changes). The key realization was that state duplication, even within a single application, is the source of synchronization bugs.

This progression is documented in detail on `immvc.html`, which contains O'Donnell's "experience progression" narrative from GC through Josephine/GC2 to MVC/E.

### GPU Batch Rendering as the Performance Vindication

O'Donnell provides an empirical result that directly undercuts the performance argument for retained mode:

> "In DirectX9 is possible to render very large batches of primitives per draw call. At Jungle Peak we rendered 800 000+ vertices in a single call on nVidia GeForce 6 class hardware, with good performance. The meant a number of things, such as discarding the concept of camera culling. We simply batched together all instances of a particular mesh into a single huge vertex/index buffer pair (one per texture basically), and sent them all to the hardware with very few calls." — `pitch.html`, batch rendering section

If 800,000 vertices can be rendered in a single draw call, there is little performance justification left for the complex state management that retained-mode scene graphs require. The GPU is not the bottleneck; the CPU-side state management is. This empirical result is the quantitative foundation for O'Donnell's claim that the retained-mode premise "no longer holds."

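
The batching described in the quote can be sketched on the CPU side as follows. This is a minimal C++ illustration with simplifying assumptions. Each instance is a pure translation, and textures are ignored; the quote mentions roughly one buffer pair per texture. `Mesh`, `Batch`, and `batchInstances` are hypothetical names, and uploading the result through a GPU API is left out.

```cpp
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

struct Mesh {
    std::vector<Vec3> vertices;          // local-space vertices
    std::vector<std::uint32_t> indices;  // triangles into `vertices`
};

// One combined vertex/index buffer pair: the payload of a single draw call.
struct Batch {
    std::vector<Vec3> vertices;
    std::vector<std::uint32_t> indices;
};

// Bake every instance of a mesh (here, one translation per instance) into a
// single batch. Each copy's indices are offset so they still reference that
// copy's own vertices.
Batch batchInstances(const Mesh& mesh, const std::vector<Vec3>& translations) {
    Batch batch;
    batch.vertices.reserve(mesh.vertices.size() * translations.size());
    batch.indices.reserve(mesh.indices.size() * translations.size());
    for (const Vec3& t : translations) {
        const auto base = static_cast<std::uint32_t>(batch.vertices.size());
        for (const Vec3& v : mesh.vertices)
            batch.vertices.push_back({v.x + t.x, v.y + t.y, v.z + t.z});
        for (std::uint32_t i : mesh.indices)
            batch.indices.push_back(base + i);
    }
    return batch;
}
```

Each returned `Batch` maps to one vertex/index buffer pair. N instances of a mesh therefore cost one draw call instead of N, at the price of re-baking the buffer when instances move.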
---

## Terminology Glossary

To make the Connections section legible, the following O'Donnell-specific terms are defined here:

**IMGUI (Immediate Mode GUI):** A UI paradigm where widgets are method calls, not persistent objects. The client application passes all state required for a widget at call time; the widget has no internal state that persists between calls. Contrast with "retained mode" where widgets are objects with encapsulated state.

**Retained Mode:** The dominant UI paradigm where widgets are objects that persist across frames and cache application state internally. Requires explicit synchronization between the application's state and the widget's cached state. The target of O'Donnell's critique.

**Model:** The authoritative source of application state. In O'Donnell's MVC/E, Model is a singleton with const-only external access (`const Model&`). All state that needs to survive across frames lives in Model. URL: `https://johno.se/book/mvc.html` — "Model" section.

**View:** The input/output layer. From a client (Controller) perspective, View is completely stateless — it exposes only a procedural interface (`drawMesh`, `drawRect`, etc.) with no retained state accessible to the client. View may cache internally for performance, but this cache is invisible to the client. URL: `https://johno.se/book/mvc.html` — "View" section.

**Controller:** The per-frame orchestrator. Each frame, Controller traverses Model's state and "programs" View to produce the current visualization. Controller is the only component that holds both a View reference (for writing output) and an IEventTarget reference (for writing to Model). URL: `https://johno.se/book/pitch.html` — "MVC revisited" section.

**IEventTarget:** The single formal interface through which all state changes flow. A pure virtual C++ class defining all possible events (`CreateEntity`, `DestroyEntity`, etc.). Both local Model and network proxies implement this interface identically. URL: `https://johno.se/book/mvc.html` — "Writing to Model state" section.

**MetaController:** A parent Controller that manages switching between multiple child Controllers (e.g., PlayController and EditController). Enables instant switching between radically different input schemes and visualizations without any cleanup. URL: `https://johno.se/book/mvc.html` — "Controller" section.

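
A minimal C++ sketch of the MetaController idea, with simplified interfaces. Here the controllers take only `const Model&` and a `View&`; the per-frame `IEventTarget` parameter is omitted, and `View` just records text commands. `PlayController` and `EditController` are illustrative reconstructions named after the glossary example, not O'Donnell's code. `IController`, `View`, and the draw strings are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Model { int score = 0; };

// Hypothetical View: records draw commands instead of rendering them.
struct View {
    std::vector<std::string> commands;
    void drawText(const std::string& s) { commands.push_back(s); }
};

struct IController {
    virtual ~IController() = default;
    virtual void update(const Model& model, View& view) = 0;
};

struct PlayController : IController {
    void update(const Model& m, View& v) override {
        v.drawText("play: score " + std::to_string(m.score));
    }
};

struct EditController : IController {
    void update(const Model&, View& v) override { v.drawText("edit: palette"); }
};

// Children keep no UI state (everything is redrawn from Model each frame),
// so switching is just changing which child runs next frame.
class MetaController {
public:
    MetaController() {
        children_.push_back(std::make_unique<PlayController>());
        children_.push_back(std::make_unique<EditController>());
    }
    void select(std::size_t i) { active_ = i; }  // caller passes a valid index
    void update(const Model& m, View& v) { children_[active_]->update(m, v); }
private:
    std::vector<std::unique_ptr<IController>> children_;
    std::size_t active_ = 0;
};
```

Switching needs no teardown because neither child owns retained UI state. The next frame simply draws from Model through the other controller.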
**Director:** The top-level orchestrator that manages local/listen/dedicated server modes. Encapsulates the configuration of Model, View, Client (remote proxy), and Server. URL: `https://johno.se/book/mvc.html` — "The Director" section.

**Frame shearing:** A phenomenon in real-time IMGUI where a user interaction handled partway through frame N changes application state that controls the UI's appearance. Widgets generated earlier in frame N were drawn from the old state, and widgets generated later are drawn from the new state, so the displayed image mixes the two. O'Donnell's solution is a "shearing exception" that restarts GUI generation for the current frame. URL: `https://johno.se/book/imgui.html` — "Frame shearing" section.

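
The shearing-exception idea can be sketched in C++ as two parts: a GUI pass that throws when an interaction changes state mid-frame, and a frame runner that catches the exception and regenerates the GUI from the top. This is a hypothetical, minimal illustration. `ShearingException`, `AppState`, `Frame`, `guiPass`, and `runFrame` are invented names, and the click is simulated. For brevity the widget writes state directly rather than through an `IEventTarget`.

```cpp
#include <string>
#include <vector>

// Thrown by a widget whose interaction changed state partway through a frame.
struct ShearingException {};

struct AppState { bool advanced = false; };

struct Frame {
    std::vector<std::string> drawn;  // what this frame's GUI pass emitted
    bool clickPending = true;        // one simulated click, consumed once
};

// A toggle whose click changes which widgets appear. Rather than finishing a
// frame drawn half from old state and half from new, it throws.
void guiPass(AppState& s, Frame& f) {
    f.drawn.clear();
    f.drawn.push_back("toggle");
    if (f.clickPending) {
        f.clickPending = false;      // consume the input exactly once
        s.advanced = !s.advanced;
        throw ShearingException{};
    }
    if (s.advanced) f.drawn.push_back("advanced options");
    f.drawn.push_back("ok");
}

// Re-run the GUI pass until it completes without shearing; returns pass count.
int runFrame(AppState& s, Frame& f) {
    int passes = 0;
    for (;;) {
        ++passes;
        try {
            guiPass(s, f);
            return passes;
        } catch (const ShearingException&) {
            // state changed mid-frame: regenerate the whole GUI from the top
        }
    }
}
```

The loop terminates only because the simulated click is consumed on the first pass. Any real variant of this scheme likewise depends on each input event being consumed exactly once per frame.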
**Deferred display:** A display strategy where widget drawing calls are buffered (e.g., into a vertex buffer) and flushed all at once, rather than rendering immediately. Used in hardware-accelerated applications where batching primitives is more efficient than immediate rendering. URL: `https://johno.se/book/imgui.html` — "Deferred display" section.

---

## Detailed Analysis

### Anchor Claim 1: "Widgets Are Method Invocations, Not Objects"

**Source:** `https://johno.se/book/imgui.html` — "Immediate Mode applied" section, third paragraph:

> "Widgets, logically, change from being objects to being method invocations."

#### The Broken Paradigm

O'Donnell opens the essay with a direct attack on the foundational assumption of all major UI toolkits:

> "There is a dominant paradigm within programming since (forever?), and that simply: ***The user interface and / or visualization of any program is inherently stateful.*** I maintain that this is a broken paradigm. Not that such things CANNOT be stateful; the current state of various software technlogies are indeed based upon this paradigm. I will however argue that avoiding such statefulness **significantly** simplifies software." — `imgui.html`, "The broken paradigm" section

The word "broken" is used deliberately: O'Donnell is not saying stateful UIs are impossible or that they don't work — he is saying they carry a structural complexity burden that is unnecessary. The complexity is not in the problem domain (building user interfaces is genuinely hard) but in the solution domain (retained-mode toolkits amplify that difficulty by adding a synchronization layer that the problem doesn't require).

#### The State-Copy Problem

The mechanism by which retained mode introduces complexity is the state copy / cache:

> "I maintain that much of the complexity associated with the design and use of of traditional user interface systems is a direct result of the tendency of such systems to retain state. The programmer is typically required to actively copy state back and forth between the application and the user interface in order for the user interface to reflect the state of the application, and conversely, for changes that happen in the user interface to affect the state of the application." — `imgui.html`, "The woes of caching state" section

This is the core observation: retained-mode UI toolkits don't just happen to have state — they *require* the programmer to actively manage a copy of application state in the UI layer. The copy is not a side effect; it is the design contract. O'Donnell names this explicitly:

> "This is the basic problem; this state (inherent to the user interface system) is a COPY / CACHE of the REAL state, which is owned by and resides with in the specific application itself." — `imgui.html`, "The woes of caching state" section

The emphasis on "COPY / CACHE" and "REAL state" is O'Donnell's terminological choice. The UI system has its own copy; the application has the real copy; the two must be kept in sync. Every synchronization point is a potential bug source: missed updates, stale reads, circular dependencies in the update direction.

#### The Three-Way Synchronization Burden

O'Donnell describes the synchronization burden in detail:

> "The user interface, from the point of view of the client application, most often looks like a collection of objects, typically one per 'widget', which encapsulate state that needs to be frequently synchronized with that of the application. Such synchronization goes both ways; state moves from the application to the user interface in order for that state to become visible to the user, and state moves from the user interface back to the application when the user interacts with the interface in order to change the state of the application." — `imgui.html`, "The woes of caching state" section

The "both ways" synchronization is the key burden. In a typical retained-mode toolkit:

1. Application → UI: application pushes state to widget objects so the widget can display it
2. UI → Application: widget fires events; application pulls state from widget objects to update application state

This bidirectional push/pull is the synchronization overhead O'Donnell targets. It is not a bug in any particular toolkit — it is a structural consequence of the retained-mode design choice.

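
Both sync directions, plus the callback that triggers the pull, show up side by side in a minimal C++ sketch. `RetainedCheckBox` is a hypothetical stand-in for a typical toolkit widget, not any real library's API. `checkBox` is a hypothetical immediate-mode equivalent that takes the application's own `bool` by reference.

```cpp
#include <functional>
#include <utility>

// Retained mode: the widget owns a COPY of the application's flag, so both
// directions of synchronization are the application's job.
class RetainedCheckBox {
public:
    void setChecked(bool c) { checked_ = c; }            // app -> UI push
    bool isChecked() const { return checked_; }           // UI -> app pull
    void onToggled(std::function<void()> cb) { cb_ = std::move(cb); }
    // Simulates the toolkit handling a mouse click on the widget.
    void click() {
        checked_ = !checked_;
        if (cb_) cb_();
    }
private:
    bool checked_ = false;                                // the cached copy
    std::function<void()> cb_;
};

// Immediate mode: no copy. The application's own bool is passed in on every
// call; a click is applied to it directly and reported through the result.
bool checkBox(bool& value, bool clickedThisFrame) {
    if (clickedThisFrame) value = !value;
    // a real toolkit would draw the box here, directly from `value`
    return clickedThisFrame;
}
```

In the retained version the application must push with `setChecked` and pull inside the callback. If it later changes its own flag without pushing again, the widget shows stale state. The immediate version has no second copy to go stale.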
#### The Callback Complexity Layer

On top of the synchronization burden, retained-mode toolkits add callback complexity:

> "Additionally, the manner in which the application is notified of user interactions with the interface (which in turn signals a need for re-syncing of state) often takes the form of callbacks. This requires the application to implement 'event handlers' for any low-level interaction that is of interest, often by subclassing some toolkit baseclass either manually or via various code generation tricks; in either case further complicating the life of the client application." — `imgui.html`, "The woes of caching state" section

The callback pattern is itself a form of indirection that O'Donnell identifies as a source of complexity. The callback fires when the widget state changes; the application must then pull the new state from the widget object and reconcile it with the application state. This adds a third synchronization step (widget state changes → callback fires → application pulls the new value from the widget → application state updated), layered on top of the bidirectional sync.

#### The IMGUI Alternative: No State to Synchronize

O'Donnell's alternative eliminates the problem at the root:

> "**IMGUI** does away with this type of state synchronization by requiring the application to explicitly pass all state required for visualization and interaction with any given 'widget' in real-time. The user interface only retains the minimal amount of state required to facilitate the functionality required by each type of widget supported by the system." — `imgui.html`, "Immediate Mode applied" section

The phrase "only retains the minimal amount of state" is precise. O'Donnell is not claiming IMGUI is completely stateless — edit boxes need to track which string has focus, sliders need to track the drag handle position, tree controls need to track expand/collapse state. But the retained state is *minimal* and *internal to the widget type*, not a copy of application state. The application state lives in one place (the application), and the UI visualizes it by receiving it as call parameters.

#### The Conceptual Shift: Widgets as Method Calls

O'Donnell states the conceptual shift in the clearest possible terms:

> "With **IMGUI**, a conceptual shift occurs. Widgets are no longer objects at all, and can't really be said to 'exist'. They take instead the form of procedural method calls, and the user interface itself goes from being as stateful collection of objects to being a real time sequence of method calls." — `imgui.html`, "Immediate Mode applied" section

The phrase "can't really be said to 'exist'" is the key: a widget in IMGUI is not an entity that persists in memory, has identity, and holds state. It is a procedure that runs, does its work, and returns. The "widget" is the call; the call is the widget.

#### The Enabling Mechanism: Real-Time Loop

O'Donnell identifies the real-time application loop as the enabling mechanism:

> "Fundamental to this approach is the concept of a real-time application loop, where the application processes logic and draws its display at real-time rates (30 frames per second or more). In the context of games, this is already common practice." — `imgui.html`, "Immediate Mode applied" section

The real-time loop is what makes IMGUI feasible: at 30+ fps, the cost of re-creating widget state each frame is negligible compared to the cost of maintaining synchronization between retained-mode widget objects. The loop also means the UI is always displaying the current application state — there is no "last drawn" state that can become stale between frames.

#### Code Evidence: The button() Implementation

The most concrete evidence for the "widgets as method calls" claim is the actual code. O'Donnell's complete `button()` implementation:

```cpp
const bool Gui::button(const int aX, const int aY,
                       const int aWidth, const int aHeight,
                       const char* aText)
{
    drawRect(aX, aY, aWidth, aHeight);
    drawText(aX, aY, aText);

    return mouse::leftButtonPressed() &&
           mouse::cursorX() >= aX &&
           mouse::cursorY() >= aY &&
           mouse::cursorX() < (aX + aWidth) &&
           mouse::cursorY() < (aY + aHeight);
}
```

Three statements: draw the rectangle, draw the label, return the hit test. No button object. No state map. No event subscription. The return value is a `bool` (the interaction result) computed directly from the mouse state at call time. This is a method invocation, not an object.

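
The quoted `button()` depends on O'Donnell's own `drawRect`, `drawText`, and `mouse::` functions. Below is a self-contained C++ variant that runs without a window. It assumes the mouse state is passed in as a per-frame snapshot and that drawing is recorded into a list, and it keeps the same contract: draw, then return a `bool` computed at call time. `MouseState` and the draw list are hypothetical substitutions for the original helpers.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for mouse:: queries and draw calls, so the widget
// logic can run without a window or GPU.
struct MouseState {
    int x = 0, y = 0;
    bool leftPressed = false;
};

struct Gui {
    MouseState mouse;                   // this frame's input snapshot
    std::vector<std::string> drawList;  // stands in for drawRect/drawText

    // Same shape as the quoted button(): draw, then return the interaction
    // result computed from input at call time. No per-button object or ID.
    bool button(int x, int y, int w, int h, const char* text) {
        drawList.push_back("rect");
        drawList.push_back(text);
        return mouse.leftPressed &&
               mouse.x >= x && mouse.y >= y &&
               mouse.x < x + w && mouse.y < y + h;
    }
};
```

Typical use: set `gui.mouse` once per frame, then write `if (gui.button(10, 10, 80, 20, "OK")) { /* handle click */ }` wherever the button should appear. Nothing about the button outlives that call.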
#### Empirical Evidence: UfoPilot II Collapse

O'Donnell provides a quantitative before/after from his own project:

> "In one of my games, UfoPilot II : The Phadt Menace, the entire 'front-end' user interface was initially implemented in classic retained mode style. This was more or less equivalent to how MFC dialog boxes worked, in that I had a class for each specific 'screen', and instantiated an object of each of these classes as the user navigated throughout the interface. Each 'screen class' had multiple widget members, and layout was part of construction and much a manual issue where I would run the program, look at the placement of things, shut it down, edit the code, and repeat." — `imgui.html`, "An example of simplification" section

After porting to IMGUI:

> "Upon porting this user interface to **IMGUI**, with toolkit-methods being implemented as needed during the porting process (I built my Gui class as I went along, moving code from Widget classes to the Gui class), I gained several things: Firstly, in each case where there was a class for a 'screen', this collapsed from a class to a single method in a Menu class (which represented the entire collection of front-end screens and code). So where I had previously had about 10-15 classes I now had a single class. All of the widgets classes collapsed into methods of the Gui class, so again, where I previously had several classes I now had one." — `imgui.html`, "An example of simplification" section

Roughly 10-15 screen classes collapsed into one Menu class, and the widget classes collapsed into methods of a single Gui class. The mechanism: widget state that was previously stored in per-widget objects is now passed as call parameters by the client code.

#### The List Box: Strongest Example

The list box example is the clearest demonstration of the "widgets as method calls" principle:

> "Most user interface toolkits support the concept of a list box / list control. Interestingly this widget type is largely obselete with **IMGUI** (unless you explicitly require scrolling support; see the section on advanced features). Since a list is often simply a bunch of text labels, you can support that by simply doing the following... At this point it should be clear that the list box / list control concept doesn't exist per-se in **IMGUI**, as you can simply iterate application state and 'do a widget' per item in your collection." — `imgui.html`, "Hey, where's the list box?" section

The retained-mode list control is an object that manages selection state, scroll position, and item rendering internally. The IMGUI alternative: iterate the application data directly and call `radio()` per item. The selection state is stored in the application (`mySelection`), not in the widget. The widget call is the visualization; the data is the Model.

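
The pattern described above can be sketched in C++: iterate the data, make one `radio()` call per item, and keep the selection in the application. This is a minimal sketch under assumptions. `radio()` is assumed to return true when its item is clicked, and `Gui`, `clickedIndex` (a simulated hit test), and `itemList` are hypothetical; `mySelection` follows the report's naming.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical immediate-mode radio: draws one item and returns true if it
// was clicked this frame. `clickedIndex` simulates the mouse hit test.
struct Gui {
    int clickedIndex = -1;
    std::vector<std::string> drawn;
    bool radio(int index, const std::string& label, bool selected) {
        drawn.push_back((selected ? "(*) " : "( ) ") + label);
        return index == clickedIndex;
    }
};

// The "list box": iterate application data and do one widget per item.
// The selection lives in the application (mySelection), not in a widget.
void itemList(Gui& gui, const std::vector<std::string>& items, int& mySelection) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        const int idx = static_cast<int>(i);
        if (gui.radio(idx, items[i], idx == mySelection))
            mySelection = idx;
    }
}
```

Nothing persists between frames except `mySelection`, which the application owns. The next frame's pass draws the new selection from it.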
#### The Radio/Check/Tab Equivalence

O'Donnell notes a surprising consequence:

> "An interesting aspect of **IMGUI** is that the classic widget types radio button, check box, and tab (i.e. like in a property sheet) are functionally equivalent from a client perspective. The various methods are here only for aesthetic reasons, i.e. depending on your application one or the other may be more applicable." — `imgui.html`, "Radio buttons, check boxes, and tabs" section

This is a direct consequence of the "widgets as method calls" claim: if widgets are just method calls, then the distinction between radio, check, and tab is purely a presentation choice made by the caller (which method to call, and with which visual parameters), not a property of the widget itself. The widget has no internal state distinguishing radio from check from tab.

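
A minimal C++ sketch of that equivalence, assuming one shared `choice()` routine whose only style-dependent behavior is the decoration it draws. `Style`, `choice`, `clicked`, and the text decorations are hypothetical. The point is that `radio`, `check`, and `tab` expose the same client contract: a label and an on/off flag go in, a click `bool` comes out.

```cpp
#include <string>
#include <vector>

// Hypothetical: the three widgets differ only in the decoration drawn.
enum class Style { Radio, Check, Tab };

struct Gui {
    bool clicked = false;             // simulated hit test for each call
    std::vector<std::string> drawn;

    // Shared logic: draw with the requested decoration, report a click.
    bool choice(Style style, const std::string& label, bool on) {
        const char* deco = style == Style::Radio ? (on ? "(*)" : "( )")
                         : style == Style::Check ? (on ? "[x]" : "[ ]")
                                                 : (on ? "|tab*|" : "|tab|");
        drawn.push_back(std::string(deco) + " " + label);
        return clicked;
    }
    bool radio(const std::string& l, bool on) { return choice(Style::Radio, l, on); }
    bool check(const std::string& l, bool on) { return choice(Style::Check, l, on); }
    bool tab(const std::string& l, bool on)   { return choice(Style::Tab, l, on); }
};
```

Whether a control behaves as exclusive (radio, tab) or independent (check box) is decided by what the client does with the returned `bool`, not by the widget.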
**Take bullets (for Tier 1 copy into Section 1 anchor claims):**
|
||||
|
||||
- **[Anchor Claim 1 — primary]** "Widgets, logically, change from being objects to being method invocations." — `imgui.html`, "Immediate Mode applied" section, third paragraph. URL: `https://johno.se/book/imgui.html`
|
||||
- **[Anchor Claim 1 — root cause]** "This is the basic problem; this state (inherent to the user interface system) is a COPY / CACHE of the REAL state, which is owned by and resides with in the specific application itself." — `imgui.html`, "The woes of caching state" section.
- **[Anchor Claim 1 — mechanism]** The IMGUI `button()` is three lines: `drawRect`, `drawText`, return mouse-poll bool. No widget object, no state map, no ID. — `imgui.html`, "Implementing basic interactions" section.
- **[Anchor Claim 1 — empirical]** UfoPilot II front-end collapsed from ~10-15 classes to 1 class after porting to IMGUI. — `imgui.html`, "An example of simplification" section.
- **[Anchor Claim 1 — list box dissolution]** "The list box / list control concept doesn't exist per-se in **IMGUI**, as you can simply iterate application state and 'do a widget' per item in your collection." — `imgui.html`, "Hey, where's the list box?" section.
- **[Anchor Claim 1 — conceptual shift]** "Widgets are no longer objects at all, and can't really be said to 'exist'. They take instead the form of procedural method calls." — `imgui.html`, "Immediate Mode applied" section.
- **[Anchor Claim 1 — real-time loop]** "Fundamental to this approach is the concept of a real-time application loop, where the application processes logic and draws its display at real-time rates (30 frames per second or more)." — `imgui.html`, "Immediate Mode applied" section.
- **[Anchor Claim 1 — radio/check/tab equivalence]** "An interesting aspect of **IMGUI** is that the classic widget types radio button, check box, and tab... are functionally equivalent from a client perspective." — `imgui.html`, "Radio buttons, check boxes, and tabs" section.
- **[Anchor Claim 1 — three-way sync burden]** "State moves from the application to the user interface... and state moves from the user interface back to the application when the user interacts with the interface." — `imgui.html`, "The woes of caching state" section.
- **[Anchor Claim 1 — callback complexity]** "This requires the application to implement 'event handlers' for any low-level interaction that is of interest, often by subclassing some toolkit baseclass." — `imgui.html`, "The woes of caching state" section.
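The three-line button and the dissolved list box can be sketched in C++. This is a minimal illustration, not O'Donnell's code. `Ui`, `Rect`, `mouseInside`, `doItemList`, and the draw log are hypothetical; the log stands in for real rendering so that the behavior can be observed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins: the article names only drawRect/drawText and a
// mouse poll. Draw calls are logged here instead of rendered.
struct Rect { int x, y, w, h; };

struct Ui {
    int mouseX = 0, mouseY = 0;
    bool mouseClicked = false;         // true on the frame the click happened
    std::vector<std::string> drawLog;  // stands in for real rendering

    void drawRect(const Rect& r) {
        drawLog.push_back("rect " + std::to_string(r.x) + "," + std::to_string(r.y));
    }
    void drawText(const Rect&, const std::string& text) {
        drawLog.push_back("text " + text);
    }
    bool mouseInside(const Rect& r) const {
        return mouseX >= r.x && mouseX < r.x + r.w &&
               mouseY >= r.y && mouseY < r.y + r.h;
    }

    // The whole widget: no widget object, no state map, no ID.
    bool button(const Rect& r, const std::string& label) {
        drawRect(r);
        drawText(r, label);
        return mouseClicked && mouseInside(r);
    }
};

// No list-box widget: iterate application state and "do a widget" per item.
// Returns the index of the item clicked this frame, or -1.
int doItemList(Ui& ui, const std::vector<std::string>& items, int x, int y) {
    int clicked = -1;
    for (std::size_t i = 0; i < items.size(); ++i) {
        const int row = static_cast<int>(i);
        if (ui.button(Rect{x, y + row * 20, 100, 20}, items[i])) clicked = row;
    }
    return clicked;
}
```

`doItemList` reads `items` directly from application state every frame. Adding or removing an item therefore needs no synchronization step, because the next frame simply draws the new collection.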
---
### Anchor Claim 2: "Reads Are Free, Writes Are Formalized"

**Source:** `https://johno.se/book/mvc.html` — "Writing to Model state" section, second paragraph:

> "Writes to Model are formalized through the addition of IEventTarget. This is a pure virtual interface that defines all possible state changes / events on a system wide level. Controller will be passed an IEventTarget each frame, and any changes it wishes to make to Model must go through this interface."
#### The Type-Level Access Matrix

O'Donnell enforces the read/write asymmetry at the type level, starting from const-only access to Model:

> "First of all, View and Controller may only access Model in a const fashion. This has numerous repercussions. Firstly, exposing central Model state as public is ok, as it can only be read. Also, only const methods may be called, so state changes cannot be made internally as a result of a bad function call. This allows for a clear grouping of aspects of the Model into read and write categories." — `mvc.html`, "Reading Model state" section

The phrase "exposing central Model state as public is ok" runs against conventional OOP practice, where state is kept encapsulated behind accessors. O'Donnell's argument is that with const-only access, encapsulation buys nothing for reads: anyone can read public state, but no one can modify it without going through the formal channel. The encapsulation concern shifts entirely to writes.

O'Donnell's own code structure:

> "I personally let View hold a const Model&, and have the Controller baseclass supply a View&. This way View can access model in a const way, and Controller can access View in a non-const way, and via it Model in a const way. From the top of the App this is: App owns a Model, a View and a MetaController; View has a const& to Model; MetaController has a & to View, and passes this to each IController implementation." — `mvc.html`, "Reading Model state" section

The access paths are:
```
Controller → View& → const Model& (read)
Controller → IEventTarget& → Model (write)
View → const Model& (read)
```

Apart from App, which owns the Model, no component holds a non-const Model reference. Controller's only write handle is typed as `IEventTarget&`, even when the object behind it is the Model itself. This is the complete access matrix, and the type system enforces it rather than convention.
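The access matrix can be expressed directly in C++ types. This is a minimal sketch that assumes a single hypothetical event, `setPlayerName`. The member names are illustrative, and a plain `Controller` stands in for O'Donnell's MetaController/IController pair.

```cpp
#include <string>

// Hypothetical event interface: the one write path into Model.
struct IEventTarget {
    virtual ~IEventTarget() = default;
    virtual void setPlayerName(const std::string& name) = 0;
};

// Local Model implements IEventTarget. Its state is public because other
// components only ever see it through a const Model&.
struct Model : IEventTarget {
    std::string playerName;
    void setPlayerName(const std::string& name) override { playerName = name; }
};

// View holds a const Model& (read-only).
class View {
public:
    explicit View(const Model& m) : model_(m) {}
    const Model& model() const { return model_; }
    std::string title() const { return "Player: " + model_.playerName; }
private:
    const Model& model_;
};

// Controller gets a View& (and, via it, a const Model&) plus an IEventTarget&.
class Controller {
public:
    Controller(View& view, IEventTarget& target) : view_(view), target_(target) {}
    void rename(const std::string& name) {
        // view_.model().playerName = name;  // does not compile: Model is const here
        target_.setPlayerName(name);          // the formal write path
    }
    std::string title() const { return view_.title(); }
private:
    View& view_;
    IEventTarget& target_;
};
```

An App would own all three objects (`Model m; View v(m); Controller c(v, m);`). After `c.rename("Ada")`, `v.title()` shows the new name on its next read without any notification, because View never held a copy.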
#### Why Writes Are Formalized

O'Donnell states the invariant in a single sentence:

> "Writes to Model are formalized through the addition of IEventTarget." — `mvc.html`, "Writing to Model state" section

The word "formalized" is precise. A write is not merely a memory mutation. It is a formal event with a defined signature, defined semantics, and a defined recipient (the IEventTarget implementation). Because every write is a call on one interface, the formalization enables:
1. **Auditing:** every write can be recorded as an entry in an event stream
2. **Network transparency:** writes can be routed to a remote Model without the caller knowing
3. **Re-entrancy:** a write can trigger callbacks that come back through the same interface
4. **Verification:** a recorded event stream can be replayed against a verification Model
#### Why a Single Interface Beats Read/Write Separation

O'Donnell explicitly argues against separating the write interface from the notification interface:

> "Experience dictates that there only be a single IEventTarget interface that is responsible for all 'system events', rather than a 'write interface' and a 'notification / read' interface (for callbacks). Most often, the exact information that causes a change is the information required to visualise that change, and in other cases this information can be derived and looked up in the Model (by Controller or View)." — `mvc.html`, "Why only a single event interface" section

The argument has two parts. First, empirical: O'Donnell tried the separate-interface approach in GC2 (with IGame/IPlayer having separate "command" and "notification" methods) and found it led to state duplication and invariant violations. Second, theoretical: the data that drives a state change is the same data needed to visualize that change, so separating the "write" channel from the "notification" channel is redundant.
#### The Ground Control 2 Lesson: State Duplication Is the Problem

O'Donnell traces the architecture to its origins in Ground Control 2's client/server model:

> "The architecture used in Ground Control 2 (which evolved into this architecture) was a plain remote proxy architecture, involving an IGame and IPlayer pair. IGame represented the 'server' (which is analogous to Model), while IPlayer represented a 'client' (which is analogous to both View and Controller, with no real clear definition in between, as well as a cache of state that can be viewed as a subset of Model)." — `mvc.html`, "Why only a single event interface" section

The problems O'Donnell encountered with the GC2 approach:

**Problem 1 — Forced conceptual leakage:** "the server/Model was forced to have an internal concept of 'players' in order for the remote cases to work, even though the concept of a 'player' had no real logical place in the context of the game."

**Problem 2 — State duplication with implicit invariants:** "there was no shared state between a 'game' and a 'player'. This implied many invariants that were difficult to maintain. For example, IPlayer::EntityCreated(id) implied that some later IPlayer method call could reference that id and have it implicitely refer to a unit that was assumed to have been created."

**Problem 3 — IPlayer cache pollution:** "Due to the fact that we had several implementations of IPlayer (Player, RemotePlayer, ScriptPlayer, and AIPlayer), the amount of duplication of similar 'stateful' concepts, such as the above mentioned 'entity' was enormous and ridiculous."

**Problem 4 — Visualization coupling:** Adding a minimap view required "invading" the internal state representations of each IPlayer implementation, because each implementation had tightly coupled caches specific to its visualization pattern.

The lesson: every cache of Model state held in View or Controller is a source of synchronization bugs. Removing the caches removes that class of bug. O'Donnell's architecture removes them by formalizing all writes through a single interface and by giving every other component const-only access to Model, so that no component needs its own copy.
#### The Reads Are Free Corollary

The read path has no constraints. Any component can read any part of Model at any time:

> "Exposing central Model state as public is ok, as it can only be read." — `mvc.html`, "Reading Model state" section

This is the "reads are free" corollary. Because the type system prevents writes through the const reference, reads can be arbitrarily frequent and arbitrarily complex without coordination overhead. Reads need no subscription or observer registration. They also need no locking, as long as writes are not applied concurrently with reads. To View and Controller, the Model is a shared read-only data structure.

**Take bullets (for Tier 1 copy into Section 1 anchor claims):**

- **[Anchor Claim 2 — primary]** "Writes to Model are formalized through the addition of IEventTarget." — `mvc.html`, "Writing to Model state" section. URL: `https://johno.se/book/mvc.html`
- **[Anchor Claim 2 — type enforcement]** View holds `const Model&`, Controller holds `IEventTarget&`. Every write routes through the interface; every read is unconstrained. — `mvc.html`, "Reading Model state" section.
- **[Anchor Claim 2 — access matrix]** "View has a const& to Model... MetaController has a & to View, and passes this to each IController implementation." — `mvc.html`, "Reading Model state" section.
- **[Anchor Claim 2 — single interface rationale]** "The exact information that causes a change is the information required to visualise that change." — `mvc.html`, "Why only a single event interface" section.
- **[Anchor Claim 2 — free reads]** "Exposing central Model state as public is ok, as it can only be read." — `mvc.html`, "Reading Model state" section.
- **[Anchor Claim 2 — GC2 lesson]** Multiple IPlayer implementations each had tightly coupled caches; adding minimap required "invading" these representations. — `mvc.html`, "Why only a single event interface" section.
- **[Anchor Claim 2 — const-only access]** "Only const methods may be called, so state changes cannot be made internally as a result of a bad function call." — `mvc.html`, "Reading Model state" section.
- **[Anchor Claim 2 — event merge]** "CreateEntity() and EntityCreated() can for example be merged into CreateEntity()." — `mvc.html`, "Why only a single event interface" section.
---
### Anchor Claim 3: The IEventTarget Pattern

**Source:** `https://johno.se/book/mvc.html` — "Writing to Model state" section, opening paragraph:

> "Writes to Model are formalized through the addition of IEventTarget. This is a pure virtual interface that defines all possible state changes / events on a system wide level."
#### The Pure Virtual Interface as Event Bus

IEventTarget is a pure virtual C++ interface. O'Donnell describes it as defining "all possible state changes / events on a system wide level." The key properties:

1. **Pure virtual:** No implementation in the interface itself; all implementations (local Model, network proxy) are substitutable
2. **System-wide:** All state changes in the entire application flow through this one interface
3. **Event-based:** Each method call is both a state mutation and a notification; there is no separate notification channel
#### The Re-Entrancy Mechanism

O'Donnell extends IEventTarget beyond simple write formalization. Model itself stores an IEventTarget& for re-entrancy:

> "To do this, it is typical to have Controller/MetaController also implement IEventTarget, and extend the interface to include these 'visualisation callbacks'. App supplies a reference to IEventTarget to the Model (which is the Controller / MetaController on construction, and Model stores this reference for later callback during runtime." — `mvc.html`, "Event callbacks" section

The re-entrancy flow:
1. Controller calls `Model.IEventTarget_StartGame()` to start the game
2. Model performs the state change (sets game state to running)
3. Model calls the stored `IEventTarget&` (which is the Controller) to notify of the state change
4. Controller's IEventTarget implementation triggers visualization (plays intro sequence, etc.)

This is the closed event bus. Every state change enters Model through IEventTarget, and Model's notifications leave through the stored IEventTarget as well. No state change can happen, and no notification can be sent, outside that one interface.
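The four-step flow can be sketched with one hypothetical event, `startGame()`. The method names are illustrative. The callback target is also set through a setter after construction instead of being passed to the constructor as the quote describes. This assumption breaks the Model/Controller construction cycle.

```cpp
#include <string>
#include <vector>

// Hypothetical single-event interface, used for both writes and callbacks.
struct IEventTarget {
    virtual ~IEventTarget() = default;
    virtual void startGame() = 0;
};

// Model applies each change, then re-delivers the same event through the
// IEventTarget it was given (the Controller) for visualisation.
class Model : public IEventTarget {
public:
    bool running = false;
    // A setter sidesteps the construction cycle; the article describes
    // supplying this reference on construction.
    void setCallbackTarget(IEventTarget& target) { callback_ = &target; }
    void startGame() override {
        running = true;                         // step 2: state change
        if (callback_) callback_->startGame();  // step 3: re-enter Controller
    }
private:
    IEventTarget* callback_ = nullptr;
};

class Controller : public IEventTarget {
public:
    explicit Controller(IEventTarget& model) : model_(model) {}
    void onStartPressed() { model_.startGame(); }  // step 1: formal write
    // Step 4: the callback side of the same interface drives visualisation.
    void startGame() override { effects.push_back("play intro sequence"); }
    std::vector<std::string> effects;
private:
    IEventTarget& model_;
};
```

`Controller::startGame()` is only the visualization half. The Controller never sets `running` itself; it learns about the change from the callback.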
#### Network Transparency

O'Donnell's original motivation for IEventTarget was network transparency:

> "The initial motivation for the IEventTarget / const Model& formalization was to completely abstract the locality of the IEventTarget implementation (i.e. remote proxy). Using this pattern, network code is completely external to the system. Controller transparently writes to some implementation of IEventTarget (either a Model or a network proxy), and both View and Controller transparently see any changes to Model that may have come from across a network." — `mvc.html`, "Remote proxies and Network abstraction" section

The key property: Controller never knows whether it is writing to a local Model or a network proxy. The IEventTarget reference is identical in both cases. This is the location-agnostic property that makes the pattern powerful.
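The location-agnostic write path can be sketched as two implementations of one interface. The `moveUnit` event, the text wire format, and the queue that stands in for a socket are all hypothetical. A real proxy would need proper serialization and transport.

```cpp
#include <deque>
#include <sstream>
#include <string>

// Hypothetical event interface.
struct IEventTarget {
    virtual ~IEventTarget() = default;
    virtual void moveUnit(int unitId, int x, int y) = 0;
};

// Local implementation: applies the write directly.
struct Model : IEventTarget {
    int unitId = -1, x = 0, y = 0;
    void moveUnit(int id, int nx, int ny) override { unitId = id; x = nx; y = ny; }
};

// Remote proxy: serializes the call instead of applying it.
// The deque stands in for a socket.
struct NetworkProxy : IEventTarget {
    std::deque<std::string> outbox;
    void moveUnit(int id, int nx, int ny) override {
        std::ostringstream msg;
        msg << "moveUnit " << id << ' ' << nx << ' ' << ny;
        outbox.push_back(msg.str());
    }
};

// Controller code is identical for both: it only ever sees IEventTarget&.
void issueMove(IEventTarget& target) { target.moveUnit(7, 3, 4); }

// Receiving side: decode a message and replay it against a real IEventTarget.
bool deliver(const std::string& message, IEventTarget& target) {
    std::istringstream in(message);
    std::string verb;
    int id = 0, x = 0, y = 0;
    if ((in >> verb >> id >> x >> y) && verb == "moveUnit") {
        target.moveUnit(id, x, y);
        return true;
    }
    return false;
}
```

`issueMove` is written once. Whether the move is applied locally or shipped across the network depends only on which implementation the Director wires in.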
#### Controller Isolation Across the Network

O'Donnell makes the isolation property explicit:

> "Note that this allows the 'reads are free, writes are formalized' paradigm be extended across a network. A Controller client who is talking to a remote server is completely isolated from the code that updates the local Model, and can 'read for free', but must still write via an IEventTarget. As this formalization is also useful in the local case, it is nice that all components of MVC see the world in the same way regardless of the existence of a network." — `mvc.html`, "Remote proxies and Network abstraction" section

The phrase "completely isolated" refers to the client's local Model, which is updated by network code the Controller never sees. The Controller still reads that Model for free through its const reference and still writes only through IEventTarget. As a result, its code is the same whether or not a network is involved.
#### The CreateEntity / EntityCreated Merge

O'Donnell shows how the IEventTarget pattern simplifies API surfaces:

> "CreateEntity() and EntityCreated() can for example be merged into CreateEntity(), and a client who calls CreateEntity() can gracefully react to a future CreateEntity() and understand it to mean that an entity has been created." — `mvc.html`, "Why only a single event interface" section

In the GC2 architecture, `CreateEntity()` was the client's request and `EntityCreated()` was the notification the server sent back. These were two separate methods with a bidirectional dependency. The IEventTarget architecture has one method: `CreateEntity()`. The caller issues the command. The callee (Model or proxy) performs the state change, and the same call is then delivered back through the IEventTarget callback path as the notification. A client that issued `CreateEntity()` therefore reads the returning `CreateEntity()` as confirmation. Two methods become one, and the semantics are preserved.
#### The Director Pattern for Multi-Mode Deployment

O'Donnell addresses the practical question of how to deploy the same architecture across local, listen, and dedicated server modes:

> "The Director encapsulates the details of the various modes, with when aggregated together are: Model, View, Controller; Client (the proxy to a remote Model, i.e. a 'server'); Server (the proxy to all remote Controllers, i.e. 'clients')." — `mvc.html`, "The Director" section

The Director is the top-level assembler that wires together Model, View, Client, and Server based on the deployment mode. In local mode, there is no Client or Server — Controller talks directly to Model. In listen mode, there is a Client (proxy to remote server) and a Server (proxy to remote clients). In dedicated mode, there is no local Controller — Server handles all client connections.

**Take bullets (for Tier 1 copy into Section 1 anchor claims):**

- **[Anchor Claim 3 — primary]** "Writes to Model are formalized through the addition of IEventTarget. This is a pure virtual interface that defines all possible state changes / events on a system wide level." — `mvc.html`, "Writing to Model state" section. URL: `https://johno.se/book/mvc.html`
- **[Anchor Claim 3 — re-entrancy]** Model stores `IEventTarget&`; when Model logic fires an event, it re-enters through Controller via the same interface for visualization. — `mvc.html`, "Event callbacks" section.
- **[Anchor Claim 3 — network transparency]** "Controller transparently writes to some implementation of IEventTarget (either a Model or a network proxy), and both View and Controller transparently see any changes to Model that may have come from across a network." — `mvc.html`, "Remote proxies and Network abstraction" section.
- **[Anchor Claim 3 — network isolation]** "A Controller client who is talking to a remote server is completely isolated from the code that updates the local Model, and can 'read for free', but must still write via an IEventTarget." — `mvc.html`, "Remote proxies and Network abstraction" section.
- **[Anchor Claim 3 — single interface]** "Experience dictates that there only be a single IEventTarget interface that is responsible for all 'system events'." — `mvc.html`, "Why only a single event interface" section.
- **[Anchor Claim 3 — event merge]** "CreateEntity() and EntityCreated() can for example be merged into CreateEntity()." — `mvc.html`, "Why only a single event interface" section.
- **[Anchor Claim 3 — Director pattern]** "The Director encapsulates the details of the various modes." — `mvc.html`, "The Director" section.
---
### Anchor Claim 4: View Must Not Expose Scene-Graph Abstractions

**Source:** `https://johno.se/book/mvc.html` — "View" section, fourth paragraph:

> "This also means that the popular 'scene-graph' design may not be exposed from the View. You are free to do anything you want internally when it comes to clever caching of things, but this may not be exposed to clients. For example, any type of 'instance abstraction' to represent a mesh-transform pair in the public interface is illegal. The corresponding interface should be of the form: `view::drawMesh(mesh, transform, anyOtherRenderState);`"
#### The Scene-Graph Prohibition

O'Donnell issues an explicit prohibition:

> "This also means that the popular 'scene-graph' design may not be exposed from the View." — `mvc.html`, "View" section

The scene-graph design (used by Ogre and similar engines) is a hierarchical object model in which every mesh-transform pair is a node in a tree. The tree enables parent-child transforms, hierarchical culling, and state sorting. It also exposes a hierarchical object model to the client (Controller), which O'Donnell forbids in View's public interface.
#### Internal Caching Is Allowed

O'Donnell explicitly permits internal caching:

> "You are free to do anything you want internally when it comes to clever caching of things, but this may not be exposed to clients." — `mvc.html`, "View" section

View may cache vertex buffers, state batches, sorted draw lists, or anything else internally. The cache is invisible to the client, which never sees scene nodes, instance handles, or any other abstraction that persists across frames. This is the key constraint: View's internal implementation can be as complex as needed, but its public interface must be flat and procedural.
#### The Correct Interface Form

O'Donnell specifies the exact interface signature that is legal:

> "The corresponding interface should be of the form: `view::drawMesh(mesh, transform, anyOtherRenderState);`" — `mvc.html`, "View" section

The call is procedural rather than object-based, because it does not operate on any per-instance object that the client created earlier. The parameters carry everything needed to render the mesh this frame. `mesh` may identify a loaded resource, but the client has no scene node, instance handle, or ID to keep alive between frames. Each call is self-contained.
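A flat `drawMesh` interface with private batching can be sketched as follows. The parameter types, `MeshId`, and `endFrame` are hypothetical, because O'Donnell specifies only the shape of the call. The point of the sketch is that the batching cache exists but never appears in the public interface.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical parameter types: O'Donnell specifies only the call shape.
struct Transform { float x = 0, y = 0, z = 0; };
using MeshId = int;            // names a loaded mesh resource, not an instance
struct RenderState { int texture = 0; };

class View {
public:
    // Public interface: flat and procedural. Each call carries everything
    // needed this frame and returns nothing the client must hold on to.
    void drawMesh(MeshId mesh, const Transform& transform, const RenderState& state) {
        // Internal caching is allowed: group submissions by (texture, mesh).
        batches_[std::make_pair(state.texture, mesh)].push_back(transform);
    }

    // Once per frame: submit the batches, then discard them. Returns the
    // number of batches, i.e. the draw calls a real renderer would issue.
    int endFrame() {
        const int drawCalls = static_cast<int>(batches_.size());
        // ... one hardware draw call per batch would be issued here ...
        batches_.clear();
        return drawCalls;
    }

private:
    std::map<std::pair<int, MeshId>, std::vector<Transform>> batches_;
};
```

The batch map is rebuilt from scratch every frame. The client cannot observe it, so View is free to change its caching strategy without touching Controller code.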
#### The Procedural Interface Definition

O'Donnell defines what a non-stateful View looks like from the client's perspective:

> "What is a non-stateful view? Basically it is a procedural interface (as opposed to a collection of objects with methods), in essence very much to what DirectX 9 is." — `pitch.html`, "MVC revisited" section

DirectX 9 is O'Donnell's reference for a procedural graphics API. Its entry points are methods on a single device object (`IDirect3DDevice9::SetRenderState`, `SetTransform`, `DrawPrimitive`, and so on). Resources such as textures and vertex buffers are persistent objects, but the API has no notion of a scene. Nothing represents a drawn instance or a mesh-transform pair, and what appears on screen is whatever the client submits each frame.
#### The Retained-Mode Attack

O'Donnell names the specific problem with stateful Views:

> "The main issue is that Views implicitely cache Model state (as private object members), which brings rise to sync issues. I believe that the premise that visualisation is/should be a stateful thing is false." — `pitch.html`, "However!" section

The word "implicitely" is important: the caching is not explicit in the client's mental model — it is implicit in the toolkit's design. The client creates a widget object, and the widget object implicitly caches the application state it needs to display. When the application state changes, the client must remember to push the new state to the widget object. When the widget state changes, the client must remember to pull the new state from the widget object. The implicit caching is the synchronization burden.
#### The Historical Performance Justification

O'Donnell traces why scene graphs became dominant:

> "Historically, this classic architecture was REQUIRED in order to deliver any kind of performance, i.e. heirarchical routing trees for heirarchical frustum culling, matrix transform caches, etc. The premise was to 'retain much state, and only update this state when absolutely required'." — `pitch.html`, "However!" section

The scene graph was a performance optimization for a specific hardware era: CPUs were slow, GPUs were simple, and the bus between them was the bottleneck. By retaining hierarchical state on the CPU, the renderer could avoid resubmitting geometry that was culled by the CPU-side hierarchical culling. Matrix transform caches avoided recomputing world matrices for every object.
#### GPU Advances Eliminate the Justification

O'Donnell argues the performance justification is obsolete:

> "However, due to the rapide advances in GPU based rendering over the past 10+ years, this premise no longer holds." — `pitch.html`, "However!" section

The premise was "retain much state, only update when absolutely required." In the GPU era O'Donnell describes, the dominant cost shifted. GPUs could process far more vertices than before, while each draw call still carried significant CPU and driver overhead. Submitting a few large batches therefore tends to beat fine-grained CPU-side culling that produces many small draw calls. As a result, hierarchical CPU-side culling, the scene graph's main performance justification, is no longer the dominant factor in rendering performance.
#### Jungle Peak: Empirical Evidence

O'Donnell provides a concrete empirical result:

> "In DirectX9 is possible to render very large batches of primitives per draw call. At Jungle Peak we rendered 800 000+ vertices in a single call on nVidia GeForce 6 class hardware, with good performance. The meant a number of things, such as discarding the concept of camera culling. We simply batched together all instances of a particular mesh into a single huge vertex/index buffer pair (one per texture basically), and sent them all to the hardware with very few calls." — `pitch.html`, batch rendering section

Jungle Peak submitted more than 800,000 vertices in a single draw call on GeForce 6 class hardware. When the GPU can absorb that much geometry per call, the CPU-side hierarchical culling that scene graphs exist to enable stops paying for itself, at least for scenes like Jungle Peak's. In that case, batching every instance of a mesh into one buffer pair per texture and submitting all of it was fast enough to discard camera culling entirely.

**Take bullets (for Tier 1 copy into Section 1 anchor claims):**

- **[Anchor Claim 4 — primary]** "The corresponding interface should be of the form: `view::drawMesh(mesh, transform, anyOtherRenderState);`" — `mvc.html`, "View" section. URL: `https://johno.se/book/mvc.html`
- **[Anchor Claim 4 — scene-graph prohibition]** "The popular 'scene-graph' design may not be exposed from the View." — `mvc.html`, "View" section.
- **[Anchor Claim 4 — procedural not object-oriented]** "What is a non-stateful view? Basically it is a procedural interface (as opposed to a collection of objects with methods), in essence very much to what DirectX 9 is." — `pitch.html`, "MVC revisited" section.
- **[Anchor Claim 4 — GPU eliminates retained-mode justification]** "However, due to the rapide advances in GPU based rendering over the past 10+ years, this premise no longer holds." — `pitch.html`, "However!" section.
- **[Anchor Claim 4 — empirical]** Jungle Peak rendered 800,000+ vertices in a single draw call on GeForce 6 hardware, eliminating the need for scene-graph culling. — `pitch.html`, batch rendering section.
- **[Anchor Claim 4 — stateless View definition]** "This part of the application is completely stateless from a client perspective (immediate mode), the client being the Controller." — `mvc.html`, "View" section.
- **[Anchor Claim 4 — internal caching allowed]** "You are free to do anything you want internally when it comes to clever caching of things, but this may not be exposed to clients." — `mvc.html`, "View" section.
- **[Anchor Claim 4 — implicit caching is the problem]** "Views implicitely cache Model state (as private object members), which brings rise to sync issues." — `pitch.html`, "However!" section.
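The Jungle Peak batching can be sketched as a function that bakes every instance of one mesh into a single vertex/index buffer pair. The sketch makes several assumptions. Instances differ only by a translation, whereas a real baker would apply a full transform. One batch is built per mesh and texture. The hardware submission itself is omitted; on DirectX 9 the result would go out in one indexed draw call.

```cpp
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

struct Mesh {                      // one source mesh (hypothetical layout)
    std::vector<Vec3> vertices;
    std::vector<std::uint32_t> indices;
};

struct Batch {                     // one large vertex/index buffer pair
    std::vector<Vec3> vertices;
    std::vector<std::uint32_t> indices;
};

// Bake every instance into world space and append it to a single batch.
// There is no per-instance camera culling: every instance is submitted.
Batch buildBatch(const Mesh& mesh, const std::vector<Vec3>& instanceOffsets) {
    Batch b;
    b.vertices.reserve(mesh.vertices.size() * instanceOffsets.size());
    b.indices.reserve(mesh.indices.size() * instanceOffsets.size());
    for (const Vec3& off : instanceOffsets) {
        const auto base = static_cast<std::uint32_t>(b.vertices.size());
        for (const Vec3& v : mesh.vertices)
            b.vertices.push_back({v.x + off.x, v.y + off.y, v.z + off.z});
        for (std::uint32_t i : mesh.indices)
            b.indices.push_back(base + i);  // rebase indices into the shared buffer
    }
    return b;
}
```

The design trades memory and some wasted vertex work for fewer draw calls. That trade is the one O'Donnell reports paid off on GeForce 6 class hardware.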
---
## Connections: DSL Tier 4 Verbs to O'Donnell's Claims

The following mappings connect the DSL's Tier 4 verbs (sandbox, audit, intent_mapping, sandbox_execute) to the four anchor claims derived from O'Donnell's work. These are the hooks Tier 1 will use when writing Section 6, Claims 9 and 10.
### Connection 1: `sandbox` verb → "Reads are free, writes are formalized" (Anchor Claim 2)

The `sandbox` verb isolates execution and enforces that all state observations by the sandboxed code are *reads*, which can occur freely against a const view of Model. State mutations by sandboxed code, however, must be routed through the formal event channel. O'Donnell's architecture achieves this by giving Controller const access to Model (through its `View&`) and an `IEventTarget&`. Reads through the former are unconstrained, and writes through the latter are gated.

The DSL's `sandbox` verb maps directly onto this architecture. The sandbox receives a read-only snapshot of state, the `const Model&` equivalent (although a snapshot is a copy rather than a live view). Any write attempt is intercepted and routed as a formal event through the verb dispatch layer, the `IEventTarget` equivalent. This is not a policy choice added later. It is a structural invariant derived from O'Donnell's const-only Model access rule: the sandbox cannot hold a non-const reference to state, because no such reference exists in the architecture.

The practical implication: sandboxed code can observe any part of the Model it has access to, as frequently as it wants, without coordination overhead. But it cannot mutate state without going through the formal channel. This is exactly the "reads are free, writes are formalized" invariant applied to the DSL's verb execution model.

The parallel extends to the access matrix. In O'Donnell's architecture:
```
Controller → View& → const Model& (read)
Controller → IEventTarget& → Model (write)
View → const Model& (read)
```

In the DSL's sandbox:
```
sandboxed code → read-only state snapshot (read, free)
sandboxed code → formal event channel → verb dispatch (write, formalized)
```

The structure is identical: one read path (unconstrained), one write path (formalized). The DSL's sandbox is the Controller role; the state snapshot is the `const Model&`; the event channel is the `IEventTarget`.

**Section 6 Claim 9 hook (Tier 1):** "The sandbox verb enforces 'reads are free' by providing a const snapshot as the only state handle; all writes are forced through the formal event channel, directly mirroring O'Donnell's `const Model&` / `IEventTarget` split (source: `mvc.html`, 'Reading Model state' and 'Writing to Model state' sections)."
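The sandbox split can be sketched in C++. Every name here (`State`, `Event`, `EventChannel`, `sandbox`, `dispatch`) is hypothetical, because the DSL's verb layer is not yet specified. The sketch shows only the shape: a const snapshot for reads, and a channel that collects write requests for the dispatcher to apply.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical DSL state and event types; not an existing API.
struct State { std::map<std::string, int> vars; };
struct Event { std::string verb; std::string key; int value; };

// The write path: the only way sandboxed code can request a change.
class EventChannel {
public:
    void emit(const Event& e) { pending_.push_back(e); }
    const std::vector<Event>& pending() const { return pending_; }
private:
    std::vector<Event> pending_;
};

// Run sandboxed code with a const snapshot (free reads) and a channel
// (formalized writes). The snapshot is a copy, so the code cannot reach
// live state; vars[...] assignments do not compile on a const State&.
std::vector<Event> sandbox(const State& live,
                           const std::function<void(const State&, EventChannel&)>& code) {
    const State snapshot = live;
    EventChannel channel;
    code(snapshot, channel);
    return channel.pending();  // handed to the verb dispatch layer
}

// Dispatch: the single place where State is mutated.
void dispatch(State& s, const std::vector<Event>& events) {
    for (const Event& e : events)
        if (e.verb == "set") s.vars[e.key] = e.value;
}
```

State changes only when `dispatch` runs, so every mutation the sandboxed code causes exists first as an `Event` value that can be inspected or logged.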
### Connection 2: `audit` verb → IEventTarget pattern (Anchor Claim 3)

The `audit` verb records every formal state-change event for later replay and verification. O'Donnell's `IEventTarget` is not a log in itself. It is, however, the single interface through which all writes flow, so the sequence of calls made on it forms an event stream that can be logged. Both local Model and remote proxies implement it identically: a Controller writing to a remote Model uses the same `IEventTarget` call it would use for a local Model. The interface is location-agnostic.

O'Donnell explicitly notes that this allows Controller to be completely isolated from the code that updates Model:

> "Controller transparently writes to some implementation of IEventTarget (either a Model or a network proxy), and both View and Controller transparently see any changes to Model that may have come from across a network." — `mvc.html`, "Remote proxies and Network abstraction"

The `audit` verb is the DSL's implementation of this same pattern: it wraps the verb dispatch interface, records every call (the event), and replays it against a verification Model. No write can bypass the audit because no write can bypass the interface. The audit log is a first-class artifact — it is the `IEventTarget` trace, equivalent to the network proxy's event stream in O'Donnell's architecture.

The `audit` verb also inherits O'Donnell's re-entrancy mechanism: when Model logic fires an event that re-enters through the Controller, the audit log captures both the initial write and the re-entrant callback as separate events in the same trace. This enables complete replay: running the audit log against a fresh Model reproduces the exact sequence of state changes that occurred in the original execution.

Furthermore, O'Donnell's principle that "the client is in no way dependent on ANY IEventTarget callbacks in order to operate correctly" maps to the DSL's guarantee that the audit log is for observability, not for correctness: the sandboxed code's behavior is determined by the Model state, not by whether the audit verb is present.

**Section 6 Claim 10 hook (Tier 1):** "The audit verb is the DSL's `IEventTarget`: a single interface that all state mutations must route through, enabling complete replay and verification — exactly as O'Donnell describes in `mvc.html`, 'Remote proxies and Network abstraction' and 'Event callbacks' sections. The audit log is the event trace; the verification Model is the replay target."
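The audit idea can be sketched as a decorator over a hypothetical one-event `IEventTarget`. `AuditTarget` implements the same interface, records each call, and forwards it. `replay` runs the recorded trace against a fresh verification Model. The names are illustrative and are not a specified DSL API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical one-event interface for the verb dispatch layer.
struct IEventTarget {
    virtual ~IEventTarget() = default;
    virtual void setVar(const std::string& key, int value) = 0;
};

struct Model : IEventTarget {
    std::map<std::string, int> vars;
    void setVar(const std::string& key, int value) override { vars[key] = value; }
};

struct Call { std::string key; int value; };

// Decorator: same interface, records every call, then forwards it.
class AuditTarget : public IEventTarget {
public:
    explicit AuditTarget(IEventTarget& inner) : inner_(inner) {}
    void setVar(const std::string& key, int value) override {
        log_.push_back({key, value});
        inner_.setVar(key, value);
    }
    const std::vector<Call>& log() const { return log_; }
private:
    IEventTarget& inner_;
    std::vector<Call> log_;
};

// Replay the trace against any IEventTarget, e.g. a fresh verification Model.
void replay(const std::vector<Call>& log, IEventTarget& target) {
    for (const Call& c : log) target.setVar(c.key, c.value);
}
```

Re-entrant callbacks would be captured the same way, by wrapping the IEventTarget that Model calls back into with its own `AuditTarget`.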
### Connection 3: `intent_mapping` verb → Controller-per-frame procedural composition (Anchor Claims 1 + 4)
|
||||
|
||||
O'Donnell's Controller is not a callback handler, not a state machine, and not a retained-mode widget host. It is a per-frame procedural composer of View. From `pitch.html`, "MVC revisited" section:
|
||||
|
||||
> "Controller has 2 jobs: (1) doInput(): react to used input and direct how that input is allowed to change Model state; (2) doOutput(): dynamically, in real time, compose the current 'view' of the application using View."
|
||||
|
||||
This is the key architectural move: Controller *programs* View each frame, procedurally, with no retained state between frames. The "view" that appears on screen is the result of the Controller's per-frame composition — not a cached state that persists across frames. If the Controller changes its strategy mid-session (e.g., switching from play mode to edit mode), the entire View changes immediately because View has no retained state to clean up before restarting.
|
||||
|
||||
The `intent_mapping` verb does the same at the DSL level: it takes a high-level intent description (e.g., "refactor this function to use early return") and procedurally composes a sequence of lower-level verb calls (sandbox, audit, edit operations), frame by frame, without retaining any intermediate composition state. Each frame's composition is derived from the current Model state; the previous frame can influence the next only through the Model writes it made, not through leftover composition. This is O'Donnell's dynamic, procedural Controller.

The flat, stateless execution context required by `sandbox` and `sandbox_execute` is the same constraint O'Donnell imposes on View: no scene-graph abstractions, no persistent handles, only the current call frame's arguments. The `intent_mapping` verb's output is a sequence of flat verb calls, not a hierarchical object graph. Each call is self-contained: it receives all context at call time, executes, and returns. There are no handles to intermediate results that persist between calls.

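A minimal sketch of this Controller shape, assuming a hypothetical `Model` with two fields and string-named verbs (`sandbox`, `edit`, `audit`) standing in for real verb dispatch. `Controller` has no member state: `doInput` writes to the Model, and `doOutput` rebuilds the whole frame's flat call list from the Model every time it is called.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical Model: the only state that survives between frames.
struct Model {
    bool editMode = false;
    int pendingEdits = 0;
};

// A flat verb call: everything it needs is passed by value at call time.
struct VerbCall {
    std::string verb;
    int arg;
};

// Controller with no member state: each frame is composed from the Model alone.
struct Controller {
    // doInput(): input may change the Model only through explicit writes.
    static void doInput(Model& m, bool toggleRequested) {
        if (toggleRequested) m.editMode = !m.editMode;
    }

    // doOutput(): recompose the whole frame (a flat verb sequence) from scratch.
    static std::vector<VerbCall> doOutput(const Model& m) {
        std::vector<VerbCall> frame;
        frame.push_back({"sandbox", 0});
        if (m.editMode) {
            for (int i = 0; i < m.pendingEdits; ++i) frame.push_back({"edit", i});
        }
        // The audit call's argument is the number of calls that precede it.
        frame.push_back({"audit", static_cast<int>(frame.size())});
        return frame;
    }
};
```

Toggling `editMode` changes the next frame's entire call list at once; nothing from the previous frame's list needs to be invalidated, because none of it was kept.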
**Section 6 Claim 9/10 cross-hook (Tier 1):** "The `intent_mapping` verb is the DSL's Controller: per-frame procedural composition of verb calls, with no retained state between frames, directly inheriting O'Donnell's Controller role from `pitch.html`, 'MVC revisited' section, and the flat procedural View constraint from `mvc.html`, 'View' section."

### Connection 4: `sandbox_execute` verb → Deferred display / frame-shearing awareness (Anchor Claims 1 + 4)

O'Donnell discusses a subtle but important phenomenon called "frame shearing" (`imgui.html`, "Frame shearing" section):

> "One aspect of IMGUI to be aware of in the context of real-time applications (constantly rendering new frames many times per second) is that user interactions will always be in response to something that was drawn on a previous frame... There is a chance that the result of any given widget interaction changes some application state that controls the appearance of the user interface itself, and such discrepancies can result in parts of the user interface reflecting the 'old' state while some reflect the 'new' state. I call this 'frame shearing', in that the displayed image represents parts of two different logical images at once."

The solution O'Donnell proposes is a "shearing exception" — when interaction changes application state that controls UI appearance, the GUI generation restarts for the current frame:

> "The main technique to utilize is to have any code that changes the appearance of the user interface generate a 'shearing exception' which breaks out of the method that generates the gui for the current frame and restarts the entire process for the current frame. Theoretically a 'shearing exception' must be thrown for each interaction that could change the appearance of the user interface, but in practice this usually only happens once per frame (i.e. the gui is at most generated in full more than once but less than twice)." — `imgui.html`, "Frame shearing" section

The `sandbox_execute` verb's frame-bound execution model maps to this: each execution frame is isolated, and the verb dispatch layer can detect when a state change invalidates the current composition and restart. The sandbox does not retain state between frames, so there is no stale state to clean up before restarting — exactly the "shearing exception" mechanism. The restart is clean because the execution context is stateless by construction.

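The restart loop can be sketched as follows, assuming a hypothetical two-widget GUI in which clicking `toggle` changes whether a `details` widget is drawn. The interaction handler throws a `ShearingException` as soon as it changes layout-controlling state, and `runFrame` simply regenerates the frame; because the aborted pass kept no state outside `AppState`, there is nothing to clean up. All names are illustrative, not O'Donnell's `Gui` class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Thrown when an interaction changes state that controls the UI's own appearance.
struct ShearingException {};

struct AppState {
    bool showDetails = false;
};

// Generates one frame and returns the labels drawn. 'clicked' names the widget
// the user hit on the previous frame (empty if none).
std::vector<std::string> generateGui(AppState& s, std::string& clicked) {
    std::vector<std::string> drawn;
    drawn.push_back("toggle");
    if (clicked == "toggle") {
        clicked.clear();                 // consume the interaction once
        s.showDetails = !s.showDetails;  // this changes what the GUI looks like...
        throw ShearingException{};       // ...so abandon this partial frame
    }
    if (s.showDetails) drawn.push_back("details");
    return drawn;
}

// Restarts generation for the current frame until a pass completes without shearing.
std::vector<std::string> runFrame(AppState& s, std::string clicked, int& passes) {
    passes = 0;
    for (;;) {
        ++passes;
        try {
            return generateGui(s, clicked);
        } catch (const ShearingException&) {
            // The aborted pass retained nothing, so we just regenerate.
        }
    }
}
```

A frame with a layout-changing click takes two passes (one partial, one full), matching the "more than once but less than twice" behavior in the quote.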
This also maps to O'Donnell's "immediate mode" principle from `imgui.html`: the real-time application loop redraws at 30+ fps, and each frame's GUI is generated from scratch. The DSL's `sandbox_execute` verb similarly generates each execution frame from scratch, with no retained state between frames.

**Section 6 Claim 9/10 extended hook (Tier 1):** "The `sandbox_execute` verb's frame-isolated execution model maps to O'Donnell's 'shearing exception' mechanism (`imgui.html`, 'Frame shearing' section): each frame's composition can be restarted without stale state cleanup because the execution context is stateless by construction."

---

## Summary of Anchor Claims

| # | Anchor Claim | Source | Key Quote |
|---|-------------|--------|-----------|
| 1 | Widgets are method invocations, not objects | `imgui.html` — "Immediate Mode applied" | "Widgets, logically, change from being objects to being method invocations." |
| 2 | Reads are free, writes are formalized | `mvc.html` — "Writing to Model state" | "Writes to Model are formalized through the addition of IEventTarget." |
| 3 | IEventTarget is the single event interface for all state changes | `mvc.html` — "Writing to Model state" + "Event callbacks" | "Experience dictates that there only be a single IEventTarget interface that is responsible for all 'system events'." |
| 4 | View must not expose scene-graph abstractions | `mvc.html` — "View" section | "The corresponding interface should be of the form: `view::drawMesh(mesh, transform, anyOtherRenderState);`" |

---

## Source URLs

| Page | URL | Key Claims |
|------|-----|-----------|
| IMGUI essay | `https://johno.se/book/imgui.html` | Widgets as method invocations; state-copy problem; deferred display; frame shearing; complete C++ Gui class code |
| The Pitch | `https://johno.se/book/pitch.html` | Broken paradigm; GPU advances eliminate retained-mode justification; Controller as per-frame procedural composer; Jungle Peak 800K vertex single-draw-call |
| IM-MVC roadmap | `https://johno.se/book/immvc.html` | Book structure; IEventTarget centrality; experience progression from GC to MVC/E; single interface rationale |
| MVC chapter | `https://johno.se/book/mvc.html` | Reads free/writes formalized; IEventTarget pattern; re-entrancy; network transparency; scene-graph prohibition; Director pattern; GC2 lessons |

# Section 2 — Cluster 1: Concatenative (Forth Family)

**Cluster:** 1 of 8
**Track:** `intent_dsl_survey_20260612`
**Written by:** Tier 2 sub-agent (research)
**Sources:** On-disk references at `C:\projects\forth\bootslop\references\`; Wikipedia (Forth, ColorForth, Joy); cosy.com (CoSy)

---

## Entry: Forth (Chuck Moore, 1970)

Forth is a stack-oriented, concatenative programming language designed by Charles H. "Chuck" Moore, first exposed to other programmers in 1970. It combines a compiler with an interactive shell where the programmer builds up a dictionary of *words* (subroutines), each consuming and producing values exclusively via an implicit data stack using Reverse Polish Notation (RPN). All syntactic elements (variables, operators, and control flow) are defined as words; there is no BNF grammar, no AST, and no separate compilation phase in the classic model. The defining structural feature is the colon definition (`: foo ... ;`), which makes the dictionary the sole organizing principle of the program.

What we take from Forth is the pure concatenative property itself: the concatenation of two programs denotes the composition of the two functions they denote. This is the foundational claim of the entire cluster. The DSL's postfix syntax and its rejection of lambda-bound parameters (parameters are unnamed; they live on the stack) are direct inheritances. We do not inherit the memory-based data stack — modern hardware makes the register-file-as-global-namespace model more efficient — but the *syntax* of passing arguments implicitly through a stack is the DSL's core grammar.

### Detailed Analysis

**Stack Passing as the Universal Call Convention.** Forth's central design insight is that all word-to-word communication happens through a single shared stack. As the Wikipedia article states: "Forth emphasizes the use of small, simple functions called words. Words for bigger tasks call upon many smaller words that each accomplish a distinct sub-task. A large Forth program is a hierarchy of words. These words, being distinct modules that communicate implicitly via a stack mechanism, can be prototyped, built and tested independently." (https://en.wikipedia.org/wiki/Forth_(programming_language)#Overview) This hierarchical composition model — where every word is simultaneously a function and a composable phrase in a language — is the exact structural property the DSL inherits.

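The concatenative property can be checked mechanically with a small C++ model, assuming words are closures over an explicit `std::vector<long>` stack (an illustrative encoding, not how any Forth is implemented). Running the concatenation of two programs gives the same stack as running the second program on the output of the first; the two programs here split the classic `25 10 * 50 +` arithmetic.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using Stack = std::vector<long>;
// A word takes the whole stack and leaves a new one: no named parameters.
using Word = std::function<void(Stack&)>;
using Program = std::vector<Word>;

Word lit(long n) { return [n](Stack& s) { s.push_back(n); }; }

// Pop two, push the result: the stack is the only channel between words.
Word binary(long (*op)(long, long)) {
    return [op](Stack& s) {
        long b = s.back(); s.pop_back();
        long a = s.back(); s.pop_back();
        s.push_back(op(a, b));
    };
}

long mul(long a, long b) { return a * b; }
long add(long a, long b) { return a + b; }

// Running a program applies its words left to right.
Stack run(const Program& p, Stack s) {
    for (const Word& w : p) w(s);
    return s;
}

// Concatenating two programs is just appending their word lists.
Program concat(Program p, const Program& q) {
    p.insert(p.end(), q.begin(), q.end());
    return p;
}
```

Because `concat` is plain list append and `run` threads one stack through every word, `run(concat(p, q), s)` equals `run(q, run(p, s))` for any `p`, `q`, and `s`.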
**Dictionary as Program Structure.** The Forth dictionary is a tree of linked lists searched at runtime, with a context switch mechanism that allows vocabulary namespaces to overlay each other. The article notes: "The dictionary is laid out in memory as a tree of linked lists with the links proceeding from the latest (most recently) defined word to the oldest, until a sentinel value, usually a NULL pointer, is found." (https://en.wikipedia.org/wiki/Forth_(programming_language)#Structure_of_the_language) This is the structural model for the DSL's vocabulary lookup: words are resolved by name in a search path, with later definitions shadowing earlier ones. There is no separate symbol table — the dictionary *is* the symbol table.

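A sketch of the newest-to-oldest search, using `std::shared_ptr` links for memory safety where a real Forth would use raw link fields in contiguous dictionary memory; `Entry`, `Dictionary`, and the integer `body` are hypothetical stand-ins. Two `Dictionary` values that share an older tail model how vocabularies form a tree of linked lists.

```cpp
#include <cassert>
#include <memory>
#include <string>

// One dictionary entry: name, payload, and a link to the previously defined word.
struct Entry {
    std::string name;
    int body;                       // stands in for the word's compiled code
    std::shared_ptr<Entry> link;    // older entry; nullptr is the sentinel
};

struct Dictionary {
    std::shared_ptr<Entry> latest;  // head: the most recent definition

    void define(const std::string& name, int body) {
        latest = std::make_shared<Entry>(Entry{name, body, latest});
    }

    // Search newest to oldest until the null sentinel; the first match wins,
    // so a redefinition shadows the older word without deleting it.
    const Entry* find(const std::string& name) const {
        for (const Entry* e = latest.get(); e != nullptr; e = e->link.get())
            if (e->name == name) return e;
        return nullptr;
    }
};
```

Copying a `Dictionary` copies only the head pointer, so a second vocabulary can add or shadow words while still sharing every older entry with the first.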
**No Formal Parameters.** Forth words that need inputs take them from the stack; words that need to return values leave them on the stack. The Wikipedia article gives the canonical example of `FLOOR5` which, when defined as `: FLOOR5 ( n -- n' ) 1- 5 MAX ;`, operates on a value that is implicitly on the stack with no named parameter. The cited passage draws the same contrast for Joy, which carries the idea further: "In definitions and abstractions of functions the formal parameters have to be named — x, y and so on. This is different in Joy. It is based on the composition of functions and not on the application of functions to arguments." (https://en.wikipedia.org/wiki/Forth_(programming_language)#Overview) The DSL inherits this: every verb's parameters are implicit stack positions, not named lambda variables.

**Threaded Code Compilation.** Classic Forth compiles to threaded code; as the article puts it, "the classic technique was to compile to threaded code, which can be interpreted faster than bytecode." (https://en.wikipedia.org/wiki/Forth_(programming_language)#Overview) Modern Forths (SwiftForth, VFX Forth, iForth) compile to native machine code, but the original threaded-interpretation model is directly ancestral to KYRA's JIT compiler and to the folded interpreter in Lottes's 5th system.

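A minimal sketch of a threaded-code inner interpreter, assuming a thread made of cells that hold either a primitive's address or an inline literal. Real Forths implement `NEXT` as a few machine instructions; here the primitives are ordinary C++ functions and the dispatch loop is a `while`. `Cell`, `doLit`, and the other names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct VM;
using Prim = void (*)(VM&);

// A cell of threaded code: a primitive's address or an inline literal.
union Cell {
    Prim prim;
    std::intptr_t lit;
};

Cell callCell(Prim p) { Cell c; c.prim = p; return c; }
Cell litCell(std::intptr_t v) { Cell c; c.lit = v; return c; }

struct VM {
    std::vector<std::intptr_t> stack;
    const Cell* ip = nullptr;   // points at the next cell of the thread
    bool running = true;
};

// Primitives are ordinary functions; the thread is just a list of their addresses.
void doLit(VM& vm)  { vm.stack.push_back((vm.ip++)->lit); }   // consumes the next cell as data
void doMul(VM& vm)  { std::intptr_t b = vm.stack.back(); vm.stack.pop_back(); vm.stack.back() *= b; }
void doAdd(VM& vm)  { std::intptr_t b = vm.stack.back(); vm.stack.pop_back(); vm.stack.back() += b; }
void doExit(VM& vm) { vm.running = false; }

// Inner interpreter ("NEXT"): fetch the next address, advance, call it.
std::intptr_t execute(const Cell* thread) {
    VM vm;
    vm.ip = thread;
    while (vm.running) {
        Prim p = (vm.ip++)->prim;
        p(vm);
    }
    return vm.stack.back();
}
```

The thread for `25 10 * 50 +` is the cell sequence `doLit 25 doLit 10 doMul doLit 50 doAdd doExit`; nothing is parsed at run time.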
**Self-Compilation and Meta-Compilation.** Forth systems traditionally compile themselves — a technique called meta-compilation or self-hosting. The article describes: "The minimum definitions for such a Forth compiler are the words that fetch and store a byte, and the word that commands a Forth word to be executed." (https://en.wikipedia.org/wiki/Forth_(programming_language)#Self-compilation_and_cross_compilation) This bootstrap property — where the language is written in itself — is the ultimate expression of the concatenative property: the compiler is just another word in the dictionary.

### Code Examples

Classic Forth RPN arithmetic:

```
25 10 * 50 + CR .
300 ok
```

Defining a word with stack comments:

```
: FLOOR5 ( n -- n' ) DUP 6 < IF DROP 5 ELSE 1 - THEN ;
```

This compiles `FLOOR5` as a word. When called with `8 FLOOR5`, it leaves `7` on the stack. The stack comment `( n -- n' )` documents the before/after stack shape, a convention the DSL's inline documentation inherits.

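The two `FLOOR5` definitions quoted in this entry can be compared directly by encoding each Forth word as a C++ function on an explicit stack. This is an illustrative model, not how a Forth compiles them; in particular `lessThan` fuses `<` with the flag that `IF` consumes. The check confirms that both definitions agree over a range of inputs and that `8 FLOOR5` leaves `7`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace fth {
using Stack = std::vector<int>;

// Stack words: none of them names its operands.
void dup(Stack& s)          { int top = s.back(); s.push_back(top); }
void drop(Stack& s)         { s.pop_back(); }
void push(Stack& s, int n)  { s.push_back(n); }                                  // a literal
void sub(Stack& s)          { int b = s.back(); s.pop_back(); s.back() -= b; }    // -
void dec(Stack& s)          { s.back() -= 1; }                                   // 1-
void maxOf(Stack& s)        { int b = s.back(); s.pop_back(); s.back() = std::max(s.back(), b); }  // MAX
// '<' followed by IF: pops two values and hands the flag straight to the branch.
bool lessThan(Stack& s)     { int b = s.back(); s.pop_back(); int a = s.back(); s.pop_back(); return a < b; }

// : FLOOR5 ( n -- n' ) DUP 6 < IF DROP 5 ELSE 1 - THEN ;
void floor5_if(Stack& s) {
    dup(s); push(s, 6);
    if (lessThan(s)) { drop(s); push(s, 5); }
    else             { push(s, 1); sub(s); }
}

// : FLOOR5 ( n -- n' ) 1- 5 MAX ;
void floor5_max(Stack& s) { dec(s); push(s, 5); maxOf(s); }
}  // namespace fth
```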
### Take

- **For Section 1 (Anchor Claims):** "Forth (Moore, 1970) established the concatenative property — program concatenation denotes function composition — as a first-class language design principle. The DSL inherits this directly: every verb is a function that consumes and produces a stack, and concatenating two verb sequences composes their effects."
- **For Section 5 (Hardware Mapping):** "Forth's zero-operand model (words pull from/push to an implicit stack) maps cleanly to the DSL's `->` pipeline operator. The stack is the register file; the pipeline is the Forth word chain."

---

## Entry: ColorForth (Chuck Moore, 1990s)

ColorForth is a derivative of Forth created by Chuck Moore in the 1990s, developed as the scripting language for his VLSI CAD program OKAD. Its defining feature is the use of color as a semantic layer: program text is tokenized as it is entered, and the color of a word determines whether it starts a definition (red), is compiled into the current definition (green), is executed immediately (yellow), or defines a variable (magenta). Color is not decoration — it is the entire syntax. Moore's own implementation comes with a tiny (63 KB) operating system; practically everything is stored as source code and compiled when needed.

What we take from ColorForth is the idea that **color (or an equivalent visual attribute) is a first-class syntactic dimension**. The DSL's verb qualifiers (`!`, `?`, `*`) and its arena/block delimiters (`{ }`, `[ ]`) are a flat-text approximation of what ColorForth makes spatial. We also take the insight that compilation and execution are interleaved modes, not separate phases — ColorForth switches between green (compile) and yellow (execute) within a single definition, precomputing values during compilation.

### Detailed Analysis

**Color as Syntax.** The Wikipedia article states: "The colors of program code in colorForth have semantic meaning. Red words start a definition, and green words are compiled into the current definition. Thus, colorForth would be written in standard Forth as `: color forth ;`." (https://en.wikipedia.org/wiki/ColorForth) Yellow words are executed immediately. Moore has stated that color is only one option for displaying the language — italics and other typographical conventions could serve the same purpose in a non-color medium. This confirms that the semantic layer is separable from the visual encoding.

**The Green/Yellow Mode Switch.** The article explains: "The transition from green to yellow and back again can be used while defining words, to transition between compiling words into the current definition, executing words immediately (manipulating the data stack during compilation), and back again (adding the top of the data stack to the current definition) — in other words, precomputing a value during compilation (a functionality that other languages use macros or optimizing compilers for)." (https://en.wikipedia.org/wiki/ColorForth) This is the direct ancestor of the DSL's `let` vs. immediate-execution distinction and of the compile-time evaluation that Onat Turkcuoglu's KYRA implements via its color semantics.

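The green/yellow switch can be modeled with a toy compiler, assuming a hypothetical token type with a color field and only two primitives (`+`, `*`). Yellow tokens run immediately on a compile-time stack; on the yellow-to-green transition, the top of that stack is compiled into the definition as a literal, which is the precomputation the quote describes. The example precomputes `60 60 *`, so the compiled definition holds the literal `3600` rather than the multiplication.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Color { Red, Green, Yellow };
struct Token { Color color; std::string text; };

// Compiled op: either "push this literal" or a one-character primitive.
struct Op { bool isLit; long value; char prim; };
struct Definition { std::string name; std::vector<Op> ops; };

void applyPrim(std::vector<long>& s, char prim) {
    long b = s.back(); s.pop_back();
    if (prim == '+') s.back() += b; else s.back() *= b;   // only '+' and '*' here
}

bool isNumber(const std::string& t) {
    return !t.empty() && t.find_first_not_of("0123456789") == std::string::npos;
}

Definition compile(const std::vector<Token>& src) {
    Definition def;
    std::vector<long> ctStack;        // data stack used *during* compilation
    Color prev = Color::Red;
    for (const Token& t : src) {
        if (t.color == Color::Red) { def.name = t.text; prev = t.color; continue; }  // red: name a new definition
        if (prev == Color::Yellow && t.color == Color::Green) {
            // Back to green: compile the precomputed top of stack as a literal.
            def.ops.push_back({true, ctStack.back(), 0});
            ctStack.pop_back();
        }
        if (t.color == Color::Yellow) {   // yellow: execute now, at compile time
            if (isNumber(t.text)) ctStack.push_back(std::stol(t.text));
            else applyPrim(ctStack, t.text[0]);
        } else {                          // green: compile into the definition
            if (isNumber(t.text)) def.ops.push_back({true, std::stol(t.text), 0});
            else def.ops.push_back({false, 0, t.text[0]});
        }
        prev = t.color;
    }
    return def;
}

long runDef(const Definition& def, std::vector<long> s) {
    for (const Op& op : def.ops) {
        if (op.isLit) s.push_back(op.value); else applyPrim(s, op.prim);
    }
    return s.back();
}
```

A yellow run at the very end of a definition is simply left on the compile-time stack in this sketch; colorForth's behavior in that case is not modeled.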
**Tokenization at Edit Time.** ColorForth tokenizes source as it is entered, moving compilation work into the editor. The article notes: "Program text is tokenized as it is entered, moving some of the work of compilation to the editor." (https://en.wikipedia.org/wiki/ColorForth) Lottes and Onat inherit the same principle, which Lottes's x68 carries further as edit-time relinking: the editor is not a passive text buffer but an active participant in compilation.

**OKAD as the Integrated Environment.** ColorForth was developed for Moore's own VLSI CAD program. The article states: "colorForth was originally developed as the scripting language for Moore's own VLSI CAD program, OKAD, with which he develops custom Forth processors." (https://en.wikipedia.org/wiki/ColorForth) The tight coupling of the language, editor, and target domain (chip design) is a model for the DSL's integration with the Meta-Tooling boundary.

### Code Examples

ColorForth equivalent in standard Forth:

```
: color forth ;
```

The same code, color-annotated at edit time:
- **Red:** `color`, the word that starts the definition (standard Forth's `: color`)
- **Green:** `forth`, compiled into the current definition
- **Yellow:** not used in this example; a yellow word would be executed immediately (a mode switch during compilation)

### Take

- **For Section 1 (Anchor Claims):** "ColorForth (Moore, 1990s) showed that color — a visual attribute — can be a primary syntactic dimension, and that compile-time vs. run-time execution can be interleaved within a single definition. The DSL inherits this as the qualifier system (`!` for execute, `?` for conditional, `*` for compile-time) and the `[ ]` / `{ }` block delimiters."
- **For Section 5 (Hardware Mapping):** "ColorForth's green/yellow mode switch is the semantic ancestor of the DSL's compile-time vs. run-time distinction. In hardware terms: compile is fetch-decode, execute is execute — but the two are not cleanly separated in the instruction stream."

---

## Entry: KYRA / VAMP (Onat Turkcuoglu, SVFIG 2025)

KYRA (Kernel of Your Runtime Architecture) is a binary-encoded, JIT-compiling Forth derivative presented by Onat Turkcuoglu at the Silicon Valley Forth Interest Group in April 2025. It compiles its entire program (including a custom editor, Vulkan renderers, and FFMPEG integrations) in 8.24 milliseconds on Windows/Linux. Its defining technical features are: a strict 2-register data stack (`RAX` as Top of Stack, `RDX` as Next on Stack); a magenta pipe token (`|`) that implicitly closes the previous definition and opens a new one via `RET` + `xchg rax, rdx`; basic blocks delimited by `[ ]` that provide implicit begin/link/end jump targets for the JIT; and lambdas delimited by `{ }` that compile code elsewhere and leave an address in `RAX`. VAMP is the register-based runtime model underlying KYRA. The system eliminates the memory-based data stack entirely, achieving hardware locality and GPU compatibility.

What we take from KYRA/VAMP is the **2-register stack** as the minimal viable stack model, the **magenta pipe `|`** as a definition boundary that collapses the colon/semicolon pair into a single token, **preemptive scatter** (arguments pre-placed into fixed memory slots before a call, so no argument gathering is needed at call time), and the **lambdas `{ }`** as separate code objects that are composed rather than inlined. These four features are the primary direct influence on the DSL's Tier 2 pipeline verbs.

### Detailed Analysis

**2-Register Hardware Stack.** Onat's central critique of traditional Forth is that it is "runtime opinionated" — standard Forth dictates a memory-based data stack, which is incompatible with GPU compute shaders. KYRA strictly restricts the data stack to exactly two CPU registers: `RAX` (Top of Stack) and `RDX` (Next on Stack). The in-depth analysis states: "To achieve hardware locality and GPU compatibility, KYRA strictly restricts the data stack to exactly two CPU registers: **`RAX` (Top of Stack)** and **`RDX` (Next on Stack)**." (`C:\projects\forth\bootslop\references\kyra_in-depth.md`, line 14) This 2-register model is the direct ancestor of the DSL's `->` pipeline operator, which passes exactly two values (input and context) along a chain.

**The Magenta Pipe `|` as Definition Boundary.** The `|` token implicitly signals the start of a new definition. The JIT reacts by emitting a `RET` (`C3`) to close the previous definition, followed by `48 92` (`xchg rax, rdx`) to rotate the stack for the new definition. The analysis states: "**Definitions:** There are no `begin` or `end` words. A magenta pipe token (`|`) implicitly signals the start of a new definition. The JIT reacts to this by: 1. Emitting a `RET` (`C3`) to close the *previous* definition. 2. Emitting `48 92` (`xchg rax, rdx`) to ensure proper stack alignment for the *new* definition." (`kyra_in-depth.md`, lines 24-27) This is the direct model for the DSL's `arena { }` block, which delimits a sequence of operations with an implicit entry/exit protocol.

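The byte sequence the analysis attributes to the magenta pipe can be sketched as a tiny emitter. Assumptions not stated in the source: the first `|` has no previous definition to close, each definition's entry point is the `XCHG` that follows the `RET`, and a `finish` call closes the last definition. `Jit`, `pipe`, `emit`, and `finish` are hypothetical names; the sketch only writes bytes into a buffer and does not map or execute them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical code buffer illustrating the magenta-pipe boundary.
struct Jit {
    std::vector<std::uint8_t> code;
    std::vector<std::size_t> definitionStarts;   // entry offset of each definition

    // Called when the tokenizer meets a magenta '|'.
    void pipe() {
        if (!definitionStarts.empty()) code.push_back(0xC3);  // RET: close the previous definition
        definitionStarts.push_back(code.size());              // assumed entry point of the new one
        code.push_back(0x48);                                 // REX.W prefix
        code.push_back(0x92);                                 // XCHG RAX, RDX (opcode 90+r, r = RDX)
    }

    void emit(std::uint8_t byte) { code.push_back(byte); }   // stand-in for a definition body
    void finish() { if (!definitionStarts.empty()) code.push_back(0xC3); }  // close the last one
};
```

Two definitions with one-byte `NOP` bodies produce `48 92 90 C3 48 92 90 C3`, with the second definition entered at offset 4.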
**Basic Blocks `[ ]` and Lambdas `{ }`.** KYRA eliminates standard ASTs and `if/else/then` branching. Basic blocks `[ ]` visually constrain the assembly output with implicit begin/link/end jump targets. Lambdas `{ }` compile code elsewhere and leave an executable memory address in `RAX`. The analysis states: "**Basic Blocks `[ ]`:** These visually constrain the assembly output. They provide implicit begin, link (else), and end jump targets for the JIT to resolve relative offsets within a limited scope." And: "**Lambdas `{ }`:** A lambda (colored Yellow `{`) does not execute inline. The JIT compiles the block of code elsewhere in the arena and leaves its executable memory address in `RAX`." (`kyra_in-depth.md`, lines 56-59) These are the direct models for the DSL's `[ ]` (sequential block) and `{ }` (deferred/lambda block) delimiters.

**Preemptive Scatter.** Onat pre-scatters arguments into fixed global memory slots ("the tape") before a call, eliminating argument gathering at call time. The X.com thread analysis captures Lottes's commentary: "VK is most 'form filling'. For most 'C' like APIs I like to just lay out all the arguments in memory like a tape drive in the order that functions get called and source that tape at runtime for the calls." (`C:\projects\forth\bootslop\references\X.com - Onat & Lottes Interaction 1.png.ocr.md`, lines 52-55) And: "They key concept here is that 'common' arguments like the device are pushed onto the tape using store duplication when they are known (after device creation). So it's preemptive scatter, so later at call time there is no argument gather." (lines 59-61) This is the direct model for the DSL's `scatter` and `gather` verbs.

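Preemptive scatter can be sketched with a hypothetical two-call plan whose arguments are laid out on a tape in call order. The device handle appears in two slots and is stored into both as soon as it is known (the "store duplication" in the quote), so at call time each function just reads its arguments sequentially from the cursor. `Tape`, `createBuffer`, and `bindBuffer` are illustrative names, not Vulkan functions, and their return values only encode what they read so the example can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Slot = std::uint64_t;

// Arguments laid out in the order the calls will happen, read back sequentially.
struct Tape {
    std::vector<Slot> slots;
    std::size_t cursor = 0;
    Slot next() { return slots[cursor++]; }
};

// Hypothetical call plan fixed ahead of time:
//   createBuffer(device, size)  -> slots 0, 1
//   bindBuffer(device, index)   -> slots 2, 3
constexpr std::size_t kDeviceSlots[] = {0, 2};

// When the device becomes known, scatter it into every slot that needs it.
void onDeviceCreated(Tape& t, Slot device) {
    for (std::size_t i : kDeviceSlots) t.slots[i] = device;
}

// At call time there is no gathering: each call just reads its next slots.
Slot createBuffer(Tape& t) { Slot device = t.next(); Slot size = t.next(); return device * 1000 + size; }
Slot bindBuffer(Tape& t)   { Slot device = t.next(); Slot index = t.next(); return device * 1000 + index; }
```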
**Global Memory as Register Aliasing.** Onat rejects the conventional wisdom of avoiding global variables. The in-depth analysis describes his approach: "For passing transient state (like the active UI element's `slot ID`), he implicitly passes the value in a dedicated register (e.g., `R12D`) across functions, completely bypassing any need to push it to a stack." (`kyra_in-depth.md`, line 41) The register file is treated as a shared, aliased memory space. Lottes confirms this on the X.com thread: "I do all my custom CPU side stuff more like treating the register file like a 'memory' of which the contents are aliased to different shared structures for different purposes across time." (lines 96-98) The DSL inherits this as the **arena model**: a flat, fixed-offset memory region that all verbs share, with no argument-passing overhead.

**24-Bit Indices and Dictionary Organization.** Words are stored as 24-bit indices pointing to 8-byte cells, with the dictionary organized into 16-word horizontal "scrolls." The analysis notes: "Unlike text-based Forths that require hashing, KYRA uses a pure binary index map." (`kyra_in-depth.md`, line 47) Onat's next iteration moves to 32-bit indices + a separate 1-byte tag array, "exactly matching Lottes's `x68` annotation model." (line 49) This convergence suggests that keeping the executable index separate from its tag metadata is a robust design choice for both systems.

### Code Examples

From the KYRA in-depth analysis, the color semantics emit x86-64 instructions directly:
- **Magenta (`|`):** Definition boundary -> `RET` + `xchg rax, rdx`
- **White (Call):** Direct `CALL` instruction or `JMP RAX` for tail-call optimization
- **Green (Load):** `mov rax, [global_offset]`
- **Red (Store):** `mov [global_offset], rax`
- **Yellow (Execute/Immediate):** Runtime execution, immediate lambda invocation, struct member reading
- **Cyan (Literal):** `mov rax, imm`
- **Blue (Comment):** Stored in token payload without polluting the global dictionary

### Take

- **For Section 1 (Anchor Claims):** "KYRA/VAMP (Turkcuoglu, SVFIG 2025) is the most concrete modern expression of the Forth lineage: 2-register JIT-compiling stack, preemptive scatter, lambdas as separate code objects, and magenta-pipe definition boundaries. The DSL's `arena { }`, `scatter`, `gather`, and `->` pipeline operator are direct descendants of these four features."
- **For Section 5 (Hardware Mapping):** "KYRA's 2-register stack (`RAX`/`RDX`) maps to the DSL's implicit input/output registers. The magenta pipe `|` maps to the DSL's `arena { }` entry/exit protocol. Preemptive scatter maps to the DSL's `scatter` verb (pre-place) and `gather` verb (collect)."

---

## Entry: x68 / 5th / "Ear" + "Toe" (Timothy Lottes, 2007-2026)

Timothy Lottes has spent nearly two decades evolving a Forth-like system from an HP48 RPN calculator baseline through multiple generations: a text-based "A" language (2014), a source-less "x68" binary encoding (2015), and the current "5th" system (2026). x68 is a subset of x86-64 where every instruction is padded to 32 bits (4 bytes, or a multiple of 4) using ignored segment override prefixes and multi-byte NOPs, enabling edit-time relinking. The 5th system adds a folded interpreter (a 5-byte interpreter folded into the end of every compiled word to reduce branch misprediction stalls), an annotation overlay (64 bits of metadata per 32-bit token: 56 bits for a label/name, 8 bits for a semantic tag), and a self-modifying OS cartridge that uses Linux's memory mapping and dirty page writeback for persistence without a save-file system. "Ear" is the high-level Forth-like macro layer; "Toe" is the low-level x68 assembler.

What we take from Lottes is the **source-less model** (the binary *is* the source; no string parsing at runtime), the **32-bit token granularity** as the unit of both storage and editing, the **annotation overlay** as the separation of executable data from human-readable metadata, and the **folded interpreter** pattern that reduces branch misprediction by giving every word its own fetch/dispatch slot. These four features directly inform the DSL's storage model, its edit-time relinking, and its separation of data (tokens) from documentation (annotations).

### Detailed Analysis

**Source-Less Programming.** Lottes's most critical architectural shift is from text-based source files to binary-as-source. The blog analysis states: "Parsing text (lexical analysis, string hashing, AST generation) is slow and complex. In a source-less model, the 'source code' *is* the binary executable image (or a direct structured representation of it)." (`C:\projects\forth\bootslop\references\blog_in-depth.md`, line 21) This is the direct model for the DSL's token-based storage: the DSL source is a token array, not a text file.

**32-Bit Instruction Granularity (x68).** Every x86-64 instruction is padded to exactly 4 bytes using ignored prefixes and NOPs. The neokineogfx analysis states: "**32-Bit Instruction Granularity:** Every x86-64 instruction is padded to exactly 4 bytes (or multiples of 4)." (`C:\projects\forth\bootslop\references\neokineogfx_in-depth.md`, line 26) The blog analysis gives a concrete example: "A `RET` instruction (`C3`) becomes `C3 90 90 90`." (`blog_in-depth.md`, line 27) This padding strategy is the model for the DSL's fixed-width token encoding.

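The padding rule can be sketched as a function that rounds one encoded instruction up to a 4-byte boundary with trailing `0x90` (single-byte NOP) bytes, which reproduces the `C3 90 90 90` example. This is a simplification: the source says x68 uses ignored segment-override prefixes and multi-byte NOPs, whose placement is not modeled here. `padTo32` and `assemble` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Rounds one encoded instruction up to a multiple of 4 bytes with trailing NOPs.
std::vector<std::uint8_t> padTo32(std::vector<std::uint8_t> insn) {
    while (insn.size() % 4 != 0) insn.push_back(0x90);
    return insn;
}

// Concatenates padded instructions, so every instruction boundary is a 4-byte boundary.
std::vector<std::uint8_t> assemble(const std::vector<std::vector<std::uint8_t>>& insns) {
    std::vector<std::uint8_t> out;
    for (const auto& i : insns) {
        std::vector<std::uint8_t> p = padTo32(i);
        out.insert(out.end(), p.begin(), p.end());
    }
    return out;
}
```

With every instruction occupying whole 32-bit tokens, the editor can address, insert, and delete code in token units rather than at variable byte offsets.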
**Annotation Overlay.** For every 32-bit source word, there are 64 bits of annotation memory. The layout is: 56 bits for a human-readable label/name (8 characters at 7 bits each), and 8 bits for a semantic tag dictating how the editor formats the value. The neokineogfx analysis describes: "**64-bit Annotation Layout:** 8 characters encoded in 7 bits each (56 bits total) acting as the human-readable Label/Note. 8-bit Tag. This tag dictates how the 32-bit value in memory is formatted in the editor (e.g., Hex Data, Absolute Address, Relative Address)." (`neokineogfx_in-depth.md`, lines 36-38) This is the model for the DSL's per-token metadata (verb documentation, type annotations, source references).

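A packing sketch for the 64-bit annotation word, assuming 7-bit ASCII labels and a bit order (label in the high 56 bits with the first character highest, tag in the low 8 bits) that the source does not specify; only the field widths come from the analysis. Labels longer than eight characters are truncated, and the tag value used in the test is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Packs up to 8 seven-bit characters (56 bits) plus an 8-bit tag into 64 bits.
std::uint64_t packAnnotation(const std::string& label, std::uint8_t tag) {
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        std::uint64_t c = i < label.size() ? (static_cast<unsigned char>(label[i]) & 0x7F) : 0;
        bits = (bits << 7) | c;              // first character ends up in the highest bits
    }
    return (bits << 8) | tag;                // tag occupies the low byte
}

std::uint8_t annotationTag(std::uint64_t a) { return static_cast<std::uint8_t>(a & 0xFF); }

std::string annotationLabel(std::uint64_t a) {
    std::string out;
    for (int i = 7; i >= 0; --i) {           // character j sits at bit offset 8 + 7 * (7 - j)
        char c = static_cast<char>((a >> (8 + 7 * i)) & 0x7F);
        if (c != 0) out.push_back(c);        // zero marks an unused character slot
    }
    return out;
}
```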
**Edit-Time Relinking.** When a token is inserted or deleted, the editor dynamically recalculates all `CALL`/`JMP` relative offsets and 8-bit conditional jump offsets in real time. The analysis states: "When you insert or delete a token in the editor, all tokens tagged as `ABS` or `REL` (addresses) are automatically recalculated and updated in real-time. The editor *is* the linker." (`neokineogfx_in-depth.md`, line 42) This is the model for the DSL's compile-time symbol resolution.

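A sketch of the fix-up step, simplified to measure addresses in token indices rather than bytes and to store `REL` offsets relative to the following token, in the spirit of x86 `[RIP + imm32]` addressing. `insertToken` decodes every `ABS`/`REL` token to its absolute target, shifts targets at or after the insertion point, and re-encodes; the function name and the choice that a jump to the insertion point keeps following the displaced token are assumptions, and 8-bit jump range checks are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Tag { DAT, OP, ABS, REL };

// A 32-bit source token plus its tag. Addresses are token indices here,
// a simplification of real byte addresses.
struct Token {
    Tag tag;
    std::int32_t value;   // ABS: target index; REL: target - (self + 1)
};

bool isAddress(const Token& t) { return t.tag == Tag::ABS || t.tag == Tag::REL; }

std::int32_t targetOf(const Token& t, std::size_t self) {
    return t.tag == Tag::ABS ? t.value : static_cast<std::int32_t>(self) + 1 + t.value;
}

// Inserts 't' at 'pos' and re-encodes every other ABS/REL token so it still
// points at the same logical token. The new token is stored as given.
void insertToken(std::vector<Token>& toks, std::size_t pos, Token t) {
    const auto p = static_cast<std::int32_t>(pos);
    std::vector<std::int32_t> targets(toks.size(), 0);
    for (std::size_t i = 0; i < toks.size(); ++i) {
        if (!isAddress(toks[i])) continue;
        std::int32_t tgt = targetOf(toks[i], i);
        targets[i] = tgt >= p ? tgt + 1 : tgt;   // targets at or after pos shift by one
    }
    toks.insert(toks.begin() + static_cast<std::ptrdiff_t>(pos), t);
    targets.insert(targets.begin() + static_cast<std::ptrdiff_t>(pos), 0);
    for (std::size_t i = 0; i < toks.size(); ++i) {
        if (i == pos || !isAddress(toks[i])) continue;
        if (toks[i].tag == Tag::ABS) toks[i].value = targets[i];
        else toks[i].value = targets[i] - static_cast<std::int32_t>(i) - 1;
    }
}
```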
**Folded Interpreter.** Lottes mitigates the branch misprediction problem by folding a 5-byte interpreter into the end of every compiled word. The analysis states: "**Solution - The Folded Interpreter:** Lottes mitigates this by folding a tiny (5-byte) interpreter directly into the end of every compiled word. By ending every word with its own fetch/dispatch logic (e.g., `LODSD`, lookup, `JMP`), the CPU's branch predictor gets unique slots for every transition, drastically improving execution speed." (`neokineogfx_in-depth.md`, lines 20-22) This is the model for the DSL's per-verb dispatch optimization.

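The structural idea can be sketched in C++: every handler ends with its own copy of the fetch-and-dispatch step instead of returning to one shared loop. This only mirrors the shape of the technique; the branch-prediction benefit depends on the compiler turning the trailing calls into separate indirect jumps, which C++ does not guarantee, and Lottes's version is a 5-byte machine-code sequence rather than a function call. The opcode numbering and handler names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Stack = std::vector<long>;
using Handler = void (*)(Stack&, const std::uint8_t*);
extern const Handler kTable[];

// The "folded" step: fetch the next opcode and dispatch to it. Every handler
// ends by calling this itself instead of returning to a shared loop.
inline void dispatch(Stack& s, const std::uint8_t* ip) { kTable[*ip](s, ip + 1); }

void opHalt(Stack&, const std::uint8_t*) {}
void opLit(Stack& s, const std::uint8_t* ip) { s.push_back(*ip); dispatch(s, ip + 1); }  // 1-byte literal follows
void opAdd(Stack& s, const std::uint8_t* ip) { long b = s.back(); s.pop_back(); s.back() += b; dispatch(s, ip); }
void opMul(Stack& s, const std::uint8_t* ip) { long b = s.back(); s.pop_back(); s.back() *= b; dispatch(s, ip); }

const Handler kTable[] = {opHalt, opLit, opAdd, opMul};   // opcodes 0, 1, 2, 3

long runFolded(const std::vector<std::uint8_t>& code) {
    Stack s;
    dispatch(s, code.data());
    return s.back();
}
```

In a central-loop interpreter all transitions share one indirect branch; here each handler's trailing `dispatch` is a separate call site, which is what gives the predictor per-transition history when it is compiled to a jump.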
**"Ear" + "Toe" Language Split.** Lottes's 2015 post solidifies the two-language model: "Toe" is the low-level x86-64 assembler with 32-bit padded opcodes; "Ear" is the zero-operand Forth-like language embedded in the binary. The blog analysis states: "**'Toe' (The Low-Level Assembler):** This is the subset of x86-64 with 32-bit padded opcodes. It is heavily macro-driven to assemble machine code. **'Ear' (The High-Level Macro/Forth Language):** A zero-operand, Forth-like language embedded directly into the binary form." (`blog_in-depth.md`, lines 54-57) This two-language split is the model for the DSL's Tier 1 (math primitives) vs. Tier 2 (pipeline verbs) distinction.

**Register File as Aliased Global Namespace.** Lottes on the X.com thread: "I do all my custom CPU side stuff more like treating the register file like a 'memory' of which the contents are aliased to different shared structures for different purposes across time. So the register file is more like an aliased global namespace. And 'functions' are free of arguments and free of returns." (lines 96-103) This is the direct model for the DSL's arena model.

### Code Examples

x68 token types (from `blog_in-depth.md`):
- **DAT:** Hexadecimal data or immediate value
- **OP:** Padded 32-bit x86-64 machine instruction
- **ABS:** Direct 32-bit memory pointer
- **REL:** `[RIP + imm32]` relative offset for branching

Annotation overlay layout (64-bit per token):
```
[56-bit label/name (8 chars x 7 bits)] [8-bit semantic tag]
```

### Take

- **For Section 1 (Anchor Claims):** "x68/5th (Lottes, 2007-2026) established the source-less model: the binary token array *is* the source of truth, with no string parsing at runtime. The DSL inherits this as its token-based storage model and its edit-time relinking strategy."
- **For Section 5 (Hardware Mapping):** "x68's 32-bit token granularity maps to the DSL's fixed-width token encoding. The annotation overlay (56-bit label + 8-bit tag per token) maps to the DSL's per-token metadata field. The folded interpreter maps to the DSL's per-verb dispatch optimization."

---

## Entry: Joy (Manfred von Thun, 2001-2003)

Joy is a purely functional concatenative programming language designed by Manfred von Thun of La Trobe University, Melbourne, first published in 2001. It is based on the composition of functions rather than lambda calculus, and its key innovation is that *quotations* (programs enclosed in square brackets) are first-class values that can be manipulated like any other data type. Joy has no formal parameters; functions operate on a stack implicitly. The language includes a rich set of combinators (higher-order functions) that operate on quotations: `map`, `filter`, `fold`, `step`, `ifte`, `linrec`, `binrec`, `primrec`, and others. These combinators remove the need for many explicit recursive definitions by encoding common recursion patterns as built-in primitives.

What we take from Joy is the **quotation-as-first-class-value** concept and the **combinator library** as a model for the DSL's verb qualifiers and the aggregate operations (`map`, `filter`, `fold`, `scan`) that form the core of the Tier 2 pipeline. Joy's claim that "the concatenation of two programs denotes the composition of the functions denoted by the two programs" is the formal statement of the concatenative property that the DSL inherits.

### Detailed Analysis

**Purely Functional Concatenative Model.** The Wikipedia article states: "Joy is a concatenative programming language: 'The concatenation of two programs denotes the composition of the functions denoted by the two programs'." (https://en.wikipedia.org/wiki/Joy_(programming_language)#Mathematical_purity) This is the formal definition of the concatenative property that the DSL inherits. Unlike Forth, where words have side effects and can mutate global state, Joy's functions are pure — they take a stack as input and return a stack as output with no other effects.

**Quotations as First-Class Values.** Joy's central innovation is that programs enclosed in square brackets (`[ ]`) are first-class values that can be pushed onto the stack, stored in data structures, and passed to combinators. The archived tutorial states: "Lists are really just a special case of *quoted programs*. Lists only contain values of the various types, but quoted programs may contain other elements such as operators... A *quotation* can be treated as passive data structure just like a list." (https://web.archive.org/web/20111007030359/http://www.latrobe.edu.au/phimvt/joy/j01tut.html) This is the direct model for the DSL's `[ ]` block syntax and the ability to pass blocks as arguments to verbs.

**Combinators Eliminate Recursive Definitions.** Joy's combinators encode common higher-order patterns. The tutorial gives the `map` combinator: "`map` combinator expects an aggregate value on top of the stack, and it yields another aggregate of the same size. The elements of the new aggregate are computed by applying the quoted program to each element of the original aggregate." (https://web.archive.org/web/20111007030359/http://www.latrobe.edu.au/phimvt/joy/j01tut.html) The `binrec` combinator encodes binary recursion (used in quicksort); `primrec` encodes primitive recursion; `linrec` encodes linear recursion. These are the models for the DSL's aggregate pipeline verbs.
|
||||
|
||||
**No Formal Parameters.** The tutorial states: "In conventional languages the definition of a function of one or more arguments has to name these as formal parameters x, y... In Joy formal parameters such as x above are not required, a definition of the squaring function is simply `square == dup *`." (https://web.archive.org/web/20111007030359/http://www.latrobe.edu.au/phimvt/joy/j01tut.html) This variable-free notation is the direct model for the DSL's implicit stack parameters.
|
||||
|
||||
**Mathematical Foundations.** The Wikipedia article references the Joy mathematical foundations paper: "The concatenation of two programs denotes the composition of the functions denoted by the two programs." (https://en.wikipedia.org/wiki/Joy_(programming_language)#Mathematical_purity) This formal statement is the design axiom of the concatenative cluster.
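A small Python model can check the composition property. In it, each program is a pure function from stack to stack, and concatenating programs composes those functions left to right. This is a sketch under that assumption; `dup`, `mul`, and `concat_programs` are hypothetical names, not Joy's implementation.

```python
"""Sketch: Joy-style programs as pure stack -> stack functions."""
from functools import reduce
from typing import Callable, Tuple

Stack = Tuple[int, ...]            # immutable, so no word can mutate shared state
Program = Callable[[Stack], Stack]


def dup(s: Stack) -> Stack:
    return s + (s[-1],)


def mul(s: Stack) -> Stack:
    return s[:-2] + (s[-2] * s[-1],)


def concat_programs(*programs: Program) -> Program:
    """Concatenating programs = composing their functions, left to right."""
    return lambda s: reduce(lambda acc, p: p(acc), programs, s)


square = concat_programs(dup, mul)             # Joy: square == dup *
fourth_power = concat_programs(square, square)

assert square((7,)) == (49,)
assert fourth_power((3,)) == (81,)
```

Immutable tuples make the purity explicit: each word returns a new stack instead of mutating its input, which is the contrast with Forth drawn above.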
### Code Examples

Joy quicksort (no named recursion; the `binrec` combinator supplies the recursion pattern):
```
DEFINE qsort ==
[small]
[]
[uncons [>] split]
[swapd cons concat]
binrec .
```

Joy map:
```
[1 2 3 4] [dup *] map
```
produces `[1 4 9 16]`.

Joy factorial (no named recursion):
```
5 [1] [*] primrec
```
produces `120`.
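A tiny evaluator can reproduce the three examples. The Python sketch below is a hypothetical model, not von Thun's implementation, and it has three limits: it supports only the words these examples use, it treats `small` as a list-only test, and it handles `primrec` only for non-negative integers. It shows the combinator idea directly: `map`, `primrec`, and `binrec` receive quotations as ordinary stack values and supply the recursion themselves.

```python
"""Sketch of a tiny Joy-like evaluator (hypothetical; not von Thun's implementation).

Programs are Python lists: numbers and nested lists (quotations) push
themselves, and strings name words that transform the stack in place.
"""
from typing import Any, Callable, Dict, List

Stack = List[Any]


def run(program: List[Any], stack: Stack) -> Stack:
    """Execute a program left to right against the stack."""
    for item in program:
        if isinstance(item, str):
            WORDS[item](stack)
        else:
            stack.append(item)  # literals and quotations are plain data
    return stack


def binary(op: Callable[[Any, Any], Any]) -> Callable[[Stack], None]:
    """Turn a two-argument function into a word: X Y -> op(X, Y)."""
    def word(s: Stack) -> None:
        y, x = s.pop(), s.pop()
        s.append(op(x, y))
    return word


def uncons(s: Stack) -> None:   # A uncons -> first rest
    a = s.pop()
    s.extend([a[0], a[1:]])


def swapd(s: Stack) -> None:    # X Y Z swapd -> Y X Z
    s[-3], s[-2] = s[-2], s[-3]


def map_(s: Stack) -> None:     # A [P] map: collect P's result for each element
    quote, agg = s.pop(), s.pop()
    s.append([run(quote, s + [x])[-1] for x in agg])


def split(s: Stack) -> None:    # A [B] split -> A1 (B true) A2 (B false)
    quote, agg = s.pop(), s.pop()
    flags = [bool(run(quote, s + [x])[-1]) for x in agg]
    s.append([x for x, f in zip(agg, flags) if f])
    s.append([x for x, f in zip(agg, flags) if not f])


def primrec(s: Stack) -> None:  # N [I] [C] primrec, for integers N >= 0 only
    combine, init, n = s.pop(), s.pop(), s.pop()
    s.extend(range(n, 0, -1))   # push N, N-1, ..., 1
    run(init, s)
    for _ in range(n):          # combine once per pushed integer
        run(combine, s)


def binrec(s: Stack) -> None:   # [P] [T] [R1] [R2] binrec
    r2, r1, base, test = s.pop(), s.pop(), s.pop(), s.pop()

    def rec() -> None:
        if run(test, s.copy())[-1]:   # test on a copy: it must not consume the value
            run(base, s)
            return
        run(r1, s)                    # R1 leaves two values to recurse on
        second = s.pop()
        rec()                         # solve the first value
        first = s.pop()
        s.append(second)
        rec()                         # solve the second value
        s.insert(len(s) - 1, first)   # restore order: first, second
        run(r2, s)                    # R2 combines the two results

    rec()


WORDS: Dict[str, Callable[[Stack], None]] = {
    "dup": lambda s: s.append(s[-1]),
    "*": binary(lambda x, y: x * y),
    ">": binary(lambda x, y: x > y),
    "small": lambda s: s.append(len(s.pop()) <= 1),  # lists only in this sketch
    "cons": binary(lambda x, rest: [x] + rest),
    "concat": binary(lambda a, b: a + b),
    "uncons": uncons, "swapd": swapd, "map": map_, "split": split,
    "primrec": primrec, "binrec": binrec,
}
```

In this sketch `binrec` evaluates its test on a copy of the stack. The quicksort definition only works if `small` leaves the list in place for the `[]` branch.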
### Take

- **For Section 1 (Anchor Claims):** "Joy (von Thun, 2001-2003) provided the formal foundations for the concatenative property: program concatenation denotes function composition. Its quotation model (`[ ]` as first-class values) and combinator library (`map`, `filter`, `fold`, `binrec`) are the direct ancestors of the DSL's aggregate pipeline verbs."
- **For Section 5 (Hardware Mapping):** "Joy's combinators map to the DSL's Tier 2 aggregate verbs. `map` -> `map`, `filter` -> `filter`, `fold` -> `fold`, `step` -> `scan` (Joy's `step` runs a quotation on each element in order; `scan` additionally keeps the running accumulation). The quotation syntax `[ ]` maps to the DSL's `[ ]` block delimiter for sequential operations."

---

## Entry: CoSy (Bob Armstrong, ongoing)

CoSy (Contrastive Synthesis) is an ongoing project by Bob Armstrong that extends Forth with a TimeStamped notebook/log interface, an APL-inspired vocabulary (slicing, dicing, searching, applying verbs to each item in lists), and a data model where all nouns are lists or trees with a 3-cell header `( Type Count refCount )`. Indexing is modulo (like counting on fingers: `0 1 2 3 4 0`). The environment is written entirely in CoSy itself. The philosophical goal is the succinct expression of algorithms via an "extensive vocabulary evolved from APL via K." CoSy is built on Reva Forth (a descendant of FIG-Forth), and its notebook interface is the primary environment: programs are written and executed within the log, not in separate files.

What we take from CoSy is the **notebook/log as the primary program representation** (all code lives in a timestamped ledger, not a file system), the **modulo indexing** model (indices wrap, like human counting), the **3-cell list header** `( Type Count refCount )` as a universal data structure, and the **APL-derived vocabulary** (slicing, dicing, mapping across lists) as the model for the DSL's Tier 2 data manipulation verbs. CoSy's open-vocabulary culture (the idea that the language should grow organically to cover new domains) is the guiding principle for the DSL's extensibility model.

### Detailed Analysis

**TimeStamped Notebook/Log.** CoSy is structured as a timestamped log (Captain Picard's Log from Star Trek is the explicit metaphor). Programs are written directly into this log and executed from it. The CoSy website states: "CoSy is a TimeStamped notebook/log created as an open vocabulary in Forth." (https://cosy.com/CoSy/Simplicity.html) The OpeningText.txt confirms: "Think of CoSy as intelligent paper." (from `C:\projects\forth\bootslop\references\OpeningText.txt`) This is the model for the DSL's session-state model: the execution context is a timestamped log, not a file system.

**Nouns as Lists/Trees with 3-Cell Headers.** Every CoSy list has a header of three cells: `( Type Count refCount )`. Type 0 is a list of lists. Simple lists (characters, numbers) are leaf nodes. The website states: "all nouns are lists, *trees*. At the Forth level they have a 3 cell header `( Type Count refCount )`." (https://cosy.com/CoSy/Simplicity.html) This is the model for the DSL's uniform data model: all values are tokens with a type tag, a count, and a reference count.

**Modulo Indexing.** CoSy indices wrap: `0 1 2 3 4 0`. The website states: "Indexing is modulo - like counting on your thumb & fingers : 0 1 2 3 4 0." (https://cosy.com/CoSy/Simplicity.html) This is the model for the DSL's modulo indexing rule in its array verbs.

**APL-Derived Vocabulary.** CoSy's vocabulary comes from APL via K, with heavy emphasis on slicing, dicing, searching, and applying verbs to each item in lists. The website states: "an extensive vocabulary evolved from APL via K, mainly slicing and dicing, searching & replacing, and applying verbs to each item in lists." (https://cosy.com/CoSy/Simplicity.html) The OpeningText.txt shows iterators: "RA ' verb 'm | monadic each. Applies verb to each item of RA" and "LA RA ' verb 'd | dyadic each." This is the model for the DSL's Tier 2 data manipulation vocabulary.

**The `each` Iterator Pattern.** CoSy implements four forms of `each` (mimicking APL adverbs): monadic each, dyadic each, each applied to left argument, each applied to right argument. The OpeningText.txt states: "Note that while the current single thread implementation of CoSy the arguments are iterated thru, there is no implication of sequenciality. The definitions are intrinsically parallel." This is the model for the DSL's `map` verb, which applies a block to each element of an aggregate.

**Self-Hosting.** CoSy's notebook environment is written entirely in CoSy. The website states: "The CoSy notebook environment itself is written in CoSy." (https://cosy.com/CoSy/Simplicity.html) This bootstrap property (the language written in itself) is the ultimate expression of the concatenative principle.

**Tick vs. Quote Distinction.** CoSy distinguishes between \` (returns the next word as a string) and ' (returns the address of the following word). The OpeningText.txt states: "NB : Note the difference between \` and '. \` returns next word as a string. versus \` ' Help returns the address of a raw Reva Forth definition." This two-mode distinction (string vs. execution token) is the model for the DSL's string-literal vs. verb-reference distinction.
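Two of the ideas above fit in a minimal Python model: every noun carries a `( Type Count refCount )` header, and indexing wraps modulo the count. The class, its field names, and the type code are assumptions for illustration. This is not CoSy's Forth-level layout or its actual `at` verb.

```python
"""Sketch: a CoSy-like noun with a ( Type Count refCount ) header.

Hypothetical model, not CoSy's Forth-level memory layout or its `at` verb.
"""
from dataclasses import dataclass
from typing import Any, List, Tuple

INTS = 1  # assumed type code for a simple integer list (type 0 = list of lists)


@dataclass
class Noun:
    type_code: int
    items: List[Any]
    ref_count: int = 1

    @property
    def count(self) -> int:
        """Second header cell, derived from the items."""
        return len(self.items)

    def header(self) -> Tuple[int, int, int]:
        return (self.type_code, self.count, self.ref_count)

    def at(self, indices: List[int]) -> "Noun":
        """Select items with modulo indexing: out-of-range indices wrap."""
        return Noun(self.type_code, [self.items[i % self.count] for i in indices])


fingers = Noun(INTS, [0, 1, 2, 3, 4])
assert fingers.header() == (INTS, 5, 1)
assert fingers.at(list(range(7))).items == [0, 1, 2, 3, 4, 0, 1]
```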
### Code Examples

CoSy list indexing and APL-style operations (from OpeningText.txt):
```
i( 1 2 3 5 )i 20 _iota at
```
Builds the integer list `1 2 3 5`, generates an index range with `20 _iota`, and combines the two with `at` (top-level get).

CoSy iterator pattern:
```
RA ' verb 'm | monadic each
LA RA ' verb 'd | dyadic each
```
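The two iterator forms map naturally onto higher-order functions. In this hypothetical Python sketch, `each_m` stands in for the monadic form and `each_d` for the dyadic form. The sketch assumes the dyadic form requires equal-length arguments. No application depends on another, so a parallel version returns the same result; the CoSy notes call this property "intrinsically parallel".

```python
"""Sketch of CoSy-style monadic and dyadic `each` (hypothetical model)."""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def each_m(ra: List[A], verb: Callable[[A], B]) -> List[B]:
    """RA ' verb 'm : apply verb to each item of RA."""
    return [verb(x) for x in ra]


def each_d(la: List[A], ra: List[B], verb: Callable[[A, B], C]) -> List[C]:
    """LA RA ' verb 'd : apply verb to corresponding items of LA and RA."""
    if len(la) != len(ra):
        raise ValueError("this sketch requires equal-length arguments")
    return [verb(x, y) for x, y in zip(la, ra)]


def each_m_parallel(ra: List[A], verb: Callable[[A], B]) -> List[B]:
    """Same result as each_m: items are independent, so evaluation order is free."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(verb, ra))  # pool.map preserves input order
```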
CoSy definition syntax:
```
: log R ` text v@ "lf VM ;
```
Defines the word `log` with ordinary Forth colon syntax (`: name ... ;`); the body splits `text` on linefeeds.

### Take

- **For Section 1 (Anchor Claims):** "CoSy (Armstrong, ongoing) established the notebook/log as the primary program representation, the 3-cell list header as a universal data model, and modulo indexing as the array access model. The DSL inherits these as its session-state model, uniform token format, and array indexing rules."
- **For Section 5 (Hardware Mapping):** "CoSy's 3-cell header `( Type Count refCount )` maps to the DSL's token header format. Modulo indexing maps to the DSL's array access rules. The APL-derived vocabulary (`each`, slicing, dicing) maps to the DSL's Tier 2 data manipulation verbs."

---

## Synthesis for Section 5

This section maps each Tier 2 verb in the DSL to the specific Concatenative entry that grounds it, enabling the Tier 1 Orchestrator to write Section 5's Claim 1 (Onat/Lottes -> `->`/`[ ]`/`arena { }`/`scatter`/`gather`) and Claim 3 (Forth/CoSy -> concatenative syntax).

### Tier 2 Verb -> Concatenative Entry Mapping

| DSL Verb | Grounding Entry | Specific Mechanism |
|---|---|---|
| `->` (pipeline) | **Forth** (Moore, 1970) | Postfix word chain: concatenating words composes their stack effects. The `->` operator is syntactic sugar for this chain. |
| `[ ]` (sequential block) | **KYRA/VAMP** (Turkcuoglu, 2025) | Basic blocks `[ ]` provide implicit begin/link/end jump targets. The DSL's `[ ]` denotes a sequential operation block. |
| `{ }` (lambda/deferred block) | **KYRA/VAMP** (Turkcuoglu, 2025) | Lambdas `{ }` compile code elsewhere and leave an address in `RAX`. The DSL's `{ }` denotes a deferred block passed as an argument. |
| `arena { }` (scoped memory region) | **KYRA/VAMP** (Turkcuoglu, 2025) | Magenta pipe `\|` defines a memory region with entry/exit protocol (`RET` + `xchg rax, rdx`). The DSL's `arena { }` delimits a shared memory scope. |
| `scatter` (pre-place arguments) | **KYRA/VAMP** (Turkcuoglu, 2025) + **x68/Lottes** | Preemptive scatter: arguments pre-placed into fixed global slots ("the tape") before a call. Lottes: "VK is most 'form filling'. I like to just lay out all the arguments in memory like a tape drive." (`X.com - Onat & Lottes Interaction 1.png.ocr.md`, lines 52-55) |
| `gather` (collect from slots) | **KYRA/VAMP** (Turkcuoglu, 2025) | The inverse of scatter: collect pre-scattered values from fixed memory slots. |
| `map` (apply to each) | **Joy** (von Thun, 2001-2003) + **CoSy** (Armstrong) | Joy's `map` combinator: "expects an aggregate value on top of the stack, and it yields another aggregate of the same size." (Joy tutorial) + CoSy's monadic `each`: "Applies verb to each item of RA." (OpeningText.txt) |
| `filter` (keep matching) | **Joy** (von Thun, 2001-2003) | Joy's `filter` combinator: "The result is a new aggregate of the same type containing those elements of the original for which the quoted program yields true." (Joy tutorial) |
| `fold` (reduce) | **Joy** (von Thun, 2001-2003) | Joy's `fold` combinator: "requires three parameters: the aggregate to be folded, the quoted value to be returned when the aggregate is empty, and the quoted binary operation to be used to combine the elements." (Joy tutorial) |
| `scan` (running accumulation) | **CoSy** (Armstrong) | CoSy's scan operator: "RA ' verb .\ scan \| accumulating sums, eg: running balance." (OpeningText.txt) |
| `select` (index access) | **CoSy** (Armstrong) | CoSy's indexing: `at` (top-level get), `ix` (raw indexing). Modulo indexing. |
| `sort` (order) | **Joy** (von Thun, 2001-2003) | Joy's `qsort` (binrec-based quicksort): "The program easily fits onto one line." (Joy tutorial) |
| `group` (bucket by key) | **CoSy** (Armstrong) | CoSy's APL-derived list operations. |
| `dedupe` (remove duplicates) | **Forth** (dictionary model) | Forth's vocabulary shadowing model (later definitions shadow earlier ones) as the deduplication model. |
| `pipe` (composability) | **Forth** (Moore, 1970) | The fundamental Forth word chain: "The concatenation of two programs denotes the composition of the functions denoted by the two programs." (Joy's formalization of Forth's implicit property) |
| `concat` (concatenate) | **Joy** (von Thun, 2001-2003) | Joy's `concat` operator: "pops them off the stack and pushes the concatenated list." (Joy tutorial) |
| `split` (partition) | **Joy** (von Thun, 2001-2003) | Joy's `split` combinator used in quicksort: "uses the comparison function in `[>]` and the `split` combinator." (Joy tutorial) |
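The table's aggregate verbs compose the same way as the word chains they are grounded in. The Python sketch below is a hypothetical model of that composition. Each verb takes a block (a Python callable standing in for a `{ }` quotation) and returns a stage, and `pipe` threads a value through the stages left to right, as `->` is described. The table does not fix the DSL's concrete syntax or semantics, so all names here are assumptions.

```python
"""Sketch: Tier 2 verbs as composable pipeline stages (hypothetical DSL model)."""
from functools import reduce
from itertools import accumulate
from operator import add
from typing import Any, Callable

Stage = Callable[[Any], Any]


def map_(block: Callable[[Any], Any]) -> Stage:
    """Joy map / CoSy monadic each: apply the block to every element."""
    return lambda agg: [block(x) for x in agg]


def filter_(block: Callable[[Any], bool]) -> Stage:
    """Joy filter: keep the elements for which the block yields true."""
    return lambda agg: [x for x in agg if block(x)]


def fold(block: Callable[[Any, Any], Any], empty: Any) -> Stage:
    """Joy fold: a value for the empty case plus a binary combining operation."""
    return lambda agg: reduce(block, agg, empty)


def scan(block: Callable[[Any, Any], Any]) -> Stage:
    """CoSy scan: running accumulation, e.g. a running balance."""
    return lambda agg: list(accumulate(agg, block))


def pipe(value: Any, *stages: Stage) -> Any:
    """value -> stage1 -> stage2 -> ...: stages compose left to right."""
    for stage in stages:
        value = stage(value)
    return value


total = pipe([1, 2, 3, 4], map_(lambda x: x * x), filter_(lambda x: x % 2 == 0), fold(add, 0))
balance = pipe([100, -30, 45, -20], scan(add))
```

The trailing underscores on `map_` and `filter_` only avoid shadowing Python's built-ins.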
### Section 5 Claim 1 (Onat/Lottes Lineage): Specific Grounding

**Claim:** The DSL's `->` pipeline, `[ ]`/`{ }` blocks, `arena { }` memory model, and `scatter`/`gather` verbs are direct descendants of KYRA/VAMP and x68.

**Evidence:**
- `->` pipeline: inherits from Forth's postfix word chain, refined by KYRA's 2-register stack (RAX/RDX) as the minimal call convention. (`kyra_in-depth.md`, line 14)
- `[ ]` sequential block: inherits from KYRA's basic blocks `[ ]` with implicit begin/link/end jump targets. (`kyra_in-depth.md`, lines 56-57)
- `{ }` lambda block: inherits from KYRA's lambdas `{ }` that compile code elsewhere and leave an address in RAX. (`kyra_in-depth.md`, lines 58-59)
- `arena { }`: inherits from KYRA's magenta pipe `|` definition boundary (RET + xchg rax, rdx) as the entry/exit protocol for a memory region. (`kyra_in-depth.md`, lines 24-27)
- `scatter`: inherits from Onat's preemptive scatter: "common arguments like the device are pushed onto the tape using store duplication when they are known... so it's preemptive scatter, so later at call time there is no argument gather." (`X.com - Onat & Lottes Interaction 1.png.ocr.md`, lines 59-61)
- `gather`: the inverse of preemptive scatter, collecting pre-scattered values from fixed memory slots.
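Preemptive scatter can be sketched with a fixed list of slots standing in for the tape. Values known early (such as the device) are written to their slots once. Each call site then writes only the slots that change, and the callee reads everything from fixed positions. The slot names, the `draw` callee, and the string result are hypothetical illustrations, not KYRA's or x68's actual layout.

```python
"""Sketch: preemptive scatter into fixed slots (hypothetical layout)."""
from typing import Any, Dict, List, Tuple

SLOTS: Dict[str, int] = {"device": 0, "pipeline": 1, "vertex_count": 2}
tape: List[Any] = [None] * len(SLOTS)  # fixed global argument area


def scatter(**args: Any) -> None:
    """Store arguments into their fixed slots ahead of any call."""
    for name, value in args.items():
        tape[SLOTS[name]] = value


def gather(*names: str) -> Tuple[Any, ...]:
    """Collect values back out of their fixed slots."""
    return tuple(tape[SLOTS[name]] for name in names)


def draw() -> str:
    """Callee: reads its arguments from the tape; the caller passes nothing."""
    device, pipeline, count = gather("device", "pipeline", "vertex_count")
    return f"{device}:{pipeline}:{count}"


scatter(device="gpu0", pipeline="opaque")  # known early: scattered once
results = []
for count in (3, 6):
    scatter(vertex_count=count)            # only the changing slot is written
    results.append(draw())
```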
### Section 5 Claim 3 (Forth/CoSy Concatenative Syntax): Specific Grounding

**Claim:** The DSL's concatenative syntax (postfix, stack-passing, no AST object) is grounded in Forth and CoSy.

**Evidence:**
- Postfix syntax: "The syntax is noun noun verb aka: RPN (Reverse Polish Notation)." (CoSy simplicity page, https://cosy.com/CoSy/Simplicity.html)
- Stack-passing: "Words pass information to each other by pushing it on, or taking it off a stack." (CoSy simplicity page)
- No AST object: Forth "does not have a monolithic compiler. Extending the compiler only requires writing a new word, instead of modifying a grammar and changing the underlying implementation." (https://en.wikipedia.org/wiki/Forth_(programming_language)#Overview)
- No formal parameters: "In Joy formal parameters such as x above are not required, a definition of the squaring function is simply `square == dup *`." (Joy tutorial)
- CoSy's open vocabulary: "an extensive vocabulary evolved from APL via K, mainly slicing and dicing, searching & replacing, and applying verbs to each item in lists." (https://cosy.com/CoSy/Simplicity.html)

### Summary

The Concatenative cluster provides the DSL with four distinct inheritance layers:

1. **Syntax layer (Forth + CoSy):** Postfix RPN, implicit stack parameters, no formal parameter names, noun-verb word order.
2. **Block structure layer (KYRA + ColorForth):** `[ ]` sequential blocks, `{ }` lambda blocks, color/semantic delimiters, compile-time vs. run-time mode switching.
3. **Memory model layer (KYRA + x68):** 2-register stack, preemptive scatter, arena memory, annotation overlay, edit-time relinking.
4. **Vocabulary layer (Joy + CoSy):** Combinator library (`map`, `filter`, `fold`, `scan`), APL-derived list operations, modulo indexing, self-hosting boot model.

These four layers are not independent; they compose. The DSL's `->` pipeline operator (syntax layer) chains verbs that operate on data in an `arena { }` (memory layer) using `[ ]` blocks (block structure layer) and applies `map`/`filter`/`fold` operations (vocabulary layer) that are themselves quotable `{ }` blocks (block structure layer). This four-layer composition is the architectural claim of Section 5.
# Section 2, Cluster 2: Array Languages (APL Lineage)

**Sub-report for intent-based-scripting-languages.md · Cluster 2 · Array Languages**

---

## Entry: APL (Kenneth Iverson, 1962)

### What It Is

APL (*A Programming Language*, Kenneth E. Iverson, IBM, 1962) is the foundational array programming language that introduced the radical thesis that **the multidimensional array is the universal data type** and that **every glyph is a function**. Iverson began developing the notation at Harvard in 1957 and published it in 1962; the first interactive APL session ran in 1966 on an IBM 1050 terminal at IBM Mohansic Labs. Iverson received the 1979 Turing Award for this work. The dominant modern implementation is **Dyalog APL**, a commercial cross-platform interpreter with a rich ecosystem of libraries, an online REPL (TryAPL), and a recurring APL Challenge competition. APL's defining characteristic is its **dedicated character set**: a large set of non-ASCII glyphs where each symbol is a primitive function or operator. Evaluation proceeds strictly right-to-left with no precedence hierarchy among functions; all primitive functions share equal precedence.

> "Applied mathematics is largely concerned with the design and analysis of explicit procedures for calculating the exact or approximate values of various functions. Such explicit procedures are called algorithms or *programs*."
>
> Source: Kenneth Iverson, *A Programming Language*, 1962 (via [Wikipedia](https://en.wikipedia.org/wiki/APL_(programming_language)))

### What We Take From It

The DSL inherits from APL the **array as universal type** (the idea that scalar operations are just degenerate cases of array operations) and the **glyph-as-function** philosophy, where the surface syntax directly encodes mathematical operations without verbose keywords. The DSL also inherits APL's precedence-free, single-direction evaluation as a natural way to express nested data transformations without explicit loop syntax, although its pipelines read left to right where APL reads right to left. The DSL diverges from APL in two ways. It does not adopt APL's custom character set, using an ASCII-compatible representation instead. It also does not rely on array operations alone for control flow; it provides explicit iteration scaffolding.

### Detailed Analysis

**Array as the Universal Type.** In APL, everything is an array; there are no scalar-only operations. The scalar `5` is a 0-dimensional array. Adding `4` to the vector `4 5 6 7` produces the vector `8 9 10 11`, with no loop required. This is not merely a convenience but a philosophical commitment: the language's type system is built around N-dimensional homogeneous containers, and operations are defined to propagate across dimensions according to strict rules. The **iota** (`⍳`) function generates index arrays: `⍳4` yields `1 2 3 4` (with APL's default index origin of 1). A single `+/⍳N` replaces a for-loop over the range `1..N` to compute a sum. This is the "array as universal type" in practice.

**Every Glyph Is a Function.** APL's character set is not decorative; it is load-bearing. Each of the 80+ glyphs maps to a primitive function or operator. `+/` is "plus over" (reduce), `⌽` is "reverse" (monadic) or "rotate" (dyadic), and `⊖` does the same along the first axis. `⍉` is "transpose", and `⌊` is "floor" (monadic) or "minimum" (dyadic). Operators (higher-order functions) combine with glyphs to form new functions: `+⌿` sums along the first axis, and `∘.+` is the "plus table" (outer product). Plain functions also chain right to left: `⍉⌽M` reverses `M` and then transposes the result. As a result, a complete algorithm often fits on one line; Conway's Game of Life fits in a single APL expression. This terseness is not obfuscation. Iverson's thesis, presented in his 1979 Turing Award lecture "Notation as a Tool of Thought", argues that well-designed notation shapes thought. In his view, the right notation makes algorithms clearer and more compressible than conventional ASCII-keyword languages do.

**Tacit/Point-Free Expression.** Much idiomatic APL is written as chains of primitives with few or no named intermediate values, even though classic APL defined functions with named arguments (dfns, with `⍺` and `⍵`, came later). Fully tacit definitions arrived with trains, popularized by J and later adopted by Dyalog APL. The average `(+/÷≢)` divides the sum of its argument by its length without naming the argument at all. BQN, a modern APL descendant, writes the same train as `+´÷≠`. Read as a pipeline, the argument flows through the chained functions. This is the ancestor of the modern "point-free" or "tacit" programming style found in BQN, J, K, and Uiua.

**Modern APL: Dyalog APL.** Dyalog APL (https://www.dyalog.com/) is the dominant implementation of modern APL. It introduced the dfns syntax (`{...}`) for anonymous functions with named parameters (`⍵` for the right argument, `⍺` for the left), namespaces, object-oriented extensions, and a comprehensive standard library of "dfns" (single-file function libraries). Dyalog APL is cross-platform (Windows, Linux, macOS, AIX) and ships with an interactive IDE (Ride), an online REPL, and extensive documentation. The APL Challenge (https://www.dyalog.com/apl-challenge.htm) is a recurring competition that demonstrates the language's suitability for compact algorithmic problem-solving.

**Legacy and Influence.** APL directly inspired J (Iverson's own ASCII follow-up), K (Arthur Whitney's commercial array language), MATLAB (as a numerical computation tool), and the entire family of array languages in the APL/J/K lineage. Its whole-array style also lives on in libraries such as NumPy. The Wikipedia article notes: "It has been an important influence on the development of concept modeling, spreadsheets, functional programming, and computer math packages" ([Wikipedia](https://en.wikipedia.org/wiki/APL_(programming_language))).
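Plain Python can imitate the three core behaviors described above: scalar extension, `⍳` plus reduction in place of a loop, and strict right-to-left evaluation without precedence. This is an illustrative sketch with hypothetical helper names. `iota` is 1-based to match APL's default index origin, and only `+` and `×` are supported.

```python
"""Sketch of APL-style semantics in plain Python (illustrative only)."""
from functools import reduce
from operator import add, mul
from typing import Callable, List, Union

Value = Union[int, List[int]]


def scalar_extend(op: Callable[[int, int], int]) -> Callable[[Value, Value], Value]:
    """Lift a scalar op: a scalar pairs with every element of an array."""
    def lifted(a: Value, b: Value) -> Value:
        if isinstance(a, list) and isinstance(b, list):
            if len(a) != len(b):
                raise ValueError("LENGTH ERROR")  # APL's error for mismatched lengths
            return [op(x, y) for x, y in zip(a, b)]
        if isinstance(a, list):
            return [op(x, b) for x in a]
        if isinstance(b, list):
            return [op(a, y) for y in b]
        return op(a, b)
    return lifted


plus, times = scalar_extend(add), scalar_extend(mul)


def iota(n: int) -> List[int]:
    """⍳n with index origin 1."""
    return list(range(1, n + 1))


def plus_reduce(v: List[int]) -> int:
    """+/v: insert + between the elements (0 for an empty vector)."""
    return reduce(add, v, 0)


def eval_right_to_left(tokens: list) -> Value:
    """Evaluate `a f b g c` as `a f (b g c)`: no precedence table at all."""
    ops = {"+": plus, "×": times}
    result = tokens[-1]
    for i in range(len(tokens) - 2, 0, -2):
        result = ops[tokens[i]](tokens[i - 1], result)
    return result


assert plus(4, [4, 5, 6, 7]) == [8, 9, 10, 11]
assert plus_reduce(plus(3, iota(4))) == 22            # +/3+⍳4
assert eval_right_to_left([2, "×", 3, "+", 4]) == 14  # 2×(3+4), not (2×3)+4
```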
### Code Examples

**Sum of a vector (APL):**
```
n ← 4 5 6 7    ⍝ assign a vector
+/n            ⍝ "plus over" (sum) → 22
```

**Iota-generated vector, right-to-left evaluation:**
```
m ← +/3+⍳4     ⍝ ⍳4 → 1 2 3 4; 3+ → 4 5 6 7; +/ → 22
```

**Sort strings by length, longest first (Dyalog APL):**
```
x[⍒≢¨x]        ⍝ ≢¨ length of each; ⍒ grade down (descending order); x[…] reorders x
```

**Prime check (K, APL descendant):**
```
{&/x!/:2_!x}   / !x enumerate 0..x-1; 2_ drop first 2; x!/: x mod each; &/ minimum
```

### Take for Section 1 (Anchor Claims)

- **"Array as the universal type"**: APL established that scalar operations are degenerate array operations; the DSL adopts this as its core type assumption (every value is an array, and every function vectorizes across it). *(Source: [Wikipedia: APL](https://en.wikipedia.org/wiki/APL_(programming_language)))*
- **"Every glyph is a function"**: APL's design principle that surface syntax directly encodes mathematical operations without keywords; the DSL's verb-glyph system inherits this. *(Source: [Wikipedia: APL Language Characteristics](https://en.wikipedia.org/wiki/APL_(programming_language)#Design))*
- **"Right-to-left evaluation with no precedence"**: APL's uniform right-to-left evaluation model; the DSL adopts a pipeline model with explicit left-to-right flow but no operator precedence table. *(Source: [Wikipedia: APL Syntax](https://en.wikipedia.org/wiki/APL_(programming_language)#Syntax))*

### Take for Section 5 (Claim 4: `for x .. n` + `result[row, col]`)

- **APL → Iteration as array generation:** `+/⍳N` replaces `for x in range(1,N+1)`; the DSL's `for x .. n` maps to APL's iota-plus-reduce pattern. *(Source: [Wikipedia: APL Examples](https://en.wikipedia.org/wiki/APL_(programming_language)#Examples))*
- **APL → Result indexing:** APL's multi-dimensional array indexing (`result[2;3]` in Dyalog) directly expresses `result[row, col]`; the DSL inherits this as its canonical result access pattern. *(Source: [Wikipedia: APL Syntax](https://en.wikipedia.org/wiki/APL_(programming_language)#Syntax))*
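Python can show the two Section 5 mappings side by side. The first comparison pits an explicit loop against the generate-then-reduce form. The second builds a two-dimensional result as an outer product (APL's `∘.×`) and reads it back by row and column. The DSL forms appear only in comments, and treating `for x .. n` as inclusive of `n` is an assumption. Python lists are 0-based, so the hypothetical `at` helper converts from APL's 1-based origin.

```python
"""Sketch: loop vs. array-generation forms, and row/col indexing (illustrative)."""
from typing import List

n = 5

# Loop form, roughly the DSL's `for x .. n` (assumed inclusive of n here).
total_loop = 0
for x in range(1, n + 1):
    total_loop += x

# Array form, APL's +/⍳n: generate the index vector, then reduce it.
total_array = sum(range(1, n + 1))

# Outer product, APL's (⍳3)∘.×⍳4: a 3x4 multiplication table.
rows, cols = 3, 4
result: List[List[int]] = [[r * c for c in range(1, cols + 1)] for r in range(1, rows + 1)]


def at(table: List[List[int]], row: int, col: int) -> int:
    """1-based result[row, col], mirroring APL's result[row;col]."""
    return table[row - 1][col - 1]
```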
---

## Entry: K / q (Arthur Whitney, 1993)

### What It Is

K (Arthur Whitney, KX Systems, 1993) is a **proprietary terse array language** and the foundation of the kdb+ in-memory columnar database. Whitney had worked on APL at I.P. Sharp Associates alongside Ken Iverson, then built A+ at Morgan Stanley for migrating APL applications from IBM mainframes to Sun workstations. K distilled A+ into something even more compressed: a minimalist ASCII-only syntax where every ASCII symbol is **heavily overloaded** by context, and functions are first-class values borrowed from Scheme. The result is a language that can express financial algorithms in single lines that read as cryptic character streams to the uninitiated. K is the engine behind kdb (1998) and its successor kdb+ (2003), which became the backbone of high-frequency trading systems at major financial institutions. q is a syntactic sugar layer on top of K that merged ksql (an SQL-like query language) into the base language. The KX platform (https://kx.com/) now spans kdb+ (time-series/columnar database), KDB.AI (vector database), and KDB-X (GPU-accelerated analytics), all powered by the K language.

> "K is a proprietary array processing programming language developed by Arthur Whitney and commercialized by KX Systems. The language serves as the foundation for kdb+, an in-memory, column-based database."
>
> Source: [Wikipedia](https://en.wikipedia.org/wiki/K_(programming_language))

### What We Take From It

K demonstrates that **glyph-overloading by context** can achieve extreme terseness while remaining parseable: a single symbol like `!` means modulo, enumeration, and rotation depending on its position. The DSL inherits this context-sensitive operator philosophy but applies it at the verb level rather than the character level, with a fixed small vocabulary of high-arity verbs. K also demonstrates that **first-class functions** (borrowed from Scheme) are compatible with an array paradigm: functions can be stored in variables, passed as arguments, and returned from functions. The DSL adopts function-as-values as a first-class feature.

### Detailed Analysis

**ASCII-Only with Heavy Overloading.** Unlike APL with its dedicated character set, K restricts itself to ASCII. It achieves this through radical overloading: each ASCII symbol represents two or more distinct functions, determined by context (argument count, position in expression, types of operands). Example from the Wikipedia article:

```
2!!7!4
```

Reading right-to-left: `7!4` is modulo (7 mod 4 = 3). Monadic `!3` is enumeration (`0 1 2`). Dyadic `2!` is rotation (rotate the list left twice → `2 0 1`). Three distinct uses of `!` appear in one expression. This is the extreme end of the overloading spectrum: readability suffers, but the language becomes extraordinarily compressible.

**First-Class Functions from Scheme.** Whitney incorporated Scheme's first-class function model into K. Functions are values: `a:25` stores a number, `f:{(x^2)-1}` stores a function. Functions can be passed as arguments: `{(3*x^2)+(2*x)+1}'!4` applies a quadratic to each element of `!4` (0 1 2 3). This is in contrast to classic APL where functions were not first-class values. K thus bridges the array paradigm with the lambda calculus tradition.

**Point-Free Combinator Style.** K code leans heavily on combinators. Primitives and adverbs are chained so that no loop variable is ever named, and lambdas refer to their argument only through the implicit `x`. The prime-check function demonstrates this:

```
{&/x!/:2_!x}
```

Read right-to-left:

- `!x` enumerates the integers less than x.
- `2_` drops the first two (0 and 1).
- `x!/:` takes x modulo each remaining integer.
- `&/` takes the minimum. If any remainder is 0, the minimum is 0 and x is not prime.

The entire algorithm is a composition of primitives and adverbs inside one anonymous function, with no explicit loop variable.

**Financial Domain Dominance.** K and kdb+ dominate high-frequency trading and financial analytics because they handle time-series data with extreme efficiency. The columnar storage model aligns naturally with array operations: a "column" is a vector, and operations like `sum` or `avg` are vector-level primitives. KX claims "15/17 world records" in independently benchmarked STAC-M3 queries (https://kx.com/). It also markets kdb+ as processing billions of trades and millions of order books per second. This is the array paradigm at industrial scale.

**q: Syntactic Sugar on K.** q (released with kdb+ in 2003) added SQL-like query syntax (`select`, `from`, `where`) on top of K's array operations, making it accessible to analysts without array programming backgrounds. The q language effectively demonstrates that a DSL layer can sit atop an array language to provide domain-specific UX without sacrificing performance.
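Transcribing the K one-liners into Python makes it possible to check the readings given above. The sketch rests on two stated assumptions. First, `where` replicates each index by its count, which is how K's monadic `&` treats an integer vector. Second, the minimum of an empty remainder list is taken to be 1, so 0 and 1 pass the check and `2_` must drop them, as the explanation says.

```python
"""Python transcriptions of the K one-liners above (illustrative sketch)."""
from typing import List


def enumerate_(n: int) -> List[int]:
    """Monadic !n: the integers 0 .. n-1."""
    return list(range(n))


def rotate(n: int, xs: List[int]) -> List[int]:
    """Dyadic n!xs on a list: rotate left by n."""
    n %= len(xs)
    return xs[n:] + xs[:n]


def where(counts: List[int]) -> List[int]:
    """Monadic &: index i repeated counts[i] times."""
    return [i for i, k in enumerate(counts) for _ in range(k)]


def prime_check(x: int) -> int:
    """{&/x!/:2_!x}: minimum of x mod each of 2 .. x-1."""
    remainders = [x % d for d in enumerate_(x)[2:]]
    return min(remainders, default=1)  # empty-case value is an assumption


def primes_below(r: int) -> List[int]:
    """2_&{&/x!/:2_!x}'!R"""
    return where([prime_check(x) for x in enumerate_(r)])[2:]


def sort_by_length_desc(words: List[str]) -> List[str]:
    """x@>#:'x: grade down by length, then index the original list."""
    order = sorted(range(len(words)), key=lambda i: len(words[i]), reverse=True)
    return [words[i] for i in order]


quadratic = [3 * x**2 + 2 * x + 1 for x in enumerate_(4)]  # {(3*x^2)+(2*x)+1}'!4
assert rotate(2, enumerate_(7 % 4)) == [2, 0, 1]            # 2!!7!4
```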
### Code Examples

**Hello world:**
```
"Hello world!"
```

**Sort strings by length:**
```
x@>#:'x
```
`#:'x` → length of each word; `>` → descending indices; `@` → index original list.

**Prime check:**
```
{&/x!/:2_!x}
```

**List primes below R:**
```
2_&{&/x!/:2_!x}'!R
```
`!R` enumerates 0 to R-1; `'` applies the prime check to each; `&` gives the indices where the result is 1; `2_` drops the first two indices (0 and 1, whose remainder lists are empty).

**Anonymous quadratic applied to range:**
```
{(3*x^2)+(2*x)+1}'!4
```

### Take for Section 1 (Anchor Claims)

- **"Glyph overloading by context"**: K demonstrates that a small ASCII alphabet can encode a rich function set through context-sensitive overloading; the DSL's verb system uses a fixed small set of high-arity verbs rather than overloading. *(Source: [Wikipedia: K](https://en.wikipedia.org/wiki/K_(programming_language)))*
- **"First-class functions in an array language"**: K imported Scheme's function-as-value model into the array paradigm; the DSL adopts first-class functions as a core feature. *(Source: [Wikipedia: K Overview](https://en.wikipedia.org/wiki/K_(programming_language)#Overview))*
- **"Point-free combinator style"**: K's prime check and sort examples demonstrate that array algorithms can be expressed as chained anonymous functions without explicit loop variables; the DSL's pipeline composition inherits this. *(Source: [Wikipedia: K Examples](https://en.wikipedia.org/wiki/K_(programming_language)#Examples))*

### Take for Section 5 (Claim 4: `for x .. n` + `result[row, col]`)

- **K → `for x .. n`:** K's `!R` (enumerate range) replaces explicit loops; the DSL's `for x .. n` maps to K's enumeration idiom. *(Source: [Wikipedia: K Examples](https://en.wikipedia.org/wiki/K_(programming_language)#Examples))*
|
||||
- **K → Point-free pipelines:** K's chained anonymous function style (`{...}'!R`) is the direct ancestor of the DSL's pipeline composition; no explicit loop variable needed. *(Source: [Wikipedia — K Overview](https://en.wikipedia.org/wiki/K_(programming_language)#Overview))*
|
||||
|
||||
---
|
||||
|
||||
## Entry: BQN (Marshall Lochbaum, 2020)
|
||||
|
||||
### What It Is
|
||||
|
||||
BQN (*Big Questions Notation*, Marshall Lochbaum, 2020) is a **modernized APL** designed to remove the "irregular and burdensome aspects of the APL tradition" while preserving and strengthening its core innovations. BQN is a ground-up redesign that replaces APL's nested array model with a **based array model** (atoms vs. scalars), introduces a **context-free grammar** that makes syntactic roles explicit, adds **first-class functions** with lexical closures (borrowing from Lisp), replaces APL's overloaded glyphs with a cleaner, more consistent **new symbol set**, and implements an efficient **bytecode compiler** (CBQN) that delivers state-of-the-art array performance. BQN runs in the browser (online REPL), as a standalone C implementation, and has a self-hosted compiler written in BQN itself. Its documentation (at https://mlochbaum.github.io/BQN/) is exceptionally thorough, with tutorials, a primitive reference, a commentary on design decisions, and cross-language dictionaries for Dyalog APL and J.
|
||||
|
||||
> "BQN aims to remove irregular and burdensome aspects of the APL tradition, and put the great ideas on a firmer footing."
|
||||
> — [BQN Homepage](https://mlochbaum.github.io/BQN/)
|
||||
|
||||
### What We Take From It
|
||||
|
||||
BQN provides the most rigorous modern articulation of the APL philosophy refactored for clarity: the **leading axis model** (which collapses pairs like `⌽⊖` and `/⌿` into single primitives), the **train** (function composition syntax for tacit programming), and the **based array model** (which cleanly separates atoms from scalars). The DSL inherits BQN's insight that a **clean syntactic role system** (subject vs. function vs. modifier) prevents ambiguity and enables reliable first-class function use. BQN's documentation of *why* each design decision was made is the most valuable reference for anyone building an array-influenced DSL.
|
||||
|
||||
### Detailed Analysis
|
||||
|
||||
**Based Array Model.** BQN replaces APL's nested array model (where every array can contain other arrays) with a principled **based array model**: true scalar values (plain numbers and characters) are distinct from depth-0 arrays. This eliminates the "surprise of floating arrays" and "the hassle of explicit boxes" in classic APL. BQN uses `⟨⟩` for explicit list notation and `‿` for stranding (juxtaposed elements). The based array model makes the type system more predictable and the semantics more formally specifiable.
|
||||
|
||||
**Context-Free Grammar and Syntactic Roles.** BQN uses a **context-free grammar** where syntactic roles (subject, function, modifier) are determined by position and structure, not by the dynamic type of the value. This means that in `∾⌽`, the parser knows `∾` is a function and `⌽` is a function, and the train composition rules follow mechanically. In APL, the same expression could mean different things depending on whether the values are functions or arrays. BQN's syntactic roles eliminate this ambiguity, making the language easier to reason about mechanically and easier to teach.
|
||||
|
||||
**Function Trains.** BQN's **train** system is its most distinctive tacit programming feature. A train is a way to compose functions without naming their arguments. Examples from the BQN documentation:
|
||||
|
||||
```
|
||||
(⊢+⌽) ↕5 # → ⟨4 4 4 4 4⟩: ⊢ (identity) + ⌽ (reverse) applied to 0..4
|
||||
7 (+⋈-) 2 # → ⟨9 5⟩: pair of sum and difference
|
||||
(∾⌽) "ab"‿"cde"‿"f" # → "fcdeab": join of reverse
|
||||
```
|
||||
|
||||
Trains of length 2 (`F G`) mean "apply G to the argument, then F to the result" (Atop composition). Trains of length 3 (`F G H`) mean "apply G to both arguments, then F to the left and H to the right, then combine". Longer trains decompose into 3-trains. BQN's trains are the same as Dyalog APL's trains, but with BQN's cleaner grammar and the addition of `·` (Nothing) for explicit argument placeholders.
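The two train rules can be sketched in Python as two higher-order functions. The sketch assumes that every function takes `(w, x)`, with `w=None` standing for a monadic call. `right`, `reverse`, `plus`, `minus`, `pair` and `join` are hypothetical stand-ins for `⊢`, `⌽`, `+`, `-`, `⋈` and `∾`, and they cover only the cases the examples use.

```python
import operator
from typing import Any, Callable, Optional

# Assumed convention: every function takes (w, x); w is None for a monadic call.
Fn = Callable[[Optional[Any], Any], Any]


def atop(f: Fn, g: Fn) -> Fn:
    # 2-train (F G): F applied monadically to the result of G
    return lambda w, x: f(None, g(w, x))


def fork(f: Fn, g: Fn, h: Fn) -> Fn:
    # 3-train (F G H): (w F x) G (w H x); monadic when w is None
    return lambda w, x: g(f(w, x), h(w, x))


def pervasive(op: Callable[[Any, Any], Any]) -> Fn:
    # Hypothetical helper: apply a dyadic scalar op element-wise (flat lists only)
    def fn(w, x):
        if isinstance(w, list) and isinstance(x, list):
            return [op(a, b) for a, b in zip(w, x)]
        if isinstance(w, list):
            return [op(a, x) for a in w]
        if isinstance(x, list):
            return [op(w, b) for b in x]
        return op(w, x)
    return fn


def right(w, x):    # ⊢ : identity (monadic) / right argument (dyadic)
    return x


def reverse(w, x):  # ⌽ : monadic reverse only
    return x[::-1]


def pair(w, x):     # ⋈ : one-element list (monadic) or pair (dyadic)
    return [x] if w is None else [w, x]


def join(w, x):     # ∾ : monadic join of a list of strings only
    return "".join(x)


plus = pervasive(operator.add)   # + : dyadic add only
minus = pervasive(operator.sub)  # - : dyadic subtract only

print(fork(right, plus, reverse)(None, list(range(5))))  # [4, 4, 4, 4, 4]
print(fork(plus, pair, minus)(7, 2))                     # [9, 5]
print(atop(join, reverse)(None, ["ab", "cde", "f"]))     # fcdeab
```

The three calls correspond to `(⊢+⌽) ↕5`, `7 (+⋈-) 2` and `(∾⌽) "ab"‿"cde"‿"f"`.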

**Combinators (Modifiers).** BQN has a systematic set of combinators (modifiers, i.e. higher-order functions) with clean glyphs:

- Atop `∘`: apply G to the argument(s), then F to the result: `{𝔽𝕨𝔾𝕩}`
- Over `○`: apply G to each argument separately, then F to the results: `{(𝔾𝕨)𝔽𝔾𝕩}`
- Before/Bind `⊸`: F is applied to the left argument (or to 𝕩 if there is none), and its result becomes G's left argument: `{(𝔽𝕨⊣𝕩)𝔾𝕩}`
- After/Bind `⟜`: G is applied to the right argument, and its result becomes F's right argument: `{(𝕨⊣𝕩)𝔽𝔾𝕩}`
- Self/Swap `˜`: duplicate the single argument, or exchange the two: `{𝕩𝔽𝕨⊣𝕩}`

These are far more systematic than the ad-hoc adverb/operator system in classic APL. BQN's combinators compose predictably, which makes tacit programming reliable rather than a heroic exercise.
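Each block definition above can be transcribed literally into a Python higher-order function. The sketch uses the same assumed `(w, x)` convention, with `w=None` for a monadic call, so `x if w is None else w` plays the role of `𝕨⊣𝕩`. The primitive stand-ins (`minus`, `times`, `double`, `length`) are hypothetical.

```python
from typing import Any, Callable, Optional

Fn = Callable[[Optional[Any], Any], Any]  # (w, x); w is None when monadic


def atop(f: Fn, g: Fn) -> Fn:       # F∘G   {𝔽𝕨𝔾𝕩}
    return lambda w, x: f(None, g(w, x))


def over(f: Fn, g: Fn) -> Fn:       # F○G   {(𝔾𝕨)𝔽𝔾𝕩}
    return lambda w, x: f(None if w is None else g(None, w), g(None, x))


def before(f: Fn, g: Fn) -> Fn:     # F⊸G   {(𝔽𝕨⊣𝕩)𝔾𝕩}
    return lambda w, x: g(f(None, x if w is None else w), x)


def after(f: Fn, g: Fn) -> Fn:      # F⟜G   {(𝕨⊣𝕩)𝔽𝔾𝕩}
    return lambda w, x: f(x if w is None else w, g(None, x))


def self_swap(f: Fn) -> Fn:         # F˜    {𝕩𝔽𝕨⊣𝕩}
    return lambda w, x: f(x, x if w is None else w)


# Hypothetical stand-ins for primitives
def minus(w, x):   # - : negate (monadic) or subtract (dyadic)
    return -x if w is None else w - x


def times(w, x):   # × : dyadic multiply only
    return w * x


def double(w, x):  # monadic only: 2 × x
    return 2 * x


def length(w, x):  # ≠ : monadic length
    return len(x)


print(atop(minus, minus)(5, 3))         # -(5-3)     → -2
print(over(minus, length)("abc", "a"))  # 3 - 1      → 2
print(before(double, minus)(3, 10))     # (2×3) - 10 → -4
print(after(minus, double)(10, 3))      # 10 - (2×3) → 4
print(self_swap(minus)(3, 10))          # 10 - 3     → 7
print(self_swap(times)(None, 4))        # 4 × 4      → 16
```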

**Leading Axis Model.** BQN adopts the leading axis model (developed in SHARP APL, applied in A+ and J). Under this model, a primitive operates along the first (leading) axis of its argument. The Rank modifier `⎉` then applies a function to lower-rank cells, which is how the other axes are reached. This collapses pairs like APL's `⌽⊖` (reverse along the last axis vs. along the first axis) into a single primitive and removes APL's complicated function-axis mechanism. The result is a smaller, more orthogonal primitive set.
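In Python terms, one reverse primitive plus a rank-style wrapper covers both APL glyphs. The sketch assumes a matrix represented as a list of rows. In BQN these would be written `⌽m` and `⌽⎉1 m`. The helper `on_rows` is a hypothetical, matrix-only stand-in for the general Rank modifier.

```python
def reverse_leading(a: list) -> list:
    # Leading-axis reverse: reverse the major cells (for a matrix, the rows).
    return a[::-1]


def on_rows(f):
    # Rough analogue of f⎉1 for a rank-2 list of rows: apply f to each row.
    # A real Rank modifier works for any rank; this sketch assumes a matrix.
    return lambda a: [f(row) for row in a]


m = [[1, 2, 3],
     [4, 5, 6]]
print(reverse_leading(m))           # [[4, 5, 6], [1, 2, 3]]  (APL ⊖)
print(on_rows(reverse_leading)(m))  # [[3, 2, 1], [6, 5, 4]]  (APL ⌽)
```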

**Performance.** BQN's CBQN implementation compiles to bytecode and uses NaN-boxing for values. Its performance "beats the fastest array languages much of the time, but not always" (per the BQN homepage). This matters for the DSL because it shows that an APL-descendant language can be compiled to efficient bytecode while keeping the array programming model.

**Lexical Scoping and First-Class Functions.** BQN has full Lisp-style lexical closures. Functions are values that can be stored in variables, passed as arguments, returned from functions, and mapped over lists. Namespaces (modules) use a dedicated syntax and are garbage-collected. This makes BQN more suitable for general-purpose programming than its predecessors and closes the gap between array languages and functional languages.

### Code Examples

**Sum of 0..N-1 (Fold over a range):**
```
+´↕5 # ↕5 → ⟨0 1 2 3 4⟩; +´ (fold with +) → 10
```

**3-train (Fork):**
```
(⊢+⌽) ↕5 # → ⟨4 4 4 4 4⟩: identity + reverse of 0..4
```

**Atop modifier (equivalent to the 2-train `∾⌽`):**
```
∾∘⌽ "ab"‿"cde"‿"f" # → "fcdeab": join after reverse
```

**Unique sorted absolute values (right-to-left application):**
```
⍷∧| 3‿4‿¯3‿¯2‿0 # → ⟨0 2 3 4⟩: absolute value, then sort, then deduplicate
```

**Classify (number each distinct element):**
```
⊐ "tacit" # → ⟨0 1 2 3 0⟩: class index of each char, in order of first appearance
```

**Mark firsts from classify:**
```
(⊢>¯1»⌈`) ⊐ "tacit" # → ⟨1 1 1 1 0⟩: 1 where a class index exceeds the running maximum before it
```
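The mark-firsts idiom works because Classify hands out class numbers in increasing order. An element is therefore the first of its class exactly when its class number exceeds every number seen before it. The Python sketch below mirrors each step of the train. The function names are illustrative. BQN also provides Mark Firsts directly as the monadic primitive `∊`.

```python
from itertools import accumulate


def classify(xs) -> list[int]:
    # ⊐ : number each distinct value in order of first appearance
    ids: dict = {}
    return [ids.setdefault(v, len(ids)) for v in xs]


def mark_firsts(xs) -> list[int]:
    # (⊢>¯1»⌈`) applied to ⊐ xs
    c = classify(xs)
    running_max = list(accumulate(c, max))           # ⌈` : max-scan
    shifted = [-1] + running_max[:-1]                # ¯1» : shift right, fill with ¯1
    return [int(a > b) for a, b in zip(c, shifted)]  # ⊢> : compare with the original


print(classify("tacit"))           # [0, 1, 2, 3, 0]
print(mark_firsts("tacit"))        # [1, 1, 1, 1, 0]
print(mark_firsts("mississippi"))  # [1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
```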

### Take for Section 1 (Anchor Claims)

- **"Context-free grammar and syntactic roles"** — BQN demonstrates that array languages can have clean, mechanically parseable syntax in which roles are determined by spelling rather than by runtime values; the DSL adopts explicit syntactic roles for its verb/noun system. *(Source: [BQN — What's the language like?](https://mlochbaum.github.io/BQN/))*
- **"Function trains for tacit programming"** — BQN's train system is the most systematic explicit approach to point-free composition in the array language family; the DSL's pipeline composition is a constrained version of this. *(Source: [BQN — Function Trains](https://mlochbaum.github.io/BQN/doc/train.html))*
- **"Based array model"** — BQN's based array model eliminates the ambiguity of APL's nested arrays; the DSL uses a similarly explicit array model. *(Source: [BQN — Based Arrays](https://mlochbaum.github.io/BQN/doc/based.html))*
- **"First-class functions with lexical closures"** — BQN shows that array programming and Lisp-style functional programming are compatible; the DSL adopts first-class functions as a core feature. *(Source: [BQN — Functional Programming](https://mlochbaum.github.io/BQN/doc/functional.html))*

### Take for Section 5 (Claim 4 — `for x .. n` + `result[row, col]`)

- **BQN → `for x .. n`:** BQN's `↕N` (range) directly replaces iterative loops; the DSL's `for x .. n` maps to BQN's `↕` idiom. *(Source: [BQN — Range](https://mlochbaum.github.io/BQN/doc/primitive.html))*
- **BQN → Train composition:** BQN's trains (e.g., `(⊢+⌽)`) and modifier chains (e.g., `+´↕N` for sum-of-range) are the direct design precedent for the DSL's pipeline verb chaining. *(Source: [BQN — Function Trains](https://mlochbaum.github.io/BQN/doc/train.html))*
- **BQN → Array indexing:** BQN's Select (`⊏`) and Pick (`⊑`) primitives handle multi-dimensional indexing cleanly. The DSL's `result[row, col]` maps most directly to Pick with an index list (`row‿col ⊑ result`), while Select (`⊏`) covers taking whole major cells such as rows. *(Source: [BQN — Select/First Cell](https://mlochbaum.github.io/BQN/doc/primitive.html))*

---

## Entry: Uiua (Kai Schmidt, 2023)

### What It Is

Uiua (Kai Schmidt, 2023, https://www.uiua.org/) is a **modern APL descendant with stack-based execution**. This is a fundamental departure from the argument-binding model of APL, K and BQN. Uiua (pronounced "wee-wuh") is a tacit array programming language implemented in **Rust** (98.7% of the codebase). It was designed to make array programming more accessible. It has an online Pad (REPL), editor extensions for VS Code and other editors, and a focus on the onboarding experience. Uiua uses a **stack** instead of named parameters: functions pop their arguments from the stack and push their results. The language is "tacit", meaning functions have no explicit parameters and operate on the stack of values. Uiua's repository (https://github.com/uiua-lang/uiua) had 2.1k stars and 177 forks as of 2026, which indicates significant community interest. The language is MIT-licensed and under active development, with 92 releases.

> "Uiua is a tacit array programming language."
> — [GitHub — uiua-lang/uiua](https://github.com/uiua-lang/uiua)

### What We Take From It

Uiua demonstrates that the **stack-based execution model** is a viable alternative to the named-parameter model for array languages. It enables a different class of composition patterns, such as concatenative notation and automatic argument threading. The DSL inherits Uiua's insight that **explicit argument naming is not required** for practical array programming, because the stack provides implicit argument ordering. Uiua also demonstrates a modern **open-source development model** for array languages: aggressive versioning, changelogs, GitHub Sponsors, a Discord community, and editor integration from day one.

### Detailed Analysis

**Stack-Based Execution.** APL, K and BQN apply functions to named arguments or bind them via trains. Uiua instead uses a **stack machine**. Every function pops its required arguments from the stack and pushes its results. Uiua code executes right to left, so a function is written to the left of the values it consumes. In `+ 3 5`, for example, Uiua pushes 5, pushes 3, and then `+` pops both and pushes 8. This is the stack discipline of Forth and other concatenative languages, written in the opposite order from Forth's postfix `5 3 +`. The key advantage is that no argument names are needed and composition is trivial: writing one function to the left of another feeds it the other's results. The challenge is that keeping track of what is on the stack requires discipline or tooling.
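A toy Python evaluator makes the right-to-left stack discipline concrete. It is a hypothetical sketch, not Uiua's implementation:

- tokens are a Python list rather than parsed source;
- only `+`, `×` and a `/+` reduce are supported;
- only commutative operations are included, so which popped value is the "left" argument does not matter.

```python
import operator
from typing import Any, Callable


def pervasive(op: Callable[[Any, Any], Any]) -> Callable[[Any, Any], Any]:
    # Apply a scalar op element-wise when either side is a (flat) list.
    def apply(a, b):
        if isinstance(a, list) and isinstance(b, list):
            return [op(x, y) for x, y in zip(a, b)]
        if isinstance(a, list):
            return [op(x, b) for x in a]
        if isinstance(b, list):
            return [op(a, y) for y in b]
        return op(a, b)
    return apply


DYADIC = {"+": pervasive(operator.add), "×": pervasive(operator.mul)}
MONADIC = {"/+": sum}  # reduce a flat list with +


def run(tokens: list) -> list:
    """Toy evaluator: execute tokens right to left against one shared stack."""
    stack: list = []
    for tok in reversed(tokens):
        if isinstance(tok, str) and tok in DYADIC:
            a = stack.pop()  # top of stack: the value written nearest the function
            b = stack.pop()
            stack.append(DYADIC[tok](a, b))
        elif isinstance(tok, str) and tok in MONADIC:
            stack.append(MONADIC[tok](stack.pop()))
        else:
            stack.append(tok)  # literals are simply pushed
    return stack


print(run(["+", 3, 5]))                  # [8]
print(run(["/+", [1, 2, 3, 4]]))         # [10]
print(run(["×", 2, "+", 1, [1, 2, 3]]))  # [[4, 6, 8]]
```

The last example shows composition without names. `+ 1` adds to the array, and `× 2` then doubles whatever `+` left on the stack.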

**Tacit by Default.** In Uiua, functions take no named parameters; they receive their inputs from the stack. This goes further than BQN, where block functions (`{...}` with `𝕨` and `𝕩`) offer an explicit-argument alternative to trains. A Uiua program is a composition of functions operating on a shared stack, which arguably makes Uiua the purest tacit language in the APL lineage. It also means long Uiua programs can be hard for beginners to read, because there are few named variables to anchor meaning.

**Modern Onboarding UX.** Uiua's standout feature, compared to its predecessors, is its **onboarding story**. It offers an online Pad at uiua.org that requires no installation, editor extensions with syntax highlighting, a Discord community, a GitHub Sponsors page, and a detailed changelog. The language was designed with accessibility as a core goal, not an afterthought. The lesson for the DSL is that a well-designed onboarding experience (REPL, examples, documentation) is as important as the language design itself.

**Rust Implementation.** Uiua is implemented in Rust, which gives it high performance, memory safety without a garbage collector, and cross-platform builds. Because the interpreter and its array primitives are compiled Rust, primitive operations run as native code. The interpreter is not self-hosted (it is written in Rust rather than in Uiua itself), which is typical for young languages.

**Comparison to Other Array Languages.** Uiua occupies a unique position in the APL lineage: it is tacit (like J), stack-based (like Forth), and array-oriented (like APL). Like APL and BQN, it uses Unicode glyphs. Each primitive can also be typed as its ASCII name, which the formatter converts to the glyph, so no special keyboard is needed. Functions can be bound to names, but they take no named parameters, and every function is a stack operation. The GitHub README states: "A tacit array programming language". Here "tacit" means no explicit parameters, and "array programming" means the primary data type is the array.

**Tacit Programming Philosophy.** The Wikipedia article on tacit programming (referenced from Uiua's GitHub) explains that tacit programming (also called point-free) expresses programs as compositions of functions without naming their arguments. Uiua takes this to its logical extreme: there are no named arguments at all, and every function operates on the implicit stack. This makes Uiua programs extremely compact but harder to debug without tooling.

### Code Examples

*(Note: Uiua's stack-based syntax is not directly equivalent to the examples above. Uiua executes each line right to left, so the rightmost value is pushed first.)*

**Stack arithmetic:**
```
+ 3 5 # → 8: push 5, push 3, add
```

**Array sum (reduce):**
```
/+ [1 2 3 4] # → 10: push the array, reduce with +
```

**Scalar plus array:**
```
+ 5 [1 2 3] # → [6 7 8]: push [1 2 3], push 5, add 5 to each element
```

### Take for Section 1 (Anchor Claims)

- **"Stack-based execution as an alternative to named parameters"** — Uiua demonstrates that a stack model is viable for array programming; the DSL does not adopt the stack model but acknowledges it as a valid alternative composition mechanism. *(Source: [GitHub — uiua-lang/uiua](https://github.com/uiua-lang/uiua))*
- **"Tacit by default"** — Uiua shows that forcing tacit programming (no named parameters) is a valid design choice that prioritizes composition over readability; the DSL provides explicit parameter names but allows tacit pipelines. *(Source: [GitHub — uiua-lang/uiua README](https://github.com/uiua-lang/uiua))*
- **"Modern open-source development model"** — Uiua's onboarding story (online REPL, editor extensions, Discord, changelog) is a model for DSL adoption; the DSL should invest in onboarding UX. *(Source: [Uiua.org](https://www.uiua.org))*

### Take for Section 5 (Claim 4 — `for x .. n` + `result[row, col]`)

- **Uiua → Stack-based iteration:** Uiua's stack model replaces named loop variables with stack position; the DSL's explicit `for x .. n` provides a named variable where Uiua uses stack position. *(Source: [GitHub — uiua-lang/uiua](https://github.com/uiua-lang/uiua))*
- **Uiua → Array result access:** Uiua's indexing primitives (`pick`, `select`) take their indices from the stack, so access is implicitly positional; the DSL's `result[row, col]` provides explicit named indexing as a readability trade-off. *(Source: [Uiua.org](https://www.uiua.org))*

---

## Synthesis for the DSL

This section maps each Tier 1 verb from the DSL's design to the specific array-language entry that grounds it. The mapping provides the factual basis for Section 5's Claim 4 (APL/K → `for x .. n` + `result[row, col]`).

### Verb → Entry Mapping

| Tier 1 Verb | Grounding Entry | Grounding Mechanism | Source |
|---|---|---|---|
| **`for x .. n`** (iteration over range) | **APL** (primary), **K** (confirmation) | APL's `⍳N` (iota) generates the index vector `1 2 3 ... N` (in the default index origin of 1); `+/⍳N` is "sum over range", the canonical loop replacement. K's `!R` (enumerate) serves the same role. BQN's `↕N` (range, 0-indexed) is the cleanest modern form. | [Wikipedia — APL](https://en.wikipedia.org/wiki/APL_(programming_language)#Examples); [Wikipedia — K](https://en.wikipedia.org/wiki/K_(programming_language)#Examples) |
| **`result[row, col]`** (array indexing) | **APL** (primary), **BQN** (refinement) | APL's multi-dimensional indexing: `result[2;3]` (Dyalog syntax) directly expresses 2D access. BQN's Select (`⊏`) and Pick (`⊑`) provide cleaner primitives for the same. K uses `@` (index-at) for the same purpose. | [Wikipedia — APL Syntax](https://en.wikipedia.org/wiki/APL_(programming_language)#Syntax); [BQN Primitive Reference](https://mlochbaum.github.io/BQN/doc/primitive.html) |
| **Pipeline composition** (chained transforms) | **BQN** (primary), **K** (confirmation) | BQN's trains (`(⊢+⌽)`, `∾∘⌽`) are the most systematic tacit composition mechanism in the family. K's chained anonymous functions (`{...}'!R`) confirm the pattern. The DSL's verb pipeline maps directly to BQN's train model. | [BQN — Function Trains](https://mlochbaum.github.io/BQN/doc/train.html) |
| **Vectorizing functions** (array-first) | **APL** (primary) | APL's core thesis: every function operates on arrays as a whole; `n+4` adds to every element. The DSL adopts this as its universal vectorization rule: all verbs vectorize across their array arguments. | [Wikipedia — APL Design](https://en.wikipedia.org/wiki/APL_(programming_language)#Design) |
| **First-class functions** | **K** (primary), **BQN** (refinement) | K imported Scheme's first-class functions into the array paradigm. BQN expanded this with lexical closures and namespaces. The DSL adopts function-as-values as a core feature, enabling higher-order pipeline stages. | [Wikipedia — K Overview](https://en.wikipedia.org/wiki/K_(programming_language)#Overview); [BQN — Functional Programming](https://mlochbaum.github.io/BQN/doc/functional.html) |
| **Point-free / tacit style** | **BQN** (primary), **Uiua** (modern proof) | BQN's train system is the most expressive tacit composition mechanism. Uiua demonstrates that forcing tacit by default is a viable (if challenging) design choice. The DSL allows both explicit-parameter and tacit styles. | [BQN — Function Trains](https://mlochbaum.github.io/BQN/doc/train.html); [GitHub — Uiua](https://github.com/uiua-lang/uiua) |
| **Context-sensitive operator overloading** | **K** (primary) | K's radical ASCII overloading (one symbol, many meanings by context) is the extreme end of the spectrum. The DSL uses a fixed small verb set with context-sensitive arity rather than character overloading, trading extreme terseness for readability. | [Wikipedia — K Overview](https://en.wikipedia.org/wiki/K_(programming_language)#Overview) |
| **High-performance array engine** | **K/q** (industrial confirmation) | KX reports that kdb+ (built on K) processes billions of records at microsecond latency, evidence that the array paradigm scales to production workloads. BQN's CBQN bytecode compiler confirms the paradigm can be compiled efficiently. | [KX — Benchmarks](https://kx.com/); [BQN — Performance](https://mlochbaum.github.io/BQN/implementation/perf.html) |
| **Onboarding / REPL story** | **Uiua** (primary) | Uiua's online Pad, editor extensions, and community-first development model are the reference implementation for DSL adoption strategy. Dyalog APL's TryAPL and BQN's online REPL are partial precedents. | [Uiua.org](https://www.uiua.org); [GitHub — Uiua](https://github.com/uiua-lang/uiua) |

### Summary of Claims for Section 5, Claim 4

**Claim 4 (APL/K → `for x .. n` + `result[row, col]`) is grounded as follows:**

1. **`for x .. n`:** The iteration-over-range pattern maps to APL's `⍳N` (index generation, typically followed by a reduction such as `+/`) and K's `!R` (enumerate). BQN's `↕N` is the cleanest modern form. The DSL's `for x .. n` is a named-variable spelling of what these languages express as array generation plus implicit iteration.

2. **`result[row, col]`:** Multi-dimensional array indexing maps to APL's `result[i;j]` (Dyalog syntax), BQN's `⊑` (Pick, e.g. `row‿col ⊑ result`) and `⊏` (Select), and K's `@` (index-at). The DSL's bracket notation is a direct inheritance from this tradition.

3. **Pipeline composition:** The DSL's verb pipeline maps to BQN's function trains (forks `(F G H)` and atops `(F G)`) and to K's chained anonymous functions. This is the "glue" that makes `for x .. n` and `result[row, col]` composable without explicit loop syntax.
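Points 1 and 2 can be made concrete with a minimal Python sketch. The DSL's own syntax is not reproduced, and `sum_loop`, `sum_array` and `times_table` are hypothetical names. The named-variable loop and the generate-then-reduce form compute the same value, which is the correspondence Claim 4 draws between `for x .. n` and the array idiom.

```python
def sum_loop(n: int) -> int:
    total = 0
    for x in range(n):    # named-variable spelling, like the DSL's `for x .. n`
        total += x
    return total


def sum_array(n: int) -> int:
    return sum(range(n))  # generate-then-reduce, like BQN's +´↕n


def times_table(n: int) -> list[list[int]]:
    result = [[0] * n for _ in range(n)]
    for row in range(n):
        for col in range(n):
            result[row][col] = row * col  # `result[row, col]` as nested-list indexing
    return result


print(sum_loop(5), sum_array(5))  # 10 10
print(times_table(4)[2][3])       # 6
```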

### Key Design Tensions Resolved by the Cluster

| Tension | How the Cluster Resolves It |
|---|---|
| Custom character set vs. ASCII | APL uses custom glyphs (one extreme); K/q use plain ASCII; BQN uses a new set of Unicode glyphs; Uiua uses Unicode glyphs that can be typed as ASCII names. **DSL decision:** ASCII-compatible with named verbs; glyph economy without the entry barrier. |
| Named parameters vs. tacit | APL's traditional defined functions name their arguments in a header, and Dyalog's dfns use the fixed names `⍺`/`⍵`; BQN's blocks use `𝕨`/`𝕩`; K's anonymous functions use the implicit `x`, `y`, `z`; Uiua has no named parameters at all. **DSL decision:** Explicit named parameters for readability, with tacit pipeline mode available. |
| Nested arrays vs. based arrays | APL2 introduced nested arrays; BQN replaced them with the based array model. **DSL decision:** Based array model (simpler semantics, fewer edge cases). |
| Operator overloading | K overloads heavily (extreme); BQN overloads minimally (clean). **DSL decision:** Fixed-arity verbs with context-sensitive dispatch, not character overloading. |

# Cluster 3 — Intent-Mapping (Jofito and Related)

**Sub-report for Section 2 of the Intent-Based Scripting Languages survey**

**Track:** `intent_dsl_survey_20260612`

**Written by:** Tier 2 sub-agent (cluster 3 research)

**Sources:** Jofito video transcript + README, jq Wikipedia + official site, nagent tag protocol docs, WebAssembly Wikipedia

---

## Entry: Jofito (Jody Bruchon, 2023–2026)

**What it is.** Jofito is a C-based script engine for building advanced, high-performance file and disk management tools. It frames itself as an "intent mapping engine". The user writes declarative intent (e.g., "find all pictures, filter out JPEGs, print the list"), and Jofito decomposes that intent into platform-optimal operations. It parallelizes automatically across cores and optimizes away unnecessary data movement. The core technical innovations are:

- arena allocation (bulk memory management with no per-object overhead);
- the leader/chaser thread model (pipeline stages chase each other through a shared arena rather than through per-process pipe buffers);
- "pipe coalescing" (find/grep/sort/uniq collapse into a single in-memory script).

**What we take from it.** The "intent mapping engine" framing is the philosophical anchor for the DSL's Tier 2 (pipeline) verbs. Traditional shells require the user to manually sequence `find | grep | sort | uniq` and to pay the context-switch tax at each `|` boundary. Jofito's model lets the user say "here is the intent" while the engine handles the decomposition. The DSL's `scan -> filter -> select -> print` pipeline chain is directly inspired by Jofito's `scandir(...) : filter : print` predicate chain. The arena/leader-chaser model is not directly borrowed, because the DSL is interpreted in Python rather than compiled to optimal C. Its *design contract*, that verbs should be able to run in parallel without intermediate serialization, does influence how Tier 2 verbs are specified.

### Detailed Analysis

#### The Old Way: Unix Pipeline Performance Tax

Jofito's video presentation opens with a demolition of the Unix pipeline model. The canonical example:

```sh
find . -type f | grep -e '\.jpg$' | grep -e '\.png$'
```

Bruchon's analysis (lines 28–49 of the transcript) is blunt: to a layman, this is "cryptic crap." As transcribed, the example also prints nothing, since no name can end in both `.jpg` and `.png`. But the deeper problem is performance. Each `|` boundary in a Unix pipeline incurs:

1. **Context switch** — the producer process is suspended, the consumer process is scheduled (line 97: "throwing away your CPU state and trashing your caches")
2. **Pipe buffer overhead** — data is copied from producer's address space to kernel pipe buffer to consumer's address space (lines 90–94)
3. **Cache destruction** — each separate process has its own working set that blows out the L1 cache of the next (lines 106–119: "you're destroying your cache coherency by duplicating data")

The transcript is vivid on this point:

> "Every single time you do a context switch, you're basically throwing away your CPU state and trashing your caches, which makes everything run slower, because now all this stuff you're doing the work for here is no longer in main memory, or rather in the L1 cache, which is your CPU's execution core's main memory."
> — `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:106–113`

And on the inefficiency of grep specifically:

> "Grep is general regular expression parser. It's a big fancy state machine that takes a while to spin up and is not all that fast at just simple globbing, which is the term used to refer to finding basically finding substrings in a string except in reverse."
> — `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:65–69`

#### The Jofito Solution: Predicate Chains with Arena Allocation

Jofito's equivalent to the find/grep pipeline is a single predicate chain expressed as a C-like function call:

```c
list = scandir("/path/here/", {filter !extension=jpg,jpeg}) : print(list)
```

Breaking this down (per the README at `https://codeberg.org/jbruchon/jofito`):

> "if you want to retrieve a list of files like 'find . -type f' but filter out JPEG images, you might write and run this on a Linux x86-64 system:
>
> `list = scandir("/path/here/", {filter !extension=jpg,jpeg}) : print(list)`
>
> jofito can then take advantage of the low-level system call 'getdents64' to perform faster directory reads, SSE or AVX for finding the file extensions, and use the 'write' system call to output length-specified final strings."

The key structural idea is the curly-brace `{filter ...}` predicate. In a Unix pipeline, each stage is a separate process with its own output buffer. Jofito predicates instead run as threads sharing a single memory arena. The transcript (lines 155–174) explains:

> "Scan directory, however, has this curly brace filter... Filter is a generic predicate that calls a particular kind of filtration on a string or list of strings, and then filters them as you want them... It's much easier to read. We know we're scanning a directory."
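The sketch below shows, in Python with `os.scandir`, what the predicate chain asks for. It is not Jofito's implementation, which per the README drops down to `getdents64` and SIMD. The function names and the case-insensitive extension match are assumptions. The sketch illustrates a structural point: the directory read and the filter happen in one process, with no pipe boundary between them.

```python
import os


def keep_name(name: str, excluded: set[str]) -> bool:
    # {filter !extension=jpg,jpeg}: keep names whose extension is not excluded
    ext = os.path.splitext(name)[1].lower().lstrip(".")
    return ext not in excluded


def scandir_filtered(path: str, exclude_exts: list[str]) -> list[str]:
    """Hypothetical analogue of scandir(path, {filter !extension=...}).

    One directory read, one in-process pass, no pipe stages. Unlike
    `find`, this does not recurse into subdirectories.
    """
    excluded = {e.lower().lstrip(".") for e in exclude_exts}
    with os.scandir(path) as entries:
        kept = [e.name for e in entries
                if e.is_file() and keep_name(e.name, excluded)]
    return sorted(kept)  # directory order is arbitrary; sort for stable output


if __name__ == "__main__":
    for name in scandir_filtered(".", ["jpg", "jpeg"]):  # the `: print(list)` step
        print(name)
```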

#### Arena Allocation and the Leader/Chaser Thread Model

The most technically distinctive part of Jofito is the arena + leader/chaser model (lines 193–269). An arena is a large, pre-allocated memory region into which all intermediate results are written in order. The predicate chain (scan → filter → print) runs as three threads:

1. **Scanner** (leader) reads directory entries and stores them sequentially in the arena.
2. **Filter** (chaser 1) trails behind the scanner, deallocating entries that don't match the predicate as it encounters them.
3. **Printer** (chaser 2) trails behind the filter, outputting matching entries and freeing them as it goes.
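The three-thread arrangement can be sketched in Python with one shared list and a condition variable. This is a toy under stated assumptions, not Jofito's design:

- it never frees consumed entries, which the real arena does;
- it uses a single lock rather than whatever synchronization Jofito uses;
- CPython's GIL limits true parallelism.

The sketch does show the ordering contract. Each chaser reads only entries that the stage ahead of it has already finished, and entries are never copied into a per-stage buffer.

```python
import threading
from typing import Callable


class Arena:
    """Toy shared arena: an append-only entry list plus filter verdicts."""

    def __init__(self) -> None:
        self.items: list[str] = []  # written in order by the leader
        self.keep: list[bool] = []  # verdicts, written in order by chaser 1
        self.scan_done = False
        self.filter_done = False
        self.cond = threading.Condition()


def scanner(arena: Arena, names: list[str]) -> None:
    # Leader: append entries to the arena and wake any waiting chasers.
    for name in names:
        with arena.cond:
            arena.items.append(name)
            arena.cond.notify_all()
    with arena.cond:
        arena.scan_done = True
        arena.cond.notify_all()


def filterer(arena: Arena, predicate: Callable[[str], bool]) -> None:
    # Chaser 1: trail the scanner, judging entry i once it exists.
    i = 0
    while True:
        with arena.cond:
            arena.cond.wait_for(lambda: i < len(arena.items) or arena.scan_done)
            if i >= len(arena.items):  # scanner finished, nothing left
                arena.filter_done = True
                arena.cond.notify_all()
                return
            item = arena.items[i]
        verdict = predicate(item)  # run the predicate outside the lock
        with arena.cond:
            arena.keep.append(verdict)
            arena.cond.notify_all()
        i += 1


def printer(arena: Arena, out: list[str]) -> None:
    # Chaser 2: trail the filter, emitting entries that passed.
    j = 0
    while True:
        with arena.cond:
            arena.cond.wait_for(lambda: j < len(arena.keep) or arena.filter_done)
            if j >= len(arena.keep):  # filter finished, nothing left
                return
            item, keep = arena.items[j], arena.keep[j]
        if keep:
            out.append(item)  # stand-in for writing to stdout
        j += 1


def run_chain(names: list[str], predicate: Callable[[str], bool]) -> list[str]:
    arena, out = Arena(), []
    threads = [
        threading.Thread(target=scanner, args=(arena, names)),
        threading.Thread(target=filterer, args=(arena, predicate)),
        threading.Thread(target=printer, args=(arena, out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


print(run_chain(["a.jpg", "b.png", "c.txt", "d.jpeg"],
                lambda n: not n.endswith((".jpg", ".jpeg"))))  # ['b.png', 'c.txt']
```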

The critical insight (lines 224–244):

> "So, we have a situation here where if you have three cores or threads on a machine, the directory scan can be happening... then the filtration of that scan will be happening in another thread or on another core at the same time... scanning, filtering, and printing can all happen on a modern machine with multiple cores simultaneously."

And on cache coherency (lines 270–285):

> "The likelihood of say the scanner here has just loaded bad.text into the list and then the filter here has filtered just qualified abc.jpeg and the print has just printed xyz.png... if you have predicates that are fast enough, they're all kind of working in lockstep, which means that these items are still hot in the level one instruction and data caches as it's iterating through this list."

Terminal objects (entries filtered out) are deallocated from the arena immediately, without causing index mismatches for downstream predicates. The arena achieves this with an indirection block scheme: high-level primitives point to fixed indirection entries, while the low-level locations can be compacted (lines 335–355). This is the "write the optimization once, reap the benefits everywhere" contract. Once Jofito knows how to optimally fuse scan+filter+print for a given filesystem, that optimization applies to every subsequent invocation without the user re-specifying it.

#### Pipe Coalescing: The Killer Feature for DSL Design

The most directly relevant feature for the DSL is "pipe coalescing" (lines 376–410). When the Unix shell sees `find ... | grep ... | sort | uniq`, each utility is a separate process. Jofito's pipe coalescing detects when multiple utilities in a pipeline are all Jofito scripts and collapses them into a single in-memory script:

> "I've come up with some tech called pipe coalescing where find and grep see their part of a pipeline. Find and grep see their the same Jofito executable. And then find is the head, so it's the coordinator. And all the subordinates down the pipeline reach out to the head and say, 'Hey, here's my script, here's my parameters, integrate me into you and I'll just become a hollow pipe that sends the final results down the line. Thus, find and grep and sort and unique and whatever else your big long stupid pipeline might use all get collapsed by Jofito... into one unified Jofito script in memory that then performs all these actions and thus can optimize away um cases where, for example, it would be wasteful to get certain information, um it can optimize away that stuff and do it faster than you would ever be able to do it with a normal pipeline on your own."

This is the direct precedent for the DSL's Tier 2 pipeline verb `pipe`. It establishes that a chain of verbs (`scan -> filter -> sort -> dedupe`) can be coalesced into a single pass rather than spawning intermediate processes.
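The Python sketch below illustrates the coalescing idea. A single coordinator sees the whole chain, so it can run every stage in one process and can also rewrite the chain before running it. The stage functions and the `sort | uniq` fusion rule are hypothetical illustrations of "optimizing away" work, not Jofito's actual rules.

```python
from typing import Callable, Iterable, Iterator

Stage = Callable[[Iterable[str]], Iterable[str]]


def grep_suffix(suffix: str) -> Stage:
    # Stand-in for `grep -e '\.jpg$'`: keep lines ending in a literal suffix.
    return lambda lines: (line for line in lines if line.endswith(suffix))


def sort_stage(lines: Iterable[str]) -> list[str]:
    return sorted(lines)


def uniq(lines: Iterable[str]) -> Iterator[str]:
    # Like Unix `uniq`: drop *adjacent* duplicates only.
    prev: object = object()  # sentinel that never equals a real line
    for line in lines:
        if line != prev:
            yield line
        prev = line


def sort_unique(lines: Iterable[str]) -> list[str]:
    # Fused replacement for `sort | uniq`: deduplicate through a set, then sort.
    return sorted(set(lines))


def coalesce(stages: list[Stage]):
    # The "head" sees the whole chain, so it can rewrite it before running.
    plan: list[Stage] = []
    for stage in stages:
        if plan and plan[-1] is sort_stage and stage is uniq:
            plan[-1] = sort_unique  # hypothetical fusion rule
        else:
            plan.append(stage)

    def run(source: Iterable[str]) -> list[str]:
        data: Iterable[str] = source
        for stage in plan:
            data = stage(data)  # in-process hand-off, no pipe buffer
        return list(data)

    return run, plan


run, plan = coalesce([grep_suffix(".jpg"), sort_stage, uniq])
print(run(["./b.jpg", "./a.jpg", "./c.png", "./a.jpg"]))  # ['./a.jpg', './b.jpg']
print(len(plan))                                          # 2
```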
|
||||
|
||||
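The core of coalescing is that stages are wired together inside one process instead of through OS pipes. Below is a minimal Python sketch of that fusion, assuming each stage is a function from an iterator of records to an iterator of records. It shows only the fusion step, not the cross-stage optimizations Jofito performs. The stage and function names (`scan`, `filter_ext`, `sort_stage`, `dedupe`, `coalesce`) are hypothetical stand-ins for the DSL verbs.

```python
from typing import Callable, Iterable, Iterator

Stage = Callable[[Iterable[str]], Iterator[str]]


def scan(paths: Iterable[str]) -> Stage:
    """Head of the chain: ignores its input and emits the scanned records."""
    def run(_: Iterable[str]) -> Iterator[str]:
        yield from paths
    return run


def filter_ext(*exts: str) -> Stage:
    """Streaming stage: keeps records whose name ends with one of exts."""
    def run(records: Iterable[str]) -> Iterator[str]:
        return (r for r in records if r.endswith(exts))
    return run


def sort_stage(records: Iterable[str]) -> Iterator[str]:
    """Blocking stage: must see every record before emitting any."""
    return iter(sorted(records))


def dedupe(records: Iterable[str]) -> Iterator[str]:
    """Streaming stage: drops records already seen (first occurrence wins)."""
    seen: set[str] = set()
    for r in records:
        if r not in seen:
            seen.add(r)
            yield r


def coalesce(*stages: Stage) -> Callable[[], list[str]]:
    """Fuse stages into one in-process plan; nothing runs until it is called."""
    def plan() -> list[str]:
        stream: Iterable[str] = ()
        for stage in stages:
            stream = stage(stream)  # outputs feed the next stage directly, no text serialization
        return list(stream)
    return plan


if __name__ == "__main__":
    plan = coalesce(
        scan(["b.jpg", "a.png", "b.jpg", "c.txt"]),
        filter_ext(".jpg", ".png"),
        sort_stage,
        dedupe,
    )
    print(plan())  # ['a.png', 'b.jpg']
```

Because the whole chain is one object before it runs, a planner could inspect it and rewrite it. For example, it could notice that `dedupe` after `sort` only needs to compare neighbours. That rewrite step is where the "optimize away" part of the quote would live.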
#### The Intent Mapping Engine Manifesto

The 2026 README update (`https://codeberg.org/jbruchon/jofito`) states the design philosophy explicitly:

> "2026 UPDATE NOTE: This tool was originally intended to act like a sort of 'SQL for managing filesystems' but I am generalizing it out to become an 'intent mapping engine' instead. I intend to replace coreutils, findutils, grep, and sed with 'scripted' commands of intent. The general idea is that if you write a program in the jofito language, you can not only run it anywhere that jofito has been ported, but you also get the maximal performance and safety offered by the underlying system and hardware. Essentially, jofito is a 'write the optimization once, reap the benefits everywhere' system that takes what the user wants to accomplish (intent) as input and decomposes it into operations that make the most sense for the current system."

The "intent mapping engine" framing is the fourth anchor claim for section 1 of the main report.

### Code Examples from Source

**Jofito predicate chain (from README):**
```
list = scandir("/path/here/", {filter !extension=jpg,jpeg}) : print(list)
```

**Unix pipeline for comparison (from transcript lines 34–38):**
```sh
find . -type f | grep -e '\.jpg$' | grep -e '\.png$'
```

The comparison is structural rather than semantic. The Jofito predicate excludes `jpg`/`jpeg` files, whereas this pipeline chains two `grep` stages that each keep only matching lines, and no line can end in both `.jpg` and `.png`.

**Pipe coalescing concept (from transcript lines 383–402):**
```sh
# Without coalescing: 4 separate processes
find . -type f | grep -e '\.jpg' | sort | uniq
# Jofito coalesces find+grep+sort+uniq into one in-memory script
```

### Take (for Section 1 Anchor Claims)

- **Anchor 4 (Intent Mapping Framing):** "jofito is a 'write the optimization once, reap the benefits everywhere' system that takes what the user wants to accomplish (intent) as input and decomposes it into operations that make the most sense for the current system." (`https://codeberg.org/jbruchon/jofito`, 2026 UPDATE NOTE). This is the naming citation for the DSL's "intent-based" design philosophy.
- **Tier 2 verb justification:** The `scan -> filter -> select -> print` pipeline chain maps directly to Jofito's `scandir(...) : filter : print` predicate chain (`docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:138–174`).
- **Pipe coalescing → DSL `pipe` verb:** Jofito's pipe coalescing (collapsing find+grep+sort+uniq into one in-memory script, `transcript:376–410`) is the design precedent for the DSL's `pipe` verb: chained verbs can be fused into a single-pass execution plan.
- **Arena/leader-chaser → Tier 2 execution model:** Although the DSL does not implement Jofito's full arena model, its Tier 2 verbs are specified to be parallelizable and to avoid intermediate serialization, honoring Jofito's cache-coherency contract (`transcript:270–285`).

---

## Entry: jq (Stephen Dolan, 2012–)

**What it is.** jq is a lightweight, flexible command-line JSON processor written in C. Its creator, Stephen Dolan, describes it as "like sed for JSON data." It applies the Unix filter-pipeline model to structured JSON: programs are composed of filters that transform input into output, chained with the `|` operator. Unlike sed, which operates on lines of text, jq operates on JSON values (arrays, objects, scalars) using a purely functional, composable filter language.

**What we take from it.** The DSL takes two things from jq:

1. The `|` pipe idea, written `->` in our DSL to avoid conflict with shell usage.
2. The filter-as-expression style, in which every filter is a value that can be composed.

jq's insight is that data transformation should be expressed as a composition of small, reusable filters rather than as imperative step-by-step instructions. The same insight sits behind the DSL's Tier 2 verbs.

### Detailed Analysis

#### The Pipe Operator and Filter Composition

jq's core innovation is applying the Unix pipe model to structured data. From the Wikipedia entry (`https://en.wikipedia.org/wiki/Jq_(programming_language)`):

> "In jq, programs consist of filters that can be composed in pipelines that perform a variety of operations on their inputs."

The jq manual (cited in the Wikipedia article) uses the `|` operator as a pipeline combinator. A jq program such as `.parse | .categories | .[] | .["*"]` navigates a nested JSON structure by chaining filters:

- `.parse` extracts the `parse` key.
- `.categories` extracts `categories`.
- `.[]` iterates over the array items.
- `.["*"]` extracts the `*` key from each item.

The jq website (`https://jqlang.org/`) frames it this way:

> "jq is like sed for JSON data — you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text."

The original description (2013) is quoted in the Wikipedia article, which cites `http://jqlang.github.io/jq`:

> "like sed for JSON data"

Under the filter-composition model, every jq expression is itself a filter that can be used as a sub-expression in a larger pipeline. There are no statements, only expressions, each of which produces zero or more output values. This is the "tacit" or "point-free" programming style: filters compose without naming their arguments.

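The "filter produces zero or more outputs, `|` feeds each output onward" semantics can be modelled directly. The following is a minimal Python sketch under that assumption. It covers only `.key`, `.[]`, `select` and `|`, not the jq language. The function names are hypothetical, and the sample document is made up to fit the program quoted above.

```python
from typing import Any, Callable, Iterator

Filter = Callable[[Any], Iterator[Any]]


def key(name: str) -> Filter:
    """jq `.name` / `.["name"]`: object lookup; a missing key or null input gives null."""
    def f(value: Any) -> Iterator[Any]:
        if value is None:
            yield None
        elif isinstance(value, dict):
            yield value.get(name)
        else:
            raise TypeError(f"cannot index {type(value).__name__} with {name!r}")
    return f


def each(value: Any) -> Iterator[Any]:
    """jq `.[]`: emit every array element, or every object value."""
    yield from (value.values() if isinstance(value, dict) else value)


def select(pred: Callable[[Any], bool]) -> Filter:
    """jq `select(cond)`: emit the input unchanged only when cond is true."""
    def f(value: Any) -> Iterator[Any]:
        if pred(value):
            yield value
    return f


def _feed(flt: Filter, inputs: Iterator[Any]) -> Iterator[Any]:
    for v in inputs:
        yield from flt(v)


def pipe(*filters: Filter) -> Filter:
    """jq `a | b | c`: every output of one filter becomes an input of the next."""
    def f(value: Any) -> Iterator[Any]:
        outputs: Iterator[Any] = iter([value])
        for flt in filters:
            outputs = _feed(flt, outputs)  # helper binds flt now, avoiding late binding
        return outputs
    return f


if __name__ == "__main__":
    doc = {"parse": {"categories": [{"*": "Linux"}, {"*": "JSON"}]}}
    program = pipe(key("parse"), key("categories"), each, key("*"))
    print(list(program(doc)))  # ['Linux', 'JSON']
```

The composed `program` is itself a `Filter`, so it can be nested inside another `pipe`. This is the "every expression is a filter" property described above.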
#### jq's Type System and Streaming Parser

jq's type system is minimal and maps directly to JSON: strings, numbers, booleans, null, arrays, and objects. Every JSON value is a jq value.

The streaming parser (`--stream`, added in jq 1.5) turns a document into a stream of `[path, leaf]` events. It also emits one-element `[path]` events that mark where an array or object closes. Together, these let jq process JSON inputs too large to fit in memory incrementally.

This is relevant to the DSL because the Tier 2 pipeline verbs operate on similar data shapes. The DSL's `select` and `filter` verbs work on record streams (similar to jq's object iteration), and the `gather` verb could use a streaming approach for large file sets.

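To make the `[path, leaf]` event shape concrete, here is a minimal Python sketch that walks an in-memory value and emits the leaf events in document order. It assumes the input is already parsed. Unlike real `jq --stream`, it neither parses incrementally from bytes nor emits the one-element closing events. `leaf_events` is a hypothetical name.

```python
from typing import Any, Iterator


def leaf_events(value: Any, path: tuple[Any, ...] = ()) -> Iterator[list[Any]]:
    """Yield [path, leaf] for each leaf of value, in document order."""
    if isinstance(value, dict) and value:
        for k, v in value.items():
            yield from leaf_events(v, path + (k,))
    elif isinstance(value, list) and value:
        for i, v in enumerate(value):
            yield from leaf_events(v, path + (i,))
    else:
        # Scalars and empty containers are leaves, as in jq's stream form.
        yield [list(path), value]


if __name__ == "__main__":
    doc = {"files": [{"name": "a.jpg", "size": 10}, {"name": "b.png", "size": 20}]}
    for event in leaf_events(doc):
        print(event)  # first line: [['files', 0, 'name'], 'a.jpg']
```

Each event carries its full path, so a consumer can filter on path prefixes (for example, every `size` under `files`) without rebuilding the whole tree. This is the property that would let a DSL stage work on large record sets.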
#### jq Implementations and Influence

jq has been reimplemented in Go (gojq), in Rust (jaq), and even in jq itself (jqjq). The Wikipedia article notes that jaq uses denotational semantics to formalize jq behavior where the original jq documentation is unclear. Several independent reimplementations exist, each trying to get the semantics right, which is some validation of jq's design.

The DSL aims to be interpretable by multiple agent backends, not just the current Python implementation. jq's multi-implementation ecosystem is a parallel for that ambition.

#### Syntax Example from Source

From the tutorial section of the Wikipedia jq article:

```jq
# The jq pipeline (abbreviated form):
."parse" | .categories | .[] | .["*"]

# A separate named-filter example from the Wikipedia article (def tobase):
def tobase($b):
  def digit: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"[.:.+1];
  def mod: . % $b;
  def div: ((. - mod) / $b);
  def digits: recurse( select(. >= $b) | div) | mod ;
  select(2 <= $b and $b <= 36)
  | [digits | digit] | reverse | add;
```

This shows jq's functional composition style: `select(...) | [digits | digit] | reverse | add` chains filters without naming intermediate values.

### Take

- **DSL `->` pipe operator:** jq's `|` pipe is the conceptual precedent for the DSL's `->` pipeline operator. The DSL replaces `|` with `->` to avoid conflict with shell usage and to make the DSL parseable without shell-aware lexing.
- **Filter-as-expression style:** In jq, every filter is a composable expression that produces values. This model maps directly to the DSL's Tier 2 verbs (`scan`, `select`, `filter`, `map`, `fold`), which are expressions that produce streams rather than imperative statements.
- **Tier 2 verb semantics:** The `select` verb in particular mirrors jq's `select(condition)` filter, which passes through only the values matching a condition. The `dedupe` verb corresponds to jq's `unique`, with one difference: `unique` takes an array and returns it sorted with duplicates removed, whereas `dedupe` operates on a record stream.

---

## Entry: nagent's Tag Protocol (Jody Bruchon, 2024–2025)

**What it is.** nagent is Jody Bruchon's autonomous coding agent framework. Its §4 "visible output protocol" uses a self-closing, XML-ish tag format (e.g., `<nagent-read path="src/foo.py"/>`) that the agent emits as text. A parser (`nagent_tags.py`) matches tags to handler functions (`execute_read`, etc.).

The protocol is explicitly not XML:

- The first matching close tag wins.
- There is no entity escaping.
- The tag format is designed for human readability and LLM emit-ability rather than machine-interchange fidelity.

**What we explicitly reject (and what we take):** We **take** the idea of a compact, human-readable structured protocol for tool invocation. Concretely, that is the named-operation-plus-attributes structure of the `<name attr="value"/>` surface syntax, which external agents can emit without knowing the underlying function-call JSON schema. We **reject** the XML angle-bracket notation itself, per the user's explicit instruction: "ignore its record formats as they problably will be less xml/json based as I don't like them." (`conductor/tracks/nagent_review_20260608/decisions.md:50`, citing the user signal).

### Detailed Analysis

#### The Tag Protocol Design

The nagent tag protocol is documented in `nagent_takeaways_20260608.md` (lines 210–230). The core design:

> "`<nagent-read path="..."/>` is a self-closing tag. The model emits it; the parser matches; `execute_read` runs. The model doesn't need to know the function-call schema for the LLM SDK — it just needs to emit text containing a tag." (`conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md:212`)

The contrast with standard function calling is explicit:

> "The training data for 'emit a `<nagent-read>` tag' is zero; the training data for 'emit a `read_file` tool call' is high. *Function calling wins on capability and on training; tag protocols win on debuggability.*" (`nagent_takeaways_20260608.md:214`)

The protocol was later refined in nagent v2, where an explicit parser (`nagent_tags.py`) replaced regex-based parsing. `agent_review_v2_1_20260612.md` documents it (line 50):

> "`nagent_tags.py`: ~160 (6KB). The new explicit tag parser. Replaces regex parsing. `TagNode` dataclass with `name, attrs, content, self_closing, start, end`. `parse_tag_document` walks whitespace + elements. `find_block_span`, `extract_block`, `replace_first_block`, `remove_first_block` are the public helpers. **The protocol is XML-ish, not XML** — first matching close tag wins; no entity escaping."

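The "model emits it; the parser matches; the handler runs" loop can be illustrated with a deliberately small matcher. This Python sketch is **not** `nagent_tags.py`. For brevity it uses a regex, which is the approach nagent v2 moved away from, and it handles only self-closing tags with double-quoted attributes. Like the protocol described above, it does no entity escaping. The handler body and the `HANDLERS` table are hypothetical placeholders.

```python
import re
from typing import Callable

# Self-closing tag with zero or more name="value" attributes (no entity escaping).
TAG_RE = re.compile(r'<(?P<name>[a-z][\w-]*)(?P<attrs>(?:\s+[\w-]+="[^"]*")*)\s*/>')
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def find_tags(text: str) -> list[tuple[str, dict[str, str]]]:
    """Return (tag name, attributes) for every self-closing tag in text."""
    return [
        (m.group("name"), dict(ATTR_RE.findall(m.group("attrs"))))
        for m in TAG_RE.finditer(text)
    ]


def execute_read(attrs: dict[str, str]) -> str:
    return f"would read {attrs['path']}"  # placeholder instead of real file I/O


HANDLERS: dict[str, Callable[[dict[str, str]], str]] = {
    "nagent-read": execute_read,
}


def dispatch(text: str) -> list[str]:
    """Run the handler for each recognized tag; unknown tags are ignored."""
    return [HANDLERS[name](attrs) for name, attrs in find_tags(text) if name in HANDLERS]


if __name__ == "__main__":
    reply = 'Let me look first. <nagent-read path="src/foo.py"/> Then I will edit.'
    print(dispatch(reply))  # ['would read src/foo.py']
```

The surrounding prose in `reply` is simply ignored, which is what makes the protocol debuggable: the transcript shows exactly what the model asked for, in the same text it wrote.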
#### The Explicit "We Reject This" Note

The user signal in `decisions.md` is unambiguous (line 50; also `spec.md` line 50):

> "**Not** adopting XML/JSON record formats. Per the user: 'ignore its record formats as they problably will be less xml/json based as I don't like them.'"

And in `decisions.md` line 119 (Candidate 4 framing):

> "The existing JSON function-calling format forces the user to read verbose `{"name": "...", "args": {...}}` blobs."

The intent-based DSL examples listed in `decisions.md:124–128` use angle brackets, but the user explicitly rejected that notation. The DSL therefore needs a different surface syntax. That syntax must preserve the structured-protocol properties (compact, human-readable, LLM-emit-able) without using `<>` or `{}` as structural delimiters.

#### Why We Reject the XML Angle-Bracket Approach

The specific reasons for rejecting XML angle-bracket notation:

1. **User preference:** The user explicitly said "I don't like them" (`decisions.md:50`).
2. **LLM training data mismatch:** A bespoke tag such as `<nagent-read>` has essentially no training data in existing models. Emitting it reliably therefore leans on prompt engineering or fine-tuning, more so than a syntax closer to what models already emit would (`nagent_takeaways_20260608.md:214`).
3. **Ambiguity with HTML/Markdown:** Angle-bracket notation conflicts with common markup patterns in the contexts where the DSL will be used (agent prompts, tool outputs).

The protocol properties we **do** want to keep are: compact (not JSON-verbose), human-readable, structured (name + attributes), and LLM-emit-able. The structured-protocol *idea* (a named operation with typed attributes, not a JSON blob) is the right direction; only the notation needs to change.

#### The Bridge DSL Concept

`nagent_takeaways_20260608.md` proposes a bridge DSL (lines 216–222) as the right model:

```
<ms-tool name="read_file" path="src/foo.py" />
<ms-tool name="py_get_skeleton" path="src/foo.py" symbol="MyClass" />
```

The document notes that this is Decision candidate #4 reframed as a *bridge* DSL rather than a Meta-Tooling-side DSL. The Application's function calling stays the same; the bridge DSL is what external agents emit.

The DSL's notation must serve the same purpose (compact, structured tool invocation by LLMs) without using angle brackets. Possible alternatives, not mandated here and only noted for the Tier 1 synthesis:
- `read_file src/foo.py` (verb-first, space-delimited)
- `read_file(src/foo.py)` (function-call-like but simpler than JSON)
- `read_file "src/foo.py"` (quoted-argument form)

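A bridge DSL only works if each line compiles mechanically to the existing JSON dispatch shape. This minimal Python sketch covers the verb-first and quoted-argument forms listed above, plus `key=value` named attributes. It does not handle the parenthesized form. It assumes a small signature table mapping each verb to ordered parameter names. `SIGNATURES` and `compile_call` are hypothetical and do not reflect the project's real tool schema.

```python
import json
import shlex

# Assumed signatures: verb -> ordered parameter names (illustrative only).
SIGNATURES: dict[str, list[str]] = {
    "read_file": ["path"],
    "py_get_skeleton": ["path", "symbol"],
}


def compile_call(line: str) -> dict[str, str]:
    """Translate `verb arg key=value ...` into {"name": ..., "arguments": "<json>"}."""
    verb, *tokens = shlex.split(line)  # shlex honors "quoted arguments"
    if verb not in SIGNATURES:
        raise ValueError(f"unknown verb: {verb}")
    params = SIGNATURES[verb]
    args: dict[str, str] = {}
    positional = 0
    for tok in tokens:
        if "=" in tok:  # named attribute; positional values containing '=' must be passed by name
            key, value = tok.split("=", 1)
        else:  # positional argument, bound in declaration order
            if positional >= len(params):
                raise ValueError(f"too many arguments for {verb}")
            key, value = params[positional], tok
            positional += 1
        if key not in params:
            raise ValueError(f"{verb} has no parameter {key!r}")
        args[key] = value
    return {"name": verb, "arguments": json.dumps(args)}


if __name__ == "__main__":
    print(compile_call('read_file "src/foo.py"'))
    # {'name': 'read_file', 'arguments': '{"path": "src/foo.py"}'}
    print(compile_call("py_get_skeleton src/foo.py symbol=MyClass"))
```

Positional arguments keep simple calls short. Named attributes carry the nagent-style `name="value"` readability for longer calls, without any angle brackets.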
### Take

- **Structured protocol idea (TAKEN):** A compact, named-operation-with-attributes format for tool invocation is the right idea. External agents can emit this format without knowing the function-call JSON schema.
- **XML angle brackets (REJECTED):** Per the user ("I don't like them"), the DSL must use a different notation. The specific reasons are user preference, LLM training-data mismatch, and HTML/Markdown ambiguity.
- **nagent's `name="..."` attribute syntax:** Named attributes (as opposed to positional arguments) are retained. For complex tool calls, `scan dir=".", filter_extension="jpg"` reads more naturally than `scan ".", "jpg"`.
- **Self-closing tag for no-content operations:** The concept of a self-closing tag (no content body needed) maps to the DSL's distinction between verbs that produce output and verbs that are used for their side effect.

---

## Entry: WebAssembly (W3C, 2017–)

**What it is.** WebAssembly (Wasm) is a binary instruction format, with a companion text format, for a portable stack-based virtual machine designed for streaming compilation. It defines a compact, sectioned binary format with two key features:

- **Linear memory:** a single growable byte array, separate from the call stack.
- **Structured control flow:** no `goto`; all branches are scoped via `block`/`loop`/`if`/`end`.

**What we take from it.** Wasm's linear memory model is the modern reference for the "tape drive" argument-passing analogy that grounds the DSL's data-passing semantics. A program that processes a stream of records operates on a single linear memory region. The records can be laid out as entries in a contiguous buffer rather than as individually heap-allocated objects. This is the execution model Jofito implements in C and the model the DSL's Tier 2 verbs are specified against.

### Detailed Analysis

#### Linear Memory

From the Wikipedia article on WebAssembly (`https://en.wikipedia.org/wiki/WebAssembly`):

> "Data in memory is stored in a large, growable array of bytes termed a linear memory. Linear memory is separate from the wasm module's call stack and code and the engine's memory. This allows running wasm code in the same process as the JavaScript virtual machine it's embedded in without violating memory safety."

Linear memory gives a module one flat, bounds-checked byte array and leaves its layout to the program. Linear memory itself is not garbage-collected and carries no per-object metadata.

A program that runs a general-purpose allocator inside that array can still fragment it. Nothing, however, stops the program from storing records contiguously in one region, which is the cache-friendly layout. Jofito's arena provides the same property in C: entries are stored contiguously and compacted as they become dead.

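The "one contiguous buffer instead of one object per record" layout can be sketched with Python's `struct` module. In this sketch a single growable `bytearray` stands in for a Wasm linear memory. The record layout (inode number, size in bytes, live flag) and the `LinearRecords` class are assumptions chosen for illustration.

```python
import struct
from typing import Iterator

# Assumed record layout, little-endian with no padding: u64 inode, u64 size, bool live.
RECORD = struct.Struct("<QQ?")


class LinearRecords:
    def __init__(self) -> None:
        self.memory = bytearray()  # stands in for a Wasm linear memory

    def append(self, inode: int, size: int) -> int:
        index = len(self.memory) // RECORD.size
        self.memory.extend(RECORD.pack(inode, size, True))  # grow the single buffer
        return index

    def read(self, index: int) -> tuple[int, int, bool]:
        return RECORD.unpack_from(self.memory, index * RECORD.size)

    def kill(self, index: int) -> None:
        # Flip the live flag in place; no per-record object is freed.
        inode, size, _ = self.read(index)
        RECORD.pack_into(self.memory, index * RECORD.size, inode, size, False)

    def live(self) -> Iterator[tuple[int, int]]:
        # One sequential pass over contiguous bytes: the cache-friendly access pattern.
        for inode, size, alive in RECORD.iter_unpack(self.memory):
            if alive:
                yield inode, size


if __name__ == "__main__":
    recs = LinearRecords()
    for inode, size in [(11, 4096), (12, 10), (13, 2048)]:
        recs.append(inode, size)
    recs.kill(1)  # e.g. dropped by a filter predicate
    print(list(recs.live()), len(recs.memory))  # [(11, 4096), (13, 2048)] 51
```

Records are addressed by offset arithmetic (`index * RECORD.size`) rather than by pointers to separate objects. That is the same addressing model a Wasm program uses inside its linear memory.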
#### Sectioned Binary Format and Streaming

> "The binary format is straightforward and designed to allow streaming compiling, so compiling can begin before the module is finished downloading, and to allow functions to be compiled in parallel." (`https://en.wikipedia.org/wiki/WebAssembly`)

The binary format is split into sections, with the type and function declarations preceding the code section. An engine can therefore start validating and compiling function bodies as they arrive, instead of waiting for the whole module. Instantiation and execution still require the complete module.

For the DSL, this suggests a two-phase parsing strategy. Verb names and signatures are parsed first, which is cheap and gives early validation. Arguments are parsed on demand.

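The parsing strategy suggested above can be sketched as two phases. Phase 1 splits a pipeline on `->` and checks every verb against a known table, so a misspelled verb fails before anything runs. Phase 2 tokenizes a stage's arguments only when they are first accessed. In this Python sketch, the verb set, the stage syntax, and the `Stage`/`parse_pipeline` names are assumptions.

```python
import shlex
from dataclasses import dataclass
from functools import cached_property

KNOWN_VERBS = {"scan", "filter", "select", "sort", "dedupe", "gather"}


@dataclass
class Stage:
    verb: str
    raw_args: str  # kept unparsed until needed

    @cached_property
    def args(self) -> list[str]:
        # Phase 2 (deferred): tokenized on first access, then cached.
        return shlex.split(self.raw_args)


def parse_pipeline(source: str) -> list[Stage]:
    """Phase 1 (eager): split stages and validate verbs only."""
    stages = []
    # Naive split: a literal '->' inside a quoted argument is not supported.
    for part in source.split("->"):
        verb, _, rest = part.strip().partition(" ")
        if verb not in KNOWN_VERBS:
            raise SyntaxError(f"unknown verb {verb!r}")
        stages.append(Stage(verb, rest))
    return stages


if __name__ == "__main__":
    plan = parse_pipeline('scan "my photos" -> filter ext=jpg -> sort')
    print([s.verb for s in plan])  # ['scan', 'filter', 'sort']
    print(plan[0].args)  # ['my photos']
```

The trade-off mirrors Wasm's: structural errors surface immediately, while the potentially large argument payloads (long file lists) cost nothing until a stage actually executes.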
#### Structured Control Flow

> "Unlike typical assembly languages, wasm only uses structured control flow similar to high-level programming languages. The intentional lack of support for jump instructions makes it simple to validate and compile wasm code in a single pass, and makes it easier to read code disassembled into the text format." (`https://en.wikipedia.org/wiki/WebAssembly`)

This is relevant to the DSL's error-recovery model. Structured recovery (try/recover blocks with explicit nesting) is easier to validate and recover from than unstructured jumps. The DSL's `try { ... } recover { ... }` envelope mirrors Wasm's structured control flow.

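Because blocks are explicitly nested, checking a `try { ... } recover { ... }` envelope needs only one left-to-right pass with a stack. No jump targets need resolving, which is the same reason Wasm validates in a single pass. This Python sketch assumes the token syntax shown above. It treats any other `{ ... }` (such as `arena { }`) as a plain block, and it ignores braces inside quoted strings. `check_nesting` is a hypothetical name.

```python
import re

# \b keeps identifiers such as "retry {" from matching as a try block.
TOKEN_RE = re.compile(r"\btry\s*\{|\brecover\s*\{|\{|\}")


def check_nesting(source: str) -> None:
    """Raise SyntaxError unless every try block is closed and followed by recover."""
    stack: list[str] = []       # kinds of currently open blocks
    awaiting_recover = False    # True right after a try block closes
    for m in TOKEN_RE.finditer(source):
        tok, pos = m.group(), m.start()
        is_recover = tok.startswith("recover")
        if awaiting_recover and not is_recover:
            raise SyntaxError(f"try block closed before offset {pos} has no recover")
        if is_recover and not awaiting_recover:
            raise SyntaxError(f"recover at offset {pos} does not follow a try block")
        awaiting_recover = False
        if tok == "}":
            if not stack:
                raise SyntaxError(f"unmatched '}}' at offset {pos}")
            awaiting_recover = stack.pop() == "try"
        elif tok.startswith("try"):
            stack.append("try")
        else:
            stack.append("recover" if is_recover else "plain")
    if stack:
        raise SyntaxError(f"{len(stack)} block(s) left open")
    if awaiting_recover:
        raise SyntaxError("final try block has no recover")


if __name__ == "__main__":
    check_nesting("try { arena { scan . -> sort } } recover { exec ls }")  # passes
    try:
        check_nesting("try { scan . }")
    except SyntaxError as err:
        print(err)  # final try block has no recover
```

Every error is reported at parse time with an offset, before any verb executes. That is the property the Take below attributes to both Wasm and the DSL envelope.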
### Take

- **Linear memory → DSL Tier 2 execution model:** Wasm's linear memory (a single contiguous buffer that can hold records without per-record heap allocation) is the implementation reference for the execution model Tier 2 verbs are specified against. Jofito's arena is the C-level precedent.
- **Streaming parse → DSL parsing strategy:** Wasm's ability to start compiling before the full module has loaded suggests a matching split for the DSL parser. It can validate verb names and signatures early (cheap) and defer argument parsing (potentially expensive for large file lists) to execution time.
- **Structured control flow → DSL error recovery:** Wasm's `block`/`loop`/`if`/`end` structured control flow is the model for the DSL's `try/recover` envelope. Both allow nesting correctness to be checked before execution: at validation time for Wasm, at parse time for the DSL.

---

## Synthesis for the DSL

This section maps each Tier 2 (pipeline) and Tier 3 (shell) verb in the DSL to the specific Jofito, jq, or nagent entry that grounds it. Tier 1 will use it to write section 1's anchor claim 4 (Jofito → intent-mapping framing) and section 4's Tier 2/3 verb justifications.

### Tier 2 — Data-Oriented Pipeline Verbs

These verbs implement the Jofito "predicate chain" model. They operate on record streams (not individual files or values) and are designed to be parallelizable without intermediate serialization.

| DSL Verb | Grounding Entry | Rationale | Key Citation |
|---|---|---|---|
| `scan` | Jofito `scandir()` | Jofito's `scandir("/path/here/", {filter ...})` predicate is the leader of the leader/chaser chain. The DSL's `scan` is the first verb in every pipeline, the entry point for data. | `transcript:138–174`, `README:scandir example` |
| `filter` | Jofito `{filter ...}` predicate | Jofito's filter predicate chases the scanner through the arena, deallocating non-matching entries. The DSL's `filter` similarly screens records against a condition. | `transcript:155–174`, `transcript:209–244` |
| `select` | jq `select(condition)` filter | jq's `select(.field == "value")` passes only matching values. The DSL's `select` is the same concept: a filter that tests a condition and passes the records that satisfy it. | `https://en.wikipedia.org/wiki/Jq_(programming_language):Syntax_and_semantics/Filters` |
| `map` | jq map/transform filters | jq's ability to transform every element in a stream (`.[] \| .field`) maps to the DSL's `map`, which applies a transformation to each record in the stream. | `https://jqlang.org/` ("slice and filter and map and transform") |
| `fold` | jq reduction (`reduce`) | jq's `reduce` accumulates a stream into a single value. The DSL's `fold` similarly reduces a record stream to an aggregate result. | `https://en.wikipedia.org/wiki/Jq_(programming_language):Syntax_and_semantics/Forms` |
| `sort` | Jofito (implicit in the predicate chain) | Jofito's pipe coalescing handles sort+uniq in the same pass. The DSL's `sort` verb is a pipeline stage for ordering records. | `transcript:397–402` |
| `dedupe` | jq `unique` filter | jq's `unique` takes an array and returns it sorted with duplicates removed. The DSL's `dedupe` serves the same purpose on a record stream. | `https://en.wikipedia.org/wiki/Jq_(programming_language):Filters` |
| `group` | jq `group_by` | jq's `group_by(.field)` splits an array into sub-arrays of elements sharing the same key. The DSL's `group` verb collects records sharing a key into sub-streams. | `https://jqlang.org/manual/` (jq manual) |
| `arena { }` | Jofito arena allocation | Jofito's arena is a bulk-allocated memory region where all intermediate results are stored contiguously. The DSL's `arena { }` block scopes a pipeline's working memory. It is a performance hint that the enclosed pipeline should use a contiguous buffer rather than per-record allocations. | `transcript:193–209`, `README:arena description` |
| `scatter` | Jofito leader/chaser model | Jofito's filter predicate can run in parallel with the scanner, "scattering" work across cores. The DSL's `scatter` verb explicitly forks a pipeline across multiple workers. | `transcript:250–269` |
| `gather` | Jofito leader/chaser model | The print predicate "gathers" the filtered stream from the arena. The DSL's `gather` collects scattered sub-streams back into a single stream. | `transcript:244–269` |
| `pipe` | Jofito pipe coalescing | Jofito's pipe coalescing collapses `find \| grep \| sort \| uniq` into one in-memory script. The DSL's `pipe` verb explicitly fuses a sub-pipeline into a single-pass execution plan. This is the most directly borrowed concept: a pipeline chain can be optimized as a unit rather than executed stage by stage. | `transcript:376–410` |

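One plausible reading of the `scatter` and `gather` rows above is data-parallel chunking, sketched below in Python. `scatter` splits a record list into chunks and runs the same sub-pipeline on each chunk in a worker pool. `gather` concatenates the partial results in the original chunk order. This is a different technique from Jofito's leader/chaser threading, where pipeline stages run concurrently on one stream. The thread pool and both signatures are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def scatter(records: list[T], stage: Callable[[list[T]], list[U]],
            workers: int = 4) -> list[list[U]]:
    """Run `stage` on roughly equal chunks of `records` in parallel workers."""
    size = max(1, -(-len(records) // workers))  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in chunk order even if chunks finish out of order.
        return list(pool.map(stage, chunks))


def gather(parts: Iterable[list[U]]) -> list[U]:
    """Concatenate scattered partial results back into one stream."""
    return [item for part in parts for item in part]


if __name__ == "__main__":
    paths = [f"img{i}.jpg" if i % 3 else f"doc{i}.txt" for i in range(10)]
    keep_jpg = lambda chunk: [p for p in chunk if p.endswith(".jpg")]
    print(gather(scatter(paths, keep_jpg, workers=3)))
    # ['img1.jpg', 'img2.jpg', 'img4.jpg', 'img5.jpg', 'img7.jpg', 'img8.jpg']
```

Order preservation is the design choice that matters here. Because `gather` reassembles chunks in input order, a `scatter ... gather` pair is invisible to downstream verbs such as `sort` or `dedupe`.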
### Tier 3 — Shell Verbs

These verbs wrap existing MCP tools and provide the shell-scripting surface. They are the "imperative veneer" over the declarative Tier 2 pipeline. Each is grounded in one of four sources:

- Jofito (file operations)
- jq (data transformation)
- nagent (structured tool invocation)
- existing Unix tooling, as an escape hatch

| DSL Verb | Grounding Entry | Rationale | Key Citation |
|---|---|---|---|
| `read` | nagent tag protocol (`<nagent-read path="..."/>`) | Takes the idea of a compact, named-operation format for file reading, not the angle-bracket notation: a structured protocol that an LLM can emit without knowing the underlying function-call schema. The DSL's `read` is the Tier 3 surface for `mcp_client.py`'s `read_file` tool. | `nagent_takeaways_20260608.md:212`, `decisions.md:124` |
| `edit` | nagent tag protocol (structured edit tag) | Same structured-protocol idea as `read`. The DSL's `edit` verb maps to the proposed DSL notation for surgical edits (e.g., `edit src/foo.py:42-50:new_code`). | `decisions.md:126` |
| `glob` | Jofito `scandir` with extension filter | Jofito's `scandir` with a `{filter extension=...}` predicate is a more ergonomic glob. The DSL's `glob` wraps the existing MCP `Path` globbing tools and is also the entry point that feeds `scan`. | `README:scandir example` |
| `search` | jq filter composition | jq's filter composition (`.foo \| .bar \| .baz`) serves as a model for composing search predicates. The DSL's `search` verb applies a predicate to find records matching criteria. | `https://jqlang.org/` |
| `exec` | Jofito pipe coalescing | The escape hatch: when the DSL's pipeline verbs are not sufficient, `exec` runs an arbitrary shell command. This is the "fall back to Unix" safety valve, analogous to Jofito falling back to individual system calls when the arena model does not apply. | `transcript:376–410` |
| `run` | Jofito script execution | Jofito scripts are compiled and run as units. The DSL's `run` verb executes a named script or pipeline, analogous to running a Jofito program. | `README:general idea` |
| `test` | nagent tag protocol (structured test tag) | Same structured-protocol idea as `read`/`edit`. The DSL's `test` verb maps to the proposed DSL notation for running specific tests. | `decisions.md:127` |
| `discover` | jq filter composition + Jofito intent | The "discovery" intent from `decisions.md:128` (`<discover what calls X>`) combines jq-style navigation with Jofito's intent-mapping philosophy: the user says what they want to find and the system figures out how. | `decisions.md:128`, `README:intent mapping` |
| `mcp` | nagent self-describing tools | nagent's `--description` exit pattern (`nagent_takeaways_20260608.md:236–244`) lets each tool describe itself. The DSL's `mcp` verb is the escape hatch to raw MCP tool dispatch, with self-description metadata available. | `nagent_takeaways_20260608.md:236–244` |

### Mapping Summary for Tier 1

**Section 1, Anchor Claim 4 (Intent Mapping Framing):** Cite the Jofito README 2026 UPDATE NOTE: "jofito is a 'write the optimization once, reap the benefits everywhere' system that takes what the user wants to accomplish (intent) as input and decomposes it into operations that make the most sense for the current system." (`https://codeberg.org/jbruchon/jofito`)

**Section 4, Tier 2 Verb Justifications:** Each Tier 2 verb cites one of two groundings:

- the Jofito predicate chain: `scan`, `filter`, `sort`, `arena`, `scatter`, `gather`, `pipe`
- jq filter composition: `select`, `map`, `fold`, `dedupe`, `group`

**Section 4, Tier 3 Verb Justifications:** Each Tier 3 verb cites one of five groundings:

- nagent's structured-protocol idea: `read`, `edit`, `test`
- nagent's self-describing tools: `mcp`
- Jofito's tool-replacement model: `glob`, `exec`, `run`
- jq filter composition: `search`
- the combined jq + Jofito intent grounding: `discover`

**Key design constraint from the nagent rejection:** The DSL must NOT use XML angle-bracket notation. The structured-protocol properties (compact, human-readable, LLM-emit-able, name+attributes) must be preserved with a different notation. Candidates include:

- verb-first, space-delimited: `read_file src/foo.py`
- function-call-like parentheses: `read_file("src/foo.py")`
- quoted-argument form: `read_file "src/foo.py"`

The choice is left to the Tier 1 synthesis.

---

## Citations Index

| Citation | Source | Type |
|---|---|---|
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:28–49` | Jofito video: old pipeline model | File:line |
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:65–69` | Jofito video: grep inefficiency | File:line |
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:90–133` | Jofito video: context switch cost | File:line |
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:106–113` | Jofito video: cache destruction quote | File:line |
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:138–174` | Jofito video: scandir + filter predicate | File:line |
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:155–174` | Jofito video: filter predicate explanation | File:line |
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:193–209` | Jofito video: arena allocation | File:line |
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:209–269` | Jofito video: leader/chaser model | File:line |
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:224–244` | Jofito video: thread coordination | File:line |
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:244–269` | Jofito video: print chasing filter | File:line |
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:270–285` | Jofito video: cache coherency win | File:line |
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:297–335` | Jofito video: terminal object destruction | File:line |
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:335–355` | Jofito video: arena indirection block | File:line |
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:356–373` | Jofito video: real-world find/grep replacement | File:line |
| `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt:376–410` | Jofito video: pipe coalescing | File:line |
| `https://codeberg.org/jbruchon/jofito` | Jofito README (2026 UPDATE NOTE) | URL |
| `conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md:212` | nagent tag protocol description | File:line |
| `conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md:214` | nagent: function calling vs tag protocol | File:line |
| `conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md:216–230` | nagent Bridge DSL proposal | File:line |
| `conductor/tracks/nagent_review_20260608/decisions.md:50` | User: reject XML/JSON record formats | File:line |
| `conductor/tracks/nagent_review_20260608/decisions.md:119` | User signal: explicit want for intent DSL | File:line |
| `conductor/tracks/nagent_review_20260608/decisions.md:124–128` | Intent DSL examples with angle brackets | File:line |
| `conductor/tracks/nagent_review_20260608/agent_review_v2_1_20260612.md:50` | nagent_tags.py explicit parser description | File:line |
| `conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md:236–244` | nagent --description self-describing tools | File:line |
| `https://en.wikipedia.org/wiki/Jq_(programming_language)` | jq Wikipedia article | URL |
| `https://jqlang.org/` | jq official site | URL |
| `https://en.wikipedia.org/wiki/WebAssembly` | WebAssembly Wikipedia (linear memory + binary format) | URL |

@@ -0,0 +1,447 @@
# Cluster 4: Meta-Tooling DSLs and Agent-Facing Languages

**Track:** `intent_dsl_survey_20260612`
**Cluster:** 4 — Meta-Tooling DSLs
**Author:** Tier 2 Tech Lead
**Date:** 2026-06-12
**Sources:** 4 entries (2 internal track specs, 2 provider docs)

---

## Entry: mcp_dsl_20260606 (Manual Slop's Internal DSL Placeholder)

### What the Work Is
|
||||
|
||||
The `mcp_dsl_20260606` track is a **planned follow-on** to the `mcp_architecture_refactor_20260606` track (which splits the 2,205-line `src/mcp_client.py` into 7 sub-MCP classes). It does not exist yet as implemented code — it is documented as a deferred design exercise in `spec.md` §12.1 and §13.1. The user explicitly expressed interest in an "APL/K/Cosy-inspired" compact dialect for per-MCP tool calling, and the MCP architecture refactor is explicitly designed to *lay the groundwork* without implementing the DSL. Per `spec.md:26`: "A future track MAY introduce a DSL layer; this track stays JSON-compatible and lays no groundwork that would prevent a future DSL."
|
||||
|
||||
The design as specced contrasts a JSON call (~80 tokens) with a DSL call (~10 tokens, ~8x reduction):
|
||||
|
||||
```python
|
||||
# JSON (current, per mcp_client.py dispatch interface)
|
||||
{"name": "py_get_skeleton", "arguments": "{\"path\": \"/src/foo.py\"}"}
|
||||
|
||||
# DSL (proposed, per spec.md §12.1)
|
||||
py k /src/foo.py
|
||||
```
|
||||
|
||||
The DSL is **per-MCP**, not uniform: each sub-MCP (`mcp_file_io`, `mcp_python`, `mcp_c`, `mcp_cpp`, `mcp_web`, `mcp_analysis`) would have its own grammar definition (e.g., `py_grammar.k`, `file_io_grammar.k`). A per-MCP grammar compiler would translate DSL tokens to the JSON dispatch format. Backward compat: the JSON path stays; the DSL is opt-in per MCP.
|
||||
|
||||
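A minimal sketch of the per-MCP DSL → JSON converter at the dispatch boundary, assuming a whitespace-separated `<mcp> <verb> <args…>` shape inferred from the single `py k /src/foo.py` example. The grammar table is hypothetical (a stand-in for a compiled `py_grammar.k`), only the `py k` → `py_get_skeleton` row is suggested by the spec, and arguments are positional with no quoting or escaping.

```python
import json

# Hypothetical grammar tables: MCP prefix -> {DSL verb -> (tool name, positional parameter names)}.
# Only `py k` -> `py_get_skeleton` is suggested by spec.md §12.1; the `fs r` row is invented.
GRAMMARS: dict[str, dict[str, tuple[str, list[str]]]] = {
    "py": {"k": ("py_get_skeleton", ["path"])},
    "fs": {"r": ("read_file", ["path"])},
}


def dsl_to_json_call(line: str) -> dict[str, str]:
    """Translate one `<mcp> <verb> <args...>` DSL line into the JSON dispatch shape."""
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError(f"expected '<mcp> <verb> ...', got {line!r}")
    mcp, verb, *args = tokens
    try:
        tool_name, params = GRAMMARS[mcp][verb]
    except KeyError:
        raise ValueError(f"unknown DSL call: {mcp} {verb}") from None
    if len(args) != len(params):
        raise ValueError(f"{tool_name} expects {len(params)} argument(s), got {len(args)}")
    # `arguments` is a JSON-encoded *string*, as in the current dispatch interface.
    return {"name": tool_name, "arguments": json.dumps(dict(zip(params, args)))}


print(dsl_to_json_call("py k /src/foo.py"))
# {'name': 'py_get_skeleton', 'arguments': '{"path": "/src/foo.py"}'}
```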
### What We Take From It

The MCP DSL entry is the **closest project-internal reference** for what an intent-based DSL looks like in this project. It establishes two critical constraints: (1) the DSL is Meta-Tooling-facing, not Application-facing, so the Application's `mcp_client.dispatch` interface stays JSON; (2) each sub-MCP is a natural "DSL compilation unit," suggesting the Tier 4 verb vocabulary should be organized per capability cluster rather than as a flat list.

The 8x token-reduction claim (from `spec.md:460`) establishes the **design objective**: the DSL must be compact enough to appear inline in natural-language prompts without burning context budget. This is the primary metric.

### Analysis

The DSL design space is described in `spec.md:456-465` (§12.1 Follow-up Track) and `spec.md:488` (external reference to "the user's friend on APL/K/Cosy DSLs for tool calling"). The architecture rationale is in `spec.md:22-26`:

> "DSL future: the user noted a future interest in per-MCP compact DSLs (APL/K/Cosy-inspired) for tool calling instead of JSON. **This is explicitly OUT OF SCOPE for this track** (per user: 'no time for that'). A future track MAY introduce a DSL layer; this track stays JSON-compatible and lays no groundwork that would prevent a future DSL."

The sub-MCP Protocol (`spec.md:65-84`) defines `list_tool_schemas()` as the self-describing interface: each sub-MCP advertises its own capabilities. This is the bridge between the JSON world (where schemas are the tool advertisement) and the DSL world (where the grammar itself is the advertisement). The `SubMCP` Protocol is shown at `spec.md:65-82`:

```python
from __future__ import annotations  # keeps the project-specific `Result` annotation unevaluated

from typing import Any, Callable, Protocol


class SubMCP(Protocol):
    name: str
    description: str
    tools: dict[str, Callable[..., str]]

    def invoke(self, tool_name: str, args: dict[str, Any]) -> Result[str, Any]: ...

    def list_tool_schemas(self) -> list[dict[str, Any]]:
        """Return the JSON-serializable tool schemas for this sub-MCP's tools.
        Used by MCPController.get_tool_schemas() to aggregate the full list
        for the AI's initial context. Per nagent_review takeaway #5 (the
        self-describing tool pattern), this is the data-driven alternative
        to a hard-coded dispatch chain."""
```

The non-goals at `spec.md:42-49` are equally informative: the refactor track does NOT change the agent runtime's tool-calling format, does NOT migrate to TypedDict schemas, and does NOT add new tool categories. Because the runtime's JSON format is fixed, any future DSL is confined strictly to the Meta-Tooling bridge side.

§12.1 (`spec.md:456-465`) explicitly lists the DSL's design parameters:

> "Examples: JSON: `{"name": "py_get_skeleton", "arguments": "{\"path\": \"/src/foo.py\"}"}` (~80 tokens per call); DSL: `py k /src/foo.py` (~10 tokens per call, ~8x reduction). A per-MCP grammar definition (`py_grammar.k`, `file_io_grammar.k`, etc.) could be authored and compiled to a parser. A per-MCP DSL → JSON converter at the dispatch boundary. Backward compat: the JSON path stays; the DSL is opt-in per MCP."

**Citations:** `conductor/tracks/mcp_architecture_refactor_20260606/spec.md:22-26, 42-49, 65-82, 456-465, 488`

### Take

- The DSL is **Meta-Tooling-only**: the Application's `mcp_client.dispatch` stays JSON. The DSL is a bridge-side translation layer.
- **Per-MCP grammar organization** is the right unit of DSL design: each sub-MCP owns its grammar, compiled to a parser that feeds the dispatch boundary.
- The **8x token reduction target** (80 → 10 tokens) is the concrete design objective. The Tier 4 verb vocabulary should be evaluated against this metric.
- The `SubMCP.list_tool_schemas()` Protocol is the bridge between JSON schemas (used by the Application AI) and DSL grammars (used by the Meta-Tooling). It should be the **schema source of truth** for both representations.
- **Backward compat is non-negotiable**: JSON stays, DSL is additive. Any DSL design that would retire the JSON path is out of scope.

---

## Entry: nagent's Bridge DSL (Meta-Tooling Intent DSL)

### What the Work Is

The Bridge DSL is nagent's pattern for external agent communication: a **self-closing XML-like tag protocol** that external agents emit as plain text, which a parser matches and dispatches to actual tool implementations. Where provider function-calling forces the model to emit structured JSON inside a dedicated tool-call block (e.g., Anthropic's `tool_use` content block), nagent's bridge lets the model emit text containing `<nagent-read path="..."/>` tags. The parser matches the tag and `execute_read` runs. The model doesn't need to know the function-call schema; it just emits a tag.

In `nagent_takeaways_20260608.md:216-230`, this is explicitly reframed as a **bridge DSL** for Manual Slop's Meta-Tooling:

```
<ms-tool name="read_file" path="src/foo.py" />
<ms-tool name="py_get_skeleton" path="src/foo.py" symbol="MyClass" />
```

The bridge script (`scripts/mma_exec.py` or a future `cli_tool_bridge.py`) translates these into underlying `mcp_client.py` tool calls. External agents (Gemini CLI, OpenCode) do NOT need to know the JSON function-calling schema for every Manual Slop tool; they just emit DSL tags.

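A minimal sketch of the parse-and-dispatch step the bridge script performs, assuming tags use only double-quoted attributes and that every attribute other than `name` becomes a keyword argument. The function names and the `tools` registry are hypothetical stand-ins for `mcp_client.py` dispatch; surrounding prose in the model's reply is simply ignored.

```python
import re
from typing import Callable

# A self-closing <ms-tool ... /> tag anywhere in free text.
TAG_RE = re.compile(r"<ms-tool\s+(.*?)\s*/>", re.DOTALL)
# key="value" attribute pairs; escaped quotes inside values are not supported.
ATTR_RE = re.compile(r'([A-Za-z_][\w-]*)="([^"]*)"')


def extract_tool_calls(text: str) -> list[tuple[str, dict[str, str]]]:
    """Return (tool_name, args) for every <ms-tool> tag in the model's text, in order."""
    calls = []
    for match in TAG_RE.finditer(text):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        name = attrs.pop("name", None)
        if name is None:
            raise ValueError(f"<ms-tool> tag without a name: {match.group(0)}")
        calls.append((name, attrs))
    return calls


def dispatch(text: str, tools: dict[str, Callable[..., str]]) -> list[str]:
    """Run each tagged tool with its attributes as keyword arguments."""
    return [tools[name](**args) for name, args in extract_tool_calls(text)]


reply = 'Checking first. <ms-tool name="py_get_skeleton" path="src/foo.py" symbol="MyClass" />'
fake_tools = {"py_get_skeleton": lambda path, symbol: f"skeleton of {symbol} in {path}"}
print(dispatch(reply, fake_tools))  # ['skeleton of MyClass in src/foo.py']
```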
### What We Take From It

nagent's Bridge DSL is the **provenance chain** for the Meta-Tooling DSL idea. It demonstrates that a tag-based protocol is more **debuggable** than JSON function-calling: you can `grep` for `<ms-tool` in logs, you can `cat` a conversation file and see the tool call inline with the text, and the format is readable without a JSON parser. The cost is that training data for tag protocols is near zero, so function-calling wins on model capability. The resolution is **domain separation**: use function-calling for the Application AI (where training data and schema rigidity are assets), and use the Bridge DSL for the Meta-Tooling (where debuggability and brevity win).

### Analysis

The Bridge DSL framing is at `nagent_takeaways_20260608.md:210-230`. Key passage at lines 212-214:

> "nagent's pattern. `<nagent-read path="..."/>` is a self-closing tag. The model emits it; the parser matches; `execute_read` runs. The model doesn't need to know the function-call schema for the LLM SDK — it just needs to emit text containing a tag."

And at line 214:

> "Manual Slop today. `read_file(path)` is a function call. The model has to know the function signature, format the JSON, embed it in the right `tool_use` block. The training data for 'emit a `<nagent-read>` tag' is zero; the training data for 'emit a `read_file` tool call' is high. *Function calling wins on capability and on training*; *tag protocols win on debuggability*."

The actionable recommendation at lines 216-222:

> "Actionable idea — both, but in different places. This is the *one* place where the existing reports lean toward 'different mechanism, both right.' Don't replace the Application's function calling. But for the Meta-Tooling, document a *Meta-Tooling DSL* in `conductor/code_styleguides/` for use by external agents when they need to invoke Manual Slop's tools via the bridge script. The DSL would look like:
>
> ```
> <ms-tool name="read_file" path="src/foo.py" />
> <ms-tool name="py_get_skeleton" path="src/foo.py" symbol="MyClass" />
> ```

`decisions.md:117-139` (Candidate 4: Intent-based DSL for Meta-Tooling tool calls) confirms the "EXPLICIT WANT" signal from the user and lays out the full design space. At `decisions.md:123-128`:

> "Examples (per the user's 'discovery' or 'combinatorics' hint):
>
> - `<read src/foo.py:MyClass.method>` — intent: read this symbol
> - `<search "execution clutch">` — intent: semantic search the workspace
> - `<edit src/foo.py:42-50:new code>` — intent: surgical line-range edit
> - `<test tests/test_foo.py::test_bar>` — intent: run a specific test
> - `<discover what calls X>` — intent: dependency trace"

This differs from the MCP DSL entry: nagent's Bridge DSL is a **bridge-side** protocol that lives between external agents and the `mcp_client.py` dispatch layer, whereas the MCP DSL is a **per-MCP compact dialect** that would compile to JSON. The Bridge DSL is a self-delimiting tag format; the MCP DSL is a terse, positional token format.

The "why both right" argument at `nagent_takeaways_20260608.md:214` is the most important single claim in this cluster:

> "Function calling wins on capability and on training; tag protocols win on debuggability."

This is the architectural principle that justifies **two protocol stacks**: the JSON function-calling stack for the Application AI (capability + training) and the tag-based Bridge DSL for the Meta-Tooling (debuggability + brevity).

**Citations:** `conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md:210-230`, `conductor/tracks/nagent_review_20260608/decisions.md:117-139`

### Take

- The Bridge DSL is a **self-closing tag protocol** (`<ms-tool name="..." ... />`), not a JSON blob. It is readable as plain text and grep-able without a JSON parser.
- The **domain split** is load-bearing: the Application AI uses JSON function-calling (training data + capability); the Meta-Tooling uses the Bridge DSL (debuggability + brevity + no schema burden on the model).
- The bridge script translates DSL tags → `mcp_client.py` tool calls. The translation layer is the **deployment point** for the DSL.
- The DSL tags should carry **intent**, not just parameters: `<read src/foo.py:MyClass.method>` encodes "read this symbol specifically" as an intentional fragment, not just a path parameter.
- **Training data gap**: models have near-zero training data for emitting tag protocols. The Bridge DSL works for external Meta-Tooling agents (which can be prompted with the DSL spec directly) but would likely fail for the Application AI without significant fine-tuning.

---

## Entry: OpenAI Function-Calling Schema (2026 Baseline)

### What the Work Is

OpenAI's function-calling schema (as documented at `platform.openai.com/docs/guides/function-calling`) is the **current state-of-the-art JSON format** for AI tool invocation in 2026. It is the dominant baseline: the format most production tool-calling integrations target. It uses JSON Schema for tool definitions, an ID-based `tool_call` / `tool_call_id` round-trip for call-response matching, and a 5-step conversational loop (request → tool call → execute → response → final text). This is the format the DSL is explicitly moving *away from* on the record-format dimension (per the user's note: "ignore its record formats as they probably will be less xml/json based"), but it is the standard that any DSL comparison must reference.

### What We Take From It

OpenAI function-calling establishes the **upper bound of schema rigor**: JSON Schema `strict` mode, `required` fields, `additionalProperties: false`, `enum` constraints, and pydantic/Zod integration. Any DSL that discards this rigor must compensate with runtime validation or a narrower tool surface. OpenAI also introduces **namespace** grouping (`"type": "namespace"`) for organizing tools by domain, which is directly relevant to the Tier 4 verb clustering.

### Analysis

The OpenAI function-calling documentation (`platform.openai.com/docs/guides/function-calling`) defines the canonical 5-step tool loop:

1. Make a request to the model with tools it could call
2. Receive a tool call from the model
3. Execute code on the application side with input from the tool call
4. Make a second request to the model with the tool output
5. Receive a final response from the model (or more tool calls)

The tool definition schema fields at `platform.openai.com/docs/guides/function-calling#defining-functions`:

| Field | Description |
|-------|-------------|
| `type` | Always `"function"` |
| `name` | Function name (e.g., `get_weather`) |
| `description` | When and how to use the function |
| `parameters` | JSON Schema defining input arguments |
| `strict` | Whether to enforce strict mode |

The canonical function definition example (this flat shape is the Responses API form; the Chat Completions API nests `name`, `description`, `parameters`, and `strict` under a `"function"` key):

```json
{
  "type": "function",
  "name": "get_weather",
  "description": "Retrieves current weather for the given location.",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "City and country e.g. Bogotá, Colombia"
      },
      "units": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "Units the temperature will be returned in."
      }
    },
    "required": ["location", "units"],
    "additionalProperties": false
  },
  "strict": true
}
```

The tool call response format, shown here in the Chat Completions API shape, pairs an assistant `tool_calls` entry (with JSON-stringified `arguments`) with a `tool` message that echoes the call's `id` back as `tool_call_id`:

```json
[
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"Bogotá, Colombia\", \"units\": \"celsius\"}"
        }
      }
    ]
  },
  {
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": "<tool output, as a string>"
  }
]
```

OpenAI's `namespace` grouping is significant for DSL design. At `platform.openai.com/docs/guides/function-calling#defining-namespaces`:

```json
{
  "type": "namespace",
  "name": "crm",
  "description": "CRM tools for customer lookup and order management.",
  "tools": [
    {
      "type": "function",
      "name": "get_customer_profile",
      "description": "Fetch a customer profile by customer ID.",
      "parameters": {
        "type": "object",
        "properties": {
          "customer_id": { "type": "string" }
        },
        "required": ["customer_id"],
        "additionalProperties": false
      }
    }
  ]
}
```

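One way a DSL front end could consume namespaces is to flatten them into dotted names (`crm.get_customer_profile`) for lookup. The sketch below assumes the namespace shape shown above, nested to any depth; the dotted naming and the function itself are assumptions, `list_open_orders` is added for illustration (it is named later in this report), and the `parameters` objects are trimmed for brevity.

```python
from typing import Any


def flatten_namespaces(tools: list[dict[str, Any]], prefix: str = "") -> dict[str, dict[str, Any]]:
    """Map dotted names such as 'crm.get_customer_profile' to their function definitions."""
    flat: dict[str, dict[str, Any]] = {}
    for tool in tools:
        qualified = prefix + tool["name"]
        if tool["type"] == "namespace":
            # Recurse so nested namespaces yield 'outer.inner.fn' keys.
            flat.update(flatten_namespaces(tool["tools"], prefix=qualified + "."))
        elif tool["type"] == "function":
            flat[qualified] = tool
    return flat


crm = {
    "type": "namespace",
    "name": "crm",
    "description": "CRM tools for customer lookup and order management.",
    "tools": [
        {"type": "function", "name": "get_customer_profile", "parameters": {"type": "object"}},
        {"type": "function", "name": "list_open_orders", "parameters": {"type": "object"}},
    ],
}
print(sorted(flatten_namespaces([crm])))  # ['crm.get_customer_profile', 'crm.list_open_orders']
```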
OpenAI's best practices (`platform.openai.com/docs/guides/function-calling#best-practices-for-defining-functions`) are the closest thing to an industry standard for tool design:

1. Write clear and detailed function names, parameter descriptions, and instructions
2. Apply software engineering best practices: make functions obvious and intuitive; use enums to make invalid states unrepresentable
3. Offload the burden from the model and use code where possible: don't make the model fill arguments you already know
4. Keep the number of initially available functions small: aim for fewer than 20 functions available at the start of a turn

Point 4 is particularly relevant to the Tier 4 verb design: **fewer, more capable tools reduce selection ambiguity**. The DSL should prefer `<read src/foo.py:Symbol>` (one compound intent) over separate `<read_file path="..."/>` + `<py_get_symbol symbol="..."/>` calls.

OpenAI also explicitly addresses token cost at `platform.openai.com/docs/guides/function-calling#token-usage`:

> "Under the hood, functions are injected into the system message in a syntax the model has been trained on. This means callable function definitions count against the model's context limit and are billed as input tokens."

This is the same budget pressure behind the MCP DSL entry's 8x per-call reduction target: every token spent on tool schemas or tool-call syntax is a token not available for reasoning.

**Citation:** `platform.openai.com/docs/guides/function-calling` (official OpenAI API documentation, 2026)

### Take

- OpenAI function-calling establishes the **schema rigor baseline**: JSON Schema with `strict`, `required`, `additionalProperties: false`, and `enum` constraints. Any DSL that drops these must add runtime validation at the dispatch boundary.
- **Token cost is the primary constraint**: tool schemas are injected into the system prompt and billed as input tokens. The 8x per-call reduction target (80 → 10 tokens) attacks the same budget from the call side.
- The **namespace grouping** (`"type": "namespace"`) is the right model for Tier 4 verb clustering: group related verbs by domain (file I/O, Python AST, search, etc.) rather than in a flat list.
- OpenAI's best practice of **fewer, more capable tools** is directly applicable: prefer `<read path:symbol>` compound intents over multiple single-parameter calls.
- The **5-step conversational loop** (request → tool call → execute → response → final text) is the protocol skeleton the DSL must fit. The DSL replaces the JSON serialization step; it doesn't change the loop.

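A provider-agnostic sketch of that skeleton, in which a scripted iterator stands in for the model (no API calls) and `encode_call` is the single wire-format hook: passing `json_encoder` or `tag_encoder` changes only how calls are serialized, not the loop. All names are hypothetical, step 1 (sending tool definitions) is omitted, and step 4 is only recorded in the transcript because the stub model cannot read it.

```python
from __future__ import annotations

import json
from typing import Callable, Iterator

Call = tuple[str, dict[str, str]]  # (tool_name, args)


def run_tool_loop(
    model: Iterator[Call | str],
    tools: dict[str, Callable[..., str]],
    encode_call: Callable[[Call], str],
) -> tuple[str, list[str]]:
    """Drive steps 2-5 until the model returns final text; return (answer, transcript)."""
    transcript: list[str] = []
    for turn in model:  # steps 2 and 5: next model output
        if isinstance(turn, str):
            return turn, transcript  # final text ends the loop
        name, args = turn
        transcript.append(encode_call(turn))  # the only wire-format-dependent step
        result = tools[name](**args)  # step 3: execute on the application side
        transcript.append(f"result: {result}")  # step 4: what would be sent back
    raise RuntimeError("model stopped without a final answer")


def json_encoder(call: Call) -> str:
    return json.dumps({"name": call[0], "arguments": json.dumps(call[1])})


def tag_encoder(call: Call) -> str:
    attrs = "".join(f' {k}="{v}"' for k, v in call[1].items())
    return f'<ms-tool name="{call[0]}"{attrs} />'


script = iter([("read_file", {"path": "src/foo.py"}), "foo.py defines MyClass."])
answer, log = run_tool_loop(script, {"read_file": lambda path: f"<contents of {path}>"}, tag_encoder)
print(answer)  # foo.py defines MyClass.
print(log[0])  # <ms-tool name="read_file" path="src/foo.py" />
```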
---
## Entry: Anthropic Tool-Use Schema (2026 Baseline)

### What the Work Is

Anthropic's tool-use schema (`docs.anthropic.com/en/docs/agents-and-tools/tool-use/define-tools`) is the **second dominant 2026 baseline**: structurally similar to OpenAI's, but with key differences in philosophy and API shape. Where OpenAI's Chat Completions API wraps each definition as `"type": "function"` with a nested `"function"` object, Anthropic uses a flat structure with `name`, `description`, and `input_schema` as top-level fields. Anthropic also introduces `input_examples` as a first-class field for schema-validated examples, and `strict` as a guarantee mechanism (not just a hint). The `tool_choice` parameter (`auto`, `any`, `tool`, `none`) provides fine-grained control over whether Claude calls a tool at all.

### What We Take From It

Anthropic's tool-use schema demonstrates that **schema conformance can be guaranteed** via `strict: true`, which eliminates the class of errors where the model emits a tool call that partially matches the schema but fails validation. For the DSL, this means runtime validation at the dispatch boundary is not optional: the DSL must guarantee that emitted calls conform to the sub-MCP's JSON schema before reaching `invoke()`. Anthropic's `input_examples` field also suggests a pattern for **teaching the DSL** to models: provide concrete examples of well-formed calls alongside the grammar definition.

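A minimal sketch of that dispatch-boundary check, assuming DSL calls have already been parsed into an argument dict. It hand-checks only the JSON Schema subset used in this report's examples (`required`, `enum`, `type: string`, `additionalProperties: false`); a production version would more likely delegate to a full validator such as the `jsonschema` package's `validate`. The function name is hypothetical.

```python
from typing import Any


def validate_args(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the call may reach invoke()."""
    errors: list[str] = []
    props: dict[str, Any] = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required argument '{key}'")
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            # Unknown keys are only an error when the schema forbids extras.
            if schema.get("additionalProperties", True) is False:
                errors.append(f"unexpected argument '{key}'")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"'{key}' must be a string")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"'{key}' must be one of {spec['enum']}")
    return errors


weather_schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
    "additionalProperties": False,
}
print(validate_args(weather_schema, {"location": "Tokyo, Japan", "unit": "kelvin"}))
# ["'unit' must be one of ['celsius', 'fahrenheit']"]
```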
### Analysis

Anthropic's tool definition schema fields at `docs.anthropic.com/en/docs/agents-and-tools/tool-use/define-tools`:

| Parameter | Description |
|-----------|-------------|
| `name` | Must match regex `^[a-zA-Z0-9_-]{1,64}$` |
| `description` | Detailed plaintext description of what the tool does, when to use it, and how it behaves |
| `input_schema` | JSON Schema object defining expected parameters |
| `input_examples` | Optional array of example input objects (schema-validated) to help Claude understand usage |

The canonical Anthropic tool definition:

```json
{
  "name": "get_weather",
  "description": "Get the current weather in a given location",
  "input_schema": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city and state, e.g. San Francisco, CA"
      },
      "unit": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "The unit of temperature, either 'celsius' or 'fahrenheit'"
      }
    },
    "required": ["location"]
  }
}
```

Anthropic's tool call response format:

```json
{
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "I'll help you check the current weather in San Francisco."
    },
    {
      "type": "tool_use",
      "id": "toolu_01A09q90qw90lq917835lq9",
      "name": "get_weather",
      "input": { "location": "San Francisco, CA" }
    }
  ]
}
```

The `input_examples` field at `docs.anthropic.com/en/docs/agents-and-tools/tool-use/define-tools` is a key differentiator:

```json
{
  "name": "get_weather",
  "description": "Get the current weather in a given location",
  "input_schema": { ... },
  "input_examples": [
    {"location": "San Francisco, CA", "unit": "fahrenheit"},
    {"location": "Tokyo, Japan", "unit": "celsius"},
    {"location": "New York, NY"}
  ]
}
```

Anthropic's best practices (`docs.anthropic.com/en/docs/agents-and-tools/tool-use/define-tools#best-practices-for-tool-definitions`) broadly match OpenAI's but use stronger language on description quality:

> "Provide extremely detailed descriptions. This is by far the most important factor in tool performance. Your descriptions should explain every detail about the tool, including: What the tool does, When it should be used (and when it shouldn't), What each parameter means and how it affects the tool's behavior, Any important caveats or limitations."

The `strict` parameter at `docs.anthropic.com/en/docs/agents-and-tools/tool-use/define-tools` is described as a **guarantee**, not a hint:

> "Add `strict: true` to your tool definitions to ensure Claude's tool calls always match your schema exactly."

And at `docs.anthropic.com/en/docs/agents-and-tools/tool-use/define-tools#forcing-tool-use`:

> "Combine `tool_choice: {"type": "any"}` with strict tool use to guarantee both that one of your tools will be called AND that the tool inputs strictly follow your schema."

The `tool_choice` control (`auto`, `any`, `tool`, `none`) is Anthropic's mechanism for forcing or suppressing tool use: `any` forces *some* tool to be called, `tool` forces a specific tool, and `none` prevents tool use entirely (`auto`, the default, lets Claude decide).

Anthropic's tool-use system prompt construction at `docs.anthropic.com/en/docs/agents-and-tools/tool-use/define-tools#tool-use-system-prompt` is also instructive:

> "When you call the Claude API with the `tools` parameter, the API constructs a special system prompt from the tool definitions, tool configuration, and any user-specified system prompt. The constructed prompt is designed to instruct the model to use the specified tool(s) and provide the necessary context for the tool to operate properly."

The constructed prompt injects formatting instructions, tool definitions in JSON Schema format, the user's system prompt, and tool configuration. This is the same mechanism OpenAI uses: the schema is injected as part of the system prompt, confirming that **token cost is proportional to schema verbosity**.

**Citation:** `docs.anthropic.com/en/docs/agents-and-tools/tool-use/define-tools` (official Anthropic documentation, 2026)

### Take

- Anthropic's `strict: true` guarantees schema conformance. The DSL **must** have a runtime validation layer at the dispatch boundary that rejects non-conformant calls before they reach `invoke()`. Without this, the DSL inherits the class of "partial schema match" bugs that `strict` was designed to eliminate.
- **`input_examples` as a first-class schema field** is a model for how to teach the DSL: provide 2-3 schema-validated examples of well-formed calls alongside the grammar definition (concrete instances, not just rules).
- The **`tool_choice` control** (`auto`/`any`/`tool`/`none`) maps to Tier 4 verb design: `fuzzy` corresponds to `auto` (let the model decide), `try`/`recover` corresponds to `any` (must call something), and `assumewide` corresponds to forcing a broad-capability tool.
- Anthropic's **flat tool structure** (no `{"type": "function", "function": {...}}` nesting) is simpler to parse and generates less JSON overhead. A DSL targeting similar brevity should prefer flat attribute lists over nested structures.
- The **tool-use system prompt** is constructed by the provider from the schema, which implies that a DSL's grammar definition would have to feed the same injection mechanism as JSON Schema. The DSL must be **serializable to the schema format** the provider expects, or the schema must be derived from the grammar.

---

## Synthesis for the DSL

This section maps each Tier 4 verb to the entries that ground it, providing the justification chain for the Tier 4 verbs proposed in section 4.

### `fuzzy`

**Grounded by:** Entry 2 (nagent Bridge DSL) + Entry 1 (MCP DSL)

`fuzzy` encodes the "discover what calls X" / "semantic search" intent from `decisions.md:128`. The Bridge DSL candidate is explicitly framed around **discovery and combinatorics** (per the user's hint at `decisions.md:119`). The tag protocol suits fuzzy matching better than JSON function-calling because tags are self-delimiting and grep-able: `<discover what calls X>` is a single readable tag, whereas the equivalent JSON function call requires knowing the exact tool name and parameter schema. The MCP DSL's per-MCP grammar organization supports `fuzzy` at the grammar level: each sub-MCP's grammar can define `fuzzy` as a compound intent that expands to multiple underlying tool calls.

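As an illustration of such a compound intent, `<discover what calls X>` could expand to an AST walk. The sketch below, an assumption rather than an existing tool, scans one Python source string and reports direct calls by bare name or attribute (`X(...)` and `obj.X(...)`), attributing each to its innermost enclosing function; it does no import or type resolution, which is exactly what makes it fuzzy.

```python
import ast


def find_callers(source: str, target: str) -> list[tuple[str, int]]:
    """Return (enclosing function, line) for every call to `target` in `source`."""
    hits: list[tuple[str, int]] = []

    def visit(node: ast.AST, scope: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                visit(child, child.name)  # calls inside belong to this function
                continue
            if isinstance(child, ast.Call):
                func = child.func
                # Match both `target(...)` and `obj.target(...)`.
                name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
                if name == target:
                    hits.append((scope, child.lineno))
            visit(child, scope)  # nested calls may appear inside arguments

    visit(ast.parse(source), "<module>")
    return hits


code = """
def load(path):
    return parse(open(path).read())

def main():
    data = load("a.txt")
    print(load("b.txt"))
"""
print(find_callers(code, "load"))  # [('main', 6), ('main', 7)]
```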
### `try` / `recover`

**Grounded by:** Entry 2 (nagent Bridge DSL) + Entry 3 (OpenAI)

`try` / `recover` encodes nagent's visible retry pattern (`nagent_takeaways_20260608.md:182-206`). On parse failure, nagent appends a `<system>` correction entry to the conversation, so the model sees both its own failure and the correction. This is the protocol-level equivalent of `try` / `recover`: attempt the call, and if it fails (parse failure, not-found, error), recover by injecting a correction. OpenAI's 5-step conversational loop (`platform.openai.com/docs/guides/function-calling#the-tool-calling-flow`) provides the structural skeleton: the loop is inherently a try/recover cycle (execute → return result → model decides next step). The Bridge DSL's tag protocol makes this cycle visible and editable in the conversation log: each `try` / `recover` round-trip is a visible `<ms-tool>` / `<system>` tag pair.

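A minimal sketch of that visible retry cycle, assuming a strict parser that raises `ValueError` on a malformed tag and a model callable that receives the full transcript. The parser's narrow tag shape, the `<system>` correction wording, and the `max_attempts` cap are illustrative assumptions, not nagent's actual text.

```python
import re
from typing import Callable

# Deliberately strict: exactly `<ms-tool name="..." path="..." />`.
CALL_RE = re.compile(r'<ms-tool name="(\w+)" path="([^"]*)" />')


def parse_call(text: str) -> tuple[str, str]:
    """Return (tool_name, path) or raise ValueError if no well-formed tag is present."""
    match = CALL_RE.search(text)
    if match is None:
        raise ValueError("no well-formed <ms-tool> tag found")
    return match.group(1), match.group(2)


def try_recover(
    model: Callable[[list[str]], str], transcript: list[str], max_attempts: int = 3
) -> tuple[str, str]:
    """try: parse the model's call; recover: append a visible correction and ask again."""
    for _ in range(max_attempts):
        reply = model(transcript)
        transcript.append(reply)  # failed attempts stay visible in the log
        try:
            return parse_call(reply)
        except ValueError as err:
            transcript.append(f"<system>Parse failed: {err}. Emit one self-closing <ms-tool ... /> tag.</system>")
    raise RuntimeError(f"no valid call after {max_attempts} attempts")


replies = iter([
    '<ms-tool name="read_file" path="src/foo.py">',  # missing the closing " />"
    '<ms-tool name="read_file" path="src/foo.py" />',
])
log: list[str] = []
print(try_recover(lambda _transcript: next(replies), log))  # ('read_file', 'src/foo.py')
print(len(log))  # 3: bad attempt, <system> correction, corrected attempt
```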
### `sandbox`

**Grounded by:** Entry 3 (OpenAI) + Entry 4 (Anthropic)

`sandbox` is not directly present in the OpenAI or Anthropic tool schemas (neither schema has a sandbox field), but both providers document **tool execution environments** that imply sandboxing. OpenAI's computer use tool (`platform.openai.com/docs/guides/tools-computer-use`) and Anthropic's `code_execution` tool are the canonical examples: the tool runs in an isolated environment, returns output, and the model continues. The DSL's `sandbox` verb should map to the pattern of "execute in an isolated environment, return a semantic result", which is the dominant pattern across both providers' tool ecosystems. The `SubMCP` architecture from Entry 1 (`spec.md:65-84`) provides the deployment model: `mcp_analysis.py` (with `derive_code_path`, `get_ui_performance`) is the natural home for sandboxed analysis tools.

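A minimal sketch of the "execute in isolation, return a semantic result" shape using only the standard library: a fresh interpreter process, a throwaway working directory, and a timeout. This is process isolation only, not a security sandbox (there are no filesystem, network, or memory limits), and the function name and result-dict shape are assumptions.

```python
import subprocess
import sys
import tempfile


def run_sandboxed(code: str, timeout_s: float = 5.0) -> dict[str, object]:
    """Run Python `code` in a fresh interpreter inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores PYTHON* env vars, user site)
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "error": f"timed out after {timeout_s}s"}
    # Summarize rather than dump: exit status, stdout, and the tail of stderr.
    return {"ok": proc.returncode == 0, "stdout": proc.stdout.strip(), "stderr": proc.stderr.strip()[-500:]}


print(run_sandboxed("print(6 * 7)"))  # {'ok': True, 'stdout': '42', 'stderr': ''}
```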
### `audit`

**Grounded by:** Entry 1 (MCP DSL) + Entry 2 (nagent Bridge DSL)

`audit` is grounded in nagent's self-describing tool pattern (`nagent_takeaways_20260608.md:234-249`), which is the conceptual model for `SubMCP.list_tool_schemas()` (`spec.md:75-80`). The `list_tool_schemas()` method is the audit mechanism: it is the self-reporting interface that lets the DSL (and any external consumer) discover what tools exist without consulting a hard-coded registry. nagent's `--description` pattern (`nagent_takeaways_20260608.md:236-242`) extends this to the command line: `bin/nagent:exit_on_description(description)` prints the tool description and exits when `--description` is in `argv`. For the DSL, `audit` means: enumerate all available tools with their schemas, descriptions, and parameter constraints. This is `MCPController.get_tool_schemas()`, the audit verb materialized as a method.

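Both halves of `audit` can be sketched briefly, assuming sub-MCP objects that expose `list_tool_schemas()` as in the `SubMCP` Protocol. `exit_on_description` reconstructs the behavior described for nagent's helper (its real body is not quoted here), `get_tool_schemas` is a stand-in for `MCPController.get_tool_schemas()`, whose implementation the spec does not show, and `FakeFileIO` is a hypothetical sub-MCP for the example.

```python
import sys
from typing import Any, Optional, Protocol


class SchemaSource(Protocol):
    """The one SubMCP method this sketch needs."""

    def list_tool_schemas(self) -> list[dict[str, Any]]: ...


def exit_on_description(description: str, argv: Optional[list[str]] = None) -> None:
    """If --description was passed, print the tool's description and exit successfully."""
    if "--description" in (sys.argv if argv is None else argv):
        print(description)
        sys.exit(0)


def get_tool_schemas(sub_mcps: list[SchemaSource]) -> list[dict[str, Any]]:
    """Aggregate each sub-MCP's self-reported schemas; there is no hard-coded registry."""
    schemas: list[dict[str, Any]] = []
    for sub in sub_mcps:
        schemas.extend(sub.list_tool_schemas())
    return schemas


class FakeFileIO:  # hypothetical sub-MCP for the example
    def list_tool_schemas(self) -> list[dict[str, Any]]:
        return [{"name": "read_file", "description": "Read a file."}]


print([s["name"] for s in get_tool_schemas([FakeFileIO()])])  # ['read_file']
```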
### `didyoumean`

**Grounded by:** Entry 2 (nagent Bridge DSL) + Entry 4 (Anthropic)

`didyoumean` is grounded in the Bridge DSL's **intent-based design** (`decisions.md:123-128`), where the DSL tags encode intent rather than just parameters. `<read src/foo.py:MyClass.method>` is a `read` call with a `didyoumean`-style refinement built into the symbol resolution. The Anthropic `input_examples` field (`docs.anthropic.com/en/docs/agents-and-tools/tool-use/define-tools#providing-tool-use-examples`) provides the model-side equivalent: providing concrete examples helps the model "guess" the right tool and parameters even when the exact match isn't in the training data. `didyoumean` as a Tier 4 verb means: given an ambiguous intent, propose the closest matching tool(s) and parameters, formatted as DSL suggestions the model can adopt directly.

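A cheap first approximation needs only the standard library: fuzzy-match an unrecognized tool name against the known ones with `difflib.get_close_matches` and hand back suggestions already phrased as DSL tags the model can adopt. The tool list, the 0.6 cutoff, and the suggestion format are assumptions, and a fuller version would also suggest parameters.

```python
import difflib

# Tool names mentioned in this report; py_get_symbol is itself only an example name.
KNOWN_TOOLS = ["read_file", "get_file_slice", "py_get_skeleton", "py_get_symbol", "derive_code_path", "get_ui_performance"]


def did_you_mean(requested: str, known: list[str] = KNOWN_TOOLS, n: int = 3) -> list[str]:
    """Suggest DSL tags for the known tool names closest to an unrecognized one."""
    # get_close_matches ranks by SequenceMatcher ratio, best first, dropping scores below `cutoff`.
    matches = difflib.get_close_matches(requested, known, n=n, cutoff=0.6)
    return [f'<ms-tool name="{name}" />' for name in matches]


print(did_you_mean("py_get_skeletn")[0])  # <ms-tool name="py_get_skeleton" />
```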
### `span`

**Grounded by:** Entry 1 (MCP DSL) + Entry 3 (OpenAI)

`span` is grounded in the MCP DSL's per-MCP grammar design (`spec.md:456-465`) and OpenAI's **namespace grouping** (`platform.openai.com/docs/guides/function-calling#defining-namespaces`). A `span` in the DSL context means: given a compound intent, decompose it into the appropriate sub-MCP grammar range. For example, `<read src/foo.py:42-50>` spans the `read_file` tool and the `get_file_slice` tool within `mcp_file_io`. OpenAI's namespace grouping shows how to organize tools by domain: the CRM namespace groups `get_customer_profile` and `list_open_orders`. The DSL's `span` should similarly group related tools and provide domain-level dispatch rather than requiring the model to know each individual tool.

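A sketch of that decomposition for the `<read …>` intent: a router inspects the target's shape and picks one concrete tool. `read_file` and `get_file_slice` are named above; `py_get_symbol` is the hypothetical name used earlier in this report, and all argument names are assumptions. Paths that themselves contain `:` (e.g., Windows drive letters) are out of scope here.

```python
import re

INTENT_RE = re.compile(r"<(\w+)\s+([^>]+)>")
RANGE_RE = re.compile(r"(?P<path>[^:]+):(?P<start>\d+)-(?P<end>\d+)")
SYMBOL_RE = re.compile(r"(?P<path>[^:]+):(?P<symbol>[A-Za-z_][\w.]*)")


def route_read(tag: str) -> tuple[str, dict[str, object]]:
    """Resolve a `<read target>` intent to one concrete (tool, args) call."""
    m = INTENT_RE.fullmatch(tag.strip())
    if m is None or m.group(1) != "read":
        raise ValueError(f"not a <read ...> intent: {tag!r}")
    target = m.group(2).strip()
    if r := RANGE_RE.fullmatch(target):  # path:start-end -> line-range slice
        return "get_file_slice", {"path": r["path"], "start_line": int(r["start"]), "end_line": int(r["end"])}
    if s := SYMBOL_RE.fullmatch(target):  # path:Class.method -> symbol lookup
        return "py_get_symbol", {"path": s["path"], "symbol": s["symbol"]}
    return "read_file", {"path": target}  # bare path -> whole file


print(route_read("<read src/foo.py:42-50>"))
# ('get_file_slice', {'path': 'src/foo.py', 'start_line': 42, 'end_line': 50})
```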
### `offset`

**Grounded by:** Entry 1 (MCP DSL) + Entry 3 (OpenAI)

`offset` is grounded in the MCP DSL's positional notation (`spec.md:456`: in `py k /src/foo.py`, the path identifies the file and any offset of the symbol within it is left implicit) and OpenAI's **parameter design principles** (`platform.openai.com/docs/guides/function-calling#best-practices-for-defining-functions`): "Don't make the model fill arguments you already know." `offset` as a Tier 4 verb means the DSL should support **implicit offset resolution**: given a symbol name, resolve it to a file:line range without requiring the model to specify the line number explicitly. This is the difference between `<read src/foo.py:MyClass.method>` (offset resolved by the DSL parser) and `<read_file path="src/foo.py">` (no offset, so the model must specify the line range manually).

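A sketch of implicit offset resolution for Python files using the standard `ast` module: the parser, not the model, turns `MyClass.method` into a line range via each node's `lineno` and `end_lineno` (Python 3.8+). Only nested class and function definitions are searched, and the function name is hypothetical.

```python
import ast

DEFS = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def resolve_symbol(source: str, dotted: str) -> tuple[int, int]:
    """Resolve a dotted class/function path like 'MyClass.method' to (first_line, last_line)."""
    scope = ast.parse(source).body
    node = None
    for part in dotted.split("."):
        # Look only at definitions directly inside the current scope.
        node = next((n for n in scope if isinstance(n, DEFS) and n.name == part), None)
        if node is None:
            raise LookupError(f"symbol not found: {dotted}")
        scope = node.body
    return node.lineno, node.end_lineno  # end_lineno requires Python 3.8+


src = """class MyClass:
    def method(self):
        x = 1
        return x

    def other(self):
        pass
"""
print(resolve_symbol(src, "MyClass.method"))  # (2, 4)
```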
### `assumewide`
|
||||
|
||||
**Grounded by:** Entry 3 (OpenAI) + Entry 4 (Anthropic)
|
||||
|
||||
`assumewide` is grounded in OpenAI's best practice of **fewer, more capable tools** (`platform.openai.com/docs/guides/function-calling#best-practices-for-defining-functions`: "Keep the number of initially available functions small for higher accuracy. Aim for fewer than 20 functions available at the start of a turn.") and Anthropic's `tool_choice: {"type": "tool", "name": "..."}` force-call mechanism (`docs.anthropic.com/en/docs/agents-and-tools/tool-use/define-tools#forcing-tool-use`). `assumewide` means: given a broad or ambiguous intent, select the most capable matching tool (the one with the widest parameter range, the most general description) rather than a narrow specialist. OpenAI's namespace grouping supports this: a `crm.*` namespace call dispatches to the most appropriate CRM tool based on the intent, not a specific named tool. `assumewide` as a verb means: apply the "fewer, more capable" heuristic at call time — prefer tools that can handle a range of inputs over tools that require precise parameter matching.

---

## Summary: Entry-to-Verb Mapping

| Tier 4 Verb | Primary Entry | Secondary Entry | Key Mechanism |
|-------------|---------------|-----------------|---------------|
| `fuzzy` | Entry 2 (nagent Bridge DSL) | Entry 1 (MCP DSL) | Tag protocol for discovery + per-MCP grammar composition |
| `try` / `recover` | Entry 2 (nagent Bridge DSL) | Entry 3 (OpenAI) | Visible retry cycle; 5-step conversational loop |
| `sandbox` | Entry 3 (OpenAI) | Entry 4 (Anthropic) | Isolated execution environments; tool-use system prompt |
| `audit` | Entry 1 (MCP DSL) | Entry 2 (nagent Bridge DSL) | `SubMCP.list_tool_schemas()` self-reporting; `--description` pattern |
| `didyoumean` | Entry 2 (nagent Bridge DSL) | Entry 4 (Anthropic) | Intent-based DSL tags; `input_examples` for disambiguation |
| `span` | Entry 1 (MCP DSL) | Entry 3 (OpenAI) | Per-MCP grammar decomposition; namespace grouping |
| `offset` | Entry 1 (MCP DSL) | Entry 3 (OpenAI) | Symbol resolution in DSL parser; "don't make model fill known args" |
| `assumewide` | Entry 3 (OpenAI) | Entry 4 (Anthropic) | Fewer-capable-tools heuristic; `tool_choice` force-call |

---

*End of Cluster 4 sub-report. Total entries: 4. All claims have citations.*
@@ -8,7 +8,7 @@

> **Purpose.** This track produces a single research report: a survey of intent-based scripting languages as a design philosophy, plus a proposed vocabulary for a Meta-Tooling-facing intent DSL. The report is the *foundation document* for the user's nagent v2.2 report (its "Future-Track Candidate #4: Intent-based DSL" section) and for the future `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER` placeholder. The track is *research-only*; no interpreter, no integration code.

> **Companion doc.** The actual report is at `docs/ideation/2026-06-12-intent-based-scripting-languages.md`. This `spec.md` is the conductor/track wrapper: the design intent, the relationship to the existing project's tech stack, the 7 report sections and their content, the open questions, the out-of-scope notes, and the verification criteria.
> **Companion doc.** The actual report is at `conductor/tracks/intent_dsl_survey_20260612/report.md`. This `spec.md` is the conductor/track wrapper: the design intent, the relationship to the existing project's tech stack, the 7 report sections and their content, the open questions, the out-of-scope notes, and the verification criteria.

> **Time-sensitivity.** Per the user, the report must be complete *before* nagent v2.2 ships. The track has a single user-approval gate at the end of phase 4; the report can be paused at any phase boundary without losing work.

@@ -222,7 +222,7 @@ At least 6 open questions that the follow-up B track (interpreter prototype) mus

## 4. Per-Section Content Boundaries

The 7 sections are all written into a single markdown file at `docs/ideation/2026-06-12-intent-based-scripting-languages.md`. The file is organized as:
The 7 sections are all written into a single markdown file at `conductor/tracks/intent_dsl_survey_20260612/report.md`. The file is organized as:

- **Header:** track name, date, author, status, "what this is / what this is not" callout
- **Section 1 (~2-3 pages):** the philosophy
@@ -257,7 +257,7 @@ The "testing" of the *report itself* is whether the user finds it useful, well-g

The report is a *standalone artifact*. No migration required:

- The `docs/ideation/2026-06-12-intent-based-scripting-languages.md` file is added to the project tree.
- The `conductor/tracks/intent_dsl_survey_20260612/report.md` file is added to the project tree.
- `conductor/tracks.md` is updated to register the track as completed.
- A git note is attached to the commit per `conductor/workflow.md` §"Task Workflow" step 9.2.
- The placeholder `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER` is *not* modified. The report's section 7 names the connection points so the placeholder track can be filled with the report's vocab when it's specced.
@@ -302,7 +302,7 @@ This track is independent — no blockers. It can be started immediately.

The track is "done" when all of the following are true:

- [ ] The 7 sections of the report are present and non-empty in `docs/ideation/2026-06-12-intent-based-scripting-languages.md`
- [ ] The 7 sections of the report are present and non-empty in `conductor/tracks/intent_dsl_survey_20260612/report.md`
- [ ] Every prior-art claim in section 2 cites a specific source (transcript line, README section, Wikipedia article section, or `file:line` for project files)
- [ ] The user's pseudocode grammar is formalized in section 3 with examples drawn from the `determinate`/`minor`/`matrix-transpose` snippets
- [ ] Every 4-tier verb in section 4 has: signature, one-line semantics, one example, "borrowed from" note, and an SSDL shape tag

@@ -1,113 +0,0 @@
# Intent-Based Scripting Languages

**Track:** `intent_dsl_survey_20260612` (initialized 2026-06-12)
**Date:** 2026-06-12
**Author:** Tier 1 Orchestrator (sections 1, 3, 4, 5, 6, 7, Appendix); Tier 2 sub-agents (section 2 clusters 0, 1, 2, 3, 4)
**Status:** Outline draft (phase 1 of 4)

> **What this is.** A survey of intent-based scripting languages as a design philosophy, plus a proposed vocabulary (~40 verbs across 4 tiers) for a Meta-Tooling-facing intent DSL. The report is the foundation document for the user's nagent v2.2 (its "Future-Track Candidate #4" section) and for the future interpreter prototype (follow-up B track).
>
> **What this is NOT.** Not an interpreter, not a bridge script, not Application-side function-calling, not XML/JSON record formats. The DSL is Meta-Tooling-side per `docs/guide_meta_boundary.md` — the format external agents (Gemini CLI, OpenCode) emit when invoking `mcp_client.py` tools.

---

## 1. The "Intent-Based" Design Philosophy

*[STUB: 4 anchor claims — O'Donnell immediate-mode as the philosophical anchor; Onat/Lottes hardware-pipeline model as the truth the verbs must map to; CoSy open-vocabulary culture as the user-surface principle; Jofito intent-mapping as the framing that names the design philosophy. ~2-3 pages.]*

## 2. Prior Art Survey (8 Clusters)

This section surveys the design lineage across 8 clusters. Each entry: 2-3 sentences on the design idea, 2-3 sentences on what we take from it (or, in cluster 3, what we explicitly reject). Every entry cites a specific source.

### Cluster 0 — Immediate-Mode Paradigm (philosophical anchor)
*[STUB: John O'Donnell's IMGUI/MVC essays. ~0.5-1 page.]*

### Cluster 1 — Concatenative (Forth family)
*[STUB: Forth, ColorForth, KYRA/Onat, x68/Lottes, Joy, CoSy. ~1-1.5 pages.]*

### Cluster 2 — Array
*[STUB: APL, K, BQN, Uiua. ~0.5 page.]*

### Cluster 3 — Intent-Mapping
*[STUB: Jofito, jq, nagent's tag protocol (REJECTED as a model — we take the structured-protocol idea but not the XML angle brackets), Wasm. ~0.5-1 page.]*

### Cluster 4 — Meta-Tooling DSLs and Agent-Facing Languages
*[STUB: mcp_dsl_20260606 placeholder, nagent's Bridge DSL, OpenAI function-calling, Anthropic tool-use. ~0.5 page.]*

### Cluster 5 — SSDL Shape Primitives
*[STUB: 6 primitives + 7 modifiers per `docs/reports/computational_shapes_ssdl_digest_20260608.md` §1. The meta-vocabulary used to annotate the verbs in section 4. ~0.25 page.]*

### Cluster 6 — Project's Own Command DSL Precedents
*[STUB: the 33 Command Palette commands per `docs/guide_command_palette.md` and `src/commands.py`. The DSL is a richer superset; "Everything" mode is a near-term use case. ~0.25 page.]*

### Cluster 7 — Data-Oriented Error Handling Convention
*[STUB: `Result[T]` + `ErrorInfo` per `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §3.3. The DSL's `try`/`recover`/`sandbox`/`didyoumean` verbs return `Result[T]`. ~0.25 page.]*

## 3. The Grammar

*[STUB: 14 primitives formalized from the user's math pseudocode (`determinate`/`minor`/`matrix-transpose` snippets). Each primitive: symbol, name, signature, one-line semantics, example, "borrowed from" note. Plus 3 known ambiguity flags (`proc` placement, `++` → `+= 1`, `m[row][column]` → `m[row, col]`). Plus precedence rules (left-to-right for `->` chains, `()` for grouping) and AI-fuzzing tolerance rules (CoSy-style modulo indexing, structured recovery anchors via `{}`, line/offset independence). Plus the error envelope (`try { ... } recover { ... }` returns `Result[T]`). ~2-3 pages.]*
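
Two of the three ambiguity flags (`++` → `+= 1` and `m[row][column]` → `m[row, col]`) can be prototyped as a small normalization pass, together with the modulo-indexing tolerance rule. This sketch makes simplifying assumptions:

- operands are single identifiers;
- "CoSy-style modulo indexing" means wrap-around indexing.

`normalize` and `modulo_index` are hypothetical names. A real parser would rewrite tokens rather than apply regexes to raw text.

```python
import re

# Hypothetical rewrites for two ambiguity flags; simple identifiers only.
POSTFIX_INC = re.compile(r"\b([A-Za-z_]\w*)\+\+")
CHAINED_INDEX = re.compile(r"\b([A-Za-z_]\w*)\[([^\[\]]+)\]\[([^\[\]]+)\]")


def normalize(line: str) -> str:
    """Rewrite `i++` to `i += 1` and `m[r][c]` to `m[r, c]`."""
    line = POSTFIX_INC.sub(r"\1 += 1", line)
    return CHAINED_INDEX.sub(r"\1[\2, \3]", line)


def modulo_index(seq, i: int):
    """Tolerant indexing: out-of-range (including negative) indices wrap around.

    An empty sequence still fails, with ZeroDivisionError.
    """
    return seq[i % len(seq)]
```

Under these assumptions:

- `normalize("x := m[row][column]")` yields `"x := m[row, column]"`;
- `modulo_index([10, 20, 30], 4)` yields `20`.

The `proc` placement flag is left out of this sketch.
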
## 4. The 4-Tier Vocab (~40 Verbs)

Each verb: symbol, name, signature, one-line semantics, one example, "borrowed from" note, SSDL shape tag. Tier 2 and Tier 3 verbs also have a "maps to mcp_client tool" column.

### Tier 1 — Math (~10 verbs)
*[STUB: from the user's pseudocode. `:=`, `stack { }`, `for x .. n`, `+`, `-`, `*`, `/`, `^`, `sum`, `product`, `a[i,j]`, `if/then`.]*
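
The user's `determinate`/`minor`/`matrix-transpose` snippets are not reproduced in this outline. The sketch below is therefore a plain Python reading of what those names conventionally compute (cofactor expansion along the first row), written as a reference oracle for testing the Tier 1 verbs once they are formalized. It is not a transcription of the user's pseudocode. Python's `sum`, `**`, `range`, and `m[i][j]` stand in for the DSL's `sum`, `^`, `for x .. n`, and `a[i,j]`.

```python
def transpose(m):
    """matrix-transpose: result[row, col] = m[col, row]."""
    return [[m[c][r] for c in range(len(m))] for r in range(len(m[0]))]


def minor(m, i, j):
    """The submatrix left after deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def determinant(m):
    """Cofactor expansion along row 0: sum of (-1)**j * m[0][j] * det(minor(m, 0, j))."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * determinant(minor(m, 0, j)) for j in range(len(m)))
```

Cofactor expansion costs O(n!), so this is only suitable as a small-matrix oracle, not as a model for how the DSL should evaluate Tier 1 verbs.
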
### Tier 2 — Data-Oriented Pipeline (~12 verbs)
*[STUB: Onat/Lottes/Jofito lineage. `scan`, `select`, `filter`, `map`, `fold`, `sort`, `group`, `dedupe`, `arena { }`, `scatter`, `gather`, `pipe`. The verbs that wrap the existing 45+ MCP tools.]*
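
Here is a minimal sketch of a subset of these verbs. It assumes a `->` chain is ordinary left-to-right function application over in-memory iterables. The stage constructors are hypothetical; trailing underscores avoid shadowing Python builtins. They operate on plain lists rather than wrapping the MCP tools.

```python
from functools import reduce
from typing import Any, Callable, Iterable

# A stage takes the upstream iterable and returns the next value in the chain.
Stage = Callable[[Iterable[Any]], Any]


def filter_(pred: Callable[[Any], bool]) -> Stage:
    return lambda xs: (x for x in xs if pred(x))


def map_(fn: Callable[[Any], Any]) -> Stage:
    return lambda xs: (fn(x) for x in xs)


def sort_(key=None) -> Stage:
    return lambda xs: sorted(xs, key=key)


def dedupe() -> Stage:
    def run(xs: Iterable[Any]):
        seen = set()  # assumes hashable items; keeps first occurrence order
        for x in xs:
            if x not in seen:
                seen.add(x)
                yield x
    return run


def fold(fn: Callable[[Any, Any], Any], init: Any) -> Stage:
    return lambda xs: reduce(fn, xs, init)


def pipe(source: Iterable[Any], *stages: Stage) -> Any:
    """Apply stages left to right, like `source -> stage -> stage`."""
    result: Any = source
    for stage in stages:
        result = stage(result)
    return result
```

`pipe` keeps no record of the chain it just ran. That matches the "no pipeline object" reading of `->` in section 5.
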
### Tier 3 — Shell (~10 verbs)
*[STUB: the OS surface. `exec`, `open`, `read`, `write`, `close`, `path`, `env`, `wait`, `poll`, `cwd`.]*

### Tier 4 — AI-Fuzzing Tolerance (~8 verbs — the novel contribution)
*[STUB: the verbs that make the DSL work for AI agents that may fuzz verb names, indent inconsistently, or offset line references. `fuzzy`, `try { ... } recover { ... }`, `sandbox { ... }`, `audit`, `didyoumean`, `span`, `offset`, `assumewide`. The `sandbox` verb is O'Donnell's IEventTarget pattern applied to the DSL; the `audit` verb is the IEventTarget itself.]*
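
`didyoumean` and `fuzzy` can be prototyped on the standard library's `difflib.get_close_matches`. This sketch assumes a flat verb table and a similarity cutoff of 0.6, which is the `difflib` default. One policy is a design assumption rather than a settled rule: `fuzzy` accepts only a single unambiguous near-miss, and otherwise refuses with the suggestions. Section 7, question 5 leaves open whether recovery is a parser feature or a verb.

```python
import difflib

# Hypothetical flat verb table drawn from the Tier 2 and Tier 3 lists above.
VERBS = ["scan", "select", "filter", "map", "fold", "sort", "group", "dedupe", "exec", "read", "write"]


def didyoumean(verb: str, known: list[str] = VERBS, cutoff: float = 0.6) -> list[str]:
    """Up to three known verbs that are close to `verb`, best match first."""
    return difflib.get_close_matches(verb, known, n=3, cutoff=cutoff)


def fuzzy(verb: str, known: list[str] = VERBS) -> str:
    """Accept an exact verb or a single unambiguous near-miss; otherwise refuse."""
    if verb in known:
        return verb
    matches = didyoumean(verb, known)
    if len(matches) == 1:
        return matches[0]
    raise LookupError(f"unknown verb {verb!r}; did you mean one of {matches}?")
```

`fuzzy("filtr")` resolves to `filter`. An unrecognizable verb instead raises `LookupError`, carrying the (possibly empty) suggestion list in its message.
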
## 5. Hardware Mapping (4 Anchor Claims)

*[STUB: 4 anchor claims tying the vocab to actual hardware/software stages.*

1. **Onat/Lottes, hardware:** the 2-register stack + magenta pipe + basic blocks + lambdas + preemptive scatter (per `C:\projects\forth\bootslop\references\kyra_in-depth.md`, `forth_day_2020_in-depth.md`, `neokineogfx_in-depth.md`, `X.com - Onat & Lottes Interaction 1.png.ocr.md`) → our `->`, `[ ]`, `arena { }`, `scatter`/`gather`.
2. **O'Donnell, paradigm:** the DSL's pipeline is *immediate-mode in pipeline composition* (per `https://johno.se/book/imgui.html`): the `->` chain has no "pipeline object" you can query, name, or pass around.
3. **Forth/CoSy, syntax:** concatenative syntax is immediate-mode in tokenization (whitespace-delimited, no precedence), evaluation (each verb pops args, pushes results), and parsing (no AST object retained after parse).
4. **APL/K, data:** array languages are immediate-mode in data representation. The DSL's `for x .. n` + `result[row, col]` inherits the "no array object" property.

*~1-2 pages.]*
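
A toy evaluator can illustrate claim 3:

- tokens are split on whitespace, with no precedence;
- each word pops its arguments and pushes its result;
- nothing resembling an AST survives the loop.

This is a sketch over integer literals and three binary words, not Forth or CoSy semantics. `WORDS` and `run` are hypothetical names.

```python
import operator

# Hypothetical word table: each binary word pops two values and pushes one.
WORDS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def run(source: str) -> list[int]:
    """Evaluate whitespace-delimited tokens left to right; no AST is built or kept."""
    stack: list[int] = []
    for token in source.split():
        if token in WORDS:
            b = stack.pop()  # top of stack is the right-hand operand
            a = stack.pop()
            stack.append(WORDS[token](a, b))
        else:
            stack.append(int(token))  # anything else must be an integer literal
    return stack
```

`run("2 3 + 4 *")` leaves `[20]` on the stack. Here an underflow (a word with too few arguments) surfaces as Python's `IndexError`. A DSL would presumably route that through the `Result[T]` envelope instead.
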
## 6. AI-Agent Properties (10 Claims)

*[STUB: 10 claims tying the DSL to the existing project's architecture so future tracks can build on it without re-deriving the design.*

1. Domain = Meta-Tooling (per `docs/guide_meta_boundary.md`).
2. Runtime path = external agent → DSL text → bridge script → MCP → optional Hook API approval (per `docs/guide_meta_boundary.md` §"The Inter-Domain Bridges").
3. 3-layer security (per `docs/guide_tools.md` §"The MCP Bridge"): the parser rejects DSL statements that target tools outside the allowlist.
4. 4 memory dimensions (per `conductor/tracks/nagent_review_20260608/nagent_review_v2_1_20260612.md` §2.1): the DSL does not replace any memory dimension.
5. Stable-to-volatile cache ordering (per nagent v2.1 §2.2): the DSL's `arena { }` blocks are cache-friendly.
6. `Result[T]` envelope (per `conductor/tracks/data_oriented_error_handling_20260606/spec.md`): `try`/`recover` verbs return `Result[T]`.
7. Command Palette 33 commands (per `docs/guide_command_palette.md`): the DSL is a richer superset; "Everything" mode is a near-term use case.
8. Hook API state fields (per `docs/guide_state_lifecycle.md` §"Hook API Surface"): DSL verbs route through `_predefined_callbacks` and `_gettable_fields`.
9. O'Donnell's IEventTarget pattern as the `sandbox` verb (per `https://johno.se/book/mvc.html` §"Writing to Model state").
10. O'Donnell's "reads are free" claim (per `https://johno.se/book/mvc.html` §"Reading Model state"): Tier 2 verbs are read-only and re-evaluable.

*~2-3 pages.]*
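
Claims 3 and 6 combine naturally: the parser can reject an out-of-allowlist tool by returning an error value instead of raising. The sketch below rests on two assumptions:

- `Result` has optional `value`/`error` fields, and `ErrorInfo` has `code`/`message`. The actual shapes are defined in the data-oriented error handling spec cited in claim 6.
- The two-tool `ALLOWLIST` is hypothetical.

The sketch models only the parser layer of the 3-layer security, not the other two layers.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ErrorInfo:
    # Field names are assumptions; see the error-handling track spec for the real shape.
    code: str
    message: str


@dataclass(frozen=True)
class Result(Generic[T]):
    value: Optional[T] = None
    error: Optional[ErrorInfo] = None

    @property
    def ok(self) -> bool:
        return self.error is None


# Hypothetical allowlist; the real one is enforced by the MCP bridge.
ALLOWLIST = frozenset({"read_file", "get_file_slice"})


def parse_statement(stmt: str, allowlist: frozenset = ALLOWLIST) -> Result[tuple]:
    """Parse 'tool arg...' and report disallowed tools as data, not exceptions."""
    tokens = stmt.split()
    if not tokens:
        return Result(error=ErrorInfo("empty_statement", "no tool named"))
    tool, args = tokens[0], tokens[1:]
    if tool not in allowlist:
        return Result(error=ErrorInfo("tool_not_allowed", f"{tool!r} is outside the allowlist"))
    return Result(value=(tool, args))
```

A rejected statement yields `Result(error=ErrorInfo("tool_not_allowed", ...))`. The caller therefore branches on `ok` instead of catching exceptions.
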
## 7. Open Questions for Follow-up B (≥6)

*[STUB: open questions that the follow-up B track (interpreter prototype) must answer, plus the connection block to the `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER`.*

1. How does `arena { }` map to Onat's preemptive scatter? Block-as-tape vs wrapper-allocates-tape?
2. Where does "intent resolution" live? Per-verb option, per-block modifier, or global parser mode?
3. How does `audit` interact with Manual Slop's existing `comms.log`? Separate JSON-L or merged?
4. Does `sandbox` produce `Result[T, ErrorInfo]` (Fleury pattern)?
5. `didyoumean` recovery: parser feature or user-facing verb?
6. How does `for x .. n` interact with Tier 2's `filter`/`map`? Sugar or distinct?
7. How does `sandbox` map to Manual Slop's `pre_tool_callback` flow? New audit log or fold into existing?
8. Connection to `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER`: minimum vocab subset for one round-trip end-to-end?

*~1-2 pages.]*

---

## Appendix: Bibliography

*[STUB: full file:line / URL references for all 8 prior-art clusters + the project's own references. Grouped by cluster. Each entry: 1 line with the source identifier and the file:line or URL.]*