conductor(track): Add intent_dsl_survey_20260612 spec
Foundation research track. Produces a single markdown report at docs/ideation/2026-06-12-intent-based-scripting-languages.md surveying intent-based scripting languages and proposing a 4-tier vocab (~40 verbs) for a Meta-Tooling-facing intent DSL.

The report's 7 sections:
1. The 'intent-based' design philosophy (O'Donnell immediate-mode, Onat/Lottes hardware, CoSy open-vocab, Jofito intent-mapping)
2. Prior art across 8 clusters (0: IMGUI, 1: Concatenative, 2: Array, 3: Intent-mapping, 4: Meta-Tooling, 5: SSDL shapes, 6: Command Palette, 7: Result error handling)
3. The grammar (14 primitives formalized from user's pseudocode)
4. The 4-tier vocab (math, data pipeline, shell, AI-fuzzing tolerance)
5. Hardware mapping (4 anchor claims to Onat/Lottes/O'Donnell/APL-K)
6. AI-agent properties (10 claims tying to existing project architecture: Meta-Tooling domain, 3-layer security, 4 memory dimensions, stable-to-volatile cache, Result envelope, Command Palette 33 commands, Hook API, IEventTarget/sandbox, 'reads are free')
7. Open questions for follow-up interpreter prototype + connection to intent_dsl_for_meta_tooling_20260608_PLACEHOLDER

Time-sensitive: report must complete before user's nagent v2.2.
No new src/ code, no new tests, no pyproject.toml changes. Pure research deliverable.
@@ -0,0 +1,361 @@
# Track: Intent-Based Scripting Languages Survey

**Status:** Spec approved 2026-06-12
**Initialized:** 2026-06-12
**Owner:** Tier 1 Orchestrator (spec); Tier 2 Tech Lead (plan + execution)
**Priority:** Medium-High (research deliverable; time-sensitive because the report's conclusions feed into the user's nagent v2.2 report)
**Domain:** Meta-Tooling (the report is a *research deliverable*; the track produces no Application code)

> **Purpose.** This track produces a single research report: a survey of intent-based scripting languages as a design philosophy, plus a proposed vocabulary for a Meta-Tooling-facing intent DSL. The report is the *foundation document* for the user's nagent v2.2 report (its "Future-Track Candidate #4: Intent-based DSL" section) and for the future `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER` placeholder. The track is *research-only*; no interpreter, no integration code.

> **Companion doc.** The actual report is at `docs/ideation/2026-06-12-intent-based-scripting-languages.md`. This `spec.md` is the conductor/track wrapper: the design intent, the relationship to the existing project's tech stack, the 7 report sections and their content, the open questions, the out-of-scope notes, and the verification criteria.

> **Time-sensitivity.** Per the user, the report must be complete *before* nagent v2.2 ships. The track has a single user-approval gate at the end of phase 4; the report can be paused at any phase boundary without losing work.

---
## 1. Overview

This track surveys **intent-based scripting languages** as a design philosophy and proposes a *succinct, effective vocabulary* for a Meta-Tooling-facing intent DSL. The vocabulary is designed to:

- Map cleanly onto **data-oriented hardware pipelines** (Onat Türkçüoğlu's KYRA/VAMP and Timothy Lottes's x68/5th, per `C:\projects\forth\bootslop\references\`)
- Serve as a **shell-replacement** for AI agent tool calls (Jody Bruchon's Jofito, per `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt`)
- Compose via an **immediate-mode paradigm** (John O'Donnell's IMGUI/MVC essays, per `https://johno.se/book/*`)
- Tolerate **AI idiosyncrasies** (indentation fuzz, line-offset fuzz, verb-name fuzz) via structured recovery anchors
- Coexist with the existing project's **45+ MCP tools** (per `docs/guide_tools.md` §"Native Tool Inventory") without becoming an XML/JSON blob

The report is the deliverable; the track has no Application code. Follow-up tracks (interpreter prototype, bridge script, integration with the `mcp_dsl_20260606` placeholder) are explicitly out of scope and will be planned separately.
## 2. Goals (Priority Order)

| Priority | Goal | Rationale |
|---|---|---|
| **A (foundational)** | Section 1 of the report: formalize "intent-based" as a design philosophy. Unify the Onat/Lottes hardware model, O'Donnell's immediate-mode paradigm, CoSy's open-vocabulary culture, Jofito's "intent mapping engine" framing, and the project's own `nagent_review_20260608` v2.1 "durable data, disposable workers" thesis into a single narrative. | Establishes the unifying claim the rest of the report builds on. Without this, the vocab section is just a list of verbs. |
| **A (foundational)** | Section 2 of the report: prior art survey across 8 clusters (see §3.2 below). Every entry: 2-3 sentences on the design idea, 2-3 sentences on what we take from it. | Establishes the design lineage so the vocab section's "borrowed from" notes are grounded. |
| **A (foundational)** | Section 3 of the report: formalize the grammar from the user's math pseudocode (the `determinate`/`minor`/`matrix-transpose` snippets shared during spec review). 14 primitives with examples drawn from those snippets. | The grammar is the most concrete deliverable; it's what the user's nagent v2.2 report will reference. |
| **A (primary value)** | Section 4 of the report: the 4-tier vocab (~40 verbs). Tier 1 (math from user's pseudocode, ~10 verbs), Tier 2 (data-oriented pipeline, ~12 verbs), Tier 3 (shell, ~10 verbs), Tier 4 (AI-fuzzing tolerance, ~8 verbs). Each verb: signature, one-line semantics, one example, "borrowed from" note, SSDL shape tag. | The vocab is the report's primary value. Tier 4 is the novel contribution; the other tiers are the necessary substrate. |
| **A (primary value)** | Section 5 of the report: the hardware mapping. 4 anchor claims tying the verbs to Onat/Lottes hardware (Cluster 1), O'Donnell's paradigm (Cluster 0), Forth/CoSy syntax (Cluster 1), and APL/K data (Cluster 2). | Establishes that the verbs are not arbitrary; they map to real hardware stages. |
| **B (architectural)** | Section 6 of the report: the AI-agent properties. 10 claims tying the DSL to the existing project's architecture: Meta-Tooling domain (per `docs/guide_meta_boundary.md`), runtime path through `cli_tool_bridge.py` (per `docs/guide_meta_boundary.md` §"The Inter-Domain Bridges"), 3-layer security (per `docs/guide_tools.md` §"The MCP Bridge"), 4 memory dimensions (per `conductor/tracks/nagent_review_20260608/nagent_review_v2_1_20260612.md` §2.1), stable-to-volatile cache ordering (per nagent v2.1 §2.2), `Result[T]` envelope (per `conductor/tracks/data_oriented_error_handling_20260606/spec.md`), Command Palette 33 commands (per `docs/guide_command_palette.md`), Hook API state fields (per `docs/guide_state_lifecycle.md` §"Hook API Surface"), O'Donnell's IEventTarget pattern as the `sandbox` verb, O'Donnell's "reads are free" claim as the rationale for cheap verbs. | Connects the report's vocab to the existing project so future tracks can build on it without re-deriving the architecture. |
| **C (research)** | Section 7 of the report: open questions for the follow-up B track (interpreter prototype) and connection points to the `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER`. At least 6 open questions + the placeholder connection. | The report is the *foundation* document; the open questions make explicit what the follow-up must answer. |
| **C (research)** | The placeholder track `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER` is *not* consumed by this track. Per `conductor/tracks/nagent_review_20260608/metadata.json:28`, the placeholder is a separate, downstream track. The report's section 7 explicitly names the connection points so the placeholder can be filled with the report's vocab. | The placeholder and the survey are different artifacts at different abstraction levels. |
| **D (forward-looking)** | The report's vocab section includes a "borrowed from" note for each verb pointing to the specific prior-art entry. The report is *reference-able* by future agents. | Future code-gen agents (the user's primary use case per the original message) can cite specific verbs with provenance. |
| **D (forward-looking)** | A new follow-up B track (interpreter prototype) is *named* in the report's section 7 but **not** planned in this spec. Per the user's instruction: "A for this track, with B as a separate track maybe, a sort of experimental sub-project to try this stuff out." | Keeps this track focused on the report; the prototype gets its own track when the user is ready. |
### 2.1 Non-Goals (this track)

- **Not** building an interpreter. The follow-up B track (separate, future) is the prototype.
- **Not** writing a bridge script. The placeholder `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER` track (separate, future) is the bridge.
- **Not** modifying the Application's provider-native function-calling. The DSL is **Meta-Tooling-side** (per `docs/guide_meta_boundary.md` §"Domain 2: The Meta-Tooling"); the Application's function-calling is unchanged.
- **Not** consuming the `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER` placeholder. The two tracks are different.
- **Not** adopting XML/JSON record formats. Per the user: "ignore its record formats as they problably will be less xml/json based as I don't like them." nagent's tag protocol is *mentioned* in the prior art (Cluster 3) but explicitly *rejected* as a model.
- **Not** adding new `src/` code, new tests, or new `pyproject.toml` dependencies. The track produces only a markdown report.
- **Not** running the user-approval gate before the *end* of phase 4. The first 3 phases are self-directed (gathering, writing, self-review); the user sees the final report and either approves or iterates.
- **Not** creating the standard `metadata.json` or `state.toml` until *after* the spec is approved. Under the spec-first pattern (per `conductor/workflow.md` §"Task Workflow"), both files are written together with this track's plan, which the `writing-plans` skill will author.
## 3. Architecture

The report is the architecture. The 7 sections, in order, are:

### 3.1 Section 1: The "Intent-Based" Design Philosophy

The unifying narrative. 4 anchor claims that tie the report together:

1. **"Intent-based" means the user's words are declarative intent, not imperative commands** (Jofito's "decompose intent into platform-optimal ops" framing).
2. **The hardware is the truth**: the verbs must map to real data-oriented pipeline stages (Onat/Lottes, per `C:\projects\forth\bootslop\references\kyra_in-depth.md` and `X.com - Onat & Lottes Interaction 1.png.ocr.md`).
3. **The pipeline is immediate-mode**: no Pipeline object, no retained state, just the verb call that produces output (O'Donnell's "widgets are method invocations, not objects", per `https://johno.se/book/imgui.html`).
4. **The vocabulary IS the user surface**: for AI agents, the vocab is the API (CoSy's "open vocabulary" model, per `https://cosy.com/CoSy/Simplicity.html`).
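A small Python sketch can illustrate anchor claim 3. It assumes the Tier 2 verbs are ordinary generator functions. The names `scan`, `filter_`, `map_`, and `fold` are hypothetical stand-ins, not the project's API; the trailing underscores only avoid shadowing Python builtins. The path list is made-up input. The chain is just nested calls evaluated on the spot, so there is no Pipeline object to name, store, or query.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def scan(source: Iterable[T]) -> Iterator[T]:
    """Hypothetical `scan`: yield items from a source (any iterable here)."""
    yield from source


def filter_(pred: Callable[[T], bool], items: Iterable[T]) -> Iterator[T]:
    """Hypothetical `filter`: keep the items that satisfy the predicate."""
    return (x for x in items if pred(x))


def map_(fn: Callable[[T], U], items: Iterable[T]) -> Iterator[U]:
    """Hypothetical `map`: transform each item."""
    return (fn(x) for x in items)


def fold(fn: Callable[[U, T], U], init: U, items: Iterable[T]) -> U:
    """Hypothetical `fold`: accumulate the items into one value."""
    return reduce(fn, items, init)


def count_python_files(paths: Iterable[str]) -> int:
    # scan -> filter -> map -> fold, evaluated as plain nested calls.
    # The chain exists only while this call runs; nothing is retained.
    return fold(
        lambda acc, n: acc + n,
        0,
        map_(lambda _: 1, filter_(lambda p: p.endswith(".py"), scan(paths))),
    )


if __name__ == "__main__":
    print(count_python_files(["a.py", "b.txt", "src/c.py"]))  # 2
```

The only way to give the chain a name is to wrap it in a function, as `count_python_files` does. This mirrors the point in §3.5's Claim 2 that a `->` chain is named only by wrapping it in a function.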
### 3.2 Section 2: Prior Art Survey (8 Clusters)

Each cluster: 1-6 entries. Each entry: 2-3 sentences on the design idea, 2-3 sentences on what we take from it. Every entry cites a specific source (`file:line` where possible, otherwise a section reference).

**Cluster 0: Immediate-Mode Paradigm (the philosophical anchor)**
- John O'Donnell, "IMGUI" / "The Pitch" / "MVC" (per `https://johno.se/book/*`)

**Cluster 1: Concatenative (Forth family)**
- Forth (Chuck Moore, 1970)
- ColorForth (Chuck Moore, ~1990s)
- KYRA / VAMP (Onat Türkçüoğlu, SVFIG 2025; per `kyra_in-depth.md`)
- x68 / 5th / "Ear" + "Toe" (Timothy Lottes, 2007-2026; per `neokineogfx_in-depth.md` and `blog_in-depth.md`)
- Joy (Manfred von Thun, ~2001)
- CoSy (Bob Armstrong, ongoing; per `https://cosy.com/CoSy/Simplicity.html` and `https://cosy.com/4thCoSy/`)

**Cluster 2: Array**
- APL (Kenneth Iverson, 1962; Dyalog)
- K / q (Arthur Whitney, Kx Systems)
- BQN (Marshall Lochbaum, 2020)
- Uiua (Kai Schmidt, 2023)

**Cluster 3: Intent-Mapping**
- Jofito (Jody Bruchon; per `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt` and codeberg README)
- jq (Stephen Dolan, 2012-), downgraded to "useful adjacent"
- nagent's tag protocol: mentioned but explicitly rejected (no XML angle brackets, no JSON blobs)
- Wasm (one paragraph)

**Cluster 4: Meta-Tooling DSLs and agent-facing languages**
- The `mcp_dsl_20260606` placeholder (per `mcp_architecture_refactor_20260606/spec.md` §12.1)
- nagent's "Bridge DSL" idea (per `nagent_takeaways_20260608.md` lines 216-230)
- Stainless / OpenAI function-calling schemas (1 paragraph; baseline we're moving away from)
- Anthropic tool-use schema (1 paragraph)

**Cluster 5: SSDL shape primitives**
- The 6 primitives + 7 modifiers (per `docs/reports/computational_shapes_ssdl_digest_20260608.md` §1); cited as the meta-vocabulary for annotating the verbs in section 4.

**Cluster 6: Project's own command DSL precedents**
- The 33 Command Palette commands (per `docs/guide_command_palette.md` and `src/commands.py`)

**Cluster 7: Data-oriented error handling convention**
- The `Result[T]` + `ErrorInfo` pattern (per `conductor/tracks/data_oriented_error_handling_20260606/spec.md`); the DSL's `try`/`recover`/`sandbox`/`didyoumean` verbs return `Result[T]`.
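Below is a minimal sketch of the Cluster 7 convention. It assumes a Rust-style `Ok`/`Err` split over `Result[T]`. The `ErrorInfo` fields and the two `ErrorKind` members are hypothetical placeholders; the canonical 12 `ErrorKind` values are defined in `data_oriented_error_handling_20260606/spec.md`, not here. The point is that `try { ... } recover { ... }` evaluates to a value rather than raising across the verb boundary.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")


class ErrorKind(Enum):
    # Hypothetical subset; the error-handling spec defines the canonical values.
    NOT_FOUND = "not_found"
    INTERNAL = "internal"


@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err:
    error: ErrorInfo


Result = Union[Ok[T], Err]


def try_recover(
    body: Callable[[], T],
    recover: Callable[[ErrorInfo], Result[T]],
) -> Result[T]:
    """Sketch of `try { body } recover { handler }`: failures become data."""
    try:
        return Ok(body())
    except KeyError as exc:
        return recover(ErrorInfo(ErrorKind.NOT_FOUND, str(exc)))
    except Exception as exc:  # everything else is reported as INTERNAL
        return recover(ErrorInfo(ErrorKind.INTERNAL, str(exc)))


if __name__ == "__main__":
    env = {"HOME": "/home/user"}
    # The missing key is recovered into a default value instead of raising.
    print(try_recover(lambda: env["PATH"], lambda e: Ok("/usr/bin")))
```

A `recover` handler that returns `Err(e)` passes the structured error on to the caller unchanged.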
### 3.3 Section 3: The Grammar (from the user's pseudocode)

Formalizes the 14 primitives from the user's math snippets (`determinate`, `minor`, `matrix-transpose equivalence`). Each primitive: name, meaning, example from the user's snippets.

| # | Symbol | Name | Meaning | Source example |
|---|---|---|---|---|
| 1 | `name := value` | Local bind | Stack-scoped local declaration | `result := Matrix(m.rows -1, m.columns -1)` |
| 2 | `stack { ... }` | Stack scope | Block of stack-allocated locals | `stack { result := ...; row_offset, col_offset := Scalar; }` |
| 3 | `name: Type` | Annotation | Type hint on a binding | `m : Matrix` |
| 4 | `func(args) -> Type { ... }` | Function def | Named function with return type | `determinate(m, row) -> Scalar { ... }` |
| 5 | `name(...) -> Type proc { ... }` | Procedure def | Function marked side-effecting by `proc` (placement ambiguous; see the ambiguity flags) | `minor(m, row_omit, column_omit) -> Scalar proc { ... }` |
| 6 | `for x .. n` | Range iteration | Iterate `x` over `[0, n)` | `for col .. m.columns` |
| 7 | `name[a, b]` | Bracket indexing | Multi-dim array access | `result[row - row_offset, col - col_offset]` |
| 8 | `if cond { ... }` | Conditional | If-then; the user's snippets show no `else`, so its syntax is not yet defined | `if col = col_omit { ++ col_offset; continue; }` |
| 9 | `return value` | Return | Function exit with value | `return result` |
| 10 | `->` (between verbs) | Pipeline flow | Output of left → input of right | `filter -> (col != column_omit <- for col .. m.columns)` |
| 11 | `<-` (after verb) | Input binding | The thing on the right is the producer | `for col .. m.columns` produces; `col != column_omit` consumes |
| 12 | `=` (in `assert`) | Equality | Assert two expressions are equal | `assert -> product(...) = product(...)` |
| 13 | `{ }` | Body block | Function/scope body | `{ ... }` |
| 14 | `[ ]` | Basic block | Onat's compilation unit (no branching semantics; just a unit) | `[ my_stage ]` |

**Ambiguity flags** (per the user's note: "Hopefully the above don't have too many logic errors that the use can't be clarified."):
- `proc` modifier placement: in `minor(m, row_omit, column_omit) -> Scalar proc { ... }`, the report should treat `proc` as a *type qualifier* (the return type is `Scalar`; `proc` marks the function as side-effecting) and note that the placement may be a syntax quirk
- `++col_offset`: likely `col_offset += 1`; the report should formalize this as `name += 1` and not adopt `++`
- `m[row][column]` vs `m[row, column]`: both appear in the user's snippets (line 24 `m[row][column]` is likely a typo for `m[row][col]`); the report adopts the comma-form throughout

The section also formalizes:
- **Precedence:** left-to-right for `->` chains, with `(` `)` for grouping
- **AI-fuzzing tolerance rules:** CoSy-style modulo indexing, structured recovery anchors via `{ }`, line/offset independence (parser uses token positions, not raw line numbers)
- **Error envelope:** `try { ... } recover { ... }` returns `Result[T]` per the `data_oriented_error_handling_20260606` convention
- **Block composition:** `[ ]` are Onat's basic blocks (compilation units); `{ }` are body blocks (scoping); `arena { }` are arena-scoped blocks (tape-drive regions)
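A regex tokenizer is enough to sketch the line/offset-independence rule and left-to-right `->` chaining. This assumes the token classes below roughly cover the grammar table. `Token`, `tokenize`, `line_col`, and `split_stages` are hypothetical names. Each token records only its character offset, and a line and column are derived on demand for error messages. As a result, indentation or line-offset drift in AI-emitted text does not change what the parser sees. A `->` splits stages only at bracket depth zero, which is what gives `( )` its grouping role.

```python
import re
from dataclasses import dataclass

# Hypothetical token classes for the grammar table above.
TOKEN_RE = re.compile(
    r"""
      (?P<ARROW>->|<-)
    | (?P<BIND>:=)
    | (?P<RANGE>\.\.)
    | (?P<NUM>\d+)
    | (?P<NAME>[A-Za-z_][A-Za-z0-9_]*)
    | (?P<OP>!=|[-+*/^=<>.,:;])
    | (?P<OPEN>[(\[{])
    | (?P<CLOSE>[)\]}])
    | (?P<WS>\s+)
    """,
    re.VERBOSE,
)


@dataclass(frozen=True)
class Token:
    kind: str
    text: str
    offset: int  # character offset into the source (the span start)


def line_col(src: str, offset: int) -> tuple[int, int]:
    """Derive a 1-based line/column only when a message needs one."""
    line = src.count("\n", 0, offset) + 1
    col = offset - (src.rfind("\n", 0, offset) + 1) + 1
    return line, col


def tokenize(src: str) -> list[Token]:
    tokens: list[Token] = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            line, col = line_col(src, pos)
            raise SyntaxError(f"unexpected {src[pos]!r} at {line}:{col}")
        if m.lastgroup != "WS":  # whitespace, indentation included, is dropped
            tokens.append(Token(m.lastgroup, m.group(), pos))
        pos = m.end()
    return tokens


def split_stages(tokens: list[Token]) -> list[list[Token]]:
    """Split on top-level `->`, left to right; brackets group."""
    stages: list[list[Token]] = [[]]
    depth = 0
    for tok in tokens:
        if tok.kind == "OPEN":
            depth += 1
        elif tok.kind == "CLOSE":
            depth -= 1
        if tok.kind == "ARROW" and tok.text == "->" and depth == 0:
            stages.append([])
        else:
            stages[-1].append(tok)
    return stages


if __name__ == "__main__":
    src = "filter -> (col != column_omit <- for col .. m.columns)"
    for stage in split_stages(tokenize(src)):
        print([t.text for t in stage])
```

Stages are split by token kind rather than by line. Re-indenting or re-wrapping the user's `filter -> (...)` example therefore yields the same stages.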
### 3.4 Section 4: The 4-Tier Vocab (~40 verbs)

Each verb: signature, one-line semantics, one example, "borrowed from" note, SSDL shape tag.

**Tier 1: Math (from the user's pseudocode, ~10 verbs)**
- `:=` (local bind), `stack { }` (stack scope), `for x .. n` (range), `+`, `-`, `*`, `/`, `^`, `sum`, `product`, `a[i, j]` (bracket indexing), `if cond { }` (conditional)

**Tier 2: Data-oriented pipeline (Onat/Lottes/Jofito lineage, ~12 verbs)**
- `scan` (read source; maps to Jofito's `scandir` and Lottes's "read arena")
- `select` (project columns)
- `filter` (predicate, leader/chaser style per Jofito's `predicates` pattern)
- `map` (transform each)
- `fold` / `reduce` (accumulate)
- `sort`, `group`, `dedupe`
- `arena { }` scope (declare a tape-drive region: Onat's preemptive scatter)
- `scatter` / `gather` (preemptive scatter primitives for FFI boundaries)
- `pipe` (synonym for `->` chain root)

**Tier 3: Shell (~10 verbs)**
- `exec`, `open`, `read`, `write`, `close`, `path`, `env`, `wait`, `poll`, `cwd`

**Tier 4: AI-fuzzing tolerance (the novel piece, ~8 verbs)**
- `fuzzy` (declare a parse-tolerance region)
- `try { ... } recover { ... }` (returns `Result[T]`)
- `sandbox { ... }` (the IEventTarget boundary, per O'Donnell §"Writing to Model state")
- `audit` (log primitive that auto-emits an audit record on every write-verb)
- `didyoumean` (the parser's "best guess" recovery path)
- `span` / `offset` (first-class spans for error messages; parser uses token positions, not line numbers)
- `assumewide` (the SSDL "wide codepath" assumption applied to the DSL: "if in doubt, the stage is wide/parallel")

**Mapping to existing MCP tools:** every Tier 2/3 verb has a "maps to mcp_client tool" column. Example: `scan` maps to `mcp_client.list_directory` + `mcp_client.search_files`; `read` maps to `mcp_client.read_file`; `write` maps to `mcp_client.set_file_slice`. This is the explicit "the DSL is a *front-end* for the existing 45+ tools" claim (per `docs/guide_tools.md` §"Native Tool Inventory").
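One way to carry both the "maps to mcp_client tool" column and the `didyoumean` recovery path is a single verb table. The sketch below uses only the three `mcp_client` mappings named above. The `Verb` record, its field names, and the `cutoff` value are hypothetical. The fuzzy match uses the Python standard library's `difflib.get_close_matches`, and is only a stand-in for whatever recovery strategy the report ends up recommending.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import get_close_matches


@dataclass(frozen=True)
class Verb:
    name: str
    tier: int
    mcp_tools: tuple[str, ...]  # the "maps to mcp_client tool" column
    writes: bool                # write-verbs need approval and an audit record


# Hypothetical excerpt of the vocab table, using the three mappings named above.
VERBS: dict[str, Verb] = {
    v.name: v
    for v in (
        Verb("scan", 2, ("mcp_client.list_directory", "mcp_client.search_files"), False),
        Verb("read", 3, ("mcp_client.read_file",), False),
        Verb("write", 3, ("mcp_client.set_file_slice",), True),
    )
}


def resolve(name: str) -> Verb | list[str]:
    """Return the verb, or up to three `didyoumean` suggestions (possibly none)."""
    if name in VERBS:
        return VERBS[name]
    return get_close_matches(name, list(VERBS), n=3, cutoff=0.6)


if __name__ == "__main__":
    print(resolve("read").mcp_tools)  # ('mcp_client.read_file',)
    print(resolve("raed"))            # ['read']
```

Because suggestions come back as data instead of an exception, the bridge can report them inside a `Result` envelope. It does not have to guess silently.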
### 3.5 Section 5: Hardware Mapping (4 anchor claims)

Each claim ties a cluster to a specific verb behavior:

**Claim 1 (Onat/Lottes, hardware):** the 2-register stack + magenta pipe + basic blocks + lambdas + preemptive scatter (per `C:\projects\forth\bootslop\references\kyra_in-depth.md`, `forth_day_2020_in-depth.md`, `neokineogfx_in-depth.md`, `X.com - Onat & Lottes Interaction 1.png.ocr.md`) → our `->`, `[ ]`, `arena { }`, `scatter`/`gather`. Specifically:
- 2-register stack (RAX/RDX) → the DSL's `->` chain maps to RAX-passed data; each verb is a "word" in Onat's sense (no args, no returns, per the X.com thread lines 95-103)
- Magenta pipe `|` (KYRA) → our `->` (same definition-boundary semantics, retargeted to data flow)
- Basic blocks `[ ]` (KYRA) → our `[ ]` (compilation units; the parser produces a `[ ]` block per `->`-delimited stage)
- Lambdas `{ }` (KYRA) → our `arena { }` (arena-scoped blocks; the contents are pre-scattered into tape-drive regions)
- Preemptive scatter (Onat/Lottes, per X.com lines 55-61) → our `arena { }` (pre-place arguments before consumption)
- Folded interpreter (Lottes, per `neokineogfx_in-depth.md` §2) → our verb dispatch (5-byte per-verb tail; the parser emits these at parse time)
- Lottes's "no data stack" (per `blog_in-depth.md` §3) → our register-allocated temp vars (`a + b` doesn't push to a memory stack)
- 32-bit granularity (Lottes x68) → each emitted instruction occupies exactly 32 bits, padded via ignored prefixes
- Branch misprediction fix (Lottes, per `neokineogfx_in-depth.md` §2) → the DSL parser produces straight-line code; no dictionary lookup at runtime

**Claim 2 (O'Donnell, paradigm):** the DSL's pipeline is *immediate-mode in pipeline composition*. Each `->`-delimited stage is a method invocation, not a Pipeline object. The pipeline exists *only* while the DSL program is being executed; once execution ends, the pipeline's state is gone. This is the *exact* parallel to IMGUI's "widgets are method invocations, not objects" (per `https://johno.se/book/imgui.html`). Why this matters: it means the parser doesn't need to track pipeline state across executions; each invocation is independent. Manifest in vocab: the `->` chain has no "pipeline object" you can query, name, or pass around; the only way to "name" a chain is to wrap it in a function.

**Claim 3 (Forth/CoSy, syntax):** concatenative syntax is immediate-mode in *tokenization* (whitespace-delimited, no precedence), in *evaluation* (each verb pops args, pushes results), and in *parsing* (no AST object retained after the parse; the parser emits JIT'd code directly per Onat's xchg model). The DSL inherits all three for `->` chains; the Tier 1 infix operators and `( )` grouping are the exception to "no precedence".

**Claim 4 (APL/K, data):** array languages are immediate-mode in *data representation* (no array-object header; CoSy uses `(Type Count refCount)` but values are passed by stack reference, not by handle). The DSL's `for x .. n` range + `result[row, col]` indexing inherits the "no array object" property.
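Claim 1's "no dictionary lookup at runtime" has only a loose Python analogue, and the sketch below should be read with that caveat. Verb names are resolved to callables once, when a chain is compiled. Execution is then a straight loop over pre-bound functions, each consuming the previous stage's output; this stands in for RAX-passed data. `VERB_TABLE` and `compile_chain` are hypothetical. The sketch shows binding time only, not Lottes's folded interpreter or any machine-code generation.

```python
from typing import Any, Callable, Sequence

Stage = Callable[[Any], Any]

# Hypothetical bindings for two read-only Tier 2 verbs.
VERB_TABLE: dict[str, Stage] = {
    "dedupe": lambda xs: list(dict.fromkeys(xs)),  # keep first occurrence, in order
    "sort": sorted,
}


def compile_chain(names: Sequence[str]) -> Stage:
    """Resolve every verb name once, at 'parse time'."""
    try:
        stages = tuple(VERB_TABLE[name] for name in names)
    except KeyError as exc:
        raise NameError(f"unknown verb {exc.args[0]!r}") from None

    def run(value: Any) -> Any:
        # Straight-line execution: each stage's output feeds the next,
        # and no name lookup happens here.
        for stage in stages:
            value = stage(value)
        return value

    return run


if __name__ == "__main__":
    unique_sorted = compile_chain(["dedupe", "sort"])
    print(unique_sorted([3, 1, 3, 2]))  # [1, 2, 3]
```

An unknown verb fails at compile time rather than partway through a run. That is the property Claim 1 attributes to parse-time emission.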
### 3.6 Section 6: AI-Agent Properties (10 claims)

Each claim ties the DSL to a specific aspect of the existing project's architecture.

1. **Domain = Meta-Tooling** (per `docs/guide_meta_boundary.md` §"Domain 2: The Meta-Tooling"). The Application's provider-native function-calling stays; the DSL is the format external agents (Gemini CLI, OpenCode) emit.
2. **Runtime path = external agent → DSL text → bridge script** (per `docs/guide_meta_boundary.md` §"The Inter-Domain Bridges"). The bridge script (`scripts/cli_tool_bridge.py` analogue) translates the DSL into actual `mcp_client.py` tool calls. The bridge uses the Hook API to surface HITL approval modals when needed.
3. **3-layer security (per `docs/guide_tools.md` §"The MCP Bridge"):** every verb in the DSL respects the existing allowlist. The parser rejects DSL statements that target tools outside the allowlist.
4. **4 memory dimensions** (per `conductor/tracks/nagent_review_20260608/nagent_review_v2_1_20260612.md` §2.1): the DSL does *not* replace any memory dimension. Curation (FileItem + ContextPreset), Discussion (disc_entries), RAG (opt-in), Knowledge (candidate 11). The DSL is a *query format* for all 4, not a replacement.
5. **Stable-to-volatile cache ordering** (per nagent v2.1 §2.2): the DSL's output (e.g., the `audit` verb's logs) is a *stable* layer that can be cached across turns. The DSL's `arena { }` blocks are cache-friendly.
6. **`Result[T]` envelope** (per `conductor/tracks/data_oriented_error_handling_20260606/spec.md`): the `try`/`recover` verbs return `Result[T]`; the `didyoumean` verb returns `Result[T, list[Suggestion]]`. The 12 `ErrorKind` values are the canonical error vocabulary.
7. **Command Palette 33 commands** (per `docs/guide_command_palette.md` and `src/commands.py`): the DSL's verbs are a *richer* superset of these. "Everything" mode in the Command Palette (per `guide_command_palette.md` line 383) is a near-term use case where the DSL's verbs can be the underlying format.
8. **Hook API state fields** (per `docs/guide_state_lifecycle.md` §"Hook API Surface"): the DSL's verbs that mutate state route through `_predefined_callbacks`; the verbs that read state use `_gettable_fields`. The DSL never bypasses the Hook API; it's a *user* of the existing infrastructure.
9. **O'Donnell's IEventTarget pattern as the `sandbox` verb** (per `https://johno.se/book/mvc.html` §"Writing to Model state"). The `sandbox { ... }` block in Tier 4 is the DSL's IEventTarget boundary. Every state change inside the block goes through the bridge script's HITL approval modal (per `docs/guide_meta_boundary.md`). The `audit` verb is the IEventTarget itself: a write-verb that logs the state change to a structured record.
10. **O'Donnell's "reads are free" claim** (per `https://johno.se/book/mvc.html` §"Reading Model state"). The Tier 2 verbs (`scan`, `filter`, `map`, `fold`, `sort`, `group`, `dedupe`) are *read-only* and can be re-evaluated freely, multiple times per execution, in parallel stages, without audit. The HITL modal is triggered only when the chain's output is consumed by a write-verb (`exec`, `write`, `assign`). This is why the bridge script can re-execute a read-only chain without human approval.
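Claims 3, 9, and 10 together suggest a simple gate in the bridge script. The sketch below assumes three rules:
- Read-verbs run with no approval and no audit.
- A write-verb must pass an approval callback, a stand-in for the HITL modal surfaced through the Hook API, and then leaves an audit record.
- Anything on neither list is refused, as the allowlist requires.

`Bridge`, `approve`, the audit-record fields, and the `(ok, value)` return shape are hypothetical simplifications, not existing project APIs. A real version would return `Result[T]`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable

# Verb classes taken from Claim 10; the allowlist check mirrors Claim 3.
READ_VERBS = {"scan", "filter", "map", "fold", "sort", "group", "dedupe"}
WRITE_VERBS = {"exec", "write", "assign"}


@dataclass
class Bridge:
    # Hypothetical stand-in for the HITL approval modal raised via the Hook API.
    approve: Callable[[str, dict[str, Any]], bool]
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def run(self, verb: str, impl: Callable[..., Any], **args: Any) -> tuple[bool, Any]:
        """Return (ok, value_or_message). A real bridge would return Result[T]."""
        if verb in READ_VERBS:
            # Reads are free: no approval, no audit, safe to re-run.
            return True, impl(**args)
        if verb not in WRITE_VERBS:
            return False, f"verb {verb!r} is not on the allowlist"
        if not self.approve(verb, args):
            return False, f"{verb} rejected by user"
        value = impl(**args)
        # The `audit` behavior: every executed write-verb leaves a record.
        self.audit_log.append({"verb": verb, "args": args, "result": repr(value)})
        return True, value


if __name__ == "__main__":
    files: dict[str, str] = {}
    bridge = Bridge(approve=lambda verb, args: verb == "write")
    print(bridge.run("sort", lambda xs: sorted(xs), xs=[2, 1]))  # (True, [1, 2])
    bridge.run("write", lambda path, text: files.update({path: text}), path="a.txt", text="hi")
    print(files, len(bridge.audit_log))  # {'a.txt': 'hi'} 1
```

Open questions 3 and 7 below (separate vs merged audit log) would decide where `audit_log` actually lands.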
### 3.7 Section 7: Open Questions for Follow-up B (≥6 questions + placeholder connection)

At least 6 open questions that the follow-up B track (interpreter prototype) must answer, plus a connection block to the `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER`.

1. How does `arena { }` map to Onat's preemptive scatter? Is the block itself a tape-drive region, or is `arena` a wrapper that allocates a tape for the block's contents?
2. Where does "intent resolution" live? Is it a per-verb option, a per-block modifier, or a global parser mode?
3. How does `audit` interact with Manual Slop's existing `comms.log`? Is the DSL's audit log separate or merged? (Per `docs/guide_architecture.md` §"Telemetry & Auditing", the existing 5 log streams are `comms.log`, `toolcalls.log`, `apihooks.log`, `clicalls.log`, `scripts/generated/<ts>_<seq>.ps1`.)
4. Does `sandbox` produce `Result[T, ErrorInfo]` (the Fleury pattern) or a different envelope? (Per `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §3.3.)
5. `didyoumean` recovery: parser feature or user-facing verb?
6. How does `for x .. n` interact with Tier 2's `filter`/`map`? Sugar or distinct?
7. How does `sandbox` map to Manual Slop's existing `pre_tool_callback` flow? The `sandbox` block's audit log: separate JSONL file, or fold into the existing `comms.log` + `toolcalls.log`?
8. Connection to `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER`: what's the minimum subset of the report's vocab that would let the placeholder track (a) write a bridge script and (b) demonstrate one round-trip end-to-end?

## 4. Per-Section Content Boundaries

The 7 sections are all written into a single markdown file at `docs/ideation/2026-06-12-intent-based-scripting-languages.md`. The file is organized as:

- **Header:** track name, date, author, status, "what this is / what this is not" callout
- **Section 1 (~2-3 pages):** the philosophy
- **Section 2 (~3-5 pages):** the 8-cluster prior art
- **Section 3 (~2-3 pages):** the grammar with the user's pseudocode examples
- **Section 4 (~3-4 pages):** the 4-tier verb tables
- **Section 5 (~1-2 pages):** the hardware mapping
- **Section 6 (~2-3 pages):** the AI-agent properties
- **Section 7 (~1-2 pages):** the open questions
- **Appendix (~1 page):** the full prior-art bibliography (file:line refs)

Target: ~360-480 lines of markdown, i.e. 1.5-2x the existing `ed_chunk_data_structures_20260523.md` ideation doc (241 lines, well-received), if disciplined.
## 5. Configuration / Dependencies

- **No new Python dependencies.** The track produces only a markdown report; no `pyproject.toml` changes.
- **No new `src/` code.** Same reason.
- **No new tests.** Same reason.
- **The `youtube-transcript-api` package is already used via `uv run --with`** (one-time, for the Jody Bruchon video transcript fetch; already executed during spec review). No persistent dependency.

## 6. Testing Strategy

The track is research-only; no automated tests. Verification is human:

1. **Self-review per the brainstorming skill:** after the report is drafted, the Tier 2 Tech Lead (or the Tier 1 Orchestrator in this case) does a placeholder scan, internal-consistency check, scope check, and ambiguity check.
2. **User review:** the user reviews the final report and either approves (proceed to phase 4 commit) or iterates.
3. **Verification criteria** (see §10 below) are checked before commit.

The "testing" of the *report itself* is whether the user finds it useful, well-grounded, and actionable for nagent v2.2 and the future interpreter prototype.

## 7. Migration / Rollout

The report is a *standalone artifact*. No migration required:

- The `docs/ideation/2026-06-12-intent-based-scripting-languages.md` file is added to the project tree.
- `conductor/tracks.md` is updated to register the track as completed.
- A git note is attached to the commit per `conductor/workflow.md` §"Task Workflow" step 9.2.
- The placeholder `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER` is *not* modified. The report's section 7 names the connection points so the placeholder track can be filled with the report's vocab when it's specced.

Future tracks (B interpreter, placeholder bridge script) consume the report. The report is the *foundation document*: these tracks don't re-derive the philosophy, prior art, grammar, vocab, or AI-agent properties; they cite the report.

## 8. Risks & Mitigations

| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Scope creep into building the interpreter | High (track becomes multi-month instead of 1-2 days) | Medium | Track is research-only; explicit non-goals (§2.1). Follow-up B is the prototype. |
| Vocab grows beyond 40 verbs | Medium (report becomes hard to reference) | Low | Cap at 4 tiers, ~10 verbs each. Add a "vocab v1.1" follow-up if needed. |
| Grammar section gets tangled in implementation details | Medium (the report becomes a spec instead of a survey) | Medium | Grammar is purely syntactic in section 3; implementation questions deferred to section 7's "open questions." |
| Time slippage blocks nagent v2.2 | High (the user is waiting) | Low | 4 phases, single user-approval gate; can pause at any phase boundary. Phases 1-3 are self-directed; only phase 4 needs user input. |
| The user's pseudocode has known logic errors | Low (the report flags them, doesn't propagate them) | High (already known) | Section 3's "Ambiguity flags" subsection names each ambiguity and notes that the report adopts a normalized form (`name += 1` not `++`, comma-form indexing). |
|
||||
| User disagrees with the vocab choices in section 4 | Medium (report needs revision) | Medium | Single user-approval gate at end of phase 4. If user wants changes, loop back. |
|
||||
| The 8-cluster prior art is too dense | Low (report becomes hard to read) | Medium | Each entry is 2-3 sentences on the idea + 2-3 sentences on the take. Total ~6 entries per cluster × 8 clusters = ~48 entries; manageable. |
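
The `++` half of the normalization in the pseudocode row can be shown as a small rewrite pass. This is a minimal sketch, assuming the pseudocode is processed line by line and that an increment is only rewritten when it is a standalone statement; `normalize_line` and `normalize` are hypothetical names, and the comma-form indexing half of the normalization is not covered.

```python
import re

# Hypothetical rule: a line that is *only* `name++` or `name--` (plus indentation).
# Increments buried inside expressions are deliberately left alone, because
# rewriting `y = x++` as `y = x += 1` would change (and break) its meaning.
_INCDEC = re.compile(r"^(\s*)([A-Za-z_]\w*)\s*(\+\+|--)\s*$")


def normalize_line(line: str) -> str:
    """Rewrite a standalone `x++` / `x--` as `x += 1` / `x -= 1`."""
    m = _INCDEC.match(line)
    if m is None:
        return line
    indent, name, op = m.groups()
    compound = "+=" if op == "++" else "-="
    return f"{indent}{name} {compound} 1"


def normalize(source: str) -> str:
    """Apply normalize_line to every line of a pseudocode snippet."""
    return "\n".join(normalize_line(ln) for ln in source.split("\n"))
```

Leaving `y = x++` untouched is the conservative choice: the report can flag such lines under "Ambiguity flags" instead of guessing whether pre- or post-increment was meant.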

## 9. Open Questions for the Tier 2 Tech Lead (planning, not blocking)

- The exact format of the report's verb tables (markdown tables vs YAML/JSON examples vs ASCII art). The user's ideation doc (`ed_chunk_data_structures_20260523.md`) uses prose + ASCII art; the existing `nagent_review_v2_1_20260612.md` uses markdown tables. Recommendation: markdown tables for the verb signatures, ASCII art for the pipeline examples.
- The report's relation to the `manual_ux_validation_20260608_PLACEHOLDER` track. The placeholder track mentions a "Computational Shapes SSDL" workflow, and the report's section 4 tags each verb with an SSDL shape, so the connection is already there.
- Whether to include a "minimal end-to-end example" in section 4 (e.g., "here is a 10-verb DSL program that does `find . -type f -name '*.py' | wc -l`"). Recommendation: yes, 1-2 examples per tier. They help the reader grasp how verbs compose.
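
To calibrate what "verb composition" means for the end-to-end example, a short sketch may help. It assumes a hypothetical Python verb table in which each verb receives the previous step's result plus its own arguments; the verb names `scan`, `keep_glob`, and `count` are illustrative only, not the report's vocab.

```python
import fnmatch
import os
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical verb table: verb name -> fn(previous_result, *args) -> new result.
VERBS: Dict[str, Callable[..., Any]] = {}


def verb(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a function under a DSL verb name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        VERBS[name] = fn
        return fn
    return register


@verb("scan")
def scan(_prev: Any, root: str) -> List[str]:
    """Roughly `find ROOT -type f`: every non-directory entry under root."""
    return [os.path.join(d, f) for d, _subdirs, files in os.walk(root) for f in files]


@verb("keep_glob")
def keep_glob(paths: List[str], pattern: str) -> List[str]:
    """Like find's `-name PATTERN`: case-sensitive match on the basename."""
    return [p for p in paths if fnmatch.fnmatchcase(os.path.basename(p), pattern)]


@verb("count")
def count(items: List[Any]) -> int:
    """Like `wc -l` over a list of lines."""
    return len(items)


def run(program: List[Tuple[Any, ...]]) -> Any:
    """Feed each step's result into the next (verb, *args) step."""
    result: Any = None
    for name, *args in program:
        result = VERBS[name](result, *args)
    return result


# find . -type f -name '*.py' | wc -l
PROGRAM = [("scan", "."), ("keep_glob", "*.py"), ("count",)]
```

Calling `run(PROGRAM)` returns the count directly. The program is plain data (a list of tuples), which is what would let an agent or a Command Palette entry build, inspect, or validate it before anything runs.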

## 10. Coordination with Pending Tracks (post-state baseline)

This track is independent and has no blockers. It can be started immediately.

**The track should verify the following before phase 1 starts:**

- `docs/ideation/` exists (it does, per `manual-slop_list_directory` of `docs/`)
- `conductor/tracks.md` exists and is current (it is, per the spec review)
- The 8 prior-art sources (CoSy Simplicity, Onat/Lottes refs, Jofito transcript + README, O'Donnell pages, `nagent_review_v2_1_20260612.md`, `data_oriented_error_handling_20260606/spec.md`, `guide_command_palette.md`, `computational_shapes_ssdl_digest_20260608.md`) are all readable (they are)

**The track does NOT block any other track.** It is purely additive.

**The track's output is consumed by:**

- The user's nagent v2.2 report (the "Future-Track Candidate #4: Intent-based DSL" section)
- The future `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER` (when it is specced)
- The future "interpreter prototype" follow-up B track (when the user is ready)

## 11. Verification Criteria

The track is "done" when all of the following are true:

- [ ] The 7 sections of the report are present and non-empty in `docs/ideation/2026-06-12-intent-based-scripting-languages.md`
- [ ] Every prior-art claim in section 2 cites a specific source (transcript line, README section, Wikipedia article section, or `file:line` for project files)
- [ ] The user's pseudocode grammar is formalized in section 3 with examples drawn from the `determinate`/`minor`/`matrix-transpose` snippets
- [ ] Every verb in section 4's 4-tier vocab has: signature, one-line semantics, one example, "borrowed from" note, and an SSDL shape tag
- [ ] Section 5 references the Onat/Lottes 2-register model, Lottes's aliased register file, and preemptive scatter (file:line references to `C:\projects\forth\bootslop\references\kyra_in-depth.md`, `forth_day_2020_in-depth.md`, `neokineogfx_in-depth.md`, `X.com - Onat & Lottes Interaction 1.png.ocr.md`)
- [ ] Section 6 references the 4 memory dimensions from `conductor/tracks/nagent_review_20260608/nagent_review_v2_1_20260612.md` §2.1, the SSDL "assume as much as possible" from `docs/reports/computational_shapes_ssdl_digest_20260608.md`, the `Result[T]` convention from `conductor/tracks/data_oriented_error_handling_20260606/spec.md`, and the Application vs Meta-Tooling split from `docs/guide_meta_boundary.md`
- [ ] Section 7 lists at least 6 open questions for the follow-up B track, plus the connection block to the `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER`
- [ ] Self-review pass complete (placeholder scan, internal consistency, scope check, ambiguity check)
- [ ] User has reviewed and approved the final report
- [ ] The report is committed to git (per-file atomic commits per `conductor/workflow.md` §"Task Workflow" steps 9.1-9.2)
- [ ] A git note is attached per `conductor/workflow.md` §"Task Workflow" step 9.2
- [ ] `conductor/tracks.md` is updated to register the track as completed (entry under "Recently Completed" or wherever the convention dictates)
- [ ] The `ed_intent_dsl_*` placeholder track in `conductor/tracks.md` (if any) is not consumed; this is a new track, not a placeholder fill
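
The section 4 criterion (every verb carries a signature, one-line semantics, an example, a "borrowed from" note, and an SSDL shape tag) is mechanical enough to check during the self-review pass. A minimal sketch, assuming the verb tables are first transcribed into Python records; `VerbEntry` and `check_vocab` are hypothetical names, and the per-tier cap of 10 mirrors the "~10 verbs each" mitigation in §8 rather than a hard rule.

```python
from collections import Counter
from dataclasses import dataclass, fields
from typing import List


@dataclass
class VerbEntry:
    """One row of a section 4 verb table (field names are hypothetical)."""
    tier: int            # expected to be 1..4
    signature: str
    semantics: str       # one line
    example: str
    borrowed_from: str
    ssdl_shape: str

    def missing(self) -> List[str]:
        """Names of text fields that are empty or whitespace-only."""
        return [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), str) and not getattr(self, f.name).strip()
        ]

    def to_markdown_row(self) -> str:
        cells = [self.signature, self.semantics, self.example, self.borrowed_from, self.ssdl_shape]
        # Escape pipes so a signature such as `a | b` does not split the table cell.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"


def check_vocab(entries: List[VerbEntry], max_per_tier: int = 10) -> List[str]:
    """Return human-readable problems; an empty list means the table passes."""
    problems: List[str] = []
    for e in entries:
        if e.tier not in (1, 2, 3, 4):
            problems.append(f"{e.signature!r}: tier {e.tier} is outside 1..4")
        gaps = e.missing()
        if gaps:
            problems.append(f"{e.signature!r}: missing {', '.join(gaps)}")
    for tier, n in sorted(Counter(e.tier for e in entries).items()):
        if n > max_per_tier:
            problems.append(f"tier {tier}: {n} verbs exceeds the cap of {max_per_tier}")
    return problems
```

`to_markdown_row` follows the §9 recommendation of markdown tables for verb signatures, so the same records could generate the report's tables and be checked by `check_vocab`.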

## 12. Out of Scope (Explicit)

- **Interpreter prototype** (follow-up B track, separate)
- **Bridge script** (the `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER`, separate)
- **XML/JSON record formats** (user-rejected)
- **The Application's provider-native function-calling** (stays as-is; the DSL is Meta-Tooling-side)
- **RAG integration** (covered by the proposed `rag_integration_discipline.md` styleguide in the nagent v2.1 report §2.10)
- **New `src/` code, new tests, `pyproject.toml` dependencies**
- **Modifying the existing 33 Command Palette commands** (per `docs/guide_command_palette.md`); the DSL is a richer superset, not a replacement
- **Implementing the `Result[T]` envelope** (covered by the `data_oriented_error_handling_20260606` track, in plan state per `conductor/tracks.md`)

## 13. See Also

### 13.1 Existing project references

- **`docs/Readme.md`**: the documentation index; the new report will be implicitly indexed by being in `docs/ideation/`
- **`docs/ideation/ed_chunk_data_structures_20260523.md`**: the existing ideation doc; same folder, same style
- **`conductor/tracks.md`**: the active tracks registry; will be updated to register this track
- **`conductor/workflow.md`**: the workflow rules; this track follows the standard 4-phase pattern
- **`conductor/product.md`**: the product guide; the report's "AI-agent properties" section (6) aligns with the product vision
- **`conductor/tech-stack.md`**: the tech stack; the report's "hardware mapping" section (5) is consistent with the stated tech-stack constraints
- **`conductor/code_styleguides/`**: the styleguides; the report's grammar section (3) follows the AI-Optimized Python style (1-space indent, region blocks, etc.) *for the report's own code examples*

### 13.2 Track-internal references

- **`conductor/tracks/data_oriented_error_handling_20260606/spec.md`**: the model for this spec's structure; the `Result[T]` convention the report's Tier 4 verbs follow
- **`conductor/tracks/nagent_review_20260608/nagent_review_v2_1_20260612.md`**: the 4 memory dimensions, the RAG integration discipline, the stable-to-volatile cache ordering
- **`conductor/tracks/mcp_architecture_refactor_20260606/spec.md` §12.1**: the `mcp_dsl_20260606` placeholder; the per-MCP DSL track
- **`conductor/tracks/code_path_audit_20260607/spec.md`**: the data-oriented pattern for static analysis; the report's section 5 borrows its framing of "static analysis of intent"

### 13.3 External references (the prior art)

- **Forth, ColorForth, KYRA, x68, Joy, CoSy**: see §3.2 Cluster 1
- **APL, K, BQN, Uiua**: see §3.2 Cluster 2
- **Jofito, jq, nagent's tag protocol, Wasm**: see §3.2 Cluster 3
- **mcp_dsl_20260606 placeholder, nagent's Bridge DSL, Stainless/OpenAI/Anthropic tool-use schemas**: see §3.2 Cluster 4
- **SSDL shape primitives** (per `docs/reports/computational_shapes_ssdl_digest_20260608.md` §1): see §3.2 Cluster 5
- **Command Palette 33 commands** (per `docs/guide_command_palette.md` and `src/commands.py`): see §3.2 Cluster 6
- **`Result[T]` + `ErrorInfo` pattern** (per `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §3.3): see §3.2 Cluster 7
- **John O'Donnell's IMGUI / The Pitch / MVC** (per `https://johno.se/book/imgui.html`, `https://johno.se/book/pitch.html`, `https://johno.se/book/immvc.html`, `https://johno.se/book/mvc.html`): see §3.2 Cluster 0
- **Onat Türkçüoğlu's KYRA/VAMP and Timothy Lottes's x68/5th** (per `C:\projects\forth\bootslop\references\kyra_in-depth.md`, `forth_day_2020_in-depth.md`, `neokineogfx_in-depth.md`, `blog_in-depth.md`, `Architectural_Consolidation.md`, `X.com - Onat & Lottes Interaction 1.png.ocr.md`)
- **Jody Bruchon's Jofito** (per `docs/transcripts/Ddme7DwMQBI_jofito_jody_bruchon.txt` and `https://codeberg.org/jbruchon/jofito`)
- **Bob Armstrong's CoSy** (per `https://cosy.com/CoSy/Simplicity.html` and `https://cosy.com/4thCoSy/`)

@@ -0,0 +1,428 @@
As you can see, you guys have managed to buy a solid day of developer time for Jofito in under 24 hours. I am truly humbled by your support. In fact, I'm so humbled that Danny the dinosaur back here has now decided to become my high priest, which is why there's that creepy staff thing that I'm a little afraid of.

Anyway, in order to avoid getting murdered by the magic, I'm going to show you what I've done, just so that you can understand Jofito a little better. I have this nifty little diagram right here. Oh, no, no, Danny, please don't kill me for hiding you. But this is how we're going to be rust. It's pretty straightforward if you know what you're looking at. So, let me maybe pivot somewhere else, which is unfortunately going to force me to edit the video, and actually show you this diagram with a little more clarity. All right, this is going to make for the most awkward presentation I've ever given anyone ever.

[cough] So, what [clears throat] we've got here is the old way of doing things. This is your standard pipeline. By the way, excuse the stacking everything up on VCRs. I didn't know what else to do. I don't have a proper table here.

find dot {dash} type f, pipeline to grep {dash} e (the backslashes are escapes for the dot) jpg {dollar} sign {dash} e {backslash} {backslash} dot png {dollar} sign.

What does this mean? Well, if we try to read it like a layman, it doesn't mean very much. Find whatever an f is. I can think of some things that start with f that remind me of things that I don't want to find. But anyway, and then a vertical symbol, and what is a grep? Who knows? And what are all these? I mean, I kind of know what jpg and png mean, but if I'm a layman, this is cryptic crap.

[snorts] It's not just cryptic, it's inefficient beyond belief. So, here's what we've got. You'll notice [clears throat] that we have arrows going down to this box called pipe buffer.

That's because if you run find (current directory as the root for the find, only return results that are type file, just as an example) [clears throat] in a pipeline, it has to shovel the output of that as the input of grep. Grep is a general regular expression parser. It's a big fancy state machine that takes a while to spin up and is not all that fast at just simple globbing, which is the term used to refer to, basically, finding substrings in a string, except in reverse.

So, [cough] these grep [clears throat] expressions, which are the e's, say dot jpg or dot png, and the {dollar} sign is code for the end of the line. You have to know all of that to make this work. This essentially finds every single file (not directories, only actual files) under the current directory, and then pipes that to grep to further reduce the results so that you only have jpg or png file extensions at the end of the list.

To do that, it has to jump through this pipe buffer. Now, the problem is some data will get kicked out of find, put into this intermediate buffer, and then pushed out of the intermediate buffer as the input of grep. Every [snorts] single time you send stuff through a pipe, or a consumer consumes the stuff through the other end of the pipe, you have a context switch. Also, I didn't illustrate it here, but you also have a problem where, if the consumer isn't fast enough, the producer waits for the consumer, potentially running into a nasty time-sinking task of some sort along the way. But we're going to ignore that for now.

So, every time you do a context switch, you're basically [clears throat] throwing away your CPU state and trashing your caches, which makes everything run slower, because now all this stuff you're doing the work for here is no longer in main memory, or rather in the L1 cache, which is your CPU's execution core's main memory. It gets thrown out and switched over to this one. You just keep bouncing back and forth. So, you're destroying your cache coherency by duplicating data, because the pipe buffer doesn't just magically drop itself into grep. It has to be fed through the interfaces that grep uses for input, be it fgets, which reads individual lines, or fread, or just plain read. But one way or the other, it gets kicked out of this, which usually has some kind of output interface here. Then it gets stored by proxy in a buffer. Then that same proxy is also kicking it out. So, there are all these switches between the contexts, and it wrecks your CPU performance. Now, it's also just generally inefficient and unreadable.

Grep is also a beast. And at the end of it, all we're doing is printing the list of files that match. Now, my solution, Jofito (Jody's file tool): we'll say scan directory. And this is sort of the C function format. I'm sorry I had to break things across lines, 'cuz I wrote large, but it ends over here.

So, scan directory: the first parameter is the same thing. It's dot. It's presented in double quotes so that we know it's a string. We know that it's actually meant to be text and not a variable name. That's important. But other than that oddity, this is the same. But here's how it differs. Find does not have grep. Find can't do the "only match things against certain parameters, or only match things that don't meet certain parameters." Scan directory, however, has this curly brace filter that ends over here.

Filter is a generic predicate that calls a particular kind of filtration on a string or list of strings, and then filters them as you want them. In this case, we would have a filter that filters extensions: JPG and PNG, corresponding to JPEG and Portable Network Graphics images.
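
The difference between the two command shapes can be sketched in Python. This is a single-threaded approximation that assumes generators are an acceptable stand-in for Jofito's predicates: each path travels from the directory walk through the filter to the printer inside one process, with no intermediate pipe buffer, although the parallel threads and the arena described later in the talk are not modeled. `scan_directory`, `filter_ext`, and `print_paths` are hypothetical names, not Jofito's API.

```python
import os
from typing import Callable, Iterable, Iterator, List

# A "predicate" here is a stage that consumes a stream of paths and yields
# the paths it passes on. Hypothetical stand-in for Jofito's curly-brace predicates.
Predicate = Callable[[Iterable[str]], Iterator[str]]


def scan_directory(root: str, *predicates: Predicate) -> List[str]:
    """Walk root (roughly `find ROOT -type f`) and thread every path through
    the predicates in order, one path at a time, inside one process."""
    def walk() -> Iterator[str]:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                yield os.path.join(dirpath, name)

    stream: Iterable[str] = walk()
    for pred in predicates:
        stream = pred(stream)
    return list(stream)  # drain the chain; each path flows end to end lazily


def filter_ext(*exts: str) -> Predicate:
    """Keep paths whose extension is one of exts (case-insensitive)."""
    wanted = {e.lower().lstrip(".") for e in exts}

    def pred(paths: Iterable[str]) -> Iterator[str]:
        for p in paths:
            if os.path.splitext(p)[1].lower().lstrip(".") in wanted:
                yield p
    return pred


def print_paths(paths: Iterable[str]) -> Iterator[str]:
    """Terminal predicate: print each path as soon as it arrives."""
    for p in paths:
        print(p)
        yield p


# Hypothetical equivalent of: scan directory "." {filter jpg png} {print}
# scan_directory(".", filter_ext("jpg", "png"), print_paths)
```

Because every stage pulls one path at a time, a path is printed as soon as it survives the filter, which is the "print can output immediately" property the talk returns to later.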

It's much easier to read. We know we're scanning a directory. The dot's cryptic, but it's the current directory. I mean, you kind of have to accept that degree of the terminology here. Excuse the coughing. [cough and clears throat]

Then, with this filter, what happens under the hood is that scan directory alone can just start reading the directory contents, but filter runs in a parallel thread. And then you'll notice that's not the end of it. The last one is another predicate, called print. The curly braces mean that it's a predicate. Basically, think of it as a modifier. And that's the end of the scan directory function. Now, we don't have to have a big pipe buffer. We don't have to have an output buffer, a pipe buffer, and an input buffer, which is what's really going on here under the hood with the C library.

Instead, we're doing everything in-house. We do it all internal to Jofito. So, what we have is an arena. An arena is a kind of memory map where you just slam everything in order, and you allocate in large chunks. I don't want to go too far into it, but the bottom line is, as scan directory reads in these paths and stores them in the arena here, the filter predicate is chasing that arena. Rather than waiting to be able to continue to scan the directory for the filter to make a decision, these run in parallel. If scan directory is faster than filter, then filter eventually has to catch up. But if filter is faster than scan directory, which is most likely, then filter catches up to it. It just stops. It doesn't process anymore until scan directory increments the size of this list, and that triggers filter. Its thread wakes up, sees that the increment's there, sees that the done flag for the operation it's supposed to filter hasn't been toggled, and bumps to the next item. So, in this way, we have a leader and a chaser.

The chaser [clears throat] goes through (that's what this blue arrow is here) and qualifies each one. This one's bad. Okay. So, what happens when filter finds this bad one? Scan directory has already moved past it. So, filter will deallocate this and detach it. There's a complicated way that I prevent deallocation of an object from the arena from causing an index mismatch, but... All you need to know is that we can remove this item without the third chaser, or the second chaser, print here, having a problem where, oh no, there's an item that's gone, and now I see this is item three instead of four. We don't have that problem. Filter can immediately detach this.

And now, when print goes through, it will never hit this. See, each one of these follows in order. This is the most subordinate. This is the leader. So, print is chasing filter, which is chasing scan directory. We have a situation here where, if you have three cores or threads on a machine, the directory scan can be happening (and this actually would be happening in bulk with some of my optimizations), then the filtration of that scan will be happening in another thread or on another core at the same time, and will stop when it runs out of data and resume when more data is available. Then the subordinate here, same deal. It will stop when the filter doesn't have any more filtered items available and continue when it does. So, scanning, filtering, and printing can all happen simultaneously on a modern machine with multiple cores.
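
The leader/chaser arrangement can be approximated with ordinary threads and a condition variable. This is a minimal sketch, assuming Python threads as stand-ins for Jofito's cores and a plain list as the arena: each stage publishes how far it has got, a chaser sleeps when it catches up and is woken when the stage ahead publishes, and filtered-out items are tombstoned in place so no index shifts under the printer. `ChaseList` and `run_chain` are hypothetical names; under CPython's GIL this shows the coordination, not a real multi-core speedup.

```python
import threading
from typing import Callable, Iterable, List, Optional, Tuple


class ChaseList:
    """Append-only list shared by a leader stage and the stages chasing it.

    pos[s] counts the items stage s has finished with; a chaser never reads
    past the stage it follows. Rejected items are tombstoned (set to None)
    instead of removed, so every stage keeps the same indices.
    """

    def __init__(self, n_stages: int) -> None:
        self.items: List[Optional[str]] = []
        self.pos = [0] * n_stages
        self.done = [False] * n_stages      # the per-stage "done flag"
        self.cond = threading.Condition()

    def publish(self, stage: int, item: Optional[str] = None, reject: bool = False) -> None:
        """Stage 0 (the leader) appends item; later stages may tombstone theirs."""
        with self.cond:
            idx = self.pos[stage]
            if stage == 0:
                self.items.append(item)
            elif reject:
                self.items[idx] = None
            self.pos[stage] = idx + 1
            self.cond.notify_all()          # wake any chaser waiting on this stage

    def finish(self, stage: int) -> None:
        with self.cond:
            self.done[stage] = True
            self.cond.notify_all()

    def wait_for(self, stage: int, idx: int) -> Tuple[bool, Optional[str]]:
        """Sleep until `stage` has published item idx; (False, None) if it never will."""
        with self.cond:
            while self.pos[stage] <= idx and not self.done[stage]:
                self.cond.wait()
            if self.pos[stage] > idx:
                return True, self.items[idx]
            return False, None


def run_chain(paths: Iterable[str], keep: Callable[[str], bool]) -> List[str]:
    """scan -> filter -> print, each stage on its own thread."""
    chain = ChaseList(3)
    printed: List[str] = []

    def scanner() -> None:
        try:
            for p in paths:
                chain.publish(0, p)
        finally:
            chain.finish(0)

    def chaser(stage: int) -> None:
        idx = 0
        try:
            while True:
                ok, item = chain.wait_for(stage - 1, idx)
                if not ok:
                    break
                if stage == 1:
                    chain.publish(1, reject=not keep(item))
                else:
                    if item is not None:    # tombstoned items are never printed
                        printed.append(item)
                    chain.publish(2)
                idx += 1
        finally:
            chain.finish(stage)             # never leave a downstream chaser asleep

    threads = [threading.Thread(target=scanner)]
    threads += [threading.Thread(target=chaser, args=(s,)) for s in (1, 2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return printed
```

For example, `run_chain(["a.jpg", "bad.txt", "c.png"], lambda p: p.endswith((".jpg", ".png")))` keeps the two images in their original order. Tombstoning is the simplest way to get the "no index mismatch" property the talk describes; Jofito's own mechanism is not described in enough detail here to reproduce.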

But the most important part is, if we have the scanner, the filter, and the printer all chasing one after the other, the likelihood of... say the scanner here has just loaded bad.text into the list, the filter here has just qualified abc.jpeg, and print has just printed xyz.png, right? So, assuming that the predicates here are fast enough, they're all kind of working in lockstep, which means that these items are still hot in the level one instruction and data caches as it's iterating through this list.

So, rather than this situation where you have three separate lists in completely different places that are blowing out each other's L1 cache presence, our entire chain here is following one another. And the best part of all of this, other than the fact that print can output immediately instead of waiting, the best part is this part: arena objects are destroyed once they're terminal.

So, what [clears throat] makes an arena object terminal? Well, when filter filters this one out, it can no longer be passed to any of the predicates that are subordinate to it and come later. So, print is not going to be able to print this. So, there's no more use for it. This object's officially dead. So, filter can say so. Filter can say, "Hey, this one's a no-no, kill it." And it gets killed, and it gets marked as free in the arena.

But then, when print prints this one and this one and this one and this one in order, as it's printing them, it is the terminal predicate. It is the end of the line. Nothing happens with this after print, because we didn't assign the scan directory results to some variable to keep. So, once scan directory's done and print has completed, we don't need any of this anymore. But we don't deallocate it in bulk at the end. As print chips away at the list, being the tail end of this predicate chain: dump, dump, dump, dump. Once an item is no longer needed, it is freed up. Once enough arena items have been freed up, this entire arena page can be compacted.

And I don't want to go over it in this one, but maybe in the next video, if you're interested. The way that the arena works is that we actually have an indirection block (think of it as over here somewhere), so that these high-level primitives point to indirection blocks, and the low-level locations are pointed to by the indirection blocks. So, this sees the list it's outputting at a fixed location that points to a variable location. So, we can move these around all we want. We can garbage collect, as in free up memory and compact out the empty spaces, all day long. And none of these predicates or filters or actions or verbs or whatever you want to call them have any idea that that's going on right behind their backs.
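
The indirection block can be sketched as a handle table. This is a minimal sketch, assuming integer handles and a Python list standing in for one arena page; `HandleArena` and its methods are hypothetical. Stages hold handles, the table maps each handle to the item's current slot, `free` is what a rejecting filter or the terminal print stage would call, and `compact` slides live items down and patches the table, so no handle holder sees anything move. Handles are never reused here, which a real arena would likely want to do.

```python
from typing import Dict, List, Optional


class HandleArena:
    """Arena whose users hold stable handles, never raw slot indices."""

    def __init__(self) -> None:
        self.slots: List[Optional[str]] = []   # backing storage; None = freed
        self.table: List[Optional[int]] = []   # indirection: handle -> current slot

    def alloc(self, value: str) -> int:
        self.slots.append(value)
        self.table.append(len(self.slots) - 1)
        return len(self.table) - 1             # the handle stages will hold

    def get(self, handle: int) -> str:
        slot = self.table[handle]
        if slot is None:
            raise KeyError(f"handle {handle} was freed")
        value = self.slots[slot]
        assert value is not None               # live handles point at live slots
        return value

    def free(self, handle: int) -> None:
        """Called by whichever stage makes the item terminal."""
        slot = self.table[handle]
        if slot is not None:
            self.slots[slot] = None
            self.table[handle] = None

    def compact(self) -> int:
        """Squeeze out freed slots, patch the table, return slots reclaimed."""
        new_slots: List[Optional[str]] = []
        moved: Dict[int, int] = {}             # old slot -> new slot
        for old, value in enumerate(self.slots):
            if value is not None:
                moved[old] = len(new_slots)
                new_slots.append(value)
        reclaimed = len(self.slots) - len(new_slots)
        self.table = [None if s is None else moved[s] for s in self.table]
        self.slots = new_slots
        return reclaimed
```

A handle taken before `compact` still resolves to the same value afterwards; only the table entry changed, which is the "fixed location that points to a variable location" idea from the talk.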

Anyway, this is just a basic example of the kind of thing that I intend to do. This effectively replaces this find/grep chain, which is a pretty common one. I actually use this pretty often to find all of the pictures under a certain folder. So, this is not some academic example. This is real-world, working with your hands on the metal, you know, system administration: I need to find all the pictures underneath this folder and get a list of them. And this is a common thing to do, and there are steps along the way that make it a lot slower than it has to be. The longer you wait for one step to finish, the longer it takes everything down the pipeline to finish.

Also, something I haven't talked about, maybe a little teaser for you guys: I want to replace find and grep with Jofito primitives and scripts. Well, one of the solutions I have to "Well, how are you going to integrate Jofito in like this and not lose the benefits of avoiding this pipe buffer?": I've come up with some tech called pipe coalescing, where find and grep see they're part of a pipeline. Find and grep see they're the same Jofito executable. And then find is the head, so it's the coordinator. And all the subordinates down the pipeline reach out to the head and say, "Hey, here's my script, here's my parameters, integrate me into you, and I'll just become a hollow pipe that sends the final results down the line." Thus, find and grep and sort and uniq and whatever else your big long stupid pipeline might use all get collapsed by Jofito (if they're all Jofito scripts instead of the actual binaries, that is) into one unified Jofito script in memory that then performs all these actions, and thus can optimize away cases where, for example, it would be wasteful to get certain information. It can optimize away that stuff and do it faster than you would ever be able to do it with a normal pipeline on your own.
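
Pipe coalescing can be illustrated, very loosely, by describing each would-be process as data and letting the head fuse them. This is a minimal sketch, assuming stages declare which record fields they read; `Stage`, `Head`, `grep_ext`, `sort_by_path`, and the `size` field are hypothetical, and the real mechanism (separate executables discovering that they are the same Jofito binary) is not modeled. Because the head sees the whole chain, it can skip work no stage needs, here a per-file size lookup.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Iterable, Iterator, List

Record = Dict[str, object]


@dataclass
class Stage:
    """A would-be pipeline process, described as data instead of a binary."""
    name: str
    needs: FrozenSet[str]  # record fields this stage reads
    fn: Callable[[Iterable[Record]], Iterator[Record]]


@dataclass
class Head:
    """The first stage (the `find`); subordinates register with it."""
    root: str
    stages: List[Stage] = field(default_factory=list)

    def register(self, stage: Stage) -> None:
        self.stages.append(stage)

    def run(self) -> List[Record]:
        # The fused chain knows every field any stage will ever read.
        wanted = set().union(*(s.needs for s in self.stages))

        def source() -> Iterator[Record]:
            for dirpath, _dirs, files in os.walk(self.root):
                for name in files:
                    path = os.path.join(dirpath, name)
                    rec: Record = {"path": path}
                    if "size" in wanted:  # pay for the size lookup only if someone reads it
                        rec["size"] = os.path.getsize(path)
                    yield rec

        stream: Iterable[Record] = source()
        for stage in self.stages:
            stream = stage.fn(stream)
        return list(stream)


def grep_ext(*exts: str) -> Stage:
    """Roughly `grep -e '\\.jpg$' ...`: keep records whose path ends with an ext."""
    return Stage("grep", frozenset({"path"}),
                 lambda recs: (r for r in recs if str(r["path"]).endswith(exts)))


def sort_by_path() -> Stage:
    """Roughly `sort`: order records by path."""
    return Stage("sort", frozenset({"path"}),
                 lambda recs: iter(sorted(recs, key=lambda r: str(r["path"]))))
```

Registering `grep_ext(".jpg", ".png")` and `sort_by_path()` on a `Head` and calling `run()` never touches file sizes, because neither stage declared a need for them; adding a stage that needs `size` turns the lookup back on.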

[clears throat] Anyway, I don't want to talk anymore. I know I've hit almost 15 minutes on just this part, and I thought that this would be a good introduction to give you an idea of what we're doing here and why you funding Jofito development is so important. This kind of logic is not something that just anybody can write. And even for me, it's not like this is necessarily easy. This is a lot of work and a lot of testing. So, look down: my Ko-fi will be in the description, possibly the pinned comment, and a link to the video that started all this, perhaps, too. And thanks for your support. I hope to do you proud. Have a great day.