conductor(deob_warmup): Update prompt_template.md v2 - encoding placeholder + remove wrong re-encodings + per-language << >> note

LLM-direct spec v2. Rule 5 uses placeholder scheme: float (general), integer (general), Scalar (linear/geo/tensor alg), float64 (resolved). 3 wrong re-encodings removed from the 6 Noise-Dedup Lexicon section: function/procedure, parameter/argument, input/arg. Per-language rendering section added for << / >>: C11 uses much_less/much_greater/weakly_coupled; Python uses same; Forth uses named words (avoids bit-shift collision). Verification checklist updated to include v2-specific items: NO RE-ENCODING for distinct terms, transcendental as classification, template notation B as default, per-language << >> rendering.
2026-06-23 20:00:58 -04:00
parent 014179aa71
commit 99bc1598d9
1 changed files with 89 additions and 25 deletions
@@ -92,35 +92,54 @@ Every Pass 1 concept is represented. If a concept can't be bounded, mark it "ind

 The deob-warmup's `prompt_template.md` Rule 4 should produce outputs that include the compression history per layer.

-### Rule 5: Encoding-explicit (NEW per user 2026-06-23)
+### Rule 5: Encoding-explicit (refined v2 per user 2026-06-23)

 **Every value-bearing term must have an `encoding:` attribute.** The encoding is the **bounded form** of the value; without the encoding, the value is **indefinite** (per §1.2).

-Default encoding: `float64` (~16 decimal digits).
+**Principled defaults (v2):**
+- `float` (general unbounded float placeholder) — the principled form for general-purpose floats
+- `integer` (general unbounded integer placeholder) — the principled form for general-purpose integers
+- `Scalar` (placeholder with specific meaning in linear alg, geo alg, tensor alg; per user 2026-06-23 "useful for linear alg, geo alg, tensor alg") — the principled form when the geometric meaning matters

-| Conventional | Re-encoded (NEW) | Encoding |
-|---|---|---|
-| "real number" | `kind : Real` resolves to `quantity : float64` | `float64` |
-| "Pi" | `kind : Pi` resolves to `quantity : float64` | `float64` |
-| "the value is 5" | `quantity(5) : int64` | `int64` |
-| "the value is 3.14" | `quantity(3.14) : float64` | `float64` |
-| "the probability is 0.5" | `quantity(0.5) : float64` | `float64` |
-| "the matrix" | `Matrix : 3x3 of float64` | (matrix + element encoding) |
-| "the vector" | `Vector : n of float64` | (vector + element encoding) |
+`float64` is the principled resolved default ONLY when the user defines a target resolution for the application. The v1 lexicon's blanket `float64` default was over-committing; v2 defers the resolution until the application context demands it.

-The encoding is **mandatory**. The deob-warmup's `prompt_template.md` should produce outputs that include the encoding explicitly.
+| Conventional | Re-encoded (placeholder, v2) | Re-encoded (resolved, v2) | Encoding |
+|---|---|---|---|
+| "real number" (general) | `kind : Real` resolves to `float` | `kind : Real` resolves to `quantity : float64` | `float` (placeholder) or `float64` (resolved) |
+| "real number" (linear/geo/tensor alg) | `kind : Real` resolves to `Scalar` | `kind : Real` resolves to `Scalar : float64` | `Scalar` (placeholder) or `Scalar : float64` (resolved) |
+| "Pi" | `kind : Pi` resolves to `float` | `kind : Pi` resolves to `quantity : float64` | `float` (placeholder) or `float64` (resolved) |
+| "the value is 5" (general) | `quantity(5) : integer` | `quantity(5) : int64` | `integer` (placeholder) or `int64` (resolved) |
+| "the value is 3.14" (general) | `quantity(3.14) : float` | `quantity(3.14) : float64` | `float` (placeholder) or `float64` (resolved) |
+| "the value is 3.14" (linear alg) | `Scalar(3.14)` | `Scalar(3.14) : float64` | `Scalar` (placeholder) or `Scalar : float64` (resolved) |
+| "the probability is 0.5" | `quantity(0.5) : float` | `quantity(0.5) : float64` | `float` (placeholder) or `float64` (resolved) |
+| "the matrix" (linear alg) | `Matrix : 3x3 of Scalar` | `Matrix : 3x3 of float64` | `Scalar` or `float64` (element encoding) |
+| "the vector" (linear alg) | `Vector : n of Scalar` | `Vector : n of float64` | `Scalar` or `float64` (element encoding) |
+| "the correlation is 0.98" (R1, NEW v2) | `correlation : float` | `correlation : float64` | `float` (placeholder) or `float64` (resolved) |

-The encoding taxonomy (per `report.md` §11.2):
- `int8 / int16 / int32 / int64` (exact integers, bounded)
- `uint8 / uint16 / uint32 / uint64` (exact unsigned integers, bounded)
- `float16 / float32 / float64 / float128` (floats, bounded; `float64` is the default)
+The encoding is **mandatory**. The deob-warmup's `prompt_template.md` should produce outputs that include the encoding explicitly. The principled form uses `float` / `integer` / `Scalar` as placeholders; the user (or the LLM at their direction) can resolve to `float64` / `int64` / etc. when the application context demands it.
+
+The encoding taxonomy (per `report.md` §11.2, refined v2):
+- `int8 / int16 / int32 / int64` (exact integers, bounded; resolved)
+- `uint8 / uint16 / uint32 / uint64` (exact unsigned integers, bounded; resolved)
+- `float16 / float32 / float64 / float128` (floats, bounded; resolved)
+- `float` (general unbounded placeholder; v2)
+- `integer` (general unbounded placeholder; v2)
+- `Scalar` (linear/geo/tensor alg placeholder; v2)
 - `bigint` (arbitrary precision, exact)
 - `decimal64 / decimal128` (financial precision, bounded)
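
The placeholder-versus-resolved distinction in the taxonomy above can be sketched in Python. This is a minimal illustration, not part of the spec: the `ALLOWED` mapping and the `resolve` helper are hypothetical names, and which resolved encodings each placeholder may take is an assumption. `bigint` and the `decimal` encodings are left out because this section does not say which placeholder they belong to.

```python
from typing import Optional

INTS = {"int8", "int16", "int32", "int64"}
UINTS = {"uint8", "uint16", "uint32", "uint64"}
FLOATS = {"float16", "float32", "float64", "float128"}

# Assumption: the resolved encodings each v2 placeholder may resolve to.
ALLOWED = {
    "float": FLOATS,
    "integer": INTS | UINTS,
    "Scalar": FLOATS,
}


def resolve(placeholder: str, target: Optional[str] = None) -> str:
    """Return the encoding string for a placeholder.

    With no target the placeholder is kept unresolved; with a target
    the resolved form from the Rule 5 table is returned.
    """
    if placeholder not in ALLOWED:
        raise ValueError(f"unknown placeholder: {placeholder}")
    if target is None:
        return placeholder  # resolution deferred
    if target not in ALLOWED[placeholder]:
        raise ValueError(f"{placeholder} cannot resolve to {target}")
    # Scalar keeps its domain tag when resolved, as in `Scalar : float64`.
    return f"Scalar : {target}" if placeholder == "Scalar" else target


print(resolve("float"))              # float
print(resolve("float", "float64"))   # float64
print(resolve("Scalar", "float64"))  # Scalar : float64
```

Keeping the target optional mirrors the v2 stance that a resolution is chosen only when the application context demands it.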

 **Per user 2026-06-23 clarification:** "Quantity or scalar for value is fine but to keep in mind that if they are used, it should be associated with a finite encoding. Whereas the real number line for example is a classification of expressions that may resolve to any finite encoding of quantity resolution."

+**Per user 2026-06-23 (further clarification):** "I do like the encoding taxonomy table you have when picking a resolution matters though." The taxonomy table is preserved — the placeholder vs resolved distinction makes the taxonomy *more* useful, not less.
+
+**Per user 2026-06-23 (on `Scalar`):** "no keep scalar its useful for linear alg, geo alg, tensor alg." `Scalar` is preserved in the v2 taxonomy as a domain-specific placeholder for linear/geo/tensor algebra. The principled form distinguishes:
+- `float` = general-purpose unbounded float placeholder
+- `Scalar` = domain-specific placeholder for linear alg, geo alg, tensor alg (where the geometric meaning matters)
+
 The encoding-explicit form is the operational form of §1.1 (form requires bounds): every value must have a bounded form. The user's contribution: the bound is the `encoding:`.

+**The ontology note (per user 2026-06-23, confirmed):** "You can observe the shape of the procedure, not all possible result combinations or resolutions for a given metric utilized with that procedure." The encoding-placeholder form reflects this: the *shape* is observable (the value's type, its operation); the *resolution* is a user-defined target (the specific encoding). The honest epistemic hedging pattern is the operational form of this distinction.
+
 **Univalence footnote (per Cluster 0, P37):** Univalence is `∞_proc`, not `∞_val`. The univalence axiom (per HoTT) is the *operational* form of treating equivalences as equal; the user's stance is that this is a *compression* (per Rule 4) and can be **opted out of** when lossless verification is required. For proof checkers, microchip specs, and other lossless systems, set `univalence: off`; for database joins, type-class lookup, and other fluid equality systems, set `univalence: on`.

 ## The 3 Noise-Dedup Maps (apply automatically)
@@ -164,18 +183,27 @@ Each re-encoding produces 3 layers:

 Reject compressed notation (sigma, bar-over-symbols, tensor indices) and demand the **fully expanded form** (nested loops, limit definitions, full chain of substitutions). The user wants every intermediate step visible.

-## The 6 Noise-Dedup Lexicon (Tier 1-4 of `report.md` §3)
+## The 6 Noise-Dedup Lexicon (Tier 1-4 of `report.md` §3, refined v2)

-Reference: `report.md` §3 for the full lexicon (~70 terms after Phase 1 expansion). Quick reference:
+Reference: `report.md` §3 + `lexicon.md` §2 for the full lexicon (~76 terms after v2 refinement). Quick reference:

- **Tier 1 (Core concepts, 12 terms):** `set` → `kind`; `∀` → `forall`; `∃` → `exists`; `∧/∨/¬/→/∈` → `and/or/not/implies/in`; `⊥` → `Bottom`; `Notion` (ἔννοια) → `concept`; etc.
- **Tier 2 (Data-oriented pipeline, 18 terms):** `function` → `procedure`; `parameter` → `argument`; `return` → `result`; `definition` → `formation`; `Attribute/Property/Type` (extrinsic/intrinsic/kind); `static { }` / `exe { }`; `CodeSector`; `using`; `'figure N.N' assert`; etc.
- **Tier 3 (Type-theoretic primitives, 18 terms):** `Type` → `kind`; `Type of types` → `Kind`; `Constructor` → `intro`; `Eliminator` → `elim`; `Computation rule` (value-level) → `comp`; `Type-level Computation` → `getType(...) === T`; `Pair<A, B>` with `Build<A>/Build<B>`; `Dependent<x : A>(B)`; `lambda.x.M`; `objects : m : A, n : B ;`; etc.
- **Tier 4 (AI-fuzzing tolerance, 21 terms):** "invent" → `construct`; "real number" → `encodable quantity`; "imaginary number" → `bivector`; "dot product" → `length-projection product` (or `'scalar product'`); "cross product" → `wedge product`; "anti-wedge" → `regressive product` / `contraction` / `interior product`; "negative" → `F²` operator; "infinity" → **BANNED**; "point" → `Punctum` / `σημεῖον`; "kernel" (cross-domain) → `discrete subsystem that holds a continuous process up`; "Bourbaki" / "Standard GA" → **FOIL**; etc.
+- **Tier 1 (Core concepts, 13 terms, v2):** `set` → **NO RE-ENCODING** (clarify with etymology); `∀` → `forall`; `∃` → `exists`; `∧/∨/¬/→/∈` → `and/or/not/implies/in`; `⊥` → `Bottom`; `Notion` (ἔννοια) → `concept`; `<<` / `>>` → `much_less` / `much_greater` with `tolerance`; etc.
+- **Tier 2 (Data-oriented pipeline, 18 terms, v2):** `function` → **NO RE-ENCODING** (function = declarative; procedure = imperative); `parameter` → **NO RE-ENCODING** (parameter ≠ argument); `return` → `result`; `definition` → `formation`; `input` → **NO RE-ENCODING** (input ≠ arg); `Attribute/Property/Type/Genus` → **NO RE-ENCODING** (Type/Genus/Kind are analogous; `kind` reserved for enumeration types); `static { }` / `exe { }`; `CodeSector`; `using`; `'figure N.N' assert`; etc.
+- **Tier 3 (Type-theoretic primitives, 20 terms, v2):** `Type` → **NO RE-ENCODING** (Type/Genus/Kind are analogous; `kind` reserved for enums); `Type of types` → `Kind`; `Constructor` → `intro`; `Eliminator` → `elim`; `Computation rule` (value-level) → `comp`; `Type-level Computation` → `getType(...) === T`; `Pair<A, B>` with `Build<A>/Build<B>`; `Dependent(B) <- depends(x : A)` (B default) / `Dependent<B>` (C++) / `Dependent[B, x : A]` (Odin) / `Dependent[B, x : A]` (Jai); `lambda.x.M`; `Markov<X, Y, Z>` (R4, NEW); `PolyTimeAdversary` (R6, NEW); `objects : m : A, n : B ;`; etc.
+- **Tier 4 (AI-fuzzing tolerance, 26 terms, v2):** "invent" → `construct`; "real number" → `quantity(<value>) : float` (general placeholder) or `Scalar` (linear/geo/tensor alg placeholder) or `float64` (resolved); "imaginary number" → `bivector`; "function" → **NO RE-ENCODING**; "transcendental" → `classification of expressions with specific traits` (NOT "template expression"); "dot product" → `length-projection product` (or `'scalar product'`); "cross product" → `wedge product`; "anti-wedge" → `regressive product`; "negative" → `F²` operator; "infinity" → **BANNED**; "point" → `Punctum`; "kernel" (cross-domain) → `discrete subsystem`; "Bourbaki" → **FOIL**; "correlation" (R1, NEW) → `correlation : float`; "<< N" / ">> N" (NEW) → `much_less` / `much_greater` with `tolerance`; etc.
+
+**v2 changes (per user 2026-06-23):**
+- 5 wrong re-encodings removed (set, function, parameter, input, proof)
+- 1 wrong re-encoding replaced (transcendental as template → classification)
+- 4 new entries (correlation, Markov chain, PolyTimeAdversary, `<<` / `>>`)
+- 4 template notations (B default, C++ opt-in, Odin opt-in, Jai opt-in)
+- Encoding defaults changed: `float64` → `float` (general) / `integer` (general) / `Scalar` (linear/geo/tensor alg) / `float64` (resolved)

 ## The Sectored Language Operator Names (per `report.md` §3.5, from Cluster 9) — OPTIONAL

 > **Reading guide.** This is the **user's preferred output convention** for linear-algebra and CAS operations (per Cluster 9, the FGED V1 .sectr files). It is OPTIONAL — the de-obfuscation scheme does not require it. The scheme's principled re-encodings (e.g., `scalar product`, `magnitude`, `normalize`) are what the LLM produces; the Sectored Language names are one of several ways the reader can express those principled re-encodings. Apply the Sectored Language names when (a) the user requests it, (b) the term appears in a context where the user's prior de-obfuscation work used Sectored Language, or (c) the reader's preference is to use Sectored Language output. Otherwise, use conventional math with explicit type annotations.
+>
+> **Per user 2026-06-23:** "When it comes to the code psuedo sectr lang is not complete and prob needs adapting or further adjustments." The Sectored Language is a starting point; the user's actual code conventions (C11: raddbg / duffel / pikuma / forth bootslop; Python: manual_slop) take precedence. Pass 3 will adapt the Sectored Language as needed.

 For linear algebra and CAS, the user's preferred Sectored Language naming is:
 - `magnitude(v)` for `||v||`
@@ -190,6 +218,38 @@ For linear algebra and CAS, the user's preferred Sectored Language naming is:
 - `'Transform from coordinate A to B' (ab_transform, coord_A, M) -> Matrix -> ab_transform * coord_a * inverse(ab_transform)` for conjugation
 - `wedge(a, b : Vector) -> (bv : Bivector)` for exterior algebra wedge

+## Per-language rendering for `<<` / `>>` (NEW v2)
+
+The `<<` / `>>` operators (much less than / much more than) collide with existing syntax in each of the target languages below. In C11 and Python, `a << b` and `a >> b` are bit-shift operators. In Forth dialects that define `<<`, `a b <<` is likewise a shift (standard Forth names the shift words `LSHIFT` / `RSHIFT`). The principled form therefore cannot be used as-is in code in these languages.
+
+**Resolution:** use named functions or operators in the target language. The principled form (`<<` / `>>` with `tolerance`) is reserved for the abstract mathematical context (e.g., the lexicon, the type-theoretic spec). In code, the named functions are used.
+
+### C11 rendering (per user 2026-06-23)
+
+Per user 2026-06-23, "weakly_coupled(...) is good for c11, much_less and much_greater can be used as well."
+
+| Principled form | C11 rendering | Notes |
+|---|---|---|
+| `<<` (much less than) | `much_less(a, b, tolerance)` | Comparison; takes `tolerance : float64` |
+| `>>` (much more than) | `much_greater(a, b, tolerance)` | Comparison; takes `tolerance : float64` |
+| `<< N` / `>> N` (predicate form) | `weakly_coupled(a, b, tolerance)` | Predicate; for "loose correlation" |
+
+### Python rendering
+
+| Principled form | Python rendering | Notes |
+|---|---|---|
+| `<<` | `much_less(a, b, tolerance)` | Same as C11 |
+| `>>` | `much_greater(a, b, tolerance)` | Same as C11 |
+| `<< N` / `>> N` (predicate form) | `weakly_coupled(a, b, tolerance)` | Same as C11 |
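
The Python rendering in the table above could look roughly like the sketch below. The spec names the functions and their `(a, b, tolerance)` signature but does not define what `tolerance` means, so the ratio test used here (`|a| <= tolerance * |b|`, with `tolerance` strictly between 0 and 1) is an assumption made for illustration. `weakly_coupled` is omitted because its semantics are not stated in this section.

```python
def much_less(a: float, b: float, tolerance: float) -> bool:
    """Named rendering of `a << b`.

    Assumed meaning: |a| is at most `tolerance` times |b|.
    """
    if not 0.0 < tolerance < 1.0:
        raise ValueError("tolerance must lie strictly between 0 and 1")
    return abs(a) <= tolerance * abs(b)


def much_greater(a: float, b: float, tolerance: float) -> bool:
    """Named rendering of `a >> b`, defined as the mirror of much_less."""
    return much_less(b, a, tolerance)


print(much_less(1.0, 1000.0, 0.01))     # True
print(much_less(1.0, 2.0, 0.01))        # False
print(much_greater(1000.0, 1.0, 0.01))  # True

# Python's own `<<` remains a bit shift, which is why the named form is needed.
print(1 << 3)                           # 8
```

Defining `much_greater` in terms of `much_less` keeps the two comparisons consistent under a single tolerance convention.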
+
+### Forth rendering
+
+| Principled form | Forth rendering | Notes |
+|---|---|---|
+| `<<` | `much_less` (named word) | Forth's `<<` is bit-shift; named word avoids collision |
+| `>>` | `much_greater` (named word) | Same |
+| `<< N` / `>> N` (predicate form) | `weakly_coupled` (named word) | Same |
+
 ## The Form-Anchor Examples (per `report.md` §5.3)

 | Indefinite (Pass 1) | Bounded form (re-encoded) | Projection (form anchor) |
@@ -200,17 +260,21 @@ For linear algebra and CAS, the user's preferred Sectored Language naming is:
 | "negative" | `F²` operator (the explicit-flip) | The twice-applied flip |
 | "the limit as x → a" | `Limit(f, a) : L` | The evaluation of the limit at the point |

-## Verification
+## Verification (refined v2)

 After producing the 3 files, verify each:

 - [ ] **Lossless** — no Pass 1 concept dropped; compression history preserved per layer.
 - [ ] **Bounded** — no `∞_val` or `∞_card`; the "real number line" as a value is banned.
- [ ] **Encoding-explicit** (Rule 5, NEW) — every value-bearing term has an `encoding:` attribute. Default: `float64`.
+- [ ] **Encoding-explicit** (Rule 5, refined v2) — every value-bearing term has an `encoding:` attribute. Principled defaults: `float` (general), `integer` (general), `Scalar` (linear/geo/tensor alg). `float64` only when the user defines a target resolution.
 - [ ] **Constructively typed** — every expression has a type.
 - [ ] **Etymology-cited** — every new term has the 1-line origin + 1-line definition history.
 - [ ] **Form-anchored** — every re-encoding has a form anchor.
- [ ] **Noise-deduped** — the 6 noise-dedup maps applied where applicable.
+- [ ] **Noise-deduped** — the 6 noise-dedup maps applied where applicable (Maps 1, 2, 3 are reshaped in v2; Map 4 unchanged).
+- [ ] **NO RE-ENCODING for distinct terms** (v2) — function ≠ procedure; parameter ≠ argument; input ≠ arg; proof ≠ construction (construction is a sub-type tag); set ≠ kind (set is a data structure). These terms are clarified with native language + etymology, not collapsed.
+- [ ] **Transcendental is a classification** (v2) — not a "template expression for producing a value at a given resolution."
+- [ ] **Template notation B as default** (v2) — `Dependent(B) <- depends(x : A)`. C++ (`Dependent<B>`), Odin (`Dependent[B, x : A]`), Jai (same as Odin) are opt-in per context.
+- [ ] **`<<` / `>>` per-language rendering** (v2) — C11 uses `much_less(a, b, tolerance)` / `much_greater(a, b, tolerance)` / `weakly_coupled(a, b, tolerance)`. Python uses the same. Forth uses named words.
 - [ ] **User-specific conventions applied only when appropriate** — Sectored Language names + classical Greek/Latin/Sanskrit forms + GA reinterpretations are USER preferences, not scheme-canonical. Apply them only when the reader would prefer them (per §3.4 reading guide + §3.5 note).
 - [ ] **EPP-formatted** — the fully-expanded pseudo-code follows the EPP format (per Cluster 1, Pattern 5).
 - [ ] **Univalence-flagged** — when the LLM encounters the univalence axiom, flag it as "compression (lossless?)" so the user can opt in or out.
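
The Encoding-explicit item in the checklist above lends itself to a mechanical check. The sketch below assumes a hypothetical in-memory representation in which each term is a mapping with `name`, `value_bearing`, and `encoding` keys; the spec does not prescribe this structure, and `missing_encodings` is an illustrative helper name.

```python
from typing import Iterable, List, Mapping


def missing_encodings(terms: Iterable[Mapping[str, object]]) -> List[str]:
    """Return the names of value-bearing terms that lack an `encoding`."""
    return [
        str(term["name"])
        for term in terms
        if term.get("value_bearing") and not term.get("encoding")
    ]


# Hypothetical sample terms; only `probability` violates Rule 5.
terms = [
    {"name": "correlation", "value_bearing": True, "encoding": "float"},
    {"name": "probability", "value_bearing": True},
    {"name": "forall", "value_bearing": False},
]

print(missing_encodings(terms))  # ['probability']
```

An empty result means every value-bearing term carries an encoding, whether a placeholder such as `float` or a resolved form such as `float64`.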