diff --git a/conductor/tracks/video_analysis_deob_warmup_20260621/prompt_template.md b/conductor/tracks/video_analysis_deob_warmup_20260621/prompt_template.md
index 254fc7c6..f0259db6 100644
--- a/conductor/tracks/video_analysis_deob_warmup_20260621/prompt_template.md
+++ b/conductor/tracks/video_analysis_deob_warmup_20260621/prompt_template.md
@@ -92,35 +92,54 @@ Every Pass 1 concept is represented. If a concept can't be bounded, mark it "ind

 The deob-warmup's `prompt_template.md` Rule 4 should produce outputs that include the compression history per layer.

-### Rule 5: Encoding-explicit (NEW per user 2026-06-23)
+### Rule 5: Encoding-explicit (refined v2 per user 2026-06-23)

 **Every value-bearing term must have an `encoding:` attribute.** The encoding is the **bounded form** of the value; without the encoding, the value is **indefinite** (per §1.2).

-Default encoding: `float64` (~16 decimal digits).
+**Principled defaults (v2):**
+- `float` (general unbounded float placeholder) — the principled form for general-purpose floats
+- `integer` (general unbounded integer placeholder) — the principled form for general-purpose integers
+- `Scalar` (placeholder with specific meaning in linear alg, geo alg, tensor alg; per user 2026-06-23 "useful for linear alg, geo alg, tensor alg") — the principled form when the geometric meaning matters

-| Conventional | Re-encoded (NEW) | Encoding |
-|---|---|---|
-| "real number" | `kind : Real` resolves to `quantity : float64` | `float64` |
-| "Pi" | `kind : Pi` resolves to `quantity : float64` | `float64` |
-| "the value is 5" | `quantity(5) : int64` | `int64` |
-| "the value is 3.14" | `quantity(3.14) : float64` | `float64` |
-| "the probability is 0.5" | `quantity(0.5) : float64` | `float64` |
-| "the matrix" | `Matrix : 3x3 of float64` | (matrix + element encoding) |
-| "the vector" | `Vector : n of float64` | (vector + element encoding) |
+`float64` is the principled resolved default ONLY when the user defines a target resolution for the application. The v1 lexicon's blanket `float64` default was over-committing; v2 defers the resolution until the application context demands it.

-The encoding is **mandatory**. The deob-warmup's `prompt_template.md` should produce outputs that include the encoding explicitly.
+| Conventional | Re-encoded (placeholder, v2) | Re-encoded (resolved, v2) | Encoding |
+|---|---|---|---|
+| "real number" (general) | `kind : Real` resolves to `float` | `kind : Real` resolves to `quantity : float64` | `float` (placeholder) or `float64` (resolved) |
+| "real number" (linear/geo/tensor alg) | `kind : Real` resolves to `Scalar` | `kind : Real` resolves to `Scalar : float64` | `Scalar` (placeholder) or `Scalar : float64` (resolved) |
+| "Pi" | `kind : Pi` resolves to `float` | `kind : Pi` resolves to `quantity : float64` | `float` (placeholder) or `float64` (resolved) |
+| "the value is 5" (general) | `quantity(5) : integer` | `quantity(5) : int64` | `integer` (placeholder) or `int64` (resolved) |
+| "the value is 3.14" (general) | `quantity(3.14) : float` | `quantity(3.14) : float64` | `float` (placeholder) or `float64` (resolved) |
+| "the value is 3.14" (linear alg) | `Scalar(3.14)` | `Scalar(3.14) : float64` | `Scalar` (placeholder) or `Scalar : float64` (resolved) |
+| "the probability is 0.5" | `quantity(0.5) : float` | `quantity(0.5) : float64` | `float` (placeholder) or `float64` (resolved) |
+| "the matrix" (linear alg) | `Matrix : 3x3 of Scalar` | `Matrix : 3x3 of float64` | `Scalar` or `float64` (element encoding) |
+| "the vector" (linear alg) | `Vector : n of Scalar` | `Vector : n of float64` | `Scalar` or `float64` (element encoding) |
+| "the correlation is 0.98" (R1, NEW v2) | `correlation : float` | `correlation : float64` | `float` (placeholder) or `float64` (resolved) |

-The encoding taxonomy (per `report.md` §11.2):
-- `int8 / int16 / int32 / int64` (exact integers, bounded)
-- `uint8 / uint16 / uint32 / uint64` (exact unsigned integers, bounded)
-- `float16 / float32 / float64 / float128` (floats, bounded; `float64` is the default)
+The encoding is **mandatory**. The deob-warmup's `prompt_template.md` should produce outputs that include the encoding explicitly. The principled form uses `float` / `integer` / `Scalar` as placeholders; the user (or the LLM at their direction) can resolve to `float64` / `int64` / etc. when the application context demands it.
+
+The encoding taxonomy (per `report.md` §11.2, refined v2):
+- `int8 / int16 / int32 / int64` (exact integers, bounded; resolved)
+- `uint8 / uint16 / uint32 / uint64` (exact unsigned integers, bounded; resolved)
+- `float16 / float32 / float64 / float128` (floats, bounded; resolved)
+- `float` (general unbounded placeholder; v2)
+- `integer` (general unbounded placeholder; v2)
+- `Scalar` (linear/geo/tensor alg placeholder; v2)
 - `bigint` (arbitrary precision, exact)
 - `decimal64 / decimal128` (financial precision, bounded)

 **Per user 2026-06-23 clarification:** "Quantity or scalar for value is fine but to keep in mind that if they are used, it should be associated with a finite encoding. Whereas the real number line for example is a classification of expressions that may resolve to any finite encoding of quantity resolution."

+**Per user 2026-06-23 (further clarification):** "I do like the encoding taxonomy table you have when picking a resolution matters though." The taxonomy table is preserved — the placeholder vs resolved distinction makes the taxonomy *more* useful, not less.
+
+**Per user 2026-06-23 (on `Scalar`):** "no keep scalar its useful for linear alg, geo alg, tensor alg." `Scalar` is preserved in the v2 taxonomy as a domain-specific placeholder for linear/geo/tensor algebra. The principled form distinguishes:
+- `float` = general-purpose unbounded float placeholder
+- `Scalar` = domain-specific placeholder for linear alg, geo alg, tensor alg (where the geometric meaning matters)
+
 The encoding-explicit form is the operational form of §1.1 (form requires bounds): every value must have a bounded form. The user's contribution: the bound is the `encoding:`.

+**The ontology note (per user 2026-06-23, confirmed):** "You can observe the shape of the procedure, not all possible result combinations or resolutions for a given metric utilized with that procedure." The encoding-placeholder form reflects this: the *shape* is observable (the value's type, its operation); the *resolution* is a user-defined target (the specific encoding). The honest epistemic hedging pattern is the operational form of this distinction.
+
 **Univalence footnote (per Cluster 0, P37):** Univalence is `∞_proc`, not `∞_val`. The univalence axiom (per HoTT) is the *operational* form of treating equivalences as equal; the user's stance is that this is a *compression* (per Rule 4) and can be **opted out of** when lossless verification is required. For proof checkers, microchip specs, and other lossless systems, set `univalence: off`; for database joins, type-class lookup, and other fluid equality systems, set `univalence: on`.

 ## The 3 Noise-Dedup Maps (apply automatically)
@@ -164,18 +183,27 @@ Each re-encoding produces 3 layers:

 Reject compressed notation (sigma, bar-over-symbols, tensor indices) and demand the **fully expanded form** (nested loops, limit definitions, full chain of substitutions). The user wants every intermediate step visible.

-## The 6 Noise-Dedup Lexicon (Tier 1-4 of `report.md` §3)
+## The 6 Noise-Dedup Lexicon (Tier 1-4 of `report.md` §3, refined v2)

-Reference: `report.md` §3 for the full lexicon (~70 terms after Phase 1 expansion). Quick reference:
+Reference: `report.md` §3 + `lexicon.md` §2 for the full lexicon (~76 terms after v2 refinement). Quick reference:

-- **Tier 1 (Core concepts, 12 terms):** `set` → `kind`; `∀` → `forall`; `∃` → `exists`; `∧/∨/¬/→/∈` → `and/or/not/implies/in`; `⊥` → `Bottom`; `Notion` (ἔννοια) → `concept`; etc.
-- **Tier 2 (Data-oriented pipeline, 18 terms):** `function` → `procedure`; `parameter` → `argument`; `return` → `result`; `definition` → `formation`; `Attribute/Property/Type` (extrinsic/intrinsic/kind); `static { }` / `exe { }`; `CodeSector`; `using`; `'figure N.N' assert`; etc.
-- **Tier 3 (Type-theoretic primitives, 18 terms):** `Type` → `kind`; `Type of types` → `Kind`; `Constructor` → `intro`; `Eliminator` → `elim`; `Computation rule` (value-level) → `comp`; `Type-level Computation` → `getType(...) === T`; `Pair` with `Build/Build`; `Dependent(B)`; `lambda.x.M`; `objects : m : A, n : B ;`; etc.
-- **Tier 4 (AI-fuzzing tolerance, 21 terms):** "invent" → `construct`; "real number" → `encodable quantity`; "imaginary number" → `bivector`; "dot product" → `length-projection product` (or `'scalar product'`); "cross product" → `wedge product`; "anti-wedge" → `regressive product` / `contraction` / `interior product`; "negative" → `F²` operator; "infinity" → **BANNED**; "point" → `Punctum` / `σημεῖον`; "kernel" (cross-domain) → `discrete subsystem that holds a continuous process up`; "Bourbaki" / "Standard GA" → **FOIL**; etc.
+- **Tier 1 (Core concepts, 13 terms, v2):** `set` → **NO RE-ENCODING** (clarify with etymology); `∀` → `forall`; `∃` → `exists`; `∧/∨/¬/→/∈` → `and/or/not/implies/in`; `⊥` → `Bottom`; `Notion` (ἔννοια) → `concept`; `<<` / `>>` → `much_less` / `much_greater` with `tolerance`; etc.
+- **Tier 2 (Data-oriented pipeline, 18 terms, v2):** `function` → **NO RE-ENCODING** (function = declarative; procedure = imperative); `parameter` → **NO RE-ENCODING** (parameter ≠ argument); `return` → `result`; `definition` → `formation`; `input` → **NO RE-ENCODING** (input ≠ arg); `Attribute/Property/Type/Genus` → **NO RE-ENCODING** (Type/Genus/Kind are analogous; `kind` reserved for enumeration types); `static { }` / `exe { }`; `CodeSector`; `using`; `'figure N.N' assert`; etc.
+- **Tier 3 (Type-theoretic primitives, 20 terms, v2):** `Type` → **NO RE-ENCODING** (Type/Genus/Kind are analogous; `kind` reserved for enums); `Type of types` → `Kind`; `Constructor` → `intro`; `Eliminator` → `elim`; `Computation rule` (value-level) → `comp`; `Type-level Computation` → `getType(...) === T`; `Pair` with `Build/Build`; `Dependent(B) <- depends(x : A)` (B default) / `Dependent` (C++) / `Dependent[B, x : A]` (Odin) / `Dependent[B, x : A]` (Jai); `lambda.x.M`; `Markov` (R4, NEW); `PolyTimeAdversary` (R6, NEW); `objects : m : A, n : B ;`; etc.
+- **Tier 4 (AI-fuzzing tolerance, 26 terms, v2):** "invent" → `construct`; "real number" → `quantity() : float` (general placeholder) or `Scalar` (linear/geo/tensor alg placeholder) or `float64` (resolved); "imaginary number" → `bivector`; "function" → **NO RE-ENCODING**; "transcendental" → `classification of expressions with specific traits` (NOT "template expression"); "dot product" → `length-projection product` (or `'scalar product'`); "cross product" → `wedge product`; "anti-wedge" → `regressive product`; "negative" → `F²` operator; "infinity" → **BANNED**; "point" → `Punctum`; "kernel" (cross-domain) → `discrete subsystem`; "Bourbaki" → **FOIL**; "correlation" (R1, NEW) → `correlation : float`; "<< N" / ">> N" (NEW) → `much_less` / `much_greater` with `tolerance`; etc.
+
+**v2 changes (per user 2026-06-23):**
+- 5 wrong re-encodings removed (set, function, parameter, input, proof)
+- 1 wrong re-encoding replaced (transcendental as template → classification)
+- 4 new entries (correlation, Markov chain, PolyTimeAdversary, `<<` / `>>`)
+- 4 template notations (B default, C++ opt-in, Odin opt-in, Jai opt-in)
+- Encoding defaults changed: `float64` → `float` (general) / `integer` (general) / `Scalar` (linear/geo/tensor alg) / `float64` (resolved)

 ## The Sectored Language Operator Names (per `report.md` §3.5, from Cluster 9) — OPTIONAL

 > **Reading guide.** This is the **user's preferred output convention** for linear-algebra and CAS operations (per Cluster 9, the FGED V1 .sectr files). It is OPTIONAL — the de-obfuscation scheme does not require it. The scheme's principled re-encodings (e.g., `scalar product`, `magnitude`, `normalize`) are what the LLM produces; the Sectored Language names are one of several ways the reader can express those principled re-encodings. Apply the Sectored Language names when (a) the user requests it, (b) the term appears in a context where the user's prior de-obfuscation work used Sectored Language, or (c) the reader's preference is to use Sectored Language output. Otherwise, use conventional math with explicit type annotations.
+>
+> **Per user 2026-06-23:** "When it comes to the code psuedo sectr lang is not complete and prob needs adapting or further adjustments." The Sectored Language is a starting point; the user's actual code conventions (C11: raddbg / duffel / pikuma / forth bootslop; Python: manual_slop) take precedence. Pass 3 will adapt the Sectored Language as needed.

 For linear algebra and CAS, the user's preferred Sectored Language naming is:
 - `magnitude(v)` for `||v||`
@@ -190,6 +218,38 @@ For linear algebra and CAS, the user's preferred Sectored Language naming is:
 - `'Transform from coordinate A to B' (ab_transform, coord_A, M) -> Matrix -> ab_transform * coord_a * inverse(ab_transform)` for conjugation
 - `wedge(a, b : Vector) -> (bv : Bivector)` for exterior algebra wedge

+## Per-language rendering for `<<` / `>>` (NEW v2)
+
+The `<<` / `>>` operators (much less than / much more than) have a per-language rendering issue: in C11, `a << b` and `a >> b` are bit-shift operators. In Python, the same. In Forth, `a b <<` is a shift. The principled form cannot be used as-is in these languages — there's a namespace collision with bit-shift.
+
+**Resolution:** use named functions or operators in the target language. The principled form (`<<` / `>>` with `tolerance`) is reserved for the abstract mathematical context (e.g., the lexicon, the type-theoretic spec). In code, the named functions are used.
+
+### C11 rendering (per user 2026-06-23)
+
+Per user 2026-06-23, "weakly_coupled(...) is good for c11, much_less and much_greater can be used as well."
+
+| Principled form | C11 rendering | Notes |
+|---|---|---|
+| `<<` (much less than) | `much_less(a, b, tolerance)` | Comparison; takes `tolerance : float64` |
+| `>>` (much more than) | `much_greater(a, b, tolerance)` | Comparison; takes `tolerance : float64` |
+| `<< N` / `>> N` (predicate form) | `weakly_coupled(a, b, tolerance)` | Predicate; for "loose correlation" |
+
+### Python rendering
+
+| Principled form | Python rendering | Notes |
+|---|---|---|
+| `<<` | `much_less(a, b, tolerance)` | Same as C11 |
+| `>>` | `much_greater(a, b, tolerance)` | Same as C11 |
+| `<<` / `>>` (predicate) | `weakly_coupled(a, b, tolerance)` | Same as C11 |
+
+### Forth rendering
+
+| Principled form | Forth rendering | Notes |
+|---|---|---|
+| `<<` | `much_less` (named word) | Forth's `<<` is bit-shift; named word avoids collision |
+| `>>` | `much_greater` (named word) | Same |
+| `<<` / `>>` (predicate) | `weakly_coupled` (named word) | Same |
+
 ## The Form-Anchor Examples (per `report.md` §5.3)

 | Indefinite (Pass 1) | Bounded form (re-encoded) | Projection (form anchor) |
@@ -200,17 +260,21 @@ For linear algebra and CAS, the user's preferred Sectored Language naming is:
 | "negative" | `F²` operator (the explicit-flip) | The twice-applied flip |
 | "the limit as x → a" | `Limit(f, a) : L` | The evaluation of the limit at the point |

-## Verification
+## Verification (refined v2)

 After producing the 3 files, verify each:

 - [ ] **Lossless** — no Pass 1 concept dropped; compression history preserved per layer.
 - [ ] **Bounded** — no `∞_val` or `∞_card`; the "real number line" as a value is banned.
-- [ ] **Encoding-explicit** (Rule 5, NEW) — every value-bearing term has an `encoding:` attribute. Default: `float64`.
+- [ ] **Encoding-explicit** (Rule 5, refined v2) — every value-bearing term has an `encoding:` attribute. Principled defaults: `float` (general), `integer` (general), `Scalar` (linear/geo/tensor alg). `float64` only when the user defines a target resolution.
 - [ ] **Constructively typed** — every expression has a type.
 - [ ] **Etymology-cited** — every new term has the 1-line origin + 1-line definition history.
 - [ ] **Form-anchored** — every re-encoding has a form anchor.
-- [ ] **Noise-deduped** — the 6 noise-dedup maps applied where applicable.
+- [ ] **Noise-deduped** — the 6 noise-dedup maps applied where applicable (Maps 1, 2, 3 are reshaped in v2; Map 4 unchanged).
+- [ ] **NO RE-ENCODING for distinct terms** (v2) — function ≠ procedure; parameter ≠ argument; input ≠ arg; proof ≠ construction (construction is a sub-type tag); set ≠ kind (set is a data structure). These terms are clarified with native language + etymology, not collapsed.
+- [ ] **Transcendental is a classification** (v2) — not a "template expression for producing a value at a given resolution."
+- [ ] **Template notation B as default** (v2) — `Dependent(B) <- depends(x : A)`. C++ (`Dependent`), Odin (`Dependent[B, x : A]`), Jai (same as Odin) are opt-in per context.
+- [ ] **`<<` / `>>` per-language rendering** (v2) — C11 uses `much_less(a, b, tolerance)` / `much_greater(a, b, tolerance)` / `weakly_coupled(a, b, tolerance)`. Python uses the same. Forth uses named words.
 - [ ] **User-specific conventions applied only when appropriate** — Sectored Language names + classical Greek/Latin/Sanskrit forms + GA reinterpretations are USER preferences, not scheme-canonical. Apply them only when the reader would prefer them (per §3.4 reading guide + §3.5 note).
 - [ ] **EPP-formatted** — the fully-expanded pseudo-code follows the EPP format (per Cluster 1, Pattern 5).
 - [ ] **Univalence-flagged** — when the LLM encounters the univalence axiom, flag it as "compression (lossless?**)" so the user can opt in or out.
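
The `much_less(a, b, tolerance)` / `much_greater(a, b, tolerance)` renderings named in the checklist can be sketched in Python as below. This is a minimal sketch, not the template's definition: the diff fixes only the names and the three-argument shape, so the comparison rule used here (magnitude of `a` strictly below `tolerance` times the magnitude of `b`, with `tolerance` in the open interval (0, 1)) is an assumption. `weakly_coupled` is left out because its comparison rule is not specified.

```python
import math


def much_less(a: float, b: float, tolerance: float) -> bool:
    """Named rendering of the principled `a << b`.

    ASSUMPTION (not from the template): `a << b` holds when
    |a| < tolerance * |b|, with 0 < tolerance < 1.
    """
    if not all(math.isfinite(x) for x in (a, b, tolerance)):
        # Bounded-values rule: reject infinities and NaN outright.
        raise ValueError("a, b and tolerance must be finite")
    if not 0.0 < tolerance < 1.0:
        raise ValueError("tolerance must lie strictly between 0 and 1")
    # Strict inequality so that 0 << 0 is False.
    return abs(a) < tolerance * abs(b)


def much_greater(a: float, b: float, tolerance: float) -> bool:
    """Named rendering of the principled `a >> b` (mirror of much_less)."""
    return much_less(b, a, tolerance)


if __name__ == "__main__":
    print(much_less(1.0, 1000.0, 0.01))     # 1 is below 1% of 1000
    print(much_greater(1000.0, 1.0, 0.01))  # mirrored comparison
    print(much_less(1.0, 2.0, 0.01))        # 1 is not below 1% of 2
```

Passing `tolerance` explicitly keeps the comparison encoding-explicit in the sense of Rule 5: the caller states the resolution at which "much less" is judged instead of relying on a hidden default.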