diff --git a/conductor/code_styleguides/type_aliases.md b/conductor/code_styleguides/type_aliases.md
index c854e48a..f6c321d7 100644
--- a/conductor/code_styleguides/type_aliases.md
+++ b/conductor/code_styleguides/type_aliases.md
@@ -316,4 +316,101 @@ A per-source-file layout matches the project's per-source-file guide structure (
 - `conductor/code_styleguides/error_handling.md` — the `Result[T]` convention (complementary)
 - `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference
 - `conductor/tracks/data_structure_strengthening_20260606/` — the track that established this convention
-- `docs/guide_state_lifecycle.md` — `App.__getattr__`/`__setattr__` state delegation (the runtime contract the aliases preserve)
\ No newline at end of file
+- `docs/guide_state_lifecycle.md` — `App.__getattr__`/`__setattr__` state delegation (the runtime contract the aliases preserve)
+---
+
+## When to Promote `TypeAlias` to `dataclass(frozen=True)`
+
+A `TypeAlias` like `Metadata: TypeAlias = dict[str, Any]` is a **rename** - the underlying shape is unchanged at runtime. This is appropriate when the shape is **open**, **self-describing**, or **transient**. Promote to `dataclass(frozen=True)` when the shape is **closed**, **named**, and **stable**.
+
+### Use `TypeAlias` when:
+
+| Condition | Why | Example |
+|---|---|---|
+| The shape is **truly open** (extra keys are allowed; the dict is a bag) | Aliases document intent without forcing a schema | `Metadata: TypeAlias = dict[str, Any]` (a generic key-value record) |
+| The shape is **self-describing** (caller reads `entry.get("path")` without needing to know which keys are required) | Static analysis can't help here; the dict's open shape is the contract | `CommsLogEntry: TypeAlias = Metadata` (the AI comms log entries are heterogeneous) |
+| The shape is **transient** (JSON-serialized, then deserialized; no in-memory invariants) | A frozen dataclass adds construction overhead for shapes that don't outlive a serialization round-trip | The JSON wire format (`JsonValue: TypeAlias = JsonPrimitive \| list["JsonValue"] \| dict[str, "JsonValue"]`) |
+| The shape is **truly heterogeneous** (caller doesn't need to know which fields exist) | Documentation is the value; the type doesn't need enforcement | The `disc_entries: list[dict]` discussion list |
+
+### Promote to `dataclass(frozen=True)` when:
+
+| Condition | Why | Example from `vendor_capabilities.py` |
+|---|---|---|
+| The shape has **a known set of required fields** with **specific types** | Frozen dataclasses reject missing or unexpected fields at construction time, and a type checker verifies the field types | `VendorCapabilities.vendor: str`, `model: str`, `vision: bool = False`, etc. |
+| **Multiple sites access the same fields with string keys** | `payload["usage"]["input_tokens"]` x 5 sites = 5x the bug surface; `.usage.input_tokens` is type-checked | The OpenAI chat completion's `usage: UsageStats` with 4 int fields |
+| The shape is **stable across serialization boundaries** (the on-disk / on-wire format is documented and won't change per-call) | A frozen dataclass guarantees the JSON shape is consistent | The `OpenAICompatibleRequest` (cross-vendor OpenAI-compatible request) |
+| The shape is **shared across multiple modules** (the same schema is used by `ai_client.py` and `openai_compatible.py` and `api_hooks.py`) | One source of truth; changes propagate to all consumers | `ProviderHistory` shared between `_send_anthropic`, `_send_grok`, etc. |
+
+### The reference pattern (`src/vendor_capabilities.py`)
+
+```python
+@dataclass(frozen=True)
+class VendorCapabilities:
+    vendor: str
+    model: str
+    vision: bool = False
+    tool_calling: bool = True
+    caching: bool = False
+    # ... 22 named fields total
+
+_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {}
+
+def register(cap: VendorCapabilities) -> None:
+    _REGISTRY[(cap.vendor, cap.model)] = cap
+
+def get_capabilities(vendor: str, model: str) -> VendorCapabilities:
+    if (vendor, model) in _REGISTRY:
+        return _REGISTRY[(vendor, model)]
+    if (vendor, '*') in _REGISTRY:
+        return _REGISTRY[(vendor, '*')]
+    raise KeyError(f'No capabilities registered for vendor={vendor!r} model={model!r}')
+```
+
+**The 5 properties that make this pattern successful:**
+
+| Property | Why it matters |
+|---|---|
+| `frozen=True` | Immutable; thread-safe; no accidental mutation |
+| Named fields | Every capability is addressable by name (no `dict['vision']` lookups) |
+| Module-level registry | O(1) lookup; no instantiation overhead |
+| Wildcard `*` fallback | Per-vendor default for unregistered models |
+| Flat (no nesting) | Single cache-line access for most queries |
+
+### The decision tree
+
+```
+Q: Is the shape a `dict[str, Any]` or similar open form?
++-- yes:
+|   Q: Does the shape have a known closed set of fields?
+|   +-- yes:
+|   |   Q: Are 2+ of: (multi-module, multi-call-site, stable-serialization, known-types) true?
+|   |   +-- yes -> dataclass(frozen=True) + module-level registry (vendor_capabilities pattern)
+|   |   +-- no -> TypeAlias (Metadata / CommsLogEntry / FileItem)
+|   +-- no -> TypeAlias (the open shape is the contract)
++-- no: probably already a typed dataclass; if not, see if it should be one
+```
+
+### The 5 worked examples (per `ANY_TYPE_AUDIT_20260621.md` §3)
+
+The `any_type_componentization_20260621` track applies this rule to the 5 fat-struct candidates identified by the audit:
+
+| Candidate | From | To | Sites promoted |
+|---|---|---|---:|
+| P1 `MCP_TOOL_SPECS` | `list[dict[str, Any]]` (45 tools) | `src/mcp_tool_specs.py: ToolSpec` + `_REGISTRY: dict[str, ToolSpec]` | 8 |
+| P1 `NormalizedResponse` + `OpenAICompatibleRequest` | `list[dict[str, Any]]` fields | `src/openai_schemas.py: ChatMessage, UsageStats, ToolCall` | 17 |
+| P2 7x `*_history` + 7x `*_history_lock` | 14 module globals | `src/provider_state.py: ProviderHistory` + `_PROVIDER_HISTORIES: dict[str, ProviderHistory]` | 41 |
+| P2 `LogRegistry.data: dict[str, dict[str, Any]]` | Nested anonymous dict | Inline `Session` + `SessionMetadata` dataclasses | 7 |
+| P3 `WebSocketMessage` + `_serialize_for_api` | `dict[str, Any]` payloads | Inline `WebSocketMessage` + `JsonValue` TypeAlias | 16 |
+
+**Total: 89 sites promoted from `dict[str, Any]` / `list[dict[...]]` to typed dataclasses.** The remaining ~118 `Any` sites are intentional flexibility (SDK client holders, `__getattr__` dynamic dispatch, generic serialization - Patterns 3, 4, 5 per the audit).
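As an illustration of what one promoted site might look like, the sketch below contrasts string-key access with attribute access on a frozen dataclass. Only `UsageStats` and `input_tokens` come from this guide; the other three field names, the `from_dict` signature, and both helper functions are assumptions made for illustration. The project's `from_dict()` returns follow the `Result[T]` convention, which this sketch omits for brevity.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class UsageStats:
    # `input_tokens` appears in the guide; the remaining field names are assumed.
    input_tokens: int = 0
    output_tokens: int = 0
    cached_tokens: int = 0
    total_tokens: int = 0

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "UsageStats":
        # Convert once at the boundary: missing keys default to 0,
        # unknown keys are ignored.
        return cls(
            input_tokens=int(raw.get("input_tokens", 0)),
            output_tokens=int(raw.get("output_tokens", 0)),
            cached_tokens=int(raw.get("cached_tokens", 0)),
            total_tokens=int(raw.get("total_tokens", 0)),
        )


def total_input_before(payloads: list[dict[str, Any]]) -> int:
    """Before promotion: every call site repeats the string keys."""
    return sum(p["usage"]["input_tokens"] for p in payloads)


def total_input_after(usages: list[UsageStats]) -> int:
    """After promotion: a misspelled attribute is flagged by the type checker."""
    return sum(u.input_tokens for u in usages)
```

Under these assumptions, the dict is parsed once where the payload enters the program, and every downstream site reads named attributes instead of repeating string keys.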
+
+### See Also
+
+- `src/vendor_capabilities.py` - the canonical reference pattern
+- `src/type_aliases.py` - the 10 existing TypeAliases + `FileItemsDiff` NamedTuple + the new `JsonPrimitive` / `JsonValue`
+- `scripts/audit_dataclass_coverage.py` - the CI gate that enforces "no new fat-struct sites"
+- `scripts/audit_weak_types.py` - the existing CI gate for the alias convention
+- `conductor/code_styleguides/data_oriented_design.md` - §1.2 "Design around the data" (the philosophical foundation)
+- `conductor/code_styleguides/error_handling.md` - the `Result[T]` convention for `from_dict()` returns
+- `docs/reports/ANY_TYPE_AUDIT_20260621.md` - the input artifact that identified the 5 candidates
+- `conductor/tracks/any_type_componentization_20260621/` - the track that applied this rule