
docs(styleguide): add §12 'When to Promote TypeAlias to dataclass' (t0_4)

Phase 0 of any_type_componentization_20260621. Adds the canonical
decision rule that future contributors can apply without re-deriving:

- TypeAlias conditions: open shape, self-describing, transient
- dataclass(frozen=True) conditions: known fields, multi-site access,
  stable serialization, shared across modules
- The src/vendor_capabilities.py reference pattern (5 properties)
- Decision tree
- The 5 worked examples (89 sites promoted per the audit)
- Cross-references to audit scripts + input artifact + track

This is the canonical artifact for the 'when to dataclass' question;
subsequent phases refer to it via 'see styleguide §12' rather than
re-deriving the rule.
2026-06-21 15:58:42 -04:00
parent 4e658dd25c
commit a28d8723a8
+98 -1
@@ -316,4 +316,101 @@ A per-source-file layout matches the project's per-source-file guide structure (
- `conductor/code_styleguides/error_handling.md` — the `Result[T]` convention (complementary)
- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference
- `conductor/tracks/data_structure_strengthening_20260606/` — the track that established this convention
- `docs/guide_state_lifecycle.md` - `App.__getattr__`/`__setattr__` state delegation (the runtime contract the aliases preserve)
---
## When to Promote `TypeAlias` to `dataclass(frozen=True)`
A `TypeAlias` like `Metadata: TypeAlias = dict[str, Any]` is a **rename** - the underlying shape is unchanged at runtime. This is appropriate when the shape is **open**, **self-describing**, or **transient**. Promote to `dataclass(frozen=True)` when the shape is **closed**, **named**, and **stable**.
### Use `TypeAlias` when:
| Condition | Why | Example |
|---|---|---|
| The shape is **truly open** (extra keys are allowed; the dict is a bag) | Aliases document intent without forcing a schema | `Metadata: TypeAlias = dict[str, Any]` (a generic key-value record) |
| The shape is **self-describing** (caller reads `entry.get("path")` without needing to know which keys are required) | Static analysis can't help here; the dict's open shape is the contract | `CommsLogEntry: TypeAlias = Metadata` (the AI comms log entries are heterogeneous) |
| The shape is **transient** (JSON-serialized, then deserialized; no in-memory invariants) | A frozen dataclass adds construction overhead for shapes that don't outlive a serialization round-trip | The JSON wire format (`JsonValue: TypeAlias = JsonPrimitive \| list["JsonValue"] \| dict[str, "JsonValue"]`) |
| The shape is **truly heterogeneous** (caller doesn't need to know which fields exist) | Documentation is the value; the type doesn't need enforcement | The `disc_entries: list[dict]` discussion list |
### Promote to `dataclass(frozen=True)` when:
| Condition | Why | Example from `vendor_capabilities.py` |
|---|---|---|
| The shape has **a known set of required fields** with **specific types** | Frozen dataclasses enforce the schema at construction time | `VendorCapabilities.vendor: str`, `model: str`, `vision: bool = False`, etc. |
| **Multiple sites access the same fields with string keys** | `payload["usage"]["input_tokens"]` x 5 sites = 5x the bug surface; `.usage.input_tokens` is type-checked | The OpenAI chat completion's `usage: UsageStats` with 4 int fields |
| The shape is **stable across serialization boundaries** (the on-disk / on-wire format is documented and won't change per-call) | A frozen dataclass pins the field set, so every serialized instance carries the same keys | The `OpenAICompatibleRequest` (cross-vendor OpenAI-compatible request) |
| The shape is **shared across multiple modules** (the same schema is used by `ai_client.py` and `openai_compatible.py` and `api_hooks.py`) | One source of truth; changes propagate to all consumers | `ProviderHistory` shared between `_send_anthropic`, `_send_grok`, etc. |
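
A sketch of the multi-site case: the string-key lookups collapse into one `from_dict()` parser, and every other site reads typed attributes. The table only says `UsageStats` has 4 int fields; the field names below (other than `input_tokens`, which appears in the table) and the `total_tokens` property are assumptions. This sketch raises `KeyError` on a missing required key instead of returning the project's `Result[T]`, whose API is not shown here.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class UsageStats:
    # Field names are assumed for illustration; only input_tokens appears in the table.
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "UsageStats":
        # The single place that touches string keys; missing required keys raise KeyError.
        return cls(
            input_tokens=int(raw["input_tokens"]),
            output_tokens=int(raw["output_tokens"]),
            cache_read_tokens=int(raw.get("cache_read_tokens", 0)),
            cache_write_tokens=int(raw.get("cache_write_tokens", 0)),
        )

    @property
    def total_tokens(self) -> int:
        # Derived value; callers never re-add the fields by hand.
        return self.input_tokens + self.output_tokens
```

Call sites then write `usage.input_tokens` instead of `payload["usage"]["input_tokens"]`, so a misspelled field becomes an attribute error a type checker can flag.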
### The reference pattern (`src/vendor_capabilities.py`)
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorCapabilities:
    vendor: str
    model: str
    vision: bool = False
    tool_calling: bool = True
    caching: bool = False
    # ... 22 named fields total


# Module-level registry keyed by (vendor, model); model '*' is the per-vendor default.
_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {}


def register(cap: VendorCapabilities) -> None:
    _REGISTRY[(cap.vendor, cap.model)] = cap


def get_capabilities(vendor: str, model: str) -> VendorCapabilities:
    # Exact (vendor, model) match first, then the vendor-wide wildcard.
    if (vendor, model) in _REGISTRY:
        return _REGISTRY[(vendor, model)]
    if (vendor, '*') in _REGISTRY:
        return _REGISTRY[(vendor, '*')]
    raise KeyError(f'No capabilities registered for vendor={vendor!r} model={model!r}')
```
**The 5 properties that make this pattern successful:**
| Property | Why it matters |
|---|---|
| `frozen=True` | Immutable after construction (shallowly); safe to share across threads for reads; no accidental mutation |
| Named fields | Every capability is addressable by name (no `dict['vision']` lookups) |
| Module-level registry | O(1) lookup; no instantiation overhead |
| Wildcard `*` fallback | Per-vendor default for unregistered models |
| Flat (no nesting) | Each query is a single attribute access on one object; no nested lookups |
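
A self-contained sketch exercising the properties above on a trimmed copy of the pattern (three capability flags instead of 22). The vendor `"acme"` and its model names are hypothetical. The lookup probes at most two keys, which is equivalent to the two `if` checks in the reference code; frozen instances are changed by deriving a copy with `dataclasses.replace`, which leaves the registered entry untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VendorCapabilities:
    # Trimmed to three capability flags for this illustration.
    vendor: str
    model: str
    vision: bool = False
    tool_calling: bool = True
    caching: bool = False


_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {}


def register(cap: VendorCapabilities) -> None:
    _REGISTRY[(cap.vendor, cap.model)] = cap


def get_capabilities(vendor: str, model: str) -> VendorCapabilities:
    # At most two dict probes: exact (vendor, model), then the (vendor, '*') default.
    for key in ((vendor, model), (vendor, "*")):
        if key in _REGISTRY:
            return _REGISTRY[key]
    raise KeyError(f"No capabilities registered for vendor={vendor!r} model={model!r}")


# Hypothetical registrations: a vendor-wide default plus one model-specific entry.
register(VendorCapabilities(vendor="acme", model="*"))
register(VendorCapabilities(vendor="acme", model="acme-vision", vision=True))

# An unregistered acme model resolves to the wildcard default.
fallback = get_capabilities("acme", "acme-mini")
# Frozen instances are "edited" by deriving a copy; the registry entry is unchanged.
derived = replace(get_capabilities("acme", "acme-vision"), model="acme-vision-2")
```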
### The decision tree
```
Q: Is the shape a `dict[str, Any]` or similar open form?
+-- yes:
| Q: Does the shape have a known closed set of fields?
| +-- yes:
| | Q: Are 2+ of: (multi-module, multi-call-site, stable-serialization, known-types) true?
| | +-- yes -> dataclass(frozen=True) + module-level registry (vendor_capabilities pattern)
| | +-- no -> TypeAlias (Metadata / CommsLogEntry / FileItem)
| +-- no -> TypeAlias (the open shape is the contract)
+-- no: probably already a typed dataclass; if not, see if it should be one
```
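
The decision tree can be encoded as a small function, which is handy in a review checklist or audit script. `ShapeFacts` and `choose_representation` are hypothetical names; each field answers one question from the tree, and the "2+ of" rule is a count over the four signals. The `"dataclass(frozen=True)"` result stands for the full branch, i.e. dataclass plus module-level registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapeFacts:
    """Hypothetical answers to the questions in the decision tree."""
    is_open_dict_form: bool       # Q1: dict[str, Any] or a similar open form?
    has_closed_field_set: bool    # Q2: known, closed set of fields?
    multi_module: bool = False
    multi_call_site: bool = False
    stable_serialization: bool = False
    known_types: bool = False


def choose_representation(facts: ShapeFacts) -> str:
    if not facts.is_open_dict_form:
        return "already typed? check whether it should be a dataclass"
    if not facts.has_closed_field_set:
        return "TypeAlias"  # the open shape is the contract
    # Q3: promote only when at least two of the four signals hold.
    signals = sum([facts.multi_module, facts.multi_call_site,
                   facts.stable_serialization, facts.known_types])
    return "dataclass(frozen=True)" if signals >= 2 else "TypeAlias"
```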
### The 5 worked examples (per `ANY_TYPE_AUDIT_20260621.md` §3)
The `any_type_componentization_20260621` track applies this rule to the 5 fat-struct candidates identified by the audit:
| Candidate | From | To | Sites promoted |
|---|---|---|---:|
| P1 `MCP_TOOL_SPECS` | `list[dict[str, Any]]` (45 tools) | `src/mcp_tool_specs.py: ToolSpec` + `_REGISTRY: dict[str, ToolSpec]` | 8 |
| P1 `NormalizedResponse` + `OpenAICompatibleRequest` | `list[dict[str, Any]]` fields | `src/openai_schemas.py: ChatMessage, UsageStats, ToolCall` | 17 |
| P2 7x `*_history` + 7x `*_history_lock` | 14 module globals | `src/provider_state.py: ProviderHistory` + `_PROVIDER_HISTORIES: dict[str, ProviderHistory]` | 41 |
| P2 `LogRegistry.data: dict[str, dict[str, Any]]` | Nested anonymous dict | Inline `Session` + `SessionMetadata` dataclasses | 7 |
| P3 `WebSocketMessage` + `_serialize_for_api` | `dict[str, Any]` payloads | Inline `WebSocketMessage` + `JsonValue` TypeAlias | 16 |

**Total: 89 sites promoted from `dict[str, Any]` / `list[dict[...]]` to typed dataclasses.** The remaining ~118 `Any` sites are intentional flexibility (SDK client holders, `__getattr__` dynamic dispatch, generic serialization - Patterns 3, 4, 5 per the audit).
### See Also
- `src/vendor_capabilities.py` - the canonical reference pattern
- `src/type_aliases.py` - the 10 existing TypeAliases + `FileItemsDiff` NamedTuple + the new `JsonPrimitive` / `JsonValue`
- `scripts/audit_dataclass_coverage.py` - the CI gate that enforces "no new fat-struct sites"
- `scripts/audit_weak_types.py` - the existing CI gate for the alias convention
- `conductor/code_styleguides/data_oriented_design.md` §1.2 "Design around the data" (the philosophical foundation)
- `conductor/code_styleguides/error_handling.md` - the `Result[T]` convention for `from_dict()` returns
- `docs/reports/ANY_TYPE_AUDIT_20260621.md` - the input artifact that identified the 5 candidates
- `conductor/tracks/any_type_componentization_20260621/` - the track that applied this rule