docs(reports): ANY_TYPE_AUDIT_20260621 - Any-type usage & componentization opportunities

2026-06-21 14:28:16 -04:00
parent 6210410cda
commit aca84b881b
1 changed files with 569 additions and 0 deletions
@@ -0,0 +1,569 @@
+# Audit Report: `Any` Type Usage & Data-Oriented Componentization Opportunities
+
+**Date:** 2026-06-21
+**Author:** Tier 2 Tech Lead (autonomous sandbox)
+**Track:** `data_structure_strengthening_20260606` (follow-on)
+**Status:** Findings report; **NOT a track spec** — Tier 1 is expected to devise the follow-up track.
+
+---
+
+## 1. Executive Summary
+
+The `data_structure_strengthening_20260606` track replaced 416 `dict[str, Any]` / `list[dict[...]]` / `Tuple[...]` annotations with 10 `TypeAlias` definitions + 1 `NamedTuple` (528 → 112 weak sites; 79% reduction). The 10 `TypeAlias` definitions are **renames** — they point to the same underlying `dict[str, Any]` / `list[dict[str, Any]]` shapes. The alias names document intent; they do not add type safety.
+
+This report audits the **remaining `Any` usage** (~300 occurrences across 41 files in `src/`) and identifies **fat-struct componentization opportunities** that can be promoted to true `dataclass(frozen=True)` definitions, following the pattern already established by `src/vendor_capabilities.py`. The 5 highest-value candidates are:
+
+| Rank | File | Fat Struct | Sites | Estimated Value |
+|---|---|---|---:|---|
+| **P1** | `src/mcp_client.py` | `MCP_TOOL_SPECS` (45 tools) | 8 Any | **HIGH** — 45 × ~4 params = ~180 implicit fields |
+| **P1** | `src/openai_compatible.py` | `NormalizedResponse` + `OpenAICompatibleRequest` | 17 Any | **HIGH** — message/tool-call/usage schemas are well-known |
+| **P2** | `src/ai_client.py` | 7 × `*_history: list[Metadata]` + 7 × `*_history_lock` | 41 Any | **HIGH** — unification is a `ProviderHistory` dict |
+| **P2** | `src/log_registry.py` | `data: dict[str, dict[str, Any]]` | 7 Any | MEDIUM — session metadata has 5 well-defined fields |
+| **P3** | `src/api_hooks.py` | `_serialize_for_api(obj: Any) -> Any` + `broadcast(payload)` | 16 Any | LOW — internal serialization; lower semantic gain |
+
+**The recommended sequencing** is to run `code_path_audit_20260607` FIRST (now that the 4 foundational tracks have shipped: `qwen_llama_grok`, `data_oriented_error_handling`, **`data_structure_strengthening`**, `mcp_architecture_refactor`). The audit's `ActionProfile` for the 3 in-scope actions (AI message lifecycle, discussion save/load, GUI startup) will identify which fat-struct sites are in the **hot path** vs. cold. The componentization work then targets the hot-path fat structs first.
+
+The follow-up track (proposed in §6 below) is the **"Any-Type Componentization" track**: a 6-phase refactor that converts the 5 fat-struct candidates above into true `dataclass(frozen=True)` definitions, following the `vendor_capabilities` template.
+
+---
+
+## 2. Methodology
+
+### 2.1 Scope
+
+This report covers `Any` type annotations in `src/**/*.py`. The 41 files surveyed:
+
+```
+ai_client.py (41), app_controller.py (25), openai_compatible.py (17),
+api_hooks.py (16), gui_2.py (13), events.py (13), mcp_client.py (8),
+hot_reloader.py (7), log_registry.py (7), models.py (7), command_palette.py (6),
+commands.py (6), rag_engine.py (6), theme_models.py (6), history.py (6),
+api_hooks_helpers.py (6), conductor_tech_lead.py (5), orchestrator_pm.py (5),
+imgui_scopes.py (5), file_cache.py (1), warmup.py (1), ... [21 more files ≤4]
+```
+
+### 2.2 The 5 Patterns of `Any` Usage
+
+Across all 41 files, `Any` falls into exactly 5 patterns. The patterns are ranked by **% of total occurrences**:
+
+| # | Pattern | % of `Any` | Replaceable? |
+|---|---|---:|---|
+| 1 | `dict[str, Any]` — JSON-shaped payloads (config, API bodies, tool specs) | ~35% | YES → `Metadata` (existing) or new `ToolInput`/`ApiPayload`/`SessionData` |
+| 2 | `*_history: list[Metadata]` / `list[Any]` — per-provider message lists | ~12% | YES → unified `ProviderHistory` dict |
+| 3 | SDK client holders (`_gemini_chat: Any = None`, etc.) | ~8% | NO (lazy-init pattern; heterogeneous types) |
+| 4 | Dynamic dispatch (`__getattr__` returning `Any`) | ~6% | NO (intentional delegation) |
+| 5 | Generic serialization (`obj: Any) -> Any`) | ~5% | NO (genuinely generic) |
+
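+**Roughly half of `Any` usages are replaceable with concrete dataclasses:** patterns 1 and 2 account for ~47% of occurrences in the table above. Patterns 3-5 (~19%) are intentional (SDK holders, dynamic dispatch, serialization). The five rows total ~66%, so about a third of occurrences are not attributed to a pattern in this table.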
+
+### 2.3 The Reference Pattern: `src/vendor_capabilities.py`
+
+`vendor_capabilities.py` is the **canonical "module-level abstraction layer"** the user pointed to. Its structure (76 lines):
+
+```python
+@dataclass(frozen=True)
+class VendorCapabilities:
+ vendor: str
+ model: str
+ vision: bool = False
+ tool_calling: bool = True
+ caching: bool = False
+ # ... 22 named fields total
+_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {}
+
+def register(cap: VendorCapabilities) -> None: ...
+def get_capabilities(vendor: str, model: str) -> VendorCapabilities: ...
+```
+
+**Properties that make this pattern successful:**
+
+| Property | Why it matters |
+|---|---|
+| `frozen=True` | Fields cannot be reassigned after construction, so instances are safe to share across threads; no accidental mutation |
+| Named fields | Every capability is addressable by name (no `dict['vision']` lookups) |
+| Module-level registry | O(1) lookup; no instantiation overhead |
+| Wildcard `*` model | Fallback for unregistered models |
+| Flat (no nesting) | One attribute access per query; no nested lookups to traverse |
+| Registration pattern | Extensible without modifying existing code |
+
+**All 5 fat-struct candidates below should follow this template.**
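
The registry properties listed above can be sketched as follows. This is a minimal illustration, not the contents of `vendor_capabilities.py`: the field list is truncated to the five fields shown in the excerpt, and the lookup order (exact key, then a `(vendor, "*")` fallback, then `KeyError`) is an assumption about how the wildcard behaves. The vendor and model names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorCapabilities:
    vendor: str
    model: str
    vision: bool = False
    tool_calling: bool = True
    caching: bool = False


# Keyed by (vendor, model); a model of "*" is treated as the per-vendor fallback.
_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {}


def register(cap: VendorCapabilities) -> None:
    _REGISTRY[(cap.vendor, cap.model)] = cap


def get_capabilities(vendor: str, model: str) -> VendorCapabilities:
    # Assumed lookup order: exact match first, then the vendor-wide wildcard.
    exact = _REGISTRY.get((vendor, model))
    if exact is not None:
        return exact
    fallback = _REGISTRY.get((vendor, "*"))
    if fallback is None:
        raise KeyError(f"no capabilities registered for {vendor}/{model}")
    return fallback


# Hypothetical entries, for illustration only.
register(VendorCapabilities(vendor="acme", model="*"))
register(VendorCapabilities(vendor="acme", model="acme-vision", vision=True))
```

With these two entries, `get_capabilities("acme", "acme-vision")` returns the exact entry and any other `acme` model falls back to the wildcard entry. New vendors are added by calling `register()`, without touching the lookup code.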
+
+---
+
+## 3. The Inventory: Top 5 Fat-Struct Candidates
+
+### 3.1 P1 — `src/mcp_client.py: MCP_TOOL_SPECS` (45 tools, 8 Any usages)
+
+**Current state** (`src/mcp_client.py:1954-1972`):
+
+```python
+def get_tool_schemas() -> list[dict[str, Any]]:
+ ...
+MCP_TOOL_SPECS: list[dict[str, Any]] = [
+ {
+  "name": "py_remove_def",
+  "description": "Excises a specific class or function from a Python file.",
+  "parameters": {
+   "type": "object",
+   "properties": {
+    "path": { "type": "string", "description": "Path to the .py file." },
+    "name": { "type": "string", "description": "The name of the class or function to remove." }
+   },
+   "required": ["path", "name"]
+  }
+ },
+ # ... 44 more dicts of identical shape
+]
+TOOL_NAMES: set[str] = {t['name'] for t in MCP_TOOL_SPECS}
+```
+
+**Problem:** 45 tool specs × ~3-5 parameters = ~180 implicit fields. The set comprehension `{t['name'] for t in MCP_TOOL_SPECS}` demonstrates the access pattern — repeated string-key lookups on untyped dicts. The dispatch map (`_dispatch_table`) is keyed by string tool names; static analysis cannot verify the key set.
+
+**Proposed componentization** (following the `vendor_capabilities` pattern):
+
+```python
+# src/mcp_tool_specs.py (new; module-level abstraction)
+@dataclass(frozen=True)
+class ToolParameter:
+ name: str
+ type: str           # "string" | "integer" | "boolean" | "object" | "array"
+ description: str
+ required: bool = False
+ enum: Optional[list[str]] = None
+
+@dataclass(frozen=True)
+class ToolSpec:
+ name: str
+ description: str
+ parameters: tuple[ToolParameter, ...]
+ category: str = "file"  # "file" | "ast" | "network" | "surgical"
+
+_REGISTRY: dict[str, ToolSpec] = {}
+
+def register(spec: ToolSpec) -> None: ...
+def get_tool_spec(name: str) -> ToolSpec: ...
+def get_tool_schemas() -> list[ToolSpec]: ...
+def tool_names() -> set[str]: ...
+```
+
+**Call sites to update:** `mcp_client.py:1772 dispatch()`, `mcp_client.py:1939 async_dispatch()`, the `TOOL_NAMES` set, the `_dispatch_table` map (could be annotated as `dict[str, Callable]`, with its keys checked against the registry instead of relying on bare string literals).
+
+**Estimated value:** **HIGH** — 45 tools × ~4 params each = ~180 implicit fields become explicit. Enables IDE autocomplete of tool names + parameters. Static analysis can verify dispatch keys.
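
One practical question for this refactor is how a `ToolSpec` gets back to the JSON-schema dict that the current consumers expect. The sketch below shows one way to do that, under stated assumptions: `to_schema()` is a hypothetical helper that is not part of the proposal above, and the `category="ast"` value for `py_remove_def` is a guess. The dataclasses repeat the proposed definitions so the block runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToolParameter:
    name: str
    type: str
    description: str
    required: bool = False
    enum: Optional[list[str]] = None


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    parameters: tuple[ToolParameter, ...]
    category: str = "file"


def to_schema(spec: ToolSpec) -> dict[str, Any]:
    """Hypothetical helper: rebuild the legacy JSON-schema dict from a ToolSpec."""
    properties: dict[str, Any] = {}
    for p in spec.parameters:
        prop: dict[str, Any] = {"type": p.type, "description": p.description}
        if p.enum is not None:
            prop["enum"] = list(p.enum)
        properties[p.name] = prop
    return {
        "name": spec.name,
        "description": spec.description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": [p.name for p in spec.parameters if p.required],
        },
    }


# The py_remove_def entry from the current MCP_TOOL_SPECS, as a ToolSpec.
PY_REMOVE_DEF = ToolSpec(
    name="py_remove_def",
    description="Excises a specific class or function from a Python file.",
    parameters=(
        ToolParameter("path", "string", "Path to the .py file.", required=True),
        ToolParameter("name", "string", "The name of the class or function to remove.", required=True),
    ),
    category="ast",  # assumed category; the current dict has no such key
)
```

Because `to_schema(PY_REMOVE_DEF)` reproduces the existing dict, the migration could proceed tool by tool while the wire format stays unchanged.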
+
+---
+
+### 3.2 P1 — `src/openai_compatible.py: NormalizedResponse + OpenAICompatibleRequest` (17 Any)
+
+**Current state** (`src/openai_compatible.py:22-42`):
+
+```python
+@dataclass(frozen=True)
+class NormalizedResponse:
+ text: str
+ tool_calls: list[dict[str, Any]]           # FAT: JSON tool call shape
+ usage_input_tokens: int
+ usage_output_tokens: int
+ usage_cache_read_tokens: int
+ usage_cache_creation_tokens: int
+ raw_response: Any                         # FAT: SDK-specific response
+
+@dataclass
+class OpenAICompatibleRequest:
+ messages: list[dict[str, Any]]            # FAT: message shape
+ model: str
+ temperature: float = 0.0
+ top_p: float = 1.0
+ max_tokens: int = 8192
+ tools: Optional[list[dict[str, Any]]] = None   # FAT: tool schema
+ tool_choice: str = "auto"
+ stream: bool = False
+ stream_callback: Optional[Callable[[str], None]] = None
+ extra_body: Optional[dict[str, Any]] = None   # FAT: arbitrary params
+```
+
+**Three distinct fat-struct shapes** are in this file:
+1. **Tool call** (id, type, function: {name, arguments})
+2. **Chat message** (role, content, optional tool_calls/tool_call_id/name)
+3. **Usage stats** (input_tokens, output_tokens, cache_read, cache_creation)
+
+**Proposed componentization:**
+
+```python
+# src/openai_schemas.py (new; shared between openai_compatible.py and ai_client.py)
+
+@dataclass(frozen=True)
+class ToolCallFunction:
+ name: str
+ arguments: str  # JSON string
+
+@dataclass(frozen=True)
+class ToolCall:
+ id: str
+ function: ToolCallFunction  # fields without defaults must precede defaulted ones
+ type: str = "function"
+
+@dataclass(frozen=True)
+class ChatMessage:
+ role: str  # "system" | "user" | "assistant" | "tool"
+ content: str
+ tool_calls: Optional[tuple[ToolCall, ...]] = None
+ tool_call_id: Optional[str] = None
+ name: Optional[str] = None
+
+@dataclass(frozen=True)
+class UsageStats:
+ input_tokens: int
+ output_tokens: int
+ cache_read_tokens: int = 0
+ cache_creation_tokens: int = 0
+
+# NormalizedResponse becomes:
+@dataclass(frozen=True)
+class NormalizedResponse:
+ text: str
+ tool_calls: tuple[ToolCall, ...]
+ usage: UsageStats
+ raw_response: Any  # Unavoidable: SDK-specific
+
+# OpenAICompatibleRequest becomes:
+@dataclass
+class OpenAICompatibleRequest:
+ messages: list[ChatMessage]
+ model: str
+ temperature: float = 0.0
+ # ... etc
+ tools: Optional[list[ToolSpec]] = None  # Use the §3.1 ToolSpec
+```
+
+**Call sites to update:** `_send_grok()`, `_send_minimax()`, `_send_llama()` in `ai_client.py` (3 functions); `openai_compatible.py` itself (~5 internal functions).
+
+**Estimated value:** **HIGH**. The OpenAI chat completion API is well-documented; the schema is stable; the LLM-readable documentation at <https://platform.openai.com/docs/api-reference/chat> is the source of truth. Most of the 17 `Any` usages are replaced by 4 well-named dataclasses covering the 3 shapes above; `raw_response` stays `Any`.
+
+**Cross-reference to §3.1:** The `tools: Optional[list[ToolSpec]]` field reuses the `ToolSpec` from the `mcp_client.py` refactor. One component, two consumers.
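
Converting between the typed shapes and the wire dicts is the part of this refactor that touches every call site, so a small sketch may help. The helper names `tool_call_from_dict()` and `message_to_dict()` are hypothetical, and the dataclasses are copied from the proposal (with `ToolCallFunction` defined first so the block runs on its own).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToolCallFunction:
    name: str
    arguments: str  # JSON-encoded string, as sent on the wire


@dataclass(frozen=True)
class ToolCall:
    id: str
    function: ToolCallFunction
    type: str = "function"


@dataclass(frozen=True)
class ChatMessage:
    role: str
    content: str
    tool_calls: Optional[tuple[ToolCall, ...]] = None
    tool_call_id: Optional[str] = None
    name: Optional[str] = None


def tool_call_from_dict(raw: dict[str, Any]) -> ToolCall:
    """Hypothetical parser for one entry of a response's tool_calls list."""
    fn = raw["function"]
    return ToolCall(
        id=raw["id"],
        function=ToolCallFunction(name=fn["name"], arguments=fn["arguments"]),
        type=raw.get("type", "function"),
    )


def message_to_dict(msg: ChatMessage) -> dict[str, Any]:
    """Hypothetical serializer; optional fields are omitted when unset."""
    out: dict[str, Any] = {"role": msg.role, "content": msg.content}
    if msg.tool_calls:
        out["tool_calls"] = [
            {
                "id": c.id,
                "type": c.type,
                "function": {"name": c.function.name, "arguments": c.function.arguments},
            }
            for c in msg.tool_calls
        ]
    if msg.tool_call_id is not None:
        out["tool_call_id"] = msg.tool_call_id
    if msg.name is not None:
        out["name"] = msg.name
    return out
```

Keeping the conversion in two small functions means the `_send_<provider>()` call sites work with `ChatMessage` objects and only build dicts at the SDK boundary.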
+
+---
+
+### 3.3 P2 — `src/ai_client.py: 7 × ProviderHistory` (41 Any)
+
+**Current state** (`src/ai_client.py:108-134`):
+
+```python
+_anthropic_history:      list[Metadata] = []
+_deepseek_history:       list[Metadata] = []
+_minimax_history:        list[Metadata] = []
+_qwen_history:           list[Metadata] = []
+_grok_history:           list[Metadata] = []
+_llama_history:          list[Metadata] = []
+# Plus 6 lock variables:
+_anthropic_history_lock: threading.Lock = threading.Lock()
+_deepseek_history_lock:  threading.Lock = threading.Lock()
+# ... etc
+```
+
+Plus the SDK client holders (Pattern 3, "keep as-is"):
+
+```python
+_gemini_client:      Optional[genai.Client] = None
+_gemini_chat:        Any             = None
+_gemini_cache:       Any             = None
+_deepseek_client:    Any             = None
+_minimax_client:     Any             = None
+_qwen_client:        Any             = None
+_grok_client:        Any             = None
+_llama_client:       Any             = None
+```
+
+**Problem:** 7 per-provider history lists + 7 locks = **14 module-level globals** (the excerpt above shows six of each). Each `_send_<provider>()` function mutates its own history. The `reset_session()` function knows about all 14. The cross-cutting concern is "history management" but it's spread across 14 variables.
+
+**Proposed componentization** (componentizing the history aspect; keeping the SDK clients as-is per Pattern 3):
+
+```python
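+# src/provider_state.py (new)
+import threading
+from dataclasses import dataclass, field
+# Metadata is the existing alias from src/type_aliases.py
+
+@dataclass
+class ProviderHistory:
+    messages: list[Metadata] = field(default_factory=list)
+    lock: threading.Lock = field(default_factory=threading.Lock)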
+
+    def append(self, message: Metadata) -> None:
+        with self.lock:
+            self.messages.append(message)
+
+    def get_all(self) -> list[Metadata]:
+        with self.lock:
+            return list(self.messages)
+
+    def replace_all(self, messages: list[Metadata]) -> None:
+        with self.lock:
+            self.messages = list(messages)
+
+    def clear(self) -> None:
+        with self.lock:
+            self.messages = []
+
+# Module-level: one dict instead of 14 globals
+_PROVIDER_HISTORIES: dict[str, ProviderHistory] = {
+ "anthropic": ProviderHistory(),
+ "deepseek":  ProviderHistory(),
+ "minimax":   ProviderHistory(),
+ "qwen":      ProviderHistory(),
+ "grok":      ProviderHistory(),
+ "llama":     ProviderHistory(),
+}
+
+def get_history(provider: str) -> ProviderHistory:
+    return _PROVIDER_HISTORIES[provider]
+```
+
+**Call sites to update:** All `_send_<provider>()` functions (~6 functions in `ai_client.py`); the `reset_session()` function; the `cleanup()` function. **Replaces 14 globals with 1 dict + 1 function.**
+
+**Estimated value:** **HIGH**. 14 globals → 1 dict + class. Encapsulates the lock + list behind a 4-method interface. Makes the cross-provider pattern visible: every provider has a history + lock; the `_PROVIDER_HISTORIES` dict makes the per-provider table a first-class object. This mirrors the registry half of the `vendor_capabilities` pattern (a module-level dict of dataclass instances, like `dict[tuple[str, str], VendorCapabilities]`); `ProviderHistory` itself is mutable by design, so it is not `frozen=True`.
+
+**Cross-reference to §3.2:** The `list[Metadata]` (that is, `list[dict[str, Any]]`) in `ProviderHistory.messages` could be tightened to `list[ChatMessage]` (from §3.2) if the cross-provider schema can be unified. Realistically, the LLM-provider history format is **mostly** OpenAI-compatible (`{role, content}`) but with provider-specific extras (`tool_calls` for OpenAI; `reasoning_content` for Anthropic; `parts` for Gemini). A `ProviderHistory` whose `messages` is `list[ChatMessage | dict]` (union type) is realistic for a single-track scope; full unification is a separate refactor.
+
+---
+
+### 3.4 P2 — `src/log_registry.py: Session metadata` (7 Any)
+
+**Current state** (`src/log_registry.py:58-71`):
+
+```python
+self.data: dict[str, dict[str, Any]] = {}  # session_id -> session content
+
+def get_old_non_whitelisted_sessions(self) -> list[dict[str, Any]]:
+    ...
+```
+
+The outer key is `session_id: str`. The inner dict has implicit fields: `path`, `start_time`, `whitelisted`, `metadata`.
+
+**Proposed componentization:**
+
+```python
+@dataclass(frozen=True)
+class SessionMetadata:
+ message_count: int = 0
+ errors: int = 0
+ size_kb: int = 0
+ whitelisted: bool = False
+ reason: str = ''
+ timestamp: Optional[str] = None
+
+@dataclass(frozen=True)
+class Session:
+ session_id: str
+ path: str
+ start_time: str    # ISO format
+ whitelisted: bool = False
+ metadata: Optional[SessionMetadata] = None
+
+@dataclass
+class LogRegistry:
+    registry_path: str
+    data: dict[str, Session] = field(default_factory=dict)  # typed!
+```
+
+**Call sites to update:** `session_logger.py` (`open_session()`, `close_session()`); `log_pruner.py` (`prune_old_logs()`); `gui_2.py` (Log Management panel).
+
+**Estimated value:** MEDIUM. The change is contained: one registry file plus the three call-site files listed above. It eliminates a nested `dict[str, dict[str, Any]]` (2 levels of structural anonymity) in favor of 2 named dataclasses.
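
Since the registry is persisted to disk, the typed `Session` needs a round trip to and from the existing nested-dict shape. The sketch below assumes the on-disk inner dict uses the same key names as the proposed fields; `session_from_dict()` and `session_to_dict()` are hypothetical helpers, and the sample values are invented.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SessionMetadata:
    message_count: int = 0
    errors: int = 0
    size_kb: int = 0
    whitelisted: bool = False
    reason: str = ""
    timestamp: Optional[str] = None


@dataclass(frozen=True)
class Session:
    session_id: str
    path: str
    start_time: str  # ISO format
    whitelisted: bool = False
    metadata: Optional[SessionMetadata] = None


def session_from_dict(session_id: str, raw: dict[str, Any]) -> Session:
    """Hypothetical loader: the outer registry key supplies session_id."""
    meta_raw = raw.get("metadata")
    return Session(
        session_id=session_id,
        path=raw["path"],
        start_time=raw["start_time"],
        whitelisted=bool(raw.get("whitelisted", False)),
        # Unknown metadata keys raise TypeError here, which surfaces schema drift.
        metadata=SessionMetadata(**meta_raw) if meta_raw else None,
    )


def session_to_dict(session: Session) -> dict[str, Any]:
    """Hypothetical dumper: the inverse of session_from_dict."""
    raw = asdict(session)   # recurses into the nested SessionMetadata
    raw.pop("session_id")   # the id is the outer dict key, not an inner field
    return raw
```

A strict loader like this one fails loudly on unexpected keys. Whether that is desirable for registry files already on disk is a decision for the track spec.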
+
+---
+
+### 3.5 P3 — `src/api_hooks.py: Generic payload + serialization` (16 Any)
+
+**Current state** (`src/api_hooks.py:48-134`):
+
+```python
+def _get_app_attr(app: Any, name: str, default: Any = None) -> Any: ...
+def _set_app_attr(app: Any, name: str, value: Any) -> None: ...
+def _serialize_for_api(obj: Any) -> Any: ...
+def broadcast(self, channel: str, payload: dict[str, Any]) -> None: ...
+```
+
+**Problem:** `_get_app_attr` / `_set_app_attr` are dynamic-dispatch helpers (Pattern 4, "keep as-is"). But `_serialize_for_api` and `broadcast` are the **JSON wire format** — they could be typed.
+
+**Proposed componentization:**
+
+```python
+# Recursive type for serializable JSON payloads (Python 3.12+ has the `type` statement; earlier versions need TypeAlias)
+JsonPrimitive: TypeAlias = str | int | float | bool | None
+JsonValue: TypeAlias = JsonPrimitive | list["JsonValue"] | dict[str, "JsonValue"]
+
+def _serialize_for_api(obj: Any) -> JsonValue: ...
+
+@dataclass(frozen=True)
+class WebSocketMessage:
+ channel: str
+ payload: JsonValue
+
+def broadcast(self, message: WebSocketMessage) -> None: ...
+```
+
+**Estimated value:** LOW — Internal serialization; lower semantic gain. The `JsonValue` recursive type is the main value; it makes the wire format explicit.
+
+---
+
+## 4. Patterns That Are NOT Componentization Candidates
+
+These are the `Any` usages that should **stay as-is** (intentional flexibility):
+
+### 4.1 SDK Client Holders (Pattern 3)
+
+`_gemini_chat: Any = None`, `_deepseek_client: Any = None`, etc. in `src/ai_client.py`. These are **lazy-initialized** module-level singletons. Each provider's SDK client has a different type (`genai.Client`, `anthropic.Anthropic`, `openai.OpenAI`, etc.). They don't share a base class or Protocol.
+
+A `ProviderClients` dataclass that wraps all the client holders would be possible, but the **client types** would still have to be `Any` or `Optional[ProviderX]` because the SDKs are heterogeneous. The §3.3 refactor therefore unifies only the **history aspect** (which IS homogeneous: every entry is a `list[Metadata]` with a lock) and leaves the client holders as Pattern 3.
+
+### 4.2 Dynamic Dispatch (`__getattr__`) (Pattern 4)
+
+`src/app_controller.py:1273 __getattr__`, `src/gui_2.py:742 __getattr__`, `src/commands.py:43 __getattr__`, `src/models.py:271 __getattr__`. These return `Any` because the delegated object is dynamically selected. The `__getattr__` is a known Python pattern; the return type is genuinely unknown at compile time.
+
+### 4.3 Generic Serialization (`obj: Any) -> Any`) (Pattern 5)
+
+`src/api_hooks.py:134 _serialize_for_api`, `src/app_controller.py:2144 _resolve_log_ref`. These process unknown-shaped data. The output shape mirrors the input shape. If the input is "anything from disk", the output is also "anything that can be re-serialized to disk."
+
+---
+
+## 5. The `code_path_audit_20260607` Pre-Requisite
+
+The `code_path_audit_20260607` track (spec approved 2026-06-07; revised 2026-06-08 for post-4-tracks timing) is now **unblocked**: the 4 foundational tracks it depends on (`qwen_llama_grok`, `data_oriented_error_handling`, `data_structure_strengthening`, `mcp_architecture_refactor`) have shipped (or are archivable). The audit's `trace_action` API will produce per-action profiles showing:
+
+- Which `Any` usages are in the **hot path** (e.g., `_send_<provider>` is called per request)
+- Which are in **cold paths** (e.g., `reset_session()` is called per project switch)
+- Which are in **initialization-only paths** (e.g., `_load_app_state()` is called once at startup)
+
+**The fat-struct componentization work is informed by this audit.** A `dict[str, Any]` in a hot path has a higher ROI to componentize than the same shape in a cold path (where the runtime cost is amortized). The `code_path_audit` report's `optimization_candidates.md` should specifically call out the 5 fat-struct candidates in §3 with their per-action cost estimates.
+
+### 5.1 Coordination Notes
+
+- The `code_path_audit_20260607` track's spec already mentions "fat struct" patterns indirectly (via the Casey Muratori / Andrew Reece / Ryan Fleury framing). The new `Any-typing componentization` follow-up track can cite the audit's `expensive_ops` index for each fat-struct candidate.
+- The audit's `actions/ai_message_lifecycle.tree` will show the call path from `_send_<provider>()` → `_reread_file_items()` → `_build_file_diff_text()` (the §3.3 history mutation path). This is the hot path.
+- The audit's `actions/discussion_save_load.tree` will show the `project_manager.save_project()` → `json.dumps()` (the §3.4 Session serialization path).
+
+### 5.2 Sequencing
+
+| Order | Track | Why |
+|---|---|---|
+| 1 | `code_path_audit_20260607` (run the audit) | Produces the per-action data needed to prioritize §3's 5 candidates |
+| 2 | `any_type_componentization_2026MMDD` (Tier 1 spec + plan) | Devised by Tier 1 with the audit's output as input |
+| 3 | Tier 2 implementation | 6 phases per the proposed track below |
+
+---
+
+## 6. Proposed Follow-up Track: `any_type_componentization_2026MMDD`
+
+**Suggested name:** `any_type_componentization_2026MMDD`
+**Owner:** Tier 1 (spec) → Tier 2 (implementation)
+**Priority:** Medium (developer + AI-readability; not a regression blocker)
+**Blocked by:** `code_path_audit_20260607` (the audit's report informs the spec)
+**Blocks:** None directly; enables follow-up `TypedDict migration` (per the original `data_structure_strengthening` plan §12.1)
+
+### 6.1 Goals (Priority Order)
+
+| Priority | Goal |
+|---|---|
+| **A (primary)** | Convert the 5 fat-struct candidates (§3) into `dataclass(frozen=True)` definitions following the `vendor_capabilities` template |
+| **B (architectural)** | Unify the 7 per-provider histories in `ai_client.py` (§3.3) behind a single `ProviderHistory` class + dict |
+| **C (documentation)** | Update `conductor/code_styleguides/type_aliases.md` (from `data_structure_strengthening_20260606`) with a new "When to Promote `TypeAlias` to `dataclass`" section |
+| **D (forward-looking)** | Re-evaluate the `code_path_audit`'s `expensive_ops` index after the componentization to confirm hot-path costs are reduced |
+
+### 6.2 Non-Goals (Track Scope Discipline)
+
+- **NOT** converting all 300 `Any` usages. Only the 5 fat-struct candidates in §3.
+- **NOT** converting SDK client holders (Pattern 3, §4.1). They stay as `Any` — heterogeneous types.
+- **NOT** changing the `__getattr__` dynamic-dispatch pattern (Pattern 4, §4.2). It stays as `Any` — intentional.
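+- **NOT** typing the inputs of the generic serialization functions (Pattern 5, §4.3). Their parameters stay as `Any` because they are input-driven; the only change is the `JsonValue` return type proposed in §3.5.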
+- **NOT** changing function signatures at the runtime level. The componentization is type-level + serialization-format-level.
+
+### 6.3 Suggested Phases
+
+| Phase | Work |
+|---|---|
+| 1 | `src/mcp_tool_specs.py` — new module with `ToolParameter` + `ToolSpec`; convert `MCP_TOOL_SPECS` to `list[ToolSpec]`; update `get_tool_schemas()`, `TOOL_NAMES`, dispatch map |
+| 2 | `src/openai_schemas.py` — new module with `ToolCall` + `ChatMessage` + `UsageStats`; convert `NormalizedResponse` and `OpenAICompatibleRequest`; update `_send_grok`/`_send_minimax`/`_send_llama` |
+| 3 | `src/provider_state.py` — new module with `ProviderHistory`; convert 7 histories + 7 locks to dict; update all `_send_<provider>()` and `reset_session()` |
+| 4 | `src/log_registry.py` — convert `Session` + `SessionMetadata`; update `session_logger.py` + `log_pruner.py` + `gui_2.py` |
+| 5 | `src/api_hooks.py` — add `JsonValue` recursive type; convert `WebSocketMessage`; update `broadcast()` |
+| 6 | Styleguide update + audit report + archive |
+
+### 6.4 Estimated Scope (per the `data_structure_strengthening` precedent)
+
+- **~8 source files modified** (the 5 fat-struct files, which already include `ai_client.py`, plus the §3.4 call sites in `session_logger.py`, `log_pruner.py`, and `gui_2.py`)
+- **3 new source files** (`mcp_tool_specs.py`, `openai_schemas.py`, `provider_state.py`)
+- **3 new test files** (per the TDD red-first protocol)
+- **1 styleguide update** (`type_aliases.md` — "When to Promote `TypeAlias` to `dataclass`" section)
+- **1 end-of-track report** (`docs/reports/TRACK_COMPLETION_any_type_componentization_<date>.md`)
+- **~30-50 atomic commits** (vs. `data_structure_strengthening`'s 22, because the per-file refactor is more complex)
+- **Audit followup**: re-run `code_path_audit_20260607` to confirm hot-path costs are reduced
+
+### 6.5 Convention to Document (styleguide)
+
+The new styleguide section (per `data_structure_strengthening`'s `conductor/code_styleguides/type_aliases.md`):
+
+```markdown
+## When to Promote `TypeAlias` to `dataclass`
+
+A `TypeAlias` like `Metadata: TypeAlias = dict[str, Any]` is a **rename** — the
+underlying shape is unchanged. This is appropriate when:
+
+- The shape is **truly open** (extra keys are allowed; the dict is a bag)
+- The shape is **self-describing** (caller reads `entry.get("path")` without
+  needing to know which keys are required)
+- The shape is **transient** (JSON-serialized, then deserialized; no
+  in-memory struct invariants)
+
+Promote to `dataclass(frozen=True)` when:
+
+- The shape has **a known set of required fields** with **specific types**
+  (e.g., a chat completion's `usage: UsageStats` with 4 int fields)
+- Multiple sites access the same fields with **string keys**
+  (`payload["usage"]["input_tokens"]` × 5 sites = 5× the bug surface)
+- The shape is **stable across serialization boundaries** (i.e., the
+  on-disk / on-wire format is documented and won't change per-call)
+- The shape is **shared across multiple modules** (the same schema is
+  used by `ai_client.py` and `openai_compatible.py` and `api_hooks.py`)
+
+The reference pattern is `src/vendor_capabilities.py`. When in doubt,
+follow that template: `frozen=True` dataclass + module-level registry +
+factory functions.
+
+The fat-struct candidates identified in
+`docs/reports/ANY_TYPE_AUDIT_20260621.md` (§3) are the canonical
+worked examples.
+```
+
+---
+
+## 7. Out of Scope (Explicit)
+
+The following are intentionally NOT in this report's recommendations:
+
+- **All 300 `Any` usages as a flat list.** The 5-pattern taxonomy (§2.2) groups them; the §3 fat-struct candidates are the actionable subset.
+- **Conversion of `dict[str, Any]` to `TypedDict`.** Per the original `data_structure_strengthening` plan §10, this is deferred. The proposed `dataclass(frozen=True)` approach is simpler and addresses the same problem (semantic naming).
+- **Conversion of `dict[str, Any]` to Pydantic models.** The project doesn't use Pydantic for these shapes; introducing it would be a much larger architectural decision.
+- **The 23 lower-impact files** (those with 1-9 weak `dict[str, Any]` sites each). These are deferred; the audit's `expensive_ops` index will re-prioritize them after the hot-path fat structs are componentized.
+- **Re-typing the existing `TypeAlias` definitions** (e.g., making `Metadata: TypeAlias = dict[str, Any]` a `class Metadata(dict)`). The aliases document intent; converting them to types is a separate decision.
+
+---
+
+## 8. Cross-References
+
+- `src/type_aliases.py` — the 10 `TypeAlias` definitions + `FileItemsDiff` `NamedTuple` (per `data_structure_strengthening_20260606`)
+- `src/result_types.py` — `Result[T]`, `ErrorInfo`, `NilPath`, `NilRAGState` (per `data_oriented_error_handling_20260606`)
+- `src/vendor_capabilities.py` — the reference pattern (frozen dataclass + module-level registry)
+- `src/code_path_audit.py` — future home of the `code_path_audit_20260607` tool (per the existing spec)
+- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference (per the `nagent_review_20260608` framing)
+- `conductor/code_styleguides/error_handling.md` — the `Result[T]` convention (complementary)
+- `conductor/code_styleguides/type_aliases.md` — the type-alias convention (per `data_structure_strengthening_20260606`)
+- `docs/reports/TRACK_COMPLETION_data_structure_strengthening_20260606.md` — the parent track
+- `docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md` — the precedent for this audit (211 sites → audit report → migration plan)
+- `conductor/tracks/code_path_audit_20260607/` — the prerequisite track (post-4-tracks timing)
+- `conductor/tracks/nagent_review_20260608/` — the Casey Muratori / Ryan Fleury / Andrew Reece framing
+
+---
+
+## 9. Conclusion
+
+The `data_structure_strengthening_20260606` track established the `TypeAlias` convention for naming shapes. The next logical step is **promoting the hot-path fat structs to `dataclass(frozen=True)` definitions** — the same `vendor_capabilities` pattern that the user pointed to. This report identifies 5 high-value candidates (§3), the patterns that should NOT be touched (§4), and a 6-phase proposed follow-up track (§6) that is informed by the prerequisite `code_path_audit_20260607` work.
+
+**Tier 1 is expected to devise the follow-up track spec** with the audit's per-action data as input. The spec's scope, priority, and exact phasing can be tuned to the audit's findings. The track name (`any_type_componentization_2026MMDD`) and the 6 phases in §6.3 are starting points.
+
+The single most important insight: **the `vendor_capabilities.py` pattern works because it identifies a `tuple[str, str]` (vendor × model) as a first-class key in a `dict[tuple, VendorCapabilities]`. The same pattern applied to the 5 fat-struct candidates in §3 produces the same win: shape becomes addressable, dict-key-lookups become field-access, and the static analysis can verify the contract.**