docs(reports): ANY_TYPE_AUDIT_20260621 - Any-type usage & componentization opportunities
@@ -0,0 +1,569 @@
# Audit Report: `Any` Type Usage & Data-Oriented Componentization Opportunities

**Date:** 2026-06-21
**Author:** Tier 2 Tech Lead (autonomous sandbox)
**Track:** `data_structure_strengthening_20260606` (follow-on)
**Status:** Findings report; **NOT a track spec**. Tier 1 is expected to devise the follow-up track.

---
## 1. Executive Summary

The `data_structure_strengthening_20260606` track replaced 416 `dict[str, Any]` / `list[dict[...]]` / `Tuple[...]` annotations with 10 `TypeAlias` definitions + 1 `NamedTuple` (528 → 112 weak sites; 79% reduction). The 10 `TypeAlias` definitions are **renames**: they point to the same underlying `dict[str, Any]` / `list[dict[str, Any]]` shapes. The alias names document intent; they do not add type safety.

This report audits the **remaining `Any` usage** (~300 occurrences across 41 files in `src/`) and identifies **fat-struct componentization opportunities** that can be promoted to true `dataclass(frozen=True)` definitions, following the pattern already established by `src/vendor_capabilities.py`. The 5 highest-value candidates are:

| Rank | File | Fat Struct | Sites | Estimated Value |
|---|---|---|---:|---|
| **P1** | `src/mcp_client.py` | `MCP_TOOL_SPECS` (45 tools) | 8 Any | **HIGH**: 45 × ~4 params = ~180 implicit fields |
| **P1** | `src/openai_compatible.py` | `NormalizedResponse` + `OpenAICompatibleRequest` | 17 Any | **HIGH**: message/tool-call/usage schemas are well-known |
| **P2** | `src/ai_client.py` | 6 × `*_history: list[Metadata]` + 6 × `*_history_lock` | 41 Any | **HIGH**: unification is a `ProviderHistory` dict |
| **P2** | `src/log_registry.py` | `data: dict[str, dict[str, Any]]` | 7 Any | MEDIUM: session metadata has 5 well-defined fields |
| **P3** | `src/api_hooks.py` | `_serialize_for_api(obj: Any) -> Any` + `broadcast(payload)` | 16 Any | LOW: internal serialization; lower semantic gain |

**The recommended sequencing** is to run `code_path_audit_20260607` FIRST (now that the 4 foundational tracks have shipped: `qwen_llama_grok`, `data_oriented_error_handling`, **`data_structure_strengthening`**, `mcp_architecture_refactor`). The audit's `ActionProfile` for the 3 in-scope actions (AI message lifecycle, discussion save/load, GUI startup) will identify which fat-struct sites are in the **hot path** vs. cold. The componentization work then targets the hot-path fat structs first.

The follow-up track (proposed in §6 below) is the **"Any-Type Componentization" track**: a 6-phase refactor that converts the 5 fat-struct candidates above into true `dataclass(frozen=True)` definitions, following the `vendor_capabilities` template.

---
## 2. Methodology

### 2.1 Scope

This report covers `Any` type annotations in `src/**/*.py`. The 41 files surveyed:

```
ai_client.py (41), app_controller.py (25), openai_compatible.py (17),
api_hooks.py (16), gui_2.py (13), events.py (13), mcp_client.py (8),
hot_reloader.py (7), log_registry.py (7), models.py (7), command_palette.py (6),
commands.py (6), rag_engine.py (6), theme_models.py (6), history.py (6),
api_hooks_helpers.py (6), conductor_tech_lead.py (5), orchestrator_pm.py (5),
imgui_scopes.py (5), file_cache.py (1), warmup.py (1), ... [21 more files ≤4]
```
### 2.2 The 5 Patterns of `Any` Usage

Across all 41 files, `Any` falls into 5 main patterns. The patterns are ranked by **% of total occurrences**:

| # | Pattern | % of `Any` | Replaceable? |
|---|---|---:|---|
| 1 | `dict[str, Any]`: JSON-shaped payloads (config, API bodies, tool specs) | ~35% | YES → `Metadata` (existing) or new `ToolInput`/`ApiPayload`/`SessionData` |
| 2 | `*_history: list[Metadata]` / `list[Any]`: per-provider message lists | ~12% | YES → unified `ProviderHistory` dict |
| 3 | SDK client holders (`_gemini_chat: Any = None`, etc.) | ~8% | NO (lazy-init pattern; heterogeneous types) |
| 4 | Dynamic dispatch (`__getattr__` returning `Any`) | ~6% | NO (intentional delegation) |
| 5 | Generic serialization (`obj: Any) -> Any`) | ~5% | NO (genuinely generic) |

**By the table above, Patterns 1 and 2 (~47% of `Any` usages) are replaceable with concrete dataclasses.** Patterns 3-5 (~19%) are intentional (SDK holders, dynamic dispatch, serialization). The five rows sum to ~66%; the remaining occurrences are not broken down in this report.
### 2.3 The Reference Pattern: `src/vendor_capabilities.py`

`vendor_capabilities.py` is the **canonical "module-level abstraction layer"** the user pointed to. Its structure (76 lines):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VendorCapabilities:
    vendor: str
    model: str
    vision: bool = False
    tool_calling: bool = True
    caching: bool = False
    # ... 22 named fields total

_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {}

def register(cap: VendorCapabilities) -> None: ...
def get_capabilities(vendor: str, model: str) -> VendorCapabilities: ...
```
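
The excerpt above elides the bodies of `register` and `get_capabilities`. A minimal sketch of how they might behave, assuming an exact-match lookup followed by a fallback to the vendor's `"*"` entry. The lookup order, the `"example"` vendor, and the trimmed field list are assumptions for illustration, not the contents of `src/vendor_capabilities.py`:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorCapabilities:
    vendor: str
    model: str
    vision: bool = False
    tool_calling: bool = True
    caching: bool = False


_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {}


def register(cap: VendorCapabilities) -> None:
    # The (vendor, model) pair is the registry key.
    _REGISTRY[(cap.vendor, cap.model)] = cap


def get_capabilities(vendor: str, model: str) -> VendorCapabilities:
    # Assumed order: exact match first, then the vendor's "*" wildcard entry.
    exact = _REGISTRY.get((vendor, model))
    if exact is not None:
        return exact
    return _REGISTRY[(vendor, "*")]  # raises KeyError if no wildcard is registered


# Hypothetical registrations, for illustration only.
register(VendorCapabilities(vendor="example", model="*"))
register(VendorCapabilities(vendor="example", model="vision-1", vision=True))
```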
**Properties that make this pattern successful:**

| Property | Why it matters |
|---|---|
| `frozen=True` | Attributes cannot be rebound after construction; instances are safe to share across threads as long as the field values are themselves immutable |
| Named fields | Every capability is addressable by name (no `dict['vision']` lookups) |
| Module-level registry | O(1) average-case lookup; instances are built once at registration, not per query |
| Wildcard `*` model | Fallback for unregistered models |
| Flat (no nesting) | One attribute access per query; no nested lookups |
| Registration pattern | Extensible without modifying existing code |

**All 5 fat-struct candidates below should follow this template.**

---
## 3. The Inventory: Top 5 Fat-Struct Candidates

### 3.1 P1: `src/mcp_client.py: MCP_TOOL_SPECS` (45 tools, 8 Any usages)

**Current state** (`src/mcp_client.py:1954-1972`):

```python
def get_tool_schemas() -> list[dict[str, Any]]:
    ...

MCP_TOOL_SPECS: list[dict[str, Any]] = [
    {
        "name": "py_remove_def",
        "description": "Excises a specific class or function from a Python file.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": { "type": "string", "description": "Path to the .py file." },
                "name": { "type": "string", "description": "The name of the class or function to remove." }
            },
            "required": ["path", "name"]
        }
    },
    # ... 44 more dicts of identical shape
]

TOOL_NAMES: set[str] = {t['name'] for t in MCP_TOOL_SPECS}
```
**Problem:** 45 tool specs × ~3-5 parameters = ~180 implicit fields. The set comprehension `{t['name'] for t in MCP_TOOL_SPECS}` demonstrates the access pattern: repeated string-key lookups on untyped dicts. The dispatch map (`_dispatch_table`) is keyed by string tool names; static analysis cannot verify the key set.

**Proposed componentization** (following the `vendor_capabilities` pattern):

```python
# src/mcp_tool_specs.py (new; module-level abstraction)
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ToolParameter:
    name: str
    type: str  # "string" | "integer" | "boolean" | "object" | "array"
    description: str
    required: bool = False
    # A tuple rather than a list, so frozen instances stay hashable.
    enum: Optional[tuple[str, ...]] = None

@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    parameters: tuple[ToolParameter, ...]
    category: str = "file"  # "file" | "ast" | "network" | "surgical"

_REGISTRY: dict[str, ToolSpec] = {}

def register(spec: ToolSpec) -> None: ...
def get_tool_spec(name: str) -> ToolSpec: ...
def get_tool_schemas() -> list[ToolSpec]: ...
def tool_names() -> set[str]: ...
```
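
Providers still expect the JSON-schema dict shape on the wire, so a typed `ToolSpec` needs a way back to that shape. A minimal sketch of such a conversion follows, under stated assumptions: `to_schema` and `PY_REMOVE_DEF` are hypothetical names, the `"ast"` category for this tool is a guess, and `enum` is a tuple so frozen instances stay hashable.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToolParameter:
    name: str
    type: str
    description: str
    required: bool = False
    enum: Optional[tuple[str, ...]] = None


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    parameters: tuple[ToolParameter, ...]
    category: str = "file"


def to_schema(spec: ToolSpec) -> dict[str, Any]:
    """Hypothetical helper: rebuild the dict shape currently stored in MCP_TOOL_SPECS."""
    properties: dict[str, Any] = {}
    for param in spec.parameters:
        prop: dict[str, Any] = {"type": param.type, "description": param.description}
        if param.enum is not None:
            prop["enum"] = list(param.enum)
        properties[param.name] = prop
    return {
        "name": spec.name,
        "description": spec.description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": [p.name for p in spec.parameters if p.required],
        },
    }


# The one tool quoted in the "Current state" excerpt, expressed as a ToolSpec.
PY_REMOVE_DEF = ToolSpec(
    name="py_remove_def",
    description="Excises a specific class or function from a Python file.",
    parameters=(
        ToolParameter("path", "string", "Path to the .py file.", required=True),
        ToolParameter("name", "string", "The name of the class or function to remove.", required=True),
    ),
    category="ast",  # assumed category, for illustration
)
```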
**Call sites to update:** `mcp_client.py:1772 dispatch()`, `mcp_client.py:1939 async_dispatch()`, the `TOOL_NAMES` set, the `_dispatch_table` map (could become a `dict[str, Callable]` whose keys are checked against the registry, rather than an unverified string-keyed map).

**Estimated value:** **HIGH**. 45 tools × ~4 params each = ~180 implicit fields become explicit. Enables IDE autocomplete of tool names + parameters. Static analysis can verify dispatch keys.

---
### 3.2 P1: `src/openai_compatible.py: NormalizedResponse + OpenAICompatibleRequest` (17 Any)

**Current state** (`src/openai_compatible.py:22-42`):

```python
@dataclass(frozen=True)
class NormalizedResponse:
    text: str
    tool_calls: list[dict[str, Any]]  # FAT: JSON tool call shape
    usage_input_tokens: int
    usage_output_tokens: int
    usage_cache_read_tokens: int
    usage_cache_creation_tokens: int
    raw_response: Any  # FAT: SDK-specific response

@dataclass
class OpenAICompatibleRequest:
    messages: list[dict[str, Any]]  # FAT: message shape
    model: str
    temperature: float = 0.0
    top_p: float = 1.0
    max_tokens: int = 8192
    tools: Optional[list[dict[str, Any]]] = None  # FAT: tool schema
    tool_choice: str = "auto"
    stream: bool = False
    stream_callback: Optional[Callable[[str], None]] = None
    extra_body: Optional[dict[str, Any]] = None  # FAT: arbitrary params
```
**Three distinct fat-struct shapes** are in this file:

1. **Tool call** (id, type, function: {name, arguments})
2. **Chat message** (role, content, optional tool_calls/tool_call_id/name)
3. **Usage stats** (input_tokens, output_tokens, cache_read, cache_creation)

**Proposed componentization:**

```python
# src/openai_schemas.py (new; shared between openai_compatible.py and ai_client.py)
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class ToolCallFunction:
    name: str
    arguments: str  # JSON string

@dataclass(frozen=True)
class ToolCall:
    id: str
    function: ToolCallFunction
    # Defaulted field goes last: a dataclass rejects a non-default field after a default one.
    type: str = "function"

@dataclass(frozen=True)
class ChatMessage:
    role: str  # "system" | "user" | "assistant" | "tool"
    content: str
    tool_calls: Optional[tuple[ToolCall, ...]] = None
    tool_call_id: Optional[str] = None
    name: Optional[str] = None

@dataclass(frozen=True)
class UsageStats:
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0

# NormalizedResponse becomes:
@dataclass(frozen=True)
class NormalizedResponse:
    text: str
    tool_calls: tuple[ToolCall, ...]
    usage: UsageStats
    raw_response: Any  # Unavoidable: SDK-specific

# OpenAICompatibleRequest becomes:
@dataclass
class OpenAICompatibleRequest:
    messages: list[ChatMessage]
    model: str
    temperature: float = 0.0
    # ... etc
    tools: Optional[list[ToolSpec]] = None  # Use the §3.1 ToolSpec
```
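
With these dataclasses, string-key access can be confined to a single parsing function at the SDK boundary. The sketch below shows one way that boundary could look; `tool_call_from_dict` and the sample payload are hypothetical, and the dict layout is assumed to follow the `id` / `type` / `function: {name, arguments}` shape listed above.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolCallFunction:
    name: str
    arguments: str  # JSON string, decoded by the caller when needed


@dataclass(frozen=True)
class ToolCall:
    id: str
    function: ToolCallFunction
    type: str = "function"


def tool_call_from_dict(raw: dict[str, Any]) -> ToolCall:
    """Hypothetical boundary parser: the only place that touches string keys."""
    fn = raw["function"]
    return ToolCall(
        id=raw["id"],
        function=ToolCallFunction(name=fn["name"], arguments=fn["arguments"]),
        type=raw.get("type", "function"),
    )


# Sample payload, invented for illustration.
raw_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "py_remove_def", "arguments": '{"path": "a.py", "name": "f"}'},
}
call = tool_call_from_dict(raw_call)
# Downstream code reads attributes instead of indexing nested dicts.
args = json.loads(call.function.arguments)
```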
**Call sites to update:** `_send_grok()`, `_send_minimax()`, `_send_llama()` in `ai_client.py` (3 functions); `openai_compatible.py` itself (~5 internal functions).

**Estimated value:** **HIGH**. The OpenAI chat completion API is well-documented; the schema is stable; the LLM-readable documentation at <https://platform.openai.com/docs/api-reference/chat> is the source of truth. The 17 Any usages become 3 well-named dataclasses.

**Cross-reference to §3.1:** The `tools: Optional[list[ToolSpec]]` field reuses the `ToolSpec` from the `mcp_client.py` refactor. One component, two consumers.

---
### 3.3 P2: `src/ai_client.py: 6 × ProviderHistory` (41 Any)

**Current state** (`src/ai_client.py:108-134`):

```python
_anthropic_history: list[Metadata] = []
_deepseek_history: list[Metadata] = []
_minimax_history: list[Metadata] = []
_qwen_history: list[Metadata] = []
_grok_history: list[Metadata] = []
_llama_history: list[Metadata] = []

# Plus 6 lock variables:
_anthropic_history_lock: threading.Lock = threading.Lock()
_deepseek_history_lock: threading.Lock = threading.Lock()
# ... etc
```
Plus the SDK client holders (Pattern 3, "keep as-is"):

```python
_gemini_client: Optional[genai.Client] = None
_gemini_chat: Any = None
_gemini_cache: Any = None
_deepseek_client: Any = None
_minimax_client: Any = None
_qwen_client: Any = None
_grok_client: Any = None
_llama_client: Any = None
```
**Problem:** 6 per-provider history lists + 6 locks = **12 module-level globals**. Each `_send_<provider>()` function mutates its own history. The `reset_session()` function knows about all 12. The cross-cutting concern is "history management" but it's spread across 12 variables.

**Proposed componentization** (componentizing the history aspect; keeping the SDK clients as-is per Pattern 3):

```python
# src/provider_state.py (new)
import threading
from dataclasses import dataclass, field

# Metadata is the existing alias defined in src/type_aliases.py.

@dataclass
class ProviderHistory:
    messages: list[Metadata] = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def append(self, message: Metadata) -> None:
        with self.lock:
            self.messages.append(message)

    def get_all(self) -> list[Metadata]:
        # Returns a shallow copy so callers cannot mutate the stored list.
        with self.lock:
            return list(self.messages)

    def replace_all(self, messages: list[Metadata]) -> None:
        with self.lock:
            self.messages = list(messages)

    def clear(self) -> None:
        with self.lock:
            self.messages = []

# Module-level: one dict instead of 12 globals
_PROVIDER_HISTORIES: dict[str, ProviderHistory] = {
    "anthropic": ProviderHistory(),
    "deepseek": ProviderHistory(),
    "minimax": ProviderHistory(),
    "qwen": ProviderHistory(),
    "grok": ProviderHistory(),
    "llama": ProviderHistory(),
}

def get_history(provider: str) -> ProviderHistory:
    return _PROVIDER_HISTORIES[provider]
```
**Call sites to update:** All `_send_<provider>()` functions (~6 functions in `ai_client.py`); the `reset_session()` function; the `cleanup()` function. **Replaces 12 globals with 1 dict + 1 function.**

**Estimated value:** **HIGH**. 12 globals → 1 dict + class. Encapsulates the lock + list behind a 4-method interface. Makes the cross-provider pattern visible: every provider has a history + lock; the `_PROVIDER_HISTORIES` dict makes the per-provider table a first-class object. Mirrors the `vendor_capabilities` `dict[tuple[str, str], VendorCapabilities]` pattern exactly.

**Cross-reference to §3.2:** The `list[Metadata]` (that is, `list[dict[str, Any]]`) in `ProviderHistory.messages` could be tightened to `list[ChatMessage]` (from §3.2) if the cross-provider schema can be unified. Realistic: the LLM-provider history format is **mostly** OpenAI-compatible (`{role, content}`) but with provider-specific extras (`tool_calls` for OpenAI; `reasoning_content` for Anthropic; `parts` for Gemini). A `ProviderHistory` whose `messages` is `list[ChatMessage | dict]` (union type) is realistic for a single-track scope; full unification is a separate refactor.

---
### 3.4 P2: `src/log_registry.py: Session metadata` (7 Any)

**Current state** (`src/log_registry.py:58-71`):

```python
self.data: dict[str, dict[str, Any]] = {}  # session_id -> session content

def get_old_non_whitelisted_sessions(self) -> list[dict[str, Any]]:
    ...
```
The outer key is `session_id: str`. The inner dict has implicit fields: `path`, `start_time`, `whitelisted`, `metadata`.

**Proposed componentization:**

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class SessionMetadata:
    message_count: int = 0
    errors: int = 0
    size_kb: int = 0
    whitelisted: bool = False
    reason: str = ''
    timestamp: Optional[str] = None

@dataclass(frozen=True)
class Session:
    session_id: str
    path: str
    start_time: str  # ISO format
    whitelisted: bool = False
    metadata: Optional[SessionMetadata] = None

@dataclass
class LogRegistry:
    registry_path: str
    data: dict[str, Session] = field(default_factory=dict)  # typed!
```
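
Existing registry files store each session as a nested dict, so the migration needs a loader from that shape into `Session`. A minimal sketch follows, assuming the inner dict carries the four keys named above. `session_from_dict` is a hypothetical helper, and silently dropping unknown metadata keys is an assumed policy; a stricter loader could raise instead.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass(frozen=True)
class SessionMetadata:
    message_count: int = 0
    errors: int = 0
    size_kb: int = 0
    whitelisted: bool = False
    reason: str = ""
    timestamp: Optional[str] = None


@dataclass(frozen=True)
class Session:
    session_id: str
    path: str
    start_time: str  # ISO format
    whitelisted: bool = False
    metadata: Optional[SessionMetadata] = None


_METADATA_FIELDS = {f.name for f in fields(SessionMetadata)}


def session_from_dict(session_id: str, raw: dict[str, Any]) -> Session:
    """Hypothetical loader for one entry of the existing registry dict."""
    meta_raw = raw.get("metadata")
    metadata = None
    if meta_raw:
        # Keep only keys that SessionMetadata declares (assumed policy).
        known = {k: v for k, v in meta_raw.items() if k in _METADATA_FIELDS}
        metadata = SessionMetadata(**known)
    return Session(
        session_id=session_id,
        path=raw["path"],  # required: a missing key raises KeyError at load time
        start_time=raw["start_time"],
        whitelisted=bool(raw.get("whitelisted", False)),
        metadata=metadata,
    )
```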
**Call sites to update:** `session_logger.py` (`open_session()`, `close_session()`); `log_pruner.py` (`prune_old_logs()`); `gui_2.py` (Log Management panel).

**Estimated value:** MEDIUM. Self-contained file; isolated change. Eliminates a nested `dict[str, dict[str, Any]]` (2 levels of structural anonymity) in favor of 2 named dataclasses.

---
### 3.5 P3: `src/api_hooks.py: Generic payload + serialization` (16 Any)

**Current state** (`src/api_hooks.py:48-134`):

```python
def _get_app_attr(app: Any, name: str, default: Any = None) -> Any: ...
def _set_app_attr(app: Any, name: str, value: Any) -> None: ...
def _serialize_for_api(obj: Any) -> Any: ...
def broadcast(self, channel: str, payload: dict[str, Any]) -> None: ...
```
**Problem:** `_get_app_attr` / `_set_app_attr` are dynamic-dispatch helpers (Pattern 4, "keep as-is"). But `_serialize_for_api` and `broadcast` are the **JSON wire format**; they could be typed.

**Proposed componentization:**

```python
from dataclasses import dataclass
from typing import Any, TypeAlias

# Recursive type for serializable JSON payloads
# (Python 3.12+ has the `type` statement; earlier versions need TypeAlias)
JsonPrimitive: TypeAlias = str | int | float | bool | None
JsonValue: TypeAlias = JsonPrimitive | list["JsonValue"] | dict[str, "JsonValue"]

def _serialize_for_api(obj: Any) -> JsonValue: ...

@dataclass(frozen=True)
class WebSocketMessage:
    channel: str
    payload: JsonValue

def broadcast(self, message: WebSocketMessage) -> None: ...
```
**Estimated value:** LOW. Internal serialization; lower semantic gain. The `JsonValue` recursive type is the main value; it makes the wire format explicit.

---
## 4. Patterns That Are NOT Componentization Candidates

These are the `Any` usages that should **stay as-is** (intentional flexibility):

### 4.1 SDK Client Holders (Pattern 3)

`_gemini_chat: Any = None`, `_deepseek_client: Any = None`, etc. in `src/ai_client.py`. These are **lazy-initialized** module-level singletons. Each provider's SDK client has a different type (`genai.Client`, `anthropic.Anthropic`, `openai.OpenAI`, etc.). They don't share a base class or Protocol.

A `ProviderClients` dataclass that wraps all 7 clients would be possible (and is the §3.3 discussion), but the **client types** still have to be `Any` or `Optional[ProviderX]` because the SDKs are heterogeneous. The §3.3 refactor unifies the **history aspect** (which IS homogeneous: 6 providers, all `list[Metadata]` with locks) but leaves the client holders as Pattern 3.

### 4.2 Dynamic Dispatch (`__getattr__`) (Pattern 4)

`src/app_controller.py:1273 __getattr__`, `src/gui_2.py:742 __getattr__`, `src/commands.py:43 __getattr__`, `src/models.py:271 __getattr__`. These return `Any` because the delegated object is dynamically selected. The `__getattr__` is a known Python pattern; the return type is genuinely unknown at type-check time.

### 4.3 Generic Serialization (`obj: Any) -> Any`) (Pattern 5)

`src/api_hooks.py:134 _serialize_for_api`, `src/app_controller.py:2144 _resolve_log_ref`. These process unknown-shaped data. The output shape mirrors the input shape. If the input is "anything from disk", the output is also "anything that can be re-serialized to disk." The `obj: Any` inputs stay as-is; §3.5 proposes narrowing only the return type of `_serialize_for_api` to `JsonValue`.

---
## 5. The `code_path_audit_20260607` Pre-Requisite

The `code_path_audit_20260607` track (spec approved 2026-06-07; revised 2026-06-08 for post-4-tracks timing) is now **unblocked**: the 4 foundational tracks it depends on (`qwen_llama_grok`, `data_oriented_error_handling`, `data_structure_strengthening`, `mcp_architecture_refactor`) have shipped (or are archivable). The audit's `trace_action` API will produce per-action profiles showing:

- Which `Any` usages are in the **hot path** (e.g., `_send_<provider>` is called per request)
- Which are in **cold paths** (e.g., `reset_session()` is called per project switch)
- Which are in **initialization-only paths** (e.g., `_load_app_state()` is called once at startup)

**The fat-struct componentization work is informed by this audit.** A `dict[str, Any]` in a hot path has a higher ROI to componentize than the same shape in a cold path (where the runtime cost is amortized). The `code_path_audit` report's `optimization_candidates.md` should specifically call out the 5 fat-struct candidates in §3 with their per-action cost estimates.

### 5.1 Coordination Notes

- The `code_path_audit_20260607` track's spec already mentions "fat struct" patterns indirectly (via the Casey Muratori / Andrew Reece / Ryan Fleury framing). The new "Any-Type Componentization" follow-up track can cite the audit's `expensive_ops` index for each fat-struct candidate.
- The audit's `actions/ai_message_lifecycle.tree` will show the call path from `_send_<provider>()` → `_reread_file_items()` → `_build_file_diff_text()` (the §3.3 history mutation path). This is the hot path.
- The audit's `actions/discussion_save_load.tree` will show the `project_manager.save_project()` → `json.dumps()` (the §3.4 Session serialization path).

### 5.2 Sequencing

| Order | Track | Why |
|---|---|---|
| 1 | `code_path_audit_20260607` (run the audit) | Produces the per-action data needed to prioritize §3's 5 candidates |
| 2 | `any_type_componentization_2026MMDD` (Tier 1 spec + plan) | Devised by Tier 1 with the audit's output as input |
| 3 | Tier 2 implementation | 6 phases per the proposed track below |

---
## 6. Proposed Follow-up Track: `any_type_componentization_2026MMDD`

**Suggested name:** `any_type_componentization_2026MMDD`
**Owner:** Tier 1 (spec) → Tier 2 (implementation)
**Priority:** Medium (developer + AI-readability; not a regression blocker)
**Blocked by:** `code_path_audit_20260607` (the audit's report informs the spec)
**Blocks:** None directly; enables follow-up `TypedDict migration` (per the original `data_structure_strengthening` plan §12.1)

### 6.1 Goals (Priority Order)

| Priority | Goal |
|---|---|
| **A (primary)** | Convert the 5 fat-struct candidates (§3) into `dataclass(frozen=True)` definitions following the `vendor_capabilities` template |
| **B (architectural)** | Unify the 6 per-provider histories in `ai_client.py` (§3.3) behind a single `ProviderHistory` class + dict |
| **C (documentation)** | Update `conductor/code_styleguides/type_aliases.md` (from `data_structure_strengthening_20260606`) with a new "When to Promote `TypeAlias` to `dataclass`" section |
| **D (forward-looking)** | Re-evaluate the `code_path_audit`'s `expensive_ops` index after the componentization to confirm hot-path costs are reduced |

### 6.2 Non-Goals (Track Scope Discipline)

- **NOT** converting all ~300 `Any` usages. Only the 5 fat-struct candidates in §3.
- **NOT** converting SDK client holders (Pattern 3, §4.1). They stay as `Any` (heterogeneous types).
- **NOT** changing the `__getattr__` dynamic-dispatch pattern (Pattern 4, §4.2). It stays as `Any` (intentional).
- **NOT** typing the inputs of the generic serialization functions (Pattern 5, §4.3). They stay as `Any` (input-driven); only the return type of `_serialize_for_api` narrows to `JsonValue`, per §3.5.
- **NOT** changing function signatures at the runtime level. The componentization is type-level + serialization-format-level.
### 6.3 Suggested Phases

| Phase | Work |
|---|---|
| 1 | `src/mcp_tool_specs.py`: new module with `ToolParameter` + `ToolSpec`; convert `MCP_TOOL_SPECS` to `list[ToolSpec]`; update `get_tool_schemas()`, `TOOL_NAMES`, dispatch map |
| 2 | `src/openai_schemas.py`: new module with `ToolCall` + `ChatMessage` + `UsageStats`; convert `NormalizedResponse` and `OpenAICompatibleRequest`; update `_send_grok`/`_send_minimax`/`_send_llama` |
| 3 | `src/provider_state.py`: new module with `ProviderHistory`; convert 6 histories + 6 locks to dict; update all `_send_<provider>()` and `reset_session()` |
| 4 | `src/log_registry.py`: convert `Session` + `SessionMetadata`; update `session_logger.py` + `log_pruner.py` + `gui_2.py` |
| 5 | `src/api_hooks.py`: add `JsonValue` recursive type; convert `WebSocketMessage`; update `broadcast()` |
| 6 | Styleguide update + audit report + archive |

### 6.4 Estimated Scope (per the `data_structure_strengthening` precedent)

- **5 fat-struct source files modified** (`mcp_client.py`, `openai_compatible.py`, `ai_client.py`, `log_registry.py`, `api_hooks.py`), plus the call-site files named in §3.4 (`session_logger.py`, `log_pruner.py`, `gui_2.py`)
- **3 new source files** (`mcp_tool_specs.py`, `openai_schemas.py`, `provider_state.py`)
- **3 new test files** (per the TDD red-first protocol)
- **1 styleguide update** (`type_aliases.md`: "When to Promote `TypeAlias` to `dataclass`" section)
- **1 end-of-track report** (`docs/reports/TRACK_COMPLETION_any_type_componentization_<date>.md`)
- **~30-50 atomic commits** (vs. `data_structure_strengthening`'s 22, because the per-file refactor is more complex)
- **Audit followup**: re-run `code_path_audit_20260607` to confirm hot-path costs are reduced
### 6.5 Convention to Document (styleguide)

The new styleguide section (per `data_structure_strengthening`'s `conductor/code_styleguides/type_aliases.md`):

```markdown
## When to Promote `TypeAlias` to `dataclass`

A `TypeAlias` like `Metadata: TypeAlias = dict[str, Any]` is a **rename**: the
underlying shape is unchanged. This is appropriate when:

- The shape is **truly open** (extra keys are allowed; the dict is a bag)
- The shape is **self-describing** (caller reads `entry.get("path")` without
  needing to know which keys are required)
- The shape is **transient** (JSON-serialized, then deserialized; no
  in-memory struct invariants)

Promote to `dataclass(frozen=True)` when:

- The shape has **a known set of required fields** with **specific types**
  (e.g., a chat completion's `usage: UsageStats` with 4 int fields)
- Multiple sites access the same fields with **string keys**
  (`payload["usage"]["input_tokens"]` × 5 sites = 5× the bug surface)
- The shape is **stable across serialization boundaries** (i.e., the
  on-disk / on-wire format is documented and won't change per-call)
- The shape is **shared across multiple modules** (the same schema is
  used by `ai_client.py` and `openai_compatible.py` and `api_hooks.py`)

The reference pattern is `src/vendor_capabilities.py`. When in doubt,
follow that template: `frozen=True` dataclass + module-level registry +
factory functions.

The fat-struct candidates identified in
`docs/reports/ANY_TYPE_AUDIT_20260621.md` (§3) are the canonical
worked examples.
```
---

## 7. Out of Scope (Explicit)

The following are intentionally NOT in this report's recommendations:

- **All 300 `Any` usages as a flat list.** The 5-pattern taxonomy (§2.2) groups them; the §3 fat-struct candidates are the actionable subset.
- **Conversion of `dict[str, Any]` to `TypedDict`.** Per the original `data_structure_strengthening` plan §10, this is deferred. The proposed `dataclass(frozen=True)` approach is simpler and addresses the same problem (semantic naming).
- **Conversion of `dict[str, Any]` to Pydantic models.** The project doesn't use Pydantic for these shapes; introducing it would be a much larger architectural decision.
- **The 23 lower-impact files** (those with 1-9 weak `dict[str, Any]` sites each). These are deferred; the audit's `expensive_ops` index will re-prioritize them after the hot-path fat structs are componentized.
- **Re-typing the existing `TypeAlias` definitions** (e.g., making `Metadata: TypeAlias = dict[str, Any]` a `class Metadata(dict)`). The aliases document intent; converting them to types is a separate decision.

---
## 8. Cross-References

- `src/type_aliases.py`: the 10 `TypeAlias` definitions + `FileItemsDiff` `NamedTuple` (per `data_structure_strengthening_20260606`)
- `src/result_types.py`: `Result[T]`, `ErrorInfo`, `NilPath`, `NilRAGState` (per `data_oriented_error_handling_20260606`)
- `src/vendor_capabilities.py`: the reference pattern (frozen dataclass + module-level registry)
- `src/code_path_audit.py`: future home of the `code_path_audit_20260607` tool (per the existing spec)
- `conductor/code_styleguides/data_oriented_design.md`: the canonical DOD reference (per the `nagent_review_20260608` framing)
- `conductor/code_styleguides/error_handling.md`: the `Result[T]` convention (complementary)
- `conductor/code_styleguides/type_aliases.md`: the type-alias convention (per `data_structure_strengthening_20260606`)
- `docs/reports/TRACK_COMPLETION_data_structure_strengthening_20260606.md`: the parent track
- `docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md`: the precedent for this audit (211 sites → audit report → migration plan)
- `conductor/tracks/code_path_audit_20260607/`: the prerequisite track (post-4-tracks timing)
- `conductor/tracks/nagent_review_20260608/`: the Casey Muratori / Ryan Fleury / Andrew Reece framing

---
## 9. Conclusion

The `data_structure_strengthening_20260606` track established the `TypeAlias` convention for naming shapes. The next logical step is **promoting the hot-path fat structs to `dataclass(frozen=True)` definitions**, the same `vendor_capabilities` pattern that the user pointed to. This report identifies 5 high-value candidates (§3), the patterns that should NOT be touched (§4), and a 6-phase proposed follow-up track (§6) that is informed by the prerequisite `code_path_audit_20260607` work.

**Tier 1 is expected to devise the follow-up track spec** with the audit's per-action data as input. The spec's scope, priority, and exact phasing can be tuned to the audit's findings. The track name (`any_type_componentization_2026MMDD`) and the 6 phases in §6.3 are starting points.

The single most important insight: **the `vendor_capabilities.py` pattern works because it identifies a `tuple[str, str]` (vendor × model) as a first-class key in a `dict[tuple, VendorCapabilities]`. The same pattern applied to the 5 fat-struct candidates in §3 produces the same win: shape becomes addressable, dict-key-lookups become field-access, and the static analysis can verify the contract.**