Private
Public Access
0
0

conductor(track): Initialize data_structure_strengthening_20260606

Track + metadata + state + tracks.md registration for the type-aliases
refactor that follows the audit_weak_types.py findings (430 weak sites
across 29 of 61 files; 86% concentrated in 6 high-traffic files).

Key design decisions (per user approval):
- 10 TypeAlias definitions in src/type_aliases.py (Metadata, CommsLogEntry,
  CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition,
  ToolCall, CommsLogCallback).
- 1 NamedTuple (FileItemsDiff) for the _reread_file_items return.
- Mechanical replacement of 345 weak sites across 6 files (NOT 430; the
  remaining 85 are in 23 lower-impact files deferred to future tracks).
- scripts/audit_weak_types.py gains a --strict mode and a baseline file
  (scripts/audit_weak_types.baseline.json) so the count is enforced.
- 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples
  + docs + archive.
- Honest about what's missing: TypedDict / @dataclass migration is a
  follow-up track (typed_dict_migration_20260606), not this one.
- Coexistence with the data_oriented_error_handling_20260606 track's
  Result[T] / ErrorInfo: the aliases are value-level (data types), Result
  is control-level (wrapper). They compose (Result[FileItems] is valid).
  No conflict.

Audit baseline:
- Pre-track: 430 weak sites, 0 strong patterns
- Target after Phase 1: ~60 weak sites (only the 23 lower-impact files)
- Top 4 unique type strings account for 86% of findings (4-6 aliases
  eliminate the bulk of the noise).

Not blocked by anything; can be executed independently of the other
pending tracks. Blocks typed_dict_migration_20260606 (the future Phase 2).
This commit is contained in:
2026-06-06 17:49:22 -04:00
parent 84fd9ac90e
commit ed42a97a9b
4 changed files with 552 additions and 0 deletions
+4
View File
@@ -167,6 +167,10 @@ User review surfaced five outstanding UI issues, each previously attempted witho
*Goal: Introduce Ryan Fleury's "errors are just cases" framework as a project convention. New `src/result_types.py` (ErrorKind enum, ErrorInfo dataclass, `Result[T]` with data + side-channel errors list, NilPath + NilRAGState sentinel singletons) and new `conductor/code_styleguides/error_handling.md` canonical reference. Refactor `src/mcp_client.py` ((p, err) tuples → Result; 30+ `assert p is not None` → nil-sentinel paths), `src/ai_client.py` (ProviderError exception → ErrorInfo dataclass; `_send_<vendor>()` → `_send_<vendor>_result()` returning `Result[str]`; `send()` marked `@deprecated`; new `send_result()` public API), and `src/rag_engine.py` (RAGEngine methods → Result returns). Update `conductor/product-guidelines.md` + `workflow.md` + `docs/guide_*.md` so the convention is documented and future plans can incrementally migrate the remaining `src/` files. **Blocked by** startup_speedup, test_batching_refactor, and qwen_llama_grok tracks. 5 phases: foundation+styleguide, mcp_client refactor, ai_client refactor (highest risk; ProviderError removal), rag_engine refactor, deprecation+docs+archive.*
*Follow-up: [./tracks/public_api_migration_20260606/](./tracks/public_api_migration_20260606/) (planned; not yet specced) — removes the deprecated `ai_client.send()` and migrates all callers.*
0f. [ ] **Track: Data Structure Strengthening (Type Aliases + NamedTuples)** `[track-created: <pending>]`
*Link: [./tracks/data_structure_strengthening_20260606/](./tracks/data_structure_strengthening_20260606/), Spec: [./tracks/data_structure_strengthening_20260606/spec.md](./tracks/data_structure_strengthening_20260606/spec.md), Plan: [./tracks/data_structure_strengthening_20260606/plan.md](./tracks/data_structure_strengthening_20260606/plan.md) (to be authored by writing-plans skill)*
*Goal: Improve AI-readability by naming 430 currently-anonymous `dict[str, Any]` / `list[dict[...]]` / `Tuple[...]` types. New `src/type_aliases.py` with 10 `TypeAlias` definitions (`Metadata`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `FileItem`, `FileItems`, `ToolDefinition`, `ToolCall`, `CommsLogCallback`) and 1 `NamedTuple` (`FileItemsDiff`). Mechanical replacement of 345 weak sites across 6 high-traffic files: `src/ai_client.py` (139), `src/app_controller.py` (86), `src/models.py` (51), `src/api_hook_client.py` (32), `src/project_manager.py` (20), `src/aggregate.py` (17). Add `--strict` mode to the existing `scripts/audit_weak_types.py` (committed in 84fd9ac9; found the 430 sites) so it becomes a permanent CI gate that fails when new weak types are introduced. Generate `scripts/audit_weak_types.baseline.json` with the post-refactor count. 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. **Data-grounded**: the audit script is the source of truth; the count drops from 430 to ~60 (86% reduction) in the 6 high-traffic files. **Honest about what's missing**: 23 lower-impact files remain; TypedDict/dataclass migration is deferred to a follow-up track. 2-3 days work, 1-2 phases, low risk.*
0b. [x] **Track: rag_phase4_stress_test_flake_20260606** — fixed 16412ad5
*Status: 2026-06-06 — Surfaced during post-v2 verification. Resolved: real bug, NOT a test flake. Root cause: ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (`tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/`) was created by a previous run with Gemini embeddings (3072-dim); the current run uses local SentenceTransformers (384-dim). `index_file()` upserts silently corrupt the collection, then `search()` fails with `Collection expecting embedding with dimension of 3072, got 384` and the AI request never reaches 'done' status, timing out the 50*0.5s = 25s poll loop. Fix: `RAGEngine._init_vector_store` now calls `_validate_collection_dim` which inspects the first existing vector's dim, compares to the current provider's output, and recreates the collection on mismatch (with a stderr warning). Regression tests added: `test_rag_collection_dim_mismatch_recreates_collection` and `test_rag_collection_dim_match_preserves_collection` in `tests/test_rag_engine.py`. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.*
0a. [ ] **Track: prior_session_test_harden_20260605** [superseded by live_gui_test_hardening_v2_20260605]
@@ -0,0 +1,146 @@
{
"track_id": "data_structure_strengthening_20260606",
"name": "Data Structure Strengthening (Type Aliases + NamedTuples)",
"initialized": "2026-06-06",
"owner": "tier2-tech-lead",
"priority": "medium",
"status": "active",
"type": "refactor + ai-readability + documentation",
"scope": {
"new_files": [
"src/type_aliases.py",
"tests/test_type_aliases.py",
"tests/test_audit_weak_types.py",
"conductor/code_styleguides/type_aliases.md"
],
"modified_files": [
"src/ai_client.py",
"src/app_controller.py",
"src/models.py",
"src/api_hook_client.py",
"src/project_manager.py",
"src/aggregate.py",
"conductor/product-guidelines.md",
"scripts/audit_weak_types.py"
]
},
"blocked_by": [],
"blocks": ["typed_dict_migration_20260606" /* not yet created */],
"estimated_phases": 2,
"spec": "spec.md",
"plan": "plan.md",
"priority_order": "A (6 aliases + 6-file replacement) > B (canonical names + audit CI gate) > C (NamedTuples + docs) > D (plan follow-up)",
"audit_data": {
"total_weak_findings_baseline": 430,
"files_scanned": 61,
"files_with_findings_baseline": 29,
"positive_patterns_baseline": 0,
"unique_type_strings_baseline": 26,
"top_4_unique_types_account_for_pct": 86,
"top_offender": "src/ai_client.py (139 findings, 32.3%)"
},
"type_aliases": {
"Metadata": "dict[str, Any] - the root alias; any key-value record",
"CommsLogEntry": "Metadata - a single entry in the AI comms log",
"CommsLog": "list[CommsLogEntry] - the comms log ring buffer",
"HistoryMessage": "Metadata - a single message in the AI provider history",
"History": "list[HistoryMessage] - the conversation history",
"FileItem": "Metadata - a single file in the context (path, content, is_image, etc.)",
"FileItems": "list[FileItem] - the most common weak pattern in the codebase",
"ToolDefinition": "Metadata - a single tool definition (function name, description, parameters)",
"ToolCall": "Metadata - a single tool call from the model (id, type, function)",
"CommsLogCallback": "Callable[[CommsLogEntry], None] - the callback signature"
},
"named_tuples": {
"FileItemsDiff": "NamedTuple with fields (refreshed: FileItems, changed: FileItems) - the return of _reread_file_items"
},
"refactor_targets": {
"src/ai_client.py": {
"weak_sites": 139,
"replacement_strategy": "79 dict_str_any -> Metadata/CommsLogEntry/HistoryMessage/FileItem/ToolDefinition/ToolCall; 56 list_of_dict -> CommsLog/History/FileItems/ToolDefinitions; 2 Optional[List[Dict[...]]] -> Optional[FileItems]; 2 assign_tuple_literal -> ToolCall"
},
"src/app_controller.py": {
"weak_sites": 86,
"replacement_strategy": "62 dict_str_any -> Metadata; 20 list_of_dict -> list[Metadata]; 4 optional_dict -> Optional[Metadata]"
},
"src/models.py": {
"weak_sites": 51,
"replacement_strategy": "48 dict_str_any -> Optional[Metadata]; 3 list_of_dict -> list[Metadata]"
},
"src/api_hook_client.py": {
"weak_sites": 32,
"replacement_strategy": "30 dict_str_any -> Metadata; 2 list_of_dict -> list[Metadata]"
},
"src/project_manager.py": {
"weak_sites": 20,
"replacement_strategy": "16 dict_str_any -> Metadata; 3 list_of_dict -> list[Metadata]; 1 optional_dict -> Optional[Metadata]"
},
"src/aggregate.py": {
"weak_sites": 17,
"replacement_strategy": "10 dict_str_any -> Metadata; 7 list_of_dict -> list[Metadata]"
}
},
"audit_ci_gate": {
"script": "scripts/audit_weak_types.py",
"current_mode": "informational (exit 0 always)",
"new_mode": "strict (exit 1 if new findings introduced vs baseline)",
"baseline_file": "scripts/audit_weak_types.baseline.json",
"baseline_after_phase_1": "~60 findings (only the 23 lower-impact files remain)",
"target_reduction": "430 -> ~60 (86% reduction in the 6 high-traffic files)"
},
"ai_performance_analysis": {
"win": "A name is a one-time cost the AI pays to learn, then reuses forever. With 10 aliases covering 370+ usages, the AI's vocabulary cost is bounded while the readability win is unbounded.",
"cost": "10 new names for the AI to learn. Comparable to adding 10 new function names to a module - well within normal Python codebase scale.",
"caveat": "If we add too many aliases (50+), the cognitive cost exceeds the benefit. The proposed 10 is the sweet spot. Phase 2 will convert the most-used aliases to TypedDict, which gives the AI field-level hints, not just a name.",
"honest_assessment": "Net win. The current 0 aliases is the worst case; going to 10 is a strictly better state for AI readability."
},
"coexistence_with_data_oriented_track": {
"Result_T": "The data_oriented_error_handling_20260606 track introduces Result[T] as a control-level wrapper. The aliases introduced by THIS track are value-level types (what's inside the T).",
"ErrorInfo": "Already a @dataclass from the data_oriented track; no change.",
"Result_composition": "Result[FileItems] is valid - the aliases name the T, not the Result itself."
},
"architectural_invariant": "The 6 type aliases are the CANONICAL names for the metadata family. New code MUST use them. Old code is migrated opportunistically. The audit script enforces this via the --strict mode (exits 1 if new weak sites are introduced).",
"threading_constraint": "No change. TypeAlias is type-level only; runtime behavior is identical to the underlying types. The aliases are thread-safe because dict / list / Callable are thread-safe for the operations performed.",
"verification_criteria": [
"src/type_aliases.py exists with 10 TypeAliases and 1 NamedTuple",
"All 10 aliases import successfully (tests/test_type_aliases.py)",
"Result[FileItems] is a valid generic (verified by importing)",
"scripts/audit_weak_types.py reports 370+ fewer findings after Phase 1 (~60 total)",
"scripts/audit_weak_types.py --strict mode exits 1 when a new weak site is added",
"scripts/audit_weak_types.baseline.json is committed with the post-Phase-1 count",
"src/ai_client.py: 139 weak sites -> 0 weak sites (all replaced with aliases)",
"src/app_controller.py: 86 -> 0",
"src/models.py: 51 -> 0",
"src/api_hook_client.py: 32 -> 0",
"src/project_manager.py: 20 -> 0",
"src/aggregate.py: 17 -> 0",
"Phase 2: _reread_file_items returns FileItemsDiff (NamedTuple); all call sites updated",
"Phase 2: 1-2 more tuple returns converted to NamedTuples opportunistically",
"tests/test_type_aliases.py: 8+ tests pass",
"tests/test_audit_weak_types.py: 6+ tests pass",
"tests/test_ai_client.py (existing): no regressions",
"tests/test_app_controller.py (existing): no regressions",
"tests/test_models.py (existing): no regressions",
"tests/test_api_hook_client.py (existing): no regressions",
"tests/test_project_manager.py (existing): no regressions",
"tests/test_aggregate.py (existing): no regressions",
"conductor/product-guidelines.md: new 'Data Structure Conventions' section added",
"conductor/code_styleguides/type_aliases.md: the canonical reference",
"No new threading.Thread calls in src/",
"No new Optional[X] introduced by the refactor (the aliases compose with Optional, but no NEW Optional types are added)",
"No runtime behavior changes (aliases are type-level only)"
],
"links": {
"backlog_entry": "conductor/tracks.md (to be added)",
"audit_script": "scripts/audit_weak_types.py",
"code_styleguide": "conductor/code_styleguides/type_aliases.md (to be created in Phase 2)",
"testing_guide": "docs/guide_testing.md",
"audit_baseline": "scripts/audit_weak_types.baseline.json (to be created in Phase 1)",
"related_tracks": [
"conductor/tracks/startup_speedup_20260606/",
"conductor/tracks/test_batching_refactor_20260606/",
"conductor/tracks/qwen_llama_grok_integration_20260606/",
"conductor/tracks/data_oriented_error_handling_20260606/"
]
}
}
@@ -0,0 +1,311 @@
# Track: Data Structure Strengthening (Type Aliases + NamedTuples)
**Status:** Active (spec approved 2026-06-06)
**Initialized:** 2026-06-06
**Owner:** Tier 2 Tech Lead
**Priority:** Medium (developer + AI-readability; not a regression blocker)
---
## 1. Overview
This track introduces a small, focused set of `TypeAlias` definitions in a new `src/type_aliases.py` module and replaces 370+ anonymous `dict[str, Any]` / `list[dict[...]]` usages across 6 high-traffic files (`src/ai_client.py`, `src/app_controller.py`, `src/models.py`, `src/api_hook_client.py`, `src/project_manager.py`, `src/aggregate.py`). It also converts 2-3 tuple returns to `NamedTuple`s for self-documenting struct semantics.
The track is **data-grounded**: a new AST-based audit script (`scripts/audit_weak_types.py`, committed in `84fd9ac9`) found 430 weak type sites across 29 of 61 files. After whitespace normalization, only **26 unique type strings** exist; the top 4 (`list[dict[str, Any]]`, `dict[str, Any]`, `Dict[str, Any]`, `List[Dict[str, Any]]`) account for 86% of findings. A small set of well-named aliases eliminates the vast majority.
**The current codebase has ZERO strong type aliases** (no `TypeAlias`, no `NamedTuple`, no `pydantic.BaseModel` for these shapes). This is the worst case for AI readability — an LLM reading the code has zero schema hints and must guess the shape from usage at every call site.
**Scope is deliberately bounded.** The track adds **6 type aliases** and converts **2-3 tuple returns** to NamedTuples. It does NOT migrate to `TypedDict` or `@dataclass` schemas (that's a much larger Phase 2, planned as a separate follow-up). It does NOT touch the 23 lower-impact files; they remain as `dict[str, Any]` until a future track migrates them.
## 2. Goals (Priority Order)
| Priority | Goal | Rationale |
|---|---|---|
| **A (primary value)** | Add 6 `TypeAlias` definitions to `src/type_aliases.py`: `Metadata`, `CommsLogEntry`, `CommsLog`, `FileItem`, `FileItems`, `HistoryMessage`. | Each alias names a concept that currently appears as `dict[str, Any]` or `list[dict[str, Any]]` in 30+ sites. The name is self-documenting; the underlying type is the same. |
| **A (primary value)** | Mechanical replacement of 370+ weak sites in 6 files: `src/ai_client.py`, `src/app_controller.py`, `src/models.py`, `src/api_hook_client.py`, `src/project_manager.py`, `src/aggregate.py`. | The audit shows 86% of findings are in these 6 files. A focused refactor here eliminates the bulk of the noise. |
| **B (architectural)** | The new aliases are the **canonical** names going forward. New code MUST use the aliases. Old code is migrated opportunistically (this track + future tracks). | One source of truth. The audit script (`scripts/audit_weak_types.py`) becomes a permanent CI gate that fails when new weak types are introduced. |
| **B (architectural)** | Audit script exits 0 with significantly fewer findings after the refactor. Re-running `--json` should show the count drop from 430 to ~60 (only the 23 lower-impact files remain). | Measurable success criterion. The audit script is the ground truth. |
| **C (optimization)** | Convert 2-3 tuple returns to `NamedTuple`s. Specifically: `_reread_file_items()` returns `Tuple[refreshed, changed]` becomes a `FileItemsDiff` NamedTuple. Other 1-occurrence tuples (screen coords, etc.) are converted opportunistically. | The tuple return pattern is rarer than the dict pattern (4 sites vs 430), but each conversion is high-value for self-documentation. |
| **C (documentation)** | Add a short "Data Structure Conventions" section to `conductor/product-guidelines.md` and a "Type Aliases" subsection in `conductor/code_styleguides/error_handling.md` (or a new `code_styleguides/type_aliases.md`). | The convention is visible in the project-level guidance. Future plans reference it. |
| **D (forward-looking)** | Plan a Phase-2 follow-up track: "TypedDict / dataclass Migration" that converts the most-used aliases (`CommsLogEntry`, `FileItem`) to `TypedDict` or `@dataclass(frozen=True)` so the FIELDS are visible to LLMs, not just the name. NOT in this track; documented in §12.1. | Honest about what's missing. Phase 2 is a separate effort. |
### 2.1 Non-Goals (this track)
- **Not** converting `dict[str, Any]` to `TypedDict` or `@dataclass`. That's Phase 2 of a future track. This track stops at NAMING the shapes; it does not give them structure.
- **Not** touching the 23 lower-impact files. They stay as `dict[str, Any]` until a future incremental track migrates them. The audit script makes their weakness VISIBLE so the cost of ignoring them is documented.
- **Not** changing the `Result[T]` pattern from the `data_oriented_error_handling_20260606` track. The aliases complement `Result`; they don't replace it. (`ErrorInfo` is a `@dataclass`, not a `TypeAlias`; it's already structured.)
- **Not** adding pydantic models. The project doesn't currently use pydantic for these shapes; introducing it would be a much larger architectural decision.
- **Not** modifying the data_oriented_error_handling_20260606 track's `src/result_types.py`. The aliases live in a new file (`src/type_aliases.py`); they coexist with `Result`/`ErrorInfo`.
- **Not** changing the public API of any function. The aliases are TYPE-LEVEL ONLY; runtime behavior is identical.
## 3. Architecture
### 3.1 The Aliases
`src/type_aliases.py` (NEW, ~80 lines):
```python
from typing import Any, Callable, TypeAlias
# A single key-value record. The shape is intentionally open (Any value type)
# because different concepts use different value types (str for paths, int for
# counts, dict for nested structures, etc.). The name documents the SEMANTIC
# ROLE, not the structural shape.
Metadata: TypeAlias = dict[str, Any]
# A single entry in the AI comms log (the in-memory ring buffer of API
# requests/responses/timestamps/kind/direction). Used by _comms_log,
# _append_comms, get_comms_log, comms_log_callback, etc.
CommsLogEntry: TypeAlias = Metadata
# A list of comms log entries.
CommsLog: TypeAlias = list[CommsLogEntry]
# A single entry in the AI provider's conversation history (the messages
# list passed to/from OpenAI/Anthropic/Gemini). Used by _anthropic_history,
# _deepseek_history, _minimax_history, _grok_history, _llama_history, etc.
HistoryMessage: TypeAlias = Metadata
# A list of history messages.
History: TypeAlias = list[HistoryMessage]
# A single file item in the context (path, content, is_image flag, base64
# data, mtime). Used by file_items parameter (the most-threated list in
# the codebase), _reread_file_items, _build_file_context_text, etc.
FileItem: TypeAlias = Metadata
# A list of file items. The most common weak pattern in the codebase.
FileItems: TypeAlias = list[FileItem]
# A single tool definition (function name, description, parameters schema).
# Used by _build_anthropic_tools, _CACHED_ANTHROPIC_TOOLS, _get_anthropic_tools,
# and the corresponding openai-compatible / gemini / deepseek builders.
ToolDefinition: TypeAlias = Metadata
# A single tool call from the model (id, type, function: {name, arguments}).
# Used by response.tool_calls parsing across all providers.
ToolCall: TypeAlias = Metadata
# A callback that receives a comms log entry. Used by comms_log_callback,
# confirm_and_run_callback, etc.
CommsLogCallback: TypeAlias = Callable[[CommsLogEntry], None]
```
### 3.2 The NamedTuples (Phase 2)
`src/type_aliases.py` (continued):
```python
from typing import NamedTuple
# Return type of _reread_file_items. The two lists are conceptually distinct:
# refreshed = items whose mtime was checked and the content re-read; changed =
# items whose content actually changed (subset of refreshed).
class FileItemsDiff(NamedTuple):
refreshed: FileItems
changed: FileItems
```
(Optional, if 1-2 more tuple returns warrant conversion — e.g., `Optional[Tuple[int, int, int, int]]` for screen coords, etc. — add them as separate `NamedTuple`s with semantic names.)
### 3.3 Why These Specific Aliases
The 6 aliases were chosen to be **concept-distinct**: each names a different semantic role that the code uses. Using the same name (`Metadata`) for all of them would collapse the semantic distinction; using 30 names would exceed the AI's vocabulary budget. 6 is the sweet spot:
| Alias | Semantic role | Distinct from |
|---|---|---|
| `Metadata` | generic key-value record | (root) |
| `CommsLogEntry` | a single comms log entry | `HistoryMessage` (different lifecycle) |
| `HistoryMessage` | a single AI provider history message | `CommsLogEntry` (different lifecycle) |
| `FileItem` | a single file in the context | `ToolDefinition` (different shape: paths vs function specs) |
| `ToolDefinition` | a single tool definition | `FileItem`, `ToolCall` |
| `ToolCall` | a single tool call from the model | `ToolDefinition` (definition vs invocation) |
Some of these are aliased to `Metadata` (e.g., `CommsLogEntry: TypeAlias = Metadata`). This is intentional: Phase 2 can convert `Metadata` to a `TypedDict` (or split into per-concept `TypedDict`s) and the aliases continue to work without breaking changes. The aliases are STABLE NAMES; the underlying type can evolve.
### 3.4 Module Layout
```
src/
type_aliases.py # NEW: 6 TypeAliases + 1-3 NamedTuples
ai_client.py # MODIFIED: import aliases; replace ~139 weak sites
app_controller.py # MODIFIED: import aliases; replace ~86 weak sites
models.py # MODIFIED: import aliases; replace ~51 weak sites
api_hook_client.py # MODIFIED: import aliases; replace ~32 weak sites
project_manager.py # MODIFIED: import aliases; replace ~20 weak sites
aggregate.py # MODIFIED: import aliases; replace ~17 weak sites
mcp_client.py # UNCHANGED (only 9 weak sites; below the threshold)
conductor/
product-guidelines.md # MODIFIED: new "Data Structure Conventions" section
code_styleguides/
type_aliases.md # NEW: the canonical reference (or co-located in error_handling.md)
scripts/
audit_weak_types.py # already committed in 84fd9ac9; runs as CI gate
tests/
test_type_aliases.py # NEW: verify the aliases import and resolve to the right types
(existing test files): # MODIFIED: update the 6 files; existing tests should pass unchanged
```
### 3.5 Coexistence with `Result[T]` and `ErrorInfo`
The new `Metadata` family aliases are VALUE-LEVEL types (what's in a dict). The `Result[T]` from `data_oriented_error_handling_20260606` is a CONTROL-LEVEL wrapper (a data struct that includes errors). They compose:
```python
# Data-oriented error handling returns:
Result[CommsLogEntry] # a Result wrapping a single comms log entry
Result[History] # a Result wrapping a list of history messages
Result[FileItems] # a Result wrapping a list of file items
# The aliases name the "T" in Result[T], not the Result itself.
```
This is consistent: `Result` is a generic that wraps any data type. Naming the data types (via `TypeAlias`) makes the generic concrete without changing the `Result` pattern.
## 4. Per-File Refactor Plan
### 4.1 `src/ai_client.py` (139 sites — largest offender)
**Pattern:** `_anthropic_history: list[dict[str, Any]]` (and 5 sibling histories), `_comms_log: deque[dict[str, Any]]`, `get_comms_log -> list[dict[str, Any]]`, `_build_anthropic_tools -> list[dict[str, Any]]`, `_reread_file_items -> tuple[list[...], list[...]]`, etc.
**Refactor strategy:**
- Replace all 79 `dict[str, Any]` / `Dict[str, Any]` with `Metadata` or the more specific alias.
- Replace all 56 `list[dict[...]]` with `CommsLog` / `History` / `FileItems` / `ToolDefinitions` based on the SEMANTIC ROLE of the list.
- 2 `Optional[List[Dict[...]]]` with `Optional[FileItems]` (the `_CACHED_ANTHROPIC_TOOLS` is an Optional[ToolDefinitions]).
- 2 tuple-return literal returns: the `cast(...)` patterns in `_dispatch_tool`. Replace with `ToolCall` extraction.
**Naming heuristic:** for each list of dicts, look at the variable name + the function name to determine the semantic role. E.g., `_comms_log``CommsLog`; `_anthropic_history``History`; `_build_anthropic_tools``ToolDefinitions`; `_reread_file_items(file_items: list[...])``FileItems`.
### 4.2 `src/app_controller.py` (86 sites)
**Pattern:** `_pending_dialog: Optional[ConfirmDialog] = None` (stays as-is; this is a STRONG type already), `last_error: Optional[Dict[str, str]] = None` (could be `Optional[ErrorInfo]` from the data_oriented track), but most weak sites are in the `Hook API` request/response payloads and the `pre_tool_callback` family.
**Refactor strategy:**
- The 62 `dict_str_any` sites: replace with `Metadata` or `CommsLogEntry` based on context.
- The 20 `list_of_dict` sites: replace with the appropriate alias.
- The 4 `optional_dict` sites: replace with `Optional[Metadata]` (or `Optional[CommsLogEntry]` if the context is the hook request payload).
### 4.3 `src/models.py` (51 sites)
**Pattern:** Dataclass fields. E.g., `script: Optional[str] = None` (stays as-is; STRONG), but also `target_file: Optional[str] = None` and many fields where the type is `Optional[Dict[str, Any]]` (in dataclass fields).
**Refactor strategy:** Replace 48 `dict_str_any` with `Optional[Metadata]`; 3 `list_of_dict` with the appropriate alias.
### 4.4 `src/api_hook_client.py` (32 sites)
**Pattern:** HTTP request/response payloads. E.g., `payload: Dict[str, Any]`, `data: dict[str, Any]`.
**Refactor strategy:** 30 `dict_str_any``Metadata`; 2 `list_of_dict``list[Metadata]`.
### 4.5 `src/project_manager.py` (20 sites)
**Pattern:** TOML config dicts. E.g., `proj: dict[str, Any]`, `data: dict[str, Any]`.
**Refactor strategy:** 16 `dict_str_any``Metadata`; 3 `list_of_dict``list[Metadata]`; 1 `optional_dict``Optional[Metadata]`.
### 4.6 `src/aggregate.py` (17 sites)
**Pattern:** Aggregation result dicts. E.g., `result: dict[str, list[dict[str, Any]]]`.
**Refactor strategy:** 10 `dict_str_any``Metadata`; 7 `list_of_dict` → appropriate alias.
### 4.7 Phase 2 NamedTuple conversions
- **`_reread_file_items`** in `src/ai_client.py` (returns `Tuple[List[FileItem], List[FileItem]]`) → returns `FileItemsDiff`. Affects ~3-4 call sites.
- **1-2 screen-coord tuples** (1-occurrence each) — opportunistic. If the call site is clear and the names are obvious, convert; otherwise leave.
## 5. The Audit Script as a Permanent CI Gate
After this track, the audit script becomes a permanent CI gate. `scripts/audit_weak_types.py` exits 0 even when findings exist (it's informational). The CI gate uses a stricter mode:
```bash
# New mode: --strict, exits 1 if any new weak site is added in a PR
python scripts/audit_weak_types.py --strict
```
The `--strict` mode compares the current count to a baseline (stored in `scripts/audit_weak_types.baseline.json`). If the current count is HIGHER than the baseline, exit 1. The baseline is regenerated after this track to the post-refactor count (~60 findings, only the 23 lower-impact files remain).
This is documented in the spec but the actual `--strict` mode is implemented as part of the track (Phase 1 final task). Future PRs that introduce new `dict[str, Any]` or anonymous tuples will fail CI.
## 6. Configuration
No new dependencies. No new environment variables. No new config files.
The aliases live in `src/type_aliases.py` (pure stdlib `typing.TypeAlias`).
## 7. Testing Strategy
| Test File | Purpose | Coverage Target |
|---|---|---|
| `tests/test_type_aliases.py` | Verify the aliases import; verify they resolve to the expected types; verify they compose with `Result[T]` (e.g., `Result[FileItems]` is a valid generic). | 100% |
| `tests/test_audit_weak_types.py` | Verify the audit script's regex patterns are correct; verify the `Finding` dataclass is populated correctly; verify the report matches expectations. | 90% |
| `tests/test_ai_client.py` (existing) | Verify no regressions after the 139-site replacement. | 100% (regression) |
| `tests/test_app_controller.py` (existing) | Verify no regressions after the 86-site replacement. | 100% (regression) |
| `tests/test_models.py` (existing) | Verify no regressions after the 51-site replacement. | 100% (regression) |
| `tests/test_api_hook_client.py` (existing) | Verify no regressions after the 32-site replacement. | 100% (regression) |
| `tests/test_project_manager.py` (existing) | Verify no regressions after the 20-site replacement. | 100% (regression) |
| `tests/test_aggregate.py` (existing) | Verify no regressions after the 17-site replacement. | 100% (regression) |
| `tests/test_mcp_client.py` (existing) | Verify no regressions. (mcp_client is unchanged but the aliases may be adopted opportunistically in Phase 1.5 if convenient.) | 100% (regression) |
**Mocking strategy:** Existing tests use `unittest.mock.patch`; no changes needed.
**Audit baseline check:** After Phase 1, the audit script should report 0 NEW findings (the count may go UP if a few sites were missed, but the trend is DOWN). After Phase 2, the count should be at or below the pre-track baseline minus 50 (the targeted reductions).
## 8. Migration / Rollout
| Phase | What | Risk |
|---|---|---|
| **Phase 1 — Aliases + 6-file replacement + audit baseline** | Add `src/type_aliases.py`. Add `tests/test_type_aliases.py`. Mechanical replacement in 6 files. Add `--strict` mode to the audit script. Generate the new baseline. | Medium. ~345 sites of mechanical replacement. Mitigated by existing test coverage. |
| **Phase 2 — NamedTuples + docs + archive** | Convert 2-3 tuple returns to NamedTuples. Update `conductor/product-guidelines.md` and `code_styleguides/`. Manual smoke test. Archive the track. | Low. ~3-4 sites of tuple conversion. Docs-only changes. |
Each phase has its own checkpoint commit and git note.
## 9. Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Mechanical replacement misses a few sites; the count doesn't drop as expected. | Medium | Low | The audit script is the source of truth. Re-run after Phase 1; investigate any anomalies. |
| Renaming `dict[str, Any]` to `Metadata` (or another alias) changes how some tests introspect types (e.g., `isinstance(x, dict)`). | Low | Medium | The aliases are TYPE-LEVEL ONLY; at runtime, `Metadata` IS `dict[str, Any]` IS `dict`. `isinstance(x, dict)` continues to work. Test cases that use `get_type_hints()` may need updating; documented in the test plan. |
| A future contributor adds a new `dict[str, Any]` and the audit script doesn't catch it. | Low | Low | The audit script's regex patterns are exhaustive for the current 430 findings. New patterns (e.g., a new `Mapping[str, Any]`) would be missed. The track documents the patterns the script knows; future contributions of new patterns warrant extending the script. |
| The aliases conflict with the `Result[T]` and `ErrorInfo` from the data_oriented_error_handling track. | Low | Low | The aliases are VALUE-LEVEL (data types); `Result` and `ErrorInfo` are CONTROL-LEVEL (wrappers). They compose: `Result[FileItems]` is valid. No conflict. |
| The 6-file mechanical replacement is too large to review in one PR. | Medium | Low | Phase 1 is split into 6 sub-tasks (one per file) in the plan, each with its own commit. Reviewers can review file-by-file. |
| The 23 lower-impact files are NEVER migrated. | High | Low (acceptable) | The audit script stays in the codebase as a permanent CI gate. The cost of ignoring the 23 files is now VISIBLE. Future tracks can pick them up opportunistically. |
## 10. Out of Scope (Explicit)
- **TypedDict / @dataclass migration** of the `Metadata` family. Deferred to a Phase 2 of a future track. This track adds NAMES; the next track adds STRUCTURE.
- **The 23 lower-impact files** (those with 1-9 weak sites each). Deferred; will be addressed opportunistically or in a future incremental track.
- **Adding pydantic models.** Not requested; would be a much larger architectural decision.
- **Changing function signatures at the runtime level.** The aliases are TYPE-LEVEL; runtime behavior is identical.
- **Modifying `scripts/audit_weak_types.py`'s regex patterns.** The patterns are correct for the current findings. If new patterns emerge, a future track can extend the script.
- **Migrating the data_oriented_error_handling_20260606 track's `src/result_types.py` aliases.** The 2 type-aliases modules are SEPARATE: `result_types.py` has `ErrorInfo` / `Result` / `ErrorKind`; `type_aliases.py` has `Metadata` / `CommsLog` / `FileItem` / etc. They don't overlap.
## 11. Open Questions
1. **The 6 aliases or 4?** The 6 listed in §3.1 are: `Metadata`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `FileItem`, `FileItems`, `ToolDefinition`, `ToolCall`, `CommsLogCallback`. That's 10. Should we cut to 4-6 to minimize the AI vocabulary? (Proposal: keep all 10; they're each named for a distinct concept, and the 10 names are self-explanatory. The "vocabulary cost" is the same as adding 10 new function names to a module — well within normal Python codebase scale.)
2. **Should `FileItem` and `ToolDefinition` be `TypedDict` from the start?** A `TypedDict` gives the AI field-level hints, not just a name. But introducing `TypedDict` requires knowing the FIELDS, which is a deeper semantic task. (Proposal: Phase 1 uses `TypeAlias = dict[str, Any]`; Phase 2 of a future track converts to `TypedDict`. Keeps the current track scope tight.)
3. **Should the audit script enforce a count threshold (e.g., "no more than 100 weak sites total") or a per-file threshold (e.g., "no file may have more than 50 weak sites")?** (Proposal: per-file threshold is more actionable. A future PR that introduces 20 new `dict[str, Any]` in `foo.py` would fail even if the total count didn't increase.)
## 12. See Also
### 12.1 Follow-up Track (planned; not in this spec)
**"TypedDict / dataclass Migration"** (`typed_dict_migration_20260606` or similar) — converts the most-used aliases (`CommsLogEntry`, `FileItem`, `ToolDefinition`, `HistoryMessage`) to `TypedDict` (Python 3.8+) or `@dataclass(frozen=True)` so the FIELDS are visible to LLMs, not just the name. This is the natural Phase 2 of this track. Prerequisites: this track (so the field-level schema has a stable name to attach to).
### 12.2 Project References
- `scripts/audit_weak_types.py` (already committed; `84fd9ac9`) — the audit that found 430 weak sites.
- `docs/guide_testing.md` — test conventions.
- `conductor/code_styleguides/error_handling.md` (created in the data_oriented_error_handling_20260606 track) — the convention for `Result` types; the new type-aliases convention lives alongside.
- `conductor/product-guidelines.md` "Data-Oriented Error Handling" — the convention this track extends (Data Structure Strengthening is a new top-level convention in the same family).
- `conductor/tracks/data_oriented_error_handling_20260606/` — the previous track that established the convention format; this track uses the same pattern.
### 12.3 External References
- **Python `typing.TypeAlias`** — the canonical mechanism for type aliases (PEP 613, Python 3.10+).
- **Python `typing.NamedTuple`** — for tuple-with-fields.
- **Python `typing.TypedDict`** — for the future Phase 2 (not in this track).
- **Mike Acton on data-oriented design** — the "data is the API" framing that motivates NAMING data structures clearly.
- **Casey Muratori on module layer boundaries** — the convention that each module owns its data and exposes a clear interface.
@@ -0,0 +1,91 @@
# Track state for data_structure_strengthening_20260606
# Updated by Tier 2 Tech Lead as tasks complete
[meta]
track_id = "data_structure_strengthening_20260606"
name = "Data Structure Strengthening (Type Aliases + NamedTuples)"
status = "active"
current_phase = 0
last_updated = "2026-06-06"
[phases]
phase_1 = { status = "pending", checkpointsha = "", name = "Aliases + 6-file replacement + audit baseline" }
phase_2 = { status = "pending", checkpointsha = "", name = "NamedTuples + docs + archive" }
[tasks]
# Phase 1: Aliases + 6-file replacement
t1_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_type_aliases.py (verify 10 TypeAliases + 1 NamedTuple import and resolve to expected types; verify Result[FileItems] composes)" }
t1_2 = { status = "pending", commit_sha = "", description = "Green: create src/type_aliases.py with 10 TypeAliases (Metadata, CommsLogEntry, CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition, ToolCall, CommsLogCallback) and 1 NamedTuple (FileItemsDiff)" }
t1_3 = { status = "pending", commit_sha = "", description = "Replace 139 weak sites in src/ai_client.py with the new aliases (79 dict_str_any + 56 list_of_dict + 2 Optional[List[Dict]] + 2 assign_tuple_literal)" }
t1_4 = { status = "pending", commit_sha = "", description = "Replace 86 weak sites in src/app_controller.py (62 dict_str_any + 20 list_of_dict + 4 optional_dict)" }
t1_5 = { status = "pending", commit_sha = "", description = "Replace 51 weak sites in src/models.py (48 dict_str_any + 3 list_of_dict)" }
t1_6 = { status = "pending", commit_sha = "", description = "Replace 32 weak sites in src/api_hook_client.py (30 dict_str_any + 2 list_of_dict)" }
t1_7 = { status = "pending", commit_sha = "", description = "Replace 20 weak sites in src/project_manager.py (16 dict_str_any + 3 list_of_dict + 1 optional_dict)" }
t1_8 = { status = "pending", commit_sha = "", description = "Replace 17 weak sites in src/aggregate.py (10 dict_str_any + 7 list_of_dict)" }
t1_9 = { status = "pending", commit_sha = "", description = "Add --strict mode to scripts/audit_weak_types.py (compares current count to baseline file; exits 1 if increased)" }
t1_10 = { status = "pending", commit_sha = "", description = "Generate scripts/audit_weak_types.baseline.json with the post-Phase-1 count" }
t1_11 = { status = "pending", commit_sha = "", description = "Red: tests/test_audit_weak_types.py (verify regex patterns, Finding dataclass, report format)" }
t1_12 = { status = "pending", commit_sha = "", description = "Run full test suite; confirm no regressions in 6 refactored files" }
t1_13 = { status = "pending", commit_sha = "", description = "Run audit; confirm count dropped from 430 to ~60; commit the new baseline" }
t1_14 = { status = "pending", commit_sha = "", description = "Phase 1 checkpoint commit + git note" }
# Phase 2: NamedTuples + docs + archive
t2_1 = { status = "pending", commit_sha = "", description = "Convert src/ai_client.py:_reread_file_items to return FileItemsDiff NamedTuple (replaces Tuple[List[FileItem], List[FileItem]]); update ~3-4 call sites" }
t2_2 = { status = "pending", commit_sha = "", description = "Opportunistic NamedTuple conversions for 1-2 more tuple returns (screen coords, etc.)" }
t2_3 = { status = "pending", commit_sha = "", description = "Create conductor/code_styleguides/type_aliases.md (canonical reference for the alias convention; 5 patterns + decision tree + examples)" }
t2_4 = { status = "pending", commit_sha = "", description = "Add 'Data Structure Conventions' section to conductor/product-guidelines.md (referencing the new styleguide)" }
t2_5 = { status = "pending", commit_sha = "", description = "Manual smoke test: launch GUI; verify type aliases don't break anything; verify audit --strict mode" }
t2_6 = { status = "pending", commit_sha = "", description = "Phase 2 checkpoint commit + git note (TRACK COMPLETE)" }
t2_7 = { status = "pending", commit_sha = "", description = "git mv conductor/tracks/data_structure_strengthening_20260606 to conductor/tracks/archive/" }
t2_8 = { status = "pending", commit_sha = "", description = "Update conductor/tracks.md: move entry to Recently Completed" }
t2_9 = { status = "pending", commit_sha = "", description = "Final state.toml update: mark all phases completed; add follow-up track typed_dict_migration_20260606 placeholder" }
[verification]
# Filled as phases complete
phase_1_aliases_module_complete = false
phase_1_ai_client_refactored = false
phase_1_app_controller_refactored = false
phase_1_models_refactored = false
phase_1_api_hook_client_refactored = false
phase_1_project_manager_refactored = false
phase_1_aggregate_refactored = false
phase_1_audit_strict_mode_added = false
phase_1_baseline_committed = false
phase_2_file_items_diff_named_tuple = false
phase_2_opportunistic_named_tuples = false
phase_2_styleguide_written = false
phase_2_product_guidelines_updated = false
phase_2_smoke_test_passed = false
phase_2_track_archived = false
full_test_suite_passes = false
no_new_optional_introduced = false
audit_count_dropped_to_60 = false
[audit_count_progression]
# Filled as tasks complete
baseline = 430
after_ai_client = 291
after_app_controller = 205
after_models = 154
after_api_hook_client = 122
after_project_manager = 102
after_aggregate = 85
phase_1_checkpoint_committed = 0 # TBD
phase_2_checkpoint_committed = 0 # TBD
[files_refactored]
ai_client = { weak_sites_before = 139, weak_sites_after = 0, status = "pending" }
app_controller = { weak_sites_before = 86, weak_sites_after = 0, status = "pending" }
models = { weak_sites_before = 51, weak_sites_after = 0, status = "pending" }
api_hook_client = { weak_sites_before = 32, weak_sites_after = 0, status = "pending" }
project_manager = { weak_sites_before = 20, weak_sites_after = 0, status = "pending" }
aggregate = { weak_sites_before = 17, weak_sites_after = 0, status = "pending" }
[typed_dict_migration_followup]
track_id = "typed_dict_migration_20260606"
status = "planned_in_data_structure_strengthening_20260606"
converts = ["CommsLogEntry", "FileItem", "ToolDefinition", "HistoryMessage"]
to = "TypedDict or @dataclass(frozen=True)"
[public_api_migration_followup]
# From the data_oriented_error_handling track
note = "This track does not depend on or block the public_api_migration_20260606 track. They are independent."