conductor(track): Initialize data_structure_strengthening_20260606

Track + metadata + state + tracks.md registration for the type-aliases refactor that follows the audit_weak_types.py findings (430 weak sites across 29 of 61 files; 86% concentrated in 6 high-traffic files). Key design decisions (per user approval): - 10 TypeAlias definitions in src/type_aliases.py (Metadata, CommsLogEntry, CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition, ToolCall, CommsLogCallback). - 1 NamedTuple (FileItemsDiff) for the _reread_file_items return. - Mechanical replacement of 345 weak sites across 6 files (NOT 430; the remaining 85 are in 23 lower-impact files deferred to future tracks). - scripts/audit_weak_types.py gains a --strict mode and a baseline file (scripts/audit_weak_types.baseline.json) so the count is enforced. - 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. - Honest about what's missing: TypedDict / @dataclass migration is a follow-up track (typed_dict_migration_20260606), not this one. - Coexistence with the data_oriented_error_handling_20260606 track's Result[T] / ErrorInfo: the aliases are value-level (data types), Result is control-level (wrapper). They compose (Result[FileItems] is valid). No conflict. Audit baseline: - Pre-track: 430 weak sites, 0 strong patterns - Target after Phase 1: ~60 weak sites (only the 23 lower-impact files) - Top 4 unique type strings account for 86% of findings (4-6 aliases eliminate the bulk of the noise). Not blocked by anything; can be executed independently of the other pending tracks. Blocks typed_dict_migration_20260606 (the future Phase 2).
2026-06-06 17:49:22 -04:00
parent 84fd9ac90e
commit ed42a97a9b
4 changed files with 552 additions and 0 deletions
@@ -167,6 +167,10 @@ User review surfaced five outstanding UI issues, each previously attempted witho
   *Goal: Introduce Ryan Fleury's "errors are just cases" framework as a project convention. New `src/result_types.py` (ErrorKind enum, ErrorInfo dataclass, `Result[T]` with data + side-channel errors list, NilPath + NilRAGState sentinel singletons) and new `conductor/code_styleguides/error_handling.md` canonical reference. Refactor `src/mcp_client.py` ((p, err) tuples → Result; 30+ `assert p is not None` → nil-sentinel paths), `src/ai_client.py` (ProviderError exception → ErrorInfo dataclass; `_send_<vendor>()` → `_send_<vendor>_result()` returning `Result[str]`; `send()` marked `@deprecated`; new `send_result()` public API), and `src/rag_engine.py` (RAGEngine methods → Result returns). Update `conductor/product-guidelines.md` + `workflow.md` + `docs/guide_*.md` so the convention is documented and future plans can incrementally migrate the remaining `src/` files. **Blocked by** startup_speedup, test_batching_refactor, and qwen_llama_grok tracks. 5 phases: foundation+styleguide, mcp_client refactor, ai_client refactor (highest risk; ProviderError removal), rag_engine refactor, deprecation+docs+archive.*
   *Follow-up: [./tracks/public_api_migration_20260606/](./tracks/public_api_migration_20260606/) (planned; not yet specced) — removes the deprecated `ai_client.send()` and migrates all callers.*

+0f. [ ] **Track: Data Structure Strengthening (Type Aliases + NamedTuples)** `[track-created: <pending>]`
+   *Link: [./tracks/data_structure_strengthening_20260606/](./tracks/data_structure_strengthening_20260606/), Spec: [./tracks/data_structure_strengthening_20260606/spec.md](./tracks/data_structure_strengthening_20260606/spec.md), Plan: [./tracks/data_structure_strengthening_20260606/plan.md](./tracks/data_structure_strengthening_20260606/plan.md) (to be authored by writing-plans skill)*
+   *Goal: Improve AI-readability by naming 430 currently-anonymous `dict[str, Any]` / `list[dict[...]]` / `Tuple[...]` types. New `src/type_aliases.py` with 10 `TypeAlias` definitions (`Metadata`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `FileItem`, `FileItems`, `ToolDefinition`, `ToolCall`, `CommsLogCallback`) and 1 `NamedTuple` (`FileItemsDiff`). Mechanical replacement of 345 weak sites across 6 high-traffic files: `src/ai_client.py` (139), `src/app_controller.py` (86), `src/models.py` (51), `src/api_hook_client.py` (32), `src/project_manager.py` (20), `src/aggregate.py` (17). Add `--strict` mode to the existing `scripts/audit_weak_types.py` (committed in 84fd9ac9; found the 430 sites) so it becomes a permanent CI gate that fails when new weak types are introduced. Generate `scripts/audit_weak_types.baseline.json` with the post-refactor count. 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. **Data-grounded**: the audit script is the source of truth; the count drops from 430 to ~60 (86% reduction) in the 6 high-traffic files. **Honest about what's missing**: 23 lower-impact files remain; TypedDict/dataclass migration is deferred to a follow-up track. 2-3 days work, 1-2 phases, low risk.*
+
 0b. [x] **Track: rag_phase4_stress_test_flake_20260606** — fixed 16412ad5
   *Status: 2026-06-06 — Surfaced during post-v2 verification. Resolved: real bug, NOT a test flake. Root cause: ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (`tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/`) was created by a previous run with Gemini embeddings (3072-dim); the current run uses local SentenceTransformers (384-dim). `index_file()` upserts silently corrupt the collection, then `search()` fails with `Collection expecting embedding with dimension of 3072, got 384` and the AI request never reaches 'done' status, timing out the 50*0.5s = 25s poll loop. Fix: `RAGEngine._init_vector_store` now calls `_validate_collection_dim` which inspects the first existing vector's dim, compares to the current provider's output, and recreates the collection on mismatch (with a stderr warning). Regression tests added: `test_rag_collection_dim_mismatch_recreates_collection` and `test_rag_collection_dim_match_preserves_collection` in `tests/test_rag_engine.py`. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.*
 0a. [ ] **Track: prior_session_test_harden_20260605** [superseded by live_gui_test_hardening_v2_20260605]
@@ -0,0 +1,146 @@
+{
+  "track_id": "data_structure_strengthening_20260606",
+  "name": "Data Structure Strengthening (Type Aliases + NamedTuples)",
+  "initialized": "2026-06-06",
+  "owner": "tier2-tech-lead",
+  "priority": "medium",
+  "status": "active",
+  "type": "refactor + ai-readability + documentation",
+  "scope": {
+    "new_files": [
+      "src/type_aliases.py",
+      "tests/test_type_aliases.py",
+      "tests/test_audit_weak_types.py",
+      "conductor/code_styleguides/type_aliases.md"
+    ],
+    "modified_files": [
+      "src/ai_client.py",
+      "src/app_controller.py",
+      "src/models.py",
+      "src/api_hook_client.py",
+      "src/project_manager.py",
+      "src/aggregate.py",
+      "conductor/product-guidelines.md",
+      "scripts/audit_weak_types.py"
+    ]
+  },
+  "blocked_by": [],
+  "blocks": ["typed_dict_migration_20260606" /* not yet created */],
+  "estimated_phases": 2,
+  "spec": "spec.md",
+  "plan": "plan.md",
+  "priority_order": "A (6 aliases + 6-file replacement) > B (canonical names + audit CI gate) > C (NamedTuples + docs) > D (plan follow-up)",
+  "audit_data": {
+    "total_weak_findings_baseline": 430,
+    "files_scanned": 61,
+    "files_with_findings_baseline": 29,
+    "positive_patterns_baseline": 0,
+    "unique_type_strings_baseline": 26,
+    "top_4_unique_types_account_for_pct": 86,
+    "top_offender": "src/ai_client.py (139 findings, 32.3%)"
+  },
+  "type_aliases": {
+    "Metadata": "dict[str, Any] - the root alias; any key-value record",
+    "CommsLogEntry": "Metadata - a single entry in the AI comms log",
+    "CommsLog": "list[CommsLogEntry] - the comms log ring buffer",
+    "HistoryMessage": "Metadata - a single message in the AI provider history",
+    "History": "list[HistoryMessage] - the conversation history",
+    "FileItem": "Metadata - a single file in the context (path, content, is_image, etc.)",
+    "FileItems": "list[FileItem] - the most common weak pattern in the codebase",
+    "ToolDefinition": "Metadata - a single tool definition (function name, description, parameters)",
+    "ToolCall": "Metadata - a single tool call from the model (id, type, function)",
+    "CommsLogCallback": "Callable[[CommsLogEntry], None] - the callback signature"
+  },
+  "named_tuples": {
+    "FileItemsDiff": "NamedTuple with fields (refreshed: FileItems, changed: FileItems) - the return of _reread_file_items"
+  },
+  "refactor_targets": {
+    "src/ai_client.py": {
+      "weak_sites": 139,
+      "replacement_strategy": "79 dict_str_any -> Metadata/CommsLogEntry/HistoryMessage/FileItem/ToolDefinition/ToolCall; 56 list_of_dict -> CommsLog/History/FileItems/ToolDefinitions; 2 Optional[List[Dict[...]]] -> Optional[FileItems]; 2 assign_tuple_literal -> ToolCall"
+    },
+    "src/app_controller.py": {
+      "weak_sites": 86,
+      "replacement_strategy": "62 dict_str_any -> Metadata; 20 list_of_dict -> list[Metadata]; 4 optional_dict -> Optional[Metadata]"
+    },
+    "src/models.py": {
+      "weak_sites": 51,
+      "replacement_strategy": "48 dict_str_any -> Optional[Metadata]; 3 list_of_dict -> list[Metadata]"
+    },
+    "src/api_hook_client.py": {
+      "weak_sites": 32,
+      "replacement_strategy": "30 dict_str_any -> Metadata; 2 list_of_dict -> list[Metadata]"
+    },
+    "src/project_manager.py": {
+      "weak_sites": 20,
+      "replacement_strategy": "16 dict_str_any -> Metadata; 3 list_of_dict -> list[Metadata]; 1 optional_dict -> Optional[Metadata]"
+    },
+    "src/aggregate.py": {
+      "weak_sites": 17,
+      "replacement_strategy": "10 dict_str_any -> Metadata; 7 list_of_dict -> list[Metadata]"
+    }
+  },
+  "audit_ci_gate": {
+    "script": "scripts/audit_weak_types.py",
+    "current_mode": "informational (exit 0 always)",
+    "new_mode": "strict (exit 1 if new findings introduced vs baseline)",
+    "baseline_file": "scripts/audit_weak_types.baseline.json",
+    "baseline_after_phase_1": "~60 findings (only the 23 lower-impact files remain)",
+    "target_reduction": "430 -> ~60 (86% reduction in the 6 high-traffic files)"
+  },
+  "ai_performance_analysis": {
+    "win": "A name is a one-time cost the AI pays to learn, then reuses forever. With 10 aliases covering 370+ usages, the AI's vocabulary cost is bounded while the readability win is unbounded.",
+    "cost": "10 new names for the AI to learn. Comparable to adding 10 new function names to a module - well within normal Python codebase scale.",
+    "caveat": "If we add too many aliases (50+), the cognitive cost exceeds the benefit. The proposed 10 is the sweet spot. Phase 2 will convert the most-used aliases to TypedDict, which gives the AI field-level hints, not just a name.",
+    "honest_assessment": "Net win. The current 0 aliases is the worst case; going to 10 is a strictly better state for AI readability."
+  },
+  "coexistence_with_data_oriented_track": {
+    "Result_T": "The data_oriented_error_handling_20260606 track introduces Result[T] as a control-level wrapper. The aliases introduced by THIS track are value-level types (what's inside the T).",
+    "ErrorInfo": "Already a @dataclass from the data_oriented track; no change.",
+    "Result_composition": "Result[FileItems] is valid - the aliases name the T, not the Result itself."
+  },
+  "architectural_invariant": "The 6 type aliases are the CANONICAL names for the metadata family. New code MUST use them. Old code is migrated opportunistically. The audit script enforces this via the --strict mode (exits 1 if new weak sites are introduced).",
+  "threading_constraint": "No change. TypeAlias is type-level only; runtime behavior is identical to the underlying types. The aliases are thread-safe because dict / list / Callable are thread-safe for the operations performed.",
+  "verification_criteria": [
+    "src/type_aliases.py exists with 10 TypeAliases and 1 NamedTuple",
+    "All 10 aliases import successfully (tests/test_type_aliases.py)",
+    "Result[FileItems] is a valid generic (verified by importing)",
+    "scripts/audit_weak_types.py reports 370+ fewer findings after Phase 1 (~60 total)",
+    "scripts/audit_weak_types.py --strict mode exits 1 when a new weak site is added",
+    "scripts/audit_weak_types.baseline.json is committed with the post-Phase-1 count",
+    "src/ai_client.py: 139 weak sites -> 0 weak sites (all replaced with aliases)",
+    "src/app_controller.py: 86 -> 0",
+    "src/models.py: 51 -> 0",
+    "src/api_hook_client.py: 32 -> 0",
+    "src/project_manager.py: 20 -> 0",
+    "src/aggregate.py: 17 -> 0",
+    "Phase 2: _reread_file_items returns FileItemsDiff (NamedTuple); all call sites updated",
+    "Phase 2: 1-2 more tuple returns converted to NamedTuples opportunistically",
+    "tests/test_type_aliases.py: 8+ tests pass",
+    "tests/test_audit_weak_types.py: 6+ tests pass",
+    "tests/test_ai_client.py (existing): no regressions",
+    "tests/test_app_controller.py (existing): no regressions",
+    "tests/test_models.py (existing): no regressions",
+    "tests/test_api_hook_client.py (existing): no regressions",
+    "tests/test_project_manager.py (existing): no regressions",
+    "tests/test_aggregate.py (existing): no regressions",
+    "conductor/product-guidelines.md: new 'Data Structure Conventions' section added",
+    "conductor/code_styleguides/type_aliases.md: the canonical reference",
+    "No new threading.Thread calls in src/",
+    "No new Optional[X] introduced by the refactor (the aliases compose with Optional, but no NEW Optional types are added)",
+    "No runtime behavior changes (aliases are type-level only)"
+  ],
+  "links": {
+    "backlog_entry": "conductor/tracks.md (to be added)",
+    "audit_script": "scripts/audit_weak_types.py",
+    "code_styleguide": "conductor/code_styleguides/type_aliases.md (to be created in Phase 2)",
+    "testing_guide": "docs/guide_testing.md",
+    "audit_baseline": "scripts/audit_weak_types.baseline.json (to be created in Phase 1)",
+    "related_tracks": [
+      "conductor/tracks/startup_speedup_20260606/",
+      "conductor/tracks/test_batching_refactor_20260606/",
+      "conductor/tracks/qwen_llama_grok_integration_20260606/",
+      "conductor/tracks/data_oriented_error_handling_20260606/"
+    ]
+  }
+}
@@ -0,0 +1,311 @@
+# Track: Data Structure Strengthening (Type Aliases + NamedTuples)
+
+**Status:** Active (spec approved 2026-06-06)
+**Initialized:** 2026-06-06
+**Owner:** Tier 2 Tech Lead
+**Priority:** Medium (developer + AI-readability; not a regression blocker)
+
+---
+
+## 1. Overview
+
+This track introduces a small, focused set of `TypeAlias` definitions in a new `src/type_aliases.py` module and replaces 370+ anonymous `dict[str, Any]` / `list[dict[...]]` usages across 6 high-traffic files (`src/ai_client.py`, `src/app_controller.py`, `src/models.py`, `src/api_hook_client.py`, `src/project_manager.py`, `src/aggregate.py`). It also converts 2-3 tuple returns to `NamedTuple`s for self-documenting struct semantics.
+
+The track is **data-grounded**: a new AST-based audit script (`scripts/audit_weak_types.py`, committed in `84fd9ac9`) found 430 weak type sites across 29 of 61 files. After whitespace normalization, only **26 unique type strings** exist; the top 4 (`list[dict[str, Any]]`, `dict[str, Any]`, `Dict[str, Any]`, `List[Dict[str, Any]]`) account for 86% of findings. A small set of well-named aliases eliminates the vast majority.
+
+**The current codebase has ZERO strong type aliases** (no `TypeAlias`, no `NamedTuple`, no `pydantic.BaseModel` for these shapes). This is the worst case for AI readability — an LLM reading the code has zero schema hints and must guess the shape from usage at every call site.
+
+**Scope is deliberately bounded.** The track adds **6 type aliases** and converts **2-3 tuple returns** to NamedTuples. It does NOT migrate to `TypedDict` or `@dataclass` schemas (that's a much larger Phase 2, planned as a separate follow-up). It does NOT touch the 23 lower-impact files; they remain as `dict[str, Any]` until a future track migrates them.
+
+## 2. Goals (Priority Order)
+
+| Priority | Goal | Rationale |
+|---|---|---|
+| **A (primary value)** | Add 6 `TypeAlias` definitions to `src/type_aliases.py`: `Metadata`, `CommsLogEntry`, `CommsLog`, `FileItem`, `FileItems`, `HistoryMessage`. | Each alias names a concept that currently appears as `dict[str, Any]` or `list[dict[str, Any]]` in 30+ sites. The name is self-documenting; the underlying type is the same. |
+| **A (primary value)** | Mechanical replacement of 370+ weak sites in 6 files: `src/ai_client.py`, `src/app_controller.py`, `src/models.py`, `src/api_hook_client.py`, `src/project_manager.py`, `src/aggregate.py`. | The audit shows 86% of findings are in these 6 files. A focused refactor here eliminates the bulk of the noise. |
+| **B (architectural)** | The new aliases are the **canonical** names going forward. New code MUST use the aliases. Old code is migrated opportunistically (this track + future tracks). | One source of truth. The audit script (`scripts/audit_weak_types.py`) becomes a permanent CI gate that fails when new weak types are introduced. |
+| **B (architectural)** | Audit script exits 0 with significantly fewer findings after the refactor. Re-running `--json` should show the count drop from 430 to ~60 (only the 23 lower-impact files remain). | Measurable success criterion. The audit script is the ground truth. |
+| **C (optimization)** | Convert 2-3 tuple returns to `NamedTuple`s. Specifically: `_reread_file_items()` returns `Tuple[refreshed, changed]` becomes a `FileItemsDiff` NamedTuple. Other 1-occurrence tuples (screen coords, etc.) are converted opportunistically. | The tuple return pattern is rarer than the dict pattern (4 sites vs 430), but each conversion is high-value for self-documentation. |
+| **C (documentation)** | Add a short "Data Structure Conventions" section to `conductor/product-guidelines.md` and a "Type Aliases" subsection in `conductor/code_styleguides/error_handling.md` (or a new `code_styleguides/type_aliases.md`). | The convention is visible in the project-level guidance. Future plans reference it. |
+| **D (forward-looking)** | Plan a Phase-2 follow-up track: "TypedDict / dataclass Migration" that converts the most-used aliases (`CommsLogEntry`, `FileItem`) to `TypedDict` or `@dataclass(frozen=True)` so the FIELDS are visible to LLMs, not just the name. NOT in this track; documented in §12.1. | Honest about what's missing. Phase 2 is a separate effort. |
+
+### 2.1 Non-Goals (this track)
+
+- **Not** converting `dict[str, Any]` to `TypedDict` or `@dataclass`. That's Phase 2 of a future track. This track stops at NAMING the shapes; it does not give them structure.
+- **Not** touching the 23 lower-impact files. They stay as `dict[str, Any]` until a future incremental track migrates them. The audit script makes their weakness VISIBLE so the cost of ignoring them is documented.
+- **Not** changing the `Result[T]` pattern from the `data_oriented_error_handling_20260606` track. The aliases complement `Result`; they don't replace it. (`ErrorInfo` is a `@dataclass`, not a `TypeAlias`; it's already structured.)
+- **Not** adding pydantic models. The project doesn't currently use pydantic for these shapes; introducing it would be a much larger architectural decision.
+- **Not** modifying the data_oriented_error_handling_20260606 track's `src/result_types.py`. The aliases live in a new file (`src/type_aliases.py`); they coexist with `Result`/`ErrorInfo`.
+- **Not** changing the public API of any function. The aliases are TYPE-LEVEL ONLY; runtime behavior is identical.
+
+## 3. Architecture
+
+### 3.1 The Aliases
+
+`src/type_aliases.py` (NEW, ~80 lines):
+
+```python
+from typing import Any, Callable, TypeAlias
+
+# A single key-value record. The shape is intentionally open (Any value type)
+# because different concepts use different value types (str for paths, int for
+# counts, dict for nested structures, etc.). The name documents the SEMANTIC
+# ROLE, not the structural shape.
+Metadata: TypeAlias = dict[str, Any]
+
+# A single entry in the AI comms log (the in-memory ring buffer of API
+# requests/responses/timestamps/kind/direction). Used by _comms_log,
+# _append_comms, get_comms_log, comms_log_callback, etc.
+CommsLogEntry: TypeAlias = Metadata
+
+# A list of comms log entries.
+CommsLog: TypeAlias = list[CommsLogEntry]
+
+# A single entry in the AI provider's conversation history (the messages
+# list passed to/from OpenAI/Anthropic/Gemini). Used by _anthropic_history,
+# _deepseek_history, _minimax_history, _grok_history, _llama_history, etc.
+HistoryMessage: TypeAlias = Metadata
+
+# A list of history messages.
+History: TypeAlias = list[HistoryMessage]
+
+# A single file item in the context (path, content, is_image flag, base64
+# data, mtime). Used by file_items parameter (the most-threated list in
+# the codebase), _reread_file_items, _build_file_context_text, etc.
+FileItem: TypeAlias = Metadata
+
+# A list of file items. The most common weak pattern in the codebase.
+FileItems: TypeAlias = list[FileItem]
+
+# A single tool definition (function name, description, parameters schema).
+# Used by _build_anthropic_tools, _CACHED_ANTHROPIC_TOOLS, _get_anthropic_tools,
+# and the corresponding openai-compatible / gemini / deepseek builders.
+ToolDefinition: TypeAlias = Metadata
+
+# A single tool call from the model (id, type, function: {name, arguments}).
+# Used by response.tool_calls parsing across all providers.
+ToolCall: TypeAlias = Metadata
+
+# A callback that receives a comms log entry. Used by comms_log_callback,
+# confirm_and_run_callback, etc.
+CommsLogCallback: TypeAlias = Callable[[CommsLogEntry], None]
+```
+
+### 3.2 The NamedTuples (Phase 2)
+
+`src/type_aliases.py` (continued):
+
+```python
+from typing import NamedTuple
+
+# Return type of _reread_file_items. The two lists are conceptually distinct:
+# refreshed = items whose mtime was checked and the content re-read; changed =
+# items whose content actually changed (subset of refreshed).
+class FileItemsDiff(NamedTuple):
+ refreshed: FileItems
+ changed: FileItems
+```
+
+(Optional, if 1-2 more tuple returns warrant conversion — e.g., `Optional[Tuple[int, int, int, int]]` for screen coords, etc. — add them as separate `NamedTuple`s with semantic names.)
+
+### 3.3 Why These Specific Aliases
+
+The 6 aliases were chosen to be **concept-distinct**: each names a different semantic role that the code uses. Using the same name (`Metadata`) for all of them would collapse the semantic distinction; using 30 names would exceed the AI's vocabulary budget. 6 is the sweet spot:
+
+| Alias | Semantic role | Distinct from |
+|---|---|---|
+| `Metadata` | generic key-value record | (root) |
+| `CommsLogEntry` | a single comms log entry | `HistoryMessage` (different lifecycle) |
+| `HistoryMessage` | a single AI provider history message | `CommsLogEntry` (different lifecycle) |
+| `FileItem` | a single file in the context | `ToolDefinition` (different shape: paths vs function specs) |
+| `ToolDefinition` | a single tool definition | `FileItem`, `ToolCall` |
+| `ToolCall` | a single tool call from the model | `ToolDefinition` (definition vs invocation) |
+
+Some of these are aliased to `Metadata` (e.g., `CommsLogEntry: TypeAlias = Metadata`). This is intentional: Phase 2 can convert `Metadata` to a `TypedDict` (or split into per-concept `TypedDict`s) and the aliases continue to work without breaking changes. The aliases are STABLE NAMES; the underlying type can evolve.
+
+### 3.4 Module Layout
+
+```
+src/
+  type_aliases.py              # NEW: 6 TypeAliases + 1-3 NamedTuples
+  ai_client.py                 # MODIFIED: import aliases; replace ~139 weak sites
+  app_controller.py            # MODIFIED: import aliases; replace ~86 weak sites
+  models.py                    # MODIFIED: import aliases; replace ~51 weak sites
+  api_hook_client.py           # MODIFIED: import aliases; replace ~32 weak sites
+  project_manager.py           # MODIFIED: import aliases; replace ~20 weak sites
+  aggregate.py                 # MODIFIED: import aliases; replace ~17 weak sites
+  mcp_client.py                # UNCHANGED (only 9 weak sites; below the threshold)
+
+conductor/
+  product-guidelines.md        # MODIFIED: new "Data Structure Conventions" section
+  code_styleguides/
+    type_aliases.md            # NEW: the canonical reference (or co-located in error_handling.md)
+
+scripts/
+  audit_weak_types.py          # already committed in 84fd9ac9; runs as CI gate
+
+tests/
+  test_type_aliases.py         # NEW: verify the aliases import and resolve to the right types
+  (existing test files):       # MODIFIED: update the 6 files; existing tests should pass unchanged
+```
+
+### 3.5 Coexistence with `Result[T]` and `ErrorInfo`
+
+The new `Metadata` family aliases are VALUE-LEVEL types (what's in a dict). The `Result[T]` from `data_oriented_error_handling_20260606` is a CONTROL-LEVEL wrapper (a data struct that includes errors). They compose:
+
+```python
+# Data-oriented error handling returns:
+Result[CommsLogEntry]   # a Result wrapping a single comms log entry
+Result[History]         # a Result wrapping a list of history messages
+Result[FileItems]       # a Result wrapping a list of file items
+
+# The aliases name the "T" in Result[T], not the Result itself.
+```
+
+This is consistent: `Result` is a generic that wraps any data type. Naming the data types (via `TypeAlias`) makes the generic concrete without changing the `Result` pattern.
+
+## 4. Per-File Refactor Plan
+
+### 4.1 `src/ai_client.py` (139 sites — largest offender)
+
+**Pattern:** `_anthropic_history: list[dict[str, Any]]` (and 5 sibling histories), `_comms_log: deque[dict[str, Any]]`, `get_comms_log -> list[dict[str, Any]]`, `_build_anthropic_tools -> list[dict[str, Any]]`, `_reread_file_items -> tuple[list[...], list[...]]`, etc.
+
+**Refactor strategy:**
+- Replace all 79 `dict[str, Any]` / `Dict[str, Any]` with `Metadata` or the more specific alias.
+- Replace all 56 `list[dict[...]]` with `CommsLog` / `History` / `FileItems` / `ToolDefinitions` based on the SEMANTIC ROLE of the list.
+- 2 `Optional[List[Dict[...]]]` with `Optional[FileItems]` (the `_CACHED_ANTHROPIC_TOOLS` is an Optional[ToolDefinitions]).
+- 2 tuple-return literal returns: the `cast(...)` patterns in `_dispatch_tool`. Replace with `ToolCall` extraction.
+
+**Naming heuristic:** for each list of dicts, look at the variable name + the function name to determine the semantic role. E.g., `_comms_log` → `CommsLog`; `_anthropic_history` → `History`; `_build_anthropic_tools` → `ToolDefinitions`; `_reread_file_items(file_items: list[...])` → `FileItems`.
+
+### 4.2 `src/app_controller.py` (86 sites)
+
+**Pattern:** `_pending_dialog: Optional[ConfirmDialog] = None` (stays as-is; this is a STRONG type already), `last_error: Optional[Dict[str, str]] = None` (could be `Optional[ErrorInfo]` from the data_oriented track), but most weak sites are in the `Hook API` request/response payloads and the `pre_tool_callback` family.
+
+**Refactor strategy:**
+- The 62 `dict_str_any` sites: replace with `Metadata` or `CommsLogEntry` based on context.
+- The 20 `list_of_dict` sites: replace with the appropriate alias.
+- The 4 `optional_dict` sites: replace with `Optional[Metadata]` (or `Optional[CommsLogEntry]` if the context is the hook request payload).
+
+### 4.3 `src/models.py` (51 sites)
+
+**Pattern:** Dataclass fields. E.g., `script: Optional[str] = None` (stays as-is; STRONG), but also `target_file: Optional[str] = None` and many fields where the type is `Optional[Dict[str, Any]]` (in dataclass fields).
+
+**Refactor strategy:** Replace 48 `dict_str_any` with `Optional[Metadata]`; 3 `list_of_dict` with the appropriate alias.
+
+### 4.4 `src/api_hook_client.py` (32 sites)
+
+**Pattern:** HTTP request/response payloads. E.g., `payload: Dict[str, Any]`, `data: dict[str, Any]`.
+
+**Refactor strategy:** 30 `dict_str_any` → `Metadata`; 2 `list_of_dict` → `list[Metadata]`.
+
+### 4.5 `src/project_manager.py` (20 sites)
+
+**Pattern:** TOML config dicts. E.g., `proj: dict[str, Any]`, `data: dict[str, Any]`.
+
+**Refactor strategy:** 16 `dict_str_any` → `Metadata`; 3 `list_of_dict` → `list[Metadata]`; 1 `optional_dict` → `Optional[Metadata]`.
+
+### 4.6 `src/aggregate.py` (17 sites)
+
+**Pattern:** Aggregation result dicts. E.g., `result: dict[str, list[dict[str, Any]]]`.
+
+**Refactor strategy:** 10 `dict_str_any` → `Metadata`; 7 `list_of_dict` → appropriate alias.
+
+### 4.7 Phase 2 NamedTuple conversions
+
+- **`_reread_file_items`** in `src/ai_client.py` (returns `Tuple[List[FileItem], List[FileItem]]`) → returns `FileItemsDiff`. Affects ~3-4 call sites.
+- **1-2 screen-coord tuples** (1-occurrence each) — opportunistic. If the call site is clear and the names are obvious, convert; otherwise leave.
+
+## 5. The Audit Script as a Permanent CI Gate
+
+After this track, the audit script becomes a permanent CI gate. `scripts/audit_weak_types.py` exits 0 even when findings exist (it's informational). The CI gate uses a stricter mode:
+
+```bash
+# New mode: --strict, exits 1 if any new weak site is added in a PR
+python scripts/audit_weak_types.py --strict
+```
+
+The `--strict` mode compares the current count to a baseline (stored in `scripts/audit_weak_types.baseline.json`). If the current count is HIGHER than the baseline, exit 1. The baseline is regenerated after this track to the post-refactor count (~60 findings, only the 23 lower-impact files remain).
+
+This is documented in the spec but the actual `--strict` mode is implemented as part of the track (Phase 1 final task). Future PRs that introduce new `dict[str, Any]` or anonymous tuples will fail CI.
+
+## 6. Configuration
+
+No new dependencies. No new environment variables. No new config files.
+
+The aliases live in `src/type_aliases.py` (pure stdlib `typing.TypeAlias`).
+
+## 7. Testing Strategy
+
+| Test File | Purpose | Coverage Target |
+|---|---|---|
+| `tests/test_type_aliases.py` | Verify the aliases import; verify they resolve to the expected types; verify they compose with `Result[T]` (e.g., `Result[FileItems]` is a valid generic). | 100% |
+| `tests/test_audit_weak_types.py` | Verify the audit script's regex patterns are correct; verify the `Finding` dataclass is populated correctly; verify the report matches expectations. | 90% |
+| `tests/test_ai_client.py` (existing) | Verify no regressions after the 139-site replacement. | 100% (regression) |
+| `tests/test_app_controller.py` (existing) | Verify no regressions after the 86-site replacement. | 100% (regression) |
+| `tests/test_models.py` (existing) | Verify no regressions after the 51-site replacement. | 100% (regression) |
+| `tests/test_api_hook_client.py` (existing) | Verify no regressions after the 32-site replacement. | 100% (regression) |
+| `tests/test_project_manager.py` (existing) | Verify no regressions after the 20-site replacement. | 100% (regression) |
+| `tests/test_aggregate.py` (existing) | Verify no regressions after the 17-site replacement. | 100% (regression) |
+| `tests/test_mcp_client.py` (existing) | Verify no regressions. (mcp_client is unchanged but the aliases may be adopted opportunistically in Phase 1.5 if convenient.) | 100% (regression) |
+
+**Mocking strategy:** Existing tests use `unittest.mock.patch`; no changes needed.
+
+**Audit baseline check:** After Phase 1, the audit script should report 0 NEW findings (the count may go UP if a few sites were missed, but the trend is DOWN). After Phase 2, the count should be at or below the pre-track baseline minus 50 (the targeted reductions).
+
+## 8. Migration / Rollout
+
+| Phase | What | Risk |
+|---|---|---|
+| **Phase 1 — Aliases + 6-file replacement + audit baseline** | Add `src/type_aliases.py`. Add `tests/test_type_aliases.py`. Mechanical replacement in 6 files. Add `--strict` mode to the audit script. Generate the new baseline. | Medium. ~345 sites of mechanical replacement. Mitigated by existing test coverage. |
+| **Phase 2 — NamedTuples + docs + archive** | Convert 2-3 tuple returns to NamedTuples. Update `conductor/product-guidelines.md` and `code_styleguides/`. Manual smoke test. Archive the track. | Low. ~3-4 sites of tuple conversion. Docs-only changes. |
+
+Each phase has its own checkpoint commit and git note.
+
+## 9. Risks & Mitigations
+
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| Mechanical replacement misses a few sites; the count doesn't drop as expected. | Medium | Low | The audit script is the source of truth. Re-run after Phase 1; investigate any anomalies. |
+| Renaming `dict[str, Any]` to `Metadata` (or another alias) changes how some tests introspect types (e.g., `isinstance(x, dict)`). | Low | Medium | The aliases are TYPE-LEVEL ONLY; at runtime, `Metadata` IS `dict[str, Any]` IS `dict`. `isinstance(x, dict)` continues to work. Test cases that use `get_type_hints()` may need updating; documented in the test plan. |
+| A future contributor adds a new `dict[str, Any]` and the audit script doesn't catch it. | Low | Low | The audit script's regex patterns are exhaustive for the current 430 findings. New patterns (e.g., a new `Mapping[str, Any]`) would be missed. The track documents the patterns the script knows; future contributions of new patterns warrant extending the script. |
+| The aliases conflict with the `Result[T]` and `ErrorInfo` from the data_oriented_error_handling track. | Low | Low | The aliases are VALUE-LEVEL (data types); `Result` and `ErrorInfo` are CONTROL-LEVEL (wrappers). They compose: `Result[FileItems]` is valid. No conflict. |
+| The 6-file mechanical replacement is too large to review in one PR. | Medium | Low | Phase 1 is split into 6 sub-tasks (one per file) in the plan, each with its own commit. Reviewers can review file-by-file. |
+| The 23 lower-impact files are NEVER migrated. | High | Low (acceptable) | The audit script stays in the codebase as a permanent CI gate. The cost of ignoring the 23 files is now VISIBLE. Future tracks can pick them up opportunistically. |
+
+## 10. Out of Scope (Explicit)
+
+- **TypedDict / @dataclass migration** of the `Metadata` family. Deferred to a Phase 2 of a future track. This track adds NAMES; the next track adds STRUCTURE.
+- **The 23 lower-impact files** (those with 1-9 weak sites each). Deferred; will be addressed opportunistically or in a future incremental track.
+- **Adding pydantic models.** Not requested; would be a much larger architectural decision.
+- **Changing function signatures at the runtime level.** The aliases are TYPE-LEVEL; runtime behavior is identical.
+- **Modifying `scripts/audit_weak_types.py`'s regex patterns.** The patterns are correct for the current findings. If new patterns emerge, a future track can extend the script.
+- **Migrating the data_oriented_error_handling_20260606 track's `src/result_types.py` aliases.** The 2 type-aliases modules are SEPARATE: `result_types.py` has `ErrorInfo` / `Result` / `ErrorKind`; `type_aliases.py` has `Metadata` / `CommsLog` / `FileItem` / etc. They don't overlap.
+
+## 11. Open Questions
+
+1. **The 6 aliases or 4?** The 6 listed in §3.1 are: `Metadata`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `FileItem`, `FileItems`, `ToolDefinition`, `ToolCall`, `CommsLogCallback`. That's 10. Should we cut to 4-6 to minimize the AI vocabulary? (Proposal: keep all 10; they're each named for a distinct concept, and the 10 names are self-explanatory. The "vocabulary cost" is the same as adding 10 new function names to a module — well within normal Python codebase scale.)
+2. **Should `FileItem` and `ToolDefinition` be `TypedDict` from the start?** A `TypedDict` gives the AI field-level hints, not just a name. But introducing `TypedDict` requires knowing the FIELDS, which is a deeper semantic task. (Proposal: Phase 1 uses `TypeAlias = dict[str, Any]`; Phase 2 of a future track converts to `TypedDict`. Keeps the current track scope tight.)
+3. **Should the audit script enforce a count threshold (e.g., "no more than 100 weak sites total") or a per-file threshold (e.g., "no file may have more than 50 weak sites")?** (Proposal: per-file threshold is more actionable. A future PR that introduces 20 new `dict[str, Any]` in `foo.py` would fail even if the total count didn't increase.)
+
+## 12. See Also
+
+### 12.1 Follow-up Track (planned; not in this spec)
+
+**"TypedDict / dataclass Migration"** (`typed_dict_migration_20260606` or similar) — converts the most-used aliases (`CommsLogEntry`, `FileItem`, `ToolDefinition`, `HistoryMessage`) to `TypedDict` (Python 3.8+) or `@dataclass(frozen=True)` so the FIELDS are visible to LLMs, not just the name. This is the natural Phase 2 of this track. Prerequisites: this track (so the field-level schema has a stable name to attach to).
+
+### 12.2 Project References
+
+- `scripts/audit_weak_types.py` (already committed; `84fd9ac9`) — the audit that found 430 weak sites.
+- `docs/guide_testing.md` — test conventions.
+- `conductor/code_styleguides/error_handling.md` (created in the data_oriented_error_handling_20260606 track) — the convention for `Result` types; the new type-aliases convention lives alongside.
+- `conductor/product-guidelines.md` "Data-Oriented Error Handling" — the convention this track extends (Data Structure Strengthening is a new top-level convention in the same family).
+- `conductor/tracks/data_oriented_error_handling_20260606/` — the previous track that established the convention format; this track uses the same pattern.
+
+### 12.3 External References
+
+- **Python `typing.TypeAlias`** — the canonical mechanism for type aliases (PEP 613, Python 3.10+).
+- **Python `typing.NamedTuple`** — for tuple-with-fields.
+- **Python `typing.TypedDict`** — for the future Phase 2 (not in this track).
+- **Mike Acton on data-oriented design** — the "data is the API" framing that motivates NAMING data structures clearly.
+- **Casey Muratori on module layer boundaries** — the convention that each module owns its data and exposes a clear interface.
@@ -0,0 +1,91 @@
+# Track state for data_structure_strengthening_20260606
+# Updated by Tier 2 Tech Lead as tasks complete
+
+[meta]
+track_id = "data_structure_strengthening_20260606"
+name = "Data Structure Strengthening (Type Aliases + NamedTuples)"
+status = "active"
+current_phase = 0
+last_updated = "2026-06-06"
+
+[phases]
+phase_1 = { status = "pending", checkpointsha = "", name = "Aliases + 6-file replacement + audit baseline" }
+phase_2 = { status = "pending", checkpointsha = "", name = "NamedTuples + docs + archive" }
+
+[tasks]
+# Phase 1: Aliases + 6-file replacement
+t1_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_type_aliases.py (verify 10 TypeAliases + 1 NamedTuple import and resolve to expected types; verify Result[FileItems] composes)" }
+t1_2 = { status = "pending", commit_sha = "", description = "Green: create src/type_aliases.py with 10 TypeAliases (Metadata, CommsLogEntry, CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition, ToolCall, CommsLogCallback) and 1 NamedTuple (FileItemsDiff)" }
+t1_3 = { status = "pending", commit_sha = "", description = "Replace 139 weak sites in src/ai_client.py with the new aliases (79 dict_str_any + 56 list_of_dict + 2 Optional[List[Dict]] + 2 assign_tuple_literal)" }
+t1_4 = { status = "pending", commit_sha = "", description = "Replace 86 weak sites in src/app_controller.py (62 dict_str_any + 20 list_of_dict + 4 optional_dict)" }
+t1_5 = { status = "pending", commit_sha = "", description = "Replace 51 weak sites in src/models.py (48 dict_str_any + 3 list_of_dict)" }
+t1_6 = { status = "pending", commit_sha = "", description = "Replace 32 weak sites in src/api_hook_client.py (30 dict_str_any + 2 list_of_dict)" }
+t1_7 = { status = "pending", commit_sha = "", description = "Replace 20 weak sites in src/project_manager.py (16 dict_str_any + 3 list_of_dict + 1 optional_dict)" }
+t1_8 = { status = "pending", commit_sha = "", description = "Replace 17 weak sites in src/aggregate.py (10 dict_str_any + 7 list_of_dict)" }
+t1_9 = { status = "pending", commit_sha = "", description = "Add --strict mode to scripts/audit_weak_types.py (compares current count to baseline file; exits 1 if increased)" }
+t1_10 = { status = "pending", commit_sha = "", description = "Generate scripts/audit_weak_types.baseline.json with the post-Phase-1 count" }
+t1_11 = { status = "pending", commit_sha = "", description = "Red: tests/test_audit_weak_types.py (verify regex patterns, Finding dataclass, report format)" }
+t1_12 = { status = "pending", commit_sha = "", description = "Run full test suite; confirm no regressions in 6 refactored files" }
+t1_13 = { status = "pending", commit_sha = "", description = "Run audit; confirm count dropped from 430 to ~60; commit the new baseline" }
+t1_14 = { status = "pending", commit_sha = "", description = "Phase 1 checkpoint commit + git note" }
+# Phase 2: NamedTuples + docs + archive
+t2_1 = { status = "pending", commit_sha = "", description = "Convert src/ai_client.py:_reread_file_items to return FileItemsDiff NamedTuple (replaces Tuple[List[FileItem], List[FileItem]]); update ~3-4 call sites" }
+t2_2 = { status = "pending", commit_sha = "", description = "Opportunistic NamedTuple conversions for 1-2 more tuple returns (screen coords, etc.)" }
+t2_3 = { status = "pending", commit_sha = "", description = "Create conductor/code_styleguides/type_aliases.md (canonical reference for the alias convention; 5 patterns + decision tree + examples)" }
+t2_4 = { status = "pending", commit_sha = "", description = "Add 'Data Structure Conventions' section to conductor/product-guidelines.md (referencing the new styleguide)" }
+t2_5 = { status = "pending", commit_sha = "", description = "Manual smoke test: launch GUI; verify type aliases don't break anything; verify audit --strict mode" }
+t2_6 = { status = "pending", commit_sha = "", description = "Phase 2 checkpoint commit + git note (TRACK COMPLETE)" }
+t2_7 = { status = "pending", commit_sha = "", description = "git mv conductor/tracks/data_structure_strengthening_20260606 to conductor/tracks/archive/" }
+t2_8 = { status = "pending", commit_sha = "", description = "Update conductor/tracks.md: move entry to Recently Completed" }
+t2_9 = { status = "pending", commit_sha = "", description = "Final state.toml update: mark all phases completed; add follow-up track typed_dict_migration_20260606 placeholder" }
+
+[verification]
+# Filled as phases complete
+phase_1_aliases_module_complete = false
+phase_1_ai_client_refactored = false
+phase_1_app_controller_refactored = false
+phase_1_models_refactored = false
+phase_1_api_hook_client_refactored = false
+phase_1_project_manager_refactored = false
+phase_1_aggregate_refactored = false
+phase_1_audit_strict_mode_added = false
+phase_1_baseline_committed = false
+phase_2_file_items_diff_named_tuple = false
+phase_2_opportunistic_named_tuples = false
+phase_2_styleguide_written = false
+phase_2_product_guidelines_updated = false
+phase_2_smoke_test_passed = false
+phase_2_track_archived = false
+full_test_suite_passes = false
+no_new_optional_introduced = false
+audit_count_dropped_to_60 = false
+
+[audit_count_progression]
+# Filled as tasks complete
+baseline = 430
+after_ai_client = 291
+after_app_controller = 205
+after_models = 154
+after_api_hook_client = 122
+after_project_manager = 102
+after_aggregate = 85
+phase_1_checkpoint_committed = 0  # TBD
+phase_2_checkpoint_committed = 0  # TBD
+
+[files_refactored]
+ai_client = { weak_sites_before = 139, weak_sites_after = 0, status = "pending" }
+app_controller = { weak_sites_before = 86, weak_sites_after = 0, status = "pending" }
+models = { weak_sites_before = 51, weak_sites_after = 0, status = "pending" }
+api_hook_client = { weak_sites_before = 32, weak_sites_after = 0, status = "pending" }
+project_manager = { weak_sites_before = 20, weak_sites_after = 0, status = "pending" }
+aggregate = { weak_sites_before = 17, weak_sites_after = 0, status = "pending" }
+
+[typed_dict_migration_followup]
+track_id = "typed_dict_migration_20260606"
+status = "planned_in_data_structure_strengthening_20260606"
+converts = ["CommsLogEntry", "FileItem", "ToolDefinition", "HistoryMessage"]
+to = "TypedDict or @dataclass(frozen=True)"
+
+[public_api_migration_followup]
+# From the data_oriented_error_handling track
+note = "This track does not depend on or block the public_api_migration_20260606 track. They are independent."