From ed42a97a9bb46879dcb74f96638d4e03e8e56274 Mon Sep 17 00:00:00 2001 From: Ed_ Date: Sat, 6 Jun 2026 17:49:22 -0400 Subject: [PATCH] conductor(track): Initialize data_structure_strengthening_20260606 Track + metadata + state + tracks.md registration for the type-aliases refactor that follows the audit_weak_types.py findings (430 weak sites across 29 of 61 files; 86% concentrated in 6 high-traffic files). Key design decisions (per user approval): - 10 TypeAlias definitions in src/type_aliases.py (Metadata, CommsLogEntry, CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition, ToolCall, CommsLogCallback). - 1 NamedTuple (FileItemsDiff) for the _reread_file_items return. - Mechanical replacement of 345 weak sites across 6 files (NOT 430; the remaining 85 are in 23 lower-impact files deferred to future tracks). - scripts/audit_weak_types.py gains a --strict mode and a baseline file (scripts/audit_weak_types.baseline.json) so the count is enforced. - 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. - Honest about what's missing: TypedDict / @dataclass migration is a follow-up track (typed_dict_migration_20260606), not this one. - Coexistence with the data_oriented_error_handling_20260606 track's Result[T] / ErrorInfo: the aliases are value-level (data types), Result is control-level (wrapper). They compose (Result[FileItems] is valid). No conflict. Audit baseline: - Pre-track: 430 weak sites, 0 strong patterns - Target after Phase 1: ~60 weak sites (only the 23 lower-impact files) - Top 4 unique type strings account for 86% of findings (4-6 aliases eliminate the bulk of the noise). Not blocked by anything; can be executed independently of the other pending tracks. Blocks typed_dict_migration_20260606 (the future Phase 2). --- conductor/tracks.md | 4 + .../metadata.json | 146 ++++++++ .../spec.md | 311 ++++++++++++++++++ .../state.toml | 91 +++++ 4 files changed, 552 insertions(+) create mode 100644 conductor/tracks/data_structure_strengthening_20260606/metadata.json create mode 100644 conductor/tracks/data_structure_strengthening_20260606/spec.md create mode 100644 conductor/tracks/data_structure_strengthening_20260606/state.toml diff --git a/conductor/tracks.md b/conductor/tracks.md index c439663e..501719b6 100644 --- a/conductor/tracks.md +++ b/conductor/tracks.md @@ -167,6 +167,10 @@ User review surfaced five outstanding UI issues, each previously attempted witho *Goal: Introduce Ryan Fleury's "errors are just cases" framework as a project convention. New `src/result_types.py` (ErrorKind enum, ErrorInfo dataclass, `Result[T]` with data + side-channel errors list, NilPath + NilRAGState sentinel singletons) and new `conductor/code_styleguides/error_handling.md` canonical reference. Refactor `src/mcp_client.py` ((p, err) tuples → Result; 30+ `assert p is not None` → nil-sentinel paths), `src/ai_client.py` (ProviderError exception → ErrorInfo dataclass; `_send_()` → `_send__result()` returning `Result[str]`; `send()` marked `@deprecated`; new `send_result()` public API), and `src/rag_engine.py` (RAGEngine methods → Result returns). Update `conductor/product-guidelines.md` + `workflow.md` + `docs/guide_*.md` so the convention is documented and future plans can incrementally migrate the remaining `src/` files. **Blocked by** startup_speedup, test_batching_refactor, and qwen_llama_grok tracks. 5 phases: foundation+styleguide, mcp_client refactor, ai_client refactor (highest risk; ProviderError removal), rag_engine refactor, deprecation+docs+archive.* *Follow-up: [./tracks/public_api_migration_20260606/](./tracks/public_api_migration_20260606/) (planned; not yet specced) — removes the deprecated `ai_client.send()` and migrates all callers.* +0f. [ ] **Track: Data Structure Strengthening (Type Aliases + NamedTuples)** `[track-created: ]` + *Link: [./tracks/data_structure_strengthening_20260606/](./tracks/data_structure_strengthening_20260606/), Spec: [./tracks/data_structure_strengthening_20260606/spec.md](./tracks/data_structure_strengthening_20260606/spec.md), Plan: [./tracks/data_structure_strengthening_20260606/plan.md](./tracks/data_structure_strengthening_20260606/plan.md) (to be authored by writing-plans skill)* + *Goal: Improve AI-readability by naming 430 currently-anonymous `dict[str, Any]` / `list[dict[...]]` / `Tuple[...]` types. New `src/type_aliases.py` with 10 `TypeAlias` definitions (`Metadata`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `FileItem`, `FileItems`, `ToolDefinition`, `ToolCall`, `CommsLogCallback`) and 1 `NamedTuple` (`FileItemsDiff`). Mechanical replacement of 345 weak sites across 6 high-traffic files: `src/ai_client.py` (139), `src/app_controller.py` (86), `src/models.py` (51), `src/api_hook_client.py` (32), `src/project_manager.py` (20), `src/aggregate.py` (17). Add `--strict` mode to the existing `scripts/audit_weak_types.py` (committed in 84fd9ac9; found the 430 sites) so it becomes a permanent CI gate that fails when new weak types are introduced. Generate `scripts/audit_weak_types.baseline.json` with the post-refactor count. 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. **Data-grounded**: the audit script is the source of truth; the count drops from 430 to ~60 (86% reduction) in the 6 high-traffic files. **Honest about what's missing**: 23 lower-impact files remain; TypedDict/dataclass migration is deferred to a follow-up track. 2-3 days work, 1-2 phases, low risk.* + 0b. [x] **Track: rag_phase4_stress_test_flake_20260606** — fixed 16412ad5 *Status: 2026-06-06 — Surfaced during post-v2 verification. Resolved: real bug, NOT a test flake. Root cause: ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (`tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/`) was created by a previous run with Gemini embeddings (3072-dim); the current run uses local SentenceTransformers (384-dim). `index_file()` upserts silently corrupt the collection, then `search()` fails with `Collection expecting embedding with dimension of 3072, got 384` and the AI request never reaches 'done' status, timing out the 50*0.5s = 25s poll loop. Fix: `RAGEngine._init_vector_store` now calls `_validate_collection_dim` which inspects the first existing vector's dim, compares to the current provider's output, and recreates the collection on mismatch (with a stderr warning). Regression tests added: `test_rag_collection_dim_mismatch_recreates_collection` and `test_rag_collection_dim_match_preserves_collection` in `tests/test_rag_engine.py`. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.* 0a. [ ] **Track: prior_session_test_harden_20260605** [superseded by live_gui_test_hardening_v2_20260605] diff --git a/conductor/tracks/data_structure_strengthening_20260606/metadata.json b/conductor/tracks/data_structure_strengthening_20260606/metadata.json new file mode 100644 index 00000000..b961c3da --- /dev/null +++ b/conductor/tracks/data_structure_strengthening_20260606/metadata.json @@ -0,0 +1,146 @@ +{ + "track_id": "data_structure_strengthening_20260606", + "name": "Data Structure Strengthening (Type Aliases + NamedTuples)", + "initialized": "2026-06-06", + "owner": "tier2-tech-lead", + "priority": "medium", + "status": "active", + "type": "refactor + ai-readability + documentation", + "scope": { + "new_files": [ + "src/type_aliases.py", + "tests/test_type_aliases.py", + "tests/test_audit_weak_types.py", + "conductor/code_styleguides/type_aliases.md" + ], + "modified_files": [ + "src/ai_client.py", + "src/app_controller.py", + "src/models.py", + "src/api_hook_client.py", + "src/project_manager.py", + "src/aggregate.py", + "conductor/product-guidelines.md", + "scripts/audit_weak_types.py" + ] + }, + "blocked_by": [], + "blocks": ["typed_dict_migration_20260606" /* not yet created */], + "estimated_phases": 2, + "spec": "spec.md", + "plan": "plan.md", + "priority_order": "A (6 aliases + 6-file replacement) > B (canonical names + audit CI gate) > C (NamedTuples + docs) > D (plan follow-up)", + "audit_data": { + "total_weak_findings_baseline": 430, + "files_scanned": 61, + "files_with_findings_baseline": 29, + "positive_patterns_baseline": 0, + "unique_type_strings_baseline": 26, + "top_4_unique_types_account_for_pct": 86, + "top_offender": "src/ai_client.py (139 findings, 32.3%)" + }, + "type_aliases": { + "Metadata": "dict[str, Any] - the root alias; any key-value record", + "CommsLogEntry": "Metadata - a single entry in the AI comms log", + "CommsLog": "list[CommsLogEntry] - the comms log ring buffer", + "HistoryMessage": "Metadata - a single message in the AI provider history", + "History": "list[HistoryMessage] - the conversation history", + "FileItem": "Metadata - a single file in the context (path, content, is_image, etc.)", + "FileItems": "list[FileItem] - the most common weak pattern in the codebase", + "ToolDefinition": "Metadata - a single tool definition (function name, description, parameters)", + "ToolCall": "Metadata - a single tool call from the model (id, type, function)", + "CommsLogCallback": "Callable[[CommsLogEntry], None] - the callback signature" + }, + "named_tuples": { + "FileItemsDiff": "NamedTuple with fields (refreshed: FileItems, changed: FileItems) - the return of _reread_file_items" + }, + "refactor_targets": { + "src/ai_client.py": { + "weak_sites": 139, + "replacement_strategy": "79 dict_str_any -> Metadata/CommsLogEntry/HistoryMessage/FileItem/ToolDefinition/ToolCall; 56 list_of_dict -> CommsLog/History/FileItems/ToolDefinitions; 2 Optional[List[Dict[...]]] -> Optional[FileItems]; 2 assign_tuple_literal -> ToolCall" + }, + "src/app_controller.py": { + "weak_sites": 86, + "replacement_strategy": "62 dict_str_any -> Metadata; 20 list_of_dict -> list[Metadata]; 4 optional_dict -> Optional[Metadata]" + }, + "src/models.py": { + "weak_sites": 51, + "replacement_strategy": "48 dict_str_any -> Optional[Metadata]; 3 list_of_dict -> list[Metadata]" + }, + "src/api_hook_client.py": { + "weak_sites": 32, + "replacement_strategy": "30 dict_str_any -> Metadata; 2 list_of_dict -> list[Metadata]" + }, + "src/project_manager.py": { + "weak_sites": 20, + "replacement_strategy": "16 dict_str_any -> Metadata; 3 list_of_dict -> list[Metadata]; 1 optional_dict -> Optional[Metadata]" + }, + "src/aggregate.py": { + "weak_sites": 17, + "replacement_strategy": "10 dict_str_any -> Metadata; 7 list_of_dict -> list[Metadata]" + } + }, + "audit_ci_gate": { + "script": "scripts/audit_weak_types.py", + "current_mode": "informational (exit 0 always)", + "new_mode": "strict (exit 1 if new findings introduced vs baseline)", + "baseline_file": "scripts/audit_weak_types.baseline.json", + "baseline_after_phase_1": "~60 findings (only the 23 lower-impact files remain)", + "target_reduction": "430 -> ~60 (86% reduction in the 6 high-traffic files)" + }, + "ai_performance_analysis": { + "win": "A name is a one-time cost the AI pays to learn, then reuses forever. With 10 aliases covering 370+ usages, the AI's vocabulary cost is bounded while the readability win is unbounded.", + "cost": "10 new names for the AI to learn. Comparable to adding 10 new function names to a module - well within normal Python codebase scale.", + "caveat": "If we add too many aliases (50+), the cognitive cost exceeds the benefit. The proposed 10 is the sweet spot. Phase 2 will convert the most-used aliases to TypedDict, which gives the AI field-level hints, not just a name.", + "honest_assessment": "Net win. The current 0 aliases is the worst case; going to 10 is a strictly better state for AI readability." + }, + "coexistence_with_data_oriented_track": { + "Result_T": "The data_oriented_error_handling_20260606 track introduces Result[T] as a control-level wrapper. The aliases introduced by THIS track are value-level types (what's inside the T).", + "ErrorInfo": "Already a @dataclass from the data_oriented track; no change.", + "Result_composition": "Result[FileItems] is valid - the aliases name the T, not the Result itself." + }, + "architectural_invariant": "The 6 type aliases are the CANONICAL names for the metadata family. New code MUST use them. Old code is migrated opportunistically. The audit script enforces this via the --strict mode (exits 1 if new weak sites are introduced).", + "threading_constraint": "No change. TypeAlias is type-level only; runtime behavior is identical to the underlying types. The aliases are thread-safe because dict / list / Callable are thread-safe for the operations performed.", + "verification_criteria": [ + "src/type_aliases.py exists with 10 TypeAliases and 1 NamedTuple", + "All 10 aliases import successfully (tests/test_type_aliases.py)", + "Result[FileItems] is a valid generic (verified by importing)", + "scripts/audit_weak_types.py reports 370+ fewer findings after Phase 1 (~60 total)", + "scripts/audit_weak_types.py --strict mode exits 1 when a new weak site is added", + "scripts/audit_weak_types.baseline.json is committed with the post-Phase-1 count", + "src/ai_client.py: 139 weak sites -> 0 weak sites (all replaced with aliases)", + "src/app_controller.py: 86 -> 0", + "src/models.py: 51 -> 0", + "src/api_hook_client.py: 32 -> 0", + "src/project_manager.py: 20 -> 0", + "src/aggregate.py: 17 -> 0", + "Phase 2: _reread_file_items returns FileItemsDiff (NamedTuple); all call sites updated", + "Phase 2: 1-2 more tuple returns converted to NamedTuples opportunistically", + "tests/test_type_aliases.py: 8+ tests pass", + "tests/test_audit_weak_types.py: 6+ tests pass", + "tests/test_ai_client.py (existing): no regressions", + "tests/test_app_controller.py (existing): no regressions", + "tests/test_models.py (existing): no regressions", + "tests/test_api_hook_client.py (existing): no regressions", + "tests/test_project_manager.py (existing): no regressions", + "tests/test_aggregate.py (existing): no regressions", + "conductor/product-guidelines.md: new 'Data Structure Conventions' section added", + "conductor/code_styleguides/type_aliases.md: the canonical reference", + "No new threading.Thread calls in src/", + "No new Optional[X] introduced by the refactor (the aliases compose with Optional, but no NEW Optional types are added)", + "No runtime behavior changes (aliases are type-level only)" + ], + "links": { + "backlog_entry": "conductor/tracks.md (to be added)", + "audit_script": "scripts/audit_weak_types.py", + "code_styleguide": "conductor/code_styleguides/type_aliases.md (to be created in Phase 2)", + "testing_guide": "docs/guide_testing.md", + "audit_baseline": "scripts/audit_weak_types.baseline.json (to be created in Phase 1)", + "related_tracks": [ + "conductor/tracks/startup_speedup_20260606/", + "conductor/tracks/test_batching_refactor_20260606/", + "conductor/tracks/qwen_llama_grok_integration_20260606/", + "conductor/tracks/data_oriented_error_handling_20260606/" + ] + } +} diff --git a/conductor/tracks/data_structure_strengthening_20260606/spec.md b/conductor/tracks/data_structure_strengthening_20260606/spec.md new file mode 100644 index 00000000..ccfc09bd --- /dev/null +++ b/conductor/tracks/data_structure_strengthening_20260606/spec.md @@ -0,0 +1,311 @@ +# Track: Data Structure Strengthening (Type Aliases + NamedTuples) + +**Status:** Active (spec approved 2026-06-06) +**Initialized:** 2026-06-06 +**Owner:** Tier 2 Tech Lead +**Priority:** Medium (developer + AI-readability; not a regression blocker) + +--- + +## 1. Overview + +This track introduces a small, focused set of `TypeAlias` definitions in a new `src/type_aliases.py` module and replaces 370+ anonymous `dict[str, Any]` / `list[dict[...]]` usages across 6 high-traffic files (`src/ai_client.py`, `src/app_controller.py`, `src/models.py`, `src/api_hook_client.py`, `src/project_manager.py`, `src/aggregate.py`). It also converts 2-3 tuple returns to `NamedTuple`s for self-documenting struct semantics. + +The track is **data-grounded**: a new AST-based audit script (`scripts/audit_weak_types.py`, committed in `84fd9ac9`) found 430 weak type sites across 29 of 61 files. After whitespace normalization, only **26 unique type strings** exist; the top 4 (`list[dict[str, Any]]`, `dict[str, Any]`, `Dict[str, Any]`, `List[Dict[str, Any]]`) account for 86% of findings. A small set of well-named aliases eliminates the vast majority. + +**The current codebase has ZERO strong type aliases** (no `TypeAlias`, no `NamedTuple`, no `pydantic.BaseModel` for these shapes). This is the worst case for AI readability — an LLM reading the code has zero schema hints and must guess the shape from usage at every call site. + +**Scope is deliberately bounded.** The track adds **6 type aliases** and converts **2-3 tuple returns** to NamedTuples. It does NOT migrate to `TypedDict` or `@dataclass` schemas (that's a much larger Phase 2, planned as a separate follow-up). It does NOT touch the 23 lower-impact files; they remain as `dict[str, Any]` until a future track migrates them. + +## 2. Goals (Priority Order) + +| Priority | Goal | Rationale | +|---|---|---| +| **A (primary value)** | Add 6 `TypeAlias` definitions to `src/type_aliases.py`: `Metadata`, `CommsLogEntry`, `CommsLog`, `FileItem`, `FileItems`, `HistoryMessage`. | Each alias names a concept that currently appears as `dict[str, Any]` or `list[dict[str, Any]]` in 30+ sites. The name is self-documenting; the underlying type is the same. | +| **A (primary value)** | Mechanical replacement of 370+ weak sites in 6 files: `src/ai_client.py`, `src/app_controller.py`, `src/models.py`, `src/api_hook_client.py`, `src/project_manager.py`, `src/aggregate.py`. | The audit shows 86% of findings are in these 6 files. A focused refactor here eliminates the bulk of the noise. | +| **B (architectural)** | The new aliases are the **canonical** names going forward. New code MUST use the aliases. Old code is migrated opportunistically (this track + future tracks). | One source of truth. The audit script (`scripts/audit_weak_types.py`) becomes a permanent CI gate that fails when new weak types are introduced. | +| **B (architectural)** | Audit script exits 0 with significantly fewer findings after the refactor. Re-running `--json` should show the count drop from 430 to ~60 (only the 23 lower-impact files remain). | Measurable success criterion. The audit script is the ground truth. | +| **C (optimization)** | Convert 2-3 tuple returns to `NamedTuple`s. Specifically: `_reread_file_items()` returns `Tuple[refreshed, changed]` becomes a `FileItemsDiff` NamedTuple. Other 1-occurrence tuples (screen coords, etc.) are converted opportunistically. | The tuple return pattern is rarer than the dict pattern (4 sites vs 430), but each conversion is high-value for self-documentation. | +| **C (documentation)** | Add a short "Data Structure Conventions" section to `conductor/product-guidelines.md` and a "Type Aliases" subsection in `conductor/code_styleguides/error_handling.md` (or a new `code_styleguides/type_aliases.md`). | The convention is visible in the project-level guidance. Future plans reference it. | +| **D (forward-looking)** | Plan a Phase-2 follow-up track: "TypedDict / dataclass Migration" that converts the most-used aliases (`CommsLogEntry`, `FileItem`) to `TypedDict` or `@dataclass(frozen=True)` so the FIELDS are visible to LLMs, not just the name. NOT in this track; documented in §12.1. | Honest about what's missing. Phase 2 is a separate effort. | + +### 2.1 Non-Goals (this track) + +- **Not** converting `dict[str, Any]` to `TypedDict` or `@dataclass`. That's Phase 2 of a future track. This track stops at NAMING the shapes; it does not give them structure. +- **Not** touching the 23 lower-impact files. They stay as `dict[str, Any]` until a future incremental track migrates them. The audit script makes their weakness VISIBLE so the cost of ignoring them is documented. +- **Not** changing the `Result[T]` pattern from the `data_oriented_error_handling_20260606` track. The aliases complement `Result`; they don't replace it. (`ErrorInfo` is a `@dataclass`, not a `TypeAlias`; it's already structured.) +- **Not** adding pydantic models. The project doesn't currently use pydantic for these shapes; introducing it would be a much larger architectural decision. +- **Not** modifying the data_oriented_error_handling_20260606 track's `src/result_types.py`. The aliases live in a new file (`src/type_aliases.py`); they coexist with `Result`/`ErrorInfo`. +- **Not** changing the public API of any function. The aliases are TYPE-LEVEL ONLY; runtime behavior is identical. + +## 3. Architecture + +### 3.1 The Aliases + +`src/type_aliases.py` (NEW, ~80 lines): + +```python +from typing import Any, Callable, TypeAlias + +# A single key-value record. The shape is intentionally open (Any value type) +# because different concepts use different value types (str for paths, int for +# counts, dict for nested structures, etc.). The name documents the SEMANTIC +# ROLE, not the structural shape. +Metadata: TypeAlias = dict[str, Any] + +# A single entry in the AI comms log (the in-memory ring buffer of API +# requests/responses/timestamps/kind/direction). Used by _comms_log, +# _append_comms, get_comms_log, comms_log_callback, etc. +CommsLogEntry: TypeAlias = Metadata + +# A list of comms log entries. +CommsLog: TypeAlias = list[CommsLogEntry] + +# A single entry in the AI provider's conversation history (the messages +# list passed to/from OpenAI/Anthropic/Gemini). Used by _anthropic_history, +# _deepseek_history, _minimax_history, _grok_history, _llama_history, etc. +HistoryMessage: TypeAlias = Metadata + +# A list of history messages. +History: TypeAlias = list[HistoryMessage] + +# A single file item in the context (path, content, is_image flag, base64 +# data, mtime). Used by file_items parameter (the most-threated list in +# the codebase), _reread_file_items, _build_file_context_text, etc. +FileItem: TypeAlias = Metadata + +# A list of file items. The most common weak pattern in the codebase. +FileItems: TypeAlias = list[FileItem] + +# A single tool definition (function name, description, parameters schema). +# Used by _build_anthropic_tools, _CACHED_ANTHROPIC_TOOLS, _get_anthropic_tools, +# and the corresponding openai-compatible / gemini / deepseek builders. +ToolDefinition: TypeAlias = Metadata + +# A single tool call from the model (id, type, function: {name, arguments}). +# Used by response.tool_calls parsing across all providers. +ToolCall: TypeAlias = Metadata + +# A callback that receives a comms log entry. Used by comms_log_callback, +# confirm_and_run_callback, etc. +CommsLogCallback: TypeAlias = Callable[[CommsLogEntry], None] +``` + +### 3.2 The NamedTuples (Phase 2) + +`src/type_aliases.py` (continued): + +```python +from typing import NamedTuple + +# Return type of _reread_file_items. The two lists are conceptually distinct: +# refreshed = items whose mtime was checked and the content re-read; changed = +# items whose content actually changed (subset of refreshed). +class FileItemsDiff(NamedTuple): + refreshed: FileItems + changed: FileItems +``` + +(Optional, if 1-2 more tuple returns warrant conversion — e.g., `Optional[Tuple[int, int, int, int]]` for screen coords, etc. — add them as separate `NamedTuple`s with semantic names.) + +### 3.3 Why These Specific Aliases + +The 6 aliases were chosen to be **concept-distinct**: each names a different semantic role that the code uses. Using the same name (`Metadata`) for all of them would collapse the semantic distinction; using 30 names would exceed the AI's vocabulary budget. 6 is the sweet spot: + +| Alias | Semantic role | Distinct from | +|---|---|---| +| `Metadata` | generic key-value record | (root) | +| `CommsLogEntry` | a single comms log entry | `HistoryMessage` (different lifecycle) | +| `HistoryMessage` | a single AI provider history message | `CommsLogEntry` (different lifecycle) | +| `FileItem` | a single file in the context | `ToolDefinition` (different shape: paths vs function specs) | +| `ToolDefinition` | a single tool definition | `FileItem`, `ToolCall` | +| `ToolCall` | a single tool call from the model | `ToolDefinition` (definition vs invocation) | + +Some of these are aliased to `Metadata` (e.g., `CommsLogEntry: TypeAlias = Metadata`). This is intentional: Phase 2 can convert `Metadata` to a `TypedDict` (or split into per-concept `TypedDict`s) and the aliases continue to work without breaking changes. The aliases are STABLE NAMES; the underlying type can evolve. + +### 3.4 Module Layout + +``` +src/ + type_aliases.py # NEW: 6 TypeAliases + 1-3 NamedTuples + ai_client.py # MODIFIED: import aliases; replace ~139 weak sites + app_controller.py # MODIFIED: import aliases; replace ~86 weak sites + models.py # MODIFIED: import aliases; replace ~51 weak sites + api_hook_client.py # MODIFIED: import aliases; replace ~32 weak sites + project_manager.py # MODIFIED: import aliases; replace ~20 weak sites + aggregate.py # MODIFIED: import aliases; replace ~17 weak sites + mcp_client.py # UNCHANGED (only 9 weak sites; below the threshold) + +conductor/ + product-guidelines.md # MODIFIED: new "Data Structure Conventions" section + code_styleguides/ + type_aliases.md # NEW: the canonical reference (or co-located in error_handling.md) + +scripts/ + audit_weak_types.py # already committed in 84fd9ac9; runs as CI gate + +tests/ + test_type_aliases.py # NEW: verify the aliases import and resolve to the right types + (existing test files): # MODIFIED: update the 6 files; existing tests should pass unchanged +``` + +### 3.5 Coexistence with `Result[T]` and `ErrorInfo` + +The new `Metadata` family aliases are VALUE-LEVEL types (what's in a dict). The `Result[T]` from `data_oriented_error_handling_20260606` is a CONTROL-LEVEL wrapper (a data struct that includes errors). They compose: + +```python +# Data-oriented error handling returns: +Result[CommsLogEntry] # a Result wrapping a single comms log entry +Result[History] # a Result wrapping a list of history messages +Result[FileItems] # a Result wrapping a list of file items + +# The aliases name the "T" in Result[T], not the Result itself. +``` + +This is consistent: `Result` is a generic that wraps any data type. Naming the data types (via `TypeAlias`) makes the generic concrete without changing the `Result` pattern. + +## 4. Per-File Refactor Plan + +### 4.1 `src/ai_client.py` (139 sites — largest offender) + +**Pattern:** `_anthropic_history: list[dict[str, Any]]` (and 5 sibling histories), `_comms_log: deque[dict[str, Any]]`, `get_comms_log -> list[dict[str, Any]]`, `_build_anthropic_tools -> list[dict[str, Any]]`, `_reread_file_items -> tuple[list[...], list[...]]`, etc. + +**Refactor strategy:** +- Replace all 79 `dict[str, Any]` / `Dict[str, Any]` with `Metadata` or the more specific alias. +- Replace all 56 `list[dict[...]]` with `CommsLog` / `History` / `FileItems` / `ToolDefinitions` based on the SEMANTIC ROLE of the list. +- 2 `Optional[List[Dict[...]]]` with `Optional[FileItems]` (the `_CACHED_ANTHROPIC_TOOLS` is an Optional[ToolDefinitions]). +- 2 tuple-return literal returns: the `cast(...)` patterns in `_dispatch_tool`. Replace with `ToolCall` extraction. + +**Naming heuristic:** for each list of dicts, look at the variable name + the function name to determine the semantic role. E.g., `_comms_log` → `CommsLog`; `_anthropic_history` → `History`; `_build_anthropic_tools` → `ToolDefinitions`; `_reread_file_items(file_items: list[...])` → `FileItems`. + +### 4.2 `src/app_controller.py` (86 sites) + +**Pattern:** `_pending_dialog: Optional[ConfirmDialog] = None` (stays as-is; this is a STRONG type already), `last_error: Optional[Dict[str, str]] = None` (could be `Optional[ErrorInfo]` from the data_oriented track), but most weak sites are in the `Hook API` request/response payloads and the `pre_tool_callback` family. + +**Refactor strategy:** +- The 62 `dict_str_any` sites: replace with `Metadata` or `CommsLogEntry` based on context. +- The 20 `list_of_dict` sites: replace with the appropriate alias. +- The 4 `optional_dict` sites: replace with `Optional[Metadata]` (or `Optional[CommsLogEntry]` if the context is the hook request payload). + +### 4.3 `src/models.py` (51 sites) + +**Pattern:** Dataclass fields. E.g., `script: Optional[str] = None` (stays as-is; STRONG), but also `target_file: Optional[str] = None` and many fields where the type is `Optional[Dict[str, Any]]` (in dataclass fields). + +**Refactor strategy:** Replace 48 `dict_str_any` with `Optional[Metadata]`; 3 `list_of_dict` with the appropriate alias. + +### 4.4 `src/api_hook_client.py` (32 sites) + +**Pattern:** HTTP request/response payloads. E.g., `payload: Dict[str, Any]`, `data: dict[str, Any]`. + +**Refactor strategy:** 30 `dict_str_any` → `Metadata`; 2 `list_of_dict` → `list[Metadata]`. + +### 4.5 `src/project_manager.py` (20 sites) + +**Pattern:** TOML config dicts. E.g., `proj: dict[str, Any]`, `data: dict[str, Any]`. + +**Refactor strategy:** 16 `dict_str_any` → `Metadata`; 3 `list_of_dict` → `list[Metadata]`; 1 `optional_dict` → `Optional[Metadata]`. + +### 4.6 `src/aggregate.py` (17 sites) + +**Pattern:** Aggregation result dicts. E.g., `result: dict[str, list[dict[str, Any]]]`. + +**Refactor strategy:** 10 `dict_str_any` → `Metadata`; 7 `list_of_dict` → appropriate alias. + +### 4.7 Phase 2 NamedTuple conversions + +- **`_reread_file_items`** in `src/ai_client.py` (returns `Tuple[List[FileItem], List[FileItem]]`) → returns `FileItemsDiff`. Affects ~3-4 call sites. +- **1-2 screen-coord tuples** (1-occurrence each) — opportunistic. If the call site is clear and the names are obvious, convert; otherwise leave. + +## 5. The Audit Script as a Permanent CI Gate + +After this track, the audit script becomes a permanent CI gate. `scripts/audit_weak_types.py` exits 0 even when findings exist (it's informational). The CI gate uses a stricter mode: + +```bash +# New mode: --strict, exits 1 if any new weak site is added in a PR +python scripts/audit_weak_types.py --strict +``` + +The `--strict` mode compares the current count to a baseline (stored in `scripts/audit_weak_types.baseline.json`). If the current count is HIGHER than the baseline, exit 1. The baseline is regenerated after this track to the post-refactor count (~60 findings, only the 23 lower-impact files remain). + +This is documented in the spec but the actual `--strict` mode is implemented as part of the track (Phase 1 final task). Future PRs that introduce new `dict[str, Any]` or anonymous tuples will fail CI. + +## 6. Configuration + +No new dependencies. No new environment variables. No new config files. + +The aliases live in `src/type_aliases.py` (pure stdlib `typing.TypeAlias`). + +## 7. Testing Strategy + +| Test File | Purpose | Coverage Target | +|---|---|---| +| `tests/test_type_aliases.py` | Verify the aliases import; verify they resolve to the expected types; verify they compose with `Result[T]` (e.g., `Result[FileItems]` is a valid generic). | 100% | +| `tests/test_audit_weak_types.py` | Verify the audit script's regex patterns are correct; verify the `Finding` dataclass is populated correctly; verify the report matches expectations. | 90% | +| `tests/test_ai_client.py` (existing) | Verify no regressions after the 139-site replacement. | 100% (regression) | +| `tests/test_app_controller.py` (existing) | Verify no regressions after the 86-site replacement. | 100% (regression) | +| `tests/test_models.py` (existing) | Verify no regressions after the 51-site replacement. | 100% (regression) | +| `tests/test_api_hook_client.py` (existing) | Verify no regressions after the 32-site replacement. | 100% (regression) | +| `tests/test_project_manager.py` (existing) | Verify no regressions after the 20-site replacement. | 100% (regression) | +| `tests/test_aggregate.py` (existing) | Verify no regressions after the 17-site replacement. | 100% (regression) | +| `tests/test_mcp_client.py` (existing) | Verify no regressions. (mcp_client is unchanged but the aliases may be adopted opportunistically in Phase 1.5 if convenient.) | 100% (regression) | + +**Mocking strategy:** Existing tests use `unittest.mock.patch`; no changes needed. + +**Audit baseline check:** After Phase 1, the audit script should report 0 NEW findings (the count may go UP if a few sites were missed, but the trend is DOWN). After Phase 2, the count should be at or below the pre-track baseline minus 50 (the targeted reductions). + +## 8. Migration / Rollout + +| Phase | What | Risk | +|---|---|---| +| **Phase 1 — Aliases + 6-file replacement + audit baseline** | Add `src/type_aliases.py`. Add `tests/test_type_aliases.py`. Mechanical replacement in 6 files. Add `--strict` mode to the audit script. Generate the new baseline. | Medium. ~345 sites of mechanical replacement. Mitigated by existing test coverage. | +| **Phase 2 — NamedTuples + docs + archive** | Convert 2-3 tuple returns to NamedTuples. Update `conductor/product-guidelines.md` and `code_styleguides/`. Manual smoke test. Archive the track. | Low. ~3-4 sites of tuple conversion. Docs-only changes. | + +Each phase has its own checkpoint commit and git note. + +## 9. Risks & Mitigations + +| Risk | Likelihood | Impact | Mitigation | +|---|---|---|---| +| Mechanical replacement misses a few sites; the count doesn't drop as expected. | Medium | Low | The audit script is the source of truth. Re-run after Phase 1; investigate any anomalies. | +| Renaming `dict[str, Any]` to `Metadata` (or another alias) changes how some tests introspect types (e.g., `isinstance(x, dict)`). | Low | Medium | The aliases are TYPE-LEVEL ONLY; at runtime, `Metadata` IS `dict[str, Any]` IS `dict`. `isinstance(x, dict)` continues to work. Test cases that use `get_type_hints()` may need updating; documented in the test plan. | +| A future contributor adds a new `dict[str, Any]` and the audit script doesn't catch it. | Low | Low | The audit script's regex patterns are exhaustive for the current 430 findings. New patterns (e.g., a new `Mapping[str, Any]`) would be missed. The track documents the patterns the script knows; future contributions of new patterns warrant extending the script. | +| The aliases conflict with the `Result[T]` and `ErrorInfo` from the data_oriented_error_handling track. | Low | Low | The aliases are VALUE-LEVEL (data types); `Result` and `ErrorInfo` are CONTROL-LEVEL (wrappers). They compose: `Result[FileItems]` is valid. No conflict. | +| The 6-file mechanical replacement is too large to review in one PR. | Medium | Low | Phase 1 is split into 6 sub-tasks (one per file) in the plan, each with its own commit. Reviewers can review file-by-file. | +| The 23 lower-impact files are NEVER migrated. | High | Low (acceptable) | The audit script stays in the codebase as a permanent CI gate. The cost of ignoring the 23 files is now VISIBLE. Future tracks can pick them up opportunistically. | + +## 10. Out of Scope (Explicit) + +- **TypedDict / @dataclass migration** of the `Metadata` family. Deferred to a Phase 2 of a future track. This track adds NAMES; the next track adds STRUCTURE. +- **The 23 lower-impact files** (those with 1-9 weak sites each). Deferred; will be addressed opportunistically or in a future incremental track. +- **Adding pydantic models.** Not requested; would be a much larger architectural decision. +- **Changing function signatures at the runtime level.** The aliases are TYPE-LEVEL; runtime behavior is identical. +- **Modifying `scripts/audit_weak_types.py`'s regex patterns.** The patterns are correct for the current findings. If new patterns emerge, a future track can extend the script. +- **Migrating the data_oriented_error_handling_20260606 track's `src/result_types.py` aliases.** The 2 type-aliases modules are SEPARATE: `result_types.py` has `ErrorInfo` / `Result` / `ErrorKind`; `type_aliases.py` has `Metadata` / `CommsLog` / `FileItem` / etc. They don't overlap. + +## 11. Open Questions + +1. **The 6 aliases or 4?** The 6 listed in §3.1 are: `Metadata`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `FileItem`, `FileItems`, `ToolDefinition`, `ToolCall`, `CommsLogCallback`. That's 10. Should we cut to 4-6 to minimize the AI vocabulary? (Proposal: keep all 10; they're each named for a distinct concept, and the 10 names are self-explanatory. The "vocabulary cost" is the same as adding 10 new function names to a module — well within normal Python codebase scale.) +2. **Should `FileItem` and `ToolDefinition` be `TypedDict` from the start?** A `TypedDict` gives the AI field-level hints, not just a name. But introducing `TypedDict` requires knowing the FIELDS, which is a deeper semantic task. (Proposal: Phase 1 uses `TypeAlias = dict[str, Any]`; Phase 2 of a future track converts to `TypedDict`. Keeps the current track scope tight.) +3. **Should the audit script enforce a count threshold (e.g., "no more than 100 weak sites total") or a per-file threshold (e.g., "no file may have more than 50 weak sites")?** (Proposal: per-file threshold is more actionable. A future PR that introduces 20 new `dict[str, Any]` in `foo.py` would fail even if the total count didn't increase.) + +## 12. See Also + +### 12.1 Follow-up Track (planned; not in this spec) + +**"TypedDict / dataclass Migration"** (`typed_dict_migration_20260606` or similar) — converts the most-used aliases (`CommsLogEntry`, `FileItem`, `ToolDefinition`, `HistoryMessage`) to `TypedDict` (Python 3.8+) or `@dataclass(frozen=True)` so the FIELDS are visible to LLMs, not just the name. This is the natural Phase 2 of this track. Prerequisites: this track (so the field-level schema has a stable name to attach to). + +### 12.2 Project References + +- `scripts/audit_weak_types.py` (already committed; `84fd9ac9`) — the audit that found 430 weak sites. +- `docs/guide_testing.md` — test conventions. +- `conductor/code_styleguides/error_handling.md` (created in the data_oriented_error_handling_20260606 track) — the convention for `Result` types; the new type-aliases convention lives alongside. +- `conductor/product-guidelines.md` "Data-Oriented Error Handling" — the convention this track extends (Data Structure Strengthening is a new top-level convention in the same family). +- `conductor/tracks/data_oriented_error_handling_20260606/` — the previous track that established the convention format; this track uses the same pattern. + +### 12.3 External References + +- **Python `typing.TypeAlias`** — the canonical mechanism for type aliases (PEP 613, Python 3.10+). +- **Python `typing.NamedTuple`** — for tuple-with-fields. +- **Python `typing.TypedDict`** — for the future Phase 2 (not in this track). +- **Mike Acton on data-oriented design** — the "data is the API" framing that motivates NAMING data structures clearly. +- **Casey Muratori on module layer boundaries** — the convention that each module owns its data and exposes a clear interface. diff --git a/conductor/tracks/data_structure_strengthening_20260606/state.toml b/conductor/tracks/data_structure_strengthening_20260606/state.toml new file mode 100644 index 00000000..7ade3957 --- /dev/null +++ b/conductor/tracks/data_structure_strengthening_20260606/state.toml @@ -0,0 +1,91 @@ +# Track state for data_structure_strengthening_20260606 +# Updated by Tier 2 Tech Lead as tasks complete + +[meta] +track_id = "data_structure_strengthening_20260606" +name = "Data Structure Strengthening (Type Aliases + NamedTuples)" +status = "active" +current_phase = 0 +last_updated = "2026-06-06" + +[phases] +phase_1 = { status = "pending", checkpointsha = "", name = "Aliases + 6-file replacement + audit baseline" } +phase_2 = { status = "pending", checkpointsha = "", name = "NamedTuples + docs + archive" } + +[tasks] +# Phase 1: Aliases + 6-file replacement +t1_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_type_aliases.py (verify 10 TypeAliases + 1 NamedTuple import and resolve to expected types; verify Result[FileItems] composes)" } +t1_2 = { status = "pending", commit_sha = "", description = "Green: create src/type_aliases.py with 10 TypeAliases (Metadata, CommsLogEntry, CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition, ToolCall, CommsLogCallback) and 1 NamedTuple (FileItemsDiff)" } +t1_3 = { status = "pending", commit_sha = "", description = "Replace 139 weak sites in src/ai_client.py with the new aliases (79 dict_str_any + 56 list_of_dict + 2 Optional[List[Dict]] + 2 assign_tuple_literal)" } +t1_4 = { status = "pending", commit_sha = "", description = "Replace 86 weak sites in src/app_controller.py (62 dict_str_any + 20 list_of_dict + 4 optional_dict)" } +t1_5 = { status = "pending", commit_sha = "", description = "Replace 51 weak sites in src/models.py (48 dict_str_any + 3 list_of_dict)" } +t1_6 = { status = "pending", commit_sha = "", description = "Replace 32 weak sites in src/api_hook_client.py (30 dict_str_any + 2 list_of_dict)" } +t1_7 = { status = "pending", commit_sha = "", description = "Replace 20 weak sites in src/project_manager.py (16 dict_str_any + 3 list_of_dict + 1 optional_dict)" } +t1_8 = { status = "pending", commit_sha = "", description = "Replace 17 weak sites in src/aggregate.py (10 dict_str_any + 7 list_of_dict)" } +t1_9 = { status = "pending", commit_sha = "", description = "Add --strict mode to scripts/audit_weak_types.py (compares current count to baseline file; exits 1 if increased)" } +t1_10 = { status = "pending", commit_sha = "", description = "Generate scripts/audit_weak_types.baseline.json with the post-Phase-1 count" } +t1_11 = { status = "pending", commit_sha = "", description = "Red: tests/test_audit_weak_types.py (verify regex patterns, Finding dataclass, report format)" } +t1_12 = { status = "pending", commit_sha = "", description = "Run full test suite; confirm no regressions in 6 refactored files" } +t1_13 = { status = "pending", commit_sha = "", description = "Run audit; confirm count dropped from 430 to ~60; commit the new baseline" } +t1_14 = { status = "pending", commit_sha = "", description = "Phase 1 checkpoint commit + git note" } +# Phase 2: NamedTuples + docs + archive +t2_1 = { status = "pending", commit_sha = "", description = "Convert src/ai_client.py:_reread_file_items to return FileItemsDiff NamedTuple (replaces Tuple[List[FileItem], List[FileItem]]); update ~3-4 call sites" } +t2_2 = { status = "pending", commit_sha = "", description = "Opportunistic NamedTuple conversions for 1-2 more tuple returns (screen coords, etc.)" } +t2_3 = { status = "pending", commit_sha = "", description = "Create conductor/code_styleguides/type_aliases.md (canonical reference for the alias convention; 5 patterns + decision tree + examples)" } +t2_4 = { status = "pending", commit_sha = "", description = "Add 'Data Structure Conventions' section to conductor/product-guidelines.md (referencing the new styleguide)" } +t2_5 = { status = "pending", commit_sha = "", description = "Manual smoke test: launch GUI; verify type aliases don't break anything; verify audit --strict mode" } +t2_6 = { status = "pending", commit_sha = "", description = "Phase 2 checkpoint commit + git note (TRACK COMPLETE)" } +t2_7 = { status = "pending", commit_sha = "", description = "git mv conductor/tracks/data_structure_strengthening_20260606 to conductor/tracks/archive/" } +t2_8 = { status = "pending", commit_sha = "", description = "Update conductor/tracks.md: move entry to Recently Completed" } +t2_9 = { status = "pending", commit_sha = "", description = "Final state.toml update: mark all phases completed; add follow-up track typed_dict_migration_20260606 placeholder" } + +[verification] +# Filled as phases complete +phase_1_aliases_module_complete = false +phase_1_ai_client_refactored = false +phase_1_app_controller_refactored = false +phase_1_models_refactored = false +phase_1_api_hook_client_refactored = false +phase_1_project_manager_refactored = false +phase_1_aggregate_refactored = false +phase_1_audit_strict_mode_added = false +phase_1_baseline_committed = false +phase_2_file_items_diff_named_tuple = false +phase_2_opportunistic_named_tuples = false +phase_2_styleguide_written = false +phase_2_product_guidelines_updated = false +phase_2_smoke_test_passed = false +phase_2_track_archived = false +full_test_suite_passes = false +no_new_optional_introduced = false +audit_count_dropped_to_60 = false + +[audit_count_progression] +# Filled as tasks complete +baseline = 430 +after_ai_client = 291 +after_app_controller = 205 +after_models = 154 +after_api_hook_client = 122 +after_project_manager = 102 +after_aggregate = 85 +phase_1_checkpoint_committed = 0 # TBD +phase_2_checkpoint_committed = 0 # TBD + +[files_refactored] +ai_client = { weak_sites_before = 139, weak_sites_after = 0, status = "pending" } +app_controller = { weak_sites_before = 86, weak_sites_after = 0, status = "pending" } +models = { weak_sites_before = 51, weak_sites_after = 0, status = "pending" } +api_hook_client = { weak_sites_before = 32, weak_sites_after = 0, status = "pending" } +project_manager = { weak_sites_before = 20, weak_sites_after = 0, status = "pending" } +aggregate = { weak_sites_before = 17, weak_sites_after = 0, status = "pending" } + +[typed_dict_migration_followup] +track_id = "typed_dict_migration_20260606" +status = "planned_in_data_structure_strengthening_20260606" +converts = ["CommsLogEntry", "FileItem", "ToolDefinition", "HistoryMessage"] +to = "TypedDict or @dataclass(frozen=True)" + +[public_api_migration_followup] +# From the data_oriented_error_handling track +note = "This track does not depend on or block the public_api_migration_20260606 track. They are independent."