manual_slop/conductor/tracks/data_structure_strengthening_20260606/spec.md
ed ed42a97a9b conductor(track): Initialize data_structure_strengthening_20260606
Track + metadata + state + tracks.md registration for the type-aliases
refactor that follows the audit_weak_types.py findings (430 weak sites
across 29 of 61 files; 86% concentrated in 6 high-traffic files).

Key design decisions (per user approval):
- 10 TypeAlias definitions in src/type_aliases.py (Metadata, CommsLogEntry,
  CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition,
  ToolCall, CommsLogCallback).
- 1 NamedTuple (FileItemsDiff) for the _reread_file_items return.
- Mechanical replacement of 345 weak sites across 6 files (NOT 430; the
  remaining 85 are in 23 lower-impact files deferred to future tracks).
- scripts/audit_weak_types.py gains a --strict mode and a baseline file
  (scripts/audit_weak_types.baseline.json) so the count is enforced.
- 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples
  + docs + archive.
- Honest about what's missing: TypedDict / @dataclass migration is a
  follow-up track (typed_dict_migration_20260606), not this one.
- Coexistence with the data_oriented_error_handling_20260606 track's
  Result[T] / ErrorInfo: the aliases are value-level (data types), Result
  is control-level (wrapper). They compose (Result[FileItems] is valid).
  No conflict.

Audit baseline:
- Pre-track: 430 weak sites, 0 strong patterns
- Target after Phase 1: ~60 weak sites (only the 23 lower-impact files)
- Top 4 unique type strings account for 86% of findings (4-6 aliases
  eliminate the bulk of the noise).

Not blocked by anything; can be executed independently of the other
pending tracks. Blocks typed_dict_migration_20260606 (the future Phase 2).
2026-06-06 17:49:22 -04:00

Track: Data Structure Strengthening (Type Aliases + NamedTuples)

Status: Active (spec approved 2026-06-06)
Initialized: 2026-06-06
Owner: Tier 2 Tech Lead
Priority: Medium (developer + AI-readability; not a regression blocker)


1. Overview

This track introduces a small, focused set of TypeAlias definitions in a new src/type_aliases.py module and replaces ~345 anonymous dict[str, Any] / list[dict[...]] usages across 6 high-traffic files (src/ai_client.py, src/app_controller.py, src/models.py, src/api_hook_client.py, src/project_manager.py, src/aggregate.py). It also converts 2-3 tuple returns to NamedTuples for self-documenting struct semantics.

The track is data-grounded: a new AST-based audit script (scripts/audit_weak_types.py, committed in 84fd9ac9) found 430 weak type sites across 29 of 61 files. After whitespace normalization, only 26 unique type strings exist; the top 4 (list[dict[str, Any]], dict[str, Any], Dict[str, Any], List[Dict[str, Any]]) account for 86% of findings. A small set of well-named aliases eliminates the vast majority.
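
To make the counting concrete, here is a minimal sketch of how an AST-based pass could tally weak annotations. It is not the contents of scripts/audit_weak_types.py; the function name count_weak_annotations and the WEAK_TYPE_STRINGS set are assumptions made for illustration. In this sketch, ast.unparse re-renders each annotation with canonical spacing, which stands in for the whitespace normalization mentioned above.

```python
import ast
from collections import Counter

# Assumption: only the four dominant type strings named in the overview.
WEAK_TYPE_STRINGS = {
    "list[dict[str, Any]]",
    "dict[str, Any]",
    "Dict[str, Any]",
    "List[Dict[str, Any]]",
}


def count_weak_annotations(source: str) -> Counter:
    """Count annotations whose normalized text is in WEAK_TYPE_STRINGS."""
    counts: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.AnnAssign, ast.arg)):
            annotation = node.annotation      # variable or parameter annotation
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            annotation = node.returns         # return annotation
        else:
            continue
        if annotation is None:                # unannotated parameter or return
            continue
        text = ast.unparse(annotation)        # canonical spacing
        if text in WEAK_TYPE_STRINGS:
            counts[text] += 1
    return counts


sample = (
    "def load(path: str, opts: dict[str,Any]) -> list[dict[str, Any]]: ...\n"
    "cache: Dict[str,   Any] = {}\n"
    "limit: int = 10\n"
)
weak_counts = count_weak_annotations(sample)
```

Because the comparison runs on the re-rendered annotation text, dict[str,Any] and dict[str, Any] fall into the same bucket, which is why the number of unique type strings stays small.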

The current codebase has ZERO strong type aliases (no TypeAlias, no NamedTuple, no pydantic.BaseModel for these shapes). This is the worst case for AI readability — an LLM reading the code has zero schema hints and must guess the shape from usage at every call site.

Scope is deliberately bounded. The track adds 10 type aliases (6 core concept names plus their list and callback forms) and converts 2-3 tuple returns to NamedTuples. It does NOT migrate to TypedDict or @dataclass schemas (that's a much larger Phase 2, planned as a separate follow-up). It does NOT touch the 23 lower-impact files; they remain as dict[str, Any] until a future track migrates them.

2. Goals (Priority Order)

| Priority | Goal | Rationale |
|---|---|---|
| A (primary value) | Add 10 TypeAlias definitions to src/type_aliases.py: Metadata, CommsLogEntry, CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition, ToolCall, CommsLogCallback. | Each alias names a concept that currently appears as dict[str, Any] or list[dict[str, Any]] in 30+ sites. The name is self-documenting; the underlying type is the same. |
| A (primary value) | Mechanical replacement of ~345 weak sites in 6 files: src/ai_client.py, src/app_controller.py, src/models.py, src/api_hook_client.py, src/project_manager.py, src/aggregate.py. | The audit shows about 80% of findings (345 of 430) are in these 6 files. A focused refactor here eliminates the bulk of the noise. |
| B (architectural) | The new aliases are the canonical names going forward. New code MUST use the aliases. Old code is migrated opportunistically (this track + future tracks). | One source of truth. The audit script (scripts/audit_weak_types.py) becomes a permanent CI gate that fails when new weak types are introduced. |
| B (architectural) | Audit script exits 0 with significantly fewer findings after the refactor. Re-running --json should show the count drop from 430 to ~60 (only the 23 lower-impact files remain). | Measurable success criterion. The audit script is the ground truth. |
| C (optimization) | Convert 2-3 tuple returns to NamedTuples. Specifically, the (refreshed, changed) tuple returned by _reread_file_items() becomes a FileItemsDiff NamedTuple. Other 1-occurrence tuples (screen coords, etc.) are converted opportunistically. | The tuple return pattern is rarer than the dict pattern (4 sites vs 430), but each conversion is high-value for self-documentation. |
| C (documentation) | Add a short "Data Structure Conventions" section to conductor/product-guidelines.md and a "Type Aliases" subsection in conductor/code_styleguides/error_handling.md (or a new code_styleguides/type_aliases.md). | The convention is visible in the project-level guidance. Future plans reference it. |
| D (forward-looking) | Plan a Phase-2 follow-up track: "TypedDict / dataclass Migration" that converts the most-used aliases (CommsLogEntry, FileItem) to TypedDict or @dataclass(frozen=True) so the FIELDS are visible to LLMs, not just the name. NOT in this track; documented in §12.1. | Honest about what's missing. Phase 2 is a separate effort. |

2.1 Non-Goals (this track)

  • Not converting dict[str, Any] to TypedDict or @dataclass. That's Phase 2 of a future track. This track stops at NAMING the shapes; it does not give them structure.
  • Not touching the 23 lower-impact files. They stay as dict[str, Any] until a future incremental track migrates them. The audit script makes their weakness VISIBLE so the cost of ignoring them is documented.
  • Not changing the Result[T] pattern from the data_oriented_error_handling_20260606 track. The aliases complement Result; they don't replace it. (ErrorInfo is a @dataclass, not a TypeAlias; it's already structured.)
  • Not adding pydantic models. The project doesn't currently use pydantic for these shapes; introducing it would be a much larger architectural decision.
  • Not modifying the data_oriented_error_handling_20260606 track's src/result_types.py. The aliases live in a new file (src/type_aliases.py); they coexist with Result/ErrorInfo.
  • Not changing the public API of any function. The aliases are TYPE-LEVEL ONLY; runtime behavior is identical.

3. Architecture

3.1 The Aliases

src/type_aliases.py (NEW, ~80 lines):

from typing import Any, Callable, TypeAlias

# A single key-value record. The shape is intentionally open (Any value type)
# because different concepts use different value types (str for paths, int for
# counts, dict for nested structures, etc.). The name documents the SEMANTIC
# ROLE, not the structural shape.
Metadata: TypeAlias = dict[str, Any]

# A single entry in the AI comms log (the in-memory ring buffer of API
# requests/responses/timestamps/kind/direction). Used by _comms_log,
# _append_comms, get_comms_log, comms_log_callback, etc.
CommsLogEntry: TypeAlias = Metadata

# A list of comms log entries.
CommsLog: TypeAlias = list[CommsLogEntry]

# A single entry in the AI provider's conversation history (the messages
# list passed to/from OpenAI/Anthropic/Gemini). Used by _anthropic_history,
# _deepseek_history, _minimax_history, _grok_history, _llama_history, etc.
HistoryMessage: TypeAlias = Metadata

# A list of history messages.
History: TypeAlias = list[HistoryMessage]

# A single file item in the context (path, content, is_image flag, base64
# data, mtime). Used by file_items parameter (the most-threaded list in
# the codebase), _reread_file_items, _build_file_context_text, etc.
FileItem: TypeAlias = Metadata

# A list of file items. The most common weak pattern in the codebase.
FileItems: TypeAlias = list[FileItem]

# A single tool definition (function name, description, parameters schema).
# Used by _build_anthropic_tools, _CACHED_ANTHROPIC_TOOLS, _get_anthropic_tools,
# and the corresponding openai-compatible / gemini / deepseek builders.
ToolDefinition: TypeAlias = Metadata

# A single tool call from the model (id, type, function: {name, arguments}).
# Used by response.tool_calls parsing across all providers.
ToolCall: TypeAlias = Metadata

# A callback that receives a comms log entry. Used by comms_log_callback,
# confirm_and_run_callback, etc.
CommsLogCallback: TypeAlias = Callable[[CommsLogEntry], None]

3.2 The NamedTuples (Phase 2)

src/type_aliases.py (continued):

from typing import NamedTuple

# Return type of _reread_file_items. The two lists are conceptually distinct:
# refreshed = items whose mtime was checked and the content re-read; changed =
# items whose content actually changed (subset of refreshed).
class FileItemsDiff(NamedTuple):
    refreshed: FileItems
    changed: FileItems

(Optional, if 1-2 more tuple returns warrant conversion — e.g., Optional[Tuple[int, int, int, int]] for screen coords, etc. — add them as separate NamedTuples with semantic names.)

3.3 Why These Specific Aliases

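The 6 core aliases were chosen to be concept-distinct: each names a different semantic role that the code uses. (The other 4 definitions in §3.1, CommsLog, History, FileItems, and CommsLogCallback, are list or callback forms built on these.) Using the same name (Metadata) for all of them would collapse the semantic distinction; using 30 names would exceed the AI's vocabulary budget. 6 core names is the sweet spot: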

| Alias | Semantic role | Distinct from |
|---|---|---|
| Metadata | generic key-value record | (root) |
| CommsLogEntry | a single comms log entry | HistoryMessage (different lifecycle) |
| HistoryMessage | a single AI provider history message | CommsLogEntry (different lifecycle) |
| FileItem | a single file in the context | ToolDefinition (different shape: paths vs function specs) |
| ToolDefinition | a single tool definition | FileItem, ToolCall |
| ToolCall | a single tool call from the model | ToolDefinition (definition vs invocation) |

Some of these are aliased to Metadata (e.g., CommsLogEntry: TypeAlias = Metadata). This is intentional: Phase 2 can convert Metadata to a TypedDict (or split into per-concept TypedDicts) and the aliases continue to work without breaking changes. The aliases are STABLE NAMES; the underlying type can evolve.

3.4 Module Layout

src/
  type_aliases.py              # NEW: 10 TypeAliases + 1-3 NamedTuples
  ai_client.py                 # MODIFIED: import aliases; replace ~139 weak sites
  app_controller.py            # MODIFIED: import aliases; replace ~86 weak sites
  models.py                    # MODIFIED: import aliases; replace ~51 weak sites
  api_hook_client.py           # MODIFIED: import aliases; replace ~32 weak sites
  project_manager.py           # MODIFIED: import aliases; replace ~20 weak sites
  aggregate.py                 # MODIFIED: import aliases; replace ~17 weak sites
  mcp_client.py                # UNCHANGED (only 9 weak sites; below the threshold)

conductor/
  product-guidelines.md        # MODIFIED: new "Data Structure Conventions" section
  code_styleguides/
    type_aliases.md            # NEW: the canonical reference (or co-located in error_handling.md)

scripts/
  audit_weak_types.py          # already committed in 84fd9ac9; runs as CI gate

tests/
  test_type_aliases.py         # NEW: verify the aliases import and resolve to the right types
  (existing test files):       # MODIFIED: update the 6 files; existing tests should pass unchanged

3.5 Coexistence with Result[T] and ErrorInfo

The new Metadata family aliases are VALUE-LEVEL types (what's in a dict). The Result[T] from data_oriented_error_handling_20260606 is a CONTROL-LEVEL wrapper (a data struct that includes errors). They compose:

# Data-oriented error handling returns:
Result[CommsLogEntry]   # a Result wrapping a single comms log entry
Result[History]         # a Result wrapping a list of history messages
Result[FileItems]       # a Result wrapping a list of file items

# The aliases name the "T" in Result[T], not the Result itself.

This is consistent: Result is a generic that wraps any data type. Naming the data types (via TypeAlias) makes the generic concrete without changing the Result pattern.

4. Per-File Refactor Plan

4.1 src/ai_client.py (139 sites — largest offender)

Pattern: _anthropic_history: list[dict[str, Any]] (and 5 sibling histories), _comms_log: deque[dict[str, Any]], get_comms_log -> list[dict[str, Any]], _build_anthropic_tools -> list[dict[str, Any]], _reread_file_items -> tuple[list[...], list[...]], etc.

Refactor strategy:

  • Replace all 79 dict[str, Any] / Dict[str, Any] with Metadata or the more specific alias.
  • Replace all 56 list[dict[...]] with CommsLog / History / FileItems / list[ToolDefinition] based on the SEMANTIC ROLE of the list.
  • Replace the 2 Optional[List[Dict[...]]] sites with the matching Optional alias: Optional[FileItems] for file-item lists, and Optional[list[ToolDefinition]] for _CACHED_ANTHROPIC_TOOLS.
  • 2 tuple-return literal returns: the cast(...) patterns in _dispatch_tool. Replace with ToolCall extraction.

Naming heuristic: for each list of dicts, look at the variable name + the function name to determine the semantic role. E.g., _comms_log → CommsLog; _anthropic_history → History; _build_anthropic_tools → list[ToolDefinition]; _reread_file_items(file_items: list[...]) → FileItems.

4.2 src/app_controller.py (86 sites)

Pattern: Some fields are already strong and stay as-is, e.g. _pending_dialog: Optional[ConfirmDialog] = None. Others, such as last_error: Optional[Dict[str, str]] = None, could become Optional[ErrorInfo] from the data_oriented track. Most weak sites, however, are in the Hook API request/response payloads and the pre_tool_callback family.

Refactor strategy:

  • The 62 dict_str_any sites: replace with Metadata or CommsLogEntry based on context.
  • The 20 list_of_dict sites: replace with the appropriate alias.
  • The 4 optional_dict sites: replace with Optional[Metadata] (or Optional[CommsLogEntry] if the context is the hook request payload).

4.3 src/models.py (51 sites)

Pattern: Dataclass fields. E.g., script: Optional[str] = None (stays as-is; STRONG), but also target_file: Optional[str] = None and many fields where the type is Optional[Dict[str, Any]] (in dataclass fields).

Refactor strategy: Replace 48 dict_str_any with Optional[Metadata]; 3 list_of_dict with the appropriate alias.

4.4 src/api_hook_client.py (32 sites)

Pattern: HTTP request/response payloads. E.g., payload: Dict[str, Any], data: dict[str, Any].

Refactor strategy: 30 dict_str_any → Metadata; 2 list_of_dict → list[Metadata].

4.5 src/project_manager.py (20 sites)

Pattern: TOML config dicts. E.g., proj: dict[str, Any], data: dict[str, Any].

Refactor strategy: 16 dict_str_any → Metadata; 3 list_of_dict → list[Metadata]; 1 optional_dict → Optional[Metadata].

4.6 src/aggregate.py (17 sites)

Pattern: Aggregation result dicts. E.g., result: dict[str, list[dict[str, Any]]].

Refactor strategy: 10 dict_str_any → Metadata; 7 list_of_dict → appropriate alias.

4.7 Phase 2 NamedTuple conversions

  • _reread_file_items in src/ai_client.py (returns Tuple[List[FileItem], List[FileItem]]) → returns FileItemsDiff. Affects ~3-4 call sites.
  • 1-2 screen-coord tuples (1-occurrence each) — opportunistic. If the call site is clear and the names are obvious, convert; otherwise leave.

5. The Audit Script as a Permanent CI Gate

By default, scripts/audit_weak_types.py exits 0 even when findings exist (it is informational). After this track it also serves as a permanent CI gate, and the gate uses a stricter mode:

# New mode: --strict, exits 1 if any new weak site is added in a PR
python scripts/audit_weak_types.py --strict

The --strict mode compares the current count to a baseline (stored in scripts/audit_weak_types.baseline.json). If the current count is HIGHER than the baseline, exit 1. The baseline is regenerated after this track to the post-refactor count (~60 findings, only the 23 lower-impact files remain).

The --strict mode does not exist yet; it is implemented as part of this track (the final Phase 1 task). Once it is in place, PRs that introduce new dict[str, Any] annotations or anonymous tuples will fail CI.
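
The comparison itself is small. Below is a minimal sketch of the exit-code rule, assuming the baseline file is JSON shaped like {"total": <int>}; that layout and the strict_exit_code name are assumptions, since the baseline format is not specified here.

```python
import json
import tempfile
from pathlib import Path


def strict_exit_code(current_count: int, baseline_path: Path) -> int:
    """Return 1 if current_count is higher than the recorded baseline, else 0.

    Assumes the baseline file contains {"total": <int>}.
    """
    baseline = json.loads(baseline_path.read_text(encoding="utf-8"))
    return 1 if current_count > int(baseline["total"]) else 0


with tempfile.TemporaryDirectory() as tmp:
    baseline_file = Path(tmp) / "audit_weak_types.baseline.json"
    baseline_file.write_text(json.dumps({"total": 60}), encoding="utf-8")

    unchanged = strict_exit_code(60, baseline_file)   # same count: passes
    regressed = strict_exit_code(61, baseline_file)   # one new weak site: fails
    improved = strict_exit_code(55, baseline_file)    # fewer sites: passes
```

Under this rule an improvement passes but does not tighten the gate by itself; the baseline file would have to be regenerated for the lower count to become the new ceiling.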

6. Configuration

No new dependencies. No new environment variables. No new config files.

The aliases live in src/type_aliases.py (pure stdlib typing.TypeAlias).

7. Testing Strategy

| Test File | Purpose | Coverage Target |
|---|---|---|
| tests/test_type_aliases.py | Verify the aliases import; verify they resolve to the expected types; verify they compose with Result[T] (e.g., Result[FileItems] is a valid generic). | 100% |
| tests/test_audit_weak_types.py | Verify the audit script's regex patterns are correct; verify the Finding dataclass is populated correctly; verify the report matches expectations. | 90% |
| tests/test_ai_client.py (existing) | Verify no regressions after the 139-site replacement. | 100% (regression) |
| tests/test_app_controller.py (existing) | Verify no regressions after the 86-site replacement. | 100% (regression) |
| tests/test_models.py (existing) | Verify no regressions after the 51-site replacement. | 100% (regression) |
| tests/test_api_hook_client.py (existing) | Verify no regressions after the 32-site replacement. | 100% (regression) |
| tests/test_project_manager.py (existing) | Verify no regressions after the 20-site replacement. | 100% (regression) |
| tests/test_aggregate.py (existing) | Verify no regressions after the 17-site replacement. | 100% (regression) |
| tests/test_mcp_client.py (existing) | Verify no regressions. (mcp_client is unchanged but the aliases may be adopted opportunistically in Phase 1.5 if convenient.) | 100% (regression) |

Mocking strategy: Existing tests use unittest.mock.patch; no changes needed.

Audit baseline check: After Phase 1, the audit script should report 0 NEW findings relative to the pre-track run. The total may land above the ~60 target if a few sites were missed, but it must still trend DOWN from 430. After Phase 2, the count should be at or below the pre-track baseline minus 50 (the targeted reductions).

8. Migration / Rollout

| Phase | What | Risk |
|---|---|---|
| Phase 1: Aliases + 6-file replacement + audit baseline | Add src/type_aliases.py. Add tests/test_type_aliases.py. Mechanical replacement in 6 files. Add --strict mode to the audit script. Generate the new baseline. | Medium. ~345 sites of mechanical replacement. Mitigated by existing test coverage. |
| Phase 2: NamedTuples + docs + archive | Convert 2-3 tuple returns to NamedTuples. Update conductor/product-guidelines.md and code_styleguides/. Manual smoke test. Archive the track. | Low. ~3-4 sites of tuple conversion. Docs-only changes. |

Each phase has its own checkpoint commit and git note.

9. Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Mechanical replacement misses a few sites; the count doesn't drop as expected. | Medium | Low | The audit script is the source of truth. Re-run after Phase 1; investigate any anomalies. |
| Renaming dict[str, Any] to Metadata (or another alias) changes how some tests introspect types (e.g., isinstance(x, dict)). | Low | Medium | The aliases are TYPE-LEVEL ONLY; at runtime, Metadata IS dict[str, Any] IS dict. isinstance(x, dict) continues to work. Test cases that use get_type_hints() may need updating; documented in the test plan. |
| A future contributor adds a new dict[str, Any] and the audit script doesn't catch it. | Low | Low | The audit script's regex patterns are exhaustive for the current 430 findings. New patterns (e.g., a new Mapping[str, Any]) would be missed. The track documents the patterns the script knows; future contributions of new patterns warrant extending the script. |
| The aliases conflict with the Result[T] and ErrorInfo from the data_oriented_error_handling track. | Low | Low | The aliases are VALUE-LEVEL (data types); Result and ErrorInfo are CONTROL-LEVEL (wrappers). They compose: Result[FileItems] is valid. No conflict. |
| The 6-file mechanical replacement is too large to review in one PR. | Medium | Low | Phase 1 is split into 6 sub-tasks (one per file) in the plan, each with its own commit. Reviewers can review file-by-file. |
| The 23 lower-impact files are NEVER migrated. | High | Low (acceptable) | The audit script stays in the codebase as a permanent CI gate. The cost of ignoring the 23 files is now VISIBLE. Future tracks can pick them up opportunistically. |

10. Out of Scope (Explicit)

  • TypedDict / @dataclass migration of the Metadata family. Deferred to a Phase 2 of a future track. This track adds NAMES; the next track adds STRUCTURE.
  • The 23 lower-impact files (those with 1-9 weak sites each). Deferred; will be addressed opportunistically or in a future incremental track.
  • Adding pydantic models. Not requested; would be a much larger architectural decision.
  • Changing function signatures at the runtime level. The aliases are TYPE-LEVEL; runtime behavior is identical.
  • Modifying scripts/audit_weak_types.py's regex patterns. The patterns are correct for the current findings. If new patterns emerge, a future track can extend the script.
  • Migrating the data_oriented_error_handling_20260606 track's src/result_types.py aliases. The 2 type-aliases modules are SEPARATE: result_types.py has ErrorInfo / Result / ErrorKind; type_aliases.py has Metadata / CommsLog / FileItem / etc. They don't overlap.

11. Open Questions

  1. 6 aliases or 10? §3.1 defines 10: Metadata, CommsLogEntry, CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition, ToolCall, CommsLogCallback (6 core concept names plus list and callback forms). Should we cut to 4-6 to minimize the AI vocabulary? (Proposal: keep all 10; each is named for a distinct concept, and the names are self-explanatory. The "vocabulary cost" is the same as adding 10 new function names to a module, well within normal Python codebase scale.)
  2. Should FileItem and ToolDefinition be TypedDict from the start? A TypedDict gives the AI field-level hints, not just a name. But introducing TypedDict requires knowing the FIELDS, which is a deeper semantic task. (Proposal: Phase 1 uses TypeAlias = dict[str, Any]; Phase 2 of a future track converts to TypedDict. Keeps the current track scope tight.)
  3. Should the audit script enforce a count threshold (e.g., "no more than 100 weak sites total") or a per-file threshold (e.g., "no file may have more than 50 weak sites")? (Proposal: per-file threshold is more actionable. A future PR that introduces 20 new dict[str, Any] in foo.py would fail even if the total count didn't increase.)
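
For question 3, a per-file check could look like the sketch below. It compares per-file counts against a per-file baseline rather than against a fixed cap; the per_file_regressions name, the dict-of-counts input shape, and the file names are all assumptions for illustration.

```python
def per_file_regressions(
    baseline: dict[str, int], current: dict[str, int]
) -> dict[str, int]:
    """Map each file whose weak-site count rose to the size of the increase."""
    return {
        path: count - baseline.get(path, 0)
        for path, count in current.items()
        if count > baseline.get(path, 0)
    }


# The total is 35 in both snapshots, so a total-only gate would pass.
baseline = {"src/foo.py": 5, "src/bar.py": 30}
current = {"src/foo.py": 25, "src/bar.py": 10}

regressions = per_file_regressions(baseline, current)
total_unchanged = sum(baseline.values()) == sum(current.values())
```

Here the 20 new sites in foo.py are reported even though cleanup in bar.py offsets them in the total, which is the case the proposal is concerned with.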

12. See Also

12.1 Follow-up Track (planned; not in this spec)

"TypedDict / dataclass Migration" (typed_dict_migration_20260606 or similar) — converts the most-used aliases (CommsLogEntry, FileItem, ToolDefinition, HistoryMessage) to TypedDict (Python 3.8+) or @dataclass(frozen=True) so the FIELDS are visible to LLMs, not just the name. This is the natural Phase 2 of this track. Prerequisites: this track (so the field-level schema has a stable name to attach to).

12.2 Project References

  • scripts/audit_weak_types.py (already committed; 84fd9ac9) — the audit that found 430 weak sites.
  • docs/guide_testing.md — test conventions.
  • conductor/code_styleguides/error_handling.md (created in the data_oriented_error_handling_20260606 track) — the convention for Result types; the new type-aliases convention lives alongside.
  • conductor/product-guidelines.md "Data-Oriented Error Handling" — the convention this track extends (Data Structure Strengthening is a new top-level convention in the same family).
  • conductor/tracks/data_oriented_error_handling_20260606/ — the previous track that established the convention format; this track uses the same pattern.

12.3 External References

  • Python typing.TypeAlias — the canonical mechanism for type aliases (PEP 613, Python 3.10+).
  • Python typing.NamedTuple — for tuple-with-fields.
  • Python typing.TypedDict — for the future Phase 2 (not in this track).
  • Mike Acton on data-oriented design — the "data is the API" framing that motivates NAMING data structures clearly.
  • Casey Muratori on module layer boundaries — the convention that each module owns its data and exposes a clear interface.