# Track: Data Structure Strengthening (Type Aliases + NamedTuples)

**Status:** Active (spec approved 2026-06-06)
**Initialized:** 2026-06-06
**Owner:** Tier 2 Tech Lead
**Priority:** Medium (developer + AI-readability; not a regression blocker)

---

## 1. Overview

This track introduces a small, focused set of `TypeAlias` definitions in a new `src/type_aliases.py` module and replaces ~345 anonymous `dict[str, Any]` / `list[dict[...]]` usages across 6 high-traffic files (`src/ai_client.py`, `src/app_controller.py`, `src/models.py`, `src/api_hook_client.py`, `src/project_manager.py`, `src/aggregate.py`). It also converts 2-3 tuple returns to `NamedTuple`s for self-documenting struct semantics.

**In addition**, the track introduces a new `docs/type_registry/` directory that contains **auto-generated** documentation describing the fields of every `TypeAlias`, `NamedTuple`, `@dataclass`, and `TypedDict` in `src/`. A new script `scripts/generate_type_registry.py` reads `src/` via AST and writes the docs. The coding agent runs this script as part of track completion; its `--check` mode (exit 1 on drift) is intended for CI, and wiring it into CI is left to the follow-up track (§12.1).

The track is **data-grounded**: a new AST-based audit script (`scripts/audit_weak_types.py`, committed in `84fd9ac9`) found 430 weak type sites across 29 of 61 files. After whitespace normalization, only **26 unique type strings** exist; the top 4 (`list[dict[str, Any]]`, `dict[str, Any]`, `Dict[str, Any]`, `List[Dict[str, Any]]`) account for 86% of findings. A small set of well-named aliases eliminates the vast majority.

**The current codebase has ZERO strong type aliases** (no `TypeAlias`, no `NamedTuple`, no `pydantic.BaseModel` for these shapes). This is the worst case for AI readability — an LLM reading the code has zero schema hints and must guess the shape from usage at every call site.

**Scope is deliberately bounded.** The track adds **6 type aliases**, converts **2-3 tuple returns** to NamedTuples, and introduces the **type registry generator + initial generated docs**. It does NOT migrate to `TypedDict` or `@dataclass` schemas (the registry generator captures the field information in docs form, with much lower upfront cost). It does NOT touch the 23 lower-impact files; they remain as `dict[str, Any]` until a future track migrates them.

### 1.1 Why docs over TypedDict

The original draft of this spec proposed a follow-up track "TypedDict / dataclass Migration" that would convert every `Metadata` alias into a `TypedDict` with explicit fields. After user feedback, this was replaced with the type-registry approach for three reasons:

1. **Lower upfront cost.** `TypedDict` requires designing the schema for every type. The registry generator reads what already exists in code and writes it to docs. No schema design needed.
2. **Better fit for AI workflow.** An LLM that needs to know the fields of `CommsLogEntry` can `cat docs/type_registry/ai_client.md` once, then use the field info. The cost is a few hundred tokens of context, paid only when the LLM needs the schema.
3. **Auto-maintained.** The script runs as part of track completion, and its `--check` mode detects drift (wired into CI by the follow-up track, §12.1). If code changes without a regeneration, `--check` fails until the agent regenerates the docs.

The "cost we eat" is the LLM reading the docs at query time. This is bounded (a few hundred tokens to look up a single type; reading a whole per-file registry costs more, see §3.7) and proportional to the actual information need.

## 2. Goals (Priority Order)

| Priority | Goal | Rationale |
|---|---|---|
| **A (primary value)** | Add 6 `TypeAlias` definitions to `src/type_aliases.py`: `Metadata`, `CommsLogEntry`, `CommsLog`, `FileItem`, `FileItems`, `HistoryMessage`. | Each alias names a concept that currently appears as `dict[str, Any]` or `list[dict[str, Any]]` in 30+ sites. The name is self-documenting; the underlying type is the same. |
| **A (primary value)** | Mechanical replacement of ~345 weak sites in 6 files: `src/ai_client.py`, `src/app_controller.py`, `src/models.py`, `src/api_hook_client.py`, `src/project_manager.py`, `src/aggregate.py`. | Per the per-file counts in §4, these 6 files hold ~80% of the audit's findings (345 of 430). A focused refactor here eliminates the bulk of the noise. |
| **B (architectural)** | The new aliases are the **canonical** names going forward. New code MUST use the aliases. Old code is migrated opportunistically (this track + future tracks). | One source of truth. The audit script (`scripts/audit_weak_types.py`) becomes a permanent CI gate that fails when new weak types are introduced. |
| **B (architectural)** | Audit script exits 0 with significantly fewer findings after the refactor. Re-running `--json` should show the count drop from 430 to ~85 (430 minus the ~345 sites in the 6 target files; only the 23 lower-impact files remain). | Measurable success criterion. The audit script is the ground truth. |
| **C (optimization)** | Convert 2-3 tuple returns to `NamedTuple`s. Specifically: `_reread_file_items()` returns `Tuple[refreshed, changed]` becomes a `FileItemsDiff` NamedTuple. Other 1-occurrence tuples (screen coords, etc.) are converted opportunistically. | The tuple return pattern is rarer than the dict pattern (4 sites vs 430), but each conversion is high-value for self-documentation. |
| **C (documentation)** | Add a short "Data Structure Conventions" section to `conductor/product-guidelines.md` and a new `conductor/code_styleguides/type_aliases.md` reference. | The convention is visible in the project-level guidance. Future plans reference it. |
| **C (innovation)** | New `docs/type_registry/` directory with **auto-generated** documentation describing the fields of every `TypeAlias`, `NamedTuple`, `@dataclass`, and `TypedDict` in `src/`. New script `scripts/generate_type_registry.py` reads `src/` via AST and writes the docs. The script has a `--check` mode for CI: exits 1 if the registry would change. The coding agent runs the script as part of track completion. | The "docs over TypedDict" tradeoff: pay a small token cost at AI-query time (the LLM `cat`s the docs) instead of a large upfront cost (designing `TypedDict` schemas for every type). See §1.1. |
| **D (forward-looking)** | Plan a future "Registry Maintenance" track that promotes the type-registry generation to a CI gate (fail if `--check` reports drift). The registry becomes part of every track's commit workflow. NOT in this track; documented in §12.1. | The track ships the registry; the future track wires it into CI / track-completion workflows. |

### 2.1 Non-Goals (this track)

- **Not** converting `dict[str, Any]` to `TypedDict` or `@dataclass` directly in code. The type registry (added in Phase 2) captures the field information in docs form; a future track may convert the most-used aliases to `TypedDict` (giving schema hints via type hints instead of via docs), but that is a separate decision.
- **Not** touching the 23 lower-impact files. They stay as `dict[str, Any]` until a future incremental track migrates them. The audit script makes their weakness VISIBLE so the cost of ignoring them is documented.
- **Not** changing the `Result[T]` pattern from the `data_oriented_error_handling_20260606` track. The aliases complement `Result`; they don't replace it. (`ErrorInfo` is a `@dataclass`, not a `TypeAlias`; it's already structured.)
- **Not** adding pydantic models. The project doesn't currently use pydantic for these shapes; introducing it would be a much larger architectural decision.
- **Not** modifying the data_oriented_error_handling_20260606 track's `src/result_types.py`. The aliases live in a new file (`src/type_aliases.py`); they coexist with `Result`/`ErrorInfo`.
- **Not** changing the public API of any function. The aliases are TYPE-LEVEL ONLY; runtime behavior is identical.

## 3. Architecture

### 3.1 The Aliases

`src/type_aliases.py` (NEW, ~80 lines):

```python
from typing import Any, Callable, TypeAlias

# A single key-value record. The shape is intentionally open (Any value type)
# because different concepts use different value types (str for paths, int for
# counts, dict for nested structures, etc.). The name documents the SEMANTIC
# ROLE, not the structural shape.
Metadata: TypeAlias = dict[str, Any]

# A single entry in the AI comms log (the in-memory ring buffer of API
# requests/responses/timestamps/kind/direction). Used by _comms_log,
# _append_comms, get_comms_log, comms_log_callback, etc.
CommsLogEntry: TypeAlias = Metadata

# A list of comms log entries.
CommsLog: TypeAlias = list[CommsLogEntry]

# A single entry in the Application's discussion (the UI-layer entry list
# persisted to project TOML; see docs/guide_discussions.md §"Data Model").
# Per the docs refresh (2026-06-08), the entry has 7 standard keys,
# {role, content, collapsed, ts, thinking_segments?, usage?, read_mode?}
# (? = optional), plus optional extras (e.g., tag, comment from custom
# slices). Uses Metadata (dict[str, Any]) because the dict is intentionally
# OPEN: extra keys are allowed and ignored by the renderer. This comment
# documents the standard keys, not the full schema.
#
# IMPORTANT (added 2026-06-08 per nagent_review Pitfall #4): this is the
# UI/curation-layer history. It is *distinct* from ProviderHistoryMessage
# below, which is the provider-side history (the bytes actually replayed
# to the LLM). Conflating them perpetuates the provider-history-divergence
# bug: user edits HistoryMessage.content via the discussion UI but
# ProviderHistoryMessage.content is not updated. The follow-up
# public_api_migration_20260606 track is the natural moment to unify.
HistoryMessage: TypeAlias = Metadata

# A list of history messages.
History: TypeAlias = list[HistoryMessage]

# Provider-side history entry: a single message passed to/from the LLM
# SDK (OpenAI/Anthropic/Gemini/DeepSeek/etc.). Per the docs refresh and
# the nagent_review (Pitfall #4), this is a DIFFERENT layer from
# HistoryMessage. Shape: {role: "user"|"assistant"|"tool"|"system",
# content: str | list[ContentBlock], tool_calls?: [...],
# tool_call_id?: str, name?: str}. Aliased to Metadata for the same
# reason HistoryMessage is (open shape; type aliases as semantic
# names, not structural constraints). The distinction from
# HistoryMessage is the alias name, not the underlying dict shape.
ProviderHistoryMessage: TypeAlias = Metadata

# A list of provider history messages.
ProviderHistory: TypeAlias = list[ProviderHistoryMessage]

# A single file item in the context. Per docs/guide_context_aggregation.md
# §"The FileItem Schema (Full)" (added 2026-06-08), this is a 9-field
# dataclass: {path, auto_aggregate, force_full, view_mode, selected,
# ast_signatures, ast_definitions, ast_mask, custom_slices, injected_at}.
# The alias does NOT point to Metadata; it points to the existing
# models.FileItem class. It is the only single-entry alias backed by a
# concrete project class rather than Metadata; the others remain dict
# aliases, matching the dict form of the FileItem.to_dict()/from_dict()
# round-trip. The string forward reference plus the checker-only import
# below avoid a runtime import cycle (models.py itself imports these aliases).
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from src import models  # import path assumed; match the project's package layout

FileItem: TypeAlias = "models.FileItem"

# A list of file items. The most common weak pattern in the codebase.
FileItems: TypeAlias = list[FileItem]

# A single tool definition (function name, description, parameters schema).
# Used by _build_anthropic_tools, _CACHED_ANTHROPIC_TOOLS, _get_anthropic_tools,
# and the corresponding openai-compatible / gemini / deepseek builders.
ToolDefinition: TypeAlias = Metadata

# A single tool call from the model (id, type, function: {name, arguments}).
# Used by response.tool_calls parsing across all providers.
ToolCall: TypeAlias = Metadata

# A callback that receives a comms log entry. Used by comms_log_callback,
# confirm_and_run_callback, etc.
CommsLogCallback: TypeAlias = Callable[[CommsLogEntry], None]
```

### 3.2 The NamedTuples (Phase 2)

`src/type_aliases.py` (continued):

```python
from typing import NamedTuple

# Return type of _reread_file_items. The two lists are conceptually distinct:
# refreshed = items whose mtime was checked and the content re-read; changed =
# items whose content actually changed (subset of refreshed).
class FileItemsDiff(NamedTuple):
    refreshed: FileItems
    changed: FileItems
```

(Optional, if 1-2 more tuple returns warrant conversion — e.g., `Optional[Tuple[int, int, int, int]]` for screen coords, etc. — add them as separate `NamedTuple`s with semantic names.)

### 3.3 Why These Specific Aliases

The single-entry aliases were chosen to be **concept-distinct**: each names a different semantic role that the code uses. Using one name (`Metadata`) for all of them would collapse the semantic distinction; using 30 names would exceed the AI's vocabulary budget. A handful of names is the sweet spot (the list and callback aliases, such as `CommsLog`, `FileItems`, and `CommsLogCallback`, are built from these):

| Alias | Semantic role | Distinct from |
|---|---|---|
| `Metadata` | generic key-value record | (root) |
| `CommsLogEntry` | a single comms log entry | `HistoryMessage` (different lifecycle) |
| `HistoryMessage` | a single UI/discussion history entry (persisted to project TOML) | `CommsLogEntry` (different lifecycle), `ProviderHistoryMessage` (different layer) |
| `ProviderHistoryMessage` | a single provider-side message replayed to the LLM SDK | `HistoryMessage` (SDK layer vs UI layer) |
| `FileItem` | a single file in the context | `ToolDefinition` (different shape: paths vs function specs) |
| `ToolDefinition` | a single tool definition | `FileItem`, `ToolCall` |
| `ToolCall` | a single tool call from the model | `ToolDefinition` (definition vs invocation) |

Most of these are aliased to `Metadata` (e.g., `CommsLogEntry: TypeAlias = Metadata`). This is intentional: a future track can retarget `Metadata` to a `TypedDict` (or split it into per-concept `TypedDict`s), and call sites keep the same names with unchanged runtime behavior, since `TypedDict` values are plain `dict`s. The aliases are STABLE NAMES; the underlying type can evolve. (`FileItem` is the exception: it already resolves to the `models.FileItem` dataclass.)

### 3.4 Module Layout

```
src/
  type_aliases.py              # NEW: the §3.1 TypeAliases + 1-3 NamedTuples
  ai_client.py                 # MODIFIED: import aliases; replace ~139 weak sites
  app_controller.py            # MODIFIED: import aliases; replace ~86 weak sites
  models.py                    # MODIFIED: import aliases; replace ~51 weak sites
  api_hook_client.py           # MODIFIED: import aliases; replace ~32 weak sites
  project_manager.py           # MODIFIED: import aliases; replace ~20 weak sites
  aggregate.py                 # MODIFIED: import aliases; replace ~17 weak sites
  mcp_client.py                # UNCHANGED (only 9 weak sites; below the threshold)

docs/
  type_registry/
    index.md                   # NEW (generated): top-level table of contents
    type_aliases.md            # NEW (generated): the §3.1 TypeAliases + NamedTuples
    ai_client.md               # NEW (generated): per-source-file reference
    app_controller.md          # NEW (generated)
    models.md                  # NEW (generated)
    api_hook_client.md         # NEW (generated)
    project_manager.md         # NEW (generated)
    aggregate.md               # NEW (generated)
    result_types.md            # NEW (generated): from data_oriented_error_handling_20260606

conductor/
  product-guidelines.md        # MODIFIED: new "Data Structure Conventions" section
  code_styleguides/
    type_aliases.md            # NEW: the canonical reference

scripts/
  audit_weak_types.py          # already committed in 84fd9ac9; runs as CI gate
  generate_type_registry.py    # NEW: AST-based registry generator

tests/
  test_type_aliases.py         # NEW: verify the aliases import and resolve to the right types
  test_generate_type_registry.py # NEW: verify the generator's regex/AST patterns and output format
  (existing test files)        # UNCHANGED: must pass as-is against the 6 modified source files
```

### 3.5 Coexistence with `Result[T]` and `ErrorInfo`

The new `Metadata` family aliases are VALUE-LEVEL types (what's in a dict). The `Result[T]` from `data_oriented_error_handling_20260606` is a CONTROL-LEVEL wrapper (a data struct that includes errors). They compose:

```python
# Data-oriented error handling returns:
Result[CommsLogEntry]   # a Result wrapping a single comms log entry
Result[History]         # a Result wrapping a list of history messages
Result[FileItems]       # a Result wrapping a list of file items

# The aliases name the "T" in Result[T], not the Result itself.
```

This is consistent: `Result` is a generic that wraps any data type. Naming the data types (via `TypeAlias`) makes the generic concrete without changing the `Result` pattern.

### 3.6 Type Registry (Auto-Generated Docs)

`scripts/generate_type_registry.py` is a new AST-based tool that reads `src/` and writes `docs/type_registry/`. It runs as part of track completion (manually, by the coding agent) and offers a `--check` mode for CI; wiring that mode into CI is deferred to the follow-up track (§12.1).

**Output structure:**

```
docs/type_registry/
  index.md              # top-level: full table of contents + summary
  type_aliases.md       # the TypeAliases and NamedTuples from src/type_aliases.py
  ai_client.md          # per-source-file: all dataclasses, NamedTuples, TypeAliases defined or used here
  app_controller.md
  models.md
  api_hook_client.md
  project_manager.md
  aggregate.md
  ...
  (one .md per source file that has structs)
```

**Script behavior:**

```bash
# Generate / regenerate the registry (default mode)
python scripts/generate_type_registry.py

# Verify the registry is up-to-date (CI mode; exits 1 if drift)
python scripts/generate_type_registry.py --check

# Dry run: print what would change without writing
python scripts/generate_type_registry.py --diff
```
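One way the three modes could share a single code path, sketched with `argparse` and `difflib`. `render_registry` is a hypothetical stand-in for the AST pass, and comparing freshly rendered text against the files on disk is an assumption about how drift is detected:

```python
import argparse
import difflib
import sys
from pathlib import Path


def render_registry(src_dir: Path) -> dict[str, str]:
    """Hypothetical stand-in for the AST pass: output filename -> markdown."""
    return {"index.md": "# Type Registry\n"}


def run(argv: list[str], src_dir: Path, out_dir: Path) -> int:
    parser = argparse.ArgumentParser(prog="generate_type_registry")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--check", action="store_true", help="exit 1 if the registry would change")
    mode.add_argument("--diff", action="store_true", help="print a unified diff, write nothing")
    args = parser.parse_args(argv)

    drift = False
    for name, new_text in sorted(render_registry(src_dir).items()):
        path = out_dir / name
        old_text = path.read_text(encoding="utf-8") if path.exists() else ""
        if old_text == new_text:
            continue
        drift = True
        if args.diff:
            sys.stdout.writelines(difflib.unified_diff(
                old_text.splitlines(keepends=True),
                new_text.splitlines(keepends=True),
                fromfile=f"a/{name}", tofile=f"b/{name}"))
        elif not args.check:  # default mode: write the regenerated file
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(new_text, encoding="utf-8")
    return 1 if (args.check and drift) else 0

# A script entry point would call:
#   sys.exit(run(sys.argv[1:], Path("src"), Path("docs/type_registry")))
```

This sketch does not flag registry files whose source file disappeared; a real `--check` would also need to treat those as drift.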

**For each `@dataclass` in `src/`, the script writes a section like:**

```markdown
## `src/models.py::Ticket`

**Kind:** `@dataclass`
**Fields:**
- `id: str` — unique ticket identifier
- `title: str` — human-readable title
- `status: str = "todo"` — current status
- `priority: int = 0` — priority for queue ordering
- `created_at: datetime.datetime` — when created
- `dependencies: list[str] = field(default_factory=list)` — ticket IDs this depends on
- `metadata: Metadata` — opaque key-value metadata (see type_aliases.md)
```

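(Note: field descriptions are extracted from attribute docstrings, i.e. a string literal placed on the line immediately after the field, which the AST preserves as an expression statement. Plain `#` comments are discarded by `ast.parse` and cannot be extracted. Fields without a docstring are listed with their signature only.)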
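A sketch of the field extraction for dataclasses, using only the standard-library `ast` module. The function name and return shape are hypothetical; matching decorators by the suffix `dataclass` (which covers `@dataclass`, `@dataclass(frozen=True)`, and `@dataclasses.dataclass`) is an assumption:

```python
import ast


def dataclass_fields(source: str) -> dict[str, list[tuple[str, str]]]:
    """Map each @dataclass name to [(field signature, description)]."""
    out: dict[str, list[tuple[str, str]]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        decorators = [ast.unparse(d).split("(")[0] for d in node.decorator_list]
        if not any(d.endswith("dataclass") for d in decorators):
            continue
        fields: list[tuple[str, str]] = []
        body = node.body
        for i, stmt in enumerate(body):
            if not (isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)):
                continue
            sig = f"{stmt.target.id}: {ast.unparse(stmt.annotation)}"
            if stmt.value is not None:
                sig += f" = {ast.unparse(stmt.value)}"
            # Attribute docstring: a bare string literal right after the field.
            nxt = body[i + 1] if i + 1 < len(body) else None
            doc = ""
            if (isinstance(nxt, ast.Expr) and isinstance(nxt.value, ast.Constant)
                    and isinstance(nxt.value.value, str)):
                doc = nxt.value.value.strip()
            fields.append((sig, doc))
        out[node.name] = fields
    return out


SAMPLE = '''
from dataclasses import dataclass, field

@dataclass
class Ticket:
    id: str
    """unique ticket identifier"""
    dependencies: list[str] = field(default_factory=list)
'''
print(dataclass_fields(SAMPLE))
```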

**For each `TypeAlias`, the script writes a section like:**

```markdown
## `src/type_aliases.py::CommsLogEntry`

**Kind:** `TypeAlias`
**Resolves to:** `Metadata`
**Used by:** `_comms_log`, `_append_comms`, `get_comms_log`, `comms_log_callback`, ...

**Note:** `CommsLogEntry` is a semantic alias for `Metadata`. For the canonical field semantics, see [`Metadata`](#metadata) (which is itself a generic `dict[str, Any]` until a future track converts it to a `TypedDict`).
```

**For each `NamedTuple`, the script writes a section like:**

```markdown
## `src/type_aliases.py::FileItemsDiff`

**Kind:** `NamedTuple`
**Fields:**
- `refreshed: FileItems` — items whose mtime was checked and content re-read
- `changed: FileItems` — items whose content actually changed (subset of refreshed)
```

**For each function that returns a structured type, the script documents the return type signature** (using `ast.unparse` on the return annotation).
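A companion sketch for the other kinds, again with stdlib `ast` only. It scans module-level statements for `X: TypeAlias = ...`, classes that subclass `NamedTuple`, and functions with a return annotation. The function name and row format are hypothetical, methods are skipped, and the `Used by:` cross-references (which need a pass over every file) are not shown:

```python
import ast


def registry_entries(source: str) -> list[tuple[str, str, str]]:
    """Return (kind, name, detail) rows for aliases, NamedTuples, and return types."""
    rows: list[tuple[str, str, str]] = []
    for node in ast.parse(source).body:  # module level only
        if (isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name)
                and ast.unparse(node.annotation) in {"TypeAlias", "typing.TypeAlias"}
                and node.value is not None):
            rows.append(("TypeAlias", node.target.id, ast.unparse(node.value)))
        elif isinstance(node, ast.ClassDef) and any(
                ast.unparse(base) in {"NamedTuple", "typing.NamedTuple"} for base in node.bases):
            fields = [f"{s.target.id}: {ast.unparse(s.annotation)}" for s in node.body
                      if isinstance(s, ast.AnnAssign) and isinstance(s.target, ast.Name)]
            rows.append(("NamedTuple", node.name, ", ".join(fields)))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.returns:
            rows.append(("returns", node.name, ast.unparse(node.returns)))
    return rows


SAMPLE = '''
from typing import Any, NamedTuple, TypeAlias
Metadata: TypeAlias = dict[str, Any]
FileItems: TypeAlias = list[Metadata]

class FileItemsDiff(NamedTuple):
    refreshed: FileItems
    changed: FileItems

def get_comms_log() -> list[Metadata]:
    return []
'''
for row in registry_entries(SAMPLE):
    print(row)
```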

### 3.7 Why Per-Source-File Docs (not one giant file)

A per-source-file layout matches the project's per-source-file guide structure (`docs/guide_ai_client.md`, `docs/guide_mcp_client.md`, etc.). The coding agent reads `docs/type_registry/ai_client.md` when working in `src/ai_client.py` — locality of reference. The `index.md` provides the cross-cutting view.

**The "token cost we eat" per LLM query is bounded:** a typical source file's registry is 200-500 lines of markdown. The LLM reads it once and caches the schema in context. Subsequent references to the same types don't re-fetch.

## 4. Per-File Refactor Plan

### 4.1 `src/ai_client.py` (139 sites — largest offender)

**Pattern:** `_anthropic_history: list[dict[str, Any]]` (and 5 sibling histories), `_comms_log: deque[dict[str, Any]]`, `get_comms_log -> list[dict[str, Any]]`, `_build_anthropic_tools -> list[dict[str, Any]]`, `_reread_file_items -> tuple[list[...], list[...]]`, etc.

**Refactor strategy:**
- Replace all 79 `dict[str, Any]` / `Dict[str, Any]` with `Metadata` or the more specific alias.
- Replace all 56 `list[dict[...]]` with `CommsLog` / `ProviderHistory` / `History` / `FileItems` / `list[ToolDefinition]` based on the SEMANTIC ROLE of the list.
- Replace the 2 `Optional[List[Dict[...]]]` sites with the `Optional[...]` form of the matching alias (for example, `Optional[list[ToolDefinition]]` for `_CACHED_ANTHROPIC_TOOLS`, `Optional[FileItems]` for a file-item list).
- Replace the 2 tuple-return literals (the `cast(...)` patterns in `_dispatch_tool`) with `ToolCall` extraction.

**Naming heuristic:** for each list of dicts, look at the variable name + the function name to determine the semantic role. E.g., `_comms_log` → `CommsLog`; `_anthropic_history` → `ProviderHistory` (it is the SDK-side history replayed to the model, not the UI discussion; see §3.1); `_build_anthropic_tools` → `list[ToolDefinition]`; `_reread_file_items(file_items: list[...])` → `FileItems`.

### 4.2 `src/app_controller.py` (86 sites)

**Pattern:** `_pending_dialog: Optional[ConfirmDialog] = None` (stays as-is; this is a STRONG type already), `last_error: Optional[Dict[str, str]] = None` (could be `Optional[ErrorInfo]` from the data_oriented track), but most weak sites are in the `Hook API` request/response payloads and the `pre_tool_callback` family.

**Refactor strategy:**
- The 62 `dict_str_any` sites: replace with `Metadata` or `CommsLogEntry` based on context.
- The 20 `list_of_dict` sites: replace with the appropriate alias.
- The 4 `optional_dict` sites: replace with `Optional[Metadata]` (or `Optional[CommsLogEntry]` if the context is the hook request payload).

### 4.3 `src/models.py` (51 sites)

**Pattern:** Dataclass fields. Many fields are already strongly typed and stay as-is (e.g., `script: Optional[str] = None`, `target_file: Optional[str] = None`); the weak sites are fields typed `Optional[Dict[str, Any]]`.

**Refactor strategy:** Replace the 48 `dict_str_any` sites with `Metadata` (so `Optional[Dict[str, Any]]` becomes `Optional[Metadata]`); replace the 3 `list_of_dict` sites with the appropriate alias.

### 4.4 `src/api_hook_client.py` (32 sites)

**Pattern:** HTTP request/response payloads. E.g., `payload: Dict[str, Any]`, `data: dict[str, Any]`.

**Refactor strategy:** 30 `dict_str_any` → `Metadata`; 2 `list_of_dict` → `list[Metadata]`.

### 4.5 `src/project_manager.py` (20 sites)

**Pattern:** TOML config dicts. E.g., `proj: dict[str, Any]`, `data: dict[str, Any]`.

**Refactor strategy:** 16 `dict_str_any` → `Metadata`; 3 `list_of_dict` → `list[Metadata]`; 1 `optional_dict` → `Optional[Metadata]`.

### 4.6 `src/aggregate.py` (17 sites)

**Pattern:** Aggregation result dicts. E.g., `result: dict[str, list[dict[str, Any]]]`.

**Refactor strategy:** 10 `dict_str_any` → `Metadata`; 7 `list_of_dict` → appropriate alias.

### 4.7 Phase 2 NamedTuple conversions

- **`_reread_file_items`** in `src/ai_client.py` (returns `Tuple[List[FileItem], List[FileItem]]`) → returns `FileItemsDiff`. Affects ~3-4 call sites.
- **1-2 screen-coord tuples** (1-occurrence each) — opportunistic. If the call site is clear and the names are obvious, convert; otherwise leave.

## 5. The Audit Script as a Permanent CI Gate

After this track, the audit script becomes a permanent CI gate. `scripts/audit_weak_types.py` exits 0 even when findings exist (it's informational). The CI gate uses a stricter mode:

```bash
# New mode: --strict, exits 1 if any new weak site is added in a PR
python scripts/audit_weak_types.py --strict
```

The `--strict` mode compares the current count to a baseline (stored in `scripts/audit_weak_types.baseline.json`). If the current count is HIGHER than the baseline, exit 1. The baseline is regenerated after this track to the post-refactor count (~85 findings: 430 minus the ~345 replaced sites, leaving only the 23 lower-impact files).

The `--strict` mode itself is implemented in this track, as the final Phase 1 task. After that, any PR that pushes the weak-site count above the baseline (for example, by adding new `dict[str, Any]` annotations or anonymous tuples without removing others) fails CI.
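A sketch of the comparison that checks both the total (as specified above) and per-file counts (the variant raised in Open Question 3). The baseline schema and the `strict_check` name are hypothetical; the real `scripts/audit_weak_types.baseline.json` format is not specified here. A CLI wrapper would exit with `1 if violations else 0`:

```python
import json
from collections import Counter
from typing import Any


def strict_check(current: Counter[str], baseline: dict[str, Any]) -> list[str]:
    """Return one message per violation; an empty list means the gate passes."""
    violations: list[str] = []
    total = sum(current.values())
    if total > baseline["total"]:
        violations.append(f"total: {total} weak sites > baseline {baseline['total']}")
    for path, count in sorted(current.items()):
        allowed = baseline["per_file"].get(path, 0)  # files absent from the baseline allow 0
        if count > allowed:
            violations.append(f"{path}: {count} weak sites > baseline {allowed}")
    return violations


# Toy baseline and counts in a hypothetical schema (file names are placeholders).
baseline = json.loads('{"total": 10, "per_file": {"src/foo.py": 6, "src/bar.py": 4}}')
current = Counter({"src/foo.py": 6, "src/bar.py": 2, "src/new.py": 1})
print(strict_check(current, baseline))  # ['src/new.py: 1 weak sites > baseline 0']
```

Here the total (9) is under the baseline (10), yet the new file still fails the per-file check.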

## 6. Configuration

No new dependencies. No new environment variables. No new config files.

The aliases live in `src/type_aliases.py` (pure stdlib `typing.TypeAlias`).

## 7. Testing Strategy

| Test File | Purpose | Coverage Target |
|---|---|---|
| `tests/test_type_aliases.py` | Verify the aliases import; verify they resolve to the expected types; verify they compose with `Result[T]` (e.g., `Result[FileItems]` is a valid generic). | 100% |
| `tests/test_audit_weak_types.py` | Verify the audit script's regex patterns are correct; verify the `Finding` dataclass is populated correctly; verify the report matches expectations. | 90% |
| `tests/test_ai_client.py` (existing) | Verify no regressions after the 139-site replacement. | 100% (regression) |
| `tests/test_app_controller.py` (existing) | Verify no regressions after the 86-site replacement. | 100% (regression) |
| `tests/test_models.py` (existing) | Verify no regressions after the 51-site replacement. | 100% (regression) |
| `tests/test_api_hook_client.py` (existing) | Verify no regressions after the 32-site replacement. | 100% (regression) |
| `tests/test_project_manager.py` (existing) | Verify no regressions after the 20-site replacement. | 100% (regression) |
| `tests/test_aggregate.py` (existing) | Verify no regressions after the 17-site replacement. | 100% (regression) |
| `tests/test_mcp_client.py` (existing) | Verify no regressions. (`mcp_client` is unchanged, but the aliases may be adopted there opportunistically during Phase 1 if convenient.) | 100% (regression) |

**Mocking strategy:** Existing tests use `unittest.mock.patch`; no changes needed.

**Audit baseline check:** After Phase 1, the audit script should report 0 NEW findings. If a few sites were missed, the count drops by less than the ~345 expected (it should never go UP); re-run and investigate. After Phase 2, the count should be at or below the pre-track baseline minus 50 (a loose floor; the targeted reduction is ~345).

## 8. Migration / Rollout

| Phase | What | Risk |
|---|---|---|
| **Phase 1 — Aliases + 6-file replacement + audit baseline** | Add `src/type_aliases.py`. Add `tests/test_type_aliases.py`. Mechanical replacement in 6 files. Add `--strict` mode to the audit script. Generate the new baseline. | Medium. ~345 sites of mechanical replacement. Mitigated by existing test coverage. |
| **Phase 2 — NamedTuples + type registry generator + initial docs + archive** | Convert 2-3 tuple returns to NamedTuples. Add `scripts/generate_type_registry.py` + the initial generated registry in `docs/type_registry/`. Add tests for the generator. Add `conductor/code_styleguides/type_aliases.md` and update `product-guidelines.md`. Manual smoke test. Archive the track. | Low. ~3-4 sites of tuple conversion. Generator is a self-contained AST tool. Docs-only changes. |

Each phase has its own checkpoint commit and git note.

## 9. Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Mechanical replacement misses a few sites; the count doesn't drop as expected. | Medium | Low | The audit script is the source of truth. Re-run after Phase 1; investigate any anomalies. |
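| Renaming `dict[str, Any]` to `Metadata` (or another alias) changes how some tests introspect types (e.g., `isinstance(x, dict)`). | Low | Medium | The aliases are TYPE-LEVEL ONLY: at runtime `Metadata` is the same object as `dict[str, Any]`, and the values it annotates are plain `dict`s, so `isinstance(x, dict)` continues to work. (`isinstance(x, Metadata)` raises `TypeError`, as it does for any parameterized generic, so tests must keep checking against `dict`.) Test cases that use `get_type_hints()` see the expanded type rather than the alias name and may need updating; documented in the test plan. |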
| A future contributor adds a new `dict[str, Any]` and the audit script doesn't catch it. | Low | Low | The audit script's regex patterns are exhaustive for the current 430 findings. New patterns (e.g., a new `Mapping[str, Any]`) would be missed. The track documents the patterns the script knows; future contributions of new patterns warrant extending the script. |
| The aliases conflict with the `Result[T]` and `ErrorInfo` from the data_oriented_error_handling track. | Low | Low | The aliases are VALUE-LEVEL (data types); `Result` and `ErrorInfo` are CONTROL-LEVEL (wrappers). They compose: `Result[FileItems]` is valid. No conflict. |
| The 6-file mechanical replacement is too large to review in one PR. | Medium | Low | Phase 1 is split into 6 sub-tasks (one per file) in the plan, each with its own commit. Reviewers can review file-by-file. |
| The 23 lower-impact files are NEVER migrated. | High | Low (acceptable) | The audit script stays in the codebase as a permanent CI gate. The cost of ignoring the 23 files is now VISIBLE. Future tracks can pick them up opportunistically. |
| The `docs/type_registry/` docs drift from the actual code. | Medium | Medium (LLM reads stale info) | The `--check` mode of the generator exits 1 if the registry would change. The coding agent runs the generator before each track's commit. A follow-up track (`type_registry_ci_20260606`) will wire `--check` into CI. |

## 10. Out of Scope (Explicit)

- **TypedDict / @dataclass migration** of the `Metadata` family. The type registry (added in Phase 2) captures the field information in docs form, with much lower upfront cost than `TypedDict` migration. A future track MAY convert the most-used aliases to `TypedDict` (giving the AI schema hints via type hints instead of via docs); this is a separate decision.
- **The 23 lower-impact files** (those with 1-9 weak sites each). Deferred; will be addressed opportunistically or in a future incremental track. **Note (added 2026-06-08):** this list is dominated by `src/gui_2.py` (26+ weak sites per `docs/guide_state_lifecycle.md` §"State Delegation" and §"Reset" — `_disc_entries_lock` references, `_last_ui_snapshot`, the `UISnapshot` capture/restore, the 30+ fields cleared in `_handle_reset_session`) and `src/mcp_client.py` (will be touched heavily by the parallel `mcp_architecture_refactor_20260606` track). The deferral is correct, but a *follow-up* track should explicitly call out gui_2.py and mcp_client.py as the next targets, rather than implying they're handled.
- **Adding pydantic models.** Not requested; would be a much larger architectural decision.
- **Changing function signatures at the runtime level.** The aliases are TYPE-LEVEL; runtime behavior is identical.
- **Modifying `scripts/audit_weak_types.py`'s regex patterns.** The patterns are correct for the current findings. If new patterns emerge, a future track can extend the script.
- **Migrating the data_oriented_error_handling_20260606 track's `src/result_types.py` aliases.** The 2 type-aliases modules are SEPARATE: `result_types.py` has `ErrorInfo` / `Result` / `ErrorKind`; `type_aliases.py` has `Metadata` / `CommsLog` / `FileItem` / etc. They don't overlap.

## 11. Open Questions

1. **6 aliases, or more?** The goals (§2) name 6, but §3.1 defines 12: `Metadata`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `ProviderHistoryMessage`, `ProviderHistory`, `FileItem`, `FileItems`, `ToolDefinition`, `ToolCall`, `CommsLogCallback`. Should we cut to 4-6 to minimize the AI vocabulary? (Proposal: keep all 12; each is named for a distinct concept and the names are self-explanatory. The "vocabulary cost" is comparable to adding a dozen new function names to a module, well within normal Python codebase scale.)
2. **Should `ToolDefinition` (and the other `Metadata`-backed aliases) be `TypedDict` from the start?** (`FileItem` already resolves to the `models.FileItem` dataclass per §3.1, so it has field-level hints.) A `TypedDict` gives the AI field-level hints, not just a name. But introducing `TypedDict` requires knowing the FIELDS, which is a deeper semantic task. (Proposal: this track keeps them as `TypeAlias = Metadata`; a future track converts them to `TypedDict`. Keeps the current track scope tight.)
3. **Should the audit script enforce a count threshold (e.g., "no more than 100 weak sites total") or a per-file threshold (e.g., "no file may have more than 50 weak sites")?** (Proposal: per-file threshold is more actionable. A future PR that introduces 20 new `dict[str, Any]` in `foo.py` would fail even if the total count didn't increase.)

## 12. See Also

### 12.1 Follow-up Track (planned; not in this spec)

**"Registry Maintenance & CI Integration"** (`type_registry_ci_20260606` or similar) — promotes the type-registry generator from a manual track-completion step to a CI gate. The track:
- Wires `python scripts/generate_type_registry.py --check` into CI; the PR fails if the registry is stale.
- Adds the registry to the per-track commit workflow: the coding agent runs the generator before marking a track complete, and includes the registry diff in the commit.
- Optionally adds a pre-commit hook that runs the generator and stages the diff.
- Prerequisite: this track (so the generator exists and is tested).

### 12.2 Project References

- `scripts/audit_weak_types.py` (already committed; `84fd9ac9`) — the audit that found 430 weak sites.
- `docs/guide_testing.md` — test conventions.
- `docs/guide_models.md` — the existing `models.py:510-559 FileItem` dataclass is the *concrete* class the new `FileItem` alias points to. Per the 2026-06-08 docs refresh, the FileItem schema (9 fields + `__post_init__` normalizer) is documented in `docs/guide_context_aggregation.md §"The FileItem Schema (Full)"`.
- `docs/guide_context_aggregation.md` — added 2026-06-08. The `aggregate.py:142 build_file_items` function consumes the `FileItem` list; the `FileItems: TypeAlias` is the consumer-side type.
- `docs/guide_discussions.md`: added 2026-06-08. The entry dict shape (the `HistoryMessage` alias) is documented here: 7 standard keys (`{role, content, collapsed, ts, thinking_segments?, usage?, read_mode?}`, `?` = optional) plus optional extras. The alias comment notes the dict is *open*: extra keys are allowed.
- `docs/guide_state_lifecycle.md` — added 2026-06-08. The `App.__getattr__`/`__setattr__` state delegation (per `gui_2.py:666-675`) and the `UISnapshot` capture (`gui_2.py:735-789`) are the *correctness* the alias-typed code must preserve; aliases are TYPE-LEVEL ONLY and don't change runtime behavior.
- `conductor/code_styleguides/error_handling.md` (created in the data_oriented_error_handling_20260606 track) — the convention for `Result` types; the new type-aliases convention lives alongside. The two conventions are *complementary*: aliases name the *data* (`T` in `Result[T]`); `Result` wraps the *control flow*. See §3.5 of the spec.
- `conductor/product-guidelines.md` "Data-Oriented Error Handling" — the convention this track extends (Data Structure Strengthening is a new top-level convention in the same family).
- `conductor/tracks/data_oriented_error_handling_20260606/` — the previous track that established the convention format; this track uses the same pattern. The new `ProviderHistoryMessage` alias (added 2026-06-08) is the *concrete manifestation* of nagent_review Pitfall #4 (provider-history divergence) — the user's edits to the `HistoryMessage` (UI layer) are a different layer from the `ProviderHistoryMessage` (SDK layer), and conflating them perpetuates the bug.
- `conductor/tracks/mcp_architecture_refactor_20260606/` — the parallel major track. `mcp_client.py` is currently listed as "UNCHANGED (only 9 weak sites; below the threshold)" in the module layout, but the refactor will touch it heavily; the audit script should be re-run after the mcp refactor lands, and a follow-up type-aliases pass on mcp_client.py is the natural next target.
- `conductor/tracks/nagent_review_20260608/report.md` — added 2026-06-08. §6 (per-file memory) and §15 Pitfall #4 (provider history divergence) directly motivate the `HistoryMessage` vs `ProviderHistoryMessage` split in §3.1 of this spec.
- `conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md` — added 2026-06-08. §9 (edit-the-input, not the output) describes the bug the new alias split addresses.

### 12.3 External References

- **Python `typing.TypeAlias`** — the canonical mechanism for type aliases (PEP 613, Python 3.10+).
- **Python `typing.NamedTuple`** — for tuple-with-fields.
- **Python `typing.TypedDict`**: for a possible future track (not in this track).
- **Mike Acton on data-oriented design** — the "data is the API" framing that motivates NAMING data structures clearly.
- **Casey Muratori on module layer boundaries** — the convention that each module owns its data and exposes a clear interface.