Private
Public Access
0
0
Commit Graph

2 Commits

Author SHA1 Message Date
ed aba35f9f4a conductor(spec): Add type registry to data_structure_strengthening track
Per user feedback (2026-06-06): instead of a follow-up 'TypedDict
Migration' track, add a NEW deliverable: an auto-generated type registry
in docs/type_registry/ that captures the field information in docs form.

New files:
- scripts/generate_type_registry.py (NEW): AST-based tool that reads
  src/ and writes per-source-file .md files with the fields of every
  @dataclass, NamedTuple, TypeAlias, TypedDict. Has --check (CI mode,
  exits 1 if registry would change) and --diff (dry run) modes.
- docs/type_registry/ (NEW, generated): index.md + per-source-file
  references (type_aliases.md, ai_client.md, models.md, etc.).
- tests/test_generate_type_registry.py (NEW): verify the generator.

Architecture updates:
- Section 3.6 (NEW): Type Registry architecture with example output.
- Section 3.7 (NEW): Why per-source-file docs (locality of reference).
- Section 1.1 (NEW): 'Why docs over TypedDict' analysis (3 reasons:
  lower upfront cost, better fit for AI workflow, auto-maintained).
- Goals table: registry added as a C (innovation) goal.
- Module layout: docs/type_registry/ and scripts/generate_type_registry.py
  added to the new files list.
- Migration: Phase 2 now includes the registry generator + initial docs.
- Out of scope: TypedDict migration REMOVED; 'auto-typing the field
  shape' added with the docs as the chosen approach.
- See Also: TypedDict follow-up REPLACED with 'Registry Maintenance &
  CI Integration' (smaller scope, just wires the generator into CI).

The 'cost we eat' is the LLM reading 200-500 lines of markdown per
query. This is bounded and proportional to actual information need.
The upfront cost of designing TypedDict schemas for every type is
unbounded. Tradeoffs favor the docs approach for v1; TypedDict can
come later as a future track if desired.
2026-06-06 18:06:34 -04:00
ed ed42a97a9b conductor(track): Initialize data_structure_strengthening_20260606
Track + metadata + state + tracks.md registration for the type-aliases
refactor that follows the audit_weak_types.py findings (430 weak sites
across 29 of 61 files; 86% concentrated in 6 high-traffic files).

Key design decisions (per user approval):
- 10 TypeAlias definitions in src/type_aliases.py (Metadata, CommsLogEntry,
  CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition,
  ToolCall, CommsLogCallback).
- 1 NamedTuple (FileItemsDiff) for the _reread_file_items return.
- Mechanical replacement of 345 weak sites across 6 files (NOT 430; the
  remaining 85 are in 23 lower-impact files deferred to future tracks).
- scripts/audit_weak_types.py gains a --strict mode and a baseline file
  (scripts/audit_weak_types.baseline.json) so the count is enforced.
- 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples
  + docs + archive.
- Honest about what's missing: TypedDict / @dataclass migration is a
  follow-up track (typed_dict_migration_20260606), not this one.
- Coexistence with the data_oriented_error_handling_20260606 track's
  Result[T] / ErrorInfo: the aliases are value-level (data types), Result
  is control-level (wrapper). They compose (Result[FileItems] is valid).
  No conflict.

Audit baseline:
- Pre-track: 430 weak sites, 0 strong patterns
- Target after Phase 1: ~60 weak sites (only the 23 lower-impact files)
- Top 4 unique type strings account for 86% of findings (4-6 aliases
  eliminate the bulk of the noise).

Not blocked by anything; can be executed independently of the other
pending tracks. Blocks typed_dict_migration_20260606 (the future Phase 2).
2026-06-06 17:49:22 -04:00