Compare commits
121 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| c8478ba61f | |||
| 0c58a97cdb | |||
| b450cb0972 | |||
| 929e2f2c36 | |||
| 9a7ff2834b | |||
| 3f68ff4295 | |||
| b3d3e1ed3f | |||
| a34426d401 | |||
| 517f3f4a6c | |||
| bb2a4843ae | |||
| d4b4be20ff | |||
| 8d67fd688d | |||
| 1a1cf8beea | |||
| 0e67bc27da | |||
| 47c3e4ed2e | |||
| 2987e37f85 | |||
| 1aaa2f626a | |||
| 4395329002 | |||
| 84df12a65e | |||
| 2e2b7cbc7e | |||
| d20e1c2e78 | |||
| 85baea8cf0 | |||
| 7ea414e988 | |||
| 74e5521dca | |||
| 702a3b649c | |||
| 7e61dd7d2f | |||
| 327fb0d06d | |||
| 29dd6aa6be | |||
| 1e404548e0 | |||
| 92b2ec4a75 | |||
| d1d98c85ce | |||
| 3c4dd5c20f | |||
| 99e955795f | |||
| 900b68009b | |||
| 35746d59ec | |||
| 8ff397cfd7 | |||
| 85799bdef1 | |||
| 593da35589 | |||
| cbc6592938 | |||
| 8bb7bc0b03 | |||
| 751b94d4e8 | |||
| f32e4fd268 | |||
| f690b4dea4 | |||
| f914b2bcd4 | |||
| 7fef95cc87 | |||
| c760b8e09d | |||
| f1d157bf33 | |||
| 077cdf20db | |||
| edd2f181eb | |||
| 16fbf5619f | |||
| 49fb0a1a13 | |||
| 7c3052c893 | |||
| ae745886a7 | |||
| e9b1138949 | |||
| 06287dbb95 | |||
| 76b10e734d | |||
| 0c7a12a3fa | |||
| 1dce32037a | |||
| e4ec494b89 | |||
| 91775ee391 | |||
| 6275c860bf | |||
| 1a739ecef5 | |||
| f08394a98c | |||
| 95a8fae234 | |||
| 4bbc69019e | |||
| b3ed4b1508 | |||
| 3172a6ac1d | |||
| ad9c028acc | |||
| 30c8b26381 | |||
| ea8bcdf389 | |||
| 275f34da6e | |||
| 0fabeaf4ce | |||
| 4a774eb341 | |||
| 5c5f347cf0 | |||
| e9fa69ddc1 | |||
| fef6c20ea0 | |||
| 901b1b0982 | |||
| cb85591fc8 | |||
| e19672b2e0 | |||
| 2ad4718c3c | |||
| ca4826ab31 | |||
| 4dd373d70d | |||
| f855967bb8 | |||
| 338573b1e8 | |||
| 7478090e71 | |||
| b942c3f8b9 | |||
| 4bfce93105 | |||
| fd95ea4879 | |||
| a96f946b40 | |||
| 1872b66f68 | |||
| 0318bfe9e2 | |||
| 9961e437fb | |||
| c4686787b6 | |||
| 91a96ce139 | |||
| 8bcde09476 | |||
| 747e3983bd | |||
| 0bc8abbe9a | |||
| 96007ebd77 | |||
| bf1f11ed6c | |||
| 6e6ba90e39 | |||
| a28d8723a8 | |||
| 4e658dd25c | |||
| cfdf8988fb | |||
| 647ad3d49d | |||
| 3669ce590c | |||
| f1c23c7da5 | |||
| 46a2245658 | |||
| ebadfda9d6 | |||
| 365fa554d9 | |||
| c1a15c45c5 | |||
| 548c4fef63 | |||
| ed0d198afe | |||
| 9ccdedeeb3 | |||
| 45a5e81406 | |||
| 94f4a4eee9 | |||
| 12fcc55cfc | |||
| 1c05305a98 | |||
| a22e0f5473 | |||
| c4c45d4a54 | |||
| 5c9249659f | |||
| 23b7b9357d |
@@ -26,3 +26,8 @@ temp_old_gui.py
|
||||
.antigravitycli
|
||||
.vscode
|
||||
.coverage
|
||||
|
||||
# Video analysis campaign artifacts (per conductor/tracks/video_analysis_campaign_20260621/spec.md FR8)
|
||||
conductor/tracks/video_analysis_*/artifacts/*.mp4
|
||||
conductor/tracks/video_analysis_*/artifacts/*.vtt
|
||||
# video.log intentionally committed (small text, useful for debugging)
|
||||
|
||||
@@ -49,7 +49,7 @@ Tracks that are unblocked and ready to start. Ordered by **dependency** (blocked
|
||||
| 15a | — | [Manual UX Validation — ASCII-Sketch Workflow](#track-manual-ux-validation--ascii-sketch-workflow-new-2026-06-08) | spec ✓, plan ✓, ready to start | (none — independent; NEW 2026-06-08) |
|
||||
| 15b | — | [Chunkification Optimization (Contingency)](#track-chunkification-optimization-new-2026-06-08-contingency) | spec ✓ (contingency), no plan | hard constraint surface (deferred) |
|
||||
| 16 | — | [GenCpp Dogfood Feedback Loop](#track-gencpp-dogfood-feedback-loop) | spec TBD | (none — independent; oldest pending track) |
|
||||
| 17 | — | [Code Path Audit](#track-code-path-audit) | spec TBD | test_infrastructure_hardening_20260609 (merged) |
|
||||
| 17 | A | [Code Path Audit](#track-code-path-audit) | spec ✓ + plan ✓ (revised 2026-06-08 post-4-tracks; **pre-flight adjusted 2026-06-21** with 2 new actions + 5 micro-benchmarks + no-TypeError assertion per `docs/handoffs/PROMPT_FOR_TIER_1.md`) | test_infrastructure_hardening_20260609 (merged), any_type_componentization_20260621 (shipped 2026-06-21), phase2_4_5_call_site_completion_20260621 (BLOCKER for the broadcast() TypeError fix; unblocks audit instrumentation) |
|
||||
| 23 | A (research) | [Intent-Based Scripting Languages Survey](#track-intent-based-scripting-languages-survey-new-2026-06-12) | spec ✓, plan pending | (none — independent; NEW 2026-06-12; **non-impl research track**, **time-sensitive: report must complete before nagent v2.2**) |
|
||||
| 24 | A (bugfix) | [AI Loop Regressions (MiniMax, Gemini, Gemini CLI, DeepSeek)](#track-ai-loop-regressions-minimax-gemini-gemini-cli-deepseek-new-2026-06-14) | spec ✓, plan ✓, shipped 2026-06-15 (with 1 critical `_api_generate` regression + 2 deferred bugs — see `doeh_test_thinking_cleanup_20260615`) | (none — independent; **NEW 2026-06-14**; user-blocking; 3 bugs from `data_oriented_error_handling_20260606`) |
|
||||
| 25 | B (research) | [Fable System Prompt Review (Critical Analysis)](#track-fable-system-prompt-review-critical-analysis-new-2026-06-17) | spec ✓, plan pending | (none — independent; **NEW 2026-06-17**; **non-impl research track**, **informs the deferred nagent-rebuild**; 10 cluster sub-reports + 17-section synthesis report >3500 LOC + 3 side artifacts; Fable artifact at `docs/artifacts/Fable System Prompt.txt` is local-only and **NEVER committed**) |
|
||||
@@ -63,6 +63,8 @@ Tracks that are unblocked and ready to start. Ordered by **dependency** (blocked
|
||||
| 21 | A | [Conductor Chronology (chronology.md canonical index)](#track-conductor-chronology) | spec ✓, plan ✓, 10/10 phases implemented; Phase 10 (user sign-off) pending; end-of-track report at `docs/reports/TRACK_COMPLETION_chronology_20260619.md` | (none — independent; **NEW 2026-06-19**; canonical-track infrastructure; the `superpowers_review_20260619` track is `blocked_by` this one) |
|
||||
| 22b | A (meta-tooling) | [Meta-Tooling Workflow Review — Past-Month LLM Behavior Analysis](#track-meta-tooling-workflow-review-past-month-llm-behavior-analysis) | spec ✓, plan ✓, metadata ✓, state ✓, **parked 2026-06-20** (current_phase=0); 11-phase plan; ≥4,000-LOC 4-part report; 13-15 atomic commits; Tier 1 anchor + 3 Tier 3 parallel sweeps | (none — independent; **NEW 2026-06-20**; sibling to nagent_review + fable_review + superpowers_review + intent_dsl_survey; produces workflow_improvements.md + implementation_sequencing.md as standalone inputs for a near-future "workflow improvements rebuild" track; research-only; no src/, tests/, AGENTS.md, conductor/*.md, .opencode/, or scripts/audit_*.py changes; **anti-sliming guard**: Phase 9 self-review + Phase 10 user review gate are literal hard gates per the chronology_20260619 handover) |
|
||||
| 26 | A (research) | [Video Analysis Campaign (12 videos, 5 clusters, Pass 1 of 3)](#track-video-analysis-campaign-20260621) | spec ✓, plan ✓, **14 folders scaffolded (1 umbrella + 12 children + 1 synthesis); Pass 1 of 3 (information extraction); awaiting Phase 0 tooling prerequisites (yt-dlp, cv2, imagehash install in repo venv)**; 12 children in execution order: CS229 → math foundations → Platonic/geometric → biological → CS336 → applied capstone; per-video target: 1000-10000 LOC markdown deep-dive report | (none — independent; **NEW 2026-06-21**; multi-track research campaign; 12 videos across 5 clusters (E: Stanford >1hr; A: math foundations; B: Platonic AI; C: biological/cognitive; D: applied); multi-pass handoff to Pass 2 (de-obfuscation via user's math encoding — USER must rediscover notation before Pass 2 starts) + Pass 3 (projection to applied domain — USER must articulate "own caveats" before Pass 3 starts); **lossless preservation directive**: Pass 1 artifacts must NOT be over-summarized (data cascades to Pass 2/3); **2 E-cluster videos failed oEmbed 401** (yt-dlp may still work; verify in Phase 1); reusable tooling: 5 TDD scripts in `scripts/video_analysis/` (download_video, extract_transcript, extract_keyframes, ocr_frames, synthesize_report) |
|
||||
| 27 | A | [Phase 2/4/5 Call-Site Completion (post any_type_componentization)](#track-phase2-4-5-call-site-completion-20260621) | spec ✓, plan ✓, metadata ✓, state ✓; **Tier 1 decided to SHRINK scope** to Phase 6a + 6b + 6d + 6e (~18 commits, ~3 hours Tier 2); **BLOCKER for `code_path_audit_20260607`** (the broadcast() TypeError contaminates audit instrumentation); see `docs/handoffs/PROMPT_FOR_TIER_1.md` | any_type_componentization_20260621 (parent; shipped 2026-06-21 with 48/89 sites + 1 runtime bug) | (**NEW 2026-06-21**; bugfix + refactor + test-infrastructure + Tier 2 cost analysis; Phase 6a: fix `HookServer.broadcast()` callers in `src/app_controller.py` + `src/events.py` + `src/gui_2.py` (5-10 sites), migrating them to the `WebSocketMessage` signature; Phase 6b: complete `_send_grok` + `_send_minimax` + `_send_llama` `OpenAICompatibleRequest` migration (3 sites); Phase 6d: update those 3 senders' `NormalizedResponse` to use `UsageStats` (3 sites); **Phase 6e: Tier 2 produces `docs/reports/PHASE3_TIER2_ANALYSIS.md` (authoritative Phase 3 cost hypothesis; supersedes Tier 1's draft at `PHASE3_HYPOTHETICAL_PROMOTION.md` which stays as the placeholder; profiles all 6 senders + discovers hidden cross-references + provides refined cost estimates + recommendations for the future Phase 3 track)**; adds `tests/test_websocket_broadcast_regression.py` with "no-TypeError" assertion that the audit will reuse; **deferred**: Phase 3 (`provider_state.ProviderHistory` call-site migration in `ai_client.py`, 112 sites) → separate track post-audit; cross-phase coupling → separate track; `audit_tier2_leaks.py` sandbox-pollution fixes → infra track; pre-existing `test_gui2_custom_callback_hook_works` flake → separate investigation; **does NOT merge `tier2/any_type_componentization_20260621` branch** per Tier 2's reconnaissance framing; **Tier 2 owns the Phase 3 cost analysis (Tier 1's draft at `docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md` is the hypothesis; Tier 2's `PHASE3_TIER2_ANALYSIS.md` is the refined authoritative version)**) |
|
||||
| 28 | A | [Any-Type Componentization (Promote dict[str, Any] to dataclass(frozen=True))](#track-any-type-componentization-promote-dictstr-any-to-dataclassfrozentrue) | spec ✓, plan ✓, metadata ✓, state ✓, **shipped 2026-06-21** with 48/89 fat-struct sites promoted (Phases 1, 2, 4, 5 complete); Phase 3 (`provider_state` call-site migration in `ai_client.py`) DEFERRED to a separate track; 1 runtime bug surfaced (`HookServer.broadcast()` callers in `app_controller.py` + `events.py`); not merged; reconnaissance for `code_path_audit_20260607`; tier2 branch at 24 commits | (none — independent; **NEW 2026-06-21**; refactor + ai-readability + type-safety; ships: 3 new modules (`src/mcp_tool_specs.py`, `src/openai_schemas.py`, `src/provider_state.py`); 2 new audit scripts (`scripts/audit_dataclass_coverage.py` + `--strict` mode); styleguide `conductor/code_styleguides/type_aliases.md` §12 "When to Promote TypeAlias to dataclass"; type-registry regenerated; 130+ tests pass; **input artifact**: `docs/reports/ANY_TYPE_AUDIT_20260621.md`; **handoff docs**: `docs/handoffs/PROMPT_FOR_TIER_1.md` + `HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md` + `HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md`) |
|
||||
|
||||
**Note on numbering:** the legacy file used `0a`, `0b`, `0c`... and `0d`, `0e`, `0f`, `0g` for tracks created 2026-06-06+. This is the **git-blame sort order**, not a logical execution order. The new structure re-orders by dependency.
|
||||
|
||||
@@ -632,6 +634,38 @@ Lightweight chronology; full spec/plan/state per track is in the linked folder.
|
||||
*Link: [./tracks/code_path_audit_20260607/](./tracks/code_path_audit_20260607/), Spec: [./tracks/code_path_audit_20260607/spec.md](./tracks/code_path_audit_20260607/spec.md), Plan: [./tracks/code_path_audit_20260607/plan.md](./tracks/code_path_audit_20260607/plan.md) (to be authored by writing-plans skill)*
|
||||
*Goal: Build `src/code_path_audit.py` — a static-analysis tool that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. Output: custom postfix `.dsl` data + markdown + Mermaid + prefix tree text under `docs/reports/code_path_audit/<date>/`. The follow-up `pipeline_pruning_20260607` consumes the `.dsl` files; the markdown + tree are for human review. MMA worker spawn is **cold per user**. **Timing (revised 2026-06-08):** the audit must run *after* the 4 foundational tracks ship (`qwen_llama_grok`, `data_oriented_error_handling`, `data_structure_strengthening`, `mcp_architecture_refactor`); pre-4-tracks code is too stale to ground optimization decisions.*
|
||||
|
||||
*Pre-Flight Adjustments (2026-06-21, per `docs/handoffs/PROMPT_FOR_TIER_1.md` + `HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md`):*
|
||||
- *Add 2 new actions to per-action profiling: `provider_history_append` (the hot path Phase 3 will refactor; measures per-turn append latency + lock acquire time) + `websocket_broadcast` (the GUI thread's per-event cost; the path Phase 6a will fix)*
- *Add 5 micro-benchmarks to `optimization_candidates.md`: `NormalizedResponse.__init__` (<1μs), `WebSocketMessage.__init__` (<5μs), `UsageStats.__init__` (<500ns), `ProviderHistory.lock` (<500ns), `ToolSpec.__init__` (<2μs)*
- *Add the "no-TypeError-errors-on-any-thread" assertion: the audit fails if any `worker[queue_fallback] error: WebSocketServer.broadcast()` appears in harness output; backed by `tests/test_websocket_broadcast_regression.py`*
- *Add the 89 fat-struct sites from `ANY_TYPE_AUDIT_20260621.md` §3 as instrumented targets; tag each with `(file:line, hot_path, cold_path, init_path)`*
- *BLOCKER: `phase2_4_5_call_site_completion_20260621` (the broadcast() TypeError fix). The audit's per-action profiling is contaminated by the TypeError spam until Phase 6a merges. Recommended sequence: run the follow-up track first; after merge, launch the audit; the audit's per-action data informs the deferred Phase 3 + cross-phase coupling follow-up tracks*
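
The constructor micro-benchmarks listed above could be measured roughly as follows. This is a minimal sketch, not the audit's actual harness: `UsageStats` here is a stand-in frozen dataclass whose four fields are assumed from the Phase 6d description, and `ns_per_call` is a hypothetical helper name.

```python
import timeit
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class UsageStats:
    """Stand-in for the real class; field names assumed from Phase 6d."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0


def ns_per_call(stmt: Callable[[], object], number: int = 100_000) -> float:
    """Return the best-of-5 mean cost of one call to `stmt`, in nanoseconds."""
    # Taking the minimum over repeats reduces noise from other processes.
    best_total_seconds = min(timeit.repeat(stmt, number=number, repeat=5))
    return best_total_seconds / number * 1e9


if __name__ == "__main__":
    cost = ns_per_call(lambda: UsageStats(10, 20, 0, 0))
    print(f"UsageStats.__init__: {cost:.0f} ns per call (budget: <500 ns)")
```

The lambda wrapper adds its own call overhead, so the reported figure is an upper bound on the constructor cost; a string `stmt` with a `setup` import would give a tighter number.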
|
||||
|
||||
#### Track: Phase 2/4/5 Call-Site Completion (post any_type_componentization) `[track-created: 2026-06-21]`
|
||||
*Link: [./tracks/phase2_4_5_call_site_completion_20260621/](./tracks/phase2_4_5_call_site_completion_20260621/), Spec: [./tracks/phase2_4_5_call_site_completion_20260621/spec.md](./tracks/phase2_4_5_call_site_completion_20260621/spec.md), Plan: [./tracks/phase2_4_5_call_site_completion_20260621/plan.md](./tracks/phase2_4_5_call_site_completion_20260621/plan.md), Metadata: [./tracks/phase2_4_5_call_site_completion_20260621/metadata.json](./tracks/phase2_4_5_call_site_completion_20260621/metadata.json), State: [./tracks/phase2_4_5_call_site_completion_20260621/state.toml](./tracks/phase2_4_5_call_site_completion_20260621/state.toml)*
|
||||
|
||||
*Status: 2026-06-21 — Active, Tier 1 decision pending Tier 2 implementation. **SHRUNK scope** per `PROMPT_FOR_TIER_1.md` Decision 1 (Phase 6a + 6b + 6d only; defer Phase 3 to its own track post-audit).*
|
||||
|
||||
*Goal: Three-phase focused track that **(a) fixes the `HookServer.broadcast()` runtime bug** introduced by `any_type_componentization_20260621` Phase 5 (the Phase 5 commit `e9fa69dd` changed `broadcast(channel, payload)` → `broadcast(message: WebSocketMessage)` but did not update internal callers in `src/app_controller.py`, `src/events.py`, `src/gui_2.py`); **(b) completes the `_send_grok` / `_send_minimax` / `_send_llama` Phase 2 migration** (the 3 OpenAI-compatible senders were deferred in t2_6 and still construct `OpenAICompatibleRequest(messages=[{"role": ..., "content": ...}])` instead of `messages=[ChatMessage(...)]`); **(c) updates those 3 senders' `NormalizedResponse` construction** to use the Phase 2 `UsageStats` dataclass. **Adds `tests/test_websocket_broadcast_regression.py` with a "no-TypeError-errors-on-any-thread" assertion that `code_path_audit_20260607` will reuse**.*
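
Parts (b) and (c) of the goal amount to replacing dict literals and loose integer fields with typed values. The sketch below illustrates the shape of that change under stated assumptions: all four dataclasses are simplified stand-ins (the real ones live in `src/openai_schemas.py` and `src/openai_compatible.py` and probably carry more fields), and the `model` and `text` fields plus the `build_request` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatMessage:
    role: str
    content: str


@dataclass(frozen=True)
class UsageStats:
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0


@dataclass(frozen=True)
class OpenAICompatibleRequest:
    model: str  # hypothetical field, for illustration only
    messages: list[ChatMessage]


@dataclass(frozen=True)
class NormalizedResponse:
    text: str  # hypothetical field, for illustration only
    usage: UsageStats  # one nested value instead of 4 separate int fields


def build_request(model: str, history: list[dict[str, str]]) -> OpenAICompatibleRequest:
    """Convert legacy {"role": ..., "content": ...} dicts into ChatMessage values."""
    messages = [ChatMessage(role=m["role"], content=m["content"]) for m in history]
    return OpenAICompatibleRequest(model=model, messages=messages)
```

A sender migrated this way fails fast on a missing `role` or `content` key at construction time, rather than passing a malformed dict downstream.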
|
||||
|
||||
*Scope (per Tier 1's shrink decision):*
|
||||
- *Phase 6a (~7 commits): Fix `HookServer.broadcast()` callers in `src/app_controller.py:_run_pending_tasks_once_result` + `src/events.py` + `src/gui_2.py:_process_pending_gui_tasks`. Replace `broadcast(channel, payload)` with `broadcast(WebSocketMessage(channel=, payload=))`. Add regression test.*
- *Phase 6b (~5 commits): Migrate `_send_grok` (L2532) + `_send_minimax` (L2616) + `_send_llama` (L2856) to construct `OpenAICompatibleRequest(messages=[ChatMessage(...)], ...)`. Update provider tests.*
- *Phase 6d (~4 commits): Update those 3 senders' `NormalizedResponse` construction to use `usage=UsageStats(input_tokens=..., output_tokens=..., cache_read_tokens=..., cache_creation_tokens=...)` instead of 4 separate int fields.*
- *Total: ~16 atomic commits, ~3 hours Tier 2 work.*
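
To make the Phase 6a failure mode concrete, here is a small self-contained sketch. It assumes a toy `HookServer` that only records messages and a `WebSocketMessage` with just `channel` and `payload`; the real classes in `src/api_hooks.py` may differ, and the `"events"` channel name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class WebSocketMessage:
    channel: str
    payload: dict[str, Any] = field(default_factory=dict)


class HookServer:
    """Toy stand-in: records messages instead of writing to sockets."""

    def __init__(self) -> None:
        self.sent: list[WebSocketMessage] = []

    def broadcast(self, message: WebSocketMessage) -> None:
        self.sent.append(message)


def old_style_call(server: HookServer) -> None:
    # Pre-6a caller: still passes (channel, payload) positionally.
    # With the new one-argument signature this raises TypeError.
    server.broadcast("events", {"kind": "tick"})  # type: ignore[call-arg]


def new_style_call(server: HookServer) -> None:
    # Post-6a caller: wraps both values in a single WebSocketMessage.
    server.broadcast(WebSocketMessage(channel="events", payload={"kind": "tick"}))
```

A regression test in the spirit of the "no-TypeError" assertion would exercise each real call site and fail if any of them raises `TypeError`.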
|
||||
|
||||
*Deferred (out of scope, per Tier 1's decision):*
|
||||
- *Phase 3 (`provider_state.ProviderHistory` call-site migration in `src/ai_client.py`): 112 sites across 6 senders (`_send_anthropic` 25, `_send_deepseek` 20, `_send_minimax` 21, `_send_qwen` 12, `_send_grok` 13, `_send_llama` 21). Qualitative cost estimate: ~+1-2ms per session; +8-15μs per `_send_anthropic` turn. Full analysis: `docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md`. The audit will quantify this before the Phase 3 track runs.*
- *Cross-phase coupling: `OpenAICompatibleRequest.tools: list[dict[str, Any]]` → `list[ToolSpec]`. Deferred to a separate track.*
- *`audit_tier2_leaks.py` sandbox-pollution fixes (3 failures): `--allowlist` for `mcp_paths.toml`, `opencode.json`, `.opencode/*`. Infrastructure track.*
- *Pre-existing `test_gui2_custom_callback_hook_works` flake. Separate investigation.*
|
||||
|
||||
*`blocks: code_path_audit_20260607` (the broadcast() TypeError contaminates the audit's per-action profiling; this track unblocks the audit). `blocked_by: any_type_componentization_20260621` (parent track; shipped 2026-06-21; the tier2 branch is NOT merged).*
|
||||
|
||||
*Does NOT merge `tier2/any_type_componentization_20260621` branch per Tier 2's reconnaissance framing in `HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md` ("Use as input for the audit, not as a merge candidate"). The branch stays at 24 commits as the audit's reconnaissance warm-up.*
|
||||
|
||||
*Regression protocol (the lesson from `any_type_componentization_20260621`'s 10 test failures): after each Phase, run `uv run python scripts/run_tests_batched.py --tier tier-1-unit-core` FULLY (no stop-on-failure). After all phases complete, run all 11 tiers FULLY. The "no-TypeError" assertion is the canonical regression test.*
|
||||
|
||||
#### Track: GUI Architecture Refinement
|
||||
*Link: [./tracks/gui_architecture_refinement_20260512/](./tracks/gui_architecture_refinement_20260512/) (no spec.md; needs scoping before planning)*
|
||||
|
||||
|
||||
@@ -0,0 +1,198 @@
|
||||
{
|
||||
"track_id": "any_type_componentization_20260621",
|
||||
"name": "Any-Type Componentization (Promote dict[str, Any] to dataclass(frozen=True))",
|
||||
"initialized": "2026-06-21",
|
||||
"owner": "tier2-tech-lead",
|
||||
"priority": "medium",
|
||||
"status": "active",
|
||||
"type": "refactor + ai-readability + type-safety",
|
||||
"scope": {
|
||||
"new_files": [
|
||||
"src/mcp_tool_specs.py",
|
||||
"src/openai_schemas.py",
|
||||
"src/provider_state.py",
|
||||
"scripts/audit_dataclass_coverage.py",
|
||||
"scripts/audit_dataclass_coverage.baseline.json",
|
||||
"tests/test_audit_dataclass_coverage.py",
|
||||
"tests/test_mcp_tool_specs.py",
|
||||
"tests/test_openai_schemas.py",
|
||||
"tests/test_provider_state.py",
|
||||
"docs/type_registry/src_mcp_tool_specs.md",
|
||||
"docs/type_registry/src_openai_schemas.md",
|
||||
"docs/type_registry/src_provider_state.md",
|
||||
"docs/reports/TRACK_COMPLETION_any_type_componentization_20260621.md"
|
||||
],
|
||||
"modified_files": [
|
||||
"src/type_aliases.py",
|
||||
"src/mcp_client.py",
|
||||
"src/openai_compatible.py",
|
||||
"src/ai_client.py",
|
||||
"src/log_registry.py",
|
||||
"src/session_logger.py",
|
||||
"src/log_pruner.py",
|
||||
"src/gui_2.py",
|
||||
"src/api_hooks.py",
|
||||
"src/api_hook_client.py",
|
||||
"conductor/code_styleguides/type_aliases.md",
|
||||
"docs/type_registry/src_ai_client.md",
|
||||
"docs/type_registry/src_openai_compatible.md",
|
||||
"docs/type_registry/src_mcp_client.md",
|
||||
"docs/type_registry/src_api_hooks.md",
|
||||
"docs/type_registry/src_log_registry.md"
|
||||
],
|
||||
"deleted_files": []
|
||||
},
|
||||
"blocked_by": [
|
||||
"data_structure_strengthening_20260606"
|
||||
],
|
||||
"blocks": [
|
||||
"any_type_componentization_phase2_2026MMDD",
|
||||
"openai_tools_dataclass_bridge_2026MMDD"
|
||||
],
|
||||
"estimated_phases": 7,
|
||||
"spec": "spec.md",
|
||||
"plan": "plan.md (to be authored by writing-plans skill after spec approval)",
|
||||
"priority_order": "A (5 fat-struct conversions + audit gate) > B (JsonValue + styleguide §12) > C (registry updates) > D (cross-phase coupling follow-up)",
|
||||
"input_artifact": {
|
||||
"report": "docs/reports/ANY_TYPE_AUDIT_20260621.md",
|
||||
"date": "2026-06-21",
|
||||
"findings_total": 300,
|
||||
"candidates_identified": 5,
|
||||
"candidates_sites": 89
|
||||
},
|
||||
"reference_pattern": {
|
||||
"file": "src/vendor_capabilities.py",
|
||||
"lines": "64-76",
|
||||
"template": "@dataclass(frozen=True) + module-level _REGISTRY dict + factory function"
|
||||
},
|
||||
"candidates": {
|
||||
"p1_mcp_tool_specs": {
|
||||
"file": "src/mcp_client.py",
|
||||
"current": "MCP_TOOL_SPECS: list[dict[str, Any]] (45 tools)",
|
||||
"target_module": "src/mcp_tool_specs.py (new)",
|
||||
"sites": 8,
|
||||
"value": "HIGH"
|
||||
},
|
||||
"p1_openai_schemas": {
|
||||
"file": "src/openai_compatible.py",
|
||||
"current": "NormalizedResponse + OpenAICompatibleRequest with list[dict[str, Any]] fields",
|
||||
"target_module": "src/openai_schemas.py (new)",
|
||||
"sites": 17,
|
||||
"value": "HIGH"
|
||||
},
|
||||
"p2_provider_state": {
|
||||
"file": "src/ai_client.py",
|
||||
"current": "7× _<provider>_history + 7× _<provider>_history_lock module globals",
|
||||
"target_module": "src/provider_state.py (new)",
|
||||
"sites": 41,
|
||||
"value": "HIGH"
|
||||
},
|
||||
"p2_log_registry_session": {
|
||||
"file": "src/log_registry.py",
|
||||
"current": "self.data: dict[str, dict[str, Any]]",
|
||||
"target_module": "src/log_registry.py (inline)",
|
||||
"sites": 7,
|
||||
"value": "MEDIUM"
|
||||
},
|
||||
"p3_api_hooks_websocket": {
|
||||
"file": "src/api_hooks.py",
|
||||
"current": "def broadcast(channel, payload: dict[str, Any]) + _serialize_for_api",
|
||||
"target_module": "src/api_hooks.py (inline)",
|
||||
"sites": 16,
|
||||
"value": "LOW"
|
||||
}
|
||||
},
|
||||
"audit_ci_gate": {
|
||||
"script": "scripts/audit_dataclass_coverage.py",
|
||||
"modes": {
|
||||
"default": "informational (exit 0)",
|
||||
"--json": "machine-readable report",
|
||||
"--strict": "CI gate (exit 1 if current > baseline)",
|
||||
"--baseline": "path to baseline file (default: scripts/audit_dataclass_coverage.baseline.json)"
|
||||
},
|
||||
"baseline_after_track": "211 (300 Any sites - 89 promoted = 211 remaining)"
|
||||
},
|
||||
"phases": {
|
||||
"phase_0": {
|
||||
"name": "Shared scaffolding",
|
||||
"scope": "JsonValue TypeAlias + dataclass-coverage audit + styleguide §12",
|
||||
"estimated_commits": 3,
|
||||
"files": ["src/type_aliases.py", "scripts/audit_dataclass_coverage.py", "conductor/code_styleguides/type_aliases.md"]
|
||||
},
|
||||
"phase_1": {
|
||||
"name": "mcp_tool_specs (P1)",
|
||||
"scope": "src/mcp_tool_specs.py new; src/mcp_client.py refactor 8 sites",
|
||||
"estimated_commits": 10,
|
||||
"files": ["src/mcp_tool_specs.py", "src/mcp_client.py", "src/ai_client.py"]
|
||||
},
|
||||
"phase_2": {
|
||||
"name": "openai_schemas (P1)",
|
||||
"scope": "src/openai_schemas.py new; 17 sites in src/openai_compatible.py + src/ai_client.py",
|
||||
"estimated_commits": 10,
|
||||
"files": ["src/openai_schemas.py", "src/openai_compatible.py", "src/ai_client.py"]
|
||||
},
|
||||
"phase_3": {
|
||||
"name": "provider_state (P2)",
|
||||
"scope": "src/provider_state.py new; 41 sites in src/ai_client.py",
|
||||
"estimated_commits": 15,
|
||||
"files": ["src/provider_state.py", "src/ai_client.py"]
|
||||
},
|
||||
"phase_4": {
|
||||
"name": "log_registry Session (P2)",
|
||||
"scope": "7 sites in src/log_registry.py + 3 consumer files",
|
||||
"estimated_commits": 5,
|
||||
"files": ["src/log_registry.py", "src/session_logger.py", "src/log_pruner.py", "src/gui_2.py"]
|
||||
},
|
||||
"phase_5": {
|
||||
"name": "api_hooks WebSocketMessage (P3)",
|
||||
"scope": "16 sites in src/api_hooks.py",
|
||||
"estimated_commits": 5,
|
||||
"files": ["src/api_hooks.py"]
|
||||
},
|
||||
"phase_6": {
|
||||
"name": "Verify + archive",
|
||||
"scope": "Full audit + 11-tier regression + docs + archive move",
|
||||
"estimated_commits": 2,
|
||||
"files": ["docs/reports/TRACK_COMPLETION_*", "conductor/tracks.md"]
|
||||
}
|
||||
},
|
||||
"total_estimated_commits": 50,
|
||||
"ai_performance_analysis": {
|
||||
"win": "Closed-shape types vs open dicts. The AI now sees `.tool_calls[0].function.name` (field access; type-checked) instead of `tool_calls[0]['function']['name']` (3 nested dict-key lookups; untyped). Static analysis can verify field existence.",
|
||||
"cost": "Migration overhead (~50 commits). New dataclass vocabulary for the AI to learn (similar to the 10 TypeAliases from data_structure_strengthening). Cross-phase coupling deferred (Phase 2's tools field stays as list[dict[str, Any]] for now).",
|
||||
"caveat": "Frozen dataclasses are slightly slower to construct than dict literals (~microseconds). For hot paths (per-provider history append), this is negligible. The JSON wire format (`JsonValue`) is type-level only; runtime serialization is unchanged.",
|
||||
"honest_assessment": "Net win. The 5 candidates are the highest-value fat-struct sites identified by the audit. Promoting them to frozen dataclasses + registries adds type safety, IDE autocomplete, and dispatch verification. The remaining 211 Any sites are intentional flexibility (Patterns 3/4/5) and stay as Any."
|
||||
},
|
||||
"architectural_invariant": "Frozen dataclasses are the canonical pattern for closed-shape data in this codebase. TypeAlias remains the canonical pattern for open-shape data. The decision tree lives in conductor/code_styleguides/type_aliases.md §12 (added in Phase 0).",
|
||||
"threading_constraint": "Phase 3 (provider_state) consolidates 7 locks into a single _PROVIDER_HISTORIES dict. Each ProviderHistory instance owns its own lock (via default_factory=threading.Lock). The lock semantics are unchanged from the current per-provider locks.",
|
||||
"verification_criteria": [
|
||||
"src/mcp_tool_specs.py exists with ToolParameter + ToolSpec + registry",
|
||||
"src/openai_schemas.py exists with ToolCall + ChatMessage + UsageStats",
|
||||
"src/provider_state.py exists with ProviderHistory + _PROVIDER_HISTORIES dict",
|
||||
"src/log_registry.py has Session + SessionMetadata dataclasses",
|
||||
"src/api_hooks.py has WebSocketMessage + JsonValue TypeAlias usage",
|
||||
"src/type_aliases.py extended with JsonPrimitive + JsonValue",
|
||||
"scripts/audit_dataclass_coverage.py exists with --strict mode",
|
||||
"scripts/audit_dataclass_coverage.baseline.json committed",
|
||||
"conductor/code_styleguides/type_aliases.md has §12 When to Promote section",
|
||||
"6 new test files exist with 48+ tests (Phase 0 audit: 6, Phase 1: 8, Phase 2: 10, Phase 3: 10, Phase 4: 8, Phase 5: 6)",
|
||||
"All existing tests pass (no regressions in 11-tier batched run)",
|
||||
"audit_weak_types.py --strict exits 0",
|
||||
"audit_dataclass_coverage.py --strict exits 0",
|
||||
"generate_type_registry.py --check exits 0 (5 new .md files appear)",
|
||||
"docs/reports/TRACK_COMPLETION_any_type_componentization_20260621.md written",
|
||||
"Track archived; conductor/tracks.md updated"
|
||||
],
|
||||
"sequencing_note": "Per user direction 2026-06-21: this track is NOT blocked by code_path_audit_20260607. The two tracks are orthogonal (semantic clarity vs runtime cost). Both can run in parallel.",
|
||||
"links": {
|
||||
"input_report": "docs/reports/ANY_TYPE_AUDIT_20260621.md",
|
||||
"parent_track": "conductor/tracks/data_structure_strengthening_20260606/",
|
||||
"reference_pattern": "src/vendor_capabilities.py",
|
||||
"audit_template": "scripts/audit_weak_types.py",
|
||||
"type_alias_module": "src/type_aliases.py",
|
||||
"code_styleguide": "conductor/code_styleguides/type_aliases.md",
|
||||
"error_handling_styleguide": "conductor/code_styleguides/error_handling.md",
|
||||
"testing_guide": "docs/guide_testing.md",
|
||||
"parallel_track": "conductor/tracks/code_path_audit_20260607/"
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,633 @@
|
||||
# Track: Any-Type Componentization (Promote `dict[str, Any]` to `dataclass(frozen=True)`)
|
||||
|
||||
**Status:** Active (spec approved 2026-06-21)
|
||||
**Initialized:** 2026-06-21
|
||||
**Owner:** Tier 2 Tech Lead
|
||||
**Priority:** Medium (developer + AI-readability; not a regression blocker)
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
The `data_structure_strengthening_20260606` track established the `TypeAlias` convention: 10 aliases + 1 `NamedTuple` in `src/type_aliases.py`, replacing 416 of 528 weak-type sites (79% reduction) across 6 high-traffic files. The aliases are **renames** — they point to the same underlying `dict[str, Any]` / `list[dict[str, Any]]` shapes. The alias names document intent; they do not add type safety.
|
||||
|
||||
A follow-on audit (`docs/reports/ANY_TYPE_AUDIT_20260621.md`, committed 2026-06-21) identified **5 fat-struct candidates** that warrant promotion to `dataclass(frozen=True)` definitions, following the `src/vendor_capabilities.py` pattern (`frozen=True` dataclass + module-level registry + factory function). This track is the implementation of the audit's recommendations.
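
The three-part pattern named above (frozen dataclass, module-level registry, factory function) might look like the following. This is a sketch under assumptions, not a copy of `src/vendor_capabilities.py`: the `ToolSpec` fields, the `register` helper, and the `read_file` entry are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    required_params: tuple[str, ...] = ()


# Module-level registry: the single source of truth for known specs.
_REGISTRY: dict[str, ToolSpec] = {}


def register(spec: ToolSpec) -> ToolSpec:
    """Add a spec to the registry, rejecting duplicate names."""
    if spec.name in _REGISTRY:
        raise ValueError(f"duplicate tool spec: {spec.name!r}")
    _REGISTRY[spec.name] = spec
    return spec


def get_tool_spec(name: str) -> ToolSpec:
    """Factory-style lookup: return the registered spec or raise KeyError."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown tool: {name!r}") from None


register(ToolSpec(name="read_file", description="Read a file", required_params=("path",)))
```

Because instances are frozen, callers can share a registered spec without defensive copying; attempts to assign to a field raise `dataclasses.FrozenInstanceError`.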
|
||||
|
||||
**The 5 candidates (89 of the 300 `Any` usages, ~30%):**
|
||||
|
||||
| Rank | Target | Sites | Value |
|---|---|---:|---|
| P1 | `src/mcp_client.py: MCP_TOOL_SPECS` (45 tools) | 8 | HIGH: 180 implicit fields become explicit |
| P1 | `src/openai_compatible.py: NormalizedResponse + OpenAICompatibleRequest` | 17 | HIGH: well-documented OpenAI schema |
| P2 | `src/ai_client.py: 7× ProviderHistory + 7 locks` | 41 | HIGH: 14 module globals → 1 dict |
| P2 | `src/log_registry.py: Session metadata` | 7 | MEDIUM: 2 levels of structural anonymity |
| P3 | `src/api_hooks.py: WebSocketMessage + JsonValue` | 16 | LOW: generic serialization |
|
||||
|
||||
**The audit's 5-pattern taxonomy (`ANY_TYPE_AUDIT_20260621.md` §2.2):** only Pattern 1 (JSON-shaped payloads) and Pattern 2 (per-provider message lists) are componentization candidates. Patterns 3 (SDK holders), 4 (`__getattr__`), 5 (generic serialization) stay as `Any` — see §10.
|
||||
|
||||
**Scope is deliberately bounded.** The track promotes the 5 fat-struct candidates to `dataclass(frozen=True)`. It does NOT migrate all 300 `Any` usages; it does NOT convert `TypeAlias` definitions to `TypedDict`; it does NOT introduce Pydantic. The audit's recommended boundary is honored.
|
||||
|
||||
**Sequencing (revised 2026-06-21 per user direction).** The audit's §5.2 originally proposed gating this track behind `code_path_audit_20260607`. **This gate is removed.** The two tracks are orthogonal:
|
||||
- `code_path_audit` measures RUNTIME cost per call (CPU/memory)
- `any_type_componentization` measures SEMANTIC clarity (AI-readability)
|
||||
|
||||
Neither depends on the other. The code_path_audit's report can retroactively flag which any-type candidates it found in hot paths as a side benefit. Both tracks can run in parallel.
|
||||
|
||||
## 2. Goals (Priority Order)
|
||||
|
||||
| Priority | Goal | Rationale |
|---|---|---|
| **A (primary)** | Convert the 5 fat-struct candidates (89 sites) to `dataclass(frozen=True)` definitions following `src/vendor_capabilities.py` template | The audit identified these as the high-value subset; aliases alone don't add type safety |
| **A (primary)** | New `scripts/audit_dataclass_coverage.py` with `--strict` mode | The CI gate that prevents regression of dataclass promotion work |
| **B (architectural)** | New `JsonValue` recursive `TypeAlias` (in `src/type_aliases.py`) for the JSON wire format | Phase 5 (api_hooks) needs it; reusable for future JSON-boundary tracks |
| **B (architectural)** | New styleguide §12 "When to Promote `TypeAlias` to `dataclass`" section | Captures the rule that future contributors can apply without re-deriving |
| **C (documentation)** | Update `docs/type_registry/` registry entries for the 3 new modules + modified files | The type-registry generator picks them up automatically; `--check` mode validates |
| **D (forward-looking)** | Note the cross-phase coupling opportunity (Phase 2's `OpenAICompatibleRequest.tools` could consume Phase 1's `ToolSpec`) as a follow-up track, NOT in this track | Cross-phase coupling is a future concern; this track ships each phase independently |
|
||||
|
||||
### 2.1 Non-Goals (this track)
|
||||
|
||||
- **NOT** converting all 300 `Any` usages. Only the 5 fat-struct candidates.
|
||||
- **NOT** converting SDK client holders (Pattern 3). They stay as `Any` — heterogeneous SDK types.
|
||||
- **NOT** changing the `__getattr__` dynamic-dispatch pattern (Pattern 4). It stays as `Any` — intentional.
|
||||
- **NOT** typing the generic serialization functions (Pattern 5). They stay as `Any` — input-driven.
|
||||
- **NOT** converting `dict[str, Any]` to `TypedDict` (per `data_structure_strengthening_20260606` §10, deferred to a separate decision).
|
||||
- **NOT** introducing Pydantic (would be a much larger architectural decision).
|
||||
- **NOT** changing function signatures at the runtime level (dataclasses are serialization-compatible via `from_dict()`/`to_dict()` helpers).
|
||||
- **NOT** waiting for `code_path_audit_20260607` (per the §1 sequencing revision).
|
||||
|
||||
## 3. Architecture

### 3.1 The Reference Pattern: `src/vendor_capabilities.py`

`src/vendor_capabilities.py` is the **canonical "module-level abstraction layer"** (76 lines):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VendorCapabilities:
    vendor: str
    model: str
    vision: bool = False
    tool_calling: bool = True
    caching: bool = False
    # ... 22 named fields total

_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {}

def register(cap: VendorCapabilities) -> None: ...
def get_capabilities(vendor: str, model: str) -> VendorCapabilities: ...
```

**Properties that make this pattern successful:**

| Property | Why it matters |
|---|---|
| `frozen=True` | Attributes cannot be rebound after construction, so instances can be shared across threads without accidental mutation (provided the field values are themselves immutable) |
| Named fields | Every field is addressable by name (no `dict['vision']` lookups) |
| Module-level registry | O(1) lookup; no instantiation overhead |
| Wildcard `*` model | Fallback for unregistered models |
| Flat (no nesting) | Single cache-line access for most queries |
| Registration pattern | Extensible without modifying existing code |

All 5 fat-struct candidates follow this template.

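The registry and wildcard behaviour described above can be sketched as follows. This is a minimal illustration, not the contents of `src/vendor_capabilities.py`: the function bodies, the `KeyError` on an unknown vendor, and the model name `vision-model` are assumptions, and only a handful of the 22 fields are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorCapabilities:
    vendor: str
    model: str
    vision: bool = False
    tool_calling: bool = True
    caching: bool = False


_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {}


def register(cap: VendorCapabilities) -> None:
    """Store a capability record under its (vendor, model) key."""
    _REGISTRY[(cap.vendor, cap.model)] = cap


def get_capabilities(vendor: str, model: str) -> VendorCapabilities:
    """Return the exact match, else the vendor's "*" wildcard entry."""
    exact = _REGISTRY.get((vendor, model))
    if exact is not None:
        return exact
    # Assumption: a vendor with no wildcard entry raises KeyError here.
    return _REGISTRY[(vendor, "*")]


# Hypothetical registrations, for illustration only.
register(VendorCapabilities(vendor="anthropic", model="*", caching=True))
register(VendorCapabilities(vendor="anthropic", model="vision-model", vision=True))
```

Adding a new model is one more `register(...)` call, which is what the "Registration pattern" row refers to: existing lookups and call sites are untouched.
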
### 3.2 The Conversion API: `from_dict` / `to_dict`

For each new dataclass, the convention is:

```python
# Methods defined inside each new dataclass body
@classmethod
def from_dict(cls, data: Metadata) -> Result[Self, ErrorInfo]:
    """Parse a dict into the dataclass. Returns Result for graceful failure."""

def to_dict(self) -> Metadata:
    """Serialize the dataclass back to a dict (for logging, JSON wire)."""
```

The `Result[Self, ErrorInfo]` return type follows the data-oriented convention from `data_oriented_error_handling_20260606` (see `conductor/code_styleguides/error_handling.md`). Conversion failures (missing required field, type mismatch, malformed JSON) return `ErrorInfo` instead of raising.

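A minimal sketch of the convention, applied to the `UsageStats` shape from Phase 2. The project's real `Result` and `ErrorInfo` types live elsewhere and are not shown in this spec, so the two small classes below are hypothetical stand-ins, and the validation rules (integers only, booleans rejected) are an assumption.

```python
from dataclasses import MISSING, asdict, dataclass, fields
from typing import Any, Generic, Optional, TypeVar

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class ErrorInfo:  # hypothetical stand-in for the project's ErrorInfo
    message: str


@dataclass(frozen=True)
class Result(Generic[T, E]):  # hypothetical stand-in for the project's Result
    value: Optional[T] = None
    error: Optional[E] = None

    @property
    def ok(self) -> bool:
        return self.error is None


@dataclass(frozen=True)
class UsageStats:
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Result[UsageStats, ErrorInfo]":
        """Parse a dict; report problems as ErrorInfo instead of raising."""
        values: dict[str, int] = {}
        for f in fields(cls):
            if f.name not in data:
                if f.default is MISSING:  # no default means the field is required
                    return Result(error=ErrorInfo(f"missing required field: {f.name}"))
                continue
            raw = data[f.name]
            # bool is a subclass of int, so it is rejected explicitly.
            if isinstance(raw, bool) or not isinstance(raw, int):
                return Result(error=ErrorInfo(f"field {f.name!r} must be an int"))
            values[f.name] = raw
        return Result(value=cls(**values))

    def to_dict(self) -> dict[str, Any]:
        """Serialize back to a plain dict (for logging or the JSON wire)."""
        return asdict(self)
```

Callers branch on `result.ok` rather than wrapping the call in `try`/`except`, which is the point of returning a `Result`.
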
### 3.3 The `JsonValue` Recursive Type

Phase 5 (`api_hooks.py`) needs a type for arbitrary JSON-shaped data. Python 3.12+ has the `type` statement; earlier versions need a `TypeAlias`:

```python
# src/type_aliases.py (extension)
from typing import TypeAlias

JsonPrimitive: TypeAlias = str | int | float | bool | None
JsonValue: TypeAlias = JsonPrimitive | list["JsonValue"] | dict[str, "JsonValue"]
```

This makes `_serialize_for_api(obj: Any) -> JsonValue` and `broadcast(message: WebSocketMessage)` (with `payload: JsonValue`) explicit.

### 3.4 Module Layout

```
src/
  type_aliases.py  # MODIFIED: add JsonPrimitive + JsonValue TypeAliases
  vendor_capabilities.py  # UNCHANGED: the reference pattern (no edits)
  mcp_tool_specs.py  # NEW: ToolParameter + ToolSpec dataclasses + registry
  openai_schemas.py  # NEW: ToolCall + ToolCallFunction + ChatMessage + UsageStats
  provider_state.py  # NEW: ProviderHistory dataclass + _PROVIDER_HISTORIES dict
  mcp_client.py  # MODIFIED: MCP_TOOL_SPECS -> list[ToolSpec]; update dispatch
  openai_compatible.py  # MODIFIED: NormalizedResponse + OpenAICompatibleRequest use ChatMessage/UsageStats/ToolSpec
  ai_client.py  # MODIFIED: replace 14 globals with _PROVIDER_HISTORIES dict; update _send_grok/_send_minimax/_send_llama
  log_registry.py  # MODIFIED: add Session + SessionMetadata dataclasses
  session_logger.py  # MODIFIED: use Session dataclass
  log_pruner.py  # MODIFIED: use Session dataclass
  gui_2.py  # MODIFIED: Log Management panel uses Session
  api_hooks.py  # MODIFIED: add WebSocketMessage dataclass; _serialize_for_api -> JsonValue

scripts/
  audit_dataclass_coverage.py  # NEW: counts anonymous dict[str, Any] per module; --strict mode
  audit_dataclass_coverage.baseline.json  # NEW: baseline count post-track
  audit_weak_types.py  # UNCHANGED (still gates the alias convention)
  generate_type_registry.py  # UNCHANGED (registry generator; auto-includes new modules)

conductor/
  code_styleguides/
    type_aliases.md  # MODIFIED: add §12 "When to Promote TypeAlias to dataclass"

tests/
  test_mcp_tool_specs.py  # NEW
  test_openai_schemas.py  # NEW
  test_provider_state.py  # NEW
  test_log_registry_dataclasses.py  # NEW (or extend existing)
  test_api_hooks_dataclasses.py  # NEW (or extend existing)
  test_audit_dataclass_coverage.py  # NEW
  (existing test files):  # MODIFIED: update call sites; existing tests should pass unchanged

docs/
  type_registry/  # AUTO-GENERATED: new modules appear automatically
    mcp_tool_specs.md  # NEW (generated)
    openai_schemas.md  # NEW (generated)
    provider_state.md  # NEW (generated)
    api_hooks.md  # NEW (generated; replaces existing 16-Any-flavored entry)
    log_registry.md  # NEW (generated)
    src_ai_client.md  # MODIFIED (generated; ProviderHistory changes shape)
    src_openai_compatible.md  # MODIFIED (generated; NormalizedResponse changes shape)
    src_mcp_client.md  # MODIFIED (generated; MCP_TOOL_SPECS changes shape)

docs/reports/
  TRACK_COMPLETION_any_type_componentization_20260621.md  # NEW (end-of-track)
```

### 3.5 Coexistence with the Type-Alias Convention

The new dataclasses **complement** the `TypeAlias` convention; they do not replace it:

- **`TypeAlias`** = rename a shape that's still a dict at runtime (cheap; 0 structural cost)
- **`dataclass(frozen=True)`** = give the shape fields + methods + invariants (expensive; changes runtime type)

The decision tree (now in styleguide §12):

```
Is the shape open-ended (extra keys allowed, no invariants)?    ──► TypeAlias (Metadata)
Is the shape a closed set of named fields with specific types?  ──► dataclass(frozen=True)
Is the shape a JSON wire format (recursive)?                    ──► JsonValue (TypeAlias)
```

The 5 fat-struct candidates are closed sets of named fields. The 112 remaining `dict[str, Any]` sites in the audit's 27 lower-impact files are mostly open-ended (provider payloads, config dicts) and stay as `TypeAlias` (or even raw `dict[str, Any]`) until a future track identifies them as closed-shape candidates.

## 4. Per-Phase Plan

### Phase 0: Shared scaffolding (1 task; ~3 commits)

- **WHERE:** `src/type_aliases.py`, `scripts/audit_dataclass_coverage.py`, `conductor/code_styleguides/type_aliases.md`
- **WHAT:** Add `JsonPrimitive` + `JsonValue` TypeAliases; new audit script that counts anonymous `dict[str, Any]` per module with `--strict` mode (CI gate); styleguide §12
- **HOW:** Use the existing `audit_weak_types.py` script as the template for the new audit; follow `audit_weak_types.py:130-160` for the `--strict` mode pattern
- **SAFETY:** No behavior change; type aliases + new audit script are additive
- **TESTS:** `tests/test_audit_dataclass_coverage.py` (6+ tests; mirror `tests/test_audit_weak_types.py`)
- **VERIFICATION:** `uv run python scripts/audit_dataclass_coverage.py --strict` exits 0 (baseline == current)
- **COMMIT:** `feat(scaffold): JsonValue TypeAlias + dataclass-coverage audit + styleguide §12`

### Phase 1: `src/mcp_tool_specs.py` (P1, 8 sites)

**Current state** (`src/mcp_client.py:1944-2747`):
```python
MCP_TOOL_SPECS: list[dict[str, Any]] = [
    { "name": "py_remove_def", "description": "...", "parameters": {...} },
    # ... 44 more dicts of identical shape
]
TOOL_NAMES: set[str] = {t['name'] for t in MCP_TOOL_SPECS}  # line 2747
```

**Refactor target:**
```python
# src/mcp_tool_specs.py (NEW; ~120 lines)
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ToolParameter:
    name: str
    type: str  # "string" | "integer" | "boolean" | "object" | "array"
    description: str
    required: bool = False
    enum: Optional[list[str]] = None

@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    parameters: tuple[ToolParameter, ...]
    category: str = "file"

_REGISTRY: dict[str, ToolSpec] = {}

def register(spec: ToolSpec) -> None: ...
def get_tool_spec(name: str) -> ToolSpec: ...
def get_tool_schemas() -> list[ToolSpec]: ...
def tool_names() -> set[str]: ...
```

**Call sites to update:**
- `src/mcp_client.py:1944` `native_names = {t['name'] for t in MCP_TOOL_SPECS}` → `mcp_tool_specs.tool_names()`
- `src/mcp_client.py:1958` `res = list(MCP_TOOL_SPECS)` → `res = mcp_tool_specs.get_tool_schemas()`
- `src/mcp_client.py:1972` `MCP_TOOL_SPECS: list[dict[str, Any]] = [...]` → moved to `mcp_tool_specs.py:_REGISTRY`
- `src/mcp_client.py:2747` `TOOL_NAMES: set[str] = {t['name'] for t in MCP_TOOL_SPECS}` → `mcp_tool_specs.tool_names()`
- `src/ai_client.py:560,582,1012` `mcp_client.TOOL_NAMES` → `mcp_tool_specs.tool_names()` (3 sites)
- `src/app_controller.py:2103,2962,3263` `models.AGENT_TOOL_NAMES` (cross-check; not directly `TOOL_NAMES`)

**Compatibility shim:** two options were considered. The first keeps `mcp_client.MCP_TOOL_SPECS` and `mcp_client.TOOL_NAMES` as thin re-exports for the duration of this phase and removes them in a follow-up commit if no external test breaks. The second deprecates them immediately and fixes the 3 callers. §11 (decision 5) resolves this in favor of the second option: no shim.

**Tests:** `tests/test_mcp_tool_specs.py` (8+ tests)
- Verify all 45 tools are registered
- Verify `get_tool_spec("py_remove_def")` returns correct spec
- Verify `tool_names()` matches expected set
- Verify `from_dict()` returns `Result` for valid + invalid inputs
- Verify `TOOL_NAMES` is a subset of `models.AGENT_TOOL_NAMES` (cross-module invariant)

### Phase 2: `src/openai_schemas.py` (P1, 17 sites)

**Current state** (`src/openai_compatible.py:10-30`):
```python
@dataclass(frozen=True)
class NormalizedResponse:
    text: str
    tool_calls: list[dict[str, Any]]  # FAT: JSON tool call shape
    usage_input_tokens: int
    usage_output_tokens: int
    usage_cache_read_tokens: int
    usage_cache_creation_tokens: int
    raw_response: Any  # FAT: SDK-specific response (Pattern 3, stay)

@dataclass
class OpenAICompatibleRequest:
    messages: list[dict[str, Any]]  # FAT: message shape
    model: str
    ...
    tools: Optional[list[dict[str, Any]]] = None  # FAT: tool schema (cross-phase: Phase 1)
    extra_body: Optional[dict[str, Any]] = None  # FAT: arbitrary params
```

**Refactor target:**
```python
# src/openai_schemas.py (NEW; ~150 lines)
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass(frozen=True)
class ToolCallFunction:
    name: str
    arguments: str  # JSON string

@dataclass(frozen=True)
class ToolCall:
    id: str
    function: ToolCallFunction  # required fields must precede defaulted ones
    type: str = "function"

@dataclass(frozen=True)
class ChatMessage:
    role: str  # "system" | "user" | "assistant" | "tool"
    content: str
    tool_calls: Optional[tuple[ToolCall, ...]] = None
    tool_call_id: Optional[str] = None
    name: Optional[str] = None

@dataclass(frozen=True)
class UsageStats:
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0

# NormalizedResponse becomes:
@dataclass(frozen=True)
class NormalizedResponse:
    text: str
    tool_calls: tuple[ToolCall, ...]
    usage: UsageStats  # was 4 separate fields
    raw_response: Any  # Unavoidable: SDK-specific

# OpenAICompatibleRequest becomes:
@dataclass
class OpenAICompatibleRequest:
    messages: list[ChatMessage]
    model: str
    temperature: float = 0.0
    top_p: float = 1.0
    max_tokens: int = 8192
    tools: Optional[list[dict[str, Any]]] = None  # Cross-phase: Phase 1's ToolSpec (deferred)
    tool_choice: str = "auto"
    stream: bool = False
    stream_callback: Optional[Callable[[str], None]] = None
    extra_body: Optional[dict[str, Any]] = None
```

**Cross-phase coupling (deferred):** `OpenAICompatibleRequest.tools: Optional[list[ToolSpec]]` would reuse Phase 1's `ToolSpec`. This is a follow-up track concern; Phase 2 ships with `list[dict[str, Any]]` for that field with a `# TODO(future-track): migrate to list[ToolSpec]` note.

**Call sites to update:**
- `src/openai_compatible.py` itself (~5 internal functions consuming `NormalizedResponse`)
- `src/ai_client.py` `_send_grok()`, `_send_minimax()`, `_send_llama()` (~3 functions; they construct `NormalizedResponse` and `OpenAICompatibleRequest`)
- `src/api_hook_client.py` (the API hook payloads may serialize these; cross-check)

**Tests:** `tests/test_openai_schemas.py` (10+ tests)
- Verify `ChatMessage.from_dict()` round-trip for all 4 roles
- Verify `UsageStats` field access
- Verify `ToolCall.function.arguments` JSON parsing
- Verify `Result[Self, ErrorInfo]` error cases (missing required field, malformed JSON)
- Verify `NormalizedResponse.raw_response` is still `Any` (Pattern 3)

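The "`ToolCall.function.arguments` JSON parsing" test implies some way to decode the argument string. A minimal sketch, assuming the OpenAI wire shape (`id`, `type`, and a nested `function` object with `name` and `arguments`): the helpers `parsed_arguments` and `tool_call_from_wire` are hypothetical names, not part of the spec. Note that `function` is declared before `type`, because a dataclass field without a default cannot follow one with a default.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToolCallFunction:
    name: str
    arguments: str  # JSON string, exactly as received on the wire

    def parsed_arguments(self) -> Optional[dict[str, Any]]:
        """Hypothetical helper: decoded arguments, or None if not a JSON object."""
        try:
            decoded = json.loads(self.arguments)
        except json.JSONDecodeError:
            return None
        return decoded if isinstance(decoded, dict) else None


@dataclass(frozen=True)
class ToolCall:
    id: str
    function: ToolCallFunction  # required fields first
    type: str = "function"


def tool_call_from_wire(raw: dict[str, Any]) -> ToolCall:
    """Hypothetical helper: build a ToolCall from an OpenAI-style dict."""
    fn = raw["function"]
    return ToolCall(
        id=raw["id"],
        function=ToolCallFunction(name=fn["name"], arguments=fn["arguments"]),
        type=raw.get("type", "function"),
    )
```

Keeping `arguments` as the raw string preserves the wire format for round-tripping; decoding happens only when a caller asks for it.
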
### Phase 3: `src/provider_state.py` (P2, 41 sites)

**Current state** (`src/ai_client.py:111-133`):
```python
_anthropic_history: list[Metadata] = []
_anthropic_history_lock: threading.Lock = threading.Lock()
_deepseek_history: list[Metadata] = []
_deepseek_history_lock: threading.Lock = threading.Lock()
# ... 7 providers × 2 vars = 14 module globals
```

Plus the SDK client holders (Pattern 3, stay):
```python
_gemini_chat: Any = None
_deepseek_client: Any = None
# ... 7 SDK clients stay as-is
```

**Refactor target:**
```python
# src/provider_state.py (NEW; ~80 lines)
import threading
from dataclasses import dataclass, field

@dataclass
class ProviderHistory:
    messages: list[Metadata] = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def append(self, message: Metadata) -> None: ...
    def get_all(self) -> list[Metadata]: ...
    def replace_all(self, messages: list[Metadata]) -> None: ...
    def clear(self) -> None: ...

_PROVIDER_HISTORIES: dict[str, ProviderHistory] = {
    "anthropic": ProviderHistory(),
    "deepseek": ProviderHistory(),
    "minimax": ProviderHistory(),
    "qwen": ProviderHistory(),
    "grok": ProviderHistory(),
    "llama": ProviderHistory(),
}

def get_history(provider: str) -> ProviderHistory:
    return _PROVIDER_HISTORIES[provider]
```

**Call sites to update** (`src/ai_client.py`):
- Lines 463-466: `global _anthropic_history` declarations (4 declarations across `cleanup()` and similar) → removed
- Lines 483-499: 7 `with _<provider>_history_lock:` blocks in `cleanup()` → `get_history("<provider>").clear()`
- Lines 1447, 1457-1460, 1469, 1471, 1475, 1489, 1503, 1506, 1582: ~20 `_anthropic_history` references → `get_history("anthropic").messages` and `.append()`
- Lines 2201-2202, 2221-2222, 2353, 2360, 2418-2420: ~10 `_deepseek_history` references → `get_history("deepseek")`
- Lines 2575-2588, 2605: ~10 `_grok_history` references → `get_history("grok")`
- Lines 2659-2685: ~10 `_minimax_history` references → `get_history("minimax")`
- Lines 2812-2823: ~8 `_qwen_history` references → `get_history("qwen")`
- Lines 2901-2925: ~8 `_llama_history` references → `get_history("llama")`
- The `_repair_<provider>_history()` and `_trim_<provider>_history()` helpers (lines 1353, 1381, 2138, 2462, 2482) take `history: list[Metadata]` parameters — they stay as-is; call sites pass `get_history("<provider>").messages`

**Tests:** `tests/test_provider_state.py` (10+ tests)
- Verify `ProviderHistory.append()` is thread-safe (lock semantics)
- Verify `ProviderHistory.clear()` resets the list atomically
- Verify `get_history("anthropic")` returns the same instance across calls (singleton)
- Verify `replace_all()` swaps the list under lock
- Verify `cleanup()` clears all 6 histories
- Verify SDK client holders (`_gemini_chat`, etc.) are NOT touched (Pattern 3 preserved)

**Risk:** This phase has the largest ripple. The 41 sites include 14 module globals (renames are mechanical) + ~27 call-site updates. The audit may undercount if helper functions in `ai_client.py` reference these globals beyond the listed lines. **Mitigation:** Phase 3 has its own audit baseline snapshot before starting; any new finds get added to the phase's task list.

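The four `ProviderHistory` methods are stubbed in the refactor target. One plausible set of bodies is sketched below; it is an assumption, not the shipped implementation. `Metadata` is redefined locally as a stand-in for the alias in `src/type_aliases.py`. The sketch mutates the list in place in `replace_all()` and `clear()` so that any code already holding a reference to `.messages` (for example the `_repair_*` / `_trim_*` helpers) keeps seeing the current contents.

```python
import threading
from dataclasses import dataclass, field
from typing import Any

Metadata = dict[str, Any]  # stand-in for the project's Metadata alias


@dataclass
class ProviderHistory:
    messages: list[Metadata] = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def append(self, message: Metadata) -> None:
        with self.lock:
            self.messages.append(message)

    def get_all(self) -> list[Metadata]:
        # Return a copy so callers can iterate without holding the lock.
        with self.lock:
            return list(self.messages)

    def replace_all(self, messages: list[Metadata]) -> None:
        # Slice assignment keeps the same list object alive.
        with self.lock:
            self.messages[:] = messages

    def clear(self) -> None:
        with self.lock:
            self.messages.clear()


_PROVIDER_HISTORIES: dict[str, ProviderHistory] = {
    "anthropic": ProviderHistory(),
    "deepseek": ProviderHistory(),
}


def get_history(provider: str) -> ProviderHistory:
    return _PROVIDER_HISTORIES[provider]
```

Call sites that pass `get_history("<provider>").messages` to a helper bypass the lock, so a helper that mutates the list would still need to be called inside `with history.lock:`.
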
### Phase 4: `src/log_registry.py: Session` (P2, 7 sites)

**Current state** (`src/log_registry.py:58`):
```python
self.data: dict[str, dict[str, Any]] = {}  # session_id -> session content
```

The outer key is `session_id: str`. The inner dict has implicit fields: `path`, `start_time`, `whitelisted`, `metadata`.

**Refactor target** (inline in `src/log_registry.py`):
```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class SessionMetadata:
    message_count: int = 0
    errors: int = 0
    size_kb: int = 0
    whitelisted: bool = False
    reason: str = ''
    timestamp: Optional[str] = None

@dataclass(frozen=True)
class Session:
    session_id: str
    path: str
    start_time: str  # ISO format
    whitelisted: bool = False
    metadata: Optional[SessionMetadata] = None

@dataclass
class LogRegistry:
    registry_path: str
    data: dict[str, Session] = field(default_factory=dict)  # typed!
```

**Call sites to update:**
- `src/log_registry.py` `get_old_non_whitelisted_sessions()` and 6 other internal methods
- `src/session_logger.py` `open_session()`, `close_session()`
- `src/log_pruner.py` `prune_old_logs()`
- `src/gui_2.py` Log Management panel (find via `grep "log_registry"` or `grep "session_log"`)

**Tests:** `tests/test_log_registry_dataclasses.py` (or extend existing)
- Verify `Session.from_dict()` round-trip
- Verify `Session.metadata` is `Optional[SessionMetadata]`
- Verify `LogRegistry.data: dict[str, Session]` (no longer `dict[str, dict[str, Any]]`)
- Verify `prune_old_logs()` works on the new schema

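Because `Session` is frozen, code that used to write `data[sid]["whitelisted"] = True` has to build a new instance instead. A minimal sketch of that change, together with a conversion from the old nested-dict layout: the helper `session_from_legacy`, the choice to drop unknown metadata keys, and the sample values are assumptions.

```python
from dataclasses import FrozenInstanceError, dataclass, fields, replace
from typing import Any, Optional


@dataclass(frozen=True)
class SessionMetadata:
    message_count: int = 0
    errors: int = 0
    size_kb: int = 0
    whitelisted: bool = False
    reason: str = ""
    timestamp: Optional[str] = None


@dataclass(frozen=True)
class Session:
    session_id: str
    path: str
    start_time: str  # ISO format
    whitelisted: bool = False
    metadata: Optional[SessionMetadata] = None


def session_from_legacy(session_id: str, raw: dict[str, Any]) -> Session:
    """Hypothetical helper: convert one legacy inner dict to a Session."""
    known = {f.name for f in fields(SessionMetadata)}
    meta_raw = raw.get("metadata")
    metadata = None
    if meta_raw:
        # Assumption: keys the dataclass does not know about are dropped.
        metadata = SessionMetadata(**{k: v for k, v in meta_raw.items() if k in known})
    return Session(
        session_id=session_id,
        path=raw["path"],
        start_time=raw["start_time"],
        whitelisted=raw.get("whitelisted", False),
        metadata=metadata,
    )


# Sample data, for illustration only.
legacy = {
    "s1": {
        "path": "logs/s1",
        "start_time": "2026-06-21T10:00:00",
        "metadata": {"message_count": 12, "unknown_key": 1},
    }
}
data: dict[str, Session] = {sid: session_from_legacy(sid, raw) for sid, raw in legacy.items()}

# Old style: data["s1"]["whitelisted"] = True
# New style: swap in a modified copy.
data["s1"] = replace(data["s1"], whitelisted=True)
```

The registry dict itself stays mutable, so "updating a session" becomes rebinding the key to a new `Session` rather than editing one in place.
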
### Phase 5: `src/api_hooks.py: WebSocketMessage + JsonValue` (P3, 16 sites)

**Current state** (`src/api_hooks.py:48-145`):
```python
def _get_app_attr(app: Any, name: str, default: Any = None) -> Any: ...
def _set_app_attr(app: Any, name: str, value: Any) -> None: ...
def _serialize_for_api(obj: Any) -> Any: ...
def broadcast(self, channel: str, payload: dict[str, Any]) -> None: ...
```

The `_get_app_attr` / `_set_app_attr` are Pattern 4 (stay as `Any`).
The `_serialize_for_api` and `broadcast` are the JSON wire format.

**Refactor target** (inline in `src/api_hooks.py`):
```python
from dataclasses import dataclass
from typing import Any

from src.type_aliases import JsonValue

@dataclass(frozen=True)
class WebSocketMessage:
    channel: str
    payload: JsonValue

def _serialize_for_api(obj: Any) -> JsonValue: ...

def broadcast(self, message: WebSocketMessage) -> None: ...
```

**Call sites to update:** `broadcast()` callers (~5-10 sites across `src/app_controller.py`, `src/gui_2.py`)

**Tests:** extend `tests/test_api_hooks.py`
- Verify `WebSocketMessage` is `frozen=True` (cannot mutate)
- Verify `JsonValue` round-trip via `_serialize_for_api`
- Verify `_get_app_attr` / `_set_app_attr` signatures are unchanged (Pattern 4 preserved)

### Phase 6: Verification + docs + archive

- Run full audit: `audit_weak_types.py --strict` exits 0; `audit_dataclass_coverage.py --strict` exits 0
- Run full regression suite: 11-tier batched (per `test_sandbox_hardening_20260619` convention)
- Regenerate `docs/type_registry/` via `scripts/generate_type_registry.py`
- Verify `--check` mode passes
- Write end-of-track report at `docs/reports/TRACK_COMPLETION_any_type_componentization_20260621.md`
- Move `conductor/tracks/any_type_componentization_20260621/` → `conductor/tracks/archive/`
- Update `conductor/tracks.md`

## 5. The Audit Script as a Permanent CI Gate

The new `scripts/audit_dataclass_coverage.py` mirrors `audit_weak_types.py`'s design:

**Modes:**
- Default: informational (exits 0; prints report)
- `--json`: machine-readable
- `--strict`: CI gate (exits 1 if current anonymous `dict[str, Any]` count > baseline)
- `--baseline`: path to baseline file (default: `scripts/audit_dataclass_coverage.baseline.json`)

**What it counts:** sites where the structural anonymity persists (the 89 this track targets). Aliases that point to `dict[str, Any]` (e.g., `Metadata`, `CommsLogEntry`) are NOT counted; the audit counts actual `dict[str, Any]` / `list[dict[...]]` annotations and the remaining `Any` usages outside the 5 candidates.

**Baseline:** committed at `scripts/audit_dataclass_coverage.baseline.json` post-Phase-6. Expected: 211 `Any` sites remain (300 - 89 = 211). The audit's 5-pattern taxonomy justifies the boundary.

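The counting rule and the `--strict` comparison can be sketched with the standard `ast` module. This is an illustration under stated assumptions, not the design of `audit_weak_types.py` (which this spec does not reproduce): the function names are hypothetical, the match is a plain substring test on the unparsed annotation, and only the `dict[str, Any]` part of the count is shown.

```python
import ast


def count_anonymous_dicts(source: str) -> int:
    """Count annotations that literally spell out dict[str, Any]."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        annotations: list[ast.expr] = []
        if isinstance(node, ast.AnnAssign):
            annotations.append(node.annotation)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            params = args.posonlyargs + args.args + args.kwonlyargs
            annotations += [a.annotation for a in params if a.annotation is not None]
            for extra in (args.vararg, args.kwarg):
                if extra is not None and extra.annotation is not None:
                    annotations.append(extra.annotation)
            if node.returns is not None:
                annotations.append(node.returns)
        # ast.unparse normalizes spacing, so the substring test is stable.
        count += sum("dict[str, Any]" in ast.unparse(a) for a in annotations)
    return count


def strict_exit_code(current: int, baseline: int) -> int:
    """Exit 1 only when the count has grown past the baseline."""
    return 1 if current > baseline else 0


SAMPLE = '''
from typing import Any
Metadata = dict[str, Any]
specs: list[dict[str, Any]] = []
def broadcast(channel: str, payload: dict[str, Any]) -> None: ...
def load(path: str) -> Metadata: ...
'''
```

In `SAMPLE`, the `Metadata` alias definition is a plain assignment rather than an annotation, so it is not counted, and neither is the `-> Metadata` return type; only the `specs` annotation and the `payload` parameter are.
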
## 6. Configuration

No new dependencies. No new environment variables. No new config files.

The new dataclasses use stdlib `dataclasses.dataclass(frozen=True)` (Python 3.11+).

## 7. Testing Strategy

| Test File | Purpose | Coverage Target |
|---|---|---|
| `tests/test_audit_dataclass_coverage.py` | Verify the audit script's patterns + `--strict` mode + baseline | 90% |
| `tests/test_mcp_tool_specs.py` | Verify 45 tools registered + dispatch + cross-module invariants | 100% |
| `tests/test_openai_schemas.py` | Verify ChatMessage/UsageStats/ToolCall round-trips + Result[T] errors | 100% |
| `tests/test_provider_state.py` | Verify ProviderHistory thread safety + cleanup + singleton semantics | 100% |
| `tests/test_log_registry_dataclasses.py` | Verify Session dataclass + LogRegistry typed | 100% |
| `tests/test_api_hooks.py` (extended) | Verify WebSocketMessage + JsonValue round-trip | 100% |
| `tests/test_ai_client.py` (existing) | No regressions after 41-site Phase 3 refactor | 100% (regression) |
| `tests/test_mcp_client.py` (existing) | No regressions after Phase 1 dispatch refactor | 100% (regression) |
| `tests/test_openai_compatible.py` (existing) | No regressions after Phase 2 refactor | 100% (regression) |
| `tests/test_log_registry.py` (existing) | No regressions after Phase 4 | 100% (regression) |
| `tests/test_api_hooks.py` (existing) | No regressions after Phase 5 | 100% (regression) |

**Mocking strategy:** Per the project's structural testing contract (`docs/guide_testing.md`), Tier 3 workers do NOT use `unittest.mock.patch` for core infrastructure. The new tests use the real dataclasses with synthetic `Metadata` inputs.

**Audit baseline check:** Post-Phase-6, `audit_dataclass_coverage.py` should report ≤ baseline count. The dataclass-coverage baseline is expected to be 211 (300 `Any` minus the 89 candidates promoted in this track).

## 8. Migration / Rollout

| Phase | What | Risk | Commits |
|---|---|---|---|
| **0 — Scaffolding** | Add `JsonValue`, new audit, styleguide §12 | Low (additive only) | ~3 |
| **1 — `mcp_tool_specs`** | P1 (8 sites) | Medium (45 tools × ~4 params) | ~10 |
| **2 — `openai_schemas`** | P1 (17 sites) | Medium (cross-module: ai_client consumers) | ~10 |
| **3 — `provider_state`** | P2 (41 sites) | **Medium-High** (14 globals + ~27 call sites) | ~15 |
| **4 — `log_registry` Session** | P2 (7 sites) | Low (self-contained file) | ~5 |
| **5 — `api_hooks` WebSocketMessage** | P3 (16 sites) | Low (Pattern 5 preserved) | ~5 |
| **6 — Verify + archive** | Audit + tests + docs | Low | ~2 |
| **Total** | | | **~50 atomic commits** |

Each phase has its own checkpoint commit and git note (per `conductor/workflow.md` Task Workflow §9-10).

## 9. Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Phase 3 (`provider_state`) has more call sites than the audit identified. | Medium | Medium | Snapshot an audit baseline before Phase 3; any new finds get added to the phase's task list. Worst case: Phase 3 grows to ~20 commits (still tractable). |
| Phase 1 (`mcp_tool_specs`) dispatch map (`_dispatch_table`) has dead code that the typed refactor surfaces. | Medium | Low | The dataclass + registry pattern naturally surfaces dead code. Add a "dead code removal" task to Phase 1 if discovered. |
| The `JsonValue` recursive type fails to type-check in Python 3.11. | Low | Low | Use `TypeAlias` with forward-reference (`"JsonValue"`) in `list` and `dict`; tested in Phase 0. |
| A consumer of `mcp_client.TOOL_NAMES` lives outside `src/` (e.g., `tests/`, `conductor/`) and breaks. | Medium | Low | Compatibility shim (re-export) for 1 commit; remove in follow-up. |
| `frozen=True` dataclasses break code that mutates dict fields. | Medium | Medium | Audit each candidate for mutation patterns before its phase; convert mutators to `dataclasses.replace()`, which returns a new instance. |
| The new audit script's `--strict` mode is too strict (rejects valid uses). | Low | Medium | Set baseline conservatively (post-Phase-6 actual count); tighten only after 1 week of clean CI. |
| Cross-phase coupling (Phase 2's `tools: list[ToolSpec]`) creates merge conflict with Phase 1. | Low | Low | Explicitly deferred; Phase 2 ships with `list[dict[str, Any]]` + TODO comment. |
| The 5 candidates leave 211 `Any` sites untouched; users expect more. | Low | Low | Document in §10 explicitly; the audit's 5-pattern taxonomy justifies the boundary. |

## 10. Out of Scope (Explicit)

- **The remaining 211 `Any` usages** (300 - 89 = 211). The audit's 5-pattern taxonomy identifies these as Patterns 3/4/5 (SDK holders, dynamic dispatch, generic serialization) — they stay as `Any` because they're intentionally flexible. A future track may identify additional fat-struct candidates; this track does not.
- **TypedDict migration** of any alias. Per `data_structure_strengthening_20260606` §10, deferred.
- **Pydantic models.** Not requested; would be a much larger architectural decision.
- **The `JsonValue` recursive type as a runtime validator** (e.g., `jsonschema` validation). The TypeAlias is a type hint, not a runtime guard.
- **Conversion of the `TypeAlias` definitions themselves to `dataclass` (e.g., making `Metadata: TypeAlias = dict[str, Any]` a `class Metadata(dict)`).** The aliases document intent; converting them is a separate decision.
- **Cross-phase coupling** between Phase 1 and Phase 2 (Phase 2's `OpenAICompatibleRequest.tools: list[ToolSpec]`). Deferred to a follow-up track.
- **Waiting for `code_path_audit_20260607` to ship.** Per the §1 sequencing revision, the two tracks are orthogonal.
- **Modifying the audit scripts** (`audit_weak_types.py`, `audit_dataclass_coverage.py`) beyond the new `--strict` mode in Phase 0. Future extensions are separate tracks.

## 11. Decisions Made During Spec Authoring

The following design choices were resolved during spec drafting (formerly "Open Questions"):

1. **`ToolSpec.parameters: tuple[ToolParameter, ...]` (RESOLVED)** — Tuple wins. Immutable matches `frozen=True` philosophy; serialization uses explicit `to_dict()` helper. `list[ToolParameter]` would force runtime conversion at every JSON boundary.
2. **`ProviderHistory.clear()` reuses the lock (RESOLVED)** — The lock protects the list, not the lock instance. `default_factory=threading.Lock` in the dataclass field ensures every `ProviderHistory` gets its own lock on construction; `clear()` does NOT reset the lock.
3. **`Session.metadata: Optional[SessionMetadata] = None` (RESOLVED)** — `Optional` with default None wins. Matches existing call patterns in `session_logger.py` where sessions may exist without metadata populated yet.
4. **`JsonValue` lives in `src/type_aliases.py` (RESOLVED)** — Existing file is the canonical location for TypeAliases. New file would split the convention across 2 modules.
5. **No compatibility shim in Phase 1 (RESOLVED)** — Phase 1's 3 call sites in `ai_client.py` are updated immediately. The shim would add a commit of pure re-exports that gets removed in the next commit anyway.

## 12. See Also
|
||||
|
||||
### 12.1 Project References
|
||||
|
||||
- `docs/reports/ANY_TYPE_AUDIT_20260621.md` — the audit that drove this track (the input artifact)
|
||||
- `conductor/tracks/data_structure_strengthening_20260606/` — the parent track (the 10 TypeAliases + 1 NamedTuple; this track builds on it)
|
||||
- `src/vendor_capabilities.py` — the reference pattern (`frozen=True` dataclass + module-level registry + factory)
|
||||
- `src/type_aliases.py` — the TypeAlias module (extended in Phase 0 with `JsonValue`)
|
||||
- `scripts/audit_weak_types.py` — the audit script template (`scripts/audit_dataclass_coverage.py` mirrors its design)
|
||||
- `conductor/code_styleguides/type_aliases.md` — the canonical styleguide (Phase 0 adds §12)
|
||||
- `conductor/code_styleguides/error_handling.md` — the `Result[T]` convention (used by `from_dict()`)
|
||||
- `docs/guide_testing.md` — the test infrastructure (live_gui fixture, structural testing contract)
|
||||
- `docs/reports/TRACK_COMPLETION_data_structure_strengthening_20260606.md` — the parent track's end-of-track report
|
||||
- `conductor/tracks/code_path_audit_20260607/` — the parallel runtime-cost track (NOT a blocker)
|
||||
|
||||
### 12.2 External References
|
||||
|
||||
- **Python `dataclasses.dataclass(frozen=True)`** — the canonical pattern for immutable named records (PEP 557, in the stdlib since Python 3.7; PEP 681 covers `dataclass_transform`, Python 3.11+).
|
||||
- **Mike Acton's data-oriented design** — the "data is the API" framing that motivates named fields over dict access.
|
||||
- **Casey Muratori on module layer boundaries** — the convention that each module owns its data and exposes a clear interface.
|
||||
- **Ryan Fleury's "errors are just cases"** — the `Result[T]` convention adopted by this track for `from_dict()` return types.
|
||||
|
||||
### 12.3 Follow-up Track (planned; NOT in this track)
|
||||
|
||||
- **`any_type_componentization_phase2_2026MMDD`** (placeholder): the 211 remaining `Any` sites not in the 5 candidates. Identified by the audit's Pattern 3/4/5 analysis; may yield additional fat-struct candidates as future tracks touch those code areas.
|
||||
- **`openai_tools_dataclass_bridge_2026MMDD`** (placeholder): the cross-phase coupling opportunity (Phase 2's `OpenAICompatibleRequest.tools: list[ToolSpec]`).
|
||||
- **`type_registry_ci_20260606`** (planned in `data_structure_strengthening_20260606` §12.1): wires `generate_type_registry.py --check` into CI. This track ships the new modules; the CI gate is a separate concern.
|
||||
|
||||
## 13. Verification Criteria (Definition of Done)
|
||||
|
||||
- [ ] `src/mcp_tool_specs.py` exists with `ToolParameter` + `ToolSpec` + registry
|
||||
- [ ] `src/openai_schemas.py` exists with `ToolCall` + `ChatMessage` + `UsageStats`
|
||||
- [ ] `src/provider_state.py` exists with `ProviderHistory` + `_PROVIDER_HISTORIES` dict
|
||||
- [ ] `src/log_registry.py` has `Session` + `SessionMetadata` dataclasses
|
||||
- [ ] `src/api_hooks.py` has `WebSocketMessage` + `JsonValue` TypeAlias usage
|
||||
- [ ] `src/type_aliases.py` extended with `JsonPrimitive` + `JsonValue`
|
||||
- [ ] `scripts/audit_dataclass_coverage.py` exists with `--strict` mode
|
||||
- [ ] `scripts/audit_dataclass_coverage.baseline.json` committed
|
||||
- [ ] `conductor/code_styleguides/type_aliases.md` has §12 "When to Promote" section
|
||||
- [ ] 6 new test files exist with 48+ tests (Phase 0 audit: 6, Phase 1: 8, Phase 2: 10, Phase 3: 10, Phase 4: 8, Phase 5: 6)
|
||||
- [ ] All existing tests pass (no regressions in 11-tier batched run)
|
||||
- [ ] `audit_weak_types.py --strict` exits 0
|
||||
- [ ] `audit_dataclass_coverage.py --strict` exits 0
|
||||
- [ ] `generate_type_registry.py --check` exits 0 (5 new .md files appear)
|
||||
- [ ] `docs/reports/TRACK_COMPLETION_any_type_componentization_20260621.md` written
|
||||
- [ ] Track archived; `conductor/tracks.md` updated
|
||||
@@ -0,0 +1,129 @@
|
||||
# Track state for any_type_componentization_20260621
|
||||
# Updated by Tier 2 Tech Lead as tasks complete
|
||||
|
||||
[meta]
|
||||
track_id = "any_type_componentization_20260621"
|
||||
name = "Any-Type Componentization (Promote dict[str, Any] to dataclass(frozen=True))"
|
||||
status = "active"
|
||||
current_phase = 0
|
||||
last_updated = "2026-06-21"
|
||||
|
||||
[blocked_by]
|
||||
data_structure_strengthening_20260606 = "pending_merge"
|
||||
|
||||
[blocks]
|
||||
any_type_componentization_phase2_2026MMDD = "planned"
|
||||
openai_tools_dataclass_bridge_2026MMDD = "planned"
|
||||
|
||||
[phases]
|
||||
phase_0 = { status = "pending", checkpointsha = "", name = "Shared scaffolding (JsonValue + audit + styleguide)" }
|
||||
phase_1 = { status = "pending", checkpointsha = "", name = "mcp_tool_specs (P1, 8 sites)" }
|
||||
phase_2 = { status = "pending", checkpointsha = "", name = "openai_schemas (P1, 17 sites)" }
|
||||
phase_3 = { status = "pending", checkpointsha = "", name = "provider_state (P2, 41 sites)" }
|
||||
phase_4 = { status = "pending", checkpointsha = "", name = "log_registry Session (P2, 7 sites)" }
|
||||
phase_5 = { status = "pending", checkpointsha = "", name = "api_hooks WebSocketMessage (P3, 16 sites)" }
|
||||
phase_6 = { status = "pending", checkpointsha = "", name = "Verify + docs + archive" }
|
||||
|
||||
[tasks]
|
||||
# Phase 0: Shared scaffolding
|
||||
t0_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_audit_dataclass_coverage.py (mirror tests/test_audit_weak_types.py structure; verify regex patterns + Finding dataclass + --strict mode)" }
|
||||
t0_2 = { status = "pending", commit_sha = "", description = "Green: implement scripts/audit_dataclass_coverage.py (informational + --json + --strict + --baseline modes)" }
|
||||
t0_3 = { status = "pending", commit_sha = "", description = "Extend src/type_aliases.py with JsonPrimitive + JsonValue TypeAliases" }
|
||||
t0_4 = { status = "pending", commit_sha = "", description = "Add §12 'When to Promote TypeAlias to dataclass' to conductor/code_styleguides/type_aliases.md" }
|
||||
t0_5 = { status = "pending", commit_sha = "", description = "Phase 0 checkpoint commit + git note" }
|
||||
# Phase 1: mcp_tool_specs (P1)
|
||||
t1_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_mcp_tool_specs.py (verify 45 tools registered; get_tool_spec dispatch; TOOL_NAMES cross-module invariant)" }
|
||||
t1_2 = { status = "pending", commit_sha = "", description = "Green: create src/mcp_tool_specs.py with ToolParameter + ToolSpec dataclasses + module-level _REGISTRY" }
|
||||
t1_3 = { status = "pending", commit_sha = "", description = "Migrate MCP_TOOL_SPECS dict literals to ToolSpec instances in src/mcp_tool_specs.py:_REGISTRY" }
|
||||
t1_4 = { status = "pending", commit_sha = "", description = "Update src/mcp_client.py call sites (lines 1944, 1958, 2747) to use mcp_tool_specs.tool_names() / get_tool_schemas()" }
|
||||
t1_5 = { status = "pending", commit_sha = "", description = "Update src/ai_client.py:560,582,1012 (3 sites using mcp_client.TOOL_NAMES -> mcp_tool_specs.tool_names())" }
|
||||
t1_6 = { status = "pending", commit_sha = "", description = "Verify cross-module invariant: TOOL_NAMES is a subset of models.AGENT_TOOL_NAMES" }
|
||||
t1_7 = { status = "pending", commit_sha = "", description = "Run regression suite on tests/test_mcp_client.py + tests/test_ai_client.py" }
|
||||
t1_8 = { status = "pending", commit_sha = "", description = "Phase 1 checkpoint commit + git note" }
|
||||
# Phase 2: openai_schemas (P1)
|
||||
t2_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_openai_schemas.py (ChatMessage.from_dict round-trip for 4 roles; UsageStats field access; ToolCall.function.arguments JSON parse; Result[T] error cases)" }
|
||||
t2_2 = { status = "pending", commit_sha = "", description = "Green: create src/openai_schemas.py with ToolCall + ToolCallFunction + ChatMessage + UsageStats dataclasses" }
|
||||
t2_3 = { status = "pending", commit_sha = "", description = "Refactor src/openai_compatible.py:NormalizedResponse (4 usage fields -> UsageStats; tool_calls -> tuple[ToolCall, ...])" }
|
||||
t2_4 = { status = "pending", commit_sha = "", description = "Refactor src/openai_compatible.py:OpenAICompatibleRequest (messages -> list[ChatMessage])" }
|
||||
t2_5 = { status = "pending", commit_sha = "", description = "Update src/openai_compatible.py internal consumers (~5 functions constructing/parsing NormalizedResponse)" }
|
||||
t2_6 = { status = "pending", commit_sha = "", description = "Update src/ai_client.py _send_grok + _send_minimax + _send_llama (3 functions constructing OpenAICompatibleRequest)" }
|
||||
t2_7 = { status = "pending", commit_sha = "", description = "Cross-check src/api_hook_client.py for NormalizedResponse/OpenAICompatibleRequest consumers" }
|
||||
t2_8 = { status = "pending", commit_sha = "", description = "Run regression suite on tests/test_openai_compatible.py + tests/test_ai_client.py" }
|
||||
t2_9 = { status = "pending", commit_sha = "", description = "Phase 2 checkpoint commit + git note" }
|
||||
# Phase 3: provider_state (P2)
|
||||
t3_1 = { status = "pending", commit_sha = "", description = "Audit baseline snapshot: count _<provider>_history + _<provider>_history_lock references in src/ai_client.py" }
|
||||
t3_2 = { status = "pending", commit_sha = "", description = "Red: tests/test_provider_state.py (ProviderHistory.append thread-safety; clear atomicity; get_history singleton; cleanup clears all 6)" }
|
||||
t3_3 = { status = "pending", commit_sha = "", description = "Green: create src/provider_state.py with ProviderHistory dataclass + _PROVIDER_HISTORIES dict" }
|
||||
t3_4 = { status = "pending", commit_sha = "", description = "Remove 7 module globals + 7 lock declarations from src/ai_client.py:111-133" }
|
||||
t3_5 = { status = "pending", commit_sha = "", description = "Update src/ai_client.py:463-466 (cleanup() global declarations removed)" }
|
||||
t3_6 = { status = "pending", commit_sha = "", description = "Update src/ai_client.py:483-499 (cleanup() 7 lock blocks -> get_history(p).clear())" }
|
||||
t3_7 = { status = "pending", commit_sha = "", description = "Update src/ai_client.py _send_anthropic (~20 sites at lines 1447, 1457-1460, 1469, 1471, 1475, 1489, 1503, 1506, 1582)" }
|
||||
t3_8 = { status = "pending", commit_sha = "", description = "Update src/ai_client.py _send_deepseek (~10 sites at lines 2201-2202, 2221-2222, 2353, 2360, 2418-2420)" }
|
||||
t3_9 = { status = "pending", commit_sha = "", description = "Update src/ai_client.py _send_grok (~10 sites at lines 2575-2588, 2605)" }
|
||||
t3_10 = { status = "pending", commit_sha = "", description = "Update src/ai_client.py _send_minimax (~10 sites at lines 2659-2685)" }
|
||||
t3_11 = { status = "pending", commit_sha = "", description = "Update src/ai_client.py _send_qwen (~8 sites at lines 2812-2823)" }
|
||||
t3_12 = { status = "pending", commit_sha = "", description = "Update src/ai_client.py _send_llama (~8 sites at lines 2901-2925)" }
|
||||
t3_13 = { status = "pending", commit_sha = "", description = "Verify SDK client holders (_gemini_chat, etc.) NOT touched (Pattern 3 preserved)" }
|
||||
t3_14 = { status = "pending", commit_sha = "", description = "Run regression suite on tests/test_ai_client*.py (8 files; 27 tests)" }
|
||||
t3_15 = { status = "pending", commit_sha = "", description = "Phase 3 checkpoint commit + git note" }
|
||||
# Phase 4: log_registry Session (P2)
|
||||
t4_1 = { status = "pending", commit_sha = "", description = "Red: extend tests/test_log_registry.py (Session.from_dict round-trip; Session.metadata Optional; LogRegistry.data typed)" }
|
||||
t4_2 = { status = "pending", commit_sha = "", description = "Green: add Session + SessionMetadata dataclasses inline in src/log_registry.py" }
|
||||
t4_3 = { status = "pending", commit_sha = "", description = "Refactor LogRegistry.data: dict[str, dict[str, Any]] -> dict[str, Session]" }
|
||||
t4_4 = { status = "pending", commit_sha = "", description = "Update src/session_logger.py (open_session, close_session)" }
|
||||
t4_5 = { status = "pending", commit_sha = "", description = "Update src/log_pruner.py (prune_old_logs)" }
|
||||
t4_6 = { status = "pending", commit_sha = "", description = "Update src/gui_2.py Log Management panel" }
|
||||
t4_7 = { status = "pending", commit_sha = "", description = "Run regression suite on tests/test_log_registry.py + tests/test_session_logger.py + tests/test_log_pruner.py" }
|
||||
t4_8 = { status = "pending", commit_sha = "", description = "Phase 4 checkpoint commit + git note" }
|
||||
# Phase 5: api_hooks WebSocketMessage (P3)
|
||||
t5_1 = { status = "pending", commit_sha = "", description = "Red: extend tests/test_api_hooks.py (WebSocketMessage frozen=True; JsonValue round-trip via _serialize_for_api; Pattern 4 preserved)" }
|
||||
t5_2 = { status = "pending", commit_sha = "", description = "Green: add WebSocketMessage dataclass inline in src/api_hooks.py" }
|
||||
t5_3 = { status = "pending", commit_sha = "", description = "Update broadcast() signature: (channel, payload: dict[str, Any]) -> (message: WebSocketMessage)" }
|
||||
t5_4 = { status = "pending", commit_sha = "", description = "Update _serialize_for_api return type: Any -> JsonValue" }
|
||||
t5_5 = { status = "pending", commit_sha = "", description = "Update broadcast() callers (~5-10 sites across src/app_controller.py, src/gui_2.py)" }
|
||||
t5_6 = { status = "pending", commit_sha = "", description = "Verify Pattern 4 preserved: _get_app_attr, _set_app_attr signatures unchanged" }
|
||||
t5_7 = { status = "pending", commit_sha = "", description = "Run regression suite on tests/test_api_hooks.py + tests/test_app_controller.py" }
|
||||
t5_8 = { status = "pending", commit_sha = "", description = "Phase 5 checkpoint commit + git note" }
|
||||
# Phase 6: Verify + docs + archive
|
||||
t6_1 = { status = "pending", commit_sha = "", description = "Run scripts/audit_weak_types.py --strict (exit 0)" }
|
||||
t6_2 = { status = "pending", commit_sha = "", description = "Run scripts/audit_dataclass_coverage.py --strict (exit 0; generate baseline)" }
|
||||
t6_3 = { status = "pending", commit_sha = "", description = "Run scripts/generate_type_registry.py (auto-include new modules) + --check (exit 0)" }
|
||||
t6_4 = { status = "pending", commit_sha = "", description = "Run 11-tier batched regression suite (per test_sandbox_hardening_20260619 convention)" }
|
||||
t6_5 = { status = "pending", commit_sha = "", description = "Write docs/reports/TRACK_COMPLETION_any_type_componentization_20260621.md" }
|
||||
t6_6 = { status = "pending", commit_sha = "", description = "git mv conductor/tracks/any_type_componentization_20260621 conductor/tracks/archive/" }
|
||||
t6_7 = { status = "pending", commit_sha = "", description = "Update conductor/tracks.md (move entry to Recently Completed)" }
|
||||
t6_8 = { status = "pending", commit_sha = "", description = "Final state.toml update + Phase 6 checkpoint commit + git note" }
|
||||
|
||||
[verification]
|
||||
phase_0_jsonvalue_complete = false
|
||||
phase_0_audit_script_complete = false
|
||||
phase_0_styleguide_complete = false
|
||||
phase_1_mcp_tool_specs_complete = false
|
||||
phase_2_openai_schemas_complete = false
|
||||
phase_3_provider_state_complete = false
|
||||
phase_4_log_registry_complete = false
|
||||
phase_5_api_hooks_complete = false
|
||||
phase_6_track_archived = false
|
||||
full_11_tier_regression_passes = false
|
||||
audit_weak_types_strict_passes = false
|
||||
audit_dataclass_coverage_strict_passes = false
|
||||
type_registry_check_passes = false
|
||||
|
||||
[candidate_progression]
|
||||
# Filled as phases complete
|
||||
p1_mcp_tool_specs_sites = 8
|
||||
p1_openai_schemas_sites = 17
|
||||
p2_provider_state_sites = 41
|
||||
p2_log_registry_sites = 7
|
||||
p3_api_hooks_sites = 16
|
||||
total_candidate_sites = 89
|
||||
|
||||
[files_modified_or_created]
|
||||
new = ["src/mcp_tool_specs.py", "src/openai_schemas.py", "src/provider_state.py", "scripts/audit_dataclass_coverage.py", "scripts/audit_dataclass_coverage.baseline.json"]
|
||||
modified = ["src/type_aliases.py", "src/mcp_client.py", "src/openai_compatible.py", "src/ai_client.py", "src/log_registry.py", "src/session_logger.py", "src/log_pruner.py", "src/gui_2.py", "src/api_hooks.py", "conductor/code_styleguides/type_aliases.md"]
|
||||
|
||||
[input_artifact]
|
||||
report = "docs/reports/ANY_TYPE_AUDIT_20260621.md"
|
||||
findings_count = 300
|
||||
candidates_count = 5
|
||||
candidate_sites = 89
|
||||
@@ -1,250 +1,354 @@
|
||||
# Track Specification: Conductor Chronology (2026-06-19)
|
||||
# Track Specification: Conductor Chronology v2 (2026-06-21 rewrite)
|
||||
|
||||
## Overview
|
||||
|
||||
This track creates `conductor/chronology.md`, a complete, manually-maintained index of all tracks (active, shipped, archived, superseded) for the Manual Slop conductor system, plus a small section for notable non-track commits. It removes the duplicated `[x]` completed-track listings from `conductor/tracks.md` (the "Phase 9: Chore Tracks" section, the `[x]` entries under "Active Research Tracks", and the `[shipped]` entries under "Follow-up") and consolidates them into a single canonical index.
|
||||
This is the **v2 rewrite** of `chronology_20260619`. The first run (Phases 1-9, 24 commits, 2026-06-19 to 2026-06-20) shipped `conductor/chronology.md` with a **broken status classifier** that read stale `metadata.json.status` fields. The user mandate — "EVERY SINGLE ENTRY MUST BE CROSS CHECKED" — was satisfied at a structural level (folder set == row set) but not at the **semantic level** (status correctness, summary quality). Two classifier iterations followed (commits `4109a667` and `271e6895`); both used heuristic-based fallbacks and neither used **git history as the explicit evidence source**, which is what the user wants.
|
||||
|
||||
The per-track `spec.md`/`plan.md`/`metadata.json`/`state.toml` in `conductor/tracks/` and `conductor/archive/` remain the source of truth for each track's details. `chronology.md` is the *index* — one row per track, with a brief one-sentence summary, a folder link, a commit range, and a status badge. It reads as a build history, not a release history.
|
||||
This rewrite replaces the spec/plan/state.toml; the 24 prior commits + the broken v1 chronology remain in git history as the foundation. The substantive changes are:
|
||||
1. **FR1** (chronology structure): rewritten — new status enum (5 values), per-row evidence line, per-row confidence level, "Needs Review" section.
|
||||
2. **FR5** (helper script): rewritten — git-history classifier with confidence assignment.
|
||||
3. **FR6** (cross-check): rewritten — 3-stage protocol (classifier auto + Tier 1 reviews "Needs Review" queue + user reviews final).
|
||||
4. **FR7** (new): classifier quality gate — if > 30% of rows are ambiguous, abort to manual review (the user's "B" fallback).
|
||||
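The FR7 gate in item 4 reduces to a single threshold check. The sketch below is one possible shape for it; the function name and signature are hypothetical, and only the 30% threshold and the `high`/`low` confidence values come from this spec.

```python
def ambiguous_share_exceeds(confidences: list[str], max_percent: int = 30) -> bool:
    """Return True when more than max_percent of rows are low-confidence.

    A True result means the classifier is considered bad and the run
    should abort to manual review.
    """
    low = sum(1 for c in confidences if c == "low")
    # Integer comparison avoids floating-point edge cases at exactly 30%.
    return low * 100 > len(confidences) * max_percent
```

With 216 rows, for example, 64 low-confidence rows stay under the threshold and 65 exceed it.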
|
||||
The active task list stays in `conductor/tracks.md` (in-flight `[~]` and planned `[ ]` entries). When a track ships and is moved to `archive/`, its entry is added to `chronology.md` and its `[x]` row is removed from `tracks.md` (this is the workflow change).
|
||||
Phases that produced the existing `tracks.md` pruning + `workflow.md` 3-step convention + the v1 migration report are reused. This rewrite adds a v2 addendum to the migration report.
|
||||
|
||||
## Current State Audit (as of 2026-06-19)
|
||||
## Current State Audit (as of 2026-06-21, commit `3aea92f1`)
|
||||
|
||||
### Already Implemented (DO NOT re-implement)
|
||||
### Already Implemented (carried forward, NO REWORK)
|
||||
|
||||
1. **`conductor/tracks.md` (line 459)** — already calls itself a "Lightweight chronology; full spec/plan/state per track is in the linked folder." This track makes that role explicit and gives it a dedicated file.
|
||||
2. **`conductor/tracks.md` "Phase 9: Chore Tracks" section** — manually-maintained list of `[x]` completed tracks. This is one of three duplicated listings that move to `chronology.md`.
|
||||
3. **`conductor/tracks.md` "Active Research Tracks" section** — the `[x]` entries (e.g., Fable review shipped 2026-06-18) move to `chronology.md`. The `[ ]` in-flight entries stay in `tracks.md`.
|
||||
4. **`conductor/tracks.md` "Follow-up (Planned, Not Yet Specced)" section** — the `[shipped: YYYY-MM-DD]` entries move to `chronology.md`. The "planned" and "not yet specced" entries stay in `tracks.md`.
|
||||
5. **`conductor/archive/` (176 track folders)** — the canonical location of shipped tracks. Each folder has at minimum a `spec.md`; most also have `plan.md`; modern tracks (2026-06+) have `metadata.json` + `state.toml` as well.
|
||||
6. **`conductor/tracks/` (35 active track folders)** — the canonical location of in-flight tracks.
|
||||
7. **`conductor/workflow.md` "Notes > Editing this file" section** — documents the existing convention for moving tracks to `archive/` when shipped. The new convention is appended here.
|
||||
1. **`conductor/tracks.md` "Phase 9: Chore Tracks" section** — pruned to one-line stub pointing to `chronology.md` (commit `be38dd5`).
|
||||
2. **`conductor/tracks.md` "Active Research Tracks" `[x]` entries** — pruned (commit `cca4767`).
|
||||
3. **`conductor/tracks.md` "Follow-up" `[shipped]` entries** — pruned (commit `b3a9c45`).
|
||||
4. **`conductor/workflow.md` "Notes > Editing this file" section** — has the 3-step archiving convention (commit `b697cd8`).
|
||||
5. **`scripts/audit/generate_chronology.py`** — exists (338 lines). Functions: `extract_slug_date`, `extract_summary`, `walk_track_folders`, `format_markdown`, `_classify_status`, `_parse_state_phase`, `_last_commit_date`. The **broken function** is `_classify_status` (lines ~163-189) which reads the `current` parameter (originally from `metadata.json.status`) and uses folder-location + state_phase heuristics. **This function is the target of FR5's rewrite.**
|
||||
6. **`tests/test_generate_chronology.py`** — 6 unit tests, all passing against the current (broken) classifier. Needs extension per FR5.
|
||||
7. **`conductor/chronology.md`** — 218 lines, 216 rows, v1 with broken status classifier. Statuses include `active`, `spec_written`, `spec_approved`, `planning` (stale `metadata.json.status` values). 41 `Completed`, 0 `Abandoned`, 167 rows with stale status per the handover report (lines 14-16). **Target of Phase 1's move-to-broken-v1.**
|
||||
8. **`docs/reports/CHRONOLOGY_MIGRATION_20260619.md`** — v1 migration report; needs v2 addendum (FR4).
|
||||
9. **`docs/reports/CHRONOLOGY_TRACK_HANDOVER_20260620.md`** — Tier 2's hand-off; documents the failure + the recommended fix (the 5-step git-history algorithm).
|
||||
10. **`docs/reports/TRACK_COMPLETION_chronology_20260619.md`** — v1 end-of-track report; needs v2 addendum.
|
||||
|
||||
### Gaps to Fill (This Track's Scope)
|
||||
|
||||
| # | Gap | Where | Resolution |
|
||||
|---|-----|-------|-----------|
|
||||
| G1 | No `conductor/chronology.md` exists | `conductor/` (new file) | Create + populate |
|
||||
| G2 | `tracks.md` carries duplicated completed-track listings across 3 sections | `conductor/tracks.md` Phase 9, Active Research, Follow-up | Remove all `[x]`/`[shipped]` entries |
|
||||
| G3 | No documented convention for what happens to a `tracks.md` entry when a track is archived | `conductor/workflow.md` | Add a 3-step section: update `tracks.md`, add to `chronology.md`, move folder to `archive/` |
|
||||
| G4 | No audit trail of the migration | `docs/reports/` | New `CHRONOLOGY_MIGRATION_20260619.md` for user review |
|
||||
| G5 | Brief per-track summaries don't exist anywhere as a single-line format | `spec.md` (1st paragraph) + `metadata.json.description` (modern tracks) | Extract for the migration; manually edited for length |
|
||||
| G1 | v1 chronology.md has 167/216 rows with wrong status (stale `metadata.json.status` values) | `conductor/chronology.md` | Move v1 to `conductor/chronology.md.broken-v1` (Phase 1); generate v2 with git-history classifier (Phase 4) |
|
||||
| G2 | v1 chronology.md has summaries that are metadata-field text (`**Priority:** A...`, `**Date:** 2026-06-20`) not the actual track summary | Same as G1 | v2's priority chain (FR5 §"Summary extraction") rejects metadata-field text via regex |
|
||||
| G3 | `_classify_status` reads stale `metadata.json.status` | `scripts/audit/generate_chronology.py:~163-189` | Rewrite to use the 5-step git-history algorithm (handover §"Root cause of failure") |
|
||||
| G4 | No "Needs Review" queue mechanism | n/a (new) | Add per-row confidence (FR5) + "Needs Review" section in `chronology.md` (FR1) |
|
||||
| G5 | No quality gate to detect a bad classifier | n/a (new) | Add `scripts/audit/chronology_quality_gate.py` (FR7) |
|
||||
| G6 | v1 cross-check was bulk-verified (structural check, not per-row semantic check) | n/a (process change) | v2 cross-check is 3-stage (FR6): classifier auto + Tier 1 reviews "Needs Review" + user reviews final with per-row evidence log |
|
||||
| G7 | v1 per-row evidence is missing | n/a (new) | Add per-row evidence line to `chronology.md` (FR1) + standalone evidence log file (FR6 §"per-row evidence log") |
|
||||
| G8 | `state.toml` is at `current_phase = 10` with a false "complete" state | `conductor/tracks/chronology_20260619/state.toml` | Reset to `current_phase = 0`; this rewrite starts fresh |
|
||||
| G9 | v1 migration report has 167 stale-status rows in the per-row log | `docs/reports/CHRONOLOGY_MIGRATION_20260619.md` | v2 addendum shows the diff (v1 status → v2 status) with the git evidence per row |
|
||||
| G10 | No fallback path if the classifier is bad | n/a (new) | FR7 quality gate; if > 30% ambiguous → abort to manual review (the user's "B" fallback per chat 2026-06-21) |
|
||||
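The G2 resolution (rejecting metadata-field text via regex) might look like the following sketch. The pattern and the `pick_summary` helper are assumptions; the spec only states that lines such as `**Priority:** A...` and `**Date:** 2026-06-20` must not be used as summaries.

```python
import re
from typing import Iterable, Optional

# Assumed pattern: a line that opens with a bolded "Field:" label.
METADATA_FIELD = re.compile(r"^\s*\*\*[^*]+:\*\*")


def pick_summary(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that is not metadata-field text.

    `candidates` is expected in priority order (highest priority first).
    """
    for text in candidates:
        line = text.strip()
        if line and not METADATA_FIELD.match(line):
            return line
    return None
```

A variant label style such as `**Priority**: A` (colon outside the bold) would not be caught by this pattern and would need its own alternative.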
|
||||
## Goals
|
||||
|
||||
1. **One canonical index.** `conductor/chronology.md` is the only file the user (or an agent) consults to see "what has this project done." No more scanning 3 sections of `tracks.md`.
|
||||
2. **No info loss.** Every completed track that was in `tracks.md` is now in `chronology.md` with the same information (name, link, status, checkpoint SHAs).
|
||||
3. **Forward-compatible.** When a new track ships, the convention is clear: add a row to `chronology.md`, update the row in `tracks.md` (or remove it), and move the folder to `archive/`.
|
||||
4. **Notable non-track commits captured.** Commits that aren't part of any track (direct fixes, infra tweaks, doc-only commits) have a place in `chronology.md` if a future reader would want to know about them.
|
||||
5. **No day estimates.** Per the project convention (added 2026-06-16), all scope is measured in files/sites, not time.
|
||||
1. **One canonical index.** `conductor/chronology.md` is the only file consulted to see "what has this project done." No more scanning 3 sections of `tracks.md`. (Carried from v1; unchanged.)
|
||||
2. **No info loss.** Every track that has a folder in `conductor/tracks/` or `conductor/archive/` has a row in `chronology.md` (or a documented exception). (Carried from v1; unchanged.)
|
||||
3. **Forward-compatible.** When a new track ships, the convention is clear: move folder to `archive/`, remove `[x]` from `tracks.md`, add a row to `chronology.md` with the new format. (Carried from v1; unchanged.)
|
||||
4. **Git history is the explicit evidence.** Each row's status is derived from `git log -- <folder>` (commit count + commit messages). `metadata.json.status` is **informational only** — the classifier does not trust it for the final status.
|
||||
5. **"EVERY SINGLE ENTRY" mandate preserved at the semantic level.** Every row has: (a) a status decision, (b) the git evidence that supports the decision, (c) a per-row confidence level, (d) a "Needs Review" flag if confidence is low. The "cross-check" is the row's evidence trail, not a separate audit pass.
|
||||
6. **Conservative classifier + hard quality gate.** The classifier auto-classifies only when evidence is clear; ambiguous rows are flagged for human review. If > 30% of rows are ambiguous, the classifier is bad → abort to manual review (the user's "B" fallback per chat 2026-06-21).
|
||||
7. **No day estimates.** Per `conductor/workflow.md` Tier 1 Track Initialization Rules (added 2026-06-16). Scope measured in files/sites.
|
||||
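Goals 4 to 6 can be illustrated with a pure function over evidence that has already been collected from `git log -- <folder>`. This is a sketch of one reading of the FR1 status rules, not the FR5 implementation. It assumes that a "work commit" is a commit whose subject starts with `feat`, `fix`, or `refactor`, and all names (`Evidence`, `classify`, `WORK_PREFIX`) are hypothetical. `Special` is never assigned automatically because it is a human decision.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union

# Assumption: a "work commit" has a feat/fix/refactor subject prefix.
WORK_PREFIX = re.compile(r"^(feat|fix|refactor)(\([^)]*\))?:")


@dataclass(frozen=True)
class Evidence:
    location: str                       # "tracks" or "archive"
    subjects: tuple[str, ...]           # commit subjects touching the folder
    state_phase: Union[int, str, None]  # int, "complete", or None (no state.toml)
    days_since_last_commit: int


def count_work_commits(ev: Evidence) -> int:
    return sum(1 for s in ev.subjects if WORK_PREFIX.match(s))


def classify(ev: Evidence) -> tuple[Optional[str], str]:
    """Return (status, confidence); status is None when evidence is ambiguous."""
    work = count_work_commits(ev)
    if ev.location == "archive" and (work >= 3 or ev.state_phase == "complete"):
        return "Completed", "high"
    if work == 0 and ev.days_since_last_commit > 14:
        return "Abandoned", "high"
    if ev.location == "tracks":
        if isinstance(ev.state_phase, int):
            if ev.state_phase >= 3:
                return "In Progress", "high"
            if work >= 1:
                return "Active", "high"
        elif ev.state_phase is None and work >= 3:
            return "In Progress", "high"
    # Anything else goes to the "Needs Review" queue.
    return None, "low"
```

Keeping the classifier conservative means a row either matches one rule cleanly or is handed to a human; it never guesses between two plausible statuses.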
|
||||
## Functional Requirements
|
||||
|
||||
### FR1. `conductor/chronology.md` file structure
|
||||
### FR1. `conductor/chronology.md` v2 structure (REWRITTEN)
|
||||
|
||||
**WHERE:** New file `conductor/chronology.md` at the conductor root.
|
||||
**WHERE:** `conductor/chronology.md` (replaces v1).
|
||||
|
||||
**WHAT:** A markdown file with the following structure (top to bottom):
|
||||
**WHAT:** Same overall structure as v1 (table format, newest first, "Notable Non-Track Commits" section at the bottom), with these changes:
|
||||
|
||||
```markdown
|
||||
# Conductor Chronology
|
||||
**Status enum (5 values, replaces v1's 6-value enum):**
|
||||
- `Active` — folder in `tracks/` + work has started (≥ 1 `feat/fix/refactor` commit) but `state.toml.current_phase` < 3
|
||||
- `In Progress` — folder in `tracks/` + `state.toml.current_phase` ≥ 3 (or no `state.toml` + ≥ 3 work commits)
|
||||
- `Completed` — folder in `archive/` + ≥ 3 work commits (or `state.toml.current_phase == "complete"`)
|
||||
- `Abandoned` — folder in `tracks/` or `archive/` + 0-1 work commits + last commit > 14 days ago + no `feat/fix/refactor` in commit history
|
||||
- `Special` — explicit human-decision; e.g., research note, scratch dir, archived by mistake, deleted
|
||||
|
||||
Complete history of all tracks for the Manual Slop conductor system, plus notable non-track commits. This is the canonical index — the per-track spec/plan/metadata in `tracks/` and `archive/` remain the source of truth for each track's details.
|
||||
**Notably ABSENT from the v2 enum** (present in v1): `Shipped`, `Superseded`, `planning`, `spec_written`, `spec_approved`, `active` (lowercase). The v2 enum is the canonical set; v1's status values are stale metadata leaks.
|
||||
|
||||
The active task list lives in [`tracks.md`](./tracks.md). When a track ships and is moved to `archive/`, its entry here is added (and its `[x]` entry removed from `tracks.md`).
|
||||
**Per-row confidence level (NEW):**
|
||||
- `high` — auto-classified by the script; git evidence + folder location + state.toml (if present) all point to the same status
|
||||
- `low` — in the "Needs Review" queue; needs Tier 1 + user review
|
||||
|
||||
## Tracks (newest first)
|
||||
|
||||
- **YYYY-MM-DD** — `track_id_<YYYYMMDD>` *(Status)* — One-sentence summary.
|
||||
- Folder: [tracks/track_id_<YYYYMMDD>/](./tracks/track_id_<YYYYMMDD>/) (active) OR [archive/track_id_<YYYYMMDD>/](./archive/track_id_<YYYYMMDD>/) (shipped)
|
||||
- Range: `<init-sha>..<end-sha>` (N commits)
|
||||
|
||||
*(one row per track, ~165 total)*
|
||||
|
||||
## Notable Non-Track Commits
|
||||
|
||||
- **YYYY-MM-DD** — `<sha>` — One-line description of why this commit is notable.
|
||||
- ...
|
||||
**Per-row evidence line (NEW):**
|
||||
Each row gets a sub-line in the format:
|
||||
```
|
||||
Evidence: <7-char-init-sha>..<7-char-end-sha> | N commits | state_phase=<N or "n/a" or "complete"> | "<first-commit-subject>" → "<last-commit-subject>" | confidence=<high|low>
|
||||
```
|
||||
|
||||
**Per-row fields:**
|
||||
- **Date** — the date in the track's slug (`YYYYMMDD` → `YYYY-MM-DD`). If the slug date disagrees with the first-commit date (older tracks), use the slug date.
|
||||
- **Track ID** — the standard `topic_<YYYYMMDD>` slug, in backticks.
|
||||
- **Status** — one of: `Active`, `In Progress`, `Shipped`, `Superseded`, `Abandoned`.
|
||||
- **Summary** — one sentence, ≤ 25 words, manually written. The first sentence of `spec.md` is the source; manually trimmed for length.
|
||||
- **Folder** — link to `tracks/<id>/` (active) or `archive/<id>/` (shipped).
|
||||
- **Range** — `<7-char init SHA>..<7-char end SHA>` + commit count. Use the FIRST commit that touched the track folder as `init-sha` and the LAST commit (or the archive-move commit) as `end-sha`. Get `init-sha` from the first line of `git log --reverse --format='%h' -- <folder>` and `end-sha` from `git log --format='%h' -1 -- <folder>`.
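The Range field can be derived with a single `git log` call, since the oldest-first listing contains both endpoints and the count. The following is a minimal sketch under that assumption; `folder_shas`, `pick_range` and `render_range` are illustrative names, not functions from the helper script.

```python
from __future__ import annotations

import subprocess


def folder_shas(folder: str, repo: str = ".") -> list[str]:
    """List abbreviated SHAs of commits touching `folder`, oldest first."""
    out = subprocess.run(
        ["git", "log", "--reverse", "--format=%h", "--", folder],
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return out.split()


def pick_range(shas_oldest_first: list[str]) -> tuple[str, str, int]:
    """Return (init_sha, end_sha, commit_count) from an oldest-first SHA list."""
    if not shas_oldest_first:
        raise ValueError("no commits touch this folder")
    return shas_oldest_first[0], shas_oldest_first[-1], len(shas_oldest_first)


def render_range(shas_oldest_first: list[str]) -> str:
    """Format the Range value, e.g. `abc1234..def5678` (N commits)."""
    init_sha, end_sha, count = pick_range(shas_oldest_first)
    return f"`{init_sha}..{end_sha}` ({count} commits)"
```

A typical call would be `render_range(folder_shas("conductor/archive/<id>"))`. An empty list raises instead of producing a row, so a folder with no history surfaces as an error and not as a silent `0 commits` entry.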
|
||||
**"Needs Review" section (NEW):**
|
||||
At the bottom of `chronology.md`, a section listing all `low`-confidence rows with a one-line reason each. Format:
|
||||
```
|
||||
## Needs Review (Tier 1 + User)
|
||||
|
||||
**Notable Non-Track Commits section:**
|
||||
- Sorted newest first.
|
||||
- One row per notable commit: date, SHA, one-line description.
|
||||
- The criterion for "notable" is: a future agent reading the chronology would want to know this commit happened. The bar is "non-obvious work that wasn't part of a track" — e.g., direct production fixes, infra changes, refactors that pre-date the conductor convention.
|
||||
These rows had ambiguous git evidence. Resolved by Tier 1; user reviewed in Stage 3.
|
||||
|
||||
### FR2. `conductor/tracks.md` pruning
|
||||
|
||||
**WHERE:** `conductor/tracks.md` (modify).
|
||||
|
||||
**WHAT:** Remove all `[x]` completed-track entries from the 3 sections:
|
||||
1. "Phase 9: Chore Tracks" — remove the entire section (or leave a one-line stub pointing to `chronology.md`).
|
||||
2. "Active Research Tracks" — remove only the `[x]` entries; keep the `[ ]` in-flight ones.
|
||||
3. "Follow-up (Planned, Not Yet Specced)" — remove only the `[shipped: YYYY-MM-DD]` entries; keep the "planned" and "not yet specced" entries.
|
||||
|
||||
**KEEP:**
|
||||
- The Active Tracks table at the top of the file (all rows, including in-flight `[~]` and planned `[ ]`).
|
||||
- The "Backlog" section.
|
||||
- The "Notes" section.
|
||||
- The "Status legend" (`[ ]` / `[~]` / `[x]`).
|
||||
|
||||
**Stub convention:** If a section is fully removed, leave a one-line stub:
|
||||
```markdown
|
||||
#### Phase 9: Chore Tracks
|
||||
*Completed chore tracks are in [`chronology.md`](./chronology.md).*
|
||||
- `<track_id>` (status=<resolved>) — <one-line reason> — resolved by Tier 1
|
||||
```
|
||||
|
||||
### FR3. `conductor/workflow.md` update
|
||||
**Other v1 fields preserved unchanged:** Date, Track ID, Summary (≤ 25 words), Folder, Range (`<init-sha>..<end-sha>` with commit count), Notable Non-Track Commits section.
|
||||
|
||||
**WHERE:** `conductor/workflow.md` "Notes > Editing this file" section (append).
|
||||
|
||||
**WHAT:** Add a 3-step convention for archiving a track:
|
||||
|
||||
```markdown
|
||||
**Archiving a track (3 steps):**
|
||||
1. Move the folder from `conductor/tracks/<id>/` to `conductor/archive/<id>/`.
|
||||
2. Remove the `[x]` entry from `conductor/tracks.md` (and update status badges on related entries).
|
||||
3. Add a row to `conductor/chronology.md` with the init SHA, the end SHA (the archive-move commit), and a one-sentence summary.
|
||||
**Worked example (new format):**
|
||||
```
|
||||
| 2026-06-19 | `chronology_20260619` | In Progress | **Confidence:** low | v2 rewrite of the chronology track after tier-2's failure report identified the broken status classifier. | `conductor/tracks/chronology_20260619` | `87923c9..3aea92f` (12) |
|
||||
| | | | | | Evidence: `87923c9..3aea92f` | 12 commits | state_phase=n/a (this rewrite) | "conductor(track): add initial spec for chronology_20260619" → "botched the chronology, going to rewrite the track." | confidence=low |
|
||||
```
|
||||
|
||||
### FR4. Migration report
|
||||
### FR2. `conductor/tracks.md` pruning (CARRIED FORWARD; no changes)
|
||||
|
||||
**WHERE:** New file `docs/reports/CHRONOLOGY_MIGRATION_20260619.md`.
|
||||
**Already complete in v1 (commits `be38dd5`, `cca4767`, `b3a9c45`).** This rewrite verifies the pruning is intact and re-commits nothing.
|
||||
|
||||
**WHAT:** A one-page summary for the user to review the migration:
|
||||
- Total entries created in `chronology.md` (count by status: Active / Shipped / Superseded / Abandoned).
|
||||
- Total entries removed from `tracks.md` (count by section: Phase 9 / Active Research / Follow-up).
|
||||
- Total notable non-track commits added.
|
||||
- Any tracks that couldn't be migrated (missing `spec.md`, ambiguous status, etc.) and why.
|
||||
- A small diff preview (10-20 sample rows) so the user can spot-check the format.
|
||||
**Verification step:** Phase 1 of the v2 plan runs `grep -n "^- \[x\]" conductor/tracks.md` and confirms 0 matches (other than the Status legend at the bottom of the file).
|
||||
|
||||
### FR5. Helper script (DRAFT-ONLY; never source of truth)
|
||||
### FR3. `conductor/workflow.md` 3-step convention (CARRIED FORWARD; no changes)
|
||||
|
||||
**WHERE:** New file `scripts/audit/generate_chronology.py` (used for the initial population only).
|
||||
**Already complete in v1 (commit `b697cd8`).** This rewrite verifies the 3-step block is present and re-commits nothing.
|
||||
|
||||
**WHAT:** A one-shot script that walks `conductor/tracks/` and `conductor/archive/`, extracts per-track data (init SHA, end SHA, date, summary from `spec.md`/`metadata.json`), and produces a **DRAFT** `conductor/chronology.md.draft`. The draft is a starting point for FR6; it is NOT authoritative.
|
||||
**Verification step:** Phase 1 of the v2 plan runs `grep -n "Archiving a track" conductor/workflow.md` and confirms 1 match.
|
||||
|
||||
**The script is the EXTRACTION tool; the human is the AUTHORITY.** Every value the script emits is a guess: a date pulled from the slug, a summary trimmed from `spec.md`, a commit SHA from `git log`. All of these can be wrong (older tracks predate the slug convention; summaries are too long or off-topic; commit SHAs depend on the folder containing the right files). The script cannot know which tracks are superseded, abandoned, or special-cased. The cross-check (FR6) is the gate that catches this.
|
||||
### FR4. Migration report v2 addendum (UPDATED)
|
||||
|
||||
**Workflow:**
|
||||
1. Run `uv run python scripts/audit/generate_chronology.py --draft > conductor/chronology.md.draft`.
|
||||
2. Tier 1 (or the user) cross-checks every row per FR6.
|
||||
3. After cross-check, the draft is renamed to `conductor/chronology.md`.
|
||||
4. The script stays in `scripts/audit/` for re-generation if needed (a new track added retroactively, etc.) but is not part of the ongoing workflow.
|
||||
**WHERE:** `docs/reports/CHRONOLOGY_MIGRATION_20260619.md` (extends existing report).
|
||||
|
||||
**This script is REQUIRED for the initial migration** (165+ rows of hand-typing is impractical) but does NOT replace the cross-check.
|
||||
**WHAT:** A new section appended to the end of the v1 report: "v2 Rewrite Addendum (2026-06-21)". Contains:
|
||||
- **Why the rewrite was needed** — link to `CHRONOLOGY_TRACK_HANDOVER_20260620.md` + summary of the root cause
|
||||
- **v1 → v2 status diff** — table of all 216 rows showing the v1 status (stale) and v2 status (after the new classifier) + the git evidence per row
|
||||
- **Classifier confidence distribution** — counts: `high` / `low` / total; % of total in `Needs Review`
|
||||
- **Tier 1 review log** — for each `low`-confidence row, the resolution note (assigned status + reason + override if any)
|
||||
- **Quality gate result** — did the `low`-confidence ratio exceed the 30% threshold? If so, note that the abort-to-B was triggered.
|
||||
- **Outstanding issues** — any rows the user flagged for follow-up
|
||||
|
||||
### FR6. Mandatory per-row cross-check (USER DIRECTIVE 2026-06-19)
|
||||
### FR5. Helper script rewrite — git-history classifier (REWRITTEN)
|
||||
|
||||
**WHERE:** `conductor/chronology.md.draft` (after the script runs per FR5), then `conductor/chronology.md` (after cross-check).
|
||||
**WHERE:** `scripts/audit/generate_chronology.py` (rewritten) + `tests/test_generate_chronology.py` (extended).
|
||||
|
||||
**WHAT:** Every row in the draft is verified by a human (Tier 1 or the user) before the draft is renamed to the canonical `chronology.md`. No row is trusted on the script's word alone. The cross-check is a hard gate: the file is not committed until every row passes.
|
||||
**WHAT:** The script's `_classify_status` function is rewritten to use the handover's 5-step algorithm. The new signature is:
|
||||
|
||||
**The 5 fields verified per row:**
|
||||
1. **Date** — does it match the slug (`YYYYMMDD` → `YYYY-MM-DD`)? If the slug is missing or non-standard, does the first-commit date match? Fix any disagreement.
|
||||
2. **Track ID** — does the backticked slug match the folder name? Any typo is a broken link.
|
||||
3. **Status** — is the badge correct? Folder in `tracks/` = `Active` or `In Progress`; folder in `archive/` = `Shipped`; check `tracks.md` for `[~]` (in progress) vs `[ ]` (planned, not yet active). Superseded/Abandoned are rare and require a manual decision.
|
||||
4. **Summary** — does the one-sentence summary actually describe what the track did? Is it under 25 words? Is it the most important fact, not the first random sentence of `spec.md`? Trim or rewrite as needed.
|
||||
5. **Range** — does the init SHA exist? Does the end SHA exist? Does the range cover the right commits? Run `git log --oneline <init>..<end> -- <folder>` and verify the count is plausible (not 0, not absurd).
|
||||
```python
|
||||
def _classify_status(
|
||||
folder_link: str,
|
||||
init_sha: str,
|
||||
end_sha: str,
|
||||
commit_count: int,
|
||||
first_commit_subject: str,
|
||||
last_commit_subject: str,
|
||||
state_phase: str | None,
|
||||
metadata_status: str | None,
|
||||
last_commit_date: str,
|
||||
) -> tuple[str, str, str]:
|
||||
"""Classify a track's status using git history as primary evidence.
|
||||
|
||||
**The completeness check (parallel gate):**
|
||||
After per-row verification, Tier 1 enumerates every folder in `conductor/tracks/` and `conductor/archive/` and confirms each has a corresponding row in `chronology.md`. Any folder without a row is a bug — either the row was missed, or the folder is special-cased (e.g., a research note, not a track) and the migration report (FR4) documents the exception.
|
||||
Returns:
|
||||
(status, confidence, reason) where:
|
||||
- status: one of "Active", "In Progress", "Completed", "Abandoned", "Special"
|
||||
- confidence: "high" or "low"
|
||||
- reason: one-line explanation of the classification
|
||||
"""
|
||||
```
|
||||
|
||||
**The "nothing was missed" mandate (user directive, verbatim):**
|
||||
> EVERY SINGLE ENTRY MUST BE CROSS CHECKED TO MAKE SURE IT'S STILL CORRECT, AND NOTHING WAS MISSED.
|
||||
**The 5-step algorithm (per the handover §"Rewrite `_classify_status` to use git history as primary evidence"):**
|
||||
|
||||
This is non-negotiable. If the cross-check finds even one error, the draft is fixed and re-verified. If a folder has no row, the row is added and verified. The migration is not "done" until both the per-row check and the completeness check are clean.
|
||||
1. **Count meaningful commits.** Use `commit_count`, which the script already computes via `git log --oneline -- <folder> | wc -l`. 1-2 commits (just spec/plan creation) is a strong signal for `Active` (in `tracks/`) or `Abandoned` (in `archive/`); the latter is a signal only, because per step 4 the classifier never auto-assigns `Abandoned`. ≥ 3 work commits is a strong signal for `Completed` (in `archive/`) or `In Progress` (in `tracks/`).
|
||||
|
||||
**Who does the cross-check:**
|
||||
- **Tier 1** does the bulk of the per-row verification (mechanical checks: slug match, SHA existence, folder existence).
|
||||
- **The user** reviews a 10–20 row sample (per FR4's diff preview) and the final `chronology.md` before it is committed. The user is the quality gate.
|
||||
- **Tier 3** is not used for the cross-check — the per-row work is too small to delegate, and the user wants the verification done by an agent with full context, not a stateless worker.
|
||||
2. **Inspect commit messages.** Use `first_commit_subject` and `last_commit_subject`, which the script already extracts. Classify each commit as `work` (matches `^(feat|fix|refactor|perf|test)\(`), `meta` (matches `^(chore|docs|conductor)\(`), or `other` (everything else).
|
||||
|
||||
3. **Check `state.toml` phase progression.** `state_phase` is parsed from `state.toml.current_phase` if the file exists; else `None`. The thresholds:
|
||||
- `state_phase == "complete"` → `Completed` (high confidence if corroborated by git)
|
||||
- `state_phase >= 3` → `In Progress` (high confidence if corroborated by git)
|
||||
- `state_phase in (0, 1, 2)` → `Active` (high confidence if corroborated by git)
|
||||
- `state_phase is None` → no signal from state.toml; classifier relies on git + folder
|
||||
|
||||
4. **Default to conservative.** When git history is ambiguous (1-3 commits with no clear `work` pattern), flag as `low` confidence → "Needs Review". The classifier NEVER auto-marks `Abandoned` — that's a `Special` decision reserved for Tier 1 + user.
|
||||
|
||||
5. **Honour explicit metadata.** If `metadata_status` is `abandoned` or `superseded` (or `Special`), and git evidence is not contradictory, trust the metadata. If git evidence contradicts metadata (e.g., `archive/` + 0 commits + `metadata_status = "Completed"`), the classifier flags `low` confidence and the user resolves in Stage 3.
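Steps 2 and 3 are the most mechanical parts of the algorithm and can be sketched as two small pure functions. This is an illustrative sketch, not the handover's implementation: `commit_kind` and `phase_signal` are hypothetical names, and accepting the phase as either an integer or a string is an assumption made here because the thresholds compare against both `"complete"` and numbers.

```python
from __future__ import annotations

import re

# Patterns copied from step 2 of the algorithm.
WORK_RE = re.compile(r"^(feat|fix|refactor|perf|test)\(")
META_RE = re.compile(r"^(chore|docs|conductor)\(")


def commit_kind(subject: str) -> str:
    """Classify a commit subject as 'work', 'meta', or 'other'."""
    if WORK_RE.match(subject):
        return "work"
    if META_RE.match(subject):
        return "meta"
    return "other"


def phase_signal(state_phase: int | str | None) -> str | None:
    """Map a state.toml phase to the status it suggests; None means no signal."""
    if state_phase is None:
        return None
    if state_phase == "complete":
        return "Completed"
    try:
        phase = int(state_phase)
    except (TypeError, ValueError):
        # Assumption: an unrecognised value is treated as "no signal".
        return None
    if phase >= 3:
        return "In Progress"
    if phase >= 0:
        return "Active"
    return None
```

Both functions return a suggestion only. Whether the suggestion is accepted with `high` confidence still depends on corroboration by git and folder location, as the thresholds state.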
|
||||
|
||||
**Per-row confidence assignment:**
|
||||
- `high` — git evidence + folder location + state.toml (if present) all point to the same status. Default for unambiguous cases.
|
||||
- `low` — any of: (a) < 3 commits total, (b) conflicting signals (e.g., `archive/` + 0 commits + state_phase 0), (c) no `state.toml` + ambiguous git history, (d) `metadata_status` contradicts git.
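The four `low` triggers can be read as an ordered series of checks where the first hit wins. The sketch below is one possible shape for that logic, under stated assumptions: the function name `assign_confidence`, the `suggested` mapping of signal source to suggested status, and the two boolean flags are all hypothetical, and deciding what counts as "ambiguous" or "contradictory" is left to the caller.

```python
from __future__ import annotations


def assign_confidence(
    commit_count: int,
    suggested: dict[str, str | None],
    git_is_ambiguous: bool = False,
    metadata_contradicts_git: bool = False,
) -> tuple[str, str]:
    """Return (confidence, reason); any single trigger yields 'low'."""
    # Keep only the sources that actually produced a suggestion.
    present = {name: status for name, status in suggested.items() if status is not None}
    if commit_count < 3:  # trigger (a)
        return "low", f"only {commit_count} commit(s)"
    if len(set(present.values())) > 1:  # trigger (b)
        detail = ", ".join(f"{name}={status}" for name, status in sorted(present.items()))
        return "low", f"conflicting signals: {detail}"
    if suggested.get("state_toml") is None and git_is_ambiguous:  # trigger (c)
        return "low", "no state.toml and ambiguous git history"
    if metadata_contradicts_git:  # trigger (d)
        return "low", "metadata_status contradicts git evidence"
    return "high", "all available signals agree"
```

Returning the reason alongside the confidence matches the `(status, confidence, reason)` shape of `_classify_status` and gives the "Needs Review" queue its one-line explanation without extra work.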
|
||||
|
||||
**Summary extraction (REWRITTEN priority chain):**
|
||||
The v1 priority chain is replaced with a regex-aware version:
|
||||
1. `metadata.json.summary` if present and does not start with `**` (regex: `^\*\*`)
|
||||
2. First non-empty line of `spec.md` that does not start with `**`
|
||||
3. `metadata.json.description` if not starting with `**`
|
||||
4. First non-empty line of `plan.md` that does not start with `**`
|
||||
5. Generic placeholder: `"Imported from archive (no spec)"` for archive rows, `"Track folder (no spec found)"` for tracks/ rows
|
||||
|
||||
The regex `^\*\*` rejects metadata-field text like `**Priority:** A...`, `**Date:** 2026-06-20`, `**Created:** 2026-06-19`, `**Initialized:** 2026-06-19`, `**Parent umbrella:** ...`, `**Confidence:** ...`.
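The priority chain can be sketched as a short fallback loop. The code below is an illustration only: `pick_summary` and `first_usable_line` are hypothetical names, the inputs are assumed to be already loaded (a parsed `metadata.json` dict plus the raw text of `spec.md` and `plan.md`), and items 2 and 4 are read as "the first line that is both non-empty and does not start with `**`".

```python
from __future__ import annotations

import re

# Rejects metadata-field text such as "**Priority:** A...".
METADATA_FIELD_RE = re.compile(r"^\*\*")


def _usable(value: object) -> str | None:
    """Return the stripped string if it is non-empty and not a metadata field."""
    if not isinstance(value, str):
        return None
    stripped = value.strip()
    if not stripped or METADATA_FIELD_RE.match(stripped):
        return None
    return stripped


def first_usable_line(text: str | None) -> str | None:
    """First line that is non-empty and does not start with '**'."""
    for line in (text or "").splitlines():
        candidate = _usable(line)
        if candidate is not None:
            return candidate
    return None


def pick_summary(
    metadata: dict,
    spec_text: str | None,
    plan_text: str | None,
    in_archive: bool,
) -> str:
    """Walk the five-step priority chain and return the first usable summary."""
    candidates = (
        _usable(metadata.get("summary")),
        first_usable_line(spec_text),
        _usable(metadata.get("description")),
        first_usable_line(plan_text),
    )
    for candidate in candidates:
        if candidate is not None:
            return candidate
    if in_archive:
        return "Imported from archive (no spec)"
    return "Track folder (no spec found)"
```

Note that this reading does not skip markdown headings, so a `spec.md` that opens with `# Title` would yield the heading as the summary. The chain as written does not say to skip them, and the sketch follows it literally.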
|
||||
|
||||
**New script: `scripts/audit/chronology_quality_gate.py` (FR7's wrapper).**
|
||||
- Reads the staging `chronology.md.staging` file.
|
||||
- Counts `high` and `low` confidence rows.
|
||||
- Computes `low_count / total_count`.
|
||||
- If ratio > 0.30 → exit code 1, prints "ABORT: classifier is bad; >30% of rows are ambiguous. Fall back to manual review (v1 protocol)."
|
||||
- If ratio ≤ 0.30 → exit code 0, prints "PASS: classifier is good. Proceed to Tier 1 review of 'Needs Review' queue."
|
||||
|
||||
**Tests extended:** the existing 6 tests stay; add 8-10 new tests covering:
|
||||
- `_classify_status` returns correct status for each (folder, commit_count, state_phase) combination
|
||||
- `low` confidence is assigned for ambiguous cases (1-2 commits, conflicting signals)
|
||||
- `high` confidence is assigned for unambiguous cases
|
||||
- Summary priority chain rejects metadata-field text (regression test for the v1 bug)
|
||||
- The staging file has per-row evidence + confidence lines
|
||||
- The "Needs Review" section is correctly populated
|
||||
- The quality gate script exits 1 when > 30% ambiguous, 0 when ≤ 30%
|
||||
- The quality gate script prints the correct summary
|
||||
|
||||
### FR6. Per-row cross-check (REWRITTEN — 3-stage protocol)
|
||||
|
||||
**WHERE:** `conductor/chronology.md` v2 (after classifier run), then "Needs Review" queue (Tier 1 review), then final v2 (user review).
|
||||
|
||||
**WHAT:** The cross-check is **3-stage** (replaces v1's single-stage Tier 1 review of every row):
|
||||
|
||||
**Stage 1: Classifier auto-classification (script run).**
|
||||
- The script runs `walk_track_folders()` over `conductor/tracks/` and `conductor/archive/`.
|
||||
- For each folder, the script extracts: date, track_id, init_sha, end_sha, commit_count, first_commit_subject, last_commit_subject, state_phase, metadata_status, last_commit_date, summary.
|
||||
- The script's rewritten `_classify_status()` assigns (status, confidence, reason) for each row.
|
||||
- Output: `conductor/chronology.md.staging` with the per-row evidence line + confidence level + "Needs Review" section.
|
||||
- The script is **READ-ONLY** on the source folders; it writes to `chronology.md.staging` only.
|
||||
- **Quality gate (FR7)** runs immediately after: if the gate passes, proceed to Stage 2; if the gate fails, the staging file is preserved and the task aborts to manual review (per FR7).
|
||||
|
||||
**Stage 2: Tier 1 review of the "Needs Review" queue (only if quality gate passes).**
|
||||
- Tier 1 opens `conductor/chronology.md.staging`.
|
||||
- Tier 1 filters to the "Needs Review" section (rows with `confidence=low`).
|
||||
- For each `low`-confidence row, Tier 1:
|
||||
1. Opens the track's `spec.md` (or `plan.md` / `metadata.json` if no spec).
|
||||
2. Runs `git log --oneline -- <folder>` and reviews the commit history.
|
||||
3. Verifies the row's evidence line is accurate.
|
||||
4. Assigns a status from the 5-value enum (or flags for user decision).
|
||||
5. Writes a one-line resolution note (e.g., "Resolved: Active — work in progress, state_phase=2; classifier flagged low because no spec.md yet").
|
||||
- **Tier 1's defaults:**
|
||||
- In `tracks/` + ambiguous → `Active` with a one-line note
|
||||
- In `archive/` + 0 commits → `Special` with note "archive folder with no work commits"
|
||||
- In `archive/` + ≥ 3 work commits + state_phase=0 (missing/incomplete) → `Completed` with note "archive + N work commits; state.toml is stale"
|
||||
- Truly ambiguous → `Special` with note "needs user decision; flagged in Stage 3"
|
||||
- After Tier 1 resolves all `low`-confidence rows, the staging file is updated: the "Needs Review" section is moved to a "Tier 1 Resolutions" section showing each row's resolution note.
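Tier 1's defaults amount to a small decision table, which the sketch below writes out as a function. It is an aid for reading the defaults, not a replacement for Tier 1's judgement. The name `tier1_default` is hypothetical, "0 commits" is read here as zero work commits (matching the note text), a missing `state.toml` is treated the same as `state_phase=0`, and the note for the `tracks/` case is invented wording because the spec only asks for "a one-line note".

```python
from __future__ import annotations


def tier1_default(
    location: str,
    work_commits: int,
    state_phase: int | None,
) -> tuple[str, str]:
    """Return Tier 1's default (status, note) for an ambiguous row."""
    if location == "tracks":
        # Hypothetical note text; the spec does not prescribe one.
        return "Active", "in tracks/ with ambiguous evidence"
    if location == "archive":
        if work_commits == 0:
            return "Special", "archive folder with no work commits"
        if work_commits >= 3 and state_phase in (None, 0):
            return "Completed", f"archive + {work_commits} work commits; state.toml is stale"
    # Everything else is left for the user.
    return "Special", "needs user decision; flagged in Stage 3"
```

The final fallthrough is deliberate: any combination not listed explicitly ends up as `Special` and is escalated, which keeps the defaults conservative.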
|
||||
|
||||
**Stage 3: User review of final v2.**
|
||||
- User opens `conductor/chronology.md.staging` (now with Stage 2 resolutions).
|
||||
- User checks that: (a) the format is correct, (b) every row has evidence + decision, (c) Tier 1's resolutions are reasonable, (d) nothing was missed.
|
||||
- User either approves (proceed to Phase 7 promotion) or requests changes (loop back to Stage 2 or 1).
|
||||
|
||||
**The per-row evidence log (NEW FILE).**
|
||||
- Path: `tests/artifacts/chronology_v2_evidence_log.md` (gitignored).
|
||||
- Format: one row per track with: track_id, status, confidence, init_sha, end_sha, commit_count, first_commit_subject, last_commit_subject, state_phase, classifier_reason, tier1_override (if any).
|
||||
- Generated by the script during Stage 1; extended by Tier 1 during Stage 2; reviewed by the user in Stage 3.
|
||||
|
||||
### FR7. Classifier quality gate (NEW)
|
||||
|
||||
**WHERE:** `scripts/audit/chronology_quality_gate.py` (new file) + `tests/test_chronology_quality_gate.py` (new tests).
|
||||
|
||||
**WHAT:** A wrapper script that runs after the classifier's Stage 1 output. The script:
|
||||
1. Reads `conductor/chronology.md.staging` (the script's output).
|
||||
2. Parses each row's confidence level.
|
||||
3. Counts `high` and `low` confidence rows.
|
||||
4. Computes `low_count / total_count`.
|
||||
5. If ratio > 0.30 → exit code 1, prints "ABORT: classifier is bad; >30% of rows are ambiguous. Fall back to manual review (v1 protocol). Tier 1 should manually review every row in the staging file."
|
||||
6. If ratio ≤ 0.30 → exit code 0, prints "PASS: classifier is good. <N> rows need Tier 1 review; proceed to Stage 2."
|
||||
|
||||
**The 30% threshold is a hard gate.** Tier 1 doesn't start Stage 2 until the gate passes. If the gate fails, the staging file is preserved as `chronology.md.staging.aborted` and the task falls back to the v1 manual protocol (Tier 1 reviews every row).
|
||||
|
||||
**Tests for the quality gate:**
|
||||
- Staging file with 0% low → exit 0
|
||||
- Staging file with 30% low (boundary) → exit 0
|
||||
- Staging file with 31% low → exit 1
|
||||
- Staging file with 100% low → exit 1
|
||||
- Staging file with malformed rows → exit 2 (parse error)
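The gate logic and the test cases above fit in a few lines. The following is a minimal sketch, assuming each row carries an `Evidence:` line that ends in `confidence=high` or `confidence=low`; the function names `gate` and `main` are hypothetical, and treating a file with no evidence lines as a parse error is an assumption made here.

```python
from __future__ import annotations

import re

CONFIDENCE_RE = re.compile(r"\bconfidence=(high|low)\b")


def gate(staging_text: str) -> tuple[int, str]:
    """Return (exit_code, message) for the contents of the staging file."""
    high = low = 0
    for line in staging_text.splitlines():
        if "Evidence:" not in line:
            continue  # only evidence lines carry the confidence field
        match = CONFIDENCE_RE.search(line)
        if match is None:
            return 2, "PARSE ERROR: evidence line without confidence=high|low"
        if match.group(1) == "low":
            low += 1
        else:
            high += 1
    total = high + low
    if total == 0:
        # Assumption: an empty or unrecognisable file is a parse error.
        return 2, "PARSE ERROR: no evidence lines found"
    # low / total > 0.30 rewritten with integers, so exactly 30% passes.
    if low * 10 > total * 3:
        return 1, (
            "ABORT: classifier is bad; >30% of rows are ambiguous. "
            "Fall back to manual review (v1 protocol). "
            "Tier 1 should manually review every row in the staging file."
        )
    return 0, f"PASS: classifier is good. {low} rows need Tier 1 review; proceed to Stage 2."


def main(path: str = "conductor/chronology.md.staging") -> int:
    """Read the staging file, print the verdict, and return the exit code."""
    with open(path, encoding="utf-8") as handle:
        code, message = gate(handle.read())
    print(message)
    return code
```

A script entry point would call `sys.exit(main())`. The integer comparison avoids floating-point rounding at the 30% boundary, which is exactly the case the boundary test exercises.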
|
||||
|
||||
**No shortcut is acceptable:**
|
||||
- "Looks right" is not a verification. Every row is opened, every SHA is checked, every summary is read.
|
||||
- Sample-based verification is not acceptable. EVERY row.
|
||||
- Trusting the script output is not acceptable. The script is a starting point; the cross-check is the truth.
|
||||
## Non-Functional Requirements
|
||||
|
||||
(Carried from v1, mostly unchanged.)
|
||||
|
||||
- **NFR1. Manually maintained.** Per user choice (2026-06-19), the ongoing workflow is hand-edited. No auto-generation in CI; no script runs on every commit. The one-shot migration is a single event; the file is then edited like `tracks.md`.
|
||||
- **NFR2. Compact.** Each row is ≤ 4 lines (the bullet + 3 sub-lines for Folder/Range, OR a single condensed line for very old tracks where the folder is the only link). The file is scannable, not a wall of text.
|
||||
- **NFR3. Re-derivable.** A reader can rebuild the chronology from `git log` + the track folders if needed. The init SHA + end SHA in each row is the contract; the summary is the human-friendly gloss.
|
||||
- **NFR4. No day estimates.** Per the project convention (added 2026-06-16), all scope is measured in files/sites.
|
||||
- **NFR5. No TDD required.** This is a documentation/tooling track, not a feature track. No production code change; no tests added. (If FR5's helper script is built, it gets 3-5 unit tests for the data extraction logic.)
|
||||
- **NFR2. Compact.** Each row is ≤ 5 lines (the bullet + 3 sub-lines for Folder/Range/Evidence, OR a single condensed line for very old tracks where the folder is the only link). The file is scannable, not a wall of text.
|
||||
- **NFR3. Re-derivable.** A reader can rebuild the chronology from `git log` + the track folders if needed. The init SHA + end SHA + evidence line in each row is the contract; the summary is the human-friendly gloss.
|
||||
- **NFR4. No day estimates.** Per `conductor/workflow.md` Tier 1 Track Initialization Rules (added 2026-06-16). All scope is measured in files/sites.
|
||||
- **NFR5. No TDD required for the chronology itself.** This is a documentation/tooling track, not a feature track. The helper script (FR5) gets 8-10 new unit tests for the new classifier (TDD-required per project convention).
|
||||
- **NFR6. Evidence is auditable (NEW).** The per-row evidence log (`tests/artifacts/chronology_v2_evidence_log.md`) is human-readable; every classification decision is reproducible from the log + git history. A reader can verify any row's status by running `git log -- <folder>` and comparing to the evidence log.
|
||||
- **NFR7. Classifier is conservative (NEW).** When in doubt, `low` confidence. The cost of a false `low` (Tier 1 reviews it) is small; the cost of a false `high` (wrong status committed without review) is high. The classifier's bias is toward `low`.
|
||||
|
||||
## Architecture Reference
|
||||
|
||||
- **`conductor/tracks.md:459`** — the existing "lightweight chronology" reference. This track formalizes that role.
|
||||
- **`conductor/workflow.md` "Notes > Editing this file"** — the existing convention for moving tracks to `archive/`. The new 3-step convention is appended here.
|
||||
- **`conductor/code_styleguides/feature_flags.md`** — the "delete to turn off" convention. The helper script (FR5) is opt-in via its presence in `scripts/audit/`; deleting the file turns it off.
|
||||
- **`docs/reports/`** — convention for one-page reports (per `TRACK_COMPLETION_*.md` precedent set by `tier2_autonomous_sandbox_20260616`). The migration report follows the same shape.
|
||||
- **`docs/reports/CHRONOLOGY_TRACK_HANDOVER_20260620.md`** — the failure report; the source of the new classifier algorithm (5-step algorithm, §"Rewrite `_classify_status` to use git history as primary evidence", lines 53-68).
|
||||
- **`docs/reports/CHRONOLOGY_MIGRATION_20260619.md`** — v1 migration report; the v2 addendum (FR4) extends it.
|
||||
- **`conductor/code_styleguides/data_oriented_design.md`** — applies: the chronology is data (one row per track), the classifier is a transformation (git history → status), the evidence log is a projection (data + decision + provenance).
|
||||
- **`conductor/code_styleguides/error_handling.md`** — applies to the helper script: the script's `_classify_status` returns `(status, confidence, reason)` (a data-oriented "and/or" pattern, not an exception). The "Needs Review" queue is a recoverable case (low confidence), not an error.
|
||||
- **`conductor/tracks.md:459`** — the existing "lightweight chronology" reference. v2 formalizes that role.
|
||||
- **`conductor/workflow.md` "Notes > Editing this file"** — the existing convention for moving tracks to `archive/`. The 3-step convention (FR3) is appended here.
|
||||
|
||||
## Out of Scope
|
||||
|
||||
1. **Auto-generation on every commit.** Per the user's "manual maintenance" choice, there's no script that updates `chronology.md` automatically. The file is hand-edited when a track is archived.
|
||||
2. **Tracking "in-flight" tracks in chronology.md.** In-flight tracks (`[~]` in `tracks.md`) stay in `tracks.md` only. The chronology is the record of *completed* work; the active task list is the record of *in-progress* work.
|
||||
(Carried from v1, mostly unchanged.)
|
||||
|
||||
1. **Auto-generation on every commit.** Per the user's "manual maintenance" choice (2026-06-19), there's no script that updates `chronology.md` automatically. The file is hand-edited when a track is archived.
|
||||
2. **Tracking "in-flight" tracks in `chronology.md`.** No longer excluded in v2: in-flight tracks (`[~]` in `tracks.md`) appear in `chronology.md` with status `Active` or `In Progress` (per v2's enum). The active task list still lives in `tracks.md`.
|
||||
3. **Tracking "planned but not specced" backlog items.** These stay in `tracks.md` under "Follow-up" and "Backlog". They aren't tracks until they have a folder.
|
||||
4. **Restructuring `tracks.md` beyond `[x]` removal.** The 3 sections that hold `[x]` entries get their `[x]` rows removed, but no new structure is imposed on `tracks.md`. The file's organization is preserved.
|
||||
5. **A separate `chronology/` folder for the file.** The file lives at the conductor root (`conductor/chronology.md`), not in a subdirectory. Same level as `tracks.md`, `workflow.md`, `product.md`.
|
||||
4. **Restructuring `tracks.md` beyond `[x]` removal.** The 3 sections that held `[x]` entries are now stubs (v1 Phase 3); no new structure is imposed.
|
||||
5. **A separate `chronology/` folder for the file.** The file lives at the conductor root (`conductor/chronology.md`), not in a subdirectory.
|
||||
6. **Reformatting existing `spec.md` / `plan.md` files.** The migration reads from them; it does not modify them.
|
||||
7. **A web view of the chronology.** It's a markdown file for in-repo reading. No GUI integration is in scope.
|
||||
8. **A separate `chronology.md.draft` workflow (NEW for v2).** v1 used `.draft` files; v2 doesn't. The classifier emits directly to a staging file (`chronology.md.staging`); the staging file is renamed to `chronology.md` only after Stage 3 (user review and approval). The `.staging` suffix is gitignored.
|
||||
|
||||
## Verification Criteria
|
||||
|
||||
For the track to be marked complete, ALL of the following must be true:
|
||||
|
||||
- [ ] **VC1.** `conductor/chronology.md` exists, is populated with one row per track (active + shipped + superseded + abandoned), and the format matches FR1.
|
||||
- [ ] **VC2.** `conductor/tracks.md` no longer contains any `[x]` completed-track entries. The "Phase 9: Chore Tracks" section either is removed or is a one-line stub pointing to `chronology.md`. The "Active Research Tracks" and "Follow-up" sections retain only their `[ ]` and `[~]` in-flight entries.
|
||||
- [ ] **VC3.** `conductor/workflow.md` "Notes > Editing this file" section includes the new 3-step archiving convention (FR3).
|
||||
- [ ] **VC4.** `docs/reports/CHRONOLOGY_MIGRATION_20260619.md` exists with the count summaries + diff preview (FR4).
|
||||
- [ ] **VC5.** `conductor/chronology.md` is in reverse-chronological order (newest first), and every row has a `Folder` link and a `Range` line.
|
||||
- [ ] **VC6.** Every track folder in `conductor/tracks/` and `conductor/archive/` has a corresponding row in `chronology.md` (or a documented exception in the migration report).
|
||||
- [ ] **VC7.** The notable non-track commits section (if populated) is sorted newest first and every row has a date, SHA, and description.
|
||||
- [ ] **VC8.** No new `src/*.py` files were created (per `AGENTS.md` File Size and Naming Convention rule).
|
||||
- [ ] **VC9.** End-of-track report at `docs/reports/TRACK_COMPLETION_chronology_20260619.md` (per Tier 2 conventions, if executed by Tier 2).
|
||||
- [ ] **VC10. Per-row cross-check (FR6).** Every row in `chronology.md` was opened, the 5 fields (date, ID, status, summary, range) were verified, and any errors found were fixed before the file was committed. The cross-check is logged in the migration report (per-row checklist or summary).
|
||||
- [ ] **VC11. Completeness check (FR6).** Every folder in `conductor/tracks/` and `conductor/archive/` has a corresponding row in `chronology.md`, OR a documented exception in the migration report (FR4). The folder set vs. row-set difference is empty (or only contains documented exceptions).
|
||||
- [ ] **VC12. User sign-off (FR6).** The user reviewed the final `chronology.md` and confirmed: (a) the format is correct, (b) the summaries are accurate, (c) the commit ranges are right, (d) nothing was missed. The user's sign-off is recorded in the migration report.
|
||||
- [ ] **VC1.** `conductor/chronology.md` v2 exists with 216 rows; all 5 status values are used; per-row evidence line is present; per-row confidence level is present.
|
||||
- [ ] **VC2.** `conductor/tracks.md` pruning is intact (no regression from v1's pruning; `grep -n "^- \[x\]" conductor/tracks.md` returns 0 matches).
|
||||
- [ ] **VC3.** `conductor/workflow.md` 3-step convention is present (no regression; `grep -n "Archiving a track" conductor/workflow.md` returns 1 match).
|
||||
- [ ] **VC4.** `docs/reports/CHRONOLOGY_MIGRATION_20260619.md` has the v2 addendum (per FR4).
|
||||
- [ ] **VC5.** Sorted newest first; every row has Folder + Range + Evidence lines.
|
||||
- [ ] **VC6.** Every folder in `conductor/tracks/` and `conductor/archive/` has a corresponding row, OR a documented exception in the v2 addendum.
|
||||
- [ ] **VC7.** "Notable Non-Track Commits" section is preserved (may be empty if no notable commits found).
|
||||
- [ ] **VC8.** No new `src/*.py` files created (per `AGENTS.md` File Size and Naming Convention rule).
|
||||
- [ ] **VC9.** A v2 addendum is appended to `docs/reports/TRACK_COMPLETION_chronology_20260619.md` (per project convention).
|
||||
- [ ] **VC10. Classifier quality gate (FR7).** The `scripts/audit/chronology_quality_gate.py` ran; result was PASS (low confidence ≤ 30%). If the gate failed, the abort-to-B was triggered and Tier 1 manually reviewed every row.
|
||||
- [ ] **VC11. "Needs Review" queue resolved (FR6 Stage 2).** Every `low`-confidence row in the staging file has a Tier 1 resolution note; the queue is empty in the final `chronology.md` (Tier 1's resolutions are reflected in the per-row status).
|
||||
- [ ] **VC12. Per-row evidence log (FR6).** `tests/artifacts/chronology_v2_evidence_log.md` has one row per track with status + confidence + evidence + decision (Tier 1 override if any).
|
||||
- [ ] **VC13. User sign-off (FR6 Stage 3).** User confirmed: format correct, every row has evidence, Tier 1 resolutions are reasonable, nothing missed. Sign-off recorded in the v2 addendum (FR4).
|
||||
- [ ] **VC14. v1 archive preserved (this rewrite's prerequisite).** `conductor/chronology.md.broken-v1` exists with the v1 218-line file; `git log` shows the rewrite is a continuation (commit `3aea92f1` "botched the chronology, going to rewrite the track."), not a re-do.
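
The FR7 gate referenced in VC10 reduces to a ratio check against the 30% threshold. A minimal sketch, assuming each staged row carries a confidence string; `gate_result` and the row shape are illustrative names, not the contents of `scripts/audit/chronology_quality_gate.py`:

```python
from typing import Iterable


def gate_result(confidences: Iterable[str], threshold: float = 0.30) -> str:
    """Return "PASS" when the share of low-confidence rows is within threshold.

    Hypothetical helper: the real gate script may count or report differently.
    """
    rows = list(confidences)
    if not rows:
        # Nothing was classified, so the run cannot be trusted.
        return "ABORT"
    low_share = sum(1 for c in rows if c == "low") / len(rows)
    return "PASS" if low_share <= threshold else "ABORT"


print(gate_result(["high"] * 7 + ["low"] * 3))  # 30% low: at the threshold
print(gate_result(["high"] * 6 + ["low"] * 4))  # 40% low: over the threshold
```

Keeping the threshold as a parameter mirrors the idea of a tunable cutoff; treating an empty run as `ABORT` is a choice made for this sketch only.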
|
||||
|
||||
## Risk Assessment
|
||||
|
||||
| Risk | Likelihood | Scope impact | Mitigation |
|---|---|---|---|
| R1: Migration is incomplete (some tracks missed) | medium | implementation may be larger than the spec suggests if many tracks lack spec.md or have ambiguous status | The migration report (FR4) explicitly lists skipped tracks; VC6 checks for "every folder has a row OR a documented exception." |
| R2: Brief summaries are too long or too vague | medium | implementation may require manual editing of ~165 summaries | The helper script (FR5) extracts the first sentence of `spec.md`; user (or Tier 1) reviews and trims in the draft phase. |
| R3: Commit ranges are wrong (init SHA or end SHA) | low | minimal — git log is authoritative | Helper script uses `git log --reverse --format='%h' -- <folder>` and `git log -1 --format='%h' -- <folder>`; both are deterministic. |
| R4: Date source is ambiguous (slug vs first-commit date) | low | minimal | Rule (per FR1): use the slug date. If the slug date disagrees with the first commit (rare; older tracks), the slug wins because the slug is the project's convention. |
| R5: User changes their mind on the format after seeing the migration | medium | implementation may be larger than the spec suggests | The migration is reviewed (FR4) BEFORE the chronology.md is finalized. The draft phase (FR5) is the review point. |
| R6: `tracks.md` pruning breaks a link the user uses | low | minimal | The pruning is by section + status badge; the user-visible in-flight entries are untouched. The "Status legend" at the bottom of `tracks.md` is preserved. |
| R7: Cross-check (FR6) is shallow or skipped (USER DIRECTIVE 2026-06-19) | high | implementation may be larger than the spec suggests; the whole track is not "done" until every row is verified | FR6 is a hard gate (VC10/VC11/VC12). The migration report logs the cross-check. The user signs off on the final result. No shortcut is acceptable. |
| R8: Folder has no `spec.md` (older tracks) | medium | minimal — the summary is unknown | Use `metadata.json.description` if present; else use the first non-empty line of `plan.md`; else write a generic placeholder like "Imported from archive (no spec)" and flag in the migration report. |
| R9: Track folder exists but is not a real track (e.g., a research note, a scratch dir) | medium | minimal | The completeness check (FR6) catches this: the folder is enumerated, the row is added with status `Special` and a one-line explanation, OR the folder is renamed/removed and the migration report documents it. |
| R1: Classifier is too aggressive (false `high` confidence) | medium | Wrong status committed; user catches in Stage 3 | FR7 quality gate (30% abort); per-row evidence makes the classifier's reasoning auditable; conservative bias (NFR7) |
| R2: Classifier is too conservative (>30% `low`) | medium | FR7 aborts → fallback to v1 manual protocol (Tier 1 reviews every row) | The fallback is the user's "B" option (per chat 2026-06-21); explicitly designed in FR7 |
| R3: Tier 1's resolutions are wrong (Stage 2) | low | User catches in Stage 3 | Per-row resolution notes + evidence log make Tier 1's reasoning auditable; user's Stage 3 review is the final gate |
| R4: `state.toml` parsing fails (some folders lack state.toml) | low | Rows fall to "ambiguous" → `low` confidence → queued for review | Classifier tolerates missing state.toml (FR5 §"3. Check `state.toml` phase progression"); "ambiguous" is the correct behavior per the conservative bias |
| R5: v1 archive move loses data | low | Minimal — `git mv` is safe | Use `git mv` for the rename; verify with `git log --follow` after |
| R6: User disagrees with Tier 1's resolutions | low | Loops back to Stage 2 | The user is the final gate (Stage 3); explicit Stage 3 review |
| R7: Summary extraction still picks metadata-field text (regression of v1 bug) | low | Row has bad summary | v2's priority chain + regex rejection (`^\*\*`); tested by extended test suite (FR5 §"Tests extended") |
| R8: The 30% threshold is wrong (too low or too high) | medium | If too low: abort too easily. If too high: accept a bad classifier. | The 30% value is the user's "A only if classifier is good" trade-off; if the user wants to adjust, FR7's wrapper script accepts `--threshold` as a CLI flag |
| R9: Evidence line format is too verbose (clutters the table) | low | User complains in Stage 3; loops back to FR1 | The evidence line is a sub-line (not a column); the table remains 6 columns. If the user wants it more terse, FR1 can be revised. |
| R10: v1's broken chronology is referenced by other docs | low | Confusion between v1 and v2 | `conductor/chronology.md.broken-v1` is clearly labeled; the v2 file is `chronology.md`; the v1 report is extended with the v2 addendum that explains the rename |
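
The commit-range mitigation (the row about wrong init or end SHAs) can be sketched as follows. This assumes the working directory is inside the repository; `range_from_log` and `commit_range` are hypothetical names, not the helper script's actual functions. With the same path filter, the last line of the `--reverse` log is the commit that `git log -1` would report, so one call is enough here:

```python
import subprocess
from typing import Optional, Tuple


def range_from_log(reverse_log: str) -> Optional[Tuple[str, str]]:
    """Derive (init_sha, end_sha) from `git log --reverse --format=%h` output."""
    shas = [line.strip() for line in reverse_log.splitlines() if line.strip()]
    if not shas:
        return None  # the folder has no history under this path filter
    return shas[0], shas[-1]


def commit_range(folder: str) -> Optional[Tuple[str, str]]:
    """Run git for one track folder (assumes cwd is inside the repository)."""
    out = subprocess.run(
        ["git", "log", "--reverse", "--format=%h", "--", folder],
        capture_output=True, text=True, check=True,
    ).stdout
    return range_from_log(out)
```

Splitting the parsing from the subprocess call keeps the parsing testable without a repository.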
|
||||
|
||||
## Execution Plan (high-level — see `plan.md` for worker-ready tasks)
|
||||
|
||||
- [ ] **Phase 1: Audit + data extraction.** Walk `conductor/tracks/` and `conductor/archive/`; for each folder, capture (id, date, status, init SHA, end SHA, summary source). Build the migration dataset.
|
||||
- [ ] **Phase 2: Generate `chronology.md` draft.** Apply the FR1 format to the dataset; write to `conductor/chronology.md.draft` (or directly to `chronology.md` if no draft phase).
|
||||
- [ ] **Phase 3: Prune `tracks.md`.** Remove the 3 categories of `[x]`/`[shipped]` entries per FR2. Leave stubs for fully-removed sections.
|
||||
- [ ] **Phase 4: Update `workflow.md`.** Add the 3-step archiving convention per FR3.
|
||||
- [ ] **Phase 5: Write the migration report.** Per FR4.
|
||||
- [ ] **Phase 6: User review.** User reviews the draft (or final `chronology.md`); approves or requests changes.
|
||||
- [ ] **Phase 7: Final commit.** The spec/plan are committed before this phase; the migration is the implementation work.
|
||||
- [ ] **Phase 8: Per-row cross-check (FR6, hard gate).** Tier 1 opens every row in `chronology.md.draft`, verifies the 5 fields (date, ID, status, summary, range), and fixes any errors. The cross-check is logged in the migration report.
|
||||
- [ ] **Phase 9: Completeness check (FR6, hard gate).** Tier 1 enumerates every folder in `conductor/tracks/` and `conductor/archive/`; any folder without a row is added (or documented as an exception). The diff between folder set and row set is empty (or only contains documented exceptions).
|
||||
- [ ] **Phase 10: User sign-off (FR6, hard gate).** The user reviews the final `chronology.md` and the migration report. The user confirms: (a) format is right, (b) summaries are accurate, (c) commit ranges are right, (d) nothing was missed. Sign-off is recorded in the migration report.
|
||||
- [ ] **Phase 1: Archive v1 + verify state of carried-forward work.** Move `conductor/chronology.md` → `conductor/chronology.md.broken-v1`; reset `state.toml` to `current_phase = 0`; verify `tracks.md` pruning + `workflow.md` 3-step convention are intact.
|
||||
- [ ] **Phase 2: Rewrite the helper script + extend tests (FR5).** Rewrite `_classify_status` to use the 5-step git-history algorithm; add per-row confidence assignment; rewrite summary priority chain with regex rejection; add 8-10 new unit tests.
|
||||
- [ ] **Phase 3: Add the quality gate script (FR7).** New file `scripts/audit/chronology_quality_gate.py`; 5 new unit tests for the threshold logic.
|
||||
- [ ] **Phase 4: Run the new classifier, generate v2 staging (FR6 Stage 1).** Run the script; verify the staging file has per-row evidence + confidence + "Needs Review" section.
|
||||
- [ ] **Phase 5: Quality gate (FR7).** Run `chronology_quality_gate.py`; if PASS, proceed; if ABORT, fallback to manual review protocol.
|
||||
- [ ] **Phase 6: Tier 1 reviews "Needs Review" queue (FR6 Stage 2).** Tier 1 resolves each `low`-confidence row; updates the staging file with Tier 1's resolutions; updates the per-row evidence log.
|
||||
- [ ] **Phase 7: Promote v2 staging → canonical (FR1).** Rename `chronology.md.staging` → `chronology.md`; commit.
|
||||
- [ ] **Phase 8: Write v2 addendum to migration report + end-of-track report (FR4 + VC9).** Add the v2 rewrite section; document the v1 → v2 status diff + Tier 1 review log; write end-of-track v2 addendum.
|
||||
- [ ] **Phase 9: User sign-off (FR6 Stage 3).** User reviews v2 + evidence log + Tier 1 resolutions. Records sign-off in the v2 addendum.
|
||||
- [ ] **Phase 10: Wrap-up.** Mark track complete in `tracks.md` + `state.toml`; set status = "completed" in `metadata.json`.
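
The summary priority chain with regex rejection (v2 Phase 2) can be pictured as a first-acceptable-candidate scan. A minimal sketch, assuming the caller has already collected candidate strings in priority order; `pick_summary` is a hypothetical name, and only the `^\*\*` rejection pattern and the placeholder text come from this spec:

```python
import re
from typing import Sequence

METADATA_FIELD = re.compile(r"^\*\*")  # e.g. a "**Status:** active" line
PLACEHOLDER = "Imported from archive (no spec)"


def pick_summary(candidates: Sequence[str]) -> str:
    """Return the first candidate that is non-empty and not a metadata field."""
    for text in candidates:
        line = text.strip()
        if line and not METADATA_FIELD.match(line):
            return line
    return PLACEHOLDER


print(pick_summary(["**Status:** active", "", "Formalize the track chronology."]))
```

The ordering of the candidates is left to the caller, so the same function serves whichever source ranks first.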
|
||||
|
||||
## See Also
|
||||
|
||||
- `conductor/tracks.md:459` — the existing "lightweight chronology" reference that this track formalizes.
|
||||
- `conductor/workflow.md` "Notes > Editing this file" — the existing archive convention; the new 3-step convention is appended here.
|
||||
- `docs/reports/CHRONOLOGY_TRACK_HANDOVER_20260620.md` — the failure report; the source of the new classifier algorithm.
|
||||
- `docs/reports/CHRONOLOGY_MIGRATION_20260619.md` — v1 migration report; the v2 addendum extends it.
|
||||
- `conductor/tracks.md:459` — the existing "lightweight chronology" reference that v2 formalizes.
|
||||
- `conductor/workflow.md` "Notes > Editing this file" — the existing archive convention; the 3-step convention (FR3) is appended here.
|
||||
- `conductor/code_styleguides/feature_flags.md` — "delete to turn off" convention; the helper script (FR5) follows it.
|
||||
- `conductor/code_styleguides/data_oriented_design.md` — applies: the chronology is data, the classifier is a transformation, the evidence log is a projection.
|
||||
- `conductor/code_styleguides/error_handling.md` — applies to the helper script: `_classify_status` returns `(status, confidence, reason)` (data-oriented "and/or" pattern).
|
||||
- `docs/reports/TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md` — precedent for one-page end-of-track reports.
|
||||
- `AGENTS.md` "File Size and Naming Convention" — the hard rule against creating new `src/<thing>.py` files; this track doesn't touch `src/`.
|
||||
- `AGENTS.md` "File Size and Naming Convention" — the hard rule against creating new `src/<thing>.py` files; v2 doesn't touch `src/`.
|
||||
- `AGENTS.md` "Critical Anti-Patterns" — the no-day-estimates rule; the `git restore` ban; the report-instead-of-fix pattern (the handover IS a fix, not a report).
|
||||
- `conductor/workflow.md` "Tier 1 Track Initialization Rules" — the no-day-estimates rule followed in this spec.
|
||||
- `conductor/workflow.md` "Skip-Marker Policy" — applies: the v1 chronology's broken rows are not "skipped"; they are re-classified in v2.
|
||||
|
||||
@@ -0,0 +1,263 @@
|
||||
# Tier 2 Startup — code_path_audit_20260607 v2
|
||||
|
||||
> **For Tier 2 Tech Lead (autonomous mode).** This is the entry point. Read this file first, then `plan_v2.md`, then `spec_v2.md`. The v1 files (`spec.md` + `plan.md`) are **preserved unchanged and never executed** — do not load them as the canonical spec.
|
||||
|
||||
## What this track is
|
||||
|
||||
Build `src/code_path_audit.py` v2 — a data-oriented static-analysis tool that audits the 13 data aggregates in `src/` (10 in-scope TypeAliases + 3 candidate placeholders for `any_type_componentization_20260621` which is NOT on master) and produces per-aggregate profiles. The output (custom postfix `.dsl` + markdown + prefix tree text) is the artifact that informs per-aggregate refactor decisions.
|
||||
|
||||
**Why v2 supersedes v1:** v1 was authored 2026-06-07 before the 4 foundational tracks shipped. v1's "per-action" framing is now stale. v2 reframes the audit to "per-data-aggregate" + a 4-direction decomposition-cost heuristic (componentize / unify / hold / insufficient_data) per aggregate. v2 also cross-validates the 2 foundational conventions (`data_structure_strengthening_20260606` + `data_oriented_error_handling_20260606`) directly.
|
||||
|
||||
**The user's framing (2026-06-22):**
|
||||
> "The whole point of the code path audit is to audit all paths nearly in the ./src of the codebase. The main point of it is to identify data-oriented pipelines and what data aggregate they will be operating on. This will realize what the data strengthening just uncovered and cross-audit if its deductions on the data structures are accurate while also being able to utilize additional flexibility the data oriented error handling track has provided. We are entering a time where the codebase is getting heavily adjusted into a properly engineered machine with discernable working parts. The cost of the pipeline is important, it should factor in what data needs to be componentized further vs which can be unified further into wider code paths handling larger fat structs."
|
||||
|
||||
## What to load
|
||||
|
||||
In this order:
|
||||
1. **This file** (`TIER2_STARTUP.md`) — startup context.
2. **`plan_v2.md`** — the executable plan. 14 phases, 85+ tasks, 91 tests. **This is the source of truth for execution.**
3. **`spec_v2.md`** — the design intent. Read this when the plan is ambiguous.
4. **DO NOT load `spec.md` or `plan.md`** — those are the v1 files (preserved, never executed). The plan_v2.md supersedes plan.md.
|
||||
|
||||
## What's on master (verified `7e61dd7d` + commits `7ea414e9` + `85baea8c`)
|
||||
|
||||
- `src/type_aliases.py` — the 10 canonical TypeAliases + 1 NamedTuple (`FileItemsDiff`).
|
||||
- `src/result_types.py` — `Result[T]`, `ErrorInfo`, `ErrorKind`, `NilPath`, `NilRAGState`, `OK`.
|
||||
- `src/mcp_client.py:934-992` — `derive_code_path(target, max_depth=5)` (the v1 primitive; v2's PCG is the multi-symbol superset).
|
||||
- `src/performance_monitor.py` — runtime profiling (used by the `pipeline_runtime_profiling_20260607` follow-up, NOT by this track).
|
||||
- `scripts/audit_main_thread_imports.py` — import-graph CI gate.
|
||||
- `scripts/audit_weak_types.py` — weak-types CI gate.
|
||||
- `scripts/audit_exception_handling.py` — exception-handling CI gate.
|
||||
- `scripts/audit_no_models_config_io.py` — config-I/O ownership CI gate.
|
||||
- `scripts/audit_optional_in_3_files.py` — `Optional[T]` ban CI gate (the 3 baseline files; v2 extends this with +1 line in Phase 12).
|
||||
- `scripts/generate_type_registry.py` — type-registry generator.
|
||||
- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference.
|
||||
- `conductor/code_styleguides/error_handling.md` — the `Result[T]` convention.
|
||||
- `conductor/code_styleguides/type_aliases.md` — the 10 TypeAliases.
|
||||
- `conductor/code_styleguides/agent_memory_dimensions.md` — the 4 mem dims.
|
||||
|
||||
**NOT on master (and the v2 audit must tolerate their absence for an interim run):**
|
||||
- `any_type_componentization_20260621` — merged `f914b2bc`, reverted `751b94d4` (9 minutes later). The 3 candidate aggregates (`ToolSpec`, `ChatMessage`, `ProviderHistory`) are forward-compat placeholders with `is_candidate: True`.
|
||||
- `phase2_4_5_call_site_completion_20260621` — same merge+revert history. The `PHASE3_HYPOTHETICAL_PROMOTION.md` report is NOT on master (reverted with the merge).
|
||||
|
||||
**3 handoff files are also NOT on master** (reverted with the merge): `HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md`, `HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md`, `PROMPT_FOR_TIER_1.md`. The v2 spec/plan do NOT reference these by name; the candidate-aggregate handling is described from first principles.
|
||||
|
||||
## Hard Bans (3-layer enforced)
|
||||
|
||||
These are restated from `conductor/tier2/agents/tier2-autonomous.md`; they apply on every commit:
|
||||
|
||||
- `git push*` (any form) — the user fetches the branch + reviews + merges.
|
||||
- `git checkout*` (any form) — use `git switch -c` for new branches, `git switch` to switch.
|
||||
- `git restore*` (any form) — never restore files.
|
||||
- `git reset*` (any form) — never reset state.
|
||||
- File access outside `C:\projects\manual_slop_tier2\` (the Tier 2 clone) — the Windows restricted token blocks it.
|
||||
- **`*AppData\\*`** — AppData is OFF-LIMITS for any read, write, or shell command. Use `tests/artifacts/tier2_state/<track>/` for failcount state, `tests/artifacts/tier2_failures/` for failure reports, `scripts/tier2/artifacts/<track>/` for throwaway scripts.
|
||||
|
||||
If a task requires one of these, **STOP and report to the user** — do not bypass.
|
||||
|
||||
## Conventions (MUST follow)
|
||||
|
||||
- **Test runner:** `uv run python scripts/run_tests_batched.py` (NEVER `uv run pytest` directly; the batched runner provides tier-based filtering, parallelization, and the summary table).
|
||||
- **Default branch:** `master` (not `main`).
|
||||
- **Line endings:** preserve existing. This repo has a mix of CRLF and LF. Do not normalize.
|
||||
- **Throw-away scripts:** `scripts/tier2/artifacts/code_path_audit_20260607/` (NOT the base `scripts/tier2/` dir).
|
||||
- **End-of-track report:** `docs/reports/TRACK_COMPLETION_code_path_audit_20260607.md` (the file name uses the track_id, not the date; check the precedent set by `TRACK_COMPLETION_live_gui_test_fixes_20260618.md`).
|
||||
|
||||
## TDD Protocol (per `conductor/workflow.md`)
|
||||
|
||||
1. **Red:** write the failing test (1 commit). Run `uv run python scripts/run_tests_batched.py` and confirm FAIL.
2. **Green:** implement the minimal code to pass (1 commit). Run and confirm PASS.
3. **Refactor:** (optional) 1 commit if there's cleanup.
4. **Commit per task** (1 task = 1 commit). Attach a git note summarizing the task.
5. **Update `plan_v2.md`**: change `[ ]` to `[x] <7-char-sha>` for the completed task. Commit the plan update.
|
||||
|
||||
## Per-Task Commit Protocol
|
||||
|
||||
After each task:
|
||||
1. `git add <specific files>` (not `git add .` for individual commits).
2. `git commit -m "<type>(<scope>): <description>"` (e.g., `feat(audit): add the 5 enums`).
3. Get the commit hash: `git log -1 --format="%H"`.
4. Attach git note: `git notes add -m "Task N.M: ..." <hash>`.
5. Update `plan_v2.md`: change `[ ]` to `[x] <7-char-sha>` for the task.
6. Commit the plan update: `git add plan_v2.md && git commit -m "conductor(plan): Mark task N.M complete"`.
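
Step 5 is a mechanical text edit and can be scripted. A minimal sketch, assuming plan tasks are written as `- [ ] Task N.M: ...` lines (the exact line shape in `plan_v2.md` is an assumption); `mark_task_done` is a hypothetical helper:

```python
import re


def mark_task_done(plan_text: str, task_id: str, full_sha: str) -> str:
    """Flip the unchecked box on the line naming `task_id` to `[x] <sha7>`."""
    sha7 = full_sha[:7]  # `git log -1 --format="%H"` yields the full hash
    pattern = re.compile(
        r"^(\s*- )\[ \](?= .*\b" + re.escape(task_id) + r"\b)",
        flags=re.MULTILINE,
    )
    # A function replacement avoids backslash handling in the substituted text.
    return pattern.sub(lambda m: f"{m.group(1)}[x] {sha7}", plan_text, count=1)


plan = "- [ ] Task 2.1: add enums\n- [ ] Task 2.10: other\n"
print(mark_task_done(plan, "2.1", "0123456789abcdef0123456789abcdef01234567"))
```

The word boundaries keep task `2.1` from matching `2.10` or `12.1`; a task id that is not present leaves the text unchanged.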
|
||||
|
||||
## Pre-Delegation Checkpoint
|
||||
|
||||
Before each Tier 3 worker delegation, run `git add .` to stage prior work. This is a safety net: if the worker fails or incorrectly runs `git restore`, your prior iterations are not lost.
|
||||
|
||||
## Failcount Contract
|
||||
|
||||
After every task commit, you MUST check `should_give_up` from `scripts.tier2.failcount`. The state is persisted at `tests/artifacts/tier2_state/code_path_audit_20260607/state.json` (project-relative; resolved via `Path(__file__).parents[2]` in the failcount module). The thresholds are:
|
||||
- 3 consecutive red-phase failures
|
||||
- 3 consecutive green-phase failures
|
||||
- 30 minutes with no progress (no commit, no green test)
|
||||
|
||||
If `should_give_up` returns True, IMMEDIATELY stop. Do not attempt another fix. Call `write_failure_report` from `scripts.tier2.write_report` and print the report path. Then **escalate to the user** (do not just write a report and stop silently).
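
For orientation, the three thresholds above amount to a single predicate. This is a sketch under stated assumptions, not the code in `scripts.tier2.failcount`: `FailState`, its fields, and `give_up` are hypothetical, and the persisted `state.json` layout is not shown in this file:

```python
from dataclasses import dataclass

MAX_CONSECUTIVE_FAILURES = 3
MAX_IDLE_SECONDS = 30 * 60


@dataclass(frozen=True)
class FailState:
    """Hypothetical snapshot of the persisted counters."""
    red_failures: int        # consecutive red-phase failures
    green_failures: int      # consecutive green-phase failures
    last_progress_ts: float  # epoch seconds of the last commit or green test


def give_up(state: FailState, now: float) -> bool:
    """True when any one of the three thresholds is reached."""
    return (
        state.red_failures >= MAX_CONSECUTIVE_FAILURES
        or state.green_failures >= MAX_CONSECUTIVE_FAILURES
        or now - state.last_progress_ts >= MAX_IDLE_SECONDS
    )
```

Passing `now` explicitly keeps the predicate deterministic, which makes the 30-minute rule easy to test.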
|
||||
|
||||
## Track-Specific Guidance
|
||||
|
||||
### The 3 candidate aggregates
|
||||
|
||||
The 3 candidate aggregates (`ToolSpec`, `ChatMessage`, `ProviderHistory`) are NOT on master. The v2 audit produces **placeholders** with `is_candidate: True` and all metrics set to 0. The `candidates.md` rollup explains the placeholder status. The integration tests verify the placeholder format.
|
||||
|
||||
**The v2 spec's `synthesize_aggregate_profile()` Task 9.2 has the placeholder template hard-coded.** When implementing it, use the exact template from the spec — do not invent a different placeholder structure.
|
||||
|
||||
### The 4 audit gates
|
||||
|
||||
After every commit, run:
|
||||
```bash
uv run python scripts/audit_exception_handling.py --strict
uv run python scripts/audit_weak_types.py --strict
uv run python scripts/audit_main_thread_imports.py
uv run python scripts/audit_no_models_config_io.py
```
|
||||
|
||||
These are the "laws of physics" for `src/code_path_audit.py`. If a gate fails, **fix before continuing**. The most likely failure mode is a Tier 3 worker adding an `Optional[T]` return type (banned in the 3 refactored files + the new file) or a `try/except: pass` (banned per `error_handling.md` Pattern 5).
|
||||
|
||||
### The `Result[T]` return type rule
|
||||
|
||||
**Every public function in `src/code_path_audit.py` that can fail at runtime returns `Result[T]`.** No `Optional[T]` returns. No `None` returns. No `raise Exception(...)` (only `raise` for programmer errors, e.g., `raise ValueError` in `__init__` for missing config).
|
||||
|
||||
Per the function table in this file, 7 of the 11 public functions return a deterministic `T` (no failure mode). The other 4 (1, 2, 7, 9) return `Result[T]`. **Do not add `Result[T]` to the deterministic ones** — it adds noise. **Do not skip `Result[T]` on the fallible ones** — it violates the convention.
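
The split between fallible and deterministic functions looks roughly like this. The `Result` class below is a stand-in written for this sketch only; the real `Result[T]`, `ErrorInfo`, and `ErrorKind` live in `src/result_types.py` and their fields are not shown here. `parse_json_text` and `render_title` are hypothetical functions:

```python
import json
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Result(Generic[T]):
    """Stand-in only; not the project's Result type."""
    value: T
    error: str = ""  # empty string means success in this sketch

    @property
    def ok(self) -> bool:
        return self.error == ""


def parse_json_text(text: str) -> Result[dict]:
    """Fallible: malformed input comes back as data, not as None or a raise."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return Result(value={}, error=f"malformed JSON: {exc.msg}")
    if not isinstance(data, dict):
        return Result(value={}, error="top-level JSON value is not an object")
    return Result(value=data)


def render_title(name: str) -> str:
    """Deterministic: no failure mode, so a plain return type is enough."""
    return f"# {name}"
```

The caller checks `ok` and propagates the failure upward rather than logging it inside the `except` body.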
|
||||
|
||||
### The 11 public functions (per the spec)
|
||||
|
||||
| # | Function | Returns | Phase |
|---|---|---|---|
| 1 | `run_audit(...)` | `Result[AuditSummary]` | 9 |
| 2 | `build_pcg(src_dir)` | `Result[ProducerConsumerGraph]` | 2 |
| 3 | `classify_memory_dim(...)` | `MemoryDim` (deterministic) | 3 |
| 4 | `detect_access_pattern(...)` | `AccessPattern` (deterministic) | 4 |
| 5 | `estimate_call_frequency(...)` | `Frequency` (deterministic) | 5 |
| 6 | `compute_decomposition_cost(...)` | `DecompositionCost` (deterministic) | 6 |
| 7 | `read_input_json(path)` | `Result[dict]` | 7 |
| 8 | `to_dsl_v2(profile)` | `str` (deterministic) | 8 |
| 9 | `parse_dsl_v2(text)` | `Result[dict]` | 8 |
| 10 | `to_markdown(profile)` | `str` (deterministic) | 8 |
| 11 | `to_tree(profile)` | `str` (deterministic) | 8 |
|
||||
|
||||
Plus the CLI (`if __name__ == "__main__":`) and the MCP tool wrapper (`code_path_audit_v2`).
|
||||
|
||||
### The 14 v2 DSL tagged words (per the spec)
|
||||
|
||||
`kind`, `mem-dim`, `fn-ref`, `access-pattern`, `ap-evidence`, `frequency`, `freq-evidence`, `result-coverage`, `type-alias-coverage`, `cross-audit-finding`, `cross-audit-findings`, `decomp-cost`, `opt-candidate`, `is-candidate`. The arity table is in `src/code_path_audit.py:DSL_WORD_ARITY_V2` (Phase 8 Task 8.1).
|
||||
|
||||
The DSL format is **flat sections** (streamable, tag-scannable) — NOT a nested record. Each `\\ === section_name ===` line is followed by the section's tagged records. This is the v1 design's "no need to parse the whole file" property applied to v2.
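
The "scan by section, skip the rest" property can be illustrated with a line scanner. A minimal sketch, assuming a section header is a comment line made of one or more backslashes followed by `=== name ===`; the exact header syntax is defined by the spec, and `scan_sections` plus the section names used with it are hypothetical. Record lines are treated as opaque text:

```python
import re
from typing import Dict, List

# Assumption: header = backslash comment marker(s), then "=== name ===".
SECTION_HEADER = re.compile(r"^\\+ === (.+?) ===\s*$")


def scan_sections(text: str) -> Dict[str, List[str]]:
    """Group non-blank lines by section without parsing the records."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        header = SECTION_HEADER.match(line)
        if header:
            current = header.group(1)
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections
```

Because nothing nests, a consumer interested in one section can stop at the next header instead of parsing the whole file.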
|
||||
|
||||
### The 5 enums (per the spec)
|
||||
|
||||
`AggregateKind` (4 values: typealias, dataclass, candidate_dataclass, builtin), `MemoryDim` (7 values: curation, discussion, rag, knowledge, config, control, unknown), `AccessPattern` (5 values: whole_struct, field_by_field, hot_cold_split, bulk_batched, mixed), `Frequency` (7 values: hot, per_turn, per_discussion, per_request, cold, init, unknown), `RecommendedDirection` (4 values: componentize, unify, hold, insufficient_data).
|
||||
|
||||
All enums are `Literal[...]` types (string-valued) for stable postfix DSL output. No `Enum` class — the v1 spec's rationale is "no enum-name lookup table needed in the parser."
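
A short illustration of why string-valued `Literal` types need no lookup table: the allowed values are the DSL tokens themselves. The two aliases below use the values listed above; `is_member` is a hypothetical helper:

```python
from typing import Literal, get_args

RecommendedDirection = Literal["componentize", "unify", "hold", "insufficient_data"]
AccessPattern = Literal[
    "whole_struct", "field_by_field", "hot_cold_split", "bulk_batched", "mixed"
]


def is_member(value: str, literal_type: object) -> bool:
    """Check a parsed DSL token against a Literal type's allowed strings."""
    return value in get_args(literal_type)


print(is_member("hold", RecommendedDirection))
print(is_member("HOLD", RecommendedDirection))
```

`typing.get_args` returns the literal values as a tuple, so the parser can validate a token with a plain membership test.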
|
||||
|
||||
### The 9 supporting dataclasses (per the spec)
|
||||
|
||||
`FunctionRef`, `AccessPatternEvidence`, `FrequencyEvidence`, `ResultCoverage`, `TypeAliasCoverage`, `CrossAuditFinding`, `CrossAuditFindings`, `DecompositionCost`, `OptimizationCandidate`. Plus the central `AggregateProfile` (14 required fields + 2 default). All `frozen=True` per the immutability story.
|
||||
|
||||
### The 4 decomposition directions (per the spec)
|
||||
|
||||
- `componentize` — split into smaller dataclasses; access pattern is `field_by_field` with many dead fields, OR `hot_cold_split` with small hot fields.
|
||||
- `unify` — combine into wider fat structs; access pattern is `bulk_batched` with a small struct, OR `whole_struct` with a small struct.
|
||||
- `hold` — current shape is correct; default for `frozen + whole_struct` (the ideal shape).
|
||||
- `insufficient_data` — access pattern is `mixed` or frequency is `unknown`; needs runtime profiling.
|
||||
|
||||
The 4-direction logic is in `src/code_path_audit.py:recommended_direction()` (Phase 6 Task 6.6). The savings estimates are heuristic (calibrated by `pipeline_runtime_profiling_20260607`); use as ranking input, not as actual savings.
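
The four bullets can be read as an ordered rule list. The sketch below is one possible reading, not the spec's `recommended_direction()`: the parameter names, the cutoffs for "small" and "many dead fields", and the precedence between `hold` and `unify` for `whole_struct` are all assumptions:

```python
def direction_sketch(
    access_pattern: str,
    frequency: str,
    *,
    field_count: int,
    dead_fields: int,
    hot_fields: int,
    frozen: bool,
    small_struct_max: int = 4,    # hypothetical cutoff for "small"
    dead_ratio_min: float = 0.5,  # hypothetical cutoff for "many dead fields"
) -> str:
    """Map static-analysis facts about one aggregate to a direction."""
    if access_pattern == "mixed" or frequency == "unknown":
        return "insufficient_data"
    if access_pattern == "field_by_field" and field_count > 0:
        if dead_fields / field_count >= dead_ratio_min:
            return "componentize"
    if access_pattern == "hot_cold_split" and hot_fields <= small_struct_max:
        return "componentize"
    if access_pattern == "whole_struct" and frozen:
        return "hold"  # assumed to win over the small-struct unify rule
    if access_pattern in ("bulk_batched", "whole_struct") and field_count <= small_struct_max:
        return "unify"
    return "hold"
```

Checking `insufficient_data` first means no recommendation is made from evidence the audit itself marks as unreliable.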
|
||||
|
||||
### The 6 input JSON contracts (per the spec)
|
||||
|
||||
The v2 audit consumes JSON from 6 sources in `tests/artifacts/audit_inputs/` (gitignored per `test_sandbox.md`):
|
||||
|
||||
| Input | Producer | Path |
|---|---|---|
| 1 | `scripts/audit_weak_types.py --json` | `audit_weak_types.json` |
| 2 | `scripts/audit_exception_handling.py --json` | `audit_exception_handling.json` |
| 3 | `scripts/audit_optional_in_3_files.py --json` | `audit_optional_in_3_files.json` |
| 4 | `scripts/audit_no_models_config_io.py --json` | `audit_no_models_config_io.json` |
| 5 | `scripts/audit_main_thread_imports.py --json` | `audit_main_thread_imports.json` |
| 6 | `scripts/generate_type_registry.py --json` | `type_registry.json` |
|
||||
|
||||
**Tolerance:** if any input is missing or malformed, the audit continues with the corresponding `cross_audit_findings` field set to `()` (empty tuple) and the markdown notes the missing input. The audit does NOT fail on missing inputs.
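
The tolerance rule can be sketched as a loader that degrades to an empty tuple plus a note for the markdown. This assumes findings sit under a top-level `"findings"` list, which is an illustrative guess at the JSON contract; `load_findings` is a hypothetical name, and the real `read_input_json(path)` returns `Result[dict]` rather than this tuple:

```python
import json
from pathlib import Path
from typing import List, Tuple


def load_findings(path: Path) -> Tuple[tuple, List[str]]:
    """Return (findings, notes); a missing or malformed input yields ()."""
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return (), [f"missing input: {path.name}"]
    except (json.JSONDecodeError, UnicodeDecodeError):
        return (), [f"malformed input: {path.name}"]
    # Assumption: findings live under a top-level "findings" list.
    findings = payload.get("findings", []) if isinstance(payload, dict) else None
    if not isinstance(findings, list):
        return (), [f"malformed input: {path.name}"]
    return tuple(findings), []
```

The notes list is what lets the markdown output mention a missing input while the audit itself keeps running.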
|
||||
|
||||
### The integration test fixture
|
||||
|
||||
`tests/fixtures/synthetic_src/` defines 3 TypeAliases (Metadata, FileItems, History) + 6 functions (2 producers, 4 consumers). `tests/fixtures/audit_inputs/` has 6 JSON files matching the contracts. The integration tests assert the exact expected profiles per aggregate (the expected output is in the spec's §7.1 + the plan's Phase 10 tasks).
|
||||
|
||||
**The fixture names match the canonical TypeAliases** (Metadata, FileItems, History) so the audit's `CANONICAL_MEMORY_DIM` lookup works correctly. Do not rename the fixture's aggregates.
|
||||
|
||||
## Known gotchas (from prior tracks' lessons)
|
||||
|
||||
These are the "1% chance this happens but you'll waste 4 hours if you don't know" notes:
|
||||
|
||||
1. **`Optional[T]` ban extends to the new file.** The `scripts/audit_optional_in_3_files.py` script will be extended in Phase 12 to check `src/code_path_audit.py`. If any Tier 3 worker adds an `Optional[T]` return, the extended audit fails. **Read `conductor/code_styleguides/error_handling.md` before writing the public API.** The 5 MUST-DO rules and 7 MUST-NOT-DO rules apply.
|
||||
|
||||
2. **Logging is NOT a drain.** Per `error_handling.md` Pattern A: `sys.stderr.write` / `logging.error` / `print` in an except body is `INTERNAL_SILENT_SWALLOW`, a violation. The CLI / MCP entry points are the drain points. Use `Result[T]` propagation and let the error reach the drain.
|
||||
|
||||
3. **The AST walker does NOT execute the code.** The PCG, APD, CFE are pure static analysis. No `eval`, no `exec`, no imports of `src/*` modules that have side effects. The v2 audit reads files; it does not import them.
|
||||
|
||||
4. **`scripts/run_tests_batched.py` is the only test runner.** Direct `uv run pytest` may work for a single file but bypasses the tiering that the live_gui tests depend on. The failcount and per-tier filtering only work with the batched runner.
|
||||
|
||||
5. **`master` is the default branch.** This repo never had `main`. `git fetch origin master` (NOT `main`).
|
||||
|
||||
6. **The CRLF/LF mix is intentional.** Do not normalize. Per-file preservation.
|
||||
|
||||
7. **The 3 candidate aggregates are placeholders.** When you run the audit on `master`, the `candidates.md` rollup will show 3 placeholders with `is_candidate: True`. This is correct. The placeholders become real profiles when `any_type_componentization_20260621` is re-merged.
|
||||
|
||||
8. **The 1-line extension to `scripts/audit_optional_in_3_files.py` is the audit gate.** If you skip Phase 12 Task 12.2, the new file is not covered by the `Optional[T]` ban, and a future Tier 3 worker could regress the convention. Do the extension.
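
Gotcha 3 (static analysis only) in miniature: the standard-library `ast` module parses source text without running it. A minimal sketch; `functions_mentioning` is a hypothetical helper and far simpler than the PCG builder, and it only inspects positional parameter and return annotations written as bare names:

```python
import ast
from typing import List


def functions_mentioning(source: str, type_name: str) -> List[str]:
    """List functions whose annotations name `type_name`, via the AST only."""
    tree = ast.parse(source)  # parses text; nothing in `source` is executed
    hits: List[str] = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        annotations = [a.annotation for a in node.args.args if a.annotation is not None]
        if node.returns is not None:
            annotations.append(node.returns)
        names = {
            n.id
            for ann in annotations
            for n in ast.walk(ann)
            if isinstance(n, ast.Name)
        }
        if type_name in names:
            hits.append(node.name)
    return sorted(hits)
```

A module whose body would exit or mutate state on import is safe to analyze this way, because the file is read as text and never imported.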
|
||||
|
||||
## Verification Protocol (per `conductor/workflow.md`)
|
||||
|
||||
After every task, run the **4 audit gates** in `--strict` mode + the unit tests:
|
||||
|
||||
```bash
uv run pytest tests/test_code_path_audit.py -q
uv run python scripts/audit_exception_handling.py --strict
uv run python scripts/audit_weak_types.py --strict
uv run python scripts/audit_main_thread_imports.py
uv run python scripts/audit_no_models_config_io.py
```
|
||||
|
||||
At **end-of-track** (Phase 13), add:
|
||||
```bash
uv run python -m src.code_path_audit --all --date 2026-06-22
uv run python scripts/audit_code_path_audit_coverage.py --input-dir docs/reports/code_path_audit/2026-06-22/ --strict
uv run python scripts/generate_type_registry.py --check
```
|
||||
|
||||
## End-of-Track Handoff
|
||||
|
||||
When all 14 phases complete, write `docs/reports/TRACK_COMPLETION_code_path_audit_20260607.md` (the user reads this to decide merge). Update `conductor/tracks.md` with the v2 entry. Update `state.toml` to `status = "completed"` and `current_phase = "complete"`.
|
||||
|
||||
The TRACK_COMPLETION report should include:
|
||||
- What shipped (file inventory).
|
||||
- Verification: 91 tests pass + 4 audit gates + meta-audit + type registry.
|
||||
- The cross-validation verdict (does the v2 audit's data match the actual state of `data_structure_strengthening` + `data_oriented_error_handling`?).
|
||||
- The 5 follow-up tracks.
|
||||
- The 3 candidate aggregates' forward-compat status.
|
||||
|
||||
## Out of scope (restated)
|
||||
|
||||
- Modifications to existing `src/*.py` files (read-only on the 65 existing files).
|
||||
- Modifications to the 5 existing audit scripts (consume their JSON; don't change them).
|
||||
- Runtime profiling (deferred to `pipeline_runtime_profiling_20260607`).
|
||||
- New pip dependencies (stdlib only).
|
||||
- Changes to v1 spec.md or plan.md (preserved unchanged).
|
||||
- MMA worker spawn action (cold per user).
|
||||
- New `src/<thing>.py` files other than `src/code_path_audit.py` itself (per `AGENTS.md` file size + naming convention).
|
||||
- The 23 lower-impact files (deferred).
|
||||
|
||||
## See also
|
||||
|
||||
- `conductor/tracks/code_path_audit_20260607/spec_v2.md` — the canonical spec (design intent).
|
||||
- `conductor/tracks/code_path_audit_20260607/plan_v2.md` — the canonical plan (executable).
|
||||
- `conductor/tracks/code_path_audit_20260607/metadata.json` — the track metadata.
|
||||
- `conductor/tracks/code_path_audit_20260607/state.toml` — the track state.
|
||||
- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference.
|
||||
- `conductor/code_styleguides/error_handling.md` — the `Result[T]` convention.
|
||||
- `conductor/code_styleguides/type_aliases.md` — the 10 TypeAliases.
|
||||
- `conductor/code_styleguides/agent_memory_dimensions.md` — the 4 mem dims.
|
||||
- `conductor/tier2/agents/tier2-autonomous.md` — the Tier 2 agent prompt (this file is the track-specific supplement).
|
||||
- `conductor/tier2/commands/tier-2-auto-execute.md` — the execute command.
|
||||
- `docs/reports/RESULT_MIGRATION_CAMPAIGN_STATUS_20260619.md` — the 100%-complete result migration campaign (the v2 audit runs against this final state).
|
||||
- `docs/reports/ANY_TYPE_AUDIT_20260621.md` — the 89-site audit that informed the 3 candidate aggregates.
|
||||
- `docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md` — the cost analysis that informed the `ProviderHistory` candidate (NOT on master; reverted with the merge).
|
||||
- `conductor/tracks/nagent_review_20260608/nagent_takeaways_v3_1_20260620.md` — the v3.1 nagent review (the v2's custom postfix DSL is a direct application of Candidate 27: Markdown + custom DSL lock-in).
|
||||
@@ -0,0 +1,200 @@
|
||||
{
|
||||
"id": "code_path_audit_20260607",
|
||||
"title": "Code Path & Data Pipeline Audit v2",
|
||||
"type": "tooling",
|
||||
"status": "active",
|
||||
"priority": "A",
|
||||
"created": "2026-06-07",
|
||||
"last_revised": "2026-06-22",
|
||||
"owner": "tier2-tech-lead",
|
||||
"parent_umbrella": null,
|
||||
"spec": "conductor/tracks/code_path_audit_20260607/spec_v2.md",
|
||||
"plan": "conductor/tracks/code_path_audit_20260607/plan_v2.md",
|
||||
"spec_v1_preserved": "conductor/tracks/code_path_audit_20260607/spec.md (v1, never executed; preserved unchanged)",
|
||||
"plan_v1_preserved": "conductor/tracks/code_path_audit_20260607/plan.md (v1, never executed; preserved unchanged)",
|
||||
"v2_revision_rationale": "v1 was authored 2026-06-07 before the 4 foundational tracks shipped; v1 framing is now stale. v2 re-scopes the audit from 'expensive operations per action' to 'data pipelines per aggregate' + a decomposition-cost heuristic (componentize vs unify) per aggregate. v2 also cross-validates data_structure_strengthening + data_oriented_error_handling directly (the 2 foundational tracks didn't exist on 2026-06-07).",
|
||||
"scope": {
|
||||
"files_created": 18,
|
||||
"files_created_paths": [
|
||||
"src/code_path_audit.py",
|
||||
"tests/test_code_path_audit.py",
|
||||
"tests/test_code_path_audit_live_gui.py",
|
||||
"tests/fixtures/synthetic_src/__init__.py",
|
||||
"tests/fixtures/synthetic_src/type_aliases.py",
|
||||
"tests/fixtures/synthetic_src/ai_client.py",
|
||||
"tests/fixtures/synthetic_src/aggregate.py",
|
||||
"tests/fixtures/synthetic_src/gui_2.py",
|
||||
"tests/fixtures/synthetic_src/cleanup.py",
|
||||
"tests/fixtures/synthetic_src/overrides.toml",
|
||||
"tests/fixtures/audit_inputs/audit_weak_types.json",
|
||||
"tests/fixtures/audit_inputs/audit_exception_handling.json",
|
||||
"tests/fixtures/audit_inputs/audit_optional_in_3_files.json",
|
||||
"tests/fixtures/audit_inputs/audit_no_models_config_io.json",
|
||||
"tests/fixtures/audit_inputs/audit_main_thread_imports.json",
|
||||
"tests/fixtures/audit_inputs/type_registry.json",
|
||||
"scripts/audit_code_path_audit_coverage.py",
|
||||
"conductor/code_styleguides/code_path_audit.md"
|
||||
],
|
||||
"files_modified": 1,
|
||||
"files_modified_paths": [
|
||||
"scripts/audit_optional_in_3_files.py (+1 line: add src/code_path_audit.py to the baseline list)"
|
||||
],
|
||||
"files_preserved_v1": [
|
||||
"conductor/tracks/code_path_audit_20260607/spec.md (v1)",
|
||||
"conductor/tracks/code_path_audit_20260607/plan.md (v1)"
|
||||
],
|
||||
"phases": 14,
|
||||
"tasks": 85,
|
||||
"tests_total": 91,
|
||||
"tests_unit": 84,
|
||||
"tests_integration": 7,
|
||||
"tests_live_gui_opt_in": 2,
|
||||
"aggregates_total": 13,
|
||||
"aggregates_real": 10,
|
||||
"aggregates_candidate": 3,
|
||||
"rollups": 4,
|
||||
"follow_up_tracks": 5
|
||||
},
|
||||
"depends_on": [
|
||||
"data_oriented_error_handling_20260606 (SHIPPED; the v2 audit's result_coverage cross-checks this)",
|
||||
"data_structure_strengthening_20260606 (SHIPPED; the v2 audit's type_alias_coverage cross-checks this)",
|
||||
"mcp_architecture_refactor_20260606 (SHIPPED; provides the 6 input audit scripts' baselines)",
|
||||
"qwen_llama_grok_integration_20260606 (SHIPPED; the v2 audit covers the 8 _send_<vendor> functions)",
|
||||
"result_migration_20260616 (100% complete as of 2026-06-21; the v2 audit runs against the post-migration src/)"
|
||||
],
|
||||
"blocks": [
|
||||
"pipeline_runtime_profiling_20260607 (preserved from v1; calibrates v2's heuristic cost constants against real measurements)",
|
||||
"data_pipelines_inventory_<date> (per-pipeline vs per-aggregate reports for the top 5 pipelines)",
|
||||
"code_path_audit_in_ci_<date> (run v2 in CI on every PR)",
|
||||
"code_path_audit_data_oriented_refactor_<date> (implement the 3 high-priority componentize candidates)",
|
||||
"code_path_audit_v2_5_followup_<date> (re-run v2 after any_type_componentization_20260621 merges)"
|
||||
],
|
||||
"out_of_scope": [
|
||||
"No modifications to existing src/*.py files (read-only on the 65 existing files; the v2 audit doesn't change them).",
|
||||
"No modifications to the 5 existing audit scripts (consume their JSON; don't change them).",
|
||||
"No runtime profiling (deferred to pipeline_runtime_profiling_20260607).",
|
||||
"No new pip dependencies (stdlib only: ast, pathlib, json, dataclasses, tomllib, re).",
|
||||
"No changes to data_structure_strengthening or data_oriented_error_handling styleguides.",
|
||||
"No changes to v1 spec.md or plan.md (v1 preserved unchanged).",
|
||||
"No MMA worker spawn action (preserved from v1; user directive 2026-06-07: cold until 1:1 discussion UX is dogfooded).",
|
||||
"No new src/<thing>.py files (per AGENTS.md file size + naming convention: helpers and sub-systems go in the parent module).",
|
||||
"The 23 lower-impact files (1-9 weak-type sites each; deferred to a follow-up track).",
|
||||
"The 3 candidate aggregates' 'real' analysis (deferred to code_path_audit_v2_5_followup_<date>).",
|
||||
"The v1-style per-action output is preserved for backward compat but downgraded to cross-references."
|
||||
],
|
||||
"tolerated_at_run_time": [
|
||||
"any_type_componentization_20260621 is NOT on master (merged f914b2bc, reverted 751b94d4); the v2 audit produces placeholders for the 3 candidate aggregates with is_candidate: True.",
|
||||
"phase2_4_5_call_site_completion_20260621 is NOT on master (same merge+revert history).",
|
||||
"Missing input JSONs in tests/artifacts/audit_inputs/ are tolerated (the corresponding cross_audit_findings field is empty; the markdown notes the absence).",
|
||||
"Malformed input JSONs are tolerated (read_input_json() returns a Result with errors; the v2 audit continues with empty data)."
|
||||
],
|
||||
"test_summary": {
|
||||
"tests_total": 91,
|
||||
"tests_unit": 84,
|
||||
"tests_integration": 7,
|
||||
"tests_live_gui_opt_in": 2,
|
||||
"test_tier_count": 11,
|
||||
"test_pass_count_target": "All 91 tests PASS; the 2 live_gui are opt-in (CODE_PATH_AUDIT_LIVE_GUI=1)"
|
||||
},
|
||||
"verification_criteria": [
|
||||
"FR-1: src/code_path_audit.py is created with the 11 public functions + 4 static analyzers (PCG, MemoryDim, APD, CFE) + 4 renderers (to_dsl_v2, to_markdown, to_tree, parse_dsl_v2) + run_audit() main entry + CLI + MCP tool wrapper",
|
||||
"FR-2: All 11 public functions return Result[T] per error_handling.md (or return a deterministic T when no runtime failure is possible)",
|
||||
"FR-3: The 4 audit gates pass in --strict mode (audit_exception_handling, audit_weak_types, audit_main_thread_imports, audit_no_models_config_io)",
|
||||
"FR-4: The meta-audit (scripts/audit_code_path_audit_coverage.py) passes on the real audit output (0 schema violations)",
|
||||
"FR-5: The type registry is in sync with src/type_aliases.py (scripts/generate_type_registry.py --check exits 0)",
|
||||
"FR-6: 91 tests pass (84 unit + 7 integration; 2 live_gui are opt-in)",
|
||||
"FR-7: The audit output (13 per-aggregate .dsl + .md + .tree files + 4 rollups) is committed to docs/reports/code_path_audit/2026-06-22/",
|
||||
"FR-8: The TRACK_COMPLETION report is written to docs/reports/TRACK_COMPLETION_code_path_audit_20260622.md",
|
||||
"FR-9: conductor/tracks.md is updated with the v2 track entry (the checkpoint SHA from the TRACK_COMPLETION report commit)",
|
||||
"FR-10: The 1-line extension to scripts/audit_optional_in_3_files.py is committed; the extended audit passes in --strict mode",
|
||||
"FR-11: conductor/code_styleguides/code_path_audit.md is written (the 5-convention styleguide)",
|
||||
"Atomic per-task commits with git notes per conductor/workflow.md step 9.1-9.3",
|
||||
"No day estimates, no T-shirt sizes in any artifact"
|
||||
],
|
||||
"risks": [
|
||||
{
|
||||
"id": "R1",
|
||||
"description": "The decomposition-cost heuristic is inaccurate (componentize_savings is overestimated or underestimated)",
|
||||
"mitigation": "The runtime-profiling follow-up recalibrates. The override file (scripts/code_path_audit_overrides.toml) lets the user adjust per-aggregate. The summary.md and decomposition_matrix.md headers caveat: 'Savings estimates are heuristic; use as ranking input, not as actual savings.'"
|
||||
},
|
||||
{
|
||||
"id": "R2",
|
||||
"description": "The PCG misses dynamic patterns (eval, getattr, decorator-driven dispatch like @imscope)",
|
||||
"mitigation": "The override file lists the known passthroughs. The runtime-profiling follow-up catches the unresolved. The v1 spec's 'unresolved_calls' pattern is preserved."
|
||||
},
|
||||
{
|
||||
"id": "R3",
|
||||
"description": "The 6 input JSON contracts drift (the existing audit scripts evolve without bumping the v2 audit's contract)",
|
||||
"mitigation": "The scripts/audit_code_path_audit_coverage.py meta-audit runs in CI; fails on schema drift. The v2 audit tolerates missing fields (returns empty cross_audit_findings; markdown notes the absence)."
|
||||
},
|
||||
{
|
||||
"id": "R4",
|
||||
"description": "The candidate aggregates don't merge (any_type_componentization_20260621 is delayed)",
|
||||
"mitigation": "The v2 audit is forward-compatible. The is_candidate: bool flag handles the absence gracefully. The candidates.md rollup explains the placeholder status."
|
||||
},
|
||||
{
|
||||
"id": "R5",
|
||||
"description": "The v1 .dsl files don't round-trip (the v2 parser is stricter than v1)",
|
||||
"mitigation": "The v2 parser is a superset of v1; the v1 action reports still parse. The test_v2_dsl_backward_compat_v1 test verifies."
|
||||
},
|
||||
{
|
||||
"id": "R6",
|
||||
"description": "The synthetic src/ fixture diverges from real src/ (the test expectations don't generalize)",
|
||||
"mitigation": "The integration test layer runs against real src/ as well as the synthetic fixture. The 2 are decoupled."
|
||||
},
|
||||
{
|
||||
"id": "R7",
|
||||
"description": "The 4 audit gates regress during implementation (Tier 3 worker adds a try/except violation, Optional[T] return, etc.)",
|
||||
"mitigation": "Run the 4 audit gates in --strict mode after every commit. If a gate fails, fix before continuing. The audit scripts are the 'laws of physics' for the new file."
|
||||
},
|
||||
{
|
||||
"id": "R8",
|
||||
"description": "The 85+ tasks exceed Tier 2's per-task context window (the model runs out of memory mid-track)",
|
||||
"mitigation": "Per-task commits are atomic; the failcount state file persists progress. The per-task commit discipline means each commit is a safe rollback point. If a task fails 3 times, escalate to the user (don't keep retrying)."
|
||||
},
|
||||
{
|
||||
"id": "R9",
|
||||
"description": "The 91 tests are too long-running for the per-PR CI gate (the user expects <2 min for unit tests)",
|
||||
"mitigation": "The unit + integration tests run in <30s. The live_gui tests are opt-in via the CODE_PATH_AUDIT_LIVE_GUI env var. The 2 opt-in tests are not in the default run."
|
||||
},
|
||||
{
|
||||
"id": "R10",
|
||||
"description": "The Tier 2 agent uses a git command that is hard-banned (git restore, git checkout, git reset, git push)",
|
||||
"mitigation": "The 3-layer hard ban enforcement (OpenCode permission + Windows restricted token + git hooks) catches the violation. The TIER2_STARTUP.md restates the hard bans. If a task requires one, escalate to the user."
|
||||
}
|
||||
],
|
||||
"out_of_scope": [
|
||||
"Modifications to existing src/*.py files (read-only on the 65 existing files)",
|
||||
"Modifications to the 5 existing audit scripts (consume their JSON; don't change them)",
|
||||
"Runtime profiling (deferred to pipeline_runtime_profiling_20260607)",
|
||||
"New pip dependencies (stdlib only)",
|
||||
"Changes to data_structure_strengthening or data_oriented_error_handling styleguides",
|
||||
"Changes to v1 spec.md or plan.md (v1 preserved)",
|
||||
"MMA worker spawn action (cold per user)",
|
||||
"New src/<thing>.py files (per AGENTS.md file size + naming convention)",
|
||||
"The 23 lower-impact files (deferred)",
|
||||
"The 3 candidate aggregates' real analysis (deferred to v2.5 follow-up)"
|
||||
],
|
||||
"follow_up_tracks": [
|
||||
{
|
||||
"id": "pipeline_runtime_profiling_20260607",
|
||||
"purpose": "Calibrate v2's heuristic cost constants against real measurements. Uses src/performance_monitor.py."
|
||||
},
|
||||
{
|
||||
"id": "data_pipelines_inventory_<date>",
|
||||
"purpose": "Per-pipeline (vs per-aggregate) reports for the top 5 pipelines."
|
||||
},
|
||||
{
|
||||
"id": "code_path_audit_in_ci_<date>",
|
||||
"purpose": "Run v2 in CI on every PR; fail on new untyped sites or decomposition-matrix regression."
|
||||
},
|
||||
{
|
||||
"id": "code_path_audit_data_oriented_refactor_<date>",
|
||||
"purpose": "Implement the 3 high-priority componentize candidates (FileItems, History, Metadata)."
|
||||
},
|
||||
{
|
||||
"id": "code_path_audit_v2_5_followup_<date>",
|
||||
"purpose": "Re-run v2 after any_type_componentization_20260621 merges; the 3 placeholders become real profiles."
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -305,6 +305,79 @@ This track has **no blockers** and **no conflicts**. It can ship independently o
|
||||
|
||||
This track's analysis is **read-only** — it doesn't modify `src/`, doesn't change the public API, doesn't add tests to the existing test suite. The only new files are `src/code_path_audit.py` (the tool), `tests/test_code_path_audit.py` (the tests), and the report under `docs/reports/code_path_audit/2026-06-07/`.
|
||||
|
||||
## Pre-Flight Adjustments (2026-06-21, per handoffs from `any_type_componentization_20260621`)
|
||||
|
||||
The `any_type_componentization_20260621` track (shipped 2026-06-21 with 48/89 sites promoted) revealed that **the 4 foundational tracks this audit was deferred behind have evolved**. Specifically, 5 new hot-path dataclasses (`ToolSpec`, `ChatMessage`, `UsageStats`, `ToolCall`, `WebSocketMessage`) and 1 new module (`provider_state.ProviderHistory`) now exist. This audit must instrument them.
|
||||
|
||||
**Per `docs/handoffs/PROMPT_FOR_TIER_1.md` and `HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md`, the following 4 adjustments are added to this audit's scope:**
|
||||
|
||||
### A1. Add 2 new actions to the per-action profiling
|
||||
|
||||
The existing 3 actions (`ai_message_lifecycle`, `discussion_save_load`, `gui_startup`) become 5:
|
||||
|
||||
| Action | Codepath | Measures |
|---|---|---|
| `provider_history_append` (NEW) | `get_history(p).append(msg)` (or legacy `_anthropic_history.append(msg)`) | Per-turn append latency + lock acquire time + memory allocation per call. The hot path Phase 3 will refactor. |
| `websocket_broadcast` (NEW) | `broadcast(WebSocketMessage(...))` (post-Phase 6a) | Per-broadcast overhead (allocation + JSON serialization + WebSocket send). The GUI thread's per-event cost. |
| `ai_message_lifecycle` (existing) | `_send_<provider>` end-to-end | Total per-turn latency delta pre/post Phase 3 (`provider_state.ProviderHistory`). The 3 OpenAI-compatible providers (`grok`, `minimax`, `llama`) are **newly instrumented** (currently unprofiled). |
| `discussion_save_load` (existing) | `reset_session()` + project switch | Cold-path cost. The `clear_all()` migration's per-call delta. |
| `gui_startup` (existing) | `_PROVIDER_HISTORIES` dict init at module load | One-time init cost (6 `ProviderHistory()` instances + 6 locks). |
|
||||
|
||||
### A2. Add 5 micro-benchmarks to the audit's `optimization_candidates.md`
|
||||
|
||||
The audit's per-call cost estimates should include these 5 micro-benchmarks (added per `HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md` §7):
|
||||
|
||||
| Micro-benchmark | Purpose | Expected overhead |
|---|---|---|
| `NormalizedResponse.__init__` | Dataclass construction vs the old 6-field dict literal | <1μs; immaterial |
| `WebSocketMessage.__init__` | Dataclass construction per broadcast | <5μs; the hot path concern |
| `UsageStats.__init__` | Nested dataclass construction per response | <500ns; negligible (4 int fields) |
| `ProviderHistory.lock` acquire | threading.Lock acquire overhead | <500ns; the threading hot path |
| `ToolSpec.__init__` | Dataclass construction per tool (45 tools, cold path) | <2μs; only at registration |
|
||||
|
||||
The benchmarks are emitted to `docs/reports/code_path_audit/<date>/micro_benchmarks.md`.
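
A minimal sketch of how two of these micro-benchmarks could be measured with the stdlib `timeit` module. The `UsageStats` class here is a hypothetical stand-in (the table only says it has 4 int fields; the field names are invented), and `ns_per_call`, `run_micro_benchmarks`, and `to_markdown` are illustrative helper names, not part of the audit tool.

```python
import threading
import timeit
from dataclasses import dataclass


@dataclass
class UsageStats:
    # Hypothetical stand-in: four int fields, as the table describes.
    # The real field names are not given in this spec.
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0


def ns_per_call(fn, number: int = 200_000, repeat: int = 5) -> float:
    """Best-of-`repeat` cost of one call to `fn`, in nanoseconds."""
    best = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best / number * 1e9


def lock_round_trip(lock) -> None:
    # Uncontended acquire + release; contention is not modelled here.
    with lock:
        pass


def run_micro_benchmarks() -> dict[str, float]:
    lock = threading.Lock()
    return {
        "UsageStats.__init__": ns_per_call(lambda: UsageStats(1, 2, 3, 4)),
        "Lock acquire/release": ns_per_call(lambda: lock_round_trip(lock)),
    }


def to_markdown(results: dict[str, float]) -> str:
    """Render the measurements as a two-column markdown table."""
    rows = ["| Micro-benchmark | ns per call |", "|---|---|"]
    rows += [f"| `{name}` | {ns:.0f} |" for name, ns in results.items()]
    return "\n".join(rows)
```

Calling `print(to_markdown(run_micro_benchmarks()))` would produce a table that could be written to `micro_benchmarks.md`. Each figure includes the overhead of the wrapping `lambda`, so the numbers are upper bounds suitable for ranking rather than exact costs.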
|
||||
|
||||
### A3. Add the "no-TypeError-errors-on-any-thread" assertion
|
||||
|
||||
The audit's per-action profiling runs the 5 actions in a controlled harness. The audit MUST assert that no `worker[queue_fallback] error: WebSocketServer.broadcast() takes 2 positional arguments but 3 were given` (or any TypeError on any thread) appears in the harness output during profiling.
|
||||
|
||||
This assertion catches the broadcast() regression that `any_type_componentization_20260621` introduced. The regression test that backs this assertion lives in `tests/test_websocket_broadcast_regression.py` (added by the `phase2_4_5_call_site_completion_20260621` follow-up track).
|
||||
|
||||
If the assertion fires, the audit's output should:
|
||||
1. Mark the affected action's profile as `INSTRUMENTATION_CONTAMINATED`
|
||||
2. List the offending thread + traceback in the report's `errors.md`
|
||||
3. Recommend re-running the audit AFTER `phase2_4_5_call_site_completion_20260621` merges
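
The assertion and the contamination marking above can be sketched as follows. This is an illustration under stated assumptions, not the harness itself: `ThreadErrorLog`, `capture_thread_type_errors`, and `contaminated_lines` are hypothetical names. `threading.excepthook` only sees exceptions that escape a thread's `run()`; since the `worker[queue_fallback] error:` prefix suggests the worker catches and logs the error, the sketch also scans captured output for the TypeError message shape.

```python
import re
import threading
import traceback
from contextlib import contextmanager
from dataclasses import dataclass, field

# Shape of CPython's arity TypeError message, as quoted in A3.
ARITY_ERROR = re.compile(r"takes \d+ positional arguments? but \d+ (?:was|were) given")


@dataclass
class ThreadErrorLog:
    # (thread name, formatted traceback) for each TypeError seen on any thread.
    type_errors: list[tuple[str, str]] = field(default_factory=list)

    @property
    def status(self) -> str:
        return "INSTRUMENTATION_CONTAMINATED" if self.type_errors else "CLEAN"


@contextmanager
def capture_thread_type_errors():
    """Record TypeErrors that escape worker threads while the block runs."""
    log = ThreadErrorLog()
    previous_hook = threading.excepthook

    def hook(args):
        if args.exc_type is not None and issubclass(args.exc_type, TypeError):
            name = args.thread.name if args.thread else "<unknown>"
            tb = "".join(traceback.format_exception(
                args.exc_type, args.exc_value, args.exc_traceback))
            log.type_errors.append((name, tb))
        else:
            previous_hook(args)  # other exceptions keep their prior handling

    threading.excepthook = hook
    try:
        yield log
    finally:
        threading.excepthook = previous_hook


def contaminated_lines(harness_output: str) -> list[str]:
    """Return logged lines that look like a swallowed arity TypeError."""
    return [ln for ln in harness_output.splitlines() if ARITY_ERROR.search(ln)]
```

A profile would be marked `INSTRUMENTATION_CONTAMINATED` if either `log.status` reports it or `contaminated_lines(...)` is non-empty. The collected thread names and tracebacks are the material for `errors.md`.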
|
||||
|
||||
### A4. Add the 89 fat-struct sites as instrumented targets
|
||||
|
||||
The audit reads `docs/reports/ANY_TYPE_AUDIT_20260621.md` §3's table and tags each `Any` usage with `(file:line, hot_path, cold_path, init_path)`. The 89 sites become per-action cost estimates that flow into `optimization_candidates.md`.
|
||||
|
||||
For the 48 promoted sites, the audit compares pre-refactor (legacy globals + dict literals) vs post-refactor (dataclass + registry). For the 41 deferred Phase 3 sites, the audit produces per-call cost estimates that inform the future Phase 3 follow-up track (see `docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md` for the qualitative estimates).
|
||||
|
||||
### A5. Sequencing (BLOCKER)
|
||||
|
||||
**This audit is now blocked by `phase2_4_5_call_site_completion_20260621` (the broadcast() fix).** Until Phase 6a merges, the GUI thread's `worker[queue_fallback]` TypeError spam contaminates the audit's per-action profiling.
|
||||
|
||||
**Recommended sequence:**
|
||||
```
T0: Tier 1 approves follow-up track (decision: SHRINK to 6a + 6b + 6d)
T1: Tier 2 implements Phase 6a + 6b + 6d (~3 hours, ~16 commits)
T2: Tier 1 reviews + merges follow-up track
T3: Tier 1 launches code_path_audit_20260607
T4: Tier 2 implements Phase 3 + cross-phase coupling (separate track, post-audit)
```
|
||||
|
||||
### A6. New coordination with `any_type_componentization_20260621`
|
||||
|
||||
This audit now has **new dependencies** beyond the original 4 foundational tracks:
|
||||
|
||||
| Track | Status | Provides to this audit |
|---|---|---|
| `any_type_componentization_20260621` | Shipped 2026-06-21 (48/89 promoted) | The 5 dataclasses + 1 module; the 200-site dataclass-coverage baseline |
| `phase2_4_5_call_site_completion_20260621` | Spec'd 2026-06-21; not yet merged | The fix for the broadcast() TypeError; the "no-TypeError" assertion |
|
||||
|
||||
This audit is `blocked_by` both tracks and should start only after both have merged.
|
||||
|
||||
## Follow-up
|
||||
|
||||
- **`pipeline_runtime_profiling_20260607`** (the user-requested follow-up; NOT in this track): adds a runtime profiling harness using the existing `src/performance_monitor.py` + a per-action test fixture. Measures real costs for the 3 actions. Calibrates the heuristic cost model (`EXPENSIVE_THRESHOLD` + per-class weights). Catches "things that aren't easy to resolve statically" — import cost, JIT effects, GC pauses, C-extension call cost (imgui-bundle, tree-sitter native), decorator-driven dispatch. Output: `scripts/runtime_profiler.py` + updated `code_path_audit.py` cost model.
|
||||
|
||||
@@ -0,0 +1,636 @@
|
||||
# Track Specification: Code Path & Data Pipeline Audit v2
|
||||
|
||||
**Status:** Spec v2 (revised 2026-06-22; v1 was approved 2026-06-07 and revised 2026-06-08 with the post-4-tracks timing + 5-source framing)
|
||||
**Initialized:** 2026-06-07 (v1); 2026-06-22 (v2 supersedes v1)
|
||||
**Owner:** Tier 1 (spec) -> Tier 2 (plan + execution)
|
||||
**Priority:** High (foundational; enables follow-up pruning + per-pipeline refactor tracks)
|
||||
**Folder:** `conductor/tracks/code_path_audit_20260607/`
|
||||
**Files:** `spec.md` (v1; preserved), `spec_v2.md` (this file), `plan.md` (v1; preserved), `plan_v2.md` (after this spec is approved)
|
||||
|
||||
> **v2 revision note (2026-06-22).** The v1 spec.md (approved 2026-06-07; revised 2026-06-08) was never executed (no `state.toml`, no `metadata.json`, no `src/code_path_audit.py` in the working tree). The 14-day gap saw 4 foundational tracks ship (`qwen_llama_grok_integration_20260606`, `data_oriented_error_handling_20260606`, `data_structure_strengthening_20260606`, `mcp_architecture_refactor_20260606`), the entire 5-sub-track `result_migration` campaign ship (2026-06-16 through 2026-06-21; 100% complete), and the `nagent_review` corpus grow from v1 to v3.1. v2 re-scopes the audit from "expensive operations per action" to "data pipelines per aggregate" — the v1 framing was correct at the time (the 4 tracks were future) but is now stale. v2 also cross-validates the `data_structure_strengthening_20260606` + `data_oriented_error_handling_20260606` deductions directly, which v1 could not (those tracks didn't exist on 2026-06-07). See §"Why v2" below.
|
||||
|
||||
---
|
||||
|
||||
## Why v2 (the rationale for the revision)
|
||||
|
||||
The user's framing (2026-06-22):
|
||||
|
||||
> "The whole point of the code path audit is to audit all paths nearly in the ./src of the codebase. The main point of it is to identify data-oriented pipelines and what data aggregate they will be operating on. This will realize what the data strengthening just uncovered and cross-audit if its deductions on the data structures are accurate while also being able to utilize additional flexibility the data oriented error handling track has provided. We are entering a time where the codebase is getting heavily adjusted into a properly engineered machine with discernable working parts."
|
||||
>
|
||||
> "The cost of the pipeline is important, it should factor in what data needs to be componentized further vs which can be unified further into wider code paths handling larger fat structs."
|
||||
|
||||
**Three changes from v1 to v2:**
|
||||
|
||||
1. **Output structure: per-action -> per-data-aggregate.** v1 emitted 3 per-action profiles (`ai_message_lifecycle`, `discussion_save_load`, `gui_startup`). v2 emits 10+3 per-data-aggregate profiles (`Metadata`, `FileItem`, `FileItems`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `ToolDefinition`, `ToolCall`, `Result[T]` + the 3 candidate aggregates `ChatMessage`, `ToolSpec`, `ProviderHistory`). The per-action reports are preserved for backward compat but downgraded to "cross-references to the per-aggregate profiles."
|
||||
|
||||
2. **Cross-validation with the 5 existing audit scripts.** v1 was a standalone tool. v2 consumes JSON from `audit_weak_types`, `audit_exception_handling`, `audit_optional_in_3_files`, `audit_no_models_config_io`, `audit_main_thread_imports`, and the type registry (`generate_type_registry.py --json`). The v2 audit's per-aggregate `cross_audit_findings` + `result_coverage` + `type_alias_coverage` are the cross-checks of the 2 foundational tracks (`data_structure_strengthening` + `data_oriented_error_handling`).
|
||||
|
||||
3. **The decomposition-cost heuristic.** v1 had a "cost model" focused on expensive operations (file I/O, network, AST parse). v2 adds a `DecompositionCost` heuristic per aggregate that answers the user's question: "should this data be componentized further (split into smaller dataclasses) or unified further (combined into wider fat structs)?" The recommendation is grounded in 3 dimensions: access pattern (whole_struct / field_by_field / hot_cold_split / bulk_batched / mixed), frequency (hot / per_turn / per_discussion / per_request / cold / init / unknown), and shape (struct_field_count + struct_frozen).
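
The three dimensions in item 3 can be pictured as a small record plus a decision rule. The sketch below is only an illustration. The enum values come from the lists above; the field-count thresholds and the rules inside `recommend()` are invented placeholders, and the real heuristic is defined later in this spec. `struct_frozen` is carried but not used by these toy rules.

```python
from dataclasses import dataclass
from enum import Enum

# ASSUMPTION: placeholder thresholds, not values taken from the spec.
WIDE_STRUCT_FIELDS = 8
NARROW_STRUCT_FIELDS = 3


class AccessPattern(Enum):
    WHOLE_STRUCT = "whole_struct"
    FIELD_BY_FIELD = "field_by_field"
    HOT_COLD_SPLIT = "hot_cold_split"
    BULK_BATCHED = "bulk_batched"
    MIXED = "mixed"


class Frequency(Enum):
    HOT = "hot"
    PER_TURN = "per_turn"
    PER_DISCUSSION = "per_discussion"
    PER_REQUEST = "per_request"
    COLD = "cold"
    INIT = "init"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class DecompositionCost:
    aggregate: str
    access_pattern: AccessPattern
    frequency: Frequency
    struct_field_count: int
    struct_frozen: bool

    def recommend(self) -> str:
        """Return 'componentize', 'unify', or 'keep' (illustrative rules only)."""
        frequent = self.frequency in (Frequency.HOT, Frequency.PER_TURN)
        split_access = self.access_pattern in (
            AccessPattern.FIELD_BY_FIELD, AccessPattern.HOT_COLD_SPLIT)
        whole_access = self.access_pattern in (
            AccessPattern.WHOLE_STRUCT, AccessPattern.BULK_BATCHED)
        # ASSUMPTION: a wide struct whose fields are touched separately on a
        # frequent path is the clearest candidate for splitting.
        if frequent and split_access and self.struct_field_count >= WIDE_STRUCT_FIELDS:
            return "componentize"
        # ASSUMPTION: a narrow struct that is always consumed whole or in
        # bulk gains little from staying separate.
        if whole_access and self.struct_field_count <= NARROW_STRUCT_FIELDS:
            return "unify"
        return "keep"
```

For instance, `DecompositionCost("Example", AccessPattern.FIELD_BY_FIELD, Frequency.PER_TURN, 12, False).recommend()` returns `"componentize"` under these placeholder rules. The output is meant as a ranking input, not a measured saving.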
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Build `src/code_path_audit.py` v2 — a data-oriented static-analysis tool that audits the data pipelines in `src/` and produces per-data-aggregate profiles. The output (custom postfix `.dsl` data + markdown + prefix tree text, organized per-aggregate) is the artifact that informs per-aggregate refactor decisions. The actual code changes are follow-up tracks (the 3 high-priority candidates from `decomposition_matrix.md`).
|
||||
|
||||
The v2 audit's primary value is **cross-validation**: it consumes the JSON outputs of the 5 existing audit scripts and synthesizes them with the per-aggregate producer/consumer call graph. The result is a per-aggregate report that says "this aggregate has 12 weak-type sites (cross-checks `data_structure_strengthening`), 5 exception-handling sites (cross-checks `data_oriented_error_handling`), and 1 high-priority optimization candidate (decomposition direction: componentize)." The user reads one report per aggregate, not one per action.
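
Consuming the audit scripts' JSON could look roughly like the sketch below. It assumes each script's output is a single JSON object stored as `<name>.json` in one directory, which this spec does not confirm. `InputResult` is a hypothetical stand-in for the project's `Result[T]`, whose real definition lives in `src/result_types.py`, and the project's `error_handling.md` rules may call for a different shape than the plain `try`/`except` used here. Missing or malformed inputs yield empty data plus an error note, so the audit can continue.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class InputResult:
    # Hypothetical stand-in for Result[T]: data plus accumulated error notes.
    data: dict[str, Any] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)


def read_input_json(path: Path) -> InputResult:
    """Load one audit script's JSON; never raise on a missing or malformed file."""
    try:
        text = path.read_text(encoding="utf-8")
    except OSError as exc:
        return InputResult(errors=[f"{path.name}: unreadable ({type(exc).__name__})"])
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        return InputResult(errors=[f"{path.name}: malformed JSON at line {exc.lineno}"])
    if not isinstance(parsed, dict):
        return InputResult(errors=[f"{path.name}: expected a JSON object"])
    return InputResult(data=parsed)


def load_cross_audit_inputs(input_dir: Path, names: list[str]) -> dict[str, InputResult]:
    """Read every expected input; absent ones yield empty data plus an error note."""
    return {name: read_input_json(input_dir / f"{name}.json") for name in names}
```

The error notes are what a report could surface when an input is absent, while the empty `data` lets the per-aggregate synthesis proceed with an empty `cross_audit_findings` field.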
|
||||
|
||||
The v2 audit is **read-only** on `src/` (the only new file is the tool itself + its tests + the report). The MMA worker spawn action is **out of scope** (per v1; the user's "keeping MMA cold" directive from 2026-06-07 still stands). Runtime profiling is **out of scope** (deferred to `pipeline_runtime_profiling_20260607`); the v2's heuristic cost constants are recalibrated by that follow-up.
|
||||
|
||||
---
|
||||
|
||||
## Current State Audit (as of `7e61dd7d`)
|
||||
|
||||
`src/` has 65 `.py` files (per the result migration campaign's final state). The call graph is dense; per-aggregate traversal is what makes the analysis tractable. The 4 foundational tracks that v1 was deferred behind have all shipped. The 2 follow-up tracks (`any_type_componentization_20260621` + `phase2_4_5_call_site_completion_20260621`) are NOT on master (merged in `f914b2bc`, then reverted in `751b94d4`), so the v2 audit must tolerate their absence for an interim run.
|
||||
|
||||
### Already Implemented (DO NOT re-implement; KEEP / build on)
|
||||
|
||||
1. **`scripts/audit_main_thread_imports.py`** — the import-graph CI gate. The v2 audit consumes its JSON output (per the v2's `cross_audit_findings.import_graph` field). v2 does not modify this script.
|
||||
|
||||
2. **`scripts/audit_weak_types.py`** — the weak-types CI gate. v2 consumes its JSON output. v2 does not modify this script.
|
||||
|
||||
3. **`scripts/audit_exception_handling.py`** — the exception-handling CI gate (per `error_handling.md`). v2 consumes its JSON output. v2 does not modify this script.
|
||||
|
||||
4. **`scripts/audit_optional_in_3_files.py`** — the `Optional[T]` ban CI gate for the 3 refactored files (`mcp_client.py`, `ai_client.py`, `rag_engine.py`). v2 extends this script by 1 line (add `src/code_path_audit.py` to the baseline list); the convention is the same.
|
||||
|
||||
5. **`scripts/audit_no_models_config_io.py`** — the config-I/O ownership CI gate (per `conductor/code_styleguides/config_state_owner.md`). v2 consumes its JSON output. v2 does not modify this script.
|
||||
|
||||
6. **`scripts/generate_type_registry.py`** — the type-registry generator (per `conductor/code_styleguides/type_aliases.md`). v2 consumes its JSON output. v2 does not modify this script.
|
||||
|
||||
7. **`src/type_aliases.py`** — the 10 canonical TypeAliases + 1 NamedTuple (`FileItemsDiff`). v2 imports these; v2 does not redefine them. The 13 data aggregates (10 + 3 candidates) are referenced by their canonical names.
|
||||
|
||||
8. **`src/result_types.py`** — `Result[T]`, `ErrorInfo`, `NilPath`, `NilRAGState`, `ErrorKind`. v2 imports these; v2 does not redefine them. v2's public functions return `Result[T]` per the `error_handling.md` hard rule.
|
||||
|
||||
9. **`src/mcp_client.py:934-992` — `derive_code_path(target, max_depth=5)`.** A single-symbol recursive call tracer with text output. v2 builds on this pattern; the v2's PCG P1 (return-type pass) is the multi-symbol superset. The v1 spec's `CallGraph` is subsumed by the v2's `ProducerConsumerGraph` (function-to-aggregate edges, not function-to-function edges).
|
||||
|
||||
10. **`src/performance_monitor.py`** — runtime profiling with `monitor.scope("name")` + per-component hit counts + latencies. Used at runtime; the `pipeline_runtime_profiling_20260607` follow-up uses it to calibrate the v2's heuristic cost constants.
|
||||
|
||||
11. **`conductor/code_styleguides/data_oriented_design.md`**: the canonical DOD reference. v2's decomposition-cost heuristic is informed by the 8 defaults in §2 (especially "The common case dominates" + "Where there is one, there are many"). v2's per-aggregate access pattern classification follows the DOD's "Algorithms on data" framing.

12. **`conductor/code_styleguides/error_handling.md`**: the `Result[T]` convention. v2's public API returns `Result[T]` per the hard rule (§"Hard Rules" §"The 5 MUST-DO rules" + §"The 7 MUST-NOT-DO rules").

13. **`conductor/code_styleguides/type_aliases.md`**: the 10 TypeAliases + 1 NamedTuple. v2's per-aggregate `type_alias_coverage` metric is the cross-check of this convention.

14. **`conductor/code_styleguides/agent_memory_dimensions.md`**: the 4 mem dims (curation / discussion / RAG / knowledge). v2's `MemoryDim` classifier (§7.2.2) follows the styleguide's "shape rule" (a feature that wants one should use the matching dimension).

15. **`conductor/code_styleguides/feature_flags.md`**: the "delete to turn off" pattern. v2's `scripts/audit_code_path_audit_coverage.py` is a feature flag (the meta-audit); removing the file disables the meta-audit.

16. **`conductor/code_styleguides/cache_friendly_context.md`**: the stable-to-volatile cache ordering. v2's per-aggregate reports are a downstream consumer of the cache state (the `cache_friendly_context` is the "what stays in the LLM's context"; the v2's per-aggregate profile is the "what data flows through the LLM").

17. **`conductor/code_styleguides/knowledge_artifacts.md`**: the knowledge harvest pattern. v2's per-aggregate profiles are NOT a knowledge artifact (they're a curation artifact, per the 4-dim rule).

18. **`conductor/code_styleguides/rag_integration_discipline.md`**: the conservative-RAG rule. v2's `RAG` aggregate (RAGEngine state, indexed chunks) is classified by the `MemoryDim` classifier; the audit does not mutate RAG state.

19. **SDM docstrings** (`[C: ...]` / `[M: ...]` tags in `src/*.py` docstrings): pre-computed caller/mutation info. v2's PCG is a more rigorous version of what SDM already documents ad-hoc.

20. **`conductor/tracks/nagent_review_20260608/nagent_review_v3_1_20260620.md`**: the v3.1 nagent review. v2 references the v3.1 Candidates 27-30 (Markdown + custom DSL lock-in, per-turn ground-truth hook, dataset-curation track, cache TTL GUI hardening). The v2's custom postfix DSL is a direct application of Candidate 27 (markdown + custom DSL).

21. **`docs/reports/computational_shapes_ssdl_digest_20260608.md`**: the SSDL digest that informed the v1 spec's 5-source lens. v2 preserves the lens (the 6 SSDL primitives are referenced in the v2's per-aggregate access pattern + frequency classification).

22. **`docs/reports/RESULT_MIGRATION_CAMPAIGN_STATUS_20260619.md`**: the 100%-complete `result_migration` campaign (268 sites migrated + 9 legacy wrappers obliterated across 6 sub-tracks, 2026-06-16 through 2026-06-21). v2's `result_coverage` metric is the post-campaign check that the convention was applied uniformly across all 65 `src/` files.

23. **`docs/reports/ANY_TYPE_AUDIT_20260621.md`**: the 89-site audit (48 promoted + 41 deferred) that informed `any_type_componentization_20260621`. v2 references the 3 candidate aggregates (§3.1 `ToolSpec`, §3.2 `ChatMessage`, §3.3 `ProviderHistory`) as forward-compat placeholders.

24. **`docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md`**: the Tier 2's authoritative cost analysis of the 41 deferred Phase 3 sites (the 112 call sites in `_send_<provider>()` that would migrate to `ProviderHistory.append()`). v2's `ProviderHistory` candidate aggregate's placeholder is sourced from this report.

25. **`conductor/tracks/code_path_audit_20260607/spec.md`**: the v1 spec (preserved). v2's structure is informed by v1's 6-phase plan + 5-source framing + 3-action output.

26. **`conductor/tracks/code_path_audit_20260607/plan.md`**: the v1 plan (preserved, never executed). v2's plan is a fresh write.

### Gaps to Fill (This Track's Scope)

- A `ProducerConsumerGraph` builder for all of `src/` (3 AST passes: P1 return types, P2 parameter types, P3 field access). Multi-aggregate, machine-readable output.
- An `AccessPatternDetector` (5 patterns: whole_struct, field_by_field, hot_cold_split, bulk_batched, mixed). Per-`(function, aggregate)` classification with per-aggregate dominance rule (25% threshold).
- A `CallFrequencyEstimator` (7 frequencies: hot, per_turn, per_discussion, per_request, cold, init, unknown). Entry-point-based heuristic + manual override file.
- A `DecompositionCost` heuristic per aggregate (4 directions: componentize, unify, hold, insufficient_data). The 5-step `recommended_direction` logic per §7.5.
- A `MemoryDim` classifier per aggregate (7 dims: curation, discussion, rag, knowledge, config, control, unknown). Canonical mappings + file-of-origin heuristic + override.
- A per-aggregate profile data model (`AggregateProfile` + 9 supporting dataclasses + 5 enums: `AggregateKind`, `MemoryDim`, `AccessPattern`, `Frequency`, `RecommendedDirection`). All `frozen=True` per the immutability story. The 9 supporting dataclasses: `FunctionRef`, `AccessPatternEvidence`, `FrequencyEvidence`, `ResultCoverage`, `TypeAliasCoverage`, `CrossAuditFinding`, `CrossAuditFindings`, `DecompositionCost`, `OptimizationCandidate`.
- A cross-audit integration layer that consumes the 6 input JSON streams and produces per-aggregate `cross_audit_findings` + 2 coverage metrics (`result_coverage`, `type_alias_coverage`).
- The v2 postfix DSL (14 new tagged words + the v1's 7 preserved). The flat-section format (streamable, tag-scannable).
- Output: per-aggregate `.dsl` + `.md` + `.tree` files + 4 top-level rollup files (summary.md, cross_audit_summary.md, decomposition_matrix.md, candidates.md).
- A CLI (`python -m src.code_path_audit --all --date <date>`) and an MCP tool (`code_path_audit_v2(action=None) -> dict`).
- A meta-audit (`scripts/audit_code_path_audit_coverage.py`) that validates the v2 audit's output schema.
- The actual audit run on the 13 aggregates, with the report committed to `docs/reports/code_path_audit/<date>/`.
- A new styleguide (`conductor/code_styleguides/code_path_audit.md`) documenting the v2 audit's contract.
- A 1-line extension to `scripts/audit_optional_in_3_files.py` to include `src/code_path_audit.py` in the baseline.

---

## Goals

1. **Produce a queryable artifact per aggregate.** The custom postfix `.dsl` output is the source of truth; markdown + prefix tree text are for human review. Re-run after any `src/` change to see drift.
2. **Cross-validate the 2 foundational conventions.** Per-aggregate `result_coverage` (the `data_oriented_error_handling` cross-check) + per-aggregate `type_alias_coverage` (the `data_structure_strengthening` cross-check). The verdict at the top of `summary.md` says "VERIFIED" or "DRIFT DETECTED" with the specific evidence.
3. **Surface the top-N decomposition candidates per aggregate.** The `decomposition_matrix.md` ranks candidates by `estimated_savings_us × frequency_multiplier`. This is what the user uses to decide which refactor track to do next.
4. **Data-grounded design.** The audit's data structure is the spec; the heuristics and thresholds are module-level constants, tunable from one place (`scripts/code_path_audit_overrides.toml`).
5. **Reusable across aggregates.** The `build_pcg` + `classify_memory_dim` + `detect_access_pattern` + `estimate_call_frequency` + `compute_decomposition_cost` APIs take any aggregate (or "all 13"). Adding a 14th aggregate is 1 line in the `AGGREGATES` constant.
6. **Surface calibration gaps clearly.** When the static heuristic can't resolve a call (C-extension, decorator-driven dispatch, `getattr` magic), the report flags it as "unresolved" so the `pipeline_runtime_profiling_20260607` follow-up targets it.
7. **Tolerate the candidate aggregates' absence.** The 3 candidate aggregates (`ChatMessage`, `ToolSpec`, `ProviderHistory`) are NOT on master. The v2 audit produces placeholders with `is_candidate: True`; the report is still valid (the placeholders are clearly marked).

---

## Functional Requirements

The 11 public functions in `src/code_path_audit.py`. All return `Result[T]` per the `error_handling.md` hard rule (or return a deterministic `T` when no runtime failure is possible).

| # | Function | Returns | Failure mode |
|---|---|---|---|
| 1 | `run_audit(src_dir, audit_inputs_dir, output_dir, date)` | `Result[AuditSummary]` | 6 input JSONs may be missing or malformed; src/ may be unparseable |
| 2 | `build_pcg(src_dir)` | `Result[ProducerConsumerGraph]` | AST parse errors in src/ |
| 3 | `classify_memory_dim(aggregate, type_registry)` | `MemoryDim` | n/a (deterministic) |
| 4 | `detect_access_pattern(function_body, aggregate)` | `AccessPattern` | n/a (deterministic) |
| 5 | `estimate_call_frequency(function, call_graph)` | `Frequency` | n/a (deterministic) |
| 6 | `compute_decomposition_cost(profile)` | `DecompositionCost` | n/a (deterministic) |
| 7 | `read_input_json(path)` | `Result[dict]` | file not found; malformed JSON |
| 8 | `to_dsl_v2(profile)` | `str` | n/a (deterministic) |
| 9 | `parse_dsl_v2(text)` | `Result[dict]` | malformed DSL |
| 10 | `to_markdown(profile)` | `str` | n/a (deterministic) |
| 11 | `to_tree(profile)` | `str` | n/a (deterministic) |

Plus the CLI (`python -m src.code_path_audit ...`) and the MCP tool (`code_path_audit_v2`).

---

## Non-Functional Requirements

- **No new pip dependencies.** The v2 audit uses stdlib only (`ast`, `pathlib`, `json`, `dataclasses`, `tomllib` for the override file).
- **1-space indentation** for all Python code (per `conductor/workflow.md`).
- **CRLF line endings** on Windows.
- **Type hints required** for all public functions.
- **No comments in Python source** (documentation lives in `/docs`).
- **`Result[T]` return types** for all functions that can fail at runtime (per the `error_handling.md` hard rule). The new file is held to the same standard as the 3 refactored files.
- **`Optional[T]` return types are FORBIDDEN** in `src/code_path_audit.py`. Verified by the extended `scripts/audit_optional_in_3_files.py` (1-line extension).
- **Per-task commits** (1 task = 1 commit). Per `conductor/workflow.md` TDD protocol.
- **Per-task git notes** (each commit gets a `git notes add -m "..."` summary).
- **Coverage target: >80%** for `src/code_path_audit.py`. The 4 audit scripts (`audit_exception_handling.py --strict`, `audit_weak_types.py --strict`, `audit_main_thread_imports.py`, `audit_no_models_config_io.py`) are the verification gates.
- **The audit's runtime is bounded.** The full audit run against the real `src/` (65 files) completes in <60s on a developer machine. The unit + integration tests complete in <30s. The live_gui E2E tests are opt-in.

---

## Architecture

### 7.1 Public API (the 11 functions)

#### 7.1.1 `run_audit(...)`

The main entry point. Runs the full audit pipeline:

1. Read the 6 input JSON files from `audit_inputs_dir` (using `read_input_json` per function #7). Missing files are tolerated; the corresponding `cross_audit_findings` field is `()` and the markdown notes the absence.
2. Build the PCG (using `build_pcg` per function #2).
3. For each of the 13 aggregates, build the `AggregateProfile`:
   - `classify_memory_dim(aggregate, type_registry)` (function #3)
   - `detect_access_pattern(consumer, aggregate)` (function #4) for each consumer; aggregate to the per-aggregate pattern
   - `estimate_call_frequency(function, call_graph)` (function #5) for each producer + consumer; aggregate to the per-aggregate frequency
   - Cross-validate with the 6 input JSONs (compute `cross_audit_findings`, `result_coverage`, `type_alias_coverage`)
   - `compute_decomposition_cost(profile)` (function #6)
   - Synthesize `optimization_candidates` from the cross-audit findings + the decomposition cost
4. Render the 13 per-aggregate `.dsl` + `.md` + `.tree` files.
5. Render the 4 top-level rollup files (`summary.md`, `cross_audit_summary.md`, `decomposition_matrix.md`, `candidates.md`).
6. Return `Result[AuditSummary]` with the per-aggregate profiles + the rollup paths.

#### 7.1.2 The other 10 functions

Per the table in §"Functional Requirements." The deterministic functions (3, 4, 5, 6, 8, 10, 11) take already-parsed data and return data; no I/O. The boundary functions (1, 2, 7, 9) catch stdlib I/O + AST parse errors and convert to `ErrorInfo` per `error_handling.md` Pattern 2.
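
The split between deterministic and boundary functions can be illustrated with function #7. The following is a minimal sketch, not the project's implementation: the `Result` and `ErrorInfo` classes here are hypothetical stand-ins (their real shapes are defined by `error_handling.md`, which is not reproduced in this spec), and the error `kind` strings are invented for illustration. It also uses conventional four-space indentation and comments for readability rather than the project's source conventions.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ErrorInfo:
    # Hypothetical shape; the real fields come from error_handling.md.
    kind: str
    message: str


@dataclass(frozen=True)
class Result(Generic[T]):
    # Hypothetical stand-in for the project's Result[T] wrapper.
    value: Optional[T] = None
    error: Optional[ErrorInfo] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def read_input_json(path: Path) -> Result[dict]:
    """Boundary function: turn I/O and parse failures into ErrorInfo values."""
    try:
        text = path.read_text(encoding="utf-8")
    except OSError as exc:
        # Covers "file not found" as well as permission errors.
        return Result(error=ErrorInfo("io", f"{path}: {exc}"))
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return Result(error=ErrorInfo("malformed_json", f"{path}: {exc}"))
    if not isinstance(data, dict):
        return Result(error=ErrorInfo("malformed_json", f"{path}: top level is not an object"))
    return Result(value=data)
```

A caller such as `run_audit` would branch on `result.ok` instead of wrapping the call in `try/except`, which keeps exception handling confined to the boundary functions.
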

### 7.2 The 4 static analyses (PCG, MemoryDim, APD, CFE)

#### 7.2.1 `ProducerConsumerGraph` (PCG): pipeline discovery

**Three AST passes over `src/`:**

| Pass | What it finds | Output |
|---|---|---|
| **P1: Return types** | `FunctionDef.returns` annotation -> `Result[T]` -> producer of `T`; or direct `T` (alias or dataclass) -> producer of `T`. | `(function, aggregate, "producer", confidence="high")` edges |
| **P2: Parameter types** | `FunctionDef.args` annotation -> parameter is a TypeAlias or dataclass -> consumer of that aggregate. A `dict[str, Any]` parameter is NOT a consumer edge in this pass (it is resolved by P3 instead). | `(function, aggregate, "consumer", confidence="high")` edges |
| **P3: Field access** | Every `payload['key']` and `payload.attr` in the function body. The audit consults `scripts/generate_type_registry.py --json` to map `key` to a known field of a known aggregate. If `key` is unique to one aggregate (e.g., `'vision'` -> `VendorCapabilities`), the consumer edge is high-confidence. If `key` is ambiguous (e.g., `'path'` appears in both `FileItem` and `ContextPreset`), the edge is low-confidence and the markdown flags it. | `(function, aggregate, "consumer", confidence=...)` edges |

**Edge cases the algorithm handles:**

- **Constructor calls** (`dict(...)`, `SomeDataclass(...)`, `SomeNamedTuple(...)`) inside a function body: the function is a producer at the call site. The audit uses the name of the callee (`dict`, `SomeDataclass`) to identify the aggregate.
- **Re-exports** (`from src.type_aliases import Metadata`): the audit uses `import` resolution to find the canonical TypeAlias definition, not the re-exported name.
- **Decorator-wrapped methods** (e.g., `@imscope`): the audit walks through the decorator; if the decorator is a known passthrough (per `scripts/code_path_audit_overrides.toml`), the method body is processed normally. If unknown, the function is marked "unresolved" and the markdown notes it (matches the v1 spec's `unresolved_calls` behavior).
- **Re-exports across sub-MCPs** (`mcp_client.py` re-exports `mcp_file_io.read_file_result`): the audit uses the **definition** site, not the re-export site, for the producer. The re-export site gets a "passthrough" `FunctionRef` with `role="consumer"`.

**Output:** A bipartite graph keyed by `(function_fqname, aggregate_name)` -> `FunctionRef` + role.
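
Pass P1 can be sketched with the stdlib `ast` module. This is a simplified illustration under stated assumptions, not the audit's implementation: `KNOWN_AGGREGATES` is a hypothetical subset, `producer_edges` is an invented helper name, string annotations and import resolution are ignored, and methods are named by module rather than by class. It assumes Python 3.9+, where `ast.Subscript.slice` is the subscript expression itself.

```python
import ast

# Illustrative subset; the real list would come from the AGGREGATES constant.
KNOWN_AGGREGATES = {"Metadata", "FileItem", "FileItems", "History"}


def _annotation_aggregate(node):
    """Return the aggregate named by a return annotation, unwrapping Result[T]."""
    if (
        isinstance(node, ast.Subscript)
        and isinstance(node.value, ast.Name)
        and node.value.id == "Result"
    ):
        node = node.slice  # Python 3.9+: the slice is the inner expression
    if isinstance(node, ast.Name) and node.id in KNOWN_AGGREGATES:
        return node.id
    return None


def producer_edges(source: str, module: str):
    """P1 sketch: collect (function, aggregate, 'producer', 'high') edges."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.returns is None:
            continue  # unannotated functions produce no P1 edge
        aggregate = _annotation_aggregate(node.returns)
        if aggregate is not None:
            edges.append((f"{module}.{node.name}", aggregate, "producer", "high"))
    return edges
```

P2 would follow the same shape, reading `node.args` annotations instead of `node.returns` and emitting `"consumer"` edges.
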

#### 7.2.2 `MemoryDim` classifier

A function `classify_memory_dim(aggregate_name, producer_functions, type_registry) -> MemoryDim` that consults:

1. **Canonical mappings** (hardcoded in `code_path_audit.py`):
   - `Metadata`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History` -> `discussion` (per-turn conversational)
   - `FileItem`, `FileItems` -> `curation` (per-file structural)
   - `ToolDefinition`, `ToolCall` -> `control` (these propagate through the LLM-tool pipeline)
   - `Result`, `ErrorInfo` -> `control` (propagation primitives)
2. **File-of-origin heuristic:** if the aggregate's primary producer is in `src/aggregate.py`, `src/context_presets.py`, `src/views.py` -> `curation`. If in `src/ai_client.py`, `src/history.py`, `src/app_controller.py` (in the discussion-handling sections) -> `discussion`. If in `src/rag_engine.py` -> `rag`. If in `src/knowledge*.py` (if exists) -> `knowledge`. If in `src/paths.py`, `src/presets.py`, `src/personas.py` -> `config`.
3. **Override file:** `scripts/code_path_audit_overrides.toml`, with a `[memory_dim]` table containing `<aggregate> = "<dim>"` entries for cases the heuristic gets wrong.

**When the classifier can't determine:** the result is `"unknown"` and the markdown flags it for human review (the override file is the fix).
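
The three lookup tiers can be sketched as follows. This is an illustration only: the signature is simplified to take the primary producer's file path and an already-parsed override mapping (both hypothetical parameters), the override is assumed to take precedence since the spec describes it as the fix for wrong heuristics, and `src/app_controller.py` is treated as wholly discussion-related rather than section by section.

```python
from enum import Enum
from fnmatch import fnmatchcase


class MemoryDim(str, Enum):
    CURATION = "curation"
    DISCUSSION = "discussion"
    RAG = "rag"
    KNOWLEDGE = "knowledge"
    CONFIG = "config"
    CONTROL = "control"
    UNKNOWN = "unknown"


# Tier 1: canonical mappings, as listed in the spec.
CANONICAL = {
    **dict.fromkeys(
        ["Metadata", "CommsLogEntry", "CommsLog", "HistoryMessage", "History"],
        MemoryDim.DISCUSSION,
    ),
    **dict.fromkeys(["FileItem", "FileItems"], MemoryDim.CURATION),
    **dict.fromkeys(["ToolDefinition", "ToolCall", "Result", "ErrorInfo"], MemoryDim.CONTROL),
}

# Tier 2: file-of-origin patterns, checked in order.
FILE_OF_ORIGIN = [
    ("src/aggregate.py", MemoryDim.CURATION),
    ("src/context_presets.py", MemoryDim.CURATION),
    ("src/views.py", MemoryDim.CURATION),
    ("src/ai_client.py", MemoryDim.DISCUSSION),
    ("src/history.py", MemoryDim.DISCUSSION),
    ("src/app_controller.py", MemoryDim.DISCUSSION),
    ("src/rag_engine.py", MemoryDim.RAG),
    ("src/knowledge*.py", MemoryDim.KNOWLEDGE),
    ("src/paths.py", MemoryDim.CONFIG),
    ("src/presets.py", MemoryDim.CONFIG),
    ("src/personas.py", MemoryDim.CONFIG),
]


def classify_memory_dim(aggregate_name, primary_producer_file, overrides):
    """Sketch: override first (assumed precedence), then canonical, then file of origin."""
    if aggregate_name in overrides:
        return MemoryDim(overrides[aggregate_name])
    if aggregate_name in CANONICAL:
        return CANONICAL[aggregate_name]
    for pattern, dim in FILE_OF_ORIGIN:
        if fnmatchcase(primary_producer_file, pattern):
            return dim
    return MemoryDim.UNKNOWN
```

With this shape, an aggregate the heuristics cannot place returns `MemoryDim.UNKNOWN`, and adding one entry to the override mapping resolves it without touching the code.
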

#### 7.2.3 `AccessPatternDetector` (APD): per-`(function, aggregate)` access pattern

For each `(function, aggregate)` pair:

1. Walk the function body. Record every `payload['key']` / `payload.attr` access into a `Counter[str]` keyed by `key`.
2. Detect these patterns:
   - `whole_struct`: the function reads `payload` directly (passes to another function; `print(payload)`; `return payload`) OR accesses <=1 distinct key.
   - `field_by_field`: the function accesses >=3 distinct keys AND no `whole_struct` access in the body.
   - `hot_cold_split`: the function accesses 1-2 keys in the function's hot path (the top-level statement body) AND 2+ additional keys inside `if/else` branches.
   - `bulk_batched`: the function is `for x in payload_list: <op>` where `payload_list: list[aggregate]` and the body accesses fields uniformly across iterations.
   - `mixed`: none of the above patterns dominate (each pattern has <60% share of the function's accesses).
3. Aggregate the per-function patterns to the aggregate level: the dominant pattern across all consumers, with the rule that the dominant pattern must have >=25% share of consumers. If no pattern has >=25%, the aggregate-level result is `mixed`.

**The threshold constants** are module-level in `code_path_audit.py`:

```python
WHOLE_STRUCT_KEY_THRESHOLD: int = 1
FIELD_BY_FIELD_KEY_THRESHOLD: int = 3
MIXED_DOMINANCE_THRESHOLD: float = 0.6
AGGREGATE_LEVEL_DOMINANCE_THRESHOLD: float = 0.25
```

The override file can change them per-aggregate.
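
A reduced version of steps 1 to 3 can be sketched with `ast` and `Counter`. The sketch below only distinguishes `whole_struct`, `field_by_field`, and `mixed`; the `hot_cold_split` and `bulk_batched` detectors are omitted, and the helper names (`key_counts`, `classify_function`, `dominant_pattern`) and the fixed parameter name `payload` are assumptions for illustration. Tie-breaking at the aggregate level is not specified above, so this version takes whichever tied pattern was seen first. It assumes Python 3.9+.

```python
import ast
from collections import Counter

WHOLE_STRUCT_KEY_THRESHOLD = 1
FIELD_BY_FIELD_KEY_THRESHOLD = 3
AGGREGATE_LEVEL_DOMINANCE_THRESHOLD = 0.25


def key_counts(func_source, param="payload"):
    """Count payload['key'] / payload.attr accesses and detect bare uses of the parameter."""
    tree = ast.parse(func_source)
    counts = Counter()
    field_bases = set()  # ids of Name nodes that are the base of a field access
    for node in ast.walk(tree):
        base = getattr(node, "value", None)
        if not (isinstance(base, ast.Name) and base.id == param):
            continue
        if isinstance(node, ast.Subscript):
            field_bases.add(id(base))
            if isinstance(node.slice, ast.Constant) and isinstance(node.slice.value, str):
                counts[node.slice.value] += 1
        elif isinstance(node, ast.Attribute):
            field_bases.add(id(base))
            counts[node.attr] += 1
    # Any remaining load of the parameter (print(payload), return payload) is a whole-struct use.
    bare = any(
        isinstance(n, ast.Name) and n.id == param and id(n) not in field_bases
        for n in ast.walk(tree)
    )
    return counts, bare


def classify_function(func_source, param="payload"):
    counts, bare = key_counts(func_source, param)
    if bare or len(counts) <= WHOLE_STRUCT_KEY_THRESHOLD:
        return "whole_struct"
    if len(counts) >= FIELD_BY_FIELD_KEY_THRESHOLD:
        return "field_by_field"
    return "mixed"  # exactly 2 keys: left to the detectors this sketch omits


def dominant_pattern(per_function_patterns):
    """Aggregate-level rule: the most common pattern wins if it reaches the 25% share."""
    if not per_function_patterns:
        return "mixed"
    pattern, n = Counter(per_function_patterns).most_common(1)[0]
    share = n / len(per_function_patterns)
    return pattern if share >= AGGREGATE_LEVEL_DOMINANCE_THRESHOLD else "mixed"
```

Note that method calls such as `payload.get("x")` are counted here as an access to the attribute `get`; a fuller detector would need to treat those separately.
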

#### 7.2.4 `CallFrequencyEstimator` (CFE): per-function frequency

Build the v1 call graph. For each function:

1. **Entry point detection** (AST-based):
   - Functions called from `__init__` of `App` (in `src/gui_2.py`) or `AppController` (in `src/app_controller.py`) or from `main()` (in `gui.py`) -> `init`.
   - Functions called from the ImGui render loop (`render_*` functions, or functions called within `if imgui.begin_main_tool_bar():` etc.) -> `hot`.
   - Functions called from the AI send path (`_send_<provider>_result`, `process_user_request`) -> `per_turn`.
   - Functions called from `reset_session`, `cleanup`, `_classify_*_error` -> `cold`.
   - Functions called from `save_project`, `load_project`, `save_snapshot` -> `per_discussion`.
   - Functions called from `_api_*` FastAPI handlers -> `per_request`.
2. **Override file:** `scripts/code_path_audit_overrides.toml`, with a `[frequency]` table containing `"<function_fqname>" = "<freq>"` entries for manual corrections (the key is quoted because fully qualified names contain dots).
3. **Aggregate level:** the dominant frequency across all producers+consumers, with `unknown` if no dominant.
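
The entry-point heuristic and the override step can be sketched as name matching on a function's direct callers. This is a rough illustration under several assumptions: callers are matched by their unqualified name only (so the `App` / `AppController` distinction and the `imgui` block detection are not modelled), the first matching caller wins, and "dominant" at the aggregate level is read as a unique most-common value. None of these choices is fixed by the spec, and the helper names are hypothetical.

```python
import re
from collections import Counter

# (pattern on the caller's unqualified name, frequency); first match wins.
ENTRY_POINT_RULES = [
    (re.compile(r"^(__init__|main)$"), "init"),
    (re.compile(r"^render_"), "hot"),
    (re.compile(r"^_send_.+_result$|^process_user_request$"), "per_turn"),
    (re.compile(r"^(reset_session|cleanup)$|^_classify_.+_error$"), "cold"),
    (re.compile(r"^(save_project|load_project|save_snapshot)$"), "per_discussion"),
    (re.compile(r"^_api_"), "per_request"),
]


def estimate_call_frequency(function_fqname, callers, overrides):
    """Sketch: a manual override beats the entry-point heuristic; no match yields 'unknown'."""
    if function_fqname in overrides:
        return overrides[function_fqname]
    for caller in callers:
        name = caller.rsplit(".", 1)[-1]
        for pattern, frequency in ENTRY_POINT_RULES:
            if pattern.search(name):
                return frequency
    return "unknown"


def aggregate_frequency(frequencies):
    """Aggregate level: the unique most common frequency, else 'unknown' (assumed reading)."""
    ranked = Counter(frequencies).most_common(2)
    if not ranked:
        return "unknown"
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return "unknown"  # a tie means no single dominant frequency
    return ranked[0][0]
```
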

### 7.3 The 6 input streams

The v2 audit consumes JSON from 6 sources. All 6 are in `tests/artifacts/audit_inputs/` (gitignored per `test_sandbox.md`):

| Input | Path | Producer | Shape (essential fields) |
|---|---|---|---|
| 1 | `audit_weak_types.json` | `scripts/audit_weak_types.py --json` | `{"findings": [{"file", "line", "type_string", "category"}]}` |
| 2 | `audit_exception_handling.json` | `scripts/audit_exception_handling.py --json` | `{"findings": [{"file", "line", "category", "function", "class", "body_summary"}]}` |
| 3 | `audit_optional_in_3_files.json` | `scripts/audit_optional_in_3_files.py --json` | `{"findings": [{"file", "line", "return_type", "function"}]}` (3 baseline files only) |
| 4 | `audit_no_models_config_io.json` | `scripts/audit_no_models_config_io.py --json` | `{"findings": [{"file", "line", "function", "config_path"}]}` |
| 5 | `audit_main_thread_imports.json` | `scripts/audit_main_thread_imports.py --json` | `{"findings": [{"file", "line", "imported_module", "thread"}]}` |
| 6 | `type_registry.json` | `scripts/generate_type_registry.py --json` | `{"types": {"<aggregate_name>": {"file", "fields": [{"name", "type", "optional"}]}}}` |

**Tolerance:** if any input is missing or malformed, the audit continues with the corresponding `cross_audit_findings` field set to `()` (empty tuple) and the markdown notes the missing input. The audit does NOT fail on missing inputs.
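
The tolerance rule can be sketched for the five `findings`-shaped inputs as below. This is a hedged illustration: `load_findings` and the wording of the notes are hypothetical, and `type_registry.json` is left out because it uses a `types` key rather than `findings` (it would follow the same pattern).

```python
import json
from pathlib import Path

# The five inputs that share the {"findings": [...]} shape.
FINDINGS_INPUTS = (
    "audit_weak_types.json",
    "audit_exception_handling.json",
    "audit_optional_in_3_files.json",
    "audit_no_models_config_io.json",
    "audit_main_thread_imports.json",
)


def load_findings(audit_inputs_dir: Path):
    """Return ({filename: tuple of findings}, [notes]); never raises on a bad input."""
    findings = {}
    notes = []
    for name in FINDINGS_INPUTS:
        try:
            data = json.loads((audit_inputs_dir / name).read_text(encoding="utf-8"))
            rows = data["findings"]
            if not isinstance(rows, list):
                raise TypeError("'findings' is not a list")
        except (OSError, ValueError, KeyError, TypeError) as exc:
            # Missing file, bad encoding, malformed JSON, or wrong shape:
            # record an empty tuple and a note for the markdown report.
            findings[name] = ()
            notes.append(f"{name}: missing or malformed ({type(exc).__name__})")
            continue
        findings[name] = tuple(rows)
    return findings, notes
```

The notes list is what a markdown renderer would use to state which inputs were absent, so an empty findings tuple is never mistaken for a clean audit.
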

### 7.4 The 13 data aggregates (10 + 3 candidates)

The 10 in-scope aggregates are the canonical TypeAliases from `src/type_aliases.py`:

```
1. Metadata (the root alias; 79 sites in src/ai_client.py alone)
2. FileItem (single file in context)
3. FileItems (list of files in context; the most common weak pattern)
4. CommsLogEntry (single entry in AI comms log)
5. CommsLog (the comms log ring buffer)
6. HistoryMessage (single message in provider history; UI layer)
7. History (the conversation history)
8. ToolDefinition (single tool definition)
9. ToolCall (single tool call from the model)
10. Result[T] (the success-or-failure wrapper; the audit's coverage metric)
```

The 3 candidate aggregates are from `any_type_componentization_20260621` §3 (NOT on master; the v2 audit is forward-compatible with their absence):

```
11. ToolSpec / ToolParameter (would replace ToolDefinition's 45 dict instances; §3.1)
12. ChatMessage / UsageStats / NormalizedResponse (would replace HistoryMessage + tool-call dicts; §3.2)
13. ProviderHistory (would replace the 7 per-provider history lists + locks; §3.3 + PHASE3_HYPOTHETICAL_PROMOTION)
```

When the candidate is absent (the master state), the v2 audit produces a placeholder with `is_candidate: True` and all metrics set to 0. The `candidates.md` rollup explains the placeholder status.

### 7.5 The decomposition cost formula

**Constants (module-level, tunable):**

```python
MICROSECOND_BUDGET_PER_LLM_TURN: int = 50_000 # per a real Anthropic Sonnet call's worth of work
BRANCH_DISPATCH_OVERHEAD_US: int = 100 # cost per if/else branch decision on a struct field
ALLOCATION_OVERHEAD_US: int = 50 # cost per SomeDataclass(...) construction
DEAD_FIELD_COST_PER_FIELD_US: int = 10 # wasted allocation per unused field
COMPONENTIZATION_INDIRECTION_US: int = 200 # cost of splitting a hot struct into 2
UNIFICATION_INDIRECTION_US: int = 300 # cost of merging 2 hot structs into 1
```

**Per-call cost formula:**

```
per_call_cost_us =
    (struct_field_count * ALLOCATION_OVERHEAD_US)
  + (max(fields_accessed_in_hot_path, 1) * BRANCH_DISPATCH_OVERHEAD_US)
  + (struct_frozen ? 20 : 0)
```

**Current total cost** (per unit of frequency):

```
current_total_us = per_call_cost_us * frequency_multiplier
  where frequency_multiplier is:
    hot = 60 (60 fps)
    per_turn = 1
    per_request = 1
    per_discussion = 1
    cold = 0.01
    init = 0.001
    unknown = 0 (no estimate; mark insufficient_data)
```
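
The two formulas above translate directly into Python. The sketch below is a transcription for illustration; the constant name `FROZEN_OVERHEAD_US` is hypothetical (the formula only gives the literal `20`), and the example structs are made up.

```python
ALLOCATION_OVERHEAD_US = 50
BRANCH_DISPATCH_OVERHEAD_US = 100
FROZEN_OVERHEAD_US = 20  # hypothetical name for the literal 20 in the formula

FREQUENCY_MULTIPLIER = {
    "hot": 60,
    "per_turn": 1,
    "per_request": 1,
    "per_discussion": 1,
    "cold": 0.01,
    "init": 0.001,
    "unknown": 0,  # no estimate; the direction logic marks insufficient_data
}


def per_call_cost_us(struct_field_count, fields_accessed_in_hot_path, struct_frozen):
    """Per-call cost: allocation per field, dispatch per hot field (at least one), frozen surcharge."""
    return (
        struct_field_count * ALLOCATION_OVERHEAD_US
        + max(fields_accessed_in_hot_path, 1) * BRANCH_DISPATCH_OVERHEAD_US
        + (FROZEN_OVERHEAD_US if struct_frozen else 0)
    )


def current_total_us(per_call, frequency):
    """Scale the per-call cost by the frequency multiplier."""
    return per_call * FREQUENCY_MULTIPLIER[frequency]


# Made-up example: a 12-field mutable struct with 2 fields read in the hot path.
wide = per_call_cost_us(12, 2, struct_frozen=False)   # 12*50 + 2*100 + 0 = 800
# Made-up example: a 4-field frozen struct with no hot-path field reads.
small = per_call_cost_us(4, 0, struct_frozen=True)    # 4*50 + 1*100 + 20 = 320

print(current_total_us(wide, "hot"))        # 800 * 60 = 48000
print(current_total_us(small, "per_turn"))  # 320 * 1 = 320
```

The `max(..., 1)` term means even a struct with no hot-path field reads is charged one dispatch, so the per-call cost is never driven by allocation alone.
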

**Componentize savings formula:**

```
componentize_savings_us = current_total_us * componentize_factor
  where componentize_factor is:
    if access_pattern == "field_by_field" and struct_field_count > 10 and not struct_frozen:
      componentize_factor = 0.30
    elif access_pattern == "hot_cold_split" and hot_field_count <= 2 and struct_field_count > 5:
      componentize_factor = 0.40
    elif access_pattern == "whole_struct" or access_pattern == "bulk_batched":
      componentize_factor = -0.20
    elif access_pattern == "mixed":
      componentize_factor = 0
    else:
      componentize_factor = -0.10
```

**Unify savings formula:**

```
unify_savings_us = current_total_us * unify_factor
  where unify_factor is:
    if access_pattern == "bulk_batched" and struct_field_count <= 3 and struct_frozen:
      unify_factor = 0.25
    elif access_pattern == "whole_struct" and struct_field_count <= 5 and struct_frozen:
      unify_factor = 0.15
    elif access_pattern == "field_by_field":
      unify_factor = -0.30
    elif access_pattern == "hot_cold_split":
      unify_factor = -0.10
    elif access_pattern == "mixed":
      unify_factor = 0
    else:
      unify_factor = 0.05
```
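
Both factor tables can be written as small pure functions. The sketch below transcribes the branches in the order given; the function names and the example inputs are illustrative, not taken from the codebase.

```python
def componentize_factor(access_pattern, struct_field_count, hot_field_count, struct_frozen):
    """Fraction of current_total_us saved (negative = cost) by componentizing."""
    if access_pattern == "field_by_field" and struct_field_count > 10 and not struct_frozen:
        return 0.30
    if access_pattern == "hot_cold_split" and hot_field_count <= 2 and struct_field_count > 5:
        return 0.40
    if access_pattern in ("whole_struct", "bulk_batched"):
        return -0.20
    if access_pattern == "mixed":
        return 0.0
    return -0.10


def unify_factor(access_pattern, struct_field_count, struct_frozen):
    """Fraction of current_total_us saved (negative = cost) by unifying."""
    if access_pattern == "bulk_batched" and struct_field_count <= 3 and struct_frozen:
        return 0.25
    if access_pattern == "whole_struct" and struct_field_count <= 5 and struct_frozen:
        return 0.15
    if access_pattern == "field_by_field":
        return -0.30
    if access_pattern == "hot_cold_split":
        return -0.10
    if access_pattern == "mixed":
        return 0.0
    return 0.05


# Made-up example: a 12-field mutable struct, read field by field, costing 48_000 us in total.
current_total_us = 48_000
componentize_savings_us = current_total_us * componentize_factor("field_by_field", 12, 0, False)
unify_savings_us = current_total_us * unify_factor("field_by_field", 12, False)
```

For this example the componentize estimate is positive (30% of the total) while the unify estimate is negative by the same magnitude, so the two directions pull apart exactly as the tables intend. A frozen 12-field `field_by_field` struct falls through to the `-0.10` default for componentization.
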

**`recommended_direction` logic:**

```
if access_pattern == "field_by_field" and struct_field_count > 10:
  -> "componentize" (rationale cites the dead-field count)
elif access_pattern == "hot_cold_split" and hot_field_count <= 2:
  -> "componentize" (split into hot + cold structs)
elif access_pattern == "bulk_batched" and struct_field_count <= 3:
  -> "unify" (small struct; wider bulk path is fine)
elif access_pattern == "whole_struct" and struct_field_count <= 5:
  -> "unify" (small struct; less dispatch overhead)
elif access_pattern == "mixed" or frequency == "unknown":
  -> "insufficient_data" (recommend runtime profiling per pipeline)
elif struct_frozen and access_pattern == "whole_struct":
  -> "hold" (frozen + whole_struct is the ideal shape)
else:
  -> "hold"
```
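
Transcribed into Python with the branch order preserved, the logic above looks like the following sketch (the function signature is an assumption; the spec's version takes a profile).

```python
def recommended_direction(access_pattern, frequency, struct_field_count, hot_field_count, struct_frozen):
    """Branch-for-branch transcription of the decision list; earlier branches win."""
    if access_pattern == "field_by_field" and struct_field_count > 10:
        return "componentize"
    if access_pattern == "hot_cold_split" and hot_field_count <= 2:
        return "componentize"
    if access_pattern == "bulk_batched" and struct_field_count <= 3:
        return "unify"
    if access_pattern == "whole_struct" and struct_field_count <= 5:
        return "unify"
    if access_pattern == "mixed" or frequency == "unknown":
        return "insufficient_data"
    if struct_frozen and access_pattern == "whole_struct":
        return "hold"  # same direction as the default; only the rationale differs
    return "hold"
```

Two consequences of the ordering are worth noting when reading the output. First, because the `frequency == "unknown"` check comes fifth, an aggregate with unknown frequency that matches one of the first four branches still receives `componentize` or `unify`, even though its frequency multiplier is 0. Second, the frozen `whole_struct` branch is only reachable for structs with more than 5 fields, since smaller ones are caught by the `unify` branch first.
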

**The auto-generated rationale string:**

```
"<aggregate_name>: access_pattern=<pattern>, frequency=<freq>, struct_field_count=<N>, struct_frozen=<bool>.
Recommended: <direction> because <one-sentence justification>. Estimated savings: <X>us per <freq unit>."
```

The Tier 2 Tech Lead can override the rationale per-aggregate in `scripts/code_path_audit_overrides.toml`.

---

## Output Format

### 8.1 The 13 per-aggregate files (DSL + markdown + tree)

For each aggregate:

**`*.dsl`**: the postfix DSL (flat sections, streamable, tag-scannable). The canonical artifact.

**`*.md`**: human-readable markdown, 10 sections (Header, Pipeline summary, Access pattern, Frequency, Result coverage, Type alias coverage, Cross-audit findings, Decomposition cost, Optimization candidates, Verdict).

**`*.tree`**: prefix tree text view (box-drawing, recursive walker). Compact, scannable.
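
A box-drawing recursive walker of the kind described for the `.tree` view can be sketched in a few lines. The nested-dict input shape and the function name `to_tree_lines` are assumptions for illustration; the actual `to_tree(profile)` takes an `AggregateProfile`, whose layout is not shown in this section.

```python
def to_tree_lines(node, prefix=""):
    """Render a nested dict (label -> children dict) as box-drawing tree lines."""
    lines = []
    items = list(node.items())
    for index, (label, child) in enumerate(items):
        last = index == len(items) - 1
        # The final sibling gets a corner; earlier siblings get a tee.
        lines.append(f"{prefix}{'└── ' if last else '├── '}{label}")
        if isinstance(child, dict) and child:
            # Keep the vertical rule running only while siblings remain below.
            lines.extend(to_tree_lines(child, prefix + ("    " if last else "│   ")))
    return lines


# Made-up example data for one aggregate.
tree = {
    "producers": {"load_metadata": {}},
    "consumers": {"render": {}, "send": {}},
}
print("\n".join(["Metadata"] + to_tree_lines(tree)))
# Metadata
# ├── producers
# │   └── load_metadata
# └── consumers
#     ├── render
#     └── send
```
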

### 8.2 The 4 top-level rollups

**`summary.md`**: the 30-second view + the 4-mem-dim rollup + the verdict (the "VERIFIED" or "DRIFT DETECTED" line).

**`cross_audit_summary.md`**: the per-aggregate cross-audit hits table (5 columns, one per input audit script) + the top-5 follow-up candidates + the cross-validation verdict.

**`decomposition_matrix.md`**: the ranked list of optimization candidates across all aggregates, sorted by `estimated_savings_us * frequency_multiplier`. The "what should we do next" view.

**`candidates.md`**: the 3 candidate aggregates (forward-compat placeholders). Explains the placeholder status.

### 8.3 The v1 artifacts (preserved for backward compat)

- `docs/reports/code_path_audit/<date>/call_graph.dsl`: the v1 full call graph.
- `docs/reports/code_path_audit/<date>/actions/ai_message_lifecycle.{dsl,md,mmd}`: the v1 per-action reports, downgraded to "cross-references to the per-aggregate profiles."

### 8.4 The audit_inputs/ dir (gitignored)

The 6 input JSON files consumed (for reproducibility; same dir name as `tests/artifacts/audit_inputs/` per `test_sandbox.md`).

---

## Verification (10-phase TDD test plan)

Per `conductor/workflow.md` TDD red-first protocol. Each phase has 1 setup commit + N test commits + 1 refactor commit.

| Phase | What | Test count | Audit gate |
|---|---|---:|---|
| 1. Data model | `AggregateProfile` + 9 supporting dataclasses + 5 enums (per §7.1 / §7.2) | 10 | n/a |
| 2. PCG (P1+P2+P3) | The 3 AST passes; producer/consumer edges | 7 | `audit_main_thread_imports.py` |
| 3. APD | The 5 access patterns + the 25% dominance rule | 6 | n/a |
| 4. CFE | The 6 entry-point detectors + the override file | 6 | n/a |
| 5. Decomposition cost | The 4-direction logic + the auto-generated rationale | 6 | n/a |
| 6. Cross-audit integration | The 6 input JSON contracts + the 3-tier mapping | 7 | `audit_weak_types.py --strict` |
| 7. v2 DSL | The 14 new tagged words + the round-trip + backward compat | 5 | n/a |
| 8. Markdown / tree renderers | The 10 markdown sections + the box-drawing tree | 4 | n/a |
| 9. Integration tests | The synthetic src/ fixture + the real src/ run | 7 | All 4 audit scripts pass `--strict` |
| 10. Live_gui E2E (opt-in) | The MCP tool via the `live_gui` fixture | 2 | All 4 audit scripts pass `--strict` |

**Total: 51 unit tests (phases 1-8) + 7 integration tests + 2 live_gui tests = 60 tests.**

### 9.1 The synthetic src/ fixture

`tests/fixtures/synthetic_src/`: 6 files defining 3 aggregates (`Metadata`, `FileItems`, `History`) + 6 functions (2 producers, 4 consumers). The integration tests assert the exact expected profiles.

### 9.2 The 6 input JSON fixture

`tests/fixtures/audit_inputs/`: 6 JSON files matching the contracts in §7.3. The integration tests assert the cross-audit mapping, the `result_coverage` + `type_alias_coverage` formulas, and the tolerance for missing inputs.

### 9.3 Pre-commit verification

```bash
uv run pytest tests/test_code_path_audit.py -q
uv run python scripts/audit_exception_handling.py --strict
uv run python scripts/audit_weak_types.py --strict
uv run python scripts/audit_main_thread_imports.py
uv run python scripts/audit_no_models_config_io.py
```

### 9.4 End-of-track verification

```bash
uv run python -m src.code_path_audit --all --date 2026-06-22
uv run python scripts/audit_exception_handling.py --strict
uv run python scripts/audit_weak_types.py --strict
uv run python scripts/audit_main_thread_imports.py
uv run python scripts/audit_no_models_config_io.py
uv run python scripts/generate_type_registry.py --check
uv run pytest tests/test_code_path_audit_live_gui.py -v
```

### 9.5 Manual verification (per `conductor/workflow.md`)

The Tier 2 Tech Lead + user review the `docs/reports/code_path_audit/<date>/summary.md` to confirm:
- The 4-mem-dim rollup is correct
- The cross-audit verdict is accurate
- The decomposition_matrix.md rankings match the user's intuition
- The 3 candidate aggregates are properly marked as placeholders

---

## Out of Scope (per §7.2)

- **No modifications to existing `src/*.py` files** (read-only on the 65 existing files; the v2 audit doesn't change them).
- **No modifications to the 5 existing audit scripts** (consume their JSON; don't change them).
- **No runtime profiling.** Deferred to `pipeline_runtime_profiling_20260607` (preserved from the v1 spec's follow-up list).
- **No new pip dependencies.** The v2 audit uses stdlib only.
- **No changes to `data_structure_strengthening_20260606` or `data_oriented_error_handling_20260606` styleguides.**
- **No changes to the v1 `spec.md` and `plan.md`** (they stay as v1).
- **No MMA worker spawn action** (preserved from v1; the user's "keeping MMA cold" directive from 2026-06-07 still stands).
- **No new modules in `src/` other than `code_path_audit.py`** (per the file size + naming convention in AGENTS.md).
- **The 23 lower-impact files** (those with 1-9 weak-type sites each) are deferred.
- **The 3 candidate aggregates' "real" analysis** is deferred (the v2 audit produces placeholders; the real profiles arrive after `any_type_componentization_20260621` merges).
- **The v1-style per-action output** is preserved for backward compat but downgraded to "cross-references to the per-aggregate profiles."

---

## Risks (per §7.3)
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|---|---|---|---|
|
||||
| The decomposition-cost heuristic is inaccurate (componentize_savings overestimate or underestimate) | Medium | Medium (false-positive optimization candidates) | Runtime-profiling follow-up recalibrates. The override file adjusts per-aggregate. |
|
||||
| The PCG misses dynamic patterns (`eval`, `getattr`, decorator-driven dispatch) | Medium | Low (affected functions marked "unresolved") | The override file lists known passthroughs. Runtime-profiling follow-up catches unresolved. |
|
||||
| The 6 input JSON contracts drift (the existing audit scripts evolve without bumping the v2 audit's contract) | Medium | Low (the v2 audit tolerates missing fields; the schema validator catches drift) | The `audit_code_path_audit_coverage.py` meta-audit runs in CI; fails on schema drift. |
|
||||
| The candidate aggregates don't merge (`any_type_componentization_20260621` is delayed) | Low | Low (the placeholders remain; the report is still produced) | The v2 audit is forward-compatible. The `is_candidate: bool` flag handles absence. |
|
||||
| The v1 `.dsl` files don't round-trip (the v2 parser is stricter than v1) | Low | Medium (the v1 action reports are broken) | The v2 parser is a **superset** of v1; the v1 action reports still parse. The `test_v2_dsl_backward_compat_v1` test verifies this. |
|
||||
| The 60+7+2 = 69 tests are too long-running for the per-PR CI gate | Low | Low (AST walks are sub-second; live_gui tests are opt-in) | Unit + integration tests run in <30s. Live_gui tests are opt-in via env var. |
|
||||
| The synthetic src/ fixture diverges from real src/ (the test expectations don't generalize) | Medium | Low (the integration tests catch real bugs separately) | The integration test layer runs against real src/ as well as the synthetic fixture. |
|
||||
| The v2 audit is run against `master` without `any_type_componentization_20260621` merged, so the candidate placeholders pollute the report | Low | Low (the placeholders are clearly marked) | The `is_candidate: bool` flag is visible in every output. The `summary.md` has a section explaining placeholder status. |
|
||||
| The decomposition-matrix savings estimates are misinterpreted as "ground truth" (they're heuristic) | Medium | Low (the user might over-prioritize) | The `summary.md` and `decomposition_matrix.md` headers caveat: "Savings estimates are heuristic (calibrated by `pipeline_runtime_profiling_20260607`); use as ranking input, not as actual savings." |
|
||||
| The 4 mem dim classification is wrong for some aggregates (the file-of-origin heuristic misroutes) | Medium | Low (the misrouted aggregate shows up in the wrong dim's rollup) | The `MemoryDim` is overridable in `scripts/code_path_audit_overrides.toml`. The markdown flags the override. |
|
||||
|
||||
---
|
||||
|
||||
## Coordination with Pending Tracks
|
||||
|
||||
| Track | Status (2026-06-22) | Relationship to v2 |
|
||||
|---|---|---|
|
||||
| `any_type_componentization_20260621` | NOT on master (merged `f914b2bc`, reverted `751b94d4`); spec + plan in `conductor/tracks/any_type_componentization_20260621/` | The 3 candidate aggregates (`ToolSpec`, `ChatMessage`, `ProviderHistory`) are sourced from this track's `ANY_TYPE_AUDIT_20260621.md` §3. The v2 audit's `candidates.md` rollup documents the forward-compat. When this track merges, the v2 audit is re-run; the placeholders become real profiles. |
|
||||
| `phase2_4_5_call_site_completion_20260621` | NOT on master (same merge+revert history as `any_type_componentization_20260621`); spec + plan + TRACK_COMPLETION report in `conductor/tracks/phase2_4_5_call_site_completion_20260621/` | The `PHASE3_HYPOTHETICAL_PROMOTION.md` (authored by Tier 2; the authoritative Phase 3 cost hypothesis) is the source of the v2's `ProviderHistory` candidate aggregate's expected cost. The v2 audit's `candidates.md` cites this report. |
|
||||
| `data_oriented_error_handling_20260606` | SHIPPED (in master) | The v2 audit's `result_coverage` metric is the cross-check. The `error_handling.md` styleguide is the v2 audit's source of truth for the `Result[T]` return types. |
|
||||
| `data_structure_strengthening_20260606` | SHIPPED (in master) | The v2 audit's `type_alias_coverage` metric is the cross-check. The `type_aliases.md` styleguide + the 10 TypeAliases are the v2 audit's source of truth. |
|
||||
| `result_migration_cruft_removal_20260620` | SHIPPED (in master) | The `RESULT_MIGRATION_CAMPAIGN_STATUS_20260619.md` confirms the 100% complete state. The v2 audit's `result_coverage` reports on this final state. |
|
||||
| `public_api_migration_and_ui_polish_20260615` | SHIPPED (in master) | `ai_client.send_result()` is the canonical public API. The v2 audit's `Metadata` aggregate's `result_coverage` reports on the post-migration state. |
|
||||
| `nagent_review_20260608` (v3.1) | ACTIVE (in master; v3.1 is the latest at `7e61dd7d`) | The v2 audit references Candidates 27-30 (Markdown + custom DSL lock-in, per-turn ground-truth hook, dataset-curation track, cache TTL GUI hardening). The v2's custom postfix DSL is a direct application of Candidate 27. |
|
||||
| `exception_handling_audit_20260616` | SHIPPED (in master) | The 211-site audit (`EXCEPTION_HANDLING_AUDIT_20260616.md`) is the precedent for the v2 audit's structure (audit -> migration plan -> sub-tracks). |
|
||||
| `tier2_leak_prevention_20260620` | SHIPPED (in master) | The v2 audit's Tier 2 execution follows the `tier2_leak_prevention` conventions (no `git push*`, no `git checkout*`, etc.). |
|
||||
|
||||
**This audit has no blockers** and **no conflicts**. It can ship independently of the 5 active planned tracks. It enables future refactors (the 3 high-priority `componentize` candidates).
|
||||
|
||||
---
|
||||
|
||||
## Follow-up (per §7.4)
|
||||
|
||||
| # | Track | When | Purpose |
|
||||
|---|---|---|---|
|
||||
| 1 | `pipeline_runtime_profiling_20260607` | After v2 ships | Calibrate the v2's heuristic cost constants against real measurements. Uses `src/performance_monitor.py`. The v2 spec's `MICROSECOND_BUDGET_PER_LLM_TURN`, `BRANCH_DISPATCH_OVERHEAD_US`, `ALLOCATION_OVERHEAD_US`, `DEAD_FIELD_COST_PER_FIELD_US`, `COMPONENTIZATION_INDIRECTION_US`, `UNIFICATION_INDIRECTION_US` are recalibrated by this track. |
|
||||
| 2 | `data_pipelines_inventory_<date>` | After v2 ships | Per-pipeline (vs per-aggregate) reports for the top 5 pipelines. Complements the v2 with the pipeline view. The v2's `decomposition_matrix.md` is the input. |
|
||||
| 3 | `code_path_audit_in_ci_<date>` | After v2 ships | Run v2 in CI on every PR; fail on new untyped sites OR a high-priority decomposition-matrix regression. The "audit as CI gate" pattern. |
|
||||
| 4 | `code_path_audit_data_oriented_refactor_<date>` | After v2 ships | Implement the 3 high-priority `componentize` candidates (FileItems, History, Metadata) per the v2 audit's `decomposition_matrix.md`. |
|
||||
| 5 | `code_path_audit_v2_5_followup_<date>` | After `any_type_componentization_20260621` merges | Re-run v2; the 3 placeholders become real profiles; the decomposition-matrix gets 3 new rows. |
|
||||
|
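The "audit as CI gate" idea in row 3 can be sketched as a comparison between a baseline run and the current run. The dict-of-counts input shape and both function names are hypothetical; the real v2 output format is not shown in this excerpt.

```python
def new_untyped_sites(baseline: dict[str, int], current: dict[str, int]) -> dict[str, int]:
 """Return aggregates whose untyped-site count grew, mapped to the increase."""
 regressions: dict[str, int] = {}
 for name, count in current.items():
  delta = count - baseline.get(name, 0)
  if delta > 0:
   regressions[name] = delta
 return regressions


def gate_exit_code(baseline: dict[str, int], current: dict[str, int]) -> int:
 """Return 1 (fail the PR) when any aggregate regressed, else 0."""
 return 1 if new_untyped_sites(baseline, current) else 0


# Hypothetical counts for two of the in-scope aggregates.
baseline = {"History": 4, "Metadata": 2}
current = {"History": 5, "Metadata": 2}
print(new_untyped_sites(baseline, current), gate_exit_code(baseline, current))
```

Comparing against a stored baseline, rather than requiring zero untyped sites, lets the gate ship before the deferred files are cleaned up.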
||||
---
|
||||
|
||||
## See Also
|
||||
|
||||
### Styleguides
|
||||
|
||||
- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference (v2's decomposition-cost heuristic is informed by §2's 8 defaults)
|
||||
- `conductor/code_styleguides/error_handling.md` — the `Result[T]` convention (v2's public API returns `Result[T]` per the hard rule)
|
||||
- `conductor/code_styleguides/type_aliases.md` — the 10 TypeAliases + 1 NamedTuple (v2's 10 in-scope aggregates)
|
||||
- `conductor/code_styleguides/agent_memory_dimensions.md` — the 4 mem dims (v2's `MemoryDim` classifier)
|
||||
- `conductor/code_styleguides/feature_flags.md` — "delete to turn off" pattern (v2's `audit_code_path_audit_coverage.py` is a feature flag)
|
||||
- `conductor/code_styleguides/cache_friendly_context.md` — stable-to-volatile context ordering (v2's per-aggregate reports are a downstream consumer of the cache state)
|
||||
- `conductor/code_styleguides/knowledge_artifacts.md` — the knowledge harvest pattern (v2's per-aggregate profiles are NOT a knowledge artifact; they're curation)
|
||||
- `conductor/code_styleguides/rag_integration_discipline.md` — the conservative-RAG rule (v2's `rag` aggregate classification)
|
||||
- `conductor/code_styleguides/config_state_owner.md` — config I/O ownership (v2's `audit_no_models_config_io.json` is the cross-check)
|
||||
|
||||
### v1 spec + plan (preserved)
|
||||
|
||||
- `conductor/tracks/code_path_audit_20260607/spec.md` — the v1 spec (approved 2026-06-07; revised 2026-06-08 with post-4-tracks timing + 5-source framing)
|
||||
- `conductor/tracks/code_path_audit_20260607/plan.md` — the v1 plan (preserved, never executed)
|
||||
|
||||
### Reports + ideation
|
||||
|
||||
- `docs/reports/computational_shapes_ssdl_digest_20260608.md` — the SSDL digest that informed the v1 spec's 5-source lens (v2 preserves the lens)
|
||||
- `docs/reports/RESULT_MIGRATION_CAMPAIGN_STATUS_20260619.md` — the 100%-complete result migration campaign
|
||||
- `docs/reports/ANY_TYPE_AUDIT_20260621.md` — the 89-site audit (48 promoted + 41 deferred) that informed `any_type_componentization_20260621` (v2's 3 candidate aggregates)
|
||||
- `docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md` — the Tier 2's authoritative cost analysis of the 41 deferred Phase 3 sites
|
||||
- `docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md` — the 211-site audit (precedent for v2's structure)
|
||||
- `docs/reports/PLANNING_DIGEST_20260606.md` — the planning digest for the 5 foundational tracks
|
||||
- `docs/ideation/ed_chunk_data_structures_20260523.md` — the chunk-based-data-structure ideation (referenced in v1 spec; v2's `bulk_batched` access pattern aligns)
|
||||
|
||||
### v3.1 nagent review (the latest framing)
|
||||
|
||||
- `conductor/tracks/nagent_review_20260608/nagent_review_v3_1_20260620.md` — the v3.1 thickened main review
|
||||
- `conductor/tracks/nagent_review_20260608/nagent_takeaways_v3_1_20260620.md` — the v3.1 bridge + the 4 new candidates (27-30)
|
||||
- `conductor/tracks/nagent_review_20260608/nagent_review_v3_20260619.md` — the v3 main review (preserved per user directive 2026-06-20)
|
||||
|
||||
### Source files (the v2 audit consumes)
|
||||
|
||||
- `src/type_aliases.py` — the 10 TypeAliases + 1 NamedTuple
|
||||
- `src/result_types.py` — `Result[T]`, `ErrorInfo`, nil-sentinels
|
||||
- `src/mcp_client.py:934-992` — `derive_code_path` (the v2's PCG is the multi-symbol superset)
|
||||
- `src/performance_monitor.py` — runtime profiling (used by `pipeline_runtime_profiling_20260607` follow-up)
|
||||
- `src/vendor_capabilities.py` — the canonical `frozen=True` dataclass + module-level registry pattern (template for the v2 audit's per-aggregate profile structure)
|
||||
|
||||
### Audit scripts (the v2 audit consumes)
|
||||
|
||||
- `scripts/audit_main_thread_imports.py` — import-graph CI gate
|
||||
- `scripts/audit_weak_types.py` — weak-types CI gate
|
||||
- `scripts/audit_exception_handling.py` — exception-handling CI gate
|
||||
- `scripts/audit_optional_in_3_files.py` — `Optional[T]` ban CI gate (v2 extends this with 1 line)
|
||||
- `scripts/audit_no_models_config_io.py` — config-I/O ownership CI gate
|
||||
- `scripts/generate_type_registry.py` — type-registry generator
|
||||
|
||||
### Workflow + process
|
||||
|
||||
- `conductor/workflow.md` — TDD protocol + per-task commits + git notes + phase checkpoints + skip-marker policy
|
||||
- `conductor/edit_workflow.md` — the edit-tool contract (the v2 audit uses `manual-slop_*` MCP tools per the project convention)
|
||||
- `AGENTS.md` — canonical operating rules (the "no day estimates" rule, the "small files are propaganda" stance, the hard bans on `git restore` / `git checkout --`)
|
||||
- `conductor/product-guidelines.md` — product-level conventions (1-space indent, 1 commit per task, type hints, etc.)
|
||||
- `conductor/tech-stack.md` — tech stack constraints (Python 3.11+, imgui-bundle, FastAPI, etc.)
|
||||
|
||||
### Sibling tracks (the v2's relationship)
|
||||
|
||||
- `conductor/tracks/any_type_componentization_20260621/` — the 3 candidate aggregates' source
|
||||
- `conductor/tracks/phase2_4_5_call_site_completion_20260621/` — the `PHASE3_HYPOTHETICAL_PROMOTION` source
|
||||
- `conductor/tracks/data_oriented_error_handling_20260606/` — the `Result[T]` source
|
||||
- `conductor/tracks/data_structure_strengthening_20260606/` — the TypeAlias source
|
||||
- `conductor/tracks/result_migration_cruft_removal_20260620/` — the 100% complete result migration
|
||||
|
||||
---
|
||||
|
||||
**End of spec_v2.md.**
|
||||
@@ -0,0 +1,64 @@
|
||||
# Track state for code_path_audit_20260607
|
||||
# v2 supersedes v1; spec_v2.md + plan_v2.md are the canonical artifacts
|
||||
# (v1's spec.md + plan.md are preserved unchanged, never executed)
|
||||
# Updated by Tier 2 Tech Lead as tasks complete
|
||||
|
||||
[meta]
|
||||
track_id = "code_path_audit_20260607"
|
||||
name = "Code Path & Data Pipeline Audit v2"
|
||||
status = "active" # active | completed
|
||||
current_phase = 0 # 0 = pre-Phase 1; 1..N = in Phase N; "complete" if all phases done
|
||||
last_updated = "2026-06-22"
|
||||
|
||||
[parent]
|
||||
# Independent track (not part of an umbrella)
|
||||
|
||||
[blocked_by]
|
||||
# No blockers. The 5 foundational tracks (data_oriented_error_handling_20260606,
|
||||
# data_structure_strengthening_20260606, mcp_architecture_refactor_20260606,
|
||||
# qwen_llama_grok_integration_20260606, result_migration_20260616) are SHIPPED.
|
||||
# The 2 candidate-related tracks (any_type_componentization_20260621,
|
||||
# phase2_4_5_call_site_completion_20260621) are NOT on master; the v2 audit
|
||||
# is tolerant of their absence (forward-compat placeholders).
|
||||
|
||||
[blocks]
|
||||
# 5 follow-up tracks (see metadata.json follow_up_tracks)
|
||||
|
||||
[phases]
|
||||
# 14 phases per plan_v2.md
|
||||
phase_0 = { status = "pending", checkpointsha = "", name = "Setup (state.toml, empty files, fixture dirs)" }
|
||||
phase_1 = { status = "pending", checkpointsha = "", name = "Data model (5 enums + 9 supporting dataclasses + AggregateProfile)" }
|
||||
phase_2 = { status = "pending", checkpointsha = "", name = "PCG (3 AST passes: P1 return types, P2 parameter types, P3 field access)" }
|
||||
phase_3 = { status = "pending", checkpointsha = "", name = "MemoryDim classifier (canonical mappings + file-of-origin + override)" }
|
||||
phase_4 = { status = "pending", checkpointsha = "", name = "APD (5 access patterns + 25% dominance rule)" }
|
||||
phase_5 = { status = "pending", checkpointsha = "", name = "CFE (7 frequencies + entry-point detection + override file)" }
|
||||
phase_6 = { status = "pending", checkpointsha = "", name = "Decomposition cost (4 directions + auto-generated rationale)" }
|
||||
phase_7 = { status = "pending", checkpointsha = "", name = "Cross-audit integration (6 input JSONs + 3-tier mapping)" }
|
||||
phase_8 = { status = "pending", checkpointsha = "", name = "v2 DSL (14 new tagged words + flat-section format)" }
|
||||
phase_9 = { status = "pending", checkpointsha = "", name = "run_audit() main entry + CLI + MCP tool" }
|
||||
phase_10 = { status = "pending", checkpointsha = "", name = "Integration tests (synthetic src/ + audit_inputs/ fixtures)" }
|
||||
phase_11 = { status = "pending", checkpointsha = "", name = "Live_gui E2E tests (opt-in via CODE_PATH_AUDIT_LIVE_GUI=1)" }
|
||||
phase_12 = { status = "pending", checkpointsha = "", name = "Meta-audit + 1-line extension + styleguide" }
|
||||
phase_13 = { status = "pending", checkpointsha = "", name = "End-of-track report + tracks.md update" }
|
||||
|
||||
[verification]
|
||||
data_model_tests_passing = false
|
||||
pcg_tests_passing = false
|
||||
memory_dim_tests_passing = false
|
||||
apd_tests_passing = false
|
||||
cfe_tests_passing = false
|
||||
decomposition_cost_tests_passing = false
|
||||
cross_audit_integration_tests_passing = false
|
||||
v2_dsl_tests_passing = false
|
||||
renderers_tests_passing = false
|
||||
integration_tests_passing = false
|
||||
live_gui_tests_passing = false
|
||||
meta_audit_passing = false
|
||||
all_4_audit_gates_passing = false
|
||||
type_registry_check_passing = false
|
||||
audit_run_completed = false
|
||||
summary_md_approved = false
|
||||
optimization_candidates_md_approved = false
|
||||
truncation_md_approved = false
|
||||
track_completion_report_written = false
|
||||
tracks_md_updated = false
|
||||
@@ -0,0 +1,118 @@
|
||||
{
|
||||
"track_id": "phase2_4_5_call_site_completion_20260621",
|
||||
"name": "Phase 2/4/5 Call-Site Completion (post any_type_componentization)",
|
||||
"initialized": "2026-06-21",
|
||||
"owner": "tier2-tech-lead",
|
||||
"priority": "A",
|
||||
"status": "active",
|
||||
"type": "bugfix + refactor + test-infrastructure",
|
||||
"scope": {
|
||||
"new_files": [
|
||||
"tests/test_websocket_broadcast_regression.py",
|
||||
"docs/reports/TRACK_COMPLETION_phase2_4_5_call_site_completion_20260621.md"
|
||||
],
|
||||
"modified_files": [
|
||||
"src/app_controller.py",
|
||||
"src/events.py",
|
||||
"src/gui_2.py",
|
||||
"src/ai_client.py",
|
||||
"tests/test_grok_provider.py",
|
||||
"tests/test_minimax_provider.py",
|
||||
"tests/test_llama_provider.py"
|
||||
],
|
||||
"deleted_files": []
|
||||
},
|
||||
"blocked_by": [],
|
||||
"blocks": ["code_path_audit_20260607"],
|
||||
"estimated_phases": 4,
|
||||
"spec": "spec.md",
|
||||
"plan": "plan.md",
|
||||
"priority_order": "A (Phase 6a broadcast fix) > A (Phase 6b OpenAICompatibleRequest) > B (Phase 6d NormalizedResponse) > A (Phase 6e Tier 2 cost deduction)",
|
||||
"parent_track": {
|
||||
"id": "any_type_componentization_20260621",
|
||||
"spec": "conductor/tracks/any_type_componentization_20260621/spec.md",
|
||||
"handoff_docs": [
|
||||
"docs/handoffs/PROMPT_FOR_TIER_1.md",
|
||||
"docs/handoffs/HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md",
|
||||
"docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md"
|
||||
]
|
||||
},
|
||||
"phases": {
|
||||
"phase_6a": {
|
||||
"name": "Fix HookServer.broadcast() callers",
|
||||
"scope": "Migrate broadcast(channel, payload) callers in app_controller.py + events.py + gui_2.py to broadcast(WebSocketMessage(...))",
|
||||
"estimated_commits": 7,
|
||||
"new_test_file": "tests/test_websocket_broadcast_regression.py"
|
||||
},
|
||||
"phase_6b": {
|
||||
"name": "Complete OpenAICompatibleRequest migration",
|
||||
"scope": "_send_grok + _send_minimax + _send_llama construct OpenAICompatibleRequest(messages=[ChatMessage(...)])",
|
||||
"estimated_commits": 5
|
||||
},
|
||||
"phase_6d": {
|
||||
"name": "Update NormalizedResponse construction",
|
||||
"scope": "Same 3 senders: usage_input_tokens/etc -> usage=UsageStats(...)",
|
||||
"estimated_commits": 4
|
||||
},
|
||||
"phase_6e": {
|
||||
"name": "Phase 3 Hypothetical Cost Deduction (Tier 2 authoritative deliverable)",
|
||||
"scope": "Tier 2 produces docs/reports/PHASE3_TIER2_ANALYSIS.md while doing 6b/6d work in src/ai_client.py; profiles all 6 senders + discovers hidden cross-references + provides refined cost estimates + recommendations for the future Phase 3 track. Supersedes Tier 1's draft at docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md (which stays as the hypothesis doc).",
|
||||
"estimated_commits": 2,
|
||||
"new_doc_file": "docs/reports/PHASE3_TIER2_ANALYSIS.md",
|
||||
"rationale": "Tier 2 is in src/ai_client.py anyway doing the 6b/6d migration work; they have full context to produce the authoritative Phase 3 cost analysis. The future Phase 3 track + the code_path_audit both need this data."
|
||||
}
|
||||
},
|
||||
"total_estimated_commits": 18,
|
||||
"deferred_work": {
|
||||
"phase_3_provider_state": {
|
||||
"deferred_to": "separate track post code_path_audit_20260607",
|
||||
"rationale": "Phase 3 has runtime hot-path concerns (per-LLM-turn history manipulation); the code_path_audit should measure cost BEFORE the refactor",
|
||||
"estimated_sites": 112,
|
||||
"estimation_method": "grep -c '_<provider>_history(?!_)' on src/ai_client.py per HANDOFF_CODE_PATH_AUDIT"
|
||||
},
|
||||
"cross_phase_coupling": {
|
||||
"deferred_to": "separate track",
|
||||
"rationale": "OpenAICompatibleRequest.tools: list[dict[str, Any]] -> list[ToolSpec] is a follow-up"
|
||||
},
|
||||
"audit_tier2_leaks_fix": {
|
||||
"deferred_to": "infrastructure track",
|
||||
"rationale": "3 sandbox-pollution failures; need --allowlist for mcp_paths.toml, opencode.json, .opencode/*"
|
||||
},
|
||||
"pre_existing_gui2_parity_flake": {
|
||||
"deferred_to": "investigation",
|
||||
"rationale": "test_gui2_custom_callback_hook_works flake; not introduced by this track"
|
||||
}
|
||||
},
|
||||
"unblocks": {
|
||||
"code_path_audit_20260607": "TypeError spam from broadcast() contaminates per-action profiling; Phase 6a fixes the underlying regression"
|
||||
},
|
||||
"verification_criteria": [
|
||||
"src/app_controller.py:_run_pending_tasks_once_result uses broadcast(WebSocketMessage(...))",
|
||||
"src/events.py broadcast callers use WebSocketMessage",
|
||||
"src/gui_2.py:_process_pending_gui_tasks broadcast callers use WebSocketMessage",
|
||||
"tests/test_websocket_broadcast_regression.py exists; asserts no broadcast() TypeError",
|
||||
"_send_grok constructs OpenAICompatibleRequest(messages=[ChatMessage(...)], ...)",
|
||||
"_send_minimax constructs OpenAICompatibleRequest(messages=[ChatMessage(...)], ...)",
|
||||
"_send_llama constructs OpenAICompatibleRequest(messages=[ChatMessage(...)], ...)",
|
||||
"_send_grok constructs NormalizedResponse(text=..., usage=UsageStats(...), ...)",
|
||||
"_send_minimax constructs NormalizedResponse(text=..., usage=UsageStats(...), ...)",
|
||||
"_send_llama constructs NormalizedResponse(text=..., usage=UsageStats(...), ...)",
|
||||
"All 11-tier batched test run passes (no stop-on-failure)",
|
||||
"audit_weak_types.py --strict exits 0",
|
||||
"audit_dataclass_coverage.py --strict exits 0",
|
||||
"End-of-track report at docs/reports/TRACK_COMPLETION_phase2_4_5_call_site_completion_20260621.md"
|
||||
],
|
||||
"sequencing_note": "This track unblocks code_path_audit_20260607. Run this track first; after merge, run the audit. The Phase 3 follow-up track runs AFTER the audit completes.",
|
||||
"ai_performance_analysis": {
|
||||
"win": "Fixes 1 runtime bug (broadcast() TypeError) + completes the Phase 2/5 migration for 3 senders (grok/minimax/llama). Makes code_path_audit_20260607 instrumentable.",
|
||||
"cost": "~16 commits; ~3 hours Tier 2.",
|
||||
"caveat": "The deferred Phase 3 (112 sites in ai_client.py) is still the biggest remaining work. The audit will quantify the cost before Phase 3 is migrated.",
|
||||
"honest_assessment": "Tight, focused track. Fits Tier 2's 1-4 hour budget. Unblocks the audit without ballooning scope."
|
||||
},
|
||||
"links": {
|
||||
"parent_track": "conductor/tracks/any_type_componentization_20260621/",
|
||||
"audit_track": "conductor/tracks/code_path_audit_20260607/",
|
||||
"phase3_hypothetical_analysis": "docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md",
|
||||
"handoff_docs": "docs/handoffs/"
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,650 @@
|
||||
# Phase 2/4/5 Call-Site Completion Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Fix the `HookServer.broadcast()` runtime bug + complete the Phase 2 `_send_grok` / `_send_minimax` / `_send_llama` migration to `OpenAICompatibleRequest(messages=[ChatMessage(...)])` and `NormalizedResponse(usage=UsageStats(...))`. The track also adds `tests/test_websocket_broadcast_regression.py` with a "no-TypeError-on-any-thread" assertion that `code_path_audit_20260607` will reuse.
|
||||
|
||||
**Architecture:** 3 phases (Phase 6a + 6b + 6d). Phase 6a is the runtime bug fix (broadcast callers in 3 files). Phase 6b completes the t2_6 deferred OpenAI-compatible sender migration. Phase 6d updates those senders' `NormalizedResponse` to use `UsageStats`. No new modules; only consumer migration + 1 new regression test file.
|
||||
|
||||
**Tech Stack:** Python 3.11+ stdlib. Existing `src/openai_schemas.py` (Phase 2 of parent track) provides `ChatMessage`, `UsageStats`, `ToolCall`. Existing `src/api_hooks.py` (Phase 5 of parent track) provides `WebSocketMessage`.
|
||||
|
||||
**Reference Files:**
|
||||
- `docs/handoffs/PROMPT_FOR_TIER_1.md` — Tier 1 brief
|
||||
- `docs/handoffs/HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md` — test failure categorization
|
||||
- `docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md` — runtime cost framing
|
||||
- `conductor/tracks/phase2_4_5_call_site_completion_20260621/spec.md` — the design
|
||||
- `conductor/tracks/any_type_componentization_20260621/spec.md` — parent track
|
||||
- `src/openai_schemas.py` — ChatMessage + UsageStats + NormalizedResponse + OpenAICompatibleRequest
|
||||
- `src/api_hooks.py` — WebSocketMessage + HookServer.broadcast
|
||||
|
||||
**Code Style:** 1-space indentation, CRLF line endings, no comments in source code, type hints mandatory (per `conductor/workflow.md` Code Style section).
|
||||
|
||||
---
|
||||
|
||||
## File Structure
|
||||
|
||||
```
src/
  app_controller.py   # MODIFIED (Phase 6a): _run_pending_tasks_once_result broadcast callers
  events.py           # MODIFIED (Phase 6a): broadcast callers
  gui_2.py            # MODIFIED (Phase 6a): _process_pending_gui_tasks broadcast callers
  ai_client.py        # MODIFIED (Phase 6b+6d): _send_grok/_send_minimax/_send_llama
  api_hooks.py        # UNCHANGED (the broadcast() change is correct)

tests/
  test_websocket_broadcast_regression.py   # NEW (Phase 6a): no-TypeError assertion
  test_grok_provider.py                    # MODIFIED (Phase 6b+6d): verify ChatMessage + UsageStats
  test_minimax_provider.py                 # MODIFIED (Phase 6b+6d): verify ChatMessage + UsageStats
  test_llama_provider.py                   # MODIFIED (Phase 6b+6d): verify ChatMessage + UsageStats

docs/reports/
  TRACK_COMPLETION_phase2_4_5_call_site_completion_20260621.md   # NEW (verify)
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 6a: Fix HookServer.broadcast() Callers
|
||||
|
||||
Focus: Replace `broadcast(channel, payload)` with `broadcast(WebSocketMessage(channel=, payload=))` at all internal call sites in `src/`.
|
||||
|
||||
### Task 6a.1: Catalog all broadcast() callers
|
||||
|
||||
**Files:**
|
||||
- Search: `src/app_controller.py`, `src/events.py`, `src/gui_2.py`
|
||||
|
||||
- [ ] **Step 1: Grep for all internal callers**
|
||||
|
||||
Run: `Select-String -Path src/app_controller.py,src/events.py,src/gui_2.py -Pattern '\.broadcast\('`
|
||||
Expected: 3-9 sites (per HANDOFF_FOLLOWUP §5: app_controller.py:_run_pending_tasks_once_result 1-3, events.py 1-3, gui_2.py 1-3)
|
||||
|
||||
- [ ] **Step 2: Document the list**
|
||||
|
||||
For each call site, record `(file:line, current_call_signature, replacement_call_signature)` in your working notes. Example:
|
||||
- `src/app_controller.py:N broadcast(channel_str, payload_dict)` → `broadcast(WebSocketMessage(channel=channel_str, payload=payload_dict))`
|
||||
|
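Where `Select-String` is unavailable, or where the positional-argument count of each call is wanted directly, the catalog can also be built with the standard-library `ast` module. This is a sketch only: `find_broadcast_calls` and the `SAMPLE` source are hypothetical, and the snippet scans an in-memory string rather than the real `src/` files.

```python
import ast


def find_broadcast_calls(source: str, filename: str = "<memory>") -> list[tuple[str, int, int, str]]:
 """Return (file, line, positional_arg_count, call_text) for each `.broadcast(...)` call."""
 found = []
 for node in ast.walk(ast.parse(source, filename=filename)):
  if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute) and node.func.attr == "broadcast":
   found.append((filename, node.lineno, len(node.args), ast.unparse(node)))
 # ast.walk does not guarantee source order, so sort by line number.
 return sorted(found, key=lambda item: item[1])


# Hypothetical source containing one legacy call and one migrated call.
SAMPLE = '''
def run(self, channel_str, payload_dict):
 self.web_socket_server.broadcast(channel_str, payload_dict)
 self.web_socket_server.broadcast(WebSocketMessage(channel=channel_str, payload=payload_dict))
'''

for file, line, argc, text in find_broadcast_calls(SAMPLE):
 status = "legacy" if argc == 2 else "ok"
 print(f"{file}:{line} [{status}] {text}")
```

Counting positional arguments catches legacy calls whose arguments are variables, which a pattern that looks for a string literal after `broadcast(` would miss.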
||||
### Task 6a.2: Write failing regression test
|
||||
|
||||
**Files:**
|
||||
- Create: `tests/test_websocket_broadcast_regression.py`
|
||||
|
||||
- [ ] **Step 1: Write the test**
|
||||
|
||||
```python
"""Regression test for the HookServer.broadcast() runtime TypeError bug.

This test ensures that no internal caller of HookServer.broadcast() passes
the OLD (channel, payload) signature after Phase 5 changed it to
(message: WebSocketMessage). The audit (code_path_audit_20260607) reuses
this assertion.
"""
import ast
import inspect
import io
import logging
from pathlib import Path

from src.api_hooks import HookServer, WebSocketMessage


def test_broadcast_accepts_websocket_message() -> None:
 """HookServer.broadcast must accept a single WebSocketMessage argument."""
 sig = inspect.signature(HookServer.broadcast)
 params = list(sig.parameters.keys())
 # self + 1 positional arg
 assert len(params) == 2, f"expected 2 params (self + message), got {len(params)}: {params}"


def test_broadcast_rejects_legacy_2arg_call() -> None:
 """Calling broadcast with 2 positional args (legacy signature) must raise TypeError."""
 server = HookServer()
 try:
  server.broadcast("channel", {"key": "value"})
 except TypeError as e:
  assert "takes 2 positional arguments" in str(e) or "takes 1 positional argument" in str(e)
  return
 assert False, "broadcast should reject legacy 2-arg call"


def test_internal_callers_use_websocket_message_signature() -> None:
 """Scan all internal callers of broadcast() and assert they use the new signature."""
 # Parse with ast rather than shelling out to grep: this runs on hosts without grep
 # and also catches legacy calls whose arguments are variables, not string literals.
 for path in sorted(Path("src").rglob("*.py")):
  tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
  for node in ast.walk(tree):
   if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
    continue
   if node.func.attr != "broadcast":
    continue
   # The new signature is broadcast(WebSocketMessage(...)): exactly 1 positional arg
   # The old signature is broadcast(channel, payload): 2 positional args
   assert len(node.args) < 2, f"{path}:{node.lineno} uses legacy signature: {ast.unparse(node)}"


def test_no_typeerror_during_gui_task_processing() -> None:
 """Smoke test: simulate a GUI task that triggers broadcast; assert no TypeError on any thread."""
 # Capture ERROR-level log output to detect worker[queue_fallback] error spam
 captured = io.StringIO()
 handler = logging.StreamHandler(captured)
 handler.setLevel(logging.ERROR)
 logging.getLogger().addHandler(handler)
 try:
  # Trigger a task that would have hit the broadcast bug
  # (This is a structural test; the actual GUI thread simulation is in live_gui tests)
  server = HookServer()
  msg = WebSocketMessage(channel="test", payload={"key": "value"})
  server.broadcast(msg)  # must not raise
 finally:
  logging.getLogger().removeHandler(handler)
 stderr_output = captured.getvalue()
 assert "WebSocketServer.broadcast()" not in stderr_output, f"TypeError detected: {stderr_output}"
```
|
||||
|
||||
- [ ] **Step 2: Run the tests to verify the third one fails**
|
||||
|
||||
Run: `uv run pytest tests/test_websocket_broadcast_regression.py -v`
|
||||
Expected: The first test passes (the signature is already `(self, message)`); the second passes (legacy call raises); the THIRD may FAIL (internal callers still use old signature — that's what we're fixing); the fourth passes (the smoke test).
|
||||
|
||||
### Task 6a.3: Fix `src/app_controller.py:_run_pending_tasks_once_result` broadcast callers
|
||||
|
||||
- [ ] **Step 1: Find the call sites**
|
||||
|
||||
Run: `Select-String -Path src/app_controller.py -Pattern '\.broadcast\('`
|
||||
Expected: 1-3 lines in `_run_pending_tasks_once_result`
|
||||
|
||||
- [ ] **Step 2: For each call site, replace**
|
||||
|
||||
Old:
|
||||
```python
self.web_socket_server.broadcast(channel_str, payload_dict)
```
|
||||
|
||||
New:
|
||||
```python
from src.api_hooks import WebSocketMessage
self.web_socket_server.broadcast(WebSocketMessage(channel=channel_str, payload=payload_dict))
```
|
||||
|
||||
(Add the import at the top of the function or file if not already present.)
|
||||
|
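The effect of the replacement can be illustrated with stand-in classes. Neither class below is the project's code: `WebSocketMessage` here is a hypothetical frozen dataclass carrying only the two fields the plan names (`channel`, `payload`), and `FakeHookServer` is an illustrative recorder, not `HookServer`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class WebSocketMessage:
 # Stand-in for src.api_hooks.WebSocketMessage; field names taken from the plan.
 channel: str
 payload: dict[str, Any]


class FakeHookServer:
 """Stand-in for HookServer that records messages instead of sending them."""

 def __init__(self) -> None:
  self.sent: list[WebSocketMessage] = []

 def broadcast(self, message: WebSocketMessage) -> None:
  self.sent.append(message)


server = FakeHookServer()
server.broadcast(WebSocketMessage(channel="status", payload={"ok": True}))

legacy_error = ""
try:
 server.broadcast("status", {"ok": True})  # legacy two-argument call
except TypeError as exc:
 legacy_error = str(exc)

print(len(server.sent), legacy_error)
```

The legacy call fails at call time with a `TypeError` about positional arguments, which is the kind of error the regression test in Task 6a.2 looks for.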
||||
- [ ] **Step 3: Run regression test**
|
||||
|
||||
Run: `uv run pytest tests/test_websocket_broadcast_regression.py::test_internal_callers_use_websocket_message_signature -v`
|
||||
Expected: the test still FAILS (legacy callers remain in events.py + gui_2.py), but it no longer reports any app_controller.py site
|
||||
|
||||
### Task 6a.4: Fix `src/events.py` broadcast callers
|
||||
|
||||
- [ ] **Step 1: Find call sites**
|
||||
|
||||
Run: `Select-String -Path src/events.py -Pattern '\.broadcast\('`
|
||||
|
||||
- [ ] **Step 2: Replace each with `WebSocketMessage(...)` wrapper**
|
||||
|
||||
- [ ] **Step 3: Run regression test**
|
||||
|
||||
Run: `uv run pytest tests/test_websocket_broadcast_regression.py::test_internal_callers_use_websocket_message_signature -v`
|
||||
|
||||
### Task 6a.5: Fix `src/gui_2.py:_process_pending_gui_tasks` broadcast callers
|
||||
|
||||
- [ ] **Step 1: Find call sites**
|
||||
|
||||
Run: `Select-String -Path src/gui_2.py -Pattern '\.broadcast\('`
|
||||
|
||||
- [ ] **Step 2: Replace each with `WebSocketMessage(...)` wrapper**
|
||||
|
||||
- [ ] **Step 3: Run regression test**
|
||||
|
||||
Run: `uv run pytest tests/test_websocket_broadcast_regression.py -v`
|
||||
Expected: all 4 tests pass
|
||||
|
||||
### Task 6a.6: Run tier-1-unit-core FULLY per the regression protocol
|
||||
|
||||
- [ ] **Step 1: Run the full tier-1-unit-core tier (no stop-on-failure)**
|
||||
|
||||
Run: `uv run python scripts/run_tests_batched.py --tier tier-1-unit-core`
|
||||
Expected: all PASS (the "no-TypeError" assertion catches the broadcast bug; any other regressions surface)
|
||||
|
||||
### Task 6a.7: Phase 6a checkpoint
|
||||
|
||||
- [ ] **Step 1: Commit**
|
||||
|
||||
```bash
git add src/app_controller.py src/events.py src/gui_2.py tests/test_websocket_broadcast_regression.py
git commit -m "fix(broadcast): migrate HookServer.broadcast() callers to WebSocketMessage signature

Phase 5 of any_type_componentization_20260621 changed
HookServer.broadcast(channel, payload) -> broadcast(message: WebSocketMessage)
but did not update internal callers in app_controller.py, events.py, gui_2.py.
This produced worker[queue_fallback] TypeError spam on the GUI thread.

Fix: wrap each call site with WebSocketMessage(channel=, payload=).
Adds tests/test_websocket_broadcast_regression.py with a no-TypeError assertion
that code_path_audit_20260607 will reuse."
git notes add -m "Phase 6a checkpoint: broadcast() TypeError fixed; 4 regression tests added; tier-1-unit-core passes FULLY" HEAD
```
|
||||
|
||||
Update `conductor/tracks/phase2_4_5_call_site_completion_20260621/state.toml` to mark phase_6a status="completed" + checkpointsha.
|
||||
|
||||
---
|
||||
|
||||
## Phase 6b: Complete `_send_grok` / `_send_minimax` / `_send_llama` OpenAICompatibleRequest Migration
|
||||
|
||||
Focus: Migrate the 3 OpenAI-compatible senders in `src/ai_client.py` to construct `OpenAICompatibleRequest(messages=[ChatMessage(...)])` instead of `messages=[{"role": ..., "content": ...}]`.
|
||||
|
||||
### Task 6b.1: Identify existing provider tests
|
||||
|
||||
- [ ] **Step 1: Check for provider-specific test files**
|
||||
|
||||
Run: `Get-ChildItem tests/test_*provider*.py 2>&1 | Select-String -Pattern 'grok|minimax|llama'`
|
||||
Expected: at least one of `tests/test_grok_provider.py`, `tests/test_minimax_provider.py`, `tests/test_llama_provider.py`; if any are missing, add a smoke test (Step 1b).
|
||||
|
||||
- [ ] **Step 1b: (if any missing) Add smoke test**
|
||||
|
||||
For each missing provider, create `tests/test_<provider>_provider.py`:
|
||||
```python
"""Smoke tests for the OpenAI-compatible _send_<provider> path."""

def test_<provider>_sends_chat_message() -> None:
    """Verify _send_<provider> constructs OpenAICompatibleRequest with ChatMessage."""
    from src.ai_client import _send_<provider>
    import inspect

    src = inspect.getsource(_send_<provider>)
    # Old shape: messages=[{"role": ...
    # New shape: messages=[ChatMessage(...
    assert "ChatMessage" in src, "_send_<provider> still uses legacy dict shape"
```
|
||||
|
||||
### Task 6b.2: Write failing tests for ChatMessage in OpenAICompatibleRequest construction
|
||||
|
||||
**Files:**
|
||||
- Modify: each provider test file
|
||||
|
||||
For each provider, add:
|
||||
```python
def test_<provider>_constructs_openai_compatible_request_with_chat_message() -> None:
    """_send_<provider> must use ChatMessage, not dict literals."""
    import inspect

    from src.ai_client import _send_<provider>

    # Inspect the source instead of calling the provider
    # (an actual call is too expensive for a unit test)
    src = inspect.getsource(_send_<provider>)
    # Look for the OpenAICompatibleRequest instantiation
    assert "OpenAICompatibleRequest" in src
    # Look for ChatMessage usage (not legacy dict shape)
    assert "ChatMessage(" in src, "_send_<provider> still uses legacy dict shape"
    assert 'messages=[{"role"' not in src, "_send_<provider> still uses legacy dict shape"
```
|
||||
|
||||
- [ ] **Step 2: Run tests to verify they fail**
|
||||
|
||||
Run: `uv run pytest tests/test_grok_provider.py tests/test_minimax_provider.py tests/test_llama_provider.py -v`
|
||||
Expected: FAIL (the 3 senders still use `messages=[{"role": ..., "content": ...}]`)
|
||||
|
||||
### Task 6b.3: Migrate `src/ai_client.py:_send_grok` (L2532)
|
||||
|
||||
- [ ] **Step 1: Read the current implementation**
|
||||
|
||||
Run: `Get-Content src/ai_client.py | Select-Object -Skip 2530 -First 80`
|
||||
|
||||
- [ ] **Step 2: Add ChatMessage import + replace dict construction**
|
||||
|
||||
At the top of `_send_grok`:
|
||||
```python
|
||||
from src.openai_schemas import ChatMessage, NormalizedResponse, OpenAICompatibleRequest, UsageStats
|
||||
```
|
||||
|
||||
Replace each `messages=[{"role": ..., "content": ...}]` with `messages=[ChatMessage(role=..., content=...)]`.
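
Where a sender builds its message list dynamically rather than as a literal, the same replacement can be sketched as a small conversion helper. `ChatMessage` here is a hypothetical two-field stand-in for the class in `src/openai_schemas.py`, and `to_chat_messages` is an illustrative name, not an existing function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatMessage:
    """Hypothetical stand-in for src.openai_schemas.ChatMessage."""
    role: str
    content: str


def to_chat_messages(legacy: list[dict[str, str]]) -> list[ChatMessage]:
    """Convert legacy {"role": ..., "content": ...} dicts to ChatMessage objects."""
    return [ChatMessage(role=m["role"], content=m["content"]) for m in legacy]


legacy_messages = [
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "Hello"},
]
messages = to_chat_messages(legacy_messages)
print(messages[1])
```

A missing `role` or `content` key raises `KeyError` at conversion time, so malformed history entries surface before the request is built.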
|
||||
|
||||
- [ ] **Step 3: Run grok test**
|
||||
|
||||
Run: `uv run pytest tests/test_grok_provider.py -v`
|
||||
|
||||
### Task 6b.4: Migrate `src/ai_client.py:_send_minimax` (L2616)
|
||||
|
||||
Same pattern as Task 6b.3.
|
||||
|
||||
### Task 6b.5: Migrate `src/ai_client.py:_send_llama` (L2856)
|
||||
|
||||
Same pattern as Task 6b.3.
|
||||
|
||||
### Task 6b.6: Run tier-1-unit-core + provider tests FULLY
|
||||
|
||||
- [ ] **Step 1: Run the tests**
|
||||
|
||||
Run: `uv run python scripts/run_tests_batched.py --tier tier-1-unit-core`
|
||||
Expected: all PASS
|
||||
|
||||
Run: `uv run pytest tests/test_grok_provider.py tests/test_minimax_provider.py tests/test_llama_provider.py -v`
|
||||
Expected: all PASS
|
||||
|
||||
### Task 6b.7: Phase 6b checkpoint
|
||||
|
||||
```bash
git add src/ai_client.py tests/test_grok_provider.py tests/test_minimax_provider.py tests/test_llama_provider.py
git commit -m "refactor(ai_client): migrate _send_grok/_send_minimax/_send_llama to ChatMessage API

Completes the deferred t2_6 task from any_type_componentization_20260621 Phase 2.
The 3 OpenAI-compatible senders now construct OpenAICompatibleRequest with
messages=[ChatMessage(role=, content=)] instead of messages=[dict] literals."
git notes add -m "Phase 6b checkpoint: 3 senders migrated to ChatMessage API" HEAD
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 6d: Update Those Senders' `NormalizedResponse` Construction
|
||||
|
||||
Focus: Replace `NormalizedResponse(text=..., usage_input_tokens=X, usage_output_tokens=Y, ...)` with `NormalizedResponse(text=..., usage=UsageStats(input_tokens=X, ...))` in the 3 OpenAI-compatible senders.
|
||||
|
||||
### Task 6d.1: Write failing tests for UsageStats in NormalizedResponse
|
||||
|
||||
For each provider test:
|
||||
```python
def test_<provider>_constructs_normalized_response_with_usage_stats() -> None:
    """_send_<provider> must use UsageStats, not separate int fields."""
    import inspect

    from src.ai_client import _send_<provider>

    src = inspect.getsource(_send_<provider>)
    # Look for the old kwargs (4 separate int fields)
    assert "usage_input_tokens=" not in src, "_send_<provider> still uses legacy usage_XXX fields"
    # Look for the new UsageStats field
    assert "usage=UsageStats(" in src or "usage=UsageStats " in src
```
|
||||
|
||||
- [ ] **Step 1: Run tests to verify they fail**
|
||||
|
||||
Run: `uv run pytest tests/test_grok_provider.py tests/test_minimax_provider.py tests/test_llama_provider.py -v`
|
||||
Expected: FAIL on the 3 new tests
|
||||
|
||||
### Task 6d.2-6d.4: Migrate each sender's `NormalizedResponse` construction
|
||||
|
||||
For each of `_send_grok`, `_send_minimax`, `_send_llama`:
|
||||
|
||||
- [ ] **Step 1: Find the `NormalizedResponse(...)` construction**
|
||||
|
||||
- [ ] **Step 2: Replace 4 separate int fields with `UsageStats(...)`**
|
||||
|
||||
Old:
|
||||
```python
NormalizedResponse(
    text=text,
    tool_calls=(),
    usage_input_tokens=in_tok,
    usage_output_tokens=out_tok,
    usage_cache_read_tokens=cache_read,
    usage_cache_creation_tokens=cache_create,
    raw_response=raw,
)
```
|
||||
|
||||
New:
|
||||
```python
NormalizedResponse(
    text=text,
    tool_calls=(),
    usage=UsageStats(
        input_tokens=in_tok,
        output_tokens=out_tok,
        cache_read_tokens=cache_read,
        cache_creation_tokens=cache_create,
    ),
    raw_response=raw,
)
```
|
||||
|
||||
- [ ] **Step 3: Run provider test**
|
||||
|
||||
Run: `uv run pytest tests/test_<provider>_provider.py -v`
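
The `Old:` to `New:` change above can be sketched with stand-in dataclasses. The field names follow the plan's `New:` block; the zero defaults, the `frozen=True` setting, and the loose `Any` types are assumptions made for the sketch, not the definitions in `src/openai_schemas.py`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class UsageStats:
    """Hypothetical stand-in; defaults of 0 are an assumption."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0


@dataclass(frozen=True)
class NormalizedResponse:
    """Hypothetical stand-in for src.openai_schemas.NormalizedResponse."""
    text: str
    tool_calls: tuple[Any, ...] = ()
    usage: UsageStats = field(default_factory=UsageStats)
    raw_response: Any = None


response = NormalizedResponse(
    text="ok",
    usage=UsageStats(input_tokens=12, output_tokens=3),
)

# The legacy keyword no longer exists, so a stale call site fails loudly.
try:
    NormalizedResponse(text="ok", usage_input_tokens=12)  # type: ignore[call-arg]
    stale_kwarg_error = ""
except TypeError as exc:
    stale_kwarg_error = str(exc)

print(response.usage, stale_kwarg_error)
```

Grouping the four counters into one object means a sender that forgets the cache fields still produces a well-formed `usage` value under these assumed defaults.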
|
||||
|
||||
### Task 6d.5: Run ALL 11 tiers FULLY per regression protocol
|
||||
|
||||
- [ ] **Step 1: Run the full batched suite**
|
||||
|
||||
Run: `uv run python scripts/run_tests_batched.py`
|
||||
Expected: all 11 tiers PASS (no stop-on-failure per the regression protocol)
|
||||
|
||||
### Task 6d.6: Phase 6d checkpoint
|
||||
|
||||
```bash
git add src/ai_client.py tests/test_grok_provider.py tests/test_minimax_provider.py tests/test_llama_provider.py
git commit -m "refactor(ai_client): migrate _send_grok/_send_minimax/_send_llama NormalizedResponse to UsageStats

Completes the NormalizedResponse migration for the 3 OpenAI-compatible senders.
They now construct UsageStats(input_tokens=, output_tokens=, cache_read_tokens=,
cache_creation_tokens=) instead of 4 separate int fields."
git notes add -m "Phase 6d checkpoint: 3 senders use UsageStats; all 11 tiers pass FULLY" HEAD
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 6e: Phase 3 Hypothetical Cost Deduction (Tier 2 authoritative deliverable)
|
||||
|
||||
Focus: While doing Phase 6b/6d work in `src/ai_client.py`, Tier 2 is reading and modifying the 3 senders anyway. They have the context to produce the authoritative Phase 3 cost analysis (deferred from `any_type_componentization_20260621`). This phase is the **Tier 2 deliverable** that supersedes Tier 1's hypothesis at `docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md`.
|
||||
|
||||
**Tier 1's hypothesis** stays as the placeholder; Tier 2's `PHASE3_TIER2_ANALYSIS.md` is the refined version with in-context, post-Phase-6b/6d-grounded estimates.
|
||||
|
||||
### Task 6e.1: Profile the 6 senders (during Phase 6b/6d work)
|
||||
|
||||
**No new code; pure analysis.** While doing Tasks 6b.3-6b.5 (migrating `_send_grok` / `_send_minimax` / `_send_llama`) and Tasks 6d.2-6d.4 (updating their `NormalizedResponse`), Tier 2 reads the surrounding code and documents:
|
||||
|
||||
For each of the 6 senders, capture in working notes:
|
||||
- All `_<provider>_history` / `_<provider>_history_lock` references (categorized: append, len/iteration, lock-acquire, with-lock-block, global-decl, helper-call)
|
||||
- Helper function call sites (`_repair_<provider>_history`, `_trim_<provider>_history`, `_strip_cache_controls`, `_add_history_cache_breakpoint`)
|
||||
- **Hidden call sites** Tier 2 discovers that Tier 1's grep missed (e.g., `_repair_anthropic_history` is called from `_send_anthropic` AND from `cleanup()` — that's a hidden cross-reference Tier 1's grep didn't see)
|
||||
|
||||
For the 3 senders NOT touched by 6b/6d (`_send_anthropic`, `_send_deepseek`, `_send_qwen`):
|
||||
- Same profiling
|
||||
- Tier 2 reads these while doing the 6b/6d work for context (they share helper patterns)
|
||||
|
||||
### Task 6e.2: Qualitative cost estimation per sender
|
||||
|
||||
For each of the 6 senders, for each codepath category:
|
||||
|
||||
| Category | Current (dict globals) | Proposed (ProviderHistory dataclass) | Per-call delta |
|---|---|---|---|
| `_<provider>_history.append(m)` | `list.append` (~100ns) | dataclass method + lock acquire (~300ns) | **+200ns per call** |
| `len(_<provider>_history)` | direct attribute (~50ns) | `.messages` attribute (~100ns) | **+50ns per call** |
| `for m in _<provider>_history:` | direct iteration | `h.get_all()` (list copy) OR `with h.lock:` | **+5-10μs per call** (if `get_all()`) |
| `with _<provider>_history_lock:` | direct lock | `with h.lock:` | **~0** (same lock) |
| `global _<provider>_history` (in cleanup) | N/A (declaration) | N/A (removed) | **N/A** |
|
||||
|
||||
For each sender, sum the per-turn overhead:
|
||||
- `_send_anthropic` (25 sites; per-turn): estimate total overhead per LLM turn
|
||||
- `_send_deepseek` (20 sites; per-turn): estimate
|
||||
- ... etc for all 6
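
The per-turn sum can be sketched as simple arithmetic over the table's per-call deltas. The deltas below are the plan's own estimates; the call counts in `example_counts` are hypothetical placeholders, since the real counts come from the 6e.1 working notes.

```python
# Per-call deltas in nanoseconds, taken from the table above.
# Iteration uses the table's 5-10 microsecond range as (low, high).
DELTA_NS: dict[str, tuple[int, int]] = {
    "append": (200, 200),
    "len": (50, 50),
    "iterate_get_all": (5_000, 10_000),
    "with_lock": (0, 0),
}


def per_turn_overhead_ns(calls_per_turn: dict[str, int]) -> tuple[int, int]:
    """Sum the (low, high) overhead in ns for one LLM turn."""
    low = sum(DELTA_NS[cat][0] * n for cat, n in calls_per_turn.items())
    high = sum(DELTA_NS[cat][1] * n for cat, n in calls_per_turn.items())
    return low, high


# Hypothetical call counts for one sender on one turn.
example_counts = {"append": 4, "len": 2, "iterate_get_all": 1, "with_lock": 3}
low, high = per_turn_overhead_ns(example_counts)
print(f"{low / 1000:.1f}-{high / 1000:.1f} us per turn")
```

With these placeholder counts a single `get_all()` iteration dominates the total, which is the reason Task 6e.3 singles out the iteration sites.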
|
||||
|
||||
### Task 6e.3: Identify the hot iteration sites that need `with h.lock:` pattern
|
||||
|
||||
**Critical:** the `_strip_cache_controls(_anthropic_history)` and `_estimate_prompt_tokens(...)` callsites iterate the list per LLM turn. If the migration uses `h.get_all()`, they pay a list-copy cost (~5-10μs per call).
|
||||
|
||||
Document each iteration site with:
|
||||
- File:line
|
||||
- Call frequency per LLM turn
|
||||
- Recommended pattern: `with h.lock: msg_list = h.messages` vs `h.get_all()`
|
||||
- Justification
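
The two candidate iteration patterns can be sketched as follows. `ProviderHistory` here is a hypothetical minimal class with only the `messages`, `lock`, `append`, and `get_all` members named in this plan; the real class may differ.

```python
import threading
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ProviderHistory:
    """Hypothetical sketch; the real class is not shown in this plan."""
    messages: list[dict[str, Any]] = field(default_factory=list)
    lock: Any = field(default_factory=threading.Lock)

    def append(self, message: dict[str, Any]) -> None:
        with self.lock:
            self.messages.append(message)

    def get_all(self) -> list[dict[str, Any]]:
        """Return a snapshot copy that is safe to iterate without the lock."""
        with self.lock:
            return list(self.messages)


h = ProviderHistory()
h.append({"role": "user", "content": "hi"})

# Pattern 1: snapshot copy. Costs one list copy per call.
snapshot = h.get_all()

# Pattern 2: iterate in place under the lock. No copy, but the lock is held
# for the whole loop, so the loop body must not call h.append()
# (threading.Lock is not reentrant and would deadlock).
with h.lock:
    msg_list = h.messages
    total_chars = sum(len(m["content"]) for m in msg_list)

h.append({"role": "assistant", "content": "hello"})
print(len(snapshot), len(h.messages), total_chars)
```

The snapshot does not see the later append, while `msg_list` aliases the live list; that difference belongs in the justification column for each site.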
|
||||
|
||||
### Task 6e.4: Author `docs/reports/PHASE3_TIER2_ANALYSIS.md`
|
||||
|
||||
**Files:**
|
||||
- Create: `docs/reports/PHASE3_TIER2_ANALYSIS.md`
|
||||
|
||||
Structure (Tier 2 produces this from the analysis in 6e.1-6e.3):
|
||||
|
||||
```markdown
# Phase 3 Hypothetical Cost Analysis (Tier 2 authoritative version)

**Author:** Tier 2 Tech Lead (autonomous sandbox)
**Date:** 2026-06-21
**Context:** Produced during `phase2_4_5_call_site_completion_20260621` Phase 6e (after Phase 6b/6d work in `src/ai_client.py`).
**Supersedes:** Tier 1's hypothesis at `docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md` (kept as the hypothesis doc; this is the refined version).

---

## 1. Methodology

Tier 2 profiled the 6 senders in `src/ai_client.py` (`_send_anthropic`, `_send_deepseek`, `_send_minimax`, `_send_grok`, `_send_qwen`, `_send_llama`) while doing the Phase 6b/6d migration work. This analysis is grounded in actual code reading + Phase 6b/6d context.

## 2. Per-Sender Codepath Catalog

### 2.1 `_send_anthropic` (25 sites)
[Fill in from 6e.1 working notes]
- Direct sites: 22 `_anthropic_history` refs; 2 `_anthropic_history_lock` refs; 1 `global` decl
- Helper sites: `_strip_cache_controls`, `_repair_anthropic_history`, `_add_history_cache_breakpoint`, `_trim_anthropic_history`
- Hidden cross-references (Tier 2 found): [list any]

### 2.2-2.6 [other senders; same structure]

## 3. Qualitative Cost Estimation

### 3.1 Per-call cost categories
[Fill in from 6e.2 table]

### 3.2 Per-sender per-turn overhead
[Fill in from 6e.2 sum]

### 3.3 Hot iteration sites (the `with h.lock:` pattern)
[Fill in from 6e.3]

## 4. Comparison vs Tier 1's Hypothesis

| Sender | Tier 1 hypothesis (μs/turn) | Tier 2 refined (μs/turn) | Delta |
|---|---|---|---|
| anthropic | +8-15 | [Tier 2 actual] | [reason] |
| deepseek | +3-7 | [Tier 2 actual] | [reason] |
| minimax | +3-7 | [Tier 2 actual] | [reason] |
| grok | +2-5 | [Tier 2 actual] | [reason] |
| qwen | +2-5 | [Tier 2 actual] | [reason] |
| llama | +4-8 | [Tier 2 actual] | [reason] |
| **Total** | **~+1.1-2.4ms/session** | [Tier 2 actual] | [reason] |

## 5. Recommendations for Future Phase 3 Track

1. **Anthropic first** (highest ROI; per-turn; cache controls)
2. **Use `with h.lock: msg_list = h.messages` pattern for hot iteration sites** (avoids `get_all()` list-copy cost)
3. **Simpler providers (qwen, grok) can use `get_all()`** since iteration is less frequent
4. **Lock semantics unchanged**: `ProviderHistory.lock` is per-instance; no cross-provider contention
5. **Hidden cross-references** discovered during this analysis [list] should be the first sites to migrate

## 6. Open Questions

[Fill in any unresolved questions; defer to the audit for runtime quantification]

## 7. See Also

- `docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md`: Tier 1's hypothesis (the "what we thought before Tier 2 looked")
- `conductor/tracks/phase2_4_5_call_site_completion_20260621/spec.md`: Phase 6e directives
- `conductor/tracks/code_path_audit_20260607/spec.md`: the audit that quantifies these estimates
- `docs/handoffs/PROMPT_FOR_TIER_1.md`: Tier 1 brief
```
|
||||
|
||||
### Task 6e.5: Phase 6e checkpoint
|
||||
|
||||
- [ ] **Step 1: Commit the analysis**
|
||||
|
||||
```bash
git add docs/reports/PHASE3_TIER2_ANALYSIS.md
git commit -m "docs(analysis): PHASE3_TIER2_ANALYSIS - authoritative Phase 3 cost hypothesis

Tier 2 produced this analysis during phase2_4_5_call_site_completion_20260621
Phase 6e. Supersedes Tier 1's draft at PHASE3_HYPOTHETICAL_PROMOTION.md (kept
as the hypothesis doc; this is the refined version with in-context data
from Phase 6b/6d work in src/ai_client.py).

Covers all 6 senders (anthropic, deepseek, minimax, grok, qwen, llama)
with per-site cost estimates + hidden cross-references + recommendations
for the future Phase 3 track. The audit (code_path_audit_20260607)
quantifies these estimates after merge."
git notes add -m "Phase 6e checkpoint: Tier 2 authoritative Phase 3 cost analysis committed" HEAD
```
|
||||
|
||||
Update `state.toml` to mark phase_6e status="completed" + checkpointsha.
|
||||
|
||||
---
|
||||
|
||||
## Verify + Archive
|
||||
|
||||
```bash
uv run python scripts/audit_weak_types.py --strict
uv run python scripts/audit_dataclass_coverage.py --strict
uv run python scripts/generate_type_registry.py --check
```
|
||||
Expected: all exit 0
|
||||
|
||||
### Task V.2: Write end-of-track report
|
||||
|
||||
Create `docs/reports/TRACK_COMPLETION_phase2_4_5_call_site_completion_20260621.md` covering:
|
||||
- Executive summary (16 commits; 3 phases; the broadcast() fix; the 3 OpenAI-compatible senders migrated)
|
||||
- The broadcast() TypeError bug (root cause + fix)
|
||||
- The Phase 2 migration completion (3 senders now use ChatMessage + UsageStats)
|
||||
- The regression protocol (run all 11 tiers FULLY; the no-TypeError assertion)
|
||||
- Verification commands + results
|
||||
- What's still deferred (Phase 3 + cross-phase coupling + sandbox fixes)
|
||||
- Follow-up: code_path_audit_20260607 (now unblocked)
|
||||
|
||||
```bash
|
||||
git add docs/reports/TRACK_COMPLETION_phase2_4_5_call_site_completion_20260621.md
|
||||
git commit -m "docs(reports): TRACK_COMPLETION_phase2_4_5_call_site_completion_20260621"
|
||||
```
|
||||
|
||||
### Task V.3: Archive + tracks.md update
|
||||
|
||||
```bash
|
||||
git mv conductor/tracks/phase2_4_5_call_site_completion_20260621 conductor/tracks/archive/
|
||||
```
|
||||
|
||||
Update `conductor/tracks.md` to move the entry to "Recently Completed."
|
||||
|
||||
Update `state.toml` to mark all phases completed.
|
||||
|
||||
```bash
git add -A
git commit -m "conductor(archive): ship phase2_4_5_call_site_completion_20260621 to archive"
git notes add -m "TRACK COMPLETE: phase2_4_5_call_site_completion_20260621. broadcast() TypeError fixed; 3 OpenAI-compatible senders migrated to ChatMessage + UsageStats; test_websocket_broadcast_regression.py added with no-TypeError assertion. Unblocks code_path_audit_20260607." HEAD
```
|
||||
|
||||
---
|
||||
|
||||
## Self-Review
|
||||
|
||||
**1. Spec coverage check:** Every section in `spec.md` maps to a task in this plan.
|
||||
|
||||
| Spec section | Plan coverage |
|---|---|
| §1 Overview | Background; goal stated at top of plan |
| §2 Goals (A/A/B/C/D) | Phase 6a (A: broadcast) + Phase 6b (A: OpenAICompatibleRequest) + Phase 6d (B: NormalizedResponse) + regression protocol across all phases |
| §3 Architecture | §3.1-3.3 → Phase 6a (broadcast fix) + Phase 6b-6d (sender migration) |
| §4 Per-Phase Plan | Phase 6a (Tasks 6a.1-6a.7) + Phase 6b (Tasks 6b.1-6b.7) + Phase 6d (Tasks 6d.1-6d.6) |
| §5 Configuration | No new deps (consistent throughout) |
| §6 Testing Strategy | Each Phase has tests; regression protocol task V.5 |
| §7 Migration / Rollout | 3 phases × ~5 commits each = ~16 atomic commits |
| §8 Risks | Addressed via regression protocol + Tier 1 audit-base verification |
| §9 Out of Scope | Phase 3 + cross-phase coupling + sandbox fixes + flake: documented as deferred |
| §10 Verification Criteria | All 14 items covered in tasks V.1-V.3 + per-phase tests |
|
||||
|
||||
**2. Placeholder scan:** No "TBD", "TODO", "fill in details" in actionable steps.
|
||||
|
||||
**3. Type consistency:** `WebSocketMessage`, `ChatMessage`, `UsageStats`, `NormalizedResponse`, `OpenAICompatibleRequest` used consistently with the parent track's `src/openai_schemas.py` + `src/api_hooks.py`.
|
||||
|
||||
**4. Ambiguity:** Step descriptions are concrete (specific file:line refs, full code blocks, exact verification commands).
|
||||
|
||||
---
|
||||
|
||||
## Execution Handoff
|
||||
|
||||
Plan complete and saved to `conductor/tracks/phase2_4_5_call_site_completion_20260621/plan.md`.
|
||||
|
||||
**Tier 2 autonomous sandbox command:**
|
||||
```
|
||||
/tier-2-auto-execute phase2_4_5_call_site_completion_20260621
|
||||
```
|
||||
(or `uv run python scripts/mma_exec.py --role tier2-autonomous --track phase2_4_5_call_site_completion_20260621`)
|
||||
|
||||
**Pre-flight:**
|
||||
1. Tier 2 creates `tier2/phase2_4_5_call_site_completion_20260621` branch from `master`
|
||||
2. Phase 6a starts immediately (the broadcast() bug fix is the unblocker for the audit)
|
||||
3. After Phase 6a lands: run `tier-1-unit-core` FULLY per the regression protocol
|
||||
4. After all phases: archive + end-of-track report
|
||||
5. Tier 1 reviews + merges
|
||||
6. After merge: launch `code_path_audit_20260607` (the audit's pre-flight adjustments are committed; it can start)
|
||||
|
||||
**Estimated runtime:** ~3 hours Tier 2 work; ~16 atomic commits; 3 phases with checkpoint commits.
|
||||
@@ -0,0 +1,256 @@
|
||||
# Track: Phase 2/4/5 Call-Site Completion (post `any_type_componentization_20260621`)
|
||||
|
||||
**Status:** Active (spec approved 2026-06-21)
|
||||
**Initialized:** 2026-06-21
|
||||
**Owner:** Tier 2 Tech Lead (autonomous sandbox recommended)
|
||||
**Priority:** A (blocks `code_path_audit_20260607`; runtime TypeError pollutes audit instrumentation)
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
The `any_type_componentization_20260621` track shipped 48 of 89 fat-struct promotions across 6 phases but **deferred Phase 3** (41 `ProviderHistory` call sites in `src/ai_client.py`) and **left 1 runtime bug**: the Phase 5 `HookServer.broadcast()` signature change (from `(channel, payload)` to `(message: WebSocketMessage)`) was not propagated to internal callers in `src/app_controller.py`, `src/events.py`, and `src/gui_2.py`. This produces `worker[queue_fallback] error: WebSocketServer.broadcast() takes 2 positional arguments but 3 were given` spam on the GUI thread.
|
||||
|
||||
**Tier 1's decision (per `docs/handoffs/PROMPT_FOR_TIER_1.md`):** **SHRINK** the follow-up to **Phases 6a + 6b + 6d** only. Defer Phase 3 (`provider_state` call-site migration) to a separate track after `code_path_audit_20260607` provides runtime cost data.
|
||||
|
||||
**This track does 3 things:**
|
||||
1. **Phase 6a** — Fix the runtime bug: migrate `HookServer.broadcast()` callers to the new `WebSocketMessage` signature. Adds a "no-TypeError-errors-on-any-thread" regression test that `code_path_audit_20260607` will reuse.
|
||||
2. **Phase 6b** — Complete the Phase 2 t2_6 deferred task: migrate `_send_grok` / `_send_minimax` / `_send_llama` to construct `OpenAICompatibleRequest(messages=[ChatMessage(...)], ...)` instead of the legacy `messages=[{"role": ..., "content": ...}]` shape. The 3 OpenAI-compatible providers are currently unprofiled and untyped at the call site.
|
||||
3. **Phase 6d** — Update those 3 senders' `NormalizedResponse(text=..., usage_input_tokens=..., ...)` construction to `NormalizedResponse(text=..., usage=UsageStats(...))` (the dataclass signature change from Phase 2).
|
||||
|
||||
**Phase 6c (full ProviderHistory migration in `ai_client.py`) is explicitly OUT OF SCOPE.** It gets its own track after `code_path_audit_20260607` produces per-action cost data.
|
||||
|
||||
## 2. Goals (Priority Order)
|
||||
|
||||
| Priority | Goal | Why |
|---|---|---|
| **A (blocker)** | Phase 6a: Fix `HookServer.broadcast()` callers; no TypeError spam | Unblocks `code_path_audit_20260607` (TypeError spam contaminates per-action timing) |
| **A (blocker)** | Phase 6b: Complete `_send_grok` / `_send_minimax` / `_send_llama` `OpenAICompatibleRequest` migration | The 3 OpenAI-compatible providers were skipped in Phase 2; they're now the only un-migrated senders |
| **B (consistency)** | Phase 6d: Update those 3 senders' `NormalizedResponse` to use `UsageStats` | Mirrors the migration done for `_send_anthropic` and the openai_compatible.py internal functions |
| **C (audit-input)** | Establish a regression protocol: after any Phase-style refactor, run the FULL `tier-1-unit-core` tier, not targeted tests | The 10 test failures in `any_type_componentization_20260621` came from running targeted tests instead of the full tier |
| **D (audit-input)** | Add a "no-TypeError-errors-on-any-thread" assertion that `code_path_audit_20260607` will reuse | The assertion catches the broadcast() regression in any future Phase-style refactor |
|
||||
|
||||
### 2.1 Non-Goals (this track)
|
||||
|
||||
- **NOT** migrating the 41 `_<provider>_history` call sites in `src/ai_client.py` to `provider_state.get_history('anthropic')`. Phase 3 deferred to a separate track post-audit.
|
||||
- **NOT** the cross-phase coupling fix (`OpenAICompatibleRequest.tools: list[dict[str, Any]]` → `list[ToolSpec]`). Deferred.
|
||||
- **NOT** the `audit_tier2_leaks.py` 3 sandbox-pollution failures. The user's `tier2/` sandbox harness modifies `mcp_paths.toml` + `opencode.json` + `.opencode/*`; the audit script needs an `--allowlist` for these (separate infra track).
|
||||
- **NOT** the pre-existing `test_gui2_custom_callback_hook_works` flake. Pre-existing; not introduced by this track.
|
||||
- **NOT** merging the `tier2/any_type_componentization_20260621` branch. Per Tier 2's recommendation, the branch stays as reconnaissance input; this track cherry-picks only the fixes, not the full branch.
|
||||
|
||||
## 3. Architecture
|
||||
|
||||
### 3.1 The Bug: Phase 5's `broadcast()` signature change
|
||||
|
||||
Phase 5 commit `e9fa69dd` refactored `HookServer.broadcast()`:
|
||||
|
||||
```python
# BEFORE Phase 5
def broadcast(self, channel: str, payload: dict[str, Any]) -> None:
    ...

# AFTER Phase 5 (src/api_hooks.py)
def broadcast(self, message: WebSocketMessage) -> None:
    ...
```
|
||||
|
||||
**Internal callers NOT updated by Phase 5:**
|
||||
- `src/app_controller.py:_run_pending_tasks_once_result` — broadcasts task results to the WebSocket pipeline per pending GUI task
|
||||
- `src/events.py` — broadcasts events emitted by the `AsyncEventQueue`
|
||||
- `src/gui_2.py:_process_pending_gui_tasks` — broadcasts from the GUI thread's pending-task queue
|
||||
|
||||
**Fix:** Replace `broadcast("channel", payload_dict)` with `broadcast(WebSocketMessage(channel="channel", payload=payload_dict))`.
|
||||
|
||||
### 3.2 The Missing Senders: 3 OpenAI-Compatible Providers
|
||||
|
||||
The 3 OpenAI-compatible senders in `src/ai_client.py`:
|
||||
- `_send_grok` (L2532)
|
||||
- `_send_minimax` (L2616)
|
||||
- `_send_llama` (L2856)
|
||||
|
||||
(Plus `_send_llama_native` at L2954, which is a different code path.)
|
||||
|
||||
These senders construct `OpenAICompatibleRequest(messages=[...], model=..., ...)` with the **legacy** shape:
|
||||
```python
|
||||
messages=[{"role": "user", "content": user_content}]
|
||||
```
|
||||
|
||||
After this track:
|
||||
```python
|
||||
messages=[ChatMessage(role="user", content=user_content)]
|
||||
```
|
||||
|
||||
Likewise, `NormalizedResponse(text=..., usage_input_tokens=..., usage_output_tokens=...)` becomes:
|
||||
```python
|
||||
NormalizedResponse(text=text, tool_calls=(), usage=UsageStats(input_tokens=t_in, output_tokens=t_out), raw_response=raw)
|
||||
```
|
||||
|
||||
### 3.3 The Regression Protocol
|
||||
|
||||
After this track, the protocol for any Phase-style refactor is:
|
||||
|
||||
1. After implementing each phase, run the FULL `tier-1-unit-core` tier (not targeted tests). Targeted tests miss call sites in helper functions / cross-file consumers.
|
||||
2. After all phases complete, run `tier-1-unit-core` + `tier-1-unit-mma` + `tier-2-mock-app-core` + `tier-3-live_gui` FULLY (no stop-on-failure).
|
||||
3. The "no-TypeError-errors-on-any-thread" assertion in `tests/test_websocket_broadcast_regression.py` is the canonical regression test. `code_path_audit_20260607` will reuse this assertion in its per-action profiling.
|
||||
|
||||
## 4. Per-Phase Plan
|
||||
|
||||
### Phase 6a: Fix `HookServer.broadcast()` Callers
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/app_controller.py:_run_pending_tasks_once_result`
|
||||
- Modify: `src/events.py` (broadcast sites)
|
||||
- Modify: `src/gui_2.py:_process_pending_gui_tasks`
|
||||
- Create: `tests/test_websocket_broadcast_regression.py`
|
||||
|
||||
**Approach:**
|
||||
1. Grep `\.broadcast\(` in `src/` to find all internal callers
|
||||
2. For each: replace `broadcast(channel_str, payload_dict)` with `broadcast(WebSocketMessage(channel=channel_str, payload=payload_dict))`
|
||||
3. Add regression test: simulate a GUI task that triggers broadcast and assert no TypeError in stderr
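
The regression check in step 3 can be sketched as below. The worker loop, the `worker error:` log format, and the stand-in `broadcast` function are all assumptions made for illustration; the real test would drive the actual GUI task pipeline instead.

```python
import contextlib
import io
import sys
import threading
from collections.abc import Callable


def run_worker(task: Callable[[], None]) -> None:
    """Run a task on a worker thread; report failures to stderr like a catch-all loop."""
    def target() -> None:
        try:
            task()
        except Exception as exc:
            print(f"worker error: {type(exc).__name__}: {exc}", file=sys.stderr)

    thread = threading.Thread(target=target)
    thread.start()
    thread.join()


def capture_stderr(task: Callable[[], None]) -> str:
    """Return everything the worker wrote to stderr while running the task."""
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        run_worker(task)
    return buffer.getvalue()


def broadcast(message: object) -> None:
    """Stand-in with the post-Phase-5 one-argument shape."""


good_output = capture_stderr(lambda: broadcast({"channel": "tasks"}))
# Legacy two-argument call, as the un-migrated callers still make it.
bad_output = capture_stderr(lambda: broadcast("tasks", {"status": "done"}))

assert "TypeError" not in good_output
print(bad_output.strip())
```

Because `redirect_stderr` swaps `sys.stderr` process-wide, output from the worker thread is captured too, which is what lets the assertion cover errors raised off the main thread.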
|
||||
|
||||
**Why this matters for code_path_audit:**
|
||||
The audit's per-action profiling assumes no TypeError spam on the GUI thread. The Phase 6a fix makes the GUI's broadcast pipeline type-safe; the audit can then measure `WebSocketMessage.__init__` overhead per broadcast without TypeError contamination.
|
||||
|
||||
### Phase 6b: Complete `_send_grok` / `_send_minimax` / `_send_llama` `OpenAICompatibleRequest` Migration
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ai_client.py:_send_grok` (L2532)
|
||||
- Modify: `src/ai_client.py:_send_minimax` (L2616)
|
||||
- Modify: `src/ai_client.py:_send_llama` (L2856)
|
||||
- Modify: `tests/test_grok_provider.py` if it exists
|
||||
- Modify: `tests/test_minimax_provider.py` if it exists
|
||||
- Modify: `tests/test_llama_provider.py` if it exists
|
||||
|
||||
**Approach:**
|
||||
1. In each sender, replace `messages=[{"role": "user", "content": ...}]` with `messages=[ChatMessage(role="user", content=...)]`
|
||||
2. Update `OpenAICompatibleRequest` field-by-field to use `ChatMessage` everywhere
|
||||
3. Run provider tests + integration tests
|
||||
|
||||
### Phase 6d: Update Those Senders' `NormalizedResponse` Construction
|
||||
|
||||
**Files:** Same as 6b.
|
||||
|
||||
**Approach:**
|
||||
1. In each sender, replace `NormalizedResponse(text=..., usage_input_tokens=X, usage_output_tokens=Y, usage_cache_read_tokens=Z, usage_cache_creation_tokens=W, raw_response=R)` with `NormalizedResponse(text=..., tool_calls=(), usage=UsageStats(input_tokens=X, output_tokens=Y, cache_read_tokens=Z, cache_creation_tokens=W), raw_response=R)`
|
||||
2. Add import: `from src.openai_schemas import ChatMessage, NormalizedResponse, OpenAICompatibleRequest, UsageStats`
|
||||
3. Run provider tests + integration tests
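
The flat-to-nested mapping in step 1 can be illustrated with a small sketch. The field names come from the step above; the dataclass definitions themselves, the defaults, and the `migrate_flat_response` helper are assumptions for the example, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class UsageStats:
    """Assumed shape, using the four field names listed in step 1."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0


@dataclass(frozen=True)
class NormalizedResponse:
    """Assumed shape; defaults are illustrative."""
    text: str
    tool_calls: tuple[Any, ...] = ()
    usage: UsageStats = field(default_factory=UsageStats)
    raw_response: Any = None


def migrate_flat_response(flat: dict[str, Any]) -> NormalizedResponse:
    """Hypothetical helper: map the old flat usage_* kwargs onto UsageStats."""
    usage = UsageStats(
        input_tokens=flat.get("usage_input_tokens", 0),
        output_tokens=flat.get("usage_output_tokens", 0),
        cache_read_tokens=flat.get("usage_cache_read_tokens", 0),
        cache_creation_tokens=flat.get("usage_cache_creation_tokens", 0),
    )
    return NormalizedResponse(
        text=flat["text"],
        tool_calls=(),
        usage=usage,
        raw_response=flat.get("raw_response"),
    )


old_kwargs = {
    "text": "ok",
    "usage_input_tokens": 120,
    "usage_output_tokens": 30,
    "usage_cache_read_tokens": 100,
    "usage_cache_creation_tokens": 0,
    "raw_response": {"id": "example"},
}
response = migrate_flat_response(old_kwargs)
```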
|
||||
|
||||
### Phase 6e: Phase 3 Hypothetical Cost Deduction (Tier 2 deliverable)
|
||||
|
||||
**Goal:** Produce the authoritative Phase 3 hypothetical cost analysis as a Tier 2 deliverable. The deferred Phase 3 (`provider_state.ProviderHistory` call-site migration in `src/ai_client.py`) needs runtime cost data BEFORE the migration; Tier 2 produces this analysis as part of the follow-up track because they're already in `src/ai_client.py` doing the Phase 6b/6d work and have full context.
|
||||
|
||||
**Tier 1's draft** at `docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md` stays as the hypothesis document (Tier 1's qualitative estimates). **Tier 2's authoritative analysis** is a separate document at `docs/reports/PHASE3_TIER2_ANALYSIS.md` that supersedes the hypothesis with in-context, post-Phase-6b/6d-grounded estimates.
|
||||
|
||||
**Files:**
|
||||
- Create: `docs/reports/PHASE3_TIER2_ANALYSIS.md`
|
||||
- Modify: `conductor/tracks/phase2_4_5_call_site_completion_20260621/spec.md` (this section)
|
||||
|
||||
**Approach:**
|
||||
1. **Profile each of the 6 senders** (Tier 2 reads them while doing the 6b/6d work; the cost analysis happens during 6b/6d, with a final consolidation commit at the end of 6e):
|
||||
- `_send_anthropic` (25 sites; Hot per-turn; uses cache-control helpers)
|
||||
- `_send_deepseek` (20 sites; Hot per-turn; has `_repair_deepseek_history` helper)
|
||||
- `_send_minimax` (21 sites; Hot per-turn; has `_repair_minimax_history` + `_trim_minimax_history` helpers)
|
||||
- `_send_grok` (13 sites; Hot per-turn; **being touched in 6b/6d**)
|
||||
- `_send_qwen` (12 sites; Hot per-turn; simpler pattern)
|
||||
- `_send_llama` (21 sites; Hot per-turn; highest lock count; **being touched in 6b/6d**)
|
||||
2. **For each sender, document:**
|
||||
- Direct `_<provider>_history` / `_<provider>_history_lock` sites, e.g. `_anthropic_history` / `_anthropic_history_lock` (categorized as: append, len/iteration, lock-acquire, with-lock-block, global-decl, helper-call)
|
||||
- Helper function call sites (`_repair_<provider>_history`, `_trim_<provider>_history`, `_strip_cache_controls`, `_add_history_cache_breakpoint`)
|
||||
- Hidden call sites discovered while doing the 6b/6d work (e.g., `_repair_anthropic_history` is called from `_send_anthropic` AND from `cleanup()` — that's a hidden cross-reference)
|
||||
3. **For each category, qualitatively estimate:**
|
||||
- Per-call cost delta: `dict append` (current) vs `dataclass.append` (proposed)
|
||||
- Lock acquire cost: `threading.Lock` (current) vs `ProviderHistory.lock` (proposed) — should be ~identical but document any surprises
|
||||
- `get_all()` list-copy cost: bounded by history length (~10-50 messages); estimate ~5μs per copy
|
||||
- **Critical:** the `_strip_cache_controls(_anthropic_history)` and `_estimate_prompt_tokens(...)` call sites iterate the list; if they go through `get_all()`, every call copies the list. Recommendation: for hot iteration sites, hold the lock and iterate `h.messages` directly (`with h.lock: msg_list = h.messages`, with the iteration kept inside the `with` block) instead of calling `h.get_all()`
|
||||
4. **Author `docs/reports/PHASE3_TIER2_ANALYSIS.md`:**
|
||||
- Per-sender cost summary table (compare Tier 1's hypothesis vs Tier 2's refined estimate)
|
||||
- Hidden call sites table (call sites Tier 2 discovered that Tier 1's grep missed)
|
||||
- Recommendations for the future Phase 3 track:
|
||||
- Use `with h.lock:` blocks for hot iteration sites
|
||||
- The Anthropic cache-control helpers are the highest-value target (~25 sites, per-turn)
|
||||
- The simpler providers (qwen, grok) can use `get_all()` since iteration is less frequent
|
||||
- Cross-references Tier 1's hypothesis explicitly: "Tier 1's draft is the hypothesis; this is the refined version after Phase 6b/6d context."
|
||||
- Roll-up: total estimated cost per session (~50 turns) for the Phase 3 migration; comparison vs Tier 1's hypothesis
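
The `get_all()` versus `with h.lock:` trade-off from step 3 can be sketched as below. The names `ProviderHistory`, `lock`, `messages`, and `get_all()` come from the text above, but this implementation is an assumption for illustration, and the two `count_chars_*` functions are hypothetical stand-ins for a hot iteration site such as token estimation. The timings it prints depend on the machine and are not a substitute for the audit's measurements.

```python
import threading
import timeit
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ProviderHistory:
    """Assumed shape: a message list guarded by a non-reentrant lock."""
    messages: list[dict[str, Any]] = field(default_factory=list)
    lock: Any = field(default_factory=threading.Lock, repr=False, compare=False)

    def append(self, message: dict[str, Any]) -> None:
        with self.lock:
            self.messages.append(message)

    def get_all(self) -> list[dict[str, Any]]:
        # Returns a shallow copy, so the cost grows with history length.
        with self.lock:
            return list(self.messages)


def count_chars_copy(h: ProviderHistory) -> int:
    """Iterate over a copy: safe after the lock is released, pays for the copy."""
    return sum(len(m["content"]) for m in h.get_all())


def count_chars_locked(h: ProviderHistory) -> int:
    """Iterate in place while holding the lock: no copy is made.

    The loop must stay inside the `with` block, and must not call
    h.append() or h.get_all(), which would deadlock on the same lock.
    """
    with h.lock:
        msg_list = h.messages
        return sum(len(m["content"]) for m in msg_list)


history = ProviderHistory()
for i in range(50):  # upper end of the ~10-50 message range mentioned above
    history.append({"role": "user", "content": f"message {i}"})

runs = 10_000
copy_s = timeit.timeit(lambda: count_chars_copy(history), number=runs)
locked_s = timeit.timeit(lambda: count_chars_locked(history), number=runs)
print(f"get_all() copy : {copy_s / runs * 1e6:.2f} us per call")
print(f"with h.lock    : {locked_s / runs * 1e6:.2f} us per call")
```

The locked variant holds the lock for the whole iteration, so it trades copy cost for longer lock hold time; that trade-off is part of what the per-sender analysis would need to weigh.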
|
||||
|
||||
**Why this matters:**
|
||||
- The future Phase 3 track needs this data to scope its phases correctly (e.g., "do the Anthropic helpers first because they're hot; defer the simpler providers to a later phase")
|
||||
- The audit will quantify these estimates after the merge; this is the pre-audit hypothesis refinement
|
||||
- Tier 2 is the right entity to produce this because they have the actual code context after Phase 6b/6d
|
||||
|
||||
**Verification:**
|
||||
- `docs/reports/PHASE3_TIER2_ANALYSIS.md` committed
|
||||
- All 6 senders profiled
|
||||
- Total estimated cost per session documented
|
||||
- Hidden call sites table documented
|
||||
- Recommendations for future Phase 3 track documented
|
||||
- Cross-reference to Tier 1's hypothesis explicit
|
||||
|
||||
## 5. Configuration
|
||||
|
||||
No new dependencies. No new config files.
|
||||
|
||||
## 6. Testing Strategy
|
||||
|
||||
| Test File | Purpose |
|---|---|
| `tests/test_websocket_broadcast_regression.py` (NEW) | Verify no TypeError spam on GUI thread after broadcast() callers are fixed |
| `tests/test_grok_provider.py` (extend) | Verify `_send_grok` uses ChatMessage + UsageStats |
| `tests/test_minimax_provider.py` (extend) | Verify `_send_minimax` uses ChatMessage + UsageStats |
| `tests/test_llama_provider.py` (extend) | Verify `_send_llama` uses ChatMessage + UsageStats |
|
||||
|
||||
**Verification protocol (the lesson from `any_type_componentization_20260621`):**
|
||||
- After each Phase, run `uv run python scripts/run_tests_batched.py --tier tier-1-unit-core` FULLY (no stop-on-failure)
|
||||
- After all Phases complete, run all 11 tiers FULLY
|
||||
|
||||
## 7. Migration / Rollout
|
||||
|
||||
| Phase | What | Commits |
|---|---|---|
| 6a | `HookServer.broadcast()` callers fixed; `test_websocket_broadcast_regression.py` added | ~5-7 |
| 6b | `_send_grok/minimax/llama` OpenAICompatibleRequest migration | ~3-5 |
| 6d | `_send_grok/minimax/llama` NormalizedResponse migration | ~3-4 |
| Total | | ~11-16 |
|
||||
|
||||
Each phase has its own checkpoint commit and git note.
|
||||
|
||||
## 8. Risks & Mitigations
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Grep misses an internal broadcast() caller | Low | Medium | Also check `tests/` for callers; assert "no TypeError spam" on the full 11-tier run |
| `_send_grok/minimax/llama` test coverage is thin | Medium | Low | The 3 providers are exercised in `tests/test_*provider*.py`; if tests don't exist, add a smoke test |
| The "no-TypeError" assertion is too strict (false positives) | Low | Low | Wrap in `try/except queue_fallback`; assert "no broadcast() TypeError specifically" |
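
One way to keep the assertion narrow, as the last mitigation suggests, is to filter captured stderr for TypeErrors that mention `broadcast()` and ignore everything else. The sketch below is an assumption about how that could look: the `broadcast_type_errors` helper, the regular expression, and the sample stderr text are all hypothetical, and the pattern relies on the standard CPython message format for a wrong-arity call.

```python
import re

# Matches lines such as:
#   TypeError: HookServer.broadcast() takes 2 positional arguments but 3 were given
BROADCAST_TYPEERROR = re.compile(r"TypeError: .*broadcast\(\)")


def broadcast_type_errors(stderr_text: str) -> list[str]:
    """Return only the stderr lines that report a broadcast() TypeError."""
    return [
        line
        for line in stderr_text.splitlines()
        if BROADCAST_TYPEERROR.search(line)
    ]


# Illustrative captured output: one unrelated TypeError, one broadcast() TypeError.
sample_stderr = "\n".join(
    [
        "TypeError: unsupported operand type(s) for +: 'int' and 'str'",
        "TypeError: HookServer.broadcast() takes 2 positional arguments but 3 were given",
        "INFO gui thread idle",
    ]
)

offending = broadcast_type_errors(sample_stderr)
clean = broadcast_type_errors("INFO gui thread idle\n")
```

A regression test would then assert that the filtered list is empty for the real run, so unrelated TypeErrors elsewhere do not produce false positives.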
|
||||
|
||||
## 9. Out of Scope
|
||||
|
||||
- **Phase 3 (`provider_state` call-site migration).** Deferred to a separate track after `code_path_audit_20260607` provides runtime cost data.
|
||||
- **Cross-phase coupling** (`OpenAICompatibleRequest.tools: list[ToolSpec]`). Deferred.
|
||||
- **`audit_tier2_leaks.py` sandbox-pollution failures.** Separate infra track.
|
||||
- **Pre-existing `test_gui2_custom_callback_hook_works` flake.** Separate investigation.
|
||||
- **Merging `tier2/any_type_componentization_20260621` branch.** Per Tier 2's recommendation, the branch stays as reconnaissance; this track cherry-picks only the fixes.
|
||||
|
||||
## 10. Verification Criteria
|
||||
|
||||
- [ ] `src/app_controller.py:_run_pending_tasks_once_result` uses `broadcast(WebSocketMessage(...))`
|
||||
- [ ] `src/events.py` broadcast callers use `WebSocketMessage`
|
||||
- [ ] `src/gui_2.py:_process_pending_gui_tasks` broadcast callers use `WebSocketMessage`
|
||||
- [ ] `tests/test_websocket_broadcast_regression.py` exists; asserts no broadcast() TypeError
|
||||
- [ ] `_send_grok` constructs `OpenAICompatibleRequest(messages=[ChatMessage(...)], ...)`
|
||||
- [ ] `_send_minimax` constructs `OpenAICompatibleRequest(messages=[ChatMessage(...)], ...)`
|
||||
- [ ] `_send_llama` constructs `OpenAICompatibleRequest(messages=[ChatMessage(...)], ...)`
|
||||
- [ ] `_send_grok` constructs `NormalizedResponse(text=..., usage=UsageStats(...), ...)`
|
||||
- [ ] `_send_minimax` constructs `NormalizedResponse(text=..., usage=UsageStats(...), ...)`
|
||||
- [ ] `_send_llama` constructs `NormalizedResponse(text=..., usage=UsageStats(...), ...)`
|
||||
- [ ] The full 11-tier batched test run passes (no stop-on-failure)
|
||||
- [ ] `audit_weak_types.py --strict` exits 0
|
||||
- [ ] `audit_dataclass_coverage.py --strict` exits 0
|
||||
- [ ] End-of-track report at `docs/reports/TRACK_COMPLETION_phase2_4_5_call_site_completion_20260621.md`
|
||||
|
||||
## 11. See Also
|
||||
|
||||
- `docs/handoffs/PROMPT_FOR_TIER_1.md` — Tier 1 brief from Tier 2
|
||||
- `docs/handoffs/HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md` — test failure categorization
|
||||
- `docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md` — runtime cost framing
|
||||
- `conductor/tracks/any_type_componentization_20260621/spec.md` — parent track spec
|
||||
- `conductor/tracks/code_path_audit_20260607/spec.md` — the audit (this track unblocks it)
|
||||
- `docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md` — the Phase 3 hypothetical analysis (separate doc)
|
||||
@@ -0,0 +1,84 @@
|
||||
# Track state for phase2_4_5_call_site_completion_20260621
|
||||
# Updated by Tier 2 Tech Lead as tasks complete
|
||||
|
||||
[meta]
|
||||
track_id = "phase2_4_5_call_site_completion_20260621"
|
||||
name = "Phase 2/4/5 Call-Site Completion (post any_type_componentization)"
|
||||
status = "active"
|
||||
current_phase = 0
|
||||
last_updated = "2026-06-21"
|
||||
|
||||
[blocked_by]
|
||||
# No blockers; this track unblocks the audit
|
||||
|
||||
[blocks]
|
||||
code_path_audit_20260607 = "blocked_until_merge"
|
||||
|
||||
[phases]
|
||||
phase_6a = { status = "pending", checkpointsha = "", name = "Fix HookServer.broadcast() callers" }
|
||||
phase_6b = { status = "pending", checkpointsha = "", name = "Complete OpenAICompatibleRequest migration" }
|
||||
phase_6d = { status = "pending", checkpointsha = "", name = "Update NormalizedResponse construction" }
|
||||
phase_6e = { status = "pending", checkpointsha = "", name = "Phase 3 Hypothetical Cost Deduction (Tier 2 authoritative deliverable)" }
|
||||
|
||||
[tasks]
|
||||
# Phase 6a: Fix HookServer.broadcast() callers
|
||||
t6a_1 = { status = "pending", commit_sha = "", description = "Grep src/ for all .broadcast( callers; document the list (expect ~5-10 sites)" }
|
||||
t6a_2 = { status = "pending", commit_sha = "", description = "Red: tests/test_websocket_broadcast_regression.py (verify no broadcast() TypeError on GUI thread)" }
|
||||
t6a_3 = { status = "pending", commit_sha = "", description = "Fix src/app_controller.py:_run_pending_tasks_once_result broadcast callers" }
|
||||
t6a_4 = { status = "pending", commit_sha = "", description = "Fix src/events.py broadcast callers" }
|
||||
t6a_5 = { status = "pending", commit_sha = "", description = "Fix src/gui_2.py:_process_pending_gui_tasks broadcast callers" }
|
||||
t6a_6 = { status = "pending", commit_sha = "", description = "Run tier-1-unit-core FULLY (no stop-on-failure) per regression protocol" }
|
||||
t6a_7 = { status = "pending", commit_sha = "", description = "Phase 6a checkpoint commit + git note" }
|
||||
# Phase 6b: OpenAICompatibleRequest migration
|
||||
t6b_1 = { status = "pending", commit_sha = "", description = "Identify tests/test_grok_provider.py + test_minimax_provider.py + test_llama_provider.py; if absent, add smoke tests" }
|
||||
t6b_2 = { status = "pending", commit_sha = "", description = "Red: tests for ChatMessage in OpenAICompatibleRequest construction (grok/minimax/llama senders)" }
|
||||
t6b_3 = { status = "pending", commit_sha = "", description = "Migrate src/ai_client.py:_send_grok messages construction to ChatMessage" }
|
||||
t6b_4 = { status = "pending", commit_sha = "", description = "Migrate src/ai_client.py:_send_minimax messages construction to ChatMessage" }
|
||||
t6b_5 = { status = "pending", commit_sha = "", description = "Migrate src/ai_client.py:_send_llama messages construction to ChatMessage" }
|
||||
t6b_6 = { status = "pending", commit_sha = "", description = "Run tier-1-unit-core + provider tests FULLY" }
|
||||
t6b_7 = { status = "pending", commit_sha = "", description = "Phase 6b checkpoint commit + git note" }
|
||||
# Phase 6d: NormalizedResponse construction
|
||||
t6d_1 = { status = "pending", commit_sha = "", description = "Red: tests for UsageStats in NormalizedResponse construction (grok/minimax/llama senders)" }
|
||||
t6d_2 = { status = "pending", commit_sha = "", description = "Migrate src/ai_client.py:_send_grok NormalizedResponse to use UsageStats" }
|
||||
t6d_3 = { status = "pending", commit_sha = "", description = "Migrate src/ai_client.py:_send_minimax NormalizedResponse to use UsageStats" }
|
||||
t6d_4 = { status = "pending", commit_sha = "", description = "Migrate src/ai_client.py:_send_llama NormalizedResponse to use UsageStats" }
|
||||
t6d_5 = { status = "pending", commit_sha = "", description = "Run tier-1-unit-core + provider tests FULLY" }
|
||||
t6d_6 = { status = "pending", commit_sha = "", description = "All 11 tiers FULLY (no stop-on-failure) per regression protocol" }
|
||||
t6d_7 = { status = "pending", commit_sha = "", description = "Phase 6d checkpoint commit + git note" }
|
||||
# Verify + archive
|
||||
tv_1 = { status = "pending", commit_sha = "", description = "Run audit_weak_types.py --strict + audit_dataclass_coverage.py --strict (both exit 0)" }
|
||||
tv_2 = { status = "pending", commit_sha = "", description = "Run generate_type_registry.py --check (exit 0)" }
|
||||
tv_3 = { status = "pending", commit_sha = "", description = "Write docs/reports/TRACK_COMPLETION_phase2_4_5_call_site_completion_20260621.md" }
|
||||
tv_4 = { status = "pending", commit_sha = "", description = "git mv to conductor/tracks/archive/" }
|
||||
tv_5 = { status = "pending", commit_sha = "", description = "Update conductor/tracks.md" }
|
||||
# Phase 6e: Phase 3 Hypothetical Cost Deduction
|
||||
t6e_1 = { status = "pending", commit_sha = "", description = "Profile the 6 senders (during 6b/6d work): codepath catalog + helper call sites + hidden cross-references Tier 1's grep missed" }
|
||||
t6e_2 = { status = "pending", commit_sha = "", description = "Qualitative cost estimation per sender (per-call categories: append / len / iteration / lock-acquire / with-lock / global-decl / helper-call)" }
|
||||
t6e_3 = { status = "pending", commit_sha = "", description = "Identify hot iteration sites that need 'with h.lock: msg_list = h.messages' pattern vs h.get_all() (avoids list-copy cost)" }
|
||||
t6e_4 = { status = "pending", commit_sha = "", description = "Author docs/reports/PHASE3_TIER2_ANALYSIS.md (per-sender cost summary + hidden call sites table + recommendations + comparison vs Tier 1 hypothesis + cross-reference to Tier 1 draft)" }
|
||||
t6e_5 = { status = "pending", commit_sha = "", description = "Phase 6e checkpoint commit + git note" }
|
||||
|
||||
[verification]
|
||||
phase_6a_broadcast_fixed = false
|
||||
phase_6a_regression_test_passes = false
|
||||
phase_6b_openai_compat_migrated = false
|
||||
phase_6d_normalized_response_migrated = false
|
||||
phase_6e_tier2_analysis_committed = false
|
||||
full_11_tier_regression_passes = false
|
||||
audit_weak_types_strict_passes = false
|
||||
audit_dataclass_coverage_strict_passes = false
|
||||
type_registry_check_passes = false
|
||||
track_archived = false
|
||||
|
||||
[broadcast_callers_to_fix]
|
||||
# Filled in t6a_1
|
||||
expected_sites = 8
|
||||
files_affected = ["src/app_controller.py", "src/events.py", "src/gui_2.py"]
|
||||
|
||||
[deferred_from_parent_track]
|
||||
phase_3_provider_state_sites = 112
|
||||
phase_3_deferred_to = "separate track post code_path_audit_20260607"
|
||||
cross_phase_coupling = "OpenAICompatibleRequest.tools: list[dict] -> list[ToolSpec]; deferred"
|
||||
|
||||
[unblocks]
|
||||
code_path_audit_20260607 = "Phase 6a fixes broadcast() TypeError that contaminates audit instrumentation"
|
||||
@@ -0,0 +1,99 @@
|
||||
{
|
||||
"video": "C:\\projects\\manual_slop\\conductor\\tracks\\video_analysis_brain_counterintuitive_20260621\\artifacts\\video.mp4",
|
||||
"threshold": 0.05,
|
||||
"total_extracted": 121,
|
||||
"kept": 91,
|
||||
"files": [
|
||||
"frame_00001.jpg",
|
||||
"frame_00002.jpg",
|
||||
"frame_00003.jpg",
|
||||
"frame_00004.jpg",
|
||||
"frame_00005.jpg",
|
||||
"frame_00006.jpg",
|
||||
"frame_00007.jpg",
|
||||
"frame_00008.jpg",
|
||||
"frame_00009.jpg",
|
||||
"frame_00010.jpg",
|
||||
"frame_00011.jpg",
|
||||
"frame_00012.jpg",
|
||||
"frame_00013.jpg",
|
||||
"frame_00015.jpg",
|
||||
"frame_00016.jpg",
|
||||
"frame_00017.jpg",
|
||||
"frame_00018.jpg",
|
||||
"frame_00019.jpg",
|
||||
"frame_00020.jpg",
|
||||
"frame_00021.jpg",
|
||||
"frame_00022.jpg",
|
||||
"frame_00023.jpg",
|
||||
"frame_00024.jpg",
|
||||
"frame_00025.jpg",
|
||||
"frame_00026.jpg",
|
||||
"frame_00027.jpg",
|
||||
"frame_00028.jpg",
|
||||
"frame_00029.jpg",
|
||||
"frame_00030.jpg",
|
||||
"frame_00031.jpg",
|
||||
"frame_00032.jpg",
|
||||
"frame_00034.jpg",
|
||||
"frame_00035.jpg",
|
||||
"frame_00036.jpg",
|
||||
"frame_00037.jpg",
|
||||
"frame_00038.jpg",
|
||||
"frame_00039.jpg",
|
||||
"frame_00041.jpg",
|
||||
"frame_00043.jpg",
|
||||
"frame_00044.jpg",
|
||||
"frame_00045.jpg",
|
||||
"frame_00046.jpg",
|
||||
"frame_00047.jpg",
|
||||
"frame_00048.jpg",
|
||||
"frame_00049.jpg",
|
||||
"frame_00050.jpg",
|
||||
"frame_00051.jpg",
|
||||
"frame_00052.jpg",
|
||||
"frame_00053.jpg",
|
||||
"frame_00054.jpg",
|
||||
"frame_00055.jpg",
|
||||
"frame_00059.jpg",
|
||||
"frame_00063.jpg",
|
||||
"frame_00070.jpg",
|
||||
"frame_00073.jpg",
|
||||
"frame_00080.jpg",
|
||||
"frame_00082.jpg",
|
||||
"frame_00083.jpg",
|
||||
"frame_00084.jpg",
|
||||
"frame_00085.jpg",
|
||||
"frame_00086.jpg",
|
||||
"frame_00087.jpg",
|
||||
"frame_00088.jpg",
|
||||
"frame_00089.jpg",
|
||||
"frame_00090.jpg",
|
||||
"frame_00091.jpg",
|
||||
"frame_00092.jpg",
|
||||
"frame_00093.jpg",
|
||||
"frame_00094.jpg",
|
||||
"frame_00095.jpg",
|
||||
"frame_00096.jpg",
|
||||
"frame_00097.jpg",
|
||||
"frame_00098.jpg",
|
||||
"frame_00099.jpg",
|
||||
"frame_00100.jpg",
|
||||
"frame_00101.jpg",
|
||||
"frame_00102.jpg",
|
||||
"frame_00103.jpg",
|
||||
"frame_00104.jpg",
|
||||
"frame_00106.jpg",
|
||||
"frame_00107.jpg",
|
||||
"frame_00108.jpg",
|
||||
"frame_00109.jpg",
|
||||
"frame_00110.jpg",
|
||||
"frame_00111.jpg",
|
||||
"frame_00112.jpg",
|
||||
"frame_00113.jpg",
|
||||
"frame_00114.jpg",
|
||||
"frame_00115.jpg",
|
||||
"frame_00117.jpg",
|
||||
"frame_00119.jpg"
|
||||
]
|
||||
}
|
||||