docs(handoff): Tier 1 prompt - follow-up track + audit sequencing

Synthesizes the 2 prior handoff docs into a ready-to-use Tier 1 brief: - HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md (the audit framing) - HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md (the test failures + scope) Sections: 1. TL;DR (3 paragraphs): what happened, the hidden broadcast() bug, the recommendation (don't merge; use as input for follow-up track) 2. Context: 48 promoted, 41 deferred, 2 new audits, 1 styleguide 3. 4 decision points for Tier 1 (scope, sequencing, audit adjustments, scope expansion) 4. The 4 documents Tier 1 should read in order (45 min total) 5. What Tier 1 should NOT do (3 anti-patterns) 6. What Tier 1 SHOULD do (6 concrete first steps) 7. What Tier 2 is available for (conventions reminder) 8. The bigger vision (agent-debugger framing) Recommended sequencing for Tier 1: T0: Approve follow-up track scope T1: Tier 2 implements Phase 6a + 6b + 6d (~18 commits, 3 hours) T2: Tier 2 runs tier-1-unit-core FULLY (no stop-on-failure) T3: Tier 2 runs tier-3-live_gui FULLY T4: Tier 1 reviews + merges follow-up track T5: Tier 1 launches code_path_audit_20260607 T6: Tier 2 implements Phase 3 + cross-phase coupling (separate track) Tier 1's scope decision: I recommend the SHRUNK version (Phase 6a + 6b + 6d only; defer Phase 3 to its own track). This gives the code-path audit a clean instrumented target without ballooning the follow-up beyond Tier 2's 1-4 hour budget. Audit adjustments to add: - 5 micro-benchmarks (NormalizedResponse.__init__, WebSocketMessage.__init__, UsageStats.__init__, ProviderHistory.lock, ToolSpec.__init__) - 'no-TypeError-errors-on-any-thread' assertion - Instrument grok/minimax/llama providers (currently unprofiled) - Add 2 new actions: provider_history_append + websocket_broadcast
2026-06-21 17:57:38 -04:00
parent b3ed4b1508
commit 95a8fae234
1 changed files with 138 additions and 0 deletions
@@ -0,0 +1,138 @@
+# Tier 1 Prompt: Follow-up Track + Code-Path Audit Sequencing
+
+**From:** Tier 2 Tech Lead (autonomous sandbox, `any_type_componentization_20260621`)
+**To:** Tier 1 Orchestrator
+**Date:** 2026-06-21
+**Status:** Branch `tier2/any_type_componentization_20260621` is at 24 commits, ready for review (not merge).
+
+---
+
+## TL;DR (read this first)
+
+Tier 2 ran `any_type_componentization_20260621` and the result is **reconnaissance-grade, not merge-grade**. The track did 48 of 89 fat-struct promotions cleanly (Phase 1, 2, 4, 5), but deferred Phase 3 entirely and left **one runtime bug** that didn't surface in my targeted regression suite: `WebSocketServer.broadcast()` callers in `src/app_controller.py` and `src/events.py` still use the old `(channel, payload)` signature after Phase 5 changed it to `(message: WebSocketMessage)`. This produces `worker[queue_fallback] error: WebSocketServer.broadcast() takes 2 positional arguments but 3 were given` spam in `tier-2-mock-app-core`.
+
+**Tier 1 should:** (a) approve a ~15-commit follow-up track that closes the deferred work and the broadcast() bug, then (b) sequence `code_path_audit_20260607` to use the follow-up's output as input.
+
+**Do not merge this branch yet.** Use it as the spec input for the follow-up track.
+
+---
+
+## Context: what happened in this track
+
+**Input artifact:** `docs/reports/ANY_TYPE_AUDIT_20260621.md` identified 89 fat-struct sites across 5 candidates (mcp_tool_specs: 8, openai_schemas: 17, provider_state: 41, log_registry.Session: 7, api_hooks.WebSocketMessage: 16).
+
+**Output:**
+- **48 sites promoted:** Phase 1 (`ToolSpec` + `ToolParameter` registry; 45 tools), Phase 2 (`ChatMessage` + `UsageStats` + `ToolCall` + refactored `NormalizedResponse` + `OpenAICompatibleRequest`), Phase 4 (`Session` + `SessionMetadata` with backward-compat `__getitem__`), Phase 5 (`WebSocketMessage` + `JsonValue`).
+- **41 sites deferred:** Phase 3 (`provider_state.ProviderHistory` dataclass exists; the 27 call sites in `src/ai_client.py` `_send_<provider>` functions remain on the legacy `_anthropic_history` / `_deepseek_history` / etc. globals).
+- **2 new audit scripts:** `scripts/audit_dataclass_coverage.py` (CI gate; baseline = 207 → post-track = 200).
+- **1 styleguide update:** `conductor/code_styleguides/type_aliases.md` §12 "When to Promote TypeAlias to dataclass" (98 lines; the codified rule future agents will follow).
+- **1 end-of-track report:** `docs/reports/TRACK_COMPLETION_any_type_componentization_20260621.md`.
+
+**Code-path audit input doc:** `docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md` (commit `0fabeaf4`). Tier 1 should read this BEFORE scoping `code_path_audit_20260607`.
+
+**Failure report doc:** `docs/handoffs/HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md` (commit `d7b6b229`). Tier 1 should read this BEFORE scoping the follow-up track.
+
+---
+
+## Tier 1 decision points
+
+### Decision 1: Approve the follow-up track?
+
+**Recommended scope (per `HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md`):**
+
+| Task | Scope | Est. commits |
+|---|---|---:|
+| Phase 6a: Fix `WebSocketServer.broadcast()` callers | Grep `src/` for `\.broadcast\(`; replace `broadcast(channel, payload)` with `broadcast(WebSocketMessage(channel=, payload=))` in `src/app_controller.py:_run_pending_tasks_once_result`, `src/events.py`, `src/gui_2.py`. Add regression tests. | 4-6 |
+| Phase 6b: Complete t2_6 (OpenAICompatibleRequest callers in `_send_grok`, `_send_minimax`, `_send_llama`) | Migrate the 3 remaining `_send_<provider>` functions in `src/ai_client.py` to construct `OpenAICompatibleRequest(messages=[ChatMessage(...)], ...)` instead of `messages=[{"role": ..., "content": ...}]` | 3-4 |
+| Phase 6c: Complete Phase 3 (provider_state call-site migration) | Replace `_anthropic_history` / `_anthropic_history_lock` etc. in `src/ai_client.py` with `provider_state.get_history('anthropic')`. ~27 call sites. | 8-10 |
+| Phase 6d: Update `_send_grok` / `_send_minimax` / `_send_llama` callers to use new `ChatMessage` / `UsageStats` | Migration of `NormalizedResponse(text=..., usage_input_tokens=..., ...)` to `NormalizedResponse(text=..., usage=UsageStats(...))` in the 3 send functions. | 3-4 |
+| **Total** | | **~18-24 commits** |
+
+**Tier 1 should decide:** approve this scope, OR shrink (defer Phase 3 entirely to a separate track; do just Phase 6a + 6b + 6d to unblock the audit), OR expand (also include the cross-phase coupling fix: migrate `OpenAICompatibleRequest.tools` from `list[dict[str, Any]]` to `list[ToolSpec]`).
+
+**My recommendation:** shrink. Phase 3 + cross-phase coupling are separate concerns. Do just Phase 6a + 6b + 6d (the **code-path-honest** part: every `NormalizedResponse` construction site uses the new API; every `broadcast()` caller uses the new signature). Defer Phase 3 + cross-phase coupling to their own tracks. This gives `code_path_audit_20260607` a clean instrumented target.
+
+### Decision 2: Sequence `code_path_audit_20260607` after the follow-up?
+
+**Yes.** The audit's `trace_action` output will be polluted by `worker[queue_fallback] error: WebSocketServer.broadcast() takes 2 positional arguments but 3 were given` unless Phase 6a lands first. The audit's per-action profiling assumes no TypeError spam on the GUI thread; if the broadcast call site raises, the audit's timing data is contaminated.
+
+**Recommended sequencing:**
+
+```
+T0:  Tier 1 approves follow-up track                  (decision 1)
+T1:  Tier 2 implements Phase 6a + 6b + 6d            (~3 hours, ~18 commits)
+T2:  Tier 2 runs tier-1-unit-core FULLY               (no stop-on-failure)
+T3:  Tier 2 runs tier-3-live_gui FULLY                (no stop-on-failure)
+T4:  Tier 1 reviews + merges follow-up track
+T5:  Tier 1 launches code_path_audit_20260607
+T6:  Tier 2 implements Phase 3 + cross-phase coupling (separate track, post-audit)
+```
+
+### Decision 3: Adjust `code_path_audit_20260607` per the handoff doc
+
+The existing `code_path_audit_20260607` spec (per `ANY_TYPE_AUDIT_20260621.md` §5) calls for per-action profiling. Tier 1 should ADD:
+
+1. The 5 micro-benchmarks listed in `HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md` §7 (NormalizedResponse.__init__, WebSocketMessage.__init__, UsageStats.__init__, ProviderHistory.lock, ToolSpec.__init__).
+2. A "no-TypeError-errors-on-any-thread" assertion: the audit should fail if any `worker[queue_fallback] error: WebSocketServer.broadcast()` appears in the test output during the audit's per-action profiling. (Phase 6a's regression test should make this assertion.)
+3. The 3 OpenAI-compatible providers (`grok`, `minimax`, `llama`) — currently unprofiled — should be instrumented, since they're the hot paths Phase 6b will migrate.
+
+### Decision 4: Code-Path Audit pre-flight scope expansion
+
+The existing `code_path_audit_20260607` spec scopes 3 actions (`ai_message_lifecycle`, `discussion_save_load`, `gui_startup`). Tier 1 should ADD:
+
+- `provider_history_append`: every `_send_<provider>` path appends to history; the audit should measure per-turn latency.
+- `websocket_broadcast`: the GUI thread broadcasts; the audit should measure broadcast throughput under load.
+
+These are the hot paths Phase 3 + Phase 6a will touch. The audit's data will directly inform whether the Phase 3 + Phase 6a refactors are worth the cost.
+
+---
+
+## The 4 documents Tier 1 should read (in this order)
+
+1. **`docs/reports/ANY_TYPE_AUDIT_20260621.md`** (input artifact; the 89 sites and the 5-pattern taxonomy)
+2. **`docs/reports/TRACK_COMPLETION_any_type_componentization_20260621.md`** (what was done, what was deferred, the per-phase results table)
+3. **`docs/handoffs/HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md`** (test failure categorization; the 4-section follow-up scope; the micro-benchmarks)
+4. **`docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md`** (the 5-pattern taxonomy applied to runtime; the "the code is the agent debugger" framing; the recommendation not to merge this branch)
+
+**Total read time:** ~45 minutes for Tier 1 to come up to speed.
+
+---
+
+## What Tier 1 should NOT do
+
+- **Don't merge `tier2/any_type_componentization_20260621` as-is.** The 1 runtime bug (broadcast() in `src/app_controller.py`) makes the branch not merge-grade.
+- **Don't launch `code_path_audit_20260607` before the follow-up track.** The TypeError spam will pollute the audit's per-action profiling.
+- **Don't try to fix Phase 3 + cross-phase coupling in the same track as the follow-up.** Phase 3 is ~8-10 commits; cross-phase coupling is ~3-4 commits; combining them with the broadcast fix would balloon the follow-up to ~25 commits and exceed the 1-4 hour Tier 2 budget.
+
+---
+
+## What Tier 1 SHOULD do (concrete first steps)
+
+1. **Read the 4 documents above.** (45 min)
+2. **Decide on Decision 1 scope.** (10 min — approve the shrunk 18-commit follow-up, OR the full 24-commit version)
+3. **Create the follow-up track spec** at `conductor/tracks/phase2_4_5_call_site_completion_2026MMDD/spec.md` referencing this prompt + the 4 documents.
+4. **Adjust `code_path_audit_20260607` spec** to include the 5 micro-benchmarks + 2 new actions (`provider_history_append`, `websocket_broadcast`) + the "no-TypeError" assertion.
+5. **Launch the follow-up track** via `/conductor:implement`.
+6. **After follow-up completes and merges,** launch `code_path_audit_20260607`.
+
+---
+
+## What Tier 2 is available for
+
+Tier 2 can be re-invoked to implement the follow-up track. The handoff is in `docs/handoffs/`; the spec will be in `conductor/tracks/.../spec.md`. Same Tier 2 conventions apply:
+- Read all 13 `conductor/code_styleguides/*.md` before starting
+- Per-task commit + git note + state.toml update
+- Throwaway scripts to `scripts/tier2/artifacts/<track-name>/`
+- Archive move is the user's job, not Tier 2's
+
+---
+
+## Final note: the bigger vision
+
+The user said: "We are nudging toward a much more interesting and compelling codebase to ideate this ai llm frontend towards something as novel as the rad debugger but for its domain."
+
+The `any_type_componentization_20260621` track is reconnaissance for that vision. The follow-up track is "make the codebase match the reconnaissance." `code_path_audit_20260607` is "measure the runtime cost of every typed site so the agent debugger UI can read it losslessly." Together: typed code + measured paths + readable dataclasses = the foundation for an agent-debugger frontend.
+
+Don't merge the branch. Use it as input.
+
+— Tier 2