From 95a8fae2340d2d107436378f4bfa6b8cf3aee06b Mon Sep 17 00:00:00 2001
From: Ed_
Date: Sun, 21 Jun 2026 17:57:38 -0400
Subject: [PATCH] docs(handoff): Tier 1 prompt - follow-up track + audit sequencing

Synthesizes the 2 prior handoff docs into a ready-to-use Tier 1 brief:
- HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md (the audit framing)
- HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md (the test failures + scope)

Sections:
1. TL;DR (3 paragraphs): what happened, the hidden broadcast() bug, the
   recommendation (don't merge; use as input for the follow-up track)
2. Context: 48 promoted, 41 deferred, 2 new audits, 1 styleguide
3. 4 decision points for Tier 1 (scope, sequencing, audit adjustments,
   scope expansion)
4. The 4 documents Tier 1 should read, in order (~45 min total)
5. What Tier 1 should NOT do (3 anti-patterns)
6. What Tier 1 SHOULD do (6 concrete first steps)
7. What Tier 2 is available for (conventions reminder)
8. The bigger vision (agent-debugger framing)

Recommended sequencing for Tier 1:
  T0: Approve follow-up track scope
  T1: Tier 2 implements Phase 6a + 6b + 6d (~10-14 commits, ~3 hours)
  T2: Tier 2 runs tier-1-unit-core FULLY (no stop-on-failure)
  T3: Tier 2 runs tier-3-live_gui FULLY
  T4: Tier 1 reviews + merges follow-up track
  T5: Tier 1 launches code_path_audit_20260607
  T6: Tier 2 implements Phase 3 + cross-phase coupling (separate track)

Tier 1's scope decision: I recommend the SHRUNK version (Phase 6a + 6b + 6d
only; defer Phase 3 to its own track). This gives the code-path audit a
clean instrumented target without ballooning the follow-up beyond Tier 2's
1-4 hour budget.
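The sketch below is for reviewers who have not seen the Phase 5 signature change. It shows the Phase 6a call-site fix and the proposed "no-TypeError" gate. Everything in it is a hypothetical stand-in, not code from `src/`:
- `WebSocketMessage` and `WebSocketServer` are simplified local classes. The `channel` / `payload` field names are assumed from the Phase 6a scope.
- `notify_task_done`, `legacy_call_raises`, and `assert_no_broadcast_typeerrors` are illustrative helper names.

```python
import re
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-in for api_hooks.WebSocketMessage; the field names
# `channel` and `payload` are assumed from the Phase 6a scope description.
@dataclass(frozen=True)
class WebSocketMessage:
    channel: str
    payload: dict[str, Any] = field(default_factory=dict)


class WebSocketServer:
    """Hypothetical stand-in exposing only the post-Phase-5 broadcast() signature."""

    def __init__(self) -> None:
        self.sent: list[WebSocketMessage] = []

    def broadcast(self, message: WebSocketMessage) -> None:
        # One typed argument (plus self) instead of (channel, payload).
        self.sent.append(message)


def notify_task_done(server: WebSocketServer, task_id: str) -> None:
    """Illustrative caller showing the Phase 6a migration."""
    # Before Phase 6a, the old call shape raised:
    #   TypeError: WebSocketServer.broadcast() takes 2 positional arguments but 3 were given
    # server.broadcast("tasks", {"id": task_id, "status": "done"})
    server.broadcast(WebSocketMessage(channel="tasks", payload={"id": task_id, "status": "done"}))


def legacy_call_raises(server: WebSocketServer) -> bool:
    """Regression check: the old two-argument call shape must be rejected."""
    try:
        server.broadcast("tasks", {"id": "x"})  # type: ignore[call-arg]
    except TypeError:
        return True
    return False


# The 'no-TypeError-errors-on-any-thread' gate, sketched as a scan of captured
# log/test output for the exact error prefix quoted in the handoff.
_BROADCAST_TYPEERROR = re.compile(
    r"worker\[queue_fallback\] error: WebSocketServer\.broadcast\(\)"
)


def assert_no_broadcast_typeerrors(captured_output: str) -> None:
    hits = _BROADCAST_TYPEERROR.findall(captured_output)
    assert not hits, f"{len(hits)} broadcast() TypeError line(s) in captured output"
```

The gate scans captured output instead of hooking exceptions. The `worker[queue_fallback] error:` prefix suggests the worker logs the TypeError rather than letting it propagate, so the log line is likely the only reliable signal. How the audit captures that output is an assumption left to the follow-up spec.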
Audit adjustments to add:
- 5 micro-benchmarks (NormalizedResponse.__init__, WebSocketMessage.__init__,
  UsageStats.__init__, ProviderHistory.lock, ToolSpec.__init__)
- 'no-TypeError-errors-on-any-thread' assertion
- Instrument grok/minimax/llama providers (currently unprofiled)
- Add 2 new actions: provider_history_append + websocket_broadcast
---
 docs/handoffs/PROMPT_FOR_TIER_1.md | 138 +++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+)
 create mode 100644 docs/handoffs/PROMPT_FOR_TIER_1.md

diff --git a/docs/handoffs/PROMPT_FOR_TIER_1.md b/docs/handoffs/PROMPT_FOR_TIER_1.md
new file mode 100644
index 00000000..cc74f6bf
--- /dev/null
+++ b/docs/handoffs/PROMPT_FOR_TIER_1.md
@@ -0,0 +1,138 @@
+# Tier 1 Prompt: Follow-up Track + Code-Path Audit Sequencing
+
+**From:** Tier 2 Tech Lead (autonomous sandbox, `any_type_componentization_20260621`)
+**To:** Tier 1 Orchestrator
+**Date:** 2026-06-21
+**Status:** Branch `tier2/any_type_componentization_20260621` is at 24 commits, ready for review (not merge).
+
+---
+
+## TL;DR (read this first)
+
+Tier 2 ran `any_type_componentization_20260621`, and the result is **reconnaissance-grade, not merge-grade**. The track completed 48 of 89 fat-struct promotions cleanly (Phases 1, 2, 4, and 5), but deferred Phase 3 entirely. It also left **one runtime bug** that didn't surface in my targeted regression suite. Phase 5 changed `WebSocketServer.broadcast()` to take a single `(message: WebSocketMessage)` argument, but its callers in `src/app_controller.py` and `src/events.py` still use the old `(channel, payload)` signature. This produces repeated `worker[queue_fallback] error: WebSocketServer.broadcast() takes 2 positional arguments but 3 were given` errors in `tier-2-mock-app-core`.
+
+**Tier 1 should:** (a) approve a follow-up track that fixes the broadcast() bug and closes the deferred call-site work (~10-14 commits in the recommended shrunk scope; ~18-24 if Phase 3 is included), then (b) sequence `code_path_audit_20260607` to use the follow-up's output as input.
+
+**Do not merge this branch yet.** Use it as the spec input for the follow-up track.
+
+---
+
+## Context: what happened in this track
+
+**Input artifact:** `docs/reports/ANY_TYPE_AUDIT_20260621.md` identified 89 fat-struct sites across 5 candidates (mcp_tool_specs: 8, openai_schemas: 17, provider_state: 41, log_registry.Session: 7, api_hooks.WebSocketMessage: 16).
+
+**Output:**
+- **48 sites promoted:** Phase 1 (`ToolSpec` + `ToolParameter` registry; 45 tools), Phase 2 (`ChatMessage` + `UsageStats` + `ToolCall` + refactored `NormalizedResponse` + `OpenAICompatibleRequest`), Phase 4 (`Session` + `SessionMetadata` with backward-compat `__getitem__`), Phase 5 (`WebSocketMessage` + `JsonValue`).
+- **41 sites deferred:** Phase 3 (the `provider_state.ProviderHistory` dataclass exists, but the 27 call sites in the `_send_` functions of `src/ai_client.py` still use the legacy `_anthropic_history` / `_deepseek_history` / etc. globals).
+- **2 new audit scripts:** `scripts/audit_dataclass_coverage.py` (CI gate; baseline = 207 → post-track = 200).
+- **1 styleguide update:** `conductor/code_styleguides/type_aliases.md` §12 "When to Promote TypeAlias to dataclass" (98 lines; the codified rule future agents will follow).
+- **1 end-of-track report:** `docs/reports/TRACK_COMPLETION_any_type_componentization_20260621.md`.
+
+**Code-path audit input doc:** `docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md` (commit `0fabeaf4`). Tier 1 should read this BEFORE scoping `code_path_audit_20260607`.
+
+**Failure report doc:** `docs/handoffs/HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md` (commit `d7b6b229`). Tier 1 should read this BEFORE scoping the follow-up track.
+
+---
+
+## Tier 1 decision points
+
+### Decision 1: Approve the follow-up track?
+
+**Recommended scope (per `HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md`):**
+
+| Task | Scope | Est. commits |
+|---|---|---:|
+| Phase 6a: Fix `WebSocketServer.broadcast()` callers | Grep `src/` for `\.broadcast\(`; replace `broadcast(channel, payload)` with `broadcast(WebSocketMessage(channel=, payload=))` in `src/app_controller.py:_run_pending_tasks_once_result`, `src/events.py`, `src/gui_2.py`. Add regression tests. | 4-6 |
+| Phase 6b: Complete t2_6 (`OpenAICompatibleRequest` callers in `_send_grok`, `_send_minimax`, `_send_llama`) | Migrate the 3 remaining `_send_` functions in `src/ai_client.py` to construct `OpenAICompatibleRequest(messages=[ChatMessage(...)], ...)` instead of `messages=[{"role": ..., "content": ...}]`. | 3-4 |
+| Phase 6c: Complete Phase 3 (provider_state call-site migration) | Replace `_anthropic_history` / `_anthropic_history_lock` etc. in `src/ai_client.py` with `provider_state.get_history('anthropic')`. ~27 call sites. | 8-10 |
+| Phase 6d: Migrate `_send_grok` / `_send_minimax` / `_send_llama` to the new `ChatMessage` / `UsageStats` types | Migrate `NormalizedResponse(text=..., usage_input_tokens=..., ...)` to `NormalizedResponse(text=..., usage=UsageStats(...))` in the 3 send functions. | 3-4 |
+| **Total (all four phases)** | | **~18-24** |
+
+**Tier 1 should decide:** approve this scope; OR shrink it (defer Phase 3 entirely to a separate track and do just Phase 6a + 6b + 6d, ~10-14 commits, to unblock the audit); OR expand it (also include the cross-phase coupling fix: migrate `OpenAICompatibleRequest.tools` from `list[dict[str, Any]]` to `list[ToolSpec]`).
+
+**My recommendation:** shrink. Phase 3 and the cross-phase coupling are separate concerns. Do just Phase 6a + 6b + 6d: the **code-path-honest** part, where every `NormalizedResponse` construction site uses the new API and every `broadcast()` caller uses the new signature. Defer Phase 3 and the cross-phase coupling to their own tracks. This gives `code_path_audit_20260607` a clean instrumented target.
+
+### Decision 2: Sequence `code_path_audit_20260607` after the follow-up?
+
+**Yes.** Unless Phase 6a lands first, the audit's `trace_action` output will be polluted by `worker[queue_fallback] error: WebSocketServer.broadcast() takes 2 positional arguments but 3 were given`. The audit's per-action profiling assumes there is no TypeError spam on the GUI thread. If the broadcast call site raises, the audit's timing data is contaminated.
+
+**Recommended sequencing:**
+
+```
+T0: Tier 1 approves follow-up track (decision 1)
+T1: Tier 2 implements Phase 6a + 6b + 6d (~3 hours, ~10-14 commits)
+T2: Tier 2 runs tier-1-unit-core FULLY (no stop-on-failure)
+T3: Tier 2 runs tier-3-live_gui FULLY (no stop-on-failure)
+T4: Tier 1 reviews + merges follow-up track
+T5: Tier 1 launches code_path_audit_20260607
+T6: Tier 2 implements Phase 3 + cross-phase coupling (separate track, post-audit)
+```
+
+### Decision 3: Adjust `code_path_audit_20260607` per the handoff doc
+
+The existing `code_path_audit_20260607` spec (per `ANY_TYPE_AUDIT_20260621.md` §5) calls for per-action profiling. Tier 1 should ADD:
+
+1. The 5 micro-benchmarks listed in `HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md` §7 (`NormalizedResponse.__init__`, `WebSocketMessage.__init__`, `UsageStats.__init__`, `ProviderHistory.lock`, `ToolSpec.__init__`).
+2. A "no-TypeError-errors-on-any-thread" assertion: the audit should fail if any `worker[queue_fallback] error: WebSocketServer.broadcast()` line appears in the test output during per-action profiling. (Phase 6a's regression tests should make the same assertion.)
+3. Instrumentation for the 3 OpenAI-compatible providers (`grok`, `minimax`, `llama`). They are currently unprofiled, and they are the hot paths Phase 6b will migrate.
+
+### Decision 4: Code-Path Audit pre-flight scope expansion
+
+The existing `code_path_audit_20260607` spec scopes 3 actions (`ai_message_lifecycle`, `discussion_save_load`, `gui_startup`). Tier 1 should ADD:
+
+- `provider_history_append`: every `_send_` path appends to history; the audit should measure per-turn latency.
+- `websocket_broadcast`: the GUI thread broadcasts; the audit should measure broadcast throughput under load.
+
+These are the hot paths Phase 3 and Phase 6a will touch. The audit's data will directly inform whether the Phase 3 and Phase 6a refactors are worth the cost.
+
+---
+
+## The 4 documents Tier 1 should read (in this order)
+
+1. **`docs/reports/ANY_TYPE_AUDIT_20260621.md`** (input artifact; the 89 sites and the 5-pattern taxonomy)
+2. **`docs/reports/TRACK_COMPLETION_any_type_componentization_20260621.md`** (what was done, what was deferred, the per-phase results table)
+3. **`docs/handoffs/HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md`** (test-failure categorization; the 4-section follow-up scope; the micro-benchmarks)
+4. **`docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md`** (the 5-pattern taxonomy applied to runtime; the "the code is the agent debugger" framing; the recommendation not to merge this branch)
+
+**Total read time:** ~45 minutes for Tier 1 to come up to speed.
+
+---
+
+## What Tier 1 should NOT do
+
+- **Don't merge `tier2/any_type_componentization_20260621` as-is.** The runtime bug (stale `broadcast()` callers in `src/app_controller.py` and `src/events.py`) means the branch is not merge-grade.
+- **Don't launch `code_path_audit_20260607` before the follow-up track.** The TypeError spam will pollute the audit's per-action profiling.
+- **Don't try to fix Phase 3 + cross-phase coupling in the same track as the follow-up.** Phase 3 is ~8-10 commits and cross-phase coupling is ~3-4 commits. Combining them with the broadcast fix would balloon the follow-up to ~25 commits and exceed the 1-4 hour Tier 2 budget.
+
+---
+
+## What Tier 1 SHOULD do (concrete first steps)
+
+1. **Read the 4 documents above.** (45 min)
+2. **Decide on Decision 1 scope.** (10 min: approve the shrunk ~10-14-commit follow-up, OR the full ~18-24-commit version)
+3. **Create the follow-up track spec** at `conductor/tracks/phase2_4_5_call_site_completion_2026MMDD/spec.md`, referencing this prompt and the 4 documents.
+4. **Adjust the `code_path_audit_20260607` spec** to include the 5 micro-benchmarks, the 2 new actions (`provider_history_append`, `websocket_broadcast`), and the "no-TypeError" assertion.
+5. **Launch the follow-up track** via `/conductor:implement`.
+6. **After the follow-up completes and merges,** launch `code_path_audit_20260607`.
+
+---
+
+## What Tier 2 is available for
+
+Tier 2 can be re-invoked to implement the follow-up track. The handoff is in `docs/handoffs/`; the spec will be in `conductor/tracks/.../spec.md`. The same Tier 2 conventions apply:
+- Read all 13 `conductor/code_styleguides/*.md` before starting
+- Per-task commit + git note + state.toml update
+- Throwaway scripts to `scripts/tier2/artifacts//`
+- Archive move is the user's job, not Tier 2's
+
+---
+
+## Final note: the bigger vision
+
+The user said: "We are nudging toward a much more interesting and compelling codebase to ideate this ai llm frontend towards something as novel as the rad debugger but for its domain."
+
+The `any_type_componentization_20260621` track is reconnaissance for that vision. The follow-up track is "make the codebase match the reconnaissance." `code_path_audit_20260607` is "measure the runtime cost of every typed site so the agent debugger UI can read it losslessly." Together: typed code + measured paths + readable dataclasses = the foundation for an agent-debugger frontend.
+
+Don't merge the branch. Use it as input.
+
+-- Tier 2
\ No newline at end of file