Private
Public Access
0
0

docs(handoff): Tier 1 prompt - follow-up track + audit sequencing

Synthesizes the 2 prior handoff docs into a ready-to-use Tier 1 brief:
- HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md (the audit framing)
- HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md (the test failures + scope)

Sections:
1. TL;DR (3 paragraphs): what happened, the hidden broadcast() bug,
   the recommendation (don't merge; use as input for follow-up track)
2. Context: 48 promoted, 41 deferred, 2 new audits, 1 styleguide
3. 4 decision points for Tier 1 (scope, sequencing, audit adjustments,
   scope expansion)
4. The 4 documents Tier 1 should read in order (45 min total)
5. What Tier 1 should NOT do (3 anti-patterns)
6. What Tier 1 SHOULD do (6 concrete first steps)
7. What Tier 2 is available for (conventions reminder)
8. The bigger vision (agent-debugger framing)

Recommended sequencing for Tier 1:
T0: Approve follow-up track scope
T1: Tier 2 implements Phase 6a + 6b + 6d (~18 commits, 3 hours)
T2: Tier 2 runs tier-1-unit-core FULLY (no stop-on-failure)
T3: Tier 2 runs tier-3-live_gui FULLY
T4: Tier 1 reviews + merges follow-up track
T5: Tier 1 launches code_path_audit_20260607
T6: Tier 2 implements Phase 3 + cross-phase coupling (separate track)

Tier 1's scope decision: I recommend the SHRUNK version (Phase 6a + 6b + 6d
only; defer Phase 3 to its own track). This gives the code-path audit a
clean instrumented target without ballooning the follow-up beyond Tier 2's
1-4 hour budget.

Audit adjustments to add:
- 5 micro-benchmarks (NormalizedResponse.__init__, WebSocketMessage.__init__,
  UsageStats.__init__, ProviderHistory.lock, ToolSpec.__init__)
- 'no-TypeError-errors-on-any-thread' assertion
- Instrument grok/minimax/llama providers (currently unprofiled)
- Add 2 new actions: provider_history_append + websocket_broadcast
This commit is contained in:
2026-06-21 17:57:38 -04:00
parent b3ed4b1508
commit 95a8fae234
+138
View File
@@ -0,0 +1,138 @@
# Tier 1 Prompt: Follow-up Track + Code-Path Audit Sequencing
**From:** Tier 2 Tech Lead (autonomous sandbox, `any_type_componentization_20260621`)
**To:** Tier 1 Orchestrator
**Date:** 2026-06-21
**Status:** Branch `tier2/any_type_componentization_20260621` is at 24 commits, ready for review (not merge).
---
## TL;DR (read this first)
Tier 2 ran `any_type_componentization_20260621` and the result is **reconnaissance-grade, not merge-grade**. The track did 48 of 89 fat-struct promotions cleanly (Phase 1, 2, 4, 5), but deferred Phase 3 entirely and left **one runtime bug** that didn't surface in my targeted regression suite: `WebSocketServer.broadcast()` callers in `src/app_controller.py` and `src/events.py` still use the old `(channel, payload)` signature after Phase 5 changed it to `(message: WebSocketMessage)`. This produces `worker[queue_fallback] error: WebSocketServer.broadcast() takes 2 positional arguments but 3 were given` spam in `tier-2-mock-app-core`.
**Tier 1 should:** (a) approve a ~15-commit follow-up track that closes the deferred work and the broadcast() bug, then (b) sequence `code_path_audit_20260607` to use the follow-up's output as input.
**Do not merge this branch yet.** Use it as the spec input for the follow-up track.
---
## Context: what happened in this track
**Input artifact:** `docs/reports/ANY_TYPE_AUDIT_20260621.md` identified 89 fat-struct sites across 5 candidates (mcp_tool_specs: 8, openai_schemas: 17, provider_state: 41, log_registry.Session: 7, api_hooks.WebSocketMessage: 16).
**Output:**
- **48 sites promoted:** Phase 1 (`ToolSpec` + `ToolParameter` registry; 45 tools), Phase 2 (`ChatMessage` + `UsageStats` + `ToolCall` + refactored `NormalizedResponse` + `OpenAICompatibleRequest`), Phase 4 (`Session` + `SessionMetadata` with backward-compat `__getitem__`), Phase 5 (`WebSocketMessage` + `JsonValue`).
- **41 sites deferred:** Phase 3 (`provider_state.ProviderHistory` dataclass exists; the 27 call sites in `src/ai_client.py` `_send_<provider>` functions remain on the legacy `_anthropic_history` / `_deepseek_history` / etc. globals).
- **2 new audit scripts:** `scripts/audit_dataclass_coverage.py` (CI gate; baseline = 207 → post-track = 200).
- **1 styleguide update:** `conductor/code_styleguides/type_aliases.md` §12 "When to Promote TypeAlias to dataclass" (98 lines; the codified rule future agents will follow).
- **1 end-of-track report:** `docs/reports/TRACK_COMPLETION_any_type_componentization_20260621.md`.
**Code-path audit input doc:** `docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md` (commit `0fabeaf4`). Tier 1 should read this BEFORE scoping `code_path_audit_20260607`.
**Failure report doc:** `docs/handoffs/HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md` (commit `d7b6b229`). Tier 1 should read this BEFORE scoping the follow-up track.
---
## Tier 1 decision points
### Decision 1: Approve the follow-up track?
**Recommended scope (per `HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md`):**
| Task | Scope | Est. commits |
|---|---|---:|
| Phase 6a: Fix `WebSocketServer.broadcast()` callers | Grep `src/` for `\.broadcast\(`; replace `broadcast(channel, payload)` with `broadcast(WebSocketMessage(channel=, payload=))` in `src/app_controller.py:_run_pending_tasks_once_result`, `src/events.py`, `src/gui_2.py`. Add regression tests. | 4-6 |
| Phase 6b: Complete t2_6 (OpenAICompatibleRequest callers in `_send_grok`, `_send_minimax`, `_send_llama`) | Migrate the 3 remaining `_send_<provider>` functions in `src/ai_client.py` to construct `OpenAICompatibleRequest(messages=[ChatMessage(...)], ...)` instead of `messages=[{"role": ..., "content": ...}]` | 3-4 |
| Phase 6c: Complete Phase 3 (provider_state call-site migration) | Replace `_anthropic_history` / `_anthropic_history_lock` etc. in `src/ai_client.py` with `provider_state.get_history('anthropic')`. ~27 call sites. | 8-10 |
| Phase 6d: Update `_send_grok` / `_send_minimax` / `_send_llama` callers to use new `ChatMessage` / `UsageStats` | Migration of `NormalizedResponse(text=..., usage_input_tokens=..., ...)` to `NormalizedResponse(text=..., usage=UsageStats(...))` in the 3 send functions. | 3-4 |
| **Total** | | **~18-24 commits** |
**Tier 1 should decide:** approve this scope, OR shrink (defer Phase 3 entirely to a separate track; do just Phase 6a + 6b + 6d to unblock the audit), OR expand (also include the cross-phase coupling fix: migrate `OpenAICompatibleRequest.tools` from `list[dict[str, Any]]` to `list[ToolSpec]`).
**My recommendation:** shrink. Phase 3 + cross-phase coupling are separate concerns. Do just Phase 6a + 6b + 6d (the **code-path-honest** part: every `NormalizedResponse` construction site uses the new API; every `broadcast()` caller uses the new signature). Defer Phase 3 + cross-phase coupling to their own tracks. This gives `code_path_audit_20260607` a clean instrumented target.
### Decision 2: Sequence `code_path_audit_20260607` after the follow-up?
**Yes.** The audit's `trace_action` output will be polluted by `worker[queue_fallback] error: WebSocketServer.broadcast() takes 2 positional arguments but 3 were given` unless Phase 6a lands first. The audit's per-action profiling assumes no TypeError spam on the GUI thread; if the broadcast call site raises, the audit's timing data is contaminated.
**Recommended sequencing:**
```
T0: Tier 1 approves follow-up track (decision 1)
T1: Tier 2 implements Phase 6a + 6b + 6d (~3 hours, ~18 commits)
T2: Tier 2 runs tier-1-unit-core FULLY (no stop-on-failure)
T3: Tier 2 runs tier-3-live_gui FULLY (no stop-on-failure)
T4: Tier 1 reviews + merges follow-up track
T5: Tier 1 launches code_path_audit_20260607
T6: Tier 2 implements Phase 3 + cross-phase coupling (separate track, post-audit)
```
### Decision 3: Adjust `code_path_audit_20260607` per the handoff doc
The existing `code_path_audit_20260607` spec (per `ANY_TYPE_AUDIT_20260621.md` §5) calls for per-action profiling. Tier 1 should ADD:
1. The 5 micro-benchmarks listed in `HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md` §7 (NormalizedResponse.__init__, WebSocketMessage.__init__, UsageStats.__init__, ProviderHistory.lock, ToolSpec.__init__).
2. A "no-TypeError-errors-on-any-thread" assertion: the audit should fail if any `worker[queue_fallback] error: WebSocketServer.broadcast()` appears in the test output during the audit's per-action profiling. (Phase 6a's regression test should make this assertion.)
3. The 3 OpenAI-compatible providers (`grok`, `minimax`, `llama`) — currently unprofiled — should be instrumented, since they're the hot paths Phase 6b will migrate.
### Decision 4: Code-Path Audit pre-flight scope expansion
The existing `code_path_audit_20260607` spec scopes 3 actions (`ai_message_lifecycle`, `discussion_save_load`, `gui_startup`). Tier 1 should ADD:
- `provider_history_append`: every `_send_<provider>` path appends to history; the audit should measure per-turn latency.
- `websocket_broadcast`: the GUI thread broadcasts; the audit should measure broadcast throughput under load.
These are the hot paths Phase 3 + Phase 6a will touch. The audit's data will directly inform whether the Phase 3 + Phase 6a refactors are worth the cost.
---
## The 4 documents Tier 1 should read (in this order)
1. **`docs/reports/ANY_TYPE_AUDIT_20260621.md`** (input artifact; the 89 sites and the 5-pattern taxonomy)
2. **`docs/reports/TRACK_COMPLETION_any_type_componentization_20260621.md`** (what was done, what was deferred, the per-phase results table)
3. **`docs/handoffs/HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md`** (test failure categorization; the 4-section follow-up scope; the micro-benchmarks)
4. **`docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md`** (the 5-pattern taxonomy applied to runtime; the "the code is the agent debugger" framing; the recommendation not to merge this branch)
**Total read time:** ~45 minutes for Tier 1 to come up to speed.
---
## What Tier 1 should NOT do
- **Don't merge `tier2/any_type_componentization_20260621` as-is.** The 1 runtime bug (broadcast() in `src/app_controller.py`) makes the branch not merge-grade.
- **Don't launch `code_path_audit_20260607` before the follow-up track.** The TypeError spam will pollute the audit's per-action profiling.
- **Don't try to fix Phase 3 + cross-phase coupling in the same track as the follow-up.** Phase 3 is ~8-10 commits; cross-phase coupling is ~3-4 commits; combining them with the broadcast fix would balloon the follow-up to ~25 commits and exceed the 1-4 hour Tier 2 budget.
---
## What Tier 1 SHOULD do (concrete first steps)
1. **Read the 4 documents above.** (45 min)
2. **Decide on Decision 1 scope.** (10 min — approve the shrunk 18-commit follow-up, OR the full 24-commit version)
3. **Create the follow-up track spec** at `conductor/tracks/phase2_4_5_call_site_completion_2026MMDD/spec.md` referencing this prompt + the 4 documents.
4. **Adjust `code_path_audit_20260607` spec** to include the 5 micro-benchmarks + 2 new actions (`provider_history_append`, `websocket_broadcast`) + the "no-TypeError" assertion.
5. **Launch the follow-up track** via `/conductor:implement`.
6. **After follow-up completes and merges,** launch `code_path_audit_20260607`.
---
## What Tier 2 is available for
Tier 2 can be re-invoked to implement the follow-up track. The handoff is in `docs/handoffs/`; the spec will be in `conductor/tracks/.../spec.md`. Same Tier 2 conventions apply:
- Read all 13 `conductor/code_styleguides/*.md` before starting
- Per-task commit + git note + state.toml update
- Throwaway scripts to `scripts/tier2/artifacts/<track-name>/`
- Archive move is the user's job, not Tier 2's
---
## Final note: the bigger vision
The user said: "We are nudging toward a much more interesting and compelling codebase to ideate this ai llm frontend towards something as novel as the rad debugger but for its domain."
The `any_type_componentization_20260621` track is reconnaissance for that vision. The follow-up track is "make the codebase match the reconnaissance." `code_path_audit_20260607` is "measure the runtime cost of every typed site so the agent debugger UI can read it losslessly." Together: typed code + measured paths + readable dataclasses = the foundation for an agent-debugger frontend.
Don't merge the branch. Use it as input.
— Tier 2