Private
Public Access
0
0
Commit Graph

3 Commits

Author SHA1 Message Date
ed 95a8fae234 docs(handoff): Tier 1 prompt - follow-up track + audit sequencing
Synthesizes the 2 prior handoff docs into a ready-to-use Tier 1 brief:
- HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md (the audit framing)
- HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md (the test failures + scope)

Sections:
1. TL;DR (3 paragraphs): what happened, the hidden broadcast() bug,
   the recommendation (don't merge; use as input for follow-up track)
2. Context: 48 promoted, 41 deferred, 2 new audits, 1 styleguide
3. 4 decision points for Tier 1 (scope, sequencing, audit adjustments,
   scope expansion)
4. The 4 documents Tier 1 should read in order (45 min total)
5. What Tier 1 should NOT do (3 anti-patterns)
6. What Tier 1 SHOULD do (6 concrete first steps)
7. What Tier 2 is available for (conventions reminder)
8. The bigger vision (agent-debugger framing)

Recommended sequencing for Tier 1:
T0: Approve follow-up track scope
T1: Tier 2 implements Phase 6a + 6b + 6d (~18 commits, 3 hours)
T2: Tier 2 runs tier-1-unit-core FULLY (no stop-on-failure)
T3: Tier 2 runs tier-3-live_gui FULLY
T4: Tier 1 reviews + merges follow-up track
T5: Tier 1 launches code_path_audit_20260607
T6: Tier 2 implements Phase 3 + cross-phase coupling (separate track)

Tier 1's scope decision: I recommend the SHRUNK version (Phase 6a + 6b + 6d
only; defer Phase 3 to its own track). This gives the code-path audit a
clean instrumented target without ballooning the follow-up beyond Tier 2's
1-4 hour budget.

Audit adjustments to add:
- 5 micro-benchmarks (NormalizedResponse.__init__, WebSocketMessage.__init__,
  UsageStats.__init__, ProviderHistory.lock, ToolSpec.__init__)
- 'no-TypeError-errors-on-any-thread' assertion
- Instrument grok/minimax/llama providers (currently unprofiled)
- Add 2 new actions: provider_history_append + websocket_broadcast
2026-06-21 17:57:38 -04:00
ed b3ed4b1508 docs(handoff): test failure report for follow-up track scoping
Categorizes the 12 test failures the user observed when running
scripts/run_tests_batched.py after this track:

- 10 failures (mine): Phase 2 NormalizedResponse API migration
  incomplete (state.toml t2_6 deferred task); FIXED in commit 30c8b263
- 3 failures (sandbox): test_audit_tier2_leaks.py flags sandbox
  files (mcp_paths.toml, opencode.json) as modified; NOT my fault
- 1 failure (pre-existing): test_gui2_custom_callback_hook_works;
  live_gui test not touched by this track

Hidden 12th failure:
- worker[queue_fallback] error: WebSocketServer.broadcast() takes 2
  positional arguments but 3 were given (appeared 6+ times during
  tier-2-mock-app-core but tests still passed; error logged on
  GUI thread from app_controller._run_pending_tasks_once_result).
  Phase 5 refactored broadcast(channel, payload) to
  broadcast(WebSocketMessage); I updated test_websocket_server.py
  but missed app_controller.py and events.py callers.

Sections:
1. Executive summary (3 categories of failure)
2. Per-failure categorization (10 + 3 + 1)
3. Hidden 12th failure: WebSocket broadcast callers in app_controller
4. Phase 2 API migration status (8 sites; 5 done, 3 unverified)
5. Recommendations for follow-up track (~5 call sites + ~41 Phase 3)
6. Code-path audit input (5 micro-benchmarks to add)

Follow-up track scope: ~15-20 commits, well-scoped. Should run BEFORE
code_path_audit_20260607 because the worker[queue_fallback] TypeError
spam will confuse the audit's runtime instrumentation.
2026-06-21 17:53:48 -04:00
ed 0fabeaf4ce docs(handoff): Tier 2 -> Tier 1 input for code_path_audit_20260607
While running any_type_componentization_20260621, the Tier 2 agent
performed a partial code-path audit + code normalization pass that
wasn't in the original scope. This handoff document frames:

1. What was done (48 of 89 fat-struct sites promoted; 41 deferred)
2. The 5-pattern Any-type taxonomy (Patterns 3/4/5 correctly preserved;
   Patterns 1/2 promoted to dataclass/registry)
3. Recommended adjustments for code_path_audit_20260607:
   - Instrument the 89 fat-struct sites with hot/cold/init path tags
   - Compare pre/post refactor cost for the 48 promoted sites
   - Rank the 41 deferred Phase 3 sites by hot-path frequency
   - Report per-call cost deltas in microseconds
4. What was NOT done (no runtime profiling; no pre/post benchmarks)
5. Decision points for Tier 1 (merge / reject / cherry-pick)
6. The bigger vision: AI/LLM frontend debugger (rad-debugger analog)
   requires typed ProviderHistory, ToolSpec, Session, WebSocketMessage
   to step through the agent loop without losing type fidelity

Recommendation: Don't merge this branch yet. Let code_path_audit_20260607
use it as a reconnaissance warm-up; drive the next refactor track from
the audit's per-action cost data.

The 4 newly-promoted dataclasses (mcp_tool_specs, openai_schemas,
log_registry.Session, api_hooks.WebSocketMessage) are the typed-state
foundation that the future debugger UI will read from. The 41 deferred
Phase 3 sites are the last gap: per-turn history manipulation in
src/ai_client.py needs typed state before the debugger can step
through the agent loop losslessly.

Length: 7 sections, 7 paragraphs of Tier 1 decision framing.
Location: docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md
(new directory; complements docs/reports/ which is for reports vs
handoffs which are cross-track input artifacts).
2026-06-21 17:14:22 -04:00