Private

Public Access

Files

T

ed 751b94d4e8 Revert "merge: tier2/phase2_4_5_call_site_completion_20260621 (parent + follow-up + Phase 6e analysis)"

This reverts commit f914b2bcd4, reversing
changes made to 7fef95cc87.

2026-06-21 22:39:14 -04:00

10 KiB

Raw Blame History

Tier 1 Prompt: Follow-up Track + Code-Path Audit Sequencing

From: Tier 2 Tech Lead (autonomous sandbox, any_type_componentization_20260621) To: Tier 1 Orchestrator Date: 2026-06-21 Status: Branch tier2/any_type_componentization_20260621 is at 24 commits, ready for review (not merge).

TL;DR (read this first)

Tier 2 ran any_type_componentization_20260621 and the result is reconnaissance-grade, not merge-grade. The track did 48 of 89 fat-struct promotions cleanly (Phase 1, 2, 4, 5), but deferred Phase 3 entirely and left one runtime bug that didn't surface in my targeted regression suite: WebSocketServer.broadcast() callers in src/app_controller.py and src/events.py still use the old (channel, payload) signature after Phase 5 changed it to (message: WebSocketMessage). This produces worker[queue_fallback] error: WebSocketServer.broadcast() takes 2 positional arguments but 3 were given spam in tier-2-mock-app-core.

Tier 1 should: (a) approve a ~15-commit follow-up track that closes the deferred work and the broadcast() bug, then (b) sequence code_path_audit_20260607 to use the follow-up's output as input.

Do not merge this branch yet. Use it as the spec input for the follow-up track.

Context: what happened in this track

Input artifact: docs/reports/ANY_TYPE_AUDIT_20260621.md identified 89 fat-struct sites across 5 candidates (mcp_tool_specs: 8, openai_schemas: 17, provider_state: 41, log_registry.Session: 7, api_hooks.WebSocketMessage: 16).

Output:

48 sites promoted: Phase 1 (ToolSpec + ToolParameter registry; 45 tools), Phase 2 (ChatMessage + UsageStats + ToolCall + refactored NormalizedResponse + OpenAICompatibleRequest), Phase 4 (Session + SessionMetadata with backward-compat __getitem__), Phase 5 (WebSocketMessage + JsonValue).
41 sites deferred: Phase 3 (provider_state.ProviderHistory dataclass exists; the 27 call sites in src/ai_client.py _send_<provider> functions remain on the legacy _anthropic_history / _deepseek_history / etc. globals).
2 new audit scripts: scripts/audit_dataclass_coverage.py (CI gate; baseline = 207 → post-track = 200).
1 styleguide update: conductor/code_styleguides/type_aliases.md §12 "When to Promote TypeAlias to dataclass" (98 lines; the codified rule future agents will follow).
1 end-of-track report: docs/reports/TRACK_COMPLETION_any_type_componentization_20260621.md.

Code-path audit input doc: docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md (commit 0fabeaf4). Tier 1 should read this BEFORE scoping code_path_audit_20260607.

Failure report doc: docs/handoffs/HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md (commit d7b6b229). Tier 1 should read this BEFORE scoping the follow-up track.

Tier 1 decision points

Decision 1: Approve the follow-up track?

Recommended scope (per HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md):

Task	Scope	Est. commits
Phase 6a: Fix `WebSocketServer.broadcast()` callers	Grep `src/` for `\.broadcast\(`; replace `broadcast(channel, payload)` with `broadcast(WebSocketMessage(channel=, payload=))` in `src/app_controller.py:_run_pending_tasks_once_result`, `src/events.py`, `src/gui_2.py`. Add regression tests.	4-6
Phase 6b: Complete t2_6 (OpenAICompatibleRequest callers in `_send_grok`, `_send_minimax`, `_send_llama`)	Migrate the 3 remaining `_send_<provider>` functions in `src/ai_client.py` to construct `OpenAICompatibleRequest(messages=[ChatMessage(...)], ...)` instead of `messages=[{"role": ..., "content": ...}]`	3-4
Phase 6c: Complete Phase 3 (provider_state call-site migration)	Replace `_anthropic_history` / `_anthropic_history_lock` etc. in `src/ai_client.py` with `provider_state.get_history('anthropic')`. ~27 call sites.	8-10
Phase 6d: Update `_send_grok` / `_send_minimax` / `_send_llama` callers to use new `ChatMessage` / `UsageStats`	Migration of `NormalizedResponse(text=..., usage_input_tokens=..., ...)` to `NormalizedResponse(text=..., usage=UsageStats(...))` in the 3 send functions.	3-4
Total		~18-24 commits

Tier 1 should decide: approve this scope, OR shrink (defer Phase 3 entirely to a separate track; do just Phase 6a + 6b + 6d to unblock the audit), OR expand (also include the cross-phase coupling fix: migrate OpenAICompatibleRequest.tools from list[dict[str, Any]] to list[ToolSpec]).

My recommendation: shrink. Phase 3 + cross-phase coupling are separate concerns. Do just Phase 6a + 6b + 6d (the code-path-honest part: every NormalizedResponse construction site uses the new API; every broadcast() caller uses the new signature). Defer Phase 3 + cross-phase coupling to their own tracks. This gives code_path_audit_20260607 a clean instrumented target.

Decision 2: Sequence `code_path_audit_20260607` after the follow-up?

Yes. The audit's trace_action output will be polluted by worker[queue_fallback] error: WebSocketServer.broadcast() takes 2 positional arguments but 3 were given unless Phase 6a lands first. The audit's per-action profiling assumes no TypeError spam on the GUI thread; if the broadcast call site raises, the audit's timing data is contaminated.

Recommended sequencing:

T0:  Tier 1 approves follow-up track                  (decision 1)
T1:  Tier 2 implements Phase 6a + 6b + 6d            (~3 hours, ~18 commits)
T2:  Tier 2 runs tier-1-unit-core FULLY               (no stop-on-failure)
T3:  Tier 2 runs tier-3-live_gui FULLY                (no stop-on-failure)
T4:  Tier 1 reviews + merges follow-up track
T5:  Tier 1 launches code_path_audit_20260607
T6:  Tier 2 implements Phase 3 + cross-phase coupling (separate track, post-audit)

Decision 3: Adjust `code_path_audit_20260607` per the handoff doc

The existing code_path_audit_20260607 spec (per ANY_TYPE_AUDIT_20260621.md §5) calls for per-action profiling. Tier 1 should ADD:

The 5 micro-benchmarks listed in HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md §7 (NormalizedResponse.init, WebSocketMessage.init, UsageStats.init, ProviderHistory.lock, ToolSpec.init).
A "no-TypeError-errors-on-any-thread" assertion: the audit should fail if any worker[queue_fallback] error: WebSocketServer.broadcast() appears in the test output during the audit's per-action profiling. (Phase 6a's regression test should make this assertion.)
The 3 OpenAI-compatible providers (grok, minimax, llama) — currently unprofiled — should be instrumented, since they're the hot paths Phase 6b will migrate.

Decision 4: Code-Path Audit pre-flight scope expansion

The existing code_path_audit_20260607 spec scopes 3 actions (ai_message_lifecycle, discussion_save_load, gui_startup). Tier 1 should ADD:

provider_history_append: every _send_<provider> path appends to history; the audit should measure per-turn latency.
websocket_broadcast: the GUI thread broadcasts; the audit should measure broadcast throughput under load.

These are the hot paths Phase 3 + Phase 6a will touch. The audit's data will directly inform whether the Phase 3 + Phase 6a refactors are worth the cost.

The 4 documents Tier 1 should read (in this order)

docs/reports/ANY_TYPE_AUDIT_20260621.md (input artifact; the 89 sites and the 5-pattern taxonomy)
docs/reports/TRACK_COMPLETION_any_type_componentization_20260621.md (what was done, what was deferred, the per-phase results table)
docs/handoffs/HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md (test failure categorization; the 4-section follow-up scope; the micro-benchmarks)
docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md (the 5-pattern taxonomy applied to runtime; the "the code is the agent debugger" framing; the recommendation not to merge this branch)

Total read time: ~45 minutes for Tier 1 to come up to speed.

What Tier 1 should NOT do

Don't merge tier2/any_type_componentization_20260621 as-is. The 1 runtime bug (broadcast() in src/app_controller.py) makes the branch not merge-grade.
Don't launch code_path_audit_20260607 before the follow-up track. The TypeError spam will pollute the audit's per-action profiling.
Don't try to fix Phase 3 + cross-phase coupling in the same track as the follow-up. Phase 3 is ~8-10 commits; cross-phase coupling is ~3-4 commits; combining them with the broadcast fix would balloon the follow-up to ~25 commits and exceed the 1-4 hour Tier 2 budget.

What Tier 1 SHOULD do (concrete first steps)

Read the 4 documents above. (45 min)
Decide on Decision 1 scope. (10 min — approve the shrunk 18-commit follow-up, OR the full 24-commit version)
Create the follow-up track spec at conductor/tracks/phase2_4_5_call_site_completion_2026MMDD/spec.md referencing this prompt + the 4 documents.
Adjust code_path_audit_20260607 spec to include the 5 micro-benchmarks + 2 new actions (provider_history_append, websocket_broadcast) + the "no-TypeError" assertion.
Launch the follow-up track via /conductor:implement.
After follow-up completes and merges, launch code_path_audit_20260607.

What Tier 2 is available for

Tier 2 can be re-invoked to implement the follow-up track. The handoff is in docs/handoffs/; the spec will be in conductor/tracks/.../spec.md. Same Tier 2 conventions apply:

Read all 13 conductor/code_styleguides/*.md before starting
Per-task commit + git note + state.toml update
Throwaway scripts to scripts/tier2/artifacts/<track-name>/
Archive move is the user's job, not Tier 2's

Final note: the bigger vision

The user said: "We are nudging toward a much more interesting and compelling codebase to ideate this ai llm frontend towards something as novel as the rad debugger but for its domain."

The any_type_componentization_20260621 track is reconnaissance for that vision. The follow-up track is "make the codebase match the reconnaissance." code_path_audit_20260607 is "measure the runtime cost of every typed site so the agent debugger UI can read it losslessly." Together: typed code + measured paths + readable dataclasses = the foundation for an agent-debugger frontend.

Don't merge the branch. Use it as input.

— Tier 2

10 KiB Raw Blame History