Phase 5 of any_type_componentization_20260621 changed
WebSocketServer.broadcast(channel, payload) -> broadcast(message: WebSocketMessage)
but did not update internal callers in src/app_controller.py + src/events.py.
This adds 4 tests that pin the contract:
- test_websocket_server_broadcast_signature: asserts (self, message) signature
- test_websocket_server_broadcast_rejects_legacy_2arg_call: asserts legacy raises TypeError
- test_websocket_server_broadcast_accepts_websocket_message_instance: smoke test
- test_internal_callers_use_websocket_message_signature: structural grep over src/
The 4th test currently FAILS (red phase), identifying 2 legacy sites:
- src/app_controller.py:1849: self.event_queue.websocket_server.broadcast('telemetry', metrics)
- src/events.py:115: self.websocket_server.broadcast('events', {...})
The structural assertion is reused by code_path_audit_20260607.
The follow-up track now includes Phase 6e: Tier 2 produces the authoritative
Phase 3 cost analysis as part of the follow-up work. Tier 2 is in
src/ai_client.py doing Phase 6b/6d anyway; they have full context to produce
the refined cost hypothesis that Tier 1's draft at PHASE3_HYPOTHETICAL_PROMOTION.md
could not (Tier 1 worked without the 6b/6d ground-truth context).
Tier 1's draft STAYS as the hypothesis doc. Tier 2's PHASE3_TIER2_ANALYSIS.md
is the refined version (per-sender cost summary + hidden call sites table
+ recommendations for the future Phase 3 track + cross-reference to Tier 1
explicit).
Phase 6e tasks (5 total, ~2 commits):
- t6e_1: Profile the 6 senders (codepath catalog + hidden cross-refs)
- t6e_2: Qualitative cost estimation per sender
- t6e_3: Identify hot iteration sites needing 'with h.lock:' pattern
- t6e_4: Author PHASE3_TIER2_ANALYSIS.md
- t6e_5: Phase 6e checkpoint commit + git note
Total estimated commits: 16 -> 18 (still within Tier 2 1-4 hour budget).
Files updated:
- conductor/tracks/phase2_4_5_call_site_completion_20260621/spec.md (+50 lines)
- conductor/tracks/phase2_4_5_call_site_completion_20260621/plan.md (+146 lines)
- conductor/tracks/phase2_4_5_call_site_completion_20260621/metadata.json (+13 lines)
- conductor/tracks/phase2_4_5_call_site_completion_20260621/state.toml (+9 lines)
- conductor/tracks.md (track 27 entry expanded with Phase 6e details)
Per FR8 in conductor/tracks/video_analysis_campaign_20260621/spec.md, mp4 files are too large for git and VTT auto-sub files are regenerable from transcript.json.
Note: existing tracked files in entropy_epiplexity (commit 5c5f347c) are still in history. The gitignore prevents FUTURE commits from adding them. To remove from history requires filter-repo/filter-branch rewrite (out of scope for this commit).
Categorizes the 12 test failures the user observed when running
scripts/run_tests_batched.py after this track:
- 10 failures (mine): Phase 2 NormalizedResponse API migration
incomplete (state.toml t2_6 deferred task); FIXED in commit 30c8b263
- 3 failures (sandbox): test_audit_tier2_leaks.py flags sandbox
files (mcp_paths.toml, opencode.json) as modified; NOT my fault
- 1 failure (pre-existing): test_gui2_custom_callback_hook_works;
live_gui test not touched by this track
Hidden 12th failure:
- worker[queue_fallback] error: WebSocketServer.broadcast() takes 2
positional arguments but 3 were given (appeared 6+ times during
tier-2-mock-app-core but tests still passed; error logged on
GUI thread from app_controller._run_pending_tasks_once_result).
Phase 5 refactored broadcast(channel, payload) to
broadcast(WebSocketMessage); I updated test_websocket_server.py
but missed app_controller.py and events.py callers.
Sections:
1. Executive summary (3 categories of failure)
2. Per-failure categorization (10 + 3 + 1)
3. Hidden 12th failure: WebSocket broadcast callers in app_controller
4. Phase 2 API migration status (8 sites; 5 done, 3 unverified)
5. Recommendations for follow-up track (~5 call sites + ~41 Phase 3)
6. Code-path audit input (5 micro-benchmarks to add)
Follow-up track scope: ~15-20 commits, well-scoped. Should run BEFORE
code_path_audit_20260607 because the worker[queue_fallback] TypeError
spam will confuse the audit's runtime instrumentation.
Auto-generated by scripts/generate_type_registry.py after the Phase
2 + 4 + 5 commits. These were untracked in the working tree because
commit 4a774eb3 was made before Phase 5 (api_hooks) committed.
NEW files (5):
- docs/type_registry/src_mcp_tool_specs.md (Phase 1; ToolSpec + ToolParameter)
- docs/type_registry/src_openai_schemas.md (Phase 2; ToolCall + ChatMessage + UsageStats + NormalizedResponse + OpenAICompatibleRequest)
- docs/type_registry/src_provider_state.md (Phase 3 partial; ProviderHistory + _PROVIDER_HISTORIES)
- docs/type_registry/src_api_hooks.md (Phase 5; WebSocketMessage)
- docs/type_registry/src_log_registry.md (Phase 4; Session + SessionMetadata)
Verified:
uv run python scripts/generate_type_registry.py --check
Registry in sync (22 files checked)
These 5 .md files were generated after the Phase 5 commit (e9fa69dd)
and the Phase 4 commit (fef6c20e); they were left in the working tree
because commit 4a774eb3 (verify) was made after the Phase 2 registry
regen but before Phase 4/5 changes were fully committed.
Phase 2 deferred t2_6: update src/ai_client.py _send_grok + _send_minimax +
_send_llama + _send_gemini_cli (4 functions) to use the new
dataclass API after NormalizedResponse was refactored to
(text, tool_calls: tuple[ToolCall, ...], usage: UsageStats, raw_response).
These 4 callers were left with the old keyword args
(usage_input_tokens, usage_output_tokens, ...) which broke at
runtime: ai_client.send() raised
TypeError: NormalizedResponse.__init__() got an unexpected keyword
argument 'usage_input_tokens'.
FIXES:
- src/ai_client.py L2054: gemini_cli 'adapter unavailable' branch
- src/ai_client.py L2088: gemini_cli normal response branch
- Added: from src.openai_schemas import UsageStats (module level)
- Added backward-compat in src/openai_compatible.py:
messages_dicts = [m.to_dict() if hasattr(m, 'to_dict') else m for m in request.messages]
(accepts both ChatMessage dataclass and dict for backward compat
with existing tests that pass raw dicts)
TEST FIXES:
- tests/test_ai_client_tool_loop.py: _make_normalized_response helper
uses UsageStats instead of usage_*_tokens kwargs
- tests/test_ai_client_tool_loop_builder.py: same
- tests/test_ai_client_tool_loop_send_func.py: same
- tests/test_openai_compatible.py: NormalizedResponse(text=..., usage=UsageStats(...))
+ tool_calls[0].function.name (attribute access) instead of ['function']['name']
- tests/test_auto_whitelist.py: use update_session_metadata() instead of
dict subscript assignment (Session dataclass doesn't support item assignment)
VERIFIED:
uv run pytest tests/test_ai_client_*.py tests/test_openai_*.py \
tests/test_auto_whitelist.py --timeout=30
56 passed in 4.49s (19 previously failing tests now pass)
uv run python scripts/audit_weak_types.py --strict
STRICT OK: 115 weak sites <= baseline 115
uv run python scripts/audit_dataclass_coverage.py --strict
STRICT OK: 200 weak sites <= baseline 207
This commit closes the t2_6 deferred task. The 41-site Phase 3 call-site
migration remains deferred (separate provider_state_migration track).
While running any_type_componentization_20260621, the Tier 2 agent
performed a partial code-path audit + code normalization pass that
wasn't in the original scope. This handoff document frames:
1. What was done (48 of 89 fat-struct sites promoted; 41 deferred)
2. The 5-pattern Any-type taxonomy (Patterns 3/4/5 correctly preserved;
Patterns 1/2 promoted to dataclass/registry)
3. Recommended adjustments for code_path_audit_20260607:
- Instrument the 89 fat-struct sites with hot/cold/init path tags
- Compare pre/post refactor cost for the 48 promoted sites
- Rank the 41 deferred Phase 3 sites by hot-path frequency
- Report per-call cost deltas in microseconds
4. What was NOT done (no runtime profiling; no pre/post benchmarks)
5. Decision points for Tier 1 (merge / reject / cherry-pick)
6. The bigger vision: AI/LLM frontend debugger (rad-debugger analog)
requires typed ProviderHistory, ToolSpec, Session, WebSocketMessage
to step through the agent loop without losing type fidelity
Recommendation: Don't merge this branch yet. Let code_path_audit_20260607
use it as a reconnaissance warm-up; drive the next refactor track from
the audit's per-action cost data.
The 4 newly-promoted dataclasses (mcp_tool_specs, openai_schemas,
log_registry.Session, api_hooks.WebSocketMessage) are the typed-state
foundation that the future debugger UI will read from. The 41 deferred
Phase 3 sites are the last gap: per-turn history manipulation in
src/ai_client.py needs typed state before the debugger can step
through the agent loop losslessly.
Length: 7 sections, 7 paragraphs of Tier 1 decision framing.
Location: docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md
(new directory; complements docs/reports/ which is for reports vs
handoffs which are cross-track input artifacts).
youtube-transcript-api v1.2.4 returns XML parse error on empty response for ALL videos in this campaign. yt-dlp's --write-auto-subs reliably returns 1000s of segments per video. Switched to yt-dlp as the primary path.
Tests updated to mock _fetch_via_ytdlp instead of _fetch_raw_transcript. 8/8 tests passing.
Phase 2 (openai_schemas) progress:
- t2_1-t2_5+t2_7-t2_8 (a96f946b): 19 tests pass; NormalizedResponse +
OpenAICompatibleRequest refactored to dataclasses
- t2_6 (deferred): _send_grok + _send_minimax + _send_llama in
src/ai_client.py still use legacy NormalizedResponse(text=..., tool_calls=[], usage_*_tokens=...)
kwargs. These will be updated in Phase 3 (provider_state) as part of
the ai_client refactor.
- t2_9: Phase 2 checkpoint (commit hash filled in this commit)
current_phase: 2 -> 3
phase_2.status: pending -> completed
Next: Phase 3 - provider_state (15 tasks; the largest phase).
Phase 1 of any_type_componentization_20260621. Migrates ai_client.py:
- Line 560: new_tools = {name: False for name in mcp_client.TOOL_NAMES}
-> mcp_tool_specs.tool_names()
- Line 582: _agent_tools = {name: True for name in mcp_client.TOOL_NAMES}
-> mcp_tool_specs.tool_names()
- Line 1012: is_native = name in mcp_client.TOOL_NAMES
-> name in mcp_tool_specs.tool_names()
Plus adds: from src import mcp_tool_specs
Verified:
uv run pytest tests/test_mcp_tool_specs.py tests/test_mcp_client_beads.py tests/test_mcp_client_paths.py tests/test_audit_dataclass_coverage.py tests/test_type_aliases.py
39 passed in 11.79s
No regressions. The mcp_client.TOOL_NAMES re-export is preserved for
backward compatibility with any external test/code that imports it.
Phase 1 of any_type_componentization_20260621. Migrates the 4 call sites
in src/mcp_client.py to use the new typed module:
- Line 1944: native_names = {t['name'] for t in MCP_TOOL_SPECS}
-> native_names = mcp_tool_specs.tool_names()
- Line 1958: res = list(MCP_TOOL_SPECS)
-> res = [s.to_dict() for s in mcp_tool_specs.get_tool_schemas()]
- Line 2747: TOOL_NAMES = {t['name'] for t in MCP_TOOL_SPECS}
-> TOOL_NAMES = mcp_tool_specs.tool_names()
Plus: removes the legacy MCP_TOOL_SPECS list literal (lines 1973-2746;
774 lines of dict literals). The data lives in src/mcp_tool_specs.py
now; the canonical registry. (The legacy dict shape is preserved via
ToolSpec.to_dict() for downstream serialization.)
Adds import: from src import mcp_tool_specs
Verified:
uv run pytest tests/test_mcp_tool_specs.py tests/test_audit_dataclass_coverage.py tests/test_type_aliases.py
32 passed in 5.48s
uv run pytest tests/test_mcp_client_beads.py tests/test_mcp_client_paths.py
7 passed in 3.20s
Cross-module invariant (test_tool_names_subset_of_models_agent_tool_names):
the 45 mcp_tool_specs.tool_names() are all in models.AGENT_TOOL_NAMES.
Phase 0 of any_type_componentization_20260621. Adds the canonical
decision rule that future contributors can apply without re-deriving:
- TypeAlias conditions: open shape, self-describing, transient
- dataclass(frozen=True) conditions: known fields, multi-site access,
stable serialization, shared across modules
- The src/vendor_capabilities.py reference pattern (5 properties)
- Decision tree
- The 5 worked examples (89 sites promoted per the audit)
- Cross-references to audit scripts + input artifact + track
This is the canonical artifact for the 'when to dataclass' question;
subsequent phases refer to it via 'see styleguide 12' rather than
re-deriving the rule.
Phase 0 of any_type_componentization_20260621. Extends src/type_aliases.py
with two recursive-friendly TypeAliases for JSON wire format (used by
Phase 5 api_hooks WebSocketMessage):
- JsonPrimitive: str | int | float | bool | None
- JsonValue: JsonPrimitive | list['JsonValue'] | dict[str, 'JsonValue']
The forward-ref 'JsonValue' strings work because from __future__ import
annotations is at the top of the module (PEP 563 + PEP 613 TypeAlias).
Tests added (4 new, 14 total):
- test_json_primitive_alias_resolves_to_union: hints exposes JsonPrimitive
- test_json_value_alias_resolves_to_recursive_union: hints exposes JsonValue
- test_json_value_accepts_primitive_dict: dict[str, JsonValue] runtime use
- test_json_value_accepts_nested_structures: nested dict+list round-trip
Verification:
uv run pytest tests/test_type_aliases.py --timeout=30
14 passed in 2.97s
GREEN phase for Phase 0. Mirrors scripts/audit_weak_types.py design with
3 additions specific to the any-type componentization track:
1. PROMOTED_SITE_MODULES allowlist: the 3 new src/ modules
(mcp_tool_specs.py, openai_schemas.py, provider_state.py) are exempt
from Any-counting (their new dataclasses intentionally have raw_response: Any
and SDK holder fields that stay as Any per Pattern 3).
2. INLINE_PROMOTED_SITE_MODULES: log_registry.py + api_hooks.py get their
dataclasses added inline in Phase 4 + 5 (not new modules); same exemption.
3. Combined counter: counts both Any AND weak-struct patterns
(dict_str_any, list_of_dict, optional_dict, etc.).
Modes:
- default: informational (exits 0; prints human report)
- --json: machine-readable with by_file, by_category, total_weak
- --strict: CI gate (exits 1 when current > baseline)
- --baseline: path to baseline file (default: scripts/audit_dataclass_coverage.baseline.json)
Baseline: scripts/audit_dataclass_coverage.baseline.json = 207 weak sites
(captured pre-Phase-1; expected to drop to ~118 after 89 sites promoted).
Verification:
uv run python scripts/audit_dataclass_coverage.py --strict
STRICT OK: 207 weak sites <= baseline 207
uv run pytest tests/test_audit_dataclass_coverage.py --timeout=30
7 passed in 5.15s
RED phase for Phase 0. Mirrors tests/test_audit_weak_types.py structure:
- test_audit_script_exists: AUDIT_SCRIPT.is_file() sanity
- test_audit_help_runs: --help exits 0
- test_audit_json_mode_emits_valid_json: --json emits valid JSON with expected fields
- test_audit_default_mode_emits_human_report: default mode prints a report
- test_audit_strict_mode_against_existing_baseline_passes: --strict exits 0 when current <= baseline
- test_audit_strict_mode_fails_when_baseline_is_zero: --strict exits 1 when current > baseline=0
- test_audit_baseline_field_shape: --json output has expected baseline-shape fields
7 tests total. Run with: uv run pytest tests/test_audit_dataclass_coverage.py --timeout=30
NOTE: 6 of 7 tests fail at this commit (audit script not yet implemented).
This is the RED phase; GREEN comes in the next commit.