Private
Public Access
0
0
Files
manual_slop/docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md
T
ed 1a739ecef5 conductor(spec+plan): phase2_4_5_call_site_completion_20260621 + code_path_audit pre-flight adjustments + Phase 3 analysis
PHASE 2/4/5 FOLLOW-UP TRACK (Tier 1 decided SHINK to 6a + 6b + 6d):
- Phase 6a: Fix HookServer.broadcast() callers (app_controller.py + events.py + gui_2.py)
  Adds tests/test_websocket_broadcast_regression.py with no-TypeError assertion
- Phase 6b: Complete _send_grok/_send_minimax/_send_llama OpenAICompatibleRequest migration
- Phase 6d: Update those 3 senders' NormalizedResponse to use UsageStats

Total: ~16 atomic commits, ~3 hours Tier 2 work. Unblocks code_path_audit_20260607.

CODE_PATH_AUDIT_20260607 PRE-FLIGHT ADJUSTMENTS (per handoffs):
- Add 2 new actions: provider_history_append + websocket_broadcast
- Add 5 micro-benchmarks: NormalizedResponse.__init__, WebSocketMessage.__init__,
  UsageStats.__init__, ProviderHistory.lock, ToolSpec.__init__
- Add no-TypeError-errors-on-any-thread assertion (backs test_websocket_broadcast_regression.py)
- Add 89 fat-struct sites from ANY_TYPE_AUDIT_20260621.md as instrumented targets
- BLOCKER: phase2_4_5_call_site_completion_20260621 (broadcast() TypeError)

PHASE 3 HYPOTHETICAL ANALYSIS (separate doc):
docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md - dataclass definitions (already on tier2 branch),
per-provider codepath catalog (112 sites), qualitative cost estimation (~+1-2ms per session,
~+8-15us per _send_anthropic turn). Input for the audit; the audit quantifies the cost.

REGISTRATION:
conductor/tracks.md updated: new row 27 (follow-up), new row 28 (parent any_type_componentization),
row 17 (code_path_audit) updated with pre-flight adjustments note.

Files:
- conductor/tracks/phase2_4_5_call_site_completion_20260621/spec.md (NEW; 633 lines)
- conductor/tracks/phase2_4_5_call_site_completion_20260621/plan.md (NEW; 7 phases, 23 tasks)
- conductor/tracks/phase2_4_5_call_site_completion_20260621/metadata.json (NEW; 8.8KB)
- conductor/tracks/phase2_4_5_call_site_completion_20260621/state.toml (NEW; 11.8KB)
- docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md (NEW; 380 lines; qualitative cost analysis)
- conductor/tracks/code_path_audit_20260607/spec.md (MODIFIED; +93 lines Pre-Flight Adjustments)
- conductor/tracks.md (MODIFIED; +35 lines: 3 new entries + 1 stale row fix)
2026-06-21 18:32:02 -04:00

14 KiB
Raw Blame History

Phase 3 Hypothetical Promotion: ProviderHistory Migration Analysis

Date: 2026-06-21 Author: Tier 1 Orchestrator Status: Hypothetical — this is the analysis the deferred Phase 3 work would look like, NOT a track spec Input: docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md (Tier 2's runtime cost framing) + src/provider_state.py (the dataclass already on the tier2 branch)


1. Purpose

Phase 3 (provider_state.ProviderHistory call-site migration in src/ai_client.py) was deferred from any_type_componentization_20260621 because:

  1. It's the highest-risk phase (112 call sites across 6 senders)
  2. The cost depends on whether each site is in a hot path, cold path, or init path
  3. code_path_audit_20260607 is the right tool to quantify that cost before refactoring

This document presents what the migration would look like — the approximate dataclasses, the call-site catalog, and a qualitative cost estimation of each codepath. The actual numbers will come from the audit. This document is the what; the audit produces the cost.

2. The Dataclass (already exists on tier2/any_type_componentization_20260621 branch)

# src/provider_state.py:25-44 (verbatim from branch)
@dataclass
class ProviderHistory:
 messages: list[HistoryMessage] = field(default_factory=list)
 lock: threading.Lock = field(default_factory=threading.Lock)

 def append(self, message: HistoryMessage) -> None:
 with self.lock:
 self.messages.append(message)

 def get_all(self) -> list[HistoryMessage]:
 with self.lock:
 return list(self.messages)

 def replace_all(self, messages: list[HistoryMessage]) -> None:
 with self.lock:
 self.messages = list(messages)

 def clear(self) -> None:
 with self.lock:
 self.messages = []
# src/provider_state.py:47-69 (verbatim from branch)
_PROVIDER_HISTORIES: dict[str, ProviderHistory] = {
 "anthropic": ProviderHistory(),
 "deepseek": ProviderHistory(),
 "minimax": ProviderHistory(),
 "qwen": ProviderHistory(),
 "grok": ProviderHistory(),
 "llama": ProviderHistory(),
}

def get_history(provider: str) -> ProviderHistory:
 if provider not in _PROVIDER_HISTORIES:
 raise KeyError(f"Unknown provider: {provider!r}")
 return _PROVIDER_HISTORIES[provider]

def clear_all() -> None:
 for h in _PROVIDER_HISTORIES.values():
 h.clear()

def providers() -> tuple[str, ...]:
 return tuple(_PROVIDER_HISTORIES.keys())

Properties that hold:

  • @dataclass (NOT frozen=True) — the message list and lock are mutable; this is correct.
  • default_factory=list for messages — each ProviderHistory gets its own list.
  • default_factory=threading.Lock for lock — each ProviderHistory gets its own lock instance.
  • The 4-method interface encapsulates the lock; consumers never see it.

This is already on the tier2 branch. What Phase 3 does is migrate the consumers.

3. The Hypothetical Migration

The migration replaces direct module-global access (_anthropic_history, _anthropic_history_lock) with the typed accessor (get_history("anthropic")).

3.1 Mechanical Translation Rules

Current Hypothetical (typed) Lock needed?
_anthropic_history (read) get_history("anthropic").get_all() Yes (returns copy under lock)
_anthropic_history (write ref) get_history("anthropic").messages Only inside with h.lock:
_anthropic_history.append(m) get_history("anthropic").append(m) Encapsulated
len(_anthropic_history) len(get_history("anthropic").messages) No (length is atomic in CPython)
for m in _anthropic_history: for m in get_history("anthropic").get_all(): Yes
with _anthropic_history_lock: with get_history("anthropic").lock: Same
_anthropic_history = [] get_history("anthropic").clear() Encapsulated

3.2 Pattern Categories (per HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md §1)

Category Sites Path role
_<provider>_history.append(message) 6 Hot — called per LLM turn
len(_<provider>_history) / _<provider>_history[-1] / iteration ~40 Hot — called per LLM turn for trimming, tool-history cache breakpoint, strip_cache_controls
with _<provider>_history_lock: ~30 Mixed — per-turn append is Hot; reset_session is Cold
global _<provider>_history declarations 4 N/A — module-level, no runtime cost
_strip_cache_controls(_<provider>_history) + _repair_<provider>_history() + _add_history_cache_breakpoint() + _trim_<provider>_history() ~30 Hot for Anthropic (cache controls); Mixed for others

3.3 Per-Provider Site Count (measured from current src/ai_client.py)

Provider history refs lock refs global decls Total sites
anthropic 22 2 1 25
deepseek 13 6 1 20
minimax 15 5 1 21
qwen 7 4 1 12
grok 7 6 0 13
llama 12 9 0 21
Total 76 32 4 112

(Note: this 112 count is higher than the HANDOFF's "41" estimate, because the grep counts every reference including duplicates in helper functions. The migration work is the same either way — every reference gets touched — but the codepath catalog is richer.)

4. The Codepath Catalog (with Qualitative Cost Estimation)

This is the what the audit will quantify. Each codepath is tagged with path_role, call_frequency, and estimated qualitative cost delta (positive = slower, negative = faster, zero = no change).

4.1 _send_anthropic (L1407) — HOT per-LLM-turn

Codepaths inside _send_anthropic (per the grep):

Codepath Path role Per-call freq Qualitative cost delta
_strip_cache_controls(_anthropic_history) Hot (called once per send) 1× per LLM turn +0.5-1μs (one extra dict lookup get_history("anthropic") per call)
_repair_anthropic_history(_anthropic_history) Hot 1× per LLM turn +0.5μs (same)
_anthropic_history.append(...) (user message) Hot 1× per LLM turn +0.5μs (method call vs. bare .append())
_add_history_cache_breakpoint(_anthropic_history) Hot 1× per LLM turn +0.5μs (same)
_trim_anthropic_history(system_blocks, _anthropic_history) Hot 1× per LLM turn +0.5μs (one extra dict lookup)
len(_anthropic_history) Hot 2-3× per LLM turn (used in token estimation) +0.3μs per call (.messages attribute access vs. global var lookup)
_estimate_prompt_tokens(system_blocks, _anthropic_history) Hot 1× per LLM turn +1μs (the function takes a list; we pass h.messages under lock or h.get_all(); if the latter, that's a list copy — ~5μs for a 50-message history)
for m in _anthropic_history: (inside _strip_cache_controls) Hot 1× per LLM turn (iteration over ~10-50 messages) +5-10μs (list copy via get_all(); the bare global just iterates directly)

Per-turn overhead estimate: +8-15μs per _send_anthropic call. At ~50 turns per session, that's +400-750μs per session. Negligible vs LLM latency (typically 1-30 seconds).

Recommendation (subject to audit): Migrate, but use with h.lock: blocks for the hot paths inside _strip_cache_controls and _estimate_prompt_tokens to avoid the list-copy overhead of get_all().

4.2 _send_deepseek (L2167) — HOT per-LLM-turn

Similar pattern to _send_anthropic but simpler (no cache controls). Estimated per-turn overhead: +3-7μs. At 50 turns/session, +150-350μs/session.

4.3 _send_minimax (L2616) — HOT per-LLM-turn

Has _trim_minimax_history helper (L2484). Estimated per-turn overhead: +3-7μs. +150-350μs/session.

4.4 _send_grok (L2532) — HOT per-LLM-turn

No _trim or _repair helpers; simpler. Estimated per-turn overhead: +2-5μs. +100-250μs/session.

4.5 _send_qwen (L2771) — HOT per-LLM-turn

No helpers. Estimated per-turn overhead: +2-5μs. +100-250μs/session.

4.6 _send_llama (L2856) — HOT per-LLM-turn

Highest lock count (9 lock refs). Estimated per-turn overhead: +4-8μs. +200-400μs/session.

4.7 cleanup() (L454) — COLD per project-switch

Iterates over all 6 providers, calls clear() on each. Current code does with _<provider>_history_lock: _<provider>_history = [] 6 times. Hypothetical: clear_all() (already defined on branch) iterates and calls clear() once per provider.

Per-call cost: -2 to -5μs (negative — slight speedup because clear_all() is one function call vs. 6 inline blocks). Called once per project switch; negligible in absolute terms.

4.8 reset_session() (L461) — COLD per project-switch

Calls cleanup() (the cold path above). Total per-call cost: -2 to -5μs.

4.9 Init Path — _PROVIDER_HISTORIES dict construction at module load

One-time cost at module import. 6 ProviderHistory() instances each with default_factory=list + default_factory=threading.Lock. Total: ~10-15μs. Negligible.

5. Total Qualitative Cost Summary

Codepath Path role Est. overhead per call Frequency Total per session
_send_anthropic Hot per turn +8-15μs ~50 turns +400-750μs
_send_deepseek Hot per turn +3-7μs ~50 turns +150-350μs
_send_minimax Hot per turn +3-7μs ~50 turns +150-350μs
_send_grok Hot per turn +2-5μs ~50 turns +100-250μs
_send_qwen Hot per turn +2-5μs ~50 turns +100-250μs
_send_llama Hot per turn +4-8μs ~50 turns +200-400μs
cleanup() / reset_session() Cold per project switch -2-5μs ~1× -2-5μs
Init (module load) Once +10-15μs 1× +10-15μs
Total per session ~+1.1-2.4ms

Interpretation: Even at the upper bound (+2.4ms per session), this is 3+ orders of magnitude smaller than the LLM latency it lives alongside. The migration is type-safety for free in absolute runtime terms.

The actual audit will quantify these estimates. If the audit finds a >50μs delta per turn (e.g., from lock contention or get_all() list copies), the migration strategy changes (use with h.lock: blocks instead of get_all() to avoid copies).

6. The Risks (per HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md §1)

Risk Likelihood Impact Mitigation
get_history("anthropic").get_all() copies the list per access; _estimate_prompt_tokens is called per turn and iterates the copy Medium +5-15μs per turn Use with h.lock: msg_list = h.messages pattern in hot iteration sites
Lock contention: multiple _send_<provider> calls in parallel (rare but possible during batch sends) Low +1-10μs per turn under contention The lock is per-provider; no cross-provider contention; benchmark will reveal
getattr lookup overhead for get_history(...) vs. global var Low +0.5μs per access Could inline as a module-level constant if needed; unlikely worth the readability cost
The _send_anthropic cache-control helpers iterate the list; a copy doubles memory bandwidth Medium +10-30μs per turn if hot Refactor to operate on h.messages under lock without copying
Forgotten call site (one of the 76 history refs missed) Medium Runtime AttributeError or NameError Run tier-1-unit-core + tier-2-mock-app-core FULLY per the regression protocol

7. The Codepath Audit Additions (per PROMPT_FOR_TIER_1.md Decision 4)

Per Tier 1's sequencing decision, the code_path_audit_20260607 will instrument:

Action Codepath Measures
provider_history_append get_history(p).append(msg) (or current _anthropic_history.append(msg)) Per-turn append latency + lock acquire time
websocket_broadcast broadcast(WebSocketMessage(...)) (post-Phase 6a) Per-broadcast overhead
ai_message_lifecycle (existing) _send_<provider> end-to-end Total per-turn latency delta pre/post Phase 3
discussion_save_load (existing) reset_session() + project switch Cold-path cost
gui_startup (existing) _PROVIDER_HISTORIES init One-time cost

8. Recommendation (subject to audit results)

If the audit confirms the qualitative estimates (+1-2ms per session; <50μs per turn):

  • Proceed with Phase 3 migration as planned (~10-15 commits).
  • Use with h.lock: blocks for hot iteration sites (_strip_cache_controls, _estimate_prompt_tokens) to avoid get_all() copies.
  • Run the 11-tier regression protocol per the follow-up track.

If the audit reveals a >50μs per-turn delta (e.g., lock contention >10μs):

  • Reconsider: do we even need to migrate the history aspect? It's list[Metadata] already typed.
  • Alternative: keep the module globals but rename them with a _HISTORY suffix and document the pattern; defer full ProviderHistory migration.

The audit decides. This analysis is the input to the audit, not the conclusion.

9. Open Questions

  1. Should the ProviderHistory.messages be list[HistoryMessage] or list[dict[str, Any]]? Currently it's list[HistoryMessage] (= list[Metadata]). The legacy code uses list[Metadata] everywhere. The dataclass stays consistent with the type alias.
  2. Should we add a __len__ method to ProviderHistory to avoid len(h.messages)?
    • Pros: cleaner consumer code
    • Cons: minor; only saves attribute access
  3. Should _PROVIDER_HISTORIES be a MappingProxyType (read-only) for external code? Currently it's a regular dict; external code could mutate _PROVIDER_HISTORIES["anthropic"] = ProviderHistory(). Probably not worth the indirection.
  4. Should get_history(p) validate p (raise on unknown)? Currently it raises KeyError. Could be Literal["anthropic", "deepseek", ...] for static type checking.

10. See Also

  • docs/handoffs/HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md — the original runtime cost framing
  • docs/handoffs/PROMPT_FOR_TIER_1.md — Tier 1's decision points
  • src/provider_state.py — the actual dataclass (already on tier2/any_type_componentization_20260621 branch)
  • conductor/tracks/any_type_componentization_20260621/spec.md — parent track spec
  • conductor/tracks/code_path_audit_20260607/spec.md — the audit that will quantify these estimates
  • conductor/tracks/phase2_4_5_call_site_completion_20260621/spec.md — the follow-up track that unblocks the audit