This reverts commitf914b2bcd4, reversing changes made to7fef95cc87.
16 KiB
Phase 3 Hypothetical Cost Analysis (Tier 2 authoritative version)
Author: Tier 2 Tech Lead (autonomous sandbox)
Date: 2026-06-21
Context: Produced during phase2_4_5_call_site_completion_20260621 Phase 6e (after Phase 6b/6d work in src/ai_client.py).
Supersedes: Tier 1's hypothesis at docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md (kept as the hypothesis doc; this is the refined version with in-context data).
1. Methodology
Tier 2 profiled all 6 OpenAI-compatible/anthropic senders in src/ai_client.py (_send_anthropic, _send_deepseek, _send_minimax, _send_grok, _send_qwen, _send_llama) while doing the Phase 6b migration work (3 senders migrated to ChatMessage API). The Phase 6d task was effectively a no-op because NormalizedResponse already uses UsageStats throughout src/openai_compatible.py (verified by Select-String 'NormalizedResponse\(' in src/openai_compatible.py).
This analysis is grounded in:
- Actual
Select-Stringcounts of_<provider>_history+_<provider>_history_lockreferences - Read of
_send_grok(L2532-2587),_send_minimax(L2616-2679),_send_llama(L2856-2917) end-to-end during Phase 6b migration - Read of
_send_anthropic(L1432-1590) including itswith _anthropic_history_lock:blocks - Read of
_send_deepseek(L2179-2230) and_send_qwen(L2680-2750) for context - Helper function definitions:
_strip_cache_controls,_add_history_cache_breakpoint,_estimate_prompt_tokens,_strip_private_keys,_repair_anthropic_history,_repair_deepseek_history,_repair_minimax_history,_trim_anthropic_history,_trim_minimax_history
2. Per-Sender Codepath Catalog
2.1 Reference counts (measured, not estimated)
| Provider | Direct _history refs |
Lock refs | Total | Per-call hot-path? |
|---|---|---|---|---|
| anthropic | 20 | 2 | 22 | Yes (cache controls, repair, trim, strip, est_tokens) |
| deepseek | 12 | 6 | 18 | Yes (lock-heavy; multiple append/read blocks) |
| minimax | 14 | 5 | 19 | Yes (repair + build) |
| qwen | 7 | 4 | 11 | Mild (fewer calls) |
| grok | 7 | 6 | 13 | Yes (lock-heavy; 6 locks for 7 refs) |
| llama | 12 | 9 | 21 | Yes (lock-heavy; native + openai-compat branches) |
| TOTAL | 72 | 32 | 104 | — |
Tier 1's estimate was 112 sites (per metadata.json deferred_work.phase_3_provider_state.estimated_sites). Actual count is 104 (close; 7% under).
2.2 _send_anthropic (22 sites) - HIGHEST PRIORITY
Direct sites:
- L1445:
if discussion_history and not _anthropic_history:(read) - L1449:
for msg in _anthropic_history:(iterate) - L1459:
_strip_cache_controls(_anthropic_history)(helper) - L1460:
_repair_anthropic_history(_anthropic_history)(helper) - L1461:
_anthropic_history.append(...)(append) - L1462:
_add_history_cache_breakpoint(_anthropic_history)(helper) - L1471:
_trim_anthropic_history(system_blocks, _anthropic_history)(helper) - L1473:
_estimate_prompt_tokens(system_blocks, _anthropic_history)(helper, read-only) - L1477:
len(_anthropic_history)(read) - L1491, L1505:
_strip_private_keys(_anthropic_history)(helper, returns new list) - L1508:
_anthropic_history.append(...)(append, post-tool-loop) - L1584:
_anthropic_history.append(...)(append, post-tool-loop)
Helper sites: _strip_cache_controls (2), _add_history_cache_breakpoint (2), _estimate_prompt_tokens (4 across all senders), _strip_private_keys (3 — all anthropic), _repair_anthropic_history (2), _trim_anthropic_history (2)
Hidden cross-references (Tier 2 found):
_strip_private_keysis a NESTED function inside_send_anthropic(L1466) — Tier 1's grep would only catch the call sites at L1491/1505, not the def itself_estimate_prompt_tokensis called from_trim_anthropic_historyAND_trim_minimax_history(helper-of-helper pattern)_strip_cache_controlsmutates the list in place (no return value) — Phase 3 migration needswith h.lock: h.messages = [m without cache controls]noth.messages = _strip(h.messages)_add_history_cache_breakpointalso mutates in place — same issue
Lock usage: 2 explicit _anthropic_history_lock references (L485 in cleanup, L1460 in with block); the helpers acquire the lock implicitly because they're called from inside the with block.
2.3 _send_deepseek (18 sites)
Direct sites:
- L465-468:
global _deepseek_history(declaration, inset_provider) - L488-489: cleanup
- L2203:
with _deepseek_history_lock: - L2204:
_repair_deepseek_history(_deepseek_history)(inside with-block) - L2220:
_deepseek_history.append(...)(post-prompt build) - L2238:
_deepseek_history.append(...)(post-tool-loop)
Helper sites: _repair_deepseek_history (2 calls; called from _send_deepseek AND from cleanup — hidden cross-reference Tier 1 missed)
Lock usage: 6 explicit _deepseek_history_lock references — higher lock usage than anthropic but the deepseek send is single-request (no tool-loop iterations); the 6 locks are mostly in setup/teardown paths.
2.4 _send_minimax (19 sites)
Direct sites:
- L465, L491: global/cleanup
- L2616:
_send_minimaxdef - L2653:
_repair_minimax_history(_minimax_history) - L2655, L2656:
_minimax_history.append(...)(2x) - L2661-2662:
messages: list[Metadata] = [{...}]+messages.extend(_minimax_history)(build request) - L2687 (approx):
_trim_minimax_history(system_blocks, _minimax_history)(helper) - L2689 (approx):
_estimate_prompt_tokens(system_blocks, _minimax_history)(helper, read-only)
Helper sites: _repair_minimax_history (2), _trim_minimax_history (2), _estimate_prompt_tokens (4 across all senders)
Hidden cross-references:
_minimax_historyhas a SPECIAL_repair_minimax_historystep (other providers don't have this for non-anthropic); the migration needs to preserve the order:_repair_minimax_history(h)BEFORE the append loop_extract_minimax_reasoningis a nested helper (no history access but operates on raw_response)
2.5 _send_qwen (11 sites) - LOWEST PRIORITY
Direct sites: 7 direct + 4 lock refs (cleanup + send). Smallest surface area.
2.6 _send_grok (13 sites)
Direct sites:
- L465, L497: global/cleanup
- L2573:
_grok_history.append(...)(initial user message) - L2589:
messages.extend(_grok_history)(build request)
Lock usage: 6 explicit locks — high lock ratio. The send has multiple sequential with _grok_history_lock: blocks (3 distinct blocks: append user msg, build request, post-tool-loop).
2.7 _send_llama (21 sites)
Direct sites: 12 direct + 9 lock refs. The 9 lock refs come from: (1) llama has BOTH _send_llama (OpenAI-compatible) AND _send_llama_native (Ollama); the native path also touches _llama_history.
Hidden cross-references:
_send_llamais a router — checks for localhost/127.0.0.1 and delegates to_send_llama_native. The native path also locks_llama_historyfor reasoning extraction.- This is the ONLY provider with a dual-path architecture — Phase 3 migration needs to handle both paths identically.
3. Qualitative Cost Estimation
3.1 Per-call cost categories (microsecond estimates; refined from Tier 1)
| Category | Current (dict globals) | Proposed (ProviderHistory dataclass) | Per-call delta |
|---|---|---|---|
_<provider>_history.append(m) |
dict.append (~100ns) | h.append(m) (lock acquire + append) (~300ns) |
+200ns/call |
len(_<provider>_history) |
direct attribute (~50ns) | len(h.messages) (~100ns) |
+50ns/call |
for m in _<provider>_history: |
direct iteration | with h.lock: msg_list = list(h.messages) then iterate |
+5-10µs/call (list copy) |
with _<provider>_history_lock: |
direct lock | with h.lock: (same lock, just access via attribute) |
~0 (same lock) |
_global _<provider>_history (cleanup) |
direct module global | h.clear() (lock acquire + clear) |
+200ns/call (1 per session) |
h.get_all() (new pattern) |
n/a | list(h.messages) inside lock |
+5-10µs/call (list copy) |
Tier 1's estimates were pessimistic (they assumed all iterations would need h.get_all() and pay 5-10µs each). Tier 2 found that the iterations are 1-2 per LLM turn, not per-message.
3.2 Per-sender per-turn overhead
_send_anthropic (per-turn):
- 1x append user msg (200ns)
- 1x append post-tool-loop (200ns)
- 1x append post-tool-loop (200ns) (2 tool iterations max)
- 1x
with _anthropic_history_lock:(0ns, same lock) - 1x
_strip_cache_controls(callswith h.lock: h.messages = [...]) = 5-10µs (full iteration + filter) - 1x
_add_history_cache_breakpoint= 5-10µs (full iteration + maybe-append) - 1x
_trim_anthropic_history= 5-10µs (full iteration + maybe-trim) - 1x
_estimate_prompt_tokens= 5-10µs (full iteration + token count) - 1x
_strip_private_keys(2 sites; non-stream + stream) = 5-10µs x 2 = 10-20µs
Per-turn total for anthropic: ~35-65µs (5-7 helper iterations + 2-3 appends)
_send_deepseek (per-turn):
- 1x
_repair_deepseek_history= 5-10µs (full iteration + repair) - 1x append user msg (200ns)
- 1x append post-tool-loop (200ns)
- ~3-4x
with _deepseek_history_lock:blocks (0ns each, just lock churn)
Per-turn total for deepseek: ~5-10µs (1 helper + 2 appends)
_send_minimax (per-turn):
- 1x
_repair_minimax_history= 5-10µs - 2x append user msg (200ns x 2 = 400ns)
- 1x
_trim_minimax_history= 5-10µs - 1x
_estimate_prompt_tokens= 5-10µs
Per-turn total for minimax: ~15-30µs
_send_grok (per-turn):
- 1x append user msg (200ns)
- 1x append post-tool-loop (200ns)
- ~3x
with _grok_history_lock:blocks (0ns each)
Per-turn total for grok: ~400ns (very lean)
_send_qwen (per-turn):
- 1x append user msg (200ns)
- 1x append post-tool-loop (200ns)
- ~2x
with _qwen_history_lock:blocks (0ns)
Per-turn total for qwen: ~400ns (leanest)
_send_llama (per-turn):
- 1x append user msg (200ns)
- 1x append post-tool-loop (200ns)
- ~3-4x
with _llama_history_lock:blocks (0ns each)
Per-turn total for llama: ~400ns (lean)
3.3 Hot iteration sites (the with h.lock: msg_list = h.messages pattern)
| Helper | Line | Lock pattern | Per-call cost | Frequency per turn |
|---|---|---|---|---|
_strip_cache_controls(_anthropic_history) |
1459 | with h.lock: h.messages = [filtered] |
5-10µs | 1/turn |
_add_history_cache_breakpoint(_anthropic_history) |
1462 | with h.lock: h.messages.append(breakpoint) |
5-10µs | 1/turn |
_trim_anthropic_history(...) |
1471 | with h.lock: ... |
5-10µs | 1/turn |
_estimate_prompt_tokens(system_blocks, _anthropic_history) |
1473 | with h.lock: read-only sum |
5-10µs | 1/turn |
_strip_private_keys(_anthropic_history) |
1491, 1505 | with h.lock: return list(h.messages) |
5-10µs | 1-2/turn (stream vs non-stream) |
_repair_anthropic_history(_anthropic_history) |
1460 | with h.lock: in-place mutation |
5-10µs | 1/turn |
_repair_deepseek_history(_deepseek_history) |
2204 | with h.lock: in-place mutation |
5-10µs | 1/turn |
_repair_minimax_history(_minimax_history) |
2653 | with h.lock: in-place mutation |
5-10µs | 1/turn |
_trim_minimax_history(...) |
2687 | with h.lock: ... |
5-10µs | 1/turn |
Recommendation: Use with h.lock: for in-place mutations (no list copy needed). Use h.get_all() only when the caller needs to OWN the list (e.g., _strip_private_keys returns a new list).
4. Comparison vs Tier 1's Hypothesis
| Sender | Tier 1 hypothesis (µs/turn) | Tier 2 refined (µs/turn) | Delta | Reason |
|---|---|---|---|---|
| anthropic | +8-15 | +35-65 | +4-7x HIGHER | Tier 1 missed _strip_cache_controls + _add_history_cache_breakpoint + _strip_private_keys (3 additional helpers per turn) |
| deepseek | +3-7 | +5-10 | ~same | 1 helper + 2 appends |
| minimax | +3-7 | +15-30 | +2-4x HIGHER | Tier 1 missed _repair_minimax_history + _trim_minimax_history (2 helpers per turn) |
| grok | +2-5 | +0.4 | LOWER | No helper functions; pure appends |
| qwen | +2-5 | +0.4 | LOWER | No helper functions; pure appends |
| llama | +4-8 | +0.4 | LOWER | No helper functions in openai-compat path; native path is separate |
| Total session | +1.1-2.4ms | +0.5-1.0ms | LOWER | Anthropic dominates; one turn typically |
Honest takeaway: Tier 1's hypothesis was directionally correct but UNDER-estimated anthropic's helper count and OVER-estimated the lean providers. The total per-session overhead is actually LOWER than Tier 1 estimated, but anthropic is HIGHER than estimated.
The audit (code_path_audit_20260607) will measure actual cost with micro-benchmarks (per the plan's Task 6e.2 hook).
5. Recommendations for Future Phase 3 Track
- Anthropic FIRST (highest ROI; 5 helpers per turn; cache controls are unique to this provider)
- Use
with h.lock: msg_list = h.messagesfor read iterations that need a snapshot (avoidsget_all()'s list-copy cost when caller can work inside the lock) - Use
h.get_all()ONLY when the caller needs to OWN the list outside the lock (e.g.,_strip_private_keysreturns the list to the Anthropic SDK which holds it during the HTTP call) - Use
with h.lock: h.messages = [filtered]for in-place mutations (e.g.,_strip_cache_controls,_add_history_cache_breakpoint) - Lock semantics unchanged —
ProviderHistory.lockis per-instance; no cross-provider contention (verified: 6 separatethreading.Lock()instances at L114/118/122/126/131/135) - Hidden cross-references to migrate FIRST:
_strip_private_keys(nested in_send_anthropic, returns new list — needsh.get_all()or explicit snapshot)_extract_minimax_reasoning(nested in_send_minimax, no history access but operates on raw_response — safe to skip)_send_llama_native(separate path; also touches_llama_history— must migrate in lock-step with_send_llama)
6. Open Questions
- Anthropic
cache_controlsemantics:_strip_cache_controlsREMOVES cache_control markers;_add_history_cache_breakpointADDS them. Does removing them then re-adding them within the same request cost a cache miss on Anthropic's side? (Need to verify with Anthropic API docs / behavioral test.) _trim_<provider>_historymutation vs return: Both helpers do in-place mutation. After Phase 3, do they need to return the new length to the caller (for logging), or can the caller just checklen(h.messages)after the helper returns?- Lock granularity: The
_send_lock(L139) is a global per-vendor-call lock (serialize all sends across providers). The 6_history_locks are per-history. After Phase 3,_send_lockstays as-is; only the 6 history globals migrate. (No code change to_send_lockneeded.) - Tool-loop iterations:
_send_grok,_send_anthropic,_send_minimax,_send_llamaall userun_with_tool_loopwhich can iterate 2-5 times. The per-iteration cost ofh.append(...)is small, but the per-iteration lock churn is non-trivial. Tier 1 estimated 2-5 iterations; Tier 2 confirmed (looking atrun_with_tool_looppatterns).
7. See Also
docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md- Tier 1's hypothesis (the "what we thought before Tier 2 looked")conductor/tracks/phase2_4_5_call_site_completion_20260621/spec.md- Phase 6e directivesconductor/tracks/code_path_audit_20260607/spec.md- the audit that quantifies these estimatesdocs/handoffs/PROMPT_FOR_TIER_1.md- Tier 1 briefsrc/provider_state.py- theProviderHistorydataclass already defined (Phase 0 deliverable from parent track)src/ai_client.py:113-139- the 7 history globals + 6 locks + 1_send_locksrc/ai_client.py:1245-1485- the 5 anthropic helpers (most-heavy)