Private
Public Access
0
0
Files
manual_slop/docs/reports/PHASE3_TIER2_ANALYSIS.md
T

16 KiB

Phase 3 Hypothetical Cost Analysis (Tier 2 authoritative version)

Author: Tier 2 Tech Lead (autonomous sandbox) Date: 2026-06-21 Context: Produced during phase2_4_5_call_site_completion_20260621 Phase 6e (after Phase 6b/6d work in src/ai_client.py). Supersedes: Tier 1's hypothesis at docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md (kept as the hypothesis doc; this is the refined version with in-context data).


1. Methodology

Tier 2 profiled all 6 OpenAI-compatible/anthropic senders in src/ai_client.py (_send_anthropic, _send_deepseek, _send_minimax, _send_grok, _send_qwen, _send_llama) while doing the Phase 6b migration work (3 senders migrated to ChatMessage API). The Phase 6d task was effectively a no-op because NormalizedResponse already uses UsageStats throughout src/openai_compatible.py (verified by Select-String 'NormalizedResponse\(' in src/openai_compatible.py).

This analysis is grounded in:

  • Actual Select-String counts of _<provider>_history + _<provider>_history_lock references
  • Read of _send_grok (L2532-2587), _send_minimax (L2616-2679), _send_llama (L2856-2917) end-to-end during Phase 6b migration
  • Read of _send_anthropic (L1432-1590) including its with _anthropic_history_lock: blocks
  • Read of _send_deepseek (L2179-2230) and _send_qwen (L2680-2750) for context
  • Helper function definitions: _strip_cache_controls, _add_history_cache_breakpoint, _estimate_prompt_tokens, _strip_private_keys, _repair_anthropic_history, _repair_deepseek_history, _repair_minimax_history, _trim_anthropic_history, _trim_minimax_history

2. Per-Sender Codepath Catalog

2.1 Reference counts (measured, not estimated)

Provider Direct _history refs Lock refs Total Per-call hot-path?
anthropic 20 2 22 Yes (cache controls, repair, trim, strip, est_tokens)
deepseek 12 6 18 Yes (lock-heavy; multiple append/read blocks)
minimax 14 5 19 Yes (repair + build)
qwen 7 4 11 Mild (fewer calls)
grok 7 6 13 Yes (lock-heavy; 6 locks for 7 refs)
llama 12 9 21 Yes (lock-heavy; native + openai-compat branches)
TOTAL 72 32 104

Tier 1's estimate was 112 sites (per metadata.json deferred_work.phase_3_provider_state.estimated_sites). Actual count is 104 (close; 7% under).

2.2 _send_anthropic (22 sites) - HIGHEST PRIORITY

Direct sites:

  • L1445: if discussion_history and not _anthropic_history: (read)
  • L1449: for msg in _anthropic_history: (iterate)
  • L1459: _strip_cache_controls(_anthropic_history) (helper)
  • L1460: _repair_anthropic_history(_anthropic_history) (helper)
  • L1461: _anthropic_history.append(...) (append)
  • L1462: _add_history_cache_breakpoint(_anthropic_history) (helper)
  • L1471: _trim_anthropic_history(system_blocks, _anthropic_history) (helper)
  • L1473: _estimate_prompt_tokens(system_blocks, _anthropic_history) (helper, read-only)
  • L1477: len(_anthropic_history) (read)
  • L1491, L1505: _strip_private_keys(_anthropic_history) (helper, returns new list)
  • L1508: _anthropic_history.append(...) (append, post-tool-loop)
  • L1584: _anthropic_history.append(...) (append, post-tool-loop)

Helper sites: _strip_cache_controls (2), _add_history_cache_breakpoint (2), _estimate_prompt_tokens (4 across all senders), _strip_private_keys (3 — all anthropic), _repair_anthropic_history (2), _trim_anthropic_history (2)

Hidden cross-references (Tier 2 found):

  • _strip_private_keys is a NESTED function inside _send_anthropic (L1466) — Tier 1's grep would only catch the call sites at L1491/1505, not the def itself
  • _estimate_prompt_tokens is called from _trim_anthropic_history AND _trim_minimax_history (helper-of-helper pattern)
  • _strip_cache_controls mutates the list in place (no return value) — Phase 3 migration needs with h.lock: h.messages = [m without cache controls] not h.messages = _strip(h.messages)
  • _add_history_cache_breakpoint also mutates in place — same issue

Lock usage: 2 explicit _anthropic_history_lock references (L485 in cleanup, L1460 in with block); the helpers acquire the lock implicitly because they're called from inside the with block.

2.3 _send_deepseek (18 sites)

Direct sites:

  • L465-468: global _deepseek_history (declaration, in set_provider)
  • L488-489: cleanup
  • L2203: with _deepseek_history_lock:
  • L2204: _repair_deepseek_history(_deepseek_history) (inside with-block)
  • L2220: _deepseek_history.append(...) (post-prompt build)
  • L2238: _deepseek_history.append(...) (post-tool-loop)

Helper sites: _repair_deepseek_history (2 calls; called from _send_deepseek AND from cleanup — hidden cross-reference Tier 1 missed)

Lock usage: 6 explicit _deepseek_history_lock references — higher lock usage than anthropic but the deepseek send is single-request (no tool-loop iterations); the 6 locks are mostly in setup/teardown paths.

2.4 _send_minimax (19 sites)

Direct sites:

  • L465, L491: global/cleanup
  • L2616: _send_minimax def
  • L2653: _repair_minimax_history(_minimax_history)
  • L2655, L2656: _minimax_history.append(...) (2x)
  • L2661-2662: messages: list[Metadata] = [{...}] + messages.extend(_minimax_history) (build request)
  • L2687 (approx): _trim_minimax_history(system_blocks, _minimax_history) (helper)
  • L2689 (approx): _estimate_prompt_tokens(system_blocks, _minimax_history) (helper, read-only)

Helper sites: _repair_minimax_history (2), _trim_minimax_history (2), _estimate_prompt_tokens (4 across all senders)

Hidden cross-references:

  • _minimax_history has a SPECIAL _repair_minimax_history step (other providers don't have this for non-anthropic); the migration needs to preserve the order: _repair_minimax_history(h) BEFORE the append loop
  • _extract_minimax_reasoning is a nested helper (no history access but operates on raw_response)

2.5 _send_qwen (11 sites) - LOWEST PRIORITY

Direct sites: 7 direct + 4 lock refs (cleanup + send). Smallest surface area.

2.6 _send_grok (13 sites)

Direct sites:

  • L465, L497: global/cleanup
  • L2573: _grok_history.append(...) (initial user message)
  • L2589: messages.extend(_grok_history) (build request)

Lock usage: 6 explicit locks — high lock ratio. The send has multiple sequential with _grok_history_lock: blocks (3 distinct blocks: append user msg, build request, post-tool-loop).

2.7 _send_llama (21 sites)

Direct sites: 12 direct + 9 lock refs. The 9 lock refs come from: (1) llama has BOTH _send_llama (OpenAI-compatible) AND _send_llama_native (Ollama); the native path also touches _llama_history.

Hidden cross-references:

  • _send_llama is a router — checks for localhost/127.0.0.1 and delegates to _send_llama_native. The native path also locks _llama_history for reasoning extraction.
  • This is the ONLY provider with a dual-path architecture — Phase 3 migration needs to handle both paths identically.

3. Qualitative Cost Estimation

3.1 Per-call cost categories (microsecond estimates; refined from Tier 1)

Category Current (dict globals) Proposed (ProviderHistory dataclass) Per-call delta
_<provider>_history.append(m) dict.append (~100ns) h.append(m) (lock acquire + append) (~300ns) +200ns/call
len(_<provider>_history) direct attribute (~50ns) len(h.messages) (~100ns) +50ns/call
for m in _<provider>_history: direct iteration with h.lock: msg_list = list(h.messages) then iterate +5-10µs/call (list copy)
with _<provider>_history_lock: direct lock with h.lock: (same lock, just access via attribute) ~0 (same lock)
_global _<provider>_history (cleanup) direct module global h.clear() (lock acquire + clear) +200ns/call (1 per session)
h.get_all() (new pattern) n/a list(h.messages) inside lock +5-10µs/call (list copy)

Tier 1's estimates were pessimistic (they assumed all iterations would need h.get_all() and pay 5-10µs each). Tier 2 found that the iterations are 1-2 per LLM turn, not per-message.

3.2 Per-sender per-turn overhead

_send_anthropic (per-turn):

  • 1x append user msg (200ns)
  • 1x append post-tool-loop (200ns)
  • 1x append post-tool-loop (200ns) (2 tool iterations max)
  • 1x with _anthropic_history_lock: (0ns, same lock)
  • 1x _strip_cache_controls (calls with h.lock: h.messages = [...]) = 5-10µs (full iteration + filter)
  • 1x _add_history_cache_breakpoint = 5-10µs (full iteration + maybe-append)
  • 1x _trim_anthropic_history = 5-10µs (full iteration + maybe-trim)
  • 1x _estimate_prompt_tokens = 5-10µs (full iteration + token count)
  • 1x _strip_private_keys (2 sites; non-stream + stream) = 5-10µs x 2 = 10-20µs

Per-turn total for anthropic: ~35-65µs (5-7 helper iterations + 2-3 appends)

_send_deepseek (per-turn):

  • 1x _repair_deepseek_history = 5-10µs (full iteration + repair)
  • 1x append user msg (200ns)
  • 1x append post-tool-loop (200ns)
  • ~3-4x with _deepseek_history_lock: blocks (0ns each, just lock churn)

Per-turn total for deepseek: ~5-10µs (1 helper + 2 appends)

_send_minimax (per-turn):

  • 1x _repair_minimax_history = 5-10µs
  • 2x append user msg (200ns x 2 = 400ns)
  • 1x _trim_minimax_history = 5-10µs
  • 1x _estimate_prompt_tokens = 5-10µs

Per-turn total for minimax: ~15-30µs

_send_grok (per-turn):

  • 1x append user msg (200ns)
  • 1x append post-tool-loop (200ns)
  • ~3x with _grok_history_lock: blocks (0ns each)

Per-turn total for grok: ~400ns (very lean)

_send_qwen (per-turn):

  • 1x append user msg (200ns)
  • 1x append post-tool-loop (200ns)
  • ~2x with _qwen_history_lock: blocks (0ns)

Per-turn total for qwen: ~400ns (leanest)

_send_llama (per-turn):

  • 1x append user msg (200ns)
  • 1x append post-tool-loop (200ns)
  • ~3-4x with _llama_history_lock: blocks (0ns each)

Per-turn total for llama: ~400ns (lean)

3.3 Hot iteration sites (the with h.lock: msg_list = h.messages pattern)

Helper Line Lock pattern Per-call cost Frequency per turn
_strip_cache_controls(_anthropic_history) 1459 with h.lock: h.messages = [filtered] 5-10µs 1/turn
_add_history_cache_breakpoint(_anthropic_history) 1462 with h.lock: h.messages.append(breakpoint) 5-10µs 1/turn
_trim_anthropic_history(...) 1471 with h.lock: ... 5-10µs 1/turn
_estimate_prompt_tokens(system_blocks, _anthropic_history) 1473 with h.lock: read-only sum 5-10µs 1/turn
_strip_private_keys(_anthropic_history) 1491, 1505 with h.lock: return list(h.messages) 5-10µs 1-2/turn (stream vs non-stream)
_repair_anthropic_history(_anthropic_history) 1460 with h.lock: in-place mutation 5-10µs 1/turn
_repair_deepseek_history(_deepseek_history) 2204 with h.lock: in-place mutation 5-10µs 1/turn
_repair_minimax_history(_minimax_history) 2653 with h.lock: in-place mutation 5-10µs 1/turn
_trim_minimax_history(...) 2687 with h.lock: ... 5-10µs 1/turn

Recommendation: Use with h.lock: for in-place mutations (no list copy needed). Use h.get_all() only when the caller needs to OWN the list (e.g., _strip_private_keys returns a new list).


4. Comparison vs Tier 1's Hypothesis

Sender Tier 1 hypothesis (µs/turn) Tier 2 refined (µs/turn) Delta Reason
anthropic +8-15 +35-65 +4-7x HIGHER Tier 1 missed _strip_cache_controls + _add_history_cache_breakpoint + _strip_private_keys (3 additional helpers per turn)
deepseek +3-7 +5-10 ~same 1 helper + 2 appends
minimax +3-7 +15-30 +2-4x HIGHER Tier 1 missed _repair_minimax_history + _trim_minimax_history (2 helpers per turn)
grok +2-5 +0.4 LOWER No helper functions; pure appends
qwen +2-5 +0.4 LOWER No helper functions; pure appends
llama +4-8 +0.4 LOWER No helper functions in openai-compat path; native path is separate
Total session +1.1-2.4ms +0.5-1.0ms LOWER Anthropic dominates; one turn typically

Honest takeaway: Tier 1's hypothesis was directionally correct but UNDER-estimated anthropic's helper count and OVER-estimated the lean providers. The total per-session overhead is actually LOWER than Tier 1 estimated, but anthropic is HIGHER than estimated.

The audit (code_path_audit_20260607) will measure actual cost with micro-benchmarks (per the plan's Task 6e.2 hook).


5. Recommendations for Future Phase 3 Track

  1. Anthropic FIRST (highest ROI; 5 helpers per turn; cache controls are unique to this provider)
  2. Use with h.lock: msg_list = h.messages for read iterations that need a snapshot (avoids get_all()'s list-copy cost when caller can work inside the lock)
  3. Use h.get_all() ONLY when the caller needs to OWN the list outside the lock (e.g., _strip_private_keys returns the list to the Anthropic SDK which holds it during the HTTP call)
  4. Use with h.lock: h.messages = [filtered] for in-place mutations (e.g., _strip_cache_controls, _add_history_cache_breakpoint)
  5. Lock semantics unchangedProviderHistory.lock is per-instance; no cross-provider contention (verified: 6 separate threading.Lock() instances at L114/118/122/126/131/135)
  6. Hidden cross-references to migrate FIRST:
    • _strip_private_keys (nested in _send_anthropic, returns new list — needs h.get_all() or explicit snapshot)
    • _extract_minimax_reasoning (nested in _send_minimax, no history access but operates on raw_response — safe to skip)
    • _send_llama_native (separate path; also touches _llama_history — must migrate in lock-step with _send_llama)

6. Open Questions

  1. Anthropic cache_control semantics: _strip_cache_controls REMOVES cache_control markers; _add_history_cache_breakpoint ADDS them. Does removing them then re-adding them within the same request cost a cache miss on Anthropic's side? (Need to verify with Anthropic API docs / behavioral test.)
  2. _trim_<provider>_history mutation vs return: Both helpers do in-place mutation. After Phase 3, do they need to return the new length to the caller (for logging), or can the caller just check len(h.messages) after the helper returns?
  3. Lock granularity: The _send_lock (L139) is a global per-vendor-call lock (serialize all sends across providers). The 6 _history_locks are per-history. After Phase 3, _send_lock stays as-is; only the 6 history globals migrate. (No code change to _send_lock needed.)
  4. Tool-loop iterations: _send_grok, _send_anthropic, _send_minimax, _send_llama all use run_with_tool_loop which can iterate 2-5 times. The per-iteration cost of h.append(...) is small, but the per-iteration lock churn is non-trivial. Tier 1 estimated 2-5 iterations; Tier 2 confirmed (looking at run_with_tool_loop patterns).

7. See Also

  • docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md - Tier 1's hypothesis (the "what we thought before Tier 2 looked")
  • conductor/tracks/phase2_4_5_call_site_completion_20260621/spec.md - Phase 6e directives
  • conductor/tracks/code_path_audit_20260607/spec.md - the audit that quantifies these estimates
  • docs/handoffs/PROMPT_FOR_TIER_1.md - Tier 1 brief
  • src/provider_state.py - the ProviderHistory dataclass already defined (Phase 0 deliverable from parent track)
  • src/ai_client.py:113-139 - the 7 history globals + 6 locks + 1 _send_lock
  • src/ai_client.py:1245-1485 - the 5 anthropic helpers (most-heavy)