Private

Public Access

Files

T

ed 751b94d4e8 Revert "merge: tier2/phase2_4_5_call_site_completion_20260621 (parent + follow-up + Phase 6e analysis)"

This reverts commit f914b2bcd4, reversing
changes made to 7fef95cc87.

2026-06-21 22:39:14 -04:00

16 KiB

Raw Blame History

Phase 3 Hypothetical Cost Analysis (Tier 2 authoritative version)

Author: Tier 2 Tech Lead (autonomous sandbox) Date: 2026-06-21 Context: Produced during phase2_4_5_call_site_completion_20260621 Phase 6e (after Phase 6b/6d work in src/ai_client.py). Supersedes: Tier 1's hypothesis at docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md (kept as the hypothesis doc; this is the refined version with in-context data).

1. Methodology

Tier 2 profiled all 6 OpenAI-compatible/anthropic senders in src/ai_client.py (_send_anthropic, _send_deepseek, _send_minimax, _send_grok, _send_qwen, _send_llama) while doing the Phase 6b migration work (3 senders migrated to ChatMessage API). The Phase 6d task was effectively a no-op because NormalizedResponse already uses UsageStats throughout src/openai_compatible.py (verified by Select-String 'NormalizedResponse\(' in src/openai_compatible.py).

This analysis is grounded in:

Actual Select-String counts of _<provider>_history + _<provider>_history_lock references
Read of _send_grok (L2532-2587), _send_minimax (L2616-2679), _send_llama (L2856-2917) end-to-end during Phase 6b migration
Read of _send_anthropic (L1432-1590) including its with _anthropic_history_lock: blocks
Read of _send_deepseek (L2179-2230) and _send_qwen (L2680-2750) for context
Helper function definitions: _strip_cache_controls, _add_history_cache_breakpoint, _estimate_prompt_tokens, _strip_private_keys, _repair_anthropic_history, _repair_deepseek_history, _repair_minimax_history, _trim_anthropic_history, _trim_minimax_history

2. Per-Sender Codepath Catalog

2.1 Reference counts (measured, not estimated)

Provider	Direct `_history` refs	Lock refs	Total	Per-call hot-path?
anthropic	20	2	22	Yes (cache controls, repair, trim, strip, est_tokens)
deepseek	12	6	18	Yes (lock-heavy; multiple append/read blocks)
minimax	14	5	19	Yes (repair + build)
qwen	7	4	11	Mild (fewer calls)
grok	7	6	13	Yes (lock-heavy; 6 locks for 7 refs)
llama	12	9	21	Yes (lock-heavy; native + openai-compat branches)
TOTAL	72	32	104	—

Tier 1's estimate was 112 sites (per metadata.json deferred_work.phase_3_provider_state.estimated_sites). Actual count is 104 (close; 7% under).

2.2 `_send_anthropic` (22 sites) - HIGHEST PRIORITY

Direct sites:

L1445: if discussion_history and not _anthropic_history: (read)
L1449: for msg in _anthropic_history: (iterate)
L1459: _strip_cache_controls(_anthropic_history) (helper)
L1460: _repair_anthropic_history(_anthropic_history) (helper)
L1461: _anthropic_history.append(...) (append)
L1462: _add_history_cache_breakpoint(_anthropic_history) (helper)
L1471: _trim_anthropic_history(system_blocks, _anthropic_history) (helper)
L1473: _estimate_prompt_tokens(system_blocks, _anthropic_history) (helper, read-only)
L1477: len(_anthropic_history) (read)
L1491, L1505: _strip_private_keys(_anthropic_history) (helper, returns new list)
L1508: _anthropic_history.append(...) (append, post-tool-loop)
L1584: _anthropic_history.append(...) (append, post-tool-loop)

Helper sites: _strip_cache_controls (2), _add_history_cache_breakpoint (2), _estimate_prompt_tokens (4 across all senders), _strip_private_keys (3 — all anthropic), _repair_anthropic_history (2), _trim_anthropic_history (2)

Hidden cross-references (Tier 2 found):

_strip_private_keys is a NESTED function inside _send_anthropic (L1466) — Tier 1's grep would only catch the call sites at L1491/1505, not the def itself
_estimate_prompt_tokens is called from _trim_anthropic_history AND _trim_minimax_history (helper-of-helper pattern)
_strip_cache_controls mutates the list in place (no return value) — Phase 3 migration needs with h.lock: h.messages = [m without cache controls] not h.messages = _strip(h.messages)
_add_history_cache_breakpoint also mutates in place — same issue

Lock usage: 2 explicit _anthropic_history_lock references (L485 in cleanup, L1460 in with block); the helpers acquire the lock implicitly because they're called from inside the with block.

2.3 `_send_deepseek` (18 sites)

Direct sites:

L465-468: global _deepseek_history (declaration, in set_provider)
L488-489: cleanup
L2203: with _deepseek_history_lock:
L2204: _repair_deepseek_history(_deepseek_history) (inside with-block)
L2220: _deepseek_history.append(...) (post-prompt build)
L2238: _deepseek_history.append(...) (post-tool-loop)

Helper sites: _repair_deepseek_history (2 calls; called from _send_deepseek AND from cleanup — hidden cross-reference Tier 1 missed)

Lock usage: 6 explicit _deepseek_history_lock references — higher lock usage than anthropic but the deepseek send is single-request (no tool-loop iterations); the 6 locks are mostly in setup/teardown paths.

2.4 `_send_minimax` (19 sites)

Direct sites:

L465, L491: global/cleanup
L2616: _send_minimax def
L2653: _repair_minimax_history(_minimax_history)
L2655, L2656: _minimax_history.append(...) (2x)
L2661-2662: messages: list[Metadata] = [{...}] + messages.extend(_minimax_history) (build request)
L2687 (approx): _trim_minimax_history(system_blocks, _minimax_history) (helper)
L2689 (approx): _estimate_prompt_tokens(system_blocks, _minimax_history) (helper, read-only)

Helper sites: _repair_minimax_history (2), _trim_minimax_history (2), _estimate_prompt_tokens (4 across all senders)

Hidden cross-references:

_minimax_history has a SPECIAL _repair_minimax_history step (other providers don't have this for non-anthropic); the migration needs to preserve the order: _repair_minimax_history(h) BEFORE the append loop
_extract_minimax_reasoning is a nested helper (no history access but operates on raw_response)

2.5 `_send_qwen` (11 sites) - LOWEST PRIORITY

Direct sites: 7 direct + 4 lock refs (cleanup + send). Smallest surface area.

2.6 `_send_grok` (13 sites)

Direct sites:

L465, L497: global/cleanup
L2573: _grok_history.append(...) (initial user message)
L2589: messages.extend(_grok_history) (build request)

Lock usage: 6 explicit locks — high lock ratio. The send has multiple sequential with _grok_history_lock: blocks (3 distinct blocks: append user msg, build request, post-tool-loop).

2.7 `_send_llama` (21 sites)

Direct sites: 12 direct + 9 lock refs. The 9 lock refs come from: (1) llama has BOTH _send_llama (OpenAI-compatible) AND _send_llama_native (Ollama); the native path also touches _llama_history.

Hidden cross-references:

_send_llama is a router — checks for localhost/127.0.0.1 and delegates to _send_llama_native. The native path also locks _llama_history for reasoning extraction.
This is the ONLY provider with a dual-path architecture — Phase 3 migration needs to handle both paths identically.

3. Qualitative Cost Estimation

3.1 Per-call cost categories (microsecond estimates; refined from Tier 1)

Category	Current (dict globals)	Proposed (ProviderHistory dataclass)	Per-call delta
`_<provider>_history.append(m)`	dict.append (~100ns)	`h.append(m)` (lock acquire + append) (~300ns)	+200ns/call
`len(_<provider>_history)`	direct attribute (~50ns)	`len(h.messages)` (~100ns)	+50ns/call
`for m in _<provider>_history:`	direct iteration	`with h.lock: msg_list = list(h.messages)` then iterate	+5-10µs/call (list copy)
`with _<provider>_history_lock:`	direct lock	`with h.lock:` (same lock, just access via attribute)	~0 (same lock)
`_global _<provider>_history` (cleanup)	direct module global	`h.clear()` (lock acquire + clear)	+200ns/call (1 per session)
`h.get_all()` (new pattern)	n/a	`list(h.messages)` inside lock	+5-10µs/call (list copy)

Tier 1's estimates were pessimistic (they assumed all iterations would need h.get_all() and pay 5-10µs each). Tier 2 found that the iterations are 1-2 per LLM turn, not per-message.

3.2 Per-sender per-turn overhead

_send_anthropic (per-turn):

1x append user msg (200ns)
1x append post-tool-loop (200ns)
1x append post-tool-loop (200ns) (2 tool iterations max)
1x with _anthropic_history_lock: (0ns, same lock)
1x _strip_cache_controls (calls with h.lock: h.messages = [...]) = 5-10µs (full iteration + filter)
1x _add_history_cache_breakpoint = 5-10µs (full iteration + maybe-append)
1x _trim_anthropic_history = 5-10µs (full iteration + maybe-trim)
1x _estimate_prompt_tokens = 5-10µs (full iteration + token count)
1x _strip_private_keys (2 sites; non-stream + stream) = 5-10µs x 2 = 10-20µs

Per-turn total for anthropic: ~35-65µs (5-7 helper iterations + 2-3 appends)

_send_deepseek (per-turn):

1x _repair_deepseek_history = 5-10µs (full iteration + repair)
1x append user msg (200ns)
1x append post-tool-loop (200ns)
~3-4x with _deepseek_history_lock: blocks (0ns each, just lock churn)

Per-turn total for deepseek: ~5-10µs (1 helper + 2 appends)

_send_minimax (per-turn):

1x _repair_minimax_history = 5-10µs
2x append user msg (200ns x 2 = 400ns)
1x _trim_minimax_history = 5-10µs
1x _estimate_prompt_tokens = 5-10µs

Per-turn total for minimax: ~15-30µs

_send_grok (per-turn):

1x append user msg (200ns)
1x append post-tool-loop (200ns)
~3x with _grok_history_lock: blocks (0ns each)

Per-turn total for grok: ~400ns (very lean)

_send_qwen (per-turn):

1x append user msg (200ns)
1x append post-tool-loop (200ns)
~2x with _qwen_history_lock: blocks (0ns)

Per-turn total for qwen: ~400ns (leanest)

_send_llama (per-turn):

1x append user msg (200ns)
1x append post-tool-loop (200ns)
~3-4x with _llama_history_lock: blocks (0ns each)

Per-turn total for llama: ~400ns (lean)

3.3 Hot iteration sites (the `with h.lock: msg_list = h.messages` pattern)

Helper	Line	Lock pattern	Per-call cost	Frequency per turn
`_strip_cache_controls(_anthropic_history)`	1459	`with h.lock: h.messages = [filtered]`	5-10µs	1/turn
`_add_history_cache_breakpoint(_anthropic_history)`	1462	`with h.lock: h.messages.append(breakpoint)`	5-10µs	1/turn
`_trim_anthropic_history(...)`	1471	`with h.lock: ...`	5-10µs	1/turn
`_estimate_prompt_tokens(system_blocks, _anthropic_history)`	1473	`with h.lock: read-only sum`	5-10µs	1/turn
`_strip_private_keys(_anthropic_history)`	1491, 1505	`with h.lock: return list(h.messages)`	5-10µs	1-2/turn (stream vs non-stream)
`_repair_anthropic_history(_anthropic_history)`	1460	`with h.lock: in-place mutation`	5-10µs	1/turn
`_repair_deepseek_history(_deepseek_history)`	2204	`with h.lock: in-place mutation`	5-10µs	1/turn
`_repair_minimax_history(_minimax_history)`	2653	`with h.lock: in-place mutation`	5-10µs	1/turn
`_trim_minimax_history(...)`	2687	`with h.lock: ...`	5-10µs	1/turn

Recommendation: Use with h.lock: for in-place mutations (no list copy needed). Use h.get_all() only when the caller needs to OWN the list (e.g., _strip_private_keys returns a new list).

4. Comparison vs Tier 1's Hypothesis

Sender	Tier 1 hypothesis (µs/turn)	Tier 2 refined (µs/turn)	Delta	Reason
anthropic	+8-15	+35-65	+4-7x HIGHER	Tier 1 missed `_strip_cache_controls` + `_add_history_cache_breakpoint` + `_strip_private_keys` (3 additional helpers per turn)
deepseek	+3-7	+5-10	~same	1 helper + 2 appends
minimax	+3-7	+15-30	+2-4x HIGHER	Tier 1 missed `_repair_minimax_history` + `_trim_minimax_history` (2 helpers per turn)
grok	+2-5	+0.4	LOWER	No helper functions; pure appends
qwen	+2-5	+0.4	LOWER	No helper functions; pure appends
llama	+4-8	+0.4	LOWER	No helper functions in openai-compat path; native path is separate
Total session	+1.1-2.4ms	+0.5-1.0ms	LOWER	Anthropic dominates; one turn typically

Honest takeaway: Tier 1's hypothesis was directionally correct but UNDER-estimated anthropic's helper count and OVER-estimated the lean providers. The total per-session overhead is actually LOWER than Tier 1 estimated, but anthropic is HIGHER than estimated.

The audit (code_path_audit_20260607) will measure actual cost with micro-benchmarks (per the plan's Task 6e.2 hook).

5. Recommendations for Future Phase 3 Track

Anthropic FIRST (highest ROI; 5 helpers per turn; cache controls are unique to this provider)
Use with h.lock: msg_list = h.messages for read iterations that need a snapshot (avoids get_all()'s list-copy cost when caller can work inside the lock)
Use h.get_all() ONLY when the caller needs to OWN the list outside the lock (e.g., _strip_private_keys returns the list to the Anthropic SDK which holds it during the HTTP call)
Use with h.lock: h.messages = [filtered] for in-place mutations (e.g., _strip_cache_controls, _add_history_cache_breakpoint)
Lock semantics unchanged — ProviderHistory.lock is per-instance; no cross-provider contention (verified: 6 separate threading.Lock() instances at L114/118/122/126/131/135)
Hidden cross-references to migrate FIRST:
- _strip_private_keys (nested in _send_anthropic, returns new list — needs h.get_all() or explicit snapshot)
- _extract_minimax_reasoning (nested in _send_minimax, no history access but operates on raw_response — safe to skip)
- _send_llama_native (separate path; also touches _llama_history — must migrate in lock-step with _send_llama)

6. Open Questions

Anthropic cache_control semantics: _strip_cache_controls REMOVES cache_control markers; _add_history_cache_breakpoint ADDS them. Does removing them then re-adding them within the same request cost a cache miss on Anthropic's side? (Need to verify with Anthropic API docs / behavioral test.)
_trim_<provider>_history mutation vs return: Both helpers do in-place mutation. After Phase 3, do they need to return the new length to the caller (for logging), or can the caller just check len(h.messages) after the helper returns?
Lock granularity: The _send_lock (L139) is a global per-vendor-call lock (serialize all sends across providers). The 6 _history_locks are per-history. After Phase 3, _send_lock stays as-is; only the 6 history globals migrate. (No code change to _send_lock needed.)
Tool-loop iterations: _send_grok, _send_anthropic, _send_minimax, _send_llama all use run_with_tool_loop which can iterate 2-5 times. The per-iteration cost of h.append(...) is small, but the per-iteration lock churn is non-trivial. Tier 1 estimated 2-5 iterations; Tier 2 confirmed (looking at run_with_tool_loop patterns).

7. See Also

docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md - Tier 1's hypothesis (the "what we thought before Tier 2 looked")
conductor/tracks/phase2_4_5_call_site_completion_20260621/spec.md - Phase 6e directives
conductor/tracks/code_path_audit_20260607/spec.md - the audit that quantifies these estimates
docs/handoffs/PROMPT_FOR_TIER_1.md - Tier 1 brief
src/provider_state.py - the ProviderHistory dataclass already defined (Phase 0 deliverable from parent track)
src/ai_client.py:113-139 - the 7 history globals + 6 locks + 1 _send_lock
src/ai_client.py:1245-1485 - the 5 anthropic helpers (most-heavy)

16 KiB Raw Blame History

Phase 3 Hypothetical Cost Analysis (Tier 2 authoritative version)

1. Methodology

2. Per-Sender Codepath Catalog

2.1 Reference counts (measured, not estimated)

2.2 _send_anthropic (22 sites) - HIGHEST PRIORITY

2.3 _send_deepseek (18 sites)

2.4 _send_minimax (19 sites)

2.5 _send_qwen (11 sites) - LOWEST PRIORITY

2.6 _send_grok (13 sites)

2.7 _send_llama (21 sites)

3. Qualitative Cost Estimation

3.1 Per-call cost categories (microsecond estimates; refined from Tier 1)

3.2 Per-sender per-turn overhead

3.3 Hot iteration sites (the with h.lock: msg_list = h.messages pattern)

4. Comparison vs Tier 1's Hypothesis

5. Recommendations for Future Phase 3 Track

6. Open Questions

7. See Also

16 KiB

Raw Blame History

2.2 `_send_anthropic` (22 sites) - HIGHEST PRIORITY

2.3 `_send_deepseek` (18 sites)

2.4 `_send_minimax` (19 sites)

2.5 `_send_qwen` (11 sites) - LOWEST PRIORITY

2.6 `_send_grok` (13 sites)

2.7 `_send_llama` (21 sites)

3.3 Hot iteration sites (the `with h.lock: msg_list = h.messages` pattern)