diff --git a/conductor/tracks/data_oriented_error_handling_20260606/spec.md b/conductor/tracks/data_oriented_error_handling_20260606/spec.md index b2901f66..97d4ac2d 100644 --- a/conductor/tracks/data_oriented_error_handling_20260606/spec.md +++ b/conductor/tracks/data_oriented_error_handling_20260606/spec.md @@ -115,6 +115,16 @@ class ErrorKind(str, Enum): UNKNOWN = "unknown" CONFIG = "config" INTERNAL = "internal" + # Added 2026-06-08 per nagent_review Pitfall #4 (provider history divergence). + # The Application edits the entry's content (e.g., user fixes a typo in an AI + # response, or branches at a midpoint via guide_discussions.md §"Per-Entry + # Operations" A1+A4) but the ai_client.__history (the bytes + # actually replayed to the LLM) still contains the original. This is + # silent corruption, not a thrown error. The PROVIDER_HISTORY_DIVERGED_FROM_UI + # kind makes the divergence *detectable* and *reportable* so the follow-up + # public_api_migration_20260606 track can collapse the two history layers + # (see §12.1). + PROVIDER_HISTORY_DIVERGED_FROM_UI = "provider_history_diverged_from_ui" @dataclass(frozen=True) class ErrorInfo: @@ -479,7 +489,7 @@ All existing configs (`config.toml`, `credentials.toml`, per-project TOML) work |---|---|---| | `tests/test_result_types.py` | `Result`, `ErrorInfo`, nil-sentinel singletons. | 100% | | `tests/test_mcp_client_paths.py` | Verify `_resolve_and_check` returns `Result` (not tuple); verify `read_file` returns `Result[str]`. | 90% (covers the new code paths; existing tests still pass) | -| `tests/test_ai_client_result.py` | Verify `_send__result()` returns `Result`; verify `send_result()` is the new public API; verify `send()` emits `DeprecationWarning`. | 90% | +| `tests/test_ai_client_result.py` | Verify `_send__result()` returns `Result`; verify `send_result()` is the new public API; verify `send()` emits `DeprecationWarning`. **State-delegation regression tests (added 2026-06-08 per `docs/guide_state_lifecycle.md` and the 2026-06-08 docs refresh):** verify that `app.temperature = 0.5` round-trips through the `App.__getattr__`/`__setattr__` delegation (per `gui_2.py:666-675`) and is visible in the next `send_result()` call; verify that `controller.disc_entries[i].content = "..."` is reflected in the next `send_result()`'s `messages` parameter (this is the regression vector for nagent_review Pitfall #4, the provider-history divergence); verify that the 3 per-provider history locks (`_anthropic_history_lock`, `_deepseek_history_lock`, `_minimax_history_lock` per `ai_client.py:124,128,132`) serialize correctly under concurrent `send_result()` calls from different threads. These tests are *mandatory* for Phase 3 (the ai_client refactor) because the `App.__getattr__`/`__setattr__` delegation means a partial refactor would manifest as silent `AttributeError`s deep in the test, not at the refactor commit boundary. | 90% | | `tests/test_rag_engine_result.py` | Verify RAG methods return `Result`; verify `NilRAGState` is used. | 80% | | `tests/test_deprecation_warnings.py` | Verify `ai_client.send()` emits exactly one `DeprecationWarning` per call site (cached after first). | 100% | | `tests/test_mcp_client.py` (existing) | Verify no regressions; existing tests pass unchanged. | 100% (regression) | @@ -683,10 +693,15 @@ Plus ~50+ test files that call `send()` directly. The follow-up track's `rg "ai_ ### 12.3 Project References -- `docs/guide_ai_client.md` — current provider architecture; will be updated in Phase 5. -- `docs/guide_mcp_client.md` — current MCP client architecture; will be updated in Phase 5. -- `conductor/product-guidelines.md` "Modular Controller Pattern" — the convention this track extends (Data-Oriented Error Handling is a new top-level convention in the same family). -- `conductor/tracks/qwen_llama_grok_integration_20260606/` — the previous track that introduced the "data-oriented" framing; this track extends that philosophy to error handling. +- `docs/guide_ai_client.md` — current provider architecture; will be updated in Phase 5. The per-provider history globals (`_anthropic_history`, `_deepseek_history`, `_minimax_history` at `ai_client.py:123-132`) are the **specific pattern** that the `ErrorKind.PROVIDER_HISTORY_DIVERGED_FROM_UI` new error kind (added 2026-06-08) is designed to surface. Per `guide_ai_client.md §"State"`, the per-provider-lock pattern is the established convention. +- `docs/guide_mcp_client.md` — current MCP client architecture; will be updated in Phase 5. Per the 2026-06-08 docs refresh, `guide_mcp_client.md` documents the 3-layer security model (Allowlist Construction → Path Validation → Resolution Gate) that the mcp_client refactor must preserve. The new `Result` return type must not weaken the 3 layers. +- `docs/guide_state_lifecycle.md` — added 2026-06-08. The 3 per-thread + 7-lock pattern documented in §4 ("State Synchronization Across Threads") is what the `ai_client` refactor's state-delegation regression tests must exercise. +- `docs/guide_discussions.md` — added 2026-06-08. The 23-operation matrix (A1-A7 + B1-B11 + C1-C5) is the *user-facing* source of truth for what the per-entry edit operations do. The provider-history-divergence issue (Pitfall #4 from the nagent_review) is exactly that: user edits `disc_entries[i].content` via A1, but `ai_client.__history` is not updated. The follow-up `public_api_migration_20260606` is the natural moment to fix this. +- `docs/guide_context_aggregation.md` — added 2026-06-08. The `aggregate.py:109 build_discussion_section` consumes the `disc_entries` list. If the entries are edited via A1, the section regenerates correctly. If the provider history is *not* updated, the next LLM call still sees the old history. The `Result` pattern from this track is the natural carrier for the "diverged" signal. +- `conductor/tracks/qwen_llama_grok_integration_20260606/` — the previous track that introduced the "data-oriented" framing; this track extends that philosophy to error handling. The qwen track's `send_openai_compatible()` helper is *expected* to return `Result` from day 1 (per the coordination note in the qwen spec §3.1) — this is a real concrete dependency. +- `conductor/tracks/mcp_architecture_refactor_20260606/` — the next major track (after this one). Each sub-MCP's `invoke()` returns `Result[str, ErrorInfo]` per the mcp spec; this track defines the `Result` type that the mcp refactor uses. Coordination: this track ships *before* the mcp refactor can ship Phase 4 (extract Python) onward. +- `conductor/tracks/nagent_review_20260608/report.md` — added 2026-06-08. §15 Pitfalls #2 and #4 (per-provider history globals, stateful singleton) and Pitfall #9 (sub-conversations) inform this track's risk register. Pitfall #4 specifically motivates the new `ErrorKind.PROVIDER_HISTORY_DIVERGED_FROM_UI` kind. +- `conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md` — added 2026-06-08. §9 ("Edit-the-input, not the output") describes the same provider-history-divergence problem; the `Result` pattern + the new error kind are the data-oriented solution. - `conductor/tracks/test_batching_refactor_20260606/` — the previous track that established the "tier-based" pattern; this track uses the same convention format (spec + metadata + state + plan). ### 12.4 External References