From 1fb0d79c0d6a27cd4f9720a97ba314ee5890a3e6 Mon Sep 17 00:00:00 2001 From: conductor-tier2 Date: Mon, 8 Jun 2026 20:50:50 -0400 Subject: [PATCH] conductor(track-update): data_structure_strengthening - HistoryMessage vs ProviderHistoryMessage split MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 4 surgical additions to the spec, no task changes: 1. ProviderHistoryMessage: Added a new alias to §3.1 (The Aliases). Per nagent_review Pitfall #4 (provider history divergence), the UI/curation layer (HistoryMessage, edited via disc_entries[i].content) and the SDK layer (ProviderHistoryMessage, the bytes actually replayed to the LLM) are *distinct*. Conflating them via a single alias perpetuates the bug. The new alias is documented as a separate concept with its own use sites (_anthropic_history, _deepseek_history, _minimax_history, _grok_history, _llama_history). The follow-up public_api_migration_20260606 track is the natural moment to unify the two layers; this spec just makes the distinction explicit. 2. FileItem alias points to the existing models.FileItem dataclass, not Metadata. Per docs/guide_context_aggregation.md (added 2026-06-08), FileItem is a 9-field dataclass (path, auto_aggregate, force_full, view_mode, selected, ast_signatures, ast_definitions, ast_mask, custom_slices, injected_at) with a __post_init__ normalizer. Aliasing it to dict[str, Any] would lose the type safety. The 9 other aliases remain dict aliases for round-trip compatibility. 3. gui_2.py and mcp_client.py as follow-up: Added a Note (dated 2026-06-08) to the Out of Scope section. The 23 lower-impact files (deferred) are dominated by gui_2.py (26+ weak sites per guide_state_lifecycle.md) and mcp_client.py (will be touched heavily by the parallel mcp_architecture_refactor_20260606). The deferral is correct but the follow-up should explicitly call out these two files as the next targets, rather than implying they're handled. 4. See Also cross-references: Added 7 new entries to §12.2: - docs/guide_models.md (FileItem dataclass source) - docs/guide_context_aggregation.md (FileItems consumer) - docs/guide_discussions.md (HistoryMessage shape) - docs/guide_state_lifecycle.md (state delegation) - conductor/tracks/mcp_architecture_refactor_20260606/ - conductor/tracks/nagent_review_20260608/{report,takeaways}.md No plan.md changes. --- .../spec.md | 59 +++++++++++++++---- 1 file changed, 49 insertions(+), 10 deletions(-) diff --git a/conductor/tracks/data_structure_strengthening_20260606/spec.md b/conductor/tracks/data_structure_strengthening_20260606/spec.md index dc202b10..648adb56 100644 --- a/conductor/tracks/data_structure_strengthening_20260606/spec.md +++ b/conductor/tracks/data_structure_strengthening_20260606/spec.md @@ -74,18 +74,50 @@ CommsLogEntry: TypeAlias = Metadata # A list of comms log entries. CommsLog: TypeAlias = list[CommsLogEntry] -# A single entry in the AI provider's conversation history (the messages -# list passed to/from OpenAI/Anthropic/Gemini). Used by _anthropic_history, -# _deepseek_history, _minimax_history, _grok_history, _llama_history, etc. +# A single entry in the Application's discussion (the UI-layer entry list +# persisted to project TOML; see docs/guide_discussions.md §"Data Model"). +# Per the docs refresh (2026-06-08), this has at least 7 fields: +# {role, content, collapsed, ts, thinking_segments?, usage?, read_mode?}. +# Plus optional extras (e.g., tag, comment from custom slices). +# Uses Metadata (dict[str, Any]) because the dict is intentionally OPEN — +# extra keys are allowed and ignored by the renderer. The alias docstring +# documents the minimum required keys, not the full schema. +# +# IMPORTANT (added 2026-06-08 per nagent_review Pitfall #4): this is the +# UI/curation-layer history. It is *distinct* from ProviderHistoryMessage +# below, which is the provider-side history (the bytes actually replayed +# to the LLM). Conflating them perpetuates the provider-history-divergence +# bug: user edits HistoryMessage.content via the discussion UI but +# ProviderHistoryMessage.content is not updated. The follow-up +# public_api_migration_20260606 track is the natural moment to unify. HistoryMessage: TypeAlias = Metadata # A list of history messages. History: TypeAlias = list[HistoryMessage] -# A single file item in the context (path, content, is_image flag, base64 -# data, mtime). Used by file_items parameter (the most-threated list in -# the codebase), _reread_file_items, _build_file_context_text, etc. -FileItem: TypeAlias = Metadata +# Provider-side history entry: a single message passed to/from the LLM +# SDK (OpenAI/Anthropic/Gemini/DeepSeek/etc.). Per the docs refresh and +# the nagent_review (Pitfall #4), this is a DIFFERENT layer from +# HistoryMessage. Shape: {role: "user"|"assistant"|"tool"|"system", +# content: str | list[ContentBlock], tool_calls?: [...], +# tool_call_id?: str, name?: str}. Aliased to Metadata for the same +# reason HistoryMessage is (open shape; type aliases as semantic +# names, not structural constraints). The distinction from +# HistoryMessage is the alias name, not the underlying dict shape. +ProviderHistoryMessage: TypeAlias = Metadata + +# A list of provider history messages. +ProviderHistory: TypeAlias = list[ProviderHistoryMessage] + +# A single file item in the context. Per docs/guide_context_aggregation.md +# §"The FileItem Schema (Full)" (added 2026-06-08), this is a 9-field +# dataclass: {path, auto_aggregate, force_full, view_mode, selected, +# ast_signatures, ast_definitions, ast_mask, custom_slices, injected_at}. +# The alias does NOT point to Metadata — it points to the existing +# models.FileItem class. This is the only alias in the 10 that is not +# a dict alias; the others remain dict aliases for compatibility with +# the FileItem.to_dict()/from_dict() round-trip. +FileItem: TypeAlias = "models.FileItem" # type: ignore[misc] # A list of file items. The most common weak pattern in the codebase. FileItems: TypeAlias = list[FileItem] @@ -386,7 +418,7 @@ Each phase has its own checkpoint commit and git note. ## 10. Out of Scope (Explicit) - **TypedDict / @dataclass migration** of the `Metadata` family. The type registry (added in Phase 2) captures the field information in docs form, with much lower upfront cost than `TypedDict` migration. A future track MAY convert the most-used aliases to `TypedDict` (giving the AI schema hints via type hints instead of via docs); this is a separate decision. -- **The 23 lower-impact files** (those with 1-9 weak sites each). Deferred; will be addressed opportunistically or in a future incremental track. +- **The 23 lower-impact files** (those with 1-9 weak sites each). Deferred; will be addressed opportunistically or in a future incremental track. **Note (added 2026-06-08):** this list is dominated by `src/gui_2.py` (26+ weak sites per `docs/guide_state_lifecycle.md` §"State Delegation" and §"Reset" — `_disc_entries_lock` references, `_last_ui_snapshot`, the `UISnapshot` capture/restore, the 30+ fields cleared in `_handle_reset_session`) and `src/mcp_client.py` (will be touched heavily by the parallel `mcp_architecture_refactor_20260606` track). The deferral is correct, but a *follow-up* track should explicitly call out gui_2.py and mcp_client.py as the next targets, rather than implying they're handled. - **Adding pydantic models.** Not requested; would be a much larger architectural decision. - **Changing function signatures at the runtime level.** The aliases are TYPE-LEVEL; runtime behavior is identical. - **Modifying `scripts/audit_weak_types.py`'s regex patterns.** The patterns are correct for the current findings. If new patterns emerge, a future track can extend the script. @@ -412,9 +444,16 @@ Each phase has its own checkpoint commit and git note. - `scripts/audit_weak_types.py` (already committed; `84fd9ac9`) — the audit that found 430 weak sites. - `docs/guide_testing.md` — test conventions. -- `conductor/code_styleguides/error_handling.md` (created in the data_oriented_error_handling_20260606 track) — the convention for `Result` types; the new type-aliases convention lives alongside. +- `docs/guide_models.md` — the existing `models.py:510-559 FileItem` dataclass is the *concrete* class the new `FileItem` alias points to. Per the 2026-06-08 docs refresh, the FileItem schema (9 fields + `__post_init__` normalizer) is documented in `docs/guide_context_aggregation.md §"The FileItem Schema (Full)"`. +- `docs/guide_context_aggregation.md` — added 2026-06-08. The `aggregate.py:142 build_file_items` function consumes the `FileItem` list; the `FileItems: TypeAlias` is the consumer-side type. +- `docs/guide_discussions.md` — added 2026-06-08. The entry dict shape (the `HistoryMessage` alias) is documented here. The shape has at least 7 fields (`{role, content, collapsed, ts, thinking_segments?, usage?, read_mode?}`) plus optional extras. The alias docstring notes the dict is *open* — extra keys are allowed. +- `docs/guide_state_lifecycle.md` — added 2026-06-08. The `App.__getattr__`/`__setattr__` state delegation (per `gui_2.py:666-675`) and the `UISnapshot` capture (`gui_2.py:735-789`) are the *correctness* the alias-typed code must preserve; aliases are TYPE-LEVEL ONLY and don't change runtime behavior. +- `conductor/code_styleguides/error_handling.md` (created in the data_oriented_error_handling_20260606 track) — the convention for `Result` types; the new type-aliases convention lives alongside. The two conventions are *complementary*: aliases name the *data* (`T` in `Result[T]`); `Result` wraps the *control flow*. See §3.5 of the spec. - `conductor/product-guidelines.md` "Data-Oriented Error Handling" — the convention this track extends (Data Structure Strengthening is a new top-level convention in the same family). -- `conductor/tracks/data_oriented_error_handling_20260606/` — the previous track that established the convention format; this track uses the same pattern. +- `conductor/tracks/data_oriented_error_handling_20260606/` — the previous track that established the convention format; this track uses the same pattern. The new `ProviderHistoryMessage` alias (added 2026-06-08) is the *concrete manifestation* of nagent_review Pitfall #4 (provider-history divergence) — the user's edits to the `HistoryMessage` (UI layer) are a different layer from the `ProviderHistoryMessage` (SDK layer), and conflating them perpetuates the bug. +- `conductor/tracks/mcp_architecture_refactor_20260606/` — the parallel major track. `mcp_client.py` is currently listed as "UNCHANGED (only 9 weak sites; below the threshold)" in the module layout, but the refactor will touch it heavily; the audit script should be re-run after the mcp refactor lands, and a follow-up type-aliases pass on mcp_client.py is the natural next target. +- `conductor/tracks/nagent_review_20260608/report.md` — added 2026-06-08. §6 (per-file memory) and §15 Pitfall #4 (provider history divergence) directly motivate the `HistoryMessage` vs `ProviderHistoryMessage` split in §3.1 of this spec. +- `conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md` — added 2026-06-08. §9 (edit-the-input, not the output) describes the bug the new alias split addresses. ### 12.3 External References