conductor(track-update): data_structure_strengthening - HistoryMessage vs ProviderHistoryMessage split
4 surgical additions to the spec, no task changes: 1. ProviderHistoryMessage: Added a new alias to §3.1 (The Aliases). Per nagent_review Pitfall #4 (provider history divergence), the UI/curation layer (HistoryMessage, edited via disc_entries[i].content) and the SDK layer (ProviderHistoryMessage, the bytes actually replayed to the LLM) are *distinct*. Conflating them via a single alias perpetuates the bug. The new alias is documented as a separate concept with its own use sites (_anthropic_history, _deepseek_history, _minimax_history, _grok_history, _llama_history). The follow-up public_api_migration_20260606 track is the natural moment to unify the two layers; this spec just makes the distinction explicit. 2. FileItem alias points to the existing models.FileItem dataclass, not Metadata. Per docs/guide_context_aggregation.md (added 2026-06-08), FileItem is a 9-field dataclass (path, auto_aggregate, force_full, view_mode, selected, ast_signatures, ast_definitions, ast_mask, custom_slices, injected_at) with a __post_init__ normalizer. Aliasing it to dict[str, Any] would lose the type safety. The 9 other aliases remain dict aliases for round-trip compatibility. 3. gui_2.py and mcp_client.py as follow-up: Added a Note (dated 2026-06-08) to the Out of Scope section. The 23 lower-impact files (deferred) are dominated by gui_2.py (26+ weak sites per guide_state_lifecycle.md) and mcp_client.py (will be touched heavily by the parallel mcp_architecture_refactor_20260606). The deferral is correct but the follow-up should explicitly call out these two files as the next targets, rather than implying they're handled. 4. See Also cross-references: Added 7 new entries to §12.2: - docs/guide_models.md (FileItem dataclass source) - docs/guide_context_aggregation.md (FileItems consumer) - docs/guide_discussions.md (HistoryMessage shape) - docs/guide_state_lifecycle.md (state delegation) - conductor/tracks/mcp_architecture_refactor_20260606/ - conductor/tracks/nagent_review_20260608/{report,takeaways}.md No plan.md changes.
This commit is contained in:
@@ -74,18 +74,50 @@ CommsLogEntry: TypeAlias = Metadata
|
||||
# A list of comms log entries.
|
||||
CommsLog: TypeAlias = list[CommsLogEntry]
|
||||
|
||||
# A single entry in the AI provider's conversation history (the messages
|
||||
# list passed to/from OpenAI/Anthropic/Gemini). Used by _anthropic_history,
|
||||
# _deepseek_history, _minimax_history, _grok_history, _llama_history, etc.
|
||||
# A single entry in the Application's discussion (the UI-layer entry list
|
||||
# persisted to project TOML; see docs/guide_discussions.md §"Data Model").
|
||||
# Per the docs refresh (2026-06-08), this has at least 7 fields:
|
||||
# {role, content, collapsed, ts, thinking_segments?, usage?, read_mode?}.
|
||||
# Plus optional extras (e.g., tag, comment from custom slices).
|
||||
# Uses Metadata (dict[str, Any]) because the dict is intentionally OPEN —
|
||||
# extra keys are allowed and ignored by the renderer. The alias docstring
|
||||
# documents the minimum required keys, not the full schema.
|
||||
#
|
||||
# IMPORTANT (added 2026-06-08 per nagent_review Pitfall #4): this is the
|
||||
# UI/curation-layer history. It is *distinct* from ProviderHistoryMessage
|
||||
# below, which is the provider-side history (the bytes actually replayed
|
||||
# to the LLM). Conflating them perpetuates the provider-history-divergence
|
||||
# bug: user edits HistoryMessage.content via the discussion UI but
|
||||
# ProviderHistoryMessage.content is not updated. The follow-up
|
||||
# public_api_migration_20260606 track is the natural moment to unify.
|
||||
HistoryMessage: TypeAlias = Metadata
|
||||
|
||||
# A list of history messages.
|
||||
History: TypeAlias = list[HistoryMessage]
|
||||
|
||||
# A single file item in the context (path, content, is_image flag, base64
|
||||
# data, mtime). Used by file_items parameter (the most-threated list in
|
||||
# the codebase), _reread_file_items, _build_file_context_text, etc.
|
||||
FileItem: TypeAlias = Metadata
|
||||
# Provider-side history entry: a single message passed to/from the LLM
|
||||
# SDK (OpenAI/Anthropic/Gemini/DeepSeek/etc.). Per the docs refresh and
|
||||
# the nagent_review (Pitfall #4), this is a DIFFERENT layer from
|
||||
# HistoryMessage. Shape: {role: "user"|"assistant"|"tool"|"system",
|
||||
# content: str | list[ContentBlock], tool_calls?: [...],
|
||||
# tool_call_id?: str, name?: str}. Aliased to Metadata for the same
|
||||
# reason HistoryMessage is (open shape; type aliases as semantic
|
||||
# names, not structural constraints). The distinction from
|
||||
# HistoryMessage is the alias name, not the underlying dict shape.
|
||||
ProviderHistoryMessage: TypeAlias = Metadata
|
||||
|
||||
# A list of provider history messages.
|
||||
ProviderHistory: TypeAlias = list[ProviderHistoryMessage]
|
||||
|
||||
# A single file item in the context. Per docs/guide_context_aggregation.md
|
||||
# §"The FileItem Schema (Full)" (added 2026-06-08), this is a 9-field
|
||||
# dataclass: {path, auto_aggregate, force_full, view_mode, selected,
|
||||
# ast_signatures, ast_definitions, ast_mask, custom_slices, injected_at}.
|
||||
# The alias does NOT point to Metadata — it points to the existing
|
||||
# models.FileItem class. This is the only alias in the 10 that is not
|
||||
# a dict alias; the others remain dict aliases for compatibility with
|
||||
# the FileItem.to_dict()/from_dict() round-trip.
|
||||
FileItem: TypeAlias = "models.FileItem" # type: ignore[misc]
|
||||
|
||||
# A list of file items. The most common weak pattern in the codebase.
|
||||
FileItems: TypeAlias = list[FileItem]
|
||||
@@ -386,7 +418,7 @@ Each phase has its own checkpoint commit and git note.
|
||||
## 10. Out of Scope (Explicit)
|
||||
|
||||
- **TypedDict / @dataclass migration** of the `Metadata` family. The type registry (added in Phase 2) captures the field information in docs form, with much lower upfront cost than `TypedDict` migration. A future track MAY convert the most-used aliases to `TypedDict` (giving the AI schema hints via type hints instead of via docs); this is a separate decision.
|
||||
- **The 23 lower-impact files** (those with 1-9 weak sites each). Deferred; will be addressed opportunistically or in a future incremental track.
|
||||
- **The 23 lower-impact files** (those with 1-9 weak sites each). Deferred; will be addressed opportunistically or in a future incremental track. **Note (added 2026-06-08):** this list is dominated by `src/gui_2.py` (26+ weak sites per `docs/guide_state_lifecycle.md` §"State Delegation" and §"Reset" — `_disc_entries_lock` references, `_last_ui_snapshot`, the `UISnapshot` capture/restore, the 30+ fields cleared in `_handle_reset_session`) and `src/mcp_client.py` (will be touched heavily by the parallel `mcp_architecture_refactor_20260606` track). The deferral is correct, but a *follow-up* track should explicitly call out gui_2.py and mcp_client.py as the next targets, rather than implying they're handled.
|
||||
- **Adding pydantic models.** Not requested; would be a much larger architectural decision.
|
||||
- **Changing function signatures at the runtime level.** The aliases are TYPE-LEVEL; runtime behavior is identical.
|
||||
- **Modifying `scripts/audit_weak_types.py`'s regex patterns.** The patterns are correct for the current findings. If new patterns emerge, a future track can extend the script.
|
||||
@@ -412,9 +444,16 @@ Each phase has its own checkpoint commit and git note.
|
||||
|
||||
- `scripts/audit_weak_types.py` (already committed; `84fd9ac9`) — the audit that found 430 weak sites.
|
||||
- `docs/guide_testing.md` — test conventions.
|
||||
- `conductor/code_styleguides/error_handling.md` (created in the data_oriented_error_handling_20260606 track) — the convention for `Result` types; the new type-aliases convention lives alongside.
|
||||
- `docs/guide_models.md` — the existing `models.py:510-559 FileItem` dataclass is the *concrete* class the new `FileItem` alias points to. Per the 2026-06-08 docs refresh, the FileItem schema (9 fields + `__post_init__` normalizer) is documented in `docs/guide_context_aggregation.md §"The FileItem Schema (Full)"`.
|
||||
- `docs/guide_context_aggregation.md` — added 2026-06-08. The `aggregate.py:142 build_file_items` function consumes the `FileItem` list; the `FileItems: TypeAlias` is the consumer-side type.
|
||||
- `docs/guide_discussions.md` — added 2026-06-08. The entry dict shape (the `HistoryMessage` alias) is documented here. The shape has at least 7 fields (`{role, content, collapsed, ts, thinking_segments?, usage?, read_mode?}`) plus optional extras. The alias docstring notes the dict is *open* — extra keys are allowed.
|
||||
- `docs/guide_state_lifecycle.md` — added 2026-06-08. The `App.__getattr__`/`__setattr__` state delegation (per `gui_2.py:666-675`) and the `UISnapshot` capture (`gui_2.py:735-789`) are the *correctness* the alias-typed code must preserve; aliases are TYPE-LEVEL ONLY and don't change runtime behavior.
|
||||
- `conductor/code_styleguides/error_handling.md` (created in the data_oriented_error_handling_20260606 track) — the convention for `Result` types; the new type-aliases convention lives alongside. The two conventions are *complementary*: aliases name the *data* (`T` in `Result[T]`); `Result` wraps the *control flow*. See §3.5 of the spec.
|
||||
- `conductor/product-guidelines.md` "Data-Oriented Error Handling" — the convention this track extends (Data Structure Strengthening is a new top-level convention in the same family).
|
||||
- `conductor/tracks/data_oriented_error_handling_20260606/` — the previous track that established the convention format; this track uses the same pattern.
|
||||
- `conductor/tracks/data_oriented_error_handling_20260606/` — the previous track that established the convention format; this track uses the same pattern. The new `ProviderHistoryMessage` alias (added 2026-06-08) is the *concrete manifestation* of nagent_review Pitfall #4 (provider-history divergence) — the user's edits to the `HistoryMessage` (UI layer) are a different layer from the `ProviderHistoryMessage` (SDK layer), and conflating them perpetuates the bug.
|
||||
- `conductor/tracks/mcp_architecture_refactor_20260606/` — the parallel major track. `mcp_client.py` is currently listed as "UNCHANGED (only 9 weak sites; below the threshold)" in the module layout, but the refactor will touch it heavily; the audit script should be re-run after the mcp refactor lands, and a follow-up type-aliases pass on mcp_client.py is the natural next target.
|
||||
- `conductor/tracks/nagent_review_20260608/report.md` — added 2026-06-08. §6 (per-file memory) and §15 Pitfall #4 (provider history divergence) directly motivate the `HistoryMessage` vs `ProviderHistoryMessage` split in §3.1 of this spec.
|
||||
- `conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md` — added 2026-06-08. §9 (edit-the-input, not the output) describes the bug the new alias split addresses.
|
||||
|
||||
### 12.3 External References
|
||||
|
||||
|
||||
Reference in New Issue
Block a user