docs(models): add guide_models.md

2026-06-02 23:38:52 -04:00
parent 9ea7989f90
commit 7ade88d577
1 changed files with 556 additions and 0 deletions
@@ -0,0 +1,556 @@
+# `src/models.py` — Data Models
+
+[Top](../README.md) | [Architecture](guide_architecture.md) | [MMA](guide_mma.md) | [App Controller](guide_app_controller.md)
+
+---
+
+## Overview
+
+`src/models.py` (~132KB) is the **centralized data model registry**. It defines every data structure used across the app — Tickets, Tracks, Personas, Presets, Discussion entries, Context files, etc. — using `pydantic` and `dataclasses`.
+
+The file exists to **eliminate redundant model definitions** scattered across modules. It also serves as the single source of truth for serialization (TOML, JSON-L, Markdown).
+
+---
+
+## Design Principles
+
+1. **One place to look for any data structure**: If you need to know what fields a `Ticket` has, look here.
+2. **Strict types**: `pydantic` for fields with validation, `dataclasses` for internal structures.
+3. **No business logic**: Models are pure data. Methods like `to_toml()` are allowed; methods like `execute()` are not.
+4. **SDM tags**: Every model has `[C: ...]` (callers) and `[M: ...]` (mutators) tags in docstrings for AI-assisted impact analysis.
+
+---
+
+## Model Categories
+
+The file is organized into regions:
+
+```python
+#region: Core Models
+#endregion: Core Models
+
+#region: AI Models
+#endregion: AI Models
+
+#region: Preset Models
+#endregion: Preset Models
+
+#region: Persona Models
+#endregion: Persona Models
+
+#region: Context Models
+#endregion: Context Models
+
+#region: MMA Models
+#endregion: MMA Models
+
+#region: UI State Models
+#endregion: UI State Models
+
+#region: Logging Models
+#endregion: Logging Models
+```
+
+### `Provider`, `ModelInfo` — AI Models
+
+```python
+class Provider(str, Enum):
+    GEMINI = "gemini"
+    ANTHROPIC = "anthropic"
+    DEEPSEEK = "deepseek"
+    MINIMAX = "MiniMax"
+    GEMINI_CLI = "gemini-cli"
+
+@dataclass
+class ModelInfo:
+    name: str
+    provider: Provider
+    context_window: int
+    max_output_tokens: int
+    supports_caching: bool = False
+    cost_per_1k_input: float = 0.0
+    cost_per_1k_output: float = 0.0
+```
+
+These back the AI Settings panel and the cost tracker.
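+
+The pricing fields support a simple per-request estimate. A minimal sketch, assuming rates are quoted per 1,000 tokens as the field names suggest; `estimate_cost` is a hypothetical helper rather than a function in `src/models.py`, and the example rates are made up:
+
+```python
+from dataclasses import dataclass
+
+@dataclass
+class ModelInfo:
+    # Trimmed copy of the model above: only the name and pricing fields are kept.
+    name: str
+    cost_per_1k_input: float = 0.0
+    cost_per_1k_output: float = 0.0
+
+def estimate_cost(info: ModelInfo, input_tokens: int, output_tokens: int) -> float:
+    """Hypothetical helper: price one request from per-1k-token rates."""
+    input_cost = (input_tokens / 1000) * info.cost_per_1k_input
+    output_cost = (output_tokens / 1000) * info.cost_per_1k_output
+    return input_cost + output_cost
+
+# Illustrative rates only, not real provider pricing.
+info = ModelInfo(name="example-model", cost_per_1k_input=0.5, cost_per_1k_output=1.5)
+cost = estimate_cost(info, input_tokens=2000, output_tokens=1000)  # 1.0 + 1.5 = 2.5
+```
+
+Keeping the arithmetic in a helper rather than on `ModelInfo` respects the "no business logic" rule: the model only carries the rates.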
+
+### `DiscussionEntry`, `Message` — Discussion History
+
+```python
+@dataclass
+class Message:
+    role: Literal["user", "assistant", "system", "tool"]
+    content: str
+    timestamp: float
+    metadata: dict[str, Any] = field(default_factory=dict)
+
+@dataclass
+class DiscussionEntry:
+    entry_id: str
+    messages: list[Message]
+    is_take_root: bool = False  # First message of a "take" (timeline branch)
+    parent_take_id: str | None = None
+    metadata: dict[str, Any] = field(default_factory=dict)
+```
+
+Discussion history is a list of `DiscussionEntry` objects, each containing one or more `Message` objects. The branching structure supports "takes" (alternative timeline branches).
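+
+How takes link together is not spelled out here. Assuming `parent_take_id` on a take-root entry holds the `entry_id` of the entry it branches from, listing the alternative takes at a branch point is a simple filter; `takes_from` is a hypothetical helper:
+
+```python
+from dataclasses import dataclass, field
+from typing import Any
+
+@dataclass
+class DiscussionEntry:
+    # Trimmed copy of the model above; `messages` is omitted for brevity.
+    entry_id: str
+    is_take_root: bool = False
+    parent_take_id: str | None = None
+    metadata: dict[str, Any] = field(default_factory=dict)
+
+def takes_from(history: list[DiscussionEntry], branch_point_id: str) -> list[str]:
+    """IDs of take roots whose (assumed) parent link points at branch_point_id."""
+    return [
+        e.entry_id
+        for e in history
+        if e.is_take_root and e.parent_take_id == branch_point_id
+    ]
+
+history = [
+    DiscussionEntry("e1"),
+    DiscussionEntry("e2"),
+    DiscussionEntry("t1", is_take_root=True, parent_take_id="e1"),
+    DiscussionEntry("t2", is_take_root=True, parent_take_id="e1"),
+]
+branches = takes_from(history, "e1")  # ['t1', 't2']
+```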
+
+### `ContextFileEntry`, `ContextScreenshot` — Context
+
+```python
+class ViewMode(str, Enum):
+    FULL = "full"
+    SUMMARIZE = "summarize"
+    SKELETON = "skeleton"
+    OUTLINE = "outline"
+    NONE = "none"
+
+@dataclass
+class ContextFileEntry:
+    path: str  # Absolute or relative to project root
+    view_mode: ViewMode = ViewMode.FULL
+    annotations: list[Annotation] = field(default_factory=list)
+    fuzzy_slice: FuzzySlice | None = None  # Optional line range
+
+@dataclass
+class ContextScreenshot:
+    path: str  # Absolute path to image
+    caption: str = ""
+```
+
+Context is composed of files and screenshots. Each file entry carries a view mode, optional annotations, and an optional fuzzy line-range slice; a screenshot carries only a path and a caption.
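+
+As an illustration of how the view mode could gate what reaches the prompt, the sketch below assumes `ViewMode.NONE` means "keep the file listed but send nothing" and groups the remaining paths by mode. `files_for_prompt` is hypothetical:
+
+```python
+from dataclasses import dataclass
+from enum import Enum
+
+class ViewMode(str, Enum):
+    FULL = "full"
+    SUMMARIZE = "summarize"
+    SKELETON = "skeleton"
+    OUTLINE = "outline"
+    NONE = "none"
+
+@dataclass
+class ContextFileEntry:
+    # Trimmed copy: annotations and fuzzy_slice omitted.
+    path: str
+    view_mode: ViewMode = ViewMode.FULL
+
+def files_for_prompt(entries: list[ContextFileEntry]) -> dict[ViewMode, list[str]]:
+    """Group paths by view mode, skipping NONE entries (assumed to mean 'send nothing')."""
+    grouped: dict[ViewMode, list[str]] = {}
+    for entry in entries:
+        if entry.view_mode is ViewMode.NONE:
+            continue
+        grouped.setdefault(entry.view_mode, []).append(entry.path)
+    return grouped
+
+entries = [
+    ContextFileEntry("src/models.py", ViewMode.SKELETON),
+    ContextFileEntry("src/app_controller.py"),
+    ContextFileEntry("README.md", ViewMode.NONE),
+]
+grouped = files_for_prompt(entries)
+```
+
+Because `ViewMode` subclasses `str`, a value read back from TOML such as `"skeleton"` converts to the enum with `ViewMode("skeleton")`.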
+
+### `FuzzySlice`, `Annotation` — Visual Slice Editor
+
+```python
+@dataclass
+class FuzzySlice:
+    start_anchor: str  # Fuzzy-matched string
+    end_anchor: str
+    start_offset: int = 0
+    end_offset: int = 0
+    fallback_start_line: int | None = None
+    fallback_end_line: int | None = None
+
+@dataclass
+class Annotation:
+    kind: Literal["tag", "comment"]
+    text: str
+    line_range: tuple[int, int] | None = None
+```
+
+Fuzzy slices use **anchor-based matching** to survive code modifications. If edits move the text that `start_anchor` matches, the slice re-anchors on the next render.
+
+See **[docs/guide_context_curation.md](guide_context_curation.md)** for the full Visual Slice Editor.
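+
+The re-anchoring step can be sketched as follows. This is not the Visual Slice Editor's actual algorithm: it swaps in `difflib.SequenceMatcher` as the fuzzy matcher, treats the offsets as line offsets from the matched anchor, and assumes the 1-based `fallback_*_line` fields apply only when an anchor cannot be matched. `find_anchor`, `resolve_slice`, and the 0.8 threshold are hypothetical.
+
+```python
+import difflib
+from dataclasses import dataclass
+
+@dataclass
+class FuzzySlice:
+    start_anchor: str
+    end_anchor: str
+    start_offset: int = 0
+    end_offset: int = 0
+    fallback_start_line: int | None = None
+    fallback_end_line: int | None = None
+
+def find_anchor(lines: list[str], anchor: str, start: int = 0, threshold: float = 0.8) -> int | None:
+    """0-based index of the line at/after `start` that best matches `anchor`, or None."""
+    best_idx, best_ratio = None, 0.0
+    for i in range(start, len(lines)):
+        text = lines[i].strip()
+        if anchor in text:
+            return i  # an exact substring match wins immediately
+        ratio = difflib.SequenceMatcher(None, anchor, text).ratio()
+        if ratio >= threshold and ratio > best_ratio:
+            best_idx, best_ratio = i, ratio
+    return best_idx
+
+def resolve_slice(source: str, s: FuzzySlice) -> tuple[int, int] | None:
+    """1-based inclusive (start, end) range, falling back to stored line numbers."""
+    lines = source.splitlines()
+    start = find_anchor(lines, s.start_anchor)
+    end = find_anchor(lines, s.end_anchor, start) if start is not None else None
+    if start is not None and end is not None:
+        return start + 1 + s.start_offset, end + 1 + s.end_offset
+    if s.fallback_start_line is not None and s.fallback_end_line is not None:
+        return s.fallback_start_line, s.fallback_end_line
+    return None
+
+v1 = "import os\n\ndef load_config(path):\n    data = read(path)\n    return data\n"
+v2 = "import os\nimport sys\n\ndef load_config(path):\n    data = read(path)\n    return data\n"
+s = FuzzySlice(start_anchor="def load_config(path):", end_anchor="return data")
+```
+
+`resolve_slice(v1, s)` gives `(3, 5)` and `resolve_slice(v2, s)` gives `(4, 6)`: after an import is added above the function, the range follows the anchors instead of staying pinned to fixed line numbers.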
+
+### `Ticket`, `Track`, `WorkerContext` — MMA
+
+```python
+class TicketStatus(str, Enum):
+    PENDING = "pending"
+    RUNNING = "running"
+    DONE = "done"
+    BLOCKED = "blocked"
+    SKIPPED = "skipped"
+
+class TicketPriority(str, Enum):
+    HIGH = "high"
+    MEDIUM = "medium"
+    LOW = "low"
+
+@dataclass
+class Ticket:
+    ticket_id: str
+    title: str
+    description: str
+    status: TicketStatus = TicketStatus.PENDING
+    priority: TicketPriority = TicketPriority.MEDIUM
+    depends_on: list[str] = field(default_factory=list)
+    blocks: list[str] = field(default_factory=list)
+    files_involved: list[str] = field(default_factory=list)
+    persona: str | None = None
+    result: dict | None = None
+    error: str | None = None
+    commit_sha: str | None = None
+
+@dataclass
+class TrackCheckpoint:
+    sha: str
+    phase: str
+    timestamp: float
+    note: str
+
+@dataclass
+class Track:
+    track_id: str
+    title: str
+    description: str
+    tickets: list[Ticket]
+    plan_path: str
+    created_at: float
+    checkpoints: list[TrackCheckpoint] = field(default_factory=list)
+
+@dataclass
+class WorkerContext:
+    """The minimal context slice given to a tier3-worker sub-agent."""
+    ticket_id: str
+    track_id: str
+    persona: str | None
+    focus_files: list[str]
+    skeleton_views: dict[str, str]  # path -> skeleton string
+    history: list[Message]  # Recent messages from the parent
+    conductor_notes: str
+```
+
+`WorkerContext` is the **Token Firewall** boundary: it is exactly what each Tier 3 worker sees. It carries only the ticket and track IDs, the assigned persona, the focus files and their skeleton views, a slice of recent history from the parent, and the conductor's notes. The parent agent's full state is never visible.
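+
+A sketch of how a conductor might assemble this boundary from a `Ticket`, assuming the focus files come from `ticket.files_involved` and only the last few parent messages are forwarded. `build_worker_context`, the `skeletonize` callback, and the `max_history` cut-off are hypothetical; the real conductor may select context differently.
+
+```python
+from dataclasses import dataclass
+from typing import Callable
+
+@dataclass
+class Message:
+    # Trimmed copy: timestamp and metadata omitted.
+    role: str
+    content: str
+
+@dataclass
+class Ticket:
+    # Trimmed copy of the MMA Ticket.
+    ticket_id: str
+    files_involved: list[str]
+    persona: str | None = None
+
+@dataclass
+class WorkerContext:
+    ticket_id: str
+    track_id: str
+    persona: str | None
+    focus_files: list[str]
+    skeleton_views: dict[str, str]
+    history: list[Message]
+    conductor_notes: str
+
+def build_worker_context(
+    ticket: Ticket,
+    track_id: str,
+    parent_history: list[Message],
+    skeletonize: Callable[[str], str],
+    notes: str = "",
+    max_history: int = 4,
+) -> WorkerContext:
+    """Copy in only the named fields; nothing else from the parent crosses over."""
+    focus = list(ticket.files_involved)
+    # Guard needed: parent_history[-0:] would return the whole list.
+    recent = parent_history[-max_history:] if max_history > 0 else []
+    return WorkerContext(
+        ticket_id=ticket.ticket_id,
+        track_id=track_id,
+        persona=ticket.persona,
+        focus_files=focus,
+        skeleton_views={path: skeletonize(path) for path in focus},
+        history=recent,
+        conductor_notes=notes,
+    )
+
+parent = [Message("user", f"msg {i}") for i in range(10)]
+ctx = build_worker_context(
+    Ticket("T-1", ["src/models.py"], persona="worker"),
+    track_id="track-a",
+    parent_history=parent,
+    skeletonize=lambda path: f"<skeleton of {path}>",
+)
+```
+
+Because the function copies fields one by one into a fresh `WorkerContext`, nothing from the parent reaches the worker unless it is named there, which is the property the Token Firewall depends on.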
+
+### `Persona`, `Preset`, `ContextPreset`, `ToolPreset` — Configuration
+
+```python
+@dataclass
+class Persona:
+    name: str
+    model: str | None = None
+    system_prompt: str | None = None
+    tool_weights: dict[str, int] = field(default_factory=dict)  # tool_name -> 1..5
+    parameter_biases: dict[str, Any] = field(default_factory=dict)
+    bias_profile: str | None = None
+    tier_assignments: dict[str, str] = field(default_factory=dict)  # tier -> persona_name
+    description: str = ""
+
+@dataclass
+class Preset:
+    name: str
+    base_prompt: str
+    user_instructions: str
+    full_text: str  # base_prompt + user_instructions
+    temperature: float = 0.7
+    top_p: float = 0.95
+    max_output_tokens: int = 8192
+    is_foundation: bool = False  # True for the foundational base prompt
+
+@dataclass
+class ContextPreset:
+    name: str
+    files: list[ContextFileEntry]
+    screenshots: list[ContextScreenshot]
+    description: str = ""
+    last_validated: float = 0.0
+
+@dataclass
+class ToolPreset:
+    name: str
+    enabled_tools: dict[str, bool] = field(default_factory=dict)  # tool_name -> enabled
+    weights: dict[str, int] = field(default_factory=dict)  # tool_name -> 1..5
+    parameter_biases: dict[str, Any] = field(default_factory=dict)
+    bias_profile: str | None = None
+    description: str = ""
+```
+
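+Personas consolidate **everything an agent needs** (model, system prompt, tool weights, parameter biases, and tier assignments) into a single named entity. Presets are simpler: a system prompt (base prompt plus user instructions) and sampling parameters.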
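+
+To make the tool weight fields concrete, the sketch below assumes a `ToolPreset` exposes only tools whose `enabled_tools` flag is true, that weight 5 means "prefer most", and that an unweighted tool sits at the midpoint 3. `ranked_tools` and `DEFAULT_WEIGHT` are hypothetical:
+
+```python
+from dataclasses import dataclass, field
+
+DEFAULT_WEIGHT = 3  # assumed midpoint of the 1..5 scale
+
+@dataclass
+class ToolPreset:
+    # Trimmed copy: parameter_biases, bias_profile, and description omitted.
+    name: str
+    enabled_tools: dict[str, bool] = field(default_factory=dict)
+    weights: dict[str, int] = field(default_factory=dict)
+
+def ranked_tools(preset: ToolPreset) -> list[str]:
+    """Enabled tools, highest weight first; ties stay in alphabetical order."""
+    for tool, weight in preset.weights.items():
+        if not 1 <= weight <= 5:
+            raise ValueError(f"weight for {tool!r} must be in 1..5, got {weight}")
+    enabled = sorted(tool for tool, on in preset.enabled_tools.items() if on)
+    # sorted() is stable, so equal weights keep the alphabetical order from above
+    return sorted(enabled, key=lambda tool: -preset.weights.get(tool, DEFAULT_WEIGHT))
+
+preset = ToolPreset(
+    "reader",
+    enabled_tools={"read_file": True, "edit_file": False, "search_files": True, "list_directory": True},
+    weights={"search_files": 5, "list_directory": 1},
+)
+order = ranked_tools(preset)  # ['search_files', 'read_file', 'list_directory']
+```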
+
+### `CommsLogEntry`, `LogEntry` — Logging
+
+```python
+@dataclass
+class CommsLogEntry:
+    timestamp: float
+    source: str  # "main", "tier3-worker", "tier4-qa"
+    role: str  # "user", "assistant", "system"
+    payload_type: str  # "prompt", "response", "tool_call", "tool_result"
+    content: str
+    metadata: dict[str, Any] = field(default_factory=dict)
+    ticket_id: str | None = None
+
+@dataclass
+class LogEntry:
+    timestamp: float
+    level: Literal["DEBUG", "INFO", "WARNING", "ERROR"]
+    message: str
+    source: str  # Module or subsystem name
+    context: dict[str, Any] = field(default_factory=dict)
+```
+
+Comms logs are append-only and stored as JSON-L. They are the **primary debugging surface** for AI interactions.
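+
+Because `CommsLogEntry` is a flat dataclass, the `dataclasses.asdict()` strategy listed under Serialization is enough to append and replay it as JSON-L. A minimal sketch; `append_comms`, `read_comms`, and the file name are hypothetical:
+
+```python
+import json
+import tempfile
+from dataclasses import asdict, dataclass, field
+from pathlib import Path
+from typing import Any
+
+@dataclass
+class CommsLogEntry:
+    timestamp: float
+    source: str
+    role: str
+    payload_type: str
+    content: str
+    metadata: dict[str, Any] = field(default_factory=dict)
+    ticket_id: str | None = None
+
+def append_comms(path: Path, entry: CommsLogEntry) -> None:
+    """Append one entry as one JSON line; earlier lines are never rewritten."""
+    with path.open("a", encoding="utf-8") as f:
+        f.write(json.dumps(asdict(entry), ensure_ascii=False) + "\n")
+
+def read_comms(path: Path) -> list[CommsLogEntry]:
+    """Replay the log, one CommsLogEntry per non-empty line."""
+    with path.open(encoding="utf-8") as f:
+        return [CommsLogEntry(**json.loads(line)) for line in f if line.strip()]
+
+with tempfile.TemporaryDirectory() as tmp:
+    log = Path(tmp) / "comms.jsonl"
+    append_comms(log, CommsLogEntry(1.0, "main", "user", "prompt", "hello"))
+    append_comms(log, CommsLogEntry(2.0, "tier3-worker", "assistant", "response", "hi", ticket_id="T-1"))
+    replayed = read_comms(log)
+```
+
+Opening in append mode means a crash mid-session can at worst leave one truncated last line; every earlier entry stays intact.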
+
+### `UIPerformanceSnapshot`, `DiagnosticEntry` — Diagnostics
+
+```python
+@dataclass
+class UIPerformanceSnapshot:
+    timestamp: float
+    fps: float
+    frame_time_ms: float
+    cpu_pct: float
+    input_lag_ms: float
+
+@dataclass
+class DiagnosticEntry:
+    timestamp: float
+    component: str  # "DAG Engine", "Aggregation", "Panel:Command Palette"
+    hit_count: int
+    total_latency_ms: float
+    peak_latency_ms: float
+    min_latency_ms: float
+```
+
+Diagnostics power the **Performance Diagnostics** panel (FPS, Frame Time, CPU, plus per-component hit counts and latencies).
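+
+The aggregate fields suggest a running-statistics update on every recorded hit. A sketch under that assumption: `record_hit` and `avg_latency_ms` are hypothetical helpers, and the defaults on the trimmed `DiagnosticEntry` copy (notably `min_latency_ms` starting at infinity) are added for the example:
+
+```python
+import math
+from dataclasses import dataclass
+
+@dataclass
+class DiagnosticEntry:
+    # Copy of the model above with defaults added for this sketch.
+    timestamp: float
+    component: str
+    hit_count: int = 0
+    total_latency_ms: float = 0.0
+    peak_latency_ms: float = 0.0
+    min_latency_ms: float = math.inf  # any first sample becomes the minimum
+
+def record_hit(entry: DiagnosticEntry, latency_ms: float, now: float) -> None:
+    """Fold one measured call into the running aggregates."""
+    entry.timestamp = now
+    entry.hit_count += 1
+    entry.total_latency_ms += latency_ms
+    entry.peak_latency_ms = max(entry.peak_latency_ms, latency_ms)
+    entry.min_latency_ms = min(entry.min_latency_ms, latency_ms)
+
+def avg_latency_ms(entry: DiagnosticEntry) -> float:
+    """Mean latency, derived on demand instead of stored."""
+    return entry.total_latency_ms / entry.hit_count if entry.hit_count else 0.0
+
+entry = DiagnosticEntry(timestamp=0.0, component="DAG Engine")
+for tick, ms in enumerate([2.0, 8.0, 5.0], start=1):
+    record_hit(entry, ms, now=float(tick))
+```
+
+Storing the total rather than the mean keeps each update constant-time and exact; the average can be derived whenever it is displayed.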
+
+### `HookRequest`, `HookResponse` — Hook API
+
+```python
+@dataclass
+class HookRequest:
+    action: str  # "click", "set_value", "custom_callback", etc.
+    item: str | None = None
+    value: Any = None
+    callback: str | None = None
+    args: list[Any] = field(default_factory=list)
+    kwargs: dict[str, Any] = field(default_factory=dict)
+
+@dataclass
+class HookResponse:
+    status: Literal["ok", "error", "queued", "rejected"]
+    message: str = ""
+    data: dict[str, Any] = field(default_factory=dict)
+```
+
+### `WorkspaceProfile`, `LayoutPreset` — Layouts
+
+```python
+@dataclass
+class WorkspaceProfile:
+    name: str
+    scope: Literal["global", "project"]
+    docking_layout: str  # ImGui ini-string
+    window_visibility: dict[str, bool] = field(default_factory=dict)
+    panel_state: dict[str, dict] = field(default_factory=dict)
+    auto_switch_triggers: list[str] = field(default_factory=list)
+    description: str = ""
+
+@dataclass
+class LayoutPreset:
+    name: str
+    multi_viewport_state: dict[str, Any] = field(default_factory=dict)
+    description: str = ""
+```
+
+### `RAGConfig`, `RAGChunk`, `RAGResult` — RAG
+
+```python
+@dataclass
+class RAGConfig:
+    enabled: bool = False
+    source: Literal["chromadb", "external_mcp"] = "chromadb"
+    embedding_provider: str = "gemini-embedding-001"
+    chunk_size: int = 512
+    chunk_overlap: int = 64
+    top_k: int = 5
+    external_mcp_server: str | None = None
+
+@dataclass
+class RAGChunk:
+    text: str
+    source_path: str
+    start_line: int
+    end_line: int
+    embedding: list[float] = field(default_factory=list)
+
+@dataclass
+class RAGResult:
+    chunks: list[RAGChunk]
+    query: str
+    distance_threshold: float = 0.0
+```
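+
+`chunk_size` and `chunk_overlap` describe a sliding window over the source. The guide does not say whether the unit is tokens, characters, or lines; this sketch assumes lines so that `start_line` / `end_line` stay exact, and `chunk_lines` is a hypothetical helper rather than the indexer's real code:
+
+```python
+from dataclasses import dataclass, field
+
+@dataclass
+class RAGChunk:
+    text: str
+    source_path: str
+    start_line: int  # 1-based, inclusive
+    end_line: int    # 1-based, inclusive
+    embedding: list[float] = field(default_factory=list)
+
+def chunk_lines(source_path: str, text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[RAGChunk]:
+    """Slide a window of chunk_size lines, stepping by chunk_size - chunk_overlap."""
+    if not 0 <= chunk_overlap < chunk_size:
+        raise ValueError("need 0 <= chunk_overlap < chunk_size")
+    lines = text.splitlines()
+    step = chunk_size - chunk_overlap
+    chunks: list[RAGChunk] = []
+    for start in range(0, len(lines), step):
+        end = min(start + chunk_size, len(lines))
+        chunks.append(RAGChunk("\n".join(lines[start:end]), source_path, start + 1, end))
+        if end == len(lines):
+            break  # this window already reaches the end of the file
+    return chunks
+
+text = "\n".join(f"line {i}" for i in range(1, 11))
+chunks = chunk_lines("a.py", text, chunk_size=4, chunk_overlap=1)
+```
+
+With ten lines, a window of 4 and an overlap of 1, the chunks cover lines 1-4, 4-7, and 7-10; each boundary line appears in two chunks so a match that straddles it is not lost.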
+
+---
+
+## Constants
+
+The file also defines several module-level constants used across the app:
+
+```python
+# Provider routing
+PROVIDERS: list[str] = ["gemini", "anthropic", "deepseek", "MiniMax", "gemini-cli"]
+
+# Tool categories (for Tool Bias)
+TOOL_CATEGORIES: list[str] = [
+    "File I/O",
+    "Python AST",
+    "C/C++ AST",
+    "Analysis",
+    "Network",
+    "Runtime",
+    "Beads",
+]
+
+# MMA tier -> default persona
+DEFAULT_TIER_PERSONAS: dict[str, str] = {
+    "tier1": "orchestrator",
+    "tier2": "tech-lead",
+    "tier3": "worker",
+    "tier4": "qa",
+}
+
+# AGENT_TOOL_NAMES — the canonical list of all 45 tool names
+AGENT_TOOL_NAMES: list[str] = [
+    "read_file", "list_directory", "search_files", "get_file_summary",
+    "get_file_slice", "set_file_slice", "edit_file",
+    # ... all 45 ...
+]
+```
+
+These constants eliminate the **scattered list definitions** problem — every module imports the same source of truth.
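+
+`PROVIDERS` repeats the values of the `Provider` enum, so the two can drift apart. A sketch of a drift check, assuming it would sit in the test suite (this guide does not say one exists); deriving the list from the enum, as `DERIVED_PROVIDERS` does, removes the duplication entirely:
+
+```python
+from enum import Enum
+
+class Provider(str, Enum):
+    GEMINI = "gemini"
+    ANTHROPIC = "anthropic"
+    DEEPSEEK = "deepseek"
+    MINIMAX = "MiniMax"
+    GEMINI_CLI = "gemini-cli"
+
+PROVIDERS: list[str] = ["gemini", "anthropic", "deepseek", "MiniMax", "gemini-cli"]
+
+# Alternative: derive the list from the enum (definition order) so it cannot drift.
+DERIVED_PROVIDERS: list[str] = [p.value for p in Provider]
+
+def test_providers_match_enum() -> None:
+    assert PROVIDERS == DERIVED_PROVIDERS
+    # Every routing string maps back to an enum member (note the "MiniMax" casing).
+    assert all(Provider(name).value == name for name in PROVIDERS)
+
+test_providers_match_enum()
+```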
+
+---
+
+## Serialization
+
+Models use a mix of strategies:
+- **`pydantic` models**: For TOML round-trip with validation (Persona, Preset, ContextPreset, ToolPreset, WorkspaceProfile, RAGConfig).
+- **`dataclasses.asdict()`**: For JSON-L logging (CommsLogEntry, LogEntry, DiscussionEntry, Message).
+- **Custom tomli-w / tomllib**: For the modules that need precise control over TOML output ordering.
+
+Most serialization is done by the **manager classes** (PresetManager, PersonaManager, etc.) — the model itself is pure data.
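+
+`asdict()` recurses into nested dataclasses, but loading does not: `DiscussionEntry(**data)` would leave `messages` as a list of plain dicts. A minimal JSON-L round trip therefore rebuilds the nested `Message` objects explicitly. `entry_to_line` and `entry_from_line` are hypothetical names, not the managers' actual functions:
+
+```python
+import json
+from dataclasses import asdict, dataclass, field
+from typing import Any, Literal
+
+@dataclass
+class Message:
+    role: Literal["user", "assistant", "system", "tool"]
+    content: str
+    timestamp: float
+    metadata: dict[str, Any] = field(default_factory=dict)
+
+@dataclass
+class DiscussionEntry:
+    entry_id: str
+    messages: list[Message]
+    is_take_root: bool = False
+    parent_take_id: str | None = None
+    metadata: dict[str, Any] = field(default_factory=dict)
+
+def entry_to_line(entry: DiscussionEntry) -> str:
+    # asdict() converts the nested Message objects to dicts as well
+    return json.dumps(asdict(entry))
+
+def entry_from_line(line: str) -> DiscussionEntry:
+    data = json.loads(line)
+    # Rebuild nested dataclasses by hand; **data alone would keep them as dicts
+    data["messages"] = [Message(**m) for m in data["messages"]]
+    return DiscussionEntry(**data)
+
+entry = DiscussionEntry("e1", [Message("user", "Hi", 1.0), Message("assistant", "Hello", 2.0)])
+restored = entry_from_line(entry_to_line(entry))
+```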
+
+---
+
+## Validation
+
+`pydantic` validators enforce constraints:
+
+```python
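+from pydantic import BaseModel, Field, field_validator
+
+class Preset(BaseModel):
+    name: str = Field(..., min_length=1, max_length=64)
+    temperature: float = Field(0.7, ge=0.0, le=2.0)
+    top_p: float = Field(0.95, ge=0.0, le=1.0)
+    max_output_tokens: int = Field(8192, ge=1, le=200000)
+
+    # pydantic v2 spelling, matching the v2 `.model_dump()` used by the managers;
+    # `@validator` is the deprecated v1 decorator.
+    @field_validator("name")
+    @classmethod
+    def name_must_be_safe(cls, v: str) -> str:
+        # Reject path separators in preset names.
+        if "/" in v or "\\" in v:
+            raise ValueError("name cannot contain path separators")
+        return v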
+```
+
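+Validators run on load and on save, i.e. whenever a manager constructs or validates a model; `.model_dump()` by itself does not re-run them. The managers call `.model_dump()` / `Preset.parse_obj(dict)` to round-trip (`parse_obj` is the pydantic v1 name; v2 deprecates it in favor of `Preset.model_validate(dict)`).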
+
+---
+
+## The `parse_plan_md` Function
+
+A critical utility that converts a markdown plan file into a list of `Ticket` objects (the caller wraps them in a `Track`; see `load_track` below):
+
+```python
+import re
+from pathlib import Path
+
+# Ticket and TicketStatus are the MMA models defined earlier in src/models.py.
+
+def parse_plan_md(plan_path: Path) -> list[Ticket]:
+    """Parse a plan.md file into a list of Ticket objects."""
+    text = plan_path.read_text(encoding="utf-8")
+    tickets = []
+    current_phase = None  # tracked while scanning; Ticket has no field to store it
+    for line in text.splitlines():
+        line = line.rstrip()
+        if not line:
+            continue
+        # Phase heading, e.g. "# Phase 1"
+        if line.startswith("# "):
+            current_phase = line[2:].strip()
+            continue
+        # Ticket line, e.g. "- [x] T-2: Add parser [depends: T-1]"
+        m = re.match(r'^\s*-\s*\[(.)\]\s*(.+?)(?:\s*\[depends:\s*([^\]]+)\])?\s*$', line)
+        if not m:
+            continue
+        marker, rest, deps = m.groups()
+        # Checkbox marker -> status; unrecognized markers default to "pending"
+        status = {" ": "pending", "~": "running", "x": "done", "!": "blocked"}.get(marker, "pending")
+        # Split "ID: title"; without an "ID:" prefix the whole text serves as both
+        id_match = re.match(r'(\S+):\s*(.+)', rest)
+        if id_match:
+            tid, title = id_match.groups()
+        else:
+            tid, title = rest, rest
+        tickets.append(Ticket(
+            ticket_id=tid.strip(),
+            title=title.strip(),
+            description="",
+            status=TicketStatus(status),
+            depends_on=[d.strip() for d in (deps or "").split(",") if d.strip()],
+        ))
+    return tickets
+```
+
+The DAG engine uses the returned `Ticket` objects to build the dependency graph.
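+
+The DAG engine's internals live outside this file, but the graph's shape follows directly from `depends_on`. A sketch using the standard library's `graphlib.TopologicalSorter` as a stand-in (not necessarily what the engine uses): `execution_order` rejects cycles, and `ready_tickets` picks pending tickets whose dependencies are all done. Both helpers are hypothetical.
+
+```python
+from dataclasses import dataclass, field
+from enum import Enum
+from graphlib import CycleError, TopologicalSorter
+
+class TicketStatus(str, Enum):
+    PENDING = "pending"
+    RUNNING = "running"
+    DONE = "done"
+    BLOCKED = "blocked"
+    SKIPPED = "skipped"
+
+@dataclass
+class Ticket:
+    # Trimmed copy of the MMA Ticket.
+    ticket_id: str
+    status: TicketStatus = TicketStatus.PENDING
+    depends_on: list[str] = field(default_factory=list)
+
+def execution_order(tickets: list[Ticket]) -> list[str]:
+    """Dependency-respecting order; raises graphlib.CycleError on cycles."""
+    graph = {t.ticket_id: set(t.depends_on) for t in tickets}
+    return list(TopologicalSorter(graph).static_order())
+
+def ready_tickets(tickets: list[Ticket]) -> list[str]:
+    """Pending tickets whose dependencies are all done."""
+    done = {t.ticket_id for t in tickets if t.status is TicketStatus.DONE}
+    return [
+        t.ticket_id
+        for t in tickets
+        if t.status is TicketStatus.PENDING and set(t.depends_on) <= done
+    ]
+
+tickets = [
+    Ticket("T-1", TicketStatus.DONE),
+    Ticket("T-2", depends_on=["T-1"]),
+    Ticket("T-3", depends_on=["T-1", "T-2"]),
+]
+```
+
+Here `execution_order(tickets)` yields `T-1`, `T-2`, `T-3`, and only `T-2` is ready, because `T-3` still waits on it.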
+
+---
+
+## The `AppState` Class
+
+`AppState` is a separate, large dataclass that aggregates all GUI-visible state. It **lives in `src/app_controller.py`**, not here, because it holds the controller's runtime state rather than a pure data model. It still follows the same conventions (typed fields, no methods, SDM tags).
+
+---
+
+## How Models Are Used
+
+### In `src/presets.py`
+
+```python
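+import tomli_w  # third-party TOML writer ("tomli-w" on PyPI)
+
+def save_preset(self, preset: Preset) -> None:  # method of PresetManager
+    data = preset.model_dump()
+    # tomli_w.dump needs a binary file object; `with` closes it after writing
+    with open(self.presets_path, "wb") as f:
+        tomli_w.dump(data, f)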
+```
+
+### In `src/ai_client.py`
+
+```python
+def send(self, request: AIRequest) -> AIResponse:
+    """Sends a request. AIRequest is defined in models.py."""
+```
+
+### In `src/multi_agent_conductor.py`
+
+```python
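+import time
+from pathlib import Path
+
+def load_track(self, track_id: str) -> Track:
+    plan_path: Path = ...  # lookup of the plan file for track_id is elided here
+    tickets = parse_plan_md(plan_path)
+    return Track(
+        track_id=track_id,
+        title=...,
+        description="",  # required: Track declares no default for description
+        tickets=tickets,
+        plan_path=str(plan_path),
+        created_at=time.time(),
+    )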
+```
+
+---
+
+## Testing
+
+Models are tested for:
+- **Round-trip serialization** (`to_toml` → `from_toml` → equal)
+- **Validation** (invalid values rejected)
+- **Default values** (optional fields fall back to sensible defaults)
+- **Field types** (runtime type validation via `pydantic`)
+
+Tests live in `tests/test_models.py` and module-specific test files (e.g., `tests/test_preset_manager.py` exercises the `Preset` model).
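+
+A round-trip test in that spirit, using JSON instead of TOML so it runs on the standard library alone. The enum fields are the part worth checking: `json.dumps` writes their string values because the enums subclass `str`, and loading must convert them back. Equality alone would not catch a missed conversion, since `"done" == TicketStatus.DONE` is true for a `str`-based enum, hence the identity check. The test and helper names are illustrative, not taken from `tests/test_models.py`, and `TicketStatus` is trimmed to two members:
+
+```python
+import json
+from dataclasses import asdict, dataclass, field
+from enum import Enum
+
+class TicketStatus(str, Enum):
+    PENDING = "pending"
+    DONE = "done"
+
+class TicketPriority(str, Enum):
+    HIGH = "high"
+    MEDIUM = "medium"
+    LOW = "low"
+
+@dataclass
+class Ticket:
+    ticket_id: str
+    title: str
+    description: str
+    status: TicketStatus = TicketStatus.PENDING
+    priority: TicketPriority = TicketPriority.MEDIUM
+    depends_on: list[str] = field(default_factory=list)
+
+def ticket_to_json(t: Ticket) -> str:
+    return json.dumps(asdict(t))
+
+def ticket_from_json(raw: str) -> Ticket:
+    data = json.loads(raw)
+    # Plain strings come back from JSON; convert them to enum members
+    data["status"] = TicketStatus(data["status"])
+    data["priority"] = TicketPriority(data["priority"])
+    return Ticket(**data)
+
+def test_ticket_round_trip() -> None:
+    original = Ticket("T-1", "Parse plan", "", TicketStatus.DONE, TicketPriority.HIGH, ["T-0"])
+    restored = ticket_from_json(ticket_to_json(original))
+    assert restored == original
+    assert restored.status is TicketStatus.DONE  # identity, not just str equality
+
+test_ticket_round_trip()
+```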
+
+---
+
+## Adding a New Model
+
+1. Add the model to the appropriate region block in `src/models.py`.
+2. Add validators if any fields have constraints.
+3. Add a docstring with `[C: ...]` (callers) and `[M: ...]` (mutators) SDM tags.
+4. If the model is persisted, write a `to_<format>()` / `from_<format>()` pair in the relevant manager.
+5. Add tests in `tests/test_models.py` (round-trip + validation).
+6. Update `docs/guide_models.md` (this file) to document the new model.
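+
+Steps 2 and 3 applied to a hypothetical `BookmarkEntry` model (not part of `src/models.py`). The tag syntax beyond `[C: ...]` / `[M: ...]` and the module names inside the tags are assumptions, and a dataclass `__post_init__` check stands in for a pydantic validator:
+
+```python
+from dataclasses import dataclass, field
+
+@dataclass
+class BookmarkEntry:
+    """A saved jump target inside a context file (hypothetical example model).
+
+    [C: src/app_controller.py, src/context_presets.py]
+    [M: src/context_presets.py]
+    """
+
+    path: str
+    line: int  # 1-based
+    label: str = ""
+    tags: list[str] = field(default_factory=list)
+
+    def __post_init__(self) -> None:
+        # Constraint check only (step 2); no business logic lives on the model.
+        if self.line < 1:
+            raise ValueError("line is 1-based and must be >= 1")
+
+bookmark = BookmarkEntry("src/models.py", 42, label="Ticket model")
+```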
+
+---
+
+## See Also
+
+- **[guide_architecture.md](guide_architecture.md)** — How models flow through the system
+- **[guide_app_controller.md](guide_app_controller.md)** — `AppState` and controller-owned models
+- **[guide_mma.md](guide_mma.md)** — `Ticket`, `Track`, `WorkerContext` usage in MMA
+- **[guide_personas.md](guide_personas.md)** — `Persona` model in detail
+- **[guide_workspace_profiles.md](guide_workspace_profiles.md)** — `WorkspaceProfile` model in detail
+- **[guide_rag.md](guide_rag.md)** — `RAGConfig`, `RAGChunk`, `RAGResult` models
+- **`src/presets.py`, `src/personas.py`, `src/context_presets.py`, `src/tool_presets.py`** — Managers that use these models
+- **`src/multi_agent_conductor.py`** — Uses `Ticket`, `Track`, `WorkerContext`
+- **`src/ai_client.py`** — Uses `Provider`, `ModelInfo`, `AIRequest`, `AIResponse`