# `src/models.py` — Data Models

[Top](../Readme.md) | [Architecture](guide_architecture.md) | [MMA](guide_mma.md) | [App Controller](guide_app_controller.md)

---

## Overview

`src/models.py` (~132KB) is the **centralized data model registry**. It defines every data structure used across the app — Tickets, Tracks, Personas, Presets, Discussion entries, Context files, etc. — using `pydantic` and `dataclasses`.

The file exists to **eliminate redundant model definitions** scattered across modules. It also serves as the single source of truth for serialization (TOML, JSON-L, Markdown).

---

## Design Principles

1. **One place to look for any data structure**: If you need to know what fields a `Ticket` has, look here.
2. **Strict types**: `pydantic` for fields with validation, `dataclasses` for internal structures.
3. **No business logic**: Models are pure data. Methods like `to_toml()` are allowed; methods like `execute()` are not.
4. **SDM tags**: Every model has `[C: ...]` (callers) and `[M: ...]` (mutators) tags in docstrings for AI-assisted impact analysis.

---

## Model Categories

The file is organized into regions:

```python
#region: Core Models
#endregion: Core Models

#region: AI Models
#endregion: AI Models

#region: Preset Models
#endregion: Preset Models

#region: Persona Models
#endregion: Persona Models

#region: Context Models
#endregion: Context Models

#region: MMA Models
#endregion: MMA Models

#region: UI State Models
#endregion: UI State Models

#region: Logging Models
#endregion: Logging Models
```

### `Provider`, `ModelInfo` — AI Models

```python
class Provider(str, Enum):
    GEMINI = "gemini"
    ANTHROPIC = "anthropic"
    DEEPSEEK = "deepseek"
    MINIMAX = "MiniMax"
    GEMINI_CLI = "gemini-cli"


@dataclass
class ModelInfo:
    name: str
    provider: Provider
    context_window: int
    max_output_tokens: int
    supports_caching: bool = False
    cost_per_1k_input: float = 0.0
    cost_per_1k_output: float = 0.0
```

These back the AI Settings panel and the cost tracker.
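
As a rough illustration of how a cost tracker could consume the `cost_per_1k_input` and `cost_per_1k_output` fields, here is a minimal sketch. The `estimate_cost` helper, the model name, and the rates are assumptions made up for this example; they are not taken from `src/models.py`.

```python
from dataclasses import dataclass
from enum import Enum


class Provider(str, Enum):
    # Trimmed copy of the enum above, just enough for the example.
    GEMINI = "gemini"
    ANTHROPIC = "anthropic"


@dataclass
class ModelInfo:
    name: str
    provider: Provider
    context_window: int
    max_output_tokens: int
    supports_caching: bool = False
    cost_per_1k_input: float = 0.0
    cost_per_1k_output: float = 0.0


def estimate_cost(model: ModelInfo, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: price one request from the per-1k-token rates."""
    return (
        input_tokens / 1000 * model.cost_per_1k_input
        + output_tokens / 1000 * model.cost_per_1k_output
    )


# Placeholder name and rates, chosen only to keep the arithmetic easy to follow.
example = ModelInfo(
    name="example-model",
    provider=Provider.GEMINI,
    context_window=100_000,
    max_output_tokens=8_192,
    cost_per_1k_input=0.5,
    cost_per_1k_output=1.5,
)

# 2 * 0.5 for input plus 1 * 1.5 for output
cost = estimate_cost(example, input_tokens=2_000, output_tokens=1_000)
print(cost)  # 2.5
```

Keeping the rates on `ModelInfo` and the arithmetic in a separate function fits the "no business logic" principle: the model stays pure data.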
### `DiscussionEntry`, `Message` — Discussion History

```python
@dataclass
class Message:
    role: Literal["user", "assistant", "system", "tool"]
    content: str
    timestamp: float
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class DiscussionEntry:
    entry_id: str
    messages: list[Message]
    is_take_root: bool = False  # First message of a "take" (timeline branch)
    parent_take_id: str | None = None
    metadata: dict[str, Any] = field(default_factory=dict)
```

Discussion history is a list of `DiscussionEntry` objects, each containing one or more `Message` objects. The branching structure supports "takes" (alternative timeline branches).

### `ContextFileEntry`, `ContextScreenshot` — Context

```python
class ViewMode(str, Enum):
    FULL = "full"
    SUMMARIZE = "summarize"
    SKELETON = "skeleton"
    OUTLINE = "outline"
    NONE = "none"


@dataclass
class ContextFileEntry:
    path: str  # Absolute or relative to project root
    view_mode: ViewMode = ViewMode.FULL
    annotations: list[Annotation] = field(default_factory=list)
    fuzzy_slice: FuzzySlice | None = None  # Optional line range


@dataclass
class ContextScreenshot:
    path: str  # Absolute path to image
    caption: str = ""
```

Context is a composition of files + screenshots, each with optional view mode and line-range slicing.

### `FuzzySlice`, `Annotation` — Visual Slice Editor

```python
@dataclass
class FuzzySlice:
    start_anchor: str  # Fuzzy-matched string
    end_anchor: str
    start_offset: int = 0
    end_offset: int = 0
    fallback_start_line: int | None = None
    fallback_end_line: int | None = None


@dataclass
class Annotation:
    kind: Literal["tag", "comment"]
    text: str
    line_range: tuple[int, int] | None = None
```

Fuzzy slices use **anchor-based matching** to survive code modifications. If `start_anchor` shifts due to edits, the slice re-anchors on the next render.

See **[docs/guide_context_curation.md](guide_context_curation.md)** for the full Visual Slice Editor.
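
To make the anchor-based matching idea concrete, here is a small sketch that resolves a `FuzzySlice` to a line range. The matching strategy (`difflib.SequenceMatcher` with a similarity cutoff) and the `resolve_slice` helper are assumptions chosen for illustration; the actual Visual Slice Editor may match anchors differently.

```python
from __future__ import annotations

import difflib
from dataclasses import dataclass


@dataclass
class FuzzySlice:
    start_anchor: str
    end_anchor: str
    start_offset: int = 0
    end_offset: int = 0
    fallback_start_line: int | None = None
    fallback_end_line: int | None = None


def _best_line(lines: list[str], anchor: str, cutoff: float) -> int | None:
    """Return the 0-based index of the line most similar to anchor, or None."""
    best_idx, best_score = None, 0.0
    for idx, line in enumerate(lines):
        score = difflib.SequenceMatcher(None, anchor.strip(), line.strip()).ratio()
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx if best_score >= cutoff else None


def resolve_slice(text: str, fs: FuzzySlice, cutoff: float = 0.8) -> tuple[int, int] | None:
    """Hypothetical resolver: map a FuzzySlice to a 1-based inclusive line range."""
    lines = text.splitlines()
    start = _best_line(lines, fs.start_anchor, cutoff)
    end = _best_line(lines, fs.end_anchor, cutoff)
    if start is None or end is None:
        # Anchors no longer match well enough: use the stored line numbers, if any.
        if fs.fallback_start_line is None or fs.fallback_end_line is None:
            return None
        return fs.fallback_start_line, fs.fallback_end_line
    first = max(1, start + 1 + fs.start_offset)
    last = min(len(lines), end + 1 + fs.end_offset)
    return (first, last) if first <= last else None


source = (
    "import os\n\n"
    "def load(path):\n    data = open(path).read()\n    return data\n\n"
    "def save(path, data):\n    pass\n"
)
# The same file after an edit: one new import and a changed signature.
edited = (
    "import os\nimport sys\n\n"
    "def load(path: str):\n    data = open(path).read()\n    return data\n\n"
    "def save(path, data):\n    pass\n"
)
fs = FuzzySlice(start_anchor="def load(path):", end_anchor="return data")

print(resolve_slice(source, fs))
print(resolve_slice(edited, fs))
```

In this sketch the slice follows `load` down one line after the edit, because the anchors are matched by similarity rather than by stored line numbers.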
### `Ticket`, `Track`, `WorkerContext` — MMA

```python
class TicketStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    BLOCKED = "blocked"
    SKIPPED = "skipped"


class TicketPriority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass
class Ticket:
    ticket_id: str
    title: str
    description: str
    status: TicketStatus = TicketStatus.PENDING
    priority: TicketPriority = TicketPriority.MEDIUM
    depends_on: list[str] = field(default_factory=list)
    blocks: list[str] = field(default_factory=list)
    files_involved: list[str] = field(default_factory=list)
    persona: str | None = None
    result: dict | None = None
    error: str | None = None
    commit_sha: str | None = None


# Defined before Track, which references it in a field annotation.
@dataclass
class TrackCheckpoint:
    sha: str
    phase: str
    timestamp: float
    note: str


@dataclass
class Track:
    track_id: str
    title: str
    description: str
    tickets: list[Ticket]
    plan_path: str
    created_at: float
    checkpoints: list[TrackCheckpoint] = field(default_factory=list)


@dataclass
class WorkerContext:
    """The minimal context slice given to a tier3-worker sub-agent."""
    ticket_id: str
    track_id: str
    persona: str | None
    focus_files: list[str]
    skeleton_views: dict[str, str]  # path -> skeleton string
    history: list[Message]  # Recent messages from the parent
    conductor_notes: str
```

`WorkerContext` is the **Token Firewall** boundary: this is exactly what each Tier 3 worker sees. It includes only the focus files, their skeletons, and recent history. The parent agent's full state is never visible.
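
The firewall idea can be sketched as a builder that copies only the permitted slice of parent state into a `WorkerContext`. The `build_worker_context` function and its `max_history` parameter are hypothetical names for this example; the real conductor may assemble the context differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Message:
    role: str
    content: str
    timestamp: float
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class WorkerContext:
    ticket_id: str
    track_id: str
    persona: str | None
    focus_files: list[str]
    skeleton_views: dict[str, str]
    history: list[Message]
    conductor_notes: str


def build_worker_context(
    ticket_id: str,
    track_id: str,
    persona: str | None,
    files_involved: list[str],
    all_skeletons: dict[str, str],
    parent_history: list[Message],
    notes: str,
    max_history: int = 4,
) -> WorkerContext:
    """Hypothetical builder: copy only what the worker is allowed to see."""
    focus = list(files_involved)
    return WorkerContext(
        ticket_id=ticket_id,
        track_id=track_id,
        persona=persona,
        focus_files=focus,
        # Skeletons for focus files only; all other files stay with the parent.
        skeleton_views={p: all_skeletons[p] for p in focus if p in all_skeletons},
        # Only the most recent messages cross the boundary.
        history=list(parent_history[-max_history:]) if max_history > 0 else [],
        conductor_notes=notes,
    )


skeletons = {"src/a.py": "def a(): ...", "src/b.py": "def b(): ...", "src/c.py": "def c(): ..."}
history = [Message("user", f"m{i}", float(i)) for i in range(6)]

ctx = build_worker_context(
    ticket_id="T-1",
    track_id="track-demo",
    persona=None,
    files_involved=["src/a.py"],
    all_skeletons=skeletons,
    parent_history=history,
    notes="Keep the public API stable.",
    max_history=2,
)
print(ctx.focus_files, [m.content for m in ctx.history])
```

Because the worker receives fresh lists and a filtered dict, nothing it does to its context can reach the parent's skeleton cache or history list.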
### `Persona`, `Preset`, `ContextPreset`, `ToolPreset` — Configuration

```python
@dataclass
class Persona:
    name: str
    model: str | None = None
    system_prompt: str | None = None
    tool_weights: dict[str, int] = field(default_factory=dict)  # tool_name -> 1..5
    parameter_biases: dict[str, Any] = field(default_factory=dict)
    bias_profile: str | None = None
    tier_assignments: dict[str, str] = field(default_factory=dict)  # tier -> persona_name
    description: str = ""


@dataclass
class Preset:
    name: str
    base_prompt: str
    user_instructions: str
    full_text: str  # base_prompt + user_instructions
    temperature: float = 0.7
    top_p: float = 0.95
    max_output_tokens: int = 8192
    is_foundation: bool = False  # True for the foundational base prompt


@dataclass
class ContextPreset:
    name: str
    files: list[ContextFileEntry]
    screenshots: list[ContextScreenshot]
    description: str = ""
    last_validated: float = 0.0


@dataclass
class ToolPreset:
    name: str
    enabled_tools: dict[str, bool] = field(default_factory=dict)  # tool_name -> enabled
    weights: dict[str, int] = field(default_factory=dict)  # tool_name -> 1..5
    parameter_biases: dict[str, Any] = field(default_factory=dict)
    bias_profile: str | None = None
    description: str = ""
```

Personas consolidate **everything an agent needs** into a single named entity. Presets are simpler — just system prompt + parameters.

### `CommsLogEntry`, `LogEntry` — Logging

```python
@dataclass
class CommsLogEntry:
    timestamp: float
    source: str  # "main", "tier3-worker", "tier4-qa"
    role: str  # "user", "assistant", "system"
    payload_type: str  # "prompt", "response", "tool_call", "tool_result"
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    ticket_id: str | None = None


@dataclass
class LogEntry:
    timestamp: float
    level: Literal["DEBUG", "INFO", "WARNING", "ERROR"]
    message: str
    source: str  # Module or subsystem name
    context: dict[str, Any] = field(default_factory=dict)
```

Comms logs are append-only and stored as JSON-L.
They are the **primary debugging surface** for AI interactions.

### `UIPerformanceSnapshot`, `DiagnosticEntry` — Diagnostics

```python
@dataclass
class UIPerformanceSnapshot:
    timestamp: float
    fps: float
    frame_time_ms: float
    cpu_pct: float
    input_lag_ms: float


@dataclass
class DiagnosticEntry:
    timestamp: float
    component: str  # "DAG Engine", "Aggregation", "Panel:Command Palette"
    hit_count: int
    total_latency_ms: float
    peak_latency_ms: float
    min_latency_ms: float
```

Diagnostics power the **Performance Diagnostics** panel (FPS, Frame Time, CPU, plus per-component hit counts and latencies).

### `HookRequest`, `HookResponse` — Hook API

```python
@dataclass
class HookRequest:
    action: str  # "click", "set_value", "custom_callback", etc.
    item: str | None = None
    value: Any = None
    callback: str | None = None
    args: list[Any] = field(default_factory=list)
    kwargs: dict[str, Any] = field(default_factory=dict)


@dataclass
class HookResponse:
    status: Literal["ok", "error", "queued", "rejected"]
    message: str = ""
    data: dict[str, Any] = field(default_factory=dict)
```

### `WorkspaceProfile`, `LayoutPreset` — Layouts

```python
@dataclass
class WorkspaceProfile:
    name: str
    scope: Literal["global", "project"]
    docking_layout: str  # ImGui ini-string
    window_visibility: dict[str, bool] = field(default_factory=dict)
    panel_state: dict[str, dict] = field(default_factory=dict)
    auto_switch_triggers: list[str] = field(default_factory=list)
    description: str = ""


@dataclass
class LayoutPreset:
    name: str
    multi_viewport_state: dict[str, Any] = field(default_factory=dict)
    description: str = ""
```

### `RAGConfig`, `RAGChunk`, `RAGResult` — RAG

```python
@dataclass
class RAGConfig:
    enabled: bool = False
    source: Literal["chromadb", "external_mcp"] = "chromadb"
    embedding_provider: str = "gemini-embedding-001"
    chunk_size: int = 512
    chunk_overlap: int = 64
    top_k: int = 5
    external_mcp_server: str | None = None


@dataclass
class RAGChunk:
    text: str
    source_path: str
    start_line: int
    end_line: int
    embedding: list[float] = field(default_factory=list)


@dataclass
class RAGResult:
    chunks: list[RAGChunk]
    query: str
    distance_threshold: float = 0.0
```

---

## Constants

The file also defines several module-level constants used across the app:

```python
# Provider routing
PROVIDERS: list[str] = ["gemini", "anthropic", "deepseek", "MiniMax", "gemini-cli"]

# Tool categories (for Tool Bias)
TOOL_CATEGORIES: list[str] = [
    "File I/O", "Python AST", "C/C++ AST", "Analysis",
    "Network", "Runtime", "Beads",
]

# MMA tier -> default persona
DEFAULT_TIER_PERSONAS: dict[str, str] = {
    "tier1": "orchestrator",
    "tier2": "tech-lead",
    "tier3": "worker",
    "tier4": "qa",
}

# AGENT_TOOL_NAMES — the canonical list of all 45 tool names
AGENT_TOOL_NAMES: list[str] = [
    "read_file", "list_directory", "search_files", "get_file_summary",
    "get_file_slice", "set_file_slice", "edit_file",
    # ... all 45 ...
]
```

These constants eliminate the **scattered list definitions** problem: every module imports the same source of truth.

---

## Serialization

Models use a mix of strategies:

- **`pydantic` models**: For TOML round-trip with validation (Persona, Preset, ContextPreset, ToolPreset, WorkspaceProfile, RAGConfig).
- **`dataclasses.asdict()`**: For JSON-L logging (CommsLogEntry, LogEntry, DiscussionEntry, Message).
- **Custom tomli-w / tomllib**: For the modules that need precise control over TOML output ordering.

Most serialization is done by the **manager classes** (PresetManager, PersonaManager, etc.); the model itself is pure data.

---

## Validation

`pydantic` validators enforce constraints:

```python
from pydantic import BaseModel, Field, field_validator


class Preset(BaseModel):
    name: str = Field(..., min_length=1, max_length=64)
    temperature: float = Field(0.7, ge=0.0, le=2.0)
    top_p: float = Field(0.95, ge=0.0, le=1.0)
    max_output_tokens: int = Field(8192, ge=1, le=200000)

    # pydantic v2 style, consistent with the .model_dump() calls used by the managers
    @field_validator("name")
    @classmethod
    def name_must_be_safe(cls, v: str) -> str:
        # Preset names become file names, so reject path separators.
        if "/" in v or "\\" in v:
            raise ValueError("name cannot contain path separators")
        return v
```

Validators run on load and on save.
The managers call `.model_dump()` / `Preset.model_validate(dict)` to round-trip.

---

## The `parse_plan_md` Function

A critical utility that converts a markdown plan file into a list of `Ticket` objects; the caller then wraps them in a `Track`:

```python
import re
from pathlib import Path


def parse_plan_md(plan_path: Path) -> list[Ticket]:
    """Parse a plan.md file into a list of Ticket objects."""
    text = plan_path.read_text(encoding="utf-8")
    tickets = []
    current_phase = None
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        # Phase heading (tracked here, but not stored on the ticket in this excerpt)
        if line.startswith("# "):
            current_phase = line[2:].strip()
            continue
        # Ticket line: "- [x] T-1: title [depends: T-0, T-2]"
        m = re.match(r'^\s*-\s*\[(.)\]\s*(.+?)(?:\s*\[depends:\s*([^\]]+)\])?\s*$', line)
        if not m:
            continue
        marker, rest, deps = m.groups()
        # Unknown markers fall back to "pending"
        status = {" ": "pending", "~": "running", "x": "done", "!": "blocked"}.get(marker, "pending")
        # Split rest into ticket_id and title
        id_match = re.match(r'(\S+):\s*(.+)', rest)
        if id_match:
            tid, title = id_match.groups()
        else:
            tid, title = rest, rest
        tickets.append(Ticket(
            ticket_id=tid.strip(),
            title=title.strip(),
            description="",
            status=TicketStatus(status),
            depends_on=[d.strip() for d in (deps or "").split(",") if d.strip()],
        ))
    return tickets
```

The DAG engine uses the returned `Ticket` objects to build the dependency graph.

---

## The `AppState` Class

A separate large dataclass that aggregates all GUI-visible state. **Lives in `src/app_controller.py`**, not here, because it holds the controller's runtime state (not a pure data model). It still follows the same conventions (typed fields, no methods, SDM tags).

---

## How Models Are Used

### In `src/presets.py`

```python
def save_preset(self, preset: Preset) -> None:
    data = preset.model_dump()
    # Use a context manager so the file handle is closed after writing.
    with open(self.presets_path, "wb") as f:
        tomli_w.dump(data, f)
```

### In `src/ai_client.py`

```python
def send(self, request: AIRequest) -> AIResponse:
    """Sends a request. AIRequest is defined in models.py."""
```

### In `src/multi_agent_conductor.py`

```python
def load_track(self, track_id: str) -> Track:
    # plan_path lookup is omitted in this excerpt
    tickets = parse_plan_md(plan_path)
    return Track(
        track_id=track_id,
        title=...,
        description=...,
        tickets=tickets,
        plan_path=str(plan_path),
        created_at=time.time(),
    )
```

---

## Testing

Models are tested for:

- **Round-trip serialization** (`to_toml` → `from_toml` → equal)
- **Validation** (invalid values rejected)
- **Default values** (optional fields fall back to sensible defaults)
- **Field types** (TypeScript-like strict checking via `pydantic`)

Tests live in `tests/test_models.py` and module-specific test files (e.g., `tests/test_preset_manager.py` exercises the `Preset` model).

---

## Adding a New Model

1. Add the model to the appropriate region block in `src/models.py`.
2. Add validators if any fields have constraints.
3. Add a docstring with `[C: ...]` (callers) and `[M: ...]` (mutators) SDM tags.
4. If the model is persisted, write a `to_<format>()` / `from_<format>()` pair (for example `to_toml()` / `from_toml()`) in the relevant manager.
5. Add tests in `tests/test_models.py` (round-trip + validation).
6. Update `docs/guide_models.md` (this file) to document the new model.
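
The checklist above can be sketched end to end with a throwaway model. Everything here is hypothetical: `ScratchNote`, the `ScratchNoteManager` names inside the SDM tags, and the JSON-based `to_json` / `from_json` pair are stand-ins chosen so the example runs on the standard library alone. A real model would use the project's own managers and formats.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class ScratchNote:
    """Hypothetical model used only for this example.

    [C: ScratchNoteManager.load, ScratchNoteManager.save]
    [M: ScratchNoteManager.rename]
    """
    name: str
    body: str = ""
    tags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Step 2: enforce field constraints at construction time.
        if not self.name or "/" in self.name or "\\" in self.name:
            raise ValueError("name must be non-empty and contain no path separators")


# Step 4: the serialization pair would live in a manager; shown as plain functions here.
def to_json(note: ScratchNote) -> str:
    return json.dumps(asdict(note), sort_keys=True)


def from_json(raw: str) -> ScratchNote:
    return ScratchNote(**json.loads(raw))


# Step 5: the round-trip and validation checks a test file would contain.
note = ScratchNote(name="todo", body="check plan", tags=["mma"])
restored = from_json(to_json(note))
print(restored == note)  # True

try:
    ScratchNote(name="a/b")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

Dataclass equality compares field by field, so `restored == note` is a sufficient round-trip check for a flat model like this one.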

---

## See Also

- **[guide_architecture.md](guide_architecture.md)** — How models flow through the system
- **[guide_app_controller.md](guide_app_controller.md)** — `AppState` and controller-owned models
- **[guide_mma.md](guide_mma.md)** — `Ticket`, `Track`, `WorkerContext` usage in MMA
- **[guide_personas.md](guide_personas.md)** — `Persona` model in detail
- **[guide_workspace_profiles.md](guide_workspace_profiles.md)** — `WorkspaceProfile` model in detail
- **[guide_rag.md](guide_rag.md)** — `RAGConfig`, `RAGChunk`, `RAGResult` models
- **[guide_context_aggregation.md](guide_context_aggregation.md)** — How the `FileItem` and `ContextPreset` schemas flow through the `aggregate.py` pipeline
- **[guide_discussions.md](guide_discussions.md)** — The entry dict shape (`{role, content, collapsed, ts, ...}`) consumed by `parse_history_entries`
- **`src/presets.py`, `src/personas.py`, `src/context_presets.py`, `src/tool_presets.py`** — Managers that use these models
- **`src/multi_agent_conductor.py`** — Uses `Ticket`, `Track`, `WorkerContext`
- **[conductor/tracks/nagent_review_20260608/report.md §6](../conductor/tracks/nagent_review_20260608/report.md)** — Deep-dive on the `FileItem` schema as Manual Slop's strongest curation dimension
- **`src/ai_client.py`** — Uses `Provider`, `ModelInfo`, `AIRequest`, `AIResponse`