MMA_Support draft

check point support MMA
2026-02-24 19:11:15 -05:00 · 2026-02-24 19:03:22 -05:00
9 changed files with 1830 additions and 0 deletions
@@ -0,0 +1,32 @@
+# Data Pipelines, Memory Views & Configuration
+
+The 4-Tier Architecture relies on strictly managed data pipelines and configuration files to prevent token bloat and keep the execution environment deterministic and safe.
+
+## 1. AST Extraction Pipelines (Memory Views)
+
+To keep LLMs from hallucinating or consuming massive context windows, access to raw file text is heavily restricted. `file_cache.py` uses Tree-sitter for deterministic Abstract Syntax Tree (AST) parsing to generate four specific views:
+
+1.  **The Directory Map (Tier 1):** Just filenames and nested paths (e.g., output of `tree /F`). No source code.
+2.  **The Skeleton View (Tier 2 & 3 Dependencies):** Extracts only `class` and `def` signatures, parameters, and type hints. Strips all docstrings and function bodies, replacing them with `pass`. Used for foreign modules a worker must call but not modify.
+3.  **The Curated Implementation View (Tier 2 Target Modules):**
+    *   Keeps class/struct definitions.
+    *   Keeps module-level docstrings and block comments (heuristics).
+    *   Keeps full bodies of functions marked with `@core_logic` or `# [HOT]`.
+    *   Replaces standard function bodies with `... # Hidden`.
+4.  **The Raw View (Tier 3 Target File):** Unredacted, line-by-line source code of the *single* file a Tier 3 worker is assigned to modify.
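The Skeleton View rule can be illustrated with a minimal sketch. It uses Python's standard-library `ast` module as a stand-in for Tree-sitter (an assumption made to keep the example dependency-free), and the names `skeleton_view` and `_strip_bodies` are hypothetical. It keeps only `class` and `def` headers and replaces every function body with `pass`; imports, constants, and docstrings are dropped.

```python
import ast


def _strip_bodies(body: list[ast.stmt]) -> list[ast.stmt]:
    """Keep only class/def nodes and replace function bodies with `pass`."""
    kept: list[ast.stmt] = []
    for node in body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Signature, parameters, type hints, and decorators survive;
            # the body (including its docstring) does not.
            node.body = [ast.Pass()]
            kept.append(node)
        elif isinstance(node, ast.ClassDef):
            # Recurse so methods and nested classes are reduced too.
            node.body = _strip_bodies(node.body) or [ast.Pass()]
            kept.append(node)
    return kept


def skeleton_view(source: str) -> str:
    """Return a signatures-only rendering of a Python module."""
    tree = ast.parse(source)
    tree.body = _strip_bodies(tree.body)
    return ast.unparse(tree)  # requires Python 3.9+
```

A Tree-sitter version would follow the same shape (walk the tree, keep definition headers, drop bodies) but would work across languages and preserve comments, which `ast` discards.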
+
+## 2. Configuration Schema
+
+The architecture separates sensitive credentials from AI behavior routing.
+
+*   **`credentials.toml` (Security Prerequisite):** Holds the raw API credentials (`gemini_api_key`, `anthropic_api_key`, `deepseek_api_key`). **This file must be in `.gitignore`.** It is loaded only to instantiate HTTP clients.
+*   **`project.toml` (Repo Rules):** Holds repository-specific bounds (e.g., "This project uses Python 3.12 and strictly follows PEP8").
+*   **`agents.toml` (AI Routing):** Defines the hardcoded hierarchy's operational behaviors. Includes fallback models (`default_expensive`, `default_cheap`), Tier 1/2 overarching parameters (temperature, base system prompts), and Tier 3 worker archetypes (`refactor`, `codegen`, `contract_stubber`) mapped to specific models (DeepSeek V3, Gemini Flash) and `trust_level` tags (`step` vs. `auto`).
+
+## 3. LLM Output Formats
+
+To ensure robust parser execution and avoid JSON string-escaping nightmares, the architecture uses a hybrid approach for LLM outputs depending on the Tier:
+
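+*   **Native Structured Outputs (JSON Schema forced by API):** Used for Tier 1 and Tier 2 routing and orchestration. When the provider supports schema-constrained decoding, the response is syntactically valid JSON that matches the schema, which allows clean parsing of `Track` and `Ticket` metadata by `pydantic`. The values themselves can still be wrong, so semantic validation remains necessary.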
+*   **XML Tags (`<file_path>`, `<file_content>`):** Used for Tier 3 Code Generation & Tools. It natively isolates syntax and requires zero string escaping. The UI/Orchestrator parses these via regex to safely extract raw Python code without bracket-matching failures.
+*   **Godot ECS Flat List (Linearized Entities with ID Pointers):** Instead of deeply nested JSON (which models tend to get wrong once the nesting spans around 500 tokens), Tier 1/2 Orchestrators define complex dependency DAGs as a flat list of items that reference each other by ID (e.g., `[Ticket id="tkt_impl" depends_on="tkt_stub"]`). The Python state machine reconstructs the DAG locally.
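The XML-tag format for Tier 3 output can be parsed with a short regular expression. The sketch below is one possible reading of that rule, assuming each `<file_path>` is immediately followed by its `<file_content>`; the function name `extract_files` is hypothetical.

```python
import re

# Lazy matching with DOTALL lets file content span lines and contain
# quotes, braces, or backslashes without any escaping.
_FILE_BLOCK = re.compile(
    r"<file_path>(?P<path>.*?)</file_path>\s*"
    r"<file_content>\n?(?P<content>.*?)</file_content>",
    re.DOTALL,
)


def extract_files(response: str) -> dict[str, str]:
    """Map each file path in an LLM response to its raw file content."""
    return {
        match["path"].strip(): match["content"]
        for match in _FILE_BLOCK.finditer(response)
    }
```

One caveat of this approach: content that itself contains the literal string `</file_content>` would be truncated at that point, so a real parser may need a sentinel or a length check.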
@@ -0,0 +1,46 @@
+# Iteration Plan (Implementation Tracks)
+
+To safely refactor a linear, single-agent codebase into the 4-Tier Multi-Model Architecture without breaking the working prototype, the implementation should be sequenced into these five isolated Epics (Tracks):
+
+## Track 1: The Memory Foundations (AST Parser)
+**Goal:** Build the engine that prevents token-bloat by turning massive source files into curated memory views.
+**Implementation Details:**
+1.  Integrate `tree-sitter` and language bindings into `file_cache.py`.
+2.  Build `ASTParser` extraction rules:
+    *   *Skeleton View:* Strip function/class bodies, preserving only signatures, parameters, and type hints.
+    *   *Curated View:* Preserve class structures, module docstrings, and bodies of functions marked `# [HOT]` or `@core_logic`. Replace standard bodies with `... # Hidden`.
+3.  **Acceptance:** `file_cache.get_curated_view('script.py')` returns a well-formed summary string that can be printed to the terminal and inspected.
+
+## Track 2: State Machine & Data Structures
+**Goal:** Define the rigid Python objects the AI agents will pass to each other, so that they rely on structured data rather than loose chat strings.
+**Implementation Details:**
+1.  Create `models.py` with `pydantic` or `dataclasses` for `Track` (Epic) and `Ticket` (Task).
+2.  Define `WorkerContext` holding the Ticket ID, assigned model (from `agents.toml`), isolated `credentials.toml` injection, and a `messages` payload array.
+3.  Add helper methods for state mutators (e.g., `ticket.mark_blocked()`, `ticket.mark_complete()`).
+4.  **Acceptance:** Instantiate a `Track` with 3 `Tickets` and successfully enforce state changes in Python without AI involvement.
+
+## Track 3: The Linear Orchestrator & Execution Clutch
+**Goal:** Build the synchronous, debuggable core loop that runs a single Tier 3 Worker and pauses for human approval.
+**Implementation Details:**
+1.  Create `multi_agent_conductor.py` with a `run_worker_lifecycle(ticket: Ticket)` function.
+2.  Inject context (Raw View from `file_cache.py`) and format the `messages` array for the API.
+3.  Implement the Clutch (HITL): `input()` pause for CLI or wait state for GUI before executing the returned tool (e.g., `write_file`). Allow manual memory mutation of the JSON payload.
+4.  **Acceptance:** The script sends a hardcoded Ticket to DeepSeek, pauses in the terminal showing a diff, waits for user approval, applies the diff via `mcp_client.py`, and wipes the worker's history.
+
+## Track 4: Tier 4 QA Interception
+**Goal:** Stop error traces from destroying the Worker's token window by routing crashes through a stateless translator.
+**Implementation Details:**
+1.  In `shell_runner.py`, intercept `stderr` whenever a command fails (e.g., `returncode != 0`).
+2.  Do *not* append `stderr` to the main Worker's history. Instead, instantiate a synchronous API call to the `default_cheap` model.
+3.  Prompt: *"You are an error parser. Output only a 1-2 sentence instruction on how to fix this syntax error."* Send the raw `stderr` and target file snippet.
+4.  Append the translated 20-word fix to the main Worker's history as a "System Hint".
+5.  **Acceptance:** A deliberate syntax error triggers the execution engine to silently ping the cheap API, returning a 20-word correction to the Worker instead of a 200-line stack trace.
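The interception flow in Track 4 might look roughly like the sketch below. The function `run_with_qa` and its `translate` parameter are hypothetical; `translate` stands in for the synchronous call to the `default_cheap` model, so no real provider API is assumed.

```python
import subprocess
from typing import Callable, Optional


def run_with_qa(cmd: list[str], translate: Callable[[str], str]) -> Optional[str]:
    """Run a command; on failure, return a short hint instead of raw stderr.

    Returns None when the command succeeds.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return None
    # Raw stderr is handed only to the stateless translator. The caller
    # appends the returned hint, not the trace, to the Worker's history.
    return f"System Hint: {translate(proc.stderr)}"
```

Passing the translator in as a callable keeps the runner testable without network access and makes it easy to swap the cheap model later.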
+
+## Track 5: UI Decoupling & Tier 1/2 Routing (The Final Boss)
+**Goal:** Bring the system online by letting Tier 1 and Tier 2 dynamically generate Tickets managed by the async Event Bus.
+**Implementation Details:**
+1.  Implement an `asyncio.Queue` in `multi_agent_conductor.py`.
+2.  Write Tier 1 & 2 system prompts forcing output as strict JSON arrays (Tracks and Tickets).
+3.  Write the Dispatcher async loop to convert JSON into `Ticket` objects and push to the queue.
+4.  Enforce the Stub Resolver: If a Ticket archetype is `contract_stubber`, pause dependent Tickets, run the stubber, trigger `file_cache.py` to rebuild the Skeleton View, then resume.
+5.  **Acceptance:** Vague prompt ("Refactor config system") results in Tier 1 Track, Tier 2 Tickets (Interface stub + Implementation). System executes stub, updates AST, and finishes implementation automatically (or steps through if Linear toggle is on).
@@ -0,0 +1,37 @@
+# The Orchestrator Engine & UI
+
+To transition from a linear, single-agent chat box to a multi-agent control center, the GUI must be decoupled from the LLM execution loops. A single-agent UI assumes a linear flow (*User types -> UI waits -> LLM responds -> UI updates*), which freezes the application if a Tier 1 PM waits for human approval while Tier 3 Workers run local tests in the background.
+
+## 1. The Async Event Bus (Decoupling UI from Agents)
+
+The GUI acts as a "dumb" renderer. It only renders state; it never manages state.
+
+*   **The Agent Bus (Message Queue):** A signaling system passes messages between agents, UI, and the filesystem. It must be safe for the concurrency model in use: `queue.Queue` between threads, `pyqtSignal` for Qt, or `asyncio.Queue` when every producer and consumer runs on the same event loop (`asyncio.Queue` itself is not thread-safe).
+*   **Background Workers:** When Tier 1 spawns a Tier 2 Tech Lead, the GUI does not wait. It pushes a `UserRequestEvent` to the Conductor's queue. The Conductor runs the LLM call asynchronously and fires `StateUpdateEvent`s back for the GUI to redraw.
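A minimal sketch of the bus, assuming the GUI and the Conductor live on separate threads and therefore share standard-library `queue.Queue` objects. The event class fields, the `conductor` function, and the `call_llm` parameter are hypothetical; only the event names come from the text above.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class UserRequestEvent:
    prompt: str


@dataclass
class StateUpdateEvent:
    status: str


def conductor(
    requests: queue.Queue,
    updates: queue.Queue,
    call_llm: Callable[[str], str],
) -> None:
    """Consume requests until a None sentinel arrives, emitting state updates."""
    while True:
        event = requests.get()
        if event is None:  # shutdown sentinel
            break
        updates.put(StateUpdateEvent("running"))
        result = call_llm(event.prompt)  # stand-in for the real LLM call
        updates.put(StateUpdateEvent(f"done: {result}"))
```

The GUI thread would poll `updates` (for example from a timer callback) and redraw; it never blocks on the LLM call itself.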
+
+## 2. The Execution Clutch (HITL)
+
+Every spawned worker panel implements an execution state toggle based on the `trust_level` defined in `agents.toml`.
+
+*   **Step Mode (Lock-step):** The worker pauses **twice** per cycle:
+    1.  *After* generating a response/tool-call, but *before* executing the tool. The GUI renders a preview (e.g., diff of lines 40-50) and offers `[Approve]`, `[Edit Payload]`, or `[Abort]`.
+    2.  *After* executing the tool, but *before* sending output back to the LLM (allows verification of the system output).
+*   **Auto Mode (Fire-and-forget):** The worker loops continuously until it outputs a "Task Complete" status to the Router.
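The first Step Mode pause (before tool execution) can be sketched as a small gate around the tool call. The names `run_tool`, `review`, and `execute` are hypothetical, and the second pause (after execution) is omitted for brevity.

```python
from typing import Any, Callable, Optional

# A reviewer returns ("approve" | "edit" | "abort", possibly edited payload).
Review = Callable[[dict], tuple[str, dict]]


def run_tool(
    payload: dict,
    execute: Callable[[dict], Any],
    trust_level: str,
    review: Review,
) -> Optional[Any]:
    """Execute a tool call, pausing for human review when in step mode."""
    if trust_level == "step":
        decision, payload = review(payload)
        if decision == "abort":
            return None  # the tool is never executed
        # "approve" and "edit" both continue, using whatever payload
        # the reviewer returned.
    elif trust_level != "auto":
        raise ValueError(f"unknown trust_level: {trust_level!r}")
    return execute(payload)
```

In a CLI the `review` callable could wrap `input()`; in a GUI it could wait on the `[Approve]`, `[Edit Payload]`, and `[Abort]` buttons.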
+
+## 3. Memory Mutation (The "Debug" Superpower)
+
+If a worker generates a flawed plan in Step Mode, the "Memory Mutator" allows the user to click the last message and edit the raw JSON/text directly before hitting "Approve." Because the history itself is rewritten mid-task, the model proceeds as if it had generated the correct idea, which avoids restarting the context window over a minor hallucination.
+
+## 4. The Global Execution Toggle
+
+A Global Execution Toggle overrides all individual agent trust levels for debugging race conditions or context leaks.
+
+*   **Mode = "async" (Production):** The Dispatcher throws Tickets into an `asyncio.TaskGroup`. They spawn instantly, fight for API rate limits, read the skeleton, and run in parallel.
+*   **Mode = "linear" (Debug):** The Dispatcher iterates through the array sequentially using a strict `for` loop. It `await`s full completion of Ticket 1 (including QA loops and code review) before instantiating the `WorkerAgent` for Ticket 2. This enforces a deterministic execution order and outputs state snapshots (`debug_state.json`) for manual verification.
+
+## 5. State Machine (Dataclasses)
+
+The Conductor relies on strict definitions for `Track` and `Ticket` to enforce state and UI rendering (e.g., using `dataclasses` or `pydantic`).
+
+*   **`Ticket`:** Contains `id`, `target_file`, `prompt`, `worker_archetype`, `status` (pending, running, blocked, step_paused, completed), and a `dependencies` list of Ticket IDs that must finish first.
+*   **`Track`:** Contains `id`, `title`, `description`, `status`, and a list of `Tickets`.
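One way to express these definitions with `dataclasses` is sketched below. The field names follow the list above; the `Status` enum, the mutator bodies, the `ready_tickets` helper, and the `tickets` attribute name are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    BLOCKED = "blocked"
    STEP_PAUSED = "step_paused"
    COMPLETED = "completed"


@dataclass
class Ticket:
    id: str
    target_file: str
    prompt: str
    worker_archetype: str
    status: Status = Status.PENDING
    dependencies: list[str] = field(default_factory=list)

    def mark_blocked(self) -> None:
        if self.status is Status.COMPLETED:
            raise ValueError(f"ticket {self.id} is already completed")
        self.status = Status.BLOCKED

    def mark_complete(self) -> None:
        self.status = Status.COMPLETED


@dataclass
class Track:
    id: str
    title: str
    description: str
    status: Status = Status.PENDING
    tickets: list[Ticket] = field(default_factory=list)

    def ready_tickets(self) -> list[Ticket]:
        """Pending tickets whose dependencies have all completed."""
        done = {t.id for t in self.tickets if t.status is Status.COMPLETED}
        return [
            t for t in self.tickets
            if t.status is Status.PENDING and set(t.dependencies) <= done
        ]
```

Making `Status` a `str` enum keeps the values JSON-friendly for snapshots while still rejecting arbitrary strings at construction time via `Status("...")`.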
@@ -0,0 +1,18 @@
+# System Specification: 4-Tier Hierarchical Multi-Model Architecture
+
+**Project:** `manual_slop` (or equivalent Agentic Co-Dev Prototype)
+
+**Core Philosophy:** Token Economy, Strict Memory Siloing, and Human-In-The-Loop (HITL) Execution.
+
+## 1. Architectural Overview
+
+This system rejects the "monolithic black-box" approach to agentic coding. Instead of passing an entire codebase into a single expensive context window, the architecture mimics a senior engineering department. It uses a 4-Tier hierarchy where cognitive load and context are aggressively filtered from top to bottom.
+
+Expensive, high-reasoning models manage metadata and architecture (Tier 1 & 2), while cheap, fast models handle repetitive syntax and error parsing (Tier 3 & 4).
+
+### 1.1 Core Paradigms
+
+* **Token Firewalling:** Error logs and deep history are never allowed to bubble up to high-tier models. The system relies heavily on abstracted AST views (Skeleton, Curated) rather than raw code when context allows.
+* **Context Amnesia:** Worker agents (Tier 3) have their trial-and-error histories wiped upon task completion to prevent context ballooning and hallucination.
+* **The Execution Clutch (HITL):** Agents operate according to the per-archetype `trust_level` defined in configuration. Trusted patterns run in `Auto` mode; untrusted or complex refactors run in `Step` mode, pausing before tool execution for human review and JSON history mutation.
+* **Interface-Driven Development (IDD):** The architecture inherently prioritizes the creation of contracts (stubs, schemas) before implementation, allowing workers to proceed in parallel without breaking cross-module boundaries.
@@ -0,0 +1,38 @@
+# Tier 1: The Top-Level Orchestrator (Product Manager)
+
+**Designated Models:** Gemini 3.1 Pro, Claude 3.5 Sonnet.
+**Execution Frequency:** Low (Start of feature, Macro-merge resolution).
+**Core Role:** Epic planning, architecture enforcement, and cross-module task delegation.
+
+The Tier 1 Orchestrator is the most capable and expensive model in the hierarchy. It operates strictly on metadata, summaries, and executive-level directives. It **never** sees raw implementation code.
+
+## Memory Context & Paths
+
+### Path A: Epic Initialization (Project Planning)
+*   **Trigger:** User drops a massive new feature request or architectural shift into the main UI.
+*   **What it Sees (Context):**
+    *   **The User Prompt:** The raw feature request.
+    *   **Project Meta-State:** `project.toml` (rules, allowed languages, dependencies).
+    *   **Repository Map:** A strict, file-tree outline (names and paths only).
+    *   **Global Architecture Docs:** High-level markdown files (e.g., `docs/guide_architecture.md`).
+*   **What it Ignores:** All source code, all AST skeletons, and all previous micro-task histories.
+*   **Output Format:** A JSON array (Godot ECS Flat List format) of `Tracks` (Jira Epics), identifying which modules will be affected, the required Tech Lead persona, and the severity level.
+
+### Path B: Track Delegation (Sprint Kickoff)
+*   **Trigger:** The PM is handing a defined Track down to a Tier 2 Tech Lead.
+*   **What it Sees (Context):**
+    *   **The Target Track:** The specific goal and Acceptance Criteria generated in Path A.
+    *   **Module Interfaces (Skeleton View):** Strict AST skeleton (just class/function definitions) *only* for the modules this specific Track is allowed to touch.
+    *   **Track Roster:** A list of currently active or completed Tracks to prevent duplicate work.
+*   **What it Ignores:** Unrelated module docs, original massive user prompt, implementation details.
+*   **Output Format:** A compiled "Track Brief" (system prompt + curated file list) passed to instantiate the Tier 2 Tech Lead panel.
+
+### Path C: Macro-Merge & Acceptance Review (Severity Resolution)
+*   **Trigger:** A Tier 2 Tech Lead reports "Track Complete" and submits a pull request/diff for a "High Severity" task.
+*   **What it Sees (Context):**
+    *   **Original Acceptance Criteria:** The Track's goals.
+    *   **Tech Lead's Executive Summary:** A ~200-word explanation of the chosen implementation algorithm.
+    *   **The Macro-Diff:** Actual changes made to the codebase.
+    *   **Curated Implementation View:** For boundary files, ensuring the merge doesn't break foreign modules.
+*   **What it Ignores:** Tier 3 Worker trial-and-error histories, Tier 4 error logs, raw bodies of unchanged functions.
+*   **Output Format:** "Approved" (commits to memory) OR "Rejected" with specific architectural feedback for Tier 2.
@@ -0,0 +1,46 @@
+# Tier 2: The Track Conductor (Tech Lead)
+
+**Designated Models:** Gemini 3.0 Flash, Gemini 2.5 Pro.
+**Execution Frequency:** Medium.
+**Core Role:** Module-specific planning, code review, spawning Worker agents, and Topological Dependency Graph management.
+
+The Tech Lead bridges the gap between high-level architecture and actual code syntax. It operates in a "need-to-know" state, utilizing AST parsing (`file_cache.py`) to keep token counts low while maintaining structural awareness of its assigned modules.
+
+## Memory Context & Paths
+
+### Path A: Sprint Planning (Task Delegation)
+*   **Trigger:** Tier 1 (PM) assigns a Track (Epic) and wakes up the Tech Lead.
+*   **What it Sees (Context):**
+    *   **The Track Brief:** Acceptance Criteria from Tier 1.
+    *   **Curated Implementation View (Target Modules):** AST-extracted class structures, docstrings, and `# [HOT]` function bodies for the 1-3 files this Track explicitly modifies.
+    *   **Skeleton View (Foreign Modules):** Only function signatures and return types for external dependencies.
+*   **What it Ignores:** The rest of the repository, the PM's overarching project-planning logic, raw line-by-line code of non-hot functions.
+*   **Output Format:** A JSON array (Godot ECS Flat List format) of discrete Tier 3 `Tickets` (e.g., Ticket 1: *Write DB migration script*, Ticket 2: *Update core API endpoints*), including `depends_on` pointers to construct an execution DAG.
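Reconstructing the execution DAG from a flat list with `depends_on` pointers is straightforward with the standard-library `graphlib` (Python 3.9+). In this sketch the ticket dictionaries and the function `execution_order` are hypothetical; only the `id` and `depends_on` keys come from the format described above.

```python
from graphlib import TopologicalSorter


def execution_order(tickets: list[dict]) -> list[str]:
    """Return ticket IDs in an order that respects every depends_on pointer.

    Raises ValueError for pointers to unknown tickets, and
    graphlib.CycleError if the dependencies form a cycle.
    """
    known = {t["id"] for t in tickets}
    graph: dict[str, set[str]] = {}
    for ticket in tickets:
        deps = set(ticket.get("depends_on", []))
        missing = deps - known
        if missing:
            raise ValueError(
                f"{ticket['id']} depends on unknown tickets: {sorted(missing)}"
            )
        graph[ticket["id"]] = deps
    # TopologicalSorter expects {node: predecessors}, which matches
    # the depends_on direction directly.
    return list(TopologicalSorter(graph).static_order())
```

Validating pointers locally is what makes the flat format robust: a hallucinated ID fails fast in Python instead of silently producing a malformed nested structure.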
+
+### Path B: Code Review (Local Integration)
+*   **Trigger:** A Tier 3 Contributor completes a Ticket and submits a diff, OR Tier 4 (QA) flags a persistent failure.
+*   **What it Sees (Context):**
+    *   **Specific Ticket Goal:** What the Contributor was instructed to do.
+    *   **Proposed Diff:** The exact line changes submitted by Tier 3.
+    *   **Test/QA Output:** Relevant logs from Tier 4 compiler checks.
+    *   **Curated Implementation View:** To cross-reference the proposed diff against the existing architecture.
+*   **What it Ignores:** The Contributor's internal trial-and-error chat history. It only sees the final submission.
+*   **Output Format:** *Approve* (merges diff into working branch and updates Curated View) or *Reject* (sends technical critique back to Tier 3).
+
+### Path C: Track Finalization (Upward Reporting)
+*   **Trigger:** All Tier 3 Tickets assigned to this Track are marked "Approved."
+*   **What it Sees (Context):**
+    *   **Original Track Brief:** To verify requirements were met.
+    *   **Aggregated Track Diff:** The sum total of all changes made across all Tier 3 Tickets.
+    *   **Dependency Delta:** A list of any new foreign modules or libraries imported.
+*   **What it Ignores:** The back-and-forth review cycles, original AST Curated View.
+*   **Output Format:** An Executive Summary and the final Macro-Diff, sent back to Tier 1.
+
+### Path D: Contract-First Delegation (Stub-and-Resolve)
+*   **Trigger:** Tier 2 evaluates a Track and detects a cross-module dependency (or a single massive refactor) requiring an undefined signature.
+*   **Role:** Force Interface-Driven Development (IDD) to prevent hallucination.
+*   **Execution Flow:**
+    1.  **Contract Definition:** Splits requirement into a `Stub Ticket`, `Consumer Ticket`, and `Implementation Ticket`.
+    2.  **Stub Generation:** Spawns a cheap Tier 3 worker (e.g., DeepSeek V3 `contract_stubber` archetype) to generate the empty function signature, type hints, and docstrings.
+    3.  **Skeleton Broadcast:** The stub merges, and the system instantly re-runs Tree-sitter to update the global Skeleton View.
+    4.  **Parallel Implementation:** Tier 2 simultaneously spawns the `Consumer` (codes against the skeleton) and the `Implementer` (fills the stub logic) in isolated contexts.
@@ -0,0 +1,35 @@
+# Tier 3: The Worker Agents (Contributors)
+
+**Designated Models:** DeepSeek V3/R1, Gemini 2.5 Flash.
+**Execution Frequency:** High (The core loop).
+**Core Role:** Generating syntax, writing localized files, running unit tests.
+
+The engine room of the system. Contributors execute the highest volume of API calls. Their memory context is ruthlessly pruned. By leveraging cheap, fast models, they operate with zero architectural anxiety—they just write the code they are assigned. They are "Amnesiac Workers," having their history wiped between tasks to prevent context ballooning.
+
+## Memory Context & Paths
+
+### Path A: Heads Down Execution (Task Execution)
+*   **Trigger:** Tier 2 (Tech Lead) hands down a hyper-specific Ticket.
+*   **What it Sees (Context):**
+    *   **The Ticket Prompt:** The exact, isolated instructions from Tier 2.
+    *   **The Target File (Raw View):** The raw, unredacted, line-by-line source code of *only* the specific file (or class/function) it was assigned to modify.
+    *   **Foreign Interfaces (Skeleton View):** Strict AST skeleton (signatures only) of external dependencies required by the ticket.
+*   **What it Ignores:** Epic/Track goals, Tech Lead's Curated View, other files in the same directory, parallel Tickets.
+*   **Output Format:** XML Tags (`<file_path>`, `<file_content>`) defining direct file modifications or `mcp_client.py` tool payloads.
+
+### Path B: Trial and Error (Local Iteration & Tool Execution)
+*   **Trigger:** The Contributor runs a local linter/test, encounters a syntax error, or the human pauses execution using "Step" mode.
+*   **What it Sees (Context):**
+    *   **Ephemeral Working History:** A short, rolling window of its last 2–3 attempts (e.g., "Attempt 1: Wrote code -> Tool Output: SyntaxError").
+    *   **Tier 4 (QA) Injections:** Compressed (20-50 token) fix recommendations from Tier 4 agents (e.g., "Add a closing bracket on line 42").
+    *   **Human Mutations:** Any direct edits made to its JSON history payload before proceeding.
+*   **What it Ignores:** Tech Lead code reviews, attempts older than the rolling window (wiped to save tokens).
+*   **Output Format:** Revised tool payloads until tests pass or the human approves.
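The rolling window of recent attempts maps naturally onto a bounded `collections.deque`. The class name `WorkingHistory` and its methods are hypothetical; the window size of 3 follows the "last 2–3 attempts" figure above.

```python
from collections import deque


class WorkingHistory:
    """Keeps only the most recent attempts; older ones are dropped."""

    def __init__(self, max_attempts: int = 3) -> None:
        # deque(maxlen=N) silently evicts the oldest entry on overflow.
        self._attempts: deque[str] = deque(maxlen=max_attempts)

    def record(self, attempt: str) -> None:
        self._attempts.append(attempt)

    def as_messages(self) -> list[str]:
        return list(self._attempts)

    def wipe(self) -> None:
        """Clear everything, e.g. before submitting the final diff upward."""
        self._attempts.clear()
```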
+
+### Path C: Task Submission (Micro-Pull Request)
+*   **Trigger:** The code executes cleanly, and "Step" mode is finalized into "Task Complete."
+*   **What it Sees (Context):**
+    *   **The Original Ticket:** To confirm instructions were met.
+    *   **The Final State:** The cleanly modified file or exact diff.
+*   **What it Ignores:** **All of Path B.** Before submission to Tier 2, the orchestrator wipes the messy trial-and-error history from the payload.
+*   **Output Format:** A concise completion message and the clean diff, sent up to Tier 2.
@@ -0,0 +1,33 @@
+# Tier 4: The Utility Agents (Compiler / QA)
+
+**Designated Models:** DeepSeek V3 (Lowest cost possible).
+**Execution Frequency:** On-demand (Intercepts local failures).
+**Core Role:** Single-shot, stateless translation of machine garbage into human English.
+
+Tier 4 acts as the financial firewall. It solves the expensive problem of feeding massive (e.g., 3,000-token) stack traces back into a mid-tier LLM's context window. Tier 4 agents wake up, translate errors, and immediately die.
+
+## Memory Context & Paths
+
+### Path A: The Stack Trace Interceptor (Translator)
+*   **Trigger:** A Tier 3 Contributor executes a script, resulting in a non-zero exit code with a massive `stderr` payload.
+*   **What it Sees (Context):**
+    *   **Raw Error Output:** The exact traceback from the runtime/compiler.
+    *   **Offending Snippet:** *Only* the specific function or 20-line block of code where the error originated.
+*   **What it Ignores:** Everything else. It is blind to the "Why" and focuses only on "What broke."
+*   **Output Format:** A surgical, highly compressed string (20-50 tokens) passed back into the Tier 3 Contributor's working memory (e.g., "Syntax Error on line 42: You missed a closing parenthesis. Add `)`").
+
+### Path B: The Linter / Formatter (Pedant)
+*   **Trigger:** Tier 3 believes it finished a Ticket, but pre-commit hooks (e.g., `ruff`, `eslint`) fail.
+*   **What it Sees (Context):**
+    *   **Linter Warning:** Specific error (e.g., "Line too long", "Missing type hint").
+    *   **Target File:** Code written by Tier 3.
+*   **What it Ignores:** Business logic. It only cares about styling rules.
+*   **Output Format:** A direct `sed` command or silent diff overwrite via tools to fix the formatting without bothering Tier 2 or consuming Tier 3 loops.
+
+### Path C: The Flaky Test Debugger (Isolator)
+*   **Trigger:** A localized unit test fails due to logic (e.g., `assert 5 == 4`), not a syntax crash.
+*   **What it Sees (Context):**
+    *   **Failing Test Function:** The exact `pytest` or `go test` block.
+    *   **Target Function:** The specific function being tested.
+*   **What it Ignores:** The rest of the test suite and module.
+*   **Output Format:** A quick diagnosis sent to Tier 3 (e.g., "The test expects an integer, but your function is currently returning a stringified float. Parse it and convert to `int`").
Author	SHA1	Message	Date
ed	22607b4ed2	MMA_Support draft	2026-02-24 19:11:15 -05:00
ed	f68a07e30e	check point support MMA	2026-02-24 19:03:22 -05:00