fix(conductor): Enforce strict statelessness and robust JSON parsing for subagents

feat(conductor): Add run_subagent script for stable headless skill invocation
docs(conductor): Add MMA_Support as the fallback source of truth to the core engine track
2026-02-24 23:36:41 -05:00 · 2026-02-24 23:17:45 -05:00 · 2026-02-24 23:03:14 -05:00 · 2026-02-24 22:57:28 -05:00 · 2026-02-24 22:45:17 -05:00 · 2026-02-24 22:44:36 -05:00
14 changed files with 472 additions and 50 deletions
@@ -0,0 +1,66 @@
+# Skill: MMA Tiered Orchestrator
+
+## Description
+This skill enforces the 4-Tier Hierarchical Multi-Model Architecture (MMA) directly within the Gemini CLI using Token Firewalling and sub-agent task delegation. It teaches the CLI how to act as a Tier 1/2 Orchestrator, dispatching stateless tasks to cheaper models using shell commands, thereby preventing massive error traces or heavy coding contexts from polluting the primary prompt context.
+
+<instructions>
+# MMA Token Firewall & Tiered Delegation Protocol
+
+You are operating as a Tier 1 Product Manager or Tier 2 Tech Lead within the MMA Framework. Your context window is extremely valuable and must be protected from token bloat (such as raw, repetitive code edits, trial-and-error histories, or massive stack traces).
+
+To accomplish this, you MUST delegate token-heavy or stateless tasks to "Tier 3 Contributors" or "Tier 4 QA Agents" by spawning secondary Gemini CLI instances via `run_shell_command`.
+
+**CRITICAL Prerequisite:**
+To avoid hanging the CLI and ensure proper environment authentication, you MUST NOT call the `gemini` command directly. Instead, you MUST use the wrapper script:
+`.\scripts\run_subagent.ps1 -Prompt "..."`
+
+## 1. The Tier 3 Worker (Heads-Down Coding)
+When you need to perform a significant code modification (e.g., refactoring a 500-line script, writing a massive class, or implementing a predefined spec):
+1. **DO NOT** attempt to write or use `replace`/`write_file` yourself. Your history will bloat.
+2. **DO** construct a single, highly specific prompt.
+3. **DO** spawn a sub-agent using `run_shell_command` pointing to the target file.
+   *Command:* `.\scripts\run_subagent.ps1 -Prompt "Modify [FILE_PATH] to implement [SPECIFIC_INSTRUCTION]. Only write the code, no pleasantries."`
+4. If you need the sub-agent to automatically apply changes instead of just returning the text, use `gemini run` or pipe the output appropriately. However, the best method is to let the sub-agent modify the code and return "Done."
+
+## 2. The Tier 4 QA Agent (Error Translation)
+If you run a local test (e.g., `npm test`, `pytest`, `go run`) via `run_shell_command` and it fails with a massive traceback (e.g., 200+ lines of `stderr`):
+1. **DO NOT** analyze the raw `stderr` in your own context window.
+2. **DO** immediately spawn a stateless Tier 4 agent to compress the error.
+3. *Command:* `.\scripts\run_subagent.ps1 -Prompt "Summarize this stack trace into a 20-word fix: [PASTE_SNIPPET_OF_STDERR_HERE]"`
+4. Use the 20-word fix returned by the Tier 4 agent to inform your next architectural decision or pass it to the Tier 3 worker.
+
+## 3. Context Amnesia (Phase Checkpoints)
+When you complete a major Phase or Track within the `conductor` workflow:
+1. Stage your changes and commit them.
+2. Draft a comprehensive summary of the state changes in a Git Note attached to the commit.
+3. Treat the checkpoint as a "Memory Wipe." Actively disregard previous conversational turns and trial-and-error histories. Rely exclusively on the newly generated Git Note and the physical state of the files on disk for your next Phase.
+</instructions>
+
+<examples>
+### Example 1: Spawning a Tier 4 QA Agent
+**User / System:** `pytest tests/test_gui.py` failed with 400 lines of output.
+**Agent (You):** 
+```json
+{
+  "command": ".\\scripts\\run_subagent.ps1 -Prompt \"Summarize this stack trace into a 20-word fix: [snip first 30 lines...]\"",
+  "description": "Spawning Tier 4 QA to compress error trace statelessly."
+}
+```
+
+### Example 2: Spawning a Tier 3 Worker
+**User:** Please implement the `ASTParser` class in `file_cache.py` as defined in Track 1.
+**Agent (You):**
+```json
+{
+  "command": ".\\scripts\\run_subagent.ps1 -Prompt \"Read file_cache.py and implement the ASTParser class using tree-sitter. Ensure you preserve docstrings but strip function bodies. Output the updated code or edit the file directly.\"",
+  "description": "Delegating implementation to a Tier 3 Worker."
+}
+```
+</examples>
+
+<triggers>
+- When asked to write large amounts of boilerplate or repetitive code.
+- When encountering a large error trace from a shell execution.
+- When explicitly instructed to act as a "Tech Lead" or "Orchestrator".
+- When managing complex, multi-file Track implementations.
+</triggers>
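A minimal Python sketch of how an orchestrator-side helper might assemble the Tier 4 payload shown in Example 1. The helper name `build_tier4_call`, the 30-line cap, and the choice of a PowerShell single-quoted prompt are assumptions for illustration and are not part of the skill; only the `.\scripts\run_subagent.ps1 -Prompt` invocation and the description string come from the text above.

```python
import json


def build_tier4_call(stderr: str, max_lines: int = 30) -> str:
    """Return a JSON tool-call payload for a stateless Tier 4 summary request.

    Hypothetical helper: the name and the max_lines default are assumptions.
    """
    # Keep only the head of the trace so the full output is never forwarded.
    snippet = "\n".join(stderr.splitlines()[:max_lines])
    prompt = "Summarize this stack trace into a 20-word fix: " + snippet
    # PowerShell single-quoted strings are literal; an embedded ' is written as ''.
    quoted = "'" + prompt.replace("'", "''") + "'"
    payload = {
        "command": ".\\scripts\\run_subagent.ps1 -Prompt " + quoted,
        "description": "Spawning Tier 4 QA to compress error trace statelessly.",
    }
    return json.dumps(payload, indent=2)
```

Truncating before quoting keeps the command short, and single quotes avoid variable expansion of any `$` characters that appear in the trace.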
@@ -10,7 +10,9 @@ To serve as an expert-level utility for personal developer use on small projects

 ## Key Features
 - **Multi-Provider Integration:** Supports both Gemini and Anthropic with seamless switching.
- **Explicit Execution Control:** All AI-generated PowerShell scripts require explicit human confirmation via interactive UI dialogs before execution.
+- **4-Tier Hierarchical Multi-Model Architecture:** Orchestrates an intelligent cascade of specialized models (Product Manager, Tech Lead, Contributor, QA) to isolate cognitive loads and minimize token burn.
+- **Strict Memory Siloing:** Employs AST-based interface extraction and "Context Amnesia" to provide workers only with the absolute minimum context required, preventing hallucination loops.
+- **Explicit Execution Control:** All AI-generated PowerShell scripts require explicit human confirmation via interactive UI dialogs before execution, supported by a global "Linear Execution Clutch" for deterministic debugging.
 - **Detailed History Management:** Rich discussion history with branching, timestamping, and specific git commit linkage per conversation.
 - **In-Depth Toolset Access:** MCP-like file exploration, URL fetching, search, and dynamic context aggregation embedded within a multi-viewport Dear PyGui/ImGui interface.
 - **Integrated Workspace:** A consolidated Hub-based layout (Context, AI Settings, Discussion, Operations) designed for expert multi-monitor workflows.
@@ -16,6 +16,8 @@

 ## Configuration & Tooling

+- **tree-sitter & tree-sitter-python:** For deterministic AST parsing and generation of curated "Skeleton Views" and interface-level memory structures.
+- **pydantic / dataclasses:** For defining strict state schemas (Tracks, Tickets) used in linear orchestration.
 - **tomli-w:** For writing TOML configuration files.
 - **psutil:** For system and process monitoring (CPU/Memory telemetry).
 - **uv:** An extremely fast Python package and project manager.
@@ -25,10 +25,14 @@ This file tracks all major tracks for the project. Each track has its own detail

 ---

- [ ] **Track: 4-Tier Architecture Implementation & Conductor Self-Improvement**
+- [x] **Track: 4-Tier Architecture Implementation & Conductor Self-Improvement**
 *Link: [./tracks/mma_implementation_20260224/](./tracks/mma_implementation_20260224/)*

 ---

 - [ ] **Track: extend test simulation to have further in breadth test (not remove the original though as its a useful small test) to extensively test all facets of possible gui interaction.**
 *Link: [./tracks/gui_sim_extension_20260224/](./tracks/gui_sim_extension_20260224/)*
+---
+
+- [ ] **Track: MMA Core Engine Implementation**
+*Link: [./tracks/mma_core_engine_20260224/](./tracks/mma_core_engine_20260224/)*
@@ -0,0 +1,9 @@
+# MMA Core Engine Implementation
+
+This track implements the 5 Core Epics defined during the MMA Architecture Evaluation.
+
+### Navigation
+- [Specification](./spec.md)
+- [Implementation Plan](./plan.md)
+- [Original Architecture Proposal / Meta-Track](../mma_implementation_20260224/index.md)
+- [MMA Support Directory (Source of Truth)](../../../MMA_Support/)
@@ -0,0 +1,6 @@
+{
+  "id": "mma_core_engine_20260224",
+  "title": "MMA Core Engine Implementation",
+  "status": "planning",
+  "created_at": "2026-02-24T00:00:00.000000"
+}
@@ -0,0 +1,48 @@
+# Implementation Plan: MMA Core Engine Implementation
+
+## Phase 1: Track 1 - The Memory Foundations (AST Parser)
+- [ ] Task: Dependency Setup
+    - [ ] Add `tree-sitter` and `tree-sitter-python` to `pyproject.toml` / `requirements.txt`
+- [ ] Task: Core Parser Class
+    - [ ] Create `ASTParser` in `file_cache.py`
+- [ ] Task: Skeleton View Extraction
+    - [ ] Write query to extract `function_definition` and `class_definition`
+    - [ ] Replace bodies with `pass`, keep type hints and signatures
+- [ ] Task: Curated View Extraction
+    - [ ] Keep class structures, module docstrings
+    - [ ] Preserve `@core_logic` or `# [HOT]` function bodies, hide others
+
+## Phase 2: Track 2 - State Machine & Data Structures
+- [ ] Task: The Dataclasses
+    - [ ] Create `models.py` defining `Ticket` and `Track`
+- [ ] Task: Worker Context Definition
+    - [ ] Define `WorkerContext` holding `Ticket` ID, model config, and ephemeral messages
+- [ ] Task: State Mutator Methods
+    - [ ] Implement `ticket.mark_blocked()`, `ticket.mark_complete()`, `track.get_executable_tickets()`
+
+## Phase 3: Track 3 - The Linear Orchestrator & Execution Clutch
+- [ ] Task: The Engine Core
+    - [ ] Create `multi_agent_conductor.py` containing `ConductorEngine` and `run_worker_lifecycle`
+- [ ] Task: Context Injection
+    - [ ] Format context strings using `file_cache.py` target AST views
+- [ ] Task: The HITL Execution Clutch
+    - [ ] Before executing `write_file`/`shell_runner.py` tools in step-mode, prompt user for confirmation
+    - [ ] Provide functionality to mutate the history JSON before resuming execution
+
+## Phase 4: Track 4 - Tier 4 QA Interception
+- [ ] Task: The Interceptor Loop
+    - [ ] Catch `subprocess.run()` execution errors inside `shell_runner.py`
+- [ ] Task: Tier 4 Instantiation
+    - [ ] Make a secondary API call to `default_cheap` model passing `stderr` and snippet
+- [ ] Task: Payload Formatting
+    - [ ] Inject the 20-word fix summary into the Tier 3 worker history
+
+## Phase 5: Track 5 - UI Decoupling & Tier 1/2 Routing (The Final Boss)
+- [ ] Task: The Event Bus
+    - [ ] Implement an `asyncio.Queue` linking GUI actions to the backend engine
+- [ ] Task: Tier 1 & 2 System Prompts
+    - [ ] Create structured system prompts for Epic routing and Ticket creation
+- [ ] Task: The Dispatcher Loop
+    - [ ] Read Tier 2 JSON flat-lists, construct Tickets, execute Stub resolution paths
+- [ ] Task: UI Component Update
+    - [ ] Refactor `gui_2.py` to push `UserRequestEvent` instead of blocking on API generation
@@ -0,0 +1,39 @@
+# Specification: MMA Core Engine Implementation
+
+## 1. Overview
+This track consolidates the implementation of the 4-Tier Hierarchical Multi-Model Architecture into the `manual_slop` codebase. The architecture transitions the current monolithic single-agent loop into a compartmentalized, token-efficient, and fully debuggable state machine.
+
+## 2. Functional Requirements
+
+### Phase 1: The Memory Foundations (AST Parser)
+- Integrate `tree-sitter` and `tree-sitter-python` into `pyproject.toml` / `requirements.txt`.
+- Implement `ASTParser` in `file_cache.py` to extract strict memory views (Skeleton View, Curated View).
+- Strip function bodies from dependencies while preserving `@core_logic` or `# [HOT]` logic for the target modules.
+
+### Phase 2: State Machine & Data Structures
+- Create `models.py` incorporating strict Pydantic/Dataclass schemas for `Ticket`, `Track`, and `WorkerContext`.
+- Enforce rigid state mutators governing dependencies between tickets (e.g., locking execution until a stub generation ticket completes).
+
+### Phase 3: The Linear Orchestrator & Execution Clutch
+- Build `multi_agent_conductor.py` and a `ConductorEngine` dispatcher loop.
+- Embed the "Execution Clutch" allowing developers to pause, review, and manually rewrite payloads (JSON history mutation) before applying changes to the local filesystem.
+
+### Phase 4: Tier 4 QA Interception
+- Augment `shell_runner.py` with try/except wrappers capturing process errors (`stderr`).
+- Rather than feeding raw stack traces to an expensive model, instantly forward them to a stateless `default_cheap` sub-agent for a 20-word summarization that is subsequently injected into the primary worker's context.
+
+### Phase 5: UI Decoupling & Tier 1/2 Routing (The Final Boss)
+- Disconnect `gui_2.py` from direct LLM inference requests.
+- Bind the GUI to an Event Bus (a synchronous queue or an `asyncio.Queue`) managed by the Orchestrator, allowing dynamic tracking of parallel worker executions without thread-locking the interface.
+
+## 3. Acceptance Criteria
+- [ ] A 1000-line script can be successfully parsed into a 100-line AST Skeleton.
+- [ ] Tickets properly block and resolve depending on stub-generation dependencies.
+- [ ] Shell errors are compressed into <50-token hints using the cheap utility model.
+- [ ] The GUI remains responsive during multi-model generation phases.
+
+## 4. Meta-Track Reference & Source of Truth
+For the original rationale, API formatting recommendations (e.g., Godot ECS schemas vs Nested JSON), and strict token firewall workflows, refer back to the architectural planning meta-track: `conductor/tracks/mma_implementation_20260224/`.
+
+**Fallback Source of Truth:**
+As a fallback, any track or sub-task should resolve its source of truth by referencing the `./MMA_Support/` directory. This directory contains the original design documents and raw discussions from which the entire `mma_implementation` track and 4-Tier Architecture were initially generated.
@@ -0,0 +1,128 @@
+# MMA Migration: Epics and Detailed Tasks
+
+## Track 1: The Memory Foundations (AST Parser)
+
+**Goal:** Build the engine that prevents token-bloat by turning massive source files into curated memory views.
+
+### 1. TDD Approach for `tree-sitter` Integration
+- Create `tests/test_file_cache_ast.py`.
+- Define mock Python source files containing various structures (classes, functions, docstrings, `@core_logic` decorators, `# [HOT]` comments).
+- Write failing tests that instantiate `ASTParser` and assert that `get_skeleton_view()` and `get_curated_view()` return the precisely filtered strings.
+- **Red Phase:** Ensure tests fail because `ASTParser` does not exist.
+- **Green Phase:** Implement the tree-sitter logic iteratively until strings match exactly.
+
+### 2. `ASTParser` Extraction Rules (Tasks)
+- **Task 1.1: Dependency Setup**
+  - Add `tree-sitter` and `tree-sitter-python` to `pyproject.toml` / `requirements.txt`.
+- **Task 1.2: Core Parser Class**
+  - Create `ASTParser` in `file_cache.py` that initializes the language parser.
+- **Task 1.3: Skeleton View Extraction**
+  - Write query to extract `function_definition` and `class_definition`.
+  - Keep signatures, parameters, and return type hints.
+  - Replace all bodies with `pass`.
+- **Task 1.4: Curated View Extraction**
+  - Write query to keep class structures and `expression_statement` docstrings.
+  - Implement heuristic to preserve full bodies of functions decorated with `@core_logic` or containing `# [HOT]` comments.
+  - Replace all other function bodies with `... # Hidden`.
+
+### 3. Acceptance Testing Criteria
+- **Unit Tests:** All AST parsing tests pass with >90% coverage for `file_cache.py`.
+- **Integration Test:** Execute the parser on a large, complex project file (e.g., `ai_client.py`). The output `Skeleton View` must be less than 15% of the original token count. The `Curated View` must correctly retain docstrings and marked functions while stripping standard bodies.
+
+## Track 2: State Machine & Data Structures
+
+**Goal:** Define the rigid Python objects (Pydantic/Dataclasses) that AI agents will pass to each other, enforcing structured data over loose chat strings.
+
+### 1. TDD Approach for `models.py`
+- Create `tests/test_models.py`.
+- Write failing tests that instantiate `Track`, `Ticket`, and `WorkerContext` with various valid and invalid schemas.
+- Write tests that assert state transitions (e.g., from `pending` to `locked`, from `step_paused` to `completed`) correctly update internal flags and dependencies.
+- **Red Phase:** Tests fail because `models.py` classes are undefined or lack transition methods.
+- **Green Phase:** Implement the dataclasses and state mutators.
+
+### 2. State Machine Tasks
+- **Task 2.1: The Dataclasses**
+  - Create `models.py`. Define `Ticket` (id, target_file, prompt, worker_archetype, status, dependencies).
+  - Define `Track` (id, title, description, status, tickets).
+- **Task 2.2: Worker Context Definition**
+  - Define `WorkerContext` holding a `Ticket` ID, assigned model, configuration injection, and an ephemeral `messages` array.
+- **Task 2.3: State Mutator Methods**
+  - Implement methods like `ticket.mark_blocked(dependency_id)`, `ticket.mark_complete()`, and `track.get_executable_tickets()`. Ensure strict validation of valid state transitions.
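A minimal dataclass sketch of the `Ticket` and `Track` objects and their mutators, under stated assumptions: the field names follow Task 2.1, the status strings `pending`, `locked`, and `completed` come from the test description above, and the default values and the exact transition rules (for example, refusing to complete a locked ticket) are guesses made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ticket:
    id: str
    target_file: str
    prompt: str
    worker_archetype: str
    status: str = "pending"  # assumed default
    dependencies: List[str] = field(default_factory=list)

    def mark_blocked(self, dependency_id: str) -> None:
        """Lock this ticket until `dependency_id` completes."""
        if self.status == "completed":
            raise ValueError("a completed ticket cannot be blocked")
        if dependency_id not in self.dependencies:
            self.dependencies.append(dependency_id)
        self.status = "locked"

    def mark_complete(self) -> None:
        if self.status == "locked":
            raise ValueError("a locked ticket cannot be completed")
        self.status = "completed"


@dataclass
class Track:
    id: str
    title: str
    description: str
    status: str = "planning"  # assumed default
    tickets: List[Ticket] = field(default_factory=list)

    def get_executable_tickets(self) -> List[Ticket]:
        """Unlock tickets whose dependencies are all complete and return them."""
        done = {t.id for t in self.tickets if t.status == "completed"}
        ready = []
        for ticket in self.tickets:
            if ticket.status == "completed":
                continue
            if all(dep in done for dep in ticket.dependencies):
                ticket.status = "pending"  # locked -> pending transition
                ready.append(ticket)
        return ready
```

Resolving dependencies inside `get_executable_tickets()` keeps the unlock step purely programmatic, which matches the integration test's requirement that transitions happen without any AI involvement.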
+
+### 3. Acceptance Testing Criteria
+- **Unit Tests:** `models.py` has 100% test coverage for all state transitions.
+- **Integration Test:** Instantiate a `Track` with 3 dependent `Tickets` in Python. Programmatically mark tickets as complete and assert that the subsequent dependent tickets transition from `locked` to `pending` without any AI involvement.
+
+## Track 3: The Linear Orchestrator & Execution Clutch
+
+**Goal:** Build the synchronous, debuggable core loop that runs a single Tier 3 Worker and pauses for human approval.
+
+### 1. TDD Approach for `multi_agent_conductor.py`
+- Create `tests/test_conductor.py`.
+- Write tests that mock the AI client response (e.g., returning a mock tool call like `write_file`).
+- Test that `run_worker_lifecycle(ticket: Ticket)` fetches the Raw View from `file_cache.py`, formats messages, and processes the mock output.
+- Test that execution pauses (waits for a simulated human signal) when the `trust_level` dictates.
+- **Red Phase:** Failure occurs because `multi_agent_conductor.py` lacks the lifecycle execution loop.
+- **Green Phase:** Implement the `ConductorEngine` core execution block.
+
+### 2. Linear Orchestration Tasks
+- **Task 3.1: The Engine Core**
+  - Create `multi_agent_conductor.py`. Implement the `ConductorEngine` class containing the `run_worker_lifecycle` synchronous execution.
+- **Task 3.2: Context Injection**
+  - Implement logic reading the Ticket target, querying `file_cache.py` for the `Raw View`, and formatting the messages array for the API.
+- **Task 3.3: The HITL Execution Clutch**
+  - Before executing tools via `mcp_client.py` or `shell_runner.py`, intercept the tool payload if the Worker's archetype dictates a `step` mode.
+  - Wait for explicit user confirmation via a CLI prompt (or event block for UI future-proofing). Allow editing of the JSON payload.
+  - Flush history upon `TicketCompleted`.
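The clutch described in Task 3.3 can be sketched as a small gate around tool execution. Everything named here (`execute_with_clutch`, the `review` and `execute` callables, the result dictionary) is hypothetical; the sketch only illustrates the intercept, edit, then execute flow, with the human prompt injected as a callable so that it can be replaced by a test double.

```python
import json
from typing import Any, Callable, Dict, Optional


def execute_with_clutch(
    payload: Dict[str, Any],
    execute: Callable[[Dict[str, Any]], Any],
    review: Callable[[str], Optional[str]],
    mode: str = "step",
) -> Dict[str, Any]:
    """Run a tool payload, pausing for review first when mode is "step".

    `review` receives the payload as JSON text and returns the (possibly
    edited) JSON text, or None to reject the call.
    """
    if mode == "step":
        edited = review(json.dumps(payload, indent=2))
        if edited is None:
            return {"status": "rejected", "payload": payload}
        payload = json.loads(edited)  # apply the human's edits
    return {"status": "executed", "payload": payload, "result": execute(payload)}
```

In a CLI setting, `review` could print the JSON and read the edited text from the terminal; keeping it as a parameter also leaves room for the event-based UI variant mentioned above.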
+
+### 3. Acceptance Testing Criteria
+- **Unit Tests:** Context generation, API schema mapping, and event-blocking are tested for all edge cases.
+- **Integration Test:** Manually execute a script pointing the `ConductorEngine` at a dummy file. The CLI should pause before `write_file` execution, display the diff, allow manual JSON editing via terminal input, apply the file modification described by the updated JSON, and return `Task Complete`.
+
+## Track 4: Tier 4 QA Interception
+
+**Goal:** Stop error traces from destroying the Worker's token window by routing crashes through a cheap, stateless translator.
+
+### 1. TDD Approach for `shell_runner.py`
+- Create `tests/test_shell_runner.py`.
+- Write tests that mock a local execution failure (e.g., returning a mock 3000-line Python stack trace).
+- Test that the error is intercepted and passed to a mock Tier 4 agent.
+- Test that the output is compressed into a 20-word fix before returning.
+- **Red Phase:** Fails because no interception loop exists in `shell_runner.py`.
+- **Green Phase:** Implement the try/except logic handling `subprocess.run()` with `returncode != 0`.
+
+### 2. QA Interception Tasks
+- **Task 4.1: The Interceptor Loop**
+  - Open `shell_runner.py` and catch execution errors.
+- **Task 4.2: Tier 4 Instantiation**
+  - Construct a secondary, synchronous API call directly to the `default_cheap` model, sending the raw `stderr` and the offending code snippet.
+- **Task 4.3: Payload Formatting**
+  - Inject the 20-word fix response from the Tier 4 agent back into the main Tier 3 worker's history context as a system hint.
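A hedged sketch of the interception flow in Tasks 4.1 to 4.3. The function name `run_with_qa`, the `summarize` callable standing in for the `default_cheap` model call, and the `[QA hint]` prefix are all assumptions. Note that `subprocess.run()` only raises on a non-zero exit when `check=True` is passed, so this sketch inspects `returncode` directly rather than relying on an exception.

```python
import subprocess
import sys
from typing import Callable, List


def run_with_qa(
    cmd: List[str],
    summarize: Callable[[str], str],
    max_words: int = 20,
) -> str:
    """Run `cmd`; on failure return a compressed hint instead of raw stderr."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return proc.stdout
    # `summarize` is a placeholder for the stateless Tier 4 call.
    hint = summarize(proc.stderr)
    # Enforce the word budget even if the summarizer overshoots.
    return "[QA hint] " + " ".join(hint.split()[:max_words])


if __name__ == "__main__":
    failing = [sys.executable, "-c", "raise SystemExit('boom')"]
    print(run_with_qa(failing, lambda stderr: "stub summary of: " + stderr))
```

Capping the hint after the summarizer returns means the Tier 3 history stays bounded even when the cheap model ignores the 20-word instruction.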
+
+### 3. Acceptance Testing Criteria
+- **Unit Tests:** Verify that massive error outputs never leak uncompressed into the main history logs.
+- **Integration Test:** Purposely introduce a syntax error in a local script. Ensure the orchestrator catches it, pings the mock/cheap API, and the history log receives the 20-word hint instead of the 200-line stack trace.
+
+## Track 5: UI Decoupling & Tier 1/2 Routing (The Final Boss)
+
+**Goal:** Bring the whole system online by letting Tier 1 and Tier 2 generate Tickets dynamically, managed via an asynchronous Event Bus.
+
+### 1. TDD Approach for `gui_2.py` Decoupling
+- Create `tests/test_gui_decoupling.py`.
+- Write tests that instantiate a mocked GUI instance listening to an `asyncio.Queue`.
+- Mock pushing `TrackStateUpdated` and `TicketStarted` events into the queue and ensure the GUI updates its view state rather than calling LLM endpoints directly.
+- **Red Phase:** Failure occurs because `gui_2.py` is tightly coupled with `ai_client.py` logic.
+- **Green Phase:** Implement the `AgentBus` messaging system linking `multi_agent_conductor.py` to `gui_2.py`.
+
+### 2. Tier 1/2 Routing Tasks
+- **Task 5.1: The Event Bus**
+  - Implement an `asyncio.Queue` in `multi_agent_conductor.py`.
+- **Task 5.2: Tier 1 & 2 System Prompts**
+  - Define system prompts that force the 3.1 Pro/3.5 Sonnet models to output strict JSON arrays defining the Tracks and Tickets (utilizing native Structured Outputs).
+- **Task 5.3: The Dispatcher**
+  - Write an async loop that reads JSON from Tier 2, converts them into `Ticket` objects, and pushes them onto the queue.
+  - Implement the Stub Resolver to enforce `contract_stubber` dependent execution flow.
+- **Task 5.4: UI Component Update**
+  - Remove direct LLM calls from `gui_2.py`. Wire user inputs into `UserRequestEvents` for the Orchestrator's queue.
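The event-bus decoupling above can be sketched with a plain `asyncio.Queue`. The event class names `TicketStarted` and `TrackStateUpdated` come from the test description; their fields, the `None` shutdown sentinel, the `view_state` dictionary, and the `gui_consumer`/`demo` functions are assumptions made for this illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class TicketStarted:
    ticket_id: str  # assumed field


@dataclass
class TrackStateUpdated:
    track_id: str  # assumed field
    status: str    # assumed field


async def gui_consumer(bus: asyncio.Queue, view_state: Dict[str, Any]) -> None:
    """Update view state from bus events; never call an LLM endpoint."""
    while True:
        event = await bus.get()
        if event is None:  # shutdown sentinel (an assumption of this sketch)
            break
        if isinstance(event, TicketStarted):
            view_state["active_ticket"] = event.ticket_id
        elif isinstance(event, TrackStateUpdated):
            view_state["tracks"][event.track_id] = event.status


async def demo() -> Dict[str, Any]:
    bus: asyncio.Queue = asyncio.Queue()
    view_state: Dict[str, Any] = {"active_ticket": None, "tracks": {}}
    consumer = asyncio.create_task(gui_consumer(bus, view_state))
    # The orchestrator side only pushes events onto the queue.
    await bus.put(TrackStateUpdated("mma_core_engine_20260224", "in_progress"))
    await bus.put(TicketStarted("T-1"))
    await bus.put(None)
    await consumer
    return view_state
```

Because the GUI side only awaits `bus.get()`, a slow model call on the producer side cannot block view updates that are already queued.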
@@ -1,50 +1,50 @@
 # Implementation Plan: 4-Tier Architecture Implementation & Conductor Self-Improvement

-## Phase 1: `manual_slop` Migration Planning
- [ ] Task: Synthesize MMA Documentation
-    - [ ] Read and analyze `./MMA_Support/Data_Pipelines_and_Config.md` and `./MMA_Support/OriginalDiscussion.md`
-    - [ ] Read and analyze `./MMA_Support/Tier1_Orchestrator.md` through `./MMA_Support/Tier4_Utility.md`
-    - [ ] Document key takeaways and constraints for the migration plan
- [ ] Task: Draft Track 1 - The Memory Foundations (AST Parser)
-    - [ ] Define TDD approach for `tree-sitter` integration in `file_cache.py`
-    - [ ] Specify tasks for `ASTParser` extraction rules (Skeleton View, Curated View)
-    - [ ] Define acceptance testing criteria for AST extraction
- [ ] Task: Draft Track 2 - State Machine & Data Structures
-    - [ ] Define TDD approach for `models.py` (`Track`, `Ticket`, `WorkerContext`)
-    - [ ] Specify tasks for state mutator methods
-    - [ ] Define acceptance testing criteria for state transitions
- [ ] Task: Draft Track 3 - The Linear Orchestrator & Execution Clutch
-    - [ ] Define TDD approach for `multi_agent_conductor.py` (`run_worker_lifecycle`)
-    - [ ] Specify tasks for context injection and HITL Clutch implementation
-    - [ ] Define acceptance testing criteria for the linear orchestration loop
- [ ] Task: Draft Track 4 - Tier 4 QA Interception
-    - [ ] Define TDD approach for `shell_runner.py` stderr interception
-    - [ ] Specify tasks for routing errors to the cheap API model
-    - [ ] Define acceptance testing criteria for the QA interception loop
- [ ] Task: Draft Track 5 - UI Decoupling & Tier 1/2 Routing (The Final Boss)
-    - [ ] Define TDD approach for async queue in `multi_agent_conductor.py`
-    - [ ] Specify tasks for Tier 1 & 2 system prompts and the Dispatcher async loop
-    - [ ] Define acceptance testing criteria for UI decoupling and dynamic routing
- [ ] Task: Conductor - User Manual Verification '`manual_slop` Migration Planning' (Protocol in workflow.md)
+## Phase 1: `manual_slop` Migration Planning [checkpoint: e07e8e5]
+- [x] Task: Synthesize MMA Documentation [46b351e]
+    - [x] Read and analyze `./MMA_Support/Data_Pipelines_and_Config.md` and `./MMA_Support/OriginalDiscussion.md`
+    - [x] Read and analyze `./MMA_Support/Tier1_Orchestrator.md` through `./MMA_Support/Tier4_Utility.md`
+    - [x] Document key takeaways and constraints for the migration plan
+- [x] Task: Draft Track 1 - The Memory Foundations (AST Parser) [bdd935d]
+    - [x] Define TDD approach for `tree-sitter` integration in `file_cache.py`
+    - [x] Specify tasks for `ASTParser` extraction rules (Skeleton View, Curated View)
+    - [x] Define acceptance testing criteria for AST extraction
+- [x] Task: Draft Track 2 - State Machine & Data Structures [1198aee]
+    - [x] Define TDD approach for `models.py` (`Track`, `Ticket`, `WorkerContext`)
+    - [x] Specify tasks for state mutator methods
+    - [x] Define acceptance testing criteria for state transitions
+- [x] Task: Draft Track 3 - The Linear Orchestrator & Execution Clutch [aaeed92]
+    - [x] Define TDD approach for `multi_agent_conductor.py` (`run_worker_lifecycle`)
+    - [x] Specify tasks for context injection and HITL Clutch implementation
+    - [x] Define acceptance testing criteria for the linear orchestration loop
+- [x] Task: Draft Track 4 - Tier 4 QA Interception [584bff9]
+    - [x] Define TDD approach for `shell_runner.py` stderr interception
+    - [x] Specify tasks for routing errors to the cheap API model
+    - [x] Define acceptance testing criteria for the QA interception loop
+- [x] Task: Draft Track 5 - UI Decoupling & Tier 1/2 Routing (The Final Boss) [67734c9]
+    - [x] Define TDD approach for async queue in `multi_agent_conductor.py`
+    - [x] Specify tasks for Tier 1 & 2 system prompts and the Dispatcher async loop
+    - [x] Define acceptance testing criteria for UI decoupling and dynamic routing
+- [x] Task: Conductor - User Manual Verification '`manual_slop` Migration Planning' (Protocol in workflow.md) [e07e8e5]

-## Phase 2: Conductor Self-Reflection & Upgrade Strategy
- [ ] Task: Research Optimal Proposal Format
-    - [ ] Search Gemini CLI documentation for extension guidelines
-    - [ ] Search Conductor documentation for tuning and advice
-    - [ ] Define the structure for `proposal.md` based on findings
- [ ] Task: Draft Proposal - Memory Siloing & Token Firewalling
-    - [ ] Evaluate current `conductor` context management
-    - [ ] Propose strategies to prevent token bloat during planning and execution
-    - [ ] Write the corresponding section in `proposal.md`
- [ ] Task: Draft Proposal - Execution Clutch & Linear Debug Mode
-    - [ ] Evaluate current `conductor` execution workflows
-    - [ ] Propose mechanisms for manual step-through and auto modes
-    - [ ] Write the corresponding section in `proposal.md`
- [ ] Task: Draft Proposal - Multi-Model/Sub-Agent Delegation
-    - [ ] Evaluate current `conductor` single-model reliance
-    - [ ] Propose a design for delegating tasks (e.g., summarization, syntax-fixing) to sub-agents
-    - [ ] Write the corresponding section in `proposal.md`
- [ ] Task: Review and Finalize Proposal
-    - [ ] Ensure all three core areas are addressed with equal priority
-    - [ ] Verify alignment with the overall 4-Tier Architecture philosophy
- [ ] Task: Conductor - User Manual Verification 'Conductor Self-Reflection & Upgrade Strategy' (Protocol in workflow.md)
+## Phase 2: Conductor Self-Reflection & Upgrade Strategy [checkpoint: 40339a1]
+- [x] Task: Research Optimal Proposal Format [0c5f8b9]
+    - [x] Search Gemini CLI documentation for extension guidelines
+    - [x] Search Conductor documentation for tuning and advice
+    - [x] Define the structure for `proposal.md` based on findings
+- [x] Task: Draft Proposal - Memory Siloing & Token Firewalling [59556d1]
+    - [x] Evaluate current `conductor` context management
+    - [x] Propose strategies to prevent token bloat during planning and execution
+    - [x] Write the corresponding section in `proposal.md`
+- [x] Task: Draft Proposal - Execution Clutch & Linear Debug Mode [baff5c1]
+    - [x] Evaluate current `conductor` execution workflows
+    - [x] Propose mechanisms for manual step-through and auto modes
+    - [x] Write the corresponding section in `proposal.md`
+- [x] Task: Draft Proposal - Multi-Model/Sub-Agent Delegation [f62bf31]
+    - [x] Evaluate current `conductor` single-model reliance
+    - [x] Propose a design for delegating tasks (e.g., summarization, syntax-fixing) to sub-agents
+    - [x] Write the corresponding section in `proposal.md`
+- [x] Task: Review and Finalize Proposal [f62bf31]
+    - [x] Ensure all three core areas are addressed with equal priority
+    - [x] Verify alignment with the overall 4-Tier Architecture philosophy
+- [x] Task: Conductor - User Manual Verification 'Conductor Self-Reflection & Upgrade Strategy' (Protocol in workflow.md) [40339a1]
@@ -0,0 +1,40 @@
+# Conductor Self-Reflection & Upgrade Strategy Proposal
+
+## 1. Executive Summary
+This proposal outlines a strategic path for upgrading the Gemini CLI `conductor` extension to fully embrace the 4-Tier Hierarchical Multi-Model Architecture principles. By migrating from a monolithic, context-heavy single-agent loop to a compartmentalized, multi-model delegation system, Conductor can drastically reduce token burn, mitigate hallucination loops, and grant developers surgical Human-In-The-Loop (HITL) control over execution tasks.
+
+## 2. Memory Siloing & Token Firewalling
+
+### Current Evaluation
+Currently, the `conductor` extension relies heavily on reading index files and full markdown texts recursively through the project structure. This injects entire tracks, plans, guidelines, and specifications into the LLM context continuously. While beneficial for ensuring alignment with user instructions, this linear scaling creates immense token bloat during repetitive planning and execution loops. 
+
+### Proposed Upgrade Strategy
+To align with the 4-Tier Architecture, the Conductor extension must implement **Token Firewalling**:
+1. **Curated Manifests & Viewports:** Implement an extension tool or AST parser hook to generate "Skeleton Views" or restricted tree maps instead of fully loading index files into the prompt.
+2. **Stateless Sub-Agent Invocations:** Delegate localized tasks (like writing documentation updates to a single file) to a background sub-agent (via `run_shell_command` leveraging a separate stateless invocation, or by utilizing Gemini CLI's sub-agent framework). This prevents the main conductor thread from storing the trial-and-error generation in its history.
+3. **Amnesiac Context Management:** Incorporate lifecycle hooks (`before_tool_call`, `after_tool_call`) to clean up unnecessary tool outputs from the active memory array, only keeping the 50-token summaries of execution outcomes.
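The amnesiac cleanup in item 3 can be sketched as a pure function over the message history. This is an illustration under assumptions: `compact_tool_outputs` is a hypothetical name, the `role`/`content` message shape is assumed, and whitespace-separated words stand in for real tokens. Such a function might be called from an `after_tool_call` hook, but no hook API is modeled here.

```python
from typing import Dict, List


def summarize_output(text: str, max_tokens: int = 50) -> str:
    """Keep at most `max_tokens` whitespace-separated words of `text`."""
    words = text.split()
    if len(words) <= max_tokens:
        return text
    return " ".join(words[:max_tokens]) + " [truncated]"


def compact_tool_outputs(
    history: List[Dict[str, str]], max_tokens: int = 50
) -> List[Dict[str, str]]:
    """Return a copy of `history` with tool outputs shortened."""
    compacted = []
    for message in history:
        if message.get("role") == "tool":
            # Copy the message so the caller's history is left untouched.
            message = {
                **message,
                "content": summarize_output(message["content"], max_tokens),
            }
        compacted.append(message)
    return compacted
```

A real implementation would likely replace the truncation with a model-generated summary; plain truncation is used here only to keep the sketch self-contained.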
+
+## 3. Execution Clutch & Linear Debug Mode
+
+### Current Evaluation
+Conductor currently employs an iterative, fire-and-forget `execute_tasks` workflow where each `replace`, `write_file`, and `run_shell_command` is done sequentially via its prompt instructions. While autonomous, the user's only control mechanism during rapid tool-calling is the standard CLI prompt interruption, which may leave tracked artifacts in an inconsistent state or execute runaway hallucinated loops.
+
+### Proposed Upgrade Strategy
+To enforce precise developer control, Conductor should natively embed a **Human-In-The-Loop Execution Clutch**:
+1. **Interactive Checkpoints (Trust Levels):** Use extension hooks like `before_tool_call` to intercept payload executions based on heuristic models. Tools like `replace` might trigger an interactive payload editor (`vim` / CLI editor plugin) before applying the JSON parameters, ensuring full developer review.
+2. **Global Linear Mode Flag:** Implement a `gemini conductor:implement --step` flag. This configures the engine to pause execution and prompt the user using `ask_user` natively after every major milestone, allowing validation of file diffs and tool payloads before resuming.
+3. **Rollback Mutators:** Provide quick-access commands (e.g., via `after_tool_call`) that reject a change, automatically restore the last known file state, and feed the error or feedback directly back to the model without breaking the run loop.
+
+## 4. Multi-Model/Sub-Agent Delegation
+
+### Current Evaluation
+Conductor relies heavily on the single primary LLM instantiated by the Gemini CLI session. When that model acts as PM, Tech Lead, and Worker simultaneously, its context is quickly exhausted. Furthermore, handling minor formatting, syntax repairs, or summaries with an expensive high-tier reasoning model is not cost-efficient.
+
+### Proposed Upgrade Strategy
+Conductor should leverage the native **Sub-Agent & Skill Routing capabilities**:
+1. **Dynamic Tier Routing:** Utilize specific Sub-agents (like `codebase_investigator` for planning/AST generation) and custom Skills for discrete tasks.
+2. **Stateless Utility Agents (Tier 4):** Hook into test runner commands via `after_tool_call`. If `pytest` fails with massive `stderr`, immediately invoke a cheap background utility sub-agent to parse the log and return a condensed 20-word summary to the main Orchestrator, instead of feeding it the raw traceback tokens.
+3. **Contract Stubbers:** Embed `contract_stubber` skills that explicitly limit a sub-agent's action strictly to writing `class` or `def` definitions, ensuring cross-module dependency generation without full implementation drift.
+
+## 5. Implementation Strategy
+These upgrades can be realized by augmenting the `gemini-extension.json` manifest with designated MCP hooks, adding new custom Skills to `~/.gemini/skills/`, and overriding default CLI execution flows with `before_tool_call` and `after_tool_call` interception logic tailored explicitly for Token Firewalling and Execution Checkpoints.
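
A minimal Python sketch of the Rollback Mutator idea from section 3, assuming the hooks can run arbitrary code around a file-modifying tool call. The `FileSnapshot` class and its `reject` method are hypothetical illustrations, not part of Conductor or the Gemini CLI; only the hook names `before_tool_call` and `after_tool_call` come from the proposal.

```python
import os
import shutil
import tempfile
from pathlib import Path


class FileSnapshot:
    """Hypothetical helper: copy a file before a tool call, restore it on rejection."""

    def __init__(self, target):
        self.target = Path(target)
        self._backup = None   # path of the saved copy, if the file existed
        self._armed = False   # True once before_tool_call() has run

    def before_tool_call(self):
        # Save the current contents so a rejected edit can be undone.
        if self.target.exists():
            fd, name = tempfile.mkstemp(suffix=".bak")
            os.close(fd)
            shutil.copy2(self.target, name)
            self._backup = Path(name)
        self._armed = True

    def reject(self, feedback):
        # Restore the last known state and return text to feed back to the model.
        if not self._armed:
            raise RuntimeError("before_tool_call() was never invoked")
        if self._backup is not None:
            shutil.copy2(self._backup, self.target)
            self._backup.unlink()
            self._backup = None
        else:
            # The file did not exist before the call, so remove what the tool created.
            self.target.unlink(missing_ok=True)
        self._armed = False
        return f"Change to {self.target.name} rejected: {feedback}"
```

In this sketch the returned string stands in for the "error/feedback" that would be passed back to the model. A real hook would also need to decide which tools modify files and where backups live.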
@@ -0,0 +1,28 @@
+# MMA Documentation Synthesis
+
+## Key Takeaways
+
+1. **Architecture Model**: 4-Tier Hierarchical Multi-Model Architecture mimicking a senior engineering department.
+   - **Tier 1 (Product Manager)**: High-reasoning models (Gemini 3.1 Pro/Claude 3.5 Sonnet) focusing on Epics and Tracks.
+   - **Tier 2 (Tech Lead)**: Mid-cost models (Gemini 3.0 Flash/2.5 Pro) for Track delegation, Ticket generation, and interface-driven development (Stub-and-Resolve).
+   - **Tier 3 (Contributors)**: Cheap/Fast models (DeepSeek V3/R1, Gemini 2.5 Flash) acting as amnesiac workers for heads-down coding.
+   - **Tier 4 (QA/Compiler)**: Ultra-cheap models (DeepSeek V3) for stateless translation of raw errors to human language.
+
+2. **Strict Context Management**: 
+   - Uses `tree-sitter` for deterministic AST extraction (`Skeleton View`, `Curated Implementation View`, `Directory Map`).
+   - "Context Amnesia" ensures worker threads start fresh and do not accumulate hallucination-inducing token bloat.
+
+3. **Data Pipelines & Formats**:
+   - Tiers 1 & 2 output **Godot ECS Flat Relational Lists** (e.g., INI-style flat lists with `depends_on` pointers) to build DAGs. This avoids JSON nesting nightmares.
+   - Tier 3 uses **XML tags** (`<file_path>`, `<file_content>`) to avoid string escaping friction.
+
+4. **Execution Flow**:
+   - The engine is decoupled from the UI using an `asyncio` event bus.
+   - A global **"Execution Clutch"** allows falling back from `async` parallel swarm mode to strict `linear` step mode for deterministic debugging and human-in-the-loop (HITL) overrides.
+
+## Constraints for Migration Plan
+
+- **Security**: `credentials.toml` must be strictly isolated and ignored in version control.
+- **Phased Rollout**: Migration cannot be a single rewrite. It must follow strict tracks: AST Parser -> State Machine -> Linear Orchestrator -> Tier 4 QA -> UI Decoupling.
+- **Tooling Constraints**: `tree-sitter` is mandatory for AST parsing.
+- **UI State**: The GUI must be fully decoupled ("dumb" renderer) responding to queue events instead of blocking on LLM calls.
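
To illustrate how an INI-style flat list with `depends_on` pointers can be turned into a DAG, here is a small Python sketch using only the standard library. The ticket IDs, the `title` key, and the exact file layout are assumptions made for the example. The synthesis above names the format but does not specify it.

```python
import configparser
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical flat ticket list: one section per ticket, no nesting.
TICKETS = """
[ast_parser]
title = Build AST parser

[state_machine]
title = Ticket state machine
depends_on = ast_parser

[linear_orchestrator]
title = Linear orchestrator
depends_on = state_machine

[tier4_qa]
title = Tier 4 QA interception
depends_on = linear_orchestrator
"""


def build_order(text):
    """Parse the flat list and return ticket IDs in dependency order."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    graph = {}
    for ticket_id in parser.sections():
        raw = parser.get(ticket_id, "depends_on", fallback="")
        # Each ticket maps to the set of tickets that must finish first.
        graph[ticket_id] = {dep.strip() for dep in raw.split(",") if dep.strip()}
    return list(TopologicalSorter(graph).static_order())


print(build_order(TICKETS))
```

Because every record is flat, a dependency is just a comma-separated pointer to another section name. `TopologicalSorter` raises `graphlib.CycleError` if the pointers form a cycle, which gives a cheap validity check before any work is dispatched.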
@@ -358,3 +358,17 @@ A task is complete when:
 - Document lessons learned
 - Optimize for user happiness
 - Keep things simple and maintainable
+
+## Conductor Token Firewalling & Model Switching Strategy
+
+To emulate the 4-Tier MMA Architecture within the standard Conductor extension without requiring a custom fork, adhere to these strict workflow policies:
+
+### 1. Active Model Switching (Simulating the 4 Tiers)
+- **Activate MMA Orchestrator Skill:** To enforce the 4-Tier token firewall explicitly, invoke `/activate_skill mma-orchestrator` (or use the `activate_skill` tool) when planning or executing new tracks.
+- **Phase Planning & Macro Merges (Tier 1):** Use high-reasoning models (e.g., Gemini 1.5 Pro or Claude 3.5 Sonnet) when running `/conductor:setup` or when reviewing a major phase checkpoint.
+- **Track Delegation & Implementation (Tier 2/3):** The MMA Orchestrator skill autonomously dispatches Tier 3 (Heads-Down Coding) tasks to secondary stateless instances of Gemini CLI (via `.\scripts\run_subagent.ps1 -Prompt "..."`) rather than performing heavy coding directly in the main thread.
+- **QA/Fixing (Tier 4):** If a test fails with a massive traceback, **DO NOT** paste the traceback into the main conductor thread. Instead, the MMA Orchestrator skill instructs you to spawn a fast/cheap model sub-agent (via a shell command) to compress the error trace into a 20-word fix summary, keeping the main context clean.
+
+### 2. Context Checkpoints (The Token Firewall)
+- The **Phase Completion Verification and Checkpointing Protocol** is the project's primary defense against token bloat.
+- When a Phase is marked complete and a checkpoint commit is created, the AI Agent must actively interpret this as a **"Context Wipe"** signal. It should summarize the outcome in its git notes and move forward treating the checkpoint as absolute truth, deliberately dropping earlier conversational history and trial-and-error logs to preserve token bandwidth for the next phase.
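
The "Context Wipe" policy above can be pictured as truncating the message history at the most recent checkpoint. The following Python sketch assumes a hypothetical `Message` record with an `is_checkpoint` flag. How the agent actually stores its history is not specified in this document.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical history entry; real agent history may look different."""
    role: str
    content: str
    is_checkpoint: bool = False


def wipe_to_last_checkpoint(history):
    """Return the history starting at the most recent checkpoint summary.

    Everything before that checkpoint (trial-and-error logs, earlier
    conversation) is dropped. If no checkpoint exists, nothing is dropped.
    """
    for index in range(len(history) - 1, -1, -1):
        if history[index].is_checkpoint:
            return history[index:]
    return list(history)
```

For example, a history of `[attempt, traceback, checkpoint summary, next task]` would shrink to `[checkpoint summary, next task]`, so the next phase starts from the summary alone.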
@@ -0,0 +1,36 @@
+param(
+    [Parameter(Mandatory=$true)]
+    [string]$Prompt,
+    
+    [string]$Model = "gemini-3-flash-preview"
+)
+
+# Ensure the session has the API key loaded
+. C:\projects\misc\setup_gemini.ps1
+
+# Prepend a strict system instruction to the prompt to prevent the model from entering a tool-usage loop
+$SafePrompt = "STRICT SYSTEM DIRECTIVE: You are a stateless utility function. DO NOT USE ANY TOOLS (no write_file, no run_shell_command, etc.). ONLY output the exact requested text, code, or JSON.`n`nUSER PROMPT:`n$Prompt"
+
+# Execute headless Gemini using -p, suppressing stderr noise
+$jsonOutput = gemini -p $SafePrompt --model $Model --output-format json 2>$null
+
+try {
+    # Extract only the JSON part
+    $fullString = $jsonOutput -join "`n"
+    $jsonStartIndex = $fullString.IndexOf("{")
+    
+    if ($jsonStartIndex -ge 0) {
+        $cleanJsonString = $fullString.Substring($jsonStartIndex)
+        $parsed = $cleanJsonString | ConvertFrom-Json
+        
+        # Output only the clean response text
+        Write-Output $parsed.response
+    } else {
+        Write-Warning "No JSON object found in output."
+        Write-Output $fullString
+    }
+} catch {
+    # Fallback if parsing fails
+    Write-Warning "Failed to parse JSON from sub-agent. Raw output:"
+    Write-Output $jsonOutput
+}
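
The script above takes everything from the first `{` onward and parses it as JSON. As a hedged illustration of the same extraction step, the Python sketch below uses `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores whatever follows it. The function name `extract_response` is hypothetical. The `response` field is the one the script reads, and the sample strings are invented for the example.

```python
import json


def extract_response(raw_output):
    """Return the `response` field of the first JSON object found in noisy output.

    Tries each `{` in turn, so log lines that merely contain a brace
    before the real payload do not break the extraction.
    """
    decoder = json.JSONDecoder()
    start = raw_output.find("{")
    while start >= 0:
        try:
            parsed, _end = decoder.raw_decode(raw_output, start)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict) and "response" in parsed:
            return parsed["response"]
        start = raw_output.find("{", start + 1)
    return None  # caller can fall back to the raw text


noisy = 'Loaded key {from env}\n{"response": "done", "stats": {}}\ntrailing log line'
print(extract_response(noisy))
```

Unlike a plain substring from the first `{`, this variant tolerates text after the closing brace. Returning `None` leaves the fallback behaviour (printing the raw output with a warning) to the caller.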
Author	SHA1	Message	Date
ed	bab468fc82	fix(conductor): Enforce strict statelessness and robust JSON parsing for subagents	2026-02-24 23:36:41 -05:00
ed	462ed2266a	feat(conductor): Add run_subagent script for stable headless skill invocation	2026-02-24 23:17:45 -05:00
ed	0080ceb397	docs(conductor): Add MMA_Support as the fallback source of truth to the core engine track	2026-02-24 23:03:14 -05:00
ed	45abcbb1b9	feat(conductor): Consolidate MMA implementation into single multi-phase track and draft Agent Skill	2026-02-24 22:57:28 -05:00
ed	10c5705748	docs(conductor): Add Token Firewalling and Model Switching Strategy	2026-02-24 22:45:17 -05:00
ed	f76054b1df	feat(conductor): Scaffold MMA Migration Tracks from Epics	2026-02-24 22:44:36 -05:00
ed	982fbfa1cf	docs(conductor): Synchronize docs for track '4-Tier Architecture Implementation & Conductor Self-Improvement'	2026-02-24 22:39:20 -05:00
ed	25f9edbed1	chore(conductor): Mark track '4-Tier Architecture Implementation & Conductor Self-Improvement' as complete	2026-02-24 22:38:13 -05:00
ed	5c4a195505	conductor(plan): Mark phase 'Phase 2: Conductor Self-Reflection' as complete	2026-02-24 22:37:49 -05:00
ed	40339a1667	conductor(checkpoint): Checkpoint end of Phase 2: Conductor Self-Reflection & Upgrade Strategy	2026-02-24 22:37:26 -05:00
ed	8dbd6eaade	conductor(plan): Mark tasks 'Multi-Model' and 'Review' as complete	2026-02-24 22:35:31 -05:00
ed	f62bf3113f	docs(mma): Draft Multi-Model Delegation and finish Proposal	2026-02-24 22:35:02 -05:00
ed	baff5c18d3	docs(mma): Draft Execution Clutch & Linear Debug Mode section	2026-02-24 22:34:19 -05:00
ed	2647586286	conductor(plan): Mark task 'Execution Clutch' as in progress	2026-02-24 22:34:16 -05:00
ed	30574aefd1	conductor(plan): Mark task 'Draft Proposal - Memory Siloing' as complete	2026-02-24 22:33:58 -05:00
ed	ae67c93015	docs(mma): Draft Memory Siloing & Token Firewalling section	2026-02-24 22:33:44 -05:00
ed	c409a6d2a3	conductor(plan): Mark task 'Research Optimal Proposal Format' as complete	2026-02-24 22:33:32 -05:00
ed	0c5f8b9bfe	docs(mma): Draft outline for Conductor Self-Reflection Proposal	2026-02-24 22:33:07 -05:00
ed	4a66f994ee	conductor(plan): Mark task 'Research Optimal Proposal Format' as in progress	2026-02-24 22:31:57 -05:00
ed	5ea8059812	conductor(plan): Mark phase 'Phase 1: manual_slop Migration Planning' as complete	2026-02-24 22:31:41 -05:00
ed	e07e8e5127	conductor(checkpoint): Checkpoint end of Phase 1: manual_slop Migration Planning	2026-02-24 22:31:19 -05:00
ed	5278c05cec	conductor(plan): Mark task 'Draft Track 5' as complete	2026-02-24 22:28:41 -05:00
ed	67734c92a1	docs(mma): Draft Track 5 - UI Decoupling & Tier 1/2 Routing	2026-02-24 22:27:22 -05:00
ed	a9786d4737	conductor(plan): Mark task 'Draft Track 4' as complete	2026-02-24 22:27:02 -05:00
ed	584bff9c06	docs(mma): Draft Track 4 - Tier 4 QA Interception	2026-02-24 22:26:27 -05:00
ed	ac55b553b3	conductor(plan): Mark task 'Draft Track 3' as complete	2026-02-24 22:25:21 -05:00
ed	aaeed92e3a	docs(mma): Draft Track 3 - The Linear Orchestrator & Execution Clutch	2026-02-24 22:24:28 -05:00
ed	447a701dc4	conductor(plan): Mark task 'Draft Track 2' as complete	2026-02-24 22:18:37 -05:00
ed	1198aee36e	docs(mma): Draft Track 2 - State Machine & Data Structures	2026-02-24 22:18:14 -05:00
ed	95c6f1f4b2	conductor(plan): Mark task 'Draft Track 1' as complete	2026-02-24 22:17:46 -05:00
ed	bdd935ddfd	docs(mma): Draft Track 1 - The Memory Foundations	2026-02-24 22:17:34 -05:00
ed	4dd4be4afb	conductor(plan): Mark task 'Synthesize MMA Documentation' as complete	2026-02-24 22:17:09 -05:00
ed	46b351e945	docs(mma): Synthesize MMA Documentation constraints and takeaways	2026-02-24 22:16:44 -05:00