From 0dedcc177312c7282f05f6225856fdb1dc9ac441 Mon Sep 17 00:00:00 2001
From: Ed_
Date: Fri, 27 Feb 2026 19:22:24 -0500
Subject: [PATCH] docs(conductor): Add context and origins block to new phase 2 specs

---
 .../spec.md | 14 +-
 .../mma_data_architecture_dag_engine/spec.md | 14 +-
 .../mma_implementation_20260224/index.md | 5 -
 .../mma_implementation_20260224/metadata.json | 8 --
 .../mma_implementation_20260224/migration_epics.md | 128 ------------------
 .../mma_implementation_20260224/plan.md | 50 -------
 .../mma_implementation_20260224/proposal.md | 40 ------
 .../mma_implementation_20260224/spec.md | 37 -----
 .../mma_implementation_20260224/synthesis.md | 28 ----
 .../spec.md | 14 +-
 .../spec.md | 14 +-
 11 files changed, 52 insertions(+), 300 deletions(-)
 delete mode 100644 conductor/tracks/mma_implementation_20260224/index.md
 delete mode 100644 conductor/tracks/mma_implementation_20260224/metadata.json
 delete mode 100644 conductor/tracks/mma_implementation_20260224/migration_epics.md
 delete mode 100644 conductor/tracks/mma_implementation_20260224/plan.md
 delete mode 100644 conductor/tracks/mma_implementation_20260224/proposal.md
 delete mode 100644 conductor/tracks/mma_implementation_20260224/spec.md
 delete mode 100644 conductor/tracks/mma_implementation_20260224/synthesis.md

diff --git a/conductor/tracks/mma_dashboard_visualization_overhaul/spec.md b/conductor/tracks/mma_dashboard_visualization_overhaul/spec.md
index 5b139dc..7aa88b0 100644
--- a/conductor/tracks/mma_dashboard_visualization_overhaul/spec.md
+++ b/conductor/tracks/mma_dashboard_visualization_overhaul/spec.md
@@ -10,4 +10,16 @@ Make the invisible backend operations visible and interactive. The current GUI i
 ## Constraints
 - Must be built natively using DearPyGui capabilities (e.g., node editor or nested tree nodes).
-- Must be performant and not block the main rendering loop when updating live text streams.
\ No newline at end of file
+- Must be performant and not block the main rendering loop when updating live text streams.
+
+## Context & Origins
+This track was born from the "Human Verification" phase of the initial MMA Orchestrator prototype (`mma_orchestrator_integration_20260226`). We realized that while the backend API plumbing for the hierarchical MMA tiers (Tiers 1-4) was technically functional, the product lacked the necessary state management, UX visualization, and human-in-the-loop security gates to be usable.
+
+**Key Takeaways from the Prototype Phase:**
+- The Tier 2 (Tech Lead) needs its own track-scoped discussion history, rather than polluting the global project history.
+- Tasks within a track require a DAG (Directed Acyclic Graph) engine to manage complex dependencies and blocking states.
+- The GUI must visualize this DAG and stream the output of individual workers directly to their associated tasks.
+- We must enforce tiered context subsetting so that Tier 3/4 workers don't receive the massive global context blob, and we need a pre-spawn approval modal so the user can intercept, review, and modify worker prompts/contexts before they execute.
+
+**Instructions for the Implementing Agent:**
+As you execute this track, ensure you maintain alignment with the other Phase 2 tracks. If you learn something that impacts the dependent tracks, please append a similar "Context Summary" to their `spec.md` files before concluding your run.
\ No newline at end of file
diff --git a/conductor/tracks/mma_data_architecture_dag_engine/spec.md b/conductor/tracks/mma_data_architecture_dag_engine/spec.md
index f9ad3da..d59b951 100644
--- a/conductor/tracks/mma_data_architecture_dag_engine/spec.md
+++ b/conductor/tracks/mma_data_architecture_dag_engine/spec.md
@@ -10,4 +10,16 @@ Restructure how `manual_slop` stores and executes work. The current implementati
 ## Constraints
 - Must integrate seamlessly with the existing project TOML structure.
-- Avoid external complex dependencies (like Steve Yegge's Beads) for now; rely on standard Python libraries or lightweight custom implementations to ensure tight coupling with the DearPyGui stack.
\ No newline at end of file
+- Avoid external complex dependencies (like Steve Yegge's Beads) for now; rely on standard Python libraries or lightweight custom implementations to ensure tight coupling with the DearPyGui stack.
+
+## Context & Origins
+This track was born from the "Human Verification" phase of the initial MMA Orchestrator prototype (`mma_orchestrator_integration_20260226`). We realized that while the backend API plumbing for the hierarchical MMA tiers (Tiers 1-4) was technically functional, the product lacked the necessary state management, UX visualization, and human-in-the-loop security gates to be usable.
+
+**Key Takeaways from the Prototype Phase:**
+- The Tier 2 (Tech Lead) needs its own track-scoped discussion history, rather than polluting the global project history.
+- Tasks within a track require a DAG (Directed Acyclic Graph) engine to manage complex dependencies and blocking states.
+- The GUI must visualize this DAG and stream the output of individual workers directly to their associated tasks.
+- We must enforce tiered context subsetting so that Tier 3/4 workers don't receive the massive global context blob, and we need a pre-spawn approval modal so the user can intercept, review, and modify worker prompts/contexts before they execute.
+
+**Instructions for the Implementing Agent:**
+As you execute this track, ensure you maintain alignment with the other Phase 2 tracks. If you learn something that impacts the dependent tracks, please append a similar "Context Summary" to their `spec.md` files before concluding your run.
\ No newline at end of file
diff --git a/conductor/tracks/mma_implementation_20260224/index.md b/conductor/tracks/mma_implementation_20260224/index.md
deleted file mode 100644
index 5f25552..0000000
--- a/conductor/tracks/mma_implementation_20260224/index.md
+++ /dev/null
@@ -1,5 +0,0 @@
-# Track mma_implementation_20260224 Context
-
-- [Specification](./spec.md)
-- [Implementation Plan](./plan.md)
-- [Metadata](./metadata.json)
\ No newline at end of file
diff --git a/conductor/tracks/mma_implementation_20260224/metadata.json b/conductor/tracks/mma_implementation_20260224/metadata.json
deleted file mode 100644
index b660ce5..0000000
--- a/conductor/tracks/mma_implementation_20260224/metadata.json
+++ /dev/null
@@ -1,8 +0,0 @@
-{
- "track_id": "mma_implementation_20260224",
- "type": "feature",
- "status": "new",
- "created_at": "2026-02-24T00:00:00Z",
- "updated_at": "2026-02-24T00:00:00Z",
- "description": "4-Tier Architecture Implementation & Conductor Self-Improvement"
-}
\ No newline at end of file
diff --git a/conductor/tracks/mma_implementation_20260224/migration_epics.md b/conductor/tracks/mma_implementation_20260224/migration_epics.md
deleted file mode 100644
index d6b35be..0000000
--- a/conductor/tracks/mma_implementation_20260224/migration_epics.md
+++ /dev/null
@@ -1,128 +0,0 @@
-# MMA Migration: Epics and Detailed Tasks
-
-## Track 1: The Memory Foundations (AST Parser)
-
-**Goal:** Build the engine that prevents token-bloat by turning massive source files into curated memory views.
-
-### 1. TDD Approach for `tree-sitter` Integration
-- Create `tests/test_file_cache_ast.py`.
-- Define mock Python source files containing various structures (classes, functions, docstrings, `@core_logic` decorators, `# [HOT]` comments).
-- Write failing tests that instantiate `ASTParser` and assert that `get_skeleton_view()` and `get_curated_view()` return the precisely filtered strings.
-- **Red Phase:** Ensure tests fail because `ASTParser` does not exist.
-- **Green Phase:** Implement the tree-sitter logic iteratively until strings match exactly.
-
-### 2. `ASTParser` Extraction Rules (Tasks)
-- **Task 1.1: Dependency Setup**
- - Add `tree-sitter` and `tree-sitter-python` to `pyproject.toml` / `requirements.txt`.
-- **Task 1.2: Core Parser Class**
- - Create `ASTParser` in `file_cache.py` that initializes the language parser.
-- **Task 1.3: Skeleton View Extraction**
- - Write query to extract `function_definition` and `class_definition`.
- - Keep signatures, parameters, and return type hints.
- - Replace all bodies with `pass`.
-- **Task 1.4: Curated View Extraction**
- - Write query to keep class structures and `expression_statement` docstrings.
- - Implement heuristic to preserve full bodies of functions decorated with `@core_logic` or containing `# [HOT]` comments.
- - Replace all other function bodies with `... # Hidden`.
-
-### 3. Acceptance Testing Criteria
-- **Unit Tests:** All AST parsing tests pass with >90% coverage for `file_cache.py`.
-- **Integration Test:** Execute the parser on a large, complex project file (e.g., `ai_client.py`). The output `Skeleton View` must be less than 15% of the original token count. The `Curated View` must correctly retain docstrings and marked functions while stripping standard bodies.
-## Track 2: State Machine & Data Structures
-
-**Goal:** Define the rigid Python objects (Pydantic/Dataclasses) that AI agents will pass to each other, enforcing structured data over loose chat strings.
-
-### 1. TDD Approach for `models.py`
-- Create `tests/test_models.py`.
-- Write failing tests that instantiate `Track`, `Ticket`, and `WorkerContext` with various valid and invalid schemas.
-- Write tests that assert state transitions (e.g., from `pending` to `locked`, from `step_paused` to `completed`) correctly update internal flags and dependencies.
-- **Red Phase:** Tests fail because `models.py` classes are undefined or lack transition methods.
-- **Green Phase:** Implement the dataclasses and state mutators.
-
-### 2. State Machine Tasks
-- **Task 2.1: The Dataclasses**
- - Create `models.py`. Define `Ticket` (id, target_file, prompt, worker_archetype, status, dependencies).
- - Define `Track` (id, title, description, status, tickets).
-- **Task 2.2: Worker Context Definition**
- - Define `WorkerContext` holding a `Ticket` ID, assigned model, configuration injection, and an ephemeral `messages` array.
-- **Task 2.3: State Mutator Methods**
- - Implement methods like `ticket.mark_blocked(dependency_id)`, `ticket.mark_complete()`, and `track.get_executable_tickets()`. Ensure strict validation of valid state transitions.
-
-### 3. Acceptance Testing Criteria
-- **Unit Tests:** `models.py` has 100% test coverage for all state transitions.
-- **Integration Test:** Instantiate a `Track` with 3 dependent `Tickets` in Python. Programmatically mark tickets as complete and assert that the subsequent dependent tickets transition from `locked` to `pending` without any AI involvement.
-
-## Track 3: The Linear Orchestrator & Execution Clutch
-
-**Goal:** Build the synchronous, debuggable core loop that runs a single Tier 3 Worker and pauses for human approval.
-
-### 1. TDD Approach for `multi_agent_conductor.py`
-- Create `tests/test_conductor.py`.
-- Write tests that mock the AI client response (e.g., returning a mock tool call like `write_file`).
-- Test that `run_worker_lifecycle(ticket: Ticket)` fetches the Raw View from `file_cache.py`, formats messages, and processes the mock output.
-- Test that execution pauses (waits for a simulated human signal) when the `trust_level` dictates.
-- **Red Phase:** Failure occurs because `multi_agent_conductor.py` lacks the lifecycle execution loop.
-- **Green Phase:** Implement the `ConductorEngine` core execution block.
-
-### 2. Linear Orchestration Tasks
-- **Task 3.1: The Engine Core**
- - Create `multi_agent_conductor.py`.
Implement the `ConductorEngine` class containing the `run_worker_lifecycle` synchronous execution.
-- **Task 3.2: Context Injection**
- - Implement logic reading the Ticket target, querying `file_cache.py` for the `Raw View`, and formatting the messages array for the API.
-- **Task 3.3: The HITL Execution Clutch**
- - Before executing tools via `mcp_client.py` or `shell_runner.py`, intercept the tool payload if the Worker's archetype dictates a `step` mode.
- - Wait for explicit user confirmation via a CLI prompt (or event block for UI future-proofing). Allow editing of the JSON payload.
- - Flush history upon `TicketCompleted`.
-
-### 3. Acceptance Testing Criteria
-- **Unit Tests:** Context generation, API schema mapping, and event-blocking are tested for all Edge cases.
-- **Integration Test:** Manually execute a script pointing the `ConductorEngine` at a dummy file. The CLI should pause before `write_file` execution, display the diff, allow manual JSON editing via terminal input, execute the updated JSON file modification, and return `Task Complete`.
-
-## Track 4: Tier 4 QA Interception
-
-**Goal:** Stop error traces from destroying the Worker's token window by routing crashes through a cheap, stateless translator.
-
-### 1. TDD Approach for `shell_runner.py`
-- Create `tests/test_shell_runner.py`.
-- Write tests that mock a local execution failure (e.g., returning a mock 3000-line Python stack trace).
-- Test that the error is intercepted and passed to a mock Tier 4 agent.
-- Test that the output is compressed into a 20-word fix before returning.
-- **Red Phase:** Fails because no interception loop exists in `shell_runner.py`.
-- **Green Phase:** Implement the try/except logic handling `subprocess.run()` with `returncode != 0`.
-
-### 2. QA Interception Tasks
-- **Task 4.1: The Interceptor Loop**
- - Open `shell_runner.py` and catch execution errors.
-- **Task 4.2: Tier 4 Instantiation**
- - Construct a secondary, synchronous API call directly to the `default_cheap` model, sending the raw `stderr` and the offending code snippet.
-- **Task 4.3: Payload Formatting**
- - Inject the 20-word fix response from the Tier 4 agent back into the main Tier 3 worker's history context as a system hint.
-
-### 3. Acceptance Testing Criteria
-- **Unit Tests:** Verify that massive error outputs never leak uncompressed into the main history logs.
-- **Integration Test:** Purposely introduce a syntax error in a local script. Ensure the orchestrator catches it, pings the mock/cheap API, and the history log receives the 20-word hint instead of the 200-line stack trace.
-
-## Track 5: UI Decoupling & Tier 1/2 Routing (The Final Boss)
-
-**Goal:** Bring the whole system online by letting Tier 1 and Tier 2 generate Tickets dynamically, managed via an asynchronous Event Bus.
-
-### 1. TDD Approach for `gui_2.py` Decoupling
-- Create `tests/test_gui_decoupling.py`.
-- Write tests that instantiate a mocked GUI instance listening to an `asyncio.Queue`.
-- Mock pushing `TrackStateUpdated` and `TicketStarted` events into the queue and ensure the GUI updates its view state rather than calling LLM endpoints directly.
-- **Red Phase:** Failure occurs because `gui_2.py` is tightly coupled with `ai_client.py` logic.
-- **Green Phase:** Implement the `AgentBus` messaging system linking `multi_agent_conductor.py` to `gui_2.py`.
-
-### 2. Tier 1/2 Routing Tasks
-- **Task 5.1: The Event Bus**
- - Implement an `asyncio.Queue` in `multi_agent_conductor.py`.
-- **Task 5.2: Tier 1 & 2 System Prompts**
- - Define system prompts that force the 3.1 Pro/3.5 Sonnet models to output strict JSON arrays defining the Tracks and Tickets (utilizing native Structured Outputs).
-- **Task 5.3: The Dispatcher**
- - Write an async loop that reads JSON from Tier 2, converts them into `Ticket` objects, and pushes them onto the queue.
- - Implement the Stub Resolver to enforce `contract_stubber` dependent execution flow.
-- **Task 5.4: UI Component Update**
- - Remove direct LLM calls from `gui_2.py`. Wire user inputs into `UserRequestEvents` for the Orchestrator's queue.
-
-### 3. Acceptance Testing Criteria
-- **Integration Test:** Execute the full app stack in simulation. Issue a vague prompt ("Refactor the config system"). Ensure Tier 1 outputs a Track. Tier 2 breaks it into an interface stub Ticket and an implementation Ticket. The system executes the stub, updates the AST, and finishes the implementation automatically or allows step-through in Linear mode.
diff --git a/conductor/tracks/mma_implementation_20260224/plan.md b/conductor/tracks/mma_implementation_20260224/plan.md
deleted file mode 100644
index a8ed684..0000000
--- a/conductor/tracks/mma_implementation_20260224/plan.md
+++ /dev/null
@@ -1,50 +0,0 @@
-# Implementation Plan: 4-Tier Architecture Implementation & Conductor Self-Improvement
-
-## Phase 1: `manual_slop` Migration Planning [checkpoint: e07e8e5]
-- [x] Task: Synthesize MMA Documentation [46b351e]
- - [x] Read and analyze `./MMA_Support/Data_Pipelines_and_Config.md` and `./MMA_Support/OriginalDiscussion.md`
- - [x] Read and analyze `./MMA_Support/Tier1_Orchestrator.md` through `./MMA_Support/Tier4_Utility.md`
- - [x] Document key takeaways and constraints for the migration plan
-- [x] Task: Draft Track 1 - The Memory Foundations (AST Parser) [bdd935d]
- - [x] Define TDD approach for `tree-sitter` integration in `file_cache.py`
- - [x] Specify tasks for `ASTParser` extraction rules (Skeleton View, Curated View)
- - [x] Define acceptance testing criteria for AST extraction
-- [x] Task: Draft Track 2 - State Machine & Data Structures [1198aee]
- - [x] Define TDD approach for `models.py` (`Track`, `Ticket`, `WorkerContext`)
- - [x] Specify tasks for state mutator methods
- - [x] Define acceptance testing criteria for state transitions
-- [x] Task: Draft Track 3 - The Linear
Orchestrator & Execution Clutch [aaeed92]
- - [x] Define TDD approach for `multi_agent_conductor.py` (`run_worker_lifecycle`)
- - [x] Specify tasks for context injection and HITL Clutch implementation
- - [x] Define acceptance testing criteria for the linear orchestration loop
-- [x] Task: Draft Track 4 - Tier 4 QA Interception [584bff9]
- - [x] Define TDD approach for `shell_runner.py` stderr interception
- - [x] Specify tasks for routing errors to the cheap API model
- - [x] Define acceptance testing criteria for the QA interception loop
-- [x] Task: Draft Track 5 - UI Decoupling & Tier 1/2 Routing (The Final Boss) [67734c9]
- - [x] Define TDD approach for async queue in `multi_agent_conductor.py`
- - [x] Specify tasks for Tier 1 & 2 system prompts and the Dispatcher async loop
- - [x] Define acceptance testing criteria for UI decoupling and dynamic routing
-- [x] Task: Conductor - User Manual Verification '`manual_slop` Migration Planning' (Protocol in workflow.md) [e07e8e5]
-
-## Phase 2: Conductor Self-Reflection & Upgrade Strategy [checkpoint: 40339a1]
-- [x] Task: Research Optimal Proposal Format [0c5f8b9]
- - [x] Search Gemini CLI documentation for extension guidelines
- - [x] Search Conductor documentation for tuning and advice
- - [x] Define the structure for `proposal.md` based on findings
-- [x] Task: Draft Proposal - Memory Siloing & Token Firewalling [59556d1]
- - [x] Evaluate current `conductor` context management
- - [x] Propose strategies to prevent token bloat during planning and execution
- - [x] Write the corresponding section in `proposal.md`
-- [x] Task: Draft Proposal - Execution Clutch & Linear Debug Mode [baff5c1]
- - [x] Evaluate current `conductor` execution workflows
- - [x] Propose mechanisms for manual step-through and auto modes
- - [x] Write the corresponding section in `proposal.md`
-- [x] Task: Draft Proposal - Multi-Model/Sub-Agent Delegation [f62bf31]
- - [x] Evaluate current `conductor` single-model reliance
- - [x] Propose a
design for delegating tasks (e.g., summarization, syntax-fixing) to sub-agents
- - [x] Write the corresponding section in `proposal.md`
-- [x] Task: Review and Finalize Proposal [f62bf31]
- - [x] Ensure all three core areas are addressed with equal priority
- - [x] Verify alignment with the overall 4-Tier Architecture philosophy
-- [x] Task: Conductor - User Manual Verification 'Conductor Self-Reflection & Upgrade Strategy' (Protocol in workflow.md) [40339a1]
\ No newline at end of file
diff --git a/conductor/tracks/mma_implementation_20260224/proposal.md b/conductor/tracks/mma_implementation_20260224/proposal.md
deleted file mode 100644
index b0a1d91..0000000
--- a/conductor/tracks/mma_implementation_20260224/proposal.md
+++ /dev/null
@@ -1,40 +0,0 @@
-# Conductor Self-Reflection & Upgrade Strategy Proposal
-
-## 1. Executive Summary
-This proposal outlines a strategic path for upgrading the Gemini CLI `conductor` extension to fully embrace the 4-Tier Hierarchical Multi-Model Architecture principles. By migrating from a monolithic, context-heavy single-agent loop to a compartmentalized, multi-model delegation system, Conductor can drastically reduce token burn, mitigate hallucination loops, and grant developers surgical Human-In-The-Loop (HITL) control over execution tasks.
-
-## 2. Memory Siloing & Token Firewalling
-
-### Current Evaluation
-Currently, the `conductor` extension relies heavily on reading index files and full markdown texts recursively through the project structure. This injects entire tracks, plans, guidelines, and specifications into the LLM context continuously. While beneficial for ensuring alignment with user instructions, this linear scaling creates immense token bloat during repetitive planning and execution loops.
-
-### Proposed Upgrade Strategy
-To align with the 4-Tier Architecture, the Conductor extension must implement **Token Firewalling**:
-1.
**Curated Manifests & Viewports:** Implement an extension tool or AST parser hook to generate "Skeleton Views" or restricted tree maps instead of fully loading index files into the prompt.
-2. **Stateless Sub-Agent Invocations:** Delegate localized tasks (like writing documentation updates to a single file) to a background sub-agent (via `run_shell_command` leveraging a separate stateless invocation, or by utilizing Gemini CLI's sub-agent framework). This prevents the main conductor thread from storing the trial-and-error generation in its history.
-3. **Amnesiac Context Management:** Incorporate lifecycle hooks (`before_tool_call`, `after_tool_call`) to clean up unnecessary tool outputs from the active memory array, only keeping the 50-token summaries of execution outcomes.
-
-## 3. Execution Clutch & Linear Debug Mode
-
-### Current Evaluation
-Conductor currently employs an iterative, fire-and-forget `execute_tasks` workflow where each `replace`, `write_file`, and `run_shell_command` is done sequentially via its prompt instructions. While autonomous, the user's only control mechanism during rapid tool-calling is the standard CLI prompt interruption, which may leave tracked artifacts in an inconsistent state or execute runaway hallucinated loops.
-
-### Proposed Upgrade Strategy
-To enforce precise developer control, Conductor should natively embed a **Human-In-The-Loop Execution Clutch**:
-1. **Interactive Checkpoints (Trust Levels):** Use extension hooks like `before_tool_call` to intercept payload executions based on heuristic models. Tools like `replace` might trigger an interactive payload editor (`vim` / CLI editor plugin) before applying the JSON parameters, ensuring full developer review.
-2. **Global Linear Mode Flag:** Implement a `gemini conductor:implement --step` flag.
This configures the engine to pause execution and prompt the user using `ask_user` natively after every major milestone, allowing validation of file diffs and tool payloads before resuming.
-3. **Rollback Mutators:** Provide quick access commands (e.g., via `after_tool_call`) to reject the change, auto-restoring the last known file state, and feeding the error/feedback directly back to the model without breaking the run loop.
-
-## 4. Multi-Model/Sub-Agent Delegation
-
-### Current Evaluation
-Conductor heavily relies on the single primary LLM instantiated by the Gemini CLI session. When acting as a PM, Tech Lead, and Worker simultaneously, the model experiences extreme context exhaustion. Furthermore, handling minor formatting, syntax repairs, or summaries with expensive high-tier reasoning models results in suboptimal cost-efficiency.
-
-### Proposed Upgrade Strategy
-Conductor should leverage the native **Sub-Agent & Skill Routing capabilities**:
-1. **Dynamic Tier Routing:** Utilize specific Sub-agents (like `codebase_investigator` for planning/AST generation) and custom Skills for discrete tasks.
-2. **Stateless Utility Agents (Tier 4):** Hook into test runner commands via `after_tool_call`. If `pytest` fails with massive `stderr`, immediately invoke a cheap background utility sub-agent to parse the log and return a condensed 20-word summary back to the main Orchestrator, rather than feeding the main Orchestrator raw traceback tokens.
-3. **Contract Stubbers:** Embed `contract_stubber` skills that explicitly limit a sub-agent's action strictly to writing `class` or `def` definitions, ensuring cross-module dependency generation without full implementation drift.
-
-## 5.
Implementation Strategy
-These upgrades can be realized by augmenting the `gemini-extension.json` manifest with designated MCP hooks, adding new custom Skills to `~/.gemini/skills/`, and overriding default CLI execution flows with `before_tool_call` and `after_tool_call` interception logic tailored explicitly for Token Firewalling and Execution Checkpoints.
\ No newline at end of file
diff --git a/conductor/tracks/mma_implementation_20260224/spec.md b/conductor/tracks/mma_implementation_20260224/spec.md
deleted file mode 100644
index f7c04c6..0000000
--- a/conductor/tracks/mma_implementation_20260224/spec.md
+++ /dev/null
@@ -1,37 +0,0 @@
-# Specification: 4-Tier Architecture Implementation & Conductor Self-Improvement
-
-## 1. Overview
-This track encompasses two major phases. Phase 1 focuses on designing a comprehensive, step-by-step implementation plan to refactor the `manual_slop` codebase from a single-agent linear chat into an asynchronous, 4-Tier Hierarchical Multi-Model Architecture. Phase 2 focuses on evaluating the Gemini CLI `conductor` extension itself and proposing architectural upgrades to enforce multi-tier, cost-saving, and context-preserving disciplines.
-
-## 2. Functional Requirements
-
-### Phase 1: `manual_slop` Implementation Planning
-- **Synthesis:** Read and synthesize all markdown files within the `./MMA_Support/` directory.
-- **Plan Generation:** Generate a detailed implementation plan (`plan.md`) for the `manual_slop` migration.
- - The plan must break down the migration into actionable sub-tracks or tickets (Epics and detailed technical tasks).
- - It must strictly follow the iterative safe-migration strategy outlined in `MMA_Support/Implementation_Tracks.md`.
- - The sequence must be:
- 1. Tree-sitter AST parsing.
- 2. State Machines.
- 3. Linear Orchestrator.
- 4. Tier 4 QA Interception.
- 5. UI Decoupling.
- - Every ticket/task must include explicit steps for testing and verifying the implementation.
-
-### Phase 2: Conductor Self-Reflection & Upgrade Strategy
-- **Evaluation:** Critically evaluate the `conductor` extension's architecture and workflows against the principles of the 4-Tier Architecture.
-- **Formal Proposal:** Deliver a formal proposal document within this track's directory (`proposal.md`).
- - **Format Research:** Investigate the optimal format for the proposal based on Google's documentation for extending or tuning Conductor.
- - **Content:** The proposal must address three core areas with equal priority:
- 1. **Strict Memory Siloing & Token Firewalling:** How to reduce token bloat during Conductor's planning and execution loops.
- 2. **Execution Clutch & Linear Debug Mode:** How to implement manual step-through or auto modes when managing complex tracks.
- 3. **Multi-Model/Sub-Agent Delegation:** Design a system for internally delegating tasks (e.g., summarization, syntax fixing) to cheaper, faster models.
-
-## 3. Acceptance Criteria
-- [ ] A fully populated `plan.md` exists within this track, detailing the `manual_slop` migration with Epics, detailed tasks, and testing steps.
-- [ ] A formal proposal document (`proposal.md`) exists within this track, addressing the three core areas for Conductor's self-improvement.
-- [ ] The proposal's format is justified based on official documentation or best practices for Conductor extensions.
-
-## 4. Out of Scope
-- Actual implementation of the `manual_slop` refactor (this track is purely for planning the implementation).
-- Actual modification of the `conductor` extension's core logic.
\ No newline at end of file
diff --git a/conductor/tracks/mma_implementation_20260224/synthesis.md b/conductor/tracks/mma_implementation_20260224/synthesis.md
deleted file mode 100644
index 34fc8f3..0000000
--- a/conductor/tracks/mma_implementation_20260224/synthesis.md
+++ /dev/null
@@ -1,28 +0,0 @@
-# MMA Documentation Synthesis
-
-## Key Takeaways
-
-1.
**Architecture Model**: 4-Tier Hierarchical Multi-Model Architecture mimicking a senior engineering department.
- - **Tier 1 (Product Manager)**: High-reasoning models (Gemini 3.1 Pro/Claude 3.5 Sonnet) focusing on Epics and Tracks.
- - **Tier 2 (Tech Lead)**: Mid-cost models (Gemini 3.0 Flash/2.5 Pro) for Track delegation, Ticket generation, and interface-driven development (Stub-and-Resolve).
- - **Tier 3 (Contributors)**: Cheap/Fast models (DeepSeek V3/R1, Gemini 2.5 Flash) acting as amnesiac workers for heads-down coding.
- - **Tier 4 (QA/Compiler)**: Ultra-cheap models (DeepSeek V3) for stateless translation of raw errors to human language.
-
-2. **Strict Context Management**:
- - Uses `tree-sitter` for deterministic AST extraction (`Skeleton View`, `Curated Implementation View`, `Directory Map`).
- - "Context Amnesia" ensures worker threads start fresh and do not accumulate hallucination-inducing token bloat.
-
-3. **Data Pipelines & Formats**:
- - Tiers 1 & 2 output **Godot ECS Flat Relational Lists** (e.g., INI-style flat lists with `depends_on` pointers) to build DAGs. This avoids JSON nesting nightmares.
- - Tier 3 uses **XML tags** (``, ``) to avoid string escaping friction.
-
-4. **Execution Flow**:
- - The engine is decoupled from the UI using an `asyncio` event bus.
- - A global **"Execution Clutch"** allows falling back from `async` parallel swarm mode to strict `linear` step mode for deterministic debugging and human-in-the-loop (HITL) overrides.
-
-## Constraints for Migration Plan
-
-- **Security**: `credentials.toml` must be strictly isolated and ignored in version control.
-- **Phased Rollout**: Migration cannot be a single rewrite. It must follow strict tracks: AST Parser -> State Machine -> Linear Orchestrator -> Tier 4 QA -> UI Decoupling.
-- **Tooling Constraints**: `tree-sitter` is mandatory for AST parsing.
-- **UI State**: The GUI must be fully decoupled ("dumb" renderer) responding to queue events instead of blocking on LLM calls.
\ No newline at end of file
diff --git a/conductor/tracks/robust_live_simulation_verification/spec.md b/conductor/tracks/robust_live_simulation_verification/spec.md
index 80527d2..9f2996a 100644
--- a/conductor/tracks/robust_live_simulation_verification/spec.md
+++ b/conductor/tracks/robust_live_simulation_verification/spec.md
@@ -10,4 +10,16 @@ Establish a robust, visual simulation framework to prevent regressions in the co
 ## Constraints
 - Must run against a live instance of the application using `--enable-test-hooks`.
-- Must fail loudly if the visual state (e.g., rendered DAG nodes, text box contents) does not match expectations.
\ No newline at end of file
+- Must fail loudly if the visual state (e.g., rendered DAG nodes, text box contents) does not match expectations.
+
+## Context & Origins
+This track was born from the "Human Verification" phase of the initial MMA Orchestrator prototype (`mma_orchestrator_integration_20260226`). We realized that while the backend API plumbing for the hierarchical MMA tiers (Tiers 1-4) was technically functional, the product lacked the necessary state management, UX visualization, and human-in-the-loop security gates to be usable.
+
+**Key Takeaways from the Prototype Phase:**
+- The Tier 2 (Tech Lead) needs its own track-scoped discussion history, rather than polluting the global project history.
+- Tasks within a track require a DAG (Directed Acyclic Graph) engine to manage complex dependencies and blocking states.
+- The GUI must visualize this DAG and stream the output of individual workers directly to their associated tasks.
+- We must enforce tiered context subsetting so that Tier 3/4 workers don't receive the massive global context blob, and we need a pre-spawn approval modal so the user can intercept, review, and modify worker prompts/contexts before they execute.
+
+**Instructions for the Implementing Agent:**
+As you execute this track, ensure you maintain alignment with the other Phase 2 tracks.
If you learn something that impacts the dependent tracks, please append a similar "Context Summary" to their `spec.md` files before concluding your run.
\ No newline at end of file
diff --git a/conductor/tracks/tiered_context_scoping_hitl_approval/spec.md b/conductor/tracks/tiered_context_scoping_hitl_approval/spec.md
index 1de5a7b..fb4c8ad 100644
--- a/conductor/tracks/tiered_context_scoping_hitl_approval/spec.md
+++ b/conductor/tracks/tiered_context_scoping_hitl_approval/spec.md
@@ -10,4 +10,16 @@ Provide the user with absolute visual control over what the AI sees at every lev
 ## Constraints
 - Must adhere to the project's security and transparency mandates.
-- The interceptor must be reliable and not cause the main event loop to hang indefinitely.
\ No newline at end of file
+- The interceptor must be reliable and not cause the main event loop to hang indefinitely.
+
+## Context & Origins
+This track was born from the "Human Verification" phase of the initial MMA Orchestrator prototype (`mma_orchestrator_integration_20260226`). We realized that while the backend API plumbing for the hierarchical MMA tiers (Tiers 1-4) was technically functional, the product lacked the necessary state management, UX visualization, and human-in-the-loop security gates to be usable.
+
+**Key Takeaways from the Prototype Phase:**
+- The Tier 2 (Tech Lead) needs its own track-scoped discussion history, rather than polluting the global project history.
+- Tasks within a track require a DAG (Directed Acyclic Graph) engine to manage complex dependencies and blocking states.
+- The GUI must visualize this DAG and stream the output of individual workers directly to their associated tasks.
+- We must enforce tiered context subsetting so that Tier 3/4 workers don't receive the massive global context blob, and we need a pre-spawn approval modal so the user can intercept, review, and modify worker prompts/contexts before they execute.
+
+**Instructions for the Implementing Agent:**
+As you execute this track, ensure you maintain alignment with the other Phase 2 tracks. If you learn something that impacts the dependent tracks, please append a similar "Context Summary" to their `spec.md` files before concluding your run.
\ No newline at end of file