Compare commits
5 Commits
6b7cd0a9da
...
d43ec78240
| Author | SHA1 | Date |
|---|---|---|
| | d43ec78240 | |
| | 5a0ec6646e | |
| | 5e6c685b06 | |
| | 8666137479 | |
| | 9762b00393 | |
@@ -1 +0,0 @@
-C:/projects/manual_slop/mma-orchestrator
121
.gemini/skills/mma-orchestrator/SKILL.md
Normal file
@@ -0,0 +1,121 @@
---
name: mma-orchestrator
description: Enforces the 4-Tier Hierarchical Multi-Model Architecture (MMA) within Gemini CLI using Token Firewalling and sub-agent task delegation.
---

# MMA Token Firewall & Tiered Delegation Protocol

You are operating within the MMA Framework, acting as either the **Tier 1 Orchestrator** (for setup/init) or the **Tier 2 Tech Lead** (for execution). Your context window is extremely valuable and must be protected from token bloat (such as raw, repetitive code edits, trial-and-error histories, or massive stack traces).

To accomplish this, you MUST delegate token-heavy or stateless tasks to **Tier 3 Workers** or **Tier 4 QA Agents** by spawning secondary Gemini CLI instances via `run_shell_command`.

**CRITICAL Prerequisite:**
To ensure proper environment handling and logging, you MUST NOT call the `gemini` command directly for sub-tasks. Instead, use the wrapper script:
`uv run python scripts/mma_exec.py --role <Role> "..."`

## 0. Architecture Fallback & Surgical Methodology

**Before creating or refining any track**, consult the deep-dive architecture docs:
- `docs/guide_architecture.md`: Thread domains, event system (`AsyncEventQueue`, `_pending_gui_tasks` action catalog), AI client multi-provider architecture, HITL Execution Clutch blocking flow, frame-sync mechanism
- `docs/guide_tools.md`: MCP Bridge 3-layer security model, full 26-tool inventory with params, Hook API GET/POST endpoints with request/response formats, ApiHookClient method reference
- `docs/guide_mma.md`: Ticket/Track/WorkerContext data structures, DAG engine (cycle detection, topological sort), ConductorEngine execution loop, Tier 2 ticket generation, Tier 3 worker lifecycle with context amnesia
- `docs/guide_simulations.md`: `live_gui` fixture lifecycle, Puppeteer pattern, mock provider JSON-L protocol, visual verification patterns

### The Surgical Spec Protocol (MANDATORY for track creation)

When creating tracks (`activate_skill mma-tier1-orchestrator`), follow this protocol:

1. **AUDIT BEFORE SPECIFYING**: Use `get_code_outline`, `py_get_definition`, `grep_search`, and `get_git_diff` to map what already exists. Previous track specs asked to re-implement existing features (Track Browser, DAG tree, approval dialogs) because no audit was done. Document findings in a "Current State Audit" section with file:line references.

2. **GAPS, NOT FEATURES**: Frame requirements as what's MISSING relative to what exists.
   - GOOD: "The existing `_render_mma_dashboard` (gui_2.py:2633-2724) has a token usage table but no cost column."
   - BAD: "Build a metrics dashboard with token and cost tracking."

3. **WORKER-READY TASKS**: Each plan task must specify:
   - **WHERE**: Exact file and line range (`gui_2.py:2700-2701`)
   - **WHAT**: The specific change (add function, modify dict, extend table)
   - **HOW**: Which API calls (`imgui.progress_bar(...)`, `imgui.collapsing_header(...)`)
   - **SAFETY**: Thread-safety constraints if cross-thread data is involved

4. **ROOT CAUSE ANALYSIS** (for fix tracks): Don't write "investigate and fix." List specific candidates with code-level reasoning.

5. **REFERENCE DOCS**: Link to relevant `docs/guide_*.md` sections in every spec.

6. **MAP DEPENDENCIES**: State execution order and blockers between tracks.

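The WORKER-READY TASKS rule in the protocol above can be checked mechanically before a plan is handed to a worker. The following is a minimal sketch, not part of the project: the `WorkerTask` class, its field names, and the `file:line` pattern are all hypothetical and only illustrate the WHERE/WHAT/HOW/SAFETY shape.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for a WHERE reference: "gui_2.py:2700" or "gui_2.py:2700-2701".
_WHERE_RE = re.compile(r"[\w./-]+:\d+(-\d+)?")


@dataclass(frozen=True)
class WorkerTask:
    """One plan task; the class and its fields are illustrative only."""
    where: str
    what: str
    how: str
    safety: str = "No cross-thread data involved."

    def __post_init__(self) -> None:
        # Reject vague locations such as a bare file name.
        if not _WHERE_RE.fullmatch(self.where):
            raise ValueError(f"WHERE needs file:line or file:start-end, got {self.where!r}")
        for field_name in ("what", "how"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name.upper()} must not be empty")

    def to_plan_lines(self) -> list:
        """Render the task as the four sub-bullets used in plan files."""
        return [
            f"- **WHERE**: `{self.where}`",
            f"- **WHAT**: {self.what}",
            f"- **HOW**: {self.how}",
            f"- **SAFETY**: {self.safety}",
        ]
```

A task constructed with only a file name as `where` raises `ValueError`, which catches the kind of spec that leaves the worker to search for the target itself.
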
## 1. The Tier 3 Worker (Execution)
When performing code modifications or implementing specific requirements:
1. **Pre-Delegation Checkpoint:** For dangerous or non-trivial changes, ALWAYS stage your changes (`git add .`) or commit before delegating to a Tier 3 Worker. If the worker fails or runs `git restore`, any prior AI iterations on that file that were not staged or committed are lost.
2. **Code Style Enforcement:** You MUST explicitly remind the worker to "use exactly 1-space indentation for Python code" in your prompt to prevent it from breaking the established codebase style.
3. **DO NOT** perform large code writes yourself.
4. **DO** construct a single, highly specific prompt with a clear objective. Include exact file:line references and the specific API calls to use (from your audit or the architecture docs).
5. **DO** spawn a Tier 3 Worker.
   *Command:* `uv run python scripts/mma_exec.py --role tier3-worker "Implement [SPECIFIC_INSTRUCTION] in [FILE_PATH] at lines [N-M]. Use [SPECIFIC_API_CALL]. Use 1-space indentation."`
6. **Handling Repeated Failures:** If a Tier 3 Worker fails multiple times on the same task, it may lack the necessary capability. You must track failures and retry with `--failure-count <N>` (e.g., `--failure-count 2`). This tells `mma_exec.py` to escalate the sub-agent to a more powerful reasoning model (like `gemini-3-flash`).
7. The Tier 3 Worker is stateless and has tool access for file I/O.

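The delegation and retry steps above can be sketched as a small command builder. This is only an illustration: the helper name `build_worker_command` is hypothetical, and the position of `--failure-count` relative to the prompt is an assumption, since the text only states that the flag exists. The function builds the argument list and does not execute anything.

```python
import shlex


def build_worker_command(prompt: str, role: str = "tier3-worker", failure_count: int = 0) -> list:
    """Build the argv for the wrapper script named in the text (hypothetical helper)."""
    cmd = ["uv", "run", "python", "scripts/mma_exec.py", "--role", role]
    if failure_count > 0:
        # Assumption: the flag is placed before the prompt argument.
        cmd += ["--failure-count", str(failure_count)]
    cmd.append(prompt)
    return cmd


if __name__ == "__main__":
    argv = build_worker_command(
        "Implement the cost column in gui_2.py at lines 2700-2701. Use 1-space indentation.",
        failure_count=2,
    )
    # shlex.join shows the shell-quoted form that could be passed to run_shell_command.
    print(shlex.join(argv))
```

Building the command as a list and quoting it once at the end avoids hand-escaping quotes inside the prompt.
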
## 2. The Tier 4 QA Agent (Diagnostics)
If you run a test or command that fails with a significant error or large traceback:
1. **DO NOT** analyze the raw logs in your own context window.
2. **DO** spawn a stateless Tier 4 agent to diagnose the failure.
3. *Command:* `uv run python scripts/mma_exec.py --role tier4-qa "Analyze this failure and summarize the root cause: [LOG_DATA]"`
4. **Mandatory Research-First Protocol:** Avoid direct `read_file` calls for any file over 50 lines. Use `get_file_summary`, `py_get_skeleton`, or `py_get_code_outline` first to identify relevant sections. Use `git diff` to understand changes.

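One way to keep `[LOG_DATA]` small before it reaches the Tier 4 prompt is to trim the log first. The sketch below is an assumption-laden illustration: `build_qa_prompt` is a hypothetical helper, and the default of 30 leading lines is borrowed from the "snip first 30 lines" wording in Example 1 rather than from any documented limit.

```python
def build_qa_prompt(log_text: str, max_lines: int = 30) -> str:
    """Embed at most `max_lines` leading log lines in the Tier 4 prompt (hypothetical helper)."""
    lines = log_text.splitlines()
    kept = lines[:max_lines]
    omitted = len(lines) - len(kept)
    body = "\n".join(kept)
    if omitted > 0:
        # Tell the QA agent that the log was cut, and by how much.
        body += f"\n[... {omitted} more lines omitted ...]"
    return "Analyze this failure and summarize the root cause: " + body
```

For Python tracebacks the final lines often carry the exception itself, so keeping the tail instead of the head may be the better choice for some failures.
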
## 3. Persistent Tech Lead Memory (Tier 2)
Unlike the stateless sub-agents (Tiers 3 & 4), the **Tier 2 Tech Lead** maintains persistent context throughout the implementation of a track. Do NOT apply "Context Amnesia" to your own session during track implementation. You are responsible for the continuity of the technical strategy.

## 4. AST Skeleton & Outline Views
To minimize context bloat for Tier 2 & 3:
1. Use `py_get_code_outline` or `get_tree` to map out the structure of a file or project.
2. Use `py_get_skeleton` and `py_get_imports` to understand the interface, docstrings, and dependencies of modules.
3. Use `py_get_definition` to read specific functions/classes by name without loading entire files.
4. Use `py_find_usages` to pinpoint where a function or class is called instead of searching the whole codebase.
5. Use `py_check_syntax` after making string replacements to ensure the file is still syntactically valid.
6. Only use `read_file` with `start_line` and `end_line` for specific implementation details once target areas are identified.
7. Tier 3 workers MUST NOT read the full content of unrelated files.

## 5. Cross-Skill Activation
When your current role requires capabilities from another tier, use `activate_skill`:
- **Track creation/refinement**: `activate_skill mma-tier1-orchestrator` (applies the Surgical Spec Protocol)
- **Track execution**: `activate_skill mma-tier2-tech-lead` (applies persistent context and TDD workflow)
- **Quick code task**: Spawn via `mma_exec.py --role tier3-worker` (stateless, no skill activation needed)
- **Error analysis**: Spawn via `mma_exec.py --role tier4-qa` (stateless, no skill activation needed)

<examples>
### Example 1: Spawning a Tier 4 QA Agent
**User / System:** `pytest tests/test_gui.py` failed with 400 lines of output.
**Agent (You):**
```json
{
  "command": "uv run python scripts/mma_exec.py --role tier4-qa \"Summarize this stack trace into a 20-word fix: [snip first 30 lines...]\"",
  "description": "Spawning Tier 4 QA to compress error trace statelessly."
}
```

### Example 2: Spawning a Tier 3 Worker with Surgical Prompt
**User:** Please implement the cost tracking column in the token usage table.
**Agent (You):**
```json
{
  "command": "uv run python scripts/mma_exec.py --role tier3-worker \"In gui_2.py, modify _render_mma_dashboard (lines 2685-2699). Extend the token usage table from 3 columns to 5 by adding 'Model' and 'Est. Cost' columns. Use imgui.table_setup_column() for the new columns. Import cost_tracker and call cost_tracker.estimate_cost(model, input_tokens, output_tokens) for each tier row. Add a total row at the bottom. Use 1-space indentation.\"",
  "description": "Delegating surgical implementation to Tier 3 Worker with exact line refs."
}
```

### Example 3: Creating a Track with Audit
**User:** Create a track for adding dark mode support.
**Agent (You):**
1. First, audit the codebase: `py_get_code_outline gui_2.py` → find `_render_theme_panel` (lines 2993-3030).
2. Read the existing implementation: `py_get_definition gui_2.py _render_theme_panel`.
3. Check for existing color constants: `grep_search "vec4\|C_" gui_2.py`.
4. Now write the spec with a "Current State Audit" section documenting what the theme panel already does.
5. Write tasks referencing the exact lines and imgui color APIs to use.
</examples>

<triggers>
- When asked to write large amounts of boilerplate or repetitive code (Coding > 50 lines).
- When encountering a large error trace from a shell execution (Errors > 100 lines).
- When explicitly instructed to act as a "Tech Lead" or "Orchestrator".
- When managing complex, multi-file Track implementations.
- When creating or refining conductor tracks (MUST follow Surgical Spec Protocol).
</triggers>
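
The two numeric triggers above (more than 50 lines of code to write, more than 100 lines of error output) amount to a simple routing rule. A minimal sketch follows, assuming hypothetical names throughout. Checking the error threshold first is also an assumption, since the skill does not say which trigger wins when both apply.

```python
from typing import Optional

CODE_LINE_LIMIT = 50    # "Coding > 50 lines" trigger
ERROR_LINE_LIMIT = 100  # "Errors > 100 lines" trigger


def pick_delegate(code_lines: int = 0, error_lines: int = 0) -> Optional[str]:
    """Return the --role value to delegate to, or None to handle the work in place."""
    if error_lines > ERROR_LINE_LIMIT:
        # Assumption: diagnosing a failure takes priority over writing new code.
        return "tier4-qa"
    if code_lines > CODE_LINE_LIMIT:
        return "tier3-worker"
    return None
```
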
@@ -8,7 +8,7 @@ This file tracks all major tracks for the project. Each track has its own detail
 
 *The following tracks MUST be executed in this exact order to safely resolve tech debt before feature development.*
 
-1. [ ] **Track: Test Suite Stabilization & Consolidation** (Active/Next)
+1. [~] **Track: Test Suite Stabilization & Consolidation** (Active/Next)
    *Link: [./tracks/test_stabilization_20260302/](./tracks/test_stabilization_20260302/)*
 
 2. [ ] **Track: Strict Static Analysis & Type Safety**
@@ -1,13 +1,13 @@
 # Implementation Plan: Test Suite Stabilization & Consolidation (test_stabilization_20260302)
 
-## Phase 1: Infrastructure & Paradigm Consolidation
+## Phase 1: Infrastructure & Paradigm Consolidation [checkpoint: 8666137]
 - [x] Task: Initialize MMA Environment `activate_skill mma-orchestrator` [Manual]
 - [x] Task: Setup Artifact Isolation Directories [570c0ea]
     - [ ] WHERE: Project root
     - [ ] WHAT: Create `./tests/artifacts/` and `./tests/logs/` directories. Add `.gitignore` to both containing `*` and `!.gitignore`.
     - [ ] HOW: Use PowerShell `New-Item` and `Out-File`.
     - [ ] SAFETY: Do not commit artifacts.
-- [ ] Task: Migrate Manual Launchers to `live_gui` Fixture
+- [x] Task: Migrate Manual Launchers to `live_gui` Fixture [6b7cd0a]
     - [ ] WHERE: `tests/visual_mma_verification.py` (lines 15-40), `simulation/` scripts.
     - [ ] WHAT: Replace `subprocess.Popen(["python", "gui_2.py"])` with the `live_gui` fixture injected into `pytest` test functions. Remove manual while-loop sleeps.
     - [ ] HOW: Use standard pytest `def test_... (live_gui):` and rely on `ApiHookClient` with proper timeouts.
@@ -15,7 +15,7 @@
 - [ ] Task: Conductor - User Manual Verification 'Phase 1: Infrastructure & Consolidation' (Protocol in workflow.md)
 
 ## Phase 2: Asyncio Stabilization & Logging
-- [ ] Task: Audit and Fix `conftest.py` Loop Lifecycle
+- [x] Task: Audit and Fix `conftest.py` Loop Lifecycle [5a0ec66]
     - [ ] WHERE: `tests/conftest.py:20-50` (around `app_instance` fixture).
     - [ ] WHAT: Ensure the `app._loop.stop()` cleanup safely cancels pending background tasks.
     - [ ] HOW: Use `asyncio.all_tasks(loop)` and `task.cancel()` before stopping the loop in the fixture teardown.
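
The "Setup Artifact Isolation Directories" task in the plan above names PowerShell `New-Item` and `Out-File` as its HOW. For illustration only, here is a rough Python equivalent of the same WHAT. The function name `setup_artifact_dirs` is hypothetical, and this is not what the plan prescribes.

```python
from pathlib import Path


def setup_artifact_dirs(root: Path) -> list:
    """Create tests/artifacts and tests/logs under `root`, each with a self-ignoring .gitignore."""
    created = []
    for name in ("artifacts", "logs"):
        directory = root / "tests" / name
        directory.mkdir(parents=True, exist_ok=True)
        # "*" ignores everything in the directory; "!.gitignore" keeps this file tracked.
        (directory / ".gitignore").write_text("*\n!.gitignore\n", encoding="utf-8")
        created.append(directory)
    return created
```

Because the `.gitignore` re-includes itself, the otherwise empty directories stay in the repository while their contents are never committed.
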
8
conductor/tracks/ux_sim_test_20260302/metadata.json
Normal file
@@ -0,0 +1,8 @@
{
  "id": "ux_sim_test_20260302",
  "title": "UX_SIM_TEST",
  "description": "Simulation testing for GUI UX",
  "type": "feature",
  "status": "new",
  "progress": 0.0
}
3
conductor/tracks/ux_sim_test_20260302/plan.md
Normal file
@@ -0,0 +1,3 @@
# Implementation Plan: UX_SIM_TEST

- [ ] Task 1: Initialize
5
conductor/tracks/ux_sim_test_20260302/spec.md
Normal file
@@ -0,0 +1,5 @@
# Specification: UX_SIM_TEST

Type: feature

Description: Simulation testing for GUI UX
@@ -1,4 +1,5 @@
 import pytest
+import asyncio
 import subprocess
 import time
 import requests
@@ -49,9 +50,23 @@ def app_instance() -> Generator[App, None, None]:
  ):
   app = App()
   yield app
-  # Cleanup: Ensure asyncio loop is stopped
-  if hasattr(app, '_loop') and app._loop.is_running():
-   app._loop.call_soon_threadsafe(app._loop.stop)
+  # Cleanup: Ensure asyncio loop is stopped and tasks are cancelled
+  if hasattr(app, '_loop'):
+   # 1. Identify all pending tasks in app._loop.
+   tasks = [t for t in asyncio.all_tasks(app._loop) if not t.done()]
+   # 2. Cancel them using task.cancel().
+   for task in tasks:
+    task.cancel()
+   # Stop background thread to take control of the loop thread-safely
+   if app._loop.is_running():
+    app._loop.call_soon_threadsafe(app._loop.stop)
+   if hasattr(app, '_loop_thread') and app._loop_thread.is_alive():
+    app._loop_thread.join(timeout=2.0)
+   # 3. Wait for them to complete using loop.run_until_complete(asyncio.gather(*tasks, return_exceptions=True)).
+   if tasks:
+    app._loop.run_until_complete(asyncio.gather(*tasks, return_exceptions=True))
+   # 4. Then stop the loop.
+   app._loop.stop()
 
 @pytest.fixture
 def mock_app(app_instance: App) -> App:
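
The teardown in the hunk above calls `asyncio.all_tasks(app._loop)` and `task.cancel()` from the test thread while the loop may still be running in its background thread. asyncio objects are generally not thread-safe, so an alternative is to do the cancellation on the loop's own thread. The sketch below shows that variant under stated assumptions: `shutdown_loop` is a hypothetical helper, and the loop is assumed to be driven by `run_forever` in a separate `threading.Thread`, which the diff suggests via `_loop_thread` but does not show.

```python
import asyncio
import threading


def shutdown_loop(loop: asyncio.AbstractEventLoop, thread: threading.Thread, timeout: float = 2.0) -> None:
    """Cancel pending tasks on a loop running in `thread`, then stop the loop (hypothetical helper)."""

    async def _cancel_all() -> None:
        # Runs inside the loop, so touching tasks here is safe.
        current = asyncio.current_task()
        tasks = [t for t in asyncio.all_tasks() if t is not current]
        for task in tasks:
            task.cancel()
        # Wait until every cancelled task has actually finished.
        await asyncio.gather(*tasks, return_exceptions=True)

    if loop.is_running():
        # Schedule the coroutine on the loop's thread and block here for its result.
        asyncio.run_coroutine_threadsafe(_cancel_all(), loop).result(timeout=timeout)
        loop.call_soon_threadsafe(loop.stop)
    thread.join(timeout=timeout)
```

In a fixture this could replace steps 1 to 4, called as `shutdown_loop(app._loop, app._loop_thread)`, if those attributes behave as assumed here.
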