Compare commits
15 Commits
| SHA1 |
|---|
| 2f2f73cbb3 |
| 88712ed328 |
| 0d533ec11e |
| 95955a2792 |
| eea3da805e |
| df1c429631 |
| 55b8288b98 |
| 5e256d1c12 |
| 6710b58d25 |
| eb64e52134 |
| 221374eed6 |
| 9c229e14fd |
| 678fa89747 |
| 25b904b404 |
| 32ec14f5c3 |
+1
-1
@@ -49,5 +49,5 @@ This file tracks all major tracks for the project. Each track has its own detail

---

- [ ] **Track: Improve the Conductor's use of the 4-tier MMA architecture workflow, skills, and subagents. Introduce a separate skill for each dedicated tier and a dedicated CLI tool to execute each role appropriately and gather context as defined for that role's domain.**
- [x] **Track: Improve the Conductor's use of the 4-tier MMA architecture workflow, skills, and subagents. Introduce a separate skill for each dedicated tier and a dedicated CLI tool to execute each role appropriately and gather context as defined for that role's domain.**
  *Link: [./tracks/mma_formalization_20260225/](./tracks/mma_formalization_20260225/)*
@@ -14,14 +14,14 @@

- [x] Task: Integrate `mma-exec` with the existing `ai_client.py` logic (SKIPPED - out of scope for Conductor)
- [x] Task: Conductor - User Manual Verification 'Phase 2: mma-exec CLI - Core Scoping' (Protocol in workflow.md) [0195329]

## Phase 3: Advanced Context Features

- [~] Task: Implement AST "Skeleton View" generator using `tree-sitter` in `scripts/mma_exec.py`
- [ ] Task: Add dependency mapping to `mma-exec` (providing skeletons of imported files to Workers)
- [ ] Task: Implement logging/auditing for all role hand-offs in `logs/mma_delegation.log`
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Advanced Context Features' (Protocol in workflow.md)

## Phase 3: Advanced Context Features [checkpoint: eb64e52]

- [x] Task: Implement AST "Skeleton View" generator using `tree-sitter` in `scripts/mma_exec.py` [4e564aa]
- [x] Task: Add dependency mapping to `mma-exec` (providing skeletons of imported files to Workers) [32ec14f]
- [x] Task: Implement logging/auditing for all role hand-offs in `logs/mma_delegation.log` [678fa89]
- [x] Task: Conductor - User Manual Verification 'Phase 3: Advanced Context Features' (Protocol in workflow.md) [eb64e52]

## Phase 4: Workflow & Conductor Integration

- [ ] Task: Update `conductor/workflow.md` with new MMA role definitions and `mma-exec` commands
- [ ] Task: Create a Conductor helper/alias in `scripts/` to simplify manual role triggering
- [ ] Task: Final end-to-end verification using a sample feature implementation
- [ ] Task: Conductor - User Manual Verification 'Phase 4: Workflow & Conductor Integration' (Protocol in workflow.md)

## Phase 4: Workflow & Conductor Integration [checkpoint: 0d533ec]

- [x] Task: Update `conductor/workflow.md` with new MMA role definitions and `mma-exec` commands [5e256d1]
- [x] Task: Create a Conductor helper/alias in `scripts/` to simplify manual role triggering [df1c429]
- [x] Task: Final end-to-end verification using a sample feature implementation [verified]
- [x] Task: Conductor - User Manual Verification 'Phase 4: Workflow & Conductor Integration' (Protocol in workflow.md) [0d533ec]
+18
-11
@@ -23,12 +23,12 @@ All tasks follow a strict lifecycle:

2. **Mark In Progress:** Before beginning work, edit `plan.md` and change the task from `[ ]` to `[~]`

3. **Write Failing Tests (Red Phase):**
    - **Delegate Test Creation:** Do NOT write test code directly. Spawn a Tier 3 Worker (`run_subagent.ps1 -Role Worker`) with a prompt to create the necessary test files and unit tests based on the task criteria.
    - **Delegate Test Creation:** Do NOT write test code directly. Spawn a Tier 3 Worker (`python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`) with a prompt to create the necessary test files and unit tests based on the task criteria.
    - Take the code generated by the Worker and apply it.
    - **CRITICAL:** Run the tests and confirm that they fail as expected. This is the "Red" phase of TDD. Do not proceed until you have failing tests.

4. **Implement to Pass Tests (Green Phase):**
    - **Delegate Implementation:** Do NOT write the implementation code directly. Spawn a Tier 3 Worker (`run_subagent.ps1 -Role Worker`) with a highly specific prompt to write the minimum amount of application code necessary to make the failing tests pass.
    - **Delegate Implementation:** Do NOT write the implementation code directly. Spawn a Tier 3 Worker (`python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`) with a highly specific prompt to write the minimum amount of application code necessary to make the failing tests pass.
    - Take the code generated by the Worker and apply it.
    - Run the test suite again and confirm that all tests now pass. This is the "Green" phase.
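Both delegation bullets above shell out to the same bridge script. As a rough sketch of how a caller might build and run that command (the helper names here are illustrative, not part of the repository):

```python
import subprocess
from typing import List

def build_worker_command(prompt: str) -> List[str]:
    # Same invocation the workflow documents for a Tier 3 Worker.
    return ["python", "scripts/mma_exec.py", "--role", "tier3-worker", prompt]

def delegate_to_worker(prompt: str) -> str:
    """Spawn a stateless Worker and return its stdout (requires scripts/mma_exec.py)."""
    result = subprocess.run(build_worker_command(prompt), capture_output=True, text=True)
    return result.stdout
```

Passing the prompt as a single argv element avoids shell-quoting issues with the `"[PROMPT]"` placeholder shown above.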
@@ -88,8 +88,8 @@ All tasks follow a strict lifecycle:

- Before execution, you **must** announce the exact shell command you will use to run the tests.
- **Example Announcement:** "I will now run the automated test suite to verify the phase. **Command:** `CI=true npm test`"
- Execute the announced command.
- If tests fail with significant output (e.g., a large traceback), **DO NOT** attempt to read the raw `stderr` directly into your context. Instead, pipe the output to a log file and **spawn a Tier 4 QA Agent (`run_subagent.ps1 -Role QA`)** to summarize the failure.
- You **must** inform the user and begin debugging using the QA Agent's summary. You may attempt to propose a fix a **maximum of two times**. If the tests still fail after your second proposed fix, you **must stop**, report the persistent failure, and ask the user for guidance.
- If tests fail with significant output (e.g., a large traceback), **DO NOT** attempt to read the raw `stderr` directly into your context. Instead, pipe the output to a log file and **spawn a Tier 4 QA Agent (`python scripts/mma_exec.py --role tier4-qa "[PROMPT]"`)** to summarize the failure.
- You **must** inform the user and begin debugging using the QA Agent's summary. You may attempt to propose a fix a **maximum of two times**. If the tests still fail after your second proposed fix, you **must stop**, report the persistent failure, and ask the user for guidance.

4. **Execute Automated API Hook Verification:**
    - **CRITICAL:** The Conductor agent will now automatically execute verification tasks using the application's API hooks.
@@ -370,15 +370,22 @@ To emulate the 4-Tier MMA Architecture within the standard Conductor extension w

### 1. Active Model Switching (Simulating the 4 Tiers)

- **Activate MMA Orchestrator Skill:** To enforce the 4-Tier token firewall, the agent MUST invoke `activate_skill mma-orchestrator` at the start of any implementation phase.
- **Tiered Delegation (The Role-Based Protocol):**
  - **Tier 3 Worker (Implementation):** For significant code modifications (Coding > 50 lines), delegate to a stateless sub-agent:
    `.\scripts\run_subagent.ps1 -Role Worker -Prompt "Modify [FILE] to implement [SPEC]..."`
  - **Tier 4 QA Agent (Error Analysis):** If tests fail with large traces (Errors > 100 lines), delegate to a QA agent for compression:
    `.\scripts\run_subagent.ps1 -Role QA -Prompt "Summarize this stack trace into a 20-word fix: [SNIPPET]"`
  - **Traceability:** Use the `-ShowContext` flag during debugging to see the role-specific system prompts and hand-offs in the terminal.
- **The MMA Bridge (`mma_exec.py`):** All tiered delegation is routed through `python scripts/mma_exec.py`. This script acts as the primary bridge, managing model selection, context injection, and logging.
- **Model Tiers:**
  - **Tier 1 (Strategic/Orchestration):** `gemini-3.1-pro-preview`. Used for planning and high-level logic.
  - **Tier 2 (Architectural/Tech Lead):** `gemini-3-flash-preview`. Used for code review and structural design.
  - **Tier 3 (Execution/Worker):** `gemini-2.5-flash-lite`. Used for surgical code implementation and test generation.
  - **Tier 4 (Utility/QA):** `gemini-2.5-flash-lite`. Used for log summarization and error analysis.
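Elsewhere this diff imports a `get_model_for_role` helper whose body is not shown here; a minimal sketch consistent with the Model Tiers list above (the fallback for unknown roles is an assumption, not something the diff states):

```python
# Role-to-model assignments, taken from the Model Tiers list above.
TIER_MODELS = {
    "tier1-orchestrator": "gemini-3.1-pro-preview",
    "tier2-tech-lead": "gemini-3-flash-preview",
    "tier3-worker": "gemini-2.5-flash-lite",
    "tier4-qa": "gemini-2.5-flash-lite",
}

def get_model_for_role(role: str) -> str:
    # Unknown roles fall back to the cheapest tier (assumption, not in the diff).
    return TIER_MODELS.get(role, "gemini-2.5-flash-lite")
```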
- **Tiered Delegation Protocol:**
  - **Tier 3 Worker:** `python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`
  - **Tier 4 QA Agent:** `python scripts/mma_exec.py --role tier4-qa "[PROMPT]"`
- **Logging:** All hierarchical interactions are automatically recorded in `logs/mma_delegation.log` for auditable verification.

### 2. Context Checkpoints (The Token Firewall)
### 2. Context Management and Token Firewalling

- **Context Amnesia:** `mma_exec.py` enforces "Context Amnesia" by executing sub-agents in a stateless manner. Each call starts with a clean slate, receiving only the strictly necessary documents and prompts. This prevents conversational "hallucination bleed" and keeps token costs low.
- **AST Skeleton Views:** For Tier 3 implementation, `mma_exec.py` automatically generates "AST Skeleton Views" of project dependencies. This provides the worker model with the interface-level structure (function signatures, docstrings) of imported modules without the full source code, maximizing the signal-to-noise ratio in the context window.
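The repository's `generate_skeleton` is built on `tree-sitter`; as a rough illustration of the same idea, here is a stdlib-only sketch that keeps signatures and docstrings while dropping bodies (not the actual implementation):

```python
import ast

def skeleton_view(code: str) -> str:
    """Reduce Python source to interface-level structure: signatures plus docstrings."""
    lines = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.unparse (Python 3.9+) renders the argument list verbatim.
            lines.append(f"def {node.name}({ast.unparse(node.args)}): ...")
        else:
            continue
        doc = ast.get_docstring(node)
        if doc:
            lines.append(f'    """{doc}"""')
    return "\n".join(lines)
```

Feeding a worker these interface lines instead of full source is what keeps the Tier 3 context window small.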
|
||||
### 3. Phase Checkpoints (The Final Defense)
|
||||
- The **Phase Completion Verification and Checkpointing Protocol** is the project's primary defense against token bloat.
|
||||
- When a Phase is marked complete and a checkpoint commit is created, the AI Agent must actively interpret this as a **"Context Wipe"** signal. It should summarize the outcome in its git notes and move forward treating the checkpoint as absolute truth, deliberately dropping earlier conversational history.
|
||||
- **MMA Phase Memory Wipe:** After completing a major Phase, use the Tier 1/2 Orchestrator's perspective to consolidate state into Git Notes and then disregard previous trial-and-error histories.
|
||||
|
||||
@@ -0,0 +1,25 @@

param(
    [Parameter(Mandatory=$true, Position=0)]
    [ValidateSet("tier1", "tier2", "tier3", "tier4", "orchestrator", "tech-lead", "worker", "qa")]
    [string]$Role,

    [Parameter(Mandatory=$true, Position=1)]
    [string]$Prompt
)

# Map human-readable aliases to mma_exec roles
$RoleMap = @{
    "orchestrator" = "tier1-orchestrator"
    "tier1"        = "tier1-orchestrator"
    "tech-lead"    = "tier2-tech-lead"
    "tier2"        = "tier2-tech-lead"
    "worker"       = "tier3-worker"
    "tier3"        = "tier3-worker"
    "qa"           = "tier4-qa"
    "tier4"        = "tier4-qa"
}

$MappedRole = $RoleMap[$Role.ToLower()]

Write-Host "[MMA] Spawning Role: $MappedRole" -ForegroundColor Cyan
uv run python scripts/mma_exec.py --role $MappedRole $Prompt
+67
-2
@@ -4,6 +4,10 @@ import json

import os
import tree_sitter
import tree_sitter_python
import ast
import datetime

LOG_FILE = 'logs/mma_delegation.log'

def generate_skeleton(code: str) -> str:
    """

@@ -76,9 +80,61 @@ def get_role_documents(role: str) -> list[str]:

        return ['conductor/workflow.md']
    return []
def log_delegation(role, prompt):
    os.makedirs(os.path.dirname(LOG_FILE), exist_ok=True)
    timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    with open(LOG_FILE, 'a', encoding='utf-8') as f:
        f.write("--------------------------------------------------\n")
        f.write(f"TIMESTAMP: {timestamp}\n")
        f.write(f"TIER: {role}\n")
        f.write(f"PROMPT: {prompt}\n")
        f.write("--------------------------------------------------\n")
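Because `log_delegation` writes a fixed, line-oriented format, the audit log can be read back for verification; a minimal parser sketch, assuming exactly the format written above:

```python
def parse_delegation_log(text: str) -> list:
    """Recover TIMESTAMP/TIER/PROMPT records from logs/mma_delegation.log contents."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("TIMESTAMP: "):
            current = {"timestamp": line[len("TIMESTAMP: "):]}
        elif line.startswith("TIER: "):
            current["tier"] = line[len("TIER: "):]
        elif line.startswith("PROMPT: "):
            current["prompt"] = line[len("PROMPT: "):]
            records.append(current)
    return records
```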
def get_dependencies(filepath):
    """Identify top-level module imports from a Python file."""
    try:
        with open(filepath, 'r', encoding='utf-8') as f:
            tree = ast.parse(f.read())
        dependencies = []
        for node in tree.body:
            if isinstance(node, ast.Import):
                for alias in node.names:
                    dependencies.append(alias.name.split('.')[0])
            elif isinstance(node, ast.ImportFrom):
                if node.module:
                    dependencies.append(node.module.split('.')[0])
        # Deduplicate while preserving first-seen order.
        seen = set()
        result = []
        for d in dependencies:
            if d not in seen:
                result.append(d)
                seen.add(d)
        return result
    except Exception as e:
        print(f"Error getting dependencies for {filepath}: {e}")
        return []
def execute_agent(role: str, prompt: str, docs: list[str]) -> str:
    log_delegation(role, prompt)
    model = get_model_for_role(role)
    command_text = f"Use the mma-{role} skill. {prompt}"

    # Advanced Context: Dependency skeletons for Tier 3
    injected_context = ""
    if role in ['tier3', 'tier3-worker']:
        for doc in docs:
            if doc.endswith('.py') and os.path.exists(doc):
                deps = get_dependencies(doc)
                for dep in deps:
                    dep_file = f"{dep}.py"
                    if os.path.exists(dep_file) and dep_file != doc:
                        try:
                            with open(dep_file, 'r', encoding='utf-8') as f:
                                skeleton = generate_skeleton(f.read())
                            injected_context += f"\n\nDEPENDENCY SKELETON: {dep_file}\n{skeleton}\n"
                        except Exception as e:
                            print(f"Error generating skeleton for {dep_file}: {e}")

    command_text = f"Use the mma-{role} skill. {injected_context}{prompt}"
    for doc in docs:
        command_text += f" @{doc}"
@@ -121,9 +177,18 @@ def main():

    args = parser.parse_args()

    docs = get_role_documents(args.role)
    # Only the default role documents are attached for now. Additional docs
    # could be supported by extracting @file references from the prompt into
    # the docs list; currently the gemini CLI resolves @file tokens itself,
    # and execute_agent simply appends each doc to the command text as @doc.

    print(f"Executing role: {args.role} with docs: {docs}")
    result = execute_agent(args.role, args.prompt, docs)
    print(result)

if __name__ == "__main__":
    main()
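The comments in `main()` leave `@file` handling to the gemini CLI; if the script ever needs to promote `@file` tokens from the prompt into the docs list itself, one possible extraction (a hypothetical helper, not part of the diff):

```python
import re

def split_prompt_docs(prompt: str):
    """Separate '@path' tokens from a prompt into (clean_prompt, docs)."""
    docs = re.findall(r"@(\S+)", prompt)
    clean = re.sub(r"\s*@\S+", "", prompt).strip()
    return clean, docs
```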
@@ -37,7 +37,7 @@ SYSTEM PROMPT: $SelectedPrompt

USER PROMPT: $Prompt
--------------------------------------------------
"@
$LogEntry | Out-File -FilePath $LogFile -Append
$LogEntry | Out-File -FilePath $LogFile -Append -Encoding utf8

if ($ShowContext) {
    Write-Host "`n[MMA ORCHESTRATOR] Spawning Tier: $Role" -ForegroundColor Cyan

@@ -59,7 +59,7 @@ try {

    $parsed = $cleanJsonString | ConvertFrom-Json

    # Log response
    "RESPONSE:`n$($parsed.response)" | Out-File -FilePath $LogFile -Append
    "RESPONSE:`n$($parsed.response)" | Out-File -FilePath $LogFile -Append -Encoding utf8

    # Output only the clean response text
    Write-Output $parsed.response
+60
-2
@@ -1,6 +1,7 @@

import pytest
import os
import re
from unittest.mock import patch, MagicMock
from scripts.mma_exec import create_parser, get_role_documents, execute_agent, get_model_for_role
from scripts.mma_exec import create_parser, get_role_documents, execute_agent, get_model_for_role, get_dependencies

def test_parser_role_choices():
    """Test that the parser accepts valid roles and the prompt argument."""

@@ -83,4 +84,61 @@ def test_execute_agent():

    assert kwargs.get("capture_output") is True
    assert kwargs.get("text") is True

    assert result == mock_stdout

def test_get_dependencies(tmp_path):
    content = (
        "import os\n"
        "import sys\n"
        "import file_cache\n"
        "from mcp_client import something\n"
    )
    filepath = tmp_path / "mock_script.py"
    filepath.write_text(content)
    dependencies = get_dependencies(filepath)
    assert dependencies == ['os', 'sys', 'file_cache', 'mcp_client']

def test_execute_agent_logging(tmp_path):
    log_file = tmp_path / "mma_delegation.log"
    with patch("scripts.mma_exec.LOG_FILE", str(log_file)), \
         patch("subprocess.run") as mock_run:
        mock_process = MagicMock()
        mock_process.stdout = ""
        mock_process.returncode = 0
        mock_run.return_value = mock_process
        test_role = "tier1"
        test_prompt = "Plan the next phase"
        execute_agent(test_role, test_prompt, [])
    assert log_file.exists()
    log_content = log_file.read_text()
    assert test_role in log_content
    assert test_prompt in log_content
    assert re.search(r"\d{4}-\d{2}-\d{2}", log_content)

def test_execute_agent_tier3_injection(tmp_path):
    main_content = "import dependency\n\ndef run():\n    dependency.do_work()\n"
    main_file = tmp_path / "main.py"
    main_file.write_text(main_content)
    dep_content = "def do_work():\n    pass\n\ndef other_func():\n    print('hello')\n"
    dep_file = tmp_path / "dependency.py"
    dep_file.write_text(dep_content)
    old_cwd = os.getcwd()
    os.chdir(tmp_path)
    try:
        with patch("subprocess.run") as mock_run:
            mock_process = MagicMock()
            mock_process.stdout = "OK"
            mock_process.returncode = 0
            mock_run.return_value = mock_process
            execute_agent('tier3-worker', 'Modify main.py', ['main.py'])
            assert mock_run.called
            cmd_list = mock_run.call_args[0][0]
            full_command = " ".join(str(arg) for arg in cmd_list)
            assert "DEPENDENCY SKELETON: dependency.py" in full_command
            assert "def do_work():" in full_command
            assert "Modify main.py" in full_command
    finally:
        os.chdir(old_cwd)