# Verification & Simulation Framework
[Top](../Readme.md) | [Architecture](guide_architecture.md) | [Tools & IPC](guide_tools.md) | [MMA Orchestration](guide_mma.md)
---
## Infrastructure
### `--enable-test-hooks`
When launched with this flag, the application starts the `HookServer` on port `8999`, exposing its internal state to external HTTP requests. This is the foundation for all automated verification. Without this flag, the Hook API is only available when the provider is `gemini_cli`.
### The `live_gui` pytest Fixture
Defined in `tests/conftest.py`, this session-scoped fixture manages the lifecycle of the application under test.
**Spawning:**
```python
@pytest.fixture(scope="session")
def live_gui() -> Generator[tuple[subprocess.Popen, str], None, None]:
    process = subprocess.Popen(
        ["uv", "run", "python", "-u", gui_script, "--enable-test-hooks"],
        stdout=log_file, stderr=log_file, text=True,
        creationflags=subprocess.CREATE_NEW_PROCESS_GROUP if os.name == 'nt' else 0
    )
```
- **`-u` flag**: Disables output buffering for real-time log capture.
- **Process group**: On Windows, uses `CREATE_NEW_PROCESS_GROUP` so the entire tree (GUI + child processes) can be killed cleanly.
- **Logging**: Stdout/stderr redirected to `logs/gui_2_py_test.log`.
**Readiness polling:**
```python
max_retries = 15  # seconds
while time.time() - start_time < max_retries:
    try:
        response = requests.get("http://127.0.0.1:8999/status", timeout=0.5)
        if response.status_code == 200:
            ready = True
            break
    except requests.exceptions.RequestException:
        pass  # Server not listening yet
    if process.poll() is not None:
        break  # Process died early
    time.sleep(0.5)
```
Polls `GET /status` every 500 ms for up to 15 seconds, checking `process.poll()` each iteration to detect early crashes (avoiding the full timeout wait if the GUI exits). Before spawning, a pre-check verifies that port 8999 is not already occupied.
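The port pre-check can be sketched with a plain socket probe (function name assumed; the actual fixture code may differ):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on a successful connection, an errno otherwise.
        return sock.connect_ex((host, port)) == 0
```

If a stale GUI from a previous run still holds port 8999, failing fast here is cheaper than letting the readiness poll attach to the wrong process.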
**Failure path:** If the hook server never responds, kills the process tree and calls `pytest.fail()` to abort the entire test session. Diagnostic telemetry (startup time, PID, success/fail) is written via `VerificationLogger`.
**Teardown:**
```python
finally:
    client = ApiHookClient()
    client.reset_session()  # Clean GUI state before killing
    time.sleep(0.5)
    kill_process_tree(process.pid)
    log_file.close()
```
Sends `reset_session()` via `ApiHookClient` before killing to prevent stale state files.
**Yield value:** `(process: subprocess.Popen, gui_script: str)`.
### Session Isolation
```python
@pytest.fixture(autouse=True)
def reset_ai_client() -> Generator[None, None, None]:
    ai_client.reset_session()
    ai_client.set_provider("gemini", "gemini-2.5-flash-lite")
    yield
```
Runs automatically before every test. Resets the `ai_client` module state and defaults to a safe model, preventing state pollution between tests.
### Process Cleanup
```python
def kill_process_tree(pid: int | None) -> None:
```
- **Windows**: `taskkill /F /T /PID <pid>` — force-kills the process and all children (`/T` is critical since the GUI spawns child processes).
- **Unix**: `os.killpg(os.getpgid(pid), SIGKILL)` to kill the entire process group.
### VerificationLogger
Structured diagnostic logging for test telemetry:
```python
class VerificationLogger:
    def __init__(self, test_name: str, script_name: str):
        self.logs_dir = Path(f"logs/test/{datetime.now().strftime('%Y%m%d_%H%M%S')}")
    def log_state(self, field: str, before: Any, after: Any, delta: Any = None): ...
    def finalize(self, description: str, status: str, result_msg: str): ...
```
Output format: fixed-width column table (`Field | Before | After | Delta`) written to `logs/test/<timestamp>/<script_name>.txt`. Dual output: file + tagged stdout lines for CI visibility.
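The fixed-width layout can be illustrated with a small formatter (a hypothetical helper; `VerificationLogger` internals are not shown here):

```python
def format_state_table(rows: list[tuple]) -> str:
    """Render Field | Before | After | Delta rows as a fixed-width table."""
    table = [("Field", "Before", "After", "Delta")]
    table += [tuple(str(cell) for cell in row) for row in rows]
    # Each column is as wide as its widest cell.
    widths = [max(len(row[i]) for row in table) for i in range(4)]
    rendered = [" | ".join(cell.ljust(widths[i]) for i, cell in enumerate(row))
                for row in table]
    rendered.insert(1, "-+-".join("-" * w for w in widths))  # Header separator.
    return "\n".join(rendered)
```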
---
## Simulation Lifecycle: The "Puppeteer" Pattern
Simulations act as external puppeteers, driving the GUI through the `ApiHookClient` HTTP interface. The canonical example is `tests/visual_sim_mma_v2.py`.
### Stage 1: Mock Provider Setup
```python
client = ApiHookClient()
client.set_value('current_provider', 'gemini_cli')
mock_cli_path = f'{sys.executable} {os.path.abspath("tests/mock_gemini_cli.py")}'
client.set_value('gcli_path', mock_cli_path)
client.set_value('files_base_dir', 'tests/artifacts/temp_workspace')
client.click('btn_project_save')
```
- Switches the GUI's LLM provider to `gemini_cli` (the CLI adapter).
- Points the CLI binary to `python tests/mock_gemini_cli.py` — all LLM calls go to the mock.
- Redirects `files_base_dir` to a temp workspace to prevent polluting real project directories.
- Saves the project configuration.
### Stage 2: Epic Planning
```python
client.set_value('mma_epic_input', 'Develop a new feature')
client.click('btn_mma_plan_epic')
```
Enters an epic description and triggers planning. The GUI invokes the LLM (which hits the mock).
### Stage 3: Poll for Proposed Tracks (60s timeout)
```python
for _ in range(60):
    status = client.get_mma_status()
    if status.get('pending_mma_spawn_approval'):
        client.click('btn_approve_spawn')
    elif status.get('pending_mma_step_approval'):
        client.click('btn_approve_mma_step')
    elif status.get('pending_tool_approval'):
        client.click('btn_approve_tool')
    if status.get('proposed_tracks') and len(status['proposed_tracks']) > 0:
        break
    time.sleep(1)
```
The **approval automation** is a critical pattern repeated in every polling loop. The MMA engine has three approval gates:
- **Spawn approval**: Permission to create a new worker subprocess.
- **Step approval**: Permission to proceed with the next orchestration step.
- **Tool approval**: Permission to execute a tool call.
All three are auto-approved by clicking the corresponding button. Without this, the engine would block indefinitely at each gate.
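The three-gate check can be factored into a reusable helper (a sketch of the pattern; the simulation scripts inline it instead):

```python
# Gate flag in the status dict -> button that approves it.
APPROVAL_GATES = {
    'pending_mma_spawn_approval': 'btn_approve_spawn',
    'pending_mma_step_approval': 'btn_approve_mma_step',
    'pending_tool_approval': 'btn_approve_tool',
}

def auto_approve(client, status: dict) -> bool:
    """Click the button for whichever approval gate is pending, if any."""
    for flag, button in APPROVAL_GATES.items():
        if status.get(flag):
            client.click(button)
            return True
    return False
```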
### Stage 4: Accept Tracks
```python
client.click('btn_mma_accept_tracks')
```
### Stage 5: Poll for Tracks Populated (30s timeout)
Waits until `status['tracks']` contains a track with `'Mock Goal 1'` in its title.
### Stage 6: Load Track and Verify Tickets (60s timeout)
```python
client.click('btn_mma_load_track', user_data=track_id_to_load)
```
Then polls until:
- `active_track` matches the loaded track ID.
- `active_tickets` list is non-empty.
### Stage 7: Verify MMA Status Transitions (120s timeout)
Polls until `mma_status == 'running'` or `'done'`. Continues auto-approving all gates.
### Stage 8: Verify Worker Output in Streams (60s timeout)
```python
streams = status.get('mma_streams', {})
if any("Tier 3" in k for k in streams.keys()):
    tier3_key = [k for k in streams.keys() if "Tier 3" in k][0]
    if "SUCCESS: Mock Tier 3 worker" in streams[tier3_key]:
        streams_found = True
```
Verifies that `mma_streams` contains a key with "Tier 3" and the value contains the exact mock output string.
### Assertions Summary
1. Mock provider setup succeeds (try/except with `pytest.fail`).
2. `proposed_tracks` appears within 60 seconds.
3. `'Mock Goal 1'` track exists in tracks list within 30 seconds.
4. Track loads and `active_tickets` populate within 60 seconds.
5. MMA status becomes `'running'` or `'done'` within 120 seconds.
6. Tier 3 worker output with specific mock content appears in `mma_streams` within 60 seconds.
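Every stage's poll is the same shape: fetch status, auto-approve gates, test a predicate, sleep. A generic sketch of that loop (helper name assumed):

```python
import time
from typing import Callable

def wait_for(predicate: Callable[[], object], timeout_s: float,
             interval_s: float = 1.0):
    """Poll predicate() until it returns something truthy or the timeout lapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(interval_s)
    return None  # Timed out.
```

For example, Stage 3 is essentially `wait_for(lambda: client.get_mma_status().get('proposed_tracks'), 60)` with the gate-approval clicks folded into the predicate.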
---
## Mock Provider Strategy
### `tests/mock_gemini_cli.py`
A fake Gemini CLI executable that replaces the real `gemini` binary during integration tests. Outputs JSON-L messages matching the real CLI's streaming output protocol.
**Input mechanism:**
```python
prompt = sys.stdin.read() # Primary: prompt via stdin
sys.argv # Secondary: management command detection
os.environ.get('GEMINI_CLI_HOOK_CONTEXT') # Tertiary: environment variable
```
**Management command bypass:**
```python
if len(sys.argv) > 1 and sys.argv[1] in ["mcp", "extensions", "skills", "hooks"]:
    return  # Silent exit
```
**Response routing** — keyword matching on stdin content:
| Prompt Contains | Response | Session ID |
|---|---|---|
| `'PATH: Epic Initialization'` | Two mock Track objects (`mock-track-1`, `mock-track-2`) | `mock-session-epic` |
| `'PATH: Sprint Planning'` | Two mock Ticket objects (`mock-ticket-1` independent, `mock-ticket-2` depends on `mock-ticket-1`) | `mock-session-sprint` |
| `'"role": "tool"'` or `'"tool_call_id"'` | Success message (simulates post-tool-call final answer) | `mock-session-final` |
| Default (Tier 3 worker prompts) | `"SUCCESS: Mock Tier 3 worker implemented the change. [MOCK OUTPUT]"` | `mock-session-default` |
**Output protocol** — every response is exactly two JSON-L lines:
```json
{"type": "message", "role": "assistant", "content": "<response>"}
{"type": "result", "status": "success", "stats": {"total_tokens": N, ...}, "session_id": "mock-session-*"}
```
This matches the real Gemini CLI's streaming output format. `flush=True` on every `print()` ensures the consuming process receives data immediately.
**Tool call simulation:** The mock does **not** emit tool calls. It detects tool results in the prompt (`'"role": "tool"'` check) and responds with a final answer, simulating the second turn of a tool-call conversation without actually issuing calls.
**Debug output:** All debug information goes to stderr, keeping stdout clean for the JSON-L protocol.
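Putting the pieces together, the mock's core loop reduces to something like the following. This is an illustrative reduction, not the actual file contents; the Track/Ticket JSON payloads are elided to placeholder strings:

```python
import json
import sys

def respond(content: str, session_id: str) -> None:
    # Exactly two JSON-L lines per response, flushed immediately.
    print(json.dumps({"type": "message", "role": "assistant",
                      "content": content}), flush=True)
    print(json.dumps({"type": "result", "status": "success",
                      "stats": {"total_tokens": 0},
                      "session_id": session_id}), flush=True)

def main() -> None:
    if len(sys.argv) > 1 and sys.argv[1] in ["mcp", "extensions", "skills", "hooks"]:
        return  # Management command: silent exit.
    prompt = sys.stdin.read()
    print(f"[mock] prompt length: {len(prompt)}", file=sys.stderr)  # Debug to stderr only.
    if 'PATH: Epic Initialization' in prompt:
        respond("<two mock Track objects>", "mock-session-epic")
    elif 'PATH: Sprint Planning' in prompt:
        respond("<two mock Ticket objects>", "mock-session-sprint")
    elif '"role": "tool"' in prompt or '"tool_call_id"' in prompt:
        respond("Tool result handled.", "mock-session-final")
    else:
        respond("SUCCESS: Mock Tier 3 worker implemented the change. [MOCK OUTPUT]",
                "mock-session-default")
```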
---
## Visual Verification Patterns
Tests in this framework don't just check return values — they verify the **rendered state** of the application via the Hook API.
### DAG Integrity
Verify that `active_tickets` in the MMA status matches the expected task graph:
```python
status = client.get_mma_status()
tickets = status.get('active_tickets', [])
assert len(tickets) >= 2
assert any(t['id'] == 'mock-ticket-1' for t in tickets)
```
### Stream Telemetry
Check `mma_streams` to ensure output from multiple tiers is correctly captured and routed:
```python
streams = status.get('mma_streams', {})
tier3_keys = [k for k in streams.keys() if "Tier 3" in k]
assert len(tier3_keys) > 0
assert "SUCCESS" in streams[tier3_keys[0]]
```
### Modal State
Assert that the correct dialog is active during a pending tool call:
```python
status = client.get_mma_status()
assert status.get('pending_tool_approval') == True
# or
diag = client.get_indicator_state('thinking')
assert diag.get('thinking') == True
```
### Performance Monitoring
Verify UI responsiveness under load:
```python
perf = client.get_performance()
assert perf['fps'] > 30
assert perf['input_lag_ms'] < 100
```
---
## Supporting Analysis Modules
### `file_cache.py` — ASTParser (tree-sitter)
```python
class ASTParser:
    def __init__(self, language: str = "python"):
        self.language = tree_sitter.Language(tree_sitter_python.language())
        self.parser = tree_sitter.Parser(self.language)
    def parse(self, code: str) -> tree_sitter.Tree: ...
    def get_skeleton(self, code: str) -> str: ...
    def get_curated_view(self, code: str) -> str: ...
```
**`get_skeleton` algorithm:**
1. Parse code to tree-sitter AST.
2. Walk all `function_definition` nodes.
3. For each body (`block` node):
- If first non-comment child is a docstring: preserve docstring, replace rest with `...`.
- Otherwise: replace entire body with `...`.
4. Apply edits in reverse byte order (maintains valid offsets).
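The same idea can be approximated with stdlib `ast` for illustration. The real implementation is tree-sitter based and byte-accurate; this line-level sketch handles only top-level functions and class methods:

```python
import ast

def skeleton(code: str) -> str:
    """Line-level approximation: keep signatures and docstrings, elide bodies."""
    tree = ast.parse(code)
    lines = code.splitlines()
    edits = []
    def targets(body):
        for node in body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                yield node
            elif isinstance(node, ast.ClassDef):
                yield from targets(node.body)  # Methods, but not nested funcs.
    for fn in targets(tree.body):
        body, keep = fn.body, 0
        # Preserve a leading docstring, elide everything after it.
        if (body and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)):
            keep = 1
        if len(body) > keep:
            start, end = body[keep].lineno - 1, body[-1].end_lineno - 1
            edits.append((start, end, " " * body[keep].col_offset + "..."))
    # Apply edits bottom-up so earlier line offsets stay valid.
    for start, end, repl in sorted(edits, reverse=True):
        lines[start:end + 1] = [repl]
    return "\n".join(lines)
```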
**`get_curated_view` algorithm:**
Enhanced skeleton that preserves bodies under two conditions:
- Function has `@core_logic` decorator.
- Function body contains a `# [HOT]` comment anywhere in its descendants.
If either condition is true, the body is preserved verbatim. This enables a two-tier code view: hot paths shown in full, boilerplate compressed.
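A stdlib-`ast` sketch of those two conditions. One assumption to flag: the tree-sitter version can see `# [HOT]` comments as tree nodes, whereas `ast` discards comments, so this sketch scans the function's source slice instead:

```python
import ast

def preserve_body(fn: ast.FunctionDef, source: str) -> bool:
    """True if this function's body should be kept verbatim in the curated view."""
    # Condition 1: decorated with @core_logic.
    for dec in fn.decorator_list:
        if isinstance(dec, ast.Name) and dec.id == "core_logic":
            return True
    # Condition 2: a "# [HOT]" comment anywhere in the function's source.
    body_src = "\n".join(source.splitlines()[fn.lineno - 1:fn.end_lineno])
    return "# [HOT]" in body_src
```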
### `summarize.py` — Heuristic File Summaries
Token-efficient structural descriptions without AI calls:
```python
_SUMMARISERS: dict[str, Callable] = {
".py": _summarise_python, # imports, classes, methods, functions, constants
".toml": _summarise_toml, # table keys + array lengths
".md": _summarise_markdown, # h1-h3 headings
".ini": _summarise_generic, # line count + preview
}
```
**`_summarise_python`** uses stdlib `ast`:
1. Parse with `ast.parse()`.
2. Extract deduplicated imports (top-level module names only).
3. Extract `ALL_CAPS` constants (both `Assign` and `AnnAssign`).
4. Extract classes with their method names.
5. Extract top-level function names.
Output:
```
**Python** — 150 lines
imports: ast, json, pathlib
constants: TIMEOUT_SECONDS
class ASTParser: __init__, parse, get_skeleton
functions: summarise_file, build_summary_markdown
```
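The five steps map onto a compact stdlib-`ast` sketch (a simplified take on the described behavior, not `_summarise_python`'s actual code):

```python
import ast

def summarise_python(code: str) -> str:
    """Heuristic structural summary: imports, constants, classes, functions."""
    tree = ast.parse(code)
    imports, constants, classes, functions = [], [], [], []
    for node in tree.body:
        if isinstance(node, ast.Import):
            imports += [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.append(node.module.split(".")[0])
        elif isinstance(node, (ast.Assign, ast.AnnAssign)):
            names = node.targets if isinstance(node, ast.Assign) else [node.target]
            constants += [n.id for n in names
                          if isinstance(n, ast.Name) and n.id.isupper()]
        elif isinstance(node, ast.ClassDef):
            methods = [m.name for m in node.body
                       if isinstance(m, (ast.FunctionDef, ast.AsyncFunctionDef))]
            classes.append(f"class {node.name}: {', '.join(methods)}")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(node.name)
    parts = [f"**Python** — {len(code.splitlines())} lines"]
    if imports:
        parts.append("imports: " + ", ".join(sorted(set(imports))))
    if constants:
        parts.append("constants: " + ", ".join(constants))
    parts += classes
    if functions:
        parts.append("functions: " + ", ".join(functions))
    return "\n".join(parts)
```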
### `outline_tool.py` — Hierarchical Code Outline
```python
class CodeOutliner:
    def outline(self, code: str) -> str: ...
```
Walks top-level `ast` nodes:
- `ClassDef` → `[Class] Name (Lines X-Y)` + docstring + recurse for methods
- `FunctionDef` → `[Func] Name (Lines X-Y)`, or `[Method] Name` if nested
- `AsyncFunctionDef` → `[Async Func] Name (Lines X-Y)`
Only the first line of each docstring is extracted. Indentation depth serves as the heuristic for distinguishing methods from standalone functions.
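A recursive sketch of that walk (using recursion depth rather than raw indentation to tag methods, which is equivalent for well-formed code):

```python
import ast

def outline(code: str) -> str:
    """Hierarchical outline: classes, functions, methods with line ranges."""
    entries = []
    def visit(node: ast.AST, depth: int = 0) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                entries.append(f"[Class] {child.name} (Lines {child.lineno}-{child.end_lineno})")
                doc = ast.get_docstring(child)
                if doc:
                    entries.append(f"  {doc.splitlines()[0]}")  # First docstring line only.
                visit(child, depth + 1)
            elif isinstance(child, ast.AsyncFunctionDef):
                entries.append(f"[Async Func] {child.name} (Lines {child.lineno}-{child.end_lineno})")
            elif isinstance(child, ast.FunctionDef):
                tag = "[Method]" if depth else "[Func]"
                entries.append(f"{tag} {child.name} (Lines {child.lineno}-{child.end_lineno})")
    visit(ast.parse(code))
    return "\n".join(entries)
```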
---
## Two Parallel Code Analysis Implementations
The codebase has two parallel approaches for structural code analysis:
| Aspect | `file_cache.py` (tree-sitter) | `summarize.py` / `outline_tool.py` (stdlib `ast`) |
|---|---|---|
| Parser | tree-sitter with `tree_sitter_python` | Python's built-in `ast` module |
| Precision | Byte-accurate, preserves exact syntax | Line-level, may lose formatting nuance |
| `@core_logic` / `[HOT]` | Supported (selective body preservation) | Not supported |
| Used by | `py_get_skeleton` MCP tool, worker context injection | `get_file_summary` MCP tool, `py_get_code_outline` |
| Performance | Slightly slower (C extension + tree walk) | Faster (pure Python, simpler walk) |