Verification & Simulation Framework
Infrastructure
--enable-test-hooks
When launched with this flag, the application starts the HookServer on port 8999, exposing its internal state to external HTTP requests. This is the foundation for all automated verification. Without this flag, the Hook API is only available when the provider is gemini_cli.
The live_gui pytest Fixture
Defined in tests/conftest.py, this session-scoped fixture manages the lifecycle of the application under test.
Spawning:
@pytest.fixture(scope="session")
def live_gui() -> Generator[tuple[subprocess.Popen, str], None, None]:
process = subprocess.Popen(
["uv", "run", "python", "-u", gui_script, "--enable-test-hooks"],
stdout=log_file, stderr=log_file, text=True,
creationflags=subprocess.CREATE_NEW_PROCESS_GROUP if os.name == 'nt' else 0
)
- `-u` flag: Disables output buffering for real-time log capture.
- Process group: On Windows, uses `CREATE_NEW_PROCESS_GROUP` so the entire tree (GUI + child processes) can be killed cleanly.
- Logging: Stdout/stderr redirected to `logs/gui_2_py_test.log`.
Readiness polling:
max_retries = 15 # seconds
while time.time() - start_time < max_retries:
response = requests.get("http://127.0.0.1:8999/status", timeout=0.5)
if response.status_code == 200:
ready = True; break
if process.poll() is not None: break # Process died early
time.sleep(0.5)
Polls GET /status every 500ms for up to 15 seconds. Checks process.poll() each iteration to detect early crashes (avoids waiting the full timeout if the GUI exits). Pre-check: tests if port 8999 is already occupied.
Failure path: If the hook server never responds, kills the process tree and calls pytest.fail() to abort the entire test session. Diagnostic telemetry (startup time, PID, success/fail) is written via VerificationLogger.
Teardown:
finally:
client = ApiHookClient()
client.reset_session() # Clean GUI state before killing
time.sleep(0.5)
kill_process_tree(process.pid)
log_file.close()
Sends reset_session() via ApiHookClient before killing to prevent stale state files.
Yield value: (process: subprocess.Popen, gui_script: str).
Session Isolation
@pytest.fixture(autouse=True)
def reset_ai_client() -> Generator[None, None, None]:
ai_client.reset_session()
ai_client.set_provider("gemini", "gemini-2.5-flash-lite")
yield
Runs automatically before every test. Resets the ai_client module state and defaults to a safe model, preventing state pollution between tests.
Process Cleanup
def kill_process_tree(pid: int | None) -> None:
- Windows: `taskkill /F /T /PID <pid>` — force-kills the process and all children (`/T` is critical since the GUI spawns child processes).
- Unix: `os.killpg(os.getpgid(pid), SIGKILL)` to kill the entire process group.
VerificationLogger
Structured diagnostic logging for test telemetry:
class VerificationLogger:
def __init__(self, test_name: str, script_name: str):
self.logs_dir = Path(f"logs/test/{datetime.now().strftime('%Y%m%d_%H%M%S')}")
def log_state(self, field: str, before: Any, after: Any, delta: Any = None)
def finalize(self, description: str, status: str, result_msg: str)
Output format: fixed-width column table (Field | Before | After | Delta) written to logs/test/<timestamp>/<script_name>.txt. Dual output: file + tagged stdout lines for CI visibility.
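The row rendering can be sketched roughly as follows (hypothetical helper name and column widths; the real logger's layout may differ):

```python
from typing import Any

def format_state_row(field: str, before: Any, after: Any, delta: Any = None,
                     widths: tuple[int, ...] = (24, 16, 16, 10)) -> str:
    """Render one fixed-width 'Field | Before | After | Delta' row (illustrative)."""
    cells = [str(field), str(before), str(after), "" if delta is None else str(delta)]
    return " | ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

print(format_state_row("Field", "Before", "After", "Delta"))
print(format_state_row("proposed_tracks", 0, 2, "+2"))
```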
Simulation Lifecycle: The "Puppeteer" Pattern
Simulations act as external puppeteers, driving the GUI through the ApiHookClient HTTP interface. The canonical example is tests/visual_sim_mma_v2.py.
Stage 1: Mock Provider Setup
client = ApiHookClient()
client.set_value('current_provider', 'gemini_cli')
mock_cli_path = f'{sys.executable} {os.path.abspath("tests/mock_gemini_cli.py")}'
client.set_value('gcli_path', mock_cli_path)
client.set_value('files_base_dir', 'tests/artifacts/temp_workspace')
client.click('btn_project_save')
- Switches the GUI's LLM provider to `gemini_cli` (the CLI adapter).
- Points the CLI binary to `python tests/mock_gemini_cli.py` — all LLM calls go to the mock.
- Redirects `files_base_dir` to a temp workspace to prevent polluting real project directories.
- Saves the project configuration.
Stage 2: Epic Planning
client.set_value('mma_epic_input', 'Develop a new feature')
client.click('btn_mma_plan_epic')
Enters an epic description and triggers planning. The GUI invokes the LLM (which hits the mock).
Stage 3: Poll for Proposed Tracks (60s timeout)
for _ in range(60):
status = client.get_mma_status()
if status.get('pending_mma_spawn_approval'): client.click('btn_approve_spawn')
elif status.get('pending_mma_step_approval'): client.click('btn_approve_mma_step')
elif status.get('pending_tool_approval'): client.click('btn_approve_tool')
if status.get('proposed_tracks') and len(status['proposed_tracks']) > 0: break
time.sleep(1)
The approval automation is a critical pattern repeated in every polling loop. The MMA engine has three approval gates:
- Spawn approval: Permission to create a new worker subprocess.
- Step approval: Permission to proceed with the next orchestration step.
- Tool approval: Permission to execute a tool call.
All three are auto-approved by clicking the corresponding button. Without this, the engine would block indefinitely at each gate.
Stage 4: Accept Tracks
client.click('btn_mma_accept_tracks')
Stage 5: Poll for Tracks Populated (30s timeout)
Waits until status['tracks'] contains a track with 'Mock Goal 1' in its title.
Stage 6: Load Track and Verify Tickets (60s timeout)
client.click('btn_mma_load_track', user_data=track_id_to_load)
Then polls until:
- `active_track` matches the loaded track ID.
- `active_tickets` list is non-empty.
Stage 7: Verify MMA Status Transitions (120s timeout)
Polls until mma_status == 'running' or 'done'. Continues auto-approving all gates.
Stage 8: Verify Worker Output in Streams (60s timeout)
streams = status.get('mma_streams', {})
if any("Tier 3" in k for k in streams.keys()):
tier3_key = [k for k in streams.keys() if "Tier 3" in k][0]
if "SUCCESS: Mock Tier 3 worker" in streams[tier3_key]:
streams_found = True
Verifies that mma_streams contains a key with "Tier 3" and the value contains the exact mock output string.
Assertions Summary
- Mock provider setup succeeds (try/except with `pytest.fail`).
- `proposed_tracks` appears within 60 seconds.
- A `'Mock Goal 1'` track exists in the tracks list within 30 seconds.
- Track loads and `active_tickets` populate within 60 seconds.
- MMA status becomes `'running'` or `'done'` within 120 seconds.
- Tier 3 worker output with specific mock content appears in `mma_streams` within 60 seconds.
Mock Provider Strategy
tests/mock_gemini_cli.py
A fake Gemini CLI executable that replaces the real gemini binary during integration tests. Outputs JSON-L messages matching the real CLI's streaming output protocol.
Input mechanism:
prompt = sys.stdin.read() # Primary: prompt via stdin
sys.argv # Secondary: management command detection
os.environ.get('GEMINI_CLI_HOOK_CONTEXT') # Tertiary: environment variable
Management command bypass:
if len(sys.argv) > 1 and sys.argv[1] in ["mcp", "extensions", "skills", "hooks"]:
return # Silent exit
Response routing — keyword matching on stdin content:
| Prompt Contains | Response | Session ID |
|---|---|---|
| `'PATH: Epic Initialization'` | Two mock Track objects (`mock-track-1`, `mock-track-2`) | `mock-session-epic` |
| `'PATH: Sprint Planning'` | Two mock Ticket objects (`mock-ticket-1` independent, `mock-ticket-2` depends on `mock-ticket-1`) | `mock-session-sprint` |
| `'"role": "tool"'` or `'"tool_call_id"'` | Success message (simulates post-tool-call final answer) | `mock-session-final` |
| Default (Tier 3 worker prompts) | `"SUCCESS: Mock Tier 3 worker implemented the change. [MOCK OUTPUT]"` | `mock-session-default` |
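The routing above can be sketched as a simple keyword cascade (hypothetical helper; the Track/Ticket payloads are abbreviated to placeholders):

```python
def route_mock_response(prompt: str) -> tuple[str, str]:
    """Return (content, session_id) by keyword-matching the prompt,
    mirroring the routing table above."""
    if "PATH: Epic Initialization" in prompt:
        return ("<two mock Track objects>", "mock-session-epic")
    if "PATH: Sprint Planning" in prompt:
        return ("<two mock Ticket objects>", "mock-session-sprint")
    if '"role": "tool"' in prompt or '"tool_call_id"' in prompt:
        # A tool result came back: emit the post-tool-call final answer.
        return ("<success message>", "mock-session-final")
    # Default: Tier 3 worker prompts.
    return ("SUCCESS: Mock Tier 3 worker implemented the change. [MOCK OUTPUT]",
            "mock-session-default")
```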
Output protocol — every response is exactly two JSON-L lines:
{"type": "message", "role": "assistant", "content": "<response>"}
{"type": "result", "status": "success", "stats": {"total_tokens": N, ...}, "session_id": "mock-session-*"}
This matches the real Gemini CLI's streaming output format. flush=True on every print() ensures the consuming process receives data immediately.
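A minimal sketch of emitting that two-line reply (hypothetical helper; `stats` trimmed to the one field shown above):

```python
import json

def emit_mock_response(content: str, session_id: str, total_tokens: int = 42) -> None:
    """Print the two JSON-L lines of the mock protocol, flushing each one."""
    print(json.dumps({"type": "message", "role": "assistant", "content": content}),
          flush=True)
    print(json.dumps({"type": "result", "status": "success",
                      "stats": {"total_tokens": total_tokens},
                      "session_id": session_id}), flush=True)
```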
Tool call simulation: The mock does not emit tool calls. It detects tool results in the prompt ('"role": "tool"' check) and responds with a final answer, simulating the second turn of a tool-call conversation without actually issuing calls.
Debug output: All debug information goes to stderr, keeping stdout clean for the JSON-L protocol.
Visual Verification Patterns
Tests in this framework don't just check return values — they verify the rendered state of the application via the Hook API.
DAG Integrity
Verify that active_tickets in the MMA status matches the expected task graph:
status = client.get_mma_status()
tickets = status.get('active_tickets', [])
assert len(tickets) >= 2
assert any(t['id'] == 'mock-ticket-1' for t in tickets)
Stream Telemetry
Check mma_streams to ensure output from multiple tiers is correctly captured and routed:
streams = status.get('mma_streams', {})
tier3_keys = [k for k in streams.keys() if "Tier 3" in k]
assert len(tier3_keys) > 0
assert "SUCCESS" in streams[tier3_keys[0]]
Modal State
Assert that the correct dialog is active during a pending tool call:
status = client.get_mma_status()
assert status.get('pending_tool_approval') == True
# or
diag = client.get_indicator_state('thinking')
assert diag.get('thinking') == True
Performance Monitoring
Verify UI responsiveness under load:
perf = client.get_performance()
assert perf['fps'] > 30
assert perf['input_lag_ms'] < 100
Supporting Analysis Modules
file_cache.py — ASTParser (tree-sitter)
class ASTParser:
def __init__(self, language: str = "python"):
self.language = tree_sitter.Language(tree_sitter_python.language())
self.parser = tree_sitter.Parser(self.language)
def parse(self, code: str) -> tree_sitter.Tree
def get_skeleton(self, code: str) -> str
def get_curated_view(self, code: str) -> str
get_skeleton algorithm:
1. Parse code to a tree-sitter AST.
2. Walk all `function_definition` nodes.
3. For each body (`block` node):
   - If the first non-comment child is a docstring: preserve the docstring, replace the rest with `...`.
   - Otherwise: replace the entire body with `...`.
4. Apply edits in reverse byte order (maintains valid offsets).
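The reverse-byte-order trick can be illustrated in isolation (a sketch with a simple `(start, end, replacement)` edit shape, not the parser's actual edit type):

```python
def apply_edits(source: bytes, edits: list[tuple[int, int, bytes]]) -> bytes:
    """Apply (start, end, replacement) byte edits back-to-front so that
    earlier offsets are never invalidated by an earlier splice."""
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        source = source[:start] + replacement + source[end:]
    return source
```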
get_curated_view algorithm:
Enhanced skeleton that preserves bodies under two conditions:
- Function has a `@core_logic` decorator.
- Function body contains a `# [HOT]` comment anywhere in its descendants.
If either condition is true, the body is preserved verbatim. This enables a two-tier code view: hot paths shown in full, boilerplate compressed.
summarize.py — Heuristic File Summaries
Token-efficient structural descriptions without AI calls:
_SUMMARISERS: dict[str, Callable] = {
".py": _summarise_python, # imports, classes, methods, functions, constants
".toml": _summarise_toml, # table keys + array lengths
".md": _summarise_markdown, # h1-h3 headings
".ini": _summarise_generic, # line count + preview
}
`_summarise_python` uses stdlib `ast`:
- Parse with `ast.parse()`.
- Extract deduplicated imports (top-level module names only).
- Extract `ALL_CAPS` constants (both `Assign` and `AnnAssign`).
- Extract classes with their method names.
- Extract top-level function names.
Output:
**Python** — 150 lines
imports: ast, json, pathlib
constants: TIMEOUT_SECONDS
class ASTParser: __init__, parse, get_skeleton
functions: summarise_file, build_summary_markdown
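The extraction steps can be sketched with stdlib `ast` alone (hypothetical function returning a dict rather than the rendered markdown above):

```python
import ast

def summarise_python(code: str) -> dict:
    """Collect imports, ALL_CAPS constants, classes+methods, and top-level
    functions from a module, per the steps described above (sketch)."""
    tree = ast.parse(code)
    imports: set[str] = set()
    constants, functions = [], []
    classes: dict[str, list[str]] = {}
    for node in tree.body:
        if isinstance(node, ast.Import):
            imports.update(a.name.split(".")[0] for a in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                imports.add(node.module.split(".")[0])
        elif isinstance(node, (ast.Assign, ast.AnnAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            for t in targets:
                if isinstance(t, ast.Name) and t.id.isupper():
                    constants.append(t.id)
        elif isinstance(node, ast.ClassDef):
            classes[node.name] = [n.name for n in node.body
                                  if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(node.name)
    return {"imports": sorted(imports), "constants": constants,
            "classes": classes, "functions": functions}
```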
outline_tool.py — Hierarchical Code Outline
class CodeOutliner:
def outline(self, code: str) -> str
Walks top-level ast nodes:
- `ClassDef` → `[Class] Name (Lines X-Y)` + docstring + recurse for methods
- `FunctionDef` → `[Func] Name (Lines X-Y)`, or `[Method] Name` if nested
- `AsyncFunctionDef` → `[Async Func] Name (Lines X-Y)`
Only extracts first line of docstrings. Uses indentation depth as heuristic for method vs function.
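A simplified sketch of the walk (using class membership rather than indentation depth, so only a rough approximation of the heuristic described):

```python
import ast

def outline(code: str) -> list[str]:
    """Produce '[Class]/[Func]/[Method]' outline lines for top-level nodes."""
    lines: list[str] = []
    for node in ast.parse(code).body:
        if isinstance(node, ast.ClassDef):
            lines.append(f"[Class] {node.name} (Lines {node.lineno}-{node.end_lineno})")
            for sub in node.body:  # Direct members are labeled as methods.
                if isinstance(sub, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    lines.append(f"  [Method] {sub.name}")
        elif isinstance(node, ast.AsyncFunctionDef):
            lines.append(f"[Async Func] {node.name} (Lines {node.lineno}-{node.end_lineno})")
        elif isinstance(node, ast.FunctionDef):
            lines.append(f"[Func] {node.name} (Lines {node.lineno}-{node.end_lineno})")
    return lines
```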
Two Parallel Code Analysis Implementations
The codebase has two parallel approaches for structural code analysis:
| Aspect | `file_cache.py` (tree-sitter) | `summarize.py` / `outline_tool.py` (stdlib ast) |
|---|---|---|
| Parser | tree-sitter with `tree_sitter_python` | Python's built-in `ast` module |
| Precision | Byte-accurate, preserves exact syntax | Line-level, may lose formatting nuance |
| `@core_logic` / `[HOT]` | Supported (selective body preservation) | Not supported |
| Used by | `py_get_skeleton` MCP tool, worker context injection | `get_file_summary` MCP tool, `py_get_code_outline` |
| Performance | Slightly slower (C extension + tree walk) | Faster (pure Python, simpler walk) |