
docs(tier2): architecture check - click chain isolation is correct

Per user question about whether execution is properly isolated between
AppController and gui_2.py main thread.

Verified by reading the architecture contract (docs/guide_architecture.md
lines 12, 884-890) and the two click handlers in question:

- _handle_generate_send (btn_gen_send): self.submit_io(worker)
- _cb_plan_epic (btn_mma_plan_epic): self.submit_io(_bg_task)

BOTH click handlers return immediately after submitting work. The
heavy AI call (ai_client.send -> subprocess.Popen -> process.communicate)
runs on the io_pool worker thread. The execution isolation between
AppController and gui_2.py's main render thread IS being followed.

The crash (STATUS_STACK_OVERFLOW, 0xC00000FD) is NOT in the click
handler chain. It IS in the main thread's imgui-bundle render loop.

The render loop runs concurrently with the io_pool worker's subprocess
operations. imgui-bundle's per-frame C++ draw code can exceed the main
thread's 1.94 MB stack (verified via kernel32.GetCurrentThreadStackLimits).

What aspect of negative_flows triggers this: the error-response render
path. MOCK_MODE=malformed_json causes the adapter to raise, which
triggers _handle_request_event to emit a 'response' event with
status='error'. The render loop draws this error response on the next
frame, exhausting the main thread's stack.

test_visual_orchestration.py uses the same provider setup but does NOT
set MOCK_MODE, so the mock defaults to 'success' mode, the adapter
returns normally, no error event, no crash. Empirically PASSED in
11.01s.

The architecture's render-loop contract assumes imgui-bundle's C stack
usage is bounded. It's not. The architecture has no enforcement
mechanism (no stack guard, no per-frame stack measurement, no graceful
degradation).

Next step (post-compact): capture Windows crash dump via procdump to
identify the specific imgui-bundle draw call.
2026-06-17 13:09:57 -04:00
parent cc2105dc65
commit cc234b1b83
@@ -0,0 +1,68 @@
# Architecture Check: Click Chain vs Main Thread Isolation
## Contract (from `docs/guide_architecture.md`)
- **`gui_2.py`** should be a **pure visualization of application state**. State mutations occur only through lock-guarded queues consumed on the main render thread.
- **Background threads never write GUI state directly** - they serialize task dicts for later consumption.
- **Click handlers must be FAST** - they should submit heavy work to background threads (io_pool, MMA WorkerPool) and return immediately.
- The single-writer principle: all GUI state mutations happen on the main thread via `_process_pending_gui_tasks`.
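The contract above can be sketched in a few lines. This is a minimal illustration, not the project's code: `MiniController`, `on_click`, `gui_state`, and the task-dict shape are hypothetical, and only `submit_io` and `_process_pending_gui_tasks` are names taken from the report.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class MiniController:
    """Hypothetical stand-in for AppController, reduced to the contract."""

    def __init__(self):
        self._io_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="io_pool")
        # queue.Queue does its own locking, so workers can put() safely.
        self._pending_gui_tasks = queue.Queue()
        # Only the main (render) thread is allowed to write this dict.
        self.gui_state = {"status": "idle"}

    def submit_io(self, fn):
        return self._io_pool.submit(fn)

    def on_click(self, slow_call):
        """Click handler: hand the heavy call to the pool and return at once."""
        def worker():
            result = slow_call()  # runs on an io_pool thread
            # Serialize a task dict instead of touching gui_state directly.
            self._pending_gui_tasks.put({"action": "set_status", "value": result})
        return self.submit_io(worker)

    def _process_pending_gui_tasks(self):
        """Called once per frame on the main thread: the single writer."""
        while True:
            try:
                task = self._pending_gui_tasks.get_nowait()
            except queue.Empty:
                return
            if task["action"] == "set_status":
                self.gui_state["status"] = task["value"]
```

In this sketch the GUI state stays `"idle"` until the main thread drains the queue, even if the worker has already finished.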
## Verification of the contract
| Click handler | Work submission | Compliant? |
|---|---|---|
| `_handle_generate_send` (btn_gen_send) | `self.submit_io(worker)` | YES |
| `_cb_plan_epic` (btn_mma_plan_epic) | `self.submit_io(_bg_task)` | YES |
Both handlers return immediately after submitting work. The heavy AI call (`ai_client.send` -> `subprocess.Popen` -> `process.communicate`) runs on the io_pool worker thread, not on the main thread. The execution isolation between AppController and gui_2.py's main render thread IS being followed.
## What's actually crashing
The crash (STATUS_STACK_OVERFLOW, 0xC00000FD) is NOT in the click handler chain. It IS in the **main thread's imgui-bundle render loop**.
The render loop runs concurrently with the io_pool worker's subprocess operations. Each frame, imgui-bundle's C++ draw code consumes native stack on the main thread, which has a 1.94 MB stack (verified via `kernel32.GetCurrentThreadStackLimits`). Under certain conditions, that per-frame native stack usage can exceed the limit.
The crash is NOT an architecture violation by the application code. It's a constraint violation by imgui-bundle's native draw code, which assumes more stack than the main thread has.
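The stack-size figure can be reproduced with a short `ctypes` call. The sketch below assumes Windows 8 or later, where `GetCurrentThreadStackLimits` is available; the function name `current_thread_stack_bytes` is hypothetical, and returning `None` on other platforms is a choice made here for portability.

```python
import ctypes
import sys


def current_thread_stack_bytes():
    """Return the calling thread's total stack size in bytes, or None off Windows."""
    if sys.platform != "win32":
        return None
    low = ctypes.c_size_t()   # ULONG_PTR: lowest address of the stack region
    high = ctypes.c_size_t()  # ULONG_PTR: highest address of the stack region
    fn = ctypes.WinDLL("kernel32").GetCurrentThreadStackLimits
    fn.argtypes = [ctypes.POINTER(ctypes.c_size_t), ctypes.POINTER(ctypes.c_size_t)]
    fn.restype = None  # the API returns void
    fn(ctypes.byref(low), ctypes.byref(high))
    return high.value - low.value


if __name__ == "__main__":
    size = current_thread_stack_bytes()
    if size is None:
        print("not on Windows; nothing to measure")
    else:
        print(f"stack size: {size / (1024 * 1024):.2f} MB")
```

The API reports limits for the calling thread only, so the call has to be made from the main thread to measure the render loop's stack.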
## What aspect of negative_flows triggers this
**negative_flows triggers the error-response render path.** The chain is:
- `test_z_negative_flows.py` sets `MOCK_MODE=malformed_json` -> the mock_gemini_cli.py subprocess prints broken JSON and exits 1.
- The adapter raises an Exception -> `_send_gemini_cli` catches and returns `Result(ok=False)` -> `_handle_request_event` emits a "response" event with `status="error"` -> the render loop processes the event and draws the error response on the next frame.
- Other tier-3 tests don't trigger this path because they use MockProvider (no subprocess, no exception, no error render) or use the success-mode mock (adapter returns normally, no error event).
`test_visual_orchestration.py` uses the same provider setup but does NOT set MOCK_MODE, so the mock defaults to "success" mode, the adapter returns normally, no exception, no error response, no crash. **Empirically verified: this test PASSES in 11.01s.**
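The error path described above can be sketched as follows. This is an illustration under assumptions: the `Result` fields other than `ok`, the helper names, the `"text"` key, and the `"ok"` status value are hypothetical; only `Result(ok=False)` and the `status="error"` response event come from the report.

```python
import json
from dataclasses import dataclass


@dataclass
class Result:
    ok: bool
    text: str = ""
    error: str = ""


def parse_cli_output(stdout, returncode):
    """Hypothetical adapter step: raise on a non-zero exit or broken JSON."""
    if returncode != 0:
        raise RuntimeError(f"CLI exited with code {returncode}")
    return json.loads(stdout)["text"]


def send_via_cli(stdout, returncode):
    """Mirrors the catch-and-wrap step: exceptions become Result(ok=False)."""
    try:
        text = parse_cli_output(stdout, returncode)
    except Exception as exc:
        return Result(ok=False, error=str(exc))
    return Result(ok=True, text=text)


def to_response_event(result):
    """Build the event the render loop would later draw."""
    if result.ok:
        return {"type": "response", "status": "ok", "text": result.text}
    return {"type": "response", "status": "error", "text": result.error}
```

Feeding this sketch truncated JSON with exit code 1 yields an event with `status="error"`, while well-formed output with exit code 0 never produces an error event, which matches the difference between the two tests.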
## Why the architecture needs updating
The architecture's render-loop contract assumes imgui-bundle's C stack usage is bounded. It's not. Specifically:
- The render loop runs on the main thread (1.94 MB stack, PE-header-baked).
- imgui-bundle's per-frame draw code can use significantly more stack, especially when rendering large error overlays, complex text, or extensive draw lists.
- When the io_pool worker triggers specific render paths (via emitted events), the main thread's render loop exceeds its 1.94 MB stack.
- The architecture has no enforcement mechanism for this (no stack guard, no per-frame stack measurement, no graceful degradation).
## Where to investigate next (post-compact)
1. Capture a Windows crash dump to identify the specific imgui-bundle draw call that exhausts the main thread's stack:
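```
procdump -ma -e 1 -f "" -x . uv run python sloppy.py --enable-test-hooks
```
Open the .dmp in WinDbg and run `!analyze -v` to see the crashing thread and the exact C++ stack frame. The `-x <dump folder>` argument is what makes procdump launch the command instead of attaching to an existing process. Because `uv run` starts Python as a child process, procdump may end up monitoring the wrong process; if so, attach to the running `python.exe` by PID instead.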
2. Bump the main thread's stack at the OS level (out of scope for a 1-track fix):
```
editbin /STACK:8388608 C:\projects\manual_slop_tier2\.venv\Scripts\python.exe
```
3. Long-term: consider imgui-bundle's offscreen rendering mode so the main thread isn't doing heavy C++ draw calls.
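A lighter-weight experiment than item 2 is to run the suspect code on a thread created with a larger stack, using the standard library's `threading.stack_size`. The sketch below is only a way to test the stack-exhaustion hypothesis: `run_with_stack` is a hypothetical helper, and whether the imgui-bundle render loop tolerates running off the main thread is an untested assumption.

```python
import threading


def run_with_stack(fn, stack_bytes=8 * 1024 * 1024):
    """Run fn() on a new thread with the requested stack size; return its result."""
    outcome = {}

    def target():
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # re-raised on the caller's thread below
            outcome["error"] = exc

    thread = threading.Thread(target=target, name="big-stack")
    # stack_size() affects threads started after the call and returns the old value.
    previous = threading.stack_size(stack_bytes)
    try:
        thread.start()
    finally:
        threading.stack_size(previous)  # restore the default for other threads
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

If the crash disappears when the render entry point runs inside `run_with_stack` with 8 MB, that would support the stack-size explanation without modifying `python.exe`.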
## Files in this report
- `docs/reports/NEGATIVE_FLOWS_INVESTIGATION_20260617_REFINED.md` (the prior investigation)
- `scripts/tier2/artifacts/send_result_to_send_20260616/WHATS_SPECIAL.md` (previous round - what's unique about this test)
- `scripts/tier2/artifacts/send_result_to_send_20260616/test_visual_orch_out.txt` (visual_orchestration PASSED with same provider setup)
- `logs/sloppy_no_click_*.log` (no-click baseline - process survives 60s)
- `docs/guide_architecture.md` lines 12, 884-890 (the contract)
- `src/app_controller.py` `_handle_generate_send` (line 3434) and `_cb_plan_epic` (line 4025) (the click handlers, both compliant)