docs(tier2): architecture check - click chain isolation is correct
Answers a user question about whether execution is properly isolated between AppController and gui_2.py's main thread.

Verified by reading the architecture contract (docs/guide_architecture.md lines 12, 884-890) and the two click handlers in question:
- _handle_generate_send (btn_gen_send): self.submit_io(worker)
- _cb_plan_epic (btn_mma_plan_epic): self.submit_io(_bg_task)

BOTH click handlers return immediately after submitting work. The heavy AI call (ai_client.send -> subprocess.Popen -> process.communicate) runs on the io_pool worker thread. The execution isolation between AppController and gui_2.py's main render thread IS being followed.

The crash (STATUS_STACK_OVERFLOW, 0xC00000FD) is NOT in the click handler chain. It IS in the main thread's imgui-bundle render loop, which runs concurrently with the io_pool worker's subprocess operations. imgui-bundle's per-frame C++ draw code can exceed the main thread's 1.94 MB stack (stack size verified via kernel32.GetCurrentThreadStackLimits).

What aspect of negative_flows triggers this: the error-response render path. MOCK_MODE=malformed_json causes the adapter to raise, which leads _handle_request_event to emit a 'response' event with status='error'. The render loop draws this error response on the next frame, exhausting the main thread's stack. test_visual_orchestration.py uses the same provider setup but does NOT set MOCK_MODE, so the mock defaults to 'success' mode: the adapter returns normally, no error event is emitted, and nothing crashes. Empirically PASSED in 11.01s.

The architecture's render-loop contract assumes imgui-bundle's C stack usage is bounded. It's not. The architecture has no enforcement mechanism (no stack guard, no per-frame stack measurement, no graceful degradation).

Next step (post-compact): capture a Windows crash dump via procdump to identify the specific imgui-bundle draw call.
# Architecture Check: Click Chain vs Main Thread Isolation
## Contract (from `docs/guide_architecture.md`)
- **`gui_2.py`** should be a **pure visualization of application state**. State mutations occur only through lock-guarded queues consumed on the main render thread.
- **Background threads never write GUI state directly** - they serialize task dicts for later consumption.
- **Click handlers must be FAST** - they should submit heavy work to background threads (io_pool, MMA WorkerPool) and return immediately.
- **Single-writer principle**: all GUI state mutations happen on the main thread via `_process_pending_gui_tasks`.
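The contract above is a producer/consumer pattern: workers describe changes, and one thread applies them. A minimal sketch, assuming a plain list guarded by `threading.Lock` and a `ThreadPoolExecutor` standing in for io_pool; `PendingGuiTasks`, `GuiState`, `process_pending_gui_tasks`, and `demo` are hypothetical names, not the real gui_2.py/AppController code:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PendingGuiTasks:
    """Lock-guarded list of task dicts (hypothetical; mirrors the contract only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = []

    def push(self, task: dict) -> None:
        # Background threads only describe the change; they never touch GUI state.
        with self._lock:
            self._tasks.append(task)

    def drain(self) -> list:
        # Swap the list under the lock so producers are blocked only briefly.
        with self._lock:
            tasks, self._tasks = self._tasks, []
        return tasks


class GuiState:
    """State that only the render (main) thread is allowed to write."""

    def __init__(self):
        self.responses = []
        self.status = "idle"


def process_pending_gui_tasks(queue: PendingGuiTasks, state: GuiState) -> None:
    """Hypothetical analogue of _process_pending_gui_tasks: the single writer."""
    for task in queue.drain():
        if task.get("action") == "response":
            state.responses.append(task["text"])
            state.status = task["status"]


def demo(n_tasks: int = 10) -> GuiState:
    queue, state = PendingGuiTasks(), GuiState()
    # Stand-in for io_pool: several workers push task dicts concurrently.
    with ThreadPoolExecutor(max_workers=4) as io_pool:
        for i in range(n_tasks):
            io_pool.submit(queue.push, {"action": "response", "text": f"r{i}", "status": "ok"})
    # One "frame": the calling thread applies everything queued so far.
    process_pending_gui_tasks(queue, state)
    return state
```

Swapping the list out under the lock keeps the critical section tiny, so a worker finishing mid-frame never waits on rendering.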
## Verification of the contract
| Click handler | Work submission | Compliant? |
|---|---|---|
| `_handle_generate_send` (btn_gen_send) | `self.submit_io(worker)` | YES |
| `_cb_plan_epic` (btn_mma_plan_epic) | `self.submit_io(_bg_task)` | YES |

Both handlers return immediately after submitting work. The heavy AI call (`ai_client.send` -> `subprocess.Popen` -> `process.communicate`) runs on the io_pool worker thread, not on the main thread. The execution isolation between AppController and gui_2.py's main render thread IS being followed.
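The two properties checked by reading the code (the handler returns without waiting, and the heavy call runs off the main thread) can also be checked empirically. A hedged sketch: `submit_io` here is a hypothetical stand-in built on `ThreadPoolExecutor`, `heavy_call` fakes `ai_client.send` by blocking on a `threading.Event`, and `check_click_handler`, `compliant`, and `blocking` are illustration-only names:

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional

# Hypothetical stand-in for the io_pool behind AppController.submit_io.
io_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="io_pool")


def submit_io(fn: Callable[[], None]) -> Future:
    return io_pool.submit(fn)


def compliant(heavy: Callable[[], None]) -> Callable[[], Optional[Future]]:
    # Shaped like _handle_generate_send / _cb_plan_epic: hand off and return.
    return lambda: submit_io(heavy)


def blocking(heavy: Callable[[], None]) -> Callable[[], Optional[Future]]:
    # What a contract violation would look like: the heavy call runs inline.
    def handler() -> None:
        heavy()
    return handler


def check_click_handler(make_handler, block_for: float = 1.0):
    """Return (seconds the handler took, whether the heavy call ran on the main thread)."""
    release = threading.Event()
    seen = {}

    def heavy_call() -> None:
        # Fakes ai_client.send -> subprocess.Popen -> communicate() by blocking.
        seen["on_main"] = threading.current_thread() is threading.main_thread()
        release.wait(block_for)

    handler = make_handler(heavy_call)
    start = time.perf_counter()
    future = handler()
    elapsed = time.perf_counter() - start
    release.set()  # unblock the fake heavy call
    if future is not None:
        future.result(timeout=5)  # wait so `seen` is filled in
    return elapsed, seen["on_main"]
```

Called from the main thread, the compliant handler returns almost instantly and reports `False` for the main-thread check; the blocking variant takes the full `block_for` and reports `True`.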
## What's actually crashing
The crash (STATUS_STACK_OVERFLOW, 0xC00000FD) is NOT in the click handler chain. It IS in the **main thread's imgui-bundle render loop**.

The render loop runs concurrently with the io_pool worker's subprocess operations. Each frame, imgui-bundle's C++ draw code consumes native stack on the main thread, which has a 1.94 MB stack (verified via `kernel32.GetCurrentThreadStackLimits`). Under the conditions described in the next section, imgui-bundle's per-frame native stack usage exceeds that limit. The exact draw call has not been identified yet.

The crash is NOT an architecture violation by the application code. It's a constraint violation by imgui-bundle's native draw code, which needs more stack than the main thread has.
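The 1.94 MB figure comes from `kernel32.GetCurrentThreadStackLimits`, which reports the low and high addresses of the *calling* thread's stack (Windows 8 and later), so it has to be called on the render (main) thread to measure that thread. A minimal ctypes sketch; `current_thread_stack_bytes` and `format_mib` are hypothetical helper names, and the function returns `None` off Windows:

```python
import ctypes
import sys
from typing import Optional


def current_thread_stack_bytes() -> Optional[int]:
    """Size of the calling thread's stack region in bytes (Windows 8+), else None."""
    if sys.platform != "win32":
        return None
    low = ctypes.c_size_t()   # ULONG_PTR is pointer-sized and unsigned
    high = ctypes.c_size_t()
    # VOID GetCurrentThreadStackLimits(PULONG_PTR LowLimit, PULONG_PTR HighLimit)
    ctypes.windll.kernel32.GetCurrentThreadStackLimits(ctypes.byref(low), ctypes.byref(high))
    return high.value - low.value


def format_mib(n_bytes: int) -> str:
    """Render a byte count in binary megabytes (1 MiB = 1024 * 1024 bytes)."""
    return f"{n_bytes / (1024 * 1024):.2f} MiB"
```

`format_mib` divides by 1024², so, for example, a 2,031,616-byte range prints as `1.94 MiB`.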
## What aspect of negative_flows triggers this
The aspect: **negative_flows triggers the error-response render path**.

- `test_z_negative_flows.py` sets `MOCK_MODE=malformed_json` -> the `mock_gemini_cli.py` subprocess prints broken JSON and exits with code 1.
- The adapter raises an exception -> `_send_gemini_cli` catches it and returns `Result(ok=False)` -> `_handle_request_event` emits a "response" event with `status="error"` -> the render loop processes the event and draws the error response on the next frame.
- Other tier-3 tests don't hit this path because they either use MockProvider (no subprocess, no exception, no error render) or use the success-mode mock (the adapter returns normally, no error event).

`test_visual_orchestration.py` uses the same provider setup but does NOT set MOCK_MODE, so the mock defaults to "success" mode: the adapter returns normally, no exception is raised, no error response is rendered, and nothing crashes. **Empirically verified: this test PASSES in 11.01s.**
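The error path and its success-mode counterpart can be reproduced in miniature. A sketch under stated assumptions: the inline `MOCK_CLI` script, `Result`, `adapter_send`, `send_cli`, and `handle_request_event` are hypothetical stand-ins for `mock_gemini_cli.py`, the adapter, `_send_gemini_cli`, and `_handle_request_event`, and the success status `"ok"` is assumed (the report only names `status="error"`):

```python
import json
import os
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for mock_gemini_cli.py; behaviour is selected by MOCK_MODE.
MOCK_CLI = r'''
import os, sys
if os.environ.get("MOCK_MODE", "success") == "malformed_json":
    print("{not valid json")
    sys.exit(1)
print('{"text": "hello"}')
'''


@dataclass
class Result:
    ok: bool
    text: str = ""
    error: str = ""


def adapter_send(mode: Optional[str] = None) -> str:
    """Run the mock CLI and parse its JSON; raises on any failure, like the adapter."""
    env = dict(os.environ)
    env.pop("MOCK_MODE", None)
    if mode is not None:
        env["MOCK_MODE"] = mode
    proc = subprocess.run([sys.executable, "-c", MOCK_CLI],
                          capture_output=True, text=True, env=env)
    if proc.returncode != 0:
        raise RuntimeError(f"mock CLI exited {proc.returncode}: {proc.stdout.strip()}")
    return json.loads(proc.stdout)["text"]


def send_cli(mode: Optional[str] = None) -> Result:
    """Analogue of _send_gemini_cli: catch the adapter's exception, return a Result."""
    try:
        return Result(ok=True, text=adapter_send(mode))
    except Exception as exc:
        return Result(ok=False, error=str(exc))


def handle_request_event(mode: Optional[str] = None) -> dict:
    """Analogue of _handle_request_event: turn a Result into a 'response' event."""
    result = send_cli(mode)
    status = "ok" if result.ok else "error"
    return {"type": "response", "status": status, "text": result.text or result.error}
```

With `MOCK_MODE=malformed_json` the event carries `status="error"` and the exception text, which is what the render loop then has to draw; with no `MOCK_MODE` it carries the parsed reply, mirroring the split between `test_z_negative_flows.py` and `test_visual_orchestration.py`.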
## Why the architecture needs updating
The architecture's render-loop contract assumes imgui-bundle's C stack usage is bounded. It's not, at least not within the main thread's stack. Specifically:

- The render loop runs on the main thread (1.94 MB stack, PE-header-baked).
- imgui-bundle's per-frame draw code can need more stack than that. The suspects are large error overlays, complex text, and extensive draw lists, though the specific call is not yet confirmed.
- When an event emitted by the io_pool worker selects a specific render path (here, the error response), the main thread's render loop exceeds its 1.94 MB stack.
- The architecture has no enforcement mechanism for this (no stack guard, no per-frame stack measurement, no graceful degradation).
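As one illustration of "graceful degradation", the render side could bound how much error text a single frame is asked to draw, passing the clamped string to whichever imgui call draws the error. This is a hypothetical helper: `clamp_for_render` and its default limits are assumptions, and it is unverified whether shorter text actually reduces imgui-bundle's native stack depth. If the crash disappears with a clamped error response, that would at least narrow the suspects:

```python
def clamp_for_render(text: str, max_lines: int = 40, max_chars: int = 4000) -> str:
    """Bound the error text a single frame draws (hypothetical helper and limits)."""
    lines = text.splitlines()
    clipped = len(lines) > max_lines
    out = "\n".join(lines[:max_lines])
    if len(out) > max_chars:
        out = out[:max_chars]
        clipped = True
    if clipped:
        # Keep a visible marker so the user knows the message was shortened.
        out += f"\n[... truncated from {len(text)} chars ...]"
    return out
```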
## Where to investigate next (post-compact)
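1. Capture a Windows crash dump to identify the specific imgui-bundle draw call that exhausts the main thread's stack. ProcDump only launches a program when given `-x <dump folder> <image>`; otherwise its first argument names an already-running process to attach to. Because `uv run` starts Python as a child process anyway, register ProcDump as the postmortem (AeDebug) debugger from an elevated prompt instead, so any crashing process is dumped (`C:\dumps` is an example folder):

   ```
   procdump -ma -i C:\dumps
   uv run python sloppy.py --enable-test-hooks
   ```

   The registration is system-wide; run `procdump -u` afterwards to remove it. Open the .dmp in WinDbg and run `!analyze -v` to see the crashing thread and the exact C++ stack frame.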
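2. Bump the main thread's stack at the OS level (out of scope for a 1-track fix) by raising the executable's stack reserve to 8 MiB:

   ```
   editbin /STACK:8388608 C:\projects\manual_slop_tier2\.venv\Scripts\python.exe
   ```

   `editbin` rewrites only the header of the file it is given. If the venv's `python.exe` is a launcher that starts the base interpreter as a child process, the base interpreter's `python.exe` is the binary whose stack reserve applies to the render thread.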
3. Long-term: consider imgui-bundle's offscreen rendering mode so the main thread isn't doing heavy C++ draw calls.
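For step 2, it helps to confirm which binary actually carries the larger reserve: a standard Windows venv's `python.exe` has been a redirector launcher since CPython 3.7.2, and uv-created venvs may behave similarly. A sketch that reads `SizeOfStackReserve` with only `struct`; the offsets follow the PE/COFF format (a 4-byte field at offset 72 of a PE32 optional header, an 8-byte field at the same offset for PE32+), and `stack_reserve_from_pe` / `stack_reserve` are hypothetical helper names:

```python
import struct


def stack_reserve_from_pe(data: bytes) -> int:
    """Return SizeOfStackReserve from the raw bytes of a PE (.exe/.dll) image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = pe_offset + 4 + 20  # skip the 4-byte signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:  # PE32: 4-byte SizeOfStackReserve at offset 72
        return struct.unpack_from("<I", data, opt + 72)[0]
    if magic == 0x20B:  # PE32+ (64-bit): 8-byte field at the same offset
        return struct.unpack_from("<Q", data, opt + 72)[0]
    raise ValueError(f"unknown optional-header magic {magic:#x}")


def stack_reserve(path: str) -> int:
    """Read a PE file from disk and return its SizeOfStackReserve in bytes."""
    with open(path, "rb") as f:
        return stack_reserve_from_pe(f.read())
```

For example, compare `stack_reserve(sys.executable)` with `stack_reserve(os.path.join(sys.base_prefix, "python.exe"))` before and after running `editbin`; the second path assumes the usual Windows layout with the base interpreter at the root of `sys.base_prefix`.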
## Files referenced by this report

- `docs/reports/NEGATIVE_FLOWS_INVESTIGATION_20260617_REFINED.md` (the prior investigation)
- `scripts/tier2/artifacts/send_result_to_send_20260616/WHATS_SPECIAL.md` (previous round: what's unique about this test)
- `scripts/tier2/artifacts/send_result_to_send_20260616/test_visual_orch_out.txt` (visual_orchestration PASSED with the same provider setup)
- `logs/sloppy_no_click_*.log` (no-click baseline: the process survives 60s)
- `docs/guide_architecture.md` lines 12, 884-890 (the contract)
- `src/app_controller.py` `_handle_generate_send` (line 3434) and `_cb_plan_epic` (line 4025) (the click handlers, both compliant)