manual_slop

Private

Public Access

Author	SHA1	Message	Date
ed	dc5e581368	chore(track): archive throw-away scripts for result_migration_review_pass_20260617 (4 helper scripts + sites_to_classify.json)	2026-06-17 17:02:27 -04:00
ed	4b5d5caa8b	docs(tier2): hand off to tier 1 - architectural investigation of stack overflow User indicated they want tier 1 to investigate ('something feels architecturally wrong'). Investigation summary: ROOT CAUSE: imgui.set_window_focus('Response') called on the same frame as the response render, when _trigger_blink is set by _handle_ai_response. The native call exhausts the main thread's 1.94MB stack. VERIFIED: disabling _trigger_blink and _autofocus_response_tab makes the test PASS. The process survives, the response event arrives with correct error text. HISTORY CHECK (git log -S): - _trigger_blink: pre-existing since March 2026 (`c88330cc` feat(hot- reload) Exhaustive region grouping for module-level render funcs) - _autofocus_response_tab: pre-existing since March 6 2026 (`0e9f84f0` 'fixing') - set_window_focus in render_response_panel: pre-existing since `96a013c3` 'fixes and possible wip gui_2/theme_2 for multi-viewport' - response event flow: pre-existing since `68861c07` feat(mma): Decouple UI from API calls using UserRequestEvent and AsyncEventQueue - FR1 (send_result error routing): commit `24ba2499` (Jun 15 2026) in public_api_migration_and_ui_polish_20260615 track The jank is OLDER than the user thinks. The most likely explanation: the test was never run as part of the regular tier-3 batch, so the crash was masked by the Isolated-Pass Verification Fallacy. QUESTIONS FOR TIER 1: 1. Is _trigger_blink a sound design? 2. Should imgui focus changes be deferred to next frame's idle phase? 3. Is there a general principle that no native imgui call should be made during the same frame as a draw call? PROPOSED MINIMAL FIX: defer set_window_focus to next frame's idle phase via a _pending_focus_response flag handled in _process_pending_gui_tasks (which runs before the render).	2026-06-17 13:40:12 -04:00
ed	694cfd2b70	diag(tier2): isolate the jank - _trigger_blink in render_response_panel User asked: 'what does negative flows cause in the imgui procedural dag graph that would cause a recursive processing of the stack?' Tested 4 hypotheses: 1. PYTHONSTACKSIZE env var to bump main thread stack: IGNORED. Main thread stays at 1.94MB regardless of env var or PE header (PE header SizeOfStackReserve is 4TB but Windows OS uses its own default for the main thread commit size). 2. -X faulthandler: doesn't capture native STATUS_STACK_OVERFLOW (faulthandler only catches Python-level signals). 3. Editbin /STACK: editbin not installed on this system. 4. PE header patching with ctypes: SizeOfStackReserve is 4TB but the OS commits only 1.94MB for the main thread and Python doesn't honor any env var to change it. The breakthrough: monkey-patched _handle_ai_response via sitecustomize to disable _trigger_blink and _autofocus_response_tab. Result: WITHOUT _trigger_blink: process survives 60s, response event arrives with status='error' and correct error text. The test WOULD PASS. WITH _trigger_blink (default): process dies with 0xC00000FD (STATUS_STACK_OVERFLOW) within 1s of click. The jank: in src/gui_2.py:render_response_panel (line 5537), the _trigger_blink flag triggers imgui.set_window_focus('Response') on the SAME frame as the response render. This native imgui call apparently triggers imgui-bundle to do extra C++ draw work that exhausts the main thread's 1.94MB stack. Why negative_flows specifically: it's the ONLY tier-3 test where the error response triggers the _trigger_blink path. Success responses also trigger _trigger_blink but don't crash (perhaps because imgui- bundle's layout calculations for an error overlay are heavier than for a normal text response). User predicted: 'i wont solve it but just pad out until failure'. Confirmed - bumping stack didn't fix it (couldn't bump anyway, but the prediction about recursion-related behavior is on track). The fix (per user's framing 'needs to be guarded'): wrap the set_window_focus call in render_response_panel in a try/except or add a stack-depth guard before calling it. Or move the _trigger_blink logic to a deferred frame to avoid the same-frame race with the response render.	2026-06-17 13:22:38 -04:00
ed	cc234b1b83	docs(tier2): architecture check - click chain isolation is correct Per user question about whether execution is properly isolated between AppController and gui_2.py main thread. Verified by reading the architecture contract (docs/guide_architecture.md lines 12, 884-890) and the two click handlers in question: - _handle_generate_send (btn_gen_send): self.submit_io(worker) - _cb_plan_epic (btn_mma_plan_epic): self.submit_io(_bg_task) BOTH click handlers return immediately after submitting work. The heavy AI call (ai_client.send -> subprocess.Popen -> process.communicate) runs on the io_pool worker thread. The execution isolation between AppController and gui_2.py's main render thread IS being followed. The crash (STATUS_STACK_OVERFLOW, 0xC00000FD) is NOT in the click handler chain. It IS in the main thread's imgui-bundle render loop. The render loop runs concurrently with the io_pool worker's subprocess operations. imgui-bundle's per-frame C++ draw code can exceed the main thread's 1.94 MB stack (verified via kernel32.GetCurrentThreadStackLimits). What aspect of negative_flows triggers this: the error-response render path. MOCK_MODE=malformed_json causes the adapter to raise, which triggers _handle_request_event to emit a 'response' event with status='error'. The render loop draws this error response on the next frame, exhausting the main thread's stack. test_visual_orchestration.py uses the same provider setup but does NOT set MOCK_MODE, so the mock defaults to 'success' mode, the adapter returns normally, no error event, no crash. Empirically PASSED in 11.01s. The architecture's render-loop contract assumes imgui-bundle's C stack usage is bounded. It's not. The architecture has no enforcement mechanism (no stack guard, no per-frame stack measurement, no graceful degradation). Next step (post-compact): capture Windows crash dump via procdump to identify the specific imgui-bundle draw call.	2026-06-17 13:09:57 -04:00
ed	cc2105dc65	docs(tier2): what's special about test_z_negative_flows User asked why this test is uniquely affected. Answer: it's the ONLY tier-3 test where the AI call runs ASYNCHRONOUSLY in the io_pool worker while the imgui-bundle render loop continues on the main thread. Verified: test_visual_orchestration.py::test_mma_epic_lifecycle uses the same provider setup (gemini_cli + mock_gemini_cli.py + click) but calls orchestrator_pm.generate_tracks() synchronously in the main thread, blocking the render loop. It PASSES in 11s. test_mma_step_mode_sim.py::test_mma_step_mode_approval_flow also uses the async path but is @pytest.mark.skipif(not RUN_MMA_INTEGRATION) - skipped by default. Would likely also crash if unsuppressed. All other MockProvider tests short-circuit at ai_client.send and never spawn a subprocess. The crash is on the MAIN thread (1.94 MB stack, verified via kernel32.GetCurrentThreadStackLimits), not the io_pool worker (which has 8MB after threading.stack_size(8MB) patch). The main thread's imgui-bundle render loop runs concurrently with the io_pool worker's subprocess.Popen / process.communicate. The accumulated imgui-bundle C++ frames exhaust the main thread's 1.94 MB stack. This explains: - Why bumping io_pool stack to 8MB doesn't help (the patch can't reach the main thread, which was created before any sitecustomize runs). - Why the standalone subprocess call works (no render loop concurrent). - Why the no-click baseline survives 60s (no AI call to trigger the race). Next step: capture a Windows crash dump via procdump or cdb.exe to confirm the crashing thread is the main thread and identify the specific imgui-bundle C++ stack frame.	2026-06-17 12:58:15 -04:00

5 Commits