Per user feedback this round:
1. T-shirt size removed from conductor/workflow.md (policy), conductor/tracks.md (registry), and the prior NEGATIVE_FLOWS_INVESTIGATION_20260617.md report.
2. Layout regenerated from _default_windows (17KB -> 3KB, 10 stale windows -> 3). Layout fix did NOT fix the crash.

Three new diagnostic experiments (results appended to the report):
- diag_no_click.py: process survives 60s without clicks (render loop is stable in isolation; crash is click-triggered).
- diag_thread.py: standalone ThreadPoolExecutor + adapter call works fine in all 3 MOCK_MODE modes (subprocess spawn is not the issue).
- diag_realbig2_run.py: bumping threading.stack_size(8MB) does NOT prevent the crash (io_pool worker is not where the stack is exhausted).

Refined hypothesis: the crash is in the MAIN THREAD's imgui-bundle render loop (1.94 MB stack), running concurrently with the io_pool worker's adapter call. The subprocess spawn + CreateProcessW causes the kernel to allocate resources at the moment the main thread is deep in imgui-bundle C++ frames, exhausting the main thread's small guard page.

What's needed for definitive diagnosis: a Windows crash dump (procdump -ma or cdb.exe) to see the actual C-side stack frame, OR a SetUnhandledExceptionFilter in sitecustomize.py that logs the crashing thread's TEB and call stack to stderr before the process dies.
test_z_negative_flows.py Failure - Refined Root Cause Analysis
Investigator: Tier 2 Tech Lead (autonomous run)
Track context: Post-completion of send_result_to_send_20260616
Previous report: NEGATIVE_FLOWS_INVESTIGATION_20260617.md (now superseded by this one for the root-cause section)
TL;DR
The 3 tests in tests/test_z_negative_flows.py fail with Windows 0xC00000FD = STATUS_STACK_OVERFLOW in the GUI subprocess. The Python call stack at the moment of the crash is only 13 frames deep, so this is not a Python recursion bug. The most likely cause is that the main thread of sloppy.py only has a 1.94 MB stack on this Python 3.11.6 / Windows installation (verified via kernel32.GetCurrentThreadStackLimits). The io_pool workers DO get the 8MB stack from threading.stack_size(8MB) (set by my diagnostic sitecustomize), and the process STILL crashes with 0xC00000FD, which points to the stack overflow being in the main thread, not the io_pool worker.
Why the previous "thread stack is too small" theory is wrong
I previously hypothesized the io_pool's 1MB thread stack was the bottleneck. After running three follow-up experiments, this is no longer credible:
- Bumping threading.stack_size(8 * 1024 * 1024) before any thread is created (via sitecustomize.py loaded into the subprocess) → process still dies with 0xC00000FD. So the io_pool workers and _loop_thread (both created after the sitecustomize) have 8MB stacks, and the process still crashes.
- Replacing concurrent.futures.ThreadPoolExecutor with a custom pool that uses threading.Thread(..., stack_size=8MB) → fails on Python 3.11 because Thread.__init__ does not accept a stack_size kwarg (only the global threading.stack_size() is available). Bypassed that by using the global.
- Running the adapter directly in a ThreadPoolExecutor from a standalone Python process (no imgui-bundle, no render loop) → works fine for all 3 MOCK_MODE values. So the io_pool thread is not the problem in isolation.
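The first two experiments rely on how CPython sizes thread stacks: the size is a process-wide default that applies to threads created after the call, and there is no per-thread argument. A minimal sketch of both points follows; the helper names are illustrative and are not taken from the diagnostic scripts.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

EIGHT_MB = 8 * 1024 * 1024


def run_with_big_worker_stacks(fn, *args):
    """Run fn on a pool worker created after the global stack size is raised."""
    # threading.stack_size() returns the previous setting (0 = platform default).
    previous = threading.stack_size(EIGHT_MB)
    try:
        # Pool workers are started lazily on submit(), i.e. after the call above,
        # so they pick up the new default.
        with ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(fn, *args).result()
    finally:
        threading.stack_size(previous)  # restore the earlier default


def thread_accepts_stack_size_kwarg():
    """Check whether threading.Thread takes a per-thread stack_size argument."""
    try:
        threading.Thread(target=lambda: None, stack_size=EIGHT_MB)
    except TypeError:
        return False
    return True
```

The setting never affects a thread that already exists, which is why the main thread is untouched by it.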
The actual data
Python call stack at crash
Instrumented _send_gemini_cli and GeminiCliAdapter.send via sitecustomize.py. Stack at adapter.send ENTRY:
[STK] _send_gemini_cli ENTRY depth=9
[STK] adapter.send ENTRY depth=13
[STK] sitecustomize.py:25 _walk_stack
[STK] sitecustomize.py:42 _patched_send
[STK] ai_client.py:1853 _send
[STK] ai_client.py:808 run_with_tool_loop
[STK] ai_client.py:1917 _send_gemini_cli
[STK] sitecustomize.py:69 _patched_send_gc
[STK] ai_client.py:3016 send
[STK] app_controller.py:3674 _handle_request_event
[STK] thread.py:58 run <-- io_pool worker
[STK] thread.py:83 _worker
[STK] threading.py:982 run
[STK] threading.py:1045 _bootstrap_inner
[STK] threading.py:1002 _bootstrap
13 frames is trivial. ~6-7KB of Python stack. ~50KB of C stack underneath. No recursion anywhere.
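For reference, a stack walker of the kind that produces the [STK] lines above can be sketched as follows. This is an assumption about the shape of the instrumentation, not the actual contents of sitecustomize.py; walk_stack and log_entry are illustrative names.

```python
import os
import sys


def walk_stack(skip=1):
    """Return (filename, lineno, function) tuples, innermost frame first.

    skip=1 starts at the caller of walk_stack.
    """
    frames = []
    frame = sys._getframe(skip)
    while frame is not None:
        code = frame.f_code
        frames.append((code.co_filename, frame.f_lineno, code.co_name))
        frame = frame.f_back
    return frames


def log_entry(label, out=sys.stderr):
    """Print a depth line plus one line per frame, and return the depth."""
    # skip=2: frame 0 is walk_stack, frame 1 is log_entry, frame 2 is our caller.
    frames = walk_stack(skip=2)
    print(f"[STK] {label} ENTRY depth={len(frames)}", file=out)
    for filename, lineno, name in frames:
        print(f"[STK]   {os.path.basename(filename)}:{lineno} {name}", file=out)
    return len(frames)
```

This only sees Python frames. Native frames underneath (the interpreter loop, ctypes, imgui-bundle) are invisible to it, which is why the depth figure alone cannot rule out a C-side overflow.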
Thread stack sizes in this process (verified)
[DIAGSTK] Set thread stack size to 8388608 bytes
[DIAGSTK] Main thread stack: 1.94 MB
Confirmed via kernel32.GetCurrentThreadStackLimits:
import ctypes
GetCurrentThreadStackLimits = ctypes.windll.kernel32.GetCurrentThreadStackLimits
GetCurrentThreadStackLimits.argtypes = [ctypes.POINTER(ctypes.c_void_p), ctypes.POINTER(ctypes.c_void_p)]
low = ctypes.c_void_p(); high = ctypes.c_void_p()
GetCurrentThreadStackLimits(ctypes.byref(low), ctypes.byref(high))
# Result: high - low = 1.94 MB on the main thread
The main thread's stack is 1.94 MB, set by the Windows PE header (Python 3.11.6's python.exe). The sitecustomize's threading.stack_size(8MB) call sets the default for new threads (the io_pool workers, the _loop_thread, the HookServer thread), but the main thread was created before sitecustomize ran, so it keeps its PE-header-baked 1.94 MB.
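The same measurement can be taken from inside a freshly created thread to confirm that new threads really received the larger stack. The sketch below is Windows-only by nature and returns None elsewhere; the function names are illustrative, and c_size_t is used here as the ctypes equivalent of ULONG_PTR.

```python
import ctypes
import sys
import threading


def current_thread_stack_bytes():
    """Stack reservation of the calling thread in bytes, or None off Windows."""
    if sys.platform != "win32":
        return None
    fn = ctypes.windll.kernel32.GetCurrentThreadStackLimits
    fn.argtypes = [ctypes.POINTER(ctypes.c_size_t), ctypes.POINTER(ctypes.c_size_t)]
    fn.restype = None
    low, high = ctypes.c_size_t(), ctypes.c_size_t()
    fn(ctypes.byref(low), ctypes.byref(high))
    return high.value - low.value


def stack_bytes_in_new_thread():
    """Measure the stack of a thread created now, under the current default."""
    result = []
    worker = threading.Thread(
        target=lambda: result.append(current_thread_stack_bytes())
    )
    worker.start()
    worker.join()
    return result[0]
```

Calling current_thread_stack_bytes() on the main thread and stack_bytes_in_new_thread() after threading.stack_size(8 * 1024 * 1024) should show the two different sizes side by side.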
Process death pattern
$ poll=3221225725 (= 0xC00000FD)
Reproducible 100% across runs and across all 3 MOCK_MODE values (malformed_json, error_result, success).
When the main thread's stack overflows, the whole process dies — including all worker threads. So when the io_pool worker is mid-call to adapter.send, the main thread's stack overflow kills everything.
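The poll value is the NTSTATUS code reported as an unsigned 32-bit exit code. A small helper makes that mapping explicit; the function name and the lookup table are illustrative, and only two well-known codes are listed.

```python
# A deliberately tiny lookup table; extend as needed.
STATUS_NAMES = {
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def describe_exit_code(returncode):
    """Format a Windows process exit code as hex plus a known NTSTATUS name."""
    # Some tools report the same code as a signed 32-bit integer, so normalise.
    unsigned = returncode & 0xFFFFFFFF
    name = STATUS_NAMES.get(unsigned, "unknown")
    return f"0x{unsigned:08X} {name}"


print(describe_exit_code(3221225725))  # 0xC00000FD STATUS_STACK_OVERFLOW
```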
What is the main thread doing during the test?
The main thread runs immapp.run(...) from imgui-bundle, which is the HelloImGui native render loop. It calls our Python _gui_func callback ~60 times/second. The render loop has been running since startup. By the time the test clicks btn_gen_send:
- Roughly 4 seconds of frames have been rendered (1 second of warmup + 0.5s × 6 setup calls), which is on the order of 240 frames at ~60 frames/second
- The imgui-bundle render context has been built up with widgets, fonts, theme
Hypothesis (not yet verified): the render loop is calling into imgui-bundle's native layout/draw code, which is using C++ frames with deep template instantiations. After many frames, the C stack grows. When the click is dispatched and the render loop continues to run alongside the io_pool worker's adapter.send, the main thread's stack hits its 1.94MB guard page and dies.
This is not Python recursion. It's the imgui-bundle native render code's stack usage, accumulated over many frames.
What we know for sure
- The crash is 0xC00000FD = STATUS_STACK_OVERFLOW on Windows. NOT a Python exception.
- The Python call chain at the crash point is 13 frames deep. NOT a Python recursion bug.
- The crash happens in the GUI subprocess (sloppy.py with --enable-test-hooks), not in pytest.
- The crash happens after click("btn_gen_send") is processed, not before. All 6 setup API calls return 200.
- The crash is reproducible 100% with MOCK_MODE in {malformed_json, error_result, success}. Not specific to the exception path.
- The main thread has 1.94 MB. The io_pool workers, after threading.stack_size(8MB), have 8 MB. Bumping the io_pool stack doesn't fix the crash.
- The standalone Python process (no imgui-bundle, no render loop) running the same adapter call from a ThreadPoolExecutor with default 1MB stack works fine for all 3 MOCK_MODE values.
What we don't know yet
- Whether the main thread is actually the one whose stack overflows (vs. a thread we haven't yet identified, e.g., a HelloImGui-internal thread, or a thread created by imgui-bundle). To verify, I'd need to attach a debugger or add SetUnhandledExceptionFilter logging in the subprocess to dump the crashing thread's TEB.
- What specific imgui-bundle code path causes the C stack to grow. Without a debugger or WER crash dump, we can't see the C-side stack trace.
- Whether the stack growth is linear (slow leak over many frames) or sudden (one specific draw call).
Plausible root cause (next investigation step)
The most likely culprit is one of:
- _render_message_panel / _render_response_panel rendering path: when ai_status becomes "error", the response panel starts rendering an error overlay. If the error overlay calls into imgui-bundle with a pathological layout (e.g., add_rect with a malformed argument list, the bug from 9fcf0517), imgui-bundle may recurse deeply into its C++ template metaprogramming for layout calc. Even with the theme fix in 9fcf0517, the C++ stack usage per frame may have grown to the point where the next frame overflows the 1.94MB main thread stack.
- A specific frame's draw call: clicking btn_gen_send triggers _do_generate in a worker, which puts an event on the queue, which gets processed by the render loop on the next frame. The render loop renders the new state. That specific draw call has a deep C++ stack.
- External MCP server thread: if any external MCP server is connected, its thread may have a small stack. But this would be caught by the io_pool stack bump, which we did.
Recommended next steps (in order)
- Capture a Windows Error Reporting (WER) crash dump from the subprocess. Run sloppy.py under a debugger (e.g., cdb.exe -g -G -o python.exe sloppy.py --enable-test-hooks) or use procdump -ma -e 1 -f "" -x DUMP_DIR python.exe sloppy.py --enable-test-hooks. Because sloppy.py is a script, the launch target is the Python interpreter; DUMP_DIR is a placeholder for the folder that receives the dump. This will give us a .dmp file with full call stacks for ALL threads at the moment of crash.
- Add SetUnhandledExceptionFilter to the subprocess that logs the crashing thread's TEB and stack to stderr before the process dies. The handler can be installed via sitecustomize.py so it doesn't require code changes to sloppy.py.
- Reduce the test's render load: if the test workspace's layout file is 17KB and references 10 stale window names, that may be a major source of native stack usage per frame. Fix the stale layout (it has been stale for 7+ days per the WARNING in the log: "Run the 'Reset Layout' command from the Command Palette").
- Bump the main thread's stack at the OS level: This requires modifying the PE header of python.exe (via editbin /STACK:8388608 python.exe on Windows) or recompiling. Neither is in scope for a 1-track fix.
The fix path forward
Short-term (ship in next track, 1-2 hours):
- Fix the stale manualslop_layout.ini (it references 10 deleted window names, causing imgui-bundle to do extra work each frame)
- Capture a WER dump to identify the actual C-side stack frame that overflows
- If the dump points to a specific render function, fix that function
Medium-term (separate track, 1-2 days):
- Bump sloppy.py's main thread stack via editbin (Windows) or by setting PYTHONSTACKSIZE env var if available
- Migrate heavy AI calls to a subprocess (multiprocessing.Process) so the C stack is per-call, not per-thread
Long-term (architectural):
- Move the GUI's render loop off the main thread (or use imgui-bundle's offscreen rendering mode) so the main thread is a thin renderer
- Move all subprocess.Popen calls to a dedicated subprocess worker pool
Update 2026-06-17 (post-user-feedback round)
User feedback after the previous report:
- Remove the T-shirt size metric from all places encountered.
- Fix the layout (it was stale: 10 entries referenced deleted/renamed windows).
- The user correctly suspected "Something more fundamental is wrong" - the layout fix was a guess.
T-shirt size removal (done)
Removed T-shirt size from:
- conductor/workflow.md (the policy file) - removed the S/M/L/XL table, the replacement pattern row, and the "reasonable effort" guard's reference. Scope (N files, M sites, N tasks) is now the only effort dimension.
- conductor/tracks.md (the registry) - removed the T-shirt column header and the Fable track entry's T-shirt mentions.
- docs/reports/NEGATIVE_FLOWS_INVESTIGATION_20260617.md - removed the T-shirt mention in the follow-up suggestion.
Track artifacts (conductor/tracks/fable_review_20260617/metadata.json, conductor/tracks/result_migration_20260616/metadata.json, their spec.md files) still have T-shirt references. These are historical track snapshots - left as records of past decisions.
Layout fix (done, didn't help)
Regenerated manualslop_layout.ini: 17,360 bytes -> 3,361 bytes (102 windows -> 23 windows). Now matches the windows registered in src/app_controller.py _default_windows (lines 1862-1886). Docking section preserved. Stale window warning dropped from 10 windows to 3.
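The regeneration step amounts to dropping [Window][...] sections whose names are no longer registered while leaving every other section (such as Docking) untouched. The sketch below shows that idea under the assumption that the layout uses the usual Dear ImGui .ini section headers; it is not the contents of regen_layout.py, and the window names in the usage example are made up.

```python
import re

WINDOW_HEADER = re.compile(r"^\[Window\]\[(?P<name>.*)\]$")


def prune_layout(ini_text, keep_names):
    """Drop [Window][name] sections whose name is not in keep_names."""
    kept = []
    keeping = True
    for line in ini_text.splitlines():
        if line.startswith("["):
            # A new section starts: decide whether to keep it and its body.
            match = WINDOW_HEADER.match(line)
            keeping = match is None or match.group("name") in keep_names
        if keeping:
            kept.append(line)
    return "\n".join(kept) + "\n"


sample = (
    "[Window][Chat]\nPos=0,0\n\n"
    "[Window][OldPanel]\nPos=5,5\n\n"
    "[Docking][Data]\nDockSpace ID=0x1\n"
)
print(prune_layout(sample, {"Chat"}))
```

In the real script the keep set would come from the _default_windows registry rather than a literal.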
The layout fix did NOT fix the crash. Process still dies with rc=3221225725 (0xC00000FD) within 1s of click.
Three new diagnostic experiments (everything points at the main thread)
Experiment 1: No-click baseline (diag_no_click.py). Spawned sloppy.py with hook server, did NO clicks, waited 60s polling status every 2s. Process survived 60s. So the render loop is stable in isolation; the crash is specifically triggered by the click chain.
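The baseline boils down to spawning the process and polling its exit status until a deadline passes. A minimal sketch of that pattern is below; watch_process is an illustrative name, and the real diag_no_click.py also talks to the hook server, which is omitted here.

```python
import subprocess
import time


def watch_process(cmd, duration_s=60.0, interval_s=2.0):
    """Return None if cmd is still alive after duration_s, else its exit code."""
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            rc = proc.poll()
            if rc is not None:
                return rc  # the process died before the deadline
            time.sleep(interval_s)
        return proc.poll()
    finally:
        # Do not leave the child running once the observation window ends.
        if proc.poll() is None:
            proc.terminate()
            proc.wait()
```

A None result corresponds to "process survived 60s"; a value of 3221225725 would correspond to the stack overflow exit seen in the click runs.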
Experiment 2: Standalone ThreadPoolExecutor (diag_thread.py). Created a fresh ThreadPoolExecutor, called the adapter from a worker thread, tested all 3 MOCK_MODE values. No crash, no stack overflow. So the io_pool thread + adapter + subprocess stack usage is fine in isolation.
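The shape of this experiment is a worker thread that spawns a child process and parses its output, once per mode. The sketch below uses a stand-in child script and a stand-in fake_adapter_send instead of GeminiCliAdapter, so it illustrates the test structure only and says nothing about the real adapter's behaviour.

```python
import json
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

MOCK_MODES = ("malformed_json", "error_result", "success")

# Stand-in for the mocked CLI: prints a different payload per mode.
CHILD = (
    "import sys, json\n"
    "mode = sys.argv[1]\n"
    "if mode == 'malformed_json': print('{not json')\n"
    "elif mode == 'error_result': print(json.dumps({'ok': False}))\n"
    "else: print(json.dumps({'ok': True}))\n"
)


def fake_adapter_send(mode):
    """Spawn the child and return its 'ok' flag, or None for unparseable output."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
        check=True,
    )
    try:
        return json.loads(completed.stdout)["ok"]
    except json.JSONDecodeError:
        return None


def run_all_modes():
    """Call the adapter from a pool worker, once for each mock mode."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        return {m: pool.submit(fake_adapter_send, m).result() for m in MOCK_MODES}
```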
Experiment 3: Bumped io_pool to 8MB stack (diag_realbig2_run.py). Used threading.stack_size(8 * 1024 * 1024) via sitecustomize.py, then spawned sloppy.py. Verified via the log: [DIAGSTK] Set thread stack size to 8388608 bytes. Process STILL dies with 0xC00000FD. So the io_pool worker's stack is not the bottleneck.
Refined understanding
Combining all the data:
| What we know | What it means |
|---|---|
| Call depth at crash is 13 frames | Not Python recursion; not call depth |
| threading.stack_size(8MB) doesn't help | The io_pool worker (and _loop_thread) are not where the stack is exhausted |
| Main thread stack is 1.94 MB (verified via kernel32.GetCurrentThreadStackLimits) | The only thread left with a small stack is the main thread |
| Crash happens after _send_gemini_cli returns ok=False but before the "response" event is emitted | The crash is in the ai_client.send -> _handle_request_event -> _on_api_event chain OR in something concurrent with it (render loop on main thread) |
| Standalone ThreadPoolExecutor + adapter works fine | The subprocess spawn is fine; the issue is specific to sloppy.py's environment |
| Render loop is stable in isolation (no clicks) | The crash is triggered by the click -> worker -> adapter call chain |
Most likely cause (re-formulated hypothesis)
The crash is almost certainly in the main thread, not the io_pool worker. The main thread's imgui-bundle render loop is running concurrently with the io_pool worker's adapter call. When the click is processed:
- The io_pool worker calls subprocess.Popen (CreateProcessW on Windows)
- The Windows kernel allocates resources for the new process
- The main thread's render loop is in a frame draw call
- Some imgui-bundle native code in the render loop uses the C stack
- The main thread's 1.94 MB stack is exhausted
The cmd_list debug print (in the io_pool worker) succeeds because the io_pool worker has 8MB. But the main thread is rendering concurrently and runs out.
The "after _send_gemini_cli returns" timing is incidental - it just happens to be when the main thread's render loop hits the stack limit. The actual crash is in imgui-bundle's render code, not in the AI call chain.
What's needed for definitive diagnosis
To find the actual C-side stack frame that's overflowing, we need:
- A Windows crash dump. Run sloppy.py under a debugger: cdb.exe -g -G -o python.exe sloppy.py --enable-test-hooks. Or use procdump: procdump -ma -e 1 -f "" -x DUMP_DIR python.exe sloppy.py --enable-test-hooks (the launch target is the Python interpreter because sloppy.py is a script; DUMP_DIR is a placeholder for the dump folder). The .dmp file gives full call stacks for ALL threads at the moment of crash.
- Or: SetUnhandledExceptionFilter in sitecustomize.py that dumps the crashing thread's TEB and call stack to stderr before the process dies. This avoids needing a debugger.
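A lighter-weight variant of the second option, swapped in here for a hand-written SetUnhandledExceptionFilter, is the standard library's faulthandler module. On Windows it reports fatal exceptions, stack overflow included, and writes the Python traceback of every thread with the faulting one marked as current. It shows Python frames only, so it can answer which thread crashed but not which C++ frame, and it may not manage to write anything if the faulting thread has no stack left. The function name and log path below are illustrative.

```python
import faulthandler

# Kept at module level so the file stays open for the life of the process;
# faulthandler writes to its file descriptor at crash time.
_crash_log = None


def install_crash_logger(path):
    """Enable faulthandler for all threads, writing to the given file."""
    global _crash_log
    _crash_log = open(path, "w", buffering=1)
    faulthandler.enable(file=_crash_log, all_threads=True)
    return faulthandler.is_enabled()
```

Dropped into the diagnostic sitecustomize.py, a call such as install_crash_logger("logs/sloppy_fault.log") would run before sloppy.py's own code, so no change to sloppy.py is required.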
Files added in this round
- scripts/tier2/artifacts/send_result_to_send_20260616/diag_no_click.py (no-click baseline - confirms crash is click-triggered)
- scripts/tier2/artifacts/send_result_to_send_20260616/diag_thread.py (standalone ThreadPoolExecutor - confirms subprocess works in isolation)
- scripts/tier2/artifacts/send_result_to_send_20260616/diag_realbig2_run.py (8MB thread stack - confirms io_pool worker is not the bottleneck)
- scripts/tier2/artifacts/send_result_to_send_20260616/diag_thread_stk_run.py (instrumented thread.start logging)
- scripts/tier2/artifacts/send_result_to_send_20260616/regen_layout.py (regenerates layout from _default_windows)
- scripts/tier2/artifacts/send_result_to_send_20260616/remove_tshirt3.py (removes T-shirt from conductor files)
- logs/sloppy_no_click_*.log (process alive after 60s, no clicks)
- logs/sloppy_diag2_*_after_layout.log (process dies after layout fix)
Files in this report
- docs/reports/THEME_BUG_ANALYSIS_send_result_to_send_20260616.md (the prior theme fix report, restored in 8c6d9aa0)
- docs/reports/NEGATIVE_FLOWS_INVESTIGATION_20260617.md (the previous investigation, partially superseded)
- docs/reports/NEGATIVE_FLOWS_INVESTIGATION_20260617_REFINED.md (this file)
- scripts/tier2/artifacts/send_result_to_send_20260616/diag_diag_stacks_init.py (sitecustomize that sets 8MB stack + reports main thread stack size)
- logs/sloppy_diag_stk_20260617_*.log (log showing "Main thread stack: 1.94 MB" then crash)