The audit script's visit_Try had a bug where the
\or child in handler.body\ loop was OUTSIDE the
\or handler in node.handlers\ loop. So \handler\ was bound
to the LAST handler, and only the last handler's body was walked.
Raises in non-last except handlers were missed (e.g.,
src/rag_engine.py:31 was not in the audit findings).
The fix moves the inner loop inside the outer loop so each
handler's body is walked. Both the FIRST and LAST handler raises
are now detected.
Adds tests/test_audit_exception_handling_bug_fixes.py with 2
tests for the walker behavior (first-handler raise, middle-handler
raise in a 3-handler try).
Tier 2 sandbox invariant: no production script under ./scripts/ may
write to the global %TEMP% directory (C:\\Users\\Ed\\AppData\\Local\\
Temp\\). All scratch / intermediate files must live in:
- ./tests/artifacts/ (for test artifacts)
- C:\\Users\\Ed\\AppData\\Local\\manual_slop\\tier2\\ (for app data)
Writing to %TEMP% breaks the sandbox boundary: the OpenCode session
fires the 'ask' prompt for paths outside the project root, halting
autonomous ops (the 2026-06-17 bug with audit_exception_handling.py
output being written to %TEMP% by the agent's shell redirection).
Convention enforcement (per conductor/workflow.md Audit Script Policy):
- scripts/audit_no_temp_writes.py: the canonical audit. Same shape
as scripts/audit_exception_handling.py: --json for machine output,
--strict for the CI gate (exits 1 on any violation). Patterns
cover tempfile module, os.environ['TEMP'], C:\Users\Ed\AppData\Local\Temp, %TEMP%,
/tmp/, etc. Excludes the throw-away archive at scripts/tier2/
artifacts/ and itself (so it can find its own pattern defs).
- tests/test_no_temp_writes.py: default-on regression test. Calls
the audit with --strict and asserts exit 0. If a new script
under ./scripts/ ever uses %TEMP%, the test fails and CI breaks.
Current state: CLEAN. All 36 tier2 tests pass (1 new + 16 slash
command spec + 13 failcount + 6 opt-in). Sanity-checked: dropping
a fake 'import tempfile' script into ./scripts/ triggered exit 1
with 'FOUND 1 matches: scripts/_test_temp_check/test_uses_temp.py:1:
import tempfile'.
Future: also add a corresponding deny rule to the sandbox bash
permission in a follow-up if needed (already added in 03c9df84 for
the agent's own bash). The audit + test is the structural guard.
The Tier 2 agent wrote audit_exception_handling.py output to
C:\\Users\\Ed\\AppData\\Local\\Temp\\audit_initial.json via shell
redirection. This is OUTSIDE the sandbox allowlist (which is
C:\\projects\\manual_slop_tier2 + C:\\Users\\Ed\\AppData\\Local\\
manual_slop\\tier2 + C:\\Users\\Ed\\AppData\\Local\\manual_slop\\
tier2_failures). The OpenCode session-level guard fires the 'ask'
prompt for paths outside the project root, which has no answer in an
autonomous session, so ops halted mid-track.
Fix (3 layers):
1. opencode.json.fragment: add bash deny rule
'*AppData\\Local\\Temp\\*': 'deny' to BOTH the top-level
permission.bash (for default agents) and the tier2-autonomous
agent's permission.bash. The agent physically cannot run shell
commands that target the global Temp dir.
2. conductor/tier2/agents/tier2-autonomous.md: add 'Temp files'
convention telling the agent to use
C:\\Users\\Ed\\AppData\\Local\\manual_slop\\tier2\\ for scratch
/ audit-output / intermediate files, NOT %TEMP%.
3. conductor/tier2/commands/tier-2-auto-execute.md: same convention
in the slash command so the agent sees it at slash-command time.
Tests (default-on):
- test_agent_denies_temp_writes: agent prompt has the Temp deny in
frontmatter bash + the app-data dir note
- test_config_fragment_denies_temp_writes: both top-level and agent
bash have the deny rule
All 16 tier 2 slash command tests pass.
Also: cleaned up the leaked audit_initial.json + audit.json +
audit_after*.json from %TEMP% (they were leftovers from a prior
run). Re-ran setup against the live clone; opencode.json's agent
bash and top-level bash both have the deny rule.
The clone's opencode.json inherited the main repo's top-level 'model'
field (zai/glm-5) via 'git clone'. The tier2-autonomous agent has its
own 'model: minimax-coding-plan/MiniMax-M3' override, so the default
agent path was technically correct, but any other agent spawned without
an explicit model (or if the user manually switched to build/plan)
would have used zai/glm-5 instead of MiniMax-M3.
Fix:
1. Add top-level 'model: minimax-coding-plan/MiniMax-M3' to
conductor/tier2/opencode.json.fragment.
2. setup_tier2_clone.ps1 merge now overrides 'model' from the fragment
(was only overriding agent, permission, default_agent).
3. Added test_config_fragment_has_top_level_model (default-on) to
assert the fragment's model field.
4. Added test_setup_script_overrides_model (opt-in TIER2_SANDBOX_TESTS=1)
to assert the merge code.
All 17 tests pass (14 default-on + 3 opt-in).
Verified: re-ran setup against the live clone; opencode.json's
top-level 'model' is now minimax-coding-plan/MiniMax-M3.
Follow-up to 9cd85364. The previous fix patched the OpenCode session-
level permission.read/write allowlist to include the sandbox clone
path, but Tier 2 was still hitting 'ACCESS DENIED' on clone paths.
Root cause: the MCP server has its OWN allowlist that's separate from
OpenCode's session-level permission. The MCP server's allowlist =
project_root (parent dir of the script) + extra_dirs from
mcp_paths.toml in the project root. The clone inherited the main
repo's mcp.manual-slop.command via 'git clone', which launched
C:\\projects\\manual_slop\\scripts\\mcp_server.py with
PYTHONPATH=C:\\projects\\manual_slop\\src. So the MCP server was
using the main repo's project_root + the main repo's mcp_paths.toml
(extra_dirs=['C:/projects/gencpp']) -- exactly the
'Allowed base directories are: gencpp, manual_slop' the user saw.
Fix: setup_tier2_clone.ps1 now overrides the clone's mcp.manual-slop
config to point at the CLONE's scripts/mcp_server.py and src/, and
replaces the clone's mcp_paths.toml with an empty extra_dirs list.
The MCP server's allowlist becomes [C:\\projects\\manual_slop_tier2]
only -- the sandbox boundary.
Added test_setup_script_overrides_mcp_server (text-based regression)
to assert the script contains the required overrides. Opt-in via
TIER2_SANDBOX_TESTS=1.
Verified: re-ran setup against the live clone. opencode.json now has
mcp.manual-slop.command pointing at C:\\projects\\manual_slop_tier2\\
scripts\\mcp_server.py with PYTHONPATH=C:\\projects\\manual_slop_tier2\\
src. mcp_paths.toml has 'extra_dirs = []'.
Replace positional args[3..5] assertions with assert_called_once_with using
rounding=/thickness=/flags= kwargs to match the existing add_rect call in
src/theme_nerv_fx.py:AlertPulsing.render and the parallel test in
tests/test_theme_nerv_fx.py:TestThemeNervFx.test_alert_pulsing_render.
Fixes test_alert_pulsing_render_active IndexError that surfaced when the
positional contract was asserted against the kwargs-shaped production call.
Regression: a Tier 2 session was denied access to
C:\\projects\\manual_slop_tier2\\scripts\\run_tests_batched.py
with 'Allowed base directories are: gencpp, manual_slop'. The
tier2-autonomous agent had a correct permission.read allowlist, but
the top-level permission block (inherited from the main repo's
opencode.json via 'git clone') had no read/write keys, and OpenCode
uses the top-level for the default agent path. The agent's
permission.read was merged but apparently not enforced for the
default-agent access check.
Fix:
1. Add a top-level 'permission' block to
conductor/tier2/opencode.json.fragment with:
- permission.edit: 'deny' (default agents locked down)
- permission.read: deny *, allow sandbox clone + app-data dirs
- permission.write: same
- permission.bash: deny *, allowlist of read-only git commands +
uv run python scripts/{run_tests_batched.py,tier2/*} + basic
shell commands. git push/checkout/restore/reset remain denied.
2. Update setup_tier2_clone.ps1 to also patch the top-level
'permission' block (was only merging the tier2-autonomous agent
block). The script preserves the user's mcp, model, instructions,
watcher, and plugin settings from the inherited opencode.json.
3. Update test_tier2_slash_command_spec.py:
- Rename test_command_fetches_origin_main -> ..._master (we
changed the slash command on 2026-06-17).
- Add test_config_fragment_has_top_level_permission to assert
the new top-level permission block has the right deny-all +
allowlist shape.
The tier2-autonomous agent's permission block is unchanged; it
overrides the top-level for that agent's tool calls.
src/theme_nerv_fx.py:97 was calling draw_list.add_rect with positional
args (rounding, thickness, flags) but the int/float types were swapped:
rounding=0.0 (correct)
thickness=0 (int, signature expects float)
flags=10.0 (float, signature expects int)
The TypeError fires every render frame once ai_status starts with
'error'. App.run's except RuntimeError eventually catches and calls
self.shutdown() -> controller.shutdown() -> _io_pool.shutdown(wait=False).
Subsequent tests in the same live_gui session can't submit_io.
Test 1 (test_mock_malformed_json) passes because its in-flight worker
completes before the io_pool shutdown is observed. Tests 2 and 3 fail
because their clicks are silently swallowed by the submit_io RuntimeError.
Switch to keyword args with correct types. Update test_theme_nerv_fx
assertion to match.
Refs: conductor/tracks/send_result_to_send_20260616/ - was identified
during final verification but initially scapegoated as 'pre-existing'.
Per user feedback, the bug is fixed now.
Verified: test_theme_nerv_fx 5/5 pass. test_z_negative_flows.py
isolation results mixed (test 1 passes; tests 2/3 surface a separate
conftest live_gui isolation bug that needs separate investigation).
Batch rename of 22 test files. 62 references renamed total.
The full test suite is now GREEN again, matching the pre-rename baseline
from Task 1.1. Pure mechanical rename. No behavior change.
Files affected: test_ai_cache_tracking, test_ai_client_cli,
test_ai_client_result, test_api_events, test_context_pruner,
test_deepseek_provider, test_gemini_cli_* (3 files), test_gui2_mcp,
test_headless_* (2 files), test_live_gui_integration_v2,
test_orchestration_logic, test_phase6_engine, test_rag_integration,
test_run_worker_lifecycle_abort, test_spawn_interception_v2,
test_symbol_parsing, test_tier4_interceptor, test_tiered_aggregation,
test_token_usage.
Note: spec estimated 24 files; actual is 22 (test_deprecation_warnings
no longer exists, and 1 fewer file than spec's list).
Refs: conductor/tracks/send_result_to_send_20260616/
13 references renamed (planned 12; one extra found in a comment).
Test function test_fr2_send_result_callable_in_app_controller_namespace
renamed to test_fr2_send_callable_in_app_controller_namespace.
7 tests pass.
Two bugs in src/rag_engine.py were causing 'NoneType object has no attribute get'
in the live_gui RAG tests (test_rag_phase4_final_verify,
test_rag_phase4_stress):
1. _validate_collection_dim_result:148
Old: if not embeddings or len(embeddings) == 0:
New: if embeddings is None or len(embeddings) == 0:
The 'if not embeddings' check raises ValueError('The truth value of an
array with more than one element is ambiguous. Use a.any() or a.all()')
when 'embeddings' is a non-empty numpy array (which is the normal case
after documents are upserted). The exception is caught by the outer
'except Exception' which returns a non-ok Result, causing __init__ to
set self.collection = None. Subsequent 'get_all_indexed_paths()' then
fails with 'NoneType has no attribute get' on self.collection.get().
2. get_all_indexed_paths:334
Old: return list(set(m.get('path') for m in res['metadatas'] if m.get('path')))
New: return list(set(m['path'] for m in res['metadatas'] if m is not None and m.get('path')))
When chromadb returns 'metadatas=[None, ...]' (documents upserted
without metadata), 'm.get('path')' fails with AttributeError on the
first None element. Adds 'm is not None' guard.
Both fixes are defensive: the conditions that trigger them (orphan docs
without metadata, non-empty embeddings arrays) are normal valid
states that the old code couldn't handle.
New file: tests/test_rag_sync_none_error.py
3 unit tests covering both bugs:
- test_dim_check_does_not_raise_on_non_empty_ndarray
- test_get_all_indexed_paths_handles_none_metadata
- test_get_all_indexed_paths_returns_paths_with_metadata
Verified:
- 3/3 focused tests pass
- test_rag_phase4_final_verify.py::test_phase4_final_verify PASSES (was failing)
- test_rag_phase4_stress.py::test_rag_large_codebase_verification_sim PASSES (was failing)
- test_rag_visual_sim.py::test_rag_full_lifecycle_sim PASSES (still passing)
The test_headless_verification_full_run test in test_headless_verification.py
mocked src.multi_agent_conductor.ai_client.send_result with a return_value
of a raw string. The production code does 'if not result.ok:' which
fails on raw strings with AttributeError.
In xdist mode this caused a worker crash (gw0/gw11: 'node down: Not
properly terminated') that hung the entire tier-1-unit-headless batch
in the batched test runner (~50s+ per batch). The crash was the
worker dying while pytest-master waited for it; the master never
got a clean exit and the run was orphaned until the user's manual
cancel.
The test was missed in the original Phase 2 list (it was an xdist
crash rather than a test logic failure) and in the 4 Phase 2
follow-up commits (which targeted the 4 specific test files the
user reported during the run).
Change: mock_send.return_value = 'Task completed successfully.' ->
mock_send.return_value = Result(data='Task completed successfully.')
Plus add the Result import.
2/2 tests in test_headless_verification.py now pass under xdist
(was 1/2 + worker crash in xdist). Full headless batch (14 tests)
completes in 18.7s.
The test_run_worker_lifecycle_uses_strategy test in test_tiered_aggregation.py
mocked src.multi_agent_conductor.ai_client.send_result with a return_value
of a raw string. The production code does "if not result.ok:" which
fails on raw strings.
3/3 tests in test_tiered_aggregation.py pass (was 2/3).
The test_rag_integration test mocks the internal _send_gemini
function to return a raw string. The production code in
app_controller._handle_request_event now does 'if result.ok:'
which fails on raw strings.
Change: mock_provider.return_value = 'Mock AI Response' ->
mock_provider.return_value = Result(data='Mock AI Response')
Plus add the Result import.
1 test passes (was 1 pre-existing failure).
The test_token_reduction_logging test in test_context_pruner.py
mocked src.ai_client.send_result with a lambda that returned
a raw string. The production code now does "if not result.ok:"
which fails on raw strings.
1 test passes (was 1 pre-existing failure).
The 7 tests in test_conductor_engine_v2.py (already updated to
mock src.ai_client.send_result) were still returning raw strings
from the mocks. The production code in multi_agent_conductor.py
now does "if not result.ok:" which fails on raw strings with
AttributeError.
Changes:
- Add "from src.result_types import Result" import
- Wrap all mock_send.return_value = "..." with Result(data="...") (4 sites)
- Wrap MagicMock(return_value="...") with Result(data="...") (2 sites)
- Wrap side_effect return with Result(data="Success")
10/10 tests pass (was 3/10).
Per plan Task 6.3: both tests in test_deprecation_warnings.py are obsolete
after the send() function was removed in Phase 6.1:
- test_send_deprecated_warning_emitted_once_per_site: literally cannot
run without ai_client.send (AttributeError)
- test_send_result_does_not_emit_deprecation: trivially true after
send() is removed (no deprecation source)
The test_send_result_does_not_emit_deprecation regression test is
preserved in tests/test_ai_client_result.py (added in Phase 2.7 as the
renamed test). The pre-Phase-2.7 test_send_deprecated_emits_warning
was deleted in Phase 2.7.
Verification: pytest tests/test_deprecation_warnings.py reports
'ERROR: file or directory not found'.
The test used src.find() which locates the first occurrence of
'Refresh Registry' in the comment block (line 2090 in src/gui_2.py),
not the actual code (line 2111). The 400-char snippet window doesn't
reach the code, so the assertion for 'load_registry' fails.
Production code is already correct (in-place load_registry()) at
src/gui_2.py:2111-2112 (user commit df7bda6e). This test just needs
to use rfind() to locate the actual code, not the comment.
Change: src.find(marker) -> src.rfind(marker)
1 test passes (was 1 pre-existing failure).
The test used src.find() which locates the first occurrence of
'Keep Pairs:' in the comment block (line 5113 in src/gui_2.py), not
the actual code (line 5130). The 200-char snippet window only reaches
the comment, so the assertions for set_next_item_width(140) and
drag_int fail.
Production code is already correct (set_next_item_width(140) +
drag_int) at src/gui_2.py:5130-5131 (user commit d0b06575). This
test just needs to use rfind() to locate the actual code, not the
comment.
Change: src.find(marker) -> src.rfind(marker)
1 test passes (was 1 pre-existing failure).
The 2 tests in test_symbol_parsing.py mock src.ai_client.send but
production now uses send_result (migrated by doeh_test_thinking_cleanup_20260615
commit 24ba2499). Mocks receive 0 calls; tests fail with
"send was called 0 times".
Changes:
- Replace patch(src.ai_client.send) with patch(src.ai_client.send_result)
- Rename mock_send to mock_send_result
- Set return_value=Result(data="mocked response")
- Add "from src.result_types import Result" import
All 2 tests in test_symbol_parsing.py pass (were 2 pre-existing failures).
The _send_qwen() function returns Result[str] after the
data_oriented_error_handling_20260606 refactor (commit 64d6ba2d),
but 2 tests in test_qwen_provider.py were asserting against the
raw str type. They were 2 of the 10 pre-existing failures documented
in the track spec.
Changes (mirrors the doeh_test_thinking_cleanup_20260615 pattern for
grok/llama/llama_native):
- Replace assert result == "hi from qwen" with assert result.ok and result.data == "hi from qwen"
- Replace assert "cat" in result.lower() with assert result.ok and "cat" in result.data.lower()
- Add "from src.result_types import Result" import
All 5 tests in test_qwen_provider.py now pass (was 3/5).
Phase 2.13 missed the test_run_worker_lifecycle_blocked test in
test_orchestration_logic.py - it also mocked src.ai_client.send.
The test was failing with "Worker send_result failed for T1: ...
[Errno 2] No such file or directory: .beads_mock/beads.json" because
the unmocked send_result fell through to the real provider which
tried to read beads.json.
Changes:
- Replace patch(src.ai_client.send) with patch(src.ai_client.send_result)
- Wrap mock return_value with Result(data="BLOCKED because of missing info")
All 8 tests in test_orchestration_logic.py now pass.
The test_ai_client_passes_qa_callback test calls ai_client.send() with
qa_callback=lambda. The qa_callback is passed through to the provider
function (_send_gemini).
Per plan note: the test has complex callback setup; the Result handling
needs the mock to return Result(data="ok") so the qa_callback passes
through and the test succeeds.
Changes:
- Rename ai_client.send(...) to ai_client.send_result(...)
- Add assert result.ok
- Mock _send_gemini to return Result(data="ok") instead of relying on
the default (which would call the real provider)
- Add "from src.result_types import Result" import
7 tests pass (the migrated test_ai_client_passes_qa_callback was
previously broken because the send() call hit the real provider and
either failed or returned empty; the mock now provides a clean response).
Changes:
- Rename ai_client.send(...) to ai_client.send_result(...) (2 sites)
- Add assert result.ok (1 site; the second test only checks result is not None)
- Add "from src.result_types import Result" import
2 tests pass.
All 6 sites in test_deepseek_provider.py call ai_client.send(...). Each
assertion pattern is slightly different (==, "in", call_args inspection);
migration follows the same pattern: rename to send_result(), add
assert result.ok, and use result.data for the response text.
Changes:
- Rename ai_client.send(...) to ai_client.send_result(...) (6 sites)
- Add assert result.ok (6 sites)
- Replace result == "x" with result.data == "x" (or "x" in result.data)
- Add "from src.result_types import Result" import
7 tests pass (1 unrelated test_deepseek_model_selection + 6 migrated).
Per plan Task 2.7:
- DELETE test_send_deprecated_emits_warning (obsolete after Phase 6; send()
is being removed)
- RENAME test_send_extracts_data_from_result -> test_send_result_does_not_emit_deprecation
(this is the regression test the plan said to KEEP; it now asserts the new
API does not emit a deprecation warning, instead of testing the old behavior)
- MIGRATE test_send_extracts_data_from_result (renamed to the above)
- MIGRATE test_send_returns_empty_string_on_error_result ->
test_send_result_returns_empty_data_with_error_on_auth_failure (asserts
the Result has data="" and not ok)
5 tests pass (down from 6; the deleted test removed 1; the renamed
test_send_extracts_data_from_result became test_send_result_does_not_emit_deprecation).
The test_mcp_tool_call_is_dispatched test calls ai_client.send() and
asserts the MCP dispatch function was called. Migrating to send_result()
+ assert result.ok.
Changes:
- Rename ai_client.send(...) to ai_client.send_result(...)
- Add assert result.ok
- Add "from src.result_types import Result" import
1 test passes.
The test_send_invokes_adapter_send test calls ai_client.send() and
asserts the return value. Migrating to send_result() with
assert res.ok and res.data == "Hello from mock adapter".
Changes:
- Rename ai_client.send(...) to ai_client.send_result(...)
- Add assert res.ok before accessing res.data
- Add "from src.result_types import Result" import
1 test passes.
The test_gemini_cli_loop_termination test calls ai_client.send() and
asserts the return value. Migrating to send_result() with
assert result.ok and result.data == "Final answer".
Changes:
- Rename ai_client.send(...) to ai_client.send_result(...)
- Add assert result.ok before accessing result.data
- Add "from src.result_types import Result" import
3 tests pass.
The test calls ai_client.send() but does not check the return value -
it only verifies the side effect on gemini cache stats. Migrating to
send_result() and asserting result.ok is enough.
Changes:
- Rename ai_client.send(...) to ai_client.send_result(...)
- Add assert result.ok (the return value is unused)
- Add "from src.result_types import Result" import
2 tests pass.