Reference/analysis track. Produces 0 code changes.
Artifacts (conductor/tracks/nagent_review_20260608/):
- spec.md (240 lines) - track wrapper with Application/Meta-Tooling framing
- report.md (571 lines) - 14-section deep-dive; primary deliverable
- comparison_table.md (79 lines) - flat side-by-side reference
- decisions.md (286 lines) - 10 future-track candidates with priority matrix
- nagent_takeaways_20260608.md (363 lines) - 10 actionable patterns grounded
in code (file:line refs into nagent source and Manual Slop source)
- metadata.json (132 lines) - structured metadata + verification criteria
- state.toml (113 lines) - per-task tracking + user-corrections log (7 entries)
14 nagent principles covered in report.md (durable work, text-in/text-out,
editable state, visible protocol, the loop, per-file memory, repo history,
neighborhoods, sub-conversations, controlled writes, large files, tool
discovery, framework differences, build your own).
6 pitfalls (revised from 8 after user-corrections):
1. No structured output protocol in Application AI (opaque function calling)
2. Provider-specific history in process globals (ai_client._anthropic_history
+ _deepseek_history + _minimax_history)
3. RAG is not 'history as data' (fuzzy, not auditable)
4. AI client is a stateful singleton (2,685-line ai_client.py)
5. No non-MMA disposable sub-conversations (1:1 gap; user-flagged want)
6. Hard-coded tool discovery (45-tool if/elif in mcp_client.py)
User-corrections applied (3 rounds, 7 total corrections recorded):
- Editable discussions: PARTIAL -> PARITY (DIFFERENT FOCUS) with full A1-A7
per-entry + B1-B11 discussion-level + C1-C5 undo/redo operation matrix
- Per-file memory: DOMAIN MISMATCH -> MANUAL SLOP IS STRONGER IN
CURATION DIMENSION (FileItem + ContextPreset vs nagent's inode-keyed
conversation log; complementary, not equivalent)
- Sub-conversations: MMA has it; 1:1 does not -> 'PARITY for MMA; GAP for
1:1 discussions' (user wants this)
- RAG: opt-in, not gap; user wants pre-staging via sub-conversation
- Personas: config bundling (can opt out via AI settings)
- Tool discovery: deferred (user has 'intent based DSL' idea but 'no where
near that ideation yet')
10 actionable takeaways (separate from the 6 pitfalls - those are
diagnosis, these are prescription):
1. State visibility (UI inspector for in-process state)
2. Readable conversation log (text-greppable, not just JSON-L)
3. Sub-agents for 1:1 (HIGH priority - user-flagged)
4. File-identity over file-path (st_dev:st_ino rename-safe)
5. One loop shape visible in diagnostics
6. Visible retry on protocol failure
7. Meta-Tooling DSL (intent-based, deferred)
8. Self-describing tools (subsumed by mcp_architecture_refactor_20260606)
9. Single source of truth for disc_entries + provider history
10. Sub-agent return type constraint (bake into candidate #1 spec)
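Takeaway 4 (file-identity over file-path) can be illustrated with a minimal sketch. The helper name file_identity and the demo are hypothetical and are not taken from the nagent or Manual Slop source; the sketch only assumes that os.stat exposes st_dev and st_ino and that the rename stays on one filesystem.

```python
import os
import tempfile


def file_identity(path: str) -> str:
    """Return a 'st_dev:st_ino' key for the file at path (hypothetical helper)."""
    st = os.stat(path)
    return f"{st.st_dev}:{st.st_ino}"


def identity_survives_rename() -> bool:
    """Create a file, rename it in place, and compare the identity keys."""
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "notes.md")
        new = os.path.join(tmp, "renamed.md")
        with open(old, "w", encoding="utf-8") as fh:
            fh.write("x")
        before = file_identity(old)
        os.rename(old, new)  # same directory, so same filesystem
        return before == file_identity(new)
```

One caveat on the design: an editor that saves by writing a temp file and renaming it over the original produces a new inode, so a key like this tracks renames but not that kind of save.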
Domain classification: every recommendation tagged Application / Meta-Tooling
/ Both per docs/guide_meta_boundary.md. nagent lives in the Meta-Tooling
domain; Manual Slop's Application AI is a different kind of thing.
No code modified by this track (reference/analysis only). All 7 files
parse cleanly (JSON, TOML, Markdown). All internal cross-links resolve.
Track is 'active' awaiting human review; future-track candidates live in
decisions.md and nagent_takeaways_20260608.md.
The 30s wait_for_project_switch timeout was an excessive constraint.
In batch context, prior sims' AI discussion turn workers saturate the
8-worker io_pool, queueing this switch for tens of seconds. The other
defensive waits in the test (warmup 60s, prior switch 60s) already use
60s+, so 30s was the inconsistent outlier.
User confirmed: 'I think not completing in 30s is an excessive constraint
if thats whats going on.'
Verification:
- test_full_live_workflow isolation: 11.69s PASS
- 7-test batch (test_full_live_workflow + 4 extended sims + 2 markdown): 85.83s PASS
Root cause: test_full_live_workflow in batch context (with prior sims
running AI discussion turns) would queue its _do_project_switch behind
the auto-pruner's scan of tests/logs/ (154MB, 6519 files). The 4-worker
pool was saturated, so the switch would never run within 30s.
Fix: bump IO_POOL_MAX_WORKERS from 4 to 8. This gives the pool enough
capacity to run: 2 pruners + the project switch + 5 spare.
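The saturation effect can be reproduced in isolation with a small sketch. This is not the application's io_pool: the function name, job counts, and wait time below are illustrative assumptions, and only the 4 vs 8 worker limits come from the text above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def switch_starts_while_pool_busy(max_workers: int, long_jobs: int,
                                  wait_s: float = 0.2) -> bool:
    """Return True if a late-submitted job starts while long jobs still run."""
    release = threading.Event()          # lets the long jobs finish
    switch_started = threading.Event()   # set by the stand-in "switch" job
    running = threading.Semaphore(0)     # counts long jobs that hold a worker

    def long_job() -> None:
        running.release()
        release.wait()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(long_jobs):
            pool.submit(long_job)
        # Wait until every long job that can get a worker is holding one.
        for _ in range(min(long_jobs, max_workers)):
            running.acquire()
        pool.submit(switch_started.set)  # stand-in for _do_project_switch
        started = switch_started.wait(timeout=wait_s)
        release.set()                    # unblock workers so the pool can exit
    return started
```

With four long jobs, a 4-worker pool leaves the late job queued for the whole wait, while an 8-worker pool starts it immediately.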
Also: add /api/io_pool_status endpoint + get_io_pool_status +
wait_io_pool_idle helpers (kept in api_hooks.py and api_hook_client.py
for the test_api_hook_client_io_pool.py tests, even though
test_full_live_workflow itself no longer uses them - they remain useful
for future tests that want to assert pool state directly).
Also: add wait_for_warmup at the start of test_full_live_workflow to
ensure SDK modules are loaded before AI ops.
Test verification:
- test_full_live_workflow in isolation: 11.83s PASS
- test_full_live_workflow in batch (with 4 prior sims): 83.46s PASS
- 30/30 related unit tests PASS
When a prior test in the tier-3-live_gui batch leaves a _do_project_switch
background thread running, the next test's btn_project_new_automated click
sees _project_switch_in_progress=True (from the prior thread) and queues
the new path via _project_switch_pending_path. The queued switch is never
actually submitted to the io_pool, so is_project_stale() stays True and
AI ops (_handle_generate_send) bail with 'project switch in progress;
AI ops disabled'.
Fix: _handle_reset_session now also clears _project_switch_in_progress,
_project_switch_pending_path, and _project_switch_error (under the
existing _project_switch_lock). This way, even if the prior background
thread is still running, the controller reports an idle state and the
new switch can be submitted normally.
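A minimal sketch of the reset described above, assuming the three flags live on one object guarded by _project_switch_lock. The class SwitchState and the body of is_project_stale are hypothetical stand-ins, not the real AppController code.

```python
import threading
from typing import Optional


class SwitchState:
    """Hypothetical stand-in for the controller's project-switch bookkeeping."""

    def __init__(self) -> None:
        self._project_switch_lock = threading.Lock()
        self._project_switch_in_progress = False
        self._project_switch_pending_path: Optional[str] = None
        self._project_switch_error: Optional[str] = None

    def is_project_stale(self) -> bool:
        # Assumption: "stale" means a switch is running or queued.
        with self._project_switch_lock:
            return (self._project_switch_in_progress
                    or self._project_switch_pending_path is not None)

    def _handle_reset_session(self) -> None:
        # Clear all three fields in one critical section so a reader never
        # sees a half-reset state.
        with self._project_switch_lock:
            self._project_switch_in_progress = False
            self._project_switch_pending_path = None
            self._project_switch_error = None
```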
Also:
- src/api_hook_client.py: reverted wait_for_project_switch to require
in_progress=False (was relaxed to return on queued path, which misled
the caller into thinking the switch was done)
- tests/test_handle_reset_session_clears_project.py: new test
test_handle_reset_session_clears_project_switch_state asserts
is_project_stale() returns False after reset
- tests/test_api_hook_client_wait_for_project_switch.py: updated
test_wait_for_project_switch_does_not_return_on_queued (in_progress
+ matching path should keep waiting, not return early)
- tests/test_live_workflow.py: added pre-wait for any in-flight switch
before doing btn_reset (so the test waits up to 60s for the prior
switch to complete if needed)
- conductor/todos/TODO_test_full_live_workflow.md: updated Task 4 with
the deeper hang analysis and recommended fix
Known follow-up: test_full_live_workflow still hangs in tier-3 batch
even with this fix, because the new _do_project_switch itself is hung
in the io_pool (likely saturation from prior sims' AI discussion turn
workers). Deeper investigation required.
Follows the conductor convention of organizing track-related
artifacts under conductor/. The TODO tracks the test_full_live_workflow
race condition fix and its follow-up items (Tasks 3 and 7 still pending;
known batch hang documented).
Tasks 1, 2 (with regression fix), 4, 5, 6 are SHIPPED in prior commits.
Silences the PytestUnknownMarkWarning emitted by test_visual_mma.py and
test_visual_sim_gui_ux.py (3 instances). The @pytest.mark.live marker
is already used in the test files; it simply was not registered in
pyproject.toml.
- pyproject.toml: added 'live: marks tests as live visualization tests
(not in CI by default)' to [tool.pytest.ini_options].markers
Replaces the 10x1s blind poll of derived state with a condition-based
wait on /api/project_switch_status. Also adds a defensive file existence
check that fails fast (within 5s) if the click was dropped or the
project creation handler crashed.
The new wait surfaces a clear error message ('Project switch did not
complete in 30s. Last status: ...') instead of the generic 'Project
failed to activate', and exposes _project_switch_error if the controller
reported one.
- tests/test_live_workflow.py: replaced poll loop (lines 57-65) with
wait_for_project_switch + os.path.exists defensive check
Adds a polling helper that blocks until the project switch completes,
errors out, or times out. Replaces the fragile 10x1s blind poll in
test_full_live_workflow with a condition-based wait on the
/api/project_switch_status endpoint.
Features:
- Polls /api/project_switch_status every 200ms (configurable)
- Returns immediately on error (with the error in the result)
- Path matching: exact match OR basename match (handles absolute vs relative)
- Times out with a clear 'timeout' flag instead of a generic assertion
- Optional expected_path: if None, returns on any in_progress=False
- src/api_hook_client.py: new wait_for_project_switch method (37 lines)
- tests/test_api_hook_client_wait_for_project_switch.py: 6 unit tests
with mocked _make_request covering all paths
Task 2 (_handle_reset_session reset) introduced a regression: setting
self.active_project_path to empty caused an infinite re-switch loop in
_do_project_switch because _flush_to_project writes to
active_project_path (raises OSError on empty path), and the finally
block re-submitted the failed switch on every iteration. Result:
test_context_sim_live saw switching-to status for 5+ seconds and
MD-only generation was blocked.
Fix: keep self.active_project_path as-is in _handle_reset_session. Only
reset self.project (to a fresh default_project dict) and
self.project_paths (to empty list). The stale project state issue is
solved by replacing the project dict; the active_project_path stays
valid for _flush_to_project.
- src/app_controller.py: refined _handle_reset_session project reset
- tests/test_handle_reset_session_clears_project.py: updated contract test to assert active_project_path is preserved
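A minimal sketch of the refined reset contract, assuming a controller with the three attributes named above. The Controller class and the contents of default_project here are hypothetical placeholders, not the code in src/app_controller.py.

```python
from typing import Any, Dict, List


def default_project() -> Dict[str, Any]:
    # Hypothetical placeholder for the real default project dict.
    return {"name": "default", "files": []}


class Controller:
    """Hypothetical stand-in holding only the fields the reset touches."""

    def __init__(self, active_project_path: str) -> None:
        self.active_project_path = active_project_path
        self.project: Dict[str, Any] = default_project()
        self.project_paths: List[str] = [active_project_path]

    def _handle_reset_session(self) -> None:
        # Replace the stale project dict and clear the path list.
        self.project = default_project()
        self.project_paths = []
        # active_project_path is deliberately left untouched so a later
        # flush still has a valid file to write to.
```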
Stale project state from prior live_gui tests (shared session-scoped
subprocess) was leaking into subsequent tests, causing the
test_full_live_workflow race condition: 'Project not switched' errors
when self.project still claimed to be a different project.
The fix: _handle_reset_session now mirrors the default-project branch
of __init__ (lines 1743-1745), creating a fresh default project dict,
clearing active_project_path and project_paths, and reinitializing
the workspace manager.
- src/app_controller.py: 6 new lines in _handle_reset_session
- tests/test_handle_reset_session_clears_project.py: 3 tests
(active_project_path, project_paths, self.project)
Adds a new endpoint that exposes the project-switch state machine so tests
can poll for completion instead of guessing with timeouts.
- AppController: track _project_switch_error on failure paths
- src/api_hooks.py: GET /api/project_switch_status returns
{in_progress, pending_path, active_path, error}
- src/api_hook_client.py: get_project_switch_status() helper
- tests/test_api_hooks_project_switch.py: 3 unit tests for client + endpoint
shape, 1 live_gui test for the default-idle case
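The response shape can be sketched as a plain function over controller attributes. The function name project_switch_status and the use of getattr defaults are assumptions for illustration; only the four keys come from the endpoint description above.

```python
from types import SimpleNamespace
from typing import Any, Dict


def project_switch_status(controller: Any) -> Dict[str, Any]:
    """Build the {in_progress, pending_path, active_path, error} payload."""
    return {
        "in_progress": bool(getattr(controller, "_project_switch_in_progress", False)),
        "pending_path": getattr(controller, "_project_switch_pending_path", None),
        "active_path": getattr(controller, "active_project_path", ""),
        "error": getattr(controller, "_project_switch_error", None),
    }


# Default-idle case: an object with none of the fields set.
idle = project_switch_status(SimpleNamespace())
```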