Private
Public Access
0
0
Commit Graph

2782 Commits

Author SHA1 Message Date
conductor-tier2 1fb0d79c0d conductor(track-update): data_structure_strengthening - HistoryMessage vs ProviderHistoryMessage split
4 surgical additions to the spec, no task changes:

1. ProviderHistoryMessage: Added a new alias to §3.1 (The
   Aliases). Per nagent_review Pitfall #4 (provider history
   divergence), the UI/curation layer (HistoryMessage, edited
   via disc_entries[i].content) and the SDK layer
   (ProviderHistoryMessage, the bytes actually replayed to the
   LLM) are *distinct*. Conflating them via a single alias
   perpetuates the bug. The new alias is documented as a
   separate concept with its own use sites (_anthropic_history,
   _deepseek_history, _minimax_history, _grok_history,
   _llama_history). The follow-up public_api_migration_20260606
   track is the natural moment to unify the two layers; this
   spec just makes the distinction explicit.

2. FileItem alias points to the existing models.FileItem
   dataclass, not Metadata. Per docs/guide_context_aggregation.md
   (added 2026-06-08), FileItem is a 9-field dataclass
   (path, auto_aggregate, force_full, view_mode, selected,
   ast_signatures, ast_definitions, ast_mask, custom_slices,
   injected_at) with a __post_init__ normalizer. Aliasing it to
   dict[str, Any] would lose the type safety. The 9 other
   aliases remain dict aliases for round-trip compatibility.

3. gui_2.py and mcp_client.py as follow-up: Added a Note
   (dated 2026-06-08) to the Out of Scope section. The 23
   lower-impact files (deferred) are dominated by gui_2.py
   (26+ weak sites per guide_state_lifecycle.md) and
   mcp_client.py (will be touched heavily by the parallel
   mcp_architecture_refactor_20260606). The deferral is correct
   but the follow-up should explicitly call out these two
   files as the next targets, rather than implying they're
   handled.

4. See Also cross-references: Added 7 new entries to §12.2:
   - docs/guide_models.md (FileItem dataclass source)
   - docs/guide_context_aggregation.md (FileItems consumer)
   - docs/guide_discussions.md (HistoryMessage shape)
   - docs/guide_state_lifecycle.md (state delegation)
   - conductor/tracks/mcp_architecture_refactor_20260606/
   - conductor/tracks/nagent_review_20260608/{report,takeaways}.md

No plan.md changes.
2026-06-08 20:50:50 -04:00
ed 1c565da7a0 feat(gui): wrap immapp.run in try/except + add /api/gui_health endpoint
PR2 of the test_full_live_workflow_imgui_assert fix sequence.

When an ImGui scope mismatch (IM_ASSERT(Missing End())) fires in
immapp.run (e.g. after cumulative state corruption from prior sims'
panel renders), the RuntimeError propagates out of app.run(). The
controller's _io_pool gets shut down via __del__/finalization. The
hook server (separate ThreadingHTTPServer) survives. Subsequent test
clicks fail with 'cannot schedule new futures after shutdown' and
the test times out after 120s with no clear signal of what went
wrong.

This commit:
1. Wraps immapp.run in try/except RuntimeError in gui_2.py:618.
   On assertion: logs the error to stderr (NOT silent), records
   it on controller._gui_degraded_reason and _last_imgui_assert,
   and returns from run() so the hook server keeps serving.
2. Adds _gui_degraded_reason and _last_imgui_assert to
   AppController.__init__ (initialized to None).
3. Adds /api/gui_health endpoint in api_hooks.py:148. Returns
   {healthy, degraded_reason, last_assert, io_pool_alive}.
4. Adds ApiHookClient.get_gui_health() with the matching unit
   tests (3 mocked tests + 1 live test).

Per user feedback 2026-06-08:
- The wrap does NOT silently swallow the error. It logs at ERROR
  level and surfaces it via the health endpoint.
- Tests can call client.get_gui_health() to detect a degraded GUI
  and fail fast with a clear message.

TDD: tests written first, confirmed to fail, then fix applied.
34/34 unit tests pass. 1/1 live test passes (live_gui health
endpoint reports healthy=True on fresh subprocess).
2026-06-08 20:46:41 -04:00
conductor-tier2 0471440c68 conductor(track-update): data_oriented_error_handling - nagent_review + docs refresh
3 surgical additions to the spec, no task changes:

1. New ErrorKind: Added PROVIDER_HISTORY_DIVERGED_FROM_UI to
   the ErrorKind enum. Per nagent_review Pitfall #4 (provider
   history divergence: user edits disc_entries[i].content via
   the discussion UI but ai_client._<provider>_history still
   replays the original). The new kind makes the divergence
   *detectable* and *reportable* so the follow-up
   public_api_migration_20260606 track can collapse the two
   history layers. The Result pattern from this track is the
   natural carrier for the signal.

2. State-delegation regression tests: Added mandatory
   regression tests to the testing strategy in §6 for the
   ai_client refactor (highest-risk phase). The new tests
   exercise:
   - app.temperature = 0.5 round-trips through App.__getattr__/
     __setattr__ delegation (per gui_2.py:666-675)
   - controller.disc_entries[i].content is reflected in the
     next send_result()'s messages parameter
   - The 3 per-provider history locks serialize correctly under
     concurrent send_result() calls
   The reason this is mandatory: per guide_state_lifecycle.md
   (added 2026-06-08), the App.__getattr__/__setattr__ pattern
   means a partial refactor manifests as silent AttributeError
   deep in test code, not at the refactor commit boundary.

3. See Also cross-references: Added 6 new entries to §12.3:
   - docs/guide_ai_client.md (per-provider history globals)
   - docs/guide_mcp_client.md (3-layer security model)
   - docs/guide_state_lifecycle.md (3 per-thread + 7-lock pattern)
   - docs/guide_discussions.md (23-operation matrix)
   - docs/guide_context_aggregation.md (build_discussion_section)
   - conductor/tracks/mcp_architecture_refactor_20260606/
   - conductor/tracks/nagent_review_20260608/{report,takeaways}.md

No plan.md changes. Plan tasks are task-level and will flow from
the spec changes when the track is re-planned.
2026-06-08 20:41:00 -04:00
conductor-tier2 77ae2ec7a8 conductor(track-update): qwen_llama_grok - spec notes for nagent_review + docs refresh
4 surgical additions to the spec, no task changes:

1. Result return type: Added a coordination note in §3.1 (Data-
   Oriented Design) explaining that the shared send_openai_compatible
   helper should return Result[NormalizedResponse, ErrorInfo] from
   day 1, not NormalizedResponse + ProviderError raise. This is so
   the downstream data_oriented_error_handling_20260606 track is
   a small mechanical pass over new code, not a second migration.
   References nagent_review Pitfall #4 (provider history divergence)
   and the ErrorKind.PROVIDER_HISTORY_DIVERGED_FROM_UI use case.

2. Declarative read, not behavioral dispatch: Added clarification
   to §6 (UX Adaptation) that the capability matrix is a *read* of
   declarative data, not a new dispatch layer. Per nagent_review
   Pitfall #1 (opaque function calling in the Application is the
   correct choice; nagent-style protocol is for Meta-Tooling),
   UI elements are visible/enabled/disabled/hidden but the
   *behavior* they invoke is unchanged. Three concrete examples
   added: screenshot button, cost panel, cache panel.

3. PROVIDERS source of truth: Added a NOTE in §3.2 (Module Layout)
   that src/models.py:79-86 PROVIDERS is the existing single
   source of truth for the (vendor, model) enumeration. The
   capability registry reads from this constant rather than
   introducing a parallel list. Cross-references
   docs/guide_models.md.

4. Docs touchpoint: Expanded Phase 6 (Docs + Archive) in §9 to
   note that docs/guide_ai_client.md needs the new providers +
   the shared helper documented, and that
   docs/guide_context_aggregation.md (added 2026-06-08) is the
   reference for the aggregate.py pipeline that all new providers
   use.

5. See Also cross-references: Added 3 new entries to §13.2:
   - docs/guide_context_aggregation.md (the new pipeline guide)
   - conductor/tracks/nagent_review_20260608/report.md (§1, §5, §15)
   - conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md
     (§1, §2, §9)

No plan.md changes. Plan tasks are task-level and will flow from
the spec changes when the track is re-planned.
2026-06-08 20:35:52 -04:00
ed d7a065e9d5 ascii gui comms worflow ideation 2026-06-08 20:32:42 -04:00
conductor-tier2 161ebb0da6 docs(fix): correct nav link case + relative-path level
Gitea (and any case-sensitive filesystem) was rendering the [Top]
nav links in /docs as broken because of two bugs:

1. Case-sensitivity: 22 links used '../README.md' (all-uppercase)
   but the actual file is 'docs/Readme.md' (capital R, lowercase
   rest). 21 guide_*.md nav bars were affected, plus 1 internal
   cross-link in Readme.md itself. Works on Windows (case-
   insensitive) but broken on Linux/Gitea.

   Fix: 22 occurrences across 22 files changed
   '../README.md' -> '../Readme.md'

2. Wrong relative-path level: 16 links used '../../conductor/...'
   from 'docs/guide_*.md' to reach 'conductor/'. This goes up 2
   levels to 'projects/', which doesn't exist. The correct path
   from 'docs/guide_*.md' to 'conductor/' is 1 level up
   ('../conductor/...'). 12 unique patterns across 10 files
   affected.

   Fix: 16 occurrences across 10 files changed
   '../../conductor/' -> '../conductor/'

3. Bonus: 1 planned-guide link in guide_context_curation.md
   referenced a never-written 'guide_context_presets.md'. The
   ContextPreset schema is now fully covered in the new
   'guide_context_aggregation.md' (per the 2026-06-08 docs
   refresh). Fix: link target updated.

No content was changed, only link paths. 24 files, 37 link
replacements, 37 deletions.

Verification:
- All .md links in docs/ now resolve to existing files
  (validated by path-resolution check from each file's directory)
- The 3 new guides from the previous docs refresh commit
  (guide_discussions.md, guide_state_lifecycle.md,
  guide_context_aggregation.md) had the case bug inherited from
  guide_architecture.md's existing nav pattern; their top-of-file
  nav bars are now correct
- The 21 pre-existing guide nav bars that had the same bug
  (all 21 of them, except the 3 that used the correct case:
  guide_mma.md, guide_simulations.md, guide_tools.md) are now
  also fixed
- Inter-guide links (e.g. [Discussions](guide_discussions.md))
  were not affected; they were always correct because both the
  link text and the actual filename are lowercase

This is a docs-only fix. No code modified.
2026-06-08 19:51:55 -04:00
conductor-tier2 ba05168493 docs(refresh): 3 new guides + cross-links from nagent_review
Per the docs Refresh Protocol (conductor/workflow.md), after a
reference/analysis track ships, the affected guides must be updated
to reflect new module structure or new conventions. The nagent_review
track (9cc51ca9) produced a deep-dive + 10 actionable takeaways that
named 3 documentation gaps in /docs. This commit fills them.

3 new guides (1,122 lines total):

1. guide_discussions.md (353 lines) — The Discussion system
   - 23-operation matrix: A1-A7 per-entry + B1-B11 discussion-level
     + C1-C5 undo/redo
   - Take naming convention (<base>_take_<n>), branching, promotion
   - User-managed role list (app.disc_roles)
   - Per-role filter linked to MMA persona focus
   - _disc_entries_lock thread-safety contract
   - Hook API session endpoints
   - Persistence: _flush_to_project, _flush_disc_entries_to_project,
     context_snapshot
   - 9 file:line refs into gui_2.py:3770-4260 + history.py

2. guide_state_lifecycle.md (375 lines) — Undo/redo + reset + state
   delegation
   - HistoryManager + UISnapshot (13 captured fields, 100-snapshot
     capacity, debounced change-detection at render frame)
   - _handle_reset_session (clears 30+ fields, replaces project,
     preserves active_project_path per the 2026-06-08 regression fix)
   - App.__getattr__/__setattr__ state delegation to Controller
   - 4-thread access pattern with 7 lock-protected regions
   - State persistence: in-memory vs project TOML vs config TOML
   - Hot-reload integration
   - Hook API registries (_predefined_callbacks, _gettable_fields)
   - 14 file:line refs into gui_2.py:1140-1170, history.py,
     app_controller.py:3286-3356

3. guide_context_aggregation.md (394 lines) — The aggregate.py
   pipeline
   - 3 aggregation strategies (auto, summarize, full)
   - 7 per-file view modes (full, summary, skeleton, outline,
     masked, custom, none)
   - Full FileItem schema (9 fields + __post_init__ normalizer)
     at models.py:510-559
   - ContextPreset schema and ContextPresetManager
   - Tier 3 worker variant (build_tier3_context with FuzzyAnchor
     re-resolution and focus-file handling)
   - force_full / auto_aggregate short-circuits
   - Cache strategy (static prefix + dynamic history)
   - 23 file:line refs into aggregate.py:36-518 + models.py:909-937

8 existing guides cross-linked to the 3 new guides and to the
nagent_review track:

- guide_gui_2.md           (+ See Also entries for discussions,
                           state lifecycle, context aggregation,
                           nagent_review report)
- guide_app_controller.md  (+ See Also entries for discussions,
                           state lifecycle, context aggregation,
                           nagent_review report)
- guide_context_curation.md (+ new See Also section pointing to
                            context aggregation + nagent_review)
- guide_architecture.md    (+ new See Also section listing all 10
                           guides + nagent_review report)
- guide_ai_client.md       (+ See Also entries for state lifecycle,
                           context aggregation, nagent_review
                           pitfalls #2 and #4)
- guide_mma.md             (+ new See Also section pointing to
                           context aggregation, discussions,
                           nagent_review report §9 + takeaways §3/§10
                           for SubConversationRunner priority)
- guide_models.md          (+ See Also entries for context
                           aggregation, discussions, nagent_review
                           report §6 on FileItem as strongest
                           curation dimension)
- Readme.md                (+ 3 new guide entries in the index
                           table, with one-line summaries)

No code modified. This is documentation only.

Why these 3 guides specifically:

- guide_discussions.md: The discussion system is the user's most
  edited surface. nagent_review's report §3 enumerated 23 operations
  (A1-C5) that previously existed only as scattered file:line refs
  across gui_2.py. A dedicated guide makes the operation matrix
  discoverable.

- guide_state_lifecycle.md: The undo/redo + reset + state delegation
  machinery is architecturally load-bearing but scattered across 4
  files. After nagent_review identified the provider-side history
  divergence as Pitfall #4, the relationship between Manual Slop's
  state and the provider's state needs explicit documentation.

- guide_context_aggregation.md: aggregate.py (518 lines) is the
  most-touched module after ai_client.py but had no dedicated
  guide. nagent_review confirmed it's Manual Slop's strongest
  curation dimension. A dedicated guide makes the 7 view modes
  and 3 strategies discoverable.

The 3 new guides total 1,122 lines and follow the existing
per-source-file deep-dive style (architectural, data-oriented,
state-management-focused).
2026-06-08 19:26:08 -04:00
conductor-tier2 9cc51ca9af conductor(track): nagent review - deep-dive + 6 pitfalls + 10 actionable takeaways
Reference/analysis track. Produces 0 code changes.

Artifacts (conductor/tracks/nagent_review_20260608/):
- spec.md (240 lines) - track wrapper with Application/Meta-Tooling framing
- report.md (571 lines) - 14-section deep-dive; primary deliverable
- comparison_table.md (79 lines) - flat side-by-side reference
- decisions.md (286 lines) - 10 future-track candidates with priority matrix
- nagent_takeaways_20260608.md (363 lines) - 10 actionable patterns grounded
  in code (file:line refs into nagent source and Manual Slop source)
- metadata.json (132 lines) - structured metadata + verification criteria
- state.toml (113 lines) - per-task tracking + user-corrections log (7 entries)

14 nagent principles covered in report.md (durable work, text-in/text-out,
editable state, visible protocol, the loop, per-file memory, repo history,
neighborhoods, sub-conversations, controlled writes, large files, tool
discovery, framework differences, build your own).

6 pitfalls (revised from 8 after user-corrections):
1. No structured output protocol in Application AI (opaque function calling)
2. Provider-specific history in process globals (ai_client._anthropic_history
   + _deepseek_history + _minimax_history)
3. RAG is not 'history as data' (fuzzy, not auditable)
4. AI client is a stateful singleton (2,685-line ai_client.py)
5. No non-MMA disposable sub-conversations (1:1 gap; user-flagged want)
6. Hard-coded tool discovery (45-tool if/elif in mcp_client.py)

User-corrections applied (3 rounds, 7 total corrections recorded):
- Editable discussions: PARTIAL -> PARITY (DIFFERENT FOCUS) with full A1-A7
  per-entry + B1-B11 discussion-level + C1-C5 undo/redo operation matrix
- Per-file memory: DOMAIN MISMATCH -> MANUAL SLOP IS STRONGER IN
  CURATION DIMENSION (FileItem + ContextPreset vs nagent's inode-keyed
  conversation log; complementary, not equivalent)
- Sub-conversations: MMA has it; 1:1 does not -> 'PARITY for MMA; GAP for
  1:1 discussions' (user wants this)
- RAG: opt-in, not gap; user wants pre-staging via sub-conversation
- Personas: config bundling (can opt out via AI settings)
- Tool discovery: deferred (user has 'intent based DSL' idea but 'no where
  near that ideation yet')

10 actionable takeaways (separate from the 6 pitfalls - those are
diagnosis, these are prescription):
1. State visibility (UI inspector for in-process state)
2. Readable conversation log (text-greppable, not just JSON-L)
3. Sub-agents for 1:1 (HIGH priority - user-flagged)
4. File-identity over file-path (st_dev:st_ino rename-safe)
5. One loop shape visible in diagnostics
6. Visible retry on protocol failure
7. Meta-Tooling DSL (intent-based, deferred)
8. Self-describing tools (subsumed by mcp_architecture_refactor_20260606)
9. Single source of truth for disc_entries + provider history
10. Sub-agent return type constraint (bake into candidate #1 spec)

Domain classification: every recommendation tagged Application / Meta-Tooling
/ Both per docs/guide_meta_boundary.md. nagent lives in the Meta-Tooling
domain; Manual Slop's Application AI is a different kind of thing.

No code modified by this track (reference/analysis only). All 7 files
parse cleanly (JSON, TOML, Markdown). All internal cross-links resolve.
Track is 'active' awaiting human review; future-track candidates live in
decisions.md and nagent_takeaways_20260608.md.
2026-06-08 18:44:35 -04:00
ed c9a991bbb8 test(live_workflow): bump project switch wait timeout 30s -> 120s
The 30s wait_for_project_switch timeout was an excessive constraint.
In batch context, prior sims' AI discussion turn workers saturate the
8-worker io_pool, queueing this switch for tens of seconds. The other
defensive waits in the test (warmup 60s, prior switch 60s) already use
60s+, so 30s was the inconsistent outlier.

User confirmed: 'I think not completing in 30s is an excessive constraint
if thats whats going on.'

Verification:
- test_full_live_workflow isolation: 11.69s PASS
- 7-test batch (test_full_live_workflow + 4 extended sims + 2 markdown): 85.83s PASS
2026-06-08 18:14:18 -04:00
ed 87d7c5bff2 test(io_pool): update assertion for 8-worker pool size 2026-06-08 17:51:39 -04:00
ed 4a33848620 fix(io_pool): increase worker count from 4 to 8 to prevent test hangs
Root cause: test_full_live_workflow in batch context (with prior sims
running AI discussion turns) would queue its _do_project_switch behind
the auto-pruner's scan of tests/logs/ (154MB, 6519 files). The 4-worker
pool was saturated, so the switch would never run within 30s.

Fix: bump IO_POOL_MAX_WORKERS from 4 to 8. This gives the pool enough
capacity to run: 2 pruners + the project switch + 5 spare.

Also: add /api/io_pool_status endpoint + get_io_pool_status +
wait_io_pool_idle helpers (kept in api_hooks.py and api_hook_client.py
for the test_api_hook_client_io_pool.py tests, even though the test
itself no longer uses them - they remain useful for future tests that
want to assert pool state directly).

Also: add wait_for_warmup at the start of test_full_live_workflow to
ensure SDK modules are loaded before AI ops.

Test verification:
- test_full_live_workflow in isolation: 11.83s PASS
- test_full_live_workflow in batch (with 4 prior sims): 83.46s PASS
- 30/30 related unit tests PASS
2026-06-08 17:49:34 -04:00
ed 9afc93bce2 fix(app_controller): clear project-switch state in _handle_reset_session
When a prior test in the tier-3-live_gui batch leaves a _do_project_switch
background thread running, the next test's btn_project_new_automated click
sees _project_switch_in_progress=True (from the prior thread) and queues
the new path via _project_switch_pending_path. The queued switch is never
actually submitted to the io_pool, so is_project_stale() stays True and
AI ops (_handle_generate_send) bail with 'project switch in progress;
AI ops disabled'.

Fix: _handle_reset_session now also clears _project_switch_in_progress,
_project_switch_pending_path, and _project_switch_error (under the
existing _project_switch_lock). This way, even if the prior background
thread is still running, the controller reports an idle state and the
new switch can be submitted normally.

Also:
- src/api_hook_client.py: reverted wait_for_project_switch to require
  in_progress=False (was relaxed to return on queued path, which misled
  the caller into thinking the switch was done)
- tests/test_handle_reset_session_clears_project.py: new test
  test_handle_reset_session_clears_project_switch_state asserts
  is_project_stale() returns False after reset
- tests/test_api_hook_client_wait_for_project_switch.py: updated
  test_wait_for_project_switch_does_not_return_on_queued (in_progress
  + matching path should keep waiting, not return early)
- tests/test_live_workflow.py: added pre-wait for any in-flight switch
  before doing btn_reset (so the test waits up to 60s for the prior
  switch to complete if needed)
- conductor/todos/TODO_test_full_live_workflow.md: updated Task 4 with
  the deeper hang analysis and recommended fix

Known follow-up: test_full_live_workflow still hangs in tier-3 batch
even with this fix, because the new _do_project_switch itself is hung
in the io_pool (likely saturation from prior sims' AI discussion turn
workers). Deeper investigation required.
2026-06-08 15:19:30 -04:00
ed 5087ee988d chore: move TODO_test_full_live_workflow.md to conductor/todos/
Following the conductor convention of organizing track-related
artifacts under conductor/. The TODO tracks the test_full_live_workflow
race condition fix and its follow-up items (Tasks 3, 7 still pending;
known batch hang documented).

Tasks 1, 2 (with regression fix), 4, 5, 6 are SHIPPED in prior commits.
2026-06-08 14:05:40 -04:00
ed 3391e18f64 chore(pyproject): register pytest.mark.live marker
Silences the PytestUnknownMarkWarning emitted by test_visual_mma.py and
test_visual_sim_gui_ux.py (3 instances). The @pytest.mark.live mark
already exists in the test files; pyproject.toml just didn't know
about it.

- pyproject.toml: added 'live: marks tests as live visualization tests
  (not in CI by default)' to [tool.pytest.ini_options].markers
2026-06-08 13:59:18 -04:00
ed d09f70ea44 docs(todo): mark Tasks 4+5 as SHIPPED; note known batch hang issue 2026-06-08 13:37:13 -04:00
ed b6972c31de test(live_workflow): use wait_for_project_switch + defensive file check
Replaces the 10x1s blind poll of derived state with a condition-based
wait on /api/project_switch_status. Also adds a defensive file existence
check that fails fast (within 5s) if the click was dropped or the
project creation handler crashed.

The new wait surfaces a clear error message ('Project switch did not
complete in 30s. Last status: ...') instead of the generic 'Project
failed to activate', and exposes _project_switch_error if the controller
reported one.

- tests/test_live_workflow.py: replaced poll loop (lines 57-65) with
  wait_for_project_switch + os.path.exists defensive check
2026-06-08 13:26:54 -04:00
ed a6605d9889 feat(api_hook_client): add wait_for_project_switch for deterministic test waits
Adds a polling helper that blocks until the project switch completes,
errors out, or times out. Replaces the fragile 10x1s blind poll in
test_full_live_workflow with a condition-based wait on the
/api/project_switch_status endpoint.

Features:
- Polls /api/project_switch_status every 200ms (configurable)
- Returns immediately on error (with the error in the result)
- Path matching: exact match OR basename match (handles absolute vs relative)
- Times out with a clear 'timeout' flag instead of a generic assertion
- Optional expected_path: if None, returns on any in_progress=False

- src/api_hook_client.py: new wait_for_project_switch method (37 lines)
- tests/test_api_hook_client_wait_for_project_switch.py: 6 unit tests
  with mocked _make_request covering all paths
2026-06-08 13:04:12 -04:00
ed 54e46ee815 docs(todo): note regression discovered and fixed in test_context_sim_live 2026-06-08 12:35:24 -04:00
ed 4548726a2b conductor(tracks): restructure - chronological by phase + status groupings + active queue table 2026-06-08 12:26:56 -04:00
ed e0a3eb8c05 fix(app_controller): regression in test_context_sim_live from clearing active_project_path
Task 2 (_handle_reset_session reset) introduced a regression: setting self.active_project_path to empty caused an infinite re-switch loop in _do_project_switch because _flush_to_project writes to active_project_path (raises OSError on empty path), and the finally block re-submitted the failed switch on every iteration. Result: test_context_sim_live saw switching-to status for 5+ seconds and MD-only generation was blocked.

Fix: keep self.active_project_path as-is in _handle_reset_session. Only reset self.project (to a fresh default_project dict) and self.project_paths (to empty list). The stale project state issue is solved by replacing the project dict; the active_project_path stays valid for _flush_to_project.

- src/app_controller.py: refined _handle_reset_session project reset
- tests/test_handle_reset_session_clears_project.py: updated contract test to assert active_project_path is preserved
2026-06-08 12:24:10 -04:00
ed 40d61bf3d8 docs(todo): mark Tasks 1+2 as SHIPPED for test_full_live_workflow fix 2026-06-08 10:15:54 -04:00
ed 6ecb31ea0a feat(app_controller): reset project state in _handle_reset_session
Stale project state from prior live_gui tests (shared session-scoped
subprocess) was leaking into subsequent tests, causing the
test_full_live_workflow race condition: 'Project not switched' errors
when self.project still claimed to be a different project.

The fix: _handle_reset_session now mirrors the default-project branch
of __init__ (lines 1743-1745), creating a fresh default project dict,
clearing active_project_path and project_paths, and reinitializing
the workspace manager.

- src/app_controller.py: 6 new lines in _handle_reset_session
- tests/test_handle_reset_session_clears_project.py: 3 tests
  (active_project_path, project_paths, self.project)
2026-06-08 10:13:07 -04:00
ed abb3856525 feat(api_hooks): add /api/project_switch_status endpoint for deterministic test signaling
Adds a new endpoint that exposes the project-switch state machine so tests
can poll for completion instead of guessing with timeouts.

- AppController: track _project_switch_error on failure paths
- src/api_hooks.py: GET /api/project_switch_status returns
  {in_progress, pending_path, active_path, error}
- src/api_hook_client.py: get_project_switch_status() helper
- tests/test_api_hooks_project_switch.py: 3 unit tests for client + endpoint
  shape, 1 live_gui test for the default-idle case
2026-06-08 09:55:36 -04:00
ed c531cebe03 conductor(plan): review pass — fix cross-references, add NOT_READY + with_errors + Lottes/Valigo, split §3.4 into 8 sub-tasks 2026-06-08 09:38:27 -04:00
ed 8248a49f1e docs(todo): simple todo list for fixing test_full_live_workflow race 2026-06-08 09:25:18 -04:00
ed 08ee7547be docs(reports): root cause report for test_full_live_workflow race condition 2026-06-08 09:24:14 -04:00
ed 64823493c0 conductor(closeout): ship test_batching_refactor_20260606 with CLOSEOUT.md and follow-up recommendation 2026-06-08 08:36:22 -04:00
ed 488ae04459 fix(run_tests_batched): detect batch failure from output when proc.returncode is wrong 2026-06-08 02:03:50 -04:00
ed 5c6eb620a1 fix(run_tests_batched): colorize non-xdist format (tests/... STATUS), filter 'Error during log pruning' noise 2026-06-08 01:54:56 -04:00
ed 272b7841ae fix(run_tests_batched): filter xdist scheduling queue output (test paths without status prefix) 2026-06-08 01:51:07 -04:00
ed a2d16541d0 fix(run_tests_batched): keep pytest's full -v output, only filter LogPruner/win errors, colorize per-test status 2026-06-08 01:49:39 -04:00
ed 21cb57b31d fix(run_tests_batched): graceful xdist fallback, live progress streaming, ANSI colors, absolute default paths 2026-06-08 01:28:53 -04:00
ed fb6b4bd3eb conductor(tracks): mark test_batching_refactor_20260606 as completed 2026-06-08 01:18:20 -04:00
ed 50bd894f8d conductor(archive): ship test_batching_refactor_20260606 to archive 2026-06-08 01:16:58 -04:00
ed 50f26f0d5c chore: delete legacy run_tests_batched.py (was preserved for one cycle) 2026-06-08 01:15:12 -04:00
ed ac7e638b23 chore: gitignore tests/.test_durations.json (developer-local cache) 2026-06-08 01:14:51 -04:00
ed 9eac02ddcb feat(tests): populate test_categories.toml with cross-cutting entries 2026-06-08 01:14:12 -04:00
ed 796eec0058 conductor(plan): mark Phases 2,3 complete in test_batching_refactor_20260606 2026-06-08 01:09:02 -04:00
ed 5252b6d782 docs(testing): document new run_tests_batched.py in Running Tests section 2026-06-08 01:00:50 -04:00
ed e6ad2ecda2 chore: preserve old run_tests_batched.py as .legacy for one cycle 2026-06-08 00:59:49 -04:00
ed 2c3a0512f2 feat(run_tests_batched): full CLI with --tiers, --durations, actual pytest execution 2026-06-08 00:58:53 -04:00
ed 7610c9c1dc conductor(plan): mark Phase 1 complete in test_batching_refactor_20260606 2026-06-08 00:53:59 -04:00
ed 57285d048b feat(run_tests_batched): add --plan and --audit modes (Phase 1 stub) 2026-06-08 00:50:37 -04:00
ed 29ac64adc6 test(conftest): register tests.pytest_collection_order as pytest plugin 2026-06-08 00:49:11 -04:00
ed f240504f0e feat(collection_order): implement opt-in per-test sort via conftest hook 2026-06-08 00:47:21 -04:00
ed 6287005ad1 test(collection_order): add red tests for opt-in sort_items_by_order 2026-06-08 00:47:03 -04:00
ed e07036ad5d feat(batcher): implement Batch dataclass and plan() function 2026-06-08 00:46:12 -04:00
ed 246f293c56 test(batcher): add red tests for plan() function 2026-06-08 00:41:20 -04:00
ed 9c5ad3fb8d config 2026-06-08 00:40:33 -04:00
ed f778ef509e feat(categorizer): implement load_registry, merge_registry, categorize_all 2026-06-08 00:33:21 -04:00