manual_slop

Author	SHA1	Message	Date
ed	b801b11c3b	conductor(todo): mark task 9 (test deps in dev + conftest gate) as shipped	2026-06-09 10:39:29 -04:00
ed	a341d7a7c8	test: ensure sentence-transformers is in test env + conftest gate	2026-06-09 10:37:14 -04:00
ed	2148e79a1c	docs(rag): document venv dep install + new failure mode (relative path bug) The venv now has sentence-transformers (installed via uv sync --extra local-rag). The RAG test passes in isolation (7.10s) but fails in batch with a NEW error: 'RAG context not found in history' (test_rag_phase4_final_verify.py:95). This is a SEPARATE bug from the missing-dep issue. The RAG test uses RELATIVE file paths ('final_test_1.txt' instead of absolute). The RAG engine indexes with these relative paths but the CWD is the project root, not the test's workspace dir. Result: 0 docs indexed, 0 chunks retrieved, no '## Retrieved Context' block in history. The fix to _sync_rag_engine (`e62266e8`) is still correct - it surfaces the error when the dep is missing. The dep is now installed, so the sync/index/AI flow runs to completion. The new failure is a deeper RAG test infrastructure bug that needs a separate track to fix.	2026-06-09 10:21:45 -04:00
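A minimal sketch of the relative-path failure mode described in 2148e79a1c, assuming the fix is to anchor relative paths at the test's workspace directory before indexing. `resolve_for_indexing` and `count_indexable` are hypothetical helpers for illustration; they are not part of the RAG engine.

```python
import tempfile
from pathlib import Path


def resolve_for_indexing(path: str, workspace: Path) -> Path:
    """Hypothetical helper: anchor relative paths at the workspace, not the CWD."""
    candidate = Path(path)
    return candidate if candidate.is_absolute() else workspace / candidate


def count_indexable(paths: list[str], workspace: Path) -> int:
    """Count how many of the given paths exist once anchored at the workspace."""
    return sum(resolve_for_indexing(p, workspace).is_file() for p in paths)


def demo() -> int:
    """Create a workspace file and look it up by its relative name."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        (workspace / "final_test_1.txt").write_text("indexed content")
        # A bare Path("final_test_1.txt") would resolve against the process CWD
        # (the project root in the batch run), so it would not find this file.
        return count_indexable(["final_test_1.txt"], workspace)
```

With the path anchored at the workspace, the file is found; resolved against the project root, the same relative name would give the "0 docs indexed" result the commit reports.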
ed	e62266e868	fix(rag): surface embedding provider init failure as 'error' status The bug: when the local embedding provider fails to initialize (e.g. sentence-transformers not installed), RAGEngine.__init__ leaves self.embedding_provider = None (initialized at line 93 but never overwritten by the failing LocalEmbeddingProvider ctor). The constructor returns. _sync_rag_engine's else branch then sets status to 'ready' - a lie. The RAG panel shows 'ready'. The user triggers a retrieval. The engine either has a broken embedding provider (None) or the retrieval fails silently. The RAG context never appears in the AI's history. The fix: in _sync_rag_engine's _task, after RAGEngine(...) returns, check if engine.embedding_provider is None. If so, set status to 'error: RAG embedding provider failed to initialize' and return early. This prevents: - The engine from being assigned to self.rag_engine - The rebuild being triggered - The status being set to 'ready' / 'indexing' Note: this does NOT make the RAG test pass. The test requires the sentence-transformers package which isn't installed in this env. The fix makes the failure reliable (not flaky) and surfaces the right error message. TDD: 3 tests added in tests/test_rag_engine_ready_status_bug.py: - RAGEngine ctor raises ImportError on missing sentence-transformers - _sync_rag_engine sets status to 'error' (not 'ready') on init failure - RAGEngine ctor leaves embedding_provider=None when init fails All 3 pass. The RAG batch test now fails reliably at line 46 with the clear error message.	2026-06-09 09:39:02 -04:00
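The check described in e62266e868 can be sketched as follows. `EngineStub` stands in for RAGEngine and `status_after_init` is a hypothetical function, not the real `_sync_rag_engine`; only the error string and the `embedding_provider is None` test come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EngineStub:
    """Stand-in for RAGEngine (assumption): only the attribute the check reads."""
    embedding_provider: Optional[object] = None


def status_after_init(engine: EngineStub) -> str:
    """Report 'error' instead of 'ready' when the provider never initialized."""
    if engine.embedding_provider is None:
        # Return early: the engine is not assigned and no rebuild is triggered.
        return "error: RAG embedding provider failed to initialize"
    return "ready"
```

The point of the early return is that a half-constructed engine never reaches the code path that sets 'ready' or 'indexing'.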
conductor-tier2	adc7ff8029	docs(audit): workflow/agent markdown audit with 10 recommendations User asked: is there anything in our workflow or agent markdown that should be updated or introduced based on this session? This commit is the AUDIT ONLY. No workflow files are modified. The 10 recommendations are not yet applied. User picks which to act on, which to defer, which to discard. docs/reports/workflow_markdown_audit_20260608.md (~370 lines): Read all the workflow/agent markdown in scope (AGENTS.md, CLAUDE.md, GEMINI.md, all 5 .agents/skills//SKILL.md, the 4 .agents/agents/.md, conductor/workflow.md, product.md, product-guidelines.md, tech-stack.md, index.md, tracks.md, edit_workflow.md, the 2 existing code_styleguides/.md, and the 4 .agents/policies/.toml + 7 .agents/tools/*.json). Cross-referenced each against the 7 new session artifacts (nagent_review, 3 docs guides, ASCII-sketch workflow, SSDL digest, C11 interop v1+v2, 2 new tracks) and the 3 user-correction patterns (duffle-as-style-ref, v2 request/response model, "only under hard constraint"). The 10 recommendations: 1 (HIGH) Update architecture-fallback with new docs 2 (HIGH) Document ASCII-sketch workflow in workflow.md 3 (HIGH) Document SSDL digest in product-guidelines.md 4 (HIGH) Add user_corrections_log to State.toml Template 5 (MED) Document contingency track pattern 6 (MED) Update Compaction Recovery to reference session_synthesis 7 (MED) Document v1->v2 framing iteration anti-pattern 8 (MED) Document preserve-before-compact archive pattern 9 (LOW) Document MiniMax understand_image for ASCII verification 10 (LOW) Document per-proposal commit chain with git notes 4 HIGH-priority = ~75 min to act on. All 10 = ~2-3 hours. The audit is conservative: it does NOT recommend changing TDD, the per-task commit discipline, the 4-tier MMA model, product.md, tech-stack.md, the existing styleguides, or adding new audit scripts. The session did not surface conflicts with any of these. 
Meta-pattern: workflow/agent markdown is the theoretical contract; session artifacts are the empirical evidence; when the two diverge, update the theory to match the evidence. This session's evidence (new methodology, new vocabulary, new patterns, new anti-patterns) drives the 10 recommendations.	2026-06-09 09:15:57 -04:00
ed	37b9a68017	docs: add test_infra_hardening foundation + RAG batch failure status Foundation document for the future test_infra_hardening track that will address session-scoped live_gui fixture isolation, silent __getattr__/__setattr__ contract assumptions, and similar test infrastructure fragility. Also documents the test_rag_phase4_final_verify batch failure that surfaces after the __getattr__ fix unblocks test_full_live_workflow. The RAG test failure is NOT a regression - it reproduces on pre-fix HEAD too. It's a pre-existing test isolation issue (the live_gui fixture is session-scoped, so state from the 4 sims pollutes the controller).	2026-06-09 00:26:05 -04:00
ed	bcdc26d0bd	fix(gui): correct __getattr__ to not silently return None for missing ui_ attrs PR1 follow-up (the actual IM_ASSERT root cause fix). The IM_ASSERT in 'MainDockSpace' was triggered by the render_approve_script_modal function (gui_2.py:4895) calling imgui.checkbox with a None value for app.ui_approve_modal_preview. The chain of bugs: 1. AppController.__getattr__ returned None for ANY ui_ attribute (lines 1237-1238). This was intended as a safety net for ui_* flags defined in __init__ but it was too generous: it returned None for ui_ attrs that were NEVER set. 2. The pattern in render_approve_script_modal: if not hasattr(app, 'ui_approve_modal_preview'): app.ui_approve_modal_preview = False _, app.ui_approve_modal_preview = imgui.checkbox(..., app.ui_approve_modal_preview) relied on hasattr() returning False for unset attrs to trigger the initialization. But the App.__setattr__ checks hasattr(self.controller, name) to decide where to route assignments. The controller's __getattr__ returned None for ui_approve_modal_preview, so hasattr() returned True. The App.__setattr__ routed the assignment to the controller. The controller's __getattr__ then returned None on read, silently dropping the False value. 3. The next line called imgui.checkbox with None, which raised a TypeError. The TypeError propagated out of render_approve_script_modal without closing the modal, leaving the ImGui scope stack unbalanced. The unbalanced scope triggered IM_ASSERT(Missing End()) on the next frame. Fix: AppController.__getattr__ now only returns None for an EXPLICIT allowlist of ui_ attrs that are defined in __init__. For any other missing attribute (including the case 'hasattr() should return False'), it raises AttributeError. The App.__getattr__ was also fixed (per the test) to check hasattr(controller, name) before delegating. This is defense in depth in case other __getattr__ patterns are added. 
Test verification (TDD red → green): - 1/1 test_app_getattr_hasattr_bug PASSES (verifies hasattr returns False for unset attrs via App.__getattr__) - 1/1 test_app_controller_getattr_ui_bug PASSES (verifies hasattr returns False for unset ui_ attrs on controller) Live verification: - 4 sims + test_live_workflow + 2 markdown tests: 7/7 PASS in 83.15s - Previously failed at 200s+ with 'cannot schedule new futures after shutdown' / 121s with 'GUI is degraded before test starts' - Now passes cleanly. The IM_ASSERT no longer fires. 13/13 related unit tests pass (app_controller_* + app_run_* + app_getattr_*). No regressions in 51/51 io_pool/warmup/sigint/etc. unit tests.	2026-06-08 23:45:25 -04:00
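The hasattr() interaction behind bcdc26d0bd can be reproduced in isolation. The two classes below are illustrative stand-ins, not the real AppController, and the allowlist contents (`ui_known_flag`) are an assumption; the commit only says that an explicit allowlist exists.

```python
class LeakyController:
    """Old behaviour (sketch): any missing ui_ attribute reads as None."""

    def __getattr__(self, name):
        if name.startswith("ui_"):
            return None
        raise AttributeError(name)


class StrictController:
    """Fixed behaviour (sketch): only allowlisted ui_ attributes default to None."""

    # Hypothetical allowlist; the real one holds the ui_ flags set in __init__.
    _UI_ALLOWLIST = frozenset({"ui_known_flag"})

    def __getattr__(self, name):
        if name in self._UI_ALLOWLIST:
            return None
        raise AttributeError(name)


# hasattr() is True whenever attribute lookup does not raise AttributeError,
# so the leaky version claims to own attributes that were never set.
leaky_has = hasattr(LeakyController(), "ui_approve_modal_preview")
strict_has = hasattr(StrictController(), "ui_approve_modal_preview")
```

Here `leaky_has` is True and `strict_has` is False, which is the difference that lets the `if not hasattr(...)` initialization guard in the modal work again.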
conductor-tier2	999fdea467	docs(c11-interop): cross-reference SSDL digest in See Also The SSDL digest (docs/reports/computational_shapes_ssdl_digest_20260608.md, 504 lines, 30KB) is the theoretical foundation for the chunkification pattern. Per the digest's Technique 5 "Assume-away (Xar)" in §2.2 and the "Xar-style chunked arrays" recommendation in §5.2, the chunkification track is a direct application of the SSDL's "assume as much as possible" lens (§4). This commit adds the SSDL digest to the See Also of the v1+v2 C11-Python interop assessment (front-matter Cross-references line). The same cross-reference is also being added to: - conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md (in a new §6.1 "SSDL alignment" subsection) - conductor/tracks/manual_ux_validation_20260608_PLACEHOLDER/spec.md (in §5 Architectural Reference + §6 See Also + a new §2.6 "SSDL cross-reference" section that distinguishes GUI ASCII vocabulary from SSDL vocabulary) No code modified. Cross-reference only. Also: small update to conductor/tracks.md to add the 2 new tracks (manual_ux_validation_20260608_PLACEHOLDER as Active; chunkification_optimization_20260608_PLACEHOLDER as Backlog/Contingency).	2026-06-08 23:42:21 -04:00
conductor-tier2	5b3c11a0f3	conductor(track): manual_ux_validation_20260608_PLACEHOLDER - ASCII-sketch workflow + first-target redesign The user said (verbatim): "On number 1. I love the idea and definitely see poitental." This commit creates a full track that promotes the ASCII-sketch UX ideation workflow (docs/reports/ascii_sketch_ux_workflow_20260608.md, 340 lines) to a real track with a concrete first target. The track complements (does not replace) the existing manual_ux_validation_20260302 track (which is a general UX review track; this 2026-06-08 track is focused on the ASCII-sketch workflow specifically). Files (5 total, ~52KB, 12,000+ words): - spec.md (186 lines, 9 sections) - track design, 5 open questions, first target analysis, SSDL cross-reference - plan.md (~280 lines, 4 phases, 21 tasks) - TDD-style with WHERE/WHAT/HOW/SAFETY annotations - metadata.json (~120 lines) - structured metadata, 5 open questions with defaults, 5 SSDL principles available - state.toml (~95 lines) - per-task tracking + phase status - index.md (~50 lines) - track context + related docs Key design decisions captured: 1. Two distinct vocabularies are conflated at first glance: - GUI ASCII (the workflow) for panel sketches - SSDL (computational shapes digest) for internal code sketches Spec §2.6 makes the distinction explicit; both are useful for this track (GUI ASCII for Phase 2 design; SSDL for Phase 3 internal refactoring documentation). 2. The 5 open questions from the workflow report (Q1 vocabulary, Q2 comparison policy, Q3 storage location, Q4 tooling, Q5 frequency) are documented with sensible defaults in spec.md §2.1-2.5 and metadata.json. The user can override any of them; defaults pre-stage the work. 3. First target is src/gui_2.py:3770 render_discussion_entry (Discussion Hub per-entry panel). 
Rationale: - Most-edited surface (every AI/user message) - User has strong opinions (per nagent_review_20260608 3 rounds of corrections) - 23-op matrix A1-A7 is the source of truth - ImGui layout maps cleanly to ASCII - SSDL defusing techniques can guide the internal refactoring 4. 4 phases: 1=resolve 5 questions, 2=execute workflow on first target (1-3 ASCII rounds), 3=implement per design contract (TDD with 7 test files for A1-A7 operations), 4=document the pattern + propose 5-7 next targets. Cross-references added throughout: - docs/reports/computational_shapes_ssdl_digest_20260608.md (the SSDL digest, with explicit "this is a different vocabulary for a different purpose" note in spec §2.6) - docs/reports/ascii_sketch_ux_workflow_20260608.md (the workflow) - docs/guide_discussions.md (the 23-op matrix A1-A7) - conductor/tracks/nagent_review_20260608/ (the source of the user's editable-discussion corrections) - conductor/tracks/manual_ux_validation_20260302/ (complementary general UX review track) - conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/ (the contingency track; referenced in spec §2.6 SSDL cross-ref) No code modified. Track is active; Phase 1 (5 user-questions) is the current phase. User-confirmed worth doing in the prior turn.	2026-06-08 23:41:43 -04:00
conductor-tier2	816e9f2f5c	conductor(track): chunkification_optimization_20260608_PLACEHOLDER - 1-page contingency document The user's third correction this session changed the framing from "build a stateful C extension" to "wait for a hard constraint, then build a request/response blob pipeline." This commit creates a 1-page contingency document (no plan.md, no implementation) that captures: - The threshold: "only worth it under a hard constraint that no existing Python package can solve" - The shape when activated: subprocess-launch C11 binary with request/response blob wire format (NOT stateful CPython C extension) - The 2 cited candidates (markdown parsing into aggregate markdown, context snapshot processing) are NOT currently bottlenecks per src/aggregate.py:380-454 (pure-Python string concat, zero third-party markdown deps in pyproject.toml:6-27) and src/history.py:1-141 (bounded ~500KB at 100-snapshot capacity, debounced) - The SSDL digest's Technique 5 "Assume-away (Xar)" in §2.2 + "Xar-style chunked arrays" recommendation in §5.2 pre-support this track Files (4 total, 227+ lines of contingency document): - conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md - conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/metadata.json - conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/state.toml - conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/index.md Cross-references added: - docs/reports/computational_shapes_ssdl_digest_20260608.md (the SSDL digest is the theoretical foundation; explicitly cited in the spec's §6.1 "SSDL alignment" and in metadata.json external) - docs/reports/c11_python_interop_assessment_20260608.md (the v1+v2 assessment; explicitly cited in spec's §6 See Also) No code modified. Track does NOT appear in the active queue of conductor/tracks.md; appears in the Backlog / Contingency section as a reference, not a commitment. Activation criteria (per metadata.json): 1. 
Profiling shows a real bottleneck in a target code path 2. The bottleneck cannot be solved with existing Python packages 3. The user explicitly approves activation Without all 3, this track stays deferred. Default action is don't.	2026-06-08 23:40:27 -04:00
conductor-tier2	12311190b3	docs(interop-v2): part 3 revises the recommendation after user's threshold-shift + shape-change corrections The user pushed back on the v1 recommendation (commit `68354841`) twice in this turn. Both corrections reshape the answer. Correction 1 (already incorporated): duffle.h + pikuma ps1 are a C11 STYLE REFERENCE, not an interop pattern. (Captured in v1 §0.) Correction 2 (NEW, this commit): The C11 path is only worth it under a hard constraint that no existing Python package can solve. The shape is request-blob -> C11 pipeline -> response-blob, NOT a stateful C extension with a Python-facing API. Targets cited: parsing markdown files/sources into aggregate markdown, context snapshot processing, "possibly other things." This commit adds Part 3 (sections 3.1-3.12) to the existing doc. Part 1 (style) and Part 2 (general interop) stay as background. Section 4 is re-flagged as "SUPERSEDED - see Part 3". Part 3 covers: - The two moves the user's second correction made (threshold-shift on when, shape-change on what) - Grounded analysis of the 2 cited targets against actual code: * src/aggregate.py:380-454 (current markdown hot path is pure-Python string concat; pyproject.toml has zero third-party markdown deps) * src/history.py:1-141 (snapshot processing is bounded ~500KB at 100-snapshot capacity; pickle is the obvious cheap fix, not C11) - The request/response wire format design space (text vs binary vs hybrid envelope-text+payload-binary) - The pipeline API shape (single C entry point, subprocess-launch model) - Revised answer to the "chunkification" question (chunk-array becomes an internal C implementation detail, not a Python type) - Decision tree: profile first, try existing Python packages, only reach for C11 when hard constraint surfaces - The 4 questions to revisit when constraint surfaces - Revised insight: v2 (subprocess + wire format) is strictly more tractable than v1 (stateful C extension) - Track implications: 
chunkification_optimization becomes a 1-page contingency, not a full track; manual_ux_validation unaffected and confirmed - v2 verdict matrix (11 rows) replacing v1's 7 Cross-references the actual code paths I read this turn: - src/aggregate.py:380-454 (build_markdown_from_items) - src/summarize.py:1-219 (the 3 _summarise_* functions) - src/history.py:1-141 (UISnapshot, HistoryManager) - pyproject.toml:6-27 (no markdown deps) The user is right to push back. The v1 framing was over-engineered. "Build a stateful C extension" assumed a future need; the actual answer is "wait for a real bottleneck, then build a simple subprocess pipeline." The 843-line doc now captures both the v1 over-engineering AND the v2 contingency plan, so future sessions can see the iteration and learn from it.	2026-06-08 23:07:24 -04:00
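A rough sketch of the request-blob -> pipeline -> response-blob shape from 12311190b3, seen from the Python side. No C11 binary exists yet, so a Python child process stands in for it here; `STAND_IN_PIPELINE` and `run_pipeline` are hypothetical names, and the wire format (raw bytes in, raw bytes out) is the simplest assumption rather than the format the doc settles on.

```python
import subprocess
import sys

# Stand-in for the C11 binary (assumption): a child process that reads one
# request blob from stdin and writes one response blob to stdout.
STAND_IN_PIPELINE = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())",
]


def run_pipeline(request_blob: bytes, argv=None, timeout: float = 10.0) -> bytes:
    """Launch the pipeline once, hand it the request, return the response."""
    completed = subprocess.run(
        argv or STAND_IN_PIPELINE,
        input=request_blob,      # request blob goes in on stdin
        capture_output=True,     # response blob comes back on stdout
        timeout=timeout,
        check=True,              # a non-zero exit raises CalledProcessError
    )
    return completed.stdout
```

Because the child holds no state between calls, Python never has to manage C-side object lifetimes, which is the tractability argument the commit makes for v2 over a stateful extension.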
conductor-tier2	68354841cb	docs(interop-assessment): C11 <-> Python interop design space for chunkification_optimization The user asked a sharp, skeptical question: can a chunk-based C11 data structure actually interop with Python's runtime in a way that's useful for Manual Slop? They explicitly corrected my first-draft framing (the duffle.h + pikuma ps1 files are a C11 style reference, not an interop pattern). The assessment investigates honestly and reports tractable-vs-not. docs/reports/c11_python_interop_assessment_20260608.md (564 lines, 38KB): Part 1: C11 style reference summary - 11 style observations from reading duffle.h + main.c + pikuma ps1 duffle/ + hello_gte.c end-to-end - Byte-width typedef convention (U1/U2/U4/U8, S1/S2/S4/S8, B1-B8, F4/F8) - The macro meta-DSL (Struct_/Enum_/Array_/Slice_/Opt_/Ret_) - The I_/IA_/N_ inline discipline - The r/v pointer rule (restrict OR volatile, never both, never const) - Slice + Slice_T as the data-structure primitive - FArena as the allocation primitive (single-buffer, NOT chunked) - defer/defer_rewind/scope as the cleanup primitive - KTL (linear key-value table) as the "assume small N" pattern - What a chunk-array in duffle.h style would look like Part 2: Interop design space (the actual question) - 5 candidate interop layers: ctypes, cffi, pybind11, custom CPython C extension, NumPy wrap - Honest assessment matrix: build cost, per-op overhead, style fit, lego-set pattern support - Verdict: custom CPython C extension is most tractable; pybind11 is style-mismatched; ctypes/cffi work for non-hot-path - What "MVP chunked C11 package" requires (~500-1000 LOC total) - 5 questions to ask the user before this becomes a track - Crucial insight: the user's "unorthodox" interop is most likely duffle.h-style C11 + thin PyTypeObject glue at the bottom of the same .h file. Tractable, style-fit high. 
Cross-references the 5 sources: - docs/transcripts/i-h95QIGchY (Reece's Xar reference impl) - docs/ideation/ed_chunk_data_structures_20260523.md - docs/reports/session_synthesis_20260608.md (the original proposal) - src/app_controller.py:716 (the comms.log target) - The user's local forth_bootslop + pikuma ps1 repos (read in full) This is a follow-on to the synthesis's 2 proposed tracks (manual_ux_validation_20260608_PLACEHOLDER + chunkification_optimization_20260608_PLACEHOLDER). The user's question resolved the "skeptical of #2" concern by scoping the tractable path: CPython C extension in duffle.h style. The "lego-set of user-defined Python->C11 chunk ops" is NOT tractable without a Python->C11 AST emitter, which is a different (much larger) track.	2026-06-08 22:50:03 -04:00
conductor-tier2	77d7dff5ff	docs(session-synthesis): preserve-before-compact archive of the 2026-06-08 session The user explicitly requested the biggest in-depth report I can muster at 478,992 tokens (94% of context window). The next session will start with a fresh context; these two documents are the minimum-sufficient anchor. docs/reports/session_synthesis_20260608.md (579 lines, 40KB): - 12 sections covering every artifact this session produced - The 5 sources loaded: 2 YouTube transcripts + 2 Fleury articles + user's chunk-ideation archive - The 10 commits in the session's commit chain (with the user's test-fragility work adjacent but not mine) - The 4 audit-time heuristics derived from the 5-source lens - The "what the user should know" section for next session docs/reports/proposed_new_tracks_20260608.md (190 lines, 12KB): - 2 new tracks proposed (manual_ux_validation_20260608_PLACEHOLDER, chunkification_optimization_20260608_PLACEHOLDER) with spec-ready detail - 8 non-recommendations (so the user knows what I'm NOT suggesting) - A "what I'd recommend" section with one-tracks-when sequencing No code modified. Both are session-final artifacts, not tracks. They live in docs/reports/ alongside the other session outputs (SSDL digest, ASCII-sketch workflow, chunk ideation archive). Cross-references the 5 sources (all committed to docs/transcripts/ and docs/ideation/ in earlier user commits): - docs/transcripts/wo84LFzx5nI_big_oops_casemuratori.txt - docs/transcripts/i-h95QIGchY_assuming_as_much_as_possible_andrewreece.txt - docs/ideation/ed_chunk_data_structures_20260523.md - docs/reports/computational_shapes_ssdl_digest_20260608.md - docs/reports/ascii_sketch_ux_workflow_20260608.md These 5 documents are the session's "thinking-aid" corpus. The synthesis is the index; together they're the minimum-sufficient context to re-anchor any future session.	2026-06-08 22:25:00 -04:00
conductor-tier2	a9333bbb59	conductor(track-update): code_path_audit_20260607 - post-4-tracks timing + 5-source framing The user specified that the code_path_audit_20260607 track should run AFTER the 4 foundational tracks complete (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor). This commit formalizes that timing and grounds the audit's analytical framing in the 5 sources loaded into context on 2026-06-08. 3 surgical additions to the spec/plan, no task changes: 1. Post-4-tracks timing (new section in spec.md §"Timing", plus a "Timing" callout in plan.md's opening): - The 4 tracks will significantly reshape src/ai_client.py, src/mcp_client.py, src/app_controller.py, and src/type_aliases.py - Running the audit on pre-refactor code would produce a report that's stale on day 1 - The post-4-tracks timing ensures the audit grounds optimization decisions for the resulting architecture - Pre-flight check: verify all 4 tracks are [x] completed in conductor/tracks.md before starting this track 2. Analytical framing (new section in spec.md §"Analytical Framing (5-source lens)"): - Maps each of the 5 sources (Fleury taxonomy + Fleury combinatoric + Muratori Big OOPs + Reece Assuming + user's chunk ideation) to specific audit-time heuristics - 4 concrete heuristics: effective-codepath count, entity-hierarchy fingerprint, assumed-too-much detector, chunkification candidates - The heuristics shape REPORT INTERPRETATION, not the static cost model (which stays data-grounded in EXPENSIVE_THRESHOLD + per-class weights) 3. 
See Also cross-references in spec.md (6 new entries): - nagent_review Pitfalls #2 and #4 (provider history globals + stateful singleton) - wo84LFzx5nI Big OOPs transcript (full text, 4310 segments, 200KB; loaded 2026-06-08) - i-h95QIGchY Assuming transcript (full text, 3719 segments, 162KB; loaded 2026-06-08) - ed_chunk_data_structures_20260523.md (5-image archive of user's chunk ideation, 19KB; saved 2026-06-08) - computational_shapes_ssdl_digest_20260608.md (the SSDL digest that synthesizes the 4-source computational-shapes thinking; the audit's tree/mermaid outputs ARE computational-shape visualizations) 4. tracks.md entry updated to include the spec/plan links and a brief status note that the audit is post-4-tracks. 5. plan.md has a "Timing" callout at the top stating the 4 tracks must ship before the plan executes. No code modified. The audit's tasks (Phases 1-6) are unchanged in structure; the new sections only add analytical context and timing constraints.	2026-06-08 22:05:54 -04:00
ed	2eef50c5c2	transcripts	2026-06-08 21:49:35 -04:00
ed	d7b66a5dda	ideating chunk-based data structures	2026-06-08 21:45:30 -04:00
ed	0be9b4f0fb	digest on computational shapes ssdl	2026-06-08 21:23:11 -04:00
ed	51ecace464	test(live_workflow): pre-flight health check fails fast on dirty state PR3 of the test_full_live_workflow_imgui_assert fix sequence. When a prior live_gui test in the same session crashes the GUI (e.g. via an ImGui IM_ASSERT from cumulative panel state), the controller's _io_pool gets shut down. The next test starts in a degraded state but only discovers this 120s later when its project switch times out with a confusing 'cannot schedule new futures after shutdown' error. This commit adds a /api/gui_health pre-flight check at the start of test_full_live_workflow. If the GUI is degraded, the test fails fast (within 1s) with a clear, actionable message that includes: - The exact RuntimeError that caused the degradation - The full traceback of the last ImGui scope mismatch - A note that the new test cannot proceed with a dirty state Per user feedback 2026-06-08: 'I don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails. Just has to note that the new file didn't get to deal with a dirty state.' Also includes the planning documents written earlier in this session: - TODO_test_full_live_workflow_v2.md (task list) - test_full_live_workflow_imgui_assert_20260608.md (root cause report) - test_full_live_workflow_propagation_digest_20260608.md (solutions digest) - batch_resilience_plan_20260608.md (batch resilience plan) Verification: - test_full_live_workflow in isolation: 13.45s PASS (health=True, no degrade) - 4 sims + test_full_live_workflow in batch: 76.46s (1 FAIL fast, 4 sims PASS) - Without PR3 fix: 200s FAIL with confusing 120s timeout - With PR3 fix: 76s FAIL with clear 'GUI is degraded' message - The fast-fail is observable, not silent (per user's 'wrap might be worth it if that properly lets us handle the assert')	2026-06-08 21:17:54 -04:00
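The pre-flight check from 51ecace464 amounts to the following. `require_healthy_gui` is a hypothetical helper and the exact message layout is an assumption; the payload keys follow the /api/gui_health fields named in 1c565da7a0.

```python
def require_healthy_gui(health: dict) -> None:
    """Fail fast with an actionable message if the GUI is already degraded."""
    if health.get("healthy"):
        return
    raise RuntimeError(
        "GUI is degraded before test starts: "
        f"reason={health.get('degraded_reason')!r}; "
        f"last_assert={health.get('last_assert')!r}"
    )
```

A test would call this with the health payload before its first click, so a dirty session fails within about a second instead of timing out after 120s.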
conductor-tier2	8a597d1832	conductor(track-update): mcp_architecture_refactor - list_tool_schemas + security-as-contract 4 surgical additions to the spec, no task changes: 1. list_tool_schemas on the SubMCP Protocol: Added the method to §3.1 (The SubMCP Protocol). Per nagent_review Pitfall #6 (hard-coded tool discovery) and takeaway #5 (self-describing tools), each sub-MCP advertises its own capabilities via list_tool_schemas() rather than relying on a central registry. This is the equivalent of nagent's collect_bin_tool_descriptions per sub-MCP. The MCPController.get_tool_schemas() becomes a simple aggregator. 2. Security model is the contract: Added a new Important note to §3.3 (The 3-Layer Security Model). The 3 layers (Allowlist Construction -> Path Validation -> Resolution Gate, per docs/guide_mcp_client.md) are not just refactored - they are the CONTRACT between MCPController and the sub-MCPs. Sub-MCPs receive a pre-validated Path and trust it. They do NOT re-validate. The refactor is structural, not security-changing. 3. Docs touchpoint in Phase 7: Added the docs touchpoint to Phase 7 per the docs Refresh Protocol. The update to docs/guide_mcp_client.md should add a Sub-MCP Architecture section, link the list_tool_schemas pattern to 3-Layer Security Model, and cross-link the 3 new guides from the 2026-06-08 docs refresh. 4. See Also cross-references: Added 8 new entries to §12.2: - docs/guide_context_aggregation.md (FileItem consumer) - docs/guide_state_lifecycle.md (App state delegation) - docs/guide_discussions.md (23-operation matrix) - conductor/tracks/qwen_llama_grok_integration_20260606/ (Result return type coordination) - conductor/tracks/nagent_review_20260608/{report,takeaways}.md - (2 specific data_oriented_error_handling and data_structure_strengthening cross-refs) No plan.md changes.	2026-06-08 20:59:27 -04:00
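As a sketch of the list_tool_schemas addition in 8a597d1832: each sub-MCP describes its own tools and the controller only concatenates. The `SubMCP` protocol body, `FileSubMCP`, `SearchSubMCP`, and the schema dict layout are assumptions for illustration; only the method name and the aggregator idea come from the commit.

```python
from typing import Any, Protocol


class SubMCP(Protocol):
    """Assumed protocol shape: every sub-MCP advertises its own tools."""

    def list_tool_schemas(self) -> list[dict[str, Any]]: ...


class FileSubMCP:
    """Hypothetical sub-MCP exposing one tool."""

    def list_tool_schemas(self) -> list[dict[str, Any]]:
        return [{"name": "read_file", "parameters": {"path": "string"}}]


class SearchSubMCP:
    """Hypothetical sub-MCP exposing one tool."""

    def list_tool_schemas(self) -> list[dict[str, Any]]:
        return [{"name": "search", "parameters": {"query": "string"}}]


def get_tool_schemas(sub_mcps: list[SubMCP]) -> list[dict[str, Any]]:
    """Aggregate schemas without a central, hard-coded registry."""
    schemas: list[dict[str, Any]] = []
    for sub in sub_mcps:
        schemas.extend(sub.list_tool_schemas())
    return schemas
```

Adding a new sub-MCP then requires no change to the aggregator, which is what replaces hard-coded tool discovery.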
conductor-tier2	1fb0d79c0d	conductor(track-update): data_structure_strengthening - HistoryMessage vs ProviderHistoryMessage split 4 surgical additions to the spec, no task changes: 1. ProviderHistoryMessage: Added a new alias to §3.1 (The Aliases). Per nagent_review Pitfall #4 (provider history divergence), the UI/curation layer (HistoryMessage, edited via disc_entries[i].content) and the SDK layer (ProviderHistoryMessage, the bytes actually replayed to the LLM) are distinct. Conflating them via a single alias perpetuates the bug. The new alias is documented as a separate concept with its own use sites (_anthropic_history, _deepseek_history, _minimax_history, _grok_history, _llama_history). The follow-up public_api_migration_20260606 track is the natural moment to unify the two layers; this spec just makes the distinction explicit. 2. FileItem alias points to the existing models.FileItem dataclass, not Metadata. Per docs/guide_context_aggregation.md (added 2026-06-08), FileItem is a 9-field dataclass (path, auto_aggregate, force_full, view_mode, selected, ast_signatures, ast_definitions, ast_mask, custom_slices, injected_at) with a __post_init__ normalizer. Aliasing it to dict[str, Any] would lose the type safety. The 9 other aliases remain dict aliases for round-trip compatibility. 3. gui_2.py and mcp_client.py as follow-up: Added a Note (dated 2026-06-08) to the Out of Scope section. The 23 lower-impact files (deferred) are dominated by gui_2.py (26+ weak sites per guide_state_lifecycle.md) and mcp_client.py (will be touched heavily by the parallel mcp_architecture_refactor_20260606). The deferral is correct but the follow-up should explicitly call out these two files as the next targets, rather than implying they're handled. 4. 
See Also cross-references: Added 7 new entries to §12.2: - docs/guide_models.md (FileItem dataclass source) - docs/guide_context_aggregation.md (FileItems consumer) - docs/guide_discussions.md (HistoryMessage shape) - docs/guide_state_lifecycle.md (state delegation) - conductor/tracks/mcp_architecture_refactor_20260606/ - conductor/tracks/nagent_review_20260608/{report,takeaways}.md No plan.md changes.	2026-06-08 20:50:50 -04:00
ed	1c565da7a0	feat(gui): wrap immapp.run in try/except + add /api/gui_health endpoint PR2 of the test_full_live_workflow_imgui_assert fix sequence. When an ImGui scope mismatch (IM_ASSERT(Missing End())) fires in immapp.run (e.g. after cumulative state corruption from prior sims' panel renders), the RuntimeError propagates out of app.run(). The controller's _io_pool gets shut down via __del__/finalization. The hook server (separate ThreadingHTTPServer) survives. Subsequent test clicks fail with 'cannot schedule new futures after shutdown' and the test times out after 120s with no clear signal of what went wrong. This commit: 1. Wraps immapp.run in try/except RuntimeError in gui_2.py:618. On assertion: logs the error to stderr (NOT silent), records it on controller._gui_degraded_reason and _last_imgui_assert, and returns from run() so the hook server keeps serving. 2. Adds _gui_degraded_reason and _last_imgui_assert to AppController.__init__ (initialized to None). 3. Adds /api/gui_health endpoint in api_hooks.py:148. Returns {healthy, degraded_reason, last_assert, io_pool_alive}. 4. Adds ApiHookClient.get_gui_health() with the matching unit tests (3 mocked tests + 1 live test). Per user feedback 2026-06-08: - The wrap does NOT silently swallow the error. It logs at ERROR level and surfaces it via the health endpoint. - Tests can call client.get_gui_health() to detect a degraded GUI and fail fast with a clear message. TDD: tests written first, confirmed to fail, then fix applied. 34/34 unit tests pass. 1/1 live test passes (live_gui health endpoint reports healthy=True on fresh subprocess).	2026-06-08 20:46:41 -04:00
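The guard and health report from 1c565da7a0 can be sketched without ImGui. `ControllerStub`, `run_guarded`, and `gui_health` are hypothetical names; `run_loop` stands in for immapp.run, and the rule that "healthy" means no recorded failure plus a live I/O pool is an assumption.

```python
import sys
import traceback


class ControllerStub:
    """Stand-in for AppController (assumption): just the two new fields."""

    def __init__(self):
        self._gui_degraded_reason = None
        self._last_imgui_assert = None


def run_guarded(controller, run_loop) -> None:
    """Run the render loop; on RuntimeError, log loudly and record the failure."""
    try:
        run_loop()
    except RuntimeError as exc:
        # Not silent: the error goes to stderr and is kept for the health report.
        print(f"[gui] render loop failed: {exc}", file=sys.stderr)
        controller._gui_degraded_reason = str(exc)
        controller._last_imgui_assert = traceback.format_exc()


def gui_health(controller, io_pool_alive: bool) -> dict:
    """Build the payload shape the commit lists for /api/gui_health."""
    return {
        "healthy": controller._gui_degraded_reason is None and io_pool_alive,
        "degraded_reason": controller._gui_degraded_reason,
        "last_assert": controller._last_imgui_assert,
        "io_pool_alive": io_pool_alive,
    }
```

Returning from the guard instead of re-raising is what keeps the hook server able to answer the health request after the GUI has failed.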
conductor-tier2	0471440c68	conductor(track-update): data_oriented_error_handling - nagent_review + docs refresh 3 surgical additions to the spec, no task changes: 1. New ErrorKind: Added PROVIDER_HISTORY_DIVERGED_FROM_UI to the ErrorKind enum. Per nagent_review Pitfall #4 (provider history divergence: user edits disc_entries[i].content via the discussion UI but ai_client._<provider>_history still replays the original). The new kind makes the divergence detectable and reportable so the follow-up public_api_migration_20260606 track can collapse the two history layers. The Result pattern from this track is the natural carrier for the signal. 2. State-delegation regression tests: Added mandatory regression tests to the testing strategy in §6 for the ai_client refactor (highest-risk phase). The new tests exercise: - app.temperature = 0.5 round-trips through App.__getattr__/__setattr__ delegation (per gui_2.py:666-675) - controller.disc_entries[i].content is reflected in the next send_result()'s messages parameter - The 3 per-provider history locks serialize correctly under concurrent send_result() calls The reason this is mandatory: per guide_state_lifecycle.md (added 2026-06-08), the App.__getattr__/__setattr__ pattern means a partial refactor manifests as silent AttributeError deep in test code, not at the refactor commit boundary. 3. See Also cross-references: Added 6 new entries to §12.3: - docs/guide_ai_client.md (per-provider history globals) - docs/guide_mcp_client.md (3-layer security model) - docs/guide_state_lifecycle.md (3 per-thread + 7-lock pattern) - docs/guide_discussions.md (23-operation matrix) - docs/guide_context_aggregation.md (build_discussion_section) - conductor/tracks/mcp_architecture_refactor_20260606/ - conductor/tracks/nagent_review_20260608/{report,takeaways}.md No plan.md changes. Plan tasks are task-level and will flow from the spec changes when the track is re-planned.	2026-06-08 20:41:00 -04:00
conductor-tier2	77ae2ec7a8	conductor(track-update): qwen_llama_grok - spec notes for nagent_review + docs refresh 4 surgical additions to the spec, no task changes: 1. Result return type: Added a coordination note in §3.1 (Data-Oriented Design) explaining that the shared send_openai_compatible helper should return Result[NormalizedResponse, ErrorInfo] from day 1, not NormalizedResponse + ProviderError raise. This is so the downstream data_oriented_error_handling_20260606 track is a small mechanical pass over new code, not a second migration. References nagent_review Pitfall #4 (provider history divergence) and the ErrorKind.PROVIDER_HISTORY_DIVERGED_FROM_UI use case. 2. Declarative read, not behavioral dispatch: Added clarification to §6 (UX Adaptation) that the capability matrix is a read of declarative data, not a new dispatch layer. Per nagent_review Pitfall #1 (opaque function calling in the Application is the correct choice; nagent-style protocol is for Meta-Tooling), UI elements are visible/enabled/disabled/hidden but the behavior they invoke is unchanged. Three concrete examples added: screenshot button, cost panel, cache panel. 3. PROVIDERS source of truth: Added a NOTE in §3.2 (Module Layout) that src/models.py:79-86 PROVIDERS is the existing single source of truth for the (vendor, model) enumeration. The capability registry reads from this constant rather than introducing a parallel list. Cross-references docs/guide_models.md. 4. Docs touchpoint: Expanded Phase 6 (Docs + Archive) in §9 to note that docs/guide_ai_client.md needs the new providers + the shared helper documented, and that docs/guide_context_aggregation.md (added 2026-06-08) is the reference for the aggregate.py pipeline that all new providers use. 5. 
See Also cross-references: Added 3 new entries to §13.2: - docs/guide_context_aggregation.md (the new pipeline guide) - conductor/tracks/nagent_review_20260608/report.md (§1, §5, §15) - conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md (§1, §2, §9) No plan.md changes. Plan tasks are task-level and will flow from the spec changes when the track is re-planned.	2026-06-08 20:35:52 -04:00
ed	d7a065e9d5	ascii gui comms workflow ideation	2026-06-08 20:32:42 -04:00
conductor-tier2	161ebb0da6	docs(fix): correct nav link case + relative-path level Gitea (and any case-sensitive filesystem) was rendering the [Top] nav links in /docs as broken because of two bugs: 1. Case-sensitivity: 22 links used '../README.md' (all-uppercase) but the actual file is 'docs/Readme.md' (capital R, lowercase rest). 21 guide_*.md nav bars were affected, plus 1 internal cross-link in Readme.md itself. Works on Windows (case-insensitive) but broken on Linux/Gitea. Fix: 22 occurrences across 22 files changed '../README.md' -> '../Readme.md' 2. Wrong relative-path level: 16 links used '../../conductor/...' from 'docs/guide_*.md' to reach 'conductor/'. This goes up 2 levels to 'projects/', which doesn't exist. The correct path from 'docs/guide_*.md' to 'conductor/' is 1 level up ('../conductor/...'). 12 unique patterns across 10 files affected. Fix: 16 occurrences across 10 files changed '../../conductor/' -> '../conductor/' 3. Bonus: 1 planned-guide link in guide_context_curation.md referenced a never-written 'guide_context_presets.md'. The ContextPreset schema is now fully covered in the new 'guide_context_aggregation.md' (per the 2026-06-08 docs refresh). Fix: link target updated. No content was changed, only link paths. 24 files, 37 link replacements, 37 deletions. Verification: - All .md links in docs/ now resolve to existing files (validated by path-resolution check from each file's directory) - The 3 new guides from the previous docs refresh commit (guide_discussions.md, guide_state_lifecycle.md, guide_context_aggregation.md) had the case bug inherited from guide_architecture.md's existing nav pattern; their top-of-file nav bars are now correct - The 21 pre-existing guide nav bars that had the same bug (all 21 of them, except the 3 that used the correct case: guide_mma.md, guide_simulations.md, guide_tools.md) are now also fixed - Inter-guide links (e.g. 
[Discussions](guide_discussions.md)) were not affected; they were always correct because both the link text and the actual filename are lowercase This is a docs-only fix. No code modified.	2026-06-08 19:51:55 -04:00
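A path-resolution check of the kind mentioned in the verification notes above could look roughly like this. It is a sketch under assumptions, not the script used for the commit: `exists_exact_case` and `broken_links` are hypothetical helpers, and the link regex only covers plain inline `[text](target)` links. The exact-case comparison is there because `Path.exists()` follows the host filesystem's rules and would miss the '../README.md' bug on a case-insensitive system.

```python
import os
import re
from pathlib import Path
from typing import List, Tuple

# Inline Markdown links: [text](target). Reference-style links are not covered.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def exists_exact_case(path: Path) -> bool:
    """True only if every path component matches a directory entry exactly."""
    path = Path(os.path.abspath(path))  # also collapses '..' segments
    current = Path(path.anchor)
    for part in path.parts[1:]:
        try:
            names = os.listdir(current)
        except OSError:
            return False
        if part not in names:
            return False
        current = current / part
    return True


def broken_links(docs_dir: Path) -> List[Tuple[Path, str]]:
    """Resolve each relative link from its own file's directory."""
    broken = []
    for md in sorted(docs_dir.rglob("*.md")):
        for target in LINK_RE.findall(md.read_text(encoding="utf-8")):
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue  # external URL or same-page anchor
            rel = target.split("#", 1)[0]
            if not exists_exact_case(md.parent / rel):
                broken.append((md, target))
    return broken
```

Running `broken_links(Path("docs"))` from the repository root would list each offending file together with the link target that failed to resolve.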
conductor-tier2	ba05168493	docs(refresh): 3 new guides + cross-links from nagent_review Per the docs Refresh Protocol (conductor/workflow.md), after a reference/analysis track ships, the affected guides must be updated to reflect new module structure or new conventions. The nagent_review track (`9cc51ca9`) produced a deep-dive + 10 actionable takeaways that named 3 documentation gaps in /docs. This commit fills them. 3 new guides (1,122 lines total): 1. guide_discussions.md (353 lines) — The Discussion system - 23-operation matrix: A1-A7 per-entry + B1-B11 discussion-level + C1-C5 undo/redo - Take naming convention (<base>_take_<n>), branching, promotion - User-managed role list (app.disc_roles) - Per-role filter linked to MMA persona focus - _disc_entries_lock thread-safety contract - Hook API session endpoints - Persistence: _flush_to_project, _flush_disc_entries_to_project, context_snapshot - 9 file:line refs into gui_2.py:3770-4260 + history.py 2. guide_state_lifecycle.md (375 lines) — Undo/redo + reset + state delegation - HistoryManager + UISnapshot (13 captured fields, 100-snapshot capacity, debounced change-detection at render frame) - _handle_reset_session (clears 30+ fields, replaces project, preserves active_project_path per the 2026-06-08 regression fix) - App.__getattr__/__setattr__ state delegation to Controller - 4-thread access pattern with 7 lock-protected regions - State persistence: in-memory vs project TOML vs config TOML - Hot-reload integration - Hook API registries (_predefined_callbacks, _gettable_fields) - 14 file:line refs into gui_2.py:1140-1170, history.py, app_controller.py:3286-3356 3. 
guide_context_aggregation.md (394 lines) — The aggregate.py pipeline - 3 aggregation strategies (auto, summarize, full) - 7 per-file view modes (full, summary, skeleton, outline, masked, custom, none) - Full FileItem schema (9 fields + __post_init__ normalizer) at models.py:510-559 - ContextPreset schema and ContextPresetManager - Tier 3 worker variant (build_tier3_context with FuzzyAnchor re-resolution and focus-file handling) - force_full / auto_aggregate short-circuits - Cache strategy (static prefix + dynamic history) - 23 file:line refs into aggregate.py:36-518 + models.py:909-937 8 existing guides cross-linked to the 3 new guides and to the nagent_review track: - guide_gui_2.md (+ See Also entries for discussions, state lifecycle, context aggregation, nagent_review report) - guide_app_controller.md (+ See Also entries for discussions, state lifecycle, context aggregation, nagent_review report) - guide_context_curation.md (+ new See Also section pointing to context aggregation + nagent_review) - guide_architecture.md (+ new See Also section listing all 10 guides + nagent_review report) - guide_ai_client.md (+ See Also entries for state lifecycle, context aggregation, nagent_review pitfalls #2 and #4) - guide_mma.md (+ new See Also section pointing to context aggregation, discussions, nagent_review report §9 + takeaways §3/§10 for SubConversationRunner priority) - guide_models.md (+ See Also entries for context aggregation, discussions, nagent_review report §6 on FileItem as strongest curation dimension) - Readme.md (+ 3 new guide entries in the index table, with one-line summaries) No code modified. This is documentation only. Why these 3 guides specifically: - guide_discussions.md: The discussion system is the user's most edited surface. nagent_review's report §3 enumerated 23 operations (A1-C5) that previously existed only as scattered file:line refs across gui_2.py. A dedicated guide makes the operation matrix discoverable. 
- guide_state_lifecycle.md: The undo/redo + reset + state delegation machinery is architecturally load-bearing but scattered across 4 files. After nagent_review identified the provider-side history divergence as Pitfall #4, the relationship between Manual Slop's state and the provider's state needs explicit documentation. - guide_context_aggregation.md: aggregate.py (518 lines) is the most-touched module after ai_client.py but had no dedicated guide. nagent_review confirmed it's Manual Slop's strongest curation dimension. A dedicated guide makes the 7 view modes and 3 strategies discoverable. The 3 new guides total 1,122 lines and follow the existing per-source-file deep-dive style (architectural, data-oriented, state-management-focused).	2026-06-08 19:26:08 -04:00
conductor-tier2	9cc51ca9af	conductor(track): nagent review - deep-dive + 6 pitfalls + 10 actionable takeaways Reference/analysis track. Produces 0 code changes. Artifacts (conductor/tracks/nagent_review_20260608/): - spec.md (240 lines) - track wrapper with Application/Meta-Tooling framing - report.md (571 lines) - 14-section deep-dive; primary deliverable - comparison_table.md (79 lines) - flat side-by-side reference - decisions.md (286 lines) - 10 future-track candidates with priority matrix - nagent_takeaways_20260608.md (363 lines) - 10 actionable patterns grounded in code (file:line refs into nagent source and Manual Slop source) - metadata.json (132 lines) - structured metadata + verification criteria - state.toml (113 lines) - per-task tracking + user-corrections log (7 entries) 14 nagent principles covered in report.md (durable work, text-in/text-out, editable state, visible protocol, the loop, per-file memory, repo history, neighborhoods, sub-conversations, controlled writes, large files, tool discovery, framework differences, build your own). 6 pitfalls (revised from 8 after user-corrections): 1. No structured output protocol in Application AI (opaque function calling) 2. Provider-specific history in process globals (ai_client._anthropic_history + _deepseek_history + _minimax_history) 3. RAG is not 'history as data' (fuzzy, not auditable) 4. AI client is a stateful singleton (2,685-line ai_client.py) 5. No non-MMA disposable sub-conversations (1:1 gap; user-flagged want) 6. 
Hard-coded tool discovery (45-tool if/elif in mcp_client.py) User-corrections applied (3 rounds, 7 total corrections recorded): - Editable discussions: PARTIAL -> PARITY (DIFFERENT FOCUS) with full A1-A7 per-entry + B1-B11 discussion-level + C1-C5 undo/redo operation matrix - Per-file memory: DOMAIN MISMATCH -> MANUAL SLOP IS STRONGER IN CURATION DIMENSION (FileItem + ContextPreset vs nagent's inode-keyed conversation log; complementary, not equivalent) - Sub-conversations: MMA has it; 1:1 does not -> 'PARITY for MMA; GAP for 1:1 discussions' (user wants this) - RAG: opt-in, not gap; user wants pre-staging via sub-conversation - Personas: config bundling (can opt out via AI settings) - Tool discovery: deferred (user has 'intent based DSL' idea but 'no where near that ideation yet') 10 actionable takeaways (separate from the 6 pitfalls - those are diagnosis, these are prescription): 1. State visibility (UI inspector for in-process state) 2. Readable conversation log (text-greppable, not just JSON-L) 3. Sub-agents for 1:1 (HIGH priority - user-flagged) 4. File-identity over file-path (st_dev:st_ino rename-safe) 5. One loop shape visible in diagnostics 6. Visible retry on protocol failure 7. Meta-Tooling DSL (intent-based, deferred) 8. Self-describing tools (subsumed by mcp_architecture_refactor_20260606) 9. Single source of truth for disc_entries + provider history 10. Sub-agent return type constraint (bake into candidate #1 spec) Domain classification: every recommendation tagged Application / Meta-Tooling / Both per docs/guide_meta_boundary.md. nagent lives in the Meta-Tooling domain; Manual Slop's Application AI is a different kind of thing. No code modified by this track (reference/analysis only). All 7 files parse cleanly (JSON, TOML, Markdown). All internal cross-links resolve. Track is 'active' awaiting human review; future-track candidates live in decisions.md and nagent_takeaways_20260608.md.	2026-06-08 18:44:35 -04:00
ed	c9a991bbb8	test(live_workflow): bump project switch wait timeout 30s -> 120s The 30s wait_for_project_switch timeout was an excessive constraint. In batch context, prior sims' AI discussion turn workers saturate the 8-worker io_pool, queueing this switch for tens of seconds. The other defensive waits in the test (warmup 60s, prior switch 60s) already use 60s+, so 30s was the inconsistent outlier. User confirmed: 'I think not completing in 30s is an excessive constraint if thats whats going on.' Verification: - test_full_live_workflow isolation: 11.69s PASS - 7-test batch (test_full_live_workflow + 4 extended sims + 2 markdown): 85.83s PASS	2026-06-08 18:14:18 -04:00
ed	87d7c5bff2	test(io_pool): update assertion for 8-worker pool size	2026-06-08 17:51:39 -04:00
ed	4a33848620	fix(io_pool): increase worker count from 4 to 8 to prevent test hangs Root cause: test_full_live_workflow in batch context (with prior sims running AI discussion turns) would queue its _do_project_switch behind the auto-pruner's scan of tests/logs/ (154MB, 6519 files). The 4-worker pool was saturated, so the switch would never run within 30s. Fix: bump IO_POOL_MAX_WORKERS from 4 to 8. This gives the pool enough capacity to run: 2 pruners + the project switch + 5 spare. Also: add /api/io_pool_status endpoint + get_io_pool_status + wait_io_pool_idle helpers (kept in api_hooks.py and api_hook_client.py for the test_api_hook_client_io_pool.py tests, even though the test itself no longer uses them - they remain useful for future tests that want to assert pool state directly). Also: add wait_for_warmup at the start of test_full_live_workflow to ensure SDK modules are loaded before AI ops. Test verification: - test_full_live_workflow in isolation: 11.83s PASS - test_full_live_workflow in batch (with 4 prior sims): 83.46s PASS - 30/30 related unit tests PASS	2026-06-08 17:49:34 -04:00
ed	9afc93bce2	fix(app_controller): clear project-switch state in _handle_reset_session When a prior test in the tier-3-live_gui batch leaves a _do_project_switch background thread running, the next test's btn_project_new_automated click sees _project_switch_in_progress=True (from the prior thread) and queues the new path via _project_switch_pending_path. The queued switch is never actually submitted to the io_pool, so is_project_stale() stays True and AI ops (_handle_generate_send) bail with 'project switch in progress; AI ops disabled'. Fix: _handle_reset_session now also clears _project_switch_in_progress, _project_switch_pending_path, and _project_switch_error (under the existing _project_switch_lock). This way, even if the prior background thread is still running, the controller reports an idle state and the new switch can be submitted normally. Also: - src/api_hook_client.py: reverted wait_for_project_switch to require in_progress=False (was relaxed to return on queued path, which misled the caller into thinking the switch was done) - tests/test_handle_reset_session_clears_project.py: new test test_handle_reset_session_clears_project_switch_state asserts is_project_stale() returns False after reset - tests/test_api_hook_client_wait_for_project_switch.py: updated test_wait_for_project_switch_does_not_return_on_queued (in_progress + matching path should keep waiting, not return early) - tests/test_live_workflow.py: added pre-wait for any in-flight switch before doing btn_reset (so the test waits up to 60s for the prior switch to complete if needed) - conductor/todos/TODO_test_full_live_workflow.md: updated Task 4 with the deeper hang analysis and recommended fix Known follow-up: test_full_live_workflow still hangs in tier-3 batch even with this fix, because the new _do_project_switch itself is hung in the io_pool (likely saturation from prior sims' AI discussion turn workers). Deeper investigation required.	2026-06-08 15:19:30 -04:00
ed	5087ee988d	chore: move TODO_test_full_live_workflow.md to conductor/todos/ Following the conductor convention of organizing track-related artifacts under conductor/. The TODO tracks the test_full_live_workflow race condition fix and its follow-up items (Tasks 3, 7 still pending; known batch hang documented). Tasks 1, 2 (with regression fix), 4, 5, 6 are SHIPPED in prior commits.	2026-06-08 14:05:40 -04:00
ed	3391e18f64	chore(pyproject): register pytest.mark.live marker Silences the PytestUnknownMarkWarning emitted by test_visual_mma.py and test_visual_sim_gui_ux.py (3 instances). The @pytest.mark.live mark already exists in the test files; pyproject.toml just didn't know about it. - pyproject.toml: added 'live: marks tests as live visualization tests (not in CI by default)' to [tool.pytest.ini_options].markers	2026-06-08 13:59:18 -04:00
ed	d09f70ea44	docs(todo): mark Tasks 4+5 as SHIPPED; note known batch hang issue	2026-06-08 13:37:13 -04:00
ed	b6972c31de	test(live_workflow): use wait_for_project_switch + defensive file check Replaces the 10x1s blind poll of derived state with a condition-based wait on /api/project_switch_status. Also adds a defensive file existence check that fails fast (within 5s) if the click was dropped or the project creation handler crashed. The new wait surfaces a clear error message ('Project switch did not complete in 30s. Last status: ...') instead of the generic 'Project failed to activate', and exposes _project_switch_error if the controller reported one. - tests/test_live_workflow.py: replaced poll loop (lines 57-65) with wait_for_project_switch + os.path.exists defensive check	2026-06-08 13:26:54 -04:00
ed	a6605d9889	feat(api_hook_client): add wait_for_project_switch for deterministic test waits Adds a polling helper that blocks until the project switch completes, errors out, or times out. Replaces the fragile 10x1s blind poll in test_full_live_workflow with a condition-based wait on the /api/project_switch_status endpoint. Features: - Polls /api/project_switch_status every 200ms (configurable) - Returns immediately on error (with the error in the result) - Path matching: exact match OR basename match (handles absolute vs relative) - Times out with a clear 'timeout' flag instead of a generic assertion - Optional expected_path: if None, returns on any in_progress=False - src/api_hook_client.py: new wait_for_project_switch method (37 lines) - tests/test_api_hook_client_wait_for_project_switch.py: 6 unit tests with mocked _make_request covering all paths	2026-06-08 13:04:12 -04:00
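The condition-based wait described in the commit above can be sketched as below. This is an illustration only, not the 37-line method in src/api_hook_client.py: here the helper takes a hypothetical `get_status` callable instead of calling the endpoint itself, and the status keys (`in_progress`, `pending_path`, `active_path`, `error`) are the ones listed for /api/project_switch_status elsewhere in this log. The 30s default timeout is an assumption.

```python
import os
import time
from typing import Callable, Optional


def _paths_match(active: str, expected: str) -> bool:
    """Exact match, or basename match to tolerate absolute vs relative paths."""
    return active == expected or os.path.basename(active) == os.path.basename(expected)


def wait_for_project_switch(
    get_status: Callable[[], dict],
    expected_path: Optional[str] = None,
    timeout: float = 30.0,
    poll_interval: float = 0.2,
) -> dict:
    """Poll until the switch completes, reports an error, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("error"):
            # Return immediately, carrying the error in the result.
            return {**status, "timeout": False}
        if not status.get("in_progress"):
            active = status.get("active_path") or ""
            # With no expected_path, any idle state counts as done.
            if expected_path is None or _paths_match(active, expected_path):
                return {**status, "timeout": False}
        if time.monotonic() >= deadline:
            # Explicit flag instead of a generic assertion failure.
            return {**status, "timeout": True}
        time.sleep(poll_interval)
```

Returning the last observed status alongside the `timeout` flag lets the caller put it in the failure message, which is what makes the wait easier to debug than a blind fixed-count poll.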
ed	54e46ee815	docs(todo): note regression discovered and fixed in test_context_sim_live	2026-06-08 12:35:24 -04:00
ed	4548726a2b	conductor(tracks): restructure - chronological by phase + status groupings + active queue table	2026-06-08 12:26:56 -04:00
ed	e0a3eb8c05	fix(app_controller): regression in test_context_sim_live from clearing active_project_path Task 2 (_handle_reset_session reset) introduced a regression: setting self.active_project_path to empty caused an infinite re-switch loop in _do_project_switch because _flush_to_project writes to active_project_path (raises OSError on empty path), and the finally block re-submitted the failed switch on every iteration. Result: test_context_sim_live saw switching-to status for 5+ seconds and MD-only generation was blocked. Fix: keep self.active_project_path as-is in _handle_reset_session. Only reset self.project (to a fresh default_project dict) and self.project_paths (to empty list). The stale project state issue is solved by replacing the project dict; the active_project_path stays valid for _flush_to_project. - src/app_controller.py: refined _handle_reset_session project reset - tests/test_handle_reset_session_clears_project.py: updated contract test to assert active_project_path is preserved	2026-06-08 12:24:10 -04:00
ed	40d61bf3d8	docs(todo): mark Tasks 1+2 as SHIPPED for test_full_live_workflow fix	2026-06-08 10:15:54 -04:00
ed	6ecb31ea0a	feat(app_controller): reset project state in _handle_reset_session Stale project state from prior live_gui tests (shared session-scoped subprocess) was leaking into subsequent tests, causing the test_full_live_workflow race condition: 'Project not switched' errors when self.project still claimed to be a different project. The fix: _handle_reset_session now mirrors the default-project branch of __init__ (lines 1743-1745), creating a fresh default project dict, clearing active_project_path and project_paths, and reinitializing the workspace manager. - src/app_controller.py: 6 new lines in _handle_reset_session - tests/test_handle_reset_session_clears_project.py: 3 tests (active_project_path, project_paths, self.project)	2026-06-08 10:13:07 -04:00
ed	abb3856525	feat(api_hooks): add /api/project_switch_status endpoint for deterministic test signaling Adds a new endpoint that exposes the project-switch state machine so tests can poll for completion instead of guessing with timeouts. - AppController: track _project_switch_error on failure paths - src/api_hooks.py: GET /api/project_switch_status returns {in_progress, pending_path, active_path, error} - src/api_hook_client.py: get_project_switch_status() helper - tests/test_api_hooks_project_switch.py: 3 unit tests for client + endpoint shape, 1 live_gui test for the default-idle case	2026-06-08 09:55:36 -04:00
ed	c531cebe03	conductor(plan): review pass — fix cross-references, add NOT_READY + with_errors + Lottes/Valigo, split §3.4 into 8 sub-tasks	2026-06-08 09:38:27 -04:00
ed	8248a49f1e	docs(todo): simple todo list for fixing test_full_live_workflow race	2026-06-08 09:25:18 -04:00
ed	08ee7547be	docs(reports): root cause report for test_full_live_workflow race condition	2026-06-08 09:24:14 -04:00
ed	64823493c0	conductor(closeout): ship test_batching_refactor_20260606 with CLOSEOUT.md and follow-up recommendation	2026-06-08 08:36:22 -04:00
ed	488ae04459	fix(run_tests_batched): detect batch failure from output when proc.returncode is wrong	2026-06-08 02:03:50 -04:00
ed	5c6eb620a1	fix(run_tests_batched): colorize non-xdist format (tests/... STATUS), filter 'Error during log pruning' noise	2026-06-08 01:54:56 -04:00
ed	272b7841ae	fix(run_tests_batched): filter xdist scheduling queue output (test paths without status prefix)	2026-06-08 01:51:07 -04:00
ed	a2d16541d0	fix(run_tests_batched): keep pytest's full -v output, only filter LogPruner/win errors, colorize per-test status	2026-06-08 01:49:39 -04:00

2801 Commits