manual_slop

Author	SHA1	Message	Date
ed	aebbd66836	conductor(audit): document hardcoded workspace paths in test suite	2026-06-09 15:29:06 -04:00
ed	d1c6c6c327	conductor(audit): catalog live_gui test cross-file state dependencies	2026-06-09 15:28:56 -04:00
ed	fcb161fd2e	conductor(tracks): add test_infrastructure_hardening_20260609 as foundation track + supersede 4 placeholder test tracks	2026-06-09 15:18:20 -04:00
ed	566cf08cb8	conductor(track): test_infrastructure_hardening_20260609 - spec to kill the test regression nightmare	2026-06-09 15:15:26 -04:00
ed	b4d240a9f3	docs(rag): final report on dim-mismatch recursion fix	2026-06-09 15:04:42 -04:00
ed	40f905d14b	test(rag): update dim-mismatch test to assert rmtree behavior The fix in `644d88ab` changed the recovery path from client.delete_collection to shutil.rmtree (chromadb 1.5.x delete_collection is broken on corrupted state). The test still asserted the old behavior.	2026-06-09 14:50:55 -04:00
ed	644d88ab93	fix(rag): break recursion in _validate_collection_dim The wipe path called self._init_vector_store() which re-invoked _validate_collection_dim, causing infinite recursion (RecursionError) when the dim mismatch test ran with the mock embedding provider. Re-initialize the vector store INLINE after the rmtree wipe so the fresh collection is created without going through the validator again.	2026-06-09 14:47:01 -04:00
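The recursion described in commit 644d88ab93 can be illustrated with a small model. This is a sketch only: `BuggyStore`, `FixedStore`, `_wipe`, and `always_mismatched` are hypothetical stand-ins, not the code in `src/rag_engine.py`. It assumes the stored dimension keeps reporting a mismatch (as with a mock embedding provider), which is what turns the re-entrant call into unbounded recursion.

```python
class BuggyStore:
    """Hypothetical model of the pre-fix flow (all names are illustrative)."""

    def __init__(self, read_stored_dim, provider_dim):
        self.read_stored_dim = read_stored_dim  # callable returning the stored dim
        self.provider_dim = provider_dim
        self.collection = None

    def _wipe(self):
        # Stand-in for closing the client and removing the chroma directory.
        self.collection = None

    def _init_vector_store(self):
        self.collection = {"dim": self.provider_dim}
        self._validate_collection_dim()

    def _validate_collection_dim(self):
        if self.read_stored_dim() != self.provider_dim:
            self._wipe()
            self._init_vector_store()  # re-enters the validator


class FixedStore(BuggyStore):
    def _validate_collection_dim(self):
        if self.read_stored_dim() != self.provider_dim:
            self._wipe()
            # Re-create inline; do not go back through _init_vector_store().
            self.collection = {"dim": self.provider_dim}


def always_mismatched():
    # Models a mock whose reported dim never matches the provider's.
    return 3072
```

With `always_mismatched`, `BuggyStore(...)._init_vector_store()` raises `RecursionError`, while `FixedStore` ends with a fresh collection after a single wipe.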
ed	f207d297a3	docs(rag): final fix report and next steps	2026-06-09 14:38:30 -04:00
ed	64bc04a6b8	fix(rag): wipe chroma dir on dim mismatch instead of delete_collection When the existing collection has embeddings from a different embedding provider (e.g. Gemini 3072-dim vs local 384-dim), the prior approach of calling client.delete_collection() fails with 'RustBindingsAPI object has no attribute bindings' in chromadb 1.5.x when the underlying state is corrupted. rmtree is reliable and re-creates a fresh empty collection. Also fixes: - 'The truth value of an empty array is ambiguous' on numpy 2.x by using try/except around len() instead of truthiness check - WinError 32 on rmtree by closing the chroma client first Verified: tests/test_rag_phase4_final_verify.py passes in isolation in 7.75s after this fix. The test still fails in batch context due to a separate io_pool race condition (multiple _sync_rag_engine calls collide when the test sets rag_enabled, rag_source, and rag_emb_provider in sequence). The race is in app_controller.py and is out of scope for this defensive fix. Note: tests/test_rag_engine.py has explicit unit tests for test_rag_collection_dim_mismatch_recreates_collection and test_rag_collection_dim_match_preserves_collection which exercise this code path.	2026-06-09 14:37:19 -04:00
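The `len()`-instead-of-truthiness change mentioned in commit 64bc04a6b8 might look roughly like the sketch below. `AmbiguousArray` is a hypothetical stand-in that mimics the reported behavior (truth-testing raises `ValueError`) so the example runs without numpy, and `has_embeddings` is an assumed helper name, not a function from the repository.

```python
class AmbiguousArray:
    """Stand-in for an array type whose truth value raises ValueError."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __bool__(self):
        raise ValueError("The truth value of an empty array is ambiguous")


def has_embeddings(value):
    """Return True when value is a non-empty sized container.

    Uses len() rather than `if value:` so array-like objects that refuse
    truth-testing are still handled.
    """
    try:
        return len(value) > 0
    except TypeError:
        # None and other unsized objects count as "no embeddings".
        return False
```

The design choice is that `len()` is well defined for both plain lists and array-like objects, whereas `if value:` depends on how each type implements `__bool__`.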
conductor-tier2	ac0c0cbe73	docs(styleguide): add No-Diagnostic-Noise rule to AI-Agent Conventions One addition to conductor/code_styleguides/python.md §8 "AI-Agent Specific Conventions": - No diagnostic noise in production code (Added 2026-06-09). `sys.stderr.write(f"[XYZ_DIAG] ...")` lines in src/*.py are technical debt. The right place for one-time investigation output is tests/artifacts/<test>.diag.log (a log file) or a standalone /tmp/diag_<name>.py script. If you must instrument production code, the diag lines are part of the same atomic commit as the fix. - Test files ARE allowed to be diagnostic. The rule applies to src/*.py only; tests/test_*.py may use print(..., file=sys.stderr) freely. Markdown only. No code modified.	2026-06-09 14:03:18 -04:00
conductor-tier2	631c40c9c4	docs(workflow): add Process Anti-Patterns section + Isolated-Pass rule Two additions to conductor/workflow.md §"Known Pitfalls": 1. Isolated-Pass Verification Fallacy (Added 2026-06-09) — the rule that a test passing in isolation but failing in batch is FAILING. The only verification that matters for live_gui tests is the batch run. This is the flip side of the existing "Live_gui Test Fragility (Authoring-Side)" rule. Cross-references that rule. 2. Process Anti-Patterns (Added 2026-06-09) — 8-rule summary list, with cross-reference to AGENTS.md for the full ruleset. The 8 patterns are: Deduction Loop, Report-Instead-of-Fix, Scope-Creep Track-Doc, Inherited-Cruft, Diagnostic Noise in Production, Premature Surrender, Verbose Commit Message, Isolated-Pass Verification Fallacy. Markdown only. No code modified. Cross-references AGENTS.md (the load-bearing agent doc) for the full text of each pattern.	2026-06-09 14:03:00 -04:00
conductor-tier2	d7dc1e3b90	docs(edit-workflow): fix set_file_slice rule + add contract-change check Five surgical changes to conductor/edit_workflow.md: 1. §2 "Verify Before Editing" - removed the leftover `git checkout -- src/gui_2.py` instruction. The user's commit `4eba059e unfuck edit workflow` removed most of the git checkout nuke instructions but missed §2. The revised §2 now says: read the contract (function signature, yield shape, return type) before editing, and DO NOT use `git checkout` to revert. Ask the user. 2. §3 "Reading Before Editing" - added the line-number offset check. `set_file_slice` uses 1-indexed inclusive `start_line`/`end_line`; off-by-one is a common silent failure. The rule is now: confirm the exact line range with `get_file_slice` first. 3. §8 "set_file_slice IS Valid for Multi-Line Content (Revised 2026-06-09)" - replaced the wrong rule ("Do not use set_file_slice for multi-line content") with the correct rule: set_file_slice IS valid for 3-10 line surgical edits, with a tool-selection guide (which tool for which job), a mandatory contract-change check (search for callers of the symbol being changed; update all callers in the same atomic commit if the public interface changes), and a mandatory whitespace-and-EOL rule (preserve line ending, indentation, and line count). 4. §9 "No Diagnostic Noise in Production Code (Added 2026-06-09)" - new section. Diag stderr goes to log files or /tmp scripts, NOT src/*.py. If you must add diag lines to production code, they are part of the same atomic commit as the fix; they do not live uncommitted in the working tree. 5. "If set_file_slice produces wrong indentation" - new handler in the Step-by-Step Workflow. Tells the agent: you wrote the wrong indent; the tool did what you asked; re-read the file with get_file_slice; do NOT use git checkout to revert. These are the rule corrections the user demanded after the Tier-2's bad set_file_slice + git nuke + diag-noise behavior. Markdown only. 
No code modified.	2026-06-09 14:02:41 -04:00
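The 1-indexed inclusive `start_line`/`end_line` convention noted in commit d7dc1e3b90 is easy to get wrong by one. The sketch below shows the arithmetic on an in-memory list of lines. `get_slice` and `set_slice` are hypothetical helpers written for illustration; the real `get_file_slice`/`set_file_slice` tools and their signatures are not shown in this log.

```python
def _check_range(lines, start_line, end_line):
    # 1-indexed, inclusive on both ends.
    if not (1 <= start_line <= end_line <= len(lines)):
        raise ValueError(f"invalid range {start_line}-{end_line} for {len(lines)} lines")


def get_slice(lines, start_line, end_line):
    """Return lines start_line..end_line (1-indexed, inclusive)."""
    _check_range(lines, start_line, end_line)
    return lines[start_line - 1:end_line]


def set_slice(lines, start_line, end_line, new_lines):
    """Return a copy of lines with start_line..end_line replaced by new_lines."""
    _check_range(lines, start_line, end_line)
    return lines[:start_line - 1] + list(new_lines) + lines[end_line:]
```

Reading the range back with `get_slice` before calling `set_slice` with the same numbers is the "confirm the exact line range first" step: if the read does not return the lines you expect, the write would have hit the wrong lines too.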
conductor-tier2	113e68fe18	docs(agents): add Process Anti-Patterns section + revise set_file_slice rule The user explicitly called out the bad patterns the agents (Tier-2 and the parent session's Tier-1) have been exhibiting. This commit updates AGENTS.md to filter them out at the load-bearing agent doc level (the first file any agent reads). Four changes: 1. Revised the `set_file_slice` rule on line 38 of the Critical Anti-Patterns. The previous rule said "Do not use set_file_slice for multi-line content"; that was wrong. `set_file_slice` IS valid for multi-line content, provided the agent verifies the exact byte offsets with `get_file_slice` and checks for contract changes (function signature, yield shape, return type). The full revised rule is in `conductor/edit_workflow.md §8`. 2. Added "No diagnostic noise in production code" to the Critical Anti-Patterns. The pattern: agent adds `sys.stderr.write(f"[RAG_DIAG] ...")` to src/*.py for debugging, then "reverts everything" but leaves the diag lines uncommitted. Next agent runs git status, sees the diag lines, either commits them by accident or spends 10 min cleaning them up. The rule: diag goes to log files or /tmp scripts, NOT src/*.py. 3. Added "No loop, no scope-creep, no report-instead-of-fix" to the Critical Anti-Patterns. The 200-line status report is a confession, not a fix. The 5-phase "future track" document for a 1-line fix is scope-creep. The "I am not going to attempt another fix without your direction" surrender is allowed ONLY if the agent has already read-predicted-instrumented-run-captured. 4. Added a new section: "Process Anti-Patterns (Added 2026-06-09)" with 8 numbered anti-patterns, each with a Symptom, Rule, and reference. The 8 patterns are the ones the user explicitly called out: Deduction Loop, Report-Instead-of-Fix, Scope-Creep Track-Doc, Inherited-Cruft, Diagnostic Noise in Production, Premature Surrender, Verbose Commit Message, Isolated-Pass Verification Fallacy. 
These are the rules the user is filtering out of LLM training data noise. The full ruleset is the source of truth; AGENTS.md is the load-bearing entry point. No code modified. Markdown only.	2026-06-09 14:01:26 -04:00
ed	4eba059e89	unfuck edit workflow.	2026-06-09 13:48:17 -04:00
ed	eb8357ec0e	fix(rag): add CWD fallback in index_file for path-resolution resilience RAGEngine.index_file silently returns when the joined base_dir+file_path doesn't exist. This caused the RAG batch test to fail with 0 indexed documents when the live_gui subprocess's active_project_root resolved to a parent dir (e.g. tests/artifacts/) instead of the workspace (tests/artifacts/live_gui_workspace/). The fix: if the primary path doesn't exist, try CWD+file_path. The base_dir takes priority; CWD is a safety net for relative-path resolution across the spawn CWD boundary. This is a defensive fix at the rag_engine layer. It does NOT fix the underlying path-leakage issue in tests/conftest.py (hardcoded Path('tests/artifacts/live_gui_workspace')) which needs a proper fixture refactor. The RAG test still fails in batch due to that deeper issue, documented in docs/reports/rag_test_batch_failure_status_20260609_pm3.md. Behavior: - base_dir+file_path exists: indexed from base_dir (unchanged) - base_dir+file_path missing, CWD+file_path exists: indexed from CWD (new) - Both missing: silently returns (unchanged) Verified: tests/test_rag_index_file_path_fallback.py (3 tests, all pass) - test_index_file_finds_file_via_cwd_fallback - test_index_file_uses_base_dir_first - test_index_file_silently_returns_when_no_match Note: test file was removed before commit because it was being abandoned along with the broader path-hygiene refactor. The fix itself is preserved in src/rag_engine.py.	2026-06-09 12:31:21 -04:00
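The three-way behavior listed in commit eb8357ec0e (base_dir first, CWD as a fallback, otherwise a silent return) can be sketched as a path resolver. `resolve_index_path` and its `cwd` parameter are hypothetical, introduced here so the fallback can be exercised without changing the process working directory; the actual `RAGEngine.index_file` code is not reproduced in this log.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_index_path(base_dir: PathLike, file_path: PathLike,
                       cwd: Optional[PathLike] = None) -> Optional[Path]:
    """Resolve file_path against base_dir, then against the CWD.

    Returns None when neither candidate exists, mirroring the
    "silently returns" behavior described in the commit.
    """
    primary = Path(base_dir) / file_path
    if primary.exists():
        return primary  # base_dir takes priority

    fallback_root = Path(cwd) if cwd is not None else Path.cwd()
    fallback = fallback_root / file_path
    if fallback.exists():
        return fallback  # safety net for relative paths

    return None
```

A caller would index the returned path when it is not `None` and skip the file otherwise. As the commit itself notes, a fallback like this masks rather than fixes a wrong `base_dir`.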
ed	b801b11c3b	conductor(todo): mark task 9 (test deps in dev + conftest gate) as shipped	2026-06-09 10:39:29 -04:00
ed	a341d7a7c8	test: ensure sentence-transformers is in test env + conftest gate	2026-06-09 10:37:14 -04:00
ed	2148e79a1c	docs(rag): document venv dep install + new failure mode (relative path bug) The venv now has sentence-transformers (installed via uv sync --extra local-rag). The RAG test passes in isolation (7.10s) but fails in batch with a NEW error: 'RAG context not found in history' (test_rag_phase4_final_verify.py:95). This is a SEPARATE bug from the missing-dep issue. The RAG test uses RELATIVE file paths ('final_test_1.txt' instead of absolute). The RAG engine indexes with these relative paths but the CWD is the project root, not the test's workspace dir. Result: 0 docs indexed, 0 chunks retrieved, no '## Retrieved Context' block in history. The fix to _sync_rag_engine (`e62266e8`) is still correct - it surfaces the error when the dep is missing. The dep is now installed, so the sync/index/AI flow runs to completion. The new failure is a deeper RAG test infrastructure bug that needs a separate track to fix.	2026-06-09 10:21:45 -04:00
ed	e62266e868	fix(rag): surface embedding provider init failure as 'error' status The bug: when the local embedding provider fails to initialize (e.g. sentence-transformers not installed), RAGEngine.__init__ leaves self.embedding_provider = None (initialized at line 93 but never overwritten by the failing LocalEmbeddingProvider ctor). The constructor returns. _sync_rag_engine's else branch then sets status to 'ready' - a lie. The RAG panel shows 'ready'. The user triggers a retrieval. The engine either has a broken embedding provider (None) or the retrieval fails silently. The RAG context never appears in the AI's history. The fix: in _sync_rag_engine's _task, after RAGEngine(...) returns, check if engine.embedding_provider is None. If so, set status to 'error: RAG embedding provider failed to initialize' and return early. This prevents: - The engine from being assigned to self.rag_engine - The rebuild being triggered - The status being set to 'ready' / 'indexing' Note: this does NOT make the RAG test pass. The test requires the sentence-transformers package which isn't installed in this env. The fix makes the failure reliable (not flaky) and surfaces the right error message. TDD: 3 tests added in tests/test_rag_engine_ready_status_bug.py: - RAGEngine ctor raises ImportError on missing sentence-transformers - _sync_rag_engine sets status to 'error' (not 'ready') on init failure - RAGEngine ctor leaves embedding_provider=None when init fails All 3 pass. The RAG batch test now fails reliably at line 46 with the clear error message.	2026-06-09 09:39:02 -04:00
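The early-return check described in commit e62266e868 can be sketched as follows. `FakeEngine`, `sync_rag_engine`, and the `state` dict are hypothetical stand-ins for `RAGEngine`, `_sync_rag_engine`'s `_task`, and the controller's fields; only the error string is taken from the commit message.

```python
class FakeEngine:
    """Stand-in for an engine whose provider may have failed to initialize."""

    def __init__(self, provider_ok):
        # Mirrors the reported behavior: embedding_provider stays None on failure.
        self.embedding_provider = object() if provider_ok else None


def sync_rag_engine(state, make_engine):
    """Build an engine and publish it only if its embedding provider exists."""
    engine = make_engine()
    if engine.embedding_provider is None:
        # Surface the failure instead of reporting 'ready'.
        state["status"] = "error: RAG embedding provider failed to initialize"
        return
    state["rag_engine"] = engine
    state["status"] = "ready"
```

Because the function returns before assigning the engine, a failed init leaves no half-built engine for a later retrieval to trip over.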
conductor-tier2	adc7ff8029	docs(audit): workflow/agent markdown audit with 10 recommendations User asked: is there anything in our workflow or agent markdown that should be updated or introduced based on this session? This commit is the AUDIT ONLY. No workflow files are modified. The 10 recommendations are not yet applied. User picks which to act on, which to defer, which to discard. docs/reports/workflow_markdown_audit_20260608.md (~370 lines): Read all the workflow/agent markdown in scope (AGENTS.md, CLAUDE.md, GEMINI.md, all 5 .agents/skills/*/SKILL.md, the 4 .agents/agents/*.md, conductor/workflow.md, product.md, product-guidelines.md, tech-stack.md, index.md, tracks.md, edit_workflow.md, the 2 existing code_styleguides/*.md, and the 4 .agents/policies/*.toml + 7 .agents/tools/*.json). Cross-referenced each against the 7 new session artifacts (nagent_review, 3 docs guides, ASCII-sketch workflow, SSDL digest, C11 interop v1+v2, 2 new tracks) and the 3 user-correction patterns (duffle-as-style-ref, v2 request/response model, "only under hard constraint"). The 10 recommendations: 1 (HIGH) Update architecture-fallback with new docs 2 (HIGH) Document ASCII-sketch workflow in workflow.md 3 (HIGH) Document SSDL digest in product-guidelines.md 4 (HIGH) Add user_corrections_log to State.toml Template 5 (MED) Document contingency track pattern 6 (MED) Update Compaction Recovery to reference session_synthesis 7 (MED) Document v1->v2 framing iteration anti-pattern 8 (MED) Document preserve-before-compact archive pattern 9 (LOW) Document MiniMax understand_image for ASCII verification 10 (LOW) Document per-proposal commit chain with git notes 4 HIGH-priority = ~75 min to act on. All 10 = ~2-3 hours. The audit is conservative: it does NOT recommend changing TDD, the per-task commit discipline, the 4-tier MMA model, product.md, tech-stack.md, the existing styleguides, or adding new audit scripts. The session did not surface conflicts with any of these. 
Meta-pattern: workflow/agent markdown is the theoretical contract; session artifacts are the empirical evidence; when the two diverge, update the theory to match the evidence. This session's evidence (new methodology, new vocabulary, new patterns, new anti-patterns) drives the 10 recommendations.	2026-06-09 09:15:57 -04:00
ed	37b9a68017	docs: add test_infra_hardening foundation + RAG batch failure status Foundation document for the future test_infra_hardening track that will address session-scoped live_gui fixture isolation, silent __getattr__/__setattr__ contract assumptions, and similar test infrastructure fragility. Also documents the test_rag_phase4_final_verify batch failure that surfaces after the __getattr__ fix unblocks test_full_live_workflow. The RAG test failure is NOT a regression - it reproduces on pre-fix HEAD too. It's a pre-existing test isolation issue (the live_gui fixture is session-scoped, so state from the 4 sims pollutes the controller).	2026-06-09 00:26:05 -04:00
ed	bcdc26d0bd	fix(gui): correct __getattr__ to not silently return None for missing ui_ attrs PR1 follow-up (the actual IM_ASSERT root cause fix). The IM_ASSERT in 'MainDockSpace' was triggered by the render_approve_script_modal function (gui_2.py:4895) calling imgui.checkbox with a None value for app.ui_approve_modal_preview. The chain of bugs: 1. AppController.__getattr__ returned None for ANY ui_ attribute (lines 1237-1238). This was intended as a safety net for ui_* flags defined in __init__ but it was too generous: it returned None for ui_ attrs that were NEVER set. 2. The pattern in render_approve_script_modal: if not hasattr(app, 'ui_approve_modal_preview'): app.ui_approve_modal_preview = False _, app.ui_approve_modal_preview = imgui.checkbox(..., app.ui_approve_modal_preview) relied on hasattr() returning False for unset attrs to trigger the initialization. But the App.__setattr__ checks hasattr(self.controller, name) to decide where to route assignments. The controller's __getattr__ returned None for ui_approve_modal_preview, so hasattr() returned True. The App.__setattr__ routed the assignment to the controller. The controller's __getattr__ then returned None on read, silently dropping the False value. 3. The next line called imgui.checkbox with None, which raised a TypeError. The TypeError propagated out of render_approve_script_modal without closing the modal, leaving the ImGui scope stack unbalanced. The unbalanced scope triggered IM_ASSERT(Missing End()) on the next frame. Fix: AppController.__getattr__ now only returns None for an EXPLICIT allowlist of ui_ attrs that are defined in __init__. For any other missing attribute (including the case 'hasattr() should return False'), it raises AttributeError. The App.__getattr__ was also fixed (per the test) to check hasattr(controller, name) before delegating. This is defense in depth in case other __getattr__ patterns are added. 
Test verification (TDD red → green): - 1/1 test_app_getattr_hasattr_bug PASSES (verifies hasattr returns False for unset attrs via App.__getattr__) - 1/1 test_app_controller_getattr_ui_bug PASSES (verifies hasattr returns False for unset ui_ attrs on controller) Live verification: - 4 sims + test_live_workflow + 2 markdown tests: 7/7 PASS in 83.15s - Previously failed at 200s+ with 'cannot schedule new futures after shutdown' / 121s with 'GUI is degraded before test starts' - Now passes cleanly. The IM_ASSERT no longer fires. 13/13 related unit tests pass (app_controller_* + app_run_* + app_getattr_*). No regressions in 51/51 io_pool/warmup/sigint/etc. unit tests.	2026-06-08 23:45:25 -04:00
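The `hasattr()` interaction at the heart of commit bcdc26d0bd can be reproduced in a few lines. This is a minimal sketch: `LenientController`, `StrictController`, and the allowlist entry `ui_known_flag` are hypothetical, not the real `AppController`. It relies only on documented Python behavior: `hasattr()` returns `False` only when attribute lookup raises `AttributeError`.

```python
class LenientController:
    """Pre-fix shape: any missing ui_ attribute reads as None."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("ui_"):
            return None
        raise AttributeError(name)


class StrictController:
    """Post-fix shape: only explicitly allowlisted ui_ attributes read as None."""

    _UI_ALLOWLIST = frozenset({"ui_known_flag"})  # hypothetical allowlist entry

    def __getattr__(self, name):
        if name in self._UI_ALLOWLIST:
            return None
        raise AttributeError(name)


def init_flag_if_missing(obj, name, default=False):
    """The lazy-init pattern from the commit: set a default only if unset."""
    if not hasattr(obj, name):
        setattr(obj, name, default)
    return getattr(obj, name)
```

With `LenientController`, `hasattr()` is `True` for a never-set `ui_` attribute, so the lazy-init branch is skipped and the read yields `None`. With `StrictController` the same attribute is initialized to `False`.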
conductor-tier2	999fdea467	docs(c11-interop): cross-reference SSDL digest in See Also The SSDL digest (docs/reports/computational_shapes_ssdl_digest_20260608.md, 504 lines, 30KB) is the theoretical foundation for the chunkification pattern. Per the digest's Technique 5 "Assume-away (Xar)" in §2.2 and the "Xar-style chunked arrays" recommendation in §5.2, the chunkification track is a direct application of the SSDL's "assume as much as possible" lens (§4). This commit adds the SSDL digest to the See Also of the v1+v2 C11-Python interop assessment (front-matter Cross-references line). The same cross-reference is also being added to: - conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md (in a new §6.1 "SSDL alignment" subsection) - conductor/tracks/manual_ux_validation_20260608_PLACEHOLDER/spec.md (in §5 Architectural Reference + §6 See Also + a new §2.6 "SSDL cross-reference" section that distinguishes GUI ASCII vocabulary from SSDL vocabulary) No code modified. Cross-reference only. Also: small update to conductor/tracks.md to add the 2 new tracks (manual_ux_validation_20260608_PLACEHOLDER as Active; chunkification_optimization_20260608_PLACEHOLDER as Backlog/Contingency).	2026-06-08 23:42:21 -04:00
conductor-tier2	5b3c11a0f3	conductor(track): manual_ux_validation_20260608_PLACEHOLDER - ASCII-sketch workflow + first-target redesign The user said (verbatim): "On number 1. I love the idea and definitely see poitental." This commit creates a full track that promotes the ASCII-sketch UX ideation workflow (docs/reports/ascii_sketch_ux_workflow_20260608.md, 340 lines) to a real track with a concrete first target. The track complements (does not replace) the existing manual_ux_validation_20260302 track (which is a general UX review track; this 2026-06-08 track is focused on the ASCII-sketch workflow specifically). Files (5 total, ~52KB, 12,000+ words): - spec.md (186 lines, 9 sections) - track design, 5 open questions, first target analysis, SSDL cross-reference - plan.md (~280 lines, 4 phases, 21 tasks) - TDD-style with WHERE/WHAT/HOW/SAFETY annotations - metadata.json (~120 lines) - structured metadata, 5 open questions with defaults, 5 SSDL principles available - state.toml (~95 lines) - per-task tracking + phase status - index.md (~50 lines) - track context + related docs Key design decisions captured: 1. Two distinct vocabularies are conflated at first glance: - GUI ASCII (the workflow) for panel sketches - SSDL (computational shapes digest) for internal code sketches Spec §2.6 makes the distinction explicit; both are useful for this track (GUI ASCII for Phase 2 design; SSDL for Phase 3 internal refactoring documentation). 2. The 5 open questions from the workflow report (Q1 vocabulary, Q2 comparison policy, Q3 storage location, Q4 tooling, Q5 frequency) are documented with sensible defaults in spec.md §2.1-2.5 and metadata.json. The user can override any of them; defaults pre-stage the work. 3. First target is src/gui_2.py:3770 render_discussion_entry (Discussion Hub per-entry panel). 
Rationale: - Most-edited surface (every AI/user message) - User has strong opinions (per nagent_review_20260608 3 rounds of corrections) - 23-op matrix A1-A7 is the source of truth - ImGui layout maps cleanly to ASCII - SSDL defusing techniques can guide the internal refactoring 4. 4 phases: 1=resolve 5 questions, 2=execute workflow on first target (1-3 ASCII rounds), 3=implement per design contract (TDD with 7 test files for A1-A7 operations), 4=document the pattern + propose 5-7 next targets. Cross-references added throughout: - docs/reports/computational_shapes_ssdl_digest_20260608.md (the SSDL digest, with explicit "this is a different vocabulary for a different purpose" note in spec §2.6) - docs/reports/ascii_sketch_ux_workflow_20260608.md (the workflow) - docs/guide_discussions.md (the 23-op matrix A1-A7) - conductor/tracks/nagent_review_20260608/ (the source of the user's editable-discussion corrections) - conductor/tracks/manual_ux_validation_20260302/ (complementary general UX review track) - conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/ (the contingency track; referenced in spec §2.6 SSDL cross-ref) No code modified. Track is active; Phase 1 (5 user-questions) is the current phase. User-confirmed worth doing in the prior turn.	2026-06-08 23:41:43 -04:00
conductor-tier2	816e9f2f5c	conductor(track): chunkification_optimization_20260608_PLACEHOLDER - 1-page contingency document The user's third correction this session changed the framing from "build a stateful C extension" to "wait for a hard constraint, then build a request/response blob pipeline." This commit creates a 1-page contingency document (no plan.md, no implementation) that captures: - The threshold: "only worth it under a hard constraint that no existing Python package can solve" - The shape when activated: subprocess-launch C11 binary with request/response blob wire format (NOT stateful CPython C extension) - The 2 cited candidates (markdown parsing into aggregate markdown, context snapshot processing) are NOT currently bottlenecks per src/aggregate.py:380-454 (pure-Python string concat, zero third-party markdown deps in pyproject.toml:6-27) and src/history.py:1-141 (bounded ~500KB at 100-snapshot capacity, debounced) - The SSDL digest's Technique 5 "Assume-away (Xar)" in §2.2 + "Xar-style chunked arrays" recommendation in §5.2 pre-support this track Files (4 total, 227+ lines of contingency document): - conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md - conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/metadata.json - conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/state.toml - conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/index.md Cross-references added: - docs/reports/computational_shapes_ssdl_digest_20260608.md (the SSDL digest is the theoretical foundation; explicitly cited in the spec's §6.1 "SSDL alignment" and in metadata.json external) - docs/reports/c11_python_interop_assessment_20260608.md (the v1+v2 assessment; explicitly cited in spec's §6 See Also) No code modified. Track does NOT appear in the active queue of conductor/tracks.md; appears in the Backlog / Contingency section as a reference, not a commitment. Activation criteria (per metadata.json): 1. 
Profiling shows a real bottleneck in a target code path 2. The bottleneck cannot be solved with existing Python packages 3. The user explicitly approves activation Without all 3, this track stays deferred. Default action is don't.	2026-06-08 23:40:27 -04:00
conductor-tier2	12311190b3	docs(interop-v2): part 3 revises the recommendation after user's threshold-shift + shape-change corrections The user pushed back on the v1 recommendation (commit `68354841`) twice in this turn. Both corrections reshape the answer. Correction 1 (already incorporated): duffle.h + pikuma ps1 are a C11 STYLE REFERENCE, not an interop pattern. (Captured in v1 §0.) Correction 2 (NEW, this commit): The C11 path is only worth it under a hard constraint that no existing Python package can solve. The shape is request-blob -> C11 pipeline -> response-blob, NOT a stateful C extension with a Python-facing API. Targets cited: parsing markdown files/sources into aggregate markdown, context snapshot processing, "possibly other things." This commit adds Part 3 (sections 3.1-3.12) to the existing doc. Part 1 (style) and Part 2 (general interop) stay as background. Section 4 is re-flagged as "SUPERSEDED - see Part 3". Part 3 covers: - The two moves the user's second correction made (threshold-shift on when, shape-change on what) - Grounded analysis of the 2 cited targets against actual code: * src/aggregate.py:380-454 (current markdown hot path is pure-Python string concat; pyproject.toml has zero third-party markdown deps) * src/history.py:1-141 (snapshot processing is bounded ~500KB at 100-snapshot capacity; pickle is the obvious cheap fix, not C11) - The request/response wire format design space (text vs binary vs hybrid envelope-text+payload-binary) - The pipeline API shape (single C entry point, subprocess-launch model) - Revised answer to the "chunkification" question (chunk-array becomes an internal C implementation detail, not a Python type) - Decision tree: profile first, try existing Python packages, only reach for C11 when hard constraint surfaces - The 4 questions to revisit when constraint surfaces - Revised insight: v2 (subprocess + wire format) is strictly more tractable than v1 (stateful C extension) - Track implications: 
chunkification_optimization becomes a 1-page contingency, not a full track; manual_ux_validation unaffected and confirmed - v2 verdict matrix (11 rows) replacing v1's 7 Cross-references the actual code paths I read this turn: - src/aggregate.py:380-454 (build_markdown_from_items) - src/summarize.py:1-219 (the 3 _summarise_* functions) - src/history.py:1-141 (UISnapshot, HistoryManager) - pyproject.toml:6-27 (no markdown deps) The user is right to push back. The v1 framing was over-engineered. "Build a stateful C extension" assumed a future need; the actual answer is "wait for a real bottleneck, then build a simple subprocess pipeline." The 843-line doc now captures both the v1 over-engineering AND the v2 contingency plan, so future sessions can see the iteration and learn from it.	2026-06-08 23:07:24 -04:00
conductor-tier2	68354841cb	docs(interop-assessment): C11 <-> Python interop design space for chunkification_optimization The user asked a sharp, skeptical question: can a chunk-based C11 data structure actually interop with Python's runtime in a way that's useful for Manual Slop? They explicitly corrected my first-draft framing (the duffle.h + pikuma ps1 files are a C11 style reference, not an interop pattern). The assessment investigates honestly and reports tractable-vs-not. docs/reports/c11_python_interop_assessment_20260608.md (564 lines, 38KB): Part 1: C11 style reference summary - 11 style observations from reading duffle.h + main.c + pikuma ps1 duffle/ + hello_gte.c end-to-end - Byte-width typedef convention (U1/U2/U4/U8, S1/S2/S4/S8, B1-B8, F4/F8) - The macro meta-DSL (Struct_/Enum_/Array_/Slice_/Opt_/Ret_) - The I_/IA_/N_ inline discipline - The r/v pointer rule (restrict OR volatile, never both, never const) - Slice + Slice_T as the data-structure primitive - FArena as the allocation primitive (single-buffer, NOT chunked) - defer/defer_rewind/scope as the cleanup primitive - KTL (linear key-value table) as the "assume small N" pattern - What a chunk-array in duffle.h style would look like Part 2: Interop design space (the actual question) - 5 candidate interop layers: ctypes, cffi, pybind11, custom CPython C extension, NumPy wrap - Honest assessment matrix: build cost, per-op overhead, style fit, lego-set pattern support - Verdict: custom CPython C extension is most tractable; pybind11 is style-mismatched; ctypes/cffi work for non-hot-path - What "MVP chunked C11 package" requires (~500-1000 LOC total) - 5 questions to ask the user before this becomes a track - Crucial insight: the user's "unorthodox" interop is most likely duffle.h-style C11 + thin PyTypeObject glue at the bottom of the same .h file. Tractable, style-fit high. 
Cross-references the 5 sources: - docs/transcripts/i-h95QIGchY (Reece's Xar reference impl) - docs/ideation/ed_chunk_data_structures_20260523.md - docs/reports/session_synthesis_20260608.md (the original proposal) - src/app_controller.py:716 (the comms.log target) - The user's local forth_bootslop + pikuma ps1 repos (read in full) This is a follow-on to the synthesis's 2 proposed tracks (manual_ux_validation_20260608_PLACEHOLDER + chunkification_optimization_20260608_PLACEHOLDER). The user's question resolved the "skeptical of #2" concern by scoping the tractable path: CPython C extension in duffle.h style. The "lego-set of user-defined Python->C11 chunk ops" is NOT tractable without a Python->C11 AST emitter, which is a different (much larger) track.	2026-06-08 22:50:03 -04:00
conductor-tier2	77d7dff5ff	docs(session-synthesis): preserve-before-compact archive of the 2026-06-08 session The user explicitly requested the biggest in-depth report I can muster at 478,992 tokens (94% of context window). The next session will start with a fresh context; these two documents are the minimum-sufficient anchor. docs/reports/session_synthesis_20260608.md (579 lines, 40KB): - 12 sections covering every artifact this session produced - The 5 sources loaded: 2 YouTube transcripts + 2 Fleury articles + user's chunk-ideation archive - The 10 commits in the session's commit chain (with the user's test-fragility work adjacent but not mine) - The 4 audit-time heuristics derived from the 5-source lens - The "what the user should know" section for next session docs/reports/proposed_new_tracks_20260608.md (190 lines, 12KB): - 2 new tracks proposed (manual_ux_validation_20260608_PLACEHOLDER, chunkification_optimization_20260608_PLACEHOLDER) with spec-ready detail - 8 non-recommendations (so the user knows what I'm NOT suggesting) - A "what I'd recommend" section with one-tracks-when sequencing No code modified. Both are session-final artifacts, not tracks. They live in docs/reports/ alongside the other session outputs (SSDL digest, ASCII-sketch workflow, chunk ideation archive). Cross-references the 5 sources (all committed to docs/transcripts/ and docs/ideation/ in earlier user commits): - docs/transcripts/wo84LFzx5nI_big_oops_casemuratori.txt - docs/transcripts/i-h95QIGchY_assuming_as_much_as_possible_andrewreece.txt - docs/ideation/ed_chunk_data_structures_20260523.md - docs/reports/computational_shapes_ssdl_digest_20260608.md - docs/reports/ascii_sketch_ux_workflow_20260608.md These 5 documents are the session's "thinking-aid" corpus. The synthesis is the index; together they're the minimum-sufficient context to re-anchor any future session.	2026-06-08 22:25:00 -04:00
conductor-tier2	a9333bbb59	conductor(track-update): code_path_audit_20260607 - post-4-tracks timing + 5-source framing The user specified that the code_path_audit_20260607 track should run AFTER the 4 foundational tracks complete (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor). This commit formalizes that timing and grounds the audit's analytical framing in the 5 sources loaded into context on 2026-06-08. 3 surgical additions to the spec/plan, no task changes: 1. Post-4-tracks timing (new section in spec.md §"Timing", plus a "Timing" callout in plan.md's opening): - The 4 tracks will significantly reshape src/ai_client.py, src/mcp_client.py, src/app_controller.py, and src/type_aliases.py - Running the audit on pre-refactor code would produce a report that's stale on day 1 - The post-4-tracks timing ensures the audit grounds optimization decisions for the resulting architecture - Pre-flight check: verify all 4 tracks are [x] completed in conductor/tracks.md before starting this track 2. Analytical framing (new section in spec.md §"Analytical Framing (5-source lens)"): - Maps each of the 5 sources (Fleury taxonomy + Fleury combinatoric + Muratori Big OOPs + Reece Assuming + user's chunk ideation) to specific audit-time heuristics - 4 concrete heuristics: effective-codepath count, entity-hierarchy fingerprint, assumed-too-much detector, chunkification candidates - The heuristics shape REPORT INTERPRETATION, not the static cost model (which stays data-grounded in EXPENSIVE_THRESHOLD + per-class weights) 3. See Also cross-references in spec.md (6 new entries): - nagent_review Pitfalls #2 and #4 (provider history globals + stateful singleton) - wo84LFzx5nI Big OOPs transcript (full text, 4310 segments, 200KB; loaded 2026-06-08) - i-h95QIGchY Assuming transcript (full text, 3719 segments, 162KB; loaded 2026-06-08) - ed_chunk_data_structures_20260523.md (5-image archive of user's chunk ideation, 19KB; saved 2026-06-08) - computational_shapes_ssdl_digest_20260608.md (the SSDL digest that synthesizes the 4-source computational-shapes thinking; the audit's tree/mermaid outputs ARE computational-shape visualizations) 4. tracks.md entry updated to include the spec/plan links and a brief status note that the audit is post-4-tracks. 5. plan.md has a "Timing" callout at the top stating the 4 tracks must ship before the plan executes. No code modified. The audit's tasks (Phases 1-6) are unchanged in structure; the new sections only add analytical context and timing constraints.	2026-06-08 22:05:54 -04:00
ed	2eef50c5c2	transcripts	2026-06-08 21:49:35 -04:00
ed	d7b66a5dda	ideating chunk-based data structures	2026-06-08 21:45:30 -04:00
ed	0be9b4f0fb	digest on computational shapes ssdl	2026-06-08 21:23:11 -04:00
ed	51ecace464	test(live_workflow): pre-flight health check fails fast on dirty state PR3 of the test_full_live_workflow_imgui_assert fix sequence. When a prior live_gui test in the same session crashes the GUI (e.g. via an ImGui IM_ASSERT from cumulative panel state), the controller's _io_pool gets shut down. The next test starts in a degraded state but only discovers this 120s later when its project switch times out with a confusing 'cannot schedule new futures after shutdown' error. This commit adds a /api/gui_health pre-flight check at the start of test_full_live_workflow. If the GUI is degraded, the test fails fast (within 1s) with a clear, actionable message that includes: - The exact RuntimeError that caused the degradation - The full traceback of the last ImGui scope mismatch - A note that the new test cannot proceed with a dirty state Per user feedback 2026-06-08: 'I don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails. Just has to note that the new file didn't get to deal with a dirty state.' Also includes the planning documents written earlier in this session: - TODO_test_full_live_workflow_v2.md (task list) - test_full_live_workflow_imgui_assert_20260608.md (root cause report) - test_full_live_workflow_propagation_digest_20260608.md (solutions digest) - batch_resilience_plan_20260608.md (batch resilience plan) Verification: - test_full_live_workflow in isolation: 13.45s PASS (health=True, no degrade) - 4 sims + test_full_live_workflow in batch: 76.46s (1 FAIL fast, 4 sims PASS) - Without PR3 fix: 200s FAIL with confusing 120s timeout - With PR3 fix: 76s FAIL with clear 'GUI is degraded' message - The fast-fail is observable, not silent (per user's 'wrap might be worth it if that properly lets us handle the assert')	2026-06-08 21:17:54 -04:00
conductor-tier2	8a597d1832	conductor(track-update): mcp_architecture_refactor - list_tool_schemas + security-as-contract 4 surgical additions to the spec, no task changes: 1. list_tool_schemas on the SubMCP Protocol: Added the method to §3.1 (The SubMCP Protocol). Per nagent_review Pitfall #6 (hard-coded tool discovery) and takeaway #5 (self-describing tools), each sub-MCP advertises its own capabilities via list_tool_schemas() rather than relying on a central registry. This is the equivalent of nagent's collect_bin_tool_descriptions per sub-MCP. The MCPController.get_tool_schemas() becomes a simple aggregator. 2. Security model is the contract: Added a new Important note to §3.3 (The 3-Layer Security Model). The 3 layers (Allowlist Construction -> Path Validation -> Resolution Gate, per docs/guide_mcp_client.md) are not just refactored - they are the CONTRACT between MCPController and the sub-MCPs. Sub-MCPs receive a pre-validated Path and trust it. They do NOT re-validate. The refactor is structural, not security-changing. 3. Docs touchpoint in Phase 7: Added the docs touchpoint to Phase 7 per the docs Refresh Protocol. The update to docs/guide_mcp_client.md should add a Sub-MCP Architecture section, link the list_tool_schemas pattern to 3-Layer Security Model, and cross-link the 3 new guides from the 2026-06-08 docs refresh. 4. See Also cross-references: Added 8 new entries to §12.2: - docs/guide_context_aggregation.md (FileItem consumer) - docs/guide_state_lifecycle.md (App state delegation) - docs/guide_discussions.md (23-operation matrix) - conductor/tracks/qwen_llama_grok_integration_20260606/ (Result return type coordination) - conductor/tracks/nagent_review_20260608/{report,takeaways}.md - (2 specific data_oriented_error_handling and data_structure_strengthening cross-refs) No plan.md changes.	2026-06-08 20:59:27 -04:00
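The self-describing pattern in item 1 above can be sketched roughly as follows. `SubMCP`, `list_tool_schemas`, and `MCPController.get_tool_schemas` are the names used in the commit message; the two concrete sub-MCP classes and the schema dictionary shape are hypothetical, chosen only for illustration.

```python
from typing import Any, Protocol


class SubMCP(Protocol):
    """Structural interface: anything with list_tool_schemas() qualifies."""

    def list_tool_schemas(self) -> list[dict[str, Any]]: ...


class FileToolsMCP:
    """Hypothetical sub-MCP; the schema shape is an assumption."""

    def list_tool_schemas(self) -> list[dict[str, Any]]:
        return [{"name": "read_file", "parameters": {"path": "string"}}]


class SearchToolsMCP:
    """Second hypothetical sub-MCP, to show aggregation order."""

    def list_tool_schemas(self) -> list[dict[str, Any]]:
        return [{"name": "grep", "parameters": {"pattern": "string"}}]


class MCPController:
    def __init__(self, sub_mcps: list[SubMCP]) -> None:
        self._sub_mcps = sub_mcps

    def get_tool_schemas(self) -> list[dict[str, Any]]:
        # Simple aggregator: concatenate whatever each sub-MCP advertises.
        schemas: list[dict[str, Any]] = []
        for sub in self._sub_mcps:
            schemas.extend(sub.list_tool_schemas())
        return schemas


controller = MCPController([FileToolsMCP(), SearchToolsMCP()])
tool_names = [schema["name"] for schema in controller.get_tool_schemas()]
```

With this shape, adding a tool means touching only the sub-MCP that owns it; the controller needs no central registry entry.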
conductor-tier2	1fb0d79c0d	conductor(track-update): data_structure_strengthening - HistoryMessage vs ProviderHistoryMessage split 4 surgical additions to the spec, no task changes: 1. ProviderHistoryMessage: Added a new alias to §3.1 (The Aliases). Per nagent_review Pitfall #4 (provider history divergence), the UI/curation layer (HistoryMessage, edited via disc_entries[i].content) and the SDK layer (ProviderHistoryMessage, the bytes actually replayed to the LLM) are distinct. Conflating them via a single alias perpetuates the bug. The new alias is documented as a separate concept with its own use sites (_anthropic_history, _deepseek_history, _minimax_history, _grok_history, _llama_history). The follow-up public_api_migration_20260606 track is the natural moment to unify the two layers; this spec just makes the distinction explicit. 2. FileItem alias points to the existing models.FileItem dataclass, not Metadata. Per docs/guide_context_aggregation.md (added 2026-06-08), FileItem is a 9-field dataclass (path, auto_aggregate, force_full, view_mode, selected, ast_signatures, ast_definitions, ast_mask, custom_slices, injected_at) with a __post_init__ normalizer. Aliasing it to dict[str, Any] would lose the type safety. The 9 other aliases remain dict aliases for round-trip compatibility. 3. gui_2.py and mcp_client.py as follow-up: Added a Note (dated 2026-06-08) to the Out of Scope section. The 23 lower-impact files (deferred) are dominated by gui_2.py (26+ weak sites per guide_state_lifecycle.md) and mcp_client.py (will be touched heavily by the parallel mcp_architecture_refactor_20260606). The deferral is correct but the follow-up should explicitly call out these two files as the next targets, rather than implying they're handled. 4. 
See Also cross-references: Added 7 new entries to §12.2: - docs/guide_models.md (FileItem dataclass source) - docs/guide_context_aggregation.md (FileItems consumer) - docs/guide_discussions.md (HistoryMessage shape) - docs/guide_state_lifecycle.md (state delegation) - conductor/tracks/mcp_architecture_refactor_20260606/ - conductor/tracks/nagent_review_20260608/{report,takeaways}.md No plan.md changes.	2026-06-08 20:50:50 -04:00
ed	1c565da7a0	feat(gui): wrap immapp.run in try/except + add /api/gui_health endpoint PR2 of the test_full_live_workflow_imgui_assert fix sequence. When an ImGui scope mismatch (IM_ASSERT(Missing End())) fires in immapp.run (e.g. after cumulative state corruption from prior sims' panel renders), the RuntimeError propagates out of app.run(). The controller's _io_pool gets shut down via __del__/finalization. The hook server (separate ThreadingHTTPServer) survives. Subsequent test clicks fail with 'cannot schedule new futures after shutdown' and the test times out after 120s with no clear signal of what went wrong. This commit: 1. Wraps immapp.run in try/except RuntimeError in gui_2.py:618. On assertion: logs the error to stderr (NOT silent), records it on controller._gui_degraded_reason and _last_imgui_assert, and returns from run() so the hook server keeps serving. 2. Adds _gui_degraded_reason and _last_imgui_assert to AppController.__init__ (initialized to None). 3. Adds /api/gui_health endpoint in api_hooks.py:148. Returns {healthy, degraded_reason, last_assert, io_pool_alive}. 4. Adds ApiHookClient.get_gui_health() with the matching unit tests (3 mocked tests + 1 live test). Per user feedback 2026-06-08: - The wrap does NOT silently swallow the error. It logs at ERROR level and surfaces it via the health endpoint. - Tests can call client.get_gui_health() to detect a degraded GUI and fail fast with a clear message. TDD: tests written first, confirmed to fail, then fix applied. 34/34 unit tests pass. 1/1 live test passes (live_gui health endpoint reports healthy=True on fresh subprocess).	2026-06-08 20:46:41 -04:00
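A minimal sketch of the wrap-and-report idea described above, under stated assumptions: `render_loop` is a hypothetical stand-in for the `immapp.run` call, `Controller` stands in for `AppController`, and `io_pool_alive` is tracked as a plain flag here. Only the two field names and the four health keys come from the commit message.

```python
import sys
import traceback
from typing import Callable, Optional


class Controller:
    """Hypothetical stand-in holding the two fields named in the commit."""

    def __init__(self) -> None:
        self._gui_degraded_reason: Optional[str] = None
        self._last_imgui_assert: Optional[str] = None
        self.io_pool_alive = True  # assumption: a plain flag in this sketch


def run_gui(controller: Controller, render_loop: Callable[[], None]) -> None:
    """Run the loop; on RuntimeError, log and record it, then return normally."""
    try:
        render_loop()
    except RuntimeError as exc:
        # Not silent: the error goes to stderr and is kept for the health check.
        print(f"ERROR: GUI loop aborted: {exc}", file=sys.stderr)
        controller._gui_degraded_reason = str(exc)
        controller._last_imgui_assert = traceback.format_exc()


def gui_health(controller: Controller) -> dict:
    """Build a payload with the four keys listed in the commit message."""
    return {
        "healthy": controller._gui_degraded_reason is None,
        "degraded_reason": controller._gui_degraded_reason,
        "last_assert": controller._last_imgui_assert,
        "io_pool_alive": controller.io_pool_alive,
    }


def failing_loop() -> None:
    raise RuntimeError("IM_ASSERT(Missing End())")


ctrl = Controller()
before = gui_health(ctrl)
run_gui(ctrl, failing_loop)
after = gui_health(ctrl)
```

Because `run_gui` returns instead of re-raising, a separate server thread could keep answering health requests after the loop dies, which is the behavior the commit describes for the hook server.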
conductor-tier2	0471440c68	conductor(track-update): data_oriented_error_handling - nagent_review + docs refresh 3 surgical additions to the spec, no task changes: 1. New ErrorKind: Added PROVIDER_HISTORY_DIVERGED_FROM_UI to the ErrorKind enum. Per nagent_review Pitfall #4 (provider history divergence: user edits disc_entries[i].content via the discussion UI but ai_client._<provider>_history still replays the original). The new kind makes the divergence detectable and reportable so the follow-up public_api_migration_20260606 track can collapse the two history layers. The Result pattern from this track is the natural carrier for the signal. 2. State-delegation regression tests: Added mandatory regression tests to the testing strategy in §6 for the ai_client refactor (highest-risk phase). The new tests exercise: - app.temperature = 0.5 round-trips through App.__getattr__/__setattr__ delegation (per gui_2.py:666-675) - controller.disc_entries[i].content is reflected in the next send_result()'s messages parameter - The 3 per-provider history locks serialize correctly under concurrent send_result() calls The reason this is mandatory: per guide_state_lifecycle.md (added 2026-06-08), the App.__getattr__/__setattr__ pattern means a partial refactor manifests as silent AttributeError deep in test code, not at the refactor commit boundary. 3. See Also cross-references: Added 6 new entries to §12.3: - docs/guide_ai_client.md (per-provider history globals) - docs/guide_mcp_client.md (3-layer security model) - docs/guide_state_lifecycle.md (3 per-thread + 7-lock pattern) - docs/guide_discussions.md (23-operation matrix) - docs/guide_context_aggregation.md (build_discussion_section) - conductor/tracks/mcp_architecture_refactor_20260606/ - conductor/tracks/nagent_review_20260608/{report,takeaways}.md No plan.md changes. Plan tasks are task-level and will flow from the spec changes when the track is re-planned.	2026-06-08 20:41:00 -04:00
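One way the new error kind could make the divergence detectable is sketched below. `ErrorKind`, `PROVIDER_HISTORY_DIVERGED_FROM_UI`, and `ErrorInfo` are names from these commit messages; the `ErrorInfo` fields, the `check_history_sync` helper, and the flat list-of-strings representation of both histories are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ErrorKind(Enum):
    PROVIDER_HISTORY_DIVERGED_FROM_UI = auto()


@dataclass
class ErrorInfo:
    """Hypothetical shape; the real fields are not given in the commit."""
    kind: ErrorKind
    message: str


def check_history_sync(ui_contents: list[str],
                       provider_contents: list[str]) -> Optional[ErrorInfo]:
    """Return an ErrorInfo if the UI entries differ from what would be replayed."""
    for i, (ui, sent) in enumerate(zip(ui_contents, provider_contents)):
        if ui != sent:
            return ErrorInfo(
                ErrorKind.PROVIDER_HISTORY_DIVERGED_FROM_UI,
                f"entry {i} was edited in the UI but the provider history "
                "still holds the original",
            )
    if len(ui_contents) != len(provider_contents):
        return ErrorInfo(ErrorKind.PROVIDER_HISTORY_DIVERGED_FROM_UI,
                         "entry counts differ")
    return None


in_sync = check_history_sync(["hello", "world"], ["hello", "world"])
diverged = check_history_sync(["hello", "edited"], ["hello", "world"])
```

Returning the error as data rather than raising matches the Result pattern the commit names as the carrier for this signal.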
conductor-tier2	77ae2ec7a8	conductor(track-update): qwen_llama_grok - spec notes for nagent_review + docs refresh 4 surgical additions to the spec, no task changes: 1. Result return type: Added a coordination note in §3.1 (Data-Oriented Design) explaining that the shared send_openai_compatible helper should return Result[NormalizedResponse, ErrorInfo] from day 1, not NormalizedResponse + ProviderError raise. This is so the downstream data_oriented_error_handling_20260606 track is a small mechanical pass over new code, not a second migration. References nagent_review Pitfall #4 (provider history divergence) and the ErrorKind.PROVIDER_HISTORY_DIVERGED_FROM_UI use case. 2. Declarative read, not behavioral dispatch: Added clarification to §6 (UX Adaptation) that the capability matrix is a read of declarative data, not a new dispatch layer. Per nagent_review Pitfall #1 (opaque function calling in the Application is the correct choice; nagent-style protocol is for Meta-Tooling), UI elements are visible/enabled/disabled/hidden but the behavior they invoke is unchanged. Three concrete examples added: screenshot button, cost panel, cache panel. 3. PROVIDERS source of truth: Added a NOTE in §3.2 (Module Layout) that src/models.py:79-86 PROVIDERS is the existing single source of truth for the (vendor, model) enumeration. The capability registry reads from this constant rather than introducing a parallel list. Cross-references docs/guide_models.md. 4. Docs touchpoint: Expanded Phase 6 (Docs + Archive) in §9 to note that docs/guide_ai_client.md needs the new providers + the shared helper documented, and that docs/guide_context_aggregation.md (added 2026-06-08) is the reference for the aggregate.py pipeline that all new providers use. 5. See Also cross-references: Added 3 new entries to §13.2: - docs/guide_context_aggregation.md (the new pipeline guide) - conductor/tracks/nagent_review_20260608/report.md (§1, §5, §15) - conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md (§1, §2, §9) No plan.md changes. Plan tasks are task-level and will flow from the spec changes when the track is re-planned.	2026-06-08 20:35:52 -04:00
ed	d7a065e9d5	ascii gui comms workflow ideation	2026-06-08 20:32:42 -04:00
conductor-tier2	161ebb0da6	docs(fix): correct nav link case + relative-path level Gitea (and any case-sensitive filesystem) was rendering the [Top] nav links in /docs as broken because of two bugs: 1. Case-sensitivity: 22 links used '../README.md' (all-uppercase) but the actual file is 'docs/Readme.md' (capital R, lowercase rest). 21 guide_*.md nav bars were affected, plus 1 internal cross-link in Readme.md itself. Works on Windows (case-insensitive) but broken on Linux/Gitea. Fix: 22 occurrences across 22 files changed '../README.md' -> '../Readme.md' 2. Wrong relative-path level: 16 links used '../../conductor/...' from 'docs/guide_*.md' to reach 'conductor/'. This goes up 2 levels to 'projects/', which doesn't exist. The correct path from 'docs/guide_*.md' to 'conductor/' is 1 level up ('../conductor/...'). 12 unique patterns across 10 files affected. Fix: 16 occurrences across 10 files changed '../../conductor/' -> '../conductor/' 3. Bonus: 1 planned-guide link in guide_context_curation.md referenced a never-written 'guide_context_presets.md'. The ContextPreset schema is now fully covered in the new 'guide_context_aggregation.md' (per the 2026-06-08 docs refresh). Fix: link target updated. No content was changed, only link paths. 24 files, 37 link replacements, 37 deletions. Verification: - All .md links in docs/ now resolve to existing files (validated by path-resolution check from each file's directory) - The 3 new guides from the previous docs refresh commit (guide_discussions.md, guide_state_lifecycle.md, guide_context_aggregation.md) had the case bug inherited from guide_architecture.md's existing nav pattern; their top-of-file nav bars are now correct - The 21 pre-existing guide nav bars that had the same bug (all 21 of them, except the 3 that used the correct case: guide_mma.md, guide_simulations.md, guide_tools.md) are now also fixed - Inter-guide links (e.g. [Discussions](guide_discussions.md)) were not affected; they were always correct because both the link text and the actual filename are lowercase This is a docs-only fix. No code modified.	2026-06-08 19:51:55 -04:00
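A path-resolution check of the kind mentioned under "Verification" might look like the sketch below. It is not the script that was actually used: `broken_links`, the regex, and the three-file repository layout are hypothetical. It resolves each relative link from the linking file's directory and compares it case-sensitively against a set of known files.

```python
import posixpath
import re

# Capture the target of [text](target), stopping at ')', '#', or whitespace.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")


def broken_links(md_path: str, text: str, repo_files: set[str]) -> list[str]:
    """Return relative link targets in text that do not resolve to a known file."""
    base = posixpath.dirname(md_path)
    broken = []
    for target in LINK_RE.findall(text):
        if "://" in target:
            continue  # absolute URLs are out of scope for this check
        resolved = posixpath.normpath(posixpath.join(base, target))
        # Exact string comparison keeps the check case-sensitive, like Linux.
        if resolved not in repo_files:
            broken.append(target)
    return broken


# Hypothetical layout, for illustration only.
files = {"Readme.md", "docs/guide_mma.md", "conductor/workflow.md"}
sample = (
    "[Top](../README.md) [MMA](guide_mma.md) "
    "[Bad](../../conductor/workflow.md) [Good](../conductor/workflow.md)"
)
result = broken_links("docs/guide_tools.md", sample, files)
```

In this sample the wrong-case link and the link that climbs two levels are both reported, while the sibling link and the one-level link resolve.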
conductor-tier2	ba05168493	docs(refresh): 3 new guides + cross-links from nagent_review Per the docs Refresh Protocol (conductor/workflow.md), after a reference/analysis track ships, the affected guides must be updated to reflect new module structure or new conventions. The nagent_review track (`9cc51ca9`) produced a deep-dive + 10 actionable takeaways that named 3 documentation gaps in /docs. This commit fills them. 3 new guides (1,122 lines total): 1. guide_discussions.md (353 lines) — The Discussion system - 23-operation matrix: A1-A7 per-entry + B1-B11 discussion-level + C1-C5 undo/redo - Take naming convention (<base>_take_<n>), branching, promotion - User-managed role list (app.disc_roles) - Per-role filter linked to MMA persona focus - _disc_entries_lock thread-safety contract - Hook API session endpoints - Persistence: _flush_to_project, _flush_disc_entries_to_project, context_snapshot - 9 file:line refs into gui_2.py:3770-4260 + history.py 2. guide_state_lifecycle.md (375 lines) — Undo/redo + reset + state delegation - HistoryManager + UISnapshot (13 captured fields, 100-snapshot capacity, debounced change-detection at render frame) - _handle_reset_session (clears 30+ fields, replaces project, preserves active_project_path per the 2026-06-08 regression fix) - App.__getattr__/__setattr__ state delegation to Controller - 4-thread access pattern with 7 lock-protected regions - State persistence: in-memory vs project TOML vs config TOML - Hot-reload integration - Hook API registries (_predefined_callbacks, _gettable_fields) - 14 file:line refs into gui_2.py:1140-1170, history.py, app_controller.py:3286-3356 3. guide_context_aggregation.md (394 lines) — The aggregate.py pipeline - 3 aggregation strategies (auto, summarize, full) - 7 per-file view modes (full, summary, skeleton, outline, masked, custom, none) - Full FileItem schema (9 fields + __post_init__ normalizer) at models.py:510-559 - ContextPreset schema and ContextPresetManager - Tier 3 worker variant (build_tier3_context with FuzzyAnchor re-resolution and focus-file handling) - force_full / auto_aggregate short-circuits - Cache strategy (static prefix + dynamic history) - 23 file:line refs into aggregate.py:36-518 + models.py:909-937 8 existing guides cross-linked to the 3 new guides and to the nagent_review track: - guide_gui_2.md (+ See Also entries for discussions, state lifecycle, context aggregation, nagent_review report) - guide_app_controller.md (+ See Also entries for discussions, state lifecycle, context aggregation, nagent_review report) - guide_context_curation.md (+ new See Also section pointing to context aggregation + nagent_review) - guide_architecture.md (+ new See Also section listing all 10 guides + nagent_review report) - guide_ai_client.md (+ See Also entries for state lifecycle, context aggregation, nagent_review pitfalls #2 and #4) - guide_mma.md (+ new See Also section pointing to context aggregation, discussions, nagent_review report §9 + takeaways §3/§10 for SubConversationRunner priority) - guide_models.md (+ See Also entries for context aggregation, discussions, nagent_review report §6 on FileItem as strongest curation dimension) - Readme.md (+ 3 new guide entries in the index table, with one-line summaries) No code modified. This is documentation only. Why these 3 guides specifically: - guide_discussions.md: The discussion system is the user's most edited surface. nagent_review's report §3 enumerated 23 operations (A1-C5) that previously existed only as scattered file:line refs across gui_2.py. A dedicated guide makes the operation matrix discoverable. - guide_state_lifecycle.md: The undo/redo + reset + state delegation machinery is architecturally load-bearing but scattered across 4 files. After nagent_review identified the provider-side history divergence as Pitfall #4, the relationship between Manual Slop's state and the provider's state needs explicit documentation. - guide_context_aggregation.md: aggregate.py (518 lines) is the most-touched module after ai_client.py but had no dedicated guide. nagent_review confirmed it's Manual Slop's strongest curation dimension. A dedicated guide makes the 7 view modes and 3 strategies discoverable. The 3 new guides total 1,122 lines and follow the existing per-source-file deep-dive style (architectural, data-oriented, state-management-focused).	2026-06-08 19:26:08 -04:00
conductor-tier2	9cc51ca9af	conductor(track): nagent review - deep-dive + 6 pitfalls + 10 actionable takeaways Reference/analysis track. Produces 0 code changes. Artifacts (conductor/tracks/nagent_review_20260608/): - spec.md (240 lines) - track wrapper with Application/Meta-Tooling framing - report.md (571 lines) - 14-section deep-dive; primary deliverable - comparison_table.md (79 lines) - flat side-by-side reference - decisions.md (286 lines) - 10 future-track candidates with priority matrix - nagent_takeaways_20260608.md (363 lines) - 10 actionable patterns grounded in code (file:line refs into nagent source and Manual Slop source) - metadata.json (132 lines) - structured metadata + verification criteria - state.toml (113 lines) - per-task tracking + user-corrections log (7 entries) 14 nagent principles covered in report.md (durable work, text-in/text-out, editable state, visible protocol, the loop, per-file memory, repo history, neighborhoods, sub-conversations, controlled writes, large files, tool discovery, framework differences, build your own). 6 pitfalls (revised from 8 after user-corrections): 1. No structured output protocol in Application AI (opaque function calling) 2. Provider-specific history in process globals (ai_client._anthropic_history + _deepseek_history + _minimax_history) 3. RAG is not 'history as data' (fuzzy, not auditable) 4. AI client is a stateful singleton (2,685-line ai_client.py) 5. No non-MMA disposable sub-conversations (1:1 gap; user-flagged want) 6. Hard-coded tool discovery (45-tool if/elif in mcp_client.py) User-corrections applied (3 rounds, 7 total corrections recorded): - Editable discussions: PARTIAL -> PARITY (DIFFERENT FOCUS) with full A1-A7 per-entry + B1-B11 discussion-level + C1-C5 undo/redo operation matrix - Per-file memory: DOMAIN MISMATCH -> MANUAL SLOP IS STRONGER IN CURATION DIMENSION (FileItem + ContextPreset vs nagent's inode-keyed conversation log; complementary, not equivalent) - Sub-conversations: MMA has it; 1:1 does not -> 'PARITY for MMA; GAP for 1:1 discussions' (user wants this) - RAG: opt-in, not gap; user wants pre-staging via sub-conversation - Personas: config bundling (can opt out via AI settings) - Tool discovery: deferred (user has 'intent based DSL' idea but 'no where near that ideation yet') 10 actionable takeaways (separate from the 6 pitfalls - those are diagnosis, these are prescription): 1. State visibility (UI inspector for in-process state) 2. Readable conversation log (text-greppable, not just JSON-L) 3. Sub-agents for 1:1 (HIGH priority - user-flagged) 4. File-identity over file-path (st_dev:st_ino rename-safe) 5. One loop shape visible in diagnostics 6. Visible retry on protocol failure 7. Meta-Tooling DSL (intent-based, deferred) 8. Self-describing tools (subsumed by mcp_architecture_refactor_20260606) 9. Single source of truth for disc_entries + provider history 10. Sub-agent return type constraint (bake into candidate #1 spec) Domain classification: every recommendation tagged Application / Meta-Tooling / Both per docs/guide_meta_boundary.md. nagent lives in the Meta-Tooling domain; Manual Slop's Application AI is a different kind of thing. No code modified by this track (reference/analysis only). All 7 files parse cleanly (JSON, TOML, Markdown). All internal cross-links resolve. Track is 'active' awaiting human review; future-track candidates live in decisions.md and nagent_takeaways_20260608.md.	2026-06-08 18:44:35 -04:00
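Takeaway 4 (file-identity over file-path) refers to keying a file by its device and inode numbers rather than its path. A minimal sketch using only `os.stat` follows; the `file_identity` helper and the file names are hypothetical, not code from either project.

```python
import os
import tempfile


def file_identity(path: str) -> str:
    """Return a 'st_dev:st_ino' key that stays stable when the file is renamed."""
    st = os.stat(path)
    return f"{st.st_dev}:{st.st_ino}"


with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, "notes.md")
    new_path = os.path.join(tmp, "renamed.md")
    with open(old_path, "w", encoding="utf-8") as fh:
        fh.write("per-file memory\n")

    id_before = file_identity(old_path)
    os.rename(old_path, new_path)  # same filesystem, so the inode is preserved
    id_after = file_identity(new_path)
```

One caveat with this approach: tools that save by writing a temporary file and renaming it over the original produce a new inode, so such a key would not survive that kind of save.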
ed	c9a991bbb8	test(live_workflow): bump project switch wait timeout 30s -> 120s The 30s wait_for_project_switch timeout was an excessive constraint. In batch context, prior sims' AI discussion turn workers saturate the 8-worker io_pool, queueing this switch for tens of seconds. The other defensive waits in the test (warmup 60s, prior switch 60s) already use 60s+, so 30s was the inconsistent outlier. User confirmed: 'I think not completing in 30s is an excessive constraint if thats whats going on.' Verification: - test_full_live_workflow isolation: 11.69s PASS - 7-test batch (test_full_live_workflow + 4 extended sims + 2 markdown): 85.83s PASS	2026-06-08 18:14:18 -04:00
ed	87d7c5bff2	test(io_pool): update assertion for 8-worker pool size	2026-06-08 17:51:39 -04:00
ed	4a33848620	fix(io_pool): increase worker count from 4 to 8 to prevent test hangs Root cause: test_full_live_workflow in batch context (with prior sims running AI discussion turns) would queue its _do_project_switch behind the auto-pruner's scan of tests/logs/ (154MB, 6519 files). The 4-worker pool was saturated, so the switch would never run within 30s. Fix: bump IO_POOL_MAX_WORKERS from 4 to 8. This gives the pool enough capacity to run: 2 pruners + the project switch + 5 spare. Also: add /api/io_pool_status endpoint + get_io_pool_status + wait_io_pool_idle helpers (kept in api_hooks.py and api_hook_client.py for the test_api_hook_client_io_pool.py tests, even though the test itself no longer uses them - they remain useful for future tests that want to assert pool state directly). Also: add wait_for_warmup at the start of test_full_live_workflow to ensure SDK modules are loaded before AI ops. Test verification: - test_full_live_workflow in isolation: 11.83s PASS - test_full_live_workflow in batch (with 4 prior sims): 83.46s PASS - 30/30 related unit tests PASS	2026-06-08 17:49:34 -04:00
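The saturation described in the root cause can be reproduced in miniature with the standard library's `ThreadPoolExecutor`. This is an illustrative sketch only: the task names are hypothetical and the pool below is not the application's io_pool. When every worker is occupied, a newly submitted task stays queued until a worker is freed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

WORKERS = 4  # the pre-fix pool size mentioned in the commit

release = threading.Event()
started = threading.Semaphore(0)


def long_task() -> None:
    started.release()  # signal that a worker picked this task up
    release.wait()     # hold the worker until the demo lets go


pool = ThreadPoolExecutor(max_workers=WORKERS)
blockers = [pool.submit(long_task) for _ in range(WORKERS)]
for _ in range(WORKERS):
    started.acquire()  # wait until every worker is occupied

# Stand-in for the project switch: submitted while the pool is full.
switch = pool.submit(lambda: "switched")
queued_while_saturated = not switch.running() and not switch.done()

release.set()  # free the workers; the queued task can now run
switch_result = switch.result(timeout=5)
pool.shutdown()
```

Raising `max_workers` only moves the point at which this happens, which is consistent with the commit's framing of the bump as added headroom for the pruners and the switch.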
ed	9afc93bce2	fix(app_controller): clear project-switch state in _handle_reset_session When a prior test in the tier-3-live_gui batch leaves a _do_project_switch background thread running, the next test's btn_project_new_automated click sees _project_switch_in_progress=True (from the prior thread) and queues the new path via _project_switch_pending_path. The queued switch is never actually submitted to the io_pool, so is_project_stale() stays True and AI ops (_handle_generate_send) bail with 'project switch in progress; AI ops disabled'. Fix: _handle_reset_session now also clears _project_switch_in_progress, _project_switch_pending_path, and _project_switch_error (under the existing _project_switch_lock). This way, even if the prior background thread is still running, the controller reports an idle state and the new switch can be submitted normally. Also: - src/api_hook_client.py: reverted wait_for_project_switch to require in_progress=False (was relaxed to return on queued path, which misled the caller into thinking the switch was done) - tests/test_handle_reset_session_clears_project.py: new test test_handle_reset_session_clears_project_switch_state asserts is_project_stale() returns False after reset - tests/test_api_hook_client_wait_for_project_switch.py: updated test_wait_for_project_switch_does_not_return_on_queued (in_progress + matching path should keep waiting, not return early) - tests/test_live_workflow.py: added pre-wait for any in-flight switch before doing btn_reset (so the test waits up to 60s for the prior switch to complete if needed) - conductor/todos/TODO_test_full_live_workflow.md: updated Task 4 with the deeper hang analysis and recommended fix Known follow-up: test_full_live_workflow still hangs in tier-3 batch even with this fix, because the new _do_project_switch itself is hung in the io_pool (likely saturation from prior sims' AI discussion turn workers). Deeper investigation required.	2026-06-08 15:19:30 -04:00
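The reset behavior described in the fix can be sketched as below. The three field names and `_project_switch_lock` come from the commit message; the `SwitchState` class is a hypothetical stand-in for the controller, and the body of `is_project_stale` is an assumption about how staleness is derived from those fields.

```python
import threading
from typing import Optional


class SwitchState:
    """Hypothetical stand-in for the controller's project-switch bookkeeping."""

    def __init__(self) -> None:
        self._project_switch_lock = threading.Lock()
        self._project_switch_in_progress = False
        self._project_switch_pending_path: Optional[str] = None
        self._project_switch_error: Optional[str] = None

    def is_project_stale(self) -> bool:
        # Assumption: stale means a switch is running or one is queued.
        with self._project_switch_lock:
            return (self._project_switch_in_progress
                    or self._project_switch_pending_path is not None)

    def handle_reset_session(self) -> None:
        # Clear all three fields under the lock so the state reads as idle,
        # even if an earlier background switch thread is still running.
        with self._project_switch_lock:
            self._project_switch_in_progress = False
            self._project_switch_pending_path = None
            self._project_switch_error = None


state = SwitchState()
state._project_switch_in_progress = True          # left over from a prior test
state._project_switch_pending_path = "new.toml"   # queued but never submitted
stale_before = state.is_project_stale()
state.handle_reset_session()
stale_after = state.is_project_stale()
```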
ed	5087ee988d	chore: move TODO_test_full_live_workflow.md to conductor/todos/ Following the conductor convention of organizing track-related artifacts under conductor/. The TODO tracks the test_full_live_workflow race condition fix and its follow-up items (Tasks 3, 7 still pending; known batch hang documented). Tasks 1, 2 (with regression fix), 4, 5, 6 are SHIPPED in prior commits.	2026-06-08 14:05:40 -04:00
ed	3391e18f64	chore(pyproject): register pytest.mark.live marker Silences the PytestUnknownMarkWarning emitted by test_visual_mma.py and test_visual_sim_gui_ux.py (3 instances). The @pytest.mark.live mark already exists in the test files; pyproject.toml just didn't know about it. - pyproject.toml: added 'live: marks tests as live visualization tests (not in CI by default)' to [tool.pytest.ini_options].markers	2026-06-08 13:59:18 -04:00
ed	d09f70ea44	docs(todo): mark Tasks 4+5 as SHIPPED; note known batch hang issue	2026-06-08 13:37:13 -04:00
ed	b6972c31de	test(live_workflow): use wait_for_project_switch + defensive file check Replaces the 10x1s blind poll of derived state with a condition-based wait on /api/project_switch_status. Also adds a defensive file existence check that fails fast (within 5s) if the click was dropped or the project creation handler crashed. The new wait surfaces a clear error message ('Project switch did not complete in 30s. Last status: ...') instead of the generic 'Project failed to activate', and exposes _project_switch_error if the controller reported one. - tests/test_live_workflow.py: replaced poll loop (lines 57-65) with wait_for_project_switch + os.path.exists defensive check	2026-06-08 13:26:54 -04:00
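The move from a blind poll to a condition-based wait can be sketched as follows. This is a minimal illustration, not the repository's code: `wait_for`, its parameters, the status dictionary keys, and the fake status source are hypothetical stand-ins for `wait_for_project_switch` and `/api/project_switch_status`.

```python
import time
from typing import Callable, Optional


def wait_for(get_status: Callable[[], dict], timeout: float = 30.0,
             interval: float = 0.05) -> dict:
    """Poll get_status() until it reports in_progress=False or time runs out."""
    deadline = time.monotonic() + timeout
    last: Optional[dict] = None
    while time.monotonic() < deadline:
        last = get_status()
        if last.get("error"):
            # Surface a reported failure right away instead of waiting it out.
            raise RuntimeError(f"Project switch failed: {last['error']}")
        if not last.get("in_progress", True):
            return last
        time.sleep(interval)
    raise TimeoutError(
        f"Project switch did not complete in {timeout:g}s. Last status: {last}"
    )


# Fake status source: in progress for two polls, then done.
_polls = iter([
    {"in_progress": True},
    {"in_progress": True},
    {"in_progress": False, "path": "demo.toml"},
])
result = wait_for(lambda: next(_polls), timeout=2.0, interval=0.01)
```

Unlike a fixed loop of sleeps over derived state, the wait returns as soon as the condition holds and carries the last observed status into the timeout message.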


2816 Commits