manual_slop

Author	SHA1	Message	Date
ed	7a946544ff	test(mma): mark test_visual_mma_components with clean_baseline	2026-06-09 17:14:23 -04:00
ed	e7da7e0d6a	test(rag): update test for Phase 4 coalescing state	2026-06-09 17:10:33 -04:00
ed	5656957622	conductor(plan): Phase 8 complete - docs + audit extended	2026-06-09 17:05:35 -04:00
ed	719fe9abe7	conductor(checkpoint): Checkpoint end of Phase 8	2026-06-09 17:04:17 -04:00
ed	cb525519cf	docs(testing): document _LiveGuiHandle + live_gui_workspace + clean_baseline marker	2026-06-09 17:03:26 -04:00
ed	749120d239	feat(audit): flag hardcoded workspace and project-root paths in tests	2026-06-09 17:01:14 -04:00
ed	d2ff6ffcf9	conductor(plan): Phase 7 complete - test_bed_health report	2026-06-09 16:59:16 -04:00
ed	84edb20038	docs(report): test_bed_health_20260609 - post-track batch status	2026-06-09 16:58:33 -04:00
ed	1cd3444e4c	test(rag): mark RAG tests with clean_baseline for batch isolation	2026-06-09 16:56:55 -04:00
ed	3ed52be4bf	conductor(plan): Phase 6 complete - clean_baseline marker	2026-06-09 16:42:48 -04:00
ed	7b87bbf5ec	feat(test): clean_baseline marker resets controller state before test	2026-06-09 16:40:18 -04:00
ed	afc8600800	conductor(plan): Phase 5 complete - set_value hook verified	2026-06-09 16:35:18 -04:00
ed	33d5caceaf	fix(api_hooks): verified set_value('ai_input') works in batch	2026-06-09 16:33:55 -04:00
ed	6764c9e12f	conductor(plan): Phase 4 complete - coalesce _sync_rag_engine	2026-06-09 16:27:15 -04:00
ed	b8fcd9d6f5	fix(rag): coalesce _sync_rag_engine calls via token + dirty flag	2026-06-09 16:25:44 -04:00
ed	45b4497a66	conductor(plan): Phase 3 complete - tmp_path_factory + live_gui_workspace fixture	2026-06-09 16:15:50 -04:00
ed	006bb11488	refactor(test): 5 test files use live_gui_workspace fixture instead of hardcoded path	2026-06-09 16:14:40 -04:00
ed	91313451a2	feat(test): expose live_gui_workspace as a separate fixture	2026-06-09 15:53:06 -04:00
ed	c64da95ef5	refactor(test): live_gui workspace via tmp_path_factory	2026-06-09 15:51:35 -04:00
ed	c32ae33817	wip: pre-Phase 3 checkpoint	2026-06-09 15:49:12 -04:00
ed	c3cb3c6e44	feat(test): autouse _check_live_gui_health recovers from degraded subprocess	2026-06-09 15:47:28 -04:00
ed	05ddb45236	conductor(plan): Phase 2 complete - FR1 handle + autouse fixture	2026-06-09 15:43:38 -04:00
ed	67d0211e56	feat(test): autouse _check_live_gui_health recovers from degraded subprocess	2026-06-09 15:42:00 -04:00
ed	16bd3d3a47	refactor(test): wrap live_gui subprocess in _LiveGuiHandle class	2026-06-09 15:37:47 -04:00
ed	30c04860c7	conductor(plan): Phase 1 audit complete - ready for user review	2026-06-09 15:30:31 -04:00
ed	5df22fa8d5	conductor(audit): trace set_value('ai_input') flow to find routing bug	2026-06-09 15:29:27 -04:00
ed	5e13fa9ba7	conductor(audit): document _sync_rag_engine race in controller	2026-06-09 15:29:17 -04:00
ed	aebbd66836	conductor(audit): document hardcoded workspace paths in test suite	2026-06-09 15:29:06 -04:00
ed	d1c6c6c327	conductor(audit): catalog live_gui test cross-file state dependencies	2026-06-09 15:28:56 -04:00
ed	fcb161fd2e	conductor(tracks): add test_infrastructure_hardening_20260609 as foundation track + supersede 4 placeholder test tracks	2026-06-09 15:18:20 -04:00
ed	566cf08cb8	conductor(track): test_infrastructure_hardening_20260609 - spec to kill the test regression nightmare	2026-06-09 15:15:26 -04:00
ed	b4d240a9f3	docs(rag): final report on dim-mismatch recursion fix	2026-06-09 15:04:42 -04:00
ed	40f905d14b	test(rag): update dim-mismatch test to assert rmtree behavior The fix in `644d88ab` changed the recovery path from client.delete_collection to shutil.rmtree (chromadb 1.5.x delete_collection is broken on corrupted state). The test still asserted the old behavior.	2026-06-09 14:50:55 -04:00
ed	644d88ab93	fix(rag): break recursion in _validate_collection_dim The wipe path called self._init_vector_store() which re-invoked _validate_collection_dim, causing infinite recursion (RecursionError) when the dim mismatch test ran with the mock embedding provider. Re-initialize the vector store INLINE after the rmtree wipe so the fresh collection is created without going through the validator again.	2026-06-09 14:47:01 -04:00
ed	f207d297a3	docs(rag): final fix report and next steps	2026-06-09 14:38:30 -04:00
ed	64bc04a6b8	fix(rag): wipe chroma dir on dim mismatch instead of delete_collection When the existing collection has embeddings from a different embedding provider (e.g. Gemini 3072-dim vs local 384-dim), the prior approach of calling client.delete_collection() fails with 'RustBindingsAPI object has no attribute bindings' in chromadb 1.5.x when the underlying state is corrupted. rmtree is reliable and re-creates a fresh empty collection. Also fixes: - 'The truth value of an empty array is ambiguous' on numpy 2.x by using try/except around len() instead of truthiness check - WinError 32 on rmtree by closing the chroma client first Verified: tests/test_rag_phase4_final_verify.py passes in isolation in 7.75s after this fix. The test still fails in batch context due to a separate io_pool race condition (multiple _sync_rag_engine calls collide when the test sets rag_enabled, rag_source, and rag_emb_provider in sequence). The race is in app_controller.py and is out of scope for this defensive fix. Note: tests/test_rag_engine.py has explicit unit tests for test_rag_collection_dim_mismatch_recreates_collection and test_rag_collection_dim_match_preserves_collection which exercise this code path.	2026-06-09 14:37:19 -04:00
conductor-tier2	ac0c0cbe73	docs(styleguide): add No-Diagnostic-Noise rule to AI-Agent Conventions One addition to conductor/code_styleguides/python.md §8 "AI-Agent Specific Conventions": - No diagnostic noise in production code (Added 2026-06-09). `sys.stderr.write(f"[XYZ_DIAG] ...")` lines in src/*.py are technical debt. The right place for one-time investigation output is tests/artifacts/<test>.diag.log (a log file) or a standalone /tmp/diag_<name>.py script. If you must instrument production code, the diag lines are part of the same atomic commit as the fix. - Test files ARE allowed to be diagnostic. The rule applies to src/*.py only; tests/test_*.py may use print(..., file=sys.stderr) freely. Markdown only. No code modified.	2026-06-09 14:03:18 -04:00
conductor-tier2	631c40c9c4	docs(workflow): add Process Anti-Patterns section + Isolated-Pass rule Two additions to conductor/workflow.md §"Known Pitfalls": 1. Isolated-Pass Verification Fallacy (Added 2026-06-09) — the rule that a test passing in isolation but failing in batch is FAILING. The only verification that matters for live_gui tests is the batch run. This is the flip side of the existing "Live_gui Test Fragility (Authoring-Side)" rule. Cross-references that rule. 2. Process Anti-Patterns (Added 2026-06-09) — 8-rule summary list, with cross-reference to AGENTS.md for the full ruleset. The 8 patterns are: Deduction Loop, Report-Instead-of-Fix, Scope-Creep Track-Doc, Inherited-Cruft, Diagnostic Noise in Production, Premature Surrender, Verbose Commit Message, Isolated-Pass Verification Fallacy. Markdown only. No code modified. Cross-references AGENTS.md (the load-bearing agent doc) for the full text of each pattern.	2026-06-09 14:03:00 -04:00
conductor-tier2	d7dc1e3b90	docs(edit-workflow): fix set_file_slice rule + add contract-change check Five surgical fixes to conductor/edit_workflow.md: 1. §2 "Verify Before Editing": removed the leftover `git checkout -- src/gui_2.py` instruction. The user's commit `4eba059e unfuck edit workflow` removed most of the git checkout nuke instructions but missed §2. The revised §2 now says: read the contract (function signature, yield shape, return type) before editing, and DO NOT use `git checkout` to revert. Ask the user. 2. §3 "Reading Before Editing": added the line-number offset check. `set_file_slice` uses 1-indexed inclusive `start_line`/`end_line`; off-by-one is a common silent failure. The rule is now: confirm the exact line range with `get_file_slice` first. 3. §8 "set_file_slice IS Valid for Multi-Line Content (Revised 2026-06-09)": replaced the wrong rule ("Do not use set_file_slice for multi-line content") with the correct rule: set_file_slice IS valid for 3-10 line surgical edits, with a tool-selection guide (which tool for which job), a mandatory contract-change check (search for callers of the symbol being changed; update all callers in the same atomic commit if the public interface changes), and a mandatory whitespace-and-EOL rule (preserve line ending, indentation, and line count). 4. §9 "No Diagnostic Noise in Production Code (Added 2026-06-09)": new section. Diag stderr goes to log files or /tmp scripts, NOT src/*.py. If you must add diag lines to production code, they are part of the same atomic commit as the fix; they do not live uncommitted in the working tree. 5. "If set_file_slice produces wrong indentation": new handler in the Step-by-Step Workflow. Tells the agent: you wrote the wrong indent; the tool did what you asked; re-read the file with get_file_slice; do NOT use git checkout to revert. These are the rule corrections the user demanded after the Tier-2's bad set_file_slice + git nuke + diag-noise behavior. Markdown only. No code modified.	2026-06-09 14:02:41 -04:00
conductor-tier2	113e68fe18	docs(agents): add Process Anti-Patterns section + revise set_file_slice rule The user explicitly called out the bad patterns the agents (Tier-2 and the parent session's Tier-1) have been exhibiting. This commit updates AGENTS.md to filter them out at the load-bearing agent doc level (the first file any agent reads). Four changes: 1. Revised the `set_file_slice` rule on line 38 of the Critical Anti-Patterns. The previous rule said "Do not use set_file_slice for multi-line content"; that was wrong. `set_file_slice` IS valid for multi-line content, provided the agent verifies the exact line range with `get_file_slice` and checks for contract changes (function signature, yield shape, return type). The full revised rule is in `conductor/edit_workflow.md §8`. 2. Added "No diagnostic noise in production code" to the Critical Anti-Patterns. The pattern: agent adds `sys.stderr.write(f"[RAG_DIAG] ...")` to `src/*.py` for debugging, then "reverts everything" but leaves the diag lines uncommitted. Next agent runs git status, sees the diag lines, either commits them by accident or spends 10 min cleaning them up. The rule: diag goes to log files or /tmp scripts, NOT src/*.py. 3. Added "No loop, no scope-creep, no report-instead-of-fix" to the Critical Anti-Patterns. The 200-line status report is a confession, not a fix. The 5-phase "future track" document for a 1-line fix is scope-creep. The "I am not going to attempt another fix without your direction" surrender is allowed ONLY if the agent has already read-predicted-instrumented-run-captured. 4. Added a new section: "Process Anti-Patterns (Added 2026-06-09)" with 8 numbered anti-patterns, each with a Symptom, Rule, and reference. The 8 patterns are the ones the user explicitly called out: Deduction Loop, Report-Instead-of-Fix, Scope-Creep Track-Doc, Inherited-Cruft, Diagnostic Noise in Production, Premature Surrender, Verbose Commit Message, Isolated-Pass Verification Fallacy. These are the rules the user is filtering out of LLM training data noise. The full ruleset is the source of truth; AGENTS.md is the load-bearing entry point. No code modified. Markdown only.	2026-06-09 14:01:26 -04:00
ed	4eba059e89	unfuck edit workflow.	2026-06-09 13:48:17 -04:00
ed	eb8357ec0e	fix(rag): add CWD fallback in index_file for path-resolution resilience RAGEngine.index_file silently returns when the joined base_dir+file_path doesn't exist. This caused the RAG batch test to fail with 0 indexed documents when the live_gui subprocess's active_project_root resolved to a parent dir (e.g. tests/artifacts/) instead of the workspace (tests/artifacts/live_gui_workspace/). The fix: if the primary path doesn't exist, try CWD+file_path. The base_dir takes priority; CWD is a safety net for relative-path resolution across the spawn CWD boundary. This is a defensive fix at the rag_engine layer. It does NOT fix the underlying path-leakage issue in tests/conftest.py (hardcoded Path('tests/artifacts/live_gui_workspace')) which needs a proper fixture refactor. The RAG test still fails in batch due to that deeper issue, documented in docs/reports/rag_test_batch_failure_status_20260609_pm3.md. Behavior: - base_dir+file_path exists: indexed from base_dir (unchanged) - base_dir+file_path missing, CWD+file_path exists: indexed from CWD (new) - Both missing: silently returns (unchanged) Verified: tests/test_rag_index_file_path_fallback.py (3 tests, all pass) - test_index_file_finds_file_via_cwd_fallback - test_index_file_uses_base_dir_first - test_index_file_silently_returns_when_no_match Note: test file was removed before commit because it was being abandoned along with the broader path-hygiene refactor. The fix itself is preserved in src/rag_engine.py.	2026-06-09 12:31:21 -04:00
ed	b801b11c3b	conductor(todo): mark task 9 (test deps in dev + conftest gate) as shipped	2026-06-09 10:39:29 -04:00
ed	a341d7a7c8	test: ensure sentence-transformers is in test env + conftest gate	2026-06-09 10:37:14 -04:00
ed	2148e79a1c	docs(rag): document venv dep install + new failure mode (relative path bug) The venv now has sentence-transformers (installed via uv sync --extra local-rag). The RAG test passes in isolation (7.10s) but fails in batch with a NEW error: 'RAG context not found in history' (test_rag_phase4_final_verify.py:95). This is a SEPARATE bug from the missing-dep issue. The RAG test uses RELATIVE file paths ('final_test_1.txt' instead of absolute). The RAG engine indexes with these relative paths but the CWD is the project root, not the test's workspace dir. Result: 0 docs indexed, 0 chunks retrieved, no '## Retrieved Context' block in history. The fix to _sync_rag_engine (`e62266e8`) is still correct - it surfaces the error when the dep is missing. The dep is now installed, so the sync/index/AI flow runs to completion. The new failure is a deeper RAG test infrastructure bug that needs a separate track to fix.	2026-06-09 10:21:45 -04:00
ed	e62266e868	fix(rag): surface embedding provider init failure as 'error' status The bug: when the local embedding provider fails to initialize (e.g. sentence-transformers not installed), RAGEngine.__init__ leaves self.embedding_provider = None (initialized at line 93 but never overwritten by the failing LocalEmbeddingProvider ctor). The constructor returns. _sync_rag_engine's else branch then sets status to 'ready' - a lie. The RAG panel shows 'ready'. The user triggers a retrieval. The engine either has a broken embedding provider (None) or the retrieval fails silently. The RAG context never appears in the AI's history. The fix: in _sync_rag_engine's _task, after RAGEngine(...) returns, check if engine.embedding_provider is None. If so, set status to 'error: RAG embedding provider failed to initialize' and return early. This prevents: - The engine from being assigned to self.rag_engine - The rebuild being triggered - The status being set to 'ready' / 'indexing' Note: this does NOT make the RAG test pass. The test requires the sentence-transformers package which isn't installed in this env. The fix makes the failure reliable (not flaky) and surfaces the right error message. TDD: 3 tests added in tests/test_rag_engine_ready_status_bug.py: - RAGEngine ctor raises ImportError on missing sentence-transformers - _sync_rag_engine sets status to 'error' (not 'ready') on init failure - RAGEngine ctor leaves embedding_provider=None when init fails All 3 pass. The RAG batch test now fails reliably at line 46 with the clear error message.	2026-06-09 09:39:02 -04:00
conductor-tier2	adc7ff8029	docs(audit): workflow/agent markdown audit with 10 recommendations User asked: is there anything in our workflow or agent markdown that should be updated or introduced based on this session? This commit is the AUDIT ONLY. No workflow files are modified. The 10 recommendations are not yet applied. User picks which to act on, which to defer, which to discard. docs/reports/workflow_markdown_audit_20260608.md (~370 lines): Read all the workflow/agent markdown in scope (AGENTS.md, CLAUDE.md, GEMINI.md, all 5 .agents/skills/*/SKILL.md, the 4 .agents/agents/*.md, conductor/workflow.md, product.md, product-guidelines.md, tech-stack.md, index.md, tracks.md, edit_workflow.md, the 2 existing code_styleguides/*.md, and the 4 .agents/policies/*.toml + 7 .agents/tools/*.json). Cross-referenced each against the 7 new session artifacts (nagent_review, 3 docs guides, ASCII-sketch workflow, SSDL digest, C11 interop v1+v2, 2 new tracks) and the 3 user-correction patterns (duffle-as-style-ref, v2 request/response model, "only under hard constraint"). The 10 recommendations: 1 (HIGH) Update architecture-fallback with new docs; 2 (HIGH) Document ASCII-sketch workflow in workflow.md; 3 (HIGH) Document SSDL digest in product-guidelines.md; 4 (HIGH) Add user_corrections_log to State.toml Template; 5 (MED) Document contingency track pattern; 6 (MED) Update Compaction Recovery to reference session_synthesis; 7 (MED) Document v1->v2 framing iteration anti-pattern; 8 (MED) Document preserve-before-compact archive pattern; 9 (LOW) Document MiniMax understand_image for ASCII verification; 10 (LOW) Document per-proposal commit chain with git notes. 4 HIGH-priority = ~75 min to act on. All 10 = ~2-3 hours. The audit is conservative: it does NOT recommend changing TDD, the per-task commit discipline, the 4-tier MMA model, product.md, tech-stack.md, the existing styleguides, or adding new audit scripts. The session did not surface conflicts with any of these. Meta-pattern: workflow/agent markdown is the theoretical contract; session artifacts are the empirical evidence; when the two diverge, update the theory to match the evidence. This session's evidence (new methodology, new vocabulary, new patterns, new anti-patterns) drives the 10 recommendations.	2026-06-09 09:15:57 -04:00
ed	37b9a68017	docs: add test_infra_hardening foundation + RAG batch failure status Foundation document for the future test_infra_hardening track that will address session-scoped live_gui fixture isolation, silent __getattr__/__setattr__ contract assumptions, and similar test infrastructure fragility. Also documents the test_rag_phase4_final_verify batch failure that surfaces after the __getattr__ fix unblocks test_full_live_workflow. The RAG test failure is NOT a regression - it reproduces on pre-fix HEAD too. It's a pre-existing test isolation issue (the live_gui fixture is session-scoped, so state from the 4 sims pollutes the controller).	2026-06-09 00:26:05 -04:00
ed	bcdc26d0bd	fix(gui): correct __getattr__ to not silently return None for missing ui_ attrs PR1 follow-up (the actual IM_ASSERT root cause fix). The IM_ASSERT in 'MainDockSpace' was triggered by the render_approve_script_modal function (gui_2.py:4895) calling imgui.checkbox with a None value for app.ui_approve_modal_preview. The chain of bugs: 1. AppController.__getattr__ returned None for ANY ui_ attribute (line 1237-1238). This was intended as a safety net for ui_* flags defined in __init__ but it was too generous: it returned None for ui_ attrs that were NEVER set. 2. The pattern in render_approve_script_modal, `if not hasattr(app, 'ui_approve_modal_preview'): app.ui_approve_modal_preview = False` followed by `_, app.ui_approve_modal_preview = imgui.checkbox(..., app.ui_approve_modal_preview)`, relied on hasattr() returning False for unset attrs to trigger the initialization. But the App.__setattr__ checks hasattr(self.controller, name) to decide where to route assignments. The controller's __getattr__ returned None for ui_approve_modal_preview, so hasattr() returned True. The App.__setattr__ routed the assignment to the controller. The controller's __getattr__ then returned None on read, silently dropping the False value. 3. The next line called imgui.checkbox with None, which raised a TypeError. The TypeError propagated out of render_approve_script_modal without closing the modal, leaving the ImGui scope stack unbalanced. The unbalanced scope triggered IM_ASSERT(Missing End()) on the next frame. Fix: AppController.__getattr__ now only returns None for an EXPLICIT allowlist of ui_ attrs that are defined in __init__. For any other missing attribute (including the case 'hasattr() should return False'), it raises AttributeError. The App.__getattr__ was also fixed (per the test) to check hasattr(controller, name) before delegating. This is defense in depth in case other __getattr__ patterns are added. Test verification (TDD red → green): - 1/1 test_app_getattr_hasattr_bug PASSES (verifies hasattr returns False for unset attrs via App.__getattr__) - 1/1 test_app_controller_getattr_ui_bug PASSES (verifies hasattr returns False for unset ui_ attrs on controller) Live verification: - 4 sims + test_live_workflow + 2 markdown tests: 7/7 PASS in 83.15s - Previously failed at 200s+ with 'cannot schedule new futures after shutdown' / 121s with 'GUI is degraded before test starts' - Now passes cleanly. The IM_ASSERT no longer fires. 13/13 related unit tests pass (app_controller_* + app_run_* + app_getattr_*). No regressions in 51/51 io_pool/warmup/sigint/etc. unit tests.	2026-06-08 23:45:25 -04:00
conductor-tier2	999fdea467	docs(c11-interop): cross-reference SSDL digest in See Also The SSDL digest (docs/reports/computational_shapes_ssdl_digest_20260608.md, 504 lines, 30KB) is the theoretical foundation for the chunkification pattern. Per the digest's Technique 5 "Assume-away (Xar)" in §2.2 and the "Xar-style chunked arrays" recommendation in §5.2, the chunkification track is a direct application of the SSDL's "assume as much as possible" lens (§4). This commit adds the SSDL digest to the See Also of the v1+v2 C11-Python interop assessment (front-matter Cross-references line). The same cross-reference is also being added to: - conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md (in a new §6.1 "SSDL alignment" subsection) - conductor/tracks/manual_ux_validation_20260608_PLACEHOLDER/spec.md (in §5 Architectural Reference + §6 See Also + a new §2.6 "SSDL cross-reference" section that distinguishes GUI ASCII vocabulary from SSDL vocabulary) No code modified. Cross-reference only. Also: small update to conductor/tracks.md to add the 2 new tracks (manual_ux_validation_20260608_PLACEHOLDER as Active; chunkification_optimization_20260608_PLACEHOLDER as Backlog/Contingency).	2026-06-08 23:42:21 -04:00
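
The `__getattr__`/`hasattr()` interaction described in commit bcdc26d0bd can be reproduced in isolation. The sketch below is a simplified illustration, not the project's code: `PermissiveController`, `StrictController`, `init_flag`, and the allowlist entry `ui_known_flag` are hypothetical names, and the `App.__setattr__` routing step from the commit message is left out. It only shows why a `__getattr__` that returns None for every missing `ui_` name defeats the `if not hasattr(...)` lazy-init idiom, and how an explicit allowlist restores it.

```python
class PermissiveController:
    """Hypothetical pre-fix behaviour: every missing ui_ attribute reads as None."""

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails.
        if name.startswith("ui_"):
            return None
        raise AttributeError(name)


class StrictController:
    """Hypothetical post-fix behaviour: only allowlisted ui_ names default to None."""

    _UI_DEFAULTS = frozenset({"ui_known_flag"})  # illustrative allowlist only

    def __getattr__(self, name):
        if name in self._UI_DEFAULTS:
            return None
        # Raising AttributeError is what lets hasattr() report False.
        raise AttributeError(name)


def init_flag(obj, name="ui_approve_modal_preview"):
    """Lazy-init idiom from the commit message: set a default only if unset."""
    if not hasattr(obj, name):
        setattr(obj, name, False)
    return getattr(obj, name)


# hasattr() sees the None returned by __getattr__, so the default is never set.
permissive_result = init_flag(PermissiveController())
# hasattr() is False here, so the flag is initialised to False as intended.
strict_result = init_flag(StrictController())
```

With the permissive class the flag stays None, which is the value that would then reach a widget call expecting a bool; with the strict class it is False.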

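
The lookup order described in commit eb8357ec0e (base_dir first, then CWD as a safety net, otherwise a silent return) can be sketched as a small helper. This is an assumption-laden illustration rather than the actual `RAGEngine.index_file` code: `resolve_index_path` and its `cwd` parameter are hypothetical, and the real method indexes the file instead of returning a path.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_index_path(base_dir: str, file_path: str,
                       cwd: Optional[str] = None) -> Optional[Path]:
    """Return the first existing candidate path, or None if neither exists."""
    primary = Path(base_dir) / file_path
    if primary.exists():
        return primary  # base_dir takes priority (unchanged behaviour)
    fallback_root = Path(cwd) if cwd is not None else Path.cwd()
    fallback = fallback_root / file_path
    if fallback.exists():
        return fallback  # CWD safety net (the new behaviour)
    return None  # both missing: caller silently skips the file


# Small demonstration using throwaway directories.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "base"
    cwd = Path(tmp) / "cwd"
    base.mkdir()
    cwd.mkdir()
    (cwd / "only_in_cwd.txt").write_text("x")
    (base / "both.txt").write_text("b")
    (cwd / "both.txt").write_text("c")

    via_fallback = resolve_index_path(str(base), "only_in_cwd.txt", str(cwd))
    via_primary = resolve_index_path(str(base), "both.txt", str(cwd))
    missing = resolve_index_path(str(base), "nowhere.txt", str(cwd))
    results = (via_fallback.parent.name, via_primary.parent.name, missing)
```

Passing `cwd` explicitly keeps the helper testable without changing the process working directory.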