manual_slop

Private

Public Access

Author	SHA1	Message	Date
ed	8742c977e7	docs(tracks): add status note to Qwen track entry pointing to follow-up Adds a status line to the qwen_llama_grok_integration_20260606 entry in conductor/tracks.md noting that: - Phases 1-5 are done; Phase 6 (docs) is in progress - The track is NOT being archived (per user directive) - A 5-phase follow-up track exists at conductor/tracks/qwen_llama_grok_followup_20260611/ - An audit report is at docs/reports/qwen_llama_grok_followup_audit_20260611.md - 50/79 tasks done; the remaining gaps are documented	2026-06-11 09:33:39 -04:00
ed	691dc584eb	docs(phase-6): update ai_client+models guides; report + follow-up track setup Phase 6 t6.1 + t6.2 (no archive per user directive): - docs/guide_ai_client.md: update Overview to mention 8 providers (was 5); add 'Shared OpenAI-Compatible Helper' section explaining src/openai_compatible.py (NormalizedResponse, OpenAICompatibleRequest, send_openai_compatible, usage pattern); document the Qwen adapter and Llama multi-backend. - docs/guide_models.md: update PROVIDERS list to 8 entries (was 5). - conductor/tracks.md: update the Qwen track entry to reflect '50/79 tasks done; Phase 6 in progress; NOT archiving - has follow-up'; add detailed status note pointing to the follow-up track + audit report. - docs/reports/qwen_llama_grok_followup_audit_20260611.md: NEW report explaining why a follow-up is needed (7 categories of gaps; the Tech Lead's 'footnote for now' failure mode; the lessons learned). - conductor/tracks/qwen_llama_grok_followup_20260611/: NEW follow-up track setup (spec.md, state.toml, metadata.json, TODO.md). 5 phases: tool loop lift, PROVIDERS move, UX adaptations 2-9, local-first + matrix v2, Anthropic/Gemini/DeepSeek migration. Phase 6 t6.3 (git mv to archive) and t6.4 (mark Recently Completed) are NOT applied per user directive: 'we can then doc this we're not archiving yet, if we have a follow up track I need this one to stay up because there is still alot todo'.	2026-06-11 09:33:18 -04:00
ed	457255bcd4	conductor(plan): mark t5_6 + phase_5 complete; advance to phase 6	2026-06-11 09:15:26 -04:00
ed	b75ae57ef2	docs(spec): footnote 8 remaining UX adaptations (2-9) deferred to follow-up After the end of Phase 5, only adaptation 1 of 9 from spec §6 was applied (Screenshot button iff vision, render_files_and_media:3030). The pattern is established; the remaining 8 are mechanical applications of the same pattern at their respective render sites. The follow-up track applies the wrapping at: - tools toggle (tool_calling) - cache panel (caching) - stream progress (streaming) - fetch models button (model_discovery) - token budget max (context_window) - cost panel (3 cost_tracking states: estimate / 'Free (local)' / '-') The _get_active_capabilities() helper (t5.1) is already in place.	2026-06-11 09:13:55 -04:00
ed	15b3b33081	docs(spec): footnote tool-loop lift follow-up in §13.1.B (in case context expires) As of end of Phase 4, only _send_minimax has a working tool-call loop. Phase 3 (Grok, Llama) and Phase 2 (Qwen) entry points are single-shot; they call send_openai_compatible once and return without executing tool_calls. If the user notices 'tool execution doesn't work for Qwen/Grok/Llama' after Phase 5 ships, the fix is to lift the tool loop into a shared run_with_tool_loop() helper that wraps send_openai_compatible. The 4 existing vendors (_send_anthropic / _send_gemini / _send_gemini_cli / _send_deepseek) already have the same inline duplication, so the lift would also help those. This is a follow-up track, not in scope for qwen_llama_grok_integration_20260606.	2026-06-11 09:04:54 -04:00
ed	ccdfaefd52	conductor(plan): mark Phase 4 fully complete (fix phase_4 SHA, t4_4 status, verification flags, minimax_refactor_stats, openai_compatible_models flag)	2026-06-11 08:57:35 -04:00
ed	fadb4c329b	conductor(plan): mark Phase 4 complete in qwen_llama_grok_integration_20260606	2026-06-11 02:25:36 -04:00
ed	94fe10089e	conductor(plan): mark t3.18 + phase_3 complete; advance to phase 4	2026-06-11 02:06:13 -04:00
ed	9be228f620	conductor(plan): fix duplicates in Phase 3 state; advance t3.18 (checkpoint)	2026-06-11 02:05:07 -04:00
ed	07bac1c6a7	conductor(plan): mark t3.3-t3.7 + t3.14-t3.17 complete (t3.4/t3.15 cancelled: no template)	2026-06-11 02:04:09 -04:00
ed	8e3543d875	docs(spec): revise 'best API per vendor' after Grok consultation Grok's own recommendation (consulted 2026-06-11): 'xAI (Grok) \| xAI official OpenAI-compatible (https://api.x.ai/v1) \| Fully compatible and clean. Supports Grok-2 + Grok-2-Vision. No meaningful unique native surface lost by using the compatible endpoint.' This REVERSES the earlier 'xAI native' correction. The OpenAI- compatible approach for Grok is the canonical full-featured path; the implementation in Phase 3 (OpenAI SDK with base_url=https://api.x.ai/v1 + send_openai_compatible helper) is correct as-is. Updates to the spec: 1. §3.1.1: replaced the 'use xAI native' decision with the confirmed per-vendor table. Qwen=Native, Grok=OpenAI-Compatible (per Grok's own confirmation), MiniMax=OpenAI-Compatible, DeepSeek=OpenAI- Compatible, Ollama=OpenAI-Compatible-in-v1 (native in v2), Meta Llama API=Native (new 4th backend, follow-up), Gemini=Native (follow-up), Anthropic=Native (follow-up). Also added Grok's recommended v2 matrix field expansion: audio, video, grounding, computer_use, local, reasoning/extended_thinking, web_search, x_search, code_execution, file_search, mcp_support, structured_output. 2. §4.3: reverted from 'Grok via xAI (Native REST API)' back to 'Grok via xAI (OpenAI-Compatible) - confirmed 2026-06-11'. The implementation does NOT need a native refactor; the OpenAI SDK at https://api.x.ai/v1 is the canonical approach. Removed the earlier 'caching: true' entry from the registry (since the OpenAI-compat shim doesn't expose prompt_cache_key) and the 'no persistent client' state struct (back to the OpenAI SDK pattern). 3. §13.1.B: renamed from 'Native Vendor APIs' to 'Llama Native APIs (Ollama native + Meta Llama API)' and removed the Grok native refactor item (Grok says OpenAI-compat is fine). Kept the Ollama native + Meta Llama API items + matrix expansion. Clarified that Grok tests do NOT need rewriting; only Llama tests get 2 more (native Ollama, Meta Llama API). Net effect: the Phase 3 work that just shipped (Grok+Llama Green using OpenAI-compat shim) is CORRECT as-is. The implementation matches Grok's actual recommendation. No code rollback needed.	2026-06-11 02:01:08 -04:00
ed	06716252f1	docs(spec): add 'best API per vendor' principle; mark xAI native as target; document follow-ups Three additions to the spec, per the user's architectural correction in this session: 1. NEW section 3.1.1: 'Architectural principle: Use the best API per vendor' — explains why the OpenAI-compatible shim loses vendor- specific features (xAI: prompt_cache_key, reasoning_effort, server- side tools, cost_in_usd_ticks; Ollama: think param, images array, thinking field, structured outputs) and states the principle: 'use each vendor's native SDK or REST API when one exists, falling back to OpenAI-compatible only when no native option exists.' Also notes that the capability matrix IS the aggregate tracker; future native features go into the matrix, and the GUI filters based on it (no per-vendor UI branches). 2. UPDATED section 4.3 (Grok): 'Grok via xAI (Native REST API)' — was 'OpenAI-Compatible'. Now specifies two native endpoints (/v1/chat/completions and /v1/responses), the native features that matter, the updated capability registry (caching=true for Grok via prompt_cache_key), and a 'Phase 3 placeholder behavior' note that this track's Phase 3 ships the OpenAI-compatible Grok as a placeholder. The native refactor is deferred to follow-up B. 3. UPDATED section 13.1: added follow-up track B 'Native Vendor APIs (post-OpenAI-compatible-placeholder)' which documents: - Grok → xAI native REST - Llama (Ollama) → native /api/chat - Llama (Meta Llama API) → new 4th backend (deferred pending verification of Meta's API spec; llama.developer.meta.com/docs/overview returned 400 on fetch this session) - Capability matrix expansion (web_search, x_search, code_execution, file_search, mcp_support, reasoning_effort, structured_output) - Test rewrites (mock requests.post instead of chat.completions.create) This is a docs-only commit; no code changes. The Phase 3 Green work continues with the OpenAI-compatible approach as planned in the existing Red tests (t3.3 Grok + t3.14 Llama), and the follow-up track B handles the native refactor when prioritized.	2026-06-11 01:49:36 -04:00
ed	891c008f0c	conductor(plan): mark t3.1-t3.2 + t3.8-t3.13 complete; advance to t3.3+t3.14 (Green)	2026-06-11 01:42:13 -04:00
ed	4204116c66	conductor(plan): mark t2.11 completed (Phase 2 checkpoint)	2026-06-11 01:36:44 -04:00
ed	4d70dcc7ce	conductor(plan): mark t2.11 + phase_2 complete; advance to phase 3	2026-06-11 01:35:22 -04:00
ed	45d316a0bd	conductor(plan): mark t2.6-t2.10 complete (t2.7 cancelled: no template); advance to t2.11	2026-06-11 01:34:25 -04:00
ed	3940eb36ac	conductor(plan): mark t2.1-t2.5 complete; advance to t2.6 (Green)	2026-06-11 00:53:58 -04:00
ed	d5373e8f94	conductor(plan): mark t1.12 + phase_1 complete; advance to phase 2	2026-06-11 00:48:14 -04:00
ed	67782198b6	conductor(plan): mark t1.11 (dashscope dep) complete; advance to t1.12	2026-06-11 00:46:18 -04:00
ed	f07e616c38	conductor(plan): mark t1.5-t1.10 complete; advance to t1.11	2026-06-11 00:41:11 -04:00
ed	6f11e7da14	conductor(plan): mark t1.1-t1.4 complete; advance to phase 1 in_progress	2026-06-11 00:31:57 -04:00
ed	7d6dbbd371	docs(conductor/index): fix guide count (23->27), update last-refresh date and add docs_sync_test_era_20260610 reference	2026-06-10 23:58:20 -04:00
ed	49ac008a87	docs: replace 2 'fictional' usages with neutral phrasing (predates the refactor / was stale)	2026-06-10 23:34:33 -04:00
ed	e1287a4cf4	conductor(plan): prior_session_sepia_20260610 spec + design + metadata New track for prior-session sepia tint: - 3 new theme slots (prior_session_bg, prior_session_tint, prior_session_amount) - per-palette state dict mirroring _brightness/_contrast/_gamma - apply_prior_tint helper (float-only math per user requirement) - 6 prior-session render sites wrapped (2 bubble_vendor swaps + 4 tint wraps) - Theme Settings panel slider with persistence Code-block tonemap fix is OUT OF SCOPE (upstream imgui_bundle 1.92.5 API only exposes 4-value PaletteId enum, no per-instance struct). See spec §1.1.1 and design doc 'Honest constraint' section.	2026-06-10 23:00:29 -04:00
ed	994ded3598	conductor(tracks): consolidate Phase 6+ chronology (3 recently completed + 4 in plan) The Phase 6+ section had two duplicate '### Active' headers, which made the chronology confusing. The user (paraphrased): preserve the chronology of project progress, don't need full detail, follow the previous restructure's lightweight pattern. Changes: - Add '### Recently Completed (2026-06-06 to 2026-06-10)' subsection containing the 3 closed tracks (startup_speedup, test_batching_refactor, test_infrastructure_hardening) with lightweight entries: per-phase commit SHAs only, 1-line summary, link to spec/plan/state folder. Trimmed the verbose per-sub-track commentary that was in the old startup_speedup entry (the per-sub-track bullets for warmup, status indicator, audit violations, post-shipping fixes are in the archive's spec/plan, not the tracks.md). - Remove the duplicate '### Active' header. - Update section intro to reflect '3 recently completed, 4 in plan' (was '2 already completed, 3 in plan'). - test_infrastructure_hardening entry now has phase commit SHAs (`5df22fa8`, `67d0211e`, `006bb114`, `b8fcd9d6`, `33d5cac`, `7b87bbf5`, `84edb200`, `719fe9a`) instead of just the closing-report link. Chronology is now visible at a glance; per-track full detail is in the linked archive/ folder.	2026-06-10 20:42:00 -04:00
ed	2b0e17ef0c	conductor(track): add docs_sync_test_era_20260610 plan.md and spec.md These were authored at track start but missed by the final-state commit. They are the brief 1-2 page design intent and executable plan for the docs sync track. The closing report at docs/reports/docs_sync_test_era_20260610.md summarizes the actual 17-commit execution.	2026-06-10 20:25:32 -04:00
ed	da240577f9	conductor(track): close docs_sync_test_era_20260610 - state.toml: status active->completed, all 25 tasks marked complete with commit SHAs, all 4 phases checkpointed - metadata.json: status active->shipped, 17-commit list, all 9 verification criteria flipped to DONE	2026-06-10 20:24:31 -04:00
ed	72b237457e	docs(guidelines): add Testing Requirements section with 4 standards - Structural Testing Contract (mirrors workflow.md) - Isolated-Pass Verification Fallacy (Lesson 1, with link to the test_infrastructure_hardening_batch_green_20260610 incident report that motivated the rule) - Audit Scripts as CI Gates (4 scripts: check_test_toml_paths, audit_main_thread_imports, audit_weak_types, audit_no_models_config_io) - Skip Markers Are Documentation, Not Avoidance (workflow.md policy)	2026-06-10 20:20:58 -04:00
ed	965e015709	docs(workflow): add 3 test-hell lessons to Known Pitfalls + Live_gui Test Fragility Known Pitfalls (new subsection): - HARD BAN: git checkout -- <file>, git restore, git reset (per AGENTS.md Critical Anti-Patterns; destroyed user in-progress edits twice on 2026-06-07; concrete 2026-06-10 incident: mma_tier_usage_reset_fix regression) Live_gui Test Fragility (2 new subsections): - Anti-pattern: push_event + time.sleep(N) + assert is a race. Fix: poll-until-state-visible with bounded retries. 5+ tests affected in 2026-06-10 batch-green wave. - Async setters need poll-for-state. mma_state_update and rag_* setters dispatch to _pending_gui_tasks queue; the setter returns before the GUI render loop processes the task. Assert immediately = race. Fix: poll via get_value with bounded retry.	2026-06-10 20:19:54 -04:00
ed	01ea22fc4a	docs(styleguide): add chroma_cache.md — chroma DB path and cleanup pattern Lesson 5 from the 4-day test-hell saga. The chroma cache lives at tests/artifacts/.slop_cache/chroma_<collection>/, NOT at the per-run live_gui_workspace_<timestamp>/ subdir. The trailing-slash bug in Path(active_project_path).parent places the cache one level higher than expected. RAG tests must pre-clean the cache to avoid persistent state from prior batched runs. Documents the cleanup pattern (shutil.rmtree with ignore_errors=True), the auto-recovery mechanism (_validate_collection_dim), and 3 anti-patterns (assuming per-run, not cleaning, asserting on first chunk in batched context).	2026-06-10 20:18:09 -04:00
ed	f0b7c8b7d6	conductor(index): add Test Infrastructure Hardening to Recently Shipped New entry at the top of the Recently Shipped list, linking to the archive/ folder. Includes: - 314/314 green across all 11 tier batches - FR1-FR5 summary - 3 lineage tracks also archived - The 4 unblocked tracks - Link to the closing batch-green report	2026-06-10 20:16:17 -04:00
ed	3945fe37fe	conductor(tracks): archive test_infrastructure_hardening_20260609 in tracks.md - Remove row 1 from Active Tracks table - Update rows 2-5, 17: test_infrastructure_hardening_20260609 -> '(merged)' - Mark test_infrastructure_hardening as [COMPLETE 2026-06-10] [archived] - Update link to use archive/ instead of tracks/ - Add closing note: 314/314 tests green, lineage tracks also archived	2026-06-10 20:15:18 -04:00
ed	5d2624526b	conductor(archive): move 4 test-hell lineage tracks to archive/ - workspace_path_finalize_20260609 -> archive/ (precursor track) - test_infrastructure_hardening_20260609 -> archive/ (main 8-phase track) - mma_tier_usage_reset_fix_20260610 -> archive/ (4 controller bug fixes) - rag_phase4_sync_fix_20260610 -> archive/ (RAG dim-mismatch + rag_config reset) The archive/ directory already existed (71+ archived tracks from earlier phases). The 4 tracks' state.toml + metadata.json were already closed in the prior commit. This just relocates the folders to match the convention referenced in tracks.md.	2026-06-10 20:12:50 -04:00
ed	1ea38ad16b	conductor(track): close 4 test-hell lineage tracks (state + metadata) - test_infrastructure_hardening_20260609: status active->completed, last_updated 2026-06-09->2026-06-10, t7_/t8_ tasks marked complete with commit SHAs (`84edb200`, `719fe9a`, `cb525519`) - mma_tier_usage_reset_fix_20260610: status spec->shipped - rag_phase4_sync_fix_20260610: status spec->shipped - workspace_path_finalize_20260609: status active->completed, current_phase 1->complete, all tasks marked complete (`c725270b`, `93ec2809`), verification flags flipped to true	2026-06-10 20:09:01 -04:00
ed	2c924fe6df	test(infra): poll-for-event race fixes + watchdog timeout bump + spec update	2026-06-10 15:14:35 -04:00
ed	80697e221a	conductor(checkpoint): RAG phase 4 sync fix + test assertion fix - track complete	2026-06-10 13:55:06 -04:00
ed	2ad0d6a3f0	conductor(plan): Update RAG sync fix track state - sync works, retrieval assertion is separate	2026-06-10 13:29:18 -04:00
ed	989b2e6835	conductor(plan): New track for RAG phase 4 sync fix	2026-06-10 12:45:56 -04:00
ed	1772fa8fc2	conductor(checkpoint): Final Phase 2 complete - FR1+FR2 re-applied, sim test passes in batch	2026-06-10 12:13:16 -04:00
ed	14a329c1a9	conductor(plan): Adjust track after catastrophic git checkout - FR1+FR2 reverted, FR3+FR4 were no-ops	2026-06-10 11:45:56 -04:00
ed	c729f8adaf	conductor(plan): Update spec/plan for Phase 2 (live_gui sim test fragility)	2026-06-10 10:12:09 -04:00
ed	e788512d93	conductor(plan): Mark mma_tier_usage_reset_fix_20260610 as complete	2026-06-10 09:59:26 -04:00
ed	d304af5d22	sigh	2026-06-10 08:34:46 -04:00
ed	93ec28097c	docs(styleguide): add workspace_paths.md — hard rule for test workspace paths	2026-06-09 20:36:41 -04:00
ed	39c97cb365	conductor(track): workspace_path_finalize_20260609 - plan with 3 phases, 4-step execution	2026-06-09 20:29:55 -04:00
ed	c725270b99	conductor(track): workspace_path_finalize_20260609 - per-run workspace under tests/artifacts/	2026-06-09 20:27:20 -04:00
ed	5656957622	conductor(plan): Phase 8 complete - docs + audit extended	2026-06-09 17:05:35 -04:00
ed	d2ff6ffcf9	conductor(plan): Phase 7 complete - test_bed_health report	2026-06-09 16:59:16 -04:00
ed	3ed52be4bf	conductor(plan): Phase 6 complete - clean_baseline marker	2026-06-09 16:42:48 -04:00
ed	afc8600800	conductor(plan): Phase 5 complete - set_value hook verified	2026-06-09 16:35:18 -04:00

1 2 3 4 5 ...

1441 Commits