manual_slop

Author	SHA1	Message	Date
ed	457255bcd4	conductor(plan): mark t5_6 + phase_5 complete; advance to phase 6	2026-06-11 09:15:26 -04:00
ed	b75ae57ef2	docs(spec): footnote 8 remaining UX adaptations (2-9) deferred to follow-up After the end of Phase 5, only adaptation 1 of 9 from spec §6 was applied (Screenshot button iff vision, render_files_and_media:3030). The pattern is established; the remaining 8 are mechanical applications of the same pattern at their respective render sites. The follow-up track applies the wrapping at: - tools toggle (tool_calling) - cache panel (caching) - stream progress (streaming) - fetch models button (model_discovery) - token budget max (context_window) - cost panel (3 cost_tracking states: estimate / 'Free (local)' / '-') The _get_active_capabilities() helper (t5.1) is already in place.	2026-06-11 09:13:55 -04:00
ed	15b3b33081	docs(spec): footnote tool-loop lift follow-up in §13.1.B (in case context expires) As of end of Phase 4, only _send_minimax has a working tool-call loop. Phase 3 (Grok, Llama) and Phase 2 (Qwen) entry points are single-shot; they call send_openai_compatible once and return without executing tool_calls. If the user notices 'tool execution doesn't work for Qwen/Grok/Llama' after Phase 5 ships, the fix is to lift the tool loop into a shared run_with_tool_loop() helper that wraps send_openai_compatible. The 4 existing vendors (_send_anthropic / _send_gemini / _send_gemini_cli / _send_deepseek) already have the same inline duplication, so the lift would also help those. This is a follow-up track, not in scope for qwen_llama_grok_integration_20260606.	2026-06-11 09:04:54 -04:00
ed	ccdfaefd52	conductor(plan): mark Phase 4 fully complete (fix phase_4 SHA, t4_4 status, verification flags, minimax_refactor_stats, openai_compatible_models flag)	2026-06-11 08:57:35 -04:00
ed	fadb4c329b	conductor(plan): mark Phase 4 complete in qwen_llama_grok_integration_20260606	2026-06-11 02:25:36 -04:00
ed	94fe10089e	conductor(plan): mark t3.18 + phase_3 complete; advance to phase 4	2026-06-11 02:06:13 -04:00
ed	9be228f620	conductor(plan): fix duplicates in Phase 3 state; advance t3.18 (checkpoint)	2026-06-11 02:05:07 -04:00
ed	07bac1c6a7	conductor(plan): mark t3.3-t3.7 + t3.14-t3.17 complete (t3.4/t3.15 cancelled: no template)	2026-06-11 02:04:09 -04:00
ed	8e3543d875	docs(spec): revise 'best API per vendor' after Grok consultation Grok's own recommendation (consulted 2026-06-11): 'xAI (Grok) | xAI official OpenAI-compatible (https://api.x.ai/v1) | Fully compatible and clean. Supports Grok-2 + Grok-2-Vision. No meaningful unique native surface lost by using the compatible endpoint.' This REVERSES the earlier 'xAI native' correction. The OpenAI-compatible approach for Grok is the canonical full-featured path; the implementation in Phase 3 (OpenAI SDK with base_url=https://api.x.ai/v1 + send_openai_compatible helper) is correct as-is. Updates to the spec: 1. §3.1.1: replaced the 'use xAI native' decision with the confirmed per-vendor table. Qwen=Native, Grok=OpenAI-Compatible (per Grok's own confirmation), MiniMax=OpenAI-Compatible, DeepSeek=OpenAI-Compatible, Ollama=OpenAI-Compatible-in-v1 (native in v2), Meta Llama API=Native (new 4th backend, follow-up), Gemini=Native (follow-up), Anthropic=Native (follow-up). Also added Grok's recommended v2 matrix field expansion: audio, video, grounding, computer_use, local, reasoning/extended_thinking, web_search, x_search, code_execution, file_search, mcp_support, structured_output. 2. §4.3: reverted from 'Grok via xAI (Native REST API)' back to 'Grok via xAI (OpenAI-Compatible) - confirmed 2026-06-11'. The implementation does NOT need a native refactor; the OpenAI SDK at https://api.x.ai/v1 is the canonical approach. Removed the earlier 'caching: true' entry from the registry (since the OpenAI-compat shim doesn't expose prompt_cache_key) and the 'no persistent client' state struct (back to the OpenAI SDK pattern). 3. §13.1.B: renamed from 'Native Vendor APIs' to 'Llama Native APIs (Ollama native + Meta Llama API)' and removed the Grok native refactor item (Grok says OpenAI-compat is fine). Kept the Ollama native + Meta Llama API items + matrix expansion. Clarified that Grok tests do NOT need rewriting; only Llama tests get 2 more (native Ollama, Meta Llama API). Net effect: the Phase 3 work that just shipped (Grok+Llama Green using OpenAI-compat shim) is CORRECT as-is. The implementation matches Grok's actual recommendation. No code rollback needed.	2026-06-11 02:01:08 -04:00
ed	06716252f1	docs(spec): add 'best API per vendor' principle; mark xAI native as target; document follow-ups Three additions to the spec, per the user's architectural correction in this session: 1. NEW section 3.1.1: 'Architectural principle: Use the best API per vendor' - explains why the OpenAI-compatible shim loses vendor-specific features (xAI: prompt_cache_key, reasoning_effort, server-side tools, cost_in_usd_ticks; Ollama: think param, images array, thinking field, structured outputs) and states the principle: 'use each vendor's native SDK or REST API when one exists, falling back to OpenAI-compatible only when no native option exists.' Also notes that the capability matrix IS the aggregate tracker; future native features go into the matrix, and the GUI filters based on it (no per-vendor UI branches). 2. UPDATED section 4.3 (Grok): 'Grok via xAI (Native REST API)' - was 'OpenAI-Compatible'. Now specifies two native endpoints (/v1/chat/completions and /v1/responses), the native features that matter, the updated capability registry (caching=true for Grok via prompt_cache_key), and a 'Phase 3 placeholder behavior' note that this track's Phase 3 ships the OpenAI-compatible Grok as a placeholder. The native refactor is deferred to follow-up B. 3. UPDATED section 13.1: added follow-up track B 'Native Vendor APIs (post-OpenAI-compatible-placeholder)' which documents: - Grok → xAI native REST - Llama (Ollama) → native /api/chat - Llama (Meta Llama API) → new 4th backend (deferred pending verification of Meta's API spec; llama.developer.meta.com/docs/overview returned 400 on fetch this session) - Capability matrix expansion (web_search, x_search, code_execution, file_search, mcp_support, reasoning_effort, structured_output) - Test rewrites (mock requests.post instead of chat.completions.create) This is a docs-only commit; no code changes. The Phase 3 Green work continues with the OpenAI-compatible approach as planned in the existing Red tests (t3.3 Grok + t3.14 Llama), and the follow-up track B handles the native refactor when prioritized.	2026-06-11 01:49:36 -04:00
ed	891c008f0c	conductor(plan): mark t3.1-t3.2 + t3.8-t3.13 complete; advance to t3.3+t3.14 (Green)	2026-06-11 01:42:13 -04:00
ed	4204116c66	conductor(plan): mark t2.11 completed (Phase 2 checkpoint)	2026-06-11 01:36:44 -04:00
ed	4d70dcc7ce	conductor(plan): mark t2.11 + phase_2 complete; advance to phase 3	2026-06-11 01:35:22 -04:00
ed	45d316a0bd	conductor(plan): mark t2.6-t2.10 complete (t2.7 cancelled: no template); advance to t2.11	2026-06-11 01:34:25 -04:00
ed	3940eb36ac	conductor(plan): mark t2.1-t2.5 complete; advance to t2.6 (Green)	2026-06-11 00:53:58 -04:00
ed	d5373e8f94	conductor(plan): mark t1.12 + phase_1 complete; advance to phase 2	2026-06-11 00:48:14 -04:00
ed	67782198b6	conductor(plan): mark t1.11 (dashscope dep) complete; advance to t1.12	2026-06-11 00:46:18 -04:00
ed	f07e616c38	conductor(plan): mark t1.5-t1.10 complete; advance to t1.11	2026-06-11 00:41:11 -04:00
ed	6f11e7da14	conductor(plan): mark t1.1-t1.4 complete; advance to phase 1 in_progress	2026-06-11 00:31:57 -04:00
ed	49ac008a87	docs: replace 2 'fictional' usages with neutral phrasing (predates the refactor / was stale)	2026-06-10 23:34:33 -04:00
ed	e1287a4cf4	conductor(plan): prior_session_sepia_20260610 spec + design + metadata New track for prior-session sepia tint: - 3 new theme slots (prior_session_bg, prior_session_tint, prior_session_amount) - per-palette state dict mirroring _brightness/_contrast/_gamma - apply_prior_tint helper (float-only math per user requirement) - 6 prior-session render sites wrapped (2 bubble_vendor swaps + 4 tint wraps) - Theme Settings panel slider with persistence Code-block tonemap fix is OUT OF SCOPE (upstream imgui_bundle 1.92.5 API only exposes 4-value PaletteId enum, no per-instance struct). See spec §1.1.1 and design doc 'Honest constraint' section.	2026-06-10 23:00:29 -04:00
ed	2b0e17ef0c	conductor(track): add docs_sync_test_era_20260610 plan.md and spec.md These were authored at track start but missed by the final-state commit. They are the brief 1-2 page design intent and executable plan for the docs sync track. The closing report at docs/reports/docs_sync_test_era_20260610.md summarizes the actual 17-commit execution.	2026-06-10 20:25:32 -04:00
ed	da240577f9	conductor(track): close docs_sync_test_era_20260610 - state.toml: status active->completed, all 25 tasks marked complete with commit SHAs, all 4 phases checkpointed - metadata.json: status active->shipped, 17-commit list, all 9 verification criteria flipped to DONE	2026-06-10 20:24:31 -04:00
ed	5d2624526b	conductor(archive): move 4 test-hell lineage tracks to archive/ - workspace_path_finalize_20260609 -> archive/ (precursor track) - test_infrastructure_hardening_20260609 -> archive/ (main 8-phase track) - mma_tier_usage_reset_fix_20260610 -> archive/ (4 controller bug fixes) - rag_phase4_sync_fix_20260610 -> archive/ (RAG dim-mismatch + rag_config reset) The archive/ directory already existed (71+ archived tracks from earlier phases). The 4 tracks' state.toml + metadata.json were already closed in the prior commit. This just relocates the folders to match the convention referenced in tracks.md.	2026-06-10 20:12:50 -04:00
ed	1ea38ad16b	conductor(track): close 4 test-hell lineage tracks (state + metadata) - test_infrastructure_hardening_20260609: status active->completed, last_updated 2026-06-09->2026-06-10, t7_/t8_ tasks marked complete with commit SHAs (`84edb200`, `719fe9a`, `cb525519`) - mma_tier_usage_reset_fix_20260610: status spec->shipped - rag_phase4_sync_fix_20260610: status spec->shipped - workspace_path_finalize_20260609: status active->completed, current_phase 1->complete, all tasks marked complete (`c725270b`, `93ec2809`), verification flags flipped to true	2026-06-10 20:09:01 -04:00
ed	2c924fe6df	test(infra): poll-for-event race fixes + watchdog timeout bump + spec update	2026-06-10 15:14:35 -04:00
ed	80697e221a	conductor(checkpoint): RAG phase 4 sync fix + test assertion fix - track complete	2026-06-10 13:55:06 -04:00
ed	2ad0d6a3f0	conductor(plan): Update RAG sync fix track state - sync works, retrieval assertion is separate	2026-06-10 13:29:18 -04:00
ed	989b2e6835	conductor(plan): New track for RAG phase 4 sync fix	2026-06-10 12:45:56 -04:00
ed	1772fa8fc2	conductor(checkpoint): Final Phase 2 complete - FR1+FR2 re-applied, sim test passes in batch	2026-06-10 12:13:16 -04:00
ed	14a329c1a9	conductor(plan): Adjust track after catastrophic git checkout - FR1+FR2 reverted, FR3+FR4 were no-ops	2026-06-10 11:45:56 -04:00
ed	c729f8adaf	conductor(plan): Update spec/plan for Phase 2 (live_gui sim test fragility)	2026-06-10 10:12:09 -04:00
ed	e788512d93	conductor(plan): Mark mma_tier_usage_reset_fix_20260610 as complete	2026-06-10 09:59:26 -04:00
ed	d304af5d22	sigh	2026-06-10 08:34:46 -04:00
ed	39c97cb365	conductor(track): workspace_path_finalize_20260609 - plan with 3 phases, 4-step execution	2026-06-09 20:29:55 -04:00
ed	c725270b99	conductor(track): workspace_path_finalize_20260609 - per-run workspace under tests/artifacts/	2026-06-09 20:27:20 -04:00
ed	5656957622	conductor(plan): Phase 8 complete - docs + audit extended	2026-06-09 17:05:35 -04:00
ed	d2ff6ffcf9	conductor(plan): Phase 7 complete - test_bed_health report	2026-06-09 16:59:16 -04:00
ed	3ed52be4bf	conductor(plan): Phase 6 complete - clean_baseline marker	2026-06-09 16:42:48 -04:00
ed	afc8600800	conductor(plan): Phase 5 complete - set_value hook verified	2026-06-09 16:35:18 -04:00
ed	6764c9e12f	conductor(plan): Phase 4 complete - coalesce _sync_rag_engine	2026-06-09 16:27:15 -04:00
ed	45b4497a66	conductor(plan): Phase 3 complete - tmp_path_factory + live_gui_workspace fixture	2026-06-09 16:15:50 -04:00
ed	05ddb45236	conductor(plan): Phase 2 complete - FR1 handle + autouse fixture	2026-06-09 15:43:38 -04:00
ed	30c04860c7	conductor(plan): Phase 1 audit complete - ready for user review	2026-06-09 15:30:31 -04:00
ed	5df22fa8d5	conductor(audit): trace set_value('ai_input') flow to find routing bug	2026-06-09 15:29:27 -04:00
ed	5e13fa9ba7	conductor(audit): document _sync_rag_engine race in controller	2026-06-09 15:29:17 -04:00
ed	aebbd66836	conductor(audit): document hardcoded workspace paths in test suite	2026-06-09 15:29:06 -04:00
ed	d1c6c6c327	conductor(audit): catalog live_gui test cross-file state dependencies	2026-06-09 15:28:56 -04:00
ed	566cf08cb8	conductor(track): test_infrastructure_hardening_20260609 - spec to kill the test regression nightmare	2026-06-09 15:15:26 -04:00
conductor-tier2	5b3c11a0f3	conductor(track): manual_ux_validation_20260608_PLACEHOLDER - ASCII-sketch workflow + first-target redesign The user said (verbatim): "On number 1. I love the idea and definitely see poitental." This commit creates a full track that promotes the ASCII-sketch UX ideation workflow (docs/reports/ascii_sketch_ux_workflow_20260608.md, 340 lines) to a real track with a concrete first target. The track complements (does not replace) the existing manual_ux_validation_20260302 track (which is a general UX review track; this 2026-06-08 track is focused on the ASCII-sketch workflow specifically). Files (5 total, ~52KB, 12,000+ words): - spec.md (186 lines, 9 sections) - track design, 5 open questions, first target analysis, SSDL cross-reference - plan.md (~280 lines, 4 phases, 21 tasks) - TDD-style with WHERE/WHAT/HOW/SAFETY annotations - metadata.json (~120 lines) - structured metadata, 5 open questions with defaults, 5 SSDL principles available - state.toml (~95 lines) - per-task tracking + phase status - index.md (~50 lines) - track context + related docs Key design decisions captured: 1. Two distinct vocabularies are conflated at first glance: - GUI ASCII (the workflow) for panel sketches - SSDL (computational shapes digest) for internal code sketches Spec §2.6 makes the distinction explicit; both are useful for this track (GUI ASCII for Phase 2 design; SSDL for Phase 3 internal refactoring documentation). 2. The 5 open questions from the workflow report (Q1 vocabulary, Q2 comparison policy, Q3 storage location, Q4 tooling, Q5 frequency) are documented with sensible defaults in spec.md §2.1-2.5 and metadata.json. The user can override any of them; defaults pre-stage the work. 3. First target is src/gui_2.py:3770 render_discussion_entry (Discussion Hub per-entry panel). Rationale: - Most-edited surface (every AI/user message) - User has strong opinions (per nagent_review_20260608 3 rounds of corrections) - 23-op matrix A1-A7 is the source of truth - ImGui layout maps cleanly to ASCII - SSDL defusing techniques can guide the internal refactoring 4. 4 phases: 1=resolve 5 questions, 2=execute workflow on first target (1-3 ASCII rounds), 3=implement per design contract (TDD with 7 test files for A1-A7 operations), 4=document the pattern + propose 5-7 next targets. Cross-references added throughout: - docs/reports/computational_shapes_ssdl_digest_20260608.md (the SSDL digest, with explicit "this is a different vocabulary for a different purpose" note in spec §2.6) - docs/reports/ascii_sketch_ux_workflow_20260608.md (the workflow) - docs/guide_discussions.md (the 23-op matrix A1-A7) - conductor/tracks/nagent_review_20260608/ (the source of the user's editable-discussion corrections) - conductor/tracks/manual_ux_validation_20260302/ (complementary general UX review track) - conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/ (the contingency track; referenced in spec §2.6 SSDL cross-ref) No code modified. Track is active; Phase 1 (5 user-questions) is the current phase. User-confirmed worth doing in the prior turn.	2026-06-08 23:41:43 -04:00

1090 Commits