Commit Graph

815 Commits

Author SHA1 Message Date
Ed_
008cfc355a wtf 2026-03-02 10:58:25 -05:00
Ed_
1329f859f7 wtf 2026-03-02 10:58:20 -05:00
Ed_
970b4466d4 conductor(tracks): remove deleted ux_sim_test artifact from tracks.md 2026-03-02 10:47:24 -05:00
Ed_
776d709246 chore: delete ux_sim_test_20260301 — test artifact from New Track form exercise 2026-03-02 10:47:14 -05:00
Ed_
c35f372f52 conductor(tracks): archive 3 completed tracks, update tracks.md with active/archived sections 2026-03-02 10:46:08 -05:00
Ed_
e7879f45a6 fix(test): replace fixed sleeps with polling in context_bleed test to fix ordering flake 2026-03-02 10:45:30 -05:00
Ed_
57efca4f9b fix(thread-safety): lock disc_entries reads/writes in HookServer, remove debug logs 2026-03-02 10:37:33 -05:00
Ed_
eb293f3c96 chore: config, layout, project history, simulation framework updates 2026-03-02 10:15:44 -05:00
Ed_
0b5552fa01 test(suite): update all tests for streaming/locking architecture and mock parity 2026-03-02 10:15:41 -05:00
Ed_
5de253b15b test(mock): major mock_gemini_cli rewrite — robust is_resume detection, tool triggers 2026-03-02 10:15:36 -05:00
Ed_
1df088845d fix(mcp): mcp_client refactor, claude_mma_exec update 2026-03-02 10:15:32 -05:00
Ed_
89e82f1134 fix(infra): api_hook_client debug logging, gemini_cli_adapter streaming fixes, ai_client minor 2026-03-02 10:15:28 -05:00
Ed_
fc9634fd73 fix(gui): move lock init before use, protect disc_entries with threading lock 2026-03-02 10:15:20 -05:00
Ed_
c14150fa81 oops 2026-03-01 23:47:06 -05:00
Ed_
fd37cbf87b pic 2026-03-01 23:46:45 -05:00
Ed_
9fb01ce5d1 feat(mma): complete Phase 6 and finalize Comprehensive GUI UX track
- Implement Live Worker Streaming: wire ai_client.comms_log_callback to Tier 3 streams
- Add Parallel DAG Execution using asyncio.gather for non-dependent tickets
- Implement Automatic Retry with Model Escalation (Flash-Lite -> Flash -> Pro)
- Add Tier Model Configuration UI to MMA Dashboard with project TOML persistence
- Fix FPS reporting in PerformanceMonitor to prevent transient 0.0 values
- Update Ticket model with retry_count and dictionary-like access
- Stabilize Gemini CLI integration tests and handle script approval events in simulations
- Finalize and verify all 6 phases of the implementation plan
2026-03-01 22:38:43 -05:00
Ed_
d1ce0eaaeb feat(gui): implement Phases 2-5 of Comprehensive GUI UX track
- Add cost tracking with new cost_tracker.py module
- Enhance Track Proposal modal with editable titles and goals
- Add Conductor Setup summary and New Track creation form to MMA Dashboard
- Implement Task DAG editing (add/delete tickets) and track-scoped discussion
- Add visual polish: color-coded statuses, tinted progress bars, and node indicators
- Support live worker streaming from AI providers to GUI panels
- Fix numerous integration test regressions and stabilize headless service
2026-03-01 20:17:31 -05:00
Ed_
2ce7a87069 feat(gui): Tier stream panels as separate dockable windows (Tier 1-4) 2026-03-01 15:57:46 -05:00
Ed_
a7903d3a4b conductor(plan): Mark tasks 1.2 and 1.3 complete — 8e57ae1 2026-03-01 15:49:32 -05:00
Ed_
8e57ae1247 feat(gui): Add blinking APPROVAL PENDING badge to MMA dashboard 2026-03-01 15:49:18 -05:00
Ed_
6999aac197 add readme splash 2026-03-01 15:44:40 -05:00
Ed_
05cd321aa9 conductor(plan): Mark task 'Task 1.1' as complete 3a68243 2026-03-01 15:28:51 -05:00
Ed_
3a68243d88 feat(gui): Replace single strategy box with 4-tier collapsible stream panels 2026-03-01 15:28:35 -05:00
Ed_
a7c8183364 conductor(plan): Mark simulation_hardening_20260301 all tasks complete
All 9 tasks done across 3 phases. Key fixes beyond spec:
- btn_approve_script wired (was implemented but not registered)
- pending_script_approval exposed in hook API
- mma_tier_usage exposed in hook API
- pytest-timeout installed
- Tier 3 subscription auth fixed (ANTHROPIC_API_KEY stripping)
- --dangerously-skip-permissions for headless workers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:32:25 -05:00
Ed_
90fc38f671 fix(sim): wire btn_approve_script and expose pending_script_approval in hook API
_handle_approve_script existed but was not registered in the click handler dict.
_pending_dialog (PowerShell confirmation) was invisible to the hook API —
only _pending_ask_dialog (MCP tool ask) was exposed.

- gui_2.py: register btn_approve_script -> _handle_approve_script
- api_hooks.py: add pending_script_approval field to mma_status response
- visual_sim_mma_v2.py: _drain_approvals handles pending_script_approval

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:31:32 -05:00
Ed_
5f661f76b4 fix(hooks): expose mma_tier_usage in /api/gui/mma_status; install pytest-timeout
- api_hooks.py: add mma_tier_usage to get_mma_status() response
- pytest-timeout 2.4.0 installed so mark.timeout(300) is enforced in CI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:26:03 -05:00
Ed_
63fa181192 feat(sim): add pytest timeout(300) and tier_usage Stage 9 check
Task 2.3: prevent infinite CI hangs with 300s hard timeout
Task 3.2: non-blocking Stage 9 logs mma_tier_usage after Tier 3 completes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:24:05 -05:00
Ed_
08734532ce test(mock): add standalone test for mock_gemini_cli routing
4 tests verify: epic prompt -> Track JSON, sprint prompt -> Ticket JSON
with correct field names, worker prompt -> plain text, tool-result -> plain text.
All pass in 0.57s.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:22:53 -05:00
Ed_
0593b289e5 fix(mock): correct sprint ticket format and add keyword detection
- description/status/assigned_to fields now match parse_json_tickets expectations
- Sprint planning branch also detects 'generate the implementation tickets'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:21:21 -05:00
Ed_
f7e417b3df fix(mma-exec): add --dangerously-skip-permissions for headless file writes
Tier 3 workers need to read/write files in headless mode. Without this
flag, all file tool calls are blocked waiting for interactive permission.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:20:38 -05:00
Ed_
36d464f82f fix(mma-exec): strip ANTHROPIC_API_KEY from subprocess env to use subscription login
When ANTHROPIC_API_KEY is set in the shell environment, claude --print
routes through the API key instead of subscription auth. Stripping it
forces the CLI to use subscription login for all Tier 3/4 delegation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:18:57 -05:00
Ed_
3f8ae2ec3b fix(conductor): load Tier 2 role doc in startup, add Tier 3 failure protocol
- Add step 1: read mma-tier2-tech-lead.md before any track work
- Add explicit stop rule when Tier 3 delegation fails (credit/API error)
  Tier 2 must NOT silently absorb Tier 3 work as a fallback

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:09:23 -05:00
Ed_
5cacbb1151 conductor(plan): Mark task 3.2 complete — sim test PASSED
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:04:57 -05:00
Ed_
ce5b6d202b fix(tier1): disable tools in generate_tracks, add enable_tools param to ai_client.send
Tier 1 planning calls are strategic — the model should never use file tools
during epic initialization. This caused JSON parse failures when the model
tried to verify file references in the epic prompt.

- ai_client.py: add enable_tools param to send() and _send_gemini()
- orchestrator_pm.py: pass enable_tools=False in generate_tracks()
- tests/visual_sim_mma_v2.py: remove file reference from test epic

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:04:44 -05:00
Ed_
c023ae14dc conductor(plan): Update task 3.1 complete, 3.2 awaiting verification 2026-03-01 13:42:52 -05:00
Ed_
89a8d9bcc2 test(sim): Rewrite visual_sim_mma_v2 for real Gemini API with frame-sync fixes
Uses gemini-2.5-flash-lite (real API, CLI quota exhausted). Adds _poll/_drain_approvals helpers, frame-sync sleeps after all state-changing clicks, proper stage transitions, and 120s timeouts for real API latency. Addresses simulation_hardening Issues 2 & 3.
2026-03-01 13:42:34 -05:00
Ed_
24ed309ac1 conductor(plan): Mark task 3.1 complete — Stage 8 assertions already correct 2026-03-01 13:26:15 -05:00
Ed_
0fe74660e1 conductor(plan): Mark Phase 2 complete, begin Phase 3 2026-03-01 13:25:24 -05:00
Ed_
a2097f14b3 fix(mma): Add Tier 1 and Tier 2 token tracking from comms log
Task 2.2 of mma_pipeline_fix_20260301: _cb_plan_epic captures comms baseline before generate_tracks() and pushes mma_tier_usage['Tier 1'] update via custom_callback. _start_track_logic does same for generate_tickets() -> mma_tier_usage['Tier 2'].
2026-03-01 13:25:07 -05:00
Ed_
2f9f71d2dc conductor(plan): Mark task 2.1 complete, begin 2.2 2026-03-01 13:22:34 -05:00
Ed_
3eefdfd29d fix(mma): Replace token stats stub with real comms log extraction in run_worker_lifecycle
Task 2.1 of mma_pipeline_fix_20260301: capture comms baseline before send(), then sum input_tokens/output_tokens from IN/response entries to populate engine.tier_usage['Tier 3'].
2026-03-01 13:22:15 -05:00
Ed_
d5eb3f472e conductor(plan): Mark task 1.4 as complete, begin Phase 2 2026-03-01 13:20:10 -05:00
Ed_
c5695c6dac test(mma): Add test verifying run_worker_lifecycle pushes response via _queue_put
Task 1.4 of mma_pipeline_fix_20260301: asserts stream_id='Tier 3 (Worker): T1', event_name='response', text and status fields correct.
2026-03-01 13:19:50 -05:00
Ed_
130a36d7b2 conductor(plan): Mark tasks 1.1, 1.2, 1.3 as complete 2026-03-01 13:18:09 -05:00
Ed_
b7c283972c fix(mma): Add diagnostic logging and remove unsafe asyncio.Queue else branches
Tasks 1.1, 1.2, 1.3 of mma_pipeline_fix_20260301:
- Task 1.1: Add [MMA] diagnostic print before _queue_put in run_worker_lifecycle; enhance except to include traceback
- Task 1.2: Replace unsafe event_queue._queue.put_nowait() else branches with RuntimeError in run_worker_lifecycle, confirm_execution, confirm_spawn
- Task 1.3: Verified run_in_executor positional arg order is correct (no change needed)
2026-03-01 13:17:37 -05:00
Ed_
cf7938a843 wrong archive location 2026-03-01 13:17:34 -05:00
Ed_
3d398f1905 remove main context 2026-03-01 10:26:01 -05:00
Ed_
52f3820199 conductor(gui_ux): Add Phase 6 — live streaming, per-tier model config, parallel DAG, auto-retry
Addresses three gaps where Claude Code and Gemini CLI outperform Manual Slop's
MMA during actual execution:

1. Live worker streaming: Wire comms_log_callback to per-ticket streams so
   users see real-time output instead of waiting for worker completion.
2. Per-tier model config: Replace hardcoded get_model_for_role with GUI
   dropdowns persisted to project TOML.
3. Parallel DAG execution: asyncio.gather for independent tickets (exploratory
   — _send_lock may block, needs investigation).
4. Auto-retry with escalation: flash-lite -> flash -> pro on BLOCKED, up to
   2 retries (wires existing --failure-count mechanism into ConductorEngine).

7 new tasks across Phase 6, bringing total to 30 tasks across 6 phases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-01 10:24:29 -05:00
Ed_
0b03b612b9 chore: Wire architecture docs into mma_exec.py and workflow delegation prompts
mma_exec.py changes:
- get_role_documents: Tier 1 now gets docs/guide_architecture.md + guide_mma.md
  (was: only product.md). Tier 2 gets same (was: only tech-stack + workflow).
  Tier 3 gets guide_architecture.md (was: only workflow.md — workers modifying
  gui_2.py had zero knowledge of threading model). Tier 4 gets guide_architecture.md
  (was: nothing).
- Tier 3 system directive: Added ARCHITECTURE REFERENCE callout, CRITICAL
  THREADING RULE (never write GUI state from background thread), TASK FORMAT
  instruction (follow WHERE/WHAT/HOW/SAFETY from surgical tasks), and
  py_get_definition to tool list.
- Tier 4 system directive: Added ARCHITECTURE REFERENCE callout and instruction
  to trace errors through thread domains documented in guide_architecture.md.

conductor/workflow.md changes:
- Red Phase delegation prompt: Replaced 'with a prompt to create tests' with
  surgical prompt format example showing WHERE/WHAT/HOW/SAFETY.
- Green Phase delegation prompt: Replaced 'with a highly specific prompt' with
  surgical prompt format example with exact line refs and API calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-01 10:16:38 -05:00
Ed_
4e2003c191 chore(gemini): Encode surgical methodology into all Gemini MMA skills
Updates three Gemini skill files to match the Claude command methodology:

mma-orchestrator/SKILL.md:
- New Section 0: Architecture Fallback with links to all 4 docs/guide_*.md
- New Surgical Spec Protocol (6-point mandatory checklist)
- New Section 5: Cross-Skill Activation for tier transitions
- Example 2 rewritten with surgical prompt (exact line refs + API calls)
- New Example 3: Track creation with audit-first workflow
- Added py_get_definition to tool usage guidance

mma-tier1-orchestrator/SKILL.md:
- Added Architecture Fallback and Surgical Spec Protocol summary
- References activate_skill mma-orchestrator for full protocol

mma-tier2-tech-lead/SKILL.md:
- Added Architecture Fallback section
- Added Surgical Delegation Protocol with WHERE/WHAT/HOW/SAFETY example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-01 10:13:29 -05:00