manual_slop

Author	SHA1	Message	Date
ed	776d709246	chore: delete ux_sim_test_20260301 — test artifact from New Track form exercise	2026-03-02 10:47:14 -05:00
ed	c35f372f52	conductor(tracks): archive 3 completed tracks, update tracks.md with active/archived sections	2026-03-02 10:46:08 -05:00
ed	e7879f45a6	fix(test): replace fixed sleeps with polling in context_bleed test to fix ordering flake	2026-03-02 10:45:30 -05:00
ed	57efca4f9b	fix(thread-safety): lock disc_entries reads/writes in HookServer, remove debug logs	2026-03-02 10:37:33 -05:00
ed	eb293f3c96	chore: config, layout, project history, simulation framework updates	2026-03-02 10:15:44 -05:00
ed	0b5552fa01	test(suite): update all tests for streaming/locking architecture and mock parity	2026-03-02 10:15:41 -05:00
ed	5de253b15b	test(mock): major mock_gemini_cli rewrite — robust is_resume detection, tool triggers	2026-03-02 10:15:36 -05:00
ed	1df088845d	fix(mcp): mcp_client refactor, claude_mma_exec update	2026-03-02 10:15:32 -05:00
ed	89e82f1134	fix(infra): api_hook_client debug logging, gemini_cli_adapter streaming fixes, ai_client minor	2026-03-02 10:15:28 -05:00
ed	fc9634fd73	fix(gui): move lock init before use, protect disc_entries with threading lock	2026-03-02 10:15:20 -05:00
ed	c14150fa81	oops	2026-03-01 23:47:06 -05:00
ed	fd37cbf87b	pic	2026-03-01 23:46:45 -05:00
ed	9fb01ce5d1	feat(mma): complete Phase 6 and finalize Comprehensive GUI UX track - Implement Live Worker Streaming: wire ai_client.comms_log_callback to Tier 3 streams - Add Parallel DAG Execution using asyncio.gather for non-dependent tickets - Implement Automatic Retry with Model Escalation (Flash-Lite -> Flash -> Pro) - Add Tier Model Configuration UI to MMA Dashboard with project TOML persistence - Fix FPS reporting in PerformanceMonitor to prevent transient 0.0 values - Update Ticket model with retry_count and dictionary-like access - Stabilize Gemini CLI integration tests and handle script approval events in simulations - Finalize and verify all 6 phases of the implementation plan	2026-03-01 22:38:43 -05:00
ed	d1ce0eaaeb	feat(gui): implement Phases 2-5 of Comprehensive GUI UX track - Add cost tracking with new cost_tracker.py module - Enhance Track Proposal modal with editable titles and goals - Add Conductor Setup summary and New Track creation form to MMA Dashboard - Implement Task DAG editing (add/delete tickets) and track-scoped discussion - Add visual polish: color-coded statuses, tinted progress bars, and node indicators - Support live worker streaming from AI providers to GUI panels - Fix numerous integration test regressions and stabilize headless service	2026-03-01 20:17:31 -05:00
ed	2ce7a87069	feat(gui): Tier stream panels as separate dockable windows (Tier 1-4)	2026-03-01 15:57:46 -05:00
ed	a7903d3a4b	conductor(plan): Mark tasks 1.2 and 1.3 complete — `8e57ae1`	2026-03-01 15:49:32 -05:00
ed	8e57ae1247	feat(gui): Add blinking APPROVAL PENDING badge to MMA dashboard	2026-03-01 15:49:18 -05:00
ed	6999aac197	add readme splash	2026-03-01 15:44:40 -05:00
ed	05cd321aa9	conductor(plan): Mark task 'Task 1.1' as complete `3a68243`	2026-03-01 15:28:51 -05:00
ed	3a68243d88	feat(gui): Replace single strategy box with 4-tier collapsible stream panels	2026-03-01 15:28:35 -05:00
ed	a7c8183364	conductor(plan): Mark simulation_hardening_20260301 all tasks complete All 9 tasks done across 3 phases. Key fixes beyond spec: - btn_approve_script wired (was implemented but not registered) - pending_script_approval exposed in hook API - mma_tier_usage exposed in hook API - pytest-timeout installed - Tier 3 subscription auth fixed (ANTHROPIC_API_KEY stripping) - --dangerously-skip-permissions for headless workers Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 14:32:25 -05:00
ed	90fc38f671	fix(sim): wire btn_approve_script and expose pending_script_approval in hook API _handle_approve_script existed but was not registered in the click handler dict. _pending_dialog (PowerShell confirmation) was invisible to the hook API — only _pending_ask_dialog (MCP tool ask) was exposed. - gui_2.py: register btn_approve_script -> _handle_approve_script - api_hooks.py: add pending_script_approval field to mma_status response - visual_sim_mma_v2.py: _drain_approvals handles pending_script_approval Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 14:31:32 -05:00
ed	5f661f76b4	fix(hooks): expose mma_tier_usage in /api/gui/mma_status; install pytest-timeout - api_hooks.py: add mma_tier_usage to get_mma_status() response - pytest-timeout 2.4.0 installed so mark.timeout(300) is enforced in CI Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 14:26:03 -05:00
ed	63fa181192	feat(sim): add pytest timeout(300) and tier_usage Stage 9 check Task 2.3: prevent infinite CI hangs with 300s hard timeout Task 3.2: non-blocking Stage 9 logs mma_tier_usage after Tier 3 completes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 14:24:05 -05:00
ed	08734532ce	test(mock): add standalone test for mock_gemini_cli routing 4 tests verify: epic prompt -> Track JSON, sprint prompt -> Ticket JSON with correct field names, worker prompt -> plain text, tool-result -> plain text. All pass in 0.57s. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 14:22:53 -05:00
ed	0593b289e5	fix(mock): correct sprint ticket format and add keyword detection - description/status/assigned_to fields now match parse_json_tickets expectations - Sprint planning branch also detects 'generate the implementation tickets' Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 14:21:21 -05:00
ed	f7e417b3df	fix(mma-exec): add --dangerously-skip-permissions for headless file writes Tier 3 workers need to read/write files in headless mode. Without this flag, all file tool calls are blocked waiting for interactive permission. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 14:20:38 -05:00
ed	36d464f82f	fix(mma-exec): strip ANTHROPIC_API_KEY from subprocess env to use subscription login When ANTHROPIC_API_KEY is set in the shell environment, claude --print routes through the API key instead of subscription auth. Stripping it forces the CLI to use subscription login for all Tier 3/4 delegation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 14:18:57 -05:00
ed	3f8ae2ec3b	fix(conductor): load Tier 2 role doc in startup, add Tier 3 failure protocol - Add step 1: read mma-tier2-tech-lead.md before any track work - Add explicit stop rule when Tier 3 delegation fails (credit/API error) Tier 2 must NOT silently absorb Tier 3 work as a fallback Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 14:09:23 -05:00
ed	5cacbb1151	conductor(plan): Mark task 3.2 complete — sim test PASSED Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 14:04:57 -05:00
ed	ce5b6d202b	fix(tier1): disable tools in generate_tracks, add enable_tools param to ai_client.send Tier 1 planning calls are strategic — the model should never use file tools during epic initialization. This caused JSON parse failures when the model tried to verify file references in the epic prompt. - ai_client.py: add enable_tools param to send() and _send_gemini() - orchestrator_pm.py: pass enable_tools=False in generate_tracks() - tests/visual_sim_mma_v2.py: remove file reference from test epic Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 14:04:44 -05:00
ed	c023ae14dc	conductor(plan): Update task 3.1 complete, 3.2 awaiting verification	2026-03-01 13:42:52 -05:00
ed	89a8d9bcc2	test(sim): Rewrite visual_sim_mma_v2 for real Gemini API with frame-sync fixes Uses gemini-2.5-flash-lite (real API, CLI quota exhausted). Adds _poll/_drain_approvals helpers, frame-sync sleeps after all state-changing clicks, proper stage transitions, and 120s timeouts for real API latency. Addresses simulation_hardening Issues 2 & 3.	2026-03-01 13:42:34 -05:00
ed	24ed309ac1	conductor(plan): Mark task 3.1 complete — Stage 8 assertions already correct	2026-03-01 13:26:15 -05:00
ed	0fe74660e1	conductor(plan): Mark Phase 2 complete, begin Phase 3	2026-03-01 13:25:24 -05:00
ed	a2097f14b3	fix(mma): Add Tier 1 and Tier 2 token tracking from comms log Task 2.2 of mma_pipeline_fix_20260301: _cb_plan_epic captures comms baseline before generate_tracks() and pushes mma_tier_usage['Tier 1'] update via custom_callback. _start_track_logic does same for generate_tickets() -> mma_tier_usage['Tier 2'].	2026-03-01 13:25:07 -05:00
ed	2f9f71d2dc	conductor(plan): Mark task 2.1 complete, begin 2.2	2026-03-01 13:22:34 -05:00
ed	3eefdfd29d	fix(mma): Replace token stats stub with real comms log extraction in run_worker_lifecycle Task 2.1 of mma_pipeline_fix_20260301: capture comms baseline before send(), then sum input_tokens/output_tokens from IN/response entries to populate engine.tier_usage['Tier 3'].	2026-03-01 13:22:15 -05:00
ed	d5eb3f472e	conductor(plan): Mark task 1.4 as complete, begin Phase 2	2026-03-01 13:20:10 -05:00
ed	c5695c6dac	test(mma): Add test verifying run_worker_lifecycle pushes response via _queue_put Task 1.4 of mma_pipeline_fix_20260301: asserts stream_id='Tier 3 (Worker): T1', event_name='response', text and status fields correct.	2026-03-01 13:19:50 -05:00
ed	130a36d7b2	conductor(plan): Mark tasks 1.1, 1.2, 1.3 as complete	2026-03-01 13:18:09 -05:00
ed	b7c283972c	fix(mma): Add diagnostic logging and remove unsafe asyncio.Queue else branches Tasks 1.1, 1.2, 1.3 of mma_pipeline_fix_20260301: - Task 1.1: Add [MMA] diagnostic print before _queue_put in run_worker_lifecycle; enhance except to include traceback - Task 1.2: Replace unsafe event_queue._queue.put_nowait() else branches with RuntimeError in run_worker_lifecycle, confirm_execution, confirm_spawn - Task 1.3: Verified run_in_executor positional arg order is correct (no change needed)	2026-03-01 13:17:37 -05:00
ed	cf7938a843	wrong archive location	2026-03-01 13:17:34 -05:00
ed	3d398f1905	remove main context	2026-03-01 10:26:01 -05:00
ed	52f3820199	conductor(gui_ux): Add Phase 6 — live streaming, per-tier model config, parallel DAG, auto-retry Addresses three gaps where Claude Code and Gemini CLI outperform Manual Slop's MMA during actual execution: 1. Live worker streaming: Wire comms_log_callback to per-ticket streams so users see real-time output instead of waiting for worker completion. 2. Per-tier model config: Replace hardcoded get_model_for_role with GUI dropdowns persisted to project TOML. 3. Parallel DAG execution: asyncio.gather for independent tickets (exploratory — _send_lock may block, needs investigation). 4. Auto-retry with escalation: flash-lite -> flash -> pro on BLOCKED, up to 2 retries (wires existing --failure-count mechanism into ConductorEngine). 7 new tasks across Phase 6, bringing total to 30 tasks across 6 phases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-01 10:24:29 -05:00
ed	0b03b612b9	chore: Wire architecture docs into mma_exec.py and workflow delegation prompts mma_exec.py changes: - get_role_documents: Tier 1 now gets docs/guide_architecture.md + guide_mma.md (was: only product.md). Tier 2 gets same (was: only tech-stack + workflow). Tier 3 gets guide_architecture.md (was: only workflow.md — workers modifying gui_2.py had zero knowledge of threading model). Tier 4 gets guide_architecture.md (was: nothing). - Tier 3 system directive: Added ARCHITECTURE REFERENCE callout, CRITICAL THREADING RULE (never write GUI state from background thread), TASK FORMAT instruction (follow WHERE/WHAT/HOW/SAFETY from surgical tasks), and py_get_definition to tool list. - Tier 4 system directive: Added ARCHITECTURE REFERENCE callout and instruction to trace errors through thread domains documented in guide_architecture.md. conductor/workflow.md changes: - Red Phase delegation prompt: Replaced 'with a prompt to create tests' with surgical prompt format example showing WHERE/WHAT/HOW/SAFETY. - Green Phase delegation prompt: Replaced 'with a highly specific prompt' with surgical prompt format example with exact line refs and API calls. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-01 10:16:38 -05:00
ed	4e2003c191	chore(gemini): Encode surgical methodology into all Gemini MMA skills Updates three Gemini skill files to match the Claude command methodology: mma-orchestrator/SKILL.md: - New Section 0: Architecture Fallback with links to all 4 docs/guide_*.md - New Surgical Spec Protocol (6-point mandatory checklist) - New Section 5: Cross-Skill Activation for tier transitions - Example 2 rewritten with surgical prompt (exact line refs + API calls) - New Example 3: Track creation with audit-first workflow - Added py_get_definition to tool usage guidance mma-tier1-orchestrator/SKILL.md: - Added Architecture Fallback and Surgical Spec Protocol summary - References activate_skill mma-orchestrator for full protocol mma-tier2-tech-lead/SKILL.md: - Added Architecture Fallback section - Added Surgical Delegation Protocol with WHERE/WHAT/HOW/SAFETY example Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-01 10:13:29 -05:00
ed	52a463d13f	conductor: Encode surgical spec methodology into Tier 1 skills for Claude and Gemini Distills what made this session's track specs high-quality into reusable methodology for both Claude and Gemini Tier 1 orchestrators: Key additions to conductor-new-track.md: - MANDATORY Step 2: Deep Codebase Audit before writing any spec - 'Current State Audit' section template (Already Implemented + Gaps) - 6 rules for writing worker-ready tasks (WHERE/WHAT/HOW/SAFETY) - Anti-patterns section (vague specs, no line refs, no audit, etc.) - Architecture doc fallback references Key additions to mma-tier1-orchestrator.md (Claude + Gemini): - 'The Surgical Methodology' section with 6 protocols - Spec template with REQUIRED sections (Current State Audit is mandatory) - Plan template with REQUIRED task format (file:line refs + API calls) - Root cause analysis requirement for fix tracks - Cross-track dependency mapping requirement - Added py_get_definition to Gemini's tool list (was missing) The core insight: the quality gap between this session's output and previous track specs came from (1) reading actual code before writing specs, (2) listing what EXISTS before what's MISSING, and (3) specifying exact locations and APIs in tasks so lesser models don't have to search or guess. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-01 10:08:25 -05:00
ed	458529fb13	chore(conductor): Add index.md to new tracks, archive completed/superseded tracks - Add index.md to mma_pipeline_fix, simulation_hardening, context_token_viz - Archive documentation_refresh_20260224 (superseded by `08e003a` rewrite) - Archive robust_live_simulation_verification (context distilled into simulation_hardening_20260301 spec) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-01 10:00:49 -05:00
ed	0d2b6049d1	conductor: Create 3 MVP tracks with surgical specs from full codebase analysis Three new tracks identified by analyzing product.md requirements against actual codebase state using 1M-context Opus with all architecture docs loaded: 1. mma_pipeline_fix_20260301 (P0, blocker): - Diagnoses why Tier 3 worker output never reaches mma_streams in GUI - Identifies 4 root cause candidates: positional arg ordering, asyncio.Queue thread-safety violation, ai_client.reset_session() side effects, token stats stub returning empty dict - 2 phases, 6 tasks with exact line references 2. simulation_hardening_20260301 (P1, depends on pipeline fix): - Addresses 3 documented issues from robust_live_simulation session compression - Mock triggers wrong approval popup, popup state desync, approval ambiguity - 3 phases, 9 tasks including standalone mock test suite 3. context_token_viz_20260301 (P2): - Builds UI for product.md primary use case #2 'Context & Memory Management' - Backend already complete (get_history_bleed_stats, 140 lines) - Token budget bar, proportion breakdown, trimming preview, cache status - 3 phases, 10 tasks Execution order: pipeline_fix -> simulation_hardening -> gui_ux (parallel w/ token_viz) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-01 09:58:34 -05:00

812 Commits