refinement of upcoming tracks

2026-03-06 15:41:33 -05:00
parent 3ce6a2ec8a
commit fca40fd8da
24 changed files with 2388 additions and 391 deletions


@@ -1,29 +1,144 @@
# Implementation Plan: Cache Analytics Display (cache_analytics_20260306)
> **Reference:** [Spec](./spec.md) | [Architecture Guide](../../../docs/guide_architecture.md)
## Phase 1: Foundation & Research
Focus: Verify existing infrastructure and set up panel structure
- [ ] Task 1.1: Initialize MMA Environment
- Run `activate_skill mma-orchestrator` before starting
- [ ] Task 1.2: Verify ai_client.get_gemini_cache_stats()
- WHERE: `src/ai_client.py` lines ~200-230
- WHAT: Confirm function exists and returns expected dict structure
- HOW: Use `manual-slop_py_get_definition` on `get_gemini_cache_stats`
- OUTPUT: Document exact return dict structure in task notes
- [ ] Task 1.3: Verify ai_client.cleanup()
- WHERE: `src/ai_client.py` line ~220
- WHAT: Confirm function clears `_gemini_cache`
- HOW: Use `manual-slop_py_get_definition` on `cleanup`
- SAFETY: Note that cleanup() also clears other resources - document side effects
## Phase 2: Panel Implementation
Focus: Create the GUI panel structure
- [ ] Task 2.1: Add cache panel state variables
- WHERE: `src/gui_2.py` in `App.__init__` (around line 170)
- WHAT: Add minimal state if needed (likely none - read directly from ai_client)
- HOW: Follow existing state initialization pattern
- CODE STYLE: 1-space indentation
- SAFETY: No locks needed - read-only access to ai_client globals
- [ ] Task 2.2: Create _render_cache_panel() method
- WHERE: `src/gui_2.py` after other `_render_*_panel()` methods
- WHAT: New method that displays cache statistics
- HOW:
```python
def _render_cache_panel(self) -> None:
 if self.current_provider != "gemini":
  return
 if not imgui.collapsing_header("Cache Analytics"):
  return
 stats = ai_client.get_gemini_cache_stats()
 # Render stats...
```
- CODE STYLE: 1-space indentation, NO COMMENTS
- [ ] Task 2.3: Integrate panel into main GUI
- WHERE: `src/gui_2.py` in `_gui_func()` method
- WHAT: Call `_render_cache_panel()` in appropriate location
- HOW: Add near token budget panel or in settings area
- SAFETY: Ensure not called during modal dialogs
## Phase 3: Statistics Display
Focus: Implement the visual statistics rendering
- [ ] Task 3.1: Implement cache status display
- WHERE: `src/gui_2.py` in `_render_cache_panel()`
- WHAT: Show cache existence and age
- HOW:
- `imgui.text(f"Cache Active: {stats['cache_exists']}")`
- `imgui.text(f"Age: {format_age(stats['cache_age_seconds'])}")`
- HELPER: Create `format_age(seconds: float) -> str` helper
- CODE STYLE: 1-space indentation
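A minimal sketch of the proposed `format_age` helper, matching the spec's "45m 23s old" example (the exact output format is an assumption to be settled during implementation):

```python
def format_age(seconds: float) -> str:
 # Break the age into hours/minutes/seconds and drop leading zero units
 m, s = divmod(int(seconds), 60)
 h, m = divmod(m, 60)
 if h:
  return f"{h}h {m}m {s}s old"
 if m:
  return f"{m}m {s}s old"
 return f"{s}s old"
```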
- [ ] Task 3.2: Implement TTL countdown display
- WHERE: `src/gui_2.py` in `_render_cache_panel()`
- WHAT: Show remaining TTL with percentage
- HOW:
- `remaining = stats['ttl_remaining']`
- `percentage = (remaining / stats['ttl_seconds']) * 100` (guard against `ttl_seconds == 0`)
- Use `imgui.progress_bar()` for visual
- VISUAL: Warning color when percentage < 20% (use `vec4` colors defined at top of file)
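The fraction and warning state above can be computed in one small helper that also guards the division against a zero TTL; a sketch with hypothetical names, leaving the actual `imgui.progress_bar()` call to the task:

```python
def ttl_display(remaining: float, ttl_seconds: int) -> tuple[float, bool]:
 # Returns (progress-bar fraction, warning flag for < 20% remaining)
 if ttl_seconds <= 0:
  return 0.0, True
 fraction = max(0.0, min(remaining / ttl_seconds, 1.0))
 return fraction, fraction < 0.20
```

Feed the fraction to `imgui.progress_bar()` and switch to the warning `vec4` color when the flag is set.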
- [ ] Task 3.3: Implement requests counter (optional enhancement)
- WHERE: `src/gui_2.py` in `_render_cache_panel()`
- WHAT: Show estimated "requests while cache active"
- HOW: Add `self._cache_request_count: int = 0` state, increment on send
- SAFETY: This requires hooking into send flow - may skip if complex
## Phase 4: Manual Controls
Focus: Implement the cache clear functionality
- [ ] Task 4.1: Add clear cache button
- WHERE: `src/gui_2.py` in `_render_cache_panel()`
- WHAT: Button that clears the Gemini cache
- HOW:
```python
if imgui.button("Clear Cache"):
 ai_client.cleanup()
 # Optionally set a flag to show "Cache cleared" message
```
- SAFETY: `cleanup()` clears ALL caches, not just Gemini - document this
- [ ] Task 4.2: Add clear confirmation/feedback
- WHERE: `src/gui_2.py` in `_render_cache_panel()`
- WHAT: Show feedback after clear
- HOW: Add `self._cache_just_cleared: bool = False` flag, show message, auto-clear after 3s
- CODE STYLE: 1-space indentation
## Phase 5: Testing
Focus: Verify all functionality works correctly
- [ ] Task 5.1: Write unit tests for cache panel
- WHERE: `tests/test_cache_panel.py` (new file)
- WHAT: Test panel visibility, stats display, clear button
- HOW: Use `mock_app` or `app_instance` fixture from `conftest.py`
- PATTERN: Follow `test_gui_phase4.py` as reference
- CODE STYLE: 1-space indentation
- [ ] Task 5.2: Write integration test with live_gui
- WHERE: `tests/test_cache_panel.py`
- WHAT: Test with actual Gemini provider (or mock)
- HOW: Use `live_gui` fixture, set provider to 'gemini', verify panel shows
- ARTIFACTS: Write to `tests/artifacts/`
- [ ] Task 5.3: Conductor - Phase Verification
- Run targeted tests: `uv run pytest tests/test_cache_panel.py -v`
- Verify no lint errors in modified files
- Manual visual verification in GUI
## Implementation Notes
### Thread Safety Analysis
- `ai_client.get_gemini_cache_stats()` reads `_gemini_cache` and `_gemini_cache_created_at`
- These are set on the asyncio worker thread during `_send_gemini()`
- Reading from GUI thread is safe (atomic types) but values may be slightly stale
- No locks required for display purposes
### Code Style Checklist
- [ ] 1-space indentation throughout
- [ ] CRLF line endings on Windows
- [ ] No comments unless explicitly documenting complex logic
- [ ] Type hints on all new functions
- [ ] Follow existing `vec4` color naming pattern
### Files Modified
- `src/gui_2.py`: Add `_render_cache_panel()`, integrate into `_gui_func()`
- `tests/test_cache_panel.py`: New test file
### Dependencies
- Existing: `src/ai_client.py::get_gemini_cache_stats()`
- Existing: `src/ai_client.py::cleanup()`
- Existing: `imgui_bundle::imgui`


@@ -1,21 +1,118 @@
# Track Specification: Cache Analytics Display (cache_analytics_20260306)
## Overview
Gemini cache hit/miss visualization, memory usage, TTL status display. Uses existing `ai_client.get_gemini_cache_stats()` which is implemented but has no GUI representation.
## Current State Audit
### Already Implemented (DO NOT re-implement)
- **`ai_client.get_gemini_cache_stats()`** (src/ai_client.py) - Returns dict with:
- `cache_exists`: bool - Whether a Gemini cache is active
- `cache_age_seconds`: float - Age of current cache in seconds
- `ttl_seconds`: int - Cache TTL (default 3600)
- `ttl_remaining`: float - Seconds until cache expires
- `created_at`: float - Unix timestamp of cache creation
- **Gemini cache variables** (src/ai_client.py lines ~60-70):
- `_gemini_cache`: The `CachedContent` object or None
- `_gemini_cache_created_at`: float timestamp when cache was created
- `_GEMINI_CACHE_TTL`: int = 3600 (1 hour default)
- **Cache invalidation logic** already handles 90% TTL proactive renewal
### Gaps to Fill (This Track's Scope)
- No GUI panel to display cache statistics
- No visual indicator of cache health/TTL
- No manual cache clear button in UI
- No hit/miss tracking (Gemini API doesn't expose this directly - may need approximation)
## Architectural Constraints
### Threading & State Access
- **Non-Blocking**: Cache queries MUST NOT block the UI thread. The `get_gemini_cache_stats()` function reads module-level globals (`_gemini_cache`, `_gemini_cache_created_at`) which are modified on the asyncio worker thread during `_send_gemini()`.
- **No Lock Needed**: These are atomic reads (bool/float/int), but be aware they may be stale by render time. This is acceptable for display purposes.
- **Cross-Thread Pattern**: Use `manual-slop_get_git_diff` to understand how other read-only stats are accessed in `gui_2.py` (e.g., `ai_client.get_comms_log()`).
### GUI Integration
- **Location**: Add to `_render_token_budget_panel()` in `gui_2.py` or create new `_render_cache_panel()` method.
- **ImGui Pattern**: Use `imgui.collapsing_header("Cache Analytics")` to allow collapsing.
- **Code Style**: 1-space indentation, no comments unless requested.
### Performance
- **Polling vs Pushing**: Cache stats are cheap to compute (just float math). Safe to recompute each frame when panel is open.
- **No Event Needed**: Unlike MMA state, cache stats don't need event-driven updates.
## Architecture Reference
Consult these docs for implementation patterns:
- **[docs/guide_architecture.md](../../../docs/guide_architecture.md)**: Thread domains, cross-thread patterns
- **[docs/guide_tools.md](../../../docs/guide_tools.md)**: Hook API if exposing cache stats via API
### Key Integration Points
| File | Lines | Purpose |
|------|-------|---------|
| `src/ai_client.py` | ~200-230 | `get_gemini_cache_stats()` function |
| `src/ai_client.py` | ~60-70 | Cache globals (`_gemini_cache`, `_GEMINI_CACHE_TTL`) |
| `src/ai_client.py` | ~220 | `cleanup()` function for manual cache clear |
| `src/gui_2.py` | ~1800-1900 | `_render_token_budget_panel()` - potential location |
| `src/gui_2.py` | ~150-200 | `App.__init__` state initialization pattern |
## Functional Requirements
### FR1: Cache Status Display
- Display whether a Gemini cache is currently active (`cache_exists` bool)
- Show cache age in human-readable format (e.g., "45m 23s old")
- Only show panel when `current_provider == "gemini"`
### FR2: TTL Countdown
- Display remaining TTL in seconds and as percentage (e.g., "15:23 remaining (42%)")
- Visual indicator when TTL is below 20% (warning color)
- Note: Cache auto-rebuilds at 90% TTL, so this shows time until rebuild trigger
### FR3: Manual Clear Button
- Button to manually clear cache via `ai_client.cleanup()`
- Button should have confirmation or be clearly labeled as destructive
- After clear, display "Cache cleared - will rebuild on next request"
### FR4: Hit/Miss Estimation (Optional Enhancement)
- Since Gemini API doesn't expose actual hit/miss counts, estimate by:
- Counting number of `send()` calls while cache exists
- Display as "Cache active for N requests"
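The approximation described above can live in a small counter; a sketch with hypothetical names, resetting when the cache disappears so the count reflects only the current cache's lifetime:

```python
class CacheRequestCounter:
 def __init__(self) -> None:
  self._count = 0

 def on_send(self, cache_exists: bool) -> None:
  # Call once per send(); restart the count when no cache is active
  if cache_exists:
   self._count += 1
  else:
   self._count = 0

 def label(self) -> str:
  return f"Cache active for {self._count} requests"
```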
## Non-Functional Requirements
| Requirement | Constraint |
|-------------|------------|
| Frame Time Impact | <1ms when panel visible |
| Memory Overhead | <1KB for display state |
| Thread Safety | Read-only access to ai_client globals |
## Testing Requirements
### Unit Tests
- Test panel renders without error when provider is Gemini
- Test panel is hidden when provider is not Gemini
- Test clear button calls `ai_client.cleanup()`
### Integration Tests (via `live_gui` fixture)
- Verify cache stats display after actual Gemini API call
- Verify TTL countdown decrements over time
### Structural Testing Contract
- **NO mocking** of `ai_client` internals - use real state
- Test artifacts go to `tests/artifacts/`
## Out of Scope
- Anthropic prompt caching display (different mechanism - ephemeral breakpoints)
- DeepSeek caching (not implemented)
- Actual hit/miss tracking from Gemini API (not exposed)
- Persisting cache stats across sessions
## Acceptance Criteria
- [ ] Cache panel displays in GUI when provider is Gemini
- [ ] Cache age shown in human-readable format
- [ ] TTL countdown visible with percentage
- [ ] Warning color when TTL < 20%
- [ ] Manual clear button works and calls `ai_client.cleanup()`
- [ ] Panel hidden for non-Gemini providers
- [ ] Uses existing `get_gemini_cache_stats()` - no new ai_client code
- [ ] 1-space indentation maintained


@@ -1,63 +1,237 @@
# Implementation Plan: Conductor Path Configuration (conductor_path_configurable_20260306)
> **Reference:** [Spec](./spec.md) | [Architecture Guide](../../../docs/guide_architecture.md)
>
> **CRITICAL:** This is Phase 0 infrastructure. Complete this before other Phase 3 tracks.
## Phase 1: Create Centralized Path Module
Focus: Establish the single source of truth for all paths
- [ ] Task 1.1: Initialize MMA Environment
- Run `activate_skill mma-orchestrator` before starting
- [ ] Task 1.2: Create src/paths.py module
- WHERE: `src/paths.py` (new file)
- WHAT: Centralized path resolution module
- HOW:
```python
from pathlib import Path
import os
import tomllib
from typing import Optional

_CONFIG_PATH: Path = Path(os.environ.get("SLOP_CONFIG", "config.toml"))
_RESOLVED: dict[str, Path] = {}

def _resolve_path(env_var: str, config_key: str, default: str) -> Path:
 if env_var in os.environ:
  return Path(os.environ[env_var])
 try:
  with open(_CONFIG_PATH, "rb") as f:
   cfg = tomllib.load(f)
  if "paths" in cfg and config_key in cfg["paths"]:
   return Path(cfg["paths"][config_key])
 except FileNotFoundError:
  pass
 return Path(default)

def get_conductor_dir() -> Path:
 if "conductor_dir" not in _RESOLVED:
  _RESOLVED["conductor_dir"] = _resolve_path("SLOP_CONDUCTOR_DIR", "conductor_dir", "conductor")
 return _RESOLVED["conductor_dir"]

def get_logs_dir() -> Path:
 if "logs_dir" not in _RESOLVED:
  _RESOLVED["logs_dir"] = _resolve_path("SLOP_LOGS_DIR", "logs_dir", "logs/sessions")
 return _RESOLVED["logs_dir"]

def get_scripts_dir() -> Path:
 if "scripts_dir" not in _RESOLVED:
  _RESOLVED["scripts_dir"] = _resolve_path("SLOP_SCRIPTS_DIR", "scripts_dir", "scripts/generated")
 return _RESOLVED["scripts_dir"]

def get_config_path() -> Path:
 return _CONFIG_PATH

def get_tracks_dir() -> Path:
 return get_conductor_dir() / "tracks"

def get_track_state_dir(track_id: str) -> Path:
 return get_tracks_dir() / track_id

def get_archive_dir() -> Path:
 return get_conductor_dir() / "archive"

def reset_resolved() -> None:
 """For testing only - clear cached resolutions."""
 _RESOLVED.clear()
```
- CODE STYLE: 1-space indentation
- SAFETY: Lazy resolution prevents import-order issues
- [ ] Task 1.3: Write unit tests for paths module
- WHERE: `tests/test_paths.py` (new file)
- WHAT: Test path resolution logic
- HOW: Test defaults, env vars, config overrides, precedence
- PATTERN: Mock `os.environ`, create temp config files
## Phase 2: Update Core Modules
Focus: Migrate orchestrator and project_manager to use paths module
- [ ] Task 2.1: Update orchestrator_pm.py
- WHERE: `src/orchestrator_pm.py` line 10
- WHAT: Replace `CONDUCTOR_PATH` with path function
- HOW:
```python
# OLD: CONDUCTOR_PATH: Path = Path("conductor")
# NEW:
from src import paths
# Then use paths.get_conductor_dir() where needed
```
- SAFETY: Check all usages of `CONDUCTOR_PATH` in the file
- [ ] Task 2.2: Update project_manager.py
- WHERE: `src/project_manager.py` lines 240, 252, 297
- WHAT: Replace hardcoded "conductor" with path functions
- HOW:
```python
from src import paths
# save_track_state: track_dir = paths.get_track_state_dir(track_id)
# load_track_state: state_file = paths.get_track_state_dir(track_id) / "state.toml"
# get_all_tracks: tracks_dir = paths.get_tracks_dir()
```
- SAFETY: Maintain `base_dir` parameter for backward compatibility if needed
## Phase 3: Update Session Logger
Focus: Migrate session_logger.py to use paths module
- [ ] Task 3.1: Update session_logger.py paths
- WHERE: `src/session_logger.py` lines 26-27
- WHAT: Replace module-level constants with lazy resolution
- HOW:
```python
# OLD:
# _LOG_DIR: Path = Path("./logs/sessions")
# _SCRIPTS_DIR: Path = Path("./scripts/generated")
# NEW:
from src import paths
# In functions, use paths.get_logs_dir() and paths.get_scripts_dir()
```
- SAFETY: Module-level initialization may need to become function-level
- [ ] Task 3.2: Handle open_session() path resolution
- WHERE: `src/session_logger.py` open_session function
- WHAT: Resolve paths at call time, not import time
- HOW: Call `paths.get_logs_dir()` inside `open_session()`
- SAFETY: Verify no import-time side effects
## Phase 4: Update App Controller
Focus: Migrate all scattered path references in app_controller.py
- [ ] Task 4.1: Update app_controller.py log paths
- WHERE: `src/app_controller.py` lines 643, 674, 1241
- WHAT: Replace hardcoded "logs/sessions" with path functions
- HOW:
```python
from src import paths
# cb_load_prior_log: initialdir=str(paths.get_logs_dir())
# cb_prune_logs: LogRegistry(paths.get_logs_dir() / "log_registry.toml")
# _render_log_management: Path(paths.get_logs_dir())
```
- SAFETY: Convert Path to str where file dialog expects string
- [ ] Task 4.2: Update app_controller.py conductor paths
- WHERE: `src/app_controller.py` lines 1907, 1937
- WHAT: Replace hardcoded "conductor" with path functions
- HOW:
```python
from src import paths
# _render_projects_panel: base = paths.get_conductor_dir()
# _render_projects_panel: track_dir = paths.get_tracks_dir() / track_id
```
- SAFETY: None
## Phase 5: Update GUI
Focus: Migrate gui_2.py path references
- [ ] Task 5.1: Update gui_2.py log path
- WHERE: `src/gui_2.py` line 776
- WHAT: Replace hardcoded LogRegistry path
- HOW:
```python
from src import paths
# LogRegistry(paths.get_logs_dir() / "log_registry.toml")
```
- SAFETY: None
## Phase 6: Configuration Support
Focus: Add paths section to config.toml
- [ ] Task 6.1: Add [paths] section to config.toml
- WHERE: `config.toml`
- WHAT: Add documented paths section
- HOW:
```toml
# Path Configuration (optional - defaults shown)
# Override with environment variables: SLOP_CONDUCTOR_DIR, SLOP_LOGS_DIR, SLOP_SCRIPTS_DIR
[paths]
# conductor_dir = "conductor"
# logs_dir = "logs/sessions"
# scripts_dir = "scripts/generated"
```
- SAFETY: Comments-only defaults don't change behavior
- [ ] Task 6.2: Document environment variables
- WHERE: `conductor/tech-stack.md` or `README.md`
- WHAT: Add documentation for path environment variables
- HOW: Table of env vars with descriptions
## Phase 7: Verification
Focus: Ensure all changes work correctly
- [ ] Task 7.1: Run full test suite
- COMMAND: `uv run pytest tests/ -v --timeout=60`
- EXPECTED: All existing tests pass
- BATCHING: Run in batches of 4 files max
- [ ] Task 7.2: Test with custom paths
- HOW: Set env vars, verify app uses custom paths
- VERIFY: Check logs go to custom dir, tracks load from custom dir
- [ ] Task 7.3: Test default behavior unchanged
- HOW: Run without env vars, verify defaults work
- VERIFY: All paths resolve to original defaults
- [ ] Task 7.4: Conductor - Phase Verification
- Run: `uv run pytest tests/test_paths.py -v`
- Manual: Start app, verify no path-related errors
## Implementation Notes
### Import Order Considerations
- `paths.py` must not import from other src modules
- Other modules can safely import from `paths.py`
- `session_logger.py` needs careful handling due to module-level state
### Backward Compatibility
- All existing `base_dir` parameters continue to work
- Default behavior is unchanged when no config/env overrides
- Existing `SLOP_CONFIG` env var pattern is preserved
### Files Modified
- `src/paths.py` (new)
- `src/orchestrator_pm.py`
- `src/project_manager.py`
- `src/session_logger.py`
- `src/app_controller.py`
- `src/gui_2.py`
- `config.toml`
- `tests/test_paths.py` (new)
### Code Style Checklist
- [ ] 1-space indentation throughout
- [ ] CRLF line endings on Windows
- [ ] No comments unless documenting public API
- [ ] Type hints on all public functions
- [ ] Follow existing module patterns


@@ -1,36 +1,185 @@
# Track Specification: Conductor Path Configuration (conductor_path_configurable_20260306)
## Overview
Eliminate all hardcoded paths in the application. Make directory paths configurable via `config.toml` or environment variables, allowing the running app to use different directories from development setup. This is **Phase 0 - Critical Infrastructure** that must be completed before other Phase 3 tracks.
## Current State Audit
### Already Implemented (DO NOT re-implement)
#### Environment Variable Pattern (models.py)
- **`CONFIG_PATH`**: `Path(os.environ.get("SLOP_CONFIG", "config.toml"))` - This pattern exists and should be replicated.
#### Hardcoded Path Inventory
| File | Line | Current Hardcode | Variable/Usage |
|------|------|------------------|----------------|
| `src/orchestrator_pm.py` | 10 | `"conductor"` | `CONDUCTOR_PATH: Path = Path("conductor")` |
| `src/project_manager.py` | 240 | `"conductor"` | `track_dir = Path(base_dir) / "conductor" / "tracks" / track_id` |
| `src/project_manager.py` | 252 | `"conductor"` | `state_file = Path(base_dir) / "conductor" / "tracks" / track_id / "state.toml"` |
| `src/project_manager.py` | 297 | `"conductor"` | `tracks_dir = Path(base_dir) / "conductor" / "tracks"` |
| `src/app_controller.py` | 1907 | `"conductor"` | `base = Path("conductor")` |
| `src/app_controller.py` | 1937 | `"conductor"` | `track_dir = Path("conductor/tracks") / track_id` |
| `src/app_controller.py` | 643 | `"logs/sessions"` | `initialdir="logs/sessions"` |
| `src/app_controller.py` | 674 | `"logs/sessions"` | `LogRegistry("logs/sessions/log_registry.toml")` |
| `src/app_controller.py` | 1241 | `"logs/sessions"` | `log_dir = Path("logs/sessions")` |
| `src/gui_2.py` | 776 | `"logs/sessions"` | `LogRegistry("logs/sessions/log_registry.toml")` |
| `src/session_logger.py` | 26 | `"./logs/sessions"` | `_LOG_DIR: Path = Path("./logs/sessions")` |
| `src/session_logger.py` | 27 | `"./scripts/generated"` | `_SCRIPTS_DIR: Path = Path("./scripts/generated")` |
#### Notes on Existing Implementation
- `session_logger.py` has module-level path constants that are set once at import time
- `project_manager.py` uses `base_dir` parameter but hardcodes `"conductor"` as subdirectory
- `app_controller.py` has multiple scattered hardcodes that need consolidation
### Gaps to Fill (This Track's Scope)
- No centralized path configuration module
- No `config.toml` section for paths
- No environment variable support for logs, scripts, or conductor directories
- Duplicate path definitions across files
## Architectural Constraints
### Single Source of Truth
- All paths MUST be resolved through a single `paths.py` module
- No file should hardcode directory strings directly
### Initialization Order
- Path resolution MUST happen before any module imports that use paths
- `session_logger.py` imports at module level - may need lazy initialization
### Backward Compatibility
- Default paths MUST remain unchanged (relative to project root)
- Existing `SLOP_CONFIG` env var pattern MUST be preserved
- All existing function signatures that take `base_dir` MUST continue to work
### Thread Safety
- Path resolution is read-only after initialization
- No locks needed once paths are resolved
## Architecture Reference
### Key Integration Points
| File | Lines | Purpose |
|------|-------|---------|
| `src/models.py` | 27-28 | `CONFIG_PATH` pattern to replicate |
| `src/orchestrator_pm.py` | 10 | `CONDUCTOR_PATH` to replace |
| `src/project_manager.py` | 238-297 | Track state paths to update |
| `src/session_logger.py` | 26-27 | `_LOG_DIR`, `_SCRIPTS_DIR` to update |
| `src/app_controller.py` | 643, 674, 1241, 1907, 1937 | Multiple hardcodes to fix |
| `src/gui_2.py` | 776 | LogRegistry path to update |
| `config.toml` | N/A | Add `[paths]` section |
### Proposed Module Structure
```python
# src/paths.py (NEW FILE)
from pathlib import Path
import os
from typing import Optional
import tomllib

_CONFIG_PATH: Path = Path(os.environ.get("SLOP_CONFIG", "config.toml"))
_RESOLVED: dict[str, Path] = {}

def _resolve_path(env_var: str, config_key: str, default: Path) -> Path:
 """Resolve path from env var, config, or default."""
 if env_var in os.environ:
  return Path(os.environ[env_var])
 if _CONFIG_PATH.exists():
  with open(_CONFIG_PATH, "rb") as f:
   cfg = tomllib.load(f)
  if "paths" in cfg and config_key in cfg["paths"]:
   return Path(cfg["paths"][config_key])
 return default

def get_conductor_dir() -> Path: ...
def get_logs_dir() -> Path: ...
def get_scripts_dir() -> Path: ...
def get_config_path() -> Path: ...
```
## Functional Requirements
### FR1: Config File Support
Add `[paths]` section to `config.toml`:
```toml
[paths]
conductor_dir = "conductor"
logs_dir = "logs/sessions"
scripts_dir = "scripts/generated"
# config_file uses SLOP_CONFIG env var (existing)
```
### FR2: Environment Variable Support
| Env Var | Config Key | Default |
|---------|------------|---------|
| `SLOP_CONDUCTOR_DIR` | `conductor_dir` | `"conductor"` |
| `SLOP_LOGS_DIR` | `logs_dir` | `"logs/sessions"` |
| `SLOP_SCRIPTS_DIR` | `scripts_dir` | `"scripts/generated"` |
| `SLOP_CONFIG` | (existing) | `"config.toml"` |
### FR3: Centralized Path Module
Create `src/paths.py` with:
- `get_conductor_dir() -> Path`
- `get_logs_dir() -> Path`
- `get_scripts_dir() -> Path`
- `get_config_path() -> Path`
- `get_tracks_dir() -> Path` (conductor/tracks)
- `get_track_state_dir(track_id: str) -> Path`
### FR4: Module Migration
Update all modules to import from `paths.py`:
- `orchestrator_pm.py`: Use `paths.get_conductor_dir()`
- `project_manager.py`: Use `paths.get_tracks_dir()`
- `session_logger.py`: Use `paths.get_logs_dir()`, `paths.get_scripts_dir()`
- `app_controller.py`: Use all path functions
- `gui_2.py`: Use `paths.get_logs_dir()`
## Non-Functional Requirements
| Requirement | Constraint |
|-------------|------------|
| Import Order | `paths.py` must be importable without side effects |
| Lazy Resolution | Path resolution on first access, not at import |
| No Breaking Changes | All existing code continues to work |
| Default Behavior | Unchanged when no config/env overrides |
## Testing Requirements
### Unit Tests
- Test each path function returns expected default
- Test env var override for each path
- Test config.toml override for each path
- Test env var takes precedence over config
### Integration Tests
- Verify app starts with custom paths
- Verify tracks load from custom conductor dir
- Verify logs write to custom logs dir
### Test Isolation
- Reset `_RESOLVED` dict between tests
- Mock `os.environ` for env var tests
- Use temp config files for config tests
## Out of Scope
- Runtime path changes (paths are resolved once at startup)
- Path validation (directory existence checks)
- Converting relative paths to absolute (paths are used as provided)
## Acceptance Criteria
- [ ] `src/paths.py` module created with all path functions
- [ ] `config.toml` has `[paths]` section
- [ ] All 4 environment variables work
- [ ] Default paths remain unchanged
- [ ] All modules use resolved paths
- [ ] Backward compatible
- [ ] `orchestrator_pm.py` uses `paths.get_conductor_dir()`
- [ ] `project_manager.py` uses path functions
- [ ] `session_logger.py` uses path functions
- [ ] `app_controller.py` uses path functions
- [ ] `gui_2.py` uses path functions
- [ ] All existing tests pass
- [ ] New path resolution tests pass
- [ ] 1-space indentation maintained


@@ -1,32 +1,169 @@
# Implementation Plan: Cost & Token Analytics Panel (cost_token_analytics_20260306)
> **Reference:** [Spec](./spec.md) | [Architecture Guide](../../../docs/guide_architecture.md)
## Phase 1: Foundation & Research
Focus: Verify existing infrastructure
- [ ] Task 1.1: Initialize MMA Environment
- Run `activate_skill mma-orchestrator` before starting
- [ ] Task 1.2: Verify cost_tracker.py implementation
- WHERE: `src/cost_tracker.py`
- WHAT: Confirm `MODEL_PRICING` dict and `estimate_cost()` function
- HOW: Use `manual-slop_py_get_definition` on `estimate_cost`
- OUTPUT: Document exact MODEL_PRICING structure for reference
- [ ] Task 1.3: Verify tier_usage in ConductorEngine
- WHERE: `src/multi_agent_conductor.py` lines ~50-60
- WHAT: Confirm tier_usage dict structure and update mechanism
- HOW: Use `manual-slop_py_get_code_outline` on ConductorEngine
- SAFETY: Note thread that updates tier_usage
- [ ] Task 1.4: Review existing MMA dashboard
- WHERE: `src/gui_2.py` `_render_mma_dashboard()` method
- WHAT: Understand existing tier usage table pattern
- HOW: Read method to identify extension points
- OUTPUT: Note line numbers for table rendering
## Phase 2: State Management
Focus: Add cost tracking state to app
- [ ] Task 2.1: Add session cost state
- WHERE: `src/gui_2.py` or `src/app_controller.py` in `__init__`
- WHAT: Add session-level cost tracking state
- HOW:
```python
self._session_cost_total: float = 0.0
self._session_cost_by_model: dict[str, float] = {}
self._session_cost_by_tier: dict[str, float] = {
 "Tier 1": 0.0, "Tier 2": 0.0, "Tier 3": 0.0, "Tier 4": 0.0
}
```
- CODE STYLE: 1-space indentation
- [ ] Task 2.2: Add cost update logic
- WHERE: `src/gui_2.py` in MMA state update handler
- WHAT: Calculate costs when tier_usage updates
- HOW:
```python
def _update_costs_from_tier_usage(self, tier_usage: dict) -> None:
 for tier, usage in tier_usage.items():
  cost = cost_tracker.estimate_cost(
   self.current_model, usage["input"], usage["output"]
  )
  self._session_cost_by_tier[tier] = cost
 # tier_usage holds cumulative totals: recompute, don't +=, or costs double-count
 self._session_cost_total = sum(self._session_cost_by_tier.values())
```
- SAFETY: Called from GUI thread via state update
- [ ] Task 2.3: Reset costs on session reset
- WHERE: `src/gui_2.py` or `src/app_controller.py` reset handler
- WHAT: Clear cost state when session resets
- HOW: Set all cost values to 0.0 in reset function
## Phase 3: Panel Implementation
Focus: Create the GUI panel
- [ ] Task 3.1: Create _render_cost_panel() method
- WHERE: `src/gui_2.py` after other render methods
- WHAT: New method to display cost information
- HOW:
```python
def _render_cost_panel(self) -> None:
 if not imgui.collapsing_header("Cost Analytics"):
  return
 # Total session cost
 imgui.text(f"Session Total: ${self._session_cost_total:.4f}")
 # Per-tier breakdown (token counts live in the MMA dashboard table)
 if imgui.begin_table("tier_costs", 2):
  imgui.table_setup_column("Tier")
  imgui.table_setup_column("Cost")
  imgui.table_headers_row()
  for tier, cost in self._session_cost_by_tier.items():
   imgui.table_next_row()
   imgui.table_set_column_index(0)
   imgui.text(tier)
   imgui.table_set_column_index(1)
   imgui.text(f"${cost:.4f}")
  imgui.end_table()
 # Per-model breakdown
 if self._session_cost_by_model:
  imgui.separator()
  imgui.text("By Model:")
  for model, cost in self._session_cost_by_model.items():
   imgui.bullet_text(f"{model}: ${cost:.4f}")
```
- CODE STYLE: 1-space indentation, no comments
- [ ] Task 3.2: Integrate panel into main GUI
- WHERE: `src/gui_2.py` in `_gui_func()` or appropriate panel
- WHAT: Call `_render_cost_panel()` in layout
- HOW: Add near token budget panel or MMA dashboard
- SAFETY: None
## Phase 4: Integration with MMA Dashboard
Focus: Extend existing dashboard with cost column
- [ ] Task 4.1: Add cost column to tier usage table
- WHERE: `src/gui_2.py` `_render_mma_dashboard()`
- WHAT: Add "Est. Cost" column to existing tier usage table
- HOW:
- Change `imgui.table_setup_column()` from 3 to 4 columns
- Add "Est. Cost" header
- Calculate cost per tier using current model
- Display with dollar formatting
- SAFETY: Handle missing tier_usage gracefully
- [ ] Task 4.2: Display model name in table
- WHERE: `src/gui_2.py` `_render_mma_dashboard()`
- WHAT: Show which model was used for each tier
- HOW: Add "Model" column with model name
- SAFETY: May not know per-tier model - use current_model as fallback
## Phase 5: Testing
Focus: Verify all functionality
- [ ] Task 5.1: Write unit tests for cost calculation
- WHERE: `tests/test_cost_panel.py` (new file)
- WHAT: Test cost accumulation logic
- HOW: Mock tier_usage, verify costs calculated correctly
- PATTERN: Follow `test_cost_tracker.py` as reference
- [ ] Task 5.2: Write integration test
- WHERE: `tests/test_cost_panel.py`
- WHAT: Test with live_gui, verify panel displays
- HOW: Use `live_gui` fixture, trigger API call, check costs
- ARTIFACTS: Write to `tests/artifacts/`
- [ ] Task 5.3: Conductor - Phase Verification
- Run: `uv run pytest tests/test_cost_panel.py tests/test_cost_tracker.py -v`
- Manual: Verify panel displays in GUI
## Implementation Notes
### Thread Safety
- tier_usage is updated on asyncio worker thread
- GUI reads via `_process_pending_gui_tasks` - already synchronized
- No additional locking needed
### Cost Calculation Strategy
- Use current model for all tiers (simplification)
- Future: Track model per tier if needed
- Unknown models return 0.0 cost (safe default)
### Files Modified
- `src/gui_2.py`: Add cost state, render methods
- `src/app_controller.py`: Possibly add cost state (if using controller)
- `tests/test_cost_panel.py`: New test file
### Code Style Checklist
- [ ] 1-space indentation throughout
- [ ] CRLF line endings on Windows
- [ ] No comments unless requested
- [ ] Type hints on new state variables
- [ ] Use existing `vec4` colors for consistency

View File

@@ -1,21 +1,140 @@
# Track Specification: Cost & Token Analytics Panel (cost_token_analytics_20260306)
## Overview
Real-time cost tracking panel displaying cost per model, session totals, and breakdown by tier. Uses existing `cost_tracker.py` which is implemented but has no GUI representation.
## Current State Audit
### Already Implemented (DO NOT re-implement)
#### cost_tracker.py (src/cost_tracker.py)
- **`MODEL_PRICING` dict**: Pricing per 1M tokens for all supported models
```python
MODEL_PRICING: dict[str, dict[str, float]] = {
 "gemini-2.5-flash-lite": {"input": 0.075, "output": 0.30},
 "gemini-2.5-flash": {"input": 0.15, "output": 0.60},
 "gemini-3.1-pro-preview": {"input": 1.25, "output": 5.00},
 "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
 "deepseek-v3": {"input": 0.27, "output": 1.10},
 # ... more models
}
```
- **`estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float`**: Calculate cost in USD
- **Returns 0.0 for unknown models** - safe default
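Per-1M-token pricing reduces to a one-line calculation; a sketch consistent with the dict above (the authoritative implementation is in `src/cost_tracker.py`):

```python
MODEL_PRICING: dict[str, dict[str, float]] = {
 "gemini-2.5-flash": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
 # Unknown models return 0.0 - the documented safe default
 pricing = MODEL_PRICING.get(model)
 if pricing is None:
  return 0.0
 return (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000
```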
#### Token Tracking in ai_client.py
- **`_add_bleed_derived()`** (ai_client.py): Adds derived token counts to comms entries
- **`get_history_bleed_stats()`**: Returns token statistics from history
- **Gemini**: Token counts from API response (`usage_metadata`)
- **Anthropic**: Token counts from API response (`usage`)
- **DeepSeek**: Token counts from API response (`usage`)
#### MMA Tier Usage Tracking
- **`ConductorEngine.tier_usage`** (multi_agent_conductor.py): Tracks per-tier token usage
```python
self.tier_usage = {
 "Tier 1": {"input": 0, "output": 0},
 "Tier 2": {"input": 0, "output": 0},
 "Tier 3": {"input": 0, "output": 0},
 "Tier 4": {"input": 0, "output": 0},
}
```
### Gaps to Fill (This Track's Scope)
- No GUI panel to display cost information
- No session-level cost accumulation
- No per-model breakdown visualization
- No tier breakdown visualization
## Architectural Constraints
### Non-Blocking Updates
- Cost calculations MUST NOT block UI thread
- Token counts are read from existing tracking - no new API calls
- Use cached values, update on state change events
### Cross-Thread Data Access
- `tier_usage` is updated on asyncio worker thread
- GUI reads via `_process_pending_gui_tasks` pattern
- Already synchronized through MMA state updates
### Memory Efficiency
- Session cost is a simple float - no history array needed
- Per-model costs can be dict: `{model_name: float}`
## Architecture Reference
### Key Integration Points
| File | Lines | Purpose |
|------|-------|---------|
| `src/cost_tracker.py` | 10-40 | `MODEL_PRICING`, `estimate_cost()` |
| `src/ai_client.py` | ~500-550 | `_add_bleed_derived()`, `get_history_bleed_stats()` |
| `src/multi_agent_conductor.py` | ~50-60 | `tier_usage` dict |
| `src/gui_2.py` | ~2700-2800 | `_render_mma_dashboard()` - existing tier usage display |
| `src/gui_2.py` | ~1800-1900 | `_render_token_budget_panel()` - potential location |
### Existing MMA Dashboard Pattern
The `_render_mma_dashboard()` method already displays tier usage in a table. Extend this pattern for cost display.
## Functional Requirements
### FR1: Session Cost Accumulation
- Track total cost for the current session
- Reset on session reset
- Store in `App` or `AppController` state
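FR1's accumulation needs nothing more than a float and a dict; a hedged sketch (class and attribute names are illustrative, not the project's):

```python
class SessionCosts:
 def __init__(self) -> None:
  self.total: float = 0.0
  self.by_model: dict[str, float] = {}

 def add(self, model: str, cost: float) -> None:
  # Accumulate per call; reset() clears state on session reset
  self.total += cost
  self.by_model[model] = self.by_model.get(model, 0.0) + cost

 def reset(self) -> None:
  self.total = 0.0
  self.by_model.clear()
```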
### FR2: Per-Model Cost Display
- Show cost broken down by model name
- Group by provider (Gemini, Anthropic, DeepSeek)
- Show token counts alongside costs
### FR3: Tier Breakdown Display
- Show cost per MMA tier (Tier 1-4)
- Use existing `tier_usage` data
- Calculate cost using `cost_tracker.estimate_cost()`
### FR4: Real-Time Updates
- Update cost display when MMA state changes
- Hook into existing `mma_state_update` event handling
- No polling - event-driven
## Non-Functional Requirements
| Requirement | Constraint |
|-------------|------------|
| Frame Time Impact | <1ms when panel visible |
| Memory Overhead | <1KB for session cost state |
| Thread Safety | Read tier_usage via state updates only |
## Testing Requirements
### Unit Tests
- Test `estimate_cost()` with known model/token combinations
- Test unknown model returns 0.0
- Test session cost accumulation
### Integration Tests (via `live_gui` fixture)
- Verify cost panel displays after API call
- Verify costs update after MMA execution
- Verify session reset clears costs
### Structural Testing Contract
- Use real `cost_tracker` module - no mocking
- Test artifacts go to `tests/artifacts/`
## Out of Scope
- Historical cost tracking across sessions
- Cost budgeting/alerts
- Export cost reports
- API cost for web searches (no token counts available)
## Acceptance Criteria
- [ ] Cost panel displays in GUI
- [ ] Per-model cost shown with token counts
- [ ] Tier breakdown accurate using `tier_usage`
- [ ] Total session cost accumulates correctly
- [ ] Panel updates on MMA state changes
- [ ] Uses existing `cost_tracker.estimate_cost()`
- [ ] Session reset clears costs
- [ ] 1-space indentation maintained

View File

@@ -1,30 +1,167 @@
# Implementation Plan: Deep AST Context Pruning (deep_ast_context_pruning_20260306)
> **Reference:** [Spec](./spec.md) | [Architecture Guide](../../../docs/guide_architecture.md)
## Phase 1: Verify Existing Infrastructure
Focus: Confirm tree-sitter integration works
- [ ] Task 1.1: Initialize MMA Environment
- Run `activate_skill mma-orchestrator` before starting
- [ ] Task 1.2: Verify tree_sitter installation
- WHERE: `requirements.txt`, imports
- WHAT: Ensure `tree_sitter` and `tree_sitter_python` are installed
- HOW: Check imports in `src/file_cache.py`
- CMD: `uv pip list | grep tree`
- [ ] Task 1.3: Verify ASTParser functionality
- WHERE: `src/file_cache.py`
- WHAT: Test get_skeleton() and get_curated_view()
- HOW: Use `manual-slop_py_get_definition` on ASTParser class
- OUTPUT: Document exact API
- [ ] Task 1.4: Review worker context injection
- WHERE: `src/multi_agent_conductor.py` `run_worker_lifecycle()`
- WHAT: Understand current context injection pattern
- HOW: Use `manual-slop_py_get_code_outline` on function
## Phase 2: Targeted Function Extraction
Focus: Extract only relevant functions from target files
- [ ] Task 2.1: Implement targeted extraction function
- WHERE: `src/file_cache.py` or new `src/context_pruner.py`
- WHAT: Function to extract specific functions by name
- HOW:
```python
def extract_functions(code: str, function_names: list[str]) -> str:
 parser = ASTParser("python")
 tree = parser.parse(code)
 # Walk AST, find function_definition nodes matching names
 # Return combined signatures + docstrings
```
- CODE STYLE: 1-space indentation
- [ ] Task 2.2: Add dependency traversal
- WHERE: Same as Task 2.1
- WHAT: Find functions called by target functions
- HOW: Parse function body for Call nodes, extract names
- SAFETY: Limit traversal depth to prevent explosion
- [ ] Task 2.3: Integrate with worker context
- WHERE: `src/multi_agent_conductor.py` `run_worker_lifecycle()`
- WHAT: Use targeted extraction when ticket has target_file
- HOW:
- Check if `ticket.target_file` matches a context file
- If so, use `extract_functions()` instead of full content
- Fall back to skeleton for other files
- SAFETY: Handle missing function names gracefully
## Phase 3: AST Caching
Focus: Cache parsed trees to avoid re-parsing
- [ ] Task 3.1: Implement AST cache in file_cache.py
- WHERE: `src/file_cache.py`
- WHAT: LRU cache for parsed AST trees
- HOW:
```python
from pathlib import Path
from typing import Any
from file_cache import ASTParser
_parser = ASTParser("python")
_ast_cache: dict[str, tuple[float, Any]] = {} # path -> (mtime, tree)
_CACHE_MAX_SIZE: int = 10
def get_cached_tree(path: str) -> Any:
 mtime = Path(path).stat().st_mtime
 if path in _ast_cache:
  cached_mtime, tree = _ast_cache[path]
  if cached_mtime == mtime:
   return tree
 code = Path(path).read_text()
 tree = _parser.parse(code)
 _ast_cache[path] = (mtime, tree)
 if len(_ast_cache) > _CACHE_MAX_SIZE:
  oldest = next(iter(_ast_cache)) # evict oldest entry (FIFO)
  del _ast_cache[oldest]
 return tree
```
 - SAFETY: Not thread-safe; call only from the single context-build thread
- [ ] Task 3.2: Use cache in skeleton generation
- WHERE: `src/file_cache.py`
- WHAT: Use cached tree instead of re-parsing
- HOW: Call `get_cached_tree()` in `get_skeleton()`
## Phase 4: Token Measurement
Focus: Measure and log token reduction
- [ ] Task 4.1: Add token counting to context injection
- WHERE: `src/multi_agent_conductor.py`
- WHAT: Count tokens before and after pruning
- HOW:
```python
def _count_tokens(text: str) -> int:
 return len(text) // 4 # Rough estimate
```
- SAFETY: Non-blocking, fast calculation
- [ ] Task 4.2: Log token reduction metrics
- WHERE: `src/multi_agent_conductor.py`
- WHAT: Log reduction percentage
 - HOW: Log `Context tokens: {before} -> {after} ({reduction_pct}% reduction)`
 - SAFETY: Route through session_logger for structured logging rather than bare print
- [ ] Task 4.3: Display in MMA dashboard (optional)
- WHERE: `src/gui_2.py` `_render_mma_dashboard()`
- WHAT: Show token reduction per worker
- HOW: Add to worker stream panel
- SAFETY: Optional enhancement
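The reduction metric in Tasks 4.1-4.2 is simple arithmetic; a sketch (guarding the zero-token case, which the plan does not mention):

```python
def reduction_pct(before: int, after: int) -> float:
 # Percentage of context tokens removed by pruning
 if before == 0:
  return 0.0
 return round(100.0 * (before - after) / before, 1)
```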
## Phase 5: Testing
Focus: Verify all functionality
- [ ] Task 5.1: Write targeted extraction tests
- WHERE: `tests/test_context_pruner.py` (new file)
- WHAT: Test extraction returns only specified functions
- HOW: Create test file with known functions, extract subset
- [ ] Task 5.2: Write integration test
- WHERE: `tests/test_context_pruner.py`
- WHAT: Run worker with skeleton context
- HOW: Use `live_gui` fixture with mock provider
- VERIFY: Worker completes ticket successfully
- [ ] Task 5.3: Performance test
- WHERE: `tests/test_context_pruner.py`
- WHAT: Verify parse time < 100ms
- HOW: Time parsing of various file sizes
- [ ] Task 5.4: Conductor - Phase Verification
- Run: `uv run pytest tests/test_context_pruner.py tests/test_ast_parser.py -v`
- Verify token reduction in logs
## Implementation Notes
### tree-sitter Pattern
- Already implemented in `file_cache.py`
- Language: `tree_sitter_python`
- Node types: `function_definition`, `class_definition`, `import_statement`
### Cache Strategy
- Key: file path (absolute)
- Value: (mtime, tree) tuple
- Eviction: LRU with max 10 entries
- Invalidation: mtime comparison
### Files Modified
- `src/file_cache.py`: Add cache, targeted extraction
- `src/multi_agent_conductor.py`: Use targeted extraction
- `tests/test_context_pruner.py`: New test file
### Code Style Checklist
- [ ] 1-space indentation throughout
- [ ] CRLF line endings on Windows
- [ ] No comments unless documenting API
- [ ] Type hints on all functions

View File

@@ -3,20 +3,126 @@
## Overview
Use tree_sitter to parse target file AST and inject condensed skeletons into worker prompts. Currently workers receive full file context; this track reduces token burn by injecting only relevant function/method signatures.
## Current State Audit
### Already Implemented (DO NOT re-implement)
#### ASTParser in file_cache.py (src/file_cache.py)
- **Uses tree-sitter** with `tree_sitter_python` language
- **`ASTParser.get_skeleton(code: str) -> str`**: Returns file with function bodies replaced by `...`
- **`ASTParser.get_curated_view(code: str) -> str`**: Enhanced skeleton preserving `@core_logic` and `# [HOT]` bodies
- **Pattern**: Parse → Walk AST → Identify function_definition nodes → Preserve signature/docstring, replace body
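The parse-walk-replace pattern can be illustrated with the stdlib `ast` module (the production ASTParser uses tree-sitter; this standalone sketch only mirrors the idea):

```python
import ast

def skeleton(code: str) -> str:
 # Keep top-level signatures, replace bodies with ...
 tree = ast.parse(code)
 lines: list[str] = []
 for node in tree.body:
  if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
   args = ", ".join(a.arg for a in node.args.args)
   lines.append(f"def {node.name}({args}): ...")
  elif isinstance(node, ast.ClassDef):
   lines.append(f"class {node.name}: ...")
 return "\n".join(lines)
```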
#### Worker Context Injection (multi_agent_conductor.py)
- **`run_worker_lifecycle()`** function handles context injection
- **First file**: Gets `get_curated_view()` (full hot paths)
- **Subsequent files**: Get `get_skeleton()` (signatures only)
- **`context_requirements`**: List of files from Ticket dataclass
#### MCP Tool Integration (mcp_client.py)
- **`py_get_skeleton()`**: Already exposes skeleton generation as tool
- **`py_get_code_outline()`**: Returns hierarchical outline with line ranges
- **Tools available to workers** for on-demand full reads
### Gaps to Fill (This Track's Scope)
- Workers still receive full first file in some cases
- No selective function extraction based on ticket target
- No caching of parsed ASTs (re-parse on each context build)
- Token reduction not measured/verified
## Architectural Constraints
### Parsing Performance
- AST parsing MUST complete in <100ms per file
- tree-sitter is already fast (C extension)
- Consider caching parsed trees in memory
### Skeleton Quality
- Must preserve enough context for worker to understand interface
- Must preserve docstrings for API documentation
- Must preserve type hints in signatures
### Worker Autonomy
- Workers MUST still be able to call `py_get_definition` for full source
- Skeleton is the default, not the only option
- Workers can request full reads on-demand
## Architecture Reference
### Key Integration Points
| File | Lines | Purpose |
|------|-------|---------|
| `src/file_cache.py` | 30-80 | `ASTParser` class with tree-sitter |
| `src/multi_agent_conductor.py` | 150-200 | `run_worker_lifecycle()` context injection |
| `src/models.py` | 30-50 | `Ticket.context_requirements` field |
| `src/mcp_client.py` | 200-250 | `py_get_skeleton()` MCP tool |
### tree-sitter Pattern (existing)
```python
from file_cache import ASTParser
parser = ASTParser("python")
tree = parser.parse(code)
skeleton = parser.get_skeleton(code)
curated = parser.get_curated_view(code)
```
## Functional Requirements
### FR1: Targeted Function Extraction
- Given a ticket's `target_file` and context, identify relevant functions
- Extract only those function signatures + docstrings
- Include imports and class definitions they depend on
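FR1's extraction can be sketched with the stdlib `ast` module (tree-sitter node types differ; the function name here is illustrative):

```python
import ast

def extract_named_functions(code: str, names: list[str]) -> str:
 # Return source for only the named top-level functions
 tree = ast.parse(code)
 keep = [node for node in tree.body
         if isinstance(node, ast.FunctionDef) and node.name in names]
 return "\n\n".join(ast.get_source_segment(code, node) or "" for node in keep)
```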
### FR2: Dependency Graph Traversal
- For target function, find all called functions
- Include signatures of dependencies (not full bodies)
- Limit depth to prevent explosion
### FR3: AST Caching
- Cache parsed AST trees per file path
- Invalidate cache when file mtime changes
- Use `file_cache` pattern already in place
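FR3's mtime-based invalidation, sketched standalone (`os.utime` pins the timestamps so the check does not depend on filesystem clock granularity; caching text rather than trees keeps the sketch self-contained):

```python
import os
import tempfile
from pathlib import Path

_cache: dict[str, tuple[float, str]] = {}

def get_cached_text(path: str) -> str:
 # Re-read only when the file's mtime changed since it was cached
 mtime = Path(path).stat().st_mtime
 hit = _cache.get(path)
 if hit is not None and hit[0] == mtime:
  return hit[1]
 text = Path(path).read_text()
 _cache[path] = (mtime, text)
 return text

with tempfile.TemporaryDirectory() as d:
 p = os.path.join(d, "mod.py")
 Path(p).write_text("def f(): ...")
 os.utime(p, (1000.0, 1000.0))
 first = get_cached_text(p)
 Path(p).write_text("def g(): ...")
 os.utime(p, (2000.0, 2000.0))
 second = get_cached_text(p)
```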
### FR4: Token Measurement
- Log token count before/after pruning
- Calculate reduction percentage
- Display in MMA dashboard or logs
## Non-Functional Requirements
| Requirement | Constraint |
|-------------|------------|
| Parse Time | <100ms per file |
| Memory | Cache size bounded (LRU, max 10 files) |
| Token Reduction | >50% for typical worker prompts |
## Testing Requirements
### Unit Tests
- Test targeted extraction returns only specified functions
- Test dependency traversal includes correct functions
- Test cache invalidation on file change
### Integration Tests
- Run worker with skeleton context, verify completion
- Compare token counts: full vs skeleton
- Verify worker can still call py_get_definition
### Performance Tests
- Measure parse time for files of various sizes
- Verify <100ms for files up to 1000 lines
## Out of Scope
- Non-Python file parsing (Python only for now)
- Cross-file dependency tracking
- Automatic relevance detection (manual target specification only)
## Acceptance Criteria
- [ ] Targeted function extraction works
- [ ] Token count reduced by >50% for typical prompts
- [ ] Workers complete tickets with skeleton-only context
- [ ] AST caching reduces re-parsing overhead
- [ ] Token reduction metrics logged
- [ ] >80% test coverage for new code
- [ ] 1-space indentation maintained

View File

@@ -1,21 +1,133 @@
# Track Specification: Kill/Abort Running Workers (kill_abort_workers_20260306)
## Overview
Add ability to kill/abort a running Tier 3 worker mid-execution. Currently workers run to completion in `run_in_executor()`; add cancel button with forced termination option.
## Current State Audit
### Already Implemented (DO NOT re-implement)
#### Worker Execution (multi_agent_conductor.py)
- **`run_worker_lifecycle()`**: Executes ticket via `loop.run_in_executor(None, ...)`
- **`ConductorEngine.run()`**: Main loop that spawns workers
- **No cancellation mechanism exists**
#### Thread Pool Pattern (app_controller.py / gui_2.py)
- Uses `threading.Thread(daemon=True)` for background work
- `asyncio.run_in_executor()` for blocking AI calls
- No thread references stored for later cancellation
#### GUI State (gui_2.py)
- **`mma_streams`**: Dict of worker output streams by tier
- **`active_tickets`**: List of currently active tickets
- **No per-worker thread tracking**
### Gaps to Fill (This Track's Scope)
- No way to track individual worker threads
- No cancellation signal mechanism
- No UI for kill/abort
- No cleanup on termination
## Architectural Constraints
- **Clean Termination**: Resources MUST be released properly.
- **No Zombies**: Terminated workers MUST not become zombie processes.
### Thread Safety
- Worker tracking MUST use thread-safe data structures
- Kill signal MUST be atomic (threading.Event)
- Status updates MUST be atomic
### Clean Termination
- Resources (file handles, network connections) MUST be released
- Partial results SHOULD be preserved
- No zombie processes
### AI Client Cancellation
- `ai_client.send()` is blocking - cannot be interrupted mid-call
- Kill can only happen between API calls or during tool execution
- Use `threading.Event` to signal abort between operations
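The between-operations abort check can be sketched with a plain `threading.Event` (the worker body is illustrative; the real checkpoints sit between API calls in `run_worker_lifecycle()`):

```python
import threading

def worker(abort: threading.Event, out: list[str]) -> None:
 for step in range(5):
  # Cooperative checkpoint: a blocking API call itself cannot be interrupted
  if abort.is_set():
   out.append("aborted")
   return
  out.append(f"step {step}")

out: list[str] = []
abort = threading.Event()
abort.set() # signal before start: worker exits at the first checkpoint
t = threading.Thread(target=worker, args=(abort, out), daemon=True)
t.start()
t.join(timeout=2)
```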
## Architecture Reference
### Key Integration Points
| File | Lines | Purpose |
|------|-------|---------|
| `src/multi_agent_conductor.py` | 150-250 | `run_worker_lifecycle()` - add abort check |
| `src/multi_agent_conductor.py` | 80-120 | `ConductorEngine.run()` - track workers |
| `src/dag_engine.py` | 50-80 | `ExecutionEngine` - add kill method |
| `src/models.py` | 30-50 | `Ticket` - add abort_event field |
| `src/gui_2.py` | 2700-2800 | `_render_mma_dashboard()` - add kill buttons |
### Proposed Worker Tracking Pattern
```python
# In ConductorEngine:
self._active_workers: dict[str, dict] = {} # ticket_id -> {thread, event, status}
# In Ticket (dataclass fields need a factory for mutable defaults):
_abort_event: threading.Event = field(default_factory=threading.Event)
# In run_worker_lifecycle, checked between operations:
if ticket._abort_event.is_set():
 return # Exit early
```
## Functional Requirements
### FR1: Worker Thread Tracking
- Store reference to each worker's thread
- Store abort Event per worker
- Track status: running, killing, killed
### FR2: Kill Button UI
- Button per active worker in MMA dashboard
- Confirmation dialog before kill
- Disabled if no workers running
### FR3: Abort Signal Mechanism
- `threading.Event` per ticket for abort signaling
- Worker checks event between operations
- AI client call cannot be interrupted (document limitation)
### FR4: Clean Cleanup
- Mark ticket as "killed" status
- Preserve partial output in stream
- Remove from active workers dict
## Non-Functional Requirements
| Requirement | Constraint |
|-------------|------------|
| Response Time | Kill takes effect within 1s of button press |
| No Deadlocks | Kill cannot cause system hang |
| Memory Safety | Worker resources freed after kill |
## Testing Requirements
### Unit Tests
- Test abort event stops worker at check point
- Test worker tracking dict updates correctly
- Test kill button enables/disables based on workers
### Integration Tests (via `live_gui` fixture)
- Start worker, click kill, verify termination
- Verify partial output preserved
- Verify no zombie threads
### Structural Testing Contract
- Use real threading - no mocking
- Test artifacts go to `tests/artifacts/`
## Out of Scope
- Force-killing AI API calls (API limitation)
- Kill and restart (separate track)
- Kill during PowerShell execution (separate concern)
## Acceptance Criteria
- [ ] Kill button visible per running worker
- [ ] Confirmation dialog appears
- [ ] Worker terminates within 1s of kill
- [ ] Partial output preserved in stream
- [ ] Resources cleaned up
- [ ] Status reflects "killed"
- [ ] No zombie threads after kill
- [ ] 1-space indentation maintained

View File

@@ -1,21 +1,129 @@
# Track Specification: Manual Block/Unblock Control (manual_block_control_20260306)
## Overview
Allow user to manually block or unblock tickets with custom reasons. Currently blocked tickets rely solely on dependency resolution; add manual override capability.
## Current State Audit
### Already Implemented (DO NOT re-implement)
#### Ticket Status (src/models.py)
- **`Ticket` dataclass** has `status` field: "todo" | "in_progress" | "completed" | "blocked"
- **`blocked_reason` field**: `Optional[str]` - exists but only set by dependency cascade
- **`mark_blocked(reason: str)` method**: Sets status="blocked", stores reason
#### DAG Blocking (src/dag_engine.py)
- **`cascade_blocks()` method**: Transitively marks tickets as blocked when dependencies are blocked
- **Dependency resolution**: Tickets blocked if any `depends_on` is not "completed"
- **No manual override exists**
#### GUI Display (src/gui_2.py)
- **`_render_ticket_dag_node()`**: Renders ticket nodes with status colors
- **Blocked nodes shown in distinct color**
- **No block/unblock buttons**
### Gaps to Fill (This Track's Scope)
- No way to manually set blocked status
- No way to add custom block reason
- No way to manually unblock (clear blocked status)
- Visual indicator for manual vs dependency blocking
## Architectural Constraints
- **Clear Indication**: Manual blocks MUST be visually distinct.
- **Audit Trail**: Block reason MUST be logged.
### DAG Validity
- Manual block MUST trigger cascade to downstream tickets
- Manual unblock MUST check dependencies are satisfied
- Cannot unblock if dependencies still blocked
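The unblock guard reduces to a dependency check; a sketch over plain dicts (the real code operates on `Ticket` objects):

```python
def can_unblock(ticket: dict, tickets: dict[str, dict]) -> bool:
 # Unblock is allowed only when every dependency has completed
 return all(tickets[d]["status"] == "completed" for d in ticket["depends_on"])
```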
### Audit Trail
- Block reason MUST be stored in Ticket
- Distinguish manual vs dependency blocking
### State Synchronization
- Block/unblock MUST update GUI immediately
- MUST persist to track state
## Architecture Reference
### Key Integration Points
| File | Lines | Purpose |
|------|-------|---------|
| `src/models.py` | 40-60 | `Ticket.mark_blocked()`, `blocked_reason` |
| `src/dag_engine.py` | 30-50 | `cascade_blocks()` - call after manual block |
| `src/gui_2.py` | 2700-2800 | `_render_ticket_dag_node()` - add buttons |
| `src/project_manager.py` | 238-260 | Track state persistence |
### Proposed Ticket Enhancement
```python
# Add to Ticket dataclass:
manual_block: bool = False # True if blocked manually, False if dependency
def mark_manual_block(self, reason: str) -> None:
 self.status = "blocked"
 self.blocked_reason = f"[MANUAL] {reason}"
 self.manual_block = True
def clear_manual_block(self) -> None:
 if self.manual_block:
  self.status = "todo"
  self.blocked_reason = None
  self.manual_block = False
```
## Functional Requirements
### FR1: Block Button
- Button on each ticket node to block
- Opens text input for block reason
- Sets `manual_block=True`, calls `mark_manual_block()`
### FR2: Unblock Button
- Button on blocked tickets to unblock
- Only enabled if dependencies are satisfied
- Clears manual block, sets status to "todo"
### FR3: Reason Display
- Show block reason on hover or in node
- Different visual for manual vs dependency block
- Show "[MANUAL]" prefix for manual blocks
### FR4: Cascade Integration
- Manual block triggers `cascade_blocks()`
- Manual unblock recalculates blocked status
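FR4's cascade is a fixed-point propagation; a standalone sketch over plain dicts (the real `cascade_blocks()` lives in `src/dag_engine.py`):

```python
def cascade_blocks(tickets: dict[str, dict]) -> None:
 # Propagate blocked status until no ticket changes
 changed = True
 while changed:
  changed = False
  for t in tickets.values():
   if t["status"] != "blocked" and any(
     tickets[d]["status"] == "blocked" for d in t["depends_on"]):
    t["status"] = "blocked"
    changed = True
```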
## Non-Functional Requirements
| Requirement | Constraint |
|-------------|------------|
| Response Time | Block/unblock takes effect immediately |
| Persistence | Block state saved to track state |
| Visual Clarity | Manual blocks clearly distinguished |
## Testing Requirements
### Unit Tests
- Test `mark_manual_block()` sets correct fields
- Test `clear_manual_block()` restores todo status
- Test cascade after manual block
### Integration Tests (via `live_gui` fixture)
- Block ticket via GUI, verify status changes
- Unblock ticket, verify status restored
- Verify cascade affects downstream tickets
## Out of Scope
- Blocking during execution (kill first, then block)
- Scheduled/conditional blocking
- Block templates
## Acceptance Criteria
- [ ] Block button on each ticket
- [ ] Unblock button on blocked tickets
- [ ] Reason input saves to ticket
- [ ] Visual indicator distinguishes manual vs dependency
- [ ] Reason displayed in UI
- [ ] Cascade triggered on block/unblock
- [ ] State persisted to track state
- [ ] 1-space indentation maintained

View File

@@ -3,19 +3,34 @@
## Overview
Add UI controls to manually flag files for skeleton injection in discussions. Allow agent to request full file reads or specific def/class definitions on-demand.
## Architectural Constraints
- **Fast Generation**: Skeletons MUST generate in <500ms.
- **Non-Blocking**: Generation MUST NOT block UI.
## Current State Audit
### Already Implemented (DO NOT re-implement)
- **`file_cache.ASTParser.get_skeleton()`**: Returns Python skeleton
- **`mcp_client.py_get_skeleton()`**: MCP tool for skeleton generation
- **`aggregate.py`**: Builds file items context
### Gaps to Fill
- No UI to flag files for skeleton vs full
- No preview of skeleton before injection
- No on-demand definition lookup
## Functional Requirements
- File picker UI in discussion panel
- Skeleton preview before injection
- Toggle: skeleton vs full file
- Uses existing `py_get_skeleton()` tool
## Key Integration Points
| File | Purpose |
|-----|---------|
| `src/gui_2.py` | File picker, injection UI |
| `src/mcp_client.py` | `py_get_skeleton()` |
| `src/file_cache.py` | `ASTParser` |
## Acceptance Criteria
- [ ] File picker UI functional
- [ ] Skeleton preview displays
- [ ] Manual refresh button works
- [ ] Full read option available
- [ ] 1-space indentation

View File

@@ -3,19 +3,135 @@
## Overview
Split-view GUI for parallel worker streams per tier. Visualize multiple concurrent workers with individual status, output tabs, and resource usage. Enable kill/restart per worker.
## Current State Audit
### Already Implemented (DO NOT re-implement)
#### Worker Streams (gui_2.py)
- **`mma_streams` dict**: `{stream_key: output_text}` - stores worker output
- **`_render_tier_stream_panel()`**: Renders single stream panel
- **Stream keys**: `"Tier 1"`, `"Tier 2"`, `"Tier 3"`, `"Tier 4"`
#### MMA Dashboard (gui_2.py)
- **`_render_mma_dashboard()`**: Displays tier usage table, ticket DAG
- **`active_tickets`**: List of currently active tickets
- **No multi-worker display**
#### DAG Execution (dag_engine.py, multi_agent_conductor.py)
- **Sequential execution**: Workers run one at a time
- **No parallel execution**: `run_in_executor` is used, but workers are awaited sequentially
- **See**: `true_parallel_worker_execution_20260306` for parallel implementation
### Gaps to Fill (This Track's Scope)
- No visualization for concurrent workers
- No per-worker status display
- No independent output scrolling per worker
- No per-worker kill buttons
## Architectural Constraints
### Stream Performance
- Multiple concurrent streams MUST NOT degrade UI
- Each stream renders only when visible
- Old output MUST be pruned (memory bound)
### Memory Efficiency
- Stream output buffer limited per worker (e.g., 10KB max)
- Prune oldest lines when buffer exceeded
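A possible pruning helper for the 10KB cap (function and constant names here are hypothetical):

```python
MAX_STREAM_BYTES = 10 * 1024 # per-worker output cap

def append_pruned(buffer: str, new_text: str, limit: int = MAX_STREAM_BYTES) -> str:
 """Append new output, then drop oldest whole lines until under the byte limit."""
 combined = buffer + new_text
 while len(combined.encode("utf-8")) > limit:
  head, sep, rest = combined.partition("\n")
  if not sep: # a single oversized line: hard-truncate from the front
   combined = combined[-limit:]
   break
  combined = rest
 return combined
```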
### State Synchronization
- Stream updates via `_pending_gui_tasks` pattern
- Thread-safe append to stream dict
## Architecture Reference
### Key Integration Points
| File | Lines | Purpose |
|------|-------|---------|
| `src/gui_2.py` | 2500-2600 | `mma_streams` dict, stream rendering |
| `src/gui_2.py` | 2650-2750 | `_render_mma_dashboard()` |
| `src/multi_agent_conductor.py` | 100-150 | Worker stream output |
| `src/dag_engine.py` | 80-100 | Execution state |
### Proposed Multi-Worker Stream Structure
```python
# Enhanced mma_streams structure:
mma_streams: dict[str, dict[str, Any]] = {
 "worker-001": {
  "tier": "Tier 3",
  "ticket_id": "T-001",
  "status": "running", # running | completed | failed | killed
  "output": "...",
  "started_at": time.time(),
  "thread_id": 12345,
 },
 "worker-002": {
  "tier": "Tier 3",
  "ticket_id": "T-002",
  "status": "running",
  ...
 },
}
```
## Functional Requirements
### FR1: Multi-Pane Layout
- Split view showing all active workers
- Use `imgui.columns()` or child windows
- Show worker ID, tier, ticket ID, status
### FR2: Per-Worker Status
- Display: running, completed, failed, killed
- Color-coded status indicators
- Show elapsed time for running workers
### FR3: Output Tabs
- Each worker has scrollable output area
- Independent scroll position per tab
- Auto-scroll option for active workers
### FR4: Per-Worker Kill
- Kill button on each worker panel
- Confirmation before kill
- Status updates to "killed" after termination
## Non-Functional Requirements
| Requirement | Constraint |
|-------------|------------|
| Concurrent Workers | Support 4+ workers displayed |
| Memory per Stream | Max 10KB output buffer |
| Frame Rate | 60fps with 4 workers |
## Testing Requirements
### Unit Tests
- Test stream dict structure
- Test output pruning at buffer limit
- Test status updates
### Integration Tests (via `live_gui` fixture)
- Start multiple workers, verify all displayed
- Kill one worker, verify others continue
- Verify scroll independence
## Dependencies
- **Depends on**: `true_parallel_worker_execution_20260306` (for actual parallel execution)
- This track provides visualization only
## Out of Scope
- Actual parallel execution (separate track)
- Worker restart (separate track)
- Historical worker data
## Acceptance Criteria
- [ ] 4+ concurrent workers displayed simultaneously
- [ ] Each worker shows individual status
- [ ] Output streams scroll independently
- [ ] Kill button terminates specific worker
- [ ] Status updates in real-time
- [ ] Memory bounded per stream
- [ ] 1-space indentation maintained

View File

@@ -1,21 +1,162 @@
# Track Specification: Native Orchestrator (native_orchestrator_20260306)
## Overview
Absorb `mma_exec.py` functionality into core application. Manual Slop natively reads/writes plan.md, manages metadata.json, and orchestrates MMA tiers in pure Python without external CLI subprocess calls.
## Current State Audit
### Already Implemented (DO NOT re-implement)
#### mma_exec.py (scripts/mma_exec.py)
- **CLI wrapper**: Parses `--role` argument, builds prompt, calls AI
- **Model selection**: Maps role to model (tier3-worker → gemini-2.5-flash-lite)
- **Subprocess execution**: Spawns new Python process for each delegation
- **Logging**: Writes to `logs/agents/` directory
#### ConductorEngine (src/multi_agent_conductor.py)
- **`run()` method**: Executes tickets via `run_worker_lifecycle()`
- **`run_worker_lifecycle()`**: Calls `ai_client.send()` directly
- **In-process execution**: Workers run in same process (thread pool)
#### orchestrator_pm.py (src/orchestrator_pm.py)
- **`scan_work_summary()`**: Reads conductor/archive/ and conductor/tracks/
- **Uses hardcoded `CONDUCTOR_PATH`**: Addressed in conductor_path_configurable track
#### project_manager.py (src/project_manager.py)
- **`save_track_state()`**: Writes state.toml
- **`load_track_state()`**: Reads state.toml
- **`get_all_tracks()`**: Scans tracks directory
### Gaps to Fill (This Track's Scope)
- No native plan.md parsing/writing
- No native metadata.json management in ConductorEngine
- External mma_exec.py still used for some operations
- No unified orchestration interface
## Architectural Constraints
### Backward Compatibility
- Existing track files MUST remain loadable
- mma_exec.py CLI MUST still work (as wrapper)
- No breaking changes to file formats
### Single Process
- All tier execution in same process
- Use threading, not multiprocessing
- Shared ai_client state (with locks)
### Error Propagation
- Tier errors MUST propagate to caller
- No silent failures
- Structured error reporting
## Architecture Reference
### Key Integration Points
| File | Lines | Purpose |
|------|-------|---------|
| `src/orchestrator_pm.py` | 10-50 | `scan_work_summary()` |
| `src/multi_agent_conductor.py` | 100-250 | `ConductorEngine`, `run_worker_lifecycle()` |
| `src/conductor_tech_lead.py` | 10-50 | `generate_tickets()` |
| `src/project_manager.py` | 238-310 | Track state CRUD |
| `scripts/mma_exec.py` | 1-200 | Current CLI wrapper |
### Proposed Native Orchestration Module
```python
# src/native_orchestrator.py (new file)
from pathlib import Path

from src import ai_client
from src import conductor_tech_lead
from src import multi_agent_conductor
from src.models import Ticket, Track

class NativeOrchestrator:
 def __init__(self, base_dir: str = "."):
  self.base_dir = Path(base_dir)
  self._conductor: multi_agent_conductor.ConductorEngine | None = None

 def load_track(self, track_id: str) -> Track:
  """Load track from state.toml or metadata.json"""
  ...

 def save_track(self, track: Track) -> None:
  """Persist track state"""
  ...

 def execute_track(self, track: Track) -> None:
  """Execute all tickets in track"""
  ...

 def generate_tickets_for_track(self, brief: str) -> list[Ticket]:
  """Tier 2: Generate tickets from brief"""
  ...

 def execute_ticket(self, ticket: Ticket) -> str:
  """Tier 3: Execute single ticket"""
  ...

 def analyze_error(self, error: str) -> str:
  """Tier 4: Analyze error"""
  ...
```
## Functional Requirements
### FR1: Plan.md CRUD
- `read_plan(track_id) -> str`: Read plan.md content
- `write_plan(track_id, content)`: Write plan.md content
- `parse_plan_tasks(content) -> list[dict]`: Extract task checkboxes
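A sketch of `parse_plan_tasks`, assuming GFM-style `- [ ]` / `- [x]` checkbox lines (the returned dict shape is an assumption):

```python
import re

_TASK_RE = re.compile(r"^\s*-\s*\[([ xX])\]\s*(.+)$")

def parse_plan_tasks(content: str) -> list[dict]:
 """Extract '- [ ]' task checkboxes from plan.md content."""
 tasks = []
 for line_no, line in enumerate(content.splitlines(), start=1):
  m = _TASK_RE.match(line)
  if m:
   tasks.append({
    "line": line_no,
    "done": m.group(1).lower() == "x",
    "text": m.group(2).strip(),
   })
 return tasks
```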
### FR2: Metadata Management
- `read_metadata(track_id) -> Metadata`: Load metadata.json
- `write_metadata(track_id, metadata)`: Save metadata.json
- `create_metadata(track_id, name) -> Metadata`: Create new metadata
### FR3: Tier Delegation (In-Process)
- **Tier 1**: Call `orchestrator_pm` functions directly
- **Tier 2**: Call `conductor_tech_lead.generate_tickets()` directly
- **Tier 3**: Call `ai_client.send()` directly in thread
- **Tier 4**: Call `ai_client.run_tier4_analysis()` directly
### FR4: CLI Fallback
- `mma_exec.py` becomes thin wrapper around `NativeOrchestrator`
- Maintains backward compatibility for external tools
## Non-Functional Requirements
| Requirement | Constraint |
|-------------|------------|
| Latency | <10ms overhead vs subprocess |
| Memory | No additional per-tier overhead |
| Compatibility | 100% file format compatible |
## Testing Requirements
### Unit Tests
- Test plan.md parsing
- Test metadata.json read/write
- Test tier delegation calls correct functions
### Integration Tests
- Load existing track, verify compatibility
- Execute track end-to-end without subprocess
- Verify mma_exec.py wrapper still works
## Dependencies
- **Depends on**: `conductor_path_configurable_20260306` for path resolution
## Out of Scope
- Distributed orchestration
- Persistent worker processes
- Hot-reload of track state
## Acceptance Criteria
- [ ] plan.md read/write works natively
- [ ] metadata.json managed in Python
- [ ] Tier delegation executes in-process
- [ ] No external CLI required for orchestration
- [ ] Existing tracks remain loadable
- [ ] mma_exec.py wrapper still works
- [ ] 1-space indentation maintained

View File

@@ -1,21 +1,33 @@
# Track Specification: On-Demand Definition Lookup (on_demand_def_lookup_20260306)
## Overview
Add ability for agent to request specific class/function definitions during discussion. Parse @symbol syntax to trigger lookup.
## Architectural Constraints
- **Fast Lookup**: Definition lookup MUST complete in <100ms.
- **Accurate Parsing**: Symbol parsing MUST handle edge cases.
## Current State Audit
### Already Implemented
- **`mcp_client.py_get_definition()`**: Returns full definition source
- **`outline_tool.py`**: Code outlining
### Gaps to Fill
- No @symbol parsing in discussion input
- No inline definition display
- No click-to-source navigation
## Functional Requirements
- Parse `@ClassName` or `@function_name` in input
- Display definition inline in discussion
- Click to jump to source file
- Uses existing `py_get_definition()` tool
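The `@symbol` extraction might be a single regex pass (the pattern and helper name are assumptions; dotted paths like `@module.func` are allowed, with sentence-ending periods stripped):

```python
import re

_SYMBOL_RE = re.compile(r"@([A-Za-z_][A-Za-z0-9_.]*)")

def extract_symbols(text: str) -> list[str]:
 """Return unique @-mentioned identifiers in input order."""
 seen: list[str] = []
 for raw in _SYMBOL_RE.findall(text):
  name = raw.rstrip(".") # drop trailing sentence punctuation
  if name and name not in seen:
   seen.append(name)
 return seen
```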
## Key Integration Points
| File | Purpose |
|-----|---------|
| `src/gui_2.py` | Input parsing, definition display |
| `src/mcp_client.py` | `py_get_definition()` |
## Acceptance Criteria
- [ ] @symbol triggers lookup
- [ ] Definition displays inline
- [ ] Click navigation functional
- [ ] 1-space indentation

View File

@@ -1,21 +1,37 @@
# Track Specification: Per-Ticket Model Override (per_ticket_model_20260306)
## Overview
Allow user to manually select which model to use for a specific ticket, overriding the default tier model.
## Architectural Constraints
- **Validation**: Selected model MUST be valid and available.
- **Clear Override**: Override MUST be visually distinct.
## Current State Audit
### Already Implemented
- **`models.Ticket`**: Has no model_override field
- **`multi_agent_conductor.py`**: Uses fixed model per tier
- **`ai_client.py`**: `set_provider()`, `set_model()` functions
### Gaps to Fill
- No model_override field on Ticket
- No UI for model selection per ticket
- No override indicator in GUI
## Functional Requirements
- Add `model_override: Optional[str]` to Ticket dataclass
- Model dropdown in ticket UI
- Visual indicator when override active
- Reset button to clear override
## Key Integration Points
| File | Purpose |
|-----|---------|
| `src/models.py` | Add model_override field |
| `src/gui_2.py` | Model dropdown UI |
| `src/multi_agent_conductor.py` | Use override at execution |
## Acceptance Criteria
- [ ] Model dropdown works
- [ ] Override saves correctly
- [ ] Visual indicator shows override
- [ ] Reset returns to default
- [ ] Override used during execution
- [ ] 1-space indentation

View File

@@ -1,21 +1,39 @@
# Track Specification: Performance Dashboard (performance_dashboard_20260306)
## Overview
Expand performance metrics panel with CPU/RAM graphs, frame time histogram. Uses existing `performance_monitor.py`.
## Current State Audit
### Already Implemented
- **`src/performance_monitor.py`**: `PerformanceMonitor` class
- **`get_metrics()`**: Returns FPS, frame time, CPU, input lag
- **Basic display in GUI diagnostics**
### Gaps to Fill
- No historical graphs
- No rolling window storage
- No frame time histogram
## Functional Requirements
- Rolling window of metrics (deque with maxlen)
- Line graphs for CPU/RAM over time
- Frame time histogram
- Uses existing `PerformanceMonitor.get_metrics()`
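The rolling window could be one bounded deque per metric (a sketch; the metric key names are assumptions about the `get_metrics()` dict):

```python
from collections import deque

MAX_POINTS = 100 # memory bound: max data points per metric

class MetricsHistory:
 def __init__(self, maxlen: int = MAX_POINTS) -> None:
  self.cpu = deque(maxlen=maxlen)
  self.ram = deque(maxlen=maxlen)
  self.frame_ms = deque(maxlen=maxlen)

 def record(self, metrics: dict) -> None:
  """Append one sample from a get_metrics()-style dict."""
  self.cpu.append(metrics.get("cpu_percent", 0.0))
  self.ram.append(metrics.get("ram_mb", 0.0))
  self.frame_ms.append(metrics.get("frame_time_ms", 0.0))
```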
## Key Integration Points
| File | Purpose |
|-----|---------|
| `src/performance_monitor.py` | Add history storage |
| `src/gui_2.py` | Graph rendering |
## Architectural Constraints
- 60fps during graph rendering
- Memory bounded (max 100 data points)
## Acceptance Criteria
- [ ] CPU graph shows rolling history
- [ ] RAM graph shows rolling history
- [ ] Frame time histogram displays
- [ ] Input lag metrics tracked
- [ ] 1-space indentation

View File

@@ -1,21 +1,129 @@
# Track Specification: Pipeline Pause/Resume (pipeline_pause_resume_20260306)
## Overview
Add global pause/resume for entire DAG execution pipeline. Allow user to freeze all worker activity and resume later without losing state.
## Current State Audit
### Already Implemented (DO NOT re-implement)
#### Execution Loop (multi_agent_conductor.py)
- **`ConductorEngine.run()`**: Async loop that processes tickets
- **Loop continues until**: All complete OR all blocked OR error
- **No pause mechanism**
#### Execution Engine (dag_engine.py)
- **`ExecutionEngine.tick()`**: Returns ready tasks
- **`auto_queue` flag**: Controls automatic task promotion
- **No global pause state**
#### GUI State (gui_2.py)
- **`mma_status`**: "idle" | "planning" | "executing" | "done"
- **No paused state**
### Gaps to Fill (This Track's Scope)
- No way to pause execution mid-pipeline
- No way to resume from paused state
- No visual indicator for paused state
## Architectural Constraints
### State Preservation
- Running workers MUST complete before pause takes effect
- Paused state MUST preserve all ticket statuses
- No data loss on resume
### Atomic Operation
- Pause MUST be atomic (all-or-nothing)
- No partial pause state
### Non-Blocking
- Pause request MUST NOT block GUI thread
- Pause signaled via threading.Event
## Architecture Reference
### Key Integration Points
| File | Lines | Purpose |
|------|-------|---------|
| `src/multi_agent_conductor.py` | 80-150 | `ConductorEngine.run()` - add pause check |
| `src/dag_engine.py` | 50-80 | `ExecutionEngine` - add pause state |
| `src/gui_2.py` | ~170 | State for pause flag |
| `src/gui_2.py` | 2650-2750 | `_render_mma_dashboard()` - add pause button |
### Proposed Pause Pattern
```python
# In ConductorEngine:
self._pause_event: threading.Event = threading.Event()

def pause(self) -> None:
 self._pause_event.set()

def resume(self) -> None:
 self._pause_event.clear()

# In run() loop:
async def run(self):
 while True:
  if self._pause_event.is_set():
   await asyncio.sleep(0.5) # Wait while paused
   continue
  # Normal processing...
```
## Functional Requirements
### FR1: Pause Button
- Button in MMA dashboard
- Disabled when no execution active
- Click triggers `engine.pause()`
### FR2: Resume Button
- Button in MMA dashboard (replaces pause when paused)
- Disabled when not paused
- Click triggers `engine.resume()`
### FR3: Visual Indicator
- Banner or icon when paused
- `mma_status` shows "paused"
- Ticket status preserved
### FR4: State Display
- Show which workers were running when paused
- Show pending tasks that will resume
## Non-Functional Requirements
| Requirement | Constraint |
|-------------|------------|
| Response Time | Pause takes effect within 500ms |
| No Data Loss | All state preserved |
| Visual Feedback | Clear paused indicator |
## Testing Requirements
### Unit Tests
- Test pause stops task spawning
- Test resume continues from correct state
- Test state preserved across pause
### Integration Tests (via `live_gui` fixture)
- Start execution, pause, verify workers stop
- Resume, verify execution continues
- Verify no state loss
## Out of Scope
- Per-ticket pause (all-or-nothing only)
- Scheduled pause
- Pause during individual API call
## Acceptance Criteria
- [ ] Pause button freezes pipeline
- [ ] Resume button continues execution
- [ ] Visual indicator shows paused state
- [ ] Worker states preserved
- [ ] No data loss on resume
- [ ] `mma_status` includes "paused"
- [ ] 1-space indentation maintained

View File

@@ -1,21 +1,35 @@
# Track Specification: Session Insights & Efficiency Scores (session_insights_20260306)
## Overview
Token usage over time, cost projections, session summary with efficiency scores.
## Architectural Constraints
- **Efficient Calculation**: Metrics MUST be calculated incrementally.
- **Real-Time**: Updates SHOULD reflect current session state.
## Current State Audit
### Already Implemented
- **`session_logger.py`**: Logs comms, tool calls, API hooks
- **`ai_client.get_comms_log()`**: Returns API interaction history
- **`cost_tracker.estimate_cost()`**: Cost calculation
### Gaps to Fill
- No token timeline visualization
- No cost projection
- No efficiency score calculation
## Functional Requirements
- Token usage graph over session
- Cost projection based on burn rate
- Efficiency score (tokens per useful change)
- Session summary text
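Cost projection from the burn rate can be a linear extrapolation (sketch; the function name and units are assumptions):

```python
def project_session_cost(cost_so_far: float, elapsed_s: float, horizon_s: float) -> float:
 """Extrapolate total spend after horizon_s more seconds at the current burn rate."""
 if elapsed_s <= 0:
  return cost_so_far
 burn_rate = cost_so_far / elapsed_s # dollars per second
 return cost_so_far + burn_rate * horizon_s
```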
## Key Integration Points
| File | Purpose |
|-----|---------|
| `src/session_logger.py` | Read session data |
| `src/gui_2.py` | Timeline rendering |
## Acceptance Criteria
- [ ] Token timeline renders
- [ ] Cost projection calculated
- [ ] Efficiency score shown
- [ ] Summary displays key metrics
- [ ] 1-space indentation

View File

@@ -1,21 +1,41 @@
# Track Specification: Manual Ticket Queue Management (ticket_queue_mgmt_20260306)
## Overview
Allow user to manually reorder, prioritize, or requeue tickets. Add drag-drop, priority tags, bulk selection.
## Current State Audit
### Already Implemented
- **`models.Ticket`**: No priority field
- **`dag_engine.py`**: `get_ready_tasks()` returns in order
- **GUI**: Linear ticket list display
### Gaps to Fill
- No priority field on Ticket
- No drag-drop reordering
- No bulk selection/operations
## Functional Requirements
- Add `priority: str = "medium"` to Ticket (high/medium/low)
- Drag-drop reordering in ticket list
- Multi-select for bulk operations
- Bulk execute/skip/block
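Priority-aware ordering can be layered on top of `get_ready_tasks()` as a stable sort, so dependency-valid order is preserved within each priority band (sketch; assumes the proposed `priority` field):

```python
_PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}

def order_by_priority(ready_tickets: list) -> list:
 """Stable sort: high first; original order kept within a rank."""
 return sorted(
  ready_tickets,
  key=lambda t: _PRIORITY_RANK.get(getattr(t, "priority", "medium"), 1),
 )
```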
## Key Integration Points
| File | Purpose |
|-----|---------|
| `src/models.py` | Add priority field |
| `src/gui_2.py` | Drag-drop, bulk UI |
| `src/dag_engine.py` | Priority-aware ordering |
## Architectural Constraints
- DAG validity maintained after reorder
- Dependency order cannot be violated
## Acceptance Criteria
- [ ] Drag-drop reordering works
- [ ] Priority tags display and save
- [ ] Multi-select functional
- [ ] Bulk actions apply correctly
- [ ] DAG validity maintained
- [ ] 1-space indentation

View File

@@ -1,22 +1,142 @@
# Track Specification: Advanced Tier 4 QA Auto-Patching (tier4_auto_patching_20260306)
## Overview
Elevate Tier 4 from log summarizer to auto-patcher. When verification tests fail, Tier 4 generates a unified diff patch. GUI displays side-by-side diff; user clicks Apply Patch to resume pipeline.
## Current State Audit
### Already Implemented (DO NOT re-implement)
#### Tier 4 Analysis (ai_client.py)
- **`run_tier4_analysis(stderr: str) -> str`**: Analyzes error, returns summary
- **Prompt**: Uses `mma_prompts.PROMPTS["tier4_error_triage"]`
- **Output**: Text analysis, no code generation
#### Error Interception (shell_runner.py)
- **`run_powershell()`**: Accepts `qa_callback` parameter
- **On failure**: Calls `qa_callback(stderr)` and appends to output
- **Integrated**: `ai_client._run_script()` passes `qa_callback`
#### MCP Tools (mcp_client.py)
- **`set_file_slice()`**: Replace line range in file
- **`py_update_definition()`**: Replace class/function via AST
- **`edit_file()`**: String replacement in file
- **No diff generation or patch application**
### Gaps to Fill (This Track's Scope)
- Tier 4 doesn't generate patches
- No diff visualization in GUI
- No patch application mechanism
- No rollback capability
## Architectural Constraints
### Safe Preview
- Patches MUST be previewed before application
- User MUST see exactly what will change
- No automatic application without approval
### Atomic Application
- Patch applies all changes or none
- If partial application fails, rollback
### Rollback Support
- Backup created before patch
- User can undo applied patch
- Backup stored temporarily
## Architecture Reference
### Key Integration Points
| File | Lines | Purpose |
|------|-------|---------|
| `src/ai_client.py` | ~700-750 | `run_tier4_analysis()` |
| `src/shell_runner.py` | 50-100 | `run_powershell()` with qa_callback |
| `src/mcp_client.py` | 300-350 | `set_file_slice()`, `edit_file()` |
| `src/gui_2.py` | 2400-2500 | Confirmation dialogs pattern |
### Proposed Patch Workflow
```
1. Test fails → stderr captured
2. Tier 4 analyzes → generates unified diff
3. GUI shows diff viewer with Apply/Reject buttons
4. User clicks Apply:
   a. Backup original file(s)
   b. Apply patch via subprocess or difflib
   c. Verify patch applied cleanly
   d. If fails, restore from backup
5. Pipeline resumes with patched code
```
### Unified Diff Format
```diff
--- a/src/target_file.py
+++ b/src/target_file.py
@@ -10,5 +10,6 @@
 def existing_function():
-    old_line
+    new_line
+    additional_line
```
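Generating that format from old/new file text is straightforward with the stdlib `difflib` module (sketch; the helper name is an assumption):

```python
import difflib

def make_unified_diff(old: str, new: str, rel_path: str) -> str:
 """Produce a diff -u style patch for a single file."""
 lines = difflib.unified_diff(
  old.splitlines(keepends=True),
  new.splitlines(keepends=True),
  fromfile=f"a/{rel_path}",
  tofile=f"b/{rel_path}",
 )
 return "".join(lines)
```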
## Functional Requirements
### FR1: Patch Generation
- Tier 4 prompt enhanced to generate unified diff
- Output format: standard `diff -u` format
- Include file path in diff header
- Multiple files supported
### FR2: Diff Viewer GUI
- Side-by-side or unified view
- Color-coded additions (green) and deletions (red)
- Line numbers visible
- Scrollable for large diffs
### FR3: Apply Button
- Creates backup: `file.py.backup`
- Applies patch: `patch -p1 < diff.patch` or Python difflib
- Verifies success
- Shows confirmation or error
### FR4: Rollback
- Restore from backup if patch fails
- Manual rollback button after successful patch
- Backup deleted after explicit user action
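The backup/apply/rollback cycle from FR3 and FR4 might look like this (a sketch under assumptions: whole-file replacement instead of true hunk application, and `.backup` sibling files instead of a temp dir):

```python
import shutil
from pathlib import Path

def apply_with_backup(target: Path, patched_content: str) -> Path:
 """Back up target, write the patched content, and return the backup path."""
 backup = target.with_suffix(target.suffix + ".backup")
 shutil.copy2(target, backup)
 try:
  target.write_text(patched_content, encoding="utf-8")
 except OSError:
  shutil.copy2(backup, target) # rollback on failed write
  raise
 return backup

def rollback(target: Path, backup: Path) -> None:
 """Restore the original file and discard the backup."""
 shutil.copy2(backup, target)
 backup.unlink()
```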
## Non-Functional Requirements
| Requirement | Constraint |
|-------------|------------|
| Patch Generation | <5s for typical errors |
| Diff Rendering | <100ms for 100-line diff |
| Backup Storage | Temp dir, cleaned on exit |
## Testing Requirements
### Unit Tests
- Test diff generation format
- Test patch application logic
- Test backup/rollback
### Integration Tests (via `live_gui` fixture)
- Trigger test failure, verify patch generated
- Apply patch, verify file changed correctly
- Rollback, verify file restored
## Out of Scope
- Automatic patch application (always requires approval)
- Patch conflict resolution (reject if conflict)
- Multi-file patch coordination
## Acceptance Criteria
- [ ] Tier 4 generates valid unified diff on test failure
- [ ] GUI displays readable side-by-side diff
- [ ] User can approve/reject patch
- [ ] Approved patches applied correctly
- [ ] Rollback available on failure
- [ ] Backup files cleaned up
- [ ] 1-space indentation maintained


@@ -1,21 +1,35 @@
# Track Specification: Tool Usage Analytics (tool_usage_analytics_20260306)
## Overview
Analytics panel showing most-used tools, average execution time, and failure rates, built from existing tool_log_callback data.
## Architectural Constraints
- **Efficient Aggregation**: Analytics MUST use efficient data structures.
- **Memory Bounds**: History MUST be bounded to prevent unbounded memory growth.
## Current State Audit
### Already Implemented
- **`ai_client.tool_log_callback`**: Called when tool executes
- **`mcp_client.dispatch()`**: Routes tool calls
- **No aggregation or storage**
### Gaps to Fill
- No tool usage tracking
- No execution time tracking
- No failure rate tracking
## Functional Requirements
- Track tool name, execution time, success/failure
- Aggregate by tool name
- Display ranking by usage count
- Show average time per tool
- Show failure percentage
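The aggregation can be sketched with a bounded deque plus running totals (the record signature is an assumption about what the tool-log hook provides; class and method names are illustrative):

```python
from collections import defaultdict, deque

class ToolStats:
 """Per-tool usage aggregates with a bounded event history."""
 def __init__(self, max_events: int = 1000):
  self.events = deque(maxlen=max_events)  # bounds memory growth
  self.totals = defaultdict(lambda: {"count": 0, "time": 0.0, "failures": 0})

 def record(self, tool: str, elapsed: float, ok: bool) -> None:
  self.events.append((tool, elapsed, ok))
  t = self.totals[tool]
  t["count"] += 1
  t["time"] += elapsed
  if not ok:
   t["failures"] += 1

 def ranking(self):
  """Tool names sorted by usage count, most-used first."""
  return sorted(self.totals, key=lambda n: self.totals[n]["count"], reverse=True)

 def avg_time(self, tool: str) -> float:
  t = self.totals[tool]
  return t["time"] / t["count"] if t["count"] else 0.0

 def failure_rate(self, tool: str) -> float:
  t = self.totals[tool]
  return t["failures"] / t["count"] if t["count"] else 0.0
```

The totals dict stays small (one entry per tool name) while the deque caps raw event history, satisfying the memory-bounds constraint.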
## Key Integration Points
| File | Purpose |
|-----|---------|
| `src/ai_client.py` | Hook into tool_log_callback |
| `src/gui_2.py` | Analytics panel rendering |
## Acceptance Criteria
- [ ] Tool ranking displayed
- [ ] Average times accurate
- [ ] Failure rates tracked
- [ ] 1-space indentation


@@ -1,21 +1,34 @@
# Track Specification: Track Progress Visualization (track_progress_viz_20260306)
## Overview
Progress bars and percentage completion for active tracks and tickets.
## Architectural Constraints
- **Accurate State**: Progress MUST reflect actual ticket status.
- **Efficient Updates**: Status changes MUST trigger efficient UI updates.
## Current State Audit
### Already Implemented
- **`models.Track`**: Has tickets list
- **`project_manager.get_all_tracks()`**: Returns track progress
- **GUI**: Shows ticket status but no progress bar
### Gaps to Fill
- No visual progress bar
- No percentage completion display
- No ETA estimation
## Functional Requirements
- Progress bar showing % complete
- Ticket count: X/Y completed
- ETA estimation based on average time per ticket
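The percentage and ETA arithmetic is simple enough to sketch directly (the naive ETA assumes remaining tickets take the average time of completed ones; the function name is illustrative):

```python
def track_progress(completed: int, total: int, elapsed_s: float):
 """Return (fraction_done, eta_seconds) for a track's ticket counts."""
 if total == 0:
  return 0.0, 0.0
 fraction = completed / total
 avg = elapsed_s / completed if completed else 0.0
 eta = avg * (total - completed)  # naive: remaining * avg per-ticket time
 return fraction, eta
```

With zero completed tickets there is no average to extrapolate from, so the ETA reads as 0 rather than a guess.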
## Key Integration Points
| File | Purpose |
|-----|---------|
| `src/gui_2.py` | Progress bar rendering |
| `src/dag_engine.py` | Completion calculation |
## Acceptance Criteria
- [ ] Progress bar renders
- [ ] Percentage accurate
- [ ] Counts match actual tickets
- [ ] ETA calculation works
- [ ] 1-space indentation


@@ -3,20 +3,134 @@
## Overview
Implement true concurrency for the DAG engine to spawn parallel Tier 3 workers. Currently workers execute sequentially; this track enables 4+ workers to process independent tickets simultaneously, dramatically reducing total pipeline execution time.
## Current State Audit
### Already Implemented (DO NOT re-implement)
#### Sequential Execution (multi_agent_conductor.py)
- **`ConductorEngine.run()`**: Main loop calls `loop.run_in_executor(None, run_worker_lifecycle, ...)`
- **Single executor**: All workers share default ThreadPoolExecutor
- **Sequential dispatch**: Workers spawned one at a time per loop iteration
#### DAG Engine (dag_engine.py)
- **`get_ready_tasks()`**: Returns list of tickets with satisfied dependencies
- **No parallelism**: Returns all ready tickets, but the engine processes only one
- **`ExecutionEngine.tick()`**: Returns ready tasks but doesn't spawn them
#### Thread Safety (existing)
- **`_send_lock`** in ai_client.py serializes API calls
- **`_pending_gui_tasks_lock`** protects GUI state updates
- **Ticket status updates** via `engine.update_task_status()`
### Gaps to Fill (This Track's Scope)
- No parallel worker spawning
- No worker pool management
- No concurrent file access protection
- No per-worker status tracking
## Architectural Constraints
### Thread Safety (CRITICAL)
- **`_send_lock` already exists**: All AI calls serialized automatically
- **Ticket status updates MUST be atomic**: Use lock on status changes
- **File access MUST be protected**: Concurrent reads OK, writes need coordination
- **DAG state MUST be protected**: `get_ready_tasks()` returns snapshot
### File Locking Strategy
- **Read operations**: No locking needed (filesystem handles concurrent reads)
- **Write operations**: Use file-level locks or serialize via single writer
- **MCP tools already handle writes**: `mcp_client.py` dispatches synchronously
### Worker Pool Limits
- Default: 4 concurrent workers
- Configurable via config.toml
- Respect API rate limits
## Architecture Reference
### Key Integration Points
| File | Lines | Purpose |
|------|-------|---------|
| `src/multi_agent_conductor.py` | 80-150 | `ConductorEngine.run()` - parallel dispatch |
| `src/dag_engine.py` | 50-100 | `ExecutionEngine` - ready task query |
| `src/ai_client.py` | ~50 | `_send_lock` - API serialization |
| `src/gui_2.py` | ~170 | `_pending_gui_tasks_lock` - GUI updates |
### Parallel Execution Pattern
```python
# Current (sequential):
for ticket in ready_tasks:
 await loop.run_in_executor(None, run_worker_lifecycle, ticket, ...)

# Proposed (parallel):
tasks = [
 loop.run_in_executor(None, run_worker_lifecycle, ticket, ...)
 for ticket in ready_tasks[:max_workers]
]
await asyncio.gather(*tasks, return_exceptions=True)
```
## Functional Requirements
### FR1: Worker Pool
- Configurable max concurrent workers (default 4)
- Pool size from config.toml: `[mma] max_workers = 4`
- Spawn up to `max_workers` tasks in parallel
### FR2: Parallel Spawning
- `get_ready_tasks()` returns N ready tickets
- Spawn `min(N, max_workers)` workers simultaneously
- Use `asyncio.gather()` with `return_exceptions=True`
### FR3: Status Tracking
- Each worker reports status independently
- Status updates via atomic `update_task_status()` calls
- Use lock around status dict modifications
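Atomic status updates can be sketched as a lock-guarded dict (class and method names are illustrative, not the existing API):

```python
import threading

class StatusBoard:
 """Thread-safe per-ticket worker status (running/complete/blocked/failed)."""
 def __init__(self):
  self._lock = threading.Lock()
  self._status = {}

 def update(self, ticket_id: str, status: str) -> None:
  with self._lock:  # atomic: no torn updates across worker threads
   self._status[ticket_id] = status

 def snapshot(self) -> dict:
  with self._lock:
   return dict(self._status)  # copy, so readers never see mid-update state
```

The GUI thread reads only snapshots, which keeps rendering decoupled from worker-side mutation.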
### FR4: Error Isolation
- One worker failure doesn't kill others
- `return_exceptions=True` in gather
- Log errors, mark ticket as failed, continue
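FR1-FR4 combine into a dispatch step roughly like the following (names are illustrative; `run_worker` stands in for `run_worker_lifecycle`):

```python
import asyncio

async def dispatch_ready(ready_tickets, run_worker, max_workers: int = 4):
 """Run up to max_workers workers in threads; isolate failures (FR4)."""
 loop = asyncio.get_running_loop()
 batch = ready_tickets[:max_workers]  # FR1/FR2: cap concurrency
 futures = [loop.run_in_executor(None, run_worker, t) for t in batch]
 results = await asyncio.gather(*futures, return_exceptions=True)
 for ticket, result in zip(batch, results):
  if isinstance(result, Exception):
   # FR4: log and mark this ticket failed; siblings keep running
   print(f"worker for {ticket!r} failed: {result}")
 return results
```

With `return_exceptions=True`, a raising worker shows up as an exception object in `results` instead of cancelling its siblings.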
## Non-Functional Requirements
| Requirement | Constraint |
|-------------|------------|
| Throughput | 4x improvement for 4 independent tickets |
| Memory | Per-worker stack space (~8MB each) |
| API Rate | Respect provider rate limits |
## Testing Requirements
### Unit Tests
- Test `gather` with multiple mock workers
- Test error isolation (one fails, others continue)
- Test max_workers limit enforced
### Integration Tests (via `live_gui` fixture)
- Create 4+ independent tickets
- Verify all execute simultaneously
- Verify status updates correctly
- Verify no race conditions
### Stress Tests
- 10+ workers with limited pool
- Verify no deadlock
- Verify no memory leak
## Out of Scope
- GPU parallelism (not applicable)
- Distributed execution (single machine only)
- Priority-based scheduling (separate track)
## Acceptance Criteria
- [ ] Worker pool spawns 4 concurrent workers for 4+ independent tickets.
- [ ] No race conditions on ticket status updates.
- [ ] File locking prevents concurrent edits to same file.
- [ ] Workers report individual status to GUI.
- [ ] >80% test coverage for new concurrency code.
- [ ] Worker pool spawns 4 concurrent workers for 4+ independent tickets
- [ ] No race conditions on ticket status updates
- [ ] File access safe (no corruption)
- [ ] Workers report individual status to GUI
- [ ] One worker failure doesn't affect others
- [ ] max_workers configurable via config.toml
- [ ] >80% test coverage for concurrency code
- [ ] 1-space indentation maintained


@@ -1,22 +1,44 @@
# Track Specification: Visual DAG & Interactive Ticket Editing (visual_dag_ticket_editing_20260306)
## Overview
Replace linear ticket list with interactive node graph using ImGui Bundle node editor.
## Current State Audit
### Already Implemented
- **`imgui_bundle`**: Includes node editor capability
- **`_render_ticket_dag_node()`**: Renders ticket nodes (simple)
- **`dag_engine.py`**: DAG validation, cycle detection
### Gaps to Fill
- No true node editor integration
- No visual dependency lines
- No drag-to-connect for dependencies
## Functional Requirements
- Use `imgui.node_editor` for ticket visualization
- Visual dependency lines between nodes
- Color-coded status (todo=gray, running=yellow, blocked=red, done=green)
- Drag nodes to create/remove dependencies
- Validate DAG (no cycles) before saving
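The no-cycles check before saving can be sketched as a depth-first search over the edited dependency map (assuming dependencies are held as `{ticket_id: [prerequisite_ids]}`; the function name is illustrative):

```python
def has_cycle(deps: dict) -> bool:
 """Return True if the dependency graph contains a cycle."""
 WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / finished
 color = {}
 def visit(node) -> bool:
  color[node] = GRAY
  for dep in deps.get(node, []):
   state = color.get(dep, WHITE)
   if state == GRAY:  # back edge: dep is already on the current path
    return True
   if state == WHITE and visit(dep):
    return True
  color[node] = BLACK
  return False
 return any(color.get(n, WHITE) == WHITE and visit(n) for n in deps)
```

Running this on the edited graph before persisting lets the editor reject a drag that would introduce a cycle.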
## Key Integration Points
| File | Purpose |
|-----|---------|
| `src/gui_2.py` | Node editor integration |
| `src/dag_engine.py` | DAG validation |
## Architectural Constraints
- 60fps with 50+ nodes
- Visual state synced with backend
## Dependencies
- **Depends on**: `true_parallel_worker_execution_20260306` (for real-time status)
## Acceptance Criteria
- [ ] Node editor displays all tickets
- [ ] Users can create/remove dependencies
- [ ] Visual changes sync to backend
- [ ] DAG validity enforced
- [ ] 60fps with 50+ nodes
- [ ] 1-space indentation