hopefully done refining

2026-03-06 16:14:31 -05:00
parent 88e27ae414
commit 1294104f7f
20 changed files with 1736 additions and 734 deletions
--- a/conductor/tracks/tool_usage_analytics_20260306/plan.md
+++ b/conductor/tracks/tool_usage_analytics_20260306/plan.md
@@ -1,25 +1,107 @@
 # Implementation Plan: Tool Usage Analytics (tool_usage_analytics_20260306)

-## Phase 1: Data Collection
- [ ] Task: Initialize MMA Environment
- [ ] Task: Verify tool_log_callback
-    - WHERE: src/ai_client.py
-    - WHAT: Check existing logging
+> **Reference:** [Spec](./spec.md) | [Architecture Guide](../../../docs/guide_architecture.md)

-## Phase 2: Aggregation
- [ ] Task: Implement usage aggregation
-    - WHERE: src/gui_2.py or new module
-    - WHAT: Count tools, avg times, failures
-    - HOW: Process tool_log entries
-    - SAFETY: Efficient data structures
+## Phase 1: Data Collection
+Focus: Add tool execution tracking
+
+- [ ] Task 1.1: Initialize MMA Environment
+    - Run `activate_skill mma-orchestrator` before starting
+
+- [ ] Task 1.2: Add tool stats state
+    - WHERE: `src/app_controller.py` or `src/gui_2.py`
+    - WHAT: Add `_tool_stats: dict[str, dict]` state
+    - HOW:
+      ```python
+      self._tool_stats: dict[str, dict] = {}
+      # Structure: {tool_name: {"count": 0, "total_time_ms": 0.0, "failures": 0}}
+      ```
+    - CODE STYLE: 1-space indentation
+
+- [ ] Task 1.3: Hook into tool execution
+    - WHERE: `src/ai_client.py` in tool execution path
+    - WHAT: Track tool name, time, success/failure
+    - HOW:
+      ```python
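+      start_time = time.perf_counter()  # monotonic clock; requires `import time`
+      success = False  # stays False if dispatch raises
+      try:
+       result = mcp_client.dispatch(name, args)
+       success = True
+      finally:
+       elapsed_ms = (time.perf_counter() - start_time) * 1000
+       # Update stats via callback or direct update; the exception, if any,
+       # still propagates so `result` is never read while unbound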
+      ```
+    - SAFETY: Don't impact tool execution performance (spec target: <1ms tracking overhead per call)
+
+## Phase 2: Aggregation Logic
+Focus: Calculate derived metrics
+
+- [ ] Task 2.1: Implement stats update function
+    - WHERE: `src/app_controller.py`
+    - WHAT: Function to update tool stats
+    - HOW:
+      ```python
+      def _update_tool_stats(self, tool_name: str, elapsed_ms: float, success: bool) -> None:
+       if tool_name not in self._tool_stats:
+        self._tool_stats[tool_name] = {"count": 0, "total_time_ms": 0.0, "failures": 0}
+       self._tool_stats[tool_name]["count"] += 1
+       self._tool_stats[tool_name]["total_time_ms"] += elapsed_ms
+       if not success:
+        self._tool_stats[tool_name]["failures"] += 1
+      ```
+
+- [ ] Task 2.2: Calculate average time and failure rate
+    - WHERE: `src/gui_2.py` in render function
+    - WHAT: Derive avg_time and failure_rate from stats
+    - HOW:
+      ```python
+      for tool, stats in self._tool_stats.items():
+       count = stats["count"]
+       avg_time = stats["total_time_ms"] / count if count > 0 else 0
+       failure_rate = (stats["failures"] / count * 100) if count > 0 else 0
+      ```

 ## Phase 3: Visualization
- [ ] Task: Render analytics
-    - WHERE: src/gui_2.py
-    - WHAT: Charts and tables
-    - HOW: imgui tables, plot_lines
-    - SAFETY: Handle empty data
+Focus: Display analytics in GUI

-## Phase 4: Verification
- [ ] Task: Test analytics
- [ ] Task: Conductor - Phase Verification
+- [ ] Task 3.1: Add analytics panel
+    - WHERE: `src/gui_2.py` in MMA Dashboard or Operations
+    - WHAT: Table showing tool stats
+    - HOW:
+      ```python
+      if imgui.collapsing_header("Tool Usage Analytics"):
+       if imgui.begin_table("tool_stats", 4):
+        imgui.table_setup_column("Tool")
+        imgui.table_setup_column("Count")
+        imgui.table_setup_column("Avg Time (ms)")
+        imgui.table_setup_column("Failure %")
+        imgui.table_headers_row()
+        for tool, stats in sorted(self._tool_stats.items(), key=lambda x: -x[1]["count"]):
+         imgui.table_next_row()
+         imgui.table_set_column_index(0)
+         imgui.text(tool)
+         # ... other columns
+        imgui.end_table()
+      ```
+
+## Phase 4: Reset on Session Clear
+Focus: Clear stats on new session
+
+- [ ] Task 4.1: Clear stats on session reset
+    - WHERE: `src/gui_2.py` or `src/app_controller.py` reset handler
+    - WHAT: Clear `_tool_stats` dict
+    - HOW: `self._tool_stats.clear()`
+
+## Phase 5: Testing
+Focus: Verify all functionality
+
+- [ ] Task 5.1: Write unit tests
+    - WHERE: `tests/test_tool_analytics.py` (new file)
+    - WHAT: Test stats accumulation, average-time and failure-rate calculation
+    - HOW: Mock tool execution, verify stats update
+
+- [ ] Task 5.2: Conductor - Phase Verification
+    - Run: `uv run pytest tests/test_tool_analytics.py -v`
+    - Manual: Verify analytics panel displays in GUI
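
Tasks 1.3 and 2.1 in the plan above can be combined into one small wrapper. The following is a minimal sketch, not the project's code: `tracked_dispatch`, `update_tool_stats`, `fake_dispatch`, and the module-level `tool_stats` and `_stats_lock` are hypothetical names introduced for illustration. It assumes the real dispatcher is called as `dispatch(name, args)` as the plan shows, and it assumes tool calls may run on several threads, which is why it takes a lock. It uses 1-space indentation to match the stated code style.

```python
import threading
import time
from typing import Any, Callable

# Module-level stand-ins for `self._tool_stats` on the controller (hypothetical names).
tool_stats: dict[str, dict[str, float]] = {}
_stats_lock = threading.Lock()  # assumption: tool calls may run on several threads

def update_tool_stats(tool_name: str, elapsed_ms: float, success: bool) -> None:
 """Fold one tool call into the per-tool counters."""
 with _stats_lock:
  entry = tool_stats.setdefault(tool_name, {"count": 0, "total_time_ms": 0.0, "failures": 0})
  entry["count"] += 1
  entry["total_time_ms"] += elapsed_ms
  if not success:
   entry["failures"] += 1

def tracked_dispatch(dispatch: Callable[[str, dict], Any], name: str, args: dict) -> Any:
 """Run dispatch(name, args), record duration and outcome, and re-raise any error."""
 start = time.perf_counter()
 success = False
 try:
  result = dispatch(name, args)
  success = True
  return result
 finally:
  update_tool_stats(name, (time.perf_counter() - start) * 1000, success)

def fake_dispatch(name: str, args: dict) -> str:
 """Hypothetical stand-in for mcp_client.dispatch, used only for this example."""
 if name == "broken_tool":
  raise RuntimeError("simulated tool failure")
 return "ok"

tracked_dispatch(fake_dispatch, "read_file", {"path": "a.txt"})
try:
 tracked_dispatch(fake_dispatch, "broken_tool", {})
except RuntimeError:
 pass  # the failure is still counted before the error propagates
print(tool_stats["read_file"]["count"], tool_stats["broken_tool"]["failures"])  # prints "1 1"
```

Passing the dispatcher in as an argument keeps the wrapper testable with a fake, which is one way Task 5.1 could mock tool execution without touching `mcp_client`.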
--- a/conductor/tracks/tool_usage_analytics_20260306/spec.md
+++ b/conductor/tracks/tool_usage_analytics_20260306/spec.md
@@ -1,35 +1,99 @@
 # Track Specification: Tool Usage Analytics (tool_usage_analytics_20260306)

 ## Overview
-Analytics panel showing most-used tools, average execution time, failure rates.
+Analytics panel showing most-used tools, average execution time, and failure rates. Built on the existing tool execution path in `src/ai_client.py`, which currently records no usage data.

 ## Current State Audit

-### Already Implemented
- **`ai_client.tool_log_callback`**: Called when tool executes
- **`mcp_client.dispatch()`**: Routes tool calls
- **No aggregation or storage**
+### Already Implemented (DO NOT re-implement)

-### Gaps to Fill
- No tool usage tracking
- No execution time tracking
+#### Tool Execution (src/ai_client.py)
+- **Tool dispatch in `_execute_tool_calls_concurrently()`**: Executes tools via `mcp_client.dispatch()`
+- **`pre_tool_callback`**: Optional callback before tool execution
+- **No built-in tracking or aggregation**
+
+#### MCP Client (src/mcp_client.py)
+- **`dispatch(name, args)`**: Routes tool calls to implementations
+- **26 tools available** (run_powershell, read_file, py_get_skeleton, etc.)
+- **`MUTATING_TOOLS`**: Set of tools that modify files
+
+### Gaps to Fill (This Track's Scope)
+- No tool usage tracking (count per tool)
+- No execution time tracking per tool
 - No failure rate tracking
+- No analytics display in GUI
+
+## Architectural Constraints
+
+### Efficient Aggregation
+- Track tool stats in a lightweight data structure
+- Don't impact tool execution performance
+- Use dict: `{tool_name: {count, total_time_ms, failures}}`
+
+### Memory Bounds
+- Only track stats, not full history
+- Reset on session reset
+
+## Architecture Reference
+
+### Key Integration Points
+
+| File | Lines | Purpose |
+|------|-------|---------|
+| `src/ai_client.py` | ~500-550 | Tool execution - add tracking |
+| `src/gui_2.py` | ~2700-2800 | Analytics panel |
+
+### Proposed Tracking Structure
+```python
+# In AppController or App:
+self._tool_stats: dict[str, dict] = {}
+# Structure: {"read_file": {"count": 10, "total_time_ms": 150.0, "failures": 0}, ...}
+```

 ## Functional Requirements
- Track tool name, execution time, success/failure
- Aggregate by tool name
- Display ranking by usage count
- Show average time per tool
- Show failure percentage

-## Key Integration Points
-| File | Purpose |
-|-----|---------|
-| `src/ai_client.py` | Hook into tool_log_callback |
-| `src/gui_2.py` | Analytics panel rendering |
+### FR1: Tool Usage Tracking
+- Track tool name, execution time, success/failure
+- Store in `_tool_stats` dict
+- Update on each tool execution
+
+### FR2: Aggregation by Tool
+- Count total calls per tool
+- Calculate average execution time
+- Track failure count and rate
+
+### FR3: Analytics Display
+- Table showing tool name, count, avg time, failure rate
+- Sort by usage count (most used first)
+- Show in MMA Dashboard or Operations panel
+
+## Non-Functional Requirements
+
+| Requirement | Constraint |
+|-------------|------------|
+| Tracking Overhead | <1ms per tool call |
+| Memory | <1KB for stats dict |
+
+## Testing Requirements
+
+### Unit Tests
+- Test tracking updates correctly
+- Test failure rate calculation
+
+### Integration Tests
+- Execute tools, verify stats accumulate
+- Reset session, verify stats cleared
+
+## Out of Scope
+- Historical analytics across sessions
+- Export to file
+- Per-ticket tool breakdown

 ## Acceptance Criteria
- [ ] Tool ranking displayed
- [ ] Average times accurate
- [ ] Failure rates tracked
- [ ] 1-space indentation
+- [ ] Tool execution tracked
+- [ ] Count per tool accurate
+- [ ] Average time calculated
+- [ ] Failure rate shown
+- [ ] Display in GUI panel
+- [ ] Reset on session clear
+- [ ] 1-space indentation maintained
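
The aggregation and ordering described in FR2 and FR3 of the spec can be kept separate from the imgui rendering so that they are unit-testable. This is a rough sketch under that assumption: `summarize_tool_stats` and the `sample` data are hypothetical and not part of the commit. It takes a dict in the proposed `_tool_stats` shape and returns rows of tool, count, average time in ms, and failure percentage, most-used tool first.

```python
def summarize_tool_stats(tool_stats: dict[str, dict]) -> list[tuple[str, int, float, float]]:
 """Return (tool, count, avg_time_ms, failure_pct) rows sorted by count, descending."""
 rows = []
 for tool, stats in tool_stats.items():
  count = stats["count"]
  # Guard against division by zero for an entry that has no calls yet.
  avg_time_ms = stats["total_time_ms"] / count if count > 0 else 0.0
  failure_pct = stats["failures"] / count * 100 if count > 0 else 0.0
  rows.append((tool, count, avg_time_ms, failure_pct))
 rows.sort(key=lambda row: -row[1])
 return rows

# Illustrative data only, in the shape proposed for `_tool_stats`.
sample = {
 "run_powershell": {"count": 4, "total_time_ms": 800.0, "failures": 1},
 "read_file": {"count": 10, "total_time_ms": 150.0, "failures": 0},
}
for tool, count, avg_time_ms, failure_pct in summarize_tool_stats(sample):
 print(f"{tool}: {count} calls, {avg_time_ms:.1f} ms avg, {failure_pct:.1f}% failed")
# read_file: 10 calls, 15.0 ms avg, 0.0% failed
# run_powershell: 4 calls, 200.0 ms avg, 25.0% failed
```

The render function would then only need to iterate over the returned rows and fill the four table columns.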