refinement of upcoming tracks

2026-03-06 15:41:33 -05:00
parent 3ce6a2ec8a
commit fca40fd8da
24 changed files with 2388 additions and 391 deletions
@@ -1,30 +1,167 @@
 # Implementation Plan: Deep AST Context Pruning (deep_ast_context_pruning_20260306)

-## Phase 1: AST Integration
- [ ] Task: Initialize MMA Environment
- [ ] Task: Verify tree_sitter availability
-    - WHERE: requirements.txt, imports
-    - WHAT: Ensure tree_sitter installed and importable
-    - HOW: pip install tree_sitter
+> **Reference:** [Spec](./spec.md) | [Architecture Guide](../../../docs/guide_architecture.md)

-## Phase 2: Skeleton Generation
- [ ] Task: Implement AST parser
-    - WHERE: src/file_cache.py or new module
-    - WHAT: Parse Python AST using tree_sitter
-    - HOW: tree_sitter.Language + Parser
-    - SAFETY: Exception handling for parse errors
- [ ] Task: Implement skeleton generator
-    - WHERE: src/file_cache.py
-    - WHAT: Extract function signatures and docstrings
-    - HOW: Walk AST, collect Def nodes
-    - SAFETY: Handle large files gracefully
- [ ] Task: Integrate with worker dispatch
-    - WHERE: src/multi_agent_conductor.py
-    - WHAT: Inject skeleton into worker prompt
-    - HOW: Modify context generation
-    - SAFETY: Verify tokens reduced
+## Phase 1: Verify Existing Infrastructure
+Focus: Confirm tree-sitter integration works

-## Phase 3: Tests & Verification
- [ ] Task: Write AST parsing tests
- [ ] Task: Verify token reduction
- [ ] Task: Conductor - Phase Verification
+- [ ] Task 1.1: Initialize MMA Environment
+    - Run `activate_skill mma-orchestrator` before starting
+
+- [ ] Task 1.2: Verify tree_sitter installation
+    - WHERE: `requirements.txt`, imports
+    - WHAT: Ensure `tree_sitter` and `tree_sitter_python` are installed
+    - HOW: Check imports in `src/file_cache.py`
+    - CMD: `uv pip list | grep tree`
+
+- [ ] Task 1.3: Verify ASTParser functionality
+    - WHERE: `src/file_cache.py`
+    - WHAT: Test `get_skeleton()` and `get_curated_view()`
+    - HOW: Use `manual-slop_py_get_definition` on ASTParser class
+    - OUTPUT: Document exact API
+
+- [ ] Task 1.4: Review worker context injection
+    - WHERE: `src/multi_agent_conductor.py` `run_worker_lifecycle()`
+    - WHAT: Understand current context injection pattern
+    - HOW: Use `manual-slop_py_get_code_outline` on function
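
For Task 1.2, one alternative to `grep` that works in any shell is to ask the import system directly. This is a minimal sketch; `missing_modules` and `REQUIRED` are hypothetical names and are not part of the codebase. It relies only on `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be located and does not execute the module.

```python
import importlib.util

# Hypothetical list of import names (PyPI packages: tree-sitter, tree-sitter-python)
REQUIRED: tuple[str, ...] = ("tree_sitter", "tree_sitter_python")


def missing_modules(names: tuple[str, ...] = REQUIRED) -> list[str]:
 """Return the import names from `names` that cannot be located."""
 # find_spec() returns None for an unknown top-level module without importing it
 return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
 missing = missing_modules()
 print("missing: " + ", ".join(missing) if missing else "tree-sitter packages found")
```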
+
+## Phase 2: Targeted Function Extraction
+Focus: Extract only relevant functions from target files
+
+- [ ] Task 2.1: Implement targeted extraction function
+    - WHERE: `src/file_cache.py` or new `src/context_pruner.py`
+    - WHAT: Function to extract specific functions by name
+    - HOW:
+      ```python
+      def extract_functions(code: str, function_names: list[str]) -> str:
+       parser = ASTParser("python")
+       tree = parser.parse(code)
+       # Walk AST, find function_definition nodes matching names
+       # Return combined signatures + docstrings
+      ```
+    - CODE STYLE: 1-space indentation
+
+- [ ] Task 2.2: Add dependency traversal
+    - WHERE: Same as Task 2.1
+    - WHAT: Find functions called by target functions
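+    - HOW: Walk each target function's body for `call` nodes (tree-sitter node type) and collect callee names from their `function` field
+    - SAFETY: Cap traversal depth and skip already-visited functions so recursive calls terminate and context stays bounded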
+
+- [ ] Task 2.3: Integrate with worker context
+    - WHERE: `src/multi_agent_conductor.py` `run_worker_lifecycle()`
+    - WHAT: Use targeted extraction when ticket has target_file
+    - HOW:
+      - Check if `ticket.target_file` matches a context file
+      - If so, use `extract_functions()` instead of full content
+      - Fall back to skeleton for other files
+    - SAFETY: Handle missing function names gracefully
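
The Task 2.3 decision can be sketched as a pure function, which makes it testable without a live worker. `build_worker_context`, its `target_functions` argument (this plan does not specify which ticket field would carry the function names), and the injected `extract`/`skeleton` callables are all hypothetical. In the codebase they would presumably be backed by the Task 2.1 extractor and `ASTParser.get_skeleton()`.

```python
import os
from typing import Callable, Optional


def _same_file(a: str, b: str) -> bool:
 # Normalize relative paths and (on Windows) case before comparing
 return os.path.normcase(os.path.abspath(a)) == os.path.normcase(os.path.abspath(b))


def build_worker_context(
 files: dict[str, str],
 target_file: Optional[str],
 target_functions: list[str],
 extract: Callable[[str, list[str]], str],
 skeleton: Callable[[str], str],
) -> dict[str, str]:
 """Map each context file path to the text injected into the worker prompt."""
 context: dict[str, str] = {}
 for path, code in files.items():
  if target_file is not None and target_functions and _same_file(path, target_file):
   # An empty extraction (e.g. none of the names exist) falls back to the skeleton
   context[path] = extract(code, target_functions) or skeleton(code)
  else:
   context[path] = skeleton(code)
 return context
```

Because the extractor and skeleton generator are passed in as callables, the selection rule does not depend on how tree-sitter is invoked.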
+
+## Phase 3: AST Caching
+Focus: Cache parsed trees to avoid re-parsing
+
+- [ ] Task 3.1: Implement AST cache in file_cache.py
+    - WHERE: `src/file_cache.py`
+    - WHAT: LRU cache for parsed AST trees
+    - HOW:
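+      ```python
+      from collections import OrderedDict
+      from pathlib import Path
+      import tree_sitter
+      
+      # Assumes ASTParser.parse(str) returns a tree_sitter.Tree (confirm in Task 1.3)
+      _parser = ASTParser("python")
+      _ast_cache: OrderedDict[str, tuple[float, tree_sitter.Tree]] = OrderedDict()  # abs path -> (mtime, tree)
+      _CACHE_MAX_SIZE: int = 10
+      
+      def get_cached_tree(path: str) -> tree_sitter.Tree:
+       key = str(Path(path).resolve())
+       mtime = Path(key).stat().st_mtime
+       cached = _ast_cache.get(key)
+       if cached is not None and cached[0] == mtime:
+        _ast_cache.move_to_end(key)  # Mark as most recently used
+        return cached[1]
+       # Miss or stale mtime: parse and (re)insert as most recently used
+       code = Path(key).read_text(encoding="utf-8")
+       tree = _parser.parse(code)
+       _ast_cache[key] = (mtime, tree)
+       _ast_cache.move_to_end(key)
+       if len(_ast_cache) > _CACHE_MAX_SIZE:
+        _ast_cache.popitem(last=False)  # Evict least recently used
+       return tree
+      ```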
+    - SAFETY: Not thread-safe; call from a single thread only, or guard the cache with a `threading.Lock`
+
+- [ ] Task 3.2: Use cache in skeleton generation
+    - WHERE: `src/file_cache.py`
+    - WHAT: Use cached tree instead of re-parsing
+    - HOW: Call `get_cached_tree()` in `get_skeleton()`
+
+## Phase 4: Token Measurement
+Focus: Measure and log token reduction
+
+- [ ] Task 4.1: Add token counting to context injection
+    - WHERE: `src/multi_agent_conductor.py`
+    - WHAT: Count tokens before and after pruning
+    - HOW:
+      ```python
+      def _count_tokens(text: str) -> int:
+       return len(text) // 4  # Rough estimate (~4 characters per token)
+      ```
+    - SAFETY: Non-blocking, fast calculation
+
+- [ ] Task 4.2: Log token reduction metrics
+    - WHERE: `src/multi_agent_conductor.py`
+    - WHAT: Log reduction percentage
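+    - HOW: Compute `reduction_pct = 100 * (before - after) / before` and log `f"Context tokens: {before} -> {after} ({reduction_pct:.1f}% reduction)"` via `session_logger` rather than `print()`
+    - SAFETY: Skip the percentage when `before == 0` to avoid division by zero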
+
+- [ ] Task 4.3: Display in MMA dashboard (optional)
+    - WHERE: `src/gui_2.py` `_render_mma_dashboard()`
+    - WHAT: Show token reduction per worker
+    - HOW: Add to worker stream panel
+    - SAFETY: Optional enhancement
+
+## Phase 5: Testing
+Focus: Verify all functionality
+
+- [ ] Task 5.1: Write targeted extraction tests
+    - WHERE: `tests/test_context_pruner.py` (new file)
+    - WHAT: Test extraction returns only specified functions
+    - HOW: Create test file with known functions, extract subset
+
+- [ ] Task 5.2: Write integration test
+    - WHERE: `tests/test_context_pruner.py`
+    - WHAT: Run worker with skeleton context
+    - HOW: Use `live_gui` fixture with mock provider
+    - VERIFY: Worker completes ticket successfully
+
+- [ ] Task 5.3: Performance test
+    - WHERE: `tests/test_context_pruner.py`
+    - WHAT: Verify parse time stays under 100ms per file (files up to 1000 lines)
+    - HOW: Time parsing of various file sizes
+
+- [ ] Task 5.4: Conductor - Phase Verification
+    - Run: `uv run pytest tests/test_context_pruner.py tests/test_ast_parser.py -v`
+    - Verify token reduction in logs
+
+## Implementation Notes
+
+### tree-sitter Pattern
+- Already implemented in `file_cache.py`
+- Language: `tree_sitter_python`
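+- Node types: `function_definition`, `class_definition`, `import_statement`, `import_from_statement` (`from x import y`), `decorated_definition` (wraps decorated functions/classes), `call` (Task 2.2)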
+
+### Cache Strategy
+- Key: file path (absolute)
+- Value: (mtime, tree) tuple
+- Eviction: LRU with max 10 entries
+- Invalidation: mtime comparison
+
+### Files Modified
+- `src/file_cache.py`: Add cache, targeted extraction
+- `src/multi_agent_conductor.py`: Use targeted extraction
+- `tests/test_context_pruner.py`: New test file
+
+### Code Style Checklist
+- [ ] 1-space indentation throughout
+- [ ] CRLF line endings on Windows
+- [ ] No comments unless documenting API
+- [ ] Type hints on all functions
@@ -3,20 +3,126 @@
 ## Overview
 Use tree_sitter to parse target file AST and inject condensed skeletons into worker prompts. Currently workers receive full file context; this track reduces token burn by injecting only relevant function/method signatures.

+## Current State Audit
+
+### Already Implemented (DO NOT re-implement)
+
+#### ASTParser (file_cache.py)
+- **Uses tree-sitter** with `tree_sitter_python` language
+- **`ASTParser.get_skeleton(code: str) -> str`**: Returns file with function bodies replaced by `...`
+- **`ASTParser.get_curated_view(code: str) -> str`**: Enhanced skeleton preserving `@core_logic` and `# [HOT]` bodies
+- **Pattern**: Parse → Walk AST → Identify function_definition nodes → Preserve signature/docstring, replace body
+
+#### Worker Context Injection (multi_agent_conductor.py)
+- **`run_worker_lifecycle()`** function handles context injection
+- **First file**: Gets `get_curated_view()` (full hot paths)
+- **Subsequent files**: Get `get_skeleton()` (signatures only)
+- **`context_requirements`**: List of files from Ticket dataclass
+
+#### MCP Tool Integration (mcp_client.py)
+- **`py_get_skeleton()`**: Already exposes skeleton generation as tool
+- **`py_get_code_outline()`**: Returns hierarchical outline with line ranges
+- **Tools available to workers** on demand; full function bodies remain reachable via `py_get_definition`
+
+### Gaps to Fill (This Track's Scope)
+- Workers still receive full first file in some cases
+- No selective function extraction based on ticket target
+- No caching of parsed ASTs (re-parse on each context build)
+- Token reduction not measured/verified
+
 ## Architectural Constraints
- **Parsing Performance**: AST parsing MUST complete in <100ms per file.
- **Caching**: Parsed AST trees SHOULD be cached to avoid re-parsing.
- **Selective Exposure**: Only target functions/classes related to the ticket's target_file should be included.
+
+### Parsing Performance
+- AST parsing MUST complete in <100ms per file
+- tree-sitter is already fast (C extension)
+- Cache parsed trees in memory to avoid re-parsing (see FR3)
+
+### Skeleton Quality
+- Must preserve enough context for worker to understand interface
+- Must preserve docstrings for API documentation
+- Must preserve type hints in signatures
+
+### Worker Autonomy
+- Workers MUST still be able to call `py_get_definition` for full source
+- Skeleton is the default, not the only option
+- Workers can request full reads on-demand
+
+## Architecture Reference
+
+### Key Integration Points
+
+| File | Lines | Purpose |
+|------|-------|---------|
+| `src/file_cache.py` | 30-80 | `ASTParser` class with tree-sitter |
+| `src/multi_agent_conductor.py` | 150-200 | `run_worker_lifecycle()` context injection |
+| `src/models.py` | 30-50 | `Ticket.context_requirements` field |
+| `src/mcp_client.py` | 200-250 | `py_get_skeleton()` MCP tool |
+
+### tree-sitter Pattern (existing)
+```python
+from file_cache import ASTParser
+parser = ASTParser("python")
+tree = parser.parse(code)
+skeleton = parser.get_skeleton(code)
+curated = parser.get_curated_view(code)
+```

 ## Functional Requirements
- **AST Parser**: Integrate tree_sitter to parse Python files.
- **Target Detection**: Identify which functions/methods to include based on ticket target_file.
- **Skeleton Generation**: Generate condensed skeletons with signatures and docstrings only.
- **Worker Integration**: Inject skeleton into worker prompt instead of full file content.
+
+### FR1: Targeted Function Extraction
+- Given a ticket's `target_file` and context, identify relevant functions
+- Extract only those function signatures + docstrings
+- Include imports and class definitions they depend on
+
+### FR2: Dependency Graph Traversal
+- For target function, find all called functions
+- Include signatures of dependencies (not full bodies)
+- Limit traversal depth (and skip already-visited functions) so the pruned context stays bounded
+
+### FR3: AST Caching
+- Cache parsed AST trees per file path
+- Invalidate cache when file mtime changes
+- Implement the cache in `src/file_cache.py`, alongside the existing `ASTParser`
+
+### FR4: Token Measurement
+- Log token count before/after pruning
+- Calculate reduction percentage
+- Display in MMA dashboard or logs
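
This sketch shows the FR4 arithmetic, reusing the plan's rough 4-characters-per-token estimate. The standard `logging` module stands in for the project's `session_logger`, whose API is not documented here. `count_tokens` and `log_token_reduction` are hypothetical helpers.

```python
import logging

logger = logging.getLogger("context_pruning")  # Stand-in for the project's session_logger


def count_tokens(text: str) -> int:
 # Rough heuristic (~4 characters per token), not a real tokenizer
 return len(text) // 4


def log_token_reduction(full: str, pruned: str) -> float:
 """Log before/after token estimates and return the reduction percentage."""
 before, after = count_tokens(full), count_tokens(pruned)
 # Guard against empty input so the percentage never divides by zero
 reduction_pct = 100.0 * (before - after) / before if before else 0.0
 logger.info("Context tokens: %d -> %d (%.1f%% reduction)", before, after, reduction_pct)
 return reduction_pct
```

For example, pruning a 4000-character file down to a 1000-character skeleton logs `Context tokens: 1000 -> 250 (75.0% reduction)`, which would meet the >50% target.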
+
+## Non-Functional Requirements
+
+| Requirement | Constraint |
+|-------------|------------|
+| Parse Time | <100ms per file |
+| Memory | Cache size bounded (LRU, max 10 files) |
+| Token Reduction | >50% for typical worker prompts |
+
+## Testing Requirements
+
+### Unit Tests
+- Test targeted extraction returns only specified functions
+- Test dependency traversal includes correct functions
+- Test cache invalidation on file change
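
On filesystems with coarse mtime resolution, two quick writes can share a timestamp. Setting the mtime explicitly with `os.utime` makes the invalidation test deterministic. The sketch below is self-contained. `MtimeLRUCache` is a hypothetical, generic version of the `path -> (mtime, tree)` cache described for `file_cache.py`. It takes an injectable `parse` callable and keeps a `parse_count` counter so tests can observe hits and misses.

```python
import os
import tempfile
from collections import OrderedDict
from pathlib import Path
from typing import Any, Callable


class MtimeLRUCache:
 """Bounded path -> (mtime, tree) cache: stale on mtime change, evicts least recently used."""

 def __init__(self, parse: Callable[[str], Any], max_size: int = 10) -> None:
  self._parse = parse
  self._max_size = max_size
  self._entries: OrderedDict[str, tuple[float, Any]] = OrderedDict()
  self.parse_count = 0  # Number of real parses, i.e. cache misses

 def get(self, path: str) -> Any:
  key = str(Path(path).resolve())
  mtime = os.stat(key).st_mtime
  hit = self._entries.get(key)
  if hit is not None and hit[0] == mtime:
   self._entries.move_to_end(key)
   return hit[1]
  self.parse_count += 1
  tree = self._parse(Path(key).read_text(encoding="utf-8"))
  self._entries[key] = (mtime, tree)
  self._entries.move_to_end(key)
  if len(self._entries) > self._max_size:
   self._entries.popitem(last=False)
  return tree


def test_cache_invalidates_on_mtime_change() -> None:
 with tempfile.TemporaryDirectory() as tmp:
  path = os.path.join(tmp, "mod.py")
  Path(path).write_text("x = 1\n", encoding="utf-8")
  cache = MtimeLRUCache(parse=lambda code: code)
  assert cache.get(path) == "x = 1\n"
  assert cache.get(path) == "x = 1\n" and cache.parse_count == 1
  Path(path).write_text("x = 2\n", encoding="utf-8")
  # Bump mtime explicitly so the change is visible even with coarse timestamps
  st = os.stat(path)
  os.utime(path, (st.st_atime, st.st_mtime + 5))
  assert cache.get(path) == "x = 2\n" and cache.parse_count == 2


def test_cache_evicts_least_recently_used() -> None:
 with tempfile.TemporaryDirectory() as tmp:
  a, b = os.path.join(tmp, "a.py"), os.path.join(tmp, "b.py")
  for p in (a, b):
   Path(p).write_text("pass\n", encoding="utf-8")
  cache = MtimeLRUCache(parse=lambda code: code, max_size=1)
  cache.get(a)
  cache.get(b)  # Exceeds max_size=1, so `a` is evicted
  cache.get(a)  # Miss: `a` must be parsed again
  assert cache.parse_count == 3
```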
+
+### Integration Tests
+- Run worker with skeleton context, verify completion
+- Compare token counts: full vs skeleton
+- Verify worker can still call py_get_definition
+
+### Performance Tests
+- Measure parse time for files of various sizes
+- Verify <100ms for files up to 1000 lines
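
A timing harness for these checks might look like the sketch below. It takes the parser as a callable. `ast.parse` appears here only so the sketch runs without tree-sitter installed; the project would pass its tree-sitter-backed parse function instead. `synthetic_module` and `best_parse_ms` are hypothetical helpers. Reporting the best of several runs reduces scheduler noise, which matters when a hard 100ms budget is asserted in CI.

```python
import ast
import time
from typing import Callable


def synthetic_module(n_lines: int) -> str:
 """Build roughly `n_lines` lines of valid Python out of two-line functions."""
 return "".join(f"def f{i}(x: int) -> int:\n return x + {i}\n" for i in range(max(1, n_lines // 2)))


def best_parse_ms(parse: Callable[[str], object], code: str, repeats: int = 5) -> float:
 """Best-of-`repeats` wall-clock parse time in milliseconds."""
 best = float("inf")
 for _ in range(repeats):
  start = time.perf_counter()
  parse(code)
  best = min(best, (time.perf_counter() - start) * 1000.0)
 return best


if __name__ == "__main__":
 for size in (100, 500, 1000):
  ms = best_parse_ms(ast.parse, synthetic_module(size))
  print(f"{size:>5} lines: {ms:.2f} ms (budget 100 ms)")
```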
+
+## Out of Scope
+- Non-Python file parsing (Python only for now)
+- Cross-file dependency tracking
+- Automatic relevance detection (manual target specification only)

 ## Acceptance Criteria
- [ ] tree_sitter successfully parses Python AST.
- [ ] Skeleton includes only target function/class with signature and docstring.
- [ ] Token count reduced by >50% for typical worker prompts.
- [ ] Workers can still complete tickets with skeleton-only context.
- [ ] >80% test coverage.
+- [ ] Targeted function extraction works
+- [ ] Token count reduced by >50% for typical prompts
+- [ ] Workers complete tickets with skeleton-only context
+- [ ] AST caching reduces re-parsing overhead
+- [ ] Token reduction metrics logged
+- [ ] >80% test coverage for new code
+- [ ] 1-space indentation maintained