ed/manual_slop

Ed_ 584bff9c06 docs(mma): Draft Track 4 - Tier 4 QA Interception

2026-02-24 22:26:27 -05:00


MMA Migration: Epics and Detailed Tasks

Track 1: The Memory Foundations (AST Parser)

Goal: Build the engine that prevents token-bloat by turning massive source files into curated memory views.

1. TDD Approach for `tree-sitter` Integration

Create tests/test_file_cache_ast.py.
Define mock Python source files containing various structures (classes, functions, docstrings, @core_logic decorators, # [HOT] comments).
Write failing tests that instantiate ASTParser and assert that get_skeleton_view() and get_curated_view() return the precisely filtered strings.
Red Phase: Ensure tests fail because ASTParser does not exist.
Green Phase: Implement the tree-sitter logic iteratively until strings match exactly.

2. `ASTParser` Extraction Rules (Tasks)

Task 1.1: Dependency Setup
- Add tree-sitter and tree-sitter-python to pyproject.toml / requirements.txt.
Task 1.2: Core Parser Class
- Create ASTParser in file_cache.py that initializes the language parser.
Task 1.3: Skeleton View Extraction
- Write query to extract function_definition and class_definition.
- Keep signatures, parameters, and return type hints.
- Replace all bodies with pass.
Task 1.4: Curated View Extraction
- Write query to keep class structures and expression_statement docstrings.
- Implement heuristic to preserve full bodies of functions decorated with @core_logic or containing # [HOT] comments.
- Replace all other function bodies with ... # Hidden.
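
The Skeleton View rule from Task 1.3 can be illustrated without the tree-sitter dependency. The sketch below swaps in Python's standard-library `ast` module purely for illustration; it is not the tree-sitter query the task calls for. The function name `skeleton_view` and the `SAMPLE` source are hypothetical, and the sketch assumes that only function bodies are replaced so that class structure stays visible.

```python
import ast

# Hypothetical sample input; any valid Python source would do.
SAMPLE = '''
class Cache:
    def get(self, key: str) -> int:
        value = self._data[key]
        return value

def helper(x, y=2):
    return x * y
'''


def skeleton_view(source: str) -> str:
    """Return the source with every function body replaced by `pass`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Signature, parameters, and return annotation live on the node
            # itself, so swapping the body leaves them untouched.
            node.body = [ast.Pass()]
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


if __name__ == "__main__":
    print(skeleton_view(SAMPLE))
```

Because `ast` discards comments, this approach could not detect `# [HOT]` markers or emit `... # Hidden`, which is one reason a concrete-syntax parser such as tree-sitter suits the Curated View better.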

3. Acceptance Testing Criteria

Unit Tests: All AST parsing tests pass with >90% coverage for file_cache.py.
Integration Test: Execute the parser on a large, complex project file (e.g., ai_client.py). The output Skeleton View must be less than 15% of the original token count. The Curated View must correctly retain docstrings and marked functions while stripping standard bodies.

Track 2: State Machine & Data Structures

Goal: Define the rigid Python objects (Pydantic/Dataclasses) that AI agents will pass to each other, enforcing structured data over loose chat strings.

1. TDD Approach for `models.py`

Create `tests/test_models.py`.
Write failing tests that instantiate `Track`, `Ticket`, and `WorkerContext` with various valid and invalid schemas.
Write tests that assert state transitions (e.g., from `pending` to `locked`, from `step_paused` to `completed`) correctly update internal flags and dependencies.
Red Phase: Tests fail because `models.py` classes are undefined or lack transition methods.
Green Phase: Implement the dataclasses and state mutators.

2. State Machine Tasks

Task 2.1: The Dataclasses
- Create `models.py`. Define `Ticket` (id, target_file, prompt, worker_archetype, status, dependencies).
- Define `Track` (id, title, description, status, tickets).
Task 2.2: Worker Context Definition
- Define `WorkerContext` holding a `Ticket` ID, assigned model, configuration injection, and an ephemeral `messages` array.
Task 2.3: State Mutator Methods
- Implement methods like `ticket.mark_blocked(dependency_id)`, `ticket.mark_complete()`, and `track.get_executable_tickets()`. Ensure strict validation of valid state transitions.
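
A minimal sketch of how Tasks 2.1 and 2.3 might fit together is given here. The field names and the three mutators come from the tasks above; the transition table `ALLOWED`, the `InvalidTransition` exception, and the private `_move` helper are assumptions made for illustration, since the plan names the states but not the full transition graph. `WorkerContext` is left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed transition graph (not specified in the plan).
ALLOWED = {
    "pending": {"locked", "step_paused", "completed"},
    "locked": {"pending"},
    "step_paused": {"completed"},
    "completed": set(),
}


class InvalidTransition(ValueError):
    """Raised when a status change is not in ALLOWED (hypothetical name)."""


@dataclass
class Ticket:
    id: str
    target_file: str
    prompt: str
    worker_archetype: str
    status: str = "pending"
    dependencies: List[str] = field(default_factory=list)

    def _move(self, new_status: str) -> None:
        if new_status not in ALLOWED[self.status]:
            raise InvalidTransition(f"{self.status} -> {new_status}")
        self.status = new_status

    def mark_blocked(self, dependency_id: str) -> None:
        if dependency_id not in self.dependencies:
            self.dependencies.append(dependency_id)
        if self.status != "locked":
            self._move("locked")

    def mark_complete(self) -> None:
        self._move("completed")


@dataclass
class Track:
    id: str
    title: str
    description: str
    status: str = "pending"
    tickets: List[Ticket] = field(default_factory=list)

    def get_executable_tickets(self) -> List[Ticket]:
        done = {t.id for t in self.tickets if t.status == "completed"}
        ready = []
        for t in self.tickets:
            deps_met = set(t.dependencies) <= done
            if t.status == "locked" and deps_met:
                t._move("pending")  # unlock once every dependency is done
            if t.status == "pending" and deps_met:
                ready.append(t)
        return ready
```

In this sketch the unlock from `locked` to `pending` happens inside `get_executable_tickets()`, a design choice that keeps the logic purely programmatic; an event-driven unlock on `mark_complete()` would be an equally plausible reading of the plan.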

3. Acceptance Testing Criteria

Unit Tests: `models.py` has 100% test coverage for all state transitions.
Integration Test: Instantiate a `Track` with 3 dependent `Tickets` in Python. Programmatically mark tickets as complete and assert that the subsequent dependent tickets transition from `locked` to `pending` without any AI involvement.

Track 3: The Linear Orchestrator & Execution Clutch

Goal: Build the synchronous, debuggable core loop that runs a single Tier 3 Worker and pauses for human approval.

1. TDD Approach for `multi_agent_conductor.py`

Create `tests/test_conductor.py`.
Write tests that mock the AI client response (e.g., returning a mock tool call like `write_file`).
Test that `run_worker_lifecycle(ticket: Ticket)` fetches the Raw View from `file_cache.py`, formats messages, and processes the mock output.
Test that execution pauses (waits for a simulated human signal) when the `trust_level` dictates.
Red Phase: Failure occurs because `multi_agent_conductor.py` lacks the lifecycle execution loop.
Green Phase: Implement the `ConductorEngine` core execution block.

2. Linear Orchestration Tasks

Task 3.1: The Engine Core
- Create `multi_agent_conductor.py`. Implement the `ConductorEngine` class containing the `run_worker_lifecycle` synchronous execution.
Task 3.2: Context Injection
- Implement logic reading the Ticket target, querying `file_cache.py` for the `Raw View`, and formatting the messages array for the API.
Task 3.3: The HITL Execution Clutch
- Before executing tools via `mcp_client.py` or `shell_runner.py`, intercept the tool payload if the Worker's archetype dictates a `step` mode.
- Wait for explicit user confirmation via a CLI prompt (or event block for UI future-proofing). Allow editing of the JSON payload.
- Flush history upon `TicketCompleted`.
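
The clutch in Task 3.3 could be sketched as a single gate function. Everything here beyond the `step` mode and the `write_file` tool name is an assumption: the function name `clutch`, the payload shape, and the `y`/`n`/`e` prompt are hypothetical. The prompt function is injected so that tests can simulate the human signal without a terminal.

```python
import json
from typing import Callable, Optional


def clutch(
    tool_call: dict,
    mode: str,
    ask: Callable[[str], str] = input,
) -> Optional[dict]:
    """Return the payload to execute, or None if the human rejects it."""
    if mode != "step":
        # Archetypes outside step mode pass through without a pause.
        return tool_call

    # Show the intercepted payload before anything touches disk.
    print(json.dumps(tool_call, indent=2))
    answer = ask("Execute? [y]es / [n]o / [e]dit: ").strip().lower()

    if answer == "y":
        return tool_call
    if answer == "e":
        # The human supplies a replacement payload as raw JSON.
        return json.loads(ask("New JSON payload: "))
    return None


if __name__ == "__main__":
    call = {"tool": "write_file", "args": {"path": "dummy.txt", "content": "hi"}}
    approved = clutch(call, mode="step")
    print("approved payload:", approved)
```

Passing `ask` as a parameter is one way to cover the "event block for UI future-proofing" note: a UI could supply a callable that blocks on an event instead of reading from the terminal.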

3. Acceptance Testing Criteria

Unit Tests: Context generation, API schema mapping, and event-blocking are tested for all edge cases.
Integration Test: Manually execute a script pointing the `ConductorEngine` at a dummy file. The CLI should pause before `write_file` execution, display the diff, allow manual JSON editing via terminal input, apply the file modification described by the edited JSON, and return `Task Complete`.

Track 4: Tier 4 QA Interception

Goal: Stop error traces from destroying the Worker's token window by routing crashes through a cheap, stateless translator.

1. TDD Approach for `shell_runner.py`

Create `tests/test_shell_runner.py`.
Write tests that mock a local execution failure (e.g., returning a mock 3000-line Python stack trace).
Test that the error is intercepted and passed to a mock Tier 4 agent.
Test that the output is compressed into a 20-word fix before returning.
Red Phase: Fails because no interception loop exists in `shell_runner.py`.
Green Phase: Implement the try/except logic handling `subprocess.run()` with `returncode != 0`.

2. QA Interception Tasks

Task 4.1: The Interceptor Loop
- Open `shell_runner.py` and catch execution errors.
Task 4.2: Tier 4 Instantiation
- Construct a secondary, synchronous API call directly to the `default_cheap` model, sending the raw `stderr` and the offending code snippet.
Task 4.3: Payload Formatting
- Inject the 20-word fix response from the Tier 4 agent back into the main Tier 3 worker's history context as a system hint.
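
The interception flow across Tasks 4.1 to 4.3 might look roughly like the sketch below. The names `run_with_qa` and `fake_tier4` are hypothetical, and the Tier 4 call is represented by an injected callable rather than a real API client. The sketch checks `returncode` directly, because `subprocess.run()` only raises on a non-zero exit when `check=True` is passed. Injecting the returned hint into the Tier 3 history is left to the caller.

```python
import subprocess
import sys
from typing import Callable, List


def run_with_qa(
    cmd: List[str],
    summarize: Callable[[str, str], str],
    snippet: str = "",
    max_words: int = 20,
) -> str:
    """Run cmd; on failure return a short hint instead of the raw stderr."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return result.stdout

    # The raw trace goes to the Tier 4 stand-in only, never to the caller.
    hint = summarize(result.stderr, snippet)
    # Enforce the word budget locally in case the model over-produces.
    return "[QA hint] " + " ".join(hint.split()[:max_words])


def fake_tier4(stderr: str, snippet: str) -> str:
    """Hypothetical stand-in for the cheap model call."""
    stripped = stderr.strip()
    last_line = stripped.splitlines()[-1] if stripped else "unknown error"
    return f"Script failed with: {last_line}"


if __name__ == "__main__":
    # A deliberately broken script, mirroring the integration test idea.
    bad = [sys.executable, "-c", "def broken(:\n    pass"]
    print(run_with_qa(bad, fake_tier4, snippet="def broken(:"))
```

Truncating to `max_words` on the caller's side is a defensive choice: it keeps the history bounded even if the Tier 4 response ignores the 20-word instruction.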

3. Acceptance Testing Criteria

Unit Tests: Verify that massive error outputs never leak uncompressed into the main history logs.
Integration Test: Purposely introduce a syntax error in a local script. Ensure the orchestrator catches it, pings the mock/cheap API, and the history log receives the 20-word hint instead of the 200-line stack trace.
