# Project Workflow

## Session Start Checklist (MANDATORY)

## Code Style (MANDATORY - Python)

- **1-space indentation** for ALL Python code (NO EXCEPTIONS)
- **CRLF line endings** on Windows
- **NO COMMENTS** unless explicitly requested
- Type hints required for all public functions
- **ImGui Defer Patterns:** Use `imscope` context managers or `_render_window_if_open` dispatch helpers to prevent resource leaks and keep the main loop flat. See `conductor/code_styleguides/python.md` for details.

### CRITICAL: Native Edit Tool Destroys Indentation

The native `Edit` tool DESTROYS 1-space indentation and converts to 4-space.

**NEVER use native `edit` tool on Python files.**

Instead, use Manual Slop MCP tools:
- `manual-slop_py_update_definition` - Replace function/class
- `manual-slop_set_file_slice` - Replace line range
- `manual-slop_py_set_signature` - Replace signature only

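Or run a short Python script (directly or as a subprocess) that opens the file with `newline=''` so existing line endings are preserved:
```python
# Placeholders: set these to the exact text to find and its replacement.
old = "old_text"
new = "new_text"
# newline='' disables newline translation, so CRLF endings survive the round trip.
with open('file.py', 'r', encoding='utf-8', newline='') as f:
    content = f.read()
content = content.replace(old, new)
with open('file.py', 'w', encoding='utf-8', newline='') as f:
    f.write(content)
```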
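
To confirm that an edit left the style rules intact, a quick standard-library check is one option. The sketch below is an assumption about how such a check could look, not a tool that ships with the project; `find_bad_indents` and `has_only_crlf` are hypothetical names.

```python
import io
import tokenize


def find_bad_indents(source: str) -> list[int]:
    """Return line numbers where a block is indented by anything other than one space."""
    bad = []
    stack = [0]  # indentation widths of the enclosing blocks
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            width = len(tok.string)
            # A valid step adds exactly one space and never uses a tab.
            if width - stack[-1] != 1 or "\t" in tok.string:
                bad.append(tok.start[0])
            stack.append(width)
        elif tok.type == tokenize.DEDENT:
            stack.pop()
    return bad


def has_only_crlf(data: bytes) -> bool:
    """True when every line feed in the raw bytes belongs to a CRLF pair."""
    return data.count(b"\n") == data.count(b"\r\n")
```

Using `tokenize` rather than counting leading spaces means continuation lines inside brackets are not misreported, because only real block indents produce `INDENT` tokens.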

## Guiding Principles

1. **The Plan is the Source of Truth:** All work must be tracked in `plan.md`
2. **The Tech Stack is Deliberate:** Changes to the tech stack must be documented in `tech-stack.md` *before* implementation
3. **Test-Driven Development:** Write unit tests before implementing functionality
4. **High Code Coverage:** Aim for >80% code coverage for all modules
5. **User Experience First:** Every decision should prioritize user experience
6. **Non-Interactive & CI-Aware:** Prefer non-interactive commands. Use `CI=true` for watch-mode tools (tests, linters) to ensure single execution.
7. **MMA Tiered Delegation is Mandatory:** The Conductor acts as a Tier 1/2 Orchestrator. You MUST delegate all non-trivial coding to Tier 3 Workers and all error analysis to Tier 4 QA Agents. Do NOT perform large file writes directly.
8. **Mandatory Research-First Protocol:** Before reading the full content of any file over 50 lines, you MUST use `get_file_summary`, `py_get_skeleton`, `py_get_code_outline`, or `py_get_docstring` to map the architecture and identify specific target ranges. Use `get_git_diff` to understand recent changes. Use `py_find_usages` to locate where symbols are used.
9. **Architecture Documentation Fallback:** When uncertain about threading, event flow, data structures, or module interactions, consult the deep-dive docs in `docs/` (last refreshed: 2026-06-02 via the comprehensive documentation refresh track, **8 new guides added**):
   - **[docs/guide_architecture.md](../docs/guide_architecture.md):** Thread domains, cross-thread patterns, AI client multi-provider (Gemini, Anthropic, DeepSeek, Gemini CLI, MiniMax), HITL Execution Clutch.
   - **[docs/guide_tools.md](../docs/guide_tools.md):** MCP Bridge 3-layer security, full 45-tool inventory, Hook API, ApiHookClient, `/api/ask` HITL protocol.
   - **[docs/guide_mma.md](../docs/guide_mma.md):** Ticket/Track/WorkerContext data structures, DAG engine, ConductorEngine, Tier 2/3/4 lifecycles, persona application.
   - **[docs/guide_simulations.md](../docs/guide_simulations.md):** `live_gui` fixture, Puppeteer pattern, mock provider, test areas by subsystem.
   - **[docs/guide_testing.md](../docs/guide_testing.md):** **NEW** — 251 test files, 5 categories, 7 conftest fixtures (`isolate_workspace`, `reset_paths`, `reset_ai_client`, `vlogger`, `kill_process_tree`, `mock_app`, `live_gui` session-scoped), Puppeteer pattern, mock provider, structural testing contract.
   - **[docs/guide_gui_2.md](../docs/guide_gui_2.md):** **NEW** — `src/gui_2.py` (260KB main GUI): App class lifecycle, ~90 module-level render functions, Multi-Viewport docks, panel registry, command palette integration, ImGuiScope context managers, hot reload support.
   - **[docs/guide_ai_client.md](../docs/guide_ai_client.md):** **NEW** — `src/ai_client.py` (116KB): multi-provider LLM singleton (5 providers), async dispatch via `asyncio.gather`, threading.local for source tier tagging, Anthropic ephemeral caching + Gemini explicit caching, Tier 4 QA error interception.
   - **[docs/guide_api_hooks.md](../docs/guide_api_hooks.md):** **NEW** — `src/api_hooks.py` + `src/api_hook_client.py` (38KB + 31KB): HookServer on `127.0.0.1:8999`, ApiHookClient wrapper, 8+ endpoints, Remote Confirmation Protocol via `/api/ask`.
   - **[docs/guide_mcp_client.md](../docs/guide_mcp_client.md):** **NEW** — `src/mcp_client.py` (81KB, 45 tools): 3-layer security (Allowlist → Validate → Resolve), all native tools (File I/O, Python AST, C/C++ AST, Analysis, Network, Runtime, Beads), ExternalMCPManager (Stdio + SSE), JSON-RPC 2.0 engine.
   - **[docs/guide_app_controller.md](../docs/guide_app_controller.md):** **NEW** — `src/app_controller.py` (166KB): headless orchestrator, AppState dataclass, all subsystem managers, `_predefined_callbacks`/`_gettable_fields` Hook API registries, SyncEventQueue, headless mode.
   - **[docs/guide_multi_agent_conductor.md](../docs/guide_multi_agent_conductor.md):** **NEW** — `src/multi_agent_conductor.py` + `src/dag_engine.py` (28KB + 10KB): TrackDAG (iterative DFS cycle detection, Kahn's topological sort), ExecutionEngine (Auto-Queue / Step Mode), MultiAgentConductor + WorkerPool (concurrency 4), mma_exec.py sub-agent invocation.
   - **[docs/guide_models.md](../docs/guide_models.md):** **NEW** — `src/models.py` (132KB): centralized data model registry, `AGENT_TOOL_NAMES` canonical 45-tool list, `PROVIDERS` constant, `parse_plan_md` utility, validation patterns, SDM tags.
   - See [docs/Readme.md](../docs/Readme.md) for the full **14-guide index** covering context curation, shaders, RAG, Beads, hot reload, personas, NERV theme, workspace profiles, and command palette.
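
Principle 6 (non-interactive, CI-aware commands) can be illustrated with a small wrapper that sets `CI=true` for a child process. This is a minimal sketch; `run_once` is a hypothetical helper, and the child command here only echoes the variable to show that it was passed through.

```python
import os
import subprocess
import sys


def run_once(command: list[str]) -> subprocess.CompletedProcess:
    """Run a command with CI=true added to the inherited environment."""
    env = {**os.environ, "CI": "true"}
    return subprocess.run(command, env=env, capture_output=True, text=True, check=False)


# Demonstration: a child Python process that prints the CI variable it received.
result = run_once([sys.executable, "-c", "import os; print(os.environ['CI'])"])
print(result.stdout.strip())
```

Passing the command as a list avoids shell quoting differences between PowerShell and other shells.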

## Task Workflow

All tasks follow a strict lifecycle:

### Standard Task Workflow

0. **Initialize MMA Environment:** Before executing the first task of any track, you MUST activate the `mma-orchestrator` skill (`activate_skill mma-orchestrator`).

1. **Select Task:** Choose the next available task from `plan.md` in sequential order

2. **Mark In Progress:** Before beginning work, edit `plan.md` and change the task from `[ ]` to `[~]`

3. **High-Signal Research Phase:**
   - **Identify Dependencies:** Use `list_directory`, `get_tree`, and `py_get_imports` to map file relations.
   - **Map Architecture:** Use `py_get_code_outline` or `py_get_skeleton` on identified files to understand their structure.
   - **Audit State:** Use `py_get_code_outline` or `py_get_definition` on the target class's `__init__` method to check for existing, unused, or duplicate state variables before adding new ones.
   - **Analyze Changes:** Use `get_git_diff` if the task involves modifying recently updated code.
   - **Minimize Token Burn:** Only use `read_file` with `start_line`/`end_line` for specific implementation details once target areas are identified.
4. **Write Failing Tests (Red Phase):**
   - **Pre-Delegation Checkpoint:** Before spawning a worker for dangerous or non-trivial changes, ensure your current progress is staged (`git add .`) or committed. This prevents losing iterations if a sub-agent incorrectly uses `git restore`.
   - **Zero-Assertion Ban:** You MUST NOT write tests that contain only `pass` or lack meaningful assertions. A test is only valid if it contains assertions that explicitly test the behavioral change and verify the failure condition.
   - **Code Style:** ALWAYS explicitly mention "Use exactly 1-space indentation for Python code" when prompting a sub-agent.
   - **Delegate Test Creation:** Do NOT write test code directly. Spawn a Tier 3 Worker (`python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`) with a **surgical prompt** specifying WHERE (file:line range), WHAT (test to create), HOW (which assertions/fixtures to use), and SAFETY (thread constraints if applicable). Example: `"Write tests in tests/test_cost_tracker.py for cost_tracker.py:estimate_cost(). Test all model patterns in MODEL_PRICING dict. Assert unknown model returns 0. Use 1-space indentation."` (If repeating due to failures, pass `--failure-count X` to switch to a more capable model).
   - Take the code generated by the Worker and apply it.
   - **CRITICAL:** Run the tests and confirm that they fail as expected. This is the "Red" phase of TDD. Do not proceed until you have failing tests.

5. **Implement to Pass Tests (Green Phase):**
   - **Pre-Delegation Checkpoint:** Ensure current progress is staged or committed before delegating.
   - **Code Style:** ALWAYS explicitly mention "Use exactly 1-space indentation for Python code" when prompting a sub-agent.
   - **Delegate Implementation:** Do NOT write the implementation code directly. Spawn a Tier 3 Worker (`python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`) with a **surgical prompt** specifying WHERE (file:line range to modify), WHAT (the specific change), HOW (which API calls, data structures, or patterns to use), and SAFETY (thread-safety constraints). Example: `"In gui_2.py _render_mma_dashboard (lines 2685-2699), extend the token usage table from 3 to 5 columns. Add 'Model' and 'Est. Cost' using imgui.table_setup_column(). Call cost_tracker.estimate_cost(model, input_tokens, output_tokens). Use 1-space indentation."` (If repeating due to failures, pass `--failure-count X` to switch to a more capable model).
   - Take the code generated by the Worker and apply it.
   - Run the test suite again and confirm that all tests now pass. This is the "Green" phase.

6. **Refactor (Optional but Recommended):**
   - With the safety of passing tests, refactor the implementation code and the test code to improve clarity, remove duplication, and enhance performance without changing the external behavior.
   - Rerun tests to ensure they still pass after refactoring.

7. **Verify Coverage:** Run coverage reports using the project's chosen tools. For example, in a Python project, this might look like:
   ```powershell
   uv run pytest --cov=src --cov-report=html
   ```
   Target: >80% coverage for new code. The specific tools and commands will vary by language and framework.

8. **Document Deviations:** If implementation differs from tech stack:
   - **STOP** implementation
   - Update `tech-stack.md` with new design
   - Add dated note explaining the change
   - Resume implementation

9. **Commit Code Changes:**
   - **CRITICAL - ATOMIC PER-TASK COMMITS**: You MUST commit your changes immediately after completing and verifying a single task. Do NOT move on to the next task in the plan without committing the current one. This ensures precise tracking and safe rollback points.
   - Stage all code changes related to the task.
   - Propose a clear, concise commit message, e.g., `feat(ui): Create basic HTML structure for calculator`.
   - Perform the commit.

10. **Attach Task Summary with Git Notes:**
    - **Step 10.1: Get Commit Hash:** Obtain the hash of the *just-completed commit* (`git log -1 --format="%H"`).
    - **Step 10.2: Draft Note Content:** Create a detailed summary for the completed task. This should include the task name, a summary of changes, a list of all created/modified files, and the core "why" for the change.
    - **Step 10.3: Attach Note:** Use the `git notes` command to attach the summary to the commit.
      ```powershell
      # The note content from the previous step is passed via the -m flag.
      git notes add -m "<note content>" <commit_hash>
      ```

11. **Get and Record Task Commit SHA:**
    - **Step 11.1: Update Plan:** Read `plan.md`, find the line for the completed task, update its status from `[~]` to `[x]`, and append the first 7 characters of the *just-completed commit's* commit hash.
    - **Step 11.2: Write Plan:** Write the updated content back to `plan.md`.

12. **Commit Plan Update:**
    - **Action:** Stage the modified `plan.md` file.
    - **Action:** Commit this change with a descriptive message (e.g., `conductor(plan): Mark task 'Create user model' as complete`).
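
The status transitions in the workflow above (`[ ]` to `[~]` when work starts, then `[~]` to `[x]` plus a 7-character SHA when the task is committed) can be sketched as a small text transform. This assumes each task is a Markdown list item of the form `- [ ] Task name` and that the SHA is appended after a space; both are assumptions about `plan.md`, and `set_task_status` is a hypothetical helper.

```python
import re


def set_task_status(plan_text: str, task: str, marker: str, sha: str = "") -> str:
    """Set the checkbox marker for `task` and optionally append a short SHA."""
    pattern = re.compile(r"^(\s*[-*] )\[[ ~x]\] " + re.escape(task) + r"\s*$")
    suffix = f" {sha[:7]}" if sha else ""
    out = []
    hits = 0
    for line in plan_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keep the original line ending (CRLF or LF)
        match = pattern.match(body)
        if match:
            hits += 1
            body = f"{match.group(1)}[{marker}] {task}{suffix}"
        out.append(body + ending)
    if hits != 1:
        raise ValueError(f"expected exactly one line for task {task!r}, found {hits}")
    return "".join(out)
```

Requiring exactly one matching line guards against silently marking the wrong task when two tasks share a name.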

### Phase Completion Verification and Checkpointing Protocol

**Trigger:** This protocol is executed immediately after a task is completed that also concludes a phase in `plan.md`.

1.  **Announce Protocol Start:** Inform the user that the phase is complete and the verification and checkpointing protocol has begun.

2.  **Ensure Test Coverage for Phase Changes:**
    -   **Step 2.1: Determine Phase Scope:** To identify the files changed in this phase, you must first find the starting point. Read `plan.md` to find the Git commit SHA of the *previous* phase's checkpoint. If no previous checkpoint exists, the scope is all changes since the first commit.
    -   **Step 2.2: List Changed Files:** Execute `git diff --name-only <previous_checkpoint_sha> HEAD` to get a precise list of all files modified during this phase.
    -   **Step 2.3: Verify and Create Tests:** For each file in the list:
        -   **CRITICAL:** First, check its extension. Exclude non-code files (e.g., `.json`, `.md`, `.yaml`).
        -   For each remaining code file, verify a corresponding test file exists.
        -   If a test file is missing, you **must** create one. Before writing the test, **first, analyze other test files in the repository to determine the correct naming convention and testing style.** The new tests **must** validate the functionality described in this phase's tasks (`plan.md`).

3.  **Execute Automated Tests in Batches:**
    -   Because the full suite is large (>360 tests) and contains complex UI simulations, running the entire suite frequently can lead to random timeouts or threading access violations.
    -   Before execution, you **must** announce the exact shell command.
    -   **CRITICAL:** When verifying changes, **do not run the full suite (`pytest tests/`)**. Instead, run tests in small, targeted batches (maximum 4 test files at a time). Only use long timeouts (`--timeout=60` or `--timeout=120`) if the specific tests in the batch are known to be slow (e.g., simulation tests).
    -   **Example Announcement:** "I will now run the automated test suite to verify the phase. **Command:** `uv run pytest tests/test_specific_feature.py`"
    -   Execute the announced command.
        - If tests fail with significant output (e.g., a large traceback), **DO NOT** attempt to read the raw `stderr` directly into your context. Instead, pipe the output to a log file and **spawn a Tier 4 QA Agent (`python scripts/mma_exec.py --role tier4-qa "[PROMPT]"`)** to summarize the failure.
        - You **must** inform the user and begin debugging using the QA Agent's summary. You may attempt to propose a fix a **maximum of two times**. If the tests still fail after your second proposed fix, you **must stop**, report the persistent failure, and ask the user for guidance.

4.  **Execute Automated API Hook Verification:**
    -   **CRITICAL:** The Conductor agent will now automatically execute verification tasks using the application's API hooks.
    -   The agent will announce the start of the automated verification to the user.
    -   It will then communicate with the application's IPC server to trigger the necessary verification functions.
    -   **Result Handling:**
        -   All results (successes and failures) from the API hook invocations will be logged.
        -   If all automated verifications pass, the agent will inform the user and proceed to the next step (Create Checkpoint Commit).
        -   If any automated verification fails, the agent will halt the workflow, present the detailed failure logs to the user, and await further instructions for debugging or remediation.

5.  **Present Automated Verification Results and User Confirmation:**
    -   After executing automated verification, the Conductor agent will present the results to the user.
    -   If verification passed, the agent will state: "Automated verification completed successfully."
    -   If verification failed, the agent will state: "Automated verification failed. Please review the logs above for details." The agent may propose a fix a **maximum of two times**. If verification still fails after the second proposed fix, it **must stop**, report the persistent failure, and ask the user for guidance.
    -   **PAUSE** and await the user's response. If verification passed, do not proceed without the user's explicit confirmation; if it failed, do not proceed without the user's guidance.

6.  **Create Checkpoint Commit:**
    -   Stage all changes. If no changes occurred in this step, proceed with an empty commit.
    -   Perform the commit with a clear and concise message (e.g., `conductor(checkpoint): Checkpoint end of Phase X`).

7.  **Attach Auditable Verification Report using Git Notes:**
    -   **Step 7.1: Draft Note Content:** Create a detailed verification report including the automated test command, the manual verification steps, and the user's confirmation.
    -   **Step 7.2: Attach Note:** Use the `git notes` command and the full commit hash from the previous step to attach the full report to the checkpoint commit.

8.  **Get and Record Phase Checkpoint SHA:**
    -   **Step 8.1: Get Commit Hash:** Obtain the hash of the *just-created checkpoint commit* (`git log -1 --format="%H"`).
    -   **Step 8.2: Update Plan:** Read `plan.md`, find the heading for the completed phase, and append the first 7 characters of the commit hash in the format `[checkpoint: <sha>]`.
    -   **Step 8.3: Write Plan:** Write the updated content back to `plan.md`.

9. **Commit Plan Update:**
    - **Action:** Stage the modified `plan.md` file.
    - **Action:** Commit this change with a descriptive message following the format `conductor(plan): Mark phase '<PHASE NAME>' as complete`.

10.  **Announce Completion:** Inform the user that the phase is complete and the checkpoint has been created, with the detailed verification report attached as a git note.
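
Steps 2.1 to 2.3 and step 3 of the protocol above can be sketched as a small pipeline: find the previous checkpoint SHA, filter the changed-file list down to code files, map each to its expected test file, and group test files into batches of at most four. The exclusion set beyond `.json`, `.md`, and `.yaml`, the `tests/test_<module>.py` naming convention, and all function names are assumptions for illustration.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

NON_CODE_SUFFIXES = {".json", ".md", ".yaml", ".yml", ".toml"}  # assumed exclusion list


def last_checkpoint(plan_text: str) -> Optional[str]:
    """Return the last `[checkpoint: <sha>]` value listed in plan.md, if any."""
    found = re.findall(r"\[checkpoint: ([0-9a-f]{7,40})\]", plan_text)
    return found[-1] if found else None


def code_files(changed: list[str]) -> list[str]:
    """Drop non-code files from a `git diff --name-only` listing."""
    return [p for p in changed if PurePosixPath(p).suffix not in NON_CODE_SUFFIXES]


def expected_test_path(source_path: str) -> str:
    """Map src/foo.py to tests/test_foo.py (an assumed naming convention)."""
    return f"tests/test_{PurePosixPath(source_path).stem}.py"


def batches(items: list[str], size: int = 4) -> list[list[str]]:
    """Split test files into batches of at most `size` files."""
    return [items[i:i + size] for i in range(0, len(items), size)]


changed = ["src/cost_tracker.py", "conductor/workflow.md", "src/gui_2.py", "config.json"]
tests = [expected_test_path(p) for p in code_files(changed)]
for batch in batches(tests):
    print("uv run pytest " + " ".join(batch))
```

Each printed command is what would be announced to the user before execution.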

### Verification via API Hooks

For features involving the GUI or complex internal state, unit tests are often insufficient. You MUST use the application's built-in API hooks for empirical verification:

1.  **Launch the App with Hooks:** Run the application in a separate shell with the `--enable-test-hooks` flag:
    ```powershell
    uv run python gui.py --enable-test-hooks
    ```
    This starts the hook server on port `8999`.

2.  **Use the pytest `live_gui` Fixture:** For automated tests, use the session-scoped `live_gui` fixture defined in `tests/conftest.py`. This fixture handles the lifecycle (startup/shutdown) of the application with hooks enabled.
    ```python
    def test_my_feature(live_gui):
        # The GUI is now running on port 8999
        ...
    ```
    Note: run pytest through `uv` (for example, `uv run pytest tests/test_specific_feature.py`).

3.  **Verify via ApiHookClient:** Use the `ApiHookClient` in `api_hook_client.py` to interact with the running application. It includes robust retry logic and health checks.

4.  **Verify via REST Commands:** Use PowerShell or `curl` to send commands to the application and verify the response. For example, to check health:
    ```powershell
    Invoke-RestMethod -Uri "http://127.0.0.1:8999/status" -Method Get
    ```
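
When a script needs to wait for the hook server to come up, a retry loop around the `/status` request is one approach. The sketch below uses only `urllib` and is not the project's `ApiHookClient`; `fetch_status` and `wait_for_status` are hypothetical names, and no assumption is made about the response format, so the raw body is returned.

```python
import time
import urllib.request
from typing import Callable

STATUS_URL = "http://127.0.0.1:8999/status"


def fetch_status(url: str = STATUS_URL, timeout: float = 2.0) -> bytes:
    """Issue a single GET request and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def wait_for_status(
    fetch: Callable[[], bytes] = fetch_status,
    attempts: int = 10,
    delay: float = 0.5,
) -> bytes:
    """Call `fetch` until it succeeds or the attempts run out."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except OSError as error:  # URLError and connection errors subclass OSError
            last_error = error
            time.sleep(delay)
    raise TimeoutError(f"hook server did not answer after {attempts} attempts") from last_error
```

Taking `fetch` as a parameter keeps the retry logic testable without a running application.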

### Quality Gates

Before marking any task complete, verify:

- [ ] All tests pass
- [ ] Code coverage meets requirements (>80%)
- [ ] Code follows project's code style guidelines (as defined in `code_styleguides/`)
- [ ] All public functions/methods are documented (e.g., docstrings, JSDoc, GoDoc)
- [ ] Type safety is enforced (e.g., type hints, TypeScript types, Go types)
- [ ] No linting or static analysis errors (using the project's configured tools)
- [ ] Works correctly on mobile (if applicable)
- [ ] Documentation updated if needed
- [ ] No security vulnerabilities introduced

## Development Commands

**AI AGENT INSTRUCTION: This section should be adapted to the project's specific language, framework, and build tools.**

### Setup

```powershell
# Example: Commands to set up the development environment (e.g., install dependencies, configure database)
# e.g., for a Node.js project: npm install
# e.g., for a Go project: go mod tidy
```

### Daily Development

```powershell
# Example: Commands for common daily tasks (e.g., start dev server, run tests, lint, format)
# e.g., for a Node.js project: npm run dev, npm test, npm run lint
# e.g., for a Go project: go run main.go, go test ./..., go fmt ./...
```

### Before Committing

```powershell
# Example: Commands to run all pre-commit checks (e.g., format, lint, type check, run tests)
# e.g., for a Node.js project: npm run check
# e.g., for a Go project: make check (if a Makefile exists)
```

## Testing Requirements

### Structural Testing Contract

1.  **Ban on Arbitrary Core Mocking:** Tier 3 workers are strictly forbidden from using `unittest.mock.patch` to bypass or stub core infrastructure (e.g., event queues, `ai_client` internals, threading primitives) unless explicitly authorized by the Tier 2 Tech Lead for a specific boundary test.
2.  **`live_gui` Standard:** All integration and end-to-end testing must utilize the `live_gui` fixture to interact with a real instance of the application via the Hook API. Bypassing the hook server to directly mutate GUI state in tests is prohibited.
3.  **Artifact Isolation:** All test-generated artifacts (logs, temporary workspaces, mock outputs) MUST be written to the `tests/artifacts/` or `tests/logs/` directories. These directories are git-ignored to prevent repository pollution.
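
The artifact isolation rule can be checked before a test writes anything. The following is a minimal sketch, assuming paths are given relative to the repository root; `is_isolated` is a hypothetical helper and requires Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path

ALLOWED_ROOTS = (Path("tests/artifacts"), Path("tests/logs"))


def is_isolated(artifact: str, repo_root: str = ".") -> bool:
    """True when `artifact` resolves inside tests/artifacts/ or tests/logs/."""
    root = Path(repo_root).resolve()
    target = (root / artifact).resolve()  # resolve() collapses any ".." segments
    return any(target.is_relative_to(root / allowed) for allowed in ALLOWED_ROOTS)
```

Resolving the path first means a target such as `tests/artifacts/../../secrets.txt` is rejected even though it starts with an allowed prefix.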

### Unit Testing

- Every module must have corresponding tests.
- Use appropriate test setup/teardown mechanisms (e.g., fixtures, beforeEach/afterEach).
- Mock external dependencies.
- Test both success and failure cases.

### Integration Testing

- Test complete user flows
- Verify database transactions
- Test authentication and authorization
- Check form submissions

### Mobile Testing

- Test on actual iPhone when possible
- Use Safari developer tools
- Test touch interactions
- Verify responsive layouts
- Check performance on 3G/4G

## Code Review Process

### Self-Review Checklist

Before requesting review:

1. **Functionality**
   - Feature works as specified
   - Edge cases handled
   - Error messages are user-friendly

2. **Code Quality**
   - Follows style guide
   - DRY principle applied
   - Clear variable/function names
   - Comments only where explicitly requested (see Code Style)

3. **Testing**
   - Unit tests comprehensive
   - Integration tests pass
   - Coverage adequate (>80%)

4. **Security**
   - No hardcoded secrets
   - Input validation present
   - SQL injection prevented
   - XSS protection in place

5. **Performance**
   - Database queries optimized
   - Images optimized
   - Caching implemented where needed

6. **Mobile Experience**
   - Touch targets adequate (44x44px)
   - Text readable without zooming
   - Performance acceptable on mobile
   - Interactions feel native

## Commit Guidelines

### Message Format

```
<type>(<scope>): <description>

[optional body]

[optional footer]
```

### Types

- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation only
- `style`: Formatting, missing semicolons, etc.
- `refactor`: Code change that neither fixes a bug nor adds a feature
- `test`: Adding missing tests
- `chore`: Maintenance tasks

### Examples

```powershell
git commit -m "feat(auth): Add remember me functionality"
git commit -m "fix(posts): Correct excerpt generation for short posts"
git commit -m "test(comments): Add tests for emoji reaction limits"
git commit -m "style(mobile): Improve button touch targets"
```
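
A header in the `<type>(<scope>): <description>` format can be validated with a regular expression. This sketch treats the scope as required and adds `conductor` to the allowed types because the workflow above uses `conductor(plan)` and `conductor(checkpoint)`; both choices are assumptions, and `parse_header` is a hypothetical helper.

```python
import re

TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore", "conductor")
HEADER = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"\((?P<scope>[^()\s]+)\): "
    r"(?P<description>\S.*)$"
)


def parse_header(message: str) -> dict[str, str]:
    """Split the first line of a commit message into type, scope, and description."""
    first_line = message.splitlines()[0] if message else ""
    match = HEADER.match(first_line)
    if match is None:
        raise ValueError(f"commit header does not match the format: {first_line!r}")
    return match.groupdict()
```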

## Definition of Done

A task is complete when:

1. All code implemented to specification
2. Unit tests written and passing
3. Code coverage meets project requirements
4. Documentation complete (if applicable)
5. Code passes all configured linting and static analysis checks
6. Works beautifully on mobile (if applicable)
7. Implementation notes added to `plan.md`
8. Changes committed with proper message
9. Git note with task summary attached to the commit

## Conductor Token Firewalling & Model Switching Strategy

To emulate the 4-Tier MMA Architecture within the standard Conductor extension without requiring a custom fork, adhere to these strict workflow policies:

### 1. Active Model Switching (Simulating the 4 Tiers)

- **Mandatory Skill Activation:** As the very first step of any MMA-driven process, including track initialization and implementation phases, the agent MUST activate the `mma-orchestrator` skill (`activate_skill mma-orchestrator`) and the tier-specific skill for its role. This is crucial for enforcing the 4-Tier token firewall.
- **The MMA Bridge (`mma_exec.py`):** All tiered delegation is routed through `uv run python scripts/mma_exec.py`. This script acts as the primary bridge, managing model selection, context injection, and logging.
- **Model Tiers:**
    - **Tier 1 (Strategic/Orchestration):** `gemini-3.1-pro-preview`. Focused on product alignment, setup (`/conductor:setup`), and track initialization (`/conductor:newTrack`).
    - **Tier 2 (Architectural/Tech Lead):** `gemini-3-flash-preview`. Focused on architectural design and track execution (`/conductor:implement`). **Note:** Tier 2 maintains persistent memory throughout a track's implementation.
    - **Tier 3 (Execution/Worker):** `gemini-2.5-flash-lite`. Used for surgical code implementation and test generation. Operates statelessly (Context Amnesia) but has access to file I/O tools.
    - **Tier 4 (Utility/QA):** `gemini-2.5-flash-lite`. Used for log summarization and error analysis. Operates statelessly (Context Amnesia) but has access to diagnostic tools.
- **Tiered Delegation Protocol:**
    - **Tier 3 Worker:** `uv run python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`
    - **Tier 4 QA Agent:** `uv run python scripts/mma_exec.py --role tier4-qa "[PROMPT]"`
- **Observability:** All hierarchical interactions are recorded in `logs/mma_delegation.log` and detailed sub-agent logs are saved to `logs/agents/`.
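
The delegation commands above can be assembled programmatically so that the indentation reminder is never forgotten. This is a sketch under assumptions: `build_delegation_command` is a hypothetical helper, and placing `--failure-count` before the prompt is a guess about argument order, not confirmed behavior of `mma_exec.py`.

```python
INDENT_RULE = "Use exactly 1-space indentation for Python code."
ROLES = ("tier3-worker", "tier4-qa")


def build_delegation_command(role: str, prompt: str, failure_count: int = 0) -> list[str]:
    """Build the argument list for a Tier 3 or Tier 4 delegation call."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    # Workers write code, so make sure the style rule is part of the prompt.
    if role == "tier3-worker" and "1-space indentation" not in prompt:
        prompt = f"{prompt} {INDENT_RULE}"
    command = ["uv", "run", "python", "scripts/mma_exec.py", "--role", role]
    if failure_count > 0:
        command += ["--failure-count", str(failure_count)]
    command.append(prompt)
    return command
```

The returned list can be passed to `subprocess.run` as is, which sidesteps shell quoting of the prompt.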

### 2. Context Management and Token Firewalling

- **Context Amnesia (Tiers 3 & 4):** `mma_exec.py` enforces "Context Amnesia" by executing sub-agents in a stateless manner. Each call starts with a clean slate, receiving only the strictly necessary documents and prompts.
- **Persistent Memory (Tier 2):** The Tier 2 Tech Lead does NOT use Context Amnesia during track implementation to ensure continuity of technical strategy.
- **AST Skeleton Views:** For Tier 3 implementation, `mma_exec.py` automatically generates "AST Skeleton Views" of project dependencies. This provides the worker model with the interface-level structure (function signatures, docstrings) of imported modules without the full source code, maximizing the signal-to-noise ratio in the context window.
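
To make the idea of an AST Skeleton View concrete, the sketch below renders class names, function signatures, and the first docstring line while dropping function bodies. It illustrates the concept with the standard `ast` module (Python 3.9+ for `ast.unparse`) and is not the implementation used by `mma_exec.py`; `skeleton` is a hypothetical name.

```python
import ast


def skeleton(source: str) -> str:
    """Render signatures and first docstring lines, omitting function bodies."""
    lines = []

    def add_doc(node: ast.AST, depth: int) -> None:
        doc = ast.get_docstring(node)
        if doc:
            lines.append(f"{' ' * depth}# {doc.splitlines()[0]}")

    def visit(node: ast.AST, depth: int) -> None:
        pad = " " * depth
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                lines.append(f"{pad}class {child.name}:")
                add_doc(child, depth + 1)
                visit(child, depth + 1)  # descend to list the class's methods
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                keyword = "async def" if isinstance(child, ast.AsyncFunctionDef) else "def"
                returns = f" -> {ast.unparse(child.returns)}" if child.returns else ""
                signature = ast.unparse(child.args)
                lines.append(f"{pad}{keyword} {child.name}({signature}){returns}: ...")
                add_doc(child, depth + 1)

    visit(ast.parse(source), 0)
    return "\n".join(lines)
```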

### 3. Phase Checkpoints (The Final Defense)

- The **Phase Completion Verification and Checkpointing Protocol** is the project's primary defense against token bloat.
- When a Phase is marked complete and a checkpoint commit is created, the AI Agent must actively interpret this as a **"Context Wipe"** signal. It should summarize the outcome in its git notes and move forward treating the checkpoint as absolute truth, deliberately dropping earlier conversational history.
- **MMA Phase Memory Wipe:** After completing a major Phase, use the Tier 1/2 Orchestrator's perspective to consolidate state into Git Notes and then disregard previous trial-and-error histories.

---

## Known Pitfalls (2026-06-05)

### Defer-Not-Catch Pattern for Native Crashes

`imgui-bundle` (and similar native extension libraries) expose C-level functions that can crash the Python process with a Windows access violation (`0xc0000005`) or a SIGSEGV on Linux. **These crashes are not catchable from Python** — `try/except Exception` does not intercept native access violations, only Python exceptions.

The fix is **defer-not-catch**: track a one-shot "ready" flag in instance state; return early on the first call, only invoking the C function on subsequent calls. See [../docs/guide_gui_2.md](../docs/guide_gui_2.md#workspace-profile-defer-not-catch) and [../docs/guide_testing.md](../docs/guide_testing.md#known-gotchas-2026-06-05) for the canonical examples and how to recognize these crashes.

When designing any method that calls into `imgui.*` (or similar native libs), ask: "Can this be called before ImGui is fully initialized?" If yes, add a defer-not-catch guard.

**Sentinel type contract.** When implementing a defer-not-catch guard, the early-return sentinel value must match the type contract of the downstream consumer. For `WorkspaceProfile.ini_content: str` (in this codebase), the sentinel must be `""` (str), not `b""` (bytes) — `tomli_w` rejects bytes (`TypeError: Object of type 'bytes' is not TOML serializable`), and `imgui.load_ini_settings_from_memory(ini_data: str, ...)` also expects `str`. A previous version of this fix used `b""` and silently broke the save flow via a `TypeError` raised by `tomli_w.dump`; tests passed unit-test-wise but failed in the live_gui save+load round-trip. The fix was a 1-character change (`b""` → `""`). The regression test in `tests/test_workspace_profile_serialization.py` encodes this contract.
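
The guard described above can be sketched in a few lines. The class below is an illustration, not the project's `WorkspaceProfile` code: `IniCapture` is a hypothetical name, and `native_save` stands in for whichever `imgui.*` call must not run before initialization.

```python
from typing import Callable


class IniCapture:
    """Defers a native call until the second invocation (defer-not-catch)."""

    def __init__(self, native_save: Callable[[], str]) -> None:
        self._native_save = native_save  # stand-in for a native imgui.* function
        self._ready = False  # one-shot flag kept in instance state

    def capture(self) -> str:
        if not self._ready:
            # First call: the native layer may not be initialized yet, so skip it.
            self._ready = True
            return ""  # str sentinel, matching the `ini_content: str` contract
        return self._native_save()
```

Returning `""` rather than `b""` keeps the sentinel the same type as the real value, so downstream serialization sees a `str` on every call.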

### Test Failure Bisect Anchors (Theme Track)

When debugging test failures introduced by a theming/visual change, use the following bisect anchors:

- **Pre-existing failures:** bisect to commit `7df65dff` (last commit before the multi_themes_20260604 track began). Failures that reproduce at this anchor are pre-existing and not caused by the theme changes.
- **Theme-caused failures:** bisect to commit `7ea52cbb` (the theme refactor commit). Failures that only appear after this commit but not at `7df65dff` were introduced by the theme track.

In particular, watch for:
- Tests asserting theme color usage: the theme track changed `C_LBL` etc. from `ImVec4` values to callable functions. Tests that assert with `C_LBL` (the function) need to be updated to `C_LBL()` (the call), and they need to patch `src.theme_2.imgui` so the mock's `theme.get_color()` returns the mock's `ImVec4`.
- Tests with production code that builds dicts of theme color callables (e.g. `DIR_COLORS = {"request": C_OUT}`): the dict must store the function, and the use site must call it (`d_col()` not `d_col`). Bug example: `src/gui_2.py:3705-3707` (commit `1469ecac`).

### Live_gui Test Fragility (Authoring-Side)

`live_gui` is a session-scoped fixture. All tests in a session share the same `sloppy.py` subprocess. A test that "passes when run after test X but fails in isolation" is a **fragile test, not a fragile fixture**. The fixture is session-scoped by design; the test must explicitly wait-for-ready, reset state via Hook API, and verify preconditions via `get_value`/`wait_for_event` rather than assuming a "clean" ImGui state from a prior test. See [../docs/guide_testing.md](../docs/guide_testing.md#authoring-robust-live_gui-tests-dont-assume-clean-state) for the 5-rule authoring contract with anti-pattern vs pattern code examples. Bisect failures by running the test both in the full suite and in isolation to distinguish "test needs work" from "real app bug".


### Indentation-Driven Class Method Visibility (CRITICAL)

**The bug:** A class method defined with the right intent (2-space indent) may be parsed as **nested inside the previous function** if indentation is off by even one space. The file "passes" syntactically (imports OK) but the method is **not** on the class. `hasattr(App, 'method_name')` returns `False`. Any production code that calls `app.method_name` falls through to `__getattr__`, which delegates to the Controller (which also doesn't have the method), and a cryptic `AttributeError` is raised at runtime.

**This bit the project on 2026-06-05** during a cleanup commit. `_capture_workspace_profile` was indented with 3 spaces instead of 2 (drift from re-organizing method placement). The Python parser saw the method as a nested function inside `_apply_snapshot` (the previous method). The App class had 59 methods but no `_capture_workspace_profile`. Three `live_gui` tests (`test_auto_switch_sim`, `test_workspace_profiles_restoration`, `test_undo_redo_lifecycle`) failed with a cryptic `AttributeError: 'AppController' object has no attribute '_capture_workspace_profile'` deep in the test subprocess.

**How to detect during TDD:**
- After modifying a class body, walk the AST and verify all expected methods are class-level:
  ```bash
  uv run python -c "import ast; tree = ast.parse(open('src/gui_2.py', encoding='utf-8').read()); [print(item.name) for n in ast.walk(tree) if isinstance(n, ast.ClassDef) and n.name == 'App' for item in n.body if isinstance(item, ast.FunctionDef)]"
  ```
- The skeleton via `manual-slop_py_get_skeleton` should show the method as a class member. If it's missing, it's nested.

**How to fix:** Re-indent the affected method to exactly 2-space class level. Use the file_slice tool or PyCharm-style auto-format to verify. Run the failing test to confirm.

**Prevention:** When reorganizing a class body, run the AST check above immediately after the edit. This catches the issue in <1 second vs. finding it via failing live_gui tests minutes later.
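
The AST check can also be written as a reusable function that reports which expected methods are missing from the class body. The sketch below is an illustration under assumptions: `class_level_methods`, `missing_methods`, and the `SAMPLE` source are hypothetical, and the sample only mimics the 2-space versus 3-space drift described above.

```python
import ast

# Hypothetical sample mirroring the bug: methods sit at 2 spaces, bodies at 3,
# so a "def" that drifts to 3 spaces becomes a local function of the previous method.
SAMPLE = "\n".join([
 "class App:",
 "  def _apply_snapshot(self):",
 "   self.applied = True",
 "   def _capture_workspace_profile(self):",
 "    return {}",
])

def class_level_methods(source, class_name):
 # Return the names of functions defined directly in the class body.
 tree = ast.parse(source)
 for node in ast.walk(tree):
  if isinstance(node, ast.ClassDef) and node.name == class_name:
   return [
    item.name
    for item in node.body
    if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
   ]
 return []

def missing_methods(source, class_name, expected):
 # Expected methods that are absent from the class body (for example, accidentally nested).
 found = set(class_level_methods(source, class_name))
 return sorted(set(expected) - found)

print(missing_methods(SAMPLE, "App", ["_apply_snapshot", "_capture_workspace_profile"]))
```

The sample parses without error, which is exactly why the bug is easy to miss: the nested method only shows up as missing when the class body is inspected.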

---

## Planning Session Workflow

Some sessions are *planning-only* — the agent produces `spec.md` + `metadata.json` + `state.toml` + `plan.md` for a new track. NO code is written. The flow:

1. **Explore** the project context. Use the `brainstorming` skill for the structured process (explore → clarify → propose → spec → review → plan).
2. **Ask clarifying questions** (one at a time; multiple choice preferred) to nail down the design. The "what are you trying to achieve + what are the constraints" questions come first; the "what is the scope" question comes after.
3. **Propose 2-3 approaches** with tradeoffs. Lead with the recommended one and explain why.
4. **Write the spec** following the established template (Overview / Goals / Non-Goals / Architecture / Per-File Design / Migration / Risks / Out of Scope / See Also). The spec is the agent's *design intent* — it explains WHY, not just WHAT.
5. **User reviews the spec**. Revise until approved. **The spec MUST be approved before the plan is written.** A plan for an unapproved spec is wasted effort.
6. **Write the plan** following the `writing-plans` skill (2-5 minute steps; full code; TDD). The plan is the agent's *executable plan* — it shows exactly what code to write, one step at a time.
7. **User reviews the plan**. Revise until approved.
8. **Commit spec + plan** in separate commits (per-track: spec commit + plan commit; both with git notes summarizing the work). User invokes implementation in a different session.

**The plan is the only artifact the implementing agent reads.** Specs are reference; plans are executable. Both are committed.

**The agent (planning role) does not execute.** If a "while you're at it, can you also..." request arrives mid-session, redirect to a follow-up track; do NOT bundle unrelated work.

**For the agent's own reference:** the `brainstorming` skill is the source of truth for steps 1-5. The `writing-plans` skill is the source of truth for step 6.

---

## Track Dependencies and Execution Order

Tracks can depend on other tracks. The `blocked_by` field in each track's `metadata.json` lists the track IDs that must ship first. In `state.toml`, the same dependencies appear as a `[blocked_by]` table that maps each blocking track ID to its status (`track_id = "merged"`, `"planned"`, etc.).

Before starting implementation of a track:

1. **Verify all tracks in `blocked_by` are SHIPPED.** Check `conductor/tracks.md` for status (`[x]` = done), or read each blocked_by track's `state.toml` to confirm `current_phase` equals the last phase and the track's notes indicate completion.
2. **If any blocker is NOT shipped:** report to the Tier 2 Tech Lead. Do not proceed.
3. **If the post-state baseline assumptions in the spec (usually a §10 "Coordination with Pending Tracks" section) are not met:** STOP. The implementer must verify the baseline BEFORE starting Phase 1 of the track. The verification commands are in the spec.

The recommended execution order is the topological sort of the `blocked_by` graph. This is usually recorded in the most recent `docs/reports/PLANNING_DIGEST_*.md` (under "Recommended Execution Order" or "Dependency Picture").
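
A topological sort of the `blocked_by` graph can be computed with the standard library's `graphlib`. This is a minimal sketch with hypothetical track IDs; `execution_order` and `unshipped_blockers` are illustrative helpers, not existing project functions, and treating `"merged"` as the only shipped status is an assumption taken from the template's comment.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical track IDs; each key maps to the set of tracks it is blocked by.
BLOCKED_BY = {
 "track_c": {"track_a", "track_b"},
 "track_b": {"track_a"},
 "track_a": set(),
}

def execution_order(blocked_by):
 # TopologicalSorter expects node -> predecessors, which is the shape of blocked_by.
 try:
  return list(TopologicalSorter(blocked_by).static_order())
 except CycleError as exc:
  raise ValueError(f"blocked_by graph has a cycle: {exc.args[1]}") from exc

def unshipped_blockers(statuses):
 # statuses maps track_id -> status string, as in a [blocked_by] table.
 return sorted(track for track, status in statuses.items() if status != "merged")

print(execution_order(BLOCKED_BY))
```

A cycle in the graph means two tracks block each other, which is a planning error worth reporting rather than working around.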

---

## State.toml Template

Every track's `conductor/tracks/<track_id>/state.toml` should follow this structure (used as the agent's "where am I in this track" source of truth):

```toml
# Track state for <track_id>
# Updated by Tier 2 Tech Lead as tasks complete

[meta]
track_id = "<track_id>"
name = "<Human-Readable Name>"
status = "active"  # active | completed
current_phase = 0  # 0 = pre-Phase 1; 1..N = in Phase N; "complete" if all phases done
last_updated = "<YYYY-MM-DD>"

[blocked_by]
# Optional. List of track_id = "merged" | "planned" | etc.
# When the implementation agent starts Phase 1, verify all listed tracks are merged.
other_track_id = "merged"

[blocks]
# Optional. Tracks that depend on this one (populated from the spec's §12.1 "Follow-up Track" section).
followup_track_id = "planned in <this_track_id>"

[phases]
# One entry per phase. Update checkpointsha when the phase checkpoint commit is made.
phase_1 = { status = "pending", checkpointsha = "", name = "<Phase Name>" }
phase_2 = { status = "pending", checkpointsha = "", name = "<Phase Name>" }
# ...

[tasks]
# Tasks within phases. Structure: t<phase>_<n> = { status, commit_sha, description }
# status: "pending" | "in_progress" | "completed" | "cancelled"
# The implementing agent marks "in_progress" when starting and "completed" with commit_sha when done.
t1_1 = { status = "pending", commit_sha = "", description = "<task description>" }
# ...

[verification]
# Filled as phases complete. The metadata.json's verification_criteria is the source of truth.
phase_<n>_<thing>_complete = false

[<track_specific_section>]
# Optional. Track-specific progress tracking (e.g., audit_count_progression, refactor_stats).
# Add whatever is useful for THIS track.

[public_api_migration_followup]
# Optional. If the spec plans a follow-up, list it here so future planners can find it.
```

The `current_phase` field is the single source of truth for "where is this track." When the implementing agent advances, they update it.

---

## Per-Task Decision Protocol

When the implementing agent encounters a decision not covered by the plan:

1. **If the decision is purely cosmetic** (e.g., variable naming, comment placement, exact spacing): pick the option that matches the surrounding code style. Document the choice in the commit message.
2. **If the decision affects the architecture** (e.g., the spec's data model doesn't fit the code; the plan's approach doesn't compile; an external library doesn't behave as expected): **STOP. Do not commit. Report to the Tier 2 Tech Lead.** The lead will either:
   - Update the spec to match the new constraint
   - Add a clarifying task to the plan
   - Defer the work to a follow-up track
3. **If the decision is a regression** (e.g., the plan's code works but introduces a known bug, or fails a test the plan didn't anticipate): **STOP and report.** Don't ship a known regression to save time. The lead will decide whether to fix forward or roll back.

**The principle: small decisions, decide yourself. Large decisions, escalate.** The boundary is "does this decision require a new spec or plan update?"

**Documentation:** if a decision was made that the spec or plan should reflect (even if it was a small decision), add a brief note in the commit message. The next agent (after compaction) reads commit messages to recover context.

---

## Skip-Marker Policy: Documentation, Not Avoidance

`@pytest.mark.skip(reason=...)` is **documentation of a known failure**, not a way to avoid fixing the underlying bug. Skip markers are useful for:

- **Opt-in integration tests** that require external resources (a real API key, a live provider, a specific env var). Use `@pytest.mark.skipif(...)` with an env-var gate so the test runs when the resource is available and skips by default.
- **Tests for features that don't exist yet** (planned but not implemented).
- **Tests for features behind a feature flag** that's currently off.

Skip markers are NOT useful for:

- **Pre-existing failing tests** (a test that "used to pass" or "was supposed to pass but the underlying code regressed"). The underlying code/test should be fixed in-session.
- **Tests that the agent doesn't understand** ("I don't know how to fix this, so I'll skip it"). Escalate to a Tier 4 QA agent for analysis, or ask the user.
- **Tests with racy assertions that the agent doesn't want to debug** (e.g., a `time.sleep(0.5)` would fix it). Fix the race, don't skip.

**When you add a skip marker, you MUST also:**
1. Document the underlying issue in the `reason=` string (one or two sentences).
2. State what the fix would be (file:line or a one-line description).
3. Commit the skip with a follow-up note in the commit body that records the underlying issue, so the next agent (or future self after compaction) can find it via `git log --oneline --grep "skip"`.
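
The two acceptable shapes can be sketched as follows: an env-var-gated `skipif` for an opt-in integration test, and a plain `skip` whose `reason=` records the issue and the fix. The env var name `MANUAL_SLOP_LIVE_API_KEY` and both test names are hypothetical, and the angle-bracket text in the second reason is a placeholder to fill in, not real content.

```python
import os

import pytest

# Hypothetical env var name; substitute whatever gate the real test needs.
LIVE_KEY_VAR = "MANUAL_SLOP_LIVE_API_KEY"

requires_live_provider = pytest.mark.skipif(
 not os.environ.get(LIVE_KEY_VAR),
 reason=f"Opt-in integration test: set {LIVE_KEY_VAR} to run against a live provider.",
)

@requires_live_provider
def test_live_provider_roundtrip():
 # Placeholder body; a real test would call the provider here.
 assert os.environ.get(LIVE_KEY_VAR)

@pytest.mark.skip(
 reason="Issue: <one or two sentences on the underlying bug>. Fix: <file:line or one-line description>."
)
def test_known_failure_documented():
 # Placeholder body for a test that is skipped with a documented reason.
 raise AssertionError("not reached while the skip marker is in place")
```

Defining the gate once as a named marker keeps the condition and its reason in one place when several tests need the same resource.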

**When the underlying issue is fixable in-session, FIX IT INSTEAD of adding a skip marker.** Limited context is not an excuse: the agent may not know whether the fix is "important" or "easy" until it tries. A skip marker that never gets revisited is silent test-suite rot.

**Review checklist before adding a skip marker:**
- [ ] Is this a known-bad infrastructure issue (env-var gated)? Use `@pytest.mark.skipif` instead.
- [ ] Is this a feature not yet implemented? If so, track the feature itself as a TODO; the skip marker must not be the only record of it.
- [ ] Can the test be fixed in < 30 minutes of investigation? If yes, fix it.
- [ ] If the fix is too large, is the underlying issue tracked elsewhere (a conductor track, a TODO in the code)?

Reference: AGENTS.md "Critical Anti-Patterns" section "Use skip markers as excuse to AVOID" (added 2026-06-07).

---

## Documentation Refresh Protocol

Architectural refactor tracks often change the *shape* of modules the existing docs describe. After a track ships, the affected guides may be partly out of date.

**After each track ships, the implementing agent must:**

1. **Identify affected guides.** Run `grep -l "<renamed_or_moved_thing>" docs/guide_*.md` to find guides that reference renamed/moved symbols. Also check `docs/Readme.md` for the table of guides.
2. **For each affected guide, update it to reflect the new module structure.** If the spec's §3 or §4 lists the new file structure, mirror that in the guide.
3. **If the track introduced a NEW module**, add a new guide (or a new section to an existing guide). Per the project's `docs/Readme.md` structure, deep-dive guides are per-source-file (e.g., `guide_ai_client.md`, `guide_mcp_client.md`).
4. **If the track introduced a NEW convention** (e.g., the `Result[T]` pattern, the `TypeAlias` convention, the sub-MCP architecture), add a styleguide in `conductor/code_styleguides/<convention_name>.md`. Update `conductor/product-guidelines.md` to reference it.
5. **Commit the doc updates** as part of the track's final phase (or as a follow-up track if the scope is too large).
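
Step 1 can also be done from Python when the search needs to be scripted. This is a rough sketch: `guides_mentioning` is a hypothetical helper, and the demo runs against a temporary directory with made-up guide contents rather than the real `docs/` tree.

```python
import tempfile
from pathlib import Path

def guides_mentioning(symbol, docs_dir="docs"):
 # Return the sorted guide filenames under docs_dir whose text contains symbol.
 hits = []
 for guide in sorted(Path(docs_dir).glob("guide_*.md")):
  if symbol in guide.read_text(encoding="utf-8"):
   hits.append(guide.name)
 return hits

# Demo with throwaway files; "OldName" stands in for a renamed or moved symbol.
with tempfile.TemporaryDirectory() as tmp:
 Path(tmp, "guide_ai_client.md").write_text("Uses OldName here.\n", encoding="utf-8")
 Path(tmp, "guide_mcp_client.md").write_text("No reference.\n", encoding="utf-8")
 stale = guides_mentioning("OldName", tmp)

print(stale)
```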

**The "post-tracks documentation" pattern is repeatable.** A track that only updates code (not docs) is incomplete. The latest `docs/reports/PLANNING_DIGEST_*.md` (under "Recommended Future Tracks") often lists the documentation refresh as the next track.

**Test for staleness:** before marking a track complete, run `git log --oneline -10 -- conductor/tracks/<track_id>/` to confirm the docs were touched in the same window as the code. If only code was committed, the track is incomplete.

---

## Audit Script Policy

Whenever a track introduces a new convention that can be statically checked, add an audit script in `scripts/`. The audit + CI gate pair is the convention-enforcement mechanism for this project. Conventions without audits will drift; audits without CI integration will be ignored.

**Script conventions:**
- Filename: `audit_<thing>.py` or `check_<thing>.py` (matching the three existing scripts)
- Must have a `--help` that explains what it checks and how to fix violations
- Should support a `--json` mode for CI integration (machine-readable output)
- Should have a default informational mode (exits 0; prints human-readable report) AND a strict mode (exits 1 on regression; used as CI gate)
- Should be runnable from the repo root
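
The conventions above can be sketched as a small skeleton. The rule it checks (no tab-indented lines) is a hypothetical placeholder, as are `find_violations` and the `--strict` flag name. The sketch also simplifies "exits 1 on regression" to "exits 1 on any violation", since tracking a baseline is outside its scope.

```python
import argparse
import json
import tempfile
from pathlib import Path

def find_violations(paths):
 # Hypothetical rule for illustration: flag any line that starts with a tab.
 violations = []
 for path in paths:
  lines = Path(path).read_text(encoding="utf-8").splitlines()
  for lineno, line in enumerate(lines, start=1):
   if line.startswith("\t"):
    violations.append({"file": str(path), "line": lineno, "rule": "no-tab-indent"})
 return violations

def main(argv=None):
 parser = argparse.ArgumentParser(
  description="Report lines indented with tabs. Fix: re-indent the reported lines with spaces."
 )
 parser.add_argument("paths", nargs="*", help="files to audit")
 parser.add_argument("--json", action="store_true", help="print a machine-readable report")
 parser.add_argument("--strict", action="store_true", help="exit 1 when violations exist (CI gate)")
 args = parser.parse_args(argv)
 violations = find_violations(args.paths)
 if args.json:
  print(json.dumps({"violations": violations}))
 else:
  for v in violations:
   print(f"{v['file']}:{v['line']}: {v['rule']}")
  print(f"{len(violations)} violation(s)")
 # Informational mode always exits 0; strict mode fails when anything is reported.
 return 1 if args.strict and violations else 0

# Demo against a throwaway file so the sketch runs without touching the repo.
with tempfile.TemporaryDirectory() as tmp:
 sample = Path(tmp, "sample.py")
 sample.write_text("def f():\n\treturn 1\n", encoding="utf-8")
 info_code = main([str(sample)])
 strict_code = main([str(sample), "--strict", "--json"])
```

A real script would drop the demo and end with `sys.exit(main())` under an `if __name__ == "__main__":` guard so the return value becomes the process exit code.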

**Existing audit scripts as precedent:**
- `scripts/audit_main_thread_imports.py` — enforces the main-thread-purity invariant from the `startup_speedup_20260606` track
- `scripts/audit_weak_types.py` — enforces the type-alias convention from the `data_structure_strengthening_20260606` track
- `scripts/check_test_toml_paths.py` — enforces no real-TOML references in tests (predates the audit-script-policy, but follows the pattern)

**CI integration:** when a new audit script is added, it should be added to whatever CI workflow exists (or a follow-up track should add the CI workflow if one doesn't exist). The strict mode of the audit is the gate.

**The audit-script + styleguide pair:** every audit script's documented "what it checks" should map to a section in a `conductor/code_styleguides/` file. The styleguide says "this is the rule"; the audit says "your code violates this rule." The pair is complete when both exist.