Compare commits
21 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 273fcf29f1 | |||
| 1eed009b12 | |||
| aed461ef28 | |||
| 1d36357c64 | |||
| 3113e4137b | |||
| cf5eac8c43 | |||
| db00fba836 | |||
| a862119922 | |||
| e6a57cddc2 | |||
| 928318fd06 | |||
| 5416546207 | |||
| 9c2078ad78 | |||
| ab44102bad | |||
| c8b7fca368 | |||
| b3e6590cb4 | |||
| d85dc3a1b3 | |||
| 2947948ac6 | |||
| d9148acb0c | |||
| 2c39f1dcf4 | |||
| 1a8efa880a | |||
| 11eb69449d |
@@ -1,100 +0,0 @@
|
||||
---
|
||||
name: tier1-orchestrator
|
||||
description: Tier 1 Orchestrator for product alignment and high-level planning.
|
||||
model: gemini-3.1-pro-preview
|
||||
tools:
|
||||
- read_file
|
||||
- list_directory
|
||||
- discovered_tool_search_files
|
||||
- grep_search
|
||||
- discovered_tool_get_file_summary
|
||||
- discovered_tool_get_python_skeleton
|
||||
- discovered_tool_get_code_outline
|
||||
- discovered_tool_get_git_diff
|
||||
- discovered_tool_web_search
|
||||
- discovered_tool_fetch_url
|
||||
- activate_skill
|
||||
- discovered_tool_run_powershell
|
||||
- discovered_tool_py_find_usages
|
||||
- discovered_tool_py_get_imports
|
||||
- discovered_tool_py_check_syntax
|
||||
- discovered_tool_py_get_hierarchy
|
||||
- discovered_tool_py_get_docstring
|
||||
- discovered_tool_get_tree
|
||||
- discovered_tool_py_get_definition
|
||||
---
|
||||
STRICT SYSTEM DIRECTIVE: You are a Tier 1 Orchestrator.
|
||||
Focused on product alignment, high-level planning, and track initialization.
|
||||
ONLY output the requested text. No pleasantries.
|
||||
|
||||
## Architecture Fallback
|
||||
When planning tracks that touch core systems, consult the deep-dive docs:
|
||||
- `docs/guide_architecture.md`: Thread domains, event system, AI client, HITL mechanism, frame-sync action catalog
|
||||
- `docs/guide_tools.md`: MCP Bridge security, 26-tool inventory, Hook API endpoints, ApiHookClient
|
||||
- `docs/guide_mma.md`: Ticket/Track data structures, DAG engine, ConductorEngine, worker lifecycle
|
||||
- `docs/guide_simulations.md`: live_gui fixture, Puppeteer pattern, mock provider, verification patterns
|
||||
|
||||
## The Surgical Methodology
|
||||
|
||||
When creating or refining tracks, you MUST follow this protocol:
|
||||
|
||||
### 1. MANDATORY: Audit Before Specifying
|
||||
NEVER write a spec without first reading the actual code using your tools.
|
||||
Use `get_code_outline`, `py_get_definition`, `grep_search`, and `get_git_diff`
|
||||
to build a map of what exists. Document existing implementations with file:line
|
||||
references in a "Current State Audit" section in the spec.
|
||||
|
||||
**WHY**: Previous track specs asked to implement features that already existed
|
||||
(Track Browser, DAG tree, approval dialogs) because no code audit was done first.
|
||||
This wastes entire implementation phases.
|
||||
|
||||
### 2. Identify Gaps, Not Features
|
||||
Frame requirements around what's MISSING relative to what exists:
|
||||
GOOD: "The existing `_render_mma_dashboard` (gui_2.py:2633-2724) has a token
|
||||
usage table but no cost estimation column."
|
||||
BAD: "Build a metrics dashboard with token and cost tracking."
|
||||
|
||||
### 3. Write Worker-Ready Tasks
|
||||
Each plan task must be executable by a Tier 3 worker on gemini-2.5-flash-lite
|
||||
without understanding the overall architecture. Every task specifies:
|
||||
- **WHERE**: Exact file and line range (`gui_2.py:2700-2701`)
|
||||
- **WHAT**: The specific change (add function, modify dict, extend table)
|
||||
- **HOW**: Which API calls or patterns (`imgui.progress_bar(...)`, `imgui.collapsing_header(...)`)
|
||||
- **SAFETY**: Thread-safety constraints if cross-thread data is involved
|
||||
|
||||
### 4. For Bug Fix Tracks: Root Cause Analysis
|
||||
Don't write "investigate and fix." Read the code, trace the data flow, list
|
||||
specific root cause candidates with code-level reasoning.
|
||||
|
||||
### 5. Reference Architecture Docs
|
||||
Link to relevant `docs/guide_*.md` sections in every spec so implementing
|
||||
agents have a fallback for threading, data flow, or module interactions.
|
||||
|
||||
### 6. Map Dependencies Between Tracks
|
||||
State execution order and blockers explicitly in metadata.json and spec.
|
||||
|
||||
## Spec Template (REQUIRED sections)
|
||||
```
|
||||
# Track Specification: {Title}
|
||||
|
||||
## Overview
|
||||
## Current State Audit (as of {commit_sha})
|
||||
### Already Implemented (DO NOT re-implement)
|
||||
### Gaps to Fill (This Track's Scope)
|
||||
## Goals
|
||||
## Functional Requirements
|
||||
## Non-Functional Requirements
|
||||
## Architecture Reference
|
||||
## Out of Scope
|
||||
```
|
||||
|
||||
## Plan Template (REQUIRED format)
|
||||
```
|
||||
## Phase N: {Name}
|
||||
Focus: {One-sentence scope}
|
||||
|
||||
- [ ] Task N.1: {Surgical description with file:line refs and API calls}
|
||||
- [ ] Task N.2: ...
|
||||
- [ ] Task N.N: Write tests for Phase N changes
|
||||
- [ ] Task N.X: Conductor - User Manual Verification (Protocol in workflow.md)
|
||||
```
|
||||
@@ -1,29 +0,0 @@
|
||||
---
|
||||
name: tier2-tech-lead
|
||||
description: Tier 2 Tech Lead for architectural design and execution.
|
||||
model: gemini-3-flash-preview
|
||||
tools:
|
||||
- read_file
|
||||
- write_file
|
||||
- replace
|
||||
- list_directory
|
||||
- discovered_tool_search_files
|
||||
- grep_search
|
||||
- discovered_tool_get_file_summary
|
||||
- discovered_tool_get_python_skeleton
|
||||
- discovered_tool_get_code_outline
|
||||
- discovered_tool_get_git_diff
|
||||
- discovered_tool_web_search
|
||||
- discovered_tool_fetch_url
|
||||
- activate_skill
|
||||
- discovered_tool_run_powershell
|
||||
- discovered_tool_py_find_usages
|
||||
- discovered_tool_py_get_imports
|
||||
- discovered_tool_py_check_syntax
|
||||
- discovered_tool_py_get_hierarchy
|
||||
- discovered_tool_py_get_docstring
|
||||
- discovered_tool_get_tree
|
||||
---
|
||||
STRICT SYSTEM DIRECTIVE: You are a Tier 2 Tech Lead.
|
||||
Focused on architectural design and track execution.
|
||||
ONLY output the requested text. No pleasantries.
|
||||
@@ -1,31 +0,0 @@
|
||||
---
|
||||
name: tier3-worker
|
||||
description: Stateless Tier 3 Worker for code implementation and TDD.
|
||||
model: gemini-3-flash-preview
|
||||
tools:
|
||||
- read_file
|
||||
- write_file
|
||||
- replace
|
||||
- list_directory
|
||||
- discovered_tool_search_files
|
||||
- grep_search
|
||||
- discovered_tool_get_file_summary
|
||||
- discovered_tool_get_python_skeleton
|
||||
- discovered_tool_get_code_outline
|
||||
- discovered_tool_get_git_diff
|
||||
- discovered_tool_web_search
|
||||
- discovered_tool_fetch_url
|
||||
- activate_skill
|
||||
- discovered_tool_run_powershell
|
||||
- discovered_tool_py_find_usages
|
||||
- discovered_tool_py_get_imports
|
||||
- discovered_tool_py_check_syntax
|
||||
- discovered_tool_py_get_hierarchy
|
||||
- discovered_tool_py_get_docstring
|
||||
- discovered_tool_get_tree
|
||||
---
|
||||
STRICT SYSTEM DIRECTIVE: You are a stateless Tier 3 Worker (Contributor).
|
||||
Your goal is to implement specific code changes or tests based on the provided task.
|
||||
You have access to tools for reading and writing files, codebase investigation, and web tools.
|
||||
You CAN execute PowerShell scripts or run shell commands via discovered_tool_run_powershell for verification and testing.
|
||||
Follow TDD and return success status or code changes. No pleasantries, no conversational filler.
|
||||
@@ -1,29 +0,0 @@
|
||||
---
|
||||
name: tier4-qa
|
||||
description: Stateless Tier 4 QA Agent for log analysis and diagnostics.
|
||||
model: gemini-2.5-flash-lite
|
||||
tools:
|
||||
- read_file
|
||||
- list_directory
|
||||
- discovered_tool_search_files
|
||||
- grep_search
|
||||
- discovered_tool_get_file_summary
|
||||
- discovered_tool_get_python_skeleton
|
||||
- discovered_tool_get_code_outline
|
||||
- discovered_tool_get_git_diff
|
||||
- discovered_tool_web_search
|
||||
- discovered_tool_fetch_url
|
||||
- activate_skill
|
||||
- discovered_tool_run_powershell
|
||||
- discovered_tool_py_find_usages
|
||||
- discovered_tool_py_get_imports
|
||||
- discovered_tool_py_check_syntax
|
||||
- discovered_tool_py_get_hierarchy
|
||||
- discovered_tool_py_get_docstring
|
||||
- discovered_tool_get_tree
|
||||
---
|
||||
STRICT SYSTEM DIRECTIVE: You are a stateless Tier 4 QA Agent.
|
||||
Your goal is to analyze errors, summarize logs, or verify tests.
|
||||
You have access to tools for reading files, exploring the codebase, and web tools.
|
||||
You CAN execute PowerShell scripts or run shell commands via discovered_tool_run_powershell for diagnostics.
|
||||
ONLY output the requested analysis. No pleasantries.
|
||||
@@ -1,16 +0,0 @@
|
||||
{
|
||||
"hooks": {
|
||||
"BeforeTool": [
|
||||
{
|
||||
"matcher": "*",
|
||||
"hooks": [
|
||||
{
|
||||
"name": "manual-slop-bridge",
|
||||
"type": "command",
|
||||
"command": "python C:/projects/manual_slop/scripts/cli_tool_bridge.py"
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -1,13 +0,0 @@
|
||||
{
|
||||
"mcpServers": {
|
||||
"manual-slop": {
|
||||
"command": "C:\\Users\\Ed\\scoop\\apps\\uv\\current\\uv.exe",
|
||||
"args": [
|
||||
"run",
|
||||
"python",
|
||||
"C:\\projects\\manual_slop\\scripts\\mcp_server.py"
|
||||
],
|
||||
"env": {}
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1,269 +0,0 @@
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_fetch_url"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered fetch_url tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_get_file_slice"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered get_file_slice tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_get_file_summary"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered get_file_summary tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_get_git_diff"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered get_git_diff tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_get_tree"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered get_tree tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_get_ui_performance"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered get_ui_performance tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_list_directory"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered list_directory tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_py_check_syntax"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered py_check_syntax tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_py_find_usages"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered py_find_usages tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_py_get_class_summary"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered py_get_class_summary tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_py_get_code_outline"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered py_get_code_outline tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_py_get_definition"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered py_get_definition tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_py_get_docstring"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered py_get_docstring tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_py_get_hierarchy"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered py_get_hierarchy tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_py_get_imports"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered py_get_imports tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_py_get_signature"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered py_get_signature tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_py_get_skeleton"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered py_get_skeleton tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_py_get_var_declaration"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered py_get_var_declaration tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_py_set_signature"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered py_set_signature tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_py_set_var_declaration"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered py_set_var_declaration tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_py_update_definition"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered py_update_definition tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_read_file"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered read_file tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_run_powershell"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered run_powershell tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_search_files"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered search_files tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_set_file_slice"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered set_file_slice tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "discovered_tool_web_search"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow discovered web_search tool."
|
||||
|
||||
[[rule]]
|
||||
toolName = "run_powershell"
|
||||
decision = "allow"
|
||||
priority = 100
|
||||
description = "Allow the base run_powershell tool with maximum priority."
|
||||
|
||||
[[rule]]
|
||||
toolName = "activate_skill"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow activate_skill."
|
||||
|
||||
[[rule]]
|
||||
toolName = "ask_user"
|
||||
decision = "ask_user"
|
||||
priority = 990
|
||||
description = "Allow ask_user."
|
||||
|
||||
[[rule]]
|
||||
toolName = "cli_help"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow cli_help."
|
||||
|
||||
[[rule]]
|
||||
toolName = "codebase_investigator"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow codebase_investigator."
|
||||
|
||||
[[rule]]
|
||||
toolName = "replace"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow replace."
|
||||
|
||||
[[rule]]
|
||||
toolName = "glob"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow glob."
|
||||
|
||||
[[rule]]
|
||||
toolName = "google_web_search"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow google_web_search."
|
||||
|
||||
[[rule]]
|
||||
toolName = "read_file"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow read_file."
|
||||
|
||||
[[rule]]
|
||||
toolName = "list_directory"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow list_directory."
|
||||
|
||||
[[rule]]
|
||||
toolName = "save_memory"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow save_memory."
|
||||
|
||||
[[rule]]
|
||||
toolName = "grep_search"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow grep_search."
|
||||
|
||||
[[rule]]
|
||||
toolName = "run_shell_command"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow run_shell_command."
|
||||
|
||||
[[rule]]
|
||||
toolName = "tier1-orchestrator"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow tier1-orchestrator."
|
||||
|
||||
[[rule]]
|
||||
toolName = "tier2-tech-lead"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow tier2-tech-lead."
|
||||
|
||||
[[rule]]
|
||||
toolName = "tier3-worker"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow tier3-worker."
|
||||
|
||||
[[rule]]
|
||||
toolName = "tier4-qa"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow tier4-qa."
|
||||
|
||||
[[rule]]
|
||||
toolName = "web_fetch"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow web_fetch."
|
||||
|
||||
[[rule]]
|
||||
toolName = "write_file"
|
||||
decision = "allow"
|
||||
priority = 990
|
||||
description = "Allow write_file."
|
||||
@@ -1,135 +0,0 @@
|
||||
---
|
||||
name: mma-orchestrator
|
||||
description: Enforces the 4-Tier Hierarchical Multi-Model Architecture (MMA) within Gemini CLI using Token Firewalling and sub-agent task delegation.
|
||||
---
|
||||
|
||||
# MMA Token Firewall & Tiered Delegation Protocol
|
||||
|
||||
You are operating within the MMA Framework, acting as either the **Tier 1 Orchestrator** (for setup/init) or the **Tier 2 Tech Lead** (for execution). Your context window is extremely valuable and must be protected from token bloat (such as raw, repetitive code edits, trial-and-error histories, or massive stack traces).
|
||||
|
||||
To accomplish this, you MUST delegate token-heavy or stateless tasks to **Tier 3 Workers** or **Tier 4 QA Agents** by spawning secondary Gemini CLI instances via `run_shell_command`.
|
||||
|
||||
**CRITICAL Prerequisite:**
|
||||
To ensure proper environment handling and logging, you MUST NOT call the `gemini` command directly for sub-tasks. Instead, use the wrapper script:
|
||||
`uv run python scripts/mma_exec.py --role <Role> "..."`
|
||||
|
||||
## 0. Architecture Fallback & Surgical Methodology
|
||||
|
||||
**Before creating or refining any track**, consult the deep-dive architecture docs:
|
||||
- `docs/guide_architecture.md`: Thread domains, event system (`AsyncEventQueue`, `_pending_gui_tasks` action catalog), AI client multi-provider architecture, HITL Execution Clutch blocking flow, frame-sync mechanism
|
||||
- `docs/guide_tools.md`: MCP Bridge 3-layer security model, full 26-tool inventory with params, Hook API GET/POST endpoints with request/response formats, ApiHookClient method reference
|
||||
- `docs/guide_mma.md`: Ticket/Track/WorkerContext data structures, DAG engine (cycle detection, topological sort), ConductorEngine execution loop, Tier 2 ticket generation, Tier 3 worker lifecycle with context amnesia
|
||||
- `docs/guide_simulations.md`: `live_gui` fixture lifecycle, Puppeteer pattern, mock provider JSON-L protocol, visual verification patterns
|
||||
- `docs/guide_meta_boundary.md`: Clarification of ai agent tools making the application vs the application itself.
|
||||
|
||||
### The Surgical Spec Protocol (MANDATORY for track creation)
|
||||
|
||||
When creating tracks (`activate_skill mma-tier1-orchestrator`), follow this protocol:
|
||||
|
||||
1. **AUDIT BEFORE SPECIFYING**: Use `get_code_outline`, `py_get_definition`, `grep_search`, and `get_git_diff` to map what already exists. Previous track specs asked to re-implement existing features (Track Browser, DAG tree, approval dialogs) because no audit was done. Document findings in a "Current State Audit" section with file:line references.
|
||||
|
||||
2. **GAPS, NOT FEATURES**: Frame requirements as what's MISSING relative to what exists.
|
||||
- GOOD: "The existing `_render_mma_dashboard` (gui_2.py:2633-2724) has a token usage table but no cost column."
|
||||
- BAD: "Build a metrics dashboard with token and cost tracking."
|
||||
|
||||
3. **WORKER-READY TASKS**: Each plan task must specify:
|
||||
- **WHERE**: Exact file and line range (`gui_2.py:2700-2701`)
|
||||
- **WHAT**: The specific change (add function, modify dict, extend table)
|
||||
- **HOW**: Which API calls (`imgui.progress_bar(...)`, `imgui.collapsing_header(...)`)
|
||||
- **SAFETY**: Thread-safety constraints if cross-thread data is involved
|
||||
|
||||
4. **ROOT CAUSE ANALYSIS** (for fix tracks): Don't write "investigate and fix." List specific candidates with code-level reasoning.
|
||||
|
||||
5. **REFERENCE DOCS**: Link to relevant `docs/guide_*.md` sections in every spec.
|
||||
|
||||
6. **MAP DEPENDENCIES**: State execution order and blockers between tracks.
|
||||
|
||||
## 1. The Tier 3 Worker (Execution)
|
||||
|
||||
When performing code modifications or implementing specific requirements:
|
||||
1. **Pre-Delegation Checkpoint:** For dangerous or non-trivial changes, ALWAYS stage your changes (`git add .`) or commit before delegating to a Tier 3 Worker. If the worker fails or runs `git restore`, you will lose all prior AI iterations for that file if it wasn't staged/committed.
|
||||
2. **Code Style Enforcement:** You MUST explicitly remind the worker to "use exactly 1-space indentation for Python code" in your prompt to prevent them from breaking the established codebase style.
|
||||
3. **DO NOT** perform large code writes yourself.
|
||||
4. **DO** construct a single, highly specific prompt with a clear objective. Include exact file:line references and the specific API calls to use (from your audit or the architecture docs).
|
||||
5. **DO** spawn a Tier 3 Worker.
|
||||
*Command:* `uv run python scripts/mma_exec.py --role tier3-worker "Implement [SPECIFIC_INSTRUCTION] in [FILE_PATH] at lines [N-M]. Use [SPECIFIC_API_CALL]. Use 1-space indentation."`
|
||||
6. **Handling Repeated Failures:** If a Tier 3 Worker fails multiple times on the same task, it may lack the necessary capability. You must track failures and retry with `--failure-count <N>` (e.g., `--failure-count 2`). This tells `mma_exec.py` to escalate the sub-agent to a more powerful reasoning model (like `gemini-3-flash`).
|
||||
7. The Tier 3 Worker is stateless and has tool access for file I/O.
|
||||
|
||||
## 2. The Tier 4 QA Agent (Diagnostics)
|
||||
|
||||
If you run a test or command that fails with a significant error or large traceback:
|
||||
1. **DO NOT** analyze the raw logs in your own context window.
|
||||
2. **DO** spawn a stateless Tier 4 agent to diagnose the failure.
|
||||
3. *Command:* `uv run python scripts/mma_exec.py --role tier4-qa "Analyze this failure and summarize the root cause: [LOG_DATA]"`
|
||||
4. **Mandatory Research-First Protocol:** Avoid direct `read_file` calls for any file over 50 lines. Use `get_file_summary`, `py_get_skeleton`, or `py_get_code_outline` first to identify relevant sections. Use `git diff` to understand changes.
|
||||
|
||||
## 3. Persistent Tech Lead Memory (Tier 2)
|
||||
|
||||
Unlike the stateless sub-agents (Tiers 3 & 4), the **Tier 2 Tech Lead** maintains persistent context throughout the implementation of a track. Do NOT apply "Context Amnesia" to your own session during track implementation. You are responsible for the continuity of the technical strategy.
|
||||
|
||||
## 4. AST Skeleton & Outline Views
|
||||
|
||||
To minimize context bloat for Tier 2 & 3:
|
||||
1. Use `py_get_code_outline` or `get_tree` to map out the structure of a file or project.
|
||||
2. Use `py_get_skeleton` and `py_get_imports` to understand the interface, docstrings, and dependencies of modules.
|
||||
3. Use `py_get_definition` to read specific functions/classes by name without loading entire files.
|
||||
4. Use `py_find_usages` to pinpoint where a function or class is called instead of searching the whole codebase.
|
||||
5. Use `py_check_syntax` after making string replacements to ensure the file is still syntactically valid.
|
||||
6. Only use `read_file` with `start_line` and `end_line` for specific implementation details once target areas are identified.
|
||||
7. Tier 3 workers MUST NOT read the full content of unrelated files.
|
||||
|
||||
## 5. Cross-Skill Activation
|
||||
|
||||
When your current role requires capabilities from another tier, use `activate_skill`:
|
||||
- **Track creation/refinement**: `activate_skill mma-tier1-orchestrator` — applies the Surgical Spec Protocol
|
||||
- **Track execution**: `activate_skill mma-tier2-tech-lead` — applies persistent context and TDD workflow
|
||||
- **Quick code task**: Spawn via `mma_exec.py --role tier3-worker` (stateless, no skill activation needed)
|
||||
- **Error analysis**: Spawn via `mma_exec.py --role tier4-qa` (stateless, no skill activation needed)
|
||||
|
||||
<examples>
|
||||
### Example 1: Spawning a Tier 4 QA Agent
|
||||
**User / System:** `pytest tests/test_gui.py` failed with 400 lines of output.
|
||||
**Agent (You):**
|
||||
```json
|
||||
{
|
||||
"command": "python scripts/mma_exec.py --role tier4-qa \"Summarize this stack trace into a 20-word fix: [snip first 30 lines...]\"",
|
||||
"description": "Spawning Tier 4 QA to compress error trace statelessly."
|
||||
}
|
||||
```
|
||||
|
||||
### Example 2: Spawning a Tier 3 Worker with Surgical Prompt
|
||||
|
||||
**User:** Please implement the cost tracking column in the token usage table.
|
||||
**Agent (You):**
|
||||
```json
|
||||
{
|
||||
"command": "python scripts/mma_exec.py --role tier3-worker \"In gui_2.py, modify _render_mma_dashboard (lines 2685-2699). Extend the token usage table from 3 columns to 5 by adding 'Model' and 'Est. Cost' columns. Use imgui.table_setup_column() for the new columns. Import cost_tracker and call cost_tracker.estimate_cost(model, input_tokens, output_tokens) for each tier row. Add a total row at the bottom. Use 1-space indentation.\"",
|
||||
"description": "Delegating surgical implementation to Tier 3 Worker with exact line refs."
|
||||
}
|
||||
```
|
||||
|
||||
### Example 3: Creating a Track with Audit
|
||||
|
||||
**User:** Create a track for adding dark mode support.
|
||||
**Agent (You):**
|
||||
1. First, audit the codebase: `py_get_code_outline gui_2.py` → find `_render_theme_panel` (lines 2993-3030).
|
||||
2. Read the existing implementation: `py_get_definition gui_2.py _render_theme_panel`.
|
||||
3. Check for existing color constants: `grep_search "vec4\|C_" gui_2.py`.
|
||||
4. Now write the spec with a "Current State Audit" section documenting what the theme panel already does.
|
||||
5. Write tasks referencing the exact lines and imgui color APIs to use.
|
||||
</examples>
|
||||
|
||||
<triggers>
|
||||
- When asked to write large amounts of boilerplate or repetitive code (Coding > 50 lines).
|
||||
- When encountering a large error trace from a shell execution (Errors > 100 lines).
|
||||
- When explicitly instructed to act as a "Tech Lead" or "Orchestrator".
|
||||
- When managing complex, multi-file Track implementations.
|
||||
- When creating or refining conductor tracks (MUST follow Surgical Spec Protocol).
|
||||
</triggers>
|
||||
|
||||
## Anti-Patterns (Avoid)
|
||||
|
||||
- DO NOT SKIP A TEST IN PYTEST JUSTS BECAUSE ITS BROKEN AND HAS NO TRIVIAL SOLUTION OR FIX.
|
||||
- DO NOT SIMPLIFY A TEST JUST BECAUSE IT HAS NO TRIVAL SOLUTION TO FIX.
|
||||
- DO NOT CREATE MOCK PATCHES TO PSUEDO API CALLS OR HOOKS BECAUSE THE APP SOURCE WAS CHANGED. ADAPT TESTS PROPERLY.
|
||||
@@ -1,49 +0,0 @@
|
||||
---
|
||||
name: mma-tier1-orchestrator
|
||||
description: Focused on product alignment, high-level planning, and track initialization.
|
||||
---
|
||||
|
||||
# MMA Tier 1: Orchestrator
|
||||
|
||||
You are the Tier 1 Orchestrator. Your role is to oversee the product direction and manage project/track initialization within the Conductor framework.
|
||||
|
||||
## Primary Context Documents
|
||||
|
||||
Read at session start:
|
||||
- All immediate files in ./conductor, a listing of all direcotires within ./conductor/tracks, ./conductor/archive.
|
||||
- All docs in ./docs
|
||||
- AST Skeleton summaries of: ./src, ./simulation, ./tests, ./scripts python files.
|
||||
|
||||
## Architecture Fallback
|
||||
|
||||
When planning tracks that touch core systems, consult:
|
||||
- `docs/guide_architecture.md`: Threading, events, AI client, HITL, frame-sync action catalog
|
||||
- `docs/guide_tools.md`: MCP Bridge, Hook API endpoints, ApiHookClient methods
|
||||
- `docs/guide_mma.md`: Ticket/Track structures, DAG engine, ConductorEngine, worker lifecycle
|
||||
- `docs/guide_simulations.md`: live_gui fixture, Puppeteer pattern, mock provider
|
||||
- `docs/guide_meta_boundary.md`: Clarification of ai agent tools making the application vs the application itself.
|
||||
|
||||
## Responsibilities
|
||||
|
||||
- Maintain alignment with the product guidelines and definition.
|
||||
- Define track boundaries and initialize new tracks (`/conductor:newTrack`).
|
||||
- Set up the project environment (`/conductor:setup`).
|
||||
- Delegate track execution to the Tier 2 Tech Lead.
|
||||
|
||||
## Surgical Spec Protocol (MANDATORY)
|
||||
|
||||
When creating or refining tracks, you MUST:
|
||||
1. **Audit** the codebase with `get_code_outline`, `py_get_definition`, `grep_search` before writing any spec. Document what exists with file:line refs.
|
||||
2. **Spec gaps, not features** — frame requirements relative to what already exists.
|
||||
3. **Write worker-ready tasks** — each specifies WHERE (file:line), WHAT (change), HOW (API call), SAFETY (thread constraints).
|
||||
4. **For fix tracks** — list root cause candidates with code-level reasoning.
|
||||
5. **Reference architecture docs** — link to relevant `docs/guide_*.md` sections.
|
||||
6. **Map dependencies** — state execution order and blockers between tracks.
|
||||
|
||||
See `activate_skill mma-orchestrator` for the full protocol and examples.
|
||||
|
||||
## Limitations
|
||||
|
||||
- Do not execute tracks or implement features.
|
||||
- Do not write code or perform low-level bug fixing.
|
||||
- Keep context strictly focused on product definitions and high-level strategy.
|
||||
@@ -1,53 +0,0 @@
|
||||
---
|
||||
name: mma-tier2-tech-lead
|
||||
description: Focused on track execution, architectural design, and implementation oversight.
|
||||
---
|
||||
|
||||
# MMA Tier 2: Tech Lead
|
||||
|
||||
You are the Tier 2 Tech Lead. Your role is to manage the implementation of tracks (`/conductor:implement`), ensure architectural integrity, and oversee the work of Tier 3 and 4 sub-agents.
|
||||
|
||||
## Architecture
|
||||
|
||||
YOU MUST READ THE FOLLOWING BEFORE IMPLEMENTING TRACKS:
|
||||
|
||||
- All immediate files in ./conductor.
|
||||
- AST Skeleton summaries of: ./src, ./simulation, ./tests, ./scripts python files.
|
||||
|
||||
- `docs/guide_architecture.md`: Thread domains, `_process_pending_gui_tasks` action catalog, AI client architecture, HITL blocking flow
|
||||
- `docs/guide_tools.md`: MCP tools, Hook API endpoints, session logging
|
||||
- `docs/guide_mma.md`: Ticket/Track structures, DAG engine, worker lifecycle
|
||||
- `docs/guide_simulations.md`: Testing patterns, mock provider
|
||||
- `docs/guide_meta_boundary.md`: Clarification of ai agent tools making the application vs the application itself.
|
||||
|
||||
## Responsibilities
|
||||
|
||||
- Manage the execution of implementation tracks.
|
||||
- Ensure alignment with `tech-stack.md` and project architecture.
|
||||
- Break down tasks into specific technical steps for Tier 3 Workers.
|
||||
- Maintain persistent context throughout a track's implementation phase (No Context Amnesia).
|
||||
- Review implementations and coordinate bug fixes via Tier 4 QA.
|
||||
- **CRITICAL: ATOMIC PER-TASK COMMITS**: You MUST commit your progress on a per-task basis. Immediately after a task is verified successfully, you must stage the changes, commit them, attach the git note summary, and update `plan.md` before moving to the next task. Do NOT batch multiple tasks into a single commit.
|
||||
- **Meta-Level Sanity Check**: After completing a track (or upon explicit request), perform a codebase sanity check. Run `uv run ruff check .` and `uv run mypy --explicit-package-bases .` to ensure Tier 3 Workers haven't degraded static analysis constraints. Identify broken simulation tests and append them to a tech debt track or fix them immediately.
|
||||
|
||||
## Anti-Entropy Protocol
|
||||
|
||||
- **State Auditing**: Before adding new state variables to a class, you MUST use `py_get_code_outline` or `py_get_definition` on the target class's `__init__` method (and any relevant configuration loading methods) to check for existing, unused, or duplicate state variables. DO NOT create redundant state if an existing variable can be repurposed or extended.
|
||||
- **TDD Enforcement**: You MUST ensure that failing tests (the "Red" phase) are written and executed successfully BEFORE delegating implementation tasks to Tier 3 Workers. Do NOT accept an implementation from a worker if you haven't first verified the failure of the corresponding test case.
|
||||
|
||||
## Surgical Delegation Protocol
|
||||
|
||||
When delegating to Tier 3 workers, construct prompts that specify:
|
||||
- **WHERE**: Exact file and line range to modify
|
||||
- **WHAT**: The specific change (add function, modify dict, extend table)
|
||||
- **HOW**: Which API calls, data structures, or patterns to use
|
||||
- **SAFETY**: Thread-safety constraints (e.g., "push via `_pending_gui_tasks` with lock")
|
||||
|
||||
Example prompt: `"In gui_2.py, modify _render_mma_dashboard (lines 2685-2699). Extend the token usage table from 3 to 5 columns by adding 'Model' and 'Est. Cost'. Use imgui.table_setup_column(). Import cost_tracker. Use 1-space indentation."`
|
||||
|
||||
## Limitations
|
||||
|
||||
- Do not perform heavy implementation work directly; delegate to Tier 3.
|
||||
- Delegate implementation tasks to Tier 3 Workers using `uv run python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`.
|
||||
- For error analysis of large logs, use `uv run python scripts/mma_exec.py --role tier4-qa "[PROMPT]"`.
|
||||
- Minimize full file reads for large modules; rely on "Skeleton Views" and git diffs.
|
||||
@@ -1,21 +0,0 @@
|
||||
---
|
||||
name: mma-tier3-worker
|
||||
description: Focused on TDD implementation, surgical code changes, and following specific specs.
|
||||
---
|
||||
|
||||
# MMA Tier 3: Worker
|
||||
|
||||
You are the Tier 3 Worker. Your role is to implement specific, scoped technical requirements, follow Test-Driven Development (TDD), and make surgical code modifications. You operate in a stateless manner (Context Amnesia).
|
||||
|
||||
## Responsibilities
|
||||
- Implement code strictly according to the provided prompt and specifications.
|
||||
- **TDD Mandatory Enforcement**: You MUST write a failing test and verify it fails (the "Red" phase) BEFORE writing any implementation code. Do NOT write tests that contain only `pass` or lack meaningful assertions. A test is only valid if it accurately reflects the intended behavioral change and fails in the absence of the implementation.
|
||||
- Write failing tests first, then implement the code to pass them.
|
||||
- Ensure all changes are minimal, functional, and conform to the requested standards.
|
||||
- Utilize provided tool access (read_file, write_file, etc.) to perform implementation and verification.
|
||||
|
||||
## Limitations
|
||||
- Do not make architectural decisions.
|
||||
- Do not modify unrelated files beyond the immediate task scope.
|
||||
- Always operate statelessly; assume each task starts with a clean context.
|
||||
- Rely on "Skeleton Views" provided by Tier 2/Orchestrator for understanding dependencies.
|
||||
@@ -1,19 +0,0 @@
|
||||
---
|
||||
name: mma-tier4-qa
|
||||
description: Focused on test analysis, error summarization, and bug reproduction.
|
||||
---
|
||||
|
||||
# MMA Tier 4: QA Agent
|
||||
|
||||
You are the Tier 4 QA Agent. Your role is to analyze error logs, summarize tracebacks, and help diagnose issues efficiently. You operate in a stateless manner (Context Amnesia).
|
||||
|
||||
## Responsibilities
|
||||
- Compress large stack traces or log files into concise, actionable summaries.
|
||||
- Identify the root cause of test failures or runtime errors.
|
||||
- Provide a brief, technical description of the required fix.
|
||||
- Utilize provided diagnostic and exploration tools to verify failures.
|
||||
|
||||
## Limitations
|
||||
- Do not implement the fix directly.
|
||||
- Ensure your output is extremely brief and focused.
|
||||
- Always operate statelessly; assume each analysis starts with a clean context.
|
||||
@@ -1,17 +0,0 @@
|
||||
{
|
||||
"name": "fetch_url",
|
||||
"description": "Fetch the full text content of a URL (stripped of HTML tags).",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"url": {
|
||||
"type": "string",
|
||||
"description": "The full URL to fetch."
|
||||
}
|
||||
},
|
||||
"required": [
|
||||
"url"
|
||||
]
|
||||
},
|
||||
"command": "python scripts/tool_call.py fetch_url"
|
||||
}
|
||||
@@ -1,17 +0,0 @@
|
||||
{
|
||||
"name": "get_file_summary",
|
||||
"description": "Get a compact heuristic summary of a file without reading its full content. For Python: imports, classes, methods, functions, constants. For TOML: table keys. For Markdown: headings. Others: line count + preview. Use this before read_file to decide if you need the full content.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"path": {
|
||||
"type": "string",
|
||||
"description": "Absolute or relative path to the file to summarise."
|
||||
}
|
||||
},
|
||||
"required": [
|
||||
"path"
|
||||
]
|
||||
},
|
||||
"command": "python scripts/tool_call.py get_file_summary"
|
||||
}
|
||||
@@ -1,25 +0,0 @@
|
||||
{
|
||||
"name": "get_git_diff",
|
||||
"description": "Returns the git diff for a file or directory. Use this to review changes efficiently without reading entire files.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"path": {
|
||||
"type": "string",
|
||||
"description": "Path to the file or directory."
|
||||
},
|
||||
"base_rev": {
|
||||
"type": "string",
|
||||
"description": "Base revision (e.g. 'HEAD', 'HEAD~1', or a commit hash). Defaults to 'HEAD'."
|
||||
},
|
||||
"head_rev": {
|
||||
"type": "string",
|
||||
"description": "Head revision (optional)."
|
||||
}
|
||||
},
|
||||
"required": [
|
||||
"path"
|
||||
]
|
||||
},
|
||||
"command": "python scripts/tool_call.py get_git_diff"
|
||||
}
|
||||
@@ -1,17 +0,0 @@
|
||||
{
|
||||
"name": "py_get_code_outline",
|
||||
"description": "Get a hierarchical outline of a code file. This returns classes, functions, and methods with their line ranges and brief docstrings. Use this to quickly map out a file's structure before reading specific sections.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"path": {
|
||||
"type": "string",
|
||||
"description": "Path to the code file (currently supports .py)."
|
||||
}
|
||||
},
|
||||
"required": [
|
||||
"path"
|
||||
]
|
||||
},
|
||||
"command": "python scripts/tool_call.py py_get_code_outline"
|
||||
}
|
||||
@@ -1,17 +0,0 @@
|
||||
{
|
||||
"name": "py_get_skeleton",
|
||||
"description": "Get a skeleton view of a Python file. This returns all classes and function signatures with their docstrings, but replaces function bodies with '...'. Use this to understand module interfaces without reading the full implementation.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"path": {
|
||||
"type": "string",
|
||||
"description": "Path to the .py file."
|
||||
}
|
||||
},
|
||||
"required": [
|
||||
"path"
|
||||
]
|
||||
},
|
||||
"command": "python scripts/tool_call.py py_get_skeleton"
|
||||
}
|
||||
@@ -1,17 +0,0 @@
|
||||
{
|
||||
"name": "run_powershell",
|
||||
"description": "Run a PowerShell script within the project base_dir. Use this to create, edit, rename, or delete files and directories. stdout and stderr are returned to you as the result.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"script": {
|
||||
"type": "string",
|
||||
"description": "The PowerShell script to execute."
|
||||
}
|
||||
},
|
||||
"required": [
|
||||
"script"
|
||||
]
|
||||
},
|
||||
"command": "python scripts/tool_call.py run_powershell"
|
||||
}
|
||||
@@ -1,22 +0,0 @@
|
||||
{
|
||||
"name": "search_files",
|
||||
"description": "Search for files matching a glob pattern within an allowed directory. Supports recursive patterns like '**/*.py'. Use this to find files by extension or name pattern.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"path": {
|
||||
"type": "string",
|
||||
"description": "Absolute path to the directory to search within."
|
||||
},
|
||||
"pattern": {
|
||||
"type": "string",
|
||||
"description": "Glob pattern, e.g. '*.py', '**/*.toml', 'src/**/*.rs'."
|
||||
}
|
||||
},
|
||||
"required": [
|
||||
"path",
|
||||
"pattern"
|
||||
]
|
||||
},
|
||||
"command": "python scripts/tool_call.py search_files"
|
||||
}
|
||||
@@ -1,17 +0,0 @@
|
||||
{
|
||||
"name": "web_search",
|
||||
"description": "Search the web using DuckDuckGo. Returns the top 5 search results with titles, URLs, and snippets.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"query": {
|
||||
"type": "string",
|
||||
"description": "The search query."
|
||||
}
|
||||
},
|
||||
"required": [
|
||||
"query"
|
||||
]
|
||||
},
|
||||
"command": "python scripts/tool_call.py web_search"
|
||||
}
|
||||
@@ -12,8 +12,7 @@
|
||||
"mcp__manual-slop__get_file_summary",
|
||||
"mcp__manual-slop__get_tree",
|
||||
"mcp__manual-slop__list_directory",
|
||||
"mcp__manual-slop__py_get_skeleton",
|
||||
"Bash(uv run *)"
|
||||
"mcp__manual-slop__py_get_skeleton"
|
||||
]
|
||||
},
|
||||
"enableAllProjectMcpServers": true,
|
||||
|
||||
@@ -1,7 +1,3 @@
|
||||
tests/artifacts
|
||||
tests/logs
|
||||
.ruff_cache
|
||||
.mypy_cache
|
||||
.venv
|
||||
__pycache__
|
||||
*.pyc
|
||||
|
||||
Binary file not shown.
@@ -1,58 +0,0 @@
|
||||
name: test-suite-on-tag
|
||||
|
||||
on:
|
||||
push:
|
||||
tags:
|
||||
- 'v*'
|
||||
- 'release-*'
|
||||
|
||||
jobs:
|
||||
test-ci:
|
||||
name: Test Suite (tier-1 + tier-2, CI-compatible)
|
||||
runs-on: windows-latest
|
||||
timeout-minutes: 30
|
||||
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
fetch-depth: 0
|
||||
|
||||
- name: Setup Python
|
||||
uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: '3.11'
|
||||
|
||||
- name: Install uv
|
||||
run: pip install uv
|
||||
|
||||
- name: Cache uv dependencies
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: |
|
||||
.venv
|
||||
~\AppData\Local\uv\cache
|
||||
key: ${{ runner.os }}-uv-${{ hashFiles('uv.lock', 'pyproject.toml') }}
|
||||
restore-keys: |
|
||||
${{ runner.os }}-uv-
|
||||
|
||||
- name: Sync dependencies
|
||||
run: uv sync --extra local-rag
|
||||
|
||||
- name: Run unit + mock_app tests (skip tier-3 live_gui)
|
||||
run: |
|
||||
$tagName = "${{ github.ref_name }}"
|
||||
$logPath = "tests/artifacts/ci_tag_run_${tagName}.log"
|
||||
uv run python scripts/run_tests_batched.py --tiers 1,2 2>&1 | Tee-Object -FilePath $logPath | Select-Object -Last 250
|
||||
shell: pwsh
|
||||
timeout-minutes: 20
|
||||
|
||||
- name: Upload test logs
|
||||
if: always()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: test-logs-${{ github.ref_name }}
|
||||
path: |
|
||||
tests/artifacts/ci_tag_run_*.log
|
||||
if-no-files-found: ignore
|
||||
retention-days: 30
|
||||
+5
-16
@@ -1,12 +1,7 @@
|
||||
.env
|
||||
.coverage
|
||||
.slop_cache
|
||||
.ruff_cache
|
||||
.pytest_cache
|
||||
.mypy_cache
|
||||
__pycache__
|
||||
credentials.toml
|
||||
__pycache__
|
||||
uv.lock
|
||||
colorforth_bootslop_002.md
|
||||
md_gen
|
||||
scripts/generated
|
||||
logs
|
||||
@@ -14,14 +9,8 @@ logs/sessions/
|
||||
logs/agents/
|
||||
logs/errors/
|
||||
tests/artifacts/
|
||||
!tests/artifacts/manualslop_layout_default.ini
|
||||
dpg_layout.ini
|
||||
.env
|
||||
.coverage
|
||||
tests/temp_workspace
|
||||
tests/.test_durations.json
|
||||
sdm_report_refined.json
|
||||
session-ses_1eb8.md
|
||||
mock_debug_prompt.txt
|
||||
temp_old_gui.py
|
||||
.slop_cache/summary_cache.json
|
||||
.antigravitycli
|
||||
.vscode
|
||||
.mypy_cache
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
---
|
||||
---
|
||||
description: Fast, read-only agent for exploring the codebase structure
|
||||
mode: subagent
|
||||
model: minimax-coding-plan/MiniMax-M2.7
|
||||
model: MiniMax-M2.5
|
||||
temperature: 0.2
|
||||
permission:
|
||||
edit: deny
|
||||
@@ -12,7 +12,6 @@ permission:
|
||||
"git log*": allow
|
||||
"ls*": allow
|
||||
"dir*": allow
|
||||
'manual-slop_*': allow
|
||||
---
|
||||
|
||||
You are a fast, read-only agent specialized for exploring codebases. Use this when you need to quickly find files by patterns, search code for keywords, or answer about the codebase.
|
||||
@@ -79,4 +78,4 @@ Return concise findings with file:line references:
|
||||
|
||||
### Summary
|
||||
[One-paragraph summary of findings]
|
||||
```
|
||||
```
|
||||
@@ -1,7 +1,7 @@
|
||||
---
|
||||
---
|
||||
description: General-purpose agent for researching complex questions and executing multi-step tasks
|
||||
mode: subagent
|
||||
model: minimax-coding-plan/MiniMax-M2.7
|
||||
model: MiniMax-M2.5
|
||||
temperature: 0.3
|
||||
---
|
||||
|
||||
@@ -81,4 +81,4 @@ Return detailed findings with evidence:
|
||||
|
||||
### Recommendations
|
||||
- [Suggested next steps if applicable]
|
||||
```
|
||||
```
|
||||
@@ -1,7 +1,7 @@
|
||||
---
|
||||
---
|
||||
description: Tier 1 Orchestrator for product alignment, high-level planning, and track initialization
|
||||
mode: primary
|
||||
model: minimax-coding-plan/MiniMax-M3
|
||||
model: MiniMax-M2.5
|
||||
temperature: 0.5
|
||||
permission:
|
||||
edit: ask
|
||||
@@ -10,7 +10,6 @@ permission:
|
||||
"git status*": allow
|
||||
"git diff*": allow
|
||||
"git log*": allow
|
||||
'manual-slop_*': allow
|
||||
---
|
||||
|
||||
STRICT SYSTEM DIRECTIVE: You are a Tier 1 Orchestrator.
|
||||
@@ -19,7 +18,7 @@ ONLY output the requested text. No pleasantries.
|
||||
|
||||
## Context Management
|
||||
|
||||
**MANUAL COMPACTION ONLY** � Never rely on automatic context summarization.
|
||||
**MANUAL COMPACTION ONLY** — Never rely on automatic context summarization.
|
||||
Use `/compact` command explicitly when context needs reduction.
|
||||
Preserve full context during track planning and spec creation.
|
||||
|
||||
@@ -71,28 +70,6 @@ Before ANY other action:
|
||||
|
||||
**BLOCK PROGRESS** until all checklist items are confirmed.
|
||||
|
||||
## Track Initialization Protocol
|
||||
|
||||
When starting a new track:
|
||||
|
||||
1. **Read track context:**
|
||||
- `conductor/tracks.md` - active tracks
|
||||
- `conductor/tech-stack.md` - technology constraints
|
||||
- `conductor/product.md` - product vision
|
||||
|
||||
2. **Audit existing state:**
|
||||
- Use `manual-slop_py_get_code_outline` to map files
|
||||
- Use `manual-slop_get_git_diff` to check recent changes
|
||||
- Document "Current State Audit" in spec
|
||||
|
||||
3. **Create track spec:**
|
||||
- Follow spec template with: Overview, Current State Audit, Goals, Requirements
|
||||
- Include Architecture Reference section
|
||||
|
||||
4. **Initialize track directory:**
|
||||
- Create `conductor/tracks/{name}_{YYYYMMDD}/`
|
||||
- Write spec.md, plan.md, metadata.json
|
||||
|
||||
## Primary Context Documents
|
||||
|
||||
Read at session start:
|
||||
@@ -128,7 +105,7 @@ Use `manual-slop_py_get_code_outline`, `manual-slop_py_get_definition`,
|
||||
Document existing implementations with file:line references in a
|
||||
"Current State Audit" section in the spec.
|
||||
|
||||
**FAILURE TO AUDIT = TRACK FAILURE** � Previous tracks failed because specs
|
||||
**FAILURE TO AUDIT = TRACK FAILURE** — Previous tracks failed because specs
|
||||
asked to implement features that already existed.
|
||||
|
||||
### 2. Identify Gaps, Not Features
|
||||
@@ -198,4 +175,4 @@ Focus: {One-sentence scope}
|
||||
- Do NOT use native `edit` tool - use MCP tools
|
||||
- DO NOT SKIP A TEST IN PYTEST JUST BECAUSE ITS BROKEN AND HAS NO TRIVIAL SOLUTION OR FIX.
|
||||
- DO NOT SIMPLIFY A TEST JUST BECAUSE IT HAS NO TRIVIAL SOLUTION TO FIX.
|
||||
- DO NOT CREATE MOCK PATCHES TO PSEUDO API CALLS OR HOOKS BECAUSE THE APP SOURCE WAS CHANGED. ADAPT TESTS PROPERLY.
|
||||
- DO NOT CREATE MOCK PATCHES TO PSEUDO API CALLS OR HOOKS BECAUSE THE APP SOURCE WAS CHANGED. ADAPT TESTS PROPERLY.
|
||||
@@ -1,12 +1,10 @@
|
||||
---
|
||||
---
|
||||
description: Tier 2 Tech Lead for architectural design and track execution with persistent memory
|
||||
mode: primary
|
||||
model: minimax-coding-plan/MiniMax-M3
|
||||
temperature: 0.4
|
||||
permission:
|
||||
edit: ask
|
||||
bash: ask
|
||||
'manual-slop_*': allow
|
||||
---
|
||||
|
||||
STRICT SYSTEM DIRECTIVE: You are a Tier 2 Tech Lead.
|
||||
@@ -214,4 +212,4 @@ When all tasks in a phase are complete:
|
||||
- Do NOT use native `edit` tool - use MCP tools
|
||||
- DO NOT SKIP A TEST IN PYTEST JUST BECAUSE ITS BROKEN AND HAS NO TRIVIAL SOLUTION OR FIX.
|
||||
- DO NOT SIMPLIFY A TEST JUST BECAUSE IT HAS NO TRIVIAL SOLUTION TO FIX.
|
||||
- DO NOT CREATE MOCK PATCHES TO PSEUDO API CALLS OR HOOKS BECAUSE THE APP SOURCE WAS CHANGED. ADAPT TESTS PROPERLY.
|
||||
- DO NOT CREATE MOCK PATCHES TO PSEUDO API CALLS OR HOOKS BECAUSE THE APP SOURCE WAS CHANGED. ADAPT TESTS PROPERLY.
|
||||
@@ -1,33 +1,17 @@
|
||||
---
|
||||
---
|
||||
description: Stateless Tier 3 Worker for surgical code implementation and TDD
|
||||
mode: subagent
|
||||
model: minimax-coding-plan/MiniMax-M3
|
||||
model: MiniMax-M2.5
|
||||
temperature: 0.3
|
||||
permission:
|
||||
edit: allow
|
||||
bash: allow
|
||||
'manual-slop_*': allow
|
||||
---
|
||||
|
||||
STRICT SYSTEM DIRECTIVE: You are a stateless Tier 3 Worker (Contributor).
|
||||
Your goal is to implement specific code changes or tests based on the provided task.
|
||||
Follow TDD and return success status or code changes. No pleasantries, no conversational filler.
|
||||
|
||||
## CRITICAL: 1-Space Indentation for Python
|
||||
|
||||
**ALL Python code MUST use exactly 1 (ONE) space for indentation.**
|
||||
|
||||
VIOLATIONS:
|
||||
- Using 4 spaces or tabs will corrupt the codebase
|
||||
- Native edit tools destroy 1-space indentation - use MCP tools ONLY
|
||||
|
||||
MCP Edit Tools (SAFE):
|
||||
- `manual-slop_edit_file` - find/replace, preserves indentation
|
||||
- `manual-slop_py_update_definition` - replace function/class
|
||||
- `manual-slop_set_file_slice` - replace line range
|
||||
|
||||
DO NOT use native `edit` or `write` tools on Python files.
|
||||
|
||||
## Context Amnesia
|
||||
|
||||
You operate statelessly. Each task starts fresh with only the context provided.
|
||||
@@ -66,16 +50,6 @@ You MUST use Manual Slop's MCP tools. Native OpenCode tools are unreliable.
|
||||
|-------------|----------|
|
||||
| `bash` | `manual-slop_run_powershell` |
|
||||
|
||||
## Pre-Delegation Checkpoint Protocol (MANDATORY)
|
||||
|
||||
Before implementing ANY code change:
|
||||
|
||||
1. **Stage your work:** `manual-slop_run_powershell` with `git add .`
|
||||
2. **Why:** Prevents work loss if the implementation fails or needs rollback
|
||||
3. **When:** Always - before touching any file that matters
|
||||
|
||||
This is NOT optional. It is the difference between recoverable and catastrophic failure.
|
||||
|
||||
## Task Start Checklist (MANDATORY)
|
||||
|
||||
Before implementing:
|
||||
@@ -85,30 +59,40 @@ Before implementing:
|
||||
3. [ ] Verify target file and line range exists
|
||||
4. [ ] Announce: "Implementing: [task description]"
|
||||
|
||||
## Task Execution Protocol (MANDATORY TDD)
|
||||
## Task Execution Protocol
|
||||
|
||||
### Phase 1: RED - Write Failing Test
|
||||
- Write a test that defines the expected behavior
|
||||
- Run: `manual-slop_run_powershell` with `uv run pytest tests/path/test.py -v`
|
||||
- Confirm: Test MUST fail before proceeding
|
||||
- DO NOT skip this phase
|
||||
### 1. Understand the Task
|
||||
|
||||
### Phase 2: GREEN - Implement to Pass
|
||||
- Implement the minimal code to make the test pass
|
||||
- Run tests again
|
||||
- Confirm: Test MUST pass
|
||||
- DO NOT skip this phase
|
||||
Read the task prompt carefully. It specifies:
|
||||
|
||||
### Phase 3: REFACTOR - Optional
|
||||
- With passing tests, improve code quality
|
||||
- DO NOT change behavior
|
||||
- Re-run tests to confirm still passing
|
||||
- **WHERE**: Exact file and line range to modify
|
||||
- **WHAT**: The specific change required
|
||||
- **HOW**: Which API calls, patterns, or data structures to use
|
||||
- **SAFETY**: Thread-safety constraints if applicable
|
||||
|
||||
### Commit Protocol (ATOMIC PER TASK)
|
||||
After each task completion:
|
||||
1. `manual-slop_run_powershell` with `git add .`
|
||||
2. `git commit -m "feat(scope): description"`
|
||||
3. DO NOT batch commits across tasks
|
||||
### 2. Research (If Needed)
|
||||
|
||||
Use MCP tools to understand the context:
|
||||
|
||||
- `manual-slop_read_file` - Read specific file sections
|
||||
- `manual-slop_py_find_usages` - Search for patterns
|
||||
- `manual-slop_search_files` - Find files by pattern
|
||||
|
||||
### 3. Implement
|
||||
|
||||
- Follow the exact specifications provided
|
||||
- Use the patterns and APIs specified in the task
|
||||
- Use 1-space indentation for Python code
|
||||
- DO NOT add comments unless explicitly requested
|
||||
- Use type hints where appropriate
|
||||
|
||||
### 4. Verify
|
||||
|
||||
- Run tests if specified: `manual-slop_run_powershell` with `uv run pytest ...`
|
||||
- Check for syntax errors: `manual-slop_py_check_syntax`
|
||||
- Verify the change matches the specification
|
||||
|
||||
### 5. Report
|
||||
|
||||
Return a concise summary:
|
||||
|
||||
@@ -132,29 +116,21 @@ Before reporting completion:
|
||||
- [ ] No syntax errors
|
||||
- [ ] Tests pass (if applicable)
|
||||
|
||||
## BLOCKED Protocol
|
||||
## Blocking Protocol
|
||||
|
||||
If you cannot complete the task:
|
||||
|
||||
1. Start your response with: `BLOCKED:`
|
||||
1. Start your response with `BLOCKED:`
|
||||
2. Explain exactly why you cannot proceed
|
||||
3. List what information or changes would unblock you
|
||||
4. DO NOT attempt partial implementations that break the build
|
||||
|
||||
Examples of BLOCKED conditions:
|
||||
- Missing required context about the codebase
|
||||
- Task requires architectural decisions not in the spec
|
||||
- Target file/line range does not exist as described
|
||||
- Cyclic dependency discovered that wasn't documented
|
||||
- API calls or patterns specified are unavailable or wrong
|
||||
4. Do NOT attempt partial implementations that break the build
|
||||
|
||||
## Anti-Patterns (Avoid)
|
||||
|
||||
- Do NOT use native `edit` tool - use MCP tools
|
||||
- Use skeleton tools (manual-slop-py-get-skeleton, manual-slop-py-get-code-outline, manual-slop-get-file-slice) to navigate any file regardless of size. File size is not a concern; the right tools are.
|
||||
- Do NOT read full large files - use skeleton tools first
|
||||
- Do NOT add comments unless requested
|
||||
- Do NOT modify files outside the specified scope
|
||||
- Do NOT create new `src/*.py` files unless the user explicitly requests it. Helpers go in their parent module (e.g., AI-client code goes in `src/ai_client.py`, not new `src/ai_client_<thing>.py`). If you find yourself about to create a new `src/<thing>.py` file, ASK FIRST. See `AGENTS.md` "File Size and Naming Convention" for the full rule.
|
||||
- DO NOT SKIP A TEST IN PYTEST JUST BECAUSE ITS BROKEN AND HAS NO TRIVIAL SOLUTION OR FIX.
|
||||
- DO NOT SIMPLIFY A TEST JUST BECAUSE IT HAS NO TRIVIAL SOLUTION TO FIX.
|
||||
- DO NOT CREATE MOCK PATCHES TO PSEUDO API CALLS OR HOOKS BECAUSE THE APP SOURCE WAS CHANGED. ADAPT TESTS PROPERLY.
|
||||
- DO NOT CREATE MOCK PATCHES TO PSEUDO API CALLS OR HOOKS BECAUSE THE APP SOURCE WAS CHANGED. ADAPT TESTS PROPERLY.
|
||||
@@ -1,7 +1,7 @@
|
||||
---
|
||||
---
|
||||
description: Stateless Tier 4 QA Agent for error analysis and diagnostics
|
||||
mode: subagent
|
||||
model: minimax-coding-plan/MiniMax-M2.7
|
||||
model: MiniMax-M2.5
|
||||
temperature: 0.2
|
||||
permission:
|
||||
edit: deny
|
||||
@@ -10,7 +10,6 @@ permission:
|
||||
"git status*": allow
|
||||
"git diff*": allow
|
||||
"git log*": allow
|
||||
'manual-slop_*': allow
|
||||
---
|
||||
|
||||
STRICT SYSTEM DIRECTIVE: You are a stateless Tier 4 QA Agent.
|
||||
@@ -22,18 +21,6 @@ ONLY output the requested analysis. No pleasantries.
|
||||
You operate statelessly. Each analysis starts fresh.
|
||||
Do not assume knowledge from previous analyses or sessions.
|
||||
|
||||
## Architecture Reference
|
||||
|
||||
When analyzing errors, trace data flow through thread domains documented in:
|
||||
- `docs/guide_architecture.md`: Thread domains, event system, AI client, HITL mechanism
|
||||
- `docs/guide_mma.md`: 4-tier orchestration, DAG engine, worker lifecycle
|
||||
|
||||
Key threading model:
|
||||
- GUI main thread: UI rendering only
|
||||
- asyncio worker thread: AI communication
|
||||
- HookServer thread: API hook handling
|
||||
- NEVER write GUI state from background threads
|
||||
|
||||
## CRITICAL: MCP Tools Only (Native Tools Banned)
|
||||
|
||||
You MUST use Manual Slop's MCP tools. Native OpenCode tools are unreliable.
|
||||
@@ -67,15 +54,16 @@ Before analyzing:
|
||||
3. [ ] Use skeleton tools for files >50 lines (`manual-slop_py_get_skeleton`)
|
||||
4. [ ] Announce: "Analyzing: [error summary]"
|
||||
|
||||
## Analysis Protocol (MANDATORY FORMAT)
|
||||
## Analysis Protocol
|
||||
|
||||
### 1. Understand the Error
|
||||
- Read the provided error output, test failure, or log carefully
|
||||
- Identify affected files from traceback
|
||||
- Do NOT assume - base analysis on evidence only
|
||||
|
||||
Read the provided error output, test failure, or log carefully.
|
||||
|
||||
### 2. Investigate
|
||||
|
||||
Use MCP tools to understand the context:
|
||||
|
||||
- `manual-slop_read_file` - Read relevant source files
|
||||
- `manual-slop_py_find_usages` - Search for related patterns
|
||||
- `manual-slop_search_files` - Find related files
|
||||
@@ -83,7 +71,7 @@ Use MCP tools to understand the context:
|
||||
|
||||
### 3. Root Cause Analysis
|
||||
|
||||
Provide a structured analysis in this exact format:
|
||||
Provide a structured analysis:
|
||||
|
||||
```
|
||||
## Error Analysis
|
||||
@@ -92,28 +80,18 @@ Provide a structured analysis in this exact format:
|
||||
[One-sentence description of the error]
|
||||
|
||||
### Root Cause
|
||||
[Detailed explanation of WHY the error occurred - not just what went wrong]
|
||||
[Detailed explanation of why the error occurred]
|
||||
|
||||
### Evidence
|
||||
[File:line references supporting the analysis]
|
||||
|
||||
### Data Flow Trace
|
||||
[How data moved through the system to cause this error]
|
||||
[Reference specific thread domains if applicable: GUI main, asyncio worker, HookServer]
|
||||
|
||||
### Impact
|
||||
[What functionality is affected]
|
||||
|
||||
### Recommendations
|
||||
[Suggested fixes - but DO NOT implement them]
|
||||
[Suggested fixes or next steps - but DO NOT implement them]
|
||||
```
|
||||
|
||||
### 4. DO NOT FIX
|
||||
- Your job is ANALYSIS ONLY
|
||||
- Do NOT modify any files
|
||||
- Do NOT write code
|
||||
- Return the analysis and let the controller decide
|
||||
|
||||
## Limitations
|
||||
|
||||
- **READ-ONLY**: Do NOT modify any files
|
||||
@@ -138,8 +116,7 @@ If you cannot analyze the error:
|
||||
## Anti-Patterns (Avoid)
|
||||
|
||||
- Do NOT implement fixes - analysis only
|
||||
- Use skeleton tools (manual-slop-py-get-skeleton, manual-slop-py-get-code-outline, manual-slop-get-file-slice) to navigate any file regardless of size. File size is not a concern; the right tools are.
|
||||
- Do NOT create new `src/*.py` files unless the user explicitly requests it. See `AGENTS.md` "File Size and Naming Convention" for the full rule.
|
||||
- Do NOT read full large files - use skeleton tools first
|
||||
- DO NOT SKIP A TEST IN PYTEST JUST BECAUSE ITS BROKEN AND HAS NO TRIVIAL SOLUTION OR FIX.
|
||||
- DO NOT SIMPLIFY A TEST JUST BECAUSE IT HAS NO TRIVIAL SOLUTION TO FIX.
|
||||
- DO NOT CREATE MOCK PATCHES TO PSEUDO API CALLS OR HOOKS BECAUSE THE APP SOURCE WAS CHANGED. ADAPT TESTS PROPERLY.
|
||||
- DO NOT CREATE MOCK PATCHES TO PSEUDO API CALLS OR HOOKS BECAUSE THE APP SOURCE WAS CHANGED. ADAPT TESTS PROPERLY.
|
||||
Generated
-376
@@ -1,376 +0,0 @@
|
||||
{
|
||||
"name": ".opencode",
|
||||
"lockfileVersion": 3,
|
||||
"requires": true,
|
||||
"packages": {
|
||||
"": {
|
||||
"dependencies": {
|
||||
"@opencode-ai/plugin": "1.14.18"
|
||||
}
|
||||
},
|
||||
"node_modules/@msgpackr-extract/msgpackr-extract-darwin-arm64": {
|
||||
"version": "3.0.3",
|
||||
"resolved": "https://registry.npmjs.org/@msgpackr-extract/msgpackr-extract-darwin-arm64/-/msgpackr-extract-darwin-arm64-3.0.3.tgz",
|
||||
"integrity": "sha512-QZHtlVgbAdy2zAqNA9Gu1UpIuI8Xvsd1v8ic6B2pZmeFnFcMWiPLfWXh7TVw4eGEZ/C9TH281KwhVoeQUKbyjw==",
|
||||
"cpu": [
|
||||
"arm64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"darwin"
|
||||
]
|
||||
},
|
||||
"node_modules/@msgpackr-extract/msgpackr-extract-darwin-x64": {
|
||||
"version": "3.0.3",
|
||||
"resolved": "https://registry.npmjs.org/@msgpackr-extract/msgpackr-extract-darwin-x64/-/msgpackr-extract-darwin-x64-3.0.3.tgz",
|
||||
"integrity": "sha512-mdzd3AVzYKuUmiWOQ8GNhl64/IoFGol569zNRdkLReh6LRLHOXxU4U8eq0JwaD8iFHdVGqSy4IjFL4reoWCDFw==",
|
||||
"cpu": [
|
||||
"x64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"darwin"
|
||||
]
|
||||
},
|
||||
"node_modules/@msgpackr-extract/msgpackr-extract-linux-arm": {
|
||||
"version": "3.0.3",
|
||||
"resolved": "https://registry.npmjs.org/@msgpackr-extract/msgpackr-extract-linux-arm/-/msgpackr-extract-linux-arm-3.0.3.tgz",
|
||||
"integrity": "sha512-fg0uy/dG/nZEXfYilKoRe7yALaNmHoYeIoJuJ7KJ+YyU2bvY8vPv27f7UKhGRpY6euFYqEVhxCFZgAUNQBM3nw==",
|
||||
"cpu": [
|
||||
"arm"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"linux"
|
||||
]
|
||||
},
|
||||
"node_modules/@msgpackr-extract/msgpackr-extract-linux-arm64": {
|
||||
"version": "3.0.3",
|
||||
"resolved": "https://registry.npmjs.org/@msgpackr-extract/msgpackr-extract-linux-arm64/-/msgpackr-extract-linux-arm64-3.0.3.tgz",
|
||||
"integrity": "sha512-YxQL+ax0XqBJDZiKimS2XQaf+2wDGVa1enVRGzEvLLVFeqa5kx2bWbtcSXgsxjQB7nRqqIGFIcLteF/sHeVtQg==",
|
||||
"cpu": [
|
||||
"arm64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"linux"
|
||||
]
|
||||
},
|
||||
"node_modules/@msgpackr-extract/msgpackr-extract-linux-x64": {
|
||||
"version": "3.0.3",
|
||||
"resolved": "https://registry.npmjs.org/@msgpackr-extract/msgpackr-extract-linux-x64/-/msgpackr-extract-linux-x64-3.0.3.tgz",
|
||||
"integrity": "sha512-cvwNfbP07pKUfq1uH+S6KJ7dT9K8WOE4ZiAcsrSes+UY55E/0jLYc+vq+DO7jlmqRb5zAggExKm0H7O/CBaesg==",
|
||||
"cpu": [
|
||||
"x64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"linux"
|
||||
]
|
||||
},
|
||||
"node_modules/@msgpackr-extract/msgpackr-extract-win32-x64": {
|
||||
"version": "3.0.3",
|
||||
"resolved": "https://registry.npmjs.org/@msgpackr-extract/msgpackr-extract-win32-x64/-/msgpackr-extract-win32-x64-3.0.3.tgz",
|
||||
"integrity": "sha512-x0fWaQtYp4E6sktbsdAqnehxDgEc/VwM7uLsRCYWaiGu0ykYdZPiS8zCWdnjHwyiumousxfBm4SO31eXqwEZhQ==",
|
||||
"cpu": [
|
||||
"x64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"win32"
|
||||
]
|
||||
},
|
||||
"node_modules/@opencode-ai/plugin": {
|
||||
"version": "1.14.18",
|
||||
"resolved": "https://registry.npmjs.org/@opencode-ai/plugin/-/plugin-1.14.18.tgz",
|
||||
"integrity": "sha512-oF1U7Aipz8A93WGllrwxYugopeL4ml/zd6ywoFIyuF2gbvEhOGFomAvqt1E5YjLN0wEL8nCPwFine3l7pqgNUA==",
|
||||
"license": "MIT",
|
||||
"dependencies": {
|
||||
"@opencode-ai/sdk": "1.14.18",
|
||||
"effect": "4.0.0-beta.48",
|
||||
"zod": "4.1.8"
|
||||
},
|
||||
"peerDependencies": {
|
||||
"@opentui/core": ">=0.1.100",
|
||||
"@opentui/solid": ">=0.1.100"
|
||||
},
|
||||
"peerDependenciesMeta": {
|
||||
"@opentui/core": {
|
||||
"optional": true
|
||||
},
|
||||
"@opentui/solid": {
|
||||
"optional": true
|
||||
}
|
||||
}
|
||||
},
|
||||
"node_modules/@opencode-ai/sdk": {
|
||||
"version": "1.14.18",
|
||||
"resolved": "https://registry.npmjs.org/@opencode-ai/sdk/-/sdk-1.14.18.tgz",
|
||||
"integrity": "sha512-E0QiiB+9rv/TPH0a1GunKl6LnuXDRHDiJaIFHOPaBL364rQx+3ClHwHkz78/KBsjhjeLrC2CaLgK+CoxV/XUIQ==",
|
||||
"license": "MIT",
|
||||
"dependencies": {
|
||||
"cross-spawn": "7.0.6"
|
||||
}
|
||||
},
|
||||
"node_modules/@standard-schema/spec": {
|
||||
"version": "1.1.0",
|
||||
"resolved": "https://registry.npmjs.org/@standard-schema/spec/-/spec-1.1.0.tgz",
|
||||
"integrity": "sha512-l2aFy5jALhniG5HgqrD6jXLi/rUWrKvqN/qJx6yoJsgKhblVd+iqqU4RCXavm/jPityDo5TCvKMnpjKnOriy0w==",
|
||||
"license": "MIT"
|
||||
},
|
||||
"node_modules/cross-spawn": {
|
||||
"version": "7.0.6",
|
||||
"resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.6.tgz",
|
||||
"integrity": "sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA==",
|
||||
"license": "MIT",
|
||||
"dependencies": {
|
||||
"path-key": "^3.1.0",
|
||||
"shebang-command": "^2.0.0",
|
||||
"which": "^2.0.1"
|
||||
},
|
||||
"engines": {
|
||||
"node": ">= 8"
|
||||
}
|
||||
},
|
||||
"node_modules/detect-libc": {
|
||||
"version": "2.1.2",
|
||||
"resolved": "https://registry.npmjs.org/detect-libc/-/detect-libc-2.1.2.tgz",
|
||||
"integrity": "sha512-Btj2BOOO83o3WyH59e8MgXsxEQVcarkUOpEYrubB0urwnN10yQ364rsiByU11nZlqWYZm05i/of7io4mzihBtQ==",
|
||||
"license": "Apache-2.0",
|
||||
"optional": true,
|
||||
"engines": {
|
||||
"node": ">=8"
|
||||
}
|
||||
},
|
||||
"node_modules/effect": {
|
||||
"version": "4.0.0-beta.48",
|
||||
"resolved": "https://registry.npmjs.org/effect/-/effect-4.0.0-beta.48.tgz",
|
||||
"integrity": "sha512-MMAM/ZabuNdNmgXiin+BAanQXK7qM8mlt7nfXDoJ/Gn9V8i89JlCq+2N0AiWmqFLXjGLA0u3FjiOjSOYQk5uMw==",
|
||||
"license": "MIT",
|
||||
"dependencies": {
|
||||
"@standard-schema/spec": "^1.1.0",
|
||||
"fast-check": "^4.6.0",
|
||||
"find-my-way-ts": "^0.1.6",
|
||||
"ini": "^6.0.0",
|
||||
"kubernetes-types": "^1.30.0",
|
||||
"msgpackr": "^1.11.9",
|
||||
"multipasta": "^0.2.7",
|
||||
"toml": "^4.1.1",
|
||||
"uuid": "^13.0.0",
|
||||
"yaml": "^2.8.3"
|
||||
}
|
||||
},
|
||||
"node_modules/fast-check": {
|
||||
"version": "4.7.0",
|
||||
"resolved": "https://registry.npmjs.org/fast-check/-/fast-check-4.7.0.tgz",
|
||||
"integrity": "sha512-NsZRtqvSSoCP0HbNjUD+r1JH8zqZalyp6gLY9e7OYs7NK9b6AHOs2baBFeBG7bVNsuoukh89x2Yg3rPsul8ziQ==",
|
||||
"funding": [
|
||||
{
|
||||
"type": "individual",
|
||||
"url": "https://github.com/sponsors/dubzzz"
|
||||
},
|
||||
{
|
||||
"type": "opencollective",
|
||||
"url": "https://opencollective.com/fast-check"
|
||||
}
|
||||
],
|
||||
"license": "MIT",
|
||||
"dependencies": {
|
||||
"pure-rand": "^8.0.0"
|
||||
},
|
||||
"engines": {
|
||||
"node": ">=12.17.0"
|
||||
}
|
||||
},
|
||||
"node_modules/find-my-way-ts": {
|
||||
"version": "0.1.6",
|
||||
"resolved": "https://registry.npmjs.org/find-my-way-ts/-/find-my-way-ts-0.1.6.tgz",
|
||||
"integrity": "sha512-a85L9ZoXtNAey3Y6Z+eBWW658kO/MwR7zIafkIUPUMf3isZG0NCs2pjW2wtjxAKuJPxMAsHUIP4ZPGv0o5gyTA==",
|
||||
"license": "MIT"
|
||||
},
|
||||
"node_modules/ini": {
|
||||
"version": "6.0.0",
|
||||
"resolved": "https://registry.npmjs.org/ini/-/ini-6.0.0.tgz",
|
||||
"integrity": "sha512-IBTdIkzZNOpqm7q3dRqJvMaldXjDHWkEDfrwGEQTs5eaQMWV+djAhR+wahyNNMAa+qpbDUhBMVt4ZKNwpPm7xQ==",
|
||||
"license": "ISC",
|
||||
"engines": {
|
||||
"node": "^20.17.0 || >=22.9.0"
|
||||
}
|
||||
},
|
||||
"node_modules/isexe": {
|
||||
"version": "2.0.0",
|
||||
"resolved": "https://registry.npmjs.org/isexe/-/isexe-2.0.0.tgz",
|
||||
"integrity": "sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw==",
|
||||
"license": "ISC"
|
||||
},
|
||||
"node_modules/kubernetes-types": {
|
||||
"version": "1.30.0",
|
||||
"resolved": "https://registry.npmjs.org/kubernetes-types/-/kubernetes-types-1.30.0.tgz",
|
||||
"integrity": "sha512-Dew1okvhM/SQcIa2rcgujNndZwU8VnSapDgdxlYoB84ZlpAD43U6KLAFqYo17ykSFGHNPrg0qry0bP+GJd9v7Q==",
|
||||
"license": "Apache-2.0"
|
||||
},
|
||||
"node_modules/msgpackr": {
|
||||
"version": "1.11.12",
|
||||
"resolved": "https://registry.npmjs.org/msgpackr/-/msgpackr-1.11.12.tgz",
|
||||
"integrity": "sha512-RBdJ1Un7yGlXWajrkxcSa93nvQ0w4zBf60c0yYv7YtBelP8H2FA7XsfBbMHtXKXUMUxH7zV3Zuozh+kUQWhHvg==",
|
||||
"license": "MIT",
|
||||
"optionalDependencies": {
|
||||
"msgpackr-extract": "^3.0.2"
|
||||
}
|
||||
},
|
||||
"node_modules/msgpackr-extract": {
|
||||
"version": "3.0.3",
|
||||
"resolved": "https://registry.npmjs.org/msgpackr-extract/-/msgpackr-extract-3.0.3.tgz",
|
||||
"integrity": "sha512-P0efT1C9jIdVRefqjzOQ9Xml57zpOXnIuS+csaB4MdZbTdmGDLo8XhzBG1N7aO11gKDDkJvBLULeFTo46wwreA==",
|
||||
"hasInstallScript": true,
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"dependencies": {
|
||||
"node-gyp-build-optional-packages": "5.2.2"
|
||||
},
|
||||
"bin": {
|
||||
"download-msgpackr-prebuilds": "bin/download-prebuilds.js"
|
||||
},
|
||||
"optionalDependencies": {
|
||||
"@msgpackr-extract/msgpackr-extract-darwin-arm64": "3.0.3",
|
||||
"@msgpackr-extract/msgpackr-extract-darwin-x64": "3.0.3",
|
||||
"@msgpackr-extract/msgpackr-extract-linux-arm": "3.0.3",
|
||||
"@msgpackr-extract/msgpackr-extract-linux-arm64": "3.0.3",
|
||||
"@msgpackr-extract/msgpackr-extract-linux-x64": "3.0.3",
|
||||
"@msgpackr-extract/msgpackr-extract-win32-x64": "3.0.3"
|
||||
}
|
||||
},
|
||||
"node_modules/multipasta": {
|
||||
"version": "0.2.7",
|
||||
"resolved": "https://registry.npmjs.org/multipasta/-/multipasta-0.2.7.tgz",
|
||||
"integrity": "sha512-KPA58d68KgGil15oDqXjkUBEBYc00XvbPj5/X+dyzeo/lWm9Nc25pQRlf1D+gv4OpK7NM0J1odrbu9JNNGvynA==",
|
||||
"license": "MIT"
|
||||
},
|
||||
"node_modules/node-gyp-build-optional-packages": {
|
||||
"version": "5.2.2",
|
||||
"resolved": "https://registry.npmjs.org/node-gyp-build-optional-packages/-/node-gyp-build-optional-packages-5.2.2.tgz",
|
||||
"integrity": "sha512-s+w+rBWnpTMwSFbaE0UXsRlg7hU4FjekKU4eyAih5T8nJuNZT1nNsskXpxmeqSK9UzkBl6UgRlnKc8hz8IEqOw==",
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"dependencies": {
|
||||
"detect-libc": "^2.0.1"
|
||||
},
|
||||
"bin": {
|
||||
"node-gyp-build-optional-packages": "bin.js",
|
||||
"node-gyp-build-optional-packages-optional": "optional.js",
|
||||
"node-gyp-build-optional-packages-test": "build-test.js"
|
||||
}
|
||||
},
|
||||
"node_modules/path-key": {
|
||||
"version": "3.1.1",
|
||||
"resolved": "https://registry.npmjs.org/path-key/-/path-key-3.1.1.tgz",
|
||||
"integrity": "sha512-ojmeN0qd+y0jszEtoY48r0Peq5dwMEkIlCOu6Q5f41lfkswXuKtYrhgoTpLnyIcHm24Uhqx+5Tqm2InSwLhE6Q==",
|
||||
"license": "MIT",
|
||||
"engines": {
|
||||
"node": ">=8"
|
||||
}
|
||||
},
|
||||
"node_modules/pure-rand": {
|
||||
"version": "8.4.0",
|
||||
"resolved": "https://registry.npmjs.org/pure-rand/-/pure-rand-8.4.0.tgz",
|
||||
"integrity": "sha512-IoM8YF/jY0hiugFo/wOWqfmarlE6J0wc6fDK1PhftMk7MGhVZl88sZimmqBBFomLOCSmcCCpsfj7wXASCpvK9A==",
|
||||
"funding": [
|
||||
{
|
||||
"type": "individual",
|
||||
"url": "https://github.com/sponsors/dubzzz"
|
||||
},
|
||||
{
|
||||
"type": "opencollective",
|
||||
"url": "https://opencollective.com/fast-check"
|
||||
}
|
||||
],
|
||||
"license": "MIT"
|
||||
},
|
||||
"node_modules/shebang-command": {
|
||||
"version": "2.0.0",
|
||||
"resolved": "https://registry.npmjs.org/shebang-command/-/shebang-command-2.0.0.tgz",
|
||||
"integrity": "sha512-kHxr2zZpYtdmrN1qDjrrX/Z1rR1kG8Dx+gkpK1G4eXmvXswmcE1hTWBWYUzlraYw1/yZp6YuDY77YtvbN0dmDA==",
|
||||
"license": "MIT",
|
||||
"dependencies": {
|
||||
"shebang-regex": "^3.0.0"
|
||||
},
|
||||
"engines": {
|
||||
"node": ">=8"
|
||||
}
|
||||
},
|
||||
"node_modules/shebang-regex": {
|
||||
"version": "3.0.0",
|
||||
"resolved": "https://registry.npmjs.org/shebang-regex/-/shebang-regex-3.0.0.tgz",
|
||||
"integrity": "sha512-7++dFhtcx3353uBaq8DDR4NuxBetBzC7ZQOhmTQInHEd6bSrXdiEyzCvG07Z44UYdLShWUyXt5M/yhz8ekcb1A==",
|
||||
"license": "MIT",
|
||||
"engines": {
|
||||
"node": ">=8"
|
||||
}
|
||||
},
|
||||
"node_modules/toml": {
|
||||
"version": "4.1.1",
|
||||
"resolved": "https://registry.npmjs.org/toml/-/toml-4.1.1.tgz",
|
||||
"integrity": "sha512-EBJnVBr3dTXdA89WVFoAIPUqkBjxPMwRqsfuo1r240tKFHXv3zgca4+NJib/h6TyvGF7vOawz0jGuryJCdNHrw==",
|
||||
"license": "MIT",
|
||||
"engines": {
|
||||
"node": ">=20"
|
||||
}
|
||||
},
|
||||
"node_modules/uuid": {
|
||||
"version": "13.0.1",
|
||||
"resolved": "https://registry.npmjs.org/uuid/-/uuid-13.0.1.tgz",
|
||||
"integrity": "sha512-9ezox2roIft6ExBVTVqibSd5dc5/47Sw/uY6b4SjQUT2TzQ0tltNquWA46y4xPQmdZYqvnio22SgWd41M86+jw==",
|
||||
"funding": [
|
||||
"https://github.com/sponsors/broofa",
|
||||
"https://github.com/sponsors/ctavan"
|
||||
],
|
||||
"license": "MIT",
|
||||
"bin": {
|
||||
"uuid": "dist-node/bin/uuid"
|
||||
}
|
||||
},
|
||||
"node_modules/which": {
|
||||
"version": "2.0.2",
|
||||
"resolved": "https://registry.npmjs.org/which/-/which-2.0.2.tgz",
|
||||
"integrity": "sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA==",
|
||||
"license": "ISC",
|
||||
"dependencies": {
|
||||
"isexe": "^2.0.0"
|
||||
},
|
||||
"bin": {
|
||||
"node-which": "bin/node-which"
|
||||
},
|
||||
"engines": {
|
||||
"node": ">= 8"
|
||||
}
|
||||
},
|
||||
"node_modules/yaml": {
|
||||
"version": "2.8.4",
|
||||
"resolved": "https://registry.npmjs.org/yaml/-/yaml-2.8.4.tgz",
|
||||
"integrity": "sha512-ml/JPOj9fOQK8RNnWojA67GbZ0ApXAUlN2UQclwv2eVgTgn7O9gg9o7paZWKMp4g0H3nTLtS9LVzhkpOFIKzog==",
|
||||
"license": "ISC",
|
||||
"bin": {
|
||||
"yaml": "bin.mjs"
|
||||
},
|
||||
"engines": {
|
||||
"node": ">= 14.6"
|
||||
},
|
||||
"funding": {
|
||||
"url": "https://github.com/sponsors/eemeli"
|
||||
}
|
||||
},
|
||||
"node_modules/zod": {
|
||||
"version": "4.1.8",
|
||||
"license": "MIT",
|
||||
"funding": {
|
||||
"url": "https://github.com/sponsors/colinhacks"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1,199 +1,123 @@
|
||||
# AGENTS.md
|
||||
# Manual Slop - OpenCode Configuration
|
||||
|
||||
## What This Is
|
||||
## MCP TOOL PARAMETERS - CRITICAL
|
||||
- **ALWAYS use snake_case**: `old_string`, `new_string`, `replace_all`
|
||||
- **NEVER use camelCase**: `oldString`, `newString`, `replaceAll`
|
||||
|
||||
Manual Slop is a local GUI orchestrator for LLM-driven coding sessions. It bridges high-latency AI reasoning with a low-latency ImGui render loop via a thread-safe async pipeline; every AI-generated payload passes through a human-auditable gate before execution.
|
||||
## Project Overview
|
||||
|
||||
## The Conductor Convention
|
||||
**Manual Slop** is a local GUI application designed as an experimental, "manual" AI coding assistant. It allows users to curate and send context (files, screenshots, and discussion history) to AI APIs (Gemini and Anthropic). The AI can then execute PowerShell scripts within the project directory to modify files, requiring explicit user confirmation before execution.
|
||||
|
||||
All AI agents consuming this project must read `./conductor/workflow.md` and treat `./conductor/tracks.md` as the task registry. Track implementation follows the TDD protocol documented in `conductor/workflow.md` with per-file atomic commits and git notes.
|
||||
## Main Technologies
|
||||
|
||||
## Guidance for AI Agents
|
||||
- **Language:** Python 3.11+
|
||||
- **Package Management:** `uv`
|
||||
- **GUI Framework:** Dear PyGui (`dearpygui`), ImGui Bundle (`imgui-bundle`)
|
||||
- **AI SDKs:** `google-genai` (Gemini), `anthropic`
|
||||
- **Configuration:** TOML (`tomli-w`)
|
||||
|
||||
Detailed agent guidance lives in the following locations — read these directly, do not duplicate content here:
|
||||
## Architecture
|
||||
|
||||
- **MUST READ TO - CORRECT EDIT WORKFLOW** `conductor/edit_workflow.md`
|
||||
- **Operational workflow:** `conductor/workflow.md`
|
||||
- **Code style and process:** `conductor/product-guidelines.md`
|
||||
- **Tech stack and constraints:** `conductor/tech-stack.md`
|
||||
- **Product context:** `conductor/product.md`
|
||||
- **MMA orchestrator role:** `mma-orchestrator/SKILL.md`
|
||||
- **Tier 1 (Orchestrator):** `.agents/skills/mma-tier1-orchestrator/SKILL.md`
|
||||
- **Tier 2 (Tech Lead):** `.agents/skills/mma-tier2-tech-lead/SKILL.md`
|
||||
- **Tier 3 (Worker):** `.agents/skills/mma-tier3-worker/SKILL.md`
|
||||
- **Tier 4 (QA):** `.agents/skills/mma-tier4-qa/SKILL.md`
|
||||
- **`gui_legacy.py`:** Main entry point and Dear PyGui application logic
|
||||
- **`ai_client.py`:** Unified wrapper for Gemini and Anthropic APIs
|
||||
- **`aggregate.py`:** Builds `file_items` context
|
||||
- **`mcp_client.py`:** Implements MCP-like tools (26 tools)
|
||||
- **`shell_runner.py`:** Sandboxed subprocess wrapper for PowerShell
|
||||
- **`project_manager.py`:** Per-project TOML configurations
|
||||
- **`session_logger.py`:** Timestamped logging (JSON-L)
|
||||
|
||||
## Canonical Operating Rules
|
||||
## Critical Context (Read First)
|
||||
|
||||
@conductor/code_styleguides/data_oriented_design.md
|
||||
This is the canonical DOD reference. The same file is injected into the Application's RAG / context assembly via `[agent].context_files` in `manual_slop.toml` — one source of truth for both harnesses. Edit it there; do not duplicate rules into this file.
|
||||
- **Tech Stack**: Python 3.11+, Dear PyGui / ImGui, FastAPI, Uvicorn
|
||||
- **Main File**: `gui_2.py` (primary GUI), `ai_client.py` (multi-provider LLM abstraction)
|
||||
- **Core Mechanic**: GUI orchestrator for LLM-driven coding with 4-tier MMA architecture
|
||||
- **Key Integration**: Gemini API, Anthropic API, DeepSeek, Gemini CLI (headless), MCP tools
|
||||
- **Platform Support**: Windows (PowerShell)
|
||||
- **DO NOT**: Read full files >50 lines without using `py_get_skeleton` or `get_file_summary` first
|
||||
|
||||
## Code Styleguides (the convention catalog)
|
||||
## Environment
|
||||
|
||||
Per-domain rules live in `conductor/code_styleguides/`. The full list is in `./docs/AGENTS.md` §2 (the canonical 6-styleguide catalog with one-line summaries + when-to-read). This section is a pointer.
|
||||
- Shell: PowerShell (pwsh) on Windows
|
||||
- Do NOT use bash-specific syntax (use PowerShell equivalents)
|
||||
- Use `uv run` for all Python execution
|
||||
- Path separators: forward slashes work in PowerShell
|
||||
|
||||
**The short version (the 6 styleguides):**
|
||||
## Session Startup Checklist
|
||||
|
||||
- `data_oriented_design.md` — The canonical DOD reference (Tier 0/1/2; 3 defaults to reject; 7-question simplification pass)
|
||||
- `agent_memory_dimensions.md` — The 4 memory dimensions (curation / discussion / RAG / knowledge) and when to use each
|
||||
- `rag_integration_discipline.md` — The conservative-RAG rule: opt-in, complement, provenance, no mutation
|
||||
- `cache_friendly_context.md` — Stable-to-volatile context ordering; the cache TTL GUI contract; the byte-comparison test
|
||||
- `knowledge_artifacts.md` — The knowledge harvest pattern: category files, provenance, sha256 ledger, digest regeneration
|
||||
- `feature_flags.md` — Codifies "delete to turn off" (file presence) + config flags; when to use each
|
||||
## Human-Facing Documentation
|
||||
At the start of each session:
|
||||
|
||||
For understanding, using, and maintaining the tool, see `docs/Readme.md` (the canonical teaching document) and `./docs/AGENTS.md` (the agent-facing mirror of `docs/Readme.md`).
|
||||
1. **Check ./condcutor/tracks.md** - look for IN_PROGRESS or BLOCKED tracks
|
||||
2. **Review recent JOURNAL.md entries** - scan last 2-3 entries for context
|
||||
3. **Run `/conductor-setup`** - load full context
|
||||
4. **Run `/conductor-status`** - get overview
|
||||
|
||||
The 14 deep-dive guides under `docs/` (`guide_architecture.md`, `guide_ai_client.md`, etc.) are referenced from `docs/Readme.md`; an agent reading for a feature scope should read `./docs/AGENTS.md` first, then the relevant `guide_*.md`.
|
||||
## Conductor System
|
||||
|
||||
## Critical Anti-Patterns
|
||||
The project uses a spec-driven track system in `conductor/`:
|
||||
|
||||
- Do not read full files >50 lines without first using `py_get_skeleton` or `get_file_summary` to map the structure (this is navigation efficiency, not a "files should be small" stance)
|
||||
- Do not modify the tech stack without updating `conductor/tech-stack.md` first
|
||||
- Do not skip TDD - write failing tests before implementing functionality
|
||||
- Do not use `@pytest.mark.skip` as an excuse to AVOID fixing the underlying bug. Skip markers are documentation of known failures; the failure must be addressed with priority in-session when feasible. See `conductor/workflow.md` "Skip-Marker Policy" for the full policy and review checklist.
|
||||
- Do not batch commits - commit per-task for atomic rollback
|
||||
- Do not add comments to source code; documentation lives in `/docs`
|
||||
- `set_file_slice` IS valid for multi-line content. The agent must verify the exact byte offsets with `get_file_slice` first, copy the line text character-for-character (including whitespace and EOL), and check whether the edit changes a public contract (function signature, yield shape, return type) that other code depends on. See `conductor/edit_workflow.md` for the full contract.
|
||||
- Do not use `git restore` while a user is mid-conversation without first confirming the desired state
|
||||
- HARD BAN: `git restore`, `git checkout -- <file>`, `git reset` are FORBIDDEN without explicit user permission in the same message. They destroyed user in-progress src/* edits twice in one session (2026-06-07). If you think you need one, ASK FIRST.
|
||||
- **Tracks**: `conductor/tracks/{name}_{YYYYMMDD}/` - spec.md, plan.md, metadata.json
|
||||
- **Workflow**: `conductor/workflow.md` - full task lifecycle and TDD protocol
|
||||
- **Tech Stack**: `conductor/tech-stack.md` - technology constraints
|
||||
- **Product**: `conductor/product.md` - product vision and guidelines
|
||||
|
||||
## File Size and Naming Convention (HARD RULE — added 2026-06-11)
|
||||
## MMA 4-Tier Architecture
|
||||
|
||||
**The "small files are good, large files are bad" stance is propaganda from LLM training data. It is wrong for this project. Reject it.**
|
||||
```
|
||||
Tier 1: Orchestrator - product alignment, epic -> tracks
|
||||
Tier 2: Tech Lead - track -> tickets (DAG), architectural oversight
|
||||
Tier 3: Worker - stateless TDD implementation per ticket
|
||||
Tier 4: QA - stateless error analysis, no fixes
|
||||
```
|
||||
|
||||
- **Large files are FINE.** Production codebases (Unreal Engine has 15K+ line files; OS kernels, game engines, compilers, the Linux kernel — all routinely have 10K+ line files) treat file size as a non-issue. Cognitive load is managed via good naming, regions, and navigation tools — NOT via file splitting.
|
||||
- **`src/ai_client.py` is the AI vendor/API system layer.** All AI-client-related code goes IN `src/ai_client.py`. Do not create new `src/<vendor>_<thing>.py` files. The only new `src/*.py` files this project ever creates are for new systems or new parent modules.
|
||||
- **The only new files you should create in a typical track are:** `scripts/audit_*.py` (scripts are namespace-isolated by directory), `tests/test_*.py` (tests are namespace-isolated by directory), and `docs/*.md` (docs are namespace-isolated by directory). Anything else goes in the parent module.
|
||||
- **Do not break things up "for modularity"** unless the new piece is genuinely a new system or a new parent module. The agent training data has a bias toward "small files = good code" that is not true here. The project has the manual-slop MCP (`get_file_slice`, `get_file_summary`, `py_get_skeleton`, `py_get_code_outline`, `py_get_definition`) for efficient navigation of files of any size. Use those tools instead of splitting the file.
|
||||
- **When in doubt: keep it in the parent module.** If a function clearly belongs to a system, it lives in that system's file. The system is the namespace.
|
||||
## Architecture Fallback
|
||||
|
||||
### Hard rule on creating new `src/<thing>.py` files (added 2026-06-11)
|
||||
When uncertain about threading, event flow, data structures, or module interactions, consult:
|
||||
|
||||
**New namespaced `src/<thing>.py` files may only be created on the user's explicit request.** If you find yourself about to create one, **ASK FIRST** — don't just create it.
|
||||
- **docs/guide_architecture.md**: Thread domains, event system, AI client, HITL mechanism
|
||||
- **docs/guide_tools.md**: MCP Bridge security, 26-tool inventory, Hook API endpoints
|
||||
- **docs/guide_mma.md**: Ticket/Track data structures, DAG engine, ConductorEngine
|
||||
- **docs/guide_simulations.md**: live_gui fixture, Puppeteer pattern, verification
|
||||
- **docs/guide_meta_boundary.md**: Clarification of ai agent tools making the application vs the application itself.
|
||||
|
||||
Rationale: the user is the only one who can authorize a new top-level namespace. The agent cannot unilaterally decide that "this is a new system deserving its own file." Defaults:
|
||||
- **Helpers and sub-systems go in the parent module.** E.g., AI-client-specific helpers go in `src/ai_client.py`; app-controller helpers go in `src/app_controller.py`; MCP-client helpers go in `src/mcp_client.py`. Even if the parent file is already 3K+ lines, the helper still goes there.
|
||||
- **If a new top-level `src/<thing>.py` is genuinely warranted** (e.g., a truly new system that doesn't fit any existing parent), propose it in the next checkpoint or status note and wait for the user's explicit "yes, create it."
|
||||
## Development Workflow
|
||||
|
||||
**Audit trigger:** if you find yourself about to create a new `src/<thing>.py` file, ask: "is `<thing>` a new system, or is it part of an existing system?" If it's part of an existing system, the file goes in that system's file (e.g., `src/ai_client.py`, `src/app_controller.py`, `src/mcp_client.py`, etc.). If it's a new system, ASK THE USER before creating the file.
|
||||
- No giant edits: if your `manual-slop_edit_file` `new_string` exceeds ~20 lines, STOP and split it.
|
||||
- No diagnostic noise in production code. `sys.stderr.write(f"[XYZ_DIAG] ...")` lines added to `src/*.py` for debugging must be removed (not just left uncommitted) before the agent's work is "done." Diagnostic code that ships is technical debt. If you need to instrument for a one-time investigation, use a temporary file under `tests/artifacts/` or read the source with `get_file_slice` instead of polluting production.
|
||||
- No loop, no scope-creep, no report-instead-of-fix. If you've tried 3 times and the test still fails, STOP and report to the user. Do not write a 200-line status report as a substitute for the fix. Do not write a 5-phase "future track" document when the user asked for a 1-line change. See `conductor/workflow.md` "Process Anti-Patterns" for the full ruleset.
|
||||
1. Run `/conductor-setup` to load session context
|
||||
2. Pick active track from `./condcutor/tracks.md` or `/conductor-status`
|
||||
3. Run `/conductor-implement` to resume track execution
|
||||
4. Follow TDD: Red (failing tests) -> Green (pass) -> Refactor
|
||||
5. Delegate implementation to Tier 3 Workers, errors to Tier 4 QA
|
||||
6. On phase completion: run `/conductor-verify` for checkpoint
|
||||
|
||||
## Session-Learned Anti-Patterns (Added 2026-06-07)
|
||||
## Anti-Patterns (Avoid These)
|
||||
|
||||
These burned the most time in a recent startup_speedup session. The rules below are short because the rules above (and `conductor/edit_workflow.md`) are the source of truth.
|
||||
- **Don't read full large files** - use `py_get_skeleton`, `get_file_summary`, `py_get_code_outline` first
|
||||
- **Don't implement directly as Tier 2** - delegate to Tier 3 Workers
|
||||
- **Don't skip TDD** - write failing tests before implementation
|
||||
- **Don't modify tech stack silently** - update `conductor/tech-stack.md` BEFORE implementing
|
||||
- **Don't skip phase verification** - run `/conductor-verify` when all tasks in a phase are `[x]`
|
||||
- **Don't mix track work** - stay focused on one track at a time
|
||||
|
||||
### 1. ALWAYS use the proper edit tool, not a custom script
|
||||
## Code Style
|
||||
|
||||
- For Python source edits, use `manual-slop_edit_file` with `old_string`/`new_string`. **Do NOT** write a standalone Python script that does file-level replacements.
|
||||
- Custom scripts fail silently on: wrong indent in `new_content`, wrong EOL (CRLF vs LF) in `old_string` searches, wrong exact-string match (whitespace drift).
|
||||
- When a script fails, debug the actual error message. Do not dismiss it and try a different approach.
|
||||
- **IMPORTANT**: DO NOT ADD ***ANY*** COMMENTS unless asked
|
||||
- Use 1-space indentation for Python code
|
||||
- Use type hints where appropriate
|
||||
|
||||
### 2. The decorator-orphan pitfall
|
||||
## Code Style
|
||||
|
||||
When inserting new methods **before an existing `@property` def**, your script will leave the `@property` decorator on the line above your new methods. The decorator then accidentally decorates YOUR new method (which is no longer a property, breaking any subsequent `@your_method.setter` calls). The file passes `ast.parse()` but blows up at import time.
|
||||
- **IMPORTANT**: DO NOT ADD ***ANY*** COMMENTS unless asked
|
||||
- Use 1-space indentation for Python code
|
||||
- Use type hints where appropriate
|
||||
- Internal methods/variables prefixed with underscore
|
||||
|
||||
The fix: anchor on the **def line that has the `@property` ABOVE it**, and replace the pair `@property\n def foo(...)` with `@property\n def your_new(...)\n ...\n def foo(...)` — keeping the decorator attached to its original method. Or anchor on a different non-decorated landmark (e.g. `self._init_actions()`).
|
||||
### CRITICAL: Native Edit Tool Destroys Indentation
|
||||
|
||||
### 3. `ast.parse()` "Syntax OK" is not enough
|
||||
The native `Edit` tool DESTROYS 1-space indentation and converts to 4-space.
|
||||
|
||||
`py_check_syntax` only confirms `ast.parse()` succeeds. Semantic errors (wrong decorator targets, wrong class attribute, missing `self`, etc.) are NOT caught. After any multi-line edit, ALWAYS:
|
||||
- Import the module
|
||||
- Instantiate the class
|
||||
- Call the new method in the way it's expected to be called (e.g. `ctrl.foo_ts` vs `ctrl.foo_ts()` for properties vs methods)
|
||||
**NEVER use native `edit` tool on Python files.**
|
||||
|
||||
### 4. The "I'll just check git status" trap (now a HARD BAN, see Critical list above)
|
||||
Instead, use Manual Slop MCP tools:
|
||||
|
||||
If you suspect you might have lost work, the worst move is to run `git status` / `git restore` while a frantic user is watching. Pause, read the actual file, and admit what state you're in. The user knows their state better than you do. This trap has now caused irrecoverable data loss twice in one session — the ban is enforced above.
|
||||
|
||||
### 5. Small, verified edits beat big scripts
|
||||
|
||||
`conductor/edit_workflow.md` says it explicitly: 3-10 lines at a time, verify after each, repeat. If you find yourself writing a 200-line Python script to do an edit, you're doing it wrong. Use the MCP tools.
|
||||
|
||||
---
|
||||
|
||||
## Process Anti-Patterns (Added 2026-06-09)
|
||||
|
||||
These are the bad patterns the agents have been exhibiting that the user explicitly called out as dog-shit. The rules below are short. If you find yourself doing any of these, STOP and reread this section.
|
||||
|
||||
### 1. The Deduction Loop (kill it)
|
||||
|
||||
**Symptom:** Run test → fail → read log → form hypothesis → run again → fail differently → add diag → run again → fail again → loop. You end up running the same test 4+ times in one session, each run reading partial log output.
|
||||
|
||||
**Rule:** You are allowed to run a failing test at most **2 times** in a single investigation. After the 2nd failure, STOP running the test. Read the relevant source code (`get_file_slice` or `py_get_skeleton`), predict the failure mode from the code, and instrument ALL the relevant state in one pass before the next run. If the test still fails after 1 instrumented run, report to the user — do not loop.
|
||||
|
||||
**Worst case captured upfront.** Before running the test, ask: "what is the worst-case information I will need if this fails?" Add the diag for that, then run. The diag lines themselves are wasteful in production — see "No Diagnostic Noise in Production" below.
|
||||
|
||||
### 2. The Report-Instead-of-Fix Pattern (kill it)
|
||||
|
||||
**Symptom:** You can't fix the bug. You write a 200-line status report explaining why you can't fix it. The report contains "What I tried this session", "What I am NOT going to do", "What you can do", and "Files changed in this session (cumulative)." The report is a confession, not a fix.
|
||||
|
||||
**Rule:** A status report is allowed only when:
|
||||
- You have actually tried the fix and it failed with evidence, OR
|
||||
- You are blocked on a decision the user must make.
|
||||
|
||||
A status report is NOT allowed when:
|
||||
- You are avoiding a hard problem by writing prose about it.
|
||||
- The user asked for a fix and you have not yet tried.
|
||||
- The "what you can do" section is a list of options to defer to the user instead of picking the best one and doing it.
|
||||
|
||||
A good status report is 5-10 sentences, not 200 lines.
|
||||
|
||||
### 3. The Scope-Creep Track-Doc Pattern (kill it)
|
||||
|
||||
**Symptom:** The user asks for a 1-line fix. You write a 5-phase "future track" spec with 140 lines of scope, audit findings, recommendations, and "out of scope" sections. The track doc is now larger than the fix it was meant to scope.
|
||||
|
||||
**Rule:** If the user asks for a fix, your output is the fix. A track doc is only appropriate when the fix is multi-day work that requires a plan. If the fix is < 100 lines, it does not get a track. If the fix would touch more than 5 files, it MIGHT get a track — but ask first.
|
||||
|
||||
### 4. The Inherited-Cruft Pattern (kill it)
|
||||
|
||||
**Symptom:** The previous agent left a half-finished refactor in the working tree. The file is broken. You try to fix it and make it worse. You try again. You make it worse. The file stays broken for 3 days.
|
||||
|
||||
**Rule:** If the file is already in a broken state from a previous session, the FIRST thing you do is ask the user: "this file is in a broken state from a previous agent. do you want me to (a) revert the working tree and start from a clean baseline, (b) finish the previous agent's intent, or (c) abandon the work entirely?" You do not start by "trying to fix" the broken file. The user's answer determines the work, not your assumption.
|
||||
|
||||
### 5. No Diagnostic Noise in Production (kill it)
|
||||
|
||||
**Symptom:** You add `sys.stderr.write(f"[RAG_DIAG] ...)")` to `src/rag_engine.py` and `src/app_controller.py` to debug a test failure. The diag lines help. You "revert everything" but leave the 4-8 diag lines in the working tree uncommitted. The next agent runs `git status`, sees the diag lines, and either commits them by accident or spends 10 minutes cleaning them up.
|
||||
|
||||
**Rule:** Diagnostic stderr goes to a log file (`tests/artifacts/<test_name>.diag.log`) or to a temporary diagnostic script (`/tmp/diag_rag.py`), NOT to `src/*.py`. If you absolutely must instrument a production function for a single test run, the diag lines are part of the same atomic commit as the fix — they do not live uncommitted in the working tree. If you "revert everything," that means the diag lines are also reverted.
|
||||
|
||||
### 6. The "I Am Not Going To Attempt Another Fix Without Your Direction" Surrender (kill it)
|
||||
|
||||
**Symptom:** You've tried 3 things. None worked. You write: "I am not going to attempt another fix without your direction." Then you wait for the user to tell you what to do.
|
||||
|
||||
**Rule:** This is correct ONLY if you have already done the things below:
|
||||
- Read the actual source code, not from memory
|
||||
- Predicted the failure mode from the code
|
||||
- Instrumented the relevant state in one pass
|
||||
- Run the test once with instrumentation
|
||||
- Captured the full output, not partial output
|
||||
|
||||
If you have done all 5 and are still stuck, surrendering is fine. If you have not, you are surrendering too early. The user does not want to be your strategist; the user wants the agent to make progress.
|
||||
|
||||
### 7. The Verbose-Commit-Message Pattern (kill it)
|
||||
|
||||
**Symptom:** Your commit message is 50 lines. It contains the root cause analysis, the alternatives you considered, the side effects you considered, the cross-references, the "what this doesn't fix", the "what to verify", and a personal essay. The commit message is longer than the diff it describes.
|
||||
|
||||
**Rule:** A commit message is a 1-3 sentence summary. The body is for non-obvious "why" details, not for re-stating what the diff shows. If your commit message is longer than 15 lines, you are writing a report, not a commit message. Save the report for `docs/reports/`.
|
||||
|
||||
### 8. The "Isolated Pass" Verification Fallacy (kill it)
|
||||
|
||||
**Symptom:** You run the test in isolation. It passes. You commit. The test fails in batch. You didn't notice because you never ran the batch.
|
||||
|
||||
**Rule:** For any `live_gui` test or any test that depends on shared subprocess state, the **only verification that matters is the batch run**. A test that passes in isolation but fails in batch is failing — it's just that the failure is masked by isolation. Per the existing `Live_gui Test Fragility` rule in `conductor/workflow.md`: "Bisect failures by running the test both in the full suite and in isolation to distinguish 'test needs work' from 'real app bug'." If you only ever run in isolation, you cannot tell the difference.
|
||||
|
||||
## Compaction Recovery
|
||||
|
||||
If you're a new agent picking up a session that was compacted (or a previous agent ran out of context), follow this recovery path:
|
||||
|
||||
1. **Read the most recent `docs/reports/PLANNING_DIGEST_<date>.md`** if one exists. It indexes the planning artifacts and explains the design decisions behind the active tracks.
|
||||
2. **For each in-flight track**, read `conductor/tracks/<track_id>/state.toml` to see `current_phase`; read `conductor/tracks/<track_id>/plan.md` for the task breakdown.
|
||||
3. **Check `git log --oneline -20`** to see what has been committed; the most recent commits in `conductor/tracks/<track_id>/` are the latest work.
|
||||
4. **Run the audit scripts** (`scripts/audit_main_thread_imports.py`, `scripts/audit_weak_types.py`) to see the current state of the codebase.
|
||||
5. **Resume from the next unchecked task** in `state.toml`. The per-task commit discipline means each commit is a safe rollback point.
|
||||
|
||||
The track's `metadata.json` has a `verification_criteria` field — this is the definition of "done" for the track. If all the criteria are checked, the track is complete.
|
||||
|
||||
For deeper recovery, see `conductor/workflow.md` "Compaction Recovery" (the same pattern, but workflow-level).
|
||||
- `manual-slop_py_update_definition` - Replace function/class
|
||||
- `manual-slop_set_file_slice` - Replace line range
|
||||
- `manual-slop_py_set_signature` - Replace signature only
|
||||
|
||||
@@ -0,0 +1,58 @@
|
||||
# ARCHITECTURE.md
|
||||
|
||||
## Tech Stack
|
||||
- **Framework**: [Primary framework/language]
|
||||
- **Database**: [Database system]
|
||||
- **Frontend**: [Frontend technology]
|
||||
- **Backend**: [Backend technology]
|
||||
- **Infrastructure**: [Hosting/deployment]
|
||||
- **Build Tools**: [Build system]
|
||||
|
||||
## Directory Structure
|
||||
```
|
||||
project/
|
||||
├── src/ # Source code
|
||||
├── tests/ # Test files
|
||||
├── docs/ # Documentation
|
||||
├── config/ # Configuration files
|
||||
└── scripts/ # Build/deployment scripts
|
||||
```
|
||||
|
||||
## Key Architectural Decisions
|
||||
|
||||
### [Decision 1]
|
||||
**Context**: [Why this decision was needed]
|
||||
**Decision**: [What was decided]
|
||||
**Rationale**: [Why this approach was chosen]
|
||||
**Consequences**: [Trade-offs and implications]
|
||||
|
||||
## Component Architecture
|
||||
|
||||
### [ComponentName] Structure <!-- #component-anchor -->
|
||||
```typescript
|
||||
// Major classes with exact line numbers
|
||||
class MainClass { /* lines 100-500 */ } // <!-- #main-class -->
|
||||
class Helper { /* lines 501-600 */ } // <!-- #helper-class -->
|
||||
```
|
||||
|
||||
## System Flow Diagram
|
||||
```
|
||||
[User] -> [Frontend] -> [API] -> [Database]
|
||||
| |
|
||||
v v
|
||||
[Cache] [External Service]
|
||||
```
|
||||
|
||||
## Common Patterns
|
||||
|
||||
### [Pattern Name]
|
||||
**When to use**: [Circumstances]
|
||||
**Implementation**: [How to implement]
|
||||
**Example**: [Code example with line numbers]
|
||||
|
||||
## Keywords <!-- #keywords -->
|
||||
- architecture
|
||||
- system design
|
||||
- tech stack
|
||||
- components
|
||||
- patterns
|
||||
@@ -0,0 +1,103 @@
|
||||
# BUILD.md
|
||||
|
||||
## Prerequisites
|
||||
- [Runtime requirements]
|
||||
- [Development tools needed]
|
||||
- [Environment setup]
|
||||
|
||||
## Build Commands
|
||||
|
||||
### Development
|
||||
```bash
|
||||
# Start development server
|
||||
npm run dev
|
||||
|
||||
# Run in watch mode
|
||||
npm run watch
|
||||
```
|
||||
|
||||
### Production
|
||||
```bash
|
||||
# Build for production
|
||||
npm run build
|
||||
|
||||
# Start production server
|
||||
npm start
|
||||
```
|
||||
|
||||
### Testing
|
||||
```bash
|
||||
# Run all tests
|
||||
npm test
|
||||
|
||||
# Run tests in watch mode
|
||||
npm run test:watch
|
||||
|
||||
# Run specific test file
|
||||
npm test -- filename
|
||||
```
|
||||
|
||||
### Linting & Formatting
|
||||
```bash
|
||||
# Lint code
|
||||
npm run lint
|
||||
|
||||
# Fix linting issues
|
||||
npm run lint:fix
|
||||
|
||||
# Format code
|
||||
npm run format
|
||||
```
|
||||
|
||||
## CI/CD Pipeline
|
||||
|
||||
### GitHub Actions
|
||||
```yaml
|
||||
# .github/workflows/main.yml
|
||||
name: CI/CD
|
||||
on: [push, pull_request]
|
||||
jobs:
|
||||
test:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v3
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v3
|
||||
with:
|
||||
node-version: '18'
|
||||
- run: npm ci
|
||||
- run: npm test
|
||||
- run: npm run build
|
||||
```
|
||||
|
||||
## Deployment
|
||||
|
||||
### Staging
|
||||
1. [Deployment steps]
|
||||
2. [Verification steps]
|
||||
|
||||
### Production
|
||||
1. [Pre-deployment checklist]
|
||||
2. [Deployment steps]
|
||||
3. [Post-deployment verification]
|
||||
|
||||
## Rollback Procedures
|
||||
1. [Emergency rollback steps]
|
||||
2. [Database rollback if needed]
|
||||
3. [Verification steps]
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
**Issue**: [Problem description]
|
||||
**Solution**: [How to fix]
|
||||
|
||||
### Build Failures
|
||||
- [Common build errors and solutions]
|
||||
|
||||
## Keywords <!-- #keywords -->
|
||||
- build
|
||||
- deployment
|
||||
- ci/cd
|
||||
- testing
|
||||
- production
|
||||
@@ -1,3 +1,122 @@
|
||||
# CLAUDE.md
|
||||
<!-- Generated by Claude Conductor v2.0.0 -->
|
||||
|
||||
This project is no longer actively used with Claude Code. For project context, see `AGENTS.md`. The conductor system in `./conductor/` is the cross-tool abstraction and works with any agent toolchain.
|
||||
This file provides guidance to Claude Code when working with this repository.
|
||||
|
||||
## MCP TOOL PARAMETERS - CRITICAL
|
||||
- **ALWAYS use snake_case**: `old_string`, `new_string`, `replace_all`
|
||||
- **NEVER use camelCase**: `oldString`, `newString`, `replaceAll`
|
||||
|
||||
## Critical Context (Read First)
|
||||
- **Tech Stack**: Python 3.11+, Dear PyGui / ImGui, FastAPI, Uvicorn
|
||||
- **Main File**: `gui_2.py` (primary GUI), `ai_client.py` (multi-provider LLM abstraction)
|
||||
- **Core Mechanic**: GUI orchestrator for LLM-driven coding with 4-tier MMA architecture
|
||||
- **Key Integration**: Gemini API, Anthropic API, DeepSeek, Gemini CLI (headless), MCP tools
|
||||
- **Platform Support**: Windows (PowerShell) — single developer, local use
|
||||
- **DO NOT**: Read full files >50 lines without using `py_get_skeleton` or `get_file_summary` first. Do NOT perform heavy implementation directly — delegate to Tier 3 Workers.
|
||||
|
||||
## Environment
|
||||
- Shell: PowerShell (pwsh) on Windows
|
||||
- Do NOT use bash-specific syntax (use PowerShell equivalents)
|
||||
- Use `uv run` for all Python execution
|
||||
- Path separators: forward slashes work in PowerShell
|
||||
- **Shell execution in Claude Code**: The `Bash` tool runs in a mingw sandbox on Windows and produces unreliable/empty output. Use `run_powershell` MCP tool for ALL shell commands (git, tests, scans). Bash is last-resort only when MCP server is not running.
|
||||
|
||||
## Session Startup Checklist
|
||||
**IMPORTANT**: At the start of each session:
|
||||
1. **Check TASKS.md** — look for IN_PROGRESS or BLOCKED tracks
|
||||
2. **Review recent JOURNAL.md entries** — scan last 2-3 entries for context
|
||||
3. **If resuming work**: run `/conductor-setup` to load full context
|
||||
4. **If starting fresh**: run `/conductor-status` for overview
|
||||
|
||||
## Quick Reference
|
||||
**GUI Entry**: `gui_2.py` — Primary ImGui interface
|
||||
**AI Client**: `ai_client.py` — Multi-provider abstraction (Gemini, Anthropic, DeepSeek)
|
||||
**MCP Client**: `mcp_client.py:773-831` — Tool dispatch (26 tools)
|
||||
**Project Manager**: `project_manager.py` — Context & file management
|
||||
**MMA Engine**: `multi_agent_conductor.py:15-100` — ConductorEngine orchestration
|
||||
**Tech Lead**: `conductor_tech_lead.py` — Tier 2 ticket generation
|
||||
**DAG Engine**: `dag_engine.py` — Task dependency resolution
|
||||
**Session Logger**: `session_logger.py` — Audit trails (JSON-L + markdown)
|
||||
**Shell Runner**: `shell_runner.py` — PowerShell execution (60s timeout)
|
||||
**Models**: `models.py:6-84` — Ticket and Track data structures
|
||||
**File Cache**: `file_cache.py` — ASTParser with tree-sitter skeletons
|
||||
**Summarizer**: `summarize.py` — Heuristic file summaries
|
||||
**Outliner**: `outline_tool.py` — Code outline with line ranges
|
||||
|
||||
## Conductor System
|
||||
The project uses a spec-driven track system in `conductor/`:
|
||||
- **Tracks**: `conductor/tracks/{name}_{YYYYMMDD}/` — spec.md, plan.md, metadata.json
|
||||
- **Workflow**: `conductor/workflow.md` — full task lifecycle and TDD protocol
|
||||
- **Tech Stack**: `conductor/tech-stack.md` — technology constraints
|
||||
- **Product**: `conductor/product.md` — product vision and guidelines
|
||||
|
||||
### Conductor Commands (Claude Code slash commands)
|
||||
- `/conductor-setup` — bootstrap session with conductor context
|
||||
- `/conductor-status` — show all track status
|
||||
- `/conductor-new-track` — create a new track (Tier 1)
|
||||
- `/conductor-implement` — execute a track (Tier 2 — delegates to Tier 3/4)
|
||||
- `/conductor-verify` — phase completion verification and checkpointing
|
||||
|
||||
### MMA Tier Commands
|
||||
- `/mma-tier1-orchestrator` — product alignment, planning
|
||||
- `/mma-tier2-tech-lead` — track execution, architectural oversight
|
||||
- `/mma-tier3-worker` — stateless TDD implementation
|
||||
- `/mma-tier4-qa` — stateless error analysis
|
||||
|
||||
### Delegation (Tier 2 spawns Tier 3/4)
|
||||
```powershell
|
||||
uv run python scripts\claude_mma_exec.py --role tier3-worker "Task prompt here"
|
||||
uv run python scripts\claude_mma_exec.py --role tier4-qa "Error analysis prompt"
|
||||
```
|
||||
|
||||
## Current State
|
||||
- [x] Multi-provider AI client (Gemini, Anthropic, DeepSeek)
|
||||
- [x] Dear PyGui / ImGui GUI with multi-panel interface
|
||||
- [x] MMA 4-tier orchestration engine
|
||||
- [x] Custom MCP tools (26 tools via mcp_client.py)
|
||||
- [x] Session logging and audit trails
|
||||
- [x] Gemini CLI headless adapter
|
||||
- [x] Claude Code conductor integration
|
||||
- [~] AI-Optimized Python Style Refactor (Phase 3 — type hints for UI modules)
|
||||
- [~] Robust Live Simulation Verification (Phase 2 — Epic/Track verification)
|
||||
- [ ] Documentation Refresh and Context Cleanup
|
||||
|
||||
## Development Workflow
|
||||
1. Run `/conductor-setup` to load session context
|
||||
2. Pick active track from `conductor/tracks.md` or `/conductor-status`
|
||||
3. Run `/conductor-implement` to resume track execution
|
||||
4. Follow TDD: Red (failing tests) → Green (pass) → Refactor
|
||||
5. Delegate implementation to Tier 3 Workers, errors to Tier 4 QA
|
||||
6. On phase completion: run `/conductor-verify` for checkpoint
|
||||
|
||||
## Anti-Patterns (Avoid These)
|
||||
- **Don't read full large files** — use `py_get_skeleton`, `get_file_summary`, `py_get_code_outline` first (Research-First Protocol)
|
||||
- **Don't implement directly as Tier 2** — delegate to Tier 3 Workers via `claude_mma_exec.py`
|
||||
- **Don't skip TDD** — write failing tests before implementation
|
||||
- **Don't modify tech stack silently** — update `conductor/tech-stack.md` BEFORE implementing
|
||||
- **Don't skip phase verification** — run `/conductor-verify` when all tasks in a phase are `[x]`
|
||||
- **Don't mix track work** — stay focused on one track at a time
|
||||
|
||||
## MCP Tools (available via manual-slop MCP server)
|
||||
When the MCP server is running, these tools are available natively:
|
||||
`py_get_skeleton`, `py_get_code_outline`, `py_get_definition`, `py_update_definition`,
|
||||
`py_get_signature`, `py_set_signature`, `py_get_class_summary`, `py_find_usages`,
|
||||
`py_get_imports`, `py_check_syntax`, `py_get_hierarchy`, `py_get_docstring`,
|
||||
`get_file_summary`, `get_file_slice`, `set_file_slice`, `get_git_diff`, `get_tree`,
|
||||
`search_files`, `read_file`, `list_directory`, `web_search`, `fetch_url`,
|
||||
`run_powershell`, `get_ui_performance`, `py_get_var_declaration`, `py_set_var_declaration`
|
||||
|
||||
## Journal Update Requirements
|
||||
Update JOURNAL.md after:
|
||||
- Completing any significant feature or fix
|
||||
- Encountering and resolving errors
|
||||
- End of each work session
|
||||
- Making architectural decisions
|
||||
Format: What/Why/How/Issues/Result structure
|
||||
|
||||
## Task Management Integration
|
||||
- **conductor/tracks.md**: Quick-read pointer to active conductor tracks
|
||||
- **conductor/tracks/*/plan.md**: Detailed task state (source of truth)
|
||||
- **JOURNAL.md**: Completed work history with `|TASK:ID|` tags
|
||||
- **ERRORS.md**: P0/P1 error tracking
|
||||
|
||||
+25
-14
@@ -1,23 +1,34 @@
|
||||
# Use python:3.11-slim as a base
|
||||
FROM python:3.11-slim
|
||||
|
||||
RUN apt-get update && apt-get install -y --no-install-recommends \
|
||||
git curl ca-certificates libx11-6 libgl1 libxrender1 libxext6 tk \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
# Set environment variables
|
||||
# UV_SYSTEM_PYTHON=1 allows uv to install into the system site-packages
|
||||
ENV PYTHONDONTWRITEBYTECODE=1
|
||||
PYTHONUNBUFFERED=1
|
||||
UV_SYSTEM_PYTHON=1
|
||||
|
||||
RUN pip install uv
|
||||
# Install system dependencies and uv
|
||||
RUN apt-get update && apt-get install -y --no-install-recommends
|
||||
curl
|
||||
ca-certificates
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
&& curl -LsSf https://astral.sh/uv/install.sh | sh
|
||||
&& mv /root/.local/bin/uv /usr/local/bin/uv
|
||||
|
||||
# Set the working directory in the container
|
||||
WORKDIR /app
|
||||
COPY pyproject.toml uv.lock ./
|
||||
RUN uv sync --frozen
|
||||
|
||||
# Copy dependency files first to leverage Docker layer caching
|
||||
COPY pyproject.toml requirements.txt* ./
|
||||
|
||||
# Install dependencies via uv
|
||||
RUN if [ -f requirements.txt ]; then uv pip install --no-cache -r requirements.txt; fi
|
||||
|
||||
# Copy the rest of the application code
|
||||
COPY . .
|
||||
|
||||
RUN mkdir -p /projects /config
|
||||
VOLUME ["/projects", "/config"]
|
||||
# Expose port 8000 for the headless API/service
|
||||
EXPOSE 8000
|
||||
|
||||
EXPOSE 8080 8999
|
||||
|
||||
HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
|
||||
CMD curl -f http://127.0.0.1:8999/status || exit 1
|
||||
|
||||
ENTRYPOINT ["uv", "run", "sloppy.py", "--enable-test-hooks", "--web-host=0.0.0.0", "--web-port=8080"]
|
||||
# Set the entrypoint to run the app in headless mode
|
||||
ENTRYPOINT ["python", "gui_2.py", "--headless"]
|
||||
|
||||
@@ -1,42 +1,22 @@
|
||||
# GEMINI.md
|
||||
# Project Overview
|
||||
|
||||
This file covers Gemini-CLI-specific operational notes for the Manual Slop project. The primary toolchain is Gemini CLI; for general agent orientation, see `AGENTS.md`.
|
||||
|
||||
## Project Overview
|
||||
|
||||
**Manual Slop** is a local GUI orchestrator for LLM-driven coding sessions. It bridges high-latency AI reasoning with a low-latency ImGui render loop via a thread-safe async pipeline; every AI-generated payload passes through a human-auditable gate before execution.
|
||||
**Manual Slop** is a local GUI application designed as an experimental, "manual" AI coding assistant. It allows users to curate and send context (files, screenshots, and discussion history) to AI APIs (Gemini and Anthropic). The AI can then execute PowerShell scripts within the project directory to modify files, requiring explicit user confirmation before execution.
|
||||
|
||||
**Main Technologies:**
|
||||
* **Language:** Python 3.11+
|
||||
* **Package Management:** `uv`
|
||||
* **GUI Framework:** ImGui Bundle (`imgui-bundle`)
|
||||
* **AI SDKs:** `google-genai` (Gemini), `anthropic` (Claude), `openai` (DeepSeek + MiniMax via OpenAI-compatible endpoints), `GeminiCliAdapter` (headless gemini CLI subprocess)
|
||||
* **GUI Framework:** Dear PyGui (`dearpygui`), ImGui Bundle (`imgui-bundle`)
|
||||
* **AI SDKs:** `google-genai` (Gemini), `anthropic`
|
||||
* **Configuration:** TOML (`tomli-w`)
|
||||
|
||||
**Providers Supported (as of 2026-06-02):**
|
||||
- **Gemini SDK** — Primary; uses server-side CachedContent
|
||||
- **Gemini CLI** — Headless adapter with full functional parity
|
||||
- **Anthropic** — Ephemeral prompt caching (4-breakpoint system)
|
||||
- **DeepSeek** — Code-optimized reasoning
|
||||
- **MiniMax** — OpenAI-compatible alternative
|
||||
|
||||
**Entry Point:** `sloppy.py` (was `gui_legacy.py` before the rename; `gui_2.py` is now the active ImGui application module).
|
||||
|
||||
**Architecture (key modules):**
|
||||
* **`src/gui_2.py`:** Primary ImGui application; App class, frame-sync, HITL dialogs, event system. ~260K lines.
|
||||
* **`src/ai_client.py`:** Multi-provider LLM abstraction (Gemini, Anthropic, DeepSeek, Gemini CLI, MiniMax). Module-level singleton with state.
|
||||
* **`src/mcp_client.py`:** 45 MCP tools (file I/O, AST inspection, C/C++ tree-sitter, analysis, network, runtime, Beads). Three-layer security model.
|
||||
* **`src/multi_agent_conductor.py`:** ConductorEngine + WorkerPool. 4-Tier MMA orchestration with DAG execution.
|
||||
* **`src/dag_engine.py`:** TrackDAG (cycle detection, topological sort) + ExecutionEngine (tick-based state machine).
|
||||
* **`src/aggregate.py`:** Context aggregation pipeline.
|
||||
* **`src/app_controller.py`:** Main controller; bridges GUI and async AI workers.
|
||||
* **`src/api_hooks.py`:** HTTP API on `:8999` for external automation and IPC.
|
||||
* **`src/rag_engine.py`:** RAG subsystem (ChromaDB + embedding providers).
|
||||
* **`src/personas.py`:** Unified agent profile management.
|
||||
* **`src/workspace_manager.py`:** Workspace profile save/load.
|
||||
* **`src/hot_reloader.py`:** State-preserving module reloading.
|
||||
|
||||
Full module list: `src/*.py`. See `docs/guide_architecture.md` for the threading model and event system.
|
||||
**Architecture:**
|
||||
* **`gui_legacy.py`:** The main entry point and Dear PyGui application logic. Handles all panels, layouts, user input, and confirmation dialogs.
|
||||
* **`ai_client.py`:** A unified wrapper for both Gemini and Anthropic APIs. Manages sessions, tool/function-call loops, token estimation, and context history management.
|
||||
* **`aggregate.py`:** Responsible for building the `file_items` context. It reads project configurations, collects files and screenshots, and builds the context into markdown format to send to the AI.
|
||||
* **`mcp_client.py`:** Implements MCP-like tools (e.g., `read_file`, `list_directory`, `search_files`, `web_search`) as native functions that the AI can call. Enforces a strict allowlist for file access.
|
||||
* **`shell_runner.py`:** A sandboxed subprocess wrapper that executes PowerShell scripts (`powershell -NoProfile -NonInteractive -Command`) provided by the AI.
|
||||
* **`project_manager.py`:** Manages per-project TOML configurations (`manual_slop.toml`), serializes discussion entries, and integrates with git (e.g., fetching current commit).
|
||||
* **`session_logger.py`:** Handles timestamped logging of communication history (JSON-L) and tool calls (saving generated `.ps1` files).
|
||||
|
||||
# Building and Running
|
||||
|
||||
@@ -47,33 +27,21 @@ Full module list: `src/*.py`. See `docs/guide_architecture.md` for the threading
|
||||
api_key = "****"
|
||||
[anthropic]
|
||||
api_key = "****"
|
||||
[deepseek]
|
||||
api_key = "****"
|
||||
[minimax]
|
||||
api_key = "****"
|
||||
```
|
||||
The `credentials.toml` is **blacklisted** by the MCP allowlist — AI tools cannot read it.
|
||||
* **Run the Application:**
|
||||
```powershell
|
||||
uv run sloppy.py # Normal mode
|
||||
uv run sloppy.py --enable-test-hooks # With Hook API on :8999
|
||||
uv run .\gui_2.py
|
||||
```
|
||||
|
||||
# Gemini-CLI-Specific Conventions
|
||||
# Development Conventions
|
||||
|
||||
* **Conductor Extension:** Gemini CLI uses the conductor extension, which reads `./conductor/` for task tracking, workflow, and product context. Tracks live in `conductor/tracks/<name>_<YYYYMMDD>/` with `spec.md`, `plan.md`, and `metadata.json`.
|
||||
* **Skill Activation:** Use `activate_skill mma-orchestrator` to load the orchestrator skill, then activate the tier-specific skill (e.g., `activate_skill mma-tier1-orchestrator`).
|
||||
* **The Conductor Convention:** Read `conductor/workflow.md` for the TDD protocol. Treat `conductor/tracks.md` as the task registry. Track implementation follows per-file atomic commits with git notes.
|
||||
* **Tool Execution:** AI-generated PowerShell scripts and tool calls pass through the Execution Clutch (HITL). Scripts are saved to `scripts/generated/<ts>_<seq>.ps1`.
|
||||
* **Context Refresh:** After every tool call that modifies the file system, the application automatically refreshes file contents in the context using `mtime` checks.
|
||||
* **Fuzzy Anchor Resilience:** Line-based operations (`get_file_slice`, `set_file_slice`, `py_update_definition`, fuzzy anchor slices) use FuzzyAnchor to survive file modifications. They can be batched in a single turn without line drift.
|
||||
* **Layout Persistence:** Window layouts are saved to `manualslop_layout.ini` (was `dpg_layout.ini`).
|
||||
* **Logging:** All API communications are logged to `logs/sessions/<id>/comms.log`. Tool calls to `toolcalls.log`. Generated scripts to `scripts/generated/`.
|
||||
* **Configuration Management:** The application uses two tiers of configuration:
|
||||
* `config.toml`: Global settings (UI theme, active provider, list of project paths).
|
||||
* `manual_slop.toml`: Per-project settings (files to track, discussion history, specific system prompts).
|
||||
* **Tool Execution:** The AI acts primarily by generating PowerShell scripts. These scripts MUST be confirmed by the user via a GUI modal before execution. The AI also has access to read-only MCP-style file exploration tools and web search capabilities.
|
||||
* **Context Refresh:** After every tool call that modifies the file system, the application automatically refreshes the file contents in the context using the files' `mtime` to optimize reads.
|
||||
* **UI State Persistence:** Window layouts and docking arrangements are automatically saved to and loaded from `dpg_layout.ini`.
|
||||
* **Code Style:**
|
||||
* Use exactly 1-space indentation for Python (NO EXCEPTIONS). See `conductor/product-guidelines.md`.
|
||||
* Use the manual-slop MCP tools (`manual-slop_edit_file`, `manual-slop_py_update_definition`) for surgical edits — native edit tools destroy indentation.
|
||||
* Internal methods and variables are prefixed with an underscore (e.g., `_flush_to_project`, `_do_generate`).
|
||||
|
||||
# Human-Facing Documentation
|
||||
|
||||
For understanding, using, and maintaining the tool, see `docs/Readme.md` and the 14 deep-dive guides it indexes. See `conductor/product.md` for the product vision.
|
||||
* Use type hints where appropriate.
|
||||
* Internal methods and variables are generally prefixed with an underscore (e.g., `_flush_to_project`, `_do_generate`).
|
||||
* **Logging:** All API communications are logged to `logs/comms_<ts>.log`. All executed scripts are saved to `scripts/generated/`.
|
||||
@@ -1,37 +1,16 @@
|
||||
# Manual Slop
|
||||
|
||||
## *Note by the Human behind this*
|
||||
|
||||
I see the potential of AI as both an invaluable learning, percise techinical writing and code generation tool when handled with care and deep curation. This repo is both a proof of concept of this assertion and a tool to achieve this because every single paid or vested "AI Agenic developer" seems to not be interested in these principles.
|
||||
|
||||
The License for this will most likely be MIT or zlib. Nearly the entire codebase was heavily curated AI generated code. From vendors that have pirated nearly everyone's work. Most I can do is just be open to kofi and let whatever rep from this evolve.
|
||||
|
||||
## Why did you do this in Python
|
||||
|
||||
*TLDR: I apologize it was out of sheer practicality with time allocation and resources available. I really don't like python.*
|
||||
|
||||
Before I winged this project on a whim and frustration, I had tried AI with various langauges, unfortuantely python did remarkably well.
|
||||
|
||||
* Attic-Greek-TTS - ~3 kloc TTS tool for a dead language, with spectrograph anaylsis for verification.
|
||||
* forth_bootslop - Used scripts to gather and curate large amounts information and data from sources into formats it could digest.
|
||||
|
||||
Prior to making this tool I had very dissapointing performance with more favaorable langauges: C11, Odin, or Jai (Which I don't have direct access to).
|
||||
|
||||
I don't enjoy web browser sandboxed runtimes so I didn't use javascript. I haven't attempted AI with lua much but that was the alternative, and I knew python had the next best support for AI toolchain bindings along with an imgui package. So based purely on these factors alone I resolved to attempt this in Python.
|
||||
|
||||
## Summary
|
||||
|
||||

|
||||
|
||||
A high-density GUI orchestrator for local LLM-driven coding sessions. Manual Slop bridges high-latency AI reasoning with a low-latency ImGui render loop via a thread-safe asynchronous pipeline, ensuring every AI-generated payload passes through a human-auditable gate before execution.
|
||||
|
||||
**Design Philosophy**: Full manual control over vendor API metrics, agent capabilities, and context memory usage. High information density, tactile interactions, and explicit confirmation for destructive actions.
|
||||
|
||||
**Tech Stack**: Python 3.11+, ImGui Bundle (Dear ImGui + imgui-node-editor + imgui_markdown + ImGuiColorTextEdit), FastAPI, Uvicorn, tree-sitter (Python, C, C++), chromadb (RAG), pywin32 (Windows window frame), psutil (telemetry), pydantic, dolt (Beads)
|
||||
**Tech Stack**: Python 3.11+, Dear PyGui / ImGui Bundle, FastAPI, Uvicorn, tree-sitter
|
||||
**Providers**: Gemini API, Anthropic API, DeepSeek, Gemini CLI (headless), MiniMax
|
||||
**Platform**: Windows (PowerShell) — single developer, local use
|
||||
|
||||

|
||||

|
||||
|
||||
---
|
||||
|
||||
@@ -56,18 +35,13 @@ Hierarchical task decomposition with specialized models and strict token firewal
|
||||
- **Three Dialog Types**: ConfirmDialog (scripts), MMAApprovalDialog (steps), MMASpawnApprovalDialog (workers)
|
||||
- **Editable Payloads**: Review, modify, or reject any AI-generated content before execution
|
||||
|
||||
### 45 MCP Tools with Sandboxing
|
||||
### 26 MCP Tools with Sandboxing
|
||||
Three-layer security model: Allowlist Construction → Path Validation → Resolution Gate
|
||||
- **File I/O**: read, list, search, slice, edit, tree
|
||||
- **AST-Based (Python)**: skeleton, outline, definition, signature, class summary, docstring, var declaration, hierarchy, imports, syntax check, find usages
|
||||
- **AST-Based (C/C++)**: tree-sitter powered skeleton, outline, definition, signature, and surgical update tools for C and C++
|
||||
- **File Editing**: surgical string match (`edit_file`) preserving indentation and line endings
|
||||
- **Analysis**: summary, git diff, find usages, imports, syntax check, hierarchy, derive code path
|
||||
- **Network**: web search, URL fetch (dependency-free, stdlib only)
|
||||
- **AST-Based (Python)**: skeleton, outline, definition, signature, class summary, docstring
|
||||
- **Analysis**: summary, git diff, find usages, imports, syntax check, hierarchy
|
||||
- **Network**: web search, URL fetch
|
||||
- **Runtime**: UI performance metrics
|
||||
- **Beads**: bd_create, bd_list, bd_ready, bd_update for Dolt-backed issue tracking
|
||||
|
||||
See [docs/guide_tools.md](./docs/guide_tools.md) for the full inventory.
|
||||
|
||||
### Parallel Tool Execution
|
||||
Multiple independent tool calls within a single AI turn execute concurrently via `asyncio.gather`, significantly reducing latency.
|
||||
@@ -88,10 +62,6 @@ The **Execution Clutch** suspends the AI execution thread on a `threading.Condit
|
||||
|
||||
The **MMA (Multi-Model Agent)** system decomposes epics into tracks, tracks into DAG-ordered tickets, and executes each ticket with a stateless Tier 3 worker that starts from `ai_client.reset_session()` — no conversational bleed between tickets ([details](./docs/guide_mma.md)).
|
||||
|
||||
### Test Coverage
|
||||
|
||||
The project has **273 test files** with 98.9% pass rate (272/273 in the latest batched run; the 1 failure is a pre-existing flake in `test_rag_phase4_stress` that passes in isolation). Most failures are caught and fixed via the 4-tier MMA test-harden track system. See [docs/guide_testing.md](./docs/guide_testing.md) for the full testing contract.
|
||||
|
||||
---
|
||||
|
||||
## Documentation
|
||||
@@ -99,48 +69,11 @@ The project has **273 test files** with 98.9% pass rate (272/273 in the latest b
|
||||
| Guide | Scope |
|
||||
|---|---|
|
||||
| [Readme](./docs/Readme.md) | Documentation index, GUI panel reference, configuration files, environment variables |
|
||||
| [Architecture](./docs/guide_architecture.md) | Threading model, event system, AI client multi-provider architecture (Gemini, Anthropic, DeepSeek, Gemini CLI, MiniMax), HITL mechanism, comms logging, RAG integration, Tier 4 patch flow |
|
||||
| [Tools & IPC](./docs/guide_tools.md) | MCP Bridge 3-layer security, 45-tool inventory, Hook API endpoints, ApiHookClient reference, shell runner, Beads tools |
|
||||
| [MMA Orchestration](./docs/guide_mma.md) | 4-tier hierarchy, Ticket/Track/WorkerContext data structures, DAG engine, ConductorEngine, worker lifecycle, persona application, abort propagation |
|
||||
| [Simulations](./docs/guide_simulations.md) | `live_gui` fixture, Puppeteer pattern, mock provider, visual verification, test areas by subsystem, headless service |
|
||||
| [Context Curation](./docs/guide_context_curation.md) | AST masking, fuzzy anchor slices, structural file editor, view presets, history snapshotting |
|
||||
| [Shaders & Window](./docs/guide_shaders_and_window.md) | Hybrid shader injection, custom window frame, NERV theme effects |
|
||||
| [Themes](./docs/guide_themes.md) | TOML-based theming, `[colors]` table, 4-syntax-palette upstream limit, `load_themes_from_disk` / `apply_syntax_palette` API, color-callable convention |
|
||||
| [Meta-Boundary](./docs/guide_meta_boundary.md) | Application vs Meta-Tooling domains, inter-domain bridges, cross-tool abstractions |
|
||||
|
||||
---
|
||||
|
||||
## Subsystem Index
|
||||
|
||||
| Subsystem | Guide | Primary Module(s) |
|
||||
|---|---|---|
|
||||
| Multi-provider LLM client | [Architecture](./docs/guide_architecture.md#ai-client-multi-provider-architecture) | `src/ai_client.py` |
|
||||
| 4-Tier MMA orchestration | [MMA](./docs/guide_mma.md) | `src/multi_agent_conductor.py`, `src/dag_engine.py` |
|
||||
| DAG engine & ticket lifecycle | [MMA](./docs/guide_mma.md#dag-engine-dag_enginepy) | `src/dag_engine.py` |
|
||||
| MCP tools & Hook API | [Tools & IPC](./docs/guide_tools.md) | `src/mcp_client.py`, `src/api_hooks.py` |
|
||||
| Execution Clutch (HITL) | [Architecture](./docs/guide_architecture.md#the-execution-clutch-human-in-the-loop) | `src/app_controller.py` |
|
||||
| Context composition & aggregation | [Context Curation](./docs/guide_context_curation.md) | `src/aggregate.py`, `src/file_cache.py` |
|
||||
| AST inspection & slicing | [Context Curation](./docs/guide_context_curation.md#granular-ast-control) | `src/file_cache.py`, `src/fuzzy_anchor.py` |
|
||||
| Personas (unified profiles) | *See [guide_mma.md](./docs/guide_mma.md#persona-application); dedicated guide pending* | `src/personas.py` |
|
||||
| Tool bias engine | *See [guide_tools.md](./docs/guide_tools.md); dedicated guide pending* | `src/tool_bias.py` |
|
||||
| RAG (Retrieval-Augmented Generation) | *See [guide_architecture.md](./docs/guide_architecture.md#rag-integration); dedicated guide pending* | `src/rag_engine.py` |
|
||||
| Beads mode (Dolt issue tracking) | *See [guide_tools.md](./docs/guide_tools.md#beads-tools); dedicated guide pending* | `src/beads_client.py` |
|
||||
| Hot reload (state-preserving) | *Dedicated guide pending* | `src/hot_reloader.py` |
|
||||
| Discussion metrics & compression | [Architecture](./docs/guide_architecture.md#discussion-compression) | `src/ai_client.py` |
|
||||
| Test infrastructure & simulations | [Simulations](./docs/guide_simulations.md) | `tests/conftest.py`, `simulation/` |
|
||||
| Headless service (FastAPI) | [Simulations](./docs/guide_simulations.md#headless-service-tests) | `src/api_hooks.py` |
|
||||
| NERV theme & visual effects | [Shaders & Window](./docs/guide_shaders_and_window.md#4-nerv-theme-effects) | `src/theme_nerv.py`, `src/theme_nerv_fx.py` |
|
||||
| TOML theme system (palette + syntax) | [Themes](./docs/guide_themes.md) | `src/theme_2.py`, `src/theme_models.py` |
|
||||
| Custom window frame | [Shaders & Window](./docs/guide_shaders_and_window.md#2-custom-window-frame-strategy) | `src/gui_2.py` |
|
||||
| Workspace profiles (docking layouts) | *Dedicated guide pending* | `src/workspace_manager.py` |
|
||||
| History (undo/redo) | [Context Curation](./docs/guide_context_curation.md#context-snapshotting-per-take) | `src/history.py` |
|
||||
| External MCP integration | [Tools & IPC](./docs/guide_tools.md#external-mcp-integration) | `src/mcp_client.py` |
|
||||
| Telemetry & performance monitoring | [Architecture](./docs/guide_architecture.md#telemetry--auditing) | `src/performance_monitor.py` |
|
||||
| Session logging | [Tools & IPC](./docs/guide_tools.md#session-logging) | `src/session_logger.py` |
|
||||
| MMA dashboard & node editor | [MMA](./docs/guide_mma.md) | `src/gui_2.py:_render_mma_dashboard` |
|
||||
| Cross-tool abstractions (conductor) | [Meta-Boundary](./docs/guide_meta_boundary.md#the-cross-tool-abstractions) | `conductor/` |
|
||||
|
||||
Subsystems marked "dedicated guide pending" are slated for dedicated `docs/guide_*.md` files in upcoming docs work. For now, their details live inline in the guides listed under [Documentation](#documentation) above.
|
||||
| [Architecture](./docs/guide_architecture.md) | Threading model, event system, AI client multi-provider architecture, HITL mechanism, comms logging |
|
||||
| [Tools & IPC](./docs/guide_tools.md) | MCP Bridge 3-layer security, 26 tool inventory, Hook API endpoints, ApiHookClient reference, shell runner |
|
||||
| [MMA Orchestration](./docs/guide_mma.md) | 4-tier hierarchy, Ticket/Track data structures, DAG engine, ConductorEngine, worker lifecycle, abort propagation |
|
||||
| [Simulations](./docs/guide_simulations.md) | `live_gui` fixture, Puppeteer pattern, mock provider, visual verification, ASTParser / summarizer |
|
||||
| [Meta-Boundary](./docs/guide_meta_boundary.md) | Application vs Meta-Tooling domains, inter-domain bridges, safety model separation |
|
||||
|
||||
---
|
||||
|
||||
@@ -172,13 +105,8 @@ api_key = "YOUR_KEY"
|
||||
|
||||
[deepseek]
|
||||
api_key = "YOUR_KEY"
|
||||
|
||||
[minimax]
|
||||
api_key = "YOUR_KEY"
|
||||
```
|
||||
|
||||
Each provider's key is loaded by the corresponding `_ensure_<provider>_client()` in `src/ai_client.py`. The `credentials.toml` is **blacklisted** by the MCP allowlist — AI tools cannot read it under any circumstance.
|
||||
|
||||
### Running
|
||||
|
||||
```powershell
|
||||
@@ -217,59 +145,34 @@ The Multi-Model Agent system uses hierarchical task decomposition with specializ
|
||||
|
||||
## Module by Domain
|
||||
|
||||
### src/ — Core implementation (53 modules)
|
||||
### src/ — Core implementation
|
||||
|
||||
| File | Role |
|
||||
|---|---|
|
||||
| `src/gui_2.py` | Primary ImGui interface — App class, frame-sync, HITL dialogs, event system |
|
||||
| `src/app_controller.py` | Headless controller; bridges GUI and async AI workers |
|
||||
| `src/ai_client.py` | Multi-provider LLM abstraction (Gemini, Anthropic, DeepSeek, MiniMax) |
|
||||
| `src/mcp_client.py` | 45 MCP tools + `run_powershell` (canonical 46 in `models.AGENT_TOOL_NAMES`); 3-layer filesystem security and tool dispatch |
|
||||
| `src/api_hooks.py` | HookServer — REST API on `127.0.0.1:8999` for external automation |
|
||||
| `src/api_hook_client.py` | Python client for the Hook API (used by tests and external tooling) |
|
||||
| `src/multi_agent_conductor.py` | ConductorEngine — Tier 2 orchestration loop with DAG execution |
|
||||
| `src/dag_engine.py` | TrackDAG (dependency graph) + ExecutionEngine (tick-based state machine) |
|
||||
| `src/models.py` | Ticket, Track, WorkerContext, Metadata, Persona, WorkspaceProfile, etc. |
|
||||
| `src/events.py` | EventEmitter, AsyncEventQueue, UserRequestEvent |
|
||||
| `src/project_manager.py` | TOML config persistence, discussion management, track state |
|
||||
| `src/session_logger.py` | JSON-L + markdown audit trails (comms, tools, CLI, hooks) |
|
||||
| `src/rag_engine.py` | RAG subsystem (ChromaDB + embedding providers) |
|
||||
| `src/beads_client.py` | Beads/Dolt-backed issue tracking client |
|
||||
| `src/hot_reloader.py` | State-preserving module reloader |
|
||||
| `src/personas.py` | Unified agent profile manager |
|
||||
| `src/presets.py` | System prompt preset manager |
|
||||
| `src/context_presets.py` | Context composition preset manager |
|
||||
| `src/tool_presets.py` | Tool preset manager |
|
||||
| `src/tool_bias.py` | Tool bias engine (semantic nudging + dynamic strategy) |
|
||||
| `src/command_palette.py` | Command palette + fuzzy matcher + registry |
|
||||
| `src/commands.py` | 33 registered commands (toggle, theme, layout, AI, project, tools) |
|
||||
| `src/workspace_manager.py` | Workspace profile save/load with scope inheritance |
|
||||
| `src/theme_2.py` | Theme system (palette/font/etc.) |
|
||||
| `src/theme_nerv.py` | NERV Tactical Console theme |
|
||||
| `src/theme_nerv_fx.py` | NERV FX (scanlines, flicker, alert) |
|
||||
| `src/shell_runner.py` | PowerShell execution with 60s timeout, env config, qa_callback + patch_callback for Tier 4 QA |
|
||||
| `src/file_cache.py` | ASTParser (tree-sitter) — skeleton, curated, targeted views |
|
||||
| `src/fuzzy_anchor.py` | Fuzzy anchor slice algorithm |
|
||||
| `src/history.py` | Undo/redo HistoryManager with UISnapshot |
|
||||
| `src/imgui_scopes.py` | ImGui context managers (imscope) for the UI delegation pattern |
|
||||
| `src/performance_monitor.py` | FPS, frame time, CPU, input lag tracking |
|
||||
| `src/log_registry.py` | Session metadata persistence |
|
||||
| `src/log_pruner.py` | Automated log cleanup based on age and whitelist |
|
||||
| `src/paths.py` | Centralized path resolution with environment variable overrides |
|
||||
| `src/cost_tracker.py` | Token cost estimation for API calls |
|
||||
| `src/gemini_cli_adapter.py` | CLI subprocess adapter with session management |
|
||||
| `src/mma_prompts.py` | Tier-specific system prompts for MMA orchestration |
|
||||
| `src/summarize.py` | Heuristic file summaries (imports, classes, functions) |
|
||||
| `src/outline_tool.py` | Hierarchical code outline via stdlib `ast` |
|
||||
| `src/summary_cache.py` | SHA256-keyed summary LRU cache |
|
||||
| `src/markdown_helper.py` | Markdown rendering helpers |
|
||||
| `src/patch_modal.py` | Patch approval modal |
|
||||
| `src/diff_viewer.py` | Diff rendering |
|
||||
| `src/external_editor.py` | External editor integration (VSCode, etc.) |
|
||||
| `src/orchestrator_pm.py` | Orchestrator project manager |
|
||||
| `src/conductor_tech_lead.py` | Tier 2 ticket generation from track briefs |
|
||||
| `src/synthesis_formatter.py` | Multi-take synthesis |
|
||||
| `src/thinking_parser.py` | AI thinking-trace extraction |
|
||||
| `src/mcp_client.py` | 26 MCP tools with filesystem sandboxing and tool dispatch |
|
||||
| `src/api_hooks.py` | HookServer — REST API on `127.0.0.1:8999 for external automation |
|
||||
| `src/api_hook_client.py` | Python client for the Hook API (used by tests and external tooling) |
|
||||
| `src/multi_agent_conductor.py` | ConductorEngine — Tier 2 orchestration loop with DAG execution |
|
||||
| `src/conductor_tech_lead.py` | Tier 2 ticket generation from track briefs |
|
||||
| `src/dag_engine.py` | TrackDAG (dependency graph) + ExecutionEngine (tick-based state machine) |
|
||||
| `src/models.py` | Ticket, Track, WorkerContext, Metadata, Track state |
|
||||
| `src/events.py` | EventEmitter, AsyncEventQueue, UserRequestEvent |
|
||||
| `src/project_manager.py` | TOML config persistence, discussion management, track state |
|
||||
| `src/session_logger.py` | JSON-L + markdown audit trails (comms, tools, CLI, hooks) |
|
||||
| `src/shell_runner.py` | PowerShell execution with timeout, env config, QA callback |
|
||||
| `src/file_cache.py` | ASTParser (tree-sitter) — skeleton, curated, and targeted views |
|
||||
| `src/summarize.py` | Heuristic file summaries (imports, classes, functions) |
|
||||
| `src/outline_tool.py` | Hierarchical code outline via stdlib `ast` |
|
||||
| `src/performance_monitor.py` | FPS, frame time, CPU, input lag tracking |
|
||||
| `src/log_registry.py` | Session metadata persistence |
|
||||
| `src/log_pruner.py` | Automated log cleanup based on age and whitelist |
|
||||
| `src/paths.py` | Centralized path resolution with environment variable overrides |
|
||||
| `src/cost_tracker.py` | Token cost estimation for API calls |
|
||||
| `src/gemini_cli_adapter.py` | CLI subprocess adapter with session management |
|
||||
| `src/mma_prompts.py` | Tier-specific system prompts for MMA orchestration |
|
||||
| `src/theme_*.py` | UI theming (dark, light modes) |
|
||||
|
||||
Simulation modules in `simulation/`:
|
||||
| File | Role |
|
||||
|
||||
@@ -0,0 +1,158 @@
|
||||
# TASKS.md
|
||||
<!-- Quick-read pointer to active and planned conductor tracks -->
|
||||
<!-- Source of truth for task state is conductor/tracks/*/plan.md -->
|
||||
|
||||
## Active Tracks
|
||||
*(none — all planned tracks queued below)*
|
||||
*See tracks.md for active track status*
|
||||
|
||||
## Completed This Session
|
||||
*(See archive: strict_execution_queue_completed_20260306)*
|
||||
|
||||
---
|
||||
|
||||
#### 0. conductor_path_configurable_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** CRITICAL
|
||||
- **Goal:** Eliminate hardcoded conductor paths. Make path configurable via config.toml or CONDUCTOR_DIR env var. Allow running app to use separate directory from development tracks.
|
||||
|
||||
## Phase 3: Future Horizons (Tracks 1-20)
|
||||
*Initialized: 2026-03-06*
|
||||
|
||||
### Architecture & Backend
|
||||
|
||||
#### 1. true_parallel_worker_execution_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** High
|
||||
- **Goal:** Implement true concurrency for the DAG engine. Once threading.local() is in place, the ExecutionEngine should spawn independent Tier 3 workers in parallel (e.g., 4 workers handling 4 isolated tests simultaneously). Requires strict file-locking or a Git-based diff-merging strategy to prevent AST collision.
|
||||
|
||||
#### 2. deep_ast_context_pruning_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** High
|
||||
- **Goal:** Before dispatching a Tier 3 worker, use tree_sitter to automatically parse the target file AST, strip out unrelated function bodies, and inject a surgically condensed skeleton into the worker prompt. Guarantees the AI only sees what it needs to edit, drastically reducing token burn.
|
||||
|
||||
#### 3. visual_dag_ticket_editing_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** Medium
|
||||
- **Goal:** Replace the linear ticket list in the GUI with an interactive Node Graph using ImGui Bundle node editor. Allow the user to visually drag dependency lines, split nodes, or delete tasks before clicking Execute Pipeline.
|
||||
|
||||
#### 4. tier4_auto_patching_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** Medium
|
||||
- **Goal:** Elevate Tier 4 from a log summarizer to an auto-patcher. When a verification test fails, Tier 4 generates a .patch file. The GUI intercepts this and presents a side-by-side Diff Viewer. The user clicks Apply Patch to instantly resume the pipeline.
|
||||
|
||||
#### 5. native_orchestrator_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** Low
|
||||
- **Goal:** Absorb the Conductor extension entirely into the core application. Manual Slop should natively read/write plan.md, manage the metadata.json, and orchestrate the MMA tiers in pure Python, removing the dependency on external CLI shell executions (mma_exec.py).
|
||||
|
||||
---
|
||||
|
||||
### GUI Overhauls & Visualizations
|
||||
|
||||
#### 6. cost_token_analytics_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** High
|
||||
- **Goal:** Real-time cost tracking panel displaying cost per model, session totals, and breakdown by tier. Uses existing cost_tracker.py which is implemented but has no GUI.
|
||||
|
||||
#### 7. performance_dashboard_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** High
|
||||
- **Goal:** Expand performance metrics panel with CPU/RAM usage, frame time, input lag with historical graphs. Uses existing performance_monitor.py which has basic metrics but no detailed visualization.
|
||||
|
||||
#### 8. mma_multiworker_viz_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** High
|
||||
- **Goal:** Split-view GUI for parallel worker streams per tier. Visualize multiple concurrent workers with individual status, output tabs, and resource usage. Enable kill/restart per worker.
|
||||
|
||||
#### 9. cache_analytics_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** Medium
|
||||
- **Goal:** Gemini cache hit/miss visualization, memory usage, TTL status display. Uses existing ai_client.get_gemini_cache_stats() which is not displayed in GUI.
|
||||
|
||||
#### 10. tool_usage_analytics_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** Medium
|
||||
- **Goal:** Analytics panel showing most-used tools, average execution time, and failure rates. Uses existing tool_log_callback data.
|
||||
|
||||
#### 11. session_insights_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** Medium
|
||||
- **Goal:** Token usage over time, cost projections, session summary with efficiency scores. Visualize session_logger data.
|
||||
|
||||
#### 12. track_progress_viz_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** Medium
|
||||
- **Goal:** Progress bars and percentage completion for active tracks and tickets. Better visualization of DAG execution state.
|
||||
|
||||
#### 13. manual_skeleton_injection_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** Medium
|
||||
- **Goal:** Add UI controls to manually flag files for skeleton injection in discussions. Allow agent to request full file reads or specific def/class definitions on-demand.
|
||||
|
||||
#### 14. on_demand_def_lookup_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** Medium
|
||||
- **Goal:** Add ability for agent to request specific class/function definitions during discussion. User can @mention a symbol and get its full definition inline.
|
||||
|
||||
---
|
||||
|
||||
### Manual UX Controls
|
||||
|
||||
#### 15. ticket_queue_mgmt_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** High
|
||||
- **Goal:** Allow user to manually reorder, prioritize, or requeue tickets in the DAG. Add drag-drop reordering, priority tags, and bulk selection.
|
||||
|
||||
#### 16. kill_abort_workers_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** High
|
||||
- **Goal:** Add ability to kill/abort a running Tier 3 worker mid-execution. Currently workers run to completion; add cancel button.
|
||||
|
||||
#### 17. manual_block_control_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** Medium
|
||||
- **Goal:** Allow user to manually block or unblock tickets with custom reasons. Currently blocked tickets rely on dependency resolution; add manual override.
|
||||
|
||||
#### 18. pipeline_pause_resume_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** Medium
|
||||
- **Goal:** Add global pause/resume for the entire DAG execution pipeline. Allow user to freeze all worker activity and resume later.
|
||||
|
||||
#### 19. per_ticket_model_20260306
|
||||
- **Status:** Planned
|
||||
- **Priority:** Low
|
||||
- **Goal:** Allow user to manually select which model to use for a specific ticket, overriding the default tier model.
|
||||
|
||||
#### 20. manual_ux_validation_20260302
|
||||
- **Status:** Planned
|
||||
- **Priority:** Medium
|
||||
- **Goal:** Interactive human-in-the-loop track to review and adjust GUI UX, animations, popups, and layout structures.
|
||||
|
||||
---
|
||||
|
||||
### C/C++ Language Support
|
||||
|
||||
#### 25. ts_cpp_tree_sitter_20260308
|
||||
- **Status:** Planned
|
||||
- **Priority:** High
|
||||
- **Goal:** Add tree-sitter C and C++ grammars. Extend ASTParser to support C/C++ skeleton and outline extraction. Add MCP tools ts_c_get_skeleton, ts_cpp_get_skeleton, ts_c_get_code_outline, ts_cpp_get_code_outline.
|
||||
|
||||
#### 26. gencpp_python_bindings_20260308
|
||||
- **Status:** Planned
|
||||
- **Priority:** Medium
|
||||
- **Goal:** Bootstrap standalone Python project with CFFI bindings for gencpp C library. Provides foundation for richer C++ AST parsing in future (beyond tree-sitter syntax).
|
||||
|
||||
---
|
||||
|
||||
### Path Configuration
|
||||
|
||||
#### 27. project_conductor_dir_20260308
|
||||
- **Status:** Planned
|
||||
- **Priority:** High
|
||||
- **Goal:** Make conductor directory per-project. Each project TOML can specify custom conductor dir for isolated track/state management. Extends existing global path config.
|
||||
|
||||
#### 28. gui_path_config_20260308
|
||||
- **Status:** Planned
|
||||
- **Priority:** High
|
||||
- **Goal:** Add path configuration UI to Context Hub. Allow users to view and edit configurable paths (conductor, logs, scripts) directly from the GUI.
|
||||
-133
@@ -1,133 +0,0 @@
|
||||
"""Manually start sloppy.py, then run the test against the same GUI process."""
|
||||
import subprocess
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
import socket
|
||||
from pathlib import Path
|
||||
|
||||
# Start sloppy.py
|
||||
project_root = Path("C:/projects/manual_slop").absolute()
|
||||
gui_script = project_root / "sloppy.py"
|
||||
test_workspace = project_root / "tests" / "artifacts" / "live_gui_workspace"
|
||||
|
||||
# Clean up old workspace
|
||||
if test_workspace.exists():
|
||||
import shutil
|
||||
for _ in range(5):
|
||||
try:
|
||||
shutil.rmtree(test_workspace)
|
||||
break
|
||||
except PermissionError:
|
||||
time.sleep(0.5)
|
||||
|
||||
test_workspace.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Create minimal files
|
||||
(test_workspace / "manual_slop.toml").write_text("[project]\nname = 'TestProject'\n\n[conductor]\ndir = 'conductor'\n", encoding="utf-8")
|
||||
(test_workspace / "conductor" / "tracks").mkdir(parents=True, exist_ok=True)
|
||||
|
||||
config_content = {
|
||||
'ai': {'provider': 'gemini', 'model': 'gemini-2.5-flash-lite'},
|
||||
'projects': {
|
||||
'paths': [str((test_workspace / 'manual_slop.toml').absolute())],
|
||||
'active': str((test_workspace / 'manual_slop.toml').absolute())
|
||||
},
|
||||
'paths': {
|
||||
'logs_dir': str((test_workspace / "logs").absolute()),
|
||||
'scripts_dir': str((test_workspace / "scripts" / "generated").absolute())
|
||||
},
|
||||
}
|
||||
import tomli_w
|
||||
with open(test_workspace / 'config.toml', 'wb') as f:
|
||||
tomli_w.dump(config_content, f)
|
||||
|
||||
# Start sloppy.py
|
||||
os.makedirs("logs", exist_ok=True)
|
||||
log_file = open("logs/sloppy_py_test_2.log", "w", encoding="utf-8")
|
||||
env = os.environ.copy()
|
||||
env["PYTHONPATH"] = str(project_root.absolute())
|
||||
env["SLOP_CONFIG"] = str((test_workspace / "config.toml").absolute())
|
||||
env["SLOP_GLOBAL_PRESETS"] = str((test_workspace / "presets.toml").absolute())
|
||||
env["SLOP_GLOBAL_TOOL_PRESETS"] = str((test_workspace / "tool_presets.toml").absolute())
|
||||
|
||||
print("Starting sloppy.py...")
|
||||
proc = subprocess.Popen(
|
||||
["uv", "run", "python", "-u", str(gui_script), "--enable-test-hooks"],
|
||||
stdout=log_file,
|
||||
stderr=log_file,
|
||||
text=True,
|
||||
cwd=str(test_workspace.absolute()),
|
||||
env=env,
|
||||
creationflags=subprocess.CREATE_NEW_PROCESS_GROUP if os.name == 'nt' else 0
|
||||
)
|
||||
print(f"Started PID: {proc.pid}")
|
||||
|
||||
# Wait for hook server
|
||||
import requests
|
||||
for i in range(30):
|
||||
try:
|
||||
resp = requests.get("http://127.0.0.1:8999/status", timeout=0.5)
|
||||
if resp.status_code == 200:
|
||||
print(f"Hook server ready after {i*0.5}s")
|
||||
break
|
||||
except Exception:
|
||||
time.sleep(0.5)
|
||||
else:
|
||||
print("Hook server didn't start!")
|
||||
proc.kill()
|
||||
sys.exit(1)
|
||||
|
||||
# Wait extra for imgui to fully initialize
|
||||
print("Waiting 3s for imgui to stabilize...")
|
||||
time.sleep(3.0)
|
||||
|
||||
# Now run the actual test flow
|
||||
from src.api_hook_client import ApiHookClient
|
||||
client = ApiHookClient()
|
||||
|
||||
print("\n[1] set_value show_windows {Diagnostics: True}")
|
||||
client.set_value('show_windows', {'Diagnostics': True})
|
||||
time.sleep(1.0)
|
||||
|
||||
print("\n[2] push_event save_workspace_profile")
|
||||
client.push_event('custom_callback', {'callback': 'save_workspace_profile', 'args': ['Tier3Profile', 'project']})
|
||||
time.sleep(1.0)
|
||||
|
||||
print("\n[3] set_value show_windows {Diagnostics: False}")
|
||||
client.set_value('show_windows', {'Diagnostics': False})
|
||||
|
||||
print("\n[4] set_value ui_auto_switch_layout")
|
||||
client.set_value('ui_auto_switch_layout', True)
|
||||
|
||||
print("\n[5] set_value ui_tier_layout_bindings")
|
||||
client.set_value('ui_tier_layout_bindings', {'Tier 1': '', 'Tier 2': '', 'Tier 3': 'Tier3Profile', 'Tier 4': ''})
|
||||
|
||||
def trigger_tier(tier):
|
||||
client.push_event("mma_state_update", {"status": "running", "active_tier": tier})
|
||||
|
||||
print("\n[6] trigger Tier 2")
|
||||
trigger_tier('Tier 2 (Tech Lead)')
|
||||
time.sleep(1.0)
|
||||
val = client.get_value('show_windows')
|
||||
print(f"[after Tier 2] show_windows: {val!r}")
|
||||
assert val is not None, "show_windows is None"
|
||||
assert val.get('Diagnostics', False) == False, f"Expected False, got {val}"
|
||||
|
||||
print("\n[7] trigger Tier 3")
|
||||
trigger_tier('Tier 3 (Worker): task-1')
|
||||
time.sleep(1.0)
|
||||
val = client.get_value('show_windows')
|
||||
print(f"[after Tier 3] show_windows: {val!r}")
|
||||
assert val.get('Diagnostics', False) == True, f"Expected True, got {val}"
|
||||
|
||||
print("\nALL ASSERTIONS PASSED!")
|
||||
|
||||
# Cleanup
|
||||
print("Killing sloppy.py...")
|
||||
proc.kill()
|
||||
try:
|
||||
proc.wait(timeout=5)
|
||||
except:
|
||||
pass
|
||||
log_file.close()
|
||||
@@ -0,0 +1,9 @@
|
||||
|
||||
import sys
|
||||
import os
|
||||
try:
|
||||
from imgui_bundle import hello_imgui
|
||||
rp = hello_imgui.RunnerParams()
|
||||
print(f"Default borderless: {rp.app_window_params.borderless}")
|
||||
except Exception as e:
|
||||
print(f"Error: {e}")
|
||||
@@ -1,17 +0,0 @@
|
||||
{
|
||||
"name": "aggregation_smarter_summaries",
|
||||
"created": "2026-03-22",
|
||||
"status": "future",
|
||||
"priority": "medium",
|
||||
"affected_files": [
|
||||
"src/aggregate.py",
|
||||
"src/file_cache.py",
|
||||
"src/ai_client.py",
|
||||
"src/models.py"
|
||||
],
|
||||
"related_tracks": [
|
||||
"discussion_hub_panel_reorganization (in_progress)",
|
||||
"system_context_exposure (future)"
|
||||
],
|
||||
"notes": "Deferred from discussion_hub_panel_reorganization planning. Improves aggregation with sub-agent summarization and hash-based caching."
|
||||
}
|
||||
@@ -1,49 +0,0 @@
|
||||
# Implementation Plan: Smarter Aggregation with Sub-Agent Summarization
|
||||
|
||||
## Phase 1: Hash-Based Summary Cache [checkpoint: e972cf4]
|
||||
Focus: Implement file hashing and cache storage
|
||||
|
||||
- [x] Task: Research existing file hash implementations in codebase 3218104
|
||||
- [x] Task: Design cache storage format (file-based vs project state) 3218104
|
||||
- [x] Task: Implement hash computation for aggregation files 3218104
|
||||
- [x] Task: Implement summary cache storage and retrieval 3218104
|
||||
- [x] Task: Add cache invalidation when file content changes 3218104
|
||||
- [x] Task: Write tests for hash computation and cache 3218104
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 1: Hash-Based Summary Cache' e972cf4
|
||||
|
||||
## Phase 2: Sub-Agent Summarization [checkpoint: 7efcc7c]
|
||||
Focus: Implement sub-agent summarization during aggregation
|
||||
|
||||
- [x] Task: Audit current aggregate.py flow 3218104
|
||||
- [x] Task: Define summarization prompt strategy for code vs text files 3218104
|
||||
- [x] Task: Implement sub-agent invocation during aggregation 3218104
|
||||
- [x] Task: Handle provider-specific differences in sub-agent calls 3218104
|
||||
- [x] Task: Write tests for sub-agent summarization 3218104
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 2: Sub-Agent Summarization' 7efcc7c
|
||||
|
||||
## Phase 3: Tiered Aggregation Strategy [checkpoint: fa00a84]
|
||||
Focus: Respect tier-level aggregation configuration
|
||||
|
||||
- [x] Task: Audit how tiers receive context currently 628b580
|
||||
- [x] Task: Implement tier-level aggregation strategy selection 628b580
|
||||
- [x] Task: Connect tier strategy to Persona configuration 628b580
|
||||
- [x] Task: Write tests for tiered aggregation 628b580
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 3: Tiered Aggregation Strategy' fa00a84
|
||||
|
||||
## Phase 4: UI Integration [checkpoint: a1c204f]
|
||||
Focus: Expose cache status and controls in UI
|
||||
|
||||
- [x] Task: Add cache status indicator to Files & Media panel 6bf6c79
|
||||
- [x] Task: Add "Clear Summary Cache" button 6bf6c79
|
||||
- [x] Task: Add aggregation configuration to Project Settings or AI Settings 6bf6c79
|
||||
- [x] Task: Write tests for UI integration 6bf6c79
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 4: UI Integration' a1c204f
|
||||
|
||||
## Phase 5: Cache Persistence & Optimization [checkpoint: e0737dc]
|
||||
Focus: Ensure cache persists and is performant
|
||||
|
||||
- [x] Task: Implement persistent cache storage to disk fb2df2a
|
||||
- [x] Task: Add cache size management (max entries, LRU) fb2df2a
|
||||
- [x] Task: Performance testing with large codebases fb2df2a
|
||||
- [x] Task: Write tests for persistence fb2df2a
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 5: Cache Persistence & Optimization' e0737dc
|
||||
@@ -1,103 +0,0 @@
|
||||
# Specification: Smarter Aggregation with Sub-Agent Summarization
|
||||
|
||||
## 1. Overview
|
||||
|
||||
This track improves the context aggregation system to use sub-agent passes for intelligent summarization and hash-based caching to avoid redundant work.
|
||||
|
||||
**Current Problem:**
|
||||
- Aggregation is a simple pass that either injects full file content or a basic skeleton
|
||||
- No intelligence applied to determine what level of detail is needed
|
||||
- Same files get re-summarized on every discussion start even if unchanged
|
||||
|
||||
**Goal:**
|
||||
- Use a sub-agent during aggregation pass for high-tier agents to generate succinct summaries
|
||||
- Cache summaries based on file hash - only re-summarize if file changed
|
||||
- Smart outline generation for code files, summary for text files
|
||||
|
||||
## 2. Current State Audit
|
||||
|
||||
### Existing Aggregation Behavior
|
||||
- `aggregate.py` handles context aggregation
|
||||
- `file_cache.py` provides AST parsing and skeleton generation
|
||||
- Per-file flags: `Auto-Aggregate` (summarize), `Force Full` (inject raw)
|
||||
- No caching of summarization results
|
||||
|
||||
### Provider API Considerations
|
||||
- Different providers have different prompt/caching mechanisms
|
||||
- Need to verify how each provider handles system context and caching
|
||||
- May need provider-specific aggregation strategies
|
||||
|
||||
## 3. Functional Requirements
|
||||
|
||||
### 3.1 Hash-Based Summary Cache
|
||||
- Generate SHA256 hash of file content
|
||||
- Store summaries in a cache (file-based or in project state)
|
||||
- Before summarizing, check if file hash matches cached summary
|
||||
- Cache invalidation when file content changes
|
||||
|
||||
### 3.2 Sub-Agent Summarization Pass
|
||||
- During aggregation, optionally invoke sub-agent for summarization
|
||||
- Sub-agent generates concise summary of file purpose and key points
|
||||
- Different strategies for:
|
||||
- Code files: AST-based outline + key function signatures
|
||||
- Text files: Paragraph-level summary
|
||||
- Config files: Key-value extraction
|
||||
|
||||
### 3.3 Tiered Aggregation Strategy
|
||||
- Tier 3/4 workers: Get skeleton outlines (fast, cheap)
|
||||
- Tier 2 (Tech Lead): Get summaries with key details
|
||||
- Tier 1 (Orchestrator): May get full content or enhanced summaries
|
||||
- Configurable per-agent via Persona
|
||||
|
||||
### 3.4 Cache Persistence
|
||||
- Summaries persist across sessions
|
||||
- Stored in project directory or centralized cache location
|
||||
- Manual cache clear option in UI
|
||||
|
||||
## 4. Data Model
|
||||
|
||||
### 4.1 Summary Cache Entry
|
||||
```python
|
||||
{
|
||||
"file_path": str,
|
||||
"file_hash": str, # SHA256 of content
|
||||
"summary": str,
|
||||
"outline": str, # For code files
|
||||
"generated_at": str, # ISO timestamp
|
||||
"generator_tier": str, # Which tier generated it
|
||||
}
|
||||
```
|
||||
|
||||
### 4.2 Aggregation Config
|
||||
```toml
|
||||
[aggregation]
|
||||
default_mode = "summarize" # "full", "summarize", "outline"
|
||||
cache_enabled = true
|
||||
cache_dir = ".slop_cache"
|
||||
```
|
||||
|
||||
## 5. UI Changes
|
||||
|
||||
- Add "Clear Summary Cache" button in Files & Media or Context Composition
|
||||
- Show cached status indicator on files (similar to AST cache indicator)
|
||||
- Configuration in AI Settings or Project Settings
|
||||
|
||||
## 6. Acceptance Criteria
|
||||
|
||||
- [ ] File hash computed before summarization
|
||||
- [ ] Summary cache persists across app restarts
|
||||
- [ ] Sub-agent generates better summaries than basic skeleton
|
||||
- [ ] Aggregation respects tier-level configuration
|
||||
- [ ] Cache can be manually cleared
|
||||
- [ ] Provider APIs handle aggregated context correctly
|
||||
|
||||
## 7. Out of Scope
|
||||
- Changes to provider API internals
|
||||
- Vector store / embeddings for RAG (separate track)
|
||||
- Changes to Session Hub / Discussion Hub layout
|
||||
|
||||
## 8. Dependencies
|
||||
- `aggregate.py` - main aggregation logic
|
||||
- `file_cache.py` - AST parsing and caching
|
||||
- `ai_client.py` - sub-agent invocation
|
||||
- `models.py` - may need new config structures
|
||||
@@ -1,23 +0,0 @@
|
||||
# AI Loop: Optimization & Consolidation Targets
|
||||
|
||||
Based on the technical trace and sequence mapping of the AI interaction loop, the following areas are identified as primary targets for "Heavy Curation".
|
||||
|
||||
### 1. Unified Provider Loop (`ai_client.py`)
|
||||
- **Observation:** `_send_anthropic`, `_send_gemini`, and `_send_gemini_cli` all implement their own `for r_idx in range(MAX_TOOL_ROUNDS + 2)` loops.
|
||||
- **Problem:** Significant boilerplate duplication for tool execution, error handling, and file re-reading.
|
||||
- **Curation Goal:** Refactor the multi-turn recursion into a single `_base_send_loop` method that takes a provider-specific `generate_turn` callback.
|
||||
|
||||
### 2. Threading Model Management (`app_controller.py`)
|
||||
- **Observation:** `_process_event_queue` spawns a new `threading.Thread` for every `user_request`.
|
||||
- **Problem:** Potential for thread explosion if multiple asynchronous requests are triggered rapidly (though rare in typical usage).
|
||||
- **Curation Goal:** Consolidate into a single dedicated "AI Worker" thread with a task queue, or use a small `ThreadPoolExecutor` to manage background lifetimes.
|
||||
|
||||
### 3. Redundant Context Markers
|
||||
- **Observation:** `_FILE_REFRESH_MARKER` and `_get_context_marker()` are used in multiple places to inject diffs.
|
||||
- **Problem:** String duplication and fragmented logic for deciding when to "refresh" the AI's file context.
|
||||
- **Curation Goal:** Centralize the context-refresh injection logic within the `aggregate` module or a dedicated `ContextRefresher` class.
|
||||
|
||||
### 4. Blocking Call Audit
|
||||
- **Observation:** `asyncio.run_coroutine_threadsafe(...).result()` is used to call async tool logic from the sync worker thread.
|
||||
- **Problem:** This bridge is technically correct but adds complexity.
|
||||
- **Curation Goal:** If possible, move more of the AI loop logic into a proper `async` context to avoid the `.result()` blocking pattern.
|
||||
@@ -1,86 +0,0 @@
|
||||
# AI Interaction Pipeline: Intensive Technical Trace
|
||||
|
||||
This document provides a low-level technical trace of the AI interaction loop, following a pipeline-oriented architectural model. It identifies thread context switches, data transformation overhead, and synchronization bottlenecks.
|
||||
|
||||
## 1. Sequence Diagram: Asynchronous Interaction Pipeline
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
autonumber
|
||||
participant UI as gui_2.py (Main/Render Thread)
|
||||
participant EV as app_controller.py (Event Dispatcher)
|
||||
participant WK as ai_client.py (Worker Thread Pool)
|
||||
participant AI as ai_client.py (Provider Pipeline)
|
||||
participant MCP as mcp_client.py (FileSystem Pipeline)
|
||||
participant SR as shell_runner.py (Subprocess Pipeline)
|
||||
|
||||
Note over UI, WK: [Phase A: Request Initiation]
|
||||
UI->>EV: SyncEventQueue.put("user_request", dict)
|
||||
Note right of UI: Data: Raw Prompt + Context Pointers
|
||||
EV->>EV: polling loop (event_queue.get())
|
||||
EV->>WK: threading.Thread(target=_handle_request_event).start()
|
||||
Note right of EV: Context Switch: Event Thread -> AI Worker Thread
|
||||
|
||||
Note over WK, AI: [Phase B: Context Synthesis & Generation]
|
||||
WK->>AI: ai_client.send(md_content, history)
|
||||
AI->>AI: _build_chunked_context_blocks()
|
||||
Note right of AI: Perf: O(N) string concatenation + regex scans
|
||||
AI->>Vendor: Provider API Request (HTTPS/JSON)
|
||||
Note right of AI: Bottleneck: Network Latency (1-30s)
|
||||
Vendor-->>AI: ToolCall(s) or StopReason
|
||||
|
||||
Note over AI, SR: [Phase C: Multi-Turn Tool Execution Loop]
|
||||
loop MAX_TOOL_ROUNDS (r_idx <= 10)
|
||||
alt Tool Use Detected
|
||||
AI->>WK: _execute_tool_calls_concurrently()
|
||||
|
||||
alt Read-Only (MCP)
|
||||
WK->>MCP: read_file / list_dir / search
|
||||
MCP-->>WK: stdout_string
|
||||
else Mutating (Shell)
|
||||
WK->>EV: _pending_gui_tasks.append(approval_modal)
|
||||
Note over UI: UI Polling Detects Task
|
||||
UI->>UI: Render ImGui Popup (Wait for HITL)
|
||||
Note over UI: User Approval Interaction
|
||||
UI-->>WK: threading.Condition.notify()
|
||||
Note right of WK: Resume AI Worker Thread
|
||||
WK->>SR: run_powershell(script)
|
||||
SR->>OS: Subprocess Spawn (powershell.exe)
|
||||
OS-->>SR: stdout/stderr (JSON-L Stream)
|
||||
SR-->>WK: COMBINED_OUTPUT_STRING
|
||||
end
|
||||
|
||||
WK-->>AI: Aggregate Tool Results
|
||||
AI->>AI: _reread_file_items() (Context Refresh)
|
||||
Note right of AI: Perf: IO Bound (File MTime Scans)
|
||||
AI->>Vendor: Follow-up Prompt (with Tool Result)
|
||||
else Terminal Text
|
||||
AI-->>WK: Final AI Response Text
|
||||
end
|
||||
end
|
||||
|
||||
Note over WK, UI: [Phase D: Result Synchronization]
|
||||
WK->>EV: SyncEventQueue.put("response", result)
|
||||
EV->>EV: _pending_gui_tasks.append(response_obj)
|
||||
loop Every Frame (~16.6ms)
|
||||
UI->>EV: _process_pending_gui_tasks()
|
||||
Note right of UI: Data Copy: Controller State -> UI History Buffer
|
||||
UI->>UI: Update Rendering State (Markdown/Syntax Highlight)
|
||||
end
|
||||
```
|
||||
|
||||
## 2. Technical Performance Audit
|
||||
|
||||
### 2.1 Threading & Synchronization
|
||||
- **Context Switches:** The pipeline traverses four distinct execution contexts: Main Thread -> Event Thread -> Daemon Worker -> Subprocess.
|
||||
- **Lock Contention:** `_pending_gui_tasks_lock` is acquired twice per AI response turn (once by background thread to append, once by UI thread to process).
|
||||
- **Blocking Sites:** `ai_client.send` blocks the dedicated `WK` thread. `_confirm_and_run` blocks the `WK` thread using a `Condition` variable waiting on UI input.
|
||||
|
||||
### 2.2 Data Transformation Costs
|
||||
- **Context Bloat:** `md_content` is a monolithic string. During synthesis, this string is often copied or chunked (`_chunk_text`), increasing memory pressure on the Python heap.
|
||||
- **Serialization Overhead:** Every tool call involves: Python dict -> JSON String -> Subprocess Stdin -> (Tools) -> Subprocess Stdout -> JSON String -> Python dict.
|
||||
|
||||
### 2.3 Curation Targets (Intensive)
|
||||
1. **Reduce Memory Copies:** The monolithic Markdown context should be handled as a stream or a shared buffer to avoid redundant copies between `aggregate` and `ai_client`.
|
||||
2. **Deterministic Status Polling:** Replace string-based status polling (`ai_status`) with an enum-based state machine to reduce regex comparisons in the simulator and UI.
|
||||
3. **Subprocess Pooling:** `shell_runner` spawns a new process for every script. For high-frequency tool use, a persistent PowerShell session could reduce overhead.
|
||||
@@ -1,8 +0,0 @@
|
||||
{
|
||||
"track_id": "ai_interaction_call_graph_20260507",
|
||||
"type": "chore",
|
||||
"status": "new",
|
||||
"created_at": "2026-05-07T16:00:00Z",
|
||||
"updated_at": "2026-05-07T16:00:00Z",
|
||||
"description": "Exhaustive function-to-function call graph tracing the AI loop from request to terminal execution."
|
||||
}
|
||||
@@ -1,22 +0,0 @@
|
||||
# Implementation Plan: AI Interaction Call Graph (ai_interaction_call_graph_20260507)
|
||||
|
||||
## Phase 1: Trace Mapping
|
||||
- [x] Task: Use `py_find_usages` to trace `ai_client.send` callers and callees.
|
||||
- [x] Task: Map the asynchronous hand-off from `AppController` to the AI worker threads.
|
||||
- [x] Task: Trace the recursion depth of the tool-call loop (`MAX_TOOL_ROUNDS`).
|
||||
|
||||
## Phase 2: Documentation & Synthesis
|
||||
- [x] Task: Create a high-fidelity Mermaid sequence diagram of the entire loop.
|
||||
- [x] Task: Identify specific areas for logic consolidation or performance optimization.
|
||||
|
||||
## Phase 3: Automated Path Derivation Tooling
|
||||
- [x] Task: Develop `derive_code_path` MCP tool using tree-sitter.
|
||||
- [~] Task: Implement cross-file call-chain tracing and data hand-off detection.
|
||||
- [ ] Task: Verify tool output against the manual AI Loop trace.
|
||||
|
||||
## Phase 4: Comprehensive Pipeline Mapping
|
||||
- [x] Task: Map the **Context Aggregation Pipeline** using the new tool.
|
||||
- [x] Task: Map the **GUI Event & State Synchronization** pipeline.
|
||||
- [x] Task: Map the **Simulation Lifecycle** and turn-loop.
|
||||
- [x] Task: Consolidate all intensive traces into a final Phase 5 Architectural Audit.
|
||||
- [x] Task: Conductor - User Manual Verification 'Final Audit' (Protocol in workflow.md)
|
||||
@@ -1,22 +0,0 @@
|
||||
# Specification: AI Interaction Call Graph (ai_interaction_call_graph_20260507)
|
||||
|
||||
## Overview
|
||||
A low-level technical trace of the AI interaction loop. The goal is to map every single function call and data hand-off from the moment a user message is sent to the final terminal execution of a PowerShell script or tool result.
|
||||
|
||||
## Scope
|
||||
- **Entry Point:** `src/gui_2.py:App._render_discussion_panel` (Send button action).
|
||||
- **Subsystems:** `ai_client.py`, `mcp_client.py`, `shell_runner.py`, `app_controller.py`.
|
||||
|
||||
## Functional Requirements
|
||||
1. **Call Graph Generation:**
|
||||
- Document the sequence of synchronous and asynchronous calls.
|
||||
- Identify thread boundaries (GUI thread vs. Background worker thread).
|
||||
2. **Data Transformation Trace:**
|
||||
- Track the transformation of a message: raw text -> GenerateRequest -> AI History -> Provider Prompt -> AI Response -> Tool Call -> PS Script.
|
||||
3. **Error & Retry Paths:**
|
||||
- Map how exceptions are caught, classified, and bubbled back to the UI.
|
||||
|
||||
## Acceptance Criteria
|
||||
- [ ] Detailed call graph in Mermaid format.
|
||||
- [ ] List of all internal private methods involved in the loop.
|
||||
- [ ] Identification of any blocking calls in the async pipeline.
|
||||
@@ -1,64 +0,0 @@
|
||||
# AppController Extraction List
|
||||
|
||||
## 1. Move to `src/models.py`
|
||||
- `GenerateRequest` (BaseModel)
|
||||
- `ConfirmRequest` (BaseModel)
|
||||
|
||||
## 2. Extraction to Module Level (Functions taking `controller: AppController`)
|
||||
|
||||
### From `create_api`
|
||||
- `get_api_key`
|
||||
- `health`
|
||||
- `get_gui_state`
|
||||
- `get_mma_status`
|
||||
- `post_gui`
|
||||
- `get_api_session`
|
||||
- `post_api_session`
|
||||
- `get_api_project`
|
||||
- `get_performance`
|
||||
- `get_diagnostics`
|
||||
- `status`
|
||||
- `generate`
|
||||
- `stream`
|
||||
- `pending_actions`
|
||||
- `confirm_action`
|
||||
- `list_sessions`
|
||||
- `get_session`
|
||||
- `delete_session`
|
||||
- `get_context`
|
||||
- `token_stats`
|
||||
|
||||
### From `_process_pending_gui_tasks` (Handlers)
|
||||
- `_handle_refresh_api_metrics`
|
||||
- `_handle_set_ai_status`
|
||||
- `_handle_set_mma_status`
|
||||
- `_handle_ai_response`
|
||||
- `_handle_mma_state_update`
|
||||
- `_handle_set_value`
|
||||
- `_handle_click`
|
||||
- `_handle_drag`
|
||||
- `_handle_right_click`
|
||||
- `_handle_select_list_item`
|
||||
- `_handle_ask_dialog`
|
||||
- `_handle_custom_callback`
|
||||
- `_handle_mma_step_approval`
|
||||
- `_handle_mma_spawn_approval`
|
||||
- `_handle_ticket_started`
|
||||
- `_handle_ticket_completed`
|
||||
- `_handle_bead_updated`
|
||||
|
||||
### From `cb_load_prior_log`
|
||||
- `_resolve_log_ref`
|
||||
|
||||
## 3. Extraction to Module Level (Independent Utilities)
|
||||
- `parse_symbols` (Already module level)
|
||||
- `get_symbol_definition` (Already module level)
|
||||
- `_extract_tool_name`
|
||||
- `_offload_entry_payload`
|
||||
|
||||
## 4. Classes to Top-Level
|
||||
- `ConfirmDialog`
|
||||
- `MMAApprovalDialog`
|
||||
- `MMASpawnApprovalDialog`
|
||||
- `AutoStepDialog` (From `_process_pending_gui_tasks`)
|
||||
- `AutoSpawnDialog` (From `_process_pending_gui_tasks`)
|
||||
@@ -1,7 +0,0 @@
|
||||
{
|
||||
"track_id": "app_controller_curation_20260513",
|
||||
"title": "AppController Curation & Structural Alignment",
|
||||
"status": "in_progress",
|
||||
"initialized": "2026-05-13",
|
||||
"goal": "Curate src/app_controller.py to match gui_2.py organization and enforce Python style conventions."
|
||||
}
|
||||
@@ -1,19 +0,0 @@
|
||||
# Implementation Plan: AppController Curation [checkpoint: fa4388b]
|
||||
|
||||
## Phase 1: Structural Audit & Conventions Update [checkpoint: 511aabb]
|
||||
- [x] Task: Audit `src/app_controller.py` against `gui_2.py` organization and the Python Style Guide. [511aabb]
|
||||
- [x] Task: Identify methods for extraction to module level (Anti-OOP enforcement). [511aabb]
|
||||
- [x] Task: Update `conductor/code_styleguides/python.md` or `product-guidelines.md` if any new nuances are discovered in `gui_2.py`. [511aabb]
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 1: Structural Audit' (Protocol in workflow.md) [511aabb]
|
||||
|
||||
## Phase 2: Refactoring & Curation [checkpoint: fa4388b]
|
||||
- [x] Task: Apply 1-space indentation and remove excessive blank lines in `src/app_controller.py`. [fa4388b]
|
||||
- [x] Task: Clean up and organize `AppController.__init__` state declarations. [fa4388b]
|
||||
- [x] Task: Implement missing type hints and SDM tags. [fa4388b]
|
||||
- [x] Task: Extract identified logic to module-level functions. [fa4388b]
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 2: Refactoring & Curation' (Protocol in workflow.md) [fa4388b]
|
||||
|
||||
## Phase 3: Validation & Regression Testing [checkpoint: fa4388b]
|
||||
- [x] Task: Run the full test suite in batches of 4 files per test run. [fa4388b]
|
||||
- [x] Task: Fix any regressions or type errors discovered during testing. [fa4388b]
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 3: Validation & Regression Testing' (Protocol in workflow.md) [fa4388b]
|
||||
@@ -1,21 +0,0 @@
|
||||
# Specification: AppController Curation & Structural Alignment
|
||||
|
||||
## Context
|
||||
Following the successful cleanup and refactoring of `gui_2.py`, the same organizational patterns and AI-optimized coding conventions must be applied to `src/app_controller.py`. This module is a critical part of the Manual Slop architecture, acting as the bridge between the GUI and the underlying AI/MCP systems.
|
||||
|
||||
## Goals
|
||||
1. **Structural Parity:** Reorganize `src/app_controller.py` to match the structure and quality of `gui_2.py`.
|
||||
2. **Standardization:** Enforce the AI-Optimized Python Style Guide (1-space indent, minimal blank lines, type hints, SDM tags).
|
||||
3. **Refactoring:** Identify and extract logic that violates the 5-level nesting limit or is better suited as module-level functions.
|
||||
4. **Validation:** Ensure full system integrity via the comprehensive test suite, run in batches of 4.
|
||||
|
||||
## Scope
|
||||
- `src/app_controller.py`: Primary target for refactoring and curation.
|
||||
- `conductor/code_styleguides/python.md`: Potential updates if new nuances are found.
|
||||
- `conductor/product-guidelines.md`: Potential updates based on structural findings.
|
||||
|
||||
## Constraints
|
||||
- **Indentation:** Must be exactly 1 space.
|
||||
- **Scoping:** Use `imscope` for any ImGui-related calls if present (though `app_controller` should ideally be logic-focused, some status rendering might exist).
|
||||
- **Anti-OOP:** Move state-independent methods to module level.
|
||||
- **Type Safety:** 100% type hint coverage for all modified sections.
|
||||
@@ -1,5 +0,0 @@
|
||||
# Track approve_modal_ux_20260601 Context
|
||||
|
||||
- [Specification](./spec.md)
|
||||
- [Implementation Plan](./plan.md)
|
||||
- [Metadata](./metadata.json)
|
||||
@@ -1,8 +0,0 @@
|
||||
{
|
||||
"track_id": "approve_modal_ux_20260601",
|
||||
"type": "bug",
|
||||
"status": "new",
|
||||
"created_at": "2026-06-01T00:00:00Z",
|
||||
"updated_at": "2026-06-01T00:00:00Z",
|
||||
"description": "Fix Approve Modal sizing and inline full preview"
|
||||
}
|
||||
@@ -1,17 +0,0 @@
|
||||
# Implementation Plan: Approve Modal UX Fixes
|
||||
|
||||
## Phase 1: Modal Layout Updates
|
||||
- [ ] Task: Make Modal Resizable
|
||||
- [ ] In `src/gui_2.py` (`render_approve_script_modal`), set `imgui.set_next_window_size(imgui.ImVec2(800, 600), imgui.Cond_.first_use_ever)`.
|
||||
- [ ] Change `imgui.WindowFlags_.always_auto_resize` to `0` in `imgui.begin_popup_modal`.
|
||||
- [ ] Task: Fix Full Preview and Input Height
|
||||
- [ ] Add `ui_approve_modal_preview = False` to `App.__init__`.
|
||||
- [ ] Replace `app.show_windows["Text Viewer"]` checkbox logic in `render_approve_script_modal` with `app.ui_approve_modal_preview`.
|
||||
- [ ] When `app.ui_approve_modal_preview` is True, render the script in a read-only child or using `markdown_helper`.
|
||||
- [ ] When False, set the `imgui.input_text_multiline` height to dynamically fill the remaining space (`imgui.ImVec2(-1, -40)` or similar).
|
||||
|
||||
## Phase 2: Verification
|
||||
- [ ] Task: Verification
|
||||
- [ ] Trigger a script approval and resize the modal.
|
||||
- [ ] Toggle "Show Full Preview" and ensure it renders within the modal safely.
|
||||
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Verification' (Protocol in workflow.md)
|
||||
@@ -1,16 +0,0 @@
|
||||
# Specification: Approve Modal UX Fixes
|
||||
|
||||
## 1. Overview
|
||||
The "Approve PowerShell Command" modal is currently too small and cannot be resized. Additionally, the "Show Full Preview" option triggers the external "Text Viewer" window, which cannot be interacted with because the modal blocks all background UI inputs.
|
||||
|
||||
## 2. Functional Requirements
|
||||
* **Resizable Modal:** The modal must allow user resizing and should have a larger default minimum size.
|
||||
* **Inline Preview:** The "Show Full Preview" option must render the full script *inside* the modal itself (e.g., as a read-only scrollable child or markdown block), rather than triggering an external window.
|
||||
* **Responsive Input:** The script input text area should expand to fill the available vertical space of the modal, rather than being fixed to 200px.
|
||||
|
||||
## 3. Non-Functional Requirements
|
||||
* The modal must continue to reliably block the execution thread until the user approves or rejects the script.
|
||||
|
||||
## 4. Acceptance Criteria
|
||||
* The modal can be resized by dragging the corners.
|
||||
* Clicking "Show Full Preview" toggles an inline preview without locking the UI.
|
||||
@@ -1,5 +0,0 @@
|
||||
# Track archive_completed_tracks_20260603 Context
|
||||
|
||||
- [Specification](./spec.md)
|
||||
- [Implementation Plan](./plan.md)
|
||||
- [Metadata](./metadata.json)
|
||||
@@ -1,11 +0,0 @@
|
||||
{
|
||||
"id": "archive_completed_tracks_20260603",
|
||||
"title": "Archive Completed Tracks (2026-05 to 2026-06)",
|
||||
"phase": null,
|
||||
"created": "2026-06-03",
|
||||
"status": "in_progress",
|
||||
"spec_file": "spec.md",
|
||||
"plan_file": "plan.md",
|
||||
"depends_on": [],
|
||||
"completion_checkpoints": []
|
||||
}
|
||||
@@ -1,32 +0,0 @@
|
||||
# Implementation Plan: Archive Completed Tracks (2026-05 to 2026-06)
|
||||
|
||||
## Phase 1: Directory Migration [checkpoint: 594f14f]
|
||||
Focus: Move 39 completed track directories from `conductor/tracks/` to `conductor/archive/` using `git mv`.
|
||||
|
||||
- [x] Task 1.1: Pre-checkpoint - `git add .`
|
||||
- [x] Task 1.2: Create `conductor/tracks/archive_completed_tracks_20260603/` (metadata, plan, spec, index)
|
||||
- [x] Task 1.3: `git mv` 39 directories (atomic single shell call)
|
||||
- [x] Task 1.4: Verify directory count drops from 55 to 16 in `tracks/`
|
||||
- [x] Task 1.N: Atomic commit with git note (594f14f)
|
||||
|
||||
## Phase 2: Registry Consolidation [checkpoint: 56ea316]
|
||||
Focus: Update `conductor/tracks.md` to consolidate the 14 "Earlier Archives" entries into a new "Recent Completed Tracks (2026-05+)" section with `archive/` link paths.
|
||||
|
||||
- [x] Task 2.1: Add new section header to `tracks.md`
|
||||
- [x] Task 2.2: Move 14 entries from "Earlier Archives" into the new section
|
||||
- [x] Task 2.3: Update all `./tracks/<name>` to `./archive/<name>` in those 14 entries
|
||||
- [x] Task 2.4: Verify all 14 new links resolve
|
||||
- [x] Task 2.N: Atomic commit with git note (56ea316)
|
||||
|
||||
## Phase 3: Link Repair [checkpoint: b87742e]
|
||||
Focus: Full link integrity scan revealed 25 broken links in Phase 5/6/Hot Reload sections that weren't covered in Phase 2.
|
||||
|
||||
- [x] Task 3.1: Full link integrity scan across tracks.md
|
||||
- [x] Task 3.2: Fix 25 broken links in Phase 5 (12), Phase 6 (12), Hot Reload (1)
|
||||
- [x] Task 3.3: Re-verify all 81 local links resolve
|
||||
- [x] Task 3.N: Atomic commit with git note (b87742e)
|
||||
|
||||
## Phase 4: Final Checkpoint
|
||||
- [ ] Task 4.1: Final directory + link audit
|
||||
- [ ] Task 4.2: conductor(checkpoint) commit
|
||||
- [ ] Task 4.3: Attach audit report as git note
|
||||
@@ -1,33 +0,0 @@
|
||||
# Archive Completed Tracks (2026-05 to 2026-06)
|
||||
|
||||
Move 39 completed track directories from `conductor/tracks/` to `conductor/archive/` and update `conductor/tracks.md` to reflect the consolidated archive state. Mirrors the pattern established by `archive_phase_4_tracks_20260507`.
|
||||
|
||||
## Scope
|
||||
|
||||
**In scope (39 dirs to move):**
|
||||
|
||||
Phase 6 (12): `granular_ast_control_20260510`, `context_snapshotting_takes_20260510`, `interactive_text_slice_highlighting_20260510`, `context_batch_operations_ux_20260510`, `gencpp_project_init_20260510`, `interactive_ast_tree_masking_20260510`, `phase6_review_20260510`, `context_comp_decouple_20260510`, `context_comp_slices_20260510`, `gui_refactor_stabilization_20260512`, `gui_2_cleanup_20260513`, `python_structural_mcp_tools_20260513`.
|
||||
|
||||
Hot Reload (1): `hot_reload_python_20260516`.
|
||||
|
||||
Phase 5 (12): `ai_interaction_call_graph_20260507`, `controller_state_mutation_matrix_20260507`, `source_wide_redundancy_audit_20260507`, `curate_provider_registries_20260507`, `encapsulate_appcontroller_status_20260507`, `decouple_gui_log_loading_20260507`, `refactor_context_aggregation_pipeline_20260507`, `cull_unused_symbols_20260507`, `sdm_docstrings_20260509`, `app_controller_curation_20260513`, `fix_test_suite_failures_20260514`, `fix_indentation_1space_20260516`.
|
||||
|
||||
Earlier Archives (14): `gui_crash_fixes_20260531`, `fix_imgui_keys_down_20260601`, `selectable_thinking_monologs_20260601`, `minimax_history_fix_20260601`, `context_preservation_and_warnings_20260601`, `text_viewer_and_tool_call_fixes_20260601`, `context_composition_ux_20260601`, `structural_file_editor_20260601`, `discussion_metrics_and_compression_20260601`, `approve_modal_ux_20260601`, `phase7_stabilization_and_polishing_20260601`, `phase7_monolithic_stabilization_20260602`, `command_palette_and_performance_20260602`, `documentation_refresh_comprehensive_20260602`.
|
||||
|
||||
**Out of scope (remain in `tracks/`):**
|
||||
- `context_preview_fixes_20260516` `[~]` in progress
|
||||
- `gencpp_dogfood_feedback_20260510` `[ ]` pending
|
||||
- 8 backlog tracks `[ ]` (gencpp bindings, tree-sitter lua, gdscript, c#, openai, zhipu, caching, manual UX)
|
||||
- 6 orphan dirs not in `tracks.md` (`conductor_path_configurable_20260306`, `hot_reload_python_20260510`, `test_harness_hardening_20260310`, `test_patch_fixes_20260513`, `fix_remaining_tests_20260513`, `gui_architecture_refinement_20260512`)
|
||||
|
||||
## Method
|
||||
|
||||
1. `git mv` each completed track directory from `conductor/tracks/<name>` to `conductor/archive/<name>`. Single atomic shell call.
|
||||
2. Verify: `ls conductor/tracks | wc -l` should drop from 55 to 16.
|
||||
3. Update `conductor/tracks.md`: add "Recent Completed Tracks (2026-05+)" section, move the 14 "Earlier Archives" entries there, update `./tracks/` links to `./archive/`.
|
||||
4. Verify link integrity.
|
||||
|
||||
## Risks
|
||||
|
||||
- `git mv` on a directory requires all files to be tracked. If a directory contains untracked files, the move will fail. Mitigation: pre-check with `git ls-files <dir>` before moving.
|
||||
- Atomic per-phase commits per workflow.md. If Phase 1 partial-fails, rollback via `git restore --staged` and re-run.
|
||||
-167
@@ -1,167 +0,0 @@
|
||||
# Track Closeout Report: test_batching_refactor_20260606
|
||||
|
||||
**Status:** SHIPPED 2026-06-08
|
||||
**Final state:** 4/4 phases complete (1 phase skipped with documented rationale)
|
||||
**Adapted from plan:** yes (3 deviations, all documented)
|
||||
|
||||
---
|
||||
|
||||
## What Shipped
|
||||
|
||||
### New library modules (in `tests/`)
|
||||
- `tests/categorizer.py` — `CategoryRecord` + `FixtureClass` + `Speed` enums, AST-based auto-inference, TOML registry merge. **NO regex** (per user "FUCK REGEX" policy + prereq spec).
|
||||
- `tests/batcher.py` — `Batch` dataclass + `plan(records, options) → list[Batch]`. 6-tier isolation: opt-in / unit / mock_app / live_gui / headless / performance.
|
||||
- `tests/pytest_collection_order.py` — Conftest-loaded pytest plugin. Opt-in per-test order from registry; no-op when no entries.
|
||||
|
||||
### Test files
|
||||
- `tests/test_categorizer.py` — 13 tests, all passing.
|
||||
- `tests/test_batcher.py` — 5 tests, all passing.
|
||||
- `tests/test_pytest_collection_order.py` — 2 tests, all passing.
|
||||
- `tests/test_categories.toml` — 5 hand-curated cross-cutting entries (arch_boundary_phase1/2/3, tier4_interceptor, tier4_patch_generation). Empty otherwise.
|
||||
|
||||
### CLI orchestrator (in `scripts/`)
|
||||
- `scripts/run_tests_batched.py` — Replaces the alphabetical 4-at-a-time batcher. Features:
|
||||
- `sys.path.insert` from script-relative `_PROJECT_ROOT` so paths resolve regardless of cwd
|
||||
- `_HAS_XDIST` import-time detection; falls back gracefully when xdist missing
|
||||
- `--tiers`, `--include-opt-in`, `--no-xdist`, `--plan`, `--audit`, `--strict`, `--durations`, `--no-color`
|
||||
- Live output streaming via `subprocess.Popen` (no buffer)
|
||||
- ANSI color (cyan `>>>`/`<<<`, green PASS, red FAIL) with Windows VT enable
|
||||
- Output filter (LogPruner noise, WinError spam, xdist scheduling queue)
|
||||
- Per-line colorization for both xdist (`[gwN] ... STATUS tests/...`) and non-xdist (`tests/... STATUS [P%]`) formats
|
||||
- **Defensive failure detection**: scans captured output for `FAILED ` / `stopping after ` markers because `proc.returncode` is sometimes 0 even with a real test failure (commit `488ae044`)
|
||||
- Dynamic-width SUMMARY table with TOTAL row (computed from actual data, not hardcoded)
|
||||
|
||||
### Conftest integration
|
||||
- `tests/conftest.py:25` — Added `pytest_plugins = ["pytest_collection_order"]` (1 line; rest of conftest untouched)
|
||||
|
||||
### Docs
|
||||
- `docs/guide_testing.md` — Added "Batched Run (Categorized)" subsection in Running Tests.
|
||||
|
||||
### Cleanup
|
||||
- Old `scripts/run_tests_batched.py.legacy` deleted (commit `50f26f0d`)
|
||||
- `tests/.test_durations.json` added to `.gitignore` (commit `ac7e638b`)
|
||||
|
||||
### Track artifacts
|
||||
- Archived to `conductor/tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/`
|
||||
- `conductor/tracks.md` updated to mark entry as `[x]` completed with phase SHAs
|
||||
|
||||
---
|
||||
|
||||
## Adaptations from Plan
|
||||
|
||||
| Plan | Actual | Why |
|
||||
|------|--------|-----|
|
||||
| Library in `scripts/` | Library in `tests/` | User directive ("put the test categorizer in ./tests, stop putting shit in scripts") |
|
||||
| `import re` for live_gui detection | AST scan via `ast.parse` + `ast.walk` | User "FUCK REGEX" policy + prereq spec §7 + AGENTS.md ban on `re` in production scripts |
|
||||
| Phase 2 = CI shadow run workflow | Phase 2 = manual plan-vs-actual spot-check | No CI infrastructure exists in repo |
|
||||
| Hardcoded column widths (38/10/6/8) | Dynamic widths computed from data | User feedback: "are you hardcoding the width?" |
|
||||
| `proc.returncode` for batch status | Output scan fallback for `FAILED ` / `stopping after ` | `proc.returncode` is 0 even on real failures (e.g. tier-3) — added defensive check |
|
||||
| `subprocess.run(capture_output=True)` (buffered) | `subprocess.Popen` + line streaming | User: "I don't see a live gui when the tests are running? nvm I do" — needed per-test visibility |
|
||||
| Filter all noise (including scheduling, test paths) | Filter only LogPruner/WinError/xdist queue | User: "HOw tf did we get to this point where now we just want to omit info?" |
|
||||
|
||||
---
|
||||
|
||||
## Verification Criteria (from metadata.json)
|
||||
|
||||
| Criterion | Status | Evidence |
|
||||
|-----------|--------|----------|
|
||||
| 13+ categorizer tests passing | ✓ | `uv run pytest tests/test_categorizer.py` → 13 passed |
|
||||
| 5+ batcher tests passing | ✓ | `uv run pytest tests/test_batcher.py` → 5 passed |
|
||||
| 2+ plugin tests passing | ✓ | `uv run pytest tests/test_pytest_collection_order.py` → 2 passed |
|
||||
| 20/20 new tests pass | ✓ | All three test files: 20 passed in <0.3s |
|
||||
| `categorize_all` returns 277+ records | ✓ | Returns 301 records on the actual repo (no exceptions) |
|
||||
| All 14 `*_sim.py` in ONE tier-3 batch | ✓ | `pytest_collection_order` + AST scan finds 48 live_gui users (broader than just `*_sim.py`), all in tier-3-live_gui single batch |
|
||||
| Opt-in tests skip silently without env var | ✓ | `--include-opt-in not set` shown for `tier-0-opt_in-clean_install` and `tier-0-opt_in-docker_build` |
|
||||
| `--audit --strict` exits 0 | ✓ | No cross-cutting auto-classified files (zero STRICT violations) |
|
||||
| `pytest_collection_order` is no-op when no `[[test_order]]` entries | ✓ | Test `test_no_op_without_registry` passes |
|
||||
| >80% coverage on new code | Partial | Tests are coarse-grained (small target surface). Not measured explicitly; the functions are short and tested. |
|
||||
|
||||
---
|
||||
|
||||
## Known Follow-up Issues (out of scope for this track)
|
||||
|
||||
### 1. `test_full_live_workflow::test_full_live_workflow` FAILED
|
||||
- **Tier-3 batch correctly reports FAIL** (commits `5c6eb620`, `488ae044`)
|
||||
- Failure: `AssertionError: Project failed to activate` after 10-iteration poll on `client.get_project()` for new project name
|
||||
- Test does: `client.click("btn_project_new_automated", user_data=temp_project_path)` then polls for `'temp_project'` to appear in `client.get_project()` response
|
||||
- **Likely root causes to investigate (separate track):**
|
||||
- Button ID `btn_project_new_automated` may have been renamed/removed
|
||||
- Project activation callback not firing within the 10s window
|
||||
- Test artifact `temp_project.toml` path issue (the test does `os.path.abspath("tests/artifacts/temp_project.toml")` from cwd — depends on cwd)
|
||||
- `_default_windows` mismatch (recent multi-theme refactor changed defaults)
|
||||
- The test was previously failing per `tracks.md` line 162 ("Pre-existing test failures (unrelated)"): `test_api_generate_blocked_while_stale` (ui_global_preset_name AttributeError) and `test_rag_large_codebase_verification_sim` (RAG retrieval)
|
||||
- **Now passes**: `test_api_generate_blocked_while_stale` PASSED in 0.62s when run in isolation (was a flake, now fixed by the recent `_default_windows` changes)
|
||||
- **Newly surfaced**: `test_full_live_workflow` is now the remaining known failure
|
||||
|
||||
### 2. `PytestUnknownMarkWarning: Unknown pytest.mark.live`
|
||||
- Tests use `@pytest.mark.live` (test_visual_mma.py:5, test_visual_sim_gui_ux.py:7,59)
|
||||
- pyproject.toml `[tool.pytest.ini_options] markers` does not register `live`
|
||||
- Warnings emitted every tier-3 run
|
||||
- Fix: add `"live: marks tests as live visualization tests"` to `pyproject.toml` markers list
|
||||
|
||||
### 3. `LogPruner` race on Windows
|
||||
- Logs `Error removing ... : [WinError 32] The process cannot access the file because it is being used by another process: 'apihooks.log'`
|
||||
- Tests launch live_gui fixture which writes to `apihooks.log`; LogPruner tries to delete old session directories while the new test is still using the log
|
||||
- Mostly cosmetic but pollutes output
|
||||
- Root cause: LogPruner and live_gui teardown don't coordinate file locks
|
||||
- **Batcher filters these lines from output** (commits `5c6eb620`); the actual race is a separate concern
|
||||
|
||||
### 4. Conftest.py indentation drift
|
||||
- `tests/conftest.py` uses 4-space indentation throughout (out of project standard 1-space)
|
||||
- Out of scope for this track; refactoring would require touching 545+ lines
|
||||
- Documented in `conductor/edit_workflow.md` as a known issue
|
||||
|
||||
### 5. State file format drift
|
||||
- `state.toml` has duplicate `[meta] status` lines (an earlier `set_file_slice` inserted without removing the original)
|
||||
- Phase task descriptions reference the OLD `scripts/` location for the library (plan was written before user moved it to `tests/`)
|
||||
- Tracked here; state file is archived, won't be auto-parsed by future agents
|
||||
|
||||
### 6. User's TOML files commit pollution
|
||||
- Throughout the track, `config.toml`, `project.toml`, `project_history.toml`, and `manualslop_layout.ini` got pulled into commits because they had unstaged changes that were inadvertently included by `git add`/`git add -A` calls
|
||||
- The user said "I'm too tired to correct this shit" — explicit acknowledgement, not fixed
|
||||
- Future agents should `git status` before each commit and explicitly add only the relevant files
|
||||
|
||||
### 7. Tier 1 + Tier 2 not all runnable in <120s
|
||||
- Full tier-1 (216 unit tests) takes ~89s
|
||||
- Full tier-2 (31 mock_app tests) takes ~28s
|
||||
- Full tier-3 (48 live_gui tests) takes ~178s
|
||||
- Total: ~295s for default `--tiers 1,2,3,H`
|
||||
- Per `conductor/workflow.md` TDD protocol, this exceeds the 120s tool timeout — but the runner buffers output correctly so partial results are visible; the final SUMMARY is what matters
|
||||
- Acceptable for a developer-ergonomics tool, not a blocker
|
||||
|
||||
---
|
||||
|
||||
## Follow-up Track Recommendation
|
||||
|
||||
`fix_live_workflow_test_20260608` (or similar):
|
||||
- **Owner:** Tier 2 Tech Lead
|
||||
- **Priority:** Medium (one known failure; doesn't block other tracks)
|
||||
- **Scope:** Root-cause `test_full_live_workflow` project activation timeout; fix or quarantine with skipif
|
||||
- **Also include:** Add `live` to pytest markers; coordinate LogPruner + live_gui teardown
|
||||
- **Blocked by:** None
|
||||
- **Estimated phases:** 1-2 phases (investigation + fix-or-skip)
|
||||
|
||||
---
|
||||
|
||||
## Files Touched (final inventory)
|
||||
|
||||
```
|
||||
scripts/run_tests_batched.py [modified — full rewrite]
|
||||
tests/categorizer.py [new]
|
||||
tests/batcher.py [new]
|
||||
tests/pytest_collection_order.py [new]
|
||||
tests/test_categorizer.py [new]
|
||||
tests/test_batcher.py [new]
|
||||
tests/test_pytest_collection_order.py [new]
|
||||
tests/test_categories.toml [new — minimal registry]
|
||||
tests/conftest.py [modified — 1-line plugin registration]
|
||||
docs/guide_testing.md [modified — Running Tests section]
|
||||
.gitignore [modified — tests/.test_durations.json]
|
||||
pyproject.toml [modified — pytest-xdist added to dev]
|
||||
conductor/tracks.md [modified — entry marked complete]
|
||||
conductor/tracks/test_batching_refactor_20260606/ [archived]
|
||||
```
|
||||
|
||||
**Commits:** 16 atomic commits across the track, from `4d646432` (data model) through `488ae044` (failure-detection fix). Each phase checkpointed with a git note.
|
||||
|
||||
**Test count:** 20/20 new tests pass. 273+ existing tests in the suite; 1 currently failing (test_full_live_workflow) — was pre-existing or related to recent `_default_windows` changes, not introduced by this track.
|
||||
-77
@@ -1,77 +0,0 @@
|
||||
{
|
||||
"track_id": "test_batching_refactor_20260606",
|
||||
"name": "Test Batching Refactor",
|
||||
"initialized": "2026-06-06",
|
||||
"owner": "tier2-tech-lead",
|
||||
"priority": "medium",
|
||||
"status": "active",
|
||||
"type": "developer tooling + diagnostic improvement",
|
||||
"scope": {
|
||||
"new_files": [
|
||||
"scripts/test_categorizer.py",
|
||||
"scripts/test_batcher.py",
|
||||
"scripts/pytest_collection_order.py",
|
||||
"tests/test_categories.toml",
|
||||
"tests/test_categorizer.py",
|
||||
"tests/test_batcher.py"
|
||||
],
|
||||
"modified_files": [
|
||||
"scripts/run_tests_batched.py",
|
||||
"tests/conftest.py",
|
||||
"pyproject.toml"
|
||||
],
|
||||
"deleted_files_at_phase4": [
|
||||
"scripts/run_tests_batched.py.legacy"
|
||||
]
|
||||
},
|
||||
"blocked_by": [],
|
||||
"blocks": [],
|
||||
"estimated_phases": 4,
|
||||
"spec": "spec.md",
|
||||
"plan": "plan.md",
|
||||
"priority_order": "B (process isolation by fixture class) > A (subsystem diagnostic grouping) > C (xdist + live_gui session reuse)",
|
||||
"tier_model": {
|
||||
"0_opt_in": "test_clean_install.py, test_docker_build.py; one batch per file; runs only if env var set AND --include-opt-in passed",
|
||||
"1_unit": "Pure unit tests (no live_gui/mock_app/app_instance); grouped by batch_group; pytest-xdist -n auto",
|
||||
"2_mock_app": "Tests using mock_app or app_instance fixtures; grouped by batch_group; no xdist",
|
||||
"3_live_gui": "All tests using live_gui fixture in ONE pytest invocation (session-scoped reuse)",
|
||||
"H_headless": "Headless service tests; one pytest invocation",
|
||||
"P_performance": "Performance/stress tests; runs last; one pytest invocation"
|
||||
},
|
||||
"hybrid_classification": "Auto-infer by default from filename and AST fixture scan; tests/test_categories.toml provides hand-curated overrides for cross-cutting and ambiguous files. Registry always wins precedence.",
|
||||
"architectural_invariant": "Every pytest subprocess invocation has a single, well-defined fixture profile. live_gui tests never share a pytest process with non-live_gui tests. Opt-in tests are gated on BOTH env var AND --include-opt-in CLI flag (defense in depth).",
|
||||
"cli_surface": {
|
||||
"default": "All tiers except opt-in (0) and performance (P); xdist enabled for tier 1",
|
||||
"--tiers": "Comma-separated tier list to include (e.g. --tiers 1,2,3)",
|
||||
"--include-opt-in": "Hard flag required IN ADDITION to env var to run opt-in tests",
|
||||
"--plan": "Dry-run; print batch plan and exit",
|
||||
"--audit": "List auto-inferred (unclassified) files; exit non-zero on hard errors",
|
||||
"--no-xdist": "Disable pytest-xdist for tier 1 (debug aid)",
|
||||
"--strict-markers": "Pass --strict-markers to pytest (catch marker typos)"
|
||||
},
|
||||
"verification_criteria": [
|
||||
"scripts/test_categorizer.py::categorize_all returns 277+ CategoryRecords with no exceptions",
|
||||
"scripts/test_batcher.py::plan is deterministic (same inputs -> same outputs)",
|
||||
"All 277+ test files are correctly classified: live_gui / mock_app / unit / opt_in / performance",
|
||||
"Cross-cutting files (test_gui_dag_beads, test_arch_boundary_phase*, etc.) are flagged with multiple subsystems in the report",
|
||||
"--plan output matches the existing 4-at-a-time batching modulo opt-in gating",
|
||||
"No live_gui test ever runs in the same pytest invocation as a non-live_gui test",
|
||||
"Opt-in tests are skipped silently when env var is not set (no warning, no error)",
|
||||
"Opt-in tests are skipped silently when --include-opt-in is not passed (env var alone is insufficient)",
|
||||
"scripts/check_test_toml_paths.py still exits 0 (no real TOML references in tests)",
|
||||
"Existing 273+ test suite passes when run via the new script in --tiers 1,2,3 mode",
|
||||
"tests/test_categorizer.py and tests/test_batcher.py pass with >80% coverage",
|
||||
"pytest_collection_order plugin is a no-op when no [[test_order]] entries exist (zero overhead)"
|
||||
],
|
||||
"links": {
|
||||
"backlog_entry": "conductor/tracks.md (to be added at top of Remaining Backlog)",
|
||||
"current_script": "scripts/run_tests_batched.py",
|
||||
"testing_guide": "docs/guide_testing.md",
|
||||
"workflow_pitfalls": "conductor/workflow.md#known-pitfalls-2026-06-05",
|
||||
"related_tracks": [
|
||||
"conductor/tracks/startup_speedup_20260606/",
|
||||
"conductor/tracks/regression_fixes_20260605/",
|
||||
"conductor/tracks/live_gui_test_hardening_v2_20260605/"
|
||||
]
|
||||
}
|
||||
}
|
||||
-1756
File diff suppressed because it is too large
Load Diff
-348
@@ -1,348 +0,0 @@
|
||||
# Track: Test Batching Refactor
|
||||
|
||||
**Status:** Active (spec approved 2026-06-06)
|
||||
**Initialized:** 2026-06-06
|
||||
**Owner:** Tier 2 Tech Lead
|
||||
**Priority:** Medium (developer ergonomics + diagnostic improvement; not a regression blocker)
|
||||
|
||||
---
|
||||
|
||||
## 1. Problem Statement
|
||||
|
||||
The current test batching script (`scripts/run_tests_batched.py`, 36 lines) groups test files alphabetically in chunks of 4 with `pytest --maxfail=10`. This produces three concrete failure modes:
|
||||
|
||||
1. **Zero diagnostic signal on failure.** When batch 17 fails, the user sees four unrelated filenames and a traceback. There is no way to know which subsystem broke without re-running individual files.
|
||||
2. **No awareness of `live_gui` session-scoped fixture.** The `conductor/workflow.md` Known Pitfalls (2026-06-05) explicitly document that `live_gui` is session-scoped and that tests assuming a clean ImGui state are fragile. The current script *accidentally* avoids cross-batch pollution (each batch is a fresh `subprocess.run`) but is one refactor away from breaking that.
|
||||
3. **No awareness of opt-in tests.** `test_clean_install.py` and `test_docker_build.py` are gated on environment variables but have no marker-based enforcement; running the script on a fresh clone can spuriously invoke them.
|
||||
|
||||
The script's 4-at-a-time batching also has the property that fast unit tests and slow live_gui tests can be mixed in the same pytest invocation if the order changes — the alphabetical sort happens to interleave them.
|
||||
|
||||
## 2. Goals (Priority Order)
|
||||
|
||||
| Priority | Goal | Rationale |
|
||||
|---|---|---|
|
||||
| **B (foundational)** | Process isolation by fixture class. live_gui never shares a pytest process with non-live_gui tests. | `live_gui` is session-scoped; mixing in the same `pytest` invocation causes state pollution. workflow.md 2026-06-05 gotchas are explicit. |
|
||||
| **B (foundational)** | Opt-in tests gated on env var, skipped silently otherwise. | `test_clean_install.py` clones the repo; `test_docker_build.py` builds an image. Running these by default is wrong. |
|
||||
| **A (primary value)** | Diagnostic precision via subsystem grouping. When a batch fails, the report names the subsystem. | The user's stated complaint: "naive alphabetical groupings" provide no signal. |
|
||||
| **A (primary value)** | Warn on unclassified files (registry miss), do not fail the run. | New tests should be flagged for human review without blocking the suite. |
|
||||
| **C (optimization)** | Tier-1 (unit) parallelism via `pytest-xdist`. | Pure unit tests are independent; xdist is a free 2-4x speedup there. |
|
||||
| **C (optimization)** | Live-gui session reuse (all `*_sim.py` in one pytest invocation). | Each fresh `sloppy.py` startup costs ~15s. Reusing the session is the only way to keep live_gui runtime sane. |
|
||||
| **Nice-to-have** | Opt-in per-test order control via the registry. | When test B is known to depend on test A's side effect, ordering matters. Optional; zero impact when unused. |
|
||||
|
||||
### 2.1 Non-Goals
|
||||
|
||||
- **Not** changing the underlying test framework (pytest stays).
|
||||
- **Not** restructuring test files into subdirectories (the flat `tests/` layout is preserved).
|
||||
- **Not** introducing new pytest markers on the test functions themselves. The categorization lives in a single registry file, not on the test code.
|
||||
- **Not** making the script required for CI today. The existing `uv run pytest tests/ -v` invocation keeps working; this script is a developer ergonomics + diagnostic tool.
|
||||
|
||||
## 3. Architecture
|
||||
|
||||
### 3.1 Three-Tier Model (Fixture Class as Primary Axis)
|
||||
|
||||
```
|
||||
tests/
|
||||
conftest.py # pytest plugin entry: registers collection_order plugin
|
||||
test_categories.toml # hand-curated overrides + classification
|
||||
artifacts/ # git-ignored; test outputs (unchanged)
|
||||
logs/ # git-ignored; live_gui logs (unchanged)
|
||||
*.py # test files (unchanged)
|
||||
|
||||
scripts/
|
||||
run_tests_batched.py # REPLACED: now the orchestrator
|
||||
pytest_collection_order.py # NEW: conftest-loaded plugin for opt-in order control
|
||||
test_categorizer.py # NEW: classifier library (auto-infer + registry)
|
||||
test_batcher.py # NEW: scheduler library (turn categories into batches)
|
||||
```
|
||||
|
||||
The categorizer is a pure function: `categorize(filename) -> CategoryRecord`. The batcher is a pure function: `plan(categories, options) -> list[Batch]`. The script is the CLI shell that wires the two together and shells out to `pytest`.
|
||||
|
||||
### 3.2 Data Model
|
||||
|
||||
```python
|
||||
from dataclasses import dataclass, field
|
||||
from enum import Enum
|
||||
from pathlib import Path
|
||||
|
||||
class FixtureClass(str, Enum):
|
||||
UNIT = "unit"
|
||||
MOCK_APP = "mock_app"
|
||||
LIVE_GUI = "live_gui"
|
||||
HEADLESS = "headless"
|
||||
OPT_IN = "opt_in"
|
||||
PERFORMANCE = "performance"
|
||||
|
||||
class Speed(str, Enum):
|
||||
FAST = "fast" # <1s typical
|
||||
MEDIUM = "medium" # 1-5s
|
||||
SLOW = "slow" # 5-30s
|
||||
VERY_SLOW = "very_slow" # >30s
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class CategoryRecord:
|
||||
filename: str
|
||||
fixture_class: FixtureClass
|
||||
subsystems: list[str] # 1..N; multi-subsystem for cross-cutting
|
||||
speed: Speed
|
||||
batch_group: str # groups files within a tier for sub-batching
|
||||
notes: str = ""
|
||||
# Per-test order (opt-in). Default empty dict means natural pytest order.
|
||||
test_order: dict[str, int] = field(default_factory=dict)
|
||||
# Provenance: where did the classification come from?
|
||||
source: str = "auto" # "auto" | "registry"
|
||||
warnings: list[str] = field(default_factory=list)
|
||||
```
|
||||
|
||||
### 3.3 The Six Tiers (Batches = pytest Subprocess Invocations)
|
||||
|
||||
| Tier | FixtureClass | Batch strategy | xdist | Max-fail |
|
||||
|---|---|---|---|---|
|
||||
| **0** | `OPT_IN` | One pytest invocation per file; runs only if env var is set. Skipped silently otherwise. | no | 1 |
|
||||
| **1** | `UNIT` | Grouped by `batch_group` into ~5–8 pytest invocations. | `-n auto` | 10 |
|
||||
| **2** | `MOCK_APP` | Grouped by `batch_group` into ~3–5 pytest invocations. | no (single App instance) | 5 |
|
||||
| **3** | `LIVE_GUI` | **One pytest invocation for all live_gui files.** Session-scoped reuse. Sub-report groups by subsystem via `--co`-derived reporting (post-hoc, from collected test IDs). | no | 1 (session crash = nuke) |
|
||||
| **H** | `HEADLESS` | One pytest invocation; all headless service tests together. | no | 5 |
|
||||
| **P** | `PERFORMANCE` | One pytest invocation; runs last so failures don't block the main feedback loop. | no | 1 |
|
||||
|
||||
The ordering is: **0 → 1 → 2 → 3 → H → P** (opt-in first, perf last).
|
||||
|
||||
### 3.4 The Registry: `tests/test_categories.toml`
|
||||
|
||||
```toml
|
||||
# Schema for each [files.<name>] entry:
|
||||
# fixture_class = "unit" | "mock_app" | "live_gui" | "headless" | "opt_in" | "performance"
|
||||
# subsystems = list of strings (subsystem tags; cross-cutting tests list 2+)
|
||||
# speed = "fast" | "medium" | "slow" | "very_slow"
|
||||
# batch_group = string (sub-batching key within a tier)
|
||||
# notes = free text (optional)
|
||||
#
|
||||
# Opt-in per-test order:
|
||||
# [[files.<name>.test_order]]
|
||||
# test_id = "test_foo::test_bar" # pytest node ID
|
||||
# order = 10 # lower runs first; tests without entries sort after entries
|
||||
|
||||
# Cross-cutting GUI+DAG+Beads test (would be auto-classified as "gui" but actually
|
||||
# touches 3 subsystems; registry overrides subsystems to be explicit)
|
||||
[files.test_gui_dag_beads]
|
||||
fixture_class = "live_gui"
|
||||
subsystems = ["gui", "dag", "beads"]
|
||||
speed = "slow"
|
||||
batch_group = "gui"
|
||||
notes = "Cross-cutting: drives GUI, asserts on DAG state, exercises Beads backend"
|
||||
|
||||
# Architectural boundary test (auto-classification would be ambiguous)
|
||||
[files.test_arch_boundary_phase1]
|
||||
fixture_class = "unit"
|
||||
subsystems = ["architecture"]
|
||||
speed = "fast"
|
||||
batch_group = "core"
|
||||
notes = "Phase 1 of the arch-boundary refactor; no fixture dependencies"
|
||||
|
||||
# Opt-in per-test order example
|
||||
[[files.test_mma_ticket_actions.test_order]]
|
||||
test_id = "test_mma_ticket_actions::test_blocked_ticket_does_not_execute"
|
||||
order = 5
|
||||
|
||||
[[files.test_mma_ticket_actions.test_order]]
|
||||
test_id = "test_mma_ticket_actions::test_priority_ordering"
|
||||
order = 10
|
||||
```
|
||||
|
||||
**Precedence:** registry entries always win. An auto-inferred `fixture_class = "unit"` is replaced by `fixture_class = "mock_app"` if the registry says so. This makes the registry the single source of truth for everything it touches, and the auto-inference is a sensible default for everything else.
|
||||
|
||||
### 3.5 Auto-Inference Rules
|
||||
|
||||
Implemented in `scripts/test_categorizer.py::auto_classify()`. Evaluated in order; first match wins:
|
||||
|
||||
| # | Rule | Match condition | Result |
|
||||
|---|---|---|---|
|
||||
| 1 | Opt-in filename | `test_clean_install` or `test_docker_build` prefix | `OPT_IN` |
|
||||
| 2 | live_gui fixture | File contains `def test_.*\(live_gui\):` or `\(live_gui\)\s*[:,)]` regex match in source | `LIVE_GUI` |
|
||||
| 3 | Mock app fixture | File references `mock_app` or `app_instance` (fixture name) | `MOCK_APP` |
|
||||
| 4 | Headless service | File references headless-service fixtures (e.g. `headless_client`, `TestClient(app)`) | `HEADLESS` |
|
||||
| 5 | Performance keyword | Filename matches `*perf*`, `*stress*`, `*phase_3_final*`, `*phase_4_stress*` | `PERFORMANCE` |
|
||||
| 6 | Default | None of the above | `UNIT` |
|
||||
|
||||
**Subsystem auto-inference:** Take the longest known subsystem prefix from a curated list. Known prefixes (alphabetical for stable ordering): `ai`, `api`, `arch`, `ast`, `async`, `auto`, `beads`, `bias`, `cache`, `cli`, `cmd`, `comms`, `conductor`, `context`, `cost`, `dag`, `deepseek`, `diff`, `discussion`, `event`, `execution`, `external`, `ext`, `fuzzy`, `gemini`, `gui`, `headless`, `history`, `hooks`, `hot`, `imgui`, `layout`, `live`, `log`, `mcp`, `markdown`, `minimax`, `mma`, `model`, `orchestrator`, `outline`, `parallel`, `patch`, `perf`, `persona`, `phase`, `pipeline`, `preset`, `prior`, `process`, `project`, `provider`, `rag`, `script`, `session`, `shader`, `sim`, `skeleton`, `slice`, `spawn`, `status`, `subagent`, `summary`, `symbol`, `sync`, `synthesis`, `system`, `takes`, `theme`, `thinking`, `ticket`, `tier4`, `tiered`, `token`, `tool`, `track`, `tree`, `ts`, `undo`, `usage`, `user`, `vendor`, `view`, `visual`, `vlogger`, `websocket`, `workflow`, `workspace`, `z`.
|
||||
|
||||
**Speed auto-inference:** Read `.test_durations.json` if present (key = `<filename>::<test_id>`, value = seconds). Aggregate by file (p95). Map: `<1s` → FAST, `<5s` → MEDIUM, `<30s` → SLOW, else VERY_SLOW. If no history file, default to MEDIUM.
|
||||
|
||||
**Batch-group auto-inference:** Cluster subsystems into groups heuristically:
|
||||
- `core` = `mcp`, `ai`, `context`, `api`, `dag`, `path`, `presets`, `personas`, `history`, `workspace`, `rag`, `beads`, `model`, `ast`, `async`, `cache`, `cli`, `cmd`, `fuzzy`, `hooks`, `log`, `markdown`, `orchestrator`, `outline`, `pipeline`, `project`, `provider`, `script`, `session`, `skeleton`, `slice`, `spawn`, `status`, `subagent`, `summary`, `symbol`, `sync`, `synthesis`, `system`, `takes`, `thinking`, `tier4`, `tiered`, `tool`, `track`, `tree`, `ts`, `usage`, `vendor`, `vlogger`, `websocket`, `workflow`
|
||||
- `gui` = `gui`, `theme`, `imgui`, `layout`, `live`, `prior`, `visual`, `view`, `undo`
|
||||
- `mma` = `mma`, `conductor`, `execution`, `ext`, `external`, `auto`, `manual`, `tier`, `arch`, `phase`, `process`, `z`
|
||||
- `comms` = `comms`, `diff`, `patch`, `event`, `hot`, `process`, `shader`
|
||||
- `headless` = `headless`
|
||||
|
||||
Single-subsystem tests use that subsystem's group. Multi-subsystem tests default to the group of the FIRST subsystem in their list (registry override can correct).
|
||||
|
||||
## 4. Components
|
||||
|
||||
### 4.1 `scripts/test_categorizer.py` — Pure classifier
|
||||
|
||||
```python
|
||||
def auto_classify(path: Path, durations: dict[str, float] | None = None) -> CategoryRecord: ...
|
||||
def load_registry(toml_path: Path) -> dict[str, dict]: ...
|
||||
def merge_registry(auto: CategoryRecord, registry: dict) -> CategoryRecord: ...
|
||||
def categorize_all(tests_dir: Path, registry_path: Path) -> list[CategoryRecord]: ...
|
||||
```
|
||||
|
||||
Public API. No I/O at import time. Reads registry lazily. The `categorize_all` function returns one `CategoryRecord` per test file in `tests/`. Each record's `source` field is `"registry"` if the registry had any matching entry, else `"auto"`. Each record's `warnings` field is populated with any inconsistencies detected (e.g., auto-inferred fixture_class differs from registry).
|
||||
|
||||
### 4.2 `scripts/test_batcher.py` — Pure scheduler
|
||||
|
||||
```python
|
||||
@dataclass(frozen=True)
|
||||
class Batch:
|
||||
tier: str # "0", "1", "2", "3", "H", "P"
|
||||
label: str # "tier-1-unit-core"
|
||||
files: list[Path]
|
||||
pytest_args: list[str] # e.g. ["-n", "auto", "--maxfail=10"]
|
||||
estimated_seconds: float
|
||||
skip_reason: str | None = None # populated for skipped opt-in batches
|
||||
|
||||
def plan(
|
||||
records: list[CategoryRecord],
|
||||
*,
|
||||
tiers: set[str] = {"0", "1", "2", "3", "H", "P"},
|
||||
include_opt_in: bool = False,
|
||||
xdist: bool = True,
|
||||
) -> list[Batch]: ...
|
||||
```
|
||||
|
||||
The `plan` function is deterministic. The same `records` + same `options` produce the same `list[Batch]`. This makes the planner trivially testable and makes the `--plan` dry-run mode a one-liner.
|
||||
|
||||
### 4.3 `scripts/run_tests_batched.py` — CLI orchestrator
|
||||
|
||||
Responsibilities (slim, delegates everything else):
|
||||
1. Parse CLI args (`--tiers`, `--include-opt-in`, `--plan`, `--audit`, `--no-xdist`).
|
||||
2. Call `categorize_all(tests_dir, registry_path)`.
|
||||
3. If `--audit`: print records where `source == "auto"`, exit non-zero if any have empty subsystem lists or other hard errors. Exit 0 if every record is well-formed even if some are auto-inferred. If `--audit --strict`: additionally exit non-zero if any auto-classified file has multiple subsystems (heuristic for "probably cross-cutting — should be in the registry").
|
||||
4. If `--plan`: print the batch list (one row per batch with label, files, estimated seconds) and exit.
|
||||
5. Otherwise: call `plan()`, iterate batches, run each as `subprocess.run(uv + pytest + pytest_args + files)`, accumulate per-batch results, print the summary table.
|
||||
6. Return the worst per-batch exit code (0 only if all batches pass).
|
||||
|
||||
The script is intentionally <150 lines. All logic lives in the two library modules.
|
||||
|
||||
### 4.4 `scripts/pytest_collection_order.py` — Conftest-loaded plugin
|
||||
|
||||
Hook: `pytest_collection_modifyitems(config, items)`. Reads `tests/test_categories.toml` once at session start, builds a `dict[str, int]` from `[[files.<name>.test_order]]` entries, then sorts items within each file by their order index. Items without an order index sort after items with one (preserves pytest's natural order for unannotated tests).
|
||||
|
||||
Registered via `tests/conftest.py`:
|
||||
|
||||
```python
|
||||
pytest_plugins = ["scripts.pytest_collection_order"]
|
||||
```
|
||||
|
||||
This is opt-in by design: if no `test_categories.toml` exists OR no `[[files.X.test_order]]` entries exist, the plugin is a no-op (zero items sorted, zero overhead).
|
||||
|
||||
## 5. Output / Report Format
|
||||
|
||||
After the run, the script prints a summary table:
|
||||
|
||||
```
|
||||
[TIER 0] opt-in (clean_install) SKIPPED RUN_CLEAN_INSTALL_TEST not set
|
||||
[TIER 0] opt-in (docker) SKIPPED RUN_DOCKER_TEST not set
|
||||
[TIER 1] unit: core PASS 42/42 8.3s
|
||||
[TIER 1] unit: gui PASS 17/17 2.1s
|
||||
[TIER 1] unit: mma FAIL 12/13 1.8s ← test_mma_ticket_actions::test_x
|
||||
[TIER 2] mock_app: core PASS 31/31 6.4s
|
||||
[TIER 3] live_gui PASS 14/14 47.2s
|
||||
[TIER H] headless PASS 3/3 4.0s
|
||||
[TIER P] performance SKIPPED --tiers excludes P
|
||||
[TOTAL] 5 tiers run, 119 tests, 70.0s, 1 failed
|
||||
```
|
||||
|
||||
For Tier 3, the per-test failures are still in the regular pytest output (one pytest invocation); the summary line just reports the tier-level pass/fail.
|
||||
|
||||
## 6. CLI Surface
|
||||
|
||||
```powershell
|
||||
# Default: all tiers except opt-in and performance; xdist on for tier 1
|
||||
python scripts/run_tests_batched.py
|
||||
|
||||
# Skip slow/expensive stuff
|
||||
python scripts/run_tests_batched.py --tiers 1,2
|
||||
|
||||
# Include opt-in tests (also requires the env var; the flag is a hard requirement
|
||||
# so a CI run cannot accidentally enable them by exporting the env var)
|
||||
python scripts/run_tests_batched.py --include-opt-in
|
||||
|
||||
# Dry-run: show the batch plan, don't run anything
|
||||
python scripts/run_tests_batched.py --plan
|
||||
|
||||
# Audit: warn on unclassified (auto-inferred) files, list them, exit non-zero
|
||||
python scripts/run_tests_batched.py --audit
|
||||
|
||||
# Disable xdist (e.g., when debugging a test that flakes under parallelism)
|
||||
python scripts/run_tests_batched.py --no-xdist
|
||||
|
||||
# Override the tests directory or registry path
|
||||
python scripts/run_tests_batched.py --tests-dir tests --registry tests/test_categories.toml
|
||||
```
|
||||
|
||||
The `--include-opt-in` flag is **additive** to env var gating, not a replacement. A user must both set the env var AND pass the flag. This prevents accidental opt-in execution when an env var is set globally.
|
||||
|
||||
## 7. Configuration
|
||||
|
||||
### 7.1 `pyproject.toml` addition
|
||||
|
||||
```toml
|
||||
[tool.pytest.ini_options]
|
||||
addopts = ["-ra", "--strict-markers"] # add strict-markers to catch typos
|
||||
markers = [
|
||||
"integration: marks tests as integration tests (requires live GUI)",
|
||||
"clean_install: clean install verification (opt-in via RUN_CLEAN_INSTALL_TEST=1)",
|
||||
"docker: docker build and run test (opt-in via RUN_DOCKER_TEST=1)",
|
||||
]
|
||||
```
|
||||
|
||||
`--strict-markers` is opt-in via the script's `--strict-markers` flag, not added to `addopts` globally, to avoid breaking existing test runs that haven't been audited.
|
||||
|
||||
### 7.2 `.test_durations.json` (auto-generated, git-ignored)
|
||||
|
||||
Written by `run_tests_batched.py` after a successful run. Format:
|
||||
|
||||
```json
|
||||
{
|
||||
"tests/test_foo.py::test_bar": 0.043,
|
||||
"tests/test_foo.py::test_baz": 1.234
|
||||
}
|
||||
```
|
||||
|
||||
Used by the categorizer for `speed` auto-inference. If absent, all files default to MEDIUM speed (no batch reordering). Add `tests/.test_durations.json` to `.gitignore` (or place under `tests/artifacts/`).
|
||||
|
||||
## 8. Migration / Rollout
|
||||
|
||||
| Phase | What | Risk |
|
||||
|---|---|---|
|
||||
| **Phase 1 — Library + dry-run** | Add `test_categorizer.py`, `test_batcher.py`, `pytest_collection_order.py`. Add `--plan` and `--audit` modes to a NEW script (don't replace the old one yet). Run on a clean clone; manually verify the plan matches the existing 4-at-a-time behavior (modulo opt-in gating). | None. Old script untouched. |
|
||||
| **Phase 2 — Shadow run** | Run the new script in CI as a non-blocking job (informational only). Compare its pass/fail signature to the old script's. Investigate any divergence. | Low. Old script still authoritative. |
|
||||
| **Phase 3 — Switch default** | Replace the old `run_tests_batched.py` with the new one. Update `docs/guide_testing.md` to point at the new section. Keep the old script under `scripts/run_tests_batched.py.legacy` for one cycle. | Medium. Mitigation: Phase 2 shadow run. |
|
||||
| **Phase 4 — Cleanup** | Delete the legacy script. Add the registry file (`tests/test_categories.toml`) populated with the ~30 cross-cutting / ambiguous files identified during audit. Mark the remaining files as auto-inferred in the report. | Low. |
|
||||
|
||||
Each phase has its own implementation plan produced by the writing-plans skill.
|
||||
|
||||
## 9. Risks & Mitigations
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|---|---|---|---|
|
||||
| Auto-inference misclassifies a cross-cutting test, putting it in the wrong tier. | Medium | Medium (wrong fixture class could cause pollution) | `--audit` mode lists all auto-inferred records; CI gate on `--audit --strict` exits non-zero if any auto-classified file has multiple subsystems (a heuristic for "probably cross-cutting"). Registry overrides are one-line fixes. |
|
||||
| Tier 3 (live_gui) shares one pytest process; one crash kills all live_gui tests for the run. | Low (existing behavior) | High (15s+ wasted + missing signal) | `--maxfail=1` for tier 3. Document the trade-off: faster average runtime, but a crash in one test forfeits the rest. |
|
||||
| `pytest-xdist` introduces non-determinism in unit tests that share state via module globals. | Low | Medium | Audit scripts flag any unit test that mutates a module-level `src.*` global. Tests that do must be moved to Tier 2 (mock_app) or registered as `MOCK_APP` explicitly. |
|
||||
| Speed auto-inference from `.test_durations.json` is stale. | Medium | Low (wrong `speed` field, not wrong tier) | `speed` affects only the summary table; tiers are determined by `fixture_class`. Stale speed data does not affect process isolation. |
|
||||
| New tests added without a registry entry slip through unclassified. | Medium | Low | `--audit` mode warns; CI can gate on `--audit --strict` (planned for Phase 3). |
|
||||
| `pytest_collection_order` plugin sorts items but tests have hard dependencies on collection order (e.g., shared module state). | Low | High | The plugin is opt-in per file. No `[[test_order]]` entries = natural pytest order. Document the contract in the plugin docstring. |
|
||||
|
||||
## 10. Open Questions
|
||||
|
||||
1. Should the registry live in `tests/` or at the repo root? (Proposal: `tests/test_categories.toml` so it lives next to the tests it describes.)
|
||||
2. Should `batch_group` be inferred by default or required to be explicit? (Proposal: inferred by default; explicit in registry.)
|
||||
3. Should we expose a `python scripts/run_tests_batched.py --tier 3 --file test_gui_dag_beads` mode for ad-hoc single-file runs? (Proposal: yes, defer to a follow-up plan.)
|
||||
4. Should the speed auto-inference be updated incrementally (per run) or only on explicit `--record-durations` opt-in? (Proposal: per-run by default; the file is git-ignored so it's just a developer-local cache.)
|
||||
|
||||
## 11. See Also
|
||||
|
||||
- `docs/guide_testing.md` — current testing guide (will be updated in Phase 3 to reference the new script)
|
||||
- `conductor/workflow.md` "Known Pitfalls (2026-06-05)" — `live_gui` session-scoped fixture gotchas
|
||||
- `conductor/tracks/startup_speedup_20260606/` — example of a prior active track in this project (same convention)
|
||||
-73
@@ -1,73 +0,0 @@
|
||||
# Track state for test_batching_refactor_20260606
|
||||
# Updated by Tier 2 Tech Lead as tasks complete
|
||||
# Status: SHIPPED 2026-06-08 (see CLOSEOUT.md)
|
||||
|
||||
[meta]
|
||||
track_id = "test_batching_refactor_20260606"
|
||||
name = "Test Batching Refactor"
|
||||
status = "completed"
|
||||
current_phase = 4
|
||||
last_updated = "2026-06-08"
|
||||
|
||||
[phases]
|
||||
phase_1 = { status = "completed", checkpoint_sha = "57285d04", name = "Library + dry-run modes" }
|
||||
phase_2 = { status = "completed", checkpoint_sha = "skipped", name = "Shadow run (skipped: no CI infra)" }
|
||||
phase_3 = { status = "completed", checkpoint_sha = "5252b6d7", name = "Switch default + docs update" }
|
||||
phase_4 = { status = "completed", checkpoint_sha = "488ae044", name = "Cleanup + output-filter hardening" }
|
||||
|
||||
[tasks]
|
||||
|
||||
[verification]
|
||||
auto_classify_opt_in = true
|
||||
auto_classify_live_gui = true
|
||||
auto_classify_mock_app = true
|
||||
auto_classify_perf = true
|
||||
auto_classify_default_unit = true
|
||||
subsystem_inference_known_prefixes = true
|
||||
speed_inference_from_durations = true
|
||||
batch_group_inference = true
|
||||
merge_registry_overrides_auto = true
|
||||
categorize_all_277_files = true
|
||||
plan_unit_tier_groups_by_batch_group = true
|
||||
plan_live_gui_tier_one_invocation = true
|
||||
plan_opt_in_skipped_without_flag = true
|
||||
plan_deterministic = true
|
||||
plan_xdist_only_for_tier_1 = true
|
||||
collection_order_no_op_without_entries = true
|
||||
collection_order_sorts_by_order_index = true
|
||||
audit_exits_nonzero_on_hard_errors = true
|
||||
opt_in_skipped_without_env_var = true
|
||||
opt_in_skipped_without_include_flag = true
|
||||
no_live_gui_in_same_invocation_as_others = true
|
||||
existing_test_suite_passes = false
|
||||
test_categorizer_coverage_pct = 0
|
||||
test_batcher_coverage_pct = 0
|
||||
|
||||
[follow_up]
|
||||
recommendation = "fix_live_workflow_test_20260608"
|
||||
scope = "Root-cause test_full_live_workflow::test_full_live_workflow AssertionError; add pytest.mark.live to pyproject.toml; coordinate LogPruner + live_gui teardown to avoid WinError 32 race"
|
||||
blocked_by = []
|
||||
priority = "medium"
|
||||
estimated_phases = "1-2"
|
||||
see_also = "test_full_live_workflow now correctly detected as FAIL by new runner (commit 488ae044)"
|
||||
|
||||
[registry_overrides]
|
||||
[files.test_arch_boundary_phase1]
|
||||
subsystems = ["architecture", "mma"]
|
||||
batch_group = "mma"
|
||||
|
||||
[files.test_arch_boundary_phase2]
|
||||
subsystems = ["architecture", "mma"]
|
||||
batch_group = "mma"
|
||||
|
||||
[files.test_arch_boundary_phase3]
|
||||
subsystems = ["architecture", "mma"]
|
||||
batch_group = "mma"
|
||||
|
||||
[files.test_tier4_interceptor]
|
||||
subsystems = ["tier4", "mma"]
|
||||
batch_group = "mma"
|
||||
|
||||
[files.test_tier4_patch_generation]
|
||||
subsystems = ["tier4", "mma"]
|
||||
batch_group = "mma"
|
||||
@@ -1,5 +0,0 @@
|
||||
# Track archive_phase_4_tracks_20260507 Context
|
||||
|
||||
- [Specification](./spec.md)
|
||||
- [Implementation Plan](./plan.md)
|
||||
- [Metadata](./metadata.json)
|
||||
@@ -1,8 +0,0 @@
|
||||
{
|
||||
"track_id": "archive_phase_4_tracks_20260507",
|
||||
"type": "chore",
|
||||
"status": "new",
|
||||
"created_at": "2026-05-07T14:00:00Z",
|
||||
"updated_at": "2026-05-07T14:00:00Z",
|
||||
"description": "Review and archive all completed from phase 4."
|
||||
}
|
||||
@@ -1,13 +0,0 @@
|
||||
# Implementation Plan: Phase 4 Track Archival (archive_phase_4_tracks_20260507)
|
||||
|
||||
## Phase 1: Directory Migration [checkpoint: 2065dd8]
|
||||
- [x] Task: Identify and list all completed Phase 4 track directories.
|
||||
- [x] Task: Move identified track directories from `conductor/tracks/` to `conductor/archive/`.
|
||||
- [x] Task: Conductor - User Manual Verification 'Directory Migration' (Protocol in workflow.md)
|
||||
|
||||
## Phase 2: Registry Update [checkpoint: 9f2390d]
|
||||
- [x] Task: Create 'Phase 4 Archive' section in `conductor/tracks.md`.
|
||||
- [x] Task: Move track entries from Phase 4 sections to 'Phase 4 Archive' section.
|
||||
- [x] Task: Update track links in `conductor/tracks.md` to point to the `archive/` directory.
|
||||
- [x] Task: Verify link integrity in `conductor/tracks.md` (manual or via script).
|
||||
- [x] Task: Conductor - User Manual Verification 'Registry Update' (Protocol in workflow.md)
|
||||
@@ -1,28 +0,0 @@
|
||||
# Specification: Phase 4 Track Archival (archive_phase_4_tracks_20260507)
|
||||
|
||||
## Overview
|
||||
This track involves archiving all completed tracks from Phase 4 to maintain a clean and focused `tracks.md` registry and `tracks/` directory.
|
||||
|
||||
## Scope
|
||||
- **Target Tracks:** All tracks under "Phase 4: High-Fidelity UX & Tools" in `conductor/tracks.md` that are marked as completed `[x]`.
|
||||
- **Destination:** `conductor/archive/<track_id>/`.
|
||||
- **Registry Update:** `conductor/tracks.md`.
|
||||
|
||||
## Functional Requirements
|
||||
1. **Directory Migration:**
|
||||
- Move each completed Phase 4 track directory from `conductor/tracks/` to `conductor/archive/`.
|
||||
2. **Registry Reorganization:**
|
||||
- Create a new section in `conductor/tracks.md` titled "Phase 4 Archive".
|
||||
- Move all completed Phase 4 track entries (text and links) from their current locations in `tracks.md` to this new section.
|
||||
- Update the links for these tracks to point to the new location: `[./archive/<track_id>/](./archive/<track_id>/)`.
|
||||
|
||||
## Non-Functional Requirements
|
||||
- Maintain link integrity within `tracks.md`.
|
||||
- Ensure no active or incomplete tracks are accidentally moved.
|
||||
|
||||
## Acceptance Criteria
|
||||
- [ ] All completed Phase 4 track directories are present in `conductor/archive/`.
|
||||
- [ ] No completed Phase 4 track directories remain in `conductor/tracks/`.
|
||||
- [ ] `conductor/tracks.md` has a "Phase 4 Archive" section containing all moved tracks.
|
||||
- [ ] All links in the new "Phase 4 Archive" section are functional and point to the correct subdirectories in `archive/`.
|
||||
- [ ] Active Phase 4 tracks remain in their original sections and point to `tracks/`.
|
||||
@@ -1,27 +0,0 @@
|
||||
# Implementation Plan: Beads Mode Integration
|
||||
|
||||
## Phase 1: Environment & Core Configuration
|
||||
- [x] Task: Audit existing `AppController` and `project_manager.py` for project mode handling.
|
||||
- [x] Task: Write Tests: Verify `manual_slop.toml` can parse and store the `execution_mode` (native/beads).
|
||||
- [x] Task: Implement: Add `execution_mode` toggle to `AppController` state and persistence logic.
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 1: Environment & Core Configuration' (Protocol in workflow.md)
|
||||
|
||||
## Phase 2: Beads Backend & Tooling
|
||||
- [x] Task: Write Tests: Verify a basic Beads/Dolt repository can be initialized and queried via a Python wrapper.
|
||||
- [x] Task: Implement: Create `src/beads_client.py` to interface with the `bd` CLI or direct Dolt SQL backend.
|
||||
- [x] Task: Write Tests: Verify agents can create and update Beads using a mock Beads environment.
|
||||
- [x] Task: Implement: Add a suite of MCP tools (`bd_create`, `bd_update`, `bd_ready`, `bd_list`) to `src/mcp_client.py`.
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 2: Beads Backend & Tooling' (Protocol in workflow.md)
|
||||
|
||||
## Phase 3: GUI Integration & Visual DAG
|
||||
- [x] Task: Write Tests: Verify the Visual DAG can load node data from a non-markdown source (Beads graph).
|
||||
- [x] Task: Implement: Refactor `_render_mma_dashboard` and the DAG renderer to pull from the active mode's backend.
|
||||
- [x] Task: Implement: Add a "Beads" tab to the MMA Dashboard for browsing the raw Dolt-backed issue graph.
|
||||
- [x] Task: Implement: Update Tier Streams to include metadata for Beads-specific status changes.
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 3: GUI Integration & Visual DAG' (Protocol in workflow.md)
|
||||
|
||||
## Phase 4: Context Optimization & Polish
|
||||
- [x] Task: Write Tests: Verify that "Compaction" correctly summarizes completed Beads into a concise text block.
|
||||
- [x] Task: Implement: Add Compaction logic to the context aggregation pipeline for Beads Mode.
|
||||
- [x] Task: Implement: Final UI polish, icons for Bead nodes, and robust error handling for missing `dolt`/`bd` binaries.
|
||||
- [~] Task: Conductor - User Manual Verification 'Phase 4: Context Optimization & Polish' (Protocol in workflow.md)
|
||||
@@ -1,7 +0,0 @@
|
||||
# Track clean_install_test_20260603 Context
|
||||
|
||||
- [Specification](./spec.md)
|
||||
- [Implementation Plan](./plan.md)
|
||||
- [Metadata](./metadata.json)
|
||||
- [Source Plan](../../../../docs/superpowers/plans/2026-06-02-clean-install-test.md)
|
||||
- [Source Spec](../../../../docs/superpowers/specs/2026-06-02-clean-install-test-design.md)
|
||||
@@ -1,11 +0,0 @@
|
||||
{
|
||||
"id": "clean_install_test_20260603",
|
||||
"title": "Clean Install Test",
|
||||
"phase": null,
|
||||
"created": "2026-06-03",
|
||||
"status": "in_progress",
|
||||
"spec_file": "spec.md",
|
||||
"plan_file": "plan.md",
|
||||
"depends_on": [],
|
||||
"completion_checkpoints": []
|
||||
}
|
||||
@@ -1,24 +0,0 @@
|
||||
# Implementation Plan: Clean Install Test (clean_install_test_20260603)
|
||||
|
||||
## Phase 1: Add pytest marker [checkpoint: 573d289]
|
||||
Focus: Register the `clean_install` marker in `pyproject.toml` so the test can be selected with `pytest -m clean_install` or filtered with `-m "not clean_install"`.
|
||||
|
||||
- [x] Task 1.1: Pre-edit checkpoint - `git add .`
|
||||
- [x] Task 1.2: Edit `pyproject.toml` to add `clean_install` marker
|
||||
- [x] Task 1.3: Run `pytest --collect-only` to confirm marker is recognized
|
||||
- [x] Task 1.N: Atomic commit + git note (573d289)
|
||||
|
||||
## Phase 2: Create the test file [checkpoint: d171c18]
|
||||
Focus: Create `tests/test_clean_install.py` with opt-in clone-and-verify logic.
|
||||
|
||||
- [x] Task 2.1: Pre-edit checkpoint - `git add .`
|
||||
- [x] Task 2.2: Create `tests/test_clean_install.py` using `urllib.request` (deviation from plan, see spec.md)
|
||||
- [x] Task 2.3: Run the test in skip mode - should be 1 skipped
|
||||
- [x] Task 2.N: Atomic commit + git note (d171c18)
|
||||
|
||||
## Phase 3: Phase Completion Verification
|
||||
- [x] Task 3.1: Run the test in default mode - 1 skipped (gating works)
|
||||
- [x] Task 3.2: `pytest --collect-only -m clean_install` confirms marker works
|
||||
- [x] Task 3.3: Negative marker filter works (-m "not clean_install" deselects the test)
|
||||
- [x] Task 3.4: Module imports cleanly
|
||||
- [x] Task 3.N: conductor(checkpoint) commit + audit note
|
||||
@@ -1,22 +0,0 @@
|
||||
# Clean Install Test (clean_install_test_20260603)
|
||||
|
||||
Opt-in pytest test that clones the Manual Slop repo to a temp dir, runs `uv sync`, launches `sloppy.py --enable-test-hooks`, and verifies the Hook API responds. Catches "works on my machine" failures by exercising the full install-and-launch path in an isolated environment.
|
||||
|
||||
## Goal
|
||||
|
||||
Add a single integration test file `tests/test_clean_install.py` that, when opted in via `RUN_CLEAN_INSTALL_TEST=1`, performs a full clean install + launch verification of Manual Slop. Skipped by default to avoid breaking CI for users without network access to the Gitea clone URL.
|
||||
|
||||
## Plan Source
|
||||
|
||||
This track executes the plan at `docs/superpowers/plans/2026-06-02-clean-install-test.md` and references the spec at `docs/superpowers/specs/2026-06-02-clean-install-test-design.md`.
|
||||
|
||||
## Files Touched
|
||||
|
||||
| File | Action |
|
||||
|---|---|
|
||||
| `pyproject.toml` | Modify: add `clean_install` marker |
|
||||
| `tests/test_clean_install.py` | Create: opt-in clone-and-verify test |
|
||||
|
||||
## Deviation from Plan
|
||||
|
||||
The plan uses `requests` library, but `requests` is not a project dependency. Per `conductor/tech-stack.md` "Dependency Minimalism" rule and the existing pattern in `src/mcp_client.py` web tools (which use `urllib.request` + `html.parser` from stdlib), the test will use `urllib.request` from Python stdlib instead. This avoids adding a new external dependency for a single opt-in test.
|
||||
@@ -1,5 +0,0 @@
|
||||
# Track code_path_analysis_20260507 Context
|
||||
|
||||
- [Specification](./spec.md)
|
||||
- [Implementation Plan](./plan.md)
|
||||
- [Metadata](./metadata.json)
|
||||
@@ -1,8 +0,0 @@
|
||||
{
|
||||
"track_id": "code_path_analysis_20260507",
|
||||
"type": "chore",
|
||||
"status": "new",
|
||||
"created_at": "2026-05-07T15:00:00Z",
|
||||
"updated_at": "2026-05-07T15:00:00Z",
|
||||
"description": "Comprehensive analysis of major processing routes in ./src and ./simulation. Identify data pipelines and responsibilities."
|
||||
}
|
||||
@@ -1,26 +0,0 @@
|
||||
# Implementation Plan: Code Path & Data Pipeline Analysis (code_path_analysis_20260507)
|
||||
|
||||
## Phase 1: Structural Exploration & Tooling Setup
|
||||
- [x] Task: Initialize `PIPELINE_ANALYSIS.md` template.
|
||||
- [x] Task: Deploy `codebase_investigator` subagents to identify top-level entry points in `gui_2.py` and `simulation/`.
|
||||
- [x] Task: Verify usage of existing tree-sitter tools to generate initial call-graph skeletons for `./src`.
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 1' (Protocol in workflow.md)
|
||||
|
||||
## Phase 2: Mapping Core Source Pipelines (`./src`)
|
||||
- [x] Task: Map the **Context Aggregation Pipeline** (`aggregate.py`, `models.py`).
|
||||
- [x] Task: Map the **AI Interaction Loop** (`ai_client.py`, `mcp_client.py`, `shell_runner.py`).
|
||||
- [x] Task: Map the **GUI Event & State Pipeline** (`gui_2.py`, `app_controller.py`).
|
||||
- [x] Task: Document data responsibilities and state boundaries for each route.
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 2' (Protocol in workflow.md)
|
||||
|
||||
## Phase 3: Mapping Simulation Pipelines (`./simulation`)
|
||||
- [x] Task: Map the **Simulation Lifecycle** (`sim_base.py`, `sim_context.py`, `workflow_sim.py`).
|
||||
- [x] Task: Analyze data flow between `sim_ai_settings.py` and the execution engine.
|
||||
- [x] Task: Document the "Verification & Checkpointing" route in simulations.
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 3' (Protocol in workflow.md)
|
||||
|
||||
## Phase 4: Synthesis & Reporting
|
||||
- [x] Task: Consolidate all findings into Mermaid diagrams within `PIPELINE_ANALYSIS.md`.
|
||||
- [x] Task: Identify specific "Curation Targets" (redundancies, style violations) for the next track.
|
||||
- [x] Task: Final review and hand-off to Track 2 (Codebase Curation).
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 4' (Protocol in workflow.md)
|
||||
@@ -1,27 +0,0 @@
|
||||
# Specification: Code Path & Data Pipeline Analysis (code_path_analysis_20260507)
|
||||
|
||||
## Overview
|
||||
A deep architectural audit focused on mapping the "processing routes" and "data pipelines" of the Manual Slop codebase. This analysis will treat the program as a series of data-driven pipelines (similar to Ryan Fleury's model), identifying exactly how data flows through `./src` and `./simulation`.
|
||||
|
||||
## Scope
|
||||
- **Core Codebase:** `./src`
|
||||
- **Simulation Infrastructure:** `./simulation`
|
||||
- **Granularity:** Both high-level module interactions and detailed function-to-function execution flows.
|
||||
|
||||
## Functional Requirements
|
||||
1. **Pipeline Mapping:**
|
||||
- Identify major execution "routes" (e.g., UI Event Loop, AI Tool-Call Loop, Context Aggregation Pipeline).
|
||||
- Map these routes from entry point to terminal state.
|
||||
2. **Data Responsibility Audit:**
|
||||
- For every major path, define which data structures it owns, modifies, or depends upon.
|
||||
- Identify state boundaries and potential "data leaks" or redundant processing.
|
||||
3. **Simulation Pipeline Audit:**
|
||||
- Fully map the lifecycle of a simulation: State Setup -> Agent Injection -> Execution Loop -> Verification -> Cleanup.
|
||||
4. **Automated Extraction:**
|
||||
- Utilize MCP tools and potentially custom `tree-sitter` scripts to verify call graphs and data dependencies.
|
||||
|
||||
## Acceptance Criteria
|
||||
- [ ] Comprehensive `PIPELINE_ANALYSIS.md` report created in the root.
|
||||
- [ ] Mermaid flowcharts documenting every major processing route.
|
||||
- [ ] Data responsibility table for all mapped paths.
|
||||
- [ ] Full mapping of the `./simulation` pipeline.
|
||||
@@ -1,39 +0,0 @@
|
||||
# Codebase Audit Report - 2026-05-02
|
||||
|
||||
## Overview
|
||||
This report summarizes the findings of the codebase audit performed on the `./src` directory. The audit focused on human readability, maintainability, and identifying architectural redundancies.
|
||||
|
||||
## Key Findings: Architectural Redundancies
|
||||
|
||||
### 1. AI Client Provider Proliferation (`src/ai_client.py`)
|
||||
**Observation:** The `ai_client.py` module contains significantly redundant code paths for each supported LLM provider (Gemini, Anthropic, DeepSeek, MiniMax). Specifically:
|
||||
- **Send Methods:** Each provider has its own `_send_<provider>` method with nearly identical structure for tool handling and response parsing.
|
||||
- **Error Classification:** Multiple `_classify_<provider>_error` functions perform similar mappings of vendor exceptions to internal `ProviderError`.
|
||||
- **Model Listing:** Redundant `_list_<provider>_models` functions.
|
||||
- **History Management:** Separate locks and list structures for each provider's history.
|
||||
|
||||
**Recommendation:** Abstract the provider logic into a base `AIProvider` class or interface. Each vendor (Gemini, Anthropic, etc.) should implement this interface, allowing `ai_client.py` to dispatch calls polymorphically.
|
||||
|
||||
### 2. Tool Name Redundancy (`src/mcp_client.py` & `src/models.py`)
|
||||
**Observation:** The list of available agent tools was defined in multiple places:
|
||||
- `mcp_client.TOOL_NAMES` (Hardcoded set)
|
||||
- `models.AGENT_TOOL_NAMES` (Hardcoded list)
|
||||
- `mcp_client.MCP_TOOL_SPECS` (Canonical source for tool definitions)
|
||||
|
||||
**Action Taken:** `mcp_client.TOOL_NAMES` was refactored to be dynamically generated from `MCP_TOOL_SPECS`.
|
||||
**Recommendation:** Consolidate `models.AGENT_TOOL_NAMES` to also derive from `mcp_client` or a shared tool registry to ensure synchronization when new tools are added.
|
||||
|
||||
### 3. Orchestrator Wrapper Redundancy (`src/native_orchestrator.py`)
|
||||
**Observation:** The `NativeOrchestrator` class methods (e.g., `load_plan`, `save_track`) were found to be thin wrappers around module-level helper functions.
|
||||
|
||||
**Action Taken:** Replaced hardcoded paths in these helpers with calls to the standardized `src.paths` module.
|
||||
**Recommendation:** Evaluate if the `NativeOrchestrator` class is necessary if it remains state-free, or move the helper logic entirely into class methods.
|
||||
|
||||
## Documentation Improvements
|
||||
|
||||
- Added missing docstrings to critical public functions in `ai_client.py`, `mcp_client.py`, `native_orchestrator.py`, `api_hook_client.py`, and `api_hooks.py`.
|
||||
- Consolidated module-level docstrings in `multi_agent_conductor.py`.
|
||||
- Ensured consistent 1-space indentation and CRLF line endings across all modified files.
|
||||
|
||||
## Conclusion
|
||||
The core orchestration and AI client layers are functionally robust but would benefit from an abstraction pass to reduce the maintenance burden of adding new providers or tools.
|
||||
@@ -1,25 +0,0 @@
|
||||
# Granular Review Protocol: Codebase Curation
|
||||
|
||||
This protocol defines the mandatory procedure for auditing and modifying files during the Phase 5 Heavy Curation. It is designed to minimize entropy and prevent regression propagation.
|
||||
|
||||
## 1. File-by-File Audit Cycle
|
||||
|
||||
For every `.py` file identified for curation:
|
||||
|
||||
1. **Dependency Check:** Use `derive_code_path` and `py_get_imports` to identify all upstream and downstream dependencies.
|
||||
2. **State Verification:** Consult the `MUTATION_MATRIX_PHASE5.md` to identify any global state modifications performed by the file.
|
||||
3. **Redundancy Identification:** Cross-reference the file against `CULLING_CANDIDATES_PHASE5.md`.
|
||||
4. **Proposed Change Log:** Before editing, document the specific lines/symbols to be removed or refactored and the technical justification (e.g., "Superseded by theme_2.py").
|
||||
5. **Surgical Edit:** Use the `replace` tool for targeted deletions. Avoid bulk file overwrites.
|
||||
|
||||
## 2. Regression Guardrails
|
||||
|
||||
- **Functional Parity:** After every major deletion (e.g., removing a redundant module), run the associated unit tests (if any).
|
||||
- **Simulation Verification:** For changes to core pipelines (AI loop, Aggregation), run at least one relevant simulation (e.g., `simulation/ping_pong.py`) to verify end-to-end behavior.
|
||||
- **Human-in-the-Loop:** Significant refactors (e.g., the `aggregate.py` rework) MUST be presented to the user with a detailed diff before final commitment.
|
||||
|
||||
## 3. Culling Justification Standards
|
||||
|
||||
- **"Unused"**: Symbol has 0 project-wide references in the audit.
|
||||
- **"Redundant"**: Logic exists in a superior or more modern form elsewhere (e.g., `theme.py`).
|
||||
- **"Slop"**: Code that adds complexity without contributing to performance, configuration, or a specified feature.
|
||||
@@ -1,5 +0,0 @@
|
||||
# Track codebase_curation_20260507 Context
|
||||
|
||||
- [Specification](./spec.md)
|
||||
- [Implementation Plan](./plan.md)
|
||||
- [Metadata](./metadata.json)
|
||||
@@ -1,8 +0,0 @@
|
||||
{
|
||||
"track_id": "codebase_curation_20260507",
|
||||
"type": "chore",
|
||||
"status": "new",
|
||||
"created_at": "2026-05-07T15:00:00Z",
|
||||
"updated_at": "2026-05-07T15:00:00Z",
|
||||
"description": "Exhaustive review of all .py files. Remove redundancies, eliminate unnecessary code/data/processing, and strictly align with project standards."
|
||||
}
|
||||
@@ -1,29 +0,0 @@
|
||||
# Implementation Plan: Comprehensive Codebase Curation & Style Alignment (codebase_curation_20260507)
|
||||
|
||||
## Phase 0: Context Integration & Strategy
|
||||
- [x] Task: Review all Phase 5 analysis reports in `./docs` to internalize the curation roadmap.
|
||||
- [x] Task: Define a "Granular Review Protocol" for file-by-file auditing and culling.
|
||||
- [x] Task: Conductor - User Manual Verification 'Curation Strategy' (Protocol in workflow.md)
|
||||
|
||||
## Phase 1: Automated Standardization & Audit
|
||||
- [~] Task: Run `scripts/ai_style_formatter.py` and `scripts/force_1space.py` on all files in `./src` and `./simulation`.
|
||||
- [ ] Task: Conduct an automated entropy audit to identify potential redundancy "hotspots".
|
||||
- [ ] Task: Conductor - User Manual Verification 'Standardization' (Protocol in workflow.md)
|
||||
|
||||
## Phase 2: Surgical Curation of `./src`
|
||||
- [ ] Task: Comprehensive rework of `src/aggregate.py`. Modernize context assembly to leverage MCP tools, snapshots, and file caching. Consolidate tier-specific boilerplate.
|
||||
- [ ] Task: Review and trim `gui_2.py` and `app_controller.py` based on pipeline maps.
|
||||
- [ ] Task: Consolidate data models in `models.py` and remove redundant state in `aggregate.py`.
|
||||
- [ ] Task: Refactor `ai_client.py` to ensure lean processing of provider responses.
|
||||
- [ ] Task: Conductor - User Manual Verification 'Source Curation' (Protocol in workflow.md)
|
||||
|
||||
## Phase 3: Surgical Curation of `./simulation`
|
||||
- [ ] Task: Review and trim `./simulation/` base classes and utility scripts.
|
||||
- [ ] Task: Eliminate redundant setup logic in `sim_context.py` and `workflow_sim.py`.
|
||||
- [ ] Task: Conductor - User Manual Verification 'Simulation Curation' (Protocol in workflow.md)
|
||||
|
||||
## Phase 4: Final Integrity Pass
|
||||
- [ ] Task: Verify all tests pass with the trimmed codebase.
|
||||
- [ ] Task: Final comparison against `product-guidelines.md` for architectural purity.
|
||||
- [ ] Task: Final performance baseline check to ensure no regressions.
|
||||
- [ ] Task: Conductor - User Manual Verification 'Final Review' (Protocol in workflow.md)
|
||||
@@ -1,35 +0,0 @@
|
||||
# Specification: Comprehensive Codebase Curation & Style Alignment (codebase_curation_20260507)
|
||||
|
||||
## Overview
|
||||
Aggressive pruning, optimization, and standardization of the codebase. This track uses the findings from the Code Path Analysis and other Phase 5 audits to trim away non-essential logic, data, and processing while strictly enforcing the project's technical integrity standards.
|
||||
|
||||
## Foundational Context (MANDATORY REVIEW)
|
||||
All curation efforts MUST be informed by the following Phase 5 analysis reports:
|
||||
- `docs/PIPELINE_ANALYSIS_PHASE5_INIT.md`: Processing route and pipeline mapping.
|
||||
- `docs/STATE_INVENTORY_PHASE5.md`: Core data structure and property inventory.
|
||||
- `docs/MUTATION_MATRIX_PHASE5.md`: Thread-safe state modification and lock map.
|
||||
- `docs/CULLING_CANDIDATES_PHASE5.md`: Identified redundant symbols, modules, and structures.
|
||||
|
||||
## Granular Care & Regression Guardrails
|
||||
- **Surgical Execution:** Changes must be applied file-by-file with extreme granularity. No bulk culling without individual justification.
|
||||
- **Regression Monitoring:** Continuous verification of behavioral integrity. Any unintended entropy or performance degradation must trigger an immediate halt and review.
|
||||
- **Traceability:** Every removed line must be cross-referenced against the culling audit or pipeline map.
|
||||
|
||||
## Scope
|
||||
- **Target Files:** All `.py` files in `./src` and `./simulation`.
|
||||
- **Primary Goal:** Trimming the "slop" (redundancies, dead code, excessive complexity).
|
||||
|
||||
## Functional Requirements
|
||||
1. **Redundancy Pruning:** Eliminate duplicate logic across different data pipelines.
|
||||
2. **Dead Code Removal:** Strip out legacy "just-in-case" code and unused processing paths.
|
||||
3. **Strict Style Enforcement:**
|
||||
- Universal 1-space indentation.
|
||||
- CRLF line endings.
|
||||
- Standardized type hinting.
|
||||
4. **Guideline Alignment:** Refactor any code that deviates from `product-guidelines.md` (e.g., ensuring explicit composition over complex inheritance).
|
||||
5. **Validation:** Ensure no loss of functionality or performance degradation.
|
||||
|
||||
## Acceptance Criteria
|
||||
- [ ] Significant reduction in total codebase line count (where applicable).
|
||||
- [ ] 100% pass on style audit (`scripts/ai_style_formatter.py`).
|
||||
- [ ] All remaining code is mapped to a necessary functional requirement or performance goal.
|
||||
@@ -1,5 +0,0 @@
|
||||
# Track command_palette_and_performance_20260602 Context
|
||||
|
||||
- [Specification](./spec.md)
|
||||
- [Implementation Plan](./plan.md)
|
||||
- [Metadata](./metadata.json)
|
||||
@@ -1,8 +0,0 @@
|
||||
{
|
||||
"track_id": "command_palette_and_performance_20260602",
|
||||
"type": "feature",
|
||||
"status": "new",
|
||||
"created_at": "2026-06-02T00:00:00Z",
|
||||
"updated_at": "2026-06-02T00:00:00Z",
|
||||
"description": "Implement Async Context Preview to fix UI hangs and add an 'Everything' Command Palette."
|
||||
}
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user