MMA_Support draft
MMA_Support/Data_Pipelines_and_Config.md (new file)
@@ -0,0 +1,32 @@
# Data Pipelines, Memory Views & Configuration

The 4-Tier Architecture relies on strictly managed data pipelines and configuration files to prevent token bloat and to maintain a deterministic, safe execution environment.

## 1. AST Extraction Pipelines (Memory Views)

To prevent LLMs from hallucinating or consuming massive context windows, access to raw file text is heavily restricted. `file_cache.py` uses Tree-sitter for deterministic Abstract Syntax Tree (AST) parsing to generate four specific views:
1. **The Directory Map (Tier 1):** Filenames and nested paths only (e.g., the output of `tree /F`). No source code.
2. **The Skeleton View (Tier 2 & 3 Dependencies):** Extracts only `class` and `def` signatures, parameters, and type hints. Strips all docstrings and function bodies, replacing them with `pass`. Used for foreign modules a worker must call but not modify.
3. **The Curated Implementation View (Tier 2 Target Modules):**
    * Keeps class/struct definitions.
    * Keeps module-level docstrings and block comments (via heuristics).
    * Keeps the full bodies of functions marked with `@core_logic` or `# [HOT]`.
    * Replaces all other function bodies with `... # Hidden`.
4. **The Raw View (Tier 3 Target File):** Unredacted, line-by-line source code of the *single* file a Tier 3 worker is assigned to modify.
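The production pipeline uses Tree-sitter for language-agnostic parsing. As a minimal, single-language sketch of the Skeleton View transformation (using the stdlib `ast` module in place of Tree-sitter, purely for illustration):

```python
import ast
import textwrap

def skeleton_view(source: str) -> str:
    """Strip every function body (including its docstring), keeping only signatures."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.body = [ast.Pass()]  # the body becomes a bare `pass`
    return ast.unparse(tree)

src = textwrap.dedent("""\
    class Repo:
        def find(self, key: str) -> int:
            'Docstring to be stripped.'
            return len(key) * 2
""")
print(skeleton_view(src))
```

The signature and type hints survive; the docstring and `return` statement do not, which is exactly the "call it, don't modify it" contract the Skeleton View enforces.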

## 2. Configuration Schema

The architecture separates sensitive billing logic from AI behavior routing across three files.
* **`credentials.toml` (Security Prerequisite):** Holds bare-metal authentication credentials (`gemini_api_key`, `anthropic_api_key`, `deepseek_api_key`). **This file must be in `.gitignore`.** Loaded strictly for instantiating HTTP clients.
* **`project.toml` (Repo Rules):** Holds repository-specific bounds (e.g., "This project uses Python 3.12 and strictly follows PEP 8").
* **`agents.toml` (AI Routing):** Defines the hardcoded hierarchy's operational behaviors. Includes fallback models (`default_expensive`, `default_cheap`), overarching Tier 1/2 parameters (temperature, base system prompts), and Tier 3 worker archetypes (`refactor`, `codegen`, `contract_stubber`) mapped to specific models (DeepSeek V3, Gemini Flash) and `trust_level` tags (`step` vs. `auto`).
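The key names above come from the prose; a hypothetical sketch of how `agents.toml` might lay them out (the table names and model ID strings are illustrative assumptions, not the project's actual file):

```toml
[fallbacks]
default_expensive = "claude-3-5-sonnet"   # placeholder model IDs
default_cheap = "deepseek-v3"

[tier1]
temperature = 0.2
system_prompt = "You are the Product Manager. Plan Tracks; never read raw code."

[archetypes.contract_stubber]
model = "deepseek-v3"
trust_level = "auto"    # trusted pattern: fire-and-forget

[archetypes.refactor]
model = "gemini-flash"
trust_level = "step"    # complex refactors pause for human review
```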

## 3. LLM Output Formats

To ensure robust parsing and avoid JSON string-escaping nightmares, the architecture uses a hybrid approach for LLM outputs, depending on the Tier:
* **Native Structured Outputs (JSON Schema forced by API):** Used for Tier 1 and Tier 2 routing and orchestration. The model provider guarantees the syntax, allowing clean parsing of `Track` and `Ticket` metadata by `pydantic`.
* **XML Tags (`<file_path>`, `<file_content>`):** Used for Tier 3 Code Generation & Tools. XML natively isolates syntax and requires zero string escaping. The UI/Orchestrator parses these tags via regex to safely extract raw Python code without bracket-matching failures.
* **Godot ECS Flat List (Linearized Entities with ID Pointers):** Instead of deeply nested JSON (which models tend to corrupt once it spans hundreds of tokens), Tier 1/2 Orchestrators define complex dependency DAGs as a flat list of items (e.g., `[Ticket id="tkt_impl" depends_on="tkt_stub"]`). The Python state machine reconstructs the DAG locally.
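The XML-tag format above can be parsed with two regexes; a minimal sketch (the function name is an assumption for illustration):

```python
import re

def extract_files(reply: str) -> dict[str, str]:
    """Map each <file_path> tag to its paired <file_content> block."""
    paths = re.findall(r"<file_path>(.*?)</file_path>", reply, re.S)
    contents = re.findall(r"<file_content>(.*?)</file_content>", reply, re.S)
    return dict(zip(paths, contents))

reply = "<file_path>app.py</file_path>\n<file_content>print('hi')</file_content>"
print(extract_files(reply))  # {'app.py': "print('hi')"}
```

`re.S` lets `<file_content>` span multiple lines, so multi-line code bodies come through without any escaping.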
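Reconstructing the DAG from the flat list is a few lines of local Python; a sketch, assuming Tickets arrive as parsed dicts with `id` and optional `depends_on` keys:

```python
from collections import defaultdict

def build_dag(tickets: list[dict]) -> dict[str, list[str]]:
    """Rebuild an adjacency map (ticket -> dependents) from flat ID pointers."""
    dependents = defaultdict(list)
    for t in tickets:
        for dep in t.get("depends_on", []):
            dependents[dep].append(t["id"])
    return dict(dependents)

flat = [
    {"id": "tkt_stub"},
    {"id": "tkt_impl", "depends_on": ["tkt_stub"]},
    {"id": "tkt_consumer", "depends_on": ["tkt_stub"]},
]
print(build_dag(flat))  # {'tkt_stub': ['tkt_impl', 'tkt_consumer']}
```

The model only ever emits the flat list; topological ordering and cycle detection stay in deterministic Python.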

MMA_Support/Implementation_Tracks.md (new file)
@@ -0,0 +1,46 @@
# Iteration Plan (Implementation Tracks)

To safely refactor a linear, single-agent codebase into the 4-Tier Multi-Model Architecture without breaking the working prototype, the implementation is sequenced into five isolated Epics (Tracks):

## Track 1: The Memory Foundations (AST Parser)

**Goal:** Build the engine that prevents token bloat by turning massive source files into curated memory views.

**Implementation Details:**

1. Integrate `tree-sitter` and its language bindings into `file_cache.py`.
2. Build `ASTParser` extraction rules:
    * *Skeleton View:* Strip function/class bodies, preserving only signatures, parameters, and type hints.
    * *Curated View:* Preserve class structures, module docstrings, and the bodies of functions marked `# [HOT]` or `@core_logic`. Replace all other bodies with `... # Hidden`.
3. **Acceptance:** `file_cache.get_curated_view('script.py')` returns a well-formatted summary string in the terminal.

## Track 2: State Machine & Data Structures

**Goal:** Define the rigid Python objects the AI agents pass to each other, so they rely on structured data rather than loose chat strings.

**Implementation Details:**

1. Create `models.py` with `pydantic` or `dataclasses` for `Track` (Epic) and `Ticket` (Task).
2. Define `WorkerContext`, holding the Ticket ID, the assigned model (from `agents.toml`), an isolated `credentials.toml` injection, and a `messages` payload array.
3. Add helper methods as state mutators (e.g., `ticket.mark_blocked()`, `ticket.mark_complete()`).
4. **Acceptance:** Instantiate a `Track` with 3 `Tickets` and successfully enforce state changes in Python without AI involvement.
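A minimal sketch of the `models.py` shapes described above, reusing the field names from the State Machine spec (`id`, `target_file`, `prompt`, `status`, `dependencies`); the real `models.py` may differ:

```python
from dataclasses import dataclass, field

@dataclass
class Ticket:
    id: str
    target_file: str
    prompt: str
    status: str = "pending"  # pending | running | blocked | step_paused | completed
    dependencies: list[str] = field(default_factory=list)

    def mark_blocked(self) -> None:
        self.status = "blocked"

    def mark_complete(self) -> None:
        self.status = "completed"

@dataclass
class Track:
    id: str
    title: str
    tickets: list[Ticket] = field(default_factory=list)

# The Track 2 acceptance test: 3 Tickets, state enforced without AI involvement.
track = Track(id="trk_1", title="Refactor config", tickets=[
    Ticket("tkt_1", "config.py", "Extract loader"),
    Ticket("tkt_2", "config.py", "Add schema", dependencies=["tkt_1"]),
    Ticket("tkt_3", "tests/test_config.py", "Cover loader", dependencies=["tkt_2"]),
])
track.tickets[0].mark_complete()
print([t.status for t in track.tickets])  # ['completed', 'pending', 'pending']
```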

## Track 3: The Linear Orchestrator & Execution Clutch

**Goal:** Build the synchronous, debuggable core loop that runs a single Tier 3 Worker and pauses for human approval.

**Implementation Details:**

1. Create `multi_agent_conductor.py` with a `run_worker_lifecycle(ticket: Ticket)` function.
2. Inject context (the Raw View from `file_cache.py`) and format the `messages` array for the API.
3. Implement the Clutch (HITL): an `input()` pause for the CLI, or a wait state for the GUI, before executing the returned tool (e.g., `write_file`). Allow manual mutation of the JSON payload in memory.
4. **Acceptance:** The script sends a hardcoded Ticket to DeepSeek, pauses in the terminal showing a diff, waits for user approval, applies the diff via `mcp_client.py`, and wipes the worker's history.
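A minimal sketch of the Clutch gate. The function name and the injected `ask` callable are assumptions for illustration; `ask` defaults to `input` for the CLI, and a GUI would pass its own dialog callback instead:

```python
def clutch_gate(tool_name: str, payload: str, ask=input) -> str:
    """Pause before tool execution; the operator approves, edits, or aborts."""
    print(f"[CLUTCH] Worker wants to run {tool_name} with:\n{payload}")
    choice = ask("[a]pprove / [e]dit / a[b]ort > ").strip().lower()
    if choice == "e":
        return ask("Edited payload > ")  # manual memory mutation
    if choice == "b":
        raise RuntimeError("Aborted by operator")
    return payload
```

Injecting the prompt callable keeps the gate testable and lets the same code serve both the `input()` CLI path and the GUI wait state named in step 3.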

## Track 4: Tier 4 QA Interception

**Goal:** Stop error traces from destroying the Worker's token window by routing crashes through a stateless translator.

**Implementation Details:**

1. In `shell_runner.py`, intercept failures (e.g., `returncode != 0`) and capture `stderr`.
2. Do *not* append `stderr` to the main Worker's history. Instead, make a synchronous API call to the `default_cheap` model.
3. Prompt: *"You are an error parser. Output only a 1-2 sentence instruction on how to fix this syntax error."* Send the raw `stderr` and the target file snippet.
4. Append the translated ~20-word fix to the main Worker's history as a "System Hint".
5. **Acceptance:** A deliberate syntax error triggers the execution engine to silently ping the cheap API, returning a ~20-word correction to the Worker instead of a 200-line stack trace.

## Track 5: UI Decoupling & Tier 1/2 Routing (The Final Boss)

**Goal:** Bring the system online by letting Tier 1 and Tier 2 dynamically generate Tickets managed by the async Event Bus.

**Implementation Details:**

1. Implement an `asyncio.Queue` in `multi_agent_conductor.py`.
2. Write Tier 1 & 2 system prompts forcing output as strict JSON arrays (Tracks and Tickets).
3. Write the async Dispatcher loop that converts the JSON into `Ticket` objects and pushes them onto the queue.
4. Enforce the Stub Resolver: if a Ticket's archetype is `contract_stubber`, pause dependent Tickets, run the stubber, trigger `file_cache.py` to rebuild the Skeleton View, then resume.
5. **Acceptance:** A vague prompt ("Refactor config system") results in a Tier 1 Track and Tier 2 Tickets (interface stub + implementation). The system executes the stub, updates the AST, and finishes the implementation automatically (or steps through each stage if the Linear toggle is on).
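The Dispatcher drain loop can be sketched as below, assuming Tickets arrive as parsed dicts; `fake_worker` stands in for `run_worker_lifecycle`:

```python
import asyncio

async def dispatcher(queue: asyncio.Queue, run_worker) -> list[str]:
    """Drain Ticket dicts from the queue and run each worker lifecycle in turn."""
    done = []
    while not queue.empty():
        ticket = await queue.get()
        done.append(await run_worker(ticket))
        queue.task_done()
    return done

async def main() -> list[str]:
    q = asyncio.Queue()
    for tid in ["tkt_stub", "tkt_impl"]:
        q.put_nowait({"id": tid})

    async def fake_worker(ticket):  # stand-in for run_worker_lifecycle
        return ticket["id"]

    return await dispatcher(q, fake_worker)

print(asyncio.run(main()))  # ['tkt_stub', 'tkt_impl']
```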
@@ -1,22 +0,0 @@

# Mapping MMA to Manual Slop

This document maps the components of the `manual_slop` project to the 4-Tier Hierarchical Multi-Model Architecture.

## Tier 1: User-Facing Model (Orchestrator)

* **`gui.py` & `gui_2.py`:** Provide the user interface for input and display the synthesized output.
* **`ai_client.py`:** Acts as the primary orchestrator, managing the conversation loop and determining when to call specific tools or providers.
## Tier 2: Specialized Models (Experts/Tools)
|
||||
* **`mcp_client.py`:** Provides a suite of specialized "tools" (e.g., `read_file`, `list_directory`, `search_files`) that act as domain experts for file system manipulation.
|
||||
* **`shell_runner.py`:** A specialist tool for executing PowerShell scripts to perform system-level changes.
|
||||
* **External AI Providers:** Gemini and Anthropic models are used as the "engines" behind these specialized operations.
|
||||
|
||||
## Tier 3: Data & Knowledge Base (Information)
|
||||
* **`aggregate.py`:** The primary mechanism for building the context sent to the AI. It retrieves file contents and metadata to ground the AI's reasoning.
|
||||
* **`manual_slop.toml`:** Stores project-specific configuration, tracked files, and discussion history.
|
||||
* **`file_cache.py`:** Optimizes data retrieval from the local file system.
|
||||
|
||||
## Tier 4: Monitoring & Feedback (Governance)
|
||||
* **`session_logger.py`:** Handles timestamped logging of communication history (`logs/comms_<ts>.log`) and tool calls.
|
||||
* **`performance_monitor.py`:** Tracks metrics related to execution time and resource usage.
|
||||
* **Script Archival:** Generated `.ps1` scripts are saved to `scripts/generated/` for later review and auditing.
|
||||

MMA_Support/Orchestrator_Engine.md (new file)
@@ -0,0 +1,37 @@
# The Orchestrator Engine & UI

To transition from a linear, single-agent chat box to a multi-agent control center, the GUI must be decoupled from the LLM execution loops. A single-agent UI assumes a linear flow (*User types -> UI waits -> LLM responds -> UI updates*), which freezes the application if, say, a Tier 1 PM is waiting for human approval while Tier 3 Workers run local tests in the background.

## 1. The Async Event Bus (Decoupling UI from Agents)

The GUI acts as a "dumb" renderer: it only renders state; it never manages state.

* **The Agent Bus (Message Queue):** A thread-safe signaling system (e.g., `asyncio.Queue`, `pyqtSignal`) passes messages between agents, the UI, and the filesystem.
* **Background Workers:** When Tier 1 spawns a Tier 2 Tech Lead, the GUI does not wait. It pushes a `UserRequestEvent` onto the Conductor's queue. The Conductor runs the LLM call asynchronously and fires `StateUpdateEvents` back for the GUI to redraw.
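A toy version of the bus contract, assuming plain dict events (real code would use `pyqtSignal` or typed event classes): the GUI coroutine only drains `StateUpdateEvents` and redraws, never mutating agent state itself.

```python
import asyncio

async def gui_renderer(bus: asyncio.Queue) -> list[str]:
    """Render-only GUI task: consume events until told to shut down."""
    rendered = []
    while True:
        event = await bus.get()
        if event["type"] == "Shutdown":
            return rendered
        rendered.append(f"panel {event['panel']} -> {event['state']}")

async def demo() -> list[str]:
    bus = asyncio.Queue()
    gui = asyncio.create_task(gui_renderer(bus))
    # The Conductor fires StateUpdateEvents; the GUI never calls the Conductor.
    await bus.put({"type": "StateUpdateEvent", "panel": "tkt_1", "state": "running"})
    await bus.put({"type": "StateUpdateEvent", "panel": "tkt_1", "state": "completed"})
    await bus.put({"type": "Shutdown"})
    return await gui

print(asyncio.run(demo()))  # ['panel tkt_1 -> running', 'panel tkt_1 -> completed']
```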

## 2. The Execution Clutch (HITL)

Every spawned worker panel implements an execution-state toggle based on the `trust_level` defined in `agents.toml`.

* **Step Mode (Lock-step):** The worker pauses **twice** per cycle:
    1. *After* generating a response/tool call, but *before* executing the tool. The GUI renders a preview (e.g., a diff of lines 40-50) and offers `[Approve]`, `[Edit Payload]`, or `[Abort]`.
    2. *After* executing the tool, but *before* sending the output back to the LLM (allowing verification of the system output).
* **Auto Mode (Fire-and-forget):** The worker loops continuously until it reports a "Task Complete" status to the Router.

## 3. Memory Mutation (The "Debug" Superpower)

If a worker generates a flawed plan in Step Mode, the Memory Mutator lets the user click the last message and edit the raw JSON/text directly before hitting "Approve." By rewriting the AI's "brain" mid-task, the model proceeds as if it had generated the correct idea, sparing the context window from a restart over a minor hallucination.

## 4. The Global Execution Toggle

A Global Execution Toggle overrides all individual agent trust levels, which is useful for debugging race conditions or context leaks.

* **Mode = "async" (Production):** The Dispatcher throws Tickets into an `asyncio.TaskGroup`. They spawn instantly, contend for API rate limits, read the skeleton, and run in parallel.
* **Mode = "linear" (Debug):** The Dispatcher iterates through the array sequentially in a strict `for` loop. It `await`s the absolute completion of Ticket 1 (including QA loops and code review) before instantiating the `WorkerAgent` for Ticket 2. This enforces a deterministic state machine and emits state snapshots (`debug_state.json`) for manual verification.

## 5. State Machine (Dataclasses)

The Conductor relies on strict definitions of `Track` and `Ticket` (e.g., using `dataclasses` or `pydantic`) to enforce state and drive UI rendering.

* **`Ticket`:** Contains `id`, `target_file`, `prompt`, `worker_archetype`, `status` (pending, running, blocked, step_paused, completed), and a `dependencies` list of Ticket IDs that must finish first.
* **`Track`:** Contains `id`, `title`, `description`, `status`, and a list of `Tickets`.
@@ -1,27 +1,18 @@
-# 4-Tier Hierarchical Multi-Model Architecture (MMA) - Overview
+# System Specification: 4-Tier Hierarchical Multi-Model Architecture

-The 4-Tier Hierarchical Multi-Model Architecture is a conceptual framework designed to manage complexity in AI systems by decomposing responsibilities into distinct, specialized layers. This modular approach enhances scalability, maintainability, and overall system performance.
+**Project:** `manual_slop` (or equivalent Agentic Co-Dev Prototype)
+**Core Philosophy:** Token Economy, Strict Memory Siloing, and Human-In-The-Loop (HITL) Execution.

-## Architectural Tiers
+## 1. Architectural Overview

-1. **Tier 1: User-Facing Model (The Orchestrator/Router)**
-    * Direct user interface and intent interpretation.
-    * Routes requests to appropriate specialized models or tools.
-2. **Tier 2: Specialized Models (The Experts/Tools)**
-    * Domain-specific models or tools (e.g., code generation, data analysis).
-    * Performs the "heavy lifting" for specific tasks.
-3. **Tier 3: Data & Knowledge Base (The Information Layer)**
-    * A repository of structured and unstructured information.
-    * Provides context and facts to specialized models.
-4. **Tier 4: Monitoring & Feedback (The Governance Layer)**
-    * Overarching layer for evaluation, error analysis, and continuous improvement.
-    * Closes the loop between user experience and model refinement.
+This system rejects the "monolithic black-box" approach to agentic coding. Instead of passing an entire codebase into a single expensive context window, the architecture mimics a senior engineering department. It uses a 4-Tier hierarchy where cognitive load and context are aggressively filtered from top to bottom.
+Expensive, high-reasoning models manage metadata and architecture (Tier 1 & 2), while cheap, fast models handle repetitive syntax and error parsing (Tier 3 & 4).

-## Core Goals
-* **Modularity:** Decouple different functions to allow for independent development.
-* **Efficiency:** Use smaller, specialized models for specific tasks instead of one monolithic model.
-* **Contextual Accuracy:** Ensure specialized tools have access to relevant data.
-* **Continuous Improvement:** Establish a systematic way to monitor performance and iterate.
+### 1.1 Core Paradigms
+* **Token Firewalling:** Error logs and deep history are never allowed to bubble up to high-tier models. The system relies heavily on abstracted AST views (Skeleton, Curated) rather than raw code when context allows.
+* **Context Amnesia:** Worker agents (Tier 3) have their trial-and-error histories wiped upon task completion to prevent context ballooning and hallucination.
+* **The Execution Clutch (HITL):** Agents operate based on Archetype Trust Scores defined in configuration. Trusted patterns run in `Auto` mode; untrusted or complex refactors run in `Step` mode, pausing before tool execution for human review and JSON history mutation.
+* **Interface-Driven Development (IDD):** The architecture inherently prioritizes the creation of contracts (stubs, schemas) before implementation, allowing workers to proceed in parallel without breaking cross-module boundaries.
@@ -1,30 +0,0 @@

# Principles & Interactions

The effectiveness of the 4-Tier Multi-Model Architecture depends on well-defined interfaces and clear communication protocols between layers.

## Interaction Flow

1. **Ingress:** The User sends a query to Tier 1.
2. **Intent & Routing:** Tier 1 analyzes the query and identifies the required expertise.
3. **Specialist Call:** Tier 1 dispatches a request to one or more Tier 2 specialists.
4. **Knowledge Retrieval:** Tier 2 specialists query Tier 3 for specific facts or context needed for their task.
5. **Execution:** Tier 2 specialists process the request using the retrieved data.
6. **Synthesis:** Tier 1 receives the output from Tier 2, synthesizes it, and presents it to the User.
7. **Observation:** Tier 4 logs the entire transaction, collects feedback, and updates metrics.

## Core Architectural Principles

### 1. Separation of Concerns
Each tier has a single, clear responsibility. Tier 1 should not perform heavy computation; Tier 2 should not handle user-facing conversation logic.

### 2. Standardized Communication
Use structured data formats (such as JSON) for all inter-tier communication. This ensures that different models (potentially from different providers) can work together seamlessly.

### 3. Graceful Degradation
If a Tier 2 specialist fails or is unavailable, Tier 1 should fall back to a more general model or provide a meaningful error message to the user.

### 4. Verification Over Trust
Tier 1 should validate the output of Tier 2 specialists before presenting it to the user. Tier 4 should periodically audit the entire pipeline to ensure quality and safety.

### 5. Data Privacy & Governance
Data flowing through Tiers 3 and 4 must be handled according to security policies, with proper sanitization and access controls.
@@ -1,59 +0,0 @@

# Technical Deep Dive: Paths & Nuances

This document explores the low-level technical execution paths and implementation nuances of the 4-Tier Hierarchical Multi-Model Architecture.

## 1. Execution Paths

The architecture distinguishes between different "paths" to optimize for latency, cost, and accuracy.

### A. The Fast Path (Reactive)
* **Trigger:** Low-complexity intents (e.g., "Hello", "What is the current time?", "Status check").
* **Flow:** User -> Tier 1 -> User.
* **Nuance:** Tier 1 identifies that no specialized knowledge (Tier 3) or tool execution (Tier 2) is required. It responds directly using its internal weights or a local cache.
* **Goal:** Sub-100ms response time.

### B. The Slow Path (Reflective / Agentic)
* **Trigger:** Complex tasks (e.g., "Fix the bug in the UI layout", "Refactor ai_client.py").
* **Flow:** User -> Tier 1 (Intent) -> Tier 2 (Specialist) -> Tier 3 (Context/RAG) -> Tier 2 (Execution) -> Tier 1 (Synthesis) -> User.
* **Nuance:** This involves high-latency operations, including tool calls and codebase searches. Tier 1 acts as a supervisor, potentially looping back to Tier 2 if the initial output is insufficient.

### C. The Governance Path (Tier 4 Integration)
* **Trigger:** Any operation that modifies the system or presents a high-risk answer.
* **Flow:** (Parallel or post-hoc) Tier 1/2 Output -> Tier 4 (Validation) -> User/Log.
* **Nuance:** Tier 4 runs an "LLM-as-a-judge" or a static analysis tool (such as `ruff` or `mypy`) on the output. If validation fails, the system may automatically trigger a re-plan in Tier 1.

---

## 2. Context & Token Management

A critical nuance is how the limited context window (token budget) is managed across tiers.

### A. Token Budgeting
* **Tier 1 (Global Context):** Holds the conversation history and high-level project metadata. Budget: ~20% of the window.
* **Tier 2 (Local Context):** Receives a "surgical" injection of relevant files/data from Tier 3. Budget: ~60% of the window.
* **Output Space:** Reserved for generating large code blocks or summaries. Budget: ~20% of the window.

### B. Context Folding (The "Accordion" Effect)
To prevent context overflow, the system "folds" (summarizes) older parts of the conversation.
* **Recent history:** Full fidelity.
* **Mid-term history:** Summarized by Tier 1.
* **Long-term history:** Archived in Tier 3 (searchable but not in-context).

---

## 3. Communication Protocols

* **Inter-Tier Format:** Strictly structured JSON (e.g., OpenAI Tool Call format or Google GenAI Function Call).
* **Streaming:** Tier 1 typically streams its "thinking" process (Slow Path) to give the user immediate feedback while Tier 2 is still working.
* **Handshake:** Tier 2 must acknowledge receipt of context from Tier 3 with a "Digest" hash to ensure data integrity.

---

## 4. Nuances vs. Standard RAG

| Feature | Standard RAG | MMA (4-Tier) |
| :--- | :--- | :--- |
| **Logic** | Flat (Query -> Doc -> Result) | Hierarchical (Intent -> Route -> Expert -> Doc) |
| **Expertise** | Homogeneous | Heterogeneous (different models per tier) |
| **Feedback** | Manual | Automated (Tier 4 closed-loop) |
| **State** | Stateless or simple session | Multi-layered state (Orchestrator vs. Specialist) |
@@ -1,30 +1,38 @@
-# Tier 1: User-Facing Model (Orchestrator/Router)
+# Tier 1: The Top-Level Orchestrator (Product Manager)

-The User-Facing Model is the entry point for all user interactions. It serves as the "brain" that understands what the user wants and decides how the system should respond.
+**Designated Models:** Gemini 3.1 Pro, Claude 3.5 Sonnet.
+**Execution Frequency:** Low (start of feature, macro-merge resolution).
+**Core Role:** Epic planning, architecture enforcement, and cross-module task delegation.

-## Key Responsibilities
+The Tier 1 Orchestrator is the most capable and expensive model in the hierarchy. It operates strictly on metadata, summaries, and executive-level directives. It **never** sees raw implementation code.

-### 1. Intent Recognition
-* Analyze the user's natural language input.
-* Classify the request into one or more categories (e.g., "request for code", "general inquiry", "data analysis").
-* Extract key parameters and constraints from the user's query.
+## Memory Context & Paths

-### 2. Routing
-* Map recognized intents to specific Tier 2 models or tools.
-* Determine if multiple specialized tools need to be called in sequence or in parallel.
-* Handle tool dispatching and manage the flow of data between tiers.
+### Path A: Epic Initialization (Project Planning)
+* **Trigger:** The user drops a massive new feature request or architectural shift into the main UI.
+* **What it Sees (Context):**
+    * **The User Prompt:** The raw feature request.
+    * **Project Meta-State:** `project.toml` (rules, allowed languages, dependencies).
+    * **Repository Map:** A strict file-tree outline (names and paths only).
+    * **Global Architecture Docs:** High-level markdown files (e.g., `docs/guide_architecture.md`).
+* **What it Ignores:** All source code, all AST skeletons, and all previous micro-task histories.
+* **Output Format:** A JSON array (Godot ECS Flat List format) of `Tracks` (Jira Epics), identifying which modules will be affected, the required Tech Lead persona, and the severity level.

-### 3. Context Management
-* Maintain the history of the conversation.
-* Decide what information from the history is relevant to the current turn.
-* Synthesize a coherent prompt for downstream models based on the current context.
+### Path B: Track Delegation (Sprint Kickoff)
+* **Trigger:** The PM hands a defined Track down to a Tier 2 Tech Lead.
+* **What it Sees (Context):**
+    * **The Target Track:** The specific goal and Acceptance Criteria generated in Path A.
+    * **Module Interfaces (Skeleton View):** The strict AST skeleton (class/function definitions only), *only* for the modules this specific Track is allowed to touch.
+    * **Track Roster:** A list of currently active or completed Tracks, to prevent duplicate work.
+* **What it Ignores:** Unrelated module docs, the original massive user prompt, implementation details.
+* **Output Format:** A compiled "Track Brief" (system prompt + curated file list) used to instantiate the Tier 2 Tech Lead panel.

-### 4. Response Synthesis
-* Integrate the raw outputs from Tier 2 models into a final, user-friendly response.
-* Ensure the tone and style are consistent with user expectations.
-* Validate that the final response directly addresses the user's original intent.
-
-## Characteristics
-* **High Reasoning:** Needs to be strong at logic and instruction following.
-* **General Purpose:** While not necessarily a domain expert, it must be broad enough to understand any valid user input.
-* **Speed:** Should ideally be responsive to minimize perceived latency.
+### Path C: Macro-Merge & Acceptance Review (Severity Resolution)
+* **Trigger:** A Tier 2 Tech Lead reports "Track Complete" and submits a pull request/diff for a "High Severity" task.
+* **What it Sees (Context):**
+    * **Original Acceptance Criteria:** The Track's goals.
+    * **Tech Lead's Executive Summary:** A ~200-word explanation of the chosen implementation algorithm.
+    * **The Macro-Diff:** The actual changes made to the codebase.
+    * **Curated Implementation View:** For boundary files, ensuring the merge doesn't break foreign modules.
+* **What it Ignores:** Tier 3 Worker trial-and-error histories, Tier 4 error logs, raw bodies of unchanged functions.
+* **Output Format:** "Approved" (commits to memory) OR "Rejected" with specific architectural feedback for Tier 2.
@@ -1,28 +0,0 @@

# Tier 2: Specialized Models (Experts/Tools)

Tier 2 consists of a collection of specialized agents, models, or tools, each optimized for a specific domain or task. This allows the system to leverage best-in-class capabilities for different problems.

## Key Responsibilities

### 1. Task Execution
* Perform deep processing in a specific area (e.g., writing Python code, generating images, performing complex mathematical calculations).
* Operate within the constraints provided by the Tier 1 Orchestrator.

### 2. Domain Expertise
* Provide specialized knowledge that a general model might lack.
* Utilize specialized formatting or protocols (e.g., returning structured JSON for data-analysis tools).

### 3. Tool Integration
* Act as wrappers for external APIs or local scripts (e.g., `shell_runner` in Manual Slop).
* Manage their own internal state or "scratchpad" during complex multi-step operations.

## Common Specialist Examples
* **Code Expert:** Optimized for high-quality software engineering and debugging.
* **Search/Web Tool:** Specialized in retrieving and summarizing real-time information.
* **Data Scientist:** Capable of running statistical models and generating visualizations.
* **Creative Writer:** Focused on tone, narrative, and artistic expression.

## Implementation Principles
* **Fine-Tuning:** Models in this tier are often smaller models fine-tuned on specialized datasets.
* **Isolation:** Specialists should ideally be stateless, or have well-defined temporary state, to prevent cross-contamination.
* **Interface Standards:** Use consistent input/output formats (like JSON) to simplify communication with Tier 1.

MMA_Support/Tier2_TechLead.md (new file)
@@ -0,0 +1,46 @@

# Tier 2: The Track Conductor (Tech Lead)

**Designated Models:** Gemini 3.0 Flash, Gemini 2.5 Pro.
**Execution Frequency:** Medium.
**Core Role:** Module-specific planning, code review, spawning Worker agents, and topological dependency-graph management.

The Tech Lead bridges the gap between high-level architecture and actual code syntax. It operates on a need-to-know basis, using AST parsing (`file_cache.py`) to keep token counts low while maintaining structural awareness of its assigned modules.

## Memory Context & Paths

### Path A: Sprint Planning (Task Delegation)
* **Trigger:** Tier 1 (PM) assigns a Track (Epic) and wakes up the Tech Lead.
* **What it Sees (Context):**
    * **The Track Brief:** Acceptance Criteria from Tier 1.
    * **Curated Implementation View (Target Modules):** AST-extracted class structures, docstrings, and `# [HOT]` function bodies for the 1-3 files this Track explicitly modifies.
    * **Skeleton View (Foreign Modules):** Only function signatures and return types for external dependencies.
* **What it Ignores:** The rest of the repository, the PM's overarching project-planning logic, and raw line-by-line code of non-hot functions.
* **Output Format:** A JSON array (Godot ECS Flat List format) of discrete Tier 3 `Tickets` (e.g., Ticket 1: *Write DB migration script*, Ticket 2: *Update core API endpoints*), including `depends_on` pointers from which the execution DAG is constructed.

### Path B: Code Review (Local Integration)
* **Trigger:** A Tier 3 Contributor completes a Ticket and submits a diff, OR Tier 4 (QA) flags a persistent failure.
* **What it Sees (Context):**
    * **Specific Ticket Goal:** What the Contributor was instructed to do.
    * **Proposed Diff:** The exact line changes submitted by Tier 3.
    * **Test/QA Output:** Relevant logs from Tier 4 compiler checks.
    * **Curated Implementation View:** To cross-reference the proposed diff against the existing architecture.
* **What it Ignores:** The Contributor's internal trial-and-error chat history. It sees only the final submission.
* **Output Format:** *Approve* (merges the diff into the working branch and updates the Curated View) or *Reject* (sends a technical critique back to Tier 3).

### Path C: Track Finalization (Upward Reporting)
* **Trigger:** All Tier 3 Tickets assigned to this Track are marked "Approved."
* **What it Sees (Context):**
    * **Original Track Brief:** To verify the requirements were met.
    * **Aggregated Track Diff:** The sum total of all changes made across all Tier 3 Tickets.
    * **Dependency Delta:** A list of any newly imported foreign modules or libraries.
* **What it Ignores:** The back-and-forth review cycles and the original AST Curated View.
* **Output Format:** An Executive Summary and the final Macro-Diff, sent back to Tier 1.

### Path D: Contract-First Delegation (Stub-and-Resolve)

* **Trigger:** Tier 2 evaluates a Track and detects a cross-module dependency (or a single massive refactor) that requires an undefined signature.
* **Role:** Force Interface-Driven Development (IDD) to prevent hallucination.
* **Execution Flow:**
    1. **Contract Definition:** Splits the requirement into a `Stub Ticket`, a `Consumer Ticket`, and an `Implementation Ticket`.
    2. **Stub Generation:** Spawns a cheap Tier 3 worker (e.g., a DeepSeek V3 `contract_stubber` archetype) to generate the empty function signature, type hints, and docstrings.
    3. **Skeleton Broadcast:** The stub is merged, and the system immediately re-runs Tree-sitter to update the global Skeleton View.
    4. **Parallel Implementation:** Tier 2 simultaneously spawns the `Consumer` (codes against the skeleton) and the `Implementer` (fills in the stub logic) in isolated contexts.
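Steps 2-3 can be sketched in miniature. The real pipeline uses Tree-sitter; this sketch substitutes Python's built-in `ast` module as a stand-in, and the stub itself (`compute_risk_score`) is a hypothetical contract, not one from this document:

```python
import ast

# A hypothetical merged contract stub, as a Stub Generation worker might emit it.
STUB = '''
def compute_risk_score(user_id: int, window_days: int = 30) -> float:
    """Contract stub: returns a risk value in [0, 1]."""
    raise NotImplementedError
'''

def skeleton(source: str) -> str:
    """Re-derive a Skeleton View from merged source: signatures survive,
    docstrings and bodies are replaced with `pass`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.body = [ast.Pass()]  # drop docstring and implementation
    return ast.unparse(tree)
```

After the stub merges, broadcasting `skeleton(STUB)` gives the `Consumer` worker the callable signature without leaking any implementation detail.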
# Tier 3: Data & Knowledge Base (Information Layer)

Tier 3 is the foundational layer that provides the facts, documents, and data required by the higher tiers. It is a passive repository that enables informed reasoning and specialized processing.

## Key Responsibilities

### 1. Information Storage

* Maintain large-scale repositories of structured data (SQL/NoSQL databases) and unstructured data (PDFs, Markdown files, codebases).
* Host internal company documents, project-specific files, and external knowledge graphs.

### 2. Retrieval Mechanisms (RAG)

* Support efficient querying via vector search, keyword indexing, or metadata filtering.
* Provide Retrieval-Augmented Generation (RAG) capabilities to enrich the prompts of Tier 2 models with relevant snippets.
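A minimal sketch of the vector-search step, assuming toy three-dimensional embeddings rather than a real embedding model or vector database:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, corpus, top_k=2):
    """Return the top_k snippet texts ranked by similarity to the query."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, d["embedding"]),
                    reverse=True)
    return [d["text"] for d in ranked[:top_k]]

# Toy corpus; embeddings are hand-written placeholders.
corpus = [
    {"text": "DB migration guide", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Frontend style rules", "embedding": [0.0, 0.2, 0.9]},
    {"text": "API endpoint schema", "embedding": [0.8, 0.3, 0.1]},
]
```

In production this ranking would be delegated to a vector database such as the ones listed under Components below; the retrieved snippets are what get spliced into a Tier 2 prompt.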

### 3. Contextual Enrichment

* Supply specialized models with "ground truth" data to minimize hallucinations.
* Manage versioned data to ensure the system reflects the most up-to-date information.

## Components

* **Vector Databases:** (e.g., Pinecone, Milvus, Chroma) for semantic search.
* **Traditional Databases:** (e.g., PostgreSQL) for structured business data.
* **File Systems:** Local or cloud storage for direct file access.
* **External APIs:** Real-time data sources (weather, finance, etc.).

## Interactions

* Tier 2 specialists query Tier 3 to get the data they need to perform their tasks.
* Tier 1 may occasionally query Tier 3 directly to determine whether sufficient information exists before routing.
35
MMA_Support/Tier3_Workers.md
Normal file
@@ -0,0 +1,35 @@

# Tier 3: The Worker Agents (Contributors)

**Designated Models:** DeepSeek V3/R1, Gemini 2.5 Flash.
**Execution Frequency:** High (the core loop).
**Core Role:** Generating syntax, writing localized files, running unit tests.

The engine room of the system. Contributors execute the highest volume of API calls, and their memory context is ruthlessly pruned. By leveraging cheap, fast models, they operate with zero architectural anxiety: they just write the code they are assigned. They are "Amnesiac Workers," having their history wiped between tasks to prevent context ballooning.

## Memory Context & Paths

### Path A: Heads-Down Execution (Task Execution)

* **Trigger:** Tier 2 (Tech Lead) hands down a hyper-specific Ticket.
* **What it Sees (Context):**
    * **The Ticket Prompt:** The exact, isolated instructions from Tier 2.
    * **The Target File (Raw View):** The raw, unredacted, line-by-line source code of *only* the specific file (or class/function) it was assigned to modify.
    * **Foreign Interfaces (Skeleton View):** A strict AST skeleton (signatures only) of the external dependencies required by the ticket.
* **What it Ignores:** Epic/Track goals, the Tech Lead's Curated View, other files in the same directory, and parallel Tickets.
* **Output Format:** XML tags (`<file_path>`, `<file_content>`) defining direct file modifications, or `mcp_client.py` tool payloads.
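The orchestrator side of that output format can be sketched as a small parser. The tag names come from the ticket format above, but this regex-based extraction is an illustrative sketch, not the system's actual parser:

```python
import re

def parse_submission(payload: str):
    """Extract (path, content) pairs from a worker's XML-tagged output."""
    paths = re.findall(r"<file_path>(.*?)</file_path>", payload, re.S)
    contents = re.findall(r"<file_content>(.*?)</file_content>", payload, re.S)
    # Pair tags positionally; a real parser would validate tag balance.
    return list(zip(paths, contents))

# Example worker output; the file path and body are hypothetical.
payload = (
    "<file_path>src/api/routes.py</file_path>"
    "<file_content>def health():\n    return {'ok': True}\n</file_content>"
)
```

Each `(path, content)` pair maps directly to one file write, which keeps the worker's output machine-applyable without any free-form prose.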

### Path B: Trial and Error (Local Iteration & Tool Execution)

* **Trigger:** The Contributor runs a local linter/test, encounters a syntax error, or the human pauses execution using "Step" mode.
* **What it Sees (Context):**
    * **Ephemeral Working History:** A short, rolling window of its last 2-3 attempts (e.g., "Attempt 1: Wrote code -> Tool Output: SyntaxError").
    * **Tier 4 (QA) Injections:** Compressed (20-50 token) fix recommendations from Tier 4 agents (e.g., "Add a closing bracket on line 42").
    * **Human Mutations:** Any direct edits made to its JSON history payload before proceeding.
* **What it Ignores:** Tech Lead code reviews, and any attempts older than the rolling window (wiped to save tokens).
* **Output Format:** Revised tool payloads, until tests pass or the human approves.
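The rolling window is a natural fit for a bounded deque; the record shape below is illustrative, not the orchestrator's real history format:

```python
from collections import deque

# Keep only the last three attempts; older entries fall off automatically.
history = deque(maxlen=3)

for attempt, outcome in [
    (1, "SyntaxError: unexpected EOF"),
    (2, "SyntaxError: missing ']' on line 42"),
    (3, "NameError: 'cfg' is not defined"),
    (4, "All tests passed"),
]:
    history.append({"attempt": attempt, "tool_output": outcome})
```

After the fourth append, attempt 1 has already been evicted, so the worker's next prompt carries only attempts 2-4.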

### Path C: Task Submission (Micro-Pull Request)

* **Trigger:** The code executes cleanly, and "Step" mode is finalized into "Task Complete."
* **What it Sees (Context):**
    * **The Original Ticket:** Used to confirm the instructions were met.
    * **The Final State:** The cleanly modified file or exact diff.
* **What it Ignores:** **All of Path B.** Before submission to Tier 2, the orchestrator wipes the messy trial-and-error history from the payload.
* **Output Format:** A concise completion message and the clean diff, sent up to Tier 2.
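The wipe-before-submit step can be sketched as a filter over the message payload. The role labels here are hypothetical, chosen only to illustrate the idea:

```python
def prepare_submission(messages):
    """Strip Path B trial-and-error turns before the payload goes to Tier 2,
    keeping only the original ticket and the final diff."""
    keep = {"ticket", "final_diff"}
    return [m for m in messages if m["role"] in keep]

# Hypothetical working history at the moment of "Task Complete".
messages = [
    {"role": "ticket", "content": "Add closing-bracket handling"},
    {"role": "attempt", "content": "SyntaxError on line 42"},
    {"role": "attempt", "content": "retry with fix"},
    {"role": "final_diff", "content": "+    ]"},
]
```

Tier 2 therefore reviews a two-message micro-pull-request, never the intermediate failures.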
# Tier 4: Monitoring & Feedback (Governance Layer)

Tier 4 acts as the "supervisor" of the entire architecture. It ensures the system performs correctly, ethically, and efficiently, while providing a path for continuous evolution.

## Key Responsibilities

### 1. Performance Monitoring

* Track latency, token usage, and error rates across all tiers.
* Identify bottlenecks (e.g., a Tier 2 specialist that is consistently slow).
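A minimal per-call metric record, matching the custom JSON-L logging mentioned under Tools & Techniques below; the field names are illustrative, not a fixed schema:

```python
import json
import time

def log_call(tier: str, model: str, latency_ms: int, tokens: int, ok: bool) -> str:
    """Serialize one model call as a single JSON-L line for the metrics log."""
    record = {
        "ts": time.time(),       # wall-clock timestamp of the call
        "tier": tier,            # which tier issued it (T1-T4)
        "model": model,          # backing model identifier
        "latency_ms": latency_ms,
        "tokens": tokens,        # total tokens billed for the call
        "ok": ok,                # False on non-zero exit / API error
    }
    return json.dumps(record)
```

Appending one such line per call makes latency and token-burn queries a simple scan, with no database required.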

### 2. Evaluation & Feedback

* Collect explicit user feedback (e.g., "Good/Bad" ratings).
* Perform automated evaluation using an "LLM-as-a-judge" to score responses on accuracy, tone, and safety.
* Log failures for manual review and human-in-the-loop (HITL) intervention.

### 3. Error Analysis & Root Cause

* Analyze why specific routes failed or why a specialist produced low-quality output.
* Maintain a "lessons learned" database to inform future system prompts or fine-tuning.

### 4. Continuous Improvement

* Inform the retraining or fine-tuning of Tier 2 models based on real-world usage patterns.
* Optimize Tier 1 routing logic based on success/failure metrics.

## Tools & Techniques

* **Logging/Observability:** (e.g., LangSmith, Weights & Biases, custom JSON-L logs).
* **A/B Testing:** Compare different model versions or routing strategies.
* **Red Teaming:** Proactively test the system for vulnerabilities and biases.
33
MMA_Support/Tier4_Utility.md
Normal file
33
MMA_Support/Tier4_Utility.md
Normal file
@@ -0,0 +1,33 @@
|
||||

# Tier 4: The Utility Agents (Compiler / QA)

**Designated Models:** DeepSeek V3 (lowest cost possible).
**Execution Frequency:** On-demand (intercepts local failures).
**Core Role:** Single-shot, stateless translation of machine garbage into human English.

Tier 4 acts as the financial firewall. It solves the expensive problem of feeding massive (e.g., 3,000-token) stack traces back into a mid-tier LLM's context window. Tier 4 agents wake up, translate errors, and immediately die.

## Memory Context & Paths

### Path A: The Stack Trace Interceptor (Translator)

* **Trigger:** A Tier 3 Contributor executes a script, resulting in a non-zero exit code with a massive `stderr` payload.
* **What it Sees (Context):**
    * **Raw Error Output:** The exact traceback from the runtime/compiler.
    * **Offending Snippet:** *Only* the specific function or 20-line block of code where the error originated.
* **What it Ignores:** Everything else. It is blind to the "why" and focuses only on what broke.
* **Output Format:** A surgical, highly compressed string (20-50 tokens) passed back into the Tier 3 Contributor's working memory (e.g., "Syntax Error on line 42: You missed a closing bracket. Add `]`").
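A toy stand-in for the Interceptor's compression step. A real translator is an LLM call; this sketch only shows the shape of the transformation (the traceback below is fabricated for illustration):

```python
def compress_traceback(stderr: str) -> str:
    """Collapse a long traceback into a one-line hint for the worker:
    keep the final error message plus the last 'File ...' frame."""
    lines = [ln.strip() for ln in stderr.strip().splitlines() if ln.strip()]
    frame = next((ln for ln in reversed(lines) if ln.startswith("File ")), "")
    return f"{lines[-1]} ({frame})" if frame else lines[-1]

trace = """Traceback (most recent call last):
  File "app.py", line 10, in <module>
    main()
  File "app.py", line 7, in main
    totals = [sum(xs)
SyntaxError: '[' was never closed"""
```

A 3,000-token `stderr` payload collapses into a string short enough to inject straight into the Contributor's rolling window.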

### Path B: The Linter / Formatter (Pedant)

* **Trigger:** Tier 3 believes it has finished a Ticket, but pre-commit hooks (e.g., `ruff`, `eslint`) fail.
* **What it Sees (Context):**
    * **Linter Warning:** The specific error (e.g., "Line too long", "Missing type hint").
    * **Target File:** The code written by Tier 3.
* **What it Ignores:** Business logic. It cares only about styling rules.
* **Output Format:** A direct `sed` command, or a silent diff overwrite via tools, that fixes the formatting without bothering Tier 2 or consuming Tier 3 loops.
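A toy stand-in for the silent-overwrite behavior, applying purely mechanical style fixes (nothing resembling `ruff`'s actual rule set):

```python
def pedant_fix(source: str) -> str:
    """Apply mechanical style fixes only: strip trailing whitespace from
    every line and guarantee a final newline. Business logic is untouched."""
    lines = [line.rstrip() for line in source.splitlines()]
    return "\n".join(lines) + "\n"
```

Because the fix is deterministic, the Pedant can overwrite the file and move on without spending any Tier 3 iteration budget.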

### Path C: The Flaky Test Debugger (Isolator)

* **Trigger:** A localized unit test fails due to logic (e.g., `assert 5 == 4`), not a syntax crash.
* **What it Sees (Context):**
    * **Failing Test Function:** The exact `pytest` or `go test` block.
    * **Target Function:** The specific function being tested.
* **What it Ignores:** The rest of the test suite and module.
* **Output Format:** A quick diagnosis sent to Tier 3 (e.g., "The test expects an integer, but your function is currently returning a stringified float. Cast to `int`").
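The Isolator's diagnosis format can be sketched as a comparison over the expected and actual values pulled from the failing assertion; this heuristic is illustrative, not the agent's real prompt or parser:

```python
def diagnose(expected, actual) -> str:
    """Compress a logic-test failure into a short, actionable hint."""
    if type(expected) is not type(actual):
        return (f"The test expects {type(expected).__name__}, but the function "
                f"returned {type(actual).__name__} ({actual!r}). "
                f"Cast to {type(expected).__name__}.")
    return f"Value mismatch: expected {expected!r}, got {actual!r}."
```

The resulting string fits within the same 20-50 token budget as the Stack Trace Interceptor's output, so it slots into the Contributor's working memory the same way.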