MMA_Support draft
32
MMA_Support/Data_Pipelines_and_Config.md
Normal file
@@ -0,0 +1,32 @@
# Data Pipelines, Memory Views & Configuration
The 4-Tier Architecture relies on strictly managed data pipelines and configuration files to prevent token bloat and keep the execution environment deterministic and safe.
## 1. AST Extraction Pipelines (Memory Views)
To keep LLMs from hallucinating or consuming massive context windows, access to raw file text is heavily restricted. `file_cache.py` uses Tree-sitter for deterministic Abstract Syntax Tree (AST) parsing and generates four views:
1. **The Directory Map (Tier 1):** Just filenames and nested paths (e.g., output of `tree /F`). No source code.
2. **The Skeleton View (Tier 2 & 3 Dependencies):** Extracts only `class` and `def` signatures, parameters, and type hints. Strips all docstrings and function bodies, replacing them with `pass`. Used for foreign modules a worker must call but not modify.
3. **The Curated Implementation View (Tier 2 Target Modules):**
    * Keeps class/struct definitions.
    * Keeps module-level docstrings and block comments (selected heuristically).
    * Keeps full bodies of functions marked with `@core_logic` or `# [HOT]`.
    * Replaces all other function bodies with `... # Hidden`.
4. **The Raw View (Tier 3 Target File):** Unredacted, line-by-line source code of the *single* file a Tier 3 worker is assigned to modify.
## 2. Configuration Schema
The architecture separates sensitive, billable API credentials from AI behavior routing.
* **`credentials.toml` (Security Prerequisite):** Holds the raw API keys used for authentication (`gemini_api_key`, `anthropic_api_key`, `deepseek_api_key`). **This file must be listed in `.gitignore`.** It is loaded only to instantiate HTTP clients.
* **`project.toml` (Repo Rules):** Holds repository-specific constraints (e.g., "This project uses Python 3.12 and strictly follows PEP 8").
* **`agents.toml` (AI Routing):** Defines the hardcoded hierarchy's operational behaviors. Includes fallback models (`default_expensive`, `default_cheap`), Tier 1/2 overarching parameters (temperature, base system prompts), and Tier 3 worker archetypes (`refactor`, `codegen`, `contract_stubber`) mapped to specific models (DeepSeek V3, Gemini Flash) and `trust_level` tags (`step` vs. `auto`).
## 3. LLM Output Formats
To ensure robust parser execution and avoid JSON string-escaping nightmares, the architecture uses a hybrid approach for LLM outputs depending on the Tier:
* **Native Structured Outputs (JSON Schema forced by API):** Used for Tier 1 and Tier 2 routing and orchestration. Because the model provider constrains generation to the supplied schema, the response is syntactically valid JSON, which lets `pydantic` parse `Track` and `Ticket` metadata cleanly.
* **XML Tags (`<file_path>`, `<file_content>`):** Used for Tier 3 Code Generation & Tools. The tags isolate code from the rest of the response and require zero string escaping. The UI/Orchestrator parses these via regex to safely extract raw Python code without bracket-matching failures.
* **Godot ECS Flat List (Linearized Entities with ID Pointers):** Instead of deeply nested JSON (which models tend to get wrong once the structure spans around 500 tokens), Tier 1/2 Orchestrators define complex dependency DAGs as a flat list of items that reference each other by ID (e.g., `[Ticket id="tkt_impl" depends_on="tkt_stub"]`). The Python state machine reconstructs the DAG locally.
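
The two text-based formats above can be parsed along these lines. This is a sketch under stated assumptions rather than the project's parser: the tag names and the `[Ticket ...]` syntax come from the text, while the helper names, the rule that multiple dependencies are comma-separated inside `depends_on`, and the use of the standard-library `graphlib` module for ordering are illustrative choices.

```python
import re
from graphlib import TopologicalSorter

# A <file_path> tag followed by its <file_content> tag.
FILE_BLOCK = re.compile(
    r"<file_path>(.*?)</file_path>\s*<file_content>\n?(.*?)</file_content>",
    re.DOTALL,
)
TICKET = re.compile(r"\[Ticket\s+([^\]]*)\]")
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def extract_files(reply: str) -> dict[str, str]:
    """Map each file path to its raw content; no unescaping is needed."""
    return {path.strip(): body for path, body in FILE_BLOCK.findall(reply)}


def parse_tickets(reply: str) -> dict[str, set[str]]:
    """Build {ticket_id: set of ticket ids it depends on} from the flat list."""
    graph = {}
    for match in TICKET.finditer(reply):
        attrs = dict(ATTRIBUTE.findall(match.group(1)))
        # Assumption: several dependencies are written as "a,b,c".
        deps = attrs.get("depends_on", "")
        graph[attrs["id"]] = {d.strip() for d in deps.split(",") if d.strip()}
    return graph


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Order tickets so dependencies come first; a cycle raises CycleError."""
    return list(TopologicalSorter(graph).static_order())
```

For the example from the list, `parse_tickets` turns `[Ticket id="tkt_stub"]` and `[Ticket id="tkt_impl" depends_on="tkt_stub"]` into a two-node graph, and `execution_order` schedules `tkt_stub` before `tkt_impl`. A cycle in the model's output surfaces as an exception instead of a silently malformed plan.
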