# Data Pipelines, Memory Views & Configuration The 4-Tier Architecture relies on strictly managed data pipelines and configuration files to prevent token bloat and maintain a deterministically safe execution environment. ## 1. AST Extraction Pipelines (Memory Views) To prevent LLMs from hallucinating or consuming massive context windows, raw file text is heavily restricted. The `file_cache.py` uses Tree-sitter for deterministic Abstract Syntax Tree (AST) parsing to generate specific views: 1. **The Directory Map (Tier 1):** Just filenames and nested paths (e.g., output of `tree /F`). No source code. 2. **The Skeleton View (Tier 2 & 3 Dependencies):** Extracts only `class` and `def` signatures, parameters, and type hints. Strips all docstrings and function bodies, replacing them with `pass`. Used for foreign modules a worker must call but not modify. 3. **The Curated Implementation View (Tier 2 Target Modules):** * Keeps class/struct definitions. * Keeps module-level docstrings and block comments (heuristics). * Keeps full bodies of functions marked with `@core_logic` or `# [HOT]`. * Replaces standard function bodies with `... # Hidden`. 4. **The Raw View (Tier 3 Target File):** Unredacted, line-by-line source code of the *single* file a Tier 3 worker is assigned to modify. ## 2. Configuration Schema The architecture separates sensitive billing logic from AI behavior routing. * **`credentials.toml` (Security Prerequisite):** Holds the bare metal authentication (`gemini_api_key`, `anthropic_api_key`, `deepseek_api_key`). **This file must be in `.gitignore`.** Loaded strictly for instantiating HTTP clients. * **`project.toml` (Repo Rules):** Holds repository-specific bounds (e.g., "This project uses Python 3.12 and strictly follows PEP8"). * **`agents.toml` (AI Routing):** Defines the hardcoded hierarchy's operational behaviors. Includes fallback models (`default_expensive`, `default_cheap`), Tier 1/2 overarching parameters (temperature, base system prompts), and Tier 3 worker archetypes (`refactor`, `codegen`, `contract_stubber`) mapped to specific models (DeepSeek V3, Gemini Flash) and `trust_level` tags (`step` vs. `auto`). ## 3. LLM Output Formats To ensure robust parser execution and avoid JSON string-escaping nightmares, the architecture uses a hybrid approach for LLM outputs depending on the Tier: * **Native Structured Outputs (JSON Schema forced by API):** Used for Tier 1 and Tier 2 routing and orchestration. The model provider mathematically guarantees the syntax, allowing clean parsing of `Track` and `Ticket` metadata by `pydantic`. * **XML Tags (``, ``):** Used for Tier 3 Code Generation & Tools. It natively isolates syntax and requires zero string escaping. The UI/Orchestrator parses these via regex to safely extract raw Python code without bracket-matching failures. * **Godot ECS Flat List (Linearized Entities with ID Pointers):** Instead of deeply nested JSON (which models hallucinate across 500 tokens), Tier 1/2 Orchestrators define complex dependency DAGs as a flat list of items (e.g., `[Ticket id="tkt_impl" depends_on="tkt_stub"]`). The Python state machine reconstructs the DAG locally.