# Data Pipelines, Memory Views & Configuration

The 4-Tier Architecture relies on strictly managed data pipelines and configuration files to prevent token bloat and maintain a deterministically safe execution environment.

## 1. AST Extraction Pipelines (Memory Views)

To prevent LLMs from hallucinating or consuming massive context windows, raw file text is heavily restricted. The `file_cache.py` uses Tree-sitter for deterministic Abstract Syntax Tree (AST) parsing to generate specific views:

1.  **The Directory Map (Tier 1):** Just filenames and nested paths (e.g., output of `tree /F`). No source code.
2.  **The Skeleton View (Tier 2 & 3 Dependencies):** Extracts only `class` and `def` signatures, parameters, and type hints. Strips all docstrings and function bodies, replacing them with `pass`. Used for foreign modules a worker must call but not modify.
3.  **The Curated Implementation View (Tier 2 Target Modules):**
    *   Keeps class/struct definitions.
    *   Keeps module-level docstrings and block comments (heuristics).
    *   Keeps full bodies of functions marked with `@core_logic` or `# [HOT]`.
    *   Replaces standard function bodies with `... # Hidden`.
4.  **The Raw View (Tier 3 Target File):** Unredacted, line-by-line source code of the *single* file a Tier 3 worker is assigned to modify.

## 2. Configuration Schema

The architecture separates sensitive billing logic from AI behavior routing.

*   **`credentials.toml` (Security Prerequisite):** Holds the bare metal authentication (`gemini_api_key`, `anthropic_api_key`, `deepseek_api_key`). **This file must be in `.gitignore`.** Loaded strictly for instantiating HTTP clients.
*   **`project.toml` (Repo Rules):** Holds repository-specific bounds (e.g., "This project uses Python 3.12 and strictly follows PEP8").
*   **`agents.toml` (AI Routing):** Defines the hardcoded hierarchy's operational behaviors. Includes fallback models (`default_expensive`, `default_cheap`), Tier 1/2 overarching parameters (temperature, base system prompts), and Tier 3 worker archetypes (`refactor`, `codegen`, `contract_stubber`) mapped to specific models (DeepSeek V3, Gemini Flash) and `trust_level` tags (`step` vs. `auto`).

## 3. LLM Output Formats

To ensure robust parser execution and avoid JSON string-escaping nightmares, the architecture uses a hybrid approach for LLM outputs depending on the Tier:

*   **Native Structured Outputs (JSON Schema forced by API):** Used for Tier 1 and Tier 2 routing and orchestration. The model provider mathematically guarantees the syntax, allowing clean parsing of `Track` and `Ticket` metadata by `pydantic`.
*   **XML Tags (`<file_path>`, `<file_content>`):** Used for Tier 3 Code Generation & Tools. It natively isolates syntax and requires zero string escaping. The UI/Orchestrator parses these via regex to safely extract raw Python code without bracket-matching failures.
*   **Godot ECS Flat List (Linearized Entities with ID Pointers):** Instead of deeply nested JSON (which models hallucinate across 500 tokens), Tier 1/2 Orchestrators define complex dependency DAGs as a flat list of items (e.g., `[Ticket id="tkt_impl" depends_on="tkt_stub"]`). The Python state machine reconstructs the DAG locally.