Data Pipelines, Memory Views & Configuration

The 4-Tier Architecture relies on strictly managed data pipelines and configuration files to prevent token bloat and keep the execution environment deterministic and safe.

1. AST Extraction Pipelines (Memory Views)

To keep LLMs from hallucinating or filling their context windows, access to raw file text is heavily restricted. file_cache.py uses Tree-sitter for deterministic Abstract Syntax Tree (AST) parsing and generates the following views:

The Directory Map (Tier 1): Just filenames and nested paths (e.g., output of tree /F). No source code.
The Skeleton View (Tier 2 & 3 Dependencies): Extracts only class and def signatures, parameters, and type hints. Strips all docstrings and function bodies, replacing them with pass. Used for foreign modules a worker must call but not modify.
The Curated Implementation View (Tier 2 Target Modules):
- Keeps class/struct definitions.
- Keeps module-level docstrings and block comments (selected by heuristics).
- Keeps full bodies of functions marked with @core_logic or # [HOT].
- Replaces standard function bodies with ... # Hidden.
The Raw View (Tier 3 Target File): Unredacted, line-by-line source code of the single file a Tier 3 worker is assigned to modify.

2. Configuration Schema

The architecture separates sensitive, billing-linked credentials from AI behavior routing.

credentials.toml (Security Prerequisite): Holds the raw API keys (gemini_api_key, anthropic_api_key, deepseek_api_key). This file must be listed in .gitignore. It is loaded only to instantiate HTTP clients.
project.toml (Repo Rules): Holds repository-specific bounds (e.g., "This project uses Python 3.12 and strictly follows PEP8").
agents.toml (AI Routing): Defines the hardcoded hierarchy's operational behaviors. Includes fallback models (default_expensive, default_cheap), Tier 1/2 overarching parameters (temperature, base system prompts), and Tier 3 worker archetypes (refactor, codegen, contract_stubber) mapped to specific models (DeepSeek V3, Gemini Flash) and trust_level tags (step vs. auto).

3. LLM Output Formats

To ensure robust parser execution and avoid JSON string-escaping nightmares, the architecture uses a hybrid approach for LLM outputs depending on the Tier:

Native Structured Outputs (JSON Schema forced by API): Used for Tier 1 and Tier 2 routing and orchestration. The model provider enforces the supplied schema on the API side, so the response is syntactically valid JSON and pydantic can parse Track and Ticket metadata cleanly.
XML Tags (<file_path>, <file_content>): Used for Tier 3 Code Generation & Tools. The tags delimit code verbatim, so no string escaping is required. The UI/Orchestrator parses them via regex to safely extract raw Python code without bracket-matching failures.
Godot ECS Flat List (Linearized Entities with ID Pointers): Instead of deeply nested JSON (which models tend to get wrong once the structure runs to roughly 500 tokens), Tier 1/2 Orchestrators define complex dependency DAGs as a flat list of items (e.g., [Ticket id="tkt_impl" depends_on="tkt_stub"]). The Python state machine reconstructs the DAG locally.
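
The two text-based formats above can be parsed with a few lines of standard-library Python. The sketch below is an assumption-laden illustration, not the orchestrator's actual parser: the regexes, the helper names (extract_files, build_dag, execution_order), and the convention that depends_on may hold a comma-separated list of IDs are all hypothetical. The DAG is ordered with graphlib.TopologicalSorter (Python 3.9+), which raises CycleError if the tickets depend on each other in a loop.

```python
import re
from graphlib import TopologicalSorter

# <file_path>...</file_path> followed by <file_content>...</file_content>
FILE_BLOCK = re.compile(
    r"<file_path>(.*?)</file_path>\s*<file_content>\n?(.*?)</file_content>",
    re.DOTALL,
)
# [Ticket id="..." depends_on="..."]
TICKET = re.compile(r"\[Ticket\s+([^\]]*)\]")
ATTR = re.compile(r'(\w+)="([^"]*)"')


def extract_files(reply: str) -> dict:
    """Map each file path to its raw content; no unescaping is needed."""
    return {m.group(1).strip(): m.group(2) for m in FILE_BLOCK.finditer(reply)}


def build_dag(reply: str) -> dict:
    """Rebuild {ticket_id: set_of_dependencies} from the flat ticket list."""
    graph = {}
    for match in TICKET.finditer(reply):
        attrs = dict(ATTR.findall(match.group(1)))
        deps = attrs.get("depends_on", "")
        graph[attrs["id"]] = {d.strip() for d in deps.split(",") if d.strip()}
    # Reject pointers to tickets the model never declared.
    unknown = {d for deps in graph.values() for d in deps} - set(graph)
    if unknown:
        raise ValueError(f"unknown ticket ids: {sorted(unknown)}")
    return graph


def execution_order(graph: dict) -> list:
    """Order tickets so every dependency comes before its dependents."""
    return list(TopologicalSorter(graph).static_order())
```

Validating ID pointers before sorting means a hallucinated dependency fails loudly at parse time instead of silently producing a ticket that can never run.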
