Data Pipelines, Memory Views & Configuration
The 4-Tier Architecture relies on strictly managed data pipelines and configuration files to prevent token bloat and maintain a deterministically safe execution environment.
1. AST Extraction Pipelines (Memory Views)
To keep LLMs from hallucinating or consuming massive context windows, access to raw file text is heavily restricted. `file_cache.py` uses Tree-sitter for deterministic Abstract Syntax Tree (AST) parsing to generate specific views:
- The Directory Map (Tier 1): Just filenames and nested paths (e.g., output of `tree /F`). No source code.
- The Skeleton View (Tier 2 & 3 Dependencies): Extracts only `class` and `def` signatures, parameters, and type hints. Strips all docstrings and function bodies, replacing them with `pass`. Used for foreign modules a worker must call but not modify.
- The Curated Implementation View (Tier 2 Target Modules):
  - Keeps class/struct definitions.
  - Keeps module-level docstrings and block comments (heuristics).
  - Keeps full bodies of functions marked with `@core_logic` or `# [HOT]`.
  - Replaces standard function bodies with `... # Hidden`.
- The Raw View (Tier 3 Target File): Unredacted, line-by-line source code of the single file a Tier 3 worker is assigned to modify.
2. Configuration Schema
The architecture separates sensitive billing logic from AI behavior routing.
- `credentials.toml` (Security Prerequisite): Holds the bare metal authentication (`gemini_api_key`, `anthropic_api_key`, `deepseek_api_key`). This file must be in `.gitignore`. Loaded strictly for instantiating HTTP clients.
- `project.toml` (Repo Rules): Holds repository-specific bounds (e.g., "This project uses Python 3.12 and strictly follows PEP8").
- `agents.toml` (AI Routing): Defines the hardcoded hierarchy's operational behaviors. Includes fallback models (`default_expensive`, `default_cheap`), Tier 1/2 overarching parameters (temperature, base system prompts), and Tier 3 worker archetypes (`refactor`, `codegen`, `contract_stubber`) mapped to specific models (DeepSeek V3, Gemini Flash) and `trust_level` tags (`step` vs. `auto`).
3. LLM Output Formats
To ensure robust parser execution and avoid JSON string-escaping nightmares, the architecture uses a hybrid approach for LLM outputs depending on the Tier:
- Native Structured Outputs (JSON Schema forced by API): Used for Tier 1 and Tier 2 routing and orchestration. The model provider enforces the schema at generation time, so the syntax is guaranteed, allowing clean parsing of `Track` and `Ticket` metadata by `pydantic`.
- XML Tags (`<file_path>`, `<file_content>`): Used for Tier 3 Code Generation & Tools. The tags isolate the code from the surrounding syntax and require zero string escaping. The UI/Orchestrator parses these via regex to safely extract raw Python code without bracket-matching failures.
- Godot ECS Flat List (Linearized Entities with ID Pointers): Instead of deeply nested JSON (which models tend to hallucinate once it stretches across 500 tokens), Tier 1/2 Orchestrators define complex dependency DAGs as a flat list of items (e.g., `[Ticket id="tkt_impl" depends_on="tkt_stub"]`). The Python state machine reconstructs the DAG locally.
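The two text-based formats above can be sketched as a pair of small parsers: one pulls file payloads out of `<file_path>`/`<file_content>` tags with a regex, and one rebuilds an execution order from a flat ticket list. The function names, the exact regexes, and the assumption that `depends_on` may hold a comma-separated list of IDs are illustrative, not taken from the project. The ordering uses the standard-library `graphlib` module (Python 3.9+).

```python
import re
from graphlib import TopologicalSorter

# Tier 3 output: a path tag followed by a content tag. DOTALL lets the
# content span multiple lines; the lazy match stops at the first closing tag.
FILE_RE = re.compile(
    r"<file_path>(?P<path>.*?)</file_path>\s*"
    r"<file_content>\n?(?P<content>.*?)</file_content>",
    re.DOTALL,
)

# Tier 1/2 flat list: items such as [Ticket id="a" depends_on="b"].
TICKET_RE = re.compile(r"\[Ticket\s+(?P<attrs>[^\]]*)\]")
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def extract_files(reply: str) -> dict[str, str]:
    """Map each file path to its raw content; no unescaping is needed."""
    return {m["path"].strip(): m["content"] for m in FILE_RE.finditer(reply)}


def ticket_order(flat_list: str) -> list[str]:
    """Rebuild the dependency DAG and return ticket IDs, dependencies first.

    Assumption: depends_on holds zero or more comma-separated ticket IDs.
    Raises graphlib.CycleError if the tickets form a cycle.
    """
    graph: dict[str, list[str]] = {}
    for match in TICKET_RE.finditer(flat_list):
        attrs = dict(ATTR_RE.findall(match["attrs"]))
        deps = [d.strip() for d in attrs.get("depends_on", "").split(",") if d.strip()]
        graph[attrs["id"]] = deps
    return list(TopologicalSorter(graph).static_order())
```

Because each ticket only names its direct dependencies by ID, the model never has to keep nested brackets balanced, and a malformed graph surfaces as a `CycleError` in the orchestrator rather than as a parse failure.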