SSDL Report: Context Curation & Caching Pipeline

Track/Context: Technical Architecture Reference
Date: 2026-06-13
Status: Completed
Subject: SSDL trace and architectural analysis of the context curation and aggregation pipeline.

1. Architectural Overview

The Context Curation Pipeline (src/aggregate.py) compiles the active state of the workspace (source code files, screenshot attachments, history logs, and RAG search results) into a unified context document that is injected into the LLM prompt.

To control the token footprint and avoid overloading model context windows, the aggregator dynamically chooses each file's representation, from full content to compressed formats (diffs, AST skeletons, signatures, slices, or summaries), depending on the active agent tier, persona configuration, and user overrides.
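To illustrate the diff format, the sketch below renders a file as a unified diff against a previously sent snapshot, using Python's standard difflib module. The helper name render_diff and the idea of diffing against a cached snapshot are assumptions made for illustration; the actual representation in src/aggregate.py may differ.

```python
import difflib


def render_diff(path: str, previous: str, current: str, context: int = 1) -> str:
    """Render `current` as a unified diff against a previously sent snapshot.

    Hypothetical helper. Returns "" when the file is unchanged, so the file
    can be left out of the prompt entirely.
    """
    diff_lines = difflib.unified_diff(
        previous.splitlines(keepends=True),
        current.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
        n=context,  # lines of unchanged context kept around each hunk
    )
    return "".join(diff_lines)


old = "".join(f"line {i}\n" for i in range(50))
new = old.replace("line 25\n", "line 25 (edited)\n")
patch = render_diff("src/example.py", old, new)
print(patch)
print(f"full file: {len(new)} chars, diff: {len(patch)} chars")
```

A one-line edit in a 50-line file produces a single small hunk, which is where the token savings of this format come from.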

2. SSDL Topology Diagram

The diagram below traces the context compilation process using SSDL execution shapes ([I], ->, [Q], [S], [B], [M], o->, [T]):

===================================================================================================
                                  CONTEXT PIPELINE TOPOLOGY
===================================================================================================

[Q:flat_config]
       │
       ▼
[I:build_file_items] (read file sizes, contents, mtimes)
       │
       ▼
o-> [B:Focus or Tier 3?] ─── yes ───► [I:Render Full File Content] ───────────┐
       │                                                                      │
       └─ no                                                                  │
           │                                                                  │
           ├─ [B:Slices Configured?] ─── yes ───► [I:FuzzyAnchor.resolve] ────┤
           │                                      (skips non-targeted lines)  │
           │                                                                  │
           ├─ [B:AST Symbol Mask?] ───── yes ───► [I:ts_c_get_definition] ────┤
           │                                      (extract signature/def)     │
           │                                                                  │
           ├─ [B:AST Skeleton/Sig?] ──── yes ───► [I:parser.get_skeleton] ────┤
           │                                                                  │
           └─ no ───────────────────────────────► [I:summarise_file] ─────────┤
                                                                              │
                                                                              ▼
                                                                    [I:build_screenshots]
                                                                    (encode attachments)
                                                                              │
                                                                              ▼
                                                                    [I:build_discussion]
                                                                    (scrollback formatting)
                                                                              │
                                                                              ▼
                                                                    [I:write_output_file]
                                                                    (dump md output)
                                                                              │
                                                                              ▼
                                                                             [T]

3. Context Processing Stages

Stage 1: State Extraction (`build_file_items`)

The pipeline gathers file items, recording each file's filesystem metadata (mtime and byte size) and reading its full UTF-8 contents to establish a baseline:

SSDL shape: [I:build_file_items] ─── (I/O read files)
Details: Metadata and contents are read once and cached, so downstream rendering never repeats the I/O.
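A minimal sketch of this single-pass read, assuming a simple immutable FileItem record. The field names and this build_file_items signature are hypothetical and may not match src/aggregate.py.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileItem:
    """One workspace file captured at aggregation time (hypothetical shape)."""
    path: str
    size: int      # byte size from the filesystem
    mtime: float   # last-modification time, seconds since the epoch
    content: str   # full UTF-8 text


def build_file_items(paths: list[str]) -> list[FileItem]:
    """Read metadata and contents once; later stages render from this list."""
    items = []
    for p in paths:
        path = Path(p)
        st = path.stat()
        # errors="replace" keeps one undecodable file from aborting the run.
        content = path.read_text(encoding="utf-8", errors="replace")
        items.append(FileItem(str(path), st.st_size, st.st_mtime, content))
    return items


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "hello.py"
    sample.write_bytes(b"print('hi')\n")
    items = build_file_items([str(sample)])

print(items[0].size, repr(items[0].content))
```

Because FileItem is frozen and the contents are captured up front, every downstream renderer sees the same snapshot even after the temporary file is gone.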

Stage 2: Format Routing Check

For each file item in the workspace context, the pipeline applies a priority-based routing tree to select the optimal token-saving representation:

Amnesia / Priority Bypass: If a file is currently focused in the editor or the active agent is Tier 3 (Worker), the system bypasses compression and dumps the entire file:
[B:Focus or Tier 3?] ─── yes ───► [I:Render Full File Content]
Line Slices: If the user has marked specific line ranges of interest, the pipeline resolves them with FuzzyAnchor so they survive local edits, and omits the non-targeted lines:
[B:Slices Configured?] ─── yes ───► [I:FuzzyAnchor.resolve]
AST Symbol Masking: If an AST mask is defined, it targets specific classes or methods and extracts their signatures/definitions using AST or Tree-Sitter MCP tools:
[B:AST Symbol Mask?] ──── yes ───► [I:ts_c_get_definition]
AST Skeleton & Outline: Otherwise, if a skeleton or signature format applies, the file is rendered as a Tree-Sitter AST outline or skeleton (Python/C/C++):
[B:AST Skeleton/Sig?] ─── yes ───► [I:parser.get_skeleton]
Summarization Fallback: If no specific format is matched, the file is summarized into a high-level text description:
[I:summarise_file]
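The routing tree above can be sketched as a first-match priority function. The RoutingInput fields, the Format enum, and the numeric tier check are hypothetical stand-ins for the real persona and override configuration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Format(Enum):
    FULL = "full"
    SLICES = "slices"
    SYMBOL_MASK = "symbol_mask"
    SKELETON = "skeleton"
    SUMMARY = "summary"


@dataclass
class RoutingInput:
    """Hypothetical per-file routing flags; field names are illustrative."""
    focused: bool = False
    slices: list[tuple[str, str]] = field(default_factory=list)  # (start, end) anchor text
    ast_mask: list[str] = field(default_factory=list)            # e.g. ["MyClass.run"]
    wants_skeleton: bool = False


def choose_format(item: RoutingInput, agent_tier: int) -> Format:
    """Walk the Stage 2 routing tree in priority order; the first match wins."""
    if item.focused or agent_tier == 3:
        return Format.FULL          # priority bypass: no compression
    if item.slices:
        return Format.SLICES        # resolved later via fuzzy anchors
    if item.ast_mask:
        return Format.SYMBOL_MASK   # targeted signatures/definitions only
    if item.wants_skeleton:
        return Format.SKELETON      # Tree-Sitter outline/skeleton
    return Format.SUMMARY           # fallback: high-level summary
```

Ordering the checks this way means a focused file is never compressed, even if it also has slices or a mask configured.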

Stage 3: Aggregation & Serialization

Screenshots: Formats screenshot attachments as base64 or reference links.
Discussion Scrollback: Formats dialogue history into a readable scrollback.
File Dump: Writes the finalized prompt document to a sequentially numbered project file (e.g., project_001.md):
[I:write_output_file]
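The numbered output file can be produced by scanning for existing project_NNN.md files and taking the next index. This is a sketch: next_output_path, the three-digit padding, and this write_output_file signature are assumptions inferred from the project_001.md example, not the actual implementation.

```python
import re
import tempfile
from pathlib import Path


def next_output_path(out_dir: Path, stem: str = "project") -> Path:
    """Return the next numbered path, e.g. project_001.md, then project_002.md."""
    pattern = re.compile(rf"^{re.escape(stem)}_(\d+)\.md$")
    indices = [
        int(m.group(1))
        for p in out_dir.iterdir()
        if p.is_file() and (m := pattern.match(p.name))
    ]
    return out_dir / f"{stem}_{max(indices, default=0) + 1:03d}.md"


def write_output_file(out_dir: Path, sections: list[str]) -> Path:
    """Join rendered sections (files, screenshots, discussion) and write them out.

    Hypothetical signature. Mode "x" refuses to overwrite an existing file,
    so two concurrent runs cannot silently clobber the same project_NNN.md.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    target = next_output_path(out_dir)
    with target.open("x", encoding="utf-8") as fh:
        fh.write("\n\n".join(sections))
    return target


with tempfile.TemporaryDirectory() as tmp:
    first = write_output_file(Path(tmp), ["# Files", "placeholder section"])
    second = write_output_file(Path(tmp), ["# Discussion", "placeholder section"])
    first_text = first.read_text(encoding="utf-8")
    names = (first.name, second.name)

print(names)  # ('project_001.md', 'project_002.md')
```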

4. Architectural Invariants

Fuzzy Anchor Resilience: Slices and masks do not rely on hardcoded line numbers. The FuzzyAnchor resolution algorithm re-locates each slice's anchor boundaries, so slices stay correct even if the surrounding code has shifted.
Single-Pass I/O: Files are read from disk exactly once, at the beginning of run(), to populate file_items. Every later stage renders from that snapshot, so a file modified mid-aggregation cannot appear in two different versions within one document.
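Token Caching Strategy: Using AST outlines and unified diffs rather than full files keeps the prompt smaller and more stable between turns. Because Anthropic's prompt caching reuses only an unchanged prompt prefix, edits to unrelated code (for example, inside a function body that an outline omits) no longer alter the cached content, which improves cache hit rates.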
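The report does not specify how FuzzyAnchor resolves anchors. One plausible approach, sketched here as an assumption rather than the actual algorithm, is to prefer an exact match of the anchor line closest to its last-known position and otherwise fall back to the most similar line via difflib.SequenceMatcher, above a similarity threshold. Anchor, resolve_anchor, resolve_slice, and min_ratio are hypothetical names.

```python
import difflib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Anchor:
    """Hypothetical anchor: a boundary line's text plus its last-known index."""
    text: str
    hint: int  # 0-based line index recorded when the slice was created


def resolve_anchor(lines: list[str], anchor: Anchor, min_ratio: float = 0.8) -> Optional[int]:
    """Locate an anchor line even if it moved or was lightly edited.

    1. Prefer exact matches (ignoring indentation), closest to the hint.
    2. Otherwise take the most similar line, ties broken by distance to the
       hint, provided its similarity is at least min_ratio.
    Returns None when nothing is close enough.
    """
    target = anchor.text.strip()
    exact = [i for i, line in enumerate(lines) if line.strip() == target]
    if exact:
        return min(exact, key=lambda i: abs(i - anchor.hint))
    best, best_key = None, None
    for i, line in enumerate(lines):
        ratio = difflib.SequenceMatcher(None, line.strip(), target).ratio()
        if ratio < min_ratio:
            continue
        key = (-ratio, abs(i - anchor.hint))  # higher ratio first, then nearer
        if best_key is None or key < best_key:
            best, best_key = i, key
    return best


def resolve_slice(lines: list[str], start: Anchor, end: Anchor) -> Optional[list[str]]:
    """Return the lines between two anchors (inclusive), or None if unresolved."""
    s = resolve_anchor(lines, start)
    e = resolve_anchor(lines, end)
    if s is None or e is None or e < s:
        return None
    return lines[s : e + 1]
```

Returning None instead of guessing lets the caller fall back to another format (for example, a skeleton) when an anchor can no longer be found.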
