conductor-tier2 161ebb0da6 docs(fix): correct nav link case + relative-path level

Gitea (and any case-sensitive filesystem) was rendering the [Top]
nav links in /docs as broken because of two bugs:

1. Case-sensitivity: 22 links used '../README.md' (all-uppercase)
   but the actual file is 'docs/Readme.md' (capital R, lowercase
   rest). 21 guide_*.md nav bars were affected, plus 1 internal
   cross-link in Readme.md itself. Works on Windows (case-
   insensitive) but broken on Linux/Gitea.

   Fix: 22 occurrences across 22 files changed
   '../README.md' -> '../Readme.md'

2. Wrong relative-path level: 16 links used '../../conductor/...'
   from 'docs/guide_*.md' to reach 'conductor/'. This goes up 2
   levels to 'projects/', which doesn't exist. The correct path
   from 'docs/guide_*.md' to 'conductor/' is 1 level up
   ('../conductor/...'). 12 unique patterns across 10 files
   affected.

   Fix: 16 occurrences across 10 files changed
   '../../conductor/' -> '../conductor/'

3. Bonus: 1 planned-guide link in guide_context_curation.md
   referenced a never-written 'guide_context_presets.md'. The
   ContextPreset schema is now fully covered in the new
   'guide_context_aggregation.md' (per the 2026-06-08 docs
   refresh). Fix: link target updated.

No content was changed, only link paths. 24 files, 37 link
replacements, 37 deletions.

Verification:
- All .md links in docs/ now resolve to existing files
  (validated by path-resolution check from each file's directory)
- The 3 new guides from the previous docs refresh commit
  (guide_discussions.md, guide_state_lifecycle.md,
  guide_context_aggregation.md) had the case bug inherited from
  guide_architecture.md's existing nav pattern; their top-of-file
  nav bars are now correct
- The 21 pre-existing guide nav bars that had the same bug
  (all 21 of them, except the 3 that used the correct case:
  guide_mma.md, guide_simulations.md, guide_tools.md) are now
  also fixed
- Inter-guide links (e.g. [Discussions](guide_discussions.md))
  were not affected; they were always correct because both the
  link text and the actual filename are lowercase

This is a docs-only fix. No code modified.

2026-06-08 19:51:55 -04:00


Context Aggregation: How Manual Slop Builds the AI's Context

Top | Discussions | Context Curation | Models | Architecture

Overview

src/aggregate.py (518 lines) is the context composition pipeline — the single function that turns a project's files + screenshots + history config into the final markdown string the AI sees. It is called by:

src/ai_client.py:_send_anthropic, _send_deepseek, _send_gemini, _send_gemini_cli, _send_minimax (every provider)
src/app_controller.py:AppController._do_generate (the main send path)
src/app_controller.py:AppController._cb_start_track, AppController._process_event_queue, AppController._start_track_logic (MMA paths)
src/gui_2.py:App.run, App.main, App._render_snapshot_tab (the GUI and the prior-session replay)
simulation/sim_base.py:run_sim and 6 other simulation entry points

This is one of the most-touched modules in the project. After the nagent_review, this pipeline is recognized as Manual Slop's strongest curation dimension (vs nagent's conversation-log dimension). See conductor/tracks/nagent_review_20260608/report.md §6 and decisions.md candidate #7 for the related future-track.

Domain classification. The pipeline is Application-domain. The MMA sub-agents consume it but the pipeline itself does not call into Meta-Tooling code. See guide_meta_boundary.md.

The Pipeline At A Glance

aggregate.run(config, aggregation_strategy)
  ├─ find_next_increment(output_dir, namespace)    # next file number for output
  ├─ build_file_items(base_dir, files)             # read + view-mode transform
  ├─ build_markdown_from_items(file_items, ...)   # compose sections
  │   ├─ ## Files (or Files (Summary) or Files (Tier 3 - Focused))
  │   │   └─ _build_files_section_from_items OR summarize.build_summary_markdown
  │   ├─ ## Screenshots (if any)
  │   ├─ ## Beads Mode: Progress Track (if execution_mode == "beads")
  │   └─ ## Discussion History (if any)
  └─ output_file.write_text(markdown)

The output is a markdown file at {output_dir}/{namespace}_{NNN}.md where NNN is a zero-padded increment. The pipeline only produces the markdown; sending it is the AI client's job.

The return value is (markdown: str, output_file: Path, file_items: list[dict]). The file_items list is reused by callers that want to inspect the read state without re-reading from disk.

The Three Aggregation Strategies

aggregation_strategy: str selects how files are rendered. The values:

| Strategy | File rendering | History rendering | Tier 3 handling | Use case |
|---|---|---|---|---|
| `auto` | If `summary_only` is True → summary; else → full | Standard | Standard | Default. Reads `config.project.summary_only`. |
| `summarize` | Always `summarize.build_summary_markdown(file_items)` (compact multi-file view) | Standard | Standard | Token-budget-constrained runs. |
| `full` | Always `_build_files_section_from_items(file_items)` (full content) | Standard | Standard | Debugging; when you want the AI to see everything. |

Implementation: aggregate.py:330-346 build_markdown_from_items. The three-way dispatch is at lines 335-339:

if   aggregation_strategy == "summarize": parts.append("## Files (Summary)\n\n" + summarize.build_summary_markdown(file_items))
elif aggregation_strategy == "full":      parts.append("## Files\n\n"           + _build_files_section_from_items(file_items))
else: # auto
    if summary_only: parts.append("## Files (Summary)\n\n" + summarize.build_summary_markdown(file_items))
    else:            parts.append("## Files\n\n"           + _build_files_section_from_items(file_items))

The auto strategy is the only one that respects config.project.summary_only; the other two are explicit overrides. Personas can also set aggregation_strategy (per guide_personas.md), and a persona-set strategy overrides the config-level setting.

View Modes — The Per-File Transform

view_mode: str is the per-file content transform. The value is set on the FileItem (or the legacy dict-shaped config entry) and determines how the file's bytes are rendered into the markdown.

| View mode | Behavior | Source |
|---|---|---|
| `full` | Raw `path.read_text(encoding="utf-8")` content. | `aggregate.py:205` |
| `summary` | `summarize.summarise_file(path, content)`: heuristic summary from `src/summarize.py`. | `aggregate.py:210` |
| `skeleton` | For `.py`: `ASTParser("python").get_skeleton(content)` (tree-sitter). For `.c`/`.h`: `mcp_client.ts_c_get_skeleton`. For `.cpp`/`.hpp`: `mcp_client.ts_cpp_get_skeleton`. Other → summary. | `aggregate.py:211-220` |
| `outline` | For `.py`: `ASTParser("python").get_code_outline(content)`. For C/C++: `mcp_client.ts_c*_get_code_outline`. Other → summary. | `aggregate.py:221-230` |
| `masked` | For each `{symbol: mode}` in `ast_mask`, fetch `def` or `sig` via `mcp_client.py/ts_*_get_definition/signature`. Concatenate. | `aggregate.py:231-249` |
| `none` | Literal string `"(context excluded)"`; the file is in the file_items list but contributes no content. | `aggregate.py:250` |
| `custom` | Render only the `custom_slices` from the FileItem. Each slice is a `{start_line, end_line, tag, comment}` dict. Lines outside the slices are excluded. | `aggregate.py:251-266` |

The default view mode is full. The persona can override via Persona.aggregation_strategy; the FileItem can override via FileItem.view_mode or FileItem.force_full (which forces full regardless of the FileItem's own setting).

Errors are graceful. A FileNotFoundError produces f"ERROR: file not found: {path}" content with error: True and mtime: 0.0. A view_mode that throws produces f"ERROR in {view_mode} view mode for {path}:\n{traceback.format_exc()}". Errors do not halt the pipeline.
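The error handling described above can be sketched as follows. This is a minimal illustration, not the code in aggregate.py: the helper name `read_item`, its `transform` parameter, and the exact dict keys other than `error` and `mtime` are assumptions made for the example.

```python
import traceback
from pathlib import Path
from typing import Callable


def read_item(path: Path, view_mode: str, transform: Callable[[str], str]) -> dict:
    """Hypothetical helper: read one file and apply a view-mode transform."""
    try:
        content = path.read_text(encoding="utf-8")
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        # Missing file: record the error in the item instead of raising.
        return {
            "path": str(path),
            "content": f"ERROR: file not found: {path}",
            "error": True,
            "mtime": 0.0,
        }
    try:
        rendered = transform(content)
    except Exception:
        # A failing transform becomes error text; the caller keeps going.
        rendered = f"ERROR in {view_mode} view mode for {path}:\n{traceback.format_exc()}"
    # Assumption: the source does not say whether `error` is set for a
    # transform failure, so this sketch leaves it False.
    return {"path": str(path), "content": rendered, "error": False, "mtime": mtime}
```

Because both failure paths return an ordinary item, a loop over many files can keep collecting results and the error text ends up in the markdown where the AI (and the user) can see it.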

The FileItem Schema (Full)

src/models.py:510-559 FileItem is the per-file curation memory that nagent_review identified as Manual Slop's strongest dimension. The dataclass has the identity field path, 9 mutable fields, and a __post_init__ normalizer:

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FileItem:
    path:            str                                            # the artifact identity (path-keyed, no inode)
    auto_aggregate:  bool = True                                    # include in auto-aggregation? (skip in build_*_from_items if False)
    force_full:      bool = False                                   # bypass view_mode; force raw content
    view_mode:       str = 'full'                                   # one of: full, summary, skeleton, outline, masked, custom, none
    selected:        bool = False                                   # for batch operations (the Context Panel multi-select)
    ast_signatures:  bool = False                                   # include only signatures (skeleton-equivalent shortcut)
    ast_definitions: bool = False                                   # include only definitions (skeleton-equivalent shortcut)
    # The defaults on the next three fields are assumed; the original excerpt omits them.
    ast_mask:        dict[str, str] = field(default_factory=dict)   # per-symbol mask: {symbol_path: 'def'|'sig'|'hide'} (from Structural File Editor)
    custom_slices:   list[dict] = field(default_factory=list)       # Fuzzy Anchor slices: {start_line, end_line, tag, comment, ...}
    injected_at:     Optional[float] = None                         # timestamp of last injection
All of the fields are serialized by to_dict() and deserialized by from_dict() (with .get(..., default) for forward compatibility). The dataclass is round-trip-safe through TOML.

__post_init__ normalizes custom_slices: each slice dict gets tag=None and comment=None defaults added so downstream code can .get("tag") safely.
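The normalizer and the dict round trip can be illustrated with a reduced stand-in. `FileItemSketch` is a hypothetical class carrying only a few of the fields, and its method bodies are assumptions based on the description above rather than the code in src/models.py.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class FileItemSketch:
    """Reduced, hypothetical stand-in for FileItem."""
    path: str
    view_mode: str = "full"
    force_full: bool = False
    custom_slices: list[dict] = field(default_factory=list)
    injected_at: Optional[float] = None

    def __post_init__(self) -> None:
        # Give every slice explicit tag/comment keys so .get("tag") is safe.
        for s in self.custom_slices:
            s.setdefault("tag", None)
            s.setdefault("comment", None)

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "FileItemSketch":
        # .get(..., default) tolerates keys that older saved data lacks.
        return cls(
            path=data["path"],
            view_mode=data.get("view_mode", "full"),
            force_full=data.get("force_full", False),
            custom_slices=[dict(s) for s in data.get("custom_slices", [])],
            injected_at=data.get("injected_at"),
        )


item = FileItemSketch(path="src/a.py", custom_slices=[{"start_line": 1, "end_line": 3}])
restored = FileItemSketch.from_dict(item.to_dict())
```

Here `restored == item` holds, and a dict containing only `path` still loads with every other field at its default.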

The Custom Slice Schema

A custom_slices entry is {start_line, end_line, tag, comment, ...} (plus Fuzzy Anchor metadata). The full schema is in src/fuzzy_anchor.py:FuzzyAnchor.create_slice:

{
    "start_line":  int,        # 1-based original line
    "end_line":    int,        # 1-based original line (inclusive)
    "tag":         str|None,   # human label, defaults to None
    "comment":     str|None,   # human comment, defaults to None
    "content_hash": str,       # SHA-256 of the slice content (for Fuzzy Anchor stability)
    "anchor_lines": [str, ...],# surrounding context for re-resolution
    # plus the original positioning metadata
}

When view_mode == 'custom', the aggregate.py:251-264 block renders each slice as:

---
[Slice: <tag>] (<comment>)
Lines <start>-<end>:
<content>

Multiple slices in a file are joined with \n\n.
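A minimal sketch of that rendering, assuming 1-based inclusive line numbers as in the schema above. The function name `render_custom_slices` is hypothetical, and how the real block prints a missing tag or comment is not stated in this guide; the sketch simply formats whatever value is present.

```python
def render_custom_slices(content: str, slices: list[dict]) -> str:
    """Hypothetical helper: render only the sliced line ranges of a file."""
    lines = content.splitlines()
    blocks = []
    for s in slices:
        start, end = s["start_line"], s["end_line"]
        # start_line/end_line are 1-based and inclusive.
        body = "\n".join(lines[start - 1:end])
        blocks.append(
            f"---\n[Slice: {s.get('tag')}] ({s.get('comment')})\n"
            f"Lines {start}-{end}:\n{body}"
        )
    # Multiple slices are joined with a blank line.
    return "\n\n".join(blocks)


text = "a\nb\nc\nd\n"
rendered = render_custom_slices(
    text, [{"start_line": 2, "end_line": 3, "tag": "mid", "comment": "two lines"}]
)
```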

The ContextPreset Schema

src/models.py:909-937 ContextPreset is a named, persisted set of FileItems — a reusable "context composition":

@dataclass
class ContextPreset:
    name:        str                                  # the preset name (used as TOML key)
    files:       list[ContextFileEntry] = field(default_factory=list)
    screenshots: list[str] = field(default_factory=list)
    description: str = ""

ContextFileEntry is a FileItem (or a string path that's promoted to a FileItem on load). The description is a human-readable label for the preset list.

ContextPresetManager (in src/context_presets.py, 30 lines) handles CRUD:

save_preset(preset: ContextPreset) writes to manual_slop.toml or a project TOML
load_all() -> dict[str, ContextPreset] reads all presets
delete_preset(name: str) removes a preset
apply_preset(name: str) switches the active context composition to the named preset

reload_context_presets() (in app_controller.py) is called when the project TOML changes; it validates that all files in the preset still exist and warns the user about any that don't.

Scope: ContextPresets can be Global (in <user_config>/manual_slop.toml) or Project-specific (in the project's manual_slop.toml). Project presets override global presets of the same name. This is the same scope-inheritance pattern as Personas, Presets, and Workspace Profiles.
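The override rule amounts to a dictionary merge in which project entries win. The sketch below uses plain dicts and a hypothetical `merge_presets` helper; the actual loading code in ContextPresetManager may be organized differently.

```python
def merge_presets(global_presets: dict, project_presets: dict) -> dict:
    """Hypothetical helper: project presets shadow global ones by name."""
    merged = dict(global_presets)   # start from the global scope
    merged.update(project_presets)  # same-name project presets replace them
    return merged


global_presets = {
    "review": {"files": ["README.md"]},
    "docs": {"files": ["docs/*.md"]},
}
project_presets = {"review": {"files": ["src/aggregate.py"]}}
effective = merge_presets(global_presets, project_presets)
```

In this example `effective["review"]` comes from the project scope while `effective["docs"]` is inherited from the global scope.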

The Discussion History Section

aggregate.py:109 build_discussion_section(history) builds the section that carries the prior conversation:

def build_discussion_section(history: list[Any]) -> str:
    sections = []
    for i, entry in enumerate(history, start=1):
        if isinstance(entry, dict):
            role    = entry.get("role", "Unknown")
            content = entry.get("content", "").strip()
            text    = f"{role}: {content}"
        else:
            text = str(entry).strip()
        sections.append(f"### Discussion Excerpt {i}\n\n{text}")
    return "\n\n---\n\n".join(sections)

The function handles both the legacy list[str] shape (e.g. ["User: ...", "AI: ..."]) and the newer list[dict] shape ([{"role": ..., "content": ...}, ...]). The dict shape is what _flush_disc_entries_to_project persists (per app_controller.py:3225-3240).

The section is named ## Discussion History and is placed at the end of the markdown (after files, screenshots, beads). This is deliberate: the cache-hit-friendly static prefix is at the top, the dynamic history is at the bottom. See guide_architecture.md §"Cache Hit Strategy".

Cache Strategy

The pipeline is structured to maximize provider cache hits. The static prefix (Files + Screenshots + Beads) is the same across all turns of a discussion; only the Discussion History changes. The provider's cache key is the prefix; the history is appended.

build_markdown_no_history (aggregate.py:348-353) is the explicit "static-only" builder used by _do_generate before adding the history. The full builder is build_markdown_from_items which adds the history if non-empty. This split allows the AI client to:

1. Send the static prefix once.
2. Append the history to the next send without re-sending the prefix.
3. Re-use the cached prefix on the third send (if the files haven't changed).

The cache strategy is documented in detail in guide_ai_client.md §"Caching Strategy" and guide_architecture.md §"Cache Hit Strategy".
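The prefix/history split can be sketched with two small functions. Both names (`build_static_prefix`, `with_history`) and the exact separators are assumptions for illustration; the real builders are build_markdown_no_history and build_markdown_from_items.

```python
def build_static_prefix(files_md: str, screenshots_md: str = "") -> str:
    """Hypothetical: join the sections that stay fixed across turns."""
    return "\n\n".join(part for part in (files_md, screenshots_md) if part)


def with_history(prefix: str, history_md: str) -> str:
    """Hypothetical: append the dynamic history after the static prefix."""
    if not history_md:
        return prefix
    return prefix + "\n\n## Discussion History\n\n" + history_md


prefix = build_static_prefix(
    "## Files\n\n### a.py\n\nx = 1",
    "## Screenshots\n\n![ui](ui.png)",
)
turn1 = with_history(prefix, "")
turn2 = with_history(prefix, "### Discussion Excerpt 1\n\nUser: hi")
```

Every turn's markdown starts with the same bytes as long as the files are unchanged, which is the property a prefix-keyed provider cache relies on.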

The Tier-3 Variant

aggregate.py:364-454 build_tier3_context builds the MMA worker context, a different layout for sub-agent invocations. The differences from the standard pipeline:

Focus files (passed as focus_files: list[str]) are rendered as full content regardless of their view_mode. A file is a focus file if its entry, name, or path matches one of the focus paths.
Slices are resolved via FuzzyAnchor. If a file has custom_slices and the file content has been modified since the slice was created, the FuzzyAnchor re-resolves the line ranges. This is critical for sub-agents receiving slices that may be stale.
Section header is ## Files (Tier 3 - Focused). Distinct from the standard ## Files so the worker (and its tools) can recognize its own context.
The is_focus check is multi-level. Entry match, name match, path match, and substring match. Sub-agents with looser file-matching needs can pass a focus set that's just a list of basenames.
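One way to read the multi-level check is sketched below. The item keys `entry` and `path`, the function signature, and the direction of the substring test (focus string contained in the file path) are all assumptions; build_tier3_context may implement the matching differently.

```python
from pathlib import PurePosixPath


def is_focus(item: dict, focus_files: list[str]) -> bool:
    """Hypothetical multi-level focus check for one file item."""
    entry = item.get("entry", "")
    path = str(item.get("path", "")).replace("\\", "/")
    name = PurePosixPath(path).name
    for focus in focus_files:
        if not focus:
            continue
        # Exact match against the config entry, the basename, or the path.
        if focus in (entry, name, path):
            return True
        # Loosest level: the focus string appears somewhere in the path.
        if focus in path:
            return True
    return False


item = {"entry": "src/*.py", "path": "/repo/src/aggregate.py"}
```

With this reading, `is_focus(item, ["aggregate.py"])` is true through the basename match, which is what lets a caller pass a focus set made only of basenames.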

The Tier 3 build skips the summarize.build_summary_markdown path entirely; every file is rendered with _build_files_section_from_items-style formatting (or the AST skeleton for non-focus Python files, or the AST signature/outline for C/C++).

The Tier 3 build is called from multi_agent_conductor.py:run_worker_lifecycle via aggregate.run(config, aggregation_strategy=tier_strategy).

The Bypass — `force_full`

FileItem.force_full = True short-circuits the view_mode selection:

if force_full: view_mode = "full"

This is set at the FileItem level (not the strategy level). Use case: the user has set a global "skeleton" view mode for the project but wants one specific file to always be inlined in full. The force is per-file and overrides both the FileItem's own view_mode and any strategy-level override.

For Tier 3, force_full is treated as a focus flag:

if is_focus or tier == 3 or force_full:
    # full content, no skeleton

So a force_full=True file in a Tier 3 worker context is treated as a focus file and rendered in full.

Auto-Aggregate Skip

FileItem.auto_aggregate = False causes the file to be included in the file_items list but excluded from the rendered markdown:

for item in file_items:
    if not item.get("auto_aggregate", True): continue
    # ... build section

Use case: the file is in the files list for the AI's awareness (e.g. "you can read it via read_file") but should not be inlined. The file's mtime and view_mode are still tracked; the file is omitted from the rendered markdown.

This is distinct from view_mode == "none":

auto_aggregate = False → file is not in the rendered markdown at all (no ### File header)
view_mode = "none" → file is in the rendered markdown as ### File (excluded) with a "(context excluded)" body

The two are useful for different scenarios. auto_aggregate = False is for "the AI knows the file exists, can read it on demand." view_mode = "none" is for "the AI knows we deliberately excluded this content."
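The difference shows up clearly in a small rendering loop. This is a sketch under assumptions: `build_files_section` is a hypothetical function, and the `### <path>` header is a simplified stand-in for whatever header format the real section builder emits.

```python
def build_files_section(file_items: list[dict]) -> str:
    """Hypothetical renderer contrasting auto_aggregate with view_mode 'none'."""
    sections = []
    for item in file_items:
        # auto_aggregate=False: no header and no body in the markdown.
        if not item.get("auto_aggregate", True):
            continue
        if item.get("view_mode") == "none":
            # The file is listed, but its content is replaced by a marker.
            body = "(context excluded)"
        else:
            body = item.get("content", "")
        sections.append(f"### {item['path']}\n\n{body}")
    return "\n\n".join(sections)


items = [
    {"path": "a.py", "content": "x = 1"},
    {"path": "b.py", "content": "y = 2", "auto_aggregate": False},
    {"path": "c.py", "content": "z = 3", "view_mode": "none"},
]
section = build_files_section(items)
```

In the result, `a.py` appears with its content, `b.py` does not appear at all, and `c.py` appears with only the "(context excluded)" marker.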

Screenshots

aggregate.py:126-140 build_screenshots_section renders the screenshots list as a ## Screenshots markdown section. Each screenshot is rendered as ![name](path) (markdown image syntax). Path resolution uses resolve_paths (same as for files), so wildcards and absolute paths work.

Screenshots are placed after Files and before Beads and Discussion History. This is a deliberate ordering: the AI sees the project's files first (the static content), then the screenshots (the visual context), then the beads status (if applicable), then the discussion history (the dynamic content).

Beads Mode

When execution_mode == "beads" (set in config.project.execution_mode), the pipeline appends a ## Beads Mode: Progress Track section between Screenshots and Discussion History. The section is built by aggregate.py:309-328 build_beads_section:

Lists all completed beads as a comma-separated list
Lists all active beads as bullet points with title, id, and description

build_beads_section returns an empty string if the project is not a Beads project (client.is_initialized() is False) or if there are no beads. The caller (build_markdown_from_items) checks the truthiness before appending.

See guide_beads.md for the full Beads integration.

Output File Numbering

find_next_increment(output_dir, namespace) (aggregate.py:36-44) scans output_dir for files matching ^{namespace}_(\d+)\.md$ and returns max_num + 1. The output filename is {namespace}_{NNN:03d}.md (zero-padded to 3 digits). The increment starts at 1 and grows monotonically.

The increment is the artifact identity for the conversation. Each turn produces a new file. The current implementation does not delete old files; the LogPruner (per guide_architecture.md) handles cleanup separately.
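The numbering scheme can be sketched as below. The body is an illustration written from the description above, not a copy of aggregate.py:36-44; in particular, escaping the namespace with `re.escape` and tolerating a missing directory are choices made for the sketch.

```python
import re
from pathlib import Path


def find_next_increment(output_dir: Path, namespace: str) -> int:
    """Sketch: return one more than the highest existing {namespace}_NNN.md."""
    pattern = re.compile(rf"^{re.escape(namespace)}_(\d+)\.md$")
    max_num = 0
    if output_dir.is_dir():
        for candidate in output_dir.iterdir():
            match = pattern.match(candidate.name)
            if match:
                max_num = max(max_num, int(match.group(1)))
    return max_num + 1


def output_name(namespace: str, number: int) -> str:
    """Zero-pad the increment to three digits."""
    return f"{namespace}_{number:03d}.md"
```

Because the function takes the maximum rather than counting files, gaps left by pruned files do not cause a number to be reused.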

Pipeline Callers

aggregate.run is called from many places. The most important:

| Caller | Purpose |
|---|---|
| `src/ai_client.py:_send_anthropic` | Build the markdown for an Anthropic send. |
| `src/ai_client.py:_send_gemini` | Build the markdown for a Gemini send. |
| `src/ai_client.py:_send_deepseek` | Build the markdown for a DeepSeek send. |
| `src/ai_client.py:_send_gemini_cli` | Build the markdown for a Gemini CLI send. |
| `src/ai_client.py:_send_minimax` | Build the markdown for a MiniMax send. |
| `src/app_controller.py:AppController._do_generate` | The main 1:1 send path. |
| `src/app_controller.py:AppController._cb_start_track` | Start a new MMA track. |
| `src/app_controller.py:AppController._process_event_queue` | Process a queued event (e.g. send, switch discussion). |
| `src/multi_agent_conductor.py:run_worker_lifecycle` | Spawn a Tier 3 worker (with Tier 3 context). |
| `src/gui_2.py:App.run` | The main GUI loop. |
| `src/gui_2.py:App._render_snapshot_tab` | Render a prior-session replay snapshot. |
| `simulation/sim_base.py:run_sim` | Run a simulation. |

The aggregation strategy is set per-call:

The main _do_generate uses config.project.aggregation_strategy (which is the persona-set strategy if a persona is active).
MMA worker contexts use the worker's aggregation_strategy from the ticket config.
The simulation uses a fixed auto.

Public API Surface

The public API of aggregate.py is:

| Function | Signature | Purpose |
|---|---|---|
| `find_next_increment` | `(output_dir: Path, namespace: str) -> int` | Next file number for output. |
| `resolve_paths` | `(base_dir: Path, entry: str) -> list[Path]` | Expand globs and absolute paths. Blacklist `history.toml` and `*_history.toml`. |
| `group_files_by_dir` | `(files: list[Any]) -> dict[str, list[Any]]` | Group FileItems by relative directory path (used by the Context Panel UI). |
| `compute_file_stats` | `(abs_path: str) -> dict[str, int]` | Line count + AST element count for Python files. |
| `build_file_items` | `(base_dir, files) -> list[dict]` | Read + view-mode transform per file. The most-called function. |
| `build_discussion_section` | `(history) -> str` | Render the `## Discussion History` markdown. |
| `build_screenshots_section` | `(base_dir, screenshots) -> str` | Render the `## Screenshots` markdown. |
| `build_beads_section` | `(base_dir) -> str` | Render the `## Beads Mode: Progress Track` markdown. |
| `build_markdown_from_items` | `(file_items, screenshot_base_dir, screenshots, history, summary_only, aggregation_strategy, execution_mode, base_dir) -> str` | Compose all sections. The "compose" function. |
| `build_markdown_no_history` | `(file_items, screenshot_base_dir, screenshots, summary_only, aggregation_strategy) -> str` | Compose without history (for stable caching). |
| `build_discussion_text` | `(history) -> str` | Just the history section, for callers that want to append to a pre-built static prefix. |
| `build_tier3_context` | `(file_items, screenshot_base_dir, screenshots, history, focus_files) -> str` | Tier 3 worker context. |
| `build_markdown` | `(base_dir, files, screenshot_base_dir, screenshots, history, summary_only, execution_mode) -> str` | Convenience: read files + compose. |
| `run` | `(config, aggregation_strategy) -> tuple[str, Path, list[dict]]` | The full pipeline. |
| `main` | `() -> None` | CLI entry point. Loads config, calls `run`, prints output path. |

Performance: the entire pipeline is O(N) in the number of files, and the per-file AST work is the most expensive step. build_tier3_context wraps its body in with get_monitor().scope("build_tier3_context") for performance monitoring, and build_file_items and build_markdown_no_history do the same with similar scopes. The monitor is documented in guide_architecture.md §"Performance".

Performance Considerations

The view_mode selection has a meaningful performance impact:

| view_mode | Per-file cost | When to use |
|---|---|---|
| `full` | 1 file read + string concat | Small files, files the user is actively editing. |
| `summary` | 1 file read + 1 heuristic call to `summarize.summarise_file` | Large files where structural info is enough. |
| `skeleton` | 1 file read + 1 tree-sitter parse + skeleton build | Python/C/C++ files where the structure matters more than the content. |
| `outline` | 1 file read + 1 tree-sitter parse + outline build | When the AI only needs the public API surface. |
| `masked` | 1 file read + N `mcp_client.py/ts_*_get_*` calls (one per masked symbol) | When the user has explicitly marked symbols as "def" or "sig". |
| `none` | 1 file read (still reads the bytes, just discards) | When the user wants the file listed in the markdown with its content deliberately excluded. |
| `custom` | 1 file read + line slicing per slice | When the user has explicitly created Fuzzy Anchor slices. |

The force_full = True and auto_aggregate = False flags skip some of the work:

force_full = True skips the view-mode dispatch and goes straight to raw content.
auto_aggregate = False skips the view-mode dispatch entirely and skips the markdown section build.

For very large codebases (1000+ files), the bottleneck is the tree-sitter parsing for skeleton / outline / masked modes. The Tier 3 builder uses ASTParser("python") lazily (if not parser: parser = ASTParser("python")) so the tree-sitter grammar is loaded only once per pipeline call.

Tests

tests/test_aggregate_flags.py — test_auto_aggregate_skip, test_force_full, test_view_mode_full, test_view_mode_summary, test_view_mode_skeleton, test_view_mode_outline, test_view_mode_none, test_view_mode_custom, test_view_mode_masked
tests/test_aggregate_beads.py — test_build_beads_compaction
tests/test_context_composition_phase3.py — test_group_files_by_dir, test_compute_file_stats
tests/test_context_composition_phase6.py — test_view_mode_default_summary, test_view_mode_full, test_view_mode_none, test_view_mode_outline, test_view_mode_skeleton, test_view_mode_summary, test_view_mode_custom, test_view_mode_custom_empty_default_to_summary, test_files_section_rendering
tests/test_tiered_context.py — test_build_tier3_context_exists, test_build_tier3_context_ast_skeleton, test_build_tier3_context_scaling, test_tiered_context_by_tier_field, test_build_file_items_with_tiers, test_build_files_section_with_dicts
tests/test_ast_masking_core.py — test_ast_masking_gencpp_samples
tests/test_gencpp_full_suite.py — test_gencpp_full_suite
tests/test_perf_aggregate.py — test_build_tier3_context_scaling
tests/test_history_management.py — test_aggregate_blacklist, test_aggregate_includes_segregated_history, test_aggregate_respects_*
tests/test_ui_summary_only_removal.py — test_aggregate_from_items_respects_auto_aggregate
tests/test_aggregate_helpers.py — test_resolve_paths_blacklist, test_resolve_paths_glob, test_resolve_paths_absolute
tests/test_aggregate_perf.py — test_find_next_increment_*

Cross-References

The pipeline source: src/aggregate.py (518 lines)
FileItem schema: src/models.py:510-559 FileItem
ContextPreset schema: src/models.py:909-937 ContextPreset
ContextPresetManager: src/context_presets.py (30 lines)
AI client consumption: src/ai_client.py:_send_<provider> × 5, see guide_ai_client.md
Tier 3 worker consumption: src/multi_agent_conductor.py:run_worker_lifecycle, see guide_multi_agent_conductor.md
Per-file curation features: guide_context_curation.md (Fuzzy Anchors, AST Inspector, Granular AST Control)
Cache strategy: guide_architecture.md §"Cache Hit Strategy", guide_ai_client.md §"Caching"
Discussion section builder: guide_discussions.md §"Persistence", src/aggregate.py:109 build_discussion_section
Deep-dive on the design philosophy: conductor/tracks/nagent_review_20260608/report.md §6 (per-file memory)
Actionable patterns for richer per-file memory: conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md §4 (file_id), §6 (git history), §7 (Meta-Tooling DSL)
Future-track candidate for per-file conversation log: conductor/tracks/nagent_review_20260608/decisions.md candidate #7
