Gitea (and any case-sensitive filesystem) was rendering the [Top]
nav links in /docs as broken because of two bugs:
1. Case-sensitivity: 22 links used '../README.md' (all-uppercase)
but the actual file is 'docs/Readme.md' (capital R, lowercase
rest). 21 guide_*.md nav bars were affected, plus 1 internal
cross-link in Readme.md itself. Works on Windows (case-
insensitive) but broken on Linux/Gitea.
Fix: 22 occurrences across 22 files changed
'../README.md' -> '../Readme.md'
2. Wrong relative-path level: 16 links used '../../conductor/...'
from 'docs/guide_*.md' to reach 'conductor/'. This goes up 2
levels to 'projects/', which doesn't exist. The correct path
from 'docs/guide_*.md' to 'conductor/' is 1 level up
('../conductor/...'). 12 unique patterns across 10 files
affected.
Fix: 16 occurrences across 10 files changed
'../../conductor/' -> '../conductor/'
3. Bonus: 1 planned-guide link in guide_context_curation.md
referenced a never-written 'guide_context_presets.md'. The
ContextPreset schema is now fully covered in the new
'guide_context_aggregation.md' (per the 2026-06-08 docs
refresh). Fix: link target updated.
No content was changed, only link paths. 24 files, 37 link
replacements, 37 deletions.
Verification:
- All .md links in docs/ now resolve to existing files
(validated by path-resolution check from each file's directory)
- The 3 new guides from the previous docs refresh commit
(guide_discussions.md, guide_state_lifecycle.md,
guide_context_aggregation.md) had the case bug inherited from
guide_architecture.md's existing nav pattern; their top-of-file
nav bars are now correct
- The pre-existing guide nav bars that had the same bug (all 21
pre-existing guides except the 3 that already used the correct
case: guide_mma.md, guide_simulations.md, guide_tools.md) are
now also fixed
- Inter-guide links (e.g. [Discussions](guide_discussions.md))
were not affected; they were always correct because both the
link text and the actual filename are lowercase
This is a docs-only fix. No code modified.
Context Aggregation: How Manual Slop Builds the AI's Context
Top | Discussions | Context Curation | Models | Architecture
Overview
src/aggregate.py (518 lines) is the context composition pipeline — the single function that turns a project's files + screenshots + history config into the final markdown string the AI sees. It is called by:
- src/ai_client.py:_send_anthropic, _send_deepseek, _send_gemini, _send_gemini_cli, _send_minimax (every provider)
- src/app_controller.py:AppController._do_generate (the main send path)
- src/app_controller.py:AppController._cb_start_track, AppController._process_event_queue, AppController._start_track_logic (MMA paths)
- src/gui_2.py:App.run, App.main, App._render_snapshot_tab (the GUI and the prior-session replay)
- simulation/sim_base.py:run_sim and 6 other simulation entry points
This is one of the most-touched modules in the project. After the nagent_review, this pipeline is recognized as Manual Slop's strongest curation dimension (vs nagent's conversation-log dimension). See conductor/tracks/nagent_review_20260608/report.md §6 and decisions.md candidate #7 for the related future-track.
Domain classification. The pipeline is Application-domain. The MMA sub-agents consume it but the pipeline itself does not call into Meta-Tooling code. See guide_meta_boundary.md.
The Pipeline At A Glance
aggregate.run(config, aggregation_strategy)
├─ find_next_increment(output_dir, namespace) # next file number for output
├─ build_file_items(base_dir, files) # read + view-mode transform
├─ build_markdown_from_items(file_items, ...) # compose sections
│ ├─ ## Files (or Files (Summary) or Files (Tier 3 - Focused))
│ │ └─ _build_files_section_from_items OR summarize.build_summary_markdown
│ ├─ ## Screenshots (if any)
│ ├─ ## Beads Mode: Progress Track (if execution_mode == "beads")
│ └─ ## Discussion History (if any)
└─ output_file.write_text(markdown)
The output is a markdown file at {output_dir}/{namespace}_{NNN}.md where NNN is a zero-padded increment. The pipeline only produces the markdown; sending it is the AI client's job.
The return value is (markdown: str, output_file: Path, file_items: list[dict]). The file_items list is reused by callers that want to inspect the read state without re-reading from disk.
The Three Aggregation Strategies
aggregation_strategy: str selects how files are rendered. The values:
| Strategy | File rendering | History rendering | Tier 3 handling | Use case |
|---|---|---|---|---|
| auto | If summary_only is True → summary; else → full | Standard | Standard | Default. Reads config.project.summary_only. |
| summarize | Always summarize.build_summary_markdown(file_items) (compact multi-file view) | Standard | Standard | Token-budget-constrained runs. |
| full | Always _build_files_section_from_items(file_items) (full content) | Standard | Standard | Debugging; when you want the AI to see everything. |
Implementation: aggregate.py:330-346 build_markdown_from_items. The three-way dispatch is at lines 335-339:
if aggregation_strategy == "summarize": parts.append("## Files (Summary)\n\n" + summarize.build_summary_markdown(file_items))
elif aggregation_strategy == "full": parts.append("## Files\n\n" + _build_files_section_from_items(file_items))
else: # auto
    if summary_only: parts.append("## Files (Summary)\n\n" + summarize.build_summary_markdown(file_items))
    else: parts.append("## Files\n\n" + _build_files_section_from_items(file_items))
The auto strategy is the only one that respects config.project.summary_only; the other two are explicit overrides. Personas can also set aggregation_strategy (per guide_personas.md), and a persona-set strategy overrides the config-level setting.
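The precedence described above can be sketched as a small resolver. This is a minimal sketch, not the project's code: `resolve_files_mode` and its parameters are hypothetical names, and it assumes that a persona strategy of `None` means "no persona override".

```python
from typing import Optional


def resolve_files_mode(config_strategy: str,
                       summary_only: bool,
                       persona_strategy: Optional[str] = None) -> str:
    """Return "summary" or "full" for the Files section (hypothetical helper)."""
    # Assumption: a persona-set strategy, when present, wins over the config value.
    strategy = persona_strategy or config_strategy
    if strategy == "summarize":
        return "summary"
    if strategy == "full":
        return "full"
    # Everything else is treated as "auto", the only branch that reads summary_only.
    return "summary" if summary_only else "full"
```

With this shape, `summary_only` only matters when neither the persona nor the config asks for an explicit override.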
View Modes — The Per-File Transform
view_mode: str is the per-file content transform. The value is set on the FileItem (or the legacy dict-shaped config entry) and determines how the file's bytes are rendered into the markdown.
| View mode | Behavior | Source |
|---|---|---|
| full | Raw path.read_text(encoding="utf-8") content. | aggregate.py:205 |
| summary | summarize.summarise_file(path, content): heuristic summary from src/summarize.py. | aggregate.py:210 |
| skeleton | For .py: ASTParser("python").get_skeleton(content) (tree-sitter). For .c/.h: mcp_client.ts_c_get_skeleton. For .cpp/.hpp: mcp_client.ts_cpp_get_skeleton. Other → summary. | aggregate.py:211-220 |
| outline | For .py: ASTParser("python").get_code_outline(content). For C/C++: mcp_client.ts_c*_get_code_outline. Other → summary. | aggregate.py:221-230 |
| masked | For each {symbol: mode} in ast_mask, fetch def or sig via mcp_client.py/ts_*_get_definition/signature. Concatenate. | aggregate.py:231-249 |
| none | Literal string "(context excluded)"; the file is in the file_items list but contributes no content. | aggregate.py:250 |
| custom | Render only the custom_slices from the FileItem. Each slice is a {start_line, end_line, tag, comment} dict. Lines outside the slices are excluded. | aggregate.py:251-266 |
The default view mode is full. The persona can override via Persona.aggregation_strategy; the FileItem can override via FileItem.view_mode or FileItem.force_full (which forces full regardless of the FileItem's own setting).
Errors are graceful. A FileNotFoundError produces f"ERROR: file not found: {path}" content with error: True and mtime: 0.0. A view_mode that throws produces f"ERROR in {view_mode} view mode for {path}:\n{traceback.format_exc()}". Errors do not halt the pipeline.
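The "errors do not halt the pipeline" behavior can be illustrated with a minimal sketch. `read_item` and the `transforms` mapping are hypothetical names, not the real `build_file_items` signature. Setting `error: True` for a failing view mode is also an assumption; the text above only confirms that flag for the missing-file case.

```python
import traceback
from pathlib import Path
from typing import Callable


def read_item(path: Path, view_mode: str,
              transforms: dict[str, Callable[[Path, str], str]]) -> dict:
    """Read one file and apply a view-mode transform without ever raising."""
    try:
        content = path.read_text(encoding="utf-8")
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        # A missing file becomes an error item instead of aborting the run.
        return {"path": str(path), "content": f"ERROR: file not found: {path}",
                "error": True, "mtime": 0.0}
    try:
        # In this sketch, "full" and unknown modes fall back to the raw content.
        transform = transforms.get(view_mode, lambda _p, text: text)
        rendered = transform(path, content)
        error = False
    except Exception:
        # The traceback is embedded in the content so the reader can see what failed.
        rendered = f"ERROR in {view_mode} view mode for {path}:\n{traceback.format_exc()}"
        error = True  # assumption: failing transforms are flagged like missing files
    return {"path": str(path), "content": rendered, "error": error, "mtime": mtime}
```

Because every failure is folded into the item's content, the caller can keep iterating over the remaining files.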
The FileItem Schema (Full)
src/models.py:510-559 FileItem is the per-file curation memory that nagent_review identified as Manual Slop's strongest dimension. The dataclass has 10 fields (path plus 9 mutable curation fields) and a __post_init__ normalizer:
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FileItem:
    path: str                          # the artifact identity (path-keyed, no inode)
    auto_aggregate: bool = True        # include in auto-aggregation? (skip in build_*_from_items if False)
    force_full: bool = False           # bypass view_mode; force raw content
    view_mode: str = 'full'            # one of: full, summary, skeleton, outline, masked, custom, none
    selected: bool = False             # for batch operations (the Context Panel multi-select)
    ast_signatures: bool = False       # include only signatures (skeleton-equivalent shortcut)
    ast_definitions: bool = False      # include only definitions (skeleton-equivalent shortcut)
    # The defaults on the next three fields are assumed; the original listing omits them.
    ast_mask: dict[str, str] = field(default_factory=dict)    # per-symbol mask: {symbol_path: 'def'|'sig'|'hide'} (from Structural File Editor)
    custom_slices: list[dict] = field(default_factory=list)   # Fuzzy Anchor slices: {start_line, end_line, tag, comment, ...}
    injected_at: Optional[float] = None                       # timestamp of last injection
All fields are serialized by to_dict() and deserialized by from_dict() (with .get(..., default) for forward compatibility). The dataclass is round-trip-safe through TOML.
__post_init__ normalizes custom_slices: each slice dict gets tag=None and comment=None defaults added so downstream code can .get("tag") safely.
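The normalizer and the tolerant round trip can be sketched on a reduced stand-in. `SliceHolder` is a hypothetical two-field class used only for illustration; it is not the real FileItem, and the round trip shown goes through a plain dict rather than TOML.

```python
from dataclasses import dataclass, field


@dataclass
class SliceHolder:  # hypothetical stand-in for FileItem, reduced to two fields
    path: str
    custom_slices: list[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Fill in the optional keys so readers never hit a missing "tag"/"comment".
        for s in self.custom_slices:
            s.setdefault("tag", None)
            s.setdefault("comment", None)

    def to_dict(self) -> dict:
        return {"path": self.path,
                "custom_slices": [dict(s) for s in self.custom_slices]}

    @classmethod
    def from_dict(cls, data: dict) -> "SliceHolder":
        # .get(..., default) tolerates entries that predate a field.
        return cls(path=data["path"],
                   custom_slices=list(data.get("custom_slices", [])))
```

Using `setdefault` keeps any tag or comment the user already set and only fills the gaps.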
The Custom Slice Schema
A custom_slices entry is {start_line, end_line, tag, comment, ...} (plus Fuzzy Anchor metadata). The full schema is in src/fuzzy_anchor.py:FuzzyAnchor.create_slice:
{
    "start_line": int,           # 1-based original line
    "end_line": int,             # 1-based original line (inclusive)
    "tag": str|None,             # human label, defaults to None
    "comment": str|None,         # human comment, defaults to None
    "content_hash": str,         # SHA-256 of the slice content (for Fuzzy Anchor stability)
    "anchor_lines": [str, ...],  # surrounding context for re-resolution
    # plus the original positioning metadata
}
When view_mode == 'custom', the aggregate.py:251-264 block renders each slice as:
---
[Slice: <tag>] (<comment>)
Lines <start>-<end>:
<content>
Multiple slices in a file are joined with \n\n.
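A minimal sketch of that rendering, assuming the template shown above is emitted once per slice. `render_custom_slices` is a hypothetical name, and the placeholders used for a missing tag or comment are assumptions rather than the real block's behavior.

```python
def render_custom_slices(content: str, slices: list[dict]) -> str:
    """Render only the selected line ranges of a file (hypothetical helper)."""
    lines = content.splitlines()
    blocks = []
    for s in slices:
        start, end = s["start_line"], s["end_line"]
        # start_line/end_line are 1-based and inclusive, so only the start shifts.
        body = "\n".join(lines[start - 1:end])
        tag = s.get("tag") or "untagged"   # assumption: placeholder for a missing tag
        comment = s.get("comment") or ""   # assumption: empty parentheses when unset
        blocks.append(f"---\n[Slice: {tag}] ({comment})\nLines {start}-{end}:\n{body}")
    # Multiple slices are joined with a blank line between them.
    return "\n\n".join(blocks)
```

Lines outside every slice never reach the output, which is what keeps the custom view compact.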
The ContextPreset Schema
src/models.py:909-937 ContextPreset is a named, persisted set of FileItems — a reusable "context composition":
@dataclass
class ContextPreset:
    name: str  # the preset name (used as TOML key)
    files: list[ContextFileEntry] = field(default_factory=list)
    screenshots: list[str] = field(default_factory=list)
    description: str = ""
ContextFileEntry is a FileItem (or a string path that's promoted to a FileItem on load). The description is a human-readable label for the preset list.
ContextPresetManager (in src/context_presets.py, 30 lines) handles CRUD:
- save_preset(preset: ContextPreset) writes to manual_slop.toml or a project TOML
- load_all() -> dict[str, ContextPreset] reads all presets
- delete_preset(name: str) removes a preset
- apply_preset(name: str) switches the active context composition to the named preset
reload_context_presets() (in app_controller.py) is called when the project TOML changes; it validates that all files in the preset still exist and warns the user about any that don't.
Scope: ContextPresets can be Global (in <user_config>/manual_slop.toml) or Project-specific (in the project's manual_slop.toml). Project presets override global presets of the same name. This is the same scope-inheritance pattern as Personas, Presets, and Workspace Profiles.
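The scope override and the missing-file check can be sketched together. Both functions are hypothetical helpers operating on plain dicts and path strings; they are not the ContextPresetManager API.

```python
from pathlib import Path


def merge_presets(global_presets: dict[str, dict],
                  project_presets: dict[str, dict]) -> dict[str, dict]:
    """Combine both scopes; a project preset shadows a global one of the same name."""
    # Later keys win in a dict merge, so the project scope goes last.
    return {**global_presets, **project_presets}


def missing_files(files: list[str], base_dir: Path) -> list[str]:
    """Return the preset entries that no longer exist under base_dir."""
    return [f for f in files if not (base_dir / f).exists()]
```

A caller could pass the result of `missing_files` straight into a user-facing warning, which matches the validate-and-warn behavior described for reload_context_presets().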
The Discussion History Section
aggregate.py:109 build_discussion_section(history) is the section that includes the prior conversation:
def build_discussion_section(history: list[Any]) -> str:
    sections = []
    for i, entry in enumerate(history, start=1):
        if isinstance(entry, dict):
            role = entry.get("role", "Unknown")
            content = entry.get("content", "").strip()
            text = f"{role}: {content}"
        else:
            text = str(entry).strip()
        sections.append(f"### Discussion Excerpt {i}\n\n{text}")
    return "\n\n---\n\n".join(sections)
The section handles both legacy list[str] (e.g. ["User: ...", "AI: ..."]) and the new list[dict] shape ([{"role": ..., "content": ...}, ...]). The dict shape is what's persisted by _flush_disc_entries_to_project (per app_controller.py:3225-3240) and what's stored in the new format.
The section is named ## Discussion History and is placed at the end of the markdown (after files, screenshots, beads). This is deliberate: the cache-hit-friendly static prefix is at the top, the dynamic history is at the bottom. See guide_architecture.md §"Cache Strategy".
Cache Strategy
The pipeline is structured to maximize provider cache hits. The static prefix (Files + Screenshots + Beads) is the same across all turns of a discussion; only the Discussion History changes. The provider's cache key is the prefix; the history is appended.
build_markdown_no_history (aggregate.py:348-353) is the explicit "static-only" builder used by _do_generate before adding the history. The full builder is build_markdown_from_items which adds the history if non-empty. This split allows the AI client to:
- Send the static prefix once.
- Append the history to the next send without re-sending the prefix.
- Re-use the cached prefix on the third send (if the files haven't changed).
The cache strategy is documented in detail in guide_ai_client.md §"Caching Strategy" and guide_architecture.md §"Cache Hit Strategy".
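Why the ordering helps can be seen in a small sketch that composes two consecutive turns and measures what they share. `compose` is a hypothetical helper, not build_markdown_from_items, and real provider caches key on tokens rather than characters, so this only illustrates the principle.

```python
import os


def compose(static_prefix: str, history: list[str]) -> str:
    """Static sections first, dynamic history last (hypothetical helper)."""
    if not history:
        return static_prefix
    excerpts = "\n\n---\n\n".join(
        f"### Discussion Excerpt {i}\n\n{text}"
        for i, text in enumerate(history, start=1)
    )
    return f"{static_prefix}\n\n## Discussion History\n\n{excerpts}"


prefix = "## Files\n\n### a.py\n\nprint(1)"
turn1 = compose(prefix, ["User: hi"])
turn2 = compose(prefix, ["User: hi", "AI: hello"])
# os.path.commonprefix compares character by character, so it works on any strings.
shared = os.path.commonprefix([turn1, turn2])
```

Here the second turn extends the first one verbatim, so everything already sent remains a reusable prefix. Editing a file changes the text near the top and shortens the shared prefix accordingly.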
The Tier-3 Variant
aggregate.py:364-454 build_tier3_context is the MMA worker context — a different layout for sub-agent invocations. The differences from the standard pipeline:
- Focus files (passed as focus_files: list[str]) are rendered as full content regardless of their view_mode. A file is a focus file if its entry, name, or path matches one of the focus paths.
- Slices are resolved via FuzzyAnchor. If a file has custom_slices and the file content has been modified since the slice was created, the FuzzyAnchor re-resolves the line ranges. This is critical for sub-agents receiving slices that may be stale.
- Section header is ## Files (Tier 3 - Focused). Distinct from the standard ## Files so the worker (and its tools) can recognize its own context.
- The is_focus check is multi-level. Entry match, name match, path match, and substring match. Sub-agents with looser file-matching needs can pass a focus set that's just a list of basenames.
The Tier 3 build skips the summarize.build_summary_markdown path entirely; every file is rendered with _build_files_section_from_items-style formatting (or the AST skeleton for non-focus Python files, or the AST signature/outline for C/C++).
The Tier 3 build is called from multi_agent_conductor.py:run_worker_lifecycle via aggregate.run(config, aggregation_strategy=tier_strategy).
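The multi-level focus check described above could look roughly like the following. This is a sketch under assumptions: the function name mirrors the `is_focus` flag, but the exact order of the checks and the path normalization are guesses, not the real build_tier3_context logic.

```python
from pathlib import PurePosixPath


def is_focus(entry: str, focus_files: list[str]) -> bool:
    """Return True if entry matches any focus path at one of four levels."""
    path = str(PurePosixPath(entry.replace("\\", "/")))
    name = PurePosixPath(path).name
    for focus in focus_files:
        focus_norm = focus.replace("\\", "/")
        if entry == focus:                      # 1. exact entry match
            return True
        if name == focus_norm:                  # 2. basename match
            return True
        if path == focus_norm:                  # 3. normalized path match
            return True
        if focus_norm and focus_norm in path:   # 4. substring match (deliberately loose)
            return True
    return False
```

The substring level is what lets a worker pass bare basenames or fragments, at the cost of occasionally matching more files than intended.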
The Bypass — force_full
FileItem.force_full = True short-circuits the view_mode selection:
if force_full: view_mode = "full"
This is set at the FileItem level (not the strategy level). Use case: the user has set a global "skeleton" view mode for the project but wants one specific file to always be inlined in full. The force is per-file and overrides both the FileItem's own view_mode and any strategy-level override.
For Tier 3, force_full is treated as a focus flag:
if is_focus or tier == 3 or force_full:
    # full content, no skeleton
So a force_full=True file in a Tier 3 worker context is treated as a focus file and rendered in full.
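Taken together, the per-file precedence can be summarized in one small function. `effective_view_mode` is a hypothetical helper that works on dict-shaped items; it condenses the rules in this section and is not the project's actual dispatch code.

```python
def effective_view_mode(item: dict, is_focus: bool = False) -> str:
    """Resolve the view mode that is actually rendered for one file item."""
    # force_full beats the item's own view_mode; a Tier 3 focus file behaves the same.
    if item.get("force_full", False) or is_focus:
        return "full"
    # Items without an explicit setting fall back to the documented default.
    return item.get("view_mode", "full")
```

For example, an item with `view_mode="skeleton"` and `force_full=True` resolves to `"full"`, while the same item without the flag keeps `"skeleton"`.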
Auto-Aggregate Skip
FileItem.auto_aggregate = False causes the file to be included in the file_items list but excluded from the rendered markdown:
for item in file_items:
    if not item.get("auto_aggregate", True): continue
    # ... build section
Use case: the file is in the files list for the AI's awareness (e.g. "you can read it via read_file") but should not be inlined. The file's mtime and view_mode are still tracked; the file is omitted from the rendered markdown.
This is distinct from view_mode == "none":
- auto_aggregate = False → file is not in the rendered markdown at all (no ### File header)
- view_mode = "none" → file is in the rendered markdown as ### File (excluded) with a "(context excluded)" body
The two are useful for different scenarios. auto_aggregate = False is for "the AI knows the file exists, can read it on demand." view_mode = "none" is for "the AI knows we deliberately excluded this content."
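The difference between the two exclusion mechanisms shows up clearly in a toy section builder. `build_files_section` is a hypothetical simplification of _build_files_section_from_items, and using the file path in the header line is an assumption.

```python
def build_files_section(file_items: list[dict]) -> str:
    """Render file items, honoring both exclusion mechanisms (hypothetical helper)."""
    sections = []
    for item in file_items:
        if not item.get("auto_aggregate", True):
            continue  # tracked by the caller, but no header and no body in the output
        if item.get("view_mode") == "none" and not item.get("force_full", False):
            # The header stays, so the reader can see the omission was deliberate.
            sections.append(f"### {item['path']} (excluded)\n\n(context excluded)")
        else:
            sections.append(f"### {item['path']}\n\n{item.get('content', '')}")
    return "\n\n".join(sections)
```

An item skipped through `auto_aggregate` leaves no trace in the result, while a `"none"` item still announces itself with a header.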
Screenshots
aggregate.py:126-140 build_screenshots_section renders the screenshots list as a ## Screenshots markdown section. Each screenshot is rendered using markdown image syntax, i.e. ![alt text](path). Path resolution uses resolve_paths (same as for files), so wildcards and absolute paths work.
Screenshots are placed after Files and before Beads and Discussion History. This is a deliberate ordering: the AI sees the project's files first (the static content), then the screenshots (the visual context), then the beads status (if applicable), then the discussion history (the dynamic content).
Beads Mode
When execution_mode == "beads" (set in config.project.execution_mode), the pipeline appends a ## Beads Mode: Progress Track section between Screenshots and Discussion History. The section is built by aggregate.py:309-328 build_beads_section:
- Lists all completed beads as a comma-separated list
- Lists all active beads as bullet points with title, id, and description
build_beads_section returns an empty string if the project is not a Beads project (client.is_initialized() is False) or if there are no beads. The caller (build_markdown_from_items) checks the truthiness before appending.
See guide_beads.md for the full Beads integration.
Output File Numbering
find_next_increment(output_dir, namespace) (aggregate.py:36-44) scans output_dir for files matching ^{namespace}_(\d+)\.md$ and returns max_num + 1. The output filename is {namespace}_{NNN:03d}.md (zero-padded to 3 digits). The increment starts at 1 and grows monotonically.
The increment is the artifact identity for the conversation. Each turn produces a new file. The current implementation does not delete old files; the LogPruner (per guide_architecture.md) handles cleanup separately.
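The numbering rule is simple enough to sketch in full. This is a re-implementation for illustration, not the code at aggregate.py:36-44: `next_increment` and `output_path` are hypothetical names, and returning 1 for a missing directory is an assumption.

```python
import re
from pathlib import Path


def next_increment(output_dir: Path, namespace: str) -> int:
    """Return one more than the highest {namespace}_{N}.md number in output_dir."""
    pattern = re.compile(rf"^{re.escape(namespace)}_(\d+)\.md$")
    max_num = 0
    if output_dir.is_dir():  # assumption: a missing directory counts as empty
        for child in output_dir.iterdir():
            match = pattern.match(child.name)
            if match:
                max_num = max(max_num, int(match.group(1)))
    return max_num + 1


def output_path(output_dir: Path, namespace: str) -> Path:
    """Build the next output filename, zero-padded to three digits."""
    return output_dir / f"{namespace}_{next_increment(output_dir, namespace):03d}.md"
```

Taking the maximum rather than counting files means gaps left by pruned files never cause a number to be reused.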
Pipeline Callers
aggregate.run is called from many places. The most important:
| Caller | Purpose |
|---|---|
| src/ai_client.py:_send_anthropic | Build the markdown for an Anthropic send. |
| src/ai_client.py:_send_gemini | Build the markdown for a Gemini send. |
| src/ai_client.py:_send_deepseek | Build the markdown for a DeepSeek send. |
| src/ai_client.py:_send_gemini_cli | Build the markdown for a Gemini CLI send. |
| src/ai_client.py:_send_minimax | Build the markdown for a MiniMax send. |
| src/app_controller.py:AppController._do_generate | The main 1:1 send path. |
| src/app_controller.py:AppController._cb_start_track | Start a new MMA track. |
| src/app_controller.py:AppController._process_event_queue | Process a queued event (e.g. send, switch discussion). |
| src/multi_agent_conductor.py:run_worker_lifecycle | Spawn a Tier 3 worker (with Tier 3 context). |
| src/gui_2.py:App.run | The main GUI loop. |
| src/gui_2.py:App._render_snapshot_tab | Render a prior-session replay snapshot. |
| simulation/sim_base.py:run_sim | Run a simulation. |
The aggregation strategy is set per-call:
- The main _do_generate uses config.project.aggregation_strategy (which is the persona-set strategy if a persona is active).
- MMA worker contexts use the worker's aggregation_strategy from the ticket config.
- The simulation uses a fixed auto.
Public API Surface
The public API of aggregate.py is:
| Function | Signature | Purpose |
|---|---|---|
| find_next_increment | (output_dir: Path, namespace: str) -> int | Next file number for output. |
| resolve_paths | (base_dir: Path, entry: str) -> list[Path] | Expand globs and absolute paths. Blacklist history.toml and *_history.toml. |
| group_files_by_dir | (files: list[Any]) -> dict[str, list[Any]] | Group FileItems by relative directory path (used by the Context Panel UI). |
| compute_file_stats | (abs_path: str) -> dict[str, int] | Line count + AST element count for Python files. |
| build_file_items | (base_dir, files) -> list[dict] | Read + view-mode transform per file. The most-called function. |
| build_discussion_section | (history) -> str | Render the ## Discussion History markdown. |
| build_screenshots_section | (base_dir, screenshots) -> str | Render the ## Screenshots markdown. |
| build_beads_section | (base_dir) -> str | Render the ## Beads Mode: Progress Track markdown. |
| build_markdown_from_items | (file_items, screenshot_base_dir, screenshots, history, summary_only, aggregation_strategy, execution_mode, base_dir) -> str | Compose all sections. The "compose" function. |
| build_markdown_no_history | (file_items, screenshot_base_dir, screenshots, summary_only, aggregation_strategy) -> str | Compose without history (for stable caching). |
| build_discussion_text | (history) -> str | Just the history section, for callers that want to append to a pre-built static prefix. |
| build_tier3_context | (file_items, screenshot_base_dir, screenshots, history, focus_files) -> str | Tier 3 worker context. |
| build_markdown | (base_dir, files, screenshot_base_dir, screenshots, history, summary_only, execution_mode) -> str | Convenience: read files + compose. |
| run | (config, aggregation_strategy) -> tuple[str, Path, list[dict]] | The full pipeline. |
| main | () -> None | CLI entry point. Loads config, calls run, prints output path. |
Performance: the entire pipeline is O(N) in the number of files, with the per-file AST work being the most expensive step. build_tier3_context includes with get_monitor().scope("build_tier3_context") (and similar for build_file_items and build_markdown_no_history) for performance monitoring. The monitor is documented in guide_architecture.md §"Performance".
Performance Considerations
The view_mode selection has a meaningful performance impact:
| view_mode | Per-file cost | When to use |
|---|---|---|
| full | 1 file read + string concat | Small files, files the user is actively editing. |
| summary | 1 file read + 1 heuristic call to summarize.summarise_file | Large files where structural info is enough. |
| skeleton | 1 file read + 1 tree-sitter parse + skeleton build | Python/C/C++ files where the structure matters more than the content. |
| outline | 1 file read + 1 tree-sitter parse + outline build | When the AI only needs the public API surface. |
| masked | 1 file read + N mcp_client.py/ts_*_get_* calls (one per masked symbol) | When the user has explicitly marked symbols as "def" or "sig". |
| none | 1 file read (still reads the bytes, just discards them) | When the user wants the file listed but its content excluded from the rendered markdown. |
| custom | 1 file read + line slicing per slice | When the user has explicitly created Fuzzy Anchor slices. |
The force_full = True and auto_aggregate = False flags skip some of the work:
- force_full = True skips the view-mode dispatch and goes straight to raw content.
- auto_aggregate = False skips the view-mode dispatch entirely and skips the markdown section build.
For very large codebases (1000+ files), the bottleneck is the tree-sitter parsing for skeleton / outline / masked modes. The Tier 3 builder uses ASTParser("python") lazily (if not parser: parser = ASTParser("python")) so the tree-sitter grammar is loaded only once per pipeline call.
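The lazy-initialization idiom mentioned above can be isolated in a few lines. `LazyParser` and the factory argument are hypothetical; in the real builder the expensive object would be `ASTParser("python")`, which is replaced here by a stand-in so the sketch stays self-contained.

```python
from typing import Any, Callable, Optional


class LazyParser:
    """Build an expensive parser on first use and reuse it afterwards."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._parser: Optional[Any] = None
        self.builds = 0  # counts how often the factory actually ran

    def get(self) -> Any:
        if self._parser is None:
            # First request: pay the construction cost exactly once.
            self._parser = self._factory()
            self.builds += 1
        return self._parser


lazy = LazyParser(lambda: object())  # stand-in for an expensive grammar load
first = lazy.get()
second = lazy.get()
```

A run that never touches a Python file in skeleton, outline, or masked mode never calls `get()`, so it never pays for the grammar load at all.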
Tests
- tests/test_aggregate_flags.py: test_auto_aggregate_skip, test_force_full, test_view_mode_full, test_view_mode_summary, test_view_mode_skeleton, test_view_mode_outline, test_view_mode_none, test_view_mode_custom, test_view_mode_masked
- tests/test_aggregate_beads.py: test_build_beads_compaction
- tests/test_context_composition_phase3.py: test_group_files_by_dir, test_compute_file_stats
- tests/test_context_composition_phase6.py: test_view_mode_default_summary, test_view_mode_full, test_view_mode_none, test_view_mode_outline, test_view_mode_skeleton, test_view_mode_summary, test_view_mode_custom, test_view_mode_custom_empty_default_to_summary, test_files_section_rendering
- tests/test_tiered_context.py: test_build_tier3_context_exists, test_build_tier3_context_ast_skeleton, test_build_tier3_context_scaling, test_tiered_context_by_tier_field, test_build_file_items_with_tiers, test_build_files_section_with_dicts
- tests/test_ast_masking_core.py: test_ast_masking_gencpp_samples
- tests/test_gencpp_full_suite.py: test_gencpp_full_suite
- tests/test_perf_aggregate.py: test_build_tier3_context_scaling
- tests/test_history_management.py: test_aggregate_blacklist, test_aggregate_includes_segregated_history, test_aggregate_respects_*
- tests/test_ui_summary_only_removal.py: test_aggregate_from_items_respects_auto_aggregate
- tests/test_aggregate_helpers.py: test_resolve_paths_blacklist, test_resolve_paths_glob, test_resolve_paths_absolute
- tests/test_aggregate_perf.py: test_find_next_increment_*
Cross-References
- The pipeline source: src/aggregate.py (518 lines)
- FileItem schema: src/models.py:510-559 FileItem
- ContextPreset schema: src/models.py:909-937 ContextPreset
- ContextPresetManager: src/context_presets.py (30 lines)
- AI client consumption: src/ai_client.py:_send_<provider> × 5, see guide_ai_client.md
- Tier 3 worker consumption: src/multi_agent_conductor.py:run_worker_lifecycle, see guide_multi_agent_conductor.md
- Per-file curation features: guide_context_curation.md (Fuzzy Anchors, AST Inspector, Granular AST Control)
- Cache strategy: guide_architecture.md §"Cache Hit Strategy", guide_ai_client.md §"Caching"
- Discussion section builder: guide_discussions.md §"Persistence", src/aggregate.py:109 build_discussion_section
- Deep-dive on the design philosophy: conductor/tracks/nagent_review_20260608/report.md §6 (per-file memory)
- Actionable patterns for richer per-file memory: conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md §4 (file_id), §6 (git history), §7 (Meta-Tooling DSL)
- Future-track candidate for per-file conversation log: conductor/tracks/nagent_review_20260608/decisions.md candidate #7