# Context Aggregation: How Manual Slop Builds the AI's Context
[Top](../Readme.md) | [Discussions](guide_discussions.md) | [Context Curation](guide_context_curation.md) | [Models](guide_models.md) | [Architecture](guide_architecture.md)

---
## Overview
`src/aggregate.py` (518 lines) is the **context composition pipeline**: its entry point, `aggregate.run`, turns a project's `files` + `screenshots` + `history` config into the final markdown string the AI sees. It is called by:
- `src/ai_client.py:_send_anthropic`, `_send_deepseek`, `_send_gemini`, `_send_gemini_cli`, `_send_minimax` (every provider)
- `src/app_controller.py:AppController._do_generate` (the main send path)
- `src/app_controller.py:AppController._cb_start_track`, `AppController._process_event_queue`, `AppController._start_track_logic` (MMA paths)
- `src/gui_2.py:App.run`, `App.main`, `App._render_snapshot_tab` (the GUI and the prior-session replay)
- `simulation/sim_base.py:run_sim` and 6 other simulation entry points
This is one of the most-touched modules in the project. After the nagent_review, this pipeline is recognized as **Manual Slop's strongest curation dimension** (vs nagent's conversation-log dimension). See `conductor/tracks/nagent_review_20260608/report.md §6` and `decisions.md` candidate #7 for the related future-track.
> **Domain classification.** The pipeline is **Application**-domain. The MMA sub-agents consume it but the pipeline itself does not call into Meta-Tooling code. See `guide_meta_boundary.md`.
---
## The Pipeline At A Glance
```
aggregate.run(config, aggregation_strategy)
├─ find_next_increment(output_dir, namespace) # next file number for output
├─ build_file_items(base_dir, files) # read + view-mode transform
├─ build_markdown_from_items(file_items, ...) # compose sections
│ ├─ ## Files (or Files (Summary) or Files (Tier 3 - Focused))
│ │ └─ _build_files_section_from_items OR summarize.build_summary_markdown
│ ├─ ## Screenshots (if any)
│ ├─ ## Beads Mode: Progress Track (if execution_mode == "beads")
│ └─ ## Discussion History (if any)
└─ output_file.write_text(markdown)
```
The **output** is a markdown file at `{output_dir}/{namespace}_{NNN}.md` where `NNN` is a zero-padded increment. The pipeline does not *send* the markdown — that's the AI client's job. The pipeline *produces* the markdown.
The **return value** is `(markdown: str, output_file: Path, file_items: list[dict])`. The file_items list is reused by callers that want to inspect the read state without re-reading from disk.
---
## The Three Aggregation Strategies
`aggregation_strategy: str` selects how files are rendered. The values:
| Strategy | File rendering | History rendering | Tier 3 handling | Use case |
|---|---|---|---|---|
| `auto` | If `summary_only` is True → summary; else → full | Standard | Standard | Default. Reads `config.project.summary_only`. |
| `summarize` | Always `summarize.build_summary_markdown(file_items)` (compact multi-file view) | Standard | Standard | Token-budget-constrained runs. |
| `full` | Always `_build_files_section_from_items(file_items)` (full content) | Standard | Standard | Debugging; when you want the AI to see everything. |
**Implementation:** `aggregate.py:330-346 build_markdown_from_items`. The three-way dispatch is at lines 335-339:
```python
if aggregation_strategy == "summarize": parts.append("## Files (Summary)\n\n" + summarize.build_summary_markdown(file_items))
elif aggregation_strategy == "full": parts.append("## Files\n\n" + _build_files_section_from_items(file_items))
else: # auto
    if summary_only: parts.append("## Files (Summary)\n\n" + summarize.build_summary_markdown(file_items))
    else: parts.append("## Files\n\n" + _build_files_section_from_items(file_items))
```
The `auto` strategy is the *only* one that respects `config.project.summary_only`; the other two are explicit overrides. Personas can also set `aggregation_strategy` (per `guide_personas.md`), and a persona-set strategy overrides the config-level setting.
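
The dispatch and override rules above can be sketched as one small function. This is a minimal illustration, not the code in `aggregate.py`: `files_section`, `render_full`, and `render_summary` are hypothetical stand-ins, and the way the persona override is applied is an assumption (the guide only states that the persona-set strategy wins).

```python
from typing import Optional


def render_full(items: list[dict]) -> str:
    """Stub standing in for _build_files_section_from_items."""
    return "\n".join(f"### {i['path']}\n{i['content']}" for i in items)


def render_summary(items: list[dict]) -> str:
    """Stub standing in for summarize.build_summary_markdown."""
    return "\n".join(f"- {i['path']} ({len(i['content'])} chars)" for i in items)


def files_section(items: list[dict], config_strategy: str = "auto",
                  summary_only: bool = False,
                  persona_strategy: Optional[str] = None) -> str:
    # Assumption: a persona-set strategy simply replaces the config-level one.
    strategy = persona_strategy or config_strategy
    if strategy == "summarize":
        use_summary = True
    elif strategy == "full":
        use_summary = False
    else:  # "auto" is the only strategy that consults summary_only
        use_summary = summary_only
    if use_summary:
        return "## Files (Summary)\n\n" + render_summary(items)
    return "## Files\n\n" + render_full(items)


items = [{"path": "a.py", "content": "x = 1"}]
auto_section = files_section(items, "auto", summary_only=True)
full_section = files_section(items, "full", summary_only=True)
persona_section = files_section(items, "full", persona_strategy="summarize")
```

Here `full_section` ignores `summary_only=True` because `full` is an explicit override, while `persona_section` is summarized even though the config-level strategy asked for full content.
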
---
## View Modes — The Per-File Transform
`view_mode: str` is the per-file content transform. The value is set on the `FileItem` (or the legacy dict-shaped config entry) and determines how the file's bytes are rendered into the markdown.
| View mode | Behavior | Source |
|---|---|---|
| `full` | Raw `path.read_text(encoding="utf-8")` content. | `aggregate.py:205` |
| `summary` | `summarize.summarise_file(path, content)` — heuristic summary from `src/summarize.py`. | `aggregate.py:210` |
| `skeleton` | For `.py`: `ASTParser("python").get_skeleton(content)` (tree-sitter). For `.c`/`.h`: `mcp_client.ts_c_get_skeleton`. For `.cpp`/`.hpp`: `mcp_client.ts_cpp_get_skeleton`. Other → summary. | `aggregate.py:211-220` |
| `outline` | For `.py`: `ASTParser("python").get_code_outline(content)`. For C/C++: `mcp_client.ts_c*_get_code_outline`. Other → summary. | `aggregate.py:221-230` |
| `masked` | For each `{symbol: mode}` in `ast_mask`, fetch the definition (`def`) or signature (`sig`) via `mcp_client.ts_*_get_definition` / `mcp_client.ts_*_get_signature`. Concatenate. | `aggregate.py:231-249` |
| `none` | Literal string `"(context excluded)"` — the file is in the file_items list but contributes no content. | `aggregate.py:250` |
| `custom` | Render only the `custom_slices` from the FileItem. Each slice is a `{start_line, end_line, tag, comment}` dict. Lines outside the slices are excluded. | `aggregate.py:251-266` |
**The default view mode** is `full`. The persona can override via `Persona.aggregation_strategy`; the FileItem can override via `FileItem.view_mode` or `FileItem.force_full` (which forces `full` regardless of the FileItem's own setting).
**Errors are graceful.** A `FileNotFoundError` produces `f"ERROR: file not found: {path}"` content with `error: True` and `mtime: 0.0`. A `view_mode` that throws produces `f"ERROR in {view_mode} view mode for {path}:\n{traceback.format_exc()}"`. Errors do not halt the pipeline.
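
The per-file transform and its error handling can be sketched as follows. This is a simplified illustration under stated assumptions: `render_item` and the `TRANSFORMS` table are hypothetical, the `summary` stand-in is not the heuristic in `src/summarize.py`, and setting `error: True` when a view mode throws is a guess (the guide only documents that flag for missing files).

```python
import tempfile
import traceback
from pathlib import Path
from typing import Callable

# Hypothetical transform table; the real pipeline dispatches to summarize,
# ASTParser, and mcp_client helpers instead of these stand-ins.
TRANSFORMS: dict[str, Callable[[Path, str], str]] = {
    "full": lambda path, text: text,
    "none": lambda path, text: "(context excluded)",
    "summary": lambda path, text: f"{path.name}: {len(text.splitlines())} lines",
}


def render_item(path: Path, view_mode: str = "full", force_full: bool = False) -> dict:
    if force_full:
        view_mode = "full"  # force_full bypasses the item's own view_mode
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # A missing file becomes an error item instead of halting the run.
        return {"path": str(path), "content": f"ERROR: file not found: {path}",
                "error": True, "mtime": 0.0}
    try:
        content, error = TRANSFORMS[view_mode](path, text), False
    except Exception:
        # Assumption: a failing (or unknown) view mode is also flagged as an error.
        content = f"ERROR in {view_mode} view mode for {path}:\n{traceback.format_exc()}"
        error = True
    return {"path": str(path), "content": content, "error": error,
            "mtime": path.stat().st_mtime}


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.py"
    sample.write_text("a = 1\nb = 2\n", encoding="utf-8")
    full_item = render_item(sample)
    forced_item = render_item(sample, view_mode="none", force_full=True)
    excluded_item = render_item(sample, view_mode="none")
    missing_item = render_item(Path(tmp) / "missing.py")
```
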
---
## The FileItem Schema (Full)
`src/models.py:510-559 FileItem` is the **per-file curation memory** that nagent_review identified as Manual Slop's strongest dimension. The dataclass has the `path` identity field plus 9 mutable fields, and a `__post_init__` normalizer:
```python
from dataclasses import dataclass, field
from typing import Optional

# Note: the defaults for ast_mask, custom_slices, and injected_at are not shown
# in this excerpt; empty/None defaults are assumed here so the class is valid.
@dataclass
class FileItem:
    path: str  # the artifact identity (path-keyed, no inode)
    auto_aggregate: bool = True  # include in auto-aggregation? (skip in build_*_from_items if False)
    force_full: bool = False  # bypass view_mode; force raw content
    view_mode: str = 'full'  # one of: full, summary, skeleton, outline, masked, custom, none
    selected: bool = False  # for batch operations (the Context Panel multi-select)
    ast_signatures: bool = False  # include only signatures (skeleton-equivalent shortcut)
    ast_definitions: bool = False  # include only definitions (skeleton-equivalent shortcut)
    ast_mask: dict[str, str] = field(default_factory=dict)  # per-symbol mask: {symbol_path: 'def'|'sig'|'hide'} (from Structural File Editor)
    custom_slices: list[dict] = field(default_factory=list)  # Fuzzy Anchor slices: {start_line, end_line, tag, comment, ...}
    injected_at: Optional[float] = None  # timestamp of last injection
```
Every field is serialized by `to_dict()` and deserialized by `from_dict()` (with `.get(..., default)` for forward compatibility). The dataclass is round-trip-safe through TOML.
`__post_init__` normalizes `custom_slices`: each slice dict gets `tag=None` and `comment=None` defaults added so downstream code can `.get("tag")` safely.
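
The normalization and the `.get(..., default)` loading pattern can be illustrated on a trimmed-down stand-in. `SliceHolder` is hypothetical and carries only three of the fields; it is a sketch of the pattern, not the `FileItem` implementation, and it round-trips through plain dicts rather than TOML.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class SliceHolder:
    """Hypothetical, trimmed-down stand-in for FileItem."""
    path: str
    view_mode: str = "full"
    custom_slices: list[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Give every slice explicit tag/comment keys so readers can use .get().
        for s in self.custom_slices:
            s.setdefault("tag", None)
            s.setdefault("comment", None)

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "SliceHolder":
        # .get(..., default) tolerates entries written before a field existed.
        return cls(
            path=data["path"],
            view_mode=data.get("view_mode", "full"),
            custom_slices=data.get("custom_slices", []),
        )


item = SliceHolder.from_dict(
    {"path": "src/a.py", "custom_slices": [{"start_line": 1, "end_line": 3}]}
)
round_tripped = SliceHolder.from_dict(item.to_dict())
```

The legacy-shaped input omits `view_mode`, `tag`, and `comment`; after loading, all three are present with their defaults, and a second round trip leaves the item unchanged.
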
### The Custom Slice Schema
A `custom_slices` entry is `{start_line, end_line, tag, comment, ...}` (plus Fuzzy Anchor metadata). The full schema is in `src/fuzzy_anchor.py:FuzzyAnchor.create_slice`:
```python
{
    "start_line": int,           # 1-based original line
    "end_line": int,             # 1-based original line (inclusive)
    "tag": str|None,             # human label, defaults to None
    "comment": str|None,         # human comment, defaults to None
    "content_hash": str,         # SHA-256 of the slice content (for Fuzzy Anchor stability)
    "anchor_lines": [str, ...],  # surrounding context for re-resolution
    # plus the original positioning metadata
}
```
When `view_mode == 'custom'`, the `aggregate.py:251-264` block renders each slice as:
```markdown
---
[Slice: <tag>] (<comment>)
Lines <start>-<end>:
<content>
```
Multiple slices in a file are joined with `\n\n`.
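
A minimal sketch of that rendering, assuming 1-based inclusive line ranges as in the schema above. `render_slices` is a hypothetical helper; how the real block prints a `None` tag or comment is not documented here, so this version prints the value as-is.

```python
def render_slices(content: str, slices: list[dict]) -> str:
    lines = content.splitlines()
    rendered = []
    for s in slices:
        start, end = s["start_line"], s["end_line"]  # 1-based, inclusive
        body = "\n".join(lines[start - 1:end])
        rendered.append(
            f"---\n[Slice: {s.get('tag')}] ({s.get('comment')})\n"
            f"Lines {start}-{end}:\n{body}"
        )
    # Multiple slices in one file are joined with a blank line.
    return "\n\n".join(rendered)


source = "import os\n\ndef f():\n    return 1\n\ndef g():\n    return 2\n"
slices = [
    {"start_line": 3, "end_line": 4, "tag": "f", "comment": "helper"},
    {"start_line": 6, "end_line": 7, "tag": "g", "comment": "other"},
]
rendered = render_slices(source, slices)
```

Lines 1, 2, and 5 of `source` fall outside both slices and do not appear in `rendered`.
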
---
## The ContextPreset Schema
`src/models.py:909-937 ContextPreset` is a *named, persisted set* of `FileItem`s — a reusable "context composition":
```python
@dataclass
class ContextPreset:
    name: str  # the preset name (used as TOML key)
    files: list[ContextFileEntry] = field(default_factory=list)
    screenshots: list[str] = field(default_factory=list)
    description: str = ""
```
`ContextFileEntry` is a `FileItem` (or a string path that's promoted to a `FileItem` on load). The `description` is a human-readable label for the preset list.
`ContextPresetManager` (in `src/context_presets.py`, 30 lines) handles CRUD:
- `save_preset(preset: ContextPreset)` writes to `manual_slop.toml` or a project TOML
- `load_all() -> dict[str, ContextPreset]` reads all presets
- `delete_preset(name: str)` removes a preset
- `apply_preset(name: str)` switches the active context composition to the named preset
`reload_context_presets()` (in `app_controller.py`) is called when the project TOML changes; it validates that all files in the preset still exist and warns the user about any that don't.
**Scope:** ContextPresets can be **Global** (in `<user_config>/manual_slop.toml`) or **Project-specific** (in the project's `manual_slop.toml`). Project presets override global presets of the same name. This is the same scope-inheritance pattern as Personas, Presets, and Workspace Profiles.
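
The scope rule amounts to a dictionary merge in which the project scope is applied last. The sketch below uses plain dicts and a hypothetical `effective_presets` helper; how `ContextPresetManager` actually combines the two TOML files is not shown in this guide.

```python
def effective_presets(global_presets: dict[str, dict],
                      project_presets: dict[str, dict]) -> dict[str, dict]:
    merged = dict(global_presets)   # start from the global scope
    merged.update(project_presets)  # same-name project presets win
    return merged


global_presets = {
    "core": {"files": ["src/aggregate.py"]},
    "docs": {"files": ["docs/Readme.md"]},
}
project_presets = {"core": {"files": ["src/models.py"]}}
presets = effective_presets(global_presets, project_presets)
```
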
---
## The Discussion History Section
`aggregate.py:109 build_discussion_section(history)` is the section that includes the prior conversation:
```python
from typing import Any

def build_discussion_section(history: list[Any]) -> str:
    sections = []
    for i, entry in enumerate(history, start=1):
        if isinstance(entry, dict):
            # New shape: {"role": ..., "content": ...}
            role = entry.get("role", "Unknown")
            content = entry.get("content", "").strip()
            text = f"{role}: {content}"
        else:
            # Legacy shape: a pre-formatted string such as "User: ..."
            text = str(entry).strip()
        sections.append(f"### Discussion Excerpt {i}\n\n{text}")
    return "\n\n---\n\n".join(sections)
```
The section handles *both* legacy `list[str]` (e.g. `["User: ...", "AI: ..."]`) and the new `list[dict]` shape (`[{"role": ..., "content": ...}, ...]`). The dict shape is what's persisted by `_flush_disc_entries_to_project` (per `app_controller.py:3225-3240`) and what's stored in the new format.
The section is named **`## Discussion History`** and is placed at the *end* of the markdown (after files, screenshots, beads). This is deliberate: the cache-hit-friendly static prefix is at the top, the dynamic history is at the bottom. See `guide_architecture.md §"Cache Hit Strategy"`.
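
To see both history shapes handled in one call, the snippet below repeats the function from the excerpt so that it runs on its own, then feeds it a mixed list. The sample history is made up for illustration.

```python
from typing import Any


def build_discussion_section(history: list[Any]) -> str:
    sections = []
    for i, entry in enumerate(history, start=1):
        if isinstance(entry, dict):
            role = entry.get("role", "Unknown")
            content = entry.get("content", "").strip()
            text = f"{role}: {content}"
        else:
            text = str(entry).strip()
        sections.append(f"### Discussion Excerpt {i}\n\n{text}")
    return "\n\n---\n\n".join(sections)


# One legacy string entry followed by one dict entry.
history = ["User: hi", {"role": "AI", "content": " hello "}]
section = build_discussion_section(history)
```

Both entries end up in the same `role: content` form, and whitespace around the dict entry's content is stripped.
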
---
## Cache Strategy
The pipeline is structured to maximize provider cache hits. The static prefix (Files + Screenshots + Beads) is the same across all turns of a discussion; only the Discussion History changes. The provider's cache key is the prefix; the history is appended.
`build_markdown_no_history` (`aggregate.py:348-353`) is the explicit "static-only" builder used by `_do_generate` *before* adding the history. The full builder is `build_markdown_from_items` which adds the history if non-empty. This split allows the AI client to:
1. Send the static prefix once.
2. Append the history to the next send without re-sending the prefix.
3. Re-use the cached prefix on the third send (if the files haven't changed).
The cache strategy is documented in detail in `guide_ai_client.md §"Caching Strategy"` and `guide_architecture.md §"Cache Hit Strategy"`.
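
The prefix-stability property behind these steps can be checked with a small sketch. `build_static_prefix`, `build_turn`, and the `SEPARATOR` value are hypothetical; they mirror the roles of `build_markdown_no_history` and the history builder but are not their implementations.

```python
SEPARATOR = "\n\n---\n\n"  # assumed section separator


def build_static_prefix(files_md: str, screenshots_md: str = "", beads_md: str = "") -> str:
    """Join the sections that stay identical across turns of one discussion."""
    return SEPARATOR.join(part for part in (files_md, screenshots_md, beads_md) if part)


def build_turn(prefix: str, history_md: str) -> str:
    """Append the dynamic history after the static prefix."""
    if not history_md:
        return prefix
    return prefix + SEPARATOR + "## Discussion History\n\n" + history_md


prefix = build_static_prefix("## Files\n\n### a.py\n\nx = 1")
turn_1 = build_turn(prefix, "")
turn_2 = build_turn(prefix, "User: hi")
turn_3 = build_turn(prefix, "User: hi\n\nAI: hello")
```

Every turn begins with the same bytes, which is what allows a provider-side prefix cache to hit as long as the files are unchanged; only the tail after the prefix grows.
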
---
## The Tier-3 Variant
`aggregate.py:364-454 build_tier3_context` is the **MMA worker context** — a different layout for sub-agent invocations. The differences from the standard pipeline:
1. **Focus files** (passed as `focus_files: list[str]`) are rendered as **full content** regardless of their `view_mode`. A file is a focus file if its `entry`, name, or path matches one of the focus paths.
2. **Slices are resolved via FuzzyAnchor.** If a file has `custom_slices` and the file content has been modified since the slice was created, the FuzzyAnchor re-resolves the line ranges. This is critical for sub-agents receiving slices that may be stale.
3. **Section header is `## Files (Tier 3 - Focused)`.** Distinct from the standard `## Files` so the worker (and its tools) can recognize its own context.
4. **The `is_focus` check is multi-level:** entry match, name match, path match, and substring match. Sub-agents with looser file-matching needs can pass a focus set that is just a list of basenames.
The Tier 3 build skips the `summarize.build_summary_markdown` path entirely; every file is rendered with `_build_files_section_from_items`-style formatting (or the AST skeleton for non-focus Python files, or the AST signature/outline for C/C++).
The Tier 3 build is called from `multi_agent_conductor.py:run_worker_lifecycle` via `aggregate.run(config, aggregation_strategy=tier_strategy)`.
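
The multi-level focus check described in the list above might look roughly like this. The function is a hypothetical reconstruction: the order of the checks and the direction of the substring test (focus string contained in the file path) are assumptions, not taken from `build_tier3_context`.

```python
from pathlib import PurePosixPath


def is_focus(entry: str, path: str, focus_files: list[str]) -> bool:
    name = PurePosixPath(path).name
    for focus in focus_files:
        if focus in (entry, name, path):  # entry, name, or full-path match
            return True
        if focus and focus in path:       # assumed substring fallback
            return True
    return False


by_basename = is_focus("src/*.py", "src/aggregate.py", ["aggregate.py"])
by_entry = is_focus("src/*.py", "src/aggregate.py", ["src/*.py"])
by_substring = is_focus("src/*.py", "src/aggregate.py", ["src/agg"])
no_match = is_focus("src/*.py", "src/aggregate.py", ["models.py"])
```
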
---
## The Bypass — `force_full`
`FileItem.force_full = True` short-circuits the `view_mode` selection:
```python
if force_full: view_mode = "full"
```
This is set at the `FileItem` level (not the strategy level). Use case: the user has set a global "skeleton" view mode for the project but wants one specific file to always be inlined in full. The force is per-file and overrides both the FileItem's own `view_mode` and any strategy-level override.
For Tier 3, `force_full` is treated as a *focus flag*:
```python
if is_focus or tier == 3 or force_full:
    ...  # full content, no skeleton
```
So a `force_full=True` file in a Tier 3 worker context is treated as a focus file and rendered in full.
---
## Auto-Aggregate Skip
`FileItem.auto_aggregate = False` causes the file to be *included in the file_items list* but *excluded from the rendered markdown*:
```python
for item in file_items:
    if not item.get("auto_aggregate", True): continue
    # ... build section
```
Use case: the file is in the `files` list for the AI's *awareness* (e.g. "you can read it via `read_file`") but should not be inlined. The file's `mtime` and `view_mode` are still tracked; the file is *omitted* from the rendered markdown.
This is distinct from `view_mode == "none"`:
- `auto_aggregate = False` → file is not in the rendered markdown at all (no `### File` header)
- `view_mode = "none"` → file is in the rendered markdown as `### File (excluded)` with a `"(context excluded)"` body
The two are useful for different scenarios. `auto_aggregate = False` is for "the AI knows the file exists, can read it on demand." `view_mode = "none"` is for "the AI knows we deliberately excluded this content."
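
The difference between the two switches shows up directly in the rendered output. The sketch below is a simplified stand-in for the files-section builder; the exact header format (`### <path> (excluded)`) is an assumption modeled on the description above.

```python
def build_files_section(file_items: list[dict]) -> str:
    sections = []
    for item in file_items:
        if not item.get("auto_aggregate", True):
            continue  # tracked in file_items, but no header and no body
        if item.get("view_mode") == "none":
            # Visible placeholder: the reader can tell the content was withheld.
            sections.append(f"### {item['path']} (excluded)\n\n(context excluded)")
        else:
            sections.append(f"### {item['path']}\n\n{item['content']}")
    return "\n\n".join(sections)


file_items = [
    {"path": "a.py", "content": "x = 1", "view_mode": "full"},
    {"path": "b.py", "content": "y = 2", "auto_aggregate": False},
    {"path": "c.py", "content": "(context excluded)", "view_mode": "none"},
]
markdown = build_files_section(file_items)
```

`b.py` leaves no trace in `markdown`, while `c.py` appears with a header and the placeholder body.
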
---
## Screenshots
`aggregate.py:126-140 build_screenshots_section` renders the screenshots list as a `## Screenshots` markdown section. Each screenshot is rendered as `![name](path)` (markdown image syntax). Path resolution uses `resolve_paths` (same as for files), so wildcards and absolute paths work.
**Screenshots are placed *after* Files and *before* Beads and Discussion History.** This is a deliberate ordering: the AI sees the project's files first (the static content), then the screenshots (the visual context), then the beads status (if applicable), then the discussion history (the dynamic content).
---
## Beads Mode
When `execution_mode == "beads"` (set in `config.project.execution_mode`), the pipeline appends a `## Beads Mode: Progress Track` section between Screenshots and Discussion History. The section is built by `aggregate.py:309-328 build_beads_section`:
- Lists all *completed* beads as a comma-separated list
- Lists all *active* beads as bullet points with title, id, and description
`build_beads_section` returns an empty string if the project is not a Beads project (`client.is_initialized()` is False) or if there are no beads. The caller (`build_markdown_from_items`) checks the truthiness before appending.
See `guide_beads.md` for the full Beads integration.
---
## Output File Numbering
`find_next_increment(output_dir, namespace)` (`aggregate.py:36-44`) scans `output_dir` for files matching `^{namespace}_(\d+)\.md$` and returns `max_num + 1`. The output filename is `{namespace}_{NNN:03d}.md` (zero-padded to 3 digits). The increment starts at 1 and grows monotonically.
The increment is the *artifact identity* for the conversation. Each turn produces a new file. The current implementation does *not* delete old files; the `LogPruner` (per `guide_architecture.md`) handles cleanup separately.
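
A self-contained sketch of the numbering rule, following the regex and `max_num + 1` behavior described above. Whether the real function escapes the namespace before building the pattern is not stated, so the `re.escape` call is an assumption; the namespaces and file names are made up.

```python
import re
import tempfile
from pathlib import Path


def find_next_increment(output_dir: Path, namespace: str) -> int:
    pattern = re.compile(rf"^{re.escape(namespace)}_(\d+)\.md$")
    max_num = 0
    for f in output_dir.iterdir():
        m = pattern.match(f.name)
        if m:
            max_num = max(max_num, int(m.group(1)))
    return max_num + 1  # an empty directory therefore starts at 1


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    first = find_next_increment(out, "ctx")
    for name in ("ctx_001.md", "ctx_007.md", "other_003.md", "ctx_notes.md"):
        (out / name).write_text("", encoding="utf-8")
    next_num = find_next_increment(out, "ctx")
    next_name = f"ctx_{next_num:03d}.md"
```

Gaps are not refilled: with `ctx_001.md` and `ctx_007.md` present, the next file is `ctx_008.md`, and files from other namespaces or with non-numeric suffixes are ignored.
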
---
## Pipeline Callers
`aggregate.run` is called from many places. The most important:
| Caller | Purpose |
|---|---|
| `src/ai_client.py:_send_anthropic` | Build the markdown for an Anthropic send. |
| `src/ai_client.py:_send_gemini` | Build the markdown for a Gemini send. |
| `src/ai_client.py:_send_deepseek` | Build the markdown for a DeepSeek send. |
| `src/ai_client.py:_send_gemini_cli` | Build the markdown for a Gemini CLI send. |
| `src/ai_client.py:_send_minimax` | Build the markdown for a MiniMax send. |
| `src/app_controller.py:AppController._do_generate` | The main 1:1 send path. |
| `src/app_controller.py:AppController._cb_start_track` | Start a new MMA track. |
| `src/app_controller.py:AppController._process_event_queue` | Process a queued event (e.g. send, switch discussion). |
| `src/multi_agent_conductor.py:run_worker_lifecycle` | Spawn a Tier 3 worker (with Tier 3 context). |
| `src/gui_2.py:App.run` | The main GUI loop. |
| `src/gui_2.py:App._render_snapshot_tab` | Render a prior-session replay snapshot. |
| `simulation/sim_base.py:run_sim` | Run a simulation. |
The aggregation strategy is set per-call:
- The main `_do_generate` uses `config.project.aggregation_strategy` (which is the persona-set strategy if a persona is active).
- MMA worker contexts use the worker's `aggregation_strategy` from the ticket config.
- The simulation uses a fixed `auto`.
---
## Public API Surface
The public API of `aggregate.py` is:
| Function | Signature | Purpose |
|---|---|---|
| `find_next_increment` | `(output_dir: Path, namespace: str) -> int` | Next file number for output. |
| `resolve_paths` | `(base_dir: Path, entry: str) -> list[Path]` | Expand globs and absolute paths. Blacklist `history.toml` and `*_history.toml`. |
| `group_files_by_dir` | `(files: list[Any]) -> dict[str, list[Any]]` | Group FileItems by relative directory path (used by the Context Panel UI). |
| `compute_file_stats` | `(abs_path: str) -> dict[str, int]` | Line count + AST element count for Python files. |
| `build_file_items` | `(base_dir, files) -> list[dict]` | Read + view-mode transform per file. The most-called function. |
| `build_discussion_section` | `(history) -> str` | Render the `## Discussion History` markdown. |
| `build_screenshots_section` | `(base_dir, screenshots) -> str` | Render the `## Screenshots` markdown. |
| `build_beads_section` | `(base_dir) -> str` | Render the `## Beads Mode: Progress Track` markdown. |
| `build_markdown_from_items` | `(file_items, screenshot_base_dir, screenshots, history, summary_only, aggregation_strategy, execution_mode, base_dir) -> str` | Compose all sections. The "compose" function. |
| `build_markdown_no_history` | `(file_items, screenshot_base_dir, screenshots, summary_only, aggregation_strategy) -> str` | Compose without history (for stable caching). |
| `build_discussion_text` | `(history) -> str` | Just the history section, for callers that want to append to a pre-built static prefix. |
| `build_tier3_context` | `(file_items, screenshot_base_dir, screenshots, history, focus_files) -> str` | Tier 3 worker context. |
| `build_markdown` | `(base_dir, files, screenshot_base_dir, screenshots, history, summary_only, execution_mode) -> str` | Convenience: read files + compose. |
| `run` | `(config, aggregation_strategy) -> tuple[str, Path, list[dict]]` | The full pipeline. |
| `main` | `() -> None` | CLI entry point. Loads config, calls `run`, prints output path. |
**Performance:** the entire pipeline is O(N) in the number of files, with the per-file AST work being the most expensive step. `build_tier3_context` includes `with get_monitor().scope("build_tier3_context")` (and similar for `build_file_items` and `build_markdown_no_history`) for performance monitoring. The monitor is documented in `guide_architecture.md §"Performance"`.
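
As an illustration of the `resolve_paths` row in the table above, glob expansion with the history blacklist could be sketched like this. The wildcard detection, the sorting, and the handling of absolute patterns are all assumptions; only the blacklist patterns (`history.toml`, `*_history.toml`) come from the table.

```python
import tempfile
from pathlib import Path


def is_blacklisted(path: Path) -> bool:
    return path.name == "history.toml" or path.name.endswith("_history.toml")


def resolve_paths(base_dir: Path, entry: str) -> list[Path]:
    candidate = Path(entry)
    if any(ch in entry for ch in "*?["):
        if candidate.is_absolute():
            # Path.glob rejects absolute patterns, so glob from the anchor.
            root = Path(candidate.anchor)
            matches = sorted(root.glob(str(candidate.relative_to(root))))
        else:
            matches = sorted(base_dir.glob(entry))
    else:
        matches = [candidate if candidate.is_absolute() else base_dir / candidate]
    return [p for p in matches if not is_blacklisted(p)]


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for name in ("a.py", "b.py", "config.toml", "history.toml", "proj_history.toml"):
        (base / name).write_text("", encoding="utf-8")
    py_names = [p.name for p in resolve_paths(base, "*.py")]
    toml_names = [p.name for p in resolve_paths(base, "*.toml")]
    direct_history = resolve_paths(base, "history.toml")
    absolute = resolve_paths(base, str(base / "a.py"))
    absolute_ok = absolute == [base / "a.py"]
```
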
---
## Performance Considerations
The `view_mode` selection has a meaningful performance impact:
| view_mode | Per-file cost | When to use |
|---|---|---|
| `full` | 1 file read + string concat | Small files, files the user is actively editing. |
| `summary` | 1 file read + 1 heuristic call to `summarize.summarise_file` | Large files where structural info is enough. |
| `skeleton` | 1 file read + 1 tree-sitter parse + skeleton build | Python/C/C++ files where the structure matters more than the content. |
| `outline` | 1 file read + 1 tree-sitter parse + outline build | When the AI only needs the public API surface. |
| `masked` | 1 file read + N `mcp_client.ts_*_get_*` calls (one per masked symbol) | When the user has explicitly marked symbols as "def" or "sig". |
| `none` | 1 file read (still reads the bytes, just discards) | When the file should appear in the rendered markdown as a deliberately excluded placeholder, with no content. |
| `custom` | 1 file read + line slicing per slice | When the user has explicitly created Fuzzy Anchor slices. |
The `force_full = True` and `auto_aggregate = False` flags skip *some* of the work:
- `force_full = True` skips the view-mode dispatch and goes straight to raw content.
- `auto_aggregate = False` skips the view-mode dispatch entirely and skips the markdown section build.
For very large codebases (1000+ files), the bottleneck is the tree-sitter parsing for `skeleton` / `outline` / `masked` modes. The Tier 3 builder uses `ASTParser("python")` lazily (`if not parser: parser = ASTParser("python")`) so the tree-sitter grammar is loaded only once per pipeline call.
---
## Tests
- `tests/test_aggregate_flags.py`: `test_auto_aggregate_skip`, `test_force_full`, `test_view_mode_full`, `test_view_mode_summary`, `test_view_mode_skeleton`, `test_view_mode_outline`, `test_view_mode_none`, `test_view_mode_custom`, `test_view_mode_masked`
- `tests/test_aggregate_beads.py`: `test_build_beads_compaction`
- `tests/test_context_composition_phase3.py`: `test_group_files_by_dir`, `test_compute_file_stats`
- `tests/test_context_composition_phase6.py`: `test_view_mode_default_summary`, `test_view_mode_full`, `test_view_mode_none`, `test_view_mode_outline`, `test_view_mode_skeleton`, `test_view_mode_summary`, `test_view_mode_custom`, `test_view_mode_custom_empty_default_to_summary`, `test_files_section_rendering`
- `tests/test_tiered_context.py`: `test_build_tier3_context_exists`, `test_build_tier3_context_ast_skeleton`, `test_build_tier3_context_scaling`, `test_tiered_context_by_tier_field`, `test_build_file_items_with_tiers`, `test_build_files_section_with_dicts`
- `tests/test_ast_masking_core.py`: `test_ast_masking_gencpp_samples`
- `tests/test_gencpp_full_suite.py`: `test_gencpp_full_suite`
- `tests/test_perf_aggregate.py`: `test_build_tier3_context_scaling`
- `tests/test_history_management.py`: `test_aggregate_blacklist`, `test_aggregate_includes_segregated_history`, `test_aggregate_respects_*`
- `tests/test_ui_summary_only_removal.py`: `test_aggregate_from_items_respects_auto_aggregate`
- `tests/test_aggregate_helpers.py`: `test_resolve_paths_blacklist`, `test_resolve_paths_glob`, `test_resolve_paths_absolute`
- `tests/test_aggregate_perf.py`: `test_find_next_increment_*`
---
## Cross-References
- **The pipeline source:** `src/aggregate.py` (518 lines)
- **FileItem schema:** `src/models.py:510-559 FileItem`
- **ContextPreset schema:** `src/models.py:909-937 ContextPreset`
- **ContextPresetManager:** `src/context_presets.py` (30 lines)
- **AI client consumption:** `src/ai_client.py:_send_<provider>` × 5, see `guide_ai_client.md`
- **Tier 3 worker consumption:** `src/multi_agent_conductor.py:run_worker_lifecycle`, see `guide_multi_agent_conductor.md`
- **Per-file curation features:** `guide_context_curation.md` (Fuzzy Anchors, AST Inspector, Granular AST Control)
- **Cache strategy:** `guide_architecture.md §"Cache Hit Strategy"`, `guide_ai_client.md §"Caching"`
- **Discussion section builder:** `guide_discussions.md §"Persistence"`, `src/aggregate.py:109 build_discussion_section`
- **Deep-dive on the design philosophy:** `conductor/tracks/nagent_review_20260608/report.md §6` (per-file memory)
- **Actionable patterns for richer per-file memory:** `conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md §4` (file_id), §6 (git history), §7 (Meta-Tooling DSL)
- **Future-track candidate for per-file conversation log:** `conductor/tracks/nagent_review_20260608/decisions.md` candidate #7