docs(refresh): 3 new guides + cross-links from nagent_review
Per the docs Refresh Protocol (conductor/workflow.md), after a
reference/analysis track ships, the affected guides must be updated
to reflect new module structure or new conventions. The nagent_review
track (9cc51ca9) produced a deep-dive + 10 actionable takeaways that
named 3 documentation gaps in /docs. This commit fills them.
3 new guides (1,122 lines total):
1. guide_discussions.md (353 lines) — The Discussion system
- 23-operation matrix: A1-A7 per-entry + B1-B11 discussion-level
+ C1-C5 undo/redo
- Take naming convention (<base>_take_<n>), branching, promotion
- User-managed role list (app.disc_roles)
- Per-role filter linked to MMA persona focus
- _disc_entries_lock thread-safety contract
- Hook API session endpoints
- Persistence: _flush_to_project, _flush_disc_entries_to_project,
context_snapshot
- 9 file:line refs into gui_2.py:3770-4260 + history.py
2. guide_state_lifecycle.md (375 lines) — Undo/redo + reset + state
delegation
- HistoryManager + UISnapshot (13 captured fields, 100-snapshot
capacity, debounced change-detection at render frame)
- _handle_reset_session (clears 30+ fields, replaces project,
preserves active_project_path per the 2026-06-08 regression fix)
- App.__getattr__/__setattr__ state delegation to Controller
- 4-thread access pattern with 7 lock-protected regions
- State persistence: in-memory vs project TOML vs config TOML
- Hot-reload integration
- Hook API registries (_predefined_callbacks, _gettable_fields)
- 14 file:line refs into gui_2.py:1140-1170, history.py,
app_controller.py:3286-3356
3. guide_context_aggregation.md (394 lines) — The aggregate.py
pipeline
- 3 aggregation strategies (auto, summarize, full)
- 7 per-file view modes (full, summary, skeleton, outline,
masked, custom, none)
- Full FileItem schema (9 fields + __post_init__ normalizer)
at models.py:510-559
- ContextPreset schema and ContextPresetManager
- Tier 3 worker variant (build_tier3_context with FuzzyAnchor
re-resolution and focus-file handling)
- force_full / auto_aggregate short-circuits
- Cache strategy (static prefix + dynamic history)
- 23 file:line refs into aggregate.py:36-518 + models.py:909-937
8 existing guides cross-linked to the 3 new guides and to the
nagent_review track:
- guide_gui_2.md (+ See Also entries for discussions,
state lifecycle, context aggregation,
nagent_review report)
- guide_app_controller.md (+ See Also entries for discussions,
state lifecycle, context aggregation,
nagent_review report)
- guide_context_curation.md (+ new See Also section pointing to
context aggregation + nagent_review)
- guide_architecture.md (+ new See Also section listing all 10
guides + nagent_review report)
- guide_ai_client.md (+ See Also entries for state lifecycle,
context aggregation, nagent_review
pitfalls #2 and #4)
- guide_mma.md (+ new See Also section pointing to
context aggregation, discussions,
nagent_review report §9 + takeaways §3/§10
for SubConversationRunner priority)
- guide_models.md (+ See Also entries for context
aggregation, discussions, nagent_review
report §6 on FileItem as strongest
curation dimension)
- Readme.md (+ 3 new guide entries in the index
table, with one-line summaries)
No code modified. This is documentation only.
Why these 3 guides specifically:
- guide_discussions.md: The discussion system is the user's most
edited surface. nagent_review's report §3 enumerated 23 operations
(A1-C5) that previously existed only as scattered file:line refs
across gui_2.py. A dedicated guide makes the operation matrix
discoverable.
- guide_state_lifecycle.md: The undo/redo + reset + state delegation
machinery is architecturally load-bearing but scattered across 4
files. After nagent_review identified the provider-side history
divergence as Pitfall #4, the relationship between Manual Slop's
state and the provider's state needs explicit documentation.
- guide_context_aggregation.md: aggregate.py (518 lines) is the
most-touched module after ai_client.py but had no dedicated
guide. nagent_review confirmed it's Manual Slop's strongest
curation dimension. A dedicated guide makes the 7 view modes
and 3 strategies discoverable.
The 3 new guides total 1,122 lines and follow the existing
per-source-file deep-dive style (architectural, data-oriented,
state-management-focused).
@@ -0,0 +1,394 @@
# Context Aggregation: How Manual Slop Builds the AI's Context

[Top](../README.md) | [Discussions](guide_discussions.md) | [Context Curation](guide_context_curation.md) | [Models](guide_models.md) | [Architecture](guide_architecture.md)

---
## Overview

`src/aggregate.py` (518 lines) is the **context composition pipeline**. Its entry point, `aggregate.run`, turns a project's `files` + `screenshots` + `history` config into the final markdown string the AI sees. It is called by:

- `src/ai_client.py:_send_anthropic`, `_send_deepseek`, `_send_gemini`, `_send_gemini_cli`, `_send_minimax` (every provider)
- `src/app_controller.py:AppController._do_generate` (the main send path)
- `src/app_controller.py:AppController._cb_start_track`, `AppController._process_event_queue`, `AppController._start_track_logic` (MMA paths)
- `src/gui_2.py:App.run`, `App.main`, `App._render_snapshot_tab` (the GUI and the prior-session replay)
- `simulation/sim_base.py:run_sim` and 6 other simulation entry points

This is one of the most-touched modules in the project. After the nagent_review, this pipeline is recognized as **Manual Slop's strongest curation dimension** (vs nagent's conversation-log dimension). See `conductor/tracks/nagent_review_20260608/report.md §6` and `decisions.md` candidate #7 for the related future track.

> **Domain classification.** The pipeline is **Application**-domain. The MMA sub-agents consume it but the pipeline itself does not call into Meta-Tooling code. See `guide_meta_boundary.md`.

---
## The Pipeline At A Glance

```
aggregate.run(config, aggregation_strategy)
├─ find_next_increment(output_dir, namespace) # next file number for output
├─ build_file_items(base_dir, files) # read + view-mode transform
├─ build_markdown_from_items(file_items, ...) # compose sections
│ ├─ ## Files (or Files (Summary) or Files (Tier 3 - Focused))
│ │ └─ _build_files_section_from_items OR summarize.build_summary_markdown
│ ├─ ## Screenshots (if any)
│ ├─ ## Beads Mode: Progress Track (if execution_mode == "beads")
│ └─ ## Discussion History (if any)
└─ output_file.write_text(markdown)
```

The **output** is a markdown file at `{output_dir}/{namespace}_{NNN}.md` where `NNN` is a zero-padded increment. The pipeline does not *send* the markdown; that is the AI client's job. The pipeline *produces* the markdown.

The **return value** is `(markdown: str, output_file: Path, file_items: list[dict])`. The file_items list is reused by callers that want to inspect the read state without re-reading from disk.

---
## The Three Aggregation Strategies

`aggregation_strategy: str` selects how files are rendered. The values:

| Strategy | File rendering | History rendering | Tier 3 handling | Use case |
|---|---|---|---|---|
| `auto` | If `summary_only` is True → summary; else → full | Standard | Standard | Default. Reads `config.project.summary_only`. |
| `summarize` | Always `summarize.build_summary_markdown(file_items)` (compact multi-file view) | Standard | Standard | Token-budget-constrained runs. |
| `full` | Always `_build_files_section_from_items(file_items)` (full content) | Standard | Standard | Debugging; when you want the AI to see everything. |

**Implementation:** `aggregate.py:330-346 build_markdown_from_items`. The three-way dispatch is at lines 335-339:

```python
if aggregation_strategy == "summarize": parts.append("## Files (Summary)\n\n" + summarize.build_summary_markdown(file_items))
elif aggregation_strategy == "full": parts.append("## Files\n\n" + _build_files_section_from_items(file_items))
else: # auto
    if summary_only: parts.append("## Files (Summary)\n\n" + summarize.build_summary_markdown(file_items))
    else: parts.append("## Files\n\n" + _build_files_section_from_items(file_items))
```

The `auto` strategy is the *only* one that respects `config.project.summary_only`; the other two are explicit overrides. Personas can also set `aggregation_strategy` (per `guide_personas.md`), and a persona-set strategy overrides the config-level setting.

---
## View Modes: The Per-File Transform

`view_mode: str` is the per-file content transform. The value is set on the `FileItem` (or the legacy dict-shaped config entry) and determines how the file's bytes are rendered into the markdown.

| View mode | Behavior | Source |
|---|---|---|
| `full` | Raw `path.read_text(encoding="utf-8")` content. | `aggregate.py:205` |
| `summary` | `summarize.summarise_file(path, content)`: heuristic summary from `src/summarize.py`. | `aggregate.py:210` |
| `skeleton` | For `.py`: `ASTParser("python").get_skeleton(content)` (tree-sitter). For `.c`/`.h`: `mcp_client.ts_c_get_skeleton`. For `.cpp`/`.hpp`: `mcp_client.ts_cpp_get_skeleton`. Other → summary. | `aggregate.py:211-220` |
| `outline` | For `.py`: `ASTParser("python").get_code_outline(content)`. For C/C++: `mcp_client.ts_c*_get_code_outline`. Other → summary. | `aggregate.py:221-230` |
| `masked` | For each `{symbol: mode}` in `ast_mask`, fetch `def` or `sig` via `mcp_client.py/ts_*_get_definition/signature`. Concatenate. | `aggregate.py:231-249` |
| `none` | Literal string `"(context excluded)"`; the file is in the file_items list but contributes no content. | `aggregate.py:250` |
| `custom` | Render only the `custom_slices` from the FileItem. Each slice is a `{start_line, end_line, tag, comment}` dict. Lines outside the slices are excluded. | `aggregate.py:251-266` |

**The default view mode** is `full`. The persona can override via `Persona.aggregation_strategy`; the FileItem can override via `FileItem.view_mode` or `FileItem.force_full` (which forces `full` regardless of the FileItem's own setting).

**Errors are graceful.** A `FileNotFoundError` produces `f"ERROR: file not found: {path}"` content with `error: True` and `mtime: 0.0`. A `view_mode` that throws produces `f"ERROR in {view_mode} view mode for {path}:\n{traceback.format_exc()}"`. Errors do not halt the pipeline.

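
The graceful-error behavior described above can be sketched as follows. This is a minimal illustration, not the code in `aggregate.py`: `read_item`, its `transform` parameter, and `broken_transform` are hypothetical names, and setting `error` on a view-mode failure is an assumption (the guide only specifies the message for that case).

```python
import tempfile
import traceback
from pathlib import Path
from typing import Callable


def read_item(path: Path, view_mode: str, transform: Callable[[str], str]) -> dict:
    """Hypothetical helper: read one file and apply a view-mode transform without raising."""
    try:
        content = path.read_text(encoding="utf-8")
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        # Missing file: error text becomes the content, error flag set, mtime zeroed.
        return {"path": str(path), "content": f"ERROR: file not found: {path}",
                "error": True, "mtime": 0.0}
    try:
        rendered = transform(content)
        failed = False
    except Exception:
        # Transform failure: the traceback becomes the content.
        # Flagging "error" here is an assumption, not confirmed by the guide.
        rendered = f"ERROR in {view_mode} view mode for {path}:\n{traceback.format_exc()}"
        failed = True
    return {"path": str(path), "content": rendered, "error": failed, "mtime": mtime}


def broken_transform(text: str) -> str:
    """Stand-in for a view-mode transform that throws."""
    raise ValueError("parser blew up")


with tempfile.TemporaryDirectory() as tmp:
    real = Path(tmp) / "a.py"
    real.write_text("x = 1\n", encoding="utf-8")
    ok = read_item(real, "full", lambda text: text)
    bad = read_item(real, "skeleton", broken_transform)
    missing = read_item(Path(tmp) / "gone.py", "full", lambda text: text)
```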

---

## The FileItem Schema (Full)

`src/models.py:510-559 FileItem` is the **per-file curation memory** that nagent_review identified as Manual Slop's strongest dimension. The dataclass has `path` plus 9 mutable curation fields and a `__post_init__` normalizer:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FileItem:
    path: str                        # the artifact identity (path-keyed, no inode)
    auto_aggregate: bool = True      # include in auto-aggregation? (skip in build_*_from_items if False)
    force_full: bool = False         # bypass view_mode; force raw content
    view_mode: str = 'full'          # one of: full, summary, skeleton, outline, masked, custom, none
    selected: bool = False           # for batch operations (the Context Panel multi-select)
    ast_signatures: bool = False     # include only signatures (skeleton-equivalent shortcut)
    ast_definitions: bool = False    # include only definitions (skeleton-equivalent shortcut)
    # The defaults on the next three fields are assumed here so the listing is valid Python;
    # check models.py for the actual defaults.
    ast_mask: dict[str, str] = field(default_factory=dict)    # per-symbol mask: {symbol_path: 'def'|'sig'|'hide'} (from Structural File Editor)
    custom_slices: list[dict] = field(default_factory=list)   # Fuzzy Anchor slices: {start_line, end_line, tag, comment, ...}
    injected_at: Optional[float] = None                       # timestamp of last injection
```

All fields are serialized by `to_dict()` and deserialized by `from_dict()` (with `.get(..., default)` for forward compatibility). The dataclass is round-trip-safe through TOML.

`__post_init__` normalizes `custom_slices`: each slice dict gets `tag=None` and `comment=None` defaults added so downstream code can `.get("tag")` safely.

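
The normalizer and the dict round trip can be illustrated with a reduced stand-in. This is a sketch under stated assumptions: `SliceHolder` is a hypothetical two-field class, not the real `FileItem`, and only the dict round trip is shown (TOML serialization is out of scope here).

```python
from dataclasses import asdict, dataclass, field


@dataclass
class SliceHolder:
    """Hypothetical stand-in for FileItem, reduced to the fields the normalizer touches."""
    path: str
    custom_slices: list[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Add the optional keys so every slice dict has the same shape.
        for s in self.custom_slices:
            s.setdefault("tag", None)
            s.setdefault("comment", None)

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "SliceHolder":
        # .get(..., default) keeps older payloads loadable when a key is missing.
        return cls(path=data["path"], custom_slices=data.get("custom_slices", []))


item = SliceHolder("src/a.py", [{"start_line": 1, "end_line": 3}])
restored = SliceHolder.from_dict(item.to_dict())
legacy = SliceHolder.from_dict({"path": "src/b.py"})
```

Running the normalizer in `__post_init__` means both fresh construction and `from_dict` loading pass through the same code path, so no caller has to remember to normalize.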
### The Custom Slice Schema

A `custom_slices` entry is `{start_line, end_line, tag, comment, ...}` (plus Fuzzy Anchor metadata). The full schema is in `src/fuzzy_anchor.py:FuzzyAnchor.create_slice`:

```python
{
    "start_line": int,          # 1-based original line
    "end_line": int,            # 1-based original line (inclusive)
    "tag": str|None,            # human label, defaults to None
    "comment": str|None,        # human comment, defaults to None
    "content_hash": str,        # SHA-256 of the slice content (for Fuzzy Anchor stability)
    "anchor_lines": [str, ...], # surrounding context for re-resolution
    # plus the original positioning metadata
}
```

When `view_mode == 'custom'`, the `aggregate.py:251-264` block renders each slice as:

```markdown
---
[Slice: <tag>] (<comment>)
Lines <start>-<end>:
<content>
```

Multiple slices in a file are joined with `\n\n`.

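
The slice layout above can be reproduced with a few lines of Python. This is a sketch, not the `aggregate.py:251-264` block itself: `render_custom_slices` is a hypothetical name, and rendering a missing `tag` or `comment` as the string `None` is an assumption.

```python
def render_custom_slices(content: str, slices: list[dict]) -> str:
    """Render 1-based, inclusive line ranges in the slice layout shown above."""
    lines = content.splitlines()
    rendered = []
    for s in slices:
        start, end = s["start_line"], s["end_line"]
        # Convert the 1-based inclusive range to a 0-based half-open Python slice.
        body = "\n".join(lines[start - 1:end])
        rendered.append(
            f"---\n[Slice: {s.get('tag')}] ({s.get('comment')})\n"
            f"Lines {start}-{end}:\n{body}"
        )
    # Multiple slices are joined with a blank line between them.
    return "\n\n".join(rendered)


source = "a\nb\nc\nd\ne"
out = render_custom_slices(source, [
    {"start_line": 2, "end_line": 3, "tag": "core", "comment": "hot path"},
    {"start_line": 5, "end_line": 5, "tag": "tail", "comment": "last"},
])
```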

---

## The ContextPreset Schema

`src/models.py:909-937 ContextPreset` is a *named, persisted set* of `FileItem`s, a reusable "context composition":

```python
@dataclass
class ContextPreset:
    name: str                        # the preset name (used as TOML key)
    files: list[ContextFileEntry] = field(default_factory=list)
    screenshots: list[str] = field(default_factory=list)
    description: str = ""
```

`ContextFileEntry` is a `FileItem` (or a string path that's promoted to a `FileItem` on load). The `description` is a human-readable label for the preset list.

`ContextPresetManager` (in `src/context_presets.py`, 30 lines) handles CRUD:

- `save_preset(preset: ContextPreset)` writes to `manual_slop.toml` or a project TOML
- `load_all() -> dict[str, ContextPreset]` reads all presets
- `delete_preset(name: str)` removes a preset
- `apply_preset(name: str)` switches the active context composition to the named preset

`reload_context_presets()` (in `app_controller.py`) is called when the project TOML changes; it validates that all files in the preset still exist and warns the user about any that don't.

**Scope:** ContextPresets can be **Global** (in `<user_config>/manual_slop.toml`) or **Project-specific** (in the project's `manual_slop.toml`). Project presets override global presets of the same name. This is the same scope-inheritance pattern as Personas, Presets, and Workspace Profiles.

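
The scope rule (project presets win over global presets of the same name) amounts to a dictionary merge. A minimal sketch, assuming plain dicts stand in for `ContextPreset` objects; `merge_presets` is a hypothetical helper, not part of `ContextPresetManager`:

```python
def merge_presets(global_presets: dict[str, dict], project_presets: dict[str, dict]) -> dict[str, dict]:
    """Combine both scopes; a project preset replaces a global preset with the same name."""
    merged = dict(global_presets)    # copy so the global scope is left untouched
    merged.update(project_presets)   # project entries overwrite same-named globals
    return merged


global_scope = {
    "review": {"files": ["README.md"], "description": "global review set"},
    "docs": {"files": ["docs/*.md"], "description": "all guides"},
}
project_scope = {
    "review": {"files": ["src/aggregate.py"], "description": "project review set"},
}
effective = merge_presets(global_scope, project_scope)
```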

---

## The Discussion History Section

`aggregate.py:109 build_discussion_section(history)` builds the section that carries the prior conversation:

```python
from typing import Any

def build_discussion_section(history: list[Any]) -> str:
    sections = []
    for i, entry in enumerate(history, start=1):
        if isinstance(entry, dict):
            # New shape: {"role": ..., "content": ...}
            role = entry.get("role", "Unknown")
            content = entry.get("content", "").strip()
            text = f"{role}: {content}"
        else:
            # Legacy shape: a pre-formatted string such as "User: ..."
            text = str(entry).strip()
        sections.append(f"### Discussion Excerpt {i}\n\n{text}")
    return "\n\n---\n\n".join(sections)
```

The section handles *both* legacy `list[str]` (e.g. `["User: ...", "AI: ..."]`) and the new `list[dict]` shape (`[{"role": ..., "content": ...}, ...]`). The dict shape is what `_flush_disc_entries_to_project` persists (per `app_controller.py:3225-3240`).

The section is named **`## Discussion History`** and is placed at the *end* of the markdown (after files, screenshots, beads). This is deliberate: the cache-hit-friendly static prefix is at the top, the dynamic history is at the bottom. See `guide_architecture.md §"Cache Strategy"`.

---
## Cache Strategy

The pipeline is structured to maximize provider cache hits. The static prefix (Files + Screenshots + Beads) is the same across all turns of a discussion; only the Discussion History changes. The provider's cache key is the prefix; the history is appended.

`build_markdown_no_history` (`aggregate.py:348-353`) is the explicit "static-only" builder used by `_do_generate` *before* adding the history. The full builder is `build_markdown_from_items`, which adds the history if non-empty. This split allows the AI client to:

1. Send the static prefix once.
2. Append the history to the next send without re-sending the prefix.
3. Re-use the cached prefix on later sends (as long as the files haven't changed).

The cache strategy is documented in detail in `guide_ai_client.md §"Caching Strategy"` and `guide_architecture.md §"Cache Hit Strategy"`.

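
The static-prefix idea can be shown with two small functions. This is an illustrative sketch only: `build_static_prefix` and `build_full_context` are hypothetical names (not `build_markdown_no_history` or `build_markdown_from_items`), and the exact section separators are assumptions. The point it demonstrates is that every turn's context begins with the same bytes, which is what a prefix-keyed cache needs.

```python
def build_static_prefix(file_sections: list[str]) -> str:
    """Static part of the context: identical across turns while the files are unchanged."""
    return "## Files\n\n" + "\n\n".join(file_sections)


def build_full_context(prefix: str, history: list[str]) -> str:
    """Append the dynamic history after the static prefix, or return the prefix alone."""
    if not history:
        return prefix
    return prefix + "\n\n## Discussion History\n\n" + "\n\n---\n\n".join(history)


prefix = build_static_prefix(["### a.py\n\nx = 1"])
turn_1 = build_full_context(prefix, ["User: hi"])
turn_2 = build_full_context(prefix, ["User: hi", "AI: hello"])
```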

---

## The Tier-3 Variant

`aggregate.py:364-454 build_tier3_context` is the **MMA worker context**, a different layout for sub-agent invocations. The differences from the standard pipeline:

1. **Focus files** (passed as `focus_files: list[str]`) are rendered as **full content** regardless of their `view_mode`. A file is a focus file if its `entry`, name, or path matches one of the focus paths.
2. **Slices are resolved via FuzzyAnchor.** If a file has `custom_slices` and the file content has been modified since the slice was created, the FuzzyAnchor re-resolves the line ranges. This is critical for sub-agents receiving slices that may be stale.
3. **Section header is `## Files (Tier 3 - Focused)`.** Distinct from the standard `## Files` so the worker (and its tools) can recognize its own context.
4. **The `is_focus` check is multi-level.** Entry match, name match, path match, and substring match. Sub-agents with looser file-matching needs can pass a focus set that's just a list of basenames.

The Tier 3 build skips the `summarize.build_summary_markdown` path entirely; every file is rendered with `_build_files_section_from_items`-style formatting (or the AST skeleton for non-focus Python files, or the AST signature/outline for C/C++).

The Tier 3 build is called from `multi_agent_conductor.py:run_worker_lifecycle` via `aggregate.run(config, aggregation_strategy=tier_strategy)`.

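
A multi-level focus check of the kind described above might look like this. It is a sketch under assumptions: the function signature, the order of the checks, and the direction of the substring test (focus string contained in the path) are not specified in the guide and are chosen here for illustration.

```python
from pathlib import PurePosixPath


def is_focus(entry: str, path: str, focus_files: list[str]) -> bool:
    """Return True if any focus string matches the config entry, basename, or path."""
    name = PurePosixPath(path).name
    for focus in focus_files:
        if focus in (entry, name, path):   # exact entry, name, or path match
            return True
        if focus and focus in path:        # substring fallback (assumed direction)
            return True
    return False


by_name = is_focus("src/*.py", "src/aggregate.py", ["aggregate.py"])
by_entry = is_focus("src/*.py", "src/aggregate.py", ["src/*.py"])
by_substring = is_focus("src/*.py", "src/aggregate.py", ["aggreg"])
no_match = is_focus("src/*.py", "src/aggregate.py", ["models.py"])
```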

---

## The Bypass: `force_full`

`FileItem.force_full = True` short-circuits the `view_mode` selection:

```python
if force_full: view_mode = "full"
```

This is set at the `FileItem` level (not the strategy level). Use case: the user has set a global "skeleton" view mode for the project but wants one specific file to always be inlined in full. The force is per-file and overrides both the FileItem's own `view_mode` and any strategy-level override.

For Tier 3, `force_full` is treated as a *focus flag*:

```python
if is_focus or tier == 3 or force_full:
    # full content, no skeleton
```

So a `force_full=True` file in a Tier 3 worker context is treated as a focus file and rendered in full.

---
## Auto-Aggregate Skip

`FileItem.auto_aggregate = False` causes the file to be *included in the file_items list* but *excluded from the rendered markdown*:

```python
for item in file_items:
    if not item.get("auto_aggregate", True): continue
    # ... build section
```

Use case: the file is in the `files` list for the AI's *awareness* (e.g. "you can read it via `read_file`") but should not be inlined. The file's `mtime` and `view_mode` are still tracked; the file is *omitted* from the rendered markdown.

This is distinct from `view_mode == "none"`:

- `auto_aggregate = False` → file is not in the rendered markdown at all (no `### File` header)
- `view_mode = "none"` → file is in the rendered markdown as `### File (excluded)` with a `"(context excluded)"` body

The two are useful for different scenarios. `auto_aggregate = False` is for "the AI knows the file exists, can read it on demand." `view_mode = "none"` is for "the AI knows we deliberately excluded this content."

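
The difference between the two flags is easiest to see side by side. The sketch below is illustrative: `build_files_section` is a hypothetical stand-in for `_build_files_section_from_items`, and the exact header text is an assumption based on the `### File (excluded)` form quoted above.

```python
def build_files_section(file_items: list[dict]) -> str:
    """Render file sections, honoring auto_aggregate and view_mode == 'none'."""
    sections = []
    for item in file_items:
        if not item.get("auto_aggregate", True):
            continue  # no header at all: the file is absent from the markdown
        if item.get("view_mode") == "none":
            # Header stays, body is replaced by an explicit exclusion marker.
            sections.append(f"### {item['path']} (excluded)\n\n(context excluded)")
        else:
            sections.append(f"### {item['path']}\n\n{item['content']}")
    return "\n\n".join(sections)


items = [
    {"path": "a.py", "content": "x = 1", "view_mode": "full"},
    {"path": "b.py", "content": "y = 2", "view_mode": "full", "auto_aggregate": False},
    {"path": "c.py", "content": "z = 3", "view_mode": "none"},
]
markdown = build_files_section(items)
```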

---

## Screenshots

`aggregate.py:126-140 build_screenshots_section` renders the screenshots list as a `## Screenshots` markdown section. Each screenshot is rendered with markdown image syntax (`![alt](path)`). Path resolution uses `resolve_paths` (same as for files), so wildcards and absolute paths work.

**Screenshots are placed *after* Files and *before* Beads and Discussion History.** This is a deliberate ordering: the AI sees the project's files first (the static content), then the screenshots (the visual context), then the beads status (if applicable), then the discussion history (the dynamic content).

---
## Beads Mode

When `execution_mode == "beads"` (set in `config.project.execution_mode`), the pipeline appends a `## Beads Mode: Progress Track` section between Screenshots and Discussion History. The section is built by `aggregate.py:309-328 build_beads_section`:

- Lists all *completed* beads as a comma-separated list
- Lists all *active* beads as bullet points with title, id, and description

`build_beads_section` returns an empty string if the project is not a Beads project (`client.is_initialized()` is False) or if there are no beads. The caller (`build_markdown_from_items`) checks the truthiness before appending.

See `guide_beads.md` for the full Beads integration.

---
## Output File Numbering

`find_next_increment(output_dir, namespace)` (`aggregate.py:36-44`) scans `output_dir` for files matching `^{namespace}_(\d+)\.md$` and returns `max_num + 1`. The output filename is `{namespace}_{NNN:03d}.md` (zero-padded to 3 digits). The increment starts at 1 and grows monotonically.

The increment is the *artifact identity* for the conversation. Each turn produces a new file. The current implementation does *not* delete old files; the `LogPruner` (per `guide_architecture.md`) handles cleanup separately.

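
The numbering rule can be sketched directly from the description above. This is a reimplementation for illustration, not the code at `aggregate.py:36-44`; escaping the namespace with `re.escape` and returning 1 for a missing directory are assumptions added here.

```python
import re
import tempfile
from pathlib import Path


def find_next_increment(output_dir: Path, namespace: str) -> int:
    """Return max existing number + 1 among files named {namespace}_{digits}.md."""
    pattern = re.compile(rf"^{re.escape(namespace)}_(\d+)\.md$")
    max_num = 0
    if output_dir.is_dir():
        for p in output_dir.iterdir():
            m = pattern.match(p.name)
            if m:
                max_num = max(max_num, int(m.group(1)))
    return max_num + 1


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    first = find_next_increment(out_dir, "ctx")
    # Only ctx_001.md and ctx_007.md match; the other two names are ignored.
    for name in ("ctx_001.md", "ctx_007.md", "other_050.md", "ctx_draft.md"):
        (out_dir / name).write_text("", encoding="utf-8")
    next_num = find_next_increment(out_dir, "ctx")
    next_name = f"ctx_{next_num:03d}.md"
```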

---

## Pipeline Callers

`aggregate.run` is called from many places. The most important:

| Caller | Purpose |
|---|---|
| `src/ai_client.py:_send_anthropic` | Build the markdown for an Anthropic send. |
| `src/ai_client.py:_send_gemini` | Build the markdown for a Gemini send. |
| `src/ai_client.py:_send_deepseek` | Build the markdown for a DeepSeek send. |
| `src/ai_client.py:_send_gemini_cli` | Build the markdown for a Gemini CLI send. |
| `src/ai_client.py:_send_minimax` | Build the markdown for a MiniMax send. |
| `src/app_controller.py:AppController._do_generate` | The main 1:1 send path. |
| `src/app_controller.py:AppController._cb_start_track` | Start a new MMA track. |
| `src/app_controller.py:AppController._process_event_queue` | Process a queued event (e.g. send, switch discussion). |
| `src/multi_agent_conductor.py:run_worker_lifecycle` | Spawn a Tier 3 worker (with Tier 3 context). |
| `src/gui_2.py:App.run` | The main GUI loop. |
| `src/gui_2.py:App._render_snapshot_tab` | Render a prior-session replay snapshot. |
| `simulation/sim_base.py:run_sim` | Run a simulation. |

The aggregation strategy is set per-call:

- The main `_do_generate` uses `config.project.aggregation_strategy` (which is the persona-set strategy if a persona is active).
- MMA worker contexts use the worker's `aggregation_strategy` from the ticket config.
- The simulation uses a fixed `auto`.

---
## Public API Surface

The public API of `aggregate.py` is:

| Function | Signature | Purpose |
|---|---|---|
| `find_next_increment` | `(output_dir: Path, namespace: str) -> int` | Next file number for output. |
| `resolve_paths` | `(base_dir: Path, entry: str) -> list[Path]` | Expand globs and absolute paths. Blacklist `history.toml` and `*_history.toml`. |
| `group_files_by_dir` | `(files: list[Any]) -> dict[str, list[Any]]` | Group FileItems by relative directory path (used by the Context Panel UI). |
| `compute_file_stats` | `(abs_path: str) -> dict[str, int]` | Line count + AST element count for Python files. |
| `build_file_items` | `(base_dir, files) -> list[dict]` | Read + view-mode transform per file. The most-called function. |
| `build_discussion_section` | `(history) -> str` | Render the `## Discussion History` markdown. |
| `build_screenshots_section` | `(base_dir, screenshots) -> str` | Render the `## Screenshots` markdown. |
| `build_beads_section` | `(base_dir) -> str` | Render the `## Beads Mode: Progress Track` markdown. |
| `build_markdown_from_items` | `(file_items, screenshot_base_dir, screenshots, history, summary_only, aggregation_strategy, execution_mode, base_dir) -> str` | Compose all sections. The "compose" function. |
| `build_markdown_no_history` | `(file_items, screenshot_base_dir, screenshots, summary_only, aggregation_strategy) -> str` | Compose without history (for stable caching). |
| `build_discussion_text` | `(history) -> str` | Just the history section, for callers that want to append to a pre-built static prefix. |
| `build_tier3_context` | `(file_items, screenshot_base_dir, screenshots, history, focus_files) -> str` | Tier 3 worker context. |
| `build_markdown` | `(base_dir, files, screenshot_base_dir, screenshots, history, summary_only, execution_mode) -> str` | Convenience: read files + compose. |
| `run` | `(config, aggregation_strategy) -> tuple[str, Path, list[dict]]` | The full pipeline. |
| `main` | `() -> None` | CLI entry point. Loads config, calls `run`, prints output path. |

**Performance:** the entire pipeline is O(N) in the number of files, with the per-file AST work being the most expensive step. `build_tier3_context` includes `with get_monitor().scope("build_tier3_context")` (and similar for `build_file_items` and `build_markdown_no_history`) for performance monitoring. The monitor is documented in `guide_architecture.md §"Performance"`.

---
## Performance Considerations

The `view_mode` selection has a meaningful performance impact:

| view_mode | Per-file cost | When to use |
|---|---|---|
| `full` | 1 file read + string concat | Small files, files the user is actively editing. |
| `summary` | 1 file read + 1 heuristic call to `summarize.summarise_file` | Large files where structural info is enough. |
| `skeleton` | 1 file read + 1 tree-sitter parse + skeleton build | Python/C/C++ files where the structure matters more than the content. |
| `outline` | 1 file read + 1 tree-sitter parse + outline build | When the AI only needs the public API surface. |
| `masked` | 1 file read + N `mcp_client.py/ts_*_get_*` calls (one per masked symbol) | When the user has explicitly marked symbols as "def" or "sig". |
| `none` | 1 file read (still reads the bytes, just discards) | When the user wants the file listed with an explicit `"(context excluded)"` marker instead of its content. |
| `custom` | 1 file read + line slicing per slice | When the user has explicitly created Fuzzy Anchor slices. |

The `force_full = True` and `auto_aggregate = False` flags skip *some* of the work:

- `force_full = True` skips the view-mode dispatch and goes straight to raw content.
- `auto_aggregate = False` skips the view-mode dispatch entirely and skips the markdown section build.

For very large codebases (1000+ files), the bottleneck is the tree-sitter parsing for `skeleton` / `outline` / `masked` modes. The Tier 3 builder uses `ASTParser("python")` lazily (`if not parser: parser = ASTParser("python")`) so the tree-sitter grammar is loaded only once per pipeline call.

---
## Tests

- `tests/test_aggregate_flags.py`: `test_auto_aggregate_skip`, `test_force_full`, `test_view_mode_full`, `test_view_mode_summary`, `test_view_mode_skeleton`, `test_view_mode_outline`, `test_view_mode_none`, `test_view_mode_custom`, `test_view_mode_masked`
- `tests/test_aggregate_beads.py`: `test_build_beads_compaction`
- `tests/test_context_composition_phase3.py`: `test_group_files_by_dir`, `test_compute_file_stats`
- `tests/test_context_composition_phase6.py`: `test_view_mode_default_summary`, `test_view_mode_full`, `test_view_mode_none`, `test_view_mode_outline`, `test_view_mode_skeleton`, `test_view_mode_summary`, `test_view_mode_custom`, `test_view_mode_custom_empty_default_to_summary`, `test_files_section_rendering`
- `tests/test_tiered_context.py`: `test_build_tier3_context_exists`, `test_build_tier3_context_ast_skeleton`, `test_build_tier3_context_scaling`, `test_tiered_context_by_tier_field`, `test_build_file_items_with_tiers`, `test_build_files_section_with_dicts`
- `tests/test_ast_masking_core.py`: `test_ast_masking_gencpp_samples`
- `tests/test_gencpp_full_suite.py`: `test_gencpp_full_suite`
- `tests/test_perf_aggregate.py`: `test_build_tier3_context_scaling`
- `tests/test_history_management.py`: `test_aggregate_blacklist`, `test_aggregate_includes_segregated_history`, `test_aggregate_respects_*`
- `tests/test_ui_summary_only_removal.py`: `test_aggregate_from_items_respects_auto_aggregate`
- `tests/test_aggregate_helpers.py`: `test_resolve_paths_blacklist`, `test_resolve_paths_glob`, `test_resolve_paths_absolute`
- `tests/test_aggregate_perf.py`: `test_find_next_increment_*`

---
## Cross-References

- **The pipeline source:** `src/aggregate.py` (518 lines)
- **FileItem schema:** `src/models.py:510-559 FileItem`
- **ContextPreset schema:** `src/models.py:909-937 ContextPreset`
- **ContextPresetManager:** `src/context_presets.py` (30 lines)
- **AI client consumption:** `src/ai_client.py:_send_<provider>` × 5, see `guide_ai_client.md`
- **Tier 3 worker consumption:** `src/multi_agent_conductor.py:run_worker_lifecycle`, see `guide_multi_agent_conductor.md`
- **Per-file curation features:** `guide_context_curation.md` (Fuzzy Anchors, AST Inspector, Granular AST Control)
- **Cache strategy:** `guide_architecture.md §"Cache Hit Strategy"`, `guide_ai_client.md §"Caching"`
- **Discussion section builder:** `guide_discussions.md §"Persistence"`, `src/aggregate.py:109 build_discussion_section`
- **Deep-dive on the design philosophy:** `conductor/tracks/nagent_review_20260608/report.md §6` (per-file memory)
- **Actionable patterns for richer per-file memory:** `conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md §4` (file_id), §6 (git history), §7 (Meta-Tooling DSL)
- **Future-track candidate for per-file conversation log:** `conductor/tracks/nagent_review_20260608/decisions.md` candidate #7