Private
Public Access
0
0

docs(architecture): add MiniMax provider, RAG integration, Tier 4 patch flow, discussion compression, subagent summarization, async tool execution

This commit is contained in:
2026-06-02 18:57:56 -04:00
parent a66878c168
commit f81f1f5eaa
+178 -13
View File
@@ -395,9 +395,10 @@ def resolve_pending_action(self, action_id: str, approved: bool) -> bool:
### Module-Level State
```python
_provider: str = "gemini" # "gemini" | "anthropic" | "deepseek" | "gemini_cli"
_provider: str = "gemini" # "gemini" | "anthropic" | "deepseek" | "gemini_cli" | "minimax"
_model: str = "gemini-2.5-flash-lite"
_temperature: float = 0.0
_top_p: float = 1.0
_max_tokens: int = 8192
_history_trunc_limit: int = 8000 # Char limit for truncating old tool outputs
@@ -411,7 +412,9 @@ Per-provider client objects:
_gemini_client: genai.Client | None
_gemini_chat: Any # Holds history internally
_gemini_cache: Any # Server-side CachedContent
_gemini_cache_md_hash: int | None # For cache invalidation
_gemini_cache_md_hash: str | None # Hash for cache invalidation
_gemini_cache_created_at: float | None # Monotonic time of cache creation
_gemini_cached_file_paths: list[str] # File paths included in the active cache
_GEMINI_CACHE_TTL: int = 3600 # 1-hour; rebuilt at 90% (3240s)
# Anthropic (client-managed history)
@@ -420,9 +423,15 @@ _anthropic_history: list[dict] # Mutable [{role, content}, ...]
_anthropic_history_lock: threading.Lock
# DeepSeek (raw HTTP, client-managed history)
_deepseek_client: Any | None
_deepseek_history: list[dict]
_deepseek_history_lock: threading.Lock
# MiniMax (raw HTTP, client-managed history)
_minimax_client: Any | None
_minimax_history: list[dict]
_minimax_history_lock: threading.Lock
# Gemini CLI (adapter wrapper)
_gemini_cli_adapter: GeminiCliAdapter | None
```
@@ -442,27 +451,41 @@ _GEMINI_MAX_INPUT_TOKENS: int = 900_000 # 1M window minus headroom
```python
def send(md_content, user_message, base_dir=".", file_items=None,
discussion_history="", stream=False,
pre_tool_callback=None, qa_callback=None) -> str:
pre_tool_callback=None, qa_callback=None,
enable_tools=True, stream_callback=None, patch_callback=None,
rag_engine=None) -> str:
with _send_lock:
if _provider == "gemini": return _send_gemini(...)
elif _provider == "gemini_cli": return _send_gemini_cli(...)
elif _provider == "anthropic": return _send_anthropic(...)
elif _provider == "deepseek": return _send_deepseek(..., stream=stream)
elif _provider == "minimax": return _send_minimax(..., stream=stream)
```
`_send_lock` serializes all API calls — only one provider call can be in-flight at a time. All providers share the same callback signatures. Return type is always `str`.
**Parameter evolution** (newer parameters, may be missing from older docstring mirrors):
- `enable_tools: bool = True` — Per-call gate for the PowerShell + MCP tool set. Tier 4 and certain planning calls pass `enable_tools=False` to force text-only responses.
- `stream_callback: Optional[Callable[[str], None]]` — Provider-specific streaming sink. The DeepSeek and MiniMax paths invoke this as tokens arrive; other providers deliver the full response after the network round-trip.
- `patch_callback: Optional[Callable[[str, str], Optional[str]]]` — Tier 4 patch generation hook. Receives `(error_text, file_context)` and returns an optional diff. See [Tier 4 Patch Generation](#tier-4-patch-generation-flow) below.
- `rag_engine: Optional[Any]` — When provided, the dispatcher injects RAG-retrieved context into `md_content` before the provider call. The RAG engine is owned by the caller (typically `AppController` or `multi_agent_conductor.run_worker_lifecycle`); `ai_client` does not own its lifecycle. See [RAG Integration](#rag-integration) below.
`_send_lock` serializes all API calls — only one provider call can be in-flight at a time. All providers share the same callback signatures. Return type is always `str`.
### Provider Comparison
| Aspect | Gemini SDK | Anthropic | DeepSeek | Gemini CLI |
|---|---|---|---|---|
| **Client** | `genai.Client` | `anthropic.Anthropic` | Raw `requests.post` | `GeminiCliAdapter` (subprocess) |
| **History** | SDK-managed (`_gemini_chat._history`) | Client-managed list | Client-managed list | CLI-managed (session ID) |
| **Caching** | Server-side `CachedContent` with TTL | Prompt caching via `cache_control: ephemeral` (4 breakpoints) | None | None |
| **Tool format** | `types.FunctionDeclaration` | JSON Schema dict | Not declared | Same as SDK via adapter |
| **Tool results** | `Part.from_function_response(response={"output": ...})` | `{"type": "tool_result", "tool_use_id": ..., "content": ...}` | `{"role": "tool", "tool_call_id": ..., "content": ...}` | `{"role": "tool", ...}` |
| **History trimming** | In-place at 40% of 900K token estimate | 2-phase: strip stale file refreshes, then drop turn pairs at 180K | None | None |
| **Streaming** | No | No | Yes | No |
| Aspect | Gemini SDK | Anthropic | DeepSeek | Gemini CLI | MiniMax |
|---|---|---|---|---|---|
| **Client** | `genai.Client` | `anthropic.Anthropic` | Raw `requests.post` | `GeminiCliAdapter` (subprocess) | Raw `requests.post` (OpenAI-compatible endpoint) |
| **History** | SDK-managed (`_gemini_chat._history`) | Client-managed list | Client-managed list | CLI-managed (session ID) | Client-managed list |
| **Caching** | Server-side `CachedContent` with TTL | Prompt caching via `cache_control: ephemeral` (4 breakpoints) | None | None | None |
| **Tool format** | `types.FunctionDeclaration` | JSON Schema dict | Not declared | Same as SDK via adapter | Not declared |
| **Tool results** | `Part.from_function_response(response={"output": ...})` | `{"type": "tool_result", "tool_use_id": ..., "content": ...}` | `{"role": "tool", "tool_call_id": ..., "content": ...}` | `{"role": "tool", ...}` | `{"role": "tool", "tool_call_id": ..., "content": ...}` |
| **History trimming** | In-place at 40% of 900K token estimate | 2-phase: strip stale file refreshes, then drop turn pairs at 180K | None | None | 2-phase: drop turn pairs at 180K (Anthropic-equivalent) |
| **Streaming** | No | No | Yes | No | Yes |
| **Error classifier** | `_classify_gemini_error` | `_classify_anthropic_error` | `_classify_deepseek_error` | (inherits Gemini) | `_classify_minimax_error` |
| **Repair hook** | (SDK self-heals) | `_repair_anthropic_history` | `_repair_deepseek_history` | (CLI handles) | `_repair_minimax_history` |
### Tool-Call Loop (common pattern across providers)
@@ -512,9 +535,151 @@ Before placing breakpoint 4, all existing `cache_control` is stripped from histo
System instruction content is hashed. On each call, a 3-way decision:
- **Hash changed**: Delete old cache, rebuild with new content.
- **Cache age > 90% of TTL**: Proactive renewal (delete + rebuild).
- **Cache age > 90% of TTL**: Proactive renewal (delete + rebuild). `cache_created_at` is tracked via `time.monotonic()` for this check.
- **No cache exists**: Create new `CachedContent` if token count >= 2048; otherwise inline.
The active cache's file inclusion set is tracked in `_gemini_cached_file_paths: list[str]`. On rebuild, the list is replaced atomically. The GUI uses this list to render the "cached files" indicator in the Cache Panel.
---
## Async Tool Execution
Independent tool calls within a single round execute concurrently via `asyncio.gather`. This is the major latency win: when the AI emits 3 read_file calls in one turn, they run in parallel rather than sequentially.
### Entry Point
```python
async def _execute_tool_calls_concurrently(
calls: list[Any],
base_dir: str,
pre_tool_callback: ...,
qa_callback: ...,
r_idx: int,
provider: str,
patch_callback: ... = None,
) -> list[tuple[str, str, str, str]]: # (tool_name, call_id, output, original_name)
...
```
### Per-Call Worker
```python
async def _execute_single_tool_call_async(
name: str, args: dict, call_id: str, base_dir: str,
pre_tool_callback, qa_callback, r_idx: int,
tier: str | None = None,
patch_callback = None,
) -> tuple[str, str, str, str]:
...
```
`tier: str | None` is propagated to the comms log and pre-tool callback so audit trails can attribute tool calls to a specific MMA tier (e.g., "Tier 3", "Tier 4"). Thread-local `_local_storage.current_tier` is the source; the parameter is the explicit pass-through.
### Exception Handling
If any individual call raises, `asyncio.gather` with `return_exceptions=True` converts the exception to a returned value rather than cancelling siblings. The post-round loop in `_send_*` then formats the error per provider. See [guide_tools.md](guide_tools.md#parallel-tool-execution) for the full implementation pattern and the timing analysis (sequential vs concurrent latency for a typical 3-call round).
---
## RAG Integration
`ai_client.send()` accepts an optional `rag_engine` parameter. When supplied, the dispatcher augments `md_content` with RAG-retrieved context before the provider call.
```python
def send(md_content, user_message, base_dir=".", file_items=None, ...,
rag_engine: Optional[Any] = None) -> str:
if rag_engine is not None:
retrieved = rag_engine.query(user_message, top_k=5)
md_content = _inject_rag_context(md_content, retrieved)
...
```
The RAG engine is **not** owned by `ai_client`; the caller (typically `AppController` for the main discussion flow, or `multi_agent_conductor.run_worker_lifecycle` for Tier 3 workers) is responsible for instantiating and configuring it. This keeps `ai_client` decoupled from any specific retrieval backend (ChromaDB local, external MCP RAG server, or none).
**Lifecycle**:
- The `AppController` constructs a single `RAGEngine` per project load.
- The RAG engine is passed through to `send()` for every AI call.
- If a project disables RAG, `rag_engine=None` is passed and the integration is a no-op.
- See [guide_rag.md](guide_rag.md) (placeholder; written in Task 10) for the vector store, chunking, and indexing pipeline.
---
## Tier 4 Patch Generation Flow
When a Tier 3 worker's test run fails, the engine can request a Tier 4 patch instead of just an error summary. This is a structured diff, not a free-form suggestion.
### Entry Point
```python
def run_tier4_patch_generation(error: str, file_context: str) -> str:
...
```
### Flow
1. Tier 3 worker fails a test; `stderr` is captured by the test runner.
2. The conductor thread calls `run_tier4_patch_callback(stderr, base_dir)` to get a candidate patch.
3. If a patch is generated, the GUI's patch modal (`src/patch_modal.py`) presents the diff for human review.
4. User clicks Apply Patch to resume the pipeline, or Reject to send the worker back for another attempt.
5. The `patch_callback` parameter on `send()` is the Tier 4 hook; it can be `None` for callers that don't support patch generation.
### Threading
`run_tier4_patch_generation` calls `send()` with `enable_tools=False` to force a text-only response. The result is parsed as a unified diff. If parsing fails, the modal shows the raw response and the user can manually copy-edit.
---
## Discussion Compression
Long discussions accumulate tool outputs and intermediate reasoning that bloat the context. The `run_discussion_compression` function asks the active provider to produce a compressed summary of the discussion so far.
### Entry Point
```python
def run_discussion_compression(discussion_text: str) -> str:
...
```
### Flow
1. Caller (typically the GUI's "Compress Discussion" button or an automatic trigger when history exceeds N tokens) invokes `run_discussion_compression(current_history)`.
2. The function dispatches to the active provider with `enable_tools=False` and a fixed system prompt instructing the model to summarize while preserving key decisions, file paths, and unresolved questions.
3. The returned string replaces the discussion history in subsequent `send()` calls.
4. The original history is archived to the session log (`logs/sessions/<id>/comms.log`) for audit.
### Provider Robustness
The function tolerates case- and whitespace-variation in the provider string (`" MiniMax "` is normalized to `"minimax"`). This is important because the active provider may be set via different code paths (TOML, env var, runtime override).
---
## Subagent Summarization
For very large files, the heuristic `summarise_file` in `src/summarize.py` may be insufficient. The `run_subagent_summarization` function asks the active provider to produce a high-signal summary of a single file using a model call rather than a heuristic.
### Entry Point
```python
def run_subagent_summarization(file_path: str, content: str, is_code: bool, outline: str) -> str:
...
```
### When Invoked
- File exceeds the heuristic summary's effective scope (configurable, typically > 5000 lines or > 100KB)
- The aggregation strategy in `aggregate.py` is set to `summarize` (rather than `full` or `skeleton`)
- The Tier 2 ticket generation explicitly requests a sub-agent summary for a high-priority file
### Flow
1. Caller builds a structured prompt combining the file path, content, an AST outline (if `is_code=True`), and a "summary" instruction.
2. The function dispatches to the active provider with `enable_tools=False`.
3. The returned string is the file's summary, which replaces the full content in the aggregated context.
### Cost vs Quality Trade-off
Sub-agent summarization is more expensive than heuristic summarization (one full provider call per file) but produces higher-quality results for complex files. The caller decides based on the project's token budget and quality requirements.
---
## Comms Log System