chore(conductor): Complete Code Path & Data Pipeline Analysis

This commit is contained in:
2026-05-07 22:01:25 -04:00
parent aff88bd151
commit 822d803ad8
3 changed files with 191 additions and 18 deletions
+173
View File
@@ -0,0 +1,173 @@
# Code Path & Data Pipeline Analysis
This document tracks the analysis of major processing routes and data pipelines within the Manual Slop codebase, following a pipeline-oriented architectural model.
---
## Executive Summary
This analysis maps the Manual Slop codebase as a series of data-driven pipelines. The system transitions from asynchronous background services (AI, MMA) to a synchronous frame-based GUI, and uses a Puppeteer-style simulation framework for automated verification.
---
## 1. Top-Level Entry Points
### 1.1 GUI Entry Point (`src/gui_2.py`)
- **Main Driver:** `main()` function initiates the `App` instance and calls `app.run()`.
- **Primary Rendering Loop:** Powered by `immapp.run()` from `imgui-bundle`. The per-frame UI state logic resides in `App._gui_func`.
- **Background Event Loop:** `AppController` is initialized within `App.__init__` and runs a dedicated background thread (`_process_event_queue` in `app_controller.py`) for processing AI requests and non-UI tasks.
### 1.2 Simulation Entry Points (`simulation/`)
- **Lifecycle Orchestrator:** `run_sim()` in `sim_base.py` manages the standard `setup() -> run() -> teardown()` pipeline.
- **Base Class:** `BaseSimulation` in `sim_base.py` defines the interface for all simulation tasks.
- **High-Level Turn Loop:** `WorkflowSimulator.run_discussion_turn()` in `workflow_sim.py` implements a polling loop that monitors `ai_status` and message history via the `ApiHookClient` to orchestrate multi-turn interactions.
---
## 2. Core Source Pipelines (`./src`)
### 2.1 Context Aggregation Pipeline
```mermaid
graph TD
A[aggregate.run] --> B[resolve_paths]
B --> C[build_file_items]
C --> D{summary_only?}
D -- Yes --> E[summarize.py]
D -- No --> F[build_markdown]
E --> F
F --> G[Monolithic Markdown Context]
```
- **Entry Point:** `aggregate.run()`
- **Route:**
1. **Path Resolution:** `resolve_paths()` handles globs and absolute paths from the project configuration.
2. **Item Construction:** `build_file_items()` reads raw content, modification times, and tier metadata.
3. **Summarization (Optional):** If `summary_only` is enabled, items are piped through `summarize.py` for AST-based or heuristic compression.
4. **Markdown Synthesis:** `build_markdown_from_items()` (or tier-specific variants) assembles the files, screenshots (`build_screenshots_section`), and discussion history (`build_discussion_section`) into the final context string.
- **Data Responsibility:**
- **Owned:** `FileItem` list, `history` list.
- **Mutated:** None (pure synthesis pipeline).
- **Terminal Output:** A monolithic Markdown string and a list of `file_items` (for provider-specific file uploads).
### 2.2 AI Interaction & Tool-Call Loop
```mermaid
graph TD
A[ai_client.send] --> B[Prompt Assembly]
B --> C[Provider SDK Call]
C --> D{Tool Call?}
D -- Read-Only --> E[mcp_client]
D -- Mutating --> F[GUI Approval Modal]
D -- PowerShell --> G[shell_runner.run_powershell]
E --> H[Tool Result]
F -- Approved --> G
G --> H
H --> I[Append Result to History]
I --> C
D -- No --> J[Final AI Response]
```
- **Entry Point:** `ai_client.send()`
- **Route:**
1. **Provider Selection:** Logic routes to `_send_gemini`, `_send_anthropic`, etc., based on configuration.
2. **Prompt Assembly:** Combines the project context (from Pipeline 2.1) with conversation history and provider-specific system instructions.
3. **Execution Loop:** Handles multi-turn tool calling (up to `MAX_TOOL_ROUNDS`).
4. **Tool Dispatch:**
- **Read-Only:** Calls `mcp_client` tools directly.
- **Mutating:** Triggers `pre_tool_callback` (GUI modal) for user approval.
- **PowerShell:** `_run_script()` delegates to `shell_runner.run_powershell()`.
5. **Response Synthesis:** Final AI text or tool results are returned to the caller.
- **Data Responsibility:**
- **Owned:** Conversation history, tool schemas, API credentials.
- **Mutated:** Conversation history (appends turns), `cost_tracker` state.
- **Terminal Output:** Final AI message, generated scripts, and updated conversation state.
### 2.3 GUI Event & State Synchronization
```mermaid
graph LR
subgraph Foreground [gui_2.py - ImGui Loop]
A[App._gui_func] --> B[_process_pending_gui_tasks]
B --> C[Trigger Modals / Update Panels]
end
subgraph Background [app_controller.py - Event Loop]
D[AppController._process_event_queue] --> E{Event Type}
E -- user_request --> F[Trigger AI Loop]
E -- response --> G[Queue gui_task]
G --> B
end
UI[User Input] --> D
```
- **Entry Points:** `gui_2.py:App._gui_func()` (Foreground), `app_controller.py:AppController._process_event_queue()` (Background).
- **Route:**
1. **User Action:** UI event (e.g., clicking "Send") places a request in `AppController.event_queue`.
2. **Background Dispatch:** `_process_event_queue()` identifies the event type. `user_request` spawns a thread (`_handle_request_event`) to trigger Pipeline 2.2 (AI Loop).
3. **Task Queuing:** Background services (AI, MMA, Indexing) place `gui_task` or `mma_state_update` objects into `AppController._pending_gui_tasks`.
4. **Foreground Sync:** `App._gui_func()` checks for pending tasks every frame via `_process_pending_gui_tasks()`, updating the ImGui state and triggering modals.
- **Data Responsibility:**
- **Owned:** ImGui window states, panel visibility, text viewer buffers.
- **Mutated:** `ai_status`, `mma_status`, pending tool call lists.
- **Terminal Output:** Updated UI visuals and user-approved actions.
---
## 3. Simulation Pipelines (`./simulation`)
### 3.1 Simulation Lifecycle
```mermaid
graph TD
A[run_sim] --> B[BaseSimulation.setup]
B --> C[Scaffold Temp Project]
C --> D[Simulation.run]
D --> E[WorkflowSimulator.run_discussion_turn]
E --> F[wait_for_ai_response]
F --> G{Status == idle & Last == AI?}
G -- No --> F
G -- Yes --> H[Validation/Assertions]
H --> I[BaseSimulation.teardown]
```
- **Entry Point:** `run_sim(MySimulation)`
- **Route:**
1. **Scaffolding:** `BaseSimulation.setup()` initializes the `ApiHookClient`, clears the current session, and creates a temporary test project.
2. **Workflow Orchestration:** `WorkflowSimulator.setup_new_project()` and `create_discussion()` configure the UI state for the test scenario.
3. **Interaction Loop:** `WorkflowSimulator.run_discussion_turn()` manages the multi-turn exchange.
- Polling: Continuously checks `ai_status` via HTTP hooks.
- Stall Recovery: Automatically re-triggers the Send action if the AI stops without a final response (e.g., after a tool call).
4. **Validation:** Subclasses perform assertions against the UI state (e.g., `assert_panel_visible()`).
5. **Cleanup:** `BaseSimulation.teardown()` handles resource deallocation.
- **Data Responsibility:**
- **Owned:** Mock project paths, synthetic user messages.
- **Mutated:** Global `ai_status` (indirectly via Hooks), target file system in the test project.
- **Terminal Output:** Test pass/fail status, performance/coverage metrics.
### 3.2 Verification & Checkpointing Protocol
- **Turn Completion Logic:** `WorkflowSimulator.wait_for_ai_response()` implements a state machine for turn detection.
- **Transition-Based:** Tracks `was_busy` (status in ["thinking", "streaming", "running powershell", etc.]) and triggers completion when status returns to "idle" and the last history role is "AI".
- **Error Handling:** GUI-reported "error" statuses trigger an immediate abort.
- **Stall Recovery:** Detects "stalled" turns where the last role is "Tool" but the system is "idle" (indicating a tool result was received but the AI didn't automatically continue). The simulator re-triggers the `btn_gen_send` hook to force progress.
- **State Determinism:** Simulations force `auto_add_history=True` and reset sessions during `setup()` to ensure a clean slate for verification.
---
## 4. Data Responsibility & State Boundaries
*Mapping which pipelines own and mutate specific data structures.*
| Pipeline | Primary Data Owned | Mutated State | Terminal Output |
| :--- | :--- | :--- | :--- |
| **2.1 Context Aggregation** | `FileItem` list, `history` list | None (Pure Synthesis) | Markdown Context String |
| **2.2 AI Interaction** | AI History, Tool Schemas | `history` (Turns), `cost_tracker` | AI Response, Tool Calls |
| **2.3 GUI & Sync** | ImGui State, Controller Config | `ai_status`, `pending_tasks` | Visual Feedback, Log Entries |
| **Simulation (3.1)** | `BaseSimulation` state, Mock Hooks | Virtual `ai_status`, polled history | Test Pass/Fail, Coverage Metrics |
---
## 5. Identified Redundancies & Curation Targets
*List of specific areas for pruning in the next phase.*
### 5.1 Configuration & Model Redundancies
- **Duplicate Class Definitions:** `models.py` contains redundant definitions for `TextEditorConfig` and `ExternalEditorConfig`.
- **Provider Registry:** Both `gui_2.py` and `app_controller.py` maintain their own `PROVIDERS` list. This should be consolidated into `models.py` or a dedicated config module.
### 5.2 Processing Overlap
- **Context Synthesis:** `aggregate.py` has several tier-specific functions (`build_tier1_context`, `build_tier2_context`, etc.) that share significant boilerplate logic. These should be refactored into a single param-driven pipeline.
- **Simulation Setup:** `WorkflowSimulator` and `BaseSimulation` have overlapping responsibilities for project scaffolding and session resetting.
### 5.3 Style & Integrity Violations
- **Inconsistent Docstrings:** Some older modules lack the standardized "Architecture" and "Key Components" headers.
- **Type Hinting Gaps:** `shell_runner.py` and some simulation utility scripts have incomplete type hints.
- **Indentation Check:** Perform a sweep to ensure 100% compliance with the 1-space indentation rule.
+1 -1
View File
@@ -10,7 +10,7 @@ This file tracks all major tracks for the project. Each track has its own detail
### Analysis & Structural Review
1. [ ] **Track: Code Path & Data Pipeline Analysis**
1. [x] **Track: Code Path & Data Pipeline Analysis**
*Link: [./tracks/code_path_analysis_20260507/](./tracks/code_path_analysis_20260507/)*
*Goal: Comprehensive analysis of major processing routes in `./src` and `./simulation`. Identify data pipelines and responsibilities. Map core execution flows to inform curation efforts.*
@@ -1,26 +1,26 @@
# Implementation Plan: Code Path & Data Pipeline Analysis (code_path_analysis_20260507)
## Phase 1: Structural Exploration & Tooling Setup
- [ ] Task: Initialize `PIPELINE_ANALYSIS.md` template.
- [ ] Task: Deploy `codebase_investigator` subagents to identify top-level entry points in `gui_2.py` and `simulation/`.
- [ ] Task: Verify usage of existing tree-sitter tools to generate initial call-graph skeletons for `./src`.
- [ ] Task: Conductor - User Manual Verification 'Phase 1' (Protocol in workflow.md)
- [x] Task: Initialize `PIPELINE_ANALYSIS.md` template.
- [x] Task: Deploy `codebase_investigator` subagents to identify top-level entry points in `gui_2.py` and `simulation/`.
- [x] Task: Verify usage of existing tree-sitter tools to generate initial call-graph skeletons for `./src`.
- [x] Task: Conductor - User Manual Verification 'Phase 1' (Protocol in workflow.md)
## Phase 2: Mapping Core Source Pipelines (`./src`)
- [ ] Task: Map the **Context Aggregation Pipeline** (`aggregate.py`, `models.py`).
- [ ] Task: Map the **AI Interaction Loop** (`ai_client.py`, `mcp_client.py`, `shell_runner.py`).
- [ ] Task: Map the **GUI Event & State Pipeline** (`gui_2.py`, `app_controller.py`).
- [ ] Task: Document data responsibilities and state boundaries for each route.
- [ ] Task: Conductor - User Manual Verification 'Phase 2' (Protocol in workflow.md)
- [x] Task: Map the **Context Aggregation Pipeline** (`aggregate.py`, `models.py`).
- [x] Task: Map the **AI Interaction Loop** (`ai_client.py`, `mcp_client.py`, `shell_runner.py`).
- [x] Task: Map the **GUI Event & State Pipeline** (`gui_2.py`, `app_controller.py`).
- [x] Task: Document data responsibilities and state boundaries for each route.
- [x] Task: Conductor - User Manual Verification 'Phase 2' (Protocol in workflow.md)
## Phase 3: Mapping Simulation Pipelines (`./simulation`)
- [ ] Task: Map the **Simulation Lifecycle** (`sim_base.py`, `sim_context.py`, `workflow_sim.py`).
- [ ] Task: Analyze data flow between `sim_ai_settings.py` and the execution engine.
- [ ] Task: Document the "Verification & Checkpointing" route in simulations.
- [ ] Task: Conductor - User Manual Verification 'Phase 3' (Protocol in workflow.md)
- [x] Task: Map the **Simulation Lifecycle** (`sim_base.py`, `sim_context.py`, `workflow_sim.py`).
- [x] Task: Analyze data flow between `sim_ai_settings.py` and the execution engine.
- [x] Task: Document the "Verification & Checkpointing" route in simulations.
- [x] Task: Conductor - User Manual Verification 'Phase 3' (Protocol in workflow.md)
## Phase 4: Synthesis & Reporting
- [ ] Task: Consolidate all findings into Mermaid diagrams within `PIPELINE_ANALYSIS.md`.
- [ ] Task: Identify specific "Curation Targets" (redundancies, style violations) for the next track.
- [ ] Task: Final review and hand-off to Track 2 (Codebase Curation).
- [ ] Task: Conductor - User Manual Verification 'Phase 4' (Protocol in workflow.md)
- [x] Task: Consolidate all findings into Mermaid diagrams within `PIPELINE_ANALYSIS.md`.
- [x] Task: Identify specific "Curation Targets" (redundancies, style violations) for the next track.
- [x] Task: Final review and hand-off to Track 2 (Codebase Curation).
- [x] Task: Conductor - User Manual Verification 'Phase 4' (Protocol in workflow.md)