chore(conductor): Complete Code Path & Data Pipeline Analysis

2026-05-07 22:01:25 -04:00
parent aff88bd151
commit 822d803ad8
3 changed files with 191 additions and 18 deletions
@@ -0,0 +1,173 @@
+# Code Path & Data Pipeline Analysis
+
+This document tracks the analysis of major processing routes and data pipelines within the Manual Slop codebase, following a pipeline-oriented architectural model.
+
+---
+
+## Executive Summary
+This analysis maps the Manual Slop codebase as a series of data-driven pipelines. The system transitions from asynchronous background services (AI, MMA) to a synchronous frame-based GUI, and uses a Puppeteer-style simulation framework for automated verification.
+
+---
+
+## 1. Top-Level Entry Points
+
+### 1.1 GUI Entry Point (`src/gui_2.py`)
+- **Main Driver:** `main()` function initiates the `App` instance and calls `app.run()`.
+- **Primary Rendering Loop:** Powered by `immapp.run()` from `imgui-bundle`. The per-frame UI state logic resides in `App._gui_func`.
+- **Background Event Loop:** `AppController` is initialized within `App.__init__` and runs a dedicated background thread (`_process_event_queue` in `app_controller.py`) for processing AI requests and non-UI tasks.
+
+### 1.2 Simulation Entry Points (`simulation/`)
+- **Lifecycle Orchestrator:** `run_sim()` in `sim_base.py` manages the standard `setup() -> run() -> teardown()` pipeline.
+- **Base Class:** `BaseSimulation` in `sim_base.py` defines the interface for all simulation tasks.
+- **High-Level Turn Loop:** `WorkflowSimulator.run_discussion_turn()` in `workflow_sim.py` implements a polling loop that monitors `ai_status` and message history via the `ApiHookClient` to orchestrate multi-turn interactions.
+
+---
+
+## 2. Core Source Pipelines (`./src`)
+
+### 2.1 Context Aggregation Pipeline
+```mermaid
+graph TD
+    A[aggregate.run] --> B[resolve_paths]
+    B --> C[build_file_items]
+    C --> D{summary_only?}
+    D -- Yes --> E[summarize.py]
+    D -- No --> F[build_markdown]
+    E --> F
+    F --> G[Monolithic Markdown Context]
+```
+- **Entry Point:** `aggregate.run()`
+- **Route:**
+  1. **Path Resolution:** `resolve_paths()` handles globs and absolute paths from the project configuration.
+  2. **Item Construction:** `build_file_items()` reads raw content, modification times, and tier metadata.
+  3. **Summarization (Optional):** If `summary_only` is enabled, items are piped through `summarize.py` for AST-based or heuristic compression.
+  4. **Markdown Synthesis:** `build_markdown_from_items()` (or tier-specific variants) assembles the files, screenshots (`build_screenshots_section`), and discussion history (`build_discussion_section`) into the final context string.
+- **Data Responsibility:** 
+  - **Owned:** `FileItem` list, `history` list.
+  - **Mutated:** None (pure synthesis pipeline).
+  - **Terminal Output:** A monolithic Markdown string and a list of `file_items` (for provider-specific file uploads).
+
+### 2.2 AI Interaction & Tool-Call Loop
+```mermaid
+graph TD
+    A[ai_client.send] --> B[Prompt Assembly]
+    B --> C[Provider SDK Call]
+    C --> D{Tool Call?}
+    D -- Read-Only --> E[mcp_client]
+    D -- Mutating --> F[GUI Approval Modal]
+    D -- PowerShell --> G[shell_runner.run_powershell]
+    E --> H[Tool Result]
+    F -- Approved --> G
+    G --> H
+    H --> I[Append Result to History]
+    I --> C
+    D -- No --> J[Final AI Response]
+```
+- **Entry Point:** `ai_client.send()`
+- **Route:**
+  1. **Provider Selection:** Logic routes to `_send_gemini`, `_send_anthropic`, etc., based on configuration.
+  2. **Prompt Assembly:** Combines the project context (from Pipeline 2.1) with conversation history and provider-specific system instructions.
+  3. **Execution Loop:** Handles multi-turn tool calling (up to `MAX_TOOL_ROUNDS`).
+  4. **Tool Dispatch:**
+     - **Read-Only:** Calls `mcp_client` tools directly.
+     - **Mutating:** Triggers `pre_tool_callback` (GUI modal) for user approval.
+     - **PowerShell:** `_run_script()` delegates to `shell_runner.run_powershell()`.
+  5. **Response Synthesis:** Final AI text or tool results are returned to the caller.
+- **Data Responsibility:**
+  - **Owned:** Conversation history, tool schemas, API credentials.
+  - **Mutated:** Conversation history (appends turns), `cost_tracker` state.
+  - **Terminal Output:** Final AI message, generated scripts, and updated conversation state.
+
+### 2.3 GUI Event & State Synchronization
+```mermaid
+graph LR
+    subgraph Foreground [gui_2.py - ImGui Loop]
+    A[App._gui_func] --> B[_process_pending_gui_tasks]
+    B --> C[Trigger Modals / Update Panels]
+    end
+    subgraph Background [app_controller.py - Event Loop]
+    D[AppController._process_event_queue] --> E{Event Type}
+    E -- user_request --> F[Trigger AI Loop]
+    E -- response --> G[Queue gui_task]
+    G --> B
+    end
+    UI[User Input] --> D
+```
+- **Entry Points:** `gui_2.py:App._gui_func()` (Foreground), `app_controller.py:AppController._process_event_queue()` (Background).
+- **Route:**
+  1. **User Action:** UI event (e.g., clicking "Send") places a request in `AppController.event_queue`.
+  2. **Background Dispatch:** `_process_event_queue()` identifies the event type. `user_request` spawns a thread (`_handle_request_event`) to trigger Pipeline 2.2 (AI Loop).
+  3. **Task Queuing:** Background services (AI, MMA, Indexing) place `gui_task` or `mma_state_update` objects into `AppController._pending_gui_tasks`.
+  4. **Foreground Sync:** `App._gui_func()` checks for pending tasks every frame via `_process_pending_gui_tasks()`, updating the ImGui state and triggering modals.
+- **Data Responsibility:**
+  - **Owned:** ImGui window states, panel visibility, text viewer buffers.
+  - **Mutated:** `ai_status`, `mma_status`, pending tool call lists.
+  - **Terminal Output:** Updated UI visuals and user-approved actions.
+
+---
+
+## 3. Simulation Pipelines (`./simulation`)
+
+### 3.1 Simulation Lifecycle
+```mermaid
+graph TD
+    A[run_sim] --> B[BaseSimulation.setup]
+    B --> C[Scaffold Temp Project]
+    C --> D[Simulation.run]
+    D --> E[WorkflowSimulator.run_discussion_turn]
+    E --> F[wait_for_ai_response]
+    F --> G{Status == idle & Last == AI?}
+    G -- No --> F
+    G -- Yes --> H[Validation/Assertions]
+    H --> I[BaseSimulation.teardown]
+```
+- **Entry Point:** `run_sim(MySimulation)`
+- **Route:**
+  1. **Scaffolding:** `BaseSimulation.setup()` initializes the `ApiHookClient`, clears the current session, and creates a temporary test project.
+  2. **Workflow Orchestration:** `WorkflowSimulator.setup_new_project()` and `create_discussion()` configure the UI state for the test scenario.
+  3. **Interaction Loop:** `WorkflowSimulator.run_discussion_turn()` manages the multi-turn exchange.
+     - Polling: Continuously checks `ai_status` via HTTP hooks.
+     - Stall Recovery: Automatically re-triggers the Send action if the AI stops without a final response (e.g., after a tool call).
+  4. **Validation:** Subclasses perform assertions against the UI state (e.g., `assert_panel_visible()`).
+  5. **Cleanup:** `BaseSimulation.teardown()` handles resource deallocation.
+- **Data Responsibility:**
+  - **Owned:** Mock project paths, synthetic user messages.
+  - **Mutated:** Global `ai_status` (indirectly via Hooks), target file system in the test project.
+  - **Terminal Output:** Test pass/fail status, performance/coverage metrics.
+
+### 3.2 Verification & Checkpointing Protocol
+- **Turn Completion Logic:** `WorkflowSimulator.wait_for_ai_response()` implements a state machine for turn detection.
+  - **Transition-Based:** Tracks `was_busy` (status in ["thinking", "streaming", "running powershell", etc.]) and triggers completion when status returns to "idle" and the last history role is "AI".
+  - **Error Handling:** GUI-reported "error" statuses trigger an immediate abort.
+- **Stall Recovery:** Detects "stalled" turns where the last role is "Tool" but the system is "idle" (indicating a tool result was received but the AI didn't automatically continue). The simulator re-triggers the `btn_gen_send` hook to force progress.
+- **State Determinism:** Simulations force `auto_add_history=True` and reset sessions during `setup()` to ensure a clean slate for verification.
+
+---
+
+## 4. Data Responsibility & State Boundaries
+*Mapping which pipelines own and mutate specific data structures.*
+
+| Pipeline | Primary Data Owned | Mutated State | Terminal Output |
+| :--- | :--- | :--- | :--- |
+| **2.1 Context Aggregation** | `FileItem` list, `history` list | None (Pure Synthesis) | Markdown Context String |
+| **2.2 AI Interaction** | AI History, Tool Schemas | `history` (Turns), `cost_tracker` | AI Response, Tool Calls |
+| **2.3 GUI & Sync** | ImGui State, Controller Config | `ai_status`, `pending_tasks` | Visual Feedback, Log Entries |
+| **Simulation (3.1)** | `BaseSimulation` state, Mock Hooks | Virtual `ai_status`, polled history | Test Pass/Fail, Coverage Metrics |
+
+---
+
+## 5. Identified Redundancies & Curation Targets
+*List of specific areas for pruning in the next phase.*
+
+### 5.1 Configuration & Model Redundancies
+- **Duplicate Class Definitions:** `models.py` contains redundant definitions for `TextEditorConfig` and `ExternalEditorConfig`.
+- **Provider Registry:** Both `gui_2.py` and `app_controller.py` maintain their own `PROVIDERS` list. This should be consolidated into `models.py` or a dedicated config module.
+
+### 5.2 Processing Overlap
+- **Context Synthesis:** `aggregate.py` has several tier-specific functions (`build_tier1_context`, `build_tier2_context`, etc.) that share significant boilerplate logic. These should be refactored into a single param-driven pipeline.
+- **Simulation Setup:** `WorkflowSimulator` and `BaseSimulation` have overlapping responsibilities for project scaffolding and session resetting.
+
+### 5.3 Style & Integrity Violations
+- **Inconsistent Docstrings:** Some older modules lack the standardized "Architecture" and "Key Components" headers.
+- **Type Hinting Gaps:** `shell_runner.py` and some simulation utility scripts have incomplete type hints.
+- **Indentation Check:** Perform a sweep to ensure 100% compliance with the 1-space indentation rule.
@@ -10,7 +10,7 @@ This file tracks all major tracks for the project. Each track has its own detail

 ### Analysis & Structural Review

-1. [ ] **Track: Code Path & Data Pipeline Analysis**
+1. [x] **Track: Code Path & Data Pipeline Analysis**
   *Link: [./tracks/code_path_analysis_20260507/](./tracks/code_path_analysis_20260507/)*
   *Goal: Comprehensive analysis of major processing routes in `./src` and `./simulation`. Identify data pipelines and responsibilities. Map core execution flows to inform curation efforts.*

@@ -1,26 +1,26 @@
 # Implementation Plan: Code Path & Data Pipeline Analysis (code_path_analysis_20260507)

 ## Phase 1: Structural Exploration & Tooling Setup
- [ ] Task: Initialize `PIPELINE_ANALYSIS.md` template.
- [ ] Task: Deploy `codebase_investigator` subagents to identify top-level entry points in `gui_2.py` and `simulation/`.
- [ ] Task: Verify usage of existing tree-sitter tools to generate initial call-graph skeletons for `./src`.
- [ ] Task: Conductor - User Manual Verification 'Phase 1' (Protocol in workflow.md)
+- [x] Task: Initialize `PIPELINE_ANALYSIS.md` template.
+- [x] Task: Deploy `codebase_investigator` subagents to identify top-level entry points in `gui_2.py` and `simulation/`.
+- [x] Task: Verify usage of existing tree-sitter tools to generate initial call-graph skeletons for `./src`.
+- [x] Task: Conductor - User Manual Verification 'Phase 1' (Protocol in workflow.md)

 ## Phase 2: Mapping Core Source Pipelines (`./src`)
- [ ] Task: Map the **Context Aggregation Pipeline** (`aggregate.py`, `models.py`).
- [ ] Task: Map the **AI Interaction Loop** (`ai_client.py`, `mcp_client.py`, `shell_runner.py`).
- [ ] Task: Map the **GUI Event & State Pipeline** (`gui_2.py`, `app_controller.py`).
- [ ] Task: Document data responsibilities and state boundaries for each route.
- [ ] Task: Conductor - User Manual Verification 'Phase 2' (Protocol in workflow.md)
+- [x] Task: Map the **Context Aggregation Pipeline** (`aggregate.py`, `models.py`).
+- [x] Task: Map the **AI Interaction Loop** (`ai_client.py`, `mcp_client.py`, `shell_runner.py`).
+- [x] Task: Map the **GUI Event & State Pipeline** (`gui_2.py`, `app_controller.py`).
+- [x] Task: Document data responsibilities and state boundaries for each route.
+- [x] Task: Conductor - User Manual Verification 'Phase 2' (Protocol in workflow.md)

 ## Phase 3: Mapping Simulation Pipelines (`./simulation`)
- [ ] Task: Map the **Simulation Lifecycle** (`sim_base.py`, `sim_context.py`, `workflow_sim.py`).
- [ ] Task: Analyze data flow between `sim_ai_settings.py` and the execution engine.
- [ ] Task: Document the "Verification & Checkpointing" route in simulations.
- [ ] Task: Conductor - User Manual Verification 'Phase 3' (Protocol in workflow.md)
+- [x] Task: Map the **Simulation Lifecycle** (`sim_base.py`, `sim_context.py`, `workflow_sim.py`).
+- [x] Task: Analyze data flow between `sim_ai_settings.py` and the execution engine.
+- [x] Task: Document the "Verification & Checkpointing" route in simulations.
+- [x] Task: Conductor - User Manual Verification 'Phase 3' (Protocol in workflow.md)

 ## Phase 4: Synthesis & Reporting
- [ ] Task: Consolidate all findings into Mermaid diagrams within `PIPELINE_ANALYSIS.md`.
- [ ] Task: Identify specific "Curation Targets" (redundancies, style violations) for the next track.
- [ ] Task: Final review and hand-off to Track 2 (Codebase Curation).
- [ ] Task: Conductor - User Manual Verification 'Phase 4' (Protocol in workflow.md)
+- [x] Task: Consolidate all findings into Mermaid diagrams within `PIPELINE_ANALYSIS.md`.
+- [x] Task: Identify specific "Curation Targets" (redundancies, style violations) for the next track.
+- [x] Task: Final review and hand-off to Track 2 (Codebase Curation).
+- [x] Task: Conductor - User Manual Verification 'Phase 4' (Protocol in workflow.md)