chore(conductor): Complete Code Path & Data Pipeline Analysis
This commit is contained in:
@@ -0,0 +1,173 @@
|
||||
# Code Path & Data Pipeline Analysis
|
||||
|
||||
This document tracks the analysis of major processing routes and data pipelines within the Manual Slop codebase, following a pipeline-oriented architectural model.
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
This analysis maps the Manual Slop codebase as a series of data-driven pipelines. The system transitions from asynchronous background services (AI, MMA) to a synchronous frame-based GUI, and uses a Puppeteer-style simulation framework for automated verification.
|
||||
|
||||
---
|
||||
|
||||
## 1. Top-Level Entry Points
|
||||
|
||||
### 1.1 GUI Entry Point (`src/gui_2.py`)
|
||||
- **Main Driver:** `main()` function initiates the `App` instance and calls `app.run()`.
|
||||
- **Primary Rendering Loop:** Powered by `immapp.run()` from `imgui-bundle`. The per-frame UI state logic resides in `App._gui_func`.
|
||||
- **Background Event Loop:** `AppController` is initialized within `App.__init__` and runs a dedicated background thread (`_process_event_queue` in `app_controller.py`) for processing AI requests and non-UI tasks.
|
||||
|
||||
### 1.2 Simulation Entry Points (`simulation/`)
|
||||
- **Lifecycle Orchestrator:** `run_sim()` in `sim_base.py` manages the standard `setup() -> run() -> teardown()` pipeline.
|
||||
- **Base Class:** `BaseSimulation` in `sim_base.py` defines the interface for all simulation tasks.
|
||||
- **High-Level Turn Loop:** `WorkflowSimulator.run_discussion_turn()` in `workflow_sim.py` implements a polling loop that monitors `ai_status` and message history via the `ApiHookClient` to orchestrate multi-turn interactions.
|
||||
|
||||
---
|
||||
|
||||
## 2. Core Source Pipelines (`./src`)
|
||||
|
||||
### 2.1 Context Aggregation Pipeline
|
||||
```mermaid
|
||||
graph TD
|
||||
A[aggregate.run] --> B[resolve_paths]
|
||||
B --> C[build_file_items]
|
||||
C --> D{summary_only?}
|
||||
D -- Yes --> E[summarize.py]
|
||||
D -- No --> F[build_markdown]
|
||||
E --> F
|
||||
F --> G[Monolithic Markdown Context]
|
||||
```
|
||||
- **Entry Point:** `aggregate.run()`
|
||||
- **Route:**
|
||||
1. **Path Resolution:** `resolve_paths()` handles globs and absolute paths from the project configuration.
|
||||
2. **Item Construction:** `build_file_items()` reads raw content, modification times, and tier metadata.
|
||||
3. **Summarization (Optional):** If `summary_only` is enabled, items are piped through `summarize.py` for AST-based or heuristic compression.
|
||||
4. **Markdown Synthesis:** `build_markdown_from_items()` (or tier-specific variants) assembles the files, screenshots (`build_screenshots_section`), and discussion history (`build_discussion_section`) into the final context string.
|
||||
- **Data Responsibility:**
|
||||
- **Owned:** `FileItem` list, `history` list.
|
||||
- **Mutated:** None (pure synthesis pipeline).
|
||||
- **Terminal Output:** A monolithic Markdown string and a list of `file_items` (for provider-specific file uploads).
|
||||
|
||||
### 2.2 AI Interaction & Tool-Call Loop
|
||||
```mermaid
|
||||
graph TD
|
||||
A[ai_client.send] --> B[Prompt Assembly]
|
||||
B --> C[Provider SDK Call]
|
||||
C --> D{Tool Call?}
|
||||
D -- Read-Only --> E[mcp_client]
|
||||
D -- Mutating --> F[GUI Approval Modal]
|
||||
D -- PowerShell --> G[shell_runner.run_powershell]
|
||||
E --> H[Tool Result]
|
||||
F -- Approved --> G
|
||||
G --> H
|
||||
H --> I[Append Result to History]
|
||||
I --> C
|
||||
D -- No --> J[Final AI Response]
|
||||
```
|
||||
- **Entry Point:** `ai_client.send()`
|
||||
- **Route:**
|
||||
1. **Provider Selection:** Logic routes to `_send_gemini`, `_send_anthropic`, etc., based on configuration.
|
||||
2. **Prompt Assembly:** Combines the project context (from Pipeline 2.1) with conversation history and provider-specific system instructions.
|
||||
3. **Execution Loop:** Handles multi-turn tool calling (up to `MAX_TOOL_ROUNDS`).
|
||||
4. **Tool Dispatch:**
|
||||
- **Read-Only:** Calls `mcp_client` tools directly.
|
||||
- **Mutating:** Triggers `pre_tool_callback` (GUI modal) for user approval.
|
||||
- **PowerShell:** `_run_script()` delegates to `shell_runner.run_powershell()`.
|
||||
5. **Response Synthesis:** Final AI text or tool results are returned to the caller.
|
||||
- **Data Responsibility:**
|
||||
- **Owned:** Conversation history, tool schemas, API credentials.
|
||||
- **Mutated:** Conversation history (appends turns), `cost_tracker` state.
|
||||
- **Terminal Output:** Final AI message, generated scripts, and updated conversation state.
|
||||
|
||||
### 2.3 GUI Event & State Synchronization
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph Foreground [gui_2.py - ImGui Loop]
|
||||
A[App._gui_func] --> B[_process_pending_gui_tasks]
|
||||
B --> C[Trigger Modals / Update Panels]
|
||||
end
|
||||
subgraph Background [app_controller.py - Event Loop]
|
||||
D[AppController._process_event_queue] --> E{Event Type}
|
||||
E -- user_request --> F[Trigger AI Loop]
|
||||
E -- response --> G[Queue gui_task]
|
||||
G --> B
|
||||
end
|
||||
UI[User Input] --> D
|
||||
```
|
||||
- **Entry Points:** `gui_2.py:App._gui_func()` (Foreground), `app_controller.py:AppController._process_event_queue()` (Background).
|
||||
- **Route:**
|
||||
1. **User Action:** UI event (e.g., clicking "Send") places a request in `AppController.event_queue`.
|
||||
2. **Background Dispatch:** `_process_event_queue()` identifies the event type. `user_request` spawns a thread (`_handle_request_event`) to trigger Pipeline 2.2 (AI Loop).
|
||||
3. **Task Queuing:** Background services (AI, MMA, Indexing) place `gui_task` or `mma_state_update` objects into `AppController._pending_gui_tasks`.
|
||||
4. **Foreground Sync:** `App._gui_func()` checks for pending tasks every frame via `_process_pending_gui_tasks()`, updating the ImGui state and triggering modals.
|
||||
- **Data Responsibility:**
|
||||
- **Owned:** ImGui window states, panel visibility, text viewer buffers.
|
||||
- **Mutated:** `ai_status`, `mma_status`, pending tool call lists.
|
||||
- **Terminal Output:** Updated UI visuals and user-approved actions.
|
||||
|
||||
---
|
||||
|
||||
## 3. Simulation Pipelines (`./simulation`)
|
||||
|
||||
### 3.1 Simulation Lifecycle
|
||||
```mermaid
|
||||
graph TD
|
||||
A[run_sim] --> B[BaseSimulation.setup]
|
||||
B --> C[Scaffold Temp Project]
|
||||
C --> D[Simulation.run]
|
||||
D --> E[WorkflowSimulator.run_discussion_turn]
|
||||
E --> F[wait_for_ai_response]
|
||||
F --> G{Status == idle & Last == AI?}
|
||||
G -- No --> F
|
||||
G -- Yes --> H[Validation/Assertions]
|
||||
H --> I[BaseSimulation.teardown]
|
||||
```
|
||||
- **Entry Point:** `run_sim(MySimulation)`
|
||||
- **Route:**
|
||||
1. **Scaffolding:** `BaseSimulation.setup()` initializes the `ApiHookClient`, clears the current session, and creates a temporary test project.
|
||||
2. **Workflow Orchestration:** `WorkflowSimulator.setup_new_project()` and `create_discussion()` configure the UI state for the test scenario.
|
||||
3. **Interaction Loop:** `WorkflowSimulator.run_discussion_turn()` manages the multi-turn exchange.
|
||||
- Polling: Continuously checks `ai_status` via HTTP hooks.
|
||||
- Stall Recovery: Automatically re-triggers the Send action if the AI stops without a final response (e.g., after a tool call).
|
||||
4. **Validation:** Subclasses perform assertions against the UI state (e.g., `assert_panel_visible()`).
|
||||
5. **Cleanup:** `BaseSimulation.teardown()` handles resource deallocation.
|
||||
- **Data Responsibility:**
|
||||
- **Owned:** Mock project paths, synthetic user messages.
|
||||
- **Mutated:** Global `ai_status` (indirectly via Hooks), target file system in the test project.
|
||||
- **Terminal Output:** Test pass/fail status, performance/coverage metrics.
|
||||
|
||||
### 3.2 Verification & Checkpointing Protocol
|
||||
- **Turn Completion Logic:** `WorkflowSimulator.wait_for_ai_response()` implements a state machine for turn detection.
|
||||
- **Transition-Based:** Tracks `was_busy` (status in ["thinking", "streaming", "running powershell", etc.]) and triggers completion when status returns to "idle" and the last history role is "AI".
|
||||
- **Error Handling:** GUI-reported "error" statuses trigger an immediate abort.
|
||||
- **Stall Recovery:** Detects "stalled" turns where the last role is "Tool" but the system is "idle" (indicating a tool result was received but the AI didn't automatically continue). The simulator re-triggers the `btn_gen_send` hook to force progress.
|
||||
- **State Determinism:** Simulations force `auto_add_history=True` and reset sessions during `setup()` to ensure a clean slate for verification.
|
||||
|
||||
---
|
||||
|
||||
## 4. Data Responsibility & State Boundaries
|
||||
*Mapping which pipelines own and mutate specific data structures.*
|
||||
|
||||
| Pipeline | Primary Data Owned | Mutated State | Terminal Output |
|
||||
| :--- | :--- | :--- | :--- |
|
||||
| **2.1 Context Aggregation** | `FileItem` list, `history` list | None (Pure Synthesis) | Markdown Context String |
|
||||
| **2.2 AI Interaction** | AI History, Tool Schemas | `history` (Turns), `cost_tracker` | AI Response, Tool Calls |
|
||||
| **2.3 GUI & Sync** | ImGui State, Controller Config | `ai_status`, `pending_tasks` | Visual Feedback, Log Entries |
|
||||
| **Simulation (3.1)** | `BaseSimulation` state, Mock Hooks | Virtual `ai_status`, polled history | Test Pass/Fail, Coverage Metrics |
|
||||
|
||||
---
|
||||
|
||||
## 5. Identified Redundancies & Curation Targets
|
||||
*List of specific areas for pruning in the next phase.*
|
||||
|
||||
### 5.1 Configuration & Model Redundancies
|
||||
- **Duplicate Class Definitions:** `models.py` contains redundant definitions for `TextEditorConfig` and `ExternalEditorConfig`.
|
||||
- **Provider Registry:** Both `gui_2.py` and `app_controller.py` maintain their own `PROVIDERS` list. This should be consolidated into `models.py` or a dedicated config module.
|
||||
|
||||
### 5.2 Processing Overlap
|
||||
- **Context Synthesis:** `aggregate.py` has several tier-specific functions (`build_tier1_context`, `build_tier2_context`, etc.) that share significant boilerplate logic. These should be refactored into a single param-driven pipeline.
|
||||
- **Simulation Setup:** `WorkflowSimulator` and `BaseSimulation` have overlapping responsibilities for project scaffolding and session resetting.
|
||||
|
||||
### 5.3 Style & Integrity Violations
|
||||
- **Inconsistent Docstrings:** Some older modules lack the standardized "Architecture" and "Key Components" headers.
|
||||
- **Type Hinting Gaps:** `shell_runner.py` and some simulation utility scripts have incomplete type hints.
|
||||
- **Indentation Check:** Perform a sweep to ensure 100% compliance with the 1-space indentation rule.
|
||||
+1
-1
@@ -10,7 +10,7 @@ This file tracks all major tracks for the project. Each track has its own detail
|
||||
|
||||
### Analysis & Structural Review
|
||||
|
||||
1. [ ] **Track: Code Path & Data Pipeline Analysis**
|
||||
1. [x] **Track: Code Path & Data Pipeline Analysis**
|
||||
*Link: [./tracks/code_path_analysis_20260507/](./tracks/code_path_analysis_20260507/)*
|
||||
*Goal: Comprehensive analysis of major processing routes in `./src` and `./simulation`. Identify data pipelines and responsibilities. Map core execution flows to inform curation efforts.*
|
||||
|
||||
|
||||
@@ -1,26 +1,26 @@
|
||||
# Implementation Plan: Code Path & Data Pipeline Analysis (code_path_analysis_20260507)
|
||||
|
||||
## Phase 1: Structural Exploration & Tooling Setup
|
||||
- [ ] Task: Initialize `PIPELINE_ANALYSIS.md` template.
|
||||
- [ ] Task: Deploy `codebase_investigator` subagents to identify top-level entry points in `gui_2.py` and `simulation/`.
|
||||
- [ ] Task: Verify usage of existing tree-sitter tools to generate initial call-graph skeletons for `./src`.
|
||||
- [ ] Task: Conductor - User Manual Verification 'Phase 1' (Protocol in workflow.md)
|
||||
- [x] Task: Initialize `PIPELINE_ANALYSIS.md` template.
|
||||
- [x] Task: Deploy `codebase_investigator` subagents to identify top-level entry points in `gui_2.py` and `simulation/`.
|
||||
- [x] Task: Verify usage of existing tree-sitter tools to generate initial call-graph skeletons for `./src`.
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 1' (Protocol in workflow.md)
|
||||
|
||||
## Phase 2: Mapping Core Source Pipelines (`./src`)
|
||||
- [ ] Task: Map the **Context Aggregation Pipeline** (`aggregate.py`, `models.py`).
|
||||
- [ ] Task: Map the **AI Interaction Loop** (`ai_client.py`, `mcp_client.py`, `shell_runner.py`).
|
||||
- [ ] Task: Map the **GUI Event & State Pipeline** (`gui_2.py`, `app_controller.py`).
|
||||
- [ ] Task: Document data responsibilities and state boundaries for each route.
|
||||
- [ ] Task: Conductor - User Manual Verification 'Phase 2' (Protocol in workflow.md)
|
||||
- [x] Task: Map the **Context Aggregation Pipeline** (`aggregate.py`, `models.py`).
|
||||
- [x] Task: Map the **AI Interaction Loop** (`ai_client.py`, `mcp_client.py`, `shell_runner.py`).
|
||||
- [x] Task: Map the **GUI Event & State Pipeline** (`gui_2.py`, `app_controller.py`).
|
||||
- [x] Task: Document data responsibilities and state boundaries for each route.
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 2' (Protocol in workflow.md)
|
||||
|
||||
## Phase 3: Mapping Simulation Pipelines (`./simulation`)
|
||||
- [ ] Task: Map the **Simulation Lifecycle** (`sim_base.py`, `sim_context.py`, `workflow_sim.py`).
|
||||
- [ ] Task: Analyze data flow between `sim_ai_settings.py` and the execution engine.
|
||||
- [ ] Task: Document the "Verification & Checkpointing" route in simulations.
|
||||
- [ ] Task: Conductor - User Manual Verification 'Phase 3' (Protocol in workflow.md)
|
||||
- [x] Task: Map the **Simulation Lifecycle** (`sim_base.py`, `sim_context.py`, `workflow_sim.py`).
|
||||
- [x] Task: Analyze data flow between `sim_ai_settings.py` and the execution engine.
|
||||
- [x] Task: Document the "Verification & Checkpointing" route in simulations.
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 3' (Protocol in workflow.md)
|
||||
|
||||
## Phase 4: Synthesis & Reporting
|
||||
- [ ] Task: Consolidate all findings into Mermaid diagrams within `PIPELINE_ANALYSIS.md`.
|
||||
- [ ] Task: Identify specific "Curation Targets" (redundancies, style violations) for the next track.
|
||||
- [ ] Task: Final review and hand-off to Track 2 (Codebase Curation).
|
||||
- [ ] Task: Conductor - User Manual Verification 'Phase 4' (Protocol in workflow.md)
|
||||
- [x] Task: Consolidate all findings into Mermaid diagrams within `PIPELINE_ANALYSIS.md`.
|
||||
- [x] Task: Identify specific "Curation Targets" (redundancies, style violations) for the next track.
|
||||
- [x] Task: Final review and hand-off to Track 2 (Codebase Curation).
|
||||
- [x] Task: Conductor - User Manual Verification 'Phase 4' (Protocol in workflow.md)
|
||||
|
||||
Reference in New Issue
Block a user