diff --git a/docs/reports/FOLLOWUP_module_taxonomy_20260627.md b/docs/reports/FOLLOWUP_module_taxonomy_20260627.md new file mode 100644 index 00000000..3d1ecaba --- /dev/null +++ b/docs/reports/FOLLOWUP_module_taxonomy_20260627.md @@ -0,0 +1,252 @@ +# Module Taxonomy Audit + Refactor Plan + +**Date:** 2026-06-27 +**Reviewer:** Tier 1 +**Trigger:** User directive: "if anything I want more unification. I only want splitifcation if there is a good reason such as import load times. If there isn't an import issue or definition pollution issue just keep it in the same file." + +--- + +## Decision rule (the user's principle) + +**Split a file only if ONE of:** +- Import load time: the file has heavy imports (vendored SDKs, ML models) that some code paths don't need +- Definition pollution: the file mixes 3+ unrelated domains with 30+ classes/functions + +**Otherwise:** keep in a single file. Move imports around, but don't fragment. + +**No sub-directories.** All files at `src/` flat with prefix naming. + +--- + +## TL;DR + +Only TWO clear refactors are justified: + +1. **MERGE 5 ImGui LEAKS into `gui_2.py`** (clear violation of the GUI boundary) +2. **SPLIT `models.py` into `mma.py` + `project.py` + `project_files.py`** (clear definition pollution; 36 classes, 5+ unrelated domains, 1044 lines) +3. **MERGE 2 vendor files into `ai_client.py`** (per user's explicit directive) + +Everything else: KEEP AS-IS. No unnecessary fragmentation. + +--- + +## Full audit: 65 files in `src/` + +### Group A: MERGE (5 ImGui LEAKS into `gui_2.py`) + +User directive: "all ImGui rendering should be in `gui_2.py`. Only exception: `imgui_scopes.py`" + +| File | Lines | LEAK content | Destination | +|---|---:|---|---| +| `src/bg_shader.py` | 66 | ImGui background shader code | → `gui_2.py` | +| `src/shaders.py` | 33 | ImGui shader code | → `gui_2.py` | +| `src/command_palette.py` | 165 | ImGui command palette UI | → `gui_2.py` | +| `src/diff_viewer.py` | 164 | ImGui diff viewer UI | → `gui_2.py` | +| `src/patch_modal.py` | 102 | ImGui patch modal UI | → `gui_2.py` | + +**Verification:** `git grep -l "imgui\\." -- 'src/*.py'` should return ONLY `gui_2.py` + `imgui_scopes.py`. + +### Group B: MERGE (2 vendor files into `ai_client.py`) + +User directive: "vendor_capabilities.py and vendor_state.py are related to ai_client.py... they're the ai vendoring layer." + +| File | Lines | Destination | +|---|---:|---| +| `src/vendor_capabilities.py` | 85 | → `ai_client.py` (add as section "Vendor Capabilities") | +| `src/vendor_state.py` | 78 | → `ai_client.py` (add as section "Vendor State") | + +ai_client.py grows from 3147 → ~3310 lines. Justified: these ARE the vendor layer per user; keeping them split is fragmenting a single domain. + +### Group C: SPLIT (`models.py` is the only clear definition pollution) + +`models.py` = 1044 lines, 36 classes, 5+ unrelated domains. Justified split. + +**The new taxonomy:** + +| New file | What it gets | Lines (est.) | +|---|---|---:| +| **`src/mma.py`** | MMA Core + TrackState: ThinkingSegment, Ticket, Track, WorkerContext, TrackState | ~250 | +| **`src/project.py`** | ProjectContext + 5 sub-dataclasses + config I/O (`_clean_nones`, `load_config_from_disk`, `save_config_to_disk`, `parse_history_entries`) | ~200 | +| **`src/project_files.py`** | FileItem, ContextPreset, ContextFileEntry, NamedViewPreset, Preset | ~150 | + +**Classes that merge into EXISTING sub-system files (not new files):** + +| Class from `models.py` | Destination (existing file) | +|---|---| +| `Persona` | `src/personas.py` (93 lines, exists) | +| `Tool`, `ToolPreset` | `src/tool_presets.py` (123 lines, exists) | +| `BiasProfile` | `src/tool_bias.py` (63 lines, exists) | +| `TextEditorConfig`, `ExternalEditorConfig` | `src/external_editor.py` (129 lines, exists) | +| `MCPServerConfig`, `MCPConfiguration`, `VectorStoreConfig`, `RAGConfig`, `load_mcp_config` | `src/mcp_client.py` (1803 lines, exists) | +| `WorkspaceProfile` | `src/workspace_manager.py` (73 lines, exists) | + +**`src/models.py` reduced to:** +- `_create_generate_request`, `_create_confirm_request`, `__getattr__` (Pydantic lazy proxies for the API; could also move to `api_hooks.py` if they're truly API-specific) +- Top-level docstring updated to reflect the new scope + +**`AGENT_TOOL_NAMES` is REDUNDANT — DELETE it (not just move).** It's a hardcoded snapshot of `mcp_tool_specs.tool_names()`. The existing test `test_tool_names_subset_of_models_agent_tool_names` literally asserts `tool_names() ⊆ AGENT_TOOL_NAMES`. Derive the list at consumer sites: `list(mcp_tool_specs.tool_names())`. Update 8 consumer sites (3 in `app_controller.py` + 5 in `tests/test_arch_boundary_phase2.py`). The cross-check test becomes either redundant or converts to a positive assertion that the set is derived correctly. + +Estimated: ~30 lines (down from 1044, down from 60 if you keep the redundant constant). + +### Group D: KEEP AS-IS (the rest) + +All remaining files have clear single responsibilities. No reason to split: + +| Category | Files | Total lines | +|---|---|---:| +| **Core types** | `paths.py`, `result_types.py`, `type_aliases.py` | 523 | +| **AI vendor** (unified) | `ai_client.py` (with vendor_*.py merged) | 3310 | +| **MMA** (mostly) | `multi_agent_conductor.py`, `dag_engine.py`, `conductor_tech_lead.py`, `orchestrator_pm.py`, `mma_prompts.py`, `events.py` | 1369 | +| **MCP** | `mcp_client.py` (with config merged), `mcp_tool_specs.py`, `beads_client.py` | 1978 | +| **Project** (unified) | `project_manager.py` (main), `presets.py`, `context_presets.py`, `project.py` (NEW), `project_files.py` (NEW) | ~900 | +| **GUI** (unified) | `gui_2.py` (with ImGui LEAKS merged), `imgui_scopes.py` (EXCEPTION per user) | ~8300 | +| **Theme** | `theme_2.py`, `theme_models.py`, `theme_nerv_fx.py`, `theme_nerv.py` | 728 | +| **Tool/persona/editor/mcp config** (merged) | `tool_presets.py`, `tool_bias.py`, `personas.py`, `external_editor.py`, `workspace_manager.py` | ~500 | +| **API hook** | `api_hooks.py`, `api_hook_client.py`, `api_hooks_helpers.py` | 1480 | +| **Infra** | `log_registry.py`, `log_pruner.py`, `session_logger.py`, `history.py`, `warmup.py`, `startup_profiler.py`, `performance_monitor.py`, `io_pool.py`, `module_loader.py`, `shell_runner.py`, `hot_reloader.py`, `summary_cache.py`, `summarize.py`, `synthesis_formatter.py`, `fuzzy_anchor.py`, `outline_tool.py`, `file_cache.py`, `aggregate.py` | ~3700 | + +--- + +## Why this taxonomy (per the user's principle) + +### MERGE actions (3 files moved, 5 deleted): + +| Action | Files deleted | Justification | +|---|---|---| +| ImGui LEAKS → `gui_2.py` | 5 deleted | Clear violation of GUI boundary (user directive) | +| Vendor files → `ai_client.py` | 2 deleted | User explicit directive; unified vendor layer | + +### SPLIT actions (1 file split into 3): + +| Action | New files | Justification | +|---|---|---| +| `models.py` split | `mma.py` + `project.py` + `project_files.py` | Definition pollution (5+ domains, 36 classes, 1044 lines) | +| Other models.py classes merged into existing files | (none new) | Persona/Tool/Editor/MCP/Workspace already have their own files; just merge in the dataclass | + +### KEEP actions (52 files unchanged): + +No reason to split. They're either: +- Single-domain files (e.g., `log_registry.py` is just session log registration) +- Already have natural boundaries (e.g., `theme_*.py` files are theme-specific, not polluted) +- Don't have import load time issues (e.g., `multi_agent_conductor.py` is MMA-specific but doesn't pull in heavy SDKs at import time) + +--- + +## Refactor Plan (5 phases, atomic commits per group) + +### Phase 1: Move ImGui LEAKS into `gui_2.py` (5 commits) + +For each of `bg_shader.py`, `shaders.py`, `command_palette.py`, `diff_viewer.py`, `patch_modal.py`: +1. Read source file +2. Add content to `gui_2.py` (in a clearly-marked section) +3. Update imports across the codebase (replace `from src.bg_shader import X` with `from src.gui_2 import X`) +4. Delete the original file via `git rm` +5. Verify all affected tests pass + +### Phase 2: Merge vendor files into `ai_client.py` (2 commits) + +For each of `vendor_capabilities.py`, `vendor_state.py`: +1. Read source file +2. Add content to `ai_client.py` (in a clearly-marked section "Vendor Capabilities" / "Vendor State") +3. Update imports across the codebase +4. Delete the original file via `git rm` +5. Verify all affected tests pass + +### Phase 3: Split `models.py` into `mma.py` + `project.py` + `project_files.py` (3 commits + 6 merges) + +1. Create `src/mma.py` with MMA Core + TrackState (from models.py) +2. Create `src/project.py` with ProjectContext + sub + config I/O (from models.py) +3. Create `src/project_files.py` with file-related dataclasses (from models.py) +4. Merge `Persona` into `personas.py` +5. Merge `Tool`, `ToolPreset` into `tool_presets.py` +6. Merge `BiasProfile` into `tool_bias.py` +7. Merge `TextEditorConfig`, `ExternalEditorConfig` into `external_editor.py` +8. Merge MCP config dataclasses into `mcp_client.py` +9. Merge `WorkspaceProfile` into `workspace_manager.py` +10. Reduce `models.py` to ~60 lines (Pydantic proxies + AGENT_TOOL_NAMES only) +11. Update all 136 import sites for the moved classes + +### Phase 4: Verify all 7 audit gates pass `--strict` (1 commit, no code changes) + +### Phase 5: End-of-track (2 commits: report + state) + +--- + +## Risks + +| # | Risk | Likelihood | Mitigation | +|---|---|---|---| +| R1 | ImGui LEAKS move breaks existing tests | low | Run full affected test set after each move; revert + fix on regression | +| R2 | Vendor merge into `ai_client.py` creates circular imports | medium | Vendor code uses vendor client holder from `ai_client.py`; both should already be in the same module hierarchy; if circular, the `vendor_capabilities.py` lazy import pattern (PROVIDERS) is the workaround | +| R3 | `models.py` split breaks 136 import sites | high | The split is mechanical but invasive; per-file move with regression-guard tests after each | +| R4 | The `ProviderPayload` / `UIPanelConfig` / `PathInfo` classes from `metadata_promotion_20260624` are in `models.py` per that track | high | These were added AFTER my taxonomy audit. Need to also move them to the right home (probably `project.py` or split into separate files) | + +--- + +## Acceptance Criteria (10 VCs) + +| # | Criterion | Verification | +|---|---|---| +| VC1 | ImGui imports limited to `gui_2.py` + `imgui_scopes.py` | `git grep -l "imgui_bundle\|from imgui\." HEAD -- 'src/*.py'` returns 2 files | +| VC2 | `src/bg_shader.py`, `src/shaders.py`, `src/command_palette.py`, `src/diff_viewer.py`, `src/patch_modal.py` deleted | `ls src/{bg_shader,shaders,command_palette,diff_viewer,patch_modal}.py` returns not-found | +| VC3 | `src/vendor_capabilities.py`, `src/vendor_state.py` deleted | `ls src/{vendor_capabilities,vendor_state}.py` returns not-found | +| VC4 | Vendor symbols importable from `src.ai_client` | `python -c "from src.ai_client import PROVIDER_CAPABILITIES, get_vendor_state"` | +| VC5 | `src/mma.py` exists with MMA Core + TrackState | `python -c "from src.mma import ThinkingSegment, Ticket, Track, WorkerContext, TrackState"` | +| VC6 | `src/project.py` exists with ProjectContext + sub + config I/O | `python -c "from src.project import ProjectContext, ProjectMeta, ProjectOutput, ProjectFiles, ProjectScreenshots, ProjectDiscussion, _clean_nones, load_config_from_disk, save_config_to_disk, parse_history_entries"` | +| VC7 | `src/project_files.py` exists with file-related dataclasses | `python -c "from src.project_files import FileItem, ContextPreset, ContextFileEntry, NamedViewPreset, Preset"` | +| VC8 | Persona/Tool/Editor/MCP/Workspace dataclasses in their proper sub-system files | `python -c "from src.personas import Persona; from src.tool_presets import Tool, ToolPreset; from src.tool_bias import BiasProfile; from src.external_editor import TextEditorConfig, ExternalEditorConfig; from src.mcp_client import MCPServerConfig, MCPConfiguration, VectorStoreConfig, RAGConfig, load_mcp_config; from src.workspace_manager import WorkspaceProfile"` | +| VC9 | `src/models.py` reduced to <100 lines (only Pydantic proxies + AGENT_TOOL_NAMES) | `wc -l src/models.py` returns < 100 | +| VC10 | All 7 audit gates pass `--strict` | same as current baseline | + +--- + +## Scope summary + +| Operation | Files affected | Net change | +|---|---|---| +| DELETE | 7 (5 ImGui + 2 vendor) | -7 files | +| CREATE | 3 (mma.py, project.py, project_files.py) | +3 files | +| MODIFY | 7 (ai_client.py, gui_2.py, personas.py, tool_presets.py, tool_bias.py, external_editor.py, mcp_client.py, workspace_manager.py) + reduce models.py | 8 files modified | +| TOTAL | 17 file changes; net -4 files | -4 files | + +Before: 65 files in `src/` +After: 61 files in `src/` (with cleaner taxonomy) + +--- + +## Open question: rename existing files for prefix consistency? + +The user said "top-level prefix for modules that cannot have their definitions in the single file". Renames are NOT required (the user wants minimal splitting). But for naming consistency, some renames MIGHT be considered: + +| Current name | Suggested rename | Reason | +|---|---|---| +| `mma_prompts.py` | (keep) | Already prefixed | +| `multi_agent_conductor.py` | `mma_conductor.py` | For consistency with `mma_prompts.py` | +| `dag_engine.py` | `mma_dag.py` | Same | +| `conductor_tech_lead.py` | `mma_tech_lead.py` | Same | +| `orchestrator_pm.py` | `mma_pm.py` | Same | +| `events.py` | (keep) | Generic, not MMA-specific | +| `gemini_cli_adapter.py` | (keep) | Per user: don't split ai_client.py; the adapter is its own concern | +| `qwen_adapter.py` | (keep) | Same | +| `mcp_tool_specs.py` | (keep) | Already prefixed | +| `beads_client.py` | (keep) | Beads is its own concern (separate from MCP client) | + +**Recommendation: do the renames as a SEPARATE phase if desired.** They improve clarity but are not strictly necessary. The user's main complaint is the dumping-ground problem (models.py), not naming convention. + +--- + +## Recommendation + +Execute the 5-phase refactor. The 3 substantive phases (Phase 1 ImGui merge, Phase 2 vendor merge, Phase 3 models split) are all justified. The renaming in the Open Question section is OPTIONAL — defer to a follow-up if the user wants. + +The user should approve this plan before Tier 2/3 starts executing. The plan is conservative: only moves that have a clear "good reason" per the user's principle. Everything else stays put. + +--- + +## See also + +- `docs/reports/FOLLOWUP_module_taxonomy_20260627.md` — previous taxonomy discussion (this document is the revised version) +- `AGENTS.md` — "File Size and Naming Convention" HARD RULE +- `conductor/code_styleguides/data_oriented_design.md` — "Prefer Fewer Types" principle +- `src/models.py` — current 1044-line dumping ground +- `conductor/tracks/cruft_elimination_20260627/SPEC_CORRECTION_phase_2.md` — Phase 2 spec correction (related to project.py refactor)