Private
Public Access
0
0
Commit Graph

1001 Commits

Author SHA1 Message Date
ed b91962e458 conductor(plan): Mark Phase 5D complete - gui_2 lazy proxy + dead import removal 2026-06-06 17:19:14 -04:00
ed f7b11f7f1c conductor(plan): write 5-phase implementation plan for data_oriented_error_handling_20260606
~25 tasks across 5 phases, each with explicit Red-Green-Refactor TDD steps:
- Phase 1 (1.1-1.9): Foundation. Post-tracks baseline verification, typing_extensions
  dep, src/result_types.py (10 unit tests), conductor/code_styleguides/error_handling.md
  canonical reference, product-guidelines.md + workflow.md updates.
- Phase 2 (2.1-2.7): mcp_client.py refactor. _resolve_and_check returns Result[Path];
  all 9 tool functions return Result[str]; 30+ 'assert p is not None' chain removed;
  tool dispatch updated; existing tests migrated to .data/.errors pattern.
- Phase 3 (3.1-3.8): ai_client.py refactor (HIGHEST RISK). _classify_<vendor>_error()
  returns ErrorInfo (not raise ProviderError); _send_<vendor>() renamed to
  _send_<vendor>_result() returning Result[str] (8 vendors); ProviderError class
  REMOVED; new public send_result() API; send() marked @deprecated (rewired to
  call send_result() and unwrap).
- Phase 4 (4.1-4.5): rag_engine.py refactor. _init_vector_store, _validate_collection_dim
  return Result; NilRAGState used; broad except Exception becomes ErrorInfo entries.
- Phase 5 (5.1-5.7): Deprecation wiring (filterwarnings in conftest.py to silence
  send() warning in existing tests), docs updates (guide_ai_client + guide_mcp_client),
  follow-up track public_api_migration_20260606 placeholder in tracks.md, manual
  smoke test, archive the track.

Coordination with the 3 pending tracks (startup_speedup, test_batching_refactor,
qwen_llama_grok_integration) addressed throughout. Phase 1 Task 1.1 verifies the
baseline before any refactor begins. Post-tracks state considerations from spec
§10 fully integrated into the task breakdown.

1-space indentation per project style guide. No placeholders. All test code
is concrete. Self-review at end confirms full spec coverage (every section
of spec.md mapped to a task).
2026-06-06 17:06:30 -04:00
ed 32edad0a4b conductor(plan): Mark Phase 5A-5C complete (commands, theme_2, markdown_helper lazy imports) 2026-06-06 17:01:05 -04:00
ed cbc3b075a0 conductor(track): Initialize data_oriented_error_handling_20260606
Track + metadata + state + tracks.md registration for the Fleury-pattern
error handling refactor.

Key design decisions (per user approval):
- Option A for _send_<vendor>() handling: rename to _send_<vendor>_result()
  and change return type to Result[str] (contained to internal callers).
- send() is marked @typing_extensions.deprecated; send_result() is the new
  public API.
- ProviderError exception is FULLY REPLACED by ErrorInfo dataclass
  (a value, not an exception).
- 5 phases: foundation, mcp_client, ai_client, rag_engine, deprecation+archive.
- Post-tracks baseline check (Phase 1 Task 1.1) verifies the 3 pending
  tracks have merged before proceeding.
- 9 Open Questions, 7 Risks, 5 verification criteria, follow-up track
  public_api_migration_20260606 planned in spec §12.1.

Blocked by: startup_speedup_20260606, test_batching_refactor_20260606,
qwen_llama_grok_integration_20260606. Blocks: public_api_migration_20260606.
2026-06-06 16:58:22 -04:00
ed 494f68f9d9 conductor(spec): Add 'Coordination with Pending Tracks' section (§10)
This track executes after startup_speedup, test_batching_refactor, and
qwen_llama_grok_integration land. Section 10 documents the expected
post-tracks codebase state and answers 6 critical coordination questions:

- Q1: Existing _send_<vendor>() functions (returning str) are renamed
  to _send_<vendor>_result() and changed to return Result[str] (Option A:
  clean rename, contained to internal callers).
- Q2: send_openai_compatible in src/openai_compatible.py STAYS as-is
  (it raises at the SDK boundary; correct per Fleury). The new
  _send_<vendor>_result() functions catch and convert to ErrorInfo.
- Q3: Deprecation warning on send() will produce Python warnings in
  tests; filterwarnings in conftest.py silences them during transition.
- Q4: The except ProviderError clauses in src/ai_client.py become
  dead code after the refactor and are removed in Phase 3.
- Q5: ProviderError is FULLY REPLACED by ErrorInfo (a value, not an
  exception). ProviderError removed entirely; ErrorInfo is the new
  error type.
- Q6: ProviderError.ui_message() moves to ErrorInfo.ui_message().

Phase 1 also adds a baseline verification task to confirm the 3 pending
tracks have merged before proceeding.

Also renumbered Out of Scope (11) and See Also (12) sections to
preserve monotonic section numbers.
2026-06-06 16:54:25 -04:00
ed 16291234ff conductor(plan): Record Phase 4 checkpoint SHA 883682c1 2026-06-06 16:37:27 -04:00
ed a0ff1bde91 conductor(plan): Mark Phase 4 complete - app_controller fastapi import removal + _require_warmed lift 2026-06-06 16:36:20 -04:00
ed 7fb13fbf4b conductor(plan): Record Phase 3 checkpoint SHA + mark T3.6 complete 2026-06-06 16:13:35 -04:00
ed 8905c26bff conductor(plan): Mark Phase 3 complete - ai_client SDK import removal done 2026-06-06 16:11:14 -04:00
ed 9eed60238a conductor(plan): mark T3.1 RED done; T3.2 holding for MCP fix (16780ec6) 2026-06-06 15:16:02 -04:00
ed b17cbbdeca conductor(plan): write 6-phase implementation plan for qwen_llama_grok_integration_20260606
~30 tasks across 6 phases, each with explicit Red-Green-Refactor TDD steps:
- Phase 1 (1.1-1.8): Capability matrix framework (src/vendor_capabilities.py)
  + shared OpenAI-compatible helper (src/openai_compatible.py). 13 unit tests.
- Phase 2 (2.1-2.8): Qwen via DashScope native SDK. 5 unit tests.
- Phase 3 (3.1-3.7): Grok (xAI) + Llama (Ollama + OpenRouter + custom URL)
  via shared helper. 8 unit tests.
- Phase 4 (4.1-4.3): MiniMax refactor (_send_minimax from ~250 -> ~50 lines).
  Safety net: existing tests/test_minimax_provider.py.
- Phase 5 (5.1-5.5): 9 capability-driven UX adaptations in src/gui_2.py.
  Manual smoke test for all 3 new vendors.
- Phase 6 (6.1-6.4): Update docs/guide_ai_client.md + guide_models.md.
  Archive the track.

Data-oriented design: shared helper is the algorithm on normalized data;
_send_<vendor>() entry points are thin boundary adapters.

1-space indentation per project style guide. No placeholders. All test
code is concrete. Self-review at end confirms spec coverage (every
section of spec.md mapped to a task).
2026-06-06 15:06:30 -04:00
ed 97daaff29b conductor(spec): Fix Qwen-Audio matrix entry consistency (vision=false, audio deferred)
The capability matrix v1 has no 'audio' field (audio_input is deferred to v2).
Qwen-Audio's vision flag was incorrectly marked true. Changed to false and
clarified that v1 uses Qwen-Audio as text-only; audio attachment UI is
hidden via the absent audio capability check.
2026-06-06 14:58:03 -04:00
ed 7c1d597ef1 conductor(track): Initialize qwen_llama_grok_integration_20260606 spec
Three new vendors + capability matrix framework + MiniMax refactor:

**Capability matrix v1 (7 features):** vision, tool_calling, caching, streaming,
model_discovery, context_window, cost_tracking. Audio and server-side code
execution deferred to a follow-up track.

**Qwen via DashScope native SDK:** Qwen-Turbo, Qwen-Plus, Qwen-Max, Qwen-Long
(1M context), Qwen-VL-Plus/Max (vision), Qwen-Audio. Native API chosen over
OpenAI-compatible mode to unlock Qwen-Audio, Qwen-Long custom chunking, and
Qwen-VL-Max enhanced vision.

**Llama (OpenAI-compatible, multi-backend):** Ollama (local, free), OpenRouter
(cloud aggregator covering Together/Groq/Fireworks), custom URL escape hatch.
Models: Llama 3.1 8B/70B/405B, 3.2 1B/3B, 3.2 11B/90B Vision, 3.3 70B.

**Grok via xAI (OpenAI-compatible):** Grok-2, Grok-2-Vision, Grok-Beta.

**Shared OpenAI-compatible helper** in src/openai_compatible.py processes a
normalized request/response data structure; each _send_<vendor>() is a thin
adapter at the boundary (data-oriented design per Fleury/Acton/Lottes).

**MiniMax refactor:** ~250 lines reduced to ~50 by using the shared helper.
Existing test_minimax_provider.py is the safety net.

**UX adaptation:** 9 UI elements (screenshot, tools toggle, cache panel, stream
progress, fetch models, token budget, cost panel) read from the matrix instead
of hard-coding per-vendor branches.

**Out of scope (deferred):** Anthropic/Gemini/DeepSeek migration to the matrix
(separate track), audio input, server-side code execution, PDF input, batch API,
fine-tuning.

6 phases planned: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX
adaptation, docs+archive.
2026-06-06 14:56:00 -04:00
ed 7eb743c6cb conductor(plan): Phase 2 complete - io_pool + warmup foundation in place
Phase 2 of startup_speedup_20260606 is done.

Tasks:
  T2.1 (Red)   tests/test_io_pool.py         1354679e  4 tests
  T2.2 (Green) src/io_pool.py                1354679e  make_io_pool() factory
  T2.3 (Red)   tests/test_warmup.py          1354679e  10 tests
  T2.4 (Green) src/warmup.py                 1354679e  WarmupManager
  T2.5 (Wire)  AppController integration     922c5ad9  io_pool + warmup in __init__ + 5 public delegation methods
  T2.6 (Plan)  this commit

What now exists:
  - make_io_pool() returns a 4-worker ThreadPoolExecutor named 'controller-io-N'
  - WarmupManager class with submit/status/is_done/wait/on_complete/reset
  - AppController creates self._io_pool + self._warmup early in __init__
  - Warmup is submitted immediately (jobs run concurrent with the rest of init)
  - Public API: controller.warmup_status(), controller.is_warmup_done(),
    controller.wait_for_warmup(timeout), controller.on_warmup_complete(cb)
  - controller._compute_warmup_list() returns 9 always + 2 conditional (fastapi)
  - shutdown() now also shuts down the io_pool

Currently the warmup is a no-op for modules already imported at the top
of app_controller.py (fastapi, requests). Phase 3 will remove those
top-level imports; the warmup infrastructure will then start doing
real work.

18/18 tests passing (4 io_pool + 10 warmup + 4 test_app_controller_*).

Next: Phase 3 (remove top-level SDK imports from src/ai_client.py).
Expected to fix ~3 audit violations (google.genai, anthropic, openai).
2026-06-06 14:52:04 -04:00
ed 7fdab70529 conductor(plan): write 4-phase implementation plan for test_batching_refactor_20260606
16 tasks across 4 phases, each with explicit Red-Green-Refactor TDD steps:
- Phase 1 (1.1-1.16): Library + dry-run. 20 unit tests across categorizer,
  batcher, plugin. New run_tests_batched.py has --plan/--audit only.
- Phase 2 (2.1-2.3): Shadow run via CI. Compare new vs old plan output.
- Phase 3 (3.1-3.4): Switch default. Full CLI with --tiers, --durations.
  Old script becomes .legacy. Update docs/guide_testing.md.
- Phase 4 (4.1-4.6): Populate registry, gitignore durations, delete
  legacy, archive track.

1-space indentation per project style guide. No placeholders. All
test code is concrete.
2026-06-06 14:24:39 -04:00
ed f9a0125847 conductor(plan): Phase 1 complete - baseline + audit infrastructure ready
Phase 1 of startup_speedup_20260606 track is done.

Tasks completed:
  T1.1 baseline benchmark        -> 6f9a3af2 (docs/reports/startup_baseline_20260606.txt)
  T1.2 audit_gui2_imports.py     -> 6f9a3af2 (scripts/ + audit results)
  T1.3 StartupProfiler           -> 5a856536 (src/ + 5 tests)
  T1.4 audit_main_thread_imports -> 6f9a3af2 (scripts/ + 9 tests)
  T1.5 plan update                -> this commit

Baseline numbers (3-run median, from scripts/benchmark_imports.py):
  src.gui_2                1770ms   (main-thread bottleneck)
  simulation.user_agent    1517ms
  google.genai             1001ms
  openai                    482ms
  anthropic                 441ms
  imgui_bundle              255ms   (KEEP - ImGui hot path)
  src.theme_nerv_fx         254ms
  src.theme_nerv            246ms
  src.markdown_table        243ms
  src.command_palette       242ms

Audit violations on current codebase: 67. These are the targets
for Phases 3-5 (remove top-level heavy imports to fix each one).

Next: Phase 2 (Job Pool + Warmup Foundation).
2026-06-06 14:24:20 -04:00
ed 0553983ce9 conductor(spec): Clarify --audit --strict semantics in Section 4.3
Default --audit exits non-zero on hard errors only. --strict adds the
'multiple subsystems = probably cross-cutting' heuristic from Section 9
as a CI gate. Two modes, one flag.
2026-06-06 14:16:13 -04:00
ed b7a9737443 conductor(track): Initialize test_batching_refactor_20260606 spec
Three-tier batching refactor: replace alphabetical 4-at-a-time batching with
fixture-class-isolated tiers (0 opt-in, 1 unit/xdist, 2 mock_app, 3 live_gui
in one session, H headless, P performance).

Hybrid classification: auto-infer from filename + AST fixture scan; hand-curated
tests/test_categories.toml overrides for cross-cutting and ambiguous files.

Opt-in per-test order control via [[files.X.test_order]] sub-tables, gated on
a conftest-loaded pytest plugin (no-op without entries).

Priority order: B (process isolation) > A (subsystem diagnostic) > C (speed).
2026-06-06 14:12:14 -04:00
ed 96158edd97 conductor(plan): mark T1.3 StartupProfiler complete (5a856536) 2026-06-06 13:59:02 -04:00
ed f2f5ee1197 conductor(plan): flip track from lazy-loading to proactive warmup
Architectural shift driven by user clarification: lazy-loading on first
use causes user-perceptible lag when the user-triggered action (e.g.
provider switch) propagates to a controller method that triggers the
first import. The fix is to pre-import heavy modules on a bg thread
at startup and have functions access them via _require_warmed().

Old design (rejected):
  - from google import genai inside _send_gemini (lazy on first call)
  - First user action that triggers this pays the cost; UI feels laggy

New design (this commit):
  - Top-level heavy imports REMOVED from main-thread-reachable files
  - AppController.__init__ submits warmup jobs to _io_pool (4 threads,
    named 'controller-io-N')
  - Each warmup worker imports its module and updates a thread-safe
    warmup_status dict
  - Functions access modules via _require_warmed(name), which assumes
    the module is in sys.modules (warmed at startup)
  - When all jobs complete, _warmup_done_event is set and registered
    on_warmup_complete callbacks fire
  - GUI shows status indicator + toast when warmup completes
  - Hook API exposes /api/warmup_status and /api/warmup_wait
  - Tests can call controller.wait_for_warmup() before exercising
    warmup-dependent functionality

Phase 2 now bundles job pool + warmup (T2.3+T2.4 add warmup tests +
implementation). Phases 3-5 do 'remove top-level imports' instead of
'lazy-load'. Phase 7 is the notification surface (Hook API + GUI).
Definition of Done includes warmup-completion criteria, the
'no function-body imports' check, and an end-to-end 'provider switch
is INSTANT' smoke test.

No code changes; this is a planning update only.
2026-06-06 13:45:05 -04:00
ed cd4fb04541 conductor(track): create startup_speedup_20260606 track for sloppy.py startup latency
Fulfills the existing backlog entry at conductor/tracks.md:152
(2026-06-05 root-cause analysis of live_gui wait_for_server timeouts).

Main Thread Purity Invariant: the main thread (entering immapp.run())
must never import a module heavier than imgui_bundle and the lean
gui_2 skeleton. Enforced by:
  - static gate: scripts/audit_main_thread_imports.py (CI)
  - runtime hook: tests/test_main_thread_purity.py (sys.addaudithook)

Threading constraint: no new threading.Thread(...) calls in src/.
All background work goes through AppController._io_pool
(ThreadPoolExecutor, max_workers=4, thread_name_prefix='controller-io').

9 phases, 57 tasks: audit+baseline, job pool, lazy-load SDKs, lazy-load
FastAPI, lazy-load feature-gated GUI, migrate ad-hoc threads, runtime
enforcement, hook API + diagnostics, verify+checkpoint.

Expected savings: ~2000-2400ms off main-thread import cost.
Target: import src.ai_client < 50ms (from ~1800ms), live_gui fixtures
no longer time out at wait_for_server(timeout=15).
2026-06-06 12:57:20 -04:00
ed 6c541bc788 move track mds to tracks 2026-06-06 00:42:40 -04:00
ed db3490a70f conductor(plan): document imgui save_ini crash root cause and fix 2026-06-05 15:12:23 -04:00
ed b0c8589f68 conductor(plan): document root cause - imgui-bundle C-level crash blocks live_gui 2026-06-05 13:47:55 -04:00
ed 1c6919aafc conductor(plan): update task status - 5 done, 6 deferred pending live_gui 2026-06-05 12:43:33 -04:00
ed 07d35c9d39 conductor(plan): regression fixes - 21 failures from full suite run 2026-06-05 10:10:29 -04:00
ed cd24c43f8f conductor(plan): theme + syntax modularization - 7-task plan 2026-06-04 22:20:58 -04:00
ed ce211e76f8 straggler spec 2026-06-04 19:42:04 -04:00
ed ba7733b365 conductor(plan): Mark context_first_message_fix task complete 2026-06-04 18:47:42 -04:00
ed 6ce119dffe conductor(checkpoint): Fix markdown_helper.py for imgui-bundle >=1.92.801 complete 2026-06-03 00:54:07 -04:00
ed 7a34edf605 fixes 2026-06-03 00:47:40 -04:00
ed d14ae3bd08 conductor(checkpoint): Clean install test complete 2026-06-03 00:31:55 -04:00
ed 573d289941 test(pytest): register clean_install marker for opt-in clone-and-verify test 2026-06-03 00:28:20 -04:00
ed 0309420ba1 conductor(checkpoint): Archive Completed Tracks (2026-05 to 2026-06) complete 2026-06-03 00:19:13 -04:00
ed 594f14f943 conductor(archive): move 39 completed tracks (2026-05 to 2026-06) to archive/ 2026-06-03 00:09:52 -04:00
ed 1a323a8cd1 docs(track): docs layer gap analysis for comprehensive refresh 2026-06-02 18:45:01 -04:00
ed 8733528f67 fix(gui): Final monolithic stabilization and UI polish
- Restore monolithic architecture in gui_2.py to fix test compatibility.
- Implement full-width horizontal expansion for Markdown tables in discussion entries.
- Re-implement layered role-based tints using draw_list channels.
- Standardize Text Viewer docking ID to '###Text_Viewer_Unified'.
- Fix MiniMax compression routing and base URL.
- Fully restore missing theme_2.py definitions.
2026-06-02 18:04:49 -04:00
ed 7eb8f9eed4 chore(conductor): Add new track 'command_palette_and_performance_20260602' 2026-06-02 17:51:34 -04:00
ed 8f6f47d46b fix(gui): Final monolithic stabilization pass
- Restore monolithic architecture in gui_2.py to fix test breakages and circular imports.
- Update Text Viewer stable ID to '###Text_Viewer_Unified' to definitively fix docking conflicts.
- Refactor discussion entry renderer to force full-width horizontal expansion for Markdown.
- Fully restore theme_2.py definitions (palettes, fonts, scale) while retaining role-tint logic.
- Robustify ImGui ID stack in imgui_scopes.py to prevent access violations.
- Verify all fixes with the comprehensive unit and visual test suite.
2026-06-02 17:30:46 -04:00
ed ad98475a2e fix(gui): Definitive monolithic restoration and UI stabilization
- Restore all rendering logic to gui_2.py to maintain monolithic architecture and test compatibility.
- Fix horizontal squashing of Markdown tables by ensuring full panel width in entry groups.
- Resolve Text Viewer docking conflicts by standardizing on a stable window ID ('###Text_Viewer_Unified').
- Fix theme initialization by restoring missing load/save functions in theme_2.py.
- Prevent ImGui access violations by ensuring ID stack always receives strings in imgui_scopes.py.
- Successfully verified all UI regressions with a passing unit test suite.
2026-06-02 16:17:32 -04:00
ed df6aa1f455 fix(gui): Resolve Text Viewer docking conflict with unified ID
- Update Text Viewer window ID to '###Text_Viewer_Unified'.
- Ensures ImGui treats the window as a single stable entity across title changes.
- Prevents docking loop glitches.
2026-06-02 15:45:07 -04:00
ed 4d8e949720 fix(gui): Force newline in discussion entries to prevent squashed layout
- Insert imgui.new_line() before rendering discussion content.
- Ensures the Markdown renderer inherits the full horizontal width of the panel.
- Definitively fixes vertical squashing of tables and long text blocks.
2026-06-02 15:42:04 -04:00
ed fee41032b6 chore(conductor): Add new track 'phase7_monolithic_stabilization_20260602' 2026-06-02 15:14:18 -04:00
ed c4811f00c1 fix(gui): Final Phase 7 stabilization and polish
- Resolve ImportError by correctly prefixing 'src' in modular renderers.
- Fix ImGui access violation by ensuring push_id always receives string IDs.
- Restore visible role-based background tints using layered rendering (channels).
- Definitively fix horizontal Markdown table widths by forcing group expansion.
- Centralize color management in theme_2.py and ui_shared.py.
- Standardize Files & Media inventory layout and remove legacy controls.
- Update test mocks to support modular UI and theme-driven styling.
2026-06-02 13:27:38 -04:00
ed e9ff6efe20 UX UX UX UX UX 2026-06-02 02:58:33 -04:00
ed 5b7b818ed2 feat(gui): Implement per-response token metrics and AI discussion compression
- Display token metrics (input/output/cache) per response in Discussion Hub.
- Add total Discussion Token usage in the panel header.
- Implement 'Compress' feature to intelligently summarize and replace exhausted discussion histories using an AI subagent.
2026-06-02 01:36:57 -04:00
ed ff849e7990 chore(conductor): Add new track 'approve_modal_ux_20260601' 2026-06-02 00:43:23 -04:00
ed 797f283f44 chore(conductor): Add new tracks for Phase 7
- Add 'structural_file_editor_20260601' track.
- Add 'discussion_metrics_and_compression_20260601' track.
2026-06-02 00:25:10 -04:00
ed 4baaadd88d chore(conductor): Add new track 'context_composition_ux_20260601' 2026-06-02 00:19:59 -04:00
ed 0f859d81d6 feat(gui): Unified window state and fixed context preservation regressions
- Implement unified show_windows['Text Viewer'] state and fix docking conflict loops.
- Fix Tool Call row interactivity using spanned selectables.
- Fix context selection loss when switching/creating discussions.
- Implement 'Empty Context Warning' modal for safer generation.
- Correct IndentationError in app_controller.py.
- Remove legacy show_text_viewer attribute and update API hooks.
2026-06-02 00:18:48 -04:00