chore(conductor): Add new track 'Agent Tool Preference & Bias Tuning'

2026-03-08 14:09:06 -04:00
parent 20f5c34c4b
commit c766954c52
5 changed files with 98 additions and 0 deletions
@@ -17,6 +17,10 @@ This file tracks all major tracks for the project. Each track has its own detail
   *Link: [./tracks/rag_support_20260308/](./tracks/rag_support_20260308/)*
   *Goal: Add support for RAG (Retrieval-Augmented Generation) using local vector stores (Chroma/Qdrant), native vendor retrieval, and external RAG APIs. Implement indexing pipeline and retrieval UI.*

+3. [ ] **Track: Agent Tool Preference & Bias Tuning**
+   *Link: [./tracks/tool_bias_tuning_20260308/](./tracks/tool_bias_tuning_20260308/)*
+   *Goal: Influence agent tool selection via a weighting system. Implement semantic nudges in tool descriptions and a dynamic "Tooling Strategy" section in the system prompt. Includes GUI badges and sliders for weight adjustment.*
+
 ---

 ### GUI Overhauls & Visualizations
@@ -0,0 +1,5 @@
+# Track tool_bias_tuning_20260308 Context
+
+- [Specification](./spec.md)
+- [Implementation Plan](./plan.md)
+- [Metadata](./metadata.json)
@@ -0,0 +1,8 @@
+{
+  "track_id": "tool_bias_tuning_20260308",
+  "type": "feature",
+  "status": "new",
+  "created_at": "2026-03-08T14:09:00Z",
+  "updated_at": "2026-03-08T14:09:00Z",
+  "description": "Agent Tool Preference & Bias Tuning - Influencing tool selection via weighted descriptions and strategy nudges."
+}
@@ -0,0 +1,41 @@
+# Implementation Plan: Agent Tool Preference & Bias Tuning
+
+## Phase 1: Data Model & Storage Extension
+- [ ] Task: Extend the `ToolPreset` and `Tool` models.
+    - [ ] Update `src/tool_presets.py` (created in the dependency track) to include `weight` (int, 1-5) for tools.
+    - [ ] Add `parameter_bias` (dict mapping parameter names to bias flags such as "Preferred" or "Discouraged") to the `Tool` model.
+    - [ ] Update `ToolPresetManager` to handle saving and loading these new fields from `tool_presets.toml`.
+- [ ] Task: Implement Global Bias Profiles.
+    - [ ] Define `BiasProfile` dataclass in `src/models.py`.
+    - [ ] Implement logic to store and retrieve these profiles from `config.toml`.
+- [ ] Task: Write unit tests for the extended data model and storage logic.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: Data Model & Storage Extension' (Protocol in workflow.md)
+
+## Phase 2: Orchestration & Nudging Logic
+- [ ] Task: Implement the `ToolBiasEngine` in `src/ai_client.py` (or a new module).
+    - [ ] Implement `apply_semantic_nudges(tool_definitions, preset)`: This function should modify tool descriptions with priority tags.
+    - [ ] Implement `generate_tooling_strategy(preset, global_bias)`: This function should return a Markdown string for the system prompt.
+- [ ] Task: Integrate the bias engine into the AI client `send()` loop.
+    - [ ] Ensure that for every agent turn, the tool definitions and system instructions are dynamically biased based on the active agent's role and selected preset.
+- [ ] Task: Write integration tests for the bias generation logic.
+    - [ ] Verify that high-weight tools correctly receive `[HIGH PRIORITY]` tags.
+    - [ ] Verify that the strategy section is correctly appended to the system instructions.
+- [ ] Task: Conductor - User Manual Verification 'Phase 2: Orchestration & Nudging Logic' (Protocol in workflow.md)
+
+## Phase 3: GUI Integration
+- [ ] Task: Update the Tool Preset Manager UI.
+    - [ ] Add `imgui.slider_int` for each tool to adjust its weight.
+    - [ ] Add a sub-menu or modal for editing parameter-level bias.
+- [ ] Task: Enhance tool list visualization.
+    - [ ] Implement color-coded priority badges in the Operations panel and tool settings.
+- [ ] Task: Implement the "Bias Override" in the agent focus modal.
+    - [ ] Add a dropdown to select a global bias profile or a specific preset override before spawning a worker.
+- [ ] Task: Write visual regression tests using `live_gui` to verify the new UI components.
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: GUI Integration' (Protocol in workflow.md)
+
+## Phase 4: Verification & Final Polish
+- [ ] Task: Create a Bias Efficacy Simulation.
+    - [ ] Implement a simulation test in which either of two tools can solve the task, and verify that the agent chooses the tool with the higher weight.
+- [ ] Task: Final UI polish (spacing, icons, tooltips explaining the bias system).
+- [ ] Task: Run full suite of relevant tests.
+- [ ] Task: Conductor - User Manual Verification 'Phase 4: Verification & Final Polish' (Protocol in workflow.md)
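
The two Phase 2 functions named in the plan could look roughly like the sketch below. This is only an illustration under stated assumptions: tool definitions are taken to be plain dicts with `name` and `description` keys, `Preset` is a hypothetical stand-in for `ToolPreset`, the weight-to-tag mapping (5 to `[HIGH PRIORITY]`, 4 to `[PREFERRED]`) is a guess, and `read_file` is an invented second tool. None of these details are fixed by the plan.

```python
from dataclasses import dataclass, field

# Assumed mapping from weight to tag; the plan only says that high-weight
# tools receive "[HIGH PRIORITY]", so the exact thresholds are hypothetical.
PRIORITY_TAGS = {5: "[HIGH PRIORITY]", 4: "[PREFERRED]"}
DEFAULT_WEIGHT = 3  # "Default all tools to Level 3 (Neutral)"


@dataclass
class Preset:
    """Hypothetical stand-in for ToolPreset: tool name -> weight (1-5)."""
    weights: dict[str, int] = field(default_factory=dict)


def apply_semantic_nudges(tool_definitions, preset):
    """Return copies of the tool definitions with priority tags prefixed."""
    nudged = []
    for tool in tool_definitions:
        weight = preset.weights.get(tool["name"], DEFAULT_WEIGHT)
        tag = PRIORITY_TAGS.get(weight)
        updated = dict(tool)  # shallow copy so the caller's list is untouched
        if tag:
            updated["description"] = f"{tag} {tool['description']}"
        nudged.append(updated)
    return nudged


def generate_tooling_strategy(preset, global_bias="Balanced"):
    """Build a Markdown 'Tooling Strategy' section for the system prompt."""
    preferred = sorted(
        (name for name, weight in preset.weights.items() if weight >= 4),
        key=lambda name: (-preset.weights[name], name),
    )
    lines = ["## Tooling Strategy", f"Active bias profile: {global_bias}"]
    if preferred:
        lines.append("Prefer these tools when more than one would work:")
        lines.extend(
            f"- `{name}` (weight {preset.weights[name]})" for name in preferred
        )
    else:
        lines.append("No tool is currently preferred over another.")
    return "\n".join(lines)


tools = [
    {"name": "search_files", "description": "Search files."},
    {"name": "read_file", "description": "Read a file."},
]
preset = Preset(weights={"search_files": 5})
nudged = apply_semantic_nudges(tools, preset)
strategy = generate_tooling_strategy(preset)
```

Returning copies rather than mutating the input keeps the unbiased definitions available if a different preset is applied on a later turn of the `send()` loop.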
@@ -0,0 +1,40 @@
+# Specification: Agent Tool Preference & Bias Tuning
+
+## Overview
+This track introduces a mechanism to influence AI agent tool selection by implementing a weighting and scoring system at the orchestration layer. Since model APIs do not natively support tool priority, this feature uses semantic nudging (tags in tool descriptions) and explicit system instructions to "bias" the agent toward preferred tools and parameters.
+
+## Dependencies
+- This track is strictly dependent on the completion of the **Saved Tool Presets** track, as it extends the tool preset data model.
+
+## Functional Requirements
+- **Weighting Mechanism (Hybrid):**
+    - **Description Nudging:** Automatically prefix or suffix tool descriptions with priority indicators (e.g., `[HIGH PRIORITY]`, `[PREFERRED]`) based on their assigned weight.
+    - **Strategy Injection:** Dynamically generate a "Tooling Strategy" section in the System Prompt that lists preferred tools and usage guidelines based on the active preset and global bias.
+- **Priority Levels:**
+    - Support a 5-level priority scale (1: Lowest, 5: Highest).
+    - Default all tools to Level 3 (Neutral).
+- **Parameter-Level Bias:**
+    - Allow users to assign "Preferred" or "Discouraged" flags to specific tool parameters (e.g., biasing `search_files` to supply `pattern` rather than only `path`).
+- **Configuration & Storage:**
+    - **Preset-Based:** Store tool and parameter weights within the `tool_presets.toml` file.
+    - **Global Bias:** Implement a global "Bias Profile" (e.g., `Balanced`, `Discovery-Heavy`, `Execution-Focused`) that applies multipliers to tool categories.
+- **GUI Integration:**
+    - **Priority Badges:** Display color-coded badges (e.g., Red for High, Gray for Low) in tool lists and the Operations panel.
+    - **Weight Sliders:** Add sliders to the Tool Preset manager to allow fine-grained adjustment of tool weights.
+    - **Active Bias Control:** Include a "Bias Override" dropdown in the agent focus modal to allow temporary adjustments before spawning a worker.
+
+## Non-Functional Requirements
+- **Low Latency:** The dynamic generation of nudged descriptions and system instructions must not noticeably delay agent initialization.
+- **Provider Consistency:** The biasing strategy must be effective across Gemini, Anthropic, and OpenAI models.
+- **Scalability:** The system should handle future additions of new tools and parameters without requiring core logic changes.
+
+## Acceptance Criteria
+- [ ] Users can adjust tool weights in the Tool Preset manager and see the changes reflected in color-coded badges.
+- [ ] Tool descriptions sent to the AI include semantic priority tags based on the assigned weights.
+- [ ] The System Prompt includes a dynamically generated "Tooling Strategy" section.
+- [ ] Agents show a statistically significant preference for high-weight tools in controlled tests.
+- [ ] Parameter-level bias correctly influences how agents formulate tool calls.
+
+## Out of Scope
+- Implementing reinforcement learning to "learn" tool weights automatically.
+- Hardcoding weights into the AI client (all weights must be user-configurable via presets).
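
One way to picture the data model described under Priority Levels, Parameter-Level Bias, and Global Bias is the sketch below. It is illustrative only: the `category` field, the `category_multipliers` mapping, the `effective_weight` helper, and the round-then-clamp rule are all assumptions, since the specification says only that a bias profile "applies multipliers to tool categories".

```python
from dataclasses import dataclass, field

MIN_WEIGHT, MAX_WEIGHT, DEFAULT_WEIGHT = 1, 5, 3
ALLOWED_FLAGS = ("Preferred", "Discouraged")


@dataclass
class Tool:
    name: str
    category: str = "general"  # hypothetical field used by bias profiles
    weight: int = DEFAULT_WEIGHT
    parameter_bias: dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # Enforce the 5-level scale from the specification.
        if not MIN_WEIGHT <= self.weight <= MAX_WEIGHT:
            raise ValueError(f"weight must be {MIN_WEIGHT}-{MAX_WEIGHT}")
        for param, flag in self.parameter_bias.items():
            if flag not in ALLOWED_FLAGS:
                raise ValueError(f"unknown bias flag for {param!r}: {flag!r}")


@dataclass
class BiasProfile:
    name: str
    # Assumed shape: tool category -> multiplier; missing categories use 1.0.
    category_multipliers: dict[str, float] = field(default_factory=dict)


def effective_weight(tool, profile):
    """Scale a tool's weight by its category multiplier, clamped to 1-5."""
    scaled = tool.weight * profile.category_multipliers.get(tool.category, 1.0)
    return max(MIN_WEIGHT, min(MAX_WEIGHT, round(scaled)))


search = Tool("search_files", category="discovery",
              parameter_bias={"pattern": "Preferred"})
discovery_heavy = BiasProfile("Discovery-Heavy", {"discovery": 1.4})
balanced = BiasProfile("Balanced")
```

Clamping the scaled value back into the 1-5 range means the nudging and badge logic only ever has to handle the five documented levels, whatever multipliers a profile defines.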