manual_slop/conductor/tracks/gemini_cli_headless_20260224/spec.md

3.1 KiB

Specification: Gemini CLI Headless Integration

Overview

This track integrates the Gemini CLI (gemini) as a headless backend provider for Manual Slop. This lets users leverage their Gemini subscription and the CLI's advanced features (e.g., specialized sub-agents such as codebase_investigator, structured JSON streaming, and robust session management) directly within the Manual Slop GUI.

Goals

  • Add "Gemini CLI" as a selectable AI provider in Manual Slop.
  • Support both persistent interactive sessions and one-off task-specific delegation (e.g., running gemini investigate).
  • Implement a secure "BeforeTool" hook to ensure all CLI-initiated tool calls are intercepted and confirmed via the Manual Slop GUI.
  • Capture the CLI's structured output (its JSONL event stream) and render it, including formatted text and tool status, within the existing discussion history.

Functional Requirements

1. Gemini CLI Provider Adapter

  • Implementation: Create a GeminiCliAdapter class (or extend ai_client.py) that runs the gemini CLI as a subprocess.
  • Communication: Use --output-format stream-json to receive real-time updates (text chunks, tool calls, status).
  • Session Management: Support session persistence by tracking the session ID and passing it to subsequent CLI calls.
  • Authentication:
    • Provide a "Login to Gemini CLI" action in the GUI that triggers gemini login.
    • If an API key is configured in manual_slop.toml, pass it to the CLI subprocess through an environment variable.
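
The adapter requirements above can be sketched as a small class. This is a minimal sketch, not the project's implementation. It assumes that each stream-json line is a JSON object with a "type" field and that some event carries a "session_id" key. It also assumes the prompt is passed with -p and the API key with the GEMINI_API_KEY environment variable. The spec does not name the flag used to resume a session, so resume_flag is a hypothetical, caller-supplied parameter.

```python
import json
import os
import subprocess
from typing import Iterable, Iterator, Optional


class GeminiCliAdapter:
    """Hypothetical sketch of a provider adapter wrapping the gemini CLI."""

    def __init__(self, api_key: Optional[str] = None, binary: str = "gemini",
                 resume_flag: Optional[str] = None) -> None:
        self.api_key = api_key
        self.binary = binary
        self.resume_flag = resume_flag  # hypothetical: real flag not named in the spec
        self.session_id: Optional[str] = None  # captured from the stream, reused next turn

    def build_command(self, prompt: str) -> list[str]:
        # --output-format stream-json comes from the spec.
        cmd = [self.binary, "--output-format", "stream-json"]
        if self.resume_flag and self.session_id:
            cmd += [self.resume_flag, self.session_id]
        cmd += ["-p", prompt]  # assumed headless prompt flag
        return cmd

    def build_env(self) -> dict[str, str]:
        env = dict(os.environ)
        if self.api_key:
            # Assumption: the CLI reads its API key from GEMINI_API_KEY.
            env["GEMINI_API_KEY"] = self.api_key
        return env

    def parse_events(self, lines: Iterable[str]) -> Iterator[dict]:
        """Turn JSONL lines into event dicts, remembering any session ID seen."""
        for line in lines:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                # Non-JSON output (e.g. a CLI warning) is surfaced, not fatal.
                yield {"type": "raw", "text": line}
                continue
            if "session_id" in event:
                self.session_id = event["session_id"]
            yield event

    def send(self, prompt: str) -> Iterator[dict]:
        """Run one turn, yielding events as the CLI emits them."""
        with subprocess.Popen(
            self.build_command(prompt),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merged so warnings appear as "raw" events
            text=True,
            env=self.build_env(),
        ) as proc:
            assert proc.stdout is not None
            yield from self.parse_events(proc.stdout)
        if proc.returncode != 0:
            yield {"type": "error", "message": f"gemini exited with code {proc.returncode}"}
```

Keeping parsing separate from process handling lets the GUI-facing logic be tested against recorded JSONL without launching the CLI.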

2. GUI Intercepted Tool Execution

  • Mechanism: Use the Gemini CLI's BeforeTool hook.
  • Hook Helper: A small Python script scripts/cli_tool_bridge.py will be registered as the BeforeTool hook.
  • IPC: The bridge script communicates with Manual Slop's HookServer, which must be extended to support synchronous "ask" requests.
  • Confirmation: When a tool is requested, the bridge blocks until the user confirms or denies the action in the GUI, then returns the decision to the CLI as JSON.
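
The bridge flow above could look roughly like the following. This is a sketch under several assumptions. The HookServer address and route are hypothetical, as are the "approved" and "reason" reply fields. The server is assumed to hold the HTTP response open until the user answers. The hook input fields (tool_name, tool_input) and output fields (decision, reason) are also assumed and should be checked against the Gemini CLI hooks documentation.

```python
import json
import sys
import urllib.request
from typing import Callable

# Hypothetical address of Manual Slop's HookServer "ask" endpoint.
HOOK_SERVER_ASK_URL = "http://127.0.0.1:8999/ask"


def ask_hook_server(request: dict, timeout_s: float = 300.0) -> dict:
    """POST the tool request; the server holds the response until the user answers."""
    req = urllib.request.Request(
        HOOK_SERVER_ASK_URL,
        data=json.dumps(request).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def decide(hook_input: dict, ask: Callable[[dict], dict]) -> dict:
    """Translate the GUI's answer into the hook's JSON reply (assumed schema)."""
    request = {
        "kind": "tool_confirmation",
        "tool_name": hook_input.get("tool_name"),
        "tool_input": hook_input.get("tool_input"),
    }
    try:
        answer = ask(request)
    except (OSError, ValueError) as exc:
        # Fail closed: an unreachable GUI or a malformed reply means "deny".
        return {"decision": "deny", "reason": f"Manual Slop did not confirm: {exc}"}
    if answer.get("approved") is True:
        return {"decision": "allow"}
    return {"decision": "deny",
            "reason": answer.get("reason", "Rejected by the user in Manual Slop")}


def main() -> None:
    """Hook entry point: read the CLI's request from stdin, print the decision."""
    hook_input = json.load(sys.stdin)
    print(json.dumps(decide(hook_input, ask_hook_server)))
```

As the registered hook, the script would call main() under an `if __name__ == "__main__":` guard. Denying on any bridge failure preserves the rule that no CLI tool runs without explicit approval, and injecting ask keeps decide() testable without a running GUI.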

3. Visual & Telemetry Integration

  • Rich Output: Parse the stream-json events to display markdown content and tool status in the GUI.
  • Telemetry: Extract and display token usage and latency metrics provided by the CLI's result event.
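
One way to fold the event stream into something the GUI can render is a small reducer. This is a hedged sketch. The event types (message, tool_use, tool_result, error, result) and their fields, including the "stats" object on the result event, are assumptions about the stream-json schema. TurnView is a hypothetical container, not an existing Manual Slop class.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class TurnView:
    """What the GUI would render for one CLI turn (hypothetical structure)."""
    markdown: str = ""
    tool_status: list[str] = field(default_factory=list)
    usage: dict = field(default_factory=dict)
    duration_ms: Optional[int] = None
    error: Optional[str] = None


def render_turn(events: Iterable[dict]) -> TurnView:
    """Reduce assumed stream-json events into markdown, tool status, and telemetry."""
    view = TurnView()
    chunks: list[str] = []
    for ev in events:
        kind = ev.get("type")
        if kind == "message" and ev.get("role") == "assistant":
            chunks.append(ev.get("content", ""))  # streamed text chunks
        elif kind == "tool_use":
            view.tool_status.append(f"running {ev.get('tool_name')}")
        elif kind == "tool_result":
            view.tool_status.append(f"{ev.get('tool_id')}: {ev.get('status')}")
        elif kind == "error":
            view.error = ev.get("message")
        elif kind == "result":
            stats = ev.get("stats", {})
            view.usage = {k: stats[k] for k in ("input_tokens", "output_tokens", "total_tokens")
                          if k in stats}
            view.duration_ms = stats.get("duration_ms")
    view.markdown = "".join(chunks)
    return view
```

Unknown event types are ignored rather than rejected, so a newer CLI that adds events does not break rendering.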

Non-Functional Requirements

  • Performance: The subprocess bridge should add minimal latency: less than 100 ms of communication overhead, excluding time spent waiting for the user.
  • Reliability: Gracefully handle CLI crashes or timeouts by reporting errors in the GUI and allowing session resets.
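
The reliability and latency requirements can be made concrete with a wrapper that turns crashes and timeouts into errors the GUI can display. This is a minimal sketch. run_cli_turn, TurnResult, and within_overhead_budget are hypothetical names, and measuring overhead as round-trip time minus user wait time is one possible interpretation of the 100 ms budget.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Optional, Sequence

BRIDGE_OVERHEAD_BUDGET_S = 0.100  # the <100 ms communication target


@dataclass
class TurnResult:
    ok: bool
    stdout: str = ""
    error: Optional[str] = None
    elapsed_s: float = 0.0


def run_cli_turn(cmd: Sequence[str], timeout_s: float) -> TurnResult:
    """Run one CLI invocation; convert crashes and timeouts into GUI-reportable errors."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising TimeoutExpired.
        return TurnResult(False, error=f"CLI timed out after {timeout_s:g}s",
                          elapsed_s=time.perf_counter() - start)
    except FileNotFoundError:
        return TurnResult(False, error=f"CLI executable not found: {cmd[0]}",
                          elapsed_s=time.perf_counter() - start)
    elapsed = time.perf_counter() - start
    if proc.returncode != 0:
        msg = f"CLI exited with code {proc.returncode}"
        detail = proc.stderr.strip().splitlines()
        if detail:
            msg += f": {detail[-1]}"  # last stderr line is usually the most useful
        return TurnResult(False, stdout=proc.stdout, error=msg, elapsed_s=elapsed)
    return TurnResult(True, stdout=proc.stdout, elapsed_s=elapsed)


def within_overhead_budget(total_s: float, gui_wait_s: float) -> bool:
    """Bridge overhead = round-trip time minus time spent waiting on the user."""
    return (total_s - gui_wait_s) < BRIDGE_OVERHEAD_BUDGET_S
```

A failed TurnResult gives the GUI an error message to show alongside a "reset session" action, rather than leaving a hung or silently dead subprocess.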

Acceptance Criteria

  • User can select "Gemini CLI" in the Provider dropdown.
  • User can successfully send messages and receive streamed responses from the CLI.
  • Any tool call (PowerShell/MCP) initiated by the CLI triggers the standard Manual Slop confirmation modal.
  • Tools execute only after user approval; a rejection is reported back to the CLI agent.
  • Session history is maintained correctly across multiple turns when using the CLI provider.

Out of Scope

  • Full terminal emulation (ANSI color support) within the GUI; the focus is on structured text and data.
  • Migrating existing raw client_api sessions to CLI sessions.