checkpoint: support MMA
MMA_Support/Manual_Slop_Mapping.md (new file, 22 lines)
@@ -0,0 +1,22 @@
# Mapping MMA to Manual Slop

This document maps the components of the `manual_slop` project onto the 4-Tier Hierarchical Multi-Model Architecture.

## Tier 1: User-Facing Model (Orchestrator)

* **`gui.py` & `gui_2.py`:** Provide the user interface for input and display the synthesized output.
* **`ai_client.py`:** Acts as the primary orchestrator, managing the conversation loop and determining when to call specific tools or providers.

## Tier 2: Specialized Models (Experts/Tools)

* **`mcp_client.py`:** Provides a suite of specialized "tools" (e.g., `read_file`, `list_directory`, `search_files`) that act as domain experts for file system manipulation.
* **`shell_runner.py`:** A specialist tool for executing PowerShell scripts to perform system-level changes.
* **External AI Providers:** Gemini and Anthropic models serve as the "engines" behind these specialized operations.

## Tier 3: Data & Knowledge Base (Information)

* **`aggregate.py`:** The primary mechanism for building the context sent to the AI. It retrieves file contents and metadata to ground the AI's reasoning.
* **`manual_slop.toml`:** Stores project-specific configuration, tracked files, and discussion history.
* **`file_cache.py`:** Optimizes data retrieval from the local file system.

## Tier 4: Monitoring & Feedback (Governance)

* **`session_logger.py`:** Handles timestamped logging of communication history (`logs/comms_<ts>.log`) and tool calls.
* **`performance_monitor.py`:** Tracks metrics related to execution time and resource usage.
* **Script Archival:** Generated `.ps1` scripts are saved to `scripts/generated/` for later review and auditing.
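The mapping above can be sketched as a single loop: the orchestrator routes, Tier 3 supplies context, a specialist executes, and the result is logged. This is an illustrative sketch only; the function names and keyword heuristics below are hypothetical, not the real `manual_slop` API.

```python
# Hypothetical sketch of the tier interactions in manual_slop.
# Names and routing heuristics are illustrative, not the project's real API.

def route(user_input: str) -> str:
    """Tier 1: decide which Tier 2 specialist handles the request."""
    if "file" in user_input.lower():
        return "mcp_client"       # file-system tools
    if "script" in user_input.lower():
        return "shell_runner"     # PowerShell execution
    return "provider"             # fall back to an external AI provider

def handle(user_input: str) -> dict:
    specialist = route(user_input)            # Tier 1: routing
    context = {"files": ["README.md"]}        # Tier 3: aggregate.py would build this
    result = f"{specialist} handled: {user_input}"  # Tier 2: specialist runs
    # Tier 4: session_logger would record this transaction.
    return {"specialist": specialist, "context": context, "result": result}
```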
MMA_Support/OriginalDiscussion.md (new file, 1545 lines)
File diff suppressed because it is too large.
MMA_Support/Overview.md (new file, 27 lines)
@@ -0,0 +1,27 @@
# 4-Tier Hierarchical Multi-Model Architecture (MMA) - Overview

The 4-Tier Hierarchical Multi-Model Architecture is a conceptual framework designed to manage complexity in AI systems by decomposing responsibilities into distinct, specialized layers. This modular approach enhances scalability, maintainability, and overall system performance.

## Architectural Tiers

1. **Tier 1: User-Facing Model (The Orchestrator/Router)**
   * Direct user interface and intent interpretation.
   * Routes requests to appropriate specialized models or tools.

2. **Tier 2: Specialized Models (The Experts/Tools)**
   * Domain-specific models or tools (e.g., code generation, data analysis).
   * Perform the "heavy lifting" for specific tasks.

3. **Tier 3: Data & Knowledge Base (The Information Layer)**
   * A repository of structured and unstructured information.
   * Provides context and facts to specialized models.

4. **Tier 4: Monitoring & Feedback (The Governance Layer)**
   * Overarching layer for evaluation, error analysis, and continuous improvement.
   * Closes the loop between user experience and model refinement.
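The tier layout can be captured in a small declarative structure; a minimal sketch with illustrative names (none of these identifiers come from a real codebase):

```python
# Illustrative encoding of the four tiers and a lookup helper.
TIERS = {
    1: {"name": "User-Facing Model", "role": "Orchestrator/Router"},
    2: {"name": "Specialized Models", "role": "Experts/Tools"},
    3: {"name": "Data & Knowledge Base", "role": "Information Layer"},
    4: {"name": "Monitoring & Feedback", "role": "Governance Layer"},
}

def tier_for(role_keyword: str) -> int:
    """Find which tier owns a given role keyword."""
    for tier, info in TIERS.items():
        if role_keyword.lower() in info["role"].lower():
            return tier
    raise KeyError(role_keyword)
```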
## Core Goals

* **Modularity:** Decouple different functions to allow for independent development.
* **Efficiency:** Use smaller, specialized models for specific tasks instead of one monolithic model.
* **Contextual Accuracy:** Ensure specialized tools have access to relevant data.
* **Continuous Improvement:** Establish a systematic way to monitor performance and iterate.
MMA_Support/Principles_Interactions.md (new file, 30 lines)
@@ -0,0 +1,30 @@
# Principles & Interactions

The effectiveness of the 4-Tier Multi-Model Architecture depends on well-defined interfaces and clear communication protocols between layers.

## Interaction Flow

1. **Ingress:** The user sends a query to Tier 1.
2. **Intent & Routing:** Tier 1 analyzes the query and identifies the required expertise.
3. **Specialist Call:** Tier 1 dispatches a request to one or more Tier 2 specialists.
4. **Knowledge Retrieval:** Tier 2 specialists query Tier 3 for the facts or context their task needs.
5. **Execution:** Tier 2 specialists process the request using the retrieved data.
6. **Synthesis:** Tier 1 receives the output from Tier 2, synthesizes it, and presents it to the user.
7. **Observation:** Tier 4 logs the entire transaction, collects feedback, and updates metrics.
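The seven steps above can be sketched end to end. Everything here is illustrative: the toy intent check stands in for real intent analysis, and the trace stands in for Tier 4 logging.

```python
# End-to-end sketch of the interaction flow; all names are illustrative.

def run_pipeline(query: str) -> dict:
    trace = []                                         # Tier 4: observation log
    intent = "code" if "code" in query else "general"  # step 2: intent & routing
    trace.append(("route", intent))
    context = ["doc snippet about " + intent]          # step 4: Tier 3 retrieval
    specialist_out = f"[{intent} expert] answer using {len(context)} snippet(s)"  # step 5
    trace.append(("specialist", intent))
    answer = "Synthesized: " + specialist_out          # step 6: Tier 1 synthesis
    trace.append(("respond", "user"))                  # step 7: observation complete
    return {"answer": answer, "trace": trace}
```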
## Core Architectural Principles

### 1. Separation of Concerns

Each tier should have a single, clear responsibility. Tier 1 should not perform heavy computation; Tier 2 should not handle user-facing conversation logic.

### 2. Standardized Communication

Use structured data formats (like JSON) for all inter-tier communication. This ensures that different models (potentially from different providers) can work together seamlessly.
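A minimal sketch of what a standardized inter-tier message might look like. The schema is an assumption for illustration, not a published standard:

```python
# Illustrative inter-tier message envelope with minimal validation.
import json

REQUIRED_FIELDS = ("source_tier", "target_tier", "action", "payload")

def make_request(source_tier: int, target_tier: int, action: str, payload: dict) -> str:
    """Serialize a structured inter-tier request to JSON."""
    return json.dumps({
        "source_tier": source_tier,
        "target_tier": target_tier,
        "action": action,
        "payload": payload,
    })

def parse_request(raw: str) -> dict:
    """Deserialize and validate: every message must carry the required keys."""
    message = json.loads(raw)
    for key in REQUIRED_FIELDS:
        if key not in message:
            raise ValueError(f"missing field: {key}")
    return message
```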
### 3. Graceful Degradation

If a Tier 2 specialist fails or is unavailable, Tier 1 should be able to fall back to a more general model or provide a meaningful error message to the user.
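A minimal fallback sketch, assuming the specialist surfaces failure as a connection or timeout error (the function names are hypothetical):

```python
# Graceful degradation: try the specialist, then fall back to a general model.

def call_specialist(query: str) -> str:
    raise ConnectionError("specialist unavailable")  # simulated outage

def call_general_model(query: str) -> str:
    return f"[general model] best-effort answer to: {query}"

def answer_with_fallback(query: str) -> str:
    try:
        return call_specialist(query)
    except (ConnectionError, TimeoutError):
        # Degrade gracefully instead of surfacing a raw failure to the user.
        return call_general_model(query)
```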
### 4. Verification Over Trust

Tier 1 should validate the output of Tier 2 specialists before presenting it to the user. Tier 4 should periodically audit the entire pipeline to ensure quality and safety.

### 5. Data Privacy & Governance

Ensure that data flowing through Tiers 3 and 4 is handled according to security policies, with proper sanitization and access controls.
MMA_Support/Technical_Deep_Dive.md (new file, 59 lines)
@@ -0,0 +1,59 @@
# Technical Deep Dive: Paths & Nuances

This document explores the low-level technical execution paths and implementation nuances of the 4-Tier Hierarchical Multi-Model Architecture.

## 1. Execution Paths

The architecture distinguishes between different "paths" to optimize for latency, cost, and accuracy.

### A. The Fast Path (Reactive)

* **Trigger:** Low-complexity intents (e.g., "Hello", "What is the current time?", "Status check").
* **Flow:** User -> Tier 1 -> User.
* **Nuance:** Tier 1 identifies that no specialized knowledge (Tier 3) or tool execution (Tier 2) is required. It responds directly using its internal weights or a local cache.
* **Goal:** Sub-100ms response time.

### B. The Slow Path (Reflective / Agentic)

* **Trigger:** Complex tasks (e.g., "Fix the bug in the UI layout", "Refactor `ai_client.py`").
* **Flow:** User -> Tier 1 (Intent) -> Tier 2 (Specialist) -> Tier 3 (Context/RAG) -> Tier 2 (Execution) -> Tier 1 (Synthesis) -> User.
* **Nuance:** This involves high-latency operations, including tool calls and codebase searches. Tier 1 acts as a supervisor, potentially looping back to Tier 2 if the initial output is insufficient.

### C. The Governance Path (Tier 4 Integration)

* **Trigger:** Any operation that modifies the system or presents a high-risk answer.
* **Flow:** (Parallel or post-hoc) Tier 1/2 output -> Tier 4 (Validation) -> User/Log.
* **Nuance:** Tier 4 runs an "LLM-as-a-judge" or a static analysis tool (like `ruff` or `mypy`) on the output. If validation fails, the system may automatically trigger a "re-plan" in Tier 1.
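The three paths imply a dispatch decision in Tier 1. A minimal sketch; the intent set and the `modifies_system` flag are made-up heuristics standing in for real intent classification:

```python
# Illustrative path selection for the fast/slow/governance split.

FAST_INTENTS = {"hello", "status check", "what is the current time?"}

def choose_path(query: str, modifies_system: bool = False) -> str:
    if modifies_system:
        return "governance"   # Tier 4 must validate alongside or after delivery
    if query.strip().lower() in FAST_INTENTS:
        return "fast"         # Tier 1 answers directly from weights or cache
    return "slow"             # full agentic loop through Tiers 2 and 3
```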
---

## 2. Context & Token Management

A critical nuance is how the limited context window (token budget) is managed across tiers.

### A. Token Budgeting

* **Tier 1 (Global Context):** Holds the conversation history and high-level project metadata. Budget: ~20% of the window.
* **Tier 2 (Local Context):** Receives a "surgical" injection of relevant files/data from Tier 3. Budget: ~60% of the window.
* **Output Space:** Reserved for generating large code blocks or summaries. Budget: ~20% of the window.
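Applied to a hypothetical 128k-token window (the window size is an assumption for illustration), the 20/60/20 split works out as:

```python
# Token budget split from the section above: 20% global, 60% local, 20% output.

def token_budget(window: int) -> dict:
    budget = {
        "tier1_global": int(window * 0.20),
        "tier2_local": int(window * 0.60),
        "output": int(window * 0.20),
    }
    assert sum(budget.values()) <= window  # never over-allocate the window
    return budget
```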
### B. Context Folding (The "Accordion" Effect)

To prevent context overflow, the system "folds" (summarizes) older parts of the conversation.

* **Recent History:** Full fidelity.
* **Mid-term History:** Summarized by Tier 1.
* **Long-term History:** Archived in Tier 3 (searchable but not in-context).
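A sketch of the fold: keep the newest turns verbatim, mark the middle band for summarization, and archive the rest. The thresholds are illustrative:

```python
# "Accordion" fold over conversation history; thresholds are illustrative.

def fold_history(turns: list[str], keep_recent: int = 4, keep_mid: int = 4) -> dict:
    recent = turns[-keep_recent:] if turns else []            # full fidelity
    older = turns[:-keep_recent] if len(turns) > keep_recent else []
    mid = older[-keep_mid:] if older else []                  # to be summarized
    archived = older[:-keep_mid] if len(older) > keep_mid else []  # to Tier 3
    summary = f"[summary of {len(mid)} turns]" if mid else ""
    return {"recent": recent, "summary": summary, "archived": archived}
```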
---

## 3. Communication Protocols

* **Inter-Tier Format:** Strictly structured JSON (e.g., OpenAI Tool Call format or Google GenAI Function Call).
* **Streaming:** Tier 1 typically streams its "thinking" process (Slow Path) to provide the user with immediate feedback while Tier 2 is still working.
* **Handshake:** Tier 2 must acknowledge receipt of context from Tier 3 with a "digest" hash to ensure data integrity.
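The digest handshake can be sketched with a content hash; SHA-256 is an assumed choice here, not one mandated by the architecture:

```python
# Digest handshake sketch: Tier 3 sends context with a hash, Tier 2 recomputes
# it to confirm the payload arrived intact.
import hashlib

def send_context(payload: str) -> dict:
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"payload": payload, "digest": digest}

def acknowledge(message: dict) -> bool:
    """Tier 2 side: a digest mismatch means corrupted or partial context."""
    expected = hashlib.sha256(message["payload"].encode("utf-8")).hexdigest()
    return expected == message["digest"]
```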
---

## 4. Nuances vs. Standard RAG

| Feature | Standard RAG | MMA (4-Tier) |
| :--- | :--- | :--- |
| **Logic** | Flat (Query -> Doc -> Result) | Hierarchical (Intent -> Route -> Expert -> Doc) |
| **Expertise** | Homogeneous | Heterogeneous (different models for different tiers) |
| **Feedback** | Manual | Automated (Tier 4 closed-loop) |
| **State** | Stateless or simple session | Multi-layered state (orchestrator vs. specialist state) |
MMA_Support/Tier1_Orchestrator.md (new file, 30 lines)
@@ -0,0 +1,30 @@
# Tier 1: User-Facing Model (Orchestrator/Router)

The User-Facing Model is the entry point for all user interactions. It serves as the "brain" that understands what the user wants and decides how the system should respond.

## Key Responsibilities

### 1. Intent Recognition

* Analyze the user's natural language input.
* Classify the request into one or more categories (e.g., "request for code", "general inquiry", "data analysis").
* Extract key parameters and constraints from the user's query.

### 2. Routing

* Map recognized intents to specific Tier 2 models or tools.
* Determine whether multiple specialized tools need to be called in sequence or in parallel.
* Handle tool dispatching and manage the flow of data between tiers.
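Intent recognition and routing can be sketched together. A real Tier 1 would use a model for classification; the keyword heuristics and route names below are made up for illustration:

```python
# Illustrative intent classification and intent-to-specialist routing.

ROUTES = {
    "request for code": "code_expert",
    "data analysis": "data_scientist",
    "general inquiry": "general_model",
}

def classify_intent(query: str) -> str:
    q = query.lower()
    if any(word in q for word in ("code", "function", "bug")):
        return "request for code"
    if any(word in q for word in ("chart", "dataset", "statistics")):
        return "data analysis"
    return "general inquiry"

def route(query: str) -> str:
    """Map the recognized intent to a Tier 2 specialist."""
    return ROUTES[classify_intent(query)]
```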
### 3. Context Management

* Maintain the history of the conversation.
* Decide what information from the history is relevant to the current turn.
* Synthesize a coherent prompt for downstream models based on the current context.

### 4. Response Synthesis

* Integrate the raw outputs from Tier 2 models into a final, user-friendly response.
* Ensure the tone and style are consistent with user expectations.
* Validate that the final response directly addresses the user's original intent.

## Characteristics

* **High Reasoning:** Needs to be strong at logic and instruction following.
* **General Purpose:** While not necessarily a domain expert, it must be broad enough to understand any valid user input.
* **Speed:** Should ideally be responsive to minimize perceived latency.
MMA_Support/Tier2_Specialists.md (new file, 28 lines)
@@ -0,0 +1,28 @@
# Tier 2: Specialized Models (Experts/Tools)

Tier 2 consists of a collection of specialized agents, models, or tools, each optimized for a specific domain or task. This allows the system to leverage "best-in-class" capabilities for different problems.

## Key Responsibilities

### 1. Task Execution

* Perform deep processing in a specific area (e.g., writing Python code, generating images, performing complex mathematical calculations).
* Operate within the constraints provided by the Tier 1 Orchestrator.

### 2. Domain Expertise

* Provide specialized knowledge that a general model might lack.
* Utilize specialized formatting or protocols (e.g., returning structured JSON for data analysis tools).

### 3. Tool Integration

* Act as wrappers for external APIs or local scripts (e.g., `shell_runner` in Manual Slop).
* Manage their own internal state or "scratchpad" during complex multi-step operations.
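A minimal tool-wrapper sketch with a per-task scratchpad. This is illustrative only and not the real `shell_runner` interface; clearing the scratchpad on completion reflects the isolation principle below:

```python
# Illustrative Tier 2 tool wrapper with temporary, well-defined state.

class ToolWrapper:
    def __init__(self, name: str):
        self.name = name
        self.scratchpad: list[str] = []   # temporary state for multi-step work

    def run(self, step: str) -> str:
        self.scratchpad.append(step)      # record the intermediate step
        return f"{self.name} executed: {step}"

    def finish(self) -> dict:
        result = {"tool": self.name, "steps": len(self.scratchpad)}
        self.scratchpad.clear()           # isolation: no state leaks between tasks
        return result
```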
## Common Specialist Examples

* **Code Expert:** Optimized for high-quality software engineering and debugging.
* **Search/Web Tool:** Specialized in retrieving and summarizing real-time information.
* **Data Scientist:** Capable of running statistical models and generating visualizations.
* **Creative Writer:** Focused on tone, narrative, and artistic expression.

## Implementation Principles

* **Fine-Tuning:** Models in this tier are often smaller models fine-tuned on specialized datasets.
* **Isolation:** Specialists should ideally be stateless or have well-defined, temporary state to prevent cross-contamination.
* **Interface Standards:** Use consistent input/output formats (like JSON) to simplify communication with Tier 1.
MMA_Support/Tier3_Knowledge.md (new file, 27 lines)
@@ -0,0 +1,27 @@
# Tier 3: Data & Knowledge Base (Information Layer)

Tier 3 is the foundational layer that provides the facts, documents, and data required by the higher tiers. It is a passive repository that enables informed reasoning and specialized processing.

## Key Responsibilities

### 1. Information Storage

* Maintain large-scale repositories of structured data (SQL/NoSQL databases) and unstructured data (PDFs, Markdown files, codebases).
* Host internal company documents, project-specific files, and external knowledge graphs.

### 2. Retrieval Mechanisms (RAG)

* Support efficient querying via vector search, keyword indexing, or metadata filtering.
* Provide Retrieval-Augmented Generation (RAG) capabilities to enrich the prompts of Tier 2 models with relevant snippets.
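A toy keyword retriever stands in for the vector and keyword indexes described above. The corpus and scoring are illustrative only; a real Tier 3 would use embeddings or a proper inverted index:

```python
# Toy keyword retrieval over an in-memory corpus, ranked by word overlap.

CORPUS = {
    "doc1": "the orchestrator routes requests to specialists",
    "doc2": "powershell scripts perform system-level changes",
    "doc3": "vector databases enable semantic search",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by how many query words they contain; return top k ids."""
    words = set(query.lower().split())
    scored = sorted(
        CORPUS,
        key=lambda doc_id: len(words & set(CORPUS[doc_id].split())),
        reverse=True,
    )
    return scored[:k]
```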
### 3. Contextual Enrichment

* Supply specialized models with "ground truth" data to minimize hallucinations.
* Manage versioned data to ensure the system reflects the most up-to-date information.

## Components

* **Vector Databases:** (e.g., Pinecone, Milvus, Chroma) for semantic search.
* **Traditional Databases:** (e.g., PostgreSQL) for structured business data.
* **File Systems:** Local or cloud storage for direct file access.
* **External APIs:** Real-time data sources (weather, finance, etc.).

## Interactions

* Tier 2 specialists query Tier 3 to get the data they need to perform their tasks.
* Tier 1 may occasionally query Tier 3 directly to determine whether sufficient information exists before routing.
MMA_Support/Tier4_Monitoring.md (new file, 27 lines)
@@ -0,0 +1,27 @@
# Tier 4: Monitoring & Feedback (Governance Layer)

Tier 4 acts as the "supervisor" of the entire architecture. It ensures the system is performing correctly, ethically, and efficiently, while providing a path for continuous evolution.

## Key Responsibilities

### 1. Performance Monitoring

* Track latency, token usage, and error rates across all tiers.
* Identify bottlenecks (e.g., a Tier 2 specialist that is consistently slow).
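A minimal per-tier latency tracker sketch; the class and its methods are illustrative, not a real monitoring API:

```python
# Illustrative per-tier latency tracking and bottleneck identification.

class MetricsTracker:
    def __init__(self):
        self.calls: dict[int, list[float]] = {}   # tier -> latency samples (ms)

    def record(self, tier: int, latency_ms: float) -> None:
        self.calls.setdefault(tier, []).append(latency_ms)

    def mean_latency(self, tier: int) -> float:
        samples = self.calls.get(tier, [])
        return sum(samples) / len(samples) if samples else 0.0

    def slowest_tier(self) -> int:
        """The bottleneck: the tier with the highest mean latency."""
        return max(self.calls, key=self.mean_latency)
```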
### 2. Evaluation & Feedback

* Collect explicit user feedback (e.g., "Good/Bad" ratings).
* Perform automated evaluation using "LLM-as-a-judge" to score responses on accuracy, tone, and safety.
* Log failures for manual review and human-in-the-loop (HITL) intervention.

### 3. Error Analysis & Root Cause

* Analyze why specific routes failed or why a specialist produced low-quality output.
* Maintain a "lessons learned" database to inform future system prompts or fine-tuning.

### 4. Continuous Improvement

* Inform the retraining or fine-tuning of Tier 2 models based on real-world usage patterns.
* Optimize Tier 1 routing logic based on success/failure metrics.

## Tools & Techniques

* **Logging/Observability:** (e.g., LangSmith, Weights & Biases, custom JSONL logs).
* **A/B Testing:** Compare different model versions or routing strategies.
* **Red Teaming:** Proactively test the system for vulnerabilities and biases.