checkpoint: MMA support

2026-02-24 19:03:22 -05:00
parent 2bf55a89c2
commit f68a07e30e
9 changed files with 1795 additions and 0 deletions

# Mapping MMA to Manual Slop
This document maps the components of the `manual_slop` project to the 4-Tier Hierarchical Multi-Model Architecture.
## Tier 1: User-Facing Model (Orchestrator)
* **`gui.py` & `gui_2.py`:** Provide the user interface for input and display the synthesized output.
* **`ai_client.py`:** Acts as the primary orchestrator, managing the conversation loop and determining when to call specific tools or providers.
## Tier 2: Specialized Models (Experts/Tools)
* **`mcp_client.py`:** Provides a suite of specialized "tools" (e.g., `read_file`, `list_directory`, `search_files`) that act as domain experts for file system manipulation.
* **`shell_runner.py`:** A specialist tool for executing PowerShell scripts to perform system-level changes.
* **External AI Providers:** Gemini and Anthropic models are used as the "engines" behind these specialized operations.
## Tier 3: Data & Knowledge Base (Information)
* **`aggregate.py`:** The primary mechanism for building the context sent to the AI. It retrieves file contents and metadata to ground the AI's reasoning.
* **`manual_slop.toml`:** Stores project-specific configuration, tracked files, and discussion history.
* **`file_cache.py`:** Optimizes data retrieval from the local file system.
## Tier 4: Monitoring & Feedback (Governance)
* **`session_logger.py`:** Handles timestamped logging of communication history (`logs/comms_<ts>.log`) and tool calls.
* **`performance_monitor.py`:** Tracks metrics related to execution time and resource usage.
* **Script Archival:** Generated `.ps1` scripts are saved to `scripts/generated/` for later review and auditing.

(File diff suppressed because it is too large.)

`MMA_Support/Overview.md` (new file)
# 4-Tier Hierarchical Multi-Model Architecture (MMA) - Overview
The 4-Tier Hierarchical Multi-Model Architecture is a conceptual framework designed to manage complexity in AI systems by decomposing responsibilities into distinct, specialized layers. This modular approach enhances scalability, maintainability, and overall system performance.
## Architectural Tiers
1. **Tier 1: User-Facing Model (The Orchestrator/Router)**
* Direct user interface and intent interpretation.
* Routes requests to appropriate specialized models or tools.
2. **Tier 2: Specialized Models (The Experts/Tools)**
* Domain-specific models or tools (e.g., code generation, data analysis).
* Performs the "heavy lifting" for specific tasks.
3. **Tier 3: Data & Knowledge Base (The Information Layer)**
* A repository of structured and unstructured information.
* Provides context and facts to specialized models.
4. **Tier 4: Monitoring & Feedback (The Governance Layer)**
* Overarching layer for evaluation, error analysis, and continuous improvement.
* Closes the loop between user experience and model refinement.
## Core Goals
* **Modularity:** Decouple different functions to allow for independent development.
* **Efficiency:** Use smaller, specialized models for specific tasks instead of one monolithic model.
* **Contextual Accuracy:** Ensure specialized tools have access to relevant data.
* **Continuous Improvement:** Establish a systematic way to monitor performance and iterate.

# Principles & Interactions
The effectiveness of the 4-Tier Multi-Model Architecture depends on well-defined interfaces and clear communication protocols between layers.
## Interaction Flow
1. **Ingress:** The User sends a query to Tier 1.
2. **Intent & Routing:** Tier 1 analyzes the query and identifies the required expertise.
3. **Specialist Call:** Tier 1 dispatches a request to one or more Tier 2 specialists.
4. **Knowledge Retrieval:** Tier 2 specialists query Tier 3 for specific facts or context needed for their task.
5. **Execution:** Tier 2 specialists process the request using the retrieved data.
6. **Synthesis:** Tier 1 receives the output from Tier 2, synthesizes it, and presents it to the User.
7. **Observation:** Tier 4 logs the entire transaction, collects feedback, and updates metrics.
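The seven-step flow above can be sketched as a minimal Python loop. All function and variable names here are hypothetical stand-ins, not part of the project's actual API:

```python
audit_log = []  # Tier 4's record of every transaction

def tier3_retrieve(query):
    # Knowledge retrieval: return context snippets for the query (stubbed).
    return [f"fact about {query}"]

def tier2_specialist(task, context):
    # Specialist call + execution: process the task with retrieved context.
    return {"task": task, "result": f"handled '{task}' using {len(context)} fact(s)"}

def tier4_log(record):
    # Observation: append the transaction to the audit trail.
    audit_log.append(record)

def tier1_handle(user_query):
    # Ingress + intent & routing (routing is trivial in this sketch).
    task = user_query.lower()
    context = tier3_retrieve(task)             # step 4: knowledge retrieval
    output = tier2_specialist(task, context)   # steps 3 & 5: specialist call + execution
    response = f"Done: {output['result']}"     # step 6: synthesis
    tier4_log({"query": user_query, "response": response})  # step 7: observation
    return response

print(tier1_handle("Refactor the ai_client.py"))
```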
## Core Architectural Principles
### 1. Separation of Concerns
Each tier should have a single, clear responsibility. Tier 1 should not perform heavy computation; Tier 2 should not handle user-facing conversation logic.
### 2. Standardized Communication
Use structured data formats (like JSON) for all inter-tier communication. This ensures that different models (potentially from different providers) can work together seamlessly.
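As a hedged illustration of what a standardized inter-tier message might look like (the field names are illustrative, not a fixed spec), a Tier 1 to Tier 2 request could be a small JSON envelope that round-trips cleanly:

```python
import json

# Illustrative Tier 1 -> Tier 2 request envelope; field names are assumptions.
request = {
    "tier": 2,
    "tool": "read_file",
    "arguments": {"path": "ai_client.py"},
    "request_id": "req-0001",
}

# Round-trip through JSON to confirm the envelope is serializable as-is.
wire = json.dumps(request)
decoded = json.loads(wire)
assert decoded == request
print(wire)
```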
### 3. Graceful Degradation
If a Tier 2 specialist fails or is unavailable, Tier 1 should be able to fall back to a more general model or provide a meaningful error message to the user.
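A minimal sketch of this fallback chain, assuming a hypothetical specialist registry (nothing here mirrors the project's real dispatch code):

```python
def call_specialist(name, task):
    # Simulated Tier 2 registry; the 'code_expert' specialist is down in this sketch.
    registry = {"general": lambda t: f"general answer for: {t}"}
    if name not in registry:
        raise LookupError(f"specialist '{name}' unavailable")
    return registry[name](task)

def tier1_dispatch(task, preferred="code_expert"):
    # Graceful degradation: preferred specialist, then a general model,
    # then a meaningful error message to the user.
    try:
        return call_specialist(preferred, task)
    except LookupError:
        try:
            return call_specialist("general", task)
        except LookupError:
            return "Sorry, no specialist is available for this request right now."

print(tier1_dispatch("fix the bug"))  # falls back to the general model
```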
### 4. Verification Over Trust
Tier 1 should validate the output of Tier 2 specialists before presenting it to the user. Tier 4 should periodically audit the entire pipeline to ensure quality and safety.
### 5. Data Privacy & Governance
Ensure that data flowing through Tier 3 and 4 is handled according to security policies, with proper sanitization and access controls.

# Technical Deep Dive: Paths & Nuances
This document explores the low-level technical execution paths and implementation nuances of the 4-Tier Hierarchical Multi-Model Architecture.
## 1. Execution Paths
The architecture distinguishes between different "paths" to optimize for latency, cost, and accuracy.
### A. The Fast Path (Reactive)
* **Trigger:** Low-complexity intents (e.g., "Hello", "What is the current time?", "Status check").
* **Flow:** User -> Tier 1 -> User.
* **Nuance:** Tier 1 identifies that no specialized knowledge (Tier 3) or tool execution (Tier 2) is required. It responds directly using its internal weights or a local cache.
* **Goal:** Sub-100ms response time.
### B. The Slow Path (Reflective / Agentic)
* **Trigger:** Complex tasks (e.g., "Fix the bug in the UI layout", "Refactor the ai_client.py").
* **Flow:** User -> Tier 1 (Intent) -> Tier 2 (Specialist) -> Tier 3 (Context/RAG) -> Tier 2 (Execution) -> Tier 1 (Synthesis) -> User.
* **Nuance:** This involves high-latency operations, including tool calls and codebase searches. Tier 1 acts as a supervisor, potentially looping back to Tier 2 if the initial output is insufficient.
### C. The Governance Path (Tier 4 Integration)
* **Trigger:** Any operation that modifies the system or presents a high-risk answer.
* **Flow:** (Parallel or Post-hoc) Tier 1/2 Output -> Tier 4 (Validation) -> User/Log.
* **Nuance:** Tier 4 runs an "LLM-as-a-judge" or a static analysis tool (like `ruff` or `mypy`) on the output. If validation fails, the system may automatically trigger a "re-plan" in Tier 1.
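The three path selections above can be caricatured with a toy heuristic router. A real Tier 1 would use an LLM for this classification; the keyword lists below are purely illustrative:

```python
def classify_path(query):
    # Toy heuristic: short greetings/status checks take the Fast Path,
    # everything else takes the Slow (agentic) Path.
    fast_triggers = ("hello", "status", "time")
    if any(t in query.lower() for t in fast_triggers) and len(query) < 40:
        return "fast"
    return "slow"

def needs_governance(action):
    # Governance Path: any system-modifying action gets Tier 4 validation.
    return action in {"write_file", "run_script"}

assert classify_path("Hello") == "fast"
assert classify_path("Fix the bug in the UI layout") == "slow"
assert needs_governance("run_script")
```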
---
## 2. Context & Token Management
A critical nuance is how the limited context window (token budget) is managed across tiers.
### A. Token Budgeting
* **Tier 1 (Global Context):** Holds the conversation history and high-level project metadata. Budget: ~20% of window.
* **Tier 2 (Local Context):** Receives a "surgical" injection of relevant files/data from Tier 3. Budget: ~60% of window.
* **Output Space:** Reserved for generating large code blocks or summaries. Budget: ~20% of window.
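The 20/60/20 split above can be computed mechanically; this sketch just allocates integer token budgets and pushes any rounding remainder into the output reservation:

```python
def split_budget(window_tokens, shares=(0.20, 0.60, 0.20)):
    # Split the context window into Tier 1 / Tier 2 / output budgets.
    t1, t2, out = (int(window_tokens * s) for s in shares)
    # Give any rounding remainder to the output reservation.
    out += window_tokens - (t1 + t2 + out)
    return {"tier1_global": t1, "tier2_local": t2, "output": out}

budget = split_budget(128_000)
print(budget)  # {'tier1_global': 25600, 'tier2_local': 76800, 'output': 25600}
```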
### B. Context Folding (The "Accordion" Effect)
To prevent context overflow, the system "folds" (summarizes) older parts of the conversation.
* **Recent History:** Full fidelity.
* **Mid-term History:** Summarized by Tier 1.
* **Long-term History:** Archived in Tier 3 (searchable but not in-context).
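A minimal sketch of the folding step, with the Tier 1 summarizer stubbed out as a counter (a real system would generate an actual summary):

```python
def fold_history(turns, keep_recent=4, summarize=None):
    # "Accordion" folding: keep the newest turns at full fidelity and
    # collapse everything older into a single summary entry.
    summarize = summarize or (lambda old: f"[summary of {len(old)} earlier turns]")
    if len(turns) <= keep_recent:
        return list(turns)
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [summarize(old)] + recent

history = [f"turn {i}" for i in range(10)]
folded = fold_history(history)
print(folded)  # one summary entry followed by turns 6..9
```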
---
## 3. Communication Protocols
* **Inter-Tier Format:** Strictly structured JSON (e.g., OpenAI Tool Call format or Google GenAI Function Call).
* **Streaming:** Tier 1 typically streams its "thinking" process (Slow Path) to provide the user with immediate feedback while Tier 2 is still working.
* **Handshake:** Tier 2 must acknowledge receipt of context from Tier 3 with a "Digest" hash to ensure data integrity.
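The "Digest" handshake can be sketched with a SHA-256 over a canonical JSON serialization; the payload shape is hypothetical:

```python
import hashlib
import json

def context_digest(context):
    # Deterministic digest of a context payload: Tier 3 sends this alongside
    # the data, and Tier 2 recomputes it to acknowledge intact receipt.
    canonical = json.dumps(context, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

payload = {"files": ["ai_client.py"], "snippets": ["def main(): ..."]}
sent = context_digest(payload)
received = context_digest(payload)   # Tier 2 recomputes on its own copy
assert sent == received              # handshake succeeds; data is intact
```

Sorting the keys before hashing matters: two semantically identical JSON objects with different key order would otherwise produce different digests.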
---
## 4. Nuances vs. Standard RAG
| Feature | Standard RAG | MMA (4-Tier) |
| :--- | :--- | :--- |
| **Logic** | Flat (Query -> Doc -> Result) | Hierarchical (Intent -> Route -> Expert -> Doc) |
| **Expertise** | Homogeneous | Heterogeneous (Different models for different tiers) |
| **Feedback** | Manual | Automated (Tier 4 Closed-loop) |
| **State** | Stateless or simple session | Multi-layered state (Orchestrator vs Specialist state) |

# Tier 1: User-Facing Model (Orchestrator/Router)
The User-Facing Model is the entry point for all user interactions. It serves as the "brain" that understands what the user wants and decides how the system should respond.
## Key Responsibilities
### 1. Intent Recognition
* Analyze the user's natural language input.
* Classify the request into one or more categories (e.g., "request for code", "general inquiry", "data analysis").
* Extract key parameters and constraints from the user's query.
### 2. Routing
* Map recognized intents to specific Tier 2 models or tools.
* Determine if multiple specialized tools need to be called in sequence or parallel.
* Handle tool dispatching and manage the flow of data between tiers.
### 3. Context Management
* Maintain the history of the conversation.
* Decide what information from the history is relevant to the current turn.
* Synthesize a coherent prompt for downstream models based on the current context.
### 4. Response Synthesis
* Integrate the raw outputs from Tier 2 models into a final, user-friendly response.
* Ensure the tone and style are consistent with user expectations.
* Validate that the final response directly addresses the user's original intent.
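The intent-recognition and routing responsibilities can be sketched as a keyword classifier feeding a routing table. A real Tier 1 would use an LLM for classification; the keywords and specialist names below are illustrative only:

```python
ROUTES = {
    "request for code": "code_expert",
    "general inquiry": "general_model",
    "data analysis": "data_scientist",
}

def recognize_intent(query):
    # Toy keyword classifier standing in for an LLM intent model.
    q = query.lower()
    if any(k in q for k in ("code", "bug", "refactor")):
        return "request for code"
    if any(k in q for k in ("plot", "csv", "statistics")):
        return "data analysis"
    return "general inquiry"

def route(query):
    # Routing: map the recognized intent to a Tier 2 specialist.
    return ROUTES[recognize_intent(query)]

assert route("Refactor the ai_client.py code") == "code_expert"
assert route("What is the weather?") == "general_model"
```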
## Characteristics
* **High Reasoning:** Needs to be strong at logic and instruction following.
* **General Purpose:** While not necessarily a domain expert, it must be broad enough to understand any valid user input.
* **Speed:** Should ideally be responsive to minimize perceived latency.

# Tier 2: Specialized Models (Experts/Tools)
Tier 2 consists of a collection of specialized agents, models, or tools, each optimized for a specific domain or task. This allows the system to leverage "best-in-class" capabilities for different problems.
## Key Responsibilities
### 1. Task Execution
* Perform deep processing in a specific area (e.g., writing Python code, generating images, performing complex mathematical calculations).
* Operate within the constraints provided by the Tier 1 Orchestrator.
### 2. Domain Expertise
* Provide specialized knowledge that a general model might lack.
* Utilize specialized formatting or protocols (e.g., returning structured JSON for data analysis tools).
### 3. Tool Integration
* Act as wrappers for external APIs or local scripts (e.g., `shell_runner` in Manual Slop).
* Manage their own internal state or "scratchpad" during complex multi-step operations.
## Common Specialist Examples
* **Code Expert:** Optimized for high-quality software engineering and debugging.
* **Search/Web Tool:** Specialized in retrieving and summarizing real-time information.
* **Data Scientist:** Capable of running statistical models and generating visualizations.
* **Creative Writer:** Focused on tone, narrative, and artistic expression.
## Implementation Principles
* **Fine-Tuning:** Models in this tier are often smaller models fine-tuned on specialized datasets.
* **Isolation:** Specialists should ideally be stateless or have well-defined, temporary state to prevent cross-contamination.
* **Interface Standards:** Use consistent input/output formats (like JSON) to simplify communication with Tier 1.
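The isolation and interface-standard principles can be sketched with a hypothetical specialist class: a uniform result type for Tier 1 to consume, and no state held between calls. None of these names come from the project itself:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class SpecialistResult:
    # Consistent output shape so Tier 1 can handle any specialist uniformly.
    ok: bool
    data: Any
    error: Optional[str] = None

class CodeExpert:
    name = "code_expert"

    def run(self, task: dict) -> SpecialistResult:
        # Stateless: everything needed arrives in `task`; nothing is stored
        # between calls, so requests cannot cross-contaminate.
        if "source" not in task:
            return SpecialistResult(ok=False, data=None, error="missing 'source'")
        return SpecialistResult(ok=True, data=f"reviewed {len(task['source'])} chars")

result = CodeExpert().run({"source": "def main(): pass"})
assert result.ok
```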

# Tier 3: Data & Knowledge Base (Information Layer)
Tier 3 is the foundational layer that provides the necessary facts, documents, and data required by the higher tiers. It is a passive repository that enables informed reasoning and specialized processing.
## Key Responsibilities
### 1. Information Storage
* Maintain large-scale repositories of structured data (SQL/NoSQL databases) and unstructured data (PDFs, Markdown files, Codebases).
* Host internal company documents, project-specific files, and external knowledge graphs.
### 2. Retrieval Mechanisms (RAG)
* Support efficient querying via Vector Search, keyword indexing, or metadata filtering.
* Provide Retrieval-Augmented Generation (RAG) capabilities to enrich the prompts of Tier 2 models with relevant snippets.
### 3. Contextual Enrichment
* Supply specialized models with "ground truth" data to minimize hallucinations.
* Manage versioned data to ensure the system reflects the most up-to-date information.
## Components
* **Vector Databases:** (e.g., Pinecone, Milvus, Chroma) for semantic search.
* **Traditional Databases:** (e.g., PostgreSQL) for structured business data.
* **File Systems:** Local or cloud storage for direct file access.
* **External APIs:** Real-time data sources (weather, finance, etc.).
## Interactions
* Tier 2 specialists query Tier 3 to get the data they need to perform their tasks.
* Tier 1 may occasionally query Tier 3 directly to determine if sufficient information exists before routing.
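A toy sketch of the Tier 2 → Tier 3 query, using keyword overlap in place of vector search (the document contents are invented for the example):

```python
def retrieve(query, documents, top_k=2):
    # Minimal keyword-overlap retrieval standing in for vector search:
    # score each document by how many query terms it shares.
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(text.lower().split())), name)
        for name, text in documents.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]

docs = {
    "ai_client.py": "orchestrator conversation loop provider routing",
    "file_cache.py": "cache file system retrieval optimization",
    "gui.py": "user interface input display",
}
print(retrieve("file system cache", docs))  # ['file_cache.py']
```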

# Tier 4: Monitoring & Feedback (Governance Layer)
Tier 4 acts as the "supervisor" of the entire architecture. It ensures the system is performing correctly, ethically, and efficiently, while providing a path for continuous evolution.
## Key Responsibilities
### 1. Performance Monitoring
* Track latency, token usage, and error rates across all tiers.
* Identify bottlenecks (e.g., a Tier 2 specialist that is consistently slow).
### 2. Evaluation & Feedback
* Collect explicit user feedback (e.g., "Good/Bad" ratings).
* Perform automated evaluation using "LLM-as-a-judge" to score responses based on accuracy, tone, and safety.
* Log failures for manual review and human-in-the-loop (HITL) intervention.
### 3. Error Analysis & Root Cause
* Analyze why specific routes failed or why a specialist produced a low-quality output.
* Maintain a "lesson learned" database to inform future system prompts or fine-tuning.
### 4. Continuous Improvement
* Inform the retraining or fine-tuning of Tier 2 models based on real-world usage patterns.
* Optimize Tier 1 routing logic based on success/failure metrics.
## Tools & Techniques
* **Logging/Observability:** (e.g., LangSmith, Weights & Biases, custom JSON-L logs).
* **A/B Testing:** Compare different model versions or routing strategies.
* **Red Teaming:** Proactively test the system for vulnerabilities and biases.
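A minimal sketch of a Tier 4 observer combining JSON-L-style event records with the latency and error-rate metrics described above (the record fields are assumptions, not the project's actual log schema):

```python
import json
import statistics
import time

class Monitor:
    # Minimal Tier 4 observer: one JSON-L line per event, plus summary metrics.
    def __init__(self):
        self.records = []

    def log(self, tier, event, latency_ms, ok=True):
        record = {"ts": time.time(), "tier": tier, "event": event,
                  "latency_ms": latency_ms, "ok": ok}
        self.records.append(record)
        return json.dumps(record)  # ready to append to a .jsonl file

    def error_rate(self):
        return sum(not r["ok"] for r in self.records) / len(self.records)

    def p50_latency(self):
        return statistics.median(r["latency_ms"] for r in self.records)

mon = Monitor()
mon.log(2, "tool_call", 120)
mon.log(2, "tool_call", 300, ok=False)
assert mon.error_rate() == 0.5
assert mon.p50_latency() == 210
```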