checkpoint: MMA support

2026-02-24 19:03:22 -05:00
parent 2bf55a89c2
commit f68a07e30e
9 changed files with 1795 additions and 0 deletions

# Mapping MMA to Manual Slop
This document maps the components of the `manual_slop` project to the 4-Tier Hierarchical Multi-Model Architecture.
## Tier 1: User-Facing Model (Orchestrator)
* **`gui.py` & `gui_2.py`:** Provide the user interface for input and display the synthesized output.
* **`ai_client.py`:** Acts as the primary orchestrator, managing the conversation loop and determining when to call specific tools or providers.
## Tier 2: Specialized Models (Experts/Tools)
* **`mcp_client.py`:** Provides a suite of specialized "tools" (e.g., `read_file`, `list_directory`, `search_files`) that act as domain experts for file system manipulation.
* **`shell_runner.py`:** A specialist tool for executing PowerShell scripts to perform system-level changes.
* **External AI Providers:** Gemini and Anthropic models are used as the "engines" behind these specialized operations.
## Tier 3: Data & Knowledge Base (Information)
* **`aggregate.py`:** The primary mechanism for building the context sent to the AI. It retrieves file contents and metadata to ground the AI's reasoning.
* **`manual_slop.toml`:** Stores project-specific configuration, tracked files, and discussion history.
* **`file_cache.py`:** Optimizes data retrieval from the local file system.
## Tier 4: Monitoring & Feedback (Governance)
* **`session_logger.py`:** Handles timestamped logging of communication history (`logs/comms_<ts>.log`) and tool calls.
* **`performance_monitor.py`:** Tracks metrics related to execution time and resource usage.
* **Script Archival:** Generated `.ps1` scripts are saved to `scripts/generated/` for later review and auditing.

(File diff suppressed because it is too large.)

`MMA_Support/Overview.md` (new file)
# 4-Tier Hierarchical Multi-Model Architecture (MMA) - Overview
The 4-Tier Hierarchical Multi-Model Architecture is a conceptual framework designed to manage complexity in AI systems by decomposing responsibilities into distinct, specialized layers. This modular approach enhances scalability, maintainability, and overall system performance.
## Architectural Tiers
1. **Tier 1: User-Facing Model (The Orchestrator/Router)**
* Direct user interface and intent interpretation.
* Routes requests to appropriate specialized models or tools.
2. **Tier 2: Specialized Models (The Experts/Tools)**
* Domain-specific models or tools (e.g., code generation, data analysis).
* Performs the "heavy lifting" for specific tasks.
3. **Tier 3: Data & Knowledge Base (The Information Layer)**
* A repository of structured and unstructured information.
* Provides context and facts to specialized models.
4. **Tier 4: Monitoring & Feedback (The Governance Layer)**
* Overarching layer for evaluation, error analysis, and continuous improvement.
* Closes the loop between user experience and model refinement.
## Core Goals
* **Modularity:** Decouple different functions to allow for independent development.
* **Efficiency:** Use smaller, specialized models for specific tasks instead of one monolithic model.
* **Contextual Accuracy:** Ensure specialized tools have access to relevant data.
* **Continuous Improvement:** Establish a systematic way to monitor performance and iterate.

# Principles & Interactions
The effectiveness of the 4-Tier Multi-Model Architecture depends on well-defined interfaces and clear communication protocols between layers.
## Interaction Flow
1. **Ingress:** The User sends a query to Tier 1.
2. **Intent & Routing:** Tier 1 analyzes the query and identifies the required expertise.
3. **Specialist Call:** Tier 1 dispatches a request to one or more Tier 2 specialists.
4. **Knowledge Retrieval:** Tier 2 specialists query Tier 3 for specific facts or context needed for their task.
5. **Execution:** Tier 2 specialists process the request using the retrieved data.
6. **Synthesis:** Tier 1 receives the output from Tier 2, synthesizes it, and presents it to the User.
7. **Observation:** Tier 4 logs the entire transaction, collects feedback, and updates metrics.
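The seven-step flow above can be sketched as a minimal Python loop. All function and variable names here are hypothetical stand-ins, not part of the project's actual API:

```python
audit_log = []  # Tier 4's record of every transaction

def tier3_retrieve(query):
    # Knowledge retrieval: return context snippets for the query (stubbed).
    return [f"fact about {query}"]

def tier2_specialist(task, context):
    # Specialist call + execution: process the task with retrieved context.
    return {"task": task, "result": f"handled '{task}' using {len(context)} fact(s)"}

def tier4_log(record):
    # Observation: append the transaction to the audit trail.
    audit_log.append(record)

def tier1_handle(user_query):
    # Ingress + intent & routing (routing is trivial in this sketch).
    task = user_query.lower()
    context = tier3_retrieve(task)             # step 4: knowledge retrieval
    output = tier2_specialist(task, context)   # steps 3 & 5: specialist call + execution
    response = f"Done: {output['result']}"     # step 6: synthesis
    tier4_log({"query": user_query, "response": response})  # step 7: observation
    return response

print(tier1_handle("Refactor the ai_client.py"))
```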
## Core Architectural Principles
### 1. Separation of Concerns
Each tier should have a single, clear responsibility. Tier 1 should not perform heavy computation; Tier 2 should not handle user-facing conversation logic.
### 2. Standardized Communication
Use structured data formats (like JSON) for all inter-tier communication. This ensures that different models (potentially from different providers) can work together seamlessly.
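As a hedged illustration of what a standardized inter-tier message might look like (the field names are illustrative, not a fixed spec), a Tier 1 to Tier 2 request could be a small JSON envelope that round-trips cleanly:

```python
import json

# Illustrative Tier 1 -> Tier 2 request envelope; field names are assumptions.
request = {
    "tier": 2,
    "tool": "read_file",
    "arguments": {"path": "ai_client.py"},
    "request_id": "req-0001",
}

# Round-trip through JSON to confirm the envelope is serializable as-is.
wire = json.dumps(request)
decoded = json.loads(wire)
assert decoded == request
print(wire)
```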
### 3. Graceful Degradation
If a Tier 2 specialist fails or is unavailable, Tier 1 should be able to fall back to a more general model or provide a meaningful error message to the user.
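A minimal sketch of this fallback chain, assuming a hypothetical specialist registry (nothing here mirrors the project's real dispatch code):

```python
def call_specialist(name, task):
    # Simulated Tier 2 registry; the 'code_expert' specialist is down in this sketch.
    registry = {"general": lambda t: f"general answer for: {t}"}
    if name not in registry:
        raise LookupError(f"specialist '{name}' unavailable")
    return registry[name](task)

def tier1_dispatch(task, preferred="code_expert"):
    # Graceful degradation: preferred specialist, then a general model,
    # then a meaningful error message to the user.
    try:
        return call_specialist(preferred, task)
    except LookupError:
        try:
            return call_specialist("general", task)
        except LookupError:
            return "Sorry, no specialist is available for this request right now."

print(tier1_dispatch("fix the bug"))  # falls back to the general model
```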
### 4. Verification Over Trust
Tier 1 should validate the output of Tier 2 specialists before presenting it to the user. Tier 4 should periodically audit the entire pipeline to ensure quality and safety.
### 5. Data Privacy & Governance
Ensure that data flowing through Tier 3 and 4 is handled according to security policies, with proper sanitization and access controls.

# Technical Deep Dive: Paths & Nuances
This document explores the low-level technical execution paths and implementation nuances of the 4-Tier Hierarchical Multi-Model Architecture.
## 1. Execution Paths
The architecture distinguishes between different "paths" to optimize for latency, cost, and accuracy.
### A. The Fast Path (Reactive)
* **Trigger:** Low-complexity intents (e.g., "Hello", "What is the current time?", "Status check").
* **Flow:** User -> Tier 1 -> User.
* **Nuance:** Tier 1 identifies that no specialized knowledge (Tier 3) or tool execution (Tier 2) is required. It responds directly using its internal weights or a local cache.
* **Goal:** Sub-100ms response time.
### B. The Slow Path (Reflective / Agentic)
* **Trigger:** Complex tasks (e.g., "Fix the bug in the UI layout", "Refactor the ai_client.py").
* **Flow:** User -> Tier 1 (Intent) -> Tier 2 (Specialist) -> Tier 3 (Context/RAG) -> Tier 2 (Execution) -> Tier 1 (Synthesis) -> User.
* **Nuance:** This involves high-latency operations, including tool calls and codebase searches. Tier 1 acts as a supervisor, potentially looping back to Tier 2 if the initial output is insufficient.
### C. The Governance Path (Tier 4 Integration)
* **Trigger:** Any operation that modifies the system or presents a high-risk answer.
* **Flow:** (Parallel or Post-hoc) Tier 1/2 Output -> Tier 4 (Validation) -> User/Log.
* **Nuance:** Tier 4 runs an "LLM-as-a-judge" or a static analysis tool (like `ruff` or `mypy`) on the output. If validation fails, the system may automatically trigger a "re-plan" in Tier 1.
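The three path selections above can be caricatured with a toy heuristic router. A real Tier 1 would use an LLM for this classification; the keyword lists below are purely illustrative:

```python
def classify_path(query):
    # Toy heuristic: short greetings/status checks take the Fast Path,
    # everything else takes the Slow (agentic) Path.
    fast_triggers = ("hello", "status", "time")
    if any(t in query.lower() for t in fast_triggers) and len(query) < 40:
        return "fast"
    return "slow"

def needs_governance(action):
    # Governance Path: any system-modifying action gets Tier 4 validation.
    return action in {"write_file", "run_script"}

assert classify_path("Hello") == "fast"
assert classify_path("Fix the bug in the UI layout") == "slow"
assert needs_governance("run_script")
```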
---
## 2. Context & Token Management
A critical nuance is how the limited context window (token budget) is managed across tiers.
### A. Token Budgeting
* **Tier 1 (Global Context):** Holds the conversation history and high-level project metadata. Budget: ~20% of window.
* **Tier 2 (Local Context):** Receives a "surgical" injection of relevant files/data from Tier 3. Budget: ~60% of window.
* **Output Space:** Reserved for generating large code blocks or summaries. Budget: ~20% of window.
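The 20/60/20 split above can be computed mechanically; this sketch just allocates integer token budgets and pushes any rounding remainder into the output reservation:

```python
def split_budget(window_tokens, shares=(0.20, 0.60, 0.20)):
    # Split the context window into Tier 1 / Tier 2 / output budgets.
    t1, t2, out = (int(window_tokens * s) for s in shares)
    # Give any rounding remainder to the output reservation.
    out += window_tokens - (t1 + t2 + out)
    return {"tier1_global": t1, "tier2_local": t2, "output": out}

budget = split_budget(128_000)
print(budget)  # {'tier1_global': 25600, 'tier2_local': 76800, 'output': 25600}
```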
### B. Context Folding (The "Accordion" Effect)
To prevent context overflow, the system "folds" (summarizes) older parts of the conversation.
* **Recent History:** Full fidelity.
* **Mid-term History:** Summarized by Tier 1.
* **Long-term History:** Archived in Tier 3 (searchable but not in-context).
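A minimal sketch of the folding step, with the Tier 1 summarizer stubbed out as a counter (a real system would generate an actual summary):

```python
def fold_history(turns, keep_recent=4, summarize=None):
    # "Accordion" folding: keep the newest turns at full fidelity and
    # collapse everything older into a single summary entry.
    summarize = summarize or (lambda old: f"[summary of {len(old)} earlier turns]")
    if len(turns) <= keep_recent:
        return list(turns)
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [summarize(old)] + recent

history = [f"turn {i}" for i in range(10)]
folded = fold_history(history)
print(folded)  # one summary entry followed by turns 6..9
```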
---
## 3. Communication Protocols
* **Inter-Tier Format:** Strictly structured JSON (e.g., OpenAI Tool Call format or Google GenAI Function Call).
* **Streaming:** Tier 1 typically streams its "thinking" process (Slow Path) to provide the user with immediate feedback while Tier 2 is still working.
* **Handshake:** Tier 2 must acknowledge receipt of context from Tier 3 with a "Digest" hash to ensure data integrity.
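The "Digest" handshake can be sketched with a SHA-256 over a canonical JSON serialization; the payload shape is hypothetical:

```python
import hashlib
import json

def context_digest(context):
    # Deterministic digest of a context payload: Tier 3 sends this alongside
    # the data, and Tier 2 recomputes it to acknowledge intact receipt.
    canonical = json.dumps(context, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

payload = {"files": ["ai_client.py"], "snippets": ["def main(): ..."]}
sent = context_digest(payload)
received = context_digest(payload)   # Tier 2 recomputes on its own copy
assert sent == received              # handshake succeeds; data is intact
```

Sorting the keys before hashing matters: two semantically identical JSON objects with different key order would otherwise produce different digests.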
---
## 4. Nuances vs. Standard RAG
| Feature | Standard RAG | MMA (4-Tier) |
| :--- | :--- | :--- |
| **Logic** | Flat (Query -> Doc -> Result) | Hierarchical (Intent -> Route -> Expert -> Doc) |
| **Expertise** | Homogeneous | Heterogeneous (Different models for different tiers) |
| **Feedback** | Manual | Automated (Tier 4 Closed-loop) |
| **State** | Stateless or simple session | Multi-layered state (Orchestrator vs Specialist state) |

# Tier 1: User-Facing Model (Orchestrator/Router)
The User-Facing Model is the entry point for all user interactions. It serves as the "brain" that understands what the user wants and decides how the system should respond.
## Key Responsibilities
### 1. Intent Recognition
* Analyze the user's natural language input.
* Classify the request into one or more categories (e.g., "request for code", "general inquiry", "data analysis").
* Extract key parameters and constraints from the user's query.
### 2. Routing
* Map recognized intents to specific Tier 2 models or tools.
* Determine if multiple specialized tools need to be called in sequence or parallel.
* Handle tool dispatching and manage the flow of data between tiers.
### 3. Context Management
* Maintain the history of the conversation.
* Decide what information from the history is relevant to the current turn.
* Synthesize a coherent prompt for downstream models based on the current context.
### 4. Response Synthesis
* Integrate the raw outputs from Tier 2 models into a final, user-friendly response.
* Ensure the tone and style are consistent with user expectations.
* Validate that the final response directly addresses the user's original intent.
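The intent-recognition and routing responsibilities can be sketched as a keyword classifier feeding a routing table. A real Tier 1 would use an LLM for classification; the keywords and specialist names below are illustrative only:

```python
ROUTES = {
    "request for code": "code_expert",
    "general inquiry": "general_model",
    "data analysis": "data_scientist",
}

def recognize_intent(query):
    # Toy keyword classifier standing in for an LLM intent model.
    q = query.lower()
    if any(k in q for k in ("code", "bug", "refactor")):
        return "request for code"
    if any(k in q for k in ("plot", "csv", "statistics")):
        return "data analysis"
    return "general inquiry"

def route(query):
    # Routing: map the recognized intent to a Tier 2 specialist.
    return ROUTES[recognize_intent(query)]

assert route("Refactor the ai_client.py code") == "code_expert"
assert route("What is the weather?") == "general_model"
```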
## Characteristics
* **High Reasoning:** Needs to be strong at logic and instruction following.
* **General Purpose:** While not necessarily a domain expert, it must be broad enough to understand any valid user input.
* **Speed:** Should ideally be responsive to minimize perceived latency.

# Tier 2: Specialized Models (Experts/Tools)
Tier 2 consists of a collection of specialized agents, models, or tools, each optimized for a specific domain or task. This allows the system to leverage "best-in-class" capabilities for different problems.
## Key Responsibilities
### 1. Task Execution
* Perform deep processing in a specific area (e.g., writing Python code, generating images, performing complex mathematical calculations).
* Operate within the constraints provided by the Tier 1 Orchestrator.
### 2. Domain Expertise
* Provide specialized knowledge that a general model might lack.
* Utilize specialized formatting or protocols (e.g., returning structured JSON for data analysis tools).
### 3. Tool Integration
* Act as wrappers for external APIs or local scripts (e.g., `shell_runner` in Manual Slop).
* Manage their own internal state or "scratchpad" during complex multi-step operations.
## Common Specialist Examples
* **Code Expert:** Optimized for high-quality software engineering and debugging.
* **Search/Web Tool:** Specialized in retrieving and summarizing real-time information.
* **Data Scientist:** Capable of running statistical models and generating visualizations.
* **Creative Writer:** Focused on tone, narrative, and artistic expression.
## Implementation Principles
* **Fine-Tuning:** Models in this tier are often smaller models fine-tuned on specialized datasets.
* **Isolation:** Specialists should ideally be stateless or have well-defined, temporary state to prevent cross-contamination.
* **Interface Standards:** Use consistent input/output formats (like JSON) to simplify communication with Tier 1.
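The isolation and interface-standard principles can be sketched with a hypothetical specialist class: a uniform result type for Tier 1 to consume, and no state held between calls. None of these names come from the project itself:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class SpecialistResult:
    # Consistent output shape so Tier 1 can handle any specialist uniformly.
    ok: bool
    data: Any
    error: Optional[str] = None

class CodeExpert:
    name = "code_expert"

    def run(self, task: dict) -> SpecialistResult:
        # Stateless: everything needed arrives in `task`; nothing is stored
        # between calls, so requests cannot cross-contaminate.
        if "source" not in task:
            return SpecialistResult(ok=False, data=None, error="missing 'source'")
        return SpecialistResult(ok=True, data=f"reviewed {len(task['source'])} chars")

result = CodeExpert().run({"source": "def main(): pass"})
assert result.ok
```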

# Tier 3: Data & Knowledge Base (Information Layer)
Tier 3 is the foundational layer that provides the necessary facts, documents, and data required by the higher tiers. It is a passive repository that enables informed reasoning and specialized processing.
## Key Responsibilities
### 1. Information Storage
* Maintain large-scale repositories of structured data (SQL/NoSQL databases) and unstructured data (PDFs, Markdown files, Codebases).
* Host internal company documents, project-specific files, and external knowledge graphs.
### 2. Retrieval Mechanisms (RAG)
* Support efficient querying via Vector Search, keyword indexing, or metadata filtering.
* Provide Retrieval-Augmented Generation (RAG) capabilities to enrich the prompts of Tier 2 models with relevant snippets.
### 3. Contextual Enrichment
* Supply specialized models with "ground truth" data to minimize hallucinations.
* Manage versioned data to ensure the system reflects the most up-to-date information.
## Components
* **Vector Databases:** (e.g., Pinecone, Milvus, Chroma) for semantic search.
* **Traditional Databases:** (e.g., PostgreSQL) for structured business data.
* **File Systems:** Local or cloud storage for direct file access.
* **External APIs:** Real-time data sources (weather, finance, etc.).
## Interactions
* Tier 2 specialists query Tier 3 to get the data they need to perform their tasks.
* Tier 1 may occasionally query Tier 3 directly to determine if sufficient information exists before routing.
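A toy sketch of the Tier 2 → Tier 3 query, using keyword overlap in place of vector search (the document contents are invented for the example):

```python
def retrieve(query, documents, top_k=2):
    # Minimal keyword-overlap retrieval standing in for vector search:
    # score each document by how many query terms it shares.
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(text.lower().split())), name)
        for name, text in documents.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]

docs = {
    "ai_client.py": "orchestrator conversation loop provider routing",
    "file_cache.py": "cache file system retrieval optimization",
    "gui.py": "user interface input display",
}
print(retrieve("file system cache", docs))  # ['file_cache.py']
```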

# Tier 4: Monitoring & Feedback (Governance Layer)
Tier 4 acts as the "supervisor" of the entire architecture. It ensures the system is performing correctly, ethically, and efficiently, while providing a path for continuous evolution.
## Key Responsibilities
### 1. Performance Monitoring
* Track latency, token usage, and error rates across all tiers.
* Identify bottlenecks (e.g., a Tier 2 specialist that is consistently slow).
### 2. Evaluation & Feedback
* Collect explicit user feedback (e.g., "Good/Bad" ratings).
* Perform automated evaluation using "LLM-as-a-judge" to score responses based on accuracy, tone, and safety.
* Log failures for manual review and human-in-the-loop (HITL) intervention.
### 3. Error Analysis & Root Cause
* Analyze why specific routes failed or why a specialist produced a low-quality output.
* Maintain a "lesson learned" database to inform future system prompts or fine-tuning.
### 4. Continuous Improvement
* Inform the retraining or fine-tuning of Tier 2 models based on real-world usage patterns.
* Optimize Tier 1 routing logic based on success/failure metrics.
## Tools & Techniques
* **Logging/Observability:** (e.g., LangSmith, Weights & Biases, custom JSON-L logs).
* **A/B Testing:** Compare different model versions or routing strategies.
* **Red Teaming:** Proactively test the system for vulnerabilities and biases.
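A minimal sketch of a Tier 4 observer combining JSON-L-style event records with the latency and error-rate metrics described above (the record fields are assumptions, not the project's actual log schema):

```python
import json
import statistics
import time

class Monitor:
    # Minimal Tier 4 observer: one JSON-L line per event, plus summary metrics.
    def __init__(self):
        self.records = []

    def log(self, tier, event, latency_ms, ok=True):
        record = {"ts": time.time(), "tier": tier, "event": event,
                  "latency_ms": latency_ms, "ok": ok}
        self.records.append(record)
        return json.dumps(record)  # ready to append to a .jsonl file

    def error_rate(self):
        return sum(not r["ok"] for r in self.records) / len(self.records)

    def p50_latency(self):
        return statistics.median(r["latency_ms"] for r in self.records)

mon = Monitor()
mon.log(2, "tool_call", 120)
mon.log(2, "tool_call", 300, ok=False)
assert mon.error_rate() == 0.5
assert mon.p50_latency() == 210
```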