
docs(ascii-dsl): add §8 Screenshot-to-ASCII Reverse Engineering (opt-in extension)

Documents the MiniMax_understand_image workflow for converting
screenshots to ASCII Layout Maps. Covers: when to use it, the
6-step workflow, the proportional-measurement prompt pattern,
faithful rendering rules (width ratios, empty space, floating
window position, color annotations, tab bars, table rows),
multi-screenshot composition, and limitations.
2026-06-30 09:04:09 -04:00
parent 7e3ce307e1
commit ee7b1e263e
+69
@@ -271,3 +271,72 @@ Once a design contract is locked and implemented, it must pass a three-tiered ve
1. **AST Integrity:** Every docstring modification must pass `py_check_syntax` to ensure it doesn't break Python parsing.
2. **Regression Check:** The test runner (`pytest tests/`) must be run to verify zero side-effects. Docstring additions must never alter execution logic.
3. **Puppeteer Visual Audit:** In visual simulation tests, the captured Dear ImGui layout boundaries and widget visibility flags are compared against the rows, columns, and conditional states defined in the ASCII design contract.
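Tiers 1 and 2 can be partly checked at the AST level with the standard-library `ast` module. The sketch below confirms that both versions of a file still parse, and that the change touched only docstrings.

It is not a reproduction of `py_check_syntax`, whose implementation is not shown here. `docstring_only_change` is a hypothetical helper. It complements the `pytest` run rather than replacing it.

```python
import ast


def _strip_docstrings(tree: ast.AST) -> ast.AST:
    """Remove leading docstring expressions from modules, classes, and functions (in place)."""
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            if (
                body
                and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)
            ):
                # Keep the body non-empty if the docstring was its only statement.
                node.body = body[1:] or [ast.Pass()]
    return tree


def docstring_only_change(before_src: str, after_src: str) -> bool:
    """Return True if both sources parse and differ only in docstrings.

    ast.parse raises SyntaxError if either source is broken (the AST integrity tier).
    ast.dump omits line/column attributes by default, so moved code still compares equal.
    """
    before = _strip_docstrings(ast.parse(before_src))
    after = _strip_docstrings(ast.parse(after_src))
    return ast.dump(before) == ast.dump(after)
```

A `False` result means the edit changed executable structure and should be rejected before the regression run.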
---
## 8. Screenshot-to-ASCII Reverse Engineering (Opt-In Extension)
When a running GUI state needs to be captured as an ASCII Layout Map — for bug reports, regression documentation, or Tier 2 handoff — the `MiniMax_understand_image` MCP tool can reverse-engineer a screenshot into the DSL. This is an **opt-in** workflow; the standard DSL (§1-§7) remains the forward-design path (text-first, code-second). This section covers the reverse path (screenshot-first, text-second).
### 8.1 When to Use This Extension
- **Bug reports**: the user sees a broken layout and screenshots it; the agent converts the screenshot to ASCII for the report
- **Regression documentation**: before/after screenshots are converted into ASCII pairs that document what changed
- **Tier 2 handoff**: the user provides a screenshot of the current working state; Tier 1 converts it to ASCII so Tier 2 can see the target layout without running the GUI
- **Layout audit**: the user provides a screenshot of a misbehaving panel; the agent converts it to ASCII to reason about its structure
### 8.2 The Workflow
```
Step 1: User provides screenshot file path(s)
Step 2: Agent calls MiniMax_understand_image with a proportional-measurement prompt
Step 3: Agent converts the structured description into an ASCII Layout Map
Step 4: User reviews and corrects proportions ("the left panel is wider", "the Debug window is top-right, not centered")
Step 5: Agent revises until the ASCII faithfully represents the screenshot
Step 6: The final ASCII map is committed to docs or a track spec
```
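Steps 2 to 5 form a review loop. The Python sketch below models it with the tool call, the rendering step, and the user review passed in as callables. All names are hypothetical.

The actual invocation signature of `MiniMax_understand_image` is not documented in this section, so `describe` is only a stand-in for it. This sketch queries the tool once and re-renders from that description plus the accumulated corrections. Calling the tool again after each correction is an equally valid design.

```python
from typing import Callable, List, Optional


def reverse_engineer_screenshot(
    image_path: str,
    describe: Callable[[str], str],
    render: Callable[[str, List[str]], str],
    review: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> str:
    """Sketch of Steps 2-5: describe once, render, then revise until the user accepts.

    describe(image_path) -> structured description (stand-in for the MCP tool call)
    render(description, corrections) -> ASCII Layout Map text
    review(ascii_map) -> a correction string, or None once the user accepts the map
    """
    description = describe(image_path)                  # Step 2
    corrections: List[str] = []
    ascii_map = render(description, list(corrections))  # Step 3
    for _ in range(max_rounds):
        feedback = review(ascii_map)                    # Step 4
        if feedback is None:
            return ascii_map                            # accepted: ready for Step 6
        corrections.append(feedback)
        ascii_map = render(description, list(corrections))  # Step 5
    raise RuntimeError(f"map not accepted after {max_rounds} review rounds")
```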
### 8.3 The Proportional-Measurement Prompt
The first `MiniMax_understand_image` call must ask for **precise proportional measurements**, not just a list of elements. The prompt should request:
1. Panel width percentages (left panel X%, right panel Y%)
2. Vertical order and height proportions of each section within each panel
3. Exact position of floating/overlay windows (which panel, which corner, relative size)
4. Exact text labels, button labels, tab names, checkbox states
5. Color annotations for status text (red for errors, green for success, blue for info)
6. Empty space proportions (how much of each panel is blank)
Without proportional measurements, the resulting ASCII will be "scrunched" — elements compressed into too-small areas, losing the visual hierarchy that makes the layout map useful.
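Generating the prompt from a fixed checklist keeps all six requests present in every call. The wording and the `build_measurement_prompt` name below are illustrative, not a required template. The resulting string would be supplied as the prompt to `MiniMax_understand_image` in whatever form that tool accepts.

```python
from typing import List

# The six measurement requests listed above, phrased as instructions.
MEASUREMENT_ITEMS: List[str] = [
    "Panel width percentages (for example: left panel X%, right panel Y%).",
    "Vertical order and height proportions of each section within each panel.",
    "Exact position of floating/overlay windows (which panel, which corner, relative size).",
    "Exact text labels, button labels, tab names, and checkbox states.",
    "Color of status text (red for errors, green for success, blue for info).",
    "Empty space proportions (how much of each panel is blank).",
]


def build_measurement_prompt(context: str = "") -> str:
    """Assemble a proportional-measurement prompt that always covers all six items."""
    lines = [
        "Describe this GUI screenshot with precise proportional measurements,",
        "not just a list of elements. Report:",
    ]
    lines += [f"{i}. {item}" for i, item in enumerate(MEASUREMENT_ITEMS, start=1)]
    if context:
        # Optional hint about what the screenshot shows (e.g. a panel name).
        lines.append(f"Context: {context}")
    return "\n".join(lines)
```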
### 8.4 Faithful Rendering Rules
When converting the structured description to ASCII:
- **Width ratios must be preserved.** If the left panel is 25% and the right is 75%, the ASCII must show the left panel as roughly 1/4 the total width and the right as 3/4. Do not make them 50/50.
- **Empty space must be represented.** If 80% of a panel is blank, the ASCII must show that blank space as empty lines within the panel border. Do not compress it away.
- **Floating windows must be positioned correctly.** If the Debug window is top-right of the Discussion Hub, it must appear in the top-right area of the right panel in the ASCII, not centered or at the bottom.
- **Color annotations use inline markers.** Put a note on the line below the colored text: red `1 failed` gets `^^^ in red`, green `OUT request` gets `^^^ in green`, and blue `tool_call` gets `^^^ in blue`.
- **Tab bars list all tabs.** Even inactive tabs must appear so the reader can see the full navigation surface.
- **Tables show all visible rows.** For example, a telemetry table with 4 data rows must show all 4 rows, not just the first one or two.
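The width-ratio, empty-space, and color rules can be applied mechanically. The sketch below uses hypothetical helper names and does three things:

- It assigns character columns to panels with largest-remainder rounding, so a 25/75 screenshot stays 25/75.
- It pads every panel to a fixed height instead of dropping blank rows.
- It emits the `^^^ in <color>` note line.

Floating-window placement and tab bars are left to hand editing.

```python
from typing import List


def split_width(total: int, percents: List[float]) -> List[int]:
    """Give each panel a share of `total` columns proportional to its percentage.

    Largest-remainder rounding keeps the widths summing exactly to `total`.
    """
    raw = [total * p / sum(percents) for p in percents]
    widths = [int(r) for r in raw]
    # Hand leftover columns to the panels with the largest fractional parts.
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - widths[i], reverse=True)
    for i in by_remainder[: total - sum(widths)]:
        widths[i] += 1
    return widths


def render_panels(total: int, percents: List[float], contents: List[List[str]], height: int) -> str:
    """Draw side-by-side bordered panels `total` characters wide and `height` rows tall.

    Rows past a panel's content stay blank rather than being dropped, so empty
    space in the screenshot remains visible in the map.
    """
    inner = total - (len(percents) + 1)  # one '|' per panel edge
    widths = split_width(inner, percents)
    border = "+" + "+".join("-" * w for w in widths) + "+"
    rows = [border]
    for r in range(height):
        cells = [(lines[r] if r < len(lines) else "")[:w].ljust(w)
                 for lines, w in zip(contents, widths)]
        rows.append("|" + "|".join(cells) + "|")
    rows.append(border)
    return "\n".join(rows)


def color_note(text: str, color: str, indent: int = 0) -> List[str]:
    """Return the colored text plus the '^^^ in <color>' note line beneath it."""
    pad = " " * indent
    return [pad + text, f"{pad}^^^ in {color}"]
```

For instance, `render_panels(43, [25, 75], ...)` leaves 40 inner columns after the three border characters. The left panel gets 10 of them and the right panel gets 30.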
### 8.5 Multi-Screenshot Composition
When the user provides multiple screenshots (e.g., different panel configurations, before/after states), each gets its own ASCII Layout Map. The maps are presented sequentially with a header line identifying the screenshot source:
```
**Screenshot 1** (timestamp) — Panel A + Panel B:
<ASCII map>
**Screenshot 2** (timestamp) — Panel A + Panel C + Debug overlay:
<ASCII map>
```
Do not attempt to merge multiple screenshots into a single composite ASCII. Each screenshot is its own layout state.
### 8.6 Limitations
- The `MiniMax_understand_image` tool cannot read images from the clipboard directly; the user must provide a file path (e.g., a ShareX screenshot path).
- The proportional measurements are estimates, not pixel-perfect. The user must review and correct.
- Complex layouts with many small elements may lose resolution in the ASCII. Use the Feature Zooming technique (§4.1) to decompose dense areas into zoomed micro-layouts.
- Color information is lost in ASCII. Use inline text annotations (`^^^ in red`) to preserve critical color signals.