New track for prior-session sepia tint:
- 3 new theme slots (prior_session_bg, prior_session_tint, prior_session_amount)
- per-palette state dict mirroring _brightness/_contrast/_gamma
- apply_prior_tint helper (float-only math per user requirement)
- 6 prior-session render sites wrapped (2 bubble_vendor swaps + 4 tint wraps)
- Theme Settings panel slider with persistence
Code-block tonemap fix is OUT OF SCOPE (upstream imgui_bundle 1.92.5
API only exposes 4-value PaletteId enum, no per-instance struct).
See spec §1.1.1 and design doc 'Honest constraint' section.
The Phase 6+ section had two duplicate '### Active' headers, which
made the chronology confusing. The user (paraphrased): preserve the
chronology of project progress, don't need full detail, follow the
previous restructure's lightweight pattern.
Changes:
- Add '### Recently Completed (2026-06-06 to 2026-06-10)' subsection
containing the 3 closed tracks (startup_speedup, test_batching_refactor,
test_infrastructure_hardening) with lightweight entries: per-phase
commit SHAs only, 1-line summary, link to spec/plan/state folder.
Trimmed the verbose per-sub-track commentary that was in the old
startup_speedup entry (the per-sub-track bullets for warmup, status
indicator, audit violations, post-shipping fixes are in the
archive's spec/plan, not the tracks.md).
- Remove the duplicate '### Active' header.
- Update section intro to reflect '3 recently completed, 4 in plan'
(was '2 already completed, 3 in plan').
- test_infrastructure_hardening entry now has phase commit SHAs
(5df22fa8, 67d0211e, 006bb114, b8fcd9d6, 33d5cac, 7b87bbf5,
84edb200, 719fe9a) instead of just the closing-report link.
Chronology is now visible at a glance; per-track full detail is
in the linked archive/ folder.
These were authored at track start but missed by the final-state
commit. They are the brief 1-2 page design intent and executable
plan for the docs sync track. The closing report at
docs/reports/docs_sync_test_era_20260610.md summarizes the actual
17-commit execution.
- state.toml: status active->completed, all 25 tasks marked complete
with commit SHAs, all 4 phases checkpointed
- metadata.json: status active->shipped, 17-commit list, all 9
verification criteria flipped to DONE
- Structural Testing Contract (mirrors workflow.md)
- Isolated-Pass Verification Fallacy (Lesson 1, with link to the
test_infrastructure_hardening_batch_green_20260610 incident report
that motivated the rule)
- Audit Scripts as CI Gates (4 scripts: check_test_toml_paths,
audit_main_thread_imports, audit_weak_types, audit_no_models_config_io)
- Skip Markers Are Documentation, Not Avoidance (workflow.md policy)
Known Pitfalls (new subsection):
- HARD BAN: git checkout -- <file>, git restore, git reset
(per AGENTS.md Critical Anti-Patterns; destroyed user in-progress
edits twice on 2026-06-07; concrete 2026-06-10 incident:
mma_tier_usage_reset_fix regression)
Live_gui Test Fragility (2 new subsections):
- Anti-pattern: push_event + time.sleep(N) + assert is a race.
Fix: poll-until-state-visible with bounded retries. 5+ tests
affected in 2026-06-10 batch-green wave.
- Async setters need poll-for-state. mma_state_update and rag_*
setters dispatch to _pending_gui_tasks queue; the setter returns
before the GUI render loop processes the task. Assert immediately
= race. Fix: poll via get_value with bounded retry.
Lesson 5 from the 4-day test-hell saga. The chroma cache lives at
tests/artifacts/.slop_cache/chroma_<collection>/, NOT at the per-run
live_gui_workspace_<timestamp>/ subdir. The trailing-slash bug in
Path(active_project_path).parent places the cache one level higher
than expected.
RAG tests must pre-clean the cache to avoid persistent state from
prior batched runs. Documents the cleanup pattern (shutil.rmtree with
ignore_errors=True), the auto-recovery mechanism (_validate_collection_dim),
and 3 anti-patterns (assuming per-run, not cleaning, asserting on
first chunk in batched context).
New entry at the top of the Recently Shipped list, linking to the
archive/ folder. Includes:
- 314/314 green across all 11 tier batches
- FR1-FR5 summary
- 3 lineage tracks also archived
- The 4 unblocked tracks
- Link to the closing batch-green report
- Remove row 1 from Active Tracks table
- Update rows 2-5, 17: test_infrastructure_hardening_20260609 -> '(merged)'
- Mark test_infrastructure_hardening as [COMPLETE 2026-06-10] [archived]
- Update link to use archive/ instead of tracks/
- Add closing note: 314/314 tests green, lineage tracks also archived
One addition to conductor/code_styleguides/python.md §8
"AI-Agent Specific Conventions":
- **No diagnostic noise in production code (Added
2026-06-09).** `sys.stderr.write(f"[XYZ_DIAG] ...") lines
in src/*.py are technical debt. The right place for
one-time investigation output is tests/artifacts/<test>.diag.log
(a log file) or a standalone /tmp/diag_<name>.py script.
If you must instrument production code, the diag lines
are part of the same atomic commit as the fix.
- **Test files ARE allowed to be diagnostic.** The rule
applies to src/*.py only; tests/test_*.py may use
print(..., file=sys.stderr) freely.
Markdown only. No code modified.
Two additions to conductor/workflow.md §"Known Pitfalls":
1. **Isolated-Pass Verification Fallacy (Added 2026-06-09)** —
the rule that a test passing in isolation but failing in
batch is FAILING. The only verification that matters for
live_gui tests is the batch run. This is the flip side of
the existing "Live_gui Test Fragility (Authoring-Side)"
rule. Cross-references that rule.
2. **Process Anti-Patterns (Added 2026-06-09)** — 8-rule
summary list, with cross-reference to AGENTS.md for the
full ruleset. The 8 patterns are: Deduction Loop,
Report-Instead-of-Fix, Scope-Creep Track-Doc,
Inherited-Cruft, Diagnostic Noise in Production, Premature
Surrender, Verbose Commit Message, Isolated-Pass
Verification Fallacy.
Markdown only. No code modified. Cross-references
AGENTS.md (the load-bearing agent doc) for the full text
of each pattern.
Three surgical fixes to conductor/edit_workflow.md:
1. **§2 "Verify Before Editing"** — removed the leftover
`git checkout -- src/gui_2.py` instruction. The user's
commit `4eba059e unfuck edit workflow` removed most of
the git checkout nuke instructions but missed §2. The
revised §2 now says: read the contract (function signature,
yield shape, return type) before editing, and DO NOT use
`git checkout` to revert. Ask the user.
2. **§3 "Reading Before Editing"** — added the line-number
offset check. `set_file_slice` uses 1-indexed inclusive
`start_line`/`end_line`; off-by-one is a common silent
failure. The rule is now: confirm the exact line range
with `get_file_slice` first.
3. **§8 "set_file_slice IS Valid for Multi-Line Content
(Revised 2026-06-09)"** — replaced the wrong rule
("Do not use set_file_slice for multi-line content") with
the correct rule: set_file_slice IS valid for 3-10 line
surgical edits, with a tool-selection guide (which tool
for which job), a mandatory contract-change check
(search for callers of the symbol being changed; update
all callers in the same atomic commit if the public
interface changes), and a mandatory whitespace-and-EOL
rule (preserve line ending, indentation, and line count).
4. **§9 "No Diagnostic Noise in Production Code
(Added 2026-06-09)"** — new section. Diag stderr goes
to log files or /tmp scripts, NOT src/*.py. If you must
add diag lines to production code, they are part of the
same atomic commit as the fix — they do not live
uncommitted in the working tree.
5. **"If set_file_slice produces wrong indentation"** —
new handler in the Step-by-Step Workflow. Tells the
agent: you wrote the wrong indent; the tool did what
you asked; re-read the file with get_file_slice; do
NOT use git checkout to revert.
These are the rule corrections the user demanded after
the Tier-2's bad set_file_slice + git nuke + diag-noise
behavior. Markdown only. No code modified.
The SSDL digest (docs/reports/computational_shapes_ssdl_digest_20260608.md,
504 lines, 30KB) is the theoretical foundation for the chunkification
pattern. Per the digest's Technique 5 "Assume-away (Xar)" in §2.2
and the "Xar-style chunked arrays" recommendation in §5.2, the
chunkification track is a *direct application* of the SSDL's
"assume as much as possible" lens (§4).
This commit adds the SSDL digest to the See Also of the v1+v2
C11-Python interop assessment (front-matter Cross-references line).
The same cross-reference is also being added to:
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md
(in a new §6.1 "SSDL alignment" subsection)
- conductor/tracks/manual_ux_validation_20260608_PLACEHOLDER/spec.md
(in §5 Architectural Reference + §6 See Also + a new §2.6
"SSDL cross-reference" section that distinguishes GUI ASCII
vocabulary from SSDL vocabulary)
No code modified. Cross-reference only.
Also: small update to conductor/tracks.md to add the 2 new
tracks (manual_ux_validation_20260608_PLACEHOLDER as Active;
chunkification_optimization_20260608_PLACEHOLDER as Backlog/Contingency).
The user said (verbatim): "On number 1. I love the idea and definitely
see poitental." This commit creates a full track that promotes the
ASCII-sketch UX ideation workflow
(docs/reports/ascii_sketch_ux_workflow_20260608.md, 340 lines) to
a real track with a concrete first target.
The track complements (does not replace) the existing
manual_ux_validation_20260302 track (which is a general UX review
track; this 2026-06-08 track is *focused* on the ASCII-sketch
workflow specifically).
Files (5 total, ~52KB, 12,000+ words):
- spec.md (186 lines, 9 sections) - track design, 5 open
questions, first target analysis, SSDL cross-reference
- plan.md (~280 lines, 4 phases, 21 tasks) - TDD-style with
WHERE/WHAT/HOW/SAFETY annotations
- metadata.json (~120 lines) - structured metadata, 5 open
questions with defaults, 5 SSDL principles available
- state.toml (~95 lines) - per-task tracking + phase status
- index.md (~50 lines) - track context + related docs
Key design decisions captured:
1. Two distinct vocabularies are conflated at first glance:
- GUI ASCII (the workflow) for panel sketches
- SSDL (computational shapes digest) for internal code sketches
Spec §2.6 makes the distinction explicit; both are useful for
this track (GUI ASCII for Phase 2 design; SSDL for Phase 3
internal refactoring documentation).
2. The 5 open questions from the workflow report (Q1 vocabulary,
Q2 comparison policy, Q3 storage location, Q4 tooling,
Q5 frequency) are documented with sensible defaults in
spec.md §2.1-2.5 and metadata.json. The user can override
any of them; defaults pre-stage the work.
3. First target is src/gui_2.py:3770 render_discussion_entry
(Discussion Hub per-entry panel). Rationale:
- Most-edited surface (every AI/user message)
- User has strong opinions (per nagent_review_20260608 3 rounds
of corrections)
- 23-op matrix A1-A7 is the source of truth
- ImGui layout maps cleanly to ASCII
- SSDL defusing techniques can guide the internal refactoring
4. 4 phases: 1=resolve 5 questions, 2=execute workflow on first
target (1-3 ASCII rounds), 3=implement per design contract
(TDD with 7 test files for A1-A7 operations),
4=document the pattern + propose 5-7 next targets.
Cross-references added throughout:
- docs/reports/computational_shapes_ssdl_digest_20260608.md
(the SSDL digest, with explicit "this is a different vocabulary
for a different purpose" note in spec §2.6)
- docs/reports/ascii_sketch_ux_workflow_20260608.md (the workflow)
- docs/guide_discussions.md (the 23-op matrix A1-A7)
- conductor/tracks/nagent_review_20260608/ (the source of the
user's editable-discussion corrections)
- conductor/tracks/manual_ux_validation_20260302/ (complementary
general UX review track)
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/
(the contingency track; referenced in spec §2.6 SSDL cross-ref)
No code modified. Track is active; Phase 1 (5 user-questions) is
the current phase. User-confirmed worth doing in the prior turn.
The user's third correction this session changed the framing
from "build a stateful C extension" to "wait for a hard constraint,
then build a request/response blob pipeline." This commit creates
a 1-page contingency document (no plan.md, no implementation)
that captures:
- The threshold: "only worth it under a hard constraint that
no existing Python package can solve"
- The shape when activated: subprocess-launch C11 binary with
request/response blob wire format (NOT stateful CPython C
extension)
- The 2 cited candidates (markdown parsing into aggregate markdown,
context snapshot processing) are NOT currently bottlenecks per
src/aggregate.py:380-454 (pure-Python string concat, zero
third-party markdown deps in pyproject.toml:6-27) and
src/history.py:1-141 (bounded ~500KB at 100-snapshot capacity,
debounced)
- The SSDL digest's Technique 5 "Assume-away (Xar)" in §2.2 +
"Xar-style chunked arrays" recommendation in §5.2 pre-support
this track
Files (4 total, 227+ lines of contingency document):
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/metadata.json
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/state.toml
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/index.md
Cross-references added:
- docs/reports/computational_shapes_ssdl_digest_20260608.md (the
SSDL digest is the theoretical foundation; explicitly cited in
the spec's §6.1 "SSDL alignment" and in metadata.json external)
- docs/reports/c11_python_interop_assessment_20260608.md (the v1+v2
assessment; explicitly cited in spec's §6 See Also)
No code modified. Track does NOT appear in the active queue
of conductor/tracks.md; appears in the Backlog / Contingency
section as a reference, not a commitment.
Activation criteria (per metadata.json):
1. Profiling shows a real bottleneck in a target code path
2. The bottleneck cannot be solved with existing Python packages
3. The user explicitly approves activation
Without all 3, this track stays deferred. Default action is don't.
The user specified that the code_path_audit_20260607 track should run
AFTER the 4 foundational tracks complete (qwen_llama_grok,
data_oriented_error_handling, data_structure_strengthening,
mcp_architecture_refactor). This commit formalizes that timing
and grounds the audit's analytical framing in the 5 sources loaded
into context on 2026-06-08.
3 surgical additions to the spec/plan, no task changes:
1. Post-4-tracks timing (new section in spec.md §"Timing", plus
a "Timing" callout in plan.md's opening):
- The 4 tracks will significantly reshape src/ai_client.py,
src/mcp_client.py, src/app_controller.py, and
src/type_aliases.py
- Running the audit on pre-refactor code would produce a
report that's stale on day 1
- The post-4-tracks timing ensures the audit grounds
optimization decisions for the *resulting* architecture
- Pre-flight check: verify all 4 tracks are [x] completed
in conductor/tracks.md before starting this track
2. Analytical framing (new section in spec.md §"Analytical Framing
(5-source lens)"):
- Maps each of the 5 sources (Fleury taxonomy + Fleury
combinatoric + Muratori Big OOPs + Reece Assuming + user's
chunk ideation) to specific audit-time heuristics
- 4 concrete heuristics: effective-codepath count,
entity-hierarchy fingerprint, assumed-too-much detector,
chunkification candidates
- The heuristics shape REPORT INTERPRETATION, not the
static cost model (which stays data-grounded in
EXPENSIVE_THRESHOLD + per-class weights)
3. See Also cross-references in spec.md (6 new entries):
- nagent_review Pitfalls #2 and #4 (provider history
globals + stateful singleton)
- wo84LFzx5nI Big OOPs transcript (full text, 4310
segments, 200KB; loaded 2026-06-08)
- i-h95QIGchY Assuming transcript (full text, 3719
segments, 162KB; loaded 2026-06-08)
- ed_chunk_data_structures_20260523.md (5-image archive
of user's chunk ideation, 19KB; saved 2026-06-08)
- computational_shapes_ssdl_digest_20260608.md (the SSDL
digest that synthesizes the 4-source computational-shapes
thinking; the audit's tree/mermaid outputs ARE
computational-shape visualizations)
4. tracks.md entry updated to include the spec/plan links and
a brief status note that the audit is post-4-tracks.
5. plan.md has a "Timing" callout at the top stating the 4
tracks must ship before the plan executes.
No code modified. The audit's tasks (Phases 1-6) are unchanged
in structure; the new sections only add analytical context
and timing constraints.
PR3 of the test_full_live_workflow_imgui_assert fix sequence.
When a prior live_gui test in the same session crashes the GUI (e.g.
via an ImGui IM_ASSERT from cumulative panel state), the controller's
_io_pool gets shut down. The next test starts in a degraded state
but only discovers this 120s later when its project switch times
out with a confusing 'cannot schedule new futures after shutdown'
error.
This commit adds a /api/gui_health pre-flight check at the start of
test_full_live_workflow. If the GUI is degraded, the test fails
fast (within 1s) with a clear, actionable message that includes:
- The exact RuntimeError that caused the degradation
- The full traceback of the last ImGui scope mismatch
- A note that the new test cannot proceed with a dirty state
Per user feedback 2026-06-08: 'I don't want a batch to be too fragile
where I can't restart the app and continue with the next test file
if it fails. Just has to note that the new file didn't get to deal
with a dirty state.'
Also includes the planning documents written earlier in this session:
- TODO_test_full_live_workflow_v2.md (task list)
- test_full_live_workflow_imgui_assert_20260608.md (root cause report)
- test_full_live_workflow_propagation_digest_20260608.md (solutions digest)
- batch_resilience_plan_20260608.md (batch resilience plan)
Verification:
- test_full_live_workflow in isolation: 13.45s PASS (health=True, no degrade)
- 4 sims + test_full_live_workflow in batch: 76.46s (1 FAIL fast, 4 sims PASS)
- Without PR3 fix: 200s FAIL with confusing 120s timeout
- With PR3 fix: 76s FAIL with clear 'GUI is degraded' message
- The fast-fail is observable, not silent (per user's 'wrap might be
worth it if that properly lets us handle the assert')
4 surgical additions to the spec, no task changes:
1. list_tool_schemas on the SubMCP Protocol: Added the method
to §3.1 (The SubMCP Protocol). Per nagent_review Pitfall #6
(hard-coded tool discovery) and takeaway #5 (self-describing
tools), each sub-MCP advertises its own capabilities via
list_tool_schemas() rather than relying on a central registry.
This is the equivalent of nagent's collect_bin_tool_descriptions
per sub-MCP. The MCPController.get_tool_schemas() becomes a
simple aggregator.
2. Security model is the contract: Added a new Important note
to §3.3 (The 3-Layer Security Model). The 3 layers
(Allowlist Construction -> Path Validation -> Resolution
Gate, per docs/guide_mcp_client.md) are not just refactored
- they are the CONTRACT between MCPController and the
sub-MCPs. Sub-MCPs receive a pre-validated Path and trust
it. They do NOT re-validate. The refactor is structural,
not security-changing.
3. Docs touchpoint in Phase 7: Added the docs touchpoint to
Phase 7 per the docs Refresh Protocol. The update to
docs/guide_mcp_client.md should add a Sub-MCP Architecture
section, link the list_tool_schemas pattern to 3-Layer
Security Model, and cross-link the 3 new guides from
the 2026-06-08 docs refresh.
4. See Also cross-references: Added 8 new entries to §12.2:
- docs/guide_context_aggregation.md (FileItem consumer)
- docs/guide_state_lifecycle.md (App state delegation)
- docs/guide_discussions.md (23-operation matrix)
- conductor/tracks/qwen_llama_grok_integration_20260606/
(Result return type coordination)
- conductor/tracks/nagent_review_20260608/{report,takeaways}.md
- (2 specific data_oriented_error_handling and
data_structure_strengthening cross-refs)
No plan.md changes.
4 surgical additions to the spec, no task changes:
1. ProviderHistoryMessage: Added a new alias to §3.1 (The
Aliases). Per nagent_review Pitfall #4 (provider history
divergence), the UI/curation layer (HistoryMessage, edited
via disc_entries[i].content) and the SDK layer
(ProviderHistoryMessage, the bytes actually replayed to the
LLM) are *distinct*. Conflating them via a single alias
perpetuates the bug. The new alias is documented as a
separate concept with its own use sites (_anthropic_history,
_deepseek_history, _minimax_history, _grok_history,
_llama_history). The follow-up public_api_migration_20260606
track is the natural moment to unify the two layers; this
spec just makes the distinction explicit.
2. FileItem alias points to the existing models.FileItem
dataclass, not Metadata. Per docs/guide_context_aggregation.md
(added 2026-06-08), FileItem is a 9-field dataclass
(path, auto_aggregate, force_full, view_mode, selected,
ast_signatures, ast_definitions, ast_mask, custom_slices,
injected_at) with a __post_init__ normalizer. Aliasing it to
dict[str, Any] would lose the type safety. The 9 other
aliases remain dict aliases for round-trip compatibility.
3. gui_2.py and mcp_client.py as follow-up: Added a Note
(dated 2026-06-08) to the Out of Scope section. The 23
lower-impact files (deferred) are dominated by gui_2.py
(26+ weak sites per guide_state_lifecycle.md) and
mcp_client.py (will be touched heavily by the parallel
mcp_architecture_refactor_20260606). The deferral is correct
but the follow-up should explicitly call out these two
files as the next targets, rather than implying they're
handled.
4. See Also cross-references: Added 7 new entries to §12.2:
- docs/guide_models.md (FileItem dataclass source)
- docs/guide_context_aggregation.md (FileItems consumer)
- docs/guide_discussions.md (HistoryMessage shape)
- docs/guide_state_lifecycle.md (state delegation)
- conductor/tracks/mcp_architecture_refactor_20260606/
- conductor/tracks/nagent_review_20260608/{report,takeaways}.md
No plan.md changes.