PR3 of the test_full_live_workflow_imgui_assert fix sequence.
When a prior live_gui test in the same session crashes the GUI (e.g.
via an ImGui IM_ASSERT from cumulative panel state), the controller's
_io_pool gets shut down. The next test starts in a degraded state
but only discovers this 120s later when its project switch times
out with a confusing 'cannot schedule new futures after shutdown'
error.
This commit adds a /api/gui_health pre-flight check at the start of
test_full_live_workflow. If the GUI is degraded, the test fails
fast (within 1s) with a clear, actionable message that includes:
- The exact RuntimeError that caused the degradation
- The full traceback of the last ImGui scope mismatch
- A note that the new test cannot proceed with a dirty state
Per user feedback 2026-06-08: 'I don't want a batch to be too fragile
where I can't restart the app and continue with the next test file
if it fails. Just has to note that the new file didn't get to deal
with a dirty state.'
Also includes the planning documents written earlier in this session:
- TODO_test_full_live_workflow_v2.md (task list)
- test_full_live_workflow_imgui_assert_20260608.md (root cause report)
- test_full_live_workflow_propagation_digest_20260608.md (solutions digest)
- batch_resilience_plan_20260608.md (batch resilience plan)
Verification:
- test_full_live_workflow in isolation: 13.45s PASS (health=True, no degrade)
- 4 sims + test_full_live_workflow in batch: 76.46s (1 FAIL fast, 4 sims PASS)
- Without PR3 fix: 200s FAIL with confusing 120s timeout
- With PR3 fix: 76s FAIL with clear 'GUI is degraded' message
- The fast-fail is observable, not silent (per user's 'wrap might be
worth it if that properly lets us handle the assert')
Gitea (and any case-sensitive filesystem) was rendering the [Top]
nav links in /docs as broken because of two bugs:
1. Case-sensitivity: 22 links used '../README.md' (all-uppercase)
but the actual file is 'docs/Readme.md' (capital R, lowercase
rest). 21 guide_*.md nav bars were affected, plus 1 internal
cross-link in Readme.md itself. Works on Windows (case-
insensitive) but broken on Linux/Gitea.
Fix: 22 occurrences across 22 files changed
'../README.md' -> '../Readme.md'
2. Wrong relative-path level: 16 links used '../../conductor/...'
from 'docs/guide_*.md' to reach 'conductor/'. This goes up 2
levels to 'projects/', which doesn't exist. The correct path
from 'docs/guide_*.md' to 'conductor/' is 1 level up
('../conductor/...'). 12 unique patterns across 10 files
affected.
Fix: 16 occurrences across 10 files changed
'../../conductor/' -> '../conductor/'
3. Bonus: 1 planned-guide link in guide_context_curation.md
referenced a never-written 'guide_context_presets.md'. The
ContextPreset schema is now fully covered in the new
'guide_context_aggregation.md' (per the 2026-06-08 docs
refresh). Fix: link target updated.
No content was changed, only link paths. 24 files, 37 link
replacements, 37 deletions.
Verification:
- All .md links in docs/ now resolve to existing files
(validated by path-resolution check from each file's directory)
- The 3 new guides from the previous docs refresh commit
(guide_discussions.md, guide_state_lifecycle.md,
guide_context_aggregation.md) had the case bug inherited from
guide_architecture.md's existing nav pattern; their top-of-file
nav bars are now correct
- The 21 pre-existing guide nav bars that had the same bug
(all 21 of them, except the 3 that used the correct case:
guide_mma.md, guide_simulations.md, guide_tools.md) are now
also fixed
- Inter-guide links (e.g. [Discussions](guide_discussions.md))
were not affected; they were always correct because both the
link text and the actual filename are lowercase
This is a docs-only fix. No code modified.
Per the docs Refresh Protocol (conductor/workflow.md), after a
reference/analysis track ships, the affected guides must be updated
to reflect new module structure or new conventions. The nagent_review
track (9cc51ca9) produced a deep-dive + 10 actionable takeaways that
named 3 documentation gaps in /docs. This commit fills them.
3 new guides (1,122 lines total):
1. guide_discussions.md (353 lines) — The Discussion system
- 23-operation matrix: A1-A7 per-entry + B1-B11 discussion-level
+ C1-C5 undo/redo
- Take naming convention (<base>_take_<n>), branching, promotion
- User-managed role list (app.disc_roles)
- Per-role filter linked to MMA persona focus
- _disc_entries_lock thread-safety contract
- Hook API session endpoints
- Persistence: _flush_to_project, _flush_disc_entries_to_project,
context_snapshot
- 9 file:line refs into gui_2.py:3770-4260 + history.py
2. guide_state_lifecycle.md (375 lines) — Undo/redo + reset + state
delegation
- HistoryManager + UISnapshot (13 captured fields, 100-snapshot
capacity, debounced change-detection at render frame)
- _handle_reset_session (clears 30+ fields, replaces project,
preserves active_project_path per the 2026-06-08 regression fix)
- App.__getattr__/__setattr__ state delegation to Controller
- 4-thread access pattern with 7 lock-protected regions
- State persistence: in-memory vs project TOML vs config TOML
- Hot-reload integration
- Hook API registries (_predefined_callbacks, _gettable_fields)
- 14 file:line refs into gui_2.py:1140-1170, history.py,
app_controller.py:3286-3356
3. guide_context_aggregation.md (394 lines) — The aggregate.py
pipeline
- 3 aggregation strategies (auto, summarize, full)
- 7 per-file view modes (full, summary, skeleton, outline,
masked, custom, none)
- Full FileItem schema (9 fields + __post_init__ normalizer)
at models.py:510-559
- ContextPreset schema and ContextPresetManager
- Tier 3 worker variant (build_tier3_context with FuzzyAnchor
re-resolution and focus-file handling)
- force_full / auto_aggregate short-circuits
- Cache strategy (static prefix + dynamic history)
- 23 file:line refs into aggregate.py:36-518 + models.py:909-937
8 existing guides cross-linked to the 3 new guides and to the
nagent_review track:
- guide_gui_2.md (+ See Also entries for discussions,
state lifecycle, context aggregation,
nagent_review report)
- guide_app_controller.md (+ See Also entries for discussions,
state lifecycle, context aggregation,
nagent_review report)
- guide_context_curation.md (+ new See Also section pointing to
context aggregation + nagent_review)
- guide_architecture.md (+ new See Also section listing all 10
guides + nagent_review report)
- guide_ai_client.md (+ See Also entries for state lifecycle,
context aggregation, nagent_review
pitfalls #2 and #4)
- guide_mma.md (+ new See Also section pointing to
context aggregation, discussions,
nagent_review report §9 + takeaways §3/§10
for SubConversationRunner priority)
- guide_models.md (+ See Also entries for context
aggregation, discussions, nagent_review
report §6 on FileItem as strongest
curation dimension)
- Readme.md (+ 3 new guide entries in the index
table, with one-line summaries)
No code modified. This is documentation only.
Why these 3 guides specifically:
- guide_discussions.md: The discussion system is the user's most
edited surface. nagent_review's report §3 enumerated 23 operations
(A1-C5) that previously existed only as scattered file:line refs
across gui_2.py. A dedicated guide makes the operation matrix
discoverable.
- guide_state_lifecycle.md: The undo/redo + reset + state delegation
machinery is architecturally load-bearing but scattered across 4
files. After nagent_review identified the provider-side history
divergence as Pitfall #4, the relationship between Manual Slop's
state and the provider's state needs explicit documentation.
- guide_context_aggregation.md: aggregate.py (518 lines) is the
most-touched module after ai_client.py but had no dedicated
guide. nagent_review confirmed it's Manual Slop's strongest
curation dimension. A dedicated guide makes the 7 view modes
and 3 strategies discoverable.
The 3 new guides total 1,122 lines and follow the existing
per-source-file deep-dive style (architectural, data-oriented,
state-management-focused).
Every direct dep in pyproject.toml now has a ~X.Y.Z bound
(patch-only). The 7 unconstrained deps (imgui-bundle,
anthropic, google-genai, openai, fastapi, mcp, uvicorn,
plus tomli-w) get explicit tilde bounds discovered from
uv.lock. The 6 >=X.Y.Z deps are normalized to tilde-style
(pinned to the current lock version).
The local-rag optional dep (sentence-transformers) is also
tilde-pinned.
requirements.txt is deleted (was redundant with uv.lock;
the uv project uses uv.lock as the canonical lock file,
which is regenerated locally and gitignored per project
policy at .gitignore:9).
Re-running the audit confirms 0 PIN_VIOLATION (was 7). The
final.md report records the post-cleanup state.
Also adds --report-name CLI flag to the audit script
(default 'initial') so the script can write either
initial.md (Phase 1) or final.md (Phase 2) into the same
report directory.
scripts/audit_license_cve.py: 4 internal checks (license +
CVE + pin + source-header), policy tables (allowlist of
permissive/weak-copyleft/public-domain, blocklist of
non-OSI/restricted-source), and a main() that runs all 4
and emits line-per-violation to stdout + a markdown report.
Tests (26 unit + integration) cover license classifier (16
variants across MIT, BSD, Apache, LGPL, MPL, CC0, WTFPL,
GPL, AGPL, SSPL, BSL, Commons Clause, Elastic, Anti-996,
Hippocratic, unknown), pin check (3), source-header check
(3), license check via importlib.metadata (1), CVE check
via subprocess pip-audit (2), and a smoke test of the main
loop (1).
No new pip deps in the project: pure stdlib
(importlib.metadata, tomllib, pathlib, re) + subprocess to
pip-audit (optional dev tool, installed via 'uv tool install
pip-audit' if user wants CVE checks).
Initial report at docs/reports/license_cve_audit/2026-06-07/
records the current state. The Phase 2 commit will apply
the fixes (tilde-pin, delete requirements.txt); the Phase 3
commit will add --strict mode + baseline file for CI.
Single-session planning digest that captures:
- The 5 tracks fully specced + planned (test_batching, qwen_llama_grok,
data_oriented_error_handling, data_structure_strengthening,
mcp_architecture_refactor)
- Cross-cutting design themes (data-oriented, audit-driven, per-track
commit + git note, out-of-scope-by-default)
- The audit + data foundation (scripts/audit_weak_types.py; 430 -> 60
finding; 0 strong patterns; 26 unique type strings; 86% concentrated
in 6 files)
- The dependency graph + recommended execution order
- Follow-up tracks already planned in spec §12.1 of each track
- Recommended future tracks (post-tracks documentation is the top pick)
- Risks, open questions, and a complete file index
This is the kind of reference document that:
- Future planners consult to understand the codebase's current state
- The implementing agent uses to coordinate across tracks
- The user reviews as a digest of the planning work
Written in the project's docs/reports/ directory alongside the existing
Phase 5 reports (PHASE5_STABILISATION_REPORT.md, MUTATION_MATRIX_PHASE5.md, etc.).
Phase 1, Tasks T1.2 + T1.4 of the startup_speedup_20260606 track.
NEW: scripts/audit_main_thread_imports.py
Static CI gate that AST-walks the import graph reachable from
sloppy.py and fails (exit 1) if any heavy module is imported at the
top of a main-thread-reachable file. Walks into if/elif/else and
try/except branches (which run at import time) but skips function
bodies (which only run when called). Allowlist: stdlib + the lean
gui_2 skeleton (imgui_bundle, defer, src.imgui_scopes, src.theme_2,
src.theme_models, src.paths, src.models, src.events).
NEW: scripts/audit_gui2_imports.py
Read-only analysis tool that lists every top-level and function-level
import in src/gui_2.py, classified by location. Used in Phase 5D to
identify which imports to remove.
NEW: tests/test_audit_main_thread_imports.py
9 tests covering: --help exits 0, clean stdlib-only passes, heavy
third-party fails, google.genai fails, transitive walks, function-
body imports ignored, if-branch imports flagged, try-block imports
flagged, file:line reported. All 9 pass.
NEW: docs/reports/startup_baseline_20260606.txt
3-run median cold-start benchmark. Worst offenders: src.gui_2
(1770ms), simulation.user_agent (1517ms), google.genai (1001ms),
openai (482ms), anthropic (441ms), imgui_bundle (255ms),
src.theme_nerv* (485ms combined), src.markdown_table (243ms),
src.command_palette (242ms).
NEW: docs/reports/startup_audit_20260606.txt
Audit output on the CURRENT codebase. Reports 67 violations across
the main-thread import graph (incl. numpy in src/gui_2.py:9,
tomli_w in src/gui_2.py:18, fastapi + requests in src/app_controller,
tree_sitter_* in src/file_cache, pydantic in src/models, plus all
the src.* subsystem imports that drag in heavy transitive deps).
Phase 3-5 of the track will resolve these one by one.
After Phase 3-5, this audit must exit 0 (no violations).
Co-located reports in docs/reports/ per project convention; the other
agent finished their work in docs/superpowers/ and is unrelated.