129 Commits

Author SHA1 Message Date
ed b4d240a9f3 docs(rag): final report on dim-mismatch recursion fix 2026-06-09 15:04:42 -04:00
ed f207d297a3 docs(rag): final fix report and next steps 2026-06-09 14:38:30 -04:00
ed eb8357ec0e fix(rag): add CWD fallback in index_file for path-resolution resilience
RAGEngine.index_file silently returns when the joined base_dir+file_path
doesn't exist. This caused the RAG batch test to fail with 0 indexed
documents when the live_gui subprocess's active_project_root resolved
to a parent dir (e.g. tests/artifacts/) instead of the workspace
(tests/artifacts/live_gui_workspace/).

The fix: if the primary path doesn't exist, try CWD+file_path. The
base_dir takes priority; CWD is a safety net for relative-path
resolution across the spawn CWD boundary.

This is a defensive fix at the rag_engine layer. It does NOT fix the
underlying path-leakage issue in tests/conftest.py (hardcoded
Path('tests/artifacts/live_gui_workspace')) which needs a proper
fixture refactor. The RAG test still fails in batch due to that
deeper issue, documented in docs/reports/rag_test_batch_failure_status_20260609_pm3.md.

Behavior:
- base_dir+file_path exists: indexed from base_dir (unchanged)
- base_dir+file_path missing, CWD+file_path exists: indexed from CWD (new)
- Both missing: silently returns (unchanged)
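
A minimal sketch of the three-way resolution order above, assuming nothing beyond what the Behavior list states. `resolve_index_path` is a hypothetical helper, not the code in src/rag_engine.py. Its `cwd` parameter exists only so the fallback can be exercised without changing the process CWD.

```python
from pathlib import Path
from typing import Optional


def resolve_index_path(base_dir: Path, file_path: str,
                       cwd: Optional[Path] = None) -> Optional[Path]:
    """Pick the file index_file should read, or None to skip silently."""
    primary = Path(base_dir) / file_path
    if primary.exists():
        return primary  # base_dir always wins when the file is there
    fallback = (Path.cwd() if cwd is None else Path(cwd)) / file_path
    if fallback.exists():
        return fallback  # safety net across the subprocess CWD boundary
    return None  # neither location has it: caller returns without indexing
```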

Verified: tests/test_rag_index_file_path_fallback.py (3 tests, all pass)
- test_index_file_finds_file_via_cwd_fallback
- test_index_file_uses_base_dir_first
- test_index_file_silently_returns_when_no_match

Note: test file was removed before commit because it was being
abandoned along with the broader path-hygiene refactor. The fix
itself is preserved in src/rag_engine.py.
2026-06-09 12:31:21 -04:00
ed 2148e79a1c docs(rag): document venv dep install + new failure mode (relative path bug)
The venv now has sentence-transformers (installed via uv sync --extra local-rag).
The RAG test passes in isolation (7.10s) but fails in batch with a NEW error:
'RAG context not found in history' (test_rag_phase4_final_verify.py:95).

This is a SEPARATE bug from the missing-dep issue. The RAG test uses
RELATIVE file paths ('final_test_1.txt' instead of absolute). The RAG
engine indexes with these relative paths but the CWD is the project
root, not the test's workspace dir. Result: 0 docs indexed, 0 chunks
retrieved, no '## Retrieved Context' block in history.

The fix to _sync_rag_engine (e62266e8) is still correct - it surfaces
the error when the dep is missing. The dep is now installed, so the
sync/index/AI flow runs to completion. The new failure is a deeper
RAG test infrastructure bug that needs a separate track to fix.
2026-06-09 10:21:45 -04:00
ed e62266e868 fix(rag): surface embedding provider init failure as 'error' status
The bug: when the local embedding provider fails to initialize
(e.g. sentence-transformers not installed), RAGEngine.__init__
leaves self.embedding_provider = None (initialized at line 93
but never overwritten by the failing LocalEmbeddingProvider ctor).
The constructor returns. _sync_rag_engine's else branch then
sets status to 'ready' - a lie. The RAG panel shows 'ready'.
The user triggers a retrieval. The engine either has a broken
embedding provider (None) or the retrieval fails silently.
The RAG context never appears in the AI's history.

The fix: in _sync_rag_engine's _task, after RAGEngine(...)
returns, check if engine.embedding_provider is None. If so,
set status to 'error: RAG embedding provider failed to initialize'
and return early. This prevents:
  - The engine from being assigned to self.rag_engine
  - The rebuild being triggered
  - The status being set to 'ready' / 'indexing'
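
The early-return guard can be sketched as follows. `ControllerSketch`, `engine_factory`, `rag_status` and `rebuild_requests` are hypothetical stand-ins for the controller state touched by `_sync_rag_engine`'s `_task`. Only the check on `engine.embedding_provider` and the error string come from this commit message.

```python
from types import SimpleNamespace


class ControllerSketch:
    """Hypothetical stand-in for the controller's RAG sync path."""

    def __init__(self, engine_factory):
        self.engine_factory = engine_factory  # stands in for RAGEngine(...)
        self.rag_engine = None
        self.rag_status = "idle"
        self.rebuild_requests = 0

    def _task(self):
        engine = self.engine_factory()
        if engine.embedding_provider is None:
            # Init failed inside the ctor: report it and stop here, so the
            # broken engine is never assigned and no rebuild is queued.
            self.rag_status = "error: RAG embedding provider failed to initialize"
            return
        self.rag_engine = engine
        self.rebuild_requests += 1  # placeholder for triggering the index rebuild
        self.rag_status = "indexing"


# Usage: a ctor that swallowed the provider failure leaves the provider as None.
broken = ControllerSketch(lambda: SimpleNamespace(embedding_provider=None))
broken._task()
print(broken.rag_status)  # error: RAG embedding provider failed to initialize
```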

Note: this does NOT make the RAG test pass. The test requires
the sentence-transformers package which isn't installed in this
env. The fix makes the failure reliable (not flaky) and surfaces
the right error message.

TDD: 3 tests added in tests/test_rag_engine_ready_status_bug.py:
- RAGEngine ctor raises ImportError on missing sentence-transformers
- _sync_rag_engine sets status to 'error' (not 'ready') on init failure
- RAGEngine ctor leaves embedding_provider=None when init fails

All 3 pass. The RAG batch test now fails reliably at line 46
with the clear error message.
2026-06-09 09:39:02 -04:00
conductor-tier2 adc7ff8029 docs(audit): workflow/agent markdown audit with 10 recommendations
User asked: is there anything in our workflow or agent markdown
that should be updated or introduced based on this session?

This commit is the AUDIT ONLY. No workflow files are modified.
The 10 recommendations are not yet applied. User picks which to
act on, which to defer, which to discard.

docs/reports/workflow_markdown_audit_20260608.md (~370 lines):

Read all the workflow/agent markdown in scope (AGENTS.md,
CLAUDE.md, GEMINI.md, all 5 .agents/skills/*/SKILL.md, the 4
.agents/agents/*.md, conductor/workflow.md, product.md,
product-guidelines.md, tech-stack.md, index.md, tracks.md,
edit_workflow.md, the 2 existing code_styleguides/*.md, and the
4 .agents/policies/*.toml + 7 .agents/tools/*.json).

Cross-referenced each against the 7 new session artifacts
(nagent_review, 3 docs guides, ASCII-sketch workflow, SSDL
digest, C11 interop v1+v2, 2 new tracks) and the 3
user-correction patterns (duffle-as-style-ref, v2
request/response model, "only under hard constraint").

The 10 recommendations:
1 (HIGH) Update architecture-fallback with new docs
2 (HIGH) Document ASCII-sketch workflow in workflow.md
3 (HIGH) Document SSDL digest in product-guidelines.md
4 (HIGH) Add user_corrections_log to State.toml Template
5 (MED) Document contingency track pattern
6 (MED) Update Compaction Recovery to reference session_synthesis
7 (MED) Document v1->v2 framing iteration anti-pattern
8 (MED) Document preserve-before-compact archive pattern
9 (LOW) Document MiniMax understand_image for ASCII verification
10 (LOW) Document per-proposal commit chain with git notes

4 HIGH-priority = ~75 min to act on. All 10 = ~2-3 hours.

The audit is conservative: it does NOT recommend changing TDD,
the per-task commit discipline, the 4-tier MMA model,
product.md, tech-stack.md, the existing styleguides, or
adding new audit scripts. The session did not surface conflicts
with any of these.

Meta-pattern: workflow/agent markdown is the theoretical
contract; session artifacts are the empirical evidence; when
the two diverge, update the theory to match the evidence.
This session's evidence (new methodology, new vocabulary, new
patterns, new anti-patterns) drives the 10 recommendations.
2026-06-09 09:15:57 -04:00
ed 37b9a68017 docs: add test_infra_hardening foundation + RAG batch failure status
Foundation document for the future test_infra_hardening track that
will address session-scoped live_gui fixture isolation, silent
__getattr__/__setattr__ contract assumptions, and similar test
infrastructure fragility.

Also documents the test_rag_phase4_final_verify batch failure
that surfaces after the __getattr__ fix unblocks
test_full_live_workflow. The RAG test failure is NOT a regression:
it reproduces on pre-fix HEAD too. It's a pre-existing test
isolation issue (the live_gui fixture is session-scoped, so state
from the 4 sims pollutes the controller).
2026-06-09 00:26:05 -04:00
ed bcdc26d0bd fix(gui): correct __getattr__ to not silently return None for missing ui_ attrs
PR1 follow-up (the actual IM_ASSERT root cause fix).

The IM_ASSERT in 'MainDockSpace' was triggered by the
render_approve_script_modal function (gui_2.py:4895) calling
imgui.checkbox with a None value for app.ui_approve_modal_preview.

The chain of bugs:

1. AppController.__getattr__ returned None for ANY ui_ attribute
   (lines 1237-1238). This was intended as a safety net for ui_*
   flags defined in __init__, but it was too generous: it also
   returned None for ui_ attrs that were NEVER set.

2. The pattern in render_approve_script_modal:
      if not hasattr(app, 'ui_approve_modal_preview'):
          app.ui_approve_modal_preview = False
      _, app.ui_approve_modal_preview = imgui.checkbox(..., app.ui_approve_modal_preview)
   relied on hasattr() returning False for unset attrs to trigger
   the initialization. But the App.__setattr__ checks
   hasattr(self.controller, name) to decide where to route
   assignments. The controller's __getattr__ returned None for
   ui_approve_modal_preview, so hasattr() returned True. The
   App.__setattr__ routed the assignment to the controller.
   The controller's __getattr__ then returned None on read,
   silently dropping the False value.

3. The next line called imgui.checkbox with None, which raised
   a TypeError. The TypeError propagated out of
   render_approve_script_modal without closing the modal,
   leaving the ImGui scope stack unbalanced. The unbalanced
   scope triggered IM_ASSERT(Missing End()) on the next frame.

Fix: AppController.__getattr__ now only returns None for an
EXPLICIT allowlist of ui_ attrs that are defined in __init__.
For any other missing attribute (including the case
'hasattr() should return False'), it raises AttributeError.

The App.__getattr__ was also fixed (per the test) to check
hasattr(controller, name) before delegating. This is defense in
depth in case other __getattr__ patterns are added.
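
A condensed model of the fixed delegation, for illustration only: `AppControllerSketch` and `AppSketch` are hypothetical, and `ui_show_settings` is an invented allowlist entry. It shows why raising AttributeError matters. Because `hasattr()` now returns False for an unset `ui_` name, the lazy-init assignment lands on the App instead of being routed to the controller.

```python
class AppControllerSketch:
    # Hypothetical allowlist; the real one lists the ui_ flags that
    # AppController.__init__ defines.
    _UI_NONE_ALLOWLIST = frozenset({"ui_show_settings"})

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name in self._UI_NONE_ALLOWLIST:
            return None
        raise AttributeError(name)  # makes hasattr() return False


class AppSketch:
    def __init__(self, controller):
        object.__setattr__(self, "controller", controller)

    def __getattr__(self, name):
        controller = object.__getattribute__(self, "controller")
        if hasattr(controller, name):  # check before delegating
            return getattr(controller, name)
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Route to the controller only if it genuinely has the attribute.
        if hasattr(self.controller, name):
            setattr(self.controller, name, value)
        else:
            object.__setattr__(self, name, value)


app = AppSketch(AppControllerSketch())
# The lazy-init pattern from render_approve_script_modal now works:
if not hasattr(app, "ui_approve_modal_preview"):
    app.ui_approve_modal_preview = False
print(app.ui_approve_modal_preview)  # False
```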

Test verification (TDD red → green):
- 1/1 test_app_getattr_hasattr_bug PASSES (verifies hasattr
  returns False for unset attrs via App.__getattr__)
- 1/1 test_app_controller_getattr_ui_bug PASSES (verifies hasattr
  returns False for unset ui_ attrs on controller)

Live verification:
- 4 sims + test_live_workflow + 2 markdown tests: 7/7 PASS in 83.15s
- Previously failed at 200s+ with 'cannot schedule new futures after
  shutdown' / 121s with 'GUI is degraded before test starts'
- Now passes cleanly. The IM_ASSERT no longer fires.

13/13 related unit tests pass (app_controller_* + app_run_* +
app_getattr_*). No regressions in 51/51 io_pool/warmup/sigint/etc.
unit tests.
2026-06-08 23:45:25 -04:00
conductor-tier2 999fdea467 docs(c11-interop): cross-reference SSDL digest in See Also
The SSDL digest (docs/reports/computational_shapes_ssdl_digest_20260608.md,
504 lines, 30KB) is the theoretical foundation for the chunkification
pattern. Per the digest's Technique 5 "Assume-away (Xar)" in §2.2
and the "Xar-style chunked arrays" recommendation in §5.2, the
chunkification track is a *direct application* of the SSDL's
"assume as much as possible" lens (§4).

This commit adds the SSDL digest to the See Also of the v1+v2
C11-Python interop assessment (front-matter Cross-references line).
The same cross-reference is also being added to:
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md
  (in a new §6.1 "SSDL alignment" subsection)
- conductor/tracks/manual_ux_validation_20260608_PLACEHOLDER/spec.md
  (in §5 Architectural Reference + §6 See Also + a new §2.6
  "SSDL cross-reference" section that distinguishes GUI ASCII
  vocabulary from SSDL vocabulary)

No code modified. Cross-reference only.

Also: small update to conductor/tracks.md to add the 2 new
tracks (manual_ux_validation_20260608_PLACEHOLDER as Active;
chunkification_optimization_20260608_PLACEHOLDER as Backlog/Contingency).
2026-06-08 23:42:21 -04:00
conductor-tier2 12311190b3 docs(interop-v2): part 3 revises the recommendation after user's threshold-shift + shape-change corrections
The user pushed back on the v1 recommendation (commit 68354841) twice
in this turn. Both corrections reshape the answer.

Correction 1 (already incorporated): duffle.h + pikuma ps1 are a
C11 STYLE REFERENCE, not an interop pattern. (Captured in v1 §0.)

Correction 2 (NEW, this commit): The C11 path is only worth it under
a hard constraint that no existing Python package can solve. The
shape is request-blob -> C11 pipeline -> response-blob, NOT a
stateful C extension with a Python-facing API. Targets cited:
parsing markdown files/sources into aggregate markdown, context
snapshot processing, "possibly other things."

This commit adds Part 3 (sections 3.1-3.12) to the existing doc.
Part 1 (style) and Part 2 (general interop) stay as background.
Section 4 is re-flagged as "SUPERSEDED - see Part 3".

Part 3 covers:
- The two moves the user's second correction made (threshold-shift
  on when, shape-change on what)
- Grounded analysis of the 2 cited targets against actual code:
  * src/aggregate.py:380-454 (current markdown hot path is
    pure-Python string concat; pyproject.toml has zero
    third-party markdown deps)
  * src/history.py:1-141 (snapshot processing is bounded
    ~500KB at 100-snapshot capacity; pickle is the obvious
    cheap fix, not C11)
- The request/response wire format design space (text vs binary
  vs hybrid envelope-text+payload-binary)
- The pipeline API shape (single C entry point, subprocess-launch
  model)
- Revised answer to the "chunkification" question (chunk-array
  becomes an internal C implementation detail, not a Python
  type)
- Decision tree: profile first, try existing Python packages,
  only reach for C11 when hard constraint surfaces
- The 4 questions to revisit when constraint surfaces
- Revised insight: v2 (subprocess + wire format) is strictly
  more tractable than v1 (stateful C extension)
- Track implications: chunkification_optimization becomes a
  1-page contingency, not a full track; manual_ux_validation
  unaffected and confirmed
- v2 verdict matrix (11 rows) replacing v1's 7
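
One point in that design space, the hybrid envelope-text + payload-binary option, could be framed as below. The field names (`op`, `payload_len`) and the layout of one JSON line followed by raw bytes are assumptions for illustration, not the format Part 3 settles on. Only the Python half is shown; a C11 pipeline in the subprocess model would read the same frames from stdin and write responses to stdout.

```python
import io
import json
from typing import BinaryIO, Tuple


def write_message(stream: BinaryIO, envelope: dict, payload: bytes) -> None:
    """Text envelope (one compact JSON line) followed by a raw binary payload."""
    header = dict(envelope, payload_len=len(payload))
    # json.dumps escapes newlines inside strings, so the header is one line.
    stream.write(json.dumps(header, separators=(",", ":")).encode("utf-8") + b"\n")
    stream.write(payload)


def read_message(stream: BinaryIO) -> Tuple[dict, bytes]:
    line = stream.readline()
    if not line.endswith(b"\n"):
        raise EOFError("truncated envelope")
    header = json.loads(line)
    n = header.pop("payload_len")
    payload = stream.read(n)
    if len(payload) != n:
        raise EOFError("truncated payload")
    return header, payload


# Usage: two frames back to back in one buffer (stand-in for a pipe).
buf = io.BytesIO()
write_message(buf, {"op": "aggregate_markdown", "version": 1}, b"# a\n\x00binary")
write_message(buf, {"op": "noop"}, b"")
buf.seek(0)
```

Length-prefixing the payload keeps the binary part opaque: it may contain newlines or NUL bytes without confusing the reader.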

Cross-references the actual code paths I read this turn:
- src/aggregate.py:380-454 (build_markdown_from_items)
- src/summarize.py:1-219 (the 3 _summarise_* functions)
- src/history.py:1-141 (UISnapshot, HistoryManager)
- pyproject.toml:6-27 (no markdown deps)

The user is right to push back. The v1 framing was over-engineered.
"Build a stateful C extension" assumed a future need; the actual
answer is "wait for a real bottleneck, then build a simple
subprocess pipeline." The 843-line doc now captures both the
v1 over-engineering AND the v2 contingency plan, so future
sessions can see the iteration and learn from it.
2026-06-08 23:07:24 -04:00
conductor-tier2 68354841cb docs(interop-assessment): C11 <-> Python interop design space for chunkification_optimization
The user asked a sharp, skeptical question: can a chunk-based C11
data structure actually interop with Python's runtime in a way
that's useful for Manual Slop? They explicitly corrected my
first-draft framing (the duffle.h + pikuma ps1 files are a C11
*style reference*, not an interop pattern). The assessment
investigates honestly and reports tractable-vs-not.

docs/reports/c11_python_interop_assessment_20260608.md (564 lines, 38KB):

Part 1: C11 style reference summary
- 11 style observations from reading duffle.h + main.c + pikuma
  ps1 duffle/ + hello_gte.c end-to-end
- Byte-width typedef convention (U1/U2/U4/U8, S1/S2/S4/S8, B1-B8, F4/F8)
- The macro meta-DSL (Struct_/Enum_/Array_/Slice_/Opt_/Ret_)
- The I_/IA_/N_ inline discipline
- The r/v pointer rule (restrict OR volatile, never both, never const)
- Slice + Slice_T as the data-structure primitive
- FArena as the allocation primitive (single-buffer, NOT chunked)
- defer/defer_rewind/scope as the cleanup primitive
- KTL (linear key-value table) as the "assume small N" pattern
- What a chunk-array in duffle.h style would look like

Part 2: Interop design space (the actual question)
- 5 candidate interop layers: ctypes, cffi, pybind11, custom
  CPython C extension, NumPy wrap
- Honest assessment matrix: build cost, per-op overhead, style
  fit, lego-set pattern support
- Verdict: custom CPython C extension is most tractable; pybind11
  is style-mismatched; ctypes/cffi work for non-hot-path
- What "MVP chunked C11 package" requires (~500-1000 LOC total)
- 5 questions to ask the user before this becomes a track
- Crucial insight: the user's "unorthodox" interop is most likely
  duffle.h-style C11 + thin PyTypeObject glue at the bottom of
  the same .h file. Tractable, style-fit high.

Cross-references the 5 sources:
- docs/transcripts/i-h95QIGchY (Reece's Xar reference impl)
- docs/ideation/ed_chunk_data_structures_20260523.md
- docs/reports/session_synthesis_20260608.md (the original proposal)
- src/app_controller.py:716 (the comms.log target)
- The user's local forth_bootslop + pikuma ps1 repos (read in full)

This is a follow-on to the synthesis's 2 proposed tracks
(manual_ux_validation_20260608_PLACEHOLDER + chunkification_optimization_20260608_PLACEHOLDER).
The user's question resolved the "skeptical of #2" concern by
scoping the tractable path: CPython C extension in duffle.h style.
The "lego-set of user-defined Python->C11 chunk ops" is NOT
tractable without a Python->C11 AST emitter, which is a
different (much larger) track.
2026-06-08 22:50:03 -04:00
conductor-tier2 77d7dff5ff docs(session-synthesis): preserve-before-compact archive of the 2026-06-08 session
The user explicitly requested the most in-depth report I could
produce at 478,992 tokens used (94% of the context window). The next
session will start with a fresh context; these two documents are
the minimum-sufficient anchor.

docs/reports/session_synthesis_20260608.md (579 lines, 40KB):
- 12 sections covering every artifact this session produced
- The 5 sources loaded: 2 YouTube transcripts + 2 Fleury
  articles + user's chunk-ideation archive
- The 10 commits in the session's commit chain (with the
  user's test-fragility work adjacent but not mine)
- The 4 audit-time heuristics derived from the 5-source lens
- The "what the user should know" section for next session

docs/reports/proposed_new_tracks_20260608.md (190 lines, 12KB):
- 2 new tracks proposed (manual_ux_validation_20260608_PLACEHOLDER,
  chunkification_optimization_20260608_PLACEHOLDER) with
  spec-ready detail
- 8 non-recommendations (so the user knows what I'm NOT
  suggesting)
- A "what I'd recommend" section with which-track-when
  sequencing

No code modified. Both are session-final artifacts, not tracks.
They live in docs/reports/ alongside the other session outputs
(SSDL digest, ASCII-sketch workflow, chunk ideation archive).

Cross-references the 5 sources (all committed to docs/transcripts/
and docs/ideation/ in earlier user commits):

- docs/transcripts/wo84LFzx5nI_big_oops_casemuratori.txt
- docs/transcripts/i-h95QIGchY_assuming_as_much_as_possible_andrewreece.txt
- docs/ideation/ed_chunk_data_structures_20260523.md
- docs/reports/computational_shapes_ssdl_digest_20260608.md
- docs/reports/ascii_sketch_ux_workflow_20260608.md

These 5 documents are the session's "thinking-aid" corpus. The
synthesis is the *index*; together they're the minimum-sufficient
context to re-anchor any future session.
2026-06-08 22:25:00 -04:00
ed 2eef50c5c2 transcripts 2026-06-08 21:49:35 -04:00
ed d7b66a5dda ideating chunk-based data structures 2026-06-08 21:45:30 -04:00
ed 0be9b4f0fb digest on computational shapes ssdl 2026-06-08 21:23:11 -04:00
ed 51ecace464 test(live_workflow): pre-flight health check fails fast on dirty state
PR3 of the test_full_live_workflow_imgui_assert fix sequence.

When a prior live_gui test in the same session crashes the GUI (e.g.
via an ImGui IM_ASSERT from cumulative panel state), the controller's
_io_pool gets shut down. The next test starts in a degraded state
but only discovers this 120s later when its project switch times
out with a confusing 'cannot schedule new futures after shutdown'
error.

This commit adds a /api/gui_health pre-flight check at the start of
test_full_live_workflow. If the GUI is degraded, the test fails
fast (within 1s) with a clear, actionable message that includes:
- The exact RuntimeError that caused the degradation
- The full traceback of the last ImGui scope mismatch
- A note that the new test cannot proceed with a dirty state

Per user feedback 2026-06-08: 'I don't want a batch to be too fragile
where I can't restart the app and continue with the next test file
if it fails. Just has to note that the new file didn't get to deal
with a dirty state.'

Also includes the planning documents written earlier in this session:
- TODO_test_full_live_workflow_v2.md (task list)
- test_full_live_workflow_imgui_assert_20260608.md (root cause report)
- test_full_live_workflow_propagation_digest_20260608.md (solutions digest)
- batch_resilience_plan_20260608.md (batch resilience plan)

Verification:
- test_full_live_workflow in isolation: 13.45s PASS (health=True, no degrade)
- 4 sims + test_full_live_workflow in batch: 76.46s (1 FAIL fast, 4 sims PASS)
  - Without PR3 fix: 200s FAIL with confusing 120s timeout
  - With PR3 fix: 76s FAIL with clear 'GUI is degraded' message
- The fast-fail is observable, not silent (per user's 'wrap might be
  worth it if that properly lets us handle the assert')
2026-06-08 21:17:54 -04:00
ed d7a065e9d5 ascii gui comms workflow ideation 2026-06-08 20:32:42 -04:00
conductor-tier2 161ebb0da6 docs(fix): correct nav link case + relative-path level
Gitea (and any case-sensitive filesystem) was rendering the [Top]
nav links in /docs as broken because of two bugs:

1. Case-sensitivity: 22 links used '../README.md' (all-uppercase)
   but the actual file is 'docs/Readme.md' (capital R, lowercase
   rest). 21 guide_*.md nav bars were affected, plus 1 internal
   cross-link in Readme.md itself. Works on Windows (case-
   insensitive) but broken on Linux/Gitea.

   Fix: 22 occurrences across 22 files changed
   '../README.md' -> '../Readme.md'

2. Wrong relative-path level: 16 links used '../../conductor/...'
   from 'docs/guide_*.md' to reach 'conductor/'. This goes up 2
   levels, out of the repo into its parent directory ('projects/'),
   which has no 'conductor/'. The correct path from
   'docs/guide_*.md' to 'conductor/' is 1 level up
   ('../conductor/...'). 12 unique patterns across 10 files were
   affected.

   Fix: 16 occurrences across 10 files changed
   '../../conductor/' -> '../conductor/'

3. Bonus: 1 planned-guide link in guide_context_curation.md
   referenced a never-written 'guide_context_presets.md'. The
   ContextPreset schema is now fully covered in the new
   'guide_context_aggregation.md' (per the 2026-06-08 docs
   refresh). Fix: link target updated.

No content was changed, only link paths. 24 files, 37 link
replacements, 37 deletions.

Verification:
- All .md links in docs/ now resolve to existing files
  (validated by path-resolution check from each file's directory)
- The 3 new guides from the previous docs refresh commit
  (guide_discussions.md, guide_state_lifecycle.md,
  guide_context_aggregation.md) had the case bug inherited from
  guide_architecture.md's existing nav pattern; their top-of-file
  nav bars are now correct
- All 21 pre-existing guide nav bars that had the same bug (all of
  them except the 3 that already used the correct case:
  guide_mma.md, guide_simulations.md, guide_tools.md) are now
  also fixed
- Inter-guide links (e.g. [Discussions](guide_discussions.md))
  were not affected; they were always correct because both the
  link target and the actual filename are lowercase
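
The path-resolution check from each file's directory could look roughly like the sketch below. It is hypothetical (the commit does not include the checker) and handles only inline `[text](target)` links. It also compares each path component against the real directory listing, so a '../README.md' vs 'Readme.md' mismatch is caught even on a case-insensitive Windows filesystem, and it rejects links that climb out of the repo.

```python
import os
import re
from pathlib import Path
from typing import List, Tuple

# Inline links only: [text](target). Reference-style links are not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def link_ok(repo: Path, source: Path, target: str) -> bool:
    """Resolve target from source's directory, requiring exact on-disk case."""
    target = target.split("#", 1)[0]
    if not target or "://" in target:
        return True  # pure anchors and external URLs are out of scope
    rel = os.path.normpath(os.path.join(os.path.relpath(source.parent, repo), target))
    parts = Path(rel).parts
    if parts and parts[0] == "..":
        return False  # escapes the repo, e.g. '../../conductor/' from docs/
    current = repo
    for part in parts:
        if part == ".":
            continue
        # os.listdir reports real casing even on case-insensitive filesystems.
        if not current.is_dir() or part not in os.listdir(current):
            return False
        current = current / part
    return True


def broken_links(repo: Path, docs_dir: Path) -> List[Tuple[str, str]]:
    broken = []
    for md in sorted(docs_dir.rglob("*.md")):
        for target in LINK_RE.findall(md.read_text(encoding="utf-8")):
            if not link_ok(repo, md, target):
                broken.append((md.relative_to(repo).as_posix(), target))
    return broken
```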

This is a docs-only fix. No code modified.
2026-06-08 19:51:55 -04:00
conductor-tier2 ba05168493 docs(refresh): 3 new guides + cross-links from nagent_review
Per the docs Refresh Protocol (conductor/workflow.md), after a
reference/analysis track ships, the affected guides must be updated
to reflect new module structure or new conventions. The nagent_review
track (9cc51ca9) produced a deep-dive + 10 actionable takeaways that
named 3 documentation gaps in /docs. This commit fills them.

3 new guides (1,122 lines total):

1. guide_discussions.md (353 lines) — The Discussion system
   - 23-operation matrix: A1-A7 per-entry + B1-B11 discussion-level
     + C1-C5 undo/redo
   - Take naming convention (<base>_take_<n>), branching, promotion
   - User-managed role list (app.disc_roles)
   - Per-role filter linked to MMA persona focus
   - _disc_entries_lock thread-safety contract
   - Hook API session endpoints
   - Persistence: _flush_to_project, _flush_disc_entries_to_project,
     context_snapshot
   - 9 file:line refs into gui_2.py:3770-4260 + history.py

2. guide_state_lifecycle.md (375 lines) — Undo/redo + reset + state
   delegation
   - HistoryManager + UISnapshot (13 captured fields, 100-snapshot
     capacity, debounced change-detection at render frame)
   - _handle_reset_session (clears 30+ fields, replaces project,
     preserves active_project_path per the 2026-06-08 regression fix)
   - App.__getattr__/__setattr__ state delegation to Controller
   - 4-thread access pattern with 7 lock-protected regions
   - State persistence: in-memory vs project TOML vs config TOML
   - Hot-reload integration
   - Hook API registries (_predefined_callbacks, _gettable_fields)
   - 14 file:line refs into gui_2.py:1140-1170, history.py,
     app_controller.py:3286-3356

3. guide_context_aggregation.md (394 lines) — The aggregate.py
   pipeline
   - 3 aggregation strategies (auto, summarize, full)
   - 7 per-file view modes (full, summary, skeleton, outline,
     masked, custom, none)
   - Full FileItem schema (9 fields + __post_init__ normalizer)
     at models.py:510-559
   - ContextPreset schema and ContextPresetManager
   - Tier 3 worker variant (build_tier3_context with FuzzyAnchor
     re-resolution and focus-file handling)
   - force_full / auto_aggregate short-circuits
   - Cache strategy (static prefix + dynamic history)
   - 23 file:line refs into aggregate.py:36-518 + models.py:909-937
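
The bounded history with debounced change detection that guide_state_lifecycle.md documents can be modeled roughly as below. This is a hypothetical sketch, not HistoryManager itself: snapshots are any equality-comparable value (the real UISnapshot captures 13 fields), and `settle_frames`, the number of unchanged frames that counts as "settled", is an assumed knob. The 100-snapshot default mirrors the capacity stated above.

```python
from collections import deque
from typing import Any, Deque, List, Optional


class HistorySketch:
    def __init__(self, capacity: int = 100, settle_frames: int = 2) -> None:
        self.undo_stack: Deque[Any] = deque(maxlen=capacity)  # oldest entries fall off
        self.redo_stack: List[Any] = []
        self.settle_frames = settle_frames
        self._started = False
        self._committed: Any = None  # last state recorded as an undo point
        self._latest: Any = None     # most recent state observed
        self._quiet = 0              # frames since _latest last changed

    def observe(self, snap: Any) -> None:
        """Call once per render frame with the current UI snapshot."""
        if not self._started:
            self._started = True
            self._committed = self._latest = snap
            return
        if snap != self._latest:
            self._latest, self._quiet = snap, 0  # still changing: keep waiting
            return
        self._quiet += 1
        if self._latest != self._committed and self._quiet >= self.settle_frames:
            self.undo_stack.append(self._committed)
            self.redo_stack.clear()  # a new edit invalidates redo
            self._committed = self._latest

    def undo(self) -> Optional[Any]:
        if not self.undo_stack:
            return None
        self.redo_stack.append(self._committed)
        self._committed = self._latest = self.undo_stack.pop()
        self._quiet = 0
        return self._committed
```

Debouncing means a burst of per-frame edits (typing, dragging a slider) collapses into one undo point instead of one per frame.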

8 existing guides cross-linked to the 3 new guides and to the
nagent_review track:

- guide_gui_2.md           (+ See Also entries for discussions,
                           state lifecycle, context aggregation,
                           nagent_review report)
- guide_app_controller.md  (+ See Also entries for discussions,
                           state lifecycle, context aggregation,
                           nagent_review report)
- guide_context_curation.md (+ new See Also section pointing to
                            context aggregation + nagent_review)
- guide_architecture.md    (+ new See Also section listing all 10
                           guides + nagent_review report)
- guide_ai_client.md       (+ See Also entries for state lifecycle,
                           context aggregation, nagent_review
                           pitfalls #2 and #4)
- guide_mma.md             (+ new See Also section pointing to
                           context aggregation, discussions,
                           nagent_review report §9 + takeaways §3/§10
                           for SubConversationRunner priority)
- guide_models.md          (+ See Also entries for context
                           aggregation, discussions, nagent_review
                           report §6 on FileItem as strongest
                           curation dimension)
- Readme.md                (+ 3 new guide entries in the index
                           table, with one-line summaries)

No code modified. This is documentation only.

Why these 3 guides specifically:

- guide_discussions.md: The discussion system is the user's most
  edited surface. nagent_review's report §3 enumerated 23 operations
  (A1-C5) that previously existed only as scattered file:line refs
  across gui_2.py. A dedicated guide makes the operation matrix
  discoverable.

- guide_state_lifecycle.md: The undo/redo + reset + state delegation
  machinery is architecturally load-bearing but scattered across 4
  files. After nagent_review identified the provider-side history
  divergence as Pitfall #4, the relationship between Manual Slop's
  state and the provider's state needs explicit documentation.

- guide_context_aggregation.md: aggregate.py (518 lines) is the
  most-touched module after ai_client.py but had no dedicated
  guide. nagent_review confirmed it's Manual Slop's strongest
  curation dimension. A dedicated guide makes the 7 view modes
  and 3 strategies discoverable.

The 3 new guides total 1,122 lines and follow the existing
per-source-file deep-dive style (architectural, data-oriented,
state-management-focused).
2026-06-08 19:26:08 -04:00
ed 08ee7547be docs(reports): root cause report for test_full_live_workflow race condition 2026-06-08 09:24:14 -04:00
ed 5252b6d782 docs(testing): document new run_tests_batched.py in Running Tests section 2026-06-08 01:00:50 -04:00
ed bcca069c3b t2 report 2026-06-07 18:08:04 -04:00
ed 20fa355838 chore(deps): tilde-pin all deps; delete requirements.txt
Every direct dep in pyproject.toml now has a ~=X.Y.Z bound
(patch-only). The 7 unconstrained deps (imgui-bundle,
anthropic, google-genai, openai, fastapi, mcp, uvicorn),
plus tomli-w, get explicit tilde bounds discovered from
uv.lock. The 6 >=X.Y.Z deps are normalized to tilde-style
(pinned to the current lock version).
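
In pyproject.toml, a patch-only bound is written with PEP 440's compatible-release operator: `~=1.2.3` means `>=1.2.3, ==1.2.*`. The sketch below shows a pin check in that spirit. `pin_violation` and `allowed_by_pin` are hypothetical, not the audit script's actual PIN_VIOLATION logic, and the version comparison only handles plain numeric X.Y.Z versions.

```python
import re

# Splits "name[extras] <spec> ; marker" into the name and the version spec.
_REQ_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*([^;]*)")
_TILDE_PIN = re.compile(r"~=\s*\d+\.\d+\.\d+")


def pin_violation(requirement: str) -> bool:
    """True unless the requirement carries a three-part ~=X.Y.Z pin."""
    m = _REQ_RE.match(requirement)
    spec = m.group(2).strip() if m else ""
    return _TILDE_PIN.fullmatch(spec) is None


def allowed_by_pin(pin: str, version: str) -> bool:
    """~=X.Y.Z admits versions >= X.Y.Z within the same X.Y series."""
    base = [int(p) for p in pin.split(".")]
    cand = [int(p) for p in version.split(".")]
    return cand[: len(base) - 1] == base[:-1] and cand >= base
```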

The local-rag optional dep (sentence-transformers) is also
tilde-pinned.

requirements.txt is deleted (was redundant with uv.lock;
the uv project uses uv.lock as the canonical lock file,
which is regenerated locally and gitignored per project
policy at .gitignore:9).

Re-running the audit confirms 0 PIN_VIOLATION (was 7). The
final.md report records the post-cleanup state.

Also adds --report-name CLI flag to the audit script
(default 'initial') so the script can write either
initial.md (Phase 1) or final.md (Phase 2) into the same
report directory.
2026-06-07 15:15:30 -04:00
ed a8ae11d3a8 chore(audit): add license_cve audit script + initial report
scripts/audit_license_cve.py: 4 internal checks (license +
CVE + pin + source-header), policy tables (allowlist of
permissive/weak-copyleft/public-domain, blocklist of
non-OSI/restricted-source), and a main() that runs all 4
and emits line-per-violation to stdout + a markdown report.
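
A rough shape for a license classifier driven by policy tables. The tables, the `review` tier for strong copyleft (GPL/AGPL), and the token-matching approach are all assumptions for illustration; the commit does not say how scripts/audit_license_cve.py classifies GPL or AGPL. Matching whole tokens avoids false hits such as `mpl` inside `example`, and matching `BSL-1.1` rather than bare `bsl` avoids blocking the Boost license (BSL-1.0).

```python
import re

# Hypothetical policy tables, checked in order: first matching tier wins.
POLICY = [
    ("block", {"sspl", "bsl 1 1", "busl", "business source", "commons clause",
               "elastic", "anti 996", "hippocratic"}),
    ("review", {"gpl", "agpl"}),
    ("allow", {"mit", "bsd", "apache", "lgpl", "mpl", "cc0", "wtfpl", "unlicense"}),
]


def classify_license(raw: str) -> str:
    """Return 'block', 'review', 'allow', or 'unknown' for a license string."""
    # "LGPL-3.0-or-later" -> " lgpl 3 0 or later " so phrases match whole tokens.
    joined = " " + " ".join(re.findall(r"[a-z0-9]+", raw.lower())) + " "
    for verdict, phrases in POLICY:
        if any(f" {phrase} " in joined for phrase in phrases):
            return verdict
    return "unknown"
```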

Tests (26 unit + integration) cover license classifier (16
variants across MIT, BSD, Apache, LGPL, MPL, CC0, WTFPL,
GPL, AGPL, SSPL, BSL, Commons Clause, Elastic, Anti-996,
Hippocratic, unknown), pin check (3), source-header check
(3), license check via importlib.metadata (1), CVE check
via subprocess pip-audit (2), and a smoke test of the main
loop (1).

No new pip deps in the project: pure stdlib
(importlib.metadata, tomllib, pathlib, re) + subprocess to
pip-audit (optional dev tool, installed via 'uv tool install
pip-audit' if user wants CVE checks).

Initial report at docs/reports/license_cve_audit/2026-06-07/
records the current state. The Phase 2 commit will apply
the fixes (tilde-pin, delete requirements.txt); the Phase 3
commit will add --strict mode + baseline file for CI.
2026-06-07 15:07:46 -04:00
ed 114c385b07 agent reports 2026-06-07 12:27:20 -04:00
ed 0f74705d01 docs(reports): add planning digest covering 5 tracks from 2026-06-06 session
Single-session planning digest that captures:
- The 5 tracks fully specced + planned (test_batching, qwen_llama_grok,
  data_oriented_error_handling, data_structure_strengthening,
  mcp_architecture_refactor)
- Cross-cutting design themes (data-oriented, audit-driven, per-track
  commit + git note, out-of-scope-by-default)
- The audit + data foundation (scripts/audit_weak_types.py; 430 -> 60
  findings; 0 strong patterns; 26 unique type strings; 86% concentrated
  in 6 files)
- The dependency graph + recommended execution order
- Follow-up tracks already planned in spec §12.1 of each track
- Recommended future tracks (post-tracks documentation is the top pick)
- Risks, open questions, and a complete file index

This is the kind of reference document that:
- Future planners consult to understand the codebase's current state
- The implementing agent uses to coordinate across tracks
- The user reviews as a digest of the planning work

Written in the project's docs/reports/ directory alongside the existing
Phase 5 reports (PHASE5_STABILISATION_REPORT.md, MUTATION_MATRIX_PHASE5.md, etc.).
2026-06-06 20:56:12 -04:00
ed 6f9a3af201 feat(audit): add main-thread import graph audit + baseline measurements
Phase 1, Tasks T1.2 + T1.4 of the startup_speedup_20260606 track.

NEW: scripts/audit_main_thread_imports.py
  Static CI gate that AST-walks the import graph reachable from
  sloppy.py and fails (exit 1) if any heavy module is imported at the
  top of a main-thread-reachable file. Walks into if/elif/else and
  try/except branches (which run at import time) but skips function
  bodies (which only run when called). Allowlist: stdlib + the lean
  gui_2 skeleton (imgui_bundle, defer, src.imgui_scopes, src.theme_2,
  src.theme_models, src.paths, src.models, src.events).

NEW: scripts/audit_gui2_imports.py
  Read-only analysis tool that lists every top-level and function-level
  import in src/gui_2.py, classified by location. Used in Phase 5D to
  identify which imports to remove.
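
A hedged sketch of the location classification, assuming "location" means module level versus inside a function, plus the dotted enclosing scope. ImportSite and classify_imports are illustrative names, not taken from scripts/audit_gui2_imports.py.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportSite:
    lineno: int
    module: str
    kind: str   # "top-level" or "function-level"
    scope: str  # "<module>" or dotted enclosing scope, e.g. "App.render"


def classify_imports(source: str) -> list[ImportSite]:
    sites: list[ImportSite] = []

    def visit(node: ast.AST, scope: tuple[str, ...], in_func: bool) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                visit(child, scope + (child.name,), True)
            elif isinstance(child, ast.ClassDef):
                # Class bodies run at import time, so they keep the outer kind.
                visit(child, scope + (child.name,), in_func)
            elif isinstance(child, (ast.Import, ast.ImportFrom)):
                if isinstance(child, ast.Import):
                    names = [alias.name for alias in child.names]
                else:
                    names = ["." * child.level + (child.module or "")]
                kind = "function-level" if in_func else "top-level"
                where = ".".join(scope) or "<module>"
                sites.extend(ImportSite(child.lineno, n, kind, where) for n in names)
            else:
                visit(child, scope, in_func)

    visit(ast.parse(source), (), False)
    return sites
```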

NEW: tests/test_audit_main_thread_imports.py
  9 tests covering: --help exits 0, clean stdlib-only passes, heavy
  third-party fails, google.genai fails, transitive walks, function-
  body imports ignored, if-branch imports flagged, try-block imports
  flagged, file:line reported. All 9 pass.

NEW: docs/reports/startup_baseline_20260606.txt
  3-run median cold-start benchmark. Worst offenders: src.gui_2
  (1770ms), simulation.user_agent (1517ms), google.genai (1001ms),
  openai (482ms), anthropic (441ms), imgui_bundle (255ms),
  src.theme_nerv* (485ms combined), src.markdown_table (243ms),
  src.command_palette (242ms).
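
One way to produce comparable per-module numbers, sketched under the assumption that "cold start" means timing `import <module>` in a fresh interpreter and taking the median of 3 runs. The helper names are hypothetical. CPython's `-X importtime` flag is an alternative that breaks the cost down per imported module.

```python
import statistics
import subprocess
import sys

_TIMER = (
    "import time\n"
    "t0 = time.perf_counter()\n"
    "import {module}\n"
    "print((time.perf_counter() - t0) * 1000.0)\n"
)


def import_ms_fresh(module: str) -> float:
    """Time `import module` once in a new interpreter with an empty sys.modules."""
    proc = subprocess.run(
        [sys.executable, "-c", _TIMER.format(module=module)],
        capture_output=True, text=True, check=True,
    )
    # Last stdout line is the timing, even if the module printed on import.
    return float(proc.stdout.strip().splitlines()[-1])


def median_import_ms(module: str, runs: int = 3) -> float:
    return statistics.median(import_ms_fresh(module) for _ in range(runs))


def rank(modules: list[str], runs: int = 3) -> list[tuple[str, float]]:
    """(module, median ms) pairs, worst offenders first."""
    timings = [(m, median_import_ms(m, runs)) for m in modules]
    return sorted(timings, key=lambda pair: pair[1], reverse=True)
```

A fresh process starts with an empty sys.modules, but the OS file cache may still be warm after the first run; taking the median keeps a single slow run from dominating. Modules under src/ need the subprocess to start from the repository root.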

NEW: docs/reports/startup_audit_20260606.txt
  Audit output on the CURRENT codebase. Reports 67 violations across
  the main-thread import graph (incl. numpy in src/gui_2.py:9,
  tomli_w in src/gui_2.py:18, fastapi + requests in src/app_controller,
  tree_sitter_* in src/file_cache, pydantic in src/models, plus all
  the src.* subsystem imports that drag in heavy transitive deps).
  Phases 3-5 of the track will resolve these one by one.

After Phases 3-5, this audit must exit 0 (no violations).

Reports are co-located in docs/reports/ per project convention; the other
agent's finished work in docs/superpowers/ is unrelated to this commit.
2026-06-06 14:22:18 -04:00
ed 1c627bcc30 fix(docs): correct section order in guide_testing (patterns before See Also) + fix LF/CRLF 2026-06-06 09:34:38 -04:00
ed e276bac093 docs(gui_2): add __getattr__/__setattr__ delegation pattern + indentation gotcha 2026-06-06 01:59:20 -04:00
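
The delegation pattern is only named in this commit subject. As a hedged illustration, assume an App that forwards a fixed set of attribute names to a controller so that the controller holds the only copy of that state (cf. the "App/Controller single source of truth" spec entry). Controller, App and the attribute names are hypothetical.

```python
class Controller:
    """Hypothetical owner of shared GUI state."""

    def __init__(self) -> None:
        self.active_project_root = "."
        self.ai_status = "idle"


class App:
    """Hypothetical GUI object that forwards selected attributes to its controller."""

    _DELEGATED = frozenset({"active_project_root", "ai_status"})

    def __init__(self, controller: Controller) -> None:
        # Store the reference directly so our own __setattr__ is bypassed.
        object.__setattr__(self, "controller", controller)

    def __getattr__(self, name: str):
        # Python calls this only after normal lookup fails, so App's own attributes win.
        if name in App._DELEGATED:
            return getattr(self.controller, name)
        raise AttributeError(f"{type(self).__name__!r} object has no attribute {name!r}")

    def __setattr__(self, name: str, value) -> None:
        if name in App._DELEGATED:
            setattr(self.controller, name, value)  # write through; App keeps no copy
        else:
            object.__setattr__(self, name, value)
```

Because delegated names never land in the App's instance dict, a read through either object sees the same value.
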
ed 4ee22dedb9 docs(testing): add Narrow Test Paths + Indentation-Driven Method Visibility patterns 2026-06-06 01:53:25 -04:00
ed e7b8877f2a docs(readme): update for v2 completion (24 guides, 273 test files, 98.9% pass rate) 2026-06-06 01:42:45 -04:00
ed 11f8772401 docs(spec): live_gui_state_sync — REAL root cause is bad indent in _capture_workspace_profile 2026-06-06 01:08:07 -04:00
ed 3e52f20d16 docs(spec+plan): undo_redo_lifecycle_fix (3-phase investigation: state-sync vs snapshot vs flake) 2026-06-05 22:49:16 -04:00
ed b692353e98 docs(spec+plan): wait_for_ready_test_pattern (replace time.sleep with polling) 2026-06-05 22:45:14 -04:00
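
The spec title names the technique: replace fixed time.sleep waits with polling. A minimal sketch of such a helper, assuming readiness can be expressed as a zero-argument predicate. wait_for and its timeout/interval defaults are hypothetical, not the project's API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for(predicate: Callable[[], T], timeout: float = 5.0, interval: float = 0.05) -> T:
    """Call `predicate` until it returns something truthy and return that value.

    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout:.2f}s")
        time.sleep(interval)
```

Unlike a fixed sleep, the test proceeds as soon as the condition holds and fails with an explicit TimeoutError instead of asserting on stale state. time.monotonic is used for the deadline so wall-clock adjustments cannot shorten or extend the wait.
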
ed 85cd34683a docs(spec+plan): prior_session_test_harden (refactor to narrow render_prior_session_view) 2026-06-05 22:41:46 -04:00
ed 9542c4c750 docs(spec+plan): live-gui state sync (App/Controller single source of truth) 2026-06-05 22:36:55 -04:00
ed 1488e71568 docs: add Sentinel type contract note to 3 defer-not-catch sections 2026-06-05 20:31:38 -04:00
ed cb206b973f docs(spec): defer Change 2 (prior_session test) to separate track; reason + follow-up 2026-06-05 20:12:33 -04:00
ed 7a0ed74b5c docs(plan): implementation plan for live-gui fragility fixes 2026-06-05 19:20:21 -04:00
ed f6d9c70de8 docs(spec): defer Change 4 doc hardening per user review 2026-06-05 19:15:50 -04:00
ed 0d6dd8dbab docs(spec): design for live-gui fragility fixes (272-file suite: 269/272 -> 272/272) 2026-06-05 19:05:35 -04:00
ed 9467769260 docs(themes): rewrite authoring guide to match actual API + 8 shipped themes 2026-06-05 18:50:10 -04:00
ed 0fec0f4f56 docs(testing): reframe live_gui gotcha as test-authoring contract, not fixture bug 2026-06-05 18:39:33 -04:00
ed 2312965476 docs(gui_2): add Theme Color-Callable Pattern and Workspace Profile Defer-Not-Catch sections 2026-06-05 18:25:29 -04:00
ed 9a6bcb2f34 docs(testing): add Known Gotchas section (live_gui non-determinism + early-render C crash) 2026-06-05 18:21:24 -04:00
ed 2f0c1eb3cc conductor(index): mark regression_fixes active, add multi_themes recently shipped 2026-06-05 18:18:27 -04:00
ed f63fe68565 docs(index): register guide_themes.md in guides table and file tree 2026-06-05 18:06:12 -04:00
ed db3490a70f conductor(plan): document imgui save_ini crash root cause and fix 2026-06-05 15:12:23 -04:00
ed b0c8589f68 conductor(plan): document root cause - imgui-bundle C-level crash blocks live_gui 2026-06-05 13:47:55 -04:00
ed 1c6919aafc conductor(plan): update task status - 5 done, 6 deferred pending live_gui 2026-06-05 12:43:33 -04:00