manual_slop

Author	SHA1	Message	Date
ed	b94dd85f14	conductor(plan): Mark phase 1 verification complete	2026-06-17 18:57:04 -04:00
ed	9cdb2edea6	conductor(plan): Mark task 1.3.3 complete	2026-06-17 18:56:30 -04:00
ed	3c13fd718f	conductor(plan): Mark task 1.3.1-1.3.3 (truncation fix) complete	2026-06-17 18:56:22 -04:00
ed	373783dedc	conductor(plan): Mark task 1.2.3 complete	2026-06-17 18:55:12 -04:00
ed	7c819017d2	conductor(plan): Mark task 1.2.1-1.2.3 (render_json filter fix) complete	2026-06-17 18:55:06 -04:00
ed	241f5b46ff	conductor(plan): Mark task 1.1.1-1.1.3 (visit_Try walker fix) complete	2026-06-17 18:53:44 -04:00
ed	92cea9c483	conductor: register result_migration_small_files_20260617 in tracks.md	2026-06-17 18:22:40 -04:00
ed	cf3c20d7df	docs(track): update result_migration_20260616 umbrella with sub-track 4 +1 site (src/gui_2.py:1349)	2026-06-17 18:22:25 -04:00
ed	5c4244077c	conductor(track): metadata + state for result_migration_small_files_20260617	2026-06-17 18:20:24 -04:00
ed	9f9fcf93e1	conductor(track): plan for result_migration_small_files_20260617	2026-06-17 18:20:06 -04:00
ed	0aa00e394d	conductor(track): spec for result_migration_small_files_20260617 (sub-track 2 of 5)	2026-06-17 18:19:42 -04:00
ed	87f273d044	Merge branch 'master' of C:\projects\manual_slop into tier2/result_migration_review_pass_20260617	2026-06-17 17:21:27 -04:00
ed	3347926717	conductor(track): mark result_migration_review_pass_20260617 as completed (all 22 tasks done; all 11 test tiers PASS)	2026-06-17 16:58:19 -04:00
ed	a6d00f0057	conductor(plan): mark t6_1 and t6_2 complete (audit verified, all 11 test tiers PASS)	2026-06-17 16:55:54 -04:00
ed	428ff64de9	conductor(plan): mark Phase 5 complete (report written + umbrella spec updated)	2026-06-17 16:21:27 -04:00
ed	a152903871	docs(track): update result_migration_20260616 with post-review scope (sub-track 4 gains 1 site; all others unchanged)	2026-06-17 16:20:04 -04:00
ed	662b6e8aba	conductor(plan): mark Phase 4 complete (10 heuristics added; UNCLEAR 24->3 in review scope)	2026-06-17 16:17:02 -04:00
ed	03c9df8450	fix(tier2): deny %TEMP% writes - use app-data dir for temp files The Tier 2 agent wrote audit_exception_handling.py output to C:\\Users\\Ed\\AppData\\Local\\Temp\\audit_initial.json via shell redirection. This is OUTSIDE the sandbox allowlist (which is C:\\projects\\manual_slop_tier2 + C:\\Users\\Ed\\AppData\\Local\\manual_slop\\tier2 + C:\\Users\\Ed\\AppData\\Local\\manual_slop\\tier2_failures). The OpenCode session-level guard fires the 'ask' prompt for paths outside the project root, which has no answer in an autonomous session, so ops halted mid-track. Fix (3 layers): 1. opencode.json.fragment: add bash deny rule 'AppData\\Local\\Temp\\': 'deny' to BOTH the top-level permission.bash (for default agents) and the tier2-autonomous agent's permission.bash. The agent physically cannot run shell commands that target the global Temp dir. 2. conductor/tier2/agents/tier2-autonomous.md: add 'Temp files' convention telling the agent to use C:\\Users\\Ed\\AppData\\Local\\manual_slop\\tier2\\ for scratch / audit-output / intermediate files, NOT %TEMP%. 3. conductor/tier2/commands/tier-2-auto-execute.md: same convention in the slash command so the agent sees it at slash-command time. Tests (default-on): - test_agent_denies_temp_writes: agent prompt has the Temp deny in frontmatter bash + the app-data dir note - test_config_fragment_denies_temp_writes: both top-level and agent bash have the deny rule All 16 tier 2 slash command tests pass. Also: cleaned up the leaked audit_initial.json + audit.json + audit_after*.json from %TEMP% (they were leftovers from a prior run). Re-ran setup against the live clone; opencode.json's agent bash and top-level bash both have the deny rule.	2026-06-17 16:13:19 -04:00
ed	8b954ee180	conductor(plan): mark Phase 3 complete (19 INTERNAL_RETHROW sites classified: 7 PATTERN_1 + 2 PATTERN_2 + 9 compliant + 0 migration-target)	2026-06-17 15:57:33 -04:00
ed	af47b3eaa2	conductor(plan): mark t3_6 complete (src/models.py INTERNAL_RETHROW review)	2026-06-17 15:55:44 -04:00
ed	306895f667	conductor(plan): mark t3_5 complete (src/api_hooks.py INTERNAL_RETHROW review)	2026-06-17 15:54:44 -04:00
ed	e3600545bf	conductor(plan): mark t3_4 complete (src/gui_2.py INTERNAL_RETHROW review)	2026-06-17 15:53:37 -04:00
ed	443946f8b3	conductor(plan): mark t3_3 complete (src/app_controller.py INTERNAL_RETHROW review); add rethrow_sites_compliant metric	2026-06-17 15:52:36 -04:00
ed	51a45099ef	conductor(plan): mark t3_2 complete (src/rag_engine.py INTERNAL_RETHROW review)	2026-06-17 15:51:19 -04:00
ed	7804ebd015	conductor(plan): mark t3_1 complete (src/ai_client.py INTERNAL_RETHROW review)	2026-06-17 15:15:10 -04:00
ed	2b34b8fc11	conductor(plan): mark Phase 2 complete (24 UNCLEAR sites reviewed: 23 compliant + 1 migration-target)	2026-06-17 15:12:29 -04:00
ed	31a40dd9c6	conductor(plan): mark t2_5 complete (src/models.py UNCLEAR review)	2026-06-17 15:10:57 -04:00
ed	3119d90170	conductor(plan): mark t2_4 complete (src/app_controller.py UNCLEAR review)	2026-06-17 15:09:57 -04:00
ed	f71af2febe	conductor(plan): mark t2_3 complete (src/ai_client.py UNCLEAR review)	2026-06-17 15:08:55 -04:00
ed	91b3337a18	conductor(plan): mark t2_2 complete (src/mcp_client.py UNCLEAR review)	2026-06-17 15:07:32 -04:00
ed	f94d77eab8	conductor(plan): mark t2_1 complete (src/gui_2.py UNCLEAR review)	2026-06-17 15:05:58 -04:00
ed	bd13bd7d06	conductor(plan): mark Phase 1 setup tasks complete (t1_1, t1_2)	2026-06-17 15:02:45 -04:00
ed	3ec601d4da	fix(tier2): override top-level model to MiniMax-M3 The clone's opencode.json inherited the main repo's top-level 'model' field (zai/glm-5) via 'git clone'. The tier2-autonomous agent has its own 'model: minimax-coding-plan/MiniMax-M3' override, so the default agent path was technically correct, but any other agent spawned without an explicit model (or if the user manually switched to build/plan) would have used zai/glm-5 instead of MiniMax-M3. Fix: 1. Add top-level 'model: minimax-coding-plan/MiniMax-M3' to conductor/tier2/opencode.json.fragment. 2. setup_tier2_clone.ps1 merge now overrides 'model' from the fragment (was only overriding agent, permission, default_agent). 3. Added test_config_fragment_has_top_level_model (default-on) to assert the fragment's model field. 4. Added test_setup_script_overrides_model (opt-in TIER2_SANDBOX_TESTS=1) to assert the merge code. All 17 tests pass (14 default-on + 3 opt-in). Verified: re-ran setup against the live clone; opencode.json's top-level 'model' is now minimax-coding-plan/MiniMax-M3.	2026-06-17 14:50:01 -04:00
ed	396eb82c1a	conductor(track): init result_migration_review_pass_20260617 (sub-track 1 of 5) Sub-track 1 of the 5-sub-track result_migration_20260616 campaign. Audit-driven research task: classify 43 ambiguous exception-handling sites (24 UNCLEAR + 19 INTERNAL_RETHROW across 11 files) and update the audit script's heuristics. No production code change. Scope: 11 files, 43 sites, T-shirt S. The per-site decisions feed sub-tracks 2-4 (small_files, app_controller, gui_2) as their starting migration scope. Files: spec.md, plan.md, metadata.json, state.toml under conductor/tracks/result_migration_review_pass_20260617/. Row added to conductor/tracks.md.	2026-06-17 14:45:52 -04:00
ed	97d306449f	Merge remote-tracking branch 'tier2-clone/tier2/send_result_to_send_20260616' # Conflicts: # manualslop_layout.ini	2026-06-17 13:46:58 -04:00
ed	9cd8536455	fix(tier2): top-level permission allowlist - sandbox paths now enforced Regression: a Tier 2 session was denied access to C:\\projects\\manual_slop_tier2\\scripts\\run_tests_batched.py with 'Allowed base directories are: gencpp, manual_slop'. The tier2-autonomous agent had a correct permission.read allowlist, but the top-level permission block (inherited from the main repo's opencode.json via 'git clone') had no read/write keys, and OpenCode uses the top-level for the default agent path. The agent's permission.read was merged but apparently not enforced for the default-agent access check. Fix: 1. Add a top-level 'permission' block to conductor/tier2/opencode.json.fragment with: - permission.edit: 'deny' (default agents locked down) - permission.read: deny , allow sandbox clone + app-data dirs - permission.write: same - permission.bash: deny , allowlist of read-only git commands + uv run python scripts/{run_tests_batched.py,tier2/*} + basic shell commands. git push/checkout/restore/reset remain denied. 2. Update setup_tier2_clone.ps1 to also patch the top-level 'permission' block (was only merging the tier2-autonomous agent block). The script preserves the user's mcp, model, instructions, watcher, and plugin settings from the inherited opencode.json. 3. Update test_tier2_slash_command_spec.py: - Rename test_command_fetches_origin_main -> ..._master (we changed the slash command on 2026-06-17). - Add test_config_fragment_has_top_level_permission to assert the new top-level permission block has the right deny-all + allowlist shape. The tier2-autonomous agent's permission block is unchanged; it overrides the top-level for that agent's tool calls.	2026-06-17 13:43:53 -04:00
ed	54eb4740b3	conductor+layout: remove T-shirt size metric, regenerate stale layout Per user feedback 2026-06-17: - T-shirt size is not an acceptable sizing metric. Remove it from conductor/workflow.md (the policy file), conductor/tracks.md (the registry), and docs/reports/NEGATIVE_FLOWS_INVESTIGATION_20260617.md. - Regenerate manualslop_layout.ini to remove 83 stale window references that pointed to deleted/renamed windows (Projects, Files, Screenshots, Provider, System Prompts, Discussion History, Comms History, etc.). Layout now matches the windows registered in src/app_controller.py _default_windows (lines 1862-1886). Stale window count: 10 -> 3. T-shirt size removal details: - conductor/workflow.md: Removed the S/M/L/XL table, the replacement pattern row, and the 'reasonable effort' guard's reference. Scope (N files, M sites, N tasks) is the only effort dimension. - conductor/tracks.md: Removed the T-shirt column from the table header and removed T-shirt size mentions from the Fable track entry. - docs/reports/NEGATIVE_FLOWS_INVESTIGATION_20260617.md: Removed the T-shirt size mention in the follow-up track suggestion. Layout fix: - manualslop_layout.ini went from 17,360 bytes (102 windows, 83 stale) to 3,361 bytes (23 windows, all matching _default_windows). The stale window warning dropped from 10 windows to 3 (Message, Tool Calls, Response - these are in _default_windows but reference separate panels in the layout). Verification: layout fix did NOT fix the underlying stack overflow crash. After layout fix, the test still dies with rc=3221225725 (0xC00000FD). The user noted 'Something more fundamental is wrong.' Investigation continues; this commit only addresses the explicit ask (remove T-shirt, fix layout).	2026-06-17 12:23:03 -04:00
ed	167eacc1de	Merge branch 'master' of C:\projects\manual_slop into tier2/send_result_to_send_20260616	2026-06-17 07:37:36 -04:00
ed	07a0e66a19	docs(tier2): apply user feedback - 6 workflow conventions User feedback from the first sandbox run (send_result_to_send_20260616, 2026-06-17) identified 6 conventions Tier 2 must follow. Update the agent prompt template, slash command template, user guide, and workflow doc: 1. Test runner: ALWAYS use 'uv run python scripts/run_tests_batched.py' (NOT 'uv run pytest'). The batched runner provides tier filtering, parallelization (xdist), and a summary table that direct pytest lacks. 2. Default branch: this repo uses 'master', not 'main'. The Tier 2 slash command now does 'git fetch origin master' (was 'origin main'). 3. Line endings: preserve existing. This repo has a mix of CRLF and LF; a repo-wide LF standardization is a future track. 4. Throw-away scripts: write to 'scripts/tier2/artifacts/<track>/', NOT the base 'scripts/tier2/' directory. The base is reserved for production code; throw-away scripts are kept for archival but isolated per-track. 5. End-of-track report: write 'docs/reports/TRACK_COMPLETION_<track>.md' and update 'state.toml' to 'status=completed'. The user reads this to decide merge. Previously this was implicit; now it's explicit. 6. Run-time expectation: tracks are 1-4 hours. If context runs out, Tier 2 notes progress to disk and continues. The --resume flag picks up from the last completed task. Also updated the user guide with a 'Conventions' section and a troubleshooting entry for the resume flow. The verify-the-sandbox checklist now uses 'origin master' instead of 'origin main'.	2026-06-17 02:13:29 -04:00
ed	86fc1c5477	Merge branch 'master' of C:\projects\manual_slop into tier2/send_result_to_send_20260616	2026-06-17 02:00:56 -04:00
ed	e2e570369e	wrong folder	2026-06-17 01:57:52 -04:00
ed	1fc4a6026b	plan update for (send_result-to_send)	2026-06-17 01:54:52 -04:00
ed	a91c1da33c	end of track: test suite log.	2026-06-17 01:43:50 -04:00
ed	959ea38b87	conductor(track): fable_review_20260617 metadata — point to plan.md Plan committed at `8ec6d8f4` (1010 lines, 7 phases, 50+ tasks).	2026-06-17 01:41:58 -04:00
ed	8ec6d8f4a6	conductor(plan): Add fable_review_20260617 plan 7 phases, 50+ bite-sized tasks. Phase 1: init + 4 skeleton files. Phase 2: 10 parallel Tier 3 cluster sub-agent dispatches. Phase 3: 17 synthesis sections (Tier 1 max-token-output strategy). Phase 4: 3 side artifacts. Phase 5: self-review. Phase 6: user review. Phase 7: final commit + register. Every task has a verification command. Fable artifact at docs/artifacts/Fable System Prompt.txt is NEVER staged (verified per-task). No day estimates (per conductor/workflow.md §Tier 1 Track Initialization Rules).	2026-06-17 01:41:42 -04:00
ed	8eaf694f4a	conductor(tracks): Register fable_review_20260617 in tracks.md New research track for critical analysis of Anthropic's Claude Fable 5 system prompt. Added as row 25 in the Active Tracks table (Priority B research) and as a section in the new 'Active Research Tracks (2026-06+)' grouping. The companion spec + metadata + state.toml are committed in `058e2c93` and `a6114ef9`.	2026-06-17 01:19:45 -04:00
ed	c0e2051ec9	conductor(plan): Mark Phase 6 complete - all track tasks done Phase 6 tasks (t6_1, t6_2, t6_3) and the phase itself marked completed. All 16 task entries now have status=completed. All 6 phase entries now have status=completed. This is the final state.toml commit for the track.	2026-06-17 01:18:40 -04:00
ed	9a5d3b9c8c	conductor(plan): Mark Task 6.3 complete - register in tracks.md Added entry after the Tier 2 Autonomous Sandbox track (its parent dependency). Status: shipped 2026-06-17. Notes: 6 phases, 10 atomic rename commits, 37 files modified, 0 new/deleted. Test inventory: 100/101 pass in renamed files; 7 broader pre-existing failures all due to missing credentials.toml (confirmed against origin/master).	2026-06-17 01:18:02 -04:00
ed	5a58e1ceaf	conductor(plan): Mark Task 6.2 complete - metadata.json to status=shipped Track marked shipped 2026-06-17. All 6 verification criteria evaluated with PASS/EXCEEDED/READY status and notes. 7 pre-existing test failures documented with root cause and pre_existing_failures_remaining flag. Risk register updated: scope_creep=none, behavior_change=none, doc_drift=medium (error_handling.md deprecation section required surgical rewrite to historical note). No deferred_to_followup_tracks (this track completed cleanly).	2026-06-17 01:16:43 -04:00
ed	a6114ef9ac	conductor(track): Add fable_review_20260617 state.toml 7 phases (init -> 10 parallel cluster dispatches -> 17 synthesis sections -> 3 side artifacts -> self-review -> user review -> register). Each phase has explicit task IDs (t1_1 .. t7_4) for Tier 2 to walk through. current_phase = 0 (spec approved, not started). Hard rule encoded in [meta]: docs/artifacts/Fable System Prompt.txt is NEVER committed.	2026-06-17 01:16:20 -04:00
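
The fix(tier2) commits above (9cd8536455, 3ec601d4da, 03c9df8450) describe a merge in which the fragment overrides 'model', 'agent', 'permission', and 'default_agent' while the inherited mcp, instructions, watcher, and plugin settings are preserved, plus a check that both the top-level and the tier2-autonomous bash permissions deny the global Temp dir. A minimal Python sketch of that behaviour follows. It is illustrative only: the real merge lives in setup_tier2_clone.ps1, and the function names, the whole-key replacement strategy, the sample config values, and the exact shape of the deny pattern are all assumptions, not the repository's code.

```python
import copy

# Keys the fragment overrides, per the commit messages; all other keys
# (mcp, instructions, watcher, plugin, ...) are kept from the inherited file.
OVERRIDE_KEYS = ("model", "agent", "permission", "default_agent")

# Substring expected in a bash deny pattern (assumed shape of the rule key).
TEMP_MARKER = "AppData\\Local\\Temp\\"


def merge_fragment(inherited: dict, fragment: dict) -> dict:
    """Return a new config where fragment values replace the override keys.

    Assumption: each override key is replaced wholesale, not deep-merged.
    """
    merged = copy.deepcopy(inherited)
    for key in OVERRIDE_KEYS:
        if key in fragment:
            merged[key] = copy.deepcopy(fragment[key])
    return merged


def _has_temp_deny(bash_rules: dict) -> bool:
    """True if some rule whose pattern mentions the Temp dir is set to 'deny'."""
    return any(
        TEMP_MARKER in pattern and action == "deny"
        for pattern, action in bash_rules.items()
    )


def denies_temp_writes(config: dict, agent_name: str = "tier2-autonomous") -> bool:
    """Check that BOTH top-level and agent-level bash permissions deny Temp."""
    top_bash = config.get("permission", {}).get("bash", {})
    agent = config.get("agent", {}).get(agent_name, {})
    agent_bash = agent.get("permission", {}).get("bash", {})
    return _has_temp_deny(top_bash) and _has_temp_deny(agent_bash)


# Hypothetical sample data, loosely modelled on the commit messages.
temp_rule = {"*AppData\\Local\\Temp\\*": "deny"}
inherited = {"model": "zai/glm-5", "mcp": {"enabled": True}}
fragment = {
    "model": "minimax-coding-plan/MiniMax-M3",
    "permission": {"bash": dict(temp_rule)},
    "agent": {"tier2-autonomous": {"permission": {"bash": dict(temp_rule)}}},
}

merged = merge_fragment(inherited, fragment)
```

Copying the inherited config before overriding keeps the original untouched, so a check like denies_temp_writes can be run against both the before and after states.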
