manual_slop

Author	SHA1	Message	Date
ed	7380e23bc0	feat(tier2): create tier-2-auto-execute slash command template	2026-06-16 19:17:41 -04:00
ed	73ab2778ca	feat(report): implement write_failure_report + 8 tests, 100% coverage	2026-06-16 19:13:30 -04:00
ed	5ca8444f35	test(report): add report writer tests (red, opt-in via TIER2_SANDBOX_TESTS=1)	2026-06-16 19:10:22 -04:00
ed	2dbfaeb60e	test(failcount): add 13 unit tests + 6 coverage tests; 100% coverage achieved	2026-06-16 19:06:09 -04:00
ed	190766fe03	feat(failcount): add default failcount.toml thresholds	2026-06-16 19:01:31 -04:00
ed	fc92e1aa74	feat(failcount): add FailcountState + FailcountConfig dataclasses + all stub functions	2026-06-16 18:59:38 -04:00
ed	e646067a8a	test(failcount): add test_initial_state_zero (red)	2026-06-16 18:58:00 -04:00
ed	9f2ff29c2e	feat(tier2): create scripts/tier2/ package	2026-06-16 18:57:09 -04:00
ed	e060399579	conductor(plan): add state.toml for tier2_autonomous_sandbox track 44 tasks across 9 phases, all pending. Tracks: - failcount unit test progression (13 target) - slash command spec tests (11 target) - report writer tests (4 opt-in) - bootstrap test (1 opt-in) - sandbox enforcement test (1 opt-in) - smoke e2e test (1 opt-in, double gate) Enforcement stack contract: 9 flags tracking the 4 git bans + filesystem boundary + 3 hook installs + OpenCode deny rules + Windows restricted token. Final verification requires all 9 enforcement flags = true. status: active, current_phase: 0, blocked_by: none, blocks: none	2026-06-16 18:51:42 -04:00
ed	2551ff18c7	no t-shirt nonsense (agents.md)	2026-06-16 18:47:50 -04:00
ed	6a26713d74	conductor(plan): Tier 2 autonomous sandbox - implementation plan + metadata 9 phases, 30+ tasks, scope-only (no T-shirt size per user feedback): - Phase 1: failcount module (15 TDD tasks, 13 unit tests, 100% coverage target) - Phase 2: failure report writer (4 sections, opt-in tests) - Phase 3: slash command + agent + opencode.json.fragment templates (11 spec tests) - Phase 4: run_track.py CLI entry point (duplicates slash command protocol) - Phase 5: setup_tier2_clone.ps1 bootstrap (idempotent, -WhatIf support) - Phase 6: run_tier2_sandboxed.ps1 launcher (restricted token skeleton v1) - Phase 7: git hooks (pre-push refuses all pushes, post-checkout logs) - Phase 8: opt-in tests (TIER2_SANDBOX_TESTS=1, TIER2_SMOKE=1) - Phase 9: user guide + tracks.md registration + metadata Key contracts: - FailcountState dataclass with 3 signals (red/green/no_progress) - Result-style with to_dict/from_dict for state persistence - Atomic write via tmp + os.replace - 3-layer enforcement: OpenCode permission system + Windows restricted token + git hooks	2026-06-16 18:46:36 -04:00
ed	568804c7d9	conductor(spec): drop T-shirt size per user feedback	2026-06-16 18:38:09 -04:00
ed	024938bd46	conductor(spec): Tier 2 autonomous sandbox track spec	2026-06-16 18:31:48 -04:00
ed	88e44d1c0e	docs(report): add session report (audit + migration plan + tech-rot prevention)	2026-06-16 10:48:15 -04:00
ed	b90d4bdd4e	feat(scripts): add --ci alias for --strict + CI-gate doc updates	2026-06-16 10:40:21 -04:00
ed	ce85c379ad	docs(agents): add Convention Enforcement section at the top (4 mechanisms)	2026-06-16 10:37:35 -04:00
ed	734840375f	docs(guidelines): add AI Agent Obligations section with 4 enforcement audit scripts	2026-06-16 10:35:55 -04:00
ed	ef1b0a1c6d	docs(styleguide): add AI Agent Checklist section against tech rot	2026-06-16 10:29:26 -04:00
ed	4a55a14fc0	conductor: register result_migration_20260616 in tracks.md (umbrella + 5 sub-tracks)	2026-06-16 10:26:10 -04:00
ed	4cf885da90	docs(workflow+agents): add HARD BAN on day estimates + Tier 1 Track Initialization Rules section	2026-06-16 10:16:49 -04:00
ed	ed6602274d	docs(tracks): strip day estimates from exception_handling_audit + rag_test_failures (Tier 1 rule)	2026-06-16 10:16:17 -04:00
ed	4c0b19b4db	conductor(track): spec/plan/metadata for result_migration_20260616 (5 sub-tracks, NO day estimates)	2026-06-16 10:15:46 -04:00
ed	4521a7df96	feat(scripts): add --summary and --by-size modes to exception_handling audit	2026-06-16 09:41:20 -04:00
ed	01fbd62a3f	conductor(track): mark exception_handling_audit_20260616 as completed	2026-06-16 09:10:14 -04:00
ed	4b8363bd71	conductor: register exception_handling_audit_20260616 in tracks.md	2026-06-16 09:09:34 -04:00
ed	3c59e24162	docs(report): add exception handling audit report (211 violations across 42 files)	2026-06-16 09:07:42 -04:00
ed	4209523228	docs(app_controller+guidelines): add Exception Handling section + audit script cross-reference	2026-06-16 09:07:24 -04:00
ed	b447f66818	docs(styleguide): add 5 sections clarifying the convention's boundaries	2026-06-16 09:06:54 -04:00
ed	9a04153abd	feat(scripts): add exception_handling audit script (10-category classification)	2026-06-16 09:06:25 -04:00
ed	3c267f6b9c	conductor(track): metadata.json for exception_handling_audit_20260616	2026-06-16 09:05:59 -04:00
ed	a33bfb0abd	conductor(track): plan for exception_handling_audit_20260616 (5 phases, ~12 tasks)	2026-06-16 09:05:40 -04:00
ed	e81413a2cd	conductor(track): spec for exception_handling_audit_20260616 (audit + doc clarification)	2026-06-16 09:05:19 -04:00
ed	3d35bb5b3f	todo	2026-06-16 01:03:59 -04:00
ed	ff91c4e8b0	docs(report): add completion report for rag_test_failures_20260615 Comprehensive 12-section completion report following the format of TRACK_COMPLETION_ai_loop_regressions_20260615.md. Documents: - 4 atomic commits, 1288+4+0 fully green baseline - 2 defensive guards in src/rag_engine.py (lines 150 and 331) - 3 new unit tests in tests/test_rag_sync_none_error.py - 4 plan deviations (spec wrong about root cause, test_rag_visual_sim was already passing, traceback diagnostic was a dead end, temp dir cleanup retry loop for Windows) - 5 followup recommendations for Tier 1 review	2026-06-16 00:36:24 -04:00
ed	ba04363003	conductor(track): mark rag_test_failures_20260615 as completed Updated metadata.json: status=completed, completed_at=2026-06-15, verification_criteria filled with actual results. Updated tracks.md: status=shipped, 4-commit summary, test file added. Final result: 1288 pass + 4 skip + 0 fail. All 11 batched test tiers pass in 873.6s. First fully green baseline since 2026-06-12.	2026-06-16 00:31:26 -04:00
ed	d89c58103d	docs(rag): add troubleshooting section for NoneType.get error Documents the two bugs fixed in the rag_test_failures_20260615 track: 1. get_all_indexed_paths: m.get('path') failing on None metadata 2. _validate_collection_dim_result: 'if not embeddings' raising ValueError on non-empty numpy arrays Also documents the 'no such table: tenants' chromadb corruption symptom (wipe .slop_cache/chroma_* to recover). Plus: 'rag_status' shows 'error: ' prefix is the failure indicator; the actual error message is the part after the prefix.	2026-06-16 00:28:53 -04:00
ed	6a0ac35738	conductor(checkpoint): Phase 3 complete - RAG test failures fix verified All 11 batched test tiers pass in 873.6s (333 files): tier-1-unit-comms (6) tier-1-unit-core (194) tier-1-unit-gui (21) tier-1-unit-headless (2) tier-1-unit-mma (20) tier-2-mock_app-comms (2) tier-2-mock_app-core (16) tier-2-mock_app-gui (9) tier-2-mock_app-headless (1) tier-2-mock_app-mma (7) tier-3-live_gui (55) - includes 3 RAG tests previously failing Test delta: 1282 + 4 + 3 -> 1288 + 4 + 0 (3 RAG tests fixed + 3 new unit tests) Phase 3 verification: - Phase 3.1: full RAG suite (27 tests) passes in 36s - Phase 3.2: full test suite (1288 pass + 4 skip + 0 fail) in 697s - Phase 3.3: full batched test suite (11 tiers, 333 files) passes in 873s	2026-06-16 00:26:59 -04:00
ed	355811635d	fix(rag): handle None metadata in get_all_indexed_paths and non-empty numpy in dim check Two bugs in src/rag_engine.py were causing 'NoneType object has no attribute get' in the live_gui RAG tests (test_rag_phase4_final_verify, test_rag_phase4_stress): 1. _validate_collection_dim_result:148 Old: if not embeddings or len(embeddings) == 0: New: if embeddings is None or len(embeddings) == 0: The 'if not embeddings' check raises ValueError('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()') when 'embeddings' is a non-empty numpy array (which is the normal case after documents are upserted). The exception is caught by the outer 'except Exception' which returns a non-ok Result, causing __init__ to set self.collection = None. Subsequent 'get_all_indexed_paths()' then fails with 'NoneType has no attribute get' on self.collection.get(). 2. get_all_indexed_paths:334 Old: return list(set(m.get('path') for m in res['metadatas'] if m.get('path'))) New: return list(set(m['path'] for m in res['metadatas'] if m is not None and m.get('path'))) When chromadb returns 'metadatas=[None, ...]' (documents upserted without metadata), 'm.get('path')' fails with AttributeError on the first None element. Adds 'm is not None' guard. Both fixes are defensive: the conditions that trigger them (orphan docs without metadata, non-empty embeddings arrays) are normal valid states that the old code couldn't handle. New file: tests/test_rag_sync_none_error.py 3 unit tests covering both bugs: - test_dim_check_does_not_raise_on_non_empty_ndarray - test_get_all_indexed_paths_handles_none_metadata - test_get_all_indexed_paths_returns_paths_with_metadata Verified: - 3/3 focused tests pass - test_rag_phase4_final_verify.py::test_phase4_final_verify PASSES (was failing) - test_rag_phase4_stress.py::test_rag_large_codebase_verification_sim PASSES (was failing) - test_rag_visual_sim.py::test_rag_full_lifecycle_sim PASSES (still passing)	2026-06-16 00:09:02 -04:00
ed	29c64a0125	conductor: register rag_test_failures_20260615 in tracks.md + update public_api row	2026-06-15 21:56:20 -04:00
ed	3fc492e302	conductor(track): metadata.json for rag_test_failures_20260615	2026-06-15 21:54:36 -04:00
ed	3aa4cfa133	conductor(track): plan for rag_test_failures_20260615 (5 phases, ~10 tasks)	2026-06-15 21:53:13 -04:00
ed	006df67637	conductor(track): spec for rag_test_failures_20260615 (3 RAG test fixes, single root cause)	2026-06-15 21:51:11 -04:00
ed	bc388f11bb	docs(report): add deviation #2.5 for test_headless_verification fix The headless batch hang the user reported was caused by an xdist worker crash on test_headless_verification_full_run, not a test logic failure. The same root cause as the 4 Phase 2 follow-ups (mock returns raw string but production does 'if not result.ok:'), but with a different failure mode (worker crash that hangs the batched test runner). Documented in section 3 of the report as deviation #2.5 with: - Where it went wrong (missed in the 4 follow-ups) - The specific symptom in the user's session - The fix (out-of-band commit `e35b6a34`) - Lesson for the next spec (verification must include xdist mode)	2026-06-15 21:28:29 -04:00
ed	e35b6a34ad	test(headless_verification): wrap mock return in Result(data=...) The test_headless_verification_full_run test in test_headless_verification.py mocked src.multi_agent_conductor.ai_client.send_result with a return_value of a raw string. The production code does 'if not result.ok:' which fails on raw strings with AttributeError. In xdist mode this caused a worker crash (gw0/gw11: 'node down: Not properly terminated') that hung the entire tier-1-unit-headless batch in the batched test runner (~50s+ per batch). The crash was the worker dying while pytest-master waited for it; the master never got a clean exit and the run was orphaned until the user's manual cancel. The test was missed in the original Phase 2 list (it was an xdist crash rather than a test logic failure) and in the 4 Phase 2 follow-up commits (which targeted the 4 specific test files the user reported during the run). Change: mock_send.return_value = 'Task completed successfully.' -> mock_send.return_value = Result(data='Task completed successfully.') Plus add the Result import. 2/2 tests in test_headless_verification.py now pass under xdist (was 1/2 + worker crash in xdist). Full headless batch (14 tests) completes in 18.7s.	2026-06-15 21:26:42 -04:00
ed	99747cafb9	docs(report): add track completion report for public_api_migration_and_ui_polish_20260615 531-line completion report for Tier 1 review covering: - Goal & scope (per spec) - 7 phases of delivery (per commit) - 6 plan deviations to flag (CRITICAL: 7 production-affected test files + 4 follow-up mock fixes were missed in the original spec; the user's stated mass-rename send_result->send plan; the track was done on master not a feature branch) - Files changed (per category) - Verification (per the spec's 15 verification criteria) - Definition of Done - Recommended next track (send_result -> send rename) - Tier 1 review checklist	2026-06-15 21:10:10 -04:00
ed	bbd4c7b5c0	conductor(track): mark public_api_migration_and_ui_polish_20260615 as completed - metadata.json: status -> completed - state.toml: all 7 phases marked completed; all tasks marked completed with their commit SHAs - Includes the 4 Phase 2 follow-up mock fixes for: test_conductor_engine_v2.py (10 tests) test_context_pruner.py (1 test) test_rag_integration.py (1 test) test_tiered_aggregation.py (1 test) Test count: 1286 + 12 newly-passing = 1298 pass; 4 RAG failures deferred. (Note: 12 newly-passing includes the 6 pre-existing failures from the spec PLUS 6 more from test_conductor_engine_v2.py and the user's manual corrections to test_ai_loop_regressions_20260614.py and test_conductor_engine_v2.py.) Total commits in this track: ~25 atomic commits + 6 phase checkpoints.	2026-06-15 20:41:12 -04:00
ed	13f32f52e0	test(tiered_aggregation): wrap mock_send return in Result(data=...) (Phase 2 follow-up) The test_run_worker_lifecycle_uses_strategy test in test_tiered_aggregation.py mocked src.multi_agent_conductor.ai_client.send_result with a return_value of a raw string. The production code does "if not result.ok:" which fails on raw strings. 3/3 tests in test_tiered_aggregation.py pass (was 2/3).	2026-06-15 20:28:41 -04:00
ed	26e1b65298	test(rag_integration): wrap _send_gemini mock return in Result(data=...) The test_rag_integration test mocks the internal _send_gemini function to return a raw string. The production code in app_controller._handle_request_event now does 'if result.ok:' which fails on raw strings. Change: mock_provider.return_value = 'Mock AI Response' -> mock_provider.return_value = Result(data='Mock AI Response') Plus add the Result import. 1 test passes (was 1 pre-existing failure).	2026-06-15 20:27:07 -04:00
ed	58576fcba7	test(context_pruner): wrap send_result lambda in Result(data=...) (Phase 2 follow-up) The test_token_reduction_logging test in test_context_pruner.py mocked src.ai_client.send_result with a lambda that returned a raw string. The production code now does "if not result.ok:" which fails on raw strings. 1 test passes (was 1 pre-existing failure).	2026-06-15 20:25:44 -04:00
ed	64278d5313	test(conductor_engine_v2): wrap mock_send return values in Result(data=...) The 7 tests in test_conductor_engine_v2.py (already updated to mock src.ai_client.send_result) were still returning raw strings from the mocks. The production code in multi_agent_conductor.py now does "if not result.ok:" which fails on raw strings with AttributeError. Changes: - Add "from src.result_types import Result" import - Wrap all mock_send.return_value = "..." with Result(data="...") (4 sites) - Wrap MagicMock(return_value="...") with Result(data="...") (2 sites) - Wrap side_effect return with Result(data="Success") 10/10 tests pass (was 3/10).	2026-06-15 20:21:46 -04:00
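
Several of the test commits above (64278d5313, 58576fcba7, 26e1b65298, 13f32f52e0, e35b6a34ad) make the same change: a mock that returned a raw string is changed to return Result(data=...), because the production code checks 'if not result.ok:'. The sketch below illustrates that failure mode and the fix. The Result class here is a hypothetical stand-in, not the project's src.result_types.Result, and handle() is an invented caller used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional
from unittest.mock import MagicMock


@dataclass
class Result:
    """Hypothetical stand-in for the project's Result type (assumed shape)."""
    data: Any = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        # Assumption: a Result is "ok" when it carries no error.
        return self.error is None


def handle(send_result) -> str:
    """Invented caller that checks 'if not result.ok:' before using the payload."""
    result = send_result("prompt")
    if not result.ok:
        return f"failed: {result.error}"
    return result.data


# Old-style mock: returns a raw string, which has no .ok attribute.
raw_mock = MagicMock(return_value="Task completed successfully.")
try:
    handle(raw_mock)
    raw_outcome = "no error"
except AttributeError:
    raw_outcome = "AttributeError"

# Fixed mock: the same payload wrapped in Result(data=...).
wrapped_mock = MagicMock(return_value=Result(data="Task completed successfully."))
wrapped_outcome = handle(wrapped_mock)
```

With the raw-string mock the caller fails on the attribute lookup before any test assertion runs, which matches the AttributeError described in the commit messages.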

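The plan in commit 6a26713d74 names three contracts for the failcount module: a FailcountState dataclass with 3 signals (red/green/no_progress), to_dict/from_dict for persistence, and an atomic write via tmp + os.replace. A minimal sketch of how those could fit together follows. The integer field types, the JSON on-disk format, and the save_state/load_state helper names are assumptions made for illustration; the log does not show the real module.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class FailcountState:
    """Three counters named in the plan; int types are an assumption."""
    red: int = 0
    green: int = 0
    no_progress: int = 0

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, raw: dict) -> "FailcountState":
        # Missing keys fall back to zero so older state files still load.
        return cls(
            red=int(raw.get("red", 0)),
            green=int(raw.get("green", 0)),
            no_progress=int(raw.get("no_progress", 0)),
        )


def save_state(state: FailcountState, path: Path) -> None:
    """Hypothetical helper: write to a temp file, then os.replace onto the target."""
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state.to_dict(), handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the replace.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_state(path: Path) -> FailcountState:
    """Hypothetical helper: a missing file means a fresh, all-zero state."""
    if not path.exists():
        return FailcountState()
    return FailcountState.from_dict(json.loads(path.read_text(encoding="utf-8")))
```

Because os.replace swaps the file in a single step, a reader sees either the old state or the new one, never a half-written file.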

3811 Commits
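
Commit 355811635d describes two guards in src/rag_engine.py: replacing 'if not embeddings' with 'if embeddings is None', and skipping None entries in 'metadatas'. The sketch below reproduces both conditions in isolation. AmbiguousArray is a hypothetical stand-in that mimics how a multi-element numpy array behaves under bool(), so the example has no third-party dependency; the function names are invented, and sorted() is used instead of the commit's list(set(...)) only to make the output deterministic.

```python
class AmbiguousArray:
    """Stand-in for a multi-element numpy array: len() works, bool() raises."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def __bool__(self):
        raise ValueError(
            "The truth value of an array with more than one element is ambiguous."
        )


def is_empty_old(embeddings) -> bool:
    # Old check: 'not embeddings' calls bool() and raises on a non-empty array.
    return not embeddings or len(embeddings) == 0


def is_empty_new(embeddings) -> bool:
    # New check: an identity test against None never calls bool().
    return embeddings is None or len(embeddings) == 0


def indexed_paths(metadatas) -> list:
    # Guard against None entries (documents upserted without metadata).
    return sorted({m["path"] for m in metadatas if m is not None and m.get("path")})


embeddings = AmbiguousArray([[0.1, 0.2], [0.3, 0.4]])
try:
    is_empty_old(embeddings)
    old_outcome = "no error"
except ValueError:
    old_outcome = "ValueError"

new_outcome = is_empty_new(embeddings)
paths = indexed_paths([None, {"path": "a.py"}, {}, {"path": "b.py"}, {"path": "a.py"}])
```

The second guard matters independently of the first: even with a healthy collection, a single None entry in 'metadatas' is enough to raise AttributeError in the old comprehension.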