Private
Public Access
0
0
Commit Graph

3811 Commits

Author SHA1 Message Date
ed 7380e23bc0 feat(tier2): create tier-2-auto-execute slash command template 2026-06-16 19:17:41 -04:00
ed 73ab2778ca feat(report): implement write_failure_report + 8 tests, 100% coverage 2026-06-16 19:13:30 -04:00
ed 5ca8444f35 test(report): add report writer tests (red, opt-in via TIER2_SANDBOX_TESTS=1) 2026-06-16 19:10:22 -04:00
ed 2dbfaeb60e test(failcount): add 13 unit tests + 6 coverage tests; 100% coverage achieved 2026-06-16 19:06:09 -04:00
ed 190766fe03 feat(failcount): add default failcount.toml thresholds 2026-06-16 19:01:31 -04:00
ed fc92e1aa74 feat(failcount): add FailcountState + FailcountConfig dataclasses + all stub functions 2026-06-16 18:59:38 -04:00
ed e646067a8a test(failcount): add test_initial_state_zero (red) 2026-06-16 18:58:00 -04:00
ed 9f2ff29c2e feat(tier2): create scripts/tier2/ package 2026-06-16 18:57:09 -04:00
ed e060399579 conductor(plan): add state.toml for tier2_autonomous_sandbox track
44 tasks across 9 phases, all pending. Tracks:
- failcount unit test progression (13 target)
- slash command spec tests (11 target)
- report writer tests (4 opt-in)
- bootstrap test (1 opt-in)
- sandbox enforcement test (1 opt-in)
- smoke e2e test (1 opt-in, double gate)

Enforcement stack contract: 9 flags tracking the 4 git bans + filesystem
boundary + 3 hook installs + OpenCode deny rules + Windows restricted token.
Final verification requires all 9 enforcement flags = true.

status: active, current_phase: 0, blocked_by: none, blocks: none
2026-06-16 18:51:42 -04:00
ed 2551ff18c7 no t-shirt nonsense (agents.md) 2026-06-16 18:47:50 -04:00
ed 6a26713d74 conductor(plan): Tier 2 autonomous sandbox - implementation plan + metadata
9 phases, 30+ tasks, scope-only (no T-shirt size per user feedback):
- Phase 1: failcount module (15 TDD tasks, 13 unit tests, 100% coverage target)
- Phase 2: failure report writer (4 sections, opt-in tests)
- Phase 3: slash command + agent + opencode.json.fragment templates (11 spec tests)
- Phase 4: run_track.py CLI entry point (duplicates slash command protocol)
- Phase 5: setup_tier2_clone.ps1 bootstrap (idempotent, -WhatIf support)
- Phase 6: run_tier2_sandboxed.ps1 launcher (restricted token skeleton v1)
- Phase 7: git hooks (pre-push refuses all pushes, post-checkout logs)
- Phase 8: opt-in tests (TIER2_SANDBOX_TESTS=1, TIER2_SMOKE=1)
- Phase 9: user guide + tracks.md registration + metadata

Key contracts:
- FailcountState dataclass with 3 signals (red/green/no_progress)
- Result-style with to_dict/from_dict for state persistence
- Atomic write via tmp + os.replace
- 3-layer enforcement: OpenCode permission system + Windows restricted token + git hooks
2026-06-16 18:46:36 -04:00
ed 568804c7d9 conductor(spec): drop T-shirt size per user feedback 2026-06-16 18:38:09 -04:00
ed 024938bd46 conductor(spec): Tier 2 autonomous sandbox track spec 2026-06-16 18:31:48 -04:00
ed 88e44d1c0e docs(report): add session report (audit + migration plan + tech-rot prevention) 2026-06-16 10:48:15 -04:00
ed b90d4bdd4e feat(scripts): add --ci alias for --strict + CI-gate doc updates 2026-06-16 10:40:21 -04:00
ed ce85c379ad docs(agents): add Convention Enforcement section at the top (4 mechanisms) 2026-06-16 10:37:35 -04:00
ed 734840375f docs(guidelines): add AI Agent Obligations section with 4 enforcement audit scripts 2026-06-16 10:35:55 -04:00
ed ef1b0a1c6d docs(styleguide): add AI Agent Checklist section against tech rot 2026-06-16 10:29:26 -04:00
ed 4a55a14fc0 conductor: register result_migration_20260616 in tracks.md (umbrella + 5 sub-tracks) 2026-06-16 10:26:10 -04:00
ed 4cf885da90 docs(workflow+agents): add HARD BAN on day estimates + Tier 1 Track Initialization Rules section 2026-06-16 10:16:49 -04:00
ed ed6602274d docs(tracks): strip day estimates from exception_handling_audit + rag_test_failures (Tier 1 rule) 2026-06-16 10:16:17 -04:00
ed 4c0b19b4db conductor(track): spec/plan/metadata for result_migration_20260616 (5 sub-tracks, NO day estimates) 2026-06-16 10:15:46 -04:00
ed 4521a7df96 feat(scripts): add --summary and --by-size modes to exception_handling audit 2026-06-16 09:41:20 -04:00
ed 01fbd62a3f conductor(track): mark exception_handling_audit_20260616 as completed 2026-06-16 09:10:14 -04:00
ed 4b8363bd71 conductor: register exception_handling_audit_20260616 in tracks.md 2026-06-16 09:09:34 -04:00
ed 3c59e24162 docs(report): add exception handling audit report (211 violations across 42 files) 2026-06-16 09:07:42 -04:00
ed 4209523228 docs(app_controller+guidelines): add Exception Handling section + audit script cross-reference 2026-06-16 09:07:24 -04:00
ed b447f66818 docs(styleguide): add 5 sections clarifying the convention's boundaries 2026-06-16 09:06:54 -04:00
ed 9a04153abd feat(scripts): add exception_handling audit script (10-category classification) 2026-06-16 09:06:25 -04:00
ed 3c267f6b9c conductor(track): metadata.json for exception_handling_audit_20260616 2026-06-16 09:05:59 -04:00
ed a33bfb0abd conductor(track): plan for exception_handling_audit_20260616 (5 phases, ~12 tasks) 2026-06-16 09:05:40 -04:00
ed e81413a2cd conductor(track): spec for exception_handling_audit_20260616 (audit + doc clarification) 2026-06-16 09:05:19 -04:00
ed 3d35bb5b3f todo 2026-06-16 01:03:59 -04:00
ed ff91c4e8b0 docs(report): add completion report for rag_test_failures_20260615
Comprehensive 12-section completion report following the format of
TRACK_COMPLETION_ai_loop_regressions_20260615.md. Documents:

- 4 atomic commits, 1288+4+0 fully green baseline
- 2 defensive guards in src/rag_engine.py (lines 150 and 331)
- 3 new unit tests in tests/test_rag_sync_none_error.py
- 4 plan deviations (spec wrong about root cause, test_rag_visual_sim
  was already passing, traceback diagnostic was a dead end, temp dir
  cleanup retry loop for Windows)
- 5 followup recommendations for Tier 1 review
2026-06-16 00:36:24 -04:00
ed ba04363003 conductor(track): mark rag_test_failures_20260615 as completed
Updated metadata.json: status=completed, completed_at=2026-06-15,
verification_criteria filled with actual results.

Updated tracks.md: status=shipped, 4-commit summary, test file added.

Final result: 1288 pass + 4 skip + 0 fail. All 11 batched test tiers pass
in 873.6s. First fully green baseline since 2026-06-12.
2026-06-16 00:31:26 -04:00
ed d89c58103d docs(rag): add troubleshooting section for NoneType.get error
Documents the two bugs fixed in the rag_test_failures_20260615 track:
1. get_all_indexed_paths: m.get('path') failing on None metadata
2. _validate_collection_dim_result: 'if not embeddings' raising
   ValueError on non-empty numpy arrays

Also documents the 'no such table: tenants' chromadb corruption
symptom (wipe .slop_cache/chroma_* to recover).

Plus: 'rag_status' shows 'error: ' prefix is the failure indicator;
the actual error message is the part after the prefix.
2026-06-16 00:28:53 -04:00
ed 6a0ac35738 conductor(checkpoint): Phase 3 complete - RAG test failures fix verified
All 11 batched test tiers pass in 873.6s (333 files):
  tier-1-unit-comms (6)  tier-1-unit-core (194)
  tier-1-unit-gui (21)   tier-1-unit-headless (2)
  tier-1-unit-mma (20)   tier-2-mock_app-comms (2)
  tier-2-mock_app-core (16)  tier-2-mock_app-gui (9)
  tier-2-mock_app-headless (1)  tier-2-mock_app-mma (7)
  tier-3-live_gui (55) - includes 3 RAG tests previously failing

Test delta: 1282 + 4 + 3 -> 1288 + 4 + 0 (3 RAG tests fixed + 3 new unit tests)

Phase 3 verification:
- Phase 3.1: full RAG suite (27 tests) passes in 36s
- Phase 3.2: full test suite (1288 pass + 4 skip + 0 fail) in 697s
- Phase 3.3: full batched test suite (11 tiers, 333 files) passes in 873s
2026-06-16 00:26:59 -04:00
ed 355811635d fix(rag): handle None metadata in get_all_indexed_paths and non-empty numpy in dim check
Two bugs in src/rag_engine.py were causing 'NoneType object has no attribute get'
in the live_gui RAG tests (test_rag_phase4_final_verify,
test_rag_phase4_stress):

1. _validate_collection_dim_result:148
   Old:  if not embeddings or len(embeddings) == 0:
   New:  if embeddings is None or len(embeddings) == 0:
   The 'if not embeddings' check raises ValueError('The truth value of an
   array with more than one element is ambiguous. Use a.any() or a.all()')
   when 'embeddings' is a non-empty numpy array (which is the normal case
   after documents are upserted). The exception is caught by the outer
   'except Exception' which returns a non-ok Result, causing __init__ to
   set self.collection = None. Subsequent 'get_all_indexed_paths()' then
   fails with 'NoneType has no attribute get' on self.collection.get().

2. get_all_indexed_paths:334
   Old:  return list(set(m.get('path') for m in res['metadatas'] if m.get('path')))
   New:  return list(set(m['path'] for m in res['metadatas'] if m is not None and m.get('path')))
   When chromadb returns 'metadatas=[None, ...]' (documents upserted
   without metadata), 'm.get('path')' fails with AttributeError on the
   first None element. Adds 'm is not None' guard.

Both fixes are defensive: the conditions that trigger them (orphan docs
without metadata, non-empty embeddings arrays) are normal valid
states that the old code couldn't handle.

New file: tests/test_rag_sync_none_error.py
   3 unit tests covering both bugs:
   - test_dim_check_does_not_raise_on_non_empty_ndarray
   - test_get_all_indexed_paths_handles_none_metadata
   - test_get_all_indexed_paths_returns_paths_with_metadata

Verified:
- 3/3 focused tests pass
- test_rag_phase4_final_verify.py::test_phase4_final_verify PASSES (was failing)
- test_rag_phase4_stress.py::test_rag_large_codebase_verification_sim PASSES (was failing)
- test_rag_visual_sim.py::test_rag_full_lifecycle_sim PASSES (still passing)
2026-06-16 00:09:02 -04:00
ed 29c64a0125 conductor: register rag_test_failures_20260615 in tracks.md + update public_api row 2026-06-15 21:56:20 -04:00
ed 3fc492e302 conductor(track): metadata.json for rag_test_failures_20260615 2026-06-15 21:54:36 -04:00
ed 3aa4cfa133 conductor(track): plan for rag_test_failures_20260615 (5 phases, ~10 tasks) 2026-06-15 21:53:13 -04:00
ed 006df67637 conductor(track): spec for rag_test_failures_20260615 (3 RAG test fixes, single root cause) 2026-06-15 21:51:11 -04:00
ed bc388f11bb docs(report): add deviation #2.5 for test_headless_verification fix
The headless batch hang the user reported was caused by an xdist worker
crash on test_headless_verification_full_run, not a test logic failure.
The same root cause as the 4 Phase 2 follow-ups (mock returns raw string
but production does 'if not result.ok:'), but with a different failure
mode (worker crash that hangs the batched test runner).

Documented in section 3 of the report as deviation #2.5 with:
- Where it went wrong (missed in the 4 follow-ups)
- The specific symptom in the user's session
- The fix (out-of-band commit e35b6a34)
- Lesson for the next spec (verification must include xdist mode)
2026-06-15 21:28:29 -04:00
ed e35b6a34ad test(headless_verification): wrap mock return in Result(data=...)
The test_headless_verification_full_run test in test_headless_verification.py
mocked src.multi_agent_conductor.ai_client.send_result with a return_value
of a raw string. The production code does 'if not result.ok:' which
fails on raw strings with AttributeError.

In xdist mode this caused a worker crash (gw0/gw11: 'node down: Not
properly terminated') that hung the entire tier-1-unit-headless batch
in the batched test runner (~50s+ per batch). The crash was the
worker dying while pytest-master waited for it; the master never
got a clean exit and the run was orphaned until the user's manual
cancel.

The test was missed in the original Phase 2 list (it was an xdist
crash rather than a test logic failure) and in the 4 Phase 2
follow-up commits (which targeted the 4 specific test files the
user reported during the run).

Change: mock_send.return_value = 'Task completed successfully.' ->
         mock_send.return_value = Result(data='Task completed successfully.')

Plus add the Result import.

2/2 tests in test_headless_verification.py now pass under xdist
(was 1/2 + worker crash in xdist). Full headless batch (14 tests)
completes in 18.7s.
2026-06-15 21:26:42 -04:00
ed 99747cafb9 docs(report): add track completion report for public_api_migration_and_ui_polish_20260615
531-line completion report for Tier 1 review covering:
- Goal & scope (per spec)
- 7 phases of delivery (per commit)
- 6 plan deviations to flag (CRITICAL: 7 production-affected test files
  + 4 follow-up mock fixes were missed in the original spec; the user's
  stated mass-rename send_result->send plan; the track was done on
  master not a feature branch)
- Files changed (per category)
- Verification (per the spec's 15 verification criteria)
- Definition of Done
- Recommended next track (send_result -> send rename)
- Tier 1 review checklist
2026-06-15 21:10:10 -04:00
ed bbd4c7b5c0 conductor(track): mark public_api_migration_and_ui_polish_20260615 as completed
- metadata.json: status -> completed
- state.toml: all 7 phases marked completed; all tasks marked completed
  with their commit SHAs
- Includes the 4 Phase 2 follow-up mock fixes for:
  test_conductor_engine_v2.py (10 tests)
  test_context_pruner.py (1 test)
  test_rag_integration.py (1 test)
  test_tiered_aggregation.py (1 test)

Test count: 1286 + 12 newly-passing = 1298 pass; 4 RAG failures deferred.
(Note: 12 newly-passing includes the 6 pre-existing failures from the
spec PLUS 6 more from test_conductor_engine_v2.py and the user's
manual corrections to test_ai_loop_regressions_20260614.py and
test_conductor_engine_v2.py.)

Total commits in this track: ~25 atomic commits + 6 phase checkpoints.
2026-06-15 20:41:12 -04:00
ed 13f32f52e0 test(tiered_aggregation): wrap mock_send return in Result(data=...) (Phase 2 follow-up)
The test_run_worker_lifecycle_uses_strategy test in test_tiered_aggregation.py
mocked src.multi_agent_conductor.ai_client.send_result with a return_value
of a raw string. The production code does "if not result.ok:" which
fails on raw strings.

3/3 tests in test_tiered_aggregation.py pass (was 2/3).
2026-06-15 20:28:41 -04:00
ed 26e1b65298 test(rag_integration): wrap _send_gemini mock return in Result(data=...)
The test_rag_integration test mocks the internal _send_gemini
function to return a raw string. The production code in
app_controller._handle_request_event now does 'if result.ok:'
which fails on raw strings.

Change: mock_provider.return_value = 'Mock AI Response' ->
         mock_provider.return_value = Result(data='Mock AI Response')

Plus add the Result import.

1 test passes (was 1 pre-existing failure).
2026-06-15 20:27:07 -04:00
ed 58576fcba7 test(context_pruner): wrap send_result lambda in Result(data=...) (Phase 2 follow-up)
The test_token_reduction_logging test in test_context_pruner.py
mocked src.ai_client.send_result with a lambda that returned
a raw string. The production code now does "if not result.ok:"
which fails on raw strings.

1 test passes (was 1 pre-existing failure).
2026-06-15 20:25:44 -04:00
ed 64278d5313 test(conductor_engine_v2): wrap mock_send return values in Result(data=...)
The 7 tests in test_conductor_engine_v2.py (already updated to
mock src.ai_client.send_result) were still returning raw strings
from the mocks. The production code in multi_agent_conductor.py
now does "if not result.ok:" which fails on raw strings with
AttributeError.

Changes:
- Add "from src.result_types import Result" import
- Wrap all mock_send.return_value = "..." with Result(data="...") (4 sites)
- Wrap MagicMock(return_value="...") with Result(data="...") (2 sites)
- Wrap side_effect return with Result(data="Success")

10/10 tests pass (was 3/10).
2026-06-15 20:21:46 -04:00