Private
Public Access
0
0
Commit Graph

3369 Commits

Author SHA1 Message Date
ed 167eacc1de Merge branch 'master' of C:\projects\manual_slop into tier2/send_result_to_send_20260616 2026-06-17 07:37:36 -04:00
ed 07a0e66a19 docs(tier2): apply user feedback - 6 workflow conventions
User feedback from the first sandbox run (send_result_to_send_20260616,
2026-06-17) identified 6 conventions Tier 2 must follow. Update the agent
prompt template, slash command template, user guide, and workflow doc:

1. Test runner: ALWAYS use 'uv run python scripts/run_tests_batched.py'
   (NOT 'uv run pytest'). The batched runner provides tier filtering,
   parallelization (xdist), and a summary table that direct pytest lacks.

2. Default branch: this repo uses 'master', not 'main'. The Tier 2 slash
   command now does 'git fetch origin master' (was 'origin main').

3. Line endings: preserve existing. This repo has a mix of CRLF and LF;
   a repo-wide LF standardization is a future track.

4. Throw-away scripts: write to 'scripts/tier2/artifacts/<track>/', NOT
   the base 'scripts/tier2/' directory. The base is reserved for
   production code; throw-away scripts are kept for archival but
   isolated per-track.

5. End-of-track report: write 'docs/reports/TRACK_COMPLETION_<track>.md'
   and update 'state.toml' to 'status=completed'. The user reads this
   to decide merge. Previously this was implicit; now it's explicit.

6. Run-time expectation: tracks are 1-4 hours. If context runs out, Tier
   2 notes progress to disk and continues. The --resume flag picks up
   from the last completed task.

Also updated the user guide with a 'Conventions' section and a
troubleshooting entry for the resume flow. The verify-the-sandbox
checklist now uses 'origin master' instead of 'origin main'.
2026-06-17 02:13:29 -04:00
ed 86fc1c5477 Merge branch 'master' of C:\projects\manual_slop into tier2/send_result_to_send_20260616 2026-06-17 02:00:56 -04:00
ed e2e570369e wrong folder 2026-06-17 01:57:52 -04:00
ed 1fc4a6026b plan update for (send_result-to_send) 2026-06-17 01:54:52 -04:00
ed 9899ad8a41 ignore coverage 2026-06-17 01:54:24 -04:00
ed abf92a8b31 feat(tier2): add fetch_tier2_branch.ps1 - bridge from sandbox to main repo
The Tier 2 sandbox blocks git push (and all other destructive git ops).
After Tier 2 finishes a track, this script is the bridge: it fetches the
tier2/<track> branch from the sandboxed clone (C:\projects\manual_slop_tier2)
into the main repo (C:\projects\manual_slop), creating a local
review/<track> branch so the working tree is untouched.

Usage:
  pwsh -File scripts\\tier2\\fetch_tier2_branch.ps1 -TrackName send_result_to_send_20260616

Supports -WhatIf for dry-run. Does NOT push to origin (user's call).
2026-06-17 01:52:04 -04:00
ed a91c1da33c end of track: test suite log. 2026-06-17 01:43:50 -04:00
ed 959ea38b87 conductor(track): fable_review_20260617 metadata — point to plan.md
Plan committed at 8ec6d8f4 (1010 lines, 7 phases, 50+ tasks).
2026-06-17 01:41:58 -04:00
ed 8ec6d8f4a6 conductor(plan): Add fable_review_20260617 plan
7 phases, 50+ bite-sized tasks. Phase 1: init + 4 skeleton files. Phase 2: 10 parallel Tier 3 cluster sub-agent dispatches. Phase 3: 17 synthesis sections (Tier 1 max-token-output strategy). Phase 4: 3 side artifacts. Phase 5: self-review. Phase 6: user review. Phase 7: final commit + register. Every task has a verification command. Fable artifact at docs/artifacts/Fable System Prompt.txt is NEVER staged (verified per-task). No day estimates (per conductor/workflow.md §Tier 1 Track Initialization Rules).
2026-06-17 01:41:42 -04:00
ed 511a19aab2 send_result_to_send_20260616 session transcript.
This one was important to keep is it was the first attempt at an autonomous run.
Essentially worked except for a turn exhaustion on ai side (need to tweak some config maybe).
2026-06-17 01:32:07 -04:00
ed 219b653a45 docs(tier2): add track completion report (final verification + handoff)
End-of-track report following the same format as
TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md. Documents:
- 24-commit inventory (10 atomic renames + 14 plan/script commits)
- All 6 phases completed, all 9 verification flags = true
- Pre-existing failures (7 tests, all credentials.toml, confirmed
  against origin/master baseline where they also fail)
- 2 surgical doc fixes in error_handling.md (deprecation section +
  line 204 contradiction)
- Sandbox enforcement contracts held (4 of 4 hard bans + 4 of 4
  secondary contracts)
- User handoff instructions (fetch + diff + merge + per-commit review)

The track is the first end-to-end test of the tier2_autonomous_sandbox;
this report is the final deliverable for that test.
2026-06-17 01:22:57 -04:00
ed 8eaf694f4a conductor(tracks): Register fable_review_20260617 in tracks.md
New research track for critical analysis of Anthropic's Claude Fable 5 system prompt. Added as row 25 in the Active Tracks table (Priority B research) and as a section in the new 'Active Research Tracks (2026-06+)' grouping. The companion spec + metadata + state.toml are committed in 058e2c93 and a6114ef9.
2026-06-17 01:19:45 -04:00
ed c0e2051ec9 conductor(plan): Mark Phase 6 complete - all track tasks done
Phase 6 tasks (t6_1, t6_2, t6_3) and the phase itself marked completed.
All 16 task entries now have status=completed.
All 6 phase entries now have status=completed.

This is the final state.toml commit for the track.
2026-06-17 01:18:40 -04:00
ed 9a5d3b9c8c conductor(plan): Mark Task 6.3 complete - register in tracks.md
Added entry after the Tier 2 Autonomous Sandbox track (its parent
dependency). Status: shipped 2026-06-17. Notes: 6 phases, 10 atomic
rename commits, 37 files modified, 0 new/deleted. Test inventory:
100/101 pass in renamed files; 7 broader pre-existing failures all
due to missing credentials.toml (confirmed against origin/master).
2026-06-17 01:18:02 -04:00
ed 5a58e1ceaf conductor(plan): Mark Task 6.2 complete - metadata.json to status=shipped
Track marked shipped 2026-06-17. All 6 verification criteria evaluated
with PASS/EXCEEDED/READY status and notes. 7 pre-existing test failures
documented with root cause and pre_existing_failures_remaining flag.

Risk register updated: scope_creep=none, behavior_change=none,
doc_drift=medium (error_handling.md deprecation section required
surgical rewrite to historical note).

No deferred_to_followup_tracks (this track completed cleanly).
2026-06-17 01:16:43 -04:00
ed a6114ef9ac conductor(track): Add fable_review_20260617 state.toml
7 phases (init -> 10 parallel cluster dispatches -> 17 synthesis sections -> 3 side artifacts -> self-review -> user review -> register). Each phase has explicit task IDs (t1_1 .. t7_4) for Tier 2 to walk through. current_phase = 0 (spec approved, not started). Hard rule encoded in [meta]: docs/artifacts/Fable System Prompt.txt is NEVER committed.
2026-06-17 01:16:20 -04:00
ed 058e2c9385 conductor(track): Add fable_review_20260617 spec + metadata
Critical-analysis track for Anthropic's Claude Fable 5 system prompt (1585 lines, the public 'Mythos' version). 10 cluster sub-reports written by Tier 3 workers in parallel, synthesized by Tier 1 into a 17-section report (>3500 LOC) with 3 side artifacts. T-shirt size: XL. Fable artifact at docs/artifacts/Fable System Prompt.txt is local-only and MUST NOT be committed (per user hard rule). No day estimates (per conductor/workflow.md §Tier 1 Track Initialization Rules).
2026-06-17 01:15:58 -04:00
ed aad6deffcb conductor(plan): Mark Task 6.1 complete - state.toml updated
All 16 task entries now have status=completed and commit_sha.
All 6 phases marked completed (phase_6 in_progress pending metadata+tracks.md).
All 9 verification flags = true.
All 6 enforcement_stack flags = true (sandbox contracts exercised).

Added [notes] section documenting:
- Phase 4 file count discrepancy (22 actual vs 24 spec)
- error_handling.md deprecation section replacement
- Pre-existing test failures (unrelated to track)
- MCP edit_file unreliability + Python fallback
2026-06-17 01:15:33 -04:00
ed d86131d951 conductor(plan): Mark Task 5.2 + 5.3 complete (Phase 5 verification)
Final grep: 0 send_result in active code. 3 historical refs in
error_handling.md (intentional, in the 'Historical deprecation' note).

Test verification: 100/101 tests pass in the 26 files renamed by this
track. 1 pre-existing failure in test_headless_service.py due to
missing credentials.toml (verified against origin/master baseline
where it also fails - unrelated to the rename).
2026-06-17 01:14:24 -04:00
ed ea7d794a6b conductor(plan): Mark Task 5.2 + 5.3 complete (Phase 5 verification done)
Final grep: 0 send_result in active code. 3 historical refs in
error_handling.md (intentional, in the 'Historical deprecation' note).

Test verification: 100/101 tests pass in the 26 files renamed by this
track. 1 pre-existing failure in test_headless_service.py due to
missing credentials.toml (verified against origin/master baseline
where it also fails - unrelated to the rename).

7 broader suite failures all pre-existing (all FileNotFoundError on
credentials.toml, confirmed against origin/master baseline).

Track verification:
- git grep send_result: 0 in active code (3 historical intentional)
- Full test suite: matches pre-rename baseline (7 pre-existing failures
  unrelated to the rename, 0 new regressions)
2026-06-17 01:13:25 -04:00
ed 5cc422b34b conductor(plan): Mark Task 5.1 complete (Phase 5 docs done) 2026-06-17 00:51:07 -04:00
ed 9b5011231c docs(ai_client): rename send_result to send in 3 current docs
Doc consistency: guide_ai_client.md, guide_app_controller.md, and
the error_handling styleguide now reference the new symbol name.

Also fixes two consistency issues in error_handling.md introduced by
the mechanical rename:
1. The 'Deprecation: send -> send_result' section (lines 623-642) was
   rewritten as a 'Historical deprecation (added 2026-06-15, reverted
   2026-06-16)' note that points to the relevant track specs.
2. Line 204 (the 'Current State Audit' summary for src/ai_client.py)
   had a self-contradictory claim ('send() is the new public API;
   send() is @deprecated') after the rename. Updated to describe
   the canonical public API.

Historical archives (conductor/tracks/*/spec.md, conductor/tracks/*/plan.md,
docs/reports/*) are NOT modified - they document the 2026-06-15
public_api_migration decision and stay as historical record.
2026-06-17 00:50:36 -04:00
ed d17d8743dd conductor(plan): Mark Task 4.1 complete (Phase 4 done) 2026-06-17 00:45:44 -04:00
ed ada9617308 test(ai_client): rename send_result to send in 22 remaining test files
Batch rename of 22 test files. 62 references renamed total.

The full test suite is now GREEN again, matching the pre-rename baseline
from Task 1.1. Pure mechanical rename. No behavior change.

Files affected: test_ai_cache_tracking, test_ai_client_cli,
test_ai_client_result, test_api_events, test_context_pruner,
test_deepseek_provider, test_gemini_cli_* (3 files), test_gui2_mcp,
test_headless_* (2 files), test_live_gui_integration_v2,
test_orchestration_logic, test_phase6_engine, test_rag_integration,
test_run_worker_lifecycle_abort, test_spawn_interception_v2,
test_symbol_parsing, test_tier4_interceptor, test_tiered_aggregation,
test_token_usage.

Note: spec estimated 24 files; actual is 22 (test_deprecation_warnings
no longer exists, and 1 fewer file than spec's list).

Refs: conductor/tracks/send_result_to_send_20260616/
2026-06-17 00:38:29 -04:00
ed 2f45bc4d68 conductor(plan): Mark Task 3.5 + 3.6 complete (Phase 3 done) 2026-06-17 00:35:32 -04:00
ed e8a9102f19 test(ai_client): rename send_result to send in test_orchestrator_pm_history
4 references renamed. Test file state: GREEN. 3 tests pass.

Phase 3 complete (all 5 high-impact test files green).
2026-06-17 00:34:37 -04:00
ed 53b35de5c6 conductor(plan): Mark Task 3.4 complete 2026-06-17 00:34:00 -04:00
ed 423f9a95b0 test(ai_client): rename send_result to send in test_conductor_tech_lead
11 references renamed (planned 8; the count grew with the @patch pattern + local var name).
Test file state: GREEN. 9 tests pass.
2026-06-17 00:33:36 -04:00
ed 58fe3a9cb5 conductor(plan): Mark Task 3.3 complete 2026-06-17 00:33:00 -04:00
ed 4393e831b0 test(ai_client): rename send_result to send in test_ai_loop_regressions_20260614
13 references renamed (planned 12; one extra found in a comment).

Test function test_fr2_send_result_callable_in_app_controller_namespace
renamed to test_fr2_send_callable_in_app_controller_namespace.

7 tests pass.
2026-06-17 00:32:33 -04:00
ed 6dbba46a25 conductor(plan): Mark Task 3.2 complete 2026-06-17 00:31:33 -04:00
ed 5e99c204a3 test(ai_client): rename send_result to send in test_orchestrator_pm
14 references renamed (decorators + parameter names + assertions).
Test file state: GREEN. 3 tests pass.
2026-06-17 00:30:48 -04:00
ed f0663fda6a conductor(plan): Mark Task 3.1 complete 2026-06-17 00:29:54 -04:00
ed 3e2b4f74ba test(ai_client): rename send_result to send in test_conductor_engine_v2
22 references renamed (mostly monkeypatch.setattr calls + comments).
Test file state: GREEN. All 10 tests in this file now pass.
2026-06-17 00:29:21 -04:00
ed d714d10fd4 conductor(plan): Mark Task 2.1 complete 2026-06-17 00:28:17 -04:00
ed d87d909f7b refactor(ai_client): rename send_result to send in 5 src/ call sites
Renames 10 references across app_controller, conductor_tech_lead,
mcp_client (docstring example), multi_agent_conductor, orchestrator_pm.

5 call sites in ai_client.send_result(...) -> ai_client.send(...)
3 print strings mentioning send_result
1 docstring comment (conductor_tech_lead)
1 docstring example (mcp_client) 'src.ai_client.send_result' -> 'src.ai_client.send'

Test suite state: still red, but all src/-level call sites are now
renamed. Remaining failures are in test files (mocks and patches
that still reference send_result).

Refs: conductor/tracks/send_result_to_send_20260616/
2026-06-17 00:27:47 -04:00
ed 4a59567939 conductor(plan): Mark Task 1.1 complete 2026-06-17 00:26:05 -04:00
ed 5351389fc0 refactor(ai_client): rename send_result to send (the impl, TDD red moment)
The TDD red moment. The implementation is renamed but the call sites
in src/, tests/, and docs still use send_result. Subsequent commits
rename the call sites and progressively move the test suite back to
green.

10 references renamed in src/ai_client.py:
- 4 'Called by: send_result' docstring tags in private provider helpers
- 1 function definition (def send_result -> def send)
- 1 [C: ...] SDM tag referencing test function names
- 2 monitor component names (start_component / end_component)
- 2 error source strings (CONFIG + INTERNAL)

Also adds scripts/tier2/apply_t1_1_edits.py - the helper script that
applied the 10 edits. Kept in scripts/tier2/ as a record of the
mechanical change pattern.

Refs: conductor/tracks/send_result_to_send_20260616/
2026-06-17 00:23:16 -04:00
ed c1d9a966d7 conductor(plan): Rename send_result to send (sandbox test track)
The first end-to-end test of the tier2_autonomous_sandbox_20260616
sandbox. Pure mechanical rename: ai_client.send_result to ai_client.send
across 38 active files (6 src/, 29 tests/, 3 current docs). 10 atomic
commits across 5 phases. No behavior change; no new tests; the existing
test suite is the safety net.

Phase structure:
- Phase 1: rename src/ai_client.py (TDD red moment)
- Phase 2: rename 5 other src/ files (batch)
- Phase 3: rename top 5 test files (one commit per file)
- Phase 4: rename 24 remaining test files (batch)
- Phase 5: rename 3 current docs + final verification
- Phase 6: update state + metadata + register in tracks.md

Historical archives (conductor/tracks/*/spec.md, conductor/tracks/*/plan.md,
docs/reports/*) are NOT modified per spec section 7.
2026-06-16 23:52:59 -04:00
ed 9ba61d43d3 docs(tier2): add track completion report (final verification + spec coverage matrix) 2026-06-16 23:29:00 -04:00
ed 00c6922c0b conductor(plan): mark tier2_autonomous_sandbox_20260616 as complete (all 9 phases done) 2026-06-16 23:23:28 -04:00
ed eedbfa1180 conductor(plan): update metadata.json to status=shipped + actual test counts 2026-06-16 23:22:24 -04:00
ed 2f79f19989 conductor(plan): register tier2_autonomous_sandbox_20260616 in tracks.md 2026-06-16 23:21:21 -04:00
ed 8bf7cd175b docs(tier2): add user guide for Tier 2 autonomous sandbox 2026-06-16 22:48:13 -04:00
ed 3e17aa6c8b test(tier2): add smoke e2e test (opt-in, double-gate TIER2_SANDBOX_TESTS+TIER2_SMOKE) 2026-06-16 22:26:04 -04:00
ed 5b6e7db174 test(tier2): add sandbox enforcement test (pre-push hook refuses push) 2026-06-16 20:25:44 -04:00
ed 5d150dc6e0 test(tier2): add bootstrap -WhatIf test (opt-in via TIER2_SANDBOX_TESTS) 2026-06-16 20:01:32 -04:00
ed 37eafc008e test(tier2): add trivial smoke track for e2e test (force-added, fixture) 2026-06-16 19:57:36 -04:00
ed cb7c82008e test(tier2): add tier2_sandbox and tier2_smoke pytest markers 2026-06-16 19:56:20 -04:00