Root cause discovered after the user's batched test run revealed the
stress test still failed when run after the execution test. The
gemini_cli_adapter persists session_id across tests (singleton). The
execution test set session_id to 'mock-worker-ticket-A-1' (from the
worker call). When the stress test's epic call ran, it used
--resume with that stale session_id. The mock's worker check had
a session_id fallback:
if 'You are assigned to Ticket' in prompt or session_id.startswith('mock-worker-'):
...worker response...
The fallback incorrectly matched the stress test's epic call
(which used the stale worker session_id), causing the mock to return
a worker response instead of an epic response. The production's
generate_tracks then failed to parse the response, returning 0 tracks.
Fix: remove the session_id.startswith('mock-worker-') fallback. Route
workers based on prompt content only. The session_id is for the
production's session management, not for the mock's routing.
This is a 'fix the test infrastructure' change (the mock is a test
artifact, not production). The production's gemini_cli_adapter could
also be fixed to reset session_id on reset_session(), but that's
out of scope for this track.
Verified: the failing test combination (execution test before
stress test) was reproduced and the fix resolves it. The isolated
stress test still passes (3 consecutive runs).
Note: a separate issue was discovered where self.tracks is being
replaced between track appends (different id(self.tracks) values
in the diagnostic log). This causes the API to read 0 tracks after
the accept. The root cause is unclear from this session's
investigation; it appears to be a production code issue where the
in-memory track state is being overwritten by a disk read from
a different project path. This is documented as a follow-up.
The stress test (tests/test_mma_concurrent_tracks_stress_sim.py) uses
mma_epic_input='STRESS TEST: TRACK A AND TRACK B', which the mock's
epic branch did NOT match (it only matched 'PATH: Epic Initialization').
The stress prompt fell to the Default branch which returns text (not
JSON), and the production's orchestrator_pm.generate_tracks failed
to parse it, returning 0 tracks. The test polled for proposed_tracks
(60s timeout, never broke), clicked accept (no proposed_tracks to
process), then asserted tracks >= 2 and found 0.
Root cause: the mock's epic branch was a literal-substring check for
a single test-specific prompt. It was not robust to other test
prompts.
Fix: restructure routing so that sprint and worker are checked first
(more specific patterns), and ANY non-empty prompt that does not
match those patterns is treated as an epic request (returns 2
tracks). Empty prompts fall to the Default branch.
Verification:
- test_mma_concurrent_tracks_execution: still PASSES (uses
'PATH: Epic Initialization' which matches the new catch-all since
it doesn't contain sprint or worker patterns)
- test_mma_concurrent_tracks_stress_sim: now PASSES (uses
'STRESS TEST: TRACK A AND TRACK B' which matches the new catch-all)
- 3 consecutive PASS runs of both tests (13.94s, 14.81s, 14.13s)
This is 'adjust the tests instead' per user directive - the mock is
a test artifact, not production. The production's generate_tracks
correctly returns [] for unparseable responses; the test mock should
be robust enough to return valid JSON for any epic-like prompt.
The prior session_id-based routing (added in 635ca552) had two bugs:
1. call_n literal matching (== 2, == 3) is fragile to test ordering:
the file-based counter persists across tests in the same session,
so call_n != 2 for the 1st sprint if a prior test ran.
2. session_id='mock-sprint-A' means 'this is a follow-up call after
the 1st sprint returned mock-sprint-A', so the response should be
sprint-B (2nd track tickets), not sprint-A. The prior code routed
this to sprint-A, which means track-b's worker has stream id
'ticket-A-1' (not 'ticket-B-1') and the test's 'ticket-B-1' poll
never finds it.
Fix: route on prompt content. The production's conductor_tech_lead
passes the track_brief (containing 'Track A Goal' or 'Track B Goal')
in the user_message. The prompt is NOT empty in --resume mode (the
gemini_cli_adapter passes the prompt as the first turn of the resumed
session).
The prompt-based routing is the original pre-635ca552 design and
works correctly for any number of tracks (A, B, C) without depending
on call ordering.
Verified: 3 consecutive test runs PASS (7.81s, 8.90s, 7.95s) after
the fix. The 'Worker from Track B never appeared' flakiness is gone.
This test was failing for multiple stacked reasons. Fixed the ones I
could identify but the test still does not pass (the bg_task for the
second track does not run, suggesting a deeper integration issue).
Fixes:
1. src/app_controller.py: _start_track_logic_result and _cb_plan_epic both
mutated the frozen ProjectContext dataclass returned by flat_config()
via flat.setdefault('files', {})['paths'] = .... The flat_config()
return type was changed from dict[str, Any] to a frozen @dataclass
ProjectContext by cruft_elimination Phase 2 (in 0d2a9b5e), but the
consumers were never updated. Fix: call flat.to_dict() to get a
mutable dict before mutation.
2. src/app_controller.py: _start_track_logic_result iterated over
sorted_tickets_data expecting dicts but conductor_tech_lead.topological_sort()
returns list[Ticket]. So t_data['id'] raised 'Ticket' object is not
subscriptable. Fix: use Ticket attribute access (t_data.id, etc.).
3. tests/mock_concurrent_mma.py: The mock was not handling the
--resume session-id case that the gemini_cli_adapter uses for
subsequent calls. The mock's first call returns the epic, but
the second call (--resume mock-epic) fell to the default case.
Fix: parse --resume arg from sys.argv and route to per-track
sprint-ticket response based on a persistent call counter.
Known remaining issue: only one sprint-ticket mock call is observed in
the test log; the second track's _start_track_logic does not appear to
call the mock. Could be a deeper integration issue in the test sandbox
or in the _cb_accept_tracks._bg_task loop. Test still fails at line 66.