TIER-2 READ AGENTS.md conductor/workflow.md conductor/edit_workflow.md conductor/tier2/githooks/forbidden-files.txt conductor/tracks/tier2_leak_prevention_20260620/spec.md conductor/code_styleguides/data_oriented_design.md conductor/code_styleguides/error_handling.md conductor/code_styleguides/type_aliases.md before Phase 0 Tasks 0.1, 0.2, 0.4.
Phase 0 of metadata_promotion_20260624. 11 NEW per-aggregate dataclasses added to src/type_aliases.py (CommsLogEntry, HistoryMessage, FileItem, ToolDefinition, SessionInsights, DiscussionSettings, CustomSlice, MMAUsageStats, ProviderPayload, UIPanelConfig, PathInfo) + RAGChunk added to src/rag_engine.py. Metadata: TypeAlias = dict[str, Any] preserved unchanged as the catch-all for collapsed codepaths. Each dataclass has paired to_dict()/from_dict() methods.
11 regression-guard test files created with 5-7 tests each (~70 tests total). All tests PASS.
The existing tests/test_type_aliases.py was updated to reflect the NEW design (CommsLogEntry etc. are now classes, not aliases to Metadata).
Conventions: 1-space indentation, CRLF preserved, no comments.
Phase 7 of code_path_audit_phase_3_provider_state_20260624.
Per-provider history is now accessed via provider_state.get_history()
at call sites; the 12 module-level _X_history/_X_history_lock aliases
are no longer referenced anywhere in production code (helper function
DEFINITIONS that take history as a parameter are unaffected).
TIER-2 READ conductor/code_styleguides/error_handling.md before Phase 2 (deepseek migration; RLock re-entrance critical).
Phase 2 of code_path_audit_phase_3_provider_state_20260624. 11 sites in _send_deepseek (lines 2186-2414) migrated from _deepseek_history/_deepseek_history_lock to local capture history = provider_state.get_history('deepseek'). The RLock re-entrance is critical here — this was the deadlock-prone site that prompted cc7993e5. The local capture pattern uses one acquisition per function instead of one per call site, minimizing lock acquisitions while preserving the same RLock instance that _deepseek_history_lock aliased to.
4 with-blocks migrated (lines 2195, 2215, 2347, 2412). 6 _deepseek_history alias references migrated to history (lines 2196, 2197, 2201, 2216, 2354, 2414).
Verified: 30 tests pass across test_provider_state_migration (14) + test_deepseek_provider (7) + 5 ai_client test files. The test_lock_acquisition_no_deadlock regression test verifies RLock re-entrance works correctly inside the with history.lock: blocks.
Conventions: 1-space indentation, CRLF preserved, no comments added.
TIER-2 READ conductor/code_styleguides/error_handling.md before Phase 1 (anthropic migration).
Phase 1 of code_path_audit_phase_3_provider_state_20260624. 13 call sites in _send_anthropic (lines 1430-1575) migrated from the module-level _anthropic_history alias to a local capture history = provider_state.get_history('anthropic'). The local capture pattern is used (instead of repeated provider_state.get_history() calls) to minimize lock acquisitions and improve readability.
The migration preserves behavior: ProviderHistory is the same singleton that _anthropic_history aliased to, so the migration is a pure refactor. The lock acquisition pattern is unchanged (this function does not acquire _anthropic_history_lock; thread-safety comes from _send_anthropic being called per-thread).
Verified: 37 tests pass across test_provider_state_migration.py + 6 ai_client test files.
Conventions: 1-space indentation, CRLF preserved, no comments added.
TIER-3 READ AGENTS.md + conductor/workflow.md + conductor/code_styleguides/error_handling.md + the 4 source files + 3 test files before this commit.
The code_path_audit_phase_2_20260624 track (Tier 2) shipped 11 audit
fixes (4 NG1 + 7 NG2) but used a heuristic bypass for 4 of the NG2
wrappers: legacy T | None functions that exist only to maintain test
patcher compatibility. Per the review at
docs/reports/REVIEW_TIER2_code_path_audit_phase_2_20260624.md Finding 8,
this track eliminates the legacy wrappers properly.
11 wrappers eliminated (8 main + 3 _legacy_compat inner):
- src/ai_client.py: get_current_tier (1 src + 1 test consumer)
- src/ai_client.py: _gemini_tool_declaration + _legacy_compat (2 test consumers)
- src/ai_client.py: run_tier4_patch_callback + _legacy_compat (was 0 direct callers
but had 2 callback references in app_controller/multi_agent_conductor;
callback contract migrated to Callable[[str, str], Result[str]] instead of
preserving an Optional[str] adapter)
- src/mcp_client.py: _get_symbol_node + _legacy_compat (8 in-file consumers)
- src/mcp_client.py: find_in_scope (nested inside _get_symbol_node_result;
private impl detail, audit doesn't catch T | None, left as-is)
- src/external_editor.py: launch_diff (1 src + 3 test + 1 live_gui test consumer)
- src/external_editor.py: launch_editor (no consumers; deleted)
- src/session_logger.py: log_tool_output (2 src + 3 test consumers)
- src/project_manager.py: parse_ts (no consumers; deleted)
For each consumer: replace legacy_fn(args) with legacy_fn_result(args).data.
For T | None checks: replace if x is None: with if not result.ok: or
if not result.ok or not isinstance(result.data, ...) (depending on pattern).
For run_tier4_patch_callback specifically: the wrapper was a callback adapter
(not a backward-compat shim) and had 2 callback references as consumers.
Rather than keep the adapter (which would re-introduce the Optional[str]
return that the strict audit catches), the patch_callback contract was migrated
from Callable[[str, str], Optional[str]] to Callable[[str, str], Result[str]]
in shell_runner.py + app_controller.py + 9 _send_<vendor>_result signatures
in ai_client.py. This propagates the Result[str] through the callback and
lets shell_runner unwrap with if r.ok and r.data instead of if patch_text.
Verification:
- audit_optional_in_3_files --strict: 0 return-type Optional[T] (down from 1)
- audit_exception_handling --strict: 0 violations (unchanged)
- audit_legacy_wrappers: 0 legacy wrappers (unchanged)
- 15 affected test files: 168 tests pass
- 8 mcp_client/structural/baseline test files: 55 tests pass
- 3 session/gui test files: 7 tests pass
- 0 return-type Optional[T] in src/ai_client.py (was 1: run_tier4_patch_callback)
The 7 code_path_audit*.py files (2604 lines total) are pure static
analysis tools. They do AST traversal of src/, no intrusive profiling,
no runtime markers. They were inlaid with src/ but only import:
- src.result_types (the Result[T] convention type)
- each other (the 6 siblings)
After the move:
- src/ is now pure application code; line-count audit metrics are clean
- scripts/code_path_audit/ is a new namespace-isolated subdir per
AGENTS.md 'scripts are namespace-isolated by directory' rule
TIER-3 READ AGENTS.md + conductor/workflow.md + conductor/edit_workflow.md
+ conductor/code_styleguides/code_path_audit.md + the 7 files before
this commit.
Changes:
- 7 files moved: src/code_path_audit*.py -> scripts/code_path_audit/
- 7 files updated: internal imports rom src.code_path_audit_X ->
rom code_path_audit_X (siblings in same subdir)
- 7 files updated: add sys.path.insert(0, str(Path(__file__).resolve().parents[2] / 'src'))
to find src.result_types when run standalone
- 5 test files updated: rom src.code_path_audit -> rom code_path_audit
+ sys.path setup to find the new subdir
- 6 throwaway scripts in scripts/tier2/artifacts/ updated: import path
+ sys.path setup (parents[3] / 'src' + parents[3] / 'scripts' / 'code_path_audit')
- 2 styleguide/spec references updated: conductor/code_styleguides/code_path_audit.md
+ conductor/tracks/code_path_audit_20260607/spec_v2.md
- 1 meta-audit docstring updated: scripts/audit_code_path_audit_coverage.py
- 1 type registry entry deleted: docs/type_registry/src_code_path_audit.md
(the type is no longer in src/)
- 1 type registry index updated: docs/type_registry/index.md (22 files, was 23)
Verification:
- 7/7 audit gates pass --strict (weak_types 102<=112, type_registry 22 files,
main_thread_imports OK, no_models_config_io OK, code_path_audit_coverage 0
violations, exception_handling 0 violations, optional_in_3_files 0 violations)
- 6/6 test files pass: test_code_path_audit, test_code_path_audit_integration,
test_code_path_audit_phase78, test_code_path_audit_phase89,
test_code_path_audit_ssdl_behavioral, test_metadata_nil_sentinel
- src/ line count: 29997 lines (down from 32621 = -2624 lines)
- scripts/code_path_audit/ line count: 2620 lines
3 Result helper methods (_deserialize_active_track_result, _serialize_tool_calls_result, _parse_token_history_first_ts_result) were nested inside cb_load_prior_log as inner defs. The inner 'return' at the except block (line 2370) made the rest of the function body (lines 2377-2392) unreachable past the nested defs' scope.
User fix: moved the 3 helpers to class level so they're reachable from other class methods (_refresh_from_project, _load_beads, etc.). Kept _resolve_log_ref and _read_ref_file_result as nested defs inside cb_load_prior_log because they're only used there.
File: -69 lines (the 60-line def cb_load_prior_log block from its original position), +64 lines (the 3 helpers + cb_load_prior_log re-added in the correct order).
Verified: ast.parse OK; from src import app_controller OK; AppController.cb_load_prior_log is reachable.
TIER-3 READ AGENTS.md + conductor/code_styleguides/error_handling.md + src/provider_state.py + src/ai_client.py:2148-2220 before provider-state-rlock-fix.
Tier 2's 25a22057 commit re-bound the 14 module globals in src/ai_client.py as
aliases to provider_state.get_history(...) instances. The ProviderHistory dunder
methods (__bool__, __len__, __iter__, __getitem__) all use \with self.lock:\.
The dunders are non-reentrant: \ hreading.Lock\ blocks if the lock is already
held. The call site in src/ai_client.py:2210-2217 acquires the lock via
\with _deepseek_history_lock:\ (alias to ProviderHistory.lock), then calls
_rerepair_deepseek_history(_deepseek_history) which does \history[-1]\
(acquires the lock again -> DEADLOCK). This caused
tests/test_deepseek_provider.py::test_deepseek_completion_logic to hang
with a 30s timeout.
Fix: change \ hreading.Lock\ to \ hreading.RLock\ in ProviderHistory.
The dunders can now be safely called while the lock is already held.
Also removed:
- Duplicate @dataclass decorator on ProviderHistory (line 25-26)
- Duplicate _PROVIDER_HISTORIES dict declaration (lines 64-71 and 74-81)
Acceptance: test_deepseek_provider (7/7) + test_provider_state + test_ai_client_result + test_ai_client_tool_loop all pass.
OpenAI ChatMessage content can be either a string (simple text) or a list
of content parts (multimodal: text + image_url, etc.). Updated the type
annotation to match the actual API. No behavioral change; this is a
type-hint-only widening so callers can pass multimodal content via
ChatMessage instead of dicts.
Required by tests/test_openai_compatible.py::test_vision_multimodal_message
which was passing raw dicts to OpenAICompatibleRequest (wrong - the field
is typed list[ChatMessage]). With this widening, that test can now use
ChatMessage(role='user', content=[...multimodal parts]) without losing
type fidelity.
12 tests fail with:
TypeError: NormalizedResponse.__init__() got an unexpected keyword argument 'usage_input_tokens'
The @dataclass(frozen=True) auto-generated __init__ requires `usage: UsageStats`,
but 12 tests + 1 production site (src/ai_client.py:908) call it with the OLD
flat-kwarg API (usage_input_tokens=..., usage_output_tokens=..., etc.).
Change @dataclass(frozen=True) -> @dataclass(frozen=True, init=False) and add
a custom __init__ that accepts BOTH signatures:
- New: usage: UsageStats (used by current production code)
- Legacy: usage_input_tokens, usage_output_tokens, usage_cache_read_tokens,
usage_cache_creation_tokens (used by tests + 1 ai_client site)
If usage is None and any legacy flat kwarg is non-None, build a UsageStats
from the legacy kwargs. Otherwise use the provided usage. All field
assignments use object.__setattr__ because frozen=True locks __setattr__.
Verification:
- Legacy kwargs work: NormalizedResponse(text="hi", tool_calls=(), usage_input_tokens=10, usage_output_tokens=5, raw_response=None) sets usage.input_tokens=10
- New kwargs work: NormalizedResponse(text="hi", tool_calls=(), usage=UsageStats(1, 2)) sets usage directly
- 12 affected tests now pass (was 12 failed, 3 passed; now 15 passed)
compute_result_coverage() was implemented during the 14-phase plan but is
never called: synthesize_aggregate_profile() (now at ~line 1075) inlines
its own ResultCoverage construction via the actual AST analysis at
~line 1135-1145. The function has a latent bug at line 754 (was):
result_producers = total_producers
which hardcodes result_producers to 100% of total_producers regardless of
input — making the function return meaningless numbers.
Tests deleted in lockstep:
- tests/test_code_path_audit_phase78.py: test_compute_result_coverage_no_producers
- tests/test_code_path_audit_phase78.py: test_compute_result_coverage_full
The 'compute_result_coverage' import was also removed from the test file's
import block.
Verification:
- grep -c 'compute_result_coverage' src/code_path_audit.py = 0
- grep -c 'compute_result_coverage' tests/ = 0
- 125 of 125 remaining tests pass (was 127; -2 tests deleted)
The v2 postfix DSL parser (DSL_WORD_ARITY_V2, _atom, to_dsl_v2, parse_dsl_v2)
was implemented during the 14-phase DSL plan but never reached production:
run_audit() (line ~1217 after this change) only writes .md files (AUDIT_REPORT.md
plus per-aggregate markdowns via to_markdown/to_tree), never .dsl files. The DSL
parser carried latent arity bugs (DSL_WORD_ARITY_V2 declared 5 for 'result-coverage'
but writer emits 4; 4 for 'type-alias-coverage' but writer emits 3) which would
have caused silent parse failures.
Also removed the now-unused 'import re' statement (was only used by parse_dsl_v2).
The 'from datetime import date as date_mod' is retained (still used at line ~1259,
1275, 1291 in the markdown renderer).
Tests deleted in lockstep:
- tests/test_code_path_audit_phase78.py: test_dsl_word_arity_v2_14_new_words
- tests/test_code_path_audit_phase89.py: test_to_dsl_v2_includes_aggregate_kind_section,
test_parse_dsl_v2_round_trip_aggregate_kind, test_parse_dsl_v2_malformed
Verification:
- grep -c 'to_dsl_v2|parse_dsl_v2|DSL_WORD_ARITY_V2' src/code_path_audit.py = 0
- 127 of 127 remaining tests pass (was 131; -4 tests deleted)
The import statement appeared twice in quick succession (lines 655 and 658).
Both were identical and contributed nothing. Removed one. No functional change.
Verification:
- grep -c '^import json' src/code_path_audit.py = 1
- uv run python -c 'from src import code_path_audit' returns OK
- 124 tests in tests/test_code_path_audit*.py pass
Resolves audit_weak_types.py --strict regression (117 vs baseline 112 -> 104).
The regression was in src/openai_schemas.py (10 sites) and src/mcp_tool_specs.py
(4 sites), both files added after the 2026-06-21 baseline. JsonValue is the
canonical JSON-serializable data TypeAlias from src/type_aliases.py:22 and is a
structural superset of dict[str, Any], so consumers expecting the legacy shape
are unaffected. All 30 existing tests in tests/test_openai_schemas.py and
tests/test_mcp_tool_specs.py continue to pass.
Spec WHERE for t1.1 referenced code_path_audit*.py files but those modules
report 0 weak type findings per the audit (they use dict[str, int],
dict[str, dict], etc., not dict[str, Any]); see plan.md investigation note.
MVP pipeline simplification:
- render_rollups() now produces ONLY summary.md + AUDIT_REPORT.md
- run_audit() now produces only per-aggregate .md (no .dsl/.tree)
- New src/code_path_audit_gen.py generates the single coherent report
Stale artifacts moved to _stale/ subdirectory (preserved for history):
- 13 per-aggregate .dsl files (redundant with .md)
- 13 per-aggregate .tree files (redundant with .md)
- 9 old top-level rollups (cross_audit_summary, decomposition_matrix,
candidates, field_usage, call_graph, hot_paths, dead_fields,
ssdl_analysis, organization_deductions - all superseded by sections
inlined in AUDIT_REPORT.md)
- _stale/README.md explains what happened
Meta-audit updated to check .md files (14 required H2 sections per
aggregate) instead of .dsl files. 0 violations on 10 real profiles.
Tests: 131 passing. New MVP report: 5000+ lines.
Three real bugs fixed:
1. FunctionRef always used line=0. Now passes node.lineno from AST.
2. P3_pass results were discarded with bare pass. Now stored in
ProducerConsumerGraph.field_accesses.
3. Field-access detector only saw entry['key']; missed entry.get('key')
which is the dominant pattern in this codebase. Now handles both.
Plus _extract_type_name() helper handles Optional[T], dict[str, T],
list[T], Result[T], Union[T, ...], and T | None (PEP 604) so P1/P2
catch more annotation patterns.
Real numbers (Metadata aggregate):
- producers: 77 -> 117
- consumers: 35 -> 66
- field-access sites: 130 -> 173
- line numbers: all real (line 1281, 1746, etc.)
AUDIT_REPORT.md grew 2009 -> 3140 lines with real evidence.
Total audit output: 5176 lines / 50 files (was 2415 / 49).
All 131 tests still passing.
The previous code did Path(src_dir) / function_ref.file, which
double-prefixed (e.g. src/src/project_manager.py) and silently
returned empty. Fixed: if function_ref.file exists as
CWD-relative, use it directly. Only join if it doesn't exist.
Now 130 real field accesses detected across 35 Metadata consumers
in the 2026-06-22 audit output (was 0 before).
The aggregate_findings function now does 3-tier mapping:
1. Function lookup (find_enclosing_function) -> exact match
2. File-level fallback: if the finding's file has any
producer/consumer of the aggregate, bucket it there
3. Unbucketed (the file has no aggregate refs)
Handles both 'file' and 'filename' keys (v1 audit scripts use
'filename'; spec fixtures use 'file'). Path normalization
for Windows paths.
Generated the 6 real audit_inputs from scripts/audit_*.py
against real src/. The Metadata aggregate now shows:
- 1 unique weak_types finding (1 site, from ai_client.py:159)
- 1 unique exception_handling finding (76 sites from PARAM_OPTIONAL)
mcp_client.py shows 0 because no Metadata producer/consumer
exists in the PCG for mcp_client (P1/P2 only detect typed
parameter signatures, not internal field access). The next
gap is expanding P3 to capture internal field use.
Loops over audit_weak_types + audit_exception_handling from
the 6 audit_inputs, calls aggregate_cross_audit_findings per
audit, sums the buckets per profile.
Cross-audit aggregation is per-aggregate-flat (all findings go
into 1 bucket per audit). The 3-tier finding-to-aggregate
mapping (find_enclosing_function + type registry + file
heuristic) is the next gap - requires per-finding site
classification.
fqname, file, line, role. Used in ProducerConsumerGraph edges
and per-aggregate producer/consumer lists. Per error_handling.md
Pattern 1 (immutability for cross-thread safety).
2 unit tests passing.
Phase 0 of any_type_componentization_20260621. Extends src/type_aliases.py
with two recursive-friendly TypeAliases for JSON wire format (used by
Phase 5 api_hooks WebSocketMessage):
- JsonPrimitive: str | int | float | bool | None
- JsonValue: JsonPrimitive | list['JsonValue'] | dict[str, 'JsonValue']
The forward-ref 'JsonValue' strings work because from __future__ import
annotations is at the top of the module (PEP 563 + PEP 613 TypeAlias).
Tests added (4 new, 14 total):
- test_json_primitive_alias_resolves_to_union: hints exposes JsonPrimitive
- test_json_value_alias_resolves_to_recursive_union: hints exposes JsonValue
- test_json_value_accepts_primitive_dict: dict[str, JsonValue] runtime use
- test_json_value_accepts_nested_structures: nested dict+list round-trip
Verification:
uv run pytest tests/test_type_aliases.py --timeout=30
14 passed in 2.97s