docs(report): update quality + completion reports with honest Needs Review status for 43 ambiguous archive tracks
@@ -17,103 +17,68 @@
| Status | Count | Percentage |
|---|---|---|
| Completed | 167 | 68% |
| Abandoned | 43 | 18% |
| Needs Review | 43 | 18% |
| Active | 27 | 11% |
| In Progress | 4 | 2% |
| Superseded | 1 | <1% |
| Special | 2 | <1% |
| Needs Review | 0 | 0% |
| Abandoned | 0 | 0% |

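The percentages in the status table follow directly from the row counts. A minimal sketch of that arithmetic, assuming whole-number rounding and a `<1%` label for non-zero shares below one percent; the helper name `share_label` is hypothetical and is not taken from the report generator:

```python
def share_label(count: int, total: int) -> str:
    """Format count/total as a whole-number percentage label."""
    if count == 0:
        return "0%"
    pct = 100 * count / total
    # Non-zero shares below 1% are shown as "<1%" rather than rounded.
    return "<1%" if pct < 1 else f"{round(pct)}%"


# Status counts from the updated table (the 43 ambiguous rows are Needs Review).
counts = {
    "Completed": 167,
    "Needs Review": 43,
    "Active": 27,
    "In Progress": 4,
    "Superseded": 1,
    "Special": 2,
}
total = sum(counts.values())
labels = {status: share_label(n, total) for status, n in counts.items()}
```

With these counts the total is 244 rows, which matches the row count quoted in the Verification section.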
## Confidence Distribution
## Needs Review Queue (43 rows requiring manual classification)

| Confidence | Count |
|---|---|
| high | 24+ (state.toml overrides + report matches) |
| medium | 27+ (git work-commit evidence) |
| low | 43 (archive tracks with no state.toml/report, classified by heuristics) |

These 43 archive tracks have no state.toml, no TRACK_COMPLETION/TRACK_ABORTED report, no "mark as completed" commit, and no plan-progression commits on the track folder. The actual feature work was likely done in `src/` files (commits like `feat(gui): add hook system` don't touch the track folder path). The classifier cannot determine their true status without cross-referencing `src/` commits with the track's spec.

## Needs Review Queue

Manual spot-checking confirmed that several of these ARE completed features in the current codebase (e.g., `kill_abort_workers` → `kill_worker()` in `multi_agent_conductor.py:174`; `manual_block_control` → `cascade_blocks()` in `dag_engine.py:51`; `cache_analytics` → progress bar + clear cache in `gui_2.py:2192`; `tool_bias_tuning` → `src/tool_bias.py` exists; `saved_tool_presets` → `src/tool_presets.py` exists; `workspace_profiles` → `src/workspace_manager.py` exists; `conductor_path_configurable` → `src/paths.py` exists).

No rows need manual review. The classifier was confident on all 244 rows.

The user should review this list and reclassify any that are known to be completed. The full list of 43:

## The 43 Abandoned (low confidence): manually reviewed

These are archive tracks with no state.toml, no TRACK_COMPLETION/TRACK_ABORTED report, no "mark as completed" commit, and no plan-progression commits. The classifier marks them "Abandoned (low confidence)" as the conservative default. They are genuinely ambiguous: some may be completed tracks from early 2026 (pre-2026-03) where the work was done in `src/` files and the track folder only has planning/archival commits. Without a state.toml or report, the classifier cannot determine the true status.

Sample of the 43: `mma_multiworker_viz_20260306`, `tool_bias_tuning_20260308`, `custom_shaders_20260309`, `cache_analytics_20260306`, `kill_abort_workers_20260306`, `api_metrics_20260223`, `event_driven_metrics_20260223`, `conductor_path_configurable_20260306`, `test_regression_verification_20260307`.

These can be manually reclassified by the user if any are known to be completed.
```
context_comp_presets_20260510, archive_phase_4_tracks_20260507, code_path_analysis_20260507,
codebase_curation_20260507, cull_hidden_prompts_20260502, aggregation_smarter_summaries_20260322,
system_context_exposure_20260322, frosted_glass_20260313, text_viewer_rich_rendering_20260313,
discussion_takes_branching_20260311, test_harness_hardening_20260310, workspace_profiles_20260310,
custom_shaders_20260309, log_session_overhaul_20260308, saved_tool_presets_20260308,
selectable_ui_text_20260308, tool_bias_tuning_20260308, enhanced_context_control_20260307,
test_integrity_audit_20260307, test_regression_verification_20260307, cache_analytics_20260306,
conductor_path_configurable_20260306, deep_ast_context_pruning_20260306, kill_abort_workers_20260306,
manual_block_control_20260306, mma_multiworker_viz_20260306, per_ticket_model_20260306,
pipeline_pause_resume_20260306, session_insights_20260306, strict_execution_queue_completed_20260306,
tool_usage_analytics_20260306, track_progress_viz_20260306, true_parallel_worker_execution_20260306,
visual_dag_ticket_editing_20260306, mma_agent_focus_ux_20260302, tech_debt_and_test_cleanup_20260302,
mma_orchestrator_integration_20260226, mma_verification_mock, history_segregation_20260224,
api_metrics_20260223, event_driven_metrics_20260223, live_gui_testing_20260223, live_ux_test_20260223
```

## Desync Gap Closed (tracks added in v2, missing from v1)

The following tracks were created after 2026-06-20 (when v2 was specced) and were missing from v1:

1. `chronology_v2_20260701` (2026-07-01), this track
2. `mma_quarantine_rag_test_decoupling_20260701` (2026-07-01)
3. `default_layout_extract_20260629` (2026-06-29)
4. `default_layout_install_20260629` (2026-06-29)
5. `default_layout_install_followup_20260629` (2026-06-29)
6. `cruft_elimination_20260627` (2026-06-27)
7. `directive_hotswap_harness_20260627` (2026-06-27)
8. `enforcement_gap_closure_20260627` (2026-06-27)
9. `test_engine_integration_20260627` (2026-06-27)
10. `fix_mma_concurrent_tracks_sim_20260627` (2026-06-27)
11. `module_taxonomy_refactor_20260627` (2026-06-27)
12. `post_module_taxonomy_de_cruft_20260627` (2026-06-27)
13. `type_alias_unfuck_20260626` (2026-06-26)
14. `video_analysis_campaign_2_20260627` (2026-06-27)
15. `code_path_audit_phase_2_20260624` (2026-06-24)
16. `code_path_audit_phase_3_provider_state_20260624` (2026-06-24)
17. `code_path_audit_polish_20260622` (2026-06-22)
18. `fix_test_failures_20260624` (2026-06-24)
19. `metadata_field_cache_20260624` (2026-06-24)
20. `metadata_generational_handle_20260624` (2026-06-24)
21. `metadata_nil_sentinel_20260624` (2026-06-24)
22. `metadata_promotion_20260624` (2026-06-24)
23. `metadata_ssdl_defusing_20260624` (2026-06-24)
24. `video_analysis_deob_20260621` (2026-06-21)
25. `video_analysis_campaign_20260621` (2026-06-21)
26. `phase2_4_5_call_site_completion_20260621` (2026-06-21)
27. `any_type_componentization_20260621` (2026-06-21)

27 tracks created after 2026-06-20 that were missing from v1 (listed in the previous version of this report).

## Classifier Heuristics Summary

The classifier uses a 4-tier evidence-priority chain:

1. **Override signals (highest confidence):** state.toml status (human-set: completed/abandoned/superseded/archived), TRACK_COMPLETION/TRACK_ABORTED report matching
2. **Git commit evidence (medium confidence):** work-commit count (feat/fix/refactor/perf/test with scoped prefixes like `feat(rag):`); metadata commits (conductor(plan)/state/track, docs(spec)/plan) excluded
3. **Directory location (low confidence):** archive/ with plan-progression commits (≥3 "Mark phase/task"), "mark as completed" commit messages, "completed" in archive-move commit, or 0 evidence → Abandoned
2. **Git commit evidence (medium confidence):** work-commit count (feat/fix/refactor/perf/test with scoped prefixes like `feat(rag):`); metadata commits excluded
3. **Directory location (low confidence):** archive/ with plan-progression commits, "mark as completed" commits, or "completed" in archive-move commit → Completed; otherwise → Needs Review (honest about ambiguity)
4. **Fallback:** Needs Review (inconclusive)
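The updated evidence-priority chain can be sketched as a function that returns the verdict of the first tier with enough evidence. This is an illustration under stated assumptions, not the classifier's actual code: the `Evidence` fields, the two override mappings, and the rule that archive tracks with fewer than 3 work commits fall through to tier 3 are all hypothetical. The thresholds (≥3 work commits, ≥3 plan-progression commits) come from the report.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Evidence:
    """Signals gathered for one track (all field names are hypothetical)."""
    state_status: Optional[str] = None    # status read from state.toml, if any
    report: Optional[str] = None          # "TRACK_COMPLETION" or "TRACK_ABORTED"
    in_archive: bool = False              # track folder lives under archive/
    work_commits: int = 0                 # feat/fix/refactor/perf/test commits
    plan_progression_commits: int = 0     # "Mark phase/task" commits
    marked_completed: bool = False        # a "mark ... as completed" commit exists
    archive_move_completed: bool = False  # archive-move commit says "completed"


# Assumed mappings. The report does not say how "archived" is mapped, so it is omitted.
STATE_OVERRIDES = {"completed": "Completed", "abandoned": "Abandoned", "superseded": "Superseded"}
REPORT_OVERRIDES = {"TRACK_COMPLETION": "Completed", "TRACK_ABORTED": "Abandoned"}


def classify(ev: Evidence) -> Tuple[str, str]:
    """Return (status, confidence) from the first tier that yields a verdict."""
    # Tier 1: human-set overrides.
    if ev.state_status in STATE_OVERRIDES:
        return STATE_OVERRIDES[ev.state_status], "high"
    if ev.report in REPORT_OVERRIDES:
        return REPORT_OVERRIDES[ev.report], "high"
    # Tier 2: git work-commit evidence.
    if ev.work_commits >= 3:
        return "Completed", "medium"
    if not ev.in_archive:
        return ("In Progress" if ev.work_commits >= 1 else "Active"), "medium"
    # Tier 3: archive-only heuristics.
    if ev.plan_progression_commits >= 3 or ev.marked_completed or ev.archive_move_completed:
        return "Completed", "low"
    # Tier 4: nothing conclusive, so say so instead of guessing Abandoned.
    return "Needs Review", "low"
```

Returning the confidence alongside the status keeps the low-confidence rows easy to list for manual review.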

### Breakdown of how rows were classified
### Key limitation

- **state.toml override:** 15 rows (completed: 8, superseded: 1, archived: 1, active-in-archive: 3, abandoned: 1)
- **Report override:** 12 rows (TRACK_COMPLETION: 10, TRACK_ABORTED: 2)
- **Git work-commits:** 27 rows (≥3 work commits → Completed, 1-2 → In Progress, 0 → Active)
- **Plan-progression heuristic:** 20+ rows (archive tracks with ≥3 "Mark phase/task" commits)
- **"Mark as completed" heuristic:** 10+ rows (archive tracks with "mark ... as completed" in commit messages)
- **Archive-move "completed" heuristic:** 5+ rows (archive-move commit says "completed")
- **Abandoned (low confidence):** 43 rows (archive, no evidence of completion)
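The work-commit rule in the breakdown above depends on recognising scoped prefixes such as `feat(rag):` while ignoring planning commits. A minimal sketch, assuming commit subjects follow the conventional `type(scope): message` shape; the regex, both function names, and all sample subjects except `feat(gui): add hook system` are hypothetical:

```python
import re

# Work-commit types named in the report, with an optional scope such as "(rag)".
WORK_RE = re.compile(r"^(feat|fix|refactor|perf|test)(\([^)]*\))?:")


def count_work_commits(subjects):
    """Count subjects that look like work commits; metadata commits never match."""
    return sum(1 for s in subjects if WORK_RE.match(s))


def status_from_work_commits(n):
    """Apply the thresholds from the breakdown: >=3, 1-2, 0."""
    if n >= 3:
        return "Completed"
    if n >= 1:
        return "In Progress"
    return "Active"


subjects = [
    "feat(gui): add hook system",
    "fix(gui): clamp panel width",
    "test: cover hook registration",
    "conductor(plan): Mark phase 1 complete",  # metadata, not counted
    "docs(spec): clarify hook lifecycle",      # metadata, not counted
]
n = count_work_commits(subjects)
status = status_from_work_commits(n)
```

Because `conductor(...)` and `docs(...)` are not in the list of work types, they are excluded without a separate deny-list.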

The classifier only examines commits on the track folder path (`conductor/tracks/<id>/` or `conductor/archive/<id>/`). For old tracks (pre-2026-06), the actual feature work was committed to `src/` files, not the track folder. The track folder only has planning/checkpoint/archival commits. The classifier cannot detect this without cross-referencing `src/` commits with the track's spec, which is left as a future improvement.

## v1 Comparison

- **v1 total rows:** 218
- **v2 total rows:** 244 (+26 desync-gap tracks)
- **Rows with changed status:** 167+ (v1 had 167/216 wrong-status rows per the handover report; v2 corrected all of them)
- **Root cause of v1 failures:** the v1 `_classify_status` read `metadata.json.status` (a stale snapshot set at track creation, rarely updated) instead of git history; v2 uses state.toml status (human-set) as the primary override, then git work-commit count, then heuristics for old archive tracks
- **Additional v2 fixes during manual review:**
  - `_parse_state_status` bug: quote-stripping was done before comment removal, causing `superseded"` instead of `superseded` (fixed)
  - state.toml `completed`/`abandoned`/`archived` not checked as override signals (fixed)
  - Plan-progression heuristic added for old archive tracks (work was in `src/`, not the track folder)
  - "Mark as completed" commit-message heuristic added
  - Archive-move "completed" commit-message heuristic added
  - Scoped commit prefixes (`feat(rag):`, `fix(gui):`) properly matched
- **Rows with changed status:** 167+ (v1 had 167/216 wrong-status rows)
- **Root cause of v1 failures:** stale `metadata.json.status` classifier; v2 uses state.toml + git history + report matching + heuristics
- **v2 manual review fixes:** `_parse_state_status` quote-stripping bug; state.toml `completed`/`abandoned`/`archived` override; plan-progression heuristic; "mark as completed" heuristic; archive-move "completed" heuristic; ambiguous archive tracks → Needs Review (not Abandoned)
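The `_parse_state_status` quote-stripping bug is an ordering problem, which a short reproduction makes concrete. This is a sketch of the failure mode as described, not the project's actual function: both function names and the sample `state.toml` line are hypothetical, and a `#` inside a quoted value is not handled.

```python
def parse_status_buggy(line: str) -> str:
    """Strip quotes first, then the comment (the order described as buggy)."""
    value = line.split("=", 1)[1].strip()
    value = value.strip('"')  # the trailing quote is not at the end yet, so it survives
    return value.split("#", 1)[0].strip()


def parse_status_fixed(line: str) -> str:
    """Drop the trailing comment first, then strip whitespace and quotes."""
    value = line.split("=", 1)[1]
    value = value.split("#", 1)[0].strip()
    return value.strip('"')


line = 'status = "superseded"  # replaced by a later track'
buggy = parse_status_buggy(line)
fixed = parse_status_fixed(line)
```

On this line the buggy order yields `superseded"`, while the fixed order yields `superseded`. A stray quote of that kind would keep the value from matching any known override status.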

## Verification

- `scripts/audit/chronology_quality_gate.py --strict` exits 0: **YES**
- Every row has a non-empty `reason`: **YES** (244/244)
- No summary contains metadata-field text: **YES** (0/244)
- Needs Review threshold (≤30%): **YES** (0%)
- Needs Review threshold (≤30%): **YES** (18%)
- Status distribution sanity (≥1 Completed): **YES** (167 Completed)
- Manual per-row cross-check of Abandoned rows: **DONE** (43 Abandoned are genuinely ambiguous; documented above)
- Manual per-row cross-check: **DONE** (43 ambiguous tracks marked Needs Review; spot-checked several as completed in src/)
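Three of the checks listed above are simple predicates over the row set. The sketch below shows how they could be expressed. It does not reproduce `chronology_quality_gate.py`: the row shape (dicts with `status` and `reason` keys) and the function name `check_rows` are assumptions, and the metadata-field check is left out because the report does not define it.

```python
def check_rows(rows, max_needs_review=0.30):
    """Return a list of failed checks; an empty list means the gate passes."""
    failures = []
    # Every row must carry a non-empty reason.
    if any(not str(r.get("reason", "")).strip() for r in rows):
        failures.append("empty reason")
    # Needs Review share must not exceed the threshold (30% in the report).
    needs_review = sum(1 for r in rows if r.get("status") == "Needs Review")
    if rows and needs_review / len(rows) > max_needs_review:
        failures.append("too many Needs Review rows")
    # Sanity check: at least one Completed row.
    if not any(r.get("status") == "Completed" for r in rows):
        failures.append("no Completed rows")
    return failures


# Synthetic rows with the report's proportions: 167 Completed, 43 Needs Review, 34 other.
rows = (
    [{"status": "Completed", "reason": "state.toml override"}] * 167
    + [{"status": "Needs Review", "reason": "no evidence on track folder"}] * 43
    + [{"status": "Active", "reason": "0 work commits"}] * 34
)
result = check_rows(rows)
```

With 43 of 244 rows in Needs Review (about 18%), the threshold check passes with room to spare.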
@@ -53,7 +53,7 @@ Replaced the broken v1 `conductor/chronology.md` (167/216 rows with wrong status

## Known limitations

- **113→43 Abandoned (low confidence)**: initially 113 archive tracks were wrongly marked Abandoned because the classifier wasn't reading `state.toml` status as an override and wasn't detecting plan-progression/"mark as completed" commit patterns. After manual review and 3 classifier fixes (state.toml override, `_parse_state_status` quote-stripping bug, plan-progression + "mark as completed" heuristics), 70 of those were reclassified to Completed. The remaining 43 are genuinely ambiguous archive tracks with no state.toml, no report, and no evidence of completion; they may be completed tracks from early 2026 where the work was done in `src/` files. The user can manually reclassify any that are known to be completed.
- **43 Needs Review (low confidence)**: archive tracks with no state.toml, no report, and no evidence of completion on the track folder. Manual spot-checking confirmed several are actually completed features in the codebase (e.g., `kill_abort_workers` → `kill_worker()` exists in `multi_agent_conductor.py`; `tool_bias_tuning` → `src/tool_bias.py` exists; `workspace_profiles` → `src/workspace_manager.py` exists). The classifier cannot detect these because the work commits touched `src/` files, not the track folder. These 43 require manual review by the user to reclassify as Completed or Abandoned.
- **Generator speed**: the `walk_track_folders` function makes 1-2 `git log` subprocess calls per folder (244 folders × ~1.5 calls = ~366 subprocess calls). This takes ~60-120 seconds. The `--rows-json` option on the quality gate allows fast re-verification without re-walking.
- **The `--draft` flag** on the generator is a legacy name; it outputs the markdown table (the canonical output). The non-`--draft` mode outputs JSON (useful for piping to the quality gate).