# Track: License & CVE Audit (Dependency Compliance)

**Status:** Spec approved 2026-06-07
**Initialized:** 2026-06-07
**Owner:** Tier 2 Tech Lead
**Priority:** High (compliance + security; CI gate)

---

## Overview

Build `scripts/audit_license_cve.py`: a single audit script that checks third-party dependencies (in `pyproject.toml` + the `uv.lock` transitive tree) for (1) license compliance against the project's policy, (2) known CVEs (via a `pip-audit` subprocess), and (3) version pinning (every direct dep must have a `~X.Y.Z` bound). The script also scans source-file license headers (`SPDX-License-Identifier`) in `src/**/*.py` and `scripts/**/*.py`.

Then apply the fixes: tilde-pin all direct deps, delete `requirements.txt` (redundant with `uv.lock`), regenerate `uv.lock`, and add `--strict` mode + a baseline file (CI gate). One script, one CI gate, one report.

The track is **scope-limited to third-party dependencies**. The project's own LICENSE file and SPDX/Copyright headers are explicitly OUT OF SCOPE: the user reserves all rights to the repo and has not picked a project license yet. The audit reports third-party state only; it does not assert or imply a project license, and it does not create a `LICENSE` file.
## Current State Audit (as of `9796fe27`)

- `pyproject.toml` has 14 direct deps with **mixed pinning**:
  - 7 unconstrained: `"imgui-bundle"`, `"anthropic"`, `"google-genai"`, `"openai"`, `"fastapi"`, `"mcp"`, `"uvicorn"`
  - 7 with `>=X.Y.Z`: `"pyopengl>=3.1.10"`, `"tree-sitter>=0.25.2"`, `"tree-sitter-python>=0.25.0"`, `"tree-sitter-c>=0.23.2"`, `"tree-sitter-cpp>=0.23.2"`, `"psutil>=7.2.2"`, `"chromadb>=1.5.8"`
  - `"tomli-w"`, `"pytest-timeout>=2.4.0"`
- `uv.lock` exists; `requirements.txt` exists (duplicates the lock; will be removed)
- No `LICENSE` file in repo root (user's chosen posture: all rights reserved; the audit reports this as informational, not a violation)
- No source-file `SPDX-License-Identifier` headers in `src/**/*.py` or `scripts/**/*.py` (informational note, not a violation: the user hasn't picked a project license yet)
- No `vendor/`, `third_party/`, or vendored C/C++ in the repo tree (the scan is defensive for the future)
- 0 existing license/CVE audit tools in `scripts/`
- The 3 existing audit scripts (`audit_main_thread_imports.py`, `audit_weak_types.py`, `check_test_toml_paths.py`) follow the project pattern of `scripts/audit_.py` + `scripts/audit_.baseline.json` + `--strict` mode for CI gates (per `conductor/workflow.md` "Audit Script Policy"). The new track follows the same pattern.

### Already Implemented (DO NOT re-implement; KEEP / build on)

1. **The 3 existing audit scripts** in `scripts/`. They define the project pattern for audit + CI gate. The new `scripts/audit_license_cve.py` follows the same shape.
2. **`uv.lock`**: the canonical lock file for the project. The audit reads it for transitive resolution.
3. **`importlib.metadata`** (Python 3.11+ stdlib): gives `License` and `License-Expression` per installed distribution. No new pip dep needed for the license check.
4. **`tomllib`** (Python 3.11+ stdlib): parses `pyproject.toml`. No new pip dep needed for the pin check.
5. **`pip-audit`** (PyPA tool): invoked as a subprocess for the CVE check. `pip-audit` itself is NOT a project dep; it's installed via `uv tool install pip-audit` or run via `uvx pip-audit` if the user wants the CVE check. The script detects a missing `pip-audit` and logs a warning; license + pin checks still run.

### Gaps to Fill (this track's scope)

- `scripts/audit_license_cve.py` (~300 lines; 3 internal checks + 1 source-header scan, plus `--strict` and `--dump-baseline`)
- `scripts/audit_license_cve.baseline.json` (zero-violation post-cleanup state for `--strict` mode)
- `docs/reports/license_cve_audit/2026-06-07/initial.md` and `final.md` (the human-readable reports)
- Updates to `pyproject.toml` (tilde-pin every direct dep)
- Updated `uv.lock` (regenerated)
- Deletion of `requirements.txt`
- `tests/test_audit_license_cve.py` (TDD unit tests)

## Goals

1. **Single audit script** that runs all four checks (license + CVE + pin + source-header) and emits a unified report.
2. **CI gate** via `--strict` mode + baseline file. Mirrors the 3 existing audit scripts. Fails on any new violation OR any new CVE.
3. **Tilde-pin every direct dep** in `pyproject.toml` (`~X.Y.Z` = `>=X.Y.Z,//initial.md` or `final.md`.

- **`--strict` mode:** exits non-zero if violations > baseline. For CI.
- **`--dump-baseline`:** writes the current violation set as the new baseline. For intentional changes (e.g., a new dep is added; the user accepts its license).

### Internal structure (3 checks + 1 scan)

```python
import sys

def check_licenses() -> list[Violation]: ...        # iterates dist.metadata; classifies
def check_cves() -> list[Violation]: ...            # subprocess pip-audit; parses JSON
def check_pins() -> list[Violation]: ...            # tomllib parse; flag missing/loose pins
def check_source_headers() -> list[Violation]: ...  # pathlib rglob; SPDX regex

def main():
    args = parse_args()  # argparse wrapper exposing --strict and --dump-baseline
    violations = []
    for check in (check_licenses, check_cves, check_pins, check_source_headers):
        violations.extend(check())
    for v in violations:
        print(v.format_stdout())  # parseable line-per-violation
    write_markdown_report(violations)
    if args.strict and len(violations) > len(load_baseline()):
        sys.exit(1)
    if args.dump_baseline:
        dump_baseline(violations)
```

### Cost model (the 4 checks)

| Check | Mechanism | New deps? |
|-------|-----------|-----------|
| **License** | `importlib.metadata.distribution(name).metadata.get("License")` + `License-Expression` (Python 3.11+ stdlib). For each direct + transitive dep, classify the license string against the policy table. Unknown / unparseable / missing → violation. | None (stdlib) |
| **CVE** | Subprocess call to `pip-audit --format=json --strict` (a `uv tool install pip-audit` dev tool; the project itself doesn't depend on it). If `pip-audit` isn't installed, log a warning + skip the CVE check; license + pin still run. Air-gapped CI: CVE check returns no results (not a failure). | None in `pyproject.toml`; `pip-audit` is an optional dev tool. |
| **Version pin** | `tomllib.load(pyproject.toml)` (stdlib). For each entry in `[project].dependencies`, check the version specifier. Flags: (a) no specifier at all, (b) no lower bound. Accepts any lower bound as a soft check (the user's choice is tilde, but the script doesn't enforce tilde specifically; it enforces "has a lower bound"). | None (stdlib) |
| **Source header** | `pathlib.Path(src_dir).rglob("*.py")`, read the first 20 lines of each file, regex-look for `SPDX-License-Identifier:` (case-insensitive). If present and in the blocklist → violation. If no SPDX → no violation (informational note). | None (stdlib) |

## License Policy (encoded in the script)

### Allowlist (permissive or weak copyleft, import-safe in Python)

- **Permissive:** MIT, BSD (2-clause + 3-clause), Apache 2.0, ISC, Unlicense, Zlib, Python-2.0, 0BSD, PSF-2.0
- **Weak copyleft (import-safe in Python):** LGPL (2.1, 3.0), MPL-2.0
- **Public domain:** CC0, WTFPL

(The script's allowlist is the canonical source of truth for the per-license table; see `scripts/audit_license_cve.py` for the current list. New licenses can be added by editing that table; no spec change needed.)

### Blocklist (non-permissive / restricted-source)

The blocklist is for licenses that are **non-OSI**, that impose **restrictions beyond standard open-source terms**, or whose strong-copyleft obligations extend to downstream work (GPL, AGPL). The unifying property: the license constrains how downstream users can use or distribute the software in ways that the allowlisted licenses do not.

| License | Specific restriction |
|---------|---------------------|
| **GPL** (any version) | Strong copyleft; viral licensing; downstream users must release derivative works under GPL |
| **AGPL** (any version) | Network copyleft; downstream SaaS users must release source under AGPL |
| **SSPL** (MongoDB, 2018) | "If you offer the software as a service, you must release the entire stack under SSPL"; broad service-provider trigger |
| **BSL / BUSL** (Business Source License) | Source-available with a delayed open-source conversion; competitive-use restriction during the delay |
| **Commons Clause** | Addendum to an open-source license; adds "you may not sell the software"; targets SaaS reselling |
| **Elastic License v2** (Elastic NV, 2021) | "You may not offer the software as a managed service that competes with Elastic" |
| **Unknown / unparseable** (e.g., `UNKNOWN`, `Custom`, `see AUTHORS`) | Not classifiable; flagged for manual review; never auto-pass |
| **Missing license metadata** | Catches packaging bugs |

### Decision rule (in the script)

```
if license in BLOCKLIST:
    violation
elif license in ALLOWLIST:
    pass
else:  # unknown / unparseable / unclassified
    violation (flag for manual review; never auto-pass)
```

The two lists are explicit, not heuristic. Adding a new license to either list is a one-line code change. The script's `--help` references the policy table for transparency.

## Output Format

### Stdout (line-per-violation, parseable)

```
LICENSE_VIOLATION pkg=foo license="GPL-3.0" via=bar==2.0
CVE_FOUND pkg=baz cve_id=CVE-2024-12345 severity=high fix_versions=">=1.2.3"
PIN_MISSING pkg=qux (no version specifier in pyproject.toml)
SPDX_VIOLATION file=src/some_module.py license="GPL-3.0"
```

Each line follows a stable, parseable format; CI can grep for `VIOLATION|FOUND|MISSING` and `exit 1` on any match.

### Markdown report (in `docs/reports/license_cve_audit//`)

- `initial.md`: the discovered violations (committed in Phase 1)
- `final.md`: the post-cleanup state (committed in Phase 2, after tilde-pinning + lock regen)

Structure:

```markdown
# License & CVE Audit: 2026-06-07

## Top-level summary
- License violations: 0
- CVEs found: 0
- Pinning issues: 0
- SPDX violations in src/ or scripts/: 0

## Notes
- No `LICENSE` file in repo root: informational, not a violation. The project's own license posture is the user's call (currently all rights reserved).
- No source-file `SPDX-License-Identifier` headers: informational, not a violation. The project's own copyright headers are the user's call.
- pip-audit not installed → CVE check skipped. Install via `uv tool install pip-audit` to enable.

## Per-violation table
| Type | Package | License / CVE / Pin | Via |
|------|---------|---------------------|-----|
| ... | ... | ... | ... |
```

### Baseline file (`scripts/audit_license_cve.baseline.json`)

Internal state for `--strict` mode. JSON because it matches the existing convention (`scripts/audit_weak_types.baseline.json`). Not the user-facing report; not in the output surface.
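As a rough sketch of how this file could back the `--strict` gate, the following assumes the count-based comparison this spec describes and stores each violation as its stdout line. The function names and signatures are illustrative assumptions, not the script's actual API; the keys match the baseline file format.

```python
import json
import tempfile
from pathlib import Path


def load_baseline(path: Path) -> list[str]:
    """Return the recorded violations, or an empty list if no baseline exists yet."""
    if not path.exists():
        return []
    data = json.loads(path.read_text(encoding="utf-8"))
    return list(data.get("baseline_violations", []))


def dump_baseline(path: Path, violations: list[str], date: str) -> None:
    """Write the current violation set as the new baseline (the --dump-baseline path)."""
    payload = {
        "schema_version": 1,
        "baseline_violations": sorted(violations),
        "baseline_date": date,
        "notes": "",
    }
    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")


def strict_exit_code(current: list[str], baseline: list[str]) -> int:
    """Count-based gate: fail only when there are more violations than the baseline."""
    return 1 if len(current) > len(baseline) else 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        baseline_path = Path(tmp) / "audit_license_cve.baseline.json"
        dump_baseline(baseline_path, [], "2026-06-07")
        current = ["PIN_MISSING pkg=qux (no version specifier in pyproject.toml)"]
        print(strict_exit_code(current, load_baseline(baseline_path)))
```

Because the gate compares counts only, a run in which one violation disappears and a different one appears still exits 0; that follows from the rule as specified rather than from this sketch.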
Format:

```json
{
  "schema_version": 1,
  "baseline_violations": [],
  "baseline_date": "2026-06-07",
  "notes": "Zero-violation state after the tilde-pinning + lock regen in this track."
}
```

`--strict` mode loads this file and fails CI if `len(current_violations) > len(baseline_violations)`. The user's intentional changes (e.g., adding a new dep with an acceptable license) are recorded by re-running with `--dump-baseline`.

## Commit Structure (4 atomic commits, in order)

```
1. chore(audit): add license_cve audit script + initial report
   - scripts/audit_license_cve.py (initial version, informational mode)
   - docs/reports/license_cve_audit/2026-06-07/initial.md (the discovered violations)

2. chore(deps): tilde-pin all deps; delete requirements.txt
   - pyproject.toml (every direct dep gets ~X.Y.Z or stays as >=X.Y.Z)
   - uv.lock (regenerated)
   - requirements.txt (deleted; was redundant with lock)

3. chore(audit): add --strict mode + baseline file (CI gate)
   - scripts/audit_license_cve.py (extends with --strict + baseline diff)
   - scripts/audit_license_cve.baseline.json (zero-violation post-cleanup state)

4. conductor(tracks): mark License CVE Audit track complete
   - tracks.md update
```

Each commit gets a `git notes add -m "..."` summary per `conductor/workflow.md`.

## Verification (TDD per `conductor/workflow.md`)

Unit tests in `tests/test_audit_license_cve.py`:

- License classifier: a known fixture package list with various licenses → correct classification (blocklist + allowlist + unknown).
- Blocklist enforcement: each entry (GPL, AGPL, SSPL, BSL, BUSL, Commons Clause, Elastic v2, unknown, missing) → correctly flagged.
- Allowlist enforcement: each entry (MIT, BSD, Apache 2.0, ISC, Unlicense, Zlib, Python-2.0, LGPL, MPL-2.0, CC0, WTFPL) → correctly passes.
- Pin check: synthetic `pyproject.toml` with mixed pinning (no bound, `>=X.Y`, `~X.Y.Z`, exact) → correct flags.
- Source header check: synthetic `.py` with `SPDX-License-Identifier: GPL-3.0` → flagged; with no SPDX → no violation.
- `--strict` mode: violations > baseline → exit 1; violations == baseline → exit 0; new violation (delta > 0) → exit 1.
- `--dump-baseline`: writes a baseline file matching the current violation set.

## Risks

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Some packages' license metadata is missing or unparseable in `importlib.metadata` | High | Medium (false positives on unknown) | The policy treats `UNKNOWN` as a violation → manual review catches the right answer; the report's notes section lists the unknowns explicitly |
| `pip-audit` not installed in CI | Medium | Low (CVE check is a no-op) | Script detects missing `pip-audit` and logs a warning; license + pin checks still run |
| Air-gapped CI can't reach OSV / PyPI advisory DBs | Medium | Low (CVE check returns no results) | Document; a follow-up could add offline CVE support, not in this track |
| Pinning decisions are subjective (some deps deserve looser bounds than others) | Medium | Low (initial pass is conservative) | The pin check accepts any lower bound as a soft check; the user can loosen specific deps via the baseline file |
| The baseline file becomes a "shadow ledger" that needs maintenance when intentional changes are made | Medium | Low (intentional) | Document the update workflow in the script's `--help`; `--dump-baseline` regenerates the baseline after an intentional change |
| The project's own LICENSE absence might confuse a future contributor who doesn't know the user's posture | Low | Low | The report's notes section explicitly calls this out: "no LICENSE in repo root: informational, not a violation; project's own license is the user's call (currently all rights reserved)" |
| A dep is added with a license that doesn't match the script's allowlist/blocklist (e.g., a new "BSL 2.0" variant) | Low | Low | The script's default rule (unknown = violation) catches it; the report's notes section surfaces it for review; one-line add to the appropriate list |

## Follow-up

- `air_gapped_cve_check_20260607` (NOT in this track): add offline CVE support for air-gapped CI environments that can't reach OSV / PyPI. The CVE check would ship a snapshot of the advisory DBs (or use a local mirror).
- `cve_auto_remediation_20260607` (NOT in this track): when a CVE is found, auto-bump the dep to the fix version (within the pin range) and re-run the audit. Out of scope here; this track REPORTS, the user DECIDES.

## Coordination with Pending Tracks

This track has **no blockers** and **no conflicts** with the 5 active planned tracks. It modifies:

- `pyproject.toml` (version pins; could affect resolution for any future track that depends on something)
- `uv.lock` (regenerated; the lock file changes)
- `requirements.txt` (deleted; was redundant with lock)
- New: `scripts/audit_license_cve.py`, `scripts/audit_license_cve.baseline.json`, `tests/test_audit_license_cve.py`, `docs/reports/license_cve_audit/2026-06-07/`

It does NOT modify `src/`, existing files under `tests/` (it only adds `tests/test_audit_license_cve.py`), or any of the 5 planned tracks' files. The deleted `requirements.txt` is outside the 5 planned tracks' scope. The track can ship independently and in parallel with the 5 planned tracks. The tilde-pinning in this track is a STRENGTHENING of the dep contract, not a loosening; it doesn't break any existing test or any other track's plan.

## Out of Scope

- The project's own `LICENSE` file (user's decision; the track will not create one).
- The project's own `SPDX-License-Identifier` / `Copyright` headers in `src/` (user's decision; the track will not add or modify them).
- Any recommendation on what license the user should pick for the project.
- Patching CVEs in transitive deps (the track REPORTS; the user decides whether to wait for upstream or replace).
- Auto-bumping versions to address CVEs (manual decision; the track reports, the user acts).
- Modifying any third-party code already in the repo (none currently; the scan is defensive for the future).
- License/header updates to vendored C/C++ (none currently vendored; the scan is defensive).
- The local-rag optional dependency group (`sentence-transformers`): covered by the same audit, but pinning happens in the same `pyproject.toml` edit.

## See Also

- `conductor/workflow.md` "Audit Script Policy": the convention this track follows.
- `scripts/audit_main_thread_imports.py`, `scripts/audit_weak_types.py`, `scripts/check_test_toml_paths.py`: the 3 existing audit scripts; the new track follows the same shape.
- `scripts/audit_weak_types.baseline.json`: the baseline file pattern (the new `scripts/audit_license_cve.baseline.json` mirrors this).
- [OSI Approved Licenses](https://opensource.org/licenses/): the de facto list of "open source" licenses; the script's policy is consistent with this list (with the addition of LGPL / MPL-2.0 in transitive deps for Python import-safety).
- `pip-audit` (PyPA): the CVE-checking tool invoked as a subprocess. Optional; the script handles its absence gracefully.
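
The graceful handling of a missing `pip-audit` noted in the last item, together with turning its JSON report into `CVE_FOUND` lines, might look like the sketch below. The report shape is an assumption: a top-level `dependencies` list (or a bare list in older releases) whose entries carry `name` and `vulns`, with each vuln exposing `id` and `fix_versions`. The sketch omits `severity` because it does not assume the report carries one, and the function names are illustrative.

```python
import json
import shutil
import subprocess


def parse_pip_audit_json(raw: str) -> list[str]:
    """Turn a pip-audit JSON report into CVE_FOUND lines (assumed report shape)."""
    report = json.loads(raw)
    # Assumption: newer releases wrap results in {"dependencies": [...]};
    # older ones emit a bare list of dependency entries.
    deps = report.get("dependencies", []) if isinstance(report, dict) else report
    lines = []
    for dep in deps:
        for vuln in dep.get("vulns", []):
            fixes = ",".join(vuln.get("fix_versions", []))
            lines.append(
                f'CVE_FOUND pkg={dep["name"]} cve_id={vuln["id"]} fix_versions="{fixes}"'
            )
    return lines


def check_cves() -> list[str]:
    """Run pip-audit if it is on PATH; otherwise warn and skip the CVE check."""
    if shutil.which("pip-audit") is None:
        print("WARNING: pip-audit not installed; CVE check skipped")
        return []
    proc = subprocess.run(
        ["pip-audit", "--format=json"],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit can simply mean vulnerabilities were found
    )
    try:
        return parse_pip_audit_json(proc.stdout)
    except json.JSONDecodeError:
        print("WARNING: could not parse pip-audit output; CVE check skipped")
        return []


if __name__ == "__main__":
    for line in check_cves():
        print(line)
```

Keeping the parser separate from the subprocess call lets the unit tests feed it fixture JSON without needing `pip-audit` or network access.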