manual_slop/conductor/tracks/license_cve_audit_20260607/spec.md
ed 61b5572e2b chore(audit): spec license_cve_audit track (compliance + CVE + pinning)
Builds scripts/audit_license_cve.py: single audit script that
checks third-party deps (pyproject.toml + uv.lock transitive
tree) for: (1) license compliance against the project's policy,
(2) known CVEs (via pip-audit subprocess), (3) version-pinning,
and (4) source-file SPDX license headers in src/ and scripts/.

LICENSE POLICY (encoded in the script)
Allowlist (permissive or weak copyleft or public domain):
- Permissive: MIT, BSD, Apache 2.0, ISC, Unlicense, Zlib,
  Python-2.0, 0BSD, PSF-2.0
- Weak copyleft (Python import-safe): LGPL 2.1/3.0, MPL-2.0
- Public domain: CC0, WTFPL

Blocklist (non-OSI / restricted-source):
- GPL (any version), AGPL (any version)
- SSPL (MongoDB 2018) - broad service-provider trigger
- BSL / BUSL - delayed open source; competitive-use restriction
- Commons Clause - 'cannot sell the software' addendum
- Elastic License v2 - 'cannot offer as managed service'
- Unknown / unparseable / missing metadata (catches packaging
  bugs and custom licenses)

The two lists are explicit. Default rule: unknown = violation
(never auto-pass). The script's --help references the policy
table for transparency. Specific per-license additions go in
scripts/audit_license_cve.py directly; no spec change needed.

TRACK SCOPE
In scope: third-party deps (direct + transitive), source-file
SPDX headers, vendored libraries (defensive), version pinning.
Out of scope: the project's own LICENSE file, project's own
SPDX/Copyright headers, recommendations on project license.
The user reserves all rights to the repo; no LICENSE file is
created by the track. The audit reports third-party state only.

OUTPUT FORMAT (sanitized: no JSON in user-facing output)
- Stdout: line-per-violation, parseable by eye and by grep
- Markdown report in docs/reports/license_cve_audit/2026-06-07/
- Baseline file: JSON (matches existing audit_weak_types
  convention; internal state for --strict mode only)

CI GATE
--strict mode + scripts/audit_license_cve.baseline.json. Fails
CI on any new violation OR any new CVE. Mirrors the 3 existing
audit scripts (audit_main_thread_imports, audit_weak_types,
check_test_toml_paths).

COMMITS PLANNED
1. chore(audit): add license_cve audit script + initial report
2. chore(deps): tilde-pin all deps; delete requirements.txt
3. chore(audit): add --strict mode + baseline file (CI gate)
4. conductor(tracks): mark License CVE Audit track complete

NO NEW PIP DEPENDENCIES IN PROJECT
Pure stdlib (importlib.metadata, tomllib, pathlib, re) +
subprocess to pip-audit (an optional dev tool, installed via
'uv tool install pip-audit' if user wants CVE checks).
2026-06-07 14:26:22 -04:00


Track: License & CVE Audit (Dependency Compliance)

Status: Spec approved 2026-06-07
Initialized: 2026-06-07
Owner: Tier 2 Tech Lead
Priority: High (compliance + security; CI gate)


Overview

Build scripts/audit_license_cve.py, a single audit script that checks third-party dependencies (pyproject.toml plus the transitive tree in uv.lock) for: (1) license compliance against the project's policy, (2) known CVEs (via a pip-audit subprocess), and (3) version pinning (every direct dep should carry a tilde pin, written ~=X.Y.Z in PEP 440 syntax). The script also scans source-file license headers (SPDX-License-Identifier) in src/**/*.py and scripts/**/*.py. The track then applies the fixes: tilde-pin all direct deps, delete requirements.txt (redundant with uv.lock), regenerate uv.lock, and add --strict mode plus a baseline file (CI gate). One script, one CI gate, one report.

The track is scope-limited to third-party dependencies. The project's own LICENSE file and SPDX/Copyright headers are explicitly OUT OF SCOPE — the user reserves all rights to the repo and has not picked a project license yet. The audit reports third-party state only; it does not assert or imply a project license, and it does not create a LICENSE file.

Current State Audit (as of 9796fe27)

  • pyproject.toml has 14 direct deps with mixed pinning:
    • 7 unconstrained: "imgui-bundle", "anthropic", "google-genai", "openai", "fastapi", "mcp", "uvicorn"
    • 7 with >=X.Y.Z: "pyopengl>=3.1.10", "tree-sitter>=0.25.2", "tree-sitter-python>=0.25.0", "tree-sitter-c>=0.23.2", "tree-sitter-cpp>=0.23.2", "psutil>=7.2.2", "chromadb>=1.5.8"
    • "tomli-w", "pytest-timeout>=2.4.0"
  • uv.lock exists; requirements.txt exists (duplicates lock — will be removed)
  • No LICENSE file in repo root (user's chosen posture: all rights reserved; the audit reports this as informational, not a violation)
  • No source-file SPDX-License-Identifier headers in src/**/*.py or scripts/**/*.py (informational note; not a violation — the user hasn't picked a project license yet)
  • No vendor/, third_party/, or vendored C/C++ in the repo tree (the scan is defensive for the future)
  • 0 existing license/CVE audit tools in scripts/
  • The 3 existing audit scripts (audit_main_thread_imports.py, audit_weak_types.py, check_test_toml_paths.py) follow the project pattern of scripts/audit_<name>.py + scripts/audit_<name>.baseline.json + --strict mode for CI gates (per conductor/workflow.md "Audit Script Policy"). The new track follows the same pattern.

Already Implemented (DO NOT re-implement; KEEP / build on)

  1. The 3 existing audit scripts in scripts/. They define the project pattern for audit + CI gate. The new scripts/audit_license_cve.py follows the same shape.
  2. uv.lock — the canonical lock file for the project. The audit reads it for transitive resolution.
  3. importlib.metadata (stdlib since Python 3.8): exposes the License and License-Expression metadata fields of each installed distribution, when the package declares them. No new pip dep needed for the license check.
  4. tomllib (Python 3.11+ stdlib) — parses pyproject.toml. No new pip dep needed for the pin check.
  5. pip-audit (PyPA tool): invoked as a subprocess for the CVE check. pip-audit itself is NOT a project dep; it is installed via uv tool install pip-audit (or run ad hoc via uvx pip-audit) if the user wants the CVE check. The script detects a missing pip-audit and logs a warning; license + pin checks still run.

Gaps to Fill (this track's scope)

  • scripts/audit_license_cve.py (~300 lines, 3 internal checks + 1 source-header scan + --strict + --dump-baseline)
  • scripts/audit_license_cve.baseline.json (zero-violation post-cleanup state for --strict mode)
  • docs/reports/license_cve_audit/2026-06-07/initial.md and final.md (the human-readable reports)
  • Updates to pyproject.toml (tilde-pin every direct dep)
  • Updated uv.lock (regenerated)
  • Deletion of requirements.txt
  • tests/test_audit_license_cve.py (TDD unit tests)

Goals

  1. Single audit script that runs all four checks (license + CVE + pin + source-header) and emits a unified report.
  2. CI gate via --strict mode + baseline file. Mirrors the 3 existing audit scripts. Fails on any new violation OR any new CVE.
  3. Tilde-pin every direct dep in pyproject.toml using the PEP 440 compatible-release operator (~=X.Y.Z is equivalent to >=X.Y.Z,<X.(Y+1).0).
  4. Delete requirements.txt (duplicates uv.lock; redundant in a uv project).
  5. Re-run uv lock to refresh the lock file with the new bounds.
  6. Document the non-OSI / restricted-source category in the policy table of the script (so future contributors understand why these licenses are blocked).
  7. Preserve the user's "all rights reserved" posture — no LICENSE file is created; no project-level SPDX headers are added.
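
As a small worked example of the equivalence stated in goal 3, the sketch below expands a three-part tilde pin into its explicit range. It assumes the pin is written with PEP 440's compatible-release operator (~=), the form that [project].dependencies accepts; compatible_release_bounds is a hypothetical helper for illustration, not part of the planned script.

```python
import re


def compatible_release_bounds(spec: str) -> str:
    """Expand '~=X.Y.Z' into the equivalent '>=X.Y.Z,<X.(Y+1).0' range."""
    match = re.fullmatch(r"~=\s*(\d+)\.(\d+)\.(\d+)", spec.strip())
    if match is None:
        raise ValueError(f"not a three-part compatible-release specifier: {spec!r}")
    major, minor, patch = (int(part) for part in match.groups())
    # Patch releases are allowed; the next minor release is excluded.
    return f">={major}.{minor}.{patch},<{major}.{minor + 1}.0"


print(compatible_release_bounds("~=7.2.2"))  # >=7.2.2,<7.3.0
```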

Non-Goals

  • The project's own LICENSE file (user's decision; not creating one).
  • The project's own SPDX-License-Identifier / Copyright headers (user's decision; not adding or modifying).
  • Any recommendation on what license the user should pick for the project.
  • Patching CVEs in transitive deps (the track REPORTS; the user decides whether to wait for upstream or replace).
  • Auto-bumping versions to address CVEs (manual decision; the track reports, the user acts).
  • Modifying any third-party code already in the repo (none currently; the scan is defensive for the future).
  • License/header updates to vendored C/C++ (none currently vendored; the scan is defensive).
  • Separate treatment of the local-rag optional dependency group (sentence-transformers): it is covered by the same audit, and its pinning happens in the same pyproject.toml edit.

Architecture

scripts/audit_license_cve.py — single audit script, ~300 lines. No new pip dep required (stdlib + subprocess to pip-audit).

Public API (CLI)

uv run python scripts/audit_license_cve.py [--src src] [--scripts scripts] \
    [--report-dir docs/reports/license_cve_audit] [--date YYYY-MM-DD] \
    [--strict] [--dump-baseline]
  • Default mode: informational. Prints violations to stdout (line-per-violation format). Writes markdown report to <report-dir>/<date>/initial.md or final.md.
  • --strict mode: exits non-zero if violations > baseline. For CI.
  • --dump-baseline: writes the current violation set as the new baseline. For intentional changes (e.g., a new dep is added; the user accepts its license).

Internal structure (3 checks + 1 scan)

import sys

def check_licenses() -> list[Violation]: ...        # iterates dist.metadata; classifies
def check_cves() -> list[Violation]: ...            # subprocess pip-audit; parses JSON
def check_pins() -> list[Violation]: ...            # tomllib parse; flag missing/loose pins
def check_source_headers() -> list[Violation]: ...  # pathlib rglob; SPDX regex

def main() -> None:
    args = parse_args()                  # argparse wrapper for the CLI flags above
    violations: list[Violation] = []
    for check in (check_licenses, check_cves, check_pins, check_source_headers):
        violations.extend(check())
    for v in violations:
        print(v.format_stdout())         # parseable line-per-violation
    write_markdown_report(violations)
    if args.strict and len(violations) > len(load_baseline()):
        sys.exit(1)                      # CI gate: more violations than the baseline
    if args.dump_baseline:
        dump_baseline(violations)        # record the current set as the new baseline

Cost model (the 4 checks)

| Check | Mechanism | New deps? |
|-------|-----------|-----------|
| License | importlib.metadata.distribution(name).metadata.get("License") plus the License-Expression field (importlib.metadata is stdlib since Python 3.8). For each direct + transitive dep, classify the license string against the policy table. Unknown / unparseable / missing → violation. | None (stdlib) |
| CVE | Subprocess call to pip-audit --format=json --strict (a dev tool installed via uv tool install pip-audit; the project itself doesn't depend on it). If pip-audit isn't installed, log a warning + skip the CVE check; license + pin still run. Air-gapped CI: CVE check returns no results (not a failure). | None in pyproject.toml; pip-audit is an optional dev tool. |
| Version pin | tomllib.load(pyproject.toml) (stdlib since Python 3.11). For each entry in [project].dependencies, check the version specifier. Flags: (a) no specifier at all, (b) no lower bound. Accepts any lower bound as a soft check (the user's choice is tilde, but the script doesn't enforce tilde specifically; it enforces "has a lower bound"). | None (stdlib) |
| Source header | pathlib.Path(src_dir).rglob("*.py"), read first 20 lines of each, regex-look for SPDX-License-Identifier: (case-insensitive). If present and in the blocklist → violation. If no SPDX → no violation (informational note). | None (stdlib) |

License Policy (encoded in the script)

Allowlist (permissive or weak copyleft, import-safe in Python)

  • Permissive: MIT, BSD (2-clause + 3-clause), Apache 2.0, ISC, Unlicense, Zlib, Python-2.0, 0BSD, PSF-2.0
  • Weak copyleft (import-safe in Python): LGPL (2.1, 3.0), MPL-2.0
  • Public domain: CC0, WTFPL

(The script's allowlist is the canonical source of truth for the per-license table; see scripts/audit_license_cve.py for the current list. New licenses can be added by editing that table; no spec change needed.)

Blocklist (non-permissive / restricted-source)

The blocklist covers licenses that either are not OSI-approved (SSPL, BSL / BUSL, Commons Clause, Elastic License v2) or are OSI-approved strong / network copyleft (GPL, AGPL). The unifying technical property: the license restricts how downstream users can use or redistribute the software in ways that permissive and weak-copyleft licenses do not.

| License | Specific restriction |
|---------|----------------------|
| GPL (any version) | Strong copyleft; viral licensing; downstream users must release derivative works under GPL |
| AGPL (any version) | Network copyleft; downstream SaaS users must release source under AGPL |
| SSPL (MongoDB, 2018) | "If you offer the software as a service, you must release the entire stack under SSPL"; broad service-provider trigger |
| BSL / BUSL (Business Source License) | Source-available with a delayed open-source conversion; competitive-use restriction during the delay |
| Commons Clause | Addendum to an open-source license; adds "you may not sell the software"; targets SaaS reselling |
| Elastic License v2 (Elastic NV, 2021) | "You may not offer the software as a managed service that competes with Elastic" |
| Unknown / unparseable (e.g., UNKNOWN, Custom, see AUTHORS) | Not classifiable; flagged for manual review; never auto-pass |
| Missing license metadata | Catches packaging bugs |

Decision rule (in the script)

if license in BLOCKLIST: violation
elif license in ALLOWLIST: pass
else:  # unknown / unparseable / unclassified
    violation (flag for manual review; never auto-pass)

The two lists are explicit, not heuristic. Adding a new license to either list is a one-line code change. The script's --help references the policy table for transparency.

Output Format

Stdout (line-per-violation, parseable)

LICENSE_VIOLATION pkg=foo license="GPL-3.0" via=bar==2.0
CVE_FOUND pkg=baz cve_id=CVE-2024-12345 severity=high fix_versions=">=1.2.3"
PIN_MISSING pkg=qux (no version specifier in pyproject.toml)
SPDX_VIOLATION file=src/some_module.py license="GPL-3.0"

Each line is a stable parseable format; CI can grep for VIOLATION|FOUND|MISSING and exit 1 on any match.
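
For consumers that want more than a grep, the lines above can be split back into fields. The sketch below is one possible reader, not part of the planned script: parse_violation_line is a hypothetical helper, and it assumes values are either bare tokens or double-quoted, as in the examples.

```python
import shlex


def parse_violation_line(line: str) -> dict[str, str]:
    """Split one stdout line into its type, key=value fields, and free text."""
    tokens = shlex.split(line)  # honors the double quotes around some values
    record = {"type": tokens[0]}
    notes = []
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep and key.isidentifier():
            record[key] = value
        else:
            notes.append(token)  # trailing free text, e.g. "(no version ...)"
    if notes:
        record["note"] = " ".join(notes)
    return record


line = 'CVE_FOUND pkg=baz cve_id=CVE-2024-12345 severity=high fix_versions=">=1.2.3"'
print(parse_violation_line(line)["fix_versions"])  # >=1.2.3
```

Splitting on the first "=" only is what keeps values such as bar==2.0 and >=1.2.3 intact.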

Markdown report (in docs/reports/license_cve_audit/<YYYY-MM-DD>/)

  • initial.md — the discovered violations (committed in Phase 1)
  • final.md — the post-cleanup state (committed in Phase 2, after tilde-pinning + lock regen)

Structure:

# License & CVE Audit — 2026-06-07

## Top-level summary

- License violations: 0
- CVEs found: 0
- Pinning issues: 0
- SPDX violations in src/ or scripts/: 0

## Notes

- No `LICENSE` file in repo root — informational, not a violation. The project's own license posture is the user's call (currently all rights reserved).
- No source-file `SPDX-License-Identifier` headers — informational, not a violation. The project's own copyright headers are the user's call.
- pip-audit not installed → CVE check skipped. Install via `uv tool install pip-audit` to enable.

## Per-violation table

| Type | Package | License / CVE / Pin | Via |
|------|---------|---------------------|-----|
| ... | ... | ... | ... |

Baseline file (scripts/audit_license_cve.baseline.json)

Internal state for --strict mode. JSON because it matches the existing convention (scripts/audit_weak_types.baseline.json). Not the user-facing report; not in the output surface. Format:

{
  "schema_version": 1,
  "baseline_violations": [],
  "baseline_date": "2026-06-07",
  "notes": "Zero-violation state after the tilde-pinning + lock regen in this track."
}

--strict mode loads this file and fails CI if len(current_violations) > len(baseline_violations). The user's intentional changes (e.g., adding a new dep with an acceptable license) are recorded by re-running with --dump-baseline.
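
One caveat with a pure count comparison: if one violation is fixed while a different one appears, the counts match and the gate passes. If the intent is to fail on any new violation, a set difference is one option. The sketch below assumes, hypothetically, that baseline_violations stores the stdout lines verbatim; new_violations is an illustrative helper, not the script's confirmed design.

```python
import json


def new_violations(current: list[str], baseline_json: str) -> list[str]:
    """Return current violation lines that are absent from the baseline."""
    baseline = set(json.loads(baseline_json).get("baseline_violations", []))
    return [line for line in current if line not in baseline]


baseline_json = json.dumps(
    {"schema_version": 1, "baseline_violations": ["PIN_MISSING pkg=a"]}
)
current = ["PIN_MISSING pkg=b"]  # same count as the baseline, different content

print(len(current) > 1)                        # False: the count check passes
print(new_violations(current, baseline_json))  # ['PIN_MISSING pkg=b']
```

With this approach --strict would exit non-zero whenever the returned list is non-empty.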

Commit Structure (4 atomic commits, in order)

1. chore(audit): add license_cve audit script + initial report
   - scripts/audit_license_cve.py (initial version, informational mode)
   - docs/reports/license_cve_audit/2026-06-07/initial.md (the discovered violations)
2. chore(deps): tilde-pin all deps; delete requirements.txt
   - pyproject.toml (every direct dep gets ~=X.Y.Z or stays as >=X.Y.Z)
   - uv.lock (regenerated)
   - requirements.txt (deleted; was redundant with lock)
3. chore(audit): add --strict mode + baseline file (CI gate)
   - scripts/audit_license_cve.py (extends with --strict + baseline diff)
   - scripts/audit_license_cve.baseline.json (zero-violation post-cleanup state)
4. conductor(tracks): mark License CVE Audit track complete
   - tracks.md update

Each commit is annotated with a git notes add -m "..." summary per conductor/workflow.md.

Verification (TDD per conductor/workflow.md)

Unit tests in tests/test_audit_license_cve.py:

  • License classifier: a known fixture package list with various licenses → correct classification (blocklist + allowlist + unknown).
  • Blocklist enforcement: each entry (GPL, AGPL, SSPL, BSL, BUSL, Commons Clause, Elastic v2, unknown, missing) → correctly flagged.
  • Allowlist enforcement: each entry (MIT, BSD, Apache 2.0, ISC, Unlicense, Zlib, Python-2.0, 0BSD, PSF-2.0, LGPL, MPL-2.0, CC0, WTFPL) → correctly passes.
  • Pin check: synthetic pyproject.toml with mixed pinning (no bound, >=X.Y, ~=X.Y.Z, exact) → correct flags.
  • Source header check: synthetic .py with SPDX-License-Identifier: GPL-3.0 → flagged; with no SPDX → no violation.
  • --strict mode: violations > baseline → exit 1; violations == baseline → exit 0; new violation (delta > 0) → exit 1.
  • --dump-baseline: writes a baseline file matching the current violation set.

Risks

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Some packages' license metadata is missing or unparseable in importlib.metadata | High | Medium (false positives on unknown) | The policy treats UNKNOWN as violation → manual review catches the right answer; the report's notes section lists the unknowns explicitly |
| pip-audit not installed in CI | Medium | Low (CVE check is a no-op) | Script detects missing pip-audit and logs a warning; license + pin checks still run |
| Air-gapped CI can't reach OSV / PyPI advisory DBs | Medium | Low (CVE check returns no results) | Document; a follow-up could add offline CVE support, not in this track |
| Pinning decisions are subjective (some deps deserve looser bounds than others) | Medium | Low (initial pass is conservative) | The pin check accepts any lower bound as a soft check; the user can loosen specific deps via the baseline file |
| The baseline file becomes a "shadow ledger" that needs maintenance when intentional changes are made | Medium | Low (intentional) | Document the update workflow in the script's --help; --dump-baseline regenerates the baseline after an intentional change |
| The project's own LICENSE absence might confuse a future contributor who doesn't know the user's posture | Low | Low | The report's notes section explicitly calls this out: no LICENSE in repo root is informational, not a violation; the project's own license is the user's call (currently all rights reserved) |
| A dep is added with a license that doesn't match the script's allowlist/blocklist (e.g., a new "BSL 2.0" variant) | Low | Low | The script's default rule (unknown = violation) catches it; the report's notes section surfaces it for review; one-line add to the appropriate list |

Follow-up

  • air_gapped_cve_check_20260607 (NOT in this track): add offline CVE support for air-gapped CI environments that can't reach OSV / PyPI. The CVE check would ship a snapshot of the advisory DBs (or use a local mirror).
  • cve_auto_remediation_20260607 (NOT in this track): when a CVE is found, auto-bump the dep to the fix version (within the pin range) and re-run the audit. Out of scope here; this track REPORTS, the user DECIDES.

Coordination with Pending Tracks

This track has no blockers and no conflicts with the 5 active planned tracks. It modifies:

  • pyproject.toml (version pins; could affect resolution for any future track that depends on something)
  • uv.lock (regenerated; the lock file changes)
  • requirements.txt (deleted; was redundant with lock)
  • New: scripts/audit_license_cve.py, scripts/audit_license_cve.baseline.json, docs/reports/license_cve_audit/2026-06-07/

It does NOT modify src/, existing files in tests/, or any of the 5 planned tracks' files; the only addition under tests/ is tests/test_audit_license_cve.py. The deleted requirements.txt is outside the 5 planned tracks' scope. The track can ship independently and in parallel with the 5 planned tracks.

The tilde-pinning in this track is a STRENGTHENING of the dep contract, not a loosening — it doesn't break any existing test or any other track's plan.

Out of Scope

  • The project's own LICENSE file (user's decision; the track will not create one).
  • The project's own SPDX-License-Identifier / Copyright headers in src/ (user's decision; the track will not add or modify).
  • Any recommendation on what license the user should pick for the project.
  • Patching CVEs in transitive deps (the track REPORTS; the user decides whether to wait for upstream or replace).
  • Auto-bumping versions to address CVEs (manual decision; the track reports, the user acts).
  • Modifying any third-party code already in the repo (none currently; the scan is defensive for the future).
  • License/header updates to vendored C/C++ (none currently vendored; the scan is defensive).
  • Separate treatment of the local-rag optional dependency group (sentence-transformers): it is covered by the same audit, and its pinning happens in the same pyproject.toml edit.

See Also

  • conductor/workflow.md "Audit Script Policy" — the convention this track follows.
  • scripts/audit_main_thread_imports.py, scripts/audit_weak_types.py, scripts/check_test_toml_paths.py — the 3 existing audit scripts; the new track follows the same shape.
  • scripts/audit_weak_types.baseline.json — the baseline file pattern (the new scripts/audit_license_cve.baseline.json mirrors this).
  • OSI Approved Licenses: the reference list of "open source" licenses. The script's policy overlaps with this list but is not identical to it: GPL and AGPL are OSI-approved yet blocked here, and CC0 and WTFPL are allowed here although OSI has not approved them.
  • pip-audit (PyPA) — the CVE-checking tool invoked as a subprocess. Optional; the script handles its absence gracefully.