Track: License & CVE Audit (Dependency Compliance)
Status: Spec approved 2026-06-07
Initialized: 2026-06-07
Owner: Tier 2 Tech Lead
Priority: High (compliance + security; CI gate)
Overview
Build scripts/audit_license_cve.py — a single audit script that checks third-party dependencies (in pyproject.toml + uv.lock transitive tree) for: (1) license compliance against the project's policy, (2) known CVEs (via pip-audit subprocess), and (3) version-pinning (every direct dep must have a ~X.Y.Z bound). The script also scans source-file license headers (SPDX-License-Identifier) in src/**/*.py and scripts/**/*.py. Then apply the fixes: tilde-pin all direct deps, delete requirements.txt (redundant with uv.lock), regenerate uv.lock, add --strict mode + baseline file (CI gate). One script, one CI gate, one report.
The track is scope-limited to third-party dependencies. The project's own LICENSE file and SPDX/Copyright headers are explicitly OUT OF SCOPE — the user reserves all rights to the repo and has not picked a project license yet. The audit reports third-party state only; it does not assert or imply a project license, and it does not create a LICENSE file.
Current State Audit (as of 9796fe27)
- pyproject.toml has 14 direct deps with mixed pinning:
  - 7 unconstrained: "imgui-bundle", "anthropic", "google-genai", "openai", "fastapi", "mcp", "uvicorn"
  - 6 with >=X.Y.Z: "pyopengl>=3.1.10", "tree-sitter>=0.25.2", "tree-sitter-python>=0.25.0", "tree-sitter-c>=0.23.2", "tree-sitter-cpp>=0.23.2", "psutil>=7.2.2", "chromadb>=1.5.8"
  - Also listed: "tomli-w", "pytest-timeout>=2.4.0"
- uv.lock exists; requirements.txt exists (duplicates lock; will be removed)
- No LICENSE file in repo root (user's chosen posture: all rights reserved; the audit reports this as informational, not a violation)
- No source-file SPDX-License-Identifier headers in src/**/*.py or scripts/**/*.py (informational note; not a violation, since the user hasn't picked a project license yet)
- No vendor/, third_party/, or vendored C/C++ in the repo tree (the scan is defensive for the future)
- 0 existing license/CVE audit tools in scripts/
- The 3 existing audit scripts (audit_main_thread_imports.py, audit_weak_types.py, check_test_toml_paths.py) follow the project pattern of scripts/audit_<name>.py + scripts/audit_<name>.baseline.json + --strict mode for CI gates (per conductor/workflow.md "Audit Script Policy"). The new track follows the same pattern.
Already Implemented (DO NOT re-implement; KEEP / build on)
- The 3 existing audit scripts in scripts/. They define the project pattern for audit + CI gate. The new scripts/audit_license_cve.py follows the same shape.
- uv.lock: the canonical lock file for the project. The audit reads it for transitive resolution.
- importlib.metadata (Python 3.11+ stdlib): gives License and License-Expression per installed distribution. No new pip dep needed for the license check.
- tomllib (Python 3.11+ stdlib): parses pyproject.toml. No new pip dep needed for the pin check.
- pip-audit (PyPA tool): invoked as a subprocess for the CVE check. pip-audit itself is NOT a project dep; it's installed via uv tool install pip-audit or uvx pip-audit if the user wants the CVE check. The script detects missing pip-audit and logs a warning; license + pin checks still run.
Gaps to Fill (this track's scope)
- scripts/audit_license_cve.py (~300 lines, 3 internal checks + --strict + --dump-baseline)
- scripts/audit_license_cve.baseline.json (zero-violation post-cleanup state for --strict mode)
- docs/reports/license_cve_audit/2026-06-07/initial.md and final.md (the human-readable reports)
- Updates to pyproject.toml (tilde-pin every direct dep)
- Updated uv.lock (regenerated)
- Deletion of requirements.txt
- tests/test_audit_license_cve.py (TDD unit tests)
Goals
- Single audit script that runs all four checks (license + CVE + pin + source-header) and emits a unified report.
- CI gate via --strict mode + baseline file. Mirrors the 3 existing audit scripts. Fails on any new violation OR any new CVE.
- Tilde-pin every direct dep in pyproject.toml (~X.Y.Z = >=X.Y.Z,<X.(Y+1).0; PEP 440, which [project].dependencies follows, spells this compatible-release operator ~=X.Y.Z).
- Delete requirements.txt (duplicates uv.lock; redundant in a uv project).
- Re-run uv lock to refresh the lock file with the new bounds.
- Document the non-OSI / restricted-source category in the policy table of the script (so future contributors understand why these licenses are blocked).
- Preserve the user's "all rights reserved" posture: no LICENSE file is created; no project-level SPDX headers are added.
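To make the tilde-pin range in the goals above concrete, here is a minimal sketch that expands a three-part version into the explicit bounds. The helper name tilde_bounds is hypothetical, and the sketch assumes plain X.Y.Z versions with no pre-release or local suffixes.

```python
def tilde_bounds(version: str) -> str:
    """Expand X.Y.Z into the explicit >=X.Y.Z,<X.(Y+1).0 range."""
    # Raises ValueError if the version is not exactly three integer parts.
    major, minor, _patch = (int(part) for part in version.split("."))
    return f">={version},<{major}.{minor + 1}.0"


print(tilde_bounds("1.5.8"))   # >=1.5.8,<1.6.0
print(tilde_bounds("0.25.2"))  # >=0.25.2,<0.26.0
```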
Non-Goals
- The project's own LICENSE file (user's decision; not creating one).
- The project's own SPDX-License-Identifier / Copyright headers (user's decision; not adding or modifying).
- Any recommendation on what license the user should pick for the project.
- Patching CVEs in transitive deps (the track REPORTS; the user decides whether to wait for upstream or replace).
- Auto-bumping versions to address CVEs (manual decision; the track reports, the user acts).
- Modifying any third-party code already in the repo (none currently; the scan is defensive for the future).
- License/header updates to vendored C/C++ (none currently vendored; the scan is defensive).
- The local-rag optional dependency group (sentence-transformers); covered by the same audit but pinning happens in the same pyproject.toml edit.
Architecture
scripts/audit_license_cve.py — single audit script, ~300 lines. No new pip dep required (stdlib + subprocess to pip-audit).
Public API (CLI)
uv run python scripts/audit_license_cve.py [--src src] [--scripts scripts] \
[--report-dir docs/reports/license_cve_audit] [--date YYYY-MM-DD] \
[--strict] [--dump-baseline]
- Default mode: informational. Prints violations to stdout (line-per-violation format). Writes markdown report to <report-dir>/<date>/initial.md or final.md.
- --strict mode: exits non-zero if violations > baseline. For CI.
- --dump-baseline: writes the current violation set as the new baseline. For intentional changes (e.g., a new dep is added; the user accepts its license).
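The CLI surface above could be wired up with argparse roughly as follows. This is a sketch, not the script's actual parser: build_parser is a hypothetical name, and the defaults are assumptions read off the usage line (with --date assumed to default to today).

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags listed in the usage line (defaults are assumptions)."""
    parser = argparse.ArgumentParser(
        prog="audit_license_cve.py",
        description="Audit third-party licenses, CVEs, version pins, and SPDX headers.",
    )
    parser.add_argument("--src", default="src", help="source tree to scan for SPDX headers")
    parser.add_argument("--scripts", default="scripts", help="scripts tree to scan for SPDX headers")
    parser.add_argument("--report-dir", default="docs/reports/license_cve_audit")
    parser.add_argument("--date", default=date.today().isoformat(), help="report date, YYYY-MM-DD")
    parser.add_argument("--strict", action="store_true", help="exit non-zero if violations exceed the baseline")
    parser.add_argument("--dump-baseline", action="store_true", help="write current violations as the new baseline")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--strict", "--date", "2026-06-07"])
print(args.strict, args.dump_baseline, args.report_dir, args.date)
```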
Internal structure (3 checks + 1 scan)
import sys

def check_licenses() -> list[Violation]: ...        # iterates dist.metadata; classifies
def check_cves() -> list[Violation]: ...            # subprocess pip-audit; parses JSON
def check_pins() -> list[Violation]: ...            # tomllib parse; flag missing/loose pins
def check_source_headers() -> list[Violation]: ...  # pathlib rglob; SPDX regex

def main():
    args = parse_args()  # hypothetical helper: argparse over the CLI flags above
    violations = []
    for check in (check_licenses, check_cves, check_pins, check_source_headers):
        violations.extend(check())
    for v in violations:
        print(v.format_stdout())  # parseable line-per-violation
    write_markdown_report(violations)
    if args.strict and len(violations) > len(load_baseline()):
        sys.exit(1)
    if args.dump_baseline:
        dump_baseline(violations)
Cost model (the 4 checks)
| Check | Mechanism | New deps? |
|---|---|---|
| License | importlib.metadata.distribution(name).metadata.get("License") + License-Expression (Python 3.11+ stdlib). For each direct + transitive dep, classify the license string against the policy table. Unknown / unparseable / missing → violation. | None (stdlib) |
| CVE | Subprocess call to pip-audit --format=json --strict (a uv tool install pip-audit dev tool; the project itself doesn't depend on it). If pip-audit isn't installed, log a warning + skip the CVE check; license + pin still run. Air-gapped CI: CVE check returns no results (not a failure). | None in pyproject.toml; pip-audit is an optional dev tool. |
| Version pin | tomllib.load(pyproject.toml) (stdlib). For each entry in [project].dependencies, check the version specifier. Flags: (a) no specifier at all, (b) no lower bound. Accepts any lower bound as a soft check (the user's choice is tilde, but the script doesn't enforce tilde specifically; it enforces "has a lower bound"). | None (stdlib) |
| Source header | pathlib.Path(src_dir).rglob("*.py"), read first 20 lines of each, regex search for SPDX-License-Identifier: (case-insensitive). If present and in the blocklist → violation. If no SPDX → no violation (informational note). | None (stdlib) |
License Policy (encoded in the script)
Allowlist (permissive or weak copyleft, import-safe in Python)
- Permissive: MIT, BSD (2-clause + 3-clause), Apache 2.0, ISC, Unlicense, Zlib, Python-2.0, 0BSD, PSF-2.0
- Weak copyleft (import-safe in Python): LGPL (2.1, 3.0), MPL-2.0
- Public domain: CC0, Unlicense, WTFPL
(The script's allowlist is the canonical source of truth for the per-license table; see scripts/audit_license_cve.py for the current list. New licenses can be added by editing that table; no spec change needed.)
Blocklist (non-permissive / restricted-source)
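The blocklist covers two groups. GPL and AGPL are OSI-approved strong (and network) copyleft licenses; they are blocked because their share-alike obligations extend to derivative works. SSPL, BSL / BUSL, Commons Clause, and Elastic License v2 are source-available licenses that are not OSI-approved and restrict commercial or hosted use. The unifying technical property: the license places obligations or restrictions on downstream users beyond what the allowlisted licenses require.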
| License | Specific restriction |
|---|---|
| GPL (any version) | Strong copyleft; viral licensing; downstream users must release derivative works under GPL |
| AGPL (any version) | Network copyleft; downstream SaaS users must release source under AGPL |
| SSPL (MongoDB, 2018) | "If you offer the software as a service, you must release the entire stack under SSPL" — broad service-provider trigger |
| BSL / BUSL (Business Source License) | Source-available with a delayed open-source conversion; competitive-use restriction during the delay |
| Commons Clause | Addendum to an open-source license; adds "you may not sell the software" — targets SaaS reselling |
| Elastic License v2 (Elastic NV, 2021) | "You may not offer the software as a managed service that competes with Elastic" |
| Unknown / unparseable (e.g., UNKNOWN, Custom, see AUTHORS) | Not classifiable; flagged for manual review; never auto-pass |
| Missing license metadata | Catches packaging bugs |
Decision rule (in the script)
if license in BLOCKLIST:
    violation
elif license in ALLOWLIST:
    pass
else:  # unknown / unparseable / unclassified
    violation (flag for manual review; never auto-pass)
The two lists are explicit, not heuristic. Adding a new license to either list is a one-line code change. The script's --help references the policy table for transparency.
Output Format
Stdout (line-per-violation, parseable)
LICENSE_VIOLATION pkg=foo license="GPL-3.0" via=bar==2.0
CVE_FOUND pkg=baz cve_id=CVE-2024-12345 severity=high fix_versions=">=1.2.3"
PIN_MISSING pkg=qux (no version specifier in pyproject.toml)
SPDX_VIOLATION file=src/some_module.py license="GPL-3.0"
Each line is a stable parseable format; CI can grep for VIOLATION|FOUND|MISSING and exit 1 on any match.
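To illustrate that the line format is machine-parseable as well as greppable, the following sketch splits a line into its tag, its key=value fields, and any trailing free text. The names parse_line and is_failure are hypothetical; shlex is used only to strip the double quotes around values.

```python
import re
import shlex


def parse_line(line: str) -> tuple[str, dict[str, str], str]:
    """Split one report line into (tag, key=value fields, trailing note)."""
    kind, *tokens = shlex.split(line)
    fields: dict[str, str] = {}
    extra: list[str] = []
    for token in tokens:
        key, sep, value = token.partition("=")  # split on the first "=" only
        if sep and key.isidentifier():
            fields[key] = value
        else:
            extra.append(token)  # free text such as "(no version specifier ...)"
    return kind, fields, " ".join(extra)


def is_failure(line: str) -> bool:
    """Mirror the grep gate: the tag contains VIOLATION, FOUND, or MISSING."""
    return re.search(r"VIOLATION|FOUND|MISSING", line.split(" ", 1)[0]) is not None


print(parse_line('LICENSE_VIOLATION pkg=foo license="GPL-3.0" via=bar==2.0'))
print(parse_line("PIN_MISSING pkg=qux (no version specifier in pyproject.toml)"))
```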
Markdown report (in docs/reports/license_cve_audit/<YYYY-MM-DD>/)
- initial.md: the discovered violations (committed in Phase 1)
- final.md: the post-cleanup state (committed in Phase 2, after tilde-pinning + lock regen)
Structure:
# License & CVE Audit — 2026-06-07
## Top-level summary
- License violations: 0
- CVEs found: 0
- Pinning issues: 0
- SPDX violations in src/ or scripts/: 0
## Notes
- No `LICENSE` file in repo root — informational, not a violation. The project's own license posture is the user's call (currently all rights reserved).
- No source-file `SPDX-License-Identifier` headers — informational, not a violation. The project's own copyright headers are the user's call.
- pip-audit not installed → CVE check skipped. Install via `uv tool install pip-audit` to enable.
## Per-violation table
| Type | Package | License / CVE / Pin | Via |
|------|---------|---------------------|-----|
| ... | ... | ... | ... |
Baseline file (scripts/audit_license_cve.baseline.json)
Internal state for --strict mode. JSON because it matches the existing convention (scripts/audit_weak_types.baseline.json). Not the user-facing report; not in the output surface. Format:
{
  "schema_version": 1,
  "baseline_violations": [],
  "baseline_date": "2026-06-07",
  "notes": "Zero-violation state after the tilde-pinning + lock regen in this track."
}
--strict mode loads this file and fails CI if len(current_violations) > len(baseline_violations). The user's intentional changes (e.g., adding a new dep with an acceptable license) are recorded by re-running with --dump-baseline.
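The baseline round trip and the count-based gate can be sketched as follows. This assumes baseline entries are stored as the stdout line strings, which the spec does not state (it only shows an empty list). The new_violations helper is a hypothetical stricter variant, included because a pure count comparison still passes when one violation is fixed and a different one appears.

```python
import json
from pathlib import Path


def load_baseline(path: Path) -> list[str]:
    """Read baseline_violations from the JSON file; a missing file means an empty baseline."""
    if not path.exists():
        return []
    data = json.loads(path.read_text(encoding="utf-8"))
    return list(data.get("baseline_violations", []))


def dump_baseline(path: Path, violations: list[str], baseline_date: str) -> None:
    """Write the current violation set in the schema shown above."""
    payload = {
        "schema_version": 1,
        "baseline_violations": sorted(violations),
        "baseline_date": baseline_date,
        "notes": "",
    }
    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")


def strict_exit_code(current: list[str], baseline: list[str]) -> int:
    """Count-based gate as described: fail only when the total grows."""
    return 1 if len(current) > len(baseline) else 0


def new_violations(current: list[str], baseline: list[str]) -> list[str]:
    """Hypothetical stricter variant: entries present now but absent from the baseline."""
    known = set(baseline)
    return [entry for entry in current if entry not in known]
```

With current = ["x"] and baseline = ["y"], strict_exit_code returns 0 while new_violations returns ["x"], which is the gap between the two approaches.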
Commit Structure (4 atomic commits, in order)
1. chore(audit): add license_cve audit script + initial report
   - scripts/audit_license_cve.py (initial version, informational mode)
   - docs/reports/license_cve_audit/2026-06-07/initial.md (the discovered violations)
2. chore(deps): tilde-pin all deps; delete requirements.txt
   - pyproject.toml (every direct dep gets ~X.Y.Z or stays as >=X.Y.Z)
   - uv.lock (regenerated)
   - requirements.txt (deleted; was redundant with lock)
3. chore(audit): add --strict mode + baseline file (CI gate)
   - scripts/audit_license_cve.py (extends with --strict + baseline diff)
   - scripts/audit_license_cve.baseline.json (zero-violation post-cleanup state)
4. conductor(tracks): mark License CVE Audit track complete
   - tracks.md update
Each commit is annotated with a git notes add -m "..." summary per conductor/workflow.md.
Verification (TDD per conductor/workflow.md)
Unit tests in tests/test_audit_license_cve.py:
- License classifier: a known fixture package list with various licenses → correct classification (blocklist + allowlist + unknown).
- Blocklist enforcement: each entry (GPL, AGPL, SSPL, BSL, BUSL, Commons Clause, Elastic v2, unknown, missing) → correctly flagged.
- Allowlist enforcement: each entry (MIT, BSD, Apache 2.0, ISC, Unlicense, Zlib, Python-2.0, LGPL, MPL-2.0, CC0, WTFPL) → correctly passes.
- Pin check: synthetic pyproject.toml with mixed pinning (no bound, >=X.Y, ~X.Y.Z, exact) → correct flags.
- Source header check: synthetic .py with SPDX-License-Identifier: GPL-3.0 → flagged; with no SPDX → no violation.
- --strict mode: violations > baseline → exit 1; violations == baseline → exit 0; new violation (delta > 0) → exit 1.
- --dump-baseline: writes a baseline file matching the current violation set.
Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Some packages' license metadata is missing or unparseable in importlib.metadata | High | Medium (false positives on unknown) | The policy treats UNKNOWN as violation → manual review catches the right answer; the report's notes section lists the unknowns explicitly |
| pip-audit not installed in CI | Medium | Low (CVE check is a no-op) | Script detects missing pip-audit and logs a warning; license + pin checks still run |
| Air-gapped CI can't reach OSV / PyPI advisory DBs | Medium | Low (CVE check returns no results) | Document; a follow-up could add offline CVE support, not in this track |
| Pinning decisions are subjective (some deps deserve looser bounds than others) | Medium | Low (initial pass is conservative) | The pin check accepts any lower bound as a soft check; the user can loosen specific deps via the baseline file |
| The baseline file becomes a "shadow ledger" — needs maintenance when intentional changes are made | Medium | Low (intentional) | Document the update workflow in the script's --help; --dump-baseline regenerates the baseline after an intentional change |
| The project's own LICENSE absence might confuse a future contributor who doesn't know the user's posture | Low | Low | The report's notes section explicitly calls this out: "no LICENSE in repo root — informational, not a violation; project's own license is the user's call (currently all rights reserved)" |
| A dep is added with a license that doesn't match the script's allowlist/blocklist (e.g., a new "BSL 2.0" variant) | Low | Low | The script's default rule (unknown = violation) catches it; the report's notes section surfaces it for review; one-line add to the appropriate list |
Follow-up
- air_gapped_cve_check_20260607 (NOT in this track): add offline CVE support for air-gapped CI environments that can't reach OSV / PyPI. The CVE check would ship a snapshot of the advisory DBs (or use a local mirror).
- cve_auto_remediation_20260607 (NOT in this track): when a CVE is found, auto-bump the dep to the fix version (within the pin range) and re-run the audit. Out of scope here; this track REPORTS, the user DECIDES.
Coordination with Pending Tracks
This track has no blockers and no conflicts with the 5 active planned tracks. It modifies:
- pyproject.toml (version pins; could affect resolution for any future track that depends on something)
- uv.lock (regenerated; the lock file changes)
- requirements.txt (deleted; was redundant with lock)
- New: scripts/audit_license_cve.py, scripts/audit_license_cve.baseline.json, docs/reports/license_cve_audit/2026-06-07/
It does NOT modify src/, existing files in tests/ (it only adds tests/test_audit_license_cve.py), or any of the 5 planned tracks' files. The deleted requirements.txt is a separate file from the 5 planned tracks' scope. Can ship independently and in parallel with the 5 planned tracks.
The tilde-pinning in this track is a STRENGTHENING of the dep contract, not a loosening — it doesn't break any existing test or any other track's plan.
Out of Scope
- The project's own LICENSE file (user's decision; the track will not create one).
- The project's own SPDX-License-Identifier / Copyright headers in src/ (user's decision; the track will not add or modify).
- Any recommendation on what license the user should pick for the project.
- Patching CVEs in transitive deps (the track REPORTS; the user decides whether to wait for upstream or replace).
- Auto-bumping versions to address CVEs (manual decision; the track reports, the user acts).
- Modifying any third-party code already in the repo (none currently; the scan is defensive for the future).
- License/header updates to vendored C/C++ (none currently vendored; the scan is defensive).
- The local-rag optional dependency group (sentence-transformers); covered by the same audit but pinning happens in the same pyproject.toml edit.
See Also
- conductor/workflow.md "Audit Script Policy": the convention this track follows.
- scripts/audit_main_thread_imports.py, scripts/audit_weak_types.py, scripts/check_test_toml_paths.py: the 3 existing audit scripts; the new track follows the same shape.
- scripts/audit_weak_types.baseline.json: the baseline file pattern (the new scripts/audit_license_cve.baseline.json mirrors this).
- OSI Approved Licenses: the reference list of "open source" licenses. The script's policy overlaps with this list but is not identical: GPL and AGPL are OSI-approved yet blocked, while CC0 and WTFPL are allowed without OSI approval. LGPL / MPL-2.0 are allowed in transitive deps for Python import-safety.
- pip-audit (PyPA): the CVE-checking tool invoked as a subprocess. Optional; the script handles its absence gracefully.