
chore(audit): spec license_cve_audit track (compliance + CVE + pinning)

Builds scripts/audit_license_cve.py: single audit script that
checks third-party deps (pyproject.toml + uv.lock transitive
tree) for: (1) license compliance against the project's policy,
(2) known CVEs (via pip-audit subprocess), (3) version-pinning,
and (4) source-file SPDX license headers in src/ and scripts/.

LICENSE POLICY (encoded in the script)
Allowlist (permissive or weak copyleft or public domain):
- Permissive: MIT, BSD, Apache 2.0, ISC, Unlicense, Zlib,
  Python-2.0, 0BSD, PSF-2.0
- Weak copyleft (Python import-safe): LGPL 2.1/3.0, MPL-2.0
- Public domain: CC0, WTFPL

Blocklist (strong copyleft or non-OSI / restricted-source):
- GPL (any version), AGPL (any version)
- SSPL (MongoDB 2018) - broad service-provider trigger
- BSL / BUSL - delayed open source; competitive-use restriction
- Commons Clause - 'cannot sell the software' addendum
- Elastic License v2 - 'cannot offer as managed service'
- Unknown / unparseable / missing metadata (catches packaging
  bugs and custom licenses)

The two lists are explicit. Default rule: unknown = violation
(never auto-pass). The script's --help references the policy
table for transparency. Specific per-license additions go in
scripts/audit_license_cve.py directly; no spec change needed.

TRACK SCOPE
In scope: third-party deps (direct + transitive), source-file
SPDX headers, vendored libraries (defensive), version pinning.
Out of scope: the project's own LICENSE file, project's own
SPDX/Copyright headers, recommendations on project license.
The user reserves all rights to the repo; no LICENSE file is
created by the track. The audit reports third-party state only.

OUTPUT FORMAT (sanitized: no JSON in user-facing output)
- Stdout: line-per-violation, parseable by eye and by grep
- Markdown report in docs/reports/license_cve_audit/2026-06-07/
- Baseline file: JSON (matches existing audit_weak_types
  convention; internal state for --strict mode only)

CI GATE
--strict mode + scripts/audit_license_cve.baseline.json. Fails
CI on any new violation OR any new CVE. Mirrors the 3 existing
audit scripts (audit_main_thread_imports, audit_weak_types,
check_test_toml_paths).

COMMITS PLANNED
1. chore(audit): add license_cve audit script + initial report
2. chore(deps): tilde-pin all deps; delete requirements.txt
3. chore(audit): add --strict mode + baseline file (CI gate)
4. conductor(tracks): mark License CVE Audit track complete

NO NEW PIP DEPENDENCIES IN PROJECT
Pure stdlib (importlib.metadata, tomllib, pathlib, re) +
subprocess to pip-audit (an optional dev tool, installed via
'uv tool install pip-audit' if user wants CVE checks).
2026-06-07 14:26:22 -04:00
parent 8216d49440
commit 61b5572e2b
@@ -0,0 +1,286 @@
# Track: License & CVE Audit (Dependency Compliance)
**Status:** Spec approved 2026-06-07
**Initialized:** 2026-06-07
**Owner:** Tier 2 Tech Lead
**Priority:** High (compliance + security; CI gate)
---
## Overview
Build `scripts/audit_license_cve.py`, a single audit script that checks third-party dependencies (`pyproject.toml` plus the `uv.lock` transitive tree) for: (1) license compliance against the project's policy, (2) known CVEs (via a `pip-audit` subprocess), and (3) version pinning (every direct dep must carry a version bound; the target form is the PEP 440 compatible-release pin `~=X.Y.Z`, referred to here as a tilde pin). The script also scans source-file license headers (`SPDX-License-Identifier`) in `src/**/*.py` and `scripts/**/*.py`. The track then applies the fixes: tilde-pin all direct deps, delete `requirements.txt` (redundant with `uv.lock`), regenerate `uv.lock`, and add `--strict` mode plus a baseline file (CI gate). One script, one CI gate, one report.
The track is **scope-limited to third-party dependencies**. The project's own LICENSE file and SPDX/Copyright headers are explicitly OUT OF SCOPE — the user reserves all rights to the repo and has not picked a project license yet. The audit reports third-party state only; it does not assert or imply a project license, and it does not create a `LICENSE` file.
## Current State Audit (as of `9796fe27`)
- `pyproject.toml` has 14 direct deps with **mixed pinning**:
- 7 unconstrained: `"imgui-bundle"`, `"anthropic"`, `"google-genai"`, `"openai"`, `"fastapi"`, `"mcp"`, `"uvicorn"`
- 7 with `>=X.Y.Z`: `"pyopengl>=3.1.10"`, `"tree-sitter>=0.25.2"`, `"tree-sitter-python>=0.25.0"`, `"tree-sitter-c>=0.23.2"`, `"tree-sitter-cpp>=0.23.2"`, `"psutil>=7.2.2"`, `"chromadb>=1.5.8"`
- `"tomli-w"`, `"pytest-timeout>=2.4.0"`
- `uv.lock` exists; `requirements.txt` exists (duplicates lock — will be removed)
- No `LICENSE` file in repo root (user's chosen posture: all rights reserved; the audit reports this as informational, not a violation)
- No source-file `SPDX-License-Identifier` headers in `src/**/*.py` or `scripts/**/*.py` (informational note; not a violation — the user hasn't picked a project license yet)
- No `vendor/`, `third_party/`, or vendored C/C++ in the repo tree (the scan is defensive for the future)
- 0 existing license/CVE audit tools in `scripts/`
- The 3 existing audit scripts (`audit_main_thread_imports.py`, `audit_weak_types.py`, `check_test_toml_paths.py`) follow the project pattern of `scripts/audit_<name>.py` + `scripts/audit_<name>.baseline.json` + `--strict` mode for CI gates (per `conductor/workflow.md` "Audit Script Policy"). The new track follows the same pattern.
### Already Implemented (DO NOT re-implement; KEEP / build on)
1. **The 3 existing audit scripts** in `scripts/`. They define the project pattern for audit + CI gate. The new `scripts/audit_license_cve.py` follows the same shape.
2. **`uv.lock`** — the canonical lock file for the project. The audit reads it for transitive resolution.
3. **`importlib.metadata`** (Python 3.11+ stdlib) — gives `License` and `License-Expression` per installed distribution. No new pip dep needed for the license check.
4. **`tomllib`** (Python 3.11+ stdlib) — parses `pyproject.toml`. No new pip dep needed for the pin check.
5. **`pip-audit`** (PyPA tool) — invoked as a subprocess for the CVE check. `pip-audit` itself is NOT a project dep; it's installed via `uv tool install pip-audit` or `uvx pip-audit` if the user wants the CVE check. The script detects missing `pip-audit` and logs a warning; license + pin checks still run.
### Gaps to Fill (this track's scope)
- `scripts/audit_license_cve.py` (~300 lines, 3 internal checks + 1 source-header scan + `--strict` + `--dump-baseline`)
- `scripts/audit_license_cve.baseline.json` (zero-violation post-cleanup state for `--strict` mode)
- `docs/reports/license_cve_audit/2026-06-07/initial.md` and `final.md` (the human-readable reports)
- Updates to `pyproject.toml` (tilde-pin every direct dep)
- Updated `uv.lock` (regenerated)
- Deletion of `requirements.txt`
- `tests/test_audit_license_cve.py` (TDD unit tests)
## Goals
1. **Single audit script** that runs all four checks (license + CVE + pin + source-header) and emits a unified report.
2. **CI gate** via `--strict` mode + baseline file. Mirrors the 3 existing audit scripts. Fails on any new violation OR any new CVE.
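3. **Tilde-pin every direct dep** in `pyproject.toml` using the PEP 440 compatible-release operator (`~=X.Y.Z`, equivalent to `>=X.Y.Z,<X.(Y+1).0`; the bare `~X.Y.Z` spelling is not a valid specifier in `[project].dependencies`).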
4. **Delete `requirements.txt`** (duplicates `uv.lock`; redundant in a `uv` project).
5. **Re-run `uv lock`** to refresh the lock file with the new bounds.
6. **Document the non-OSI / restricted-source category** in the policy table of the script (so future contributors understand why these licenses are blocked).
7. **Preserve the user's "all rights reserved" posture** — no `LICENSE` file is created; no project-level SPDX headers are added.
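As a worked illustration of the pinning goal, the hypothetical helper below expands a three-component compatible-release pin into the explicit range it stands for. Under PEP 440 the operator is spelled `~=`, and `~=X.Y.Z` means `>=X.Y.Z, ==X.Y.*`. The helper handles only plain `X.Y.Z` versions; pre-releases, epochs, and longer version strings are outside this sketch.

```python
def compatible_release_range(version: str) -> tuple[str, str]:
    """Expand a PEP 440 compatible-release pin (~=X.Y.Z) into explicit bounds.

    Only plain three-part numeric versions are supported in this sketch.
    """
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected X.Y.Z, got {version!r}")
    major, minor, _patch = (int(p) for p in parts)
    lower = f">={version}"               # the pinned version is the floor
    upper = f"<{major}.{minor + 1}.0"    # the next minor release is excluded
    return lower, upper
```

For example, `compatible_release_range("7.2.2")` returns `(">=7.2.2", "<7.3.0")`, so a pin of `psutil~=7.2.2` accepts patch releases in the 7.2 series only.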
## Non-Goals
- The project's own `LICENSE` file (user's decision; not creating one).
- The project's own `SPDX-License-Identifier` / `Copyright` headers (user's decision; not adding or modifying).
- Any recommendation on what license the user should pick for the project.
- Patching CVEs in transitive deps (the track REPORTS; the user decides whether to wait for upstream or replace).
- Auto-bumping versions to address CVEs (manual decision; the track reports, the user acts).
- Modifying any third-party code already in the repo (none currently; the scan is defensive for the future).
- License/header updates to vendored C/C++ (none currently vendored; the scan is defensive).
- Separate treatment of the local-rag optional dependency group (`sentence-transformers`): it is covered by the same audit, and its pins are updated in the same `pyproject.toml` edit.
## Architecture
**`scripts/audit_license_cve.py`** — single audit script, ~300 lines. No new pip dep required (stdlib + subprocess to `pip-audit`).
### Public API (CLI)
```bash
uv run python scripts/audit_license_cve.py [--src src] [--scripts scripts] \
[--report-dir docs/reports/license_cve_audit] [--date YYYY-MM-DD] \
[--strict] [--dump-baseline]
```
- **Default mode:** informational. Prints violations to stdout (line-per-violation format). Writes markdown report to `<report-dir>/<date>/initial.md` or `final.md`.
- **`--strict` mode:** exits non-zero if violations > baseline. For CI.
- **`--dump-baseline`:** writes the current violation set as the new baseline. For intentional changes (e.g., a new dep is added; the user accepts its license).
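A minimal `argparse` sketch of this CLI surface follows. The flag names come from the usage line above; the function name `build_parser`, the help strings, and the default values (in particular defaulting `--date` to today) are assumptions rather than settled parts of the spec.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Build the command-line parser for the audit script (illustrative)."""
    parser = argparse.ArgumentParser(
        prog="audit_license_cve.py",
        description="Audit third-party dependencies: licenses, CVEs, pins, SPDX headers.",
    )
    parser.add_argument("--src", default="src",
                        help="source tree to scan for SPDX headers")
    parser.add_argument("--scripts", default="scripts",
                        help="scripts tree to scan for SPDX headers")
    parser.add_argument("--report-dir", default="docs/reports/license_cve_audit",
                        help="directory that receives the dated markdown report")
    parser.add_argument("--date", default=date.today().isoformat(),
                        help="report date, YYYY-MM-DD (assumed default: today)")
    parser.add_argument("--strict", action="store_true",
                        help="exit non-zero if violations exceed the baseline (CI gate)")
    parser.add_argument("--dump-baseline", action="store_true",
                        help="write the current violation set as the new baseline")
    return parser
```

`argparse` maps `--report-dir` and `--dump-baseline` to the attributes `report_dir` and `dump_baseline`, which matches the `args.dump_baseline` access used in the internal-structure outline.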
### Internal structure (3 checks + 1 scan)
```python
import sys

def check_licenses() -> list[Violation]: ...        # iterates dist.metadata; classifies
def check_cves() -> list[Violation]: ...            # subprocess pip-audit; parses JSON
def check_pins() -> list[Violation]: ...            # tomllib parse; flag missing/loose pins
def check_source_headers() -> list[Violation]: ...  # pathlib rglob; SPDX regex

def main() -> None:
    args = parse_args()  # argparse namespace (--strict, --dump-baseline, ...); helper name is illustrative
    violations: list[Violation] = []
    for check in (check_licenses, check_cves, check_pins, check_source_headers):
        violations.extend(check())
    for v in violations:
        print(v.format_stdout())  # parseable line-per-violation
    write_markdown_report(violations)
    # CI gate: fail when the violation count exceeds the recorded baseline.
    if args.strict and len(violations) > len(load_baseline()):
        sys.exit(1)
    if args.dump_baseline:
        dump_baseline(violations)
```
### Cost model (the 4 checks)
| Check | Mechanism | New deps? |
|-------|-----------|-----------|
| **License** | `importlib.metadata.distribution(name).metadata.get("License")` + `License-Expression` (Python 3.11+ stdlib). For each direct + transitive dep, classify the license string against the policy table. Unknown / unparseable / missing → violation. | None (stdlib) |
| **CVE** | Subprocess call to `pip-audit --format=json --strict` (a `uv tool install pip-audit` dev tool; the project itself doesn't depend on it). If `pip-audit` isn't installed, log a warning + skip the CVE check; license + pin still run. Air-gapped CI: CVE check returns no results (not a failure). | None in `pyproject.toml`; `pip-audit` is an optional dev tool. |
| **Version pin** | `tomllib.load(pyproject.toml)` (stdlib). For each entry in `[project].dependencies`, check the version specifier. Flags: (a) no specifier at all, (b) no lower bound. Accepts any lower bound as a soft check (the user's choice is tilde, but the script doesn't enforce tilde specifically — it enforces "has a lower bound"). | None (stdlib) |
| **Source header** | `pathlib.Path(src_dir).rglob("*.py")`, read first 20 lines of each, regex-look for `SPDX-License-Identifier:` (case-insensitive). If present and in the blocklist → violation. If no SPDX → no violation (informational note). | None (stdlib) |
## License Policy (encoded in the script)
### Allowlist (permissive or weak copyleft, import-safe in Python)
- **Permissive:** MIT, BSD (2-clause + 3-clause), Apache 2.0, ISC, Unlicense, Zlib, Python-2.0, 0BSD, PSF-2.0
- **Weak copyleft (import-safe in Python):** LGPL (2.1, 3.0), MPL-2.0
- **Public domain:** CC0, WTFPL
(The script's allowlist is the canonical source of truth for the per-license table; see `scripts/audit_license_cve.py` for the current list. New licenses can be added by editing that table; no spec change needed.)
### Blocklist (non-permissive / restricted-source)
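The blocklist covers two groups. GPL and AGPL are OSI-approved copyleft licenses, blocked because their share-alike obligations extend to derivative works (and, for AGPL, to network use). SSPL, BSL/BUSL, Commons Clause, and Elastic License v2 are **non-OSI**, source-available licenses that impose **use restrictions beyond standard open-source terms**. The unifying property: each license places obligations or restrictions on downstream users that the allowlisted licenses do not.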
| License | Specific restriction |
|---------|---------------------|
| **GPL** (any version) | Strong copyleft; viral licensing; downstream users must release derivative works under GPL |
| **AGPL** (any version) | Network copyleft; downstream SaaS users must release source under AGPL |
| **SSPL** (MongoDB, 2018) | "If you offer the software as a service, you must release the entire stack under SSPL" — broad service-provider trigger |
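| **BSL / BUSL** (Business Source License) | Source-available with a delayed open-source conversion; competitive-use restriction during the delay. Note that the SPDX identifier `BSL-1.0` is the permissive Boost Software License, so matching should key on `BUSL-*` or the full license name |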
| **Commons Clause** | Addendum to an open-source license; adds "you may not sell the software" — targets SaaS reselling |
| **Elastic License v2** (Elastic NV, 2021) | Prohibits providing the software to third parties as a hosted or managed service |
| **Unknown / unparseable** (e.g., `UNKNOWN`, `Custom`, `see AUTHORS`) | Not classifiable; flagged for manual review; never auto-pass |
| **Missing license metadata** | Catches packaging bugs |
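The last two rows depend on how the license string is read in the first place. A minimal sketch using `importlib.metadata` is shown below; the function names are hypothetical, and preferring `License-Expression` over the older free-text `License` field is an assumption about the desired precedence.

```python
from importlib import metadata


def license_of(meta) -> str:
    """Pick the license string from core metadata, preferring License-Expression.

    `meta` is anything with a dict-style .get(), such as dist.metadata.
    An empty string means the metadata is missing (a violation under the policy).
    """
    for field in ("License-Expression", "License"):
        value = (meta.get(field) or "").strip()
        if value:
            return value
    return ""


def installed_licenses() -> dict[str, str]:
    """Map each installed distribution name to its declared license string."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            result[name] = license_of(dist.metadata)
    return result
```

Values such as `UNKNOWN` are returned verbatim, so the classifier (not this reader) decides that they count as violations.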
### Decision rule (in the script)
```
if license in BLOCKLIST: violation
elif license in ALLOWLIST: pass
else: # unknown / unparseable / unclassified
violation (flag for manual review; never auto-pass)
```
The two lists are explicit, not heuristic. Adding a new license to either list is a one-line code change. The script's `--help` references the policy table for transparency.
## Output Format
### Stdout (line-per-violation, parseable)
```
LICENSE_VIOLATION pkg=foo license="GPL-3.0" via=bar==2.0
CVE_FOUND pkg=baz cve_id=CVE-2024-12345 severity=high fix_versions=">=1.2.3"
PIN_MISSING pkg=qux (no version specifier in pyproject.toml)
SPDX_VIOLATION file=src/some_module.py license="GPL-3.0"
```
Each line is a stable parseable format; CI can grep for `VIOLATION|FOUND|MISSING` and `exit 1` on any match.
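To illustrate that the format is machine-parseable as well as greppable, here is a small sketch that splits one line into its kind, its `key=value` fields, and any trailing free-text remark. The function name and the `note` key are hypothetical.

```python
import shlex


def parse_violation_line(line: str) -> dict[str, str]:
    """Split one stdout line into its kind, key=value fields, and free-text note."""
    tokens = shlex.split(line)  # honours the double quotes around values
    if not tokens:
        raise ValueError("empty line")
    record = {"kind": tokens[0]}
    note = []
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep and key.isidentifier():
            record[key] = value   # partition keeps any later '=' inside the value
        else:
            note.append(token)    # e.g. the parenthesised PIN_MISSING remark
    if note:
        record["note"] = " ".join(note)
    return record
```

Splitting on the first `=` only is what keeps values such as `bar==2.0` and `>=1.2.3` intact.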
### Markdown report (in `docs/reports/license_cve_audit/<YYYY-MM-DD>/`)
- `initial.md` — the discovered violations (committed in Phase 1)
- `final.md` — the post-cleanup state (committed in Phase 2, after tilde-pinning + lock regen)
Structure:
```markdown
# License & CVE Audit — 2026-06-07
## Top-level summary
- License violations: 0
- CVEs found: 0
- Pinning issues: 0
- SPDX violations in src/ or scripts/: 0
## Notes
- No `LICENSE` file in repo root — informational, not a violation. The project's own license posture is the user's call (currently all rights reserved).
- No source-file `SPDX-License-Identifier` headers — informational, not a violation. The project's own copyright headers are the user's call.
- pip-audit not installed → CVE check skipped. Install via `uv tool install pip-audit` to enable.
## Per-violation table
| Type | Package | License / CVE / Pin | Via |
|------|---------|---------------------|-----|
| ... | ... | ... | ... |
```
### Baseline file (`scripts/audit_license_cve.baseline.json`)
Internal state for `--strict` mode. JSON because it matches the existing convention (`scripts/audit_weak_types.baseline.json`). Not the user-facing report; not in the output surface. Format:
```json
{
"schema_version": 1,
"baseline_violations": [],
"baseline_date": "2026-06-07",
"notes": "Zero-violation state after the tilde-pinning + lock regen in this track."
}
```
`--strict` mode loads this file and fails CI if `len(current_violations) > len(baseline_violations)`. The user's intentional changes (e.g., adding a new dep with an acceptable license) are recorded by re-running with `--dump-baseline`.
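One caveat with a pure count comparison: if one violation is fixed while a different one appears, the totals match and the gate stays green. A set-based comparison avoids that. The sketch below assumes each baseline entry is stored as its stdout line (a string); that storage format is an assumption, since the spec fixes only the surrounding JSON keys.

```python
import json
from pathlib import Path


def load_baseline(path: Path) -> set[str]:
    """Read baseline_violations from the baseline file; a missing file means empty."""
    if not path.exists():
        return set()
    data = json.loads(path.read_text(encoding="utf-8"))
    return set(data.get("baseline_violations", []))


def new_violations(current: list[str], baseline: set[str]) -> list[str]:
    """Violations present now but absent from the baseline, in sorted order."""
    return sorted(set(current) - baseline)


def strict_exit_code(current: list[str], baseline: set[str]) -> int:
    """Return 1 if anything new appeared, else 0 (fixed violations never fail)."""
    return 1 if new_violations(current, baseline) else 0
```

With a baseline of one pin violation, a run that reports one unrelated license violation has the same count but still yields exit code 1 under this comparison.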
## Commit Structure (4 atomic commits, in order)
```
1. chore(audit): add license_cve audit script + initial report
- scripts/audit_license_cve.py (initial version, informational mode)
- docs/reports/license_cve_audit/2026-06-07/initial.md (the discovered violations)
2. chore(deps): tilde-pin all deps; delete requirements.txt
- pyproject.toml (every direct dep gets ~=X.Y.Z or stays as >=X.Y.Z)
- uv.lock (regenerated)
- requirements.txt (deleted; was redundant with lock)
3. chore(audit): add --strict mode + baseline file (CI gate)
- scripts/audit_license_cve.py (extends with --strict + baseline diff)
- scripts/audit_license_cve.baseline.json (zero-violation post-cleanup state)
4. conductor(tracks): mark License CVE Audit track complete
- tracks.md update
```
Each commit gets a `git notes add -m "..."` summary attached, per `conductor/workflow.md`.
## Verification (TDD per `conductor/workflow.md`)
Unit tests in `tests/test_audit_license_cve.py`:
- License classifier: a known fixture package list with various licenses → correct classification (blocklist + allowlist + unknown).
- Blocklist enforcement: each entry (GPL, AGPL, SSPL, BSL, BUSL, Commons Clause, Elastic v2, unknown, missing) → correctly flagged.
- Allowlist enforcement: each entry (MIT, BSD, Apache 2.0, ISC, Unlicense, Zlib, Python-2.0, 0BSD, PSF-2.0, LGPL, MPL-2.0, CC0, WTFPL) → correctly passes.
- Pin check: synthetic `pyproject.toml` with mixed pinning (no bound, `>=X.Y`, `~=X.Y.Z`, exact) → correct flags.
- Source header check: synthetic `.py` with `SPDX-License-Identifier: GPL-3.0` → flagged; with no SPDX → no violation.
- `--strict` mode: violations > baseline → exit 1; violations == baseline → exit 0; new violation (delta > 0) → exit 1.
- `--dump-baseline`: writes a baseline file matching the current violation set.
## Risks
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Some packages' license metadata is missing or unparseable in `importlib.metadata` | High | Medium (false positives on unknown) | The policy treats `UNKNOWN` as violation → manual review catches the right answer; the report's notes section lists the unknowns explicitly |
| `pip-audit` not installed in CI | Medium | Low (CVE check is a no-op) | Script detects missing `pip-audit` and logs a warning; license + pin checks still run |
| Air-gapped CI can't reach OSV / PyPI advisory DBs | Medium | Low (CVE check returns no results) | Document; a follow-up could add offline CVE support, not in this track |
| Pinning decisions are subjective (some deps deserve looser bounds than others) | Medium | Low (initial pass is conservative) | The pin check accepts any lower bound as a soft check; the user can loosen specific deps via the baseline file |
| The baseline file becomes a "shadow ledger" — needs maintenance when intentional changes are made | Medium | Low (intentional) | Document the update workflow in the script's `--help`; `--dump-baseline` regenerates the baseline after an intentional change |
| The project's own LICENSE absence might confuse a future contributor who doesn't know the user's posture | Low | Low | The report's notes section explicitly calls this out: "no LICENSE in repo root — informational, not a violation; project's own license is the user's call (currently all rights reserved)" |
| A dep is added with a license that doesn't match the script's allowlist/blocklist (e.g., a new "BSL 2.0" variant) | Low | Low | The script's default rule (unknown = violation) catches it; the report's notes section surfaces it for review; one-line add to the appropriate list |
## Follow-up
- `air_gapped_cve_check_20260607` (NOT in this track): add offline CVE support for air-gapped CI environments that can't reach OSV / PyPI. The CVE check would ship a snapshot of the advisory DBs (or use a local mirror).
- `cve_auto_remediation_20260607` (NOT in this track): when a CVE is found, auto-bump the dep to the fix version (within the pin range) and re-run the audit. Out of scope here; this track REPORTS, the user DECIDES.
## Coordination with Pending Tracks
This track has **no blockers** and **no conflicts** with the 5 active planned tracks. It modifies:
- `pyproject.toml` (version pins; could affect resolution for any future track that depends on something)
- `uv.lock` (regenerated; the lock file changes)
- `requirements.txt` (deleted; was redundant with lock)
- New: `scripts/audit_license_cve.py`, `scripts/audit_license_cve.baseline.json`, `tests/test_audit_license_cve.py`, `docs/reports/license_cve_audit/2026-06-07/`
It does NOT modify `src/`, any existing file under `tests/`, or any of the 5 planned tracks' files; `requirements.txt` is outside those tracks' scope. It can ship independently and in parallel with the 5 planned tracks.
The tilde-pinning in this track is a STRENGTHENING of the dep contract, not a loosening — it doesn't break any existing test or any other track's plan.
## See Also
- `conductor/workflow.md` "Audit Script Policy" — the convention this track follows.
- `scripts/audit_main_thread_imports.py`, `scripts/audit_weak_types.py`, `scripts/check_test_toml_paths.py` — the 3 existing audit scripts; the new track follows the same shape.
- `scripts/audit_weak_types.baseline.json` — the baseline file pattern (the new `scripts/audit_license_cve.baseline.json` mirrors this).
- [OSI Approved Licenses](https://opensource.org/licenses/): the reference list of "open source" licenses. The script's policy is informed by this list but is not identical to it: GPL and AGPL are OSI-approved yet blocklisted, while LGPL and MPL-2.0 are allowlisted because they are import-safe in Python.
- `pip-audit` (PyPA) — the CVE-checking tool invoked as a subprocess. Optional; the script handles its absence gracefully.