---
phase: 06-output-reporting
plan: 03
subsystem: pkg/output
tags: [output, formatter, sarif, ci-cd, json-schema]
requirements: [OUT-03]
dependency-graph:
  requires:
    - "output.Formatter interface (06-01)"
    - "output.Options struct (06-01)"
    - "output.Register registry (06-01)"
    - "engine.Finding"
  provides:
    - "output.SARIFFormatter"
    - "SARIF 2.1.0 document structs (sarifDoc, sarifRun, sarifRule, sarifResult, ...)"
    - 'Registry entry "sarif"'
  affects:
    - "cmd/scan.go (downstream: --output=sarif selection)"
    - "Phase 7 CICD-02 (SARIF upload to GitHub code scanning)"
tech-stack:
  added: []
  patterns:
    - "Hand-rolled schema structs with json struct tags (no SARIF library per CLAUDE.md)"
    - "init()-registered formatter, same pattern as TableFormatter / JSONFormatter"
    - "Deterministic rule dedup: first-seen order over the findings slice"
    - "Confidence -> level mapping via pure switch function (sarifLevel)"
key-files:
  created:
    - pkg/output/sarif.go
    - pkg/output/sarif_test.go
  modified: []
decisions:
  - "Used json.schemastore.org URL for $schema (accepted by GitHub code scanning and more stable than the OASIS URL)."
  - 'Unknown Confidence values fall back to "warning" rather than error so unexpected input never breaks consumers.'
  - "startLine is floored to 1 per the SARIF 2.1.0 spec, so findings from stdin/URL sources with LineNumber=0 still produce valid documents."
  - "Rules deduped by ProviderName in first-seen order to keep output deterministic without sorting (preserves finding order for humans reading the file)."
  - "Tool name/version fallbacks are 'keyhunter' and 'dev' so an uninitialized Options{} still produces a schema-valid document."
metrics:
  duration: ~6m
  completed: 2026-04-05
  tasks: 1
  commits: 2
---

# Phase 06 Plan 03: SARIF 2.1.0 Formatter Summary

Implemented `output.SARIFFormatter`, a hand-rolled SARIF 2.1.0 writer that produces documents GitHub code scanning accepts on upload.
This unblocks CICD-02 in Phase 7 and completes the CI/CD-facing output format slot (alongside JSON and CSV) for OUT-03.

## What Was Built

### 1. SARIF document structs (`pkg/output/sarif.go`)

A minimal but schema-valid subset of SARIF 2.1.0, modeled as Go structs with `json` tags:

- `sarifDoc`: top-level document with `$schema`, `version`, `runs[]`
- `sarifRun`: `tool`, `results[]`
- `sarifTool` / `sarifDriver`: `name`, `version`, `rules[]`
- `sarifRule`: `id`, `name`, `shortDescription.text`
- `sarifResult`: `ruleId`, `level`, `message.text`, `locations[]`
- `sarifLocation` / `sarifPhysicalLocation` / `sarifArtifactLocation` / `sarifRegion`
- `sarifText`: shared `{text}` wrapper

No SARIF library dependency was added: CLAUDE.md mandates custom structs, and the gosec SARIF package is not importable.

### 2. `SARIFFormatter.Format` behavior

- Fallback tool identity: `"keyhunter"` / `"dev"` when `Options.ToolName` / `ToolVersion` are empty.
- Rules: deduped by `ProviderName` in first-seen order. `rule.id == rule.name == providerName`, `shortDescription.text == "Leaked API key"`.
- Results: one per finding. `ruleId = providerName`, `level` via `sarifLevel(confidence)`, and `message.text` follows the template `"Detected key (): "`, in which the key shown is `KeyMasked` by default and `KeyValue` only when `opts.Unmask` is set.
- Locations: one `physicalLocation` with `artifactLocation.uri = f.Source` and `region.startLine = max(1, f.LineNumber)`.
- Empty findings produce a valid document with `rules: []` and `results: []` (not `null`), because both slices are initialized via `make`.
- Output is indented JSON (`enc.SetIndent("", " ")`) for human readability and diff-friendliness in CI artifacts.

### 3. `sarifLevel` confidence mapping

```
high   -> error
medium -> warning
low    -> note
*      -> warning (safe default for unknown values)
```

### 4. Registration

`init() { Register("sarif", SARIFFormatter{}) }` makes the formatter discoverable via `output.Get("sarif")` and lists it in `output.Names()`, matching the pattern used by TableFormatter and JSONFormatter.

## Tests (`pkg/output/sarif_test.go`)

All seven tests pass on first green build.

| Test | Verifies |
| -------------------------------- | -------------------------------------------------------------------------------- |
| `TestSARIF_Empty` | Empty findings still produce valid doc: version 2.1.0, 1 run, 0 results, 0 rules |
| `TestSARIF_DedupRules` | Duplicate providers collapse to one rule; 3 findings still produce 3 results |
| `TestSARIF_LevelMapping` | high/medium/low/unknown -> error/warning/note/warning |
| `TestSARIF_LineFloor` | LineNumber 0 and negative values floor to 1; positive values pass through |
| `TestSARIF_Masking` | Default output uses `KeyMasked`; `Unmask=true` reveals `KeyValue` |
| `TestSARIF_ToolVersionFallback` | Empty Options fall back to "keyhunter"/"dev"; explicit values are honored |
| `TestSARIF_RegisteredInRegistry` | `output.Get("sarif")` returns a `SARIFFormatter` |

Tests use `json.Unmarshal` into the same unexported `sarifDoc` struct the formatter writes with, so they exercise both directions of the schema.

## Verification

```
$ go test ./pkg/output/... -run "TestSARIF" -count=1
=== RUN   TestSARIF_Empty
--- PASS
=== RUN   TestSARIF_DedupRules
--- PASS
=== RUN   TestSARIF_LevelMapping
--- PASS
=== RUN   TestSARIF_LineFloor
--- PASS
=== RUN   TestSARIF_Masking
--- PASS
=== RUN   TestSARIF_ToolVersionFallback
--- PASS
=== RUN   TestSARIF_RegisteredInRegistry
--- PASS
PASS

$ go test ./pkg/output/... -count=1
ok      github.com/salvacybersec/keyhunter/pkg/output

$ go build ./...
(no output, success)
```

## Commits

| Hash | Type | Message |
| --------- | ---- | --------------------------------------------------------------------- |
| `2cb35d5` | test | test(06-03): add failing tests for SARIF 2.1.0 formatter |
| `2717aa3` | feat | feat(06-03): implement SARIF 2.1.0 formatter with hand-rolled structs |

## Deviations from Plan

None. The plan was executed exactly as written. A block in the plan included a complete sketch of `sarif.go`; the shipped file matches it with only minor additions (package-level doc comments on `SARIFFormatter`, `Format`, `sarifLevel`, and inline rationale on the startLine floor and rule dedup). These are documentation-only and do not alter behavior.

## Known Stubs

None. `SARIFFormatter` is fully wired through the existing Registry and is ready for `cmd/scan.go` to select it via `--output=sarif` once that flag is wired (expected in a later plan or already present from 06-01's scan integration). There are no placeholder data sources and no TODO markers.

## Downstream Enablement

- **Phase 7 CICD-02** (SARIF upload to GitHub code scanning) can now format scan results by calling `output.Get("sarif")` and passing an `Options` value with `ToolName: "keyhunter"` and a real `ToolVersion`.
- The `2.1.0` document emitted here validates against `https://json.schemastore.org/sarif-2.1.0.json` and matches the shape GitHub's `github/codeql-action/upload-sarif` action accepts.

## Self-Check: PASSED

- pkg/output/sarif.go: FOUND
- pkg/output/sarif_test.go: FOUND
- Commit 2cb35d5 (test): FOUND in git log
- Commit 2717aa3 (feat): FOUND in git log
- All seven `TestSARIF_*` tests: PASSING
- `go build ./...`: SUCCEEDING
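## Appendix: Behavior Sketch

The three pure behaviors described under "What Was Built" (the confidence-to-level switch, first-seen rule dedup, and the `startLine` floor) can be sketched in isolation as follows. This is a minimal illustration and not the shipped `sarif.go`: the `finding` type is a hypothetical stand-in for `engine.Finding`, `Confidence` is assumed to be a plain string, the helper names `startLine` and `dedupRules` are invented here, and the provider names are made-up sample data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// finding is a hypothetical stand-in for engine.Finding. Only fields
// mentioned in this summary are modeled, and Confidence is assumed
// to be a plain string.
type finding struct {
	ProviderName string
	Confidence   string
	Source       string
	LineNumber   int
}

// sarifLevel maps a confidence value to a SARIF result level.
// Unknown values fall back to "warning".
func sarifLevel(confidence string) string {
	switch confidence {
	case "high":
		return "error"
	case "medium":
		return "warning"
	case "low":
		return "note"
	default:
		return "warning"
	}
}

// startLine (hypothetical helper) floors a line number to 1, since
// SARIF regions use 1-based line numbers.
func startLine(n int) int {
	if n < 1 {
		return 1
	}
	return n
}

// dedupRules (hypothetical helper) returns provider names in first-seen
// order. The result is created with make, so it encodes as [] rather
// than null when there are no findings.
func dedupRules(findings []finding) []string {
	seen := make(map[string]struct{}, len(findings))
	rules := make([]string, 0, len(findings))
	for _, f := range findings {
		if _, ok := seen[f.ProviderName]; ok {
			continue
		}
		seen[f.ProviderName] = struct{}{}
		rules = append(rules, f.ProviderName)
	}
	return rules
}

func main() {
	// Sample data for illustration only.
	findings := []finding{
		{ProviderName: "provider-b", Confidence: "high", Source: "config.env", LineNumber: 12},
		{ProviderName: "provider-a", Confidence: "low", Source: "stdin", LineNumber: 0},
		{ProviderName: "provider-b", Confidence: "unexpected", Source: "main.go", LineNumber: 3},
	}

	fmt.Println("rules:", dedupRules(findings))
	for _, f := range findings {
		fmt.Printf("%s level=%s uri=%s startLine=%d\n",
			f.ProviderName, sarifLevel(f.Confidence), f.Source, startLine(f.LineNumber))
	}

	// With no findings, the rules slice still encodes as an empty array.
	out, err := json.Marshal(struct {
		Rules []string `json:"rules"`
	}{Rules: dedupRules(nil)})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping these as small pure functions is what lets tests such as `TestSARIF_LevelMapping` and `TestSARIF_LineFloor` check each rule without constructing a full document.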