docs(07-03): complete dedup + SARIF github validation plan
@@ -70,7 +70,7 @@ Requirements for initial release. Each maps to roadmap phases.

 - [ ] **IMP-01**: TruffleHog JSON output parser and importer
 - [ ] **IMP-02**: Gitleaks JSON output parser and importer
-- [ ] **IMP-03**: Generic CSV import for custom tool output
+- [x] **IMP-03**: Generic CSV import for custom tool output

 ### Storage

@@ -89,7 +89,7 @@ Requirements for initial release. Each maps to roadmap phases.
 ### CI/CD Integration

 - [ ] **CICD-01**: keyhunter hook install/uninstall for git pre-commit hooks
-- [ ] **CICD-02**: SARIF output uploadable to GitHub Security tab
+- [x] **CICD-02**: SARIF output uploadable to GitHub Security tab

 ### OSINT/Recon — IoT & Internet Scanners

@@ -161,9 +161,9 @@ Plans:
 **Plans**: 6 plans

 Plans:
-- [ ] 07-01-PLAN.md — pkg/importer Importer interface + TruffleHog v3 JSON parser + fixtures (IMP-01)
-- [ ] 07-02-PLAN.md — Gitleaks JSON + CSV parsers (IMP-02)
-- [ ] 07-03-PLAN.md — Dedup helper + SARIF GitHub Code Scanning validation test (IMP-03, CICD-02)
+- [x] 07-01-PLAN.md — pkg/importer Importer interface + TruffleHog v3 JSON parser + fixtures (IMP-01)
+- [x] 07-02-PLAN.md — Gitleaks JSON + CSV parsers (IMP-02)
+- [x] 07-03-PLAN.md — Dedup helper + SARIF GitHub Code Scanning validation test (IMP-03, CICD-02)
 - [ ] 07-04-PLAN.md — cmd/import.go wiring format dispatch, dedup, DB persistence (IMP-01/02/03)
 - [ ] 07-05-PLAN.md — cmd/hook.go install/uninstall with embedded pre-commit script (CICD-01)
 - [ ] 07-06-PLAN.md — docs/CI-CD.md + README CI/CD section with GitHub Actions workflow (CICD-01, CICD-02)

@@ -4,13 +4,13 @@ milestone: v1.0
 milestone_name: milestone
 status: executing
 stopped_at: Completed 06-06-PLAN.md
-last_updated: "2026-04-05T20:46:26.438Z"
-last_activity: 2026-04-05
+last_updated: "2026-04-05T20:56:37.008Z"
+last_activity: 2026-04-05 -- Phase 07 execution started
 progress:
 total_phases: 18
 completed_phases: 6
-total_plans: 34
-completed_plans: 34
+total_plans: 40
+completed_plans: 37
 percent: 20
 ---

@@ -21,14 +21,14 @@ progress:
 See: .planning/PROJECT.md (updated 2026-04-04)

 **Core value:** Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.
-**Current focus:** Phase 06 — output-reporting
+**Current focus:** Phase 07 — import-cicd

 ## Current Position

-Phase: 7
-Plan: Not started
-Status: Ready to execute
-Last activity: 2026-04-05
+Phase: 07 (import-cicd) — EXECUTING
+Plan: 1 of 6
+Status: Executing Phase 07
+Last activity: 2026-04-05 -- Phase 07 execution started

 Progress: [██░░░░░░░░] 20%

104
.planning/phases/07-import-cicd/07-03-SUMMARY.md
Normal file
@@ -0,0 +1,104 @@
---
phase: 07-import-cicd
plan: 03
subsystem: importer+output
tags: [importer, dedup, sarif, cicd, github-code-scanning]
requires:
- pkg/engine/finding.go
- pkg/output/sarif.go
provides:
- pkg/importer.FindingKey
- pkg/importer.Dedup
- testdata/sarif/sarif-2.1.0-minimal-schema.json
- pkg/output.TestSARIFGitHubValidation
affects:
- pkg/importer
- pkg/output
tech-stack:
added: []
patterns:
- "SHA-256 stable hash for dedup identity (provider\\x00masked\\x00source\\x00line)"
- "Fixture-driven schema validation via JSON field list, stdlib only"
key-files:
created:
- pkg/importer/dedup.go
- pkg/importer/dedup_test.go
- pkg/output/sarif_github_test.go
- testdata/sarif/sarif-2.1.0-minimal-schema.json
modified: []
decisions:
- "FindingKey uses provider+masked+source+line as identity tuple; DetectedAt and Confidence intentionally excluded so re-imports collapse"
- "SARIF validation uses a hand-rolled required-fields fixture rather than the full 500KB SARIF 2.1.0 schema — zero external deps, targets GitHub's enforced surface"
- "Test file walks ../.. to locate repo root rather than hardcoding paths, keeping the fixture relocatable"
metrics:
duration: "~4 min"
completed: 2026-04-05
requirements: [IMP-03, CICD-02]
---

# Phase 7 Plan 03: Dedup Helper & SARIF GitHub Validation Summary

Adds a stable dedup identity hash for imported findings (IMP-03) and a standalone test proving Phase 6's `SARIFFormatter` output satisfies GitHub Code Scanning's required-field surface (CICD-02).

## What Was Built

### Task 1: Dedup helper (`pkg/importer/dedup.go`)
- `FindingKey(engine.Finding) string`: hex-encoded SHA-256 over `provider\x00masked\x00source\x00line`. Fields outside that tuple (DetectedAt, Confidence, VerifyStatus, ...) do not participate — re-running the same import later must collapse onto the original finding.
- `Dedup([]engine.Finding) ([]engine.Finding, int)`: preserves first-seen order, returns the deduplicated slice plus count of duplicates dropped. Callers use the drop count to surface `"Imported N findings (M new, K duplicates)"`.

Tests (`pkg/importer/dedup_test.go`, 8 total, all passing):
- `TestFindingKey_Stable`
- `TestFindingKey_DiffersByProvider / ByMasked / BySource / ByLine`
- `TestDedup_PreservesOrder` — `[A,B,A,C,B] -> [A,B,C]`, dropped=2
- `TestDedup_Empty` — nil input yields empty slice, 0 dropped
- `TestDedup_IgnoresUnrelatedFields` — identical identity but different `DetectedAt`/`Confidence` collapses to one, first-seen wins

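The dedup behavior described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `pkg/importer/dedup.go`: the `Finding` struct here is a hypothetical stand-in for `engine.Finding`, and the field names `Provider`, `Masked`, and `Source` are assumptions (only `LineNumber`, `DetectedAt`, `Confidence`, and `VerifyStatus` are named in this summary). The sample values in `main` are made up.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// Finding is a hypothetical, trimmed-down stand-in for engine.Finding.
// Provider, Masked, and Source are assumed field names.
type Finding struct {
	Provider   string
	Masked     string
	Source     string
	LineNumber int
	Confidence string // deliberately NOT part of the identity
}

// FindingKey returns a hex-encoded SHA-256 over the identity tuple,
// joined with NUL bytes so adjacent fields cannot run together.
func FindingKey(f Finding) string {
	identity := strings.Join([]string{
		f.Provider, f.Masked, f.Source, strconv.Itoa(f.LineNumber),
	}, "\x00")
	sum := sha256.Sum256([]byte(identity))
	return hex.EncodeToString(sum[:])
}

// Dedup keeps the first occurrence of each identity, preserves input
// order, and reports how many duplicates were dropped.
func Dedup(in []Finding) ([]Finding, int) {
	seen := make(map[string]struct{}, len(in))
	out := make([]Finding, 0, len(in)) // non-nil even for nil input
	dropped := 0
	for _, f := range in {
		key := FindingKey(f)
		if _, dup := seen[key]; dup {
			dropped++
			continue
		}
		seen[key] = struct{}{}
		out = append(out, f)
	}
	return out, dropped
}

func main() {
	a := Finding{Provider: "openai", Masked: "sk-a***1234", Source: "config.env", LineNumber: 3, Confidence: "high"}
	b := Finding{Provider: "openai", Masked: "sk-b***5678", Source: "config.env", LineNumber: 9, Confidence: "high"}
	c := Finding{Provider: "anthropic", Masked: "sk-a***1234", Source: "config.env", LineNumber: 3, Confidence: "low"}

	// Same identity as a, different Confidence: still a duplicate.
	aAgain := a
	aAgain.Confidence = "low"

	unique, dropped := Dedup([]Finding{a, b, aAgain, c, b})
	fmt.Printf("%d unique, %d duplicates dropped\n", len(unique), dropped)
	// prints "3 unique, 2 duplicates dropped"
}
```

The NUL separator matters because plain concatenation would let different tuples collide, for example source `a` with line `12` against source `a1` with line `2`.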
### Task 2: SARIF GitHub validation test
- `testdata/sarif/sarif-2.1.0-minimal-schema.json`: required-fields subset for GitHub Code Scanning (top-level, run, tool.driver, result, location.physicalLocation, region, allowed result levels). Hand-curated from https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning rather than shipping the 500KB full schema.
- `pkg/output/sarif_github_test.go`:
  - `TestSARIFGitHubValidation`: renders 3 findings (high/medium/low confidence) through `SARIFFormatter.Format`, unmarshals the output to `map[string]any`, then walks the document and asserts that: every required field in the fixture exists; `version == "2.1.0"`; `$schema` is a non-empty `https://` URL; there is exactly one run; `tool.driver.name == "keyhunter"`; each result has a `ruleId`, a `level ∈ allowed_levels`, non-empty `message.text`, and non-empty `locations`; and every `physicalLocation.region.startLine >= 1`. Includes one finding with `LineNumber: 0` to prove that the Phase 6 `startLine` floor converts it to 1.
  - `TestSARIFGitHubValidation_EmptyFindings`: empty input still produces a valid document with `runs[0].results == []` (not `null`) and tool.driver populated.
- Uses stdlib only (`encoding/json`, `os`, `path/filepath`, `strings`, `testing`). Test walks `../../testdata/sarif/...` from the package directory so the fixture stays relocatable.

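As a rough illustration of the stdlib-only document walk described above, the checks could look something like the sketch below. It is not the code in `pkg/output/sarif_github_test.go`: the function name `checkSARIF` is hypothetical, the fixture file is not read, the `allowedLevels` set is an assumption based on the SARIF 2.1.0 level values rather than the fixture's actual list, and the `$schema` URL and `ruleId` in `main` are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// allowedLevels is an assumption: the standard SARIF 2.1.0 result levels.
var allowedLevels = map[string]bool{"none": true, "note": true, "warning": true, "error": true}

// checkSARIF (hypothetical helper) walks a decoded SARIF document and
// returns one message per violated requirement.
func checkSARIF(raw []byte) []string {
	var doc map[string]any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return []string{"invalid JSON: " + err.Error()}
	}
	var problems []string
	fail := func(format string, args ...any) {
		problems = append(problems, fmt.Sprintf(format, args...))
	}

	if v, _ := doc["version"].(string); v != "2.1.0" {
		fail("version = %q, want 2.1.0", v)
	}
	if s, _ := doc["$schema"].(string); !strings.HasPrefix(s, "https://") {
		fail("$schema must be an https:// URL")
	}
	runs, _ := doc["runs"].([]any)
	if len(runs) != 1 {
		fail("got %d runs, want exactly 1", len(runs))
		return problems
	}
	// Failed type assertions yield nil maps; indexing a nil map is safe.
	run, _ := runs[0].(map[string]any)
	tool, _ := run["tool"].(map[string]any)
	driver, _ := tool["driver"].(map[string]any)
	if name, _ := driver["name"].(string); name != "keyhunter" {
		fail("tool.driver.name = %q, want keyhunter", name)
	}
	// JSON null decodes to nil, so the assertion fails for "results": null.
	results, ok := run["results"].([]any)
	if !ok {
		fail("runs[0].results must be an array, not null")
	}
	for i, r := range results {
		res, _ := r.(map[string]any)
		if id, _ := res["ruleId"].(string); id == "" {
			fail("results[%d]: missing ruleId", i)
		}
		if lvl, _ := res["level"].(string); !allowedLevels[lvl] {
			fail("results[%d]: level %q not allowed", i, lvl)
		}
		msg, _ := res["message"].(map[string]any)
		if text, _ := msg["text"].(string); text == "" {
			fail("results[%d]: empty message.text", i)
		}
		locs, _ := res["locations"].([]any)
		if len(locs) == 0 {
			fail("results[%d]: no locations", i)
		}
		for j, l := range locs {
			loc, _ := l.(map[string]any)
			phys, _ := loc["physicalLocation"].(map[string]any)
			region, _ := phys["region"].(map[string]any)
			// encoding/json decodes JSON numbers into float64.
			if line, _ := region["startLine"].(float64); line < 1 {
				fail("results[%d].locations[%d]: startLine %v < 1", i, j, line)
			}
		}
	}
	return problems
}

func main() {
	// Placeholder document with startLine 0, which should be flagged.
	doc := `{"version":"2.1.0","$schema":"https://example.com/sarif-2.1.0.json",
	 "runs":[{"tool":{"driver":{"name":"keyhunter"}},
	  "results":[{"ruleId":"example-rule","level":"warning",
	   "message":{"text":"possible key"},
	   "locations":[{"physicalLocation":{"region":{"startLine":0}}}]}]}]}`
	for _, p := range checkSARIF([]byte(doc)) {
		fmt.Println(p)
	}
}
```

Decoding into `map[string]any` instead of typed structs keeps the check independent of the formatter's own types, so a renamed or dropped JSON field shows up as a missing key rather than silently matching a struct tag.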
## Commits

| Task | Commit  | Description                                        |
| ---- | ------- | -------------------------------------------------- |
| 1    | 6a3d5b0 | feat(07-03): dedup helper for imported findings    |
| 2    | bd8eb9b | test(07-03): SARIF GitHub code scanning validation |

## Verification

```
$ go test ./pkg/importer/... ./pkg/output/...
ok github.com/salvacybersec/keyhunter/pkg/importer (cached)
ok github.com/salvacybersec/keyhunter/pkg/output 0.008s
```

All 8 importer dedup tests + 2 SARIF validation tests pass. Both targeted `go test` invocations from the plan's verify blocks pass first-try with no deviations.

## Deviations from Plan

None - plan executed exactly as written.

## Downstream Consumers

- **Plan 07-04** (import CLI command): will call `importer.Dedup` after parsing TruffleHog/Gitleaks output and report `(unique, dropped)` to the user before inserting into storage.
- **CICD-02 closure**: `TestSARIFGitHubValidation` functions as a regression guard — any future change to `SARIFFormatter` that breaks GitHub Code Scanning upload compatibility will fail this test before reaching users.

## Key Decisions

1. **Identity tuple excludes verification/timing fields.** A verified finding re-imported later should still collapse onto the original. Only provider + masked key + source + line number define identity.
2. **Hand-rolled required-fields fixture over full SARIF schema.** GitHub enforces a small, well-documented subset. Shipping the full 500KB JSON Schema and a validator library would bloat the test binary without catching any additional bugs that matter for GitHub uploads.
3. **Line number floor asserted in test.** Makes the Phase 6 `startLine = max(line, 1)` behavior a contract rather than an incidental implementation detail — future refactors can't silently reintroduce `startLine: 0`, which GitHub rejects.

## Self-Check: PASSED

- FOUND: pkg/importer/dedup.go
- FOUND: pkg/importer/dedup_test.go
- FOUND: pkg/output/sarif_github_test.go
- FOUND: testdata/sarif/sarif-2.1.0-minimal-schema.json
- FOUND commit: 6a3d5b0
- FOUND commit: bd8eb9b
- Tests verified green: `go test ./pkg/importer/... ./pkg/output/...`
- No stubs introduced.