---
phase: 07-import-cicd
plan: 03
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/importer/dedup.go
  - pkg/importer/dedup_test.go
  - pkg/output/sarif_github_test.go
  - testdata/sarif/sarif-2.1.0-minimal-schema.json
autonomous: true
requirements: [IMP-03, CICD-02]

must_haves:
  truths:
    - "Duplicate findings (same provider + masked key + source + line) are detected via stable hash"
    - "SARIF output from Phase 6 contains all GitHub-required fields for code scanning uploads"
  artifacts:
    - path: pkg/importer/dedup.go
      provides: "FindingKey hash + Dedup function"
      contains: "func FindingKey"
    - path: pkg/output/sarif_github_test.go
      provides: "GitHub code scanning SARIF validation test"
      contains: "TestSARIFGitHubValidation"
  key_links:
    - from: pkg/importer/dedup.go
      to: pkg/engine/finding.go
      via: "hashes engine.Finding fields"
      pattern: "engine\\.Finding"
    - from: pkg/output/sarif_github_test.go
      to: pkg/output/sarif.go
      via: "renders SARIFFormatter output and validates required fields"
      pattern: "SARIFFormatter"
---

Build two independent assets needed by Plan 07-04 and the GitHub integration story: (1) a deduplication helper for imported findings (IMP-03), and (2) a SARIF GitHub validation test that asserts Phase 6's SARIF output satisfies GitHub Code Scanning requirements (CICD-02).

Purpose: Imports will be re-run repeatedly; without dedup the database fills with copies. GitHub upload validation closes the loop on CICD-02 by proving SARIF output is acceptable without manual upload.

Output: Dedup package function, dedup unit tests, SARIF validation test, minimal schema fixture.
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md

@.planning/phases/07-import-cicd/07-CONTEXT.md
@pkg/engine/finding.go
@pkg/output/sarif.go

From pkg/output/sarif.go:

```go
type SARIFFormatter struct{}

func (SARIFFormatter) Format(findings []engine.Finding, w io.Writer, opts Options) error
```

From pkg/engine/finding.go: engine.Finding with ProviderName, KeyMasked, Source, LineNumber.

Task 1: Dedup helper for imported findings

pkg/importer/dedup.go, pkg/importer/dedup_test.go

- FindingKey(f engine.Finding) string returns hex-encoded SHA-256 over "provider\x00masked\x00source\x00line".
- Dedup(in []engine.Finding) (unique []engine.Finding, duplicates int): preserves first-seen order, drops subsequent matches of the same FindingKey, and returns the count of dropped findings.
- Two findings with the same provider+masked+source+line are duplicates regardless of other fields (DetectedAt, Confidence).
- Different source paths or different line numbers are NOT duplicates.

Create pkg/importer/dedup.go:

```go
package importer

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"

	"github.com/salvacybersec/keyhunter/pkg/engine"
)

// FindingKey returns a stable identity hash for a finding based on the
// provider name, masked key, source path, and line number. This is the
// dedup identity used by import pipelines so the same underlying secret
// is not inserted twice when re-importing the same scanner output.
func FindingKey(f engine.Finding) string {
	payload := fmt.Sprintf("%s\x00%s\x00%s\x00%d", f.ProviderName, f.KeyMasked, f.Source, f.LineNumber)
	sum := sha256.Sum256([]byte(payload))
	return hex.EncodeToString(sum[:])
}

// Dedup removes duplicate findings from in-memory slices before insert.
// Order of first-seen findings is preserved. Returns the deduplicated
// slice and the number of duplicates dropped.
func Dedup(in []engine.Finding) ([]engine.Finding, int) {
	seen := make(map[string]struct{}, len(in))
	out := make([]engine.Finding, 0, len(in))
	dropped := 0
	for _, f := range in {
		k := FindingKey(f)
		if _, ok := seen[k]; ok {
			dropped++
			continue
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out, dropped
}
```

Create pkg/importer/dedup_test.go with tests:

- TestFindingKey_Stable: same finding twice -> identical key.
- TestFindingKey_DiffersByProvider / ByMasked / BySource / ByLine.
- TestDedup_PreservesOrder: input [A, B, A, C, B] -> output [A, B, C], dropped=2.
- TestDedup_Empty: nil slice -> empty slice, 0 dropped.
- TestDedup_IgnoresUnrelatedFields: two findings identical except DetectedAt and Confidence -> one kept.

cd /home/salva/Documents/apikey && go test ./pkg/importer/... -run 'FindingKey|Dedup' -v

- FindingKey + Dedup implemented
- 5 tests pass

Task 2: SARIF GitHub code scanning validation test

pkg/output/sarif_github_test.go, testdata/sarif/sarif-2.1.0-minimal-schema.json

Create testdata/sarif/sarif-2.1.0-minimal-schema.json, a minimal JSON document listing GitHub's required SARIF fields for code scanning upload. This is not the full schema (which would be 500KB) but the required-fields subset documented at https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning. Content:

```json
{
  "required_top_level": ["$schema", "version", "runs"],
  "required_run": ["tool", "results"],
  "required_tool_driver": ["name", "version"],
  "required_result": ["ruleId", "level", "message", "locations"],
  "required_location_physical": ["artifactLocation", "region"],
  "required_region": ["startLine"],
  "allowed_levels": ["error", "warning", "note", "none"]
}
```

Create pkg/output/sarif_github_test.go (package `output`):

- TestSARIFGitHubValidation:
  1. Build a []engine.Finding of 3 findings spanning high/medium/low confidence with realistic values (ProviderName, KeyValue, KeyMasked, Source, LineNumber).
  2. Render via SARIFFormatter.Format into a bytes.Buffer with Options{ToolName: "keyhunter", ToolVersion: "test"}.
  3. json.Unmarshal into map[string]any.
  4. Load testdata/sarif/sarif-2.1.0-minimal-schema.json (relative to test file via os.ReadFile).
  5. Assert every key in required_top_level exists at root.
  6. Assert doc["version"] == "2.1.0".
  7. Assert doc["$schema"] is a non-empty string starting with "https://".
  8. runs := doc["runs"].([]any); require len(runs) == 1.
  9. For the single run, assert tool.driver.name == "keyhunter", version non-empty, results is a slice.
  10. For each result: assert ruleId is a non-empty string, level is in allowed_levels, message.text is non-empty, and locations is a non-empty slice.
  11. For each location: assert physicalLocation.artifactLocation.uri is non-empty and physicalLocation.region.startLine >= 1.
  12. Assert startLine is always >= 1 even when input LineNumber is 0: test one finding with LineNumber: 0 and confirm startLine in the output == 1, which matches the Phase 6 floor behavior.
- TestSARIFGitHubValidation_EmptyFindings: an empty findings slice still produces a valid document with runs[0].results == [] (not null) and tool.driver present.

Use the standard library only (bytes, encoding/json, os, path/filepath, strings, testing). No schema validation library.

cd /home/salva/Documents/apikey && go test ./pkg/output/... -run SARIFGitHub -v

- testdata/sarif/sarif-2.1.0-minimal-schema.json committed
- pkg/output/sarif_github_test.go passes
- SARIFFormatter output provably satisfies GitHub Code Scanning required fields

go test ./pkg/importer/... ./pkg/output/... passes.

Dedup helper usable by the import command (07-04). SARIF output validated against GitHub's required-field surface with no external dependencies, proving CICD-02 end-to-end.

After completion, create `.planning/phases/07-import-cicd/07-03-SUMMARY.md`.