2026-04-05 23:53:14 +03:00

---
phase: 07-import-cicd
plan: 03
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/importer/dedup.go
  - pkg/importer/dedup_test.go
  - pkg/output/sarif_github_test.go
  - testdata/sarif/sarif-2.1.0-minimal-schema.json
autonomous: true
requirements:
  - IMP-03
  - CICD-02
must_haves:
  truths:
    - "Duplicate findings (same provider + masked key + source + line number) are detected via stable hash"
    - "SARIF output from Phase 6 contains all GitHub-required fields for code scanning uploads"
  artifacts:
    - path: pkg/importer/dedup.go
      provides: "FindingKey hash + Dedup function"
      contains: "func FindingKey"
    - path: pkg/output/sarif_github_test.go
      provides: "GitHub code scanning SARIF validation test"
      contains: "TestSARIFGitHubValidation"
  key_links:
    - from: pkg/importer/dedup.go
      to: pkg/engine/finding.go
      via: "hashes engine.Finding fields"
      pattern: "engine.Finding"
    - from: pkg/output/sarif_github_test.go
      to: pkg/output/sarif.go
      via: "renders SARIFFormatter output and validates required fields"
      pattern: "SARIFFormatter"
---

Build two independent assets needed by Plan 07-04 and the GitHub integration story: (1) deduplication helper for imported findings (IMP-03), (2) a SARIF GitHub validation test that asserts Phase 6's SARIF output satisfies GitHub Code Scanning requirements (CICD-02).

Purpose: Imports will be re-run repeatedly; without dedup the database fills with copies. GitHub upload validation closes the loop on CICD-02 by proving SARIF output is acceptable without manual upload.

Output: Dedup package function, dedup unit tests, SARIF validation test, minimal schema fixture.

<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/phases/07-import-cicd/07-CONTEXT.md
@pkg/engine/finding.go
@pkg/output/sarif.go

From pkg/output/sarif.go:
```go
type SARIFFormatter struct{}
func (SARIFFormatter) Format(findings []engine.Finding, w io.Writer, opts Options) error
```

From pkg/engine/finding.go: engine.Finding with ProviderName, KeyMasked, Source, LineNumber.

Task 1: Dedup helper for imported findings

Files: pkg/importer/dedup.go, pkg/importer/dedup_test.go

- FindingKey(f engine.Finding) string returns hex-encoded SHA-256 over "provider\x00masked\x00source\x00line".
- Dedup(in []engine.Finding) (unique []engine.Finding, duplicates int): preserves first-seen order, drops subsequent matches of the same FindingKey, returns count of dropped.
- Two findings with same provider+masked+source+line are duplicates regardless of other fields (DetectedAt, Confidence).
- Different source paths or different line numbers are NOT duplicates.

Create pkg/importer/dedup.go:
```go
package importer

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"

    "github.com/salvacybersec/keyhunter/pkg/engine"
)

// FindingKey returns a stable identity hash for a finding based on the
// provider name, masked key, source path, and line number. This is the
// dedup identity used by import pipelines so the same underlying secret
// is not inserted twice when re-importing the same scanner output.
func FindingKey(f engine.Finding) string {
    payload := fmt.Sprintf("%s\x00%s\x00%s\x00%d", f.ProviderName, f.KeyMasked, f.Source, f.LineNumber)
    sum := sha256.Sum256([]byte(payload))
    return hex.EncodeToString(sum[:])
}

// Dedup removes duplicate findings from in-memory slices before insert.
// Order of first-seen findings is preserved. Returns the deduplicated
// slice and the number of duplicates dropped.
func Dedup(in []engine.Finding) ([]engine.Finding, int) {
    seen := make(map[string]struct{}, len(in))
    out := make([]engine.Finding, 0, len(in))
    dropped := 0
    for _, f := range in {
        k := FindingKey(f)
        if _, ok := seen[k]; ok {
            dropped++
            continue
        }
        seen[k] = struct{}{}
        out = append(out, f)
    }
    return out, dropped
}
```
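
The NUL separators in the FindingKey payload matter because plain concatenation lets two different field splits produce identical bytes. The sketch below illustrates this with two hypothetical helpers, `keyPlain` and `keyJoined`, which take plain strings instead of engine.Finding; neither is part of pkg/importer, and the sample values are made up.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// keyPlain hashes the fields with no separator. Hypothetical, shown only
// for contrast with the separated layout.
func keyPlain(provider, masked, source string, line int) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s%s%s%d", provider, masked, source, line)))
	return hex.EncodeToString(sum[:])
}

// keyJoined uses the same payload layout as FindingKey: fields joined by NUL.
func keyJoined(provider, masked, source string, line int) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s\x00%s\x00%s\x00%d", provider, masked, source, line)))
	return hex.EncodeToString(sum[:])
}

func main() {
	// "open"+"ai-sk***" and "openai"+"-sk***" concatenate to the same bytes.
	plainEqual := keyPlain("open", "ai-sk***", "a.env", 1) == keyPlain("openai", "-sk***", "a.env", 1)
	joinedEqual := keyJoined("open", "ai-sk***", "a.env", 1) == keyJoined("openai", "-sk***", "a.env", 1)

	fmt.Println("plain keys equal:", plainEqual)   // plain keys equal: true
	fmt.Println("joined keys equal:", joinedEqual) // joined keys equal: false
}
```

The separated form stays unambiguous as long as no field contains a NUL byte itself, which is a reasonable assumption for provider names, masked keys, and file paths.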

Create pkg/importer/dedup_test.go with tests:
- TestFindingKey_Stable: same finding twice -> identical key.
- TestFindingKey_DiffersByProvider / ByMasked / BySource / ByLine.
- TestDedup_PreservesOrder: input [A, B, A, C, B] -> output [A, B, C], dropped=2.
- TestDedup_Empty: nil slice -> empty slice, 0 dropped.
- TestDedup_IgnoresUnrelatedFields: two findings identical except DetectedAt and Confidence -> one kept.
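
The expectations listed above can be tried out in isolation with a small stand-alone program. This is a sketch under stated assumptions: `Finding` is a local stand-in for engine.Finding that carries only the fields the plan names (the type of `Confidence` is a guess), and `findingKey` / `dedup` are lower-case copies of the functions from dedup.go so the example runs without the keyhunter module.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Finding is a stand-in for engine.Finding; the real struct may differ.
type Finding struct {
	ProviderName string
	KeyMasked    string
	Source       string
	LineNumber   int
	Confidence   string // assumed type; not part of the identity hash
}

// findingKey copies the payload layout of FindingKey.
func findingKey(f Finding) string {
	payload := fmt.Sprintf("%s\x00%s\x00%s\x00%d", f.ProviderName, f.KeyMasked, f.Source, f.LineNumber)
	sum := sha256.Sum256([]byte(payload))
	return hex.EncodeToString(sum[:])
}

// dedup copies the logic of Dedup: keep first-seen findings, count the rest.
func dedup(in []Finding) ([]Finding, int) {
	seen := make(map[string]struct{}, len(in))
	out := make([]Finding, 0, len(in))
	dropped := 0
	for _, f := range in {
		k := findingKey(f)
		if _, ok := seen[k]; ok {
			dropped++
			continue
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out, dropped
}

func main() {
	a := Finding{ProviderName: "openai", KeyMasked: "sk-a***", Source: "a.env", LineNumber: 1}
	b := Finding{ProviderName: "openai", KeyMasked: "sk-a***", Source: "b.env", LineNumber: 1}
	c := Finding{ProviderName: "openai", KeyMasked: "sk-a***", Source: "a.env", LineNumber: 2}

	out, dropped := dedup([]Finding{a, b, a, c, b})
	for _, f := range out {
		fmt.Println(f.Source, f.LineNumber)
	}
	fmt.Println("dropped:", dropped) // dropped: 2
}
```

Because `out` is allocated with `make`, a nil input yields an empty non-nil slice, which is what TestDedup_Empty expects.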
Verify: cd /home/salva/Documents/apikey && go test ./pkg/importer/... -run 'Dedup|FindingKey' -v

Done:
- FindingKey + Dedup implemented
- All 5 test groups listed above pass

Task 2: SARIF GitHub code scanning validation test

Files: pkg/output/sarif_github_test.go, testdata/sarif/sarif-2.1.0-minimal-schema.json

Create testdata/sarif/sarif-2.1.0-minimal-schema.json, a minimal JSON document listing GitHub's required SARIF fields for code scanning upload. This is not the full schema (would be 500KB), only the required-fields subset documented at https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning. Content:
```json
{
  "required_top_level": ["$schema", "version", "runs"],
  "required_run": ["tool", "results"],
  "required_tool_driver": ["name", "version"],
  "required_result": ["ruleId", "level", "message", "locations"],
  "required_location_physical": ["artifactLocation", "region"],
  "required_region": ["startLine"],
  "allowed_levels": ["error", "warning", "note", "none"]
}
```
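
One possible way to consume the fixture in a test is to decode it into a small typed struct rather than a generic map. The sketch below is illustrative only: the `requiredFields` type and `parseFixture` helper are hypothetical names, and the fixture content is inlined as a constant so the example runs without the file (a real test would read it with os.ReadFile).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requiredFields mirrors the keys of the minimal fixture (hypothetical type).
type requiredFields struct {
	TopLevel         []string `json:"required_top_level"`
	Run              []string `json:"required_run"`
	ToolDriver       []string `json:"required_tool_driver"`
	Result           []string `json:"required_result"`
	LocationPhysical []string `json:"required_location_physical"`
	Region           []string `json:"required_region"`
	AllowedLevels    []string `json:"allowed_levels"`
}

// fixtureJSON inlines the fixture content for this sketch.
const fixtureJSON = `{
  "required_top_level": ["$schema", "version", "runs"],
  "required_run": ["tool", "results"],
  "required_tool_driver": ["name", "version"],
  "required_result": ["ruleId", "level", "message", "locations"],
  "required_location_physical": ["artifactLocation", "region"],
  "required_region": ["startLine"],
  "allowed_levels": ["error", "warning", "note", "none"]
}`

// parseFixture decodes the fixture bytes into requiredFields.
func parseFixture(data []byte) (requiredFields, error) {
	var rf requiredFields
	err := json.Unmarshal(data, &rf)
	return rf, err
}

func main() {
	rf, err := parseFixture([]byte(fixtureJSON))
	if err != nil {
		panic(err)
	}
	fmt.Println("top-level keys:", rf.TopLevel)
	fmt.Println("allowed levels:", rf.AllowedLevels)
}
```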
Create pkg/output/sarif_github_test.go (package `output`):
- TestSARIFGitHubValidation:
  1. Build a []engine.Finding of 3 findings spanning high/medium/low confidence with realistic values (ProviderName, KeyValue, KeyMasked, Source, LineNumber).
  2. Render via SARIFFormatter.Format into a bytes.Buffer with Options{ToolName: "keyhunter", ToolVersion: "test"}.
  3. json.Unmarshal into map[string]any.
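  4. Load the fixture via os.ReadFile. go test runs with the package directory (pkg/output) as its working directory, so the path from the test file is ../../testdata/sarif/sarif-2.1.0-minimal-schema.json.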
  5. Assert every key in required_top_level exists at root.
  6. Assert doc["version"] == "2.1.0".
  7. Assert doc["$schema"] is a non-empty string starting with "https://".
  8. runs := doc["runs"].([]any); require len(runs) == 1.
  9. For the single run, assert tool.driver.name == "keyhunter", version non-empty, results is a slice.
  10. For each result: assert ruleId non-empty string, level in allowed_levels, message.text non-empty, locations is non-empty slice.
  11. For each location: assert physicalLocation.artifactLocation.uri non-empty and physicalLocation.region.startLine >= 1.
  12. Assert startLine is always >= 1 even when input LineNumber is 0 (test one finding with LineNumber: 0 and confirm startLine in output == 1 — matches Phase 6 floor behavior).
- TestSARIFGitHubValidation_EmptyFindings: empty findings slice still produces a valid document with runs[0].results == [] (not null), tool.driver present.
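
The `[]` versus `null` distinction in the empty-findings test comes from how encoding/json treats slices: a nil slice marshals to `null`, while an empty non-nil slice marshals to `[]`. The sketch below demonstrates this with a hypothetical trimmed-down `run` type; it makes no claim about how SARIFFormatter builds its results, which is what the test is meant to verify.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// run is a hypothetical stand-in for a SARIF run object with only a
// results field; the real formatter's types are not shown in this plan.
type run struct {
	Results []string `json:"results"`
}

// encode marshals r and returns the JSON text.
func encode(r run) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(run{}))                    // {"results":null}
	fmt.Println(encode(run{Results: []string{}})) // {"results":[]}
}
```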

Use standard library only (bytes, encoding/json, os, path/filepath, strings, testing). No schema validation library.
Verify: cd /home/salva/Documents/apikey && go test ./pkg/output/... -run SARIFGitHub -v

Done:
- testdata/sarif/sarif-2.1.0-minimal-schema.json committed
- pkg/output/sarif_github_test.go passes
- SARIFFormatter output provably satisfies GitHub Code Scanning required fields

Verification: go test ./pkg/importer/... ./pkg/output/... passes.

<success_criteria> Dedup helper usable by the import command (07-04). SARIF output validated against GitHub's required-field surface with no external dependencies, proving CICD-02 end-to-end. </success_criteria>

After completion, create `.planning/phases/07-import-cicd/07-03-SUMMARY.md`.