# Phase 7: Import Adapters & CI/CD Integration - Context

Gathered: 2026-04-05 | Status: Ready for planning | Mode: Auto-generated

## Phase Boundary

Three capabilities:

  1. Import findings from TruffleHog v3 JSON and Gitleaks JSON/CSV into the SQLite database (normalize to KeyHunter Finding schema, deduplicate)
  2. Git pre-commit hook install/uninstall that runs keyhunter scan on staged files and blocks commits with findings
  3. SARIF 2.1.0 output already built in Phase 6 — this phase verifies it passes GitHub Code Scanning validation
## Implementation Decisions

### Import Package (IMP-01, IMP-02, IMP-03)

  • New package: pkg/importer/
  • One file per format: pkg/importer/trufflehog.go, pkg/importer/gitleaks.go
  • Common interface: Importer with Import(r io.Reader) ([]engine.Finding, error)
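  • TruffleHog v3 JSON schema: objects with {SourceID, SourceMetadata, SourceName, DetectorName, DetectorType, Verified, Raw, Redacted, ExtraData}. TruffleHog's --json output is newline-delimited (one object per line), so the parser should accept that form as well as a single array. Map: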
    • DetectorName → Finding.ProviderName (lowercase)
    • Raw → Finding.KeyValue
    • Verified → Finding.VerifyStatus (live/unverified)
    • SourceMetadata → Finding.Source (nested JSON; the path sits under a source-specific key such as Data.Git.file or Data.Filesystem.file, so it needs path-based extraction)
  • Gitleaks JSON schema: array of {Description, StartLine, EndLine, StartColumn, EndColumn, Match, Secret, File, SymlinkFile, Commit, Entropy, Author, Email, Date, Message, Tags, RuleID, Fingerprint}. Map:
    • RuleID → Finding.ProviderName (normalize against our provider names)
    • Secret → Finding.KeyValue
    • File → Finding.Source
    • StartLine → Finding.LineNumber
  • Gitleaks CSV: same columns as JSON, use encoding/csv
  • Deduplication: compute hash(provider + masked_key + source) for each finding before insert; skip the insert if that hash already exists
  • CLI command: keyhunter import --format=<trufflehog|gitleaks|gitleaks-csv> <file> reads file, inserts findings
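
The TruffleHog mapping above can be sketched as follows. This is a minimal illustration, not the project's implementation: `Finding` is a local stand-in for `engine.Finding`, and `TruffleHogImporter`, `thRecord`, `toFinding`, and `ImportString` are hypothetical names. It assumes the source path and line live under `SourceMetadata.Data.<source>.file` and `.line`, and it accepts both newline-delimited objects and a single array.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// Finding stands in for engine.Finding; only the mapped fields are shown.
type Finding struct {
	ProviderName string
	KeyValue     string
	VerifyStatus string
	Source       string
	LineNumber   int
}

// Importer is the common interface named in the decisions above.
type Importer interface {
	Import(r io.Reader) ([]Finding, error)
}

// thRecord lists only the TruffleHog fields this sketch reads.
type thRecord struct {
	DetectorName   string
	Verified       bool
	Raw            string
	SourceMetadata struct {
		Data map[string]map[string]any
	}
}

// TruffleHogImporter is a hypothetical name for the TruffleHog adapter.
type TruffleHogImporter struct{}

func (TruffleHogImporter) Import(r io.Reader) ([]Finding, error) {
	dec := json.NewDecoder(r)
	var out []Finding
	for {
		var raw json.RawMessage
		if err := dec.Decode(&raw); errors.Is(err, io.EOF) {
			return out, nil
		} else if err != nil {
			return nil, fmt.Errorf("trufflehog: decode: %w", err)
		}
		// Each top-level value is either one object (NDJSON) or an array of objects.
		var recs []thRecord
		var err error
		if raw = bytes.TrimSpace(raw); len(raw) > 0 && raw[0] == '[' {
			err = json.Unmarshal(raw, &recs)
		} else {
			recs = make([]thRecord, 1)
			err = json.Unmarshal(raw, &recs[0])
		}
		if err != nil {
			return nil, fmt.Errorf("trufflehog: record: %w", err)
		}
		for _, rec := range recs {
			out = append(out, toFinding(rec))
		}
	}
}

// toFinding applies the field mapping listed above.
func toFinding(rec thRecord) Finding {
	f := Finding{
		ProviderName: strings.ToLower(rec.DetectorName),
		KeyValue:     rec.Raw,
		VerifyStatus: "unverified",
	}
	if rec.Verified {
		f.VerifyStatus = "live"
	}
	// Assumption: whichever source key is present carries "file" and "line".
	for _, meta := range rec.SourceMetadata.Data {
		if file, ok := meta["file"].(string); ok {
			f.Source = file
		}
		if line, ok := meta["line"].(float64); ok {
			f.LineNumber = int(line)
		}
	}
	return f
}

// ImportString is a small convenience wrapper for demos and tests.
func ImportString(imp Importer, s string) ([]Finding, error) {
	return imp.Import(strings.NewReader(s))
}

func main() {
	sample := `{"DetectorName":"OpenAI","Verified":true,"Raw":"sk-example","SourceMetadata":{"Data":{"Filesystem":{"file":"config/app.env","line":3}}}}`
	findings, err := ImportString(TruffleHogImporter{}, sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", findings)
}
```

A Gitleaks adapter would satisfy the same `Importer` interface, so the import command can pick an implementation from the `--format` flag and treat the results uniformly.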

### Git Hook (CICD-01)

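  • Install: keyhunter hook install writes a .git/hooks/pre-commit shell script that calls keyhunter scan $(git diff --cached --name-only --diff-filter=ACMR) with --exit-code. The unquoted substitution splits paths on whitespace, so staged files with spaces in their names need separate handling.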
  • Uninstall: keyhunter hook uninstall removes the script (preserves any non-keyhunter portions in a backup)
  • Content: simple bash script with shebang, runs keyhunter, exits with scan's exit code
  • Detection: before install, check if pre-commit already exists; if yes, ask to append or replace (--force flag to skip prompt)
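
A simplified sketch of the install and uninstall logic, under stated assumptions: `InstallHook`, `UninstallHook`, `isManaged`, and the `hookMarker` comment are hypothetical names, the script is inlined as a constant instead of being loaded via go:embed, and the append and backup behavior described above is reduced to "refuse unless forced" and "refuse to remove a foreign hook".

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// hookMarker identifies scripts written by this tool (hypothetical convention).
const hookMarker = "# keyhunter-managed-hook"

// hookScript scans staged files and exits with the scan's exit code.
const hookScript = `#!/usr/bin/env bash
` + hookMarker + `
files=$(git diff --cached --name-only --diff-filter=ACMR)
[ -z "$files" ] && exit 0
# Left unquoted so each path becomes one argument; paths with spaces would split.
exec keyhunter scan --exit-code $files
`

// ErrHookExists is returned when a foreign pre-commit hook is present and force is false.
var ErrHookExists = errors.New("pre-commit hook already exists; rerun with force to replace it")

// isManaged reports whether a hook script was written by this tool.
func isManaged(content string) bool {
	return strings.Contains(content, hookMarker)
}

// InstallHook writes the pre-commit script into repoDir/.git/hooks.
func InstallHook(repoDir string, force bool) error {
	gitDir := filepath.Join(repoDir, ".git")
	// Only a plain .git directory is handled; worktrees use a .git file instead.
	if info, err := os.Stat(gitDir); err != nil || !info.IsDir() {
		return fmt.Errorf("%s is not a git repository", repoDir)
	}
	hooksDir := filepath.Join(gitDir, "hooks")
	if err := os.MkdirAll(hooksDir, 0o755); err != nil {
		return err
	}
	path := filepath.Join(hooksDir, "pre-commit")
	if existing, err := os.ReadFile(path); err == nil && !isManaged(string(existing)) && !force {
		return ErrHookExists
	}
	if err := os.WriteFile(path, []byte(hookScript), 0o755); err != nil {
		return err
	}
	// WriteFile keeps the old mode when the file already existed, so set it explicitly.
	return os.Chmod(path, 0o755)
}

// UninstallHook removes the script only if this tool wrote it.
func UninstallHook(repoDir string) error {
	path := filepath.Join(repoDir, ".git", "hooks", "pre-commit")
	content, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	if !isManaged(string(content)) {
		return errors.New("pre-commit hook was not installed by keyhunter; leaving it in place")
	}
	return os.Remove(path)
}

func main() {
	repo, err := os.MkdirTemp("", "keyhunter-hook-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(repo)
	if err := os.Mkdir(filepath.Join(repo, ".git"), 0o755); err != nil {
		panic(err)
	}
	fmt.Println("install error:", InstallHook(repo, false))
	fmt.Println("uninstall error:", UninstallHook(repo))
}
```

Tagging the script with a marker line is one way to tell a keyhunter-written hook apart from a user's own, which both the overwrite prompt and the uninstall path depend on.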

### SARIF GitHub Validation (CICD-02)

  • Already produced in Phase 6 by SARIFFormatter
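  • This phase: add a validation test against the official SARIF 2.1.0 schema JSON (download once, commit it under testdata, validate SARIFFormatter output against it). gjson can only assert individual fields, so full schema validation needs a JSON Schema validator library; a field-level structural check is the fallback
  • Add documentation in README about uploading to GitHub: .github/workflows/keyhunter.yml example workflow
  • Passing the schema does not by itself show that GitHub Code Scanning accepts the file, so also compare the output against a SARIF file GitHub is known to accept, using a minimal validator or a manual check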
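
A field-level structural check of the fallback kind could look like the sketch below. It is not a schema validator: `CheckSARIF` and `sarifLog` are hypothetical names, and the set of required fields ($schema, version 2.1.0, a non-empty runs array, driver name "keyhunter", and ruleId, level, message, locations on every result) is an assumption about what the test should assert.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// sarifLog models only the fields this check inspects.
type sarifLog struct {
	Schema  string `json:"$schema"`
	Version string `json:"version"`
	Runs    []struct {
		Tool struct {
			Driver struct {
				Name string `json:"name"`
			} `json:"driver"`
		} `json:"tool"`
		Results []struct {
			RuleID    string            `json:"ruleId"`
			Level     string            `json:"level"`
			Message   json.RawMessage   `json:"message"`
			Locations []json.RawMessage `json:"locations"`
		} `json:"results"`
	} `json:"runs"`
}

// CheckSARIF returns the first structural problem found, or nil.
func CheckSARIF(data []byte) error {
	var log sarifLog
	if err := json.Unmarshal(data, &log); err != nil {
		return fmt.Errorf("invalid JSON: %w", err)
	}
	if log.Schema == "" {
		return errors.New("missing $schema")
	}
	if log.Version != "2.1.0" {
		return fmt.Errorf("version = %q, want 2.1.0", log.Version)
	}
	if len(log.Runs) == 0 {
		return errors.New("runs[] is empty")
	}
	for i, run := range log.Runs {
		if run.Tool.Driver.Name != "keyhunter" {
			return fmt.Errorf("runs[%d].tool.driver.name = %q, want keyhunter", i, run.Tool.Driver.Name)
		}
		for j, res := range run.Results {
			switch {
			case res.RuleID == "":
				return fmt.Errorf("runs[%d].results[%d]: missing ruleId", i, j)
			case res.Level == "":
				return fmt.Errorf("runs[%d].results[%d]: missing level", i, j)
			case len(res.Message) == 0:
				return fmt.Errorf("runs[%d].results[%d]: missing message", i, j)
			case len(res.Locations) == 0:
				return fmt.Errorf("runs[%d].results[%d]: missing locations", i, j)
			}
		}
	}
	return nil
}

func main() {
	doc := []byte(`{"$schema":"https://json.schemastore.org/sarif-2.1.0.json","version":"2.1.0",
	  "runs":[{"tool":{"driver":{"name":"keyhunter"}},"results":[]}]}`)
	fmt.Println("check error:", CheckSARIF(doc))
}
```

In a test, the bytes passed to `CheckSARIF` would come from SARIFFormatter output rather than a literal.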

### New Files

pkg/importer/
  importer.go       — Importer interface
  trufflehog.go     — TruffleHog v3 JSON parser
  gitleaks.go       — Gitleaks JSON + CSV parser
  dedup.go          — dedup hash logic
  importer_test.go
  testdata/
    trufflehog-sample.json
    gitleaks-sample.json
    gitleaks-sample.csv

cmd/
  import.go         — keyhunter import command (replace stub)
  hook.go           — keyhunter hook install/uninstall (replace stub)
  hook_script.sh    — embedded pre-commit script template (via go:embed)

docs/
  CI-CD.md          — GitHub Actions example, pre-commit setup

testdata/sarif/
  sarif-2.1.0-schema.json  — official schema for validation tests

<code_context>

## Existing Code Insights

### Reusable Assets

  • pkg/engine/finding.go — Finding struct (target for imports)
  • pkg/storage/findings.go — SaveFinding (target for inserts)
  • pkg/output/sarif.go — SARIFFormatter from Phase 6
  • cmd/stubs.go — import and hook are stubs to replace

### Provider Name Normalization

  • TruffleHog uses names like "OpenAI", "GitHubV2", "AWS" (mixed case)
  • Gitleaks uses names like "openai-api-key", "aws-access-token" (kebab)
  • Normalize to KeyHunter's lowercase names: openai, aws-bedrock, etc.
  • Unknown provider names → keep as-is, tag confidence "imported"
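
One way to sketch the normalization rules above is an alias table keyed by the lowercased external name. `NormalizeProvider` and `providerAliases` are hypothetical names, and the table entries (in particular mapping the AWS names to aws-bedrock) are illustrative assumptions, not the project's actual provider list.

```go
package main

import (
	"fmt"
	"strings"
)

// providerAliases maps lowercased TruffleHog and Gitleaks names to KeyHunter names.
// The entries are illustrative assumptions.
var providerAliases = map[string]string{
	"openai":           "openai",
	"openai-api-key":   "openai",
	"aws":              "aws-bedrock",
	"aws-access-token": "aws-bedrock",
}

// NormalizeProvider returns the KeyHunter provider name and whether it was recognized.
// Unknown names are returned unchanged so the caller can tag them confidence "imported".
func NormalizeProvider(name string) (string, bool) {
	key := strings.ToLower(strings.TrimSpace(name))
	if canonical, ok := providerAliases[key]; ok {
		return canonical, true
	}
	return name, false
}

func main() {
	for _, name := range []string{"OpenAI", "aws-access-token", "GitHubV2"} {
		provider, known := NormalizeProvider(name)
		fmt.Printf("%s -> %s (known=%t)\n", name, provider, known)
	}
}
```

Returning a `known` flag rather than a confidence string leaves the choice of tag to the importer, which is the only place that knows the finding came from an external tool.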

</code_context>

## Specific Ideas
  • Import command should show a summary: "Imported N findings (M new, K duplicates)"
  • Hook install should verify .git/ exists in current directory; error if not a repo
  • SARIF validation test should check: $schema, version, runs[], runs[].tool.driver.name == "keyhunter", each result has ruleId, level, message, locations
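
The import summary in the first idea falls out of the dedup pass. The sketch below assumes a SHA-256 over provider, masked key, and source as the dedup hash, and an in-memory set in place of the SQLite lookup. `Finding`, `maskKey`, `dedupKey`, `Dedup`, and `Summary` are hypothetical names, and the masking rule (first and last four characters) is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Finding stands in for engine.Finding with only the fields used for dedup.
type Finding struct {
	ProviderName string
	KeyValue     string
	Source       string
}

// maskKey keeps the first and last four characters (assumed masking rule).
func maskKey(k string) string {
	if len(k) <= 8 {
		return "****"
	}
	return k[:4] + "..." + k[len(k)-4:]
}

// dedupKey hashes provider, masked key, and source with a separator between parts.
func dedupKey(f Finding) string {
	h := sha256.New()
	for _, part := range []string{f.ProviderName, maskKey(f.KeyValue), f.Source} {
		h.Write([]byte(part))
		h.Write([]byte{0}) // separator so "ab"+"c" and "a"+"bc" hash differently
	}
	return hex.EncodeToString(h.Sum(nil))
}

// Summary carries the counts shown after an import.
type Summary struct {
	Total, New, Duplicates int
}

func (s Summary) String() string {
	return fmt.Sprintf("Imported %d findings (%d new, %d duplicates)", s.Total, s.New, s.Duplicates)
}

// Dedup returns the findings not yet in seen and records their hashes in seen.
// The seen map stands in for an existence check against the database.
func Dedup(findings []Finding, seen map[string]bool) ([]Finding, Summary) {
	var fresh []Finding
	sum := Summary{Total: len(findings)}
	for _, f := range findings {
		key := dedupKey(f)
		if seen[key] {
			sum.Duplicates++
			continue
		}
		seen[key] = true
		fresh = append(fresh, f)
		sum.New++
	}
	return fresh, sum
}

func main() {
	findings := []Finding{
		{"openai", "sk-aaaaaaaaaaaaaaaa1111", "config/app.env"},
		{"openai", "sk-aaaaaaaaaaaaaaaa1111", "config/app.env"},
		{"aws-bedrock", "AKIAAAAAAAAAAAAA2222", "deploy.sh"},
	}
	_, sum := Dedup(findings, map[string]bool{})
	fmt.Println(sum)
}
```

Because the hash uses the masked key, two different keys that share a provider, source, and the same first and last characters would be treated as duplicates; whether that trade-off is acceptable is a design decision for the dedup step.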
## Deferred Ideas
  • Import from arbitrary JSON formats via jsonpath config — over-engineering
  • pre-push and post-merge hooks — pre-commit is enough for v1
  • GitHub App integration for automatic scanning on PRs — separate project
  • Semgrep/Snyk output format imports — defer to v2