# Phase 7: Import Adapters & CI/CD Integration - Context
Gathered: 2026-04-05

Status: Ready for planning

Mode: Auto-generated
## Phase Boundary

Three capabilities:
- Import findings from TruffleHog v3 JSON and Gitleaks JSON/CSV into the SQLite database (normalize to KeyHunter Finding schema, deduplicate)
- Git pre-commit hook install/uninstall that runs `keyhunter scan` on staged files and blocks commits with findings
- SARIF 2.1.0 output (already built in Phase 6): this phase verifies that it passes GitHub Code Scanning validation
## Import Package (IMP-01, IMP-02, IMP-03)
- New package: `pkg/importer/`
- One file per format: `pkg/importer/trufflehog.go`, `pkg/importer/gitleaks.go`
- Common interface: `Importer` with `Import(r io.Reader) ([]engine.Finding, error)`
- TruffleHog v3 JSON schema: objects with `{SourceID, SourceMetadata, SourceName, DetectorName, DetectorType, Verified, Raw, Redacted, ExtraData}`. `trufflehog --json` writes one object per line rather than a single array, so the parser should accept both forms. Map:
  - `DetectorName` → `Finding.ProviderName` (lowercase)
  - `Raw` → `Finding.KeyValue`
  - `Verified` → `Finding.VerifyStatus` (`live`/`unverified`)
  - `SourceMetadata` → source path (nested JSON; needs path-based extraction)
- Gitleaks JSON schema: array of `{Description, StartLine, EndLine, StartColumn, EndColumn, Match, Secret, File, SymlinkFile, Commit, Entropy, Author, Email, Date, Message, Tags, RuleID, Fingerprint}`. Map:
  - `RuleID` → `Finding.ProviderName` (normalized against our provider names)
  - `Secret` → `Finding.KeyValue`
  - `File` → `Finding.Source`
  - `StartLine` → `Finding.LineNumber`
- Gitleaks CSV: same columns as JSON; parse with `encoding/csv`
- Deduplication: compute `hash(provider + masked_key + source)` before insert and skip the finding if that hash already exists
- CLI command: `keyhunter import --format=<trufflehog|gitleaks|gitleaks-csv> <file>` reads the file and inserts the findings
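
The dedup rule and the import summary can be sketched as follows. `maskKey` is a hypothetical masking rule standing in for whatever KeyHunter already uses, and the `seen` map plus `save` callback stand in for a storage lookup and `SaveFinding`; none of these signatures come from the codebase. A NUL byte separates the hashed fields so that `("ab", "c")` and `("a", "bc")` do not produce the same hash input.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Finding is a hypothetical, trimmed stand-in for engine.Finding.
type Finding struct {
	ProviderName string
	KeyValue     string
	Source       string
}

// maskKey is a hypothetical masking rule (first 4 + "..." + last 4);
// real code should reuse KeyHunter's existing masking helper.
func maskKey(key string) string {
	if len(key) <= 8 {
		return "****"
	}
	return key[:4] + "..." + key[len(key)-4:]
}

// dedupKey hashes provider, masked key and source, NUL-separated.
func dedupKey(f Finding) string {
	h := sha256.New()
	for _, part := range []string{f.ProviderName, maskKey(f.KeyValue), f.Source} {
		h.Write([]byte(part))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

// importAll saves findings whose dedup key is not yet in seen and returns
// the summary line. seen and save are hypothetical stand-ins for storage.
func importAll(findings []Finding, seen map[string]bool, save func(Finding) error) (string, error) {
	added, dupes := 0, 0
	for _, f := range findings {
		k := dedupKey(f)
		if seen[k] {
			dupes++
			continue
		}
		if err := save(f); err != nil {
			return "", fmt.Errorf("save finding: %w", err)
		}
		seen[k] = true // also catches duplicates inside the same report
		added++
	}
	return fmt.Sprintf("Imported %d findings (%d new, %d duplicates)", len(findings), added, dupes), nil
}
```

Because the hash covers the masked key, two different keys for the same provider and source that share their visible characters would be counted as duplicates; hashing the raw key avoids that, at the cost of feeding key material into the hash.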
## Git Hook (CICD-01)
- Install: `keyhunter hook install` writes a `.git/hooks/pre-commit` shell script that calls `keyhunter scan $(git diff --cached --name-only --diff-filter=ACMR)` with `--exit-code`
- Uninstall: `keyhunter hook uninstall` removes the script (preserving any non-keyhunter portions in a backup)
- Content: simple bash script with a shebang that runs keyhunter and exits with the scan's exit code
- Detection: before installing, check whether a pre-commit hook already exists; if it does, ask whether to append or replace (the `--force` flag skips the prompt)
## SARIF GitHub Validation (CICD-02)
- Already produced in Phase 6 by `SARIFFormatter`
- This phase: add a validation test against the official SARIF 2.1.0 JSON schema (download it once, commit it to the repo as testdata, and check the output against it with gjson queries or a simple structural check)
- Add README documentation on uploading results to GitHub, including an example `.github/workflows/keyhunter.yml` workflow
- Validate the SARIF against a real GitHub-accepted format, using a minimal validator or a manual schema check
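
The structural checks listed under Specific Ideas can be sketched with `encoding/json` alone. This is not full validation against the embedded `sarif-2.1.0-schema.json`, which would need a JSON Schema validator library; `checkSARIF` and `sarifLog` are hypothetical names. The `level` check uses the four values SARIF 2.1.0 defines (`none`, `note`, `warning`, `error`).

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// sarifLog models only the fields the CICD-02 checks inspect.
type sarifLog struct {
	Schema  string `json:"$schema"`
	Version string `json:"version"`
	Runs    []struct {
		Tool struct {
			Driver struct {
				Name string `json:"name"`
			} `json:"driver"`
		} `json:"tool"`
		Results []struct {
			RuleID    string          `json:"ruleId"`
			Level     string          `json:"level"`
			Message   json.RawMessage `json:"message"`
			Locations json.RawMessage `json:"locations"`
		} `json:"results"`
	} `json:"runs"`
}

// validLevels are the result levels SARIF 2.1.0 defines.
var validLevels = map[string]bool{"none": true, "note": true, "warning": true, "error": true}

// present reports whether a raw JSON field was supplied and is not null.
func present(raw json.RawMessage) bool {
	return len(raw) > 0 && string(raw) != "null"
}

// checkSARIF applies structural checks; it is not full JSON Schema validation.
func checkSARIF(data []byte) error {
	var doc sarifLog
	if err := json.Unmarshal(data, &doc); err != nil {
		return fmt.Errorf("invalid JSON: %w", err)
	}
	if doc.Schema == "" {
		return errors.New("missing $schema")
	}
	if doc.Version != "2.1.0" {
		return fmt.Errorf("version is %q, want \"2.1.0\"", doc.Version)
	}
	if len(doc.Runs) == 0 {
		return errors.New("runs[] is empty")
	}
	for i, run := range doc.Runs {
		if run.Tool.Driver.Name != "keyhunter" {
			return fmt.Errorf("runs[%d].tool.driver.name is %q, want \"keyhunter\"", i, run.Tool.Driver.Name)
		}
		for j, res := range run.Results {
			where := fmt.Sprintf("runs[%d].results[%d]", i, j)
			switch {
			case res.RuleID == "":
				return errors.New(where + ": missing ruleId")
			case !validLevels[res.Level]:
				return fmt.Errorf("%s: invalid level %q", where, res.Level)
			case !present(res.Message):
				return errors.New(where + ": missing message")
			case !present(res.Locations):
				return errors.New(where + ": missing locations")
			}
		}
	}
	return nil
}
```

In a test, `checkSARIF` would be called on bytes produced by the Phase 6 `SARIFFormatter` (its exact API is not shown in this context).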
## New Files
- `pkg/importer/`
  - `importer.go`: `Importer` interface
  - `trufflehog.go`: TruffleHog v3 JSON parser
  - `gitleaks.go`: Gitleaks JSON + CSV parser
  - `dedup.go`: dedup hash logic
  - `importer_test.go`
  - `testdata/`
    - `trufflehog-sample.json`
    - `gitleaks-sample.json`
    - `gitleaks-sample.csv`
- `cmd/`
  - `import.go`: `keyhunter import` command (replaces stub)
  - `hook.go`: `keyhunter hook install/uninstall` (replaces stub)
  - `hook_script.sh`: embedded pre-commit script template (via `go:embed`)
- `docs/`
  - `CI-CD.md`: GitHub Actions example, pre-commit setup
- `testdata/sarif/`
  - `sarif-2.1.0-schema.json`: official schema for validation tests
<code_context>
## Existing Code Insights
### Reusable Assets
- pkg/engine/finding.go — Finding struct (target for imports)
- pkg/storage/findings.go — SaveFinding (target for inserts)
- pkg/output/sarif.go — SARIFFormatter from Phase 6
- cmd/stubs.go — import and hook are stubs to replace
### Provider Name Normalization
- TruffleHog uses names like "OpenAI", "GitHubV2", "AWS" (mixed case)
- Gitleaks uses names like "openai-api-key", "aws-access-token" (kebab)
- Normalize to KeyHunter's lowercase names: openai, aws-bedrock, etc.
- Unknown provider names → keep as-is, tag confidence "imported"
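
One way to implement these rules: lowercase the name, apply an alias table, strip common Gitleaks-style suffixes, and accept the result only if it is a known KeyHunter provider. The provider set, aliases, and suffix list below are hypothetical examples, not KeyHunter's real tables. Unknown names come back unchanged with `ok == false`, so the caller can tag the finding's confidence as "imported".

```go
package main

import "strings"

// knownProviders is a hypothetical subset of KeyHunter provider names.
var knownProviders = map[string]bool{"openai": true, "anthropic": true, "github": true}

// aliases covers names that suffix stripping cannot fix (illustrative).
var aliases = map[string]string{"githubv2": "github"}

// suffixes are common Gitleaks rule-ID endings, longest first (illustrative).
var suffixes = []string{"-api-key", "-access-token", "-token", "-key"}

// normalizeProvider maps a TruffleHog or Gitleaks name to a KeyHunter
// provider name; ok is false when the name is not recognized.
func normalizeProvider(name string) (provider string, ok bool) {
	n := strings.ToLower(strings.TrimSpace(name))
	if a, found := aliases[n]; found {
		n = a
	}
	for _, s := range suffixes {
		if trimmed := strings.TrimSuffix(n, s); trimmed != n && knownProviders[trimmed] {
			n = trimmed
			break
		}
	}
	if knownProviders[n] {
		return n, true
	}
	return name, false // keep as-is; caller tags confidence "imported"
}
```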
</code_context>
## Specific Ideas
- Import command should show a summary: "Imported N findings (M new, K duplicates)"
- Hook install should verify that `.git/` exists in the current directory and error out if it is not a repo
- SARIF validation test should check `$schema`, `version`, `runs[]`, `runs[].tool.driver.name == "keyhunter"`, and that each result has `ruleId`, `level`, `message`, `locations`

## Deferred Ideas
- Import from arbitrary JSON formats via jsonpath config: over-engineering
- pre-push and post-merge hooks: pre-commit is enough for v1
- GitHub App integration for automatic scanning on PRs: separate project
- Semgrep/Snyk output format imports: defer to v2