# Phase 7: Import Adapters & CI/CD Integration - Context

**Gathered:** 2026-04-05

**Status:** Ready for planning

**Mode:** Auto-generated

<domain>

## Phase Boundary

Three capabilities:

1. Import findings from TruffleHog v3 JSON and Gitleaks JSON/CSV into the SQLite database (normalize to the KeyHunter Finding schema, deduplicate).
2. Install/uninstall a Git pre-commit hook that runs `keyhunter scan` on staged files and blocks commits that have findings.
3. Verify that the SARIF 2.1.0 output built in Phase 6 passes GitHub Code Scanning validation.

</domain>

<decisions>

## Implementation Decisions

### Import Package (IMP-01, IMP-02, IMP-03)

- **New package**: `pkg/importer/`
- **One file per format**: `pkg/importer/trufflehog.go`, `pkg/importer/gitleaks.go`
- **Common interface**: `Importer` with `Import(r io.Reader) ([]engine.Finding, error)`
- **TruffleHog v3 JSON schema**: objects with `{SourceID, SourceMetadata, SourceName, DetectorName, DetectorType, Verified, Raw, Redacted, ExtraData}`. `trufflehog --json` emits one object per line (JSON Lines) rather than a single array, so the parser should accept both forms. Map:
  - DetectorName → Finding.ProviderName (lowercase)
  - Raw → Finding.KeyValue
  - Verified → Finding.VerifyStatus (live/unverified)
  - SourceMetadata → source path (nested JSON keyed by source type; needs path-based extraction)
- **Gitleaks JSON schema**: array of `{Description, StartLine, EndLine, StartColumn, EndColumn, Match, Secret, File, SymlinkFile, Commit, Entropy, Author, Email, Date, Message, Tags, RuleID, Fingerprint}`. Map:
  - RuleID → Finding.ProviderName (normalize against our provider names)
  - Secret → Finding.KeyValue
  - File → Finding.Source
  - StartLine → Finding.LineNumber
- **Gitleaks CSV**: same fields as JSON, parsed with `encoding/csv`; column order may differ from the JSON field order, so look columns up by header name rather than position
- **Deduplication**: compute hash(provider + masked_key + source) before insert; skip the insert if that hash already exists
- **CLI command**: `keyhunter import --format=<trufflehog|gitleaks|gitleaks-csv> <file>` reads the file and inserts the findings


### Git Hook (CICD-01)

- **Install**: `keyhunter hook install` writes a `.git/hooks/pre-commit` shell script that calls `keyhunter scan $(git diff --cached --name-only --diff-filter=ACMR)` with `--exit-code`
- **Uninstall**: `keyhunter hook uninstall` removes the script; any non-keyhunter content in the hook is preserved in a backup
- **Content**: simple bash script with a shebang; it runs keyhunter and exits with the scan's exit code
- **Detection**: before installing, check whether a pre-commit hook already exists; if so, ask whether to append or replace (`--force` skips the prompt)


### SARIF GitHub Validation (CICD-02)

- Already produced in Phase 6 by SARIFFormatter
- This phase: add a validation test against the official SARIF 2.1.0 JSON schema (download it once and embed it in the repo as testdata); either run full JSON Schema validation or check the required fields with gjson path queries (gjson queries JSON but does not validate schemas)
- Document uploading the results to GitHub in the README, including an example `.github/workflows/keyhunter.yml` workflow
- Confirm the output matches what GitHub Code Scanning accepts, using a minimal validator or a manual schema check

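
A stdlib-only sketch of the structural checks listed under Specific Ideas, using `encoding/json` instead of gjson; `checkSARIF` is a hypothetical helper. It is stricter than the spec in one place: SARIF lets a result omit `level` (a default is then derived), but the planned test requires it. Full validation against `sarif-2.1.0-schema.json` would still need a JSON Schema validator.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// sarifLog decodes only the fields the checks below inspect.
type sarifLog struct {
	Schema  string `json:"$schema"`
	Version string `json:"version"`
	Runs    []struct {
		Tool struct {
			Driver struct {
				Name string `json:"name"`
			} `json:"driver"`
		} `json:"tool"`
		Results []struct {
			RuleID  string `json:"ruleId"`
			Level   string `json:"level"`
			Message struct {
				Text string `json:"text"`
			} `json:"message"`
			Locations []json.RawMessage `json:"locations"`
		} `json:"results"`
	} `json:"runs"`
}

// checkSARIF applies the structural checks planned for the validation test.
// It does not replace validation against the embedded 2.1.0 JSON schema.
func checkSARIF(data []byte) error {
	var doc sarifLog
	if err := json.Unmarshal(data, &doc); err != nil {
		return fmt.Errorf("not valid JSON: %w", err)
	}
	if doc.Schema == "" {
		return errors.New("missing $schema")
	}
	if doc.Version != "2.1.0" {
		return fmt.Errorf("version is %q, want \"2.1.0\"", doc.Version)
	}
	if len(doc.Runs) == 0 {
		return errors.New("runs[] is empty")
	}
	for i, run := range doc.Runs {
		if run.Tool.Driver.Name != "keyhunter" {
			return fmt.Errorf("runs[%d].tool.driver.name is %q, want \"keyhunter\"", i, run.Tool.Driver.Name)
		}
		for j, res := range run.Results {
			where := fmt.Sprintf("runs[%d].results[%d]", i, j)
			switch {
			case res.RuleID == "":
				return errors.New(where + ": missing ruleId")
			case res.Message.Text == "":
				return errors.New(where + ": missing message.text")
			case len(res.Locations) == 0:
				return errors.New(where + ": missing locations")
			}
			switch res.Level {
			case "none", "note", "warning", "error": // the four SARIF 2.1.0 levels
			default:
				return fmt.Errorf("%s: invalid level %q", where, res.Level)
			}
		}
	}
	return nil
}
```

In the test, this would run on the bytes produced by SARIFFormatter, and each error message names the failing JSON path.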

### New Files

```
pkg/importer/
  importer.go              # Importer interface
  trufflehog.go            # TruffleHog v3 JSON parser
  gitleaks.go              # Gitleaks JSON + CSV parser
  dedup.go                 # dedup hash logic
  importer_test.go
  testdata/
    trufflehog-sample.json
    gitleaks-sample.json
    gitleaks-sample.csv

cmd/
  import.go                # keyhunter import command (replace stub)
  hook.go                  # keyhunter hook install/uninstall (replace stub)
  hook_script.sh           # embedded pre-commit script template (via go:embed)

docs/
  CI-CD.md                 # GitHub Actions example, pre-commit setup

testdata/sarif/
  sarif-2.1.0-schema.json  # official schema for validation tests
```

</decisions>

<code_context>

## Existing Code Insights

### Reusable Assets

- `pkg/engine/finding.go`: Finding struct (target for imports)
- `pkg/storage/findings.go`: SaveFinding (target for inserts)
- `pkg/output/sarif.go`: SARIFFormatter from Phase 6
- `cmd/stubs.go`: import and hook are stubs to replace

### Provider Name Normalization

- TruffleHog uses mixed-case names such as "OpenAI", "GitHubV2", "AWS"
- Gitleaks uses kebab-case names such as "openai-api-key", "aws-access-token"
- Normalize both to KeyHunter's lowercase names: openai, aws-bedrock, etc.
- Unknown provider names: keep as-is and tag confidence "imported"

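
A sketch of the normalization rules above. The alias table and the suffix list are illustrative assumptions that mirror this section's examples (including mapping AWS names to `aws-bedrock`), not KeyHunter's actual provider registry; `normalizeProvider` is a hypothetical name.

```go
package main

import "strings"

// providerAliases maps lower-cased upstream names to KeyHunter provider names.
// Illustrative entries only; the real table would come from KeyHunter's providers.
var providerAliases = map[string]string{
	"openai": "openai",
	"aws":    "aws-bedrock",
}

// gitleaksSuffixes are common RuleID tails (assumed, not exhaustive) stripped
// before a second lookup, so "openai-api-key" resolves via "openai".
var gitleaksSuffixes = []string{"-api-key", "-access-token", "-token", "-key"}

// normalizeProvider returns the KeyHunter name and true on a match, or the
// original name unchanged and false so the caller can tag confidence "imported".
func normalizeProvider(name string) (string, bool) {
	key := strings.ToLower(strings.TrimSpace(name))
	if p, ok := providerAliases[key]; ok {
		return p, true
	}
	for _, suffix := range gitleaksSuffixes {
		if strings.HasSuffix(key, suffix) {
			if p, ok := providerAliases[strings.TrimSuffix(key, suffix)]; ok {
				return p, true
			}
		}
	}
	return name, false
}
```

Lower-casing first lets one table serve both TruffleHog's mixed-case detector names and Gitleaks' kebab-case rule IDs.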
</code_context>

<specifics>

## Specific Ideas

- Import command should show a summary: "Imported N findings (M new, K duplicates)"
- Hook install should verify that `.git/` exists in the current directory and error out if it is not a repository
- SARIF validation test should check: `$schema`, `version`, `runs[]`, `runs[].tool.driver.name == "keyhunter"`, and that each result has `ruleId`, `level`, `message`, `locations`

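
A sketch of the dedup hash and the import summary line. SHA-256 and the NUL separator are assumptions (the decision only says "hash"), `dedupKey`, `importRecord` and `importSummary` are hypothetical names, and the `seen` map stands in for a lookup against SQLite. One consequence of hashing the masked key: two different keys that mask to the same string in the same source count as one finding.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// dedupKey hashes provider + masked key + source. The NUL separator keeps
// ("ab", "c") and ("a", "bc") from producing the same input bytes.
func dedupKey(provider, maskedKey, source string) string {
	h := sha256.New()
	for _, part := range []string{provider, maskedKey, source} {
		h.Write([]byte(part))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

// importRecord is a hypothetical minimal view of a normalized finding.
type importRecord struct {
	Provider, MaskedKey, Source string
}

// importSummary counts new vs duplicate records and formats the summary line.
// seen stands in for the hashes already stored in SQLite; new hashes are added
// to it, so duplicates inside the same import file are caught as well.
func importSummary(records []importRecord, seen map[string]bool) string {
	added, dups := 0, 0
	for _, r := range records {
		k := dedupKey(r.Provider, r.MaskedKey, r.Source)
		if seen[k] {
			dups++
			continue
		}
		seen[k] = true
		added++ // the real command would call storage.SaveFinding here
	}
	return fmt.Sprintf("Imported %d findings (%d new, %d duplicates)", len(records), added, dups)
}
```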
</specifics>

<deferred>

## Deferred Ideas

- Import from arbitrary JSON formats via jsonpath config: over-engineering
- pre-push and post-merge hooks: pre-commit is enough for v1
- GitHub App integration for automatic scanning on PRs: separate project
- Semgrep/Snyk output format imports: defer to v2

</deferred>