docs(07): import adapters and CI/CD context
.planning/phases/07-import-cicd/07-CONTEXT.md
@@ -0,0 +1,111 @@
# Phase 7: Import Adapters & CI/CD Integration - Context

**Gathered:** 2026-04-05
**Status:** Ready for planning
**Mode:** Auto-generated

<domain>
## Phase Boundary

Three capabilities:
1. Import findings from TruffleHog v3 JSON and Gitleaks JSON/CSV into the SQLite database (normalize to the KeyHunter Finding schema, deduplicate)
2. Git pre-commit hook install/uninstall; the hook runs `keyhunter scan` on staged files and blocks commits with findings
3. SARIF 2.1.0 output was already built in Phase 6; this phase verifies that it passes GitHub Code Scanning validation

</domain>

<decisions>
## Implementation Decisions

### Import Package (IMP-01, IMP-02, IMP-03)
- **New package**: `pkg/importer/`
- **One file per format**: `pkg/importer/trufflehog.go`, `pkg/importer/gitleaks.go`
- **Common interface**: `Importer` with `Import(r io.Reader) ([]engine.Finding, error)`
- **TruffleHog v3 JSON schema**: objects with `{SourceID, SourceMetadata, SourceName, DetectorName, DetectorType, Verified, Raw, Redacted, ExtraData}`. TruffleHog's `--json` output is typically one object per line rather than a single array, so the parser should accept both forms. Map:
  - DetectorName → Finding.ProviderName (lowercased)
  - Raw → Finding.KeyValue
  - Verified → Finding.VerifyStatus (`true` → live, `false` → unverified)
  - SourceMetadata → source path (nested inside the JSON object, so it needs key-path extraction)
- **Gitleaks JSON schema**: array of `{Description, StartLine, EndLine, StartColumn, EndColumn, Match, Secret, File, SymlinkFile, Commit, Entropy, Author, Email, Date, Message, Tags, RuleID, Fingerprint}`. Map:
  - RuleID → Finding.ProviderName (normalized against our provider names)
  - Secret → Finding.KeyValue
  - File → Finding.Source
  - StartLine → Finding.LineNumber
- **Gitleaks CSV**: same columns as the JSON fields; parse with `encoding/csv`
- **Deduplication**: compute hash(provider + masked_key + source) before insert; skip the insert if that hash already exists
- **CLI command**: `keyhunter import --format=<trufflehog|gitleaks|gitleaks-csv> <file>` reads the file and inserts the findings

### Git Hook (CICD-01)
- **Install**: `keyhunter hook install` writes a `.git/hooks/pre-commit` shell script that calls `keyhunter scan $(git diff --cached --name-only --diff-filter=ACMR)` with `--exit-code`
- **Uninstall**: `keyhunter hook uninstall` removes the script (any non-keyhunter portions are preserved in a backup)
- **Content**: simple bash script with a shebang; it runs keyhunter and exits with the scan's exit code
- **Detection**: before install, check whether a pre-commit hook already exists; if it does, ask whether to append or replace (the `--force` flag skips the prompt)

### SARIF GitHub Validation (CICD-02)
- Already produced in Phase 6 by SARIFFormatter
- This phase: add a validation test using the official SARIF schema JSON (download once, embed in the repo as testdata, validate output against it via gjson or a simple schema check). gjson only queries JSON paths and does not evaluate JSON Schema, so that route gives field-level checks rather than full schema validation
- Add documentation in the README about uploading to GitHub, with an example workflow at `.github/workflows/keyhunter.yml`
- Validate the SARIF against a format that GitHub actually accepts, using a minimal validator or a manual schema check
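
A field-level check of the "manual schema check" kind could look like the sketch below. It uses only `encoding/json`, covers the fields listed under Specific Ideas, and is not a substitute for full JSON Schema validation. `checkSARIF` and `sarifLog` are hypothetical names, the sample document is made up, and treating an empty `runs` array as an error is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// sarifLog models only the parts of a SARIF log that the check inspects.
type sarifLog struct {
	Schema  string `json:"$schema"`
	Version string `json:"version"`
	Runs    []struct {
		Tool struct {
			Driver struct {
				Name string `json:"name"`
			} `json:"driver"`
		} `json:"tool"`
		Results []struct {
			RuleID    string            `json:"ruleId"`
			Level     string            `json:"level"`
			Message   json.RawMessage   `json:"message"`
			Locations []json.RawMessage `json:"locations"`
		} `json:"results"`
	} `json:"runs"`
}

// checkSARIF returns the first structural problem it finds, or nil.
func checkSARIF(data []byte) error {
	var log sarifLog
	if err := json.Unmarshal(data, &log); err != nil {
		return fmt.Errorf("not valid JSON: %w", err)
	}
	if log.Schema == "" {
		return errors.New("missing $schema")
	}
	if log.Version != "2.1.0" {
		return fmt.Errorf("version = %q, want 2.1.0", log.Version)
	}
	if len(log.Runs) == 0 {
		return errors.New("runs[] is missing or empty")
	}
	for i, run := range log.Runs {
		if run.Tool.Driver.Name != "keyhunter" {
			return fmt.Errorf("runs[%d].tool.driver.name = %q, want keyhunter", i, run.Tool.Driver.Name)
		}
		for j, res := range run.Results {
			if res.RuleID == "" || res.Level == "" || len(res.Message) == 0 || len(res.Locations) == 0 {
				return fmt.Errorf("runs[%d].results[%d] lacks ruleId, level, message, or locations", i, j)
			}
		}
	}
	return nil
}

// sample is a made-up minimal document for illustration.
const sample = `{
  "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "keyhunter"}},
    "results": [{
      "ruleId": "openai",
      "level": "error",
      "message": {"text": "Potential OpenAI API key"},
      "locations": [{"physicalLocation": {"artifactLocation": {"uri": "config/app.env"}, "region": {"startLine": 3}}}]
    }]
  }]
}`

func main() {
	fmt.Println("sample:", checkSARIF([]byte(sample)))
}
```

In a real test the input would come from SARIFFormatter rather than a hand-written constant.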

### New Files
```
pkg/importer/
  importer.go              # Importer interface
  trufflehog.go            # TruffleHog v3 JSON parser
  gitleaks.go              # Gitleaks JSON + CSV parser
  dedup.go                 # dedup hash logic
  importer_test.go
  testdata/
    trufflehog-sample.json
    gitleaks-sample.json
    gitleaks-sample.csv

cmd/
  import.go                # keyhunter import command (replace stub)
  hook.go                  # keyhunter hook install/uninstall (replace stub)
  hook_script.sh           # embedded pre-commit script template (via go:embed)

docs/
  CI-CD.md                 # GitHub Actions example, pre-commit setup

testdata/sarif/
  sarif-2.1.0-schema.json  # official schema for validation tests
```

</decisions>

<code_context>
## Existing Code Insights

### Reusable Assets
- pkg/engine/finding.go: Finding struct (target for imports)
- pkg/storage/findings.go: SaveFinding (target for inserts)
- pkg/output/sarif.go: SARIFFormatter from Phase 6
- cmd/stubs.go: the `import` and `hook` commands are currently stubs to replace

### Provider Name Normalization
- TruffleHog uses mixed-case names like "OpenAI", "GitHubV2", "AWS"
- Gitleaks uses kebab-case rule IDs like "openai-api-key", "aws-access-token"
- Normalize to KeyHunter's lowercase names: openai, aws-bedrock, etc.
- Unknown provider names → keep as-is, tag confidence "imported"
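
One way to express these rules is a small alias table plus a fallback. The sketch below is illustrative only: `normalizeProvider` and `providerAliases` are hypothetical names, and the table entries, in particular which KeyHunter provider the AWS names should map to, are assumptions rather than decisions recorded here.

```go
package main

import (
	"fmt"
	"strings"
)

// providerAliases maps lowercased external names to KeyHunter provider names.
// The entries are illustrative assumptions, not the project's real table.
var providerAliases = map[string]string{
	"openai":           "openai",
	"openai-api-key":   "openai",
	"aws":              "aws-bedrock",
	"aws-access-token": "aws-bedrock",
}

// normalizeProvider returns the KeyHunter name and whether it was recognized.
// Unknown names are returned unchanged so the caller can tag them "imported".
func normalizeProvider(name string) (string, bool) {
	key := strings.ToLower(strings.TrimSpace(name))
	if provider, ok := providerAliases[key]; ok {
		return provider, true
	}
	return name, false
}

func main() {
	for _, name := range []string{"OpenAI", "openai-api-key", "AWS", "GitHubV2"} {
		provider, known := normalizeProvider(name)
		confidence := "normalized"
		if !known {
			confidence = "imported"
		}
		fmt.Printf("%-16s -> %-12s (%s)\n", name, provider, confidence)
	}
}
```

Returning the `known` flag separately keeps the "imported" confidence tag a caller decision, so the same function can serve both the TruffleHog and Gitleaks importers.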

</code_context>

<specifics>
## Specific Ideas

- Import command should show a summary: "Imported N findings (M new, K duplicates)"
- Hook install should verify that `.git/` exists in the current directory and error out if it is not a repo
- SARIF validation test should check: `$schema`, `version`, `runs[]`, `runs[].tool.driver.name == "keyhunter"`, each result has `ruleId`, `level`, `message`, `locations`
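
The import summary in the first bullet follows from the deduplication rule in the Import Package decisions (hash of provider, masked key, and source). A minimal sketch, assuming SHA-256 as the hash, a hypothetical `maskKey` rule, and an in-memory map standing in for the SQLite lookup:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Finding is a local stand-in for engine.Finding (assumed field set).
type Finding struct {
	ProviderName string
	KeyValue     string
	Source       string
}

// maskKey is a hypothetical masking rule: keep the first and last 4 bytes.
func maskKey(key string) string {
	if len(key) <= 8 {
		return strings.Repeat("*", len(key))
	}
	return key[:4] + "..." + key[len(key)-4:]
}

// dedupKey hashes provider, masked key, and source. The NUL separators keep
// ("ab", "c") and ("a", "bc") from producing the same input.
func dedupKey(f Finding) string {
	sum := sha256.Sum256([]byte(f.ProviderName + "\x00" + maskKey(f.KeyValue) + "\x00" + f.Source))
	return hex.EncodeToString(sum[:])
}

// countNew records unseen findings in seen and counts new versus duplicate.
func countNew(findings []Finding, seen map[string]bool) (newCount, dupCount int) {
	for _, f := range findings {
		k := dedupKey(f)
		if seen[k] {
			dupCount++
			continue
		}
		seen[k] = true
		newCount++
	}
	return newCount, dupCount
}

func summary(total, newCount, dupCount int) string {
	return fmt.Sprintf("Imported %d findings (%d new, %d duplicates)", total, newCount, dupCount)
}

func main() {
	seen := map[string]bool{} // stands in for the lookup against SQLite
	batch := []Finding{
		{"openai", "sk-abcdefghijklmnop", "config/app.env"},
		{"openai", "sk-abcdefghijklmnop", "config/app.env"}, // exact repeat
		{"openai", "sk-abcdefghijklmnop", "deploy.sh"},      // same key, other file
	}
	newCount, dupCount := countNew(batch, seen)
	fmt.Println(summary(len(batch), newCount, dupCount))
	// prints: Imported 3 findings (2 new, 1 duplicates)
}
```

Because the hash covers the masked key rather than the raw key, two different keys that share a mask and a source would count as duplicates; whether that trade-off is acceptable is a design question this sketch does not settle.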

</specifics>

<deferred>
## Deferred Ideas

- Import from arbitrary JSON formats via jsonpath config: over-engineering
- pre-push and post-merge hooks: pre-commit is enough for v1
- GitHub App integration for automatic scanning on PRs: separate project
- Semgrep/Snyk output format imports: defer to v2

</deferred>