docs(07-04): complete import command plan

salvacybersec
2026-04-06 00:00:24 +03:00
parent 9dbb0b87d4
commit ca526d8e32

@@ -0,0 +1,100 @@
---
phase: 07-import-cicd
plan: 04
subsystem: cmd/import
tags: [cli, importer, storage, dedup]
requires:
- pkg/importer (07-01, 07-02, 07-03)
- pkg/storage.SaveFinding
- cmd.openDBWithKey
provides:
- cmd.importCmd (keyhunter import)
- pkg/storage.FindingExistsByKey
affects:
- cmd/stubs.go (import stub removed)
tech-stack:
added: []
patterns:
- RunE extracted for direct test invocation
- Cross-import dedup via DB tuple lookup (no decrypt needed)
key-files:
created:
- cmd/import.go
- cmd/import_test.go
modified:
- pkg/storage/queries.go
- pkg/storage/queries_test.go
- cmd/stubs.go
decisions:
- Dedup identity = (provider, masked key, source path, line number); matches importer.FindingKey semantics and requires no key decryption at lookup time
- Reuse openDBWithKey helper rather than duplicating encryption bootstrap
- Report total findings, new, and duplicates (in-file + DB) in summary line
metrics:
duration: ~4 min
completed: 2026-04-06
requirements: [IMP-01, IMP-02, IMP-03]
---
# Phase 7 Plan 4: Import Command Wiring Summary
Wires `keyhunter import --format=trufflehog|gitleaks|gitleaks-csv <file>` end-to-end: parses external scanner output via pkg/importer, deduplicates in-file and against the existing KeyHunter database, and persists new findings to encrypted SQLite storage.
## What Was Built
- **cmd/import.go** — new `importCmd` with required `--format` flag dispatching to `TruffleHogImporter`, `GitleaksImporter`, or `GitleaksCSVImporter`. `runImport` opens the file, decodes, runs `importer.Dedup`, then for each unique finding checks `db.FindingExistsByKey` before `db.SaveFinding`. Emits `Imported N findings (M new, K duplicates)` to stdout where K combines in-file duplicates and pre-existing DB matches.
- **engineToStorage helper** — bridges the `engine.Source` / `storage.SourcePath` field name gap and defaults `DetectedAt`.
- **pkg/storage.FindingExistsByKey** — thin `SELECT 1 ... LIMIT 1` lookup keyed on `(provider_name, key_masked, source_path, line_number)`. Makes repeat imports idempotent without decrypting stored key values.
- **cmd/stubs.go** — `importCmd` stub block removed; new `var importCmd` in cmd/import.go takes over the identifier so no cmd/root.go change is required.
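The import flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in cmd/import.go: the `Finding` struct, the `Store` interface, the method signatures, and the in-memory `memStore` are hypothetical stand-ins for the real `pkg/storage` types, and it assumes N in the summary line counts every parsed finding, so that N = M + K.

```go
package main

import "fmt"

// Finding is a hypothetical, trimmed-down stand-in for the real finding type.
type Finding struct {
	Provider   string
	KeyMasked  string
	SourcePath string
	Line       int
}

// key is the dedup identity: (provider, masked key, source path, line number).
type key struct {
	provider, masked, path string
	line                   int
}

func keyOf(f Finding) key { return key{f.Provider, f.KeyMasked, f.SourcePath, f.Line} }

// Store is a hypothetical interface for the two storage calls the flow needs;
// the real method signatures may differ.
type Store interface {
	FindingExistsByKey(provider, masked, path string, line int) (bool, error)
	SaveFinding(f Finding) error
}

// memStore is an in-memory Store used only for this illustration.
type memStore struct{ seen map[key]bool }

func newMemStore() *memStore { return &memStore{seen: map[key]bool{}} }

func (m *memStore) FindingExistsByKey(provider, masked, path string, line int) (bool, error) {
	return m.seen[key{provider, masked, path, line}], nil
}

func (m *memStore) SaveFinding(f Finding) error {
	m.seen[keyOf(f)] = true
	return nil
}

// importFindings drops in-file repeats, skips findings already in the store,
// saves the rest, and returns the summary line.
func importFindings(db Store, findings []Finding) (string, error) {
	inFile := map[key]bool{}
	var unique []Finding
	for _, f := range findings {
		if k := keyOf(f); !inFile[k] {
			inFile[k] = true
			unique = append(unique, f)
		}
	}
	dups := len(findings) - len(unique) // in-file duplicates
	added := 0
	for _, f := range unique {
		exists, err := db.FindingExistsByKey(f.Provider, f.KeyMasked, f.SourcePath, f.Line)
		if err != nil {
			return "", fmt.Errorf("checking duplicate: %w", err)
		}
		if exists {
			dups++ // already present in the database
			continue
		}
		if err := db.SaveFinding(f); err != nil {
			return "", fmt.Errorf("saving finding: %w", err)
		}
		added++
	}
	return fmt.Sprintf("Imported %d findings (%d new, %d duplicates)", len(findings), added, dups), nil
}

func main() {
	db := newMemStore()
	sample := []Finding{
		{"openai", "sk-a...9f2", "app/.env", 3},
		{"github", "ghp_...1c0", "ci/deploy.yml", 12},
		{"stripe", "sk_l...77a", "billing/config.py", 40},
	}
	for i := 0; i < 2; i++ {
		line, err := importFindings(db, sample)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(line)
	}
	// Imported 3 findings (3 new, 0 duplicates)
	// Imported 3 findings (0 new, 3 duplicates)
}
```

Because the lookup tuple uses only the masked key, the existence check never has to touch the encrypted key column, which is what makes a repeat import cheap.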
## Tests
- `TestSelectImporter` — table covering trufflehog / gitleaks / gitleaks-csv / bogus / empty.
- `TestEngineToStorage` — verifies Source->SourcePath mapping and all verify_* fields.
- `TestRunImport_TruffleHogEndToEnd` — loads `pkg/importer/testdata/trufflehog-sample.json`, runs `runImport` twice: first pass asserts `Imported 3 findings (3 new, 0 duplicates)` and ≥3 rows in `db.ListFindings`; second pass asserts `0 new, 3 duplicates`.
- `TestRunImport_UnknownFormat` — asserts selectImporter surfaces the "unknown format" error.
- `TestRunImport_MissingFile` — asserts wrapped "opening" error for a nonexistent path.
- `TestFindingExistsByKey` — hit case plus four miss cases (each tuple field flipped).
All tests pass: `go build ./...` is clean and `go test ./cmd/... ./pkg/storage/... ./pkg/importer/...` reports ok for every package.
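For readers unfamiliar with the format dispatch that `TestSelectImporter` covers, a sketch under stated assumptions follows. The `Importer` interface and the three stub types are hypothetical placeholders for the real `pkg/importer` types, and the exact error text is assumed; only the three format names and the "unknown format" wording come from this summary.

```go
package main

import "fmt"

// Importer is a hypothetical minimal interface; the real one also decodes input.
type Importer interface{ Name() string }

// Stub importers standing in for the real pkg/importer types.
type truffleHogImporter struct{}
type gitleaksImporter struct{}
type gitleaksCSVImporter struct{}

func (truffleHogImporter) Name() string  { return "trufflehog" }
func (gitleaksImporter) Name() string    { return "gitleaks" }
func (gitleaksCSVImporter) Name() string { return "gitleaks-csv" }

// selectImporter maps a --format value to an importer and rejects anything else,
// including the empty string.
func selectImporter(format string) (Importer, error) {
	switch format {
	case "trufflehog":
		return truffleHogImporter{}, nil
	case "gitleaks":
		return gitleaksImporter{}, nil
	case "gitleaks-csv":
		return gitleaksCSVImporter{}, nil
	default:
		return nil, fmt.Errorf("unknown format %q (want trufflehog, gitleaks, or gitleaks-csv)", format)
	}
}

func main() {
	// Same five cases as the table in TestSelectImporter.
	for _, format := range []string{"trufflehog", "gitleaks", "gitleaks-csv", "bogus", ""} {
		imp, err := selectImporter(format)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("selected:", imp.Name())
	}
}
```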
## Deviations from Plan
- **[Rule 3 - Blocking]** The plan sketch left `openDBForImport` and `findingExistsInDB` as TODOs inside cmd/import.go. Both were resolved in place: `openDBForImport` became a direct call to the existing `openDBWithKey` helper (per the plan's executor note), and `findingExistsInDB` became a new `storage.FindingExistsByKey` method, so dedup runs as a single indexed SQL lookup instead of loading and decrypting every stored finding.
- **[Rule 2 - Missing critical functionality]** `cmd/stubs.go` was already stripped of the `hookCmd` block by a sibling wave-2 plan when this plan reached it. The import stub removal still applied cleanly; no conflict.
- Added `TestRunImport_UnknownFormat` and `TestRunImport_MissingFile` beyond the plan's test list to lock in error-path behavior, which the end-to-end test does not reach because it only exercises the success path.
## Verification
```
cd /home/salva/Documents/apikey
go build ./...
go test ./cmd/... ./pkg/storage/... ./pkg/importer/...
# ok github.com/salvacybersec/keyhunter/cmd 0.448s
# ok github.com/salvacybersec/keyhunter/pkg/storage 0.148s
# ok github.com/salvacybersec/keyhunter/pkg/importer (cached)
```
Manual smoke (matches `<verification>` block in plan):
```
go run ./cmd/keyhunter import --format=trufflehog pkg/importer/testdata/trufflehog-sample.json
# Imported 3 findings (3 new, 0 duplicates)
go run ./cmd/keyhunter import --format=trufflehog pkg/importer/testdata/trufflehog-sample.json
# Imported 3 findings (0 new, 3 duplicates)
```
The end-to-end test exercises this exact sequence against a tempdir DB.
## Commits
- `9dbb0b8` feat(07-04): wire keyhunter import command with dedup and DB persist
## Self-Check: PASSED
- cmd/import.go: FOUND
- cmd/import_test.go: FOUND
- pkg/storage/queries.go FindingExistsByKey: FOUND
- pkg/storage/queries_test.go TestFindingExistsByKey: FOUND
- cmd/stubs.go importCmd removed: CONFIRMED (grep empty)
- Commit 9dbb0b8: FOUND
- Tests green across cmd, pkg/storage, pkg/importer