diff --git a/.planning/phases/05-verification-engine/05-CONTEXT.md b/.planning/phases/05-verification-engine/05-CONTEXT.md
new file mode 100644
index 0000000..d669cdf
--- /dev/null
+++ b/.planning/phases/05-verification-engine/05-CONTEXT.md
@@ -0,0 +1,119 @@
+# Phase 5: Verification Engine - Context
+
+**Gathered:** 2026-04-05
+**Status:** Ready for planning
+**Mode:** Auto-generated
+
+
+## Phase Boundary
+
+Users can opt into active API key verification via `--verify` flag. The verifier makes HTTP calls to each provider's documented verify endpoint (from provider YAML), determines if the key is live, and extracts metadata (org name, permissions, rate limit tier) when available. First use shows a consent prompt with legal language. A LEGAL.md file is embedded in the binary for user reference.
+
+
+
+
+## Implementation Decisions
+
+### Consent & Legal (VRFY-01, VRFY-06)
+- **Consent prompt**: Shown on first `--verify` use only. State stored in `~/.keyhunter/verify-consent` (or settings table in SQLite DB — use DB for consistency)
+- **Prompt text**: Warn about making unsolicited API calls to third-party services. User must type "yes" (full word, case-insensitive) to proceed. Any other input → abort.
+- **LEGAL.md**: Embedded via `go:embed` in `pkg/legal/legal.go`. Content covers: unauthorized access laws (CFAA, Computer Misuse Act), consent from key owners, responsible disclosure, tool author disclaims liability.
+- **CLI command**: `keyhunter legal` prints the embedded LEGAL.md. `keyhunter config set verify.consent false` resets consent (forces re-prompt).
+
+### Verifier Architecture (VRFY-02)
+- **Package**: `pkg/verify/`
+- **Driver pattern**: Single generic `HTTPVerifier` that reads `VerifySpec` from provider YAML (already defined in pkg/providers/schema.go)
+- **No hardcoded verification per provider** — everything driven by YAML. If YAML lacks verify.url, the provider is skipped with a warning.
+- **Verification result struct**: `{Provider, KeyMasked, Status (live/dead/rate_limited/error/unknown), HTTPCode, Metadata map[string]string, ResponseTime, Error}`
+- **Concurrency**: ants worker pool (reuse pattern from engine), configurable via `--verify-workers`, default 10
+- **Per-key timeout**: `--verify-timeout` flag, default 10s, configurable per call
+- **Global rate limiting**: respect per-provider rate limit from YAML if specified
+
+### Verify Spec Schema (VRFY-02, VRFY-03)
+Current schema has `VerifySpec{URL, Method, Headers}`. Need to extend:
+- `SuccessCodes []int` (default [200])
+- `FailureCodes []int` (default [401, 403])
+- `RateLimitCodes []int` (default [429])
+- `MetadataPaths map[string]string` — JSONPath-like keys mapping to display names: `{"$.organization.name": "org", "$.rate_limit.tier": "tier"}`
+- **JSONPath**: use `github.com/tidwall/gjson` (pure Go, fast, zero deps)
+
+### Body Substitution
+- Some providers need the key in the body not header: `{"api_key": "{{KEY}}"}`. Support `{{KEY}}` substitution in `Body` field.
+- Support `{{KEY}}` substitution in `Headers` values (e.g., `Authorization: Bearer {{KEY}}`)
+
+### Metadata Extraction (VRFY-03)
+- After successful verification, parse response body as JSON (if content-type is JSON)
+- For each MetadataPaths entry, extract value via gjson and add to result.Metadata
+- Display metadata in output: `finding.Metadata = {"org": "Acme Corp", "tier": "plus"}`
+
+### Output Integration (VRFY-04)
+- Extend Finding struct in pkg/engine/finding.go: add `Verified bool`, `VerifyStatus string`, `VerifyMetadata map[string]string`
+- Extend storage schema: add `verified`, `verify_status`, `verify_metadata_json` columns to findings table (migration)
+- Output layer (pkg/output/table.go) displays verification badge: ✓ live / ✗ dead / ⚠ rate-limited / ? unknown
+
+### Verification Cycle (VRFY-05)
+- Scan → findings stream → for each finding, if --verify flag, route through verifier
+- Engine.Scan already returns chan Finding — wrap this in verifier stage when --verify is on
+- Verifier runs in its own goroutine pool, output goes to the same results channel
+- Timeout per key, not per scan
+
+### Provider YAML Updates
+- Existing providers in phases 2-3 have basic verify specs. This phase extends them with SuccessCodes, FailureCodes, RateLimitCodes, MetadataPaths, and Body templates where known
+- Priority: update the 12 Tier 1 providers this phase; Tier 2-9 can stay as basic HEAD/GET checks
+
+
+
+
+## Existing Code Insights
+
+### Reusable Assets
+- pkg/providers/schema.go — VerifySpec struct (needs extension)
+- pkg/engine/finding.go — Finding struct (needs extension)
+- pkg/storage/db.go + findings.go — storage layer (needs schema migration)
+- pkg/output/table.go — output rendering (needs verify columns)
+- cmd/scan.go — existing --verify flag from Phase 1 (was a placeholder)
+
+### New Package
+- pkg/verify/verifier.go — HTTPVerifier implementation
+- pkg/verify/consent.go — consent prompt logic
+- pkg/verify/spec.go — VerifySpec extensions
+- pkg/legal/legal.go — embedded LEGAL.md
+- LEGAL.md at repo root
+
+### Dependencies to add
+- github.com/tidwall/gjson — JSON path extraction
+
+
+
+
+## Specific Ideas
+
+### Metadata paths for Tier 1 providers (examples)
+- **OpenAI**: verify via `GET /v1/models` with `Authorization: Bearer {{KEY}}`; metadata: org from `/v1/organization/info` (admin key only)
+- **Anthropic**: verify via `POST /v1/messages` with minimal body (`{"model":"claude-haiku-4-5","max_tokens":1,"messages":[{"role":"user","content":"hi"}]}`) — 401 = dead, 200 = live
+- **Groq**: `GET /openai/v1/models` with `Authorization: Bearer {{KEY}}`
+- **Google AI**: `GET https://generativelanguage.googleapis.com/v1/models?key={{KEY}}`
+- **Cohere**: `GET /v1/models` with `Authorization: Bearer {{KEY}}`
+
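A minimal Go sketch of how the `{{KEY}}` substitution and the status-code defaults described under Implementation Decisions might fit together. The `VerifySpec` fields follow the names listed in this document; `withDefaults`, `classify`, and `render` are hypothetical helpers, and mapping unlisted status codes to `unknown` is an assumption rather than a decision recorded here.

```go
package main

import (
	"fmt"
	"strings"
)

// VerifySpec mirrors the fields named in this document. The real struct in
// pkg/providers/schema.go may differ (assumption).
type VerifySpec struct {
	URL            string
	Method         string
	Headers        map[string]string
	Body           string
	SuccessCodes   []int
	FailureCodes   []int
	RateLimitCodes []int
}

// withDefaults fills in the documented default code lists when a list is empty.
func withDefaults(s VerifySpec) VerifySpec {
	if len(s.SuccessCodes) == 0 {
		s.SuccessCodes = []int{200}
	}
	if len(s.FailureCodes) == 0 {
		s.FailureCodes = []int{401, 403}
	}
	if len(s.RateLimitCodes) == 0 {
		s.RateLimitCodes = []int{429}
	}
	return s
}

// contains reports whether code appears in codes.
func contains(codes []int, code int) bool {
	for _, c := range codes {
		if c == code {
			return true
		}
	}
	return false
}

// classify maps an HTTP status code to a result status.
// Codes in none of the lists map to "unknown" (assumption).
func classify(s VerifySpec, code int) string {
	s = withDefaults(s)
	switch {
	case contains(s.SuccessCodes, code):
		return "live"
	case contains(s.FailureCodes, code):
		return "dead"
	case contains(s.RateLimitCodes, code):
		return "rate_limited"
	default:
		return "unknown"
	}
}

// render substitutes the {{KEY}} placeholder in header values and in the body.
func render(s VerifySpec, key string) (map[string]string, string) {
	headers := make(map[string]string, len(s.Headers))
	for name, value := range s.Headers {
		headers[name] = strings.ReplaceAll(value, "{{KEY}}", key)
	}
	return headers, strings.ReplaceAll(s.Body, "{{KEY}}", key)
}

func main() {
	spec := VerifySpec{
		Headers: map[string]string{"Authorization": "Bearer {{KEY}}"},
		Body:    `{"api_key": "{{KEY}}"}`,
	}
	headers, body := render(spec, "sk-example")
	fmt.Println(headers["Authorization"], body)
	fmt.Println(classify(spec, 200), classify(spec, 401), classify(spec, 429), classify(spec, 500))
}
```

Keeping the defaults in one helper means a provider YAML that only sets `url` and `headers` still classifies responses the same way as the Tier 1 entries above.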
+### Rate limit handling
+- If provider returns 429, mark as `rate_limited`, queue for retry later
+- Respect Retry-After header if present
+- Global backoff: if 3 consecutive 429s for a provider, skip remaining keys for that provider
+
+### Security
+- Never log full keys — always use MaskKey
+- HTTPS only (reject http:// verify URLs)
+- Short timeouts to avoid hanging scans
+
+
+
+
+## Deferred Ideas
+
+- Scan resume from checkpoint after rate limit — out of scope for v1
+- Verify-only command (`keyhunter verify keys.json`) — can be added in Phase 6 output/keys phase
+- Per-provider custom verifier logic (go plugins) — complexity not justified
+- Machine-readable verify result export — covered by Phase 6 JSON output
+- Bulk org info fetching (multiple keys per request) — not supported by most providers
+
+
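The rules under Rate limit handling and Security could be sketched roughly as follows. `requireHTTPS`, `retryAfter`, and `backoff` are hypothetical names, the threshold of 3 comes from the global backoff bullet, and only the delay-in-seconds form of `Retry-After` is handled here (the header may also carry an HTTP date).

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// requireHTTPS rejects verify URLs whose scheme is not https.
func requireHTTPS(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "https" {
		return fmt.Errorf("verify URL must use https, got scheme %q", u.Scheme)
	}
	return nil
}

// retryAfter parses the delay-seconds form of a Retry-After header value.
// It returns false when the value is not a non-negative integer.
func retryAfter(header string) (time.Duration, bool) {
	secs, err := strconv.Atoi(header)
	if err != nil || secs < 0 {
		return 0, false
	}
	return time.Duration(secs) * time.Second, true
}

// backoff counts consecutive rate-limited results per provider.
// It is not safe for concurrent use; a worker pool would need a mutex around it.
type backoff struct {
	limit       int
	consecutive map[string]int
}

func newBackoff(limit int) *backoff {
	return &backoff{limit: limit, consecutive: make(map[string]int)}
}

// record notes one verification status; any status other than
// "rate_limited" resets the provider's streak.
func (b *backoff) record(provider, status string) {
	if status == "rate_limited" {
		b.consecutive[provider]++
		return
	}
	b.consecutive[provider] = 0
}

// shouldSkip reports whether the provider has reached the streak limit.
func (b *backoff) shouldSkip(provider string) bool {
	return b.consecutive[provider] >= b.limit
}

func main() {
	fmt.Println(requireHTTPS("http://api.example.com/v1/models"))

	if wait, ok := retryAfter("30"); ok {
		fmt.Println("retry after", wait)
	}

	b := newBackoff(3)
	for i := 0; i < 3; i++ {
		b.record("openai", "rate_limited")
	}
	fmt.Println("skip openai:", b.shouldSkip("openai"))
}
```

Tracking the streak per provider keeps one throttled provider from stalling verification for the others, which matches the "skip remaining keys for that provider" wording above.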