---
phase: 05-verification-engine
plan: 03
subsystem: verify
tags: [verification, http, ants, gjson, tdd]
requires:
  - pkg/providers (VerifySpec, Registry)
  - pkg/engine (Finding, MaskKey)
  - github.com/tidwall/gjson
  - github.com/panjf2000/ants/v2
provides:
  - pkg/verify.HTTPVerifier
  - pkg/verify.Result + Status constants
  - pkg/verify.HTTPVerifier.Verify (single-key)
  - pkg/verify.HTTPVerifier.VerifyAll (parallel worker pool)
  - providers.NewRegistryFromProviders (test helper)
affects:
  - pkg/providers/registry.go (added NewRegistryFromProviders)
tech-stack:
  added: []
  patterns:
    - TDD (RED/GREEN per task)
    - YAML-driven classification (no provider-name switches in verifier.go)
    - bounded buffered result channel + ants worker pool
key-files:
  created:
    - pkg/verify/result.go
    - pkg/verify/verifier.go
    - pkg/verify/verifier_test.go
  modified:
    - pkg/providers/registry.go
decisions:
  - "{{KEY}} (plus legacy {KEY}) is substituted in URL, header values, and body via a single substituteKey helper; no per-site templating engine."
  - "Metadata extraction is gated on StatusLive + application/json Content-Type; body is capped at 1 MiB to bound memory under malicious responses."
  - "HTTPS-only enforced at call time (not schema time) so YAML authors can still write http:// and see a clear StatusError instead of a silent success."
  - "VerifyAll always emits exactly one Result per input finding (including provider-not-found as StatusUnknown) so callers can pair results to findings by count."
  - "Worker pool uses ants.NewPool with a buffered channel sized to len(findings); back-pressure never blocks workers, and close happens only after pool.Release + wg.Wait."
metrics:
  tasks: 2
  duration_seconds: 245
  files_created: 3
  files_modified: 1
  test_cases: 13
  completed: 2026-04-05
---

# Phase 05 Plan 03: HTTPVerifier Core Summary

Built the YAML-driven HTTPVerifier that classifies a single provider verify endpoint call into live/dead/rate_limited/error/unknown, substitutes `{{KEY}}` into URL/headers/body, extracts JSON metadata via gjson, and scales to many findings through an ants worker pool.

## What Shipped

### `pkg/verify/result.go`

- `Status*` string constants (`live`, `dead`, `rate_limited`, `error`, `unknown`).
- `Result` struct carrying `ProviderName`, `KeyMasked`, `Status`, `HTTPCode`, `Metadata`, `RetryAfter`, `ResponseTime`, `Error`.

### `pkg/verify/verifier.go`

- `HTTPVerifier` with a TLS 1.2+ `http.Client`, per-call `Timeout` (default 10s), and a `NewHTTPVerifier(timeout)` constructor.
- `Verify(ctx, finding, provider) Result`:
  - Empty `spec.URL` → `StatusUnknown` (provider skipped).
  - `http://` URL → `StatusError` with `"verify URL must be HTTPS"`.
  - Default method is `GET`; respects `spec.Method` when set.
  - `{{KEY}}` and legacy `{KEY}` substituted in URL, each header value, and body.
  - Per-call `context.WithTimeout(ctx, v.Timeout)`; ctx cancellation maps to `StatusError` with a timeout/deadline/canceled message.
  - Classification order: `EffectiveSuccessCodes` → live, `EffectiveFailureCodes` → dead, `EffectiveRateLimitCodes` → rate-limited (with `Retry-After` seconds parsed into `Result.RetryAfter`), else unknown.
  - Metadata only on live + `application/json`; body capped at 1 MiB via `io.LimitReader`; `gjson.GetBytes` per `MetadataPaths` entry.
- `VerifyAll(ctx, findings, reg, workers) <-chan Result`:
  - `workers <= 0` falls back to `DefaultWorkers` (10).
  - `ants.NewPool(workers)`; each finding submitted as a task that resolves its provider via `reg.Get`, runs `Verify`, and emits exactly one `Result`.
  - Missing providers → `Result{Status: StatusUnknown, Error: "provider not found in registry"}`.
  - `ctx.Err() != nil` stops further dispatch; in-flight tasks drain, and `pool.Release()` + `wg.Wait()` guarantee a clean close of the result channel.
  - Pool-init failure degrades gracefully: emits one `StatusError` Result per finding and closes.

### `pkg/providers/registry.go`

- `NewRegistryFromProviders([]Provider) *Registry`: lightweight constructor that skips YAML loading and builds the Aho-Corasick automaton from the caller-supplied providers. Enables unit tests in `pkg/verify` (and future packages) to work with synthetic providers without touching embedded YAML.

## Tests

13 new tests, all green under `-race`.

**Single-key (`TestVerify_*`):**

- `Live_200`, `Dead_401`, `RateLimited_429_WithRetryAfter`
- `MetadataExtraction` (nested path `organization.name` + top-level `tier`)
- `KeySubstitution_InHeader`, `KeySubstitution_InBody`, `KeySubstitution_InURL`
- `MissingURL_Unknown`, `HTTPRejected`, `Timeout` (50ms timeout vs 300ms server)

**Worker pool (`TestVerifyAll_*`):**

- `MultipleFindings`: 5 findings, 3 workers, all live + hit counter.
- `MissingProvider`: unknown provider yields StatusUnknown and single-result-then-close.
- `ContextCancellation`: 100 findings behind a 100ms server, cancel after 50ms, asserts channel closes within 3s with strictly fewer than 100 results.

```
go test ./pkg/verify/... -race -count=1
ok  github.com/salvacybersec/keyhunter/pkg/verify  1.557s
go build ./...   # clean
```

YAML-driven check:

```
grep -i 'openai\|anthropic\|groq' pkg/verify/verifier.go
# no matches
```

## Requirements Satisfied

- **VRFY-02**: Single `HTTPVerifier` drives every provider via `VerifySpec`; no per-provider branches in the verifier.
- **VRFY-03**: JSON metadata extracted via `gjson` paths from `VerifySpec.MetadataPaths` on live responses.
- **VRFY-05**: Per-call timeout honored (`Timeout` field, default 10s); configurable through `NewHTTPVerifier`.
## Deviations from Plan

**[Rule 3 - Blocking] Added `providers.NewRegistryFromProviders` test helper.**

- **Found during:** Task 2. `verifier_test.go` needed a Registry containing only a synthetic `testprov` provider, but `providers.Registry` only exposed `NewRegistry()` (loads all embedded YAML) and had unexported fields.
- **Fix:** Added a small exported constructor that accepts a `[]Provider`, builds the Aho-Corasick automaton inline, and returns a ready Registry. Keeps the package's invariants intact.
- **Files modified:** `pkg/providers/registry.go`
- **Commit:** `45ee2f8`

**[Rule 3 - Blocking, transient] Cross-plan build break from parallel Plan 05-02 RED commit.**

- **Found during:** Task 1 GREEN verification run.
- **Issue:** Plan 05-02 (consent prompt) runs in the same wave and had committed `consent_test.go` with a failing RED step referencing `EnsureConsent`, `ConsentSettingKey`, `ConsentGranted`, and `ConsentDeclined`, which made the whole `pkg/verify` package non-buildable and blocked my tests from compiling.
- **Fix:** During investigation, the parallel 05-02 agent's GREEN commit (`d4c1403 feat(05-02): implement EnsureConsent`) landed on `master`, so no stub file from this plan was needed. Verified both test suites (6 consent tests + 13 verify tests) pass together.
- **Files modified:** none (05-02's commit resolved it)
- **Note:** This kind of race is expected when wave-1 plans share a working directory instead of isolated worktrees. Worth flagging for the orchestrator.

No architectural deviations. No auth gates. No deferred issues.

## Key Decisions

1. **Substitution helper is string-level, not template-engine.** `{{KEY}}` (and legacy `{KEY}`) are `strings.ReplaceAll` targets. No `text/template`, no escaping; raw byte-for-byte replacement. This matches what every existing provider YAML expects and avoids the footgun of accidental template directives in header values.
2. **Metadata is gated on StatusLive.** There is no good reason to parse a 401 body for "org name"; the response is typically an error blob with unrelated fields. Gating on `StatusLive` keeps `Result.Metadata` meaningful.
3. **1 MiB body cap for metadata extraction.** Defensive against hostile/runaway JSON responses. Legitimate verify endpoints return a few KB at most; 1 MiB is 1000× headroom.
4. **One Result per Finding, always.** `VerifyAll` emits a result for every input finding: success, failure, missing provider, or pool-init error. Callers can count findings vs results and never have to reconcile gaps.
5. **`Result.ProviderName` and `KeyMasked` populated even on error paths.** Downstream consumers can display the masked key + provider next to the failure reason without a second lookup.

## Files

**Created**

- `pkg/verify/result.go`
- `pkg/verify/verifier.go`
- `pkg/verify/verifier_test.go`

**Modified**

- `pkg/providers/registry.go`: `NewRegistryFromProviders` helper.

## Commits

- `3ceccd9` test(05-03): add failing tests for HTTPVerifier single-key verification (RED)
- `3dfe727` feat(05-03): implement HTTPVerifier single-key verification (GREEN)
- `45ee2f8` test(05-03): add failing tests for VerifyAll worker pool (RED)
- `35c7759` feat(05-03): add VerifyAll ants worker pool for parallel verification (GREEN)

## Known Stubs

None. All values flowing through `Result` are derived from live HTTP responses or documented error paths; no hardcoded placeholder data.

## Next Up

- Plan 05-04 (verify spec completeness for Tier 1 providers), already in progress in a sibling worktree.
- Plan 05-05 will wire `HTTPVerifier.VerifyAll` into the scan pipeline behind `--verify`, reusing the `Result` channel shape defined here.

## Self-Check: PASSED

- All 5 listed files present on disk.
- All 4 commits present in git log.
- `go test ./pkg/verify/... -race -count=1` → ok
- `go build ./...` → clean