Files
keyhunter/.planning/phases/05-verification-engine/05-03-SUMMARY.md
salvacybersec c6968a4a72 docs(05-03): complete HTTPVerifier core plan
- SUMMARY.md with decisions, metrics, deviations, self-check
- STATE.md advanced, requirements VRFY-02/03/05 marked complete
- ROADMAP.md plan progress updated
2026-04-05 15:51:23 +03:00

168 lines
9.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
phase: 05-verification-engine
plan: 03
subsystem: verify
tags: [verification, http, ants, gjson, tdd]
requires:
- pkg/providers (VerifySpec, Registry)
- pkg/engine (Finding, MaskKey)
- github.com/tidwall/gjson
- github.com/panjf2000/ants/v2
provides:
- pkg/verify.HTTPVerifier
- pkg/verify.Result + Status constants
- pkg/verify.HTTPVerifier.Verify (single-key)
- pkg/verify.HTTPVerifier.VerifyAll (parallel worker pool)
- providers.NewRegistryFromProviders (test helper)
affects:
- pkg/providers/registry.go (added NewRegistryFromProviders)
tech-stack:
added: []
patterns:
- TDD (RED/GREEN per task)
- YAML-driven classification (no provider-name switches in verifier.go)
- bounded buffered result channel + ants worker pool
key-files:
created:
- pkg/verify/result.go
- pkg/verify/verifier.go
- pkg/verify/verifier_test.go
modified:
- pkg/providers/registry.go
decisions:
- "{{KEY}} (plus legacy {KEY}) is substituted in URL, header values, and body via a single substituteKey helper — no per-site templating engine."
- "Metadata extraction is gated on StatusLive + application/json Content-Type; body is capped at 1 MiB to bound memory under malicious responses."
- "HTTPS-only enforced at call time (not schema time) so YAML authors can still write http:// and see a clear StatusError instead of a silent success."
- "VerifyAll always emits exactly one Result per input finding (including provider-not-found as StatusUnknown) so callers can pair results to findings by count."
- "Worker pool uses ants.NewPool with a buffered channel sized to len(findings); back-pressure never blocks workers, and close happens only after pool.Release + wg.Wait."
metrics:
tasks: 2
duration_seconds: 245
files_created: 3
files_modified: 1
test_cases: 13
completed: 2026-04-05
---
# Phase 05 Plan 03: HTTPVerifier Core Summary
Built the YAML-driven HTTPVerifier that classifies a single provider verify endpoint call into live/dead/rate_limited/error/unknown, substitutes `{{KEY}}` into URL/headers/body, extracts JSON metadata via gjson, and scales to many findings through an ants worker pool.
## What Shipped
### `pkg/verify/result.go`
- `Status*` string constants (`live`, `dead`, `rate_limited`, `error`, `unknown`).
- `Result` struct carrying `ProviderName`, `KeyMasked`, `Status`, `HTTPCode`, `Metadata`, `RetryAfter`, `ResponseTime`, `Error`.
### `pkg/verify/verifier.go`
- `HTTPVerifier` with a TLS 1.2+ `http.Client`, per-call `Timeout` (default 10s), and a `NewHTTPVerifier(timeout)` constructor.
- `Verify(ctx, finding, provider) Result`:
- Empty `spec.URL``StatusUnknown` (provider skipped).
- `http://` URL → `StatusError` with `"verify URL must be HTTPS"`.
- Default method is `GET`; respects `spec.Method` when set.
- `{{KEY}}` and legacy `{KEY}` substituted in URL, each header value, and body.
- Per-call `context.WithTimeout(ctx, v.Timeout)` — ctx cancellation maps to `StatusError` with a timeout/deadline/canceled message.
- Classification order: `EffectiveSuccessCodes` → live, `EffectiveFailureCodes` → dead, `EffectiveRateLimitCodes` → rate-limited (with `Retry-After` seconds parsed into `Result.RetryAfter`), else unknown.
- Metadata only on live + `application/json`; body capped at 1 MiB via `io.LimitReader`; `gjson.GetBytes` per `MetadataPaths` entry.
- `VerifyAll(ctx, findings, reg, workers) <-chan Result`:
- `workers <= 0` falls back to `DefaultWorkers` (10).
- `ants.NewPool(workers)`; each finding submitted as a task that resolves its provider via `reg.Get`, runs `Verify`, and emits exactly one `Result`.
- Missing providers → `Result{Status: StatusUnknown, Error: "provider not found in registry"}`.
- `ctx.Err() != nil` stops further dispatch; inflight tasks drain, `pool.Release()` + `wg.Wait()` guarantee clean close of the result channel.
- Pool-init failure degrades gracefully: emits one `StatusError` Result per finding and closes.
### `pkg/providers/registry.go`
- `NewRegistryFromProviders([]Provider) *Registry` — lightweight constructor that skips YAML loading and builds the Aho-Corasick automaton from the caller-supplied providers. Enables unit tests in `pkg/verify` (and future packages) to work with synthetic providers without touching embedded YAML.
## Tests
13 new tests — all green under `-race`.
**Single-key (`TestVerify_*`):**
- `Live_200`, `Dead_401`, `RateLimited_429_WithRetryAfter`
- `MetadataExtraction` (nested path `organization.name` + top-level `tier`)
- `KeySubstitution_InHeader`, `KeySubstitution_InBody`, `KeySubstitution_InURL`
- `MissingURL_Unknown`, `HTTPRejected`, `Timeout` (50ms timeout vs 300ms server)
**Worker pool (`TestVerifyAll_*`):**
- `MultipleFindings` — 5 findings, 3 workers, all live + hit counter.
- `MissingProvider` — unknown provider yields StatusUnknown and single-result-then-close.
- `ContextCancellation` — 100 findings behind a 100ms server, cancel after 50ms, asserts channel closes within 3s with strictly fewer than 100 results.
```
go test ./pkg/verify/... -race -count=1
ok github.com/salvacybersec/keyhunter/pkg/verify 1.557s
go build ./... # clean
```
YAML-driven check:
```
grep -i 'openai\|anthropic\|groq' pkg/verify/verifier.go
# no matches
```
## Requirements Satisfied
- **VRFY-02** — Single `HTTPVerifier` drives every provider via `VerifySpec`; no per-provider branches in the verifier.
- **VRFY-03** — JSON metadata extracted via `gjson` paths from `VerifySpec.MetadataPaths` on live responses.
- **VRFY-05** — Per-call timeout honored (`Timeout` field, default 10s); configurable through `NewHTTPVerifier`.
## Deviations from Plan
**[Rule 3 — Blocking] Added `providers.NewRegistryFromProviders` test helper.**
- **Found during:** Task 2 — verifier_test.go needed a Registry containing only a synthetic `testprov` provider, but `providers.Registry` only exposed `NewRegistry()` (loads all embedded YAML) and had unexported fields.
- **Fix:** Added a small exported constructor that accepts a `[]Provider`, builds the Aho-Corasick automaton inline, and returns a ready Registry. Keeps the package's invariants intact.
- **Files modified:** `pkg/providers/registry.go`
- **Commit:** `45ee2f8`
**[Rule 3 — Blocking, transient] Cross-plan build break from parallel Plan 05-02 RED commit.**
- **Found during:** Task 1 GREEN verification run.
- **Issue:** Plan 05-02 (consent prompt) runs in the same wave and had committed `consent_test.go` with a failing RED step referencing `EnsureConsent`, `ConsentSettingKey`, `ConsentGranted`, and `ConsentDeclined` — making the whole `pkg/verify` package non-buildable and blocking my tests from compiling.
- **Fix:** During investigation, the parallel 05-02 agent's GREEN commit (`d4c1403 feat(05-02): implement EnsureConsent`) landed on `master`, so no stub file from this plan was needed. Verified both test suites (6 consent tests + 13 verify tests) pass together.
- **Files modified:** none (05-02's commit resolved it)
- **Note:** This kind of race is expected when wave-1 plans share a working directory instead of isolated worktrees. Worth flagging for the orchestrator.
No architectural deviations. No auth gates. No deferred issues.
## Key Decisions
1. **Substitution helper is string-level, not template-engine.** `{{KEY}}` (and legacy `{KEY}`) are `strings.ReplaceAll` targets. No `text/template`, no escaping — raw byte-for-byte replacement. This matches what every existing provider YAML expects and avoids the footgun of accidental template directives in header values.
2. **Metadata is gated on StatusLive.** There is no good reason to parse a 401 body for "org name" — the response is typically an error blob with unrelated fields. Gating on `StatusLive` keeps `Result.Metadata` meaningful.
3. **1 MiB body cap for metadata extraction.** Defensive against hostile/runaway JSON responses. Legitimate verify endpoints return a few KB at most; 1 MiB is 1000× headroom.
4. **One Result per Finding, always.** `VerifyAll` emits a result for every input finding — success, failure, missing provider, or pool-init error. Callers can count findings vs results and never have to reconcile gaps.
5. **`Result.ProviderName` and `KeyMasked` populated even on error paths.** Downstream consumers can display the masked key + provider next to the failure reason without a second lookup.
## Files
**Created**
- `pkg/verify/result.go`
- `pkg/verify/verifier.go`
- `pkg/verify/verifier_test.go`
**Modified**
- `pkg/providers/registry.go``NewRegistryFromProviders` helper.
## Commits
- `3ceccd9` test(05-03): add failing tests for HTTPVerifier single-key verification (RED)
- `3dfe727` feat(05-03): implement HTTPVerifier single-key verification (GREEN)
- `45ee2f8` test(05-03): add failing tests for VerifyAll worker pool (RED)
- `35c7759` feat(05-03): add VerifyAll ants worker pool for parallel verification (GREEN)
## Known Stubs
None. All values flowing through `Result` are derived from live HTTP responses or documented error paths; no hardcoded placeholder data.
## Next Up
- Plan 05-04 (verify spec completeness for Tier 1 providers) — already in progress in a sibling worktree.
- Plan 05-05 will wire `HTTPVerifier.VerifyAll` into the scan pipeline behind `--verify`, reusing the `Result` channel shape defined here.
## Self-Check: PASSED
- All 5 listed files present on disk.
- All 4 commits present in git log.
- `go test ./pkg/verify/... -race -count=1` → ok
- `go build ./...` → clean