salvacybersec c6968a4a72 docs(05-03): complete HTTPVerifier core plan
- SUMMARY.md with decisions, metrics, deviations, self-check
- STATE.md advanced, requirements VRFY-02/03/05 marked complete
- ROADMAP.md plan progress updated
2026-04-05 15:51:23 +03:00


phase: 05-verification-engine
plan: 03
subsystem: verify
tags: verification, http, ants, gjson, tdd
requires:
  • pkg/providers (VerifySpec, Registry)
  • pkg/engine (Finding, MaskKey)
  • github.com/tidwall/gjson
  • github.com/panjf2000/ants/v2
provides:
  • pkg/verify.HTTPVerifier
  • pkg/verify.Result + Status constants
  • pkg/verify.HTTPVerifier.Verify (single-key)
  • pkg/verify.HTTPVerifier.VerifyAll (parallel worker pool)
  • providers.NewRegistryFromProviders (test helper)
affects:
  • pkg/providers/registry.go (added NewRegistryFromProviders)
tech-stack:
  added: (empty)
  patterns:
    • TDD (RED/GREEN per task)
    • YAML-driven classification (no provider-name switches in verifier.go)
    • bounded buffered result channel + ants worker pool
key-files:
  created:
    • pkg/verify/result.go
    • pkg/verify/verifier.go
    • pkg/verify/verifier_test.go
  modified:
    • pkg/providers/registry.go
decisions:
  • {{KEY}} (plus legacy {KEY}) is substituted in URL, header values, and body via a single substituteKey helper; there is no per-site templating engine.
  • Metadata extraction is gated on StatusLive + application/json Content-Type; body is capped at 1 MiB to bound memory under malicious responses.
  • HTTPS-only enforced at call time (not schema time) so YAML authors can still write http:// and see a clear StatusError instead of a silent success.
  • VerifyAll always emits exactly one Result per input finding (including provider-not-found as StatusUnknown) so callers can pair results to findings by count.
  • Worker pool uses ants.NewPool with a buffered channel sized to len(findings); back-pressure never blocks workers, and close happens only after pool.Release + wg.Wait.
metrics:
  tasks: 2
  duration_seconds: 245
  files_created: 3
  files_modified: 1
  test_cases: 13
  completed: 2026-04-05

Phase 05 Plan 03: HTTPVerifier Core Summary

Built the YAML-driven HTTPVerifier that classifies a single provider verify endpoint call into live/dead/rate_limited/error/unknown, substitutes {{KEY}} into URL/headers/body, extracts JSON metadata via gjson, and scales to many findings through an ants worker pool.
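
The key substitution mentioned above can be sketched in a few lines. This is a minimal illustration that assumes a helper shaped like substituteKey(s, key string) string; the real signature is not shown in this summary, and the URL and key values are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// substituteKey replaces the {{KEY}} placeholder first and the legacy {KEY}
// form second. Doing {{KEY}} first keeps the legacy pass from turning
// "{{KEY}}" into "{<key>}".
// Hypothetical sketch: the name matches the summary, the signature is assumed.
func substituteKey(s, key string) string {
	s = strings.ReplaceAll(s, "{{KEY}}", key)
	return strings.ReplaceAll(s, "{KEY}", key)
}

func main() {
	key := "sk-test-123" // illustrative value only

	// The same helper is applied to the URL, each header value, and the body.
	fmt.Println(substituteKey("https://api.example.com/v1/me?token={{KEY}}", key))
	fmt.Println(substituteKey("Bearer {KEY}", key))
	fmt.Println(substituteKey(`{"api_key":"{{KEY}}"}`, key))
}
```

Because this is plain string replacement, nothing in a header value or body is ever interpreted as a template directive.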

What Shipped

pkg/verify/result.go

  • Status* string constants (live, dead, rate_limited, error, unknown).
  • Result struct carrying ProviderName, KeyMasked, Status, HTTPCode, Metadata, RetryAfter, ResponseTime, Error.
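
For orientation, the two declarations above might look roughly like the sketch below. Only the field names and the status strings come from this summary. The field types and the constant names StatusDead and StatusRateLimited are assumptions, and the sample values are invented.

```go
package main

import (
	"fmt"
	"time"
)

// Status values as listed in the summary. StatusLive, StatusError and
// StatusUnknown are named in the text; the other two names are assumed.
const (
	StatusLive        = "live"
	StatusDead        = "dead"
	StatusRateLimited = "rate_limited"
	StatusError       = "error"
	StatusUnknown     = "unknown"
)

// Result mirrors the field list from the summary. All types are assumptions.
type Result struct {
	ProviderName string
	KeyMasked    string
	Status       string
	HTTPCode     int
	Metadata     map[string]string
	RetryAfter   time.Duration
	ResponseTime time.Duration
	Error        string
}

func main() {
	r := Result{
		ProviderName: "testprov",
		KeyMasked:    "sk-t...123", // illustrative masked form
		Status:       StatusRateLimited,
		HTTPCode:     429,
		RetryAfter:   30 * time.Second,
	}
	fmt.Printf("%+v\n", r)
}
```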

pkg/verify/verifier.go

  • HTTPVerifier with a TLS 1.2+ http.Client, per-call Timeout (default 10s), and a NewHTTPVerifier(timeout) constructor.
  • Verify(ctx, finding, provider) Result:
    • Empty spec.URL → StatusUnknown (provider skipped).
    • http:// URL → StatusError with "verify URL must be HTTPS".
    • Default method is GET; respects spec.Method when set.
    • {{KEY}} and legacy {KEY} substituted in URL, each header value, and body.
    • Per-call context.WithTimeout(ctx, v.Timeout) — ctx cancellation maps to StatusError with a timeout/deadline/canceled message.
    • Classification order: EffectiveSuccessCodes → live, EffectiveFailureCodes → dead, EffectiveRateLimitCodes → rate_limited (with Retry-After seconds parsed into Result.RetryAfter), else unknown.
    • Metadata only on live + application/json; body capped at 1 MiB via io.LimitReader; gjson.GetBytes per MetadataPaths entry.
  • VerifyAll(ctx, findings, reg, workers) <-chan Result:
    • workers <= 0 falls back to DefaultWorkers (10).
    • ants.NewPool(workers); each finding submitted as a task that resolves its provider via reg.Get, runs Verify, and emits exactly one Result.
    • Missing providers → Result{Status: StatusUnknown, Error: "provider not found in registry"}.
    • ctx.Err() != nil stops further dispatch; inflight tasks drain, pool.Release() + wg.Wait() guarantee clean close of the result channel.
    • Pool-init failure degrades gracefully: emits one StatusError Result per finding and closes.
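
The classification order listed for Verify can be illustrated with a small standalone function. This is a sketch under stated assumptions: codeSets is a hypothetical stand-in for the Effective*Codes lists of a VerifySpec, classify is not the verifier's actual function, and only the integer-seconds form of Retry-After is handled.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// codeSets is a hypothetical stand-in for the three Effective*Codes lists
// that the summary says a VerifySpec exposes.
type codeSets struct {
	Success   []int
	Failure   []int
	RateLimit []int
}

func contains(codes []int, code int) bool {
	for _, c := range codes {
		if c == code {
			return true
		}
	}
	return false
}

// classify checks the lists in the documented order: success, failure,
// rate limit, else unknown. retryAfter is the raw Retry-After header value.
func classify(spec codeSets, code int, retryAfter string) (string, time.Duration) {
	switch {
	case contains(spec.Success, code):
		return "live", 0
	case contains(spec.Failure, code):
		return "dead", 0
	case contains(spec.RateLimit, code):
		secs, err := strconv.Atoi(retryAfter)
		if err != nil || secs < 0 {
			return "rate_limited", 0 // header missing or not plain seconds
		}
		return "rate_limited", time.Duration(secs) * time.Second
	default:
		return "unknown", 0
	}
}

func main() {
	spec := codeSets{Success: []int{200}, Failure: []int{401, 403}, RateLimit: []int{429}}
	for _, code := range []int{200, 401, 429, 500} {
		status, wait := classify(spec, code, "30")
		fmt.Println(code, status, wait)
	}
}
```

Nothing here branches on a provider name. The code lists carry all the provider-specific knowledge, which is the point of the YAML-driven design.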

pkg/providers/registry.go

  • NewRegistryFromProviders([]Provider) *Registry — lightweight constructor that skips YAML loading and builds the Aho-Corasick automaton from the caller-supplied providers. Enables unit tests in pkg/verify (and future packages) to work with synthetic providers without touching embedded YAML.

Tests

13 new tests — all green under -race.

Single-key (TestVerify_*):

  • Live_200, Dead_401, RateLimited_429_WithRetryAfter
  • MetadataExtraction (nested path organization.name + top-level tier)
  • KeySubstitution_InHeader, KeySubstitution_InBody, KeySubstitution_InURL
  • MissingURL_Unknown, HTTPRejected, Timeout (50ms timeout vs 300ms server)
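
The Timeout case above (50ms budget against a 300ms server) follows a common httptest pattern. The sketch below reproduces that pattern with a plain net/http client rather than HTTPVerifier, so probe, slowServer, and timeoutDemo are hypothetical names and the real test may be structured differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probe issues a GET bounded by timeout and reports whether the failure,
// if any, was caused by the deadline expiring.
func probe(url string, timeout time.Duration) (bool, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return errors.Is(err, context.DeadlineExceeded), err
	}
	resp.Body.Close()
	return false, nil
}

// slowServer answers 200 after delay, or gives up early if the client leaves.
func slowServer(delay time.Duration) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(delay):
			w.WriteHeader(http.StatusOK)
		case <-r.Context().Done():
		}
	}))
}

// timeoutDemo runs the 50ms-vs-300ms scenario once.
func timeoutDemo() (bool, bool) {
	srv := slowServer(300 * time.Millisecond)
	defer srv.Close()
	timedOut, err := probe(srv.URL, 50*time.Millisecond)
	return timedOut, err != nil
}

func main() {
	timedOut, failed := timeoutDemo()
	fmt.Println("timed out:", timedOut, "request failed:", failed)
}
```

In the verifier itself, the summary states that this kind of failure is mapped to StatusError with a timeout/deadline/canceled message.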

Worker pool (TestVerifyAll_*):

  • MultipleFindings — 5 findings, 3 workers, all live + hit counter.
  • MissingProvider — unknown provider yields StatusUnknown and single-result-then-close.
  • ContextCancellation — 100 findings behind a 100ms server, cancel after 50ms, asserts channel closes within 3s with strictly fewer than 100 results.

go test ./pkg/verify/... -race -count=1
ok  	github.com/salvacybersec/keyhunter/pkg/verify	1.557s
go build ./...  # clean

YAML-driven check:

grep -i 'openai\|anthropic\|groq' pkg/verify/verifier.go
# no matches

Requirements Satisfied

  • VRFY-02 — Single HTTPVerifier drives every provider via VerifySpec; no per-provider branches in the verifier.
  • VRFY-03 — JSON metadata extracted via gjson paths from VerifySpec.MetadataPaths on live responses.
  • VRFY-05 — Per-call timeout honored (Timeout field, default 10s); configurable through NewHTTPVerifier.
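
To make the VRFY-03 behaviour concrete, here is a sketch of gated, size-capped metadata extraction. It deliberately swaps gjson for the standard library (encoding/json plus a small dotted-path walker) so that it runs without third-party modules. The shape of MetadataPaths as a name-to-path map, and the names extractMetadata, lookup, and demoMetadata, are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

const maxBody = 1 << 20 // 1 MiB cap, as described in the summary

// lookup walks a dotted path such as "organization.name" through decoded
// JSON objects. It is a simplified stand-in for a gjson path query.
func lookup(doc any, path string) (any, bool) {
	cur := doc
	for _, part := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = obj[part]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// extractMetadata returns nil unless the result is live and the response is
// JSON. It never reads more than maxBody bytes of the body.
func extractMetadata(status, contentType string, body io.Reader, paths map[string]string) map[string]string {
	if status != "live" || !strings.HasPrefix(strings.ToLower(contentType), "application/json") {
		return nil
	}
	raw, err := io.ReadAll(io.LimitReader(body, maxBody))
	if err != nil {
		return nil
	}
	var doc any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil
	}
	out := make(map[string]string)
	for name, path := range paths {
		if v, ok := lookup(doc, path); ok {
			out[name] = fmt.Sprint(v)
		}
	}
	return out
}

// demoMetadata feeds a fixed body through extractMetadata for a given status.
func demoMetadata(status string) map[string]string {
	body := strings.NewReader(`{"organization":{"name":"Acme"},"tier":"pro"}`)
	paths := map[string]string{"org": "organization.name", "tier": "tier"}
	return extractMetadata(status, "application/json; charset=utf-8", body, paths)
}

func main() {
	fmt.Println(demoMetadata("live"))
	fmt.Println(demoMetadata("dead"))
}
```

A body larger than the cap is truncated and then fails to parse, so an oversized response yields no metadata instead of unbounded memory use.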

Deviations from Plan

[Rule 3 — Blocking] Added providers.NewRegistryFromProviders test helper.

  • Found during: Task 2 — verifier_test.go needed a Registry containing only a synthetic testprov provider, but providers.Registry only exposed NewRegistry() (loads all embedded YAML) and had unexported fields.
  • Fix: Added a small exported constructor that accepts a []Provider, builds the Aho-Corasick automaton inline, and returns a ready Registry. Keeps the package's invariants intact.
  • Files modified: pkg/providers/registry.go
  • Commit: 45ee2f8

[Rule 3 — Blocking, transient] Cross-plan build break from parallel Plan 05-02 RED commit.

  • Found during: Task 1 GREEN verification run.
  • Issue: Plan 05-02 (consent prompt) runs in the same wave and had committed consent_test.go with a failing RED step referencing EnsureConsent, ConsentSettingKey, ConsentGranted, and ConsentDeclined — making the whole pkg/verify package non-buildable and blocking my tests from compiling.
  • Fix: During investigation, the parallel 05-02 agent's GREEN commit (d4c1403 feat(05-02): implement EnsureConsent) landed on master, so no stub file from this plan was needed. Verified both test suites (6 consent tests + 13 verify tests) pass together.
  • Files modified: none (05-02's commit resolved it)
  • Note: This kind of race is expected when wave-1 plans share a working directory instead of isolated worktrees. Worth flagging for the orchestrator.

No architectural deviations. No auth gates. No deferred issues.

Key Decisions

  1. Substitution helper is string-level, not template-engine. {{KEY}} (and legacy {KEY}) are strings.ReplaceAll targets. No text/template, no escaping — raw byte-for-byte replacement. This matches what every existing provider YAML expects and avoids the footgun of accidental template directives in header values.
  2. Metadata is gated on StatusLive. There is no good reason to parse a 401 body for "org name" — the response is typically an error blob with unrelated fields. Gating on StatusLive keeps Result.Metadata meaningful.
  3. 1 MiB body cap for metadata extraction. Defensive against hostile/runaway JSON responses. Legitimate verify endpoints return a few KB at most, so 1 MiB leaves a few hundred times that as headroom.
  4. One Result per Finding, always. VerifyAll emits a result for every input finding — success, failure, missing provider, or pool-init error. Callers can count findings vs results and never have to reconcile gaps.
  5. Result.ProviderName and KeyMasked populated even on error paths. Downstream consumers can display the masked key + provider next to the failure reason without a second lookup.
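
Decision 4 can be illustrated with a bounded fan-out that always emits one result per input. The sketch below uses plain goroutines and a semaphore channel in place of ants, and it stubs the HTTP call entirely. The finding and result types, the known map, and the verifyAll and tally names are all hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

type finding struct{ Provider, Key string }

type result struct{ Provider, Status, Error string }

// verifyAll emits one result per dispatched finding on a channel buffered to
// len(findings), so workers never block on send. The channel is closed only
// after every in-flight task has finished.
func verifyAll(ctx context.Context, findings []finding, known map[string]bool, workers int) <-chan result {
	if workers <= 0 {
		workers = 10 // default from the summary
	}
	out := make(chan result, len(findings))
	sem := make(chan struct{}, workers) // bounds concurrency
	var wg sync.WaitGroup

	go func() {
		defer close(out)
		for _, f := range findings {
			if ctx.Err() != nil {
				break // stop dispatching; in-flight tasks still drain
			}
			sem <- struct{}{}
			wg.Add(1)
			go func(f finding) {
				defer wg.Done()
				defer func() { <-sem }()
				if !known[f.Provider] {
					out <- result{f.Provider, "unknown", "provider not found in registry"}
					return
				}
				// A real task would call Verify here; this stub reports live.
				out <- result{Provider: f.Provider, Status: "live"}
			}(f)
		}
		wg.Wait()
	}()
	return out
}

// tally drains the channel and counts total and unknown results.
func tally(findings []finding, workers int) (total, unknown int) {
	known := map[string]bool{"testprov": true}
	for r := range verifyAll(context.Background(), findings, known, workers) {
		total++
		if r.Status == "unknown" {
			unknown++
		}
	}
	return total, unknown
}

func main() {
	findings := []finding{{"testprov", "k1"}, {"ghost", "k2"}, {"testprov", "k3"}}
	total, unknown := tally(findings, 2)
	fmt.Println("findings:", len(findings), "results:", total, "unknown:", unknown)
}
```

With an uncancelled context the result count equals the finding count, which is what lets callers pair them without reconciling gaps. Results arrive in completion order, not input order.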

Files

Created

  • pkg/verify/result.go
  • pkg/verify/verifier.go
  • pkg/verify/verifier_test.go

Modified

  • pkg/providers/registry.go: NewRegistryFromProviders helper.

Commits

  • 3ceccd9 test(05-03): add failing tests for HTTPVerifier single-key verification (RED)
  • 3dfe727 feat(05-03): implement HTTPVerifier single-key verification (GREEN)
  • 45ee2f8 test(05-03): add failing tests for VerifyAll worker pool (RED)
  • 35c7759 feat(05-03): add VerifyAll ants worker pool for parallel verification (GREEN)

Known Stubs

None. All values flowing through Result are derived from live HTTP responses or documented error paths; no hardcoded placeholder data.

Next Up

  • Plan 05-04 (verify spec completeness for Tier 1 providers) — already in progress in a sibling worktree.
  • Plan 05-05 will wire HTTPVerifier.VerifyAll into the scan pipeline behind --verify, reusing the Result channel shape defined here.

Self-Check: PASSED

  • All 5 listed files present on disk.
  • All 4 commits present in git log.
  • go test ./pkg/verify/... -race -count=1 → ok
  • go build ./... → clean