keyhunter/.planning/phases/05-verification-engine/05-03-PLAN.md
2026-04-05 15:38:23 +03:00


---
phase: 05-verification-engine
plan: 03
type: execute
wave: 1
depends_on:
  - 05-01
files_modified:
  - pkg/verify/verifier.go
  - pkg/verify/verifier_test.go
  - pkg/verify/result.go
autonomous: true
requirements:
  - VRFY-02
  - VRFY-03
  - VRFY-05
must_haves:
  truths:
    - "HTTPVerifier.Verify(ctx, finding, provider) returns a Result with Status in {live,dead,rate_limited,error,unknown}"
    - "{{KEY}} in Headers values and Body is substituted with the plaintext key"
    - "HTTP codes in EffectiveSuccessCodes → Status='live'; in EffectiveFailureCodes → Status='dead'; in EffectiveRateLimitCodes → Status='rate_limited'"
    - "Metadata extracted from JSON response via gjson paths when response Content-Type is application/json"
    - "Per-call context timeout is respected; timeout → Status='error', Error contains 'timeout' or 'deadline'"
    - "http:// verify URLs are rejected (HTTPS-only); missing verify URL → Status='unknown'"
    - "ants pool with configurable worker count runs verification in parallel"
  artifacts:
    - path: pkg/verify/verifier.go
      provides: "HTTPVerifier struct, VerifyAll(ctx, []Finding, reg) chan Result"
      contains: "HTTPVerifier"
    - path: pkg/verify/result.go
      provides: "Result struct with Status constants"
      contains: "StatusLive"
  key_links:
    - from: pkg/verify/verifier.go
      to: "provider.Verify (VerifySpec)"
      via: "template substitution + http.Client.Do"
      pattern: "{{KEY}}"
    - from: pkg/verify/verifier.go
      to: github.com/tidwall/gjson
      via: "metadata extraction"
      pattern: "gjson.GetBytes"
    - from: pkg/verify/verifier.go
      to: github.com/panjf2000/ants/v2
      via: "worker pool"
      pattern: "ants.NewPool"
---
Build the core HTTPVerifier. It takes a Finding plus its Provider, substitutes {{KEY}} into the VerifySpec URL, headers, and body, makes a single HTTP call with a bounded timeout, classifies the response as live, dead, rate_limited, error, or unknown, and extracts metadata via gjson. It also includes an ants worker pool for parallel verification across many findings.

Purpose: VRFY-02 (YAML-driven verification, no hardcoded logic), VRFY-03 (metadata extraction), VRFY-05 (configurable per-key timeout).

Output: pkg/verify/verifier.go with the HTTPVerifier, Result types, and unit tests using httptest.
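
The placeholder substitution described above can be sketched on its own. This is a minimal illustration, not the planned implementation: `substituteKey` and `substituteKeyInURL` are hypothetical helper names, and query-escaping the key for URLs is an assumption that goes beyond what the plan specifies (it only matters when a key contains characters such as `&` or `+`).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// substituteKey replaces both placeholder forms with key.
// The double-brace form is replaced first; if the legacy single-brace
// pass ran first, "{{KEY}}" would become "{<key>}".
func substituteKey(template, key string) string {
	out := strings.ReplaceAll(template, "{{KEY}}", key)
	return strings.ReplaceAll(out, "{KEY}", key)
}

// substituteKeyInURL is a hypothetical variant for verify URLs that carry
// the key in the query string. It query-escapes the key so that reserved
// characters cannot change the shape of the query.
func substituteKeyInURL(template, key string) string {
	return substituteKey(template, url.QueryEscape(key))
}

func main() {
	fmt.Println(substituteKey("Bearer {{KEY}}", "sk-test"))
	fmt.Println(substituteKey(`{"api_key":"{KEY}"}`, "sk-test"))
	fmt.Println(substituteKeyInURL("https://host/v1/models?key={{KEY}}", "a+b&c"))
}
```

If a provider places the key in the URL path rather than the query string, `url.PathEscape` would be the closer fit; the plan's code substitutes the raw key in both cases.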

<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/phases/05-verification-engine/05-CONTEXT.md
@pkg/providers/schema.go
@pkg/engine/finding.go

After Plan 05-01 completes, these are the shapes available:

```go
// pkg/providers/schema.go
type VerifySpec struct {
    Method, URL, Body string
    Headers           map[string]string
    SuccessCodes, FailureCodes, RateLimitCodes []int
    MetadataPaths map[string]string // display-name -> gjson path
    ValidStatus, InvalidStatus []int // legacy
}
func (v VerifySpec) EffectiveSuccessCodes() []int
func (v VerifySpec) EffectiveFailureCodes() []int
func (v VerifySpec) EffectiveRateLimitCodes() []int

type Provider struct {
    Name   string
    Verify VerifySpec
    // ...
}

// pkg/engine/finding.go
type Finding struct {
    ProviderName, KeyValue, KeyMasked string
    // ...
    Verified       bool
    VerifyStatus   string
    VerifyHTTPCode int
    VerifyMetadata map[string]string
    VerifyError    string
}
```

Registry (existing) exposes func (r *Registry) Get(name string) (*Provider, bool).

Task 1: Result types + HTTPVerifier.Verify single-key logic

Files: pkg/verify/result.go, pkg/verify/verifier.go, pkg/verify/verifier_test.go

Behavior:
- Verify(ctx, finding, provider) with missing VerifySpec.URL → Result{Status: StatusUnknown}
- URL starting with "http://" → Result{Status: StatusError, Error: "verify URL must be HTTPS"}
- Default Method is GET when VerifySpec.Method is empty
- "{{KEY}}" substituted in every Header value and in Body
- 200 (or any code in EffectiveSuccessCodes) → StatusLive
- 401/403 (or any EffectiveFailureCodes) → StatusDead
- 429 (or EffectiveRateLimitCodes) → StatusRateLimited; Retry-After header captured in Result.RetryAfter
- Unknown code → StatusUnknown
- JSON response with MetadataPaths set → Metadata populated via gjson
- Non-JSON response → Metadata empty (no error)
- ctx deadline exceeded → StatusError with "timeout" or "deadline" in Error

Action:

1. Create `pkg/verify/result.go`:
   ```go
   package verify

   import "time"

   const (
       StatusLive         = "live"
       StatusDead         = "dead"
       StatusRateLimited  = "rate_limited"
       StatusError        = "error"
       StatusUnknown      = "unknown"
   )

   type Result struct {
       ProviderName string
       KeyMasked    string
       Status       string // one of the Status* constants
       HTTPCode     int
       Metadata     map[string]string
       RetryAfter   time.Duration
       ResponseTime time.Duration
       Error        string
   }
   ```

2. Create `pkg/verify/verifier.go`:
   ```go
   package verify

   import (
       "bytes"
       "context"
       "crypto/tls"
       "fmt"
       "io"
       "net/http"
       "strconv"
       "strings"
       "time"

       "github.com/salvacybersec/keyhunter/pkg/engine"
       "github.com/salvacybersec/keyhunter/pkg/providers"
       "github.com/tidwall/gjson"
   )

   const DefaultTimeout = 10 * time.Second

   type HTTPVerifier struct {
       Client  *http.Client
       Timeout time.Duration
   }

   func NewHTTPVerifier(timeout time.Duration) *HTTPVerifier {
       if timeout <= 0 {
           timeout = DefaultTimeout
       }
       return &HTTPVerifier{
           Client: &http.Client{
               Timeout: timeout,
               Transport: &http.Transport{
                   TLSClientConfig: &tls.Config{MinVersion: tls.VersionTLS12},
               },
           },
           Timeout: timeout,
       }
   }

   // Verify runs a single verification against a provider's verify endpoint.
   // It never returns an error — transport/classification errors are encoded in Result.
   func (v *HTTPVerifier) Verify(ctx context.Context, f engine.Finding, p *providers.Provider) Result {
       start := time.Now()
       res := Result{ProviderName: f.ProviderName, KeyMasked: f.KeyMasked, Status: StatusUnknown}

       spec := p.Verify
       if spec.URL == "" {
           return res // StatusUnknown: provider has no verify endpoint
       }
       if strings.HasPrefix(strings.ToLower(spec.URL), "http://") {
           res.Status = StatusError
           res.Error = "verify URL must be HTTPS"
           return res
       }

       // Substitute {{KEY}} in URL (some providers pass key in query string e.g. Google AI)
       url := strings.ReplaceAll(spec.URL, "{{KEY}}", f.KeyValue)
       // Also support legacy {KEY} form used by some existing YAMLs
       url = strings.ReplaceAll(url, "{KEY}", f.KeyValue)

       method := spec.Method
       if method == "" {
           method = http.MethodGet
       }

       var bodyReader io.Reader
       if spec.Body != "" {
           body := strings.ReplaceAll(spec.Body, "{{KEY}}", f.KeyValue)
           body = strings.ReplaceAll(body, "{KEY}", f.KeyValue)
           bodyReader = bytes.NewBufferString(body)
       }

       reqCtx, cancel := context.WithTimeout(ctx, v.Timeout)
       defer cancel()

       req, err := http.NewRequestWithContext(reqCtx, method, url, bodyReader)
       if err != nil {
           res.Status = StatusError
           res.Error = err.Error()
           return res
       }
       for k, val := range spec.Headers {
           substituted := strings.ReplaceAll(val, "{{KEY}}", f.KeyValue)
           substituted = strings.ReplaceAll(substituted, "{KEY}", f.KeyValue)
           req.Header.Set(k, substituted)
       }

       resp, err := v.Client.Do(req)
       res.ResponseTime = time.Since(start)
       if err != nil {
           res.Status = StatusError
           res.Error = err.Error()
           return res
       }
       defer resp.Body.Close()
       res.HTTPCode = resp.StatusCode

       // Classify
       if containsInt(spec.EffectiveSuccessCodes(), resp.StatusCode) {
           res.Status = StatusLive
       } else if containsInt(spec.EffectiveFailureCodes(), resp.StatusCode) {
           res.Status = StatusDead
       } else if containsInt(spec.EffectiveRateLimitCodes(), resp.StatusCode) {
           res.Status = StatusRateLimited
           if ra := resp.Header.Get("Retry-After"); ra != "" {
               if secs, err := strconv.Atoi(ra); err == nil {
                   res.RetryAfter = time.Duration(secs) * time.Second
               }
           }
       } else {
           res.Status = StatusUnknown
       }

       // Metadata extraction only on live responses with JSON body and MetadataPaths
       if res.Status == StatusLive && len(spec.MetadataPaths) > 0 {
           ct := resp.Header.Get("Content-Type")
           if strings.Contains(ct, "application/json") {
               bodyBytes, _ := io.ReadAll(io.LimitReader(resp.Body, 1<<20)) // 1 MiB cap
               res.Metadata = make(map[string]string, len(spec.MetadataPaths))
               for displayName, path := range spec.MetadataPaths {
                   r := gjson.GetBytes(bodyBytes, path)
                   if r.Exists() {
                       res.Metadata[displayName] = r.String()
                   }
               }
           }
       }
       return res
   }

   func containsInt(haystack []int, needle int) bool {
       for _, x := range haystack {
           if x == needle {
               return true
           }
       }
       return false
   }

   // Unused import guard
   var _ = fmt.Sprintf
   ```
   (Remove the `_ = fmt.Sprintf` line if `fmt` ends up unused.)

3. Create `pkg/verify/verifier_test.go` using httptest.NewTLSServer. Tests:
   - `TestVerify_Live_200` — server returns 200, assert StatusLive, HTTPCode=200
   - `TestVerify_Dead_401` — server returns 401, assert StatusDead
   - `TestVerify_RateLimited_429_WithRetryAfter` — server returns 429 with `Retry-After: 30`, assert StatusRateLimited and RetryAfter == 30s
   - `TestVerify_MetadataExtraction` — JSON response `{"organization":{"name":"Acme"},"tier":"plus"}`, MetadataPaths={"org":"organization.name","tier":"tier"}, assert Metadata["org"]=="Acme" and Metadata["tier"]=="plus"
   - `TestVerify_KeySubstitution_InHeader` — server inspects `Authorization` header, verify spec Headers={"Authorization":"Bearer {{KEY}}"}, assert server received "Bearer sk-test-keyvalue"
   - `TestVerify_KeySubstitution_InBody` — POST with Body `{"api_key":"{{KEY}}"}`, server reads body and asserts substitution
   - `TestVerify_KeySubstitution_InURL` — URL `https://host/v1/models?key={{KEY}}`, server inspects req.URL.Query().Get("key")
   - `TestVerify_MissingURL_Unknown` — empty spec.URL, assert StatusUnknown
   - `TestVerify_HTTPRejected` — URL `http://example.com`, assert StatusError, Error contains "HTTPS"
   - `TestVerify_Timeout` — server sleeps 200ms, verifier timeout 50ms, assert StatusError and Error matches /timeout|deadline|canceled/i

   For httptest.NewTLSServer, set `verifier.Client.Transport = server.Client().Transport` so the test cert validates. Use a small helper to build a *providers.Provider inline.

Verify:
- `cd /home/salva/Documents/apikey && go test ./pkg/verify/... -run Verify -v`

Acceptance criteria:
- `grep -q 'HTTPVerifier' pkg/verify/verifier.go`
- `grep -q 'StatusLive\|StatusDead\|StatusRateLimited' pkg/verify/result.go`
- `grep -q 'gjson.GetBytes' pkg/verify/verifier.go`
- `grep -q '{{KEY}}' pkg/verify/verifier.go`
- All 10 verifier test cases pass
- `go build ./...` succeeds

Done: Single-key verification classifies status correctly, substitutes the key template, extracts JSON metadata, and enforces HTTPS and the timeout.

Task 2: VerifyAll worker pool with ants

Files: pkg/verify/verifier.go, pkg/verify/verifier_test.go

Behavior:
- VerifyAll(ctx, findings, reg, workers) returns a receive-only `<-chan Result`; closes the channel after all findings are processed
- Workers count respected (default 10 if <= 0)
- Findings whose provider is missing from registry → emit Result{Status: StatusUnknown, Error: "provider not found in registry"}
- ctx cancellation stops further dispatch; channel still closes cleanly

Action:

Append to `pkg/verify/verifier.go`. Go requires import declarations to precede all other declarations, so merge the two imports below into the existing import block at the top of the file instead of appending them:
```go
// Merge into the existing import block at the top of verifier.go.
import (
    "sync"

    "github.com/panjf2000/ants/v2"
)

const DefaultWorkers = 10

// VerifyAll runs verification for all findings via an ants worker pool.
// The returned channel is closed after every finding has been processed or ctx is cancelled.
func (v *HTTPVerifier) VerifyAll(ctx context.Context, findings []engine.Finding, reg *providers.Registry, workers int) <-chan Result {
    if workers <= 0 {
        workers = DefaultWorkers
    }
    out := make(chan Result, len(findings))
    pool, err := ants.NewPool(workers)
    if err != nil {
        // On pool creation failure, emit one error result per finding and close.
        go func() {
            defer close(out)
            for _, f := range findings {
                out <- Result{ProviderName: f.ProviderName, KeyMasked: f.KeyMasked, Status: StatusError, Error: "pool init: " + err.Error()}
            }
        }()
        return out
    }

    var wg sync.WaitGroup
    go func() {
        defer close(out)
        defer pool.Release()
        for i := range findings {
            if ctx.Err() != nil {
                break
            }
            f := findings[i]
            wg.Add(1)
            submitErr := pool.Submit(func() {
                defer wg.Done()
                prov, ok := reg.Get(f.ProviderName)
                if !ok {
                    out <- Result{ProviderName: f.ProviderName, KeyMasked: f.KeyMasked, Status: StatusUnknown, Error: "provider not found in registry"}
                    return
                }
                out <- v.Verify(ctx, f, prov)
            })
            if submitErr != nil {
                wg.Done()
                out <- Result{ProviderName: f.ProviderName, KeyMasked: f.KeyMasked, Status: StatusError, Error: submitErr.Error()}
            }
        }
        wg.Wait()
    }()
    return out
}
```

NOTE: verify the exact API of `reg.Get` — check pkg/providers/registry.go before writing. If the method is named differently (e.g. `Find`, `Lookup`), use that. Also verify that ants/v2 is already in go.mod from earlier phases; if not, `go get github.com/panjf2000/ants/v2`.

Append to `pkg/verify/verifier_test.go`:
- `TestVerifyAll_MultipleFindings` — 5 findings against one test server returning 200, workers=3, assert 5 StatusLive results received
- `TestVerifyAll_MissingProvider` — finding with ProviderName="nonexistent", assert Result.Status == StatusUnknown and Error contains "not found"
- `TestVerifyAll_ContextCancellation` — 100 findings, server sleeps 100ms each, cancel ctx after 50ms, assert channel closes within 1s and fewer than 100 results received

Use a real Registry built via providers.NewRegistry() or a minimal test helper that constructs a Registry with a single test provider. If NewRegistry embeds all real providers, prefer that and add a test provider dynamically if there is an API for it; otherwise add a `newTestRegistry(t, p *Provider) *Registry` helper in the test file.

Verify:
- `cd /home/salva/Documents/apikey && go test ./pkg/verify/... -run VerifyAll -v`

Acceptance criteria:
- `grep -q 'ants.NewPool' pkg/verify/verifier.go`
- `grep -q 'VerifyAll' pkg/verify/verifier.go`
- All 3 VerifyAll test cases pass
- `go build ./...` succeeds
- Race detector clean: `go test ./pkg/verify/... -race -run VerifyAll`

Done: Parallel verification via ants pool works; graceful cancellation; missing providers handled.

Verification:
- `go build ./...` clean
- `go test ./pkg/verify/... -v -race` all pass
- Verifier is YAML-driven (no provider name switches in verifier.go): `grep -v "StatusLive\|StatusDead\|StatusError\|StatusUnknown\|StatusRateLimited" pkg/verify/verifier.go | grep -i "openai\|anthropic\|groq"` returns nothing

<success_criteria>

- VRFY-02: single HTTPVerifier drives all providers via YAML VerifySpec
- VRFY-03: metadata extracted via gjson paths on JSON responses
- VRFY-05: per-call timeout respected, default 10s, configurable
- Unit tests cover live/dead/rate-limited/error/unknown + key substitution + metadata + cancellation

</success_criteria>

After completion, create `.planning/phases/05-verification-engine/05-03-SUMMARY.md`