docs(05): create phase 5 verification engine plans

salvacybersec
2026-04-05 15:38:23 +03:00
parent e65b9c981b
commit 0b667566c4
6 changed files with 1632 additions and 1 deletions

@@ -120,7 +120,14 @@ Plans:
3. `keyhunter scan --verify` extracts and displays org name, rate limit tier, and available permissions when the provider API returns them
4. `--verify-timeout=30s` changes the per-key verification timeout from the default 10s
5. A `LEGAL.md` file shipping with the binary documents the legal implications of using `--verify`
**Plans**: TBD
**Plans**: 5 plans
Plans:
- [ ] 05-01-PLAN.md — Wave 0: extend VerifySpec schema, Finding struct, storage schema; add gjson dep
- [ ] 05-02-PLAN.md — LEGAL.md + pkg/legal embed + consent prompt + keyhunter legal command
- [ ] 05-03-PLAN.md — pkg/verify HTTPVerifier: template sub, gjson metadata extraction, ants pool
- [ ] 05-04-PLAN.md — Update 12 Tier 1 provider YAMLs with extended verify specs + guardrail test
- [ ] 05-05-PLAN.md — cmd/scan.go --verify wiring + --verify-timeout/workers flags + output verify column
### Phase 6: Output, Reporting & Key Management
**Goal**: Users can consume scan results in any format they need and perform full lifecycle management of stored keys — listing, inspecting, exporting, copying, and deleting

@@ -0,0 +1,245 @@
---
phase: 05-verification-engine
plan: 01
type: execute
wave: 0
depends_on: []
files_modified:
- go.mod
- go.sum
- pkg/providers/schema.go
- pkg/engine/finding.go
- pkg/storage/schema.sql
- pkg/storage/findings.go
- pkg/storage/findings_test.go
- pkg/storage/db.go
- pkg/providers/schema_test.go
autonomous: true
requirements: [VRFY-02, VRFY-03]
must_haves:
truths:
- "VerifySpec struct carries SuccessCodes, FailureCodes, RateLimitCodes, MetadataPaths, Body"
- "Finding struct carries Verified, VerifyStatus, VerifyMetadata fields"
- "findings table has verified/verify_status/verify_metadata_json columns and existing DBs migrate without data loss"
- "github.com/tidwall/gjson is in go.mod"
artifacts:
- path: "pkg/providers/schema.go"
provides: "Extended VerifySpec with SuccessCodes/FailureCodes/RateLimitCodes/MetadataPaths/Body"
contains: "SuccessCodes"
- path: "pkg/engine/finding.go"
provides: "Finding with Verified/VerifyStatus/VerifyMetadata"
contains: "VerifyStatus"
- path: "pkg/storage/schema.sql"
provides: "findings columns verified, verify_status, verify_metadata_json"
contains: "verify_status"
- path: "go.mod"
provides: "github.com/tidwall/gjson dependency"
contains: "tidwall/gjson"
key_links:
- from: "pkg/storage/findings.go"
to: "findings table"
via: "INSERT/SELECT with new verify_* columns"
pattern: "verify_status"
---
<objective>
Wave 0 foundation for the verification engine. Extend the provider VerifySpec schema, Finding struct, and storage schema with the fields every downstream plan needs. Add the gjson dependency. No runtime verifier logic yet — just contracts so Plans 05-02, 05-03, 05-04 can run in parallel on Wave 1.
Purpose: Interface-first — downstream plans build against these types without exploring the codebase.
Output: Extended schema types, migrated SQLite schema, gjson dependency wired.
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/05-verification-engine/05-CONTEXT.md
@pkg/providers/schema.go
@pkg/engine/finding.go
@pkg/storage/schema.sql
@pkg/storage/findings.go
@pkg/storage/db.go
<interfaces>
<!-- Current shapes that must be extended, not replaced -->
From pkg/providers/schema.go (current):
```go
type VerifySpec struct {
Method string `yaml:"method"`
URL string `yaml:"url"`
Headers map[string]string `yaml:"headers"`
ValidStatus []int `yaml:"valid_status"`
InvalidStatus []int `yaml:"invalid_status"`
}
```
Note: existing YAMLs use `valid_status` / `invalid_status`. Keep those fields for backward compat AND add the new canonical fields.
From pkg/engine/finding.go (current):
```go
type Finding struct {
ProviderName string
KeyValue string
KeyMasked string
Confidence string
Source string
SourceType string
LineNumber int
Offset int64
DetectedAt time.Time
}
```
From pkg/storage/schema.sql findings table: columns id, scan_id, provider_name, key_value, key_masked, confidence, source_path, source_type, line_number, created_at.
</interfaces>
</context>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Extend VerifySpec, Finding, and add gjson dependency</name>
<files>go.mod, go.sum, pkg/providers/schema.go, pkg/engine/finding.go, pkg/providers/schema_test.go</files>
<behavior>
- VerifySpec parses new YAML fields: success_codes, failure_codes, rate_limit_codes, metadata_paths, body
- Backward compat: existing YAMLs with only valid_status/invalid_status still load (no error); VerifySpec exposes both old and new fields
- Finding zero value has Verified=false, VerifyStatus="", VerifyMetadata=nil
- gjson is importable: `import "github.com/tidwall/gjson"` compiles
</behavior>
<action>
1. Add gjson dep:
```
go get github.com/tidwall/gjson@latest
```
(Do NOT run `go mod tidy`; use `go mod download` if needed per Phase 4 lesson.)
2. Extend `pkg/providers/schema.go` VerifySpec struct to:
```go
type VerifySpec struct {
Method string `yaml:"method"`
URL string `yaml:"url"`
Headers map[string]string `yaml:"headers"`
Body string `yaml:"body"`
// Canonical status code fields (Phase 5)
SuccessCodes []int `yaml:"success_codes"`
FailureCodes []int `yaml:"failure_codes"`
RateLimitCodes []int `yaml:"rate_limit_codes"`
// MetadataPaths maps display-name -> gjson path (e.g. "org" -> "organization.name")
MetadataPaths map[string]string `yaml:"metadata_paths"`
// Legacy fields kept for backward compat with existing YAMLs (Phase 2-3 providers)
ValidStatus []int `yaml:"valid_status"`
InvalidStatus []int `yaml:"invalid_status"`
}
```
Add a method `(v VerifySpec) EffectiveSuccessCodes() []int` that returns `SuccessCodes` if non-empty, otherwise `ValidStatus` if non-empty, otherwise `[]int{200}`.
Add `(v VerifySpec) EffectiveFailureCodes() []int` that returns `FailureCodes` if non-empty, otherwise `InvalidStatus` if non-empty, otherwise `[]int{401, 403}`.
Add `(v VerifySpec) EffectiveRateLimitCodes() []int` that returns `RateLimitCodes` if non-empty, otherwise `[]int{429}`.
3. Extend `pkg/engine/finding.go` Finding struct — add at bottom (preserve existing fields and MaskKey function):
```go
// Verification fields populated when scan --verify is set (Phase 5).
Verified bool // true if verifier ran against this finding
VerifyStatus string // "live", "dead", "rate_limited", "error", "unknown"
VerifyHTTPCode int // HTTP status code returned by verify endpoint
VerifyMetadata map[string]string // extracted metadata from response (org, tier, etc.)
VerifyError string // non-empty only when VerifyStatus == "error"
```
4. Add `pkg/providers/schema_test.go` with:
- `TestVerifySpec_NewFieldsParse` — YAML with success_codes/failure_codes/rate_limit_codes/metadata_paths/body unmarshals correctly
- `TestVerifySpec_LegacyFieldsStillWork` — YAML with only valid_status/invalid_status parses and `EffectiveSuccessCodes()` returns the legacy values
- `TestVerifySpec_Defaults` — empty VerifySpec: `EffectiveSuccessCodes()==[200]`, `EffectiveFailureCodes()==[401,403]`, `EffectiveRateLimitCodes()==[429]`
Use `yaml.Unmarshal` directly into a small wrapper struct `struct{ Verify VerifySpec \`yaml:"verify"\` }`.
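The fallback rules for the three `Effective*` methods can be sketched as follows. This is a minimal illustration, not the committed implementation: the struct carries only the status-code fields, and `firstNonEmpty` is a hypothetical helper that does not appear in the plan.

```go
package main

import "fmt"

// VerifySpec here carries only the status-code fields from the plan;
// the real struct also has Method, URL, Headers, Body and MetadataPaths.
type VerifySpec struct {
	SuccessCodes   []int
	FailureCodes   []int
	RateLimitCodes []int
	ValidStatus    []int // legacy
	InvalidStatus  []int // legacy
}

// firstNonEmpty is a hypothetical helper: it returns the first slice
// that has at least one element, or nil if all are empty.
func firstNonEmpty(candidates ...[]int) []int {
	for _, c := range candidates {
		if len(c) > 0 {
			return c
		}
	}
	return nil
}

// Canonical field first, then the legacy field, then the built-in default.
func (v VerifySpec) EffectiveSuccessCodes() []int {
	return firstNonEmpty(v.SuccessCodes, v.ValidStatus, []int{200})
}

func (v VerifySpec) EffectiveFailureCodes() []int {
	return firstNonEmpty(v.FailureCodes, v.InvalidStatus, []int{401, 403})
}

// Rate-limit codes have no legacy field, so only two levels apply.
func (v VerifySpec) EffectiveRateLimitCodes() []int {
	return firstNonEmpty(v.RateLimitCodes, []int{429})
}

func main() {
	legacy := VerifySpec{ValidStatus: []int{200, 204}}
	fmt.Println(legacy.EffectiveSuccessCodes())       // [200 204]
	fmt.Println(VerifySpec{}.EffectiveFailureCodes()) // [401 403]
}
```

Keeping the precedence in one helper means the three test cases listed above exercise the same code path for every method.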
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go build ./... && go test ./pkg/providers/... -run VerifySpec -v</automated>
</verify>
<acceptance_criteria>
- `grep -q 'SuccessCodes' pkg/providers/schema.go`
- `grep -q 'MetadataPaths' pkg/providers/schema.go`
- `grep -q 'VerifyStatus' pkg/engine/finding.go`
- `grep -q 'tidwall/gjson' go.mod`
- `go build ./...` succeeds
- All three new test cases pass
</acceptance_criteria>
<done>VerifySpec and Finding carry all Phase 5 fields, legacy YAMLs still load, gjson is available for Plan 05-03.</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Migrate storage schema and persist verify fields</name>
<files>pkg/storage/schema.sql, pkg/storage/db.go, pkg/storage/findings.go, pkg/storage/findings_test.go</files>
<behavior>
- Fresh DB: findings table has verified INTEGER, verify_status TEXT, verify_metadata_json TEXT, verify_http_code INTEGER columns
- Existing DB: ALTER TABLE adds the columns idempotently (SQLite has no `ADD COLUMN IF NOT EXISTS`, so each `ALTER TABLE` is gated on a `PRAGMA table_info` check)
- SaveFinding persists verify_* fields when set; ListFindings round-trips them
- Storing a Finding with nil VerifyMetadata results in NULL verify_metadata_json; storing a populated map round-trips via JSON
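The NULL-versus-JSON rule for `verify_metadata_json` can be sketched as below. The helper names `encodeMetadata` and `decodeMetadata` are assumptions made for illustration; the plan only requires that a nil map is stored as NULL and a populated map round-trips through JSON.

```go
package main

import (
	"database/sql"
	"encoding/json"
	"fmt"
)

// encodeMetadata (hypothetical name) produces the value bound to the
// verify_metadata_json column. A nil map yields an invalid NullString,
// which database/sql writes as SQL NULL.
func encodeMetadata(m map[string]string) (sql.NullString, error) {
	if m == nil {
		return sql.NullString{}, nil
	}
	b, err := json.Marshal(m)
	if err != nil {
		return sql.NullString{}, err
	}
	return sql.NullString{String: string(b), Valid: true}, nil
}

// decodeMetadata (hypothetical name) reverses encodeMetadata.
// NULL decodes back to a nil map.
func decodeMetadata(ns sql.NullString) (map[string]string, error) {
	if !ns.Valid {
		return nil, nil
	}
	var m map[string]string
	if err := json.Unmarshal([]byte(ns.String), &m); err != nil {
		return nil, err
	}
	return m, nil
}

func main() {
	col, _ := encodeMetadata(map[string]string{"org": "Acme", "tier": "plus"})
	fmt.Println(col.String) // {"org":"Acme","tier":"plus"}
	back, _ := decodeMetadata(col)
	fmt.Println(back["org"], back["tier"]) // Acme plus
}
```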
</behavior>
<action>
1. Update `pkg/storage/schema.sql`: Add four columns to `findings` table DDL:
```sql
CREATE TABLE IF NOT EXISTS findings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
scan_id INTEGER REFERENCES scans(id),
provider_name TEXT NOT NULL,
key_value BLOB NOT NULL,
key_masked TEXT NOT NULL,
confidence TEXT NOT NULL,
source_path TEXT,
source_type TEXT,
line_number INTEGER,
verified INTEGER NOT NULL DEFAULT 0,
verify_status TEXT NOT NULL DEFAULT '',
verify_http_code INTEGER NOT NULL DEFAULT 0,
verify_metadata_json TEXT,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
2. In `pkg/storage/db.go` Open(), after the schema exec, run an idempotent migration for existing databases. SQLite's `ALTER TABLE ... ADD COLUMN` has no `IF NOT EXISTS` clause in any version, so check existing columns via `PRAGMA table_info(findings)` and issue `ALTER TABLE findings ADD COLUMN ...` only for columns that are missing. Implement this as a helper function `migrateFindingsVerifyColumns(sqlDB *sql.DB) error` and call it from Open(). Columns to add if missing: `verified INTEGER NOT NULL DEFAULT 0`, `verify_status TEXT NOT NULL DEFAULT ''`, `verify_http_code INTEGER NOT NULL DEFAULT 0`, `verify_metadata_json TEXT`.
3. Update `pkg/storage/findings.go`:
- Extend `Finding` struct with: `Verified bool`, `VerifyStatus string`, `VerifyHTTPCode int`, `VerifyMetadata map[string]string`
- Update `SaveFinding` INSERT to include the four new columns. Serialize `VerifyMetadata` via `encoding/json` when non-nil; pass `sql.NullString{}` when nil
- Update `ListFindings` SELECT to read the new columns and populate the struct (decode JSON back into map)
4. Add `pkg/storage/findings_test.go` tests (use `:memory:` DB):
- `TestSaveFinding_VerifyFields_RoundTrip` — save with Verified=true, VerifyStatus="live", VerifyHTTPCode=200, VerifyMetadata={"org":"Acme","tier":"plus"}, then ListFindings and assert all fields equal
- `TestSaveFinding_VerifyFields_Empty` — save finding with no verify data, ListFindings returns Verified=false, empty status, nil metadata
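- `TestOpen_MigratesExistingDB`: use a file-backed DB under `t.TempDir()` for this test (a `:memory:` database is destroyed on close and cannot be reopened). Create an old-schema findings table manually (without verify columns), close, reopen with storage.Open, and assert the four columns now exist via PRAGMA table_info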
Reference existing SaveFinding/ListFindings code shape in pkg/storage/findings.go — mirror the NULL-handling pattern used for scan_id.
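The idempotent migration described in step 2 can be sketched as follows. Only `migrateFindingsVerifyColumns` and the four column definitions come from the plan; `verifyColumns` and `missingColumnStatements` are hypothetical names, split out so the decision logic can be checked without a SQLite driver.

```go
package main

import (
	"database/sql"
	"fmt"
)

// verifyColumns lists the Phase 5 columns in the order they are added.
var verifyColumns = []struct{ name, ddl string }{
	{"verified", "verified INTEGER NOT NULL DEFAULT 0"},
	{"verify_status", "verify_status TEXT NOT NULL DEFAULT ''"},
	{"verify_http_code", "verify_http_code INTEGER NOT NULL DEFAULT 0"},
	{"verify_metadata_json", "verify_metadata_json TEXT"},
}

// missingColumnStatements returns one ALTER TABLE statement per column
// that is absent from the set of existing column names.
func missingColumnStatements(existing map[string]bool) []string {
	var stmts []string
	for _, c := range verifyColumns {
		if !existing[c.name] {
			stmts = append(stmts, "ALTER TABLE findings ADD COLUMN "+c.ddl)
		}
	}
	return stmts
}

// migrateFindingsVerifyColumns reads PRAGMA table_info(findings) and adds
// whatever is missing. Running it twice is a no-op the second time.
func migrateFindingsVerifyColumns(sqlDB *sql.DB) error {
	rows, err := sqlDB.Query("PRAGMA table_info(findings)")
	if err != nil {
		return fmt.Errorf("reading findings columns: %w", err)
	}
	existing := map[string]bool{}
	for rows.Next() {
		// table_info yields: cid, name, type, notnull, dflt_value, pk.
		var cid, notNull, pk int
		var name, colType string
		var dflt sql.NullString
		if err := rows.Scan(&cid, &name, &colType, &notNull, &dflt, &pk); err != nil {
			rows.Close()
			return err
		}
		existing[name] = true
	}
	err = rows.Err()
	rows.Close() // release the connection before issuing ALTER TABLE
	if err != nil {
		return err
	}
	for _, stmt := range missingColumnStatements(existing) {
		if _, err := sqlDB.Exec(stmt); err != nil {
			return fmt.Errorf("%s: %w", stmt, err)
		}
	}
	return nil
}

func main() {
	// An old database that already has one of the four columns.
	old := map[string]bool{"id": true, "key_value": true, "verified": true}
	for _, s := range missingColumnStatements(old) {
		fmt.Println(s)
	}
}
```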
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/storage/... -run Verify -v && go test ./pkg/storage/... -run Migrate -v</automated>
</verify>
<acceptance_criteria>
- `grep -q 'verify_status' pkg/storage/schema.sql`
- `grep -q 'verify_metadata_json' pkg/storage/findings.go`
- `grep -q 'migrateFindingsVerifyColumns' pkg/storage/db.go`
- `go test ./pkg/storage/... -v` all pass
- `go build ./...` succeeds
</acceptance_criteria>
<done>Storage persists verify_* fields for fresh and existing DBs; round-trip test green.</done>
</task>
</tasks>
<verification>
- `go build ./...` clean
- `go test ./pkg/providers/... ./pkg/storage/... -v` all pass
- `grep -rn "SuccessCodes\|VerifyStatus\|verify_metadata_json" pkg/` confirms all three extensions landed
</verification>
<success_criteria>
- Extended VerifySpec, extended Finding, migrated SQLite schema, gjson dep present
- Backward compatibility preserved (legacy YAMLs load, old DBs migrate)
- Unit tests green on all new fields
</success_criteria>
<output>
After completion, create `.planning/phases/05-verification-engine/05-01-SUMMARY.md`
</output>

@@ -0,0 +1,265 @@
---
phase: 05-verification-engine
plan: 02
type: execute
wave: 1
depends_on: [05-01]
files_modified:
- LEGAL.md
- pkg/legal/legal.go
- pkg/legal/legal_test.go
- pkg/verify/consent.go
- pkg/verify/consent_test.go
- cmd/legal.go
- cmd/root.go
autonomous: true
requirements: [VRFY-01, VRFY-06]
must_haves:
truths:
- "Running `keyhunter legal` prints the embedded LEGAL.md to stdout"
- "First invocation of --verify prompts for consent; typing 'yes' (case-insensitive) proceeds; anything else aborts"
- "Consent decision is stored in the settings table and not re-prompted on subsequent runs"
- "LEGAL.md ships in the binary via go:embed (not read from filesystem at runtime)"
artifacts:
- path: "LEGAL.md"
provides: "Legal disclaimer text for verification feature"
min_lines: 40
- path: "pkg/legal/legal.go"
provides: "Embedded LEGAL.md via go:embed"
contains: "go:embed"
- path: "pkg/verify/consent.go"
provides: "EnsureConsent(db, in, out) (bool, error) — prompts once, persists decision"
contains: "EnsureConsent"
- path: "cmd/legal.go"
provides: "keyhunter legal subcommand"
contains: "legalCmd"
key_links:
- from: "pkg/verify/consent.go"
to: "pkg/storage settings table"
via: "db.GetSetting/SetSetting('verify.consent')"
pattern: "verify.consent"
- from: "pkg/legal/legal.go"
to: "LEGAL.md"
via: "go:embed"
pattern: "go:embed LEGAL.md"
---
<objective>
Create the legal disclaimer document, embed it in the binary, add a `keyhunter legal` subcommand that prints it, and implement the consent prompt that gates `--verify` on first use.
Purpose: Legal safety — VRFY-01 requires a one-time consent prompt with clear language; VRFY-06 requires LEGAL.md shipping with the binary.
Output: LEGAL.md file, pkg/legal with embed, pkg/verify consent logic, `keyhunter legal` command.
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/phases/05-verification-engine/05-CONTEXT.md
@pkg/storage/settings.go
@cmd/root.go
@cmd/scan.go
<interfaces>
From pkg/storage (existing settings API used by cmd/scan.go):
```go
func (db *DB) GetSetting(key string) (value string, found bool, err error)
func (db *DB) SetSetting(key, value string) error
```
Use setting key `"verify.consent"` with values `"granted"` or `"declined"`.
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Write LEGAL.md and embed it in pkg/legal, wire keyhunter legal subcommand</name>
<files>LEGAL.md, pkg/legal/legal.go, pkg/legal/legal_test.go, cmd/legal.go, cmd/root.go</files>
<action>
1. Create `LEGAL.md` at the repo root. Contents must include these H2 sections (minimum 40 lines total):
- `## Purpose` — explains KeyHunter's verification feature makes HTTP calls to third-party APIs
- `## What --verify Does` — bullet list: sends single lightweight request per found key to provider's documented endpoint, does not modify any account or data, reads only metadata the API returns
- `## Legal Considerations` — covers unauthorized access laws including US CFAA (18 U.S.C. § 1030), UK Computer Misuse Act 1990, EU Directive 2013/40/EU; warn that verifying a key you do not own or have permission to test may constitute unauthorized access
- `## Responsible Use` — only verify keys you own, keys from authorized engagements (pen-test / bug bounty with explicit scope), or keys in your own CI/CD pipelines. Do NOT verify keys found in random public repositories without owner consent
- `## Responsible Disclosure` — when you find a leaked key belonging to someone else, contact the key owner directly via their security contact or security.txt; do not publish the key
- `## Disclaimer` — tool authors and contributors disclaim all liability; the user is solely responsible for compliance with applicable laws and terms of service
- `## Consent Record` — note that running `keyhunter scan --verify` the first time shows an interactive prompt; typing `yes` records consent in the local SQLite settings table
2. Create `pkg/legal/legal.go`:
```go
package legal
import _ "embed"
//go:embed LEGAL.md
var legalMarkdown string
// Text returns the embedded LEGAL.md contents.
func Text() string { return legalMarkdown }
```
NOTE: `go:embed` patterns cannot reference files outside the package directory (no `..`), so the embedded file must live at `pkg/legal/LEGAL.md`. Alternatives would be a symlinked repo-root `LEGAL.md` or a build step that copies the file, but the simplest option is to write identical content to both `LEGAL.md` and `pkg/legal/LEGAL.md`. Do that, and document the dual-location pattern in a `// Note:` comment in legal.go (it mirrors the providers/ vs pkg/providers/definitions/ pattern from Phase 1).
3. Create `pkg/legal/legal_test.go`:
- `TestText_NonEmpty` — `legal.Text()` returns string with len > 500
- `TestText_ContainsKeyPhrases` — contains "CFAA", "Responsible Use", "Disclaimer"
4. Create `cmd/legal.go`:
```go
package cmd
import (
"fmt"
"github.com/salvacybersec/keyhunter/pkg/legal"
"github.com/spf13/cobra"
)
var legalCmd = &cobra.Command{
Use: "legal",
Short: "Print the legal disclaimer for the --verify feature",
RunE: func(cmd *cobra.Command, args []string) error {
fmt.Println(legal.Text())
return nil
},
}
```
5. Register `legalCmd` in `cmd/root.go` via `rootCmd.AddCommand(legalCmd)` in the existing init() function (follow the pattern of other commands like scanCmd).
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go build ./... && go test ./pkg/legal/... -v && ./keyhunter legal 2>&1 | grep -q "CFAA" || go run ./cmd/keyhunter legal 2>&1 | grep -q "CFAA"</automated>
</verify>
<acceptance_criteria>
- `test -f LEGAL.md && test -f pkg/legal/LEGAL.md`
- `grep -q 'CFAA' LEGAL.md && grep -q 'Responsible Use' LEGAL.md`
- `grep -q 'go:embed LEGAL.md' pkg/legal/legal.go`
- `grep -q 'legalCmd' cmd/root.go`
- `go test ./pkg/legal/... -v` passes
- `go run . legal` (or equivalent) prints text containing "CFAA"
</acceptance_criteria>
<done>LEGAL.md exists, is embedded, `keyhunter legal` prints it.</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Implement consent prompt (pkg/verify/consent.go)</name>
<files>pkg/verify/consent.go, pkg/verify/consent_test.go</files>
<behavior>
- EnsureConsent returns (true, nil) immediately when settings key "verify.consent" == "granted"
- When no prior decision exists, writes a prompt to `out` io.Writer, reads a line from `in` io.Reader
- Input "yes", "YES", "Yes" (case-insensitive full word) -> persists "granted" and returns (true, nil)
- Any other input (including "y", "no", empty) -> persists "declined" and returns (false, nil)
- When "verify.consent" == "declined", re-prompts (declined is not sticky; user might change their mind). Only "granted" is sticky
- Prompt text contains phrases: "legal implications", "keyhunter legal", "yes"
</behavior>
<action>
Create `pkg/verify/consent.go`:
```go
package verify
import (
"bufio"
"fmt"
"io"
"strings"
"github.com/salvacybersec/keyhunter/pkg/storage"
)
const ConsentSettingKey = "verify.consent"
const (
ConsentGranted = "granted"
ConsentDeclined = "declined"
)
// EnsureConsent checks whether the user has previously granted consent to run
// active key verification. If not, it prints a warning and prompts on `out`,
// reads one line from `in`, and persists the decision via the settings table.
//
// Returns true if consent is granted (either previously or just now).
// Declined decisions are not sticky — the next call will re-prompt.
func EnsureConsent(db *storage.DB, in io.Reader, out io.Writer) (bool, error) {
val, found, err := db.GetSetting(ConsentSettingKey)
if err != nil {
return false, fmt.Errorf("reading verify.consent: %w", err)
}
if found && val == ConsentGranted {
return true, nil
}
fmt.Fprintln(out, "⚠ Active Key Verification — Legal Notice")
fmt.Fprintln(out, "")
fmt.Fprintln(out, "Using --verify will send HTTP requests to third-party provider APIs")
fmt.Fprintln(out, "for every API key KeyHunter finds. You are responsible for the legal")
fmt.Fprintln(out, "implications of these requests in your jurisdiction (CFAA, Computer")
fmt.Fprintln(out, "Misuse Act, GDPR, provider ToS).")
fmt.Fprintln(out, "")
fmt.Fprintln(out, "Run `keyhunter legal` to read the full disclaimer.")
fmt.Fprintln(out, "")
fmt.Fprint(out, "Type 'yes' to proceed: ")
reader := bufio.NewReader(in)
line, err := reader.ReadString('\n')
if err != nil && err != io.EOF {
return false, fmt.Errorf("reading consent input: %w", err)
}
answer := strings.ToLower(strings.TrimSpace(line))
if answer == "yes" {
if err := db.SetSetting(ConsentSettingKey, ConsentGranted); err != nil {
return false, fmt.Errorf("persisting consent: %w", err)
}
fmt.Fprintln(out, "Consent recorded. Proceeding with verification.")
return true, nil
}
if err := db.SetSetting(ConsentSettingKey, ConsentDeclined); err != nil {
return false, fmt.Errorf("persisting declined consent: %w", err)
}
fmt.Fprintln(out, "Consent declined. Verification skipped.")
return false, nil
}
```
Create `pkg/verify/consent_test.go`:
- `TestEnsureConsent_GrantedPrevious` — seed settings with "granted", call with empty in reader, assert (true, nil), no prompt written
- `TestEnsureConsent_TypeYes` — fresh DB, in = strings.NewReader("yes\n"), assert (true, nil), settings now "granted", prompt text contains "legal implications"
- `TestEnsureConsent_TypeYesUppercase` — in = strings.NewReader("YES\n"), assert (true, nil)
- `TestEnsureConsent_TypeNo` — in = strings.NewReader("no\n"), assert (false, nil), settings now "declined"
- `TestEnsureConsent_Empty` — in = strings.NewReader("\n"), assert (false, nil)
- `TestEnsureConsent_DeclinedNotSticky` — seed settings with "declined", in = strings.NewReader("yes\n"), assert (true, nil) — i.e. re-prompted and now granted
Use `storage.Open(":memory:")` for test DB setup.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/verify/... -run Consent -v</automated>
</verify>
<acceptance_criteria>
- `grep -q 'EnsureConsent' pkg/verify/consent.go`
- `grep -q 'ConsentSettingKey' pkg/verify/consent.go`
- All 6 consent test cases pass
- `go build ./...` succeeds
</acceptance_criteria>
<done>EnsureConsent gates --verify correctly and persists only granted decisions as sticky.</done>
</task>
</tasks>
<verification>
- `go build ./...` clean
- `go test ./pkg/legal/... ./pkg/verify/... -v` all pass
- `go run . legal` prints LEGAL.md content
</verification>
<success_criteria>
- LEGAL.md exists at repo root and in pkg/legal/ for embed
- `keyhunter legal` command works
- EnsureConsent prompts once on first --verify, persists granted, re-prompts if declined
</success_criteria>
<output>
After completion, create `.planning/phases/05-verification-engine/05-02-SUMMARY.md`
</output>

@@ -0,0 +1,424 @@
---
phase: 05-verification-engine
plan: 03
type: execute
wave: 1
depends_on: [05-01]
files_modified:
- pkg/verify/verifier.go
- pkg/verify/verifier_test.go
- pkg/verify/result.go
autonomous: true
requirements: [VRFY-02, VRFY-03, VRFY-05]
must_haves:
truths:
- "HTTPVerifier.Verify(ctx, finding, provider) returns a Result with Status in {live,dead,rate_limited,error,unknown}"
- "{{KEY}} in the URL, Headers values, and Body is substituted with the plaintext key"
- "HTTP codes in EffectiveSuccessCodes → Status='live'; in EffectiveFailureCodes → Status='dead'; in EffectiveRateLimitCodes → Status='rate_limited'"
- "Metadata extracted from JSON response via gjson paths when response Content-Type is application/json"
- "Per-call context timeout is respected; timeout → Status='error', Error contains 'timeout' or 'deadline'"
- "http:// verify URLs are rejected (HTTPS-only); missing verify URL → Status='unknown'"
- "ants pool with configurable worker count runs verification in parallel"
artifacts:
- path: "pkg/verify/verifier.go"
provides: "HTTPVerifier struct, VerifyAll(ctx, []Finding, reg, workers) chan Result"
contains: "HTTPVerifier"
- path: "pkg/verify/result.go"
provides: "Result struct with Status constants"
contains: "StatusLive"
key_links:
- from: "pkg/verify/verifier.go"
to: "provider.Verify (VerifySpec)"
via: "template substitution + http.Client.Do"
pattern: "{{KEY}}"
- from: "pkg/verify/verifier.go"
to: "github.com/tidwall/gjson"
via: "metadata extraction"
pattern: "gjson.GetBytes"
- from: "pkg/verify/verifier.go"
to: "github.com/panjf2000/ants/v2"
via: "worker pool"
pattern: "ants.NewPool"
---
<objective>
Build the core HTTPVerifier. It takes a Finding plus its Provider, substitutes {{KEY}} into the VerifySpec headers/body, makes a single HTTP call with a bounded timeout, classifies the response into live/dead/rate_limited/error, and extracts metadata via gjson. Includes an ants worker pool for parallel verification across many findings.
Purpose: VRFY-02 (YAML-driven verification, no hardcoded logic), VRFY-03 (metadata extraction), VRFY-05 (configurable per-key timeout).
Output: pkg/verify/verifier.go with the HTTPVerifier, Result types, and unit tests using httptest.
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/phases/05-verification-engine/05-CONTEXT.md
@pkg/providers/schema.go
@pkg/engine/finding.go
<interfaces>
After Plan 05-01 completes, these are the shapes available:
```go
// pkg/providers/schema.go
type VerifySpec struct {
Method, URL, Body string
Headers map[string]string
SuccessCodes, FailureCodes, RateLimitCodes []int
MetadataPaths map[string]string // display-name -> gjson path
ValidStatus, InvalidStatus []int // legacy
}
func (v VerifySpec) EffectiveSuccessCodes() []int
func (v VerifySpec) EffectiveFailureCodes() []int
func (v VerifySpec) EffectiveRateLimitCodes() []int
type Provider struct {
Name string
Verify VerifySpec
// ...
}
// pkg/engine/finding.go
type Finding struct {
ProviderName, KeyValue, KeyMasked string
// ...
Verified bool
VerifyStatus string
VerifyHTTPCode int
VerifyMetadata map[string]string
VerifyError string
}
```
Registry (existing) exposes `func (r *Registry) Get(name string) (*Provider, bool)`.
</interfaces>
</context>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Result types + HTTPVerifier.Verify single-key logic</name>
<files>pkg/verify/result.go, pkg/verify/verifier.go, pkg/verify/verifier_test.go</files>
<behavior>
- Verify(ctx, finding, provider) with missing VerifySpec.URL → Result{Status: StatusUnknown}
- URL starting with "http://" → Result{Status: StatusError, Error: "verify URL must be HTTPS"}
- Default Method is GET when VerifySpec.Method is empty
- "{{KEY}}" (and the legacy "{KEY}" form) substituted in the URL, in every Header value, and in Body
- 200 (or any code in EffectiveSuccessCodes) → StatusLive
- 401/403 (or any EffectiveFailureCodes) → StatusDead
- 429 (or EffectiveRateLimitCodes) → StatusRateLimited; Retry-After header captured in Result.RetryAfter
- Unknown code → StatusUnknown
- JSON response with MetadataPaths set → Metadata populated via gjson
- Non-JSON response → Metadata empty (no error)
- ctx deadline exceeded → StatusError with "timeout" or "deadline" in Error
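The key substitution rule applies to the URL, header values, and body alike, so it can be factored into one helper. The sketch below is an assumption about how that could look: `substituteKey` is a hypothetical name, and it uses `strings.NewReplacer` for a single pass rather than the two chained `strings.ReplaceAll` calls.

```go
package main

import (
	"fmt"
	"strings"
)

// substituteKey (hypothetical helper) replaces both the canonical
// {{KEY}} placeholder and the legacy {KEY} form with the plaintext key.
// The Replacer tries its patterns in argument order at each position and
// never rescans replaced text, so a key that itself contains "{KEY}" is
// not substituted a second time.
func substituteKey(template, key string) string {
	return strings.NewReplacer("{{KEY}}", key, "{KEY}", key).Replace(template)
}

func main() {
	fmt.Println(substituteKey("Bearer {{KEY}}", "sk-test"))            // Bearer sk-test
	fmt.Println(substituteKey("https://host/v1/models?key={KEY}", "abc")) // https://host/v1/models?key=abc
}
```

A single helper also gives the three `TestVerify_KeySubstitution_*` cases one place to fail if the placeholder syntax ever changes.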
</behavior>
<action>
1. Create `pkg/verify/result.go`:
```go
package verify
import "time"
const (
StatusLive = "live"
StatusDead = "dead"
StatusRateLimited = "rate_limited"
StatusError = "error"
StatusUnknown = "unknown"
)
type Result struct {
ProviderName string
KeyMasked string
Status string // one of the Status* constants
HTTPCode int
Metadata map[string]string
RetryAfter time.Duration
ResponseTime time.Duration
Error string
}
```
2. Create `pkg/verify/verifier.go`:
```go
package verify
import (
"bytes"
"context"
"crypto/tls"
"fmt"
"io"
"net/http"
"strconv"
"strings"
"time"
"github.com/salvacybersec/keyhunter/pkg/engine"
"github.com/salvacybersec/keyhunter/pkg/providers"
"github.com/tidwall/gjson"
)
const DefaultTimeout = 10 * time.Second
type HTTPVerifier struct {
Client *http.Client
Timeout time.Duration
}
func NewHTTPVerifier(timeout time.Duration) *HTTPVerifier {
if timeout <= 0 {
timeout = DefaultTimeout
}
return &HTTPVerifier{
Client: &http.Client{
Timeout: timeout,
Transport: &http.Transport{
TLSClientConfig: &tls.Config{MinVersion: tls.VersionTLS12},
},
},
Timeout: timeout,
}
}
// Verify runs a single verification against a provider's verify endpoint.
// It never returns an error — transport/classification errors are encoded in Result.
func (v *HTTPVerifier) Verify(ctx context.Context, f engine.Finding, p *providers.Provider) Result {
start := time.Now()
res := Result{ProviderName: f.ProviderName, KeyMasked: f.KeyMasked, Status: StatusUnknown}
spec := p.Verify
if spec.URL == "" {
return res // StatusUnknown: provider has no verify endpoint
}
if strings.HasPrefix(strings.ToLower(spec.URL), "http://") {
res.Status = StatusError
res.Error = "verify URL must be HTTPS"
return res
}
// Substitute {{KEY}} in URL (some providers pass key in query string e.g. Google AI)
url := strings.ReplaceAll(spec.URL, "{{KEY}}", f.KeyValue)
// Also support legacy {KEY} form used by some existing YAMLs
url = strings.ReplaceAll(url, "{KEY}", f.KeyValue)
method := spec.Method
if method == "" {
method = http.MethodGet
}
var bodyReader io.Reader
if spec.Body != "" {
body := strings.ReplaceAll(spec.Body, "{{KEY}}", f.KeyValue)
body = strings.ReplaceAll(body, "{KEY}", f.KeyValue)
bodyReader = bytes.NewBufferString(body)
}
reqCtx, cancel := context.WithTimeout(ctx, v.Timeout)
defer cancel()
req, err := http.NewRequestWithContext(reqCtx, method, url, bodyReader)
if err != nil {
res.Status = StatusError
res.Error = err.Error()
return res
}
for k, val := range spec.Headers {
substituted := strings.ReplaceAll(val, "{{KEY}}", f.KeyValue)
substituted = strings.ReplaceAll(substituted, "{KEY}", f.KeyValue)
req.Header.Set(k, substituted)
}
resp, err := v.Client.Do(req)
res.ResponseTime = time.Since(start)
if err != nil {
res.Status = StatusError
res.Error = err.Error()
return res
}
defer resp.Body.Close()
res.HTTPCode = resp.StatusCode
// Classify
if containsInt(spec.EffectiveSuccessCodes(), resp.StatusCode) {
res.Status = StatusLive
} else if containsInt(spec.EffectiveFailureCodes(), resp.StatusCode) {
res.Status = StatusDead
} else if containsInt(spec.EffectiveRateLimitCodes(), resp.StatusCode) {
res.Status = StatusRateLimited
if ra := resp.Header.Get("Retry-After"); ra != "" {
if secs, err := strconv.Atoi(ra); err == nil {
res.RetryAfter = time.Duration(secs) * time.Second
}
}
} else {
res.Status = StatusUnknown
}
// Metadata extraction only on live responses with JSON body and MetadataPaths
if res.Status == StatusLive && len(spec.MetadataPaths) > 0 {
ct := resp.Header.Get("Content-Type")
if strings.Contains(ct, "application/json") {
bodyBytes, _ := io.ReadAll(io.LimitReader(resp.Body, 1<<20)) // 1 MiB cap
res.Metadata = make(map[string]string, len(spec.MetadataPaths))
for displayName, path := range spec.MetadataPaths {
r := gjson.GetBytes(bodyBytes, path)
if r.Exists() {
res.Metadata[displayName] = r.String()
}
}
}
}
return res
}
func containsInt(haystack []int, needle int) bool {
for _, x := range haystack {
if x == needle {
return true
}
}
return false
}
// Unused import guard
var _ = fmt.Sprintf
```
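(`fmt` is not otherwise used in this file. Either keep the guard line so the import compiles, or delete both the guard and the `fmt` import. Deleting only the guard breaks the build.)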
3. Create `pkg/verify/verifier_test.go` using httptest.NewTLSServer. Tests:
- `TestVerify_Live_200` — server returns 200, assert StatusLive, HTTPCode=200
- `TestVerify_Dead_401` — server returns 401, assert StatusDead
- `TestVerify_RateLimited_429_WithRetryAfter` — server returns 429 with `Retry-After: 30`, assert StatusRateLimited and RetryAfter == 30s
- `TestVerify_MetadataExtraction` — JSON response `{"organization":{"name":"Acme"},"tier":"plus"}`, MetadataPaths={"org":"organization.name","tier":"tier"}, assert Metadata["org"]=="Acme" and Metadata["tier"]=="plus"
- `TestVerify_KeySubstitution_InHeader` — server inspects `Authorization` header, verify spec Headers={"Authorization":"Bearer {{KEY}}"}, assert server received "Bearer sk-test-keyvalue"
- `TestVerify_KeySubstitution_InBody` — POST with Body `{"api_key":"{{KEY}}"}`, server reads body and asserts substitution
- `TestVerify_KeySubstitution_InURL` — URL `https://host/v1/models?key={{KEY}}`, server inspects req.URL.Query().Get("key")
- `TestVerify_MissingURL_Unknown` — empty spec.URL, assert StatusUnknown
- `TestVerify_HTTPRejected` — URL `http://example.com`, assert StatusError, Error contains "HTTPS"
- `TestVerify_Timeout` — server sleeps 200ms, verifier timeout 50ms, assert StatusError and Error matches /timeout|deadline|canceled/i
For httptest.NewTLSServer, set `verifier.Client.Transport = server.Client().Transport` so the test cert validates. Use a small helper to build a *providers.Provider inline.
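The header-substitution test above can be sketched with the standard library alone. This is a minimal illustration, not the planned test: `substituteKey` and `headerSeenByServer` are hypothetical names, and the real test would go through `HTTPVerifier.Verify` rather than building the request by hand. It shows the two mechanics the test relies on: replacing `{{KEY}}` before sending, and using the test server's own client so the self-signed certificate validates.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// substituteKey is a hypothetical stand-in for the verifier's template step:
// it replaces every {{KEY}} marker with the candidate key.
func substituteKey(tmpl, key string) string {
	return strings.ReplaceAll(tmpl, "{{KEY}}", key)
}

// headerSeenByServer starts a TLS test server, sends one GET carrying the
// substituted Authorization header, and returns what the server received.
func headerSeenByServer(headerTmpl, key string) (string, error) {
	seen := make(chan string, 1)
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Header.Get("Authorization")
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", substituteKey(headerTmpl, key))

	// srv.Client() is preconfigured to trust the test server's certificate.
	resp, err := srv.Client().Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return <-seen, nil
}

func main() {
	got, err := headerSeenByServer("Bearer {{KEY}}", "sk-test-keyvalue")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

Passing the received header back over a channel avoids sharing a variable between the handler goroutine and the test goroutine.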
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/verify/... -run Verify -v</automated>
</verify>
<acceptance_criteria>
- `grep -q 'HTTPVerifier' pkg/verify/verifier.go`
- `grep -q 'StatusLive\|StatusDead\|StatusRateLimited' pkg/verify/result.go`
- `grep -q 'gjson.GetBytes' pkg/verify/verifier.go`
- `grep -q '{{KEY}}' pkg/verify/verifier.go`
- All 10 verifier test cases pass
- `go build ./...` succeeds
</acceptance_criteria>
<done>Single-key verification classifies status correctly, substitutes key template, extracts JSON metadata, enforces HTTPS + timeout.</done>
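The rate-limited branch in this task reads `Retry-After` as integer seconds only. HTTP also allows an HTTP-date in that header, so a more tolerant parser might look like the sketch below. `parseRetryAfter` and `exampleNow` are hypothetical names introduced for illustration; the plan itself does not require date support.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// exampleNow is a fixed reference time so the example is deterministic.
var exampleNow = time.Date(2026, time.April, 5, 12, 0, 0, 0, time.UTC)

// parseRetryAfter accepts both Retry-After forms: delta-seconds ("30") and an
// HTTP-date. It returns 0 when the header is empty, malformed, negative, or
// not in the future relative to now.
func parseRetryAfter(header string, now time.Time) time.Duration {
	header = strings.TrimSpace(header)
	if header == "" {
		return 0
	}
	if secs, err := strconv.Atoi(header); err == nil {
		if secs < 0 {
			return 0
		}
		return time.Duration(secs) * time.Second
	}
	if when, err := http.ParseTime(header); err == nil {
		if d := when.Sub(now); d > 0 {
			return d
		}
	}
	return 0
}

func main() {
	fmt.Println(parseRetryAfter("30", exampleNow))
	fmt.Println(parseRetryAfter("Sun, 05 Apr 2026 12:01:00 GMT", exampleNow))
	fmt.Println(parseRetryAfter("soon", exampleNow))
}
```

Taking `now` as a parameter keeps the date branch testable without depending on the wall clock.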
</task>
<task type="auto" tdd="true">
<name>Task 2: VerifyAll worker pool with ants</name>
<files>pkg/verify/verifier.go, pkg/verify/verifier_test.go</files>
<behavior>
- VerifyAll(ctx, findings, reg, workers) returns chan Result; closes channel after all findings processed
- Workers count respected (default 10 if <= 0)
- Findings whose provider is missing from registry → emit Result{Status: StatusUnknown, Error: "provider not found"}
- ctx cancellation stops further dispatch; channel still closes cleanly
</behavior>
<action>
Append to `pkg/verify/verifier.go`:
```go
// Add "sync" and "github.com/panjf2000/ants/v2" to the import block at the
// top of verifier.go; Go does not allow import declarations after other code.

const DefaultWorkers = 10

// VerifyAll runs verification for all findings via an ants worker pool.
// The returned channel is closed after every finding has been processed or ctx is cancelled.
func (v *HTTPVerifier) VerifyAll(ctx context.Context, findings []engine.Finding, reg *providers.Registry, workers int) <-chan Result {
	if workers <= 0 {
		workers = DefaultWorkers
	}
	// Buffered to len(findings) so workers never block on send.
	out := make(chan Result, len(findings))
	pool, err := ants.NewPool(workers)
	if err != nil {
		// On pool creation failure, emit one error result per finding and close.
		go func() {
			defer close(out)
			for _, f := range findings {
				out <- Result{ProviderName: f.ProviderName, KeyMasked: f.KeyMasked, Status: StatusError, Error: "pool init: " + err.Error()}
			}
		}()
		return out
	}
	var wg sync.WaitGroup
	go func() {
		defer close(out)
		defer pool.Release()
		for i := range findings {
			// Stop dispatching new work once the context is cancelled.
			if ctx.Err() != nil {
				break
			}
			f := findings[i]
			wg.Add(1)
			submitErr := pool.Submit(func() {
				defer wg.Done()
				prov, ok := reg.Get(f.ProviderName)
				if !ok {
					out <- Result{ProviderName: f.ProviderName, KeyMasked: f.KeyMasked, Status: StatusUnknown, Error: "provider not found in registry"}
					return
				}
				out <- v.Verify(ctx, f, prov)
			})
			if submitErr != nil {
				// The task never ran, so balance the Add above ourselves.
				wg.Done()
				out <- Result{ProviderName: f.ProviderName, KeyMasked: f.KeyMasked, Status: StatusError, Error: submitErr.Error()}
			}
		}
		// Wait for in-flight tasks before the deferred Release and close run.
		wg.Wait()
	}()
	return out
}
```
NOTE: verify the exact API of `reg.Get` — check pkg/providers/registry.go before writing. If the method is named differently (e.g. `Find`, `Lookup`), use that. Also verify that ants/v2 is already in go.mod from earlier phases; if not, `go get github.com/panjf2000/ants/v2`.
Append to `pkg/verify/verifier_test.go`:
- `TestVerifyAll_MultipleFindings` — 5 findings against one test server returning 200, workers=3, assert 5 StatusLive results received
- `TestVerifyAll_MissingProvider` — finding with ProviderName="nonexistent", assert Result.Status == StatusUnknown and Error contains "not found"
- `TestVerifyAll_ContextCancellation` — 100 findings, server sleeps 100ms each, cancel ctx after 50ms, assert channel closes within 1s and fewer than 100 results received
Use a real Registry built via providers.NewRegistry() or a minimal test helper that constructs a Registry with a single test provider. If NewRegistry embeds all real providers, prefer that and add a test provider dynamically if there is an API for it; otherwise add a `newTestRegistry(t, p *Provider) *Registry` helper in the test file.
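The cancellation behaviour that `TestVerifyAll_ContextCancellation` checks can be reproduced without the real verifier. The sketch below is a standard-library stand-in for the ants pool (an assumption made so it has no external dependencies); `runBounded` and `countAfterCancel` are hypothetical names. It shows the same contract: bounded concurrency, no new dispatch after cancellation, and a channel that always closes.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// runBounded runs work(ctx, i) for i in [0, n) with at most `workers`
// goroutines, stops dispatching once ctx is cancelled, and always closes
// the returned channel. A semaphore channel stands in for the ants pool.
func runBounded(ctx context.Context, n, workers int, work func(context.Context, int) string) <-chan string {
	if workers <= 0 {
		workers = 10
	}
	out := make(chan string, n) // buffered: workers never block on send
	sem := make(chan struct{}, workers)
	go func() {
		defer close(out)
		var wg sync.WaitGroup
	dispatch:
		for i := 0; i < n; i++ {
			if ctx.Err() != nil {
				break
			}
			select {
			case <-ctx.Done():
				break dispatch
			case sem <- struct{}{}: // acquire a worker slot
			}
			wg.Add(1)
			go func(i int) {
				defer wg.Done()
				defer func() { <-sem }() // release the slot
				out <- work(ctx, i)
			}(i)
		}
		wg.Wait()
	}()
	return out
}

// countAfterCancel runs n simulated verifications that each take perItemMs
// and cancels the context after cancelAfterMs. It returns how many results
// arrived before the channel closed.
func countAfterCancel(n, workers, perItemMs, cancelAfterMs int) int {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(cancelAfterMs)*time.Millisecond)
	defer cancel()
	results := runBounded(ctx, n, workers, func(ctx context.Context, i int) string {
		select {
		case <-time.After(time.Duration(perItemMs) * time.Millisecond):
			return "done"
		case <-ctx.Done():
			return "cancelled"
		}
	})
	count := 0
	for range results {
		count++
	}
	return count
}

func main() {
	fmt.Println(countAfterCancel(5, 3, 1, 1000))
	fmt.Println(countAfterCancel(100, 3, 100, 50) < 100)
}
```

In-flight work still reports a result after cancellation, which is why the test asserts "fewer than 100" rather than zero.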
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/verify/... -run VerifyAll -v</automated>
</verify>
<acceptance_criteria>
- `grep -q 'ants.NewPool' pkg/verify/verifier.go`
- `grep -q 'VerifyAll' pkg/verify/verifier.go`
- All 3 VerifyAll test cases pass
- `go build ./...` succeeds
- Race detector clean: `go test ./pkg/verify/... -race -run VerifyAll`
</acceptance_criteria>
<done>Parallel verification via ants pool works; graceful cancellation; missing providers handled.</done>
</task>
</tasks>
<verification>
- `go build ./...` clean
- `go test ./pkg/verify/... -v -race` all pass
- Verifier is YAML-driven (no provider name switches in verifier.go): `grep -v "StatusLive\|StatusDead\|StatusError\|StatusUnknown\|StatusRateLimited" pkg/verify/verifier.go | grep -i "openai\|anthropic\|groq"` returns nothing
</verification>
<success_criteria>
- VRFY-02: single HTTPVerifier drives all providers via YAML VerifySpec
- VRFY-03: metadata extracted via gjson paths on JSON responses
- VRFY-05: per-call timeout respected, default 10s, configurable
- Unit tests cover live/dead/rate-limited/error/unknown + key substitution + metadata + cancellation
</success_criteria>
<output>
After completion, create `.planning/phases/05-verification-engine/05-03-SUMMARY.md`
</output>

@@ -0,0 +1,370 @@
---
phase: 05-verification-engine
plan: 04
type: execute
wave: 1
depends_on: [05-01]
files_modified:
- providers/openai.yaml
- providers/anthropic.yaml
- providers/google-ai.yaml
- providers/cohere.yaml
- providers/mistral.yaml
- providers/groq.yaml
- providers/xai.yaml
- providers/ai21.yaml
- providers/inflection.yaml
- providers/perplexity.yaml
- providers/deepseek.yaml
- providers/together.yaml
- pkg/providers/definitions/openai.yaml
- pkg/providers/definitions/anthropic.yaml
- pkg/providers/definitions/google-ai.yaml
- pkg/providers/definitions/cohere.yaml
- pkg/providers/definitions/mistral.yaml
- pkg/providers/definitions/groq.yaml
- pkg/providers/definitions/xai.yaml
- pkg/providers/definitions/ai21.yaml
- pkg/providers/definitions/inflection.yaml
- pkg/providers/definitions/perplexity.yaml
- pkg/providers/definitions/deepseek.yaml
- pkg/providers/definitions/together.yaml
- pkg/providers/registry_test.go
autonomous: true
requirements: [VRFY-02, VRFY-03]
must_haves:
  truths:
    - "All 12 Tier 1 provider YAMLs include success_codes, failure_codes, rate_limit_codes fields (extended from legacy valid_status/invalid_status)"
    - "{{KEY}} template is used in verify.headers (Bearer token) or verify.url (query key)"
    - "Providers with known metadata endpoints include metadata_paths mapping"
    - "Dual-location sync: providers/ and pkg/providers/definitions/ kept identical"
    - "All YAMLs still load via providers.NewRegistry() with no parse errors"
  artifacts:
    - path: "providers/openai.yaml"
      provides: "OpenAI verify spec with success_codes, {{KEY}} header substitution"
      contains: "{{KEY}}"
  key_links:
    - from: "providers/*.yaml"
      to: "pkg/providers/definitions/*.yaml"
      via: "dual-location mirror"
      pattern: "format_version"
---
<objective>
Update Tier 1 provider YAMLs so each carries a complete verify spec usable by the new HTTPVerifier: `{{KEY}}` template in headers or URL, `success_codes`, `failure_codes`, `rate_limit_codes`, and `metadata_paths` where the provider API returns useful metadata. Must maintain the dual-location sync between `providers/` (user-visible) and `pkg/providers/definitions/` (embed).
Purpose: VRFY-03 requires that provider YAMLs carry verification metadata. Without this update the verifier from Plan 05-03 would have no endpoints to hit.
Output: 12 Tier 1 provider YAMLs updated in both locations; guardrail test asserts presence of new fields.
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/phases/05-verification-engine/05-CONTEXT.md
@providers/openai.yaml
@pkg/providers/schema.go
<interfaces>
Extended VerifySpec (from Plan 05-01) accepts these YAML keys under `verify:`:
```yaml
verify:
  method: GET
  url: https://api.provider.com/v1/models
  headers:
    Authorization: "Bearer {{KEY}}"
  body: ""                  # optional, can use {{KEY}}
  success_codes: [200]
  failure_codes: [401, 403]
  rate_limit_codes: [429]
  metadata_paths:           # display-name -> gjson path
    org: "organization.name"
    tier: "rate_limit.tier"
```
Legacy fields `valid_status`/`invalid_status` still parse (backward compat) but new YAMLs should use the canonical `success_codes`/`failure_codes`.
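The backward-compatibility rule above can be pictured as a small fallback accessor. This is a sketch under stated assumptions: the lowercase `verifySpec` type, its field names, and the choice to return nothing when neither field is set are all hypothetical; the actual `EffectiveSuccessCodes` is defined by Plan 05-01 and may apply a default.

```go
package main

import "fmt"

// verifySpec mirrors only the two fields this sketch needs. The names are
// assumptions, not the real schema from pkg/providers/schema.go.
type verifySpec struct {
	SuccessCodes []int // canonical YAML key: success_codes
	ValidStatus  []int // legacy YAML key: valid_status
}

// effectiveSuccessCodes prefers the canonical field and falls back to the
// legacy one. Whether a built-in default applies is left to the real schema.
func (s verifySpec) effectiveSuccessCodes() []int {
	if len(s.SuccessCodes) > 0 {
		return s.SuccessCodes
	}
	return s.ValidStatus
}

func main() {
	modern := verifySpec{SuccessCodes: []int{200}}
	legacy := verifySpec{ValidStatus: []int{200, 204}}
	fmt.Println(modern.effectiveSuccessCodes(), legacy.effectiveSuccessCodes())
}
```

Routing every status check through one accessor keeps the verifier itself unaware of which YAML spelling a provider file uses.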
**Dual-location rule** (from Phase 1 decisions): every YAML in `providers/` must have an identical copy in `pkg/providers/definitions/` because `go:embed` cannot traverse `..`.
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Update 12 Tier 1 provider YAMLs with extended verify specs (both locations)</name>
<files>providers/openai.yaml, providers/anthropic.yaml, providers/google-ai.yaml, providers/cohere.yaml, providers/mistral.yaml, providers/groq.yaml, providers/xai.yaml, providers/ai21.yaml, providers/inflection.yaml, providers/perplexity.yaml, providers/deepseek.yaml, providers/together.yaml, and mirrors in pkg/providers/definitions/</files>
<action>
For each of the 12 providers below, update BOTH `providers/{name}.yaml` AND `pkg/providers/definitions/{name}.yaml` to have an identical `verify:` block as specified. Preserve existing `format_version`, `name`, `display_name`, `tier`, `keywords`, and `patterns`; only touch the `verify:` block and the `last_verified` date.
First, check each file exists in both locations — if a provider file is named differently in `pkg/providers/definitions/` than in `providers/`, match the `providers/` location's naming. If a file is missing from `pkg/providers/definitions/`, add it as a copy.
Use `{{KEY}}` (double brace) as the template marker. Set `last_verified: "2026-04-05"` on every updated file.
**1. openai**: `Bearer {{KEY}}` header, `GET /v1/models`
```yaml
verify:
  method: GET
  url: https://api.openai.com/v1/models
  headers:
    Authorization: "Bearer {{KEY}}"
  success_codes: [200]
  failure_codes: [401, 403]
  rate_limit_codes: [429]
  metadata_paths:
    first_model: "data.0.id"
    object_type: "object"
```
**2. anthropic** — POST /v1/messages with minimal body; Anthropic requires `x-api-key` header and `anthropic-version`
```yaml
verify:
  method: POST
  url: https://api.anthropic.com/v1/messages
  headers:
    x-api-key: "{{KEY}}"
    anthropic-version: "2023-06-01"
    content-type: "application/json"
  body: '{"model":"claude-haiku-4-5","max_tokens":1,"messages":[{"role":"user","content":"hi"}]}'
  success_codes: [200]
  failure_codes: [401, 403]
  rate_limit_codes: [429, 529]
  metadata_paths:
    model: "model"
    stop_reason: "stop_reason"
```
**3. google-ai** — key goes in URL query string
```yaml
verify:
  method: GET
  url: https://generativelanguage.googleapis.com/v1/models?key={{KEY}}
  success_codes: [200]
  failure_codes: [400, 401, 403]
  rate_limit_codes: [429]
  metadata_paths:
    first_model: "models.0.name"
```
(Note: Google returns 400 for bad key, not 401 — include 400 in failure_codes.)
**4. cohere**
```yaml
verify:
  method: GET
  url: https://api.cohere.ai/v1/models
  headers:
    Authorization: "Bearer {{KEY}}"
  success_codes: [200]
  failure_codes: [401, 403]
  rate_limit_codes: [429]
  metadata_paths:
    first_model: "models.0.name"
```
**5. mistral**
```yaml
verify:
  method: GET
  url: https://api.mistral.ai/v1/models
  headers:
    Authorization: "Bearer {{KEY}}"
  success_codes: [200]
  failure_codes: [401, 403]
  rate_limit_codes: [429]
  metadata_paths:
    first_model: "data.0.id"
```
**6. groq**
```yaml
verify:
  method: GET
  url: https://api.groq.com/openai/v1/models
  headers:
    Authorization: "Bearer {{KEY}}"
  success_codes: [200]
  failure_codes: [401, 403]
  rate_limit_codes: [429]
  metadata_paths:
    first_model: "data.0.id"
```
**7. xai**
```yaml
verify:
  method: GET
  url: https://api.x.ai/v1/api-key
  headers:
    Authorization: "Bearer {{KEY}}"
  success_codes: [200]
  failure_codes: [401, 403]
  rate_limit_codes: [429]
  metadata_paths:
    name: "name"
    acls: "acls"
```
**8. ai21**
```yaml
verify:
  method: GET
  url: https://api.ai21.com/studio/v1/models
  headers:
    Authorization: "Bearer {{KEY}}"
  success_codes: [200]
  failure_codes: [401, 403]
  rate_limit_codes: [429]
```
**9. inflection** — leave URL empty (no public endpoint) → verifier will return StatusUnknown
```yaml
verify:
  method: GET
  url: ""
  success_codes: [200]
  failure_codes: [401, 403]
  rate_limit_codes: [429]
```
**10. perplexity**
```yaml
verify:
  method: POST
  url: https://api.perplexity.ai/chat/completions
  headers:
    Authorization: "Bearer {{KEY}}"
    content-type: "application/json"
  body: '{"model":"sonar","messages":[{"role":"user","content":"hi"}],"max_tokens":1}'
  success_codes: [200]
  failure_codes: [401, 403]
  rate_limit_codes: [429]
```
**11. deepseek**
```yaml
verify:
  method: GET
  url: https://api.deepseek.com/v1/models
  headers:
    Authorization: "Bearer {{KEY}}"
  success_codes: [200]
  failure_codes: [401, 403]
  rate_limit_codes: [429]
  metadata_paths:
    first_model: "data.0.id"
```
**12. together**
```yaml
verify:
  method: GET
  url: https://api.together.xyz/v1/models
  headers:
    Authorization: "Bearer {{KEY}}"
  success_codes: [200]
  failure_codes: [401, 403]
  rate_limit_codes: [429]
  metadata_paths:
    first_model: "0.id"
```
After updating, `diff providers/openai.yaml pkg/providers/definitions/openai.yaml` should return nothing (identical files). Verify for each of the 12.
If a provider file in `providers/` has a slightly different filename in `pkg/providers/definitions/` (e.g. `google_ai.yaml` vs `google-ai.yaml`), investigate `pkg/providers/definitions/` directory listing first via `Read` or `Bash ls` to get exact names, then update the matching file.
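The dual-location check could also live in Go next to the guardrail tests instead of a shell `diff` loop. The sketch below is one way to do that, assuming a hypothetical helper `mirrorMismatches`; `demoMismatches` only exists so the example runs anywhere by building two throwaway directories.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// mirrorMismatches reports which named YAML files differ, or are missing,
// between the two directories. Hypothetical helper, not part of the plan.
func mirrorMismatches(dirA, dirB string, names []string) []string {
	var bad []string
	for _, n := range names {
		a, errA := os.ReadFile(filepath.Join(dirA, n+".yaml"))
		b, errB := os.ReadFile(filepath.Join(dirB, n+".yaml"))
		if errA != nil || errB != nil || !bytes.Equal(a, b) {
			bad = append(bad, n)
		}
	}
	return bad
}

// demoMismatches builds two temporary directories: "openai" is mirrored
// correctly, "groq" differs between them.
func demoMismatches() ([]string, error) {
	a, err := os.MkdirTemp("", "providers")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(a)
	b, err := os.MkdirTemp("", "definitions")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(b)

	files := []struct{ dir, name, body string }{
		{a, "openai", "verify:\n  method: GET\n"},
		{b, "openai", "verify:\n  method: GET\n"},
		{a, "groq", "verify:\n  method: GET\n"},
		{b, "groq", "verify:\n  method: POST\n"},
	}
	for _, f := range files {
		path := filepath.Join(f.dir, f.name+".yaml")
		if err := os.WriteFile(path, []byte(f.body), 0o644); err != nil {
			return nil, err
		}
	}
	return mirrorMismatches(a, b, []string{"openai", "groq"}), nil
}

func main() {
	bad, err := demoMismatches()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("mismatched:", bad)
}
```

In the repository the call would be `mirrorMismatches("providers", "pkg/providers/definitions", names)` with all 12 Tier 1 names, including inflection.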
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/providers/... -run Registry -v && { fail=0; for p in openai anthropic google-ai cohere mistral groq xai ai21 inflection perplexity deepseek together; do diff "providers/$p.yaml" "pkg/providers/definitions/$p.yaml" || { echo "MISMATCH $p"; fail=1; }; done; [ "$fail" -eq 0 ]; }</automated>
</verify>
<acceptance_criteria>
- `grep -l '{{KEY}}' providers/*.yaml | wc -l` returns at least 11 (inflection has empty URL, so no key template)
- `grep -l 'success_codes:' providers/*.yaml | wc -l` returns at least 12
- `grep -l 'metadata_paths:' providers/*.yaml | wc -l` returns at least 9 (every Tier 1 provider except ai21, inflection, and perplexity)
- All Tier 1 provider files identical between `providers/` and `pkg/providers/definitions/`
- `go test ./pkg/providers/...` passes (existing guardrail tests load YAMLs)
</acceptance_criteria>
<done>All 12 Tier 1 provider YAMLs carry Phase 5 verify specs in both locations.</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Guardrail test — verify spec completeness for Tier 1</name>
<files>pkg/providers/registry_test.go</files>
<behavior>
- Test asserts that all 12 Tier 1 providers in the loaded registry have VerifySpec.URL set (except inflection, which is allowed to be empty)
- Test asserts that providers with a URL have SuccessCodes populated (either via new field or legacy ValidStatus)
- Test asserts that all non-empty verify URLs start with "https://"
</behavior>
<action>
Append to `pkg/providers/registry_test.go` (or create if absent — check first):
```go
func TestTier1VerifySpecs_Complete(t *testing.T) {
	reg, err := NewRegistry()
	if err != nil {
		t.Fatalf("NewRegistry: %v", err)
	}
	tier1 := []string{"openai", "anthropic", "google-ai", "cohere", "mistral", "groq", "xai", "ai21", "perplexity", "deepseek", "together"}
	// Note: inflection intentionally excluded — no public verify endpoint.
	for _, name := range tier1 {
		p, ok := reg.Get(name) // adjust to match actual Registry method
		if !ok {
			t.Errorf("provider %q not in registry", name)
			continue
		}
		if p.Verify.URL == "" {
			t.Errorf("provider %q: verify.url must be set", name)
			continue
		}
		if !strings.HasPrefix(p.Verify.URL, "https://") {
			t.Errorf("provider %q: verify.url must be HTTPS, got %q", name, p.Verify.URL)
		}
		if len(p.Verify.EffectiveSuccessCodes()) == 0 {
			t.Errorf("provider %q: no success codes configured", name)
		}
	}
}

func TestInflection_NoVerifyEndpoint(t *testing.T) {
	reg, err := NewRegistry()
	if err != nil {
		t.Fatalf("NewRegistry: %v", err)
	}
	p, ok := reg.Get("inflection")
	if !ok {
		t.Skip("inflection provider not loaded")
	}
	if p.Verify.URL != "" {
		t.Errorf("inflection should have empty verify.url (no public endpoint), got %q", p.Verify.URL)
	}
}
```
Adjust `reg.Get(name)` to match the actual method on Registry (check pkg/providers/registry.go first; it may be `Find`, `ByName`, or a map access). If Registry exposes the providers via an exported field or method like `All()`, iterate from that. Also make sure `strings` is in the test file's import block, since the test calls `strings.HasPrefix`.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/providers/... -run Tier1VerifySpecs -v && go test ./pkg/providers/... -run Inflection -v</automated>
</verify>
<acceptance_criteria>
- Both new tests pass
- `go build ./...` succeeds
</acceptance_criteria>
<done>Guardrail test protects Tier 1 verify spec quality on future edits.</done>
</task>
</tasks>
<verification>
- `go test ./pkg/providers/... -v` all pass
- `diff` between providers/ and pkg/providers/definitions/ copies returns no mismatches for Tier 1 files
- Guardrail test catches any regression
</verification>
<success_criteria>
- 12 Tier 1 providers carry complete verify specs
- Dual-location sync maintained
- New guardrail test prevents future drift
</success_criteria>
<output>
After completion, create `.planning/phases/05-verification-engine/05-04-SUMMARY.md`
</output>

@@ -0,0 +1,320 @@
---
phase: 05-verification-engine
plan: 05
type: execute
wave: 2
depends_on: [05-01, 05-02, 05-03, 05-04]
files_modified:
- cmd/scan.go
- cmd/scan_test.go
- pkg/output/table.go
- pkg/output/table_test.go
autonomous: true
requirements: [VRFY-01, VRFY-04, VRFY-05]
must_haves:
  truths:
    - "keyhunter scan --verify triggers EnsureConsent before any verify HTTP calls; declined consent skips verification but still prints findings"
    - "Verified findings have VerifyStatus populated and are persisted via SaveFinding with verify_* columns set"
    - "--verify-timeout=30s changes the per-key HTTP timeout from default 10s"
    - "--verify-workers=N sets the ants pool size for parallel verification"
    - "Output table shows a VERIFY column: ✓ live / ✗ dead / ⚠ rate-limited / ? unknown / ! error"
    - "Verification only runs after scan completes (batch mode) — all findings collected, then verified"
  artifacts:
    - path: "cmd/scan.go"
      provides: "--verify wiring: consent -> verifier -> save -> display"
      contains: "verify.EnsureConsent"
    - path: "pkg/output/table.go"
      provides: "Verification status column"
      contains: "VERIFY"
  key_links:
    - from: "cmd/scan.go"
      to: "pkg/verify.HTTPVerifier.VerifyAll"
      via: "after scan findings collected"
      pattern: "VerifyAll"
    - from: "cmd/scan.go"
      to: "pkg/verify.EnsureConsent"
      via: "gate before verification"
      pattern: "EnsureConsent"
    - from: "cmd/scan.go"
      to: "storage.SaveFinding"
      via: "persists verified findings with VerifyStatus populated"
      pattern: "storeFinding.VerifyStatus"
---
<objective>
Wire Plans 05-02/03/04 together into the scan command. Add `--verify-timeout` and `--verify-workers` flags, gate verification behind consent, run the verifier over collected findings, persist verify results, and render a new verify column in the output table.
Purpose: End-user visible feature — this is where VRFY-01 (prompt), VRFY-04 (metadata display), and VRFY-05 (configurable timeout) come together.
Output: Working `keyhunter scan --verify` command that prompts on first use and displays verification status.
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/phases/05-verification-engine/05-CONTEXT.md
@cmd/scan.go
@pkg/output/table.go
@pkg/engine/finding.go
<interfaces>
Available after Wave 1 completes:
```go
// pkg/verify/consent.go (Plan 05-02)
func EnsureConsent(db *storage.DB, in io.Reader, out io.Writer) (bool, error)

// pkg/verify/verifier.go (Plan 05-03)
func NewHTTPVerifier(timeout time.Duration) *HTTPVerifier
func (v *HTTPVerifier) VerifyAll(ctx context.Context, findings []engine.Finding, reg *providers.Registry, workers int) <-chan Result

// pkg/verify/result.go
type Result struct {
	ProviderName string
	KeyMasked    string
	Status       string // StatusLive/Dead/RateLimited/Error/Unknown
	HTTPCode     int
	Metadata     map[string]string
	RetryAfter   time.Duration
	ResponseTime time.Duration
	Error        string
}

// pkg/storage/findings.go (Plan 05-01)
type Finding struct {
	// ... existing ...
	Verified       bool
	VerifyStatus   string
	VerifyHTTPCode int
	VerifyMetadata map[string]string
}
```
Current scan command already has `flagVerify bool`. This plan extends with `flagVerifyTimeout time.Duration` and `flagVerifyWorkers int`.
</interfaces>
</context>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Wire verifier into cmd/scan.go with consent and flags</name>
<files>cmd/scan.go, cmd/scan_test.go</files>
<behavior>
- New flags registered: `--verify-timeout` (default 10s), `--verify-workers` (default 10)
- When --verify is set: collect all findings, call EnsureConsent(db, os.Stdin, os.Stderr)
- If consent declined: print notice to stderr, skip verification, still display + persist unverified findings
- If consent granted: run NewHTTPVerifier(timeout).VerifyAll(ctx, findings, reg, workers), read results from channel, match back to findings by (provider+KeyMasked), update Finding.Verified/VerifyStatus/VerifyHTTPCode/VerifyMetadata
- SaveFinding is called AFTER verification so verify_* columns are persisted in the same row (refactor current loop: collect first, verify second, save third)
- On scan errors or no findings: verification path is a no-op
</behavior>
<action>
1. In `cmd/scan.go`:
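a. Add imports: `"context"` (if not already imported; needed for `context.Background()`), `"github.com/salvacybersec/keyhunter/pkg/verify"`, and `"time"` (already there). Add `"io"` only if the refactor for injected stdin actually uses it, because an unused import fails the build.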
b. Add new package-level flag variables near existing flagVerify:
```go
var (
	flagVerifyTimeout time.Duration
	flagVerifyWorkers int
)
```
c. In the `init()` function add:
```go
// pflag appends "(default ...)" to the usage text automatically, so the help strings omit it.
scanCmd.Flags().DurationVar(&flagVerifyTimeout, "verify-timeout", 10*time.Second, "per-key verification HTTP timeout")
scanCmd.Flags().IntVar(&flagVerifyWorkers, "verify-workers", 10, "parallel workers for key verification")
```
d. Refactor the scan loop in `RunE`. Currently the loop saves each finding as it comes from the channel. Change to:
```go
// Collect findings first (no immediate save) so verification can populate
// verify_* fields before persistence.
var findings []engine.Finding
for f := range ch {
	findings = append(findings, f)
}
```
e. After the collection loop, add the verification block:
```go
if flagVerify && len(findings) > 0 {
	granted, err := verify.EnsureConsent(db, os.Stdin, os.Stderr)
	if err != nil {
		return fmt.Errorf("consent check: %w", err)
	}
	if !granted {
		fmt.Fprintln(os.Stderr, "Verification skipped (consent not granted). Run `keyhunter legal` for details.")
	} else {
		verifier := verify.NewHTTPVerifier(flagVerifyTimeout)
		resultsCh := verifier.VerifyAll(context.Background(), findings, reg, flagVerifyWorkers)
		// Build an index for back-assignment. The same key can appear in
		// several files, so each entry holds every matching position.
		idx := make(map[string][]int, len(findings))
		for i, f := range findings {
			key := f.ProviderName + "|" + f.KeyMasked
			idx[key] = append(idx[key], i)
		}
		for r := range resultsCh {
			for _, i := range idx[r.ProviderName+"|"+r.KeyMasked] {
				findings[i].Verified = true
				findings[i].VerifyStatus = r.Status
				findings[i].VerifyHTTPCode = r.HTTPCode
				findings[i].VerifyMetadata = r.Metadata
				if r.Error != "" {
					findings[i].VerifyError = r.Error
				}
			}
		}
	}
}
```
f. Then persist all findings (moved out of collection loop) with verify fields now populated:
```go
for _, f := range findings {
	storeFinding := storage.Finding{
		ProviderName:   f.ProviderName,
		KeyValue:       f.KeyValue,
		KeyMasked:      f.KeyMasked,
		Confidence:     f.Confidence,
		SourcePath:     f.Source,
		SourceType:     f.SourceType,
		LineNumber:     f.LineNumber,
		Verified:       f.Verified,
		VerifyStatus:   f.VerifyStatus,
		VerifyHTTPCode: f.VerifyHTTPCode,
		VerifyMetadata: f.VerifyMetadata,
	}
	if _, err := db.SaveFinding(storeFinding, encKey); err != nil {
		fmt.Fprintf(os.Stderr, "warning: failed to save finding: %v\n", err)
	}
}
```
g. Leave the output rendering call unchanged (Task 2 handles the display column).
2. Create `cmd/scan_test.go` (or append if present) with:
- `TestScan_VerifyFlag_DeclinedConsent_SkipsVerification` — set up a scan command with --verify, provide stdin reader "no\n", run against a test file with one fake key, assert that the resulting in-memory finding has Verified=false and the scan still completes
- `TestScan_VerifyFlag_GrantedConsent_PopulatesStatus` — pre-seed settings "verify.consent" = "granted", run scan --verify against a file containing a test pattern, assert at least one finding has Verified=true after the run
These tests will likely need to refactor scanCmd to accept injected stdin and a test helper to invoke the command function directly (not via cobra execution). If that's too invasive, scope Task 1 tests to:
- `TestScan_VerifyFlags_Registered` — ensure --verify-timeout and --verify-workers flags exist on scanCmd with correct defaults (call `scanCmd.Flags().Lookup("verify-timeout")` and assert non-nil + default "10s")
Prefer the lightweight flag-registration test to avoid pulling the full scan path into tests. Add at least one behavioral integration test if straightforward; otherwise document the limitation in the task-level SUMMARY.
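The reason injected stdin matters for these tests is easiest to see in isolation. The sketch below uses a hypothetical, simplified `askConsent` with no storage layer; it is not `verify.EnsureConsent`, and the accepted answers ("yes" or "y") are an assumption. It only shows how taking `io.Reader` and `io.Writer` lets a test feed "no\n" without a real terminal.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// askConsent prompts on out and reads one line from in. It treats "yes" or
// "y" (case-insensitive) as consent and anything else, including EOF, as a
// refusal. Simplified stand-in for the real consent gate.
func askConsent(in io.Reader, out io.Writer) (bool, error) {
	fmt.Fprint(out, "Proceed with live key verification? [yes/no]: ")
	line, err := bufio.NewReader(in).ReadString('\n')
	if err != nil && err != io.EOF {
		return false, err
	}
	answer := strings.ToLower(strings.TrimSpace(line))
	return answer == "yes" || answer == "y", nil
}

// consentFor feeds a canned answer to askConsent, the way a test would.
func consentFor(answer string) bool {
	granted, err := askConsent(strings.NewReader(answer), io.Discard)
	return err == nil && granted
}

func main() {
	fmt.Println(consentFor("no\n"), consentFor("yes\n"), consentFor(""))
}
```

If `scanCmd` reads from `os.Stdin` directly, the same idea applies one level up: pass the reader into a helper that `RunE` calls, and test the helper.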
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go build ./... && go test ./cmd/... -run VerifyFlag -v</automated>
</verify>
<acceptance_criteria>
- `grep -q 'verify.EnsureConsent' cmd/scan.go`
- `grep -q 'verifier.VerifyAll\|NewHTTPVerifier' cmd/scan.go`
- `grep -q 'verify-timeout' cmd/scan.go`
- `grep -q 'verify-workers' cmd/scan.go`
- `go build ./...` succeeds
- `go run . scan --help` shows --verify, --verify-timeout, --verify-workers flags
- scan_test.go flag-registration test passes
</acceptance_criteria>
<done>Scan command orchestrates consent → verification → save with configurable timeout and workers.</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Output table shows verification status column and metadata</name>
<files>pkg/output/table.go, pkg/output/table_test.go</files>
<behavior>
- When any finding has Verified=true, PrintFindings renders an extra VERIFY column
- Symbol mapping: "live"=✓ (green), "dead"=✗ (red), "rate_limited"=⚠ (yellow), "error"=! (red), "unknown"=? (gray), "" (unverified)=empty cell
- When any finding has VerifyMetadata, a second summary line per finding shows key: value pairs (e.g. " org: Acme Corp, tier: plus")
- When no findings are verified, table layout is unchanged from Phase 1 (backward compat)
</behavior>
<action>
1. In `pkg/output/table.go`, modify `PrintFindings`:
a. Compute `anyVerified := false` by scanning findings once before printing.
b. If anyVerified, add a VERIFY column header after LINE (appending it as the last column disturbs the existing column widths least):
```go
fmt.Fprintf(os.Stdout, "%-20s %-40s %-10s %-30s %-5s %s\n",
	styleHeader.Render("PROVIDER"),
	styleHeader.Render("KEY"),
	styleHeader.Render("CONFIDENCE"),
	styleHeader.Render("SOURCE"),
	styleHeader.Render("LINE"),
	styleHeader.Render("VERIFY"),
)
```
c. Add helper:
```go
func verifySymbol(f engine.Finding) string {
	if !f.Verified {
		return ""
	}
	switch f.VerifyStatus {
	case "live":
		return lipgloss.NewStyle().Foreground(lipgloss.Color("2")).Render("✓ live")
	case "dead":
		return lipgloss.NewStyle().Foreground(lipgloss.Color("1")).Render("✗ dead")
	case "rate_limited":
		return lipgloss.NewStyle().Foreground(lipgloss.Color("3")).Render("⚠ rate")
	case "error":
		return lipgloss.NewStyle().Foreground(lipgloss.Color("1")).Render("! err")
	default:
		return lipgloss.NewStyle().Foreground(lipgloss.Color("8")).Render("? unk")
	}
}
```
d. In the per-finding loop, when anyVerified, append verifySymbol(f) as the final column. When len(f.VerifyMetadata) > 0, print a second indented line:
```go
if len(f.VerifyMetadata) > 0 {
	parts := make([]string, 0, len(f.VerifyMetadata))
	for k, v := range f.VerifyMetadata {
		parts = append(parts, fmt.Sprintf("%s: %s", k, v))
	}
	sort.Strings(parts) // deterministic order
	fmt.Fprintf(os.Stdout, " ↳ %s\n", strings.Join(parts, ", "))
}
```
Add the `sort` and `strings` imports.
2. Create `pkg/output/table_test.go`:
- `TestPrintFindings_NoVerification_Unchanged` — findings with Verified=false, capture stdout via os.Pipe redirect, assert output does not contain "VERIFY" header (backward compat)
- `TestPrintFindings_LiveVerification_ShowsCheck` — finding with Verified=true, VerifyStatus="live", assert stdout contains "VERIFY" and "live"
- `TestPrintFindings_Metadata_Rendered` — finding with VerifyMetadata={"org":"Acme","tier":"plus"}, assert stdout contains "org: Acme" and "tier: plus" on the indented metadata line
Capture stdout using the `os.Pipe` + `os.Stdout = w` swap pattern, restore after test. Strip ANSI escape sequences before asserting content (lipgloss output contains them). Use a small helper `stripANSI(s string) string` with a regex `\x1b\[[0-9;]*m`.
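The capture-and-strip pattern described above might look like the following. This is a sketch, not the planned test file: `captureStdout` and `printStyled` are hypothetical names, and `printStyled` stands in for `PrintFindings` by emitting a hard-coded ANSI-coloured string instead of using lipgloss.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"regexp"
)

// ansiRE matches SGR colour/style escape sequences such as "\x1b[32m".
var ansiRE = regexp.MustCompile(`\x1b\[[0-9;]*m`)

// stripANSI removes colour escape sequences so tests can assert on plain text.
func stripANSI(s string) string {
	return ansiRE.ReplaceAllString(s, "")
}

// captureStdout swaps os.Stdout for a pipe while fn runs and returns
// everything fn printed. The original os.Stdout is restored afterwards.
func captureStdout(fn func()) (string, error) {
	orig := os.Stdout
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	os.Stdout = w
	defer func() { os.Stdout = orig }()

	// Read concurrently so fn cannot block on a full pipe buffer.
	done := make(chan []byte, 1)
	go func() {
		b, _ := io.ReadAll(r)
		done <- b
	}()

	fn()
	w.Close() // signals EOF to the reader goroutine
	out := <-done
	r.Close()
	return string(out), nil
}

// printStyled stands in for PrintFindings: it writes a green "✓ live" cell.
func printStyled() {
	fmt.Fprintln(os.Stdout, "\x1b[32m✓ live\x1b[0m")
}

func main() {
	out, err := captureStdout(printStyled)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Printf("%q\n", stripANSI(out))
}
```

Because the swap mutates a process-wide variable, tests using this helper should not run in parallel with each other.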
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/output/... -v</automated>
</verify>
<acceptance_criteria>
- `grep -q 'VERIFY' pkg/output/table.go`
- `grep -q 'verifySymbol\|VerifyStatus' pkg/output/table.go`
- All 3 table tests pass
- `go build ./...` succeeds
- Manual: `go run . scan ./testdata` (without --verify) output is unchanged; with --verify shows VERIFY column
</acceptance_criteria>
<done>Output table renders verify column and metadata line when findings are verified; backward compatible when not.</done>
</task>
</tasks>
<verification>
- `go build ./...` clean
- `go test ./... -v` across all modified packages green
- `go run . scan --help` shows `--verify`, `--verify-timeout`, `--verify-workers`
- Manual smoke: create a file with a fake `sk-proj-...` string, run `go run . scan file.txt --verify`, first run prompts for consent, subsequent runs skip prompt
</verification>
<success_criteria>
- VRFY-01: consent prompt gates --verify on first use
- VRFY-04: metadata displayed under finding when extracted
- VRFY-05: --verify-timeout and --verify-workers flags work
- Unverified scans unchanged from Phase 4 behavior
</success_criteria>
<output>
After completion, create `.planning/phases/05-verification-engine/05-05-SUMMARY.md`
</output>