---
phase: 01-foundation
plan: 02
type: execute
wave: 1
depends_on: [01-01]
files_modified:
  - providers/openai.yaml
  - providers/anthropic.yaml
  - providers/huggingface.yaml
  - pkg/providers/definitions/openai.yaml
  - pkg/providers/definitions/anthropic.yaml
  - pkg/providers/definitions/huggingface.yaml
  - pkg/providers/schema.go
  - pkg/providers/loader.go
  - pkg/providers/registry.go
  - pkg/providers/registry_test.go
autonomous: true
requirements: [CORE-02, CORE-03, CORE-06, PROV-10]

must_haves:
  truths:
    - "Provider YAML files are embedded at compile time; no filesystem access at runtime"
    - "Registry loads all YAML files from embed.FS and returns a slice of Provider structs"
    - "Provider schema validation rejects YAML missing format_version or last_verified"
    - "Aho-Corasick automaton is built from all provider keywords at registry init"
    - "keyhunter providers list command lists providers (tested via registry methods)"
  artifacts:
    - path: "providers/openai.yaml"
      provides: "Reference provider definition with all schema fields"
      contains: "format_version"
    - path: "pkg/providers/schema.go"
      provides: "Provider, Pattern, VerifySpec Go structs with UnmarshalYAML validation"
      exports: ["Provider", "Pattern", "VerifySpec", "RegistryStats"]
    - path: "pkg/providers/registry.go"
      provides: "Registry struct with List, Get, Stats, AC methods"
      exports: ["Registry", "NewRegistry"]
    - path: "pkg/providers/loader.go"
      provides: "embed.FS declaration and fs.WalkDir loading logic"
      contains: "go:embed"
  key_links:
    - from: "pkg/providers/loader.go"
      to: "pkg/providers/definitions/*.yaml"
      via: "//go:embed directive"
      pattern: "go:embed.*definitions"
    - from: "pkg/providers/registry.go"
      to: "github.com/petar-dambovaliev/aho-corasick"
      via: "AC automaton build at NewRegistry()"
      pattern: "ahocorasick"
    - from: "pkg/providers/schema.go"
      to: "format_version and last_verified YAML fields"
      via: "UnmarshalYAML validation"
      pattern: "UnmarshalYAML"
---

Build the provider registry. It consists of YAML schema structs with validation, an embed.FS loader, an in-memory registry with List/Get/Stats/AC methods, and three reference provider YAML definitions.
The Aho-Corasick automaton is built from all provider keywords at registry initialization.

Purpose: Every downstream subsystem (scan engine, CLI `providers` command, verification engine) depends on the Registry interface. This plan establishes the stable contract those subsystems build against.

Output:
- `providers/*.yaml` and their embedded copies in `pkg/providers/definitions/*.yaml`
- `pkg/providers/{schema,loader,registry}.go`
- `registry_test.go`, with the Plan 01 stubs filled in

@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md

@.planning/phases/01-foundation/01-RESEARCH.md
@.planning/phases/01-foundation/01-01-SUMMARY.md

Full provider YAML structure:

```yaml
format_version: 1
name: openai
display_name: OpenAI
tier: 1
last_verified: "2026-04-04"
keywords:
  - "sk-proj-"
  - "openai"
patterns:
  - regex: 'sk-proj-[A-Za-z0-9_\-]{48,}'
    entropy_min: 3.5
    confidence: high
verify:
  method: GET
  url: https://api.openai.com/v1/models
  headers:
    Authorization: "Bearer {KEY}"
  valid_status: [200]
  invalid_status: [401, 403]
```

Provider struct fields:
- FormatVersion int (yaml:"format_version"; must be >= 1)
- Name string (yaml:"name")
- DisplayName string (yaml:"display_name")
- Tier int (yaml:"tier")
- LastVerified string (yaml:"last_verified"; must be non-empty)
- Keywords []string (yaml:"keywords")
- Patterns []Pattern (yaml:"patterns")
- Verify VerifySpec (yaml:"verify")

Pattern struct fields:
- Regex string (yaml:"regex")
- EntropyMin float64 (yaml:"entropy_min")
- Confidence string (yaml:"confidence"; one of "high", "medium", "low")

VerifySpec struct fields:
- Method string (yaml:"method")
- URL string (yaml:"url")
- Headers map[string]string (yaml:"headers")
- ValidStatus []int (yaml:"valid_status")
- InvalidStatus []int (yaml:"invalid_status")

Registry API:
- `type Registry struct { ... }`
- `func NewRegistry() (*Registry, error)`
- `func (r *Registry) List() []Provider`
- `func (r *Registry) Get(name string) (Provider, bool)`
- `func (r *Registry) Stats() RegistryStats` (RegistryStats is `{Total int, ByTier map[int]int, ByConfidence map[string]int}`)
- `func (r *Registry) AC() ahocorasick.AhoCorasick` (the pre-built automaton)

Embed path: The Go compiler interprets `go:embed` patterns relative to the directory of the source file that contains the directive. For `loader.go`, that directory is `pkg/providers/`, not the module root. Patterns may not contain `.` or `..` path elements, and they may not begin or end with a slash. RESEARCH.md shows `//go:embed ../../providers/*.yaml`, but the compiler rejects that pattern. As a result, `loader.go` cannot embed the root-level `providers/` directory directly. Two other workarounds were considered and ruled out:
- A `providers_embed.go` at the project root (next to `go.mod`) would belong to the root package (`package main` here), which breaks package separation.
- Symlinking does not help, because embed patterns may not match symbolic links.

Resolution: Move the embedded YAML files into `pkg/providers/definitions/` (for example, `pkg/providers/definitions/openai.yaml`) and use `//go:embed definitions/*.yaml` in `loader.go`. Update files_modified accordingly.
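The sketch below illustrates why the `definitions/` layout works, using only the standard library. It makes two stand-in assumptions:
- `fs.ValidPath` applies the same path-element rules as `go:embed` patterns: no `.`, `..`, or empty elements, and no leading or trailing slash. This only approximates the compiler's check; it does not call it.
- A `testing/fstest.MapFS` stands in for the `embed.FS` that `//go:embed definitions/*.yaml` would produce.

The helper names `isEmbeddablePath` and `embeddedYAML`, and the placeholder file contents, are hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// embedLayout stands in for the embed.FS produced by //go:embed definitions/*.yaml:
// entries keep their package-relative "definitions/" prefix. File contents are
// placeholders, and README.md is a non-YAML file that the glob should skip.
var embedLayout = fstest.MapFS{
	"definitions/openai.yaml":      {Data: []byte("name: openai\n")},
	"definitions/anthropic.yaml":   {Data: []byte("name: anthropic\n")},
	"definitions/huggingface.yaml": {Data: []byte("name: huggingface\n")},
	"definitions/README.md":        {Data: []byte("not a provider\n")},
}

// isEmbeddablePath approximates the go:embed path-element rules with
// fs.ValidPath; it illustrates the rule and is not the compiler's own check.
func isEmbeddablePath(p string) bool {
	return fs.ValidPath(p)
}

// embeddedYAML lists the files that a "definitions/*.yaml" pattern selects.
func embeddedYAML(fsys fs.FS) ([]string, error) {
	return fs.Glob(fsys, "definitions/*.yaml")
}

func main() {
	for _, p := range []string{"../../providers/openai.yaml", "definitions/openai.yaml"} {
		fmt.Printf("%-28s embeddable=%v\n", p, isEmbeddablePath(p))
	}
	files, err := embeddedYAML(embedLayout)
	if err != nil {
		panic(err)
	}
	fmt.Println(files)
}
```

`embed.FS` satisfies `fs.FS`, so the same `fs.Glob` call would also work directly on the real embedded file system.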
Task 1: Provider YAML schema structs with validation

Files: `pkg/providers/schema.go`, `providers/openai.yaml`, `providers/anthropic.yaml`, `providers/huggingface.yaml`

Read first:
- /home/salva/Documents/apikey/.planning/phases/01-foundation/01-RESEARCH.md (Pattern 1: Provider Registry; the Provider YAML schema section; the PROV-10 row in the requirements table)
- /home/salva/Documents/apikey/.planning/research/ARCHITECTURE.md (Provider Registry component, YAML schema example)

Behavior:
- Test 1: Provider with format_version=0 → UnmarshalYAML returns error "format_version must be >= 1"
- Test 2: Provider with empty last_verified → UnmarshalYAML returns error "last_verified is required"
- Test 3: Valid provider YAML → UnmarshalYAML succeeds, Provider.Name == "openai"
- Test 4: Provider with no patterns → loaded successfully (the patterns list can be empty for schema-only providers)
- Test 5: Pattern.Confidence set to a value outside {"high","medium","low"} → error containing "must be high, medium, or low" (an omitted confidence is accepted as unspecified)

Action: Create pkg/providers/schema.go:

```go
package providers

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

// Provider represents a single API key provider definition loaded from YAML.
type Provider struct {
	FormatVersion int        `yaml:"format_version"`
	Name          string     `yaml:"name"`
	DisplayName   string     `yaml:"display_name"`
	Tier          int        `yaml:"tier"`
	LastVerified  string     `yaml:"last_verified"`
	Keywords      []string   `yaml:"keywords"`
	Patterns      []Pattern  `yaml:"patterns"`
	Verify        VerifySpec `yaml:"verify"`
}

// Pattern defines a single regex pattern for API key detection.
type Pattern struct {
	Regex      string  `yaml:"regex"`
	EntropyMin float64 `yaml:"entropy_min"`
	Confidence string  `yaml:"confidence"`
}

// VerifySpec defines how to verify that a key is live (used by the Phase 5 verification engine).
type VerifySpec struct {
	Method        string            `yaml:"method"`
	URL           string            `yaml:"url"`
	Headers       map[string]string `yaml:"headers"`
	ValidStatus   []int             `yaml:"valid_status"`
	InvalidStatus []int             `yaml:"invalid_status"`
}

// RegistryStats holds aggregate statistics about loaded providers.
type RegistryStats struct {
	Total        int
	ByTier       map[int]int
	ByConfidence map[string]int
}

// UnmarshalYAML implements yaml.Unmarshaler with schema validation (satisfies PROV-10).
func (p *Provider) UnmarshalYAML(value *yaml.Node) error {
	// Decode into a distinct named type rather than a `type X = Provider` alias.
	// It has the same fields but no methods, so Decode does not re-enter UnmarshalYAML.
	type ProviderAlias Provider
	var alias ProviderAlias
	if err := value.Decode(&alias); err != nil {
		return err
	}

	if alias.FormatVersion < 1 {
		return fmt.Errorf("provider %q: format_version must be >= 1 (got %d)", alias.Name, alias.FormatVersion)
	}
	if alias.LastVerified == "" {
		return fmt.Errorf("provider %q: last_verified is required", alias.Name)
	}

	// An empty confidence is accepted and treated as unspecified.
	validConfidences := map[string]bool{"high": true, "medium": true, "low": true, "": true}
	for _, pat := range alias.Patterns {
		if !validConfidences[pat.Confidence] {
			return fmt.Errorf("provider %q: pattern confidence %q must be high, medium, or low", alias.Name, pat.Confidence)
		}
	}

	*p = Provider(alias)
	return nil
}
```

Create the three reference YAML provider definitions. These are SCHEMA EXAMPLES for Phase 1; full pattern libraries come in Phases 2-3.
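The UnmarshalYAML rules can also be moved into a plain method that `UnmarshalYAML` calls after `value.Decode`. That makes the rules unit-testable without going through YAML decoding. The stdlib-only sketch below shows that option. `providerSpec`, `pattern`, and `validate` are hypothetical, trimmed-down stand-ins for the plan's types that keep only the fields the rules inspect. The error messages match those in `UnmarshalYAML`.

```go
package main

import "fmt"

// pattern and providerSpec are hypothetical, trimmed stand-ins for the plan's
// Pattern and Provider structs; only the fields the rules inspect are kept.
type pattern struct {
	Regex      string
	Confidence string
}

type providerSpec struct {
	FormatVersion int
	Name          string
	LastVerified  string
	Patterns      []pattern
}

// validConfidences mirrors the set accepted by UnmarshalYAML, including ""
// for patterns that leave confidence unspecified.
var validConfidences = map[string]bool{"high": true, "medium": true, "low": true, "": true}

// validate applies the PROV-10 rules and returns the first violation found.
func (p providerSpec) validate() error {
	if p.FormatVersion < 1 {
		return fmt.Errorf("provider %q: format_version must be >= 1 (got %d)", p.Name, p.FormatVersion)
	}
	if p.LastVerified == "" {
		return fmt.Errorf("provider %q: last_verified is required", p.Name)
	}
	for _, pat := range p.Patterns {
		if !validConfidences[pat.Confidence] {
			return fmt.Errorf("provider %q: pattern confidence %q must be high, medium, or low", p.Name, pat.Confidence)
		}
	}
	return nil
}

func main() {
	cases := []providerSpec{
		{FormatVersion: 0, Name: "bad-version", LastVerified: "2026-04-04"},
		{FormatVersion: 1, Name: "no-date"},
		{FormatVersion: 1, Name: "bad-conf", LastVerified: "2026-04-04", Patterns: []pattern{{Regex: "x", Confidence: "certain"}}},
		{FormatVersion: 1, Name: "schema-only", LastVerified: "2026-04-04"},
	}
	for _, c := range cases {
		fmt.Printf("%s: %v\n", c.Name, c.validate())
	}
}
```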
**providers/openai.yaml:**

```yaml
format_version: 1
name: openai
display_name: OpenAI
tier: 1
last_verified: "2026-04-04"
keywords:
  - "sk-proj-"
  - "openai"
patterns:
  - regex: 'sk-proj-[A-Za-z0-9_\-]{48,}'
    entropy_min: 3.5
    confidence: high
verify:
  method: GET
  url: https://api.openai.com/v1/models
  headers:
    Authorization: "Bearer {KEY}"
  valid_status: [200]
  invalid_status: [401, 403]
```

**providers/anthropic.yaml:**

```yaml
format_version: 1
name: anthropic
display_name: Anthropic
tier: 1
last_verified: "2026-04-04"
keywords:
  - "sk-ant-api03-"
  - "anthropic"
patterns:
  - regex: 'sk-ant-api03-[A-Za-z0-9_\-]{93,}'
    entropy_min: 3.5
    confidence: high
verify:
  method: GET
  url: https://api.anthropic.com/v1/models
  headers:
    x-api-key: "{KEY}"
    anthropic-version: "2023-06-01"
  valid_status: [200]
  invalid_status: [401, 403]
```

**providers/huggingface.yaml:**

```yaml
format_version: 1
name: huggingface
display_name: HuggingFace
tier: 3
last_verified: "2026-04-04"
keywords:
  - "hf_"
  - "huggingface"
patterns:
  - regex: 'hf_[A-Za-z0-9]{34,}'
    entropy_min: 3.5
    confidence: high
verify:
  method: GET
  url: https://huggingface.co/api/whoami-v2
  headers:
    Authorization: "Bearer {KEY}"
  valid_status: [200]
  invalid_status: [401, 403]
```

Verify: `cd /home/salva/Documents/apikey && go build ./pkg/providers/... && go test ./pkg/providers/... -run TestProviderSchemaValidation -v 2>&1 | head -30`

Acceptance criteria:
- `go build ./pkg/providers/...` exits 0
- providers/openai.yaml contains `format_version: 1` and `last_verified`
- providers/anthropic.yaml contains `format_version: 1` and `last_verified`
- providers/huggingface.yaml contains `format_version: 1` and `last_verified`
- pkg/providers/schema.go exports Provider, Pattern, VerifySpec, and RegistryStats
- Provider.UnmarshalYAML returns an error when format_version < 1
- Provider.UnmarshalYAML returns an error when last_verified is empty
- `grep -q 'UnmarshalYAML' pkg/providers/schema.go` exits 0

Done: Provider schema structs exist with validation. Three reference YAML files exist with all required fields.

Task 2: Embed loader, registry with Aho-Corasick, and filled test stubs

Files: `pkg/providers/loader.go`, `pkg/providers/registry.go`, `pkg/providers/registry_test.go`, `pkg/providers/definitions/*.yaml`

Read first:
- /home/salva/Documents/apikey/.planning/phases/01-foundation/01-RESEARCH.md (Pattern 1: Provider Registry with Compile-Time Embed; exact code example)
- /home/salva/Documents/apikey/pkg/providers/schema.go (types just created in Task 1)

Behavior:
- Test 1: NewRegistry() loads 3 providers from embedded YAML → registry.List() returns a slice of length 3
- Test 2: registry.Get("openai") → returns a Provider with Name=="openai" and bool==true
- Test 3: registry.Get("nonexistent") → returns a zero Provider and bool==false
- Test 4: registry.Stats().Total == 3 and Stats().ByTier[1] == 2 (openai and anthropic are tier 1)
- Test 5: AC automaton is built: registry.AC().FindAll("sk-proj-abc") returns a non-empty slice
- Test 6: AC automaton does NOT match unrelated text: registry.AC().FindAll("hello world") returns an empty slice

IMPORTANT NOTE ON EMBED PATHS: Go's embed package does NOT allow patterns containing "..". Because loader.go is at pkg/providers/loader.go, it CANNOT embed ../../providers/*.yaml. Instead, the package that owns the embed (`pkg/providers`) embeds a subdirectory of itself with `//go:embed definitions/*.yaml`. For Phase 1, keep BOTH directories:
- `providers/*.yaml`: the source-of-truth files from Task 1 that users inspect. They are kept as docs for now. A symlink would also be acceptable here, because only the `definitions/` copies are embedded.
- `pkg/providers/definitions/*.yaml`: the embedded copies that the registry actually loads.

At the end of this task, copy the three YAML files from providers/ to pkg/providers/definitions/.
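Keeping both `providers/` and `pkg/providers/definitions/` means the two copies can drift apart. One option is a small check that compares the `*.yaml` files in two directories byte for byte, sketched below. `yamlDrift` and the sample trees are hypothetical, and `fstest.MapFS` stands in for the real directories.

```go
package main

import (
	"bytes"
	"fmt"
	"io/fs"
	"sort"
	"testing/fstest"
)

// yamlDrift returns, sorted, the names of top-level *.yaml files that are
// missing from either file system or whose contents differ between them.
func yamlDrift(a, b fs.FS) ([]string, error) {
	names := make(map[string]bool)
	for _, fsys := range []fs.FS{a, b} {
		matches, err := fs.Glob(fsys, "*.yaml")
		if err != nil {
			return nil, err
		}
		for _, m := range matches {
			names[m] = true
		}
	}

	var drifted []string
	for name := range names {
		dataA, errA := fs.ReadFile(a, name)
		dataB, errB := fs.ReadFile(b, name)
		// A read error (typically a missing file) counts as drift.
		if errA != nil || errB != nil || !bytes.Equal(dataA, dataB) {
			drifted = append(drifted, name)
		}
	}
	sort.Strings(drifted)
	return drifted, nil
}

// Sample trees: anthropic.yaml differs and huggingface.yaml is missing from the copy.
var (
	sampleSource = fstest.MapFS{
		"openai.yaml":      {Data: []byte("format_version: 1\nname: openai\n")},
		"anthropic.yaml":   {Data: []byte("format_version: 1\nname: anthropic\n")},
		"huggingface.yaml": {Data: []byte("format_version: 1\nname: huggingface\n")},
	}
	sampleEmbedded = fstest.MapFS{
		"openai.yaml":    {Data: []byte("format_version: 1\nname: openai\n")},
		"anthropic.yaml": {Data: []byte("format_version: 2\nname: anthropic\n")},
	}
)

func main() {
	drift, err := yamlDrift(sampleSource, sampleEmbedded)
	if err != nil {
		panic(err)
	}
	fmt.Println("out of sync:", drift)
}
```

`go test` runs with the package directory as its working directory. A real test in `pkg/providers` could therefore pass `os.DirFS("../../providers")` and `os.DirFS("definitions")`. `os.DirFS` takes an ordinary OS path, so the `..` restriction on `go:embed` patterns does not apply to it.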
Create **pkg/providers/loader.go**:

```go
package providers

import (
	"embed"
	"fmt"
	"io/fs"
	"path"

	"gopkg.in/yaml.v3"
)

// definitionsFS holds every provider YAML file, embedded at compile time.
//
//go:embed definitions/*.yaml
var definitionsFS embed.FS

// loadProviders reads and validates all YAML files from the embedded definitions FS.
// embed.FS paths are always slash-separated, so "path" (not "path/filepath") is
// used to inspect extensions.
func loadProviders() ([]Provider, error) {
	var providers []Provider
	err := fs.WalkDir(definitionsFS, "definitions", func(name string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || path.Ext(name) != ".yaml" {
			return nil
		}
		data, err := definitionsFS.ReadFile(name)
		if err != nil {
			return fmt.Errorf("reading provider file %s: %w", name, err)
		}
		var p Provider
		// Provider.UnmarshalYAML enforces the schema rules (PROV-10).
		if err := yaml.Unmarshal(data, &p); err != nil {
			return fmt.Errorf("parsing provider %s: %w", name, err)
		}
		providers = append(providers, p)
		return nil
	})
	return providers, err
}
```

Create **pkg/providers/registry.go**:

```go
package providers

import (
	"fmt"

	ahocorasick "github.com/petar-dambovaliev/aho-corasick"
)

// Registry is the in-memory store of all loaded provider definitions.
// It is initialized once at startup and never mutated afterwards, so it is
// safe for concurrent reads.
type Registry struct {
	providers []Provider
	index     map[string]int          // name -> slice index
	ac        ahocorasick.AhoCorasick // pre-built automaton for keyword pre-filter
}

// NewRegistry loads all embedded provider YAML files, validates them, builds the
// Aho-Corasick automaton from all provider keywords, and returns the Registry.
func NewRegistry() (*Registry, error) {
	providers, err := loadProviders()
	if err != nil {
		return nil, fmt.Errorf("loading providers: %w", err)
	}

	index := make(map[string]int, len(providers))
	var keywords []string
	for i, p := range providers {
		// Reject duplicate names; otherwise Get would silently return only the last one.
		if _, dup := index[p.Name]; dup {
			return nil, fmt.Errorf("duplicate provider name %q", p.Name)
		}
		index[p.Name] = i
		keywords = append(keywords, p.Keywords...)
	}

	// Only DFA is set; the other Opts fields keep their zero values, so keyword
	// matching is case-sensitive.
	builder := ahocorasick.NewAhoCorasickBuilder(ahocorasick.Opts{DFA: true})
	ac := builder.Build(keywords)

	return &Registry{
		providers: providers,
		index:     index,
		ac:        ac,
	}, nil
}

// List returns all loaded providers. Callers must treat the slice as read-only.
func (r *Registry) List() []Provider {
	return r.providers
}

// Get returns a provider by name and a boolean indicating whether it was found.
func (r *Registry) Get(name string) (Provider, bool) {
	idx, ok := r.index[name]
	if !ok {
		return Provider{}, false
	}
	return r.providers[idx], true
}

// Stats returns aggregate statistics about the loaded providers.
func (r *Registry) Stats() RegistryStats {
	stats := RegistryStats{
		Total:        len(r.providers),
		ByTier:       make(map[int]int),
		ByConfidence: make(map[string]int),
	}
	for _, p := range r.providers {
		stats.ByTier[p.Tier]++
		for _, pat := range p.Patterns {
			stats.ByConfidence[pat.Confidence]++
		}
	}
	return stats
}

// AC returns the pre-built Aho-Corasick automaton for keyword pre-filtering.
func (r *Registry) AC() ahocorasick.AhoCorasick {
	return r.ac
}
```

Then copy the three YAML files into the embed location:

```bash
mkdir -p /home/salva/Documents/apikey/pkg/providers/definitions
cp /home/salva/Documents/apikey/providers/openai.yaml /home/salva/Documents/apikey/pkg/providers/definitions/
cp /home/salva/Documents/apikey/providers/anthropic.yaml /home/salva/Documents/apikey/pkg/providers/definitions/
cp /home/salva/Documents/apikey/providers/huggingface.yaml /home/salva/Documents/apikey/pkg/providers/definitions/
```

Finally, fill in **pkg/providers/registry_test.go** (replacing the stubs from Plan 01).
Write ONLY the following content; do not include any earlier draft versions:

```go
package providers_test

import (
	"testing"

	"github.com/salvacybersec/keyhunter/pkg/providers"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
	"gopkg.in/yaml.v3"
)

func TestRegistryLoad(t *testing.T) {
	reg, err := providers.NewRegistry()
	require.NoError(t, err)
	assert.GreaterOrEqual(t, len(reg.List()), 3, "expected at least 3 providers")
}

func TestRegistryGet(t *testing.T) {
	reg, err := providers.NewRegistry()
	require.NoError(t, err)

	p, ok := reg.Get("openai")
	assert.True(t, ok)
	assert.Equal(t, "openai", p.Name)
	assert.Equal(t, 1, p.Tier)

	_, notOk := reg.Get("nonexistent-provider")
	assert.False(t, notOk)
}

func TestRegistryStats(t *testing.T) {
	reg, err := providers.NewRegistry()
	require.NoError(t, err)

	stats := reg.Stats()
	assert.GreaterOrEqual(t, stats.Total, 3)
	assert.GreaterOrEqual(t, stats.ByTier[1], 2)
}

func TestAhoCorasickBuild(t *testing.T) {
	reg, err := providers.NewRegistry()
	require.NoError(t, err)

	ac := reg.AC()
	matches := ac.FindAll("export OPENAI_API_KEY=sk-proj-abc")
	assert.NotEmpty(t, matches)

	noMatches := ac.FindAll("hello world nothing here")
	assert.Empty(t, noMatches)
}

func TestProviderSchemaValidation(t *testing.T) {
	invalid := []byte("format_version: 0\nname: invalid\nlast_verified: \"\"\n")
	var p providers.Provider
	err := yaml.Unmarshal(invalid, &p)
	// require (not assert) so a nil error stops the test before err.Error() is called.
	require.Error(t, err)
	assert.Contains(t, err.Error(), "format_version")
}
```

Verify: `cd /home/salva/Documents/apikey && go test ./pkg/providers/... -v -count=1 2>&1 | tail -20`

Acceptance criteria:
- `go test ./pkg/providers/... -v` exits 0 with all 5 tests PASS (not SKIP)
- TestRegistryLoad passes with >= 3 providers
- TestRegistryGet passes: "openai" found, "nonexistent-provider" not found
- TestRegistryStats passes: Total >= 3
- TestAhoCorasickBuild passes: "sk-proj-" match found, "hello world" input yields no matches
- TestProviderSchemaValidation passes: error on format_version=0
- `grep -q 'go:embed' pkg/providers/loader.go` exits 0
- The pkg/providers/definitions/ directory exists with 3 YAML files

Done: Registry loads providers from embedded YAML, builds the Aho-Corasick automaton, and exposes List/Get/Stats/AC. All 5 tests pass.

Verification (after both tasks):
- `go test ./pkg/providers/... -v -count=1` exits 0 with 5 tests PASS
- `go build ./...` still exits 0
- `grep -L 'format_version' providers/openai.yaml providers/anthropic.yaml providers/huggingface.yaml` prints nothing (every file contains format_version)
- `grep -q 'go:embed' pkg/providers/loader.go` exits 0
- pkg/providers/definitions/ has 3 YAML files (same content as providers/)

Success criteria:
- 3 reference provider YAML files exist in providers/ and pkg/providers/definitions/ with format_version and last_verified
- Provider schema validates format_version >= 1 and non-empty last_verified (PROV-10)
- Registry loads providers from embed.FS at compile time (CORE-02)
- Aho-Corasick automaton is built from all keywords at NewRegistry() (CORE-06)
- Registry exposes List(), Get(), Stats(), AC() (CORE-03)
- All 5 provider tests pass

After completion, create `.planning/phases/01-foundation/01-02-SUMMARY.md` following the summary template.