keyhunter/.planning/phases/01-foundation/01-02-SUMMARY.md
salvacybersec 62fdb14162 docs(01-02): complete provider registry plan
- SUMMARY.md: schema validation + embed loader + Aho-Corasick registry
- STATE.md: updated progress (20%), decisions, metrics
- ROADMAP.md: phase 01 in-progress (1/5 summaries)
- REQUIREMENTS.md: marked CORE-02, CORE-03, CORE-06, PROV-10 complete
2026-04-05 00:13:03 +03:00


phase: 01-foundation
plan: 02
subsystem: providers
tags: yaml, embed, aho-corasick, registry, go-embed, gopkg.in/yaml.v3
requires:
  • phase 01-01, provides: go.mod with all Phase 1 dependencies, test scaffolding, cmd/root.go stub
provides:
  • Provider, Pattern, VerifySpec, RegistryStats Go structs with YAML validation
  • Registry with List(), Get(), Stats(), AC() methods
  • Aho-Corasick automaton built from all provider keywords at NewRegistry()
  • Three reference provider YAML definitions (openai, anthropic, huggingface)
  • Compile-time embed of provider YAML via pkg/providers/definitions/
affects:
  • scan-engine
  • cli-providers-command
  • verification-engine
  • storage-layer
tech-stack (added):
  • gopkg.in/yaml.v3 (UnmarshalYAML custom validation)
  • github.com/petar-dambovaliev/aho-corasick (keyword pre-filter automaton)
  • embed (stdlib) for compile-time YAML embedding
tech-stack (patterns):
  • Provider YAML at providers/ (user-visible) + pkg/providers/definitions/ (embed location)
  • Type alias pattern for custom UnmarshalYAML without infinite recursion
  • Registry injected via constructor (NewRegistry), not global singleton
key-files (created/modified):
  • pkg/providers/schema.go
  • pkg/providers/loader.go
  • pkg/providers/registry.go
  • pkg/providers/registry_test.go
  • pkg/providers/definitions/openai.yaml
  • pkg/providers/definitions/anthropic.yaml
  • pkg/providers/definitions/huggingface.yaml
  • providers/openai.yaml
  • providers/anthropic.yaml
  • providers/huggingface.yaml
key-decisions:
  • Provider YAML kept in dual locations: providers/ (user-visible) and pkg/providers/definitions/ (embedded). Go embed cannot use '..' paths, so the definitions/ subdirectory is the canonical embed source.
  • UnmarshalYAML validates format_version >= 1 and non-empty last_verified at parse time, not at registry use time, so malformed definitions fail fast.
  • Aho-Corasick automaton built with DFA=true for deterministic performance; trades memory for guaranteed O(n) matching.
  • Registry is safe for concurrent reads; no mutex is needed because the providers slice is written once at NewRegistry and never mutated.
patterns-established:
  • Pattern 1: Type alias in UnmarshalYAML to avoid infinite recursion: `type ProviderAlias Provider`
  • Pattern 2: embed path convention: YAML at pkg/providers/definitions/, user docs at providers/
  • Pattern 3: Registry constructor NewRegistry() loads+validates+indexes+builds AC in one call
requirements-completed: CORE-02, CORE-03, CORE-06, PROV-10
duration: 9min
completed: 2026-04-04

Phase 01 Plan 02: Provider Registry Summary

YAML schema structs with UnmarshalYAML validation, embed.FS loader, and Aho-Corasick registry serving List/Get/Stats/AC to all downstream subsystems

Performance

  • Duration: ~9 min
  • Started: 2026-04-04T21:02:31Z
  • Completed: 2026-04-04T21:11:41Z
  • Tasks: 2 (both TDD)
  • Files modified: 10 created, 1 updated (registry_test.go)

Accomplishments

  • Provider YAML schema with parse-time validation in UnmarshalYAML (format_version >= 1, last_verified required, confidence enum)
  • Registry loads 3 providers from embedded YAML at startup, builds Aho-Corasick automaton over all keywords
  • Three reference provider YAML definitions with full verify specs (OpenAI, Anthropic, HuggingFace)
  • All 5 provider tests pass: TestRegistryLoad, TestRegistryGet, TestRegistryStats, TestAhoCorasickBuild, TestProviderSchemaValidation
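
The parse-time validation and the type alias pattern can be sketched as follows. This is a minimal illustration, not the contents of schema.go: it swaps in encoding/json and UnmarshalJSON for gopkg.in/yaml.v3 and UnmarshalYAML so that it runs with the standard library alone, and the struct fields, the providerAlias name, and the parseProvider helper are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Provider is a trimmed-down, hypothetical version of the schema struct.
type Provider struct {
	Name          string `json:"name"`
	FormatVersion int    `json:"format_version"`
	LastVerified  string `json:"last_verified"`
}

// UnmarshalJSON decodes into a second type and then validates the result.
// providerAlias is a new defined type (not a "=" alias): it has Provider's
// fields but none of its methods, so the inner json.Unmarshal call does not
// re-enter this method and recurse forever.
func (p *Provider) UnmarshalJSON(data []byte) error {
	type providerAlias Provider
	var raw providerAlias
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	if raw.FormatVersion < 1 {
		return errors.New("format_version must be >= 1")
	}
	if raw.LastVerified == "" {
		return errors.New("last_verified is required")
	}
	*p = Provider(raw)
	return nil
}

// parseProvider is a hypothetical helper that decodes one definition.
func parseProvider(data []byte) (Provider, error) {
	var p Provider
	err := json.Unmarshal(data, &p)
	return p, err
}

func main() {
	_, err := parseProvider([]byte(`{"name":"openai","format_version":0,"last_verified":"2026-04-04"}`))
	fmt.Println("invalid definition rejected:", err != nil)

	p, err := parseProvider([]byte(`{"name":"openai","format_version":1,"last_verified":"2026-04-04"}`))
	fmt.Println(p.Name, err)
}
```

Because the checks run inside the decoder hook, a malformed definition fails when it is loaded rather than when the registry is first queried.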

Task Commits

Each task was committed atomically:

  1. TDD RED - Failing tests for schema and registry - ebaf7d7 (test)
  2. Task 1: Provider schema structs and reference YAMLs - 4fcdc42 (feat)
  3. Task 2: Embed loader, registry with AC, filled test stubs - a9859b3 (feat)

Note: Bootstrap (go.mod, main.go, test stubs) was included in the RED commit because Plan 01-01 runs in parallel in a different worktree, so its output was not available in this one.

Files Created/Modified

  • pkg/providers/schema.go - Provider, Pattern, VerifySpec, RegistryStats structs with UnmarshalYAML validation
  • pkg/providers/loader.go - embed.FS declaration with //go:embed definitions/*.yaml and fs.WalkDir loader
  • pkg/providers/registry.go - Registry struct with List(), Get(), Stats(), AC() methods and NewRegistry() constructor
  • pkg/providers/registry_test.go - Full test implementation (replaced stub from Plan 01-01)
  • pkg/providers/definitions/openai.yaml - Embedded OpenAI provider definition
  • pkg/providers/definitions/anthropic.yaml - Embedded Anthropic provider definition
  • pkg/providers/definitions/huggingface.yaml - Embedded HuggingFace provider definition
  • providers/openai.yaml - User-visible OpenAI reference definition
  • providers/anthropic.yaml - User-visible Anthropic reference definition
  • providers/huggingface.yaml - User-visible HuggingFace reference definition
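
The loader in loader.go is described as an embed.FS walked with fs.WalkDir. A rough sketch of that shape is shown below, under stated assumptions: the function name loadDefinitions and the sampleFS helper are hypothetical, and testing/fstest.MapFS stands in for the embedded file system so the example compiles without YAML files on disk. In the real package the file system would come from the go:embed directive over definitions/*.yaml.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"sort"
	"testing/fstest"
)

// loadDefinitions walks root inside fsys and returns the raw bytes of every
// .yaml file, keyed by its slash-separated path. It accepts any fs.FS, so an
// embed.FS can be passed in unchanged.
func loadDefinitions(fsys fs.FS, root string) (map[string][]byte, error) {
	out := make(map[string][]byte)
	err := fs.WalkDir(fsys, root, func(name string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // propagate walk errors instead of skipping silently
		}
		if d.IsDir() || path.Ext(name) != ".yaml" {
			return nil
		}
		data, err := fs.ReadFile(fsys, name)
		if err != nil {
			return err
		}
		out[name] = data
		return nil
	})
	return out, err
}

// sampleFS is an in-memory stand-in for the embedded definitions directory.
func sampleFS() fs.FS {
	return fstest.MapFS{
		"definitions/openai.yaml":    {Data: []byte("name: openai\n")},
		"definitions/anthropic.yaml": {Data: []byte("name: anthropic\n")},
		"definitions/README.md":      {Data: []byte("not a definition\n")},
	}
}

func main() {
	defs, err := loadDefinitions(sampleFS(), "definitions")
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(defs))
	for name := range defs {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println(names)
}
```

Taking fs.FS rather than embed.FS as the parameter type is a design choice of this sketch; it keeps the loader testable with in-memory fixtures.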

Decisions Made

  • Dual YAML location: providers/ for user reference, pkg/providers/definitions/ for embed — Go's embed package cannot traverse .. paths, so definitions/ inside the package is the only valid embed location.
  • DFA mode for Aho-Corasick: Opts{DFA: true} chosen for guaranteed O(n) matching at the cost of more memory and a higher upfront build time; appropriate for a scanner tool that pays the build cost once and scans many files.
  • Constructor injection over globals: NewRegistry() returns a value; callers inject it. No package-level var Registry global — avoids init order issues and enables testing.
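
The constructor-injection decision, together with the write-once property that makes concurrent reads safe, might look roughly like this. It is a sketch under assumptions: here NewRegistry takes the definitions as an argument so the example is self-contained (the summary describes it as loading the embedded YAML itself), the Get and List signatures are guesses, and countKnown is a hypothetical consumer.

```go
package main

import (
	"fmt"
	"sync"
)

// Provider is a trimmed-down, hypothetical definition.
type Provider struct {
	Name     string
	Keywords []string
}

// Registry is filled once by NewRegistry and never mutated afterwards,
// which is why concurrent readers need no mutex.
type Registry struct {
	providers []Provider
	index     map[string]int // provider name -> position in providers
}

// NewRegistry validates and indexes the definitions in one call.
func NewRegistry(defs []Provider) (*Registry, error) {
	r := &Registry{index: make(map[string]int, len(defs))}
	for _, p := range defs {
		if p.Name == "" {
			return nil, fmt.Errorf("provider with empty name")
		}
		if _, dup := r.index[p.Name]; dup {
			return nil, fmt.Errorf("duplicate provider %q", p.Name)
		}
		r.index[p.Name] = len(r.providers)
		r.providers = append(r.providers, p)
	}
	return r, nil
}

// List returns a copy so callers cannot modify the registry's slice.
func (r *Registry) List() []Provider {
	return append([]Provider(nil), r.providers...)
}

// Get looks a provider up by name.
func (r *Registry) Get(name string) (Provider, bool) {
	i, ok := r.index[name]
	if !ok {
		return Provider{}, false
	}
	return r.providers[i], true
}

// countKnown receives the registry as a parameter instead of reading a
// package-level global, and queries it from several goroutines at once.
func countKnown(r *Registry, names []string) int {
	found := make([]bool, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			_, found[i] = r.Get(name) // read-only access, no lock needed
		}(i, name)
	}
	wg.Wait()
	n := 0
	for _, ok := range found {
		if ok {
			n++
		}
	}
	return n
}

func main() {
	r, err := NewRegistry([]Provider{{Name: "openai"}, {Name: "anthropic"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(countKnown(r, []string{"openai", "missing", "anthropic"}))
}
```

Tests can build a registry from a small fixture slice and pass it to the code under test, which is the testability benefit the decision refers to.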

Deviations from Plan

Auto-fixed Issues

1. [Rule 3 - Blocking] Bootstrapped Plan 01-01 prerequisites in this worktree

  • Found during: Pre-task setup
  • Issue: Plan 01-02 depends on Plan 01-01 (go.mod, main.go, test stubs) which runs in parallel in a different worktree. This worktree had no go.mod.
  • Fix: Executed Plan 01-01 bootstrap (go mod init, go get all 10 deps, main.go, cmd/root.go, testdata fixtures, test stub files) before starting Plan 01-02 tasks.
  • Files modified: go.mod, go.sum, main.go, cmd/root.go, testdata/samples/*.txt, pkg/*/stub_test.go files
  • Verification: go build ./... succeeded before Plan 01-02 task execution
  • Committed in: ebaf7d7 (RED phase commit includes bootstrap)

2. [Rule 3 - Blocking] go mod tidy required after adding production packages

  • Found during: Task 2 GREEN phase
  • Issue: go test failed with "no required module provides package github.com/petar-dambovaliev/aho-corasick" even though it was in go.mod — tidy hadn't propagated it for non-test code.
  • Fix: Ran go mod tidy which resolved the module graph.
  • Files modified: go.mod, go.sum
  • Verification: go test ./pkg/providers/... passed after tidy

Total deviations: 2 auto-fixed (2 blocking/infrastructure)

Impact on plan: Both deviations were infrastructure setup, not scope changes. Plan objectives met exactly.

Issues Encountered

  • Go embed .. path restriction required dual YAML directory strategy (documented in plan's context, confirmed during implementation)
  • aho-corasick package name is aho_corasick (underscore) not ahocorasick — used import alias ahocorasick for cleaner code

User Setup Required

None - no external service configuration required.

Next Phase Readiness

  • Registry interface is stable: NewRegistry(), List(), Get(), Stats(), AC() — downstream plans can depend on these signatures
  • Phase 03 (Storage Layer) can proceed immediately — no registry dependency
  • Phase 04 (Scan Engine) can now wire AC() for keyword pre-filtering
  • Phase 05 (CLI) can call Registry.List() to implement the keyhunter providers list command
  • Known: only 3 reference providers embedded; Phase 02-03 will add all 108
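
The keyword pre-filtering that the scan engine is expected to wire to AC() can be illustrated with a simplified sketch. Everything here is an assumption made for the example: the keywords and regular expressions are invented rather than taken from the reference YAML files, and a naive strings.Contains loop stands in for the Aho-Corasick automaton, which would find all keywords in a single pass over the input.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Provider is a hypothetical, trimmed-down definition with one pattern.
type Provider struct {
	Name     string
	Keywords []string
	Pattern  *regexp.Regexp
}

// candidates returns the providers whose keywords occur in text.
// This nested loop is a naive stand-in for the automaton lookup.
func candidates(providers []Provider, text string) []Provider {
	var out []Provider
	for _, p := range providers {
		for _, kw := range p.Keywords {
			if strings.Contains(text, kw) {
				out = append(out, p)
				break
			}
		}
	}
	return out
}

// scan runs the comparatively expensive regex only for providers that
// passed the keyword pre-filter.
func scan(providers []Provider, text string) map[string][]string {
	hits := make(map[string][]string)
	for _, p := range candidates(providers, text) {
		if m := p.Pattern.FindAllString(text, -1); len(m) > 0 {
			hits[p.Name] = m
		}
	}
	return hits
}

// sampleProviders returns illustrative definitions, not the real ones.
func sampleProviders() []Provider {
	return []Provider{
		{Name: "openai", Keywords: []string{"sk-proj-"}, Pattern: regexp.MustCompile(`sk-proj-[A-Za-z0-9]{8,}`)},
		{Name: "huggingface", Keywords: []string{"hf_"}, Pattern: regexp.MustCompile(`hf_[A-Za-z0-9]{8,}`)},
	}
}

func main() {
	hits := scan(sampleProviders(), "token = hf_EXAMPLEONLY1234")
	fmt.Println(hits)
}
```

With three providers the filter saves little, but once the provider count grows toward the planned 108, skipping the regexes for providers whose keywords never appear is where the pre-filter would be expected to pay off.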

Phase: 01-foundation
Completed: 2026-04-04