---
phase: 08-dork-engine
plan: 07
subsystem: dorks
tags: [test, guardrail, ci, regression-prevention]
one_liner: "Guardrail test locks DORK-02 floor at >=150 embedded dorks with per-source minimums and ID uniqueness"
requires:
  - pkg/dorks.NewRegistry
  - pkg/dorks.Registry.Stats
  - pkg/dorks.Registry.List
provides:
  - CI enforcement of DORK-02 150+ floor
  - per-source minimum enforcement
  - cross-source dork ID uniqueness guarantee
affects:
  - pkg/dorks/
tech_stack:
  added: []
  patterns:
    - table-driven per-source minimums
    - guardrail test against real embedded FS (no mocks)
key_files:
  created:
    - pkg/dorks/count_test.go
  modified: []
decisions:
  - Test hits real embedded filesystem via NewRegistry() rather than a synthetic slice, which would not catch YAML regressions.
  - Per-source minimums are hardcoded to the planned distribution (50/30/20/15/10/10/10/5) so removing a file from any source fails CI even if total still clears 150.
  - Stats.BySource / Stats.ByCategory field names matched the plan exactly; no adjustments needed.
metrics:
  duration: "~3m"
  completed: 2026-04-05
  tasks_total: 1
  tasks_completed: 1
  files_created: 1
  files_modified: 0
---

# Phase 08 Plan 07: Dork Count Guardrail Test Summary

Guardrail test suite (`pkg/dorks/count_test.go`) that enforces the DORK-02 "150+ built-in dorks" requirement against the real embedded filesystem via `NewRegistry()`. Four tests catch total-count regressions, per-source drops, missing categories, and ID collisions: the four failure modes a future contributor could introduce without noticing.

## What Was Built

Single test file with four `TestDork*` functions exercising the live embedded corpus:

1. **TestDorkCountGuardrail**: asserts `len(NewRegistry().List()) >= 150`. The error message cites DORK-02 so future maintainers know the threshold is a requirement, not a suggestion.
2. **TestDorkCountPerSource**: table-driven check against `Stats().BySource`.
   Minimums: github>=50, google>=30, shodan>=20, censys>=15, zoomeye>=10, fofa>=10, gitlab>=10, bing>=5.
3. **TestDorkCategoriesPresent**: confirms all five DORK-01 categories (frontier, specialized, infrastructure, emerging, enterprise) appear at least once in `Stats().ByCategory`.
4. **TestDorkIDsUnique**: walks `Registry.List()` building a seen-map; any duplicate ID across source files fails the test and reports both source files involved.

## Verification Results

```
=== RUN   TestDorkCountGuardrail
--- PASS: TestDorkCountGuardrail (0.00s)
=== RUN   TestDorkCountPerSource
--- PASS: TestDorkCountPerSource (0.00s)
=== RUN   TestDorkCategoriesPresent
--- PASS: TestDorkCategoriesPresent (0.00s)
=== RUN   TestDorkIDsUnique
--- PASS: TestDorkIDsUnique (0.00s)
PASS
ok      github.com/salvacybersec/keyhunter/pkg/dorks    0.017s
```

Full `go test ./pkg/dorks/...` also passes (2.024s).

Current embedded corpus state (captured during verification):

| Source    | Count   | Min     | Status   |
| --------- | ------- | ------- | -------- |
| github    | 50      | 50      | at floor |
| google    | 30      | 30      | at floor |
| shodan    | 20      | 20      | at floor |
| censys    | 15      | 15      | at floor |
| zoomeye   | 10      | 10      | at floor |
| fofa      | 10      | 10      | at floor |
| gitlab    | 10      | 10      | at floor |
| bing      | 5       | 5       | at floor |
| **total** | **150** | **150** | at floor |

| Category       | Count |
| -------------- | ----- |
| infrastructure | 63    |
| frontier       | 45    |
| specialized    | 24    |
| emerging       | 13    |
| enterprise     | 5     |

Every source, and the total, sits exactly at its minimum, so the guardrail has zero headroom: removing a single dork from any source fails `TestDorkCountPerSource`, and any net loss fails `TestDorkCountGuardrail`. The category check is looser by design; it only requires each category to appear at least once.

## Deviations from Plan

None. The plan was executed exactly as written. The Stats struct field names (`BySource`, `ByCategory` as `map[string]int`) matched the plan's notes, so no test adjustments were needed.

## Key Decisions

- **Real FS over synthetic**: Tests call `NewRegistry()` directly rather than building a `NewRegistryFromDorks(slice)` fixture.
  Synthetic fixtures would not catch the most likely regression (someone deleting a YAML file).
- **Hardcoded per-source minimums**: The 50/30/20/15/10/10/10/5 distribution is written into the test, not derived. If a future plan wants to raise a floor, it must also update the test, which is the correct coupling.
- **Duplicate-ID test reports both sources**: Error message includes both the first and second source of a collision so a reviewer can resolve the conflict without grep.

## Files Created

- `pkg/dorks/count_test.go` (78 lines): four guardrail tests

## Commits

- `2c554b9` test(08-07): add dork count + uniqueness guardrail

## Self-Check: PASSED

- pkg/dorks/count_test.go: FOUND
- commit 2c554b9: FOUND
- all four guardrail tests: PASSED against real embedded FS
- full `go test ./pkg/dorks/...` suite: PASSED