Guardrail test locks DORK-02 floor at >=150 embedded dorks with per-source minimums and ID uniqueness
pkg/dorks.NewRegistry
pkg/dorks.Registry.Stats
pkg/dorks.Registry.List
CI enforcement of DORK-02 150+ floor
per-source minimum enforcement
cross-source dork ID uniqueness guarantee
pkg/dorks/
added
patterns
table-driven per-source minimums
guardrail test against real embedded FS (no mocks)
created
modified
pkg/dorks/count_test.go
Test hits real embedded filesystem via NewRegistry() rather than a synthetic slice — a synthetic slice would not catch YAML regressions.
Per-source minimums are hardcoded to the planned distribution (50/30/20/15/10/10/10/5) so removing a file from any source fails CI even if total still clears 150.
Stats.BySource / Stats.ByCategory field names matched the plan exactly — no adjustments needed.
duration: ~3m
completed: 2026-04-05
tasks_total: 1
tasks_completed: 1
files_created: 1
files_modified: 0
Phase 08 Plan 07: Dork Count Guardrail Test Summary
Guardrail test suite (pkg/dorks/count_test.go) that enforces the DORK-02 "150+ built-in dorks" requirement against the real embedded filesystem via NewRegistry(). Four tests catch total regressions, per-source drops, missing categories, and ID collisions: the four failure modes a future contributor could introduce without noticing.
What Was Built
Single test file with four TestDork* functions exercising the live embedded corpus:
TestDorkCountGuardrail: asserts len(NewRegistry().List()) >= 150. The error message cites DORK-02 so future maintainers know the threshold is a requirement, not a suggestion.
TestDorkCountPerSource: table-driven check that every source in Stats().BySource meets its hardcoded minimum (github 50, google 30, shodan 20, censys 15, zoomeye 10, fofa 10, gitlab 10, bing 5).
TestDorkCategoriesPresent: confirms all five DORK-01 categories (frontier, specialized, infrastructure, emerging, enterprise) appear at least once in Stats().ByCategory.
TestDorkIDsUnique: walks Registry.List() building a seen-map; any duplicate ID across source files fails the test and reports both source files involved.
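The category-presence check described above can be sketched in isolation. This is a minimal illustration, not the contents of count_test.go: the helper name missingCategories and the plain map standing in for Stats().ByCategory are assumptions made so the example runs without the pkg/dorks package.

```go
package main

import (
	"fmt"
	"sort"
)

// requiredCategories lists the five DORK-01 categories named in this summary.
var requiredCategories = []string{"frontier", "specialized", "infrastructure", "emerging", "enterprise"}

// missingCategories (hypothetical helper) returns, in sorted order, every
// required category whose count in byCategory is zero or absent.
func missingCategories(byCategory map[string]int, required []string) []string {
	var missing []string
	for _, c := range required {
		if byCategory[c] < 1 {
			missing = append(missing, c)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Stand-in for Stats().ByCategory, using the counts captured in this summary.
	byCategory := map[string]int{
		"infrastructure": 63, "frontier": 45, "specialized": 24, "emerging": 13, "enterprise": 5,
	}
	fmt.Println(len(missingCategories(byCategory, requiredCategories))) // prints 0

	// Simulate a regression that removes every enterprise dork.
	delete(byCategory, "enterprise")
	fmt.Println(missingCategories(byCategory, requiredCategories)) // prints [enterprise]
}
```

In a real test the returned slice would presumably feed a t.Errorf call, so a single run reports every missing category at once.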
Verification Results
=== RUN TestDorkCountGuardrail
--- PASS: TestDorkCountGuardrail (0.00s)
=== RUN TestDorkCountPerSource
--- PASS: TestDorkCountPerSource (0.00s)
=== RUN TestDorkCategoriesPresent
--- PASS: TestDorkCategoriesPresent (0.00s)
=== RUN TestDorkIDsUnique
--- PASS: TestDorkIDsUnique (0.00s)
PASS
ok github.com/salvacybersec/keyhunter/pkg/dorks 0.017s
Full go test ./pkg/dorks/... also passes (2.024s).
Current embedded corpus state (captured during verification):
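| Source  | Count | Min | Status   |
|---------|-------|-----|----------|
| github  | 50    | 50  | at floor |
| google  | 30    | 30  | at floor |
| shodan  | 20    | 20  | at floor |
| censys  | 15    | 15  | at floor |
| zoomeye | 10    | 10  | at floor |
| fofa    | 10    | 10  | at floor |
| gitlab  | 10    | 10  | at floor |
| bing    | 5     | 5   | at floor |
| total   | 150   | 150 | at floor |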
| Category       | Count |
|----------------|-------|
| infrastructure | 63    |
| frontier       | 45    |
| specialized    | 24    |
| emerging       | 13    |
| enterprise     | 5     |
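Every source sits exactly at its minimum and the total sits exactly at 150, so the guardrail binds immediately, which is the whole point: removing a single dork from any source would fail both TestDorkCountPerSource and TestDorkCountGuardrail. Categories carry no per-category floor; each only needs to appear at least once.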
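To make the "at floor" state concrete, the sketch below replays the captured per-source counts against the hardcoded minimums and then removes one dork. It is an illustration under stated assumptions: floorViolations is a hypothetical helper, and a plain map stands in for Stats().BySource so the example runs without the pkg/dorks package.

```go
package main

import (
	"fmt"
	"sort"
)

// totalFloor is the DORK-02 requirement quoted in this summary.
const totalFloor = 150

// sourceMinimums mirrors the 50/30/20/15/10/10/10/5 distribution.
var sourceMinimums = map[string]int{
	"github": 50, "google": 30, "shodan": 20, "censys": 15,
	"zoomeye": 10, "fofa": 10, "gitlab": 10, "bing": 5,
}

// floorViolations (hypothetical helper) returns one message per broken floor,
// sorted so the output is stable despite Go's random map iteration order.
func floorViolations(bySource map[string]int) []string {
	var out []string
	total := 0
	for _, n := range bySource {
		total += n
	}
	for src, min := range sourceMinimums {
		if got := bySource[src]; got < min {
			out = append(out, fmt.Sprintf("source %s: got %d, want >= %d", src, got, min))
		}
	}
	if total < totalFloor {
		out = append(out, fmt.Sprintf("total: got %d, want >= %d (DORK-02)", total, totalFloor))
	}
	sort.Strings(out)
	return out
}

func main() {
	// Stand-in for Stats().BySource with the counts captured during verification.
	current := map[string]int{
		"github": 50, "google": 30, "shodan": 20, "censys": 15,
		"zoomeye": 10, "fofa": 10, "gitlab": 10, "bing": 5,
	}
	fmt.Println(len(floorViolations(current))) // prints 0

	current["bing"]-- // simulate deleting one bing dork
	for _, v := range floorViolations(current) {
		fmt.Println(v)
	}
	// prints:
	// source bing: got 4, want >= 5
	// total: got 149, want >= 150 (DORK-02)
}
```

Because every count equals its minimum, a single removal trips both the per-source floor and the total floor.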
Deviations from Plan
None - plan executed exactly as written. The Stats struct field names (BySource, ByCategory as map[string]int) matched the plan's notes, so no test adjustments were needed.
Key Decisions
Real FS over synthetic — Tests call NewRegistry() directly rather than building a NewRegistryFromDorks(slice) fixture. Synthetic fixtures would not catch the most likely regression (someone deleting a YAML file).
Hardcoded per-source minimums — The 50/30/20/15/10/10/10/5 distribution is written into the test, not derived. If a future plan wants to raise a floor, it must also update the test, which is the correct coupling.
Duplicate-ID test reports both sources — Error message includes both the first and second source of a collision so a reviewer can resolve the conflict without grep.
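The seen-map approach behind the duplicate-ID decision can be sketched as follows. The Dork struct here is a hypothetical stand-in with only ID and Source fields, and duplicateIDs and the sample IDs are illustrative names, not taken from pkg/dorks.

```go
package main

import "fmt"

// Dork is a hypothetical stand-in carrying only the fields this sketch needs.
type Dork struct {
	ID     string
	Source string
}

// duplicateIDs (hypothetical helper) walks dorks in order, remembers which
// source first claimed each ID, and returns a message naming both sources
// for every repeated ID.
func duplicateIDs(dorks []Dork) []string {
	seen := make(map[string]string, len(dorks)) // ID -> source that first used it
	var out []string
	for _, d := range dorks {
		if first, ok := seen[d.ID]; ok {
			out = append(out, fmt.Sprintf("duplicate dork id %q: first seen in %s, repeated in %s", d.ID, first, d.Source))
			continue
		}
		seen[d.ID] = d.Source
	}
	return out
}

func main() {
	dorks := []Dork{
		{ID: "example-key-a", Source: "github"},
		{ID: "example-key-b", Source: "google"},
		{ID: "example-key-a", Source: "shodan"}, // collides with the github entry
	}
	for _, msg := range duplicateIDs(dorks) {
		fmt.Println(msg)
	}
	// prints: duplicate dork id "example-key-a": first seen in github, repeated in shodan
}
```

Storing the first source as the map value, rather than a bare bool, is what lets the message name both sides of the collision.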
Files Created
pkg/dorks/count_test.go (78 lines) — four guardrail tests