keyhunter/.planning/phases/08-dork-engine/08-07-SUMMARY.md
2026-04-06 00:25:55 +03:00

phase: 08-dork-engine
plan: 07
subsystem: dorks
tags: test, guardrail, ci, regression-prevention
one_liner: Guardrail test locks DORK-02 floor at >=150 embedded dorks with per-source minimums and ID uniqueness
requires:
  • pkg/dorks.NewRegistry
  • pkg/dorks.Registry.Stats
  • pkg/dorks.Registry.List
provides:
  • CI enforcement of DORK-02 150+ floor
  • per-source minimum enforcement
  • cross-source dork ID uniqueness guarantee
affects: pkg/dorks/
tech_stack (added / patterns):
  • table-driven per-source minimums
  • guardrail test against real embedded FS (no mocks)
key_files:
  created: pkg/dorks/count_test.go
  modified: (none)
decisions:
  • Test hits real embedded filesystem via NewRegistry() rather than a synthetic slice; a synthetic slice would not catch YAML regressions.
  • Per-source minimums are hardcoded to the planned distribution (50/30/20/15/10/10/10/5) so removing a file from any source fails CI even if total still clears 150.
  • Stats.BySource / Stats.ByCategory field names matched the plan exactly; no adjustments needed.
metrics:
  duration: ~3m
  completed: 2026-04-05
  tasks_total: 1
  tasks_completed: 1
  files_created: 1
  files_modified: 0

Phase 08 Plan 07: Dork Count Guardrail Test Summary

Guardrail test suite (pkg/dorks/count_test.go) that enforces the DORK-02 "150+ built-in dorks" requirement against the real embedded filesystem via NewRegistry(). Four tests catch total regressions, per-source drops, missing categories, and ID collisions: the four failure modes a future contributor could introduce without noticing.

What Was Built

Single test file with four TestDork* functions exercising the live embedded corpus:

  1. TestDorkCountGuardrail — asserts len(NewRegistry().List()) >= 150. Error message cites DORK-02 so future maintainers know the threshold is a requirement, not a suggestion.
  2. TestDorkCountPerSource — table-driven check against Stats().BySource. Minimums: github>=50, google>=30, shodan>=20, censys>=15, zoomeye>=10, fofa>=10, gitlab>=10, bing>=5.
  3. TestDorkCategoriesPresent — confirms all five DORK-01 categories (frontier, specialized, infrastructure, emerging, enterprise) appear at least once in Stats().ByCategory.
  4. TestDorkIDsUnique — walks Registry.List() building a seen-map; any duplicate ID across source files fails the test and reports both source files involved.
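
The first three checks can be sketched as a standalone program. This is an illustration, not the contents of count_test.go: the Stats type below is a trimmed stand-in that carries only the two map[string]int fields this summary mentions, and violations is a hypothetical helper that folds the total, per-source, and category checks into one function so the sketch runs without the keyhunter module.

```go
package main

import (
	"fmt"
	"sort"
)

// Stats is a stand-in with only the two fields named in this summary.
// The real pkg/dorks type may carry more (assumption).
type Stats struct {
	BySource   map[string]int
	ByCategory map[string]int
}

// sourceFloors is the planned per-source distribution (50/30/20/15/10/10/10/5).
var sourceFloors = map[string]int{
	"github": 50, "google": 30, "shodan": 20, "censys": 15,
	"zoomeye": 10, "fofa": 10, "gitlab": 10, "bing": 5,
}

// requiredCategories are the five DORK-01 categories.
var requiredCategories = []string{
	"frontier", "specialized", "infrastructure", "emerging", "enterprise",
}

// violations is a hypothetical helper: it returns one message per broken
// rule, in a stable order (total, then sources A-Z, then categories).
func violations(total int, s Stats) []string {
	var out []string
	if total < 150 {
		out = append(out, fmt.Sprintf("DORK-02: total %d < 150", total))
	}
	sources := make([]string, 0, len(sourceFloors))
	for src := range sourceFloors {
		sources = append(sources, src)
	}
	sort.Strings(sources) // map iteration order is random; sort for stable output
	for _, src := range sources {
		if got, floor := s.BySource[src], sourceFloors[src]; got < floor {
			out = append(out, fmt.Sprintf("source %s: %d < %d", src, got, floor))
		}
	}
	for _, cat := range requiredCategories {
		if s.ByCategory[cat] == 0 {
			out = append(out, fmt.Sprintf("category %s: missing", cat))
		}
	}
	return out
}

func main() {
	// Counts copied from the corpus tables in this summary.
	s := Stats{
		BySource: map[string]int{
			"github": 50, "google": 30, "shodan": 20, "censys": 15,
			"zoomeye": 10, "fofa": 10, "gitlab": 10, "bing": 5,
		},
		ByCategory: map[string]int{
			"infrastructure": 63, "frontier": 45, "specialized": 24,
			"emerging": 13, "enterprise": 5,
		},
	}
	fmt.Println(len(violations(150, s))) // prints 0

	// Simulate losing one bing dork: both the total and the bing floor trip.
	s.BySource["bing"] = 4
	for _, v := range violations(149, s) {
		fmt.Println(v)
	}
}
```

Keeping the floors in a literal table mirrors the hardcoded-minimums decision: raising a floor means editing the table, so the requirement and the check cannot drift apart silently.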

Verification Results

=== RUN   TestDorkCountGuardrail
--- PASS: TestDorkCountGuardrail (0.00s)
=== RUN   TestDorkCountPerSource
--- PASS: TestDorkCountPerSource (0.00s)
=== RUN   TestDorkCategoriesPresent
--- PASS: TestDorkCategoriesPresent (0.00s)
=== RUN   TestDorkIDsUnique
--- PASS: TestDorkIDsUnique (0.00s)
PASS
ok  github.com/salvacybersec/keyhunter/pkg/dorks  0.017s

Full go test ./pkg/dorks/... also passes (2.024s).

Current embedded corpus state (captured during verification):

Source    Count  Min  Status
github       50   50  at floor
google       30   30  at floor
shodan       20   20  at floor
censys       15   15  at floor
zoomeye      10   10  at floor
fofa         10   10  at floor
gitlab       10   10  at floor
bing          5    5  at floor
total       150  150  at floor

Category        Count
infrastructure     63
frontier           45
specialized        24
emerging           13
enterprise          5

Every source sits at its exact minimum, and so does the total, so the guardrail bites immediately, which is the whole point: removing a single dork from any source would flip at least one row red. Categories carry no per-category floor beyond appearing at least once.
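
As a quick sanity check on the captured numbers, the by-source and by-category breakdowns partition the same corpus, so both should sum to the same total. A minimal sketch using only the counts quoted above (sum is a hypothetical helper, not part of pkg/dorks):

```go
package main

import "fmt"

// sum adds up every count in a breakdown map.
func sum(m map[string]int) int {
	total := 0
	for _, n := range m {
		total += n
	}
	return total
}

func main() {
	bySource := map[string]int{
		"github": 50, "google": 30, "shodan": 20, "censys": 15,
		"zoomeye": 10, "fofa": 10, "gitlab": 10, "bing": 5,
	}
	byCategory := map[string]int{
		"infrastructure": 63, "frontier": 45, "specialized": 24,
		"emerging": 13, "enterprise": 5,
	}
	fmt.Println(sum(bySource), sum(byCategory)) // prints "150 150"
}
```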

Deviations from Plan

None - plan executed exactly as written. The Stats struct field names (BySource, ByCategory as map[string]int) matched the plan's notes, so no test adjustments were needed.

Key Decisions

  • Real FS over synthetic — Tests call NewRegistry() directly rather than building a NewRegistryFromDorks(slice) fixture. Synthetic fixtures would not catch the most likely regression (someone deleting a YAML file).
  • Hardcoded per-source minimums — The 50/30/20/15/10/10/10/5 distribution is written into the test, not derived. If a future plan wants to raise a floor, it must also update the test, which is the correct coupling.
  • Duplicate-ID test reports both sources — Error message includes both the first and second source of a collision so a reviewer can resolve the conflict without grep.
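
The duplicate-ID decision can be illustrated with a seen-map that remembers which source first claimed each ID. This is a sketch under assumptions: the Dork struct and its ID and Source fields are hypothetical stand-ins (the real pkg/dorks type is not shown in this summary), duplicateIDs is an illustrative helper, and the sample IDs are invented.

```go
package main

import "fmt"

// Dork is a hypothetical, trimmed-down record used only for this sketch.
type Dork struct {
	ID     string
	Source string
}

// duplicateIDs walks the slice once. The map stores the source that first
// used each ID, so a collision message can name both sides.
func duplicateIDs(dorks []Dork) []string {
	seen := make(map[string]string, len(dorks)) // ID -> first source
	var out []string
	for _, d := range dorks {
		if first, ok := seen[d.ID]; ok {
			out = append(out, fmt.Sprintf(
				"duplicate dork ID %q: first in %s, again in %s",
				d.ID, first, d.Source))
			continue
		}
		seen[d.ID] = d.Source
	}
	return out
}

func main() {
	dorks := []Dork{
		{ID: "example-001", Source: "github"},
		{ID: "example-002", Source: "google"},
		{ID: "example-001", Source: "gitlab"}, // collides with the github entry
	}
	for _, msg := range duplicateIDs(dorks) {
		fmt.Println(msg)
	}
	// prints: duplicate dork ID "example-001": first in github, again in gitlab
}
```

Storing the first source as the map value, rather than a bare bool, is what lets the message name both sides of the collision.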

Files Created

  • pkg/dorks/count_test.go (78 lines) — four guardrail tests

Commits

  • 2c554b9 test(08-07): add dork count + uniqueness guardrail

Self-Check: PASSED

  • pkg/dorks/count_test.go: FOUND
  • commit 2c554b9: FOUND
  • all four guardrail tests: PASSED against real embedded FS
  • full go test ./pkg/dorks/... suite: PASSED