keyhunter/.planning/phases/08-dork-engine/08-07-SUMMARY.md

---
phase: 08-dork-engine
plan: 07
subsystem: dorks
tags: [test, guardrail, ci, regression-prevention]
one_liner: "Guardrail test locks DORK-02 floor at >=150 embedded dorks with per-source minimums and ID uniqueness"
requires:
  - pkg/dorks.NewRegistry
  - pkg/dorks.Registry.Stats
  - pkg/dorks.Registry.List
provides:
  - CI enforcement of DORK-02 150+ floor
  - per-source minimum enforcement
  - cross-source dork ID uniqueness guarantee
affects:
  - pkg/dorks/
tech_stack:
  added: []
  patterns:
    - table-driven per-source minimums
    - guardrail test against real embedded FS (no mocks)
key_files:
  created:
    - pkg/dorks/count_test.go
  modified: []
decisions:
  - Test hits real embedded filesystem via NewRegistry() rather than a synthetic slice — a synthetic slice would not catch YAML regressions.
  - Per-source minimums are hardcoded to the planned distribution (50/30/20/15/10/10/10/5) so removing a file from any source fails CI even if total still clears 150.
  - Stats.BySource / Stats.ByCategory field names matched the plan exactly — no adjustments needed.
metrics:
  duration: "~3m"
  completed: 2026-04-05
  tasks_total: 1
  tasks_completed: 1
  files_created: 1
  files_modified: 0
---

# Phase 08 Plan 07: Dork Count Guardrail Test Summary

Guardrail test suite (`pkg/dorks/count_test.go`) that enforces the DORK-02 "150+ built-in dorks" requirement against the real embedded filesystem via `NewRegistry()`. Four tests catch total regressions, per-source drops, missing categories, and ID collisions: the four failure modes a future contributor could introduce without noticing.

## What Was Built

Single test file with four `TestDork*` functions exercising the live embedded corpus:

1. **TestDorkCountGuardrail** — asserts `len(NewRegistry().List()) >= 150`. Error message cites DORK-02 so future maintainers know the threshold is a requirement, not a suggestion.
2. **TestDorkCountPerSource** — table-driven check against `Stats().BySource`. Minimums: github>=50, google>=30, shodan>=20, censys>=15, zoomeye>=10, fofa>=10, gitlab>=10, bing>=5.
3. **TestDorkCategoriesPresent** — confirms all five DORK-01 categories (frontier, specialized, infrastructure, emerging, enterprise) appear at least once in `Stats().ByCategory`.
4. **TestDorkIDsUnique** — walks `Registry.List()` building a seen-map; any duplicate ID across source files fails the test and reports both source files involved.

## Verification Results

```
=== RUN   TestDorkCountGuardrail
--- PASS: TestDorkCountGuardrail (0.00s)
=== RUN   TestDorkCountPerSource
--- PASS: TestDorkCountPerSource (0.00s)
=== RUN   TestDorkCategoriesPresent
--- PASS: TestDorkCategoriesPresent (0.00s)
=== RUN   TestDorkIDsUnique
--- PASS: TestDorkIDsUnique (0.00s)
PASS
ok  github.com/salvacybersec/keyhunter/pkg/dorks  0.017s
```

Full `go test ./pkg/dorks/...` also passes (2.024s).

Current embedded corpus state (captured during verification):

| Source  | Count | Min | Status |
| ------- | ----- | --- | ------ |
| github  | 50    | 50  | at floor |
| google  | 30    | 30  | at floor |
| shodan  | 20    | 20  | at floor |
| censys  | 15    | 15  | at floor |
| zoomeye | 10    | 10  | at floor |
| fofa    | 10    | 10  | at floor |
| gitlab  | 10    | 10  | at floor |
| bing    | 5     | 5   | at floor |
| **total** | **150** | **150** | at floor |

| Category       | Count |
| -------------- | ----- |
| infrastructure | 63    |
| frontier       | 45    |
| specialized    | 24    |
| emerging       | 13    |
| enterprise     | 5     |

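Every source sits at its exact minimum and the total sits at exactly 150, so the guardrail bites immediately, which is the whole point: removing any single dork fails both the per-source test and the total test. The category check is looser; it only requires each of the five categories to appear at least once.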

## Deviations from Plan

None - plan executed exactly as written. The Stats struct field names (`BySource`, `ByCategory` as `map[string]int`) matched the plan's notes, so no test adjustments were needed.

## Key Decisions

- **Real FS over synthetic** — Tests call `NewRegistry()` directly rather than building a `NewRegistryFromDorks(slice)` fixture. Synthetic fixtures would not catch the most likely regression (someone deleting a YAML file).
- **Hardcoded per-source minimums** — The 50/30/20/15/10/10/10/5 distribution is written into the test, not derived. If a future plan wants to change a floor, it must also update the test, which is the correct coupling.
- **Duplicate-ID test reports both sources** — Error message includes both the first and second source of a collision so a reviewer can resolve the conflict without grep.

## Files Created

- `pkg/dorks/count_test.go` (78 lines) — four guardrail tests

## Commits

- `2c554b9` test(08-07): add dork count + uniqueness guardrail

## Self-Check: PASSED

- pkg/dorks/count_test.go: FOUND
- commit 2c554b9: FOUND
- all four guardrail tests: PASSED against real embedded FS
- full `go test ./pkg/dorks/...` suite: PASSED