docs(08-07): complete dork guardrail test plan

2026-04-06 00:25:55 +03:00
parent 2c554b9c9c
commit f9e3ad99f8
3 changed files with 126 additions and 9 deletions
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -183,10 +183,10 @@ Plans:
 - [x] 08-01-PLAN.md — Dork schema, go:embed loader, registry, executor interface, custom_dorks storage table
 - [x] 08-02-PLAN.md — 50 GitHub dork YAML definitions across 5 categories
 - [x] 08-03-PLAN.md — 30 Google + 20 Shodan dork YAML definitions
- [ ] 08-04-PLAN.md — 15 Censys + 10 ZoomEye + 10 FOFA + 10 GitLab + 5 Bing dork YAML definitions
+- [x] 08-04-PLAN.md — 15 Censys + 10 ZoomEye + 10 FOFA + 10 GitLab + 5 Bing dork YAML definitions
- [ ] 08-05-PLAN.md — Live GitHub Code Search executor (net/http, Retry-After, limit cap)
+- [x] 08-05-PLAN.md — Live GitHub Code Search executor (net/http, Retry-After, limit cap)
 - [ ] 08-06-PLAN.md — cmd/dorks.go Cobra tree: list/run/add/export/info/delete
- [ ] 08-07-PLAN.md — Dork count guardrail test (>=150 total, per-source minimums, ID uniqueness)
+- [x] 08-07-PLAN.md — Dork count guardrail test (>=150 total, per-source minimums, ID uniqueness)
 ### Phase 9: OSINT Infrastructure
 **Goal**: The recon engine's `ReconSource` interface, per-source rate limiter architecture, stealth mode, and parallel sweep orchestrator exist and are validated — all individual source modules build on this foundation
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -3,14 +3,14 @@ gsd_state_version: 1.0
 milestone: v1.0
 milestone_name: milestone
 status: executing
-stopped_at: Completed 08-dork-engine-03-PLAN.md
+stopped_at: Completed 08-07-PLAN.md
-last_updated: "2026-04-05T21:22:30.579Z"
+last_updated: "2026-04-05T21:25:47.473Z"
 last_activity: 2026-04-05
 progress:
  total_phases: 18
  completed_phases: 7
  total_plans: 47
-  completed_plans: 43
+  completed_plans: 46
  percent: 20
 ---
@@ -26,7 +26,7 @@ See: .planning/PROJECT.md (updated 2026-04-04)
 ## Current Position
 Phase: 08 (dork-engine) — EXECUTING
-Plan: 3 of 7
+Plan: 4 of 7
 Status: Ready to execute
 Last activity: 2026-04-05
@@ -81,6 +81,7 @@ Progress: [██░░░░░░░░] 20%
 | Phase 08-dork-engine P01 | 15min | 2 tasks | 10 files |
 | Phase 08-dork-engine P02 | 12min | 2 tasks | 11 files |
 | Phase 08-dork-engine P03 | 10m | 2 tasks | 10 files |
 | Phase 08-dork-engine P07 | 3m | 1 tasks | 1 files |
 ## Accumulated Context
@@ -128,6 +129,6 @@ None yet.
 ## Session Continuity
-Last session: 2026-04-05T21:22:30.575Z
+Last session: 2026-04-05T21:25:47.469Z
-Stopped at: Completed 08-dork-engine-03-PLAN.md
+Stopped at: Completed 08-07-PLAN.md
 Resume file: None
--- a/.planning/phases/08-dork-engine/08-07-SUMMARY.md
+++ b/.planning/phases/08-dork-engine/08-07-SUMMARY.md
@@ -0,0 +1,116 @@
 ---
 phase: 08-dork-engine
 plan: 07
 subsystem: dorks
 tags: [test, guardrail, ci, regression-prevention]
 one_liner: "Guardrail test locks DORK-02 floor at >=150 embedded dorks with per-source minimums and ID uniqueness"
 requires:
  - pkg/dorks.NewRegistry
  - pkg/dorks.Registry.Stats
  - pkg/dorks.Registry.List
 provides:
  - CI enforcement of DORK-02 150+ floor
  - per-source minimum enforcement
  - cross-source dork ID uniqueness guarantee
 affects:
  - pkg/dorks/
 tech_stack:
  added: []
  patterns:
    - table-driven per-source minimums
    - guardrail test against real embedded FS (no mocks)
 key_files:
  created:
    - pkg/dorks/count_test.go
  modified: []
 decisions:
  - Test hits real embedded filesystem via NewRegistry() rather than a synthetic slice — a synthetic slice would not catch YAML regressions.
  - Per-source minimums are hardcoded to the planned distribution (50/30/20/15/10/10/10/5) so removing a file from any source fails CI even if total still clears 150.
  - Stats.BySource / Stats.ByCategory field names matched the plan exactly — no adjustments needed.
 metrics:
  duration: "~3m"
  completed: 2026-04-05
  tasks_total: 1
  tasks_completed: 1
  files_created: 1
  files_modified: 0
 ---
 # Phase 08 Plan 07: Dork Count Guardrail Test Summary
 Guardrail test suite (`pkg/dorks/count_test.go`) that enforces the DORK-02 "150+ built-in dorks" requirement against the real embedded filesystem via `NewRegistry()`. Four tests catch total regressions, per-source drops, missing categories, and ID collisions: the four failure modes a future contributor could introduce without noticing.
 ## What Was Built
 Single test file with four `TestDork*` functions exercising the live embedded corpus:
 1. **TestDorkCountGuardrail** — asserts `len(NewRegistry().List()) >= 150`. Error message cites DORK-02 so future maintainers know the threshold is a requirement, not a suggestion.
 2. **TestDorkCountPerSource** — table-driven check against `Stats().BySource`. Minimums: github>=50, google>=30, shodan>=20, censys>=15, zoomeye>=10, fofa>=10, gitlab>=10, bing>=5.
 3. **TestDorkCategoriesPresent** — confirms all five DORK-01 categories (frontier, specialized, infrastructure, emerging, enterprise) appear at least once in `Stats().ByCategory`.
 4. **TestDorkIDsUnique** — walks `Registry.List()` building a seen-map; any duplicate ID across source files fails the test and reports both source files involved.
 ## Verification Results
 ```
 === RUN   TestDorkCountGuardrail
 --- PASS: TestDorkCountGuardrail (0.00s)
 === RUN   TestDorkCountPerSource
 --- PASS: TestDorkCountPerSource (0.00s)
 === RUN   TestDorkCategoriesPresent
 --- PASS: TestDorkCategoriesPresent (0.00s)
 === RUN   TestDorkIDsUnique
 --- PASS: TestDorkIDsUnique (0.00s)
 PASS
 ok  github.com/salvacybersec/keyhunter/pkg/dorks  0.017s
 ```
 Full `go test ./pkg/dorks/...` also passes (2.024s).
 Current embedded corpus state (captured during verification):
 | Source  | Count | Min | Status |
 | ------- | ----- | --- | ------ |
 | github  | 50    | 50  | at floor |
 | google  | 30    | 30  | at floor |
 | shodan  | 20    | 20  | at floor |
 | censys  | 15    | 15  | at floor |
 | zoomeye | 10    | 10  | at floor |
 | fofa    | 10    | 10  | at floor |
 | gitlab  | 10    | 10  | at floor |
 | bing    | 5     | 5   | at floor |
 | **total** | **150** | **150** | at floor |
 | Category       | Count |
 | -------------- | ----- |
 | infrastructure | 63    |
 | frontier       | 45    |
 | specialized    | 24    |
 | emerging       | 13    |
 | enterprise     | 5     |
 Every source sits at its exact minimum, so the guardrail bites immediately: removing a single dork from any source fails both the per-source test and the total-count test. Categories are checked only for presence, so the category counts above are informational.
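The two tables above can be cross-checked with a few lines of arithmetic. The sketch below copies the counts from the tables and assumes that each dork belongs to exactly one source and one category, so both breakdowns should add up to the same total. The `sum` helper is illustrative only.

```go
package main

import "fmt"

// sum adds up the values of a count map.
func sum(counts map[string]int) int {
	total := 0
	for _, n := range counts {
		total += n
	}
	return total
}

func main() {
	// Counts copied from the per-source table.
	bySource := map[string]int{
		"github": 50, "google": 30, "shodan": 20, "censys": 15,
		"zoomeye": 10, "fofa": 10, "gitlab": 10, "bing": 5,
	}
	// Counts copied from the per-category table.
	byCategory := map[string]int{
		"infrastructure": 63, "frontier": 45, "specialized": 24,
		"emerging": 13, "enterprise": 5,
	}
	fmt.Println(sum(bySource), sum(byCategory)) // prints "150 150"
}
```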
 ## Deviations from Plan
 None - plan executed exactly as written. The Stats struct field names (`BySource`, `ByCategory` as `map[string]int`) matched the plan's notes, so no test adjustments were needed.
 ## Key Decisions
 - **Real FS over synthetic** — Tests call `NewRegistry()` directly rather than building a `NewRegistryFromDorks(slice)` fixture. Synthetic fixtures would not catch the most likely regression (someone deleting a YAML file).
 - **Hardcoded per-source minimums** — The 50/30/20/15/10/10/10/5 distribution is written into the test, not derived. If a future plan wants to raise a floor, it must also update the test, which is the correct coupling.
 - **Duplicate-ID test reports both sources** — Error message includes both the first and second source of a collision so a reviewer can resolve the conflict without grep.
 ## Files Created
 - `pkg/dorks/count_test.go` (78 lines) — four guardrail tests
 ## Commits
 - `2c554b9` test(08-07): add dork count + uniqueness guardrail
 ## Self-Check: PASSED
 - pkg/dorks/count_test.go: FOUND
 - commit 2c554b9: FOUND
 - all four guardrail tests: PASSED against real embedded FS
 - full `go test ./pkg/dorks/...` suite: PASSED