diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index 75b383e..4417a78 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -183,10 +183,10 @@ Plans:
 - [x] 08-01-PLAN.md — Dork schema, go:embed loader, registry, executor interface, custom_dorks storage table
 - [x] 08-02-PLAN.md — 50 GitHub dork YAML definitions across 5 categories
 - [x] 08-03-PLAN.md — 30 Google + 20 Shodan dork YAML definitions
-- [ ] 08-04-PLAN.md — 15 Censys + 10 ZoomEye + 10 FOFA + 10 GitLab + 5 Bing dork YAML definitions
-- [ ] 08-05-PLAN.md — Live GitHub Code Search executor (net/http, Retry-After, limit cap)
+- [x] 08-04-PLAN.md — 15 Censys + 10 ZoomEye + 10 FOFA + 10 GitLab + 5 Bing dork YAML definitions
+- [x] 08-05-PLAN.md — Live GitHub Code Search executor (net/http, Retry-After, limit cap)
 - [ ] 08-06-PLAN.md — cmd/dorks.go Cobra tree: list/run/add/export/info/delete
-- [ ] 08-07-PLAN.md — Dork count guardrail test (>=150 total, per-source minimums, ID uniqueness)
+- [x] 08-07-PLAN.md — Dork count guardrail test (>=150 total, per-source minimums, ID uniqueness)
 
 ### Phase 9: OSINT Infrastructure
 **Goal**: The recon engine's `ReconSource` interface, per-source rate limiter architecture, stealth mode, and parallel sweep orchestrator exist and are validated — all individual source modules build on this foundation
diff --git a/.planning/STATE.md b/.planning/STATE.md
index 870bd40..c1406e2 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -3,14 +3,14 @@ gsd_state_version: 1.0
 milestone: v1.0
 milestone_name: milestone
 status: executing
-stopped_at: Completed 08-dork-engine-03-PLAN.md
-last_updated: "2026-04-05T21:22:30.579Z"
+stopped_at: Completed 08-07-PLAN.md
+last_updated: "2026-04-05T21:25:47.473Z"
 last_activity: 2026-04-05
 progress:
   total_phases: 18
   completed_phases: 7
   total_plans: 47
-  completed_plans: 43
+  completed_plans: 46
   percent: 20
 ---
 
@@ -26,7 +26,7 @@ See: .planning/PROJECT.md (updated 2026-04-04)
 ## Current Position
 
 Phase: 08 (dork-engine) — EXECUTING
-Plan: 3 of 7
+Plan: 4 of 7
 Status: Ready to execute
 Last activity: 2026-04-05
 
@@ -81,6 +81,7 @@ Progress: [██░░░░░░░░] 20%
 | Phase 08-dork-engine P01 | 15min | 2 tasks | 10 files |
 | Phase 08-dork-engine P02 | 12min | 2 tasks | 11 files |
 | Phase 08-dork-engine P03 | 10m | 2 tasks | 10 files |
+| Phase 08-dork-engine P07 | 3m | 1 tasks | 1 files |
 
 ## Accumulated Context
 
@@ -128,6 +129,6 @@ None yet.
 
 ## Session Continuity
 
-Last session: 2026-04-05T21:22:30.575Z
-Stopped at: Completed 08-dork-engine-03-PLAN.md
+Last session: 2026-04-05T21:25:47.469Z
+Stopped at: Completed 08-07-PLAN.md
 Resume file: None
diff --git a/.planning/phases/08-dork-engine/08-07-SUMMARY.md b/.planning/phases/08-dork-engine/08-07-SUMMARY.md
new file mode 100644
index 0000000..857a027
--- /dev/null
+++ b/.planning/phases/08-dork-engine/08-07-SUMMARY.md
@@ -0,0 +1,116 @@
+---
+phase: 08-dork-engine
+plan: 07
+subsystem: dorks
+tags: [test, guardrail, ci, regression-prevention]
+one_liner: "Guardrail test locks DORK-02 floor at >=150 embedded dorks with per-source minimums and ID uniqueness"
+requires:
+  - pkg/dorks.NewRegistry
+  - pkg/dorks.Registry.Stats
+  - pkg/dorks.Registry.List
+provides:
+  - CI enforcement of DORK-02 150+ floor
+  - per-source minimum enforcement
+  - cross-source dork ID uniqueness guarantee
+affects:
+  - pkg/dorks/
+tech_stack:
+  added: []
+  patterns:
+    - table-driven per-source minimums
+    - guardrail test against real embedded FS (no mocks)
+key_files:
+  created:
+    - pkg/dorks/count_test.go
+  modified: []
+decisions:
+  - Test hits real embedded filesystem via NewRegistry() rather than a synthetic slice — a synthetic slice would not catch YAML regressions.
+  - Per-source minimums are hardcoded to the planned distribution (50/30/20/15/10/10/10/5) so removing a file from any source fails CI even if total still clears 150.
+  - Stats.BySource / Stats.ByCategory field names matched the plan exactly — no adjustments needed.
+metrics:
+  duration: "~3m"
+  completed: 2026-04-05
+  tasks_total: 1
+  tasks_completed: 1
+  files_created: 1
+  files_modified: 0
+---
+
+# Phase 08 Plan 07: Dork Count Guardrail Test Summary
+
+Guardrail test suite (`pkg/dorks/count_test.go`) that enforces the DORK-02 "150+ built-in dorks" requirement against the real embedded filesystem via `NewRegistry()`. Four tests catch total regressions, per-source drops, missing categories, and ID collisions — the four failure modes a future contributor could introduce without noticing.
+
+## What Was Built
+
+Single test file with four `TestDork*` functions exercising the live embedded corpus:
+
+1. **TestDorkCountGuardrail** — asserts `len(NewRegistry().List()) >= 150`. Error message cites DORK-02 so future maintainers know the threshold is a requirement, not a suggestion.
+2. **TestDorkCountPerSource** — table-driven check against `Stats().BySource`. Minimums: github>=50, google>=30, shodan>=20, censys>=15, zoomeye>=10, fofa>=10, gitlab>=10, bing>=5.
+3. **TestDorkCategoriesPresent** — confirms all five DORK-01 categories (frontier, specialized, infrastructure, emerging, enterprise) appear at least once in `Stats().ByCategory`.
+4. **TestDorkIDsUnique** — walks `Registry.List()` building a seen-map; any duplicate ID across source files fails the test and reports both source files involved.
+
+## Verification Results
+
+```
+=== RUN TestDorkCountGuardrail
+--- PASS: TestDorkCountGuardrail (0.00s)
+=== RUN TestDorkCountPerSource
+--- PASS: TestDorkCountPerSource (0.00s)
+=== RUN TestDorkCategoriesPresent
+--- PASS: TestDorkCategoriesPresent (0.00s)
+=== RUN TestDorkIDsUnique
+--- PASS: TestDorkIDsUnique (0.00s)
+PASS
+ok github.com/salvacybersec/keyhunter/pkg/dorks 0.017s
+```
+
+Full `go test ./pkg/dorks/...` also passes (2.024s).
+
+Current embedded corpus state (captured during verification):
+
+| Source | Count | Min | Status |
+| ------- | ----- | --- | ------ |
+| github | 50 | 50 | at floor |
+| google | 30 | 30 | at floor |
+| shodan | 20 | 20 | at floor |
+| censys | 15 | 15 | at floor |
+| zoomeye | 10 | 10 | at floor |
+| fofa | 10 | 10 | at floor |
+| gitlab | 10 | 10 | at floor |
+| bing | 5 | 5 | at floor |
+| **total** | **150** | **150** | at floor |
+
+| Category | Count |
+| -------------- | ----- |
+| infrastructure | 63 |
+| frontier | 45 |
+| specialized | 24 |
+| emerging | 13 |
+| enterprise | 5 |
+
+Every source, and the total, sits exactly at its minimum; categories have no numeric floor, only a presence check, and all five are present. The guardrail is therefore biting immediately, which is the whole point: deleting any single dork drops its source and the total below the floor and flips at least one row red.
+
+## Deviations from Plan
+
+None - plan executed exactly as written. The Stats struct field names (`BySource`, `ByCategory` as `map[string]int`) matched the plan's notes, so no test adjustments were needed.
+
+## Key Decisions
+
+- **Real FS over synthetic** — Tests call `NewRegistry()` directly rather than building a `NewRegistryFromDorks(slice)` fixture. Synthetic fixtures would not catch the most likely regression (someone deleting a YAML file).
+- **Hardcoded per-source minimums** — The 50/30/20/15/10/10/10/5 distribution is written into the test, not derived. If a future plan wants to raise a floor, it must also update the test, which is the correct coupling.
+- **Duplicate-ID test reports both sources** — Error message includes both the first and second source of a collision so a reviewer can resolve the conflict without grep.
+
+## Files Created
+
+- `pkg/dorks/count_test.go` (78 lines) — four guardrail tests
+
+## Commits
+
+- `2c554b9` test(08-07): add dork count + uniqueness guardrail
+
+## Self-Check: PASSED
+
+- pkg/dorks/count_test.go: FOUND
+- commit 2c554b9: FOUND
+- all four guardrail tests: PASSED against real embedded FS
+- full `go test ./pkg/dorks/...` suite: PASSED
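The two table-driven checks described in the summary (per-source floors and the seen-map walk for duplicate IDs) can be illustrated without the real package. This is a minimal, self-contained sketch, not the contents of `pkg/dorks/count_test.go`: the `dork` struct and the `countBySource`, `belowFloor`, and `duplicateIDs` helpers are hypothetical names, and the sample IDs are made up. The real test reads counts from `NewRegistry().Stats().BySource` instead of tallying a slice.

```go
package main

import (
	"fmt"
	"sort"
)

// dork is a hypothetical, trimmed-down stand-in for the real pkg/dorks type;
// only the fields these checks need are modeled.
type dork struct {
	ID     string
	Source string
}

// perSourceMin mirrors the 50/30/20/15/10/10/10/5 floors from the summary.
var perSourceMin = map[string]int{
	"github": 50, "google": 30, "shodan": 20, "censys": 15,
	"zoomeye": 10, "fofa": 10, "gitlab": 10, "bing": 5,
}

// countBySource tallies dorks per source, playing the role of Stats().BySource.
func countBySource(ds []dork) map[string]int {
	counts := make(map[string]int)
	for _, d := range ds {
		counts[d.Source]++
	}
	return counts
}

// belowFloor returns a sorted description of every source whose count is
// under its hardcoded minimum; an empty result means the guardrail passes.
func belowFloor(counts, floors map[string]int) []string {
	var failing []string
	for src, floor := range floors {
		if counts[src] < floor {
			failing = append(failing, fmt.Sprintf("%s: %d < %d", src, counts[src], floor))
		}
	}
	sort.Strings(failing)
	return failing
}

// duplicateIDs walks the list with a seen-map (ID -> first source) and
// reports each collision together with both sources involved.
func duplicateIDs(ds []dork) []string {
	seen := make(map[string]string)
	var dups []string
	for _, d := range ds {
		if first, ok := seen[d.ID]; ok {
			dups = append(dups, fmt.Sprintf("%s: %s and %s", d.ID, first, d.Source))
			continue
		}
		seen[d.ID] = d.Source
	}
	return dups
}

func main() {
	sample := []dork{
		{ID: "gh-openai-key", Source: "github"},
		{ID: "bing-env-file", Source: "bing"},
		{ID: "gh-openai-key", Source: "gitlab"},
	}
	fmt.Println(duplicateIDs(sample))
	fmt.Println(belowFloor(countBySource(sample), perSourceMin))
}
```

Keeping the floors in a literal map, rather than deriving them from the corpus, reproduces the coupling the summary argues for: raising a floor requires touching the test.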