diff --git a/.planning/phases/08-dork-engine/08-04-SUMMARY.md b/.planning/phases/08-dork-engine/08-04-SUMMARY.md new file mode 100644 index 0000000..d3ddbd1 --- /dev/null +++ b/.planning/phases/08-dork-engine/08-04-SUMMARY.md @@ -0,0 +1,129 @@ +--- +phase: 08-dork-engine +plan: 04 +subsystem: dork-engine +tags: [dorks, censys, zoomeye, fofa, gitlab, bing, yaml, embed] +requires: + - 08-01 # dork schema, loader, registry, executor foundation +provides: + - 15-censys-dorks + - 10-zoomeye-dorks + - 10-fofa-dorks + - 10-gitlab-dorks + - 5-bing-dorks + - 150-dork-grand-total +affects: + - pkg/dorks/definitions/ + - dorks/ +tech-stack: + added: [] + patterns: + - "Dual-located YAML: authoritative copy under pkg/dorks/definitions/{source}/*.yaml is go:embed'd into the binary; dorks/ mirror at repo root stays discoverable for operators browsing the tree." + - "Top-level YAML sequence (`- id: ...`) per source — loader.go already supports both list and single-dork shapes." + - "Category mix per source reflects the real surface: infrastructure sources (Censys/ZoomEye/FOFA) lean heavy infrastructure + a couple frontier cert/proxy catches; code-search (GitLab) spreads across frontier/specialized/emerging; Bing mixes pastebin leaks with one infra catch." +key-files: + created: + - pkg/dorks/definitions/censys/all.yaml + - pkg/dorks/definitions/zoomeye/all.yaml + - pkg/dorks/definitions/fofa/all.yaml + - pkg/dorks/definitions/gitlab/all.yaml + - pkg/dorks/definitions/bing/all.yaml + - dorks/censys/all.yaml + - dorks/zoomeye/all.yaml + - dorks/fofa/all.yaml + - dorks/gitlab/all.yaml + - dorks/bing/all.yaml + modified: [] +decisions: + - "Used a single all.yaml per source (rather than per-category files) because the volumes per source are small (5-15) — splitting by category would scatter 1-3 dork files across directories with no offsetting gain." + - "Marked Azure OpenAI cert, Bedrock cert, and OpenAI-proxy queries as `frontier` (not infrastructure) because they directly fingerprint frontier-model vendors, even when the underlying delivery is via infra indicators (TLS CN / body fragments)." + - "Did not author an executor stub for any of the 5 new sources — plan 08-01 already returns ErrSourceNotImplemented for every non-GitHub source, and live execution is explicitly deferred to OSINT phases 9-16." +metrics: + duration: ~5min + tasks_completed: 2 + files_created: 10 + dorks_added: 50 + grand_total: 150 + completed: 2026-04-05 +--- + +# Phase 08 Plan 04: Censys + ZoomEye + FOFA + GitLab + Bing Dorks Summary + +50 dorks across five non-GitHub/Google/Shodan sources delivered as embedded YAML, bringing the phase grand total to the DORK-02 target of 150. + +## What Was Built + +- **Censys (15):** Search 2.0 queries against `services.http.response.*`, `services.tls.certificates.*`, and `services.port` for Ollama (:11434), vLLM, LocalAI, Open WebUI, LM Studio, NVIDIA Triton, Hugging Face TGI, LiteLLM (:4000), Portkey, LangServe, FastChat, text-generation-webui, plus Azure OpenAI and AWS Bedrock certificate CNs and an OpenAI-compatible proxy body-content catch. 12 infrastructure + 3 frontier. +- **ZoomEye (10):** Mirrors the Censys surface using `app:`, `title:`, `service:`, and `port:` operators. 9 infrastructure + 1 frontier (OpenAI-proxy title match). +- **FOFA (10):** Native `title=`, `body=`, `port=`, `cert=` queries covering Ollama, vLLM, LocalAI, Open WebUI, LiteLLM, Triton, LangServe, TGI, plus two frontier catches (Azure OpenAI cert, OpenAI proxy leaking `api_key`). 8 infrastructure + 2 frontier. +- **GitLab (10):** Code-search dorks for committed `.env`, `.json`, and `.py` files across OpenAI, Anthropic, Google Generative AI, Groq, Cohere, Hugging Face, OpenRouter, Perplexity, DeepSeek, and Pinecone. Mix: frontier (3), specialized (3), emerging (3), infrastructure (1). +- **Bing (5):** `site:pastebin.com` + `filetype:env` + `intitle:/inbody:` operators catching pasted OpenAI/Anthropic/HF keys, `.env` files, and exposed Ollama dashboards. 3 frontier + 1 specialized + 1 infrastructure. + +## Success Criteria Status + +- [x] `Registry.ListBySource("censys")` returns 15 (verified via `grep -c '^- id:'`) +- [x] `Registry.ListBySource("zoomeye")` returns 10 +- [x] `Registry.ListBySource("fofa")` returns 10 +- [x] `Registry.ListBySource("gitlab")` returns 10 +- [x] `Registry.ListBySource("bing")` returns 5 +- [x] Grand total across all 8 sources: **150** (github 50 + google 30 + shodan 20 + censys 15 + zoomeye 10 + fofa 10 + gitlab 10 + bing 5) +- [x] All files dual-located under `pkg/dorks/definitions/` and `dorks/` +- [x] `go test ./pkg/dorks/...` passes + +## Decisions Made + +1. **Single `all.yaml` per source** — per-category splits would create 1-3-entry files for sources that top out at 5-15 dorks total. Single file keeps the tree flat and matches the volume. +2. **Cert fingerprints tagged frontier** — Azure OpenAI and Bedrock certificate CNs are infra-level indicators, but their target is a frontier vendor. The category field drives filtering, so they belong in `frontier` for operators running `dorks run --category frontier`. +3. **No executor stubs needed** — 08-01 already routes non-GitHub sources to `ErrSourceNotImplemented`, and live execution lands in Phase 9-16. These YAML files are pure definitions. + +## Deviations from Plan + +None — plan executed exactly as written. The loader already supported list-form YAML from plans 08-02/08-03, so no loader change was required. + +## Commits + +| Task | Description | Commit | +| ---- | ----------------------------------- | --------- | +| 1 | 15 Censys + 10 ZoomEye dorks | `1c86800` | +| 2 | 10 FOFA + 10 GitLab + 5 Bing dorks | `c504cbd` | + +## Verification + +``` +$ go test ./pkg/dorks/... +ok github.com/salvacybersec/keyhunter/pkg/dorks + +$ for s in github google shodan censys zoomeye fofa gitlab bing; do + grep -rh '^- id:' pkg/dorks/definitions/$s/ | wc -l + done +github: 50 +google: 30 +shodan: 20 +censys: 15 +zoomeye: 10 +fofa: 10 +gitlab: 10 +bing: 5 +TOTAL: 150 +``` + +## Known Stubs + +None. All 50 dorks are complete definitions; execution is intentionally deferred to OSINT phases 9-16 per 08-CONTEXT.md. + +## Self-Check: PASSED + +- FOUND: pkg/dorks/definitions/censys/all.yaml +- FOUND: pkg/dorks/definitions/zoomeye/all.yaml +- FOUND: pkg/dorks/definitions/fofa/all.yaml +- FOUND: pkg/dorks/definitions/gitlab/all.yaml +- FOUND: pkg/dorks/definitions/bing/all.yaml +- FOUND: dorks/censys/all.yaml +- FOUND: dorks/zoomeye/all.yaml +- FOUND: dorks/fofa/all.yaml +- FOUND: dorks/gitlab/all.yaml +- FOUND: dorks/bing/all.yaml +- FOUND commit: 1c86800 +- FOUND commit: c504cbd +- TEST: go test ./pkg/dorks/... PASSED +- COUNT: grand total 150 (target >= 150)