---
phase: 08-dork-engine
plan: 02
subsystem: dorks
tags: [dorks, github, yaml, go-embed, osint]

requires:
  - phase: 08-dork-engine
    provides: pkg/dorks foundation (schema, loader, registry) from plan 08-01
provides:
  - 50 production-ready GitHub code-search dorks covering all 5 categories
  - YAML list format support in the dork loader for multi-dork files
  - Dual-location mirror (dorks/github/ and pkg/dorks/definitions/github/)
affects: [08-03, 08-04, 08-05, 08-06, 08-07]

tech-stack:
  added: []
  patterns:
    - "Dork YAML files as top-level lists (multiple dorks per file) grouped by category"
    - "Dual-location mirror: user-visible dorks/ copy + go:embed pkg/dorks/definitions/ copy"

key-files:
  created:
    - pkg/dorks/definitions/github/frontier.yaml
    - pkg/dorks/definitions/github/specialized.yaml
    - pkg/dorks/definitions/github/infrastructure.yaml
    - pkg/dorks/definitions/github/emerging.yaml
    - pkg/dorks/definitions/github/enterprise.yaml
    - dorks/github/frontier.yaml
    - dorks/github/specialized.yaml
    - dorks/github/infrastructure.yaml
    - dorks/github/emerging.yaml
    - dorks/github/enterprise.yaml
  modified:
    - pkg/dorks/loader.go

key-decisions:
  - "Loader accepts both YAML list and single-dork mapping forms for backward compatibility with plan 08-01 tests"
  - "Category split into five YAML files (one per taxonomy bucket) rather than one monolithic file for easier diff/review"
  - "Dorks use literal GitHub Code Search queries with no templating; GitHub syntax goes straight to the API"

patterns-established:
  - "Multi-dork YAML: top-level list of Dork mappings per file, grouped by category"
  - "Dual-location mirror: identical content in dorks/{source}/ and pkg/dorks/definitions/{source}/"

requirements-completed: [DORK-01, DORK-02, DORK-04]

duration: 12min
completed: 2026-04-05
---

# Phase 8 Plan 02: 50 GitHub Dorks Summary

**50 production GitHub code-search dorks across 5 categories (frontier, specialized, infrastructure, emerging, enterprise) covering 40+ LLM/AI providers, embedded via go:embed
and mirrored into the user-visible dorks/ tree.**

## Performance

- **Duration:** ~12 min
- **Started:** 2026-04-05T21:09:00Z
- **Completed:** 2026-04-05T21:21:00Z
- **Tasks:** 2
- **Files modified:** 11 (10 created, 1 modified)

## Accomplishments

- 50 GitHub dorks loadable via `pkg/dorks.NewRegistry().ListBySource("github")`
- All 5 dork taxonomy categories populated (frontier 15, specialized 10, infrastructure 10, emerging 10, enterprise 5)
- Loader extended to parse YAML list form without breaking existing one-dork-per-file tests
- Dual-location mirror maintained per the Phase 8 architecture decision

## Task Commits

1. **Task 1: 25 GitHub dorks (frontier + specialized categories)** - `09722ea` (feat)
2. **Task 2: 25 GitHub dorks (infrastructure + emerging + enterprise)** - `9755b37` (feat)

## Files Created/Modified

- `pkg/dorks/definitions/github/frontier.yaml` - 15 Tier 1/2 dorks (OpenAI, Anthropic, Google AI, Azure OpenAI, AWS Bedrock, xAI, Cohere, Mistral, Groq, Together, Replicate)
- `pkg/dorks/definitions/github/specialized.yaml` - 10 Tier 3 dorks (Perplexity, Voyage, Jina, AssemblyAI, Deepgram, ElevenLabs, Stability, HuggingFace)
- `pkg/dorks/definitions/github/infrastructure.yaml` - 10 Tier 5 gateway + Tier 8 self-hosted dorks (OpenRouter, LiteLLM, Portkey, Helicone, Cloudflare AI, Vercel AI, Ollama, vLLM, LocalAI)
- `pkg/dorks/definitions/github/emerging.yaml` - 10 Tier 4 Chinese + Tier 6 vector DB dorks (DeepSeek, Moonshot, Qwen, Zhipu, MiniMax, Pinecone, Weaviate, Qdrant, Chroma, Writer)
- `pkg/dorks/definitions/github/enterprise.yaml` - 5 Tier 7/9 enterprise dorks (Codeium, Tabnine, Databricks, Snowflake Cortex, IBM watsonx)
- `dorks/github/*.yaml` - mirror of all five category files for user-visible inspection
- `pkg/dorks/loader.go` - parse YAML list first, fall back to single-dork mapping

## Decisions Made

- Accept both YAML shapes in the loader (list + single) so the existing `TestNewRegistry_EmptyDefinitionsTreeOK` test and any future one-off dork
files keep working.
- Split the 50 dorks into five category files rather than one `github.yaml`: easier to review, and aligned with the taxonomy categories enumerated in `schema.ValidCategories`.
- Use real provider prefixes verified against `pkg/providers/definitions/*.yaml` (sk-proj-, sk-ant-api03-, AIzaSy, gsk_, r8_, pplx-, hf_, sk-or-v1-, etc.) so future live execution in plan 08-05 returns genuine hits.

## Deviations from Plan

### Auto-fixed Issues

**1. [Rule 3 - Blocking] Removed empty source subdirectories that broke go:embed**

- **Found during:** Task 1 verification (`go test ./pkg/dorks/...`)
- **Issue:** Plan 08-01 left `pkg/dorks/definitions/{bing,fofa,gitlab,shodan}/` as empty directories. Once `pkg/dorks/definitions/github/` gained real YAML files, `//go:embed definitions/*` started recursing into siblings and errored with `cannot embed directory definitions/bing: contains no embeddable files`.
- **Fix:** Removed the four empty subdirs. They will be re-created by the Wave 2 plans that populate each source (shodan 08-03, etc.). The remaining non-empty siblings (`censys`, `google`, `zoomeye`) already contain YAML from parallel plans and embed fine.
- **Files modified:** deleted `pkg/dorks/definitions/{bing,fofa,gitlab,shodan}/`
- **Verification:** `go test ./pkg/dorks/...` passes, registry loads 126 dorks total (50 github + 76 from parallel sources) with 0 errors.
- **Committed in:** `09722ea` (folded into Task 1)

**2. [Rule 2 - Missing critical] Extended loader to accept YAML list form**

- **Found during:** Task 1 planning (reading `pkg/dorks/loader.go`)
- **Issue:** The 08-01 loader only accepted one-dork-per-file (`yaml.Unmarshal(data, &Dork{})`). The plan explicitly anticipated this and instructed me to adapt the loader to also accept top-level lists.
- **Fix:** Loader now tries `[]Dork` first; if that yields entries, each is validated and appended. Otherwise it falls back to single-dork parsing (preserving the empty-definitions-tree test from 08-01).
Empty YAML is tolerated.
- **Files modified:** `pkg/dorks/loader.go`
- **Verification:** `go test ./pkg/dorks/...` passes: both the empty-tree test and the new 50-dork load succeed.
- **Committed in:** `09722ea` (Task 1 commit)

---

**Total deviations:** 2 auto-fixed (1 blocking, 1 missing-critical; both anticipated by the plan text)
**Impact on plan:** None. Both changes were explicitly forecast in the plan. No scope creep.

## Issues Encountered

- go:embed does not embed empty directories, so the empty source subdirs left over from 08-01 broke compilation once the `github/` sibling had content. Resolved by deleting them; future source plans will recreate them with real content.

## User Setup Required

None. These are built-in dorks embedded into the binary.

## Next Phase Readiness

- Plan 08-03 and later source plans can follow the same multi-dork YAML list pattern established here.
- Plan 08-05 (GitHub live executor) now has 50 real queries to execute against the GitHub Code Search API.
- Registry statistics: `ListBySource("github")` returns 50; all 5 categories represented.
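
For later source plans that adopt the list pattern, the list-first, single-mapping-fallback parse described under deviation 2 can be sketched roughly as follows. This is an illustration, not the code in `pkg/dorks/loader.go`: it uses `encoding/json` as a stand-in for the YAML library so it runs with the standard library alone, and the `Dork` fields, the `validate` rule, the `parseDorks` name, and the sample queries are all hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// Dork is a hypothetical, trimmed-down stand-in for the real schema type.
type Dork struct {
	ID    string `json:"id"`
	Query string `json:"query"`
}

// validate is a placeholder rule; the real validation is not shown in this summary.
func (d Dork) validate() error {
	if d.ID == "" || d.Query == "" {
		return errors.New("dork needs an id and a query")
	}
	return nil
}

// parseDorks tries the list form first and falls back to a single mapping.
// Empty input yields no dorks and no error.
func parseDorks(data []byte) ([]Dork, error) {
	if len(bytes.TrimSpace(data)) == 0 {
		return nil, nil
	}

	// List form: a top-level array of dorks.
	var list []Dork
	if err := json.Unmarshal(data, &list); err == nil {
		for _, d := range list {
			if err := d.validate(); err != nil {
				return nil, fmt.Errorf("dork %q: %w", d.ID, err)
			}
		}
		return list, nil
	}

	// Fallback: a single dork mapping (the one-dork-per-file shape).
	var single Dork
	if err := json.Unmarshal(data, &single); err != nil {
		return nil, fmt.Errorf("neither a list nor a single dork: %w", err)
	}
	if err := single.validate(); err != nil {
		return nil, fmt.Errorf("dork %q: %w", single.ID, err)
	}
	return []Dork{single}, nil
}

func main() {
	// Sample inputs are made up for illustration only.
	inputs := [][]byte{
		[]byte(`[{"id":"a","query":"sk-proj-"},{"id":"b","query":"gsk_"}]`),
		[]byte(`{"id":"c","query":"hf_"}`),
		nil,
	}
	for _, in := range inputs {
		dorks, err := parseDorks(in)
		fmt.Println(len(dorks), err)
	}
}
```

Trying the list shape first keeps multi-dork files on the common path, while the fallback means existing single-mapping files need no migration.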

## Self-Check: PASSED

- File exists: `pkg/dorks/definitions/github/frontier.yaml` - FOUND
- File exists: `pkg/dorks/definitions/github/specialized.yaml` - FOUND
- File exists: `pkg/dorks/definitions/github/infrastructure.yaml` - FOUND
- File exists: `pkg/dorks/definitions/github/emerging.yaml` - FOUND
- File exists: `pkg/dorks/definitions/github/enterprise.yaml` - FOUND
- File exists: `dorks/github/frontier.yaml` - FOUND
- File exists: `dorks/github/specialized.yaml` - FOUND
- File exists: `dorks/github/infrastructure.yaml` - FOUND
- File exists: `dorks/github/emerging.yaml` - FOUND
- File exists: `dorks/github/enterprise.yaml` - FOUND
- Commit exists: `09722ea` - FOUND
- Commit exists: `9755b37` - FOUND
- Runtime verification: `dorks.NewRegistry().ListBySource("github")` returned 50 dorks across 5 categories - PASSED
- `go test ./pkg/dorks/...` - PASSED
- `go build ./...` - PASSED

---

*Phase: 08-dork-engine*
*Completed: 2026-04-05*