keyhunter/.planning/phases/08-dork-engine/08-02-SUMMARY.md
2026-04-06 00:22:13 +03:00

phase: 08-dork-engine
plan: 02
subsystem: dorks
tags: [dorks, github, yaml, go-embed, osint]
requires:
  - phase: 08-dork-engine
    provides: pkg/dorks foundation (schema, loader, registry) from plan 08-01
provides:
  - 50 production-ready GitHub code-search dorks covering all 5 categories
  - YAML list format support in the dork loader for multi-dork files
  - Dual-location mirror (dorks/github/ and pkg/dorks/definitions/github/)
affects: [08-03, 08-04, 08-05, 08-06, 08-07]
tech-stack:
  added: []
  patterns:
    - Dork YAML files as top-level lists (multiple dorks per file) grouped by category
    - "Dual-location mirror: user-visible dorks/ copy + go:embed pkg/dorks/definitions/ copy"
key-files:
  created:
    - pkg/dorks/definitions/github/frontier.yaml
    - pkg/dorks/definitions/github/specialized.yaml
    - pkg/dorks/definitions/github/infrastructure.yaml
    - pkg/dorks/definitions/github/emerging.yaml
    - pkg/dorks/definitions/github/enterprise.yaml
    - dorks/github/frontier.yaml
    - dorks/github/specialized.yaml
    - dorks/github/infrastructure.yaml
    - dorks/github/emerging.yaml
    - dorks/github/enterprise.yaml
  modified:
    - pkg/dorks/loader.go
key-decisions:
  - Loader accepts both YAML list and single-dork mapping forms for backward compatibility with plan 08-01 tests
  - Category split into five YAML files (one per taxonomy bucket) rather than one monolithic file for easier diff/review
  - Dorks use literal GitHub Code Search queries with no templating — GitHub syntax goes straight to the API
patterns-established:
  - "Multi-dork YAML: top-level list of Dork mappings per file, grouped by category"
  - "Dual-location mirror: identical content in dorks/{source}/ and pkg/dorks/definitions/{source}/"
requirements-completed: [DORK-01, DORK-02, DORK-04]
duration: 12min
completed: 2026-04-05

Phase 8 Plan 02: 50 GitHub Dorks Summary

50 production GitHub code-search dorks across 5 categories (frontier, specialized, infrastructure, emerging, enterprise) covering 40+ LLM/AI providers, embedded via go:embed and mirrored into the user-visible dorks/ tree.

Performance

  • Duration: ~12 min
  • Started: 2026-04-05T21:09:00Z
  • Completed: 2026-04-05T21:21:00Z
  • Tasks: 2
  • Files modified: 11 (10 created, 1 modified)

Accomplishments

  • 50 GitHub dorks loadable via pkg/dorks.NewRegistry().ListBySource("github")
  • All 5 dork taxonomy categories populated (frontier 15, specialized 10, infrastructure 10, emerging 10, enterprise 5)
  • Loader extended to parse YAML list form without breaking existing one-dork-per-file tests
  • Dual-location mirror maintained per the Phase 8 architecture decision
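
As a rough illustration of the registry lookup mentioned above, the following sketch filters dorks by source and tallies them per category. The `Dork` fields, the `Registry` layout, the `CountByCategory` helper, and the sample queries are assumptions made for the example; only the `ListBySource` method name comes from this summary, and the real `pkg/dorks` types may differ.

```go
package main

import "fmt"

// Dork is a hypothetical, trimmed-down stand-in for the real schema type.
type Dork struct {
	ID       string
	Source   string
	Category string
	Query    string
}

// Registry holds loaded dorks in memory (assumed layout).
type Registry struct {
	dorks []Dork
}

// ListBySource returns every dork whose Source matches, in load order.
func (r *Registry) ListBySource(source string) []Dork {
	var out []Dork
	for _, d := range r.dorks {
		if d.Source == source {
			out = append(out, d)
		}
	}
	return out
}

// CountByCategory tallies dorks per category (hypothetical helper).
func CountByCategory(dorks []Dork) map[string]int {
	counts := make(map[string]int)
	for _, d := range dorks {
		counts[d.Category]++
	}
	return counts
}

func main() {
	// Illustrative entries only; the IDs and queries are made up.
	r := &Registry{dorks: []Dork{
		{ID: "github-a", Source: "github", Category: "frontier", Query: `"sk-proj-"`},
		{ID: "github-b", Source: "github", Category: "frontier", Query: `"sk-ant-api03-"`},
		{ID: "github-c", Source: "github", Category: "enterprise", Query: `"watsonx"`},
		{ID: "shodan-a", Source: "shodan", Category: "infrastructure", Query: `ollama`},
	}}
	gh := r.ListBySource("github")
	fmt.Println(len(gh), CountByCategory(gh)) // prints: 3 map[enterprise:1 frontier:2]
}
```

With the real registry, the same tally over `ListBySource("github")` would be expected to show the 15/10/10/10/5 split reported above.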

Task Commits

  1. Task 1: 25 GitHub dorks — frontier + specialized categories (09722ea, feat)
  2. Task 2: 25 GitHub dorks — infrastructure + emerging + enterprise (9755b37, feat)

Files Created/Modified

  • pkg/dorks/definitions/github/frontier.yaml — 15 Tier 1/2 dorks (OpenAI, Anthropic, Google AI, Azure OpenAI, AWS Bedrock, xAI, Cohere, Mistral, Groq, Together, Replicate)
  • pkg/dorks/definitions/github/specialized.yaml — 10 Tier 3 dorks (Perplexity, Voyage, Jina, AssemblyAI, Deepgram, ElevenLabs, Stability, HuggingFace)
  • pkg/dorks/definitions/github/infrastructure.yaml — 10 Tier 5 gateway + Tier 8 self-hosted dorks (OpenRouter, LiteLLM, Portkey, Helicone, Cloudflare AI, Vercel AI, Ollama, vLLM, LocalAI)
  • pkg/dorks/definitions/github/emerging.yaml — 10 Tier 4 Chinese + Tier 6 vector DB dorks (DeepSeek, Moonshot, Qwen, Zhipu, MiniMax, Pinecone, Weaviate, Qdrant, Chroma, Writer)
  • pkg/dorks/definitions/github/enterprise.yaml — 5 Tier 7/9 enterprise dorks (Codeium, Tabnine, Databricks, Snowflake Cortex, IBM watsonx)
  • dorks/github/*.yaml — mirror of all five category files for user-visible inspection
  • pkg/dorks/loader.go — parse YAML list first, fall back to single-dork mapping

Decisions Made

  • Accept both YAML shapes in the loader (list + single) so the existing TestNewRegistry_EmptyDefinitionsTreeOK test and any future one-off dork files keep working.
  • Split the 50 dorks into five category files rather than one github.yaml — easier to review and aligns with the taxonomy categories enumerated in schema.ValidCategories.
  • Use real provider prefixes verified against pkg/providers/definitions/*.yaml (sk-proj-, sk-ant-api03-, AIzaSy, gsk_, r8_, pplx-, hf_, sk-or-v1-, etc.) so future live execution in plan 08-05 returns genuine hits.

Deviations from Plan

Auto-fixed Issues

1. [Rule 3 — Blocking] Removed empty source subdirectories that broke go:embed

  • Found during: Task 1 verification (go test ./pkg/dorks/...)
  • Issue: Plan 08-01 left pkg/dorks/definitions/{bing,fofa,gitlab,shodan}/ as empty directories. Once pkg/dorks/definitions/github/ gained real YAML files, //go:embed definitions/* started recursing into siblings and errored with "cannot embed directory definitions/bing: contains no embeddable files".
  • Fix: Removed the four empty subdirs. They will be re-created by the Wave 2 plans that populate each source (shodan 08-03, etc.). The remaining non-empty siblings (censys, google, zoomeye) already contain YAML from parallel plans and embed fine.
  • Files modified: deleted pkg/dorks/definitions/{bing,fofa,gitlab,shodan}/
  • Verification: go test ./pkg/dorks/... passes, registry loads 126 dorks total (50 github + 76 from parallel sources) with 0 errors.
  • Committed in: 09722ea (folded into Task 1)

2. [Rule 2 — Missing critical] Extended loader to accept YAML list form

  • Found during: Task 1 planning (reading pkg/dorks/loader.go)
  • Issue: The 08-01 loader only accepted one-dork-per-file (yaml.Unmarshal(data, &Dork{})). The plan explicitly anticipated this and instructed me to adapt the loader to also accept top-level lists.
  • Fix: Loader now tries []Dork first; if that yields entries, each is validated and appended. Otherwise it falls back to single-dork parsing (preserving the empty-definitions-tree test from 08-01). Empty YAML is tolerated.
  • Files modified: pkg/dorks/loader.go
  • Verification: go test ./pkg/dorks/... passes — both the empty-tree test and the new 50-dork load succeed.
  • Committed in: 09722ea (Task 1 commit)

Total deviations: 2 auto-fixed (1 blocking, 1 missing-critical; both anticipated by the plan text)
Impact on plan: None. Both changes were explicitly forecast in the plan's <action> block. No scope creep.

Issues Encountered

  • go:embed does not embed empty directories, so the empty source subdirs left over from 08-01 broke compilation once the github/ sibling had content. Resolved by deleting them; future source plans will recreate them with real content.

User Setup Required

None — these are built-in dorks embedded into the binary.

Next Phase Readiness

  • Plan 08-03 and later source plans can follow the same multi-dork YAML list pattern established here.
  • Plan 08-05 (GitHub live executor) now has 50 real queries to execute against the GitHub Code Search API.
  • Registry statistics: ListBySource("github") returns 50; all 5 categories represented.

Self-Check: PASSED

  • File exists: pkg/dorks/definitions/github/frontier.yaml — FOUND
  • File exists: pkg/dorks/definitions/github/specialized.yaml — FOUND
  • File exists: pkg/dorks/definitions/github/infrastructure.yaml — FOUND
  • File exists: pkg/dorks/definitions/github/emerging.yaml — FOUND
  • File exists: pkg/dorks/definitions/github/enterprise.yaml — FOUND
  • File exists: dorks/github/frontier.yaml — FOUND
  • File exists: dorks/github/specialized.yaml — FOUND
  • File exists: dorks/github/infrastructure.yaml — FOUND
  • File exists: dorks/github/emerging.yaml — FOUND
  • File exists: dorks/github/enterprise.yaml — FOUND
  • Commit exists: 09722ea — FOUND
  • Commit exists: 9755b37 — FOUND
  • Runtime verification: dorks.NewRegistry().ListBySource("github") returned 50 dorks across 5 categories — PASSED
  • go test ./pkg/dorks/... — PASSED
  • go build ./... — PASSED

Phase: 08-dork-engine
Completed: 2026-04-05