diff --git a/.planning/phases/08-dork-engine/08-CONTEXT.md b/.planning/phases/08-dork-engine/08-CONTEXT.md
new file mode 100644
index 0000000..f62ee85
--- /dev/null
+++ b/.planning/phases/08-dork-engine/08-CONTEXT.md
@@ -0,0 +1,107 @@
+# Phase 8: Dork Engine - Context
+
+**Gathered:** 2026-04-05
+**Status:** Ready for planning
+**Mode:** Auto-generated
+
+
+## Phase Boundary
+
+Dork engine: YAML-based dork definitions embedded via go:embed (same pattern as providers). 150+ built-in dorks across 8 sources (GitHub, Google, Shodan, Censys, ZoomEye, FOFA, GitLab, Bing). CLI commands: `keyhunter dorks list/run/add/export`. In this phase, `run` executes **GitHub code search** live (needs the user's GITHUB_TOKEN) and returns placeholder errors for the other sources (they are implemented in OSINT phases 9-16).
+
+
+
+
+## Implementation Decisions
+
+### Dork Schema (DORK-01)
+```yaml
+id: openai-github-envfile
+name: "OpenAI API Key in .env files"
+source: github  # github|google|shodan|censys|zoomeye|fofa|gitlab|bing
+category: frontier  # frontier|specialized|infrastructure|emerging|enterprise
+query: 'sk-proj- extension:env'
+description: "Finds OpenAI project keys exposed in committed .env files"
+tags: [openai, env, tier1]
+```
+
+### Package Layout
+- `pkg/dorks/schema.go` — Dork struct
+- `pkg/dorks/loader.go` — go:embed loader (same pattern as providers/)
+- `pkg/dorks/registry.go` — List/Get/Stats methods + custom dork persistence
+- `pkg/dorks/executor.go` — ExecuteDork interface + per-source implementations (only GitHub live)
+- `pkg/dorks/github.go` — GitHub Code Search via REST API (needs GITHUB_TOKEN from env/config)
+- `dorks/` directory at repo root — 150+ YAML files organized by source subdir
+- `pkg/dorks/definitions/*/*.yaml` — mirrored for go:embed
+
+### Custom Dorks (DORK-03)
+- User-added dorks stored in SQLite `custom_dorks` table (id, source, category, query, description, created_at)
+- `keyhunter dorks add` writes to DB
+- `keyhunter dorks list` merges embedded + custom
+- `keyhunter dorks export` outputs both sets
+
+### GitHub Code Search (DORK-02 partial)
+- `GET https://api.github.com/search/code?q={query}`
+- Auth via `GITHUB_TOKEN` env var or config file
+- Rate limit: 10 req/min for the code search endpoint (authenticated; other search endpoints allow 30 req/min), respect Retry-After
+- Results piped through engine detection pipeline (each match is a small chunk)
+- If no GITHUB_TOKEN → clear error message with setup instructions
+
+### Other Sources (DORK-02 deferred)
+- Return `ErrSourceNotImplemented` with a hint that Shodan/Censys/etc. are coming in Phases 9-16
+- Stubs registered in executor so they show in `dorks list` but refuse `dorks run`
+
+### CLI Commands
+- `keyhunter dorks list [--source=X] [--category=Y]` — table output
+- `keyhunter dorks run --source=X --category=Y [--id=Z] [--limit=N]` — execute matching dorks
+- `keyhunter dorks add --source=X --category=Y --query='...' --description='...'` — persist custom
+- `keyhunter dorks export [--format=json|yaml]` — dump all dorks
+- `keyhunter dorks info ` — show full dork detail
+- `keyhunter dorks delete ` — remove custom dork (embedded dorks cannot be deleted)
+
+### 150+ Built-in Dorks
+- **GitHub (50)**: per-provider prefix searches, .env file searches, config file searches
+- **Google (30)**: site: operators for pastebin, github raw, gitlab raw, etc.
+- **Shodan (20)**: SSL cert CN matches, banner keyword searches
+- **Censys (15)**: similar to Shodan
+- **ZoomEye (10)**, **FOFA (10)**, **GitLab (10)**, **Bing (5)**
+- Total: 150
+
+### New Deps
+- None — use stdlib net/http for GitHub API
+
+
+
+
+## Existing Code Insights
+
+### Reusable Assets
+- pkg/providers/loader.go — go:embed pattern to mirror for dorks
+- pkg/providers/registry.go — Registry pattern
+- pkg/storage/db.go — add custom_dorks table via migration
+- cmd/stubs.go — dorks is a stub to replace
+- pkg/output/table.go — for dorks list output
+
+
+
+
+## Specific Ideas
+
+Per-provider GitHub dorks should cover:
+- `sk-proj- extension:env` (OpenAI)
+- `sk-ant-api03- extension:env` (Anthropic)
+- `AIzaSy extension:json` (Google AI)
+- etc.
+
+Queries should account for GitHub-specific quirks (URL-encode the query and escape quotes properly for the API).
+
+
+
+
+## Deferred Ideas
+
+- Shodan/Censys/FOFA live execution — deferred to Phases 9-16 (OSINT phases)
+- Query templating with variables — over-engineering
+- Dork scheduling (daily dork runs) — Phase 17 (scheduler)
+
+