Phase 8: Dork Engine - Context
Gathered: 2026-04-05 Status: Ready for planning Mode: Auto-generated
## Phase Boundary

Dork engine: YAML-based dork definitions embedded via go:embed (same pattern as providers), with 150+ built-in dorks across 8 sources (GitHub, Google, Shodan, Censys, ZoomEye, FOFA, GitLab, Bing). CLI commands: `keyhunter dorks list/run/add/export`. In this phase, `run` executes GitHub code search live (requires a user-supplied GITHUB_TOKEN) and returns placeholder errors for the other sources, which are implemented in OSINT phases 9-16.
Dork Schema (DORK-01)
id: openai-github-envfile
name: "OpenAI API Key in .env files"
source: github # github|google|shodan|censys|zoomeye|fofa|gitlab|bing
category: frontier # frontier|specialized|infrastructure|emerging|enterprise
query: 'sk-proj- extension:env'
description: "Finds OpenAI project keys exposed in committed .env files"
tags: [openai, env, tier1]
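The schema above could map onto a Go struct roughly as follows. This is a minimal sketch: the `yaml` struct tags assume a tag-aware YAML decoder (the source does not name one), and `Validate` is a hypothetical helper whose allowed values are copied from the comments in the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Dork mirrors the YAML fields from the schema example. The yaml tags are
// plain strings here; they only take effect with a tag-aware decoder.
type Dork struct {
	ID          string   `yaml:"id"`
	Name        string   `yaml:"name"`
	Source      string   `yaml:"source"`
	Category    string   `yaml:"category"`
	Query       string   `yaml:"query"`
	Description string   `yaml:"description"`
	Tags        []string `yaml:"tags"`
}

// Allowed values, taken from the inline comments in the schema example.
var validSources = map[string]bool{
	"github": true, "google": true, "shodan": true, "censys": true,
	"zoomeye": true, "fofa": true, "gitlab": true, "bing": true,
}

var validCategories = map[string]bool{
	"frontier": true, "specialized": true, "infrastructure": true,
	"emerging": true, "enterprise": true,
}

// Validate is a hypothetical check: it rejects dorks with a missing id or
// query, or with a source/category outside the documented sets.
func (d Dork) Validate() error {
	if d.ID == "" || d.Query == "" {
		return errors.New("dork: id and query are required")
	}
	if !validSources[d.Source] {
		return fmt.Errorf("dork %s: unknown source %q", d.ID, d.Source)
	}
	if !validCategories[d.Category] {
		return fmt.Errorf("dork %s: unknown category %q", d.ID, d.Category)
	}
	return nil
}

func main() {
	d := Dork{
		ID:       "openai-github-envfile",
		Name:     "OpenAI API Key in .env files",
		Source:   "github",
		Category: "frontier",
		Query:    "sk-proj- extension:env",
	}
	fmt.Println("valid:", d.Validate() == nil)
}
```

Validating at load time would surface a typo in `source` or `category` before a dork ever reaches an executor.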
Package Layout
- `pkg/dorks/schema.go`: Dork struct
- `pkg/dorks/loader.go`: go:embed loader (same pattern as providers/)
- `pkg/dorks/registry.go`: List/Get/Stats methods + custom dork persistence
- `pkg/dorks/executor.go`: ExecuteDork interface + per-source implementations (only GitHub live)
- `pkg/dorks/github.go`: GitHub Code Search via REST API (needs GITHUB_TOKEN from env/config)
- `dorks/` directory at repo root: 150+ YAML files organized by source subdir
- `pkg/dorks/definitions/*/*.yaml`: mirrored for go:embed
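A loader over the per-source directory layout might walk the tree like this. The sketch works on any `fs.FS`; an `embed.FS` populated by a `//go:embed` directive satisfies that interface, so an in-memory `fstest.MapFS` stands in for it here. The function names and sample file names are hypothetical, and YAML decoding is left out.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"strings"
	"testing/fstest"
)

// listDorkFiles walks root inside fsys and collects every .yaml path.
// fs.WalkDir visits entries in lexical order, so the result is stable.
func listDorkFiles(fsys fs.FS, root string) ([]string, error) {
	var files []string
	err := fs.WalkDir(fsys, root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && strings.HasSuffix(p, ".yaml") {
			files = append(files, p)
		}
		return nil
	})
	return files, err
}

// sourceOf returns the source subdirectory of a definition path,
// e.g. "github" for "definitions/github/foo.yaml".
func sourceOf(p string) string {
	return path.Base(path.Dir(p))
}

// demoFS stands in for the embedded definitions tree; the file names
// are made up for illustration.
func demoFS() fs.FS {
	return fstest.MapFS{
		"definitions/github/openai-envfile.yaml": {Data: []byte("id: openai-github-envfile\n")},
		"definitions/shodan/example.yaml":        {Data: []byte("id: example\n")},
		"definitions/README.md":                  {Data: []byte("not a dork")},
	}
}

func main() {
	files, err := listDorkFiles(demoFS(), "definitions")
	if err != nil {
		fmt.Println("walk failed:", err)
		return
	}
	for _, f := range files {
		fmt.Printf("%s (source: %s)\n", f, sourceOf(f))
	}
}
```

Taking an `fs.FS` rather than an `embed.FS` keeps the loader testable without real files on disk.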
Custom Dorks (DORK-03)
- User-added dorks stored in SQLite `custom_dorks` table (id, source, category, query, description, created_at)
- `keyhunter dorks add` writes to DB
- `keyhunter dorks list` merges embedded + custom
- `keyhunter dorks export` outputs both sets
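The merge of embedded and custom dorks for `list` and `export` could look like the sketch below. The collision rule (an embedded dork wins over a custom dork with the same ID) and the `Custom` flag are assumptions; the source does not specify either.

```go
package main

import "fmt"

// Dork is trimmed to the fields needed for the merge. Custom is a
// hypothetical flag marking rows that came from the custom_dorks table.
type Dork struct {
	ID     string
	Source string
	Query  string
	Custom bool
}

// mergeDorks returns the embedded dorks followed by the custom ones.
// Assumption: when IDs collide, the embedded dork is kept and the
// custom one is skipped.
func mergeDorks(embedded, custom []Dork) []Dork {
	seen := make(map[string]bool, len(embedded)+len(custom))
	merged := make([]Dork, 0, len(embedded)+len(custom))
	for _, d := range embedded {
		seen[d.ID] = true
		merged = append(merged, d)
	}
	for _, d := range custom {
		if seen[d.ID] {
			continue
		}
		seen[d.ID] = true
		d.Custom = true
		merged = append(merged, d)
	}
	return merged
}

func main() {
	embedded := []Dork{{ID: "openai-github-envfile", Source: "github", Query: "sk-proj- extension:env"}}
	custom := []Dork{
		{ID: "my-dork", Source: "github", Query: "example"},
		{ID: "openai-github-envfile", Source: "github", Query: "shadowed"},
	}
	for _, d := range mergeDorks(embedded, custom) {
		fmt.Printf("%s custom=%v\n", d.ID, d.Custom)
	}
}
```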
GitHub Code Search (DORK-02 partial)
- `GET https://api.github.com/search/code?q={query}`
- Auth via `GITHUB_TOKEN` env var or config file
- Rate limit: 10 req/min for the code search endpoint (the 30 req/min authenticated limit applies to the other search endpoints); respect Retry-After
- Results piped through engine detection pipeline (each match is a small chunk)
- If no GITHUB_TOKEN → clear error message with setup instructions
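The request construction and Retry-After handling described above can be sketched with stdlib `net/http` only. The function names, the error text, and the fallback delay are assumptions; no request is actually sent here.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
	"time"
)

// ErrMissingToken is a hypothetical error; the message stands in for the
// "clear error message with setup instructions".
var ErrMissingToken = errors.New("GITHUB_TOKEN is not set: create a GitHub token and export it as GITHUB_TOKEN")

// newSearchRequest builds the code search request. The query is
// URL-encoded so spaces, colons, and quotes survive the trip.
func newSearchRequest(query, token string) (*http.Request, error) {
	if token == "" {
		return nil, ErrMissingToken
	}
	u := "https://api.github.com/search/code?q=" + url.QueryEscape(query)
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")
	return req, nil
}

// retryDelay interprets a Retry-After header value given in whole
// seconds and falls back to def when it is absent or malformed.
func retryDelay(value string, def time.Duration) time.Duration {
	secs, err := strconv.Atoi(value)
	if err != nil || secs < 0 {
		return def
	}
	return time.Duration(secs) * time.Second
}

func main() {
	req, err := newSearchRequest("sk-proj- extension:env", "example-token")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(req.URL.String())
	fmt.Println(retryDelay("30", time.Minute))
}
```

A caller would read the token from the environment or config, pass `resp.Header.Get("Retry-After")` to `retryDelay` on a rate-limited response, and sleep for the returned duration before retrying.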
Other Sources (DORK-02 deferred)
- Return `ErrSourceNotImplemented` with a hint that Shodan/Censys/etc. are coming in Phases 9-16
- Stubs are registered in the executor so they show in `dorks list` but refuse `dorks run`
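One way to register the stubs is a sentinel error wrapped with a per-source hint. The `Executor` interface shape, `stubExecutor`, and the helper names below are hypothetical; only `ErrSourceNotImplemented` comes from the notes above.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSourceNotImplemented is the sentinel callers can test for.
var ErrSourceNotImplemented = errors.New("dork source not implemented")

// Executor is an assumed shape for the per-source interface.
type Executor interface {
	Source() string
	Execute(query string, limit int) ([]string, error)
}

// stubExecutor is visible in listings but refuses to run.
type stubExecutor struct{ source string }

func (s stubExecutor) Source() string { return s.source }

func (s stubExecutor) Execute(string, int) ([]string, error) {
	return nil, fmt.Errorf("%w: %s execution is planned for OSINT phases 9-16",
		ErrSourceNotImplemented, s.source)
}

// newStubRegistry registers a stub for every source except GitHub,
// which is the only live source in this phase.
func newStubRegistry() map[string]Executor {
	reg := make(map[string]Executor)
	for _, src := range []string{"google", "shodan", "censys", "zoomeye", "fofa", "gitlab", "bing"} {
		reg[src] = stubExecutor{source: src}
	}
	return reg
}

// isNotImplemented reports whether err wraps the sentinel.
func isNotImplemented(err error) bool {
	return errors.Is(err, ErrSourceNotImplemented)
}

func main() {
	_, err := newStubRegistry()["shodan"].Execute("ssl.cert.subject.cn:example", 10)
	fmt.Println(err)
	fmt.Println("not implemented:", isNotImplemented(err))
}
```

Wrapping with `%w` lets the CLI match the sentinel with `errors.Is` while still printing the source-specific hint.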
CLI Commands
- `keyhunter dorks list [--source=X] [--category=Y]`: table output
- `keyhunter dorks run --source=X --category=Y [--id=Z] [--limit=N]`: execute matching dorks
- `keyhunter dorks add --source=X --category=Y --query='...' --description='...'`: persist custom
- `keyhunter dorks export [--format=json|yaml]`: dump all dorks
- `keyhunter dorks info <id>`: show full dork detail
- `keyhunter dorks delete <id>`: remove custom dork (embedded cannot be deleted)
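The `--source`, `--category`, and `--id` flags suggest a simple selection step shared by `list` and `run`. The sketch below assumes that an empty flag means "no constraint"; the `Filter` type and function names are hypothetical, and `--limit` is left out because its exact semantics are not stated.

```go
package main

import "fmt"

// Dork is trimmed to the fields the flags select on.
type Dork struct {
	ID       string
	Source   string
	Category string
}

// Filter holds the flag values. Assumption: an empty field matches
// every dork.
type Filter struct {
	Source   string
	Category string
	ID       string
}

// matches reports whether d satisfies every non-empty field of f.
func (f Filter) matches(d Dork) bool {
	return (f.Source == "" || f.Source == d.Source) &&
		(f.Category == "" || f.Category == d.Category) &&
		(f.ID == "" || f.ID == d.ID)
}

// selectDorks returns the dorks matching f, preserving input order.
func selectDorks(all []Dork, f Filter) []Dork {
	var out []Dork
	for _, d := range all {
		if f.matches(d) {
			out = append(out, d)
		}
	}
	return out
}

// sampleDorks returns made-up entries for illustration.
func sampleDorks() []Dork {
	return []Dork{
		{ID: "openai-github-envfile", Source: "github", Category: "frontier"},
		{ID: "example-github-config", Source: "github", Category: "enterprise"},
		{ID: "example-shodan-banner", Source: "shodan", Category: "infrastructure"},
	}
}

func main() {
	for _, d := range selectDorks(sampleDorks(), Filter{Source: "github"}) {
		fmt.Println(d.ID)
	}
}
```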
150+ Built-in Dorks
- GitHub (50): per-provider prefix searches, .env file searches, config file searches
- Google (30): site: operators for pastebin, github raw, gitlab raw, etc.
- Shodan (20): SSL cert CN matches, banner keyword searches
- Censys (15): similar to Shodan
- ZoomEye (10), FOFA (10), GitLab (10), Bing (5)
- Total: 150
New Deps
- None — use stdlib net/http for GitHub API
<code_context>
Existing Code Insights
Reusable Assets
- pkg/providers/loader.go — go:embed pattern to mirror for dorks
- pkg/providers/registry.go — Registry pattern
- pkg/storage/db.go — add custom_dorks table via migration
- cmd/stubs.go — dorks is a stub to replace
- pkg/output/table.go — for dorks list output
</code_context>
## Specific Ideas

Per-provider GitHub dorks should cover:
- `sk-proj- extension:env` (OpenAI)
- `sk-ant-api03- extension:env` (Anthropic)
- `AIzaSy extension:json` (Google AI)
- etc.
Queries should account for GitHub-specific quirks: escape quotes properly and URL-encode the query before sending it to the API.
## Deferred Ideas

- Shodan/Censys/FOFA live execution: deferred to Phases 9-16 (OSINT phases)
- Query templating with variables: over-engineering
- Dork scheduling (daily dork runs): Phase 17 (scheduler)