keyhunter/.planning/phases/08-dork-engine/08-CONTEXT.md
2026-04-06 00:05:59 +03:00

# Phase 8: Dork Engine - Context

Gathered: 2026-04-05
Status: Ready for planning
Mode: Auto-generated

## Phase Boundary

Dork engine: YAML-based dork definitions embedded via go:embed (same pattern as providers). 150+ built-in dorks across 8 sources (GitHub, Google, Shodan, Censys, ZoomEye, FOFA, GitLab, Bing). CLI commands: keyhunter dorks list/run/add/export/info/delete. In this phase, run executes GitHub code search live (requires a user-supplied GITHUB_TOKEN) and returns ErrSourceNotImplemented for the other seven sources, which are implemented in OSINT Phases 9-16.

## Implementation Decisions

### Dork Schema (DORK-01)

```yaml
id: openai-github-envfile
name: "OpenAI API Key in .env files"
source: github  # github|google|shodan|censys|zoomeye|fofa|gitlab|bing
category: frontier  # frontier|specialized|infrastructure|emerging|enterprise
query: 'sk-proj- extension:env'
description: "Finds OpenAI project keys exposed in committed .env files"
tags: [openai, env, tier1]
```
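A minimal sketch of how the schema above might map onto a Go struct. The field names follow the YAML keys; the `yaml` struct tags assume a YAML decoder that honors them (the decoder itself is not shown), and `Validate` is a hypothetical helper that is not part of the original plan.

```go
package main

import (
	"errors"
	"fmt"
)

// Dork mirrors the YAML keys of a dork definition.
// The yaml tags are only metadata here; no decoder is invoked in this sketch.
type Dork struct {
	ID          string   `yaml:"id"`
	Name        string   `yaml:"name"`
	Source      string   `yaml:"source"`
	Category    string   `yaml:"category"`
	Query       string   `yaml:"query"`
	Description string   `yaml:"description"`
	Tags        []string `yaml:"tags"`
}

// Allowed values, taken from the comments in the schema example.
var (
	validSources    = []string{"github", "google", "shodan", "censys", "zoomeye", "fofa", "gitlab", "bing"}
	validCategories = []string{"frontier", "specialized", "infrastructure", "emerging", "enterprise"}
)

func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// Validate (hypothetical) rejects definitions with a missing id or query,
// or with a source or category outside the documented sets.
func (d Dork) Validate() error {
	switch {
	case d.ID == "":
		return errors.New("dork: missing id")
	case d.Query == "":
		return fmt.Errorf("dork %q: missing query", d.ID)
	case !contains(validSources, d.Source):
		return fmt.Errorf("dork %q: unknown source %q", d.ID, d.Source)
	case !contains(validCategories, d.Category):
		return fmt.Errorf("dork %q: unknown category %q", d.ID, d.Category)
	}
	return nil
}

func main() {
	d := Dork{
		ID:       "openai-github-envfile",
		Name:     "OpenAI API Key in .env files",
		Source:   "github",
		Category: "frontier",
		Query:    "sk-proj- extension:env",
	}
	if err := d.Validate(); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("ok:", d.ID)
}
```

Validating at load time would surface a typo in `source` or `category` before a dork ever reaches an executor.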

### Package Layout

  • pkg/dorks/schema.go — Dork struct
  • pkg/dorks/loader.go — go:embed loader (same pattern as providers/)
  • pkg/dorks/registry.go — List/Get/Stats methods + custom dork persistence
  • pkg/dorks/executor.go — ExecuteDork interface + per-source implementations (only GitHub live)
  • pkg/dorks/github.go — GitHub Code Search via REST API (needs GITHUB_TOKEN from env/config)
  • dorks/ directory at repo root — 150+ YAML files organized by source subdir
  • pkg/dorks/definitions/*/*.yaml — mirrored for go:embed
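The loader in this layout could walk the embedded tree and group definition files by their source subdirectory. The sketch below works against any `fs.FS`, so an in-memory `fstest.MapFS` stands in for the `embed.FS` that a `//go:embed` directive would provide. The function name `GroupBySource` and the sample file names are illustrative assumptions, and YAML decoding is omitted.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"strings"
	"testing/fstest"
)

// GroupBySource walks root inside fsys and groups every *.yaml file by the
// name of its parent directory, which the layout uses as the source name.
func GroupBySource(fsys fs.FS, root string) (map[string][]string, error) {
	groups := make(map[string][]string)
	err := fs.WalkDir(fsys, root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || !strings.HasSuffix(p, ".yaml") {
			return nil
		}
		source := path.Base(path.Dir(p))
		groups[source] = append(groups[source], p)
		return nil
	})
	return groups, err
}

// sampleFS builds an in-memory stand-in for the embedded definitions tree.
// The file names are made up for this example.
func sampleFS() fs.FS {
	return fstest.MapFS{
		"definitions/github/openai-env.yaml":    {Data: []byte("id: openai-github-envfile\n")},
		"definitions/github/anthropic-env.yaml": {Data: []byte("id: anthropic-github-envfile\n")},
		"definitions/shodan/example.yaml":       {Data: []byte("id: example\n")},
		"definitions/README.md":                 {Data: []byte("ignored\n")},
	}
}

func main() {
	groups, err := GroupBySource(sampleFS(), "definitions")
	if err != nil {
		panic(err)
	}
	for source, files := range groups {
		fmt.Println(source, len(files))
	}
}
```

Accepting `fs.FS` rather than `embed.FS` keeps the loader testable without real files on disk.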

### Custom Dorks (DORK-03)

  • User-added dorks stored in SQLite custom_dorks table (id, source, category, query, description, created_at)
  • keyhunter dorks add writes to DB
  • keyhunter dorks list merges embedded + custom
  • keyhunter dorks export outputs both sets
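The merge behaviour described above can be sketched with an in-memory registry. A slice stands in for rows of the custom_dorks table, so no SQLite access is shown. The type and method names are assumptions, and the `Delete` guard reflects the rule from the CLI section that embedded dorks cannot be deleted.

```go
package main

import (
	"errors"
	"fmt"
)

// Dork is trimmed to the fields this sketch needs.
type Dork struct {
	ID, Source, Category, Query string
}

// Hypothetical sentinel errors for this sketch.
var (
	ErrEmbeddedDork = errors.New("embedded dorks cannot be deleted")
	ErrDorkNotFound = errors.New("dork not found")
)

// Registry holds the built-in dorks plus user-added ones.
type Registry struct {
	embedded []Dork // loaded from the embedded YAML definitions
	custom   []Dork // stand-in for rows in the custom_dorks table
}

func NewRegistry(embedded []Dork) *Registry {
	return &Registry{embedded: embedded}
}

// Add records a custom dork (the real command would write to the database).
func (r *Registry) Add(d Dork) {
	r.custom = append(r.custom, d)
}

// List returns embedded dorks followed by custom ones.
func (r *Registry) List() []Dork {
	out := make([]Dork, 0, len(r.embedded)+len(r.custom))
	out = append(out, r.embedded...)
	return append(out, r.custom...)
}

// Delete removes a custom dork and refuses to remove an embedded one.
func (r *Registry) Delete(id string) error {
	for i, d := range r.custom {
		if d.ID == id {
			r.custom = append(r.custom[:i], r.custom[i+1:]...)
			return nil
		}
	}
	for _, d := range r.embedded {
		if d.ID == id {
			return ErrEmbeddedDork
		}
	}
	return ErrDorkNotFound
}

func main() {
	r := NewRegistry([]Dork{{ID: "openai-github-envfile", Source: "github", Category: "frontier", Query: "sk-proj- extension:env"}})
	r.Add(Dork{ID: "my-custom-dork", Source: "github", Category: "specialized", Query: "example"})
	fmt.Println(len(r.List()))
	fmt.Println(r.Delete("openai-github-envfile"))
}
```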

### GitHub Code Search (DORK-02 partial)

  • GET https://api.github.com/search/code?q={query}
  • Auth via GITHUB_TOKEN env var or config file
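  • Rate limit: the code search endpoint allows 10 req/min for authenticated requests (the 30 req/min figure applies to the other search endpoints); respect Retry-After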
  • Results piped through engine detection pipeline (each match is a small chunk)
  • If no GITHUB_TOKEN → clear error message with setup instructions
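As a rough illustration of the points above, the following sketch builds the search request with stdlib `net/http` and reads a `Retry-After` value given in seconds. The function names and the wording of the error message are assumptions. The `Authorization: Bearer` and `Accept: application/vnd.github+json` headers follow GitHub's documented REST conventions. No request is actually sent.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
	"time"
)

// ErrMissingToken is a hypothetical error carrying setup instructions.
var ErrMissingToken = errors.New("GITHUB_TOKEN is not set: create a personal access token and export it as GITHUB_TOKEN")

// NewCodeSearchRequest builds GET /search/code?q=... with the query
// URL-encoded and the token attached. It fails early when no token is given.
func NewCodeSearchRequest(query, token string) (*http.Request, error) {
	if token == "" {
		return nil, ErrMissingToken
	}
	u := url.URL{
		Scheme:   "https",
		Host:     "api.github.com",
		Path:     "/search/code",
		RawQuery: url.Values{"q": {query}}.Encode(),
	}
	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")
	return req, nil
}

// RetryDelay interprets Retry-After as a number of seconds.
// It returns 0 when the header is absent or not a non-negative integer.
func RetryDelay(h http.Header) time.Duration {
	secs, err := strconv.Atoi(h.Get("Retry-After"))
	if err != nil || secs < 0 {
		return 0
	}
	return time.Duration(secs) * time.Second
}

func main() {
	req, err := NewCodeSearchRequest("sk-proj- extension:env", "dummy-token")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(req.URL)
}
```

A caller would sleep for `RetryDelay(resp.Header)` before retrying a rate-limited response. How many retries to attempt is left open here.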

### Other Sources (DORK-02 deferred)

  • Return ErrSourceNotImplemented with a hint that Shodan/Censys/etc. are coming in Phases 9-16
  • Stubs registered in executor so they show in dorks list but refuse dorks run
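One way the stub registration might look is sketched below. The `Executor` interface and its method signature are assumptions (the plan only names an ExecuteDork interface), and `fakeGitHub` is a placeholder for the live GitHub implementation so the example runs without network access.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSourceNotImplemented is returned by every source that is not live yet.
var ErrSourceNotImplemented = errors.New("source not implemented")

// Executor is a hypothetical shape for the per-source interface.
type Executor interface {
	Execute(query string) ([]string, error)
}

// stubExecutor is registered for deferred sources and refuses to run.
type stubExecutor struct{ source string }

func (s stubExecutor) Execute(string) ([]string, error) {
	return nil, fmt.Errorf("%s: %w (planned for OSINT Phases 9-16)", s.source, ErrSourceNotImplemented)
}

// fakeGitHub stands in for the live GitHub executor in this sketch.
type fakeGitHub struct{}

func (fakeGitHub) Execute(query string) ([]string, error) {
	return []string{"match for " + query}, nil
}

// NewExecutors registers one executor per source, so every source is
// visible to callers even though only GitHub can actually run.
func NewExecutors() map[string]Executor {
	execs := map[string]Executor{"github": fakeGitHub{}}
	for _, s := range []string{"google", "shodan", "censys", "zoomeye", "fofa", "gitlab", "bing"} {
		execs[s] = stubExecutor{source: s}
	}
	return execs
}

// IsNotImplemented reports whether err wraps ErrSourceNotImplemented.
func IsNotImplemented(err error) bool {
	return errors.Is(err, ErrSourceNotImplemented)
}

func main() {
	execs := NewExecutors()
	_, err := execs["shodan"].Execute("example query")
	fmt.Println(err, IsNotImplemented(err))
}
```

Wrapping the sentinel with `%w` lets the CLI add a per-source hint while callers still match the error with `errors.Is`.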

### CLI Commands

  • keyhunter dorks list [--source=X] [--category=Y] — table output
  • keyhunter dorks run --source=X --category=Y [--id=Z] [--limit=N] — execute matching dorks
  • keyhunter dorks add --source=X --category=Y --query='...' --description='...' — persist custom
  • keyhunter dorks export [--format=json|yaml] — dump all dorks
  • keyhunter dorks info <id> — show full dork detail
  • keyhunter dorks delete <id> — remove custom dork (embedded cannot be deleted)
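The `--source`, `--category`, and `--id` flags amount to an optional filter over the merged dork set. The sketch below shows one possible matching rule, where an empty field means "no constraint". The `Filter` type and `Select` function are hypothetical, flag parsing is not shown, and `--limit` is left out because its exact meaning is not specified here.

```go
package main

import "fmt"

// Dork is trimmed to the fields the filter inspects.
type Dork struct {
	ID, Source, Category string
}

// Filter holds the values of the optional CLI flags.
// An empty string means the flag was not given.
type Filter struct {
	Source, Category, ID string
}

// Match reports whether d satisfies every constraint that is set.
func (f Filter) Match(d Dork) bool {
	return (f.Source == "" || f.Source == d.Source) &&
		(f.Category == "" || f.Category == d.Category) &&
		(f.ID == "" || f.ID == d.ID)
}

// Select returns the dorks that match f, preserving input order.
func Select(all []Dork, f Filter) []Dork {
	var out []Dork
	for _, d := range all {
		if f.Match(d) {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	all := []Dork{
		{ID: "openai-github-envfile", Source: "github", Category: "frontier"},
		{ID: "example-shodan", Source: "shodan", Category: "infrastructure"},
	}
	for _, d := range Select(all, Filter{Source: "github"}) {
		fmt.Println(d.ID)
	}
}
```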

### 150+ Built-in Dorks

  • GitHub (50): per-provider prefix searches, .env file searches, config file searches
  • Google (30): site: operators for pastebin, github raw, gitlab raw, etc.
  • Shodan (20): SSL cert CN matches, banner keyword searches
  • Censys (15): similar to Shodan
  • ZoomEye (10), FOFA (10), GitLab (10), Bing (5)
  • Total: 150

### New Deps

  • None — use stdlib net/http for GitHub API

<code_context>

## Existing Code Insights

### Reusable Assets

  • pkg/providers/loader.go — go:embed pattern to mirror for dorks
  • pkg/providers/registry.go — Registry pattern
  • pkg/storage/db.go — add custom_dorks table via migration
  • cmd/stubs.go — dorks is a stub to replace
  • pkg/output/table.go — for dorks list output

</code_context>

## Specific Ideas

Per-provider GitHub dorks should cover:

  • sk-proj- extension:env (OpenAI)
  • sk-ant-api03- extension:env (Anthropic)
  • AIzaSy extension:json (Google AI)
  • etc.

Queries must account for GitHub search quirks: URL-encode the query string and escape embedded quotes before sending it to the API.

## Deferred Ideas
  • Shodan/Censys/FOFA live execution — deferred to Phase 9-16 (OSINT phases)
  • Query templating with variables — over-engineering
  • Dork scheduling (daily dork runs) — Phase 17 (scheduler)