docs(08): dork engine context
.planning/phases/08-dork-engine/08-CONTEXT.md
# Phase 8: Dork Engine - Context

**Gathered:** 2026-04-05
**Status:** Ready for planning
**Mode:** Auto-generated

<domain>
## Phase Boundary

Dork engine: YAML-based dork definitions embedded via go:embed (same pattern as providers). 150+ built-in dorks across 8 sources (GitHub, Google, Shodan, Censys, ZoomEye, FOFA, GitLab, Bing). CLI commands: `keyhunter dorks list/run/add/export/info/delete`. In this phase, `run` executes **GitHub code search** live (requires the user's `GITHUB_TOKEN`) and returns `ErrSourceNotImplemented` for every other source; those are implemented in the OSINT phases (9-16).

</domain>

<decisions>
## Implementation Decisions

### Dork Schema (DORK-01)
```yaml
id: openai-github-envfile
name: "OpenAI API Key in .env files"
source: github # github|google|shodan|censys|zoomeye|fofa|gitlab|bing
category: frontier # frontier|specialized|infrastructure|emerging|enterprise
query: 'sk-proj- extension:env'
description: "Finds OpenAI project keys exposed in committed .env files"
tags: [openai, env, tier1]
```
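One possible Go shape for this schema, as a minimal sketch: the `Dork` struct, its yaml tags, the `Validate` method and the kebab-case ID rule are assumptions, not code that exists in `pkg/dorks` yet. The allowed source and category values are taken from the comments in the YAML above. YAML decoding itself is left to whatever decoder the providers loader already uses.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Dork mirrors the YAML schema. Field names are illustrative; the yaml tags
// assume the YAML keys map one-to-one onto struct fields.
type Dork struct {
	ID          string   `yaml:"id"`
	Name        string   `yaml:"name"`
	Source      string   `yaml:"source"`
	Category    string   `yaml:"category"`
	Query       string   `yaml:"query"`
	Description string   `yaml:"description"`
	Tags        []string `yaml:"tags"`
}

// Allowed enum values, copied from the comments in the example YAML.
var (
	validSources = map[string]bool{
		"github": true, "google": true, "shodan": true, "censys": true,
		"zoomeye": true, "fofa": true, "gitlab": true, "bing": true,
	}
	validCategories = map[string]bool{
		"frontier": true, "specialized": true, "infrastructure": true,
		"emerging": true, "enterprise": true,
	}
	// Assumed ID convention: lowercase kebab-case, e.g. "openai-github-envfile".
	idPattern = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)
)

// Validate collects every schema violation so a loader can report a bad
// YAML file completely instead of stopping at the first problem.
func (d Dork) Validate() error {
	var errs []error
	if !idPattern.MatchString(d.ID) {
		errs = append(errs, fmt.Errorf("id %q is not kebab-case", d.ID))
	}
	if !validSources[d.Source] {
		errs = append(errs, fmt.Errorf("unknown source %q", d.Source))
	}
	if !validCategories[d.Category] {
		errs = append(errs, fmt.Errorf("unknown category %q", d.Category))
	}
	if d.Query == "" {
		errs = append(errs, errors.New("query must not be empty"))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	d := Dork{
		ID: "openai-github-envfile", Name: "OpenAI API Key in .env files",
		Source: "github", Category: "frontier", Query: "sk-proj- extension:env",
		Tags: []string{"openai", "env", "tier1"},
	}
	fmt.Println(d.Validate() == nil) // true
}
```

Returning all violations at once (via `errors.Join`, Go 1.20+) is a design choice for authoring 150+ files by hand; a fail-fast check would work just as well.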
### Package Layout
- `pkg/dorks/schema.go` — Dork struct
- `pkg/dorks/loader.go` — go:embed loader (same pattern as providers/)
- `pkg/dorks/registry.go` — List/Get/Stats methods + custom dork persistence
- `pkg/dorks/executor.go` — ExecuteDork interface + per-source implementations (only GitHub is live in this phase)
- `pkg/dorks/github.go` — GitHub Code Search via REST API (needs GITHUB_TOKEN from env/config)
- `dorks/` directory at repo root — 150+ YAML files organized by source subdir
- `pkg/dorks/definitions/*/*.yaml` — mirrored copy for go:embed, which cannot embed files outside the package directory
### Custom Dorks (DORK-03)
- User-added dorks stored in SQLite `custom_dorks` table (id, source, category, query, description, created_at)
- `keyhunter dorks add` writes to DB
- `keyhunter dorks list` merges embedded + custom
- `keyhunter dorks export` outputs both sets
### GitHub Code Search (DORK-02 partial)
- `GET https://api.github.com/search/code?q={query}`, with `{query}` URL-encoded
- Auth via `GITHUB_TOKEN` env var or config file (the code search endpoint requires authentication)
- Rate limit: GitHub documents 10 req/min for the code search endpoint (the 30 req/min figure applies to the other search endpoints); respect `Retry-After`
- Results piped through the engine detection pipeline (each match is a small chunk)
- If no `GITHUB_TOKEN` is set → clear error message with setup instructions
### Other Sources (DORK-02 deferred)
- Return `ErrSourceNotImplemented` with a hint that Shodan/Censys/etc. are coming in Phases 9-16
- Stubs registered in the executor, so their dorks still show in `dorks list` but `dorks run` refuses them
### CLI Commands
- `keyhunter dorks list [--source=X] [--category=Y]` — table output
- `keyhunter dorks run --source=X --category=Y [--id=Z] [--limit=N]` — execute matching dorks
- `keyhunter dorks add --source=X --category=Y --query='...' --description='...'` — persist a custom dork
- `keyhunter dorks export [--format=json|yaml]` — dump all dorks
- `keyhunter dorks info <id>` — show full dork detail
- `keyhunter dorks delete <id>` — remove a custom dork (embedded dorks cannot be deleted)
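The `list` and `run` flags can share one selection step. A minimal sketch, assuming an empty flag value means "no filter"; `Filter` and every sample ID except `openai-github-envfile` are hypothetical. `--limit` is left out because the text does not say whether it caps the number of dorks or the results per dork.

```go
package main

import "fmt"

// Dork is trimmed to the fields the list/run flags filter on.
type Dork struct{ ID, Source, Category string }

// Filter selects dorks by --source, --category and --id. An empty value
// matches everything; a non-matching case skips the dork.
func Filter(all []Dork, source, category, id string) []Dork {
	var out []Dork
	for _, d := range all {
		switch {
		case source != "" && d.Source != source:
		case category != "" && d.Category != category:
		case id != "" && d.ID != id:
		default:
			out = append(out, d)
		}
	}
	return out
}

func main() {
	all := []Dork{
		{ID: "openai-github-envfile", Source: "github", Category: "frontier"},
		{ID: "anthropic-github-envfile", Source: "github", Category: "frontier"},
		{ID: "openai-shodan-banner", Source: "shodan", Category: "frontier"},
	}
	fmt.Println(len(Filter(all, "github", "frontier", "")))                      // 2
	fmt.Println(len(Filter(all, "github", "frontier", "openai-github-envfile"))) // 1
}
```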
### 150+ Built-in Dorks
- **GitHub (50)**: per-provider prefix searches, .env file searches, config file searches
- **Google (30)**: `site:` operators for Pastebin, GitHub raw, GitLab raw, etc.
- **Shodan (20)**: SSL cert CN matches, banner keyword searches
- **Censys (15)**: similar query types to Shodan
- **ZoomEye (10)**, **FOFA (10)**, **GitLab (10)**, **Bing (5)**
- Total: 150 (50 + 30 + 20 + 15 + 10 + 10 + 10 + 5)
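A hedged sketch of a check that keeps the embedded set honest against this plan. `Stats`, `Shortfalls`, and `plannedCounts` are hypothetical names, and whether the registry's `Stats` method returns exactly this per-source map is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

type Dork struct{ ID, Source string }

// plannedCounts restates the per-source targets from the list above.
var plannedCounts = map[string]int{
	"github": 50, "google": 30, "shodan": 20, "censys": 15,
	"zoomeye": 10, "fofa": 10, "gitlab": 10, "bing": 5,
}

// Stats counts dorks per source.
func Stats(dorks []Dork) map[string]int {
	m := make(map[string]int)
	for _, d := range dorks {
		m[d.Source]++
	}
	return m
}

// Shortfalls lists sources below plan as "source: got/want", sorted.
func Shortfalls(stats map[string]int) []string {
	var out []string
	for src, want := range plannedCounts {
		if got := stats[src]; got < want {
			out = append(out, fmt.Sprintf("%s: %d/%d", src, got, want))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	total := 0
	for _, n := range plannedCounts {
		total += n
	}
	fmt.Println(total)                                                   // 150
	fmt.Println(Shortfalls(Stats([]Dork{{ID: "x", Source: "bing"}}))[0]) // bing: 1/5
}
```

Run as a unit test over the loaded definitions, an empty `Shortfalls` result would confirm the "150+" claim per source rather than only in aggregate.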
### New Deps
- None — use stdlib net/http for GitHub API

</decisions>

<code_context>
## Existing Code Insights

### Reusable Assets
- `pkg/providers/loader.go` — go:embed pattern to mirror for dorks
- `pkg/providers/registry.go` — Registry pattern
- `pkg/storage/db.go` — add `custom_dorks` table via migration
- `cmd/stubs.go` — the `dorks` command is currently a stub to replace
- `pkg/output/table.go` — for `dorks list` output

</code_context>

<specifics>
## Specific Ideas

Per-provider GitHub dorks should cover:
- `sk-proj- extension:env` (OpenAI)
- `sk-ant-api03- extension:env` (Anthropic)
- `AIzaSy extension:json` (Google AI)
- etc.

Queries should avoid GitHub-specific quirks, and any quotes in them must be escaped properly when sent to the API.
</specifics>

<deferred>
## Deferred Ideas
- Shodan/Censys/FOFA live execution — deferred to Phases 9-16 (OSINT phases)
- Query templating with variables — rejected as over-engineering
- Dork scheduling (daily dork runs) — deferred to Phase 17 (scheduler)

</deferred>