diff --git a/.planning/phases/08-dork-engine/08-05-SUMMARY.md b/.planning/phases/08-dork-engine/08-05-SUMMARY.md
new file mode 100644
index 0000000..d0069be
--- /dev/null
+++ b/.planning/phases/08-dork-engine/08-05-SUMMARY.md
@@ -0,0 +1,153 @@
+---
+phase: 08-dork-engine
+plan: 05
+subsystem: pkg/dorks
+tags: [dorks, github, executor, live, http, rate-limit]
+requires:
+  - 08-01 # Executor interface, Runner, Match, ErrMissingAuth
+provides:
+  - GitHubExecutor (Executor interface implementation for source "github")
+  - parseRetryAfter helper
+affects:
+  - Unblocks 08-06 (dorks run CLI wiring via NewGitHubExecutor)
+tech-stack:
+  added: [] # zero new dependencies — stdlib net/http only
+  patterns:
+    - Retry-After backoff with single retry on 403/429
+    - httptest.Server BaseURL override for hermetic testing
+    - ErrMissingAuth wrapping for both empty token and 401 server response
+key-files:
+  created:
+    - pkg/dorks/github.go
+    - pkg/dorks/github_test.go
+  modified: []
+decisions:
+  - GitHub Code Search remains the only live dork source in Phase 8; all others stay stubbed via ErrSourceNotImplemented (unchanged from 08-01).
+  - MaxRetries defaults to 1 — single retry per Execute after honoring Retry-After. Additional retries would amplify rate-limit pressure on authenticated-only endpoint (30 req/min).
+  - Auth token rejection (HTTP 401) is mapped to ErrMissingAuth wrap rather than a generic "bad token" error, so callers can use errors.Is for a single "credentials problem" branch.
+  - Limit clamping: limit <= 0 or > 100 falls back to 30 (GitHub's default per_page). 100 is the GitHub-enforced maximum for /search/code.
+  - Path field is populated as "/" so downstream consumers get a deduplicable identifier; URL field retains the raw html_url for browser navigation.
+  - Do NOT register the executor into any global Runner here — wiring lives in cmd/dorks.go (Plan 08-06) via NewGitHubExecutor(viper.GetString("dorks.github.token")).
+metrics:
+  duration: ~10 min
+  tasks: 1
+  files: 2
+  tests: 8 subtests + 1 helper test (parseRetryAfter)
+  completed: 2026-04-05
+---
+
+# Phase 8 Plan 05: GitHub Code Search Live Executor Summary
+
+One-liner: Implements `GitHubExecutor`, the sole live dork source in Phase 8, which calls GitHub's Code Search REST API with bearer auth, honors Retry-After on 403/429, and maps response items into `pkg/dorks.Match` entries — all stdlib, hermetically tested against `httptest.NewServer`.
+
+## What Was Built
+
+### `pkg/dorks/github.go` (GitHubExecutor)
+
+- `GitHubExecutor` struct with exported `Token`, `BaseURL`, `HTTPClient`, `MaxRetries` fields (BaseURL override enables hermetic testing).
+- `NewGitHubExecutor(token string) *GitHubExecutor` — defaults BaseURL to `https://api.github.com`, HTTP client timeout to 30s, MaxRetries to 1.
+- `Source() string` — returns `"github"`, satisfying the `Executor` interface from 08-01.
+- `Execute(ctx, dork, limit) ([]Match, error)`:
+  1. Empty token → `fmt.Errorf("%w: set GITHUB_TOKEN env var or `keyhunter config set dorks.github.token ` (needs public_repo scope)", ErrMissingAuth)` — fails closed before any HTTP traffic.
+  2. Clamps `limit` to `(0, 100]`, defaults to 30.
+  3. Builds `GET {BaseURL}/search/code?q={url.QueryEscape(d.Query)}&per_page={limit}`.
+  4. Sets `Authorization: Bearer `, `Accept: application/vnd.github.v3.text-match+json` (enables `text_matches` in response), `User-Agent: keyhunter-dork-engine`.
+  5. Retry loop (up to `MaxRetries + 1` attempts):
+     - `200` → break, decode response.
+     - `401` → wrap `ErrMissingAuth` with server body (token rejected).
+     - `403` / `429` → `parseRetryAfter(Retry-After)` → `time.After` sleep → retry. Respects `ctx.Done()` during the sleep.
+     - Other statuses → return `fmt.Errorf("github search failed: %d %s", ...)`.
+  6. Decodes into `ghSearchResponse` (only fields we actually need: items[].name / path / html_url / repository.full_name / text_matches[].fragment).
+  7. Builds `Match` entries with `Path = "/"` and `Snippet = text_matches[0].fragment`. Enforces `len(out) >= limit` cap as a belt-and-suspenders guard.
+- `parseRetryAfter(string) time.Duration` — integer seconds form only (what GitHub uses for code search rate limits); unparseable or zero values fall back to 1 second.
+
+### `pkg/dorks/github_test.go` (8 subtests + helper test)
+
+All tests use a local `httptest.NewServer` and construct the executor via a private `newTestExecutor(token, baseURL)` helper that points `BaseURL` at the test server:
+
+| # | Test | What it verifies |
+|---|------|------------------|
+| 1 | `TestGitHubExecutor_Source` | `Source()` returns `"github"`. |
+| 2 | `TestGitHubExecutor_MissingTokenReturnsErrMissingAuth` | Empty token short-circuits *before* hitting HTTP (server handler calls `t.Fatalf`); error wraps `ErrMissingAuth` and contains `"GITHUB_TOKEN"` setup hint. |
+| 3 | `TestGitHubExecutor_SuccessfulSearchParsesMatches` | Asserts request carries `Authorization: Bearer test-token`, `Accept: ...text-match...`, path `/search/code`, raw decoded `q` query; response with 2 items is parsed into `[]Match` with correct `DorkID`, `Source`, `URL`, `Path="/"`, `Snippet`. |
+| 4 | `TestGitHubExecutor_LimitCapsResults` | Server returns 10 items but `limit=5` caps output at 5; asserts `per_page=5` was sent. |
+| 5 | `TestGitHubExecutor_RetryAfterSleepsAndRetries` | First hit: 403 with `Retry-After: 1` + `X-RateLimit-Remaining: 0`. Second hit: 200 with one item. Asserts 2 server hits, elapsed ≥ 900ms, and the match is returned. |
+| 6 | `TestGitHubExecutor_RateLimitExhaustedReturnsError` | Server always returns 429. With `MaxRetries=1`, asserts exactly 2 hits and error contains `"rate limit"`. |
+| 7 | `TestGitHubExecutor_UnauthorizedMapsToMissingAuth` | 401 response is wrapped with `ErrMissingAuth` via `errors.Is`. |
+| 8 | `TestGitHubExecutor_UnprocessableEntityReturnsDescriptiveError` | 422 returns error containing both `"422"` and the server's `"Validation Failed"` body. |
+| + | `TestParseRetryAfter` | Table test covers empty, `"0"`, `"1"`, `"5"`, and unparseable inputs. |
+
+## Verification
+
+```
+$ go test ./pkg/dorks/... -run GitHub -v
+=== RUN   TestGitHubExecutor_Source
+--- PASS: TestGitHubExecutor_Source (0.00s)
+=== RUN   TestGitHubExecutor_MissingTokenReturnsErrMissingAuth
+--- PASS: TestGitHubExecutor_MissingTokenReturnsErrMissingAuth (0.00s)
+=== RUN   TestGitHubExecutor_SuccessfulSearchParsesMatches
+--- PASS: TestGitHubExecutor_SuccessfulSearchParsesMatches (0.00s)
+=== RUN   TestGitHubExecutor_LimitCapsResults
+--- PASS: TestGitHubExecutor_LimitCapsResults (0.00s)
+=== RUN   TestGitHubExecutor_RetryAfterSleepsAndRetries
+--- PASS: TestGitHubExecutor_RetryAfterSleepsAndRetries (1.00s)
+=== RUN   TestGitHubExecutor_RateLimitExhaustedReturnsError
+--- PASS: TestGitHubExecutor_RateLimitExhaustedReturnsError (1.00s)
+=== RUN   TestGitHubExecutor_UnauthorizedMapsToMissingAuth
+--- PASS: TestGitHubExecutor_UnauthorizedMapsToMissingAuth (0.00s)
+=== RUN   TestGitHubExecutor_UnprocessableEntityReturnsDescriptiveError
+--- PASS: TestGitHubExecutor_UnprocessableEntityReturnsDescriptiveError (0.00s)
+PASS
+ok github.com/salvacybersec/keyhunter/pkg/dorks 2.008s
+```
+
+Full package suite (`go test ./pkg/dorks/...`) also green — no regressions in the existing loader/registry tests from 08-01.
+
+## Deviations from Plan
+
+### Rule 3 – Blocking Issue: empty `definitions/*` subdirectories broke `go:embed`
+
+- **Found during:** Task 1 first test run (`pkg/dorks/loader.go:19:12: pattern definitions/*: cannot embed directory definitions/fofa: contains no embeddable files`).
+- **Root cause:** The `//go:embed definitions/*` directive in `pkg/dorks/loader.go` requires every immediate child directory to contain at least one non-hidden, embeddable file. Several source directories (bing/fofa/gitlab/shodan at various points) were empty on disk because their content lives in parallel Wave-2 plans that hadn't landed yet in this worktree.
+- **Fix:** Added a 0-byte `placeholder.yaml` to any source directory that was otherwise empty at build time. `loadDorks` already treats empty-ID parses as non-errors (`pkg/dorks/loader.go:63-66`), so placeholders are no-ops in the registry. Placeholders are superseded automatically as real dork files land from Plans 08-02/03/04.
+- **Scope:** Strictly limited to making the pre-existing `//go:embed` compile — no semantic changes to loader or registry logic. Because other parallel waves are actively populating these directories, most placeholder files were already obsolete by the final commit step (the dirs now contain real dork YAMLs).
+- **Commit:** Bundled into the same commit as the github executor work (see "Commits" below).
+
+### Rule 1 – Bug in Plan's reference code: `urlQueryEscape` helper used wrong API
+
+- **Found during:** Task 1 implementation, reviewing the `` reference code.
+- **Issue:** The plan's reference snippet contained `(&url.URL{Path: s}).EscapedPath()` with an inline `// wrong — use url.QueryEscape` comment. The plan author flagged it for fixing; I implemented it correctly from the start using `net/url` stdlib `url.QueryEscape(d.Query)`.
+- **Fix:** Used `url.QueryEscape` directly inline in the request URL builder — no separate helper needed. Verified round-trip in `TestGitHubExecutor_SuccessfulSearchParsesMatches` (server asserts `r.URL.Query().Get("q") == "sk-proj- extension:env"`, which only passes if encoding+decoding are both correct).
+
+### No Rule 4 (architectural) issues encountered.
+
+## Auth Gates
+
+None — task is purely library code. Runtime auth (`GITHUB_TOKEN` / viper) is wired by Plan 08-06 when the `keyhunter dorks run --source=github` CLI command is built.
+
+## Known Stubs
+
+None. The executor is fully functional: missing token is a first-class error path, HTTP failures are mapped, and the retry loop is exercised by dedicated tests.
+
+## Commits
+
+Due to parallel wave activity in this worktree during execution, `pkg/dorks/github.go` and `pkg/dorks/github_test.go` were staged and committed by a neighbouring wave's bulk commit rather than their own atomic commit. Content is byte-identical to the files I authored (verified via `diff <(git show HEAD:pkg/dorks/github.go) pkg/dorks/github.go` — empty diff).
+
+- **56c11e3** — includes this plan's `pkg/dorks/github.go` + `pkg/dorks/github_test.go` alongside unrelated Shodan dork YAMLs. Not an atomic commit, but the content delivered matches the plan exactly. Follow-up tooling may choose to re-attribute; the code is in place and tested.
+
+## Requirements Satisfied
+
+- **DORK-02 (partial)** — GitHub Code Search live executor path. Remaining DORK-02 work (non-GitHub sources) remains deferred to Phase 9-16 OSINT waves, as defined in 08-CONTEXT.md.
+
+## Self-Check
+
+- [x] `pkg/dorks/github.go` exists at expected path
+- [x] `pkg/dorks/github_test.go` exists at expected path
+- [x] `go test ./pkg/dorks/... -run GitHub -v` passes (8 tests)
+- [x] `go test ./pkg/dorks/...` passes (no regressions)
+- [x] `GitHubExecutor` implements `Executor` interface (Source + Execute with matching signatures)
+- [x] `ErrMissingAuth` wrapped for both empty-token and 401 paths
+- [x] Retry-After honored on 403/429 (exercised by dedicated test)
+- [x] `limit` cap enforced even if server over-returns
+
+## Self-Check: PASSED