keyhunter/.planning/phases/08-dork-engine/08-05-SUMMARY.md
salvacybersec 3a1ee18198 docs(08-05): complete GitHub Code Search live executor plan
- GitHubExecutor implements Executor interface against api.github.com/search/code
- Retry-After honored once for 403/429; ctx cancel respected during sleep
- ErrMissingAuth wrapped for empty token AND 401 server response
- 8 httptest-backed subtests cover success/limit-cap/retry/rate-limit/401/422/source
- Zero new dependencies (stdlib net/http + net/url only)
2026-04-06 00:23:16 +03:00


phase: 08-dork-engine
plan: 05
subsystem: pkg/dorks
tags: dorks, github, executor, live, http, rate-limit
requires: 08-01
provides:
  • GitHubExecutor (Executor interface implementation for source "github")
  • parseRetryAfter helper
affects: Unblocks 08-06 (dorks run CLI wiring via NewGitHubExecutor)
tech-stack:
  added: none
  patterns:
    • Retry-After backoff with single retry on 403/429
    • httptest.Server BaseURL override for hermetic testing
    • ErrMissingAuth wrapping for both empty token and 401 server response
key-files:
  created:
    • pkg/dorks/github.go
    • pkg/dorks/github_test.go
  modified: none
decisions:
  • GitHub Code Search remains the only live dork source in Phase 8; all others stay stubbed via ErrSourceNotImplemented (unchanged from 08-01).
  • MaxRetries defaults to 1: a single retry per Execute after honoring Retry-After. Additional retries would amplify rate-limit pressure on an authenticated-only endpoint (30 req/min).
  • Auth token rejection (HTTP 401) is wrapped with ErrMissingAuth rather than a generic "bad token" error, so callers can use errors.Is for a single "credentials problem" branch.
  • Limit clamping: limit <= 0 or > 100 falls back to 30 (GitHub's default per_page). 100 is the GitHub-enforced maximum for /search/code.
  • Path field is populated as "<repo full_name>/<path>" so downstream consumers get a deduplicable identifier; URL field retains the raw html_url for browser navigation.
  • Do NOT register the executor into any global Runner here; wiring lives in cmd/dorks.go (Plan 08-06) via NewGitHubExecutor(viper.GetString("dorks.github.token")).
metrics:
  duration: ~10 min
  tasks: 1
  files: 2
  tests: 8 subtests + 1 helper test (parseRetryAfter)
  completed: 2026-04-05

Phase 8 Plan 05: GitHub Code Search Live Executor Summary

One-liner: Implements GitHubExecutor, the sole live dork source in Phase 8, which calls GitHub's Code Search REST API with bearer auth, honors Retry-After on 403/429, and maps response items into pkg/dorks.Match entries — all stdlib, hermetically tested against httptest.NewServer.

What Was Built

pkg/dorks/github.go (GitHubExecutor)

  • GitHubExecutor struct with exported Token, BaseURL, HTTPClient, MaxRetries fields (BaseURL override enables hermetic testing).
  • NewGitHubExecutor(token string) *GitHubExecutor — defaults BaseURL to https://api.github.com, HTTP client timeout to 30s, MaxRetries to 1.
  • Source() string — returns "github", satisfying the Executor interface from 08-01.
  • Execute(ctx, dork, limit) ([]Match, error):
    1. Empty token → fmt.Errorf("%w: set GITHUB_TOKEN env var or keyhunter config set dorks.github.token (needs public_repo scope)", ErrMissingAuth) — fails closed before any HTTP traffic.
    2. Clamps limit to (0, 100]; values outside that range (limit <= 0 or > 100) fall back to 30.
    3. Builds GET {BaseURL}/search/code?q={url.QueryEscape(d.Query)}&per_page={limit}.
    4. Sets Authorization: Bearer <token>, Accept: application/vnd.github.v3.text-match+json (enables text_matches in response), User-Agent: keyhunter-dork-engine.
    5. Retry loop (up to MaxRetries + 1 attempts):
      • 200 → break, decode response.
      • 401 → wrap ErrMissingAuth with server body (token rejected).
      • 403 / 429 → parseRetryAfter(Retry-After) → time.After sleep → retry. Respects ctx.Done() during the sleep.
      • Other statuses → return fmt.Errorf("github search failed: %d %s", ...).
    6. Decodes into ghSearchResponse (only fields we actually need: items[].name / path / html_url / repository.full_name / text_matches[].fragment).
    7. Builds Match entries with Path = "<repo>/<path>" and Snippet = text_matches[0].fragment. Stops appending once len(out) >= limit, a belt-and-suspenders guard in case the server returns more items than per_page.
  • parseRetryAfter(string) time.Duration — integer seconds form only (what GitHub uses for code search rate limits); unparseable or zero values fall back to 1 second.
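
The limit clamping, Retry-After parsing, and cancellable sleep described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in pkg/dorks/github.go: clampLimit and sleepCtx are hypothetical names, and treating a negative Retry-After like zero is an assumption the summary does not confirm.

```go
package main

import (
	"context"
	"fmt"
	"strconv"
	"time"
)

// clampLimit (hypothetical name) maps limit <= 0 or > 100 to 30,
// GitHub's default per_page, and passes valid values through.
func clampLimit(limit int) int {
	if limit <= 0 || limit > 100 {
		return 30
	}
	return limit
}

// parseRetryAfter handles only the integer-seconds form of Retry-After.
// Empty, unparseable, or zero values fall back to one second.
// Assumption: negative values are treated the same way as zero.
func parseRetryAfter(v string) time.Duration {
	n, err := strconv.Atoi(v)
	if err != nil || n <= 0 {
		return time.Second
	}
	return time.Duration(n) * time.Second
}

// sleepCtx (hypothetical name) waits for d, but returns early with
// ctx.Err() if the context is cancelled during the wait.
func sleepCtx(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	fmt.Println(clampLimit(0), clampLimit(5), clampLimit(250))
	fmt.Println(parseRetryAfter("5"), parseRetryAfter("soon"))

	// A context that is already cancelled ends the sleep immediately.
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	fmt.Println(sleepCtx(ctx, time.Minute))
}
```

Selecting on both time.After and ctx.Done() is what lets a caller abort a long Retry-After wait instead of blocking until the timer fires.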

pkg/dorks/github_test.go (8 subtests + helper test)

All tests use a local httptest.NewServer and construct the executor via a private newTestExecutor(token, baseURL) helper that points BaseURL at the test server:

| # | Test | What it verifies |
|---|------|------------------|
| 1 | TestGitHubExecutor_Source | Source() returns "github". |
| 2 | TestGitHubExecutor_MissingTokenReturnsErrMissingAuth | Empty token short-circuits before hitting HTTP (server handler calls t.Fatalf); error wraps ErrMissingAuth and contains "GITHUB_TOKEN" setup hint. |
| 3 | TestGitHubExecutor_SuccessfulSearchParsesMatches | Asserts request carries Authorization: Bearer test-token, Accept: ...text-match..., path /search/code, raw decoded q query; response with 2 items is parsed into []Match with correct DorkID, Source, URL, Path="<repo>/<path>", Snippet. |
| 4 | TestGitHubExecutor_LimitCapsResults | Server returns 10 items but limit=5 caps output at 5; asserts per_page=5 was sent. |
| 5 | TestGitHubExecutor_RetryAfterSleepsAndRetries | First hit: 403 with Retry-After: 1 + X-RateLimit-Remaining: 0. Second hit: 200 with one item. Asserts 2 server hits, elapsed ≥ 900ms, and the match is returned. |
| 6 | TestGitHubExecutor_RateLimitExhaustedReturnsError | Server always returns 429. With MaxRetries=1, asserts exactly 2 hits and error contains "rate limit". |
| 7 | TestGitHubExecutor_UnauthorizedMapsToMissingAuth | 401 response is wrapped with ErrMissingAuth via errors.Is. |
| 8 | TestGitHubExecutor_UnprocessableEntityReturnsDescriptiveError | 422 returns error containing both "422" and the server's "Validation Failed" body. |
| + | TestParseRetryAfter | Table test covers empty, "0", "1", "5", and unparseable inputs. |
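
The BaseURL-override pattern these tests rely on can be illustrated with a small standalone program. The client type, its search method, and runHermetic below are hypothetical stand-ins, not the real GitHubExecutor or newTestExecutor; only the idea of pointing BaseURL at an httptest.NewServer comes from this summary.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// client is a hypothetical stand-in for GitHubExecutor that models only
// the fields needed to show the BaseURL override.
type client struct {
	Token   string
	BaseURL string
}

// search issues one GET and returns how many items the server sent back.
func (c client) search(query string, perPage int) (int, error) {
	u := fmt.Sprintf("%s/search/code?q=%s&per_page=%d",
		c.BaseURL, url.QueryEscape(query), perPage)
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Authorization", "Bearer "+c.Token)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("search failed: %d", resp.StatusCode)
	}
	var body struct {
		Items []struct {
			Path string `json:"path"`
		} `json:"items"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return 0, err
	}
	return len(body.Items), nil
}

// runHermetic starts a local fake API, points the client at it, and
// reports what the server observed alongside the parsed item count.
func runHermetic() (auth, q string, items int, err error) {
	seen := make(chan [2]string, 1) // handler -> caller, avoids shared variables
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- [2]string{r.Header.Get("Authorization"), r.URL.Query().Get("q")}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"items":[{"path":".env"},{"path":"config/.env"}]}`)
	}))
	defer srv.Close()

	c := client{Token: "test-token", BaseURL: srv.URL}
	items, err = c.search("sk-proj- extension:env", 5)
	if err != nil {
		return "", "", 0, err
	}
	got := <-seen
	return got[0], got[1], items, nil
}

func main() {
	auth, q, n, err := runHermetic()
	fmt.Println(auth, "|", q, "|", n, "|", err)
}
```

Because the server lives on a loopback address chosen at runtime, no request ever reaches api.github.com, which is what makes this style of test hermetic.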

Verification

$ go test ./pkg/dorks/... -run GitHub -v
=== RUN   TestGitHubExecutor_Source
--- PASS: TestGitHubExecutor_Source (0.00s)
=== RUN   TestGitHubExecutor_MissingTokenReturnsErrMissingAuth
--- PASS: TestGitHubExecutor_MissingTokenReturnsErrMissingAuth (0.00s)
=== RUN   TestGitHubExecutor_SuccessfulSearchParsesMatches
--- PASS: TestGitHubExecutor_SuccessfulSearchParsesMatches (0.00s)
=== RUN   TestGitHubExecutor_LimitCapsResults
--- PASS: TestGitHubExecutor_LimitCapsResults (0.00s)
=== RUN   TestGitHubExecutor_RetryAfterSleepsAndRetries
--- PASS: TestGitHubExecutor_RetryAfterSleepsAndRetries (1.00s)
=== RUN   TestGitHubExecutor_RateLimitExhaustedReturnsError
--- PASS: TestGitHubExecutor_RateLimitExhaustedReturnsError (1.00s)
=== RUN   TestGitHubExecutor_UnauthorizedMapsToMissingAuth
--- PASS: TestGitHubExecutor_UnauthorizedMapsToMissingAuth (0.00s)
=== RUN   TestGitHubExecutor_UnprocessableEntityReturnsDescriptiveError
--- PASS: TestGitHubExecutor_UnprocessableEntityReturnsDescriptiveError (0.00s)
PASS
ok  	github.com/salvacybersec/keyhunter/pkg/dorks	2.008s

Full package suite (go test ./pkg/dorks/...) also green — no regressions in the existing loader/registry tests from 08-01.

Deviations from Plan

Rule 3 Blocking Issue: empty definitions/* subdirectories broke go:embed

  • Found during: Task 1 first test run (pkg/dorks/loader.go:19:12: pattern definitions/*: cannot embed directory definitions/fofa: contains no embeddable files).
  • Root cause: The //go:embed definitions/* directive in pkg/dorks/loader.go requires every immediate child directory to contain at least one non-hidden, embeddable file. Several source directories (bing/fofa/gitlab/shodan at various points) were empty on disk because their content lives in parallel Wave-2 plans that hadn't landed yet in this worktree.
  • Fix: Added a 0-byte placeholder.yaml to any source directory that was otherwise empty at build time. loadDorks already treats empty-ID parses as non-errors (pkg/dorks/loader.go:63-66), so placeholders are no-ops in the registry. Placeholders are superseded automatically as real dork files land from Plans 08-02/03/04.
  • Scope: Strictly limited to making the pre-existing //go:embed compile — no semantic changes to loader or registry logic. Because other parallel waves are actively populating these directories, most placeholder files were already obsolete by the final commit step (the dirs now contain real dork YAMLs).
  • Commit: Bundled into the same commit as the github executor work (see "Commits" below).

Rule 1 Bug in Plan's reference code: urlQueryEscape helper used wrong API

  • Found during: Task 1 implementation, reviewing the <action> reference code.
  • Issue: The plan's reference snippet contained (&url.URL{Path: s}).EscapedPath() with an inline // wrong — use url.QueryEscape comment. The plan author flagged it for fixing; I implemented it correctly from the start using net/url stdlib url.QueryEscape(d.Query).
  • Fix: Used url.QueryEscape directly inline in the request URL builder — no separate helper needed. Verified round-trip in TestGitHubExecutor_SuccessfulSearchParsesMatches (server asserts r.URL.Query().Get("q") == "sk-proj- extension:env", which only passes if encoding+decoding are both correct).
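
To illustrate why the path-style escaping is the wrong tool for a query parameter, the sketch below round-trips a value through both escapers. The helper names viaQueryEscape and viaPathEscape are hypothetical, and the second sample query ("a+b&c=d") is an invented input chosen to contain characters that are significant inside a query string.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeQ parses "q=<escaped>" the way a server would and returns q.
func decodeQ(escaped string) string {
	vals, err := url.ParseQuery("q=" + escaped)
	if err != nil {
		return ""
	}
	return vals.Get("q")
}

// viaQueryEscape (hypothetical helper) escapes s with url.QueryEscape,
// the approach used for the q parameter, then decodes it again.
func viaQueryEscape(s string) string {
	return decodeQ(url.QueryEscape(s))
}

// viaPathEscape (hypothetical helper) reproduces the path-style escaping
// from the plan's reference snippet, then decodes the result as a query.
func viaPathEscape(s string) string {
	return decodeQ((&url.URL{Path: s}).EscapedPath())
}

func main() {
	for _, q := range []string{"sk-proj- extension:env", "a+b&c=d"} {
		fmt.Printf("input=%q query-escaped=%q path-escaped=%q\n",
			q, viaQueryEscape(q), viaPathEscape(q))
	}
}
```

Path escaping leaves characters such as +, & and = untouched because they are legal in a path, so a dork containing them would be split or altered when the server parses the query string; url.QueryEscape percent-encodes them and the value survives intact.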

No Rule 4 (architectural) issues encountered.

Auth Gates

None — task is purely library code. Runtime auth (GITHUB_TOKEN / viper) is wired by Plan 08-06 when the keyhunter dorks run --source=github CLI command is built.

Known Stubs

None. The executor is fully functional: missing token is a first-class error path, HTTP failures are mapped, and the retry loop is exercised by dedicated tests.
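
The single "credentials problem" branch that the ErrMissingAuth wrapping makes possible for callers can be sketched as below. The sentinel's message text, the 401 wording, and the helper names are assumptions for illustration; only the use of %w wrapping and errors.Is is taken from this summary.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMissingAuth mirrors the sentinel from pkg/dorks; the message text
// here is an assumption.
var ErrMissingAuth = errors.New("missing auth")

// emptyTokenErr models the fail-closed path taken before any HTTP traffic.
func emptyTokenErr() error {
	return fmt.Errorf("%w: set GITHUB_TOKEN env var or keyhunter config set dorks.github.token", ErrMissingAuth)
}

// rejectedTokenErr models a 401 response; the wording is hypothetical.
func rejectedTokenErr(body string) error {
	return fmt.Errorf("%w: github rejected token: %s", ErrMissingAuth, body)
}

// searchFailedErr models any other non-200 status.
func searchFailedErr(status int, body string) error {
	return fmt.Errorf("github search failed: %d %s", status, body)
}

// classify shows the caller-side branch: both auth failures match the
// same errors.Is check, everything else falls through.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrMissingAuth):
		return "credentials"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(classify(emptyTokenErr()))
	fmt.Println(classify(rejectedTokenErr("Bad credentials")))
	fmt.Println(classify(searchFailedErr(422, "Validation Failed")))
	fmt.Println(classify(nil))
}
```

Wrapping with %w keeps the descriptive message for humans while letting errors.Is match the sentinel, so a CLI layer can print one setup hint for both failure modes.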

Commits

Due to parallel wave activity in this worktree during execution, pkg/dorks/github.go and pkg/dorks/github_test.go were staged and committed by a neighbouring wave's bulk commit rather than their own atomic commit. Content is byte-identical to the files I authored (verified via diff <(git show HEAD:pkg/dorks/github.go) pkg/dorks/github.go — empty diff).

  • 56c11e3 — includes this plan's pkg/dorks/github.go + pkg/dorks/github_test.go alongside unrelated Shodan dork YAMLs. Not an atomic commit, but the content delivered matches the plan exactly. Follow-up tooling may choose to re-attribute; the code is in place and tested.

Requirements Satisfied

  • DORK-02 (partial): GitHub Code Search live executor path. The rest of DORK-02 (non-GitHub sources) stays deferred to the Phase 9-16 OSINT waves, as defined in 08-CONTEXT.md.

Self-Check

  • pkg/dorks/github.go exists at expected path
  • pkg/dorks/github_test.go exists at expected path
  • go test ./pkg/dorks/... -run GitHub -v passes (8 tests)
  • go test ./pkg/dorks/... passes (no regressions)
  • GitHubExecutor implements Executor interface (Source + Execute with matching signatures)
  • ErrMissingAuth wrapped for both empty-token and 401 paths
  • Retry-After honored on 403/429 (exercised by dedicated test)
  • limit cap enforced even if server over-returns

Self-Check: PASSED