- BuildQueries(reg, source) dedups keywords and formats per-source syntax
- github/gist use 'keyword' in:file; others use bare keyword
- SourcesConfig placeholder struct for Wave 2 plans to depend on
- RegisterAll no-op stub (Plan 10-09 will fill)
Documents all 4 RECON-INFRA requirement IDs as complete, summarizes
decisions (per-source limiters, default-allow robots, SHA256 dedup,
UA pool of 10), lists handoff contract for Phases 10-16.
- Stealth UA pool (10 browsers) + RandomUserAgent/StealthHeaders
- Stable cross-source Dedup keyed by sha256(provider|masked|source)
- Mark RECON-INFRA-06 complete
- Dedup drops duplicates keyed by sha256(ProviderName|KeyMasked|Source)
- Preserves input order and first-seen metadata (stable dedup)
- Same provider+masked with different Source URLs are kept separate
- Uses engine.Finding directly to avoid alias collision with Plan 09-01
- Engine.Register/List/SweepAll with ants pool fanout
- ExampleSource emits two deterministic findings (SourceType=recon:example)
- Tests cover Register/List idempotency, SweepAll aggregation, empty-registry,
and Enabled() filtering
- Parses robots.txt via temoto/robotstxt
- Caches per host for 1 hour; second call within TTL skips HTTP fetch
- Default-allow on network/parse/4xx/5xx errors
- Matches 'keyhunter' user-agent against disallowed paths
- Client field allows httptest injection
Satisfies RECON-INFRA-07.
- Pool of 10 realistic browser User-Agents (Chrome/Firefox/Safari/Edge)
- Covers Windows, macOS, Linux, iOS, Android
- RandomUserAgent returns a random pool entry
- StealthHeaders returns UA + Accept-Language header map
- Add run subcommand dispatching via dorks.Runner (github live,
other sources wrapped into friendly ErrSourceNotImplemented)
- Add add subcommand with source/category validation and embedded
ID collision guard
- Add delete subcommand that refuses embedded dork ids
- Expose newGitHubExecutor as package var for test injection
- cmd/dorks_test.go covers list filtering, add persistence + list
merge marker, invalid source rejection, embedded collision,
embedded delete refusal, custom delete, shodan not-implemented
path, GitHub missing-token auth hint, fake executor run, yaml
export merge, and info for both origins
Completes DORK-03 (list/run/add/export/info/delete) and DORK-04
(--source/--category filtering).
- Replace cmd/stubs.go dorksCmd stub with full command tree
- Add cmd/dorks.go with list, info, export subcommands
- Wire Registry + custom_dorks merge for list/export
- Bind GITHUB_TOKEN env var via viper for downstream run
Satisfies part of DORK-03 (list/info/export) and DORK-04 (source/category
filtering). run/add/delete land in Task 2.
- GitHubExecutor implements Executor interface against api.github.com/search/code
- Retry-After honored once for 403/429; ctx cancel respected during sleep
- ErrMissingAuth wrapped for empty token AND 401 server response
- 8 httptest-backed subtests cover success/limit-cap/retry/rate-limit/401/422/source
- Zero new dependencies (stdlib net/http + net/url only)
- schema.sql: CREATE TABLE IF NOT EXISTS custom_dorks with unique dork_id,
source/category indexes, and tags stored as JSON TEXT
- custom_dorks.go: Save/List/Get/GetByDorkID/Delete with JSON tag round-trip
- Tests: round-trip, newest-first ordering, not-found, unique constraint,
delete no-op, schema migration idempotency
- Dork schema with Validate() mirroring provider YAML pattern
- go:embed loader tolerating empty definitions tree
- Registry with List/Get/Stats/ListBySource/ListByCategory
- Executor interface + Runner dispatch + ErrSourceNotImplemented
- Placeholder definitions/.gitkeep and repo-root dorks/.gitkeep
- Full unit test coverage for registry, validation, and runner dispatch