keyhunter/.planning/phases/11-osint_search_paste/11-01-PLAN.md

---
phase: 11-osint-search-paste
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/sources/google.go
  - pkg/recon/sources/google_test.go
  - pkg/recon/sources/bing.go
  - pkg/recon/sources/bing_test.go
  - pkg/recon/sources/duckduckgo.go
  - pkg/recon/sources/duckduckgo_test.go
  - pkg/recon/sources/yandex.go
  - pkg/recon/sources/yandex_test.go
  - pkg/recon/sources/brave.go
  - pkg/recon/sources/brave_test.go
  - pkg/recon/sources/queries.go
autonomous: true
requirements:
  - RECON-DORK-01
  - RECON-DORK-02
  - RECON-DORK-03
must_haves:
  truths:
    - "Google dorking source searches via Google Custom Search JSON API and emits findings with dork query context"
    - "Bing dorking source searches via Bing Web Search API and emits findings"
    - "DuckDuckGo, Yandex, and Brave sources each search their respective APIs/endpoints and emit findings"
    - "All five sources respect ctx cancellation and use LimiterRegistry for rate limiting"
    - "Missing API keys disable the source (Enabled=false) without error"
  artifacts:
    - path: "pkg/recon/sources/google.go"
      provides: "GoogleDorkSource implementing recon.ReconSource"
      contains: "func (s *GoogleDorkSource) Sweep"
    - path: "pkg/recon/sources/bing.go"
      provides: "BingDorkSource implementing recon.ReconSource"
      contains: "func (s *BingDorkSource) Sweep"
    - path: "pkg/recon/sources/duckduckgo.go"
      provides: "DuckDuckGoSource implementing recon.ReconSource"
      contains: "func (s *DuckDuckGoSource) Sweep"
    - path: "pkg/recon/sources/yandex.go"
      provides: "YandexSource implementing recon.ReconSource"
      contains: "func (s *YandexSource) Sweep"
    - path: "pkg/recon/sources/brave.go"
      provides: "BraveSource implementing recon.ReconSource"
      contains: "func (s *BraveSource) Sweep"
  key_links:
    - from: "pkg/recon/sources/google.go"
      to: "pkg/recon/sources/httpclient.go"
      via: "sources.Client for HTTP with retry"
      pattern: "client.Do"
    - from: "pkg/recon/sources/queries.go"
      to: "all five search sources"
      via: "formatQuery switch cases"
      pattern: 'case "google"|"bing"|"duckduckgo"|"yandex"|"brave"'
---

Implement five search engine dorking ReconSource implementations: GoogleDorkSource, BingDorkSource, DuckDuckGoSource, YandexSource, and BraveSource.

Purpose: RECON-DORK-01/02/03 -- enable automated search engine dorking for API key leak detection across five search engines (Google, Bing, DuckDuckGo, Yandex, Brave).

Output: Five source files + tests, updated queries.go formatQuery.

<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @pkg/recon/source.go @pkg/recon/sources/httpclient.go @pkg/recon/sources/queries.go @pkg/recon/sources/github.go (reference pattern for API-backed source) @pkg/recon/sources/replit.go (reference pattern for scraping source)

From pkg/recon/source.go:

type ReconSource interface {
    Name() string
    RateLimit() rate.Limit
    Burst() int
    RespectsRobots() bool
    Enabled(cfg Config) bool
    Sweep(ctx context.Context, query string, out chan<- Finding) error
}

From pkg/recon/sources/httpclient.go:

type Client struct { HTTP *http.Client; MaxRetries int; UserAgent string }
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)

From pkg/recon/sources/queries.go:

func BuildQueries(reg *providers.Registry, source string) []string
func formatQuery(source, keyword string) string  // needs new cases

From pkg/recon/sources/register.go:

type SourcesConfig struct { ... }  // will be extended in Plan 11-03

Task 1: GoogleDorkSource + BingDorkSource + formatQuery updates

Files: pkg/recon/sources/google.go, pkg/recon/sources/google_test.go, pkg/recon/sources/bing.go, pkg/recon/sources/bing_test.go, pkg/recon/sources/queries.go

Behavior:
- GoogleDorkSource.Name() == "google"
- GoogleDorkSource.RateLimit() == rate.Every(1*time.Second) (Google Custom Search: 100/day free, be conservative)
- GoogleDorkSource.Burst() == 1
- GoogleDorkSource.RespectsRobots() == false (authenticated API)
- GoogleDorkSource.Enabled() == true only when APIKey AND CX (search engine ID) are both non-empty
- GoogleDorkSource.Sweep() calls Google Custom Search JSON API: GET https://www.googleapis.com/customsearch/v1?key={key}&cx={cx}&q={query}&num=10
- Each search result item emits a Finding with Source=item.link, SourceType="recon:google", Confidence="low"
- BingDorkSource.Name() == "bing"
- BingDorkSource.RateLimit() == rate.Every(500*time.Millisecond) (Bing allows 3 TPS on S1 tier)
- BingDorkSource.Enabled() == true only when APIKey is non-empty
- BingDorkSource.Sweep() calls Bing Web Search API v7: GET https://api.bing.microsoft.com/v7.0/search?q={query}&count=50 with Ocp-Apim-Subscription-Key header
- Each webPages.value item emits Finding with Source=item.url, SourceType="recon:bing"
- formatQuery("google", kw) returns `site:pastebin.com OR site:github.com "{kw}"` (dork-style)
- formatQuery("bing", kw) returns same dork-style format
- ctx cancellation aborts both sources promptly
- Transient HTTP errors (429/5xx) are retried via sources.Client; 401 aborts sweep

Create `pkg/recon/sources/google.go`:
- Struct: `GoogleDorkSource` with fields: APIKey string, CX string, BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, client *Client
- Compile-time interface assertion: `var _ recon.ReconSource = (*GoogleDorkSource)(nil)`
- Name() returns "google"
- RateLimit() returns rate.Every(1*time.Second)
- Burst() returns 1
- RespectsRobots() returns false
- Enabled() returns s.APIKey != "" && s.CX != ""
- Sweep(): iterate BuildQueries(registry, "google"), for each query: wait on LimiterRegistry, build GET request to `{BaseURL}/customsearch/v1?key={APIKey}&cx={CX}&q={url.QueryEscape(q)}&num=10`, set Accept: application/json, call client.Do, decode JSON response `{ items: [{ title, link, snippet }] }`, emit Finding per item with Source=link, SourceType="recon:google", ProviderName from keyword index (same pattern as githubKeywordIndex), Confidence="low". On 401 abort; on transient error continue to next query.
- Private response structs: googleSearchResponse, googleSearchItem
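
The request URL and response decoding described for Google can be sketched as follows. This is a minimal illustration using only the fields the plan names (`items`, `title`, `link`, `snippet`); the helpers `buildGoogleURL` and `parseGoogleLinks` are hypothetical names introduced here, not part of the existing package.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// googleSearchItem mirrors one entry of the "items" array.
type googleSearchItem struct {
	Title   string `json:"title"`
	Link    string `json:"link"`
	Snippet string `json:"snippet"`
}

// googleSearchResponse mirrors the subset of the response the plan decodes.
type googleSearchResponse struct {
	Items []googleSearchItem `json:"items"`
}

// buildGoogleURL (hypothetical helper) assembles the Custom Search request URL.
// url.Values escapes every parameter, including the dork query.
func buildGoogleURL(baseURL, apiKey, cx, query string) string {
	v := url.Values{}
	v.Set("key", apiKey)
	v.Set("cx", cx)
	v.Set("q", query)
	v.Set("num", "10")
	return strings.TrimRight(baseURL, "/") + "/customsearch/v1?" + v.Encode()
}

// parseGoogleLinks (hypothetical helper) decodes a response body and returns
// the link of every item; each link would become Finding.Source.
func parseGoogleLinks(body []byte) ([]string, error) {
	var resp googleSearchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode google response: %w", err)
	}
	links := make([]string, 0, len(resp.Items))
	for _, it := range resp.Items {
		if it.Link != "" {
			links = append(links, it.Link)
		}
	}
	return links, nil
}

func main() {
	fmt.Println(buildGoogleURL("https://www.googleapis.com", "KEY", "CX", `site:pastebin.com "sk-"`))
	links, err := parseGoogleLinks([]byte(`{"items":[{"title":"t","link":"https://pastebin.com/abc","snippet":"s"}]}`))
	fmt.Println(links, err)
}
```

Note that `url.Values.Encode` sorts parameters by key, so the emitted order (`cx`, `key`, `num`, `q`) differs from the literal URL in the plan; the request is equivalent. Pointing `BaseURL` at an httptest server is what makes this testable without a real key.
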
Create `pkg/recon/sources/bing.go`:
- Struct: `BingDorkSource` with fields: APIKey string, BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, client *Client
- Name() returns "bing"
- RateLimit() returns rate.Every(500*time.Millisecond)
- Burst() returns 2
- RespectsRobots() returns false
- Enabled() returns s.APIKey != ""
- Sweep(): iterate BuildQueries(registry, "bing"), for each: wait on limiter, GET `{BaseURL}/v7.0/search?q={query}&count=50`, set Ocp-Apim-Subscription-Key header, decode JSON `{ webPages: { value: [{ name, url, snippet }] } }`, emit Finding per value item with Source=url, SourceType="recon:bing". Same error handling pattern.
- Private response structs: bingSearchResponse, bingWebPages, bingWebResult

Update `pkg/recon/sources/queries.go` formatQuery():
- Add cases for "google", "bing", "duckduckgo", "yandex", "brave" that return the keyword wrapped in dork syntax: `site:pastebin.com OR site:github.com "%s"` using fmt.Sprintf with the keyword. This focuses search results on paste/code hosting sites where keys leak.
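
A minimal sketch of the new `formatQuery` cases, assuming the five engines share one branch. The `default` branch here is a placeholder assumption; the existing function's other cases (for example the GitHub one) are not shown in this plan and are left out.

```go
package main

import "fmt"

// formatQuery (sketch) wraps a provider keyword in source-specific syntax.
// Only the five search-engine cases from this plan are shown; the default
// branch is an assumption standing in for whatever the existing function does.
func formatQuery(source, keyword string) string {
	switch source {
	case "google", "bing", "duckduckgo", "yandex", "brave":
		// Restrict results to paste/code hosting sites and quote the keyword.
		return fmt.Sprintf(`site:pastebin.com OR site:github.com "%s"`, keyword)
	default:
		return keyword
	}
}

func main() {
	fmt.Println(formatQuery("google", "sk-proj-"))
	// prints: site:pastebin.com OR site:github.com "sk-proj-"
}
```
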

Create test files with httptest servers returning canned JSON fixtures. Each test:
- Verifies Sweep emits correct number of findings
- Verifies SourceType is correct
- Verifies Source URLs match fixture data
- Verifies Enabled() behavior with/without credentials
- Verifies ctx cancellation returns error
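
The fixture-server pattern behind these checks might look like the sketch below. It is written as a standalone program so it runs on its own; in the real `_test.go` files the body of `run` would live in `func TestBing...(t *testing.T)` and use `t.Fatalf`. The names `newFixtureServer`, `fetchURLs`, and `run` are hypothetical, and `fetchURLs` is only a stand-in for `BingDorkSource.Sweep`.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// bingFixture is a canned response in the shape the plan describes.
const bingFixture = `{"webPages":{"value":[
  {"name":"a","url":"https://pastebin.com/x1","snippet":"s"},
  {"name":"b","url":"https://github.com/o/r","snippet":"s"}]}}`

// bingSearchResponse keeps only the field this sketch asserts on.
type bingSearchResponse struct {
	WebPages struct {
		Value []struct {
			URL string `json:"url"`
		} `json:"value"`
	} `json:"webPages"`
}

// newFixtureServer serves the fixture, or 401 when the key header is missing.
func newFixtureServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Ocp-Apim-Subscription-Key") == "" {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, bingFixture)
	}))
}

// fetchURLs is a hypothetical stand-in for BingDorkSource.Sweep.
func fetchURLs(ctx context.Context, baseURL, apiKey string) ([]string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/v7.0/search?q=dork&count=50", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Ocp-Apim-Subscription-Key", apiKey)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var parsed bingSearchResponse
	if err := json.NewDecoder(resp.Body).Decode(&parsed); err != nil {
		return nil, err
	}
	var urls []string
	for _, v := range parsed.WebPages.Value {
		urls = append(urls, v.URL)
	}
	return urls, nil
}

// run performs the checks a real test would express with t.Fatalf.
func run() error {
	srv := newFixtureServer()
	defer srv.Close()

	urls, err := fetchURLs(context.Background(), srv.URL, "test-key")
	if err != nil {
		return err
	}
	if len(urls) != 2 || urls[0] != "https://pastebin.com/x1" {
		return fmt.Errorf("unexpected urls: %v", urls)
	}

	// A context cancelled before the call must surface as an error.
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	if _, err := fetchURLs(ctx, srv.URL, "test-key"); !errors.Is(err, context.Canceled) {
		return fmt.Errorf("want context.Canceled, got %v", err)
	}
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```
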

Verify: `cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestGoogle|TestBing" -v -count=1`

Done: GoogleDorkSource and BingDorkSource pass all tests. formatQuery handles google/bing cases.

Task 2: DuckDuckGoSource + YandexSource + BraveSource

Files: pkg/recon/sources/duckduckgo.go, pkg/recon/sources/duckduckgo_test.go, pkg/recon/sources/yandex.go, pkg/recon/sources/yandex_test.go, pkg/recon/sources/brave.go, pkg/recon/sources/brave_test.go

Behavior:
- DuckDuckGoSource.Name() == "duckduckgo"
- DuckDuckGoSource.RateLimit() == rate.Every(2*time.Second) (no official API, scrape-conservative)
- DuckDuckGoSource.RespectsRobots() == true (HTML scraper)
- DuckDuckGoSource.Enabled() always true (no API key needed -- uses DuckDuckGo HTML search)
- DuckDuckGoSource.Sweep() GETs `https://html.duckduckgo.com/html/?q={query}`, parses HTML for result links in anchors, emits Findings
- YandexSource.Name() == "yandex"
- YandexSource.RateLimit() == rate.Every(1*time.Second)
- YandexSource.RespectsRobots() == false (uses Yandex XML search API)
- YandexSource.Enabled() == true only when User and APIKey are both non-empty
- YandexSource.Sweep() GETs `https://yandex.com/search/xml?user={user}&key={key}&query={q}&l10n=en&sortby=rlv&filter=none&groupby=attr%3D%22%22.mode%3Dflat.groups-on-page%3D50`, parses XML response for `<url>` elements
- BraveSource.Name() == "brave"
- BraveSource.RateLimit() == rate.Every(1*time.Second) (Brave Search API: 1 QPS free tier)
- BraveSource.Enabled() == true only when APIKey is non-empty
- BraveSource.Sweep() GETs `https://api.search.brave.com/res/v1/web/search?q={query}&count=20` with X-Subscription-Token header, decodes JSON `{ web: { results: [{ url, title }] } }`, emits Findings

Create `pkg/recon/sources/duckduckgo.go`:
- Struct: `DuckDuckGoSource` with BaseURL, Registry, Limiters, Client fields
- Name() "duckduckgo", RateLimit() Every(2s), Burst() 1, RespectsRobots() true
- Enabled() always true (credential-free, like Replit)
- Sweep(): iterate BuildQueries(registry, "duckduckgo"), for each: wait limiter, GET `{BaseURL}/html/?q={query}`, parse HTML using golang.org/x/net/html (same as Replit pattern), extract href from result anchor elements. Use a regex or attribute check: look for `<a>` tags whose class contains "result__a". Emit Finding with Source=extracted URL, SourceType="recon:duckduckgo". Deduplicate results within the same query.
Create `pkg/recon/sources/yandex.go`:
- Struct: `YandexSource` with User, APIKey, BaseURL, Registry, Limiters, client fields
- Name() "yandex", RateLimit() Every(1s), Burst() 1, RespectsRobots() false
- Enabled() returns s.User != "" && s.APIKey != ""
- Sweep(): iterate BuildQueries, for each: wait limiter, GET `{BaseURL}/search/xml?user={User}&key={APIKey}&query={url.QueryEscape(q)}&l10n=en&sortby=rlv&filter=none&groupby=attr%3D%22%22.mode%3Dflat.groups-on-page%3D50`, decode XML using encoding/xml. Response structure: `<yandexsearch><response><results><grouping><group><doc><url>...</url></doc></group></grouping></results></response></yandexsearch>`. Emit Finding per <url>. SourceType="recon:yandex".
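
Decoding the nested response with `encoding/xml` might look like the sketch below. The struct and helper names (`yandexSearchResponse`, `yandexGroup`, `yandexDoc`, `parseYandexURLs`) are hypothetical; only the element path comes from the structure given above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// yandexDoc holds the one field the plan emits a Finding for.
type yandexDoc struct {
	URL string `xml:"url"`
}

// yandexGroup is one <group>; with flat grouping it normally wraps one <doc>.
type yandexGroup struct {
	Docs []yandexDoc `xml:"doc"`
}

// yandexSearchResponse uses the "a>b>c" tag form to descend from the root
// <yandexsearch> element down to each <group>.
type yandexSearchResponse struct {
	Groups []yandexGroup `xml:"response>results>grouping>group"`
}

// parseYandexURLs (hypothetical helper) returns every non-empty <url> value.
func parseYandexURLs(body []byte) ([]string, error) {
	var resp yandexSearchResponse
	if err := xml.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode yandex response: %w", err)
	}
	var urls []string
	for _, g := range resp.Groups {
		for _, d := range g.Docs {
			if u := strings.TrimSpace(d.URL); u != "" {
				urls = append(urls, u)
			}
		}
	}
	return urls, nil
}

func main() {
	fixture := `<?xml version="1.0" encoding="utf-8"?>
<yandexsearch><response><results><grouping>
<group><doc><url>https://pastebin.com/y1</url></doc></group>
<group><doc><url>https://github.com/o/r</url></doc></group>
</grouping></results></response></yandexsearch>`
	urls, err := parseYandexURLs([]byte(fixture))
	fmt.Println(urls, err)
}
```
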

Create `pkg/recon/sources/brave.go`:
- Struct: `BraveSource` with APIKey, BaseURL, Registry, Limiters, client fields
- Name() "brave", RateLimit() Every(1s), Burst() 1, RespectsRobots() false
- Enabled() returns s.APIKey != ""
- Sweep(): iterate BuildQueries, for each: wait limiter, GET `{BaseURL}/res/v1/web/search?q={query}&count=20`, set X-Subscription-Token header to APIKey, Accept: application/json. Decode JSON `{ web: { results: [{ url, title, description }] } }`. Emit Finding per result. SourceType="recon:brave".

All three follow the same error handling pattern as Task 1: 401 aborts, transient errors continue, ctx cancellation returns immediately.
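
The shared error-handling rule can be expressed as a small decision function. This is a sketch under assumptions: it inspects the HTTP status directly, whereas the real sources may receive a typed error from `sources.Client` instead, and treating every non-200, non-401 status as "skip this query" is a choice made here, not something the plan specifies. All names (`sweepAction`, `classify`, `errNetwork`) are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// sweepAction says what Sweep should do after one request.
type sweepAction int

const (
	actionProcess sweepAction = iota // decode the body and emit findings
	actionSkip                       // transient failure: continue with the next query
	actionAbort                      // stop the sweep and return an error
)

// errNetwork is a sample transient error used by the demo below.
var errNetwork = errors.New("connection reset")

// classify (hypothetical helper) applies the plan's rule:
// ctx cancellation and 401 abort, anything else that failed is skipped.
// ctxErr is the value of ctx.Err() at the time of the decision.
func classify(ctxErr error, status int, doErr error) sweepAction {
	if ctxErr != nil {
		return actionAbort // cancellation always wins
	}
	if doErr != nil {
		return actionSkip // retries were exhausted; try the next query
	}
	switch status {
	case http.StatusOK:
		return actionProcess
	case http.StatusUnauthorized:
		return actionAbort // bad credentials will not recover mid-sweep
	default:
		return actionSkip
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	fmt.Println(classify(ctx.Err(), http.StatusOK, nil) == actionProcess)
	fmt.Println(classify(ctx.Err(), http.StatusTooManyRequests, nil) == actionSkip)
	fmt.Println(classify(ctx.Err(), 0, errNetwork) == actionSkip)
	fmt.Println(classify(ctx.Err(), http.StatusUnauthorized, nil) == actionAbort)
	cancel()
	fmt.Println(classify(ctx.Err(), http.StatusOK, nil) == actionAbort)
}
```
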

Create test files with httptest servers. DuckDuckGo test serves HTML fixture with result anchors. Yandex test serves XML fixture. Brave test serves JSON fixture. Each test covers: Sweep emits findings, SourceType correct, Enabled behavior, ctx cancellation.

Verify: `cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestDuckDuckGo|TestYandex|TestBrave" -v -count=1`

Done: DuckDuckGoSource, YandexSource, and BraveSource pass all tests. All five search sources complete.

Verification: All five search engine sources compile and pass unit tests:

```bash
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestGoogle|TestBing|TestDuckDuckGo|TestYandex|TestBrave" -v -count=1
```

<success_criteria>

  • 5 new source files exist in pkg/recon/sources/ (google.go, bing.go, duckduckgo.go, yandex.go, brave.go)
  • Each source implements recon.ReconSource with compile-time assertion
  • Each has a corresponding _test.go file with httptest-based tests
  • formatQuery in queries.go handles all 5 new source names
  • All tests pass

</success_criteria>

After completion, create `.planning/phases/11-osint_search_paste/11-01-SUMMARY.md`