---
phase: 11-osint-search-paste
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/sources/google.go
  - pkg/recon/sources/google_test.go
  - pkg/recon/sources/bing.go
  - pkg/recon/sources/bing_test.go
  - pkg/recon/sources/duckduckgo.go
  - pkg/recon/sources/duckduckgo_test.go
  - pkg/recon/sources/yandex.go
  - pkg/recon/sources/yandex_test.go
  - pkg/recon/sources/brave.go
  - pkg/recon/sources/brave_test.go
  - pkg/recon/sources/queries.go
autonomous: true
requirements: [RECON-DORK-01, RECON-DORK-02, RECON-DORK-03]

must_haves:
  truths:
    - "Google dorking source searches via Google Custom Search JSON API and emits findings with dork query context"
    - "Bing dorking source searches via Bing Web Search API and emits findings"
    - "DuckDuckGo, Yandex, and Brave sources each search their respective APIs/endpoints and emit findings"
    - "All five sources respect ctx cancellation and use LimiterRegistry for rate limiting"
    - "Missing API keys disable the source (Enabled=false) without error"
  artifacts:
    - path: "pkg/recon/sources/google.go"
      provides: "GoogleDorkSource implementing recon.ReconSource"
      contains: "func (s *GoogleDorkSource) Sweep"
    - path: "pkg/recon/sources/bing.go"
      provides: "BingDorkSource implementing recon.ReconSource"
      contains: "func (s *BingDorkSource) Sweep"
    - path: "pkg/recon/sources/duckduckgo.go"
      provides: "DuckDuckGoSource implementing recon.ReconSource"
      contains: "func (s *DuckDuckGoSource) Sweep"
    - path: "pkg/recon/sources/yandex.go"
      provides: "YandexSource implementing recon.ReconSource"
      contains: "func (s *YandexSource) Sweep"
    - path: "pkg/recon/sources/brave.go"
      provides: "BraveSource implementing recon.ReconSource"
      contains: "func (s *BraveSource) Sweep"
  key_links:
    - from: "pkg/recon/sources/google.go"
      to: "pkg/recon/sources/httpclient.go"
      via: "sources.Client for HTTP with retry"
      pattern: "client\\.Do"
    - from: "pkg/recon/sources/queries.go"
      to: "all five search sources"
      via: "formatQuery switch cases"
      pattern: "case \"google\"|\"bing\"|\"duckduckgo\"|\"yandex\"|\"brave\""
---

<objective>
Implement five search engine dorking sources that satisfy recon.ReconSource: GoogleDorkSource, BingDorkSource, DuckDuckGoSource, YandexSource, and BraveSource.

Purpose: RECON-DORK-01/02/03 -- enable automated search engine dorking for API key leak detection across all major search engines.
Output: Five source files with tests, plus an updated formatQuery in queries.go.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/github.go (reference pattern for API-backed source)
@pkg/recon/sources/replit.go (reference pattern for scraping source)

<interfaces>
<!-- Executor needs these contracts -->

From pkg/recon/source.go:
```go
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```

From pkg/recon/sources/httpclient.go:
```go
type Client struct { HTTP *http.Client; MaxRetries int; UserAgent string }
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```

From pkg/recon/sources/queries.go:
```go
func BuildQueries(reg *providers.Registry, source string) []string
func formatQuery(source, keyword string) string // needs new cases
```

From pkg/recon/sources/register.go:
```go
type SourcesConfig struct { ... } // will be extended in Plan 11-03
```
</interfaces>
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: GoogleDorkSource + BingDorkSource + formatQuery updates</name>
<files>pkg/recon/sources/google.go, pkg/recon/sources/google_test.go, pkg/recon/sources/bing.go, pkg/recon/sources/bing_test.go, pkg/recon/sources/queries.go</files>
<behavior>
- GoogleDorkSource.Name() == "google"
- GoogleDorkSource.RateLimit() == rate.Every(1*time.Second) (the Custom Search JSON API free tier allows 100 queries/day, so stay conservative)
- GoogleDorkSource.Burst() == 1
- GoogleDorkSource.RespectsRobots() == false (authenticated API)
- GoogleDorkSource.Enabled() == true only when APIKey AND CX (search engine ID) are both non-empty
- GoogleDorkSource.Sweep() calls the Google Custom Search JSON API: GET https://www.googleapis.com/customsearch/v1?key={key}&cx={cx}&q={query}&num=10 (10 is the per-request maximum for num)
- Each search result item emits a Finding with Source=item.link, SourceType="recon:google", Confidence="low"
- BingDorkSource.Name() == "bing"
- BingDorkSource.RateLimit() == rate.Every(500*time.Millisecond) (2 req/s, under the 3 TPS cap of Bing's free F1 tier)
- BingDorkSource.Enabled() == true only when APIKey is non-empty
- BingDorkSource.Sweep() calls Bing Web Search API v7: GET https://api.bing.microsoft.com/v7.0/search?q={query}&count=50 with the Ocp-Apim-Subscription-Key header
- Each webPages.value item emits a Finding with Source=item.url, SourceType="recon:bing"
- formatQuery("google", kw) returns `site:pastebin.com OR site:github.com "{kw}"` (dork-style)
- formatQuery("bing", kw) returns the same dork-style format
- ctx cancellation aborts both sources promptly
- Transient HTTP errors (429/5xx) are retried via sources.Client; a 401 aborts the sweep
</behavior>

<action>
Create `pkg/recon/sources/google.go`:
- Struct: `GoogleDorkSource` with fields: APIKey string, CX string, BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, client *Client
- Compile-time interface assertion: `var _ recon.ReconSource = (*GoogleDorkSource)(nil)`
- Name() returns "google"
- RateLimit() returns rate.Every(1*time.Second)
- Burst() returns 1
- RespectsRobots() returns false
- Enabled() returns s.APIKey != "" && s.CX != ""
- Sweep(): iterate BuildQueries(s.Registry, "google"). For each query: wait on the LimiterRegistry; build a GET request to `{BaseURL}/customsearch/v1?key={APIKey}&cx={CX}&q={q}&num=10`, encoding every parameter (e.g. with url.Values) rather than escaping only q; set Accept: application/json; call client.Do; decode the JSON response `{ items: [{ title, link, snippet }] }`; emit one Finding per item with Source=link, SourceType="recon:google", ProviderName from the keyword index (same pattern as githubKeywordIndex), and Confidence="low". On a 401, abort the sweep. On a transient error, continue to the next query.
- Private response structs: googleSearchResponse, googleSearchItem

Create `pkg/recon/sources/bing.go`:
- Struct: `BingDorkSource` with fields: APIKey string, BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, client *Client
- Name() returns "bing"
- RateLimit() returns rate.Every(500*time.Millisecond)
- Burst() returns 2
- RespectsRobots() returns false
- Enabled() returns s.APIKey != ""
- Sweep(): iterate BuildQueries(s.Registry, "bing"). For each query: wait on the limiter; GET `{BaseURL}/v7.0/search?q={url.QueryEscape(q)}&count=50` with the Ocp-Apim-Subscription-Key header set to APIKey; decode the JSON `{ webPages: { value: [{ name, url, snippet }] } }`; emit one Finding per value item with Source=url, SourceType="recon:bing". Error handling matches Google: a 401 aborts, and transient errors continue to the next query.
- Private response structs: bingSearchResponse, bingWebPages, bingWebResult

Update `pkg/recon/sources/queries.go` formatQuery():
- Add cases for "google", "bing", "duckduckgo", "yandex", and "brave" that wrap the keyword in dork syntax, `site:pastebin.com OR site:github.com "%s"`, via fmt.Sprintf. This focuses results on paste and code-hosting sites, where keys leak.
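
A sketch of the new formatQuery cases, assuming the existing function is a `switch` on the source name. The `default` branch here is a placeholder. The real function keeps its pre-existing per-source cases, which are not shown in this plan.

```go
package main

import "fmt"

// dorkTemplate narrows results to paste and code-hosting sites.
const dorkTemplate = `site:pastebin.com OR site:github.com "%s"`

// formatQuery shows only the five search-engine cases added by this plan.
func formatQuery(source, keyword string) string {
	switch source {
	case "google", "bing", "duckduckgo", "yandex", "brave":
		return fmt.Sprintf(dorkTemplate, keyword)
	default:
		return keyword // placeholder for the existing per-source cases
	}
}

func main() {
	fmt.Println(formatQuery("google", "sk-ant-"))
}
```

Sharing one case keeps all five engines on the same dork. That also keeps the `case "google"` pattern in the plan's key_links matchable.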

Create test files with httptest servers returning canned JSON fixtures. Each test:
- Verifies Sweep emits the correct number of findings
- Verifies SourceType is correct
- Verifies Source URLs match the fixture data
- Verifies Enabled() behavior with and without credentials
- Verifies ctx cancellation returns an error

</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestGoogle|TestBing" -v -count=1</automated>
</verify>
<done>GoogleDorkSource and BingDorkSource pass all tests. formatQuery handles all five search-engine cases.</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: DuckDuckGoSource + YandexSource + BraveSource</name>
<files>pkg/recon/sources/duckduckgo.go, pkg/recon/sources/duckduckgo_test.go, pkg/recon/sources/yandex.go, pkg/recon/sources/yandex_test.go, pkg/recon/sources/brave.go, pkg/recon/sources/brave_test.go</files>
<behavior>
- DuckDuckGoSource.Name() == "duckduckgo"
- DuckDuckGoSource.RateLimit() == rate.Every(2*time.Second) (no official web search API, so scrape conservatively)
- DuckDuckGoSource.RespectsRobots() == true (HTML scraper)
- DuckDuckGoSource.Enabled() always true (no API key needed -- uses DuckDuckGo HTML search)
- DuckDuckGoSource.Sweep() GETs `https://html.duckduckgo.com/html/?q={query}`, parses the HTML for result links in `<a class="result__a" href="...">` anchors, and emits Findings
- YandexSource.Name() == "yandex"
- YandexSource.RateLimit() == rate.Every(1*time.Second)
- YandexSource.RespectsRobots() == false (uses the Yandex XML search API)
- YandexSource.Enabled() == true only when User and APIKey are both non-empty
- YandexSource.Sweep() GETs `https://yandex.com/search/xml?user={user}&key={key}&query={q}&l10n=en&sortby=rlv&filter=none&groupby=attr%3D%22%22.mode%3Dflat.groups-on-page%3D50` and parses the XML response for `<url>` elements
- BraveSource.Name() == "brave"
- BraveSource.RateLimit() == rate.Every(1*time.Second) (Brave Search API free tier: 1 QPS)
- BraveSource.Enabled() == true only when APIKey is non-empty
- BraveSource.Sweep() GETs `https://api.search.brave.com/res/v1/web/search?q={query}&count=20` with the X-Subscription-Token header, decodes JSON `{ web: { results: [{ url, title }] } }`, and emits Findings
</behavior>

<action>
Create `pkg/recon/sources/duckduckgo.go`:
- Struct: `DuckDuckGoSource` with BaseURL, Registry, Limiters, client fields
- Name() "duckduckgo", RateLimit() Every(2s), Burst() 1, RespectsRobots() true
- Enabled() always true (credential-free, like Replit)
- Sweep(): iterate BuildQueries(s.Registry, "duckduckgo"). For each query: wait on the limiter; GET `{BaseURL}/html/?q={url.QueryEscape(q)}`; parse the HTML with golang.org/x/net/html (same as the Replit pattern); collect the href of each `<a>` element whose class attribute contains the token "result__a" (optionally also "result__url"). Check attributes on the parsed nodes rather than matching with a regex. Emit a Finding with Source=extracted URL, SourceType="recon:duckduckgo". Deduplicate results within the same query.

Create `pkg/recon/sources/yandex.go`:
- Struct: `YandexSource` with User, APIKey, BaseURL, Registry, Limiters, client fields
- Name() "yandex", RateLimit() Every(1s), Burst() 1, RespectsRobots() false
- Enabled() returns s.User != "" && s.APIKey != ""
- Sweep(): iterate BuildQueries(s.Registry, "yandex"). For each query: wait on the limiter; GET `{BaseURL}/search/xml?user={User}&key={APIKey}&query={url.QueryEscape(q)}&l10n=en&sortby=rlv&filter=none&groupby=attr%3D%22%22.mode%3Dflat.groups-on-page%3D50` (the groupby value shown is already percent-encoded, so do not escape it again); decode the XML with encoding/xml. Response structure: `<yandexsearch><response><results><grouping><group><doc><url>...</url></doc></group></grouping></results></response></yandexsearch>`. Emit one Finding per `<url>`, with SourceType="recon:yandex".

Create `pkg/recon/sources/brave.go`:
- Struct: `BraveSource` with APIKey, BaseURL, Registry, Limiters, client fields
- Name() "brave", RateLimit() Every(1s), Burst() 1, RespectsRobots() false
- Enabled() returns s.APIKey != ""
- Sweep(): iterate BuildQueries(s.Registry, "brave"). For each query: wait on the limiter; GET `{BaseURL}/res/v1/web/search?q={url.QueryEscape(q)}&count=20`; set the X-Subscription-Token header to APIKey and Accept: application/json; decode the JSON `{ web: { results: [{ url, title, description }] } }`; emit one Finding per result with SourceType="recon:brave".

All three follow the same error-handling pattern as Task 1: a 401 aborts the sweep, transient errors continue to the next query, and ctx cancellation returns immediately.

Create test files with httptest servers. The DuckDuckGo test serves an HTML fixture with result anchors, the Yandex test serves an XML fixture, and the Brave test serves a JSON fixture. Each test covers: Sweep emits findings, SourceType is correct, Enabled behavior, and ctx cancellation.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestDuckDuckGo|TestYandex|TestBrave" -v -count=1</automated>
</verify>
<done>DuckDuckGoSource, YandexSource, and BraveSource pass all tests. All five search sources are complete.</done>
</task>

</tasks>

<verification>
All five search engine sources compile and pass unit tests:
```bash
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestGoogle|TestBing|TestDuckDuckGo|TestYandex|TestBrave" -v -count=1
```
</verification>

<success_criteria>
- 5 new source files exist in pkg/recon/sources/ (google.go, bing.go, duckduckgo.go, yandex.go, brave.go)
- Each source implements recon.ReconSource with a compile-time assertion
- Each has a corresponding _test.go file with httptest-based tests
- formatQuery in queries.go handles all 5 new source names
- All tests pass
</success_criteria>

<output>
After completion, create `.planning/phases/11-osint-search-paste/11-01-SUMMARY.md`
</output>