| phase | plan | type | wave | depends_on | files_modified | autonomous | requirements | must_haves |
|---|---|---|---|---|---|---|---|---|
| 11-osint-search-paste | 01 | execute | 1 | | | true | | |
Purpose: RECON-DORK-01/02/03 -- enable automated search engine dorking for API key leak detection across all major search engines.
Output: Five source files + tests, updated queries.go formatQuery.
<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/github.go (reference pattern for API-backed source)
@pkg/recon/sources/replit.go (reference pattern for scraping source)

From pkg/recon/source.go:
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
From pkg/recon/sources/httpclient.go:
type Client struct { HTTP *http.Client; MaxRetries int; UserAgent string }
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
From pkg/recon/sources/queries.go:
func BuildQueries(reg *providers.Registry, source string) []string
func formatQuery(source, keyword string) string // needs new cases
From pkg/recon/sources/register.go:
type SourcesConfig struct { ... } // will be extended in Plan 11-03
Create `pkg/recon/sources/bing.go`:
- Struct: `BingDorkSource` with fields: APIKey string, BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, client *Client
- Name() returns "bing"
- RateLimit() returns rate.Every(500*time.Millisecond)
- Burst() returns 2
- RespectsRobots() returns false
- Enabled() returns s.APIKey != ""
- Sweep(): iterate BuildQueries(registry, "bing"), for each: wait on limiter, GET `{BaseURL}/v7.0/search?q={query}&count=50`, set the Ocp-Apim-Subscription-Key header, decode JSON `{ webPages: { value: [{ name, url, snippet }] } }`, emit one Finding per value item with Source=url, SourceType="recon:bing". Error handling follows the shared pattern: a 401 aborts, transient errors continue to the next query, and ctx cancellation returns immediately.
- Private response structs: bingSearchResponse, bingWebPages, bingWebResult
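The decode step for Bing can be sketched as below. This is a minimal sketch assuming only the JSON shape given above; `parseBingURLs` is a hypothetical helper name, and the real Sweep would emit a Finding per URL rather than return a slice.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Private response structs mirroring
// { webPages: { value: [{ name, url, snippet }] } }.
type bingSearchResponse struct {
	WebPages bingWebPages `json:"webPages"`
}

type bingWebPages struct {
	Value []bingWebResult `json:"value"`
}

type bingWebResult struct {
	Name    string `json:"name"`
	URL     string `json:"url"`
	Snippet string `json:"snippet"`
}

// parseBingURLs is a hypothetical helper: it decodes one response body and
// returns the result URLs that Sweep would use as Finding.Source.
func parseBingURLs(body []byte) ([]string, error) {
	var resp bingSearchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("bing: decode response: %w", err)
	}
	urls := make([]string, 0, len(resp.WebPages.Value))
	for _, r := range resp.WebPages.Value {
		if r.URL != "" { // skip items without a URL
			urls = append(urls, r.URL)
		}
	}
	return urls, nil
}

func main() {
	fixture := []byte(`{"webPages":{"value":[{"name":"paste","url":"https://pastebin.com/abc","snippet":"key"}]}}`)
	urls, err := parseBingURLs(fixture)
	fmt.Println(urls, err)
}
```

Keeping the decode in a pure function makes it easy to cover with a canned fixture, independent of the HTTP layer.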
Update `pkg/recon/sources/queries.go` formatQuery():
- Add cases for "google", "bing", "duckduckgo", "yandex", "brave" that return the keyword wrapped in dork syntax: `site:pastebin.com OR site:github.com "%s"` using fmt.Sprintf with the keyword. This focuses search results on paste/code hosting sites where keys leak.
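A minimal sketch of the new formatQuery cases follows. Only the five search-engine branches come from this plan; the default branch returning the bare keyword is an assumption about the existing function, whose other cases are not shown here.

```go
package main

import "fmt"

// formatQuery sketch: the five search engines share one dork template.
// The default branch is an assumption, not the existing implementation.
func formatQuery(source, keyword string) string {
	switch source {
	case "google", "bing", "duckduckgo", "yandex", "brave":
		// Restrict results to paste and code hosting sites.
		return fmt.Sprintf(`site:pastebin.com OR site:github.com "%s"`, keyword)
	default:
		return keyword
	}
}

func main() {
	fmt.Println(formatQuery("bing", "sk-proj-"))
}
```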
Create test files with httptest servers returning canned JSON fixtures. Each test:
- Verifies Sweep emits correct number of findings
- Verifies SourceType is correct
- Verifies Source URLs match fixture data
- Verifies Enabled() behavior with/without credentials
- Verifies ctx cancellation returns error
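The fixture-server approach can be sketched as follows. This is an illustration under assumptions, not the project's test code: `fetchURLs`, `newFixtureServer`, and `runFixtureChecks` are hypothetical names, and `fetchURLs` stands in for a single Sweep iteration against a Bing-shaped response. In a real `_test.go` file the same checks would live in `TestBing...` functions using `*testing.T`.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fetchURLs is a hypothetical stand-in for one Sweep iteration: it GETs the
// search endpoint under baseURL and returns the url fields of the results.
func fetchURLs(ctx context.Context, baseURL string) ([]string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/v7.0/search?q=test", nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err // includes context cancellation
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var body struct {
		WebPages struct {
			Value []struct {
				URL string `json:"url"`
			} `json:"value"`
		} `json:"webPages"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	urls := make([]string, 0, len(body.WebPages.Value))
	for _, v := range body.WebPages.Value {
		urls = append(urls, v.URL)
	}
	return urls, nil
}

// newFixtureServer serves a canned two-result JSON response.
func newFixtureServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"webPages":{"value":[{"url":"https://pastebin.com/a"},{"url":"https://github.com/b"}]}}`)
	}))
}

// runFixtureChecks performs one live fetch and one fetch with an
// already-cancelled context, returning both outcomes.
func runFixtureChecks() (urls []string, fetchErr, cancelErr error) {
	srv := newFixtureServer()
	defer srv.Close()

	urls, fetchErr = fetchURLs(context.Background(), srv.URL)

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // cancel before the request is sent
	_, cancelErr = fetchURLs(ctx, srv.URL)
	return urls, fetchErr, cancelErr
}

func main() {
	urls, fetchErr, cancelErr := runFixtureChecks()
	fmt.Println(urls, fetchErr, cancelErr)
}
```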
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestGoogle|TestBing" -v -count=1
GoogleDorkSource and BingDorkSource pass all tests. formatQuery handles google/bing cases.
Task 2: DuckDuckGoSource + YandexSource + BraveSource
pkg/recon/sources/duckduckgo.go, pkg/recon/sources/duckduckgo_test.go, pkg/recon/sources/yandex.go, pkg/recon/sources/yandex_test.go, pkg/recon/sources/brave.go, pkg/recon/sources/brave_test.go
- DuckDuckGoSource.Name() == "duckduckgo"
- DuckDuckGoSource.RateLimit() == rate.Every(2*time.Second) (no official API, scrape-conservative)
- DuckDuckGoSource.RespectsRobots() == true (HTML scraper)
- DuckDuckGoSource.Enabled() always true (no API key needed -- uses DuckDuckGo HTML search)
- DuckDuckGoSource.Sweep() GETs `https://html.duckduckgo.com/html/?q={query}`, parses HTML for result links in anchors, emits Findings
- YandexSource.Name() == "yandex"
- YandexSource.RateLimit() == rate.Every(1*time.Second)
- YandexSource.RespectsRobots() == false (uses Yandex XML search API)
- YandexSource.Enabled() == true only when User and APIKey are both non-empty
- YandexSource.Sweep() GETs `https://yandex.com/search/xml?user={user}&key={key}&query={q}&l10n=en&sortby=rlv&filter=none&groupby=attr%3D%22%22.mode%3Dflat.groups-on-page%3D50`, parses the XML response for `<url>` elements, emits Findings
- BraveSource.Name() == "brave"
- BraveSource.RateLimit() == rate.Every(1*time.Second) (Brave Search API: 1 QPS free tier)
- BraveSource.Enabled() == true only when APIKey is non-empty
- BraveSource.Sweep() GETs `https://api.search.brave.com/res/v1/web/search?q={query}&count=20` with X-Subscription-Token header, decodes JSON { web: { results: [{ url, title }] } }, emits Findings
Create `pkg/recon/sources/duckduckgo.go`:
- Struct: `DuckDuckGoSource` with BaseURL, Registry, Limiters, Client fields
- Name() "duckduckgo", RateLimit() Every(2s), Burst() 1, RespectsRobots() true
- Enabled() always true (credential-free, like Replit)
- Sweep(): iterate BuildQueries(registry, "duckduckgo"), for each: wait on limiter, GET `{BaseURL}/html/?q={query}`, parse the HTML using golang.org/x/net/html (same as the Replit pattern), and extract the href from result anchor (`<a>`) elements. Use a regex or attribute check: look for `<a>` tags whose class contains "result__a". Emit Finding with Source=extracted URL, SourceType="recon:duckduckgo". Deduplicate results within the same query.
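The link extraction and per-query deduplication can be sketched with the regex route mentioned above. This sketch deliberately uses only the standard library so it runs standalone; the plan itself names golang.org/x/net/html, which would be more robust against unusual markup. `extractResultLinks` and `hasClass` are hypothetical helper names, and the class check here matches "result__a" as a whole class token.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

var (
	anchorTagRe = regexp.MustCompile(`(?i)<a\s[^>]*>`)
	classAttrRe = regexp.MustCompile(`(?i)\bclass\s*=\s*"([^"]*)"`)
	hrefAttrRe  = regexp.MustCompile(`(?i)\bhref\s*=\s*"([^"]*)"`)
)

// hasClass reports whether the space-separated class list contains want.
func hasClass(classAttr, want string) bool {
	for _, c := range strings.Fields(classAttr) {
		if c == want {
			return true
		}
	}
	return false
}

// extractResultLinks returns the href of every <a> tag whose class list
// contains "result__a", deduplicated in first-seen order.
func extractResultLinks(page string) []string {
	seen := make(map[string]struct{})
	var links []string
	for _, tag := range anchorTagRe.FindAllString(page, -1) {
		class := classAttrRe.FindStringSubmatch(tag)
		if class == nil || !hasClass(class[1], "result__a") {
			continue
		}
		href := hrefAttrRe.FindStringSubmatch(tag)
		if href == nil || href[1] == "" {
			continue
		}
		link := html.UnescapeString(href[1]) // decode &amp; and similar entities
		if _, dup := seen[link]; dup {
			continue
		}
		seen[link] = struct{}{}
		links = append(links, link)
	}
	return links
}

func main() {
	page := `<a rel="nofollow" class="result__a" href="https://pastebin.com/abc">A</a>` +
		`<a class="result__snippet" href="https://example.com/skip">S</a>`
	fmt.Println(extractResultLinks(page))
}
```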
Create `pkg/recon/sources/yandex.go`:
- Struct: `YandexSource` with User, APIKey, BaseURL, Registry, Limiters, client fields
- Name() "yandex", RateLimit() Every(1s), Burst() 1, RespectsRobots() false
- Enabled() returns s.User != "" && s.APIKey != ""
- Sweep(): iterate BuildQueries, for each: wait limiter, GET `{BaseURL}/search/xml?user={User}&key={APIKey}&query={url.QueryEscape(q)}&l10n=en&sortby=rlv&filter=none&groupby=attr%3D%22%22.mode%3Dflat.groups-on-page%3D50`, decode XML using encoding/xml. Response structure: `<yandexsearch><response><results><grouping><group><doc><url>...</url></doc></group></grouping></results></response></yandexsearch>`. Emit Finding per <url>. SourceType="recon:yandex".
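The XML decode for the response structure above can be sketched with encoding/xml. The struct and function names (`yandexSearch`, `yandexGroup`, `yandexDoc`, `parseYandexURLs`) are hypothetical; only the element nesting comes from this plan.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// yandexSearch maps
// <yandexsearch><response><results><grouping><group><doc><url>.
// The "a>b>c" tag form tells encoding/xml to descend through nested elements.
type yandexSearch struct {
	XMLName xml.Name      `xml:"yandexsearch"`
	Groups  []yandexGroup `xml:"response>results>grouping>group"`
}

type yandexGroup struct {
	Docs []yandexDoc `xml:"doc"`
}

type yandexDoc struct {
	URL string `xml:"url"`
}

// parseYandexURLs is a hypothetical helper returning every <url> value,
// one per Finding that Sweep would emit.
func parseYandexURLs(body []byte) ([]string, error) {
	var resp yandexSearch
	if err := xml.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("yandex: decode response: %w", err)
	}
	var urls []string
	for _, g := range resp.Groups {
		for _, d := range g.Docs {
			if d.URL != "" {
				urls = append(urls, d.URL)
			}
		}
	}
	return urls, nil
}

func main() {
	fixture := []byte(`<yandexsearch><response><results><grouping>` +
		`<group><doc><url>https://pastebin.com/abc</url></doc></group>` +
		`</grouping></results></response></yandexsearch>`)
	urls, err := parseYandexURLs(fixture)
	fmt.Println(urls, err)
}
```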
Create `pkg/recon/sources/brave.go`:
- Struct: `BraveSource` with APIKey, BaseURL, Registry, Limiters, client fields
- Name() "brave", RateLimit() Every(1s), Burst() 1, RespectsRobots() false
- Enabled() returns s.APIKey != ""
- Sweep(): iterate BuildQueries, for each: wait limiter, GET `{BaseURL}/res/v1/web/search?q={query}&count=20`, set X-Subscription-Token header to APIKey, Accept: application/json. Decode JSON `{ web: { results: [{ url, title, description }] } }`. Emit Finding per result. SourceType="recon:brave".
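Building the Brave request might look like the sketch below. `newBraveRequest` and `exampleRequest` are hypothetical names, and the token value is a placeholder; the path, query parameters, and headers are the ones listed above. Using url.Values ensures the dork query, which contains spaces and quotes, is escaped correctly.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/url"
)

// newBraveRequest is a hypothetical helper that builds the GET request for
// one query, with the subscription token and Accept headers set.
func newBraveRequest(ctx context.Context, baseURL, apiKey, query string) (*http.Request, error) {
	params := url.Values{}
	params.Set("q", query)
	params.Set("count", "20")
	endpoint := baseURL + "/res/v1/web/search?" + params.Encode()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, fmt.Errorf("brave: build request: %w", err)
	}
	req.Header.Set("X-Subscription-Token", apiKey)
	req.Header.Set("Accept", "application/json")
	return req, nil
}

// exampleRequest builds a request with placeholder credentials.
func exampleRequest() (*http.Request, error) {
	return newBraveRequest(
		context.Background(),
		"https://api.search.brave.com",
		"example-token", // placeholder, not a real key
		`site:pastebin.com OR site:github.com "sk-proj-"`,
	)
}

func main() {
	req, err := exampleRequest()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```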
All three follow the same error handling pattern as Task 1: 401 aborts, transient errors continue, ctx cancellation returns immediately.
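The shared error handling pattern can be sketched as a loop over queries. This is an illustration under assumptions: `sweepQueries`, `errUnauthorized`, and the `fetch` callback are hypothetical, and treating every error other than a 401 or a context error as transient is an assumption about how the sources classify failures.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errUnauthorized is a hypothetical sentinel for an HTTP 401 response.
var errUnauthorized = errors.New("unauthorized (401)")

// sweepQueries runs fetch for each query: a 401 aborts the sweep, context
// errors return immediately, and any other error is skipped as transient.
func sweepQueries(ctx context.Context, queries []string, fetch func(context.Context, string) error) error {
	for _, q := range queries {
		if err := ctx.Err(); err != nil {
			return err // cancelled before this query started
		}
		err := fetch(ctx, q)
		switch {
		case err == nil:
			continue
		case errors.Is(err, errUnauthorized):
			return err // bad credentials: later queries would fail too
		case errors.Is(err, context.Canceled), errors.Is(err, context.DeadlineExceeded):
			return err
		default:
			continue // assumed transient: move on to the next query
		}
	}
	return nil
}

// runDemo sweeps four queries where the second fails transiently and the
// third returns a 401, and reports which queries were attempted.
func runDemo() (visited []string, err error) {
	fetch := func(_ context.Context, q string) error {
		visited = append(visited, q)
		switch q {
		case "q2":
			return errors.New("temporary network failure")
		case "q3":
			return errUnauthorized
		}
		return nil
	}
	err = sweepQueries(context.Background(), []string{"q1", "q2", "q3", "q4"}, fetch)
	return visited, err
}

// runCancelled sweeps with an already-cancelled context.
func runCancelled() (calls int, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	err = sweepQueries(ctx, []string{"q1"}, func(context.Context, string) error {
		calls++
		return nil
	})
	return calls, err
}

func main() {
	visited, err := runDemo()
	fmt.Println(visited, err)
	calls, err := runCancelled()
	fmt.Println(calls, err)
}
```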
Create test files with httptest servers. DuckDuckGo test serves HTML fixture with result anchors. Yandex test serves XML fixture. Brave test serves JSON fixture. Each test covers: Sweep emits findings, SourceType correct, Enabled behavior, ctx cancellation.
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestDuckDuckGo|TestYandex|TestBrave" -v -count=1
DuckDuckGoSource, YandexSource, and BraveSource pass all tests. All five search sources complete.
All five search engine sources compile and pass unit tests:
```bash
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestGoogle|TestBing|TestDuckDuckGo|TestYandex|TestBrave" -v -count=1
```
<success_criteria>
- 5 new source files exist in pkg/recon/sources/ (google.go, bing.go, duckduckgo.go, yandex.go, brave.go)
- Each source implements recon.ReconSource with compile-time assertion
- Each has a corresponding _test.go file with httptest-based tests
- formatQuery in queries.go handles all 5 new source names
- All tests pass
</success_criteria>
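The compile-time assertion named in the success criteria follows a common Go idiom, sketched below. To keep the sketch standalone, `reconSource` is a reduced, hypothetical stand-in for recon.ReconSource with only two of its methods; in the real source files the assertion would target recon.ReconSource directly.

```go
package main

import "fmt"

// reconSource is a reduced stand-in for recon.ReconSource (assumption: only
// two methods are kept so the sketch compiles without project packages).
type reconSource interface {
	Name() string
	Burst() int
}

// BraveSource is trimmed to the one field this sketch needs.
type BraveSource struct {
	APIKey string
}

func (s *BraveSource) Name() string { return "brave" }
func (s *BraveSource) Burst() int   { return 1 }

// Compile-time assertion: the build fails if *BraveSource ever stops
// satisfying the interface. No value is allocated at runtime.
var _ reconSource = (*BraveSource)(nil)

func main() {
	var src reconSource = &BraveSource{}
	fmt.Println(src.Name(), src.Burst())
}
```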