docs(11): create phase plan — 3 plans for search engine dorking + paste sites
241
.planning/phases/11-osint_search_paste/11-01-PLAN.md
Normal file
@@ -0,0 +1,241 @@
---
phase: 11-osint-search-paste
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/sources/google.go
  - pkg/recon/sources/google_test.go
  - pkg/recon/sources/bing.go
  - pkg/recon/sources/bing_test.go
  - pkg/recon/sources/duckduckgo.go
  - pkg/recon/sources/duckduckgo_test.go
  - pkg/recon/sources/yandex.go
  - pkg/recon/sources/yandex_test.go
  - pkg/recon/sources/brave.go
  - pkg/recon/sources/brave_test.go
  - pkg/recon/sources/queries.go
autonomous: true
requirements: [RECON-DORK-01, RECON-DORK-02, RECON-DORK-03]

must_haves:
  truths:
    - "Google dorking source searches via Google Custom Search JSON API and emits findings with dork query context"
    - "Bing dorking source searches via Bing Web Search API and emits findings"
    - "DuckDuckGo, Yandex, and Brave sources each search their respective APIs/endpoints and emit findings"
    - "All five sources respect ctx cancellation and use LimiterRegistry for rate limiting"
    - "Missing API keys disable the source (Enabled=false) without error"
  artifacts:
    - path: "pkg/recon/sources/google.go"
      provides: "GoogleDorkSource implementing recon.ReconSource"
      contains: "func (s *GoogleDorkSource) Sweep"
    - path: "pkg/recon/sources/bing.go"
      provides: "BingDorkSource implementing recon.ReconSource"
      contains: "func (s *BingDorkSource) Sweep"
    - path: "pkg/recon/sources/duckduckgo.go"
      provides: "DuckDuckGoSource implementing recon.ReconSource"
      contains: "func (s *DuckDuckGoSource) Sweep"
    - path: "pkg/recon/sources/yandex.go"
      provides: "YandexSource implementing recon.ReconSource"
      contains: "func (s *YandexSource) Sweep"
    - path: "pkg/recon/sources/brave.go"
      provides: "BraveSource implementing recon.ReconSource"
      contains: "func (s *BraveSource) Sweep"
  key_links:
    - from: "pkg/recon/sources/google.go"
      to: "pkg/recon/sources/httpclient.go"
      via: "sources.Client for HTTP with retry"
      pattern: "client\\.Do"
    - from: "pkg/recon/sources/queries.go"
      to: "all five search sources"
      via: "formatQuery switch cases"
      pattern: "case \"google\"|\"bing\"|\"duckduckgo\"|\"yandex\"|\"brave\""
---

<objective>
Implement five search engine dorking ReconSource implementations: GoogleDorkSource, BingDorkSource, DuckDuckGoSource, YandexSource, and BraveSource.

Purpose: RECON-DORK-01/02/03 -- enable automated search engine dorking for API key leak detection across all major search engines.
Output: Five source files + tests, updated queries.go formatQuery.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/github.go (reference pattern for API-backed source)
@pkg/recon/sources/replit.go (reference pattern for scraping source)
<interfaces>
<!-- Executor needs these contracts -->

From pkg/recon/source.go:
```go
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```
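The compile-time assertion pattern the plans rely on can be sketched as below. This is a standalone illustration with stand-in types (the interface here is trimmed to two methods; the real one lives in pkg/recon/source.go), not the project's actual code:

```go
package main

import "fmt"

// Stand-in for recon.ReconSource, trimmed to two methods for illustration.
type ReconSource interface {
	Name() string
	Enabled() bool
}

type GoogleDorkSource struct{ APIKey, CX string }

// Compile-time proof that *GoogleDorkSource satisfies the interface:
// this line fails to build if any method is missing or mistyped.
var _ ReconSource = (*GoogleDorkSource)(nil)

func (s *GoogleDorkSource) Name() string { return "google" }

// Enabled mirrors the plan's rule: both credentials must be non-empty.
func (s *GoogleDorkSource) Enabled() bool { return s.APIKey != "" && s.CX != "" }

func main() {
	s := &GoogleDorkSource{APIKey: "k", CX: "c"}
	fmt.Println(s.Name(), s.Enabled())
}
```

The assertion costs nothing at runtime; it only forces the compiler to check the method set.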

From pkg/recon/sources/httpclient.go:
```go
type Client struct { HTTP *http.Client; MaxRetries int; UserAgent string }
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```

From pkg/recon/sources/queries.go:
```go
func BuildQueries(reg *providers.Registry, source string) []string
func formatQuery(source, keyword string) string // needs new cases
```

From pkg/recon/sources/register.go:
```go
type SourcesConfig struct { ... } // will be extended in Plan 11-03
```
</interfaces>
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: GoogleDorkSource + BingDorkSource + formatQuery updates</name>
<files>pkg/recon/sources/google.go, pkg/recon/sources/google_test.go, pkg/recon/sources/bing.go, pkg/recon/sources/bing_test.go, pkg/recon/sources/queries.go</files>
<behavior>
- GoogleDorkSource.Name() == "google"
- GoogleDorkSource.RateLimit() == rate.Every(1*time.Second) (Google Custom Search: 100 queries/day on the free tier, so be conservative)
- GoogleDorkSource.Burst() == 1
- GoogleDorkSource.RespectsRobots() == false (authenticated API)
- GoogleDorkSource.Enabled() == true only when APIKey AND CX (search engine ID) are both non-empty
- GoogleDorkSource.Sweep() calls the Google Custom Search JSON API: GET https://www.googleapis.com/customsearch/v1?key={key}&cx={cx}&q={query}&num=10
- Each search result item emits a Finding with Source=item.link, SourceType="recon:google", Confidence="low"
- BingDorkSource.Name() == "bing"
- BingDorkSource.RateLimit() == rate.Every(500*time.Millisecond) (Bing allows 3 TPS on the S1 tier)
- BingDorkSource.Enabled() == true only when APIKey is non-empty
- BingDorkSource.Sweep() calls the Bing Web Search API v7: GET https://api.bing.microsoft.com/v7.0/search?q={query}&count=50 with the Ocp-Apim-Subscription-Key header
- Each webPages.value item emits a Finding with Source=item.url, SourceType="recon:bing"
- formatQuery("google", kw) returns `site:pastebin.com OR site:github.com "{kw}"` (dork-style)
- formatQuery("bing", kw) returns the same dork-style format
- ctx cancellation aborts both sources promptly
- Transient HTTP errors (429/5xx) are retried via sources.Client; 401 aborts the sweep
</behavior>
<action>
Create `pkg/recon/sources/google.go`:
- Struct: `GoogleDorkSource` with fields: APIKey string, CX string, BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, client *Client
- Compile-time interface assertion: `var _ recon.ReconSource = (*GoogleDorkSource)(nil)`
- Name() returns "google"
- RateLimit() returns rate.Every(1*time.Second)
- Burst() returns 1
- RespectsRobots() returns false
- Enabled() returns s.APIKey != "" && s.CX != ""
- Sweep(): iterate BuildQueries(registry, "google"). For each query: wait on LimiterRegistry, build a GET request to `{BaseURL}/customsearch/v1?key={APIKey}&cx={CX}&q={url.QueryEscape(q)}&num=10`, set Accept: application/json, call client.Do, decode the JSON response `{ items: [{ title, link, snippet }] }`, and emit a Finding per item with Source=link, SourceType="recon:google", ProviderName from the keyword index (same pattern as githubKeywordIndex), Confidence="low". On 401 abort; on transient error continue to the next query.
- Private response structs: googleSearchResponse, googleSearchItem

Create `pkg/recon/sources/bing.go`:
- Struct: `BingDorkSource` with fields: APIKey string, BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, client *Client
- Name() returns "bing"
- RateLimit() returns rate.Every(500*time.Millisecond)
- Burst() returns 2
- RespectsRobots() returns false
- Enabled() returns s.APIKey != ""
- Sweep(): iterate BuildQueries(registry, "bing"). For each: wait on the limiter, GET `{BaseURL}/v7.0/search?q={query}&count=50`, set the Ocp-Apim-Subscription-Key header, decode the JSON `{ webPages: { value: [{ name, url, snippet }] } }`, and emit a Finding per value item with Source=url, SourceType="recon:bing". Same error handling pattern.
- Private response structs: bingSearchResponse, bingWebPages, bingWebResult

Update `pkg/recon/sources/queries.go` formatQuery():
- Add cases for "google", "bing", "duckduckgo", "yandex", "brave" that return the keyword wrapped in dork syntax: `site:pastebin.com OR site:github.com "%s"` using fmt.Sprintf with the keyword. This focuses search results on paste/code hosting sites where keys leak.
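The formatQuery update above can be sketched as a standalone function. The switch shape is an assumption about how queries.go is structured; the dork template itself is the one the plan specifies:

```go
package main

import "fmt"

// formatQuery sketch: the five new engines share one dork template that
// scopes results to paste/code hosts where keys tend to leak. The default
// branch stands in for the existing per-source cases in queries.go.
func formatQuery(source, keyword string) string {
	switch source {
	case "google", "bing", "duckduckgo", "yandex", "brave":
		return fmt.Sprintf(`site:pastebin.com OR site:github.com "%s"`, keyword)
	default:
		return keyword
	}
}

func main() {
	fmt.Println(formatQuery("google", "sk_live"))
}
```

Grouping all five engine names in one case keeps the template in a single place, so a future tweak to the dork syntax touches one line.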

Create test files with httptest servers returning canned JSON fixtures. Each test:
- Verifies Sweep emits the correct number of findings
- Verifies SourceType is correct
- Verifies Source URLs match the fixture data
- Verifies Enabled() behavior with and without credentials
- Verifies ctx cancellation returns an error
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestGoogle|TestBing" -v -count=1</automated>
</verify>
<done>GoogleDorkSource and BingDorkSource pass all tests. formatQuery handles the google/bing cases.</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: DuckDuckGoSource + YandexSource + BraveSource</name>
<files>pkg/recon/sources/duckduckgo.go, pkg/recon/sources/duckduckgo_test.go, pkg/recon/sources/yandex.go, pkg/recon/sources/yandex_test.go, pkg/recon/sources/brave.go, pkg/recon/sources/brave_test.go</files>
<behavior>
- DuckDuckGoSource.Name() == "duckduckgo"
- DuckDuckGoSource.RateLimit() == rate.Every(2*time.Second) (no official API, so scrape conservatively)
- DuckDuckGoSource.RespectsRobots() == true (HTML scraper)
- DuckDuckGoSource.Enabled() always true (no API key needed -- uses DuckDuckGo HTML search)
- DuckDuckGoSource.Sweep() GETs `https://html.duckduckgo.com/html/?q={query}`, parses the HTML for result links in <a class="result__a" href="..."> anchors, and emits Findings
- YandexSource.Name() == "yandex"
- YandexSource.RateLimit() == rate.Every(1*time.Second)
- YandexSource.RespectsRobots() == false (uses the Yandex XML search API)
- YandexSource.Enabled() == true only when User and APIKey are both non-empty
- YandexSource.Sweep() GETs `https://yandex.com/search/xml?user={user}&key={key}&query={q}&l10n=en&sortby=rlv&filter=none&groupby=attr%3D%22%22.mode%3Dflat.groups-on-page%3D50` and parses the XML response for <url> elements
- BraveSource.Name() == "brave"
- BraveSource.RateLimit() == rate.Every(1*time.Second) (Brave Search API: 1 QPS on the free tier)
- BraveSource.Enabled() == true only when APIKey is non-empty
- BraveSource.Sweep() GETs `https://api.search.brave.com/res/v1/web/search?q={query}&count=20` with the X-Subscription-Token header, decodes the JSON { web: { results: [{ url, title }] } }, and emits Findings
</behavior>
<action>
Create `pkg/recon/sources/duckduckgo.go`:
- Struct: `DuckDuckGoSource` with BaseURL, Registry, Limiters, Client fields
- Name() "duckduckgo", RateLimit() Every(2s), Burst() 1, RespectsRobots() true
- Enabled() always true (credential-free, like Replit)
- Sweep(): iterate BuildQueries(registry, "duckduckgo"). For each: wait on the limiter, GET `{BaseURL}/html/?q={query}`, parse the HTML using golang.org/x/net/html (same as the Replit pattern), and extract href values from `<a class="result__a">` or `<a class="result__url">` elements -- use a regex or attribute check that matches <a> tags whose class contains "result__a". Emit a Finding with Source=extracted URL, SourceType="recon:duckduckgo". Deduplicate results within the same query.
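A simplified extraction sketch is below. The real source should walk the DOM with golang.org/x/net/html as replit.go does; a regexp stands in here so the example stays stdlib-only, and it assumes the class attribute appears before href in the anchor tag:

```go
package main

import (
	"fmt"
	"regexp"
)

// resultAnchor matches <a> tags whose class contains "result__a" and
// captures the href value. Assumes class precedes href, which holds for
// the DuckDuckGo HTML endpoint's markup but is not guaranteed in general.
var resultAnchor = regexp.MustCompile(`<a[^>]*class="[^"]*result__a[^"]*"[^>]*href="([^"]+)"`)

// extractResults pulls unique result URLs from a search page, matching the
// plan's "deduplicate within the same query" requirement.
func extractResults(page string) []string {
	seen := map[string]bool{}
	var urls []string
	for _, m := range resultAnchor.FindAllStringSubmatch(page, -1) {
		if !seen[m[1]] {
			seen[m[1]] = true
			urls = append(urls, m[1])
		}
	}
	return urls
}

func main() {
	page := `<a class="result__a" href="https://pastebin.com/x1">hit</a>
<a class="result__a" href="https://pastebin.com/x1">dup</a>`
	fmt.Println(extractResults(page))
}
```

The x/net/html version would instead check each `html.Token`'s attributes, which is robust to attribute ordering.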

Create `pkg/recon/sources/yandex.go`:
- Struct: `YandexSource` with User, APIKey, BaseURL, Registry, Limiters, client fields
- Name() "yandex", RateLimit() Every(1s), Burst() 1, RespectsRobots() false
- Enabled() returns s.User != "" && s.APIKey != ""
- Sweep(): iterate BuildQueries. For each: wait on the limiter, GET `{BaseURL}/search/xml?user={User}&key={APIKey}&query={url.QueryEscape(q)}&l10n=en&sortby=rlv&filter=none&groupby=attr%3D%22%22.mode%3Dflat.groups-on-page%3D50`, and decode the XML using encoding/xml. Response structure: `<yandexsearch><response><results><grouping><group><doc><url>...</url></doc></group></grouping></results></response></yandexsearch>`. Emit a Finding per <url>. SourceType="recon:yandex".

Create `pkg/recon/sources/brave.go`:
- Struct: `BraveSource` with APIKey, BaseURL, Registry, Limiters, client fields
- Name() "brave", RateLimit() Every(1s), Burst() 1, RespectsRobots() false
- Enabled() returns s.APIKey != ""
- Sweep(): iterate BuildQueries. For each: wait on the limiter, GET `{BaseURL}/res/v1/web/search?q={query}&count=20`, set the X-Subscription-Token header to APIKey and Accept: application/json. Decode the JSON `{ web: { results: [{ url, title, description }] } }`. Emit a Finding per result. SourceType="recon:brave".

All three follow the same error handling pattern as Task 1: 401 aborts, transient errors continue, ctx cancellation returns immediately.

Create test files with httptest servers. The DuckDuckGo test serves an HTML fixture with result anchors, the Yandex test serves an XML fixture, and the Brave test serves a JSON fixture. Each test covers: Sweep emits findings, SourceType is correct, Enabled behavior, ctx cancellation.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestDuckDuckGo|TestYandex|TestBrave" -v -count=1</automated>
</verify>
<done>DuckDuckGoSource, YandexSource, and BraveSource pass all tests. All five search sources are complete.</done>
</task>

</tasks>

<verification>
All five search engine sources compile and pass unit tests:
```bash
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestGoogle|TestBing|TestDuckDuckGo|TestYandex|TestBrave" -v -count=1
```
</verification>

<success_criteria>
- 5 new source files exist in pkg/recon/sources/ (google.go, bing.go, duckduckgo.go, yandex.go, brave.go)
- Each source implements recon.ReconSource with a compile-time assertion
- Each has a corresponding _test.go file with httptest-based tests
- formatQuery in queries.go handles all 5 new source names
- All tests pass
</success_criteria>

<output>
After completion, create `.planning/phases/11-osint_search_paste/11-01-SUMMARY.md`
</output>
199
.planning/phases/11-osint_search_paste/11-02-PLAN.md
Normal file
@@ -0,0 +1,199 @@
---
phase: 11-osint-search-paste
plan: 02
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/sources/pastebin.go
  - pkg/recon/sources/pastebin_test.go
  - pkg/recon/sources/gistpaste.go
  - pkg/recon/sources/gistpaste_test.go
  - pkg/recon/sources/pastesites.go
  - pkg/recon/sources/pastesites_test.go
autonomous: true
requirements: [RECON-PASTE-01]

must_haves:
  truths:
    - "PastebinSource scrapes Pastebin search results and emits findings for pastes containing provider keywords"
    - "GistPasteSource searches public GitHub Gists via unauthenticated scraping (distinct from Phase 10 GistSource, which uses the API)"
    - "PasteSitesSource aggregates results from dpaste, paste.ee, rentry.co, and similar sites"
    - "All paste sources feed raw content through keyword matching against the provider registry"
    - "Missing credentials disable sources that need them; credential-free sources are always enabled"
  artifacts:
    - path: "pkg/recon/sources/pastebin.go"
      provides: "PastebinSource implementing recon.ReconSource"
      contains: "func (s *PastebinSource) Sweep"
    - path: "pkg/recon/sources/gistpaste.go"
      provides: "GistPasteSource implementing recon.ReconSource"
      contains: "func (s *GistPasteSource) Sweep"
    - path: "pkg/recon/sources/pastesites.go"
      provides: "PasteSitesSource implementing recon.ReconSource with multi-site sub-platform pattern"
      contains: "func (s *PasteSitesSource) Sweep"
  key_links:
    - from: "pkg/recon/sources/pastebin.go"
      to: "pkg/recon/sources/httpclient.go"
      via: "sources.Client for HTTP with retry"
      pattern: "client\\.Do"
    - from: "pkg/recon/sources/pastesites.go"
      to: "providers.Registry"
      via: "keyword matching on paste content"
      pattern: "keywordSet|BuildQueries"
---
<objective>
Implement three paste site ReconSource implementations: PastebinSource, GistPasteSource, and PasteSitesSource (a multi-site aggregator for dpaste, paste.ee, rentry.co, hastebin, etc.).

Purpose: RECON-PASTE-01 -- detect API key leaks across public paste sites.
Output: Three source files + tests covering paste site scanning.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/gist.go (reference: Phase 10 GistSource uses the GitHub API -- this plan's GistPasteSource is a scraping alternative)
@pkg/recon/sources/replit.go (reference pattern for HTML scraping source)
@pkg/recon/sources/sandboxes.go (reference pattern for multi-platform aggregator)
<interfaces>
From pkg/recon/source.go:
```go
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```

From pkg/recon/sources/httpclient.go:
```go
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```

From pkg/recon/sources/gist.go (existing Phase 10 GistSource -- avoid name collision):
```go
type GistSource struct { ... } // Name() == "gist" -- already taken
func (s *GistSource) keywordSet() map[string]string // pattern to reuse
```
</interfaces>
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: PastebinSource + GistPasteSource</name>
<files>pkg/recon/sources/pastebin.go, pkg/recon/sources/pastebin_test.go, pkg/recon/sources/gistpaste.go, pkg/recon/sources/gistpaste_test.go</files>
<behavior>
- PastebinSource.Name() == "pastebin"
- PastebinSource.RateLimit() == rate.Every(3*time.Second) (conservative -- Pastebin scraping)
- PastebinSource.Burst() == 1
- PastebinSource.RespectsRobots() == true (HTML scraper)
- PastebinSource.Enabled() always true (credential-free scraping of pastebin.com)
- PastebinSource.Sweep(): for each provider keyword, query Pastebin's own search, parse the result links, fetch each paste's raw content via the /raw/{paste_id} endpoint, scan the content for keyword matches, and emit a Finding with Source=paste URL, SourceType="recon:pastebin", ProviderName from the match (see the two-phase approach in the action below)
- GistPasteSource.Name() == "gistpaste" (not "gist" -- that's Phase 10's API source)
- GistPasteSource.RateLimit() == rate.Every(3*time.Second)
- GistPasteSource.RespectsRobots() == true (HTML scraper)
- GistPasteSource.Enabled() always true (credential-free)
- GistPasteSource.Sweep(): scrape gist.github.com/search?q={keyword} (public search, no auth needed), parse the HTML for gist links, fetch the raw content, and keyword-match against the registry
</behavior>
<action>
Create `pkg/recon/sources/pastebin.go`:
- Struct: `PastebinSource` with BaseURL, Registry, Limiters, Client fields
- Name() "pastebin", RateLimit() Every(3s), Burst() 1, RespectsRobots() true
- Enabled() always true
- Sweep(): use a two-phase approach:
  Phase A: Search -- iterate BuildQueries(registry, "pastebin"). For each keyword, GET `{BaseURL}/search?q={url.QueryEscape(keyword)}` (Pastebin's own search). Parse the HTML for paste links matching the `^/[A-Za-z0-9]{8}$` pattern (Pastebin paste IDs are 8 alphanumeric chars). Collect unique paste IDs.
  Phase B: Fetch+Scan -- for each paste ID: wait on the limiter, GET `{BaseURL}/raw/{pasteID}`, read the body (limit 256KB), and scan the content against keywordSet() (same pattern as GistSource.keywordSet). If any keyword matches, emit a Finding with Source=`{BaseURL}/{pasteID}`, SourceType="recon:pastebin", ProviderName from the matched keyword.
- Helper: `pastebinKeywordSet(reg)` returning map[string]string (keyword -> provider name), same as the GistSource pattern.
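The ID pattern and keyword-set scan can be sketched as follows. The two literal keyword entries are illustrative stand-ins for the registry-derived set that `pastebinKeywordSet(reg)` would produce:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Pastebin paste links are 8 alphanumeric chars, e.g. /aB3dE9xZ.
var pasteIDPattern = regexp.MustCompile(`^/[A-Za-z0-9]{8}$`)

// pastebinKeywordSet maps search keyword -> provider name so a raw-paste
// match can populate Finding.ProviderName. Entries here are hypothetical;
// the real set comes from the provider registry.
func pastebinKeywordSet() map[string]string {
	return map[string]string{
		"sk_live_": "stripe",
		"AKIA":     "aws",
	}
}

// scan returns the provider name for the first keyword found in content.
func scan(content string) (provider string, ok bool) {
	for kw, name := range pastebinKeywordSet() {
		if strings.Contains(content, kw) {
			return name, true
		}
	}
	return "", false
}

func main() {
	fmt.Println(pasteIDPattern.MatchString("/aB3dE9xZ"))
	fmt.Println(scan("leaked: sk_live_abc123"))
}
```

Note the map iteration order is nondeterministic, so if a paste matches multiple providers the winner is arbitrary; the real implementation may want a deterministic ordering.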

Create `pkg/recon/sources/gistpaste.go`:
- Struct: `GistPasteSource` with BaseURL, Registry, Limiters, Client fields
- Name() "gistpaste", RateLimit() Every(3s), Burst() 1, RespectsRobots() true
- Enabled() always true
- Sweep(): iterate BuildQueries(registry, "gistpaste"). For each keyword, GET `{BaseURL}/search?q={url.QueryEscape(keyword)}` (gist.github.com search). Parse the HTML for gist links matching the `^/[^/]+/[a-f0-9]+$` pattern. For each gist link, construct the raw URL `{BaseURL}{gistPath}/raw` and fetch the content (limit 256KB). Keyword-match and emit a Finding with SourceType="recon:gistpaste".

Tests: httptest servers serving HTML search results + raw paste content fixtures. Verify findings are emitted with the correct SourceType, Source URL, and ProviderName.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestPastebin|TestGistPaste" -v -count=1</automated>
</verify>
<done>PastebinSource and GistPasteSource compile, pass all tests, and handle ctx cancellation.</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: PasteSitesSource (multi-paste aggregator)</name>
<files>pkg/recon/sources/pastesites.go, pkg/recon/sources/pastesites_test.go</files>
<behavior>
- PasteSitesSource.Name() == "pastesites"
- PasteSitesSource.RateLimit() == rate.Every(3*time.Second)
- PasteSitesSource.RespectsRobots() == true
- PasteSitesSource.Enabled() always true (all sub-platforms are credential-free)
- PasteSitesSource.Sweep() iterates across sub-platforms: dpaste.org, paste.ee, rentry.co, and hastebin.com (ix.io is excluded -- it has no search endpoint)
- Each sub-platform has: a Name, a SearchURL pattern, a result link regex, and optional raw URL construction
- Sweep emits at least one Finding per platform when fixture data matches keywords
- ctx cancellation stops the sweep promptly
</behavior>
<action>
Create `pkg/recon/sources/pastesites.go` following the SandboxesSource multi-platform pattern from pkg/recon/sources/sandboxes.go:

- Define the `pastePlatform` struct: Name string, SearchPath string (with %s for the query), ResultLinkRegex string, RawPathTemplate string (optional, for fetching raw content), IsJSON bool
- Default platforms:
  1. dpaste: SearchPath="/search/?q=%s", result links matching `^/[A-Za-z0-9]+$`, raw via `/{id}/raw`
  2. paste.ee: SearchPath="/search?q=%s", result links matching `^/p/[A-Za-z0-9]+$`, raw via `/r/{id}`
  3. rentry.co: SearchPath="/search?q=%s", result links matching `^/[a-z0-9-]+$`, raw via `/{slug}/raw`
  4. hastebin: SearchPath="/search?q=%s", result links matching `^/[a-z]+$`, raw via `/raw/{id}`
  (ix.io is omitted: it has no search endpoint.)

- Struct: `PasteSitesSource` with Platforms []pastePlatform, BaseURL string (test override), Registry, Limiters, Client fields
- Name() "pastesites", RateLimit() Every(3s), Burst() 1, RespectsRobots() true
- Enabled() always true
- Sweep(): for each platform, for each keyword from BuildQueries(registry, "pastesites"):
  1. Wait on the limiter
  2. GET `{platform base or BaseURL}{searchPath with keyword}`
  3. Parse the HTML and extract result links matching the platform regex
  4. For each result link: wait on the limiter, GET the raw content URL, read the body (256KB limit), and keyword-match against the registry
  5. Emit a Finding with Source=paste URL, SourceType="recon:pastesites", ProviderName from the keyword match
- Default platforms populated in a `defaultPastePlatforms()` function. Tests override Platforms to use httptest URLs.

Test: an httptest mux serving search HTML + raw content for each sub-platform. Verify at least one Finding per platform fixture. Verify SourceType="recon:pastesites" on all.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestPasteSites" -v -count=1</automated>
</verify>
<done>PasteSitesSource aggregates across multiple paste sites, keyword-matches content, and emits findings with the correct SourceType.</done>
</task>

</tasks>

<verification>
All paste sources compile and pass unit tests:
```bash
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestPastebin|TestGistPaste|TestPasteSites" -v -count=1
```
</verification>

<success_criteria>
- 3 new source files exist (pastebin.go, gistpaste.go, pastesites.go) with tests
- Each implements recon.ReconSource with a compile-time assertion
- PasteSitesSource covers 3+ paste sub-platforms
- Keyword matching uses the provider Registry for ProviderName population
- All tests pass
</success_criteria>

<output>
After completion, create `.planning/phases/11-osint_search_paste/11-02-SUMMARY.md`
</output>
221
.planning/phases/11-osint_search_paste/11-03-PLAN.md
Normal file
@@ -0,0 +1,221 @@
---
phase: 11-osint-search-paste
plan: 03
type: execute
wave: 2
depends_on: ["11-01", "11-02"]
files_modified:
  - pkg/recon/sources/register.go
  - pkg/recon/sources/register_test.go
  - pkg/recon/sources/integration_test.go
  - cmd/recon.go
autonomous: true
requirements: [RECON-DORK-01, RECON-DORK-02, RECON-DORK-03, RECON-PASTE-01]

must_haves:
  truths:
    - "RegisterAll wires all 8 new Phase 11 sources onto the recon engine alongside the 10 Phase 10 sources"
    - "cmd/recon.go reads Google/Bing/Yandex/Brave API keys from env vars and viper config"
    - "keyhunter recon list shows all 18 sources (10 Phase 10 + 8 Phase 11)"
    - "Integration test with httptest fixtures proves SweepAll emits findings from all 18 source types"
    - "Sources with missing credentials are registered but Enabled()==false"
  artifacts:
    - path: "pkg/recon/sources/register.go"
      provides: "RegisterAll extended with Phase 11 sources"
      contains: "GoogleDorkSource"
    - path: "pkg/recon/sources/register_test.go"
      provides: "Guardrail test asserting 18 sources registered"
      contains: "18"
    - path: "pkg/recon/sources/integration_test.go"
      provides: "SweepAll integration test covering all 18 sources"
      contains: "recon:google"
    - path: "cmd/recon.go"
      provides: "Credential wiring for search engine API keys"
      contains: "GoogleAPIKey"
  key_links:
    - from: "pkg/recon/sources/register.go"
      to: "pkg/recon/sources/google.go"
      via: "RegisterAll calls engine.Register(GoogleDorkSource)"
      pattern: "GoogleDorkSource"
    - from: "cmd/recon.go"
      to: "pkg/recon/sources/register.go"
      via: "SourcesConfig credential fields"
      pattern: "GoogleAPIKey|GoogleCX|BingAPIKey|YandexUser|YandexAPIKey|BraveAPIKey"
---
<objective>
Wire all 8 Phase 11 sources into RegisterAll, extend SourcesConfig with search engine credentials, update cmd/recon.go for env/viper credential lookup, and create the integration test proving all 18 sources work end-to-end via SweepAll.

Purpose: Complete Phase 11 by connecting all new sources to the engine and proving the full 18-source sweep works.
Output: Updated register.go, register_test.go, integration_test.go, cmd/recon.go.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/sources/register.go
@pkg/recon/sources/register_test.go
@pkg/recon/sources/integration_test.go
@cmd/recon.go
<interfaces>
From pkg/recon/sources/register.go (current):
```go
type SourcesConfig struct {
	GitHubToken        string
	GitLabToken        string
	BitbucketToken     string
	BitbucketWorkspace string
	CodebergToken      string
	HuggingFaceToken   string
	KaggleUser         string
	KaggleKey          string
	Registry           *providers.Registry
	Limiters           *recon.LimiterRegistry
}
func RegisterAll(engine *recon.Engine, cfg SourcesConfig)
```

From cmd/recon.go (current):
```go
func buildReconEngine() *recon.Engine // constructs SourcesConfig, calls RegisterAll
func firstNonEmpty(a, b string) string
```

New sources from Plan 11-01 (to be registered):
```go
type GoogleDorkSource struct { APIKey, CX, BaseURL string; Registry; Limiters; client }
type BingDorkSource struct { APIKey, BaseURL string; Registry; Limiters; client }
type DuckDuckGoSource struct { BaseURL string; Registry; Limiters; Client }
type YandexSource struct { User, APIKey, BaseURL string; Registry; Limiters; client }
type BraveSource struct { APIKey, BaseURL string; Registry; Limiters; client }
```

New sources from Plan 11-02 (to be registered):
```go
type PastebinSource struct { BaseURL string; Registry; Limiters; Client }
type GistPasteSource struct { BaseURL string; Registry; Limiters; Client }
type PasteSitesSource struct { Platforms; BaseURL string; Registry; Limiters; Client }
```
</interfaces>
</context>
<tasks>
|
||||
|
||||
<task type="auto" tdd="true">
|
||||
<name>Task 1: Extend SourcesConfig + RegisterAll + cmd/recon.go credential wiring</name>
|
||||
<files>pkg/recon/sources/register.go, pkg/recon/sources/register_test.go, cmd/recon.go</files>
|
||||
<behavior>
|
||||
- SourcesConfig gains 6 new fields: GoogleAPIKey, GoogleCX, BingAPIKey, YandexUser, YandexAPIKey, BraveAPIKey
- RegisterAll registers 18 sources total (10 Phase 10 + 8 Phase 11)
- RegisterAll with a nil engine is still a no-op
- TestRegisterAll_WiresAllEighteenSources asserts that eng.List() returns all 18 names, sorted
- TestRegisterAll_MissingCredsStillRegistered asserts 18 sources register even with an empty config
- buildReconEngine reads: GOOGLE_API_KEY / recon.google.api_key, GOOGLE_CX / recon.google.cx, BING_API_KEY / recon.bing.api_key, YANDEX_USER / recon.yandex.user, YANDEX_API_KEY / recon.yandex.api_key, BRAVE_API_KEY / recon.brave.api_key
- reconCmd Long description updated to mention Phase 11 sources
</behavior>
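
The nil-engine contract above can be sketched as follows. `Engine` and `SourcesConfig` here are stand-ins for the real pkg/recon types (the real Register takes a source value, not a name), so this only illustrates the guard, not the actual API:

```go
package main

import "fmt"

// Engine is a stand-in for recon.Engine, tracking registered names.
type Engine struct{ names []string }

func (e *Engine) Register(name string) { e.names = append(e.names, name) }

// SourcesConfig is a stand-in for the real config struct.
type SourcesConfig struct{ GoogleAPIKey string }

// RegisterAll must stay a no-op when engine is nil, matching the
// existing Phase 10 contract the plan preserves.
func RegisterAll(engine *Engine, cfg SourcesConfig) {
	if engine == nil {
		return
	}
	engine.Register("google")
	// ... the remaining 17 registrations follow the same pattern ...
}

func main() {
	RegisterAll(nil, SourcesConfig{}) // must not panic
	eng := &Engine{}
	RegisterAll(eng, SourcesConfig{})
	fmt.Println(len(eng.names) > 0)
}
```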

<action>
Update `pkg/recon/sources/register.go`:

- Add to SourcesConfig: GoogleAPIKey, GoogleCX, BingAPIKey, YandexUser, YandexAPIKey, BraveAPIKey (all string)
- Add Phase 11 registrations to RegisterAll after the Phase 10 block:

```go
// Phase 11: Search engine dorking sources.
engine.Register(&GoogleDorkSource{APIKey: cfg.GoogleAPIKey, CX: cfg.GoogleCX, Registry: reg, Limiters: lim})
engine.Register(&BingDorkSource{APIKey: cfg.BingAPIKey, Registry: reg, Limiters: lim})
engine.Register(&DuckDuckGoSource{Registry: reg, Limiters: lim})
engine.Register(&YandexSource{User: cfg.YandexUser, APIKey: cfg.YandexAPIKey, Registry: reg, Limiters: lim})
engine.Register(&BraveSource{APIKey: cfg.BraveAPIKey, Registry: reg, Limiters: lim})

// Phase 11: Paste site sources.
engine.Register(&PastebinSource{Registry: reg, Limiters: lim})
engine.Register(&GistPasteSource{Registry: reg, Limiters: lim})
engine.Register(&PasteSitesSource{Registry: reg, Limiters: lim})
```

- Update the doc comment on RegisterAll to say "Phase 10 + Phase 11" and a total of "18 sources"

Update `pkg/recon/sources/register_test.go`:

- TestRegisterAll_WiresAllEighteenSources: want list = the sorted 18 names: ["bing", "bitbucket", "brave", "codeberg", "codesandbox", "duckduckgo", "gist", "gistpaste", "github", "gitlab", "google", "huggingface", "kaggle", "pastebin", "pastesites", "replit", "sandboxes", "yandex"]
- TestRegisterAll_MissingCredsStillRegistered: assert n == 18

Update `cmd/recon.go`:

- Add to the SourcesConfig construction in buildReconEngine():

  GoogleAPIKey: firstNonEmpty(os.Getenv("GOOGLE_API_KEY"), viper.GetString("recon.google.api_key")),
  GoogleCX: firstNonEmpty(os.Getenv("GOOGLE_CX"), viper.GetString("recon.google.cx")),
  BingAPIKey: firstNonEmpty(os.Getenv("BING_API_KEY"), viper.GetString("recon.bing.api_key")),
  YandexUser: firstNonEmpty(os.Getenv("YANDEX_USER"), viper.GetString("recon.yandex.user")),
  YandexAPIKey: firstNonEmpty(os.Getenv("YANDEX_API_KEY"), viper.GetString("recon.yandex.api_key")),
  BraveAPIKey: firstNonEmpty(os.Getenv("BRAVE_API_KEY"), viper.GetString("recon.brave.api_key")),

- Update reconCmd.Long to list the Phase 11 sources
</action>

<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestRegisterAll" -v -count=1 && go build ./cmd/...</automated>
</verify>

<done>RegisterAll registers 18 sources. cmd/recon.go compiles with credential wiring. Guardrail tests pass.</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: Integration test -- SweepAll across all 18 sources</name>
<files>pkg/recon/sources/integration_test.go</files>
<behavior>
- TestIntegration_AllSources_SweepAll registers all 18 sources with BaseURL overrides pointing at an httptest mux
- SweepAll returns findings from all 18 SourceType values
- Each SourceType (recon:github, recon:gitlab, ..., recon:google, recon:bing, recon:duckduckgo, recon:yandex, recon:brave, recon:pastebin, recon:gistpaste, recon:pastesites) has at least 1 finding
</behavior>

<action>
Update `pkg/recon/sources/integration_test.go`:

- Extend the existing httptest mux with handlers for the 8 new sources:
  - Google Custom Search: mux.HandleFunc("/customsearch/v1", ...) serves JSON `{"items":[{"link":"https://pastebin.com/abc123","title":"leak","snippet":"sk-proj-xxx"}]}`
  - Bing Web Search: mux.HandleFunc("/v7.0/search", ...) serves JSON `{"webPages":{"value":[{"url":"https://example.com/leak","name":"leak"}]}}`
  - DuckDuckGo HTML: mux.HandleFunc("/html/", ...) serves HTML with `<a class="result__a" href="https://example.com/ddg-leak">result</a>`
  - Yandex XML: mux.HandleFunc("/search/xml", ...) serves XML `<yandexsearch><response><results><grouping><group><doc><url>https://example.com/yandex-leak</url></doc></group></grouping></results></response></yandexsearch>`
  - Brave Search: mux.HandleFunc("/res/v1/web/search", ...) serves JSON `{"web":{"results":[{"url":"https://example.com/brave-leak","title":"leak"}]}}`
  - Pastebin search + raw: mux.HandleFunc("/pastebin-search", ...) serves HTML with paste links; mux.HandleFunc("/pastebin-raw/", ...) serves raw content with "sk-proj-ABC"
  - GistPaste search + raw: mux.HandleFunc("/gistpaste-search", ...) serves HTML with gist links; mux.HandleFunc("/gistpaste-raw/", ...) serves raw content with the keyword
  - PasteSites: mux.HandleFunc("/pastesites-search", ...) + mux.HandleFunc("/pastesites-raw/", ...) follow the same pattern
- Register all 18 sources on the engine with BaseURL=srv.URL and fake credentials for the API sources. Then call eng.SweepAll and assert the byType map has all 18 SourceType keys.
- Update wantTypes to include: "recon:google", "recon:bing", "recon:duckduckgo", "recon:yandex", "recon:brave", "recon:pastebin", "recon:gistpaste", "recon:pastesites"
- Keep the existing 10 Phase 10 source fixtures and registrations intact.
</action>

<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestIntegration_AllSources" -v -count=1 -timeout=60s</automated>
</verify>

<done>Integration test proves SweepAll emits findings from all 18 sources. Full Phase 11 wiring confirmed end-to-end.</done>
</task>

</tasks>

<verification>
Full Phase 11 verification:

```bash
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -v -count=1 -timeout=120s && go build ./cmd/...
```
</verification>

<success_criteria>
- RegisterAll registers 18 sources (10 Phase 10 + 8 Phase 11)
- cmd/recon.go compiles with all credential wiring
- Integration test passes with all 18 SourceTypes emitting findings
- `go build ./cmd/...` succeeds
- Guardrail test asserts the exact 18-source name list
</success_criteria>

<output>
After completion, create `.planning/phases/11-osint_search_paste/11-03-SUMMARY.md`
</output>