docs(11): create phase plan — 3 plans for search engine dorking + paste sites

@@ -235,7 +235,12 @@ Plans:
 1. `keyhunter recon --sources=google` runs built-in dorks via Google Custom Search API or SerpAPI and returns results with the dork query that triggered each finding
 2. `keyhunter recon --sources=bing` executes dorks via Azure Cognitive Services and `--sources=duckduckgo,yandex,brave` via their respective integrations
 3. `keyhunter recon --sources=paste` queries the Pastebin API and scrapes 15+ additional paste sites, feeding raw content through the detection pipeline
-**Plans**: TBD
+**Plans**: 3 plans
+
+Plans:
+- [ ] 11-01-PLAN.md — GoogleDorkSource + BingDorkSource + DuckDuckGoSource + YandexSource + BraveSource (RECON-DORK-01, RECON-DORK-02, RECON-DORK-03)
+- [ ] 11-02-PLAN.md — PastebinSource + GistPasteSource + PasteSitesSource multi-paste aggregator (RECON-PASTE-01)
+- [ ] 11-03-PLAN.md — RegisterAll wiring + cmd/recon.go credentials + integration test (all Phase 11 reqs)

 ### Phase 12: OSINT IoT & Cloud Storage

 **Goal**: Users can discover exposed LLM endpoints via IoT scanners (Shodan, Censys, ZoomEye, FOFA, Netlas, BinaryEdge) and scan publicly accessible cloud storage buckets (S3, GCS, Azure Blob, MinIO, GrayHatWarfare) for leaked keys

@@ -337,7 +342,7 @@ Phases execute in numeric order: 1 → 2 → 3 → ... → 18
 | 8. Dork Engine | 0/? | Not started | - |
 | 9. OSINT Infrastructure | 2/6 | In Progress | - |
 | 10. OSINT Code Hosting | 9/9 | Complete | 2026-04-06 |
-| 11. OSINT Search & Paste | 0/? | Not started | - |
+| 11. OSINT Search & Paste | 0/3 | Planning complete | - |
 | 12. OSINT IoT & Cloud Storage | 0/? | Not started | - |
 | 13. OSINT Package Registries & Container/IaC | 0/? | Not started | - |
 | 14. OSINT CI/CD Logs, Web Archives & Frontend Leaks | 0/? | Not started | - |

241
.planning/phases/11-osint_search_paste/11-01-PLAN.md
Normal file
@@ -0,0 +1,241 @@

---
phase: 11-osint-search-paste
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/sources/google.go
  - pkg/recon/sources/google_test.go
  - pkg/recon/sources/bing.go
  - pkg/recon/sources/bing_test.go
  - pkg/recon/sources/duckduckgo.go
  - pkg/recon/sources/duckduckgo_test.go
  - pkg/recon/sources/yandex.go
  - pkg/recon/sources/yandex_test.go
  - pkg/recon/sources/brave.go
  - pkg/recon/sources/brave_test.go
  - pkg/recon/sources/queries.go
autonomous: true
requirements: [RECON-DORK-01, RECON-DORK-02, RECON-DORK-03]

must_haves:
  truths:
    - "Google dorking source searches via Google Custom Search JSON API and emits findings with dork query context"
    - "Bing dorking source searches via Bing Web Search API and emits findings"
    - "DuckDuckGo, Yandex, and Brave sources each search their respective APIs/endpoints and emit findings"
    - "All five sources respect ctx cancellation and use LimiterRegistry for rate limiting"
    - "Missing API keys disable the source (Enabled=false) without error"
  artifacts:
    - path: "pkg/recon/sources/google.go"
      provides: "GoogleDorkSource implementing recon.ReconSource"
      contains: "func (s *GoogleDorkSource) Sweep"
    - path: "pkg/recon/sources/bing.go"
      provides: "BingDorkSource implementing recon.ReconSource"
      contains: "func (s *BingDorkSource) Sweep"
    - path: "pkg/recon/sources/duckduckgo.go"
      provides: "DuckDuckGoSource implementing recon.ReconSource"
      contains: "func (s *DuckDuckGoSource) Sweep"
    - path: "pkg/recon/sources/yandex.go"
      provides: "YandexSource implementing recon.ReconSource"
      contains: "func (s *YandexSource) Sweep"
    - path: "pkg/recon/sources/brave.go"
      provides: "BraveSource implementing recon.ReconSource"
      contains: "func (s *BraveSource) Sweep"
  key_links:
    - from: "pkg/recon/sources/google.go"
      to: "pkg/recon/sources/httpclient.go"
      via: "sources.Client for HTTP with retry"
      pattern: "client\\.Do"
    - from: "pkg/recon/sources/queries.go"
      to: "all five search sources"
      via: "formatQuery switch cases"
      pattern: "case \"google\"|\"bing\"|\"duckduckgo\"|\"yandex\"|\"brave\""
---

<objective>
Implement five search engine dorking ReconSource implementations: GoogleDorkSource, BingDorkSource, DuckDuckGoSource, YandexSource, and BraveSource.

Purpose: RECON-DORK-01/02/03 -- enable automated search engine dorking for API key leak detection across all major search engines.
Output: Five source files + tests, updated queries.go formatQuery.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/github.go (reference pattern for API-backed source)
@pkg/recon/sources/replit.go (reference pattern for scraping source)

<interfaces>
<!-- Executor needs these contracts -->

From pkg/recon/source.go:
```
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```
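For orientation, a minimal skeleton satisfying this contract might look like the sketch below. The recon package types (Config, Finding, rate.Limit) are stubbed locally so the sketch compiles standalone; ExampleDorkSource and all stub definitions are illustrative, not the repository's actual code:

```go
package main

import (
	"context"
	"fmt"
)

// Stubbed contract types; the real definitions live in pkg/recon
// (rate.Limit comes from golang.org/x/time/rate).
type Limit float64

type Config struct{}

type Finding struct {
	Source, SourceType, ProviderName, Confidence string
}

type ReconSource interface {
	Name() string
	RateLimit() Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}

// ExampleDorkSource shows the shape every Phase 11 source follows:
// credential check in Enabled, conservative limits, ctx-aware Sweep.
type ExampleDorkSource struct{ APIKey string }

var _ ReconSource = (*ExampleDorkSource)(nil) // compile-time assertion

func (s *ExampleDorkSource) Name() string         { return "example" }
func (s *ExampleDorkSource) RateLimit() Limit     { return 1 } // 1 req/sec
func (s *ExampleDorkSource) Burst() int           { return 1 }
func (s *ExampleDorkSource) RespectsRobots() bool { return false }
func (s *ExampleDorkSource) Enabled(Config) bool  { return s.APIKey != "" }

func (s *ExampleDorkSource) Sweep(ctx context.Context, query string, out chan<- Finding) error {
	// Real sources loop over BuildQueries and HTTP calls; the key point is
	// that every emit races against ctx cancellation.
	select {
	case <-ctx.Done():
		return ctx.Err()
	case out <- Finding{Source: "https://example.com/hit", SourceType: "recon:example", Confidence: "low"}:
	}
	return nil
}

func main() {
	s := &ExampleDorkSource{APIKey: "k"}
	out := make(chan Finding, 1)
	_ = s.Sweep(context.Background(), "q", out)
	fmt.Println(s.Name(), (<-out).SourceType)
}
```

The compile-time assertion line is the same guard the plan requires in each real source file.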

From pkg/recon/sources/httpclient.go:

```
type Client struct { HTTP *http.Client; MaxRetries int; UserAgent string }
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```

From pkg/recon/sources/queries.go:

```
func BuildQueries(reg *providers.Registry, source string) []string
func formatQuery(source, keyword string) string // needs new cases
```

From pkg/recon/sources/register.go:

```
type SourcesConfig struct { ... } // will be extended in Plan 11-03
```
</interfaces>
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: GoogleDorkSource + BingDorkSource + formatQuery updates</name>
<files>pkg/recon/sources/google.go, pkg/recon/sources/google_test.go, pkg/recon/sources/bing.go, pkg/recon/sources/bing_test.go, pkg/recon/sources/queries.go</files>
<behavior>
- GoogleDorkSource.Name() == "google"
- GoogleDorkSource.RateLimit() == rate.Every(1*time.Second) (Google Custom Search: 100 queries/day free, so be conservative)
- GoogleDorkSource.Burst() == 1
- GoogleDorkSource.RespectsRobots() == false (authenticated API)
- GoogleDorkSource.Enabled() == true only when APIKey AND CX (search engine ID) are both non-empty
- GoogleDorkSource.Sweep() calls the Google Custom Search JSON API: GET https://www.googleapis.com/customsearch/v1?key={key}&cx={cx}&q={query}&num=10
- Each search result item emits a Finding with Source=item.link, SourceType="recon:google", Confidence="low"
- BingDorkSource.Name() == "bing"
- BingDorkSource.RateLimit() == rate.Every(500*time.Millisecond) (Bing allows 3 TPS on the S1 tier)
- BingDorkSource.Enabled() == true only when APIKey is non-empty
- BingDorkSource.Sweep() calls Bing Web Search API v7: GET https://api.bing.microsoft.com/v7.0/search?q={query}&count=50 with the Ocp-Apim-Subscription-Key header
- Each webPages.value item emits a Finding with Source=item.url, SourceType="recon:bing"
- formatQuery("google", kw) returns `site:pastebin.com OR site:github.com "{kw}"` (dork-style)
- formatQuery("bing", kw) returns the same dork-style format
- ctx cancellation aborts both sources promptly
- Transient HTTP errors (429/5xx) are retried via sources.Client; 401 aborts the sweep
</behavior>
<action>
Create `pkg/recon/sources/google.go`:
- Struct: `GoogleDorkSource` with fields: APIKey string, CX string, BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, client *Client
- Compile-time interface assertion: `var _ recon.ReconSource = (*GoogleDorkSource)(nil)`
- Name() returns "google"
- RateLimit() returns rate.Every(1*time.Second)
- Burst() returns 1
- RespectsRobots() returns false
- Enabled() returns s.APIKey != "" && s.CX != ""
- Sweep(): iterate BuildQueries(registry, "google"); for each query: wait on the LimiterRegistry, build a GET request to `{BaseURL}/customsearch/v1?key={APIKey}&cx={CX}&q={url.QueryEscape(q)}&num=10`, set Accept: application/json, call client.Do, decode the JSON response `{ items: [{ title, link, snippet }] }`, and emit a Finding per item with Source=link, SourceType="recon:google", ProviderName from the keyword index (same pattern as githubKeywordIndex), Confidence="low". On 401 abort; on a transient error continue to the next query.
- Private response structs: googleSearchResponse, googleSearchItem

Create `pkg/recon/sources/bing.go`:
- Struct: `BingDorkSource` with fields: APIKey string, BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, client *Client
- Name() returns "bing"
- RateLimit() returns rate.Every(500*time.Millisecond)
- Burst() returns 2
- RespectsRobots() returns false
- Enabled() returns s.APIKey != ""
- Sweep(): iterate BuildQueries(registry, "bing"); for each: wait on the limiter, GET `{BaseURL}/v7.0/search?q={query}&count=50`, set the Ocp-Apim-Subscription-Key header, decode JSON `{ webPages: { value: [{ name, url, snippet }] } }`, and emit a Finding per value item with Source=url, SourceType="recon:bing". Same error handling pattern.
- Private response structs: bingSearchResponse, bingWebPages, bingWebResult

Update `pkg/recon/sources/queries.go` formatQuery():
- Add cases for "google", "bing", "duckduckgo", "yandex", and "brave" that return the keyword wrapped in dork syntax: `site:pastebin.com OR site:github.com "%s"` via fmt.Sprintf. This focuses search results on the paste/code hosting sites where keys leak.
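A sketch of the new formatQuery cases; the real function lives in queries.go and also keeps its existing per-source cases (collapsed into `default` here):

```go
package main

import "fmt"

// formatQuery wraps a provider keyword in dork syntax for the five
// search engine sources, as described above.
func formatQuery(source, keyword string) string {
	switch source {
	case "google", "bing", "duckduckgo", "yandex", "brave":
		return fmt.Sprintf(`site:pastebin.com OR site:github.com "%s"`, keyword)
	default:
		return keyword // existing sources keep their own formats
	}
}

func main() {
	fmt.Println(formatQuery("google", "OPENAI_API_KEY"))
}
```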

Create test files with httptest servers returning canned JSON fixtures. Each test:
- Verifies Sweep emits the correct number of findings
- Verifies SourceType is correct
- Verifies Source URLs match the fixture data
- Verifies Enabled() behavior with/without credentials
- Verifies ctx cancellation returns an error
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestGoogle|TestBing" -v -count=1</automated>
</verify>
<done>GoogleDorkSource and BingDorkSource pass all tests. formatQuery handles the google/bing cases.</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: DuckDuckGoSource + YandexSource + BraveSource</name>
<files>pkg/recon/sources/duckduckgo.go, pkg/recon/sources/duckduckgo_test.go, pkg/recon/sources/yandex.go, pkg/recon/sources/yandex_test.go, pkg/recon/sources/brave.go, pkg/recon/sources/brave_test.go</files>
<behavior>
- DuckDuckGoSource.Name() == "duckduckgo"
- DuckDuckGoSource.RateLimit() == rate.Every(2*time.Second) (no official API, so scrape conservatively)
- DuckDuckGoSource.RespectsRobots() == true (HTML scraper)
- DuckDuckGoSource.Enabled() always true (no API key needed -- uses DuckDuckGo HTML search)
- DuckDuckGoSource.Sweep() GETs `https://html.duckduckgo.com/html/?q={query}`, parses the HTML for result links in <a class="result__a" href="..."> anchors, and emits Findings
- YandexSource.Name() == "yandex"
- YandexSource.RateLimit() == rate.Every(1*time.Second)
- YandexSource.RespectsRobots() == false (uses the Yandex XML search API)
- YandexSource.Enabled() == true only when User and APIKey are both non-empty
- YandexSource.Sweep() GETs `https://yandex.com/search/xml?user={user}&key={key}&query={q}&l10n=en&sortby=rlv&filter=none&groupby=attr%3D%22%22.mode%3Dflat.groups-on-page%3D50` and parses the XML response for <url> elements
- BraveSource.Name() == "brave"
- BraveSource.RateLimit() == rate.Every(1*time.Second) (Brave Search API: 1 QPS on the free tier)
- BraveSource.Enabled() == true only when APIKey is non-empty
- BraveSource.Sweep() GETs `https://api.search.brave.com/res/v1/web/search?q={query}&count=20` with the X-Subscription-Token header, decodes JSON { web: { results: [{ url, title }] } }, and emits Findings
</behavior>
<action>
Create `pkg/recon/sources/duckduckgo.go`:
- Struct: `DuckDuckGoSource` with BaseURL, Registry, Limiters, Client fields
- Name() "duckduckgo", RateLimit() Every(2s), Burst() 1, RespectsRobots() true
- Enabled() always true (credential-free, like Replit)
- Sweep(): iterate BuildQueries(registry, "duckduckgo"); for each: wait on the limiter, GET `{BaseURL}/html/?q={query}`, parse the HTML using golang.org/x/net/html (same as the Replit pattern), and extract href from `<a class="result__a">` or `<a class="result__url">` elements -- i.e. <a> tags whose class attribute contains "result__a". Emit a Finding with Source=extracted URL, SourceType="recon:duckduckgo". Deduplicate results within the same query.
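Link extraction could be sketched as below. The plan calls for golang.org/x/net/html as in the Replit source; this sketch substitutes a regexp only to stay dependency-free, and its class-attribute matching is simplified accordingly:

```go
package main

import (
	"fmt"
	"regexp"
)

// Simplified matcher for DuckDuckGo HTML result anchors: an <a> whose
// class attribute contains "result__a", capturing the href value.
var ddgResultLink = regexp.MustCompile(`<a[^>]*class="[^"]*result__a[^"]*"[^>]*href="([^"]+)"`)

// extractResults returns deduplicated result URLs from one search page.
func extractResults(page string) []string {
	seen := map[string]bool{}
	var urls []string
	for _, m := range ddgResultLink.FindAllStringSubmatch(page, -1) {
		if !seen[m[1]] { // deduplicate within the same query
			seen[m[1]] = true
			urls = append(urls, m[1])
		}
	}
	return urls
}

func main() {
	page := `<a rel="nofollow" class="result__a" href="https://pastebin.com/raw/x1">hit</a>
<a class="result__a" href="https://pastebin.com/raw/x1">dup</a>`
	fmt.Println(extractResults(page))
}
```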

Create `pkg/recon/sources/yandex.go`:
- Struct: `YandexSource` with User, APIKey, BaseURL, Registry, Limiters, client fields
- Name() "yandex", RateLimit() Every(1s), Burst() 1, RespectsRobots() false
- Enabled() returns s.User != "" && s.APIKey != ""
- Sweep(): iterate BuildQueries; for each: wait on the limiter, GET `{BaseURL}/search/xml?user={User}&key={APIKey}&query={url.QueryEscape(q)}&l10n=en&sortby=rlv&filter=none&groupby=attr%3D%22%22.mode%3Dflat.groups-on-page%3D50`, and decode the XML using encoding/xml. Response structure: `<yandexsearch><response><results><grouping><group><doc><url>...</url></doc></group></grouping></results></response></yandexsearch>`. Emit a Finding per <url>. SourceType="recon:yandex".

Create `pkg/recon/sources/brave.go`:
- Struct: `BraveSource` with APIKey, BaseURL, Registry, Limiters, client fields
- Name() "brave", RateLimit() Every(1s), Burst() 1, RespectsRobots() false
- Enabled() returns s.APIKey != ""
- Sweep(): iterate BuildQueries; for each: wait on the limiter, GET `{BaseURL}/res/v1/web/search?q={query}&count=20`, set the X-Subscription-Token header to APIKey and Accept: application/json, decode JSON `{ web: { results: [{ url, title, description }] } }`, and emit a Finding per result. SourceType="recon:brave".

All three follow the same error handling pattern as Task 1: 401 aborts, transient errors continue, ctx cancellation returns immediately.

Create test files with httptest servers. The DuckDuckGo test serves an HTML fixture with result anchors, the Yandex test serves an XML fixture, and the Brave test serves a JSON fixture. Each test covers: Sweep emits findings, SourceType is correct, Enabled behavior, ctx cancellation.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestDuckDuckGo|TestYandex|TestBrave" -v -count=1</automated>
</verify>
<done>DuckDuckGoSource, YandexSource, and BraveSource pass all tests. All five search sources complete.</done>
</task>

</tasks>

<verification>
All five search engine sources compile and pass unit tests:
```
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestGoogle|TestBing|TestDuckDuckGo|TestYandex|TestBrave" -v -count=1
```
</verification>

<success_criteria>
- 5 new source files exist in pkg/recon/sources/ (google.go, bing.go, duckduckgo.go, yandex.go, brave.go)
- Each source implements recon.ReconSource with a compile-time assertion
- Each has a corresponding _test.go file with httptest-based tests
- formatQuery in queries.go handles all 5 new source names
- All tests pass
</success_criteria>

<output>
After completion, create `.planning/phases/11-osint_search_paste/11-01-SUMMARY.md`
</output>

199
.planning/phases/11-osint_search_paste/11-02-PLAN.md
Normal file
@@ -0,0 +1,199 @@
---
phase: 11-osint-search-paste
plan: 02
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/sources/pastebin.go
  - pkg/recon/sources/pastebin_test.go
  - pkg/recon/sources/gistpaste.go
  - pkg/recon/sources/gistpaste_test.go
  - pkg/recon/sources/pastesites.go
  - pkg/recon/sources/pastesites_test.go
autonomous: true
requirements: [RECON-PASTE-01]

must_haves:
  truths:
    - "PastebinSource scrapes Pastebin search results and emits findings for pastes containing provider keywords"
    - "GistPasteSource searches public GitHub Gists via unauthenticated scraping (distinct from Phase 10 GistSource, which uses the API)"
    - "PasteSitesSource aggregates results from dpaste, paste.ee, rentry.co, ix.io, and similar sites"
    - "All paste sources feed raw content through keyword matching against the provider registry"
    - "Missing credentials disable sources that need them; credential-free sources are always enabled"
  artifacts:
    - path: "pkg/recon/sources/pastebin.go"
      provides: "PastebinSource implementing recon.ReconSource"
      contains: "func (s *PastebinSource) Sweep"
    - path: "pkg/recon/sources/gistpaste.go"
      provides: "GistPasteSource implementing recon.ReconSource"
      contains: "func (s *GistPasteSource) Sweep"
    - path: "pkg/recon/sources/pastesites.go"
      provides: "PasteSitesSource implementing recon.ReconSource with multi-site sub-platform pattern"
      contains: "func (s *PasteSitesSource) Sweep"
  key_links:
    - from: "pkg/recon/sources/pastebin.go"
      to: "pkg/recon/sources/httpclient.go"
      via: "sources.Client for HTTP with retry"
      pattern: "client\\.Do"
    - from: "pkg/recon/sources/pastesites.go"
      to: "providers.Registry"
      via: "keyword matching on paste content"
      pattern: "keywordSet|BuildQueries"
---

<objective>
Implement three paste site ReconSource implementations: PastebinSource, GistPasteSource, and PasteSitesSource (a multi-site aggregator for dpaste, paste.ee, rentry.co, ix.io, etc.).

Purpose: RECON-PASTE-01 -- detect API key leaks across public paste sites.
Output: Three source files + tests covering paste site scanning.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/gist.go (reference: Phase 10 GistSource uses the GitHub API -- this plan's GistPasteSource is a scraping alternative)
@pkg/recon/sources/replit.go (reference pattern for HTML scraping source)
@pkg/recon/sources/sandboxes.go (reference pattern for multi-platform aggregator)

<interfaces>
From pkg/recon/source.go:
```
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```

From pkg/recon/sources/httpclient.go:
```
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```

From pkg/recon/sources/gist.go (existing Phase 10 GistSource -- avoid a name collision):
```
type GistSource struct { ... } // Name() == "gist" -- already taken
func (s *GistSource) keywordSet() map[string]string // pattern to reuse
```
</interfaces>
</context>
<tasks>

<task type="auto" tdd="true">
<name>Task 1: PastebinSource + GistPasteSource</name>
<files>pkg/recon/sources/pastebin.go, pkg/recon/sources/pastebin_test.go, pkg/recon/sources/gistpaste.go, pkg/recon/sources/gistpaste_test.go</files>
<behavior>
- PastebinSource.Name() == "pastebin"
- PastebinSource.RateLimit() == rate.Every(3*time.Second) (conservative -- Pastebin scraping)
- PastebinSource.Burst() == 1
- PastebinSource.RespectsRobots() == true (HTML scraper)
- PastebinSource.Enabled() always true (credential-free scraping of pastebin.com)
- PastebinSource.Sweep(): for each provider keyword, search Pastebin's own search page (avoiding Google's ToS), parse result links, fetch each paste's raw content via the /raw/{paste_id} endpoint, scan the content for keyword matches, and emit a Finding with Source=paste URL, SourceType="recon:pastebin", ProviderName from the match
- GistPasteSource.Name() == "gistpaste" (not "gist" -- that's Phase 10's API source)
- GistPasteSource.RateLimit() == rate.Every(3*time.Second)
- GistPasteSource.RespectsRobots() == true (HTML scraper)
- GistPasteSource.Enabled() always true (credential-free)
- GistPasteSource.Sweep(): scrape gist.github.com/search?q={keyword} (public search, no auth needed), parse the HTML for gist links, fetch raw content, and keyword-match against the registry
</behavior>
<action>
Create `pkg/recon/sources/pastebin.go`:
- Struct: `PastebinSource` with BaseURL, Registry, Limiters, Client fields
- Name() "pastebin", RateLimit() Every(3s), Burst() 1, RespectsRobots() true
- Enabled() always true
- Sweep(): use a two-phase approach:
  Phase A: Search -- iterate BuildQueries(registry, "pastebin"). For each keyword, GET `{BaseURL}/search?q={url.QueryEscape(keyword)}` (Pastebin's own search). Parse the HTML for paste links matching the `^/[A-Za-z0-9]{8}$` pattern (Pastebin paste IDs are 8 alphanumeric characters). Collect unique paste IDs.
  Phase B: Fetch+Scan -- for each paste ID: wait on the limiter, GET `{BaseURL}/raw/{pasteID}`, read the body (limit 256KB), and scan the content against keywordSet() (same pattern as GistSource.keywordSet). If any keyword matches, emit a Finding with Source=`{BaseURL}/{pasteID}`, SourceType="recon:pastebin", ProviderName from the matched keyword.
- Helper: `pastebinKeywordSet(reg)` returning map[string]string (keyword -> provider name), same as the GistSource pattern.
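The Phase A link filter is a one-line regexp; a quick sketch of how it separates paste IDs from navigation links:

```go
package main

import (
	"fmt"
	"regexp"
)

// Pastebin paste IDs are 8 alphanumeric characters, so only hrefs
// like "/AbC12xYz" survive the filter.
var pasteID = regexp.MustCompile(`^/[A-Za-z0-9]{8}$`)

func main() {
	hrefs := []string{"/AbC12xYz", "/login", "/doc/api", "/ZZZZ9999"}
	var ids []string
	for _, h := range hrefs {
		if pasteID.MatchString(h) {
			ids = append(ids, h[1:]) // strip the leading slash; raw URL is {BaseURL}/raw/{id}
		}
	}
	fmt.Println(ids)
}
```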

Create `pkg/recon/sources/gistpaste.go`:
- Struct: `GistPasteSource` with BaseURL, Registry, Limiters, Client fields
- Name() "gistpaste", RateLimit() Every(3s), Burst() 1, RespectsRobots() true
- Enabled() always true
- Sweep(): iterate BuildQueries(registry, "gistpaste"). For each keyword, GET `{BaseURL}/search?q={url.QueryEscape(keyword)}` (gist.github.com search). Parse the HTML for gist links matching the `^/[^/]+/[a-f0-9]+$` pattern. For each gist link, construct the raw URL `{BaseURL}{gistPath}/raw`, fetch the content (limit 256KB), keyword-match, and emit a Finding with SourceType="recon:gistpaste".

Tests: httptest servers serving HTML search results + raw paste content fixtures. Verify findings are emitted with the correct SourceType, Source URL, and ProviderName.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestPastebin|TestGistPaste" -v -count=1</automated>
</verify>
<done>PastebinSource and GistPasteSource compile, pass all tests, and handle ctx cancellation.</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: PasteSitesSource (multi-paste aggregator)</name>
<files>pkg/recon/sources/pastesites.go, pkg/recon/sources/pastesites_test.go</files>
<behavior>
- PasteSitesSource.Name() == "pastesites"
- PasteSitesSource.RateLimit() == rate.Every(3*time.Second)
- PasteSitesSource.RespectsRobots() == true
- PasteSitesSource.Enabled() always true (all sub-platforms are credential-free)
- PasteSitesSource.Sweep() iterates across sub-platforms: dpaste.org, paste.ee, rentry.co, and hastebin.com (ix.io is excluded -- it has no search endpoint; see the action)
- Each sub-platform has: Name, SearchURL pattern, result link regex, and optional raw URL construction
- Sweep emits at least one Finding per platform when fixture data matches keywords
- ctx cancellation stops the sweep promptly
</behavior>
<action>
Create `pkg/recon/sources/pastesites.go` following the SandboxesSource multi-platform pattern from pkg/recon/sources/sandboxes.go:

- Define a `pastePlatform` struct: Name string, SearchPath string (with %s for the query), ResultLinkRegex string, RawPathTemplate string (optional, for fetching raw content), IsJSON bool
- Default platforms:
  1. dpaste: SearchPath="/search/?q=%s", result links matching `^/[A-Za-z0-9]+$`, raw via `/{id}/raw`
  2. paste.ee: SearchPath="/search?q=%s", result links matching `^/p/[A-Za-z0-9]+$`, raw via `/r/{id}`
  3. rentry.co: SearchPath="/search?q=%s", result links matching `^/[a-z0-9-]+$`, raw via `/{slug}/raw`
  4. ix.io: skipped -- it has no search endpoint, so it is excluded from the default list
  5. hastebin: SearchPath="/search?q=%s", result links matching `^/[a-z]+$`, raw via `/raw/{id}`

- Struct: `PasteSitesSource` with Platforms []pastePlatform, BaseURL string (test override), Registry, Limiters, Client fields
- Name() "pastesites", RateLimit() Every(3s), Burst() 1, RespectsRobots() true
- Enabled() always true
- Sweep(): for each platform, for each keyword from BuildQueries(registry, "pastesites"):
  1. Wait on the limiter
  2. GET `{platform base or BaseURL}{searchPath with keyword}`
  3. Parse the HTML and extract result links matching the platform regex
  4. For each result link: wait on the limiter, GET the raw content URL, read the body (256KB limit), and keyword-match against the registry
  5. Emit a Finding with Source=paste URL, SourceType="recon:pastesites", ProviderName from the keyword match
- Default platforms populated in a `defaultPastePlatforms()` function. Tests override Platforms to use httptest URLs.

Test: an httptest mux serving search HTML + raw content for each sub-platform. Verify at least one Finding per platform fixture. Verify SourceType="recon:pastesites" on all.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestPasteSites" -v -count=1</automated>
</verify>
<done>PasteSitesSource aggregates across multiple paste sites, keyword-matches content, and emits findings with the correct SourceType.</done>
</task>

</tasks>

<verification>
All paste sources compile and pass unit tests:
```
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestPastebin|TestGistPaste|TestPasteSites" -v -count=1
```
</verification>

<success_criteria>
- 3 new source files exist (pastebin.go, gistpaste.go, pastesites.go) with tests
- Each implements recon.ReconSource with a compile-time assertion
- PasteSitesSource covers 3+ paste sub-platforms
- Keyword matching uses the provider Registry for ProviderName population
- All tests pass
</success_criteria>

<output>
After completion, create `.planning/phases/11-osint_search_paste/11-02-SUMMARY.md`
</output>

221
.planning/phases/11-osint_search_paste/11-03-PLAN.md
Normal file
@@ -0,0 +1,221 @@
---
phase: 11-osint-search-paste
plan: 03
type: execute
wave: 2
depends_on: ["11-01", "11-02"]
files_modified:
  - pkg/recon/sources/register.go
  - pkg/recon/sources/register_test.go
  - pkg/recon/sources/integration_test.go
  - cmd/recon.go
autonomous: true
requirements: [RECON-DORK-01, RECON-DORK-02, RECON-DORK-03, RECON-PASTE-01]

must_haves:
  truths:
    - "RegisterAll wires all 8 new Phase 11 sources onto the recon engine alongside the 10 Phase 10 sources"
    - "cmd/recon.go reads Google/Bing/Yandex/Brave API keys from env vars and viper config"
    - "keyhunter recon list shows all 18 sources (10 Phase 10 + 8 Phase 11)"
    - "Integration test with httptest fixtures proves SweepAll emits findings from all 18 source types"
    - "Sources with missing credentials are registered but Enabled()==false"
  artifacts:
    - path: "pkg/recon/sources/register.go"
      provides: "RegisterAll extended with Phase 11 sources"
      contains: "GoogleDorkSource"
    - path: "pkg/recon/sources/register_test.go"
      provides: "Guardrail test asserting 18 sources registered"
      contains: "18"
    - path: "pkg/recon/sources/integration_test.go"
      provides: "SweepAll integration test covering all 18 sources"
      contains: "recon:google"
    - path: "cmd/recon.go"
      provides: "Credential wiring for search engine API keys"
      contains: "GoogleAPIKey"
  key_links:
    - from: "pkg/recon/sources/register.go"
      to: "pkg/recon/sources/google.go"
      via: "RegisterAll calls engine.Register(GoogleDorkSource)"
      pattern: "GoogleDorkSource"
    - from: "cmd/recon.go"
      to: "pkg/recon/sources/register.go"
      via: "SourcesConfig credential fields"
      pattern: "GoogleAPIKey|GoogleCX|BingAPIKey|YandexUser|YandexAPIKey|BraveAPIKey"
---

<objective>
Wire all 8 Phase 11 sources into RegisterAll, extend SourcesConfig with search engine credentials, update cmd/recon.go for env/viper credential lookup, and create the integration test proving all 18 sources work end-to-end via SweepAll.

Purpose: complete Phase 11 by connecting all new sources to the engine and proving the full 18-source sweep works.
Output: updated register.go, register_test.go, integration_test.go, cmd/recon.go.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/sources/register.go
@pkg/recon/sources/register_test.go
@pkg/recon/sources/integration_test.go
@cmd/recon.go

<interfaces>
From pkg/recon/sources/register.go (current):
```
type SourcesConfig struct {
	GitHubToken        string
	GitLabToken        string
	BitbucketToken     string
	BitbucketWorkspace string
	CodebergToken      string
	HuggingFaceToken   string
	KaggleUser         string
	KaggleKey          string
	Registry           *providers.Registry
	Limiters           *recon.LimiterRegistry
}
func RegisterAll(engine *recon.Engine, cfg SourcesConfig)
```
|
||||
|
||||
From cmd/recon.go (current):
|
||||
```go
|
||||
func buildReconEngine() *recon.Engine // constructs SourcesConfig, calls RegisterAll
|
||||
func firstNonEmpty(a, b string) string
|
||||
```
|

New sources from Plan 11-01 (to be registered):

```go
type GoogleDorkSource struct { APIKey, CX, BaseURL string; Registry; Limiters; client }
type BingDorkSource struct { APIKey, BaseURL string; Registry; Limiters; client }
type DuckDuckGoSource struct { BaseURL string; Registry; Limiters; Client }
type YandexSource struct { User, APIKey, BaseURL string; Registry; Limiters; client }
type BraveSource struct { APIKey, BaseURL string; Registry; Limiters; client }
```

New sources from Plan 11-02 (to be registered):

```go
type PastebinSource struct { BaseURL string; Registry; Limiters; Client }
type GistPasteSource struct { BaseURL string; Registry; Limiters; Client }
type PasteSitesSource struct { Platforms; BaseURL string; Registry; Limiters; Client }
```
</interfaces>
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: Extend SourcesConfig + RegisterAll + cmd/recon.go credential wiring</name>
<files>pkg/recon/sources/register.go, pkg/recon/sources/register_test.go, cmd/recon.go</files>
<behavior>
- SourcesConfig gains six new fields: GoogleAPIKey, GoogleCX, BingAPIKey, YandexUser, YandexAPIKey, BraveAPIKey
- RegisterAll registers 18 sources in total (10 from Phase 10 + 8 from Phase 11)
- RegisterAll with a nil engine remains a no-op
- TestRegisterAll_WiresAllEighteenSources asserts that eng.List() contains all 18 names, sorted
- TestRegisterAll_MissingCredsStillRegistered asserts 18 sources even with an empty config
- buildReconEngine reads: GOOGLE_API_KEY / recon.google.api_key, GOOGLE_CX / recon.google.cx, BING_API_KEY / recon.bing.api_key, YANDEX_USER / recon.yandex.user, YANDEX_API_KEY / recon.yandex.api_key, BRAVE_API_KEY / recon.brave.api_key
- The reconCmd Long description is updated to mention the Phase 11 sources
</behavior>
<action>
Update `pkg/recon/sources/register.go`:
- Add to SourcesConfig: GoogleAPIKey, GoogleCX, BingAPIKey, YandexUser, YandexAPIKey, BraveAPIKey (all string)
- Add the Phase 11 registrations to RegisterAll after the Phase 10 block:

```go
// Phase 11: Search engine dorking sources.
engine.Register(&GoogleDorkSource{APIKey: cfg.GoogleAPIKey, CX: cfg.GoogleCX, Registry: reg, Limiters: lim})
engine.Register(&BingDorkSource{APIKey: cfg.BingAPIKey, Registry: reg, Limiters: lim})
engine.Register(&DuckDuckGoSource{Registry: reg, Limiters: lim})
engine.Register(&YandexSource{User: cfg.YandexUser, APIKey: cfg.YandexAPIKey, Registry: reg, Limiters: lim})
engine.Register(&BraveSource{APIKey: cfg.BraveAPIKey, Registry: reg, Limiters: lim})

// Phase 11: Paste site sources.
engine.Register(&PastebinSource{Registry: reg, Limiters: lim})
engine.Register(&GistPasteSource{Registry: reg, Limiters: lim})
engine.Register(&PasteSitesSource{Registry: reg, Limiters: lim})
```

- Update the doc comment on RegisterAll to say "Phase 10 + Phase 11" and a total of "18 sources"

Update `pkg/recon/sources/register_test.go`:
- TestRegisterAll_WiresAllEighteenSources: want list = the 18 names sorted: ["bing", "bitbucket", "brave", "codeberg", "codesandbox", "duckduckgo", "gist", "gistpaste", "github", "gitlab", "google", "huggingface", "kaggle", "pastebin", "pastesites", "replit", "sandboxes", "yandex"]
- TestRegisterAll_MissingCredsStillRegistered: assert n == 18

Update `cmd/recon.go`:
- Add to the SourcesConfig construction in buildReconEngine():
  GoogleAPIKey: firstNonEmpty(os.Getenv("GOOGLE_API_KEY"), viper.GetString("recon.google.api_key")),
  GoogleCX: firstNonEmpty(os.Getenv("GOOGLE_CX"), viper.GetString("recon.google.cx")),
  BingAPIKey: firstNonEmpty(os.Getenv("BING_API_KEY"), viper.GetString("recon.bing.api_key")),
  YandexUser: firstNonEmpty(os.Getenv("YANDEX_USER"), viper.GetString("recon.yandex.user")),
  YandexAPIKey: firstNonEmpty(os.Getenv("YANDEX_API_KEY"), viper.GetString("recon.yandex.api_key")),
  BraveAPIKey: firstNonEmpty(os.Getenv("BRAVE_API_KEY"), viper.GetString("recon.brave.api_key")),
- Update reconCmd.Long to list the Phase 11 sources
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestRegisterAll" -v -count=1 && go build ./cmd/...</automated>
</verify>
<done>RegisterAll registers 18 sources. cmd/recon.go compiles with credential wiring. Guardrail tests pass.</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: Integration test -- SweepAll across all 18 sources</name>
<files>pkg/recon/sources/integration_test.go</files>
<behavior>
- TestIntegration_AllSources_SweepAll registers all 18 sources with BaseURL overrides pointing at an httptest mux
- SweepAll returns findings from all 18 SourceType values
- Each SourceType (recon:github, recon:gitlab, ..., recon:google, recon:bing, recon:duckduckgo, recon:yandex, recon:brave, recon:pastebin, recon:gistpaste, recon:pastesites) has at least one finding
</behavior>
<action>
Update `pkg/recon/sources/integration_test.go`:
- Extend the existing httptest mux with handlers for the 8 new sources:
  - Google Custom Search: mux.HandleFunc("/customsearch/v1", ...) serves JSON `{"items":[{"link":"https://pastebin.com/abc123","title":"leak","snippet":"sk-proj-xxx"}]}`
  - Bing Web Search: mux.HandleFunc("/v7.0/search", ...) serves JSON `{"webPages":{"value":[{"url":"https://example.com/leak","name":"leak"}]}}`
  - DuckDuckGo HTML: mux.HandleFunc("/html/", ...) serves HTML with `<a class="result__a" href="https://example.com/ddg-leak">result</a>`
  - Yandex XML: mux.HandleFunc("/search/xml", ...) serves XML `<yandexsearch><response><results><grouping><group><doc><url>https://example.com/yandex-leak</url></doc></group></grouping></results></response></yandexsearch>`
  - Brave Search: mux.HandleFunc("/res/v1/web/search", ...) serves JSON `{"web":{"results":[{"url":"https://example.com/brave-leak","title":"leak"}]}}`
  - Pastebin search + raw: mux.HandleFunc("/pastebin-search", ...) serves HTML with paste links; mux.HandleFunc("/pastebin-raw/", ...) serves raw content containing "sk-proj-ABC"
  - GistPaste search + raw: mux.HandleFunc("/gistpaste-search", ...) serves HTML with gist links; mux.HandleFunc("/gistpaste-raw/", ...) serves raw content containing the keyword
  - PasteSites: mux.HandleFunc("/pastesites-search", ...) + mux.HandleFunc("/pastesites-raw/", ...) follow the same pattern
- Register all 18 sources on the engine with BaseURL=srv.URL and fake credentials for the API-backed sources, then call eng.SweepAll and assert the byType map has all 18 SourceType keys
- Update wantTypes to include: "recon:google", "recon:bing", "recon:duckduckgo", "recon:yandex", "recon:brave", "recon:pastebin", "recon:gistpaste", "recon:pastesites"
- Keep the existing 10 Phase 10 source fixtures and registrations intact
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestIntegration_AllSources" -v -count=1 -timeout=60s</automated>
</verify>
<done>Integration test proves SweepAll emits findings from all 18 sources. Full Phase 11 wiring confirmed end-to-end.</done>
</task>

</tasks>

<verification>
Full Phase 11 verification:

```bash
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -v -count=1 -timeout=120s && go build ./cmd/...
```
</verification>

<success_criteria>
- RegisterAll registers 18 sources (10 Phase 10 + 8 Phase 11)
- cmd/recon.go compiles with all credential wiring
- The integration test passes with all 18 SourceTypes emitting findings
- `go build ./cmd/...` succeeds
- The guardrail test asserts the exact 18-source name list
</success_criteria>

<output>
After completion, create `.planning/phases/11-osint_search_paste/11-03-SUMMARY.md`
</output>