---
phase: 11-osint-search-paste
plan: 02
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/sources/pastebin.go
  - pkg/recon/sources/pastebin_test.go
  - pkg/recon/sources/gistpaste.go
  - pkg/recon/sources/gistpaste_test.go
  - pkg/recon/sources/pastesites.go
  - pkg/recon/sources/pastesites_test.go
autonomous: true
requirements: [RECON-PASTE-01]

must_haves:
  truths:
    - "PastebinSource scrapes Pastebin search results and emits findings for pastes containing provider keywords"
    - "GistPasteSource searches public GitHub Gists via unauthenticated scraping (distinct from the Phase 10 GistSource, which uses the API)"
    - "PasteSitesSource aggregates results from dpaste, paste.ee, rentry.co, hastebin, and similar sites"
    - "All paste sources feed raw content through keyword matching against the provider registry"
    - "Missing credentials disable sources that need them; credential-free sources are always enabled"
  artifacts:
    - path: "pkg/recon/sources/pastebin.go"
      provides: "PastebinSource implementing recon.ReconSource"
      contains: "func (s *PastebinSource) Sweep"
    - path: "pkg/recon/sources/gistpaste.go"
      provides: "GistPasteSource implementing recon.ReconSource"
      contains: "func (s *GistPasteSource) Sweep"
    - path: "pkg/recon/sources/pastesites.go"
      provides: "PasteSitesSource implementing recon.ReconSource with multi-site sub-platform pattern"
      contains: "func (s *PasteSitesSource) Sweep"
  key_links:
    - from: "pkg/recon/sources/pastebin.go"
      to: "pkg/recon/sources/httpclient.go"
      via: "sources.Client for HTTP with retry"
      pattern: "client\\.Do"
    - from: "pkg/recon/sources/pastesites.go"
      to: "providers.Registry"
      via: "keyword matching on paste content"
      pattern: "keywordSet|BuildQueries"
---

<objective>
Implement three paste-site ReconSource types: PastebinSource, GistPasteSource, and PasteSitesSource (a multi-site aggregator for dpaste, paste.ee, rentry.co, hastebin, etc.).

Purpose: RECON-PASTE-01 -- detect API key leaks across public paste sites.
Output: Three source files + tests covering paste site scanning.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/gist.go (reference: Phase 10 GistSource uses GitHub API -- this plan's GistPasteSource is a scraping alternative)
@pkg/recon/sources/replit.go (reference pattern for HTML scraping source)
@pkg/recon/sources/sandboxes.go (reference pattern for multi-platform aggregator)

<interfaces>
From pkg/recon/source.go:
```go
type ReconSource interface {
    Name() string
    RateLimit() rate.Limit
    Burst() int
    RespectsRobots() bool
    Enabled(cfg Config) bool
    Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```

From pkg/recon/sources/httpclient.go:
```go
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```

From pkg/recon/sources/gist.go (existing Phase 10 GistSource -- avoid name collision):
```go
type GistSource struct { ... } // Name() == "gist" -- already taken
func (s *GistSource) keywordSet() map[string]string // pattern to reuse
```
</interfaces>
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: PastebinSource + GistPasteSource</name>
<files>pkg/recon/sources/pastebin.go, pkg/recon/sources/pastebin_test.go, pkg/recon/sources/gistpaste.go, pkg/recon/sources/gistpaste_test.go</files>
<behavior>
- PastebinSource.Name() == "pastebin"
- PastebinSource.RateLimit() == rate.Every(3*time.Second) (conservative -- Pastebin scraping)
- PastebinSource.Burst() == 1
- PastebinSource.RespectsRobots() == true (HTML scraper)
- PastebinSource.Enabled() always true (credential-free scraping of Pastebin's public search)
- PastebinSource.Sweep(): For each provider keyword, GET Pastebin's search page at `{BaseURL}/search?q={keyword}` and parse the result links. For each paste ID found, fetch the raw paste content via the `/raw/{paste_id}` endpoint, scan the content for keyword matches, and emit a Finding with Source=paste URL, SourceType="recon:pastebin", and ProviderName from the match.
- GistPasteSource.Name() == "gistpaste" (not "gist" -- that's Phase 10's API source)
- GistPasteSource.RateLimit() == rate.Every(3*time.Second)
- GistPasteSource.Burst() == 1
- GistPasteSource.RespectsRobots() == true (HTML scraper)
- GistPasteSource.Enabled() always true (credential-free)
- GistPasteSource.Sweep(): Scrape `gist.github.com/search?q={keyword}` (public search, no auth needed), parse the HTML for gist links, fetch the raw content, and keyword-match it against the registry.
</behavior>
<action>
Create `pkg/recon/sources/pastebin.go`:
- Struct: `PastebinSource` with BaseURL, Registry, Limiters, Client fields
- Name() "pastebin", RateLimit() Every(3s), Burst() 1, RespectsRobots() true
- Enabled() always true
- Sweep(): Use a two-phase approach:
  Phase A: Search -- iterate BuildQueries(registry, "pastebin"). For each keyword, GET `{BaseURL}/search?q={url.QueryEscape(keyword)}` (Pastebin's own search). Parse the HTML for paste links matching the `^/[A-Za-z0-9]{8}$` pattern (Pastebin paste IDs are 8 alphanumeric chars). Collect unique paste IDs.
  Phase B: Fetch+Scan -- for each paste ID: wait on the limiter, GET `{BaseURL}/raw/{pasteID}`, read the body (limit 256KB), and scan the content against `pastebinKeywordSet(reg)`. If any keyword matches, emit a Finding with Source=`{BaseURL}/{pasteID}`, SourceType="recon:pastebin", and ProviderName from the matched keyword.
- Helper: `pastebinKeywordSet(reg)` returning map[string]string (keyword -> provider name), following the same pattern as GistSource.keywordSet().
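
The two Sweep phases described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `extractPasteIDs` and `matchProviders` are hypothetical helper names, the `href` regex stands in for whatever HTML parsing the existing scrapers use, and the keyword map simply mirrors the keyword -> provider name shape described for `pastebinKeywordSet`.

```go
// Sketch only. package main is used so the file compiles standalone;
// in the repository this logic would live in package sources.
package main

import (
	"regexp"
	"sort"
	"strings"
)

// hrefRe pulls href attribute values out of anchor tags. A regex is a
// simplification here; a real scraper may prefer an HTML tokenizer.
var hrefRe = regexp.MustCompile(`href="([^"]+)"`)

// pasteIDRe matches the link form from the plan: "/" plus 8 alphanumerics.
var pasteIDRe = regexp.MustCompile(`^/[A-Za-z0-9]{8}$`)

// extractPasteIDs (hypothetical helper, Phase A) returns the unique paste IDs
// found in a search results page, in first-seen order.
func extractPasteIDs(searchHTML string) []string {
	seen := make(map[string]bool)
	var ids []string
	for _, m := range hrefRe.FindAllStringSubmatch(searchHTML, -1) {
		path := m[1]
		if !pasteIDRe.MatchString(path) {
			continue
		}
		id := strings.TrimPrefix(path, "/")
		if !seen[id] {
			seen[id] = true
			ids = append(ids, id)
		}
	}
	return ids
}

// matchProviders (hypothetical helper, Phase B) returns the sorted, unique
// provider names whose keywords occur in content. keywords maps
// keyword -> provider name.
func matchProviders(content string, keywords map[string]string) []string {
	seen := make(map[string]bool)
	var providers []string
	for kw, provider := range keywords {
		if kw != "" && strings.Contains(content, kw) && !seen[provider] {
			seen[provider] = true
			providers = append(providers, provider)
		}
	}
	sort.Strings(providers) // map iteration order is random; sort for stable output
	return providers
}
```

Note that the 8-character pattern also accepts any other 8-character alphanumeric path on the site, so Phase B should tolerate raw fetches that return non-paste content or a 404.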

Create `pkg/recon/sources/gistpaste.go`:
- Struct: `GistPasteSource` with BaseURL, Registry, Limiters, Client fields
- Name() "gistpaste", RateLimit() Every(3s), Burst() 1, RespectsRobots() true
- Enabled() always true
- Sweep(): iterate BuildQueries(registry, "gistpaste"). For each keyword, GET `{BaseURL}/search?q={url.QueryEscape(keyword)}` (gist.github.com search). Parse HTML for gist links matching `^/[^/]+/[a-f0-9]+$` pattern. For each gist link, construct raw URL `{BaseURL}{gistPath}/raw` and fetch content (limit 256KB). Keyword-match and emit Finding with SourceType="recon:gistpaste".
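
As a rough sketch of the link-extraction and raw-URL steps in the GistPasteSource Sweep above, assuming a regex-based `href` scan is acceptable: `extractGistPaths` and `rawGistURL` are hypothetical names, not functions that exist in the codebase.

```go
// Sketch only. package main is used so the file compiles standalone;
// in the repository this logic would live in package sources.
package main

import (
	"regexp"
	"strings"
)

// gistHrefRe pulls href values out of the search results HTML.
var gistHrefRe = regexp.MustCompile(`href="([^"]+)"`)

// gistPathRe is the link pattern from the plan: /{user}/{hex gist id}.
var gistPathRe = regexp.MustCompile(`^/[^/]+/[a-f0-9]+$`)

// extractGistPaths (hypothetical helper) returns unique gist paths in
// first-seen order.
func extractGistPaths(searchHTML string) []string {
	seen := make(map[string]bool)
	var paths []string
	for _, m := range gistHrefRe.FindAllStringSubmatch(searchHTML, -1) {
		p := m[1]
		if gistPathRe.MatchString(p) && !seen[p] {
			seen[p] = true
			paths = append(paths, p)
		}
	}
	return paths
}

// rawGistURL (hypothetical helper) builds {BaseURL}{gistPath}/raw, tolerating
// a trailing slash on baseURL.
func rawGistURL(baseURL, gistPath string) string {
	return strings.TrimRight(baseURL, "/") + gistPath + "/raw"
}
```

Keeping BaseURL as a parameter is what lets the tests point the same code at an httptest server instead of gist.github.com.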

Tests: use httptest servers that serve HTML search results and raw paste content fixtures. Verify that findings are emitted with the correct SourceType, Source URL, and ProviderName.
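
One possible shape for these fixtures, shown as a hedged sketch rather than the project's actual test helpers: `newPasteFixture`, `fetchLimited`, and `fetchFixturePaste` are hypothetical names, the paste ID and key string are made-up fixture data, and a plain `*http.Client` stands in for the repository's `sources.Client`.

```go
// Sketch only. package main is used so the file compiles standalone;
// in the repository this would be part of a _test.go file in package sources.
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// maxBody mirrors the 256KB read limit from the plan.
const maxBody = 256 * 1024

// newPasteFixture (hypothetical) serves a search page that links to one paste,
// plus that paste's raw body. The caller must Close the returned server.
func newPasteFixture() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/search", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `<html><body><a href="/AbCd1234">hit</a></body></html>`)
	})
	mux.HandleFunc("/raw/AbCd1234", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "OPENAI_API_KEY=sk-proj-example")
	})
	return httptest.NewServer(mux)
}

// fetchLimited (hypothetical) GETs url and reads at most maxBody bytes.
func fetchLimited(ctx context.Context, client *http.Client, url string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d for %s", resp.StatusCode, url)
	}
	b, err := io.ReadAll(io.LimitReader(resp.Body, maxBody))
	return string(b), err
}

// fetchFixturePaste shows the round trip a test body might perform.
func fetchFixturePaste() (string, error) {
	srv := newPasteFixture()
	defer srv.Close()
	return fetchLimited(context.Background(), srv.Client(), srv.URL+"/raw/AbCd1234")
}
```

In a real test, `srv.URL` would be assigned to the source's BaseURL field so that Sweep exercises both the search and raw handlers.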
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestPastebin|TestGistPaste" -v -count=1</automated>
</verify>
<done>PastebinSource and GistPasteSource compile, pass all tests, handle ctx cancellation.</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: PasteSitesSource (multi-paste aggregator)</name>
<files>pkg/recon/sources/pastesites.go, pkg/recon/sources/pastesites_test.go</files>
<behavior>
- PasteSitesSource.Name() == "pastesites"
- PasteSitesSource.RateLimit() == rate.Every(3*time.Second)
- PasteSitesSource.Burst() == 1
- PasteSitesSource.RespectsRobots() == true
- PasteSitesSource.Enabled() always true (all credential-free)
- PasteSitesSource.Sweep() iterates across sub-platforms: dpaste.org, paste.ee, rentry.co, hastebin.com (ix.io is excluded because it has no search)
- Each sub-platform has: Name, SearchURL pattern, result link regex, and optional raw URL construction
- Sweep emits at least one Finding per platform when fixture data matches keywords
- ctx cancellation stops the sweep promptly
</behavior>
<action>
Create `pkg/recon/sources/pastesites.go` following the SandboxesSource multi-platform pattern from pkg/recon/sources/sandboxes.go:

- Define `pastePlatform` struct: Name string, SearchPath string (with %s for query), ResultLinkRegex string, RawPathTemplate string (optional, for fetching raw content), IsJSON bool
- Default platforms (ix.io is omitted because it has no search):
  1. dpaste: SearchPath="/search/?q=%s", result links matching `^/[A-Za-z0-9]+$`, raw via `/{id}/raw`
  2. paste.ee: SearchPath="/search?q=%s", result links matching `^/p/[A-Za-z0-9]+$`, raw via `/r/{id}`
  3. rentry.co: SearchPath="/search?q=%s", result links matching `^/[a-z0-9-]+$`, raw via `/{slug}/raw`
  4. hastebin: SearchPath="/search?q=%s", result links matching `^/[a-z]+$`, raw via `/raw/{id}`

- Struct: `PasteSitesSource` with Platforms []pastePlatform, BaseURL string (test override), Registry, Limiters, Client fields
- Name() "pastesites", RateLimit() Every(3s), Burst() 1, RespectsRobots() true
- Enabled() always true
- Sweep(): For each platform, for each keyword from BuildQueries(registry, "pastesites"):
  1. Wait on the limiter
  2. GET `{platform base or BaseURL}{searchPath with keyword}`
  3. Parse the HTML and extract result links matching the platform regex
  4. For each result link: wait on the limiter, GET the raw content URL, read the body (256KB limit), keyword-match against the registry
  5. Emit a Finding with Source=paste URL, SourceType="recon:pastesites", ProviderName from the keyword match
- Default platforms are populated in a `defaultPastePlatforms()` function. Tests override Platforms to use httptest URLs.
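
The platform table above might translate into something like the sketch below. The struct fields and the four entries are copied from this plan and have not been checked against the live sites; the `{id}` placeholder syntax and the `searchURL` and `rawURL` methods are assumptions introduced for illustration (rentry's `{slug}` is written as `{id}` so one substitution rule covers every platform).

```go
// Sketch only. package main is used so the file compiles standalone;
// in the repository this logic would live in package sources.
package main

import (
	"net/url"
	"regexp"
	"strings"
)

// pastePlatform describes one searchable paste site.
type pastePlatform struct {
	Name            string
	SearchPath      string // contains %s for the escaped query
	ResultLinkRegex string // matched against each href on the results page
	RawPathTemplate string // "{id}" is replaced with the paste ID (assumed syntax)
	IsJSON          bool
}

// defaultPastePlatforms lists the platforms from the plan; ix.io is omitted.
func defaultPastePlatforms() []pastePlatform {
	return []pastePlatform{
		{Name: "dpaste", SearchPath: "/search/?q=%s", ResultLinkRegex: `^/[A-Za-z0-9]+$`, RawPathTemplate: "/{id}/raw"},
		{Name: "paste.ee", SearchPath: "/search?q=%s", ResultLinkRegex: `^/p/[A-Za-z0-9]+$`, RawPathTemplate: "/r/{id}"},
		{Name: "rentry.co", SearchPath: "/search?q=%s", ResultLinkRegex: `^/[a-z0-9-]+$`, RawPathTemplate: "/{id}/raw"},
		{Name: "hastebin", SearchPath: "/search?q=%s", ResultLinkRegex: `^/[a-z]+$`, RawPathTemplate: "/raw/{id}"},
	}
}

// searchURL (hypothetical) fills the query placeholder with the escaped keyword.
func (p pastePlatform) searchURL(baseURL, keyword string) string {
	path := strings.Replace(p.SearchPath, "%s", url.QueryEscape(keyword), 1)
	return strings.TrimRight(baseURL, "/") + path
}

// rawURL (hypothetical) maps a result link to its raw-content URL. The paste
// ID is taken to be the last path segment of the link. It returns false when
// the link does not match the platform's regex. The regex is compiled per
// call for brevity; real code would compile it once.
func (p pastePlatform) rawURL(baseURL, link string) (string, bool) {
	re, err := regexp.Compile(p.ResultLinkRegex)
	if err != nil || !re.MatchString(link) {
		return "", false
	}
	id := link[strings.LastIndex(link, "/")+1:]
	raw := strings.ReplaceAll(p.RawPathTemplate, "{id}", id)
	return strings.TrimRight(baseURL, "/") + raw, true
}
```

Because every platform shares one BaseURL override in tests, the httptest mux needs distinct paths per platform fixture; the differing link patterns (`/p/...` for paste.ee, for example) help keep them apart.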

Test: use an httptest mux that serves search HTML and raw content for each sub-platform. Verify that each platform fixture yields at least one Finding and that every Finding has SourceType="recon:pastesites".
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestPasteSites" -v -count=1</automated>
</verify>
<done>PasteSitesSource aggregates across multiple paste sites, keyword-matches content, emits findings with correct SourceType.</done>
</task>

</tasks>

<verification>
All paste sources compile and pass unit tests:
```bash
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestPastebin|TestGistPaste|TestPasteSites" -v -count=1
```
</verification>

<success_criteria>
- 3 new source files exist (pastebin.go, gistpaste.go, pastesites.go) with tests
- Each implements recon.ReconSource with compile-time assertion
- PasteSitesSource covers 3+ paste sub-platforms
- Keyword matching uses provider Registry for ProviderName population
- All tests pass
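
For reference, the compile-time assertion mentioned in the criteria is the usual blank-identifier idiom. The sketch below uses a reduced stand-in interface so it compiles without the repository: `reconSource`, `Finding`, and `Config` here are simplified placeholders for the real `recon` types, and `RateLimit()` is left out only to avoid the `golang.org/x/time/rate` dependency.

```go
// Sketch only. package main is used so the file compiles standalone;
// in the repository the assertion would target recon.ReconSource.
package main

import "context"

// Finding and Config are simplified placeholders for the recon package types.
type Finding struct{ Source, SourceType, ProviderName string }
type Config struct{}

// reconSource is a reduced stand-in for recon.ReconSource (RateLimit omitted).
type reconSource interface {
	Name() string
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}

// PastebinSource is trimmed to a single field for this sketch.
type PastebinSource struct{ BaseURL string }

// Compile-time assertion: the build fails if *PastebinSource stops
// satisfying the interface.
var _ reconSource = (*PastebinSource)(nil)

func (s *PastebinSource) Name() string            { return "pastebin" }
func (s *PastebinSource) Burst() int              { return 1 }
func (s *PastebinSource) RespectsRobots() bool    { return true }
func (s *PastebinSource) Enabled(cfg Config) bool { return true }

// Sweep is a placeholder that only reports context cancellation.
func (s *PastebinSource) Sweep(ctx context.Context, query string, out chan<- Finding) error {
	return ctx.Err()
}
```

The same one-line `var _ ... = (*T)(nil)` declaration would be repeated for GistPasteSource and PasteSitesSource.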
</success_criteria>

<output>
After completion, create `.planning/phases/11-osint_search_paste/11-02-SUMMARY.md`
</output>