| phase | plan | type | wave | depends_on | files_modified | autonomous | requirements | must_haves |
|---|---|---|---|---|---|---|---|---|
| 11-osint-search-paste | 02 | execute | 1 | | | true | | |
Purpose: RECON-PASTE-01 -- detect API key leaks across public paste sites. Output: Three source files + tests covering paste site scanning.
<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/gist.go (reference: Phase 10 GistSource uses the GitHub API -- this plan's GistPasteSource is a scraping alternative)
@pkg/recon/sources/replit.go (reference pattern for an HTML-scraping source)
@pkg/recon/sources/sandboxes.go (reference pattern for a multi-platform aggregator)

From pkg/recon/source.go:

```go
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```

From pkg/recon/sources/httpclient.go:

```go
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```

From pkg/recon/sources/gist.go (existing Phase 10 GistSource -- avoid a name collision):

```go
type GistSource struct { ... } // Name() == "gist" -- already taken
func (s *GistSource) keywordSet() map[string]string // pattern to reuse
```
Create `pkg/recon/sources/gistpaste.go`:
- Struct: `GistPasteSource` with BaseURL, Registry, Limiters, Client fields
- Name() "gistpaste", RateLimit() Every(3s), Burst() 1, RespectsRobots() true
- Enabled() always true
- Sweep(): iterate BuildQueries(registry, "gistpaste"). For each keyword:
  1. GET `{BaseURL}/search?q={url.QueryEscape(keyword)}` (gist.github.com search)
  2. Parse the HTML for gist links matching the `^/[^/]+/[a-f0-9]+$` pattern
  3. For each gist link, construct the raw URL `{BaseURL}{gistPath}/raw` and fetch its content (256KB limit)
  4. Keyword-match the content and emit Finding with SourceType="recon:gistpaste"
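The link-extraction and raw-URL steps above can be sketched as follows. The path regex and the `{gistPath}/raw` construction come from the spec; `extractGistLinks` and the naive `href` scan are hypothetical, and a real implementation might parse with golang.org/x/net/html instead:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// gistLinkRe matches gist paths like /someuser/0123abcd (user segment +
// hex id), per the plan's `^/[^/]+/[a-f0-9]+$` pattern.
var gistLinkRe = regexp.MustCompile(`^/[^/]+/[a-f0-9]+$`)

// extractGistLinks pulls href values out of search-result HTML and keeps
// only paths that look like gist links. (Hypothetical helper name.)
func extractGistLinks(html string) []string {
	hrefRe := regexp.MustCompile(`href="([^"]+)"`)
	var links []string
	for _, m := range hrefRe.FindAllStringSubmatch(html, -1) {
		if path := m[1]; gistLinkRe.MatchString(path) {
			links = append(links, path)
		}
	}
	return links
}

func main() {
	html := `<a href="/alice/deadbeef">gist</a> <a href="/about">about</a>`
	for _, p := range extractGistLinks(html) {
		// Raw content URL per the spec: {BaseURL}{gistPath}/raw
		raw := strings.TrimSuffix("https://gist.github.com", "/") + p + "/raw"
		fmt.Println(raw)
	}
}
```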
Tests: httptest servers serving HTML search results + raw paste content fixtures. Verify findings emitted with correct SourceType, Source URL, and ProviderName.
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestPastebin|TestGistPaste" -v -count=1
PastebinSource and GistPasteSource compile, pass all tests, handle ctx cancellation.
Task 2: PasteSitesSource (multi-paste aggregator)
pkg/recon/sources/pastesites.go, pkg/recon/sources/pastesites_test.go
- PasteSitesSource.Name() == "pastesites"
- PasteSitesSource.RateLimit() == rate.Every(3*time.Second)
- PasteSitesSource.RespectsRobots() == true
- PasteSitesSource.Enabled() always true (all credential-free)
- PasteSitesSource.Sweep() iterates across sub-platforms: dpaste.org, paste.ee, rentry.co, hastebin.com (ix.io was considered but dropped -- it has no search endpoint)
- Each sub-platform has: Name, SearchURL pattern, result link regex, and optional raw URL construction
- Sweep emits at least one Finding per platform when fixture data matches keywords
- ctx cancellation stops the sweep promptly
Create `pkg/recon/sources/pastesites.go` following the SandboxesSource multi-platform pattern from pkg/recon/sources/sandboxes.go:
- Define `pastePlatform` struct: Name string, SearchPath string (with %s for query), ResultLinkRegex string, RawPathTemplate string (optional, for fetching raw content), IsJSON bool
- Default platforms:
1. dpaste: SearchPath="/search/?q=%s", result links matching `^/[A-Za-z0-9]+$`, raw via `/{id}/raw`
2. paste.ee: SearchPath="/search?q=%s", result links matching `^/p/[A-Za-z0-9]+$`, raw via `/r/{id}`
3. rentry.co: SearchPath="/search?q=%s", result links matching `^/[a-z0-9-]+$`, raw via `/{slug}/raw`
4. ix.io: skipped -- ix.io has no search endpoint, so it is excluded from the defaults
5. hastebin: SearchPath="/search?q=%s", result links matching `^/[a-z]+$`, raw via `/raw/{id}`
- Struct: `PasteSitesSource` with Platforms []pastePlatform, BaseURL string (test override), Registry, Limiters, Client fields
- Name() "pastesites", RateLimit() Every(3s), Burst() 1, RespectsRobots() true
- Enabled() always true
- Sweep(): For each platform, for each keyword from BuildQueries(registry, "pastesites"):
1. Wait limiter
2. GET `{platform base or BaseURL}{searchPath with keyword}`
3. Parse HTML, extract result links matching platform regex
4. For each result link: wait limiter, GET raw content URL, read body (256KB limit), keyword-match against registry
5. Emit Finding with Source=paste URL, SourceType="recon:pastesites", ProviderName from keyword match
- Default platforms populated in a `defaultPastePlatforms()` function. Tests override Platforms to use httptest URLs.
Test: httptest mux serving search HTML + raw content for each sub-platform. Verify at least one Finding per platform fixture. Verify SourceType="recon:pastesites" on all.
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestPasteSites" -v -count=1
PasteSitesSource aggregates across multiple paste sites, keyword-matches content, emits findings with correct SourceType.
All paste sources compile and pass unit tests:
```bash
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestPastebin|TestGistPaste|TestPasteSites" -v -count=1
```
<success_criteria>
- 3 new source files exist (pastebin.go, gistpaste.go, pastesites.go) with tests
- Each implements recon.ReconSource with compile-time assertion
- PasteSitesSource covers 3+ paste sub-platforms
- Keyword matching uses provider Registry for ProviderName population
- All tests pass </success_criteria>