---
phase: 11-osint-search-paste
plan: 02
subsystem: recon
tags: [pastebin, gist, paste-sites, scraping, osint]

requires:
  - phase: 10-osint-code-hosting
    provides: ReconSource interface, shared HTTP client, extractAnchorHrefs helper, BuildQueries
provides:
  - PastebinSource for pastebin.com search+raw scanning
  - GistPasteSource for gist.github.com unauthenticated search scraping
  - PasteSitesSource multi-platform aggregator (dpaste, paste.ee, rentry, hastebin)
affects: [11-03, recon-registration, recon-engine]

tech-stack:
  added: []
  patterns: [two-phase search+raw-fetch for paste sources, multi-platform aggregator reuse from sandboxes]

key-files:
  created:
    - pkg/recon/sources/pastebin.go
    - pkg/recon/sources/pastebin_test.go
    - pkg/recon/sources/gistpaste.go
    - pkg/recon/sources/gistpaste_test.go
    - pkg/recon/sources/pastesites.go
    - pkg/recon/sources/pastesites_test.go
  modified: []

key-decisions:
  - "Two-phase approach for all paste sources: search HTML for links, then fetch raw content and keyword-match"
  - "PasteSitesSource reuses SandboxesSource multi-platform pattern with pastePlatform struct"
  - "GistPasteSource named 'gistpaste' to avoid collision with Phase 10 GistSource ('gist')"

patterns-established:
  - "Paste source pattern: search page -> extract links -> fetch raw -> keyword match -> emit finding"

requirements-completed: [RECON-PASTE-01]

duration: 5min
completed: 2026-04-06
---

# Phase 11 Plan 02: Paste Site Sources Summary

**Three paste site ReconSources implementing two-phase search+raw-fetch with keyword matching against provider registry**

## What Was Built

### PastebinSource (`pkg/recon/sources/pastebin.go`)

- Searches pastebin.com for provider keywords, extracts 8-char paste IDs from HTML
- Fetches `/raw/{pasteID}` content (256KB cap), matches against provider keyword set
- Emits findings with SourceType="recon:pastebin" and ProviderName from matched keyword
- Rate: Every(3s), Burst 1, credential-free, respects robots.txt

### GistPasteSource (`pkg/recon/sources/gistpaste.go`)

- Scrapes gist.github.com public search (no auth needed, distinct from Phase 10 API-based GistSource)
- Extracts gist links matching `//` pattern, fetches `{gistPath}/raw`
- Keyword-matches raw content, emits findings with SourceType="recon:gistpaste"
- Rate: Every(3s), Burst 1, credential-free

### PasteSitesSource (`pkg/recon/sources/pastesites.go`)

- Multi-platform aggregator following SandboxesSource pattern
- Covers 4 paste sub-platforms: dpaste.org, paste.ee, rentry.co, hastebin.com
- Each platform has configurable SearchPath, ResultLinkRegex, and RawPathTemplate
- Per-platform error isolation: failures logged and skipped without aborting others
- Findings tagged with `platform=` in KeyMasked field

## Test Coverage

9 tests total across 3 test files:

- Sweep with httptest fixtures verifying finding extraction and keyword matching
- Name/rate/burst/robots/enabled metadata assertions
- Context cancellation handling

## Deviations from Plan

None - plan executed exactly as written.

## Commits

| Task | Commit | Description |
|------|--------|-------------|
| 1 | 3c500b5 | PastebinSource + GistPasteSource with tests |
| 2 | ed148d4 | PasteSitesSource multi-paste aggregator with tests |

## Self-Check: PASSED

All 7 files found. Both commit hashes verified in git log.
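
For reference, the paste source pattern recorded above (search page -> extract links -> fetch raw -> keyword match -> emit finding) might look roughly like the following. This is a minimal sketch and not the code in `pkg/recon/sources/pastebin.go`: the `Finding` and `keyword` types, the `sweep`, `fetch`, `extractPasteIDs`, and `matchKeyword` helpers, the `/search?q=` path, and the href regex are all assumptions made for illustration. Only the 8-character paste ID, the `/raw/{pasteID}` path, the 256KB cap, and the `recon:pastebin` source type come from this summary. Rate limiting and robots.txt handling are left out.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"regexp"
	"strings"
)

// Finding is a hypothetical stand-in for the project's finding type.
type Finding struct {
	SourceType   string
	ProviderName string
	URL          string
}

// keyword pairs a search string with the provider it identifies (hypothetical).
type keyword struct{ Text, Provider string }

// pasteIDRe is an assumed link shape: href="/XXXXXXXX" with 8 alphanumerics.
var pasteIDRe = regexp.MustCompile(`href="/([A-Za-z0-9]{8})"`)

// maxRawBytes mirrors the 256KB cap on raw paste content.
const maxRawBytes = 256 * 1024

// extractPasteIDs returns unique paste IDs in order of first appearance.
func extractPasteIDs(html string) []string {
	seen := map[string]bool{}
	var ids []string
	for _, m := range pasteIDRe.FindAllStringSubmatch(html, -1) {
		if !seen[m[1]] {
			seen[m[1]] = true
			ids = append(ids, m[1])
		}
	}
	return ids
}

// matchKeyword returns the provider of the first keyword contained in body.
func matchKeyword(body string, kws []keyword) (string, bool) {
	for _, k := range kws {
		if strings.Contains(body, k.Text) {
			return k.Provider, true
		}
	}
	return "", false
}

// fetch GETs target and returns at most maxRawBytes of the response body.
func fetch(ctx context.Context, c *http.Client, target string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, target, nil)
	if err != nil {
		return "", err
	}
	resp, err := c.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: status %d", target, resp.StatusCode)
	}
	b, err := io.ReadAll(io.LimitReader(resp.Body, maxRawBytes))
	return string(b), err
}

// sweep runs both phases: search for links, then fetch raw content and match.
func sweep(ctx context.Context, c *http.Client, baseURL, query string, kws []keyword) ([]Finding, error) {
	// Phase 1: the search path is an assumption, not taken from the source.
	page, err := fetch(ctx, c, baseURL+"/search?q="+url.QueryEscape(query))
	if err != nil {
		return nil, err
	}
	var out []Finding
	for _, id := range extractPasteIDs(page) {
		// Phase 2: fetch the raw paste and keyword-match it.
		rawURL := baseURL + "/raw/" + id
		body, err := fetch(ctx, c, rawURL)
		if err != nil {
			if ctx.Err() != nil {
				return out, ctx.Err() // stop promptly on cancellation
			}
			continue // one unreadable paste should not abort the sweep
		}
		if provider, ok := matchKeyword(body, kws); ok {
			out = append(out, Finding{SourceType: "recon:pastebin", ProviderName: provider, URL: rawURL})
		}
	}
	return out, nil
}
```

Taking `baseURL` as a parameter lets a test point `sweep` at an `httptest` server, in line with the fixture-based tests listed under Test Coverage. Note that the assumed regex would also match any 8-character site path, so a real implementation would likely need a stricter filter.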