salvacybersec da0bf800f9 docs(11-02): complete paste site sources plan
- SUMMARY.md for PastebinSource, GistPasteSource, PasteSitesSource
2026-04-06 11:57:21 +03:00

---
phase: 11-osint-search-paste
plan: 02
subsystem: recon
tags: [pastebin, gist, paste-sites, scraping, osint]
requires:
  - phase: 10-osint-code-hosting
    provides: ReconSource interface, shared HTTP client, extractAnchorHrefs helper, BuildQueries
provides:
  - PastebinSource for pastebin.com search+raw scanning
  - GistPasteSource for gist.github.com unauthenticated search scraping
  - PasteSitesSource multi-platform aggregator (dpaste, paste.ee, rentry, hastebin)
affects: [11-03, recon-registration, recon-engine]
tech-stack:
  added: []
  patterns:
    - two-phase search+raw-fetch for paste sources
    - multi-platform aggregator reuse from sandboxes
key-files:
  created:
    - pkg/recon/sources/pastebin.go
    - pkg/recon/sources/pastebin_test.go
    - pkg/recon/sources/gistpaste.go
    - pkg/recon/sources/gistpaste_test.go
    - pkg/recon/sources/pastesites.go
    - pkg/recon/sources/pastesites_test.go
  modified: []
key-decisions:
  - "Two-phase approach for all paste sources: search HTML for links, then fetch raw content and keyword-match"
  - "PasteSitesSource reuses SandboxesSource multi-platform pattern with pastePlatform struct"
  - "GistPasteSource named 'gistpaste' to avoid collision with Phase 10 GistSource ('gist')"
patterns-established:
  - "Paste source pattern: search page -> extract links -> fetch raw -> keyword match -> emit finding"
requirements-completed: [RECON-PASTE-01]
duration: 5min
completed: 2026-04-06
---

Phase 11 Plan 02: Paste Site Sources Summary

Three paste-site ReconSources that implement a two-phase search + raw-fetch flow, keyword-matching raw content against the provider registry.
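The two-phase flow can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the code in pkg/recon/sources. The Finding type, the fetch and scan helpers, the fake /search and /raw/... routes, and the sk-ant- keyword mapping are all hypothetical. Only the overall shape comes from this summary: search page, link extraction, raw fetch capped at 256KB, keyword match, emit finding. An in-process httptest server stands in for a real paste site.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"regexp"
	"strings"
)

// maxRawBytes mirrors the 256KB cap this summary gives for raw paste content.
const maxRawBytes = 256 << 10

// Finding is a hypothetical, trimmed-down finding record.
type Finding struct {
	Source, Provider, URL string
}

// fetch GETs url and returns at most maxRawBytes of the response body.
func fetch(ctx context.Context, c *http.Client, url string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := c.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	b, err := io.ReadAll(io.LimitReader(resp.Body, maxRawBytes))
	return string(b), err
}

// scan: phase 1 fetches the search page and extracts result paths with linkRe
// (capture group 1); phase 2 fetches each raw body and keyword-matches it.
func scan(ctx context.Context, c *http.Client, base, searchPath string, linkRe *regexp.Regexp,
	rawPath func(string) string, keywords map[string]string, source string) ([]Finding, error) {
	page, err := fetch(ctx, c, base+searchPath)
	if err != nil {
		return nil, err
	}
	var out []Finding
	for _, m := range linkRe.FindAllStringSubmatch(page, -1) {
		raw, err := fetch(ctx, c, base+rawPath(m[1]))
		if err != nil {
			continue // one unreadable paste should not abort the sweep
		}
		for kw, provider := range keywords {
			if strings.Contains(raw, kw) {
				out = append(out, Finding{Source: source, Provider: provider, URL: base + m[1]})
				break // at most one finding per paste in this sketch
			}
		}
	}
	return out, nil
}

// demo serves a fake search page and two fake pastes from an in-process server.
func demo() ([]Finding, string) {
	mux := http.NewServeMux()
	serve := func(path, body string) {
		mux.HandleFunc(path, func(w http.ResponseWriter, _ *http.Request) { fmt.Fprint(w, body) })
	}
	serve("/search", `<a href="/p1">one</a> <a href="/p2">two</a>`)
	serve("/raw/p1", "token = sk-ant-EXAMPLE")
	serve("/raw/p2", "just some notes")
	srv := httptest.NewServer(mux)
	defer srv.Close()

	linkRe := regexp.MustCompile(`href="(/p[0-9]+)"`)
	keywords := map[string]string{"sk-ant-": "anthropic"} // hypothetical keyword -> provider
	found, err := scan(context.Background(), srv.Client(), srv.URL, "/search", linkRe,
		func(p string) string { return "/raw" + p }, keywords, "recon:example")
	if err != nil {
		panic(err)
	}
	return found, srv.URL
}

func main() {
	found, _ := demo()
	for _, f := range found {
		fmt.Println(f.Source, f.Provider, f.URL)
	}
}
```

Wrapping the body in io.LimitReader bounds memory per paste. A paste that fails to load is skipped rather than failing the whole sweep.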

What Was Built

PastebinSource (pkg/recon/sources/pastebin.go)

  • Searches pastebin.com for provider keywords, extracts 8-char paste IDs from HTML
  • Fetches /raw/{pasteID} content (256KB cap), matches against provider keyword set
  • Emits findings with SourceType="recon:pastebin" and ProviderName from matched keyword
  • Rate: Every(3s), Burst 1, credential-free, respects robots.txt
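The ID-extraction step for Pastebin might look roughly like the sketch below. It assumes result links appear in the search HTML as root-relative hrefs whose path is exactly eight alphanumeric characters (for example href="/AbCd1234"). The regex and the extractPasteIDs and rawPath names are hypothetical, and the real source may instead parse anchors with the Phase 10 extractAnchorHrefs helper. The Every(3s), Burst 1 notation suggests golang.org/x/time/rate, i.e. presumably rate.NewLimiter(rate.Every(3*time.Second), 1). That pacing is omitted here to keep the example dependency-free.

```go
package main

import (
	"fmt"
	"regexp"
)

// pasteIDRe is an assumed pattern: an href whose path is exactly eight
// alphanumeric characters, e.g. href="/AbCd1234".
var pasteIDRe = regexp.MustCompile(`href="/([A-Za-z0-9]{8})"`)

// extractPasteIDs returns unique paste IDs in first-seen order.
func extractPasteIDs(html string) []string {
	seen := make(map[string]bool)
	var ids []string
	for _, m := range pasteIDRe.FindAllStringSubmatch(html, -1) {
		if !seen[m[1]] {
			seen[m[1]] = true
			ids = append(ids, m[1])
		}
	}
	return ids
}

// rawPath builds the /raw/{pasteID} path described in the summary.
func rawPath(id string) string { return "/raw/" + id }

func main() {
	html := `<a href="/AbCd1234">one</a> <a href="/AbCd1234">dup</a>
<a href="/Zz9Yy8Xx">two</a> <a href="/archive">nav</a>`
	for _, id := range extractPasteIDs(html) {
		fmt.Println(rawPath(id))
	}
}
```

Deduplicating IDs keeps raw fetches (each costing a 3 s rate-limit slot) to a minimum. Note that an eight-letter navigation path such as /settings would also match this pattern, so a production version likely needs an extra filter.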

GistPasteSource (pkg/recon/sources/gistpaste.go)

  • Scrapes gist.github.com public search (no auth needed, distinct from Phase 10 API-based GistSource)
  • Extracts gist links matching the /<user>/<hex-hash> pattern, then fetches {gistPath}/raw
  • Keyword-matches raw content, emits findings with SourceType="recon:gistpaste"
  • Rate: Every(3s), Burst 1, credential-free
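Gist link extraction can be sketched in the same way. The regex below is an assumption about how /<user>/<hex-hash> links appear in the search HTML. The minimum of 20 hex characters is also assumed: it is meant to skip short user paths such as /<user>/add, whose last segment happens to be valid hex. gistRawPaths is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// gistLinkRe is an assumed pattern for /<user>/<hex-hash> links; the {20,}
// minimum hash length is a heuristic, not a documented GitHub rule.
var gistLinkRe = regexp.MustCompile(`href="(/[A-Za-z0-9-]+/[0-9a-f]{20,})"`)

// gistRawPaths returns {gistPath}/raw for each unique gist link, in order.
func gistRawPaths(html string) []string {
	seen := make(map[string]bool)
	var paths []string
	for _, m := range gistLinkRe.FindAllStringSubmatch(html, -1) {
		if seen[m[1]] {
			continue
		}
		seen[m[1]] = true
		paths = append(paths, m[1]+"/raw")
	}
	return paths
}

func main() {
	html := `<a href="/octocat/0123456789abcdef0123456789abcdef">gist</a>
<a href="/octocat/add">nav</a> <a href="/discover">nav</a>`
	for _, p := range gistRawPaths(html) {
		fmt.Println(p)
	}
}
```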

PasteSitesSource (pkg/recon/sources/pastesites.go)

  • Multi-platform aggregator following SandboxesSource pattern
  • Covers 4 paste sub-platforms: dpaste.org, paste.ee, rentry.co, hastebin.com
  • Each platform has configurable SearchPath, ResultLinkRegex, and RawPathTemplate
  • Per-platform error isolation: failures logged and skipped without aborting others
  • Findings are tagged with platform=<name> in the KeyMasked field
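The aggregator pattern might look roughly like the sketch below: one shared loop, per-platform configuration, and per-platform error isolation. The field names come from the summary. Several details are assumptions made so the sketch runs offline: the %s template syntax for SearchPath and RawPathTemplate, the fetchFunc abstraction, and the two placeholder platforms alpha and beta (these stand in for, and do not describe, the real dpaste/paste.ee/rentry/hastebin paths). The beta platform has no fixture, so its search fails. That error is logged and skipped, and the alpha finding still comes through.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/url"
	"regexp"
	"strings"
)

// pastePlatform carries the three per-platform settings named in the summary.
// The %s placeholders are an assumed convention.
type pastePlatform struct {
	Name            string
	SearchPath      string // one %s: URL-escaped keyword
	ResultLinkRegex string // capture group 1: paste ID
	RawPathTemplate string // one %s: paste ID
}

// Finding is a hypothetical, trimmed-down finding record.
type Finding struct {
	Provider, KeyMasked, RawPath string
}

// fetchFunc abstracts the HTTP GET so the sketch can run without a network.
type fetchFunc func(path string) (string, error)

// sweepOne runs search -> extract -> raw fetch -> keyword match for one platform.
func sweepOne(p pastePlatform, keyword, provider string, fetch fetchFunc) ([]Finding, error) {
	re, err := regexp.Compile(p.ResultLinkRegex)
	if err != nil {
		return nil, err
	}
	if re.NumSubexp() < 1 {
		return nil, errors.New("ResultLinkRegex needs a capture group")
	}
	page, err := fetch(fmt.Sprintf(p.SearchPath, url.QueryEscape(keyword)))
	if err != nil {
		return nil, err
	}
	var out []Finding
	for _, m := range re.FindAllStringSubmatch(page, -1) {
		rawPath := fmt.Sprintf(p.RawPathTemplate, m[1])
		body, err := fetch(rawPath)
		if err != nil || !strings.Contains(body, keyword) {
			continue // unreadable or non-matching paste: skip it
		}
		out = append(out, Finding{Provider: provider, KeyMasked: "platform=" + p.Name, RawPath: rawPath})
	}
	return out, nil
}

// sweepAll isolates failures: a broken platform is logged and skipped.
func sweepAll(platforms []pastePlatform, keyword, provider string, fetch fetchFunc) []Finding {
	var out []Finding
	for _, p := range platforms {
		found, err := sweepOne(p, keyword, provider, fetch)
		if err != nil {
			log.Printf("pastesites: skipping %s: %v", p.Name, err)
			continue
		}
		out = append(out, found...)
	}
	return out
}

// demo wires two placeholder platforms to in-memory fixtures; "beta" has none.
func demo() []Finding {
	fixtures := map[string]string{
		"/alpha/search?q=sk-ant-": `<a href="/alpha/p/abc123">paste</a>`,
		"/alpha/raw/abc123":       "ANTHROPIC_API_KEY=sk-ant-EXAMPLE",
	}
	fetch := func(path string) (string, error) {
		if body, ok := fixtures[path]; ok {
			return body, nil
		}
		return "", fmt.Errorf("no fixture for %s", path)
	}
	platforms := []pastePlatform{
		{"alpha", "/alpha/search?q=%s", `href="/alpha/p/([a-z0-9]+)"`, "/alpha/raw/%s"},
		{"beta", "/beta/search?q=%s", `href="/beta/p/([a-z0-9]+)"`, "/beta/raw/%s"},
	}
	return sweepAll(platforms, "sk-ant-", "anthropic", fetch)
}

func main() {
	for _, f := range demo() {
		fmt.Println(f.Provider, f.KeyMasked, f.RawPath)
	}
}
```

Keeping platform differences in data (the struct) rather than in code is what lets a fifth paste site be added without touching the sweep loop.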

Test Coverage

9 tests total across 3 test files:

  • Sweep with httptest fixtures verifying finding extraction and keyword matching
  • Name/rate/burst/robots/enabled metadata assertions
  • Context cancellation handling

Deviations from Plan

None. The plan was executed exactly as written.

Commits

  • Task 1, commit 3c500b5: PastebinSource + GistPasteSource with tests
  • Task 2, commit ed148d4: PasteSitesSource multi-paste aggregator with tests

Self-Check: PASSED

All 7 files found. Both commit hashes verified in git log.