Merge branch 'worktree-agent-a6700ee2'
91
.planning/phases/11-osint_search_paste/11-02-SUMMARY.md
Normal file
@@ -0,0 +1,91 @@
---
phase: 11-osint-search-paste
plan: 02
subsystem: recon
tags: [pastebin, gist, paste-sites, scraping, osint]

requires:
  - phase: 10-osint-code-hosting
    provides: ReconSource interface, shared HTTP client, extractAnchorHrefs helper, BuildQueries

provides:
  - PastebinSource for pastebin.com search+raw scanning
  - GistPasteSource for gist.github.com unauthenticated search scraping
  - PasteSitesSource multi-platform aggregator (dpaste, paste.ee, rentry, hastebin)

affects: [11-03, recon-registration, recon-engine]

tech-stack:
  added: []
  patterns: [two-phase search+raw-fetch for paste sources, multi-platform aggregator reuse from sandboxes]

key-files:
  created:
    - pkg/recon/sources/pastebin.go
    - pkg/recon/sources/pastebin_test.go
    - pkg/recon/sources/gistpaste.go
    - pkg/recon/sources/gistpaste_test.go
    - pkg/recon/sources/pastesites.go
    - pkg/recon/sources/pastesites_test.go
  modified: []

key-decisions:
  - "Two-phase approach for all paste sources: search HTML for links, then fetch raw content and keyword-match"
  - "PasteSitesSource reuses SandboxesSource multi-platform pattern with pastePlatform struct"
  - "GistPasteSource named 'gistpaste' to avoid collision with Phase 10 GistSource ('gist')"

patterns-established:
  - "Paste source pattern: search page -> extract links -> fetch raw -> keyword match -> emit finding"

requirements-completed: [RECON-PASTE-01]

duration: 5min
completed: 2026-04-06
---

# Phase 11 Plan 02: Paste Site Sources Summary

**Three paste site ReconSources implementing two-phase search+raw-fetch with keyword matching against provider registry**

## What Was Built

### PastebinSource (`pkg/recon/sources/pastebin.go`)

- Searches pastebin.com for provider keywords, extracts 8-char paste IDs from HTML
- Fetches `/raw/{pasteID}` content (256KB cap), matches against provider keyword set
- Emits findings with SourceType="recon:pastebin" and ProviderName from matched keyword
- Rate: Every(3s), Burst 1, credential-free, respects robots.txt

### GistPasteSource (`pkg/recon/sources/gistpaste.go`)

- Scrapes gist.github.com public search (no auth needed, distinct from Phase 10 API-based GistSource)
- Extracts gist links matching `/<user>/<hex-hash>` pattern, fetches `{gistPath}/raw`
- Keyword-matches raw content, emits findings with SourceType="recon:gistpaste"
- Rate: Every(3s), Burst 1, credential-free
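
As a rough illustration of the link filter described above, the sketch below keeps only hrefs of the form `/<user>/<hex-hash>` and derives the raw path from each. The regular expression (allowed username characters, 20 to 40 hex digits) and the function name `rawGistPaths` are assumptions for this example, not taken from `gistpaste.go`.

```go
package main

import (
	"fmt"
	"regexp"
)

// gistPathRe matches "/<user>/<hex-hash>" with nothing after the hash.
// The username character set and the hash length range are assumptions.
var gistPathRe = regexp.MustCompile(`^/[A-Za-z0-9-]+/[0-9a-f]{20,40}$`)

// rawGistPaths keeps hrefs that look like gist pages and appends "/raw".
func rawGistPaths(hrefs []string) []string {
	var out []string
	for _, h := range hrefs {
		if gistPathRe.MatchString(h) {
			out = append(out, h+"/raw")
		}
	}
	return out
}

func main() {
	hrefs := []string{
		"/alice/0123456789abcdef0123456789abcdef",
		"/alice/0123456789abcdef0123456789abcdef/stargazers",
		"/search?q=sk-proj-",
	}
	for _, p := range rawGistPaths(hrefs) {
		fmt.Println("https://gist.github.com" + p)
	}
}
```

Anchoring the pattern at both ends is what excludes sub-pages such as `/stargazers`, which would otherwise produce duplicate fetches for the same gist.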

### PasteSitesSource (`pkg/recon/sources/pastesites.go`)

- Multi-platform aggregator following SandboxesSource pattern
- Covers 4 paste sub-platforms: dpaste.org, paste.ee, rentry.co, hastebin.com
- Each platform has configurable SearchPath, ResultLinkRegex, and RawPathTemplate
- Per-platform error isolation: failures logged and skipped without aborting others
- Findings tagged with `platform=<name>` in KeyMasked field

## Test Coverage

9 tests total across 3 test files:
- Sweep with httptest fixtures verifying finding extraction and keyword matching
- Name/rate/burst/robots/enabled metadata assertions
- Context cancellation handling

## Deviations from Plan

None - plan executed exactly as written.

## Commits

| Task | Commit | Description |
|------|--------|-------------|
| 1 | 3c500b5 | PastebinSource + GistPasteSource with tests |
| 2 | ed148d4 | PasteSitesSource multi-paste aggregator with tests |

## Self-Check: PASSED

All 7 files found. Both commit hashes verified in git log.