salvacybersec d29a7d30b2 docs(09-06): add phase 09 completion summary
Documents all 4 RECON-INFRA requirement IDs as complete, summarizes
decisions (per-source limiters, default-allow robots, SHA256 dedup,
UA pool of 10), lists handoff contract for Phases 10-16.
2026-04-06 00:52:20 +03:00

phase: 09-osint-infrastructure
plan: phase-summary
subsystem: infra
tags: recon, osint, rate-limiting, robots-txt, stealth, dedup, ants
requires:
  • phase: 01-foundation
    provides: engine.Finding type, ants worker pool pattern
provides:
  • ReconSource interface for OSINT sources (Phases 10-16)
  • Engine with parallel fanout via ants pool
  • Per-source LimiterRegistry (golang.org/x/time/rate)
  • Stealth mode (UA rotation + jitter)
  • RobotsCache with 1h TTL and default-allow on failure
  • Cross-source Dedup by SHA256(provider|masked|source)
  • keyhunter recon full / recon list CLI commands
  • ExampleSource stub proving the pipeline
affects:
  • 10-github-recon
  • 11-shodan-recon
  • 12-pastebin-recon
  • 13-search-engine-recon
  • 14-wayback-recon
  • 15-huggingface-recon
  • 16-misc-recon
tech-stack:
  added:
    • github.com/temoto/robotstxt (robots.txt parsing)
  patterns:
    • ReconSource interface: every OSINT source implements 6 methods
    • Per-source rate.Limiter owned by LimiterRegistry keyed on source name
    • Default-allow semantics on robots fetch/parse failure
    • Dedup via stable SHA256(provider|masked|source) hash, first-seen wins
    • SourceType tagged "recon:<name>" for downstream storage unification
key-files (created, modified):
  • pkg/recon/source.go
  • pkg/recon/engine.go
  • pkg/recon/limiter.go
  • pkg/recon/stealth.go
  • pkg/recon/robots.go
  • pkg/recon/dedup.go
  • pkg/recon/example.go
  • pkg/recon/integration_test.go
  • cmd/recon.go
  • go.mod (added temoto/robotstxt)
key-decisions:
  • Per-source rate limiters, no central limiter (RECON-INFRA-05)
  • Default-allow on robots.txt fetch/parse failure to avoid silently disabling sources
  • Dedup key = SHA256(provider|masked|source); distinct source URLs are kept
  • UA pool of 10 realistic browsers covering Chrome/Firefox/Safari/Edge on Win/Mac/Linux/iOS/Android
  • SourceType prefix 'recon:<name>' unifies recon findings with file/git/stdin through engine.Finding
  • Engine does NOT dedup internally; callers invoke recon.Dedup explicitly
patterns-established:
  • ReconSource interface: Name/RateLimit/Burst/RespectsRobots/Enabled/Sweep
  • Source registration via Engine.Register; Phases 10-16 add sources in buildReconEngine() or package init()
  • Integration tests live alongside unit tests in pkg/recon/ using the same package (not _test package)
requirements-completed:
  • RECON-INFRA-05
  • RECON-INFRA-06
  • RECON-INFRA-07
  • RECON-INFRA-08
duration: phase
completed: 2026-04-05

Phase 9: OSINT Infrastructure Summary

Recon framework with ReconSource interface, per-source rate limiting, stealth UA rotation, robots.txt compliance, and ants-powered parallel sweep — ready for sources in Phases 10-16.

Accomplishments

  • pkg/recon package created with 7 production files, covered by unit tests and an integration test
  • ReconSource interface defined and proven via ExampleSource stub
  • Engine.SweepAll fans out to all registered sources in parallel via ants pool
  • LimiterRegistry provides isolated per-source *rate.Limiter instances with optional stealth jitter
  • RobotsCache fetches, caches (1h TTL), and enforces robots.txt with default-allow failure mode
  • Dedup collapses duplicate findings via SHA256(provider|masked|source), first-seen wins; findings for the same masked key at distinct source URLs are kept
  • keyhunter recon full and keyhunter recon list CLI commands wired in cmd/recon.go
  • End-to-end integration test (pkg/recon/integration_test.go) wires Engine + Limiter + Stealth + Robots + Dedup against a synthetic source
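
The parallel fanout behind Engine.SweepAll can be pictured roughly as follows. This is a minimal sketch, not the code in pkg/recon/engine.go: plain goroutines with a sync.WaitGroup stand in for the ants pool, and the Finding fields, the Sweeper interface, the Sweep signature, and the staticSource and sweepDemo helpers are all assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Finding is a simplified stand-in for engine.Finding (fields assumed).
type Finding struct {
	Provider   string
	Masked     string
	Source     string
	SourceType string
}

// Sweeper is the subset of ReconSource this sketch needs (signature assumed).
type Sweeper interface {
	Name() string
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}

// staticSource is a hypothetical source that emits a fixed list of findings.
type staticSource struct {
	name     string
	findings []Finding
}

func (s staticSource) Name() string { return s.name }

func (s staticSource) Sweep(ctx context.Context, _ string, out chan<- Finding) error {
	for _, f := range s.findings {
		f.SourceType = "recon:" + s.name
		select {
		case out <- f:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}

// SweepAll runs every source concurrently and returns the raw findings.
// It does not dedup; that is left to the caller, as in the summary.
func SweepAll(ctx context.Context, sources []Sweeper, query string) []Finding {
	out := make(chan Finding)
	var wg sync.WaitGroup
	for _, src := range sources {
		wg.Add(1)
		go func(s Sweeper) {
			defer wg.Done()
			_ = s.Sweep(ctx, query, out) // errors are ignored in this sketch
		}(src)
	}
	// Close the channel once every source has returned.
	go func() { wg.Wait(); close(out) }()

	var all []Finding
	for f := range out {
		all = append(all, f)
	}
	return all
}

// sweepDemo wires two hypothetical sources that report the same masked key.
func sweepDemo() []Finding {
	sources := []Sweeper{
		staticSource{name: "alpha", findings: []Finding{{Provider: "openai", Masked: "sk-a...1234", Source: "https://a.example/1"}}},
		staticSource{name: "beta", findings: []Finding{{Provider: "openai", Masked: "sk-a...1234", Source: "https://b.example/2"}}},
	}
	return SweepAll(context.Background(), sources, "")
}

func main() {
	fmt.Println(len(sweepDemo())) // prints 2: both raw findings are returned
}
```

The order of the returned findings depends on goroutine scheduling, so callers should not rely on it.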

Requirements Closed

ID | Description | Evidence
RECON-INFRA-05 | Per-source rate limiting via LimiterRegistry | pkg/recon/limiter.go + limiter_test.go + integration
RECON-INFRA-06 | Stealth mode (UA rotation + jitter) | pkg/recon/stealth.go + limiter.go Wait jitter path
RECON-INFRA-07 | robots.txt compliance, cache, default-allow | pkg/recon/robots.go + robots_test.go + integration
RECON-INFRA-08 | Parallel sweep orchestrator with dedup | pkg/recon/engine.go + dedup.go + integration
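
For RECON-INFRA-06, the two stealth ingredients (User-Agent rotation and jitter) might look something like the sketch below. The function names, the shortened three-entry pool, and the uniform jitter distribution are assumptions; the real pool in pkg/recon/stealth.go has 10 entries and its selection logic may differ.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// userAgents is a shortened, illustrative pool (the phase ships 10 entries).
var userAgents = []string{
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
	"Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
}

// newRand returns a private random source so callers control seeding.
func newRand(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// RandomUserAgent picks one entry from the pool uniformly at random.
func RandomUserAgent(r *rand.Rand) string {
	return userAgents[r.Intn(len(userAgents))]
}

// Jitter returns a random delay in [0, max); it returns 0 when max <= 0.
func Jitter(r *rand.Rand, max time.Duration) time.Duration {
	if max <= 0 {
		return 0
	}
	return time.Duration(r.Int63n(int64(max)))
}

func main() {
	r := newRand(time.Now().UnixNano())
	fmt.Println("User-Agent:", RandomUserAgent(r))
	fmt.Println("extra delay:", Jitter(r, 500*time.Millisecond))
}
```

In a limiter's wait path, the jitter would be added on top of the rate-limit delay only when stealth mode is on.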

Plans in Phase 9

  1. 09-01 Engine + ReconSource interface + ExampleSource
  2. 09-02 LimiterRegistry (rate limiting + stealth jitter)
  3. 09-03 Dedup (SHA256 hash, first-seen wins)
  4. 09-04 RobotsCache (1h TTL, default-allow on failure)
  5. 09-05 Stealth UA pool + CLI wiring (cmd/recon.go)
  6. 09-06 Integration test + phase summary (this plan)

Key Decisions

  • Per-source rate limiters, not central — each OSINT source owns its bucket; matches TruffleHog pattern and keeps a slow source from starving fast ones
  • Default-allow on robots fetch failure — a broken /robots.txt endpoint must not silently disable recon; errors are swallowed and true is returned
  • Dedup key = SHA256(provider|masked|source) — distinct source URLs for the same masked key are kept so operators see every leak location
  • UA pool of 10 — spans Chrome/Firefox/Safari/Edge across Win/macOS/Linux/iOS/Android for realistic fingerprint distribution
  • engine.Finding reused — recon findings flow through the same storage/verification paths as file/git findings; only SourceType is prefixed recon:
  • Engine does not dedup — callers invoke recon.Dedup explicitly; keeps the Engine responsibility narrow (fanout only) and allows callers to access pre-dedup raw data

New Dependencies

  • github.com/temoto/robotstxt — small, well-maintained robots.txt parser used by RobotsCache
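
How RobotsCache combines a TTL cache with default-allow can be sketched without the parser itself. In the sketch below, a plain function type (rules) stands in for a parsed robots.txt, and fetchFunc, NewRobotsCache, Allowed, and the two sample fetchers are hypothetical names; not caching failed fetches is also an assumption, since the summary does not say.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
	"time"
)

// defaultTTL mirrors the 1h TTL mentioned in the summary.
const defaultTTL = time.Hour

// rules reports whether a path may be fetched; it stands in for a parsed robots.txt.
type rules func(path string) bool

// fetchFunc downloads and parses robots.txt for a host (hypothetical signature).
type fetchFunc func(host string) (rules, error)

type entry struct {
	allow   rules
	fetched time.Time
}

// RobotsCache caches per-host rules for ttl and allows everything on failure.
type RobotsCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	fetch   fetchFunc
	entries map[string]entry
}

func NewRobotsCache(ttl time.Duration, fetch fetchFunc) *RobotsCache {
	return &RobotsCache{ttl: ttl, fetch: fetch, entries: make(map[string]entry)}
}

// Allowed reports whether path may be fetched from host.
// The lock is held during the fetch to keep the sketch short.
func (c *RobotsCache) Allowed(host, path string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[host]
	if !ok || time.Since(e.fetched) >= c.ttl {
		r, err := c.fetch(host)
		if err != nil {
			return true // default-allow: a broken robots.txt must not disable the source
		}
		e = entry{allow: r, fetched: time.Now()}
		c.entries[host] = e
	}
	return e.allow(path)
}

// failingFetch simulates an unreachable /robots.txt endpoint.
func failingFetch(string) (rules, error) {
	return nil, errors.New("robots.txt unreachable")
}

// denyPrivateFetch simulates a robots.txt that disallows /private.
func denyPrivateFetch(string) (rules, error) {
	return func(path string) bool { return !strings.HasPrefix(path, "/private") }, nil
}

func main() {
	broken := NewRobotsCache(defaultTTL, failingFetch)
	fmt.Println(broken.Allowed("example.com", "/private/keys")) // true (fetch failed)

	strict := NewRobotsCache(defaultTTL, denyPrivateFetch)
	fmt.Println(strict.Allowed("example.com", "/private/keys")) // false
	fmt.Println(strict.Allowed("example.com", "/public"))       // true
}
```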

CLI Surface

keyhunter recon full [--stealth] [--respect-robots] [--query=STRING]
keyhunter recon list

Phase 9 ships with ExampleSource only; Phases 10-16 register real sources via buildReconEngine() in cmd/recon.go (or via package-init side effects once the pattern is established).

Handoff to Phase 10

  • ReconSource interface is frozen for the phase block — Phases 10-16 can implement it confidently
  • New sources register in cmd/recon.go:buildReconEngine() with a single e.Register(...) call
  • Each source should:
    1. Return a stable lowercase Name()
    2. Declare its own RateLimit() / Burst() values
    3. Set RespectsRobots() to true for HTML scrapers, false for authenticated APIs
    4. Tag findings with SourceType = "recon:<name>"
    5. Exit promptly on ctx.Done() in Sweep
  • Integration test pattern in pkg/recon/integration_test.go shows how to wire a synthetic source for source-specific tests
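
A source that follows the checklist above might look like the sketch below. The six method names come from this summary, but every parameter and return type is an assumption (in particular, RateLimit is shown as a float64 in requests per second, and Sweep is shown writing to a channel), and demoSource and collect are hypothetical helpers, not part of pkg/recon.

```go
package main

import (
	"context"
	"fmt"
)

// Finding is a simplified stand-in for engine.Finding (fields assumed).
type Finding struct {
	Provider   string
	Masked     string
	Source     string
	SourceType string
}

// ReconSource lists the six methods named in the summary; types are assumed.
type ReconSource interface {
	Name() string
	RateLimit() float64 // requests per second (assumed unit)
	Burst() int
	RespectsRobots() bool
	Enabled() bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}

// demoSource is a hypothetical HTML-scraping source.
type demoSource struct{ urls []string }

func (demoSource) Name() string         { return "demo" } // stable and lowercase
func (demoSource) RateLimit() float64   { return 1 }      // the source declares its own limit
func (demoSource) Burst() int           { return 1 }
func (demoSource) RespectsRobots() bool { return true } // scraper, so honor robots.txt
func (demoSource) Enabled() bool        { return true }

func (s demoSource) Sweep(ctx context.Context, _ string, out chan<- Finding) error {
	for _, u := range s.urls {
		if err := ctx.Err(); err != nil {
			return err // exit promptly once the context is done
		}
		f := Finding{Provider: "openai", Masked: "sk-a...1234", Source: u, SourceType: "recon:" + s.Name()}
		select {
		case out <- f:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}

// collect runs one sweep and gathers its findings; cancelFirst cancels the context up front.
func collect(src ReconSource, cancelFirst bool) ([]Finding, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	out := make(chan Finding, 16) // buffer large enough for this small demo
	err := src.Sweep(ctx, "", out)
	close(out)
	var got []Finding
	for f := range out {
		got = append(got, f)
	}
	return got, err
}

func main() {
	src := demoSource{urls: []string{"https://example.com/a", "https://example.com/b"}}
	got, err := collect(src, false)
	fmt.Println(len(got), err) // prints: 2 <nil>
	got, err = collect(src, true)
	fmt.Println(len(got), err) // prints: 0 context canceled
}
```

Registration would then be a single e.Register(...) call in buildReconEngine(), as described above.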

Known Gaps (Deferred)

  • Proxy / TOR support — out of scope; can be added via http.Transport injection later
  • Per-source retry with backoff — each source handles its own retries; no framework-level retry
  • Distributed rate limiting — out of scope; per-instance limiters only
  • Webhook notifications on source exhaustion — deferred to Phase 17 (Telegram)
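
As an illustration of the http.Transport injection mentioned for proxy / TOR support, a later phase could build its HTTP client along these lines. NewProxiedClient is a hypothetical helper; only the standard-library calls (url.Parse, http.ProxyURL, http.Transport, http.Client) are real. The socks5 address shown is Tor's default local SOCKS port and assumes a locally running Tor daemon.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// NewProxiedClient returns an http.Client whose requests go through proxyURL.
// net/http accepts http, https, and socks5 proxy schemes.
func NewProxiedClient(proxyURL string, timeout time.Duration) (*http.Client, error) {
	u, err := url.Parse(proxyURL)
	if err != nil {
		return nil, fmt.Errorf("parse proxy URL: %w", err)
	}
	tr := &http.Transport{Proxy: http.ProxyURL(u)}
	return &http.Client{Transport: tr, Timeout: timeout}, nil
}

func main() {
	// No request is sent here; the client is only constructed.
	client, err := NewProxiedClient("socks5://127.0.0.1:9050", 30*time.Second)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("client timeout:", client.Timeout)
}
```

A source would receive such a client through its constructor, which keeps proxy configuration out of the ReconSource interface.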

Next Phase Readiness

Phase 10 (GitHub recon) can start immediately. The pkg/recon contract is stable and proven end-to-end by TestReconPipelineIntegration and TestRobotsOnlyWhenRespectsRobots.


Phase: 09-osint-infrastructure
Completed: 2026-04-05