Documents all 4 RECON-INFRA requirement IDs as complete, summarizes decisions (per-source limiters, default-allow robots, SHA256 dedup, UA pool of 10), lists handoff contract for Phases 10-16.
| phase | plan | subsystem | tags | requires | provides | affects | tech-stack | key-files | key-decisions | patterns-established | requirements-completed | duration | completed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 09-osint-infrastructure | phase-summary | infra | | | | | | | | | | phase | 2026-04-05 |
Phase 9: OSINT Infrastructure Summary
Recon framework with ReconSource interface, per-source rate limiting, stealth UA rotation, robots.txt compliance, and ants-powered parallel sweep — ready for sources in Phases 10-16.
Accomplishments
- `pkg/recon` package created with 7 production files + full unit + integration tests
- `ReconSource` interface defined and proven via `ExampleSource` stub
- `Engine.SweepAll` fans out to all registered sources in parallel via ants pool
- `LimiterRegistry` provides isolated per-source `*rate.Limiter` instances with optional stealth jitter
- `RobotsCache` fetches, caches (1h TTL), and enforces robots.txt with default-allow failure mode
- `Dedup` collapses duplicate findings across sources via SHA256(provider|masked|source)
- `keyhunter recon full` and `keyhunter recon list` CLI commands wired in `cmd/recon.go`
- End-to-end integration test (`pkg/recon/integration_test.go`) wires Engine + Limiter + Stealth + Robots + Dedup against a synthetic source
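The parallel fan-out attributed to `Engine.SweepAll` can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `pkg/recon` code: it uses plain goroutines and a `sync.WaitGroup` instead of the ants pool, and `source`, `finding`, `sweepAll`, and `staticSource` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// finding and source are hypothetical stand-ins for the real pkg/recon types.
type finding struct {
	SourceType string
	Masked     string
}

type source interface {
	Name() string
	Sweep(ctx context.Context, query string, out chan<- finding) error
}

// sweepAll runs every source concurrently and collects whatever they emit.
// Assumption: the real engine schedules these tasks on an ants pool rather
// than spawning one goroutine per source.
func sweepAll(ctx context.Context, sources []source, query string) []finding {
	out := make(chan finding)
	var wg sync.WaitGroup
	for _, s := range sources {
		wg.Add(1)
		go func(s source) {
			defer wg.Done()
			_ = s.Sweep(ctx, query, out) // one failing source does not abort the others
		}(s)
	}
	// Close the channel once every source has returned so the loop below ends.
	go func() {
		wg.Wait()
		close(out)
	}()
	var all []finding
	for f := range out {
		all = append(all, f)
	}
	return all
}

// staticSource emits a fixed number of synthetic findings, then returns.
type staticSource struct {
	name string
	n    int
}

func (s staticSource) Name() string { return s.name }

func (s staticSource) Sweep(ctx context.Context, _ string, out chan<- finding) error {
	for i := 0; i < s.n; i++ {
		select {
		case out <- finding{SourceType: "recon:" + s.name, Masked: fmt.Sprintf("key-%d", i)}:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}

// demoCount sweeps two synthetic sources and reports how many findings arrived.
func demoCount() int {
	srcs := []source{staticSource{"alpha", 2}, staticSource{"beta", 3}}
	return len(sweepAll(context.Background(), srcs, "demo"))
}

func main() {
	fmt.Println(demoCount())
}
```

The result is left un-deduplicated on purpose, which mirrors the stated split between fan-out and dedup.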
Requirements Closed
| ID | Description | Evidence |
|---|---|---|
| RECON-INFRA-05 | Per-source rate limiting via LimiterRegistry | pkg/recon/limiter.go + limiter_test.go + integration |
| RECON-INFRA-06 | Stealth mode (UA rotation + jitter) | pkg/recon/stealth.go + limiter.go Wait jitter path |
| RECON-INFRA-07 | robots.txt compliance, cache, default-allow | pkg/recon/robots.go + robots_test.go + integration |
| RECON-INFRA-08 | Parallel sweep orchestrator with dedup | pkg/recon/engine.go + dedup.go + integration |
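The idea behind RECON-INFRA-05 and RECON-INFRA-06 (one isolated limiter per source, plus optional jitter in stealth mode) can be sketched as below. This is a stdlib-only approximation: `intervalLimiter` is a deliberately simple stand-in for `*rate.Limiter`, and `registry`, `forSource`, and `delay` are hypothetical names rather than the `LimiterRegistry` API.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// intervalLimiter is a simplified stand-in for *rate.Limiter: it only
// enforces a minimum gap between consecutive reservations.
type intervalLimiter struct {
	mu   sync.Mutex
	gap  time.Duration
	next time.Time
}

// reserve books the next slot and returns how long the caller should wait.
func (l *intervalLimiter) reserve(now time.Time) time.Duration {
	l.mu.Lock()
	defer l.mu.Unlock()
	wait := time.Duration(0)
	if l.next.After(now) {
		wait = l.next.Sub(now)
	}
	l.next = now.Add(wait + l.gap)
	return wait
}

// registry hands out one limiter per source name, so sources never share a bucket.
type registry struct {
	mu       sync.Mutex
	limiters map[string]*intervalLimiter
}

func newRegistry() *registry {
	return &registry{limiters: map[string]*intervalLimiter{}}
}

// forSource returns the limiter for name, creating it on first use.
func (r *registry) forSource(name string, gap time.Duration) *intervalLimiter {
	r.mu.Lock()
	defer r.mu.Unlock()
	if l, ok := r.limiters[name]; ok {
		return l
	}
	l := &intervalLimiter{gap: gap}
	r.limiters[name] = l
	return l
}

// delay computes the wait for one request. In stealth mode it adds a random
// jitter in [0, maxJitter) on top of the limiter's own wait.
func (r *registry) delay(name string, gap time.Duration, now time.Time, stealth bool, maxJitter time.Duration) time.Duration {
	d := r.forSource(name, gap).reserve(now)
	if stealth && maxJitter > 0 {
		d += time.Duration(rand.Int63n(int64(maxJitter)))
	}
	return d
}

// demo shows that a busy "slow" source does not delay an unrelated "fast" one.
func demo() (slowFirst, slowSecond, fastFirst time.Duration) {
	r := newRegistry()
	now := time.Now()
	slowFirst = r.delay("slow", time.Second, now, false, 0)
	slowSecond = r.delay("slow", time.Second, now, false, 0)
	fastFirst = r.delay("fast", 100*time.Millisecond, now, false, 0)
	return
}

func main() {
	fmt.Println(demo())
}
```

Here the second request to `slow` has to wait a full gap, while the first request to `fast` proceeds immediately because it has its own limiter.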
Plans in Phase 9
- 09-01 Engine + ReconSource interface + ExampleSource
- 09-02 LimiterRegistry (rate limiting + stealth jitter)
- 09-03 Dedup (SHA256 hash, first-seen wins)
- 09-04 RobotsCache (1h TTL, default-allow on failure)
- 09-05 Stealth UA pool + CLI wiring (`cmd/recon.go`)
- 09-06 Integration test + phase summary (this plan)
Key Decisions
- Per-source rate limiters, not central: each OSINT source owns its bucket; this matches the TruffleHog pattern and keeps a slow source from starving fast ones
- Default-allow on robots fetch failure: a broken `/robots.txt` endpoint must not silently disable recon; errors are swallowed and `true` is returned
- Dedup key = SHA256(provider|masked|source): distinct source URLs for the same masked key are kept so operators see every leak location
- UA pool of 10: spans Chrome/Firefox/Safari/Edge across Win/macOS/Linux/iOS/Android for a realistic fingerprint distribution
- `engine.Finding` reused: recon findings flow through the same storage/verification paths as file/git findings; only `SourceType` is prefixed `recon:`
- Engine does not dedup: callers invoke `recon.Dedup` explicitly; this keeps the Engine's responsibility narrow (fanout only) and lets callers access pre-dedup raw data
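The dedup key decision above can be illustrated with a minimal sketch. The `finding` struct and its field names are assumptions standing in for `engine.Finding`, and `dedupKey`, `dedup`, and `sample` are hypothetical helpers, not the `recon.Dedup` signature.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// finding is a hypothetical stand-in for engine.Finding; field names are assumed.
type finding struct {
	Provider string
	Masked   string
	Source   string
}

// dedupKey hashes provider|masked|source, so the same masked key seen at a
// different source URL yields a different key and is kept.
func dedupKey(f finding) string {
	sum := sha256.Sum256([]byte(f.Provider + "|" + f.Masked + "|" + f.Source))
	return hex.EncodeToString(sum[:])
}

// dedup keeps the first occurrence of each key and preserves input order
// (first-seen wins).
func dedup(in []finding) []finding {
	seen := make(map[string]struct{}, len(in))
	out := make([]finding, 0, len(in))
	for _, f := range in {
		k := dedupKey(f)
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out
}

// sample returns synthetic findings: one exact duplicate and one extra location.
func sample() []finding {
	return []finding{
		{"openai", "sk-****abcd", "https://example.com/a"},
		{"openai", "sk-****abcd", "https://example.com/a"}, // exact duplicate, dropped
		{"openai", "sk-****abcd", "https://example.com/b"}, // same key, new location, kept
	}
}

func main() {
	fmt.Println(len(dedup(sample()))) // prints 2
}
```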
New Dependencies
- `github.com/temoto/robotstxt`: small, well-maintained robots.txt parser used by RobotsCache
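How a cache with a TTL and a default-allow failure mode might fit together is sketched below. To stay self-contained it does not call the robotstxt parser at all: `fetch` is a hypothetical hook returning an already-parsed rule set, and `robotsCache`, `rules`, and `allowed` are illustrative names, not the `RobotsCache` API. Whether the real cache also remembers failed fetches is not stated, so this sketch simply retries them.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
	"time"
)

// rules answers whether a path may be fetched. In the real package this role
// would be played by the parsed robots.txt data.
type rules func(path string) bool

type entry struct {
	allow   rules
	fetched time.Time
}

// robotsCache is a hypothetical per-host cache with a fixed TTL.
type robotsCache struct {
	mu     sync.Mutex
	ttl    time.Duration
	now    func() time.Time
	fetch  func(host string) (rules, error) // assumed hook: download + parse
	byHost map[string]entry
}

// allowed reports whether path may be crawled on host. Any fetch error is
// swallowed and treated as "allowed" (default-allow).
// Simplification: the lock is held during the fetch.
func (c *robotsCache) allowed(host, path string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.byHost[host]
	if !ok || c.now().Sub(e.fetched) >= c.ttl {
		r, err := c.fetch(host)
		if err != nil {
			return true // a broken robots endpoint must not disable recon
		}
		e = entry{allow: r, fetched: c.now()}
		c.byHost[host] = e
	}
	return e.allow(path)
}

// demo queries one healthy host twice and one unreachable host once.
func demo() (blocked, open, failedHost bool, fetches int) {
	c := &robotsCache{ttl: time.Hour, now: time.Now, byHost: map[string]entry{}}
	c.fetch = func(host string) (rules, error) {
		fetches++
		if host == "down.example" {
			return nil, errors.New("robots.txt unreachable")
		}
		return func(path string) bool { return !strings.HasPrefix(path, "/private") }, nil
	}
	blocked = c.allowed("ok.example", "/private/x")
	open = c.allowed("ok.example", "/public")
	failedHost = c.allowed("down.example", "/private/x")
	return
}

func main() {
	fmt.Println(demo())
}
```

The healthy host is fetched once and served from cache on the second query; the unreachable host falls through to allow.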
CLI Surface
keyhunter recon full [--stealth] [--respect-robots] [--query=STRING]
keyhunter recon list
Phase 9 ships with ExampleSource only; Phases 10-16 register real sources via buildReconEngine() in cmd/recon.go (or via package-init side effects once the pattern is established).
Handoff to Phase 10
- `ReconSource` interface is frozen for the phase block; Phases 10-16 can implement it confidently
- New sources register in `cmd/recon.go:buildReconEngine()` with a single `e.Register(...)` call
- Each source should:
  - Return a stable lowercase `Name()`
  - Declare its own `RateLimit()` / `Burst()` values
  - Set `RespectsRobots()` to true for HTML scrapers, false for authenticated APIs
  - Tag findings with `SourceType = "recon:<name>"`
  - Exit promptly on `ctx.Done()` in `Sweep`
- Integration test pattern in `pkg/recon/integration_test.go` shows how to wire a synthetic source for source-specific tests
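A source that follows the checklist above might look like the sketch below. The method names come from this summary, but every signature here is an assumption (for example, `RateLimit()` returning a plain `float64` and `Sweep` writing to a channel), and `pasteSource` is a made-up example; check the frozen interface in `pkg/recon` before copying.

```go
package main

import (
	"context"
	"fmt"
)

// Finding is a reduced, hypothetical stand-in for engine.Finding.
type Finding struct {
	SourceType string
	Masked     string
}

// ReconSource mirrors the method names listed in the handoff notes.
// The exact signatures are assumptions made for this sketch.
type ReconSource interface {
	Name() string
	RateLimit() float64 // assumed unit: requests per second
	Burst() int
	RespectsRobots() bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}

// pasteSource is a made-up HTML-scraping source with canned results.
type pasteSource struct{ hits []string }

func (pasteSource) Name() string         { return "pastesite" } // stable, lowercase
func (pasteSource) RateLimit() float64   { return 0.5 }
func (pasteSource) Burst() int           { return 1 }
func (pasteSource) RespectsRobots() bool { return true } // HTML scraper

// Sweep emits one finding per canned hit and stops as soon as ctx is done.
func (s pasteSource) Sweep(ctx context.Context, _ string, out chan<- Finding) error {
	for _, h := range s.hits {
		if err := ctx.Err(); err != nil {
			return err
		}
		select {
		case out <- Finding{SourceType: "recon:" + s.Name(), Masked: h}:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}

// Compile-time check that pasteSource satisfies the sketched interface.
var _ ReconSource = pasteSource{}

// demo returns the tag of an emitted finding and whether a sweep with an
// already-cancelled context returned an error.
func demo() (tag string, cancelled bool) {
	src := pasteSource{hits: []string{"sk-****abcd"}}
	out := make(chan Finding, 1)
	_ = src.Sweep(context.Background(), "demo", out)
	tag = (<-out).SourceType

	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	cancelled = src.Sweep(ctx, "demo", make(chan Finding)) != nil
	return
}

func main() {
	fmt.Println(demo())
}
```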
Known Gaps (Deferred)
- Proxy / TOR support: out of scope; can be added via `http.Transport` injection later
- Per-source retry with backoff: each source handles its own retries; no framework-level retry
- Distributed rate limiting: out of scope; per-instance limiters only
- Webhook notifications on source exhaustion: deferred to Phase 17 (Telegram)
Next Phase Readiness
Phase 10 (GitHub recon) can start immediately. The pkg/recon contract is stable and proven end-to-end by TestReconPipelineIntegration and TestRobotsOnlyWhenRespectsRobots.
Phase: 09-osint-infrastructure Completed: 2026-04-05