diff --git a/.planning/phases/09-osint-infrastructure/09-01-SUMMARY.md b/.planning/phases/09-osint-infrastructure/09-01-SUMMARY.md
new file mode 100644
index 0000000..a52e51d
--- /dev/null
+++ b/.planning/phases/09-osint-infrastructure/09-01-SUMMARY.md
@@ -0,0 +1,135 @@
+---
+phase: 09-osint-infrastructure
+plan: 01
+subsystem: pkg/recon
+tags: [recon, osint, interface, engine, ants, fanout]
+dependency-graph:
+  requires:
+    - pkg/engine.Finding
+    - github.com/panjf2000/ants/v2
+    - golang.org/x/time/rate
+  provides:
+    - recon.ReconSource interface
+    - recon.Engine (Register/List/SweepAll)
+    - recon.Config
+    - recon.Finding (alias of engine.Finding)
+    - recon.ExampleSource (reference stub)
+  affects:
+    - Wave 1 siblings (limiter, stealth, robots) compose with Engine
+    - Wave 2 CLI plan (09-05) wires Engine into cmd/recon
+    - Phases 10-16 implement ReconSource for real sources
+tech-stack:
+  added: []
+  patterns:
+    - Ants pool for parallel source fanout
+    - Type alias (Finding = engine.Finding) for shared storage path
+    - Interface-per-source plugin model
+key-files:
+  created:
+    - pkg/recon/source.go
+    - pkg/recon/engine.go
+    - pkg/recon/example.go
+    - pkg/recon/engine_test.go
+  modified: []
+decisions:
+  - "Finding is a type alias of engine.Finding, not a new struct — keeps storage and verification paths unified; recon sources only need to set SourceType=recon:"
+  - "Dedup is intentionally NOT done in SweepAll — plan 09-03 owns pkg/recon/dedup.go; SweepAll only aggregates"
+  - "Engine sizes the ants pool to len(active) sources — small N (tens, not thousands), so per-sweep pool allocation is cheap and avoids cross-sweep state"
+  - "Context cancellation in SweepAll drains the out channel in a detached goroutine to prevent source senders from blocking after cancel"
+  - "Register is idempotent by Name() — re-registering replaces; guards against double-init loops"
+metrics:
+  duration: "~3m"
+  completed: "2026-04-05"
+  tasks: 2
+  files: 4
+---
+
+# Phase 9 Plan 1: Recon Framework Foundation Summary
+
+ReconSource interface, Engine with ants-pool parallel fanout, and ExampleSource stub — the contract every OSINT source in Phases 10-16 will implement.
+
+## What Was Built
+
+**`pkg/recon/source.go`** defines the public contract:
+
+- `ReconSource` interface: `Name() / RateLimit() / Burst() / RespectsRobots() / Enabled(Config) / Sweep(ctx, query, out)`
+- `Config` struct: `Stealth`, `RespectRobots`, `EnabledSources`, `Query`
+- `Finding` as a Go type alias of `engine.Finding` — recon findings flow through the same storage path as file/git/stdin scanning; sources simply set `SourceType = "recon:"`.
+
+**`pkg/recon/engine.go`** is the orchestrator:
+
+- `NewEngine()` / `Register(src)` / `List()` with an RWMutex-guarded map and sorted name listing.
+- `SweepAll(ctx, cfg)` collects enabled sources, allocates an `ants.Pool` sized to `len(active)`, submits one task per source, aggregates findings via a buffered channel, and closes that channel once the WaitGroup completes. A context-cancel branch starts a detached drainer so senders never block post-cancel.
+- Deduplication is deliberately deferred to plan 09-03 (`dedup.go`).
+
+**`pkg/recon/example.go`** ships a deterministic `ExampleSource` that emits two fake findings (openai + anthropic, masked keys, `recon:example` SourceType). It lets the Wave 2 CLI and the dashboard verify the end-to-end pipeline without any network I/O.
+
+**`pkg/recon/engine_test.go`** covers:
+
+- `TestRegisterList` — empty engine, register, idempotent re-register, sorted output.
+- `TestSweepAll` — full fanout path via ExampleSource, asserts 2 findings tagged `recon:example` with populated provider/masked/source fields.
+- `TestSweepAll_NoSources` — empty registry returns `nil, nil`.
+- `TestSweepAll_FiltersDisabled` — sources whose `Enabled()` returns false are excluded.
+
+## Tasks Completed
+
+| Task | Name                                       | Commit   | Files                                                   |
+| ---- | ------------------------------------------ | -------- | ------------------------------------------------------- |
+| 1    | ReconSource interface + Config             | 10af12d  | pkg/recon/source.go                                     |
+| 2    | Engine + ExampleSource + tests             | 851b243  | pkg/recon/engine.go, example.go, engine_test.go         |
+
+## Verification
+
+- `go build ./pkg/recon/...` — clean
+- `go vet ./pkg/recon/...` — clean
+- `go test ./pkg/recon/ -count=1` — PASS (4/4 new tests; existing limiter/robots tests from sibling Wave 1 plans continue to pass)
+
+## Key Decisions
+
+1. **Type alias over new struct.** `type Finding = engine.Finding` means recon findings are the same Go type as scan findings. Storage, verification, and output paths already handle them; sources only tag `SourceType = "recon:"`. This avoids a parallel Finding hierarchy.
+
+2. **Per-sweep pool.** `ants.NewPool(len(active))` is allocated inside `SweepAll` and released via `defer pool.Release()`. With tens of sources this is cheap and eliminates shared-state bugs across concurrent sweeps. A long-lived shared pool can be introduced later if profiling warrants it.
+
+3. **Dedup deferred.** Per 09-CONTEXT.md, `pkg/recon/dedup.go` is owned by plan 09-03. `SweepAll` returns the raw aggregate so the caller can choose when to dedup (batched persistence vs streaming).
+
+4. **Cancellation safety.** On `ctx.Done()` mid-collection, `SweepAll` spawns a detached `for range out {}` drainer before returning `ctx.Err()`. This prevents goroutines inside the ants pool from blocking on `out <- f` after the caller has left.
+
+5. **ExampleSource is a real implementation, not a mock.** It lives in the production package (no `_test.go` suffix) so the CLI (`keyhunter recon list` / `recon full`) and the dashboard can exercise the pipeline end-to-end before any Phase 10-16 source lands. It performs zero network I/O.
+
+## Deviations from Plan
+
+None — plan executed exactly as written. Tests were written RED before the engine/example implementation (TDD per `tdd="true"`), then driven to GREEN. Added two extra tests beyond the plan's stated minimum (`TestSweepAll_NoSources`, `TestSweepAll_FiltersDisabled`) to cover the empty-registry and disabled-source branches of `SweepAll` — pure additive coverage, no behavior change.
+
+## Known Stubs
+
+- `ExampleSource` is itself a stub by design (documented in 09-CONTEXT.md and in the source file doc comment). It will remain in the package as a reference implementation and CI smoke-test source; real sources replace it in Phases 10-16. This is an intentional stub, not an unfinished task.
+
+## Interfaces Provided to Downstream Plans
+
+```go
+// Wave 1 siblings (limiter, stealth, robots) compose orthogonally with Engine:
+type ReconSource interface {
+	Name() string
+	RateLimit() rate.Limit
+	Burst() int
+	RespectsRobots() bool
+	Enabled(cfg Config) bool
+	Sweep(ctx context.Context, query string, out chan<- Finding) error
+}
+
+// Wave 2 CLI (plan 09-05) will call:
+e := recon.NewEngine()
+e.Register(shodan.New(...))
+e.Register(github.New(...))
+findings, err := e.SweepAll(ctx, recon.Config{Stealth: true, Query: "..."})
+```
+
+## Self-Check: PASSED
+
+- pkg/recon/source.go — FOUND
+- pkg/recon/engine.go — FOUND
+- pkg/recon/example.go — FOUND
+- pkg/recon/engine_test.go — FOUND
+- Commit 10af12d — FOUND
+- Commit 851b243 — FOUND
+- `go test ./pkg/recon/ -count=1` — PASS