docs(09-01): complete recon framework foundation plan

salvacybersec
2026-04-06 00:44:04 +03:00
parent 4dbc38dcc5
commit fb1e7f8bf5


@@ -0,0 +1,135 @@
---
phase: 09-osint-infrastructure
plan: 01
subsystem: pkg/recon
tags: [recon, osint, interface, engine, ants, fanout]
dependency-graph:
requires:
- pkg/engine.Finding
- github.com/panjf2000/ants/v2
- golang.org/x/time/rate
provides:
- recon.ReconSource interface
- recon.Engine (Register/List/SweepAll)
- recon.Config
- recon.Finding (alias of engine.Finding)
- recon.ExampleSource (reference stub)
affects:
- Wave 1 siblings (limiter, stealth, robots) compose with Engine
- Wave 2 CLI plan (09-05) wires Engine into cmd/recon
- Phases 10-16 implement ReconSource for real sources
tech-stack:
added: []
patterns:
- Ants pool for parallel source fanout
- Type alias (Finding = engine.Finding) for shared storage path
- Interface-per-source plugin model
key-files:
created:
- pkg/recon/source.go
- pkg/recon/engine.go
- pkg/recon/example.go
- pkg/recon/engine_test.go
modified: []
decisions:
- "Finding is a type alias of engine.Finding, not a new struct — keeps storage and verification paths unified; recon sources only need to set SourceType=recon:<name>"
- "Dedup is intentionally NOT done in SweepAll — plan 09-03 owns pkg/recon/dedup.go; SweepAll only aggregates"
- "Engine sizes the ants pool to len(active) sources — small N (tens, not thousands), so per-sweep pool allocation is cheap and avoids cross-sweep state"
- "Context cancellation in SweepAll drains the out channel in a detached goroutine to prevent source senders from blocking after cancel"
- "Register is idempotent by Name() — re-registering replaces; guards against double-init loops"
metrics:
duration: "~3m"
completed: "2026-04-05"
tasks: 2
files: 4
---
# Phase 9 Plan 1: Recon Framework Foundation Summary
ReconSource interface, Engine with ants-pool parallel fanout, and ExampleSource stub — the contract every OSINT source in Phases 10-16 will implement.
## What Was Built
**`pkg/recon/source.go`** defines the public contract:
- `ReconSource` interface: `Name() / RateLimit() / Burst() / RespectsRobots() / Enabled(Config) / Sweep(ctx, query, out)`
- `Config` struct: `Stealth`, `RespectRobots`, `EnabledSources`, `Query`
- `Finding` as a Go type alias of `engine.Finding` — recon findings flow through the same storage path as file/git/stdin scanning; sources simply set `SourceType = "recon:<name>"`.
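As a minimal sketch of why the alias keeps one storage path: `EngineFinding` below is a stand-in for `engine.Finding`, and its fields, `store`, and `tag` are hypothetical names chosen for illustration, not the project's actual definitions.

```go
package main

import "fmt"

// EngineFinding stands in for engine.Finding. The fields are hypothetical.
type EngineFinding struct {
	Provider   string
	KeyMasked  string
	SourceType string
}

// Finding is an alias: both names denote the same type, so no conversion
// is ever needed between them.
type Finding = EngineFinding

// store accepts the original type, as an existing storage path would.
func store(f EngineFinding) string {
	return f.SourceType + "/" + f.Provider
}

// tag builds the SourceType value a recon source is expected to set.
func tag(name string) string { return "recon:" + name }

func main() {
	f := Finding{Provider: "openai", KeyMasked: "sk-...abcd", SourceType: tag("example")}
	fmt.Println(store(f)) // a Finding is passed where an EngineFinding is expected
}
```

Had `Finding` been declared as a new struct (`type Finding struct{...}`), the call to `store` would need an explicit conversion or a second storage function.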
**`pkg/recon/engine.go`** is the orchestrator:
- `NewEngine()` / `Register(src)` / `List()` with an RWMutex-guarded map and sorted name listing.
- `SweepAll(ctx, cfg)` collects enabled sources, allocates an `ants.Pool` sized to `len(active)`, submits one task per source, aggregates findings via a buffered channel, and closes that channel once the WaitGroup completes. A context-cancel branch starts a detached drainer so senders never block post-cancel.
- Deduplication is deliberately deferred to plan 09-03 (`dedup.go`).
**`pkg/recon/example.go`** ships a deterministic `ExampleSource` that emits two fake findings (openai + anthropic, masked keys, `recon:example` SourceType). It lets Wave 2 CLI work and the dashboard verify the end-to-end pipeline without any network I/O.
**`pkg/recon/engine_test.go`** covers:
- `TestRegisterList` — empty engine, register, idempotent re-register, sorted output.
- `TestSweepAll` — full fanout path via ExampleSource, asserts 2 findings tagged `recon:example` with populated provider/masked/source fields.
- `TestSweepAll_NoSources` — empty registry returns `nil, nil`.
- `TestSweepAll_FiltersDisabled` — sources whose `Enabled()` returns false are excluded.
## Tasks Completed
| Task | Name | Commit | Files |
| ---- | ------------------------------------------ | -------- | ------------------------------------------------------- |
| 1 | ReconSource interface + Config | 10af12d | pkg/recon/source.go |
| 2 | Engine + ExampleSource + tests | 851b243 | pkg/recon/engine.go, example.go, engine_test.go |
## Verification
- `go build ./pkg/recon/...` — clean
- `go vet ./pkg/recon/...` — clean
- `go test ./pkg/recon/ -count=1` — PASS (4/4 new tests; existing limiter/robots tests from sibling Wave 1 plans continue to pass)
## Key Decisions
1. **Type alias over new struct.** `type Finding = engine.Finding` means recon findings are the same Go type as scan findings, not a lookalike copy. Storage, verification, and output paths already handle them; sources only tag `SourceType = "recon:<name>"`. This avoids a parallel Finding hierarchy.
2. **Per-sweep pool.** `ants.NewPool(len(active))` is allocated inside `SweepAll` and released via `defer pool.Release()`. With tens of sources this is cheap and eliminates shared-state bugs across concurrent sweeps. A long-lived shared pool can be introduced later if profiling warrants it.
3. **Dedup deferred.** Per 09-CONTEXT.md, `pkg/recon/dedup.go` is owned by plan 09-03. `SweepAll` returns the raw aggregate so the caller can choose when to dedup (batched persistence vs streaming).
4. **Cancellation safety.** On `ctx.Done()` mid-collection, `SweepAll` spawns a detached `for range out {}` drainer before returning `ctx.Err()`. This prevents goroutines inside the ants pool from blocking on `out <- f` after the caller has left.
5. **ExampleSource is a real implementation, not a mock.** It lives in the production package (no `_test.go` suffix) so the CLI (`keyhunter recon list` / `recon full`) and the dashboard can exercise the pipeline end-to-end before any Phase 10-16 source lands. It performs zero network I/O.
## Deviations from Plan
None — plan executed exactly as written. Tests were written RED before the engine/example implementation (TDD per `tdd="true"`), then driven to GREEN. Added two extra tests beyond the plan's stated minimum (`TestSweepAll_NoSources`, `TestSweepAll_FiltersDisabled`) to cover the empty-registry and disabled-source branches of `SweepAll` — pure additive coverage, no behavior change.
## Known Stubs
- `ExampleSource` is itself a stub by design (documented in 09-CONTEXT.md and in the source file doc comment). It will remain in the package as a reference implementation and CI smoke-test source; real sources take over actual reconnaissance in Phases 10-16. This is an intentional stub, not an unfinished task.
## Interfaces Provided to Downstream Plans
```go
// Wave 1 siblings (limiter, stealth, robots) compose orthogonally with Engine:
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}

// Wave 2 CLI (plan 09-05) will call:
e := recon.NewEngine()
e.Register(shodan.New(...))
e.Register(github.New(...))
findings, err := e.SweepAll(ctx, recon.Config{Stealth: true, Query: "..."})
```
## Self-Check: PASSED
- pkg/recon/source.go — FOUND
- pkg/recon/engine.go — FOUND
- pkg/recon/example.go — FOUND
- pkg/recon/engine_test.go — FOUND
- Commit 10af12d — FOUND
- Commit 851b243 — FOUND
- `go test ./pkg/recon/ -count=1` — PASS