---
phase: 10-osint-code-hosting
plan: 01
subsystem: recon/sources
tags: [recon, osint, http, foundation, wave-1]
requires:
  - pkg/recon.Engine (Phase 9)
  - pkg/providers.Registry
  - pkg/recon.LimiterRegistry (Phase 9)
provides:
  - pkg/recon/sources.Client (retry-aware HTTP wrapper)
  - pkg/recon/sources.ErrUnauthorized
  - pkg/recon/sources.ParseRetryAfter
  - pkg/recon/sources.BuildQueries
  - pkg/recon/sources.SourcesConfig
  - pkg/recon/sources.RegisterAll (stub)
affects:
  - pkg/recon/sources (new package)
tech_stack_added: []
patterns:
  - "Retry on 429/403/5xx honoring Retry-After; 401 is terminal"
  - "Context cancellation honored during retry backoff sleeps"
  - "Provider-driven query generation with per-source syntax switch"
key_files_created:
  - pkg/recon/sources/doc.go
  - pkg/recon/sources/httpclient.go
  - pkg/recon/sources/httpclient_test.go
  - pkg/recon/sources/queries.go
  - pkg/recon/sources/queries_test.go
  - pkg/recon/sources/register.go
  - pkg/recon/sources/testhelpers_test.go
key_files_modified: []
decisions:
  - "Client handles retry only; callers invoke LimiterRegistry.Wait before Do (single-purpose)"
  - "github/gist use 'kw' in:file syntax; gitlab/bitbucket/codeberg/huggingface use bare keywords"
  - "Unknown source names fall back to bare keyword (safe default for future sources)"
  - "SourcesConfig shipped as placeholder struct so Wave 2 plans can type-depend on its shape"
metrics:
  duration_minutes: 4
  tasks_completed: 2
  tests_added: 18
  completed_at: "2026-04-05T22:10:00Z"
---

# Phase 10 Plan 01: Recon Sources Foundation Summary

One-liner: Retry-aware HTTP client, provider-driven query generator, and empty RegisterAll bootstrap that unblocks Wave 2 plans 10-02..10-08 to run in parallel.

## What Was Built

The shared foundation for every Phase 10 code-hosting source now lives in `pkg/recon/sources`:

1. **`Client`**: wraps `*http.Client` with retry on 429/403/5xx, `Retry-After` honoring, and context cancellation during backoff.
   401 short-circuits to `ErrUnauthorized` (no retries). Default UA `keyhunter-recon/1.0`, 30s timeout, 2 retries.
2. **`BuildQueries(reg, source)`**: iterates `providers.Registry.List()`, dedups keywords across providers, sorts for determinism, and applies per-source search syntax via `formatQuery`. GitHub and Gist get `"keyword" in:file`; all others get the bare keyword.
3. **`SourcesConfig` + `RegisterAll`**: placeholder struct carrying per-source tokens and shared Registry/Limiters, plus a no-op registration function with a nil-engine guard. Plan 10-09 will fill the body after Wave 2 delivers individual sources.

## Tasks

| # | Name                                                   | Commit  | Status |
| - | ------------------------------------------------------ | ------- | ------ |
| 1 | Shared retry HTTP client helper                        | 75024e4 | done   |
| 2 | Provider-driven query generator + RegisterAll skeleton | 9273f35 | done   |

## Tests

All tests green (`go test ./pkg/recon/sources/...` → PASS in ~3.1s).

HTTP client tests (httptest-backed):

- OK pass-through, 429 retry, 403 retry, 401 no-retry (ErrUnauthorized), ctx cancel during backoff, retries exhausted, default UA, ParseRetryAfter table.

Query generator tests:

- GitHub/Gist `in:file` syntax, GitLab/HuggingFace bare keywords, unknown-source default, nil registry, cross-provider dedup, empty-keyword skip.

RegisterAll tests:

- Nil-engine no-panic, empty-cfg no-panic on real engine.

## Decisions Made

- **Single-purpose Client:** rate limiting is caller's job via `recon.LimiterRegistry.Wait`, keeping retry/backoff logic decoupled from rate policy. Avoids coupling 10 sources to a single limiter injection shape.
- **Deterministic queries:** sorting keywords means test output is stable and cache keys are reproducible when future plans memoize search results.
- **Placeholder `SourcesConfig`:** Wave 2 plans can write `sources.SourcesConfig{...}` against a stable shape before Plan 10-09 ships credential loading.

## Deviations from Plan

None: plan executed exactly as written.
A small `testhelpers_test.go` file was added (not listed in `files_modified`) purely to expose a test-only `newTestEngine()` helper shared between test files; this is idiomatic Go test scaffolding, not a functional deviation.

## Verification

- `go build ./...`: clean
- `go vet ./pkg/recon/sources/...`: clean
- `go test ./pkg/recon/sources/... -timeout 60s`: PASS (3.11s)

## Ready For

Wave 2 plans (10-02 GitHub, 10-03 GitLab, 10-04 Bitbucket, 10-05 Gist, 10-06 Codeberg, 10-07 HuggingFace, 10-08 Kaggle/sandboxes) can now import `pkg/recon/sources` and use `Client` + `BuildQueries` in parallel without conflicts. Plan 10-09 will populate `RegisterAll` with the full source list and wire it into `cmd/recon.go`.

## Self-Check: PASSED

All 7 artifact files present, both commits (75024e4, 9273f35) reachable in git history, SUMMARY.md on disk.