From 9b1aaae28d72ad2364a00030abf390ac4b441ef5 Mon Sep 17 00:00:00 2001
From: salvacybersec
Date: Mon, 6 Apr 2026 01:10:57 +0300
Subject: [PATCH] docs(10-01): complete recon sources foundation plan

---
 .planning/ROADMAP.md                       |  4 +-
 .planning/STATE.md                         | 21 ++--
 .../10-osint-code-hosting/10-01-SUMMARY.md | 99 +++++++++++++++++++
 3 files changed, 113 insertions(+), 11 deletions(-)
 create mode 100644 .planning/phases/10-osint-code-hosting/10-01-SUMMARY.md

diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index 498bd84..ee3e6c0 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -217,7 +217,7 @@ Plans:
  5. All code hosting source findings are stored in the database with source attribution and deduplication
 **Plans**: 9 plans
 Plans:
-- [ ] 10-01-PLAN.md — Shared HTTP client + provider-query generator + RegisterAll skeleton
+- [x] 10-01-PLAN.md — Shared HTTP client + provider-query generator + RegisterAll skeleton
 - [ ] 10-02-PLAN.md — GitHubSource (RECON-CODE-01)
 - [ ] 10-03-PLAN.md — GitLabSource (RECON-CODE-02)
 - [ ] 10-04-PLAN.md — BitbucketSource + GistSource (RECON-CODE-03, RECON-CODE-04)
@@ -336,7 +336,7 @@ Phases execute in numeric order: 1 → 2 → 3 → ... → 18
 | 7. Import Adapters & CI/CD Integration | 0/? | Not started | - |
 | 8. Dork Engine | 0/? | Not started | - |
 | 9. OSINT Infrastructure | 2/6 | In Progress| |
-| 10. OSINT Code Hosting | 0/? | Not started | - |
+| 10. OSINT Code Hosting | 1/9 | In Progress| |
 | 11. OSINT Search & Paste | 0/? | Not started | - |
 | 12. OSINT IoT & Cloud Storage | 0/? | Not started | - |
 | 13. OSINT Package Registries & Container/IaC | 0/? | Not started | - |
diff --git a/.planning/STATE.md b/.planning/STATE.md
index ca40052..acaefcb 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -3,14 +3,14 @@ gsd_state_version: 1.0
 milestone: v1.0
 milestone_name: milestone
 status: executing
-stopped_at: Completed 09-06-PLAN.md (Phase 9 complete)
-last_updated: "2026-04-05T21:56:36.779Z"
+stopped_at: Completed 10-01-PLAN.md
+last_updated: "2026-04-05T22:10:53.439Z"
 last_activity: 2026-04-05
 progress:
   total_phases: 18
   completed_phases: 9
-  total_plans: 53
-  completed_plans: 54
+  total_plans: 62
+  completed_plans: 55
   percent: 20
 ---
 
@@ -21,12 +21,12 @@ progress:
 See: .planning/PROJECT.md (updated 2026-04-04)
 
 **Core value:** Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.
-**Current focus:** Phase 09 — osint-infrastructure
+**Current focus:** Phase 10 — osint-code-hosting
 
 ## Current Position
 
-Phase: 10
-Plan: Not started
+Phase: 10 (osint-code-hosting) — EXECUTING
+Plan: 2 of 9
 Status: Ready to execute
 Last activity: 2026-04-05
 
@@ -85,6 +85,7 @@ Progress: [██░░░░░░░░] 20%
 | Phase 09-osint-infrastructure P04 | 6min | 2 tasks | 4 files |
 | Phase 09 P05 | 5m | 2 tasks | 2 files |
 | Phase 09-osint-infrastructure P06 | 8min | 2 tasks | 2 files |
+| Phase 10-osint-code-hosting P01 | 4m | 2 tasks | 7 files |
 
 ## Accumulated Context
 
@@ -118,6 +119,8 @@ Recent decisions affecting current work:
 - [Phase 06-output-reporting]: keys export rejects SARIF (scan-only); keys show always unmasked; keys verify updates findings inline via db.SQL().Exec
 - [Phase 08-dork-engine]: pkg/dorks mirrors pkg/providers go:embed pattern; //go:embed definitions/* tolerates empty .gitkeep-only tree
 - [Phase 08-dork-engine]: Runner + Executor interface separate from Registry so 08-05 GitHub executor registers without touching YAML loader
+- [Phase 10-osint-code-hosting]: Client handles retry only; rate limiting is caller's responsibility via LimiterRegistry
+- [Phase 10-osint-code-hosting]: github/gist use 'kw' in:file; all other sources use bare keyword
 
 ### Pending Todos
 
@@ -132,6 +135,6 @@ None yet.
 
 ## Session Continuity
 
-Last session: 2026-04-05T21:53:23.957Z
-Stopped at: Completed 09-06-PLAN.md (Phase 9 complete)
+Last session: 2026-04-05T22:10:53.436Z
+Stopped at: Completed 10-01-PLAN.md
 Resume file: None
diff --git a/.planning/phases/10-osint-code-hosting/10-01-SUMMARY.md b/.planning/phases/10-osint-code-hosting/10-01-SUMMARY.md
new file mode 100644
index 0000000..d9d745f
--- /dev/null
+++ b/.planning/phases/10-osint-code-hosting/10-01-SUMMARY.md
@@ -0,0 +1,99 @@
+---
+phase: 10-osint-code-hosting
+plan: 01
+subsystem: recon/sources
+tags: [recon, osint, http, foundation, wave-1]
+requires:
+  - pkg/recon.Engine (Phase 9)
+  - pkg/providers.Registry
+  - pkg/recon.LimiterRegistry (Phase 9)
+provides:
+  - pkg/recon/sources.Client (retry-aware HTTP wrapper)
+  - pkg/recon/sources.ErrUnauthorized
+  - pkg/recon/sources.ParseRetryAfter
+  - pkg/recon/sources.BuildQueries
+  - pkg/recon/sources.SourcesConfig
+  - pkg/recon/sources.RegisterAll (stub)
+affects:
+  - pkg/recon/sources (new package)
+tech_stack_added: []
+patterns:
+  - "Retry on 429/403/5xx honoring Retry-After; 401 is terminal"
+  - "Context cancellation honored during retry backoff sleeps"
+  - "Provider-driven query generation with per-source syntax switch"
+key_files_created:
+  - pkg/recon/sources/doc.go
+  - pkg/recon/sources/httpclient.go
+  - pkg/recon/sources/httpclient_test.go
+  - pkg/recon/sources/queries.go
+  - pkg/recon/sources/queries_test.go
+  - pkg/recon/sources/register.go
+  - pkg/recon/sources/testhelpers_test.go
+key_files_modified: []
+decisions:
+  - "Client handles retry only; callers invoke LimiterRegistry.Wait before Do (single-purpose)"
+  - "github/gist use 'kw' in:file syntax; gitlab/bitbucket/codeberg/huggingface use bare keywords"
+  - "Unknown source names fall back to bare keyword (safe default for future sources)"
+  - "SourcesConfig shipped as placeholder struct so Wave 2 plans can type-depend on its shape"
+metrics:
+  duration_minutes: 4
+  tasks_completed: 2
+  tests_added: 18
+  completed_at: "2026-04-05T22:10:00Z"
+---
+
+# Phase 10 Plan 01: Recon Sources Foundation Summary
+
+One-liner: Retry-aware HTTP client, provider-driven query generator, and empty RegisterAll bootstrap that unblocks Wave 2 plans 10-02..10-08 to run in parallel.
+
+## What Was Built
+
+The shared foundation for every Phase 10 code-hosting source now lives in `pkg/recon/sources`:
+
+1. **`Client`** — wraps `*http.Client` with retry on 429/403/5xx, `Retry-After` honoring, and context cancellation during backoff. 401 short-circuits to `ErrUnauthorized` (no retries). Default UA `keyhunter-recon/1.0`, 30s timeout, 2 retries.
+2. **`BuildQueries(reg, source)`** — iterates `providers.Registry.List()`, dedups keywords across providers, sorts for determinism, and applies per-source search syntax via `formatQuery`. GitHub and Gist get `"keyword" in:file`; all others get the bare keyword.
+3. **`SourcesConfig` + `RegisterAll`** — placeholder struct carrying per-source tokens and shared Registry/Limiters, plus a no-op registration function with a nil-engine guard. Plan 10-09 will fill the body after Wave 2 delivers individual sources.
+
+## Tasks
+
+| # | Name | Commit | Status |
+| - | ---------------------------------------------------------- | ------- | ------ |
+| 1 | Shared retry HTTP client helper | 75024e4 | done |
+| 2 | Provider-driven query generator + RegisterAll skeleton | 9273f35 | done |
+
+## Tests
+
+All tests green (`go test ./pkg/recon/sources/...` → PASS in ~3.1s).
+
+HTTP client tests (httptest-backed):
+- OK pass-through, 429 retry, 403 retry, 401 no-retry (ErrUnauthorized), ctx cancel during backoff, retries exhausted, default UA, ParseRetryAfter table.
+
+Query generator tests:
+- GitHub/Gist `in:file` syntax, GitLab/HuggingFace bare keywords, unknown-source default, nil registry, cross-provider dedup, empty-keyword skip.
+
+RegisterAll tests:
+- Nil-engine no-panic, empty-cfg no-panic on real engine.
+
+## Decisions Made
+
+- **Single-purpose Client:** rate limiting is caller's job via `recon.LimiterRegistry.Wait`, keeping retry/backoff logic decoupled from rate policy. Avoids coupling 10 sources to a single limiter injection shape.
+- **Deterministic queries:** sorting keywords means test output is stable and cache keys are reproducible when future plans memoize search results.
+- **Placeholder `SourcesConfig`:** Wave 2 plans can write `sources.SourcesConfig{...}` against a stable shape before Plan 10-09 ships credential loading.
+
+## Deviations from Plan
+
+None — plan executed exactly as written. A small `testhelpers_test.go` file was added (not listed in `files_modified`) purely to expose a test-only `newTestEngine()` helper shared between test files; this is idiomatic Go test scaffolding, not a functional deviation.
+
+## Verification
+
+- `go build ./...` — clean
+- `go vet ./pkg/recon/sources/...` — clean
+- `go test ./pkg/recon/sources/... -timeout 60s` — PASS (3.11s)
+
+## Ready For
+
+Wave 2 plans (10-02 GitHub, 10-03 GitLab, 10-04 Bitbucket, 10-05 Gist, 10-06 Codeberg, 10-07 HuggingFace, 10-08 Kaggle/sandboxes) can now import `pkg/recon/sources` and use `Client` + `BuildQueries` in parallel without conflicts. Plan 10-09 will populate `RegisterAll` with the full source list and wire it into `cmd/recon.go`.
+
+## Self-Check: PASSED
+
+All 7 artifact files present, both commits (75024e4, 9273f35) reachable in git history, SUMMARY.md on disk.