docs(10-01): complete recon sources foundation plan

salvacybersec
2026-04-06 01:10:57 +03:00
parent 9273f356e6
commit 9b1aaae28d
3 changed files with 113 additions and 11 deletions

@@ -217,7 +217,7 @@ Plans:
 5. All code hosting source findings are stored in the database with source attribution and deduplication
 **Plans**: 9 plans
 Plans:
-- [ ] 10-01-PLAN.md — Shared HTTP client + provider-query generator + RegisterAll skeleton
+- [x] 10-01-PLAN.md — Shared HTTP client + provider-query generator + RegisterAll skeleton
 - [ ] 10-02-PLAN.md — GitHubSource (RECON-CODE-01)
 - [ ] 10-03-PLAN.md — GitLabSource (RECON-CODE-02)
 - [ ] 10-04-PLAN.md — BitbucketSource + GistSource (RECON-CODE-03, RECON-CODE-04)
@@ -336,7 +336,7 @@ Phases execute in numeric order: 1 → 2 → 3 → ... → 18
 | 7. Import Adapters & CI/CD Integration | 0/? | Not started | - |
 | 8. Dork Engine | 0/? | Not started | - |
 | 9. OSINT Infrastructure | 2/6 | In Progress| |
-| 10. OSINT Code Hosting | 0/? | Not started | - |
+| 10. OSINT Code Hosting | 1/9 | In Progress| |
 | 11. OSINT Search & Paste | 0/? | Not started | - |
 | 12. OSINT IoT & Cloud Storage | 0/? | Not started | - |
 | 13. OSINT Package Registries & Container/IaC | 0/? | Not started | - |

@@ -3,14 +3,14 @@ gsd_state_version: 1.0
 milestone: v1.0
 milestone_name: milestone
 status: executing
-stopped_at: Completed 09-06-PLAN.md (Phase 9 complete)
-last_updated: "2026-04-05T21:56:36.779Z"
+stopped_at: Completed 10-01-PLAN.md
+last_updated: "2026-04-05T22:10:53.439Z"
 last_activity: 2026-04-05
 progress:
 total_phases: 18
 completed_phases: 9
-total_plans: 53
-completed_plans: 54
+total_plans: 62
+completed_plans: 55
 percent: 20
 ---
@@ -21,12 +21,12 @@ progress:
 See: .planning/PROJECT.md (updated 2026-04-04)
 **Core value:** Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.
-**Current focus:** Phase 09 — osint-infrastructure
+**Current focus:** Phase 10 — osint-code-hosting
 ## Current Position
-Phase: 10
-Plan: Not started
+Phase: 10 (osint-code-hosting) — EXECUTING
+Plan: 2 of 9
 Status: Ready to execute
 Last activity: 2026-04-05
@@ -85,6 +85,7 @@ Progress: [██░░░░░░░░] 20%
 | Phase 09-osint-infrastructure P04 | 6min | 2 tasks | 4 files |
 | Phase 09 P05 | 5m | 2 tasks | 2 files |
 | Phase 09-osint-infrastructure P06 | 8min | 2 tasks | 2 files |
+| Phase 10-osint-code-hosting P01 | 4m | 2 tasks | 7 files |
 ## Accumulated Context
@@ -118,6 +119,8 @@ Recent decisions affecting current work:
 - [Phase 06-output-reporting]: keys export rejects SARIF (scan-only); keys show always unmasked; keys verify updates findings inline via db.SQL().Exec
 - [Phase 08-dork-engine]: pkg/dorks mirrors pkg/providers go:embed pattern; //go:embed definitions/* tolerates empty .gitkeep-only tree
 - [Phase 08-dork-engine]: Runner + Executor interface separate from Registry so 08-05 GitHub executor registers without touching YAML loader
+- [Phase 10-osint-code-hosting]: Client handles retry only; rate limiting is caller's responsibility via LimiterRegistry
+- [Phase 10-osint-code-hosting]: github/gist use 'kw' in:file; all other sources use bare keyword
 ### Pending Todos
@@ -132,6 +135,6 @@ None yet.
 ## Session Continuity
-Last session: 2026-04-05T21:53:23.957Z
-Stopped at: Completed 09-06-PLAN.md (Phase 9 complete)
+Last session: 2026-04-05T22:10:53.436Z
+Stopped at: Completed 10-01-PLAN.md
 Resume file: None

@@ -0,0 +1,99 @@
---
phase: 10-osint-code-hosting
plan: 01
subsystem: recon/sources
tags: [recon, osint, http, foundation, wave-1]
requires:
- pkg/recon.Engine (Phase 9)
- pkg/providers.Registry
- pkg/recon.LimiterRegistry (Phase 9)
provides:
- pkg/recon/sources.Client (retry-aware HTTP wrapper)
- pkg/recon/sources.ErrUnauthorized
- pkg/recon/sources.ParseRetryAfter
- pkg/recon/sources.BuildQueries
- pkg/recon/sources.SourcesConfig
- pkg/recon/sources.RegisterAll (stub)
affects:
- pkg/recon/sources (new package)
tech_stack_added: []
patterns:
- "Retry on 429/403/5xx honoring Retry-After; 401 is terminal"
- "Context cancellation honored during retry backoff sleeps"
- "Provider-driven query generation with per-source syntax switch"
key_files_created:
- pkg/recon/sources/doc.go
- pkg/recon/sources/httpclient.go
- pkg/recon/sources/httpclient_test.go
- pkg/recon/sources/queries.go
- pkg/recon/sources/queries_test.go
- pkg/recon/sources/register.go
- pkg/recon/sources/testhelpers_test.go
key_files_modified: []
decisions:
- "Client handles retry only; callers invoke LimiterRegistry.Wait before Do (single-purpose)"
- "github/gist use 'kw' in:file syntax; gitlab/bitbucket/codeberg/huggingface use bare keywords"
- "Unknown source names fall back to bare keyword (safe default for future sources)"
- "SourcesConfig shipped as placeholder struct so Wave 2 plans can type-depend on its shape"
metrics:
duration_minutes: 4
tasks_completed: 2
tests_added: 18
completed_at: "2026-04-05T22:10:00Z"
---
# Phase 10 Plan 01: Recon Sources Foundation Summary
One-liner: Retry-aware HTTP client, provider-driven query generator, and an empty `RegisterAll` bootstrap, which together let Wave 2 plans 10-02..10-08 run in parallel.
## What Was Built
The shared foundation for every Phase 10 code-hosting source now lives in `pkg/recon/sources`:
1. **`Client`** — wraps `*http.Client` with retry on 429/403/5xx, `Retry-After` honoring, and context cancellation during backoff. 401 short-circuits to `ErrUnauthorized` (no retries). Default UA `keyhunter-recon/1.0`, 30s timeout, 2 retries.
2. **`BuildQueries(reg, source)`** — iterates `providers.Registry.List()`, dedups keywords across providers, sorts for determinism, and applies per-source search syntax via `formatQuery`. GitHub and Gist get `"keyword" in:file`; all others get the bare keyword.
3. **`SourcesConfig` + `RegisterAll`** — placeholder struct carrying per-source tokens and shared Registry/Limiters, plus a no-op registration function with a nil-engine guard. Plan 10-09 will fill the body after Wave 2 delivers individual sources.
## Tasks
| # | Name | Commit | Status |
| - | ---------------------------------------------------------- | ------- | ------ |
| 1 | Shared retry HTTP client helper | 75024e4 | done |
| 2 | Provider-driven query generator + RegisterAll skeleton | 9273f35 | done |
## Tests
All tests green (`go test ./pkg/recon/sources/...` → PASS in ~3.1s).
HTTP client tests (httptest-backed):
- OK pass-through, 429 retry, 403 retry, 401 no-retry (ErrUnauthorized), ctx cancel during backoff, retries exhausted, default UA, ParseRetryAfter table.
Query generator tests:
- GitHub/Gist `in:file` syntax, GitLab/HuggingFace bare keywords, unknown-source default, nil registry, cross-provider dedup, empty-keyword skip.
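The query-generator behavior these tests pin down (dedup across providers, empty keywords skipped, sorted output, quoted `in:file` syntax for GitHub/Gist, bare keywords everywhere else) can be sketched as follows. `Provider` and `Registry` are hypothetical stand-ins for the `pkg/providers` types, whose real fields are not shown in this summary, and the lowercase source names are assumed.

```go
package main

import (
	"fmt"
	"sort"
)

// Provider and Registry are hypothetical stand-ins for the pkg/providers types.
type Provider struct {
	Name     string
	Keywords []string
}

type Registry struct{ providers []Provider }

func (r *Registry) List() []Provider { return r.providers }

// BuildQueries collects each non-empty keyword once across all providers,
// sorts them for deterministic output, and renders them for the given source.
func BuildQueries(reg *Registry, source string) []string {
	if reg == nil {
		return nil
	}
	seen := make(map[string]bool)
	var kws []string
	for _, p := range reg.List() {
		for _, kw := range p.Keywords {
			if kw == "" || seen[kw] {
				continue // skip empty keywords and cross-provider duplicates
			}
			seen[kw] = true
			kws = append(kws, kw)
		}
	}
	sort.Strings(kws)
	out := make([]string, 0, len(kws))
	for _, kw := range kws {
		out = append(out, formatQuery(source, kw))
	}
	return out
}

// formatQuery applies per-source search syntax. Unknown sources fall back to
// the bare keyword, matching the "safe default" decision.
func formatQuery(source, kw string) string {
	switch source {
	case "github", "gist":
		return fmt.Sprintf("\"%s\" in:file", kw)
	default: // gitlab, bitbucket, codeberg, huggingface, and future sources
		return kw
	}
}

func main() {
	// Illustrative keyword values, not real provider definitions.
	reg := &Registry{providers: []Provider{
		{Name: "alpha", Keywords: []string{"alpha_key", "shared"}},
		{Name: "beta", Keywords: []string{"beta_key", "shared", ""}},
	}}
	fmt.Println(BuildQueries(reg, "github")) // ["alpha_key" in:file "beta_key" in:file "shared" in:file]
	fmt.Println(BuildQueries(reg, "gitlab")) // [alpha_key beta_key shared]
}
```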
RegisterAll tests:
- Nil-engine no-panic, empty-cfg no-panic on real engine.
## Decisions Made
- **Single-purpose Client:** rate limiting is the caller's job via `recon.LimiterRegistry.Wait`, which keeps retry/backoff logic decoupled from rate policy and avoids coupling 10 sources to a single limiter-injection shape.
- **Deterministic queries:** sorting keywords keeps test output stable and makes cache keys reproducible if future plans memoize search results.
- **Placeholder `SourcesConfig`:** Wave 2 plans can write `sources.SourcesConfig{...}` against a stable shape before Plan 10-09 ships credential loading.
## Deviations from Plan
No functional deviations; the plan executed as written. The only addition is a small `testhelpers_test.go` file (not listed in `files_modified`) that exposes a test-only `newTestEngine()` helper shared between test files. This is idiomatic Go test scaffolding, not a functional deviation.
## Verification
- `go build ./...` — clean
- `go vet ./pkg/recon/sources/...` — clean
- `go test ./pkg/recon/sources/... -timeout 60s` — PASS (3.11s)
## Ready For
Wave 2 plans (10-02 GitHub, 10-03 GitLab, 10-04 Bitbucket, 10-05 Gist, 10-06 Codeberg, 10-07 HuggingFace, 10-08 Kaggle/sandboxes) can now import `pkg/recon/sources` and use `Client` + `BuildQueries` in parallel without conflicts. Plan 10-09 will populate `RegisterAll` with the full source list and wire it into `cmd/recon.go`.
## Self-Check: PASSED
All 7 artifact files present, both commits (75024e4, 9273f35) reachable in git history, SUMMARY.md on disk.