docs(10-01): complete recon sources foundation plan
@@ -217,7 +217,7 @@ Plans:
5. All code hosting source findings are stored in the database with source attribution and deduplication
**Plans**: 9 plans
Plans:
- [ ] 10-01-PLAN.md — Shared HTTP client + provider-query generator + RegisterAll skeleton
- [x] 10-01-PLAN.md — Shared HTTP client + provider-query generator + RegisterAll skeleton
- [ ] 10-02-PLAN.md — GitHubSource (RECON-CODE-01)
- [ ] 10-03-PLAN.md — GitLabSource (RECON-CODE-02)
- [ ] 10-04-PLAN.md — BitbucketSource + GistSource (RECON-CODE-03, RECON-CODE-04)
@@ -336,7 +336,7 @@ Phases execute in numeric order: 1 → 2 → 3 → ... → 18
| 7. Import Adapters & CI/CD Integration | 0/? | Not started | - |
| 8. Dork Engine | 0/? | Not started | - |
| 9. OSINT Infrastructure | 2/6 | In Progress| |
| 10. OSINT Code Hosting | 0/? | Not started | - |
| 10. OSINT Code Hosting | 1/9 | In Progress| |
| 11. OSINT Search & Paste | 0/? | Not started | - |
| 12. OSINT IoT & Cloud Storage | 0/? | Not started | - |
| 13. OSINT Package Registries & Container/IaC | 0/? | Not started | - |
@@ -3,14 +3,14 @@ gsd_state_version: 1.0
milestone: v1.0
milestone_name: milestone
status: executing
stopped_at: Completed 09-06-PLAN.md (Phase 9 complete)
last_updated: "2026-04-05T21:56:36.779Z"
stopped_at: Completed 10-01-PLAN.md
last_updated: "2026-04-05T22:10:53.439Z"
last_activity: 2026-04-05
progress:
  total_phases: 18
  completed_phases: 9
  total_plans: 53
  completed_plans: 54
  total_plans: 62
  completed_plans: 55
  percent: 20
---

@@ -21,12 +21,12 @@ progress:
See: .planning/PROJECT.md (updated 2026-04-04)

**Core value:** Detect leaked LLM API keys across more providers and more internet sources than any other tool, with active verification to confirm keys are real and alive.
**Current focus:** Phase 09 — osint-infrastructure
**Current focus:** Phase 10 — osint-code-hosting

## Current Position

Phase: 10
Plan: Not started
Phase: 10 (osint-code-hosting) — EXECUTING
Plan: 2 of 9
Status: Ready to execute
Last activity: 2026-04-05

@@ -85,6 +85,7 @@ Progress: [██░░░░░░░░] 20%
| Phase 09-osint-infrastructure P04 | 6min | 2 tasks | 4 files |
| Phase 09 P05 | 5m | 2 tasks | 2 files |
| Phase 09-osint-infrastructure P06 | 8min | 2 tasks | 2 files |
| Phase 10-osint-code-hosting P01 | 4m | 2 tasks | 7 files |

## Accumulated Context

@@ -118,6 +119,8 @@ Recent decisions affecting current work:
- [Phase 06-output-reporting]: keys export rejects SARIF (scan-only); keys show always unmasked; keys verify updates findings inline via db.SQL().Exec
- [Phase 08-dork-engine]: pkg/dorks mirrors pkg/providers go:embed pattern; //go:embed definitions/* tolerates empty .gitkeep-only tree
- [Phase 08-dork-engine]: Runner + Executor interface separate from Registry so 08-05 GitHub executor registers without touching YAML loader
- [Phase 10-osint-code-hosting]: Client handles retry only; rate limiting is caller's responsibility via LimiterRegistry
- [Phase 10-osint-code-hosting]: github/gist use 'kw' in:file; all other sources use bare keyword

### Pending Todos

@@ -132,6 +135,6 @@ None yet.

## Session Continuity

Last session: 2026-04-05T21:53:23.957Z
Stopped at: Completed 09-06-PLAN.md (Phase 9 complete)
Last session: 2026-04-05T22:10:53.436Z
Stopped at: Completed 10-01-PLAN.md
Resume file: None

99
.planning/phases/10-osint-code-hosting/10-01-SUMMARY.md
Normal file
@@ -0,0 +1,99 @@
---
phase: 10-osint-code-hosting
plan: 01
subsystem: recon/sources
tags: [recon, osint, http, foundation, wave-1]
requires:
- pkg/recon.Engine (Phase 9)
- pkg/providers.Registry
- pkg/recon.LimiterRegistry (Phase 9)
provides:
- pkg/recon/sources.Client (retry-aware HTTP wrapper)
- pkg/recon/sources.ErrUnauthorized
- pkg/recon/sources.ParseRetryAfter
- pkg/recon/sources.BuildQueries
- pkg/recon/sources.SourcesConfig
- pkg/recon/sources.RegisterAll (stub)
affects:
- pkg/recon/sources (new package)
tech_stack_added: []
patterns:
- "Retry on 429/403/5xx honoring Retry-After; 401 is terminal"
- "Context cancellation honored during retry backoff sleeps"
- "Provider-driven query generation with per-source syntax switch"
key_files_created:
- pkg/recon/sources/doc.go
- pkg/recon/sources/httpclient.go
- pkg/recon/sources/httpclient_test.go
- pkg/recon/sources/queries.go
- pkg/recon/sources/queries_test.go
- pkg/recon/sources/register.go
- pkg/recon/sources/testhelpers_test.go
key_files_modified: []
decisions:
- "Client handles retry only; callers invoke LimiterRegistry.Wait before Do (single-purpose)"
- "github/gist use 'kw' in:file syntax; gitlab/bitbucket/codeberg/huggingface use bare keywords"
- "Unknown source names fall back to bare keyword (safe default for future sources)"
- "SourcesConfig shipped as placeholder struct so Wave 2 plans can type-depend on its shape"
metrics:
  duration_minutes: 4
  tasks_completed: 2
  tests_added: 18
  completed_at: "2026-04-05T22:10:00Z"
---

# Phase 10 Plan 01: Recon Sources Foundation Summary

One-liner: Retry-aware HTTP client, provider-driven query generator, and empty RegisterAll bootstrap that unblocks Wave 2 plans 10-02..10-08 so they can run in parallel.

## What Was Built

The shared foundation for every Phase 10 code-hosting source now lives in `pkg/recon/sources`:

1. **`Client`**: wraps `*http.Client`, retries on 429/403/5xx, honors `Retry-After`, and respects context cancellation during backoff. A 401 short-circuits to `ErrUnauthorized` (no retries). Defaults: User-Agent `keyhunter-recon/1.0`, 30s timeout, 2 retries.
2. **`BuildQueries(reg, source)`**: iterates `providers.Registry.List()`, deduplicates keywords across providers, sorts them for determinism, and applies per-source search syntax via `formatQuery`. GitHub and Gist get `"keyword" in:file`; all others get the bare keyword.
3. **`SourcesConfig` + `RegisterAll`**: placeholder struct carrying per-source tokens and shared Registry/Limiters, plus a no-op registration function with a nil-engine guard. Plan 10-09 will fill the body after Wave 2 delivers individual sources.

## Tasks

| # | Name                                                   | Commit  | Status |
| - | ------------------------------------------------------ | ------- | ------ |
| 1 | Shared retry HTTP client helper                        | 75024e4 | done   |
| 2 | Provider-driven query generator + RegisterAll skeleton | 9273f35 | done   |

## Tests

All tests green (`go test ./pkg/recon/sources/...` → PASS in ~3.1s).

HTTP client tests (httptest-backed):
- OK pass-through, 429 retry, 403 retry, 401 no-retry (ErrUnauthorized), ctx cancel during backoff, retries exhausted, default UA, ParseRetryAfter table.

Query generator tests:
- GitHub/Gist `in:file` syntax, GitLab/HuggingFace bare keywords, unknown-source default, nil registry, cross-provider dedup, empty-keyword skip.
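
The behaviours these query tests cover (dedup, empty-keyword skip, sorting, per-source syntax, bare-keyword default) could look roughly like the sketch below. It assumes a simplified `provider` type in place of `providers.Registry` and lower-case source names such as `"github"`; both are illustrative assumptions, and the real `BuildQueries` signature takes a registry instead of a slice.

```go
package main

import "sort"

// provider is a hypothetical stand-in for a providers.Registry entry;
// only the keyword list matters for this sketch.
type provider struct {
	Name     string
	Keywords []string
}

// formatQuery applies per-source search syntax. GitHub and Gist get the
// quoted keyword with in:file; every other source name, including unknown
// ones, gets the bare keyword.
func formatQuery(source, keyword string) string {
	switch source {
	case "github", "gist":
		return `"` + keyword + `" in:file`
	default:
		return keyword
	}
}

// buildQueries deduplicates keywords across providers, skips empty ones,
// sorts them for deterministic output, and formats each for the source.
func buildQueries(providers []provider, source string) []string {
	seen := make(map[string]struct{})
	var keywords []string
	for _, p := range providers {
		for _, kw := range p.Keywords {
			if kw == "" {
				continue
			}
			if _, dup := seen[kw]; dup {
				continue
			}
			seen[kw] = struct{}{}
			keywords = append(keywords, kw)
		}
	}
	sort.Strings(keywords)

	queries := make([]string, 0, len(keywords))
	for _, kw := range keywords {
		queries = append(queries, formatQuery(source, kw))
	}
	return queries
}
```

Sorting after deduplication keeps the output independent of provider iteration order, which is what makes the generated query list stable across runs.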

RegisterAll tests:
- Nil-engine no-panic, empty-cfg no-panic on real engine.
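
The nil-engine guard exercised by these tests might look like the following sketch. The `engine` and `sourcesConfig` types and their fields are hypothetical placeholders; the summary only says that the config carries per-source tokens and that the function body is still empty.

```go
package main

// engine is a hypothetical stand-in for pkg/recon.Engine.
type engine struct {
	sources []string
}

// sourcesConfig is a hypothetical placeholder shape. The field names are
// assumptions made for illustration only.
type sourcesConfig struct {
	GitHubToken string
	GitLabToken string
}

// registerAll mirrors the described skeleton: it returns early on a nil
// engine and otherwise registers nothing yet.
func registerAll(e *engine, cfg sourcesConfig) {
	if e == nil {
		return
	}
	_ = cfg // individual sources would be registered here once they exist
}
```

Shipping the guard before any sources exist lets later plans call the function from tests and wiring code without special-casing an unset engine.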

## Decisions Made

- **Single-purpose Client:** rate limiting is the caller's job via `recon.LimiterRegistry.Wait`, which keeps retry/backoff logic decoupled from rate policy and avoids coupling 10 sources to a single limiter injection shape.
- **Deterministic queries:** sorting keywords means test output is stable and cache keys are reproducible when future plans memoize search results.
- **Placeholder `SourcesConfig`:** Wave 2 plans can write `sources.SourcesConfig{...}` against a stable shape before Plan 10-09 ships credential loading.

## Deviations from Plan

None. The plan was executed exactly as written. A small `testhelpers_test.go` file was added (not listed in `files_modified`) purely to expose a test-only `newTestEngine()` helper shared between test files; this is idiomatic Go test scaffolding, not a functional deviation.

## Verification

- `go build ./...`: clean
- `go vet ./pkg/recon/sources/...`: clean
- `go test ./pkg/recon/sources/... -timeout 60s`: PASS (3.11s)

## Ready For

Wave 2 plans (10-02 GitHub, 10-03 GitLab, 10-04 Bitbucket, 10-05 Gist, 10-06 Codeberg, 10-07 HuggingFace, 10-08 Kaggle/sandboxes) can now import `pkg/recon/sources` and use `Client` + `BuildQueries` in parallel without conflicts. Plan 10-09 will populate `RegisterAll` with the full source list and wire it into `cmd/recon.go`.

## Self-Check: PASSED

All 7 artifact files present, both commits (75024e4, 9273f35) reachable in git history, SUMMARY.md on disk.