---
phase: 10-osint-code-hosting
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/sources/doc.go
  - pkg/recon/sources/httpclient.go
  - pkg/recon/sources/httpclient_test.go
  - pkg/recon/sources/queries.go
  - pkg/recon/sources/queries_test.go
  - pkg/recon/sources/register.go
autonomous: true
requirements: []
must_haves:
  truths:
    - "Shared retry HTTP client honors ctx cancellation and Retry-After on 429/403"
    - "Provider registry drives per-source query templates (no hardcoded literals)"
    - "Empty source registry compiles and exposes RegisterAll(engine, cfg)"
  artifacts:
    - path: "pkg/recon/sources/httpclient.go"
      provides: "Retrying *http.Client with context + Retry-After handling"
    - path: "pkg/recon/sources/queries.go"
      provides: "BuildQueries(registry, sourceName) []string generator"
    - path: "pkg/recon/sources/register.go"
      provides: "RegisterAll(engine *recon.Engine, cfg SourcesConfig) bootstrap"
  key_links:
    - from: "pkg/recon/sources/httpclient.go"
      to: "net/http + context"
      via: "(*Client).Do(ctx, req) (*http.Response, error)"
      pattern: "Client) Do"
    - from: "pkg/recon/sources/queries.go"
      to: "pkg/providers.Registry"
      via: "BuildQueries iterates reg.List() and formats provider keywords"
      pattern: "BuildQueries"
---

Establish the shared foundation for all Phase 10 code hosting sources: a retry-aware HTTP client wrapper, a provider→query template generator driven by the provider registry, and an empty RegisterAll bootstrap that Plan 10-09 will fill in. No individual source is implemented here; this plan exists so that the Wave 2 plans (10-02..10-08) can run in parallel without fighting over shared helpers.

Purpose: Deduplicate retry/rate-limit/backoff logic across 10 sources; centralize query generation so that providers added later automatically flow to every source.

Output: Compilable `pkg/recon/sources` package skeleton with tested helpers.
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/10-osint-code-hosting/10-CONTEXT.md
@pkg/recon/source.go
@pkg/recon/limiter.go
@pkg/dorks/github.go
@pkg/providers/registry.go

From pkg/recon/source.go:

```go
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}

type Finding = engine.Finding

type Config struct {
	Stealth, RespectRobots bool
	EnabledSources         []string
	Query                  string
}
```

From pkg/recon/limiter.go:

```go
type LimiterRegistry struct { ... }
func NewLimiterRegistry() *LimiterRegistry
func (lr *LimiterRegistry) Wait(ctx, name, r, burst, stealth) error
```

From pkg/providers/registry.go:

```go
func (r *Registry) List() []Provider
// Provider has: Name string, Keywords []string, Patterns []Pattern, Tier int
```

From pkg/engine/finding.go:

```go
type Finding struct {
	ProviderName, KeyValue, KeyMasked, Confidence, Source, SourceType string
	LineNumber   int
	Offset       int64
	DetectedAt   time.Time
	Verified     bool
	VerifyStatus string
	...
}
```

Task 1: Shared retry HTTP client helper

pkg/recon/sources/doc.go, pkg/recon/sources/httpclient.go, pkg/recon/sources/httpclient_test.go

- Test A: 200 OK returns the response unchanged, body readable
- Test B: 429 with Retry-After: 1 triggers one retry, then succeeds (verify via httptest counter)
- Test C: 403 with Retry-After triggers a retry
- Test D: 401 returns ErrUnauthorized immediately, no retry
- Test E: Ctx cancellation during the retry sleep returns ctx.Err()
- Test F: MaxRetries exhausted returns an error carrying the last status code

Create `pkg/recon/sources/doc.go` with the package comment: "Package sources hosts per-OSINT-source ReconSource implementations for Phase 10 code hosting (GitHub, GitLab, Bitbucket, Gist, Codeberg, HuggingFace, Kaggle, Replit, CodeSandbox, sandboxes). Each source implements pkg/recon.ReconSource."

Create `pkg/recon/sources/httpclient.go` exporting:

```go
package sources

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// ErrUnauthorized is returned when an API rejects credentials (401).
var ErrUnauthorized = errors.New("sources: unauthorized (check credentials)")

// Client is the shared retry wrapper every Phase 10 source uses.
type Client struct {
	HTTP       *http.Client
	MaxRetries int    // default 2
	UserAgent  string // default "keyhunter-recon/1.0"
}

// NewClient returns a Client with a 30s timeout and 2 retries.
func NewClient() *Client {
	return &Client{
		HTTP:       &http.Client{Timeout: 30 * time.Second},
		MaxRetries: 2,
		UserAgent:  "keyhunter-recon/1.0",
	}
}

// Do executes req with retries on 429/403/5xx honoring Retry-After.
// 401 returns ErrUnauthorized wrapped with the response body.
// Ctx cancellation is honored during sleeps.
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error) {
	if req.Header.Get("User-Agent") == "" {
		req.Header.Set("User-Agent", c.UserAgent)
	}
	for attempt := 0; attempt <= c.MaxRetries; attempt++ {
		r, err := c.HTTP.Do(req.WithContext(ctx))
		if err != nil {
			return nil, fmt.Errorf("sources http: %w", err)
		}
		if r.StatusCode == http.StatusOK {
			return r, nil // caller owns and closes the body
		}
		if r.StatusCode == http.StatusUnauthorized {
			body := readBody(r)
			return nil, fmt.Errorf("%w: %s", ErrUnauthorized, body)
		}
		retriable := r.StatusCode == http.StatusTooManyRequests ||
			r.StatusCode == http.StatusForbidden ||
			r.StatusCode >= 500
		if !retriable || attempt == c.MaxRetries {
			body := readBody(r)
			return nil, fmt.Errorf("sources http %d: %s", r.StatusCode, body)
		}
		sleep := ParseRetryAfter(r.Header.Get("Retry-After"))
		r.Body.Close()
		// Wait out the server-requested delay, aborting early on cancellation.
		select {
		case <-time.After(sleep):
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
	// Only reachable when MaxRetries is negative.
	return nil, fmt.Errorf("sources http: retries exhausted")
}

// ParseRetryAfter decodes integer-seconds Retry-After, defaulting to 1s.
func ParseRetryAfter(v string) time.Duration { ... }

// readBody reads up to 4KB of the body and closes it.
func readBody(r *http.Response) string { ... }
```

Create `pkg/recon/sources/httpclient_test.go` using `net/http/httptest`:

- Table-driven tests for each behavior above. Use an atomic counter to verify retry attempt counts. Use `httptest.NewServer` with a handler that switches on a request counter.
- For the ctx cancellation test: set Retry-After: 10, cancel ctx after 100ms, and assert that ctx.Err() is returned within 500ms.

Do NOT build a LimiterRegistry wrapper here. Each source calls its own LimiterRegistry.Wait before calling Client.Do, which keeps Client single-purpose (retry only).

cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run TestClient -v -timeout 30s

All behaviors covered; Client.Do retries on 429/403/5xx honoring Retry-After; 401 returns ErrUnauthorized immediately; ctx cancellation respected; tests green.
Task 2: Provider-driven query generator + RegisterAll skeleton

pkg/recon/sources/queries.go, pkg/recon/sources/queries_test.go, pkg/recon/sources/register.go

- Test A: BuildQueries(reg, "github") returns one query per distinct keyword across all providers, formatted as GitHub search syntax, e.g. `"sk-proj-" in:file`
- Test B: BuildQueries(reg, "gitlab") returns queries formatted for GitLab search syntax (raw keyword, no `in:file`)
- Test C: BuildQueries(reg, "huggingface") returns bare keyword queries
- Test D: Unknown source name returns bare keyword queries (safe default)
- Test E: Providers with an empty Keywords slice are skipped
- Test F: Keyword dedup: if two providers share a keyword, it is emitted once per source
- Test G: RegisterAll(nil, cfg) is a no-op that does not panic; RegisterAll with an empty cfg does not panic

Create `pkg/recon/sources/queries.go`:

```go
package sources

import (
	"fmt"
	"sort"

	"github.com/salvacybersec/keyhunter/pkg/providers"
)

// BuildQueries produces the search-string list a source should iterate for a
// given provider registry. Each keyword is formatted per source-specific syntax.
// Result is deterministic (sorted) for reproducible tests.
func BuildQueries(reg *providers.Registry, source string) []string {
	if reg == nil {
		return nil
	}
	// Collect distinct, non-empty keywords across all providers.
	seen := make(map[string]struct{})
	for _, p := range reg.List() {
		for _, k := range p.Keywords {
			if k == "" {
				continue
			}
			seen[k] = struct{}{}
		}
	}
	keywords := make([]string, 0, len(seen))
	for k := range seen {
		keywords = append(keywords, k)
	}
	sort.Strings(keywords)
	out := make([]string, 0, len(keywords))
	for _, k := range keywords {
		out = append(out, formatQuery(source, k))
	}
	return out
}

// formatQuery renders one keyword in the search syntax of the given source.
func formatQuery(source, keyword string) string {
	switch source {
	case "github", "gist":
		return fmt.Sprintf("%q in:file", keyword)
	case "gitlab", "bitbucket", "codeberg":
		// GitLab code search doesn't support the in:file qualifier.
		return keyword
	default:
		// Unknown sources get the bare keyword as a safe default.
		return keyword
	}
}
```

Create `pkg/recon/sources/queries_test.go` using `providers.NewRegistryFromProviders` with two synthetic providers (with a shared keyword to test dedup).

Create `pkg/recon/sources/register.go`:

```go
package sources

import (
	"github.com/salvacybersec/keyhunter/pkg/providers"
	"github.com/salvacybersec/keyhunter/pkg/recon"
)

// SourcesConfig carries per-source credentials read from viper/env by cmd/recon.go.
// Plan 10-09 fleshes this out; for now it is a placeholder struct so downstream
// plans can depend on its shape.
type SourcesConfig struct {
	GitHubToken      string
	GitLabToken      string
	BitbucketToken   string
	HuggingFaceToken string
	KaggleUser       string
	KaggleKey        string
	Registry         *providers.Registry
	Limiters         *recon.LimiterRegistry
}

// RegisterAll registers every Phase 10 code-hosting source on engine.
// Wave 2 plans append their source constructors here via additional
// registerXxx helpers in this file. Plan 10-09 writes the final list.
func RegisterAll(engine *recon.Engine, cfg SourcesConfig) {
	if engine == nil {
		return
	}
	// Populated by Plan 10-09 (after Wave 2 lands individual source files).
}
```

Do NOT wire this into cmd/recon.go yet. Plan 10-09 handles CLI integration after every source exists.

cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestBuildQueries|TestRegisterAll" -v -timeout 30s && go build ./...

BuildQueries is deterministic, dedups keywords, and formats per-source syntax. RegisterAll compiles as a no-op stub. The package builds with zero source implementations and is ready for Wave 2 plans to add files in parallel.

- `go build ./...` succeeds
- `go test ./pkg/recon/sources/...` passes
- `go vet ./pkg/recon/sources/...` clean

The pkg/recon/sources package exists with httpclient.go, queries.go, register.go, and doc.go, and all tests are green. No source implementations are present yet; that is Wave 2.

After completion, create `.planning/phases/10-osint-code-hosting/10-01-SUMMARY.md`.