keyhunter/.planning/phases/10-osint-code-hosting/10-01-PLAN.md


---
phase: 10-osint-code-hosting
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/sources/doc.go
  - pkg/recon/sources/httpclient.go
  - pkg/recon/sources/httpclient_test.go
  - pkg/recon/sources/queries.go
  - pkg/recon/sources/queries_test.go
  - pkg/recon/sources/register.go
autonomous: true
requirements:
must_haves:
  truths:
    - "Shared retry HTTP client honors ctx cancellation and Retry-After on 429/403"
    - "Provider registry drives per-source query templates (no hardcoded literals)"
    - "Empty source registry compiles and exposes RegisterAll(engine, cfg)"
  artifacts:
    - path: pkg/recon/sources/httpclient.go
      provides: "Retrying *http.Client with context + Retry-After handling"
    - path: pkg/recon/sources/queries.go
      provides: "BuildQueries(registry, sourceName) []string generator"
    - path: pkg/recon/sources/register.go
      provides: "RegisterAll(engine *recon.Engine, cfg SourcesConfig) bootstrap"
  key_links:
    - from: pkg/recon/sources/httpclient.go
      to: "net/http + context + golang.org/x/time/rate"
      via: "DoWithRetry(ctx, req, limiter) (*http.Response, error)"
      pattern: "DoWithRetry"
    - from: pkg/recon/sources/queries.go
      to: pkg/providers.Registry
      via: "BuildQueries iterates reg.List() and formats provider keywords"
      pattern: "BuildQueries"
---
Establish the shared foundation for all Phase 10 code hosting sources: a retry-aware HTTP client wrapper, a provider→query template generator driven by the provider registry, and an empty RegisterAll bootstrap that Plan 10-09 will fill in. No individual source is implemented here — this plan exists so Wave 2 plans (10-02..10-08) can run in parallel without fighting over shared helpers.

Purpose: Deduplicate retry/rate-limit/backoff logic across 10 sources; centralize query generation so providers added later automatically flow to every source.
Output: Compilable pkg/recon/sources package skeleton with tested helpers.

<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/10-osint-code-hosting/10-CONTEXT.md
@pkg/recon/source.go
@pkg/recon/limiter.go
@pkg/dorks/github.go
@pkg/providers/registry.go

From pkg/recon/source.go:

```go
type ReconSource interface {
    Name() string
    RateLimit() rate.Limit
    Burst() int
    RespectsRobots() bool
    Enabled(cfg Config) bool
    Sweep(ctx context.Context, query string, out chan<- Finding) error
}

type Finding = engine.Finding

type Config struct {
    Stealth, RespectRobots bool
    EnabledSources         []string
    Query                  string
}
```

From pkg/recon/limiter.go:

```go
type LimiterRegistry struct { ... }
func NewLimiterRegistry() *LimiterRegistry
func (lr *LimiterRegistry) Wait(ctx, name, r, burst, stealth) error
```

From pkg/providers/registry.go:

```go
func (r *Registry) List() []Provider
// Provider has: Name string, Keywords []string, Patterns []Pattern, Tier int
```

From pkg/engine/finding.go:

```go
type Finding struct {
    ProviderName, KeyValue, KeyMasked, Confidence, Source, SourceType string
    LineNumber   int
    Offset       int64
    DetectedAt   time.Time
    Verified     bool
    VerifyStatus string
    ...
}
```
Task 1: Shared retry HTTP client helper

Files: pkg/recon/sources/doc.go, pkg/recon/sources/httpclient.go, pkg/recon/sources/httpclient_test.go

Behavior:
- Test A: 200 OK returns the response unchanged, with the body readable
- Test B: 429 with Retry-After: 1 triggers one retry, then succeeds (verify via an httptest counter)
- Test C: 403 with Retry-After triggers a retry
- Test D: 401 returns ErrUnauthorized immediately, with no retry
- Test E: ctx cancellation during the retry sleep returns ctx.Err()
- Test F: exhausting MaxRetries returns an error that includes the last status code and body

Action:
Create `pkg/recon/sources/doc.go` with the package comment: "Package sources hosts per-OSINT-source ReconSource implementations for Phase 10 code hosting (GitHub, GitLab, Bitbucket, Gist, Codeberg, HuggingFace, Kaggle, Replit, CodeSandbox, sandboxes). Each source implements pkg/recon.ReconSource."
Create `pkg/recon/sources/httpclient.go` exporting:
```go
package sources

import (
    "context"
    "errors"
    "fmt"
    "io"
    "net/http"
    "strconv"
    "time"
)

// ErrUnauthorized is returned when an API rejects credentials (401).
var ErrUnauthorized = errors.New("sources: unauthorized (check credentials)")

// Client is the shared retry wrapper every Phase 10 source uses.
type Client struct {
    HTTP       *http.Client
    MaxRetries int    // default 2
    UserAgent  string // default "keyhunter-recon/1.0"
}

// NewClient returns a Client with a 30s timeout and 2 retries.
func NewClient() *Client {
    return &Client{HTTP: &http.Client{Timeout: 30 * time.Second}, MaxRetries: 2, UserAgent: "keyhunter-recon/1.0"}
}

// Do executes req with retries on 429/403/5xx, honoring Retry-After.
// 401 returns ErrUnauthorized wrapped with the response body.
// Ctx cancellation is honored during sleeps.
// req is re-sent unchanged on retry, so it must not carry a body (GET search calls).
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error) {
    if req.Header.Get("User-Agent") == "" {
        req.Header.Set("User-Agent", c.UserAgent)
    }
    for attempt := 0; attempt <= c.MaxRetries; attempt++ {
        r, err := c.HTTP.Do(req.WithContext(ctx))
        if err != nil {
            return nil, fmt.Errorf("sources http: %w", err)
        }
        if r.StatusCode == http.StatusOK {
            return r, nil
        }
        if r.StatusCode == http.StatusUnauthorized {
            return nil, fmt.Errorf("%w: %s", ErrUnauthorized, readBody(r))
        }
        retriable := r.StatusCode == http.StatusTooManyRequests ||
            r.StatusCode == http.StatusForbidden || r.StatusCode >= 500
        if !retriable || attempt == c.MaxRetries {
            return nil, fmt.Errorf("sources http %d: %s", r.StatusCode, readBody(r))
        }
        sleep := ParseRetryAfter(r.Header.Get("Retry-After"))
        r.Body.Close()
        timer := time.NewTimer(sleep)
        select {
        case <-timer.C:
        case <-ctx.Done():
            timer.Stop()
            return nil, ctx.Err()
        }
    }
    // Only reachable when MaxRetries < 0 (the loop body never runs).
    return nil, errors.New("sources http: retries exhausted")
}

// ParseRetryAfter decodes an integer-seconds Retry-After value, defaulting to 1s
// when the header is missing, malformed, or negative. HTTP-date values are not parsed.
func ParseRetryAfter(v string) time.Duration {
    n, err := strconv.Atoi(v)
    if err != nil || n < 0 {
        return time.Second
    }
    return time.Duration(n) * time.Second
}

// readBody reads up to 4KB of the body and closes it.
func readBody(r *http.Response) string {
    defer r.Body.Close()
    b, _ := io.ReadAll(io.LimitReader(r.Body, 4<<10))
    return string(b)
}
```

Create `pkg/recon/sources/httpclient_test.go` using `net/http/httptest`:
- Table-driven tests for each behavior above. Use an atomic counter to verify
  retry attempt counts. Use `httptest.NewServer` with a handler that switches on
  a request counter.
- For ctx cancellation test: set Retry-After: 10, cancel ctx within 100ms, assert
  ctx.Err() returned within 500ms.

Do NOT build a LimiterRegistry wrapper here — each source calls its own LimiterRegistry.Wait
before calling Client.Do. Keeps Client single-purpose (retry only).
Verify: cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run TestClient -v -timeout 30s

Done: All behaviors are covered; Client.Do retries on 429/403/5xx honoring Retry-After; 401 returns ErrUnauthorized immediately; ctx cancellation is respected; tests are green.

Task 2: Provider-driven query generator + RegisterAll skeleton

Files: pkg/recon/sources/queries.go, pkg/recon/sources/queries_test.go, pkg/recon/sources/register.go

Behavior:
- Test A: BuildQueries(reg, "github") returns one query per unique keyword (keywords are deduplicated, see Test F), formatted as GitHub search syntax, e.g. `"sk-proj-" in:file`
- Test B: BuildQueries(reg, "gitlab") returns queries formatted for GitLab search syntax (raw keyword, no `in:file`)
- Test C: BuildQueries(reg, "huggingface") returns bare keyword queries
- Test D: An unknown source name returns bare keyword queries (safe default)
- Test E: Providers with an empty Keywords slice are skipped
- Test F: Keyword dedup: if two providers share a keyword, it is emitted once per source
- Test G: RegisterAll(nil, cfg) is a no-op that does not panic; RegisterAll with a non-nil engine and an empty cfg does not panic

Action:
Create `pkg/recon/sources/queries.go`:
```go
package sources

import (
    "fmt"
    "sort"

    "github.com/salvacybersec/keyhunter/pkg/providers"
)

// BuildQueries produces the search-string list a source should iterate for a
// given provider registry. Each keyword is formatted per source-specific syntax.
// Result is deterministic (sorted) for reproducible tests.
func BuildQueries(reg *providers.Registry, source string) []string {
    if reg == nil { return nil }
    seen := make(map[string]struct{})
    for _, p := range reg.List() {
        for _, k := range p.Keywords {
            if k == "" { continue }
            seen[k] = struct{}{}
        }
    }
    keywords := make([]string, 0, len(seen))
    for k := range seen { keywords = append(keywords, k) }
    sort.Strings(keywords)

    out := make([]string, 0, len(keywords))
    for _, k := range keywords {
        out = append(out, formatQuery(source, k))
    }
    return out
}

func formatQuery(source, keyword string) string {
    switch source {
    case "github", "gist":
        return fmt.Sprintf("%q in:file", keyword)
    case "gitlab":
        return keyword // GitLab code search doesn't support in:file qualifier
    case "bitbucket":
        return keyword
    case "codeberg":
        return keyword
    default:
        return keyword
    }
}
```
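To see the dedup and per-source formatting concretely, here is a self-contained sketch. It repeats the `BuildQueries` logic over a plain slice instead of `*providers.Registry`. The `Provider` type, the lowercase `buildQueries`, and the provider names are synthetic stand-ins, not the real package types.

```go
package main

import (
	"fmt"
	"sort"
)

// Provider mirrors only the fields BuildQueries reads (synthetic stand-in).
type Provider struct {
	Name     string
	Keywords []string
}

// buildQueries repeats the plan's logic over a slice (stand-in for reg.List()):
// collect unique non-empty keywords, sort them, then format per source.
func buildQueries(list []Provider, source string) []string {
	seen := make(map[string]struct{})
	for _, p := range list {
		for _, k := range p.Keywords {
			if k != "" {
				seen[k] = struct{}{}
			}
		}
	}
	keywords := make([]string, 0, len(seen))
	for k := range seen {
		keywords = append(keywords, k)
	}
	sort.Strings(keywords)

	out := make([]string, 0, len(keywords))
	for _, k := range keywords {
		switch source {
		case "github", "gist":
			out = append(out, fmt.Sprintf("%q in:file", k))
		default: // gitlab, bitbucket, codeberg, huggingface, unknown
			out = append(out, k)
		}
	}
	return out
}

func main() {
	list := []Provider{
		{Name: "provider-a", Keywords: []string{"sk-proj-", "openai"}},
		{Name: "provider-b", Keywords: []string{"openai"}}, // shared keyword
		{Name: "provider-c"},                               // no keywords: skipped
	}
	fmt.Println(buildQueries(list, "github")) // ["openai" in:file "sk-proj-" in:file]
	fmt.Println(buildQueries(list, "gitlab")) // [openai sk-proj-]
}
```

The shared `openai` keyword is emitted once, and sorting makes the output order stable across runs even though map iteration order is random.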

Create `pkg/recon/sources/queries_test.go` using `providers.NewRegistryFromProviders`
with two synthetic providers (shared keyword to test dedup).

Create `pkg/recon/sources/register.go`:
```go
package sources

import (
    "github.com/salvacybersec/keyhunter/pkg/providers"
    "github.com/salvacybersec/keyhunter/pkg/recon"
)

// SourcesConfig carries per-source credentials read from viper/env by cmd/recon.go.
// Plan 10-09 fleshes this out; for now it is a placeholder struct so downstream
// plans can depend on its shape.
type SourcesConfig struct {
    GitHubToken      string
    GitLabToken      string
    BitbucketToken   string
    HuggingFaceToken string
    KaggleUser       string
    KaggleKey        string
    Registry         *providers.Registry
    Limiters         *recon.LimiterRegistry
}

// RegisterAll registers every Phase 10 code-hosting source on engine.
// Wave 2 plans append their source constructors here via additional
// registerXxx helpers in this file. Plan 10-09 writes the final list.
func RegisterAll(engine *recon.Engine, cfg SourcesConfig) {
    if engine == nil { return }
    // Populated by Plan 10-09 (after Wave 2 lands individual source files).
}
```
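One way the `registerXxx` helpers could be laid out is sketched below with hypothetical stand-in types. `engine.Register`, `source`, `sourcesConfig`, and the `githubSource`/`gitlabSource` types are illustrative only. The real `recon.Engine` registration API lives in pkg/recon and is not shown in this plan.

```go
package main

import "fmt"

// source is a hypothetical, trimmed-down stand-in for recon.ReconSource.
type source interface{ Name() string }

// engine is a hypothetical stand-in for recon.Engine; Register is assumed.
type engine struct{ sources []source }

func (e *engine) Register(s source) { e.sources = append(e.sources, s) }

// sourcesConfig is a trimmed stand-in for SourcesConfig.
type sourcesConfig struct{ GitHubToken, GitLabToken string }

// Each Wave 2 plan would own one source type and one register helper.
type githubSource struct{ token string }

func (githubSource) Name() string { return "github" }

type gitlabSource struct{ token string }

func (gitlabSource) Name() string { return "gitlab" }

func registerGitHub(e *engine, cfg sourcesConfig) { e.Register(githubSource{token: cfg.GitHubToken}) }
func registerGitLab(e *engine, cfg sourcesConfig) { e.Register(gitlabSource{token: cfg.GitLabToken}) }

// registerAll keeps the plan's nil guard, then calls each helper in order.
func registerAll(e *engine, cfg sourcesConfig) {
	if e == nil {
		return
	}
	registerGitHub(e, cfg)
	registerGitLab(e, cfg)
}

func main() {
	registerAll(nil, sourcesConfig{}) // no-op, must not panic
	e := &engine{}
	registerAll(e, sourcesConfig{GitHubToken: "example-token"})
	for _, s := range e.sources {
		fmt.Println(s.Name())
	}
}
```

Keeping one helper per source means each Wave 2 plan only adds its own function and a single call line.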

Do NOT wire this into cmd/recon.go yet — Plan 10-09 handles CLI integration after
every source exists.
Verify: cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestBuildQueries|TestRegisterAll" -v -timeout 30s && go build ./...

Done: BuildQueries is deterministic, dedups keywords, and formats per-source syntax. RegisterAll compiles as a no-op stub. The package builds with zero source implementations, ready for Wave 2 plans to add files in parallel.

Verification:
- `go build ./...` succeeds
- `go test ./pkg/recon/sources/...` passes
- `go vet ./pkg/recon/sources/...` is clean

<success_criteria> pkg/recon/sources package exists with httpclient.go, queries.go, register.go, doc.go and all tests green. No source implementations present yet — that is Wave 2. </success_criteria>

After completion, create `.planning/phases/10-osint-code-hosting/10-01-SUMMARY.md`.