---
phase: 10-osint-code-hosting
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/sources/doc.go
  - pkg/recon/sources/httpclient.go
  - pkg/recon/sources/httpclient_test.go
  - pkg/recon/sources/queries.go
  - pkg/recon/sources/queries_test.go
  - pkg/recon/sources/register.go
autonomous: true
requirements: []
must_haves:
  truths:
    - "Shared retry HTTP client honors ctx cancellation and Retry-After on 429/403"
    - "Provider registry drives per-source query templates (no hardcoded literals)"
    - "Empty source registry compiles and exposes RegisterAll(engine, cfg)"
  artifacts:
    - path: "pkg/recon/sources/httpclient.go"
      provides: "Retrying *http.Client with context + Retry-After handling"
    - path: "pkg/recon/sources/queries.go"
      provides: "BuildQueries(registry, sourceName) []string generator"
    - path: "pkg/recon/sources/register.go"
      provides: "RegisterAll(engine *recon.Engine, cfg SourcesConfig) bootstrap"
  key_links:
    - from: "pkg/recon/sources/httpclient.go"
      to: "net/http + context + golang.org/x/time/rate"
      via: "DoWithRetry(ctx, req, limiter) (*http.Response, error)"
      pattern: "DoWithRetry"
    - from: "pkg/recon/sources/queries.go"
      to: "pkg/providers.Registry"
      via: "BuildQueries iterates reg.List() and formats provider keywords"
      pattern: "BuildQueries"
---

<objective>
Establish the shared foundation for all Phase 10 code hosting sources: a retry-aware HTTP
client wrapper, a provider→query template generator driven by the provider registry, and
an empty RegisterAll bootstrap that Plan 10-09 will fill in. No individual source is
implemented here; this plan exists so Wave 2 plans (10-02..10-08) can run in parallel
without fighting over shared helpers.

Purpose: Deduplicate retry/rate-limit/backoff logic across 10 sources; centralize query
generation so providers added later automatically flow to every source.
Output: Compilable `pkg/recon/sources` package skeleton with tested helpers.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/10-osint-code-hosting/10-CONTEXT.md
@pkg/recon/source.go
@pkg/recon/limiter.go
@pkg/dorks/github.go
@pkg/providers/registry.go

<interfaces>
From pkg/recon/source.go:
```go
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
type Finding = engine.Finding
type Config struct { Stealth, RespectRobots bool; EnabledSources []string; Query string }
```

From pkg/recon/limiter.go:
```go
type LimiterRegistry struct { ... }
func NewLimiterRegistry() *LimiterRegistry
func (lr *LimiterRegistry) Wait(ctx, name, r, burst, stealth) error
```

From pkg/providers/registry.go:
```go
func (r *Registry) List() []Provider
// Provider has: Name string, Keywords []string, Patterns []Pattern, Tier int
```

From pkg/engine/finding.go:
```go
type Finding struct {
	ProviderName, KeyValue, KeyMasked, Confidence, Source, SourceType string
	LineNumber int; Offset int64; DetectedAt time.Time
	Verified bool; VerifyStatus string; ...
}
```
</interfaces>
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: Shared retry HTTP client helper</name>
<files>pkg/recon/sources/doc.go, pkg/recon/sources/httpclient.go, pkg/recon/sources/httpclient_test.go</files>
<behavior>
- Test A: 200 OK returns response unchanged, body readable
- Test B: 429 with Retry-After:1 triggers one retry then succeeds (verify via httptest counter)
- Test C: 403 with Retry-After triggers retry
- Test D: 401 returns ErrUnauthorized immediately, no retry
- Test E: Ctx cancellation during retry sleep returns ctx.Err()
- Test F: MaxRetries exhausted returns wrapped last-status error
</behavior>
<action>
Create `pkg/recon/sources/doc.go` with the package comment: "Package sources hosts per-OSINT-source ReconSource implementations for Phase 10 code hosting (GitHub, GitLab, Bitbucket, Gist, Codeberg, HuggingFace, Kaggle, Replit, CodeSandbox, sandboxes). Each source implements pkg/recon.ReconSource."

Create `pkg/recon/sources/httpclient.go` exporting:
```go
package sources

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// ErrUnauthorized is returned when an API rejects credentials (401).
var ErrUnauthorized = errors.New("sources: unauthorized (check credentials)")

// Client is the shared retry wrapper every Phase 10 source uses.
type Client struct {
	HTTP       *http.Client
	MaxRetries int    // default 2
	UserAgent  string // default "keyhunter-recon/1.0"
}

// NewClient returns a Client with a 30s timeout and 2 retries.
func NewClient() *Client {
	return &Client{HTTP: &http.Client{Timeout: 30 * time.Second}, MaxRetries: 2, UserAgent: "keyhunter-recon/1.0"}
}

// Do executes req with retries on 429/403/5xx honoring Retry-After.
// 401 returns ErrUnauthorized wrapped with the response body.
// Ctx cancellation is honored during sleeps.
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error) {
	if req.Header.Get("User-Agent") == "" {
		req.Header.Set("User-Agent", c.UserAgent)
	}
	var last *http.Response
	for attempt := 0; attempt <= c.MaxRetries; attempt++ {
		r, err := c.HTTP.Do(req.WithContext(ctx))
		if err != nil {
			return nil, fmt.Errorf("sources http: %w", err)
		}
		if r.StatusCode == http.StatusOK {
			return r, nil
		}
		if r.StatusCode == http.StatusUnauthorized {
			body := readBody(r)
			return nil, fmt.Errorf("%w: %s", ErrUnauthorized, body)
		}
		retriable := r.StatusCode == 429 || r.StatusCode == 403 || r.StatusCode >= 500
		if !retriable || attempt == c.MaxRetries {
			body := readBody(r)
			return nil, fmt.Errorf("sources http %d: %s", r.StatusCode, body)
		}
		sleep := ParseRetryAfter(r.Header.Get("Retry-After"))
		r.Body.Close()
		last = r
		select {
		case <-time.After(sleep):
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
	_ = last
	return nil, fmt.Errorf("sources http: retries exhausted")
}

// ParseRetryAfter decodes integer-seconds Retry-After, defaulting to 1s.
func ParseRetryAfter(v string) time.Duration { ... }
// readBody reads up to 4KB of the body and closes it.
func readBody(r *http.Response) string { ... }
```

Create `pkg/recon/sources/httpclient_test.go` using `net/http/httptest`:
- Table-driven tests for each behavior above. Use an atomic counter to verify
  retry attempt counts. Use `httptest.NewServer` with a handler that switches on
  a request counter.
- For ctx cancellation test: set Retry-After: 10, cancel ctx after 100ms, assert
  ctx.Err() returned within 500ms.

Do NOT build a LimiterRegistry wrapper here; each source calls its own LimiterRegistry.Wait
before calling Client.Do. Keeps Client single-purpose (retry only).
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run TestClient -v -timeout 30s</automated>
</verify>
<done>
All behaviors covered; Client.Do retries on 429/403/5xx honoring Retry-After; 401
returns ErrUnauthorized immediately; ctx cancellation respected; tests green.
</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: Provider-driven query generator + RegisterAll skeleton</name>
<files>pkg/recon/sources/queries.go, pkg/recon/sources/queries_test.go, pkg/recon/sources/register.go</files>
<behavior>
- Test A: BuildQueries(reg, "github") returns one query per unique keyword, formatted as GitHub search syntax, e.g. `"sk-proj-" in:file`
- Test B: BuildQueries(reg, "gitlab") returns queries formatted for GitLab search syntax (raw keyword, no `in:file`)
- Test C: BuildQueries(reg, "huggingface") returns bare keyword queries
- Test D: Unknown source name returns bare keyword queries (safe default)
- Test E: Providers with empty Keywords slice are skipped
- Test F: Keyword dedup: if two providers share a keyword, emit it once per source
- Test G: RegisterAll(nil, cfg) is a no-op that does not panic; RegisterAll with empty cfg does not panic
</behavior>
<action>
Create `pkg/recon/sources/queries.go`:
```go
package sources

import (
	"fmt"
	"sort"

	"github.com/salvacybersec/keyhunter/pkg/providers"
)

// BuildQueries produces the search-string list a source should iterate for a
// given provider registry. Each keyword is formatted per source-specific syntax.
// Result is deterministic (sorted) for reproducible tests.
func BuildQueries(reg *providers.Registry, source string) []string {
	if reg == nil {
		return nil
	}
	seen := make(map[string]struct{})
	for _, p := range reg.List() {
		for _, k := range p.Keywords {
			if k == "" {
				continue
			}
			seen[k] = struct{}{}
		}
	}
	keywords := make([]string, 0, len(seen))
	for k := range seen {
		keywords = append(keywords, k)
	}
	sort.Strings(keywords)

	out := make([]string, 0, len(keywords))
	for _, k := range keywords {
		out = append(out, formatQuery(source, k))
	}
	return out
}

func formatQuery(source, keyword string) string {
	switch source {
	case "github", "gist":
		return fmt.Sprintf("%q in:file", keyword)
	case "gitlab":
		return keyword // GitLab code search doesn't support in:file qualifier
	case "bitbucket":
		return keyword
	case "codeberg":
		return keyword
	default:
		return keyword
	}
}
```

Create `pkg/recon/sources/queries_test.go` using `providers.NewRegistryFromProviders`
with two synthetic providers (shared keyword to test dedup).

Create `pkg/recon/sources/register.go`:
```go
package sources

import (
	"github.com/salvacybersec/keyhunter/pkg/providers"
	"github.com/salvacybersec/keyhunter/pkg/recon"
)

// SourcesConfig carries per-source credentials read from viper/env by cmd/recon.go.
// Plan 10-09 fleshes this out; for now it is a placeholder struct so downstream
// plans can depend on its shape.
type SourcesConfig struct {
	GitHubToken      string
	GitLabToken      string
	BitbucketToken   string
	HuggingFaceToken string
	KaggleUser       string
	KaggleKey        string
	Registry         *providers.Registry
	Limiters         *recon.LimiterRegistry
}

// RegisterAll registers every Phase 10 code-hosting source on engine.
// Wave 2 plans append their source constructors here via additional
// registerXxx helpers in this file. Plan 10-09 writes the final list.
func RegisterAll(engine *recon.Engine, cfg SourcesConfig) {
	if engine == nil {
		return
	}
	// Populated by Plan 10-09 (after Wave 2 lands individual source files).
}
```

Do NOT wire this into cmd/recon.go yet; Plan 10-09 handles CLI integration after
every source exists.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestBuildQueries|TestRegisterAll" -v -timeout 30s && go build ./...</automated>
</verify>
<done>
BuildQueries is deterministic, dedups keywords, formats per-source syntax.
RegisterAll compiles as a no-op stub. Package builds with zero source
implementations and is ready for Wave 2 plans to add files in parallel.
</done>
</task>

</tasks>

<verification>
- `go build ./...` succeeds
- `go test ./pkg/recon/sources/...` passes
- `go vet ./pkg/recon/sources/...` clean
</verification>

<success_criteria>
pkg/recon/sources package exists with httpclient.go, queries.go, register.go, doc.go
and all tests green. No source implementations are present yet; that is Wave 2.
</success_criteria>

<output>
After completion, create `.planning/phases/10-osint-code-hosting/10-01-SUMMARY.md`.
</output>