---
phase: 10-osint-code-hosting
plan: 09
type: execute
wave: 3
depends_on: [10-01, 10-02, 10-03, 10-04, 10-05, 10-06, 10-07, 10-08]
files_modified:
- pkg/recon/sources/register.go
- pkg/recon/sources/register_test.go
- pkg/recon/sources/integration_test.go
- cmd/recon.go
autonomous: true
requirements: []
must_haves:
  truths:
    - "RegisterAll wires all 10 Phase 10 sources onto a recon.Engine"
    - "cmd/recon.go buildReconEngine() reads viper config + env vars for tokens and calls RegisterAll"
    - "Integration test spins up httptest servers for all sources, runs SweepAll via Engine, asserts Findings from each source arrive with correct SourceType"
    - "Guardrail: enabling a source without its required credential logs a skip but does not error"
  artifacts:
    - path: "pkg/recon/sources/register.go"
      provides: "RegisterAll with 10 source constructors wired"
      contains: "engine.Register"
    - path: "pkg/recon/sources/integration_test.go"
      provides: "End-to-end SweepAll test with httptest fixtures for every source"
    - path: "cmd/recon.go"
      provides: "CLI reads config and invokes sources.RegisterAll"
  key_links:
    - from: "cmd/recon.go"
      to: "pkg/recon/sources.RegisterAll"
      via: "sources.RegisterAll(eng, cfg)"
      pattern: "sources\\.RegisterAll"
    - from: "pkg/recon/sources/register.go"
      to: "pkg/recon.Engine.Register"
      via: "engine.Register(source)"
      pattern: "engine\\.Register"
---
<objective>
Final Wave 3 plan: wire every Phase 10 source into `sources.RegisterAll`, update
`cmd/recon.go` to construct a real `SourcesConfig` from viper/env, and add an
end-to-end integration test that drives all 10 sources through recon.Engine.SweepAll
using httptest fixtures.
Purpose: Users can run `keyhunter recon full --sources=github,gitlab,...` and get
actual findings from any Phase 10 source whose credential is configured.
Output: Wired register.go + cmd/recon.go + passing integration test.
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/phases/10-osint-code-hosting/10-CONTEXT.md
@.planning/phases/10-osint-code-hosting/10-01-SUMMARY.md
@.planning/phases/10-osint-code-hosting/10-02-SUMMARY.md
@.planning/phases/10-osint-code-hosting/10-03-SUMMARY.md
@.planning/phases/10-osint-code-hosting/10-04-SUMMARY.md
@.planning/phases/10-osint-code-hosting/10-05-SUMMARY.md
@.planning/phases/10-osint-code-hosting/10-06-SUMMARY.md
@.planning/phases/10-osint-code-hosting/10-07-SUMMARY.md
@.planning/phases/10-osint-code-hosting/10-08-SUMMARY.md
@pkg/recon/engine.go
@pkg/recon/source.go
@pkg/providers/registry.go
@cmd/recon.go
<interfaces>
After Wave 2, each source file in pkg/recon/sources/ exports a constructor
roughly of the form:
func NewGitHubSource(token, reg, lim) *GitHubSource
func NewGitLabSource(token, reg, lim) *GitLabSource
func NewBitbucketSource(token, workspace, reg, lim) *BitbucketSource
func NewGistSource(token, reg, lim) *GistSource
func NewCodebergSource(token, reg, lim) *CodebergSource
func NewHuggingFaceSource(token, reg, lim) *HuggingFaceSource
func NewReplitSource(reg, lim) *ReplitSource
func NewCodeSandboxSource(reg, lim) *CodeSandboxSource
func NewSandboxesSource(reg, lim) *SandboxesSource
func NewKaggleSource(user, key, reg, lim) *KaggleSource
(Verify actual signatures when reading Wave 2 SUMMARYs before writing register.go.)
</interfaces>
</context>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Wire RegisterAll + register_test.go</name>
<files>pkg/recon/sources/register.go, pkg/recon/sources/register_test.go</files>
<behavior>
- Test A: RegisterAll with a fresh engine and empty SourcesConfig registers all 10 sources by name (GitHub/GitLab/Bitbucket/Gist/Codeberg/HuggingFace/Replit/CodeSandbox/Sandboxes/Kaggle)
- Test B: engine.List() returns all 10 source names in sorted order
- Test C: Calling RegisterAll(nil, cfg) is a no-op (no panic)
- Test D: Sources without creds are still registered but their Enabled() returns false
</behavior>
<action>
Rewrite `pkg/recon/sources/register.go` RegisterAll body to construct each
source with appropriate fields from SourcesConfig and call engine.Register:
```go
// RegisterAll constructs every Phase 10 source from cfg and registers it on
// engine. A nil engine is a no-op.
func RegisterAll(engine *recon.Engine, cfg SourcesConfig) {
	if engine == nil {
		return
	}
	reg := cfg.Registry
	lim := cfg.Limiters
	engine.Register(NewGitHubSource(cfg.GitHubToken, reg, lim))
	engine.Register(NewGitLabSource(cfg.GitLabToken, reg, lim))
	engine.Register(NewBitbucketSource(cfg.BitbucketToken, cfg.BitbucketWorkspace, reg, lim))
	// Gist reuses the GitHub token.
	engine.Register(NewGistSource(cfg.GitHubToken, reg, lim))
	engine.Register(NewCodebergSource(cfg.CodebergToken, reg, lim))
	engine.Register(NewHuggingFaceSource(cfg.HuggingFaceToken, reg, lim))
	// Replit, CodeSandbox, and Sandboxes constructors take no credential.
	engine.Register(NewReplitSource(reg, lim))
	engine.Register(NewCodeSandboxSource(reg, lim))
	engine.Register(NewSandboxesSource(reg, lim))
	engine.Register(NewKaggleSource(cfg.KaggleUser, cfg.KaggleKey, reg, lim))
}
```
Extend SourcesConfig with any fields Wave 2 introduced (BitbucketWorkspace,
CodebergToken). Adjust field names to actual Wave 2 SUMMARY signatures.
Create `pkg/recon/sources/register_test.go`:
- Build minimal registry via providers.NewRegistryFromProviders with 1 synthetic provider
- Build recon.Engine, call RegisterAll with cfg having all creds empty
- Assert eng.List() returns exactly these 10 names:
bitbucket, codeberg, codesandbox, gist, github, gitlab, huggingface, kaggle, replit, sandboxes
- Assert nil engine call is no-op (no panic)
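
The assertions above can be sketched in isolation. This is a minimal, self-contained illustration, not the real test: `Engine`, `Source`, `tokenSource`, and `registerAll` are hypothetical stand-ins for `recon.Engine`, the recon source interface, and `sources.RegisterAll`, and only three of the ten names are used. It assumes `List()` sorts names and that a source with an empty credential is registered but reports `Enabled() == false`, as described in Tests B and D.

```go
package main

import (
	"fmt"
	"sort"
)

// Source is a hypothetical stand-in for the recon source interface.
type Source interface {
	Name() string
	Enabled() bool
}

// Engine is a hypothetical stand-in for recon.Engine.
type Engine struct {
	sources map[string]Source
}

func NewEngine() *Engine { return &Engine{sources: map[string]Source{}} }

// Register stores a source under its name.
func (e *Engine) Register(s Source) { e.sources[s.Name()] = s }

// List returns the registered source names in sorted order.
func (e *Engine) List() []string {
	names := make([]string, 0, len(e.sources))
	for n := range e.sources {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// tokenSource is enabled only when a credential is present.
type tokenSource struct {
	name, token string
}

func (s tokenSource) Name() string  { return s.name }
func (s tokenSource) Enabled() bool { return s.token != "" }

// registerAll mirrors the nil guard and unconditional registration from
// the plan: sources are always registered, even without credentials.
func registerAll(e *Engine, tokens map[string]string) {
	if e == nil {
		return
	}
	for _, name := range []string{"github", "gitlab", "codeberg"} {
		e.Register(tokenSource{name: name, token: tokens[name]})
	}
}

func main() {
	registerAll(nil, nil) // nil engine: must not panic

	e := NewEngine()
	registerAll(e, map[string]string{"github": "placeholder-token"})
	for _, n := range e.List() {
		fmt.Printf("%s enabled=%v\n", n, e.sources[n].Enabled())
	}
}
```

In the real `register_test.go` the same checks would run against `eng.List()` with all ten names, using whatever constructor signatures the Wave 2 SUMMARYs confirm.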
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run TestRegisterAll -v -timeout 30s</automated>
</verify>
<done>
RegisterAll wires all 10 sources; register_test green.
</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Integration test across all sources + cmd/recon.go wiring</name>
<files>pkg/recon/sources/integration_test.go, cmd/recon.go</files>
<behavior>
- Integration test: spins up 10 httptest servers (or one multiplexed server with per-path routing) that return canned responses for each source's endpoints
- Uses BaseURL overrides on each source (direct construction, not RegisterAll, since RegisterAll uses production URLs)
- Registers each override-configured source on a fresh recon.Engine and calls SweepAll
- Asserts at least 1 Finding emerged for each of the 10 SourceType values: recon:github, recon:gitlab, recon:bitbucket, recon:gist, recon:codeberg, recon:huggingface, recon:replit, recon:codesandbox, recon:sandboxes, recon:kaggle
- CLI: `keyhunter recon list` (after wiring) prints all 10 source names in addition to "example"
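
The per-SourceType assertion can be expressed as a small helper that reports which expected types produced no finding. This is a sketch under assumptions: `Finding` here is a hypothetical stand-in that models only the `SourceType` field named in this plan, and `missingSourceTypes` is an illustrative name, not an existing function.

```go
package main

import (
	"fmt"
	"sort"
)

// Finding is a hypothetical stand-in for the recon finding type; only the
// SourceType field mentioned in the plan is modeled.
type Finding struct {
	SourceType string
}

// missingSourceTypes returns, in sorted order, every expected source type
// for which no finding was collected.
func missingSourceTypes(findings []Finding, expected []string) []string {
	seen := make(map[string]int, len(expected))
	for _, f := range findings {
		seen[f.SourceType]++
	}
	var missing []string
	for _, want := range expected {
		if seen[want] == 0 {
			missing = append(missing, want)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	expected := []string{"recon:github", "recon:gitlab", "recon:kaggle"}
	findings := []Finding{
		{SourceType: "recon:github"},
		{SourceType: "recon:kaggle"},
		{SourceType: "recon:github"},
	}
	fmt.Println(missingSourceTypes(findings, expected)) // prints [recon:gitlab]
}
```

Failing the test with the list of missing types, rather than a bare count, makes it obvious which fixture or source broke.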
</behavior>
<action>
Create `pkg/recon/sources/integration_test.go`:
- Build a single httptest server with a mux routing per-path:
  - `/search/code` (github) → ghSearchResponse JSON
  - `/api/v4/search` (gitlab) → blob array JSON
  - `/2.0/workspaces/ws/search/code` (bitbucket) → values JSON
  - `/gists/public` + `/raw/gist1` (gist) → gist list + raw matching keyword
  - `/api/v1/repos/search` (codeberg) → data array
  - `/api/spaces`, `/api/models` (huggingface) → id arrays
  - `/search?q=...&type=repls` (replit) → HTML fixture
  - `/search?query=...&type=sandboxes` (codesandbox) → HTML fixture
  - `/codepen-search` (sandboxes sub) → HTML; `/jsfiddle-search` → JSON
  - `/api/v1/kernels/list` (kaggle) → ref array
- For each source, construct with BaseURL/Platforms overrides pointing at test server
- Register all on a fresh recon.Engine
- Provide synthetic providers.Registry with keyword "sk-proj-" matching openai
- Call eng.SweepAll(ctx, recon.Config{Query:"ignored"})
- Assert findings grouped by SourceType covers all 10 expected values
- Use a 30s test timeout
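
A multiplexed fixture server along these lines might look like the sketch below. It covers only two of the routes, and the response bodies are placeholder fixtures rather than real API shapes; `newFixtureServer` and `fetch` are hypothetical helper names. One detail worth noting: the replit and codesandbox routes share the `/search` path, and `http.ServeMux` matches on path only, so a single handler has to branch on the `type` query parameter (or each source gets its own server).

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newFixtureServer returns one httptest server that multiplexes several
// source endpoints. Bodies are placeholders, not real API responses.
func newFixtureServer() *httptest.Server {
	mux := http.NewServeMux()

	// JSON endpoint selected purely by path (GitHub-style code search).
	mux.HandleFunc("/search/code", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, `{"items":[]}`)
	})

	// Replit and CodeSandbox share /search in the plan, so the handler
	// dispatches on the "type" query parameter.
	mux.HandleFunc("/search", func(w http.ResponseWriter, r *http.Request) {
		switch r.URL.Query().Get("type") {
		case "repls":
			io.WriteString(w, "<html>replit fixture</html>")
		case "sandboxes":
			io.WriteString(w, "<html>codesandbox fixture</html>")
		default:
			http.NotFound(w, r)
		}
	})

	return httptest.NewServer(mux)
}

// fetch performs a GET and returns the status code and body.
func fetch(url string) (int, string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body), err
}

func main() {
	srv := newFixtureServer()
	defer srv.Close()

	for _, path := range []string{"/search/code?q=sk-proj-", "/search?q=x&type=repls"} {
		status, body, err := fetch(srv.URL + path)
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		fmt.Println(status, body)
	}
}
```

Each source under test would then be constructed with its BaseURL override set to `srv.URL`, assuming the Wave 2 sources expose such an override as the behavior section states.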
Update `cmd/recon.go`:
- Import `github.com/salvacybersec/keyhunter/pkg/recon/sources`, `github.com/spf13/viper`, and the providers package
- In `buildReconEngine()`:
```go
func buildReconEngine() *recon.Engine {
	e := recon.NewEngine()
	e.Register(recon.ExampleSource{})
	reg, err := providers.NewRegistry()
	if err != nil {
		fmt.Fprintf(os.Stderr, "recon: failed to load providers: %v\n", err)
		return e
	}
	// Environment variables take precedence over viper config values.
	cfg := sources.SourcesConfig{
		Registry:           reg,
		Limiters:           recon.NewLimiterRegistry(),
		GitHubToken:        firstNonEmpty(os.Getenv("GITHUB_TOKEN"), viper.GetString("recon.github.token")),
		GitLabToken:        firstNonEmpty(os.Getenv("GITLAB_TOKEN"), viper.GetString("recon.gitlab.token")),
		BitbucketToken:     firstNonEmpty(os.Getenv("BITBUCKET_TOKEN"), viper.GetString("recon.bitbucket.token")),
		BitbucketWorkspace: viper.GetString("recon.bitbucket.workspace"),
		CodebergToken:      firstNonEmpty(os.Getenv("CODEBERG_TOKEN"), viper.GetString("recon.codeberg.token")),
		HuggingFaceToken:   firstNonEmpty(os.Getenv("HUGGINGFACE_TOKEN"), viper.GetString("recon.huggingface.token")),
		KaggleUser:         firstNonEmpty(os.Getenv("KAGGLE_USERNAME"), viper.GetString("recon.kaggle.username")),
		KaggleKey:          firstNonEmpty(os.Getenv("KAGGLE_KEY"), viper.GetString("recon.kaggle.key")),
	}
	sources.RegisterAll(e, cfg)
	return e
}

// firstNonEmpty returns a if it is non-empty, otherwise b.
func firstNonEmpty(a, b string) string {
	if a != "" {
		return a
	}
	return b
}
```
- Preserve existing reconFullCmd / reconListCmd behavior.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run TestIntegration -v -timeout 60s && go build ./... && go run . recon list | sort</automated>
</verify>
<done>
Integration test passes with at least one Finding per SourceType across all 10
sources. `keyhunter recon list` prints all 10 source names plus "example".
</done>
</task>
</tasks>
<verification>
- `go build ./...`
- `go vet ./...`
- `go test ./pkg/recon/sources/... -v -timeout 60s`
- `go test ./pkg/recon/... -timeout 60s` (ensure no regression in Phase 9 recon tests)
- `go run . recon list` prints all 10 new source names
</verification>
<success_criteria>
All Phase 10 code hosting sources registered via sources.RegisterAll, wired into
cmd/recon.go, and exercised end-to-end by an integration test hitting httptest
fixtures for every source. Phase 10 requirements RECON-CODE-01..10 complete.
</success_criteria>
<output>
After completion, create `.planning/phases/10-osint-code-hosting/10-09-SUMMARY.md`.
</output>