---
phase: 13-osint_package_registries_container_iac
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/sources/npm.go
  - pkg/recon/sources/npm_test.go
  - pkg/recon/sources/pypi.go
  - pkg/recon/sources/pypi_test.go
  - pkg/recon/sources/cratesio.go
  - pkg/recon/sources/cratesio_test.go
  - pkg/recon/sources/rubygems.go
  - pkg/recon/sources/rubygems_test.go
autonomous: true
requirements:
  - RECON-PKG-01
  - RECON-PKG-02
must_haves:
  truths:
    - "NpmSource searches npm registry for packages matching provider keywords and emits findings"
    - "PyPISource searches PyPI for packages matching provider keywords and emits findings"
    - "CratesIOSource searches crates.io for crates matching provider keywords and emits findings"
    - "RubyGemsSource searches rubygems.org for gems matching provider keywords and emits findings"
    - "All four sources handle context cancellation, empty registries, and HTTP errors gracefully"
  artifacts:
    - path: "pkg/recon/sources/npm.go"
      provides: "NpmSource implementing recon.ReconSource"
      contains: "func (s *NpmSource) Sweep"
    - path: "pkg/recon/sources/npm_test.go"
      provides: "httptest-based tests for NpmSource"
      contains: "httptest.NewServer"
    - path: "pkg/recon/sources/pypi.go"
      provides: "PyPISource implementing recon.ReconSource"
      contains: "func (s *PyPISource) Sweep"
    - path: "pkg/recon/sources/pypi_test.go"
      provides: "httptest-based tests for PyPISource"
      contains: "httptest.NewServer"
    - path: "pkg/recon/sources/cratesio.go"
      provides: "CratesIOSource implementing recon.ReconSource"
      contains: "func (s *CratesIOSource) Sweep"
    - path: "pkg/recon/sources/cratesio_test.go"
      provides: "httptest-based tests for CratesIOSource"
      contains: "httptest.NewServer"
    - path: "pkg/recon/sources/rubygems.go"
      provides: "RubyGemsSource implementing recon.ReconSource"
      contains: "func (s *RubyGemsSource) Sweep"
    - path: "pkg/recon/sources/rubygems_test.go"
      provides: "httptest-based tests for RubyGemsSource"
      contains: "httptest.NewServer"
  key_links:
    - from: "pkg/recon/sources/npm.go"
      to: "pkg/recon/source.go"
      via: "implements ReconSource interface"
      pattern: "var _ recon\\.ReconSource"
    - from: "pkg/recon/sources/pypi.go"
      to: "pkg/recon/source.go"
      via: "implements ReconSource interface"
      pattern: "var _ recon\\.ReconSource"
---

Implement four package registry ReconSource modules: npm, PyPI, Crates.io, and RubyGems.

Purpose: Enables KeyHunter to scan four major package registries for packages that may contain leaked API keys, covering the JavaScript, Python, Rust, and Ruby ecosystems.

Output: 4 source files + 4 test files in pkg/recon/sources/

@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/register.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/replit.go (pattern reference: credentialless scraper source)
@pkg/recon/sources/github.go (pattern reference: API-key-gated source)
@pkg/recon/sources/replit_test.go (test pattern reference)

From pkg/recon/source.go:

```go
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```

From pkg/recon/sources/httpclient.go:

```go
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```

From pkg/recon/sources/queries.go:

```go
func BuildQueries(reg *providers.Registry, source string) []string
```

Task 1: Implement NpmSource and PyPISource

Files: pkg/recon/sources/npm.go, pkg/recon/sources/npm_test.go, pkg/recon/sources/pypi.go, pkg/recon/sources/pypi_test.go

Create NpmSource in npm.go following the established ReplitSource pattern (credentialless; ReplitSource itself uses RespectsRobots=true, but the sources in this plan return false, as specified below):

**NpmSource** (npm.go):
- Struct: `NpmSource` with fields `BaseURL string`, `Registry *providers.Registry`, `Limiters *recon.LimiterRegistry`, `Client *Client`
- Compile-time assertion: `var _ recon.ReconSource = (*NpmSource)(nil)`
- Name() returns "npm"
- RateLimit() returns rate.Every(2 * time.Second) (the npm registry is generous, but stay polite)
- Burst() returns 2
- RespectsRobots() returns false (API endpoint, not scraped HTML)
- Enabled() always returns true (no credentials needed)
- BaseURL defaults to "https://registry.npmjs.org" if empty
- Sweep() logic:
  1. Call BuildQueries(s.Registry, "npm") to get the keyword list
  2. For each keyword, GET `{BaseURL}/-/v1/search?text={keyword}&size=20`
  3. Parse the JSON response: `{"objects": [{"package": {"name": "...", "links": {"npm": "..."}}}]}`
  4. Define response structs: `npmSearchResponse`, `npmObject`, `npmPackage`, `npmLinks`
  5. Emit one Finding per result with Source=links.npm (or construct it from the package name), SourceType="recon:npm", Confidence="low"
  6. Honor ctx cancellation between queries; call Limiters.Wait before each request

**PyPISource** (pypi.go):
- Same pattern as NpmSource
- Name() returns "pypi"
- RateLimit() returns rate.Every(2 * time.Second)
- Burst() returns 2
- RespectsRobots() returns false
- Enabled() always returns true
- BaseURL defaults to "https://pypi.org"
- Sweep() logic:
  1. Call BuildQueries(s.Registry, "pypi")
  2. For each keyword, GET `{BaseURL}/search/?q={keyword}` (the warehouse search page, which returns HTML) and parse the `package-snippet` links. The alternatives considered do not fit: `{BaseURL}/pypi/{keyword}/json` looks up a specific package rather than searching, and `{BaseURL}/simple/` is too large.
  3. Parse the HTML response for project links matching the `/project/[^/]+/` pattern
  4. Emit one Finding per result with Source="{BaseURL}/project/{name}/", SourceType="recon:pypi"
  5. Use the extractAnchorHrefs pattern or a simpler regex on href attributes

**Tests** (follow the replit_test.go pattern exactly):
- npm_test.go: httptest server returning canned npm search JSON. Test that Sweep extracts findings, test Name/Rate/Burst, test ctx cancellation, test that Enabled is always true.
- pypi_test.go: httptest server returning canned HTML with package-snippet links. Same test categories.

Verify: `cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestNpm|TestPyPI" -v -count=1`

Done when: NpmSource and PyPISource pass all tests: Sweep emits correct findings from httptest fixtures, Name/Rate/Burst/Enabled return expected values, and ctx cancellation is handled.

Task 2: Implement CratesIOSource and RubyGemsSource

Files: pkg/recon/sources/cratesio.go, pkg/recon/sources/cratesio_test.go, pkg/recon/sources/rubygems.go, pkg/recon/sources/rubygems_test.go

**CratesIOSource** (cratesio.go):
- Struct: `CratesIOSource` with `BaseURL`, `Registry`, `Limiters`, `Client`
- Compile-time assertion: `var _ recon.ReconSource = (*CratesIOSource)(nil)`
- Name() returns "crates"
- RateLimit() returns rate.Every(1 * time.Second) (crates.io asks for 1 req/sec)
- Burst() returns 1
- RespectsRobots() returns false (JSON API)
- Enabled() always returns true
- BaseURL defaults to "https://crates.io"
- Sweep() logic:
  1. Call BuildQueries(s.Registry, "crates")
  2. For each keyword, GET `{BaseURL}/api/v1/crates?q={keyword}&per_page=20`
  3. Parse JSON: `{"crates": [{"id": "...", "name": "...", "repository": "..."}]}`
  4. Define response structs: `cratesSearchResponse`, `crateEntry`
  5. Emit one Finding per crate: Source="https://crates.io/crates/{name}", SourceType="recon:crates"
  6. IMPORTANT: crates.io requires a custom User-Agent header. Call req.Header.Set("User-Agent", "keyhunter-recon/1.0 (https://github.com/salvacybersec/keyhunter)") before passing the request to client.Do

**RubyGemsSource** (rubygems.go):
- Same pattern
- Name() returns "rubygems"
- RateLimit() returns rate.Every(2 * time.Second)
- Burst() returns 2
- RespectsRobots() returns false (JSON API)
- Enabled() always returns true
- BaseURL defaults to "https://rubygems.org"
- Sweep() logic:
  1. Call BuildQueries(s.Registry, "rubygems")
  2. For each keyword, GET `{BaseURL}/api/v1/search.json?query={keyword}&page=1`
  3. Parse the JSON array: `[{"name": "...", "project_uri": "..."}]`
  4. Define response struct: `rubyGemEntry`
  5. Emit one Finding per gem: Source=project_uri, SourceType="recon:rubygems"

**Tests** (same httptest pattern):
- cratesio_test.go: httptest server returning canned JSON with crate entries. Verify that the User-Agent header is set. Test all standard categories.
- rubygems_test.go: httptest server returning a canned JSON array. Test all standard categories.

Verify: `cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestCratesIO|TestRubyGems" -v -count=1`

Done when: CratesIOSource and RubyGemsSource pass all tests. CratesIOSource sends the proper User-Agent header. Both emit correct findings from httptest fixtures.

All 8 new files compile and pass tests:

```bash
go test ./pkg/recon/sources/ -run "TestNpm|TestPyPI|TestCratesIO|TestRubyGems" -v -count=1
go vet ./pkg/recon/sources/
```

- 4 new source files implement the recon.ReconSource interface
- 4 test files use httptest with canned fixtures
- All tests pass
- No compilation errors across the package

After completion, create `.planning/phases/13-osint_package_registries_container_iac/13-01-SUMMARY.md`
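
For orientation, the NpmSource Sweep steps from Task 1 could be sketched roughly as below. This is a minimal, standalone illustration and not the project's implementation: `Finding` is a hypothetical stand-in for `recon.Finding` with assumed fields, `sweepNpm` and `parseNpmSearch` are illustrative names, the standard `*http.Client` replaces the project's `Client`, the keyword list is passed in directly instead of coming from `BuildQueries`, and rate limiting via `Limiters.Wait` is left out because its signature is not shown in this plan. The fallback URL used when `links.npm` is empty is also an assumption.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// Finding is a hypothetical stand-in for recon.Finding (fields assumed).
type Finding struct{ Source, SourceType, Confidence string }

// Response structs named in the plan for the npm search endpoint.
type npmLinks struct{ Npm string `json:"npm"` }
type npmPackage struct {
	Name  string   `json:"name"`
	Links npmLinks `json:"links"`
}
type npmObject struct{ Package npmPackage `json:"package"` }
type npmSearchResponse struct{ Objects []npmObject `json:"objects"` }

// parseNpmSearch decodes one search response body into findings.
func parseNpmSearch(body []byte) ([]Finding, error) {
	var resp npmSearchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode npm search response: %w", err)
	}
	findings := make([]Finding, 0, len(resp.Objects))
	for _, obj := range resp.Objects {
		src := obj.Package.Links.Npm
		if src == "" {
			// Assumed fallback: build the package page URL from the name.
			src = "https://www.npmjs.com/package/" + obj.Package.Name
		}
		findings = append(findings, Finding{Source: src, SourceType: "recon:npm", Confidence: "low"})
	}
	return findings, nil
}

// sweepNpm issues one search request per keyword and sends findings on out.
// Rate limiting (Limiters.Wait in the plan) is omitted from this sketch.
func sweepNpm(ctx context.Context, client *http.Client, baseURL string, keywords []string, out chan<- Finding) error {
	for _, kw := range keywords {
		if err := ctx.Err(); err != nil {
			return err // honor cancellation between queries
		}
		endpoint := baseURL + "/-/v1/search?text=" + url.QueryEscape(kw) + "&size=20"
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
		if err != nil {
			return err
		}
		resp, err := client.Do(req)
		if err != nil {
			return err
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return err
		}
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("npm search: unexpected status %d", resp.StatusCode)
		}
		findings, err := parseNpmSearch(body)
		if err != nil {
			return err
		}
		for _, f := range findings {
			select {
			case out <- f:
			case <-ctx.Done():
				return ctx.Err()
			}
		}
	}
	return nil
}

func main() {
	// Canned fixture served by httptest, mirroring the planned test style.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"objects":[{"package":{"name":"demo-pkg","links":{}}}]}`)
	}))
	defer srv.Close()
	out := make(chan Finding, 8) // buffered so the synchronous sweep cannot block
	if err := sweepNpm(context.Background(), srv.Client(), srv.URL, []string{"demo"}, out); err != nil {
		fmt.Println("sweep failed:", err)
		return
	}
	close(out)
	for f := range out {
		fmt.Println(f.SourceType, f.Source)
	}
}
```

Keeping the JSON decoding in a separate pure function makes it testable without a server, while the httptest fixture in `main` exercises the request path the way the planned `npm_test.go` would. The crates.io and RubyGems sources could follow the same shape with different response structs, plus the User-Agent header for crates.io.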