keyhunter/.planning/phases/13-osint_package_registries_container_iac/13-01-PLAN.md

---
phase: 13-osint_package_registries_container_iac
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/sources/npm.go
  - pkg/recon/sources/npm_test.go
  - pkg/recon/sources/pypi.go
  - pkg/recon/sources/pypi_test.go
  - pkg/recon/sources/cratesio.go
  - pkg/recon/sources/cratesio_test.go
  - pkg/recon/sources/rubygems.go
  - pkg/recon/sources/rubygems_test.go
autonomous: true
requirements:
  - RECON-PKG-01
  - RECON-PKG-02
must_haves:
  truths:
    - "NpmSource searches npm registry for packages matching provider keywords and emits findings"
    - "PyPISource searches PyPI for packages matching provider keywords and emits findings"
    - "CratesIOSource searches crates.io for crates matching provider keywords and emits findings"
    - "RubyGemsSource searches rubygems.org for gems matching provider keywords and emits findings"
    - "All four sources handle context cancellation, empty registries, and HTTP errors gracefully"
  artifacts:
    - path: pkg/recon/sources/npm.go
      provides: "NpmSource implementing recon.ReconSource"
      contains: "func (s *NpmSource) Sweep"
    - path: pkg/recon/sources/npm_test.go
      provides: "httptest-based tests for NpmSource"
      contains: "httptest.NewServer"
    - path: pkg/recon/sources/pypi.go
      provides: "PyPISource implementing recon.ReconSource"
      contains: "func (s *PyPISource) Sweep"
    - path: pkg/recon/sources/pypi_test.go
      provides: "httptest-based tests for PyPISource"
      contains: "httptest.NewServer"
    - path: pkg/recon/sources/cratesio.go
      provides: "CratesIOSource implementing recon.ReconSource"
      contains: "func (s *CratesIOSource) Sweep"
    - path: pkg/recon/sources/cratesio_test.go
      provides: "httptest-based tests for CratesIOSource"
      contains: "httptest.NewServer"
    - path: pkg/recon/sources/rubygems.go
      provides: "RubyGemsSource implementing recon.ReconSource"
      contains: "func (s *RubyGemsSource) Sweep"
    - path: pkg/recon/sources/rubygems_test.go
      provides: "httptest-based tests for RubyGemsSource"
      contains: "httptest.NewServer"
  key_links:
    - from: pkg/recon/sources/npm.go
      to: pkg/recon/source.go
      via: "implements ReconSource interface"
      pattern: "var _ recon.ReconSource"
    - from: pkg/recon/sources/pypi.go
      to: pkg/recon/source.go
      via: "implements ReconSource interface"
      pattern: "var _ recon.ReconSource"
---

Implement four package registry ReconSource modules: npm, PyPI, Crates.io, and RubyGems.

Purpose: Enables KeyHunter to scan four major package registries for packages that may contain leaked API keys, covering the JavaScript, Python, Rust, and Ruby ecosystems.

Output: 4 source files + 4 test files in pkg/recon/sources/

<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/register.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/replit.go (pattern reference: credentialless scraper source)
@pkg/recon/sources/github.go (pattern reference: API-key-gated source)
@pkg/recon/sources/replit_test.go (test pattern reference)

From pkg/recon/source.go:

type ReconSource interface {
    Name() string
    RateLimit() rate.Limit
    Burst() int
    RespectsRobots() bool
    Enabled(cfg Config) bool
    Sweep(ctx context.Context, query string, out chan<- Finding) error
}

From pkg/recon/sources/httpclient.go:

func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)

From pkg/recon/sources/queries.go:

func BuildQueries(reg *providers.Registry, source string) []string

Task 1: Implement NpmSource and PyPISource

Files: pkg/recon/sources/npm.go, pkg/recon/sources/npm_test.go, pkg/recon/sources/pypi.go, pkg/recon/sources/pypi_test.go

Create NpmSource in npm.go and PyPISource in pypi.go following the established credentialless ReplitSource pattern. ReplitSource itself sets RespectsRobots=true; the two new sources return false, as specified below.

NpmSource (npm.go):

  • Struct: NpmSource with fields BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, Client *Client
  • Compile-time assertion: var _ recon.ReconSource = (*NpmSource)(nil)
  • Name() returns "npm"
  • RateLimit() returns rate.Every(2 * time.Second) — npm registry is generous but be polite
  • Burst() returns 2
  • RespectsRobots() returns false (API endpoint, not scraped HTML)
  • Enabled() always returns true (no credentials needed)
  • BaseURL defaults to "https://registry.npmjs.org" if empty
  • Sweep() logic:
    1. Call BuildQueries(s.Registry, "npm") to get keyword list
    2. For each keyword, GET {BaseURL}/-/v1/search?text={keyword}&size=20
    3. Parse JSON response: {"objects": [{"package": {"name": "...", "links": {"npm": "..."}}}]}
    4. Define response structs: npmSearchResponse, npmObject, npmPackage, npmLinks
    5. Emit one Finding per result with Source=links.npm (or construct from package name), SourceType="recon:npm", Confidence="low"
    6. Honor ctx cancellation between queries, use Limiters.Wait before each request
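
The request and parse steps of the npm Sweep() logic above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the struct names come from the plan, while the helpers `npmSearchURL` and `npmSources` and the fallback URL built from the package name are assumptions made for this example. Rate limiting and Finding emission are left out because they depend on packages not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// Response structs named as in the plan; only the listed fields are decoded.
type npmSearchResponse struct {
	Objects []npmObject `json:"objects"`
}

type npmObject struct {
	Package npmPackage `json:"package"`
}

type npmPackage struct {
	Name  string   `json:"name"`
	Links npmLinks `json:"links"`
}

type npmLinks struct {
	Npm string `json:"npm"`
}

// npmSearchURL (hypothetical helper) builds the search URL for one keyword,
// applying the default BaseURL from the plan when none is set.
func npmSearchURL(baseURL, keyword string) string {
	if baseURL == "" {
		baseURL = "https://registry.npmjs.org"
	}
	q := url.Values{}
	q.Set("text", keyword)
	q.Set("size", "20")
	return strings.TrimRight(baseURL, "/") + "/-/v1/search?" + q.Encode()
}

// npmSources (hypothetical helper) decodes a search response body and returns
// one source URL per package. It prefers links.npm and otherwise constructs a
// package page URL from the name (an assumption about the fallback).
func npmSources(body []byte) ([]string, error) {
	var resp npmSearchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode npm search response: %w", err)
	}
	sources := make([]string, 0, len(resp.Objects))
	for _, obj := range resp.Objects {
		src := obj.Package.Links.Npm
		if src == "" && obj.Package.Name != "" {
			src = "https://www.npmjs.com/package/" + obj.Package.Name
		}
		if src != "" {
			sources = append(sources, src)
		}
	}
	return sources, nil
}

func main() {
	body := []byte(`{"objects":[{"package":{"name":"demo-sdk","links":{"npm":"https://www.npmjs.com/package/demo-sdk"}}}]}`)
	fmt.Println(npmSearchURL("", "openai"))
	sources, err := npmSources(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range sources {
		fmt.Println(s)
	}
}
```

Building the query with `url.Values` keeps keywords containing spaces or special characters safely escaped.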

PyPISource (pypi.go):

  • Same pattern as NpmSource
  • Name() returns "pypi"
  • RateLimit() returns rate.Every(2 * time.Second)
  • Burst() returns 2
  • RespectsRobots() returns false
  • Enabled() always true
  • BaseURL defaults to "https://pypi.org"
  • Sweep() logic:
    1. BuildQueries(s.Registry, "pypi")
    2. For each keyword, GET {BaseURL}/search/?q={keyword}, which returns an HTML results page. The alternatives do not fit: {BaseURL}/pypi/{name}/json looks up one specific package rather than searching, and {BaseURL}/simple/ lists every project and is too large. Parse the <a class="package-snippet" href="/project/{name}/"> links from the results page.
    3. Parse HTML response for project links matching /project/[^/]+/ pattern
    4. Emit Finding per result with Source="{BaseURL}/project/{name}/", SourceType="recon:pypi"
    5. Use extractAnchorHrefs pattern or a simpler regex on href attributes
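
The "simpler regex on href attributes" option for the PyPI steps above might look like the sketch below. The helper name `pypiProjectURLs`, the exact regular expression, and the de-duplication of repeated links are assumptions for illustration; the existing extractAnchorHrefs helper is not shown in this plan and is not used here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// projectHref captures href="/project/<name>/" attributes, following the
// /project/[^/]+/ pattern named in the plan.
var projectHref = regexp.MustCompile(`href="(/project/[^/"]+/)"`)

// pypiProjectURLs (hypothetical helper) returns absolute project URLs found in
// a search results page, in page order, skipping duplicates.
func pypiProjectURLs(baseURL, html string) []string {
	base := strings.TrimRight(baseURL, "/")
	seen := make(map[string]bool)
	var urls []string
	for _, m := range projectHref.FindAllStringSubmatch(html, -1) {
		path := m[1]
		if seen[path] {
			continue
		}
		seen[path] = true
		urls = append(urls, base+path)
	}
	return urls
}

func main() {
	page := `<ul>
<li><a class="package-snippet" href="/project/demo-client/">demo-client</a></li>
<li><a class="package-snippet" href="/project/demo-tools/">demo-tools</a></li>
<li><a href="/help/">Help</a></li>
</ul>`
	for _, u := range pypiProjectURLs("https://pypi.org", page) {
		fmt.Println(u)
	}
}
```

A regex is brittle against markup changes, so a canned fixture in pypi_test.go that mirrors the real page structure is what keeps this approach honest.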

Tests — Follow replit_test.go pattern exactly:

  • npm_test.go: httptest server returning canned npm search JSON. Test Sweep extracts findings, test Name/Rate/Burst, test ctx cancellation, test Enabled always true.
  • pypi_test.go: httptest server returning canned HTML with package-snippet links. Same test categories.

Verify: `cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestNpm|TestPyPI" -v -count=1`

Done: NpmSource and PyPISource pass all tests: Sweep emits correct findings from httptest fixtures, Name/Rate/Burst/Enabled return expected values, and ctx cancellation is handled.
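
The ctx cancellation category in the tests above targets a loop that checks the context between queries. One possible shape for that loop is sketched here; `sweepKeywords`, its `fetch` callback, and the `runCanceledAfter` driver are hypothetical names for this example, and the real Sweep() would also wait on the limiter and emit findings.

```go
package main

import (
	"context"
	"fmt"
)

// sweepKeywords (hypothetical) runs fetch once per keyword and stops as soon
// as the context is done. It returns how many keywords completed.
func sweepKeywords(ctx context.Context, keywords []string, fetch func(context.Context, string) error) (int, error) {
	done := 0
	for _, kw := range keywords {
		// Check for cancellation before starting the next query.
		if err := ctx.Err(); err != nil {
			return done, err
		}
		if err := fetch(ctx, kw); err != nil {
			return done, err
		}
		done++
	}
	return done, nil
}

// runCanceledAfter (hypothetical driver) cancels the context during the n-th
// fetch call so the loop's cancellation check can be observed.
func runCanceledAfter(n int, keywords []string) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	calls := 0
	fetch := func(context.Context, string) error {
		calls++
		if calls == n {
			cancel()
		}
		return nil
	}
	return sweepKeywords(ctx, keywords, fetch)
}

func main() {
	done, err := runCanceledAfter(2, []string{"openai", "anthropic", "cohere", "mistral"})
	fmt.Println(done, err) // prints "2 context canceled"
}
```
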

Task 2: Implement CratesIOSource and RubyGemsSource

Files: pkg/recon/sources/cratesio.go, pkg/recon/sources/cratesio_test.go, pkg/recon/sources/rubygems.go, pkg/recon/sources/rubygems_test.go

CratesIOSource (cratesio.go):

  • Struct: CratesIOSource with BaseURL, Registry, Limiters, Client
  • Compile-time assertion: var _ recon.ReconSource = (*CratesIOSource)(nil)
  • Name() returns "crates"
  • RateLimit() returns rate.Every(1 * time.Second); crates.io asks for 1 req/sec
  • Burst() returns 1
  • RespectsRobots() returns false (JSON API)
  • Enabled() always true
  • BaseURL defaults to "https://crates.io"
  • Sweep() logic:
    1. BuildQueries(s.Registry, "crates")
    2. For each keyword, GET {BaseURL}/api/v1/crates?q={keyword}&per_page=20
    3. Parse JSON: {"crates": [{"id": "...", "name": "...", "repository": "..."}]}
    4. Define response structs: cratesSearchResponse, crateEntry
    5. Emit Finding per crate: Source="https://crates.io/crates/{name}", SourceType="recon:crates"
    6. IMPORTANT: crates.io requires a custom User-Agent header. Set req.Header.Set("User-Agent", "keyhunter-recon/1.0 (https://github.com/salvacybersec/keyhunter)") before passing to client.Do
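
A rough sketch of the crates.io request and parse steps, including the User-Agent requirement, is given below. The struct names and the header value come from the plan; `cratesSearchURL`, `newCratesRequest`, and `crateSources` are hypothetical helpers, and the request is only built here, not sent through the project's Client.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// User-Agent value taken from the plan.
const cratesUserAgent = "keyhunter-recon/1.0 (https://github.com/salvacybersec/keyhunter)"

type cratesSearchResponse struct {
	Crates []crateEntry `json:"crates"`
}

type crateEntry struct {
	ID         string `json:"id"`
	Name       string `json:"name"`
	Repository string `json:"repository"`
}

// cratesSearchURL (hypothetical helper) builds the search URL for one keyword.
func cratesSearchURL(baseURL, keyword string) string {
	if baseURL == "" {
		baseURL = "https://crates.io"
	}
	q := url.Values{}
	q.Set("q", keyword)
	q.Set("per_page", "20")
	return strings.TrimRight(baseURL, "/") + "/api/v1/crates?" + q.Encode()
}

// newCratesRequest (hypothetical helper) builds the GET request and sets the
// custom User-Agent before the request is handed to a client.
func newCratesRequest(ctx context.Context, baseURL, keyword string) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, cratesSearchURL(baseURL, keyword), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", cratesUserAgent)
	return req, nil
}

// crateSources (hypothetical helper) maps each crate in the response body to
// its crates.io page URL, skipping entries without a name.
func crateSources(body []byte) ([]string, error) {
	var resp cratesSearchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode crates.io search response: %w", err)
	}
	sources := make([]string, 0, len(resp.Crates))
	for _, c := range resp.Crates {
		if c.Name == "" {
			continue
		}
		sources = append(sources, "https://crates.io/crates/"+c.Name)
	}
	return sources, nil
}

func main() {
	req, err := newCratesRequest(context.Background(), "", "openai")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.URL.String())
	fmt.Println(req.Header.Get("User-Agent"))
}
```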

RubyGemsSource (rubygems.go):

  • Same pattern
  • Name() returns "rubygems"
  • RateLimit() returns rate.Every(2 * time.Second)
  • Burst() returns 2
  • RespectsRobots() returns false (JSON API)
  • Enabled() always true
  • BaseURL defaults to "https://rubygems.org"
  • Sweep() logic:
    1. BuildQueries(s.Registry, "rubygems")
    2. For each keyword, GET {BaseURL}/api/v1/search.json?query={keyword}&page=1
    3. Parse JSON array: [{"name": "...", "project_uri": "..."}]
    4. Define response struct: rubyGemEntry
    5. Emit Finding per gem: Source=project_uri, SourceType="recon:rubygems"
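
The RubyGems response differs from the other registries in that the body is a top-level JSON array rather than an object. A minimal decoding sketch under that assumption follows; `rubyGemEntry` is the struct named in the plan, while `rubyGemsSearchURL` and `rubyGemSources` are hypothetical helpers. An empty array decodes to zero sources without an error, which covers the empty-registry case.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

type rubyGemEntry struct {
	Name       string `json:"name"`
	ProjectURI string `json:"project_uri"`
}

// rubyGemsSearchURL (hypothetical helper) builds the search URL for one keyword.
func rubyGemsSearchURL(baseURL, keyword string) string {
	if baseURL == "" {
		baseURL = "https://rubygems.org"
	}
	q := url.Values{}
	q.Set("query", keyword)
	q.Set("page", "1")
	return strings.TrimRight(baseURL, "/") + "/api/v1/search.json?" + q.Encode()
}

// rubyGemSources (hypothetical helper) decodes the top-level array and returns
// each gem's project_uri, skipping entries where it is missing.
func rubyGemSources(body []byte) ([]string, error) {
	var gems []rubyGemEntry
	if err := json.Unmarshal(body, &gems); err != nil {
		return nil, fmt.Errorf("decode rubygems search response: %w", err)
	}
	sources := make([]string, 0, len(gems))
	for _, g := range gems {
		if g.ProjectURI != "" {
			sources = append(sources, g.ProjectURI)
		}
	}
	return sources, nil
}

func main() {
	body := []byte(`[{"name":"demo-gem","project_uri":"https://rubygems.org/gems/demo-gem"}]`)
	fmt.Println(rubyGemsSearchURL("", "openai"))
	sources, err := rubyGemSources(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sources)
}
```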

Tests — same httptest pattern:

  • cratesio_test.go: httptest serving canned JSON with crate entries. Verify User-Agent header is set. Test all standard categories.
  • rubygems_test.go: httptest serving canned JSON array. Test all standard categories.

Verify: `cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestCratesIO|TestRubyGems" -v -count=1`

Done: CratesIOSource and RubyGemsSource pass all tests. CratesIO sends the proper User-Agent header. Both emit correct findings from httptest fixtures.
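
The "verify User-Agent header is set" check in the tests above could be built around an httptest server that records what it receives. The sketch below is written as a standalone program rather than a `_test.go` file so it runs on its own; `observeUserAgent` is a hypothetical helper, and a real test would call the source's Sweep() against `srv.URL` instead of issuing the request directly.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// observeUserAgent (hypothetical helper) starts a throwaway httptest server
// that serves cannedJSON, sends one GET carrying the given User-Agent, and
// returns the header value the server saw along with the response body.
func observeUserAgent(ua, cannedJSON string) (string, string, error) {
	seen := make(chan string, 1) // buffered: the handler runs once per request
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Header.Get("User-Agent")
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, cannedJSON)
	}))
	defer srv.Close()

	req, err := http.NewRequestWithContext(context.Background(), http.MethodGet, srv.URL+"/api/v1/crates?q=demo", nil)
	if err != nil {
		return "", "", err
	}
	req.Header.Set("User-Agent", ua)

	resp, err := srv.Client().Do(req)
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", "", err
	}
	return <-seen, string(raw), nil
}

func main() {
	ua, body, err := observeUserAgent("keyhunter-recon/1.0", `{"crates":[{"id":"demo","name":"demo"}]}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("server saw User-Agent:", ua)
	fmt.Println("client got body:", body)
}
```

Passing the observed header back through a channel avoids sharing a plain variable between the handler goroutine and the caller.
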

All 8 new files compile and pass tests:

```bash
go test ./pkg/recon/sources/ -run "TestNpm|TestPyPI|TestCratesIO|TestRubyGems" -v -count=1
go vet ./pkg/recon/sources/
```

<success_criteria>

  • 4 new source files implement recon.ReconSource interface
  • 4 test files use httptest with canned fixtures
  • All tests pass
  • No compilation errors across the package

</success_criteria>

After completion, create `.planning/phases/13-osint_package_registries_container_iac/13-01-SUMMARY.md`