| phase | plan | type | wave | depends_on | files_modified | autonomous | requirements | must_haves |
|---|---|---|---|---|---|---|---|---|
| 13-osint_package_registries_container_iac | 01 | execute | 1 | | | true | | |
Purpose: Enables KeyHunter to scan the four most popular package registries (npm, PyPI, crates.io, RubyGems) for packages that may contain leaked API keys, covering the JavaScript, Python, Rust, and Ruby ecosystems.
Output: 4 source files + 4 test files in pkg/recon/sources/
<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/register.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/replit.go (pattern reference: credentialless scraper source)
@pkg/recon/sources/github.go (pattern reference: API-key-gated source)
@pkg/recon/sources/replit_test.go (test pattern reference)

From pkg/recon/source.go:
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
From pkg/recon/sources/httpclient.go:
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
From pkg/recon/sources/queries.go:
func BuildQueries(reg *providers.Registry, source string) []string
NpmSource (npm.go):
- Struct: `NpmSource` with fields `BaseURL string`, `Registry *providers.Registry`, `Limiters *recon.LimiterRegistry`, `Client *Client`
- Compile-time assertion: `var _ recon.ReconSource = (*NpmSource)(nil)`
- Name() returns "npm"
- RateLimit() returns rate.Every(2 * time.Second); the npm registry is generous, but be polite
- Burst() returns 2
- RespectsRobots() returns false (API endpoint, not scraped HTML)
- Enabled() always returns true (no credentials needed)
- BaseURL defaults to "https://registry.npmjs.org" if empty
- Sweep() logic:
  - Call BuildQueries(s.Registry, "npm") to get keyword list
  - For each keyword, GET `{BaseURL}/-/v1/search?text={keyword}&size=20`
  - Parse JSON response: `{"objects": [{"package": {"name": "...", "links": {"npm": "..."}}}]}`
  - Define response structs: `npmSearchResponse`, `npmObject`, `npmPackage`, `npmLinks`
  - Emit one Finding per result with Source=links.npm (or construct from package name), SourceType="recon:npm", Confidence="low"
  - Honor ctx cancellation between queries, use Limiters.Wait before each request
PyPISource (pypi.go):
- Same pattern as NpmSource
- Name() returns "pypi"
- RateLimit() returns rate.Every(2 * time.Second)
- Burst() returns 2
- RespectsRobots() returns false
- Enabled() always true
- BaseURL defaults to "https://pypi.org"
- Sweep() logic:
  - BuildQueries(s.Registry, "pypi")
  - For each keyword, GET `{BaseURL}/search/?q={keyword}`, which returns an HTML page. The alternatives do not fit: `{BaseURL}/pypi/{name}/json` only serves a specific, already-known package, and `{BaseURL}/simple/` lists the entire index and is too large.
  - Parse the HTML response for `<a class="package-snippet" href="/project/{name}/">` links, matching hrefs against the `/project/[^/]+/` pattern
  - Emit Finding per result with Source="{BaseURL}/project/{name}/", SourceType="recon:pypi"
  - Use extractAnchorHrefs pattern or a simpler regex on href attributes
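
The "simpler regex on href attributes" option might look like the sketch below. `extractProjectURLs` and `projectHref` are hypothetical names, not the project's `extractAnchorHrefs` helper. The de-duplication step is an added assumption, on the grounds that a search page may link the same project more than once.

```go
package main

import (
	"fmt"
	"regexp"
)

// projectHref captures href values of the form /project/{name}/.
var projectHref = regexp.MustCompile(`href="(/project/[^/"]+/)"`)

// extractProjectURLs returns absolute project URLs found in an HTML page,
// in first-seen order and without duplicates.
func extractProjectURLs(baseURL, html string) []string {
	seen := make(map[string]bool)
	var urls []string
	for _, m := range projectHref.FindAllStringSubmatch(html, -1) {
		u := baseURL + m[1]
		if !seen[u] {
			seen[u] = true
			urls = append(urls, u)
		}
	}
	return urls
}

func main() {
	// Canned HTML resembling the package-snippet links described above.
	page := `<a class="package-snippet" href="/project/requests/">requests</a>
<a class="package-snippet" href="/project/openai/">openai</a>
<a href="/help/">Help</a>
<a class="package-snippet" href="/project/requests/">requests</a>`

	for _, u := range extractProjectURLs("https://pypi.org", page) {
		fmt.Println(u)
	}
	// Prints:
	// https://pypi.org/project/requests/
	// https://pypi.org/project/openai/
}
```

A regex keeps the source free of an HTML parser dependency, at the cost of being sensitive to attribute quoting and markup changes on the search page.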
Tests (follow the replit_test.go pattern exactly):
- npm_test.go: httptest server returning canned npm search JSON. Test that Sweep extracts findings, test Name/Rate/Burst, test ctx cancellation, test that Enabled is always true.
- pypi_test.go: httptest server returning canned HTML with package-snippet links. Same test categories.

Verify: `cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestNpm|TestPyPI" -v -count=1`

Done when: NpmSource and PyPISource pass all tests. Sweep emits correct findings from httptest fixtures, Name/Rate/Burst/Enabled return expected values, and ctx cancellation is handled.
RubyGemsSource (rubygems.go):
- Same pattern
- Name() returns "rubygems"
- RateLimit() returns rate.Every(2 * time.Second)
- Burst() returns 2
- RespectsRobots() returns false (JSON API)
- Enabled() always true
- BaseURL defaults to "https://rubygems.org"
- Sweep() logic:
  - BuildQueries(s.Registry, "rubygems")
  - For each keyword, GET `{BaseURL}/api/v1/search.json?query={keyword}&page=1`
  - Parse JSON array: `[{"name": "...", "project_uri": "..."}]`
  - Define response struct: `rubyGemEntry`
  - Emit Finding per gem: Source=project_uri, SourceType="recon:rubygems"
Tests (same httptest pattern):
- cratesio_test.go: httptest serving canned JSON with crate entries. Verify User-Agent header is set. Test all standard categories.
- rubygems_test.go: httptest serving canned JSON array. Test all standard categories.

Verify: `cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestCratesIO|TestRubyGems" -v -count=1`

Done when: CratesIOSource and RubyGemsSource pass all tests. CratesIO sends a proper User-Agent header. Both emit correct findings from httptest fixtures.
<success_criteria>
- 4 new source files implement recon.ReconSource interface
- 4 test files use httptest with canned fixtures
- All tests pass
- No compilation errors across the package
</success_criteria>