docs(13): create phase plan — 4 plans for package registries + container/IaC sources
@@ -0,0 +1,235 @@
---
phase: 13-osint_package_registries_container_iac
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - pkg/recon/sources/npm.go
  - pkg/recon/sources/npm_test.go
  - pkg/recon/sources/pypi.go
  - pkg/recon/sources/pypi_test.go
  - pkg/recon/sources/cratesio.go
  - pkg/recon/sources/cratesio_test.go
  - pkg/recon/sources/rubygems.go
  - pkg/recon/sources/rubygems_test.go
autonomous: true
requirements:
  - RECON-PKG-01
  - RECON-PKG-02

must_haves:
  truths:
    - "NpmSource searches npm registry for packages matching provider keywords and emits findings"
    - "PyPISource searches PyPI for packages matching provider keywords and emits findings"
    - "CratesIOSource searches crates.io for crates matching provider keywords and emits findings"
    - "RubyGemsSource searches rubygems.org for gems matching provider keywords and emits findings"
    - "All four sources handle context cancellation, empty registries, and HTTP errors gracefully"
  artifacts:
    - path: "pkg/recon/sources/npm.go"
      provides: "NpmSource implementing recon.ReconSource"
      contains: "func (s *NpmSource) Sweep"
    - path: "pkg/recon/sources/npm_test.go"
      provides: "httptest-based tests for NpmSource"
      contains: "httptest.NewServer"
    - path: "pkg/recon/sources/pypi.go"
      provides: "PyPISource implementing recon.ReconSource"
      contains: "func (s *PyPISource) Sweep"
    - path: "pkg/recon/sources/pypi_test.go"
      provides: "httptest-based tests for PyPISource"
      contains: "httptest.NewServer"
    - path: "pkg/recon/sources/cratesio.go"
      provides: "CratesIOSource implementing recon.ReconSource"
      contains: "func (s *CratesIOSource) Sweep"
    - path: "pkg/recon/sources/cratesio_test.go"
      provides: "httptest-based tests for CratesIOSource"
      contains: "httptest.NewServer"
    - path: "pkg/recon/sources/rubygems.go"
      provides: "RubyGemsSource implementing recon.ReconSource"
      contains: "func (s *RubyGemsSource) Sweep"
    - path: "pkg/recon/sources/rubygems_test.go"
      provides: "httptest-based tests for RubyGemsSource"
      contains: "httptest.NewServer"
  key_links:
    - from: "pkg/recon/sources/npm.go"
      to: "pkg/recon/source.go"
      via: "implements ReconSource interface"
      pattern: "var _ recon\\.ReconSource"
    - from: "pkg/recon/sources/pypi.go"
      to: "pkg/recon/source.go"
      via: "implements ReconSource interface"
      pattern: "var _ recon\\.ReconSource"
---

<objective>
Implement four package registry ReconSource modules: npm, PyPI, Crates.io, and RubyGems.

Purpose: Enables KeyHunter to scan four widely used package registries for packages that may contain leaked API keys, covering the JavaScript, Python, Rust, and Ruby ecosystems.
Output: 4 source files + 4 test files in pkg/recon/sources/
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/register.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/replit.go (pattern reference: credentialless scraper source)
@pkg/recon/sources/github.go (pattern reference: API-key-gated source)
@pkg/recon/sources/replit_test.go (test pattern reference)

<interfaces>
<!-- Executor needs these contracts. Extracted from codebase. -->

From pkg/recon/source.go:
```go
type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```

From pkg/recon/sources/httpclient.go:
```go
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```

From pkg/recon/sources/queries.go:
```go
func BuildQueries(reg *providers.Registry, source string) []string
```
</interfaces>
</context>

<tasks>

<task type="auto">
<name>Task 1: Implement NpmSource and PyPISource</name>
<files>pkg/recon/sources/npm.go, pkg/recon/sources/npm_test.go, pkg/recon/sources/pypi.go, pkg/recon/sources/pypi_test.go</files>
<action>
Create NpmSource in npm.go following the established credentialless ReplitSource pattern. The one deviation: the pattern uses RespectsRobots=true, whereas NpmSource returns false because it calls a JSON API (see the list below):

**NpmSource** (npm.go):
- Struct: `NpmSource` with fields `BaseURL string`, `Registry *providers.Registry`, `Limiters *recon.LimiterRegistry`, `Client *Client`
- Compile-time assertion: `var _ recon.ReconSource = (*NpmSource)(nil)`
- Name() returns "npm"
- RateLimit() returns rate.Every(2 * time.Second) (the npm registry is generous, but be polite)
- Burst() returns 2
- RespectsRobots() returns false (API endpoint, not scraped HTML)
- Enabled() always returns true (no credentials needed)
- BaseURL defaults to "https://registry.npmjs.org" if empty
- Sweep() logic:
  1. Call BuildQueries(s.Registry, "npm") to get keyword list
  2. For each keyword, GET `{BaseURL}/-/v1/search?text={keyword}&size=20`
  3. Parse JSON response: `{"objects": [{"package": {"name": "...", "links": {"npm": "..."}}}]}`
  4. Define response structs: `npmSearchResponse`, `npmObject`, `npmPackage`, `npmLinks`
  5. Emit one Finding per result with Source=links.npm (or construct from package name), SourceType="recon:npm", Confidence="low"
  6. Honor ctx cancellation between queries, use Limiters.Wait before each request
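
Steps 2 to 5 can be sketched as follows. This is a minimal illustration, not the final implementation: the struct names come from step 4, but `npmSearchURL` and `npmSourceURLs` are hypothetical helpers, the fallback package URL layout is an assumption, and the sketch returns plain URL strings instead of `recon.Finding` values because the Finding type is not reproduced in this plan.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// Response structs named as in step 4; only the fields shown in step 3 are mapped.
type npmSearchResponse struct {
	Objects []npmObject `json:"objects"`
}

type npmObject struct {
	Package npmPackage `json:"package"`
}

type npmPackage struct {
	Name  string   `json:"name"`
	Links npmLinks `json:"links"`
}

type npmLinks struct {
	NPM string `json:"npm"`
}

// npmSearchURL (hypothetical helper) builds the search endpoint for one keyword.
func npmSearchURL(baseURL, keyword string) string {
	return fmt.Sprintf("%s/-/v1/search?text=%s&size=20", baseURL, url.QueryEscape(keyword))
}

// npmSourceURLs (hypothetical helper) decodes a search response body and
// returns one source URL per package, in response order.
func npmSourceURLs(body []byte) ([]string, error) {
	var resp npmSearchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode npm search response: %w", err)
	}
	urls := make([]string, 0, len(resp.Objects))
	for _, obj := range resp.Objects {
		u := obj.Package.Links.NPM
		if u == "" {
			// Assumption: package pages live under www.npmjs.com/package/{name}.
			u = "https://www.npmjs.com/package/" + obj.Package.Name
		}
		urls = append(urls, u)
	}
	return urls, nil
}

func main() {
	body := []byte(`{"objects":[
		{"package":{"name":"a","links":{"npm":"https://www.npmjs.com/package/a"}}},
		{"package":{"name":"b"}}]}`)
	urls, err := npmSourceURLs(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(npmSearchURL("https://registry.npmjs.org", "sk-proj"))
	fmt.Println(urls)
}
```

Keeping URL construction and response decoding in small pure functions makes them easy to cover with table tests, independent of the httptest-based Sweep tests.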

**PyPISource** (pypi.go):
- Same pattern as NpmSource
- Name() returns "pypi"
- RateLimit() returns rate.Every(2 * time.Second)
- Burst() returns 2
- RespectsRobots() returns false
- Enabled() always true
- BaseURL defaults to "https://pypi.org"
- Sweep() logic:
  1. BuildQueries(s.Registry, "pypi")
  2. For each keyword, GET `{BaseURL}/search/?q={keyword}` (the warehouse search page), which returns HTML containing `<a class="package-snippet" href="/project/{name}/">` links. Alternatives considered and rejected: `{BaseURL}/pypi/{name}/json` only looks up a specific package by name, so it cannot serve as a search, and `{BaseURL}/simple/` lists every project and is too large.
  3. Parse HTML response for project links matching `/project/[^/]+/` pattern
  4. Emit Finding per result with Source="{BaseURL}/project/{name}/", SourceType="recon:pypi"
  5. Use extractAnchorHrefs pattern or a simpler regex on href attributes
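
The "simpler regex" option from steps 3 to 5 might look like the sketch below. `pypiProjectURLs` is a hypothetical helper, and the de-duplication step is an assumption added for illustration, on the guess that the same project can be linked more than once on a results page.

```go
package main

import (
	"fmt"
	"regexp"
)

// projectHref captures hrefs of the form /project/{name}/ and nothing deeper.
var projectHref = regexp.MustCompile(`href="(/project/[^/"]+/)"`)

// pypiProjectURLs (hypothetical helper) extracts project links from a search
// results page and returns absolute URLs, de-duplicated, in page order.
func pypiProjectURLs(baseURL, html string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, m := range projectHref.FindAllStringSubmatch(html, -1) {
		path := m[1]
		if seen[path] {
			continue
		}
		seen[path] = true
		out = append(out, baseURL+path)
	}
	return out
}

func main() {
	page := `<ul>
	<li><a class="package-snippet" href="/project/openai/">openai</a></li>
	<li><a class="package-snippet" href="/project/anthropic/">anthropic</a></li>
	<li><a href="/help/">Help</a></li>
	</ul>`
	for _, u := range pypiProjectURLs("https://pypi.org", page) {
		fmt.Println(u)
	}
}
```

A regex is brittle against markup changes, so the canned HTML fixture in pypi_test.go should mirror the real page structure closely enough to catch a mismatch.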

**Tests** (follow the replit_test.go pattern exactly):
- npm_test.go: httptest server returning canned npm search JSON. Test Sweep extracts findings, test Name/Rate/Burst, test ctx cancellation, test Enabled always true.
- pypi_test.go: httptest server returning canned HTML with package-snippet links. Same test categories.
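
The two core test moves, serving a canned fixture and asserting on a cancelled context, can be sketched in isolation as below. This is not the replit_test.go pattern itself, which is not reproduced in this plan: `fetch` and `runFixtureDemo` are hypothetical names, and the sketch uses `http.DefaultClient` rather than the project's `Client` wrapper.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// cannedNpmJSON is a minimal fixture in the shape of an npm search response.
const cannedNpmJSON = `{"objects":[{"package":{"name":"demo"}}]}`

// fetch (hypothetical helper) performs a context-aware GET and returns the body.
func fetch(ctx context.Context, url string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// runFixtureDemo serves the fixture from an httptest server, fetches it once
// with a live context, then once with an already-cancelled context. It returns
// the fetched body and whether the second call failed with context.Canceled.
func runFixtureDemo() (string, bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, cannedNpmJSON)
	}))
	defer srv.Close()

	body, err := fetch(context.Background(), srv.URL+"/-/v1/search?text=demo&size=20")
	if err != nil {
		return "", false
	}

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // cancel before the request is sent
	_, err = fetch(ctx, srv.URL)
	return body, errors.Is(err, context.Canceled)
}

func main() {
	body, cancelled := runFixtureDemo()
	fmt.Println("body:", body)
	fmt.Println("cancelled request surfaced context.Canceled:", cancelled)
}
```

In the real tests the source's `BaseURL` field would be pointed at `srv.URL`, which is why every source takes a configurable base URL instead of hard-coding the registry host.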
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestNpm|TestPyPI" -v -count=1</automated>
</verify>
<done>NpmSource and PyPISource pass all tests: Sweep emits correct findings from httptest fixtures, Name/Rate/Burst/Enabled return expected values, ctx cancellation is handled</done>
</task>

<task type="auto">
<name>Task 2: Implement CratesIOSource and RubyGemsSource</name>
<files>pkg/recon/sources/cratesio.go, pkg/recon/sources/cratesio_test.go, pkg/recon/sources/rubygems.go, pkg/recon/sources/rubygems_test.go</files>
<action>
**CratesIOSource** (cratesio.go):
- Struct: `CratesIOSource` with `BaseURL`, `Registry`, `Limiters`, `Client`
- Compile-time assertion: `var _ recon.ReconSource = (*CratesIOSource)(nil)`
- Name() returns "crates"
- RateLimit() returns rate.Every(1 * time.Second) (crates.io asks for 1 req/sec)
- Burst() returns 1
- RespectsRobots() returns false (JSON API)
- Enabled() always true
- BaseURL defaults to "https://crates.io"
- Sweep() logic:
  1. BuildQueries(s.Registry, "crates")
  2. For each keyword, GET `{BaseURL}/api/v1/crates?q={keyword}&per_page=20`
  3. Parse JSON: `{"crates": [{"id": "...", "name": "...", "repository": "..."}]}`
  4. Define response structs: `cratesSearchResponse`, `crateEntry`
  5. Emit Finding per crate: Source="https://crates.io/crates/{name}", SourceType="recon:crates"
  6. IMPORTANT: crates.io requires a custom User-Agent header. Set req.Header.Set("User-Agent", "keyhunter-recon/1.0 (https://github.com/salvacybersec/keyhunter)") before passing to client.Do
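
A sketch of steps 2 to 6, focused on the User-Agent requirement. The struct names and the header value come from the list above; `newCratesRequest`, `crateSourceURLs`, and `exampleCratesRequest` are hypothetical helpers, and the sketch stops short of sending the request through the project's `Client`.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// cratesUserAgent is the header value given in step 6.
const cratesUserAgent = "keyhunter-recon/1.0 (https://github.com/salvacybersec/keyhunter)"

// Response structs named as in step 4; fields follow the JSON shape in step 3.
type cratesSearchResponse struct {
	Crates []crateEntry `json:"crates"`
}

type crateEntry struct {
	ID         string `json:"id"`
	Name       string `json:"name"`
	Repository string `json:"repository"`
}

// newCratesRequest (hypothetical helper) builds the search request for one
// keyword and sets the custom User-Agent before it is handed to the client.
func newCratesRequest(ctx context.Context, baseURL, keyword string) (*http.Request, error) {
	u := fmt.Sprintf("%s/api/v1/crates?q=%s&per_page=20", baseURL, url.QueryEscape(keyword))
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", cratesUserAgent)
	return req, nil
}

// crateSourceURLs (hypothetical helper) decodes a search response and returns
// the crate page URL for each entry, as described in step 5.
func crateSourceURLs(body []byte) ([]string, error) {
	var resp cratesSearchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode crates.io search response: %w", err)
	}
	urls := make([]string, 0, len(resp.Crates))
	for _, c := range resp.Crates {
		urls = append(urls, "https://crates.io/crates/"+c.Name)
	}
	return urls, nil
}

// exampleCratesRequest builds a sample request against the default base URL.
func exampleCratesRequest() (*http.Request, error) {
	return newCratesRequest(context.Background(), "https://crates.io", "openai")
}

func main() {
	req, err := exampleCratesRequest()
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String())
	fmt.Println(req.Header.Get("User-Agent"))

	urls, err := crateSourceURLs([]byte(`{"crates":[{"id":"serde","name":"serde","repository":""}]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(urls)
}
```

Setting the header on the request, not on a shared client, keeps the crates.io-specific requirement local to this source.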

**RubyGemsSource** (rubygems.go):
- Same pattern
- Name() returns "rubygems"
- RateLimit() returns rate.Every(2 * time.Second)
- Burst() returns 2
- RespectsRobots() returns false (JSON API)
- Enabled() always true
- BaseURL defaults to "https://rubygems.org"
- Sweep() logic:
  1. BuildQueries(s.Registry, "rubygems")
  2. For each keyword, GET `{BaseURL}/api/v1/search.json?query={keyword}&page=1`
  3. Parse JSON array: `[{"name": "...", "project_uri": "..."}]`
  4. Define response struct: `rubyGemEntry`
  5. Emit Finding per gem: Source=project_uri, SourceType="recon:rubygems"
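
Unlike the npm and crates.io responses, the RubyGems search result in step 3 is a top-level JSON array, so it decodes straight into a slice with no wrapper struct. A short sketch, where `gemSourceURLs` is a hypothetical helper and skipping entries without a `project_uri` is an assumption, not something this plan specifies:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rubyGemEntry is named as in step 4; fields follow the JSON shape in step 3.
type rubyGemEntry struct {
	Name       string `json:"name"`
	ProjectURI string `json:"project_uri"`
}

// gemSourceURLs (hypothetical helper) decodes the top-level array and returns
// each gem's project_uri. An empty array yields an empty result and no error.
func gemSourceURLs(body []byte) ([]string, error) {
	var gems []rubyGemEntry
	if err := json.Unmarshal(body, &gems); err != nil {
		return nil, fmt.Errorf("decode rubygems search response: %w", err)
	}
	urls := make([]string, 0, len(gems))
	for _, g := range gems {
		if g.ProjectURI == "" {
			continue // assumption: entries without a project_uri are skipped
		}
		urls = append(urls, g.ProjectURI)
	}
	return urls, nil
}

func main() {
	body := []byte(`[{"name":"rails","project_uri":"https://rubygems.org/gems/rails"}]`)
	urls, err := gemSourceURLs(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(urls)
}
```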

**Tests** (same httptest pattern):
- cratesio_test.go: httptest serving canned JSON with crate entries. Verify User-Agent header is set. Test all standard categories.
- rubygems_test.go: httptest serving canned JSON array. Test all standard categories.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestCratesIO|TestRubyGems" -v -count=1</automated>
</verify>
<done>CratesIOSource and RubyGemsSource pass all tests. CratesIO sends proper User-Agent header. Both emit correct findings from httptest fixtures.</done>
</task>

</tasks>

<verification>
All 8 new files compile and pass tests:
```bash
go test ./pkg/recon/sources/ -run "TestNpm|TestPyPI|TestCratesIO|TestRubyGems" -v -count=1
go vet ./pkg/recon/sources/
```
</verification>

<success_criteria>
- 4 new source files implement recon.ReconSource interface
- 4 test files use httptest with canned fixtures
- All tests pass
- No compilation errors across the package
</success_criteria>

<output>
After completion, create `.planning/phases/13-osint_package_registries_container_iac/13-01-SUMMARY.md`
</output>