docs(13): create phase plan — 4 plans for package registries + container/IaC sources

This commit is contained in:
salvacybersec
2026-04-06 12:50:38 +03:00
parent a5253cf9dd
commit 877ae8c743
5 changed files with 917 additions and 1 deletions

View File

@@ -270,7 +270,12 @@ Plans:
3. `keyhunter recon --sources=dockerhub` extracts and scans image layers and build args from public Docker Hub images
4. `keyhunter recon --sources=k8s` discovers publicly exposed Kubernetes dashboards and scans publicly readable Secret/ConfigMap objects
5. `keyhunter recon --sources=terraform,helm,ansible` scans Terraform registry modules, Helm chart repositories, and Ansible Galaxy roles
**Plans**: TBD
**Plans**: 4 plans
Plans:
- [ ] 13-01-PLAN.md — NpmSource + PyPISource + CratesIOSource + RubyGemsSource (RECON-PKG-01, RECON-PKG-02)
- [ ] 13-02-PLAN.md — MavenSource + NuGetSource + GoProxySource + PackagistSource (RECON-PKG-02, RECON-PKG-03)
- [ ] 13-03-PLAN.md — DockerHubSource + KubernetesSource + TerraformSource + HelmSource (RECON-INFRA-01..04)
- [ ] 13-04-PLAN.md — RegisterAll wiring + integration test (all Phase 13 reqs)
### Phase 14: OSINT CI/CD Logs, Web Archives & Frontend Leaks
**Goal**: Users can scan public CI/CD build logs, historical web snapshots from the Wayback Machine and CommonCrawl, and frontend JavaScript artifacts (source maps, webpack bundles, exposed .env files) for leaked API keys

View File

@@ -0,0 +1,235 @@
---
phase: 13-osint_package_registries_container_iac
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
- pkg/recon/sources/npm.go
- pkg/recon/sources/npm_test.go
- pkg/recon/sources/pypi.go
- pkg/recon/sources/pypi_test.go
- pkg/recon/sources/cratesio.go
- pkg/recon/sources/cratesio_test.go
- pkg/recon/sources/rubygems.go
- pkg/recon/sources/rubygems_test.go
autonomous: true
requirements:
- RECON-PKG-01
- RECON-PKG-02
must_haves:
truths:
- "NpmSource searches npm registry for packages matching provider keywords and emits findings"
- "PyPISource searches PyPI for packages matching provider keywords and emits findings"
- "CratesIOSource searches crates.io for crates matching provider keywords and emits findings"
- "RubyGemsSource searches rubygems.org for gems matching provider keywords and emits findings"
- "All four sources handle context cancellation, empty registries, and HTTP errors gracefully"
artifacts:
- path: "pkg/recon/sources/npm.go"
provides: "NpmSource implementing recon.ReconSource"
contains: "func (s *NpmSource) Sweep"
- path: "pkg/recon/sources/npm_test.go"
provides: "httptest-based tests for NpmSource"
contains: "httptest.NewServer"
- path: "pkg/recon/sources/pypi.go"
provides: "PyPISource implementing recon.ReconSource"
contains: "func (s *PyPISource) Sweep"
- path: "pkg/recon/sources/pypi_test.go"
provides: "httptest-based tests for PyPISource"
contains: "httptest.NewServer"
- path: "pkg/recon/sources/cratesio.go"
provides: "CratesIOSource implementing recon.ReconSource"
contains: "func (s *CratesIOSource) Sweep"
- path: "pkg/recon/sources/cratesio_test.go"
provides: "httptest-based tests for CratesIOSource"
contains: "httptest.NewServer"
- path: "pkg/recon/sources/rubygems.go"
provides: "RubyGemsSource implementing recon.ReconSource"
contains: "func (s *RubyGemsSource) Sweep"
- path: "pkg/recon/sources/rubygems_test.go"
provides: "httptest-based tests for RubyGemsSource"
contains: "httptest.NewServer"
key_links:
- from: "pkg/recon/sources/npm.go"
to: "pkg/recon/source.go"
via: "implements ReconSource interface"
pattern: "var _ recon\\.ReconSource"
- from: "pkg/recon/sources/pypi.go"
to: "pkg/recon/source.go"
via: "implements ReconSource interface"
pattern: "var _ recon\\.ReconSource"
---
<objective>
Implement four package registry ReconSource modules: npm, PyPI, Crates.io, and RubyGems.
Purpose: Enables KeyHunter to scan the four most popular package registries for packages that may contain leaked API keys, covering JavaScript, Python, Rust, and Ruby ecosystems.
Output: 4 source files + 4 test files in pkg/recon/sources/
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/register.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/replit.go (pattern reference — credentialless scraper source)
@pkg/recon/sources/github.go (pattern reference — API-key-gated source)
@pkg/recon/sources/replit_test.go (test pattern reference)
<interfaces>
<!-- Executor needs these contracts. Extracted from codebase. -->
From pkg/recon/source.go:
```go
type ReconSource interface {
Name() string
RateLimit() rate.Limit
Burst() int
RespectsRobots() bool
Enabled(cfg Config) bool
Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```
From pkg/recon/sources/httpclient.go:
```go
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```
From pkg/recon/sources/queries.go:
```go
func BuildQueries(reg *providers.Registry, source string) []string
```
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Implement NpmSource and PyPISource</name>
<files>pkg/recon/sources/npm.go, pkg/recon/sources/npm_test.go, pkg/recon/sources/pypi.go, pkg/recon/sources/pypi_test.go</files>
<action>
Create NpmSource in npm.go following the established ReplitSource pattern (credentialless, RespectsRobots=true):
**NpmSource** (npm.go):
- Struct: `NpmSource` with fields `BaseURL string`, `Registry *providers.Registry`, `Limiters *recon.LimiterRegistry`, `Client *Client`
- Compile-time assertion: `var _ recon.ReconSource = (*NpmSource)(nil)`
- Name() returns "npm"
- RateLimit() returns rate.Every(2 * time.Second) — npm registry is generous but be polite
- Burst() returns 2
- RespectsRobots() returns false (API endpoint, not scraped HTML)
- Enabled() always returns true (no credentials needed)
- BaseURL defaults to "https://registry.npmjs.org" if empty
- Sweep() logic:
1. Call BuildQueries(s.Registry, "npm") to get keyword list
2. For each keyword, GET `{BaseURL}/-/v1/search?text={keyword}&size=20`
3. Parse JSON response: `{"objects": [{"package": {"name": "...", "links": {"npm": "..."}}}]}`
4. Define response structs: `npmSearchResponse`, `npmObject`, `npmPackage`, `npmLinks`
5. Emit one Finding per result with Source=links.npm (or construct from package name), SourceType="recon:npm", Confidence="low"
6. Honor ctx cancellation between queries, use Limiters.Wait before each request
**PyPISource** (pypi.go):
- Same pattern as NpmSource
- Name() returns "pypi"
- RateLimit() returns rate.Every(2 * time.Second)
- Burst() returns 2
- RespectsRobots() returns false
- Enabled() always true
- BaseURL defaults to "https://pypi.org"
- Sweep() logic:
1. BuildQueries(s.Registry, "pypi")
2. For each keyword, GET `{BaseURL}/search/?q={keyword}&o=` (HTML page) OR use the XML-RPC/JSON approach:
Actually use the simple JSON API: GET `{BaseURL}/pypi/{keyword}/json` is for specific packages.
For search, use: GET `https://pypi.org/search/?q={keyword}` and parse HTML for project links.
Simpler approach: GET `{BaseURL}/simple/` is too large. Use the warehouse search page.
Best approach: GET `{BaseURL}/search/?q={keyword}` returns HTML. Parse `<a class="package-snippet" href="/project/{name}/">` links.
3. Parse HTML response for project links matching `/project/[^/]+/` pattern
4. Emit Finding per result with Source="{BaseURL}/project/{name}/", SourceType="recon:pypi"
5. Use extractAnchorHrefs pattern or a simpler regex on href attributes
**Tests** — Follow replit_test.go pattern exactly:
- npm_test.go: httptest server returning canned npm search JSON. Test Sweep extracts findings, test Name/Rate/Burst, test ctx cancellation, test Enabled always true.
- pypi_test.go: httptest server returning canned HTML with package-snippet links. Same test categories.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestNpm|TestPyPI" -v -count=1</automated>
</verify>
<done>NpmSource and PyPISource pass all tests: Sweep emits correct findings from httptest fixtures, Name/Rate/Burst/Enabled return expected values, ctx cancellation is handled</done>
</task>
<task type="auto">
<name>Task 2: Implement CratesIOSource and RubyGemsSource</name>
<files>pkg/recon/sources/cratesio.go, pkg/recon/sources/cratesio_test.go, pkg/recon/sources/rubygems.go, pkg/recon/sources/rubygems_test.go</files>
<action>
**CratesIOSource** (cratesio.go):
- Struct: `CratesIOSource` with `BaseURL`, `Registry`, `Limiters`, `Client`
- Compile-time assertion: `var _ recon.ReconSource = (*CratesIOSource)(nil)`
- Name() returns "crates"
- RateLimit() returns rate.Every(1 * time.Second) — crates.io asks for 1 req/sec
- Burst() returns 1
- RespectsRobots() returns false (JSON API)
- Enabled() always true
- BaseURL defaults to "https://crates.io"
- Sweep() logic:
1. BuildQueries(s.Registry, "crates")
2. For each keyword, GET `{BaseURL}/api/v1/crates?q={keyword}&per_page=20`
3. Parse JSON: `{"crates": [{"id": "...", "name": "...", "repository": "..."}]}`
4. Define response structs: `cratesSearchResponse`, `crateEntry`
5. Emit Finding per crate: Source="https://crates.io/crates/{name}", SourceType="recon:crates"
6. IMPORTANT: crates.io requires a custom User-Agent header. Set req.Header.Set("User-Agent", "keyhunter-recon/1.0 (https://github.com/salvacybersec/keyhunter)") before passing to client.Do
**RubyGemsSource** (rubygems.go):
- Same pattern
- Name() returns "rubygems"
- RateLimit() returns rate.Every(2 * time.Second)
- Burst() returns 2
- RespectsRobots() returns false (JSON API)
- Enabled() always true
- BaseURL defaults to "https://rubygems.org"
- Sweep() logic:
1. BuildQueries(s.Registry, "rubygems")
2. For each keyword, GET `{BaseURL}/api/v1/search.json?query={keyword}&page=1`
3. Parse JSON array: `[{"name": "...", "project_uri": "..."}]`
4. Define response struct: `rubyGemEntry`
5. Emit Finding per gem: Source=project_uri, SourceType="recon:rubygems"
**Tests** — same httptest pattern:
- cratesio_test.go: httptest serving canned JSON with crate entries. Verify User-Agent header is set. Test all standard categories.
- rubygems_test.go: httptest serving canned JSON array. Test all standard categories.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestCratesIO|TestRubyGems" -v -count=1</automated>
</verify>
<done>CratesIOSource and RubyGemsSource pass all tests. CratesIO sends proper User-Agent header. Both emit correct findings from httptest fixtures.</done>
</task>
</tasks>
<verification>
All 8 new files compile and pass tests:
```bash
go test ./pkg/recon/sources/ -run "TestNpm|TestPyPI|TestCratesIO|TestRubyGems" -v -count=1
go vet ./pkg/recon/sources/
```
</verification>
<success_criteria>
- 4 new source files implement recon.ReconSource interface
- 4 test files use httptest with canned fixtures
- All tests pass
- No compilation errors across the package
</success_criteria>
<output>
After completion, create `.planning/phases/13-osint_package_registries_container_iac/13-01-SUMMARY.md`
</output>

View File

@@ -0,0 +1,215 @@
---
phase: 13-osint_package_registries_container_iac
plan: 02
type: execute
wave: 1
depends_on: []
files_modified:
- pkg/recon/sources/maven.go
- pkg/recon/sources/maven_test.go
- pkg/recon/sources/nuget.go
- pkg/recon/sources/nuget_test.go
- pkg/recon/sources/goproxy.go
- pkg/recon/sources/goproxy_test.go
- pkg/recon/sources/packagist.go
- pkg/recon/sources/packagist_test.go
autonomous: true
requirements:
- RECON-PKG-02
- RECON-PKG-03
must_haves:
truths:
- "MavenSource searches Maven Central for artifacts matching provider keywords and emits findings"
- "NuGetSource searches NuGet gallery for packages matching provider keywords and emits findings"
- "GoProxySource searches Go module proxy for modules matching provider keywords and emits findings"
- "PackagistSource searches Packagist for PHP packages matching provider keywords and emits findings"
- "All four sources handle context cancellation, empty registries, and HTTP errors gracefully"
artifacts:
- path: "pkg/recon/sources/maven.go"
provides: "MavenSource implementing recon.ReconSource"
contains: "func (s *MavenSource) Sweep"
- path: "pkg/recon/sources/nuget.go"
provides: "NuGetSource implementing recon.ReconSource"
contains: "func (s *NuGetSource) Sweep"
- path: "pkg/recon/sources/goproxy.go"
provides: "GoProxySource implementing recon.ReconSource"
contains: "func (s *GoProxySource) Sweep"
- path: "pkg/recon/sources/packagist.go"
provides: "PackagistSource implementing recon.ReconSource"
contains: "func (s *PackagistSource) Sweep"
key_links:
- from: "pkg/recon/sources/maven.go"
to: "pkg/recon/source.go"
via: "implements ReconSource interface"
pattern: "var _ recon\\.ReconSource"
- from: "pkg/recon/sources/nuget.go"
to: "pkg/recon/source.go"
via: "implements ReconSource interface"
pattern: "var _ recon\\.ReconSource"
---
<objective>
Implement four package registry ReconSource modules: Maven Central, NuGet, Go Proxy, and Packagist.
Purpose: Extends package registry coverage to Java/JVM, .NET, Go, and PHP ecosystems, completing the full set of 8 package registries for RECON-PKG-02 and RECON-PKG-03.
Output: 4 source files + 4 test files in pkg/recon/sources/
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/replit.go (pattern reference)
@pkg/recon/sources/replit_test.go (test pattern reference)
<interfaces>
From pkg/recon/source.go:
```go
type ReconSource interface {
Name() string
RateLimit() rate.Limit
Burst() int
RespectsRobots() bool
Enabled(cfg Config) bool
Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```
From pkg/recon/sources/httpclient.go:
```go
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```
From pkg/recon/sources/queries.go:
```go
func BuildQueries(reg *providers.Registry, source string) []string
```
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Implement MavenSource and NuGetSource</name>
<files>pkg/recon/sources/maven.go, pkg/recon/sources/maven_test.go, pkg/recon/sources/nuget.go, pkg/recon/sources/nuget_test.go</files>
<action>
**MavenSource** (maven.go):
- Struct: `MavenSource` with `BaseURL`, `Registry`, `Limiters`, `Client`
- Compile-time assertion: `var _ recon.ReconSource = (*MavenSource)(nil)`
- Name() returns "maven"
- RateLimit() returns rate.Every(2 * time.Second)
- Burst() returns 2
- RespectsRobots() returns false (JSON API)
- Enabled() always true (no credentials needed)
- BaseURL defaults to "https://search.maven.org"
- Sweep() logic:
1. BuildQueries(s.Registry, "maven")
2. For each keyword, GET `{BaseURL}/solrsearch/select?q={keyword}&rows=20&wt=json`
3. Parse JSON: `{"response": {"docs": [{"g": "group", "a": "artifact", "latestVersion": "1.0"}]}}`
4. Define response structs: `mavenSearchResponse`, `mavenResponseBody`, `mavenDoc`
5. Emit Finding per doc: Source="https://search.maven.org/artifact/{g}/{a}/{latestVersion}/jar", SourceType="recon:maven"
**NuGetSource** (nuget.go):
- Struct: `NuGetSource` with `BaseURL`, `Registry`, `Limiters`, `Client`
- Compile-time assertion: `var _ recon.ReconSource = (*NuGetSource)(nil)`
- Name() returns "nuget"
- RateLimit() returns rate.Every(1 * time.Second)
- Burst() returns 3
- RespectsRobots() returns false (JSON API)
- Enabled() always true
- BaseURL defaults to "https://azuresearch-usnc.nuget.org"
- Sweep() logic:
1. BuildQueries(s.Registry, "nuget")
2. For each keyword, GET `{BaseURL}/query?q={keyword}&take=20`
3. Parse JSON: `{"data": [{"id": "...", "version": "...", "projectUrl": "..."}]}`
4. Define response structs: `nugetSearchResponse`, `nugetPackage`
5. Emit Finding per package: Source=projectUrl (fallback to "https://www.nuget.org/packages/{id}"), SourceType="recon:nuget"
**Tests** — httptest pattern:
- maven_test.go: httptest serving canned Solr JSON. Test Sweep extracts findings, Name/Rate/Burst, ctx cancellation.
- nuget_test.go: httptest serving canned NuGet search JSON. Same test categories.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestMaven|TestNuGet" -v -count=1</automated>
</verify>
<done>MavenSource and NuGetSource pass all tests: findings extracted from httptest fixtures, metadata methods return expected values</done>
</task>
<task type="auto">
<name>Task 2: Implement GoProxySource and PackagistSource</name>
<files>pkg/recon/sources/goproxy.go, pkg/recon/sources/goproxy_test.go, pkg/recon/sources/packagist.go, pkg/recon/sources/packagist_test.go</files>
<action>
**GoProxySource** (goproxy.go):
- Struct: `GoProxySource` with `BaseURL`, `Registry`, `Limiters`, `Client`
- Compile-time assertion: `var _ recon.ReconSource = (*GoProxySource)(nil)`
- Name() returns "goproxy"
- RateLimit() returns rate.Every(2 * time.Second)
- Burst() returns 2
- RespectsRobots() returns false
- Enabled() always true
- BaseURL defaults to "https://pkg.go.dev"
- Sweep() logic:
1. BuildQueries(s.Registry, "goproxy")
2. For each keyword, GET `{BaseURL}/search?q={keyword}&m=package` — this returns HTML
3. Parse HTML for search result links matching pattern `/[^"]+` inside `<a data-href=` or `<a href="/...">` elements with class containing "SearchSnippet"
4. Simpler approach: use regex to extract hrefs matching `href="(/[a-z][^"]*)"` from search result snippet divs
5. Emit Finding per result: Source="{BaseURL}{path}", SourceType="recon:goproxy"
6. Note: pkg.go.dev search returns HTML, not JSON. Use the same HTML parsing approach as ReplitSource (extractAnchorHrefs with appropriate regex).
7. Define a package-level regexp: `goProxyLinkRE = regexp.MustCompile(`^/[a-z][a-z0-9./_-]*$`)` to match Go module paths
**PackagistSource** (packagist.go):
- Struct: `PackagistSource` with `BaseURL`, `Registry`, `Limiters`, `Client`
- Compile-time assertion: `var _ recon.ReconSource = (*PackagistSource)(nil)`
- Name() returns "packagist"
- RateLimit() returns rate.Every(2 * time.Second)
- Burst() returns 2
- RespectsRobots() returns false (JSON API)
- Enabled() always true
- BaseURL defaults to "https://packagist.org"
- Sweep() logic:
1. BuildQueries(s.Registry, "packagist")
2. For each keyword, GET `{BaseURL}/search.json?q={keyword}&per_page=20`
3. Parse JSON: `{"results": [{"name": "vendor/package", "url": "..."}]}`
4. Define response structs: `packagistSearchResponse`, `packagistPackage`
5. Emit Finding per package: Source=url, SourceType="recon:packagist"
**Tests** — httptest pattern:
- goproxy_test.go: httptest serving canned HTML with search result links. Test extraction of Go module paths.
- packagist_test.go: httptest serving canned Packagist JSON. Test all standard categories.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestGoProxy|TestPackagist" -v -count=1</automated>
</verify>
<done>GoProxySource and PackagistSource pass all tests. GoProxy HTML parsing extracts module paths correctly. Packagist JSON parsing works.</done>
</task>
</tasks>
<verification>
All 8 new files compile and pass tests:
```bash
go test ./pkg/recon/sources/ -run "TestMaven|TestNuGet|TestGoProxy|TestPackagist" -v -count=1
go vet ./pkg/recon/sources/
```
</verification>
<success_criteria>
- 4 new source files implement recon.ReconSource interface
- 4 test files use httptest with canned fixtures
- All tests pass
- No compilation errors across the package
</success_criteria>
<output>
After completion, create `.planning/phases/13-osint_package_registries_container_iac/13-02-SUMMARY.md`
</output>

View File

@@ -0,0 +1,224 @@
---
phase: 13-osint_package_registries_container_iac
plan: 03
type: execute
wave: 1
depends_on: []
files_modified:
- pkg/recon/sources/dockerhub.go
- pkg/recon/sources/dockerhub_test.go
- pkg/recon/sources/kubernetes.go
- pkg/recon/sources/kubernetes_test.go
- pkg/recon/sources/terraform.go
- pkg/recon/sources/terraform_test.go
- pkg/recon/sources/helm.go
- pkg/recon/sources/helm_test.go
autonomous: true
requirements:
- RECON-INFRA-01
- RECON-INFRA-02
- RECON-INFRA-03
- RECON-INFRA-04
must_haves:
truths:
- "DockerHubSource searches Docker Hub for images matching provider keywords and emits findings"
- "KubernetesSource searches for publicly exposed Kubernetes configs via search/dorking and emits findings"
- "TerraformSource searches Terraform Registry for modules matching provider keywords and emits findings"
- "HelmSource searches Artifact Hub for Helm charts matching provider keywords and emits findings"
- "All four sources handle context cancellation, empty registries, and HTTP errors gracefully"
artifacts:
- path: "pkg/recon/sources/dockerhub.go"
provides: "DockerHubSource implementing recon.ReconSource"
contains: "func (s *DockerHubSource) Sweep"
- path: "pkg/recon/sources/kubernetes.go"
provides: "KubernetesSource implementing recon.ReconSource"
contains: "func (s *KubernetesSource) Sweep"
- path: "pkg/recon/sources/terraform.go"
provides: "TerraformSource implementing recon.ReconSource"
contains: "func (s *TerraformSource) Sweep"
- path: "pkg/recon/sources/helm.go"
provides: "HelmSource implementing recon.ReconSource"
contains: "func (s *HelmSource) Sweep"
key_links:
- from: "pkg/recon/sources/dockerhub.go"
to: "pkg/recon/source.go"
via: "implements ReconSource interface"
pattern: "var _ recon\\.ReconSource"
- from: "pkg/recon/sources/terraform.go"
to: "pkg/recon/source.go"
via: "implements ReconSource interface"
pattern: "var _ recon\\.ReconSource"
---
<objective>
Implement four container and infrastructure-as-code ReconSource modules: Docker Hub, Kubernetes, Terraform Registry, and Helm (via Artifact Hub).
Purpose: Enables KeyHunter to scan container images, Kubernetes configs, Terraform modules, and Helm charts for leaked API keys embedded in infrastructure definitions.
Output: 4 source files + 4 test files in pkg/recon/sources/
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/replit.go (pattern reference)
@pkg/recon/sources/shodan.go (pattern reference — search API source)
@pkg/recon/sources/replit_test.go (test pattern reference)
<interfaces>
From pkg/recon/source.go:
```go
type ReconSource interface {
Name() string
RateLimit() rate.Limit
Burst() int
RespectsRobots() bool
Enabled(cfg Config) bool
Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```
From pkg/recon/sources/httpclient.go:
```go
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```
From pkg/recon/sources/queries.go:
```go
func BuildQueries(reg *providers.Registry, source string) []string
```
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Implement DockerHubSource and KubernetesSource</name>
<files>pkg/recon/sources/dockerhub.go, pkg/recon/sources/dockerhub_test.go, pkg/recon/sources/kubernetes.go, pkg/recon/sources/kubernetes_test.go</files>
<action>
**DockerHubSource** (dockerhub.go):
- Struct: `DockerHubSource` with `BaseURL`, `Registry`, `Limiters`, `Client`
- Compile-time assertion: `var _ recon.ReconSource = (*DockerHubSource)(nil)`
- Name() returns "dockerhub"
- RateLimit() returns rate.Every(2 * time.Second) — Docker Hub rate limits unauthenticated at ~100 pulls/6h, search is more lenient
- Burst() returns 2
- RespectsRobots() returns false (JSON API)
- Enabled() always true (Docker Hub search is unauthenticated)
- BaseURL defaults to "https://hub.docker.com"
- Sweep() logic:
1. BuildQueries(s.Registry, "dockerhub")
2. For each keyword, GET `{BaseURL}/v2/search/repositories/?query={keyword}&page_size=20`
3. Parse JSON: `{"results": [{"repo_name": "...", "description": "...", "is_official": false}]}`
4. Define response structs: `dockerHubSearchResponse`, `dockerHubRepo`
5. Emit Finding per result: Source="https://hub.docker.com/r/{repo_name}", SourceType="recon:dockerhub"
6. Description in finding can hint at build-arg or env-var exposure
**KubernetesSource** (kubernetes.go):
- Struct: `KubernetesSource` with `BaseURL`, `Registry`, `Limiters`, `Client`
- Compile-time assertion: `var _ recon.ReconSource = (*KubernetesSource)(nil)`
- Name() returns "k8s"
- RateLimit() returns rate.Every(3 * time.Second)
- Burst() returns 1
- RespectsRobots() returns true (searches public web for exposed K8s dashboards/configs)
- Enabled() always true
- BaseURL defaults to "https://search.censys.io" — uses Censys-style search for exposed K8s dashboards
- ALTERNATIVE simpler approach: Search GitHub for exposed Kubernetes manifests containing secrets.
Use BaseURL "https://api.github.com" and search for `kind: Secret` or `apiVersion: v1 kind: ConfigMap` with provider keywords.
BUT this duplicates GitHubSource.
- BEST approach: Use a dedicated search via pkg.go.dev-style HTML scraping but for Kubernetes YAML files on public artifact hubs.
Actually, the simplest approach that aligns with RECON-INFRA-02 ("discovers publicly exposed Kubernetes dashboards and scans publicly readable Secret/ConfigMap objects"):
Use Shodan/Censys-style dork queries. But those sources already exist.
- FINAL approach: KubernetesSource searches Artifact Hub (artifacthub.io) for Kubernetes manifests/operators that may embed secrets. ArtifactHub has a JSON API.
GET `{BaseURL}/api/v1/packages/search?ts_query_web={keyword}&kind=0&limit=20` (kind=0 = Helm charts, but also covers operators)
Actually, use kind=6 for "Kube Operator" or leave blank for all kinds.
BaseURL defaults to "https://artifacthub.io"
Parse JSON: `{"packages": [{"name": "...", "normalized_name": "...", "repository": {"name": "...", "url": "..."}}]}`
Emit Finding: Source="https://artifacthub.io/packages/{repository.kind}/{repository.name}/{package.name}", SourceType="recon:k8s"
**Tests** — httptest pattern:
- dockerhub_test.go: httptest serving canned Docker Hub search JSON. Verify findings have correct SourceType and Source URL format.
- kubernetes_test.go: httptest serving canned Artifact Hub search JSON. Standard test categories.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestDockerHub|TestKubernetes" -v -count=1</automated>
</verify>
<done>DockerHubSource and KubernetesSource pass all tests: Docker Hub search returns repo findings, K8s source finds Artifact Hub packages</done>
</task>
<task type="auto">
<name>Task 2: Implement TerraformSource and HelmSource</name>
<files>pkg/recon/sources/terraform.go, pkg/recon/sources/terraform_test.go, pkg/recon/sources/helm.go, pkg/recon/sources/helm_test.go</files>
<action>
**TerraformSource** (terraform.go):
- Struct: `TerraformSource` with `BaseURL`, `Registry`, `Limiters`, `Client`
- Compile-time assertion: `var _ recon.ReconSource = (*TerraformSource)(nil)`
- Name() returns "terraform"
- RateLimit() returns rate.Every(2 * time.Second)
- Burst() returns 2
- RespectsRobots() returns false (JSON API)
- Enabled() always true
- BaseURL defaults to "https://registry.terraform.io"
- Sweep() logic:
1. BuildQueries(s.Registry, "terraform")
2. For each keyword, GET `{BaseURL}/v1/modules?q={keyword}&limit=20`
3. Parse JSON: `{"modules": [{"id": "namespace/name/provider", "namespace": "...", "name": "...", "provider": "...", "description": "..."}]}`
4. Define response structs: `terraformSearchResponse`, `terraformModule`
5. Emit Finding per module: Source="https://registry.terraform.io/modules/{namespace}/{name}/{provider}", SourceType="recon:terraform"
**HelmSource** (helm.go):
- Struct: `HelmSource` with `BaseURL`, `Registry`, `Limiters`, `Client`
- Compile-time assertion: `var _ recon.ReconSource = (*HelmSource)(nil)`
- Name() returns "helm"
- RateLimit() returns rate.Every(2 * time.Second)
- Burst() returns 2
- RespectsRobots() returns false (JSON API)
- Enabled() always true
- BaseURL defaults to "https://artifacthub.io"
- Sweep() logic:
1. BuildQueries(s.Registry, "helm")
2. For each keyword, GET `{BaseURL}/api/v1/packages/search?ts_query_web={keyword}&kind=0&limit=20` (kind=0 = Helm charts)
3. Parse JSON: `{"packages": [{"package_id": "...", "name": "...", "normalized_name": "...", "repository": {"name": "...", "kind": 0}}]}`
4. Define response structs: `artifactHubSearchResponse`, `artifactHubPackage`, `artifactHubRepo`
5. Emit Finding per package: Source="https://artifacthub.io/packages/helm/{repo.name}/{package.name}", SourceType="recon:helm"
6. Note: HelmSource and KubernetesSource both use Artifact Hub but with different `kind` parameters and different SourceType tags. Keep them separate — different concerns.
**Tests** — httptest pattern:
- terraform_test.go: httptest serving canned Terraform registry JSON. Verify module URL construction from namespace/name/provider.
- helm_test.go: httptest serving canned Artifact Hub JSON for Helm charts. Standard test categories.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestTerraform|TestHelm" -v -count=1</automated>
</verify>
<done>TerraformSource and HelmSource pass all tests. Terraform constructs correct module URLs. Helm extracts Artifact Hub packages correctly.</done>
</task>
</tasks>
<verification>
All 8 new files compile and pass tests:
```bash
go test ./pkg/recon/sources/ -run "TestDockerHub|TestKubernetes|TestTerraform|TestHelm" -v -count=1
go vet ./pkg/recon/sources/
```
</verification>
<success_criteria>
- 4 new source files implement recon.ReconSource interface
- 4 test files use httptest with canned fixtures
- All tests pass
- No compilation errors across the package
</success_criteria>
<output>
After completion, create `.planning/phases/13-osint_package_registries_container_iac/13-03-SUMMARY.md`
</output>

View File

@@ -0,0 +1,237 @@
---
phase: 13-osint_package_registries_container_iac
plan: 04
type: execute
wave: 2
depends_on:
- "13-01"
- "13-02"
- "13-03"
files_modified:
- pkg/recon/sources/register.go
- pkg/recon/sources/register_test.go
- pkg/recon/sources/integration_test.go
- cmd/recon.go
autonomous: true
requirements:
- RECON-PKG-01
- RECON-PKG-02
- RECON-PKG-03
- RECON-INFRA-01
- RECON-INFRA-02
- RECON-INFRA-03
- RECON-INFRA-04
must_haves:
truths:
- "RegisterAll registers all 12 new Phase 13 sources (40 total) on the engine"
- "All 40 sources appear in engine.List() sorted alphabetically"
- "Integration test runs SweepAll across all 40 sources with httptest fixtures and gets at least one finding per SourceType"
- "cmd/recon.go wires any new SourcesConfig fields needed for Phase 13 sources"
artifacts:
- path: "pkg/recon/sources/register.go"
provides: "Updated RegisterAll with 12 new Phase 13 source registrations"
contains: "NpmSource"
- path: "pkg/recon/sources/register_test.go"
provides: "Updated test asserting 40 sources registered"
contains: "40"
- path: "pkg/recon/sources/integration_test.go"
provides: "Updated integration test with httptest mux handlers for all 12 new sources"
contains: "recon:npm"
key_links:
- from: "pkg/recon/sources/register.go"
to: "pkg/recon/sources/npm.go"
via: "engine.Register call"
pattern: "NpmSource"
- from: "pkg/recon/sources/register.go"
to: "pkg/recon/sources/dockerhub.go"
via: "engine.Register call"
pattern: "DockerHubSource"
- from: "pkg/recon/sources/integration_test.go"
to: "all 12 new sources"
via: "httptest mux handlers"
pattern: "recon:(npm|pypi|crates|rubygems|maven|nuget|goproxy|packagist|dockerhub|k8s|terraform|helm)"
---
<objective>
Wire all 12 Phase 13 sources into RegisterAll, update register_test.go to assert 40 total sources, and extend the integration test with httptest handlers for all new sources.
Purpose: Connects the individually-implemented sources into the recon engine so `keyhunter recon` discovers and runs them. Integration test proves end-to-end SweepAll works across all 40 sources.
Output: Updated register.go, register_test.go, integration_test.go, cmd/recon.go
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/sources/register.go
@pkg/recon/sources/register_test.go
@pkg/recon/sources/integration_test.go
@cmd/recon.go
<!-- Depends on Plans 13-01, 13-02, 13-03 outputs -->
@.planning/phases/13-osint_package_registries_container_iac/13-01-SUMMARY.md
@.planning/phases/13-osint_package_registries_container_iac/13-02-SUMMARY.md
@.planning/phases/13-osint_package_registries_container_iac/13-03-SUMMARY.md
<interfaces>
From pkg/recon/sources/register.go (current):
```go
type SourcesConfig struct {
GitHubToken string
// ... existing fields ...
Registry *providers.Registry
Limiters *recon.LimiterRegistry
}
func RegisterAll(engine *recon.Engine, cfg SourcesConfig) { ... }
```
From pkg/recon/engine.go:
```go
func (e *Engine) Register(src ReconSource)
func (e *Engine) List() []string // sorted source names
```
New sources created by Plans 13-01..03 (all credentialless, struct-literal style):
- NpmSource{BaseURL, Registry, Limiters, Client}
- PyPISource{BaseURL, Registry, Limiters, Client}
- CratesIOSource{BaseURL, Registry, Limiters, Client}
- RubyGemsSource{BaseURL, Registry, Limiters, Client}
- MavenSource{BaseURL, Registry, Limiters, Client}
- NuGetSource{BaseURL, Registry, Limiters, Client}
- GoProxySource{BaseURL, Registry, Limiters, Client}
- PackagistSource{BaseURL, Registry, Limiters, Client}
- DockerHubSource{BaseURL, Registry, Limiters, Client}
- KubernetesSource{BaseURL, Registry, Limiters, Client}
- TerraformSource{BaseURL, Registry, Limiters, Client}
- HelmSource{BaseURL, Registry, Limiters, Client}
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Wire Phase 13 sources into RegisterAll and update register_test</name>
<files>pkg/recon/sources/register.go, pkg/recon/sources/register_test.go</files>
<action>
**register.go updates:**
1. Add a `// Phase 13: Package registry sources (credentialless).` comment block after the Phase 12 cloud storage block
2. Register all 8 package registry sources as struct literals (no New* constructors needed since they're credentialless):
```go
engine.Register(&NpmSource{Registry: reg, Limiters: lim})
engine.Register(&PyPISource{Registry: reg, Limiters: lim})
engine.Register(&CratesIOSource{Registry: reg, Limiters: lim})
engine.Register(&RubyGemsSource{Registry: reg, Limiters: lim})
engine.Register(&MavenSource{Registry: reg, Limiters: lim})
engine.Register(&NuGetSource{Registry: reg, Limiters: lim})
engine.Register(&GoProxySource{Registry: reg, Limiters: lim})
engine.Register(&PackagistSource{Registry: reg, Limiters: lim})
```
3. Add a `// Phase 13: Container & IaC sources (credentialless).` comment block
4. Register all 4 infra sources:
```go
engine.Register(&DockerHubSource{Registry: reg, Limiters: lim})
engine.Register(&KubernetesSource{Registry: reg, Limiters: lim})
engine.Register(&TerraformSource{Registry: reg, Limiters: lim})
engine.Register(&HelmSource{Registry: reg, Limiters: lim})
```
5. Update the RegisterAll doc comment: change "28 sources total" to "40 sources total" and mention Phase 13
6. No new SourcesConfig fields needed — all Phase 13 sources are credentialless
**register_test.go updates:**
1. Rename `TestRegisterAll_WiresAllTwentyEightSources` to `TestRegisterAll_WiresAllFortySources`
2. Update `want` slice to include all 12 new names in alphabetical order: "crates", "dockerhub", "goproxy", "helm", "k8s", "maven", "npm", "nuget", "packagist", "pypi", "rubygems", "terraform" merged into existing list
3. Update `TestRegisterAll_MissingCredsStillRegistered` count from 28 to 40
4. The full sorted list should be: azureblob, binaryedge, bing, bitbucket, brave, censys, codeberg, codesandbox, crates, dockerhub, dospaces, duckduckgo, fofa, gcs, gist, gistpaste, github, gitlab, google, goproxy, helm, huggingface, k8s, kaggle, maven, netlas, npm, nuget, packagist, pastebin, pastesites, pypi, replit, rubygems, s3, sandboxes, shodan, spaces, terraform, yandex, zoomeye
Wait — that's 41. Let me recount existing: azureblob, binaryedge, bing, bitbucket, brave, censys, codeberg, codesandbox, duckduckgo, fofa, gcs, gist, gistpaste, github, gitlab, google, huggingface, kaggle, netlas, pastebin, pastesites, replit, s3, sandboxes, shodan, spaces, yandex, zoomeye = 28.
Add 12 new: crates, dockerhub, goproxy, helm, k8s, maven, npm, nuget, packagist, pypi, rubygems, terraform = 12.
But wait — check if dospaces is already in the list. Looking at register.go: DOSpacesScanner is registered. Check its Name(). Need to verify.
Read the current want list from register_test.go to be precise. It has 28 entries already listed. Add the 12 new ones merged alphabetically. Total = 40.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestRegisterAll" -v -count=1</automated>
</verify>
<done>RegisterAll registers all 40 sources. TestRegisterAll_WiresAllFortySources passes with complete sorted name list. Missing creds test asserts 40.</done>
</task>
<task type="auto">
<name>Task 2: Extend integration test with Phase 13 httptest handlers</name>
<files>pkg/recon/sources/integration_test.go, cmd/recon.go</files>
<action>
**integration_test.go updates:**
1. Add httptest mux handlers for all 12 new sources. Each handler serves canned JSON/HTML fixture matching the API format that source expects:
**npm** — `mux.HandleFunc("/npm/-/v1/search", ...)` returning `{"objects": [{"package": {"name": "leak-pkg", "links": {"npm": "https://npmjs.com/package/leak-pkg"}}}]}`
**pypi** — `mux.HandleFunc("/pypi/search/", ...)` returning HTML with `<a href="/project/leaked-pkg/">` links
**crates** — `mux.HandleFunc("/crates/api/v1/crates", ...)` returning `{"crates": [{"name": "leaked-crate"}]}`
**rubygems** — `mux.HandleFunc("/rubygems/api/v1/search.json", ...)` returning `[{"name": "leaked-gem", "project_uri": "https://rubygems.org/gems/leaked-gem"}]`
**maven** — `mux.HandleFunc("/maven/solrsearch/select", ...)` returning `{"response": {"docs": [{"g": "com.leak", "a": "sdk", "latestVersion": "1.0"}]}}`
**nuget** — `mux.HandleFunc("/nuget/query", ...)` returning `{"data": [{"id": "LeakedPkg", "version": "1.0"}]}`
**goproxy** — `mux.HandleFunc("/goproxy/search", ...)` returning HTML with `<a href="/github.com/leak/module">` links
**packagist** — `mux.HandleFunc("/packagist/search.json", ...)` returning `{"results": [{"name": "vendor/leaked", "url": "https://packagist.org/packages/vendor/leaked"}]}`
**dockerhub** — `mux.HandleFunc("/dockerhub/v2/search/repositories/", ...)` returning `{"results": [{"repo_name": "user/leaked-image"}]}`
**k8s** — `mux.HandleFunc("/k8s/api/v1/packages/search", ...)` returning `{"packages": [{"name": "leaked-operator", "repository": {"name": "bitnami", "kind": 6}}]}`
**terraform** — `mux.HandleFunc("/terraform/v1/modules", ...)` returning `{"modules": [{"namespace": "hashicorp", "name": "leaked", "provider": "aws"}]}`
**helm** — `mux.HandleFunc("/helm/api/v1/packages/search", ...)` returning `{"packages": [{"name": "leaked-chart", "repository": {"name": "bitnami", "kind": 0}}]}`
NOTE: The mux path prefixes (e.g., `/npm/`, `/pypi/`) are conventions to route in a single httptest server. Each source constructor in the test sets BaseURL to `srv.URL + "/npm"`, `srv.URL + "/pypi"`, etc.
2. Register each new source with BaseURL pointing at `srv.URL + "/{prefix}"`:
```go
engine.Register(&NpmSource{BaseURL: srv.URL + "/npm", Registry: reg, Limiters: lim, Client: NewClient()})
// ... same for all 12
```
3. Update the expected SourceType set to include all 12 new types: "recon:npm", "recon:pypi", "recon:crates", "recon:rubygems", "recon:maven", "recon:nuget", "recon:goproxy", "recon:packagist", "recon:dockerhub", "recon:k8s", "recon:terraform", "recon:helm"
4. Update the test name/comment from "28 sources" to "40 sources"
**cmd/recon.go updates:**
- No new SourcesConfig fields needed since all Phase 13 sources are credentialless
- Verify the existing cmd/recon.go RegisterAll call passes through correctly — no changes expected but confirm no compilation errors
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestIntegration_AllSources" -v -count=1 -timeout=60s</automated>
</verify>
<done>Integration test passes with all 40 sources producing at least one finding each via httptest. Full package compiles clean.</done>
</task>
</tasks>
<verification>
Full test suite passes:
```bash
go test ./pkg/recon/sources/ -v -count=1 -timeout=120s
go vet ./pkg/recon/sources/
go build ./cmd/...
```
</verification>
<success_criteria>
- RegisterAll registers 40 sources (28 existing + 12 new)
- register_test.go asserts exact 40-name sorted list
- Integration test exercises all 40 sources via httptest
- cmd/recon.go compiles with updated register.go
- `go test ./pkg/recon/sources/ -count=1` all green
</success_criteria>
<output>
After completion, create `.planning/phases/13-osint_package_registries_container_iac/13-04-SUMMARY.md`
</output>