docs(14): create phase plan

salvacybersec
2026-04-06 13:12:01 +03:00
parent dc90785ab0
commit 06b0ae0e91
5 changed files with 746 additions and 1 deletions


@@ -287,7 +287,13 @@ Plans:
3. `keyhunter recon --sources=wayback` queries the CDX API for historical snapshots of target domains and scans retrieved content
4. `keyhunter recon --sources=commoncrawl` searches CommonCrawl indexes for pages matching LLM provider keywords and scans WARC records
5. `keyhunter recon --sources=sourcemaps,webpack,dotenv,swagger,deploypreview` each extract and scan the relevant JS artifacts and configuration files
**Plans**: 4 plans (previously TBD)
Plans:
- [ ] 14-01-PLAN.md — CI/CD log sources: GitHubActions, TravisCI, CircleCI, Jenkins, GitLabCI
- [ ] 14-02-PLAN.md — Web archive sources: Wayback Machine, CommonCrawl
- [ ] 14-03-PLAN.md — Frontend leak sources: SourceMap, Webpack, EnvLeak, Swagger, DeployPreview
- [ ] 14-04-PLAN.md — RegisterAll wiring + integration test (all Phase 14 reqs)
### Phase 15: OSINT Forums, Collaboration & Log Aggregators
**Goal**: Users can search developer forums, public collaboration tool pages, and exposed monitoring dashboards for leaked API keys, covering Stack Overflow, Reddit, HackerNews, dev.to, Telegram channels, Discord, Notion, Confluence, Trello, Google Docs, Elasticsearch, Grafana, and Sentry


@@ -0,0 +1,204 @@
---
phase: 14-osint_ci_cd_logs_web_archives_frontend_leaks
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
- pkg/recon/sources/ghactions.go
- pkg/recon/sources/ghactions_test.go
- pkg/recon/sources/travisci.go
- pkg/recon/sources/travisci_test.go
- pkg/recon/sources/circleci.go
- pkg/recon/sources/circleci_test.go
- pkg/recon/sources/jenkins.go
- pkg/recon/sources/jenkins_test.go
- pkg/recon/sources/gitlabci.go
- pkg/recon/sources/gitlabci_test.go
autonomous: true
requirements:
- RECON-CI-01
- RECON-CI-02
- RECON-CI-03
- RECON-CI-04
must_haves:
  truths:
    - "GitHub Actions workflow log scanning finds keys in public run logs"
    - "Travis CI and CircleCI build log scanning finds keys in public logs"
    - "Jenkins exposed instance scanning finds keys in console output"
    - "GitLab CI pipeline trace scanning finds keys in job traces"
  artifacts:
    - path: "pkg/recon/sources/ghactions.go"
      provides: "GitHubActionsSource implementing ReconSource"
      contains: "func (s *GitHubActionsSource) Sweep"
    - path: "pkg/recon/sources/travisci.go"
      provides: "TravisCISource implementing ReconSource"
      contains: "func (s *TravisCISource) Sweep"
    - path: "pkg/recon/sources/circleci.go"
      provides: "CircleCISource implementing ReconSource"
      contains: "func (s *CircleCISource) Sweep"
    - path: "pkg/recon/sources/jenkins.go"
      provides: "JenkinsSource implementing ReconSource"
      contains: "func (s *JenkinsSource) Sweep"
    - path: "pkg/recon/sources/gitlabci.go"
      provides: "GitLabCISource implementing ReconSource"
      contains: "func (s *GitLabCISource) Sweep"
  key_links:
    - from: "pkg/recon/sources/ghactions.go"
      to: "pkg/recon/source.go"
      via: "implements ReconSource interface"
      pattern: "var _ recon\\.ReconSource"
    - from: "pkg/recon/sources/travisci.go"
      to: "pkg/recon/source.go"
      via: "implements ReconSource interface"
      pattern: "var _ recon\\.ReconSource"
---
<objective>
Implement five CI/CD build log scanning sources: GitHubActionsSource, TravisCISource, CircleCISource, JenkinsSource, and GitLabCISource. Each searches public build logs/pipeline traces for leaked API keys.
Purpose: CI/CD logs are a top vector for key leaks -- build systems often print environment variables, secret injection failures, or debug output containing API keys. Covering the five major CI platforms gives broad detection coverage.
Output: 5 source files + 5 test files in pkg/recon/sources/
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/register.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/npm.go
@pkg/recon/sources/npm_test.go
<interfaces>
From pkg/recon/source.go:
```go
type Finding = engine.Finding

type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```
From pkg/recon/sources/httpclient.go:
```go
type Client struct { HTTP *http.Client; MaxRetries int; UserAgent string }
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```
From pkg/recon/sources/queries.go:
```go
func BuildQueries(reg *providers.Registry, source string) []string
```
From pkg/recon/sources/register.go:
```go
type SourcesConfig struct {
	GitHubToken string
	GitLabToken string
	// ... other fields
	Registry *providers.Registry
	Limiters *recon.LimiterRegistry
}
```
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Implement GitHubActionsSource and TravisCISource with tests</name>
<files>pkg/recon/sources/ghactions.go, pkg/recon/sources/ghactions_test.go, pkg/recon/sources/travisci.go, pkg/recon/sources/travisci_test.go</files>
<action>
Create GitHubActionsSource (RECON-CI-01):
- Struct fields: Token string, BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, Client *Client
- Name() returns "github-actions"
- RateLimit: rate.Every(2*time.Second), Burst: 3
- RespectsRobots: false (API-based)
- Enabled: returns true only when Token is non-empty
- Sweep: For each query from BuildQueries(registry, "github-actions"), call GET /search/code?q={query}+path:.github/workflows to find repos whose workflow files reference provider keywords, then emit one finding per match with SourceType "recon:github-actions". Auth via "Authorization: Bearer {token}" header. The fuller approach would fetch the run logs through the GitHub Actions API (GET /repos/{owner}/{repo}/actions/runs?per_page=5, then GET /repos/{owner}/{repo}/actions/runs/{run_id}/logs, which returns a zip); for simplicity, this plan stops at the code search match.
- Compile-time interface check: var _ recon.ReconSource = (*GitHubActionsSource)(nil)
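The compile-time check and token gating listed above can be sketched as follows. This is a minimal illustration, not the project's code: `Config` and `ReconSource` are trimmed stand-ins for the types in pkg/recon/source.go (the real interface also has RateLimit, Burst, RespectsRobots, and Sweep), and the token value is a placeholder.

```go
package main

import "fmt"

// Config and ReconSource are trimmed stand-ins (assumptions) for the types in
// pkg/recon/source.go; only the methods needed for this sketch are kept.
type Config struct{}

type ReconSource interface {
	Name() string
	Enabled(cfg Config) bool
}

// GitHubActionsSource mirrors only the Token field from the plan.
type GitHubActionsSource struct {
	Token string
}

// Compile-time check: the build fails if *GitHubActionsSource ever stops
// satisfying ReconSource.
var _ ReconSource = (*GitHubActionsSource)(nil)

func (s *GitHubActionsSource) Name() string { return "github-actions" }

// Enabled gates the source on a non-empty token.
func (s *GitHubActionsSource) Enabled(_ Config) bool { return s.Token != "" }

func main() {
	withToken := &GitHubActionsSource{Token: "placeholder-token"}
	withoutToken := &GitHubActionsSource{}
	fmt.Println(withToken.Name(), withToken.Enabled(Config{}), withoutToken.Enabled(Config{}))
}
```

The blank-identifier assignment costs nothing at runtime; it only forces the compiler to verify the method set.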
Create TravisCISource (RECON-CI-02):
- Struct fields: BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, Client *Client
- Name() returns "travis"
- RateLimit: rate.Every(3*time.Second), Burst: 2
- RespectsRobots: true (web scraping)
- Enabled: always true (credentialless, public logs)
- Sweep: For each query from BuildQueries, use Travis CI API v3: GET https://api.travis-ci.com/repos?search={query}&sort_by=recent_activity&limit=5, then for each repo fetch recent builds GET /repo/{slug}/builds?limit=3, then fetch job logs GET /job/{id}/log.txt. Parse log text for provider keywords. Emit findings with SourceType "recon:travis". Use "Travis-API-Version: 3" header.
Tests: Use httptest.NewServer with fixture JSON responses. Test Sweep extracts findings from mock API responses. Test Enabled returns correct boolean based on token presence (for GHActions). Test context cancellation stops early.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestGitHubActions|TestTravis" -count=1 -v</automated>
</verify>
<done>GitHubActionsSource and TravisCISource implement ReconSource, emit findings from mock CI logs, all tests pass</done>
</task>
<task type="auto">
<name>Task 2: Implement CircleCISource, JenkinsSource, and GitLabCISource with tests</name>
<files>pkg/recon/sources/circleci.go, pkg/recon/sources/circleci_test.go, pkg/recon/sources/jenkins.go, pkg/recon/sources/jenkins_test.go, pkg/recon/sources/gitlabci.go, pkg/recon/sources/gitlabci_test.go</files>
<action>
Create CircleCISource (RECON-CI-02):
- Struct fields: BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, Client *Client
- Name() returns "circleci"
- RateLimit: rate.Every(3*time.Second), Burst: 2
- RespectsRobots: false (API-based)
- Enabled: always true (public project builds are accessible without auth)
- Sweep: The CircleCI v2 API (e.g. GET https://circleci.com/api/v2/insights/{project-slug}/workflows?branch=main) requires auth for most endpoints, so use the v1.1 public endpoint pattern instead. For each query, list recent builds of public repos discovered via keyword search with GET https://circleci.com/api/v1.1/project/github/{org}/{repo}?limit=5&filter=completed, then fetch the build output. Emit findings with SourceType "recon:circleci".
Create JenkinsSource (RECON-CI-03):
- Struct fields: BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, Client *Client
- Name() returns "jenkins"
- RateLimit: rate.Every(5*time.Second), Burst: 1
- RespectsRobots: true (web scraping exposed instances)
- Enabled: always true (credentialless, scans exposed instances)
- Sweep: For each query, construct URLs for common exposed Jenkins patterns: {domain}/job/{query}/lastBuild/consoleText. Use provider keywords to search for known Jenkins instances via the query parameter. Emit findings with SourceType "recon:jenkins". Slower rate limit (5s) because scanning exposed instances should be cautious.
Create GitLabCISource (RECON-CI-04):
- Struct fields: Token string, BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, Client *Client
- Name() returns "gitlab-ci"
- RateLimit: rate.Every(2*time.Second), Burst: 3
- RespectsRobots: false (API-based)
- Enabled: returns true only when Token is non-empty
- Sweep: Use GitLab API: GET https://gitlab.com/api/v4/projects?search={query}&visibility=public&per_page=5, then for each project GET /api/v4/projects/{id}/pipelines?per_page=3, then for each pipeline GET /api/v4/projects/{id}/pipelines/{pipeline_id}/jobs to obtain job IDs, then GET /api/v4/projects/{id}/jobs/{job_id}/trace. Auth via "PRIVATE-TOKEN: {token}" header. Emit findings with SourceType "recon:gitlab-ci".
Tests for all three: httptest.NewServer with fixture responses. Test Sweep emits findings. Test Enabled logic. Test context cancellation.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestCircleCI|TestJenkins|TestGitLabCI" -count=1 -v</automated>
</verify>
<done>CircleCISource, JenkinsSource, and GitLabCISource implement ReconSource, emit findings from mock responses, all tests pass</done>
</task>
</tasks>
<verification>
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestGitHubActions|TestTravis|TestCircleCI|TestJenkins|TestGitLabCI" -count=1 -v
cd /home/salva/Documents/apikey && go vet ./pkg/recon/sources/
</verification>
<success_criteria>
- 5 new source files compile and implement ReconSource (var _ check)
- 5 test files pass with httptest mocks
- All 5 sources use BuildQueries + Client + LimiterRegistry pattern
- GitHubActionsSource and GitLabCISource gate on Token; others always enabled
</success_criteria>
<output>
After completion, create `.planning/phases/14-osint_ci_cd_logs_web_archives_frontend_leaks/14-01-SUMMARY.md`
</output>


@@ -0,0 +1,163 @@
---
phase: 14-osint_ci_cd_logs_web_archives_frontend_leaks
plan: 02
type: execute
wave: 1
depends_on: []
files_modified:
- pkg/recon/sources/wayback.go
- pkg/recon/sources/wayback_test.go
- pkg/recon/sources/commoncrawl.go
- pkg/recon/sources/commoncrawl_test.go
autonomous: true
requirements:
- RECON-ARCH-01
- RECON-ARCH-02
must_haves:
  truths:
    - "Wayback Machine CDX API queries find historical snapshots containing provider keywords"
    - "CommonCrawl index search finds pages matching provider keywords and scans WARC content"
  artifacts:
    - path: "pkg/recon/sources/wayback.go"
      provides: "WaybackSource implementing ReconSource"
      contains: "func (s *WaybackSource) Sweep"
    - path: "pkg/recon/sources/commoncrawl.go"
      provides: "CommonCrawlSource implementing ReconSource"
      contains: "func (s *CommonCrawlSource) Sweep"
  key_links:
    - from: "pkg/recon/sources/wayback.go"
      to: "pkg/recon/source.go"
      via: "implements ReconSource interface"
      pattern: "var _ recon\\.ReconSource"
    - from: "pkg/recon/sources/commoncrawl.go"
      to: "pkg/recon/source.go"
      via: "implements ReconSource interface"
      pattern: "var _ recon\\.ReconSource"
---
<objective>
Implement two web archive scanning sources: WaybackSource (Wayback Machine CDX API) and CommonCrawlSource (CommonCrawl index API). Both search historical web snapshots for leaked API keys.
Purpose: Web archives preserve historical versions of pages that may have since been scrubbed. Keys accidentally exposed in config files, JavaScript, or API documentation may persist in archive snapshots even after removal from the live site.
Output: 2 source files + 2 test files in pkg/recon/sources/
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/npm.go
@pkg/recon/sources/npm_test.go
<interfaces>
From pkg/recon/source.go:
```go
type Finding = engine.Finding

type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```
From pkg/recon/sources/httpclient.go:
```go
type Client struct { HTTP *http.Client; MaxRetries int; UserAgent string }
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```
From pkg/recon/sources/queries.go:
```go
func BuildQueries(reg *providers.Registry, source string) []string
```
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Implement WaybackSource with tests</name>
<files>pkg/recon/sources/wayback.go, pkg/recon/sources/wayback_test.go</files>
<action>
Create WaybackSource (RECON-ARCH-01):
- Struct fields: BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, Client *Client
- Name() returns "wayback"
- RateLimit: rate.Every(5*time.Second), Burst: 1 (Wayback CDX API is rate-sensitive)
- RespectsRobots: true (web archive, respect their robots.txt)
- Enabled: always true (credentialless, public CDX API)
- Sweep: For each query from BuildQueries(registry, "wayback"):
1. Query CDX API: GET http://web.archive.org/cdx/search/cdx?url=*.{domain}/*&output=json&fl=timestamp,original,statuscode&filter=statuscode:200&limit=10&matchType=domain where domain is derived from the query keyword (e.g., "api.openai.com" for OpenAI keywords). For generic keywords like "sk-proj-" that map to no domain, fall back to a URL-pattern query, since the CDX API matches on captured URLs rather than page content: GET http://web.archive.org/cdx/search/cdx?url=*&output=json&fl=timestamp,original&limit=10 with the keyword in the URL pattern.
2. For each CDX result, the snapshot URL is: https://web.archive.org/web/{timestamp}/{original_url}
3. Emit findings with Source set to the snapshot URL and SourceType "recon:wayback"
4. Do NOT fetch the actual archived page content (that would be too slow and bandwidth-heavy). Instead, emit the CDX match as a lead for further investigation.
- BaseURL defaults to "http://web.archive.org" if empty (allows test injection).
- Compile-time interface check: var _ recon.ReconSource = (*WaybackSource)(nil)
Test: httptest.NewServer returning CDX JSON fixture (array-of-arrays format: [["timestamp","original","statuscode"],["20240101120000","https://example.com/config.js","200"]]). Verify Sweep emits findings with correct snapshot URLs. Test context cancellation. Test empty CDX response produces no findings.
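The CDX-to-snapshot-URL step described above can be sketched as below. `snapshotURLs` is a hypothetical helper name, and the sketch assumes the first two fields of each row are timestamp and original, matching the `fl=` order used in the query.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// snapshotURLs is a hypothetical helper: it decodes a CDX array-of-arrays
// response and builds one Wayback snapshot URL per data row. It assumes each
// row starts with the timestamp followed by the original URL.
func snapshotURLs(baseURL string, cdx []byte) ([]string, error) {
	if len(bytes.TrimSpace(cdx)) == 0 {
		return nil, nil // empty body: no findings
	}
	var rows [][]string
	if err := json.Unmarshal(cdx, &rows); err != nil {
		return nil, err
	}
	var urls []string
	for i, row := range rows {
		// Row 0 is the header naming the fields; short rows are skipped.
		if i == 0 || len(row) < 2 {
			continue
		}
		urls = append(urls, fmt.Sprintf("%s/web/%s/%s", baseURL, row[0], row[1]))
	}
	return urls, nil
}

func main() {
	fixture := []byte(`[["timestamp","original","statuscode"],["20240101120000","https://example.com/config.js","200"]]`)
	urls, err := snapshotURLs("https://web.archive.org", fixture)
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```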
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestWayback" -count=1 -v</automated>
</verify>
<done>WaybackSource implements ReconSource, queries CDX API via mock, emits findings with archive snapshot URLs, all tests pass</done>
</task>
<task type="auto">
<name>Task 2: Implement CommonCrawlSource with tests</name>
<files>pkg/recon/sources/commoncrawl.go, pkg/recon/sources/commoncrawl_test.go</files>
<action>
Create CommonCrawlSource (RECON-ARCH-02):
- Struct fields: BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, Client *Client
- Name() returns "commoncrawl"
- RateLimit: rate.Every(5*time.Second), Burst: 1 (CommonCrawl index is rate-sensitive)
- RespectsRobots: false (API-based index query, not scraping)
- Enabled: always true (credentialless, public index API)
- Sweep: For each query from BuildQueries(registry, "commoncrawl"):
1. Query CommonCrawl Index API: GET https://index.commoncrawl.org/CC-MAIN-2024-10-index?url=*.{domain}/*&output=json&limit=10 where CC-MAIN-2024-10 is a recent crawl ID at the time of writing (hardcode it as the default; it can be updated later). For keyword-based queries with no associated domain, match the keyword against the URL pattern.
2. CommonCrawl index returns NDJSON (one JSON object per line), each with fields: url, timestamp, filename, offset, length.
3. Emit findings with Source set to the matched URL and SourceType "recon:commoncrawl". Include the WARC filename in the finding metadata for follow-up retrieval.
4. Do NOT fetch actual WARC records (too large). Emit index matches as leads.
- BaseURL defaults to "https://index.commoncrawl.org" if empty.
- Use a CrawlID field (default "CC-MAIN-2024-10") to allow specifying which crawl index to search.
- Compile-time interface check: var _ recon.ReconSource = (*CommonCrawlSource)(nil)
Test: httptest.NewServer returning NDJSON fixture (one JSON object per line with url, timestamp, filename fields). Verify Sweep emits findings. Test empty response. Test context cancellation. Test malformed NDJSON lines are skipped gracefully.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestCommonCrawl" -count=1 -v</automated>
</verify>
<done>CommonCrawlSource implements ReconSource, queries index API via mock, emits findings from NDJSON results, all tests pass</done>
</task>
</tasks>
<verification>
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestWayback|TestCommonCrawl" -count=1 -v
cd /home/salva/Documents/apikey && go vet ./pkg/recon/sources/
</verification>
<success_criteria>
- 2 new source files compile and implement ReconSource (var _ check)
- 2 test files pass with httptest mocks
- Both sources use BuildQueries + Client + LimiterRegistry pattern
- Both are credentialless (always enabled)
- WaybackSource constructs correct Wayback snapshot URLs from CDX results
- CommonCrawlSource parses NDJSON line-by-line
</success_criteria>
<output>
After completion, create `.planning/phases/14-osint_ci_cd_logs_web_archives_frontend_leaks/14-02-SUMMARY.md`
</output>


@@ -0,0 +1,196 @@
---
phase: 14-osint_ci_cd_logs_web_archives_frontend_leaks
plan: 03
type: execute
wave: 1
depends_on: []
files_modified:
- pkg/recon/sources/sourcemap.go
- pkg/recon/sources/sourcemap_test.go
- pkg/recon/sources/webpack.go
- pkg/recon/sources/webpack_test.go
- pkg/recon/sources/envleak.go
- pkg/recon/sources/envleak_test.go
- pkg/recon/sources/swagger.go
- pkg/recon/sources/swagger_test.go
- pkg/recon/sources/deploypreview.go
- pkg/recon/sources/deploypreview_test.go
autonomous: true
requirements:
- RECON-JS-01
- RECON-JS-02
- RECON-JS-03
- RECON-JS-04
- RECON-JS-05
must_haves:
  truths:
    - "Source map extraction discovers original source files containing API keys"
    - "Webpack/Vite bundle scanning finds inlined env vars with API keys"
    - "Exposed .env file scanning finds publicly accessible environment files"
    - "Swagger/OpenAPI doc scanning finds API keys in example fields"
    - "Vercel/Netlify deploy preview scanning finds keys in JS bundles"
  artifacts:
    - path: "pkg/recon/sources/sourcemap.go"
      provides: "SourceMapSource implementing ReconSource"
      contains: "func (s *SourceMapSource) Sweep"
    - path: "pkg/recon/sources/webpack.go"
      provides: "WebpackSource implementing ReconSource"
      contains: "func (s *WebpackSource) Sweep"
    - path: "pkg/recon/sources/envleak.go"
      provides: "EnvLeakSource implementing ReconSource"
      contains: "func (s *EnvLeakSource) Sweep"
    - path: "pkg/recon/sources/swagger.go"
      provides: "SwaggerSource implementing ReconSource"
      contains: "func (s *SwaggerSource) Sweep"
    - path: "pkg/recon/sources/deploypreview.go"
      provides: "DeployPreviewSource implementing ReconSource"
      contains: "func (s *DeployPreviewSource) Sweep"
  key_links:
    - from: "pkg/recon/sources/sourcemap.go"
      to: "pkg/recon/source.go"
      via: "implements ReconSource interface"
      pattern: "var _ recon\\.ReconSource"
    - from: "pkg/recon/sources/envleak.go"
      to: "pkg/recon/source.go"
      via: "implements ReconSource interface"
      pattern: "var _ recon\\.ReconSource"
---
<objective>
Implement five frontend leak scanning sources: SourceMapSource, WebpackSource, EnvLeakSource, SwaggerSource, and DeployPreviewSource. Each targets a different vector for API key exposure in client-facing web assets.
Purpose: Frontend JavaScript bundles, source maps, exposed .env files, API documentation, and deploy previews are high-value targets where developers accidentally ship server-side secrets to the client. These are often reachable without authentication.
Output: 5 source files + 5 test files in pkg/recon/sources/
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
@pkg/recon/sources/npm.go
@pkg/recon/sources/npm_test.go
<interfaces>
From pkg/recon/source.go:
```go
type Finding = engine.Finding

type ReconSource interface {
	Name() string
	RateLimit() rate.Limit
	Burst() int
	RespectsRobots() bool
	Enabled(cfg Config) bool
	Sweep(ctx context.Context, query string, out chan<- Finding) error
}
```
From pkg/recon/sources/httpclient.go:
```go
type Client struct { HTTP *http.Client; MaxRetries int; UserAgent string }
func NewClient() *Client
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error)
```
From pkg/recon/sources/queries.go:
```go
func BuildQueries(reg *providers.Registry, source string) []string
```
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Implement SourceMapSource, WebpackSource, and EnvLeakSource with tests</name>
<files>pkg/recon/sources/sourcemap.go, pkg/recon/sources/sourcemap_test.go, pkg/recon/sources/webpack.go, pkg/recon/sources/webpack_test.go, pkg/recon/sources/envleak.go, pkg/recon/sources/envleak_test.go</files>
<action>
Create SourceMapSource (RECON-JS-01):
- Struct fields: BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, Client *Client
- Name() returns "sourcemaps"
- RateLimit: rate.Every(3*time.Second), Burst: 2
- RespectsRobots: true (fetching web resources)
- Enabled: always true (credentialless)
- Sweep: For each query from BuildQueries(registry, "sourcemaps"), construct common source map URL patterns to probe. The source uses the query as a domain/URL hint and checks common paths: {url}.map, {url}/main.js.map, {url}/static/js/main.*.js.map. For each accessible .map file, the response contains a JSON object with "sources" and "sourcesContent" arrays -- the sourcesContent contains original source code that may have API keys. Emit findings with SourceType "recon:sourcemaps" and Source set to the map file URL.
- Since we cannot enumerate all domains, Sweep uses BuildQueries to get provider-related keywords and constructs probe URLs. The source is a lead generator -- it emits URLs where source maps were found accessible.
- Compile-time interface check: var _ recon.ReconSource = (*SourceMapSource)(nil)
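To illustrate the "sources" and "sourcesContent" structure mentioned above, the following sketch decodes a source map and lists the original files whose embedded content mentions a keyword. `sourcesMentioning` is a hypothetical helper; plain substring matching stands in for whatever detection the engine applies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sourceMap keeps only the two source map fields this sketch needs.
type sourceMap struct {
	Sources        []string `json:"sources"`
	SourcesContent []string `json:"sourcesContent"`
}

// sourcesMentioning is a hypothetical helper: it returns the names of the
// original source files whose embedded content contains any keyword.
func sourcesMentioning(raw []byte, keywords []string) ([]string, error) {
	var sm sourceMap
	if err := json.Unmarshal(raw, &sm); err != nil {
		return nil, err
	}
	var hits []string
	for i, content := range sm.SourcesContent {
		if i >= len(sm.Sources) {
			break // the two arrays may differ in length
		}
		for _, kw := range keywords {
			if kw != "" && strings.Contains(content, kw) {
				hits = append(hits, sm.Sources[i])
				break
			}
		}
	}
	return hits, nil
}

func main() {
	// null entries in sourcesContent decode to empty strings.
	raw := []byte(`{"version":3,"sources":["src/config.ts","src/app.ts"],"sourcesContent":["const k = \"sk-proj-abc\";",null]}`)
	hits, err := sourcesMentioning(raw, []string{"sk-proj-"})
	if err != nil {
		panic(err)
	}
	fmt.Println(hits)
}
```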
Create WebpackSource (RECON-JS-02):
- Struct fields: BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, Client *Client
- Name() returns "webpack"
- RateLimit: rate.Every(3*time.Second), Burst: 2
- RespectsRobots: true (fetching web resources)
- Enabled: always true (credentialless)
- Sweep: For each query, probe common Webpack/Vite build artifact paths: /_next/static/chunks/*, /static/js/main.*.js, /assets/index-*.js, /dist/bundle.js. Look for patterns like process.env.NEXT_PUBLIC_, REACT_APP_, VITE_ prefixed variables that often contain API keys. Emit findings with SourceType "recon:webpack". The source emits leads for URLs containing webpack build artifacts with env var patterns.
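The env var prefix matching described above could be approximated with a single regular expression. This is a sketch under assumptions: `envVarNames` is a hypothetical helper, and it only catches variable names that survive in the bundle text, not values that the bundler has already inlined.

```go
package main

import (
	"fmt"
	"regexp"
)

// envVarPattern matches identifiers that start with one of the three
// public-env prefixes named in the plan.
var envVarPattern = regexp.MustCompile(`\b(?:NEXT_PUBLIC_|REACT_APP_|VITE_)[A-Z0-9_]+`)

// envVarNames is a hypothetical helper: it returns the distinct matching
// variable names in order of first appearance.
func envVarNames(bundle string) []string {
	seen := make(map[string]bool)
	var names []string
	for _, m := range envVarPattern.FindAllString(bundle, -1) {
		if !seen[m] {
			seen[m] = true
			names = append(names, m)
		}
	}
	return names
}

func main() {
	bundle := `var a=process.env.REACT_APP_OPENAI_KEY;var b=import.meta.env.VITE_API_URL;var c=process.env.REACT_APP_OPENAI_KEY;`
	fmt.Println(envVarNames(bundle))
}
```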
Create EnvLeakSource (RECON-JS-03):
- Struct fields: BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, Client *Client
- Name() returns "dotenv"
- RateLimit: rate.Every(2*time.Second), Burst: 2
- RespectsRobots: true (probing web servers)
- Enabled: always true (credentialless)
- Sweep: For each query (used as domain hint), probe common exposed .env paths: /.env, /.env.local, /.env.production, /.env.development, /app/.env, /api/.env, /.env.backup, /.env.example. Check if the response contains key=value patterns (specifically lines matching provider keywords). Emit findings with SourceType "recon:dotenv" and Source set to the accessible .env URL. This is a common web vulnerability -- many frameworks serve .env if misconfigured.
Tests for all three: httptest.NewServer returning appropriate fixture content (JSON source map, JS bundle with process.env references, .env file content). Verify Sweep emits findings with correct SourceType. Test empty/404 responses produce no findings. Test context cancellation.
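The key=value check for probed .env responses can be sketched as below. `matchingEnvKeys` is a hypothetical helper; it returns only key names so that values never need to be printed, and it requires an identifier-shaped key so that an HTML error page served with status 200 does not count as a hit.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// envKey accepts identifier-shaped keys such as OPENAI_API_KEY.
var envKey = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// matchingEnvKeys is a hypothetical helper: it returns the keys of key=value
// lines where the key or value mentions a keyword (case-insensitive).
func matchingEnvKeys(body string, keywords []string) []string {
	var keys []string
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		line = strings.TrimPrefix(line, "export ")
		key, value, ok := strings.Cut(line, "=")
		key = strings.TrimSpace(key)
		if !ok || !envKey.MatchString(key) {
			continue
		}
		haystack := strings.ToLower(key + "=" + value)
		for _, kw := range keywords {
			if kw != "" && strings.Contains(haystack, strings.ToLower(kw)) {
				keys = append(keys, key)
				break
			}
		}
	}
	return keys
}

func main() {
	body := "# local settings\nOPENAI_API_KEY=sk-proj-REDACTED\nDEBUG=true\n"
	fmt.Println(matchingEnvKeys(body, []string{"openai"}))
}
```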
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestSourceMap|TestWebpack|TestEnvLeak" -count=1 -v</automated>
</verify>
<done>SourceMapSource, WebpackSource, EnvLeakSource implement ReconSource, emit findings from mocked web responses, all tests pass</done>
</task>
<task type="auto">
<name>Task 2: Implement SwaggerSource and DeployPreviewSource with tests</name>
<files>pkg/recon/sources/swagger.go, pkg/recon/sources/swagger_test.go, pkg/recon/sources/deploypreview.go, pkg/recon/sources/deploypreview_test.go</files>
<action>
Create SwaggerSource (RECON-JS-04):
- Struct fields: BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, Client *Client
- Name() returns "swagger"
- RateLimit: rate.Every(3*time.Second), Burst: 2
- RespectsRobots: true (fetching web resources)
- Enabled: always true (credentialless)
- Sweep: For each query (domain hint), probe common Swagger/OpenAPI documentation paths: /swagger.json, /openapi.json, /api-docs, /v2/api-docs, /swagger/v1/swagger.json, /docs/openapi.json. Parse the JSON response and look for "example" or "default" fields in security scheme definitions or parameter definitions that contain actual API key values (a common misconfiguration where developers put real keys as examples). Emit findings with SourceType "recon:swagger" and Source set to the accessible docs URL.
Create DeployPreviewSource (RECON-JS-05):
- Struct fields: BaseURL string, Registry *providers.Registry, Limiters *recon.LimiterRegistry, Client *Client
- Name() returns "deploypreview"
- RateLimit: rate.Every(3*time.Second), Burst: 2
- RespectsRobots: true (fetching web resources)
- Enabled: always true (credentialless)
- Sweep: For each query, construct Vercel/Netlify deploy preview URL patterns. Vercel previews follow: {project}-{hash}-{team}.vercel.app, Netlify: deploy-preview-{n}--{site}.netlify.app. The source uses BuildQueries to get keywords and searches for deploy preview artifacts. On Vercel previews, probe /_next/data/ routes and inspect the __NEXT_DATA__ script tag embedded in the page HTML; on Netlify previews, probe /static/. Deploy previews often have different (less restrictive) environment variables than production. Emit findings with SourceType "recon:deploypreview".
Tests for both: httptest.NewServer with fixture responses (Swagger JSON with example API keys, HTML with __NEXT_DATA__ containing env vars). Verify Sweep emits findings. Test 404/empty responses. Test context cancellation.
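For the __NEXT_DATA__ fixture case above, extraction from HTML might be sketched as follows. `extractNextData` is a hypothetical helper; a regular expression is used here for brevity, on the assumption that the tag carries a double-quoted id attribute, and a real implementation may prefer an HTML parser.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// nextDataPattern captures the body of the script tag whose id is __NEXT_DATA__.
var nextDataPattern = regexp.MustCompile(`(?s)<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>`)

// extractNextData is a hypothetical helper: it returns the embedded JSON
// payload, or false when the tag is missing or its body is not valid JSON.
func extractNextData(html string) (string, bool) {
	m := nextDataPattern.FindStringSubmatch(html)
	if m == nil || !json.Valid([]byte(m[1])) {
		return "", false
	}
	return m[1], true
}

func main() {
	page := `<html><body><script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{"apiKey":"sk-proj-REDACTED"}}}</script></body></html>`
	payload, ok := extractNextData(page)
	fmt.Println(ok, payload)
}
```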
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestSwagger|TestDeployPreview" -count=1 -v</automated>
</verify>
<done>SwaggerSource and DeployPreviewSource implement ReconSource, emit findings from mocked responses, all tests pass</done>
</task>
</tasks>
<verification>
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestSourceMap|TestWebpack|TestEnvLeak|TestSwagger|TestDeployPreview" -count=1 -v
cd /home/salva/Documents/apikey && go vet ./pkg/recon/sources/
</verification>
<success_criteria>
- 5 new source files compile and implement ReconSource (var _ check)
- 5 test files pass with httptest mocks
- All 5 sources use BuildQueries + Client + LimiterRegistry pattern
- All are credentialless (always enabled)
- Each source has distinct SourceType: recon:sourcemaps, recon:webpack, recon:dotenv, recon:swagger, recon:deploypreview
</success_criteria>
<output>
After completion, create `.planning/phases/14-osint_ci_cd_logs_web_archives_frontend_leaks/14-03-SUMMARY.md`
</output>


@@ -0,0 +1,176 @@
---
phase: 14-osint_ci_cd_logs_web_archives_frontend_leaks
plan: 04
type: execute
wave: 2
depends_on:
- 14-01
- 14-02
- 14-03
files_modified:
- pkg/recon/sources/register.go
- cmd/recon.go
- pkg/recon/sources/register_test.go
autonomous: true
requirements:
- RECON-CI-01
- RECON-CI-02
- RECON-CI-03
- RECON-CI-04
- RECON-ARCH-01
- RECON-ARCH-02
- RECON-JS-01
- RECON-JS-02
- RECON-JS-03
- RECON-JS-04
- RECON-JS-05
must_haves:
  truths:
    - "RegisterAll wires all 12 new Phase 14 sources onto the engine (52 total)"
    - "cmd/recon.go passes GitHub and GitLab tokens to Phase 14 credential-gated sources"
    - "Integration test confirms all 52 sources register and credential-gated ones report Enabled correctly"
  artifacts:
    - path: "pkg/recon/sources/register.go"
      provides: "RegisterAll with 52 sources (40 Phase 10-13 + 12 Phase 14)"
      contains: "Phase 14"
    - path: "pkg/recon/sources/register_test.go"
      provides: "Integration test for all 52 registered sources"
      contains: "52"
  key_links:
    - from: "pkg/recon/sources/register.go"
      to: "pkg/recon/sources/ghactions.go"
      via: "engine.Register call"
      pattern: "GitHubActionsSource"
    - from: "pkg/recon/sources/register.go"
      to: "pkg/recon/sources/wayback.go"
      via: "engine.Register call"
      pattern: "WaybackSource"
    - from: "cmd/recon.go"
      to: "pkg/recon/sources/register.go"
      via: "SourcesConfig population"
      pattern: "sources\\.RegisterAll"
---
<objective>
Wire all 12 Phase 14 sources into RegisterAll and update cmd/recon.go to pass credentials for token-gated sources (GitHubActions reuses GitHubToken, GitLabCI reuses GitLabToken). Add integration test confirming 52 total sources register.
Purpose: This plan connects all Phase 14 source implementations to the engine so `keyhunter recon` can discover and run them. Without wiring, the sources exist but are unreachable.
Output: Updated register.go, cmd/recon.go, and register_test.go
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@pkg/recon/sources/register.go
@cmd/recon.go
<interfaces>
From pkg/recon/sources/register.go (current state):
```go
type SourcesConfig struct {
	GitHubToken string
	GitLabToken string
	// ... existing fields
	Registry *providers.Registry
	Limiters *recon.LimiterRegistry
}

func RegisterAll(engine *recon.Engine, cfg SourcesConfig) {
	// Currently registers 40 sources (Phase 10-13)
}
```
New Phase 14 sources to wire:
- GitHubActionsSource{Token, Registry, Limiters} -- reuses GitHubToken
- TravisCISource{Registry, Limiters} -- credentialless
- CircleCISource{Registry, Limiters} -- credentialless
- JenkinsSource{Registry, Limiters} -- credentialless
- GitLabCISource{Token, Registry, Limiters} -- reuses GitLabToken
- WaybackSource{Registry, Limiters} -- credentialless
- CommonCrawlSource{Registry, Limiters} -- credentialless
- SourceMapSource{Registry, Limiters} -- credentialless
- WebpackSource{Registry, Limiters} -- credentialless
- EnvLeakSource{Registry, Limiters} -- credentialless
- SwaggerSource{Registry, Limiters} -- credentialless
- DeployPreviewSource{Registry, Limiters} -- credentialless
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Wire Phase 14 sources in RegisterAll and update cmd/recon.go</name>
<files>pkg/recon/sources/register.go, cmd/recon.go</files>
<action>
Update RegisterAll in register.go:
1. Add a "Phase 14: CI/CD log sources" section after the Phase 13 block
2. Register GitHubActionsSource with Token from cfg.GitHubToken (reuses existing field -- no new SourcesConfig fields needed)
3. Register TravisCISource, CircleCISource, JenkinsSource as credentialless struct literals with Registry+Limiters
4. Register GitLabCISource with Token from cfg.GitLabToken (reuses existing field)
5. Add a "Phase 14: Web archive sources" section
6. Register WaybackSource and CommonCrawlSource as credentialless struct literals
7. Add a "Phase 14: Frontend leak sources" section
8. Register SourceMapSource, WebpackSource, EnvLeakSource, SwaggerSource, DeployPreviewSource as credentialless struct literals
9. Update the RegisterAll doc comment to say "52 sources total" (was 40)
No changes needed to SourcesConfig -- GitHubActionsSource reuses GitHubToken and GitLabCISource reuses GitLabToken, both already in the struct.
cmd/recon.go: no code changes are expected. buildReconEngine() already populates GitHubToken and GitLabToken, so the new sources receive them through SourcesConfig. Confirm that both tokens are still set there before moving on.
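The wiring steps above could look roughly like the following sketch. It is self-contained for illustration only: `Engine`, `Source`, `tokenSource`, `openSource`, and `registerPhase14` are hypothetical stand-ins, not the project's real `recon.Engine` or source types, and the `Registry` and `Limiters` fields are omitted. The source names are the ones listed in the Task 2 behavior section.

```go
package main

import "fmt"

// Source is a hypothetical stand-in for the project's recon source interface.
type Source interface {
	Name() string
	Enabled() bool
}

// Engine is a hypothetical stand-in for recon.Engine.
type Engine struct{ sources []Source }

// Register appends a source to the engine.
func (e *Engine) Register(s Source) { e.sources = append(e.sources, s) }

// SourcesConfig mirrors only the two token fields that Phase 14 reuses.
type SourcesConfig struct {
	GitHubToken string
	GitLabToken string
}

// tokenSource models a credential-gated source: enabled only when a token is set.
type tokenSource struct{ name, token string }

func (s tokenSource) Name() string  { return s.name }
func (s tokenSource) Enabled() bool { return s.token != "" }

// openSource models a credentialless source: always enabled.
type openSource struct{ name string }

func (s openSource) Name() string  { return s.name }
func (s openSource) Enabled() bool { return true }

// registerPhase14 sketches the three Phase 14 sections of RegisterAll.
func registerPhase14(e *Engine, cfg SourcesConfig) {
	if e == nil {
		return // assumed nil guard, in line with the "nil engine does not panic" test
	}

	// Phase 14: CI/CD log sources
	e.Register(tokenSource{name: "github-actions", token: cfg.GitHubToken})
	for _, n := range []string{"travis", "circleci", "jenkins"} {
		e.Register(openSource{name: n})
	}
	e.Register(tokenSource{name: "gitlab-ci", token: cfg.GitLabToken})

	// Phase 14: Web archive sources
	for _, n := range []string{"wayback", "commoncrawl"} {
		e.Register(openSource{name: n})
	}

	// Phase 14: Frontend leak sources
	for _, n := range []string{"sourcemaps", "webpack", "dotenv", "swagger", "deploypreview"} {
		e.Register(openSource{name: n})
	}
}

func main() {
	e := &Engine{}
	// Only the GitHub token is set, so gitlab-ci should report enabled=false.
	registerPhase14(e, SourcesConfig{GitHubToken: "example-token"})
	for _, s := range e.sources {
		fmt.Printf("%-15s enabled=%v\n", s.Name(), s.Enabled())
	}
}
```

In the real register.go each source would be its own struct literal (for example GitHubActionsSource with Token, Registry, and Limiters), so the loops here are only a compact way to show the grouping.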
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go build ./cmd/... && go vet ./pkg/recon/sources/ ./cmd/...</automated>
</verify>
<done>RegisterAll registers 52 sources, go build succeeds, no new SourcesConfig fields needed</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Integration test for 52-source RegisterAll</name>
<files>pkg/recon/sources/register_test.go</files>
<behavior>
- Test: RegisterAll with nil engine does not panic
- Test: RegisterAll with valid engine registers exactly 52 sources
- Test: GitHubActionsSource.Enabled is false when GitHubToken is empty, true when set
- Test: GitLabCISource.Enabled is false when GitLabToken is empty, true when set
- Test: All credentialless Phase 14 sources (travis, circleci, jenkins, wayback, commoncrawl, sourcemaps, webpack, dotenv, swagger, deploypreview) report Enabled==true
- Test: All 52 source names are unique (no duplicates)
</behavior>
<action>
Update the existing register_test.go, or create it if it does not exist. Follow the pattern from the Phase 13 wiring tests:
1. TestRegisterAll_NilEngine -- call RegisterAll(nil, cfg), assert no panic
2. TestRegisterAll_SourceCount -- create engine, call RegisterAll, assert engine has 52 registered sources
3. TestRegisterAll_Phase14Enabled -- assert credential-gated sources (github-actions, gitlab-ci) report Enabled correctly based on token presence, and all credentialless sources report Enabled==true
4. TestRegisterAll_UniqueNames -- collect all source names, assert no duplicates
Use a minimal SourcesConfig with providers.NewRegistryFromProviders and recon.NewLimiterRegistry. Set GitHubToken and GitLabToken to test values for the enabled tests.
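The count and uniqueness assertions could be built on a small helper like the one sketched below. `duplicateNames` and `checkSources` are hypothetical names introduced for illustration. How the name slice is obtained from the real engine is not shown and depends on whatever listing method the engine exposes.

```go
package main

import (
	"fmt"
	"sort"
)

// duplicateNames returns every name that occurs more than once, sorted.
func duplicateNames(names []string) []string {
	seen := make(map[string]int, len(names))
	for _, n := range names {
		seen[n]++
	}
	var dups []string
	for n, c := range seen {
		if c > 1 {
			dups = append(dups, n)
		}
	}
	sort.Strings(dups)
	return dups
}

// checkSources returns an error when the source count or the
// uniqueness expectation is not met.
func checkSources(names []string, want int) error {
	if len(names) != want {
		return fmt.Errorf("registered %d sources, want %d", len(names), want)
	}
	if dups := duplicateNames(names); len(dups) > 0 {
		return fmt.Errorf("duplicate source names: %v", dups)
	}
	return nil
}

func main() {
	names := []string{"github-actions", "gitlab-ci", "wayback", "wayback"}
	fmt.Println(checkSources(names, 4)) // prints: duplicate source names: [wayback]
}
```

Inside TestRegisterAll_SourceCount and TestRegisterAll_UniqueNames the same check would be called with want set to 52, failing the test through t.Fatal when an error is returned.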
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestRegisterAll" -count=1 -v</automated>
</verify>
<done>Integration test confirms 52 sources registered, credential gating works, no duplicate names, all tests pass</done>
</task>
</tasks>
<verification>
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run "TestRegisterAll" -count=1 -v
cd /home/salva/Documents/apikey && go build ./cmd/... && go vet ./...
</verification>
<success_criteria>
- RegisterAll registers exactly 52 sources (40 existing + 12 new)
- go build ./cmd/... succeeds without errors
- Integration test passes confirming source count, credential gating, and name uniqueness
- No new SourcesConfig fields were needed (reuses GitHubToken and GitLabToken)
</success_criteria>
<output>
After completion, create `.planning/phases/14-osint_ci_cd_logs_web_archives_frontend_leaks/14-04-SUMMARY.md`
</output>