---
phase: 10-osint-code-hosting
plan: 04
type: execute
wave: 2
depends_on: [10-01]
files_modified:
  - pkg/recon/sources/bitbucket.go
  - pkg/recon/sources/bitbucket_test.go
  - pkg/recon/sources/gist.go
  - pkg/recon/sources/gist_test.go
autonomous: true
requirements: [RECON-CODE-03, RECON-CODE-04]
must_haves:
  truths:
    - "BitbucketSource queries Bitbucket 2.0 code search API and emits Findings"
    - "GistSource queries GitHub Gist search (re-uses GitHub token) and emits Findings"
    - "Both disabled when respective credentials are empty"
  artifacts:
    - path: "pkg/recon/sources/bitbucket.go"
      provides: "BitbucketSource implementing recon.ReconSource"
    - path: "pkg/recon/sources/gist.go"
      provides: "GistSource implementing recon.ReconSource"
  key_links:
    - from: "pkg/recon/sources/gist.go"
      to: "pkg/recon/sources/httpclient.go"
      via: "Client.Do with Bearer <github-token>"
      pattern: "client\\.Do"
    - from: "pkg/recon/sources/bitbucket.go"
      to: "pkg/recon/sources/httpclient.go"
      via: "Client.Do"
      pattern: "client\\.Do"
---

<objective>
Implement BitbucketSource (RECON-CODE-03) and GistSource (RECON-CODE-04). Grouped
because both are small API integrations with similar shapes (JSON array/values,
per-item URL, token gating).

Purpose: RECON-CODE-03, RECON-CODE-04.
Output: Two new ReconSource implementations + tests.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/phases/10-osint-code-hosting/10-CONTEXT.md
@.planning/phases/10-osint-code-hosting/10-01-SUMMARY.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go

<interfaces>
Bitbucket 2.0 search (docs: https://developer.atlassian.com/cloud/bitbucket/rest/api-group-search/):
GET /2.0/workspaces/{workspace}/search/code?search_query=<query>
Auth: Bearer <token> (OAuth 2.0 or workspace access token; app passwords use HTTP Basic auth, not Bearer)
Response: { "values": [{ "content_match_count": N, "file": {"path":"","commit":{...}}, "page_url": "..." }] }
Note: Requires a workspace param; make it configurable via SourcesConfig.BitbucketWorkspace.
If unset, the source is disabled. Rate: 1000/hour → rate.Every(3600 * time.Millisecond), burst 1
(written in milliseconds because 3.6 * time.Second does not compile: the untyped constant 3.6 cannot be converted to time.Duration).
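
The response shape above can be decoded with a small struct. This is a minimal sketch, assuming only the fields listed in this plan; `searchResponse` and `decodeSearch` are hypothetical names, and the sample payload is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// searchResponse mirrors only the fields this plan relies on; every other
// field in the response is ignored by encoding/json. Hypothetical type name.
type searchResponse struct {
	Values []struct {
		ContentMatchCount int    `json:"content_match_count"`
		PageURL           string `json:"page_url"`
		File              struct {
			Path string `json:"path"`
		} `json:"file"`
	} `json:"values"`
}

// decodeSearch parses a response body into a searchResponse.
func decodeSearch(body string) (searchResponse, error) {
	var resp searchResponse
	err := json.NewDecoder(strings.NewReader(body)).Decode(&resp)
	return resp, err
}

func main() {
	// Made-up payload in the shape described above.
	const sample = `{"values":[{"content_match_count":2,` +
		`"file":{"path":"config/.env","commit":{"hash":"abc123"}},` +
		`"page_url":"https://bitbucket.org/testws/repo/src/abc123/config/.env"}]}`

	resp, err := decodeSearch(sample)
	if err != nil {
		panic(err)
	}
	for _, v := range resp.Values {
		fmt.Println(v.PageURL, v.File.Path, v.ContentMatchCount)
	}
}
```

Each decoded `page_url` would become the Source of one Finding.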

GitHub Gist search: GitHub does not expose a dedicated /search/gists endpoint that
searches gist contents. As a fallback, use the /gists/public endpoint with client-side
filtering: GET /gists/public?per_page=100 returns public gists; for each gist, fetch
the file contents (via /gists/{id}, or each file's raw_url as in Task 2) and scan them
for keyword matches. Keep the implementation minimal: enumerate only the first page,
match against the keyword list, and emit Findings with Source = gist.html_url.
Auth: Bearer <github-token>. Rate: 30/min → rate.Every(2 * time.Second).
</interfaces>
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: BitbucketSource + tests</name>
<files>pkg/recon/sources/bitbucket.go, pkg/recon/sources/bitbucket_test.go</files>
<behavior>
- Test A: Enabled returns false when the token or the workspace is empty
- Test B: Enabled returns true when both are set
- Test C: Sweep queries /2.0/workspaces/{ws}/search/code with a Bearer header
- Test D: Decodes `{values:[{file:{path,commit:{...}},page_url:"..."}]}` and emits a Finding with Source=page_url, SourceType="recon:bitbucket"
- Test E: 401 → ErrUnauthorized
- Test F: Sweep stops when ctx is cancelled
</behavior>
<action>
Create `pkg/recon/sources/bitbucket.go`:
- Struct `BitbucketSource { Token, Workspace, BaseURL string; Registry *providers.Registry; Limiters *recon.LimiterRegistry; client *Client }`
- Default BaseURL: `https://api.bitbucket.org`
- Name "bitbucket", RateLimit rate.Every(3600*time.Millisecond), Burst 1, RespectsRobots false
- Enabled = s.Token != "" && s.Workspace != ""
- Sweep: for each query in BuildQueries(reg, "bitbucket"), call limiters.Wait, issue the
  GET request, decode into a struct with ``Values []struct{ PageURL string `json:"page_url"`; File struct{ Path string `json:"path"` } `json:"file"` }``, and emit Findings
- Compile-time assert `var _ recon.ReconSource = (*BitbucketSource)(nil)`

Create `pkg/recon/sources/bitbucket_test.go` with an httptest server, a synthetic
registry, and assertions on the URL path `/2.0/workspaces/testws/search/code`, the Bearer
header, and the emitted Findings.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run TestBitbucket -v -timeout 30s</automated>
</verify>
<done>
BitbucketSource passes all tests, implements ReconSource.
</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: GistSource + tests</name>
<files>pkg/recon/sources/gist.go, pkg/recon/sources/gist_test.go</files>
<behavior>
- Test A: Enabled returns false when the GitHub token is empty
- Test B: Sweep fetches /gists/public?per_page=100 with Bearer auth
- Test C: For each gist, iterates the files map; if any file's content (fetched via raw_url) contains a provider keyword, emits one Finding with Source=gist.html_url
- Test D: Sweep stops when ctx is cancelled
- Test E: 401 → ErrUnauthorized
- Test F: Gist without matching keyword → no Finding emitted
</behavior>
<action>
Create `pkg/recon/sources/gist.go`:
- Struct `GistSource { Token, BaseURL string; Registry *providers.Registry; Limiters *recon.LimiterRegistry; client *Client }`
- BaseURL default `https://api.github.com`
- Name "gist", RateLimit rate.Every(2*time.Second), Burst 1, RespectsRobots false
- Enabled = s.Token != ""
- Sweep flow:
  1. Build keyword list from registry (flat set)
  2. GET /gists/public?per_page=100 with Bearer header
  3. Decode ``[]struct{ HTMLURL string `json:"html_url"`; Files map[string]struct{ Filename string `json:"filename"`; RawURL string `json:"raw_url"` } `json:"files"` }``
  4. For each gist, for each file: if a match can be decided from the list response
     alone, skip the raw fetch (keeps Phase 10 minimal). Otherwise, fetch file.RawURL
     and scan the content for any keyword from the set; on a hit, emit one Finding
     per gist (not per file) with ProviderName taken from the matched keyword.
  5. Respect limiters.Wait before each outbound request (gist list + each raw fetch)
- Compile-time assert `var _ recon.ReconSource = (*GistSource)(nil)`
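
The "one Finding per gist, not per file" rule from step 4 can be sketched without any HTTP. The types below are hypothetical simplifications (the real registry and Finding types live elsewhere in the repo), and the keyword-to-provider pairs are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// keyword pairs a search string with the provider it identifies (hypothetical).
type keyword struct{ Text, Provider string }

// gistContent is a gist with already-fetched file bodies (hypothetical).
type gistContent struct {
	HTMLURL string
	Files   []string
}

// finding is a reduced stand-in for the project's Finding type.
type finding struct{ Source, ProviderName string }

// firstMatch returns the provider of the first keyword found in any file.
func firstMatch(files []string, keywords []keyword) (string, bool) {
	for _, content := range files {
		for _, k := range keywords {
			if strings.Contains(content, k.Text) {
				return k.Provider, true
			}
		}
	}
	return "", false
}

// findingsFor emits at most one finding per gist, however many files match.
func findingsFor(gists []gistContent, keywords []keyword) []finding {
	var out []finding
	for _, g := range gists {
		if provider, ok := firstMatch(g.Files, keywords); ok {
			out = append(out, finding{Source: g.HTMLURL, ProviderName: provider})
		}
	}
	return out
}

func main() {
	keywords := []keyword{{"sk-proj-", "openai"}, {"ghp_", "github"}} // illustrative
	gists := []gistContent{
		{"https://gist.github.com/u/1", []string{"KEY=sk-proj-x", "ALT=sk-proj-y"}},
		{"https://gist.github.com/u/2", []string{"nothing to see here"}},
	}
	for _, f := range findingsFor(gists, keywords) {
		fmt.Println(f.Source, f.ProviderName)
	}
}
```

Returning on the first hit also means the remaining files of a matching gist never need to be fetched, which saves rate-limited requests.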

Create `pkg/recon/sources/gist_test.go`:
- httptest server with two routes: `/gists/public` returns 2 gists, each with 1 file whose raw_url points to the same server at `/raw/<id>`; `/raw/<id>` returns content containing "sk-proj-" for one gist and an unrelated string for the other
- Assert exactly 1 Finding is emitted and its Source matches that gist's html_url
- 401 test, ctx cancellation test, empty-token test
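
One way to build the two-route fixture is sketched below. It deliberately omits Bearer auth and rate limiting to stay short; `newFakeGitHub`, `get`, and `matchingGists` are hypothetical names, and the gist URLs and file bodies are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

type gistFile struct {
	Filename string `json:"filename"`
	RawURL   string `json:"raw_url"`
}

type gist struct {
	HTMLURL string              `json:"html_url"`
	Files   map[string]gistFile `json:"files"`
}

// newFakeGitHub serves /gists/public and /raw/<id>; every raw_url points
// back at the same test server.
func newFakeGitHub() *httptest.Server {
	raw := map[string]string{"1": "KEY=sk-proj-EXAMPLE", "2": "nothing to see here"}
	var srv *httptest.Server
	mux := http.NewServeMux()
	mux.HandleFunc("/gists/public", func(w http.ResponseWriter, r *http.Request) {
		var gists []gist
		for _, id := range []string{"1", "2"} {
			gists = append(gists, gist{
				HTMLURL: "https://gist.github.com/u/" + id,
				Files:   map[string]gistFile{"f.txt": {"f.txt", srv.URL + "/raw/" + id}},
			})
		}
		json.NewEncoder(w).Encode(gists)
	})
	mux.HandleFunc("/raw/", func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, raw[strings.TrimPrefix(r.URL.Path, "/raw/")])
	})
	srv = httptest.NewServer(mux)
	return srv
}

// get fetches a URL and returns the response body as a string.
func get(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// matchingGists returns the html_url of each listed gist whose raw file
// content contains needle, at most once per gist.
func matchingGists(baseURL, needle string) ([]string, error) {
	body, err := get(baseURL + "/gists/public?per_page=100")
	if err != nil {
		return nil, err
	}
	var gists []gist
	if err := json.Unmarshal([]byte(body), &gists); err != nil {
		return nil, err
	}
	var hits []string
	for _, g := range gists {
		for _, f := range g.Files {
			content, err := get(f.RawURL)
			if err != nil {
				return nil, err
			}
			if strings.Contains(content, needle) {
				hits = append(hits, g.HTMLURL)
				break
			}
		}
	}
	return hits, nil
}

func main() {
	srv := newFakeGitHub()
	defer srv.Close()
	hits, err := matchingGists(srv.URL, "sk-proj-")
	fmt.Println(hits, err)
}
```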
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run TestGist -v -timeout 30s</automated>
</verify>
<done>
GistSource emits Findings only when a known provider keyword is present in a gist
file body; all tests green.
</done>
</task>

</tasks>

<verification>
- `go build ./...`
- `go test ./pkg/recon/sources/ -run "TestBitbucket|TestGist" -v`
</verification>

<success_criteria>
RECON-CODE-03 and RECON-CODE-04 satisfied.
</success_criteria>

<output>
After completion, create `.planning/phases/10-osint-code-hosting/10-04-SUMMARY.md`.
</output>