---
phase: 10-osint-code-hosting
plan: 04
subsystem: recon/sources
tags: [recon, osint, bitbucket, gist, wave-2]
requires:
  - pkg/recon/sources.Client (Plan 10-01)
  - pkg/recon/sources.BuildQueries (Plan 10-01)
  - pkg/recon.LimiterRegistry (Phase 9)
  - pkg/providers.Registry
provides:
  - pkg/recon/sources.BitbucketSource (RECON-CODE-03)
  - pkg/recon/sources.GistSource (RECON-CODE-04)
affects:
  - pkg/recon/sources (two new source implementations)
tech_stack_added: []
patterns:
  - "Token+workspace gating (Bitbucket requires both to enable)"
  - "Content-scan fallback when API has no dedicated search (Gist)"
  - "One Finding per gist (not per file) to avoid duplicate leak reports"
  - "256KB read cap on raw content fetches"
key_files_created:
  - pkg/recon/sources/bitbucket.go
  - pkg/recon/sources/bitbucket_test.go
  - pkg/recon/sources/gist.go
  - pkg/recon/sources/gist_test.go
key_files_modified: []
decisions:
  - "BitbucketSource disables cleanly when either token OR workspace is empty (no error)"
  - "GistSource enumerates /gists/public first page only; broader sweeps deferred"
  - "GistSource emits one Finding per matching gist, not per file (prevents fan-out of a single leak)"
  - "providerForQuery resolves keyword→provider name for Bitbucket Findings (API doesn't echo keyword)"
  - "Bitbucket rate: rate.Every(3.6s) burst 1; Gist rate: rate.Every(2s) burst 1"
metrics:
  duration_minutes: 6
  tasks_completed: 2
  tests_added: 9
  completed_at: "2026-04-05T22:30:00Z"
requirements: [RECON-CODE-03, RECON-CODE-04]
---

# Phase 10 Plan 04: Bitbucket + Gist Sources Summary

One-liner: BitbucketSource hits the Cloud 2.0 code search API with workspace+token gating, and GistSource fans out over /gists/public fetching each file's raw content to match provider keywords, emitting one Finding per matching gist.

## What Was Built

### BitbucketSource (RECON-CODE-03)

- `pkg/recon/sources/bitbucket.go`: implements `recon.ReconSource`.
- Endpoint: `GET {base}/2.0/workspaces/{workspace}/search/code?search_query={kw}`.
- Auth: `Authorization: Bearer <token>`.
- Disabled when either `Token` or `Workspace` is empty (clean no-op, no error).
- Rate: `rate.Every(3600ms)` burst 1 (Bitbucket 1000/hr API limit).
- Iterates `BuildQueries(registry, "bitbucket")`: one request per provider keyword.
- Decodes `{values:[{file:{path,commit{hash}},page_url}]}` and emits one Finding per entry.
- `SourceType = "recon:bitbucket"`, `Source = page_url` (falls back to synthetic `bitbucket:{ws}/{path}@{hash}` when `page_url` is missing).

### GistSource (RECON-CODE-04)

- `pkg/recon/sources/gist.go`: implements `recon.ReconSource`.
- Endpoint: `GET {base}/gists/public?per_page=100`.
- Per gist, per file: fetches `raw_url` (also with Bearer auth) and scans content against the provider keyword set (flattened `keyword → providerName` map).
- 256KB read cap per raw file to avoid pathological payloads.
- Emits **one Finding per matching gist** (breaks on first keyword match across that gist's files), which prevents a multi-file leak from producing N duplicate Findings.
- `ProviderName` set from the matched keyword; `Source = gist.html_url`; `SourceType = "recon:gist"`.
- Rate: `rate.Every(2s)` burst 1 (30 req/min). The limiter is awaited before **every** outbound request (list + each raw fetch) so GitHub's shared budget is respected.
- Disabled when token is empty.

## How It Fits

- Depends on Plan 10-01 foundation: `sources.Client` (retry + 401→ErrUnauthorized), `BuildQueries`, `recon.LimiterRegistry`.
- Does **not** modify `register.go`; Plan 10-09 wires all Wave 2 sources into `RegisterAll` after every plan lands.
- Finding shape matches `engine.Finding`, so downstream dedup/verify/storage paths in Phases 9/5/4 consume the Findings without changes.

## Tests

`go test ./pkg/recon/sources/ -run "TestBitbucket|TestGist" -v`

### Bitbucket (4 tests)

- `TestBitbucket_EnabledRequiresTokenAndWorkspace`: all four gate combinations.
- `TestBitbucket_SweepEmitsFindings`: httptest server, asserts `/2.0/workspaces/testws/search/code` path, Bearer header, non-empty `search_query`, Finding source/type.
- `TestBitbucket_Unauthorized`: 401 → `errors.Is(err, ErrUnauthorized)`.
- `TestBitbucket_ContextCancellation`: slow server + 50ms ctx deadline.

### Gist (5 tests)

- `TestGist_EnabledRequiresToken`: empty vs set token.
- `TestGist_SweepEmitsFindingsOnKeywordMatch`: two gists, only one raw body contains `sk-proj-`; asserts exactly 1 Finding, correct `html_url`, `ProviderName=openai`.
- `TestGist_NoMatch_NoFinding`: gist with unrelated content produces zero Findings.
- `TestGist_Unauthorized`: 401 → `ErrUnauthorized`.
- `TestGist_ContextCancellation`: slow server + 50ms ctx deadline.

All 9 tests pass. `go build ./...` is clean.

## Deviations from Plan

None. The plan was executed exactly as written. No Rule 1/2/3 auto-fixes were required; all tests passed on the first full run after writing the implementations.

## Decisions Made

1. **Keyword→provider mapping on the Bitbucket side lives in `providerForQuery`.** Bitbucket's API doesn't echo the keyword in the response, so we parse the query back to a provider name. A simple substring match over registry keywords is sufficient at current scale.
2. **GistSource emits one Finding per gist, not per file.** A single secret often lands in a `config.env` with supporting `README.md` and `docker-compose.yml`; treating the gist as the leak unit keeps noise down and matches how human reviewers triage.
3. **The limiter is awaited before every raw fetch, not just the list call.** GitHub's 30/min budget is shared across API endpoints, so each raw content fetch consumes a token.
4. **256KB cap on raw content reads.** Pathological gists (multi-MB logs, minified bundles) would otherwise block the sweep; 256KB is enough to surface a key that's typically near the top of a config file.
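
Decisions 2 and 4 can be illustrated with a minimal sketch. It is not the code in `gist.go`: the `gist` and `finding` types, `newGist`, `matchProvider`, and `scanGist` are hypothetical stand-ins, in-memory readers replace the HTTP fetches of `raw_url`, and rate limiting is omitted. It only shows the capped read via `io.LimitReader` and the early return on the first keyword match, under those assumptions.

```go
package main

import (
	"fmt"
	"io"
	"sort"
	"strings"
)

// maxRawBytes mirrors the 256KB read cap described above.
const maxRawBytes = 256 * 1024

// gist is a hypothetical, simplified stand-in for one /gists/public entry.
type gist struct {
	HTMLURL string
	Files   []io.Reader // stand-ins for the bodies fetched from each raw_url
}

// finding is a hypothetical, trimmed-down version of the real Finding type.
type finding struct {
	ProviderName string
	Source       string
	SourceType   string
}

// newGist builds a gist whose files are in-memory strings (illustration only).
func newGist(htmlURL string, bodies ...string) gist {
	g := gist{HTMLURL: htmlURL}
	for _, b := range bodies {
		g.Files = append(g.Files, strings.NewReader(b))
	}
	return g
}

// matchProvider returns the provider for the first keyword found in content.
// Keywords are checked in sorted order so the result is deterministic.
func matchProvider(content string, keywords map[string]string) (string, bool) {
	keys := make([]string, 0, len(keywords))
	for k := range keywords {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		if strings.Contains(content, k) {
			return keywords[k], true
		}
	}
	return "", false
}

// scanGist reads at most maxRawBytes from each file and returns on the first
// match, so a gist yields at most one finding however many files match.
func scanGist(g gist, keywords map[string]string) (finding, bool, error) {
	for _, f := range g.Files {
		body, err := io.ReadAll(io.LimitReader(f, maxRawBytes))
		if err != nil {
			return finding{}, false, err
		}
		if provider, ok := matchProvider(string(body), keywords); ok {
			return finding{
				ProviderName: provider,
				Source:       g.HTMLURL,
				SourceType:   "recon:gist",
			}, true, nil
		}
	}
	return finding{}, false, nil
}

func main() {
	keywords := map[string]string{"sk-proj-": "openai"}
	g := newGist("https://gist.example/1", "# notes", "OPENAI_API_KEY=sk-proj-abc")
	if f, ok, err := scanGist(g, keywords); err == nil && ok {
		fmt.Println(f.ProviderName, f.Source)
	}
}
```

A keyword that starts after the first 256KB of a file is not seen by this sketch, which is the trade-off decision 4 accepts.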

## Commits

- `d279abf` feat(10-04): add BitbucketSource for code search recon
- `0e16e8e` feat(10-04): add GistSource for public gist keyword recon

## Self-Check: PASSED

- FOUND: pkg/recon/sources/bitbucket.go
- FOUND: pkg/recon/sources/bitbucket_test.go
- FOUND: pkg/recon/sources/gist.go
- FOUND: pkg/recon/sources/gist_test.go
- FOUND: commit d279abf
- FOUND: commit 0e16e8e
- Tests: 9/9 passing (`go test ./pkg/recon/sources/ -run "TestBitbucket|TestGist"`)
- Build: `go build ./...` clean