keyhunter/.planning/phases/10-osint-code-hosting/10-04-SUMMARY.md
2026-04-06 01:18:53 +03:00

phase: 10-osint-code-hosting
plan: 04
subsystem: recon/sources
tags: recon, osint, bitbucket, gist, wave-2
requires:
  • pkg/recon/sources.Client (Plan 10-01)
  • pkg/recon/sources.BuildQueries (Plan 10-01)
  • pkg/recon.LimiterRegistry (Phase 9)
  • pkg/providers.Registry
provides:
  • pkg/recon/sources.BitbucketSource (RECON-CODE-03)
  • pkg/recon/sources.GistSource (RECON-CODE-04)
affects:
  • pkg/recon/sources (two new source implementations)
tech_stack_added: (none)
patterns:
  • Token+workspace gating (Bitbucket requires both to enable)
  • Content-scan fallback when API has no dedicated search (Gist)
  • One Finding per gist (not per file) to avoid duplicate leak reports
  • 256KB read cap on raw content fetches
key_files_created:
  • pkg/recon/sources/bitbucket.go
  • pkg/recon/sources/bitbucket_test.go
  • pkg/recon/sources/gist.go
  • pkg/recon/sources/gist_test.go
key_files_modified: (none)
decisions:
  • BitbucketSource disables cleanly when either token OR workspace is empty (no error)
  • GistSource enumerates /gists/public first page only; broader sweeps deferred
  • GistSource emits one Finding per matching gist, not per file (prevents fan-out of a single leak)
  • providerForQuery resolves keyword→provider name for Bitbucket Findings (API doesn't echo keyword)
  • Bitbucket rate: rate.Every(3.6s) burst 1; Gist rate: rate.Every(2s) burst 1
metrics:
  • duration_minutes: 6
  • tasks_completed: 2
  • tests_added: 9
  • completed_at: 2026-04-05T22:30:00Z
requirements: RECON-CODE-03, RECON-CODE-04

Phase 10 Plan 04: Bitbucket + Gist Sources Summary

One-liner: BitbucketSource hits the Cloud 2.0 code search API with workspace+token gating; GistSource lists /gists/public, fetches each file's raw content, and matches it against provider keywords, emitting one Finding per matching gist.

What Was Built

BitbucketSource (RECON-CODE-03)

  • pkg/recon/sources/bitbucket.go — implements recon.ReconSource.
  • Endpoint: GET {base}/2.0/workspaces/{workspace}/search/code?search_query={kw}.
  • Auth: Authorization: Bearer <token>.
  • Disabled when either Token or Workspace is empty (clean no-op, no error).
  • Rate: rate.Every(3600ms) burst 1 (Bitbucket's 1000 requests/hour API limit works out to one request every 3.6s).
  • Iterates BuildQueries(registry, "bitbucket") — one request per provider keyword.
  • Decodes {values:[{file:{path,commit:{hash}},page_url}]} and emits one Finding per entry.
  • SourceType = "recon:bitbucket", Source = page_url (falls back to synthetic bitbucket:{ws}/{path}@{hash} when page_url missing).
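
The decode-and-fallback step above can be sketched as follows. This is a minimal illustration rather than the code in bitbucket.go: the searchResponse type and the sourcesFromBody helper are hypothetical names, and only the fields named in the summary are modelled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchResponse models only the fields listed in the summary:
// values[].file.path, values[].file.commit.hash, values[].page_url.
type searchResponse struct {
	Values []struct {
		File struct {
			Path   string `json:"path"`
			Commit struct {
				Hash string `json:"hash"`
			} `json:"commit"`
		} `json:"file"`
		PageURL string `json:"page_url"`
	} `json:"values"`
}

// sourcesFromBody (hypothetical helper) returns one Source string per entry,
// preferring page_url and falling back to bitbucket:{ws}/{path}@{hash}.
func sourcesFromBody(body []byte, workspace string) ([]string, error) {
	var resp searchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode bitbucket search response: %w", err)
	}
	out := make([]string, 0, len(resp.Values))
	for _, v := range resp.Values {
		src := v.PageURL
		if src == "" {
			src = fmt.Sprintf("bitbucket:%s/%s@%s", workspace, v.File.Path, v.File.Commit.Hash)
		}
		out = append(out, src)
	}
	return out, nil
}

func main() {
	// Sample payload invented for illustration; the second entry has no page_url.
	body := []byte(`{"values":[
		{"file":{"path":"a/.env","commit":{"hash":"abc123"}},"page_url":"https://example.invalid/a"},
		{"file":{"path":"b/config.yml","commit":{"hash":"def456"}}}
	]}`)
	srcs, err := sourcesFromBody(body, "testws")
	if err != nil {
		panic(err)
	}
	for _, s := range srcs {
		fmt.Println(s)
	}
}
```

Keeping the fallback synthetic but structured (workspace, path, commit hash) means a Finding without page_url can still be traced back to a specific file revision.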

GistSource (RECON-CODE-04)

  • pkg/recon/sources/gist.go — implements recon.ReconSource.
  • Endpoint: GET {base}/gists/public?per_page=100.
  • Per gist, per file: fetches raw_url (also with Bearer auth) and scans content against the provider keyword set (flattened keyword → providerName map).
  • 256KB read cap per raw file to avoid pathological payloads.
  • Emits one Finding per matching gist (breaks on first keyword match across that gist's files) — prevents a multi-file leak from producing N duplicate Findings.
  • ProviderName set from the matched keyword; Source = gist.html_url; SourceType = "recon:gist".
  • Rate: rate.Every(2s) burst 1 (30 req/min). The limiter is waited on before every outbound request (the list call and each raw fetch) so GitHub's shared budget is respected.
  • Disabled when token is empty.
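
The capped read and the one-Finding-per-gist rule can be sketched as below. This is an assumption-laden illustration, not the code in gist.go: readCapped, matchGist, and readers are hypothetical helpers, and in-memory readers stand in for the raw_url HTTP bodies.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// maxRawBytes is the 256KB read cap described in the summary.
const maxRawBytes = 256 * 1024

// readCapped reads at most limit bytes from r; anything beyond is never read.
func readCapped(r io.Reader, limit int64) (string, error) {
	b, err := io.ReadAll(io.LimitReader(r, limit))
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// matchGist scans the gist's files in order and stops at the first keyword hit,
// so one gist yields at most one match no matter how many files contain a key.
// keywords maps keyword -> provider name. If several keywords match the same
// file, which one wins depends on map iteration order in this sketch.
func matchGist(files []io.Reader, limit int64, keywords map[string]string) (string, bool, error) {
	for _, f := range files {
		body, err := readCapped(f, limit)
		if err != nil {
			return "", false, err
		}
		for kw, provider := range keywords {
			if strings.Contains(body, kw) {
				return provider, true, nil
			}
		}
	}
	return "", false, nil
}

// readers wraps string bodies as io.Readers; it stands in for raw_url fetches.
func readers(bodies ...string) []io.Reader {
	out := make([]io.Reader, 0, len(bodies))
	for _, b := range bodies {
		out = append(out, strings.NewReader(b))
	}
	return out
}

func main() {
	keywords := map[string]string{"sk-proj-": "openai"}
	files := readers("# README, nothing secret", "OPENAI_API_KEY=sk-proj-example", "key: sk-proj-example")
	provider, ok, err := matchGist(files, maxRawBytes, keywords)
	if err != nil {
		panic(err)
	}
	fmt.Println(provider, ok)
}
```

Returning on the first hit also means later files in a matching gist are never fetched, which saves limiter tokens on multi-file gists.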

How It Fits

  • Depends on Plan 10-01 foundation: sources.Client (retry + 401→ErrUnauthorized), BuildQueries, recon.LimiterRegistry.
  • Does not modify register.go — Plan 10-09 wires all Wave 2 sources into RegisterAll after every plan lands.
  • Finding shape matches engine.Finding so downstream dedup/verify/storage paths in Phases 9/5/4 consume them without changes.

Tests

go test ./pkg/recon/sources/ -run "TestBitbucket|TestGist" -v

Bitbucket (4 tests)

  • TestBitbucket_EnabledRequiresTokenAndWorkspace — all four gate combinations.
  • TestBitbucket_SweepEmitsFindings — httptest server, asserts /2.0/workspaces/testws/search/code path, Bearer header, non-empty search_query, Finding source/type.
  • TestBitbucket_Unauthorized — 401 → errors.Is(err, ErrUnauthorized).
  • TestBitbucket_ContextCancellation — slow server + 50ms ctx deadline.

Gist (5 tests)

  • TestGist_EnabledRequiresToken — empty vs set token.
  • TestGist_SweepEmitsFindingsOnKeywordMatch — two gists, only one raw body contains sk-proj-; asserts exactly 1 Finding, correct html_url, ProviderName=openai.
  • TestGist_NoMatch_NoFinding — gist with unrelated content produces zero Findings.
  • TestGist_Unauthorized — 401 → ErrUnauthorized.
  • TestGist_ContextCancellation — slow server + 50ms ctx deadline.
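
The Unauthorized and ContextCancellation cases for both sources follow a common httptest pattern, roughly as sketched here. The get function and the local ErrUnauthorized are hypothetical stand-ins for sources.Client and its sentinel error; the real test bodies may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// ErrUnauthorized stands in for the sentinel the summary attributes to sources.Client.
var ErrUnauthorized = errors.New("unauthorized")

// get is a hypothetical stand-in for the client: Bearer auth, 401 mapped to ErrUnauthorized.
func get(ctx context.Context, c *http.Client, url, token string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusUnauthorized {
		return fmt.Errorf("GET %s: %w", url, ErrUnauthorized)
	}
	return nil
}

// probeUnauthorized serves 401 for every request and reports whether
// the error returned by get satisfies errors.Is(err, ErrUnauthorized).
func probeUnauthorized() bool {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusUnauthorized)
	}))
	defer srv.Close()
	return errors.Is(get(context.Background(), srv.Client(), srv.URL, "bad-token"), ErrUnauthorized)
}

// probeDeadline pairs a slow handler with a 50ms context deadline and reports
// whether the call fails with context.DeadlineExceeded.
func probeDeadline() bool {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-r.Context().Done(): // client went away
		case <-time.After(2 * time.Second):
		}
	}))
	defer srv.Close()
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	return errors.Is(get(ctx, srv.Client(), srv.URL, "token"), context.DeadlineExceeded)
}

func main() {
	fmt.Println("401 mapped:", probeUnauthorized())
	fmt.Println("deadline honored:", probeDeadline())
}
```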

All 9 tests pass. go build ./... is clean.

Deviations from Plan

None — plan executed exactly as written. No Rule 1/2/3 auto-fixes were required; all tests passed on first full run after writing implementations.

Decisions Made

  1. Keyword→provider mapping on the Bitbucket side lives in providerForQuery — Bitbucket's API doesn't echo the keyword in the response, so we parse the query back to a provider name. Simple substring match over registry keywords is sufficient at current scale.
  2. GistSource emits one Finding per gist, not per file. A single secret often lands in a config.env with supporting README.md and docker-compose.yml — treating the gist as the leak unit keeps noise down and matches how human reviewers triage.
  3. The limiter is waited on before every raw fetch, not just the list call. GitHub's 30/min budget is shared across API endpoints, so each raw content fetch consumes a token.
  4. 256KB cap on raw content reads. Pathological gists (multi-MB logs, minified bundles) would otherwise block the sweep; 256KB is enough to surface a key that's typically near the top of a config file.
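
Decision 1 describes a substring match from query back to provider name. One way that could look is sketched below; the signature shown is an assumption (the real providerForQuery may take the registry directly), and the longest-keyword tie-break is added here only to make the result deterministic.

```go
package main

import (
	"fmt"
	"strings"
)

// providerForQuery returns the provider whose keyword appears in query.
// keywords maps keyword -> provider name (a flattened view of the registry).
// Assumption: when several keywords match, the longest one wins, with
// lexical order as a final tie-break, so results do not depend on map order.
func providerForQuery(query string, keywords map[string]string) string {
	best, provider := "", ""
	for kw, name := range keywords {
		if kw == "" || !strings.Contains(query, kw) {
			continue
		}
		if len(kw) > len(best) || (len(kw) == len(best) && kw < best) {
			best, provider = kw, name
		}
	}
	return provider
}

func main() {
	// Illustrative keyword set; the real set comes from the provider registry.
	keywords := map[string]string{
		"sk-proj-": "openai",
		"sk-ant-":  "anthropic",
	}
	fmt.Println(providerForQuery("sk-proj-", keywords))
	fmt.Println(providerForQuery("unrelated", keywords) == "")
}
```

A linear scan like this is O(number of keywords) per query, which matches the summary's note that it is sufficient at current scale.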

Commits

  • d279abf — feat(10-04): add BitbucketSource for code search recon
  • 0e16e8e — feat(10-04): add GistSource for public gist keyword recon

Self-Check: PASSED

  • FOUND: pkg/recon/sources/bitbucket.go
  • FOUND: pkg/recon/sources/bitbucket_test.go
  • FOUND: pkg/recon/sources/gist.go
  • FOUND: pkg/recon/sources/gist_test.go
  • FOUND: commit d279abf
  • FOUND: commit 0e16e8e
  • Tests: 9/9 passing (go test ./pkg/recon/sources/ -run "TestBitbucket|TestGist")
  • Build: go build ./... clean