Both disabled when respective credentials are empty
- path: pkg/recon/sources/bitbucket.go
  provides: BitbucketSource implementing recon.ReconSource
- path: pkg/recon/sources/gist.go
  provides: GistSource implementing recon.ReconSource
- from: pkg/recon/sources/gist.go
  to: pkg/recon/sources/httpclient.go
  via: Client.Do with Bearer <github-token>
  pattern: client.Do
- from: pkg/recon/sources/bitbucket.go
  to: pkg/recon/sources/httpclient.go
  via: Client.Do
  pattern: client.Do
Implement BitbucketSource (RECON-CODE-03) and GistSource (RECON-CODE-04). Grouped
because both are small API integrations with similar shapes (JSON array/values,
per-item URL, token gating).
Purpose: RECON-CODE-03, RECON-CODE-04.
Output: Two new ReconSource implementations + tests.
@.planning/phases/10-osint-code-hosting/10-CONTEXT.md
@.planning/phases/10-osint-code-hosting/10-01-SUMMARY.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
@pkg/recon/sources/queries.go
Bitbucket 2.0 search (docs: https://developer.atlassian.com/cloud/bitbucket/rest/api-group-search/):
GET /2.0/workspaces/{workspace}/search/code?search_query={query}
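Auth: Bearer token (OAuth 2.0 or workspace access token). App passwords use HTTP Basic auth instead and are not accepted in a Bearer header.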
Response: { "values": [{ "content_match_count": N, "file": {"path":"","commit":{...}}, "page_url": "..." }] }
Note: The endpoint requires a workspace, so make it configurable via SourcesConfig.BitbucketWorkspace;
if unset, the source is disabled. Rate: 1000/hour → rate.Every(3600*time.Millisecond), burst 1.
GitHub Gist search: GitHub does not expose a /search/gists endpoint that searches
gist contents. As a fallback, use the /gists/public endpoint with client-side
filtering: GET /gists/public?per_page=100 returns public gists; for each gist, fetch
its file contents (each file's raw_url, or /gists/{id}) and scan them for keyword
matches. Keep the implementation minimal: enumerate only the first page, match against
the keyword list, and emit Findings with Source = gist.html_url.
Auth: Bearer <github-token>. Rate: 30/min → rate.Every(2*time.Second).
Task 1: BitbucketSource + tests
pkg/recon/sources/bitbucket.go, pkg/recon/sources/bitbucket_test.go
- Test A: Enabled false when token OR workspace empty
- Test B: Enabled true when both set
- Test C: Sweep queries /2.0/workspaces/{ws}/search/code with Bearer header
- Test D: Decodes `{values:[{file:{path,commit:{...}},page_url:"..."}]}` and emits Finding with Source=page_url, SourceType="recon:bitbucket"
- Test E: 401 → ErrUnauthorized
- Test F: Ctx cancellation
Create `pkg/recon/sources/bitbucket.go`:
- Struct `BitbucketSource { Token, Workspace, BaseURL string; Registry *providers.Registry; Limiters *recon.LimiterRegistry; client *Client }`
- Default BaseURL: `https://api.bitbucket.org`
- Name "bitbucket", RateLimit rate.Every(3600*time.Millisecond), Burst 1, RespectsRobots false
- Enabled = s.Token != "" && s.Workspace != ""
- Sweep: for each query in BuildQueries(reg, "bitbucket"), call limiters.Wait, issue the
GET request, decode into a struct with ``Values []struct{ PageURL string `json:"page_url"`; File struct{ Path string `json:"path"` } `json:"file"` }``, and emit Findings
- Compile-time assert `var _ recon.ReconSource = (*BitbucketSource)(nil)`
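The decode step of the Sweep can be sketched in isolation. This is a minimal sketch, not the final implementation: `hit` is a hypothetical stand-in for the project's Finding type, `decodeHits` is an illustrative name, and skipping entries without a `page_url` is an assumption the plan does not spell out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchResponse mirrors only the fields of the code-search response that
// the plan uses; encoding/json ignores everything else.
type searchResponse struct {
	Values []struct {
		PageURL string `json:"page_url"`
		File    struct {
			Path string `json:"path"`
		} `json:"file"`
	} `json:"values"`
}

// hit is a hypothetical stand-in for the project's Finding type.
type hit struct {
	Source     string
	SourceType string
	Path       string
}

// decodeHits turns a raw response body into one hit per search result.
func decodeHits(body []byte) ([]hit, error) {
	var resp searchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode bitbucket response: %w", err)
	}
	hits := make([]hit, 0, len(resp.Values))
	for _, v := range resp.Values {
		if v.PageURL == "" {
			continue // assumption: a result without a URL is not reportable
		}
		hits = append(hits, hit{
			Source:     v.PageURL,
			SourceType: "recon:bitbucket",
			Path:       v.File.Path,
		})
	}
	return hits, nil
}

func main() {
	body := []byte(`{"values":[{"content_match_count":1,
		"file":{"path":"config/.env","commit":{}},
		"page_url":"https://bitbucket.org/testws/repo/src/abc/config/.env"}]}`)
	hits, err := decodeHits(body)
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Printf("%s %s (%s)\n", h.SourceType, h.Source, h.Path)
	}
}
```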
Create `pkg/recon/sources/bitbucket_test.go` with httptest server, synthetic
registry, assertions on URL path `/2.0/workspaces/testws/search/code`, Bearer
header, and emitted Findings.
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run TestBitbucket -v -timeout 30s
BitbucketSource passes all tests, implements ReconSource.
Task 2: GistSource + tests
pkg/recon/sources/gist.go, pkg/recon/sources/gist_test.go
- Test A: Enabled false when GitHub token empty
- Test B: Sweep fetches /gists/public?per_page=100 with Bearer auth
- Test C: For each gist, iterates the files map; if any file's content (fetched via raw_url) contains a provider keyword, emits one Finding with Source=gist.html_url
- Test D: Ctx cancellation
- Test E: 401 → ErrUnauthorized
- Test F: Gist without matching keyword → no Finding emitted
Create `pkg/recon/sources/gist.go`:
- Struct `GistSource { Token, BaseURL string; Registry *providers.Registry; Limiters *recon.LimiterRegistry; client *Client }`
- BaseURL default `https://api.github.com`
- Name "gist", RateLimit rate.Every(2*time.Second), Burst 1, RespectsRobots false
- Enabled = s.Token != ""
- Sweep flow:
1. Build keyword list from registry (flat set)
2. GET /gists/public?per_page=100 with Bearer header
3. Decode ``[]struct{ HTMLURL string `json:"html_url"`; Files map[string]struct{ Filename string `json:"filename"`; RawURL string `json:"raw_url"` } `json:"files"` }``
4. For each gist, for each file: if a match can be determined without the raw
content, skip the raw fetch (keep Phase 10 minimal). Otherwise fetch file.RawURL and
scan the content for any keyword from the set; on a hit, emit one Finding
per gist (not per file) with ProviderName taken from the matched keyword.
5. Respect limiters.Wait before each outbound request (gist list + each raw fetch)
- Compile-time assert `var _ recon.ReconSource = (*GistSource)(nil)`
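The decode and one-Finding-per-gist logic from the Sweep flow can be sketched as follows. This is a minimal sketch under stated assumptions: `matchingGists` and the `fetch` callback are hypothetical names, the callback stands in for the rate-limited raw_url request, and skipping files whose fetch fails is an assumption rather than a decision made in the plan.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// gist mirrors the fields of a /gists/public list entry used by the plan.
type gist struct {
	HTMLURL string `json:"html_url"`
	Files   map[string]struct {
		Filename string `json:"filename"`
		RawURL   string `json:"raw_url"`
	} `json:"files"`
}

// matchingGists returns the html_url of every gist with at least one file
// whose body contains a keyword. Each gist is reported at most once.
func matchingGists(listJSON []byte, keywords []string, fetch func(rawURL string) (string, error)) ([]string, error) {
	var gists []gist
	if err := json.Unmarshal(listJSON, &gists); err != nil {
		return nil, fmt.Errorf("decode gist list: %w", err)
	}
	var hits []string
	for _, g := range gists {
	files:
		for _, f := range g.Files {
			body, err := fetch(f.RawURL)
			if err != nil {
				continue // assumption: an unreadable file is skipped, not fatal
			}
			for _, kw := range keywords {
				if strings.Contains(body, kw) {
					hits = append(hits, g.HTMLURL)
					break files // one hit per gist, not per file
				}
			}
		}
	}
	return hits, nil
}

func main() {
	list := []byte(`[{"html_url":"https://gist.github.com/u/a",
		"files":{"a.txt":{"filename":"a.txt","raw_url":"raw://a"}}}]`)
	bodies := map[string]string{"raw://a": "OPENAI_KEY=sk-proj-EXAMPLE"}
	hits, err := matchingGists(list, []string{"sk-proj-"}, func(u string) (string, error) {
		return bodies[u], nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(hits)
}
```

Passing the fetcher as a function keeps the matching logic testable without a network and leaves the limiter call in one place.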
Create `pkg/recon/sources/gist_test.go`:
- httptest server with two routes: `/gists/public` returns 2 gists each with 1 file, raw_url pointing to same server `/raw/<id>`; `/raw/<id>` returns content containing "sk-proj-" for one and an unrelated string for the other
- Assert exactly 1 Finding emitted, Source matches the gist's html_url
- 401 test, ctx cancellation test, empty-token test
cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run TestGist -v -timeout 30s
GistSource emits Findings only when a known provider keyword is present in a gist
file body; all tests green.
- `go build ./...`
- `go test ./pkg/recon/sources/ -run "TestBitbucket|TestGist" -v`
<success_criteria>
RECON-CODE-03 and RECON-CODE-04 satisfied.
</success_criteria>
After completion, create `.planning/phases/10-osint-code-hosting/10-04-SUMMARY.md`.