---
phase: 10-osint-code-hosting
plan: 08
type: execute
wave: 2
depends_on: [10-01]
files_modified:
- pkg/recon/sources/kaggle.go
- pkg/recon/sources/kaggle_test.go
autonomous: true
requirements: [RECON-CODE-09]
must_haves:
  truths:
    - "KaggleSource queries Kaggle public API /api/v1/kernels/list with Basic auth (username:key) and emits Findings"
    - "Disabled when either KaggleUser or KaggleKey is empty"
    - "Findings tagged recon:kaggle; Source = https://www.kaggle.com/code/<ref>"
  artifacts:
    - path: "pkg/recon/sources/kaggle.go"
      provides: "KaggleSource implementing recon.ReconSource"
  key_links:
    - from: "pkg/recon/sources/kaggle.go"
      to: "pkg/recon/sources/httpclient.go"
      via: "Client.Do with req.SetBasicAuth(user, key)"
      pattern: "SetBasicAuth"
---
<objective>
Implement KaggleSource querying Kaggle's public REST API for public notebooks
(kernels). Kaggle uses HTTP Basic auth (username + API key from kaggle.json).
Purpose: RECON-CODE-09.
Output: pkg/recon/sources/kaggle.go + tests.
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/phases/10-osint-code-hosting/10-CONTEXT.md
@.planning/phases/10-osint-code-hosting/10-01-SUMMARY.md
@pkg/recon/source.go
@pkg/recon/sources/httpclient.go
<interfaces>
Kaggle API (docs: https://www.kaggle.com/docs/api):
GET https://www.kaggle.com/api/v1/kernels/list?search=<q>&pageSize=50
Auth: HTTP Basic (username:key)
Response: array of { "ref": "owner/kernel-slug", "title": "...", "author": "..." }
URL derivation: https://www.kaggle.com/code/<ref>
Rate limit: 60/min → rate.Every(1*time.Second), burst 1.
</interfaces>
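
The request and decode flow described above can be sketched with the standard library alone. This is a minimal illustration, not the planned implementation: `listKernelURLs`, `demo`, `kernel`, and `errUnauthorized` are hypothetical names, the stub server stands in for Kaggle, and the real source is expected to go through the shared `Client` in `httpclient.go` and the rate limiter rather than a bare `*http.Client`. Escaping the query with `url.QueryEscape` is also an assumption; the plan only shows `search=<q>`.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// errUnauthorized is a hypothetical stand-in for the package's ErrUnauthorized.
var errUnauthorized = errors.New("kaggle: unauthorized")

// kernel mirrors the only response field this plan relies on.
type kernel struct {
	Ref string `json:"ref"`
}

// listKernelURLs calls {base}/api/v1/kernels/list with HTTP Basic auth and
// derives one web URL per returned ref.
func listKernelURLs(ctx context.Context, hc *http.Client, base, user, key, query string) ([]string, error) {
	endpoint := fmt.Sprintf("%s/api/v1/kernels/list?search=%s&pageSize=50", base, url.QueryEscape(query))
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, key)
	resp, err := hc.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusUnauthorized {
		return nil, errUnauthorized
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("kaggle: unexpected status %d", resp.StatusCode)
	}
	var kernels []kernel
	if err := json.NewDecoder(resp.Body).Decode(&kernels); err != nil {
		return nil, err
	}
	urls := make([]string, 0, len(kernels))
	for _, k := range kernels {
		urls = append(urls, base+"/code/"+k.Ref)
	}
	return urls, nil
}

// demo runs listKernelURLs against a local stub that accepts only
// testuser:testkey, and returns the URL paths with the stub's address removed.
func demo(user, key string) ([]string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u, k, ok := r.BasicAuth()
		if !ok || u != "testuser" || k != "testkey" {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, `[{"ref":"alice/nb-one"},{"ref":"bob/nb-two"}]`)
	}))
	defer srv.Close()

	urls, err := listKernelURLs(context.Background(), srv.Client(), srv.URL, user, key, "sk-proj-")
	if err != nil {
		return nil, err
	}
	for i, u := range urls {
		urls[i] = strings.TrimPrefix(u, srv.URL)
	}
	return urls, nil
}

func main() {
	paths, err := demo("testuser", "testkey")
	fmt.Println(paths, err)
	_, err = demo("testuser", "wrong")
	fmt.Println(err)
}
```

Taking the base URL as a parameter is what lets the same function target an `httptest` server in tests and `https://www.kaggle.com` in production.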
</context>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: KaggleSource + tests</name>
<files>pkg/recon/sources/kaggle.go, pkg/recon/sources/kaggle_test.go</files>
<behavior>
- Test A: Enabled false when User empty; false when Key empty; true when both set
- Test B: Sweep sets Basic auth header via req.SetBasicAuth(user, key)
- Test C: Decodes array of {ref} → Findings with Source = baseURL + "/code/" + ref, SourceType="recon:kaggle"
- Test D: 401 → ErrUnauthorized
- Test E: Context cancellation → Sweep stops and returns the context error
- Test F: Missing creds → Sweep returns nil immediately (no HTTP calls, verified via counter=0)
</behavior>
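
Tests A and F above could be approached along the following lines. This is a sketch under assumptions: `sketchSource` is a hypothetical, trimmed-down stand-in for `KaggleSource` (the real `Sweep` signature comes from `recon.ReconSource` and is not reproduced here), and `countHits` is an illustrative helper that counts requests reaching a stub server.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// sketchSource is a hypothetical stand-in for KaggleSource with only the
// fields needed to show the credential gate.
type sketchSource struct {
	User, Key, BaseURL string
}

// Enabled reports whether both credentials are present.
func (s *sketchSource) Enabled() bool { return s.User != "" && s.Key != "" }

// Sweep returns nil without touching the network when credentials are
// missing; otherwise it issues one authenticated request.
func (s *sketchSource) Sweep(ctx context.Context) error {
	if !s.Enabled() {
		return nil
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, s.BaseURL+"/api/v1/kernels/list", nil)
	if err != nil {
		return err
	}
	req.SetBasicAuth(s.User, s.Key)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// countHits sweeps a stub server once and reports how many requests arrived.
func countHits(user, key string) (int32, error) {
	var hits int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt32(&hits, 1)
		fmt.Fprint(w, "[]")
	}))
	defer srv.Close()

	src := &sketchSource{User: user, Key: key, BaseURL: srv.URL}
	err := src.Sweep(context.Background())
	return atomic.LoadInt32(&hits), err
}

func main() {
	cases := []struct{ user, key string }{{"", "k"}, {"u", ""}, {"u", "k"}}
	for _, c := range cases {
		hits, err := countHits(c.user, c.key)
		fmt.Printf("user=%q key=%q hits=%d err=%v\n", c.user, c.key, hits, err)
	}
}
```

The atomic counter matters because the handler runs on the server's goroutine, so a plain integer would be a data race under `go test -race`.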
<action>
Create `pkg/recon/sources/kaggle.go`:
- Struct `KaggleSource { User, Key, BaseURL, WebBaseURL string; Registry *providers.Registry; Limiters *recon.LimiterRegistry; client *Client }`
- Default BaseURL `https://www.kaggle.com`, WebBaseURL same
- Name "kaggle", RateLimit rate.Every(1*time.Second), Burst 1, RespectsRobots false
- Enabled = s.User != "" && s.Key != ""
- Sweep: for each query from BuildQueries(reg, "kaggle"), build
`{base}/api/v1/kernels/list?search=<q>&pageSize=50`, call req.SetBasicAuth(User, Key),
client.Do, decode ``[]struct{ Ref string `json:"ref"` }``, emit Findings
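- Compile-time assert that KaggleSource implements recon.ReconSource, e.g. `var _ recon.ReconSource = (*KaggleSource)(nil)`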
Create `pkg/recon/sources/kaggle_test.go`:
- httptest server that validates Authorization header starts with "Basic " and
decodes to "testuser:testkey"
- Returns JSON array with 2 refs
- Assert 2 Findings with expected Source URLs
- Missing-creds test: Sweep returns nil, handler never called (use atomic counter)
- 401 and cancellation tests
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/recon/sources/ -run TestKaggle -v -timeout 30s</automated>
</verify>
<done>
KaggleSource passes all tests, implements ReconSource.
</done>
</task>
</tasks>
<verification>
- `go test ./pkg/recon/sources/ -run TestKaggle -v`
</verification>
<success_criteria>
RECON-CODE-09 satisfied.
</success_criteria>
<output>
After completion, create `.planning/phases/10-osint-code-hosting/10-08-SUMMARY.md`.
</output>