---
phase: 10-osint-code-hosting
plan: 08
subsystem: recon
tags: [kaggle, osint, http-basic-auth, httptest]

requires:
  - phase: 10-osint-code-hosting
    provides: "recon.ReconSource interface, sources.Client, BuildQueries, LimiterRegistry (Plan 10-01)"

provides:
  - "KaggleSource implementing recon.ReconSource against Kaggle /api/v1/kernels/list"
  - "HTTP Basic auth wiring via req.SetBasicAuth(user, key)"
  - "Finding normalization to Source=web + /code/ + ref, SourceType=recon:kaggle"

affects: [10-09-register, 10-full-integration]

tech-stack:
  added: []
  patterns:
    - "Basic-auth recon source pattern (user + key) as counterpart to bearer-token sources"
    - "Credential-gated Sweep: return nil without HTTP when either credential missing"

key-files:
  created:
    - pkg/recon/sources/kaggle.go
    - pkg/recon/sources/kaggle_test.go
  modified: []

key-decisions:
  - "Short-circuit Sweep with nil error when User or Key is empty (no HTTP, no log spam)"
  - "kaggleKernel decoder ignores non-ref fields so API additions don't break decode"
  - "Ignore decode errors and continue to next query (downgrade, not abort); matches GitHubSource pattern"

patterns-established:
  - "Basic auth: req.SetBasicAuth(s.User, s.Key) after NewRequestWithContext"
  - "Web URL derivation from API ref: web + /code/ + ref"

requirements-completed: [RECON-CODE-09]

duration: 8min
completed: 2026-04-05
---

# Phase 10 Plan 08: KaggleSource Summary

**KaggleSource emits Findings from Kaggle public notebook search via HTTP Basic auth against /api/v1/kernels/list**

## Performance

- **Duration:** ~8 min
- **Tasks:** 1 (TDD)
- **Files created:** 2

## Accomplishments

- KaggleSource type implementing recon.ReconSource (Name, RateLimit, Burst, RespectsRobots, Enabled, Sweep)
- Credential-gated: both User AND Key are required; if either is missing, Sweep returns nil with zero HTTP calls
- HTTP Basic auth wired via req.SetBasicAuth to Kaggle's /api/v1/kernels/list endpoint
- Findings normalized with SourceType "recon:kaggle" and Source = WebBaseURL + "/code/" + ref
- 60 req/min rate limit via rate.Every(1*time.Second), burst 1, honoring the per-source LimiterRegistry
- Compile-time interface assertion: `var _ recon.ReconSource = (*KaggleSource)(nil)`

## Task Commits

1. **Task 1: KaggleSource + tests (TDD)**: `243b740` (feat)

## Files Created

- `pkg/recon/sources/kaggle.go`: KaggleSource implementation, kaggleKernel decoder, interface assertion
- `pkg/recon/sources/kaggle_test.go`: 6 httptest-driven tests

## Test Coverage

| Test | Covers |
|------|--------|
| TestKaggle_Enabled | All 4 credential combinations (empty/empty, user-only, key-only, both) |
| TestKaggle_Sweep_BasicAuthAndFindings | Authorization header decodes to testuser:testkey; 2 refs → 2 Findings with correct Source URLs and recon:kaggle SourceType |
| TestKaggle_Sweep_MissingCredentials_NoHTTP | Atomic counter verifies zero HTTP calls when either User or Key is empty |
| TestKaggle_Sweep_Unauthorized | 401 response wrapped as ErrUnauthorized |
| TestKaggle_Sweep_CtxCancellation | Pre-cancelled ctx returns context.Canceled promptly |
| TestKaggle_ReconSourceInterface | Compile-time and runtime assertions on Name, Burst, RespectsRobots, RateLimit |

All 6 tests pass in isolation: `go test ./pkg/recon/sources/ -run TestKaggle -v`

## Decisions Made

- **Missing-cred behavior:** Sweep returns nil (no error) when either credential is absent. This matches the GitHubSource pattern: disabled sources are logged and skipped at the Engine level rather than returning an error.
- **Decode tolerance:** The kaggleKernel struct declares only `Ref string`. Other fields (title, author, language) are silently discarded, so upstream API additions don't break the source.
- **Error downgrade:** Non-401 HTTP errors skip to the next query rather than aborting the whole sweep. A 401 is the only hard-fail case, because it means the credentials are actually invalid rather than that the failure is transient.
- **Dual BaseURL fields:** BaseURL (API) and WebBaseURL (Finding URL stem) are separate struct fields, so tests can point BaseURL at httptest.NewServer while WebBaseURL stays at the production kaggle.com domain for stable assertions.

## Deviations from Plan

None. The plan was executed exactly as written. All `must_haves` truths from the plan frontmatter are satisfied:

- KaggleSource queries `/api/v1/kernels/list` with Basic auth → TestKaggle_Sweep_BasicAuthAndFindings
- Disabled when either credential is empty → TestKaggle_Enabled + TestKaggle_Sweep_MissingCredentials_NoHTTP
- Findings tagged recon:kaggle with Source = web + /code/ + ref → TestKaggle_Sweep_BasicAuthAndFindings

## Issues Encountered

- **Sibling-wave file churn:** During testing, sibling Wave 2 plans (10-02 GitHub, 10-05 Replit, 10-07 CodeSandbox, 10-03 GitLab) had already dropped partial files into `pkg/recon/sources/` in the main repo. A stray `github_test.go` without a matching `github.go` broke package compilation. This was resolved by running the tests in this plan's git worktree, where only kaggle.go and kaggle_test.go sit alongside the Plan 10-01 scaffolding. No cross-plan changes were made, so the scope boundary was respected. The final wave merge will reconcile all sibling files together.

## Next Phase Readiness

- KaggleSource is ready for registration in Plan 10-09 (`RegisterAll` wiring).
- No blockers for downstream plans. RECON-CODE-09 is satisfied.

## Self-Check: PASSED

- File exists: `pkg/recon/sources/kaggle.go` (FOUND)
- File exists: `pkg/recon/sources/kaggle_test.go` (FOUND)
- Commit exists: `243b740` (FOUND: feat(10-08): add KaggleSource with HTTP Basic auth)
- Tests pass: 6/6 TestKaggle_* (verified with sibling files stashed to isolate the package build)

---
*Phase: 10-osint-code-hosting*
*Plan: 08*
*Completed: 2026-04-05*