From 792ac8d54bb6cbd6c1059cc3b4870cf43f5b14fa Mon Sep 17 00:00:00 2001
From: salvacybersec
Date: Mon, 6 Apr 2026 01:16:24 +0300
Subject: [PATCH] docs(10-08): complete KaggleSource plan

---
 .../10-osint-code-hosting/10-08-SUMMARY.md | 117 ++++++++++++++++++
 1 file changed, 117 insertions(+)
 create mode 100644 .planning/phases/10-osint-code-hosting/10-08-SUMMARY.md

diff --git a/.planning/phases/10-osint-code-hosting/10-08-SUMMARY.md b/.planning/phases/10-osint-code-hosting/10-08-SUMMARY.md
new file mode 100644
index 0000000..a5b0c02
--- /dev/null
+++ b/.planning/phases/10-osint-code-hosting/10-08-SUMMARY.md
@@ -0,0 +1,117 @@
+---
+phase: 10-osint-code-hosting
+plan: 08
+subsystem: recon
+tags: [kaggle, osint, http-basic-auth, httptest]
+
+requires:
+  - phase: 10-osint-code-hosting
+    provides: "recon.ReconSource interface, sources.Client, BuildQueries, LimiterRegistry (Plan 10-01)"
+provides:
+  - "KaggleSource implementing recon.ReconSource against Kaggle /api/v1/kernels/list"
+  - "HTTP Basic auth wiring via req.SetBasicAuth(user, key)"
+  - "Finding normalization to Source=/code/, SourceType=recon:kaggle"
+affects: [10-09-register, 10-full-integration]
+
+tech-stack:
+  added: []
+  patterns:
+    - "Basic-auth recon source pattern (user + key) as counterpart to bearer-token sources"
+    - "Credential-gated Sweep: return nil without HTTP when either credential missing"
+
+key-files:
+  created:
+    - pkg/recon/sources/kaggle.go
+    - pkg/recon/sources/kaggle_test.go
+  modified: []
+
+key-decisions:
+  - "Short-circuit Sweep with nil error when User or Key is empty — no HTTP, no log spam"
+  - "kaggleKernel decoder ignores non-ref fields so API additions don't break decode"
+  - "Ignore decode errors and continue to next query (downgrade, not abort) — matches GitHubSource pattern"
+
+patterns-established:
+  - "Basic auth: req.SetBasicAuth(s.User, s.Key) after NewRequestWithContext"
+  - "Web URL derivation from API ref: web + /code/ + ref"
+
+requirements-completed: [RECON-CODE-09]
+
+duration: 8min
+completed: 2026-04-05
+---
+
+# Phase 10 Plan 08: KaggleSource Summary
+
+**KaggleSource emits Findings from Kaggle public notebook search via HTTP Basic auth against /api/v1/kernels/list**
+
+## Performance
+
+- **Duration:** ~8 min
+- **Tasks:** 1 (TDD)
+- **Files created:** 2
+
+## Accomplishments
+
+- KaggleSource type implementing recon.ReconSource (Name, RateLimit, Burst, RespectsRobots, Enabled, Sweep)
+- Credentials-gated: both User AND Key required; missing either returns nil with zero HTTP calls
+- HTTP Basic auth wired via req.SetBasicAuth to Kaggle's /api/v1/kernels/list endpoint
+- Findings normalized with SourceType "recon:kaggle" and Source = WebBaseURL + "/code/" + ref
+- 60 req/min rate limit via rate.Every(1*time.Second), burst 1, honoring per-source LimiterRegistry
+- Compile-time interface assertion: `var _ recon.ReconSource = (*KaggleSource)(nil)`
+
+## Task Commits
+
+1. **Task 1: KaggleSource + tests (TDD)** — `243b740` (feat)
+
+## Files Created
+
+- `pkg/recon/sources/kaggle.go` — KaggleSource implementation, kaggleKernel decoder, interface assertion
+- `pkg/recon/sources/kaggle_test.go` — 6 httptest-driven tests
+
+## Test Coverage
+
+| Test | Covers |
+|------|--------|
+| TestKaggle_Enabled | All 4 credential combinations (empty/empty, user-only, key-only, both) |
+| TestKaggle_Sweep_BasicAuthAndFindings | Authorization header decoded as testuser:testkey, 2 refs → 2 Findings with correct Source URLs and recon:kaggle SourceType |
+| TestKaggle_Sweep_MissingCredentials_NoHTTP | Atomic counter verifies zero HTTP calls when either User or Key empty |
+| TestKaggle_Sweep_Unauthorized | 401 response wrapped as ErrUnauthorized |
+| TestKaggle_Sweep_CtxCancellation | Pre-cancelled ctx returns context.Canceled promptly |
+| TestKaggle_ReconSourceInterface | Compile + runtime assertions on Name, Burst, RespectsRobots, RateLimit |
+
+All 6 tests pass in isolation: `go test ./pkg/recon/sources/ -run TestKaggle -v`
+
+## Decisions Made
+
+- **Missing-cred behavior:** Sweep returns nil (no error) when either credential absent. Matches GitHubSource pattern — disabled sources log-and-skip at the Engine level, not error out.
+- **Decode tolerance:** kaggleKernel struct only declares `Ref string`. Other fields (title, author, language) are silently discarded so upstream API changes don't break the source.
+- **Error downgrade:** Non-401 HTTP errors skip to next query rather than aborting the whole sweep. 401 is the only hard-fail case because it means credentials are actually invalid, not transient.
+- **Dual BaseURL fields:** BaseURL (API) and WebBaseURL (Finding URL stem) are separate struct fields so tests can point BaseURL at httptest.NewServer while WebBaseURL stays at the production kaggle.com domain for assertion stability.
+
+## Deviations from Plan
+
+None — plan executed exactly as written. All truths from frontmatter (`must_haves`) satisfied:
+- KaggleSource queries `/api/v1/kernels/list` with Basic auth → TestKaggle_Sweep_BasicAuthAndFindings
+- Disabled when either credential empty → TestKaggle_Enabled + TestKaggle_Sweep_MissingCredentials_NoHTTP
+- Findings tagged recon:kaggle with Source = web + /code/ + ref → TestKaggle_Sweep_BasicAuthAndFindings
+
+## Issues Encountered
+
+- **Sibling-wave file churn:** During testing, sibling Wave 2 plans (10-02 GitHub, 10-05 Replit, 10-07 CodeSandbox, 10-03 GitLab) had already dropped partial files into `pkg/recon/sources/` in the main repo. A stray `github_test.go` with no `github.go` broke package compilation. Resolved by running tests in this plan's git worktree where only kaggle.go and kaggle_test.go are present alongside the Plan 10-01 scaffolding. No cross-plan changes made — scope boundary respected. Final wave merge will resolve all sibling files together.
+
+## Next Phase Readiness
+
+- KaggleSource is ready for registration in Plan 10-09 (`RegisterAll` wiring).
+- No blockers for downstream plans. RECON-CODE-09 satisfied.
+
+## Self-Check: PASSED
+
+- File exists: `pkg/recon/sources/kaggle.go` — FOUND
+- File exists: `pkg/recon/sources/kaggle_test.go` — FOUND
+- Commit exists: `243b740` — FOUND (feat(10-08): add KaggleSource with HTTP Basic auth)
+- Tests pass: 6/6 TestKaggle_* (verified with sibling files stashed to isolate package build)
+
+---
+*Phase: 10-osint-code-hosting*
+*Plan: 08*
+*Completed: 2026-04-05*
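
Reviewer note: the summary describes a credential-gated Sweep that sends HTTP Basic auth to `/api/v1/kernels/list`, decodes only `ref`, and derives Finding URLs as WebBaseURL + "/code/" + ref. The sketch below illustrates that pattern. It is not the contents of `kaggle.go`. The names `source`, `kernel`, `errUnauthorized`, and `demoSweep`, the `search` query parameter, and the JSON array response shape are all assumptions made for illustration.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"sync/atomic"
)

// errUnauthorized is a hypothetical stand-in for the ErrUnauthorized sentinel
// mentioned in the summary.
var errUnauthorized = errors.New("kaggle: unauthorized")

// kernel declares only Ref, so any other JSON fields are dropped on decode.
type kernel struct {
	Ref string `json:"ref"`
}

// source is a hypothetical, trimmed-down stand-in for KaggleSource.
type source struct {
	User, Key  string
	BaseURL    string // API base; tests point this at an httptest server
	WebBaseURL string // stem used to build Finding URLs
}

// sweep runs one query and returns the derived web URLs.
func (s source) sweep(ctx context.Context, query string) ([]string, error) {
	// Credential gate: with either value missing, return nil without any HTTP call.
	if s.User == "" || s.Key == "" {
		return nil, nil
	}
	// The "search" parameter name is an assumption, not taken from the summary.
	endpoint := s.BaseURL + "/api/v1/kernels/list?search=" + url.QueryEscape(query)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(s.User, s.Key)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err // includes context cancellation
	}
	defer resp.Body.Close()
	// 401 is the only hard failure; other statuses and decode errors are downgraded.
	if resp.StatusCode == http.StatusUnauthorized {
		return nil, errUnauthorized
	}
	var kernels []kernel
	if resp.StatusCode != http.StatusOK || json.NewDecoder(resp.Body).Decode(&kernels) != nil {
		return nil, nil
	}
	urls := make([]string, 0, len(kernels))
	for _, k := range kernels {
		if k.Ref != "" {
			urls = append(urls, s.WebBaseURL+"/code/"+k.Ref)
		}
	}
	return urls, nil
}

// demoSweep runs sweep against a fake server that accepts testuser:testkey and
// reports the derived URLs, the number of requests the server saw, and the error.
func demoSweep(user, key string) ([]string, int, error) {
	var hits int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt32(&hits, 1)
		u, k, ok := r.BasicAuth()
		if !ok || u != "testuser" || k != "testkey" {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, `[{"ref":"alice/notebook-one","title":"ignored"},{"ref":"bob/notebook-two"}]`)
	}))
	defer srv.Close()
	s := source{User: user, Key: key, BaseURL: srv.URL, WebBaseURL: "https://www.kaggle.com"}
	urls, err := s.sweep(context.Background(), "api key")
	return urls, int(atomic.LoadInt32(&hits)), err
}

func main() {
	urls, hits, err := demoSweep("testuser", "testkey")
	fmt.Println(urls, hits, err)
}
```

Keeping the API base and the web stem as separate fields is what lets the fake server stand in for the API while the derived URLs stay on the production domain, which is the rationale the summary gives for the dual BaseURL fields.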