keyhunter/.planning/phases/10-osint-code-hosting/10-08-SUMMARY.md
2026-04-06 01:16:24 +03:00


phase: 10-osint-code-hosting
plan: 08
subsystem: recon
tags: kaggle, osint, http-basic-auth, httptest
requires:
  • phase: 10-osint-code-hosting
    provides: recon.ReconSource interface, sources.Client, BuildQueries, LimiterRegistry (Plan 10-01)
provides:
  • KaggleSource implementing recon.ReconSource against Kaggle /api/v1/kernels/list
  • HTTP Basic auth wiring via req.SetBasicAuth(user, key)
  • Finding normalization to Source=<web>/code/<ref>, SourceType=recon:kaggle
affects: 10-09-register, 10-full-integration
tech-stack:
  added: (none)
  patterns:
  • Basic-auth recon source pattern (user + key) as counterpart to bearer-token sources
  • Credential-gated Sweep: return nil without HTTP when either credential missing
key-files:
  created:
  • pkg/recon/sources/kaggle.go
  • pkg/recon/sources/kaggle_test.go
  modified: (none)
key-decisions:
  • Short-circuit Sweep with nil error when User or Key is empty: no HTTP, no log spam
  • kaggleKernel decoder ignores non-ref fields so API additions don't break decode
  • Ignore decode errors and continue to next query (downgrade, not abort), matching the GitHubSource pattern
patterns-established:
  • Basic auth: req.SetBasicAuth(s.User, s.Key) after NewRequestWithContext
  • Web URL derivation from API ref: web + /code/ + ref
requirements-completed: RECON-CODE-09
duration: 8min
completed: 2026-04-05

Phase 10 Plan 08: KaggleSource Summary

KaggleSource emits Findings from Kaggle public notebook search via HTTP Basic auth against /api/v1/kernels/list

Performance

  • Duration: ~8 min
  • Tasks: 1 (TDD)
  • Files created: 2

Accomplishments

  • KaggleSource type implementing recon.ReconSource (Name, RateLimit, Burst, RespectsRobots, Enabled, Sweep)
  • Credentials-gated: both User AND Key required; missing either returns nil with zero HTTP calls
  • HTTP Basic auth wired via req.SetBasicAuth to Kaggle's /api/v1/kernels/list endpoint
  • Findings normalized with SourceType "recon:kaggle" and Source = WebBaseURL + "/code/" + ref
  • 60 req/min rate limit via rate.Every(1*time.Second), burst 1, honoring per-source LimiterRegistry
  • Compile-time interface assertion: var _ recon.ReconSource = (*KaggleSource)(nil)
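The accomplishments above can be sketched roughly as follows. This is a minimal illustration, not the contents of kaggle.go: the `reconSource` interface and `Finding` struct are reduced, hypothetical stand-ins for the project's real types, the `Name()` return value is assumed, and `basicAuthOf` is a helper invented only to make the auth wiring observable.

```go
package main

import (
	"context"
	"net/http"
)

// reconSource is a hypothetical, reduced stand-in for recon.ReconSource.
type reconSource interface {
	Name() string
	Enabled() bool
}

// Finding is a hypothetical two-field stand-in for the project's finding type.
type Finding struct {
	Source     string
	SourceType string
}

// KaggleSource carries the credential pair and the two base URLs.
type KaggleSource struct {
	User       string
	Key        string
	BaseURL    string // API host, swappable for a test server
	WebBaseURL string // stem used when building Finding URLs
}

// Compile-time check that *KaggleSource satisfies the interface.
var _ reconSource = (*KaggleSource)(nil)

// Name returns the source name (assumed value).
func (s *KaggleSource) Name() string { return "kaggle" }

// Enabled reports true only when both credentials are set.
func (s *KaggleSource) Enabled() bool { return s.User != "" && s.Key != "" }

// newListRequest builds the GET request and attaches HTTP Basic auth.
func (s *KaggleSource) newListRequest(ctx context.Context) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, s.BaseURL+"/api/v1/kernels/list", nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(s.User, s.Key)
	return req, nil
}

// findingFor maps an API ref to a normalized Finding.
func (s *KaggleSource) findingFor(ref string) Finding {
	return Finding{Source: s.WebBaseURL + "/code/" + ref, SourceType: "recon:kaggle"}
}

// basicAuthOf is a demo helper: it builds a request and reads back the
// credentials that SetBasicAuth encoded into the Authorization header.
func basicAuthOf(s *KaggleSource) (user, key string, ok bool) {
	req, err := s.newListRequest(context.Background())
	if err != nil {
		return "", "", false
	}
	return req.BasicAuth()
}
```

Keeping `Enabled` as the single gate means a sweep can return early before any request is built, which is what makes the zero-HTTP guarantee easy to test.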

Task Commits

  1. Task 1: KaggleSource + tests (TDD): 243b740 (feat)

Files Created

  • pkg/recon/sources/kaggle.go — KaggleSource implementation, kaggleKernel decoder, interface assertion
  • pkg/recon/sources/kaggle_test.go — 6 httptest-driven tests

Test Coverage

| Test | Covers |
|------|--------|
| TestKaggle_Enabled | All 4 credential combinations (empty/empty, user-only, key-only, both) |
| TestKaggle_Sweep_BasicAuthAndFindings | Authorization header decoded as testuser:testkey, 2 refs → 2 Findings with correct Source URLs and recon:kaggle SourceType |
| TestKaggle_Sweep_MissingCredentials_NoHTTP | Atomic counter verifies zero HTTP calls when either User or Key empty |
| TestKaggle_Sweep_Unauthorized | 401 response wrapped as ErrUnauthorized |
| TestKaggle_Sweep_CtxCancellation | Pre-cancelled ctx returns context.Canceled promptly |
| TestKaggle_ReconSourceInterface | Compile + runtime assertions on Name, Burst, RespectsRobots, RateLimit |

All 6 tests pass in isolation: go test ./pkg/recon/sources/ -run TestKaggle -v
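The request-counting and header-decoding checks in the tests above might look something like the sketch below. It is not the project's test file: `listKernels` is a hypothetical stand-in for the HTTP part of Sweep, and `probe` is an invented helper that runs it against a throwaway `httptest` server and reports what the server saw.

```go
package main

import (
	"context"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// listKernels is a hypothetical stand-in for the HTTP part of Sweep.
// It returns the response status, or 0 when no request was sent.
func listKernels(ctx context.Context, baseURL, user, key string) (int, error) {
	if user == "" || key == "" {
		return 0, nil // credential gate: return before any HTTP work
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/api/v1/kernels/list", nil)
	if err != nil {
		return 0, err
	}
	req.SetBasicAuth(user, key)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// probe calls listKernels against a local test server and reports how many
// requests arrived, which credentials the server decoded, and the status.
func probe(user, key string) (calls int64, gotUser, gotKey string, status int) {
	var hits int64
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(&hits, 1)
		gotUser, gotKey, _ = r.BasicAuth()
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`[]`))
	}))
	status, _ = listKernels(context.Background(), srv.URL, user, key)
	srv.Close() // blocks until outstanding handlers finish
	return atomic.LoadInt64(&hits), gotUser, gotKey, status
}
```

Counting hits on the server side, rather than inspecting the client, is what lets a test assert that the missing-credential path sent nothing at all.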

Decisions Made

  • Missing-cred behavior: Sweep returns nil (no error) when either credential is absent. This matches the GitHubSource pattern: disabled sources are logged and skipped at the Engine level instead of returning an error.
  • Decode tolerance: kaggleKernel struct only declares Ref string. Other fields (title, author, language) are silently discarded so upstream API changes don't break the source.
  • Error downgrade: Non-401 HTTP errors skip to the next query rather than aborting the whole sweep. 401 is the only hard-fail case because it signals invalid credentials rather than a transient failure.
  • Dual BaseURL fields: BaseURL (API) and WebBaseURL (Finding URL stem) are separate struct fields so tests can point BaseURL at httptest.NewServer while WebBaseURL stays at the production kaggle.com domain for assertion stability.
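The decode-tolerance and error-downgrade decisions can be illustrated with a short sketch. It rests on assumptions: the response is taken to be a top-level JSON array, the `json:"ref"` tag is a guess at the wire name, `ErrUnauthorized` here is a local stand-in for the project's sentinel, and `classify` is a hypothetical helper rather than a function from kaggle.go.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/http"
)

// ErrUnauthorized is a local stand-in for the project's sentinel error.
var ErrUnauthorized = errors.New("unauthorized")

// kaggleKernel declares only Ref; encoding/json silently drops any
// other fields present in the payload.
type kaggleKernel struct {
	Ref string `json:"ref"`
}

// decodeRefs extracts the ref of every kernel in the response body.
func decodeRefs(body []byte) ([]string, error) {
	var kernels []kaggleKernel
	if err := json.Unmarshal(body, &kernels); err != nil {
		return nil, err
	}
	refs := make([]string, 0, len(kernels))
	for _, k := range kernels {
		refs = append(refs, k.Ref)
	}
	return refs, nil
}

// classify turns a status code into a sweep decision:
// 401 aborts, any other non-200 skips this query, 200 proceeds.
func classify(status int) (skip bool, err error) {
	switch {
	case status == http.StatusUnauthorized:
		return false, ErrUnauthorized
	case status != http.StatusOK:
		return true, nil
	default:
		return false, nil
	}
}
```

Separating the decision from the request loop keeps the hard-fail case in one place, so adding another fatal status later would be a one-line change.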

Deviations from Plan

None — plan executed exactly as written. All truths from frontmatter (must_haves) satisfied:

  • KaggleSource queries /api/v1/kernels/list with Basic auth → TestKaggle_Sweep_BasicAuthAndFindings
  • Disabled when either credential empty → TestKaggle_Enabled + TestKaggle_Sweep_MissingCredentials_NoHTTP
  • Findings tagged recon:kaggle with Source = web + /code/ + ref → TestKaggle_Sweep_BasicAuthAndFindings

Issues Encountered

  • Sibling-wave file churn: During testing, sibling Wave 2 plans (10-02 GitHub, 10-05 Replit, 10-07 CodeSandbox, 10-03 GitLab) had already dropped partial files into pkg/recon/sources/ in the main repo. A stray github_test.go with no github.go broke package compilation. Resolved by running tests in this plan's git worktree where only kaggle.go and kaggle_test.go are present alongside the Plan 10-01 scaffolding. No cross-plan changes made — scope boundary respected. Final wave merge will resolve all sibling files together.

Next Phase Readiness

  • KaggleSource is ready for registration in Plan 10-09 (RegisterAll wiring).
  • No blockers for downstream plans. RECON-CODE-09 satisfied.

Self-Check: PASSED

  • File exists: pkg/recon/sources/kaggle.go — FOUND
  • File exists: pkg/recon/sources/kaggle_test.go — FOUND
  • Commit exists: 243b740 — FOUND (feat(10-08): add KaggleSource with HTTP Basic auth)
  • Tests pass: 6/6 TestKaggle_* (verified with sibling files stashed to isolate package build)

Phase: 10-osint-code-hosting Plan: 08 Completed: 2026-04-05