Merge branch 'worktree-agent-ad7ef8d3'

salvacybersec
2026-04-06 01:20:33 +03:00
3 changed files with 470 additions and 0 deletions

@@ -0,0 +1,117 @@
---
phase: 10-osint-code-hosting
plan: 08
subsystem: recon
tags: [kaggle, osint, http-basic-auth, httptest]
requires:
- phase: 10-osint-code-hosting
provides: "recon.ReconSource interface, sources.Client, BuildQueries, LimiterRegistry (Plan 10-01)"
provides:
- "KaggleSource implementing recon.ReconSource against Kaggle /api/v1/kernels/list"
- "HTTP Basic auth wiring via req.SetBasicAuth(user, key)"
- "Finding normalization to Source=<web>/code/<ref>, SourceType=recon:kaggle"
affects: [10-09-register, 10-full-integration]
tech-stack:
added: []
patterns:
- "Basic-auth recon source pattern (user + key) as counterpart to bearer-token sources"
- "Credential-gated Sweep: return nil without HTTP when either credential missing"
key-files:
created:
- pkg/recon/sources/kaggle.go
- pkg/recon/sources/kaggle_test.go
modified: []
key-decisions:
- "Short-circuit Sweep with nil error when User or Key is empty — no HTTP, no log spam"
- "kaggleKernel decoder ignores non-ref fields so API additions don't break decode"
- "Ignore decode errors and continue to next query (downgrade, not abort) — matches GitHubSource pattern"
patterns-established:
- "Basic auth: req.SetBasicAuth(s.User, s.Key) after NewRequestWithContext"
- "Web URL derivation from API ref: web + /code/ + ref"
requirements-completed: [RECON-CODE-09]
duration: 8min
completed: 2026-04-05
---
# Phase 10 Plan 08: KaggleSource Summary
**KaggleSource emits Findings from Kaggle public notebook search via HTTP Basic auth against /api/v1/kernels/list**
## Performance
- **Duration:** ~8 min
- **Tasks:** 1 (TDD)
- **Files created:** 2
## Accomplishments
- KaggleSource type implementing recon.ReconSource (Name, RateLimit, Burst, RespectsRobots, Enabled, Sweep)
- Credentials-gated: both User AND Key required; missing either returns nil with zero HTTP calls
- HTTP Basic auth wired via req.SetBasicAuth to Kaggle's /api/v1/kernels/list endpoint
- Findings normalized with SourceType "recon:kaggle" and Source = WebBaseURL + "/code/" + ref
- 60 req/min rate limit via rate.Every(1*time.Second), burst 1, honoring per-source LimiterRegistry
- Compile-time interface assertion: `var _ recon.ReconSource = (*KaggleSource)(nil)`
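The credential gate, Basic auth wiring, and Finding URL derivation listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of `kaggle.go`: the trimmed struct, the helper names `newListRequest`, `findingURL`, and `observedAuth`, the `search` query parameter, and the API base URL used in `main` are all assumptions made for the example.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/url"
)

// KaggleSource is a trimmed, hypothetical stand-in for the real type.
// Only the fields named in this summary are included.
type KaggleSource struct {
	User       string // Kaggle username
	Key        string // Kaggle API key
	BaseURL    string // API base URL (tests point this at a fake server)
	WebBaseURL string // stem used when building Finding URLs
}

// Enabled reports whether both credentials are present.
func (s *KaggleSource) Enabled() bool {
	return s.User != "" && s.Key != ""
}

// newListRequest builds the authenticated kernels/list request.
// It returns (nil, nil) when a credential is missing, mirroring the
// "no HTTP, no error" gate. The "search" parameter name is an assumption.
func (s *KaggleSource) newListRequest(ctx context.Context, query string) (*http.Request, error) {
	if !s.Enabled() {
		return nil, nil
	}
	endpoint := s.BaseURL + "/api/v1/kernels/list?search=" + url.QueryEscape(query)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(s.User, s.Key)
	return req, nil
}

// findingURL derives the public notebook URL from an API ref.
func (s *KaggleSource) findingURL(ref string) string {
	return s.WebBaseURL + "/code/" + ref
}

// observedAuth builds a request and reads back the credentials it carries.
// ok is false when the source is gated off or the request could not be built.
func observedAuth(s *KaggleSource) (user, key string, ok bool) {
	req, err := s.newListRequest(context.Background(), "example query")
	if err != nil || req == nil {
		return "", "", false
	}
	return req.BasicAuth()
}

func main() {
	s := &KaggleSource{
		User:       "testuser",
		Key:        "testkey",
		BaseURL:    "https://www.kaggle.com", // assumed API host
		WebBaseURL: "https://www.kaggle.com",
	}
	fmt.Println(observedAuth(s))
	fmt.Println(s.findingURL("alice/notebook"))
}
```

No network call is made here; the request is only constructed and inspected, which keeps the sketch independent of Kaggle's actual response format.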
## Task Commits
1. **Task 1: KaggleSource + tests (TDD)** - `243b740` (feat)
## Files Created
- `pkg/recon/sources/kaggle.go` — KaggleSource implementation, kaggleKernel decoder, interface assertion
- `pkg/recon/sources/kaggle_test.go` — 6 httptest-driven tests
## Test Coverage
| Test | Covers |
|------|--------|
| TestKaggle_Enabled | All 4 credential combinations (empty/empty, user-only, key-only, both) |
| TestKaggle_Sweep_BasicAuthAndFindings | Authorization header decoded as testuser:testkey, 2 refs → 2 Findings with correct Source URLs and recon:kaggle SourceType |
| TestKaggle_Sweep_MissingCredentials_NoHTTP | Atomic counter verifies zero HTTP calls when either User or Key empty |
| TestKaggle_Sweep_Unauthorized | 401 response wrapped as ErrUnauthorized |
| TestKaggle_Sweep_CtxCancellation | Pre-cancelled ctx returns context.Canceled promptly |
| TestKaggle_ReconSourceInterface | Compile + runtime assertions on Name, Burst, RespectsRobots, RateLimit |
All 6 tests pass in isolation: `go test ./pkg/recon/sources/ -run TestKaggle -v`
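The two httptest techniques in the table (decoding the Authorization header on the server side and counting hits atomically) might look roughly like the sketch below. It is written as a standalone program rather than a `_test.go` file so it can run on its own; `runProbe` and the canned JSON body are hypothetical and are not taken from `kaggle_test.go`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// runProbe starts a fake API server, issues at most one authenticated
// request, and reports how many requests the server saw plus the
// Basic auth pair it decoded. With a missing credential it makes no call.
func runProbe(user, key string) (int32, string, string, error) {
	var hits int32
	seen := make(chan [2]string, 1) // carries the decoded credentials out of the handler

	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt32(&hits, 1)
		u, k, _ := r.BasicAuth()
		seen <- [2]string{u, k}
		fmt.Fprint(w, `[{"ref":"alice/nb-one"},{"ref":"bob/nb-two"}]`)
	}))
	defer srv.Close()

	// Credential gate: return before any HTTP traffic.
	if user == "" || key == "" {
		return atomic.LoadInt32(&hits), "", "", nil
	}

	req, err := http.NewRequest(http.MethodGet, srv.URL+"/api/v1/kernels/list", nil)
	if err != nil {
		return 0, "", "", err
	}
	req.SetBasicAuth(user, key)

	resp, err := srv.Client().Do(req)
	if err != nil {
		return 0, "", "", err
	}
	resp.Body.Close()

	pair := <-seen
	return atomic.LoadInt32(&hits), pair[0], pair[1], nil
}

func main() {
	fmt.Println(runProbe("testuser", "testkey"))
	fmt.Println(runProbe("testuser", ""))
}
```

Passing the observed credentials through a channel, rather than writing to shared variables from the handler, keeps the sketch free of unsynchronized access between the server goroutine and the caller.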
## Decisions Made
- **Missing-cred behavior:** Sweep returns nil (no error) when either credential is absent. This matches the GitHubSource pattern: disabled sources are logged and skipped at the Engine level rather than raising an error.
- **Decode tolerance:** kaggleKernel struct only declares `Ref string`. Other fields (title, author, language) are silently discarded so upstream API changes don't break the source.
- **Error downgrade:** Non-401 HTTP errors skip to the next query rather than aborting the whole sweep. 401 is the only hard-fail case because it signals invalid credentials rather than a transient failure.
- **Dual BaseURL fields:** BaseURL (API) and WebBaseURL (Finding URL stem) are separate struct fields so tests can point BaseURL at httptest.NewServer while WebBaseURL stays at the production kaggle.com domain for assertion stability.
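As an illustration of the decode-tolerance and error-downgrade decisions above, the following sketch classifies a response by status and decodes only the `ref` field. The function `refsFromResponse`, the local `errUnauthorized` sentinel (standing in for the project's ErrUnauthorized), the `json:"ref"` tag, and the skipping of empty refs are assumptions for the example, not the actual implementation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// errUnauthorized is a local stand-in for the project's ErrUnauthorized sentinel.
var errUnauthorized = errors.New("unauthorized")

// kaggleKernel declares only the field the source needs. encoding/json
// ignores unknown object keys by default, so extra API fields are harmless.
type kaggleKernel struct {
	Ref string `json:"ref"`
}

// refsFromResponse applies the downgrade policy to one response:
//   - 401 is a hard failure
//   - any other non-200 status yields no refs and no error
//   - a body that fails to decode also yields no refs and no error
//
// In the last two cases the caller simply moves on to the next query.
func refsFromResponse(status int, body string) ([]string, error) {
	if status == http.StatusUnauthorized {
		return nil, errUnauthorized
	}
	if status != http.StatusOK {
		return nil, nil
	}
	var kernels []kaggleKernel
	if err := json.NewDecoder(strings.NewReader(body)).Decode(&kernels); err != nil {
		return nil, nil
	}
	refs := make([]string, 0, len(kernels))
	for _, k := range kernels {
		if k.Ref != "" { // skipping empty refs is an assumption of this sketch
			refs = append(refs, k.Ref)
		}
	}
	return refs, nil
}

func main() {
	body := `[{"ref":"alice/nb","title":"ignored","language":"python"}]`
	fmt.Println(refsFromResponse(http.StatusOK, body))
	fmt.Println(refsFromResponse(http.StatusInternalServerError, ""))
	fmt.Println(refsFromResponse(http.StatusUnauthorized, ""))
}
```

Returning a nil error for transient failures keeps the loop over queries simple: only the unauthorized sentinel needs to stop the sweep.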
## Deviations from Plan
None — plan executed exactly as written. All truths from frontmatter (`must_haves`) satisfied:
- KaggleSource queries `/api/v1/kernels/list` with Basic auth → TestKaggle_Sweep_BasicAuthAndFindings
- Disabled when either credential empty → TestKaggle_Enabled + TestKaggle_Sweep_MissingCredentials_NoHTTP
- Findings tagged recon:kaggle with Source = web + /code/ + ref → TestKaggle_Sweep_BasicAuthAndFindings
## Issues Encountered
- **Sibling-wave file churn:** During testing, sibling Wave 2 plans (10-02 GitHub, 10-05 Replit, 10-07 CodeSandbox, 10-03 GitLab) had already dropped partial files into `pkg/recon/sources/` in the main repo. A stray `github_test.go` with no `github.go` broke package compilation. Resolved by running tests in this plan's git worktree where only kaggle.go and kaggle_test.go are present alongside the Plan 10-01 scaffolding. No cross-plan changes made — scope boundary respected. Final wave merge will resolve all sibling files together.
## Next Phase Readiness
- KaggleSource is ready for registration in Plan 10-09 (`RegisterAll` wiring).
- No blockers for downstream plans. RECON-CODE-09 satisfied.
## Self-Check: PASSED
- File exists: `pkg/recon/sources/kaggle.go` — FOUND
- File exists: `pkg/recon/sources/kaggle_test.go` — FOUND
- Commit exists: `243b740` — FOUND (feat(10-08): add KaggleSource with HTTP Basic auth)
- Tests pass: 6/6 TestKaggle_* (verified with sibling files stashed to isolate package build)
---
*Phase: 10-osint-code-hosting*
*Plan: 08*
*Completed: 2026-04-05*