---
phase: 10-osint-code-hosting
plan: 06
subsystem: recon/sources
tags: [recon, osint, huggingface, wave-2]
requires:
  - pkg/recon/sources.Client (Plan 10-01)
  - pkg/recon/sources.BuildQueries (Plan 10-01)
  - pkg/recon.LimiterRegistry
  - pkg/providers.Registry
provides:
  - pkg/recon/sources.HuggingFaceSource
  - pkg/recon/sources.HuggingFaceConfig
  - pkg/recon/sources.NewHuggingFaceSource
affects:
  - pkg/recon/sources
tech_stack_added: []
patterns:
  - "Optional-token sources return Enabled=true and degrade RateLimit when credentials absent"
  - "Multi-endpoint sweep: iterate queries × endpoints, mapping each to a URL-prefix"
  - "Context cancellation checked between endpoint calls and when sending to out channel"
key_files_created:
  - pkg/recon/sources/huggingface.go
  - pkg/recon/sources/huggingface_test.go
key_files_modified: []
decisions:
  - "Unauthenticated rate of rate.Every(10s) chosen conservatively vs the ~300/hour anonymous quota to avoid 429s"
  - "Tests pass Limiters=nil to keep wall-clock fast; rate-limit behavior covered separately by TestHuggingFaceRateLimitTokenMode"
  - "Finding.Source uses the canonical public URL (not the API URL) so downstream deduplication matches human-visible links"
metrics:
  duration: "~8 minutes"
  completed: "2026-04-05"
  tasks: 1
  files: 2
---

# Phase 10 Plan 06: HuggingFaceSource Summary

Implements `HuggingFaceSource` against the Hugging Face Hub API, sweeping both `/api/spaces` and `/api/models` for every provider keyword and emitting recon Findings with canonical huggingface.co URLs.

## What Changed

- New `HuggingFaceSource` implementing `recon.ReconSource` with optional `Token`.
- Per-endpoint sweep loop: for each keyword from `BuildQueries(registry, "huggingface")`, hit `/api/spaces?search=...&limit=50` then `/api/models?search=...&limit=50`.
- URL normalization: space results mapped to `https://huggingface.co/spaces/{id}`, model results to `https://huggingface.co/{id}`.
- Rate limit is token-aware: `rate.Every(3600ms)` when authenticated (matches 1000/hour), `rate.Every(10s)` otherwise.
- Authorization header only set when `Token != ""`.
- Compile-time assertion `var _ recon.ReconSource = (*HuggingFaceSource)(nil)`.

## Test Coverage

All five TDD assertions in `huggingface_test.go` pass:

1. `TestHuggingFaceEnabledAlwaysTrue`: enabled with and without token.
2. `TestHuggingFaceSweepHitsBothEndpoints`: exact Finding count (2 keywords × 2 endpoints = 4), both URL prefixes observed, `SourceType="recon:huggingface"`.
3. `TestHuggingFaceAuthorizationHeader`: `Bearer hf_secret` sent when token set, header absent when empty.
4. `TestHuggingFaceContextCancellation`: slow server + 100ms context returns error promptly.
5. `TestHuggingFaceRateLimitTokenMode`: authenticated rate is strictly faster than unauthenticated rate.

The authorization and endpoint tests share a single httptest server helper (`hfTestServer`).

## Deviations from Plan

None. The plan was executed as written, with one minor test refinement: tests pass `Limiters: nil` instead of constructing a real `LimiterRegistry`, because the production RateLimit of `rate.Every(3600ms)` with burst 1 would make four serialized waits exceed a reasonable test budget. The limiter code path is still exercised in production, and the rate-mode contract is covered by `TestHuggingFaceRateLimitTokenMode`.

## Commits

- `45f8782` test(10-06): add failing tests for HuggingFaceSource
- `39001f2` feat(10-06): implement HuggingFaceSource scanning Spaces and Models

## Self-Check: PASSED

- FOUND: pkg/recon/sources/huggingface.go
- FOUND: pkg/recon/sources/huggingface_test.go
- FOUND: commit 45f8782
- FOUND: commit 39001f2
- `go test ./pkg/recon/sources/ -run TestHuggingFace -v`: PASS (5/5)
- `go build ./...`: PASS
- `go test ./pkg/recon/...`: PASS
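
For readers without the repository at hand, the token-aware rate selection, search URL construction, and URL normalization listed under What Changed can be sketched roughly as follows. This is a minimal illustration, not the code in `huggingface.go`: the helper names `limitInterval`, `searchURL`, and `canonicalURL` are hypothetical, and plain `time.Duration` values stand in for the `rate.Every(...)` limits so the sketch depends only on the standard library.

```go
package main

import (
	"fmt"
	"net/url"
	"time"
)

// NOTE: all helper names below are hypothetical; only the intervals,
// endpoints, and URL shapes are taken from the summary.

// limitInterval returns the minimum spacing between requests:
// 3600ms with a token (about 1000/hour), 10s without one.
func limitInterval(token string) time.Duration {
	if token != "" {
		return 3600 * time.Millisecond
	}
	return 10 * time.Second
}

// searchURL builds the API search URL for one endpoint ("spaces" or
// "models"). url.Values.Encode sorts keys, so "limit" precedes "search";
// the query is equivalent to ?search=...&limit=50.
func searchURL(base, endpoint, keyword string) string {
	q := url.Values{}
	q.Set("search", keyword)
	q.Set("limit", "50")
	return base + "/api/" + endpoint + "?" + q.Encode()
}

// canonicalURL maps an API result id to its public huggingface.co URL:
// spaces live under /spaces/{id}, models directly under /{id}.
func canonicalURL(endpoint, id string) string {
	if endpoint == "spaces" {
		return "https://huggingface.co/spaces/" + id
	}
	return "https://huggingface.co/" + id
}

func main() {
	// Sweep order from the summary: spaces first, then models.
	for _, ep := range []string{"spaces", "models"} {
		fmt.Println(searchURL("https://huggingface.co", ep, "example-keyword"))
		fmt.Println(canonicalURL(ep, "some-org/some-repo"))
	}
	fmt.Println(limitInterval("hf_example"), limitInterval(""))
}
```

Keeping the canonical URL separate from the API URL mirrors the recorded decision that `Finding.Source` should match human-visible links for downstream deduplication.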