From cae714b4887336af12643d1e7ddec36bd40a74c5 Mon Sep 17 00:00:00 2001
From: salvacybersec
Date: Mon, 6 Apr 2026 01:16:27 +0300
Subject: [PATCH] docs(10-06): complete HuggingFaceSource plan

---
 .../10-osint-code-hosting/10-06-SUMMARY.md | 79 +++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100644 .planning/phases/10-osint-code-hosting/10-06-SUMMARY.md

diff --git a/.planning/phases/10-osint-code-hosting/10-06-SUMMARY.md b/.planning/phases/10-osint-code-hosting/10-06-SUMMARY.md
new file mode 100644
index 0000000..a645a23
--- /dev/null
+++ b/.planning/phases/10-osint-code-hosting/10-06-SUMMARY.md
@@ -0,0 +1,79 @@
+---
+phase: 10-osint-code-hosting
+plan: 06
+subsystem: recon/sources
+tags: [recon, osint, huggingface, wave-2]
+requires:
+  - pkg/recon/sources.Client (Plan 10-01)
+  - pkg/recon/sources.BuildQueries (Plan 10-01)
+  - pkg/recon.LimiterRegistry
+  - pkg/providers.Registry
+provides:
+  - pkg/recon/sources.HuggingFaceSource
+  - pkg/recon/sources.HuggingFaceConfig
+  - pkg/recon/sources.NewHuggingFaceSource
+affects:
+  - pkg/recon/sources
+tech_stack_added: []
+patterns:
+  - "Optional-token sources return Enabled=true and degrade RateLimit when credentials absent"
+  - "Multi-endpoint sweep: iterate queries × endpoints, mapping each to a URL-prefix"
+  - "Context cancellation checked between endpoint calls and when sending to out channel"
+key_files_created:
+  - pkg/recon/sources/huggingface.go
+  - pkg/recon/sources/huggingface_test.go
+key_files_modified: []
+decisions:
+  - "Unauthenticated rate of rate.Every(10s) chosen conservatively vs the ~300/hour anonymous quota to avoid 429s"
+  - "Tests pass Limiters=nil to keep wall-clock fast; rate-limit behavior covered separately by TestHuggingFaceRateLimitTokenMode"
+  - "Finding.Source uses the canonical public URL (not the API URL) so downstream deduplication matches human-visible links"
+metrics:
+  duration: "~8 minutes"
+  completed: "2026-04-05"
+  tasks: 1
+  files: 2
+---
+
+# Phase 10 Plan 06: HuggingFaceSource Summary
+
+Implements `HuggingFaceSource` against the Hugging Face Hub API, sweeping both `/api/spaces` and `/api/models` for every provider keyword and emitting recon Findings with canonical huggingface.co URLs.
+
+## What Changed
+
+- New `HuggingFaceSource` implementing `recon.ReconSource` with optional `Token`.
+- Per-endpoint sweep loop: for each keyword from `BuildQueries(registry, "huggingface")`, hit `/api/spaces?search=...&limit=50` then `/api/models?search=...&limit=50`.
+- URL normalization: space results mapped to `https://huggingface.co/spaces/{id}`, model results to `https://huggingface.co/{id}`.
+- Rate limit is token-aware: `rate.Every(3600ms)` when authenticated (matches 1000/hour), `rate.Every(10s)` otherwise.
+- Authorization header only set when `Token != ""`.
+- Compile-time assertion `var _ recon.ReconSource = (*HuggingFaceSource)(nil)`.
+
+## Test Coverage
+
+All five TDD tests in `huggingface_test.go` pass:
+
+1. `TestHuggingFaceEnabledAlwaysTrue`: enabled with and without token.
+2. `TestHuggingFaceSweepHitsBothEndpoints`: exact Finding count (2 keywords × 2 endpoints = 4), both URL prefixes observed, `SourceType="recon:huggingface"`.
+3. `TestHuggingFaceAuthorizationHeader`: `Bearer hf_secret` sent when token set, header absent when empty.
+4. `TestHuggingFaceContextCancellation`: a slow server with a 100ms context deadline returns an error promptly.
+5. `TestHuggingFaceRateLimitTokenMode`: authenticated rate is strictly faster than unauthenticated rate.
+
+An `httptest` server helper (`hfTestServer`) is shared by the auth and endpoint tests.
+
+## Deviations from Plan
+
+None; the plan was executed as written, with one minor test refinement: tests pass `Limiters: nil` instead of constructing a real `LimiterRegistry`, because the production RateLimit of `rate.Every(3600ms)` with burst 1 would make four serialized waits exceed a reasonable test budget. The limiter code path is still exercised in production and the rate-mode contract is covered by `TestHuggingFaceRateLimitTokenMode`.
+
+## Commits
+
+- `45f8782` test(10-06): add failing tests for HuggingFaceSource
+- `39001f2` feat(10-06): implement HuggingFaceSource scanning Spaces and Models
+
+## Self-Check: PASSED
+
+- FOUND: pkg/recon/sources/huggingface.go
+- FOUND: pkg/recon/sources/huggingface_test.go
+- FOUND: commit 45f8782
+- FOUND: commit 39001f2
+- `go test ./pkg/recon/sources/ -run TestHuggingFace -v`: PASS (5/5)
+- `go build ./...`: PASS
+- `go test ./pkg/recon/...`: PASS