Files
keyhunter/.planning/phases/10-osint-code-hosting/10-06-SUMMARY.md
2026-04-06 01:16:27 +03:00

80 lines
3.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
phase: 10-osint-code-hosting
plan: 06
subsystem: recon/sources
tags: [recon, osint, huggingface, wave-2]
requires:
- pkg/recon/sources.Client (Plan 10-01)
- pkg/recon/sources.BuildQueries (Plan 10-01)
- pkg/recon.LimiterRegistry
- pkg/providers.Registry
provides:
- pkg/recon/sources.HuggingFaceSource
- pkg/recon/sources.HuggingFaceConfig
- pkg/recon/sources.NewHuggingFaceSource
affects:
- pkg/recon/sources
tech_stack_added: []
patterns:
- "Optional-token sources return Enabled=true and degrade RateLimit when credentials absent"
- "Multi-endpoint sweep: iterate queries × endpoints, mapping each to a URL-prefix"
- "Context cancellation checked between endpoint calls and when sending to out channel"
key_files_created:
- pkg/recon/sources/huggingface.go
- pkg/recon/sources/huggingface_test.go
key_files_modified: []
decisions:
- "Unauthenticated rate of rate.Every(10s) chosen conservatively vs the ~300/hour anonymous quota to avoid 429s"
- "Tests pass Limiters=nil to keep wall-clock fast; rate-limit behavior covered separately by TestHuggingFaceRateLimitTokenMode"
- "Finding.Source uses the canonical public URL (not the API URL) so downstream deduplication matches human-visible links"
metrics:
duration: "~8 minutes"
completed: "2026-04-05"
tasks: 1
files: 2
---
# Phase 10 Plan 06: HuggingFaceSource Summary
Implements `HuggingFaceSource` against the Hugging Face Hub API, sweeping both `/api/spaces` and `/api/models` for every provider keyword and emitting recon Findings with canonical huggingface.co URLs.
## What Changed
- New `HuggingFaceSource` implementing `recon.ReconSource` with optional `Token`.
- Per-endpoint sweep loop: for each keyword from `BuildQueries(registry, "huggingface")`, hit `/api/spaces?search=...&limit=50` then `/api/models?search=...&limit=50`.
- URL normalization: space results mapped to `https://huggingface.co/spaces/{id}`, model results to `https://huggingface.co/{id}`.
- Rate limit is token-aware: `rate.Every(3600ms)` when authenticated (matches 1000/hour), `rate.Every(10s)` otherwise.
- Authorization header only set when `Token != ""`.
- Compile-time assertion `var _ recon.ReconSource = (*HuggingFaceSource)(nil)`.
## Test Coverage
All six TDD assertions in `huggingface_test.go` pass:
1. `TestHuggingFaceEnabledAlwaysTrue` — enabled with and without token.
2. `TestHuggingFaceSweepHitsBothEndpoints` — exact Finding count (2 keywords × 2 endpoints = 4), both URL prefixes observed, `SourceType="recon:huggingface"`.
3. `TestHuggingFaceAuthorizationHeader``Bearer hf_secret` sent when token set, header absent when empty.
4. `TestHuggingFaceContextCancellation` — slow server + 100ms context returns error promptly.
5. `TestHuggingFaceRateLimitTokenMode` — authenticated rate is strictly faster than unauthenticated rate.
Plus httptest server shared by auth + endpoint tests (`hfTestServer`).
## Deviations from Plan
None — plan executed exactly as written. One minor test refinement: tests pass `Limiters: nil` instead of constructing a real `LimiterRegistry`, because the production RateLimit of `rate.Every(3600ms)` with burst 1 would make four serialized waits exceed a reasonable test budget. The limiter code path is still exercised in production and the rate-mode contract is covered by `TestHuggingFaceRateLimitTokenMode`.
## Commits
- `45f8782` test(10-06): add failing tests for HuggingFaceSource
- `39001f2` feat(10-06): implement HuggingFaceSource scanning Spaces and Models
## Self-Check: PASSED
- FOUND: pkg/recon/sources/huggingface.go
- FOUND: pkg/recon/sources/huggingface_test.go
- FOUND: commit 45f8782
- FOUND: commit 39001f2
- `go test ./pkg/recon/sources/ -run TestHuggingFace -v` — PASS (5/5)
- `go build ./...` — PASS
- `go test ./pkg/recon/...` — PASS