Merge branch 'worktree-agent-ac81d6ab'

salvacybersec
2026-04-06 01:20:25 +03:00
3 changed files with 464 additions and 0 deletions


@@ -0,0 +1,79 @@
---
phase: 10-osint-code-hosting
plan: 06
subsystem: recon/sources
tags: [recon, osint, huggingface, wave-2]
requires:
- pkg/recon/sources.Client (Plan 10-01)
- pkg/recon/sources.BuildQueries (Plan 10-01)
- pkg/recon.LimiterRegistry
- pkg/providers.Registry
provides:
- pkg/recon/sources.HuggingFaceSource
- pkg/recon/sources.HuggingFaceConfig
- pkg/recon/sources.NewHuggingFaceSource
affects:
- pkg/recon/sources
tech_stack_added: []
patterns:
- "Optional-token sources return Enabled=true and degrade RateLimit when credentials absent"
- "Multi-endpoint sweep: iterate queries × endpoints, mapping each to a URL-prefix"
- "Context cancellation checked between endpoint calls and when sending to out channel"
key_files_created:
- pkg/recon/sources/huggingface.go
- pkg/recon/sources/huggingface_test.go
key_files_modified: []
decisions:
- "Unauthenticated rate of rate.Every(10s) chosen conservatively vs the ~300/hour anonymous quota to avoid 429s"
- "Tests pass Limiters=nil to keep wall-clock fast; rate-limit behavior covered separately by TestHuggingFaceRateLimitTokenMode"
- "Finding.Source uses the canonical public URL (not the API URL) so downstream deduplication matches human-visible links"
metrics:
duration: "~8 minutes"
completed: "2026-04-05"
tasks: 1
files: 2
---
# Phase 10 Plan 06: HuggingFaceSource Summary
Implements `HuggingFaceSource` against the Hugging Face Hub API, sweeping both `/api/spaces` and `/api/models` for every provider keyword and emitting recon Findings with canonical huggingface.co URLs.
## What Changed
- New `HuggingFaceSource` implementing `recon.ReconSource` with optional `Token`.
- Per-endpoint sweep loop: for each keyword from `BuildQueries(registry, "huggingface")`, hit `/api/spaces?search=...&limit=50` then `/api/models?search=...&limit=50`.
- URL normalization: space results mapped to `https://huggingface.co/spaces/{id}`, model results to `https://huggingface.co/{id}`.
- Rate limit is token-aware: `rate.Every(3600 * time.Millisecond)` when authenticated (one request per 3.6 s, matching 1000/hour), `rate.Every(10 * time.Second)` otherwise.
- Authorization header only set when `Token != ""`.
- Compile-time assertion `var _ recon.ReconSource = (*HuggingFaceSource)(nil)`.
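
The sweep, URL normalization, and conditional Authorization header described above can be sketched as follows. This is a minimal illustration, not the code in `huggingface.go`: the `endpoint` type, the helper names `searchURL`, `canonicalURL`, and `newSearchRequest`, and the sample keywords are all hypothetical, and only the standard library is used.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// endpoint pairs a Hub API path with the public URL prefix used for its
// results. The type and its field names are hypothetical.
type endpoint struct {
	apiPath   string
	urlPrefix string
}

// Spaces are queried before models, matching the order in the summary.
var endpoints = []endpoint{
	{apiPath: "/api/spaces", urlPrefix: "https://huggingface.co/spaces/"},
	{apiPath: "/api/models", urlPrefix: "https://huggingface.co/"},
}

// searchURL builds the search URL for one keyword on one endpoint.
func searchURL(base string, ep endpoint, keyword string) string {
	return base + ep.apiPath + "?search=" + url.QueryEscape(keyword) + "&limit=50"
}

// canonicalURL maps a result id to its human-visible huggingface.co link.
func canonicalURL(ep endpoint, id string) string {
	return ep.urlPrefix + id
}

// newSearchRequest sets the Authorization header only when a token is present.
func newSearchRequest(rawURL, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	keywords := []string{"openai", "anthropic"} // hypothetical provider keywords
	for _, kw := range keywords {
		for _, ep := range endpoints {
			fmt.Println(searchURL("https://huggingface.co", ep, kw))
		}
	}
	fmt.Println(canonicalURL(endpoints[0], "acme/demo"))
	fmt.Println(canonicalURL(endpoints[1], "acme/demo"))
}
```

Keeping the API path and the public prefix in one table means a further endpoint would be a one-line addition, and the Finding URL never has to be derived from the API URL.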
## Test Coverage
All five TDD tests in `huggingface_test.go` pass:
1. `TestHuggingFaceEnabledAlwaysTrue` — enabled with and without token.
2. `TestHuggingFaceSweepHitsBothEndpoints` — exact Finding count (2 keywords × 2 endpoints = 4), both URL prefixes observed, `SourceType="recon:huggingface"`.
3. `TestHuggingFaceAuthorizationHeader`: `Bearer hf_secret` sent when token set, header absent when empty.
4. `TestHuggingFaceContextCancellation` — slow server + 100ms context returns error promptly.
5. `TestHuggingFaceRateLimitTokenMode` — authenticated rate is strictly faster than unauthenticated rate.
The auth and endpoint tests share one `httptest` server helper (`hfTestServer`).
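
The cancellation case (slow server, 100ms context) can be reproduced with the standard library alone. The sketch below is an assumption about the shape of such a check, not the contents of `huggingface_test.go`; `fetch` and `runCancellationCheck` are hypothetical names, and the 2-second server delay is an arbitrary stand-in for a slow upstream.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetch issues one GET bound to ctx and returns any transport error.
func fetch(ctx context.Context, rawURL string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, rawURL, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

// runCancellationCheck starts a deliberately slow server, calls fetch with a
// 100ms deadline, and reports whether the deadline error surfaced and whether
// the call returned well before the server would have answered.
func runCancellationCheck() (deadlineHit, prompt bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(2 * time.Second): // simulated slow upstream
		case <-r.Context().Done(): // client went away; stop early
		}
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	start := time.Now()
	err := fetch(ctx, srv.URL)
	elapsed := time.Since(start)
	return errors.Is(err, context.DeadlineExceeded), elapsed < time.Second
}

func main() {
	deadlineHit, prompt := runCancellationCheck()
	fmt.Println("deadline error:", deadlineHit, "returned promptly:", prompt)
}
```

Having the handler also watch `r.Context().Done()` lets `srv.Close()` return quickly once the client has given up, which keeps the check itself fast.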
## Deviations from Plan
No deviations in the implementation; the plan was executed as written. One minor test refinement: tests pass `Limiters: nil` instead of constructing a real `LimiterRegistry`, because the production RateLimit of `rate.Every(3600 * time.Millisecond)` with burst 1 would serialize the four requests behind multi-second waits and exceed a reasonable test budget. The limiter code path still runs in production, and the rate-mode contract is covered by `TestHuggingFaceRateLimitTokenMode`.
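
To make the test-budget argument concrete, the arithmetic can be sketched with plain `time.Duration` values. The helpers `requestInterval`, `perHour`, and `serializedWait` are hypothetical and do not use `golang.org/x/time/rate`; the sketch also assumes the limiter starts with a full bucket, so the first `burst` calls pass without waiting.

```go
package main

import (
	"fmt"
	"time"
)

// requestInterval mirrors the token-aware rate choice from the summary:
// one request per 3.6s with a token, one per 10s without.
func requestInterval(token string) time.Duration {
	if token != "" {
		return 3600 * time.Millisecond
	}
	return 10 * time.Second
}

// perHour converts a spacing between requests into requests per hour.
func perHour(interval time.Duration) int {
	return int(time.Hour / interval)
}

// serializedWait estimates the total time spent waiting when `calls` requests
// run back to back through a limiter with the given burst, assuming the
// bucket is full at the start.
func serializedWait(interval time.Duration, calls, burst int) time.Duration {
	if calls <= burst {
		return 0
	}
	return time.Duration(calls-burst) * interval
}

func main() {
	auth := requestInterval("hf_example") // placeholder token value
	fmt.Println(perHour(auth))            // 1000
	fmt.Println(serializedWait(auth, 4, 1)) // 10.8s
}
```

Under these assumptions a four-request sweep spends about 10.8 s waiting in authenticated mode, which is why the tests bypass the limiter.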
## Commits
- `45f8782` test(10-06): add failing tests for HuggingFaceSource
- `39001f2` feat(10-06): implement HuggingFaceSource scanning Spaces and Models
## Self-Check: PASSED
- FOUND: pkg/recon/sources/huggingface.go
- FOUND: pkg/recon/sources/huggingface_test.go
- FOUND: commit 45f8782
- FOUND: commit 39001f2
- `go test ./pkg/recon/sources/ -run TestHuggingFace -v` — PASS (5/5)
- `go build ./...` — PASS
- `go test ./pkg/recon/...` — PASS