keyhunter/.planning/phases/10-osint-code-hosting/10-06-SUMMARY.md
2026-04-06 01:16:27 +03:00


phase: 10-osint-code-hosting
plan: 06
subsystem: recon/sources
tags: recon, osint, huggingface, wave-2
requires:
  • pkg/recon/sources.Client (Plan 10-01)
  • pkg/recon/sources.BuildQueries (Plan 10-01)
  • pkg/recon.LimiterRegistry
  • pkg/providers.Registry
provides:
  • pkg/recon/sources.HuggingFaceSource
  • pkg/recon/sources.HuggingFaceConfig
  • pkg/recon/sources.NewHuggingFaceSource
affects:
  • pkg/recon/sources
tech_stack_added: (none listed)
patterns:
  • Optional-token sources return Enabled=true and degrade RateLimit when credentials absent
  • Multi-endpoint sweep: iterate queries × endpoints, mapping each to a URL-prefix
  • Context cancellation checked between endpoint calls and when sending to out channel
key_files_created:
  • pkg/recon/sources/huggingface.go
  • pkg/recon/sources/huggingface_test.go
key_files_modified: (none listed)
decisions:
  • Unauthenticated rate of rate.Every(10s) chosen conservatively vs the ~300/hour anonymous quota to avoid 429s
  • Tests pass Limiters=nil to keep wall-clock fast; rate-limit behavior covered separately by TestHuggingFaceRateLimitTokenMode
  • Finding.Source uses the canonical public URL (not the API URL) so downstream deduplication matches human-visible links
metrics:
  • duration: ~8 minutes
  • completed: 2026-04-05
  • tasks: 1
  • files: 2

Phase 10 Plan 06: HuggingFaceSource Summary

Implements HuggingFaceSource against the Hugging Face Hub API, sweeping both /api/spaces and /api/models for every provider keyword and emitting recon Findings with canonical huggingface.co URLs.
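
The endpoint-to-URL mapping can be sketched as follows. This is a minimal illustration, not the code in huggingface.go: the endpoint type and the searchURL and canonicalURL helpers are hypothetical names, and the base URL is passed in as a parameter rather than assumed.

```go
package main

import (
	"fmt"
	"net/url"
)

// endpoint pairs a Hub API path with the public URL prefix used in findings.
// The type and its field names are hypothetical, chosen for this sketch.
type endpoint struct {
	apiPath      string // path queried on the API host
	publicPrefix string // prefix of the human-visible URL
}

var endpoints = []endpoint{
	{apiPath: "/api/spaces", publicPrefix: "https://huggingface.co/spaces/"},
	{apiPath: "/api/models", publicPrefix: "https://huggingface.co/"},
}

// searchURL builds the request URL for one keyword on one endpoint.
// url.Values.Encode sorts keys, so "limit" precedes "search" in the output.
func searchURL(base string, ep endpoint, keyword string) string {
	q := url.Values{}
	q.Set("search", keyword)
	q.Set("limit", "50")
	return base + ep.apiPath + "?" + q.Encode()
}

// canonicalURL maps a result id to the public URL stored in Finding.Source.
func canonicalURL(ep endpoint, id string) string {
	return ep.publicPrefix + id
}

func main() {
	for _, ep := range endpoints {
		fmt.Println(searchURL("https://huggingface.co", ep, "sk-proj-"))
		fmt.Println(canonicalURL(ep, "someuser/demo"))
	}
}
```

Keeping the API path and the public prefix in one table means that adding a third endpoint would only require one more entry.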

What Changed

  • New HuggingFaceSource implementing recon.ReconSource with optional Token.
  • Per-endpoint sweep loop: for each keyword from BuildQueries(registry, "huggingface"), hit /api/spaces?search=...&limit=50 then /api/models?search=...&limit=50.
  • URL normalization: space results mapped to https://huggingface.co/spaces/{id}, model results to https://huggingface.co/{id}.
  • Rate limit is token-aware: rate.Every(3600ms) when authenticated (matches 1000/hour), rate.Every(10s) otherwise.
  • Authorization header only set when Token != "".
  • Compile-time assertion var _ recon.ReconSource = (*HuggingFaceSource)(nil).

Test Coverage

All five TDD assertions in huggingface_test.go pass:

  1. TestHuggingFaceEnabledAlwaysTrue: enabled with and without token.
  2. TestHuggingFaceSweepHitsBothEndpoints: exact Finding count (2 keywords × 2 endpoints = 4), both URL prefixes observed, SourceType="recon:huggingface".
  3. TestHuggingFaceAuthorizationHeader: Bearer hf_secret sent when token set, header absent when empty.
  4. TestHuggingFaceContextCancellation: slow server + 100ms context returns error promptly.
  5. TestHuggingFaceRateLimitTokenMode: authenticated rate is strictly faster than unauthenticated rate.

The auth and endpoint tests share a single httptest server (hfTestServer).
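
As an illustration of the authorization check, the following sketch records the header seen by a throwaway httptest server. It is not the hfTestServer helper from the test file: setAuth and observedAuth are hypothetical names, and the handler simply returns an empty JSON array.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// setAuth adds a bearer header only when a token is configured.
func setAuth(req *http.Request, token string) {
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
}

// observedAuth sends one request to a test server and returns the
// Authorization header value that the server received.
func observedAuth(token string) (string, error) {
	seen := make(chan string, 1) // buffered so the handler never blocks
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Header.Get("Authorization")
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, "[]")
	}))
	defer srv.Close()

	req, err := http.NewRequest(http.MethodGet, srv.URL+"/api/models?search=x&limit=50", nil)
	if err != nil {
		return "", err
	}
	setAuth(req, token)

	resp, err := srv.Client().Do(req)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return <-seen, nil
}

func main() {
	for _, token := range []string{"hf_secret", ""} {
		got, err := observedAuth(token)
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		fmt.Printf("token=%q header=%q\n", token, got)
	}
}
```

Passing the header through a buffered channel avoids sharing a plain variable between the handler goroutine and the caller.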

Deviations from Plan

None in the implementation: the plan was executed as written. One minor test refinement: tests pass Limiters: nil instead of constructing a real LimiterRegistry, because the production RateLimit of rate.Every(3600ms) with burst 1 would make four serialized waits exceed a reasonable test budget. The limiter code path is still exercised in production, and the rate-mode contract is covered by TestHuggingFaceRateLimitTokenMode.

Commits

  • 45f8782 test(10-06): add failing tests for HuggingFaceSource
  • 39001f2 feat(10-06): implement HuggingFaceSource scanning Spaces and Models

Self-Check: PASSED

  • FOUND: pkg/recon/sources/huggingface.go
  • FOUND: pkg/recon/sources/huggingface_test.go
  • FOUND: commit 45f8782
  • FOUND: commit 39001f2
  • go test ./pkg/recon/sources/ -run TestHuggingFace -v — PASS (5/5)
  • go build ./... — PASS
  • go test ./pkg/recon/... — PASS