- queries /api/spaces and /api/models via Hub API
- token optional: requests are spaced further apart when absent (10s vs 3.6s)
- emits Findings with SourceType=recon:huggingface and prefixed Source URLs
- compile-time assert implements recon.ReconSource
- BuildQueries(reg, source) dedups keywords and formats per-source syntax
- github/gist use 'keyword' in:file; others use bare keyword
- SourcesConfig placeholder struct for Wave 2 plans to depend on
- RegisterAll is a no-op stub (Plan 10-09 will fill it in)
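A rough sketch of the query-building step described above, under stated assumptions: the real `BuildQueries` takes a registry (`reg`), whereas this hypothetical `buildQueries` takes a plain keyword slice, and the double quotes around the keyword are an assumption (the notes write the pattern as 'keyword' in:file).

```go
package main

import "fmt"

// buildQueries is a hypothetical simplification of BuildQueries:
// it drops duplicate keywords (keeping first-seen order) and formats
// each one for the target source.
func buildQueries(keywords []string, source string) []string {
	seen := make(map[string]struct{}, len(keywords))
	queries := make([]string, 0, len(keywords))
	for _, kw := range keywords {
		if _, dup := seen[kw]; dup {
			continue
		}
		seen[kw] = struct{}{}
		switch source {
		case "github", "gist":
			// Quoted keyword restricted to file contents.
			queries = append(queries, fmt.Sprintf("%q in:file", kw))
		default:
			// Every other source receives the bare keyword.
			queries = append(queries, kw)
		}
	}
	return queries
}

func main() {
	keywords := []string{"sk-proj-", "hf_", "sk-proj-"}
	for _, q := range buildQueries(keywords, "github") {
		fmt.Println(q)
	}
	for _, q := range buildQueries(keywords, "huggingface") {
		fmt.Println(q)
	}
}
```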
- Dedup drops duplicates keyed by sha256(ProviderName|KeyMasked|Source)
- Preserves input order and first-seen metadata (stable dedup)
- Findings with the same provider and masked key but different Source URLs are kept separate
- Uses engine.Finding directly to avoid alias collision with Plan 09-01
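The stable dedup described above could look roughly like this. The `Finding` struct is a hypothetical reduction of `engine.Finding` to the three fields named in the key plus one illustrative metadata field (`Note`); the real type is not shown in these notes.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Finding is a hypothetical subset of engine.Finding.
type Finding struct {
	ProviderName string
	KeyMasked    string
	Source       string
	Note         string // illustrative metadata, not part of the key
}

// dedupKey hashes ProviderName|KeyMasked|Source.
func dedupKey(f Finding) [sha256.Size]byte {
	return sha256.Sum256([]byte(f.ProviderName + "|" + f.KeyMasked + "|" + f.Source))
}

// Dedup keeps the first finding seen for each key and preserves
// the order of the input slice.
func Dedup(in []Finding) []Finding {
	seen := make(map[[sha256.Size]byte]struct{}, len(in))
	out := make([]Finding, 0, len(in))
	for _, f := range in {
		k := dedupKey(f)
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out
}

func main() {
	in := []Finding{
		{"openai", "sk-a...z", "https://example.invalid/1", "first"},
		{"openai", "sk-a...z", "https://example.invalid/1", "second"},
		{"openai", "sk-a...z", "https://example.invalid/2", "third"},
	}
	for _, f := range Dedup(in) {
		fmt.Println(f.Source, f.Note)
	}
}
```

Because the Source URL is part of the key, the third finding survives even though provider and masked key match the first.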
- Engine.Register/List/SweepAll with ants pool fanout
- ExampleSource emits two deterministic findings (SourceType=recon:example)
- Tests cover Register/List idempotency, SweepAll aggregation, empty-registry,
  and Enabled() filtering
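A stdlib-only sketch of the Register/List/SweepAll shape follows. It deliberately swaps the ants pool for one plain goroutine per source so the example has no third-party dependency; the `Source` interface, the `Finding` fields, the sorted `List` output, and `exampleSource` are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Finding and Source are hypothetical, reduced versions of the real types.
type Finding struct{ SourceType, Source string }

type Source interface {
	Name() string
	Enabled() bool
	Sweep(out chan<- Finding) error
}

type Engine struct {
	mu      sync.RWMutex
	sources map[string]Source
}

func NewEngine() *Engine { return &Engine{sources: make(map[string]Source)} }

// Register is idempotent: registering the same name twice keeps one entry.
func (e *Engine) Register(s Source) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.sources[s.Name()] = s
}

// List returns registered names in sorted order.
func (e *Engine) List() []string {
	e.mu.RLock()
	defer e.mu.RUnlock()
	names := make([]string, 0, len(e.sources))
	for n := range e.sources {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// SweepAll fans out to every enabled source and aggregates the findings.
func (e *Engine) SweepAll() []Finding {
	e.mu.RLock()
	active := make([]Source, 0, len(e.sources))
	for _, s := range e.sources {
		if s.Enabled() {
			active = append(active, s)
		}
	}
	e.mu.RUnlock()

	out := make(chan Finding)
	var wg sync.WaitGroup
	for _, s := range active {
		wg.Add(1)
		go func(s Source) {
			defer wg.Done()
			_ = s.Sweep(out) // per-source errors are ignored in this sketch
		}(s)
	}
	go func() { wg.Wait(); close(out) }()

	var all []Finding
	for f := range out {
		all = append(all, f)
	}
	return all
}

// exampleSource emits two deterministic findings.
type exampleSource struct{ enabled bool }

func (exampleSource) Name() string    { return "example" }
func (s exampleSource) Enabled() bool { return s.enabled }
func (exampleSource) Sweep(out chan<- Finding) error {
	out <- Finding{"recon:example", "https://example.invalid/a"}
	out <- Finding{"recon:example", "https://example.invalid/b"}
	return nil
}

func main() {
	e := NewEngine()
	e.Register(exampleSource{enabled: true})
	fmt.Println(e.List(), len(e.SweepAll()))
}
```

With an empty registry the closer goroutine closes the channel immediately, so `SweepAll` returns no findings instead of blocking.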
- Parses robots.txt via temoto/robotstxt
- Caches per host for 1 hour; second call within TTL skips HTTP fetch
- Default-allow on network/parse/4xx/5xx errors
- Matches 'keyhunter' user-agent against disallowed paths
- Client field allows httptest injection
Satisfies RECON-INFRA-07.
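The caching and default-allow behaviour can be sketched without the parser. In this hypothetical version the robots.txt fetch and parse (done with temoto/robotstxt in the change itself) is hidden behind an injected `fetch` function that returns a path predicate; `RobotsCache`, `rules`, and `errUnreachable` are illustrative names, and not caching failed fetches is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// rules reports whether a path is allowed for the crawler's user-agent.
// It stands in for the parsed robots.txt data.
type rules func(path string) bool

var errUnreachable = errors.New("robots.txt unreachable")

type entry struct {
	allow   rules
	fetched time.Time
}

// RobotsCache is a hypothetical per-host cache with a fixed TTL.
type RobotsCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	fetch   func(host string) (rules, error) // injectable for tests
	entries map[string]entry
}

func NewRobotsCache(fetch func(host string) (rules, error)) *RobotsCache {
	return &RobotsCache{ttl: time.Hour, fetch: fetch, entries: make(map[string]entry)}
}

// Allowed consults the cached rules for host, fetching them when the
// entry is missing or older than the TTL. Any fetch error allows the path.
func (c *RobotsCache) Allowed(host, path string) bool {
	c.mu.Lock() // held across the fetch for simplicity
	defer c.mu.Unlock()
	e, ok := c.entries[host]
	if !ok || time.Since(e.fetched) >= c.ttl {
		r, err := c.fetch(host)
		if err != nil {
			return true // default-allow; the failure is not cached here
		}
		e = entry{allow: r, fetched: time.Now()}
		c.entries[host] = e
	}
	return e.allow(path)
}

func main() {
	fetches := 0
	c := NewRobotsCache(func(host string) (rules, error) {
		fetches++
		if host == "down.invalid" {
			return nil, errUnreachable
		}
		return func(path string) bool { return path != "/private" }, nil
	})
	fmt.Println(c.Allowed("example.invalid", "/private"))
	fmt.Println(c.Allowed("example.invalid", "/public"))
	fmt.Println(c.Allowed("down.invalid", "/private"))
	fmt.Println("fetches:", fetches)
}
```

Injecting the fetch function plays the same role as the Client field mentioned above: tests can count fetches or force errors without touching the network.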
- Pool of 10 realistic browser User-Agents (Chrome/Firefox/Safari/Edge)
- Covers Windows, macOS, Linux, iOS, Android
- RandomUserAgent returns a random pool entry
- StealthHeaders returns UA + Accept-Language header map
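A small sketch of the two helpers, assuming a package-level pool and a fixed Accept-Language value. The three User-Agent strings are illustrative examples of the format, not the ten entries shipped in the change, and the `en-US,en;q=0.9` value is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

// userAgents is an illustrative pool; the real one has 10 entries.
var userAgents = []string{
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
	"Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
}

// RandomUserAgent returns a random entry from the pool.
func RandomUserAgent() string {
	return userAgents[rand.Intn(len(userAgents))]
}

// StealthHeaders returns a header map with a random User-Agent and
// an Accept-Language value (assumed here).
func StealthHeaders() map[string]string {
	return map[string]string{
		"User-Agent":      RandomUserAgent(),
		"Accept-Language": "en-US,en;q=0.9",
	}
}

func main() {
	for name, value := range StealthHeaders() {
		fmt.Printf("%s: %s\n", name, value)
	}
}
```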