- queries /api/spaces and /api/models via Hub API
- token optional: without one, requests are rate-limited more slowly (one per 10s vs one per 3.6s)
- emits Findings with SourceType=recon:huggingface and prefixed Source URLs
- compile-time assert implements recon.ReconSource
- BuildQueries(reg, source) dedups keywords and formats per-source syntax
- github/gist use 'keyword' in:file; others use bare keyword
- SourcesConfig placeholder struct for Wave 2 plans to depend on
- RegisterAll no-op stub (Plan 10-09 will fill)
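
The keyword dedup and per-source formatting described for BuildQueries can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `buildQueries` is a hypothetical stand-in that takes a plain keyword slice instead of the registry, and wrapping the keyword in double quotes for github/gist is an assumption.

```go
package main

import "fmt"

// buildQueries is a hypothetical stand-in for BuildQueries. It accepts a
// plain keyword slice (assumption) rather than the provider registry.
func buildQueries(keywords []string, source string) []string {
	seen := make(map[string]struct{}, len(keywords))
	queries := make([]string, 0, len(keywords))
	for _, kw := range keywords {
		if _, dup := seen[kw]; dup {
			continue // keep only the first occurrence of each keyword
		}
		seen[kw] = struct{}{}
		switch source {
		case "github", "gist":
			// Code-search syntax; the double quotes are an assumption.
			queries = append(queries, "\""+kw+"\" in:file")
		default:
			queries = append(queries, kw) // bare keyword for other sources
		}
	}
	return queries
}

func main() {
	fmt.Println(buildQueries([]string{"sk-proj-", "sk-proj-", "hf_"}, "github"))
	fmt.Println(buildQueries([]string{"sk-proj-", "hf_"}, "huggingface"))
}
```
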
- Dedup drops duplicates keyed by sha256(ProviderName|KeyMasked|Source)
- Preserves input order and first-seen metadata (stable dedup)
- Findings with the same provider+masked key but different Source URLs are kept separate
- Uses engine.Finding directly to avoid alias collision with Plan 09-01
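
A stable dedup keyed by sha256(ProviderName|KeyMasked|Source) might look like the sketch below. The `Finding` struct here is a trimmed-down stand-in for engine.Finding with only the three keyed fields; the real type and function signature may differ.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Finding is a hypothetical, trimmed-down stand-in for engine.Finding.
type Finding struct {
	ProviderName string
	KeyMasked    string
	Source       string
}

// dedupKey hashes the three identifying fields joined with '|'.
func dedupKey(f Finding) [sha256.Size]byte {
	return sha256.Sum256([]byte(f.ProviderName + "|" + f.KeyMasked + "|" + f.Source))
}

// Dedup keeps the first occurrence of each key and preserves input order.
func Dedup(in []Finding) []Finding {
	seen := make(map[[sha256.Size]byte]struct{}, len(in))
	out := make([]Finding, 0, len(in))
	for _, f := range in {
		k := dedupKey(f)
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out
}

func main() {
	in := []Finding{
		{"openai", "sk-...abcd", "https://example.com/a"},
		{"openai", "sk-...abcd", "https://example.com/a"}, // exact duplicate
		{"openai", "sk-...abcd", "https://example.com/b"}, // different Source
	}
	fmt.Println(len(Dedup(in)))
}
```
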
- Engine.Register/List/SweepAll with ants pool fanout
- ExampleSource emits two deterministic findings (SourceType=recon:example)
- Tests cover Register/List idempotency, SweepAll aggregation, empty-registry,
  and Enabled() filtering
- Parses robots.txt via temoto/robotstxt
- Caches per host for 1 hour; second call within TTL skips HTTP fetch
- Default-allow on network/parse/4xx/5xx errors
- Matches 'keyhunter' user-agent against disallowed paths
- Client field allows httptest injection
Satisfies RECON-INFRA-07.
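
The per-host caching and default-allow behaviour can be illustrated with a stdlib-only sketch. It deliberately leaves out the temoto/robotstxt parsing and the HTTP client: the `fetch` callback, the `allowFunc` type, and the choice to cache the default-allow result after a failed fetch are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// allowFunc reports whether a path may be fetched (hypothetical type).
type allowFunc func(path string) bool

type cacheEntry struct {
	allow   allowFunc
	fetched time.Time
}

// RobotsCache caches one rule set per host for ttl.
type RobotsCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	now     func() time.Time
	fetch   func(host string) (allowFunc, error) // injected; real code would do HTTP + parsing
	entries map[string]cacheEntry
}

func NewRobotsCache(fetch func(string) (allowFunc, error)) *RobotsCache {
	return &RobotsCache{ttl: time.Hour, now: time.Now, fetch: fetch, entries: map[string]cacheEntry{}}
}

var errFetch = errors.New("fetch failed")

func allowAll(string) bool { return true }

// Allowed consults the cache and refetches only when the entry is missing
// or older than the TTL. The lock is held during fetch for simplicity.
func (c *RobotsCache) Allowed(host, path string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[host]
	if !ok || c.now().Sub(e.fetched) >= c.ttl {
		allow, err := c.fetch(host)
		if err != nil {
			allow = allowAll // default-allow on any fetch or parse error
		}
		e = cacheEntry{allow: allow, fetched: c.now()}
		c.entries[host] = e
	}
	return e.allow(path)
}

func main() {
	calls := 0
	c := NewRobotsCache(func(host string) (allowFunc, error) {
		calls++
		if host == "down.example" {
			return nil, errFetch
		}
		return func(path string) bool { return path != "/private" }, nil
	})
	fmt.Println(c.Allowed("example.com", "/private"), c.Allowed("example.com", "/"), calls)
	fmt.Println(c.Allowed("down.example", "/private"))
}
```
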
- Pool of 10 realistic browser User-Agents (Chrome/Firefox/Safari/Edge)
- Covers Windows, macOS, Linux, iOS, Android
- RandomUserAgent returns a random pool entry
- StealthHeaders returns UA + Accept-Language header map
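
As a rough sketch of the two helpers above: the pool below is shortened to three illustrative User-Agent strings (the commit describes ten), and the Accept-Language value is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

// userAgents is an abbreviated, illustrative pool; the entries are examples,
// not the strings shipped in the repository.
var userAgents = []string{
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
	"Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15",
}

// RandomUserAgent returns a uniformly random entry from the pool.
func RandomUserAgent() string {
	return userAgents[rand.Intn(len(userAgents))]
}

// StealthHeaders returns a header map with a random UA and Accept-Language.
// The Accept-Language value is an assumption.
func StealthHeaders() map[string]string {
	return map[string]string{
		"User-Agent":      RandomUserAgent(),
		"Accept-Language": "en-US,en;q=0.9",
	}
}

func main() {
	for name, value := range StealthHeaders() {
		fmt.Printf("%s: %s\n", name, value)
	}
}
```
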
- schema.sql: CREATE TABLE IF NOT EXISTS custom_dorks with unique dork_id,
  source/category indexes, and tags stored as JSON TEXT
- custom_dorks.go: Save/List/Get/GetByDorkID/Delete with JSON tag round-trip
- Tests: round-trip, newest-first ordering, not-found, unique constraint,
  delete no-op, schema migration idempotency
- Dork schema with Validate() mirroring provider YAML pattern
- go:embed loader tolerating empty definitions tree
- Registry with List/Get/Stats/ListBySource/ListByCategory
- Executor interface + Runner dispatch + ErrSourceNotImplemented
- Placeholder definitions/.gitkeep and repo-root dorks/.gitkeep
- Full unit test coverage for registry, validation, and runner dispatch
- FindingKey: stable SHA-256 over provider+masked+source+line
- Dedup: preserves first-seen order, returns drop count
- 8 unit tests covering stability, field sensitivity, order preservation
- Fixed 9-column header: id,provider,confidence,key,source,line,detected_at,verified,verify_status
- Uses encoding/csv for automatic quoting of commas/quotes in source paths
- Honors Options.Unmask for key column
- Registers under "csv" in output registry
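
The CSV behaviour described above (fixed header, automatic quoting, Unmask for the key column) could be sketched like this. The `Finding` fields and the `FormatCSV` signature are assumptions for illustration; the real formatter writes to an io.Writer through the output registry.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// Finding is a hypothetical stand-in with just the columns the CSV needs.
type Finding struct {
	ID                            int64
	Provider, Confidence          string
	KeyValue, KeyMasked, Source   string
	Line                          int
	DetectedAt                    string
	Verified                      bool
	VerifyStatus                  string
}

var csvHeader = []string{"id", "provider", "confidence", "key", "source", "line", "detected_at", "verified", "verify_status"}

// FormatCSV renders findings with the fixed 9-column header. encoding/csv
// quotes fields containing commas or quotes automatically.
func FormatCSV(findings []Finding, unmask bool) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write(csvHeader); err != nil {
		return "", err
	}
	for _, f := range findings {
		key := f.KeyMasked
		if unmask {
			key = f.KeyValue // expose the full key only on request
		}
		rec := []string{
			strconv.FormatInt(f.ID, 10), f.Provider, f.Confidence, key, f.Source,
			strconv.Itoa(f.Line), f.DetectedAt, strconv.FormatBool(f.Verified), f.VerifyStatus,
		}
		if err := w.Write(rec); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	out, err := FormatCSV([]Finding{{ID: 1, Provider: "openai", Confidence: "high",
		KeyValue: "sk-secret-abcd", KeyMasked: "sk-...abcd", Source: "dir/a,b.txt",
		Line: 7, DetectedAt: "2024-01-01T00:00:00Z"}}, false)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```
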
- SARIFFormatter emits schema-valid SARIF 2.1.0 JSON for CI ingestion
- One rule per distinct provider, deduped in first-seen order
- Confidence mapped high/medium/low to error/warning/note
- startLine floored to 1 per SARIF spec requirement
- Registered under name 'sarif' via init()
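
The confidence-to-level mapping, the startLine floor, and the first-seen rule dedup are small enough to sketch directly. The helper names are hypothetical, and falling back to "note" for an unrecognised confidence value is an assumption.

```go
package main

import "fmt"

// sarifLevel maps a confidence label to a SARIF result level.
// The fallback for unknown labels is an assumption.
func sarifLevel(confidence string) string {
	switch confidence {
	case "high":
		return "error"
	case "medium":
		return "warning"
	case "low":
		return "note"
	default:
		return "note"
	}
}

// sarifStartLine floors the line number to 1, since SARIF requires
// startLine to be at least 1.
func sarifStartLine(line int) int {
	if line < 1 {
		return 1
	}
	return line
}

// ruleIDs returns one rule ID per distinct provider in first-seen order.
func ruleIDs(providers []string) []string {
	seen := make(map[string]struct{})
	var ids []string
	for _, p := range providers {
		if _, dup := seen[p]; dup {
			continue
		}
		seen[p] = struct{}{}
		ids = append(ids, p)
	}
	return ids
}

func main() {
	fmt.Println(sarifLevel("high"), sarifStartLine(0))
	fmt.Println(ruleIDs([]string{"openai", "anthropic", "openai"}))
}
```
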
- Filters struct: Provider, Verified (*bool), Limit, Offset
- ListFindingsFiltered: optional WHERE + ORDER BY created_at DESC, id DESC
- GetFinding: single-row lookup, propagates sql.ErrNoRows on miss
- DeleteFinding: returns RowsAffected so caller can distinguish hit/miss
- Shared scan/hydrate helpers decrypt key_value via existing Decrypt
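
One way the optional WHERE clause for ListFindingsFiltered could be assembled is sketched below. The table name `findings`, the columns `provider_name` and `verified`, the `?` placeholder style, and applying Offset only when Limit is positive are all assumptions; only the Filters fields and the ORDER BY clause come from the notes above.

```go
package main

import (
	"fmt"
	"strings"
)

// Filters mirrors the fields listed above; a nil Verified means "don't filter".
type Filters struct {
	Provider string
	Verified *bool
	Limit    int
	Offset   int
}

// buildListQuery returns the SQL text and its positional arguments.
// Table and column names are hypothetical.
func buildListQuery(f Filters) (string, []any) {
	var where []string
	var args []any
	if f.Provider != "" {
		where = append(where, "provider_name = ?")
		args = append(args, f.Provider)
	}
	if f.Verified != nil {
		where = append(where, "verified = ?")
		args = append(args, *f.Verified)
	}
	q := "SELECT id FROM findings"
	if len(where) > 0 {
		q += " WHERE " + strings.Join(where, " AND ")
	}
	q += " ORDER BY created_at DESC, id DESC"
	if f.Limit > 0 {
		// Offset is only applied together with a positive Limit (assumption).
		q += " LIMIT ? OFFSET ?"
		args = append(args, f.Limit, f.Offset)
	}
	return q, args
}

func main() {
	verified := true
	q, args := buildListQuery(Filters{Provider: "openai", Verified: &verified, Limit: 20})
	fmt.Println(q)
	fmt.Println(args...)
}
```
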
- Renders findings as 2-space indented JSON array
- Honors Options.Unmask for key field exposure
- Omits empty verify fields via json omitempty
- Registers under "json" in output registry
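
A minimal sketch of the JSON rendering, assuming hypothetical field names and a `FormatJSON` helper that returns a string rather than writing to the registry's io.Writer:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Finding is a hypothetical stand-in for the stored finding.
type Finding struct {
	Provider, KeyValue, KeyMasked, Source string
	Line                                  int
	VerifyStatus, VerifyError             string
}

// jsonFinding is the wire shape; empty verify fields are dropped by omitempty.
type jsonFinding struct {
	Provider     string `json:"provider"`
	Key          string `json:"key"`
	Source       string `json:"source"`
	Line         int    `json:"line"`
	VerifyStatus string `json:"verify_status,omitempty"`
	VerifyError  string `json:"verify_error,omitempty"`
}

// FormatJSON renders findings as a 2-space indented JSON array.
func FormatJSON(findings []Finding, unmask bool) (string, error) {
	out := make([]jsonFinding, 0, len(findings)) // non-nil so empty input yields []
	for _, f := range findings {
		key := f.KeyMasked
		if unmask {
			key = f.KeyValue
		}
		out = append(out, jsonFinding{
			Provider: f.Provider, Key: key, Source: f.Source, Line: f.Line,
			VerifyStatus: f.VerifyStatus, VerifyError: f.VerifyError,
		})
	}
	b, err := json.MarshalIndent(out, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := FormatJSON([]Finding{{Provider: "openai", KeyValue: "sk-secret-abcd",
		KeyMasked: "sk-...abcd", Source: "main.go", Line: 3}}, false)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```
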
- TableFormatter implements Formatter interface, registered as "table"
- Writes to arbitrary io.Writer instead of hardcoded os.Stdout
- Strips ANSI colors when writer is not a TTY or NO_COLOR is set
- Uses bundled tableStyles so plain/colored paths share one renderer
- PrintFindings retained as backward-compat wrapper delegating to Format
- When any finding has Verified=true, append a VERIFY column with colored
  glyphs: ✓ live / ✗ dead / ⚠ rate / ! err / ? unk
- Per-finding VerifyMetadata is rendered on an indented secondary line
  with deterministic (sorted) key ordering
- Backward compatible: unverified scans produce identical output to
  pre-Phase-5 runs
- VerifyAll(ctx, findings, reg, workers) returns a result channel closed
  after all findings are processed or ctx is cancelled.
- Default worker count of 10 when workers <= 0.
- Missing providers yield StatusUnknown with 'provider not found' error.
- On context cancellation, dispatch stops gracefully while in-flight verifications are still drained.
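
The fan-out described for VerifyAll can be approximated with plain goroutines and channels. This is only a sketch under assumptions: the registry is modelled as a map of hypothetical `VerifyFunc` callbacks, the types are simplified, and the actual implementation may use a different pool mechanism.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

type Finding struct{ Provider, Key string }

type Result struct {
	Finding Finding
	Status  string
	Err     string
}

const (
	StatusLive    = "live"
	StatusUnknown = "unknown"
)

// VerifyFunc is a hypothetical per-provider verifier.
type VerifyFunc func(ctx context.Context, f Finding) Result

// VerifyAll fans findings out to workers and closes the returned channel
// once every dispatched finding has produced a result.
func VerifyAll(ctx context.Context, findings []Finding, reg map[string]VerifyFunc, workers int) <-chan Result {
	if workers <= 0 {
		workers = 10 // default worker count
	}
	jobs := make(chan Finding)
	results := make(chan Result, len(findings)) // buffered so workers never block
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range jobs {
				verify, ok := reg[f.Provider]
				if !ok {
					results <- Result{Finding: f, Status: StatusUnknown, Err: "provider not found"}
					continue
				}
				results <- verify(ctx, f)
			}
		}()
	}
	go func() {
	dispatch:
		for _, f := range findings {
			select {
			case <-ctx.Done():
				break dispatch // stop handing out new work
			case jobs <- f:
			}
		}
		close(jobs)
		wg.Wait() // drain in-flight verifications before closing
		close(results)
	}()
	return results
}

// demo runs VerifyAll with one known and one unknown provider.
func demo() map[string]string {
	reg := map[string]VerifyFunc{
		"openai": func(ctx context.Context, f Finding) Result {
			return Result{Finding: f, Status: StatusLive}
		},
	}
	findings := []Finding{{Provider: "openai", Key: "k1"}, {Provider: "nope", Key: "k2"}}
	statuses := make(map[string]string)
	for r := range VerifyAll(context.Background(), findings, reg, 0) {
		statuses[r.Finding.Key] = r.Status
	}
	return statuses
}

func main() {
	fmt.Println(demo())
}
```
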
- Add EnsureConsent(db, in, out) that returns (true, nil) immediately if
  verify.consent==granted, otherwise prompts once, reads a line, persists
  'granted' on 'yes' (case-insensitive), 'declined' otherwise.
- Declined is not sticky: the next call re-prompts; only granted persists.
- Prompt references legal implications and directs users to 'keyhunter legal'.
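
The consent flow can be sketched with an in-memory map standing in for the settings store. The `settingsStore` type, the `answer` helper, and the prompt wording are placeholders invented for this illustration; only the granted/declined logic follows the notes above.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// settingsStore is a hypothetical in-memory stand-in for the database.
type settingsStore map[string]string

const consentKey = "verify.consent"

// EnsureConsent returns true immediately when consent was already granted;
// otherwise it prompts once and persists the answer.
func EnsureConsent(db settingsStore, in io.Reader, out io.Writer) (bool, error) {
	if db[consentKey] == "granted" {
		return true, nil
	}
	// Placeholder wording; the real prompt text is not reproduced here.
	fmt.Fprint(out, "Key verification may have legal implications; see 'keyhunter legal'. Type 'yes' to continue: ")
	line, err := bufio.NewReader(in).ReadString('\n')
	if err != nil && err != io.EOF {
		return false, err
	}
	if strings.EqualFold(strings.TrimSpace(line), "yes") {
		db[consentKey] = "granted"
		return true, nil
	}
	// Stored for the record, but not sticky: only "granted" short-circuits.
	db[consentKey] = "declined"
	return false, nil
}

// answer is a small helper that feeds a canned reply and discards the prompt.
func answer(db settingsStore, reply string) bool {
	ok, _ := EnsureConsent(db, strings.NewReader(reply), io.Discard)
	return ok
}

func main() {
	db := settingsStore{}
	fmt.Println(answer(db, "no\n"), db[consentKey])
	fmt.Println(answer(db, "YES\n"), db[consentKey])
	fmt.Println(answer(db, ""), db[consentKey]) // no prompt needed once granted
}
```
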