salvacybersec
270bbbfb49
feat(12-02): implement FOFA, Netlas, BinaryEdge recon sources
...
- FOFASource searches FOFA API with base64-encoded queries (email+key auth)
- NetlasSource searches Netlas API with X-API-Key header auth
- BinaryEdgeSource searches BinaryEdge API with X-Key header auth
- All three implement recon.ReconSource with shared Client retry/backoff
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 12:24:04 +03:00
salvacybersec
bebc3e7a0b
test(11-03): add end-to-end SweepAll integration test across all 18 sources
...
- Extend httptest mux with fixtures for Google, Bing, DuckDuckGo, Yandex, Brave
- Add Pastebin (routed /pb/), GistPaste (/gp/), PasteSites (injected platform)
- Assert all 18 SourceTypes emit at least one finding via SweepAll
2026-04-06 12:06:27 +03:00
salvacybersec
3250408f23
feat(11-03): wire 18 sources into RegisterAll + credential wiring in cmd/recon.go
...
- Extend SourcesConfig with GoogleAPIKey, GoogleCX, BingAPIKey, YandexUser, YandexAPIKey, BraveAPIKey
- RegisterAll registers 8 Phase 11 sources alongside 10 Phase 10 sources (18 total)
- cmd/recon.go reads search engine API keys from env vars and viper config
- Guardrail tests updated to assert 18 sources
2026-04-06 12:02:11 +03:00
salvacybersec
a53d952518
Merge branch 'worktree-agent-a27c3406'
2026-04-06 11:58:19 +03:00
salvacybersec
ed148d47e1
feat(11-02): add PasteSitesSource multi-paste aggregator
...
- Aggregates dpaste, paste.ee, rentry, hastebin into single source
- Follows SandboxesSource multi-platform pattern with per-platform error isolation
- Two-phase search+raw-fetch with keyword matching against provider registry
2026-04-06 11:55:44 +03:00
salvacybersec
770705302c
feat(11-01): add DuckDuckGoSource, YandexSource, and BraveSource
...
- DuckDuckGoSource scrapes HTML search (no API key, always enabled, RespectsRobots=true)
- YandexSource uses Yandex XML Search API (user+key required, XML response parsing)
- BraveSource uses Brave Search API (X-Subscription-Token header, JSON response)
- All three follow established error handling: 401 aborts, transient continues, ctx cancellation returns
2026-04-06 11:54:42 +03:00
salvacybersec
7272e65207
feat(11-01): add GoogleDorkSource and BingDorkSource with formatQuery updates
...
- GoogleDorkSource uses Google Custom Search JSON API (APIKey+CX required)
- BingDorkSource uses Bing Web Search API v7 (Ocp-Apim-Subscription-Key header)
- formatQuery now handles google/bing/duckduckgo/yandex/brave dork syntax
- Both sources follow established pattern: retry via Client, rate limit via LimiterRegistry
2026-04-06 11:54:36 +03:00
salvacybersec
3c500b5473
feat(11-02): add PastebinSource and GistPasteSource for paste site scanning
...
- PastebinSource: two-phase search+raw-fetch with keyword matching
- GistPasteSource: scrapes gist.github.com public search (no auth)
- Both implement recon.ReconSource with httptest-based tests
2026-04-06 11:53:00 +03:00
salvacybersec
118decbb3e
fix(phase-10): add --sources filter flag and DB persistence to recon full
...
Closes 2 verification gaps:
1. --sources=github,gitlab flag filters registered sources before sweep
2. Findings persisted to SQLite via storage.SaveFinding after dedup
Also adds Engine.Get() method for source lookup by name.
2026-04-06 11:36:19 +03:00
salvacybersec
8528108613
test(10-09): add end-to-end SweepAll integration test across all ten sources
2026-04-06 01:26:13 +03:00
salvacybersec
fb3e57382e
feat(10-09): wire all ten Phase 10 sources in RegisterAll
2026-04-06 01:24:22 +03:00
salvacybersec
4628ccfe90
test(10-09): add failing RegisterAll wiring tests
2026-04-06 01:23:26 +03:00
salvacybersec
a034eeb14c
Merge branch 'worktree-agent-ad7ef8d3'
2026-04-06 01:20:33 +03:00
salvacybersec
a0b8f99a7f
Merge branch 'worktree-agent-ac81d6ab'
2026-04-06 01:20:25 +03:00
salvacybersec
430ace9a9a
Merge branch 'worktree-agent-a2637f83'
2026-04-06 01:20:25 +03:00
salvacybersec
91becd961f
Merge branch 'worktree-agent-a7f84823'
2026-04-06 01:20:25 +03:00
salvacybersec
6928ca4e70
Merge branch 'worktree-agent-a2fe7ff3'
2026-04-06 01:20:25 +03:00
salvacybersec
ecebffd27d
feat(10-07): add SandboxesSource aggregator (codepen/jsfiddle/stackblitz/glitch/observable)
...
- Single ReconSource umbrella iterating per-platform HTML or JSON search endpoints
- Per-platform failures logged and skipped (log-and-continue); ctx cancel aborts fast
- Sub-platform identifier encoded in Finding.KeyMasked as 'platform=<name>' (pragmatic slot)
- Gitpod intentionally omitted (no public search)
- 5 httptest-backed tests covering HTML+JSON extraction, platform-failure tolerance, ctx cancel
2026-04-06 01:18:15 +03:00
salvacybersec
4fafc01052
feat(10-05): implement CodebergSource for Gitea REST API
...
- Add CodebergSource targeting /api/v1/repos/search (Codeberg + any Gitea)
- Public API by default; Authorization: token <t> when Token set
- Unauth rate limit 60/hour, authenticated ~1000/hour
- Emit Findings keyed to repo html_url with SourceType=recon:codeberg
- Keyword index maps BuildQueries output back to ProviderName
- httptest coverage: name/interface, rate limits (both modes),
  sweep decoding, header presence/absence, ctx cancellation
2026-04-06 01:17:25 +03:00
salvacybersec
0e16e8ea4c
feat(10-04): add GistSource for public gist keyword recon
...
- GistSource implements recon.ReconSource (RECON-CODE-04)
- Lists /gists/public?per_page=100, fetches each file's raw content,
  scans against provider keyword set, emits one Finding per matching gist
- Disabled when GitHub token empty
- Rate: rate.Every(2s), burst 1 (30 req/min GitHub limit)
- 256KB read cap per file; skips gists without keyword matches
- httptest coverage: enable gating, sweep match, no-match, 401, ctx cancel
2026-04-06 01:17:07 +03:00
salvacybersec
62a347f476
feat(10-07): add Replit and CodeSandbox scraping sources
...
- ReplitSource scrapes /search HTML extracting /@user/repl anchors
- CodeSandboxSource scrapes /search HTML extracting /s/slug anchors
- Both use golang.org/x/net/html parser, 10 req/min rate, RespectsRobots=true
- 10 httptest-backed tests covering extraction, ctx cancel, rate/name assertions
2026-04-06 01:16:39 +03:00
salvacybersec
ab636dc5e1
fix(10-02): stabilize GitHubSource provider-name test
2026-04-06 01:15:51 +03:00
salvacybersec
0137dc57b1
feat(10-03): add GitLabSource for /api/v4/search blobs
...
- Implements recon.ReconSource against GitLab Search API
- PRIVATE-TOKEN header auth; rate.Every(30ms) burst 5 (~2000/min)
- Disabled when token empty; Sweep returns nil without calls
- Emits Finding per blob with Source=/projects/<id>/-/blob/<ref>/<path>
- 401 wrapped as ErrUnauthorized; ctx cancellation honored
- httptest coverage: enabled gating, happy path, 401, ctx cancel, iface assert
2026-04-06 01:15:49 +03:00
salvacybersec
39001f208c
feat(10-06): implement HuggingFaceSource scanning Spaces and Models
...
- queries /api/spaces and /api/models via Hub API
- token optional: slower rate when absent (10s vs 3.6s)
- emits Findings with SourceType=recon:huggingface and prefixed Source URLs
- compile-time assert implements recon.ReconSource
2026-04-06 01:15:49 +03:00
salvacybersec
45f8782464
test(10-06): add failing tests for HuggingFaceSource
...
- httptest server routes /api/spaces and /api/models
- assertions: enabled, both endpoints hit, URL prefixes, auth header, ctx cancel, rate-limit token mode
2026-04-06 01:15:43 +03:00
salvacybersec
d279abf449
feat(10-04): add BitbucketSource for code search recon
...
- BitbucketSource implements recon.ReconSource (RECON-CODE-03)
- Queries /2.0/workspaces/{ws}/search/code with Bearer auth
- Disabled when token OR workspace empty
- Rate: rate.Every(3.6s), burst 1 (Bitbucket 1000/hr limit)
- httptest coverage: enable gating, sweep, 401, ctx cancel
2026-04-06 01:15:42 +03:00
salvacybersec
243b7405cd
feat(10-08): add KaggleSource with HTTP Basic auth
...
- KaggleSource queries /api/v1/kernels/list with SetBasicAuth(user, key)
- Disabled when either KaggleUser or KaggleKey is empty (no HTTP calls)
- Emits Findings tagged recon:kaggle with Source = <web>/code/<ref>
- 60/min rate limit via rate.Every(1s), burst 1
- httptest-driven tests cover enabled, auth header, missing creds,
  401 unauthorized, and ctx cancellation
- RECON-CODE-09
2026-04-06 01:15:23 +03:00
salvacybersec
fb6cb53975
feat(10-02): implement GitHubSource recon.ReconSource
2026-04-06 01:14:52 +03:00
salvacybersec
03deb603b3
test(10-02): add failing tests for GitHubSource
2026-04-06 01:12:56 +03:00
salvacybersec
9273f356e6
feat(10-01): add provider-driven query generator and RegisterAll skeleton
...
- BuildQueries(reg, source) dedups keywords and formats per-source syntax
- github/gist use 'keyword' in:file; others use bare keyword
- SourcesConfig placeholder struct for Wave 2 plans to depend on
- RegisterAll no-op stub (Plan 10-09 will fill)
2026-04-06 01:09:57 +03:00
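The keyword dedup and per-source formatting described in 9273f356e6 might look roughly like the sketch below. It takes a plain keyword slice instead of the provider registry, and the lowercase names buildQueries and formatQuery are stand-ins. Rendering the quoted keyword with double quotes is an assumption; the commit message only shows 'keyword' in:file.

```go
package main

import "fmt"

// formatQuery applies per-source syntax: github and gist wrap the keyword
// in quotes and append in:file; every other source gets the bare keyword.
// The double-quote style is an assumption.
func formatQuery(source, keyword string) string {
	switch source {
	case "github", "gist":
		return "\"" + keyword + "\" in:file"
	default:
		return keyword
	}
}

// buildQueries drops repeated keywords (keeping first-seen order) and
// formats each remaining keyword for the given source.
func buildQueries(keywords []string, source string) []string {
	seen := make(map[string]struct{}, len(keywords))
	out := make([]string, 0, len(keywords))
	for _, k := range keywords {
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, formatQuery(source, k))
	}
	return out
}

func main() {
	// Hypothetical keywords; the real ones come from the provider registry.
	kws := []string{"sk-ant-", "hf_", "sk-ant-"}
	fmt.Println(buildQueries(kws, "github"))
	fmt.Println(buildQueries(kws, "gitlab"))
}
```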
salvacybersec
75024e4701
feat(10-01): add shared retry HTTP client for recon sources
...
- Client.Do retries 429/403/5xx honoring Retry-After
- 401 returns ErrUnauthorized immediately (no retry)
- Context cancellation honored during retry sleeps
- Default UA keyhunter-recon/1.0, 30s timeout, 2 retries
2026-04-06 01:09:02 +03:00
salvacybersec
a754ff7546
test(09-06): add recon pipeline integration test
...
- Exercises Engine + LimiterRegistry + Stealth + Dedup end-to-end
- testSource emits 5 findings with one duplicate pair (Dedup -> 4)
- TestRobotsOnlyWhenRespectsRobots asserts robots gating via httptest
- Covers RECON-INFRA-05/06/07/08
2026-04-06 00:51:08 +03:00
salvacybersec
c2137edc41
merge: plan 09-03 stealth+dedup
2026-04-06 00:45:13 +03:00
salvacybersec
2988fdf9b3
feat(09-03): implement stable cross-source finding Dedup
...
- Dedup drops duplicates keyed by sha256(ProviderName|KeyMasked|Source)
- Preserves input order and first-seen metadata (stable dedup)
- Same provider+masked with different Source URLs are kept separate
- Uses engine.Finding directly to avoid alias collision with Plan 09-01
2026-04-06 00:43:07 +03:00
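The stable dedup described in 2988fdf9b3 can be sketched as below. The Finding struct here is a stand-in holding only the three fields named in the hash key (the real engine.Finding carries more), and the helper name dedupKey is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Finding is a reduced stand-in for engine.Finding.
type Finding struct {
	ProviderName string
	KeyMasked    string
	Source       string
}

// dedupKey hashes ProviderName|KeyMasked|Source with SHA-256.
func dedupKey(f Finding) string {
	sum := sha256.Sum256([]byte(f.ProviderName + "|" + f.KeyMasked + "|" + f.Source))
	return hex.EncodeToString(sum[:])
}

// Dedup keeps the first occurrence of each key and preserves input order.
func Dedup(in []Finding) []Finding {
	seen := make(map[string]struct{}, len(in))
	out := make([]Finding, 0, len(in))
	for _, f := range in {
		k := dedupKey(f)
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out
}

func main() {
	fs := []Finding{
		{"openai", "sk-...abcd", "https://a.example/1"},
		{"openai", "sk-...abcd", "https://a.example/2"}, // different Source: kept
		{"openai", "sk-...abcd", "https://a.example/1"}, // exact duplicate: dropped
	}
	fmt.Println(len(Dedup(fs))) // prints 2
}
```

Because Source is part of the key, the same masked key seen at two URLs yields two findings, which matches the "kept separate" behavior in the commit body.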
salvacybersec
851b2432b8
feat(09-01): add Engine with parallel fanout and ExampleSource
...
- Engine.Register/List/SweepAll with ants pool fanout
- ExampleSource emits two deterministic findings (SourceType=recon:example)
- Tests cover Register/List idempotency, SweepAll aggregation, empty-registry,
  and Enabled() filtering
2026-04-06 00:42:51 +03:00
salvacybersec
ecfa2bff28
test(09-03): add failing test for cross-source Dedup
2026-04-06 00:42:45 +03:00
salvacybersec
0373931490
feat(09-04): implement RobotsCache with 1h per-host TTL
...
- Parses robots.txt via temoto/robotstxt
- Caches per host for 1 hour; second call within TTL skips HTTP fetch
- Default-allow on network/parse/4xx/5xx errors
- Matches 'keyhunter' user-agent against disallowed paths
- Client field allows httptest injection
Satisfies RECON-INFRA-07.
2026-04-06 00:42:33 +03:00
salvacybersec
2c140e9661
feat(09-03): implement stealth UA pool and StealthHeaders
...
- Pool of 10 realistic browser User-Agents (Chrome/Firefox/Safari/Edge)
- Covers Windows, macOS, Linux, iOS, Android
- RandomUserAgent returns a random pool entry
- StealthHeaders returns UA + Accept-Language header map
2026-04-06 00:42:22 +03:00
salvacybersec
4bd6c6b05f
test(09-04): add failing tests for RobotsCache
...
- Allowed/Disallowed path matching
- Cache hit counter assertion
- Default-allow on 5xx or network error
- keyhunter UA matching precedence
2026-04-06 00:42:03 +03:00
salvacybersec
bbbc05fa46
test(09-03): add failing test for stealth UA pool
2026-04-06 00:41:55 +03:00
salvacybersec
590fc33955
feat(09-02): add LimiterRegistry with per-source rate limiters and jitter
...
- NewLimiterRegistry + For(name, rate, burst) idempotent lookup
- Wait blocks on token then applies 100ms-1s jitter when stealth
- Per-source isolation (RECON-INFRA-05), ctx cancellation honored
- Tests: isolation, idempotency, ctx cancel, jitter range, no-jitter
2026-04-06 00:41:33 +03:00
salvacybersec
10af12d358
feat(09-01): add ReconSource interface and Config
...
- Define ReconSource interface: Name/RateLimit/Burst/RespectsRobots/Enabled/Sweep
- Alias recon.Finding = engine.Finding for shared storage path
- Config struct carries Stealth, RespectRobots, EnabledSources, Query
2026-04-06 00:40:46 +03:00
salvacybersec
2c554b9c9c
test(08-07): add dork count + uniqueness guardrail
...
- TestDorkCountGuardrail: enforces DORK-02 >=150 floor
- TestDorkCountPerSource: per-source minimums (github>=50, google>=30, shodan>=20, censys>=15, zoomeye/fofa/gitlab>=10, bing>=5)
- TestDorkCategoriesPresent: all 5 DORK-01 categories present
- TestDorkIDsUnique: no collisions across source files
2026-04-06 00:24:51 +03:00
salvacybersec
c504cbd5d3
feat(08-04): add 10 FOFA + 10 GitLab + 5 Bing dorks
...
- 10 FOFA queries using title=/body=/port=/cert= syntax (8 infrastructure
  + 2 frontier: Azure OpenAI cert, OpenAI proxy api_key leak)
- 10 GitLab code search dorks across frontier/specialized/infrastructure/
  emerging categories (OpenAI, Anthropic, Google AI, Groq, Cohere, HF,
  OpenRouter, Perplexity, DeepSeek, Pinecone)
- 5 Bing dorks using site:/filetype:/intitle:/inbody: operators
  (3 frontier + 1 specialized + 1 infrastructure)
- Brings grand total across all 8 sources to 150 dorks, satisfying DORK-02
- Dual-located under pkg/dorks/definitions/ and dorks/
2026-04-06 00:21:41 +03:00
salvacybersec
1c86800c14
feat(08-04): add 15 Censys + 10 ZoomEye dorks
...
- 15 Censys Search 2.0 queries for Ollama, vLLM, LocalAI, Open WebUI,
  LM Studio, Triton, TGI, LiteLLM, Portkey, LangServe, FastChat,
  text-generation-webui, Azure OpenAI certs, Bedrock certs, and OpenAI
  proxies (12 infrastructure + 3 frontier)
- 10 ZoomEye app/title/port/service queries covering the same LLM
  infrastructure surface (9 infrastructure + 1 frontier)
- Dual-located under pkg/dorks/definitions/ (embedded) and dorks/ (repo root)
2026-04-06 00:21:34 +03:00
salvacybersec
56c11e39a0
feat(08-03): add 20 Shodan dorks for exposed LLM infrastructure
...
- frontier.yaml: 6 dorks (OpenAI/Anthropic proxies, Azure OpenAI certs, AWS Bedrock, LiteLLM)
- infrastructure.yaml: 14 dorks (Ollama, vLLM, LocalAI, LM Studio, text-generation-webui, Open WebUI, Triton, TGI, LangServe, FastChat, OpenRouter/Portkey/Helicone gateways)
- Real Shodan query syntax: http.title, http.html, ssl.cert.subject.cn, product, port, http.component
- Dual-located: pkg/dorks/definitions/shodan/ + dorks/shodan/
2026-04-06 00:21:03 +03:00
salvacybersec
348d1c057b
feat(08-03): add 30 Google dorks across 3 categories
...
- frontier.yaml: 12 dorks (OpenAI, Anthropic, Google AI, Groq, Cohere, Mistral, xAI, Replicate)
- specialized.yaml: 10 dorks (Perplexity, HF, ElevenLabs, Deepgram, AssemblyAI, Stability, Jina, Voyage)
- infrastructure.yaml: 8 dorks (OpenRouter, LiteLLM, Helicone, Portkey, Ollama, vLLM, LocalAI)
- Real site:/filetype:/intitle:/inurl: operators, no templating
- Dual-located: pkg/dorks/definitions/google/ (go:embed) + dorks/google/ (user-visible)
2026-04-06 00:20:56 +03:00
salvacybersec
9755b3756a
feat(08-02): add 25 GitHub dorks for infrastructure, emerging, enterprise categories
...
- infrastructure.yaml: 10 dorks covering Tier 5 gateways (OpenRouter,
  LiteLLM, Portkey, Helicone, Cloudflare AI, Vercel AI) and Tier 8
  self-hosted (Ollama, vLLM, LocalAI)
- emerging.yaml: 10 dorks covering Tier 4 Chinese providers (DeepSeek,
  Moonshot, Qwen, Zhipu, MiniMax) and Tier 6 vector DBs (Pinecone,
  Weaviate, Qdrant, Chroma) plus Writer.com
- enterprise.yaml: 5 dorks covering Tier 7 dev tools (Codeium, Tabnine)
  and Tier 9 enterprise (Databricks, Snowflake Cortex, IBM watsonx)
- Registry now loads 50 total GitHub dorks across all 5 categories,
  mirrored in both dorks/github/ and pkg/dorks/definitions/github/
2026-04-06 00:20:52 +03:00
salvacybersec
09722eaec4
feat(08-02): add 25 GitHub dorks for frontier and specialized categories
...
- frontier.yaml: 15 dorks covering Tier 1/2 providers (OpenAI, Anthropic,
  Google AI, Azure OpenAI, AWS Bedrock, xAI, Cohere, Mistral, Groq,
  Together, Replicate)
- specialized.yaml: 10 dorks covering Tier 3 providers (Perplexity,
  Voyage, Jina, AssemblyAI, Deepgram, ElevenLabs, Stability, HuggingFace)
- Extend loader to accept YAML list format in addition to single-dork
  mapping, enabling multi-dork files for Wave 2+ plans
- Mirror all YAMLs into dorks/github/ (user-visible) and
  pkg/dorks/definitions/github/ (go:embed target)
2026-04-06 00:20:43 +03:00
salvacybersec
01062b88b1
feat(08-01): add custom_dorks table and CRUD for user-authored dorks
...
- schema.sql: CREATE TABLE IF NOT EXISTS custom_dorks with unique dork_id,
  source/category indexes, and tags stored as JSON TEXT
- custom_dorks.go: Save/List/Get/GetByDorkID/Delete with JSON tag round-trip
- Tests: round-trip, newest-first ordering, not-found, unique constraint,
  delete no-op, schema migration idempotency
2026-04-06 00:16:33 +03:00