Commit Graph

257 Commits

Author SHA1 Message Date
salvacybersec
75024e4701 feat(10-01): add shared retry HTTP client for recon sources
- Client.Do retries 429/403/5xx honoring Retry-After
- 401 returns ErrUnauthorized immediately (no retry)
- Context cancellation honored during retry sleeps
- Default UA keyhunter-recon/1.0, 30s timeout, 2 retries
2026-04-06 01:09:02 +03:00
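
The retry behavior listed in this commit can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the `Client` field names, the `Backoff` fallback value, and the helpers `shouldRetry` and `parseRetryAfter` are assumptions made for the example. Only the defaults (UA string, 30s timeout, 2 retries) and the `ErrUnauthorized` name come from the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync/atomic"
	"time"
)

// ErrUnauthorized is the sentinel named in the commit message.
var ErrUnauthorized = errors.New("recon: unauthorized")

// Client is a hypothetical shape; the field names are assumptions.
type Client struct {
	HTTP       *http.Client
	MaxRetries int
	UserAgent  string
	Backoff    time.Duration // used when Retry-After is missing or unparsable
}

// NewClient applies the defaults listed in the commit message.
func NewClient() *Client {
	return &Client{
		HTTP:       &http.Client{Timeout: 30 * time.Second},
		MaxRetries: 2,
		UserAgent:  "keyhunter-recon/1.0",
		Backoff:    500 * time.Millisecond, // assumed value
	}
}

// shouldRetry reports whether a status is retryable: 429, 403, or any 5xx.
func shouldRetry(status int) bool {
	return status == 429 || status == 403 || (status >= 500 && status <= 599)
}

// parseRetryAfter reads the delay-seconds form of Retry-After.
// The HTTP-date form is not handled in this sketch.
func parseRetryAfter(v string, fallback time.Duration) time.Duration {
	secs, err := strconv.Atoi(v)
	if err != nil || secs < 0 {
		return fallback
	}
	return time.Duration(secs) * time.Second
}

// Do sends req, retrying retryable statuses up to MaxRetries times.
// It is only safe for requests without a body (for example GET).
func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error) {
	req = req.Clone(ctx) // copy so the caller's headers are not mutated
	if req.Header.Get("User-Agent") == "" {
		req.Header.Set("User-Agent", c.UserAgent)
	}
	for attempt := 0; ; attempt++ {
		resp, err := c.HTTP.Do(req)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode == http.StatusUnauthorized {
			resp.Body.Close()
			return nil, ErrUnauthorized // no retry on 401
		}
		if !shouldRetry(resp.StatusCode) {
			return resp, nil
		}
		delay := parseRetryAfter(resp.Header.Get("Retry-After"), c.Backoff)
		resp.Body.Close()
		if attempt >= c.MaxRetries {
			return nil, fmt.Errorf("giving up after %d attempts: status %d", attempt+1, resp.StatusCode)
		}
		select {
		case <-ctx.Done(): // cancellation interrupts the retry sleep
			return nil, ctx.Err()
		case <-time.After(delay):
		}
	}
}

func main() {
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) == 1 { // first call is rate limited
			w.Header().Set("Retry-After", "0")
			w.WriteHeader(http.StatusTooManyRequests)
		}
	}))
	defer srv.Close()
	req, _ := http.NewRequest(http.MethodGet, srv.URL, nil)
	resp, err := NewClient().Do(context.Background(), req)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	resp.Body.Close()
	fmt.Println("final status:", resp.StatusCode)
}
```

Cloning the request keeps the caller's headers untouched, and the `select` on `ctx.Done()` is what lets a cancelled context cut a retry sleep short.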
salvacybersec
191bdee3bc docs(10-osint-code-hosting): create phase 10 plans (9 plans across 3 waves) 2026-04-06 01:07:15 +03:00
salvacybersec
cfe090a5c9 docs(10): OSINT code hosting context 2026-04-06 00:59:18 +03:00
salvacybersec
226274ca9e docs(phase-09): complete phase execution 2026-04-06 00:56:36 +03:00
salvacybersec
4b8599d959 docs(09-06): complete phase 9 OSINT infrastructure
- Add 09-06-SUMMARY.md (integration test + phase summary plan)
- Update STATE.md progress and metrics
- Update ROADMAP.md phase 09 status
- Mark RECON-INFRA-05/06/07/08 complete in REQUIREMENTS.md
2026-04-06 00:53:35 +03:00
salvacybersec
d29a7d30b2 docs(09-06): add phase 09 completion summary
Documents all 4 RECON-INFRA requirement IDs as complete, summarizes
decisions (per-source limiters, default-allow robots, SHA256 dedup,
UA pool of 10), lists handoff contract for Phases 10-16.
2026-04-06 00:52:20 +03:00
salvacybersec
a754ff7546 test(09-06): add recon pipeline integration test
- Exercises Engine + LimiterRegistry + Stealth + Dedup end-to-end
- testSource emits 5 findings with one duplicate pair (Dedup -> 4)
- TestRobotsOnlyWhenRespectsRobots asserts robots gating via httptest
- Covers RECON-INFRA-05/06/07/08
2026-04-06 00:51:08 +03:00
salvacybersec
0ff9edc6c1 docs(09-05): complete recon CLI command tree plan 2026-04-06 00:48:42 +03:00
salvacybersec
86a6bb864b feat(09-05): add recon full/list commands and remove stub
- cmd/recon.go owns reconCmd with full and list subcommands
- Wires pkg/recon.Engine.SweepAll + Dedup with ExampleSource registered
- Adds --stealth, --respect-robots (default true), --query flags
- Removes reconCmd stub from cmd/stubs.go
2026-04-06 00:47:32 +03:00
salvacybersec
c2137edc41 merge: plan 09-03 stealth+dedup 2026-04-06 00:45:13 +03:00
salvacybersec
1eb86ca308 docs(09-03): complete stealth UA pool and dedup plan
- Stealth UA pool (10 browsers) + RandomUserAgent/StealthHeaders
- Stable cross-source Dedup keyed by sha256(provider|masked|source)
- Mark RECON-INFRA-06 complete
2026-04-06 00:44:37 +03:00
salvacybersec
fb1e7f8bf5 docs(09-01): complete recon framework foundation plan 2026-04-06 00:44:04 +03:00
salvacybersec
4dbc38dcc5 docs(09-04): complete robots.txt cache plan
Adds SUMMARY, marks RECON-INFRA-07 complete, updates phase 9 roadmap.
2026-04-06 00:43:49 +03:00
salvacybersec
2988fdf9b3 feat(09-03): implement stable cross-source finding Dedup
- Dedup drops duplicates keyed by sha256(ProviderName|KeyMasked|Source)
- Preserves input order and first-seen metadata (stable dedup)
- Same provider+masked with different Source URLs are kept separate
- Uses engine.Finding directly to avoid alias collision with Plan 09-01
2026-04-06 00:43:07 +03:00
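
A stable dedup keyed by `sha256(ProviderName|KeyMasked|Source)` might look like the sketch below. The `Finding` struct here is a reduced stand-in with only the three fields named in the commit message; the real `engine.Finding` presumably carries more. The helper name `dedupKey` is an assumption for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Finding is a reduced stand-in for engine.Finding (assumed fields only).
type Finding struct {
	ProviderName string
	KeyMasked    string
	Source       string
}

// dedupKey hashes the three identifying fields joined by "|".
func dedupKey(f Finding) string {
	sum := sha256.Sum256([]byte(f.ProviderName + "|" + f.KeyMasked + "|" + f.Source))
	return hex.EncodeToString(sum[:])
}

// Dedup keeps the first occurrence of each key and preserves input order.
func Dedup(in []Finding) []Finding {
	seen := make(map[string]struct{}, len(in))
	out := make([]Finding, 0, len(in))
	for _, f := range in {
		k := dedupKey(f)
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out
}

func main() {
	in := []Finding{
		{"openai", "sk-a...1111", "https://example.invalid/a"},
		{"openai", "sk-a...1111", "https://example.invalid/a"}, // exact duplicate, dropped
		{"openai", "sk-a...1111", "https://example.invalid/b"}, // different Source, kept
	}
	for _, f := range Dedup(in) {
		fmt.Println(f.ProviderName, f.KeyMasked, f.Source)
	}
}
```

Because `Source` is part of the key, the same masked key seen at two URLs yields two findings, which matches the "kept separate" bullet above.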
salvacybersec
851b2432b8 feat(09-01): add Engine with parallel fanout and ExampleSource
- Engine.Register/List/SweepAll with ants pool fanout
- ExampleSource emits two deterministic findings (SourceType=recon:example)
- Tests cover Register/List idempotency, SweepAll aggregation, empty-registry,
  and Enabled() filtering
2026-04-06 00:42:51 +03:00
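
The Register/List/SweepAll shape described in this commit can be illustrated with the sketch below. It deliberately swaps the ants pool for plain goroutines and a `sync.WaitGroup` so the example stays within the standard library. The `Source` interface is a reduced, assumed version of the project's interface, and `runExample` plus the policy of skipping failed sources are likewise assumptions.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
)

// Finding is a reduced stand-in for the shared finding type.
type Finding struct {
	ProviderName, KeyMasked, Source, SourceType string
}

// Source is a reduced, assumed interface for this sketch.
type Source interface {
	Name() string
	Enabled() bool
	Sweep(ctx context.Context, query string) ([]Finding, error)
}

type Engine struct {
	mu      sync.RWMutex
	sources map[string]Source
}

func NewEngine() *Engine { return &Engine{sources: make(map[string]Source)} }

// Register is idempotent by name: re-registering replaces the entry.
func (e *Engine) Register(s Source) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.sources[s.Name()] = s
}

// List returns the registered source names in sorted order.
func (e *Engine) List() []string {
	e.mu.RLock()
	defer e.mu.RUnlock()
	names := make([]string, 0, len(e.sources))
	for n := range e.sources {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// SweepAll runs every enabled source concurrently and aggregates findings.
func (e *Engine) SweepAll(ctx context.Context, query string) []Finding {
	e.mu.RLock()
	active := make([]Source, 0, len(e.sources))
	for _, s := range e.sources {
		if s.Enabled() {
			active = append(active, s)
		}
	}
	e.mu.RUnlock()

	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		all []Finding
	)
	for _, s := range active {
		wg.Add(1)
		go func(s Source) {
			defer wg.Done()
			found, err := s.Sweep(ctx, query)
			if err != nil {
				return // assumed policy: one failing source does not abort the sweep
			}
			mu.Lock()
			all = append(all, found...)
			mu.Unlock()
		}(s)
	}
	wg.Wait()
	return all
}

// exampleSource emits two deterministic findings.
type exampleSource struct{}

func (exampleSource) Name() string  { return "example" }
func (exampleSource) Enabled() bool { return true }
func (exampleSource) Sweep(ctx context.Context, query string) ([]Finding, error) {
	return []Finding{
		{"openai", "sk-a...1111", "https://example.invalid/a", "recon:example"},
		{"anthropic", "sk-b...2222", "https://example.invalid/b", "recon:example"},
	}, nil
}

// runExample registers the same source twice and sweeps once.
func runExample() ([]string, []Finding) {
	e := NewEngine()
	e.Register(exampleSource{})
	e.Register(exampleSource{})
	return e.List(), e.SweepAll(context.Background(), "")
}

func main() {
	names, findings := runExample()
	fmt.Println(names, len(findings)) // prints: [example] 2
}
```

With several sources the order of aggregated findings is not deterministic, so callers that need a stable order would sort or dedup afterwards.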
salvacybersec
ecfa2bff28 test(09-03): add failing test for cross-source Dedup 2026-04-06 00:42:45 +03:00
salvacybersec
0373931490 feat(09-04): implement RobotsCache with 1h per-host TTL
- Parses robots.txt via temoto/robotstxt
- Caches per host for 1 hour; second call within TTL skips HTTP fetch
- Default-allow on network/parse/4xx/5xx errors
- Matches 'keyhunter' user-agent against disallowed paths
- Client field allows httptest injection

Satisfies RECON-INFRA-07.
2026-04-06 00:42:33 +03:00
salvacybersec
2c140e9661 feat(09-03): implement stealth UA pool and StealthHeaders
- Pool of 10 realistic browser User-Agents (Chrome/Firefox/Safari/Edge)
- Covers Windows, macOS, Linux, iOS, Android
- RandomUserAgent returns a random pool entry
- StealthHeaders returns UA + Accept-Language header map
2026-04-06 00:42:22 +03:00
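
As a rough illustration of the two helpers named in this commit, the sketch below picks a random entry from a small pool and builds a header map around it. The three User-Agent strings are illustrative placeholders rather than the project's ten entries, and both the `map[string]string` return type and the `Accept-Language` value are assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
)

// userAgents is an illustrative pool; the real pool reportedly has 10 entries.
var userAgents = []string{
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
	"Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
}

// RandomUserAgent returns a random entry from the pool.
func RandomUserAgent() string {
	return userAgents[rand.Intn(len(userAgents))]
}

// StealthHeaders returns a User-Agent plus an Accept-Language header.
// The Accept-Language value is an assumed default.
func StealthHeaders() map[string]string {
	return map[string]string{
		"User-Agent":      RandomUserAgent(),
		"Accept-Language": "en-US,en;q=0.9",
	}
}

func main() {
	for name, value := range StealthHeaders() {
		fmt.Printf("%s: %s\n", name, value)
	}
}
```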
salvacybersec
1d5d12740c docs(09-02): complete LimiterRegistry plan 2026-04-06 00:42:15 +03:00
salvacybersec
4bd6c6b05f test(09-04): add failing tests for RobotsCache
- Allowed/Disallowed path matching
- Cache hit counter assertion
- Default-allow on 5xx network error
- keyhunter UA matching precedence
2026-04-06 00:42:03 +03:00
salvacybersec
bbbc05fa46 test(09-03): add failing test for stealth UA pool 2026-04-06 00:41:55 +03:00
salvacybersec
590fc33955 feat(09-02): add LimiterRegistry with per-source rate limiters and jitter
- NewLimiterRegistry + For(name, rate, burst) idempotent lookup
- Wait blocks on token then applies 100ms-1s jitter when stealth
- Per-source isolation (RECON-INFRA-05), ctx cancellation honored
- Tests: isolation, idempotency, ctx cancel, jitter range, no-jitter
2026-04-06 00:41:33 +03:00
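
The per-source isolation and stealth jitter described in this commit can be approximated as below. This is a sketch under several assumptions: a simple fixed-interval limiter stands in for whatever token-bucket implementation the project uses, the `burst` argument is accepted but ignored, and the `Wait` signature, the `Limiter` type, and the `jitter` and `sleepCtx` helpers are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// Limiter spaces calls at a fixed interval (a stand-in for a token bucket).
type Limiter struct {
	mu       sync.Mutex
	interval time.Duration
	next     time.Time
}

// reserve returns how long the caller must wait for its slot.
func (l *Limiter) reserve(now time.Time) time.Duration {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.next.Before(now) {
		l.next = now
	}
	wait := l.next.Sub(now)
	l.next = l.next.Add(l.interval)
	return wait
}

// LimiterRegistry holds one limiter per source name.
type LimiterRegistry struct {
	mu       sync.Mutex
	limiters map[string]*Limiter
}

func NewLimiterRegistry() *LimiterRegistry {
	return &LimiterRegistry{limiters: make(map[string]*Limiter)}
}

// For returns the limiter for name, creating it on first use.
// Later calls return the same limiter regardless of the rate passed.
func (r *LimiterRegistry) For(name string, perSecond float64, burst int) *Limiter {
	r.mu.Lock()
	defer r.mu.Unlock()
	if l, ok := r.limiters[name]; ok {
		return l
	}
	if perSecond <= 0 {
		perSecond = 1 // assumed fallback for invalid rates
	}
	_ = burst // ignored in this sketch
	l := &Limiter{interval: time.Duration(float64(time.Second) / perSecond)}
	r.limiters[name] = l
	return l
}

// jitter returns a random delay in the range [100ms, 1s).
func jitter() time.Duration {
	return 100*time.Millisecond + time.Duration(rand.Int63n(int64(900*time.Millisecond)))
}

// sleepCtx sleeps for d unless ctx is cancelled first.
func sleepCtx(ctx context.Context, d time.Duration) error {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-t.C:
		return nil
	}
}

// Wait blocks for the source's slot, then adds jitter when stealth is on.
func (r *LimiterRegistry) Wait(ctx context.Context, name string, perSecond float64, burst int, stealth bool) error {
	l := r.For(name, perSecond, burst)
	if err := sleepCtx(ctx, l.reserve(time.Now())); err != nil {
		return err
	}
	if stealth {
		return sleepCtx(ctx, jitter())
	}
	return nil
}

func main() {
	r := NewLimiterRegistry()
	start := time.Now()
	for i := 0; i < 3; i++ {
		if err := r.Wait(context.Background(), "github", 50, 1, false); err != nil {
			fmt.Println("wait failed:", err)
			return
		}
	}
	fmt.Println("three calls took at least", time.Since(start).Round(10*time.Millisecond))
}
```

Keeping one limiter per name means a slow or heavily throttled source never delays the others.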
salvacybersec
10af12d358 feat(09-01): add ReconSource interface and Config
- Define ReconSource interface: Name/RateLimit/Burst/RespectsRobots/Enabled/Sweep
- Alias recon.Finding = engine.Finding for shared storage path
- Config struct carries Stealth, RespectRobots, EnabledSources, Query
2026-04-06 00:40:46 +03:00
salvacybersec
c3b9fb4043 chore(09-04): add github.com/temoto/robotstxt dependency
- Added temoto/robotstxt v1.1.2 for robots.txt parsing in recon sources
2026-04-06 00:40:39 +03:00
salvacybersec
ff128c8063 docs(09): create phase plan 2026-04-06 00:39:27 +03:00
salvacybersec
72414e090a docs(09): OSINT infrastructure context 2026-04-06 00:33:44 +03:00
salvacybersec
ed25d9806d docs(phase-08): complete phase execution 2026-04-06 00:32:47 +03:00
salvacybersec
84cfa17c39 docs(08-06): complete dorks CLI command tree plan 2026-04-06 00:28:56 +03:00
salvacybersec
c281c96040 feat(08-06): add dorks run/add/delete with injectable executor
- Add run subcommand dispatching via dorks.Runner (github live,
  other sources wrapped into friendly ErrSourceNotImplemented)
- Add add subcommand with source/category validation and embedded
  ID collision guard
- Add delete subcommand that refuses embedded dork ids
- Expose newGitHubExecutor as package var for test injection
- cmd/dorks_test.go covers list filtering, add persistence + list
  merge marker, invalid source rejection, embedded collision,
  embedded delete refusal, custom delete, shodan not-implemented
  path, GitHub missing-token auth hint, fake executor run, yaml
  export merge, and info for both origins

Completes DORK-03 (list/run/add/export/info/delete) and DORK-04
(--source/--category filtering).
2026-04-06 00:27:41 +03:00
salvacybersec
b7934ce169 feat(08-06): add dorks list/info/export commands
- Replace cmd/stubs.go dorksCmd stub with full command tree
- Add cmd/dorks.go with list, info, export subcommands
- Wire Registry + custom_dorks merge for list/export
- Bind GITHUB_TOKEN env var via viper for downstream run

Satisfies part of DORK-03 (list/info/export) and DORK-04 (source/category
filtering). run/add/delete land in Task 2.
2026-04-06 00:26:36 +03:00
salvacybersec
f9e3ad99f8 docs(08-07): complete dork guardrail test plan 2026-04-06 00:25:55 +03:00
salvacybersec
2c554b9c9c test(08-07): add dork count + uniqueness guardrail
- TestDorkCountGuardrail: enforces DORK-02 >=150 floor
- TestDorkCountPerSource: per-source minimums (github>=50, google>=30, shodan>=20, censys>=15, zoomeye/fofa/gitlab>=10, bing>=5)
- TestDorkCategoriesPresent: all 5 DORK-01 categories present
- TestDorkIDsUnique: no collisions across source files
2026-04-06 00:24:51 +03:00
salvacybersec
3a1ee18198 docs(08-05): complete GitHub Code Search live executor plan
- GitHubExecutor implements Executor interface against api.github.com/search/code
- Retry-After honored once for 403/429; ctx cancel respected during sleep
- ErrMissingAuth wrapped for empty token AND 401 server response
- 8 httptest-backed subtests cover success/limit-cap/retry/rate-limit/401/422/source
- Zero new dependencies (stdlib net/http + net/url only)
2026-04-06 00:23:16 +03:00
salvacybersec
2617b22753 docs(08-03): complete Google + Shodan dorks plan
- 30 Google + 20 Shodan dorks delivered
- Requirements DORK-01, DORK-02, DORK-04 marked complete
- SUMMARY.md records list-format YAML + dual-location mirror pattern
2026-04-06 00:22:38 +03:00
salvacybersec
213177ddf4 docs(08-04): complete censys+zoomeye+fofa+gitlab+bing dorks plan 2026-04-06 00:22:36 +03:00
salvacybersec
17f17edf1e docs(08-02): complete 50 GitHub dorks plan 2026-04-06 00:22:13 +03:00
salvacybersec
c504cbd5d3 feat(08-04): add 10 FOFA + 10 GitLab + 5 Bing dorks
- 10 FOFA queries using title=/body=/port=/cert= syntax (8 infrastructure
  + 2 frontier: Azure OpenAI cert, OpenAI proxy api_key leak)
- 10 GitLab code search dorks across frontier/specialized/infrastructure/
  emerging categories (OpenAI, Anthropic, Google AI, Groq, Cohere, HF,
  OpenRouter, Perplexity, DeepSeek, Pinecone)
- 5 Bing dorks using site:/filetype:/intitle:/inbody: operators
  (3 frontier + 1 specialized + 1 infrastructure)
- Brings grand total across all 8 sources to 150 dorks, satisfying DORK-02
- Dual-located under pkg/dorks/definitions/ and dorks/
2026-04-06 00:21:41 +03:00
salvacybersec
1c86800c14 feat(08-04): add 15 Censys + 10 ZoomEye dorks
- 15 Censys Search 2.0 queries for Ollama, vLLM, LocalAI, Open WebUI,
  LM Studio, Triton, TGI, LiteLLM, Portkey, LangServe, FastChat,
  text-generation-webui, Azure OpenAI certs, Bedrock certs, and OpenAI
  proxies (12 infrastructure + 3 frontier)
- 10 ZoomEye app/title/port/service queries covering the same LLM
  infrastructure surface (9 infrastructure + 1 frontier)
- Dual-located under pkg/dorks/definitions/ (embedded) and dorks/ (repo root)
2026-04-06 00:21:34 +03:00
salvacybersec
56c11e39a0 feat(08-03): add 20 Shodan dorks for exposed LLM infrastructure
- frontier.yaml: 6 dorks (OpenAI/Anthropic proxies, Azure OpenAI certs, AWS Bedrock, LiteLLM)
- infrastructure.yaml: 14 dorks (Ollama, vLLM, LocalAI, LM Studio, text-generation-webui, Open WebUI, Triton, TGI, LangServe, FastChat, OpenRouter/Portkey/Helicone gateways)
- Real Shodan query syntax: http.title, http.html, ssl.cert.subject.cn, product, port, http.component
- Dual-located: pkg/dorks/definitions/shodan/ + dorks/shodan/
2026-04-06 00:21:03 +03:00
salvacybersec
348d1c057b feat(08-03): add 30 Google dorks across 3 categories
- frontier.yaml: 12 dorks (OpenAI, Anthropic, Google AI, Groq, Cohere, Mistral, xAI, Replicate)
- specialized.yaml: 10 dorks (Perplexity, HF, ElevenLabs, Deepgram, AssemblyAI, Stability, Jina, Voyage)
- infrastructure.yaml: 8 dorks (OpenRouter, LiteLLM, Helicone, Portkey, Ollama, vLLM, LocalAI)
- Real site:/filetype:/intitle:/inurl: operators, no templating
- Dual-located: pkg/dorks/definitions/google/ (go:embed) + dorks/google/ (user-visible)
2026-04-06 00:20:56 +03:00
salvacybersec
9755b3756a feat(08-02): add 25 GitHub dorks for infrastructure, emerging, enterprise categories
- infrastructure.yaml: 10 dorks covering Tier 5 gateways (OpenRouter,
  LiteLLM, Portkey, Helicone, Cloudflare AI, Vercel AI) and Tier 8
  self-hosted (Ollama, vLLM, LocalAI)
- emerging.yaml: 10 dorks covering Tier 4 Chinese providers (DeepSeek,
  Moonshot, Qwen, Zhipu, MiniMax) and Tier 6 vector DBs (Pinecone,
  Weaviate, Qdrant, Chroma) plus Writer.com
- enterprise.yaml: 5 dorks covering Tier 7 dev tools (Codeium, Tabnine)
  and Tier 9 enterprise (Databricks, Snowflake Cortex, IBM watsonx)
- Registry now loads 50 total GitHub dorks across all 5 categories,
  mirrored in both dorks/github/ and pkg/dorks/definitions/github/
2026-04-06 00:20:52 +03:00
salvacybersec
09722eaec4 feat(08-02): add 25 GitHub dorks for frontier and specialized categories
- frontier.yaml: 15 dorks covering Tier 1/2 providers (OpenAI, Anthropic,
  Google AI, Azure OpenAI, AWS Bedrock, xAI, Cohere, Mistral, Groq,
  Together, Replicate)
- specialized.yaml: 10 dorks covering Tier 3 providers (Perplexity,
  Voyage, Jina, AssemblyAI, Deepgram, ElevenLabs, Stability, HuggingFace)
- Extend loader to accept YAML list format in addition to single-dork
  mapping, enabling multi-dork files for Wave 2+ plans
- Mirror all YAMLs into dorks/github/ (user-visible) and
  pkg/dorks/definitions/github/ (go:embed target)
2026-04-06 00:20:43 +03:00
salvacybersec
2dc7078708 docs(08-01): complete dork engine foundation plan
SUMMARY, STATE, ROADMAP, and REQUIREMENTS updates for pkg/dorks
foundation + custom_dorks storage (DORK-01, DORK-03).
2026-04-06 00:17:53 +03:00
salvacybersec
01062b88b1 feat(08-01): add custom_dorks table and CRUD for user-authored dorks
- schema.sql: CREATE TABLE IF NOT EXISTS custom_dorks with unique dork_id,
  source/category indexes, and tags stored as JSON TEXT
- custom_dorks.go: Save/List/Get/GetByDorkID/Delete with JSON tag round-trip
- Tests: round-trip, newest-first ordering, not-found, unique constraint,
  delete no-op, schema migration idempotency
2026-04-06 00:16:33 +03:00
salvacybersec
fd6efbb4c2 feat(08-01): add pkg/dorks foundation (schema, loader, registry, executor)
- Dork schema with Validate() mirroring provider YAML pattern
- go:embed loader tolerating empty definitions tree
- Registry with List/Get/Stats/ListBySource/ListByCategory
- Executor interface + Runner dispatch + ErrSourceNotImplemented
- Placeholder definitions/.gitkeep and repo-root dorks/.gitkeep
- Full unit test coverage for registry, validation, and runner dispatch
2026-04-06 00:15:32 +03:00
salvacybersec
46cf55ad37 docs(08): create phase plan 2026-04-06 00:13:13 +03:00
salvacybersec
4c2081821f docs(08): dork engine context 2026-04-06 00:05:59 +03:00
salvacybersec
436791f263 docs(phase-07): complete phase execution 2026-04-06 00:05:04 +03:00
salvacybersec
ca526d8e32 docs(07-04): complete import command plan 2026-04-06 00:00:24 +03:00
salvacybersec
9dbb0b87d4 feat(07-04): wire keyhunter import command with dedup and DB persist
- Replace import stub with cmd/import.go dispatching to pkg/importer
  (trufflehog, gitleaks, gitleaks-csv) via --format flag
- Reuse openDBWithKey helper so encryption + path resolution match scan/keys
- engineToStorage converts engine.Finding -> storage.Finding (Source -> SourcePath)
- Add pkg/storage.FindingExistsByKey for idempotent cross-import dedup
  keyed on (provider, masked key, source path, line number)
- cmd/import_test.go: selector table, field conversion, end-to-end trufflehog
  import with re-run duplicate assertion, unknown-format + missing-file errors
- pkg/storage queries_test: FindingExistsByKey hit and four miss cases

Delivers IMP-01/02/03 end-to-end.
2026-04-05 23:59:39 +03:00