salvacybersec
4fafc01052
feat(10-05): implement CodebergSource for Gitea REST API
...
- Add CodebergSource targeting /api/v1/repos/search (Codeberg + any Gitea)
- Public API by default; Authorization: token <t> when Token set
- Unauth rate limit 60/hour, authenticated ~1000/hour
- Emit Findings keyed to repo html_url with SourceType=recon:codeberg
- Keyword index maps BuildQueries output back to ProviderName
- httptest coverage: name/interface, rate limits (both modes),
sweep decoding, header presence/absence, ctx cancellation
2026-04-06 01:17:25 +03:00
salvacybersec
9273f356e6
feat(10-01): add provider-driven query generator and RegisterAll skeleton
...
- BuildQueries(reg, source) dedups keywords and formats per-source syntax
- github/gist use 'keyword' in:file; others use bare keyword
- SourcesConfig placeholder struct for Wave 2 plans to depend on
- RegisterAll no-op stub (Plan 10-09 will fill)
2026-04-06 01:09:57 +03:00
salvacybersec
75024e4701
feat(10-01): add shared retry HTTP client for recon sources
...
- Client.Do retries 429/403/5xx honoring Retry-After
- 401 returns ErrUnauthorized immediately (no retry)
- Context cancellation honored during retry sleeps
- Default UA keyhunter-recon/1.0, 30s timeout, 2 retries
2026-04-06 01:09:02 +03:00
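The retry rules listed in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's Client.Do: the function names shouldRetry, retryDelay and Do, the one-second fallback delay, and the seconds-only parsing of Retry-After are all assumptions made for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// ErrUnauthorized mirrors the sentinel named in the commit message.
var ErrUnauthorized = errors.New("unauthorized")

// shouldRetry reports whether a status is retryable: 429, 403 or any 5xx.
func shouldRetry(status int) bool {
	return status == http.StatusTooManyRequests ||
		status == http.StatusForbidden ||
		status >= 500
}

// retryDelay reads Retry-After as a number of seconds and falls back to
// the given default. (Assumption: the HTTP-date form is not handled here.)
func retryDelay(h http.Header, fallback time.Duration) time.Duration {
	if secs, err := strconv.Atoi(h.Get("Retry-After")); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	return fallback
}

// Do sends req and retries up to maxRetries times. It is only safe for
// requests without a body, because the body is not rewound between attempts.
func Do(ctx context.Context, c *http.Client, req *http.Request, maxRetries int) (*http.Response, error) {
	for attempt := 0; ; attempt++ {
		resp, err := c.Do(req.WithContext(ctx))
		if err != nil {
			return nil, err
		}
		if resp.StatusCode == http.StatusUnauthorized {
			resp.Body.Close()
			return nil, ErrUnauthorized // 401: fail immediately, no retry
		}
		if !shouldRetry(resp.StatusCode) || attempt >= maxRetries {
			return resp, nil
		}
		delay := retryDelay(resp.Header, time.Second)
		resp.Body.Close()
		select {
		case <-time.After(delay): // sleep before the next attempt
		case <-ctx.Done(): // cancellation interrupts the sleep
			return nil, ctx.Err()
		}
	}
}

func main() {
	for _, code := range []int{200, 401, 403, 429, 503} {
		fmt.Println(code, shouldRetry(code))
	}
}
```

With maxRetries set to 2 (the default named above), the sketch makes at most three attempts.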
salvacybersec
a754ff7546
test(09-06): add recon pipeline integration test
...
- Exercises Engine + LimiterRegistry + Stealth + Dedup end-to-end
- testSource emits 5 findings with one duplicate pair (Dedup -> 4)
- TestRobotsOnlyWhenRespectsRobots asserts robots gating via httptest
- Covers RECON-INFRA-05/06/07/08
2026-04-06 00:51:08 +03:00
salvacybersec
c2137edc41
merge: plan 09-03 stealth+dedup
2026-04-06 00:45:13 +03:00
salvacybersec
2988fdf9b3
feat(09-03): implement stable cross-source finding Dedup
...
- Dedup drops duplicates keyed by sha256(ProviderName|KeyMasked|Source)
- Preserves input order and first-seen metadata (stable dedup)
- Findings with the same provider+masked key but different Source URLs are kept separate
- Uses engine.Finding directly to avoid alias collision with Plan 09-01
2026-04-06 00:43:07 +03:00
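A stable dedup keyed on sha256(ProviderName|KeyMasked|Source) might look like the sketch below. The Finding struct here is a hypothetical stand-in for engine.Finding with only the three fields the commit names; the real type presumably carries more.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Finding is a hypothetical stand-in for engine.Finding.
type Finding struct {
	ProviderName string
	KeyMasked    string
	Source       string
}

// dedupKey hashes ProviderName|KeyMasked|Source with SHA-256.
func dedupKey(f Finding) string {
	sum := sha256.Sum256([]byte(f.ProviderName + "|" + f.KeyMasked + "|" + f.Source))
	return hex.EncodeToString(sum[:])
}

// Dedup keeps the first occurrence of each key and preserves input order.
func Dedup(in []Finding) []Finding {
	seen := make(map[string]struct{}, len(in))
	out := make([]Finding, 0, len(in))
	for _, f := range in {
		k := dedupKey(f)
		if _, dup := seen[k]; dup {
			continue // later duplicates are dropped; first-seen metadata wins
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out
}

func main() {
	in := []Finding{
		{"openai", "sk-...abcd", "https://example.com/a"},
		{"openai", "sk-...abcd", "https://example.com/a"}, // exact duplicate
		{"openai", "sk-...abcd", "https://example.com/b"}, // different Source, kept
	}
	for _, f := range Dedup(in) {
		fmt.Println(f.ProviderName, f.Source)
	}
}
```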
salvacybersec
851b2432b8
feat(09-01): add Engine with parallel fanout and ExampleSource
...
- Engine.Register/List/SweepAll with ants pool fanout
- ExampleSource emits two deterministic findings (SourceType=recon:example)
- Tests cover Register/List idempotency, SweepAll aggregation, empty-registry,
and Enabled() filtering
2026-04-06 00:42:51 +03:00
salvacybersec
ecfa2bff28
test(09-03): add failing test for cross-source Dedup
2026-04-06 00:42:45 +03:00
salvacybersec
0373931490
feat(09-04): implement RobotsCache with 1h per-host TTL
...
- Parses robots.txt via temoto/robotstxt
- Caches per host for 1 hour; second call within TTL skips HTTP fetch
- Default-allow on network/parse/4xx/5xx errors
- Matches 'keyhunter' user-agent against disallowed paths
- Client field allows httptest injection
Satisfies RECON-INFRA-07.
2026-04-06 00:42:33 +03:00
salvacybersec
2c140e9661
feat(09-03): implement stealth UA pool and StealthHeaders
...
- Pool of 10 realistic browser User-Agents (Chrome/Firefox/Safari/Edge)
- Covers Windows, macOS, Linux, iOS, Android
- RandomUserAgent returns a random pool entry
- StealthHeaders returns UA + Accept-Language header map
2026-04-06 00:42:22 +03:00
salvacybersec
4bd6c6b05f
test(09-04): add failing tests for RobotsCache
...
- Allowed/Disallowed path matching
- Cache hit counter assertion
- Default-allow on 5xx/network error
- keyhunter UA matching precedence
2026-04-06 00:42:03 +03:00
salvacybersec
bbbc05fa46
test(09-03): add failing test for stealth UA pool
2026-04-06 00:41:55 +03:00
salvacybersec
590fc33955
feat(09-02): add LimiterRegistry with per-source rate limiters and jitter
...
- NewLimiterRegistry + For(name, rate, burst) idempotent lookup
- Wait blocks on token then applies 100ms-1s jitter when stealth
- Per-source isolation (RECON-INFRA-05), ctx cancellation honored
- Tests: isolation, idempotency, ctx cancel, jitter range, no-jitter
2026-04-06 00:41:33 +03:00
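The idempotent For lookup and the 100ms-1s stealth jitter could be sketched as below. The limiter struct is a placeholder for whatever token-bucket type the real code uses, and the choice to ignore rate and burst on later calls for the same name is an assumption of this example, as is the sleepCtx helper.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// limiter is a placeholder; the real registry presumably wraps a
// token-bucket implementation.
type limiter struct {
	rate  float64
	burst int
}

// LimiterRegistry holds one limiter per source name.
type LimiterRegistry struct {
	mu       sync.Mutex
	limiters map[string]*limiter
}

func NewLimiterRegistry() *LimiterRegistry {
	return &LimiterRegistry{limiters: make(map[string]*limiter)}
}

// For returns the limiter for name, creating it on first use. Later calls
// return the same instance (assumption: their rate and burst are ignored).
func (r *LimiterRegistry) For(name string, rate float64, burst int) *limiter {
	r.mu.Lock()
	defer r.mu.Unlock()
	if l, ok := r.limiters[name]; ok {
		return l
	}
	l := &limiter{rate: rate, burst: burst}
	r.limiters[name] = l
	return l
}

// jitter returns a random delay in the range [100ms, 1s].
func jitter() time.Duration {
	const lo, hi = 100 * time.Millisecond, time.Second
	return lo + time.Duration(rand.Int63n(int64(hi-lo)+1))
}

// sleepCtx waits for d unless ctx is cancelled first.
func sleepCtx(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	reg := NewLimiterRegistry()
	a := reg.For("github", 1, 1)
	b := reg.For("github", 5, 5)
	fmt.Println("same limiter:", a == b)

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // a cancelled context ends the jitter sleep immediately
	fmt.Println("sleep error:", sleepCtx(ctx, jitter()))
}
```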
salvacybersec
10af12d358
feat(09-01): add ReconSource interface and Config
...
- Define ReconSource interface: Name/RateLimit/Burst/RespectsRobots/Enabled/Sweep
- Alias recon.Finding = engine.Finding for shared storage path
- Config struct carries Stealth, RespectRobots, EnabledSources, Query
2026-04-06 00:40:46 +03:00
salvacybersec
2c554b9c9c
test(08-07): add dork count + uniqueness guardrail
...
- TestDorkCountGuardrail: enforces DORK-02 >=150 floor
- TestDorkCountPerSource: per-source minimums (github>=50, google>=30, shodan>=20, censys>=15, zoomeye/fofa/gitlab>=10, bing>=5)
- TestDorkCategoriesPresent: all 5 DORK-01 categories present
- TestDorkIDsUnique: no collisions across source files
2026-04-06 00:24:51 +03:00
salvacybersec
c504cbd5d3
feat(08-04): add 10 FOFA + 10 GitLab + 5 Bing dorks
...
- 10 FOFA queries using title=/body=/port=/cert= syntax (8 infrastructure
+ 2 frontier: Azure OpenAI cert, OpenAI proxy api_key leak)
- 10 GitLab code search dorks across frontier/specialized/infrastructure/
emerging categories (OpenAI, Anthropic, Google AI, Groq, Cohere, HF,
OpenRouter, Perplexity, DeepSeek, Pinecone)
- 5 Bing dorks using site:/filetype:/intitle:/inbody: operators
(3 frontier + 1 specialized + 1 infrastructure)
- Brings grand total across all 8 sources to 150 dorks, satisfying DORK-02
- Dual-located under pkg/dorks/definitions/ and dorks/
2026-04-06 00:21:41 +03:00
salvacybersec
1c86800c14
feat(08-04): add 15 Censys + 10 ZoomEye dorks
...
- 15 Censys Search 2.0 queries for Ollama, vLLM, LocalAI, Open WebUI,
LM Studio, Triton, TGI, LiteLLM, Portkey, LangServe, FastChat,
text-generation-webui, Azure OpenAI certs, Bedrock certs, and OpenAI
proxies (12 infrastructure + 3 frontier)
- 10 ZoomEye app/title/port/service queries covering the same LLM
infrastructure surface (9 infrastructure + 1 frontier)
- Dual-located under pkg/dorks/definitions/ (embedded) and dorks/ (repo root)
2026-04-06 00:21:34 +03:00
salvacybersec
56c11e39a0
feat(08-03): add 20 Shodan dorks for exposed LLM infrastructure
...
- frontier.yaml: 6 dorks (OpenAI/Anthropic proxies, Azure OpenAI certs, AWS Bedrock, LiteLLM)
- infrastructure.yaml: 14 dorks (Ollama, vLLM, LocalAI, LM Studio, text-generation-webui, Open WebUI, Triton, TGI, LangServe, FastChat, OpenRouter/Portkey/Helicone gateways)
- Real Shodan query syntax: http.title, http.html, ssl.cert.subject.cn, product, port, http.component
- Dual-located: pkg/dorks/definitions/shodan/ + dorks/shodan/
2026-04-06 00:21:03 +03:00
salvacybersec
348d1c057b
feat(08-03): add 30 Google dorks across 3 categories
...
- frontier.yaml: 12 dorks (OpenAI, Anthropic, Google AI, Groq, Cohere, Mistral, xAI, Replicate)
- specialized.yaml: 10 dorks (Perplexity, HF, ElevenLabs, Deepgram, AssemblyAI, Stability, Jina, Voyage)
- infrastructure.yaml: 8 dorks (OpenRouter, LiteLLM, Helicone, Portkey, Ollama, vLLM, LocalAI)
- Real site:/filetype:/intitle:/inurl: operators, no templating
- Dual-located: pkg/dorks/definitions/google/ (go:embed) + dorks/google/ (user-visible)
2026-04-06 00:20:56 +03:00
salvacybersec
9755b3756a
feat(08-02): add 25 GitHub dorks for infrastructure, emerging, enterprise categories
...
- infrastructure.yaml: 10 dorks covering Tier 5 gateways (OpenRouter,
LiteLLM, Portkey, Helicone, Cloudflare AI, Vercel AI) and Tier 8
self-hosted (Ollama, vLLM, LocalAI)
- emerging.yaml: 10 dorks covering Tier 4 Chinese providers (DeepSeek,
Moonshot, Qwen, Zhipu, MiniMax) and Tier 6 vector DBs (Pinecone,
Weaviate, Qdrant, Chroma) plus Writer.com
- enterprise.yaml: 5 dorks covering Tier 7 dev tools (Codeium, Tabnine)
and Tier 9 enterprise (Databricks, Snowflake Cortex, IBM watsonx)
- Registry now loads 50 total GitHub dorks across all 5 categories,
mirrored in both dorks/github/ and pkg/dorks/definitions/github/
2026-04-06 00:20:52 +03:00
salvacybersec
09722eaec4
feat(08-02): add 25 GitHub dorks for frontier and specialized categories
...
- frontier.yaml: 15 dorks covering Tier 1/2 providers (OpenAI, Anthropic,
Google AI, Azure OpenAI, AWS Bedrock, xAI, Cohere, Mistral, Groq,
Together, Replicate)
- specialized.yaml: 10 dorks covering Tier 3 providers (Perplexity,
Voyage, Jina, AssemblyAI, Deepgram, ElevenLabs, Stability, HuggingFace)
- Extend loader to accept YAML list format in addition to single-dork
mapping, enabling multi-dork files for Wave 2+ plans
- Mirror all YAMLs into dorks/github/ (user-visible) and
pkg/dorks/definitions/github/ (go:embed target)
2026-04-06 00:20:43 +03:00
salvacybersec
01062b88b1
feat(08-01): add custom_dorks table and CRUD for user-authored dorks
...
- schema.sql: CREATE TABLE IF NOT EXISTS custom_dorks with unique dork_id,
source/category indexes, and tags stored as JSON TEXT
- custom_dorks.go: Save/List/Get/GetByDorkID/Delete with JSON tag round-trip
- Tests: round-trip, newest-first ordering, not-found, unique constraint,
delete no-op, schema migration idempotency
2026-04-06 00:16:33 +03:00
salvacybersec
fd6efbb4c2
feat(08-01): add pkg/dorks foundation (schema, loader, registry, executor)
...
- Dork schema with Validate() mirroring provider YAML pattern
- go:embed loader tolerating empty definitions tree
- Registry with List/Get/Stats/ListBySource/ListByCategory
- Executor interface + Runner dispatch + ErrSourceNotImplemented
- Placeholder definitions/.gitkeep and repo-root dorks/.gitkeep
- Full unit test coverage for registry, validation, and runner dispatch
2026-04-06 00:15:32 +03:00
salvacybersec
9dbb0b87d4
feat(07-04): wire keyhunter import command with dedup and DB persist
...
- Replace import stub with cmd/import.go dispatching to pkg/importer
(trufflehog, gitleaks, gitleaks-csv) via --format flag
- Reuse openDBWithKey helper so encryption + path resolution match scan/keys
- engineToStorage converts engine.Finding -> storage.Finding (Source -> SourcePath)
- Add pkg/storage.FindingExistsByKey for idempotent cross-import dedup
keyed on (provider, masked key, source path, line number)
- cmd/import_test.go: selector table, field conversion, end-to-end trufflehog
import with re-run duplicate assertion, unknown-format + missing-file errors
- pkg/storage queries_test: FindingExistsByKey hit and four miss cases
Delivers IMP-01/02/03 end-to-end.
2026-04-05 23:59:39 +03:00
salvacybersec
bd8eb9b611
test(07-03): SARIF GitHub code scanning validation
...
- Minimal required-fields fixture for GitHub SARIF upload schema
- TestSARIFGitHubValidation: asserts $schema/version/runs, tool.driver.name,
per-result ruleId/level/message/locations, physicalLocation.region.startLine >= 1
- Covers startLine floor for LineNumber=0 inputs
- TestSARIFGitHubValidation_EmptyFindings: empty input still yields a valid
document with results: [] (not null)
2026-04-05 23:55:38 +03:00
salvacybersec
83640ac200
feat(07-02): add Gitleaks JSON + CSV importers
...
- GitleaksImporter parses native JSON array output to []engine.Finding
- GitleaksCSVImporter parses CSV with header-based column resolution
- normalizeGitleaksRuleID strips suffixes (-api-key, -access-token, ...)
- Shared buildGitleaksFinding helper keeps JSON/CSV paths in lockstep
- Test fixtures + 8 tests covering happy path, empty, invalid, symlink fallback
2026-04-05 23:55:36 +03:00
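Suffix stripping of the kind normalizeGitleaksRuleID performs could be sketched as follows. Only the two suffixes spelled out in the commit message are included; the rest of the list is elided there and is not guessed at here, so ruleSuffixes is an illustrative subset.

```go
package main

import (
	"fmt"
	"strings"
)

// ruleSuffixes lists only the suffixes named in the commit message; the
// real list is longer.
var ruleSuffixes = []string{"-api-key", "-access-token"}

// normalizeGitleaksRuleID strips the first matching suffix, if any.
func normalizeGitleaksRuleID(id string) string {
	for _, s := range ruleSuffixes {
		if strings.HasSuffix(id, s) {
			return strings.TrimSuffix(id, s)
		}
	}
	return id
}

func main() {
	for _, id := range []string{"openai-api-key", "github-access-token", "generic"} {
		fmt.Println(id, "->", normalizeGitleaksRuleID(id))
	}
}
```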
salvacybersec
46eec328d2
feat(07-01): Importer interface and TruffleHog v3 JSON adapter
...
- pkg/importer/importer.go: shared Importer interface (Name, Import)
- pkg/importer/trufflehog.go: TruffleHogImporter with v3 JSON decoding,
detector-name normalization (OpenAI/GithubV2/AWS -> canonical ids),
SourceMetadata path+line extraction for Git/Filesystem/Github
- pkg/importer/testdata/trufflehog-sample.json: 3-record fixture
- pkg/importer/trufflehog_test.go: Name, Import, NormalizeName, EmptyArray,
InvalidJSON tests -- all passing
2026-04-05 23:55:24 +03:00
salvacybersec
6a3d5b0cb7
feat(07-03): dedup helper for imported findings
...
- FindingKey: stable SHA-256 over provider+masked+source+line
- Dedup: preserves first-seen order, returns drop count
- 8 unit tests covering stability, field sensitivity, order preservation
2026-04-05 23:54:44 +03:00
salvacybersec
03249fb3d1
feat(06-02): implement CSVFormatter with Unmask support
...
- Fixed 9-column header: id,provider,confidence,key,source,line,detected_at,verified,verify_status
- Uses encoding/csv for automatic quoting of commas/quotes in source paths
- Honors Options.Unmask for key column
- Registers under "csv" in output registry
2026-04-05 23:32:07 +03:00
salvacybersec
b35881aaef
test(06-02): add failing tests for CSVFormatter
2026-04-05 23:31:44 +03:00
salvacybersec
2717aa3196
feat(06-03): implement SARIF 2.1.0 formatter with hand-rolled structs
...
- SARIFFormatter emits schema-valid SARIF 2.1.0 JSON for CI ingestion
- One rule per distinct provider, deduped in first-seen order
- Confidence mapped high/medium/low to error/warning/note
- startLine floored to 1 per SARIF spec requirement
- Registered under name 'sarif' via init()
2026-04-05 23:31:15 +03:00
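The confidence-to-level mapping and the startLine floor described here can be illustrated with two small helpers. The function names and the fallback level for unrecognised confidence values are assumptions of this sketch, not taken from the formatter itself.

```go
package main

import "fmt"

// sarifLevel maps a confidence label to a SARIF result level.
func sarifLevel(confidence string) string {
	switch confidence {
	case "high":
		return "error"
	case "medium":
		return "warning"
	case "low":
		return "note"
	default:
		return "warning" // assumption: fallback for unknown confidence values
	}
}

// sarifStartLine floors the line number to 1, since region.startLine
// must be at least 1.
func sarifStartLine(line int) int {
	if line < 1 {
		return 1
	}
	return line
}

func main() {
	fmt.Println(sarifLevel("high"), sarifLevel("medium"), sarifLevel("low"))
	fmt.Println(sarifStartLine(0), sarifStartLine(42))
}
```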
salvacybersec
b1e4dea51c
feat(06-04): implement findings query layer for keys command
...
- Filters struct: Provider, Verified (*bool), Limit, Offset
- ListFindingsFiltered: optional WHERE + ORDER BY created_at DESC, id DESC
- GetFinding: single-row lookup, propagates sql.ErrNoRows on miss
- DeleteFinding: returns RowsAffected so caller can distinguish hit/miss
- Shared scan/hydrate helpers decrypt key_value via existing Decrypt
2026-04-05 23:31:15 +03:00
salvacybersec
164477136c
feat(06-02): implement JSONFormatter with Unmask support
...
- Renders findings as 2-space indented JSON array
- Honors Options.Unmask for key field exposure
- Omits empty verify fields via json omitempty
- Registers under "json" in output registry
2026-04-05 23:31:12 +03:00
salvacybersec
2cb35d50ac
test(06-03): add failing tests for SARIF 2.1.0 formatter
2026-04-05 23:30:38 +03:00
salvacybersec
67763ec498
test(06-04): add failing tests for findings query layer
...
- Filters struct with provider, verified, limit/offset
- ListFindingsFiltered, GetFinding, DeleteFinding coverage
- Uses in-memory SQLite with seeded fixtures across 2 providers
2026-04-05 23:30:33 +03:00
salvacybersec
c933673ca9
test(06-02): add failing tests for JSONFormatter
2026-04-05 23:30:12 +03:00
salvacybersec
8e4db5db09
feat(06-01): refactor table output into TableFormatter
...
- TableFormatter implements Formatter interface, registered as "table"
- Writes to arbitrary io.Writer instead of hardcoded os.Stdout
- Strips ANSI colors when writer is not a TTY or NO_COLOR is set
- Uses bundled tableStyles so plain/colored paths share one renderer
- PrintFindings retained as backward-compat wrapper delegating to Format
2026-04-05 23:27:53 +03:00
salvacybersec
8c37252c1b
test(06-01): add failing tests for TableFormatter refactor
...
- Add TestTableFormatter_Empty, NoColorInBuffer, Unverified/VerifiedLayout
- Add TestTableFormatter_Masking, MetadataSorted, RegisteredUnderTable
- Keep legacy PrintFindings tests as backward-compat wrapper coverage
2026-04-05 23:27:03 +03:00
salvacybersec
291c97ed0b
feat(06-01): add Formatter interface, Registry, and TTY color detection
...
- pkg/output/formatter.go: Formatter interface, Options, Registry with
Register/Get/Names, ErrUnknownFormat sentinel
- pkg/output/colors.go: IsTTY + ColorsEnabled honoring NO_COLOR
- Promote github.com/mattn/go-isatty to direct dependency
- Unit tests cover registry round-trip, unknown lookup, sorted Names,
non-TTY buffer, NO_COLOR override
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 18:41:23 +03:00
salvacybersec
cc9dabe5f5
feat(05-05): render VERIFY column and metadata line in output table
...
- When any finding has Verified=true, append a VERIFY column with colored
glyphs: ✓ live / ✗ dead / ⚠ rate / ! err / ? unk
- Per-finding VerifyMetadata is rendered on an indented secondary line
with deterministic (sorted) key ordering
- Backward compatible: unverified scans produce identical output to
pre-Phase-5 runs
2026-04-05 15:54:51 +03:00
salvacybersec
edba8fb5d4
test(05-05): add failing tests for VERIFY column and metadata rendering
2026-04-05 15:54:13 +03:00
salvacybersec
35c7759f02
feat(05-03): add VerifyAll ants worker pool for parallel verification
...
- VerifyAll(ctx, findings, reg, workers) returns a result channel closed
after all findings are processed or ctx is cancelled.
- Default worker count of 10 when workers <= 0.
- Missing providers yield StatusUnknown with 'provider not found' error.
- Graceful context cancellation stops dispatch while still draining in-flight work.
2026-04-05 15:49:22 +03:00
salvacybersec
45ee2f8f53
test(05-03): add failing tests for VerifyAll worker pool
...
- TestVerifyAll_MultipleFindings: 5 findings via 3-worker pool
- TestVerifyAll_MissingProvider: unknown provider yields StatusUnknown
- TestVerifyAll_ContextCancellation: cancellation closes channel early
- Add providers.NewRegistryFromProviders test helper
2026-04-05 15:48:46 +03:00
salvacybersec
3dfe72779b
feat(05-03): implement HTTPVerifier single-key verification
...
- HTTPVerifier with TLS 1.2+ client and configurable per-call timeout
- {{KEY}} template substitution in URL, header values, and body
- Classification via EffectiveSuccessCodes/FailureCodes/RateLimitCodes
- Retry-After header captured on rate-limit responses
- gjson-based metadata extraction for JSON responses (1 MiB cap)
- HTTPS-only enforcement; missing URL yields StatusUnknown
- Consent stub added to unblock parallel Plan 05-02 worktree (Rule 3 deviation)
2026-04-05 15:47:49 +03:00
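The {{KEY}} substitution and the HTTPS-only check might be sketched like this. The helper names substituteKey and checkHTTPS, the error value, and the example URL and key are hypothetical; the sketch does not model the StatusUnknown result for a missing URL.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// keyPlaceholder is the double-brace template form named in the commit.
const keyPlaceholder = "{{KEY}}"

var errNotHTTPS = errors.New("verify URL must use https")

// substituteKey replaces every {{KEY}} occurrence in a URL, header value
// or body template with the candidate key.
func substituteKey(tmpl, key string) string {
	return strings.ReplaceAll(tmpl, keyPlaceholder, key)
}

// checkHTTPS rejects any URL whose scheme is not https.
func checkHTTPS(rawURL string) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return err
	}
	if u.Scheme != "https" {
		return errNotHTTPS
	}
	return nil
}

func main() {
	key := "sk-test" // hypothetical key
	header := substituteKey("Bearer {{KEY}}", key)
	target := substituteKey("https://api.example.com/v1/models?key={{KEY}}", key)
	fmt.Println(header)
	fmt.Println(target, checkHTTPS(target))
	fmt.Println(checkHTTPS("http://api.example.com/v1/models"))
}
```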
salvacybersec
d4c140371e
feat(05-02): implement EnsureConsent prompt gating --verify
...
- Add EnsureConsent(db, in, out) that returns (true, nil) immediately if
verify.consent==granted, otherwise prompts once, reads a line, persists
'granted' on 'yes' (case-insensitive), 'declined' otherwise.
- Declined is not sticky — next call re-prompts; only granted persists.
- Prompt references legal implications and directs users to 'keyhunter legal'.
2026-04-05 15:47:30 +03:00
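The consent flow described above can be sketched with an in-memory settings map standing in for the database. The settings type, the isAffirmative helper and the prompt wording are assumptions for illustration; only the verify.consent key and the granted/declined values come from the commit message.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// settings is a hypothetical stand-in for the persisted settings store.
type settings map[string]string

const consentKey = "verify.consent"

// isAffirmative accepts only "yes", ignoring case and surrounding space.
func isAffirmative(line string) bool {
	return strings.EqualFold(strings.TrimSpace(line), "yes")
}

// EnsureConsent returns true at once when consent is already granted.
// Otherwise it prompts, reads one line and persists the outcome. A stored
// "declined" does not short-circuit, so the next call prompts again.
func EnsureConsent(s settings, in io.Reader, out io.Writer) (bool, error) {
	if s[consentKey] == "granted" {
		return true, nil
	}
	fmt.Fprint(out, "Verification sends keys to provider APIs. See 'keyhunter legal'. Type 'yes' to continue: ")
	line, err := bufio.NewReader(in).ReadString('\n')
	if err != nil && err != io.EOF {
		return false, err
	}
	if isAffirmative(line) {
		s[consentKey] = "granted"
		return true, nil
	}
	s[consentKey] = "declined"
	return false, nil
}

func main() {
	s := settings{}
	ok, err := EnsureConsent(s, strings.NewReader("YES\n"), os.Stdout)
	fmt.Println()
	fmt.Println(ok, err, s[consentKey])
}
```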
salvacybersec
6a94ce5903
test(05-04): guardrail tests for Tier 1 verify spec completeness
...
- TestTier1VerifySpecs_Complete asserts 11 Tier 1 providers have HTTPS
verify URLs and non-empty effective success codes
- TestInflection_NoVerifyEndpoint documents the intentional empty URL
- Prevents future regressions when editing provider YAMLs
2026-04-05 15:46:57 +03:00
salvacybersec
e5f72149cf
test(05-02): add failing tests for EnsureConsent prompt logic
2026-04-05 15:46:41 +03:00
salvacybersec
f3ae8f0b09
feat(05-04): extend Tier 1 provider verify specs
...
- 12 Tier 1 providers now carry success_codes, failure_codes, rate_limit_codes
- {{KEY}} template in headers or URL (double-brace canonical form)
- metadata_paths added where provider APIs return useful metadata
- Anthropic switched to POST /v1/messages with minimal body
- Perplexity gains JSON body, content-type header
- Inflection verify URL left empty (no public endpoint)
- Dual-location sync preserved: providers/ mirrors pkg/providers/definitions/
2026-04-05 15:46:30 +03:00
salvacybersec
3ceccd98ad
test(05-03): add failing tests for HTTPVerifier single-key verification
...
- 10 test cases covering live/dead/rate-limited/unknown/error classification
- Key substitution in header/body/URL via {{KEY}} template
- JSON metadata extraction via gjson paths
- HTTPS-only enforcement and per-call timeout
2026-04-05 15:46:15 +03:00
salvacybersec
260e342f2f
feat(05-02): add LEGAL.md, embed it, and wire keyhunter legal command
...
- Add LEGAL.md at repo root (109 lines) covering CFAA, Computer Misuse Act,
EU Directive 2013/40/EU, responsible use, disclosure, and disclaimer.
- Mirror to pkg/legal/LEGAL.md for go:embed (Go cannot traverse parents).
- Add pkg/legal package exposing Text() for the embedded markdown.
- Add cmd/legal.go registering keyhunter legal subcommand to print it.
2026-04-05 15:46:11 +03:00