salvacybersec
fd6efbb4c2
feat(08-01): add pkg/dorks foundation (schema, loader, registry, executor)
...
- Dork schema with Validate() mirroring provider YAML pattern
- go:embed loader tolerating empty definitions tree
- Registry with List/Get/Stats/ListBySource/ListByCategory
- Executor interface + Runner dispatch + ErrSourceNotImplemented
- Placeholder definitions/.gitkeep and repo-root dorks/.gitkeep
- Full unit test coverage for registry, validation, and runner dispatch
2026-04-06 00:15:32 +03:00
salvacybersec
9dbb0b87d4
feat(07-04): wire keyhunter import command with dedup and DB persist
...
- Replace import stub with cmd/import.go dispatching to pkg/importer
(trufflehog, gitleaks, gitleaks-csv) via --format flag
- Reuse openDBWithKey helper so encryption + path resolution match scan/keys
- engineToStorage converts engine.Finding -> storage.Finding (Source -> SourcePath)
- Add pkg/storage.FindingExistsByKey for idempotent cross-import dedup
keyed on (provider, masked key, source path, line number)
- cmd/import_test.go: selector table, field conversion, end-to-end trufflehog
import with re-run duplicate assertion, unknown-format + missing-file errors
- pkg/storage queries_test: FindingExistsByKey hit and four miss cases
Delivers IMP-01/02/03 end-to-end.
2026-04-05 23:59:39 +03:00
salvacybersec
bd8eb9b611
test(07-03): SARIF GitHub code scanning validation
...
- Minimal required-fields fixture for GitHub SARIF upload schema
- TestSARIFGitHubValidation: asserts $schema/version/runs, tool.driver.name,
per-result ruleId/level/message/locations, physicalLocation.region.startLine >= 1
- Covers startLine floor for LineNumber=0 inputs
- TestSARIFGitHubValidation_EmptyFindings: empty input still yields a valid
document with results: [] (not null)
2026-04-05 23:55:38 +03:00
salvacybersec
83640ac200
feat(07-02): add Gitleaks JSON + CSV importers
...
- GitleaksImporter parses native JSON array output to []engine.Finding
- GitleaksCSVImporter parses CSV with header-based column resolution
- normalizeGitleaksRuleID strips suffixes (-api-key, -access-token, ...)
- Shared buildGitleaksFinding helper keeps JSON/CSV paths in lockstep
- Test fixtures + 8 tests covering happy path, empty, invalid, symlink fallback
2026-04-05 23:55:36 +03:00
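The rule-ID normalization mentioned in 83640ac200 could look roughly like the sketch below. It is an illustration, not the repository's code: `normalizeRuleID` and `ruleSuffixes` are hypothetical names, only the two suffixes spelled out in the commit message are listed (the original list is longer), and lowercasing the input is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// ruleSuffixes holds only the two suffixes named in the commit message.
// The real list is longer and is not reproduced here.
var ruleSuffixes = []string{"-api-key", "-access-token"}

// normalizeRuleID strips one known suffix from a Gitleaks rule ID so that
// "openai-api-key" maps to the provider-style id "openai".
// Lowercasing and trimming are assumptions of this sketch.
func normalizeRuleID(id string) string {
	id = strings.ToLower(strings.TrimSpace(id))
	for _, s := range ruleSuffixes {
		if strings.HasSuffix(id, s) {
			return strings.TrimSuffix(id, s)
		}
	}
	return id
}

func main() {
	fmt.Println(normalizeRuleID("openai-api-key"))
	fmt.Println(normalizeRuleID("github-access-token"))
	fmt.Println(normalizeRuleID("aws"))
}
```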
salvacybersec
46eec328d2
feat(07-01): Importer interface and TruffleHog v3 JSON adapter
...
- pkg/importer/importer.go: shared Importer interface (Name, Import)
- pkg/importer/trufflehog.go: TruffleHogImporter with v3 JSON decoding,
detector-name normalization (OpenAI/GithubV2/AWS -> canonical ids),
SourceMetadata path+line extraction for Git/Filesystem/Github
- pkg/importer/testdata/trufflehog-sample.json: 3-record fixture
- pkg/importer/trufflehog_test.go: Name, Import, NormalizeName, EmptyArray,
InvalidJSON tests -- all passing
2026-04-05 23:55:24 +03:00
salvacybersec
6a3d5b0cb7
feat(07-03): dedup helper for imported findings
...
- FindingKey: stable SHA-256 over provider+masked+source+line
- Dedup: preserves first-seen order, returns drop count
- 8 unit tests covering stability, field sensitivity, order preservation
2026-04-05 23:54:44 +03:00
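A minimal sketch of the dedup helper described in 6a3d5b0cb7, assuming a trimmed-down `Finding` type. The struct fields and the NUL separator between hashed fields are assumptions made for illustration; only the ideas (SHA-256 over provider, masked key, source, and line; first-seen order; drop count) come from the commit message.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Finding is a hypothetical stand-in for the project's finding type.
type Finding struct {
	Provider string
	Masked   string
	Source   string
	Line     int
}

// FindingKey returns a stable SHA-256 hex digest over the four identity fields.
// The NUL separator (an assumption) keeps field boundaries unambiguous.
func FindingKey(f Finding) string {
	h := sha256.New()
	fmt.Fprintf(h, "%s\x00%s\x00%s\x00%d", f.Provider, f.Masked, f.Source, f.Line)
	return hex.EncodeToString(h.Sum(nil))
}

// Dedup keeps the first occurrence of each key in input order and
// reports how many findings were dropped as duplicates.
func Dedup(in []Finding) ([]Finding, int) {
	seen := make(map[string]struct{}, len(in))
	out := make([]Finding, 0, len(in))
	dropped := 0
	for _, f := range in {
		k := FindingKey(f)
		if _, ok := seen[k]; ok {
			dropped++
			continue
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out, dropped
}

func main() {
	findings := []Finding{
		{Provider: "openai", Masked: "sk-a...wxyz", Source: "a.env", Line: 3},
		{Provider: "openai", Masked: "sk-a...wxyz", Source: "a.env", Line: 3},
		{Provider: "openai", Masked: "sk-a...wxyz", Source: "a.env", Line: 4},
	}
	out, dropped := Dedup(findings)
	fmt.Println(len(out), dropped) // prints "2 1"
}
```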
salvacybersec
03249fb3d1
feat(06-02): implement CSVFormatter with Unmask support
...
- Fixed 9-column header: id,provider,confidence,key,source,line,detected_at,verified,verify_status
- Uses encoding/csv for automatic quoting of commas/quotes in source paths
- Honors Options.Unmask for key column
- Registers under "csv" in output registry
2026-04-05 23:32:07 +03:00
salvacybersec
b35881aaef
test(06-02): add failing tests for CSVFormatter
2026-04-05 23:31:44 +03:00
salvacybersec
2717aa3196
feat(06-03): implement SARIF 2.1.0 formatter with hand-rolled structs
...
- SARIFFormatter emits schema-valid SARIF 2.1.0 JSON for CI ingestion
- One rule per distinct provider, deduped in first-seen order
- Confidence mapped high/medium/low to error/warning/note
- startLine floored to 1 per SARIF spec requirement
- Registered under name 'sarif' via init()
2026-04-05 23:31:15 +03:00
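The two mapping rules in 2717aa3196 (confidence to SARIF level, and the `startLine` floor) are small enough to sketch directly. `sarifLevel` and `sarifStartLine` are hypothetical names; falling back to `"warning"` for an unrecognized confidence is an assumption, chosen because `warning` is the SARIF default level.

```go
package main

import "fmt"

// sarifLevel maps a confidence label to a SARIF result level:
// high -> error, medium -> warning, low -> note.
// The default branch is an assumption of this sketch.
func sarifLevel(confidence string) string {
	switch confidence {
	case "high":
		return "error"
	case "medium":
		return "warning"
	case "low":
		return "note"
	default:
		return "warning"
	}
}

// sarifStartLine floors a line number to 1, since SARIF requires
// region.startLine to be at least 1.
func sarifStartLine(line int) int {
	if line < 1 {
		return 1
	}
	return line
}

func main() {
	fmt.Println(sarifLevel("high"), sarifStartLine(0))  // prints "error 1"
	fmt.Println(sarifLevel("low"), sarifStartLine(42)) // prints "note 42"
}
```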
salvacybersec
b1e4dea51c
feat(06-04): implement findings query layer for keys command
...
- Filters struct: Provider, Verified (*bool), Limit, Offset
- ListFindingsFiltered: optional WHERE + ORDER BY created_at DESC, id DESC
- GetFinding: single-row lookup, propagates sql.ErrNoRows on miss
- DeleteFinding: returns RowsAffected so caller can distinguish hit/miss
- Shared scan/hydrate helpers decrypt key_value via existing Decrypt
2026-04-05 23:31:15 +03:00
salvacybersec
164477136c
feat(06-02): implement JSONFormatter with Unmask support
...
- Renders findings as 2-space indented JSON array
- Honors Options.Unmask for key field exposure
- Omits empty verify fields via json omitempty
- Registers under "json" in output registry
2026-04-05 23:31:12 +03:00
salvacybersec
2cb35d50ac
test(06-03): add failing tests for SARIF 2.1.0 formatter
2026-04-05 23:30:38 +03:00
salvacybersec
67763ec498
test(06-04): add failing tests for findings query layer
...
- Filters struct with provider, verified, limit/offset
- ListFindingsFiltered, GetFinding, DeleteFinding coverage
- Uses in-memory SQLite with seeded fixtures across 2 providers
2026-04-05 23:30:33 +03:00
salvacybersec
c933673ca9
test(06-02): add failing tests for JSONFormatter
2026-04-05 23:30:12 +03:00
salvacybersec
8e4db5db09
feat(06-01): refactor table output into TableFormatter
...
- TableFormatter implements Formatter interface, registered as "table"
- Writes to arbitrary io.Writer instead of hardcoded os.Stdout
- Strips ANSI colors when writer is not a TTY or NO_COLOR is set
- Uses bundled tableStyles so plain/colored paths share one renderer
- PrintFindings retained as backward-compat wrapper delegating to Format
2026-04-05 23:27:53 +03:00
salvacybersec
8c37252c1b
test(06-01): add failing tests for TableFormatter refactor
...
- Add TestTableFormatter_Empty, NoColorInBuffer, Unverified/VerifiedLayout
- Add TestTableFormatter_Masking, MetadataSorted, RegisteredUnderTable
- Keep legacy PrintFindings tests as backward-compat wrapper coverage
2026-04-05 23:27:03 +03:00
salvacybersec
291c97ed0b
feat(06-01): add Formatter interface, Registry, and TTY color detection
...
- pkg/output/formatter.go: Formatter interface, Options, Registry with
Register/Get/Names, ErrUnknownFormat sentinel
- pkg/output/colors.go: IsTTY + ColorsEnabled honoring NO_COLOR
- Promote github.com/mattn/go-isatty to direct dependency
- Unit tests cover registry round-trip, unknown lookup, sorted Names,
non-TTY buffer, NO_COLOR override
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 18:41:23 +03:00
salvacybersec
cc9dabe5f5
feat(05-05): render VERIFY column and metadata line in output table
...
- When any finding has Verified=true, append a VERIFY column with colored
glyphs: ✓ live / ✗ dead / ⚠ rate / ! err / ? unk
- Per-finding VerifyMetadata is rendered on an indented secondary line
with deterministic (sorted) key ordering
- Backward compatible: unverified scans produce identical output to
pre-Phase-5 runs
2026-04-05 15:54:51 +03:00
salvacybersec
edba8fb5d4
test(05-05): add failing tests for VERIFY column and metadata rendering
2026-04-05 15:54:13 +03:00
salvacybersec
35c7759f02
feat(05-03): add VerifyAll ants worker pool for parallel verification
...
- VerifyAll(ctx, findings, reg, workers) returns a result channel closed
after all findings are processed or ctx is cancelled.
- Default worker count of 10 when workers <= 0.
- Missing providers yield StatusUnknown with 'provider not found' error.
- Graceful context cancellation stops dispatch while still draining inflight.
2026-04-05 15:49:22 +03:00
salvacybersec
45ee2f8f53
test(05-03): add failing tests for VerifyAll worker pool
...
- TestVerifyAll_MultipleFindings: 5 findings via 3-worker pool
- TestVerifyAll_MissingProvider: unknown provider yields StatusUnknown
- TestVerifyAll_ContextCancellation: cancellation closes channel early
- Add providers.NewRegistryFromProviders test helper
2026-04-05 15:48:46 +03:00
salvacybersec
3dfe72779b
feat(05-03): implement HTTPVerifier single-key verification
...
- HTTPVerifier with TLS 1.2+ client and configurable per-call timeout
- {{KEY}} template substitution in URL, header values, and body
- Classification via EffectiveSuccessCodes/FailureCodes/RateLimitCodes
- Retry-After header captured on rate-limit responses
- gjson-based metadata extraction for JSON responses (1 MiB cap)
- HTTPS-only enforcement; missing URL yields StatusUnknown
- Consent stub added to unblock parallel Plan 05-02 worktree (Rule 3 deviation)
2026-04-05 15:47:49 +03:00
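Two pieces of the verifier in 3dfe72779b lend themselves to a network-free sketch: `{{KEY}}` substitution and status-code classification. Everything here is illustrative. `substituteKey`, `classify`, and the status constants other than `StatusUnknown` are hypothetical names, and checking success codes before failure and rate-limit codes is an assumption about precedence.

```go
package main

import (
	"fmt"
	"strings"
)

// Status is a hypothetical classification of a verification response.
type Status string

const (
	StatusLive        Status = "live"
	StatusDead        Status = "dead"
	StatusRateLimited Status = "rate_limited"
	StatusUnknown     Status = "unknown"
)

// substituteKey replaces every {{KEY}} placeholder in a URL, header value,
// or body template with the candidate key.
func substituteKey(tmpl, key string) string {
	return strings.ReplaceAll(tmpl, "{{KEY}}", key)
}

func containsCode(codes []int, code int) bool {
	for _, c := range codes {
		if c == code {
			return true
		}
	}
	return false
}

// classify maps an HTTP status code onto a Status using the three code lists.
// Codes that appear in none of the lists yield StatusUnknown.
func classify(code int, success, failure, rateLimit []int) Status {
	switch {
	case containsCode(success, code):
		return StatusLive
	case containsCode(failure, code):
		return StatusDead
	case containsCode(rateLimit, code):
		return StatusRateLimited
	default:
		return StatusUnknown
	}
}

func main() {
	fmt.Println(substituteKey("Bearer {{KEY}}", "sk-test"))              // prints "Bearer sk-test"
	fmt.Println(classify(429, []int{200}, []int{401, 403}, []int{429})) // prints "rate_limited"
}
```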
salvacybersec
d4c140371e
feat(05-02): implement EnsureConsent prompt gating --verify
...
- Add EnsureConsent(db, in, out) that returns (true, nil) immediately if
verify.consent==granted, otherwise prompts once, reads a line, persists
'granted' on 'yes' (case-insensitive), 'declined' otherwise.
- Declined is not sticky — next call re-prompts; only granted persists.
- Prompt references legal implications and directs users to 'keyhunter legal'.
2026-04-05 15:47:30 +03:00
salvacybersec
6a94ce5903
test(05-04): guardrail tests for Tier 1 verify spec completeness
...
- TestTier1VerifySpecs_Complete asserts 11 Tier 1 providers have HTTPS
verify URLs and non-empty effective success codes
- TestInflection_NoVerifyEndpoint documents the intentional empty URL
- Prevents future regressions when editing provider YAMLs
2026-04-05 15:46:57 +03:00
salvacybersec
e5f72149cf
test(05-02): add failing tests for EnsureConsent prompt logic
2026-04-05 15:46:41 +03:00
salvacybersec
f3ae8f0b09
feat(05-04): extend Tier 1 provider verify specs
...
- 12 Tier 1 providers now carry success_codes, failure_codes, rate_limit_codes
- {{KEY}} template in headers or URL (double-brace canonical form)
- metadata_paths added where provider APIs return useful metadata
- Anthropic switched to POST /v1/messages with minimal body
- Perplexity gains JSON body, content-type header
- Inflection verify URL left empty (no public endpoint)
- Dual-location sync preserved: providers/ mirrors pkg/providers/definitions/
2026-04-05 15:46:30 +03:00
salvacybersec
3ceccd98ad
test(05-03): add failing tests for HTTPVerifier single-key verification
...
- 10 test cases covering live/dead/rate-limited/unknown/error classification
- Key substitution in header/body/URL via {{KEY}} template
- JSON metadata extraction via gjson paths
- HTTPS-only enforcement and per-call timeout
2026-04-05 15:46:15 +03:00
salvacybersec
260e342f2f
feat(05-02): add LEGAL.md, embed it, and wire keyhunter legal command
...
- Add LEGAL.md at repo root (109 lines) covering CFAA, Computer Misuse Act,
EU Directive 2013/40/EU, responsible use, disclosure, and disclaimer.
- Mirror to pkg/legal/LEGAL.md for go:embed (Go cannot traverse parents).
- Add pkg/legal package exposing Text() for the embedded markdown.
- Add cmd/legal.go registering keyhunter legal subcommand to print it.
2026-04-05 15:46:11 +03:00
salvacybersec
aec559d2aa
feat(05-01): migrate findings schema with verify_* columns
...
- schema.sql: new findings columns verified, verify_status, verify_http_code, verify_metadata_json
- db.go: migrateFindingsVerifyColumns runs on Open() for legacy DBs using PRAGMA table_info + ALTER TABLE
- findings.go: Finding struct gains Verified/VerifyStatus/VerifyHTTPCode/VerifyMetadata
- SaveFinding serializes verify metadata as JSON (NULL when nil)
- ListFindings round-trips all verify fields
2026-04-05 15:42:53 +03:00
salvacybersec
26544872f7
test(05-01): add failing tests for findings verify columns
...
- Round-trip verify fields (Verified, VerifyStatus, VerifyHTTPCode, VerifyMetadata)
- Empty verify fields persist as defaults
- Legacy DB schema migrates verify columns idempotently
2026-04-05 15:41:49 +03:00
salvacybersec
30c0e9871b
feat(05-01): extend VerifySpec and Finding, add gjson dep
...
- VerifySpec: add SuccessCodes, FailureCodes, RateLimitCodes, MetadataPaths, Body
- Preserve legacy ValidStatus/InvalidStatus for backward compat
- Add EffectiveSuccessCodes/FailureCodes/RateLimitCodes fallback helpers
- Add ExtractMetadata helper using gjson (skeleton for Plan 05-03)
- Finding: add Verified, VerifyStatus, VerifyHTTPCode, VerifyMetadata, VerifyError
- Add github.com/tidwall/gjson v1.18.0 as direct dependency
2026-04-05 15:41:13 +03:00
salvacybersec
499f5d5025
test(05-01): add failing tests for extended VerifySpec
...
- New canonical fields: SuccessCodes, FailureCodes, RateLimitCodes, MetadataPaths, Body
- EffectiveSuccessCodes/FailureCodes/RateLimitCodes fallback logic
- Legacy ValidStatus/InvalidStatus still parse
2026-04-05 15:39:35 +03:00
salvacybersec
850c3ff8e9
feat(04-04): add StdinSource, URLSource, and ClipboardSource
...
- StdinSource reads from an injectable io.Reader (INPUT-03)
- URLSource fetches http/https with 30s timeout, 50MB cap, scheme whitelist, and Content-Type filter (INPUT-04)
- ClipboardSource wraps atotto/clipboard with graceful fallback for missing tooling (INPUT-05)
- emitByteChunks local helper mirrors file.go windowing to stay independent of sibling wave-1 plans
- Tests cover happy path, cancellation, redirects, oversize bodies, binary content types, scheme rejection, and clipboard error paths
2026-04-05 15:18:23 +03:00
salvacybersec
6f834c9c06
feat(04-02): implement DirSource with recursive walk, glob exclusion, and mmap
...
- Add DirSource with filepath.WalkDir recursive traversal
- Default exclusions for .git, node_modules, vendor, *.min.js, *.map
- Binary file detection via NUL byte sniff (first 512 bytes)
- mmap reads for files >= 10MB via golang.org/x/exp/mmap
- Deterministic sorted emission order for reproducible tests
- Refactor FileSource to share emitChunks/isBinary helpers and mmap large files
2026-04-05 15:18:10 +03:00
salvacybersec
e48a7a489e
feat(04-03): implement GitSource with full-history traversal
...
- Walks every commit across branches, tags, remote-tracking refs, and stash
- Deduplicates blob scans by OID (seenBlobs map) so identical content
across commits/files is scanned exactly once
- Emits chunks with source format git:<short-sha>:<path>
- Honors --since filter via GitSource.Since (commit author date)
- Resolves annotated tag objects down to their commit hash
- Skips binary blobs via go-git IsBinary plus null-byte sniff
- 8 subtests cover history walk, dedup, modified-file, multi-branch,
tag reachability, since filter, source format, missing repo
2026-04-05 15:18:05 +03:00
salvacybersec
ce6298f304
test(04-02): add failing tests for DirSource recursive walk and mmap
2026-04-05 15:16:48 +03:00
salvacybersec
1aea496a17
test(03-08): add Tier 3-9 guardrail tests locking 108 total providers
...
- Add tier39_test.go with per-tier count assertions (T3=12, T4=16, T5=11, T6=15, T7=10, T8=10, T9=8)
- Lock all 82 Tier 3-9 provider names against drift via expectedTier3..expectedTier9 slices
- Assert total registry provider count == 108
- Existing TestAllPatternsCompile and TestAllProvidersHaveKeywords transitively cover Tier 3-9 regex compilation and keyword presence
- Satisfies PROV-03..PROV-09
2026-04-05 14:45:41 +03:00
salvacybersec
bad80b0d8a
Merge branch 'worktree-agent-a090b6ec'
2026-04-05 14:44:26 +03:00
salvacybersec
a019ba9a3d
feat(03-01): add 8 Tier 4 providers (Baichuan, StepFun, SenseTime, iFlytek, Tencent, SiliconFlow, 360AI, Kuaishou)
...
- SiliconFlow uses documented sk- prefix
- Other 7 keyword-only (no documented key format, avoids false positives)
- Completes PROV-04: 16 Tier 4 Chinese/regional providers
2026-04-05 14:42:46 +03:00
salvacybersec
a73cea361b
feat(03-07): add LangSmith and 6 vector DB providers
...
- LangSmith with lsv2_(pt|sk) high-confidence regex
- Pinecone with pcsk_ high-confidence regex
- Weaviate, Qdrant, Chroma, Milvus/Zilliz, Neon (keyword-only)
- Completes 15 Tier 6 emerging/niche providers (PROV-06)
2026-04-05 14:42:36 +03:00
salvacybersec
440daab2a2
feat(03-06): add Databricks, Snowflake, Oracle GenAI, HPE GreenLake Tier 9 providers
...
- Databricks dapi-prefixed high-confidence regex pattern
- Snowflake/Oracle/HPE keyword-only detection
- Completes PROV-09 (8 Tier 9 enterprise providers)
2026-04-05 14:42:19 +03:00
salvacybersec
367cfedb6f
feat(03-05): add GPT4All, text-gen-webui, TensorRT-LLM, Triton, Jan AI provider YAMLs
...
- 5 more Tier 8 self-hosted runtime definitions (keyword-only)
- Completes 10 Tier 8 providers, satisfying PROV-08
- Dual-located in providers/ and pkg/providers/definitions/
2026-04-05 14:42:04 +03:00
salvacybersec
0ac12e52de
feat(03-02): add voice and image/video Tier 3 providers
...
- Deepgram (hex40, low confidence)
- ElevenLabs (hex32, XI_API_KEY header)
- Stability AI (sk- prefix, medium confidence)
- Runway (keyword-only)
- Midjourney (keyword-only, no official API)
Completes PROV-03: 12 Tier 3 Specialized providers (with pre-existing huggingface).
2026-04-05 14:42:02 +03:00
salvacybersec
fbbb54b7a6
feat(03-04): add CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI providers
...
- Codestral with low-confidence 32-char generic pattern + high entropy
- watsonx with IBM IAM token endpoint for verification
- CodeWhisperer, Replit AI, Oracle AI as keyword-only
- Completes PROV-07 (10 Tier 7 code/dev tools providers)
2026-04-05 14:41:56 +03:00
salvacybersec
fbe9e8b0dc
feat(03-07): add 8 emerging labs, writing tools, observability providers
...
- Reka, Aleph Alpha, Lamini (emerging LLM labs)
- Writer, Jasper, Typeface (writing tools)
- Comet ML/Opik, Weights & Biases (observability)
- Dual-located in providers/ and pkg/providers/definitions/
2026-04-05 14:41:56 +03:00
salvacybersec
c8d326c34d
feat(03-03): add Martian, Kong, BricksAI, Aether, Not Diamond gateways
...
- Keyword-only detection (no documented public key formats)
- Completes 11 Tier 5 infrastructure/gateway providers for PROV-05
2026-04-05 14:41:55 +03:00
salvacybersec
35dbbc71f1
feat(03-01): add 8 Tier 4 Chinese providers (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, 01.AI, MiniMax)
...
- DeepSeek, Moonshot, Qwen use documented sk- prefix patterns
- Zhipu, Baidu, ByteDance use keyword-only detection (no documented key format)
- All dual-located in providers/ and pkg/providers/definitions/
2026-04-05 14:41:50 +03:00
salvacybersec
469ed0c0dd
feat(03-06): add Salesforce, ServiceNow, SAP, Palantir Tier 9 providers
...
- Keyword-only detection; strong env var anchors
- Dual-located in providers/ and pkg/providers/definitions/
2026-04-05 14:41:42 +03:00
salvacybersec
370dca0cbb
feat(03-05): add Ollama, vLLM, LocalAI, LM Studio, llama.cpp provider YAMLs
...
- 5 Tier 8 self-hosted runtime provider definitions (keyword-only)
- Localhost endpoints and env var anchors for OSINT correlation
- Dual-located in providers/ and pkg/providers/definitions/
2026-04-05 14:41:35 +03:00
salvacybersec
7ad9588212
feat(03-02): add search and embeddings Tier 3 providers
...
- Perplexity (pplx- prefix, high confidence)
- You.com (keyword-only)
- Voyage AI (pa- prefix, medium confidence)
- Jina AI (jina_ prefix, high confidence)
- Unstructured.io (keyword-only)
- AssemblyAI (hex32, low confidence)
2026-04-05 14:41:33 +03:00