Commit Graph

206 Commits

Author SHA1 Message Date
salvacybersec
6a94ce5903 test(05-04): guardrail tests for Tier 1 verify spec completeness
- TestTier1VerifySpecs_Complete asserts 11 Tier 1 providers have HTTPS
  verify URLs and non-empty effective success codes
- TestInflection_NoVerifyEndpoint documents the intentional empty URL
- Prevents future regressions when editing provider YAMLs
2026-04-05 15:46:57 +03:00
salvacybersec
e5f72149cf test(05-02): add failing tests for EnsureConsent prompt logic 2026-04-05 15:46:41 +03:00
salvacybersec
f3ae8f0b09 feat(05-04): extend Tier 1 provider verify specs
- 12 Tier 1 providers now carry success_codes, failure_codes, rate_limit_codes
- {{KEY}} template in headers or URL (double-brace canonical form)
- metadata_paths added where provider APIs return useful metadata
- Anthropic switched to POST /v1/messages with minimal body
- Perplexity gains JSON body, content-type header
- Inflection verify URL left empty (no public endpoint)
- Dual-location sync preserved: providers/ mirrors pkg/providers/definitions/
2026-04-05 15:46:30 +03:00
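The `{{KEY}}` substitution mentioned in this commit can be sketched as plain string replacement. This is a minimal illustration only: the helper names `substituteKey` and `renderHeaders` and the example header are assumptions, not code from the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// keyPlaceholder is the double-brace form named in the commit message.
const keyPlaceholder = "{{KEY}}"

// substituteKey (hypothetical helper) replaces every {{KEY}} occurrence
// in s with the candidate key.
func substituteKey(s, key string) string {
	return strings.ReplaceAll(s, keyPlaceholder, key)
}

// renderHeaders (hypothetical helper) returns a copy of headers with the
// key substituted into each value; the input map is left untouched.
func renderHeaders(headers map[string]string, key string) map[string]string {
	out := make(map[string]string, len(headers))
	for name, value := range headers {
		out[name] = substituteKey(value, key)
	}
	return out
}

func main() {
	headers := map[string]string{"Authorization": "Bearer {{KEY}}"}
	fmt.Println(renderHeaders(headers, "sk-example")["Authorization"])
}
```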
salvacybersec
3ceccd98ad test(05-03): add failing tests for HTTPVerifier single-key verification
- 10 test cases covering live/dead/rate-limited/unknown/error classification
- Key substitution in header/body/URL via {{KEY}} template
- JSON metadata extraction via gjson paths
- HTTPS-only enforcement and per-call timeout
2026-04-05 15:46:15 +03:00
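The live/dead/rate-limited/unknown classification and the HTTPS-only rule covered by these tests might look roughly like the sketch below. The type `verifySpec`, the status strings, and both function names are illustrative assumptions; the real HTTPVerifier may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// verifySpec is a hypothetical stand-in holding only the code lists.
type verifySpec struct {
	SuccessCodes   []int
	FailureCodes   []int
	RateLimitCodes []int
}

func contains(codes []int, code int) bool {
	for _, c := range codes {
		if c == code {
			return true
		}
	}
	return false
}

// classify maps an HTTP status code to a verification status.
// The status strings are assumptions made for this sketch.
func classify(spec verifySpec, code int) string {
	switch {
	case contains(spec.SuccessCodes, code):
		return "live"
	case contains(spec.FailureCodes, code):
		return "dead"
	case contains(spec.RateLimitCodes, code):
		return "rate_limited"
	default:
		return "unknown"
	}
}

// requireHTTPS rejects any verify URL whose scheme is not https.
func requireHTTPS(rawURL string) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return err
	}
	if u.Scheme != "https" {
		return errors.New("verify URL must use https")
	}
	return nil
}

func main() {
	spec := verifySpec{
		SuccessCodes:   []int{200},
		FailureCodes:   []int{401, 403},
		RateLimitCodes: []int{429},
	}
	for _, code := range []int{200, 401, 429, 500} {
		fmt.Println(code, classify(spec, code))
	}
	fmt.Println(requireHTTPS("http://api.example.com/v1/models"))
}
```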
salvacybersec
260e342f2f feat(05-02): add LEGAL.md, embed it, and wire keyhunter legal command
- Add LEGAL.md at repo root (109 lines) covering CFAA, Computer Misuse Act,
  EU Directive 2013/40/EU, responsible use, disclosure, and disclaimer.
- Mirror to pkg/legal/LEGAL.md for go:embed (embed patterns cannot reference parent directories).
- Add pkg/legal package exposing Text() for the embedded markdown.
- Add cmd/legal.go registering keyhunter legal subcommand to print it.
2026-04-05 15:46:11 +03:00
salvacybersec
177888bfa8 docs(05-01): complete verification foundation plan
Wave 0 contracts for the verification engine are in place:
- VerifySpec extended with SuccessCodes/FailureCodes/RateLimitCodes/MetadataPaths/Body
- Finding extended with Verified/VerifyStatus/VerifyHTTPCode/VerifyMetadata/VerifyError
- findings table schema migrated with verify_* columns (fresh + legacy DBs)
- gjson dep wired as direct require
- VRFY-02, VRFY-03 marked complete
2026-04-05 15:44:20 +03:00
salvacybersec
aec559d2aa feat(05-01): migrate findings schema with verify_* columns
- schema.sql: new findings columns verified, verify_status, verify_http_code, verify_metadata_json
- db.go: migrateFindingsVerifyColumns runs on Open() for legacy DBs using PRAGMA table_info + ALTER TABLE
- findings.go: Finding struct gains Verified/VerifyStatus/VerifyHTTPCode/VerifyMetadata
- SaveFinding serializes verify metadata as JSON (NULL when nil)
- ListFindings round-trips all verify fields
2026-04-05 15:42:53 +03:00
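The idempotent legacy-DB migration described here can be illustrated without a database driver: given the column names that `PRAGMA table_info(findings)` would report, emit one `ALTER TABLE` per missing column. The column types and defaults below are assumptions, as is the function name `migrationStatements`; only the four column names come from the commit message.

```go
package main

import "fmt"

// column pairs a verify_* column name with a hypothetical type/default clause.
type column struct {
	Name string
	DDL  string
}

// verifyColumns lists the columns named in the commit; the DDL is assumed.
var verifyColumns = []column{
	{Name: "verified", DDL: "INTEGER NOT NULL DEFAULT 0"},
	{Name: "verify_status", DDL: "TEXT NOT NULL DEFAULT ''"},
	{Name: "verify_http_code", DDL: "INTEGER NOT NULL DEFAULT 0"},
	{Name: "verify_metadata_json", DDL: "TEXT"},
}

// migrationStatements returns the ALTER TABLE statements needed to bring a
// findings table up to date. existing holds the column names already present
// (in a real migration these would be read via PRAGMA table_info).
func migrationStatements(existing map[string]bool) []string {
	var stmts []string
	for _, c := range verifyColumns {
		if existing[c.Name] {
			continue // already migrated, so a second run is a no-op
		}
		stmts = append(stmts, fmt.Sprintf("ALTER TABLE findings ADD COLUMN %s %s", c.Name, c.DDL))
	}
	return stmts
}

func main() {
	// Hypothetical legacy schema with no verify_* columns.
	legacy := map[string]bool{"id": true, "provider": true}
	for _, stmt := range migrationStatements(legacy) {
		fmt.Println(stmt)
	}
}
```

Because the statements are derived from the difference between the desired and existing columns, running the migration twice produces no statements the second time.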
salvacybersec
26544872f7 test(05-01): add failing tests for findings verify columns
- Round-trip verify fields (Verified, VerifyStatus, VerifyHTTPCode, VerifyMetadata)
- Empty verify fields persist as defaults
- Legacy DB schema migrates verify columns idempotently
2026-04-05 15:41:49 +03:00
salvacybersec
30c0e9871b feat(05-01): extend VerifySpec and Finding, add gjson dep
- VerifySpec: add SuccessCodes, FailureCodes, RateLimitCodes, MetadataPaths, Body
- Preserve legacy ValidStatus/InvalidStatus for backward compat
- Add EffectiveSuccessCodes/FailureCodes/RateLimitCodes fallback helpers
- Add ExtractMetadata helper using gjson (skeleton for Plan 05-03)
- Finding: add Verified, VerifyStatus, VerifyHTTPCode, VerifyMetadata, VerifyError
- Add github.com/tidwall/gjson v1.18.0 as direct dependency
2026-04-05 15:41:13 +03:00
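One plausible shape for the fallback helpers is shown below for the success case only. The field types and the final default of 200 are assumptions made for illustration; the commit states only that the canonical fields take precedence over the legacy ones.

```go
package main

import "fmt"

// VerifySpec here is a trimmed, hypothetical version with assumed field types.
type VerifySpec struct {
	SuccessCodes []int // canonical field
	ValidStatus  []int // legacy field kept for backward compatibility
}

// EffectiveSuccessCodes prefers the canonical field, falls back to the
// legacy field, and finally to an assumed default of 200.
func (v VerifySpec) EffectiveSuccessCodes() []int {
	if len(v.SuccessCodes) > 0 {
		return v.SuccessCodes
	}
	if len(v.ValidStatus) > 0 {
		return v.ValidStatus
	}
	return []int{200} // assumption: the real default may differ
}

func main() {
	legacyOnly := VerifySpec{ValidStatus: []int{204}}
	fmt.Println(legacyOnly.EffectiveSuccessCodes())
}
```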
salvacybersec
499f5d5025 test(05-01): add failing tests for extended VerifySpec
- New canonical fields: SuccessCodes, FailureCodes, RateLimitCodes, MetadataPaths, Body
- EffectiveSuccessCodes/FailureCodes/RateLimitCodes fallback logic
- Legacy ValidStatus/InvalidStatus still parse
2026-04-05 15:39:35 +03:00
salvacybersec
0b667566c4 docs(05): create phase 5 verification engine plans 2026-04-05 15:38:23 +03:00
salvacybersec
e65b9c981b docs(05): verification engine context 2026-04-05 15:30:11 +03:00
salvacybersec
b079ed202c docs(phase-04): complete phase execution 2026-04-05 15:29:09 +03:00
salvacybersec
c40e9f087f docs(04-05): complete scan command source wiring plan 2026-04-05 15:24:46 +03:00
salvacybersec
b151e88a29 feat(04-05): wire all Phase 4 sources through scan command
- Add --git, --url, --clipboard, --since, --max-file-size, --insecure flags
- Introduce selectSource dispatcher with sourceFlags struct
- Dispatch to Dir/File/Git/Stdin/URL/Clipboard sources based on args+flags
- Reject mutually exclusive source selectors with clear error
- Forward --exclude patterns into DirSource
- Args changed to MaximumNArgs(1) to allow --url/--clipboard without positional
2026-04-05 15:23:12 +03:00
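A dispatcher that rejects mutually exclusive selectors could be sketched as follows. Everything beyond the flag names is an assumption: the struct fields, the returned kind strings, the use of "-" for stdin, and the error wording are illustrative, and the real selectSource returns source values rather than strings.

```go
package main

import (
	"errors"
	"fmt"
)

// sourceFlags is a hypothetical subset of the scan command's flags.
type sourceFlags struct {
	Git       bool
	URL       string
	Clipboard bool
}

// selectSourceKind decides which source to use from the positional args and
// flags, returning an error when selectors conflict.
func selectSourceKind(args []string, f sourceFlags) (string, error) {
	hasPath := len(args) == 1
	selectors := 0
	if hasPath {
		selectors++
	}
	if f.URL != "" {
		selectors++
	}
	if f.Clipboard {
		selectors++
	}
	if selectors > 1 {
		return "", errors.New("positional path, --url, and --clipboard are mutually exclusive")
	}
	if f.Git && !hasPath {
		return "", errors.New("--git requires a repository path")
	}
	switch {
	case f.URL != "":
		return "url", nil
	case f.Clipboard:
		return "clipboard", nil
	case hasPath && f.Git:
		return "git", nil
	case hasPath && args[0] == "-": // assumption: "-" selects stdin
		return "stdin", nil
	case hasPath:
		return "path", nil // a real dispatcher would stat to pick Dir vs File
	}
	return "", errors.New("no source specified")
}

func main() {
	kind, err := selectSourceKind([]string{"./repo"}, sourceFlags{Git: true})
	fmt.Println(kind, err)
}
```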
salvacybersec
9105ca11f5 test(04-05): add failing tests for selectSource dispatcher 2026-04-05 15:21:37 +03:00
salvacybersec
66fc597399 docs(04-04): complete stdin/url/clipboard sources plan 2026-04-05 15:20:10 +03:00
salvacybersec
563d6e5ce2 docs(04-02): complete DirSource plan 2026-04-05 15:19:50 +03:00
salvacybersec
e8ba651a51 docs(04-03): complete GitSource plan 2026-04-05 15:19:15 +03:00
salvacybersec
850c3ff8e9 feat(04-04): add StdinSource, URLSource, and ClipboardSource
- StdinSource reads from an injectable io.Reader (INPUT-03)
- URLSource fetches http/https with 30s timeout, 50MB cap, scheme whitelist, and Content-Type filter (INPUT-04)
- ClipboardSource wraps atotto/clipboard with graceful fallback for missing tooling (INPUT-05)
- emitByteChunks local helper mirrors file.go windowing to stay independent of sibling wave-1 plans
- Tests cover happy path, cancellation, redirects, oversize bodies, binary content types, scheme rejection, and clipboard error paths
2026-04-05 15:18:23 +03:00
salvacybersec
6f834c9c06 feat(04-02): implement DirSource with recursive walk, glob exclusion, and mmap
- Add DirSource with filepath.WalkDir recursive traversal
- Default exclusions for .git, node_modules, vendor, *.min.js, *.map
- Binary file detection via NUL byte sniff (first 512 bytes)
- mmap reads for files >= 10MB via golang.org/x/exp/mmap
- Deterministic sorted emission order for reproducible tests
- Refactor FileSource to share emitChunks/isBinary helpers and mmap large files
2026-04-05 15:18:10 +03:00
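The NUL-byte sniff over the first 512 bytes can be sketched in a few lines. The function below is an illustration of the heuristic as described in the commit, not the repository's isBinary helper; its signature is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffLen is the number of leading bytes inspected, per the commit message.
const sniffLen = 512

// isBinary reports whether a NUL byte appears within the first sniffLen
// bytes of data. Bytes past that window are not examined.
func isBinary(data []byte) bool {
	n := len(data)
	if n > sniffLen {
		n = sniffLen
	}
	return bytes.IndexByte(data[:n], 0) >= 0
}

func main() {
	fmt.Println(isBinary([]byte("API_KEY=example\n")))
	fmt.Println(isBinary([]byte{0x7f, 'E', 'L', 'F', 0x00}))
}
```

Limiting the check to a fixed prefix keeps detection cheap for large files, at the cost of misclassifying files whose first NUL byte appears later.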
salvacybersec
e48a7a489e feat(04-03): implement GitSource with full-history traversal
- Walks every commit across branches, tags, remote-tracking refs, and stash
- Deduplicates blob scans by OID (seenBlobs map) so identical content
  across commits/files is scanned exactly once
- Emits chunks with source format git:<short-sha>:<path>
- Honors --since filter via GitSource.Since (commit author date)
- Resolves annotated tag objects down to their commit hash
- Skips binary blobs via go-git IsBinary plus null-byte sniff
- 8 subtests cover history walk, dedup, modified-file, multi-branch,
  tag reachability, since filter, source format, missing repo
2026-04-05 15:18:05 +03:00
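The OID-based deduplication and the `git:<short-sha>:<path>` source format can be illustrated with plain data structures. The `blobRef` type, the function names, and the 7-character short SHA are assumptions for this sketch; the real GitSource works on go-git objects.

```go
package main

import "fmt"

// blobRef is a hypothetical record of one blob seen at one commit and path.
type blobRef struct {
	OID       string
	CommitSHA string
	Path      string
}

// sourceLabel formats the chunk source string. The short SHA length of 7 is
// an assumption; the commit message does not specify it.
func sourceLabel(commitSHA, path string) string {
	short := commitSHA
	if len(short) > 7 {
		short = short[:7]
	}
	return "git:" + short + ":" + path
}

// uniqueBlobs keeps the first occurrence of each blob OID so identical
// content reachable from several commits or paths is scanned only once.
func uniqueBlobs(refs []blobRef) []blobRef {
	seenBlobs := make(map[string]struct{})
	var out []blobRef
	for _, r := range refs {
		if _, ok := seenBlobs[r.OID]; ok {
			continue
		}
		seenBlobs[r.OID] = struct{}{}
		out = append(out, r)
	}
	return out
}

func main() {
	refs := []blobRef{
		{OID: "aaa111", CommitSHA: "0123456789abcdef", Path: "config.env"},
		{OID: "aaa111", CommitSHA: "fedcba9876543210", Path: "config.env"},
		{OID: "bbb222", CommitSHA: "fedcba9876543210", Path: "main.go"},
	}
	for _, r := range uniqueBlobs(refs) {
		fmt.Println(sourceLabel(r.CommitSHA, r.Path))
	}
}
```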
salvacybersec
ce6298f304 test(04-02): add failing tests for DirSource recursive walk and mmap 2026-04-05 15:16:48 +03:00
salvacybersec
842cfea268 docs(04-01): complete dependency bootstrap plan 2026-04-05 15:15:32 +03:00
salvacybersec
0f30c0d156 chore(04-01): add go-git, clipboard, and x/exp/mmap dependencies
- github.com/go-git/go-git/v5 v5.17.2 (git history traversal)
- github.com/atotto/clipboard v0.1.4 (cross-platform clipboard)
- golang.org/x/exp (mmap for large file reads)

Wave 0 dependency bootstrap for Phase 4 input sources. Modules
are recorded as indirect until Wave 1 plans import them; go.sum
contains checksums. go build ./... and go vet ./... both green.
2026-04-05 15:14:37 +03:00
salvacybersec
3d38616d80 docs(04-input-sources): create phase plan 2026-04-05 15:12:57 +03:00
salvacybersec
1bc8f02370 docs(04): phase context with source adapter decisions 2026-04-05 15:00:25 +03:00
salvacybersec
03e768782a docs: mark phases 2-3 complete in ROADMAP checkboxes 2026-04-05 14:50:48 +03:00
salvacybersec
626544e4af docs(phase-03): complete phase execution 2026-04-05 14:50:13 +03:00
salvacybersec
a639cdea02 docs(03-08): complete Tier 3-9 guardrail tests plan 2026-04-05 14:46:35 +03:00
salvacybersec
1aea496a17 test(03-08): add Tier 3-9 guardrail tests locking 108 total providers
- Add tier39_test.go with per-tier count assertions (T3=12, T4=16, T5=11, T6=15, T7=10, T8=10, T9=8)
- Lock all 82 Tier 3-9 provider names against drift via expectedTier3..expectedTier9 slices
- Assert total registry provider count == 108
- Existing TestAllPatternsCompile and TestAllProvidersHaveKeywords transitively cover Tier 3-9 regex compilation and keyword presence
- Satisfies PROV-03..PROV-09
2026-04-05 14:45:41 +03:00
salvacybersec
bad80b0d8a Merge branch 'worktree-agent-a090b6ec' 2026-04-05 14:44:26 +03:00
salvacybersec
d34da519dc docs(03-01): complete Tier 4 Chinese/regional providers plan 2026-04-05 14:43:49 +03:00
salvacybersec
592e5ca325 docs(03-07): complete emerging/niche + vector DB providers plan 2026-04-05 14:43:29 +03:00
salvacybersec
f1e6c8e0ac docs(03-06): complete Tier 9 enterprise providers plan
- SUMMARY.md for plan 03-06
- STATE/ROADMAP/REQUIREMENTS updated (PROV-09 complete)
2026-04-05 14:43:02 +03:00
salvacybersec
e9948f4ccf docs(03-02): complete Tier 3 specialized providers plan
11 new Tier 3 providers (search, embeddings, voice, image/video). PROV-03 satisfied.
2026-04-05 14:43:01 +03:00
salvacybersec
a019ba9a3d feat(03-01): add 8 Tier 4 providers (Baichuan, StepFun, SenseTime, iFlytek, Tencent, SiliconFlow, 360AI, Kuaishou)
- SiliconFlow uses documented sk- prefix
- Other 7 keyword-only (no documented key format, avoids false positives)
- Completes PROV-04: 16 Tier 4 Chinese/regional providers
2026-04-05 14:42:46 +03:00
salvacybersec
0789b662c3 docs(03-04): complete Tier 7 code/dev tools providers plan 2026-04-05 14:42:43 +03:00
salvacybersec
a75d81a8d6 docs(03-05): complete Tier 8 self-hosted runtimes plan
- SUMMARY.md documents 10 Tier 8 runtime providers
- PROV-08 satisfied
2026-04-05 14:42:42 +03:00
salvacybersec
d50f83ac2d docs(03-03): complete Tier 5 infrastructure/gateway providers plan 2026-04-05 14:42:38 +03:00
salvacybersec
a73cea361b feat(03-07): add LangSmith and 6 vector DB providers
- LangSmith with lsv2_(pt|sk) high-confidence regex
- Pinecone with pcsk_ high-confidence regex
- Weaviate, Qdrant, Chroma, Milvus/Zilliz, Neon (keyword-only)
- Completes 15 Tier 6 emerging/niche providers (PROV-06)
2026-04-05 14:42:36 +03:00
salvacybersec
440daab2a2 feat(03-06): add Databricks, Snowflake, Oracle GenAI, HPE GreenLake Tier 9 providers
- Databricks dapi-prefixed high-confidence regex pattern
- Snowflake/Oracle/HPE keyword-only detection
- Completes PROV-09 (8 Tier 9 enterprise providers)
2026-04-05 14:42:19 +03:00
salvacybersec
367cfedb6f feat(03-05): add GPT4All, text-gen-webui, TensorRT-LLM, Triton, Jan AI provider YAMLs
- 5 more Tier 8 self-hosted runtime definitions (keyword-only)
- Completes 10 Tier 8 providers, satisfying PROV-08
- Dual-located in providers/ and pkg/providers/definitions/
2026-04-05 14:42:04 +03:00
salvacybersec
0ac12e52de feat(03-02): add voice and image/video Tier 3 providers
- Deepgram (hex40, low confidence)
- ElevenLabs (hex32, XI_API_KEY header)
- Stability AI (sk- prefix, medium confidence)
- Runway (keyword-only)
- Midjourney (keyword-only, no official API)

Completes PROV-03: 12 Tier 3 Specialized providers (with pre-existing huggingface).
2026-04-05 14:42:02 +03:00
salvacybersec
fbbb54b7a6 feat(03-04): add CodeWhisperer, Replit AI, Codestral, watsonx, Oracle AI providers
- Codestral with low-confidence 32-char generic pattern + high entropy
- watsonx with IBM IAM token endpoint for verification
- CodeWhisperer, Replit AI, Oracle AI as keyword-only
- Completes PROV-07 (10 Tier 7 code/dev tools providers)
2026-04-05 14:41:56 +03:00
salvacybersec
fbe9e8b0dc feat(03-07): add 8 emerging labs, writing tools, observability providers
- Reka, Aleph Alpha, Lamini (emerging LLM labs)
- Writer, Jasper, Typeface (writing tools)
- Comet ML/Opik, Weights & Biases (observability)
- Dual-located in providers/ and pkg/providers/definitions/
2026-04-05 14:41:56 +03:00
salvacybersec
c8d326c34d feat(03-03): add Martian, Kong, BricksAI, Aether, Not Diamond gateways
- Keyword-only detection (no documented public key formats)
- Completes 11 Tier 5 infrastructure/gateway providers for PROV-05
2026-04-05 14:41:55 +03:00
salvacybersec
35dbbc71f1 feat(03-01): add 8 Tier 4 Chinese providers (DeepSeek, Zhipu, Moonshot, Qwen, Baidu, ByteDance, 01.AI, MiniMax)
- DeepSeek, Moonshot, Qwen use documented sk- prefix patterns
- Zhipu, Baidu, ByteDance use keyword-only detection (no documented key format)
- All dual-located in providers/ and pkg/providers/definitions/
2026-04-05 14:41:50 +03:00
salvacybersec
469ed0c0dd feat(03-06): add Salesforce, ServiceNow, SAP, Palantir Tier 9 providers
- Keyword-only detection; strong env var anchors
- Dual-located in providers/ and pkg/providers/definitions/
2026-04-05 14:41:42 +03:00
salvacybersec
370dca0cbb feat(03-05): add Ollama, vLLM, LocalAI, LM Studio, llama.cpp provider YAMLs
- 5 Tier 8 self-hosted runtime provider definitions (keyword-only)
- Localhost endpoints and env var anchors for OSINT correlation
- Dual-located in providers/ and pkg/providers/definitions/
2026-04-05 14:41:35 +03:00