docs(01-foundation): create phase 1 plan — 5 plans across 4 execution waves (0-3)

Wave 0: module init + test scaffolding (01-01)
Wave 1: provider registry (01-02) + storage layer (01-03) in parallel
Wave 2: scan engine pipeline (01-04, depends on 01-02)
Wave 3: CLI wiring + integration checkpoint (01-05, depends on all)

Covers all 16 Phase 1 requirements: CORE-01 through CORE-07, STOR-01 through STOR-03, CLI-01 through CLI-05, PROV-10.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -43,7 +43,14 @@ Decimal phases appear between their surrounding integers in numeric order.
 3. `keyhunter config init` creates `~/.keyhunter.yaml` and `keyhunter config set <key> <value>` persists values
 4. `keyhunter providers list` and `keyhunter providers info <name>` return provider metadata from YAML definitions
 5. Provider YAML schema includes `format_version` and `last_verified` fields validated at load time
-**Plans**: TBD
+**Plans**: 5 plans
+
+Plans:
+- [ ] 01-01-PLAN.md — Go module init, dependency installation, test scaffolding and testdata fixtures
+- [ ] 01-02-PLAN.md — Provider registry: YAML schema, embed loader, Aho-Corasick automaton, Registry struct
+- [ ] 01-03-PLAN.md — Storage layer: AES-256-GCM encryption, Argon2id key derivation, SQLite + Finding CRUD
+- [ ] 01-04-PLAN.md — Scan engine pipeline: keyword pre-filter, regex+entropy detector, FileSource, ants worker pool
+- [ ] 01-05-PLAN.md — CLI wiring: scan, providers list/info/stats, config init/set/get, output table
 
 ### Phase 2: Tier 1-2 Providers
 
 **Goal**: The 26 highest-value LLM provider YAML definitions exist with accurate regex patterns, keyword lists, confidence levels, and verify endpoints — covering OpenAI, Anthropic, Google AI, AWS Bedrock, Azure OpenAI and all major inference platforms
@@ -248,7 +255,7 @@ Phases execute in numeric order: 1 → 2 → 3 → ... → 18
 
 | Phase | Plans Complete | Status | Completed |
 |-------|----------------|--------|-----------|
-| 1. Foundation | 0/? | Not started | - |
+| 1. Foundation | 0/5 | Planning complete | - |
 | 2. Tier 1-2 Providers | 0/? | Not started | - |
 | 3. Tier 3-9 Providers | 0/? | Not started | - |
 | 4. Input Sources | 0/? | Not started | - |
.planning/phases/01-foundation/01-01-PLAN.md (new file, 359 lines)
@@ -0,0 +1,359 @@

---
phase: 01-foundation
plan: 01
type: execute
wave: 0
depends_on: []
files_modified:
  - go.mod
  - go.sum
  - main.go
  - cmd/root.go
  - testdata/samples/openai_key.txt
  - testdata/samples/anthropic_key.txt
  - testdata/samples/multiple_keys.txt
  - testdata/samples/no_keys.txt
  - pkg/providers/registry_test.go
  - pkg/storage/db_test.go
  - pkg/engine/scanner_test.go
autonomous: true
requirements: [CORE-01, CORE-02, CORE-03, CORE-04, CORE-05, CORE-06, CORE-07, STOR-01, STOR-02, STOR-03, CLI-01]

must_haves:
  truths:
    - "go.mod exists with all Phase 1 dependencies at pinned versions"
    - "go build ./... succeeds with zero errors on a fresh checkout"
    - "go test ./... -short runs without compilation errors (tests may fail — stubs are fine)"
    - "testdata/ contains files with known key patterns for scanner integration tests"
  artifacts:
    - path: "go.mod"
      provides: "Module declaration with all Phase 1 dependencies"
      contains: "module github.com/salvacybersec/keyhunter"
    - path: "main.go"
      provides: "Binary entry point under 30 lines"
      contains: "func main()"
    - path: "testdata/samples/openai_key.txt"
      provides: "Sample file with synthetic OpenAI key for scanner tests"
    - path: "pkg/providers/registry_test.go"
      provides: "Test stubs for provider loading and registry"
    - path: "pkg/storage/db_test.go"
      provides: "Test stubs for SQLite + encryption roundtrip"
    - path: "pkg/engine/scanner_test.go"
      provides: "Test stubs for pipeline stages"
  key_links:
    - from: "go.mod"
      to: "petar-dambovaliev/aho-corasick"
      via: "require directive"
      pattern: "petar-dambovaliev/aho-corasick"
    - from: "go.mod"
      to: "modernc.org/sqlite"
      via: "require directive"
      pattern: "modernc.org/sqlite"
---

<objective>
Initialize the Go module, install all Phase 1 dependencies at pinned versions, create the minimal main.go entry point, and lay down test scaffolding with testdata fixtures that every subsequent plan's tests depend on.

Purpose: All subsequent plans require a compiling module and test infrastructure before they can add production code and make tests green. Wave 0 satisfies this bootstrap requirement.
Output: go.mod, go.sum, main.go, pkg/*/ test stubs, testdata/ fixtures.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/01-foundation/01-RESEARCH.md
@.planning/phases/01-foundation/01-VALIDATION.md

<interfaces>
<!-- Module path used throughout the project -->
Module: github.com/salvacybersec/keyhunter

<!-- Pinned versions from RESEARCH.md -->
Dependencies to install:
github.com/spf13/cobra@v1.10.2
github.com/spf13/viper@v1.21.0
modernc.org/sqlite@latest
gopkg.in/yaml.v3@v3.0.1
github.com/petar-dambovaliev/aho-corasick@latest
github.com/panjf2000/ants/v2@v2.12.0
golang.org/x/crypto@latest
golang.org/x/time@latest
github.com/charmbracelet/lipgloss@latest
github.com/stretchr/testify@latest

<!-- Go version -->
go 1.22

<!-- Directory structure to scaffold (from RESEARCH.md) -->
keyhunter/
  main.go
  cmd/
    root.go (created in Plan 05)
    scan.go (created in Plan 05)
    providers.go (created in Plan 05)
    config.go (created in Plan 05)
  pkg/
    providers/ (created in Plan 02)
    engine/ (created in Plan 04)
    storage/ (created in Plan 03)
    config/ (created in Plan 05)
    output/ (created in Plan 05)
  providers/ (YAML definitions, created in Plan 02)
  testdata/
    samples/
</interfaces>
</context>

<tasks>

<task type="auto" tdd="false">
<name>Task 1: Initialize Go module and install Phase 1 dependencies</name>
<files>go.mod, go.sum</files>
<read_first>
- /home/salva/Documents/apikey/.planning/phases/01-foundation/01-RESEARCH.md (Standard Stack section — exact library versions)
- /home/salva/Documents/apikey/CLAUDE.md (Technology Stack table — version constraints)
</read_first>
<action>
Run the following commands in the project root (/home/salva/Documents/apikey):

```bash
go mod init github.com/salvacybersec/keyhunter
go get github.com/spf13/cobra@v1.10.2
go get github.com/spf13/viper@v1.21.0
go get modernc.org/sqlite@latest
go get gopkg.in/yaml.v3@v3.0.1
go get github.com/petar-dambovaliev/aho-corasick@latest
go get github.com/panjf2000/ants/v2@v2.12.0
go get golang.org/x/crypto@latest
go get golang.org/x/time@latest
go get github.com/charmbracelet/lipgloss@latest
go get github.com/stretchr/testify@latest
go mod tidy
```

Verify the resulting go.mod contains:
- `module github.com/salvacybersec/keyhunter`
- `go 1.22` (or 1.22.x)
- `github.com/spf13/cobra v1.10.2`
- `github.com/spf13/viper v1.21.0`
- `github.com/petar-dambovaliev/aho-corasick` (any version)
- `github.com/panjf2000/ants/v2 v2.12.0`
- `modernc.org/sqlite` (any v1.35.x)
- `github.com/charmbracelet/lipgloss` (any version)

Do NOT add: chi, templ, telego, gocron — these are Phase 17-18 only.
Do NOT use CGO_ENABLED=1 or mattn/go-sqlite3.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && grep -q 'module github.com/salvacybersec/keyhunter' go.mod && grep -q 'cobra v1.10.2' go.mod && grep -q 'modernc.org/sqlite' go.mod && echo "go.mod OK"</automated>
</verify>
<acceptance_criteria>
- go.mod contains `module github.com/salvacybersec/keyhunter`
- go.mod contains `github.com/spf13/cobra v1.10.2` (exact)
- go.mod contains `github.com/spf13/viper v1.21.0` (exact)
- go.mod contains `github.com/panjf2000/ants/v2 v2.12.0` (exact)
- go.mod contains `modernc.org/sqlite` (v1.35.x)
- go.mod contains `github.com/petar-dambovaliev/aho-corasick`
- go.mod contains `golang.org/x/crypto`
- go.mod contains `github.com/charmbracelet/lipgloss`
- go.sum exists and is non-empty
- `go mod verify` exits 0
</acceptance_criteria>
<done>go.mod and go.sum committed with all Phase 1 dependencies at correct versions</done>
</task>

<task type="auto" tdd="false">
<name>Task 2: Create main.go entry point and test scaffolding</name>
<files>
main.go,
testdata/samples/openai_key.txt,
testdata/samples/anthropic_key.txt,
testdata/samples/multiple_keys.txt,
testdata/samples/no_keys.txt,
pkg/providers/registry_test.go,
pkg/storage/db_test.go,
pkg/engine/scanner_test.go
</files>
<read_first>
- /home/salva/Documents/apikey/.planning/phases/01-foundation/01-VALIDATION.md (Wave 0 Requirements and Per-Task Verification Map)
- /home/salva/Documents/apikey/.planning/phases/01-foundation/01-RESEARCH.md (Architecture Patterns, project structure diagram)
</read_first>
<action>
Create the following files:

**main.go** (must be under 30 lines):
```go
package main

import "github.com/salvacybersec/keyhunter/cmd"

func main() {
	cmd.Execute()
}
```

**testdata/samples/openai_key.txt** — file containing a synthetic (non-real) OpenAI-style key for scanner integration tests:
```
# Test file: synthetic OpenAI key pattern
OPENAI_API_KEY=sk-proj-ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr1234
```

**testdata/samples/anthropic_key.txt** — file containing a synthetic Anthropic-style key:
```
# Test file: synthetic Anthropic key pattern
export ANTHROPIC_API_KEY="sk-ant-api03-ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxy01234567890-ABCDE"
```

**testdata/samples/multiple_keys.txt** — file with both key types:
```
# Multiple providers in one file
OPENAI_API_KEY=sk-proj-ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr5678
ANTHROPIC_API_KEY=sk-ant-api03-XYZabcdefghijklmnopqrstuvwxyz01234567890ABCDEFGH-XYZAB
```

**testdata/samples/no_keys.txt** — file with no keys (negative test case):
```
# This file contains no API keys
# Used to verify false-positive rate is zero for clean files
Hello world
```

**pkg/providers/registry_test.go** — test stubs (will be filled by Plan 02):
```go
package providers_test

import (
	"testing"
)

// TestRegistryLoad verifies that provider YAML files are loaded from embed.FS.
// Stub: will be implemented when registry.go exists (Plan 02).
func TestRegistryLoad(t *testing.T) {
	t.Skip("stub — implement after registry.go exists")
}

// TestProviderSchemaValidation verifies format_version and last_verified are required.
// Stub: will be implemented when schema.go validation exists (Plan 02).
func TestProviderSchemaValidation(t *testing.T) {
	t.Skip("stub — implement after schema.go validation exists")
}

// TestAhoCorasickBuild verifies Aho-Corasick automaton builds from provider keywords.
// Stub: will be implemented when registry builds automaton (Plan 02).
func TestAhoCorasickBuild(t *testing.T) {
	t.Skip("stub — implement after registry AC build exists")
}
```
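
The keyword pre-filter contract these stubs gesture at — a file only proceeds to regex scanning if at least one provider keyword appears — can be sketched without the third-party automaton. The following stand-in uses `strings.Contains` instead of Aho-Corasick (so it is O(keywords × content), not one-pass); the function and variable names are illustrative, not from the codebase:

```go
package main

import (
	"fmt"
	"strings"
)

// matchAnyKeyword reports whether content contains any provider keyword.
// A real registry would answer this in a single pass with a pre-built
// Aho-Corasick automaton; this loop shows the identical contract.
func matchAnyKeyword(keywords []string, content string) bool {
	lower := strings.ToLower(content)
	for _, kw := range keywords {
		if strings.Contains(lower, strings.ToLower(kw)) {
			return true
		}
	}
	return false
}

func main() {
	keywords := []string{"sk-proj-", "sk-ant-api03-", "hf_"}
	fmt.Println(matchAnyKeyword(keywords, "OPENAI_API_KEY=sk-proj-abc123")) // true
	fmt.Println(matchAnyKeyword(keywords, "Hello world"))                   // false
}
```

Swapping the loop for the automaton changes performance, not behavior, which is why the stub can be written against this contract before Plan 02 lands.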

**pkg/storage/db_test.go** — test stubs (will be filled by Plan 03):
```go
package storage_test

import (
	"testing"
)

// TestDBOpen verifies SQLite database opens and creates schema.
// Stub: will be implemented when db.go exists (Plan 03).
func TestDBOpen(t *testing.T) {
	t.Skip("stub — implement after db.go exists")
}

// TestEncryptDecryptRoundtrip verifies AES-256-GCM encrypt/decrypt roundtrip.
// Stub: will be implemented when encrypt.go exists (Plan 03).
func TestEncryptDecryptRoundtrip(t *testing.T) {
	t.Skip("stub — implement after encrypt.go exists")
}

// TestArgon2KeyDerivation verifies Argon2id produces 32-byte key deterministically.
// Stub: will be implemented when crypto.go exists (Plan 03).
func TestArgon2KeyDerivation(t *testing.T) {
	t.Skip("stub — implement after crypto.go exists")
}
```

**pkg/engine/scanner_test.go** — test stubs (will be filled by Plan 04):
```go
package engine_test

import (
	"testing"
)

// TestShannonEntropy verifies the entropy function returns expected values.
// Stub: will be implemented when entropy.go exists (Plan 04).
func TestShannonEntropy(t *testing.T) {
	t.Skip("stub — implement after entropy.go exists")
}

// TestKeywordPreFilter verifies Aho-Corasick pre-filter rejects files without keywords.
// Stub: will be implemented when filter.go exists (Plan 04).
func TestKeywordPreFilter(t *testing.T) {
	t.Skip("stub — implement after filter.go exists")
}

// TestScannerPipeline verifies end-to-end scan of testdata returns expected findings.
// Stub: will be implemented when engine.go exists (Plan 04).
func TestScannerPipeline(t *testing.T) {
	t.Skip("stub — implement after engine.go exists")
}
```
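
For reference, the "expected values" TestShannonEntropy alludes to come from the standard definition H = −Σ p·log₂p over character frequencies: a uniform string scores 0 bits, four distinct symbols score exactly 2 bits, and random-looking key material scores high. A self-contained sketch (the function name is illustrative, not the Plan 04 implementation):

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the Shannon entropy of s in bits per character.
func shannonEntropy(s string) float64 {
	freq := make(map[rune]float64)
	var n float64
	for _, r := range s {
		freq[r]++
		n++
	}
	var h float64
	for _, count := range freq {
		p := count / n
		h -= p * math.Log2(p)
	}
	return h // 0 for empty or uniform strings
}

func main() {
	fmt.Println(shannonEntropy("aaaa"))                              // 0
	fmt.Println(shannonEntropy("abcd"))                              // 2
	fmt.Printf("%.2f\n", shannonEntropy("sk-proj-Ab3dEf9hIjK2mNoP")) // well above 3.5
}
```

These fixed points make good table-driven test cases because they are exact in IEEE floating point, avoiding epsilon comparisons.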

Create the `cmd/` package directory with a minimal stub so main.go compiles:

**cmd/root.go** (minimal stub — will be replaced by Plan 05):
```go
package cmd

import "os"

// Execute is a stub. The real command tree is built in Plan 05.
func Execute() {
	_ = os.Args
}
```

After creating all files, run `go build ./...` to confirm the module compiles.
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go build ./... && go test ./... -short && echo "BUILD OK"</automated>
</verify>
<acceptance_criteria>
- `go build ./...` exits 0 with no errors
- `go test ./... -short` exits 0 (all stubs skip, no failures)
- main.go is under 30 lines
- testdata/samples/openai_key.txt contains `sk-proj-` prefix
- testdata/samples/anthropic_key.txt contains `sk-ant-api03-` prefix
- testdata/samples/no_keys.txt contains no key patterns
- pkg/providers/registry_test.go, pkg/storage/db_test.go, pkg/engine/scanner_test.go each exist with skip-based stubs
- cmd/root.go exists so `go build ./...` compiles
</acceptance_criteria>
<done>Module compiles, test stubs exist, testdata fixtures created. Subsequent plans can now add production code and make tests green.</done>
</task>

</tasks>

<verification>
After both tasks:
- `cd /home/salva/Documents/apikey && go build ./...` exits 0
- `go test ./... -short` exits 0
- `grep -r 'sk-proj-' testdata/` finds the OpenAI test fixture
- `grep -r 'sk-ant-api03-' testdata/` finds the Anthropic test fixture
- go.mod has all required dependencies at specified versions
</verification>

<success_criteria>
- go.mod initialized with module path `github.com/salvacybersec/keyhunter` and Go 1.22
- All 10 Phase 1 dependencies installed at correct versions
- main.go under 30 lines, compiles successfully
- 3 test stub files exist (providers, storage, engine)
- 4 testdata fixture files exist (openai key, anthropic key, multiple keys, no keys)
- `go build ./...` and `go test ./... -short` both exit 0
</success_criteria>

<output>
After completion, create `.planning/phases/01-foundation/01-01-SUMMARY.md` following the summary template.
</output>
.planning/phases/01-foundation/01-02-PLAN.md (new file, 663 lines)
@@ -0,0 +1,663 @@

---
phase: 01-foundation
plan: 02
type: execute
wave: 1
depends_on: [01-01]
files_modified:
  - providers/openai.yaml
  - providers/anthropic.yaml
  - providers/huggingface.yaml
  - pkg/providers/schema.go
  - pkg/providers/loader.go
  - pkg/providers/registry.go
  - pkg/providers/registry_test.go
autonomous: true
requirements: [CORE-02, CORE-03, CORE-06, PROV-10]

must_haves:
  truths:
    - "Provider YAML files are embedded at compile time — no filesystem access at runtime"
    - "Registry loads all YAML files from embed.FS and returns a slice of Provider structs"
    - "Provider schema validation rejects YAML missing format_version or last_verified"
    - "Aho-Corasick automaton is built from all provider keywords at registry init"
    - "keyhunter providers list command lists providers (tested via registry methods)"
  artifacts:
    - path: "providers/openai.yaml"
      provides: "Reference provider definition with all schema fields"
      contains: "format_version"
    - path: "pkg/providers/schema.go"
      provides: "Provider, Pattern, VerifySpec Go structs with UnmarshalYAML validation"
      exports: ["Provider", "Pattern", "VerifySpec"]
    - path: "pkg/providers/registry.go"
      provides: "Registry struct with List, Get, Stats, AC methods"
      exports: ["Registry", "NewRegistry"]
    - path: "pkg/providers/loader.go"
      provides: "embed.FS declaration and fs.WalkDir loading logic"
      contains: "go:embed"
  key_links:
    - from: "pkg/providers/loader.go"
      to: "providers/*.yaml"
      via: "//go:embed directive"
      pattern: "go:embed.*providers"
    - from: "pkg/providers/registry.go"
      to: "github.com/petar-dambovaliev/aho-corasick"
      via: "AC automaton build at NewRegistry()"
      pattern: "ahocorasick"
    - from: "pkg/providers/schema.go"
      to: "format_version and last_verified YAML fields"
      via: "UnmarshalYAML validation"
      pattern: "UnmarshalYAML"
---

<objective>
Build the provider registry: YAML schema structs with validation, embed.FS loader, in-memory registry with List/Get/Stats/AC methods, and three reference provider YAML definitions. The Aho-Corasick automaton is built from all provider keywords at registry initialization.

Purpose: Every downstream subsystem (scan engine, CLI providers command, verification engine) depends on the Registry interface. This plan establishes the stable contract they build against.
Output: providers/*.yaml, pkg/providers/{schema,loader,registry}.go, registry_test.go (stubs filled).
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/phases/01-foundation/01-RESEARCH.md
@.planning/phases/01-foundation/01-01-SUMMARY.md

<interfaces>
<!-- Provider YAML schema (from ARCHITECTURE.md and RESEARCH.md) -->
Full provider YAML structure:
```yaml
format_version: 1
name: openai
display_name: OpenAI
tier: 1
last_verified: "2026-04-04"
keywords:
  - "sk-proj-"
  - "openai"
patterns:
  - regex: 'sk-proj-[A-Za-z0-9_\-]{48,}'
    entropy_min: 3.5
    confidence: high
verify:
  method: GET
  url: https://api.openai.com/v1/models
  headers:
    Authorization: "Bearer {KEY}"
  valid_status: [200]
  invalid_status: [401, 403]
```

<!-- Go struct mapping -->
Provider struct fields:
  FormatVersion int (yaml:"format_version" — must be >= 1)
  Name string (yaml:"name")
  DisplayName string (yaml:"display_name")
  Tier int (yaml:"tier")
  LastVerified string (yaml:"last_verified" — must be non-empty)
  Keywords []string (yaml:"keywords")
  Patterns []Pattern (yaml:"patterns")
  Verify VerifySpec (yaml:"verify")

Pattern struct fields:
  Regex string (yaml:"regex")
  EntropyMin float64 (yaml:"entropy_min")
  Confidence string (yaml:"confidence" — "high", "medium", "low")

VerifySpec struct fields:
  Method string (yaml:"method")
  URL string (yaml:"url")
  Headers map[string]string (yaml:"headers")
  ValidStatus []int (yaml:"valid_status")
  InvalidStatus []int (yaml:"invalid_status")
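
A VerifySpec is consumed by rendering it into an HTTP request with `{KEY}` substituted into header values. The sketch below only builds the request, never sends it, so it runs offline; `buildVerifyRequest` is an illustrative name, not a function from the codebase:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// VerifySpec mirrors the fields described above (status lists omitted,
// since they only matter once a response comes back).
type VerifySpec struct {
	Method  string
	URL     string
	Headers map[string]string
}

// buildVerifyRequest renders a VerifySpec into an *http.Request,
// substituting {KEY} wherever it appears in header values.
func buildVerifyRequest(spec VerifySpec, key string) (*http.Request, error) {
	req, err := http.NewRequest(spec.Method, spec.URL, nil)
	if err != nil {
		return nil, err
	}
	for name, val := range spec.Headers {
		req.Header.Set(name, strings.ReplaceAll(val, "{KEY}", key))
	}
	return req, nil
}

func main() {
	spec := VerifySpec{
		Method:  "GET",
		URL:     "https://api.openai.com/v1/models",
		Headers: map[string]string{"Authorization": "Bearer {KEY}"},
	}
	req, err := buildVerifyRequest(spec, "sk-test-123")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Authorization")) // Bearer sk-test-123
}
```

Keeping substitution in one place means the Phase 5 verification engine needs no per-provider code: every provider is just data.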

<!-- Registry methods needed by downstream plans -->
type Registry struct { ... }
func NewRegistry() (*Registry, error)
func (r *Registry) List() []Provider
func (r *Registry) Get(name string) (Provider, bool)
func (r *Registry) Stats() RegistryStats // {Total int, ByTier map[int]int, ByConfidence map[string]int}
func (r *Registry) AC() ahocorasick.AhoCorasick // pre-built automaton
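
Of these methods, Stats() is pure aggregation over the loaded providers. A sketch with trimmed-down local copies of the schema types (the real structs carry more fields and live in pkg/providers):

```go
package main

import "fmt"

// Trimmed-down local stand-ins for the pkg/providers schema types.
type Pattern struct{ Confidence string }
type Provider struct {
	Tier     int
	Patterns []Pattern
}
type RegistryStats struct {
	Total        int
	ByTier       map[int]int
	ByConfidence map[string]int
}

// stats aggregates provider counts the way Registry.Stats() is specified:
// one increment per provider for Total and ByTier, one per pattern for
// ByConfidence.
func stats(providers []Provider) RegistryStats {
	s := RegistryStats{
		ByTier:       map[int]int{},
		ByConfidence: map[string]int{},
	}
	for _, p := range providers {
		s.Total++
		s.ByTier[p.Tier]++
		for _, pat := range p.Patterns {
			s.ByConfidence[pat.Confidence]++
		}
	}
	return s
}

func main() {
	ps := []Provider{
		{Tier: 1, Patterns: []Pattern{{Confidence: "high"}}},
		{Tier: 1, Patterns: []Pattern{{Confidence: "high"}}},
		{Tier: 3, Patterns: []Pattern{{Confidence: "medium"}}},
	}
	s := stats(ps)
	fmt.Println(s.Total, s.ByTier[1], s.ByConfidence["high"]) // 3 2 2
}
```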

<!-- embed path convention -->
go:embed paths are relative to the source file containing the directive, and Go
rejects patterns that contain ".." or otherwise reach outside the package
directory. Since loader.go lives at pkg/providers/loader.go, it cannot embed the
root-level providers/ directory: the //go:embed ../../providers/*.yaml form shown
in RESEARCH.md will not compile. Moving the directive into a root-level file would
put it in package main and break package separation.

Resolution: keep the YAML definitions inside the providers package itself, at
pkg/providers/definitions/*.yaml, and embed them with:
//go:embed definitions/*.yaml
This is the clean solution: pkg/providers/definitions/openai.yaml etc. Update
files_modified accordingly.

</interfaces>
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: Provider YAML schema structs with validation</name>
<files>pkg/providers/schema.go, providers/openai.yaml, providers/anthropic.yaml, providers/huggingface.yaml</files>
<read_first>
- /home/salva/Documents/apikey/.planning/phases/01-foundation/01-RESEARCH.md (Pattern 1: Provider Registry, Provider YAML schema section, PROV-10 row in requirements table)
- /home/salva/Documents/apikey/.planning/research/ARCHITECTURE.md (Provider Registry component, YAML schema example)
</read_first>
<behavior>
- Test 1: Provider with format_version=0 → UnmarshalYAML returns error "format_version must be >= 1"
- Test 2: Provider with empty last_verified → UnmarshalYAML returns error "last_verified is required"
- Test 3: Valid provider YAML → UnmarshalYAML succeeds, Provider.Name == "openai"
- Test 4: Provider with no patterns → loaded successfully (patterns list can be empty for schema-only providers)
- Test 5: Pattern.Confidence not in {"high","medium","low"} → error "confidence must be high, medium, or low"
</behavior>
<action>
Create pkg/providers/schema.go:
```go
package providers

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

// Provider represents a single API key provider definition loaded from YAML.
type Provider struct {
	FormatVersion int        `yaml:"format_version"`
	Name          string     `yaml:"name"`
	DisplayName   string     `yaml:"display_name"`
	Tier          int        `yaml:"tier"`
	LastVerified  string     `yaml:"last_verified"`
	Keywords      []string   `yaml:"keywords"`
	Patterns      []Pattern  `yaml:"patterns"`
	Verify        VerifySpec `yaml:"verify"`
}

// Pattern defines a single regex pattern for API key detection.
type Pattern struct {
	Regex      string  `yaml:"regex"`
	EntropyMin float64 `yaml:"entropy_min"`
	Confidence string  `yaml:"confidence"`
}

// VerifySpec defines how to verify a key is live (used by Phase 5 verification engine).
type VerifySpec struct {
	Method        string            `yaml:"method"`
	URL           string            `yaml:"url"`
	Headers       map[string]string `yaml:"headers"`
	ValidStatus   []int             `yaml:"valid_status"`
	InvalidStatus []int             `yaml:"invalid_status"`
}

// RegistryStats holds aggregate statistics about loaded providers.
type RegistryStats struct {
	Total        int
	ByTier       map[int]int
	ByConfidence map[string]int
}

// UnmarshalYAML implements yaml.Unmarshaler with schema validation (satisfies PROV-10).
func (p *Provider) UnmarshalYAML(value *yaml.Node) error {
	// Use a type alias to avoid infinite recursion
	type ProviderAlias Provider
	var alias ProviderAlias
	if err := value.Decode(&alias); err != nil {
		return err
	}
	if alias.FormatVersion < 1 {
		return fmt.Errorf("provider %q: format_version must be >= 1 (got %d)", alias.Name, alias.FormatVersion)
	}
	if alias.LastVerified == "" {
		return fmt.Errorf("provider %q: last_verified is required", alias.Name)
	}
	validConfidences := map[string]bool{"high": true, "medium": true, "low": true, "": true}
	for _, pat := range alias.Patterns {
		if !validConfidences[pat.Confidence] {
			return fmt.Errorf("provider %q: pattern confidence %q must be high, medium, or low", alias.Name, pat.Confidence)
		}
	}
	*p = Provider(alias)
	return nil
}
```

Create the three reference YAML provider definitions. These are SCHEMA EXAMPLES for Phase 1; full pattern libraries come in Phase 2-3.

**providers/openai.yaml:**
```yaml
format_version: 1
name: openai
display_name: OpenAI
tier: 1
last_verified: "2026-04-04"
keywords:
  - "sk-proj-"
  - "openai"
patterns:
  - regex: 'sk-proj-[A-Za-z0-9_\-]{48,}'
    entropy_min: 3.5
    confidence: high
verify:
  method: GET
  url: https://api.openai.com/v1/models
  headers:
    Authorization: "Bearer {KEY}"
  valid_status: [200]
  invalid_status: [401, 403]
```

**providers/anthropic.yaml:**
```yaml
format_version: 1
name: anthropic
display_name: Anthropic
tier: 1
last_verified: "2026-04-04"
keywords:
  - "sk-ant-api03-"
  - "anthropic"
patterns:
  - regex: 'sk-ant-api03-[A-Za-z0-9_\-]{93,}'
    entropy_min: 3.5
    confidence: high
verify:
  method: GET
  url: https://api.anthropic.com/v1/models
  headers:
    x-api-key: "{KEY}"
    anthropic-version: "2023-06-01"
  valid_status: [200]
  invalid_status: [401, 403]
```

**providers/huggingface.yaml:**
```yaml
format_version: 1
name: huggingface
display_name: HuggingFace
tier: 3
last_verified: "2026-04-04"
keywords:
  - "hf_"
  - "huggingface"
patterns:
  - regex: 'hf_[A-Za-z0-9]{34,}'
    entropy_min: 3.5
    confidence: high
verify:
  method: GET
  url: https://huggingface.co/api/whoami-v2
  headers:
    Authorization: "Bearer {KEY}"
  valid_status: [200]
  invalid_status: [401, 403]
```
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go build ./pkg/providers/... && go test ./pkg/providers/... -run TestProviderSchemaValidation -v 2>&1 | head -30</automated>
</verify>
<acceptance_criteria>
- `go build ./pkg/providers/...` exits 0
- providers/openai.yaml contains `format_version: 1` and `last_verified`
- providers/anthropic.yaml contains `format_version: 1` and `last_verified`
- providers/huggingface.yaml contains `format_version: 1` and `last_verified`
- pkg/providers/schema.go exports: Provider, Pattern, VerifySpec, RegistryStats
- Provider.UnmarshalYAML returns error when format_version < 1
- Provider.UnmarshalYAML returns error when last_verified is empty
- `grep -q 'UnmarshalYAML' pkg/providers/schema.go` exits 0
</acceptance_criteria>
<done>Provider schema structs exist with validation. Three reference YAML files exist with all required fields.</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: Embed loader, registry with Aho-Corasick, and filled test stubs</name>
<files>pkg/providers/loader.go, pkg/providers/registry.go, pkg/providers/registry_test.go</files>
<read_first>
- /home/salva/Documents/apikey/.planning/phases/01-foundation/01-RESEARCH.md (Pattern 1: Provider Registry with Compile-Time Embed — exact code example)
- /home/salva/Documents/apikey/pkg/providers/schema.go (types just created in Task 1)
</read_first>
<behavior>
- Test 1: NewRegistry() loads 3 providers from embedded YAML → registry.List() returns slice of length 3
- Test 2: registry.Get("openai") → returns Provider with Name=="openai", bool==true
- Test 3: registry.Get("nonexistent") → returns zero Provider, bool==false
- Test 4: registry.Stats().Total == 3 and Stats().ByTier[1] == 2 (openai + anthropic are tier 1)
- Test 5: AC automaton built — registry.AC().FindAll("sk-proj-abc") returns non-empty slice
- Test 6: AC automaton does NOT match — registry.AC().FindAll("hello world") returns empty slice
</behavior>
<action>
IMPORTANT NOTE ON EMBED PATHS: Go's embed package does NOT allow paths containing "..".
Since loader.go is at pkg/providers/loader.go, it CANNOT embed ../../providers/*.yaml.

Solution: place the provider YAML files at pkg/providers/definitions/*.yaml, owned by the
pkg/providers package, and embed them with:
//go:embed definitions/*.yaml

The YAML files created in Task 1 at providers/*.yaml remain the user-facing "source of truth"
that users may inspect; the embedded copies live in pkg/providers/definitions/.

For Phase 1, keep BOTH: the providers/ root dir for user reference, definitions/ for embed.
Copy the three YAML files from providers/ to pkg/providers/definitions/ at the end.

Create **pkg/providers/loader.go**:
```go
package providers

import (
	"embed"
	"fmt"
	"io/fs"
	"path/filepath"

	"gopkg.in/yaml.v3"
)

//go:embed definitions/*.yaml
var definitionsFS embed.FS

// loadProviders reads all YAML files from the embedded definitions FS.
func loadProviders() ([]Provider, error) {
	var providers []Provider
	err := fs.WalkDir(definitionsFS, "definitions", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || filepath.Ext(path) != ".yaml" {
			return nil
		}
		data, err := definitionsFS.ReadFile(path)
		if err != nil {
			return fmt.Errorf("reading provider file %s: %w", path, err)
		}
		var p Provider
		if err := yaml.Unmarshal(data, &p); err != nil {
			return fmt.Errorf("parsing provider %s: %w", path, err)
		}
		providers = append(providers, p)
		return nil
	})
	return providers, err
}
```

Create **pkg/providers/registry.go**:
```go
package providers

import (
	"fmt"

	ahocorasick "github.com/petar-dambovaliev/aho-corasick"
)

// Registry is the in-memory store of all loaded provider definitions.
// It is initialized once at startup and is safe for concurrent reads.
type Registry struct {
	providers []Provider
	index     map[string]int          // name -> slice index
	ac        ahocorasick.AhoCorasick // pre-built automaton for keyword pre-filter
}

// NewRegistry loads all embedded provider YAML files, validates them, builds the
// Aho-Corasick automaton from all provider keywords, and returns the Registry.
func NewRegistry() (*Registry, error) {
	providers, err := loadProviders()
	if err != nil {
		return nil, fmt.Errorf("loading providers: %w", err)
	}

	index := make(map[string]int, len(providers))
	var keywords []string
	for i, p := range providers {
		index[p.Name] = i
		keywords = append(keywords, p.Keywords...)
	}

	builder := ahocorasick.NewAhoCorasickBuilder(ahocorasick.Opts{DFA: true})
	ac := builder.Build(keywords)

	return &Registry{
		providers: providers,
		index:     index,
		ac:        ac,
	}, nil
}

// List returns all loaded providers.
func (r *Registry) List() []Provider {
	return r.providers
}

// Get returns a provider by name and a boolean indicating whether it was found.
func (r *Registry) Get(name string) (Provider, bool) {
	idx, ok := r.index[name]
	if !ok {
		return Provider{}, false
	}
	return r.providers[idx], true
}

// Stats returns aggregate statistics about the loaded providers.
func (r *Registry) Stats() RegistryStats {
	stats := RegistryStats{
		Total:        len(r.providers),
		ByTier:       make(map[int]int),
		ByConfidence: make(map[string]int),
	}
	for _, p := range r.providers {
		stats.ByTier[p.Tier]++
		for _, pat := range p.Patterns {
			stats.ByConfidence[pat.Confidence]++
		}
	}
	return stats
}

// AC returns the pre-built Aho-Corasick automaton for keyword pre-filtering.
func (r *Registry) AC() ahocorasick.AhoCorasick {
	return r.ac
}
```

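Not part of the plan's files: to see what the automaton buys, here is a standalone stdlib stand-in for the keyword pre-filter. The naive version rescans the line once per keyword; Aho-Corasick matches all keywords in a single pass over the input regardless of keyword count. Function and variable names here are illustrative only:

```go
package main

import (
	"fmt"
	"strings"
)

// naiveKeywordFilter reports whether line contains any provider keyword.
// Cost grows linearly with the number of keywords, which is why the
// registry builds an Aho-Corasick automaton instead.
func naiveKeywordFilter(line string, keywords []string) bool {
	for _, kw := range keywords {
		if strings.Contains(line, kw) {
			return true
		}
	}
	return false
}

func main() {
	keywords := []string{"sk-proj-", "sk-ant-api03-", "hf_"}
	fmt.Println(naiveKeywordFilter("export OPENAI_API_KEY=sk-proj-abc", keywords)) // true
	fmt.Println(naiveKeywordFilter("hello world no secrets here", keywords))      // false
}
```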
Then copy the three YAML files into the embed location:
```bash
mkdir -p /home/salva/Documents/apikey/pkg/providers/definitions
cp /home/salva/Documents/apikey/providers/openai.yaml /home/salva/Documents/apikey/pkg/providers/definitions/
cp /home/salva/Documents/apikey/providers/anthropic.yaml /home/salva/Documents/apikey/pkg/providers/definitions/
cp /home/salva/Documents/apikey/providers/huggingface.yaml /home/salva/Documents/apikey/pkg/providers/definitions/
```

Finally, fill in **pkg/providers/registry_test.go** (replacing the stubs from Plan 01). Note the `gopkg.in/yaml.v3` import, which TestProviderSchemaValidation needs:

```go
package providers_test

import (
	"testing"

	"github.com/salvacybersec/keyhunter/pkg/providers"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
	"gopkg.in/yaml.v3"
)

func TestRegistryLoad(t *testing.T) {
	reg, err := providers.NewRegistry()
	require.NoError(t, err)
	assert.GreaterOrEqual(t, len(reg.List()), 3, "expected at least 3 providers")
}

func TestRegistryGet(t *testing.T) {
	reg, err := providers.NewRegistry()
	require.NoError(t, err)

	p, ok := reg.Get("openai")
	assert.True(t, ok)
	assert.Equal(t, "openai", p.Name)
	assert.Equal(t, 1, p.Tier)

	_, notOk := reg.Get("nonexistent-provider")
	assert.False(t, notOk)
}

func TestRegistryStats(t *testing.T) {
	reg, err := providers.NewRegistry()
	require.NoError(t, err)

	stats := reg.Stats()
	assert.GreaterOrEqual(t, stats.Total, 3)
	assert.GreaterOrEqual(t, stats.ByTier[1], 2, "expected at least 2 tier-1 providers")
}

func TestAhoCorasickBuild(t *testing.T) {
	reg, err := providers.NewRegistry()
	require.NoError(t, err)

	ac := reg.AC()

	// Should match the OpenAI keyword "sk-proj-"
	matches := ac.FindAll("export OPENAI_API_KEY=sk-proj-abc")
	assert.NotEmpty(t, matches, "expected AC to find keyword in string containing 'sk-proj-'")

	// Should not match clean text
	noMatches := ac.FindAll("hello world nothing here")
	assert.Empty(t, noMatches, "expected no AC matches in text with no provider keywords")
}

func TestProviderSchemaValidation(t *testing.T) {
	invalid := []byte("format_version: 0\nname: invalid\nlast_verified: \"\"\n")
	var p providers.Provider
	err := yaml.Unmarshal(invalid, &p)
	assert.Error(t, err, "expected validation error for format_version=0")
	assert.Contains(t, err.Error(), "format_version")
}
```
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/providers/... -v -count=1 2>&1 | tail -20</automated>
</verify>
<acceptance_criteria>
- `go test ./pkg/providers/... -v` exits 0 with all 5 tests PASS (not SKIP)
- TestRegistryLoad passes with >= 3 providers
- TestRegistryGet passes — "openai" found, "nonexistent" not found
- TestRegistryStats passes — Total >= 3
- TestAhoCorasickBuild passes — "sk-proj-" match found, "hello world" empty
- TestProviderSchemaValidation passes — error on format_version=0
- `grep -r 'go:embed' pkg/providers/loader.go` exits 0
- pkg/providers/definitions/ directory exists with 3 YAML files
</acceptance_criteria>
<done>Registry loads providers from embedded YAML, builds Aho-Corasick automaton, exposes List/Get/Stats/AC. All 5 tests pass.</done>
</task>

</tasks>

<verification>
After both tasks:
- `go test ./pkg/providers/... -v -count=1` exits 0 with 5 tests PASS
- `go build ./...` still exits 0
- `grep -q 'format_version' providers/openai.yaml providers/anthropic.yaml providers/huggingface.yaml` exits 0
- `grep -q 'go:embed' pkg/providers/loader.go` exits 0
- pkg/providers/definitions/ has 3 YAML files (same content as providers/)
</verification>

<success_criteria>
- 3 reference provider YAML files exist in providers/ and pkg/providers/definitions/ with format_version and last_verified
- Provider schema validates format_version >= 1 and non-empty last_verified (PROV-10)
- Registry loads providers from embed.FS at compile time (CORE-02)
- Aho-Corasick automaton built from all keywords at NewRegistry() (CORE-06)
- Registry exposes List(), Get(), Stats(), AC() (CORE-03)
- 5 provider tests all pass
</success_criteria>

<output>
After completion, create `.planning/phases/01-foundation/01-02-SUMMARY.md` following the summary template.
</output>
.planning/phases/01-foundation/01-03-PLAN.md (new file, 634 lines)
---
phase: 01-foundation
plan: 03
type: execute
wave: 1
depends_on: [01-01]
files_modified:
  - pkg/storage/schema.sql
  - pkg/storage/encrypt.go
  - pkg/storage/crypto.go
  - pkg/storage/db.go
  - pkg/storage/findings.go
  - pkg/storage/db_test.go
autonomous: true
requirements: [STOR-01, STOR-02, STOR-03]

must_haves:
  truths:
    - "SQLite database opens, runs migrations from embedded schema.sql, and closes cleanly"
    - "AES-256-GCM Encrypt/Decrypt roundtrip produces the original plaintext"
    - "Argon2id DeriveKey with the same passphrase and salt always returns the same 32-byte key"
    - "A Finding can be saved to the database with the key_value stored encrypted and retrieved as plaintext"
    - "The raw database file does NOT contain plaintext API key values"
  artifacts:
    - path: "pkg/storage/encrypt.go"
      provides: "Encrypt(plaintext, key) and Decrypt(ciphertext, key) using AES-256-GCM"
      exports: ["Encrypt", "Decrypt"]
    - path: "pkg/storage/crypto.go"
      provides: "DeriveKey(passphrase, salt) using Argon2id RFC 9106 params"
      exports: ["DeriveKey", "NewSalt"]
    - path: "pkg/storage/db.go"
      provides: "DB struct with Open(), Close(), WAL mode, embedded schema migration"
      exports: ["DB", "Open"]
    - path: "pkg/storage/findings.go"
      provides: "SaveFinding(finding, encKey) and ListFindings(encKey) CRUD"
      exports: ["SaveFinding", "ListFindings", "Finding"]
    - path: "pkg/storage/schema.sql"
      provides: "CREATE TABLE statements for findings, scans, settings"
      contains: "CREATE TABLE IF NOT EXISTS findings"
  key_links:
    - from: "pkg/storage/findings.go"
      to: "pkg/storage/encrypt.go"
      via: "Encrypt() called before INSERT, Decrypt() called after SELECT"
      pattern: "Encrypt|Decrypt"
    - from: "pkg/storage/db.go"
      to: "pkg/storage/schema.sql"
      via: "//go:embed schema.sql and db.Exec on open"
      pattern: "go:embed.*schema"
    - from: "pkg/storage/crypto.go"
      to: "golang.org/x/crypto/argon2"
      via: "argon2.IDKey call"
      pattern: "argon2\\.IDKey"
---

<objective>
Build the storage layer: AES-256-GCM column encryption, Argon2id key derivation, SQLite database with WAL mode and embedded schema, and Finding CRUD operations that transparently encrypt key values on write and decrypt on read.

Purpose: Scanner results from Plan 04 and CLI commands from Plan 05 need a storage layer to persist findings. The encryption contract (Encrypt/Decrypt/DeriveKey) must exist before the scanner pipeline can store keys.
Output: pkg/storage/{encrypt,crypto,db,findings}.go, pkg/storage/schema.sql, and db_test.go (stubs filled).
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/phases/01-foundation/01-RESEARCH.md
@.planning/phases/01-foundation/01-01-SUMMARY.md

<interfaces>
<!-- AES-256-GCM encrypt/decrypt pattern from RESEARCH.md Pattern 3 -->
func Encrypt(plaintext []byte, key []byte) ([]byte, error)
// key must be exactly 32 bytes (AES-256)
// nonce prepended to ciphertext in returned []byte
// uses crypto/aes + crypto/cipher GCM

func Decrypt(ciphertext []byte, key []byte) ([]byte, error)
// expects nonce-prepended format from Encrypt()
// returns ErrCiphertextTooShort if len < nonceSize

<!-- Argon2id key derivation pattern from RESEARCH.md Pattern 4 -->
func DeriveKey(passphrase []byte, salt []byte) []byte
// params: time=1, memory=64*1024, threads=4, keyLen=32
// returns exactly 32 bytes deterministically

func NewSalt() ([]byte, error)
// generates 16 random bytes via crypto/rand

<!-- SQLite schema — findings table -->
findings table columns:
  id            INTEGER PRIMARY KEY AUTOINCREMENT
  scan_id       INTEGER REFERENCES scans(id)
  provider_name TEXT NOT NULL
  key_value     BLOB NOT NULL  -- AES-256-GCM encrypted, nonce prepended
  key_masked    TEXT NOT NULL  -- first8...last4, stored plaintext for display
  confidence    TEXT NOT NULL  -- "high", "medium", "low"
  source_path   TEXT
  source_type   TEXT           -- "file", "dir", "git", "stdin", "url"
  line_number   INTEGER
  created_at    DATETIME DEFAULT CURRENT_TIMESTAMP

scans table columns:
  id            INTEGER PRIMARY KEY AUTOINCREMENT
  started_at    DATETIME NOT NULL
  finished_at   DATETIME
  source_path   TEXT
  finding_count INTEGER DEFAULT 0
  created_at    DATETIME DEFAULT CURRENT_TIMESTAMP

settings table columns:
  key        TEXT PRIMARY KEY
  value      TEXT NOT NULL
  updated_at DATETIME DEFAULT CURRENT_TIMESTAMP

<!-- Finding struct for inter-package communication -->
type Finding struct {
	ID           int64
	ScanID       int64
	ProviderName string
	KeyValue     string // plaintext — encrypted before storage
	KeyMasked    string // first8chars...last4chars
	Confidence   string
	SourcePath   string
	SourceType   string
	LineNumber   int
}

<!-- DB driver registration -->
import _ "modernc.org/sqlite"
// driver registered as "sqlite" (NOT "sqlite3")
db, err := sql.Open("sqlite", dataSourceName)
</interfaces>
</context>
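The `key_masked` column above follows a first8...last4 rule so findings can be displayed without exposing the key. A standalone sketch of that rule; the real helper is planned as `MaskKey` in pkg/storage/findings.go, and this inlined lowercase version is illustrative only:

```go
package main

import (
	"fmt"
	"strings"
)

// maskKey renders a key as its first 8 chars + "..." + last 4 chars.
// Keys shorter than 12 chars are fully replaced with asterisks so the
// prefix and suffix cannot overlap and leak the whole value.
func maskKey(key string) string {
	if len(key) < 12 {
		return strings.Repeat("*", len(key))
	}
	return key[:8] + "..." + key[len(key)-4:]
}

func main() {
	fmt.Println(maskKey("sk-proj-abcdefghijklmnop")) // sk-proj-...mnop
	fmt.Println(maskKey("short"))                    // *****
}
```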

<tasks>

<task type="auto" tdd="true">
<name>Task 1: AES-256-GCM encryption and Argon2id key derivation</name>
<files>pkg/storage/encrypt.go, pkg/storage/crypto.go</files>
<read_first>
- /home/salva/Documents/apikey/.planning/phases/01-foundation/01-RESEARCH.md (Pattern 3: AES-256-GCM Column Encryption and Pattern 4: Argon2id Key Derivation — exact code examples)
</read_first>
<behavior>
- Test 1: Encrypt then Decrypt same key → returns original plaintext exactly
- Test 2: Encrypt produces output longer than input (nonce + tag overhead)
- Test 3: Two Encrypt calls on same plaintext → different ciphertext (random nonce)
- Test 4: Decrypt with wrong key → returns error (GCM authentication fails)
- Test 5: DeriveKey with same passphrase+salt → same 32-byte output (deterministic)
- Test 6: DeriveKey output is exactly 32 bytes
- Test 7: NewSalt() returns 16 bytes, two calls return different values
</behavior>
<action>
Create **pkg/storage/encrypt.go**:
```go
package storage

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"io"
)

// ErrCiphertextTooShort is returned when ciphertext is shorter than the GCM nonce size.
var ErrCiphertextTooShort = errors.New("ciphertext too short")

// Encrypt encrypts plaintext using AES-256-GCM with a random nonce.
// The nonce is prepended to the returned ciphertext.
// key must be exactly 32 bytes (AES-256).
func Encrypt(plaintext []byte, key []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	// Seal appends encrypted data to nonce, so nonce is prepended
	ciphertext := gcm.Seal(nonce, nonce, plaintext, nil)
	return ciphertext, nil
}

// Decrypt decrypts ciphertext produced by Encrypt.
// Expects the nonce to be prepended to the ciphertext.
func Decrypt(ciphertext []byte, key []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonceSize := gcm.NonceSize()
	if len(ciphertext) < nonceSize {
		return nil, ErrCiphertextTooShort
	}
	nonce, ciphertext := ciphertext[:nonceSize], ciphertext[nonceSize:]
	return gcm.Open(nil, nonce, ciphertext, nil)
}
```

Create **pkg/storage/crypto.go**:
```go
package storage

import (
	"crypto/rand"

	"golang.org/x/crypto/argon2"
)

const (
	argon2Time    uint32 = 1
	argon2Memory  uint32 = 64 * 1024 // 64 MB — RFC 9106 Section 7.3
	argon2Threads uint8  = 4
	argon2KeyLen  uint32 = 32 // AES-256 key length
	saltSize             = 16
)

// DeriveKey produces a 32-byte AES-256 key from a passphrase and salt using Argon2id.
// Uses RFC 9106 Section 7.3 recommended parameters.
// Given the same passphrase and salt, always returns the same key.
func DeriveKey(passphrase []byte, salt []byte) []byte {
	return argon2.IDKey(passphrase, salt, argon2Time, argon2Memory, argon2Threads, argon2KeyLen)
}

// NewSalt generates a cryptographically random 16-byte salt.
// Store alongside the database and reuse on each key derivation.
func NewSalt() ([]byte, error) {
	salt := make([]byte, saltSize)
	if _, err := rand.Read(salt); err != nil {
		return nil, err
	}
	return salt, nil
}
```
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go build ./pkg/storage/... && echo "BUILD OK"</automated>
</verify>
<acceptance_criteria>
- `go build ./pkg/storage/...` exits 0
- pkg/storage/encrypt.go exports: Encrypt, Decrypt, ErrCiphertextTooShort
- pkg/storage/crypto.go exports: DeriveKey, NewSalt
- `grep -q 'argon2\.IDKey' pkg/storage/crypto.go` exits 0
- `grep -q 'crypto/aes' pkg/storage/encrypt.go` exits 0
- `grep -q 'cipher\.NewGCM' pkg/storage/encrypt.go` exits 0
</acceptance_criteria>
<done>Encrypt/Decrypt and DeriveKey/NewSalt exist and compile. Encryption uses AES-256-GCM with random nonce. Key derivation uses Argon2id RFC 9106 parameters.</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: SQLite database, schema, Finding CRUD, and filled test stubs</name>
<files>pkg/storage/schema.sql, pkg/storage/db.go, pkg/storage/findings.go, pkg/storage/db_test.go</files>
<read_first>
- /home/salva/Documents/apikey/.planning/phases/01-foundation/01-RESEARCH.md (STOR-01 row, Pattern 1 for embed usage pattern)
- /home/salva/Documents/apikey/pkg/storage/encrypt.go (Encrypt/Decrypt signatures)
- /home/salva/Documents/apikey/pkg/storage/crypto.go (DeriveKey signature)
</read_first>
<behavior>
- Test 1: Open(":memory:") returns *DB without error, schema tables exist
- Test 2: Encrypt/Decrypt roundtrip — Encrypt([]byte("sk-proj-abc"), key) then Decrypt returns "sk-proj-abc"
- Test 3: DeriveKey(passphrase, salt) twice returns identical 32 bytes
- Test 4: NewSalt() twice returns different slices
- Test 5: SaveFinding stores finding → ListFindings decrypts and returns KeyValue == "sk-proj-test"
- Test 6: Database file (when not :memory:) does NOT contain literal "sk-proj-test" in raw bytes
</behavior>
<action>
Create **pkg/storage/schema.sql**:
```sql
-- KeyHunter database schema
-- Version: 1

CREATE TABLE IF NOT EXISTS scans (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    started_at    DATETIME NOT NULL,
    finished_at   DATETIME,
    source_path   TEXT,
    finding_count INTEGER DEFAULT 0,
    created_at    DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS findings (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    scan_id       INTEGER REFERENCES scans(id),
    provider_name TEXT NOT NULL,
    key_value     BLOB NOT NULL,
    key_masked    TEXT NOT NULL,
    confidence    TEXT NOT NULL,
    source_path   TEXT,
    source_type   TEXT,
    line_number   INTEGER,
    created_at    DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS settings (
    key        TEXT PRIMARY KEY,
    value      TEXT NOT NULL,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Indexes for common queries
CREATE INDEX IF NOT EXISTS idx_findings_scan_id ON findings(scan_id);
CREATE INDEX IF NOT EXISTS idx_findings_provider ON findings(provider_name);
CREATE INDEX IF NOT EXISTS idx_findings_created ON findings(created_at DESC);
```

Create **pkg/storage/db.go**:
```go
package storage

import (
	"database/sql"
	_ "embed"
	"fmt"

	_ "modernc.org/sqlite"
)

//go:embed schema.sql
var schemaSQLBytes []byte

// DB wraps the sql.DB connection with KeyHunter-specific behavior.
type DB struct {
	sql *sql.DB
}

// Open opens or creates a SQLite database at path, runs embedded schema migrations,
// and enables WAL mode for better concurrent read performance.
// Use ":memory:" for tests.
func Open(path string) (*DB, error) {
	sqlDB, err := sql.Open("sqlite", path)
	if err != nil {
		return nil, fmt.Errorf("opening database: %w", err)
	}

	// Enable WAL mode for concurrent reads
	if _, err := sqlDB.Exec("PRAGMA journal_mode=WAL"); err != nil {
		sqlDB.Close()
		return nil, fmt.Errorf("enabling WAL mode: %w", err)
	}

	// Enable foreign keys
	if _, err := sqlDB.Exec("PRAGMA foreign_keys=ON"); err != nil {
		sqlDB.Close()
		return nil, fmt.Errorf("enabling foreign keys: %w", err)
	}

	// Run schema migrations
	if _, err := sqlDB.Exec(string(schemaSQLBytes)); err != nil {
		sqlDB.Close()
		return nil, fmt.Errorf("running schema migrations: %w", err)
	}

	return &DB{sql: sqlDB}, nil
}

// Close closes the underlying database connection.
func (db *DB) Close() error {
	return db.sql.Close()
}

// SQL returns the underlying sql.DB for advanced use cases.
func (db *DB) SQL() *sql.DB {
	return db.sql
}
```
|
||||||
|
|
||||||
|
Create **pkg/storage/findings.go**:

```go
package storage

import (
    "fmt"
    "time"
)

// Finding represents a detected API key with metadata.
// KeyValue is always plaintext in this struct — encryption happens at the storage boundary.
type Finding struct {
    ID           int64
    ScanID       int64
    ProviderName string
    KeyValue     string // plaintext — encrypted before storage, decrypted after retrieval
    KeyMasked    string // first8...last4, stored plaintext
    Confidence   string
    SourcePath   string
    SourceType   string
    LineNumber   int
    CreatedAt    time.Time
}

// MaskKey returns the masked form of a key: first 8 chars + "..." + last 4 chars.
// If the key is too short (< 12 chars), returns "****" so no part of it leaks.
func MaskKey(key string) string {
    if len(key) < 12 {
        return "****"
    }
    return key[:8] + "..." + key[len(key)-4:]
}

// SaveFinding encrypts the finding's KeyValue and persists the finding to the database.
// encKey must be a 32-byte AES-256 key (from DeriveKey).
func (db *DB) SaveFinding(f Finding, encKey []byte) (int64, error) {
    encrypted, err := Encrypt([]byte(f.KeyValue), encKey)
    if err != nil {
        return 0, fmt.Errorf("encrypting key value: %w", err)
    }

    masked := f.KeyMasked
    if masked == "" {
        masked = MaskKey(f.KeyValue)
    }

    res, err := db.sql.Exec(
        `INSERT INTO findings (scan_id, provider_name, key_value, key_masked, confidence, source_path, source_type, line_number)
         VALUES (?, ?, ?, ?, ?, ?, ?, ?)`,
        f.ScanID, f.ProviderName, encrypted, masked, f.Confidence, f.SourcePath, f.SourceType, f.LineNumber,
    )
    if err != nil {
        return 0, fmt.Errorf("inserting finding: %w", err)
    }
    return res.LastInsertId()
}

// ListFindings retrieves all findings, decrypting key values using encKey.
// encKey must be the same 32-byte key used during SaveFinding.
func (db *DB) ListFindings(encKey []byte) ([]Finding, error) {
    rows, err := db.sql.Query(
        `SELECT id, scan_id, provider_name, key_value, key_masked, confidence,
                source_path, source_type, line_number, created_at
         FROM findings ORDER BY created_at DESC`,
    )
    if err != nil {
        return nil, fmt.Errorf("querying findings: %w", err)
    }
    defer rows.Close()

    var findings []Finding
    for rows.Next() {
        var f Finding
        var encrypted []byte
        var createdAt string
        err := rows.Scan(
            &f.ID, &f.ScanID, &f.ProviderName, &encrypted, &f.KeyMasked,
            &f.Confidence, &f.SourcePath, &f.SourceType, &f.LineNumber, &createdAt,
        )
        if err != nil {
            return nil, fmt.Errorf("scanning finding row: %w", err)
        }
        plain, err := Decrypt(encrypted, encKey)
        if err != nil {
            return nil, fmt.Errorf("decrypting finding %d: %w", f.ID, err)
        }
        f.KeyValue = string(plain)
        f.CreatedAt, _ = time.Parse("2006-01-02 15:04:05", createdAt)
        findings = append(findings, f)
    }
    return findings, rows.Err()
}
```

Fill **pkg/storage/db_test.go** (replacing stubs from Plan 01):

```go
package storage_test

import (
    "testing"

    "github.com/salvacybersec/keyhunter/pkg/storage"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestDBOpen(t *testing.T) {
    db, err := storage.Open(":memory:")
    require.NoError(t, err)
    defer db.Close()

    // Verify schema tables exist
    rows, err := db.SQL().Query("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
    require.NoError(t, err)
    defer rows.Close()

    var tables []string
    for rows.Next() {
        var name string
        require.NoError(t, rows.Scan(&name))
        tables = append(tables, name)
    }
    assert.Contains(t, tables, "findings")
    assert.Contains(t, tables, "scans")
    assert.Contains(t, tables, "settings")
}

func TestEncryptDecryptRoundtrip(t *testing.T) {
    key := make([]byte, 32) // deterministic test key: bytes 0x00..0x1F
    for i := range key {
        key[i] = byte(i)
    }
    plaintext := []byte("sk-proj-supersecretapikey1234")

    ciphertext, err := storage.Encrypt(plaintext, key)
    require.NoError(t, err)
    assert.Greater(t, len(ciphertext), len(plaintext), "ciphertext should be longer than plaintext")

    recovered, err := storage.Decrypt(ciphertext, key)
    require.NoError(t, err)
    assert.Equal(t, plaintext, recovered)
}

func TestEncryptNonDeterministic(t *testing.T) {
    key := make([]byte, 32)
    plain := []byte("test-key")
    ct1, err1 := storage.Encrypt(plain, key)
    ct2, err2 := storage.Encrypt(plain, key)
    require.NoError(t, err1)
    require.NoError(t, err2)
    assert.NotEqual(t, ct1, ct2, "same plaintext encrypted twice should produce different ciphertext")
}

func TestDecryptWrongKey(t *testing.T) {
    key1 := make([]byte, 32)
    key2 := make([]byte, 32)
    key2[0] = 0xFF

    ct, err := storage.Encrypt([]byte("secret"), key1)
    require.NoError(t, err)

    _, err = storage.Decrypt(ct, key2)
    assert.Error(t, err, "decryption with wrong key should fail")
}

func TestArgon2KeyDerivation(t *testing.T) {
    passphrase := []byte("my-secure-passphrase")
    salt := []byte("1234567890abcdef") // 16 bytes

    key1 := storage.DeriveKey(passphrase, salt)
    key2 := storage.DeriveKey(passphrase, salt)

    assert.Equal(t, 32, len(key1), "derived key must be 32 bytes")
    assert.Equal(t, key1, key2, "same passphrase+salt must produce same key")
}

func TestNewSalt(t *testing.T) {
    salt1, err1 := storage.NewSalt()
    salt2, err2 := storage.NewSalt()
    require.NoError(t, err1)
    require.NoError(t, err2)
    assert.Equal(t, 16, len(salt1))
    assert.NotEqual(t, salt1, salt2, "two salts should differ")
}

func TestSaveFindingEncrypted(t *testing.T) {
    db, err := storage.Open(":memory:")
    require.NoError(t, err)
    defer db.Close()

    // Derive a test key
    key := storage.DeriveKey([]byte("testpassphrase"), []byte("testsalt1234xxxx"))

    f := storage.Finding{
        ProviderName: "openai",
        KeyValue:     "sk-proj-test1234567890abcdefghijklmnopqr",
        Confidence:   "high",
        SourcePath:   "/test/file.env",
        SourceType:   "file",
        LineNumber:   42,
    }

    id, err := db.SaveFinding(f, key)
    require.NoError(t, err)
    assert.Greater(t, id, int64(0))

    findings, err := db.ListFindings(key)
    require.NoError(t, err)
    require.Len(t, findings, 1)
    assert.Equal(t, "sk-proj-test1234567890abcdefghijklmnopqr", findings[0].KeyValue)
    assert.Equal(t, "openai", findings[0].ProviderName)
    // Verify masking
    assert.Equal(t, "sk-proj-...opqr", findings[0].KeyMasked)
}
```
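The Encrypt and Decrypt helpers exercised by these tests are created elsewhere in this plan (pkg/storage/encrypt.go). As a hedged sketch only, not the plan's authoritative implementation, a minimal stdlib AES-256-GCM pair consistent with the test expectations (random nonce prepended to the ciphertext, so encryption is non-deterministic and a wrong key fails GCM authentication) could look like this:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// Encrypt seals plaintext with AES-256-GCM. A fresh random nonce is prepended
// to the returned ciphertext, so encrypting the same input twice yields
// different bytes (matching TestEncryptNonDeterministic).
func Encrypt(plaintext, key []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // key must be 32 bytes for AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt splits off the nonce and opens the ciphertext. A wrong key fails
// GCM authentication and returns an error (matching TestDecryptWrongKey).
func Decrypt(ciphertext, key []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(ciphertext) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, sealed := ciphertext[:gcm.NonceSize()], ciphertext[gcm.NonceSize():]
	return gcm.Open(nil, nonce, sealed, nil)
}

func main() {
	key := make([]byte, 32)
	ct, _ := Encrypt([]byte("sk-proj-supersecret"), key)
	pt, err := Decrypt(ct, key)
	fmt.Println(string(pt), err == nil)
}
```

The ciphertext overhead is NonceSize (12) + GCM tag (16) bytes, which is why the roundtrip test can assert `len(ciphertext) > len(plaintext)`.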
</action>

<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/storage/... -v -count=1 2>&1 | tail -25</automated>
</verify>

<acceptance_criteria>
- `go test ./pkg/storage/... -v -count=1` exits 0 with all 7 tests PASS (no SKIP)
- TestDBOpen finds tables: findings, scans, settings
- TestEncryptDecryptRoundtrip passes — recovered plaintext matches original
- TestEncryptNonDeterministic passes — two encryptions differ
- TestDecryptWrongKey passes — wrong key causes error
- TestArgon2KeyDerivation passes — 32 bytes, deterministic
- TestNewSalt passes — 16 bytes, non-deterministic
- TestSaveFindingEncrypted passes — stored and retrieved with correct KeyValue and KeyMasked
- `grep -q 'go:embed.*schema' pkg/storage/db.go` exits 0
- `grep -q 'modernc.org/sqlite' pkg/storage/db.go` exits 0
- `grep -q 'journal_mode=WAL' pkg/storage/db.go` exits 0
</acceptance_criteria>

<done>Storage layer complete — SQLite opens with schema, AES-256-GCM encrypt/decrypt works, Argon2id key derivation works, SaveFinding/ListFindings encrypt/decrypt transparently. All 7 tests pass.</done>
</task>

</tasks>

<verification>
After both tasks:
- `go test ./pkg/storage/... -v -count=1` exits 0 with 7 tests PASS
- `go build ./...` still exits 0
- `grep -q 'argon2\.IDKey' pkg/storage/crypto.go` exits 0
- `grep -q 'cipher\.NewGCM' pkg/storage/encrypt.go` exits 0
- `grep -q 'journal_mode=WAL' pkg/storage/db.go` exits 0
- schema.sql contains CREATE TABLE for findings, scans, settings
</verification>

<success_criteria>
- SQLite database opens and auto-migrates from embedded schema.sql (STOR-01)
- AES-256-GCM column encryption works: Encrypt + Decrypt roundtrip returns original (STOR-02)
- Argon2id key derivation: DeriveKey deterministic, 32 bytes, RFC 9106 params (STOR-03)
- Finding CRUD: SaveFinding encrypts before INSERT, ListFindings decrypts after SELECT
- All 7 storage tests pass
</success_criteria>

<output>
After completion, create `.planning/phases/01-foundation/01-03-SUMMARY.md` following the summary template.
</output>

682 .planning/phases/01-foundation/01-04-PLAN.md Normal file
@@ -0,0 +1,682 @@
---
phase: 01-foundation
plan: 04
type: execute
wave: 2
depends_on: [01-02]
files_modified:
  - pkg/engine/chunk.go
  - pkg/engine/finding.go
  - pkg/engine/entropy.go
  - pkg/engine/filter.go
  - pkg/engine/detector.go
  - pkg/engine/engine.go
  - pkg/engine/sources/source.go
  - pkg/engine/sources/file.go
  - pkg/engine/scanner_test.go
autonomous: true
requirements: [CORE-01, CORE-04, CORE-05, CORE-06, CORE-07]

must_haves:
  truths:
    - "Shannon entropy function returns expected values for known inputs"
    - "Aho-Corasick pre-filter passes chunks containing provider keywords and drops those without"
    - "Detector correctly identifies OpenAI and Anthropic key patterns in test fixtures via regex"
    - "Full scan pipeline: scan testdata/samples/openai_key.txt → Finding with ProviderName==openai"
    - "Full scan pipeline: scan testdata/samples/no_keys.txt → zero findings"
    - "Worker pool uses ants v2 with configurable worker count"
  artifacts:
    - path: "pkg/engine/chunk.go"
      provides: "Chunk struct (Data []byte, Source string, Offset int64)"
      exports: ["Chunk"]
    - path: "pkg/engine/finding.go"
      provides: "Finding struct (provider, key value, masked, confidence, source, line)"
      exports: ["Finding", "MaskKey"]
    - path: "pkg/engine/entropy.go"
      provides: "Shannon(s string) float64 — ~10 line stdlib math implementation"
      exports: ["Shannon"]
    - path: "pkg/engine/filter.go"
      provides: "KeywordFilter stage — runs Aho-Corasick and passes/drops chunks"
      exports: ["KeywordFilter"]
    - path: "pkg/engine/detector.go"
      provides: "Detector stage — applies provider regexps and entropy check to chunks"
      exports: ["Detector"]
    - path: "pkg/engine/engine.go"
      provides: "Engine struct with Scan(ctx, src, cfg) <-chan Finding"
      exports: ["Engine", "NewEngine", "ScanConfig"]
    - path: "pkg/engine/sources/source.go"
      provides: "Source interface with Chunks(ctx, chan<- Chunk) error"
      exports: ["Source"]
    - path: "pkg/engine/sources/file.go"
      provides: "FileSource implementing Source for single-file scanning"
      exports: ["FileSource", "NewFileSource"]
  key_links:
    - from: "pkg/engine/engine.go"
      to: "pkg/providers/registry.go"
      via: "Engine holds *providers.Registry, uses Registry.AC() for pre-filter"
      pattern: "providers\\.Registry"
    - from: "pkg/engine/filter.go"
      to: "github.com/petar-dambovaliev/aho-corasick"
      via: "AC.FindAll() on each chunk"
      pattern: "FindAll"
    - from: "pkg/engine/detector.go"
      to: "pkg/engine/entropy.go"
      via: "Shannon() called when EntropyMin > 0 in pattern"
      pattern: "Shannon"
    - from: "pkg/engine/engine.go"
      to: "github.com/panjf2000/ants/v2"
      via: "ants.NewPool for detector workers"
      pattern: "ants\\.NewPool"
---

<objective>
Build the three-stage scanning engine pipeline: Aho-Corasick keyword pre-filter, regex + entropy detector workers using an ants goroutine pool, and a FileSource adapter. Wire them together in an Engine that emits Findings on a channel.

Purpose: The scan engine is the core differentiator. Plans 02 and 03 provide its dependencies (Registry for patterns + keywords, storage types for Finding). The CLI (Plan 05) calls Engine.Scan() to implement `keyhunter scan`.
Output: pkg/engine/{chunk,finding,entropy,filter,detector,engine}.go and sources/{source,file}.go. scanner_test.go stubs filled.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/phases/01-foundation/01-RESEARCH.md
@.planning/phases/01-foundation/01-02-SUMMARY.md

<interfaces>
<!-- Provider Registry types (from Plan 02) -->
package providers

type Provider struct {
    Name     string
    Keywords []string
    Patterns []Pattern
    Tier     int
}

type Pattern struct {
    Regex      string
    EntropyMin float64
    Confidence string
}

type Registry struct { ... }
func (r *Registry) List() []Provider
func (r *Registry) AC() ahocorasick.AhoCorasick // pre-built Aho-Corasick

<!-- Three-stage pipeline pattern from RESEARCH.md Pattern 2 -->
chunksChan     chan Chunk   (buffer: 1000)
detectableChan chan Chunk   (buffer: 500)
resultsChan    chan Finding (buffer: 100)

Stage 1: Source.Chunks() → chunksChan (goroutine, closes chan on done)
Stage 2: KeywordFilter(chunksChan) → detectableChan (goroutine, AC.FindAll)
Stage 3: N detector workers (ants pool) → resultsChan

<!-- ScanConfig -->
type ScanConfig struct {
    Workers int  // default: runtime.NumCPU() * 8
    Verify  bool // Phase 5 — always false in Phase 1
    Unmask  bool // for output layer
}

<!-- Source interface -->
type Source interface {
    Chunks(ctx context.Context, out chan<- Chunk) error
}

<!-- FileSource -->
type FileSource struct {
    Path      string
    ChunkSize int // bytes per chunk, default 4096
}

Chunking strategy: read the file in chunks of ChunkSize bytes with an overlap of max(256, maxPatternLen) so that a key is never split across a chunk boundary.

<!-- Aho-Corasick import -->
import ahocorasick "github.com/petar-dambovaliev/aho-corasick"
// ac.FindAll(s string) []ahocorasick.Match — returns match positions

<!-- ants import -->
import "github.com/panjf2000/ants/v2"
// pool, _ := ants.NewPool(workers, ants.WithOptions(...))
// pool.Submit(func() { ... })
// pool.ReleaseWithTimeout(timeout)
</interfaces>
</context>

<tasks>

<task type="auto" tdd="true">
<name>Task 1: Core types and Shannon entropy function</name>
<files>pkg/engine/chunk.go, pkg/engine/finding.go, pkg/engine/entropy.go</files>
<read_first>
- /home/salva/Documents/apikey/.planning/phases/01-foundation/01-RESEARCH.md (CORE-04 row: Shannon entropy, ~10-line stdlib function, threshold 3.5 bits/char)
- /home/salva/Documents/apikey/pkg/storage/findings.go (Finding and MaskKey defined there — engine.Finding is a separate type for the pipeline)
</read_first>
<behavior>
- Test 1: Shannon("aaaaaaa") → value near 0.0 (all same characters, no entropy)
- Test 2: Shannon("abcdefgh") → value near 3.0 (8 distinct chars)
- Test 3: Shannon("sk-proj-ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr") → >= 3.5 (real key entropy)
- Test 4: Shannon("") → 0.0 (empty string)
- Test 5: MaskKey("sk-proj-abc1234") → "sk-proj-...1234" (first 8 + last 4)
- Test 6: MaskKey("abc") → "****" (too short to mask)
</behavior>
<action>
Create **pkg/engine/chunk.go**:

```go
package engine

// Chunk is a segment of file content passed through the scanning pipeline.
type Chunk struct {
    Data   []byte // raw bytes
    Source string // file path, URL, or description
    Offset int64  // byte offset of this chunk within the source
}
```

Create **pkg/engine/finding.go**:

```go
package engine

import "time"

// Finding represents a detected API key from the scanning pipeline.
// KeyValue holds the plaintext key — the storage layer encrypts it before persisting.
type Finding struct {
    ProviderName string
    KeyValue     string // full plaintext key
    KeyMasked    string // first8...last4
    Confidence   string // "high", "medium", "low"
    Source       string // file path or description
    SourceType   string // "file", "dir", "git", "stdin", "url"
    LineNumber   int
    Offset       int64
    DetectedAt   time.Time
}

// MaskKey returns a masked representation: first 8 chars + "..." + last 4 chars.
// Returns "****" if the key is shorter than 12 characters.
func MaskKey(key string) string {
    if len(key) < 12 {
        return "****"
    }
    return key[:8] + "..." + key[len(key)-4:]
}
```

Create **pkg/engine/entropy.go**:

```go
package engine

import "math"

// Shannon computes the Shannon entropy of a string in bits per character.
// Returns 0.0 for empty strings.
// A value >= 3.5 indicates high randomness, consistent with real API keys.
func Shannon(s string) float64 {
    if len(s) == 0 {
        return 0.0
    }
    freq := make(map[rune]float64)
    for _, c := range s {
        freq[c]++
    }
    n := float64(len([]rune(s)))
    var entropy float64
    for _, count := range freq {
        p := count / n
        entropy -= p * math.Log2(p)
    }
    return entropy
}
```
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go build ./pkg/engine/... && echo "BUILD OK"</automated>
</verify>
<acceptance_criteria>
- `go build ./pkg/engine/...` exits 0
- pkg/engine/chunk.go exports Chunk with fields Data, Source, Offset
- pkg/engine/finding.go exports Finding and MaskKey
- pkg/engine/entropy.go exports Shannon using math.Log2
- `grep -q 'math\.Log2' pkg/engine/entropy.go` exits 0
- Shannon("aaaaaaa") == 0.0 (manually verifiable from code)
- MaskKey("sk-proj-abc1234") produces "sk-proj-...1234"
</acceptance_criteria>
<done>Chunk, Finding, MaskKey, and Shannon exist and compile. Shannon uses stdlib math only — no external library.</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: Pipeline stages, engine orchestration, FileSource, and filled test stubs</name>
<files>
pkg/engine/filter.go,
pkg/engine/detector.go,
pkg/engine/engine.go,
pkg/engine/sources/source.go,
pkg/engine/sources/file.go,
pkg/engine/scanner_test.go
</files>
<read_first>
- /home/salva/Documents/apikey/.planning/phases/01-foundation/01-RESEARCH.md (Pattern 2: Three-Stage Scanning Pipeline — exact channel-based code example)
- /home/salva/Documents/apikey/pkg/engine/chunk.go
- /home/salva/Documents/apikey/pkg/engine/finding.go
- /home/salva/Documents/apikey/pkg/engine/entropy.go
- /home/salva/Documents/apikey/pkg/providers/registry.go (Registry.AC() and Registry.List() signatures)
</read_first>
<behavior>
- Test 1: Scan testdata/samples/openai_key.txt → 1 finding, ProviderName=="openai", KeyValue contains "sk-proj-"
- Test 2: Scan testdata/samples/anthropic_key.txt → 1 finding, ProviderName=="anthropic"
- Test 3: Scan testdata/samples/no_keys.txt → 0 findings
- Test 4: Scan testdata/samples/multiple_keys.txt → 2 findings (openai + anthropic)
- Test 5: Shannon("sk-proj-ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr") >= 3.5 (entropy check)
- Test 6: KeywordFilter drops a chunk with text "hello world" (no provider keywords)
</behavior>
<action>
Create **pkg/engine/sources/source.go**:

```go
package sources

import (
    "context"

    "github.com/salvacybersec/keyhunter/pkg/engine"
)

// Source is the interface all input adapters must implement.
// Chunks writes content segments to the out channel until the source is exhausted or ctx is cancelled.
type Source interface {
    Chunks(ctx context.Context, out chan<- engine.Chunk) error
}
```

Create **pkg/engine/sources/file.go**:

```go
package sources

import (
    "context"
    "os"

    "github.com/salvacybersec/keyhunter/pkg/engine"
)

const defaultChunkSize = 4096
const chunkOverlap = 256 // overlap between chunks to avoid splitting keys at boundaries

// FileSource reads a single file and emits overlapping chunks.
type FileSource struct {
    Path      string
    ChunkSize int
}

// NewFileSource creates a FileSource for the given path with the default chunk size.
func NewFileSource(path string) *FileSource {
    return &FileSource{Path: path, ChunkSize: defaultChunkSize}
}

// Chunks reads the file in overlapping segments and sends each chunk to out.
func (f *FileSource) Chunks(ctx context.Context, out chan<- engine.Chunk) error {
    data, err := os.ReadFile(f.Path)
    if err != nil {
        return err
    }
    size := f.ChunkSize
    if size <= 0 {
        size = defaultChunkSize
    }
    if len(data) <= size {
        // File fits in one chunk
        select {
        case <-ctx.Done():
            return ctx.Err()
        case out <- engine.Chunk{Data: data, Source: f.Path, Offset: 0}:
        }
        return nil
    }
    // Emit overlapping chunks. Offset is the chunk's start position in the
    // file — a running total of chunk lengths would double-count the overlap.
    for start := 0; start < len(data); start += size - chunkOverlap {
        end := start + size
        if end > len(data) {
            end = len(data)
        }
        chunk := engine.Chunk{
            Data:   data[start:end],
            Source: f.Path,
            Offset: int64(start),
        }
        select {
        case <-ctx.Done():
            return ctx.Err()
        case out <- chunk:
        }
        if end == len(data) {
            break
        }
    }
    return nil
}
```
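The overlap arithmetic above matters: chunk starts advance by ChunkSize - chunkOverlap, so any token no longer than the overlap appears whole in at least one chunk even when it straddles a boundary. A small standalone sketch of the windowing (mirroring the loop in file.go with toy sizes; an in-memory buffer stands in for the file):

```go
package main

import (
	"fmt"
	"strings"
)

const (
	chunkSize    = 64 // small values for illustration; the plan uses 4096
	chunkOverlap = 16
)

// windows returns the [start, end) chunk boundaries the FileSource loop
// would produce for a buffer of length n.
func windows(n int) [][2]int {
	var out [][2]int
	if n <= chunkSize {
		return [][2]int{{0, n}}
	}
	for start := 0; start < n; start += chunkSize - chunkOverlap {
		end := start + chunkSize
		if end > n {
			end = n
		}
		out = append(out, [2]int{start, end})
		if end == n {
			break
		}
	}
	return out
}

func main() {
	// Place a 12-byte token straddling the first chunk boundary (bytes 58..70).
	data := strings.Repeat("x", 58) + "sk-test-key1" + strings.Repeat("x", 58)
	for _, w := range windows(len(data)) {
		chunk := data[w[0]:w[1]]
		fmt.Printf("chunk [%3d,%3d) contains key: %v\n",
			w[0], w[1], strings.Contains(chunk, "sk-test-key1"))
	}
}
```

The first chunk [0,64) only sees a prefix of the key, but the second window [48,112) starts 16 bytes before the boundary and captures it whole. The same reasoning caps detectable key length at the overlap size, which is why the interfaces section suggests max(256, maxPatternLen).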
Create **pkg/engine/filter.go**:

```go
package engine

import (
    ahocorasick "github.com/petar-dambovaliev/aho-corasick"
)

// KeywordFilter filters a stream of chunks using an Aho-Corasick automaton.
// Only chunks that contain at least one provider keyword are sent to out.
// This is Stage 2 of the pipeline (runs after Source, before Detector).
func KeywordFilter(ac ahocorasick.AhoCorasick, in <-chan Chunk, out chan<- Chunk) {
    for chunk := range in {
        if len(ac.FindAll(string(chunk.Data))) > 0 {
            out <- chunk
        }
    }
}
```

Create **pkg/engine/detector.go**:

```go
package engine

import (
    "regexp"
    "strings"
    "time"

    "github.com/salvacybersec/keyhunter/pkg/providers"
)

// Detect applies provider regex patterns and optional entropy checks to a chunk.
// It returns all findings from the chunk.
func Detect(chunk Chunk, providerList []providers.Provider) []Finding {
    var findings []Finding
    content := string(chunk.Data)

    for _, p := range providerList {
        for _, pat := range p.Patterns {
            re, err := regexp.Compile(pat.Regex)
            if err != nil {
                continue // invalid regex — skip silently
            }
            matches := re.FindAllString(content, -1)
            for _, match := range matches {
                // Apply entropy check if threshold is set
                if pat.EntropyMin > 0 && Shannon(match) < pat.EntropyMin {
                    continue // entropy too low — likely a placeholder
                }
                line := lineNumber(content, match)
                findings = append(findings, Finding{
                    ProviderName: p.Name,
                    KeyValue:     match,
                    KeyMasked:    MaskKey(match),
                    Confidence:   pat.Confidence,
                    Source:       chunk.Source,
                    SourceType:   "file",
                    LineNumber:   line,
                    Offset:       chunk.Offset,
                    DetectedAt:   time.Now(),
                })
            }
        }
    }
    return findings
}

// lineNumber returns the 1-based line number of the first occurrence of match
// in content. If the same key appears more than once, every finding reports
// that first line.
func lineNumber(content, match string) int {
    idx := strings.Index(content, match)
    if idx < 0 {
        return 0
    }
    return strings.Count(content[:idx], "\n") + 1
}
```

Create **pkg/engine/engine.go**:

```go
package engine

import (
    "context"
    "runtime"
    "sync"
    "time"

    "github.com/panjf2000/ants/v2"
    "github.com/salvacybersec/keyhunter/pkg/providers"
)

// Source is the input-adapter interface. It is declared here rather than
// imported from pkg/engine/sources because that package imports pkg/engine
// (for Chunk) — importing it back would create an import cycle. Thanks to
// Go's structural interfaces, sources.FileSource satisfies this type as-is.
type Source interface {
    Chunks(ctx context.Context, out chan<- Chunk) error
}

// ScanConfig controls scan execution parameters.
type ScanConfig struct {
    Workers int  // number of detector goroutines; defaults to runtime.NumCPU() * 8
    Verify  bool // opt-in active verification (Phase 5)
    Unmask  bool // include full key in Finding.KeyValue
}

// Engine orchestrates the three-stage scanning pipeline.
type Engine struct {
    registry *providers.Registry
}

// NewEngine creates an Engine backed by the given provider registry.
func NewEngine(registry *providers.Registry) *Engine {
    return &Engine{registry: registry}
}

// Scan runs the three-stage pipeline against src and returns a channel of Findings.
// The channel is closed when all chunks have been processed.
// The caller must drain the channel fully or cancel ctx to avoid goroutine leaks.
func (e *Engine) Scan(ctx context.Context, src Source, cfg ScanConfig) (<-chan Finding, error) {
    workers := cfg.Workers
    if workers <= 0 {
        workers = runtime.NumCPU() * 8
    }

    // Create the worker pool first so a pool error aborts the scan before
    // any pipeline goroutines start (nothing to leak on the error path).
    pool, err := ants.NewPool(workers)
    if err != nil {
        return nil, err
    }
    providerList := e.registry.List()

    chunksChan := make(chan Chunk, 1000)
    detectableChan := make(chan Chunk, 500)
    resultsChan := make(chan Finding, 100)

    // Stage 1: source → chunksChan
    go func() {
        defer close(chunksChan)
        _ = src.Chunks(ctx, chunksChan)
    }()

    // Stage 2: keyword pre-filter → detectableChan
    go func() {
        defer close(detectableChan)
        KeywordFilter(e.registry.AC(), chunksChan, detectableChan)
    }()

    // Stage 3: detector workers → resultsChan
    var wg sync.WaitGroup
    go func() {
        defer func() {
            wg.Wait()
            close(resultsChan)
            pool.ReleaseWithTimeout(5 * time.Second)
        }()

        for chunk := range detectableChan {
            c := chunk // capture for the closure (pre-Go 1.22 loop semantics)
            wg.Add(1)
            _ = pool.Submit(func() {
                defer wg.Done()
                // Channel sends are goroutine-safe; no mutex is needed here.
                for _, f := range Detect(c, providerList) {
                    select {
                    case resultsChan <- f:
                    case <-ctx.Done():
                        return
                    }
                }
            })
        }
    }()

    return resultsChan, nil
}
```

Fill **pkg/engine/scanner_test.go** (replacing stubs from Plan 01):

```go
package engine_test

import (
    "context"
    "testing"

    "github.com/salvacybersec/keyhunter/pkg/engine"
    "github.com/salvacybersec/keyhunter/pkg/engine/sources"
    "github.com/salvacybersec/keyhunter/pkg/providers"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func newTestRegistry(t *testing.T) *providers.Registry {
    t.Helper()
    reg, err := providers.NewRegistry()
    require.NoError(t, err)
    return reg
}

func TestShannonEntropy(t *testing.T) {
    assert.InDelta(t, 0.0, engine.Shannon("aaaaaaa"), 0.01)
    assert.Greater(t, engine.Shannon("sk-proj-ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr"), 3.5)
    assert.Equal(t, 0.0, engine.Shannon(""))
}

func TestKeywordPreFilter(t *testing.T) {
    reg := newTestRegistry(t)
    ac := reg.AC()

    // Chunk with OpenAI keyword should pass
    matches := ac.FindAll("export OPENAI_API_KEY=sk-proj-test")
    assert.NotEmpty(t, matches)

    // Chunk with no keywords should be dropped
    noMatches := ac.FindAll("hello world no secrets here")
    assert.Empty(t, noMatches)
}

func TestScannerPipelineOpenAI(t *testing.T) {
    reg := newTestRegistry(t)
    eng := engine.NewEngine(reg)
    src := sources.NewFileSource("../../testdata/samples/openai_key.txt")
    cfg := engine.ScanConfig{Workers: 2}

    ch, err := eng.Scan(context.Background(), src, cfg)
    require.NoError(t, err)

    var findings []engine.Finding
    for f := range ch {
        findings = append(findings, f)
    }

    require.Len(t, findings, 1, "expected exactly 1 finding in openai_key.txt")
    assert.Equal(t, "openai", findings[0].ProviderName)
    assert.Contains(t, findings[0].KeyValue, "sk-proj-")
}

func TestScannerPipelineNoKeys(t *testing.T) {
    reg := newTestRegistry(t)
    eng := engine.NewEngine(reg)
    src := sources.NewFileSource("../../testdata/samples/no_keys.txt")
    cfg := engine.ScanConfig{Workers: 2}

    ch, err := eng.Scan(context.Background(), src, cfg)
    require.NoError(t, err)

    var findings []engine.Finding
    for f := range ch {
        findings = append(findings, f)
    }

    assert.Empty(t, findings, "expected zero findings in no_keys.txt")
}

func TestScannerPipelineMultipleKeys(t *testing.T) {
    reg := newTestRegistry(t)
    eng := engine.NewEngine(reg)
    src := sources.NewFileSource("../../testdata/samples/multiple_keys.txt")
    cfg := engine.ScanConfig{Workers: 2}

    ch, err := eng.Scan(context.Background(), src, cfg)
    require.NoError(t, err)

    var findings []engine.Finding
    for f := range ch {
        findings = append(findings, f)
    }

    assert.GreaterOrEqual(t, len(findings), 2, "expected at least 2 findings in multiple_keys.txt")

    var names []string
    for _, f := range findings {
        names = append(names, f.ProviderName)
    }
    assert.Contains(t, names, "openai")
    assert.Contains(t, names, "anthropic")
}
```
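
As an aside for reviewers (not part of scanner_test.go): the pre-filter behavior that TestKeywordPreFilter exercises can be approximated with plain substring search. The sketch below is a naive stand-in for the Aho-Corasick automaton the registry builds — O(content × keywords) instead of a single pass, but matching-equivalent; the keyword list is illustrative, not the registry's actual set.

```go
package main

import (
	"fmt"
	"strings"
)

// findKeywords reports which keywords occur in content.
// Naive stand-in for the registry's Aho-Corasick automaton: same matches,
// but one Contains scan per keyword instead of a single automaton pass.
func findKeywords(content string, keywords []string) []string {
	var hits []string
	for _, kw := range keywords {
		if strings.Contains(content, kw) {
			hits = append(hits, kw)
		}
	}
	return hits
}

func main() {
	keywords := []string{"OPENAI_API_KEY", "sk-proj-", "sk-ant-"}
	fmt.Println(findKeywords("export OPENAI_API_KEY=sk-proj-test", keywords)) // [OPENAI_API_KEY sk-proj-]
	fmt.Println(len(findKeywords("hello world no secrets here", keywords)))   // 0
}
```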
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go test ./pkg/engine/... -v -count=1 2>&1 | tail -30</automated>
</verify>
<acceptance_criteria>
- `go test ./pkg/engine/... -v -count=1` exits 0 with all tests PASS (no SKIP)
- TestShannonEntropy passes — 0.0 for "aaaaaaa", >= 3.5 for real key pattern
- TestKeywordPreFilter passes — AC matches sk-proj-, empty for "hello world"
- TestScannerPipelineOpenAI passes — 1 finding with ProviderName=="openai"
- TestScannerPipelineNoKeys passes — 0 findings
- TestScannerPipelineMultipleKeys passes — >= 2 findings with both provider names
- `grep -q 'ants\.NewPool' pkg/engine/engine.go` exits 0
- `grep -q 'KeywordFilter' pkg/engine/engine.go` exits 0
- `go build ./...` still exits 0
</acceptance_criteria>
<done>Three-stage scanning pipeline works end-to-end: FileSource → KeywordFilter (AC) → Detect (regex + entropy) → Finding channel. All engine tests pass.</done>
</task>

</tasks>

<verification>
After both tasks:
- `go test ./pkg/engine/... -v -count=1` exits 0 with 6 tests PASS
- `go build ./...` exits 0
- `grep -q 'ants\.NewPool' pkg/engine/engine.go` exits 0
- `grep -q 'math\.Log2' pkg/engine/entropy.go` exits 0
- Scanning testdata/samples/openai_key.txt returns 1 finding with provider "openai"
- Scanning testdata/samples/no_keys.txt returns 0 findings
</verification>

<success_criteria>
- Three-stage pipeline: AC pre-filter → regex + entropy detector → results channel (CORE-01, CORE-06)
- Shannon entropy function using stdlib math (CORE-04)
- ants v2 goroutine pool with configurable worker count (CORE-05)
- FileSource adapter reading files in overlapping chunks (CORE-07 partial — full mmap in Phase 4)
- All engine tests pass against real testdata fixtures
</success_criteria>

<output>
After completion, create `.planning/phases/01-foundation/01-04-SUMMARY.md` following the summary template.
</output>
748 .planning/phases/01-foundation/01-05-PLAN.md (new file)
@@ -0,0 +1,748 @@
---
phase: 01-foundation
plan: 05
type: execute
wave: 3
depends_on: [01-02, 01-03, 01-04]
files_modified:
  - cmd/root.go
  - cmd/scan.go
  - cmd/providers.go
  - cmd/config.go
  - pkg/config/config.go
  - pkg/output/table.go
autonomous: false
requirements: [CLI-01, CLI-02, CLI-03, CLI-04, CLI-05]

must_haves:
  truths:
    - "`keyhunter scan ./testdata/samples/openai_key.txt` runs the pipeline and prints a finding"
    - "`keyhunter providers list` prints a table with at least 3 providers"
    - "`keyhunter providers info openai` prints OpenAI provider details"
    - "`keyhunter config init` creates ~/.keyhunter.yaml without error"
    - "`keyhunter config set scan.workers 16` persists the value to ~/.keyhunter.yaml"
    - "`keyhunter --help` shows all top-level commands: scan, providers, config"
  artifacts:
    - path: "cmd/root.go"
      provides: "Cobra root command with OnInitialize config loading"
      contains: "cobra.Command"
    - path: "cmd/scan.go"
      provides: "scan command wiring Engine + FileSource + output table"
      exports: ["scanCmd"]
    - path: "cmd/providers.go"
      provides: "providers list/info/stats subcommands using Registry"
      exports: ["providersCmd"]
    - path: "cmd/config.go"
      provides: "config init/set/get subcommands using Viper"
      exports: ["configCmd"]
    - path: "pkg/config/config.go"
      provides: "Config struct with Load() and defaults"
      exports: ["Config", "Load"]
    - path: "pkg/output/table.go"
      provides: "lipgloss terminal table for printing Findings"
      exports: ["PrintFindings"]
  key_links:
    - from: "cmd/scan.go"
      to: "pkg/engine/engine.go"
      via: "engine.NewEngine(registry).Scan() called in RunE"
      pattern: "engine\\.NewEngine"
    - from: "cmd/scan.go"
      to: "pkg/storage/db.go"
      via: "storage.Open() called, SaveFinding for each result"
      pattern: "storage\\.Open"
    - from: "cmd/root.go"
      to: "github.com/spf13/viper"
      via: "viper.SetConfigFile in initConfig (cobra.OnInitialize)"
      pattern: "viper\\.SetConfigFile"
    - from: "cmd/providers.go"
      to: "pkg/providers/registry.go"
      via: "Registry.List(), Registry.Get(), Registry.Stats() called"
      pattern: "reg\\.List|reg\\.Get|reg\\.Stats"
---

<objective>
Wire all subsystems together through the Cobra CLI: scan command (engine + storage + output), providers list/info/stats commands, and config init/set/get commands. This is the integration layer — all business logic lives in pkg/, cmd/ only wires.

Purpose: Satisfies all Phase 1 CLI requirements and delivers the first working `keyhunter scan` command that completes the end-to-end success criteria.
Output: cmd/{root,scan,providers,config}.go, pkg/config/config.go, pkg/output/table.go.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/phases/01-foundation/01-RESEARCH.md
@.planning/phases/01-foundation/01-02-SUMMARY.md
@.planning/phases/01-foundation/01-03-SUMMARY.md
@.planning/phases/01-foundation/01-04-SUMMARY.md

<interfaces>
<!-- Engine (from Plan 04) -->
package engine
type ScanConfig struct { Workers int; Verify bool; Unmask bool }
func NewEngine(registry *providers.Registry) *Engine
func (e *Engine) Scan(ctx context.Context, src sources.Source, cfg ScanConfig) (<-chan Finding, error)

<!-- FileSource (from Plan 04) -->
package sources
func NewFileSource(path string) *FileSource

<!-- Finding type (from Plan 04) -->
type Finding struct {
    ProviderName string
    KeyValue     string
    KeyMasked    string
    Confidence   string
    Source       string
    LineNumber   int
}

<!-- Storage (from Plan 03) -->
package storage
func Open(path string) (*DB, error)
func (db *DB) SaveFinding(f Finding, encKey []byte) (int64, error)
func DeriveKey(passphrase []byte, salt []byte) []byte
func NewSalt() ([]byte, error)

<!-- Config defaults -->
DBPath: ~/.keyhunter/keyhunter.db
ConfigPath: ~/.keyhunter.yaml
Workers: runtime.NumCPU() * 8
Passphrase: (prompt if not in env KEYHUNTER_PASSPHRASE — Phase 1: use empty string as dev default)

<!-- Registry (from Plan 02) -->
package providers
func NewRegistry() (*Registry, error)
func (r *Registry) List() []Provider
func (r *Registry) Get(name string) (Provider, bool)
func (r *Registry) Stats() RegistryStats

<!-- Viper config keys -->
"database.path" → DBPath
"scan.workers" → Workers
"encryption.passphrase" → Passphrase (sensitive — warn in help)

<!-- lipgloss table output -->
Columns: PROVIDER | MASKED KEY | CONFIDENCE | SOURCE | LINE
Colors: use lipgloss.NewStyle().Foreground() for confidence: high=green, medium=yellow, low=red
</interfaces>
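
The Finding shape above deliberately separates KeyValue from KeyMasked. As an illustration only (the plan does not specify its masking rule, and this helper is hypothetical), a prefix/suffix mask might look like:

```go
package main

import "fmt"

// maskKey keeps a short prefix and suffix and hides the middle, the kind of
// value a Finding.KeyMasked field would carry. Hypothetical helper — the
// plan's detector defines the real masking rule.
func maskKey(key string) string {
	const keep = 4
	if len(key) <= keep*2 {
		return "****" // too short to reveal anything safely
	}
	return key[:keep] + "****" + key[len(key)-keep:]
}

func main() {
	fmt.Println(maskKey("sk-proj-ABCDEFGHIJKLMNOP")) // sk-p****MNOP
	fmt.Println(maskKey("short"))                    // ****
}
```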
</context>

<tasks>

<task type="auto" tdd="false">
<name>Task 1: Config package, output table, and root command</name>
<files>pkg/config/config.go, pkg/output/table.go, cmd/root.go</files>
<read_first>
- /home/salva/Documents/apikey/.planning/phases/01-foundation/01-RESEARCH.md (CLI-01, CLI-02, CLI-03 rows, Standard Stack: cobra v1.10.2 + viper v1.21.0)
- /home/salva/Documents/apikey/pkg/engine/finding.go (Finding struct fields for output)
</read_first>
<action>
Create **pkg/config/config.go**:

```go
package config

import (
    "os"
    "path/filepath"
    "runtime"
)

// Config holds all KeyHunter runtime configuration.
// Values are populated from ~/.keyhunter.yaml, environment variables, and CLI flags (in that precedence order).
type Config struct {
    DBPath     string // path to SQLite database file
    ConfigPath string // path to config YAML file
    Workers    int    // number of scanner worker goroutines
    Passphrase string // encryption passphrase (sensitive)
}

// Load returns a Config with defaults applied.
// Callers should override individual fields after Load() using viper-bound values.
func Load() Config {
    home, _ := os.UserHomeDir()
    return Config{
        DBPath:     filepath.Join(home, ".keyhunter", "keyhunter.db"),
        ConfigPath: filepath.Join(home, ".keyhunter.yaml"),
        Workers:    runtime.NumCPU() * 8,
        Passphrase: "", // Phase 1: empty passphrase; Phase 6+ will prompt
    }
}
```

Create **pkg/output/table.go**:

```go
package output

import (
    "fmt"
    "os"

    "github.com/charmbracelet/lipgloss"

    "github.com/salvacybersec/keyhunter/pkg/engine"
)

var (
    styleHigh   = lipgloss.NewStyle().Foreground(lipgloss.Color("2")) // green
    styleMedium = lipgloss.NewStyle().Foreground(lipgloss.Color("3")) // yellow
    styleLow    = lipgloss.NewStyle().Foreground(lipgloss.Color("1")) // red
    styleHeader = lipgloss.NewStyle().Bold(true).Underline(true)
)

// PrintFindings writes findings as a colored terminal table to stdout.
// If unmask is true, KeyValue is shown; otherwise KeyMasked is shown.
func PrintFindings(findings []engine.Finding, unmask bool) {
    if len(findings) == 0 {
        fmt.Println("No API keys found.")
        return
    }

    // Header
    fmt.Fprintf(os.Stdout, "%-20s %-40s %-10s %-30s %s\n",
        styleHeader.Render("PROVIDER"),
        styleHeader.Render("KEY"),
        styleHeader.Render("CONFIDENCE"),
        styleHeader.Render("SOURCE"),
        styleHeader.Render("LINE"),
    )
    fmt.Println(lipgloss.NewStyle().Foreground(lipgloss.Color("8")).Render(
        "──────────────────────────────────────────────────────────────────────────────────────────────────────────",
    ))

    for _, f := range findings {
        keyDisplay := f.KeyMasked
        if unmask {
            keyDisplay = f.KeyValue
        }

        confStyle := styleLow
        switch f.Confidence {
        case "high":
            confStyle = styleHigh
        case "medium":
            confStyle = styleMedium
        }

        fmt.Fprintf(os.Stdout, "%-20s %-40s %-10s %-30s %d\n",
            f.ProviderName,
            keyDisplay,
            confStyle.Render(f.Confidence),
            truncate(f.Source, 28),
            f.LineNumber,
        )
    }
    fmt.Printf("\n%d key(s) found.\n", len(findings))
}

// truncate keeps the tail of s (the most specific path segments) when it
// exceeds max characters.
func truncate(s string, max int) string {
    if len(s) <= max {
        return s
    }
    return "..." + s[len(s)-max+3:]
}
```

Create **cmd/root.go** (replaces the stub from Plan 01):

```go
package cmd

import (
    "fmt"
    "os"
    "path/filepath"

    "github.com/spf13/cobra"
    "github.com/spf13/viper"
)

var cfgFile string

// rootCmd is the base command when called without any subcommands.
var rootCmd = &cobra.Command{
    Use:   "keyhunter",
    Short: "KeyHunter — detect leaked LLM API keys across 108+ providers",
    Long: `KeyHunter scans files, git history, and internet sources for leaked LLM API keys.
Supports 108+ providers with Aho-Corasick pre-filtering and regex + entropy detection.`,
    SilenceUsage: true,
}

// Execute is the entry point called by main.go.
func Execute() {
    if err := rootCmd.Execute(); err != nil {
        os.Exit(2) // CLI-05: 2 = error (1 is reserved for "keys found")
    }
}

func init() {
    cobra.OnInitialize(initConfig)
    rootCmd.PersistentFlags().StringVar(&cfgFile, "config", "", "config file (default: ~/.keyhunter.yaml)")
    rootCmd.AddCommand(scanCmd)
    rootCmd.AddCommand(providersCmd)
    rootCmd.AddCommand(configCmd)
}

func initConfig() {
    if cfgFile != "" {
        viper.SetConfigFile(cfgFile)
    } else {
        home, err := os.UserHomeDir()
        if err != nil {
            fmt.Fprintln(os.Stderr, "warning: cannot determine home directory:", err)
            return
        }
        viper.SetConfigName(".keyhunter")
        viper.SetConfigType("yaml")
        viper.AddConfigPath(home)
        viper.AddConfigPath(".")
    }

    viper.SetEnvPrefix("KEYHUNTER")
    viper.AutomaticEnv()

    // Defaults
    viper.SetDefault("scan.workers", 0) // 0 = auto (CPU*8)
    viper.SetDefault("database.path", filepath.Join(mustHomeDir(), ".keyhunter", "keyhunter.db"))

    // Config file is optional — ignore if not found
    _ = viper.ReadInConfig()
}

func mustHomeDir() string {
    h, _ := os.UserHomeDir()
    return h
}
```
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go build ./... && ./keyhunter --help 2>&1 | grep -E "scan|providers|config" && echo "HELP OK"</automated>
</verify>
<acceptance_criteria>
- `go build ./...` exits 0
- `./keyhunter --help` shows "scan", "providers", and "config" in command list
- pkg/config/config.go exports Config and Load
- pkg/output/table.go exports PrintFindings
- cmd/root.go declares rootCmd and Execute(); scanCmd, providersCmd, and configCmd are registered
- `grep -q 'viper\.SetConfigFile\|viper\.SetConfigName' cmd/root.go` exits 0
- lipgloss used for header and confidence coloring
</acceptance_criteria>
<done>Root command, config package, and output table exist. `keyhunter --help` shows the three top-level commands.</done>
</task>

<task type="auto" tdd="false">
<name>Task 2: scan, providers, and config subcommands</name>
<files>cmd/scan.go, cmd/providers.go, cmd/config.go</files>
<read_first>
- /home/salva/Documents/apikey/.planning/phases/01-foundation/01-RESEARCH.md (CLI-04, CLI-05 rows, Pattern 2 pipeline usage)
- /home/salva/Documents/apikey/cmd/root.go (rootCmd, viper setup)
- /home/salva/Documents/apikey/pkg/engine/engine.go (Engine.Scan, ScanConfig)
- /home/salva/Documents/apikey/pkg/storage/db.go (Open, SaveFinding)
- /home/salva/Documents/apikey/pkg/providers/registry.go (NewRegistry, List, Get, Stats)
</read_first>
<action>
Create **cmd/scan.go**:

```go
package cmd

import (
    "context"
    "fmt"
    "os"
    "path/filepath"
    "runtime"

    "github.com/spf13/cobra"
    "github.com/spf13/viper"

    "github.com/salvacybersec/keyhunter/pkg/config"
    "github.com/salvacybersec/keyhunter/pkg/engine"
    "github.com/salvacybersec/keyhunter/pkg/engine/sources"
    "github.com/salvacybersec/keyhunter/pkg/output"
    "github.com/salvacybersec/keyhunter/pkg/providers"
    "github.com/salvacybersec/keyhunter/pkg/storage"
)

var (
    flagWorkers int
    flagVerify  bool
    flagUnmask  bool
    flagOutput  string
    flagExclude []string
)

var scanCmd = &cobra.Command{
    Use:   "scan <path>",
    Short: "Scan a file or directory for leaked API keys",
    Args:  cobra.ExactArgs(1),
    RunE: func(cmd *cobra.Command, args []string) error {
        target := args[0]

        // Load config
        cfg := config.Load()
        if viper.GetInt("scan.workers") > 0 {
            cfg.Workers = viper.GetInt("scan.workers")
        }

        // Workers flag overrides config
        workers := flagWorkers
        if workers <= 0 {
            workers = cfg.Workers
        }
        if workers <= 0 {
            workers = runtime.NumCPU() * 8
        }

        // Initialize registry
        reg, err := providers.NewRegistry()
        if err != nil {
            return fmt.Errorf("loading providers: %w", err)
        }

        // Initialize engine
        eng := engine.NewEngine(reg)
        src := sources.NewFileSource(target)

        scanCfg := engine.ScanConfig{
            Workers: workers,
            Verify:  flagVerify,
            Unmask:  flagUnmask,
        }

        // Open database (ensure directory exists)
        dbPath := viper.GetString("database.path")
        if dbPath == "" {
            dbPath = cfg.DBPath
        }
        if err := os.MkdirAll(filepath.Dir(dbPath), 0700); err != nil {
            return fmt.Errorf("creating database directory: %w", err)
        }
        db, err := storage.Open(dbPath)
        if err != nil {
            return fmt.Errorf("opening database: %w", err)
        }
        defer db.Close()

        // Derive encryption key (Phase 1: empty passphrase with fixed dev salt)
        salt := []byte("keyhunter-dev-s0") // Phase 1 placeholder — Phase 6 replaces with proper salt storage
        encKey := storage.DeriveKey([]byte(cfg.Passphrase), salt)

        // Run scan
        ch, err := eng.Scan(context.Background(), src, scanCfg)
        if err != nil {
            return fmt.Errorf("starting scan: %w", err)
        }

        var findings []engine.Finding
        for f := range ch {
            findings = append(findings, f)
            // Persist to storage
            storeFinding := storage.Finding{
                ProviderName: f.ProviderName,
                KeyValue:     f.KeyValue,
                KeyMasked:    f.KeyMasked,
                Confidence:   f.Confidence,
                SourcePath:   f.Source,
                SourceType:   "file", // Phase 1: FileSource is the only source type
                LineNumber:   f.LineNumber,
            }
            if _, err := db.SaveFinding(storeFinding, encKey); err != nil {
                fmt.Fprintf(os.Stderr, "warning: failed to save finding: %v\n", err)
            }
        }

        // Output
        switch flagOutput {
        case "json":
            // Phase 6 — basic JSON for now
            fmt.Printf("[] # JSON output: Phase 6\n")
        default:
            output.PrintFindings(findings, flagUnmask)
        }

        // Exit code semantics (CLI-05 / OUT-06): 0=clean, 1=found, 2=error
        if len(findings) > 0 {
            db.Close() // os.Exit skips deferred calls, so close explicitly first
            os.Exit(1)
        }
        return nil
    },
}

func init() {
    scanCmd.Flags().IntVar(&flagWorkers, "workers", 0, "number of worker goroutines (default: CPU*8)")
    scanCmd.Flags().BoolVar(&flagVerify, "verify", false, "actively verify found keys (opt-in, Phase 5)")
    scanCmd.Flags().BoolVar(&flagUnmask, "unmask", false, "show full key values (default: masked)")
    scanCmd.Flags().StringVar(&flagOutput, "output", "table", "output format: table, json (more in Phase 6)")
    scanCmd.Flags().StringSliceVar(&flagExclude, "exclude", nil, "glob patterns to exclude (e.g. *.min.js)")
    _ = viper.BindPFlag("scan.workers", scanCmd.Flags().Lookup("workers"))
}
```

Create **cmd/providers.go**:

```go
package cmd

import (
    "fmt"
    "os"
    "strings"

    "github.com/charmbracelet/lipgloss"
    "github.com/spf13/cobra"

    "github.com/salvacybersec/keyhunter/pkg/providers"
)

var providersCmd = &cobra.Command{
    Use:   "providers",
    Short: "Manage and inspect provider definitions",
}

var providersListCmd = &cobra.Command{
    Use:   "list",
    Short: "List all loaded provider definitions",
    RunE: func(cmd *cobra.Command, args []string) error {
        reg, err := providers.NewRegistry()
        if err != nil {
            return err
        }
        bold := lipgloss.NewStyle().Bold(true)
        fmt.Fprintf(os.Stdout, "%-20s %-6s %-8s %s\n",
            bold.Render("NAME"), bold.Render("TIER"), bold.Render("PATTERNS"), bold.Render("KEYWORDS"))
        fmt.Println(strings.Repeat("─", 70))
        for _, p := range reg.List() {
            fmt.Fprintf(os.Stdout, "%-20s %-6d %-8d %s\n",
                p.Name, p.Tier, len(p.Patterns), strings.Join(p.Keywords, ", "))
        }
        stats := reg.Stats()
        fmt.Printf("\nTotal: %d providers\n", stats.Total)
        return nil
    },
}

var providersInfoCmd = &cobra.Command{
    Use:   "info <name>",
    Short: "Show detailed info for a provider",
    Args:  cobra.ExactArgs(1),
    RunE: func(cmd *cobra.Command, args []string) error {
        reg, err := providers.NewRegistry()
        if err != nil {
            return err
        }
        p, ok := reg.Get(args[0])
        if !ok {
            return fmt.Errorf("provider %q not found", args[0])
        }
        fmt.Printf("Name:          %s\n", p.Name)
        fmt.Printf("Display Name:  %s\n", p.DisplayName)
        fmt.Printf("Tier:          %d\n", p.Tier)
        fmt.Printf("Last Verified: %s\n", p.LastVerified)
        fmt.Printf("Keywords:      %s\n", strings.Join(p.Keywords, ", "))
        fmt.Printf("Patterns:      %d\n", len(p.Patterns))
        for i, pat := range p.Patterns {
            fmt.Printf("  [%d] regex=%s confidence=%s entropy_min=%.1f\n",
                i+1, pat.Regex, pat.Confidence, pat.EntropyMin)
        }
        if p.Verify.URL != "" {
            fmt.Printf("Verify URL:    %s %s\n", p.Verify.Method, p.Verify.URL)
        }
        return nil
    },
}

var providersStatsCmd = &cobra.Command{
    Use:   "stats",
    Short: "Show provider statistics",
    RunE: func(cmd *cobra.Command, args []string) error {
        reg, err := providers.NewRegistry()
        if err != nil {
            return err
        }
        stats := reg.Stats()
        fmt.Printf("Total providers: %d\n", stats.Total)
        fmt.Printf("By tier:\n")
        for tier := 1; tier <= 9; tier++ {
            if count := stats.ByTier[tier]; count > 0 {
                fmt.Printf("  Tier %d: %d\n", tier, count)
            }
        }
        fmt.Printf("By confidence:\n")
        for conf, count := range stats.ByConfidence {
            fmt.Printf("  %s: %d\n", conf, count)
        }
        return nil
    },
}

func init() {
    providersCmd.AddCommand(providersListCmd)
    providersCmd.AddCommand(providersInfoCmd)
    providersCmd.AddCommand(providersStatsCmd)
}
```

Create **cmd/config.go**:
```go
package cmd

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/spf13/cobra"
	"github.com/spf13/viper"
)

var configCmd = &cobra.Command{
	Use:   "config",
	Short: "Manage KeyHunter configuration",
}

var configInitCmd = &cobra.Command{
	Use:   "init",
	Short: "Create default configuration file at ~/.keyhunter.yaml",
	RunE: func(cmd *cobra.Command, args []string) error {
		home, err := os.UserHomeDir()
		if err != nil {
			return fmt.Errorf("cannot determine home directory: %w", err)
		}
		configPath := filepath.Join(home, ".keyhunter.yaml")

		// Set defaults before writing
		viper.SetDefault("scan.workers", 0)
		viper.SetDefault("database.path", filepath.Join(home, ".keyhunter", "keyhunter.db"))

		if err := viper.WriteConfigAs(configPath); err != nil {
			return fmt.Errorf("writing config: %w", err)
		}
		fmt.Printf("Config initialized: %s\n", configPath)
		return nil
	},
}

var configSetCmd = &cobra.Command{
	Use:   "set <key> <value>",
	Short: "Set a configuration value",
	Args:  cobra.ExactArgs(2),
	RunE: func(cmd *cobra.Command, args []string) error {
		key, value := args[0], args[1]
		viper.Set(key, value)
		if err := viper.WriteConfig(); err != nil {
			// If config file doesn't exist yet, create it
			home, homeErr := os.UserHomeDir()
			if homeErr != nil {
				return fmt.Errorf("cannot determine home directory: %w", homeErr)
			}
			configPath := filepath.Join(home, ".keyhunter.yaml")
			if err2 := viper.WriteConfigAs(configPath); err2 != nil {
				return fmt.Errorf("writing config: %w", err2)
			}
		}
		fmt.Printf("Set %s = %s\n", key, value)
		return nil
	},
}

var configGetCmd = &cobra.Command{
	Use:   "get <key>",
	Short: "Get a configuration value",
	Args:  cobra.ExactArgs(1),
	RunE: func(cmd *cobra.Command, args []string) error {
		val := viper.Get(args[0])
		if val == nil {
			return fmt.Errorf("key %q not found", args[0])
		}
		fmt.Printf("%v\n", val)
		return nil
	},
}

func init() {
	configCmd.AddCommand(configInitCmd)
	configCmd.AddCommand(configSetCmd)
	configCmd.AddCommand(configGetCmd)
}
```
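Given the two `viper.SetDefault` calls above, `config init` should write a file shaped roughly like this (key order and indentation are whatever viper's YAML marshaller emits, and `/home/user` stands in for the actual home directory):

```yaml
database:
    path: /home/user/.keyhunter/keyhunter.db
scan:
    workers: 0
```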
</action>
<verify>
<automated>cd /home/salva/Documents/apikey && go build -o keyhunter . && ./keyhunter providers list && ./keyhunter providers info openai && echo "PROVIDERS OK"</automated>
</verify>
<acceptance_criteria>
- `go build -o keyhunter .` exits 0
- `./keyhunter --help` shows scan, providers, config commands
- `./keyhunter providers list` prints table with >= 3 rows including "openai"
- `./keyhunter providers info openai` prints Name, Tier, Keywords, Patterns, Verify URL
- `./keyhunter providers stats` prints "Total providers: 3" or more
- `./keyhunter config init` creates or updates ~/.keyhunter.yaml
- `./keyhunter config set scan.workers 16` exits 0
- `./keyhunter scan testdata/samples/openai_key.txt` exits 1 (keys found) and prints a table row with "openai"
- `./keyhunter scan testdata/samples/no_keys.txt` exits 0 and prints "No API keys found."
- `grep -q 'viper\.BindPFlag' cmd/scan.go` exits 0
</acceptance_criteria>
<done>Full CLI works: scan finds and persists keys, providers list/info/stats work, config init/set/get work. Phase 1 success criteria all met.</done>
</task>

<task type="checkpoint:human-verify" gate="blocking">
<what-built>
Complete Phase 1 implementation:
- Provider registry with 3 YAML definitions, Aho-Corasick automaton, schema validation
- Storage layer with AES-256-GCM encryption, Argon2id key derivation, SQLite WAL mode
- Three-stage scan engine: keyword pre-filter → regex + entropy detector → finding channel
- CLI: keyhunter scan, providers list/info/stats, config init/set/get
</what-built>
<how-to-verify>
Run these commands from the project root and confirm each expected output:

1. `cd /home/salva/Documents/apikey && go test ./... -v -count=1`
   Expected: All tests PASS, zero FAIL, zero SKIP (except original stubs now filled)

2. `./keyhunter scan testdata/samples/openai_key.txt`
   Expected: Exit code 1, table printed with 1 row showing "openai" provider, masked key

3. `./keyhunter scan testdata/samples/no_keys.txt`
   Expected: Exit code 0, "No API keys found." printed

4. `./keyhunter providers list`
   Expected: Table with openai, anthropic, huggingface rows

5. `./keyhunter providers info openai`
   Expected: Name, Tier 1, Keywords including "sk-proj-", Pattern regex shown

6. `./keyhunter config init`
   Expected: "Config initialized: ~/.keyhunter.yaml" and the file exists

7. `./keyhunter config set scan.workers 16 && ./keyhunter config get scan.workers`
   Expected: "Set scan.workers = 16" then "16"

8. Build the binary with production flags:
   `CGO_ENABLED=0 go build -ldflags="-s -w" -o keyhunter-prod .`
   Expected: Builds without error, binary produced
</how-to-verify>
<resume-signal>Type "approved" if all 8 checks pass, or describe which check failed and what output you saw.</resume-signal>
</task>

</tasks>

<verification>
Full Phase 1 integration check:
- `go test ./... -count=1` exits 0
- `./keyhunter scan testdata/samples/openai_key.txt` exits 1 with findings table
- `./keyhunter scan testdata/samples/no_keys.txt` exits 0 with "No API keys found."
- `./keyhunter providers list` shows 3+ providers
- `./keyhunter config init` creates ~/.keyhunter.yaml
- `CGO_ENABLED=0 go build -ldflags="-s -w" -o keyhunter-prod .` exits 0
</verification>

<success_criteria>
- Cobra CLI with scan, providers, config commands (CLI-01)
- `keyhunter config init` creates ~/.keyhunter.yaml (CLI-02)
- `keyhunter config set key value` persists (CLI-03)
- `keyhunter providers list/info/stats` work (CLI-04)
- scan flags: --workers, --verify, --unmask, --output, --exclude (CLI-05)
- All Phase 1 success criteria from ROADMAP.md satisfied:
  1. `keyhunter scan ./somefile` runs three-stage pipeline and returns findings with provider names
  2. Findings persisted to SQLite with AES-256 encrypted key_value
  3. `keyhunter config init` and `config set` work
  4. `keyhunter providers list/info` return provider metadata from YAML
  5. Provider YAML has format_version and last_verified, validated at load time
</success_criteria>

<output>
After completion, create `.planning/phases/01-foundation/01-05-SUMMARY.md` following the summary template.
</output>