keyhunter/.planning/phases/04-input-sources/04-05-PLAN.md
2026-04-06 12:27:23 +03:00

---
phase: 04-input-sources
plan: 05
type: execute
wave: 2
depends_on: ["04-02", "04-03", "04-04"]
files_modified:
  - cmd/scan.go
  - cmd/scan_sources_test.go
autonomous: true
requirements:
  - INPUT-06
must_haves:
  truths:
    - "keyhunter scan <dir> uses DirSource when target is a directory (not FileSource)"
    - "keyhunter scan <file> continues to use FileSource when target is a single file"
    - "keyhunter scan --git <repo> uses GitSource, honoring --since YYYY-MM-DD"
    - "keyhunter scan stdin and keyhunter scan - both use StdinSource"
    - "keyhunter scan --url <https://...> uses URLSource"
    - "keyhunter scan --clipboard uses ClipboardSource (no positional arg required)"
    - "--exclude flags are forwarded to DirSource"
    - "Exactly one source is selected — conflicting flags return an error"
  artifacts:
    - path: "cmd/scan.go"
      provides: "Source-selection logic dispatching to the appropriate Source implementation"
      contains: "selectSource"
      min_lines: 180
    - path: "cmd/scan_sources_test.go"
      provides: "Unit tests for selectSource covering every flag combination"
      min_lines: 80
  key_links:
    - from: "cmd/scan.go"
      to: "pkg/engine/sources"
      via: "sources.NewDirSource/NewGitSource/NewStdinSource/NewURLSource/NewClipboardSource"
      pattern: "sources\\.New(Dir|Git|Stdin|URL|Clipboard)Source"
    - from: "cmd/scan.go"
      to: "cobra flags"
      via: "--git, --url, --clipboard, --since, --exclude"
      pattern: "\\-\\-git|\\-\\-url|\\-\\-clipboard|\\-\\-since"
---
<objective>
Wire the five new source adapters (DirSource, GitSource, StdinSource, URLSource, ClipboardSource) into `cmd/scan.go` via a new `selectSource` helper that inspects CLI flags and positional args to pick exactly one source. Satisfies INPUT-06 (the "all inputs flow through the same pipeline" integration requirement).
Purpose: Plans 04-02 through 04-04 deliver the Source implementations in isolation. This plan is the single integration point that makes them reachable from the CLI, with argument validation to prevent ambiguous invocations like `keyhunter scan --git --url https://...`.
Output: Updated `cmd/scan.go` with new flags and dispatching logic, plus a focused test file exercising `selectSource` directly.
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/phases/04-input-sources/04-CONTEXT.md
@cmd/scan.go
@pkg/engine/sources/source.go
<interfaces>
Source constructors from Wave 1 plans:
```go
// Plan 04-02
func NewFileSource(path string) *FileSource
func NewDirSource(root string, extraExcludes ...string) *DirSource
func NewDirSourceRaw(root string, excludes []string) *DirSource
// Plan 04-03
func NewGitSource(repoPath string) *GitSource
type GitSource struct {
RepoPath string
Since time.Time
ChunkSize int
}
// Plan 04-04
func NewStdinSource() *StdinSource
func NewURLSource(rawURL string) *URLSource
func NewClipboardSource() *ClipboardSource
```
Existing cmd/scan.go contract (see file for full body):
- Package `cmd`
- Uses `sources.NewFileSource(target)` unconditionally today
- Has `flagExclude []string` already declared
- init() registers flags: --workers, --verify, --unmask, --output, --exclude
</interfaces>
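The variadic `extraExcludes ...string` parameter on `NewDirSource` is what lets `selectSource` forward `--exclude` values with `f.Excludes...`. The sketch below illustrates that calling pattern only. `mergeExcludes` and the contents of `defaultExcludes` are hypothetical stand-ins; the real merge logic and default patterns are owned by plan 04-02 and may differ.

```go
package main

import "fmt"

// defaultExcludes is a placeholder list for illustration; the real defaults
// are defined by plan 04-02 and are not known here.
var defaultExcludes = []string{".git/**", "node_modules/**"}

// mergeExcludes is a hypothetical helper. It returns a fresh slice holding
// the defaults followed by the caller's extras, so neither input is mutated.
func mergeExcludes(extra ...string) []string {
	merged := make([]string, 0, len(defaultExcludes)+len(extra))
	merged = append(merged, defaultExcludes...)
	merged = append(merged, extra...)
	return merged
}

func main() {
	// A []string collected from repeated --exclude flags is spread with "...".
	flagExclude := []string{"*.log", "tmp/**"}
	fmt.Println(mergeExcludes(flagExclude...))

	// Calling with no extras yields a copy of the defaults.
	fmt.Println(mergeExcludes())
}
```

Copying into a new slice matters because spreading a slice with `...` hands the callee the same backing array; appending to it in place could otherwise leak changes back to the caller.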
</context>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Add source-selection flags and dispatch logic to cmd/scan.go</name>
<read_first>
- cmd/scan.go (full file)
- pkg/engine/sources/source.go
- pkg/engine/sources/dir.go (produced by 04-02)
- pkg/engine/sources/git.go (produced by 04-03)
- pkg/engine/sources/stdin.go (produced by 04-04)
- pkg/engine/sources/url.go (produced by 04-04)
- pkg/engine/sources/clipboard.go (produced by 04-04)
</read_first>
<files>cmd/scan.go, cmd/scan_sources_test.go</files>
<behavior>
- Test 1: selectSource with target="." on a directory returns a *DirSource
- Test 2: selectSource with target pointing to a file returns a *FileSource
- Test 3: selectSource with flagGit=true and target="./repo" returns a *GitSource
- Test 4: selectSource with flagGit=true and flagSince="2024-01-01" sets GitSource.Since correctly
- Test 5: selectSource with invalid --since format returns a parse error
- Test 6: selectSource with flagURL set returns a *URLSource
- Test 7: selectSource with flagClipboard=true and no args returns a *ClipboardSource
- Test 8: selectSource with target="stdin" returns a *StdinSource
- Test 9: selectSource with target="-" returns a *StdinSource
- Test 10: selectSource with both --git and --url set returns an error
- Test 11: selectSource with --clipboard and a positional target returns an error
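- Test 12: selectSource forwards --exclude patterns into DirSource.Excludes
- Test 13: selectSource with flagURL set and a positional target returns an error
- Test 14: selectSource with no target and no source flag returns a "missing target" error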
</behavior>
<action>
Edit `cmd/scan.go`. The end state must:
1. Add new package-level flag vars alongside the existing ones:
```go
var (
	flagWorkers     int
	flagVerify      bool
	flagUnmask      bool
	flagOutput      string
	flagExclude     []string
	flagGit         bool
	flagURL         string
	flagClipboard   bool
	flagSince       string
	flagMaxFileSize int64
	flagInsecure    bool
)
```
2. Change `scanCmd.Args` so a positional target is optional when `--url` or `--clipboard` is used:
```go
var scanCmd = &cobra.Command{
	Use:   "scan [path|stdin|-]",
	Short: "Scan files, directories, git history, stdin, URLs, or clipboard for leaked API keys",
	Args:  cobra.MaximumNArgs(1),
	RunE: func(cmd *cobra.Command, args []string) error {
		// ... existing config load ...
		src, err := selectSource(args, sourceFlags{
			Git:       flagGit,
			URL:       flagURL,
			Clipboard: flagClipboard,
			Since:     flagSince,
			Excludes:  flagExclude,
		})
		if err != nil {
			return err
		}
		// Replace the old `src := sources.NewFileSource(target)` line with use of the dispatched src.
		// Keep all downstream code unchanged (engine, storage, output).
		// ... rest of existing RunE body, using src ...
		_ = src
		return nil // placeholder — keep existing logic
	},
}
```
3. Add the selectSource helper and its supporting struct, in `cmd/scan.go`:
```go
// sourceFlags captures the CLI inputs that control source selection.
// Extracted into a struct so selectSource is straightforward to unit test.
type sourceFlags struct {
	Git       bool
	URL       string
	Clipboard bool
	Since     string
	Excludes  []string
}

// selectSource inspects positional args and source flags, validates that
// exactly one source is specified, and returns the appropriate Source.
func selectSource(args []string, f sourceFlags) (sources.Source, error) {
	// Count the explicit source-selector flags; at most one may be set.
	explicitCount := 0
	if f.URL != "" {
		explicitCount++
	}
	if f.Clipboard {
		explicitCount++
	}
	if f.Git {
		explicitCount++
	}
	if explicitCount > 1 {
		return nil, fmt.Errorf("scan: --git, --url, and --clipboard are mutually exclusive")
	}

	// Clipboard and URL take no positional argument.
	if f.Clipboard {
		if len(args) > 0 {
			return nil, fmt.Errorf("scan: --clipboard does not accept a positional argument")
		}
		return sources.NewClipboardSource(), nil
	}
	if f.URL != "" {
		if len(args) > 0 {
			return nil, fmt.Errorf("scan: --url does not accept a positional argument")
		}
		return sources.NewURLSource(f.URL), nil
	}

	// Every remaining source (stdin, git, dir, file) needs a positional target.
	if len(args) == 0 {
		return nil, fmt.Errorf("scan: missing target (path, stdin, -, or a source flag)")
	}
	target := args[0]

	if target == "stdin" || target == "-" {
		if f.Git {
			return nil, fmt.Errorf("scan: --git cannot be combined with stdin")
		}
		return sources.NewStdinSource(), nil
	}

	if f.Git {
		gs := sources.NewGitSource(target)
		if f.Since != "" {
			t, err := time.Parse("2006-01-02", f.Since)
			if err != nil {
				return nil, fmt.Errorf("scan: --since must be YYYY-MM-DD: %w", err)
			}
			gs.Since = t
		}
		return gs, nil
	}

	// Plain path: stat it to choose between DirSource and FileSource.
	info, err := os.Stat(target)
	if err != nil {
		return nil, fmt.Errorf("scan: stat %q: %w", target, err)
	}
	if info.IsDir() {
		return sources.NewDirSource(target, f.Excludes...), nil
	}
	return sources.NewFileSource(target), nil
}
```
4. In the existing `init()`, register the new flags next to the existing ones:
```go
func init() {
	scanCmd.Flags().IntVar(&flagWorkers, "workers", 0, "number of worker goroutines (default: CPU*8)")
	scanCmd.Flags().BoolVar(&flagVerify, "verify", false, "actively verify found keys (opt-in, Phase 5)")
	scanCmd.Flags().BoolVar(&flagUnmask, "unmask", false, "show full key values (default: masked)")
	scanCmd.Flags().StringVar(&flagOutput, "output", "table", "output format: table, json")
	scanCmd.Flags().StringSliceVar(&flagExclude, "exclude", nil, "extra glob patterns to exclude (e.g. *.min.js)")

	// Phase 4 source-selection flags.
	scanCmd.Flags().BoolVar(&flagGit, "git", false, "treat target as a git repo and scan full history")
	scanCmd.Flags().StringVar(&flagURL, "url", "", "fetch and scan a remote http(s) URL (no positional arg)")
	scanCmd.Flags().BoolVar(&flagClipboard, "clipboard", false, "scan current clipboard contents")
	scanCmd.Flags().StringVar(&flagSince, "since", "", "for --git: only scan commits after YYYY-MM-DD")
	scanCmd.Flags().Int64Var(&flagMaxFileSize, "max-file-size", 0, "max file size in bytes to scan (0 = unlimited)")
	scanCmd.Flags().BoolVar(&flagInsecure, "insecure", false, "for --url: skip TLS certificate verification")

	_ = viper.BindPFlag("scan.workers", scanCmd.Flags().Lookup("workers"))
}
```
5. Replace the single line `src := sources.NewFileSource(target)` in the existing RunE body with the `selectSource` dispatch. Leave ALL downstream code (engine.Scan, storage.SaveFinding, output switch, exit code logic) untouched. Ensure the `target` variable is only used where relevant (it is no longer the sole driver of source construction).
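6. Add the `time` import to `cmd/scan.go`, and make sure `fmt` and `os` are imported as well, since `selectSource` calls `fmt.Errorf` and `os.Stat`.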
Create `cmd/scan_sources_test.go`:
```go
package cmd

import (
	"os"
	"path/filepath"
	"testing"
	"time"

	"github.com/salvacybersec/keyhunter/pkg/engine/sources"
	"github.com/stretchr/testify/require"
)

func TestSelectSource_Directory(t *testing.T) {
	dir := t.TempDir()
	src, err := selectSource([]string{dir}, sourceFlags{})
	require.NoError(t, err)
	_, ok := src.(*sources.DirSource)
	require.True(t, ok, "expected *DirSource, got %T", src)
}

func TestSelectSource_File(t *testing.T) {
	dir := t.TempDir()
	f := filepath.Join(dir, "a.txt")
	require.NoError(t, os.WriteFile(f, []byte("x"), 0o644))
	src, err := selectSource([]string{f}, sourceFlags{})
	require.NoError(t, err)
	_, ok := src.(*sources.FileSource)
	require.True(t, ok, "expected *FileSource, got %T", src)
}

func TestSelectSource_Git(t *testing.T) {
	src, err := selectSource([]string{"./some-repo"}, sourceFlags{Git: true})
	require.NoError(t, err)
	gs, ok := src.(*sources.GitSource)
	require.True(t, ok, "expected *GitSource, got %T", src)
	require.Equal(t, "./some-repo", gs.RepoPath)
}

func TestSelectSource_GitSince(t *testing.T) {
	src, err := selectSource([]string{"./repo"}, sourceFlags{Git: true, Since: "2024-01-15"})
	require.NoError(t, err)
	// Check the assertion so a wrong type fails the test instead of panicking.
	gs, ok := src.(*sources.GitSource)
	require.True(t, ok, "expected *GitSource, got %T", src)
	want, err := time.Parse("2006-01-02", "2024-01-15")
	require.NoError(t, err)
	require.Equal(t, want, gs.Since)
}

func TestSelectSource_GitSinceBadFormat(t *testing.T) {
	_, err := selectSource([]string{"./repo"}, sourceFlags{Git: true, Since: "15/01/2024"})
	require.Error(t, err)
	require.Contains(t, err.Error(), "YYYY-MM-DD")
}

func TestSelectSource_URL(t *testing.T) {
	src, err := selectSource(nil, sourceFlags{URL: "https://example.com/a.js"})
	require.NoError(t, err)
	_, ok := src.(*sources.URLSource)
	require.True(t, ok)
}

func TestSelectSource_URLRejectsPositional(t *testing.T) {
	_, err := selectSource([]string{"./foo"}, sourceFlags{URL: "https://x"})
	require.Error(t, err)
}

func TestSelectSource_Clipboard(t *testing.T) {
	src, err := selectSource(nil, sourceFlags{Clipboard: true})
	require.NoError(t, err)
	_, ok := src.(*sources.ClipboardSource)
	require.True(t, ok)
}

func TestSelectSource_ClipboardRejectsPositional(t *testing.T) {
	_, err := selectSource([]string{"./foo"}, sourceFlags{Clipboard: true})
	require.Error(t, err)
}

func TestSelectSource_Stdin(t *testing.T) {
	for _, tok := range []string{"stdin", "-"} {
		src, err := selectSource([]string{tok}, sourceFlags{})
		require.NoError(t, err)
		_, ok := src.(*sources.StdinSource)
		require.True(t, ok, "token %q: expected *StdinSource, got %T", tok, src)
	}
}

func TestSelectSource_MutuallyExclusive(t *testing.T) {
	_, err := selectSource(nil, sourceFlags{Git: true, URL: "https://x"})
	require.Error(t, err)
	require.Contains(t, err.Error(), "mutually exclusive")
}

func TestSelectSource_MissingTarget(t *testing.T) {
	_, err := selectSource(nil, sourceFlags{})
	require.Error(t, err)
	require.Contains(t, err.Error(), "missing target")
}

func TestSelectSource_DirForwardsExcludes(t *testing.T) {
	dir := t.TempDir()
	src, err := selectSource([]string{dir}, sourceFlags{Excludes: []string{"*.log", "tmp/**"}})
	require.NoError(t, err)
	ds, ok := src.(*sources.DirSource)
	require.True(t, ok, "expected *DirSource, got %T", src)
	// NewDirSource merges DefaultExcludes with extras, so user patterns must be present.
	found := 0
	for _, e := range ds.Excludes {
		if e == "*.log" || e == "tmp/**" {
			found++
		}
	}
	require.Equal(t, 2, found, "user excludes not forwarded, got %v", ds.Excludes)
}
```
After making these changes, run `go build ./...` and fix any import or compile errors. Do NOT modify pkg/engine/sources/* files — they are owned by Wave 1 plans.
</action>
<verify>
<automated>go build ./... && go test ./cmd/... -run TestSelectSource -race -count=1</automated>
</verify>
<acceptance_criteria>
- `go build ./...` exits 0
- `go test ./cmd/... -run TestSelectSource -race -count=1` passes all 13 `TestSelectSource_*` test functions
- `go test ./... -race -count=1` full suite passes
- `grep -n "selectSource" cmd/scan.go` returns at least two hits (definition + call site)
- `grep -n "flagGit\|flagURL\|flagClipboard\|flagSince" cmd/scan.go` returns at least 4 hits
- `grep -n "sources.NewDirSource\|sources.NewGitSource\|sources.NewStdinSource\|sources.NewURLSource\|sources.NewClipboardSource" cmd/scan.go` returns 5 hits
- `grep -n "mutually exclusive" cmd/scan.go` returns a hit
- `keyhunter scan --help` (via `go run . scan --help`) lists --git, --url, --clipboard, --since flags
</acceptance_criteria>
<done>
cmd/scan.go dispatches to the correct Source implementation based on positional args and flags, with unambiguous error messages for conflicting selectors. All selectSource tests pass under -race. The existing single-file FileSource path still works unchanged.
</done>
</task>
</tasks>
<verification>
- `go build ./...` exits 0
- `go test ./... -race -count=1` full suite green (including earlier Wave 1 plan tests)
- `go run . scan --help` lists new flags
- `go run . scan ./pkg` completes successfully (DirSource path)
- `echo "API_KEY=test" | go run . scan -` completes successfully (StdinSource path)
</verification>
<success_criteria>
Users can invoke every Phase 4 input mode from the CLI and each one flows through the unchanged three-stage detection pipeline. INPUT-01 through INPUT-05 are reachable via CLI, and INPUT-06 (the integration meta-requirement) is satisfied by the passing test suite plus the help-text listing.
</success_criteria>
<output>
After completion, create `.planning/phases/04-input-sources/04-05-SUMMARY.md` documenting:
- selectSource signature and branches
- Flag additions
- Test pass summary
- A short one-line example invocation per new source (dir, git, stdin, url, clipboard)
- Confirmation that existing Phase 1-3 tests still pass
</output>