keyhunter/.planning/phases/04-input-sources/04-VERIFICATION.md
2026-04-05 15:29:09 +03:00

---
phase: 04-input-sources
verified: 2026-04-05T00:00:00Z
status: passed
score: 6/6 must-haves verified
human_verification:
  - test: "Clipboard scan on a machine with xclip/xsel/wl-clipboard installed"
    expected: "keyhunter scan --clipboard detects keys copied to clipboard"
    why_human: "Clipboard tooling is environment-specific; ClipboardSource.Unsupported branch + injectable Reader covers code paths, but real OS clipboard I/O cannot be exercised in a headless verify run"
  - test: "URL scan against a real HTTPS endpoint"
    expected: "keyhunter scan --url https://raw.githubusercontent.com/... fetches and scans"
    why_human: "Requires network access; URL parsing, scheme validation, content-type filtering, 50MB cap, and LimitReader all verified statically and via unit tests, but live TLS + redirect behavior is best smoke-tested by a human"
  - test: "Git history scan against a repo with multiple branches, tags, and a stash"
    expected: "keyhunter scan --git /path/to/repo detects keys that exist only in old commits / non-HEAD refs"
    why_human: "Multi-branch/tag/stash traversal and OID deduplication verified in unit tests using in-process go-git fixtures; a real repo smoke test confirms end-to-end behavior including --since filtering"
---

Phase 4: Input Sources Verification Report

Phase Goal: Users can point KeyHunter at any content source (local files, git history across all branches, piped content, remote URLs, and the clipboard), and all are scanned through the same detection pipeline.

Verified: 2026-04-05

Status: passed

Re-verification: No (initial verification)

Goal Achievement

Observable Truths (from Success Criteria)

| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | `keyhunter scan ./myrepo` recursively scans with glob exclusions and mmap above threshold | VERIFIED | cmd/scan.go:242 dispatches to sources.NewDirSource(target, f.Excludes...); pkg/engine/sources/dir.go implements filepath.WalkDir + DefaultExcludes + isExcluded glob match + mmap.Open branch at size >= MmapThreshold (10MB). Behavioral check: dir scan on a temp dir with node_modules/ excluded by default; keys surface only from the keep file. |
| 2 | `keyhunter scan --git ./myrepo` scans full history (branches, tags, stash) with --since | VERIFIED | cmd/scan.go:226 flag dispatch; pkg/engine/sources/git.go uses git.PlainOpen, collectSeedCommits walks refs/heads, refs/tags, refs/stash, IsRemote, dereferences annotated tags, dedupes blobs by OID, and c.Author.When.Before(g.Since) enforces --since. Behavioral check: scan --git /tmp/testrepo produced findings with source: "git:7dd369c:leak.txt". |
| 3 | `cat secrets.txt \| keyhunter scan stdin` detects keys from piped input | VERIFIED | cmd/scan.go:219 maps `target=="stdin" \|\| "-"` to sources.NewStdinSource(); pkg/engine/sources/stdin.go reads from io.Reader (default os.Stdin) and emits chunks with Source="stdin". Behavioral check: `echo "OPENAI_API_KEY=sk-proj-..." \| keyhunter scan stdin` produced findings with "source": "stdin". |
| 4 | `keyhunter scan --url https://...` fetches and scans remote content | VERIFIED | cmd/scan.go:207 dispatches to sources.NewURLSource(f.URL); pkg/engine/sources/url.go enforces http/https scheme whitelist, 30s timeout, 5-redirect cap, io.LimitReader(body, 50MB+1), content-type allowlist (`text/*`, application/json, application/javascript, application/xml, yaml variants), and emits with `Source="url:<url>"`. Unit tests in url_test.go cover scheme rejection, content-type filter, size cap. |
| 5 | `keyhunter scan --clipboard` scans current clipboard content | VERIFIED | cmd/scan.go:201 dispatches to sources.NewClipboardSource() (no positional arg); pkg/engine/sources/clipboard.go calls clipboard.ReadAll via injectable Reader func and returns a clear error when clipboard.Unsupported is true. Unit tests in clipboard_test.go cover reader fixture and unsupported branch. |
| 6 | All sources flow through the same detection pipeline (INPUT-06 integration) | VERIFIED | All sources implement the single Source.Chunks(ctx, out) interface in pkg/engine/sources/source.go. cmd/scan.go:65 calls selectSource(args, sourceFlags{...}) and passes the returned sources.Source to eng.Scan(ctx, src, scanCfg) unchanged; the engine never branches on source type. selectSource enforces mutual exclusion of --git, --url, --clipboard. |

Score: 6/6 truths verified
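
The single-interface design behind truth 6 can be illustrated with a minimal sketch. It is not the project's code: `Chunk.Data`, the error return on `Chunks`, `stubSource`, the reduced `sourceFlags` fields, the placeholder labels, and the `scan`/`run` helpers are all assumptions made for illustration. Only the `Source.Chunks(ctx, out)` shape, the `selectSource(args, sourceFlags)` call, and the mutual-exclusion message come from this report.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Chunk is a simplified stand-in for types.Chunk. Only Source is named in
// the report; Data is an assumed field.
type Chunk struct {
	Data   []byte
	Source string
}

// Source mirrors the single-method interface. The error return is assumed.
type Source interface {
	Chunks(ctx context.Context, out chan<- Chunk) error
}

// stubSource is a hypothetical source that emits one fixed chunk.
type stubSource struct{ label string }

func (s stubSource) Chunks(ctx context.Context, out chan<- Chunk) error {
	select {
	case out <- Chunk{Data: []byte("example"), Source: s.label}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// sourceFlags is a hypothetical reduction of the real flag struct.
type sourceFlags struct {
	Git       bool
	URL       string
	Clipboard bool
}

// selectSource returns exactly one Source. Labels are placeholders, not the
// real Source strings.
func selectSource(args []string, f sourceFlags) (Source, error) {
	set := 0
	for _, on := range []bool{f.Git, f.URL != "", f.Clipboard} {
		if on {
			set++
		}
	}
	if set > 1 {
		return nil, errors.New("scan: --git, --url, and --clipboard are mutually exclusive")
	}
	switch {
	case f.Clipboard:
		return stubSource{"clipboard"}, nil
	case f.URL != "":
		return stubSource{"url"}, nil
	}
	if len(args) != 1 {
		return nil, errors.New("scan: expected exactly one target")
	}
	switch {
	case f.Git:
		return stubSource{"git"}, nil
	case args[0] == "stdin" || args[0] == "-":
		return stubSource{"stdin"}, nil
	default:
		return stubSource{"dir"}, nil
	}
}

// scan drains src without inspecting its concrete type.
func scan(ctx context.Context, src Source) ([]string, error) {
	out := make(chan Chunk)
	errc := make(chan error, 1)
	go func() {
		defer close(out)
		errc <- src.Chunks(ctx, out)
	}()
	var seen []string
	for c := range out {
		seen = append(seen, c.Source)
	}
	return seen, <-errc
}

// run wires selection and scanning together.
func run(args []string, f sourceFlags) ([]string, error) {
	src, err := selectSource(args, f)
	if err != nil {
		return nil, err
	}
	return scan(context.Background(), src)
}

func main() {
	labels, err := run([]string{"stdin"}, sourceFlags{})
	fmt.Println(labels, err)
	_, err = run(nil, sourceFlags{Git: true, URL: "https://example.com"})
	fmt.Println(err)
}
```

Because `scan` only sees the interface, adding a sixth source in this sketch would touch `selectSource` and nothing downstream, which is the property the truth table asserts for the real engine.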

Required Artifacts

| Artifact | Expected | Status | Details |
|----------|----------|--------|---------|
| pkg/engine/sources/dir.go | DirSource, recursive walk, excludes, mmap | VERIFIED | 218 lines; exports DirSource, NewDirSource, NewDirSourceRaw; contains filepath.WalkDir, mmap.Open, isBinary, DefaultExcludes, emitChunks |
| pkg/engine/sources/dir_test.go | Tests for walk/exclude/binary/mmap/ordering/ctx | VERIFIED | 146 lines; `TestDirSource_RecursiveWalk`, `_DefaultExcludes`, `_UserExclude`, `_BinarySkipped`, `_MmapLargeFile`, `_MissingRoot`, `_CtxCancellation` |
| pkg/engine/sources/file.go | FileSource reusing mmap + emitChunks | VERIFIED | 60 lines; uses MmapThreshold, mmap.Open, shared emitChunks helper |
| pkg/engine/sources/git.go | GitSource via go-git/v5, all refs, OID dedup, --since, short-SHA source | VERIFIED | 216 lines; git.PlainOpen, collectSeedCommits (branches/tags/stash/remotes), seenBlobs OID dedup, shortSHA:path format, Since filter |
| pkg/engine/sources/git_test.go | In-process go-git fixture tests | VERIFIED | 186 lines |
| pkg/engine/sources/stdin.go | StdinSource reading io.Reader, source="stdin" | VERIFIED | 85 lines; injectable reader via NewStdinSourceFrom, default os.Stdin |
| pkg/engine/sources/stdin_test.go | Reader fixture tests | VERIFIED | 50 lines |
| pkg/engine/sources/url.go | URLSource with http.Client, LimitReader, CT whitelist | VERIFIED | 135 lines; http.Client, CheckRedirect cap 5, io.LimitReader, scheme whitelist, allowed CT list |
| pkg/engine/sources/url_test.go | Scheme/CT/size cap tests | VERIFIED | 102 lines |
| pkg/engine/sources/clipboard.go | ClipboardSource via atotto/clipboard with graceful fallback | VERIFIED | 45 lines; clipboard.Unsupported guard, clipboard.ReadAll, injectable Reader |
| pkg/engine/sources/clipboard_test.go | Fixture + unsupported branch tests | VERIFIED | 54 lines |
| cmd/scan.go | selectSource dispatcher + new flags | VERIFIED | 292 lines; selectSource(args, sourceFlags), flags --git --url --clipboard --since --exclude --insecure --max-file-size |
| cmd/scan_sources_test.go | selectSource unit tests | VERIFIED | 112 lines |

Key Link Verification

| From | To | Via | Status | Details |
|------|----|-----|--------|---------|
| dir.go | golang.org/x/exp/mmap | mmap.Open for large files | WIRED | pkg/engine/sources/dir.go:157 mmap.Open(path) inside size >= MmapThreshold branch |
| dir.go | filepath.WalkDir | recursive traversal | WIRED | pkg/engine/sources/dir.go:77 |
| dir.go | types.Chunk | channel send via emitChunks | WIRED | pkg/engine/sources/dir.go:197,210 `out <- types.Chunk{...}` |
| git.go | go-git/v5 | git.PlainOpen | WIRED | pkg/engine/sources/git.go:47 |
| git.go | repo.References | iterate refs/heads, refs/tags, refs/stash | WIRED | pkg/engine/sources/git.go:102 repo.References() |
| git.go | types.Chunk | channel send with git:sha:path source | WIRED | emitGitChunks sends `types.Chunk{Source: fmt.Sprintf("git:%s:%s", shortSHA, f.Name)}` |
| url.go | net/http | http.Client with timeout | WIRED | pkg/engine/sources/url.go:52-62 defaultHTTPClient with 30s Timeout + CheckRedirect |
| url.go | io.LimitReader | MaxContentLength enforcement | WIRED | pkg/engine/sources/url.go:104 |
| clipboard.go | atotto/clipboard | clipboard.ReadAll | WIRED | pkg/engine/sources/clipboard.go:25,35 |
| cmd/scan.go | pkg/engine/sources | sources.New{Dir,Git,Stdin,URL,Clipboard}Source | WIRED | cmd/scan.go:205,211,223,227,243,245 all five constructors reachable through selectSource |
| cmd/scan.go | cobra flags | --git --url --clipboard --since --exclude | WIRED | cmd/scan.go:284-289 all flags registered in init() and read via the sourceFlags struct |
| selectSource | eng.Scan | returned sources.Source passed to engine | WIRED | cmd/scan.go:65,105 single unified dispatch into the shared pipeline |

Note: gsd-tools verify key-links reported some links as not-verified due to regex-pattern strictness (e.g. literal git.PlainOpen vs how the tool constructs its search). All links were confirmed by direct grep against the files. No real wiring gap exists.
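
The url.go guards recorded in this report (scheme whitelist, 30s timeout, 5-redirect cap, content-type allowlist, and a `LimitReader` at the cap plus one byte) can be sketched with the standard library alone. This is an illustration under assumptions, not the contents of url.go: the names `newClient`, `checkScheme`, `allowedContentType`, `readCapped`, and `fetch` are hypothetical, and the two YAML media types stand in for the "yaml variants" the report mentions without listing.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"mime"
	"net/http"
	"net/url"
	"strings"
	"time"
)

const maxContentLength = 50 << 20 // 50MB cap, as stated in the report

// newClient applies the 30s timeout and the 5-redirect cap.
func newClient() *http.Client {
	return &http.Client{
		Timeout: 30 * time.Second,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			if len(via) >= 5 {
				return errors.New("stopped after 5 redirects")
			}
			return nil
		},
	}
}

// checkScheme rejects anything other than http and https.
func checkScheme(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	return nil
}

// allowedContentType accepts text/* plus a short list of structured types.
// The YAML entries are assumed.
func allowedContentType(header string) bool {
	mt, _, err := mime.ParseMediaType(header)
	if err != nil {
		return false
	}
	if strings.HasPrefix(mt, "text/") {
		return true
	}
	switch mt {
	case "application/json", "application/javascript", "application/xml",
		"application/yaml", "application/x-yaml":
		return true
	}
	return false
}

// readCapped reads at most limit+1 bytes so an oversized body is detected
// rather than silently truncated.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, fmt.Errorf("body exceeds %d bytes", limit)
	}
	return data, nil
}

// fetch chains the guards; it needs network access and is not called in main.
func fetch(client *http.Client, raw string) ([]byte, error) {
	if err := checkScheme(raw); err != nil {
		return nil, err
	}
	resp, err := client.Get(raw)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	ct := resp.Header.Get("Content-Type")
	if !allowedContentType(ct) {
		return nil, fmt.Errorf("content type %q not allowed", ct)
	}
	return readCapped(resp.Body, maxContentLength)
}

func main() {
	fmt.Println(checkScheme("file:///etc/passwd"))
	fmt.Println(allowedContentType("application/json; charset=utf-8"))
	_, err := readCapped(strings.NewReader("12345"), 4)
	fmt.Println(err)
}
```

Reading one byte past the cap is what distinguishes "exactly at the limit" from "over the limit"; a plain `LimitReader` at the cap would truncate without signalling.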

Data-Flow Trace (Level 4)

| Artifact | Data Variable | Source | Produces Real Data | Status |
|----------|---------------|--------|--------------------|--------|
| DirSource.Chunks | data in emitFile | os.ReadFile / mmap.Open+ReadAt of real files found via filepath.WalkDir | Yes | FLOWING |
| GitSource.Chunks | data from f.Reader() via io.ReadAll | Real git blobs from commits walked via repo.Log across all seeded refs | Yes | FLOWING |
| StdinSource.Chunks | data from io.ReadAll(s.Reader) | os.Stdin in production, injected reader in tests | Yes | FLOWING |
| URLSource.Chunks | data from io.ReadAll(limited) | Real HTTP response body via client.Do(req) | Yes | FLOWING |
| ClipboardSource.Chunks | text from reader() | clipboard.ReadAll in production | Yes | FLOWING |
| cmd/scan.go | src sources.Source | selectSource returns the concrete source based on flags/args | Yes | FLOWING |
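
For the GitSource row, the combination of `--since` filtering, blob deduplication by OID, and the `git:<shortSHA>:<path>` source label can be sketched without go-git. Everything here is a stand-in: `blob`, `commit`, `day`, `fixture`, and `walk` are hypothetical, the 7-character short SHA is inferred from the `git:7dd369c:leak.txt` example, and the order of the since check relative to the dedup check is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// blob and commit are simplified stand-ins for go-git objects.
type blob struct{ OID, Path string }

type commit struct {
	SHA   string
	When  time.Time
	Blobs []blob
}

// day builds a fixed UTC date in April 2026 for the fixture.
func day(n int) time.Time {
	return time.Date(2026, time.April, n, 0, 0, 0, 0, time.UTC)
}

// fixture returns three commits that share some blobs.
func fixture() []commit {
	return []commit{
		{SHA: "aaaaaaa1111111", When: day(1), Blobs: []blob{{"b1", "old.txt"}}},
		{SHA: "bbbbbbb2222222", When: day(3), Blobs: []blob{{"b2", "leak.txt"}, {"b1", "old.txt"}}},
		{SHA: "ccccccc3333333", When: day(4), Blobs: []blob{{"b2", "leak.txt"}}},
	}
}

// walk skips commits older than since, then emits each blob OID once.
func walk(commits []commit, since time.Time) []string {
	seen := make(map[string]struct{})
	var labels []string
	for _, c := range commits {
		if !since.IsZero() && c.When.Before(since) {
			continue
		}
		for _, b := range c.Blobs {
			if _, dup := seen[b.OID]; dup {
				continue
			}
			seen[b.OID] = struct{}{}
			short := c.SHA
			if len(short) > 7 {
				short = short[:7]
			}
			labels = append(labels, fmt.Sprintf("git:%s:%s", short, b.Path))
		}
	}
	return labels
}

func main() {
	for _, l := range walk(fixture(), day(2)) {
		fmt.Println(l)
	}
}
```

Deduplicating on the blob OID rather than the path means an unchanged file that appears in many commits is scanned once, while a file whose content changed is scanned again under the commit that introduced the new content.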

Behavioral Spot-Checks

| Behavior | Command | Result | Status |
|----------|---------|--------|--------|
| Build succeeds | `go build ./...` | exit 0 | PASS |
| Race tests pass | `go test ./pkg/engine/sources/... ./cmd/... -race -count=1` | ok pkg/engine/sources 1.950s / ok cmd 1.018s | PASS |
| Vet clean | `go vet ./...` | exit 0, no output | PASS |
| Scan help exposes new flags | `keyhunter scan --help` | Shows --git --url --clipboard --exclude --since --insecure --max-file-size | PASS |
| Stdin scan detects keys | `echo "OPENAI_API_KEY=sk-proj-..." \| keyhunter scan stdin --output json` | JSON findings with "source": "stdin" | PASS |
| Dir scan with default excludes | `keyhunter scan $TMPDIR` (with node_modules/foo.txt decoy) | Only finds keys in root test.txt, node_modules skipped | PASS |
| Git scan produces git:sha:path source | `keyhunter scan --git /tmp/testrepo --output json` | Findings with "source": "git:7dd369c:leak.txt" | PASS |
| Mutual-exclusion validation | `keyhunter scan --git --url https://example.com` | Error: scan: --git, --url, and --clipboard are mutually exclusive | PASS |

Requirements Coverage

Note: The phase prompt lists requirement IDs INPUT-01..INPUT-06. The REQUIREMENTS.md text for some IDs does not exactly match the phase-prompt mapping (appears to be a doc-order mismatch), but the phase's own plan requirements: declarations are consistent with the phase goal. Coverage is assessed against the phase-prompt mapping (which matches plan assignments).

| Requirement | Source Plan | Description | Status | Evidence |
|-------------|-------------|-------------|--------|----------|
| INPUT-01 | 04-02 | Directory/recursive scan with glob exclusions | SATISFIED | DirSource + DefaultExcludes + --exclude flag forwarding |
| CORE-07 | 04-02 | mmap-based large file reads | SATISFIED | MmapThreshold=10MB, mmap.Open in both dir.go and file.go |
| INPUT-02 | 04-03 | Git history scan across branches, tags, stash with --since | SATISFIED | GitSource with collectSeedCommits and Since filter |
| INPUT-03 | 04-04 | stdin scanning | SATISFIED | StdinSource + cmd/scan.go stdin/- dispatch |
| INPUT-04 | 04-04 | URL fetching | SATISFIED | URLSource + --url flag |
| INPUT-05 | 04-04 | Clipboard scanning | SATISFIED | ClipboardSource + --clipboard flag |
| INPUT-06 | 04-05 | Unified source pipeline | SATISFIED | Single Source interface, selectSource dispatcher, unchanged engine pipeline |

Plan 04-01 (go.mod bootstrap) carries no requirement IDs — it is infra-only.

Anti-Patterns Found

No blocker anti-patterns. Minor wiring observations:

| File | Line | Pattern | Severity | Impact |
|------|------|---------|----------|--------|
| cmd/scan.go | 289 | --insecure flag declared but not forwarded to URLSource.Insecure | Info | URLSource.Insecure field exists and is ready to receive the value, but selectSource does not pass it through. TLS verification is currently always on. Not in phase Success Criteria; flag is documented for --url future wiring. Recommend fast-follow in Phase 5 or hotfix. |
| cmd/scan.go | 288 | --max-file-size flag declared but not forwarded to DirSource/FileSource | Info | Same situation as --insecure: flagMaxFileSize is read into a variable but never referenced by selectSource. Not in phase Success Criteria. |
| pkg/engine/sources/git.go | 21 | gitBinarySniffSize duplicates BinarySniffSize from dir.go | Info | Comment already notes "Local to this file until plan 04-02 introduces a package-wide constant". Constants are now defined in dir.go but git.go still uses the local copy. Cosmetic; no correctness issue. |

None of these block the phase goal. None contradict any must-have truth or Success Criterion.
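
One possible shape for the --insecure fast-follow is sketched below. The report confirms only that a `URLSource.Insecure` field exists and that selectSource does not populate it; the `URL` field, the reduced `sourceFlags`, and the `tlsConfig`, `client`, and `urlSourceFromFlags` helpers are hypothetical, and the real fix may look different.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// URLSource is a reduced stand-in; only the Insecure field is confirmed by
// the report.
type URLSource struct {
	URL      string
	Insecure bool
}

// tlsConfig disables certificate verification only when Insecure is set.
func (u URLSource) tlsConfig() *tls.Config {
	return &tls.Config{InsecureSkipVerify: u.Insecure}
}

// client builds an HTTP client that honours the Insecure field.
func (u URLSource) client() *http.Client {
	return &http.Client{
		Timeout:   30 * time.Second,
		Transport: &http.Transport{TLSClientConfig: u.tlsConfig()},
	}
}

// sourceFlags is a hypothetical reduction of the real flag struct.
type sourceFlags struct {
	URL      string
	Insecure bool
}

// urlSourceFromFlags is the missing forwarding step: copy the flag value
// into the source instead of dropping it.
func urlSourceFromFlags(f sourceFlags) URLSource {
	return URLSource{URL: f.URL, Insecure: f.Insecure}
}

func main() {
	src := urlSourceFromFlags(sourceFlags{URL: "https://example.com", Insecure: true})
	fmt.Println(src.tlsConfig().InsecureSkipVerify, src.client().Timeout)
}
```

Keeping verification on by default and flipping it only from an explicit flag preserves the current behavior for every invocation that does not pass --insecure.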

Human Verification Required

See human_verification: in frontmatter. Automated checks (build, race tests, unit tests, CLI smoke tests for stdin/dir/git) all pass. Items requiring human validation are the environmental/network-dependent sources: real clipboard I/O, live HTTPS fetch, and a multi-branch/tag/stash git repo smoke test. All three are code-path covered by unit tests via injected fixtures.
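
The injected-fixture approach mentioned above can be sketched for the clipboard case. This is a simplified model rather than clipboard.go itself: the real code guards on the package-level `clipboard.Unsupported` from atotto/clipboard, whereas this sketch uses a hypothetical `Unsupported` struct field so it stays dependency-free, and the `text` method and error messages are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ClipboardSource is a reduced stand-in. Reader is injectable so tests can
// bypass the OS clipboard; Unsupported is a hypothetical field standing in
// for the library-level flag.
type ClipboardSource struct {
	Reader      func() (string, error)
	Unsupported bool
}

var errUnsupported = errors.New("clipboard: no supported clipboard tool found")

// text returns the clipboard content, or a clear error when the platform
// has no usable clipboard tooling.
func (c ClipboardSource) text() (string, error) {
	if c.Unsupported {
		return "", errUnsupported
	}
	if c.Reader == nil {
		return "", errors.New("clipboard: no reader configured")
	}
	return c.Reader()
}

func main() {
	fake := ClipboardSource{Reader: func() (string, error) {
		return "OPENAI_API_KEY=sk-proj-...", nil
	}}
	fmt.Println(fake.text())

	headless := ClipboardSource{Unsupported: true}
	fmt.Println(headless.text())
}
```

A fixture like `fake` exercises the same code path as production up to the actual read, which is why only the real OS clipboard I/O is left for a human to confirm.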

Gaps Summary

None. All six observable truths verified, all thirteen artifacts pass levels 1-4 (exist, substantive, wired, data flowing), all key links confirmed by direct inspection, all unit tests pass under -race, and every Success Criterion was exercised by a behavioral spot-check against the compiled binary (except clipboard + live URL which are appropriately routed to human verification).

The phase goal — "users can point KeyHunter at any content source ... and all are scanned through the same detection pipeline" — is achieved: five distinct sources all implement the single Source.Chunks interface, selectSource dispatches exactly one of them, and engine.Scan consumes the unified interface with no source-type branching.

Two minor follow-up items (--insecure, --max-file-size not forwarded) are documented as Info-level anti-patterns; they are outside the phase Success Criteria and do not block passage.


Verified: 2026-04-05

Verifier: Claude (gsd-verifier)