| phase | verified | status | score | human_verification |
|---|---|---|---|---|
| 04-input-sources | 2026-04-05T00:00:00Z | passed | 6/6 must-haves verified | |
Phase 4: Input Sources Verification Report
Phase Goal: Users can point KeyHunter at any content source (local files, git history across all branches, piped content, remote URLs, and the clipboard), and all are scanned through the same detection pipeline
Verified: 2026-04-05
Status: passed
Re-verification: No (initial verification)
Goal Achievement
Observable Truths (from Success Criteria)
| # | Truth | Status | Evidence |
|---|---|---|---|
| 1 | keyhunter scan ./myrepo recursively scans with glob exclusions and mmap above threshold | VERIFIED | cmd/scan.go:242 dispatches to sources.NewDirSource(target, f.Excludes...); pkg/engine/sources/dir.go implements filepath.WalkDir + DefaultExcludes + isExcluded glob match + mmap.Open branch at size >= MmapThreshold (10MB). Behavioral check: dir scan on a temp dir with node_modules/ excluded by default; keys only surface from the keep file. |
| 2 | keyhunter scan --git ./myrepo scans full history (branches, tags, stash) with --since | VERIFIED | cmd/scan.go:226 flag dispatch; pkg/engine/sources/git.go uses git.PlainOpen, collectSeedCommits walks refs/heads, refs/tags, refs/stash, IsRemote, dereferences annotated tags, dedupes blobs by OID, and c.Author.When.Before(g.Since) enforces --since. Behavioral check: scan --git /tmp/testrepo produced findings with source: "git:7dd369c:leak.txt". |
| 3 | cat secrets.txt \| keyhunter scan stdin detects keys from piped input | VERIFIED | cmd/scan.go:219 maps target=="stdin" \|\| "-" to sources.NewStdinSource(); pkg/engine/sources/stdin.go reads from io.Reader (default os.Stdin) and emits chunks with Source="stdin". Behavioral check: echo "OPENAI_API_KEY=sk-proj-..." \| keyhunter scan stdin produced findings with "source": "stdin". |
| 4 | keyhunter scan --url https://... fetches and scans remote content | VERIFIED | cmd/scan.go:207 dispatches to sources.NewURLSource(f.URL); pkg/engine/sources/url.go enforces http/https scheme whitelist, 30s timeout, 5-redirect cap, io.LimitReader(body, 50MB+1), content-type allowlist (text/*, application/json, application/javascript, application/xml, yaml variants), and emits with Source="url:<url>". Unit tests in url_test.go cover scheme rejection, content-type filter, size cap. |
| 5 | keyhunter scan --clipboard scans current clipboard content | VERIFIED | cmd/scan.go:201 dispatches to sources.NewClipboardSource() (no positional arg); pkg/engine/sources/clipboard.go calls clipboard.ReadAll via injectable Reader func, returns a clear error when clipboard.Unsupported is true. Unit tests in clipboard_test.go cover reader fixture and unsupported branch. |
| 6 | All sources flow through the same detection pipeline (INPUT-06 integration) | VERIFIED | All Sources implement the single Source.Chunks(ctx, out) interface in pkg/engine/sources/source.go. cmd/scan.go:65 calls selectSource(args, sourceFlags{...}) and passes the returned sources.Source to eng.Scan(ctx, src, scanCfg) unchanged — the engine never branches on source type. selectSource enforces mutual exclusion of --git, --url, --clipboard. |
Score: 6/6 truths verified
Required Artifacts
| Artifact | Expected | Status | Details |
|---|---|---|---|
| pkg/engine/sources/dir.go | DirSource, recursive walk, excludes, mmap | VERIFIED | 218 lines; exports DirSource, NewDirSource, NewDirSourceRaw; contains filepath.WalkDir, mmap.Open, isBinary, DefaultExcludes, emitChunks |
| pkg/engine/sources/dir_test.go | Tests for walk/exclude/binary/mmap/ordering/ctx | VERIFIED | 146 lines; TestDirSource_RecursiveWalk, _DefaultExcludes, _UserExclude, _BinarySkipped, _MmapLargeFile, _MissingRoot, _CtxCancellation |
| pkg/engine/sources/file.go | FileSource reusing mmap + emitChunks | VERIFIED | 60 lines; uses MmapThreshold, mmap.Open, shared emitChunks helper |
| pkg/engine/sources/git.go | GitSource via go-git/v5, all refs, OID dedup, --since, short-SHA source | VERIFIED | 216 lines; git.PlainOpen, collectSeedCommits (branches/tags/stash/remotes), seenBlobs OID dedup, shortSHA:path format, Since filter |
| pkg/engine/sources/git_test.go | In-process go-git fixture tests | VERIFIED | 186 lines |
| pkg/engine/sources/stdin.go | StdinSource reading io.Reader, source="stdin" | VERIFIED | 85 lines; injectable reader via NewStdinSourceFrom, default os.Stdin |
| pkg/engine/sources/stdin_test.go | Reader fixture tests | VERIFIED | 50 lines |
| pkg/engine/sources/url.go | URLSource with http.Client, LimitReader, CT whitelist | VERIFIED | 135 lines; http.Client, CheckRedirect cap 5, io.LimitReader, scheme whitelist, allowed CT list |
| pkg/engine/sources/url_test.go | Scheme/CT/size cap tests | VERIFIED | 102 lines |
| pkg/engine/sources/clipboard.go | ClipboardSource via atotto/clipboard with graceful fallback | VERIFIED | 45 lines; clipboard.Unsupported guard, clipboard.ReadAll, injectable Reader |
| pkg/engine/sources/clipboard_test.go | Fixture + unsupported branch tests | VERIFIED | 54 lines |
| cmd/scan.go | selectSource dispatcher + new flags | VERIFIED | 292 lines; selectSource(args, sourceFlags), flags --git --url --clipboard --since --exclude --insecure --max-file-size |
| cmd/scan_sources_test.go | selectSource unit tests | VERIFIED | 112 lines |
Key Link Verification
| From | To | Via | Status | Details |
|---|---|---|---|---|
| dir.go | golang.org/x/exp/mmap | mmap.Open for large files | WIRED | pkg/engine/sources/dir.go:157 mmap.Open(path) inside size >= MmapThreshold branch |
| dir.go | filepath.WalkDir | recursive traversal | WIRED | pkg/engine/sources/dir.go:77 |
| dir.go | types.Chunk | channel send via emitChunks | WIRED | pkg/engine/sources/dir.go:197,210 out <- types.Chunk{...} |
| git.go | go-git/v5 | git.PlainOpen | WIRED | pkg/engine/sources/git.go:47 |
| git.go | repo.References | iterate refs/heads, refs/tags, refs/stash | WIRED | pkg/engine/sources/git.go:102 repo.References() |
| git.go | types.Chunk | channel send with git:sha:path source | WIRED | emitGitChunks sends types.Chunk{Source: fmt.Sprintf("git:%s:%s", shortSHA, f.Name)} |
| url.go | net/http | http.Client with timeout | WIRED | pkg/engine/sources/url.go:52-62 defaultHTTPClient with 30s Timeout + CheckRedirect |
| url.go | io.LimitReader | MaxContentLength enforcement | WIRED | pkg/engine/sources/url.go:104 |
| clipboard.go | atotto/clipboard | clipboard.ReadAll | WIRED | pkg/engine/sources/clipboard.go:25,35 |
| cmd/scan.go | pkg/engine/sources | sources.New{Dir,Git,Stdin,URL,Clipboard}Source | WIRED | cmd/scan.go:205,211,223,227,243,245 all five constructors reachable through selectSource |
| cmd/scan.go | cobra flags | --git --url --clipboard --since --exclude | WIRED | cmd/scan.go:284-289 all flags registered in init() and read via the sourceFlags struct |
| selectSource | eng.Scan | returned sources.Source passed to engine | WIRED | cmd/scan.go:65,105 single unified dispatch into the shared pipeline |
Note: gsd-tools verify key-links reported some links as not-verified due to regex-pattern strictness (e.g. literal git.PlainOpen vs how the tool constructs its search). All links were confirmed by direct grep against the files. No real wiring gap exists.
Data-Flow Trace (Level 4)
| Artifact | Data Variable | Source | Produces Real Data | Status |
|---|---|---|---|---|
| DirSource.Chunks | data in emitFile | os.ReadFile / mmap.Open+ReadAt of real files found via filepath.WalkDir | Yes | FLOWING |
| GitSource.Chunks | data from f.Reader() io.ReadAll | Real git blobs from commits walked via repo.Log across all seeded refs | Yes | FLOWING |
| StdinSource.Chunks | data from io.ReadAll(s.Reader) | os.Stdin in production, injected reader in tests | Yes | FLOWING |
| URLSource.Chunks | data from io.ReadAll(limited) | Real HTTP response body via client.Do(req) | Yes | FLOWING |
| ClipboardSource.Chunks | text from reader() | clipboard.ReadAll in production | Yes | FLOWING |
| cmd/scan.go src | sources.Source | selectSource returns the concrete source based on flags/args | Yes | FLOWING |
Behavioral Spot-Checks
| Behavior | Command | Result | Status |
|---|---|---|---|
| Build succeeds | go build ./... | exit 0 | PASS |
| Race tests pass | go test ./pkg/engine/sources/... ./cmd/... -race -count=1 | ok pkg/engine/sources 1.950s / ok cmd 1.018s | PASS |
| Vet clean | go vet ./... | exit 0, no output | PASS |
| Scan help exposes new flags | keyhunter scan --help | Shows --git --url --clipboard --exclude --since --insecure --max-file-size | PASS |
| Stdin scan detects keys | echo "OPENAI_API_KEY=sk-proj-..." \| keyhunter scan stdin --output json | JSON findings with "source": "stdin" | PASS |
| Dir scan with default excludes | keyhunter scan $TMPDIR (with node_modules/foo.txt decoy) | Only finds keys in root test.txt, node_modules skipped | PASS |
| Git scan produces git:sha:path source | keyhunter scan --git /tmp/testrepo --output json | Findings with "source": "git:7dd369c:leak.txt" | PASS |
| Mutual-exclusion validation | keyhunter scan --git --url https://example.com | Error: scan: --git, --url, and --clipboard are mutually exclusive | PASS |
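The mutual-exclusion check and the stdin / - mapping exercised above could look roughly like the following. This is a guess at the shape of the dispatch, not the contents of cmd/scan.go: the field names, the returned string labels, and selectSourceKind itself are hypothetical. Only the error message is taken from the spot-check output.

```go
package main

import (
	"errors"
	"fmt"
)

// sourceFlags holds only the flags relevant to source selection (assumed fields).
type sourceFlags struct {
	Git       bool
	URL       string
	Clipboard bool
}

// selectSourceKind returns a label for the source that would be built.
// A real dispatcher would return a sources.Source instead of a string.
func selectSourceKind(args []string, f sourceFlags) (string, error) {
	set := 0
	if f.Git {
		set++
	}
	if f.URL != "" {
		set++
	}
	if f.Clipboard {
		set++
	}
	if set > 1 {
		return "", errors.New("scan: --git, --url, and --clipboard are mutually exclusive")
	}
	// Clipboard and URL sources take no positional target.
	switch {
	case f.Clipboard:
		return "clipboard", nil
	case f.URL != "":
		return "url:" + f.URL, nil
	}
	if len(args) != 1 {
		return "", fmt.Errorf("scan: expected exactly one target, got %d", len(args))
	}
	target := args[0]
	switch {
	case f.Git:
		return "git:" + target, nil
	case target == "stdin" || target == "-":
		return "stdin", nil
	default:
		return "path:" + target, nil
	}
}
```

Counting the set flags before dispatching keeps the validation independent of the order in which the cases are checked.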
Requirements Coverage
Note: The phase prompt lists requirement IDs INPUT-01..INPUT-06. The REQUIREMENTS.md text for some IDs does not exactly match the phase-prompt mapping (appears to be a doc-order mismatch), but the phase's own plan requirements: declarations are consistent with the phase goal. Coverage is assessed against the phase-prompt mapping (which matches plan assignments).
| Requirement | Source Plan | Description | Status | Evidence |
|---|---|---|---|---|
| INPUT-01 | 04-02 | Directory/recursive scan with glob exclusions | SATISFIED | DirSource + DefaultExcludes + --exclude flag forwarding |
| CORE-07 | 04-02 | mmap-based large file reads | SATISFIED | MmapThreshold=10MB, mmap.Open in both dir.go and file.go |
| INPUT-02 | 04-03 | Git history scan across branches, tags, stash with --since | SATISFIED | GitSource with collectSeedCommits and Since filter |
| INPUT-03 | 04-04 | stdin scanning | SATISFIED | StdinSource + cmd/scan.go stdin/- dispatch |
| INPUT-04 | 04-04 | URL fetching | SATISFIED | URLSource + --url flag |
| INPUT-05 | 04-04 | Clipboard scanning | SATISFIED | ClipboardSource + --clipboard flag |
| INPUT-06 | 04-05 | Unified source pipeline | SATISFIED | Single Source interface, selectSource dispatcher, unchanged engine pipeline |
Plan 04-01 (go.mod bootstrap) carries no requirement IDs — it is infra-only.
Anti-Patterns Found
No blocker anti-patterns. Minor wiring observations:
| File | Line | Pattern | Severity | Impact |
|---|---|---|---|---|
| cmd/scan.go | 289 | --insecure flag declared but not forwarded to URLSource.Insecure | Info | URLSource.Insecure field exists and is ready to receive the value, but selectSource does not pass it through. TLS verification currently always on. Not in phase Success Criteria; flag is documented for --url future wiring. Recommend fast-follow in Phase 5 or hotfix. |
| cmd/scan.go | 288 | --max-file-size flag declared but not forwarded to DirSource/FileSource | Info | Same as above: flagMaxFileSize is read into a variable but never referenced by selectSource. Not in phase Success Criteria. |
| pkg/engine/sources/git.go | 21 | gitBinarySniffSize duplicates BinarySniffSize from dir.go | Info | Comment already notes "Local to this file until plan 04-02 introduces a package-wide constant". Constants are now defined in dir.go but git.go still uses the local copy. Cosmetic; no correctness issue. |
None of these block the phase goal. None contradict any must-have truth or Success Criterion.
Human Verification Required
See human_verification: in frontmatter. Automated checks (build, race tests, unit tests, CLI smoke tests for stdin/dir/git) all pass. Items requiring human validation are the environmental/network-dependent sources: real clipboard I/O, live HTTPS fetch, and a multi-branch/tag/stash git repo smoke test. All three are code-path covered by unit tests via injected fixtures.
Gaps Summary
None. All six observable truths verified, all thirteen artifacts pass levels 1-4 (exist, substantive, wired, data flowing), all key links confirmed by direct inspection, all unit tests pass under -race, and every Success Criterion was exercised by a behavioral spot-check against the compiled binary (except clipboard + live URL which are appropriately routed to human verification).
The phase goal — "users can point KeyHunter at any content source ... and all are scanned through the same detection pipeline" — is achieved: five distinct sources all implement the single Source.Chunks interface, selectSource dispatches exactly one of them, and engine.Scan consumes the unified interface with no source-type branching.
Two minor follow-up items (--insecure, --max-file-size not forwarded) are documented as Info-level anti-patterns; they are outside the phase Success Criteria and do not block passage.
Verified: 2026-04-05
Verifier: Claude (gsd-verifier)