docs(phase-04): complete phase execution
@@ -4,7 +4,7 @@ milestone: v1.0
 milestone_name: milestone
 status: executing
 stopped_at: Completed 04-05-PLAN.md
-last_updated: "2026-04-05T12:24:42.157Z"
+last_updated: "2026-04-05T12:29:09.228Z"
 last_activity: 2026-04-05
 progress:
   total_phases: 18
@@ -25,8 +25,8 @@ See: .planning/PROJECT.md (updated 2026-04-04)
 
 ## Current Position
 
-Phase: 04 (input-sources) — EXECUTING
-Plan: 5 of 5
+Phase: 5
+Plan: Not started
 Status: Ready to execute
 Last activity: 2026-04-05
 
144
.planning/phases/04-input-sources/04-VERIFICATION.md
Normal file
@@ -0,0 +1,144 @@
---
phase: 04-input-sources
verified: 2026-04-05T00:00:00Z
status: passed
score: 6/6 must-haves verified
human_verification:
  - test: "Clipboard scan on a machine with xclip/xsel/wl-clipboard installed"
    expected: "keyhunter scan --clipboard detects keys copied to clipboard"
    why_human: "Clipboard tooling is environment-specific; ClipboardSource.Unsupported branch + injectable Reader covers code paths, but real OS clipboard I/O cannot be exercised in a headless verify run"
  - test: "URL scan against a real HTTPS endpoint"
    expected: "keyhunter scan --url https://raw.githubusercontent.com/... fetches and scans"
    why_human: "Requires network access; URL parsing, scheme validation, content-type filtering, 50MB cap, and LimitReader all verified statically and via unit tests, but live TLS + redirect behavior is best smoke-tested by a human"
  - test: "Git history scan against a repo with multiple branches, tags, and a stash"
    expected: "keyhunter scan --git /path/to/repo detects keys that exist only in old commits / non-HEAD refs"
    why_human: "Multi-branch/tag/stash traversal and OID deduplication verified in unit tests using in-process go-git fixtures; a real repo smoke test confirms end-to-end behavior including --since filtering"
---

# Phase 4: Input Sources Verification Report

**Phase Goal:** Users can point KeyHunter at any content source (local files, git history across all branches, piped content, remote URLs, and the clipboard), and all are scanned through the same detection pipeline
**Verified:** 2026-04-05
**Status:** passed
**Re-verification:** No (initial verification)

## Goal Achievement

### Observable Truths (from Success Criteria)

| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | `keyhunter scan ./myrepo` recursively scans with glob exclusions and mmap above threshold | VERIFIED | `cmd/scan.go:242` dispatches to `sources.NewDirSource(target, f.Excludes...)`; `pkg/engine/sources/dir.go` implements `filepath.WalkDir` + `DefaultExcludes` + `isExcluded` glob match + an `mmap.Open` branch when `size >= MmapThreshold` (10MB). Behavioral check: a dir scan on a temp dir with `node_modules/` excluded by default surfaces keys only from the kept file. |
| 2 | `keyhunter scan --git ./myrepo` scans full history (branches, tags, stash) with `--since` | VERIFIED | `cmd/scan.go:226` flag dispatch; `pkg/engine/sources/git.go` uses `git.PlainOpen`; `collectSeedCommits` walks `refs/heads`, `refs/tags`, `refs/stash`, and remote refs (`IsRemote`), dereferences annotated tags, and dedupes blobs by OID; `c.Author.When.Before(g.Since)` enforces `--since`. Behavioral check: `scan --git /tmp/testrepo` produced findings with `source: "git:7dd369c:leak.txt"`. |
| 3 | `cat secrets.txt \| keyhunter scan stdin` detects keys from piped input | VERIFIED | `cmd/scan.go:219` maps `target=="stdin" \|\| "-"` to `sources.NewStdinSource()`; `pkg/engine/sources/stdin.go` reads from `io.Reader` (default `os.Stdin`) and emits chunks with `Source="stdin"`. Behavioral check: `echo "OPENAI_API_KEY=sk-proj-..." \| keyhunter scan stdin` produced findings with `"source": "stdin"`. |
| 4 | `keyhunter scan --url https://...` fetches and scans remote content | VERIFIED | `cmd/scan.go:207` dispatches to `sources.NewURLSource(f.URL)`; `pkg/engine/sources/url.go` enforces http/https scheme whitelist, 30s timeout, 5-redirect cap, `io.LimitReader(body, 50MB+1)`, content-type allowlist (`text/*`, `application/json`, `application/javascript`, `application/xml`, yaml variants), and emits with `Source="url:<url>"`. Unit tests in `url_test.go` cover scheme rejection, content-type filter, size cap. |
| 5 | `keyhunter scan --clipboard` scans current clipboard content | VERIFIED | `cmd/scan.go:201` dispatches to `sources.NewClipboardSource()` (no positional arg); `pkg/engine/sources/clipboard.go` calls `clipboard.ReadAll` via injectable `Reader` func, returns clear error when `clipboard.Unsupported` is true. Unit tests in `clipboard_test.go` cover reader fixture and unsupported branch. |
| 6 | All sources flow through the same detection pipeline (INPUT-06 integration) | VERIFIED | All Sources implement the single `Source.Chunks(ctx, out)` interface in `pkg/engine/sources/source.go`. `cmd/scan.go:65` calls `selectSource(args, sourceFlags{...})` and passes the returned `sources.Source` to `eng.Scan(ctx, src, scanCfg)` unchanged; the engine never branches on source type. `selectSource` enforces mutual exclusion of `--git`, `--url`, `--clipboard`. |

**Score:** 6/6 truths verified

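
Truth 6 can be illustrated with a minimal sketch of a single-method source interface plus a dispatcher that rejects conflicting flags. This is not the code in `cmd/scan.go` or `pkg/engine/sources/source.go`: `Chunk`, `staticSource`, the `sourceFlags` fields, the `dir:` and `clipboard` labels, and the "missing target" error are assumptions made for illustration. Only the `Chunks(ctx, out)` shape, the `stdin`/`-` mapping, and the mutual-exclusion message come from this report.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Chunk is a hypothetical stand-in for types.Chunk.
type Chunk struct {
	Data   []byte
	Source string
}

// Source mirrors the single-method interface named in the report.
type Source interface {
	Chunks(ctx context.Context, out chan<- Chunk) error
}

// staticSource is an illustrative Source that emits one labeled chunk.
type staticSource struct{ label string }

func (s staticSource) Chunks(ctx context.Context, out chan<- Chunk) error {
	select {
	case out <- Chunk{Source: s.label}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// sourceFlags is a hypothetical reduction of the real flag struct.
type sourceFlags struct {
	Git       bool
	URL       string
	Clipboard bool
}

var errExclusive = errors.New("scan: --git, --url, and --clipboard are mutually exclusive")

// selectSource returns exactly one Source or an error; the caller never
// needs to know which concrete type it received.
func selectSource(args []string, f sourceFlags) (Source, error) {
	set := 0
	for _, on := range []bool{f.Git, f.URL != "", f.Clipboard} {
		if on {
			set++
		}
	}
	if set > 1 {
		return nil, errExclusive
	}
	switch {
	case f.Clipboard:
		return staticSource{label: "clipboard"}, nil // label is an assumption
	case f.URL != "":
		return staticSource{label: "url:" + f.URL}, nil
	}
	if len(args) == 0 {
		return nil, errors.New("scan: missing target") // hypothetical message
	}
	target := args[0]
	switch {
	case f.Git:
		return staticSource{label: "git:" + target}, nil
	case target == "stdin" || target == "-":
		return staticSource{label: "stdin"}, nil
	default:
		return staticSource{label: "dir:" + target}, nil // label is an assumption
	}
}

// sourceLabel drains one chunk through the interface only.
func sourceLabel(src Source) string {
	out := make(chan Chunk, 1)
	if err := src.Chunks(context.Background(), out); err != nil {
		return ""
	}
	return (<-out).Source
}

func main() {
	src, err := selectSource([]string{"-"}, sourceFlags{})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sourceLabel(src))
	_, err = selectSource(nil, sourceFlags{Git: true, URL: "https://example.com"})
	fmt.Println("error:", err)
}
```

Because `sourceLabel` touches only the interface, it plays the role the report attributes to `eng.Scan`: adding a new source would not change the consumer.
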
### Required Artifacts

| Artifact | Expected | Status | Details |
|----------|----------|--------|---------|
| `pkg/engine/sources/dir.go` | DirSource, recursive walk, excludes, mmap | VERIFIED | 218 lines; exports `DirSource`, `NewDirSource`, `NewDirSourceRaw`; contains `filepath.WalkDir`, `mmap.Open`, `isBinary`, `DefaultExcludes`, `emitChunks` |
| `pkg/engine/sources/dir_test.go` | Tests for walk/exclude/binary/mmap/ordering/ctx | VERIFIED | 146 lines; `TestDirSource_RecursiveWalk`, `_DefaultExcludes`, `_UserExclude`, `_BinarySkipped`, `_MmapLargeFile`, `_MissingRoot`, `_CtxCancellation` |
| `pkg/engine/sources/file.go` | FileSource reusing mmap + emitChunks | VERIFIED | 60 lines; uses `MmapThreshold`, `mmap.Open`, shared `emitChunks` helper |
| `pkg/engine/sources/git.go` | GitSource via go-git/v5, all refs, OID dedup, `--since`, short-SHA source | VERIFIED | 216 lines; `git.PlainOpen`, `collectSeedCommits` (branches/tags/stash/remotes), `seenBlobs` OID dedup, `shortSHA:path` format, `Since` filter |
| `pkg/engine/sources/git_test.go` | In-process go-git fixture tests | VERIFIED | 186 lines |
| `pkg/engine/sources/stdin.go` | StdinSource reading `io.Reader`, source="stdin" | VERIFIED | 85 lines; injectable reader via `NewStdinSourceFrom`, default `os.Stdin` |
| `pkg/engine/sources/stdin_test.go` | Reader fixture tests | VERIFIED | 50 lines |
| `pkg/engine/sources/url.go` | URLSource with http.Client, LimitReader, CT whitelist | VERIFIED | 135 lines; `http.Client`, `CheckRedirect` cap 5, `io.LimitReader`, scheme whitelist, allowed CT list |
| `pkg/engine/sources/url_test.go` | Scheme/CT/size cap tests | VERIFIED | 102 lines |
| `pkg/engine/sources/clipboard.go` | ClipboardSource via atotto/clipboard with graceful fallback | VERIFIED | 45 lines; `clipboard.Unsupported` guard, `clipboard.ReadAll`, injectable `Reader` |
| `pkg/engine/sources/clipboard_test.go` | Fixture + unsupported branch tests | VERIFIED | 54 lines |
| `cmd/scan.go` | selectSource dispatcher + new flags | VERIFIED | 292 lines; `selectSource(args, sourceFlags)`, flags `--git --url --clipboard --since --exclude --insecure --max-file-size` |
| `cmd/scan_sources_test.go` | selectSource unit tests | VERIFIED | 112 lines |

### Key Link Verification

| From | To | Via | Status | Details |
|------|-----|-----|--------|---------|
| `dir.go` | `golang.org/x/exp/mmap` | `mmap.Open` for large files | WIRED | `pkg/engine/sources/dir.go:157` `mmap.Open(path)` inside `size >= MmapThreshold` branch |
| `dir.go` | `filepath.WalkDir` | recursive traversal | WIRED | `pkg/engine/sources/dir.go:77` |
| `dir.go` | `types.Chunk` | channel send via `emitChunks` | WIRED | `pkg/engine/sources/dir.go:197,210` `out <- types.Chunk{...}` |
| `git.go` | `go-git/v5` | `git.PlainOpen` | WIRED | `pkg/engine/sources/git.go:47` |
| `git.go` | `repo.References` | iterate refs/heads, refs/tags, refs/stash | WIRED | `pkg/engine/sources/git.go:102` `repo.References()` |
| `git.go` | `types.Chunk` | channel send with `git:sha:path` source | WIRED | `emitGitChunks` sends `types.Chunk{Source: fmt.Sprintf("git:%s:%s", shortSHA, f.Name)}` |
| `url.go` | `net/http` | `http.Client` with timeout | WIRED | `pkg/engine/sources/url.go:52-62` `defaultHTTPClient` with 30s Timeout + CheckRedirect |
| `url.go` | `io.LimitReader` | MaxContentLength enforcement | WIRED | `pkg/engine/sources/url.go:104` |
| `clipboard.go` | `atotto/clipboard` | `clipboard.ReadAll` | WIRED | `pkg/engine/sources/clipboard.go:25,35` |
| `cmd/scan.go` | `pkg/engine/sources` | `sources.New{Dir,Git,Stdin,URL,Clipboard}Source` | WIRED | `cmd/scan.go:205,211,223,227,243,245` all five constructors reachable through `selectSource` |
| `cmd/scan.go` | cobra flags | `--git --url --clipboard --since --exclude` | WIRED | `cmd/scan.go:284-289` all flags registered in `init()` and read via the `sourceFlags` struct |
| `selectSource` | `eng.Scan` | returned `sources.Source` passed to engine | WIRED | `cmd/scan.go:65,105` single unified dispatch into the shared pipeline |

Note: gsd-tools `verify key-links` reported some links as not-verified due to regex-pattern strictness (e.g. literal `git.PlainOpen` vs how the tool constructs its search). All links were confirmed by direct grep against the files. No real wiring gap exists.

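
The `url.go` guards referenced above (scheme whitelist, content-type allowlist, `io.LimitReader` with a `+1` sentinel byte) can be sketched with the standard library alone. This is an illustrative sketch, not the contents of `url.go`: the function names, the exact YAML media types, and the decision to reject a missing `Content-Type` are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"mime"
	"net/url"
	"strings"
)

// maxContentLength reflects the 50MB cap named in the report.
const maxContentLength = 50 * 1024 * 1024

var errTooLarge = errors.New("url: response exceeds size cap")

// validateScheme accepts only http and https URLs.
func validateScheme(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("url: unsupported scheme %q", u.Scheme)
	}
	return nil
}

// allowedContentType checks a Content-Type header against the allowlist.
// The two YAML media types are an assumption; the report only says
// "yaml variants". A missing or malformed header is rejected here.
func allowedContentType(header string) bool {
	mt, _, err := mime.ParseMediaType(header)
	if err != nil {
		return false
	}
	if strings.HasPrefix(mt, "text/") {
		return true
	}
	switch mt {
	case "application/json", "application/javascript", "application/xml",
		"application/yaml", "application/x-yaml":
		return true
	}
	return false
}

// readCapped reads at most limit bytes. Reading limit+1 bytes lets it
// distinguish "exactly at the cap" from "over the cap".
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errTooLarge
	}
	return data, nil
}

// cappedLen is a small helper for demonstrating readCapped on a string.
func cappedLen(s string, limit int64) (int, error) {
	data, err := readCapped(strings.NewReader(s), limit)
	return len(data), err
}

func main() {
	fmt.Println(validateScheme("file:///etc/passwd"))
	fmt.Println(allowedContentType("text/plain; charset=utf-8"))
	n, err := cappedLen(strings.Repeat("x", 8), 4)
	fmt.Println(n, err, maxContentLength)
}
```

Without the extra byte, a body of exactly the cap size and a truncated larger body would be indistinguishable after `io.ReadAll`.
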
### Data-Flow Trace (Level 4)

| Artifact | Data Variable | Source | Produces Real Data | Status |
|----------|---------------|--------|---------------------|--------|
| `DirSource.Chunks` | `data` in `emitFile` | `os.ReadFile` / `mmap.Open+ReadAt` of real files found via `filepath.WalkDir` | Yes | FLOWING |
| `GitSource.Chunks` | `data` from `f.Reader()` `io.ReadAll` | Real git blobs from commits walked via `repo.Log` across all seeded refs | Yes | FLOWING |
| `StdinSource.Chunks` | `data` from `io.ReadAll(s.Reader)` | `os.Stdin` in production, injected reader in tests | Yes | FLOWING |
| `URLSource.Chunks` | `data` from `io.ReadAll(limited)` | Real HTTP response body via `client.Do(req)` | Yes | FLOWING |
| `ClipboardSource.Chunks` | `text` from `reader()` | `clipboard.ReadAll` in production | Yes | FLOWING |
| `cmd/scan.go` `src` | `sources.Source` | `selectSource` returns the concrete source based on flags/args | Yes | FLOWING |

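
The `GitSource.Chunks` flow depends on two filters described earlier in this report: blob deduplication by OID and the `--since` cutoff on author time. A minimal sketch of that logic follows, using plain structs instead of go-git objects. The `blob` and `commit` types, `scanCommits`, `demoSources`, and the fixture SHAs are hypothetical; the 7-character short SHA is inferred from the `git:7dd369c:leak.txt` example and is otherwise an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// blob and commit are hypothetical stand-ins for go-git objects.
type blob struct{ OID, Path string }

type commit struct {
	SHA   string
	When  time.Time // author time
	Blobs []blob
}

// scanCommits returns one "git:<shortSHA>:<path>" label per unique blob
// in commits that are not older than since. A zero since disables the filter.
func scanCommits(commits []commit, since time.Time) []string {
	seen := make(map[string]struct{}) // blob OIDs already emitted
	var labels []string
	for _, c := range commits {
		if !since.IsZero() && c.When.Before(since) {
			continue // older than --since
		}
		short := c.SHA
		if len(short) > 7 {
			short = short[:7]
		}
		for _, b := range c.Blobs {
			if _, dup := seen[b.OID]; dup {
				continue // same content already scanned via another commit or ref
			}
			seen[b.OID] = struct{}{}
			labels = append(labels, fmt.Sprintf("git:%s:%s", short, b.Path))
		}
	}
	return labels
}

// demoSources runs scanCommits over a small in-memory fixture.
func demoSources() []string {
	day := func(y int, m time.Month, d int) time.Time {
		return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
	}
	commits := []commit{
		{SHA: "aaaaaaa11111", When: day(2026, 1, 10), Blobs: []blob{{"oid-1", "leak.txt"}}},
		{SHA: "bbbbbbb22222", When: day(2026, 2, 1), Blobs: []blob{{"oid-1", "leak.txt"}, {"oid-2", "config.yml"}}},
		{SHA: "ccccccc33333", When: day(2025, 6, 1), Blobs: []blob{{"oid-3", "old.txt"}}},
	}
	return scanCommits(commits, day(2026, 1, 1))
}

func main() {
	for _, label := range demoSources() {
		fmt.Println(label)
	}
}
```

In the fixture, the second commit carries `leak.txt` unchanged, so only its new blob is emitted, and the 2025 commit falls before the cutoff.
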
### Behavioral Spot-Checks

| Behavior | Command | Result | Status |
|----------|---------|--------|--------|
| Build succeeds | `go build ./...` | exit 0 | PASS |
| Race tests pass | `go test ./pkg/engine/sources/... ./cmd/... -race -count=1` | `ok pkg/engine/sources 1.950s` / `ok cmd 1.018s` | PASS |
| Vet clean | `go vet ./...` | exit 0, no output | PASS |
| Scan help exposes new flags | `keyhunter scan --help` | Shows `--git --url --clipboard --exclude --since --insecure --max-file-size` | PASS |
| Stdin scan detects keys | `echo "OPENAI_API_KEY=sk-proj-..." \| keyhunter scan stdin --output json` | JSON findings with `"source": "stdin"` | PASS |
| Dir scan with default excludes | `keyhunter scan $TMPDIR` (with `node_modules/foo.txt` decoy) | Only finds keys in root `test.txt`, `node_modules` skipped | PASS |
| Git scan produces `git:sha:path` source | `keyhunter scan --git /tmp/testrepo --output json` | Findings with `"source": "git:7dd369c:leak.txt"` | PASS |
| Mutual-exclusion validation | `keyhunter scan --git --url https://example.com` | `Error: scan: --git, --url, and --clipboard are mutually exclusive` | PASS |

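
The directory spot-check above (the `node_modules/foo.txt` decoy is skipped) corresponds to a glob-exclusion walk. The sketch below shows one way such a walk could work with `filepath.WalkDir` and `filepath.Match`. The matching semantics (each pattern is tested against every path segment) and the one-entry default list are assumptions; the real `isExcluded` and `DefaultExcludes` in `dir.go` may differ.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
)

// defaultExcludes contains only the entry confirmed by the report;
// the real DefaultExcludes list is presumably longer.
var defaultExcludes = []string{"node_modules"}

// isExcluded reports whether any segment of a relative path matches
// any glob pattern. Malformed patterns are ignored.
func isExcluded(rel string, patterns []string) bool {
	segments := strings.Split(filepath.ToSlash(rel), "/")
	for _, p := range patterns {
		for _, seg := range segments {
			if ok, err := filepath.Match(p, seg); err == nil && ok {
				return true
			}
		}
	}
	return false
}

// listFiles walks root and returns relative paths of files that are not
// excluded. Excluded directories are pruned with fs.SkipDir.
func listFiles(root string, patterns []string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		rel, relErr := filepath.Rel(root, path)
		if relErr != nil {
			return relErr
		}
		if rel != "." && isExcluded(rel, patterns) {
			if d.IsDir() {
				return fs.SkipDir
			}
			return nil
		}
		if !d.IsDir() {
			files = append(files, rel)
		}
		return nil
	})
	return files, err
}

func main() {
	files, err := listFiles(".", defaultExcludes)
	fmt.Println(files, err)
}
```

Returning `fs.SkipDir` for an excluded directory avoids descending into it at all, which matters for large trees such as `node_modules`.
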
### Requirements Coverage

Note: The phase prompt lists requirement IDs INPUT-01..INPUT-06. The REQUIREMENTS.md text for some IDs does not exactly match the phase-prompt mapping (appears to be a doc-order mismatch), but the phase's own plan `requirements:` declarations are consistent with the phase goal. Coverage is assessed against the phase-prompt mapping (which matches plan assignments).

| Requirement | Source Plan | Description | Status | Evidence |
|-------------|-------------|-------------|--------|----------|
| INPUT-01 | 04-02 | Directory/recursive scan with glob exclusions | SATISFIED | `DirSource` + `DefaultExcludes` + `--exclude` flag forwarding |
| CORE-07 | 04-02 | mmap-based large file reads | SATISFIED | `MmapThreshold=10MB`, `mmap.Open` in both `dir.go` and `file.go` |
| INPUT-02 | 04-03 | Git history scan across branches, tags, stash with `--since` | SATISFIED | `GitSource` with `collectSeedCommits` and `Since` filter |
| INPUT-03 | 04-04 | stdin scanning | SATISFIED | `StdinSource` + `cmd/scan.go` `stdin`/`-` dispatch |
| INPUT-04 | 04-04 | URL fetching | SATISFIED | `URLSource` + `--url` flag |
| INPUT-05 | 04-04 | Clipboard scanning | SATISFIED | `ClipboardSource` + `--clipboard` flag |
| INPUT-06 | 04-05 | Unified source pipeline | SATISFIED | Single `Source` interface, `selectSource` dispatcher, unchanged engine pipeline |

Plan 04-01 (go.mod bootstrap) carries no requirement IDs; it is infra-only.

### Anti-Patterns Found

No blocker anti-patterns. Minor observations:

| File | Line | Pattern | Severity | Impact |
|------|------|---------|----------|--------|
| `cmd/scan.go` | 289 | `--insecure` flag declared but not forwarded to `URLSource.Insecure` | Info | `URLSource.Insecure` field exists and is ready to receive the value, but `selectSource` does not pass it through. TLS verification is currently always on. Not in phase Success Criteria; the flag is documented for future `--url` wiring. Recommend fast-follow in Phase 5 or hotfix. |
| `cmd/scan.go` | 288 | `--max-file-size` flag declared but not forwarded to `DirSource`/`FileSource` | Info | Same as above: `flagMaxFileSize` is read into a variable but never referenced by `selectSource`. Not in phase Success Criteria. |
| `pkg/engine/sources/git.go` | 21 | `gitBinarySniffSize` duplicates `BinarySniffSize` from `dir.go` | Info | Comment already notes "Local to this file until plan 04-02 introduces a package-wide constant". Constants are now defined in `dir.go` but `git.go` still uses the local copy. Cosmetic; no correctness issue. |

None of these block the phase goal. None contradict any must-have truth or Success Criterion.

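
For the `--insecure` item, the recommended fast-follow amounts to copying the flag value into the source and letting it drive the TLS configuration. The sketch below shows what that forwarding might look like. It is not the project's code: this reduced `URLSource`, the `sourceFlags` fields, `newURLSource`, `client`, and `skipsVerify` are all hypothetical, and only the 30s timeout, the redirect cap of 5, and the `Insecure` field name come from this report.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// URLSource is a hypothetical reduction of the real type.
type URLSource struct {
	URL      string
	Insecure bool // when true, TLS certificate verification is skipped
}

// sourceFlags is a hypothetical reduction of the real flag struct.
type sourceFlags struct {
	URL      string
	Insecure bool
}

// newURLSource forwards the flag value instead of dropping it.
func newURLSource(f sourceFlags) *URLSource {
	return &URLSource{URL: f.URL, Insecure: f.Insecure}
}

// client builds an HTTP client whose TLS behavior follows s.Insecure.
func (s *URLSource) client() *http.Client {
	tr := &http.Transport{}
	if base, ok := http.DefaultTransport.(*http.Transport); ok {
		tr = base.Clone() // keep default proxy and timeout settings
	}
	tr.TLSClientConfig = &tls.Config{InsecureSkipVerify: s.Insecure}
	return &http.Client{
		Timeout:   30 * time.Second,
		Transport: tr,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			// Refuse to continue once five requests have already been made.
			if len(via) >= 5 {
				return fmt.Errorf("stopped after %d redirects", len(via))
			}
			return nil
		},
	}
}

// skipsVerify reports whether a client was built with verification disabled.
func skipsVerify(c *http.Client) bool {
	tr, ok := c.Transport.(*http.Transport)
	return ok && tr.TLSClientConfig != nil && tr.TLSClientConfig.InsecureSkipVerify
}

func main() {
	secure := newURLSource(sourceFlags{URL: "https://example.com"})
	insecure := newURLSource(sourceFlags{URL: "https://example.com", Insecure: true})
	fmt.Println(skipsVerify(secure.client()), skipsVerify(insecure.client()))
}
```

Keeping verification on unless the flag is explicitly set preserves the current behavior the report describes as the default.
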
### Human Verification Required

See `human_verification:` in frontmatter. Automated checks (build, race tests, unit tests, CLI smoke tests for stdin/dir/git) all pass. Items requiring human validation are the environmental/network-dependent sources: real clipboard I/O, live HTTPS fetch, and a multi-branch/tag/stash git repo smoke test. All three are code-path covered by unit tests via injected fixtures.

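
The injected-fixture approach mentioned above can be sketched for the clipboard case: the source holds a reader function, so a test can supply fixed text or force the unsupported branch without touching the OS clipboard. This is an illustrative sketch under assumptions: the `Unsupported` field stands in for the `clipboard.Unsupported` package variable, and the `Read` method and error messages are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ClipboardSource is a hypothetical reduction of the real type.
type ClipboardSource struct {
	// Reader is injectable; production code would set it to clipboard.ReadAll.
	Reader func() (string, error)
	// Unsupported stands in for the clipboard.Unsupported package variable.
	Unsupported bool
}

var errClipboardUnsupported = errors.New(
	"clipboard: no supported clipboard tool found (xclip, xsel, or wl-clipboard)")

// Read returns the clipboard text, or a clear error when the platform
// has no usable clipboard backend.
func (c ClipboardSource) Read() (string, error) {
	if c.Unsupported {
		return "", errClipboardUnsupported
	}
	if c.Reader == nil {
		return "", errors.New("clipboard: no reader configured")
	}
	return c.Reader()
}

func main() {
	fixture := ClipboardSource{Reader: func() (string, error) {
		return "OPENAI_API_KEY=example", nil
	}}
	text, err := fixture.Read()
	fmt.Println(text, err)

	_, err = ClipboardSource{Unsupported: true}.Read()
	fmt.Println(err)
}
```

This covers both code paths headlessly; it does not replace the human check of real clipboard I/O listed in the frontmatter.
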
### Gaps Summary

None. All six observable truths verified, all thirteen artifacts pass levels 1-4 (exist, substantive, wired, data flowing), all key links confirmed by direct inspection, all unit tests pass under `-race`, and every Success Criterion was exercised by a behavioral spot-check against the compiled binary (except clipboard + live URL which are appropriately routed to human verification).

The phase goal ("users can point KeyHunter at any content source ... and all are scanned through the same detection pipeline") is achieved: five distinct sources all implement the single `Source.Chunks` interface, `selectSource` dispatches exactly one of them, and `engine.Scan` consumes the unified interface with no source-type branching.

Two minor follow-up items (`--insecure`, `--max-file-size` not forwarded) are documented as Info-level anti-patterns; they are outside the phase Success Criteria and do not block passage.

---

_Verified: 2026-04-05_
_Verifier: Claude (gsd-verifier)_