--- phase: 04-input-sources plan: 04 subsystem: input-sources tags: [stdin, http, clipboard, net/http, atotto, io.LimitReader] requires: - phase: 01-foundation provides: Source interface, types.Chunk, FileSource chunk-windowing pattern provides: - StdinSource reading from any io.Reader (defaults to os.Stdin) - URLSource fetching http/https with timeout, size cap, scheme allowlist, Content-Type filter - ClipboardSource wrapping atotto/clipboard with injectable reader for tests - emitByteChunks helper for in-memory sources (mirrors file.go windowing) affects: [04-05 scan CLI wiring, phase 05 scan command integration] tech-stack: added: - github.com/atotto/clipboard (promoted from indirect) patterns: - Injectable Reader fields on sources for deterministic testing - io.LimitReader(cap+1) overflow-detection idiom for HTTP bodies - Content-Type prefix allowlist with parameter stripping key-files: created: - pkg/engine/sources/stdin.go - pkg/engine/sources/stdin_test.go - pkg/engine/sources/url.go - pkg/engine/sources/url_test.go - pkg/engine/sources/clipboard.go - pkg/engine/sources/clipboard_test.go modified: - go.mod - go.sum key-decisions: - "Named the in-memory chunk helper emitByteChunks to avoid colliding with emitChunks already provided by plan 04-02's dir.go" - "Content-Type allowlist includes text/*, application/json, application/javascript, application/xml, application/x-yaml, application/yaml; empty Content-Type is trusted" - "URLSource enforces the 50MB cap twice: via resp.ContentLength short-circuit and via LimitReader(cap+1) to catch servers that lie or omit Content-Length" - "ClipboardSource uses an injectable Reader func rather than a package-level shim so tests stay hermetic on headless CI" patterns-established: - "In-memory sources: ReadAll the input, then delegate to emitByteChunks for consistent windowing" - "HTTP safety: parse-and-validate scheme BEFORE building the request, fail closed on non-2xx, filter Content-Type, double-cap body size" - "Test injection: expose a function-valued field (e.g. Reader func() (string, error)) so tests never touch real OS state" requirements-completed: [INPUT-03, INPUT-04, INPUT-05] duration: ~8min completed: 2026-04-05 --- # Phase 4 Plan 4: Stdin + URL + Clipboard Sources Summary **Three self-contained Source adapters — StdinSource (io.Reader), URLSource (http/https with 30s timeout, 50MB cap, scheme allowlist, Content-Type filter), and ClipboardSource (atotto/clipboard with injectable fixture reader) — completing Phase 4's input surface area.** ## Performance - **Duration:** ~8 min - **Started:** 2026-04-05T12:10:50Z - **Completed:** 2026-04-05T12:18:55Z - **Tasks:** 1 (combined TDD task for all three sources) - **Files created:** 6 - **Files modified:** 2 (go.mod, go.sum — atotto/clipboard promoted to direct dep) ## Accomplishments - StdinSource: reads any io.Reader, emits chunks with Source="stdin", honors ctx cancellation mid-emit - URLSource: full HTTP(S) fetch with scheme allowlist, 30s timeout, 5-redirect cap, 50MB double-enforced body cap, Content-Type prefix allowlist, User-Agent header - ClipboardSource: wraps clipboard.ReadAll, detects clipboard.Unsupported, accepts an injectable Reader func for hermetic tests - Local emitByteChunks helper mirrors file.go's overlap-windowing so in-memory sources stay consistent without depending on sibling wave-1 plans ## Task Commits 1. **Task 1: StdinSource + URLSource + ClipboardSource (TDD combined)** - `850c3ff` (feat) _Single commit because this plan bundles three small self-contained sources into one TDD task per the plan's design._ ## Files Created/Modified - `pkg/engine/sources/stdin.go` - StdinSource + emitByteChunks helper - `pkg/engine/sources/stdin_test.go` - Basic / Empty / CtxCancel tests - `pkg/engine/sources/url.go` - URLSource with scheme allowlist, size cap, Content-Type filter - `pkg/engine/sources/url_test.go` - Fetches / BinaryContentType / NonHTTPScheme / 500 / Oversize / Redirect - `pkg/engine/sources/clipboard.go` - ClipboardSource with injectable Reader - `pkg/engine/sources/clipboard_test.go` - FixtureReader / ReaderError / EmptyClipboard - `go.mod`, `go.sum` - atotto/clipboard promoted from indirect to direct ## Test Results All 12 subtests pass under `-race`: | Test | Result | | ---------------------------------------- | ------ | | TestStdinSource_Basic | PASS | | TestStdinSource_Empty | PASS | | TestStdinSource_CtxCancel | PASS | | TestURLSource_Fetches | PASS | | TestURLSource_RejectsBinaryContentType | PASS | | TestURLSource_RejectsNonHTTPScheme | PASS | | TestURLSource_Rejects500 | PASS | | TestURLSource_RejectsOversizeBody | PASS | | TestURLSource_FollowsRedirect | PASS | | TestClipboardSource_FixtureReader | PASS | | TestClipboardSource_ReaderError | PASS | | TestClipboardSource_EmptyClipboard | PASS | Verification command: `go test ./pkg/engine/sources/... -race -count=1` → `ok` ## Decisions Made - **emitByteChunks naming:** Plan text assumed emitChunks would be available from plan 04-02. Since plan 04-04 runs in parallel (wave 1), I defined a local helper under a non-colliding name (`emitByteChunks`). Post-merge both helpers now coexist — future cleanup could consolidate, but there is no functional duplication risk. - **Double size-cap enforcement:** Some servers omit or misreport Content-Length. Pairing the Content-Length short-circuit with `io.LimitReader(cap+1)` is a belt-and-braces pattern that never lets >50MB enter memory. - **Empty Content-Type trusted:** Fail-open on empty Content-Type (some servers omit it for text bodies); fail-closed only on explicit binary types. ## Deviations from Plan ### Auto-fixed Issues **1. [Rule 3 - Blocking] Used emitByteChunks instead of emitChunks helper** - **Found during:** Task 1 (source implementation) - **Issue:** Plan referenced a shared `emitChunks` helper produced by plan 04-02's `dir.go`. This worktree executes in parallel with 04-02, and at write time `dir.go` was not yet present — and after merging, using the same symbol name would conflict. - **Fix:** Defined a local `emitByteChunks` helper in `stdin.go` that mirrors `file.go`'s chunk-windowing pattern. URLSource and ClipboardSource reuse it. Confirmed post-merge both helpers coexist without collision. - **Files modified:** pkg/engine/sources/stdin.go (includes helper) - **Verification:** `grep -n "func emitChunks\|func emitByteChunks" pkg/engine/sources/` shows both symbols; `go build` + `go test -race` pass. - **Committed in:** 850c3ff **2. [Rule 1 - Bug] Clipboard test substring assertion was case-sensitive** - **Found during:** Task 1 verification run - **Issue:** `TestClipboardSource_ReaderError` asserted `err.Error()` contained `"clipboard"` but the wrapped error begins with `"ClipboardSource: read: ..."`, so the lowercase substring check failed on the Go-style capitalized prefix. - **Fix:** Lowercased the error string before the Contains check (`strings.ToLower(err.Error())`). Keeps the assertion intent (error must mention clipboard) while remaining robust to casing. - **Files modified:** pkg/engine/sources/clipboard_test.go - **Verification:** Test now passes under `-race`. - **Committed in:** 850c3ff --- **Total deviations:** 2 auto-fixed (1 blocking helper collision avoidance, 1 test assertion bug) **Impact on plan:** Both fixes were required to make the plan executable in the parallel wave-1 context and to make the tests pass. No scope creep; no architectural changes. ## Issues Encountered - atotto/clipboard was present in go.sum as an indirect dependency — `go mod tidy` promoted it to direct when `clipboard.go` imported it. No install step was needed beyond that. - Linux CI environments without xclip/xsel can safely run these tests because all clipboard tests use the injectable Reader fixture; they never touch real OS clipboard state. ## Known Stubs None. All three sources are fully functional and wired end-to-end through the Source interface. Consumer wiring into `cmd/scan.go` is out of scope for this plan (owned by plan 04-05). ## User Setup Required None — no external service configuration required. The `atotto/clipboard` package is a compile-time dependency and the tests use injected fixtures so no runtime clipboard tooling is needed to pass CI. End users who invoke `keyhunter scan --clipboard` on Linux will still need xclip/xsel/wl-clipboard installed; ClipboardSource returns a clear install-hint error when tooling is missing. ## Next Phase Readiness - StdinSource, URLSource, ClipboardSource are ready for wiring into `cmd/scan.go` by plan 04-05. - `emitByteChunks` is a candidate for consolidation with `emitChunks` (from 04-02's `dir.go`) in a future refactor — functionally equivalent, named differently to dodge the wave-1 merge collision. ## Self-Check: PASSED - pkg/engine/sources/stdin.go: FOUND - pkg/engine/sources/stdin_test.go: FOUND - pkg/engine/sources/url.go: FOUND - pkg/engine/sources/url_test.go: FOUND - pkg/engine/sources/clipboard.go: FOUND - pkg/engine/sources/clipboard_test.go: FOUND - Commit 850c3ff: FOUND in git log - `go test ./pkg/engine/sources/... -race -count=1`: PASS - `go vet ./pkg/engine/sources/...`: clean - All grep acceptance checks: PASS --- *Phase: 04-input-sources* *Plan: 04* *Completed: 2026-04-05*