docs(04-04): complete stdin/url/clipboard sources plan

This commit is contained in:
salvacybersec
2026-04-05 15:20:10 +03:00
parent 563d6e5ce2
commit 66fc597399

View File

@@ -0,0 +1,176 @@
---
phase: 04-input-sources
plan: 04
subsystem: input-sources
tags: [stdin, http, clipboard, net/http, atotto, io.LimitReader]
requires:
- phase: 01-foundation
provides: Source interface, types.Chunk, FileSource chunk-windowing pattern
provides:
- StdinSource reading from any io.Reader (defaults to os.Stdin)
- URLSource fetching http/https with timeout, size cap, scheme allowlist, Content-Type filter
- ClipboardSource wrapping atotto/clipboard with injectable reader for tests
- emitByteChunks helper for in-memory sources (mirrors file.go windowing)
affects: [04-05 scan CLI wiring, phase 05 scan command integration]
tech-stack:
added:
- github.com/atotto/clipboard (promoted from indirect)
patterns:
- Injectable Reader fields on sources for deterministic testing
- io.LimitReader(cap+1) overflow-detection idiom for HTTP bodies
- Content-Type prefix allowlist with parameter stripping
key-files:
created:
- pkg/engine/sources/stdin.go
- pkg/engine/sources/stdin_test.go
- pkg/engine/sources/url.go
- pkg/engine/sources/url_test.go
- pkg/engine/sources/clipboard.go
- pkg/engine/sources/clipboard_test.go
modified:
- go.mod
- go.sum
key-decisions:
- "Named the in-memory chunk helper emitByteChunks to avoid colliding with emitChunks already provided by plan 04-02's dir.go"
- "Content-Type allowlist includes text/*, application/json, application/javascript, application/xml, application/x-yaml, application/yaml; empty Content-Type is trusted"
- "URLSource enforces the 50MB cap twice: via resp.ContentLength short-circuit and via LimitReader(cap+1) to catch servers that lie or omit Content-Length"
- "ClipboardSource uses an injectable Reader func rather than a package-level shim so tests stay hermetic on headless CI"
patterns-established:
- "In-memory sources: ReadAll the input, then delegate to emitByteChunks for consistent windowing"
- "HTTP safety: parse-and-validate scheme BEFORE building the request, fail closed on non-2xx, filter Content-Type, double-cap body size"
- "Test injection: expose a function-valued field (e.g. Reader func() (string, error)) so tests never touch real OS state"
requirements-completed: [INPUT-03, INPUT-04, INPUT-05]
duration: ~8min
completed: 2026-04-05
---
# Phase 4 Plan 4: Stdin + URL + Clipboard Sources Summary
**Three self-contained Source adapters — StdinSource (io.Reader), URLSource (http/https with 30s timeout, 50MB cap, scheme allowlist, Content-Type filter), and ClipboardSource (atotto/clipboard with injectable fixture reader) — completing Phase 4's input surface area.**
## Performance
- **Duration:** ~8 min
- **Started:** 2026-04-05T12:10:50Z
- **Completed:** 2026-04-05T12:18:55Z
- **Tasks:** 1 (combined TDD task for all three sources)
- **Files created:** 6
- **Files modified:** 2 (go.mod, go.sum — atotto/clipboard promoted to direct dep)
## Accomplishments
- StdinSource: reads any io.Reader, emits chunks with Source="stdin", honors ctx cancellation mid-emit
- URLSource: full HTTP(S) fetch with scheme allowlist, 30s timeout, 5-redirect cap, 50MB double-enforced body cap, Content-Type prefix allowlist, User-Agent header
- ClipboardSource: wraps clipboard.ReadAll, detects clipboard.Unsupported, accepts an injectable Reader func for hermetic tests
- Local emitByteChunks helper mirrors file.go's overlap-windowing so in-memory sources stay consistent without depending on sibling wave-1 plans
## Task Commits
1. **Task 1: StdinSource + URLSource + ClipboardSource (TDD combined)** - `850c3ff` (feat)
_Single commit because this plan bundles three small self-contained sources into one TDD task per the plan's design._
## Files Created/Modified
- `pkg/engine/sources/stdin.go` - StdinSource + emitByteChunks helper
- `pkg/engine/sources/stdin_test.go` - Basic / Empty / CtxCancel tests
- `pkg/engine/sources/url.go` - URLSource with scheme allowlist, size cap, Content-Type filter
- `pkg/engine/sources/url_test.go` - Fetches / BinaryContentType / NonHTTPScheme / 500 / Oversize / Redirect
- `pkg/engine/sources/clipboard.go` - ClipboardSource with injectable Reader
- `pkg/engine/sources/clipboard_test.go` - FixtureReader / ReaderError / EmptyClipboard
- `go.mod`, `go.sum` - atotto/clipboard promoted from indirect to direct
## Test Results
All 12 subtests pass under `-race`:
| Test | Result |
| ---------------------------------------- | ------ |
| TestStdinSource_Basic | PASS |
| TestStdinSource_Empty | PASS |
| TestStdinSource_CtxCancel | PASS |
| TestURLSource_Fetches | PASS |
| TestURLSource_RejectsBinaryContentType | PASS |
| TestURLSource_RejectsNonHTTPScheme | PASS |
| TestURLSource_Rejects500 | PASS |
| TestURLSource_RejectsOversizeBody | PASS |
| TestURLSource_FollowsRedirect | PASS |
| TestClipboardSource_FixtureReader | PASS |
| TestClipboardSource_ReaderError | PASS |
| TestClipboardSource_EmptyClipboard | PASS |
Verification command: `go test ./pkg/engine/sources/... -race -count=1``ok`
## Decisions Made
- **emitByteChunks naming:** Plan text assumed emitChunks would be available from plan 04-02. Since plan 04-04 runs in parallel (wave 1), I defined a local helper under a non-colliding name (`emitByteChunks`). Post-merge both helpers now coexist — future cleanup could consolidate, but there is no functional duplication risk.
- **Double size-cap enforcement:** Some servers omit or misreport Content-Length. Pairing the Content-Length short-circuit with `io.LimitReader(cap+1)` is a belt-and-braces pattern that never lets >50MB enter memory.
- **Empty Content-Type trusted:** Fail-open on empty Content-Type (some servers omit it for text bodies); fail-closed only on explicit binary types.
## Deviations from Plan
### Auto-fixed Issues
**1. [Rule 3 - Blocking] Used emitByteChunks instead of emitChunks helper**
- **Found during:** Task 1 (source implementation)
- **Issue:** Plan referenced a shared `emitChunks` helper produced by plan 04-02's `dir.go`. This worktree executes in parallel with 04-02, and at write time `dir.go` was not yet present — and after merging, using the same symbol name would conflict.
- **Fix:** Defined a local `emitByteChunks` helper in `stdin.go` that mirrors `file.go`'s chunk-windowing pattern. URLSource and ClipboardSource reuse it. Confirmed post-merge both helpers coexist without collision.
- **Files modified:** pkg/engine/sources/stdin.go (includes helper)
- **Verification:** `grep -n "func emitChunks\|func emitByteChunks" pkg/engine/sources/` shows both symbols; `go build` + `go test -race` pass.
- **Committed in:** 850c3ff
**2. [Rule 1 - Bug] Clipboard test substring assertion was case-sensitive**
- **Found during:** Task 1 verification run
- **Issue:** `TestClipboardSource_ReaderError` asserted `err.Error()` contained `"clipboard"` but the wrapped error begins with `"ClipboardSource: read: ..."`, so the lowercase substring check failed on the Go-style capitalized prefix.
- **Fix:** Lowercased the error string before the Contains check (`strings.ToLower(err.Error())`). Keeps the assertion intent (error must mention clipboard) while remaining robust to casing.
- **Files modified:** pkg/engine/sources/clipboard_test.go
- **Verification:** Test now passes under `-race`.
- **Committed in:** 850c3ff
---
**Total deviations:** 2 auto-fixed (1 blocking helper collision avoidance, 1 test assertion bug)
**Impact on plan:** Both fixes were required to make the plan executable in the parallel wave-1 context and to make the tests pass. No scope creep; no architectural changes.
## Issues Encountered
- atotto/clipboard was present in go.sum as an indirect dependency — `go mod tidy` promoted it to direct when `clipboard.go` imported it. No install step was needed beyond that.
- Linux CI environments without xclip/xsel can safely run these tests because all clipboard tests use the injectable Reader fixture; they never touch real OS clipboard state.
## Known Stubs
None. All three sources are fully functional and wired end-to-end through the Source interface. Consumer wiring into `cmd/scan.go` is out of scope for this plan (owned by plan 04-05).
## User Setup Required
None — no external service configuration required. The `atotto/clipboard` package is a compile-time dependency and the tests use injected fixtures so no runtime clipboard tooling is needed to pass CI. End users who invoke `keyhunter scan --clipboard` on Linux will still need xclip/xsel/wl-clipboard installed; ClipboardSource returns a clear install-hint error when tooling is missing.
## Next Phase Readiness
- StdinSource, URLSource, ClipboardSource are ready for wiring into `cmd/scan.go` by plan 04-05.
- `emitByteChunks` is a candidate for consolidation with `emitChunks` (from 04-02's `dir.go`) in a future refactor — functionally equivalent, named differently to dodge the wave-1 merge collision.
## Self-Check: PASSED
- pkg/engine/sources/stdin.go: FOUND
- pkg/engine/sources/stdin_test.go: FOUND
- pkg/engine/sources/url.go: FOUND
- pkg/engine/sources/url_test.go: FOUND
- pkg/engine/sources/clipboard.go: FOUND
- pkg/engine/sources/clipboard_test.go: FOUND
- Commit 850c3ff: FOUND in git log
- `go test ./pkg/engine/sources/... -race -count=1`: PASS
- `go vet ./pkg/engine/sources/...`: clean
- All grep acceptance checks: PASS
---
*Phase: 04-input-sources*
*Plan: 04*
*Completed: 2026-04-05*