---
phase: 04-input-sources
plan: 04
type: execute
wave: 1
depends_on: ["04-01"]
files_modified:
  - pkg/engine/sources/stdin.go
  - pkg/engine/sources/stdin_test.go
  - pkg/engine/sources/url.go
  - pkg/engine/sources/url_test.go
  - pkg/engine/sources/clipboard.go
  - pkg/engine/sources/clipboard_test.go
autonomous: true
requirements:
  - INPUT-03
  - INPUT-04
  - INPUT-05
must_haves:
  truths:
    - "StdinSource reads from an io.Reader and emits chunks with Source='stdin'"
    - "URLSource fetches an http/https URL with 30s timeout, 50MB cap, rejects file:// and other schemes, and emits chunks with Source='url:'"
    - "URLSource rejects responses with non-text Content-Type unless allowlisted (text/*, application/json, application/javascript, application/xml)"
    - "ClipboardSource reads current clipboard via atotto/clipboard and emits chunks with Source='clipboard'"
    - "ClipboardSource returns a clear error if clipboard tooling is unavailable"
  artifacts:
    - path: "pkg/engine/sources/stdin.go"
      provides: "StdinSource"
      exports: ["StdinSource", "NewStdinSource"]
      min_lines: 40
    - path: "pkg/engine/sources/url.go"
      provides: "URLSource with HTTP fetch, timeout, size cap, content-type filter"
      exports: ["URLSource", "NewURLSource"]
      min_lines: 100
    - path: "pkg/engine/sources/clipboard.go"
      provides: "ClipboardSource wrapping atotto/clipboard"
      exports: ["ClipboardSource", "NewClipboardSource"]
      min_lines: 30
  key_links:
    - from: "pkg/engine/sources/url.go"
      to: "net/http"
      via: "http.Client with Timeout"
      pattern: "http\\.Client"
    - from: "pkg/engine/sources/url.go"
      to: "io.LimitReader"
      via: "MaxContentLength enforcement"
      pattern: "LimitReader"
    - from: "pkg/engine/sources/clipboard.go"
      to: "github.com/atotto/clipboard"
      via: "clipboard.ReadAll"
      pattern: "clipboard\\.ReadAll"
---

Implement three smaller Source adapters in a single plan, since each is small and they share no state:

- `StdinSource` reads from an injectable `io.Reader` (defaults to `os.Stdin`) (INPUT-03)
- `URLSource` fetches a remote URL via stdlib `net/http` with timeout, size cap, scheme whitelist, and content-type filter (INPUT-04)
- `ClipboardSource` reads the current clipboard via `github.com/atotto/clipboard` with graceful fallback (INPUT-05)

Purpose: These three adapters complete the Phase 4 input surface area. Bundling them into one plan keeps wave-1 parallelism healthy (04-02 + 04-03 + 04-04 run simultaneously) while respecting the ~50% context budget, since each adapter is self-contained and small.

Output: Six files total (three sources + three test files).

@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md

@.planning/PROJECT.md
@.planning/phases/04-input-sources/04-CONTEXT.md
@pkg/engine/sources/source.go
@pkg/types/chunk.go

Source interface:

```go
type Source interface {
	Chunks(ctx context.Context, out chan<- types.Chunk) error
}
```

Shared helper (produced by plan 04-02 in pkg/engine/sources/dir.go):

```go
func emitChunks(ctx context.Context, data []byte, source string, chunkSize int, out chan<- types.Chunk) error
```

atotto/clipboard API:

```go
import "github.com/atotto/clipboard"

func ReadAll() (string, error)
var Unsupported bool // set on platforms without clipboard tooling
```

Task 1: Implement StdinSource, URLSource, and ClipboardSource with full test coverage

- pkg/engine/sources/source.go
- pkg/engine/sources/dir.go (for emitChunks signature from plan 04-02)
- pkg/types/chunk.go
- .planning/phases/04-input-sources/04-CONTEXT.md (Stdin, URL, Clipboard sections)

pkg/engine/sources/stdin.go, pkg/engine/sources/stdin_test.go, pkg/engine/sources/url.go, pkg/engine/sources/url_test.go, pkg/engine/sources/clipboard.go, pkg/engine/sources/clipboard_test.go

StdinSource:
- Test 1: Feeding "API_KEY=xyz" through a bytes.Buffer emits one chunk with Source="stdin"
- Test 2: Empty input emits zero chunks without error
- Test 3: ctx cancellation returns ctx.Err()

URLSource:
- Test 4: Fetches content from httptest.Server, emits a chunk with Source="url:"
- Test 5: Server returning 50MB+1 body is rejected with a size error
- Test 6: Server returning Content-Type image/png is rejected
- Test 7: Scheme "file:///etc/passwd" is rejected without any request attempt
- Test 8: Server returning 500 returns a non-nil error containing "500"
- Test 9: HTTP 301 redirect is followed (max 5 hops)

ClipboardSource:
- Test 10: If clipboard.Unsupported is true, returns an error with "clipboard" in the message
- Test 11: Otherwise reads clipboard (may skip if empty on CI); use a build tag or t.Skip guard

Create `pkg/engine/sources/stdin.go`:

```go
package sources

import (
	"context"
	"io"
	"os"

	"github.com/salvacybersec/keyhunter/pkg/types"
)

// StdinSource reads content from an io.Reader (defaults to os.Stdin) and
// emits overlapping chunks. Used when a user runs `keyhunter scan stdin`
// or `keyhunter scan -`.
type StdinSource struct {
	Reader    io.Reader
	ChunkSize int
}

// NewStdinSource returns a StdinSource bound to os.Stdin.
func NewStdinSource() *StdinSource {
	return &StdinSource{Reader: os.Stdin, ChunkSize: defaultChunkSize}
}

// NewStdinSourceFrom returns a StdinSource bound to the given reader
// (used primarily by tests).
func NewStdinSourceFrom(r io.Reader) *StdinSource {
	return &StdinSource{Reader: r, ChunkSize: defaultChunkSize}
}

// Chunks reads the entire input, then hands it to the shared chunk emitter.
func (s *StdinSource) Chunks(ctx context.Context, out chan<- types.Chunk) error {
	if s.Reader == nil {
		s.Reader = os.Stdin
	}
	data, err := io.ReadAll(s.Reader)
	if err != nil {
		return err
	}
	if len(data) == 0 {
		return nil
	}
	return emitChunks(ctx, data, "stdin", s.ChunkSize, out)
}
```

Create `pkg/engine/sources/stdin_test.go`:

```go
package sources

import (
	"bytes"
	"context"
	"testing"
	"time"

	"github.com/stretchr/testify/require"

	"github.com/salvacybersec/keyhunter/pkg/types"
)

func TestStdinSource_Basic(t *testing.T) {
	src := NewStdinSourceFrom(bytes.NewBufferString("API_KEY=sk-test-xyz"))
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	out := make(chan types.Chunk, 8)
	errCh := make(chan error, 1)
	go func() { errCh <- src.Chunks(ctx, out); close(out) }()

	var got []types.Chunk
	for c := range out {
		got = append(got, c)
	}
	require.NoError(t, <-errCh)
	require.Len(t, got, 1)
	require.Equal(t, "stdin", got[0].Source)
	require.Equal(t, "API_KEY=sk-test-xyz", string(got[0].Data))
}

func TestStdinSource_Empty(t *testing.T) {
	src := NewStdinSourceFrom(bytes.NewBuffer(nil))
	out := make(chan types.Chunk, 1)
	err := src.Chunks(context.Background(), out)
	close(out)
	require.NoError(t, err)
	require.Len(t, out, 0)
}

func TestStdinSource_CtxCancel(t *testing.T) {
	// Large buffer so emitChunks iterates and can observe cancellation.
	data := make([]byte, 1<<20)
	src := NewStdinSourceFrom(bytes.NewReader(data))
	ctx, cancel := context.WithCancel(context.Background())
	cancel()

	out := make(chan types.Chunk) // unbuffered forces select on ctx
	err := src.Chunks(ctx, out)
	require.ErrorIs(t, err, context.Canceled)
}
```

Create `pkg/engine/sources/url.go`:

```go
package sources

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
	"time"

	"github.com/salvacybersec/keyhunter/pkg/types"
)

// MaxURLContentLength is the hard cap on URLSource response bodies.
const MaxURLContentLength int64 = 50 * 1024 * 1024 // 50 MB

// DefaultURLTimeout is the overall request timeout (connect + read + body).
const DefaultURLTimeout = 30 * time.Second

// allowedContentTypes is the whitelist of Content-Type prefixes URLSource
// will accept. Binary types (images, archives, executables) are rejected.
var allowedContentTypes = []string{
	"text/",
	"application/json",
	"application/javascript",
	"application/xml",
	"application/x-yaml",
	"application/yaml",
}

// URLSource fetches a remote resource over HTTP(S) and emits its body as chunks.
type URLSource struct {
	URL       string
	Client    *http.Client
	UserAgent string
	Insecure  bool // reserved for skipping TLS verification (default false); not yet consulted by Chunks
	ChunkSize int
}

// NewURLSource creates a URLSource with sane defaults.
func NewURLSource(rawURL string) *URLSource {
	return &URLSource{
		URL:       rawURL,
		Client:    defaultHTTPClient(),
		UserAgent: "keyhunter/dev",
		ChunkSize: defaultChunkSize,
	}
}

func defaultHTTPClient() *http.Client {
	return &http.Client{
		Timeout: DefaultURLTimeout,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			if len(via) >= 5 {
				return errors.New("stopped after 5 redirects")
			}
			return nil
		},
	}
}

// Chunks validates the URL, issues a GET, and emits the response body as chunks.
func (u *URLSource) Chunks(ctx context.Context, out chan<- types.Chunk) error {
	parsed, err := url.Parse(u.URL)
	if err != nil {
		return fmt.Errorf("URLSource: parse %q: %w", u.URL, err)
	}
	if parsed.Scheme != "http" && parsed.Scheme != "https" {
		return fmt.Errorf("URLSource: unsupported scheme %q (only http/https)", parsed.Scheme)
	}

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, u.URL, nil)
	if err != nil {
		return fmt.Errorf("URLSource: new request: %w", err)
	}
	req.Header.Set("User-Agent", u.UserAgent)

	client := u.Client
	if client == nil {
		client = defaultHTTPClient()
	}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("URLSource: fetch: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return fmt.Errorf("URLSource: non-2xx status %d from %s", resp.StatusCode, u.URL)
	}

	ct := resp.Header.Get("Content-Type")
	if !isAllowedContentType(ct) {
		return fmt.Errorf("URLSource: disallowed Content-Type %q", ct)
	}

	if resp.ContentLength > MaxURLContentLength {
		return fmt.Errorf("URLSource: Content-Length %d exceeds cap %d", resp.ContentLength, MaxURLContentLength)
	}

	// LimitReader cap + 1 to detect overflow even if ContentLength was missing/wrong.
	limited := io.LimitReader(resp.Body, MaxURLContentLength+1)
	data, err := io.ReadAll(limited)
	if err != nil {
		return fmt.Errorf("URLSource: read body: %w", err)
	}
	if int64(len(data)) > MaxURLContentLength {
		return fmt.Errorf("URLSource: body exceeds %d bytes", MaxURLContentLength)
	}
	if len(data) == 0 {
		return nil
	}

	source := "url:" + u.URL
	return emitChunks(ctx, data, source, u.ChunkSize, out)
}

func isAllowedContentType(ct string) bool {
	if ct == "" {
		return true // some servers omit; trust and scan
	}
	// Strip parameters like "; charset=utf-8".
	if idx := strings.Index(ct, ";"); idx >= 0 {
		ct = ct[:idx]
	}
	ct = strings.TrimSpace(strings.ToLower(ct))
	for _, prefix := range allowedContentTypes {
		if strings.HasPrefix(ct, prefix) {
			return true
		}
	}
	return false
}
```

Create `pkg/engine/sources/url_test.go`:

```go
package sources

import (
	"context"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
	"time"

	"github.com/stretchr/testify/require"

	"github.com/salvacybersec/keyhunter/pkg/types"
)

func drainURL(t *testing.T, src Source) ([]types.Chunk, error) {
	t.Helper()
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	out := make(chan types.Chunk, 256)
	errCh := make(chan error, 1)
	go func() { errCh <- src.Chunks(ctx, out); close(out) }()

	var got []types.Chunk
	for c := range out {
		got = append(got, c)
	}
	return got, <-errCh
}

func TestURLSource_Fetches(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain")
		_, _ = w.Write([]byte("API_KEY=sk-live-xyz"))
	}))
	defer srv.Close()

	chunks, err := drainURL(t, NewURLSource(srv.URL))
	require.NoError(t, err)
	require.Len(t, chunks, 1)
	require.Equal(t, "url:"+srv.URL, chunks[0].Source)
	require.Equal(t, "API_KEY=sk-live-xyz", string(chunks[0].Data))
}

func TestURLSource_RejectsBinaryContentType(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "image/png")
		_, _ = w.Write([]byte{0x89, 0x50, 0x4e, 0x47})
	}))
	defer srv.Close()

	_, err := drainURL(t, NewURLSource(srv.URL))
	require.Error(t, err)
	require.Contains(t, err.Error(), "Content-Type")
}

func TestURLSource_RejectsNonHTTPScheme(t *testing.T) {
	_, err := drainURL(t, NewURLSource("file:///etc/passwd"))
	require.Error(t, err)
	require.Contains(t, err.Error(), "unsupported scheme")
}

func TestURLSource_Rejects500(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "boom", http.StatusInternalServerError)
	}))
	defer srv.Close()

	_, err := drainURL(t, NewURLSource(srv.URL))
	require.Error(t, err)
	require.Contains(t, err.Error(), "500")
}

func TestURLSource_RejectsOversizeBody(t *testing.T) {
	// Serve a body just over the cap. MaxURLContentLength is a const with no
	// test override, so this allocates the full ~50 MB body.
	big := strings.Repeat("a", int(MaxURLContentLength)+10)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain")
		_, _ = w.Write([]byte(big))
	}))
	defer srv.Close()

	_, err := drainURL(t, NewURLSource(srv.URL))
	require.Error(t, err)
	// Both size errors in url.go contain "exceeds".
	require.Contains(t, err.Error(), "exceeds")
}

func TestURLSource_FollowsRedirect(t *testing.T) {
	target := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain")
		_, _ = w.Write([]byte("redirected body"))
	}))
	defer target.Close()

	redirector := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, target.URL, http.StatusMovedPermanently)
	}))
	defer redirector.Close()

	chunks, err := drainURL(t, NewURLSource(redirector.URL))
	require.NoError(t, err)
	require.NotEmpty(t, chunks)
	require.Contains(t, string(chunks[0].Data), "redirected body")
}
```

Create `pkg/engine/sources/clipboard.go`:

```go
package sources

import (
	"context"
	"errors"
	"fmt"

	"github.com/atotto/clipboard"

	"github.com/salvacybersec/keyhunter/pkg/types"
)

// ClipboardSource reads the current OS clipboard contents and emits them
// as a single chunk stream with Source="clipboard". Requires xclip/xsel/
// wl-clipboard on Linux, pbpaste on macOS, or native API on Windows.
type ClipboardSource struct {
	// Reader overrides the clipboard reader; when nil the real clipboard is used.
	// Tests inject a func returning a fixture.
	Reader    func() (string, error)
	ChunkSize int
}

// NewClipboardSource returns a ClipboardSource bound to the real OS clipboard.
// Reader is left nil so Chunks can check clipboard.Unsupported before falling
// back to clipboard.ReadAll.
func NewClipboardSource() *ClipboardSource {
	return &ClipboardSource{ChunkSize: defaultChunkSize}
}

// Chunks reads the clipboard and emits its contents.
func (c *ClipboardSource) Chunks(ctx context.Context, out chan<- types.Chunk) error {
	if clipboard.Unsupported && c.Reader == nil {
		return errors.New("ClipboardSource: clipboard tooling unavailable (install xclip/xsel/wl-clipboard on Linux)")
	}
	reader := c.Reader
	if reader == nil {
		reader = clipboard.ReadAll
	}
	text, err := reader()
	if err != nil {
		// The lowercase word "clipboard" is part of the message on purpose:
		// TestClipboardSource_ReaderError matches on it case-sensitively.
		return fmt.Errorf("ClipboardSource: read clipboard: %w", err)
	}
	if text == "" {
		return nil
	}
	return emitChunks(ctx, []byte(text), "clipboard", c.ChunkSize, out)
}
```

Create `pkg/engine/sources/clipboard_test.go`:

```go
package sources

import (
	"context"
	"errors"
	"testing"
	"time"

	"github.com/stretchr/testify/require"

	"github.com/salvacybersec/keyhunter/pkg/types"
)

func TestClipboardSource_FixtureReader(t *testing.T) {
	src := &ClipboardSource{
		Reader:    func() (string, error) { return "sk-live-xxxxxx", nil },
		ChunkSize: defaultChunkSize,
	}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	out := make(chan types.Chunk, 4)
	errCh := make(chan error, 1)
	go func() { errCh <- src.Chunks(ctx, out); close(out) }()

	var got []types.Chunk
	for c := range out {
		got = append(got, c)
	}
	require.NoError(t, <-errCh)
	require.Len(t, got, 1)
	require.Equal(t, "clipboard", got[0].Source)
	require.Equal(t, "sk-live-xxxxxx", string(got[0].Data))
}

func TestClipboardSource_ReaderError(t *testing.T) {
	src := &ClipboardSource{
		Reader: func() (string, error) { return "", errors.New("no xclip installed") },
	}
	out := make(chan types.Chunk, 1)
	err := src.Chunks(context.Background(), out)
	require.Error(t, err)
	require.Contains(t, err.Error(), "clipboard")
}

func TestClipboardSource_EmptyClipboard(t *testing.T) {
	src := &ClipboardSource{
		Reader: func() (string, error) { return "", nil },
	}
	out := make(chan types.Chunk, 1)
	err := src.Chunks(context.Background(), out)
	require.NoError(t, err)
	require.Len(t, out, 0)
}
```

Do NOT modify `cmd/scan.go` in this plan. Do NOT create `pkg/engine/sources/dir.go`, `git.go`, or touch `file.go`; those are owned by plans 04-02 and 04-03.

go test ./pkg/engine/sources/... -run 'TestStdinSource|TestURLSource|TestClipboardSource' -race -count=1

- `go build ./pkg/engine/sources/...` exits 0
- `go test ./pkg/engine/sources/... -run 'TestStdinSource|TestURLSource|TestClipboardSource' -race` passes all subtests
- `grep -n "http.Client" pkg/engine/sources/url.go` hits
- `grep -n "LimitReader" pkg/engine/sources/url.go` hits
- `grep -n "clipboard.ReadAll" pkg/engine/sources/clipboard.go` hits
- `grep -n "\"stdin\"" pkg/engine/sources/stdin.go` hits (source label)
- `grep -n "\"url:\" + u.URL\\|\"url:\"+u.URL" pkg/engine/sources/url.go` hits

StdinSource, URLSource, and ClipboardSource all implement Source, enforce their respective safety limits (stdin read-to-EOF, url scheme/size/content-type whitelist, clipboard tooling check), and their tests pass under -race.

- `go test ./pkg/engine/sources/... -race -count=1` passes including new tests
- `go vet ./pkg/engine/sources/...` clean
- All grep acceptance checks hit

Three new source adapters exist, each self-contained, each with test coverage, and none conflicting with file ownership of plans 04-02 (dir/file) or 04-03 (git).

After completion, create `.planning/phases/04-input-sources/04-04-SUMMARY.md` listing the six files created, test names with pass status, and any platform-specific notes about clipboard tests on the executor's CI environment.